Tucuxi-BLAST: Enabling fast and accurate record linkage of large-scale health-related administrative databases through a DNA-encoded approach.

José Deney Araujo, Juan Carlo Silva, André Guilherme Costa-Martins, Vanderson Sampaio, Daniel Barros de Castro, Robson Francisco de Souza, Jeevan Giddaluru, Pablo Ivan Pereira Ramos, Robespierre Pita, Maurício Lima Barreto, Manoel Barral Netto, Helder Takashi Imoto Nakaya
Published in: PeerJ (2022)
Our method overcame misspellings and typographical errors in administrative databases. When performing record linkage (RL) on the largest simulated dataset (200k records), the state-of-the-art method took 5 days and 7 h (127 h), whereas Tucuxi-BLAST took only 23 h, roughly 5.5 times faster. When compared with five existing RL tools on a gold-standard dataset from real health-related databases, Tucuxi-BLAST achieved the highest accuracy and speed. By repurposing genomic tools, Tucuxi-BLAST can support data-driven medical research and provides a fast and accurate way to link individual information across several administrative databases.
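To make the "DNA-encoded" idea more concrete, the sketch below shows how text records could be translated into nucleotide strings and then compared with a sequence-alignment score, so that a typo costs only a few mismatches instead of breaking an exact match. Everything here is an illustrative assumption rather than the paper's method: the character-to-codon table (`ENCODING`), the scoring values, and the helper names `encode`, `local_alignment_score` and `similarity` are hypothetical. Tucuxi-BLAST uses BLAST, a fast heuristic search tool; this sketch substitutes a plain Smith-Waterman local alignment, which is exact but far slower and only suitable for small examples.

```python
from itertools import product

# Hypothetical encoding (an assumption for illustration, NOT the scheme used by
# Tucuxi-BLAST): each supported character maps to a unique 3-nucleotide "codon".
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # 64 distinct codons
ENCODING = dict(zip(ALPHABET, CODONS))  # 37 characters -> 37 codons

# Simple local-alignment scores (assumed values, chosen for illustration).
MATCH, MISMATCH, GAP = 2, -1, -2


def encode(record: str) -> str:
    """Lower-case a record and translate it to DNA, dropping unsupported characters."""
    return "".join(ENCODING[ch] for ch in record.lower() if ch in ENCODING)


def local_alignment_score(a: str, b: str) -> int:
    """Smith-Waterman best local score with a linear gap penalty (O(len(a)*len(b)))."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Scores never drop below 0: that is what makes the alignment "local".
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(record_a: str, record_b: str) -> float:
    """Alignment score normalised by the smaller self-alignment score (range 0..1)."""
    dna_a, dna_b = encode(record_a), encode(record_b)
    if not dna_a or not dna_b:
        return 0.0
    # A sequence aligned with itself scores MATCH per nucleotide.
    self_score = MATCH * min(len(dna_a), len(dna_b))
    return local_alignment_score(dna_a, dna_b) / self_score


if __name__ == "__main__":
    print(encode("Ana"))  # AAAATCAAA
    print(similarity("Maria Silva", "Maria Silva"))  # 1.0
    print(similarity("Maria Silva", "Maira Silva"))  # transposed letters: high, below 1.0
```

Because the transposed letters in "Maira" only disturb two codons, the pair still scores well above 0.7 here, which is the intuition behind tolerating misspellings. A real pipeline would also need a decision threshold and blocking strategy, which this sketch does not attempt to reproduce.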