A Bayesian approach for de-duplication in the presence of relational data.

Juan Sosa, Abel Rodríguez
Published in: Journal of Applied Statistics (2022)
In this paper, we study the impact of combining profile and network data on record de-duplication problems. We also assess how a range of prior distributions on the linkage structure influences the results, and explore stochastic gradient Hamiltonian Monte Carlo methods as a faster alternative for obtaining samples from the posterior distribution of the network parameters. We evaluate our methodology on the RLdata500 dataset, a popular benchmark in the record linkage literature.
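For readers unfamiliar with stochastic gradient Hamiltonian Monte Carlo (SGHMC), the sketch below shows the generic "practical" SGHMC update of Chen, Fox and Guestrin (2014), with the gradient-noise estimate set to zero. It is not the authors' implementation, and it does not use their network model. Instead it runs on a hypothetical toy target: the posterior of a Normal mean with known unit variance and a N(0, 10^2) prior, using minibatch gradients. The names `sghmc` and `grad_u_hat` and the tuning values `eta` and `alpha` are illustrative assumptions.

```python
import numpy as np


def sghmc(grad_u_hat, theta0, n_steps, eta=1e-5, alpha=0.1, rng=None):
    """Generic SGHMC sampler (practical form of Chen, Fox & Guestrin, 2014).

    grad_u_hat: callable (theta, rng) -> noisy minibatch gradient of the
                potential U(theta) = -log posterior (up to a constant).
    eta:        learning rate (plays the role of step size^2 / mass).
    alpha:      friction; injected noise has variance 2 * alpha * eta,
                i.e. the gradient-noise estimate beta_hat is taken as 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    v = np.zeros_like(theta)
    samples = np.empty((n_steps,) + theta.shape)
    noise_sd = np.sqrt(2.0 * alpha * eta)
    for t in range(n_steps):
        # Momentum update: stochastic gradient step, friction, injected noise.
        v = (v - eta * grad_u_hat(theta, rng) - alpha * v
             + noise_sd * rng.standard_normal(theta.shape))
        # Position update.
        theta = theta + v
        samples[t] = theta
    return samples


# Hypothetical toy target (not the paper's model): Normal mean mu with
# known unit variance and a N(0, 10^2) prior.
rng = np.random.default_rng(0)
N, batch = 1000, 100
data = rng.normal(2.0, 1.0, size=N)


def grad_u_hat(mu, rng):
    idx = rng.choice(N, size=batch, replace=False)
    # Prior gradient plus minibatch likelihood gradient rescaled by N / batch.
    return mu / 100.0 + (N / batch) * np.sum(mu - data[idx])


draws = sghmc(grad_u_hat, theta0=np.zeros(1), n_steps=6000, rng=rng)
posterior_draws = draws[1000:, 0]  # discard burn-in
print(posterior_draws.mean(), data.mean())
```

Each step needs only a minibatch gradient rather than a full pass over the data, which is where the speed-up comes from. Setting the gradient-noise estimate to zero keeps the sketch simple. The cost is that minibatch noise inflates the spread of the draws somewhat compared with the exact posterior.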