A Bayesian approach for de-duplication in the presence of relational data.
Juan SosaAbel RodríguezPublished in: Journal of applied statistics (2022)
In this paper, we study the impact of combining profile and network data in solving record de-duplication problems. We also assess the influence of a range of prior distributions on the linkage structure, and explore the use of stochastic gradient Hamiltonian Monte Carlo methods as a faster alternative to obtain samples from the posterior distribution for network parameters. Our methodology is evaluated using the RLdata500 data, which is a popular dataset in the record linkage literature.