This paper presents methods for estimating the number of persons with HIV in North Carolina jails by applying finite population inference to data collected through web scraping and record linkage. Administrative data are linked with web-scraped rosters of incarcerated persons in a nonrandom subset of counties. Outcome regression and calibration weighting are adapted for state-level estimation. The methods are compared in simulations and applied to data from the US state of North Carolina. Outcome regression yielded more precise inference and allowed county-level estimates, an important study objective, while calibration weighting was doubly robust, performing well when either the outcome model or the weight model was misspecified.
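The two estimators named above can be illustrated for a population total with a small sketch. This is not the paper's implementation. The auxiliary variable (a hypothetical jail population size per county), the toy data, and the base weights (inverses of hypothetical inclusion probabilities standing in for a fitted weight model) are all assumptions. The calibration shown is the linear (chi-square distance) variant. Outcome regression adds model predictions for unsampled counties to the observed sample counts. Calibration instead reweights the sampled counties so that their weighted auxiliary totals match the known population totals.

```python
import numpy as np

def outcome_regression_total(X, y, sampled):
    """Prediction estimator: observed y for sampled units plus OLS predictions
    for unsampled units. X holds auxiliary data (with intercept) for all N units."""
    Xs, ys = X[sampled], y[sampled]
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return ys.sum() + (X[~sampled] @ beta).sum()

def calibration_weights(X, sampled, base_weights):
    """Linear calibration: w_i = d_i * (1 + x_i' lam), with lam chosen so that
    the weighted sample totals of X equal the known population totals."""
    Xs, d = X[sampled], base_weights
    target = X.sum(axis=0)                   # population totals t_x
    M = (Xs * d[:, None]).T @ Xs             # sum_i d_i x_i x_i'
    lam = np.linalg.solve(M, target - d @ Xs)
    return d * (1.0 + Xs @ lam)

# Hypothetical toy data: 8 counties, auxiliary = jail population size.
sizes = np.array([120.0, 45.0, 300.0, 80.0, 60.0, 210.0, 35.0, 150.0])
X = np.column_stack([np.ones_like(sizes), sizes])
true_y = 1.0 + 0.02 * sizes                  # assumed counts, linear in size
sampled = np.array([True, False, True, True, False, True, False, False])
y_obs = np.where(sampled, true_y, np.nan)    # y unobserved outside the sample

# Hypothetical inclusion probabilities for the 4 sampled counties.
d = 1.0 / np.array([0.6, 0.7, 0.5, 0.4])

t_or = outcome_regression_total(X, y_obs, sampled)
w = calibration_weights(X, sampled, d)
t_cal = w @ y_obs[sampled]
print("outcome regression total:", round(t_or, 6))
print("calibration total:", round(t_cal, 6))
```

Because the toy counts are exactly linear in the auxiliary variable, both estimators recover the true total here. With real data, the estimators differ in how they depend on the outcome model versus the weight model.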