Nonparametric Mass Imputation for Data Integration

Sixia Chen, Shu Yang, Jae Kwang Kim
Published in: Journal of Survey Statistics and Methodology (2020)
Data integration, which combines a probability sample with a nonprobability sample, is an emerging area of research in survey sampling. We consider the case in which the study variable of interest is measured only in the nonprobability sample, but comparable auxiliary information is available in both data sources. We consider mass imputation for the probability sample, using the nonprobability data as the training set for imputation. Parametric mass imputation is sensitive to the assumed parametric model. To develop improved and robust methods, we consider nonparametric mass imputation for data integration. In particular, we consider kernel smoothing for a low-dimensional covariate and generalized additive models for a relatively high-dimensional covariate. Asymptotic theory and variance estimation are developed. Simulation studies and real applications show the benefits of the proposed methods over their parametric counterparts.
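A minimal sketch of the mass-imputation idea described above, not the authors' implementation: fit a regression m_hat(x) of y on x using the nonprobability sample B, impute m_hat(x_i) for every unit i of the probability sample A, and estimate the population mean as mu_hat = sum_{i in A} d_i m_hat(x_i) / sum_{i in A} d_i, where d_i are A's design weights. The code assumes a scalar covariate, a Gaussian-kernel Nadaraya-Watson smoother as the kernel-smoothing step, and a Hajek-type normalization by the summed weights. All function names, the bandwidth, and the simulated data are hypothetical; the GAM variant for higher-dimensional covariates and the variance estimator are not shown.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted local average of ys at the point x0 (Nadaraya-Watson)."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed to zero; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def mass_imputation_mean(x_a, d_a, x_b, y_b, bandwidth):
    """Mass-imputation estimate of the population mean of y (hypothetical helper).

    x_a, d_a : covariate values and design weights for the probability sample A
               (y is not observed in A).
    x_b, y_b : covariate and study-variable values from the nonprobability
               sample B, used as the training set for imputation.
    """
    # Impute y for every unit of A by kernel smoothing over B.
    imputed = [nadaraya_watson(x, x_b, y_b, bandwidth) for x in x_a]
    # Hajek-type weighted mean of the imputed values, using A's design weights.
    return sum(d * m for d, m in zip(d_a, imputed)) / sum(d_a)


def simulate_demo(seed=2020, n_a=300, n_b=1000, bandwidth=0.05):
    """Toy simulation (assumed setup, not from the paper).

    x ~ Uniform(0, 1), y = 1 + x**2 + N(0, 0.1**2), so the true mean of y is 4/3.
    Selection into B depends only on x, which is what lets imputation given x
    remove the selection bias.
    """
    rng = random.Random(seed)
    x_b, y_b = [], []
    while len(x_b) < n_b:
        x = rng.random()
        if rng.random() < x:  # units with larger x are more likely to enter B
            x_b.append(x)
            y_b.append(1.0 + x * x + rng.gauss(0.0, 0.1))
    # A: simple random sample, so all design weights are equal (1.0 here).
    x_a = [rng.random() for _ in range(n_a)]
    d_a = [1.0] * n_a
    naive = sum(y_b) / n_b  # ignores the selection mechanism of B
    mi = mass_imputation_mean(x_a, d_a, x_b, y_b, bandwidth)
    return naive, mi


if __name__ == "__main__":
    naive, mi = simulate_demo()
    print(f"naive mean of B: {naive:.3f}, mass imputation: {mi:.3f}, truth: {4 / 3:.3f}")
```

In this toy setting B over-represents large x, so the naive mean of B has expected value 1.5 rather than the true 4/3, while the mass-imputation estimate should land near 4/3. The correction works only because selection into B depends on x alone; the bandwidth is fixed by hand here rather than chosen by any data-driven rule.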