Exploring the Big Data Paradox for various estimands using vaccination data from the global COVID-19 Trends and Impact Survey (CTIS).

Youqi Yang, Walter Dempsey, Peisong Han, Yashwant Deshmukh, Sylvia Richardson, Brian Tom, Bhramar Mukherjee
Published in: Science Advances (2024)
Selection bias poses a substantial challenge to valid statistical inference in nonprobability samples. This study compared estimates of first-dose COVID-19 vaccination rates among Indian adults in 2021 from a large nonprobability sample, the COVID-19 Trends and Impact Survey (CTIS), and a small probability survey, the Center for Voting Options and Trends in Election Research (CVoter), against national benchmark data from the COVID Vaccine Intelligence Network. Notably, CTIS exhibited a larger estimation error on average (0.37) than CVoter (0.14). Additionally, we explored the accuracy (in terms of mean squared error) of CTIS in estimating successive differences (over time) and subgroup differences (females versus males) in mean vaccine uptake. Relative to estimating overall vaccination rates, targeting these alternative estimands, which compare differences or relative differences between two means, increased the effective sample size. These results suggest that the Big Data Paradox can manifest in countries beyond the United States and may not apply equally to every estimand of interest.
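The notion of effective sample size used above can be illustrated with a minimal sketch. The abstract does not state a formula, so the code below assumes the formulation commonly associated with the Big Data Paradox (Meng, 2018): the estimation error is the product of a data defect correlation, the factor sqrt((1 - f) / f) with f = n / N, and the population standard deviation, and the effective sample size is approximately (f / (1 - f)) / ddc². The function names and all numbers are hypothetical and are not taken from the study.

```python
import math


def data_defect_correlation(estimate, truth, n, N):
    """Back out the data defect correlation for a binary outcome.

    Assumes the identity: error = ddc * sqrt((1 - f) / f) * sigma,
    where f = n / N and sigma = sqrt(truth * (1 - truth)).
    """
    f = n / N
    sigma = math.sqrt(truth * (1.0 - truth))
    return (estimate - truth) / (math.sqrt((1.0 - f) / f) * sigma)


def effective_sample_size(ddc, n, N):
    """Approximate size of a simple random sample with the same
    mean squared error as the biased sample of size n."""
    if ddc == 0:
        return math.inf  # no selection-induced error in this realization
    f = n / N
    return (f / (1.0 - f)) / ddc ** 2


# Hypothetical inputs (NOT from the paper): a population of 1,000,000,
# a survey of 100,000 respondents, a true rate of 0.50, an estimate of 0.60.
N, n = 1_000_000, 100_000
ddc = data_defect_correlation(estimate=0.60, truth=0.50, n=n, N=N)
n_eff = effective_sample_size(ddc, n=n, N=N)
print(f"ddc = {ddc:.4f}, effective sample size = {n_eff:.1f}")
```

Under these made-up inputs, a ten-point error leaves the 100,000 responses worth roughly as much as a simple random sample of about 25 people, which is the sense in which a large biased sample can be less informative than a small probability survey. An estimand whose error partly cancels, such as a difference between two means, would yield a smaller error and hence a larger effective sample size under the same formula.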
Keyphrases
  • big data
  • coronavirus disease
  • sars cov
  • artificial intelligence
  • machine learning
  • cross sectional
  • respiratory syndrome coronavirus
  • clinical trial
  • quality improvement
  • randomized controlled trial
  • open label