The 5% Medicare Standard Analytic Files (SAFs) are random samples used to analyze national trends in medical treatments, expenditures, and outcomes, but their utility for small-area or multilevel analyses is unknown. This study aims to demonstrate possible limitations of the 5% SAF for analyzing health behaviors in small areas. We use descriptive Chi-square goodness-of-fit tests and mapping to examine how consistently the 5% file represents the 100% population across states and counties. We then fit multilevel models of individual use of mammography or endoscopy services for cancer screening and contrast the findings between the 5% and 100% files. Subjects are aged 65-104, enrolled in both Medicare Part A and Part B, alive, and residing in the same state, with no gaps in coverage during the study period. Identically defined groups are drawn from the 5% SAF and from the 100% population claims and denominator files. Chi-square tests comparing homogeneous population subgroups in the 5% and 100% files show significant differences in 7 of 8 states. County-level maps confirm these differences: one state is generally under-represented by the 5% SAF, while the others contain areas of variable representation. Multilevel modeling results are largely consistent across the partitions of the data, but the 5% sample models have much lower statistical power. Area-level covariate effect estimates differ somewhat between the two datasets, and provider supply and market characteristics show inconsistent results. Multilevel modeling with contextual variables may therefore be misleading in small-area analyses conducted with the 5% Medicare SAFs, and disparities research may benefit from the 100% files, which provide the statistical power needed to detect meaningful differences. This matters because the Centers for Medicare and Medicaid Services have recently curtailed permission to use the 100% files, which are one of the few sources of population data in the U.S. that are representative of small areas. In times of constrained budgets, population data files are essential so that resources can be targeted to areas robustly identified as having the greatest need or the largest gaps in outcomes.
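A minimal sketch of the kind of Chi-square goodness-of-fit comparison described above, in Python using only the standard library. It assumes the subgroup cells (for example, age-sex groups within one state or county) have already been tabulated from both files; the cell definitions, the function names `chi2_sf` and `goodness_of_fit`, and all counts are hypothetical and are not taken from the study. Expected sample counts are the 100% file's subgroup shares scaled to the sample total, so the test asks whether the 5% file reproduces the population's composition. The p-value uses the closed-form Chi-square upper tail, which is exact for integer degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability P(X >= stat) for a Chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    x = stat / 2.0
    if df % 2 == 0:
        # Even df: Q = exp(-x) * sum_{k=0}^{df/2 - 1} x^k / k!
        term = math.exp(-x)
        total = term
        for k in range(1, df // 2):
            term *= x / k
            total += term
        return total
    # Odd df: Q = erfc(sqrt(x)) + exp(-x) * sum_{k=1}^{(df-1)/2} x^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(x))
    term = math.exp(-x) * 2.0 * math.sqrt(x / math.pi)  # k = 1 term; Gamma(3/2) = sqrt(pi)/2
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (k + 0.5)  # advance to the k + 1 term
    return total


def goodness_of_fit(sample_counts: Sequence[int],
                    population_counts: Sequence[int]) -> Tuple[float, int, float]:
    """Compare the subgroup composition of a sample file with the full population file.

    Returns (Chi-square statistic, degrees of freedom, p-value).
    """
    if len(sample_counts) != len(population_counts) or len(sample_counts) < 2:
        raise ValueError("need matching subgroup lists with at least two cells")
    n = sum(sample_counts)
    big_n = sum(population_counts)
    # Expected sample count per cell = sample size * population share of that cell.
    expected = [n * c / big_n for c in population_counts]
    stat = sum((o - e) ** 2 / e for o, e in zip(sample_counts, expected))
    df = len(sample_counts) - 1
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical subgroup counts for one state (NOT study data).
    population = [40000, 30000, 20000, 10000]  # 100% file
    exact_5pct = [2000, 1500, 1000, 500]       # sample that mirrors the population
    skewed_5pct = [2100, 1450, 950, 500]       # sample with an over-represented first cell
    for label, sample in [("exact", exact_5pct), ("skewed", skewed_5pct)]:
        stat, df, p = goodness_of_fit(sample, population)
        print(f"{label}: chi2={stat:.2f}, df={df}, p={p:.4f}")
```

With these made-up counts, the skewed sample gives a Chi-square of about 9.17 on 3 degrees of freedom and p below 0.05, while the mirrored sample gives a statistic of 0. Testing each cell against exactly 5% of its population count instead (with k rather than k - 1 degrees of freedom) would also check the overall sampling rate. In either variant the approximation is usually trusted only when expected counts are reasonably large (a common rule of thumb is at least 5 per cell), which is itself a constraint in small counties.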