Estimates of daily air pollution concentrations with complete spatial and temporal coverage are important for supporting epidemiologic studies and health impact assessments. While numerous approaches have been developed for modeling air pollution, they typically consider each pollutant separately. We describe a spatial multipollutant data fusion model that combines monitoring measurements with chemical transport model simulations and leverages dependence between pollutants to improve spatial prediction. For the contiguous United States, we created a data product of daily concentrations for 12 pollutants (CO, NOx, NO2, SO2, O3, PM10, and the PM2.5 species EC, OC, NO3, NH4, and SO4) for the period 2005 to 2014. Out-of-sample prediction showed good performance, particularly for the daily PM2.5 species EC (R² = 0.64), OC (R² = 0.75), NH4 (R² = 0.84), NO3 (R² = 0.73), and SO4 (R² = 0.80). Because we use the integrated nested Laplace approximation (INLA) for Bayesian inference, our approach also provides model-based estimates of prediction error. The daily data product at 12 km spatial resolution will be made publicly available immediately upon publication. To our knowledge, this is the first publicly available data product for major PM2.5 species and several gases at this spatial and temporal resolution.
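The out-of-sample R² values above come from predicting concentrations at monitors that were withheld during fitting. The Python sketch below illustrates only that evaluation idea, and it uses a much simpler fusion than the one described here. It assumes a single pollutant, linearly calibrates chemical transport model (CTM) output against monitor measurements, and scores predictions at held-out monitors with R² defined as 1 − SSE/SST. The function names, the linear calibration, and this R² definition are all assumptions made for illustration. The model described above is spatial and multipollutant and is fitted with INLA, and the abstract does not state which R² definition was used.

```python
from statistics import fmean


def fit_linear_calibration(ctm, obs):
    """Hypothetical helper: ordinary least squares fit of obs = a + b * ctm."""
    mx, my = fmean(ctm), fmean(obs)
    sxx = sum((x - mx) ** 2 for x in ctm)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ctm, obs))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def out_of_sample_r2(observed, predicted):
    """Assumed definition: R^2 = 1 - SSE/SST on held-out observations."""
    mean_obs = fmean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def holdout_evaluate(ctm, obs, test_idx):
    """Fit the calibration on the remaining monitors, then score the held-out ones."""
    held_out = set(test_idx)
    train = [i for i in range(len(obs)) if i not in held_out]
    a, b = fit_linear_calibration([ctm[i] for i in train], [obs[i] for i in train])
    preds = [a + b * ctm[i] for i in test_idx]
    return out_of_sample_r2([obs[i] for i in test_idx], preds)


# Usage with made-up numbers: CTM values and monitor readings at six sites,
# with sites 1 and 4 withheld for evaluation.
ctm_at_sites = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
monitor_obs = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
print(holdout_evaluate(ctm_at_sites, monitor_obs, test_idx=[1, 4]))
```

A real spatial fusion would also model residual spatial correlation between nearby sites, and a multipollutant version would share information across pollutants. This sketch does neither.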