Penalized mediation models for multivariate data.
Daniel J Schaid, Ozan Dikilitas, Jason P Sinnwell, Iftikhar J Kullo
Published in: Genetic epidemiology (2021)
Statistical methods to integrate multiple layers of data, from exposures to intermediate traits to outcome variables, are needed to guide interpretation of complex data sets in which variables are likely to contribute to a causal pathway from exposure to outcome. Statistical mediation analysis based on structural equation models provides a general modeling framework, yet it can be difficult to apply to high-dimensional data and it is not automated to select the best-fitting model. To overcome these limitations, we developed novel algorithms and software to simultaneously evaluate multiple exposure variables, multiple intermediate traits, and multiple outcome variables. Our penalized mediation models are computationally efficient, and simulations demonstrate that they produce reliable results for large data sets. Application of our methods to a study of vascular disease demonstrates their utility in identifying novel direct effects of single-nucleotide polymorphisms (SNPs) on coronary heart disease and peripheral artery disease, while disentangling the effects of SNPs on intermediate risk factors including lipids, cigarette smoking, systolic blood pressure, and type 2 diabetes.
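As a rough illustration of the kind of model the abstract describes, a generic linear mediation structure with a sparsity penalty can be written as below. This is a sketch under stated assumptions, not the paper's formulation: the symbols X (exposures), M (intermediate traits), Y (outcomes), the coefficient matrices A, B, D, the residual terms, the loss function, and the lasso-type (L1) penalty with tuning parameter lambda are all chosen here for illustration, and the published method may use a different parameterization or penalty.

```latex
% Illustrative notation only (assumed, not taken from the paper):
% X in R^p: exposures (e.g. SNPs), M in R^q: intermediate traits, Y in R^r: outcomes.
\begin{align}
  M &= A X + \varepsilon_M, \\
  Y &= B M + D X + \varepsilon_Y .
\end{align}
% Under this linear structure, D holds the direct effects of X on Y,
% B A holds the indirect (mediated) effects, and D + B A the total effects.
% A penalized fit (here with an assumed L1 penalty) shrinks small
% coefficients to zero, which is one way to automate model selection:
\begin{equation}
  (\hat{A}, \hat{B}, \hat{D})
  = \arg\min_{A, B, D} \;
    \mathcal{L}(A, B, D)
    + \lambda \left( \lVert A \rVert_1 + \lVert B \rVert_1 + \lVert D \rVert_1 \right).
\end{equation}
```

In this sketch, a nonzero entry of D would correspond to a direct effect of an exposure on an outcome, and a nonzero entry of B A to an effect carried through an intermediate trait.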
Keyphrases
- blood pressure
- electronic health record
- type 2 diabetes
- big data
- risk factors
- machine learning
- genome wide
- data analysis
- peripheral artery disease
- social support
- heart failure
- deep learning
- gene expression
- high throughput
- molecular dynamics
- artificial intelligence
- heart rate
- insulin resistance
- metabolic syndrome
- dna methylation
- skeletal muscle
- fatty acid
- hypertensive patients
- weight loss