Mediation analysis of multiple mediators with incomplete omics data.
John Kidd, Chelsea K. Raulerson, Karen L. Mohlke, Dan-Yu Lin
Published in: Genetic Epidemiology (2022)
There is increasing interest in using multiple types of omics features (e.g., DNA sequences, RNA expression, methylation, protein expression, and metabolic profiles) to study how the relationships between genotypes and phenotypes may be mediated by other omics markers. Genotypes and phenotypes are typically available for all subjects in genetic studies, but some omics data are usually missing for some subjects because of limitations such as cost and sample quality. In this article, we propose a powerful approach for mediation analysis that accommodates missing data among multiple mediators and allows for various interaction effects. We formulate the relationships among genetic variants, other omics measurements, and phenotypes through linear regression models. We derive the joint likelihood for models with two mediators, accounting for arbitrary patterns of missing values. We perform maximum likelihood estimation using computationally efficient and stable algorithms. Our methods produce unbiased and statistically efficient estimators. We demonstrate the usefulness of our methods through simulation studies and an application to the Metabolic Syndrome in Men study.
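The likelihood construction described above can be illustrated with a minimal sketch. It assumes a simplified two-mediator model without interaction terms. Each mediator depends linearly on a single genotype G (coded 0/1/2). The phenotype Y depends linearly on G, M1, and M2. Errors are normal, and mediators are missing at random. Under these assumptions, (M1, M2, Y) given G is trivariate normal, so each subject contributes the normal density of whichever components are observed. The function names (`unpack`, `joint_moments`, `neg_loglik`), the simulated effect sizes, and the use of SciPy's general-purpose L-BFGS-B optimizer are hypothetical illustrative choices. They are not the authors' implementation or algorithms.

```python
import numpy as np
from scipy.optimize import minimize


def unpack(theta):
    """Map an unconstrained parameter vector to model quantities (hypothetical layout)."""
    a10, a1, a20, a2, b0, c, b1, b2, l11, l21, l22, log_sy = theta
    # Cholesky factor with exp() diagonal keeps the mediator covariance positive definite.
    L = np.array([[np.exp(l11), 0.0], [l21, np.exp(l22)]])
    sigma_m = L @ L.T
    return a10, a1, a20, a2, b0, c, np.array([b1, b2]), sigma_m, np.exp(log_sy)


def joint_moments(theta, g):
    """Mean (n x 3) and covariance (3 x 3) of (M1, M2, Y) given genotype g."""
    a10, a1, a20, a2, b0, c, b, sigma_m, sy = unpack(theta)
    mu_m = np.column_stack([a10 + a1 * g, a20 + a2 * g])
    mu_y = b0 + c * g + mu_m @ b
    mean = np.column_stack([mu_m, mu_y])
    cov = np.empty((3, 3))
    cov[:2, :2] = sigma_m
    cov[:2, 2] = cov[2, :2] = sigma_m @ b          # Cov(M, Y) = Sigma_M b
    cov[2, 2] = b @ sigma_m @ b + sy ** 2           # Var(Y) given G
    return mean, cov


def neg_loglik(theta, g, data):
    """Observed-data negative log-likelihood; NaN marks a missing mediator."""
    mean, cov = joint_moments(theta, g)
    observed = ~np.isnan(data)
    total = 0.0
    # Group subjects by missingness pattern; each pattern uses the marginal
    # normal density of its observed coordinates (valid under missing at random).
    for pattern in np.unique(observed.astype(int), axis=0):
        rows = np.all(observed == pattern, axis=1)
        idx = np.flatnonzero(pattern)
        resid = data[rows][:, idx] - mean[rows][:, idx]
        s = cov[np.ix_(idx, idx)]
        _, logdet = np.linalg.slogdet(s)
        quad = np.sum(resid * np.linalg.solve(s, resid.T).T, axis=1)
        total += -0.5 * np.sum(quad + logdet + len(idx) * np.log(2 * np.pi))
    return -total


# Simulated data with hypothetical effect sizes.
rng = np.random.default_rng(2022)
n = 2000
g = rng.binomial(2, 0.3, size=n).astype(float)
true_sigma_m = np.array([[1.0, 0.3], [0.3, 1.0]])
e_m = rng.multivariate_normal([0.0, 0.0], true_sigma_m, size=n)
m1 = 0.1 + 0.5 * g + e_m[:, 0]
m2 = -0.2 + 0.3 * g + e_m[:, 1]
y = 0.2 * g + 0.4 * m1 + 0.6 * m2 + rng.normal(0.0, 1.0, size=n)
data = np.column_stack([m1, m2, y])
data[rng.random(n) < 0.3, 0] = np.nan  # M1 missing completely at random
data[rng.random(n) < 0.4, 1] = np.nan  # M2 missing completely at random

fit = minimize(neg_loglik, np.zeros(12), args=(g, data), method="L-BFGS-B")
a10, a1, a20, a2, b0, c, b, sigma_m, sy = unpack(fit.x)
print(f"indirect via M1: {a1 * b[0]:.3f} (true 0.20)")
print(f"indirect via M2: {a2 * b[1]:.3f} (true 0.18)")
print(f"direct effect:   {c:.3f} (true 0.20)")
```

Grouping subjects by missingness pattern means each distinct pattern needs only one covariance submatrix, and no subject is discarded for having an incomplete mediator profile. Under this no-interaction sketch, the products a1*b1 and a2*b2 are the indirect effects through each mediator, and c is the direct effect.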