Genome-wide Analysis of Large-scale Longitudinal Outcomes using Penalization - GALLOP algorithm.
Karolina Sikorska, Emmanuel Lesaffre, Patrick J F Groenen, Fernando Rivadeneira, Paul H C Eilers
Published in: Scientific Reports (2018)
Genome-wide association studies (GWAS) with longitudinal phenotypes provide opportunities to identify genetic variations associated with changes in human traits over time. Mixed models are used to correct for the correlated nature of longitudinal data. GWA studies are notorious for their computational challenges, which are considerable when mixed models for thousands of individuals are fitted to millions of SNPs. We present a new algorithm that speeds up a genome-wide analysis of longitudinal data by several orders of magnitude. It solves the equivalent penalized least squares problem efficiently, computing variances in an initial step. Factorizations and transformations are used to avoid inversion of large matrices. Because the system of equations is bordered, we can re-use components, which can be precomputed for the mixed model without a SNP. Two SNP effects (main and its interaction with time) are obtained. Our method completes the analysis a thousand times faster than the R package lme4, providing an almost identical solution for the coefficients and p-values. We provide an R implementation of our algorithm.
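The re-use of precomputed components in a bordered system can be illustrated with a small sketch. This is not the authors' R implementation; it is a minimal pure-Python illustration that assumes the no-SNP model gives a symmetric positive definite system matrix A with right-hand side f, and that each SNP adds a border of two columns B (main effect and interaction with time), a 2 x 2 block C, and a right-hand side g. The function names `cholesky`, `chol_solve`, and `solve_bordered` are hypothetical. A is factorized once; each SNP then needs only triangular solves and a 2 x 2 solve via the Schur complement.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def solve_bordered(L, u, B, C, g):
    """Solve [[A, B], [B^T, C]] [x; s] = [f; g] re-using A = L L^T.

    u = A^{-1} f is precomputed once for the model without a SNP.
    B is given as two columns (lists of length n), C is 2 x 2, g has length 2.
    Returns (x, s), where s holds the two border (SNP) coefficients.
    """
    W = [chol_solve(L, col) for col in B]  # columns of A^{-1} B
    # Schur complement S = C - B^T A^{-1} B and reduced right-hand side r
    S = [[C[i][j] - dot(B[i], W[j]) for j in range(2)] for i in range(2)]
    r = [g[i] - dot(B[i], u) for i in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    s = [(S[1][1] * r[0] - S[0][1] * r[1]) / det,
         (S[0][0] * r[1] - S[1][0] * r[0]) / det]
    # Back-substitute: x = u - A^{-1} B s
    x = [u[k] - W[0][k] * s[0] - W[1][k] * s[1] for k in range(len(u))]
    return x, s

# Toy numbers (illustrative only, not real data)
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
f = [1.0, 2.0, 3.0]
L = cholesky(A)        # done once, shared by all SNPs
u = chol_solve(L, f)   # done once, shared by all SNPs
B = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # per-SNP border columns
C = [[5.0, 1.0], [1.0, 4.0]]
g = [1.0, 2.0]
x, s = solve_bordered(L, u, B, C, g)
```

In this sketch the per-SNP work still includes two solves against the factor of A; the abstract indicates that the actual algorithm uses further factorizations and transformations to keep that step cheap, which the sketch does not attempt to reproduce.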