Scalable Nonparametric Prescreening Method for Searching Higher-Order Genetic Interactions Underlying Quantitative Traits.
Juho A J Kontio, Mikko J Sillanpää
Published in: Genetics (2019)
Gaussian process (GP)-based automatic relevance determination (ARD) is known to be an efficient technique for identifying determinants of gene-by-gene interactions important to trait variation. However, the estimation of GP models is feasible only for low-dimensional datasets (∼200 variables), which severely limits the application of the GP-based ARD method to high-throughput sequencing data. In this paper, we provide a nonparametric prescreening method that preserves virtually all the major benefits of the GP-based ARD method and extends its scalability to the typical high-dimensional datasets used in practice. In several simulated test scenarios, the proposed method compared favorably with existing nonparametric dimension reduction/prescreening methods suitable for higher-order interaction searches. As a real-data example, the proposed method was applied to a high-throughput dataset downloaded from The Cancer Genome Atlas (TCGA) with measured expression levels of 16,976 genes (after preprocessing) from patients diagnosed with acute myeloid leukemia.
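For readers unfamiliar with ARD, the idea can be illustrated with a minimal NumPy sketch. This is not the prescreening method proposed in the paper. It only shows the generic ARD squared-exponential kernel, in which each input variable has its own length scale, and a variable with a very large length scale has almost no influence on the model. The function names (`ard_rbf_kernel`, `log_marginal_likelihood`), the toy data with a single pairwise interaction, and the fixed noise variance are all assumptions made for illustration.

```python
import numpy as np

def ard_rbf_kernel(X1, X2, length_scales, signal_var=1.0):
    """ARD squared-exponential kernel: one length scale per input variable."""
    A = X1 / length_scales
    B = X2 / length_scales
    # Pairwise squared Euclidean distances between the rescaled inputs.
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0))

def log_marginal_likelihood(X, y, length_scales, noise_var=0.1):
    """GP log marginal likelihood for fixed hyperparameters (zero mean assumed)."""
    n = X.shape[0]
    K = ard_rbf_kernel(X, X, length_scales) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2.0 * np.pi)

# Hypothetical toy data: the trait depends on the interaction of variables 0 and 1;
# variable 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=60)

# Compare two hyperparameter settings: all variables treated as relevant,
# versus variable 2 effectively switched off by a very large length scale.
ll_all = log_marginal_likelihood(X, y, np.array([1.0, 1.0, 1.0]))
ll_ard = log_marginal_likelihood(X, y, np.array([1.0, 1.0, 100.0]))
print("all relevant:", ll_all, "variable 2 switched off:", ll_ard)
```

In a full ARD fit the length scales are optimized rather than fixed by hand, and that optimization over one hyperparameter per variable is what becomes costly as the number of variables grows.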
Keyphrases
- genome wide
- acute myeloid leukemia
- high throughput
- healthcare
- copy number
- primary care
- dna methylation
- squamous cell carcinoma
- electronic health record
- machine learning
- poor prognosis
- newly diagnosed
- deep learning
- chronic kidney disease
- long non coding rna
- molecularly imprinted
- binding protein
- artificial intelligence
- patient reported outcomes
- squamous cell
- genome wide identification
- patient reported
- lymph node metastasis