Leveraging Multiple Layers of Data To Predict Drosophila Complex Traits.
Fabio Morgante, Wen Huang, Peter Sørensen, Christian Maltecca, Trudy F C Mackay. Published in: G3 (Bethesda, Md.) (2020)
The ability to accurately predict complex trait phenotypes from genetic and genomic data is critical for the implementation of personalized medicine and precision agriculture; however, prediction accuracy for most complex traits is currently low. Here, we used data on whole-genome sequences, deep RNA sequencing, and high-quality phenotypes for three quantitative traits in the ∼200 inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) to compare the prediction accuracies of gene expression and genotypes. We found that expression levels (r = 0.28 and 0.38 for females and males, respectively) provided higher prediction accuracy than genotypes (r = 0.07 and 0.15 for females and males, respectively) for starvation resistance, similar prediction accuracy for chill coma recovery (null for both models and both sexes), and lower prediction accuracy for startle response (r = 0.15 and 0.14 for female and male genotypes, respectively, and r = 0.12 and 0.11 for female and male transcripts, respectively). Models including both genotypes and expression levels did not outperform the best single-component model. However, accuracy increased considerably for all three traits when we included gene ontology (GO) category as an additional layer of information for both genomic variants and transcripts. We found strongly predictive GO terms for each of the three traits, some of which had a clear, plausible biological interpretation. For example, for starvation resistance in females, GO:0033500 (r = 0.39 for transcripts) and GO:0032870 (r = 0.40 for transcripts) have been implicated in carbohydrate homeostasis and cellular response to hormone stimulus (including the insulin receptor signaling pathway), respectively. In summary, this study shows that integrating different sources of information improved prediction accuracy and helped elucidate the genetic architecture of three Drosophila complex phenotypes.
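The abstract reports prediction accuracy as the correlation r between predicted and observed phenotypes, but it does not say which model produced the predictions. As a minimal sketch, assuming a GBLUP-style kernel (ridge) model and k-fold cross-validation (both assumptions, not necessarily the authors' pipeline), the code below builds a relationship matrix from a line-by-feature matrix (genotypes or transcript levels), predicts held-out lines, and reports the Pearson r. The helper names (`relationship_matrix`, `kernel_predict`, `cv_accuracy`) and the penalty `lam` are hypothetical.

```python
import numpy as np


def standardize(X):
    """Center and scale each feature column; constant columns keep scale 1 to avoid division by zero."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - mu) / sd


def relationship_matrix(X):
    """Kernel K = Z Z^T / p from standardized features (lines x features)."""
    Z = standardize(X)
    return Z @ Z.T / Z.shape[1]


def kernel_predict(K, y, train, test, lam):
    """Kernel ridge / BLUP-style prediction of held-out lines from training lines."""
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    # alpha = (K_train,train + lam * I)^-1 (y_train - mean)
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha


def cv_accuracy(K, y, lam=1.0, n_folds=5, seed=0):
    """Pearson r between pooled out-of-fold predictions and observed phenotypes."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    preds = np.empty(n, dtype=float)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        preds[test] = kernel_predict(K, y, train, test, lam)
    return np.corrcoef(preds, y)[0, 1]
```

To mimic the GO layer, one could restrict the columns to features annotated to a single term, e.g. `cv_accuracy(relationship_matrix(X[:, go_mask]), y)` with a hypothetical boolean `go_mask`; a naive way to combine genotypes and transcripts is to average their kernels, `0.5 * (K_geno + K_expr)`. This equal-weight average is a simplification: a multi-kernel mixed model would instead estimate a separate variance component for each layer. Pooling predictions across folds before computing r is also a choice; averaging per-fold correlations is a common alternative.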