Integrating Coexpression Networks with GWAS to Prioritize Causal Genes in Maize.
Robert J Schaefer, Jean-Michel Michno, Joseph Jeffers, Owen A Hoekenga, Brian P Dilkes, Ivan R Baxter, Chad L Myers
Published in: The Plant Cell (2018)
Genome-wide association studies (GWAS) have identified loci linked to hundreds of traits in many different species. Yet, because linkage disequilibrium implicates a broad region surrounding each identified locus, the causal genes often remain unknown. This problem is especially pronounced in nonhuman, nonmodel species, where functional annotations are sparse and there is frequently little information available for prioritizing candidate genes. We developed a computational approach, Camoco, that integrates loci identified by GWAS with functional information derived from gene coexpression networks. Using Camoco, we prioritized candidate genes from a large-scale GWAS examining the accumulation of 17 different elements in maize (Zea mays) seeds. Strikingly, the performance of our approach depended strongly on the type of coexpression network used: networks built from expression variation across genetically diverse individuals in a relevant tissue context (in our case, roots, which are the primary elemental uptake and delivery system) outperformed the alternative networks. Two candidate genes identified by our approach were validated using mutants. Our study demonstrates that coexpression networks provide a powerful basis for prioritizing candidate causal genes from GWAS loci, but it also suggests that the success of such strategies can depend heavily on the context of the gene expression data. Both the software and the lessons on integrating GWAS data with coexpression networks generalize to species beyond maize.
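The general idea of overlaying GWAS loci on a coexpression network can be illustrated with a small sketch. This is not Camoco's code or API: the function names, the fixed base-pair window, the use of mean pairwise Pearson correlation as the score, and the random-gene-set null model are all assumptions chosen for illustration, and the data are invented toy values. The sketch maps SNPs to nearby candidate genes, scores how strongly those candidates are coexpressed, and compares that score with randomly drawn gene sets of the same size.

```python
import random
from itertools import combinations
from statistics import mean


def candidate_genes(snps, genes, window):
    """Genes overlapping a +/- `window` bp interval around any SNP (hypothetical helper)."""
    hits = set()
    for chrom, pos in snps:
        for name, (g_chrom, g_start, g_end) in genes.items():
            if g_chrom == chrom and g_start <= pos + window and g_end >= pos - window:
                hits.add(name)
    return sorted(hits)


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def subnetwork_density(gene_set, expr):
    """Mean pairwise correlation among the genes (an assumed stand-in score)."""
    return mean(pearson(expr[a], expr[b]) for a, b in combinations(gene_set, 2))


def permutation_p(gene_set, expr, n_perm=1000, seed=0):
    """Empirical p-value against random gene sets of equal size (simplified null)."""
    rng = random.Random(seed)
    observed = subnetwork_density(gene_set, expr)
    all_genes = sorted(expr)
    null = [
        subnetwork_density(rng.sample(all_genes, len(gene_set)), expr)
        for _ in range(n_perm)
    ]
    # Add-one correction keeps the estimate strictly above zero.
    p = (1 + sum(d >= observed for d in null)) / (n_perm + 1)
    return observed, p


# Toy inputs (invented): gene coordinates, GWAS hits, expression across 6 samples.
genes = {
    "geneA": ("1", 1000, 2000),
    "geneB": ("1", 50000, 51000),
    "geneC": ("2", 3000, 4000),
    "geneD": ("2", 900000, 901000),
    "geneE": ("3", 100, 900),
}
snps = [("1", 1500), ("1", 49000), ("2", 5000)]
expr = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [2, 4, 6, 8, 10, 12],
    "geneC": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],
    "geneD": [5, 1, 4, 2, 6, 3],
    "geneE": [3, 6, 1, 5, 2, 4],
}

candidates = candidate_genes(snps, genes, window=2000)
density, p_value = permutation_p(candidates, expr)
print(candidates, round(density, 3), round(p_value, 3))
```

With only five toy genes the null distribution is far too coarse to be meaningful; a realistic analysis would draw on a genome-scale network and, as the abstract stresses, on expression data from a tissue relevant to the trait.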
Keyphrases
- genome wide
- genome wide association study
- network analysis
- dna methylation
- genome wide association
- gene expression
- copy number
- genome wide identification
- electronic health record
- big data
- genome wide analysis
- bioinformatics analysis
- poor prognosis
- health information
- genetic diversity
- data analysis
- healthcare
- hepatitis c virus
- binding protein
- neural network
- human immunodeficiency virus
- transcription factor
- artificial intelligence