Harnessing the predicted maize pan-interactome for putative gene function prediction and prioritization of candidate genes for important traits.
Elly Poretsky, Halise Busra Cagirici, Carson M Andorf, Taner Z Sen
Published in: G3 (Bethesda, Md.) (2024)
The recent assembly and annotation of the 26 maize nested association mapping population founder inbreds have enabled large-scale pan-genomic comparative studies. These studies have expanded our understanding of agronomically important traits by integrating pan-transcriptomic data with trait-specific gene candidates from previous association mapping results. In contrast to the availability of pan-transcriptomic data, obtaining reliable protein-protein interaction (PPI) data has remained a challenge due to its high cost and complexity. We generated predicted PPI networks for each of the 26 genomes using the established STRING database. The individual genome-interactomes were then integrated to generate core- and pan-interactomes. We deployed the PPI clustering algorithm ClusterONE to identify numerous PPI clusters that were functionally annotated using gene ontology (GO) functional enrichment, demonstrating a diverse range of enriched GO terms across different clusters. Additional cluster annotations were generated by integrating gene coexpression data and gene description annotations, providing additional useful information. We show that the functionally annotated PPI clusters establish a useful framework for protein function prediction and prioritization of candidate genes of interest. Our study not only provides a comprehensive resource of predicted PPI networks for 26 maize genomes but also offers annotated interactome clusters for predicting protein functions and prioritizing gene candidates. The source code for the Python implementation of the analysis workflow and a standalone web application for accessing the analysis results are available at https://github.com/eporetsky/PanPPI.
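The integration step described above can be illustrated with a minimal sketch. It is not taken from the PanPPI repository. It assumes that each genome's predicted interactions are already expressed with shared pan-gene identifiers, that a core interaction is one present in every genome, and that a pan interaction is one present in at least one genome. The function name `integrate_interactomes`, the genome labels, and the `pg*` identifiers are hypothetical.

```python
from collections import Counter
from itertools import chain


def normalize_edge(a, b):
    """Return an undirected edge as a sorted tuple so (a, b) equals (b, a)."""
    return tuple(sorted((a, b)))


def integrate_interactomes(networks):
    """Combine per-genome PPI edge lists into core and pan edge sets.

    networks: dict mapping a genome label to an iterable of (protein_a,
    protein_b) pairs, keyed by shared pan-gene IDs (an assumption here).
    Returns (core, pan, counts), where counts[edge] is the number of
    genomes in which that edge was predicted.
    """
    # De-duplicate edges within each genome and drop self-interactions.
    edge_sets = {
        genome: {normalize_edge(a, b) for a, b in edges if a != b}
        for genome, edges in networks.items()
    }
    counts = Counter(chain.from_iterable(edge_sets.values()))
    n_genomes = len(edge_sets)
    pan = set(counts)  # predicted in at least one genome
    core = {edge for edge, c in counts.items() if c == n_genomes}  # in all genomes
    return core, pan, counts


# Toy input with three hypothetical genomes; not real maize data.
toy_networks = {
    "genome_A": [("pg1", "pg2"), ("pg2", "pg3")],
    "genome_B": [("pg2", "pg1"), ("pg3", "pg4")],
    "genome_C": [("pg1", "pg2"), ("pg2", "pg3")],
}
core, pan, counts = integrate_interactomes(toy_networks)
print(sorted(core))
print(sorted(pan))
```

The per-edge genome counts are kept alongside the two sets because they allow intermediate categories (for example, interactions found in most but not all genomes) to be defined later without recomputing anything.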
Keyphrases
- computed tomography
- protein protein
- genome wide
- small molecule
- copy number
- magnetic resonance imaging
- contrast enhanced
- electronic health record
- genome wide identification
- big data
- dna methylation
- healthcare
- gene expression
- machine learning
- single cell
- high resolution
- rna seq
- emergency department
- magnetic resonance
- mass spectrometry
- high density