MCRiceRepGP: a framework for the identification of genes associated with sexual reproduction in rice.
Agnieszka A Golicz, Prem L Bhalla, Mohan B Singh
Published in: The Plant Journal: for cell and molecular biology (2018)
Rice is an important cereal crop, being a staple food for over half of the world's population, and sexual reproduction resulting in grain formation underpins global food security. However, despite considerable research efforts, many of the genes, especially long intergenic non-coding RNA (lincRNA) genes, involved in sexual reproduction in rice remain uncharacterized. With an increasing number of public resources becoming available, information from different sources can be combined to perform gene functional annotation. We report the development of MCRiceRepGP, a machine learning framework which integrates heterogeneous evidence and employs multicriteria decision analysis and machine learning to predict coding and lincRNA genes involved in sexual reproduction in rice. The rice genome was reannotated using deep-sequencing transcriptomic data from reproduction-associated tissue/cell types identifying previously unannotated putative protein-coding genes and lincRNAs. MCRiceRepGP was used for genome-wide discovery of sexual reproduction associated coding and lincRNA genes. The protein-coding and lincRNA genes identified have distinct expression profiles, with a large proportion of lincRNAs reaching maximum expression levels in the sperm cells. Some of the genes are potentially linked to male- and female-specific fertility and heat stress tolerance during the reproductive stage. MCRiceRepGP can be used in combination with other genome-wide studies, such as genome-wide association studies, giving greater confidence that the genes identified are associated with the biological process of interest. As more data, especially about mutant plant phenotypes, become available, the power of MCRiceRepGP will grow, providing researchers with a tool to identify candidate genes for future experiments. MCRiceRepGP is available as a web application (http://mcgplannotator.com/MCRiceRepGP/).
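The abstract does not describe how the evidence is combined, so the following is only a minimal sketch of one common multicriteria decision analysis technique, a weighted-sum score, applied to gene ranking. It is not the MCRiceRepGP implementation: the criterion names, weights, gene identifiers and evidence values are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical evidence criteria and weights (assumptions, not from the paper).
WEIGHTS: Dict[str, float] = {
    "expression_in_reproductive_tissue": 0.5,
    "homolog_with_reproductive_phenotype": 0.3,
    "coexpression_with_known_genes": 0.2,
}


def weighted_score(evidence: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Return the normalized weighted sum of evidence values.

    Each evidence value is expected to lie in [0, 1]; a criterion that is
    missing for a gene contributes 0.
    """
    total_weight = sum(weights.values())
    raw = sum(w * evidence.get(name, 0.0) for name, w in weights.items())
    return raw / total_weight


def rank_genes(genes: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank genes from highest to lowest combined score."""
    scored = ((gene, weighted_score(ev)) for gene, ev in genes.items())
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up example data: three placeholder genes with partial evidence.
EXAMPLE_GENES: Dict[str, Dict[str, float]] = {
    "gene_A": {
        "expression_in_reproductive_tissue": 1.0,
        "homolog_with_reproductive_phenotype": 1.0,
        "coexpression_with_known_genes": 0.5,
    },
    "gene_B": {
        "expression_in_reproductive_tissue": 0.2,
        "coexpression_with_known_genes": 1.0,
    },
    "gene_C": {
        "expression_in_reproductive_tissue": 0.8,
    },
}

RANKED = rank_genes(EXAMPLE_GENES)

if __name__ == "__main__":
    for gene, score in RANKED:
        print(f"{gene}\t{score:.2f}")
```

In a sketch like this, the ranked list could be intersected with candidates from another genome-wide study, such as a genome-wide association study, to prioritize genes supported by both lines of evidence.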
Keyphrases
- genome wide
- dna methylation
- machine learning
- bioinformatics analysis
- genome wide identification
- mental health
- copy number
- heat stress
- single cell
- healthcare
- poor prognosis
- rna seq
- gene expression
- induced apoptosis
- risk assessment
- stem cells
- genome wide association
- deep learning
- social media
- binding protein
- cell cycle arrest
- oxidative stress
- transcription factor
- amino acid
- drug induced
- human health
- pi3k akt