Machine learning approaches to identify core and dispensable genes in pangenomes.

Alan E Yocca, Patrick P Edger
Published in: The Plant Genome (2021)
A gene in a given taxonomic group is either present in every individual (core) or absent in at least one individual (dispensable). Previous pangenomic studies have identified certain functional differences between core and dispensable genes. However, determining whether a gene belongs to the core or dispensable portion of the genome requires constructing a pangenome, which involves sequencing the genomes of many individuals. Here we aim to leverage the previously characterized core and dispensable gene content of two grass species [Brachypodium distachyon (L.) P. Beauv. and Oryza sativa L.] to construct a machine learning model capable of accurately classifying genes as core or dispensable using only a single annotated reference genome. Such a model may mitigate the need for pangenome construction, an expensive hurdle, especially in orphan crops, which often lack adequate genomic resources.
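The core/dispensable definition above can be illustrated with a minimal sketch. This is not the authors' pipeline or their machine learning model; it only shows how labels would follow from a presence/absence table once a pangenome exists. The function name `classify_genes`, the gene names, and the presence/absence calls are all hypothetical and invented for illustration.

```python
from typing import Dict, List


def classify_genes(presence: Dict[str, List[bool]]) -> Dict[str, str]:
    """Label a gene 'core' if present in every individual, else 'dispensable'.

    `presence` maps a gene ID to one boolean per sequenced individual
    (True = gene present). This input layout is an assumption.
    """
    labels = {}
    for gene, calls in presence.items():
        if not calls:
            # With no individuals, the definition cannot be applied.
            raise ValueError(f"no presence/absence calls for {gene}")
        labels[gene] = "core" if all(calls) else "dispensable"
    return labels


# Hypothetical calls across four individuals (made-up data).
presence = {
    "geneA": [True, True, True, True],    # present everywhere
    "geneB": [True, False, True, True],   # absent in one individual
    "geneC": [False, False, True, False], # present in only one individual
}

labels = classify_genes(presence)
print(labels)
```

Labels derived this way are the kind of ground truth a classifier trained on a single annotated reference genome would try to predict without the full presence/absence table.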
Keyphrases
  • genome wide
  • machine learning
  • genome wide identification
  • copy number
  • dna methylation
  • genome wide analysis
  • artificial intelligence
  • big data
  • single cell
  • deep learning