
Automatic Gene Function Prediction in the 2020's.

Stavros Makrodimitris, Roeland C. H. J. van Ham, Marcel J. T. Reinders
Published in: Genes (2020)
New DNA and protein sequences are currently being generated too fast for their functions to be discovered experimentally, which emphasizes the need for accurate Automatic Function Prediction (AFP) methods. AFP has been an active and growing research field for decades and has made considerable progress in that time. However, it is certainly not solved. In this paper, we describe challenges that the AFP field still has to overcome to increase its applicability. The challenges we consider are how to: (1) include condition-specific functional annotation, (2) predict functions for non-model species, (3) include new informative data sources, (4) deal with the biases of Gene Ontology (GO) annotations, and (5) maximally exploit the GO to obtain performance gains. We also provide recommendations for addressing those challenges by adapting (1) the way we represent proteins and genes, (2) the way we represent gene functions, and (3) the algorithms that perform the prediction from gene to function. Together, we show that AFP is still a vibrant research area that can benefit from continuing advances in machine learning, with which AFP in the 2020s can again take a large step forward, reinforcing the power of computational biology.