Modeling aspects of the language of life through transfer-learning protein sequences.
Michael Heinzinger, Ahmed Elnaggar, Yu Wang, Christian Dallago, Dmitrii Nechaev, Florian Matthes, Burkhard Rost
Published in: BMC Bioinformatics (2019)
Transfer learning succeeded in extracting information from unlabeled sequence databases that is relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available at the level of a single sequence.