Entropy and Variability: A Second Opinion by Deep Learning.
Daniel T Rademaker, Li C Xue, Peter A C 't Hoen, Gert Vriend
Published in: Biomolecules (2022)
We applied an unsupervised deep learning feature extraction method to analyse the multiple sequence alignments of all human proteins. An auto-encoder neural architecture was trained on 27,835 multiple sequence alignments of human proteins to obtain the two features that best describe the seven million variability patterns. These two features, learned without supervision, strongly resemble entropy and variability, indicating that these are the projections that retain the most information when reducing the dimensionality of the information hidden in the columns of multiple sequence alignments.
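
To make the two quantities named in the abstract concrete, the following Python sketch computes them for each column of a toy alignment. It rests on assumptions the abstract does not spell out: entropy is taken to be the Shannon entropy (natural logarithm) of the residue frequencies in a column, variability is taken to be the number of distinct residue types observed in that column, and gap characters are ignored. The function name `column_stats` and the toy alignment are hypothetical; this is not the authors' auto-encoder or their code.

```python
import math
from collections import Counter


def column_stats(alignment, gap="-"):
    """Return a list of (entropy, variability) tuples, one per alignment column.

    Assumed definitions (not taken from the abstract):
      entropy     = Shannon entropy of residue frequencies, natural log
      variability = number of distinct residue types in the column
    Gap characters are excluded from both quantities.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    stats = []
    for i in range(length):
        # Count residues in column i, skipping gaps.
        counts = Counter(seq[i] for seq in alignment if seq[i] != gap)
        total = sum(counts.values())
        if total == 0:
            # Column holds only gaps: no residues to describe.
            stats.append((0.0, 0))
            continue
        entropy = -sum(
            (c / total) * math.log(c / total) for c in counts.values()
        )
        stats.append((entropy, len(counts)))
    return stats


# Hypothetical four-sequence alignment, for illustration only.
toy_alignment = ["ACDE", "ACDF", "AGDG", "ATD-"]

for position, (entropy, variability) in enumerate(column_stats(toy_alignment)):
    print(f"column {position}: entropy={entropy:.3f}, variability={variability}")
```

Fully conserved columns give an entropy of zero and a variability of one, while columns with many equally frequent residue types score high on both, which is the kind of per-column pattern the two learned features are reported to resemble.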