Organizing the bacterial annotation space with amino acid sequence embeddings.
Susanna R Grigson, Jody C McKerral, James G Mitchell, Robert A Edwards
Published in: BMC Bioinformatics (2022)
This study demonstrates that amino acid sequence embeddings may be a powerful tool for developing more robust ontologies for annotating protein sequence data. In addition, embeddings may be beneficial for clustering protein sequences with unknown functions and selecting optimal candidate proteins to characterize experimentally.
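
As a rough illustration of the clustering and candidate-selection idea, the sketch below groups proteins by the cosine similarity of their embedding vectors and then picks one representative per group. Everything in it is an assumption made for illustration rather than the study's method: the three-dimensional vectors are made-up toy values (real sequence embeddings would come from a trained model and have far more dimensions), the greedy threshold clustering is a stand-in for whatever algorithm is actually used, and the names `cluster_by_similarity` and `pick_representative` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cluster_by_similarity(embeddings, threshold=0.9):
    """Greedy clustering (illustrative assumption, not the paper's method).

    Each protein joins the first cluster whose seed (first member) has
    cosine similarity >= threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for pid, vec in embeddings.items():
        for members in clusters:
            if cosine(vec, embeddings[members[0]]) >= threshold:
                members.append(pid)
                break
        else:
            clusters.append([pid])
    return clusters


def pick_representative(members, embeddings):
    """Return the medoid: the member most similar, in total, to its cluster."""
    return max(
        members,
        key=lambda p: sum(cosine(embeddings[p], embeddings[q]) for q in members),
    )


# Toy embeddings (made-up values; hypothetical protein identifiers).
toy_embeddings = {
    "protA": [1.0, 0.1, 0.0],
    "protB": [0.9, 0.2, 0.0],
    "protC": [1.0, 0.0, 0.1],
    "protD": [0.0, 1.0, 0.1],
    "protE": [0.1, 0.9, 0.0],
}

clusters = cluster_by_similarity(toy_embeddings)
representatives = [pick_representative(c, toy_embeddings) for c in clusters]
print(clusters)
print(representatives)
```

Choosing the medoid is only one possible reading of "optimal candidate": it favors the protein whose embedding is most central to its group, on the assumption that characterizing it experimentally would be most informative about the rest of the cluster.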