Structure memes: Intuitive visualization of sequence logo and subfamily logo information in a 3D protein-structural context.
Eric Beitz
Published in: Proteins (2021)
The number of available protein sequences covering virtually all known species is tremendous and ever-growing due to the feasibility of the underlying nucleotide sequencing. The speed at which protein structures are being determined is increasing, and as a result of refined cryo-electron microscopy the proportion of solved membrane protein folds is expanding. Sequence data are used to illustrate evolution and to group proteins into families with various levels of subfamilies. Structure data of prototypical proteins provide insight into function, which is brought about by an interplay of specific amino acid residues that are dispersed throughout the sequence. Visually combining rich sequence information with structure data in an intuitively comprehensible way would enhance the process of elucidating key protein aspects regarding evolution, sequence relations, and function. Here, a method is described that projects the information contained in sequence logos and subfamily logos onto protein structures. The amino acid composition at a site is encoded by a mixed color in the red-yellow-blue space, and the information content is represented by the radius of a sphere at the α-carbon position. The resulting display is termed "structure meme." The underlying sequence and atom coordinate data are retained in the file for simple retrieval on demand using a molecular structure visualization program. Structure memes are recognizable and convey extensive information in a human-discernible way that requires little training.
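The per-site encoding described above (a mixed color for composition, a sphere radius for information content) can be illustrated with a minimal sketch. It is not the published implementation. The information content uses the standard sequence-logo formula, log2(20) minus the Shannon entropy of the column, without small-sample correction. Everything else is an assumption made for illustration: the three residue classes, their red, yellow, and blue anchor colors, the linear RGB blend, the radius range `r_min` to `r_max`, and the function name `column_summary`.

```python
import math
from collections import Counter

MAX_BITS = math.log2(20)  # maximum information for 20 amino acids, in bits

# Hypothetical residue classes and anchor colors (RGB, 0..1).
# This grouping is an assumption for illustration, not the paper's scheme.
CLASS_COLORS = {
    "charged": (1.0, 0.0, 0.0),      # red
    "polar": (1.0, 1.0, 0.0),        # yellow
    "hydrophobic": (0.0, 0.0, 1.0),  # blue
}
RESIDUE_CLASS = {}
for aa in "DEKRH":
    RESIDUE_CLASS[aa] = "charged"
for aa in "STNQCGPY":
    RESIDUE_CLASS[aa] = "polar"
for aa in "AVLIMFW":
    RESIDUE_CLASS[aa] = "hydrophobic"


def column_summary(column, r_min=0.3, r_max=2.0):
    """Return (rgb, information_bits, radius) for one alignment column.

    column: string with one character per aligned sequence; gaps and
    unknown symbols are ignored. r_min and r_max are hypothetical sphere
    radii for zero and maximal information content.
    """
    residues = [aa for aa in column.upper() if aa in RESIDUE_CLASS]
    if not residues:
        return None  # column holds only gaps or unknown symbols

    counts = Counter(residues)
    n = len(residues)

    # Shannon entropy of the observed residue frequencies.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    info = MAX_BITS - entropy

    # Frequency-weighted linear blend of the class anchor colors.
    rgb = [0.0, 0.0, 0.0]
    for aa, c in counts.items():
        anchor = CLASS_COLORS[RESIDUE_CLASS[aa]]
        for i in range(3):
            rgb[i] += (c / n) * anchor[i]

    # Scale information content linearly onto the radius range.
    radius = r_min + (r_max - r_min) * info / MAX_BITS
    return tuple(rgb), info, radius
```

Under these assumptions, a fully conserved column such as `"LLLL"` yields the pure class color and the largest radius, while a column split evenly between two residues, such as `"DL--"`, yields an even blend of the two class colors and one bit less information.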