Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing.

Armen Abnousi, Shira L Broschat, Ananth Kalyanaraman

Published in: BMC bioinformatics (2018)

The new clustering algorithm can be used to generate meaningful clusters of conserved regions. It is a scalable method that, when paired with our prior work NADDA for detecting conserved regions, provides a complete end-to-end pipeline for annotating protein sequences.
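
The summary above does not describe the algorithm's internals. As a rough illustration of the general minhashing idea named in the title, and not the authors' implementation, the sketch below estimates the Jaccard similarity of two sequences from their k-mer sets without aligning them. The function names, the k-mer length `k=3`, the number of hash functions, and the use of seeded BLAKE2b hashes are all assumptions made for this example.

```python
import hashlib


def kmers(seq, k=3):
    """Return the set of overlapping k-mers (shingles) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _seeded_hash(kmer, seed):
    """Hash a k-mer to a 64-bit integer; the seed selects the hash function."""
    data = f"{seed}:{kmer}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq, k=3, num_hashes=64):
    """Signature = minimum hash value over all k-mers, one per hash function."""
    shingles = kmers(seq, k)
    if not shingles:
        raise ValueError("sequence is shorter than k")
    return [min(_seeded_hash(s, seed) for s in shingles)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Two regions whose signatures agree in many positions are likely to share many k-mers, so signature comparison could serve as a cheap, alignment-free similarity test before clustering. How the published method builds clusters from such signatures is not stated here.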

Keyphrases

transcription factor
single cell
protein protein
machine learning
rna seq
amino acid
electronic health record
binding protein
deep learning
big data