Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing.
Armen Abnousi, Shira L Broschat, Ananth Kalyanaraman
Published in: BMC Bioinformatics (2018)
The new clustering algorithm can be used to generate meaningful clusters of conserved regions. It is a scalable method that, when paired with our prior work NADDA for detecting conserved regions, provides a complete end-to-end pipeline for annotating protein sequences.
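
As a rough illustration of the general minhashing idea named in the title, and not the authors' implementation, the sketch below estimates the similarity of two protein regions without aligning them. It breaks each sequence into overlapping k-mers, keeps the minimum hash value under each of several seeded hash functions, and uses the fraction of matching minima as an estimate of the Jaccard similarity of the k-mer sets. The function names, the k-mer length of 3, the 128 hash functions, and the example sequences are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _seeded_hash(kmer, seed):
    """Hash a k-mer to a 64-bit integer; the seed selects the hash function."""
    data = f"{seed}:{kmer}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq, k=3, num_hashes=128):
    """Build a minhash signature: one minimum hash value per seeded function."""
    shingles = kmers(seq, k)
    if not shingles:
        raise ValueError("sequence is shorter than k")
    return [min(_seeded_hash(s, seed) for s in shingles)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(seq_a, seq_b, k=3):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    ka, kb = kmers(seq_a, k), kmers(seq_b, k)
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    # Hypothetical sequences, made up for this example.
    region_a = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    region_b = "MKTAYIAKQRQISFVKSHFSRQMEERLGLIEVQ"
    sig_a = minhash_signature(region_a)
    sig_b = minhash_signature(region_b)
    print("estimated:", estimated_jaccard(sig_a, sig_b))
    print("exact:    ", exact_jaccard(region_a, region_b))
```

Because signatures have a fixed length regardless of sequence length, pairwise comparisons stay cheap as the data set grows, which is the usual reason minhashing is chosen for large collections. How the published method turns such similarity estimates into clusters is not described in this record, so that step is left out here.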