MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification.
Giulia Fiscon, Emanuel Weitschek, Eleonora Cella, Alessandra Lo Presti, Marta Giovanetti, Muhammed Babakir-Mina, Marco Ciotti, Massimo Ciccozzi, Alessandra Pierangeli, Paola Bertolazzi, Giovanni Felici
Published in: BioData Mining (2016)
We discover a large number of small subsequences that identify each virus type with high accuracy and low computational time, and that also help to characterize different genomic regions. With subsequence length bounded at 20, our method found 1164 characterizing subsequences for all the Influenza virus subtypes, 194 for all the Polyoma viruses, and 11 for Rhino viruses. The abundance of small separating subsequences extracted for each genomic region may provide important support for quick and robust virus identification. Finally, useful biological information can be derived from the relative location and abundance of such subsequences along the different regions.
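
The abstract does not describe how MISSEL extracts its subsequences, so the following is only a minimal sketch of the general idea of a class-specific subsequence, not the authors' method. It assumes a deliberately strict definition: a fixed-length substring counts as specific to a class if it occurs in every sequence of that class and in no sequence of any other class. The names `kmers`, `specific_subsequences`, and the toy sequences are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_subsequences(labelled, k):
    """Map each class label to the length-k substrings that occur in
    every sequence of that class and in no sequence of another class.

    `labelled` is an iterable of (label, sequence) pairs. This strict
    criterion is an assumption of the sketch, not taken from the paper.
    """
    by_class = defaultdict(list)
    for label, seq in labelled:
        by_class[label].append(kmers(seq, k))

    # Substrings shared by all members of a class, and substrings seen
    # in at least one member of a class.
    shared = {c: set.intersection(*sets) for c, sets in by_class.items()}
    seen = {c: set.union(*sets) for c, sets in by_class.items()}

    result = {}
    for c in by_class:
        others = set().union(*(seen[o] for o in seen if o != c))
        result[c] = shared[c] - others
    return result


# Toy data (made up): two classes with two short sequences each.
toy = [
    ("A", "ACGTAC"),
    ("A", "TACGTT"),
    ("B", "GGCATG"),
    ("B", "CATGGA"),
]
found = specific_subsequences(toy, k=3)
for label in sorted(found):
    print(label, sorted(found[label]))
```

On real genomes one would repeat this for each length up to the chosen bound (20 in the abstract) and would likely relax the "every sequence" requirement to tolerate mutations; both choices are left out here to keep the sketch short.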