Semi-supervised machine learning for automated species identification by collagen peptide mass fingerprinting.
Muxin Gu, Michael Buckley. Published in: BMC Bioinformatics (2018)
This method consistently achieves higher accuracy than two-dimensional principal component analysis and accuracy similar to that of hierarchical clustering with optimised parameters, while greatly reducing the requirement for human input. Within the Vertebrata, we demonstrate that the method achieves taxonomic resolution at the family or sub-family level, whereas genus- or species-level identification may require manual interpretation or further experiments. In addition, it identifies species biomarkers beyond those previously published.