The Aristotle Classifier: Using the Whole Glycomic Profile To Indicate a Disease State.
David Hua, Milani Wijeweera Patabandige, Eden P Go, Heather Desaire. Published in: Analytical Chemistry (2019)
"The totality is not, as it were, a mere heap, but the whole is something besides the parts." (Aristotle). We built a classifier that uses the totality of the glycomic profile, rather than a few selected glycoforms, to differentiate samples from two different sources. This approach, which relies on thousands of features, is a radical departure from current strategies, where most of the glycomic profile is ignored in favor of selecting a few features, or even a single feature, meant to capture the differences between sample types. The classifier can be used to differentiate the source of the material; applicable sources may be different species of animals, different protein production methods, or, most importantly, different biological states (disease vs healthy). The classifier can be used on glycomic data in any form, including derivatized monosaccharides, intact glycans, or glycopeptides. It takes advantage of the fact that changing the source material can change the glycomic profile in many subtle ways: some glycoforms can be upregulated, some can be downregulated, and some may appear unchanged even though their proportion, with respect to the other forms present, is altered to a detectable degree. By classifying samples using the entirety of their glycan abundances, along with the glycans' relative proportions to each other, the "Aristotle Classifier" is more effective at capturing the underlying trends than standard classification procedures used in glycomics, including PCA (principal components analysis). It also outperforms workflows where a single, representative glycomic-based biomarker is used to classify samples. We describe the Aristotle Classifier and provide several examples of its utility for biomarker studies and other classification problems using glycomic data from several sources.
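The idea of classifying on the whole profile, including the glycans' proportions relative to one another, can be illustrated with a small sketch. This is not the Aristotle Classifier itself, whose algorithm the abstract does not specify. It is a minimal stand-in that assumes a nearest-centroid rule over relative abundances plus all pairwise log-ratios. The function names (`profile_features`, `fit`, `predict`) and the three-glycoform toy abundances are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def profile_features(abundances, eps=1e-9):
    """Build a feature vector from one sample's glycoform abundances.

    Features are the relative abundances followed by every pairwise
    log-ratio, so that shifts in proportion between two glycoforms are
    visible even when each one alone looks unchanged.
    """
    total = sum(abundances)
    rel = [a / total for a in abundances]
    ratios = [
        math.log((rel[i] + eps) / (rel[j] + eps))
        for i, j in combinations(range(len(rel)), 2)
    ]
    return rel + ratios


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit(samples, labels):
    """Assumed stand-in model: one centroid per class in feature space."""
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(profile_features(sample))
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, sample):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    features = profile_features(sample)
    return min(model, key=lambda label: math.dist(features, model[label]))


# Made-up toy abundances for three glycoforms (not data from the paper).
# The first glycoform is nearly constant; the other two trade places.
healthy = [[50, 30, 20], [52, 29, 19], [48, 31, 21]]
disease = [[50, 20, 30], [51, 21, 28], [49, 19, 32]]

model = fit(healthy + disease, ["healthy"] * 3 + ["disease"] * 3)
print(predict(model, [50, 28, 22]))
```

In this toy setup the first glycoform carries no information on its own, while the ratio between the second and third separates the two classes, which is the kind of proportional change the abstract describes. A real glycomic profile would have thousands of features, where an all-pairs ratio expansion grows quadratically and would likely need a different design.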