Prediction of Natural Product Classes Using Machine Learning and 13C NMR Spectroscopic Data.
Saúl H Martínez-Treviño, Víctor Uc-Cetina, María A Fernández-Herrera, Gabriel Merino
Published in: Journal of Chemical Information and Modeling (2020)
Structure elucidation of chemical compounds is a complex and challenging activity that requires expertise and well-suited tools. To assign the molecular structure of a given compound, 13C NMR is one of the most widely used techniques because of the broad range of structural information it provides. Because molecules found in nature can be grouped into natural product (NP) classes based on structural similarities, we explore the possibility of predicting NP classes from 13C NMR data. Using freely available 13C NMR data of NPs, we trained four classifiers to predict eight common NP classes. The best performance was obtained with the XGBoost classifier, which reached F1-scores above 0.82. We also performed experiments with different percentages of positive samples, including the presence of glycosides. Furthermore, we tested cases outside the data set, yielding performances above 80% for most classes. For the chroman class, we restricted the test examples to the coumarin subclass, and the prediction accuracy increased to 100%.
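As a rough illustration of the kind of pipeline the abstract describes, the sketch below turns a list of 13C chemical shifts into a fixed-length feature vector and computes the F1-score for one binary class. The abstract does not say how the spectra were encoded, so the binning scheme (range 0 to 240 ppm, 2 ppm bins) is an assumption, as are the function names `bin_shifts` and `f1_score` and the example shift values, which are made up for illustration and are not taken from the paper's data.

```python
from typing import List, Sequence


def bin_shifts(shifts_ppm: Sequence[float],
               lo: float = 0.0,
               hi: float = 240.0,
               width: float = 2.0) -> List[int]:
    """Count 13C peaks per chemical-shift bin (hypothetical encoding).

    Shifts outside [lo, hi) are ignored. The range and bin width are
    illustrative assumptions, not values reported in the paper.
    """
    n_bins = int((hi - lo) / width)
    vec = [0] * n_bins
    for s in shifts_ppm:
        if lo <= s < hi:
            vec[int((s - lo) // width)] += 1
    return vec


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score for a binary task where 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up shift list (ppm), used only to show the shape of the feature vector.
features = bin_shifts([160.5, 116.2, 116.9])

# Made-up labels and predictions for one class (1 = belongs to the class).
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

A vector like `features` could then be passed to any standard classifier, one binary model per NP class, with `f1_score` evaluated on held-out compounds; this one-model-per-class arrangement is likewise an assumption rather than a detail confirmed by the abstract.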