Comprehensive and Empirical Evaluation of Machine Learning Algorithms for Small Molecule LC Retention Time Prediction.

Robbin Bouwmeester, Lennart Martens, Sven Degroeve

Published in: Analytical Chemistry (2019)

Liquid chromatography is a core component of almost all mass spectrometric analyses of (bio)molecules. Because of the high-throughput nature of mass spectrometric analyses, the interpretation of these chromatographic data increasingly relies on informatics solutions that attempt to predict an analyte's retention time. The key components of such predictive algorithms are the features they are supplied with and the machine learning algorithm used to fit the model parameters. Therefore, we have evaluated the performance of seven machine learning algorithms on 36 distinct metabolomics data sets, using two distinct feature sets. Interestingly, the results show that no single learning algorithm performs optimally for all data sets, with different types of algorithms achieving top performance for different types of analytes or different protocols. Our results thus show that an evaluation of machine learning algorithms for retention time prediction is needed to find a suitable algorithm for specific analytes or protocols. Importantly, however, our results also show that blending different types of models together decreases the error on outliers, indicating that the combination of several approaches holds substantial promise for the development of more generic, high-performing algorithms.

Keyphrases