Comparing classification models: a practical tutorial

Published in: Journal of computer-aided molecular design (2021)

While machine learning models have become a mainstay in cheminformatics, the field has yet to agree on standards for model evaluation and comparison. In many cases, authors compare methods by performing multiple folds of cross-validation and reporting the mean value of an evaluation metric such as the area under the receiver operating characteristic curve (ROC AUC). Comparing mean values alone ignores the fold-to-fold variability of the metric, so such comparisons often lack statistical rigor and can lead to inaccurate conclusions. In the interest of encouraging best practices, this tutorial provides an example of how multiple methods can be compared in a statistically rigorous fashion.
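
To illustrate why a difference in mean scores is not sufficient on its own, the following is a minimal sketch of a paired comparison of per-fold scores for two methods. It is not the procedure used in the tutorial itself; it assumes an exact sign-flip permutation test as one possible choice, and the function name `paired_sign_flip_test` and the per-fold values are hypothetical, invented purely for illustration. Because cross-validation folds share training data, the resulting p-value should be read as approximate.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired per-fold scores.

    Returns (mean difference A - B, p-value). Under the null hypothesis that
    the two methods perform equally, the sign of each per-fold difference is
    arbitrary, so every sign pattern is enumerated.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired fold by fold")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    n_extreme = 0
    n_total = 0
    # 2**n sign patterns; feasible for the small fold counts typical of CV.
    for signs in product((1, -1), repeat=len(diffs)):
        permuted = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if permuted >= observed - 1e-12:  # tolerance for float round-off
            n_extreme += 1
        n_total += 1
    return mean(diffs), n_extreme / n_total


# Hypothetical per-fold metric values for two methods (illustrative only,
# not results from any study).
auc_a = [0.81, 0.84, 0.79, 0.86, 0.82, 0.80, 0.85, 0.83, 0.78, 0.84]
auc_b = [0.80, 0.82, 0.80, 0.83, 0.81, 0.79, 0.84, 0.83, 0.79, 0.82]

mean_diff, p_value = paired_sign_flip_test(auc_a, auc_b)
print(f"mean difference = {mean_diff:.4f}, p = {p_value:.4f}")
```

Reporting the mean difference together with a p-value (or a confidence interval) makes it visible when an apparent improvement is within the fold-to-fold noise.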

Keyphrases

machine learning
deep learning
primary care
healthcare
artificial intelligence
big data
clinical evaluation