
An introduction to machine learning for classification and prediction.

Jason E Black, Jacqueline K Kueper, Tyler S Williamson
Published in: Family practice (2022)
Classification and prediction tasks are common in health research. With the increasing availability of vast health data repositories (e.g. electronic medical record databases) and advances in computing power, traditional statistical approaches are being augmented or replaced with machine learning (ML) approaches to classify and predict health outcomes. ML describes the automated process of identifying ("learning") patterns in data to perform tasks. Developing an ML model includes selecting among many candidate model types (e.g. decision trees, support vector machines, neural networks), setting model specifications such as hyperparameters through tuning, and evaluating model performance. This process is conducted repeatedly to find the model and corresponding specifications that optimize some measure of model performance. ML models can make more accurate classifications and predictions than their statistical counterparts and confer greater flexibility when modelling unstructured data or interactions between covariates; however, many ML models require larger sample sizes to achieve good classification or predictive performance and have been criticized as "black box" for their poor transparency and interpretability. ML holds potential in family medicine for profiling patients' disease risk and for clinical decision support that presents additional information at times of uncertainty or high demand. In the future, ML approaches are positioned to become commonplace in family medicine. As such, it is important to understand the objectives that can be addressed using ML approaches and the associated techniques and limitations. This article provides a brief introduction to the use of ML approaches for classification and prediction tasks in family medicine.
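
The repeated cycle described above (choose a model, tune its specifications, evaluate performance) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the data are synthetic, a simple k-nearest-neighbours classifier stands in for the model families named above, the number of neighbours `k` plays the role of the hyperparameter, and accuracy is the performance measure. The helper names (`make_data`, `knn_predict`, `cross_val_accuracy`) are hypothetical.

```python
import random


def make_data(n, rng):
    """Synthetic two-class data (illustrative only): two Gaussian features per row."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 * label  # class 1 is shifted away from class 0
        rows.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return rows


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda r: (r[0][0] - x[0]) ** 2 + (r[0][1] - x[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, test, k):
    """Proportion of test rows classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def cross_val_accuracy(data, k, folds=5):
    """Mean accuracy over `folds` train/validation splits of `data`."""
    scores = []
    for i in range(folds):
        held_out = data[i::folds]
        fit = [r for j, r in enumerate(data) if j % folds != i]
        scores.append(accuracy(fit, held_out, k))
    return sum(scores) / folds


rng = random.Random(0)
data = make_data(300, rng)
train, test = data[:200], data[200:]  # the test set is never used for tuning

candidate_k = [1, 5, 15]  # odd values avoid tied votes
cv_scores = {k: cross_val_accuracy(train, k) for k in candidate_k}
best_k = max(cv_scores, key=cv_scores.get)

# Final evaluation of the selected specification on unseen data.
test_accuracy = accuracy(train, test, best_k)
print("selected k:", best_k, "held-out accuracy:", round(test_accuracy, 3))
```

Tuning on cross-validation folds and reporting performance on a separate held-out set keeps the final estimate from being inflated by the selection step. In practice the same loop would typically range over several model types as well as their hyperparameters.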