Machine Learning and Risk Assessment: Random Forest Does Not Outperform Logistic Regression in the Prediction of Sexual Recidivism.

Sonja Etzler, Felix D Schönbrodt, Florian Pargent, Reinhard Eher, Martin Rettenberger

Published in: Assessment (2023)

Although many studies have supported the use of actuarial risk assessment instruments (ARAIs) because they outperformed unstructured judgments, improving their predictive performance remains an ongoing challenge. Machine learning (ML) algorithms, such as random forests, can detect patterns in data that are useful for prediction without those patterns being explicitly programmed (e.g., by considering nonlinear effects between risk factors and the criterion). Therefore, the current study compares conventional logistic regression analyses with the random forest algorithm on a sample of N = 511 adult male individuals convicted of sexual offenses. Data were collected at the Federal Evaluation Center for Violent and Sexual Offenders in Austria within a prospective-longitudinal research design, and participants were followed up for an average of M = 8.2 years. The Static-99, containing static risk factors, and the Stable-2007, containing stable dynamic risk factors, were included as predictors. The results demonstrated no superior predictive performance of the random forest compared with logistic regression; furthermore, methods of interpretable ML did not point to any robust nonlinear effects. Altogether, the results supported the statistical use of logistic regression for the development and clinical application of ARAIs.
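The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' analysis: the data are synthetic (generated with scikit-learn's make_classification, sized to N = 511 only for flavor), and the 10% base rate, the ten generic features, the 5-fold cross-validation, the AUC metric, and the function name compare_models are all assumptions made for this example rather than details taken from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def compare_models(seed=0):
    """Return the mean cross-validated AUC for each model (hypothetical helper)."""
    # Synthetic stand-in data, NOT the study's data. The sample size mirrors
    # N = 511; the ~10% positive rate and the features are assumptions.
    X, y = make_classification(
        n_samples=511,
        n_features=10,
        n_informative=4,
        weights=[0.9, 0.1],
        random_state=seed,
    )

    # Stratified folds keep the rare positive class represented in every fold.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }

    # Score both models on identical folds so the AUCs are directly comparable.
    return {
        name: float(np.mean(cross_val_score(model, X, y, cv=cv, scoring="roc_auc")))
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, auc in compare_models().items():
        print(f"{name}: mean cross-validated AUC = {auc:.3f}")
```

Evaluating both models on the same resampling folds is the design choice that matters here: it keeps any difference in performance attributable to the algorithm rather than to the particular split of the data.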
