Comparing Machine Learning Models and Human Raters When Ranking Medical Student Performance Evaluations.
Jonathan D Kibble, Jeffrey H Plochocki. Published in: Journal of graduate medical education (2023)
The rubric for manual grading provided reliable overall scoring and ranking of medical student performance evaluations (MSPEs). The machine learning models (MLMs) accurately detected positive sentiment in the MSPEs but were unable to provide reliable rank ordering.