Machine learning and deep learning systems for automated measurement of "advanced" theory of mind: Reliability and validity in children and adolescents.
Rory T Devine, Venelin Kovatchev, Imogen Grumley Traynor, Phillip Smith, Mark Lee. Published in: Psychological Assessment (2023)
Understanding individual differences in theory of mind (ToM; the ability to attribute mental states to others) in middle childhood and adolescence hinges on the availability of robust and scalable measures. Open-ended response tasks yield valid indicators of ToM but are labor intensive and difficult to compare across studies. We examined the reliability and validity of new machine learning and deep learning neural network automated scoring systems for measuring ToM in children and adolescents. Two large samples of British children and adolescents aged between 7 and 13 years (Sample 1: N = 1,135, Mage = 10.22 years, SD = 1.45; Sample 2: N = 1,020, Mage = 10.36 years, SD = 1.27) completed the silent film and strange stories tasks. Teachers rated Sample 2 children's social competence with peers. A single latent factor explained variation in performance on both the silent film and strange stories tasks (in Samples 1 and 2), and test performance was sensitive to age-related differences and to individual differences within each age group. A deep learning neural network automated scoring system trained on Sample 1 exhibited interrater reliability and measurement invariance with manual ratings in Sample 2. Validity of ratings from the automated scoring system was supported by unique positive associations between ToM and teacher-rated social competence. The results demonstrate that reliable and valid measures of ToM can be obtained using the new freely available deep learning neural network automated scoring system to rate open-ended text responses. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
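The abstract reports interrater reliability between the automated scoring system and manual ratings but does not name the statistic used. As a minimal sketch only, the Python below computes Cohen's kappa, one common chance-corrected agreement index, which is an assumption here rather than the authors' stated method. The function name `cohen_kappa`, the 0/1/2 scoring scale, and the toy score lists are all hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters need the same non-zero number of scores.")
    n = len(rater_a)

    # Observed agreement: proportion of responses given identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement under independence, from each rater's marginals.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy scores (0, 1, or 2 per response); NOT data from the study.
manual = [0, 1, 2, 2, 1, 0, 2, 1]
automated = [0, 1, 2, 1, 1, 0, 2, 2]

kappa = cohen_kappa(manual, automated)
print(f"Cohen's kappa: {kappa:.3f}")
```

Plain kappa treats every disagreement as equally serious. For ordinal scores, a weighted kappa or an intraclass correlation may be a better fit, depending on how the ratings are analysed.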