Login / Signup

Intra-Examiner Reliability and Validity of Sagittal Cervical Spine Mensuration Methods Using Deep Convolutional Neural Networks.

Mohammad Mehdi HosseiniMohammad H MahoorJason W HaasJoseph R FerrantelliAnne-Lise DupuisJason O JaegerDeed Eric Harrison
Published in: Journal of clinical medicine (2024)
Background: The biomechanical analysis of spine and postural misalignments is important for surgical and non-surgical treatment of spinal pain. We investigated the examiner reliability of sagittal cervical alignment variables compared to the reliability and concurrent validity of computer vision algorithms used in the PostureRay ® software 2024. Methods: A retrospective database of 254 lateral cervical radiographs of patients between the ages of 11 and 86 is studied. The radiographs include clearly visualized C1-C7 vertebrae that were evaluated by a human using the software. To evaluate examiner reliability and the concurrent validity of the trained CNN performance, two blinded trials of radiographic digitization were performed by an extensively trained expert user (US) clinician with a two-week interval between trials. Then, the same clinician used the trained CNN twice to reproduce the same measures within a 2-week interval on the same 254 radiographs. Measured variables included segmental angles as relative rotation angles (RRA) C1-C7, Cobb angles C2-C7, relative segmental translations (RT) C1-C7, anterior translation C2-C7, and absolute rotation angle (ARA) C2-C7. Data were remotely extracted from the examiner's PostureRay ® system for data collection and sorted based on gender and stratification of degenerative changes. Reliability was assessed via intra-class correlations (ICC), root mean squared error (RMSE), and R 2 values. Results: In comparing repeated measures of the CNN network to itself, perfect reliability was found for the ICC (1.0), RMSE (0), and R 2 (1). The reliability of the trained expert US was in the excellent range for all variables, where 12/18 variables had ICCs ≥ 0.9 and 6/18 variables were 0.84 ≤ ICCs ≤ 0.89. Similarly, for the expert US, all R 2 values were in the excellent range (R 2 ≥ 0.7), and all RMSEs were small, being 0.42 ≤ RMSEs ≤ 3.27. Construct validity between the expert US and the CNN network was found to be in the excellent range with 18/18 ICCs in the excellent range (ICCs ≥ 0.8), 16/18 R 2 values in the strong to excellent range (R 2 ≥ 0.7), and 2/18 in the good to moderate range (R 2 RT C6/C7 = 0.57 and R 2 Cobb C6/C7 = 0.64. The RMSEs for expert US vs. the CNN network were small, being 0.37 ≤ RMSEs ≤ 2.89. Conclusions: A comparison of repeated measures within the computer vision CNN network and expert human found exceptional reliability and excellent construct validity when comparing the computer vision to the human observer.
Keyphrases