Deep learning for 3D cephalometric landmarking with heterogeneous multi-center CBCT dataset.

Jaakko Sahlsten, Jorma Järnstedt, Joel Jaskari, Hanna Naukkarinen, Phattaranant Mahasantipiya, Arnon Charuakkra, Krista Vasankari, Ari Hietanen, Osku Sundqvist, Antti Lehtinen, Kimmo Kaski

Published in: PloS one (2024)

Cephalometric analysis is a critically important and common procedure prior to orthodontic treatment and orthognathic surgery. Recently, deep learning approaches have been proposed for automatic 3D cephalometric analysis based on landmarking from CBCT scans. However, these approaches have relied on uniform datasets from a single center or imaging device and have not considered patient ethnicity. In addition, previous works have considered only a limited number of clinically relevant cephalometric landmarks, and their approaches were computationally infeasible. Both limitations impair integration into the clinical workflow. Here our aim is to analyze the clinical applicability of a lightweight deep learning neural network for fast localization of 46 clinically significant cephalometric landmarks with multi-center, multi-ethnic, and multi-device data consisting of 309 CBCT scans from Finnish and Thai patients. Our approach achieved a mean localization distance of 1.99 ± 1.55 mm for the Finnish cohort and 1.96 ± 1.25 mm for the Thai cohort. Clinically significant performance, i.e., an error of ≤ 2 mm, was reached for 61.7% and 64.3% of the landmarks in the Finnish and Thai cohorts, respectively. Furthermore, the estimated landmarks were used to measure cephalometric characteristics successfully, i.e., with an error of ≤ 2 mm or ≤ 2°, in 85.9% of the Finnish and 74.4% of the Thai cases. Between the two patient cohorts, 33 of the landmarks and all cephalometric characteristics showed no statistically significant difference (at the p < 0.05 level), as measured by the Mann-Whitney U test with Benjamini-Hochberg correction. Moreover, our method is computationally light, producing predictions in a mean time of 0.77 s with single-machine GPU computing and 2.27 s with CPU computing. Our findings advocate for the inclusion of this method in clinical settings based on its technical feasibility and robustness across varied clinical datasets.
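The evaluation quantities reported in the abstract (mean landmark distance ± SD, the fraction of landmarks within 2 mm, and Benjamini-Hochberg-corrected p-values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. Coordinates are assumed to be in millimetres, the ± value is assumed to be the sample standard deviation, and the function names `landmark_errors`, `summarize`, and `benjamini_hochberg` are hypothetical. The per-landmark Mann-Whitney U p-values themselves could come from, e.g., `scipy.stats.mannwhitneyu` and would be the input to the correction step.

```python
import math
import statistics
from typing import Sequence

# Hypothetical type: a landmark position (x, y, z), assumed to be in millimetres.
Point = tuple[float, float, float]


def landmark_errors(pred: Sequence[Point], truth: Sequence[Point]) -> list[float]:
    """Euclidean distance between each predicted and reference landmark."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must contain the same number of landmarks")
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def summarize(errors: Sequence[float], threshold_mm: float = 2.0) -> dict[str, float]:
    """Mean, sample SD (assumed meaning of ±), and fraction of errors <= threshold_mm."""
    return {
        "mean": statistics.fmean(errors),
        "sd": statistics.stdev(errors),
        "within_threshold": sum(e <= threshold_mm for e in errors) / len(errors),
    }


def benjamini_hochberg(p_values: Sequence[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    errs = landmark_errors([(0, 0, 0), (1, 1, 1), (3, 0, 0)],
                           [(0, 0, 1), (1, 1, 1), (0, 0, 0)])
    print(errs, summarize(errs))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

A difference for a landmark would then be deemed significant when its adjusted p-value falls below 0.05. Walking the sorted p-values from largest to smallest with a running minimum is the standard way to make the adjusted values monotone in rank.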
