External validation of ELASTIC NET regression models including newborn metabolomic markers for postnatal gestational age estimation in East and South-East Asian infants.
Steven HawkenMalia S Q MurphyRobin DucharmeA Brianne BotaLindsay A WilsonWei ChengMa-Am Joy R TumulakMaria Melanie Liberty AlcausinMa Elouisa ReyesWenjuan QiuBeth K PotterJulian LittleMark WalkerLin ZhangCarmencita PadillaPranesh ChakrabortyKumanan WilsonPublished in: Gates open research (2020)
Background: Postnatal gestational age (GA) algorithms derived from newborn metabolic profiles have emerged as a novel method of acquiring population-level preterm birth estimates in low resource settings. To date, model development and validation have been carried out in North American settings. Validation outside of these settings is warranted. Methods: This was a retrospective database study using data from newborn screening programs in Canada, the Philippines and China. ELASTICNET machine learning models were developed to estimate GA in a cohort of infants from Canada using sex, birth weight and metabolomic markers from newborn heel prick blood samples. Final models were internally validated in an independent group of infants, and externally validated in cohorts of infants from the Philippines and China. Results: Cohorts included 39,666 infants from Canada, 82,909 from the Philippines and 4,448 from China. For the full model including sex, birth weight and metabolomic markers, GA estimates were within 5 days of ultrasound values in the Canadian internal validation (mean absolute error (MAE) 0.71, 95% CI: 0.71, 0.72), and within 6 days of ultrasound GA in both the Filipino (0.90 (0.90, 0.91)) and Chinese cohorts (0.89 (0.86, 0.92)). Despite the decreased accuracy in external settings, our models incorporating metabolomic markers performed better than the baseline model, which relied on sex and birth weight alone. In preterm and growth-restricted infants, the accuracy of metabolomic models was markedly higher than the baseline model. Conclusions: Accuracy of metabolic GA algorithms was attenuated when applied in external settings. Models including metabolomic markers demonstrated higher accuracy than models using sex and birth weight alone. As innovators look to take this work to scale, further investigation of modeling and data normalization techniques will be needed to improve robustness and generalizability of metabolomic GA estimates in low resource settings, where this could have the most clinical utility.