Partial Least Squares Discriminant Analysis and Bayesian Networks for Metabolomic Prediction of Childhood Asthma.
Rachel S Kelly, Michael J McGeachie, Kathleen A Lee-Sarwar, Priyadarshini Kachroo, Su H Chu, Yamini V Virkud, Mengna Huang, Augusto A Litonjua, Scott T Weiss, Jessica Lasky-Su
Published in: Metabolites (2018)
To explore novel methods for the analysis of metabolomics data, we compared the ability of Partial Least Squares Discriminant Analysis (PLS-DA) and Bayesian networks (BN) to build predictive plasma metabolite models of asthma status at age three in 411 three-year-olds (59 cases and 352 controls) from the Vitamin D Antenatal Asthma Reduction Trial (VDAART). The standard PLS-DA approach predicted age-three asthma with an Area Under the Curve Convex Hull (AUCCH) of 81%; however, a permutation test indicated possible overfitting. In contrast, a predictive Bayesian network including 42 metabolites had a significantly higher AUCCH of 92.1% (p for difference < 0.001), with no evidence that this accuracy was due to overfitting. Both models provided biologically informative insights into asthma, in particular a role for dysregulated arginine metabolism and several exogenous metabolites that deserve further investigation as potential causative agents. Because the BN model outperformed the PLS-DA model in both accuracy and resistance to overfitting, it may represent a viable alternative to typical analytical approaches for the investigation of metabolomics data.
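The two evaluation ideas named in the abstract, the area under the ROC convex hull (AUCCH) and a label permutation test, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`roc_points`, `upper_hull`, `aucch`, `permutation_p_value`) and the toy labels and scores are hypothetical. As a further simplifying assumption, the permutation test here reshuffles labels against fixed classifier scores; a full overfitting check would refit the model on each permuted label set.

```python
import random

def roc_points(labels, scores):
    """ROC points (FPR, TPR), sweeping the threshold from high to low.
    labels are 0/1; tied scores collapse into a single point."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def upper_hull(points):
    """Upper convex hull of the ROC points (monotone chain, left to right)."""
    hull = []
    for p in sorted(set(points)):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:   # hull[-1] lies on or below the chord to p
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def area_under(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def aucch(labels, scores):
    """Area under the convex hull of the ROC curve."""
    return area_under(upper_hull(roc_points(labels, scores)))

def permutation_p_value(labels, scores, n_perm=999, seed=0):
    """Share of label shufflings whose AUCCH reaches the observed value.
    Simplification: scores stay fixed; the model is not refit."""
    rng = random.Random(seed)
    observed = aucch(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if aucch(shuffled, scores) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): 1 = case, 0 = control, with made-up model scores.
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1]
print("AUC  :", area_under(roc_points(toy_labels, toy_scores)))
print("AUCCH:", aucch(toy_labels, toy_scores))
print("p    :", permutation_p_value(toy_labels, toy_scores))
```

Because the convex hull lies on or above the empirical ROC curve, AUCCH is never smaller than the plain AUC computed from the same scores. A small permutation p-value suggests the observed discrimination is unlikely under random labelling.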
Keyphrases
- chronic obstructive pulmonary disease
- lung function
- allergic rhinitis
- ms ms
- pregnant women
- electronic health record
- magnetic resonance imaging
- clinical trial
- nitric oxide
- magnetic resonance
- randomized controlled trial
- computed tomography
- study protocol
- climate change
- air pollution
- risk assessment
- phase iii
- open label
- double blind