Machine Learning Prediction of Prediabetes in a Young Male Chinese Cohort with 5.8-Year Follow-Up.
Chi-Hao Liu, Chun-Feng Chang, I-Chien Chen, Fan-Min Lin, Shiow-Jyu Tzou, Chung-Bao Hsieh, Ta-Wei Chu, Dee Pei
Published in: Diagnostics (Basel, Switzerland) (2024)
The identification of risk factors for future prediabetes in young men remains largely unexamined. This study enrolled 6247 young ethnic Chinese men with normal fasting plasma glucose at baseline (FPGbase) and used machine learning (Mach-L) methods to predict prediabetes after 5.8 years. The study had two aims: (1) to evaluate whether Mach-L outperforms traditional multiple linear regression (MLR), and (2) to identify the most important risk factors. The baseline data included demographic, biochemistry, and lifestyle information. Two models were built: Model 1 included all variables, and Model 2 excluded FPGbase, since it had the most profound effect on prediction. Random forest, stochastic gradient boosting, eXtreme gradient boosting, and elastic net were used, and model performance was compared using several error metrics. All of the Mach-L errors were smaller than those for MLR, indicating that Mach-L gave the more accurate predictions. In descending order of importance, the key factors for Model 1 were FPGbase, body fat (BF), creatinine (Cr), thyroid stimulating hormone (TSH), white blood cell count (WBC), and age, while those for Model 2 were BF, WBC, age, TSH, triglycerides (TG), and low-density lipoprotein cholesterol (LDL-C). We concluded that FPGbase was the most important factor for predicting future prediabetes. However, after removing FPGbase, WBC, TSH, BF, high-density lipoprotein cholesterol (HDL-C), and age were the key factors after 5.8 years.
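The abstract states that models were compared using error metrics but does not name them. As a minimal sketch, assuming a continuous outcome and three commonly used metrics (MAE, RMSE, and MAPE, which are assumptions here, not the authors' stated choices), a comparison of this kind could look as follows. The model names and all numeric values are hypothetical and serve only as illustration.

```python
import math


def error_metrics(y_true, y_pred):
    """Return MAE, RMSE and MAPE (in percent) for paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE assumes no true value is zero.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


def rank_models(y_true, predictions):
    """Score every model and return names sorted by RMSE, smallest first."""
    scores = {
        name: error_metrics(y_true, y_pred)
        for name, y_pred in predictions.items()
    }
    order = sorted(scores, key=lambda name: scores[name]["RMSE"])
    return order, scores


# Hypothetical observed values and hypothetical predictions (not study data).
y_true = [90.0, 100.0, 95.0, 105.0]
predictions = {
    "model_a": [92.0, 98.0, 95.0, 103.0],
    "model_b": [95.0, 95.0, 100.0, 100.0],
}

order, scores = rank_models(y_true, predictions)
for name in order:
    print(name, scores[name])
```

A model whose errors are smaller on every metric, as reported for the Mach-L methods relative to MLR, would rank first regardless of which metric is used for sorting; ranking by RMSE here is an arbitrary choice.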
Keyphrases
- machine learning
- risk factors
- middle aged
- metabolic syndrome
- type diabetes
- cardiovascular disease
- stem cells
- single cell
- big data
- high resolution
- adipose tissue
- weight loss
- mesenchymal stem cells
- social media
- autism spectrum disorder
- bone marrow
- current status
- intellectual disability
- uric acid
- adverse drug
- low density lipoprotein