Performance Analysis of Conventional Machine Learning Algorithms for Identification of Chronic Kidney Disease in Type 1 Diabetes Mellitus Patients.
Nakib Hayat Chowdhury, Mamun Bin Ibne Reaz, Fahmida Haque, Shamim Ahmad, Sawal Hamid Bin Mohd Ali, Ahmad Ashrif A Bakar, Mohammad Arif Sobhan Bhuiyan
Published in: Diagnostics (Basel, Switzerland) (2021)
Chronic kidney disease (CKD) is one of the severe complications of type 1 diabetes mellitus (T1DM). However, the detection and diagnosis of CKD are often delayed because of its asymptomatic nature. In addition, patients often tend to bypass the traditional urine protein (urinary albumin)-based CKD detection test. Even though disease detection using machine learning (ML) is a well-established field of study, it is rarely used to diagnose CKD in T1DM patients. This research aimed to employ and evaluate several ML algorithms to develop models that quickly predict CKD in patients with T1DM using easily available routine checkup data. This study analyzed 16 years of data from 1375 T1DM patients, obtained from the Epidemiology of Diabetes Interventions and Complications (EDIC) clinical trials directed by the National Institute of Diabetes and Digestive and Kidney Diseases, USA. Three data imputation techniques (RF, KNN, and MICE) and the SMOTETomek resampling technique were used to preprocess the primary dataset. Ten ML algorithms, namely logistic regression (LR), k-nearest neighbor (KNN), Gaussian naïve Bayes (GNB), support vector machine (SVM), stochastic gradient descent (SGD), decision tree (DT), gradient boosting (GB), random forest (RF), extreme gradient boosting (XGB), and light gradient-boosted machine (LightGBM), were applied to develop prediction models. Each model included 19 demographic, medical history, behavioral, and biochemical features, and every feature's effect was ranked using three feature ranking techniques (XGB, RF, and Extra Tree). Lastly, each model's ROC, sensitivity (recall), specificity, accuracy, precision, and F1 score were estimated to find the best-performing model. The RF classifier model exhibited the best performance with 0.96 (±0.01) accuracy, 0.98 (±0.01) sensitivity, and 0.93 (±0.02) specificity. LightGBM performed second best and was quite close to RF with 0.95 (±0.06) accuracy. In addition to these two models, the KNN, SVM, DT, GB, and XGB models also achieved more than 90% accuracy.
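The evaluation metrics named in the abstract (sensitivity, specificity, accuracy, precision, and F1 score) all follow from a binary confusion matrix. The sketch below is a minimal, standard-library-only illustration of those definitions, not the authors' code. The function names, the label encoding (1 = CKD, 0 = no CKD), and the toy labels are hypothetical, and reading the "±" values as a standard deviation across validation folds is an assumption that the abstract does not confirm.

```python
from statistics import mean, stdev


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed: 1 = CKD, 0 = no CKD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Compute the metrics listed in the abstract from true and predicted labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)  # recall: share of CKD cases detected
    specificity = _ratio(tn, tn + fp)  # share of non-CKD cases cleared
    precision = _ratio(tp, tp + fp)    # share of CKD predictions that are right
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def summarize_folds(fold_metrics):
    """Mean and sample standard deviation of each metric across folds.

    Assumption: the "±" figures are treated here as a spread across folds.
    """
    summary = {}
    for name in fold_metrics[0]:
        values = [m[name] for m in fold_metrics]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[name] = (mean(values), spread)
    return summary


if __name__ == "__main__":
    # Toy labels invented for illustration only; these are not EDIC data.
    fold_a = classification_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                                    [1, 1, 0, 0, 0, 1, 1, 0])
    fold_b = classification_metrics([1, 0, 1, 0], [1, 0, 1, 0])
    for name, (avg, sd) in summarize_folds([fold_a, fold_b]).items():
        print(f"{name}: {avg:.2f} (±{sd:.2f})")
```

In a screening setting like the one described, sensitivity matters most for avoiding missed CKD cases, while specificity limits unnecessary follow-up testing, which is why the abstract reports both alongside accuracy.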
Keyphrases
- chronic kidney disease
- end stage renal disease
- machine learning
- clinical trial
- ejection fraction
- newly diagnosed
- deep learning
- type diabetes
- peritoneal dialysis
- healthcare
- prognostic factors
- metabolic syndrome
- climate change
- risk factors
- physical activity
- electronic health record
- glycemic control
- artificial intelligence
- real time pcr
- binding protein
- drug induced
- phase ii