A computational framework of routine test data for the cost-effective chronic disease prediction.
Mingzhu Liu, Jian Zhou, Qilemuge Xi, Yuchao Liang, Haicheng Li, Pengfei Liang, Yuting Guo, Ming Liu, Temuqile Temuqile, Lei Yang, Yongchun Zuo
Published in: Briefings in Bioinformatics (2023)
Chronic diseases, because of their insidious onset and long latent period, have become the major global disease burden. However, current diagnostic methods for chronic diseases, which rely on genetic markers or imaging analysis, are costly and therefore difficult to deploy widely. This study analyzed a large volume of routine blood and biochemical test data from 32 448 patients and developed a novel framework for cost-effective chronic disease prediction with high accuracy (AUC 87.32%). Based on the best-performing algorithm, XGBoost, 20 classification models were further constructed for 17 types of chronic diseases, including 9 types of cancer, 5 types of cardiovascular disease and 3 types of mental illness. The highest model accuracy was 90.13% for cardia cancer, and the lowest was 76.38% for rectal cancer. Model interpretation with the SHAP algorithm showed that CREA, R-CV, GLU and NEUT% might be important indices for identifying most chronic diseases. PDW and R-CV were also found to be crucial indices for distinguishing the three categories of chronic disease (cardiovascular disease, cancer and mental illness). In addition, R-CV has higher specificity for cancer, ALP for cardiovascular disease and GLU for mental illness. Associations among chronic diseases were further revealed. Finally, we built a user-friendly, explainable, machine-learning-based clinical decision support system (DisPioneer: http://bioinfor.imu.edu.cn/dispioneer) to assist in predicting, classifying and treating chronic diseases. This cost-effective approach, which relies on simple blood tests, will benefit more people and motivate clinical implementation and further investigation of chronic disease prevention and surveillance programs.
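The headline metric above is an AUC. As a reading aid, the following minimal sketch shows how an AUC can be computed from predicted risk scores using its rank interpretation: the probability that a randomly chosen patient with the disease receives a higher score than a randomly chosen patient without it. This is not the authors' code; the function name `roc_auc` and the labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = disease present, 0 = absent; scores are model outputs.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; for a cohort of the size reported here, a sort-based rank computation or an established library routine would be the more practical choice.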
Keyphrases
- mental illness
- cardiovascular disease
- machine learning
- mental health
- papillary thyroid
- deep learning
- squamous cell
- clinical decision support
- electronic health record
- rectal cancer
- big data
- lymph node metastasis
- type diabetes
- newly diagnosed
- healthcare
- end stage renal disease
- clinical practice
- artificial intelligence
- primary care
- gene expression
- metabolic syndrome
- dna methylation
- cardiovascular risk factors
- childhood cancer
- genome wide
- radiation therapy
- locally advanced
- high resolution
- young adults