Machine learning algorithm-based risk prediction model of coronary artery disease.

Shaik Mohammad Naushad, Tajamul Hussain, Bobbala Indumathi, Khatoon Samreen, Salman A Alrokayan, Vijay Kumar Kutala

Published in: Molecular Biology Reports (2018)

Given the high mortality associated with coronary artery disease (CAD), an early prediction tool would help reduce the burden of the disease. A database comprising demographic, conventional, and folate/xenobiotic genetic risk factors of 648 subjects (364 CAD cases and 284 healthy controls) was used to develop CAD risk and percentage stenosis prediction models using ensemble machine learning algorithms (EMLA), multifactor dimensionality reduction (MDR) and recursive partitioning (RP). The EMLA model performed better than the other models in both disease prediction (89.3%) and stenosis prediction (82.5%). This model identified hypertension and alcohol intake as the key predictors of CAD risk, followed by cSHMT C1420T, GCPII C1561T, diabetes, GSTT1, CYP1A1 m2, TYMS 5'-UTR 28 bp tandem repeat and MTRR A66G. The MDR and RP models agreed in identifying increasing age, hypertension and cSHMT C1420T as the key determinants that interact to modulate CAD risk. Receiver operating characteristic curves ranked the clinical utility of the developed models in the following order: EMLA (C = 0.96) > RP (C = 0.83) > MDR (C = 0.80). The stenosis prediction model showed that the xenobiotic pathway genetic variants CYP1A1 m2 and GSTT1 are the key determinants of percentage stenosis, with diabetes, diet, alcohol intake, hypertension and MTRR A66G as the other determinants. These eleven variables contribute towards the 82.5% stenosis prediction. In conclusion, the EMLA model exhibited higher predictability for both disease and stenosis. This can be attributed to the higher number of iterations in the EMLA model, which can increase prediction accuracy.
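The C values reported for the three models are concordance statistics, which equal the area under the receiver operating characteristic curve: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below illustrates that definition only. The function name `c_statistic` and the toy labels and scores are hypothetical and are not taken from the study or its data.

```python
from itertools import product


def c_statistic(labels, scores):
    """Concordance (C) statistic, equal to the area under the ROC curve.

    labels: 1 for a case, 0 for a control.
    scores: predicted risk, where a higher value means higher risk.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    concordant = 0.0
    # Compare every case with every control.
    for case, control in product(case_scores, control_scores):
        if case > control:
            concordant += 1.0   # case ranked above control
        elif case == control:
            concordant += 0.5   # ties count as half
    return concordant / (len(case_scores) * len(control_scores))


# Made-up example: three cases and three controls (not study data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(round(c_statistic(labels, scores), 3))
```

In this toy example 8 of the 9 case-control pairs are ranked correctly, so C is 8/9, about 0.889. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.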
