Machine learning approaches for predicting 5-year breast cancer survival: A multicenter study.
Quynh Thi Nhu Nguyen, Phung Anh Alex Nguyen, Chun-Jung Wang, Phan Thanh Phuc, Ruo-Kai Lin, Chin-Sheng Hung, Nei-Hui Kuo, Yu-Wen Cheng, Shwu-Jiuan Lin, Zong-You Hsieh, Chi-Tsun Cheng, Min-Huei Hsu, Jason C Hsu
Published in: Cancer science (2023)
This study used clinical data to develop a prediction model for 5-year breast cancer survival and explored breast cancer prognostic factors using machine learning techniques. We conducted a retrospective study using data from the Taipei Medical University Clinical Research Database, which contains electronic medical records from three affiliated hospitals in Taiwan. The study included female patients aged over 20 years who were diagnosed with primary breast cancer and had medical records at these hospitals between January 1, 2009 and December 31, 2020. The data were divided into training and external testing datasets. Nine machine learning algorithms were applied to develop the models. Model performance was measured using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score. A total of 3914 patients were included in the study. The highest AUC, 0.95, was observed with the artificial neural network model (accuracy, 0.90; sensitivity, 0.71; specificity, 0.73; PPV, 0.28; NPV, 0.94; and F1-score, 0.37). The other models also showed relatively high AUCs, ranging from 0.75 to 0.83. According to the best-performing model, cancer stage, tumor size, age at diagnosis, surgery, and body mass index were the most important factors for predicting breast cancer survival. The study established 5-year survival prediction models for breast cancer and identified key factors that could affect breast cancer survival in Taiwanese women. These results might serve as a reference for clinical practice in breast cancer treatment.
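As an illustration of the evaluation metrics named in the abstract, the following minimal Python sketch computes AUC, accuracy, sensitivity, specificity, PPV, NPV, and F1-score from binary labels and predicted scores. It is not the authors' code: the function names, the 0.5 decision threshold, the convention that label 1 marks the event of interest, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def auc_rank(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute the metrics listed in the abstract.

    Assumption: label 1 is the event of interest, and a score at or above
    `threshold` (hypothetical default 0.5) is predicted as 1.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "auc": auc_rank(y_true, y_score),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "ppv": _ratio(tp, tp + fp),           # precision
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy data for illustration only; these are not values from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

for name, value in classification_metrics(toy_labels, toy_scores).items():
    print(f"{name}: {value:.3f}")
```

Only AUC is threshold-independent; the other six metrics depend on the chosen cutoff, and PPV and NPV additionally depend on how common the event is in the evaluation data.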
Keyphrases
- machine learning
- prognostic factors
- body mass index
- end stage renal disease
- big data
- metabolic syndrome
- minimally invasive
- clinical practice
- ejection fraction
- electronic health record
- neural network
- atrial fibrillation
- deep learning
- coronary artery disease
- peritoneal dialysis
- polycystic ovary syndrome
- skeletal muscle
- mass spectrometry
- high resolution
- insulin resistance
- weight loss
- breast cancer risk