N-glycan fingerprint predicts alpha-fetoprotein negative hepatocellular carcinoma: A large-scale multicenter study.
Chenjun HuangMeng FangHuijuan FengLijuan LiuYa LiXuewen XuHao WangYing WangLin TongLin ZhouChunfang GaoPublished in: International journal of cancer (2021)
Alpha-fetoprotein (AFP)-negative hepatocellular carcinoma (ANHCC) patients account for more than 30% of the whole entity of HCC patients and are easily misdiagnosed. This three-phase study was designed to find and validate new ANHCC N-glycan markers which identified from The Cancer Genome Atlas (TCGA) database and noninvasive detection. Differentially expressed genes (DEGs) of N-glycan biosynthesis and degradation related genes were screened from TCGA database. Serum N-glycan structure abundances were analyzed using N-glycan fingerprint (NGFP) technology. Totally 1340 participants including ANHCC, chronic liver diseases and healthy controls were enrolled after propensity score matching (PSM). The Lasso algorithm was used to select the most significant N-glycan structures abundances. Three machine learning models [random forest (RF), support vector machine (SVM) and logistic regression (LR)] were used to construct the diagnostic algorithms. All 13N-glycan structure abundances analyzed by NGFP demonstrated significant and was enrolled by Lasso. Among the three machine learning models, LR algorithm demonstrated the best diagnostic performance for identifying ANHCC in training cohort (LR: AUC: 0.842, 95%CI: 0.784-0.899; RF: AUC: 0.825, 95%CI: 0.766-0.885; SVM: AUC: 0.610, 95%CI: 0.527-0.684). This LR algorithm achieved a high diagnostic performance again in the independent validation (AUC: 0.860, 95%CI: 0.824-0.897). Furthermore, the LR algorithm could stratify ANHCC into two distinct subgroups with high or low risks of overall survival and recurrence in follow-up validation. In conclusion, the biomarker panel consisting of 13N-glycan structures abundances using the best-performing algorithm (LR) was defined and indicative as an effective tool for HCC prediction and prognosis estimate in AFP negative subjects.
Keyphrases
- machine learning
- deep learning
- cell surface
- end stage renal disease
- artificial intelligence
- newly diagnosed
- ejection fraction
- chronic kidney disease
- high resolution
- clinical trial
- gene expression
- randomized controlled trial
- climate change
- emergency department
- patient reported outcomes
- mass spectrometry
- risk assessment
- study protocol
- human health
- patient reported