Serum Protein Fishing for Machine Learning-Boosted Diagnostic Classification of Small Nodules of Lung.
Mengjie Wang, Xin Dai, Xu Yang, Baichuan Jin, Yueli Xie, Chenlu Xu, Qiqi Liu, Lichao Wang, Lisha Ying, Weishan Lu, Qixun Chen, Ting Fu, Dan Su, Yuan Liu, Weihong Tan
Published in: ACS Nano (2024)
Distinguishing benign from malignant small lung nodules remains an unmet clinical need, and misclassification leads to serious false-positive diagnoses and overtreatment. Here, we developed a serum protein fishing-based spectral library (ProteoFish) for data-independent acquisition analysis and a machine learning-boosted protein panel for the diagnosis of early non-small cell lung cancer (NSCLC) and the classification of benign and malignant small nodules. We established an extensive NSCLC protein bank comprising 297 clinical subjects. After testing five feature extraction algorithms and six machine learning models, we chose the Lasso algorithm to select a 15-key protein panel and Random Forest for diagnostic classification. Our random forest classifier achieved 91.38% accuracy in diagnosing benign and malignant small nodules, which is superior to existing clinical assays. Integrated with machine learning, the 15-key protein panel may provide insights into multiplexed protein biomarker fishing from serum for facile cancer screening and help tackle the current clinical challenge of prospective diagnostic classification of small lung nodules.
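
As a rough illustration of the two-stage workflow described in the abstract (Lasso to pick a 15-feature panel, then a random forest for classification), the following is a minimal sketch using scikit-learn. It is not the authors' pipeline: the data are synthetic, the 200-feature matrix, the `alpha` value, the train/test split, and the forest size are all assumptions chosen for illustration, and only the cohort size (297) and the panel size (15) come from the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a serum protein intensity matrix (NOT the study data).
# 297 samples mirrors the cohort size; 200 candidate proteins is an assumption.
X, y = make_classification(
    n_samples=297,
    n_features=200,
    n_informative=15,
    n_redundant=0,
    random_state=0,
)

# Hold out a test set; the 80/20 split is an illustrative choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Stage 1: Lasso ranks features by absolute coefficient. With
# threshold=-np.inf, SelectFromModel keeps exactly the top max_features.
panel_selector = SelectFromModel(
    Lasso(alpha=0.01, max_iter=10_000),  # alpha is a hypothetical setting
    max_features=15,
    threshold=-np.inf,
)

# Stage 2: a random forest classifies samples using only the selected panel.
pipeline = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("select", panel_selector),
        ("forest", RandomForestClassifier(n_estimators=500, random_state=0)),
    ]
)
pipeline.fit(X_train, y_train)

# Column indices of the selected panel and held-out accuracy.
selected = pipeline.named_steps["select"].get_support(indices=True)
accuracy = pipeline.score(X_test, y_test)

print("Selected feature indices:", selected.tolist())
print(f"Held-out accuracy on synthetic data: {accuracy:.4f}")
```

Wrapping the selector and the classifier in one `Pipeline` means the panel is chosen from the training split only, which avoids leaking test-set information into feature selection. The accuracy printed here reflects the synthetic data and says nothing about the reported 91.38%.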