A platform-independent AI tumor lineage and site (ATLAS) classifier.
Nicholas R Rydzewski, Yue Shi, Chenxuan Li, Matthew R Chrostek, Hamza Bakhtiar, Kyle T Helzer, Matthew L Bootsma, Tracy J Berg, Paul M Harari, John M Floberg, Grace C Blitzer, David Kosoff, Amy K Taylor, Marina N Sharifi, Menggang Yu, Joshua M Lang, Krishnan R Patel, Deborah E Citrin, Kaitlin E Sundling, Shuang G Zhao
Published in: Communications Biology (2024)
Histopathologic diagnosis and classification of cancer play a critical role in guiding treatment. Advances in next-generation sequencing have ushered in new complementary molecular frameworks. However, existing approaches do not independently assess both site-of-origin (e.g., prostate) and lineage (e.g., adenocarcinoma), and they have minimal validation in metastatic disease, where classification is more difficult. Using gradient-boosted machine learning, we developed ATLAS, a pair of separate AI Tumor Lineage and Site-of-origin models trained on RNA expression data from 8249 tumor samples. We assessed performance independently in 10,376 total tumor samples, including 1490 metastatic samples, achieving an accuracy of 91.4% for cancer site-of-origin and 97.1% for cancer lineage. High-confidence predictions (encompassing the majority of cases) were accurate 98-99% of the time in both localized and, notably, metastatic samples. We also identified emergent properties of our lineage scores for tumor types on which the model was never trained (zero-shot learning). Adenocarcinoma/sarcoma lineage scores differentiated epithelioid from biphasic/sarcomatoid mesothelioma. In addition, predicted lineage de-differentiation identified neuroendocrine/small cell tumors and was associated with poor outcomes across tumor types. Our platform-independent, single-sample approach can be easily translated to existing RNA-seq platforms. ATLAS can complement and guide traditional histopathologic assessment in challenging situations and in tumors of unknown primary.
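The abstract describes two separate models whose outputs are read independently, with a subset of predictions treated as high confidence. The sketch below illustrates only that general idea. It is not the authors' implementation: the function names, the example class labels and scores, and the 0.9 cutoff are all hypothetical, and the abstract does not say how high confidence is actually defined.

```python
from typing import Dict, Tuple

# Hypothetical cutoff; the abstract does not state how "high confidence" is defined.
HIGH_CONFIDENCE_CUTOFF = 0.9


def top_call(scores: Dict[str, float],
             cutoff: float = HIGH_CONFIDENCE_CUTOFF) -> Tuple[str, float, bool]:
    """Return (label, score, is_high_confidence) for one model's class scores."""
    label = max(scores, key=scores.get)  # class with the largest score
    score = scores[label]
    return label, score, score >= cutoff


def classify_sample(site_scores: Dict[str, float],
                    lineage_scores: Dict[str, float],
                    cutoff: float = HIGH_CONFIDENCE_CUTOFF) -> Dict[str, object]:
    """Read the site and lineage models independently for a single sample."""
    site, _, site_hc = top_call(site_scores, cutoff)
    lineage, _, lineage_hc = top_call(lineage_scores, cutoff)
    return {
        "site": site,
        "site_high_confidence": site_hc,
        "lineage": lineage,
        "lineage_high_confidence": lineage_hc,
    }


# Made-up scores for one sample, for illustration only.
example = classify_sample(
    {"prostate": 0.95, "bladder": 0.03, "lung": 0.02},
    {"adenocarcinoma": 0.62, "squamous": 0.30, "sarcoma": 0.08},
)
print(example)
```

Keeping the two calls separate means a sample can receive a confident site-of-origin call alongside an uncertain lineage call, which is the kind of case where histopathologic review would still be needed.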
Keyphrases
- single cell
- rna seq
- machine learning
- squamous cell carcinoma
- high throughput
- small cell lung cancer
- papillary thyroid
- artificial intelligence
- deep learning
- prostate cancer
- type diabetes
- big data
- squamous cell
- radiation therapy
- mesenchymal stem cells
- young adults
- childhood cancer
- electronic health record
- dna methylation
- lymph node metastasis
- body composition
- cell free
- neural network
- circulating tumor