Minimalist approaches to cancer tissue-of-origin classification by DNA methylation.
Daniel Xia, Alberto Jose Leon, Michael Cabanero, Trevor John Pugh, Ming Sound Tsao, Prisni Rath, Lillian Lai-Yun Siu, Celeste Yu, Philippe Lucien Bedard, Frances Alice Shepherd, Gelareh Zadeh, Runjan Chetty, Kenneth Aldape
Published in: Modern Pathology: an official journal of the United States and Canadian Academy of Pathology, Inc (2020)
Classification of cancers by tissue-of-origin is fundamental to diagnostic pathology. While the combination of clinical data, tissue histology, and immunohistochemistry is usually sufficient, there remains a small but not insignificant proportion of difficult-to-classify cases. These challenging cases provide justification for ancillary molecular testing, including high-throughput DNA methylation array profiling, which promises cell-of-origin information and compatibility with formalin-fixed specimens. While diagnostically powerful, methylation profiling platforms are costly and technically challenging to implement, particularly for less well-resourced laboratories. To address this, we simulated the performance of "minimalist" methylation-based tests for cancer classification using publicly available and internal institutional profiling data. These analyses showed that small, focused sets of the most informative CpG biomarkers from the arrays are sufficient for accurate diagnoses. As an illustrative example, one classifier, using information from just 53 out of about 450,000 available CpG probes, achieved an accuracy of 94.5% on 2575 fresh primary validation cases across 28 cancer types from The Cancer Genome Atlas Network. Minimalist classifiers trained on formalin-fixed primary and metastatic cases also achieved generally high accuracies on additional datasets. These results support the potential of minimalist methylation testing, possibly via quantitative PCR and targeted next-generation sequencing platforms, in cancer classification.
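The idea of a "minimalist" classifier (rank all probes by how informative they are, keep only a handful, classify on that subset) can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not state how probes were ranked or which classifier was used. The sketch assumes synthetic beta values, ranks probes by the variance of their class means, and uses a nearest-centroid rule. All function names, class labels, and parameters are hypothetical.

```python
import random
from statistics import mean, pvariance

def simulate(n_per_class, n_probes, informative, rng):
    """Synthetic methylation beta values in [0, 1] (an assumption, not real data).
    `informative` maps each class label to the probes hypermethylated in it."""
    X, y = [], []
    for label, probes in informative.items():
        for _ in range(n_per_class):
            row = [min(1.0, max(0.0, rng.gauss(0.8 if j in probes else 0.2, 0.1)))
                   for j in range(n_probes)]
            X.append(row)
            y.append(label)
    return X, y

def class_centroids(X, y, probes):
    """Mean beta value per class, restricted to the given probe indices."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in probes]
    return centroids

def select_probes(X, y, k):
    """Rank probes by the variance of their class means; keep the top k."""
    n_probes = len(X[0])
    cents = class_centroids(X, y, range(n_probes))
    scores = [pvariance([cents[c][j] for c in cents]) for j in range(n_probes)]
    return sorted(range(n_probes), key=lambda j: scores[j], reverse=True)[:k]

def predict(x, centroids, probes):
    """Assign the class whose centroid is nearest on the selected probes."""
    def sq_dist(c):
        return sum((x[j] - v) ** 2 for j, v in zip(probes, centroids[c]))
    return min(centroids, key=sq_dist)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
informative = {"A": {0, 1}, "B": {2, 3}, "C": {4, 5}}  # hypothetical classes
X_train, y_train = simulate(30, 200, informative, rng)
X_test, y_test = simulate(20, 200, informative, rng)

probes = select_probes(X_train, y_train, k=6)  # 6 of 200 probes retained
centroids = class_centroids(X_train, y_train, probes)
hits = sum(predict(x, centroids, probes) == lab for x, lab in zip(X_test, y_test))
accuracy = hits / len(y_test)
print(sorted(probes), accuracy)
```

On real array data, the ranking statistic, the number of probes retained, and the classifier would all need to be chosen and validated on held-out cases. The toy data here is deliberately well separated.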
Keyphrases
- DNA methylation
- papillary thyroid
- single cell
- high throughput
- squamous cell
- genome wide
- machine learning
- deep learning
- gene expression
- high resolution
- stem cells
- healthcare
- squamous cell carcinoma
- childhood cancer
- small molecule
- RNA-seq
- photodynamic therapy
- drug delivery
- electronic health record
- big data
- risk assessment
- copy number
- social media
- climate change
- cell free