Digital profiling of cancer transcriptomes from histology images with grouped vision attention.
Yuanning Zheng, Marija Pizurica, Francisco Carrillo-Perez, Humaira Noor, Wei Yao, Christian Wohlfart, Kathleen Marchal, Antoaneta Vladimirova, Olivier Gevaert
Published in: bioRxiv: the preprint server for biology (2023)
Cancer is a heterogeneous disease that demands precise molecular profiling for better understanding and management. RNA sequencing has emerged as a powerful tool for unraveling transcriptional heterogeneity. However, large-scale characterization of cancer transcriptomes is hindered by cost and limited tissue accessibility. Here, we develop SEQUOIA, a deep learning model that uses a transformer architecture to predict cancer transcriptomes from whole-slide histology images. We pre-train the model on data from 2,242 normal tissues, then fine-tune and evaluate it on 4,218 tumor samples across nine cancer types. The results are further validated in two independent cohorts comprising 1,305 tumors. The highest performance was observed in breast, kidney and lung cancers, where SEQUOIA accurately predicted 13,798, 10,922 and 9,735 genes, respectively. The well-predicted genes are associated with the regulation of the inflammatory response, the cell cycle and hypoxia-related metabolic pathways. Leveraging these well-predicted genes, we develop a digital signature that predicts the risk of recurrence in breast cancer. Although the model is trained at the tissue level, we showcase its potential for predicting spatial gene expression patterns using spatial transcriptomics datasets. SEQUOIA deciphers clinically relevant gene expression patterns from histology images, opening avenues for improved cancer management and personalized therapies.
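To make the general idea of "grouped vision attention over a whole-slide image" more concrete, here is a minimal, illustrative sketch in plain Python. It is not SEQUOIA's implementation, and every name in it (`group_mean`, `self_attention`, `predict_expression`) is hypothetical. It assumes that each slide has already been cut into tiles and that each tile is represented by a feature vector, for example from a pretrained image encoder. It also assumes a simple grouping step: consecutive tiles are averaged into a small number of group tokens. The tokens then pass through a single-head scaled dot-product self-attention without learned projections, are mean-pooled, and are mapped by a linear head to one predicted value per gene. The published model's grouping, depth, heads and training procedure may all differ.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def group_mean(tiles: Matrix, n_groups: int) -> Matrix:
    """Average consecutive tile features into at most n_groups group tokens.

    Hypothetical grouping rule, chosen only for illustration.
    """
    size = math.ceil(len(tiles) / n_groups)
    groups = []
    for start in range(0, len(tiles), size):
        chunk = tiles[start:start + size]
        groups.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return groups


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention with Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output token is an attention-weighted average of all tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def predict_expression(tiles: Matrix, weights: Matrix, bias: List[float],
                       n_groups: int = 4) -> List[float]:
    """Tiles -> group tokens -> attention -> mean pool -> linear head (one value per gene)."""
    tokens = self_attention(group_mean(tiles, n_groups))
    pooled = [sum(col) / len(tokens) for col in zip(*tokens)]
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, bias)]


if __name__ == "__main__":
    random.seed(0)
    d, n_tiles, n_genes = 8, 50, 5  # toy sizes, not the paper's
    tiles = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_tiles)]
    W = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_genes)]
    b = [0.0] * n_genes
    print(len(predict_expression(tiles, W, b)))  # one prediction per gene
```

Grouping tiles before attention keeps the quadratic attention cost manageable, because a whole slide can contain thousands of tiles. In practice the projections and the gene-level head would be learned by regressing against bulk RNA-seq profiles. Here they are random weights used only to show the data flow.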