A Bayesian framework to study tumor subclone-specific expression by combining bulk DNA and single-cell RNA sequencing data.
Yi Qiao, Xiaomeng Huang, Philip J Moos, Jonathan M Ahmann, Anthony D Pomicter, Michael W Deininger, John C Byrd, Jennifer A Woyach, Deborah M Stephens, Gabor T Marth
Published in: Genome Research (2024)
Genetic and gene expression heterogeneity is an essential hallmark of many tumors, allowing the cancer to evolve and to develop resistance to treatment. Currently, the most commonly used data types for studying genetic and transcriptomic heterogeneity are, respectively, bulk tumor/normal whole-genome or whole-exome sequencing (WGS, WES) and single-cell RNA sequencing (scRNA-seq). However, tools are lacking to link genomic tumor subclonality with transcriptomic heterogeneity by integrating genomic and single-cell transcriptomic data collected from the same tumor. To address this gap, we developed scBayes, a Bayesian probabilistic framework that uses the tumor subclonal structure inferred from bulk DNA sequencing data to determine the subclonal identity of individual cells from their single-cell gene expression (scRNA-seq) measurements. Grouping together cells that represent the same genetically defined tumor subclone allows comparison of gene expression across subclones, or investigation of gene expression changes within the same subclone across time (e.g., progression, treatment response, or relapse) or space (e.g., at multiple metastatic sites and organs). We used simulated data sets, in silico synthetic data sets, and biological data sets generated from cancer samples to extensively characterize and validate the performance of our method and to show improvements over existing methods. We demonstrate the validity and utility of our approach by applying it to published data sets and recapitulating their findings, and we arrive at novel insights into cancer subclonal expression behavior in our own data sets. We further show that our method is applicable to a wide range of single-cell sequencing technologies, including single-cell DNA sequencing as well as the Smart-seq and 10x Genomics scRNA-seq protocols.
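The abstract does not spell out the scBayes likelihood, so the following is only a minimal, hypothetical sketch of the general idea of Bayesian subclone assignment, not the authors' model. It assumes that bulk DNA analysis has already produced (a) the set of somatic variant sites carried by each subclone and (b) each subclone's prevalence, used here as the prior. For one cell, alternate-allele read counts at each covered site are scored with a binomial likelihood: an assumed alternate-read rate of 0.5 if the subclone carries the variant and an assumed error rate of 0.01 if it does not. The names `assign_cell`, `p_alt_present`, and `p_alt_absent` and both rates are illustrative assumptions; real scRNA-seq data would also need to account for allelic imbalance and dropout.

```python
import math
from typing import Dict, Set, Tuple


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of k successes in n trials (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def assign_cell(
    cell_reads: Dict[str, Tuple[int, int]],  # site -> (ref reads, alt reads) in one cell
    subclones: Dict[str, Set[str]],          # subclone -> variant sites it carries (from bulk DNA)
    prevalence: Dict[str, float],            # subclone -> prior weight (from bulk DNA)
    p_alt_present: float = 0.5,              # hypothetical alt-read rate if variant is carried
    p_alt_absent: float = 0.01,              # hypothetical error rate if variant is absent
) -> Dict[str, float]:
    """Return a posterior probability over subclones for a single cell (illustrative only)."""
    log_post = {}
    for name, variants in subclones.items():
        lp = math.log(prevalence[name])  # log prior
        for site, (ref, alt) in cell_reads.items():
            p = p_alt_present if site in variants else p_alt_absent
            lp += log_binom_pmf(alt, ref + alt, p)  # add log-likelihood for this site
        log_post[name] = lp
    # Normalize in log space for numerical stability (log-sum-exp).
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {name: math.exp(v - m) / z for name, v in log_post.items()}


if __name__ == "__main__":
    subclones = {"clone_A": {"chr1:100"}, "clone_B": {"chr1:100", "chr2:200"}}
    prevalence = {"clone_A": 0.6, "clone_B": 0.4}
    cell = {"chr1:100": (3, 4), "chr2:200": (2, 5)}  # alt reads seen at the clone_B-only site
    print(assign_cell(cell, subclones, prevalence))
```

In this toy example the cell shows alternate reads at the site carried only by `clone_B`, so its posterior moves almost entirely to `clone_B` despite the lower prior. A cell with no informative reads keeps the prior unchanged, which is one reason coverage at subclone-defining sites matters for this kind of assignment.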