Exploiting allele-specific transcriptional effects of subclonal copy number alterations for genotype-phenotype mapping in cancer cell populations.
Hongyu Shi, Marc J Williams, Gryte Satas, Adam C Weiner, Andrew W McPherson, Sohrab P Shah
Published in: bioRxiv: the preprint server for biology (2023)
Somatic copy number alterations drive aberrant gene expression in cancer cells. In tumors with high levels of chromosomal instability, subclonal copy number alterations (CNAs) are a prevalent feature that often results in heterogeneous cancer cell populations with distinct phenotypes [1]. However, the extent to which subclonal CNAs contribute to clone-specific phenotypes remains poorly understood, in part because methods to quantify how CNAs influence gene expression at the subclone level are lacking. We developed TreeAlign, which computationally integrates independently sampled single-cell DNA and RNA sequencing data from the same cell population and explicitly models gene dosage effects from subclonal alterations. Using quantitative benchmarking data and human cancer data with single-cell DNA and RNA libraries, we show that TreeAlign accurately encodes both the clone-specific transcriptional effects of subclonal CNAs and the impact of allelic imbalance on allele-specific transcription, and that it obviates the need to arbitrarily define genotypic clones from a phylogenetic tree a priori. Combined, these advances yield highly granular definitions of clones with distinct copy-number-driven expression programs, with increased resolution and accuracy over competing methods. The resulting improvement in the assignment of transcriptional phenotypes to genomic clones enables clone-to-clone gene expression comparisons, explicit inference of genes that are mechanistically altered through CNAs, and identification of expression programs that are genomically independent. Our approach sets the stage for dissecting the relative contributions of fixed genomic alterations and dynamic epigenetic processes to gene expression programs in cancer.
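To make the idea of a gene dosage effect concrete, the following is a minimal sketch of assigning single-cell RNA profiles to DNA-defined clones. It is not TreeAlign's model, which the abstract describes only at a high level; it swaps in a plain Pearson-correlation heuristic under the assumption that expression tends to rise with copy number. The function names, clone labels, and toy values are all hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def assign_cells_to_clones(expression, clone_copy_number):
    """Assign each RNA cell to the clone whose copy number profile it tracks best.

    expression: dict of cell id -> normalized expression per gene (hypothetical input)
    clone_copy_number: dict of clone id -> copy number per gene, same gene order
    """
    assignments = {}
    for cell, expr in expression.items():
        # Score every clone by how well its dosage profile explains this cell.
        scores = {
            clone: pearson(expr, cn) for clone, cn in clone_copy_number.items()
        }
        assignments[cell] = max(scores, key=scores.get)
    return assignments


# Toy data: two clones that differ in copy number across five genes.
clone_copy_number = {
    "clone_A": [2, 2, 4, 4, 1],
    "clone_B": [2, 4, 2, 1, 2],
}
expression = {
    "cell_1": [1.0, 1.1, 2.1, 1.9, 0.5],
    "cell_2": [1.0, 2.0, 0.9, 0.6, 1.1],
}

assignments = assign_cells_to_clones(expression, clone_copy_number)
print(assignments)
```

A correlation heuristic of this kind treats every gene as dosage-sensitive and requires clones to be fixed in advance, which are the two limitations the abstract says TreeAlign is designed to address.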
Keyphrases
- copy number
- gene expression
- single cell
- DNA methylation
- mitochondrial DNA
- genome wide
- RNA-seq
- poor prognosis
- public health
- electronic health record
- papillary thyroid
- high throughput
- single molecule
- big data
- transcription factor
- high resolution
- circulating tumor
- cell free
- squamous cell
- endothelial cells
- machine learning
- stem cells
- squamous cell carcinoma
- binding protein
- mass spectrometry
- long non-coding RNA
- deep learning
- nucleic acid
- mesenchymal stem cells
- circulating tumor cells