Canopy2: tumor phylogeny inference by bulk DNA and single-cell RNA sequencing.
Ann Marie K Weideman, Rujin Wang, Joseph G Ibrahim, Yuchao Jiang
Published in: bioRxiv: the preprint server for biology (2024)
Tumors are composed of a mixture of distinct cell populations that differ in genetic makeup and function. Such heterogeneity plays a role in the development of drug resistance and the ineffectiveness of targeted cancer therapies. Insight into this complexity can be obtained by constructing a phylogenetic tree, which illustrates the evolutionary lineage of tumor cells as they acquire mutations over time. We propose Canopy2, a Bayesian framework that uses single nucleotide variants derived from bulk DNA and single-cell RNA sequencing to infer tumor phylogeny and conduct mutational profiling of tumor subpopulations. Canopy2 uses Markov chain Monte Carlo methods to sample from a joint probability distribution involving a mixture of binomial and beta-binomial distributions, specifically chosen to account for the sparsity and stochasticity of the single-cell data. Canopy2 distinguishes the sources of zeros in the single-cell data, separating them into non-cancerous (cells without mutations), stochastic (mutations not expressed due to bursting), and technical (expressed mutations not picked up by sequencing). Simulations demonstrate that Canopy2 consistently outperforms competing methods and reconstructs the clonal tree with high fidelity, even in situations involving low sequencing depth, poor single-cell yield, and highly advanced and polyclonal tumors. We further assess the performance of Canopy2 through application to breast cancer and glioblastoma data, benchmarking against existing methods. Canopy2 is an open-source R package available at https://github.com/annweideman/canopy2.
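The three kinds of zeros named in the abstract can be illustrated with a small simulation. The sketch below is not Canopy2's model or its R API. It is a toy generative process, written in Python for illustration, in which a cell either lacks the mutation, carries it but is in a transcriptionally "off" state, or expresses it and has its alternate reads drawn from a beta-binomial. The function names, the `burst_on_prob` parameter, and the `alpha`/`beta` shape values are all assumptions made for this example.

```python
import random
from collections import Counter


def simulate_alt_reads(carries_mutation, depth, rng,
                       burst_on_prob=0.5, alpha=2.0, beta=2.0):
    """Return (alt_read_count, label) for one cell at one mutated site.

    Hypothetical toy model, not the Canopy2 likelihood:
      - cells without the mutation always give 0 ("non-cancerous" zero)
      - mutated cells express the allele with probability burst_on_prob;
        otherwise the count is 0 ("stochastic" zero)
      - expressing cells draw alt reads from a beta-binomial; a draw of 0
        is a "technical" zero (expressed but missed by sequencing)
    """
    if not carries_mutation:
        return 0, "non-cancerous"
    if rng.random() >= burst_on_prob:
        return 0, "stochastic"
    # Beta-binomial: success probability p ~ Beta(alpha, beta),
    # then alt reads ~ Binomial(depth, p), drawn as a sum of Bernoullis.
    p = rng.betavariate(alpha, beta)
    alt = sum(rng.random() < p for _ in range(depth))
    if alt == 0:
        return 0, "technical"
    return alt, "observed"


def zero_breakdown(n_cells, mutated_fraction, depth, seed=0):
    """Count how many simulated cells fall into each category."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_cells):
        carries = rng.random() < mutated_fraction
        _, label = simulate_alt_reads(carries, depth, rng)
        counts[label] += 1
    return counts


if __name__ == "__main__":
    # Low depth makes technical zeros more common.
    print(zero_breakdown(n_cells=1000, mutated_fraction=0.6, depth=3))
```

In the observed data all three categories look identical (a count of zero), which is why a model has to attribute them probabilistically rather than read them off directly. Here the labels are known only because the data are simulated.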
Keyphrases
- single cell
- rna seq
- high throughput
- monte carlo
- electronic health record
- big data
- circulating tumor
- induced apoptosis
- copy number
- stem cells
- genome wide
- single molecule
- machine learning
- dna methylation
- drug delivery
- mesenchymal stem cells
- deep learning
- papillary thyroid
- artificial intelligence
- cell proliferation
- cell cycle arrest
- data analysis
- cancer therapy
- signaling pathway
- oxidative stress
- PI3K/Akt