Reconstructing phylogenetic trees from genome-wide somatic mutations in clonal samples.
Tim H H Coorens, Michael Spencer Chapman, Nicholas Williams, Iñigo Martincorena, Michael R Stratton, Jyoti Nangalia, Peter J Campbell
Published in: Nature Protocols (2024)
Phylogenetic trees are a powerful means to display the evolutionary history of species, pathogens and, more recently, individual cells of the human body. Whole-genome sequencing of laser capture microdissections or expanded stem cells has allowed the discovery of somatic mutations in clones, which can be used as natural barcodes to reconstruct the developmental history of individual cells. Here we describe Sequoia, our pipeline to reconstruct lineage trees from clones of normal cells. Candidate somatic mutations are called against the human reference genome and filtered to exclude germline mutations and artifactual variants. These filtered somatic mutations form the basis for phylogeny reconstruction using a maximum parsimony framework. Lastly, we use a maximum likelihood framework to explicitly map mutations to branches in the phylogenetic tree. The resulting phylogenies can then serve as a basis for many subsequent analyses, including investigating embryonic development, tissue dynamics in health and disease, and mutational signatures. Sequoia can be readily applied to any clonal somatic mutation dataset, including single-cell DNA sequencing datasets, using the commands and scripts provided. Moreover, Sequoia is highly flexible and can be easily customized. Typically, the runtime of the core script ranges from minutes to an hour for datasets with a moderate number (50,000-150,000) of variants. Competent bioinformatic skills, including in-depth knowledge of the R programming language, are required. A high-performance computing cluster (one that is capable of running mutation-calling algorithms and other aspects of the analysis at scale) is also required, especially if handling large datasets.
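The last step described above, which maps each mutation to a branch by maximum likelihood, can be illustrated with a minimal sketch. This is not Sequoia's implementation. It assumes the following:

- The tree is fixed, and each branch is described by the set of samples below it.
- Carrier samples show a heterozygous variant at an expected allele fraction of 0.5.
- Non-carriers show the variant only through a flat sequencing-error rate of 0.01.
- Read counts are independent binomial draws, with no overdispersion.

The function names (`log_binom_pmf`, `branch_log_likelihood`, `assign_mutation`), the toy tree, and the read counts are all hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of observing k successes."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def branch_log_likelihood(alt, depth, carriers, mut_vaf=0.5, err=0.01):
    """Log-likelihood of all samples' read counts if exactly `carriers` carry the mutation.

    Assumption (hypothetical model): carriers show the variant at a heterozygous
    allele fraction (mut_vaf); non-carriers show it only through sequencing error (err).
    """
    total = 0.0
    for sample, k in alt.items():
        p = mut_vaf if sample in carriers else err
        total += log_binom_pmf(k, depth[sample], p)
    return total


def assign_mutation(alt, depth, branches, mut_vaf=0.5, err=0.01):
    """Place one mutation on the branch that maximizes its likelihood.

    `branches` maps a branch name to the set of samples below that branch.
    Returns the best branch name and the per-branch log-likelihoods.
    """
    scores = {
        name: branch_log_likelihood(alt, depth, carriers, mut_vaf, err)
        for name, carriers in branches.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical tree ((A,B),(C,D)): every terminal branch, both clade branches,
# and the trunk shared by all four samples.
BRANCHES = {
    "A": {"A"}, "B": {"B"}, "C": {"C"}, "D": {"D"},
    "AB": {"A", "B"}, "CD": {"C", "D"},
    "ABCD": {"A", "B", "C", "D"},
}

if __name__ == "__main__":
    # Hypothetical read counts for one mutation at 20x depth in each sample.
    alt = {"A": 9, "B": 11, "C": 0, "D": 1}
    depth = {"A": 20, "B": 20, "C": 20, "D": 20}
    best, scores = assign_mutation(alt, depth, BRANCHES)
    print(best)  # AB: the variant is supported in A and B only
```

Every branch receives a score, so the sketch also shows how confidently a mutation is placed. A small gap between the best and second-best log-likelihood flags an ambiguous assignment, for example at low coverage.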