CoRAL accurately resolves extrachromosomal DNA genome structures with long-read sequencing.
Kaiyuan Zhu, Matthew Gregory Jones, Jens Luebeck, Xinxin Bu, Hyerim Yi, King L Hung, Ivy Tsz-Lo Wong, Shu Zhang, Paul S Mischel, Howard Chang, Vineet Bafna
Published in: Genome research (2024)
Extrachromosomal DNA (ecDNA) is a central mechanism for focal oncogene amplification in cancer, occurring in approximately 15% of early-stage cancers and 30% of late-stage cancers. EcDNAs drive tumor formation, evolution, and drug resistance by dynamically modulating oncogene copy number and rewiring gene-regulatory networks. Elucidating the genomic architecture of ecDNA amplifications is critical for understanding tumor pathology and developing more effective therapies. Paired-end short-read (Illumina) sequencing and mapping have been used to represent ecDNA amplifications as a breakpoint graph, in which the inferred architecture of an ecDNA is encoded as a cycle. Traversals of the breakpoint graph have been used to successfully predict ecDNA presence in cancer samples. However, short-read technologies are intrinsically limited in identifying breakpoints, phasing together complex rearrangements and internal duplications, and deconvolving cell-to-cell heterogeneity of ecDNA structures. Long-read technologies, such as those from Oxford Nanopore Technologies, have the potential to improve inference because longer reads are better at mapping structural variants and are more likely to span rearranged or duplicated regions. Here, we propose CoRAL (Complete Reconstruction of Amplifications with Long reads) for reconstructing ecDNA architectures using long-read data. CoRAL reconstructs likely cyclic architectures using quadratic programming that simultaneously optimizes parsimony of reconstruction, explained copy number, and consistency of long-read mapping. CoRAL substantially improves reconstructions in extensive simulations and 10 datasets from previously characterized cell lines as compared to previous short- and long-read-based tools. As long-read usage becomes widespread, we anticipate that CoRAL will be a valuable tool for profiling the landscape and evolution of focal amplifications in tumors.
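To make the breakpoint-graph idea concrete, the following is a minimal Python sketch of a toy graph in which a candidate ecDNA is a cyclic walk over oriented segments. It is not CoRAL's implementation or its quadratic program. The segment names, copy numbers, edge encoding, and the helper functions `is_valid_cycle` and `explained_fraction` are all hypothetical, chosen only to illustrate what "encoded as a cycle" and "explained copy number" can mean.

```python
from collections import Counter

# Hypothetical toy data (not from the paper): segment -> estimated copy number.
segment_cn = {"A": 20.0, "B": 40.0, "C": 20.0}

# Assumed encoding: a breakpoint edge joins two segment ends, "L" or "R".
edges = {
    frozenset({("A", "R"), ("B", "L")}),
    frozenset({("B", "R"), ("C", "L")}),
    frozenset({("C", "R"), ("B", "R")}),  # inverted join into B
    frozenset({("B", "L"), ("A", "L")}),  # closes the cycle
}

# Candidate cyclic walk of (segment, orientation); the last entry joins back
# to the first. Segment B is traversed twice, once in each orientation.
cycle = [("A", "+"), ("B", "+"), ("C", "+"), ("B", "-")]


def is_valid_cycle(walk, graph_edges):
    """Check that every consecutive pair in the walk, including the
    wrap-around pair, is joined by an edge of the graph."""
    for i, (seg, orient) in enumerate(walk):
        nxt_seg, nxt_orient = walk[(i + 1) % len(walk)]
        exit_end = "R" if orient == "+" else "L"       # end we leave from
        entry_end = "L" if nxt_orient == "+" else "R"  # end we arrive at
        if frozenset({(seg, exit_end), (nxt_seg, entry_end)}) not in graph_edges:
            return False
    return True


def explained_fraction(walk, copies, cn):
    """Fraction of total copy number (not weighted by segment length)
    accounted for by `copies` copies of the walk."""
    mult = Counter(seg for seg, _ in walk)
    explained = sum(min(cn[s], copies * mult.get(s, 0)) for s in cn)
    return explained / sum(cn.values())


if __name__ == "__main__":
    print(is_valid_cycle(cycle, edges))
    print(explained_fraction(cycle, 20, segment_cn))
```

In this toy case, segment B has twice the copy number of A and C, which is consistent with a single cycle that passes through B twice. Duplications of this kind are what the abstract says short reads struggle to phase and long reads are more likely to span.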
Keyphrases
- copy number
- single molecule
- single cell
- mitochondrial dna
- high resolution
- rna seq
- genome wide
- early stage
- papillary thyroid
- convolutional neural network
- signaling pathway
- squamous cell carcinoma
- young adults
- cell free
- lymph node
- computed tomography
- radiation therapy
- machine learning
- gene expression
- magnetic resonance
- nucleic acid
- monte carlo
- label free
- data analysis