EagleC: A deep-learning framework for detecting a full range of structural variations from bulk and single-cell contact maps.
Xiao-Tao Wang, Yu Luan, Feng Yue. Published in: Science Advances (2022)
The Hi-C technique has been shown to be a promising method for detecting structural variations (SVs) in human genomes. However, algorithms that can use Hi-C data for full-range SV detection have been severely lacking. Current methods can identify only interchromosomal translocations and long-range intrachromosomal SVs (>1 Mb), and only at less-than-optimal resolution. Therefore, we develop EagleC, a framework that combines deep-learning and ensemble-learning strategies to predict a full range of SVs at high resolution. We show that EagleC can uniquely capture a set of fusion genes that are missed by whole-genome sequencing or nanopore sequencing. EagleC also effectively captures SVs in data from other chromatin interaction platforms, such as HiChIP, chromatin interaction analysis with paired-end tag sequencing (ChIA-PET), and capture Hi-C. We apply EagleC to more than 100 cancer cell lines and primary tumors and identify a valuable set of high-quality SVs. Last, we demonstrate that EagleC can be applied to single-cell Hi-C and used to study SV heterogeneity in primary tumors.