Rockfish: A transformer-based model for accurate 5-methylcytosine prediction from nanopore sequencing.
Dominik Stanojević, Zhe Li, Sara Bakić, Roger Sik Yin Foo, Mile Šikić
Published in: Nature Communications (2024)
DNA methylation plays an important role in various biological processes, including cell differentiation, ageing, and cancer development. The most important methylation in mammals is 5-methylcytosine, which mostly occurs in the context of CpG dinucleotides. Sequencing methods such as whole-genome bisulfite sequencing successfully detect 5-methylcytosine DNA modifications. However, they suffer from short read lengths and may introduce amplification bias. Here we present Rockfish, a deep learning algorithm that significantly improves read-level 5-methylcytosine detection from Nanopore sequencing. Rockfish is compared with other Nanopore-based methods on R9.4.1 and R10.4.1 datasets. It increases single-base accuracy and the F1 measure by up to 5 percentage points on R9.4.1 datasets and by up to 0.82 percentage points on R10.4.1 datasets. Moreover, Rockfish shows a high correlation with whole-genome bisulfite sequencing, requires lower read depth, and achieves higher confidence in biologically important regions such as CpG-rich promoters, while remaining computationally efficient. Its superior performance in human and mouse samples highlights its versatility for studying 5-methylcytosine methylation across varied organisms and diseases. Finally, its adaptable architecture ensures compatibility with new versions of pores and chemistry, as well as with other modification types.
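The metrics named in the abstract (read-level accuracy, F1, and correlation with whole-genome bisulfite sequencing) can be illustrated with a minimal Python sketch. This is not Rockfish's evaluation code: the function names, the 0.5 decision threshold, and the input layout (per-read methylation probabilities keyed by a site string) are all assumptions made for illustration.

```python
from math import sqrt


def accuracy_and_f1(labels, probs, threshold=0.5):
    """Read-level accuracy and F1 from binary labels and predicted probabilities.

    The 0.5 threshold is an illustrative assumption, not a value from the paper.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def site_frequencies(calls, threshold=0.5):
    """Aggregate per-read calls into a per-site methylation frequency.

    `calls` is an iterable of (site, probability) pairs; this layout is hypothetical.
    """
    counts = {}
    for site, p in calls:
        methylated, depth = counts.get(site, (0, 0))
        counts[site] = (methylated + (1 if p >= threshold else 0), depth + 1)
    return {site: m / n for site, (m, n) in counts.items()}


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of frequencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

In such a setup, the per-site frequencies from `site_frequencies` would be paired with bisulfite-derived frequencies at the same CpG sites and passed to `pearson`, while `accuracy_and_f1` would be applied to reads with known methylation status.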
Keyphrases
- single molecule
- dna methylation
- single cell
- rna seq
- deep learning
- genome wide
- machine learning
- gene expression
- endothelial cells
- squamous cell carcinoma
- high resolution
- mass spectrometry
- young adults
- circulating tumor
- copy number
- loop mediated isothermal amplification
- sensitive detection
- cell free
- childhood cancer
- neural network