NOMAD2 provides ultra-efficient, scalable, and unsupervised discovery on raw sequencing reads.
Marek Kokot, Roozbeh Dehghannasiri, Tavor Z Baharav, Julia Salzman, Sebastian Deorowicz
Published in: bioRxiv : the preprint server for biology (2023)
NOMAD is a new, unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition in DNA or RNA sequencing experiments. It subsumes many application-specific algorithms, from splicing detection to RNA editing to applications in DNA sequencing and beyond. Here, we introduce NOMAD2, a fast, scalable, and user-friendly implementation of NOMAD based on KMC, an efficient k-mer counting approach. The pipeline has minimal installation requirements and can be executed with a single command. NOMAD2 enables efficient analysis of massive RNA-seq datasets, where it reveals novel biology, showcased by rapid analysis of 1,553 human muscle cells, the entire Cancer Cell Line Encyclopedia (671 cell lines, 5.7 TB), and a deep RNA-seq study of amyotrophic lateral sclerosis (ALS), with ~2-fold less computational resource and time than state-of-the-art alignment methods. NOMAD2 enables reference-free biological discovery at unmatched scale and speed. Because it bypasses genome alignment, NOMAD2 yields new insights into RNA expression in normal and disease tissue; we provide examples of these insights to introduce NOMAD2 as a tool for expansive biological discovery not previously possible.
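As a rough illustration of what "k-mer composition" means in the abstract above, the sketch below counts k-mers directly from raw reads and arranges the counts into a k-mer by sample table, the kind of input a statistical test for sample-dependent variation could work on. It is not NOMAD2's implementation: NOMAD2 relies on KMC for counting, and the statistic it applies is not described here. The function names `count_kmers` and `kmer_table` and the toy reads are hypothetical, chosen only for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a list of reads."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k over the read; reads shorter than k add nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_table(samples, k):
    """Return {kmer: [count in sample 0, count in sample 1, ...]}.

    `samples` maps a sample name to its list of reads; columns follow
    the insertion order of that mapping.
    """
    per_sample = [count_kmers(reads, k) for reads in samples.values()]
    all_kmers = sorted(set().union(*per_sample))
    # A Counter returns 0 for k-mers that a sample never contained.
    return {kmer: [c[kmer] for c in per_sample] for kmer in all_kmers}


# Toy input (hypothetical): two samples with a handful of short reads.
samples = {
    "sample_a": ["ACGTAC", "ACGTTT"],
    "sample_b": ["ACGAAC"],
}
table = kmer_table(samples, k=3)
for kmer, row in table.items():
    print(kmer, row)
```

A k-mer whose row is roughly proportional to sequencing depth in every sample is uninformative, whereas a row concentrated in a subset of samples is a candidate for sample-dependent variation. No reference genome enters the computation at any point, which is the sense in which the approach is reference-free.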
Keyphrases
- single cell
- rna seq
- amyotrophic lateral sclerosis
- machine learning
- high throughput
- small molecule
- healthcare
- primary care
- endothelial cells
- crispr cas
- poor prognosis
- circulating tumor
- cell free
- induced apoptosis
- single molecule
- squamous cell carcinoma
- nucleic acid
- loop mediated isothermal amplification
- deep learning
- papillary thyroid
- cell cycle arrest
- mycobacterium tuberculosis
- signaling pathway
- binding protein
- quality improvement
- high resolution
- mass spectrometry
- cell death
- genome wide
- squamous cell
- lymph node metastasis
- real time pcr
- pluripotent stem cells