
Accurate assembly of multi-end RNA-seq data with Scallop2.

Qimin Zhang, Qian Shi, Mingfu Shao
Published in: Nature Computational Science (2022)
Modern RNA-sequencing protocols can produce multi-end data, where multiple reads originating from the same transcript are attached to the same barcode. The long-range information in the multi-end reads is beneficial in phasing complicated spliced isoforms, but assembly algorithms that leverage such information are lacking. Here we introduce Scallop2, a reference-based assembler optimized for multi-end RNA-seq data. The algorithmic core of Scallop2 consists of three steps: (1) using an algorithm to "bridge" multi-end reads into single-end phasing paths in the context of a splice graph, (2) employing a method to refine erroneous splice graphs by utilizing multi-end reads that fail to bridge, and (3) piping the refined splice graph and the bridged phasing paths into an algorithm that integrates multiple phase-preserving decompositions. Tested on 561 cells in two Smart-seq3 datasets and on 10 Illumina paired-end RNA-seq samples, Scallop2 substantially improves the assembly accuracy compared to two popular assemblers StringTie2 and Scallop.
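Step (1) of the abstract, bridging read ends into a single phasing path, can be illustrated with a small sketch. The abstract does not specify how the connecting path is chosen, so everything below is an assumption made for illustration: the splice graph is a toy dictionary whose integer vertices are already in topological order, edge weights stand in for read support, and the connecting path is picked by maximizing its weakest edge (a widest-path criterion). The names `bridge` and `toy_graph` are hypothetical and are not part of Scallop2.

```python
from typing import Dict, List, Optional

# Hypothetical toy splice graph: graph[u][v] = weight of edge u -> v.
# Assumption: vertices are integers in topological order, so every edge has v > u.
Graph = Dict[int, Dict[int, float]]


def bridge(graph: Graph, end1: List[int], end2: List[int]) -> Optional[List[int]]:
    """Join two read-end paths into one phasing path.

    Among all paths from the last vertex of end1 to the first vertex of end2,
    pick the one whose weakest edge is strongest. Returns None if the ends
    cannot be connected (the read "fails to bridge").
    """
    src, dst = end1[-1], end2[0]
    if src > dst:
        return None  # out-of-order ends are not handled in this sketch
    if src == dst:
        return end1 + end2[1:]  # ends already meet at a shared vertex

    best = {src: float("inf")}  # best[v]: largest bottleneck of any src -> v path
    prev: Dict[int, int] = {}   # predecessor of v on that path
    for u in range(src, dst):   # integer order is topological order here
        if u not in best:
            continue
        for v, w in graph.get(u, {}).items():
            if v > dst:
                continue        # edges past dst cannot lead back to it
            cand = min(best[u], w)
            if cand > best.get(v, 0.0):
                best[v] = cand
                prev[v] = u

    if dst not in best:
        return None

    # Walk predecessors back from dst to recover the connecting vertices.
    middle: List[int] = []
    v = dst
    while v != src:
        middle.append(v)
        v = prev[v]
    middle.reverse()            # vertices after src, up to and including dst
    return end1 + middle + end2[1:]


# Two alternative routes from vertex 1 to vertex 4: via 2 (weak) or via 3 (strong).
toy_graph: Graph = {0: {1: 10}, 1: {2: 2, 3: 8}, 2: {4: 2}, 3: {4: 7}, 4: {5: 9}}
print(bridge(toy_graph, [0, 1], [4, 5]))  # [0, 1, 3, 4, 5]
```

In this toy case the two ends `[0, 1]` and `[4, 5]` are joined through vertex 3 because that route has the stronger weakest edge (7 versus 2). A read whose ends cannot be connected returns `None`, which corresponds to the "fail to bridge" case that step (2) of the abstract uses to refine the splice graph.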
Keyphrases
  • rna seq
  • single cell
  • machine learning
  • deep learning
  • electronic health record
  • healthcare
  • cell proliferation
  • genome wide
  • cell death
  • endoplasmic reticulum stress