Improved transcriptome assembly using a hybrid of long and short reads with StringTie.

Alaina Shumate, Brandon Y Wong, Geo Pertea, Mihaela Pertea
Published in: PLoS computational biology (2022)
Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read-only or short-read-only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than long-read assembly alone. We demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open-source software at https://github.com/gpertea/stringtie.
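As a rough illustration of what a hybrid-read run might look like in practice, the sketch below assembles a StringTie command line in Python. It assumes the `--mix` option described in the StringTie manual, which expects the short-read alignments first and the long-read alignments second. The helper name `build_hybrid_command` and all file names are hypothetical placeholders, not taken from the paper, and the sketch only builds the command rather than running it.

```python
import shlex


def build_hybrid_command(short_bam, long_bam, out_gtf, annotation=None, threads=1):
    """Build an argument list for a hybrid-read StringTie run.

    Assumption: StringTie's --mix mode takes the short-read alignment file
    first and the long-read alignment file second. File names are supplied
    by the caller; nothing here is prescribed by the paper.
    """
    cmd = ["stringtie", "--mix", "-o", out_gtf, "-p", str(threads)]
    if annotation is not None:
        # Optional reference annotation to guide the assembly.
        cmd += ["-G", annotation]
    # Order matters in --mix mode: short reads, then long reads.
    cmd += [short_bam, long_bam]
    return cmd


if __name__ == "__main__":
    # Hypothetical input and output names, for illustration only.
    cmd = build_hybrid_command("short_reads.bam", "long_reads.bam", "hybrid.gtf")
    print(shlex.join(cmd))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting; the input BAM files are assumed to be coordinate-sorted alignments produced by a spliced aligner.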
Keyphrases
  • single molecule
  • single cell
  • arabidopsis thaliana
  • electronic health record
  • rna seq
  • gene expression
  • high resolution
  • endothelial cells
  • big data
  • machine learning
  • mass spectrometry