Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data.

Vladimir B C de Souza, Ben T Jordan, Elizabeth Tseng, Elizabeth A Nelson, Karen K Hirschi, Gloria Sheynkman, Mark D Robinson
Published in: Genome Biology (2023)
Long-read RNA sequencing (lrRNA-seq) produces detailed information about full-length transcripts, including novel and sample-specific isoforms. It also offers an opportunity to call variants directly from lrRNA-seq data. However, most state-of-the-art variant callers have been developed for genomic DNA. Here, we pursue two objectives: first, we perform a mini-benchmark of GATK, DeepVariant, Clair3, and NanoCaller, primarily on PacBio Iso-Seq data but also on Nanopore and Illumina RNA-seq data; second, we propose a pipeline to process spliced-alignment files, making them suitable for variant calling with DNA-based callers. With such manipulations, high calling performance can be achieved using DeepVariant on Iso-Seq data.
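To illustrate what processing a spliced alignment can involve, the sketch below splits one alignment into contiguous exon segments at each intron gap (the `N` operation in a SAM CIGAR string). The abstract does not list the pipeline's steps, so this is an assumption about one common transformation, not the authors' implementation. The function name `split_at_introns` and the example coordinates are hypothetical, and the sketch omits the clipping and flag handling a production tool would perform.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def split_at_introns(pos: int, cigar: str) -> List[Tuple[int, str]]:
    """Split a spliced alignment into (start, cigar) segments at each N op.

    `pos` is the reference start of the alignment. Hypothetical helper for
    illustration only; it does not reproduce any specific tool's behavior.
    """
    segments: List[Tuple[int, str]] = []
    current: List[str] = []
    seg_start = pos
    ref_cursor = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            # An intron gap closes the current segment and starts a new one
            # at the first reference base after the gap.
            if current:
                segments.append((seg_start, "".join(current)))
            current = []
            ref_cursor += n
            seg_start = ref_cursor
            continue
        current.append(f"{n}{op}")
        if op in REF_CONSUMING:
            ref_cursor += n
    if current:
        segments.append((seg_start, "".join(current)))
    return segments


# A read spanning one 1000-base intron becomes two unspliced segments.
example = split_at_introns(100, "50M1000N30M2D20M")
```

Each returned segment carries no `N` operation, which is the form a caller built for genomic DNA alignments generally expects.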
Keyphrases
  • rna seq
  • single cell
  • electronic health record
  • single molecule
  • genome wide
  • big data
  • copy number
  • dna methylation
  • gene expression
  • data analysis
  • deep learning
  • health information
  • social media
  • nucleic acid