
Haplotype-aware pantranscriptome analyses using spliced pangenome graphs.

Jonas A Sibbesen, Jordan M Eizenga, Adam M Novak, Jouni Sirén, Xian Chang, Erik Garrison, Benedict Paten
Published in: Nature Methods (2023)
Pangenomics is emerging as a powerful computational paradigm in bioinformatics. This field uses population-level genome reference structures, typically consisting of a sequence graph, to mitigate reference bias and facilitate analyses that were challenging with previous reference-based methods. In this work, we extend these methods into transcriptomics to analyze sequencing data using the pantranscriptome: a population-level transcriptomic reference. Our toolchain, which consists of additions to the VG toolkit and a standalone tool, RPVG, can construct spliced pangenome graphs, map RNA sequencing data to these graphs, and perform haplotype-aware expression quantification of transcripts in a pantranscriptome. We show that this workflow improves accuracy over state-of-the-art RNA sequencing mapping methods, and that it can efficiently quantify haplotype-specific transcript expression without needing to characterize the haplotypes of a sample beforehand.
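The abstract does not describe RPVG's statistical model. To give a rough idea of what expression quantification over ambiguously mapped reads involves, here is a minimal sketch of a generic expectation-maximization (EM) estimator of the kind many RNA-seq quantifiers use. It is an assumption for illustration only, not RPVG's algorithm. In a pantranscriptome setting, the "transcripts" below could stand for haplotype-specific transcripts. The function name `em_abundances` and its inputs are hypothetical.

```python
from typing import List, Sequence


def em_abundances(
    compat: Sequence[Sequence[int]],
    lengths: Sequence[float],
    n_iter: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """Estimate the fraction of reads originating from each transcript.

    Hypothetical inputs (not RPVG's interface):
      compat[r]  -- indices of the transcripts read r is compatible with
      lengths[t] -- effective length of transcript t
    """
    n = len(lengths)
    alpha = [1.0 / n] * n  # start from uniform read fractions
    for _ in range(n_iter):
        counts = [0.0] * n
        # E-step: split each read among its compatible transcripts in
        # proportion to alpha[t] / lengths[t]
        for hits in compat:
            weights = [alpha[t] / lengths[t] for t in hits]
            total = sum(weights)
            if total == 0.0:
                continue  # read cannot be assigned under current estimates
            for t, w in zip(hits, weights):
                counts[t] += w / total
        assigned = sum(counts)
        if assigned == 0.0:
            return alpha  # no assignable reads; keep current estimates
        # M-step: new read fractions are the normalized expected counts
        new_alpha = [c / assigned for c in counts]
        if max(abs(a - b) for a, b in zip(alpha, new_alpha)) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha


# Toy example: 6 reads unique to transcript 0, 2 unique to transcript 1,
# and 4 reads compatible with both (equal lengths).
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
print(em_abundances(reads, [1000.0, 1000.0]))
```

With equal lengths, the shared reads are split in the same 3:1 ratio as the unique ones, so the estimates converge to approximately [0.75, 0.25]. A real haplotype-aware quantifier would also need to model which haplotypes are present in the sample, which this sketch does not attempt.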