Global transcript structure resolution of high gene density genomes through multi-platform data integration.
Tina M O'Grady, Xia Wang, Kerstin Höner Zu Bentrup, Melody Baddoo, Monica Concha, Erik K Flemington
Published in: Nucleic Acids Research (2016)
Annotation of herpesvirus genomes has traditionally been undertaken through the detection of open reading frames and other genomic motifs, supplemented with sequencing of individual cDNAs. Second generation sequencing and high-density microarray studies have revealed vastly greater herpesvirus transcriptome complexity than is captured by existing annotation. The pervasive nature of overlapping transcription throughout herpesvirus genomes, however, poses substantial problems in resolving transcript structures using these methods alone. We present an approach that combines the unique attributes of Pacific Biosciences Iso-Seq long-read, Illumina short-read and deepCAGE (Cap Analysis of Gene Expression) sequencing to globally resolve polyadenylated isoform structures in replicating Epstein-Barr virus (EBV). Our method, Transcriptome Resolution through Integration of Multi-platform Data (TRIMD), identifies nearly 300 novel EBV transcripts, quadrupling the size of the annotated viral transcriptome. These findings illustrate an array of mechanisms through which EBV achieves functional diversity in its relatively small, compact genome including programmed alternative splicing (e.g. across the IR1 repeats), alternative promoter usage by LMP2 and other latency-associated transcripts, intergenic splicing at the BZLF2 locus, and antisense transcription and pervasive readthrough transcription throughout the genome.
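The abstract names the three data types that TRIMD integrates but does not spell out the integration rules. The sketch below is only an illustration of how such multi-platform validation could work, under assumptions that are not taken from the paper: a long-read isoform is kept when its 5' end lies near a deepCAGE peak and every one of its splice junctions also appears in the short-read data. The `Isoform` record, the `validate` function, the `window` parameter and the coordinates are all hypothetical, and strand and 3' end validation are ignored for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple

# A splice junction as (donor, acceptor) genomic coordinates.
Junction = Tuple[int, int]


@dataclass(frozen=True)
class Isoform:
    """Hypothetical record for one candidate transcript from a long read."""
    name: str
    start: int                      # 5' end of the long-read isoform
    end: int                        # 3' end of the long-read isoform
    junctions: FrozenSet[Junction]  # empty for unspliced transcripts


def near_any(position: int, sites: Iterable[int], window: int) -> bool:
    """Return True if position is within `window` bases of any site."""
    return any(abs(position - site) <= window for site in sites)


def validate(
    isoforms: Iterable[Isoform],
    cage_peaks: List[int],
    short_read_junctions: Set[Junction],
    window: int = 10,
) -> List[Isoform]:
    """Keep isoforms supported by all three data types.

    Assumed rules (not from the paper): the 5' end must fall within
    `window` bases of a CAGE peak, and every splice junction must also
    be present among the short-read junctions.
    """
    kept = []
    for iso in isoforms:
        start_supported = near_any(iso.start, cage_peaks, window)
        # Subset test: an unspliced isoform passes trivially.
        splices_supported = iso.junctions <= short_read_junctions
        if start_supported and splices_supported:
            kept.append(iso)
    return kept
```

Calling `validate(candidates, cage_peaks, junctions)` returns the supported subset in input order. Requiring agreement from every platform trades sensitivity for confidence, which suits a genome where overlapping transcripts make any single data type ambiguous.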
Keyphrases
- epstein barr virus
- single cell
- rna seq
- high throughput
- high density
- genome wide
- gene expression
- dna methylation
- single molecule
- transcription factor
- diffuse large b cell lymphoma
- copy number
- high resolution
- big data
- genome wide identification
- nucleic acid
- real time pcr
- data analysis