Effects of transcriptional noise on estimates of gene and transcript expression in RNA sequencing experiments.
Ales Varabyou, Steven L Salzberg, Mihaela Pertea
Published in: Genome Research (2020)
RNA sequencing is widely used to measure gene expression across a vast range of animal and plant tissues and conditions. Most studies of computational methods for gene expression analysis use simulated data to evaluate the accuracy of these methods. These simulations typically include reads generated from known genes at varying levels of expression. Until now, simulations did not include reads from noisy transcripts, which might include erroneous transcription, erroneous splicing, and other processes that affect transcription in living cells. Here we examine the effects of realistic amounts of transcriptional noise on the ability of leading computational methods to assemble and quantify the genes and transcripts in an RNA sequencing experiment. We show that the inclusion of noise leads to systematic errors in the ability of these programs to measure expression, including systematic underestimates of transcript abundance levels and large increases in the number of false-positive genes and transcripts. Our results also suggest that alignment-free computational methods sometimes fail to detect transcripts expressed at relatively low levels.
Keyphrases
- genome wide identification
- gene expression
- transcription factor
- genome wide
- poor prognosis
- single cell
- living cells
- air pollution
- rna seq
- dna methylation
- genome wide analysis
- fluorescent probe
- copy number
- binding protein
- molecular dynamics
- public health
- long non coding rna
- electronic health record
- single molecule
- heat shock
- big data
- microbial community
- quality improvement