Application of a bioinformatic pipeline to RNA-seq data identifies novel virus-like sequence in human blood.
Marko Melnick, Patrick Gonzales, Thomas J LaRocca, Yuping Song, Joanne Wuu, Michael Benatar, Björn Oskarsson, Leonard Petrucelli, Robin D Dowell, Christopher D Link, Mercedes Prudencio
Published in: G3 (Bethesda, Md.) (2021)
Numerous reports have suggested that infectious agents could play a role in neurodegenerative diseases, but specific etiological agents have not been convincingly demonstrated. To search for candidate agents in an unbiased fashion, we have developed a bioinformatic pipeline that identifies microbial sequences in mammalian RNA-seq data, including sequences with no significant nucleotide similarity hits in GenBank. Effectiveness of the pipeline was tested using publicly available RNA-seq data and in a reconstruction experiment using synthetic data. We then applied this pipeline to a novel RNA-seq dataset generated from a cohort of 120 samples from amyotrophic lateral sclerosis patients and controls, and identified sequences corresponding to known bacteria and viruses, as well as novel virus-like sequences. The presence of these novel virus-like sequences, which were identified in subsets of both patients and controls, was confirmed by quantitative RT-PCR. We believe this pipeline will be a useful tool for the identification of potential etiological agents in the many RNA-seq datasets currently being generated.
Keyphrases
- rna seq
- single cell
- end stage renal disease
- chronic kidney disease
- ejection fraction
- newly diagnosed
- electronic health record
- amyotrophic lateral sclerosis
- big data
- prognostic factors
- randomized controlled trial
- gene expression
- systematic review
- genome wide
- adverse drug
- climate change
- machine learning
- risk assessment
- endothelial cells
- mass spectrometry
- data analysis