Contaminations in (meta)genome data: An open issue for the scientific community.
Giovanna De Simone, Andrea Pasquadibisceglie, Roberta Proietto, Fabio Polticelli, Silvio Aime, Huub J M Op den Camp, Paolo Ascenzi
Published in: IUBMB Life (2019)
In recent years, the high throughput and low cost of next-generation sequencing (NGS) technologies have led to an increase in the amount of (meta)genomic data, revolutionizing genomic research. However, the quality of sequencing data can be affected by experimental errors arising from defective methods and protocols. This is a serious problem for the scientific community, with a negative impact on the correctness of studies that involve genomic sequence analysis. As a countermeasure, several alignment and taxonomic classification tools have been developed to uncover and correct errors. This critical review reports some of the integrated software tools and pipelines used to detect contaminations in reference genome databases and sequenced samples. In particular, case studies of bacterial contaminations, contaminations of human origin, mitochondrial contaminations of ancient DNA, and cross contaminations are examined.
Keyphrases
- copy number
- low cost
- big data
- electronic health record
- high throughput
- mental health
- healthcare
- endothelial cells
- machine learning
- circulating tumor
- data analysis
- patient safety
- single cell
- adverse drug
- oxidative stress
- gene expression
- emergency department
- case control
- dna methylation
- single molecule
- cell free
- high throughput sequencing