DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification.
Lixin Chen, Pingfang Liu, Thomas C Evans, Laurence M Ettwiller. Published in: Science (New York, N.Y.) (2017)
Mutations in somatic cells generate a heterogeneous genomic population and may result in serious medical conditions. Although cancer is typically associated with somatic variations, advances in DNA sequencing indicate that cell-specific variants affect a number of phenotypes and pathologies. Here, we show that mutagenic damage accounts for the majority of the erroneous identification of variants with low to moderate (1 to 5%) frequency. More important, we found signatures of damage in most sequencing data sets in widely used resources, including the 1000 Genomes Project and The Cancer Genome Atlas, establishing damage as a pervasive cause of sequencing errors. The extent of this damage directly confounds the determination of somatic variants in these data sets.
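The abstract reports the finding but does not describe how a damage signature is measured. The sketch below is purely illustrative and is not the authors' method. It assumes that a damaged base is converted on one strand before amplification, so the resulting substitution is over-represented in one mate of each read pair. It computes a variant allele frequency and a ratio of substitution rates between mate 1 and mate 2. The names `ReadCounts`, `variant_allele_frequency`, and `mate_imbalance` are hypothetical. Under this assumption, a genuine low-frequency variant should appear at similar rates in both mates, while a damage-derived artifact would push the ratio well above 1.

```python
from dataclasses import dataclass


@dataclass
class ReadCounts:
    """Hypothetical per-mate tallies for one substitution type (e.g. G>T)."""
    variant_r1: int   # mate-1 reads showing the substitution
    ref_base_r1: int  # mate-1 reads covering the reference base
    variant_r2: int   # mate-2 reads showing the substitution
    ref_base_r2: int  # mate-2 reads covering the reference base


def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the alternative allele."""
    if total_reads == 0:
        raise ValueError("site has no coverage")
    return alt_reads / total_reads


def mate_imbalance(c: ReadCounts) -> float:
    """Ratio of the substitution rate in mate 1 to the rate in mate 2.

    Assumption of this sketch: without strand-specific damage, both mates
    show similar rates (ratio near 1). One-sided damage inflates one mate.
    """
    if c.ref_base_r1 == 0 or c.ref_base_r2 == 0:
        raise ValueError("no coverage of the reference base in one mate")
    rate_r1 = c.variant_r1 / c.ref_base_r1
    rate_r2 = c.variant_r2 / c.ref_base_r2
    if rate_r2 == 0:
        # Substitution seen only in mate 1: maximal imbalance.
        return float("inf")
    return rate_r1 / rate_r2


if __name__ == "__main__":
    # 3 supporting reads out of 100 falls in the 1 to 5% range.
    print(variant_allele_frequency(3, 100))
    # 60 vs 20 substitutions per 10,000 covered bases in each mate.
    print(mate_imbalance(ReadCounts(60, 10_000, 20, 10_000)))
```

For example, 3 supporting reads out of 100 give a frequency of 3%, which lies in the 1 to 5% band the abstract identifies as most affected. Tallies of 60 versus 20 substitutions per 10,000 covered bases give a ratio of about 3. In this sketch, that would flag the calls as a possible artifact rather than a somatic variant.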