DNA-Aeon provides flexible arithmetic coding for constraint adherence and error correction in DNA storage.
Marius Welzel, Peter Michael Schwarz, Hannah Franziska Löchel, Tolganay Kabdullayeva, Sandra Clemens, Anke Becker, Bernd Freisleben, Dominik Heider
Published in: Nature Communications (2023)
The extensive information capacity of DNA, coupled with decreasing costs for DNA synthesis and sequencing, makes DNA an attractive alternative to traditional data storage. The processes of writing, storing, and reading DNA each exhibit specific error profiles and impose constraints that DNA sequences have to adhere to. We present DNA-Aeon, a concatenated coding scheme for DNA data storage. It supports the generation of variable-sized encoded sequences with a user-defined guanine-cytosine (GC) content, a homopolymer length limitation, and the avoidance of undesired motifs. It also lets users provide custom codebooks that adhere to additional constraints. DNA-Aeon can correct substitution errors, insertions, deletions, and the loss of whole DNA strands. Comparisons with other codes show better error-correction capabilities of DNA-Aeon at similar redundancy levels with decreased DNA synthesis costs. In vitro tests indicate high reliability of DNA-Aeon even in the case of skewed sequencing read distributions and high read dropout.
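The three sequence constraints named in the abstract (GC content, homopolymer length, and undesired motifs) can be illustrated with a minimal checker. This is a sketch for illustration only and is not DNA-Aeon's code or API: the function names, the default GC window of 0.4 to 0.6, and the default maximum homopolymer length of 3 are all assumptions chosen for the example.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def satisfies_constraints(seq, gc_min=0.4, gc_max=0.6, max_run=3,
                          forbidden_motifs=()):
    """Check a sequence against hypothetical, user-chosen constraints.

    All thresholds are illustrative defaults, not values from the paper.
    """
    seq = seq.upper()
    if not gc_min <= gc_content(seq) <= gc_max:
        return False
    if max_homopolymer(seq) > max_run:
        return False
    # Reject the sequence if any undesired motif occurs as a substring.
    return not any(motif.upper() in seq for motif in forbidden_motifs)
```

For example, `satisfies_constraints("AAAACGCG")` fails under these defaults because of the run of four adenines, even though the GC content is within the window. A checker like this only validates finished sequences; the abstract describes a scheme that generates sequences adhering to the constraints during encoding.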