Information Density Enhancement Using Lossy Compression in DNA Data Storage.
Seongjun Seo, Anshula Tandon, Keun Woo Lee, Jee-Hyong Lee, Sung Ha Park
Published in: Advanced materials (Deerfield Beach, Fla.) (2024)
This study develops two DNA lossy compression models, Models A and B, that encode grayscale images into DNA sequences, enhance information density, and enable high-fidelity image recovery. The models, distinguished by their handling of pixel domains and their interpolation methods, offer a novel approach to DNA data storage. Model A processes pixels in overlapped domains using linear interpolation, whereas Model B uses non-overlapped domains with nearest-neighbor interpolation. In a comparative analysis with JPEG compression, the DNA lossy compression models demonstrate competitive advantages in information density and restored image quality. Applying the models to the MNIST dataset reveals their efficiency and the recognizability of the decompressed images, which is validated by convolutional neural network performance. In particular, Model B2, a version of Model B, emerges as an effective method for balancing high information density (more than 20 times the typical density of 2 bits per nucleotide) with reasonably good image quality. These findings highlight the potential of DNA-based data storage systems for high-density, efficient compression, indicating a promising future for biological data storage solutions.
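The idea of non-overlapped domains with nearest-neighbor interpolation can be made concrete with a minimal Python sketch. It is not the authors' Model B: the abstract does not give the encoding details, so the block averaging, the `ACGT` two-bit mapping, the function names, and the definition of density as original image bits divided by stored nucleotides are all assumptions made for illustration.

```python
# Illustrative sketch only; NOT the paper's Model A/B implementation.
# Assumptions: 8-bit grayscale pixels, each block stored as its mean value,
# and a plain 2-bits-per-nucleotide mapping (A=00, C=01, G=10, T=11).
BASES = "ACGT"


def byte_to_dna(value):
    """Encode one 8-bit value as 4 nucleotides (2 bits each)."""
    return "".join(BASES[(value >> shift) & 0b11] for shift in (6, 4, 2, 0))


def dna_to_byte(seq):
    """Decode 4 nucleotides back into one 8-bit value."""
    value = 0
    for base in seq:
        value = (value << 2) | BASES.index(base)
    return value


def compress(image, block):
    """Average each non-overlapping block x block domain and encode the means."""
    height, width = len(image), len(image[0])
    encoded = []
    for r in range(0, height, block):
        for c in range(0, width, block):
            pixels = [
                image[i][j]
                for i in range(r, min(r + block, height))
                for j in range(c, min(c + block, width))
            ]
            encoded.append(byte_to_dna(round(sum(pixels) / len(pixels))))
    return "".join(encoded)


def decompress(dna, height, width, block):
    """Rebuild the image by copying each block mean to every pixel it covers
    (nearest-neighbor upsampling)."""
    cols = -(-width // block)  # number of blocks per row (ceiling division)
    means = [dna_to_byte(dna[i:i + 4]) for i in range(0, len(dna), 4)]
    return [
        [means[(r // block) * cols + (c // block)] for c in range(width)]
        for r in range(height)
    ]


def bits_per_nucleotide(height, width, dna, bit_depth=8):
    """Assumed density metric: original image bits per stored nucleotide."""
    return height * width * bit_depth / len(dna)


image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [100, 100, 50, 50],
    [100, 100, 50, 50],
]
dna = compress(image, block=2)
restored = decompress(dna, height=4, width=4, block=2)
density = bits_per_nucleotide(4, 4, dna)
```

Under these assumptions, a block size of b that divides the image evenly stores b² pixels in 4 nucleotides, so the density is 2·b² bits per nucleotide: larger domains raise density at the cost of image detail, which is the trade-off the abstract describes. A sketch of the overlapped, linearly interpolated variant would differ in both the domain layout and the reconstruction step.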