Benchmarking sample pooling for epigenomics of natural populations.
Ryan J Daniels, Britta S Meyer, Marco Giulio, Silvia G Signorini, Nicoletta Riccardi, Camilla Della Torre, Alexandra Anh-Thu Weber
Published in: Molecular Ecology Resources (2024)
DNA methylation (DNAm) is a mechanism for rapid acclimation to environmental conditions. In natural systems, small effect sizes relative to noise necessitate large sampling efforts to detect differences, and large numbers of individually sequenced libraries are costly. Pooling DNA prior to library preparation may be an efficient way to reduce costs and increase sample size, yet to date there are no recommendations for pooling in ecological epigenetics research. We test whether pooled and individual libraries yield comparable DNAm signals in a natural system exposed to different pollution levels by generating whole-epigenome data from two invasive molluscs (Corbicula fluminea, Dreissena polymorpha) collected from polluted and unpolluted localities (Italy). DNA of the same individuals was used for pooled and individual epigenomic libraries and sequenced with equivalent resources per individual. We found that pooling effectively captures genome-wide and global methylation signals similar to those of individual libraries, highlighting that pooled libraries are representative of the global population signal. However, pooled libraries yielded orders of magnitude more data than individual libraries, which was a consequence of higher coverage. Consequently, we detected many more differentially methylated regions (DMRs) with the pooled libraries, and statistical power was significantly lower for regions from individual libraries. We would therefore recommend aiming for a high initial coverage of individual libraries (15×) in future studies. Computationally pooled data from the individual libraries produced fewer DMRs, and the overlap with wet-lab pooled DMRs was relatively low. We discuss possible causes for discrepancies, list benefits and drawbacks of pooling, and provide recommendations for future epigenomic studies.
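The abstract does not say how the individual libraries were pooled computationally. As a minimal sketch only, one common approach is to sum methylated and total read counts per site across individuals and then apply a minimum-coverage filter. The function names, the site labels, the read counts, and the threshold of 10 reads below are all hypothetical and are not taken from the study. The sketch illustrates why a pool can retain sites that every individual library loses to a coverage filter.

```python
from collections import defaultdict

def pool_counts(individuals):
    """Sum (methylated, total) read counts per site across individuals.

    `individuals` is a list of dicts mapping a site label to a
    (methylated_reads, total_reads) tuple. Hypothetical data layout.
    """
    pooled = defaultdict(lambda: [0, 0])
    for sample in individuals:
        for site, (meth, total) in sample.items():
            pooled[site][0] += meth
            pooled[site][1] += total
    return {site: (m, t) for site, (m, t) in pooled.items()}

def filter_by_coverage(counts, min_cov):
    """Keep only sites whose total read count is at least `min_cov`."""
    return {site: mt for site, mt in counts.items() if mt[1] >= min_cov}

def methylation_level(meth, total):
    """Fraction of methylated reads, or None when there is no coverage."""
    return meth / total if total else None

# Made-up counts for three individuals at two sites (illustration only).
individuals = [
    {"chr1:100": (2, 5), "chr1:200": (1, 4)},
    {"chr1:100": (3, 5)},
    {"chr1:100": (4, 5), "chr1:200": (3, 4)},
]

MIN_COV = 10  # assumed threshold, not from the study

pooled = pool_counts(individuals)
pooled_kept = filter_by_coverage(pooled, MIN_COV)
individual_kept = [filter_by_coverage(s, MIN_COV) for s in individuals]

print(pooled_kept)       # sites surviving the filter in the pool
print(individual_kept)   # sites surviving the filter per individual
```

In this toy input no individual reaches the threshold at any site, while the pooled counts pass at one site, which mirrors the coverage-driven difference in usable data described above. Summing counts weights each individual by its read depth; averaging per-individual methylation fractions would be an alternative design with equal weight per individual.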
Keyphrases
- DNA methylation
- genome wide
- phase iii
- electronic health record
- gene expression
- healthcare
- heavy metals
- air pollution
- randomized controlled trial
- clinical practice
- big data
- data analysis
- particulate matter
- quality improvement
- study protocol
- cross sectional
- deep learning
- case control
- simultaneous determination
- affordable care act