Novel bioinformatic analyses of somatic cell contamination in sperm samples.
Carter Norton, Chad Pollard, Kelaney Stalker, Kenneth Ivan Aston, Timothy G Jenkins
Published in: Systems biology in reproductive medicine (2024)
The assessment of epigenetic profiles in sperm is sensitive to somatic cell contamination, which can influence methylation signals at gene promoters. This contamination is particularly problematic in the assessment of DNA methylation in samples with low sperm counts, where fractional amounts of somatic cell DNA can lead to significant shifts in measured methylation state. In this study, a new method of detecting possible somatic cell contamination is proposed through two multi-region bioinformatic models: a traditional differential methylation analysis and a machine learning logistic regression model. These models were trained on publicly available sperm (n = 489) and blood (n = 1029) DNA methylation array data and tested on a contamination set, in which samples from four donors with normal sperm counts were run on a 450k methylation array in four permutations each: pure blood, half blood and half sperm by DNA concentration, half blood and half sperm by cell count, and pure sperm (n = 16). The differential methylation (DMR) analysis and the logistic regression model classified the contamination testing set with 100% and 94% accuracy, respectively. These new methods of detecting the effects of somatic cell contamination allow for more accurate differentiation between epigenetic profiles that contain a biological somatic-like shift and those that have somatic-like signatures because of contamination.
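The abstract's point that a small fraction of somatic DNA can shift the measured methylation state can be illustrated with simple mixing arithmetic. The sketch below is not the authors' DMR or logistic regression model. It assumes that the measured methylation fraction (beta value) of a mixture is a DNA-weighted average of the two pure profiles, and that a haploid sperm cell contributes roughly half the nuclear DNA of a diploid nucleated blood cell, which is why a mix that is half and half by cell count differs from one that is half and half by DNA concentration. All function names and beta values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: linear mixing of methylation beta values.
# Assumptions (not from the source): measured beta is a DNA-weighted average,
# sperm are haploid and nucleated blood cells diploid, and extraction
# efficiency is equal for both cell types.

def somatic_dna_fraction_from_counts(n_sperm, n_blood, sperm_ploidy=1, blood_ploidy=2):
    """Fraction of total DNA contributed by blood cells, given cell counts."""
    blood_dna = blood_ploidy * n_blood
    sperm_dna = sperm_ploidy * n_sperm
    return blood_dna / (blood_dna + sperm_dna)


def mixed_beta(beta_sperm, beta_somatic, somatic_fraction):
    """Expected beta value at one region for a given somatic DNA fraction."""
    if not 0.0 <= somatic_fraction <= 1.0:
        raise ValueError("somatic_fraction must be between 0 and 1")
    return (1.0 - somatic_fraction) * beta_sperm + somatic_fraction * beta_somatic


def estimate_somatic_fraction(observed, sperm_ref, somatic_ref):
    """Least-squares estimate of the somatic DNA fraction across many regions.

    Each argument is a list of beta values, one per region, in the same order.
    The estimate is clipped to the interval [0, 1].
    """
    numerator = sum((o - s) * (b - s) for o, s, b in zip(observed, sperm_ref, somatic_ref))
    denominator = sum((b - s) ** 2 for s, b in zip(sperm_ref, somatic_ref))
    if denominator == 0:
        raise ValueError("reference profiles are identical; fraction is not identifiable")
    return min(1.0, max(0.0, numerator / denominator))


if __name__ == "__main__":
    # Hypothetical regions that are lowly methylated in sperm and highly in blood.
    sperm_ref = [0.05, 0.10, 0.02]
    blood_ref = [0.90, 0.85, 0.95]

    for label, fraction in [
        ("half by DNA concentration", 0.5),
        ("half by cell count", somatic_dna_fraction_from_counts(1, 1)),
        ("5% somatic DNA", 0.05),
    ]:
        observed = [mixed_beta(s, b, fraction) for s, b in zip(sperm_ref, blood_ref)]
        estimate = estimate_somatic_fraction(observed, sperm_ref, blood_ref)
        print(label, [round(x, 3) for x in observed], round(estimate, 3))
```

Under these assumptions, pooling several regions gives a more stable signal than any single site, which is in the spirit of the multi-region approach described in the abstract; real array data would also carry probe-level noise that this sketch ignores.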