Effect of reduced genomic representation on using runs of homozygosity for inbreeding characterization.

Eléonore Lavanchy, Jérôme Goudet
Published in: Molecular Ecology Resources (2023)
Genomic measures of inbreeding based on identical-by-descent (IBD) segments are increasingly used, and are mostly estimated from SNP arrays and whole-genome sequencing (WGS) data. However, some programs commonly used for their estimation assume that genomic positions which have not been genotyped are non-variant. This might be true for WGS data, but not for reduced genomic representations, and it can lead to the estimation of spurious IBD segments. In this project, we simulated the outputs of WGS, two SNP arrays of different sizes and RAD-sequencing for three populations with different sizes and histories. We compared the results of IBD segment estimation with two programs: runs of homozygosity (ROHs) estimated with PLINK and homozygous-by-descent (HBD) segments estimated with RZooRoH. We demonstrate that, to obtain meaningful estimates of inbreeding, RZooRoH requires a SNP density eleven times smaller than PLINK: ranks of inbreeding coefficients were conserved among individuals above 22 SNPs/Mb for PLINK and above 2 SNPs/Mb for RZooRoH. We also show that in populations with simple demographic histories, ROH and HBD segment distributions are correctly estimated with both SNP arrays and WGS. PLINK correctly estimated ROH distributions with SNP densities above 22 SNPs/Mb, while RZooRoH correctly estimated HBD segment distributions with SNP densities above 11 SNPs/Mb. However, in a population with a more complex demographic history, RZooRoH estimated IBD segment distributions better than PLINK, even with WGS data. Consequently, we advise researchers to use either methods relying on excess homozygosity averaged across SNPs or model-based HBD segment-calling methods for inbreeding estimation.
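The effect of treating ungenotyped positions as non-variant can be illustrated with a toy example. The sketch below is not PLINK's or RZooRoH's algorithm: `naive_roh` and `f_roh` are hypothetical helpers, and the SNP spacing, the heterozygosity pattern and the 300 kb length threshold are all assumptions chosen only for illustration. It calls a run wherever consecutive genotyped SNPs are homozygous, then compares a dense panel with a tenfold sparser subsample of the same individual.

```python
def naive_roh(positions, is_het, min_len_bp):
    """Return (start, end) of runs of consecutive homozygous genotyped SNPs
    spanning at least min_len_bp. Positions between genotyped SNPs are
    implicitly treated as non-variant (the assumption discussed above)."""
    runs, start, prev = [], None, None
    for pos, het in zip(positions, is_het):
        if het:
            # A heterozygous SNP closes the current run, if any.
            if start is not None and prev - start >= min_len_bp:
                runs.append((start, prev))
            start = None
        else:
            if start is None:
                start = pos
            prev = pos
    if start is not None and prev - start >= min_len_bp:
        runs.append((start, prev))
    return runs


def f_roh(runs, genome_len_bp):
    """Fraction of the genome covered by the called runs."""
    return sum(end - start for start, end in runs) / genome_len_bp


GENOME_LEN = 1_000_000  # toy 1 Mb chromosome (assumption)
MIN_LEN = 300_000       # toy minimum run length (assumption)

# Dense panel: one SNP every 10 kb. The first half carries a heterozygous
# SNP every 7th marker; the second half is fully homozygous.
dense_pos = [i * 10_000 for i in range(100)]
dense_het = [(i < 50 and i % 7 == 3) for i in range(100)]

# Sparse panel: keep every 10th SNP of the same individual.
sparse_pos = dense_pos[::10]
sparse_het = dense_het[::10]

dense_runs = naive_roh(dense_pos, dense_het, MIN_LEN)
sparse_runs = naive_roh(sparse_pos, sparse_het, MIN_LEN)

print(dense_runs, f_roh(dense_runs, GENOME_LEN))
print(sparse_runs, f_roh(sparse_runs, GENOME_LEN))
```

In this toy setting the sparse panel misses most heterozygous SNPs in the first half, so the called run extends further and the coverage fraction is inflated relative to the dense panel, which is the kind of spurious segment the abstract warns about.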
Keyphrases
  • genome wide
  • copy number
  • dna methylation
  • high density
  • electronic health record
  • genome wide association
  • big data
  • genetic diversity
  • dna damage
  • machine learning
  • ulcerative colitis
  • working memory
  • data analysis