Benchmarking of small and large variants across tandem repeats.

Adam C English, Egor Dolzhenko, Helyaneh Ziaei Jam, Sean K McKenzie, Nathan D Olson, Wouter De Coster, Jonghun Park, Bida Gu, Justin Wagner, Michael A Eberle, Melissa Gymrek, Mark J P Chaisson, Justin M Zook, Fritz J Sedlazeck
Published in: bioRxiv: the preprint server for biology (2023)
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ∼24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.
Keyphrases
  • genome wide
  • copy number
  • dna methylation
  • endothelial cells
  • healthcare
  • fluorescent probe
  • gene expression
  • single molecule
  • induced pluripotent stem cells
  • clinical evaluation