Structural variation across 138,134 samples in the TOPMed consortium.

Goo Jun, Adam C English, Ginger A Metcalf, Jianzhi Yang, Mark J P Chaisson, Nathan Pankratz, Vipin K Menon, William J Salerno, Olga Krasheninina, Albert Vernon Smith, John A Lane, Tom Blackwell, Hyun-Min Kang, Sejal Salvi, Qingchang Meng, Hua Shen, Divya Pasham, Sravya Bhamidipati, Kavya Kottapalli, Donna K Arnett, Allison Elizabeth Ashley-Koch, Paul L Auer, Kathleen M Beutel, Joshua C Bis, John E Blangero, Donald W Bowden, Jennifer A Brody, Brian E Cade, Yii-Der Ida Chen, Michael H Cho, Joanne E Curran, Myriam Fornage, Barry I Freedman, Tasha Fingerlin, Bruce D Gelb, Lifang Hou, Yi-Jen Hung, John P Kane, Robert C Kaplan, Wonji Kim, Ruth J F Loos, Gregory M Marcus, Rasika A Mathias, Stephen T McGarvey, Courtney G Montgomery, Take Naseri, S Mehdi Nouraie, Michael H Preuss, Nicholette D D Allred, Patricia A Peyser, Laura M Raffield, Aakrosh Ratan, Susan Redline, Sefuiva Reupena, Jerome I Rotter, Stephen S Rich, Michiel Rienstra, Ingo Ruczinski, Vijay G Sankaran, David A Schwartz, Christine E Seidman, Jonathan G Seidman, Edwin K Silverman, Jennifer A Smith, Adrienne M Stilp, Kent D Taylor, Marilyn J Telen, Scott T Weiss, L Keoki Williams, Baojun Wu, Lisa R Yanek, Yingze Zhang, Jessica A Lasky-Su, Marie Claude Gingras, Susan K Dutcher, Evan E Eichler, Stacey Gabriel, Soren Germer, Ryan Kim, Karine A Viaud-Martinez, Deborah A Nickerson, James Luo, Alexander P Reiner, Richard A Gibbs, Eric Boerwinkle, Goncalo Abecasis, Fritz J Sedlazeck
Published in: bioRxiv : the preprint server for biology (2023)
Ever-larger structural variant (SV) catalogs highlighting the diversity within and between populations help researchers better understand the links between SVs and disease. The identification of SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (≥50 bp; 59.34% novel) across the autosomes and the X chromosome from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference, which result in high variant quality and >90% allele concordance compared to long-read de novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and various important cardio-metabolic and hematologic traits. We have identified 690 SV hotspots and deserts and those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool to understand the impact of SVs on disease development and progression.
Keyphrases
  • genome wide
  • single molecule
  • circulating tumor
  • electronic health record
  • single cell
  • copy number
  • dna methylation
  • quality improvement
  • machine learning
  • artificial intelligence