Multi-platform discovery of haplotype-resolved structural variation in human genomes.
Mark J P Chaisson, Ashley D Sanders, Xuefang Zhao, Ankit Malhotra, David Porubsky, Tobias Rausch, Eugene J Gardner, Oscar L Rodriguez, Li Guo, Ryan L Collins, Xian Fan, Jia Wen, Robert E Handsaker, Susan Fairley, Zev N Kronenberg, Xiangmeng Kong, Fereydoun Hormozdiari, Dillon Lee, Aaron M Wenger, Alex R Hastie, Danny Antaki, Thomas Anantharaman, Peter A Audano, Harrison Brand, Stuart Cantsilieris, Han Cao, Eliza Cerveira, Chong Chen, Xintong Chen, Chen-Shan Chin, Zechen Chong, Nelson T Chuang, Christine C Lambert, Deanna M Church, Laura Clarke, Andrew Farrell, Joey Flores, Timur Galeev, David U Gorkin, Madhusudan Gujral, Victor Guryev, William Haynes Heaton, Jonas Korlach, Sushant Kumar, Jee Young Kwon, Ernest T Lam, Jong Eun Lee, Joyce Lee, Wan-Ping Lee, Sau Peng Lee, Shantao Li, Patrick Marks, Karine Viaud-Martinez, Sascha Meiers, Katherine M Munson, Fábio C P Navarro, Bradley J Nelson, Conor Nodzak, Amina Noor, Sofia Kyriazopoulou-Panagiotopoulou, Andy W C Pang, Yunjiang Qiu, Gabriel Rosanio, Mallory Ryan, Adrian Stütz, Diana C J Spierings, Alistair Ward, AnneMarie E Welch, Ming Xiao, Wei Xu, Chengsheng Zhang, Qihui Zhu, Xiangqun Zheng-Bradley, Ernesto Lowy-Gallego, Sergei Yakneen, Steven McCarroll, Goo Jun, Li Ding, Chong Lek Koh, Bing Ren, Paul Flicek, Ken Chen, Mark B Gerstein, Pui-Yan Kwok, Peter M Lansdorp, Gabor T Marth, Jonathan Sebat, Xing-Hua Shi, Ali Bashir, Kai Ye, Scott E Devine, Michael E Talkowski, Ryan E Mills, Tobias Marschall, Jan O Korbel, Evan E Eichler, Charles Lee
Published in: Nature Communications (2019)
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.
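The size convention used in the abstract (indels below 50 bp, SVs at 50 bp or more) can be illustrated with a minimal sketch. The function name, the constant, and the use of REF/ALT allele strings are assumptions made for illustration and are not taken from the study's pipeline. The sketch covers only insertions and deletions, since balanced events such as inversions do not change sequence length.

```python
# Hypothetical helper: names and inputs are illustrative, not from the paper.
SV_MIN_LENGTH_BP = 50  # threshold stated in the abstract: SVs are >= 50 bp


def classify_by_length(ref: str, alt: str) -> str:
    """Label an insertion/deletion by the length difference of its alleles."""
    length = abs(len(alt) - len(ref))
    if length == 0:
        # No net length change (e.g. a substitution); outside this sketch's scope.
        return "no length change"
    return "SV" if length >= SV_MIN_LENGTH_BP else "indel"


if __name__ == "__main__":
    print(classify_by_length("A", "A" + "T" * 49))  # 49 bp insertion
    print(classify_by_length("A" + "C" * 50, "A"))  # 50 bp deletion
```

The comparison is inclusive at 50 bp, so a 50 bp event is labeled as an SV and a 49 bp event as an indel, matching the thresholds given above.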
Keyphrases
- endothelial cells
- induced pluripotent stem cells
- genetic diversity
- high throughput
- small molecule
- healthcare
- machine learning
- single cell
- pluripotent stem cells
- genome wide
- high resolution
- copy number
- mental health
- gene expression
- clinical practice
- mass spectrometry
- electronic health record
- high throughput sequencing
- deep learning
- dna methylation
- artificial intelligence