A comparative study of structural variant calling in WGS from Alzheimer's disease families.
John S Malamon, John J Farrell, Li Charlie Xia, Beth A Dombroski, Rueben G Das, Jessica Way, Amanda B Kuzma, Otto Valladares, Yuk Yee Leung, Allison J Scanlon, Irving Antonio Barrera Lopez, Jack Brehony, Kim Carlyle Worley, Nancy R Zhang, Li-San Wang, Lindsay A Farrer, Gerard D Schellenberg, Wan-Ping Lee, Badri N Vardarajan
Published in: Life science alliance (2024)
Detecting structural variants (SVs) in whole-genome sequencing poses significant challenges. We present a protocol for variant calling, merging, genotyping, sensitivity analysis, and laboratory validation for generating a high-quality SV call set in whole-genome sequencing from the Alzheimer's Disease Sequencing Project comprising 578 individuals from 111 families. Employing two complementary pipelines, Scalpel and Parliament, for SV/indel calling, we assessed sensitivity through sample replicates (N = 9) with in silico variant spike-ins. We developed a novel metric, D-score, to evaluate caller specificity for deletions. The accuracy of deletions was evaluated by Sanger sequencing. We generated a high-quality call set of 152,301 deletions of diverse sizes. Sanger sequencing validated 114 of 146 detected deletions (78.1%). Scalpel excelled in accuracy for deletions ≤100 bp, whereas Parliament was optimal for deletions >900 bp. Overall, 83.0% and 72.5% of calls by Scalpel and Parliament were validated, respectively, including all 11 deletions called by both Parliament and Scalpel between 101 and 900 bp. Our flexible protocol successfully generated a high-quality deletion call set and a truth set of Sanger sequencing-validated deletions with precise breakpoints spanning 1-17,000 bp.
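The size-dependent behaviour of the two callers suggests one simple way to combine their output, sketched below in Python. This is an illustration only, not the protocol's actual merging or genotyping step: the `Deletion` record, the `keep_call` rule, and the `validation_rate` helper are hypothetical names, and requiring agreement between both callers in the 101-900 bp range is an assumption motivated by the 11 concordant deletions that all validated. The only values taken from the abstract are the size thresholds (100 bp and 900 bp) and the Sanger counts (114 of 146).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    """Hypothetical minimal record for a merged deletion call."""
    chrom: str
    start: int           # 0-based start coordinate
    end: int             # end coordinate (exclusive)
    callers: frozenset   # names of the pipelines that reported the call

    @property
    def size(self) -> int:
        return self.end - self.start


def keep_call(d: Deletion, small_max: int = 100, large_min: int = 900) -> bool:
    """Assumed filter: trust each caller in the size range where it did best."""
    if d.size <= small_max:
        # Scalpel was most accurate for deletions <= 100 bp.
        return "Scalpel" in d.callers
    if d.size > large_min:
        # Parliament was optimal for deletions > 900 bp.
        return "Parliament" in d.callers
    # 101-900 bp: assumption, keep only calls made by both pipelines.
    return {"Scalpel", "Parliament"} <= d.callers


def validation_rate(validated: int, tested: int) -> float:
    """Percentage of tested calls confirmed by Sanger sequencing."""
    return round(100.0 * validated / tested, 1)


if __name__ == "__main__":
    calls = [
        Deletion("chr1", 1000, 1050, frozenset({"Scalpel"})),
        Deletion("chr1", 5000, 5500, frozenset({"Scalpel"})),
        Deletion("chr2", 2000, 2500, frozenset({"Scalpel", "Parliament"})),
        Deletion("chr3", 7000, 9000, frozenset({"Parliament"})),
    ]
    for d in calls:
        print(d.chrom, d.size, "keep" if keep_call(d) else "drop")
    print(validation_rate(114, 146))  # overall Sanger counts from the abstract
```

A stricter or looser rule for the intermediate size range would trade sensitivity against specificity; the thresholds are left as parameters so they can be tuned against a validated truth set.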