DeepSomatic: Accurate somatic small variant discovery for multiple sequencing technologies.
Jimin Park, Daniel E Cook, Pi-Chuan Chang, Alexey Kolesnikov, Lucas Brambrink, Juan Carlos Mier, Joshua Gardner, Brandy McNulty, Samuel Sacco, Ayse Keskus, Asher Bryant, Tanveer Ahmad, Jyoti Shetty, Yongmei Zhao, Bao Tran, Giuseppe Narzisi, Adrienne Helland, Byunggil Yoo, Irina Pushel, Lisa A Lansdon, Chengpeng Bi, Adam Walter, Margaret Gibson, Tomi Pastinen, Midhat S Farooqi, Nicolas Robine, Karen H Miga, Andrew Carroll, Mikhail Kolmogorov, Benedict Paten, Kishwar Shafin
Published in: bioRxiv : the preprint server for biology (2024)
Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies now offer potential advantages in terms of repeat mapping and variant phasing. We present DeepSomatic, a deep learning method for detecting somatic SNVs and insertions and deletions (indels) from both short-read and long-read data. It offers modes for whole-genome and exome sequencing and can run on tumor-normal, tumor-only, and FFPE-prepared samples. To help address the dearth of publicly available training and benchmarking data for somatic variant detection, we generated and make openly available a dataset of five matched tumor-normal cell line pairs sequenced with Illumina, PacBio HiFi, and Oxford Nanopore Technologies, along with benchmark variant sets. Across samples and technologies (short-read and long-read), DeepSomatic consistently outperforms existing callers, particularly for indels.