Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing.
Li Tai Fang, Bin Zhu, Yongmei Zhao, Wanqiu Chen, Zhao-Wei Yang, Liz Kerrigan, Kurt Langenbach, Maryellen de Mars, Charles Lu, Kenneth Idler, Howard Jacob, Yuanting Zheng, Luyao Ren, Ying Yu, Erich Jaeger, Gary P Schroth, Ogan D Abaan, Keyur Talsania, Justin Lack, Tsai-Wei Shen, Zhong Chen, Seta Stanbouly, Bao Tran, Jyoti Shetty, Yuliya Kriga, Daoud Meerzaman, Cu Nguyen, Virginie Petitjean, Marc Sultan, Margaret Cam, Monika Mehta, Tiffany Hung, Eric Peters, Rasika Kalamegham, Sayed Mohammad Ebrahim Sahraeian, Marghoob Mohiyuddin, Yunfei Guo, Lijing Yao, Lei Song, Hugo Y K Lam, Jiri Drabek, Petr Vojta, Roberta Maestro, Daniela Gasparotto, Sulev Kõks, Ene Reimann, Andreas Scherer, Jessica Nordlund, Ulrika Liljedahl, Roderick V Jensen, Mehdi Pirooznia, Zhipan Li, Chunlin Xiao, Stephen T Sherry, Rebecca Kusko, Malcolm Moos, Eric Donaldson, Zivana Tezak, Baitang Ning, Weida Tong, Jing Li, Penelope Duerken-Hughes, Claudia Catalanotti, Shamoni Maheshwari, Joe Shuga, Winnie S Liang, Jonathan J Keats, Jonathan Adkins, Erica Tassone, Victoria Zismann, Timothy McDaniel, Jeffery M Trent, Jonathan Foox, Daniel Butler, Christopher E Mason, Huixiao Hong, Le-Ming Shi, Charles Wang, Wenming Xiao
Published in: Nature Biotechnology (2021)
The lack of samples for generating standardized DNA datasets, whether for setting up a sequencing pipeline or for benchmarking the performance of different algorithms, limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when used to set up a sequencing pipeline they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.
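To illustrate how a reference call set of this kind might be used for benchmarking, the following is a minimal Python sketch and not the authors' pipeline. It compares a caller's output against a reference call set inside high-confidence regions and reports precision, recall and F1. The `Variant` and `Region` types, the `benchmark` function and all coordinates are hypothetical toy values, and the exact-match comparison assumes both call sets have already been normalized to the same variant representation.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def in_regions(variant: Variant, regions: Iterable[Region]) -> bool:
    """Return True if the variant position falls inside any region."""
    return any(
        r.chrom == variant.chrom and r.start <= variant.pos <= r.end
        for r in regions
    )


def benchmark(
    reference: Iterable[Variant],
    query: Iterable[Variant],
    regions: Iterable[Region],
) -> Dict[str, float]:
    """Score a query call set against a reference call set.

    Only variants inside the given regions are counted, so calls outside
    the high-confidence regions neither help nor hurt the score.
    """
    regions = list(regions)
    ref_set = {v for v in reference if in_regions(v, regions)}
    query_set = {v for v in query if in_regions(v, regions)}

    tp = len(ref_set & query_set)   # called and in the reference
    fp = len(query_set - ref_set)   # called but not in the reference
    fn = len(ref_set - query_set)   # in the reference but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy data (made up for illustration only).
high_conf = [Region("chr1", 1, 1000)]
reference_calls = [
    Variant("chr1", 100, "A", "T"),
    Variant("chr1", 200, "G", "C"),
    Variant("chr2", 50, "C", "A"),   # outside the high-confidence regions
]
query_calls = [
    Variant("chr1", 100, "A", "T"),  # true positive
    Variant("chr1", 300, "C", "T"),  # false positive
    Variant("chr2", 60, "T", "G"),   # outside the regions, ignored
]

result = benchmark(reference_calls, query_calls, high_conf)
print(result)
```

In practice the call sets and regions would be read from VCF and BED files, and a dedicated comparison tool would handle representation differences that this exact-match sketch ignores.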