GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data.
Sarah Sandmann, Aniek O de Graaf, Bert A van der Reijden, Joop H Jansen, Martin Dugas
Published in: PLoS ONE (2017)
Using standard analysis methods, true mutations were missed and the results contained many artifacts, regardless of which platform was considered. Analysis of the parameters characterizing the true and false positive calls revealed significant platform- and variant-specific differences. Applying optimized variant calling pipelines considerably improved the results: 76% of all false positive single nucleotide variants and 97% of all false positive indels could be filtered out. Positive predictive values increased by factors of 1.07 to 1.27 for single nucleotide variant calling and by factors of 3.33 to 53.87 for indel calling. With the optimized variant calling pipelines, all next-generation sequencing platforms analyzed yielded comparable results. However, for clinical diagnostics it must be considered that even the optimized results still contained both false positive and false negative calls.
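The positive predictive value (PPV) of a call set is TP / (TP + FP), so removing false positives raises it. The following sketch shows why a similar filtering rate can give modest PPV gains for SNVs and much larger ones for indels. All call counts are hypothetical and are not taken from the study. The sketch also assumes, for simplicity, that filtering removes no true positives, which real filters do not guarantee.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: fraction of reported calls that are true."""
    return tp / (tp + fp)


def remaining_fp(fp_before: int, fraction_removed: float) -> int:
    """False positives left after filtering out a given fraction of them."""
    return round(fp_before * (1 - fraction_removed))


def ppv_gain(tp_before: int, fp_before: int, tp_after: int, fp_after: int) -> float:
    """Factor by which PPV changes after filtering (PPV_after / PPV_before)."""
    return ppv(tp_after, fp_after) / ppv(tp_before, fp_before)


# Hypothetical SNV call set: PPV is already high before filtering.
snv_tp, snv_fp = 100, 25
snv_fp_after = remaining_fp(snv_fp, 0.76)  # assume 76% of FPs removed, TPs kept
snv_gain = ppv_gain(snv_tp, snv_fp, snv_tp, snv_fp_after)

# Hypothetical indel call set: dominated by false positives before filtering.
indel_tp, indel_fp = 10, 500
indel_fp_after = remaining_fp(indel_fp, 0.97)  # assume 97% of FPs removed, TPs kept
indel_gain = ppv_gain(indel_tp, indel_fp, indel_tp, indel_fp_after)

print(f"SNV:   PPV {ppv(snv_tp, snv_fp):.3f} -> {ppv(snv_tp, snv_fp_after):.3f}, factor {snv_gain:.2f}")
print(f"Indel: PPV {ppv(indel_tp, indel_fp):.3f} -> {ppv(indel_tp, indel_fp_after):.3f}, factor {indel_gain:.2f}")
```

Because the gain factor is PPV_after / PPV_before, a call set that starts with a low PPV (as indel calls often do) can improve by an order of magnitude. A call set that is already mostly correct can improve by only a small factor.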