Evaluation of an optimized germline exomes pipeline using BWA-MEM2 and Dragen-GATK tools.
Nofe AlganmiHeba AbusamraPublished in: PloS one (2023)
The next-generation sequencing (NGS) technology represents a significant advance in genomics and medical diagnosis. Nevertheless, the time it takes to perform sequencing, data analysis, and variant interpretation is a bottleneck in using next-generation sequencing in precision medicine. For accurate and efficient performance in clinical diagnostic lab practice, a consistent data analysis pipeline is necessary to avoid false variant calls and achieve optimum accuracy. This study aims to compare the performance of two NGS data analysis pipeline compartments, including short-read mapping (BWA-MEM and BWA-MEM2) and variant calling (GATK-HaplotypeCaller and DRAGEN-GATK). On Whole Exome Sequencing (WES) data, computational performance was assessed using several criteria, including mapping efficiency, variant calling performance, false positive calls rate, and time. We examined four gold-standard WES data sets: Ashkenazim father (NA24149), Ashkenazim mother (NA24143), Ashkenazim son (NA24385), and Asian son (NA25631). In addition, eighteen exome samples were analyzed based on different read counts, and coverage was used precisely in the run-time assessment. By using BWA-MEM 2 and Dragen-GATK, this study achieved faster and more accurate detection for SNVs and indels than the standard GATK Best Practices workflow. This systematic comparison will enable the bioinformatics community to develop a more efficient and faster solution for analyzing NGS data.
Keyphrases
- data analysis
- healthcare
- high resolution
- primary care
- electronic health record
- copy number
- single molecule
- gene expression
- big data
- machine learning
- dna damage
- high density
- artificial intelligence
- loop mediated isothermal amplification
- peripheral blood
- deep learning
- quality improvement
- affordable care act
- label free
- circulating tumor cells