Improving variant prioritization in exome analysis by entropy-weighted ensemble of multiple tools.
Yanjie Fan, Ying Zhou, Huili Liu, Xiaomei Luo, Ting Xu, Yu Sun, Tingting Yang, Linlin Chen, Xuefan Gu, Yong-Guo Yu. Published in: Clinical Genetics (2022)
Variant prioritization is a crucial step in the analysis of exome and genome sequencing data. Multiple phenotype-driven tools have been developed to automate variant prioritization. Two questions remain to be assessed. The first is how well these tools work in clinical settings, where phenotypic information is often fuzzy. The second is whether an ensemble of these tools can outperform any single algorithm. To evaluate the tools and their ensemble, we recruited a large rare disease cohort with heterogeneous phenotypic information, comprising a primary cohort of 1614 patients and a replication cohort of 1904 patients referred for exome sequencing. Three freely available tools (Exomiser, Xrare, and DeepPVP) and their ensemble were evaluated. The performance of all three tools was influenced by the attributes of the phenotypic input. We combined the three tools with an entropy-weighted sum method (EWE3), and the ensemble outperformed every single algorithm. It ranked 78% of diagnostic variants in the top 3, compared with 63% for Exomiser, 65% for Xrare, and 51% for DeepPVP, an improvement of 13 percentage points over the best single tool. It also ranked 88% of diagnostic variants in the top 10 and 96% in the top 30. These results were replicated in the independent replication cohort. Our study supports using an entropy-weighted ensemble of multiple tools to improve variant prioritization and accelerate molecular diagnosis in exome and genome sequencing.
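The abstract does not spell out how EWE3 computes its weights, so the following is only a minimal sketch of the textbook entropy weight method applied to one case's candidate variants. Several details are assumptions, not the authors' procedure. Each tool's scores are min-max scaled to [0, 1]. A variant that a tool did not score receives 0.0 from that tool. Weights are computed per case rather than over the whole cohort. Tools whose scores are concentrated on fewer candidates (lower entropy) get larger weights. `top_k_rate` mirrors the "diagnostic variants in the top k" metric reported above. All function names and the example scores are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale one tool's scores to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def entropy_weights(columns: List[List[float]]) -> List[float]:
    """Entropy weight method: tools whose scores are spread less evenly across
    candidates (lower entropy) receive larger weights. Needs >= 2 candidates."""
    k = 1.0 / math.log(len(columns[0]))  # scales entropy into [0, 1]
    divergences = []
    for col in columns:
        total = sum(col)
        if total == 0:  # constant tool output carries no information
            divergences.append(0.0)
            continue
        p = [x / total for x in col]
        entropy = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        divergences.append(1.0 - entropy)
    d_sum = sum(divergences)
    if d_sum == 0:  # no tool discriminates: fall back to equal weights
        return [1.0 / len(columns)] * len(columns)
    return [d / d_sum for d in divergences]


def ensemble_rank(
    tool_scores: Dict[str, Dict[str, float]]
) -> Tuple[List[str], Dict[str, float]]:
    """Rank one case's candidate variants by the entropy-weighted sum of
    min-max-scaled tool scores (higher score = more likely causal).
    Assumption: a variant a tool did not score gets 0.0 from that tool."""
    tools = sorted(tool_scores)
    variants = sorted({v for scores in tool_scores.values() for v in scores})
    columns = [
        min_max([tool_scores[t].get(v, 0.0) for v in variants]) for t in tools
    ]
    weights = entropy_weights(columns)
    combined = {
        v: sum(w * col[i] for w, col in zip(weights, columns))
        for i, v in enumerate(variants)
    }
    # Highest combined score first; ties broken by variant id for determinism.
    ranking = sorted(variants, key=lambda v: (-combined[v], v))
    return ranking, dict(zip(tools, weights))


def top_k_rate(rankings: List[List[str]], causal: List[str], k: int) -> float:
    """Fraction of cases whose diagnostic variant is ranked within the top k."""
    hits = sum(1 for ranked, truth in zip(rankings, causal) if truth in ranked[:k])
    return hits / len(causal)


# Hypothetical scores for one case; these are not data from the study.
scores = {
    "Exomiser": {"v1": 0.90, "v2": 0.85, "v3": 0.20, "v4": 0.10},
    "Xrare": {"v1": 0.30, "v2": 0.95, "v3": 0.25, "v4": 0.20},
    "DeepPVP": {"v1": 0.50, "v2": 0.60, "v3": 0.55, "v4": 0.52},
}
ranking, weights = ensemble_rank(scores)
print(ranking)  # ['v2', 'v1', 'v3', 'v4']
print(top_k_rate([ranking], ["v2"], k=3))  # 1.0
```

In this toy case Exomiser alone would rank v1 first. The ensemble promotes v2, which Xrare and DeepPVP both favour. Xrare receives the largest weight because its scores are the most concentrated on a single candidate.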