Large-scale comparative evaluation of user-friendly tools for predicting variant-induced alterations of splicing regulatory elements.
Hélène Tubeuf, Camille Charbonnier, Omar Soukarieh, André Blavier, Arnaud Lefebvre, Hélène Dauchel, Thierry Frebourg, Pascaline Gaildrat, Alexandra Martins
Published in: Human Mutation (2020)
Discriminating which nucleotide variants cause disease or contribute to phenotypic traits remains a major challenge in human genetics. In theory, any intragenic variant can affect RNA splicing by altering splicing regulatory elements (SREs). However, these alterations are often ignored, mainly because pioneer SRE predictors have proved inefficient. Here, we report the first large-scale comparative evaluation of four user-friendly SRE-dedicated algorithms (QUEPASA, HEXplorer, SPANR, and HAL), tested both as standalone tools and in multiple combinations, on two independent benchmark datasets totaling more than 1,300 exonic variants studied at the messenger RNA level and mapping to 89 different disease-causing genes. Using decision thresholds derived from receiver operating characteristic (ROC) curve analyses, these methods display good predictive power, with QUEPASA and HAL achieving the best accuracies either as standalone tools or in combination. Still, overall performance was close across the four predictors, suggesting that all methods may be of use. Additionally, QUEPASA and HEXplorer may also be beneficial for predicting variant-induced creation of pseudoexons deep within introns. Our study highlights the potential of SRE predictors as filtering tools for identifying disease-causing candidates among the plethora of variants detected by high-throughput DNA sequencing and provides guidance for their use in genomic medicine settings.
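The abstract does not say which criterion was used to pick the decision thresholds from the ROC curves. One common choice is to maximize Youden's J statistic (sensitivity + specificity − 1), and the sketch below uses that criterion. It is a minimal illustration under stated assumptions, not the authors' procedure. The function name `youden_threshold`, the toy scores, and the convention that a higher score means "more likely to alter splicing" are all hypothetical. For a predictor whose scores run in the opposite direction, negate them first.

```python
from typing import List, Sequence, Tuple


def youden_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float, float]:
    """Pick the cutoff that maximizes Youden's J on a labeled variant set.

    Hypothetical conventions (not taken from the paper):
      - a variant is called splicing-altering when score >= threshold;
      - labels: 1 = splicing alteration observed at the mRNA level, 0 = no effect.
    Returns (threshold, sensitivity, specificity, J).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative variant")

    best = None
    # Each distinct observed score is a candidate cutoff (one point on the ROC curve).
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens = tp / pos
        spec = tn / neg
        j = sens + spec - 1.0
        # Strict '>' keeps the lowest cutoff when several share the same J.
        if best is None or j > best[3]:
            best = (t, sens, spec, j)
    return best


# Toy data for illustration only: six variants with made-up predictor scores.
toy_scores: List[float] = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
toy_labels: List[int] = [1, 1, 1, 0, 1, 0]
print(youden_threshold(toy_scores, toy_labels))  # (0.6, 0.75, 1.0, 0.75)
```

The loop rescans the data for every candidate cutoff, which is quadratic but easy to verify. For datasets of a few thousand variants this is fine. Sorting once and sweeping the cutoff would be the faster alternative.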