Predicting the impact of single nucleotide variants on splicing via sequence-based deep neural networks and genomic features.
Tatsuhiko NaitoPublished in: Human mutation (2019)
Single nucleotide mutations in exonic regions can significantly affect gene function through a disruption of splicing, and various computational methods have been developed to predict the splicing-related effects of a single nucleotide mutation. We implemented a new method using ensemble learning that combines two types of predictive models: (a) base sequence-based deep neural networks (DNNs) and (b) machine learning models based on genomic attributes. This method was applied to the Massively Parallel Splicing Assay challenge of the Fifth Critical Assessment of Genome Interpretation, in which challenge participants predicted various experimentally-defined exonic splicing mutations, and achieved a promising result. We successfully revealed that combining different predictive models based upon the stacked generalization method led to significant improvement in prediction performance. In addition, whereas most of the genomic features adopted in constructing machine learning models were previously reported, feature values generated with DSSP, a DNN-based splice site prediction tool, were novel and helpful for the prediction. Learning the sequence patterns associated with normal splicing and the change in splicing site probabilities caused by a mutation was presumed to be helpful in predicting splicing disruption.