Predicting functional effect of missense variants using graph attention neural networks.
Haicang ZhangMichelle S XuXiao FanWendy K ChungYufeng ShenPublished in: Nature machine intelligence (2022)
Accurate prediction of damaging missense variants is critically important for interpreting a genome sequence. Although many methods have been developed, their performance has been limited. Recent advances in machine learning and the availability of large-scale population genomic sequencing data provide new opportunities to considerably improve computational predictions. Here we describe the graphical missense variant pathogenicity predictor (gMVP), a new method based on graph attention neural networks. Its main component is a graph with nodes that capture predictive features of amino acids and edges weighted by co-evolution strength, enabling effective pooling of information from the local protein context and functionally correlated distal positions. Evaluation of deep mutational scan data shows that gMVP outperforms other published methods in identifying damaging variants in TP53 , PTEN , BRCA1 and MSH2 . Furthermore, it achieves the best separation of de novo missense variants in neuro developmental disorder cases from those in controls. Finally, the model supports transfer learning to optimize gain- and loss-of-function predictions in sodium and calcium channels. In summary, we demonstrate that gMVP can improve interpretation of missense variants in clinical testing and genetic studies.
Keyphrases
- neural network
- copy number
- intellectual disability
- machine learning
- amino acid
- genome wide
- big data
- working memory
- autism spectrum disorder
- electronic health record
- magnetic resonance
- healthcare
- computed tomography
- escherichia coli
- high resolution
- convolutional neural network
- randomized controlled trial
- minimally invasive
- systematic review
- gene expression
- mass spectrometry
- cystic fibrosis
- cell proliferation
- pi k akt
- social media
- binding protein
- radiation therapy
- pseudomonas aeruginosa
- neoadjuvant chemotherapy
- protein protein
- deep learning
- network analysis
- case control