Flexible Protein-Protein Docking with a Multi-Track Iterative Transformer.
Lee-Shin Chu, Jeffrey A Ruffolo, Ameya Harmalkar, Jeffrey J Gray
Published in: Protein science: a publication of the Protein Society (2023)
Conventional protein-protein docking algorithms usually rely on heavy candidate sampling and re-ranking, but these steps are time-consuming and hinder applications that require high-throughput complex structure prediction, e.g., structure-based virtual screening. Existing deep learning methods for protein-protein docking, despite being much faster, suffer from low docking success rates. In addition, they simplify the problem by assuming no conformational changes within any protein upon binding (rigid docking). This assumption precludes applications where binding-induced conformational changes play a role, such as allosteric inhibition or docking from uncertain unbound model structures. To address these limitations, we present GeoDock, a multi-track iterative transformer network that predicts a docked structure from separate docking partners. Unlike deep learning models for protein structure prediction that input multiple sequence alignments (MSAs), GeoDock inputs just the sequences and structures of the docking partners, which suits tasks where the individual structures are given. GeoDock is flexible at the protein residue level, allowing the prediction of conformational changes upon binding. On the DIPS test set, GeoDock achieves a 43% top-1 success rate, outperforming all other tested methods. However, in the standard DIPS train/test splits, we discovered contamination of close homologs in the training set. After decontaminating the training set, the success rate drops to 31%. On the DB5.5 test set and a benchmark dataset of antibody-antigen complexes, GeoDock outperforms the deep learning models trained using the same dataset but falls behind most of the conventional methods and AlphaFold-Multimer. GeoDock attains an average inference time of under one second on a single GPU, enabling its application in large-scale structure screening.
Although binding-induced conformational changes are still a challenge owing to limited training and evaluation data, our architecture sets up the foundation to capture this backbone flexibility. Code and a demonstration Jupyter notebook are available at https://github.com/Graylab/GeoDock.
Keyphrases
- protein protein
- small molecule
- deep learning
- molecular dynamics
- molecular dynamics simulations
- high throughput
- machine learning
- high resolution
- convolutional neural network
- artificial intelligence
- binding protein