DenseCPD: Improving the Accuracy of Neural-Network-Based Computational Protein Sequence Design with DenseNet.

John Z H Zhang, John Zenghui Zhang
Published in: Journal of chemical information and modeling (2020)
Computational protein design remains a challenging task despite its remarkable success in the past few decades. With the rapid progress of deep-learning techniques and the accumulation of three-dimensional protein structures, it is becoming increasingly feasible to use deep neural networks to learn the relationship between protein sequences and structures and then automatically design a protein sequence for a given protein backbone structure. In this study, we developed a deep neural network named DenseCPD that considers the three-dimensional density distribution of protein backbone atoms and predicts the probabilities of the 20 natural amino acids for each residue in a protein. The accuracy of DenseCPD was 53.24 ± 0.17% in a 5-fold cross-validation on the training set and 55.53% and 50.71% on two independent test sets, which is more than 10% higher than those of previous state-of-the-art methods. Two approaches for using DenseCPD predictions in computational protein design were analyzed. The approach using a cutoff on the cumulative probability yielded a smaller sequence search space than the approach that simply uses the top-k predictions and therefore enabled higher sequence identity in redesigning three proteins with Rosetta. The network and the datasets are available on a web server at http://protein.org.cn/densecpd.html. The results of this study may benefit the further development of computational protein design methods.
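The two ways of turning per-residue probabilities into a restricted sequence search space can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example probabilities, and the values k = 3 and cutoff = 0.7 are assumptions chosen only for illustration. The sketch shows why a cumulative-probability cutoff tends to give a smaller search space: at positions where the network is confident, it keeps fewer candidates than a fixed top-k rule.

```python
import math

# One-letter codes of the 20 natural amino acids (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def to_vector(sparse):
    """Expand a {amino_acid: probability} dict to a 20-element list (hypothetical helper)."""
    return [sparse.get(aa, 0.0) for aa in AMINO_ACIDS]


def rank(probs):
    """Pair each amino acid with its probability, most probable first."""
    return sorted(zip(AMINO_ACIDS, probs), key=lambda pair: pair[1], reverse=True)


def top_k_candidates(probs, k):
    """Keep the k most probable amino acids, regardless of how confident the prediction is."""
    return [aa for aa, _ in rank(probs)[:k]]


def cumulative_cutoff_candidates(probs, cutoff):
    """Keep the most probable amino acids until their summed probability reaches the cutoff."""
    chosen, total = [], 0.0
    for aa, p in rank(probs):
        chosen.append(aa)
        total += p
        if total >= cutoff:
            break
    return chosen


def search_space_size(candidates_per_position):
    """Number of distinct sequences: the product of candidate counts over all positions."""
    return math.prod(len(c) for c in candidates_per_position)


# Made-up predictions for two residue positions (illustrative values only).
confident = to_vector({"L": 0.90, "I": 0.05, "V": 0.05})
flat = to_vector({"A": 0.30, "S": 0.25, "G": 0.20, "T": 0.15, "D": 0.10})

top_k_space = [top_k_candidates(p, k=3) for p in (confident, flat)]
cutoff_space = [cumulative_cutoff_candidates(p, cutoff=0.7) for p in (confident, flat)]

print(top_k_space, search_space_size(top_k_space))
print(cutoff_space, search_space_size(cutoff_space))
```

In this toy case the fixed top-k rule keeps three candidates at both positions, while the cutoff rule keeps a single candidate at the confident position, so the number of sequences to search shrinks. How such candidate lists would be passed to Rosetta is not described in the abstract and is outside this sketch.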
Keyphrases
  • amino acid
  • neural network
  • protein protein
  • binding protein
  • machine learning
  • small molecule
  • lymph node metastasis
  • artificial intelligence
  • sensitive detection
  • virtual reality