Homology-based hydrogen bond information improves crystallographic structures in the PDB.
Bart van Beusekom, Wouter G Touw, Mahidhar Tatineni, Sandeep Somani, Gunaretnam Rajagopal, Jinquan Luo, Gary L Gilliland, Anastassis Perrakis, Robbie P Joosten
Published in: Protein science : a publication of the Protein Society (2017)
The Protein Data Bank (PDB) is the global archive for structural information on macromolecules, and a popular resource for researchers, teachers, and students, amassing more than one million unique users each year. Crystallographic structure models in the PDB (more than 100,000 entries) are optimized against the crystal diffraction data and geometrical restraints. This process of crystallographic refinement has typically ignored hydrogen bond (H-bond) distances as a source of information. However, H-bond restraints can improve structures at low resolution, where diffraction data are limited. To improve low-resolution structure refinement, we present methods for deriving H-bond information either globally from well-refined high-resolution structures from the PDB-REDO databank, or specifically from on-the-fly constructed sets of homologous high-resolution structures. Refinement incorporating HOmology DErived Restraints (HODER) improves geometrical quality and the fit to the diffraction data for many low-resolution structures. To make these improvements readily available to the general public, we applied our new algorithms to all crystallographic structures in the PDB: using massively parallel computing, we constructed a new instance of the PDB-REDO databank (https://pdb-redo.eu). This resource is useful for researchers to gain insight into individual structures, into specific protein families (as we demonstrate with examples), and into general features of protein structure using data mining approaches on a uniformly treated dataset.
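As a rough illustration of the homology-derived idea described above, the sketch below turns donor-acceptor distances observed for one equivalent H-bond in several high-resolution homologs into a single distance restraint (a target and a tolerance). It is not the HODER or PDB-REDO implementation: the function name `hbond_restraint`, the use of the median as target, and the `min_homologs` and `min_sigma` parameters are all assumptions made for this example.

```python
from statistics import median, pstdev
from typing import Optional, Sequence, Tuple


def hbond_restraint(
    distances: Sequence[float],
    min_homologs: int = 2,    # hypothetical cutoff, not taken from the paper
    min_sigma: float = 0.05,  # hypothetical floor on the tolerance, in angstrom
) -> Optional[Tuple[float, float]]:
    """Derive a (target, sigma) distance restraint in angstrom.

    `distances` holds the donor-acceptor distance of the equivalent
    H-bond in each homologous high-resolution structure. Returns None
    when too few homologs contribute an observation.
    """
    if len(distances) < min_homologs:
        return None
    # The median is robust against a single outlying homolog.
    target = median(distances)
    # Population standard deviation, floored so that perfectly
    # consistent homologs do not yield a zero-width restraint.
    sigma = max(pstdev(distances), min_sigma)
    return target, sigma


if __name__ == "__main__":
    # Illustrative distances only; these are not measured values.
    print(hbond_restraint([2.8, 2.9, 3.0]))
```

A restraint of this kind would be passed to the refinement program alongside the usual geometric restraints; how it is weighted against the diffraction data is left out of this sketch.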
Keyphrases
- high resolution
- electronic health record
- big data
- health information
- healthcare
- single molecule
- machine learning
- small molecule
- protein protein
- wastewater treatment
- aortic valve replacement
- dna damage
- binding protein
- dna repair
- emergency department
- heart failure
- social media
- data analysis
- tandem mass spectrometry
- amino acid
- drug induced
- ejection fraction