Rate-Perturbing Single Amino Acid Mutation for Hydrolases: A Statistical Profiling.
Bailu Yan, Xinchun Ran, Yaoyukun Jiang, Sarah K Torrence, Li Yuan, Qianzhen Shao, Zhongyue J Yang. Published in: The Journal of Physical Chemistry B (2021)
Hydrolases are critical components of modern chemical, pharmaceutical, and environmental sciences. Identifying mutations that enhance catalytic efficiency remains a roadblock to designing and discovering new hydrolases for broad academic and industrial uses. Here, we report a statistical profiling of rate-perturbing mutant hydrolases with a single amino acid substitution. We constructed an integrated structure-kinetics database for hydrolases, IntEnzyDB, which contains 3907 kcat values, 4175 KM values, and 2715 Protein Data Bank IDs. IntEnzyDB adopts a relational architecture with a flattened data structure, enabling facile and efficient access to clean, tabulated data for machine learning uses. We conducted statistical analyses of how single amino acid mutations influence the turnover number (i.e., kcat) and efficiency (i.e., kcat/KM), with a particular emphasis on profiling the features of rate-enhancing mutations. The results show that mutation to bulky nonpolar residues with a hydrocarbon chain is more likely to accelerate the rate than mutation to other types of residues. Linear regression models reveal geometric descriptors of substrate and mutation residues that mediate rate-perturbing outcomes for hydrolases with bulky nonpolar mutations. On the basis of analyses of the structure-kinetics relationship, we observe that the propensity for rate enhancement is independent of protein size. In addition, we observe that distal mutations (i.e., >10 Å from the active site) in hydrolases are significantly more prone to induce efficiency neutrality and avoid efficiency deletion but show a similar propensity for rate enhancement. These studies reveal statistical features for identifying rate-enhancing mutations in hydrolases, which may guide hydrolase discovery in biocatalysis.
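The distal-versus-proximal comparison described in the abstract can be illustrated with a minimal Python sketch. Only the 10 Å cutoff comes from the abstract. The record layout (`MutationRecord`), the function names, and in particular the two-fold window used to call a mutation "neutral" (`NEUTRAL_FOLD`) are hypothetical choices made for illustration; they are not the IntEnzyDB schema or the authors' classification criteria.

```python
from collections import Counter
from dataclasses import dataclass

# From the abstract: a mutation is "distal" if it lies > 10 Å from the active site.
DISTAL_CUTOFF_ANGSTROM = 10.0
# Assumption (not from the source): efficiency changes within a two-fold
# window of wild type are counted as neutral.
NEUTRAL_FOLD = 2.0


@dataclass(frozen=True)
class MutationRecord:
    """Hypothetical flattened row: one single-point mutant and its wild type."""
    mutation: str                   # label such as "A123L" (illustrative only)
    distance_to_active_site: float  # in Å
    wt_efficiency: float            # wild-type kcat/KM
    mut_efficiency: float           # mutant kcat/KM, same units as wild type


def classify_effect(rec: MutationRecord, neutral_fold: float = NEUTRAL_FOLD) -> str:
    """Label a mutation by the fold change in kcat/KM relative to wild type."""
    ratio = rec.mut_efficiency / rec.wt_efficiency
    if ratio > neutral_fold:
        return "enhancing"
    if ratio < 1.0 / neutral_fold:
        return "deleterious"
    return "neutral"


def is_distal(rec: MutationRecord) -> bool:
    """Apply the > 10 Å distance criterion stated in the abstract."""
    return rec.distance_to_active_site > DISTAL_CUTOFF_ANGSTROM


def profile(records):
    """Count effect labels separately for distal and proximal mutations."""
    counts = {"distal": Counter(), "proximal": Counter()}
    for rec in records:
        region = "distal" if is_distal(rec) else "proximal"
        counts[region][classify_effect(rec)] += 1
    return counts
```

Calling `profile` on a list of such records returns per-region counts that could then be compared with a contingency-table test. The abstract does not state which statistical test the authors used, so that step is left out of the sketch.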
Keyphrases
- amino acid
- machine learning
- data analysis