Accurate Estimation of Solvent Accessible Surface Area for Coarse-Grained Biomolecular Structures with Deep Learning.
Tiejun Dong, Tong Gong, Wenfei Li. Published in: The Journal of Physical Chemistry B (2021)
Coarse-grained (CG) models of biomolecules are widely used in protein and ribonucleic acid (RNA) three-dimensional structure prediction, docking, drug design, and molecular simulations because of their computational efficiency. Most of these applications depend strongly on a reasonable estimate of the solvation free energy, which in turn requires an accurate calculation of the solvent accessible surface area (SASA). Although algorithms for calculating SASA from all-atom protein and RNA structures are well established, accurately estimating SASA from CG structures is extremely challenging. In this work, we developed a deep learning-based SASA estimator (DeepCGSA) that provides almost perfect SASA estimates from CG structures of protein and RNA molecules. Extensive testing showed that for three widely used types of CG protein models (the Cα-based, Cα-Cβ, and Martini models), the correlation coefficients between the predicted and reference values are as high as 0.95 to 0.99, dramatically better than those of available methods. In addition, the new method can be applied to CG RNA structures and unfolded protein structures with much improved accuracy. We anticipate that DeepCGSA will be highly useful in protein/RNA structure prediction, drug design, and other applications in which accurate SASA estimates for CG biomolecular structures are critically important.
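For context on the well-established all-atom SASA algorithms mentioned above, the following is a minimal sketch of the classic Shrake-Rupley test-point method. It is not DeepCGSA, whose architecture the abstract does not describe, and the abstract does not say which algorithm produced its reference values. It only illustrates one standard way to compute the per-atom SASA that a CG estimator would aim to reproduce. The function names, the 960-point sphere, the 1.4 Å probe radius (a conventional water-probe value), and the atom radii in the usage line are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sphere_points(n: int) -> List[Vec3]:
    """Return n roughly uniform unit vectors via a golden-angle (Fibonacci) spiral."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        y = 1.0 - 2.0 * (k + 0.5) / n      # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - y * y)          # radius of the circle at height y
        theta = golden_angle * k
        points.append((r * math.cos(theta), y, r * math.sin(theta)))
    return points


def shrake_rupley(coords: Sequence[Vec3], radii: Sequence[float],
                  probe: float = 1.4, n_points: int = 960) -> List[float]:
    """Per-atom SASA (squared units of coords) by the Shrake-Rupley test-point method.

    Illustrative sketch: radii and probe are hypothetical inputs, not values from the paper.
    """
    unit = sphere_points(n_points)
    expanded = [r + probe for r in radii]   # atom radius inflated by the probe radius
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, expanded)):
        # Only atoms whose inflated spheres overlap atom i can bury its test points.
        neighbors = [j for j in range(len(coords))
                     if j != i and math.dist(ci, coords[j]) < ri + expanded[j]]
        accessible = 0
        for px, py, pz in unit:
            point = (ci[0] + ri * px, ci[1] + ri * py, ci[2] + ri * pz)
            # A test point is solvent accessible if no neighbor's inflated sphere contains it.
            if all(math.dist(point, coords[j]) >= expanded[j] for j in neighbors):
                accessible += 1
        # Accessible fraction times the area of the inflated sphere.
        areas.append(4.0 * math.pi * ri * ri * accessible / n_points)
    return areas


if __name__ == "__main__":
    # Two hypothetical atoms (radius 1.5 Å) placed 2.0 Å apart bury part of each other's surface.
    print(shrake_rupley([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.5, 1.5]))
```

An isolated atom recovers the full inflated-sphere area, 4π(r + probe)², while overlapping atoms lose the buried spherical caps. The loop-based form favors readability; practical tools typically use spatial neighbor lists and vectorized arithmetic to handle large structures.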