Smoothed Spherical Truncation based on Fuzzy Membership Functions: Application to the Molecular Encoding.
César R García-Jacas, Yovani Marrero-Ponce, Carlos A Brizuela, José Suárez-Lezcano, Felix Martinez-Rios. Published in: Journal of computational chemistry (2019)
A novel spherical truncation method, based on fuzzy membership functions, is introduced to truncate interatomic (or inter-amino-acid) relations according to smoothing values computed from fuzzy membership degrees. In this method, each molecule is circumscribed by a sphere whose center is the geometric center of the molecule. The fuzzy membership degree of each atom (or amino acid) is computed from its distance to the geometric center of the molecule by using a fuzzy membership function. The smoothing value to be applied in the truncation of a relation (or interaction) is then computed by averaging the fuzzy membership degrees of the atoms (or amino acids) involved in the relation. This truncation method differs from existing ones in that it considers the geometric center of the whole molecule rather than only of atom groups, and in that it uses fuzzy membership functions to compute the smoothing values. A variability study on a set of 20,469 compounds (15,050 drug-like compounds, 2994 approved drugs, 880 natural products from African sources, and 1545 plant-derived natural compounds exhibiting anticancer activity) demonstrated that the proposed truncation method yields molecular encodings that discriminate among structurally different molecules better than the encodings obtained without truncation or with non-fuzzy truncation functions. Moreover, a principal component analysis revealed that the proposed method encodes orthogonal chemical information about the molecules. Lastly, a modeling study showed that the truncation method improves the modeling ability of existing geometric molecular descriptors, allowing the development of more robust models than those built using only non-truncated descriptors. To this end, a comparison and statistical assessment were performed on eight chemical datasets.
As a result, the models based on the truncated molecular encodings yielded statistically better results than 12 procedures considered from the literature. It can thus be stated that the proposed truncation method is a relevant strategy for obtaining better molecular encodings, which will ultimately be useful in enhancing the modeling ability of existing encodings on both small-to-medium-sized molecules and biomacromolecules. © 2019 Wiley Periodicals, Inc.
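
The smoothing step described in the abstract (geometric center, per-atom membership degree, pairwise averaging) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the membership function, so the Gaussian form, the choice of its width, and the decision to apply the smoothing value by multiplying it into a pairwise relation are all assumptions made for illustration; every function name here is hypothetical.

```python
import math


def geometric_center(coords):
    """Unweighted mean of the atomic coordinates (x, y, z)."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))


def gaussian_membership(distance, radius):
    """ASSUMPTION: Gaussian membership, equal to 1 at the center and
    decaying with distance. The abstract does not specify the function."""
    sigma = radius / 2.0 if radius > 0 else 1.0  # assumed width
    return math.exp(-0.5 * (distance / sigma) ** 2)


def membership_degrees(coords):
    """Membership degree of each atom from its distance to the center."""
    center = geometric_center(coords)
    dists = [math.dist(c, center) for c in coords]
    radius = max(dists)  # radius of the sphere enclosing all atoms
    return [gaussian_membership(d, radius) for d in dists]


def smoothing_matrix(coords):
    """Smoothing value of each pair: average of the two membership degrees."""
    mu = membership_degrees(coords)
    n = len(mu)
    return [[(mu[i] + mu[j]) / 2.0 for j in range(n)] for i in range(n)]


def truncate_relations(relations, coords):
    """ASSUMPTION: the smoothing value scales the relation multiplicatively."""
    s = smoothing_matrix(coords)
    n = len(coords)
    return [[relations[i][j] * s[i][j] for j in range(n)] for i in range(n)]


# Toy example: three collinear "atoms" and their Euclidean distance matrix.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
relations = [[math.dist(a, b) for b in coords] for a in coords]
smoothed = truncate_relations(relations, coords)
```

In this toy case the central atom has membership degree 1 and the two outer atoms share the same lower degree, so relations between the outer atoms are damped more strongly than relations involving the central one. A different membership function (for example, one that favors peripheral atoms) would change which relations are attenuated.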