Fragment-based deep molecular generation using hierarchical chemical graph representation and multi-resolution graph variational autoencoder.
Zhenxiang GaoXinyu WangBlake Blumenfeld GainesXuetao ShiJinbo BiMinghu SongPublished in: Molecular informatics (2023)
Graph generative models have recently emerged as an interesting approach to construct molecular structures atom-by-atom or fragment-by-fragment. In this study, we adopt the fragment-based strategy and decompose each input molecule into a set of small chemical fragments. In drug discovery, a few drug molecules are designed by replacing certain chemical substituents with their bioisosteres or alternative chemical moieties. This inspires us to group decomposed fragments into different fragment clusters according to their local structural environment around bond-breaking positions. In this way, an input structure can be transformed into an equivalent three-layer graph, in which individual atoms, decomposed fragments, or obtained fragment clusters act as graph nodes at each corresponding layer. We further implement a prototype model, named multi-resolution graph variational autoencoder (MRGVAE), to learn embeddings of constituted nodes at each layer in a fine-to-coarse order. Our decoder adopts a similar but conversely hierarchical structure. It first predicts the next possible fragment cluster, then samples an exact fragment structure out of the determined fragment cluster, and sequentially attaches it to the preceding chemical moiety. Our proposed approach demonstrates comparatively good performance in molecular evaluation metrics compared with several other graph-based molecular generative models. The introduction of the additional fragment cluster graph layer will hopefully increase the odds of assembling new chemical moieties absent in the original training set and enhance their structural diversity. We hope that our prototyping work will inspire more creative research to explore the possibility of incorporating different kinds of chemical domain knowledge into a similar multi-resolution neural network architecture.