
Machine learning-assisted search for novel coagulants: When machine learning can be efficient even if data availability is low.

Andrij Rovenchak, Maksym Druchok
Published in: Journal of Computational Chemistry (2024)
Designing new drugs is a challenging process: a candidate molecule should satisfy multiple conditions to act properly and cause the fewest side effects. Perfect candidates selectively attach to and influence only their targets, leaving off-targets intact. The amount of experimental data about various properties of molecules grows constantly, promoting data-driven approaches. However, the applicability of typical predictive machine learning techniques can be substantially limited by a lack of experimental data about a particular target. For example, there are many known Thrombin inhibitors (acting as anticoagulants), but only a very limited number of known Protein C inhibitors (coagulants). In this study, we present our approach to suggesting new inhibitor candidates by building an effective representation of chemical space. To this end, we developed a deep learning model, an autoencoder, trained on a large set of molecules in the SMILES format to map the chemical space. We then applied different sampling strategies to generate novel coagulant candidates. Symmetrically, we tested our approach on anticoagulant candidates, where we were able to predict their inhibition towards Thrombin. We also compare our approach with MegaMolBART, another deep learning generative model that exploits similar principles of navigation in a chemical space.
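The idea of sampling a learned chemical space can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the sampling strategies, so the two shown here (Gaussian perturbation around one latent vector, and linear interpolation between two) are assumed, generic choices. The function names `perturb` and `interpolate`, the 4-dimensional latent codes, and the value of `sigma` are all hypothetical. In a real pipeline each sampled vector would be passed to the trained decoder to obtain a SMILES string, a step omitted here.

```python
import random

def perturb(seed_vec, sigma, n, rng):
    """Draw n latent vectors from an isotropic Gaussian centred on seed_vec."""
    return [[x + rng.gauss(0.0, sigma) for x in seed_vec] for _ in range(n)]

def interpolate(a, b, steps):
    """Return `steps` evenly spaced points on the segment a -> b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [[(1 - t) * x + t * y for x, y in zip(a, b)]
            for t in (i / (steps - 1) for i in range(steps))]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical latent codes standing in for two encoded known inhibitors.
z_a = [0.0, 1.0, -1.0, 0.5]
z_b = [1.0, 0.0, 1.0, -0.5]

neighbours = perturb(z_a, sigma=0.1, n=5, rng=rng)  # local exploration around z_a
path = interpolate(z_a, z_b, steps=5)               # walk from z_a to z_b
```

When only a handful of known actives exist, as with the Protein C inhibitors mentioned above, both strategies need just one or two seed molecules, which is why latent-space sampling of this kind remains usable with scarce target data.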
Keyphrases
  • machine learning
  • deep learning
  • big data
  • artificial intelligence
  • electronic health record
  • convolutional neural network
  • data analysis
  • resistance training
  • protein protein
  • high intensity
  • amino acid
  • drug induced