Thompson Sampling: An Efficient Method for Searching Ultralarge Synthesis on Demand Databases.

Kathryn Klarich, Brian Goldman, Trevor Kramer, Patrick Riley, W Patrick Walters
Published in: Journal of Chemical Information and Modeling (2024)
Over the last five years, virtual screening of ultralarge synthesis-on-demand libraries has emerged as a powerful tool for hit identification in drug discovery programs. As these libraries have grown to tens of billions of molecules, we have reached a point where it is no longer cost-effective to screen every molecule virtually. To address this challenge, several groups have developed heuristic search methods that rapidly identify the best molecules in a virtual screen. This article describes the application of Thompson sampling (TS), an active learning approach that streamlines the virtual screening of large combinatorial libraries by performing a probabilistic search in reagent space, so the library never has to be fully enumerated. TS is a general technique that can be applied to various virtual screening modalities, including 2D and 3D similarity search, docking, and the application of machine-learning models. In an illustrative example, we show that TS can identify more than half of the top 100 molecules from a docking-based virtual screen of 335 million molecules by evaluating 1% of the data set.
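The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of Thompson sampling in reagent space, not the authors' implementation. It assumes a two-component library, a Gaussian belief over each reagent's mean score, and higher scores being better. The names `thompson_search` and `toy_score`, the `prior_std` parameter, and the additive toy scoring function are all hypothetical; in practice the scoring function would be a docking, similarity, or machine-learning evaluation of the assembled product.

```python
import random


def thompson_search(reagent_counts, score_fn, n_iterations, prior_std=1.0, seed=0):
    """Search a combinatorial library without enumerating every product.

    reagent_counts: number of reagents at each reaction component.
    score_fn: maps a tuple of reagent indices to a score (higher is better).
    Returns a dict of evaluated reagent combinations and their scores.
    """
    rng = random.Random(seed)
    # Per-reagent belief: observation count and running mean of product scores.
    counts = [[0] * n for n in reagent_counts]
    means = [[0.0] * n for n in reagent_counts]
    evaluated = {}

    def evaluate(combo):
        if combo in evaluated:
            return  # never score the same product twice
        score = score_fn(combo)
        evaluated[combo] = score
        for comp, idx in enumerate(combo):
            counts[comp][idx] += 1
            means[comp][idx] += (score - means[comp][idx]) / counts[comp][idx]

    # Warm-up: pair every reagent with random partners at least once.
    for comp, n in enumerate(reagent_counts):
        for idx in range(n):
            combo = [rng.randrange(m) for m in reagent_counts]
            combo[comp] = idx
            evaluate(tuple(combo))

    # Search: draw one sample per reagent from its belief, keep the best draw
    # at each component, and score only that single product.
    for _ in range(n_iterations):
        combo = []
        for comp, n in enumerate(reagent_counts):
            draws = [
                rng.gauss(means[comp][i], prior_std / counts[comp][i] ** 0.5)
                for i in range(n)
            ]
            combo.append(max(range(n), key=draws.__getitem__))
        evaluate(tuple(combo))
    return evaluated


# Toy library (an assumption for illustration): 50 x 50 = 2,500 products whose
# score is the sum of hidden per-reagent qualities.
demo_rng = random.Random(1)
quality = [[demo_rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(2)]


def toy_score(combo):
    return sum(quality[comp][idx] for comp, idx in enumerate(combo))


reagent_counts = (50, 50)
evaluated = thompson_search(reagent_counts, toy_score, n_iterations=150)
best = max(evaluated, key=evaluated.get)
print(f"scored {len(evaluated)} of 2500 products; best found: {best}")
```

Because sampling happens per reagent rather than per product, the cost of each iteration scales with the sum of the reagent counts, not their product. The warm-up scheme and the shrinking standard deviation are simplifications chosen for brevity.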
Keyphrases
  • drug discovery
  • high throughput
  • machine learning
  • molecular dynamics
  • big data
  • molecular dynamics simulations
  • protein protein
  • public health
  • electronic health record
  • deep learning
  • neural network