Choice of Adaptive Sampling Strategy Impacts State Discovery, Transition Probabilities, and the Apparent Mechanism of Conformational Changes.
Maxwell I Zimmerman, Justin R Porter, Xianqiang Sun, Roseane R Silva, Gregory R Bowman
Published in: Journal of chemical theory and computation (2018)
Interest in atomically detailed simulations has grown significantly with recent advances in computational hardware and Markov state modeling (MSM) methods, yet outstanding questions remain that hinder their widespread adoption. Namely, how do alternative sampling strategies explore conformational space and how might this influence predictions generated from the data? Here, we seek to answer these questions for four commonly used sampling methods: (1) a single long simulation, (2) many short simulations run in parallel, (3) adaptive sampling, and (4) our recently developed goal-oriented sampling algorithm, FAST. We first develop a theoretical framework for analytically calculating the probability of discovering select states on simple landscapes, where we uncover the drastic effects of varying the number and length of simulations. We then use kinetic Monte Carlo simulations on a variety of physically inspired landscapes to characterize the probability of discovering particular states and transition pathways for each of the four methods. Consistently, we find that FAST simulations discover each target state with the highest probability, while traversing realistic pathways. Furthermore, we uncover the potential pathology that short parallel simulations sometimes predict an incorrect transition pathway by crossing large energy barriers that long simulations would typically circumnavigate. We refer to this pathology as "pathway tunneling". To protect against this phenomenon when using adaptive-sampling and FAST simulations, we introduce the FAST-string method. This method enhances sampling along the highest-flux transition paths to refine an MSM's transition probabilities and discriminate between competing pathways. Additionally, we compare the performance of a variety of MSM estimators in describing accurate thermodynamics and kinetics.
For adaptive sampling, we recommend simply normalizing the transition counts out of each state after adding small pseudocounts to avoid creating sources or sinks. Lastly, we evaluate whether our insights from simple landscapes hold for all-atom molecular dynamics simulations of the folding of the λ-repressor protein. Remarkably, we find that FAST-contacts predicts the same folding pathway as a set of long simulations but with orders of magnitude less simulation time.
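The estimator recommended above for adaptive sampling (add small pseudocounts, then normalize the transition counts out of each state) can be illustrated with a minimal sketch. The function name `estimate_transition_matrix`, the toy count matrix, and the default pseudocount of 1/n added to every element are assumptions made for illustration; the abstract does not specify the pseudocount value or which elements receive it.

```python
def estimate_transition_matrix(counts, pseudocount=None):
    """Row-normalize a square transition count matrix after adding pseudocounts.

    counts[i][j] is the number of observed transitions from state i to state j.
    The pseudocount is added to every element (an assumption of this sketch).
    """
    n = len(counts)
    if any(len(row) != n for row in counts):
        raise ValueError("counts must be a square matrix")
    if pseudocount is None:
        pseudocount = 1.0 / n  # assumed default, not stated in the abstract

    probs = []
    for row in counts:
        padded = [c + pseudocount for c in row]
        total = sum(padded)
        if total == 0:
            raise ValueError("a state has no counts; use a positive pseudocount")
        probs.append([c / total for c in padded])
    return probs


# Toy counts: state 2 was discovered but never observed leaving, so without
# pseudocounts its row could not be normalized (it would act as a sink).
counts = [
    [90, 10, 0],
    [5, 80, 15],
    [0, 0, 0],
]
T = estimate_transition_matrix(counts)
for row in T:
    print([round(p, 4) for p in row])
```

Each row of `T` sums to one, and the row for the unvisited-exit state becomes uniform rather than undefined. Note that this simple normalization does not enforce detailed balance.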