Clustering analysis of movement kinematics in reinforcement learning.
Ananda Sidarta, John Komar, David J. Ostry
Published in: Journal of Neurophysiology (2021)
Reinforcement learning has been used as an experimental model of motor skill acquisition, in which movements that happen to be successful are reinforced. A fundamental problem is to understand how humans choose between exploration and exploitation during learning. This choice may be influenced by factors such as task demands and reward availability. In this study, we applied a clustering algorithm to examine how a change in the accuracy requirements of a task affected the balance between exploration and exploitation. Participants made reaching movements to an unseen target using a planar robot arm and received a reward after each successful movement. For one group of participants, the width of the hidden target decreased after every other training block; for a second group, it remained constant. The clustering algorithm was applied to the kinematic data to characterize motor learning on a trial-to-trial basis as a sequence of movements, each belonging to one of the identified clusters. By the end of learning, movement trajectories across all participants converged primarily to a single cluster, the one with the greatest number of successful trials. Within this analysis framework, we defined exploration and exploitation as behavior in which two successive trajectories belong to different clusters or to the same cluster, respectively. The frequency of each mode of behavior was evaluated over the course of learning. We found that when the target width was reduced, participants used a greater variety of clusters and displayed more exploration than exploitation. Excessive exploration relative to exploitation was found to be detrimental to subsequent motor learning.

NEW & NOTEWORTHY The choice between exploration and exploitation is a fundamental problem in learning new motor skills through reinforcement. In this study, we employed a data-driven approach to characterize movements on a trial-by-trial basis with an unsupervised clustering algorithm. Using this technique, we found that changes in task demands, in particular in the required accuracy of movements, influenced the ratio of exploration to exploitation. This analysis framework provides an attractive tool for investigating the mechanisms of exploratory and exploitative behavior in motor learning.
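The definition of exploration and exploitation used in this framework reduces to comparing the cluster labels of successive trials. The Python sketch below is a minimal illustration, not the authors' code: it assumes each trial has already been assigned an integer cluster label by some clustering step (not shown), and all function names are hypothetical. Two further choices are assumptions made for illustration only: per-block frequencies ignore transitions that cross a block boundary, and the "dominant" cluster is taken to be the one containing the most rewarded trials.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def classify_transitions(labels: Sequence[int]) -> List[str]:
    """Tag each pair of successive trials: 'exploit' if both trajectories fall in
    the same cluster, 'explore' if they fall in different clusters."""
    return ["exploit" if prev == curr else "explore"
            for prev, curr in zip(labels, labels[1:])]


def mode_frequencies(labels: Sequence[int]) -> Tuple[float, float]:
    """Return (exploration frequency, exploitation frequency) over all transitions."""
    modes = classify_transitions(labels)
    if not modes:  # fewer than two trials: no transitions to classify
        return 0.0, 0.0
    n_explore = modes.count("explore")
    return n_explore / len(modes), (len(modes) - n_explore) / len(modes)


def blockwise_frequencies(labels: Sequence[int],
                          block_size: int) -> List[Tuple[float, float]]:
    """Track both modes over learning, one (explore, exploit) pair per block.
    Assumption: transitions that cross a block boundary are ignored."""
    return [mode_frequencies(labels[i:i + block_size])
            for i in range(0, len(labels), block_size)]


def dominant_successful_cluster(labels: Sequence[int],
                                successes: Sequence[bool]) -> int:
    """Hypothetical helper: the cluster containing the most rewarded trials."""
    counts = Counter(lbl for lbl, ok in zip(labels, successes) if ok)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical cluster labels for eight consecutive reaches (not study data).
    labels = [0, 1, 2, 1, 1, 1, 0, 0]
    rewarded = [False, True, False, True, True, True, False, True]
    print(mode_frequencies(labels))
    print(blockwise_frequencies(labels, 4))
    print(dominant_successful_cluster(labels, rewarded))
```

Counting transitions rather than overall cluster usage is what makes the measure sensitive to trial-to-trial switching: a participant could visit many clusters across a session yet still exploit during long runs within one cluster.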