Human thalamic low-frequency oscillations correlate with expected value and outcomes during reinforcement learning.
Antoine Collomb-ClercMaëlle C M GueguenLorella MinottiPhilippe KahaneVincent NavarroFabrice BartolomeiRomain CarronJean M RégisStephan ChabardèsStefano PalminteriJulien BastinPublished in: Nature communications (2023)
Reinforcement-based adaptive decision-making is believed to recruit fronto-striatal circuits. A critical node of the fronto-striatal circuit is the thalamus. However, direct evidence of its involvement in human reinforcement learning is lacking. We address this gap by analyzing intra-thalamic electrophysiological recordings from eight participants while they performed a reinforcement learning task. We found that in both the anterior thalamus (ATN) and dorsomedial thalamus (DMTN), low frequency oscillations (LFO, 4-12 Hz) correlated positively with expected value estimated from computational modeling during reward-based learning (after outcome delivery) or punishment-based learning (during the choice process). Furthermore, LFO recorded from ATN/DMTN were also negatively correlated with outcomes so that both components of reward prediction errors were signaled in the human thalamus. The observed differences in the prediction signals between rewarding and punishing conditions shed light on the neural mechanisms underlying action inhibition in punishment avoidance learning. Our results provide insight into the role of thalamus in reinforcement-based decision-making in humans.