Computational Performance of Deep Reinforcement Learning to Find Nash Equilibria.
Christoph Graf, Viktor Zobernig, Johannes Schmidt, Claude Klöckl. Published in: Computational Economics (2023)
We test the performance of deep deterministic policy gradient (DDPG), a deep reinforcement learning algorithm able to handle continuous state and action spaces, in finding Nash equilibria in a setting where firms compete in offer prices through a uniform price auction. Although such algorithms are typically considered "model-free", they depend on a large set of tunable parameters, including learning rates, memory buffer sizes, state space dimensioning, normalizations, and noise decay rates. The purpose of this work is to systematically test the effect of these parameter configurations on convergence to the analytically derived Bertrand equilibrium. We find parameter choices that reach convergence rates of up to 99%. We also show that the algorithm converges in more complex settings with multiple players and different cost structures. Its reliable convergence may make the method a useful tool for studying the strategic behavior of firms, even in more complex settings.
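To make the competitive setting concrete, the following is a minimal sketch of a uniform price auction clearing step in Python. The function name, the two-firm setup, and all numerical values (costs, capacities, demand) are illustrative assumptions, not taken from the paper; the paper's actual environment, demand model, and cost structures may differ.

import numpy as np

def uniform_price_auction(offers, capacities, costs, demand):
    """Clear a stylized uniform price auction.

    Firms submit offer prices; capacity is dispatched in merit order
    (cheapest offers first) until demand is met, and every dispatched
    unit is paid the offer of the marginal (last accepted) unit.
    """
    order = np.argsort(offers)          # merit order: lowest offers first
    dispatched = np.zeros(len(offers))
    remaining = demand
    clearing_price = 0.0
    for i in order:
        if remaining <= 0:
            break
        q = min(capacities[i], remaining)
        dispatched[i] = q
        remaining -= q
        clearing_price = offers[i]      # marginal unit sets the price
    profits = dispatched * (clearing_price - costs)
    return clearing_price, profits

# Two symmetric firms with marginal cost 10 and demand below each
# firm's capacity: undercutting drives offers to marginal cost
# (the Bertrand outcome), so the clearing price is 10 and profits vanish.
price, profits = uniform_price_auction(
    offers=np.array([10.0, 10.0]),
    capacities=np.array([100.0, 100.0]),
    costs=np.array([10.0, 10.0]),
    demand=80.0,
)
print(price, profits)  # 10.0 [0. 0.]

In a learning experiment of the kind the abstract describes, a function like this would serve as the reward signal: each agent's action is its offer price, and convergence can be checked by how often learned offers settle at marginal cost.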