This paper presents a theoretical framework for probably approximately correct (PAC) multi-agent reinforcement learning (MARL) algorithms for Markov games. Using the idea of delayed Q-learning, the paper extends the well-known Nash Q-learning algorithm to build a new PAC MARL algorithm for general-sum Markov games. Beyond guiding the design of a provably PAC MARL algorithm, the framework enables checking whether an arbitrary MARL algorithm is PAC. Comparative numerical results demonstrate the algorithm's performance and robustness.
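To make the combination of ideas concrete, the sketch below illustrates, for a two-agent general-sum Markov game, how a delayed-Q-style update schedule (accumulate m samples per state/joint-action pair, then update only if the value drops by at least eps1) can be paired with Nash-Q-style value targets at the next state. This is a minimal illustration under simplifying assumptions, not the algorithm analyzed in the paper: the class and function names (DelayedNashQ, stage_pure_nash_value), the parameters m and eps1, and the use of a pure-strategy stage-game equilibrium with a crude fallback are all choices made for this sketch only.

```python
# Illustrative sketch only: delayed-Q-style updates with Nash-Q-style targets
# for a two-agent general-sum Markov game. Not the paper's algorithm.
import numpy as np
from collections import defaultdict


def stage_pure_nash_value(Q1, Q2):
    """Return joint payoffs of a pure-strategy Nash equilibrium of the stage
    game (Q1, Q2), or the maximum entries as a crude fallback.

    Q1, Q2: arrays of shape (n_actions_1, n_actions_2) giving each agent's
    stage-game payoff for every joint action.
    """
    n1, n2 = Q1.shape
    for a1 in range(n1):
        for a2 in range(n2):
            best1 = Q1[:, a2].max()   # agent 1's best response to a2
            best2 = Q2[a1, :].max()   # agent 2's best response to a1
            if Q1[a1, a2] >= best1 and Q2[a1, a2] >= best2:
                return Q1[a1, a2], Q2[a1, a2]
    return Q1.max(), Q2.max()         # fallback if no pure equilibrium exists


class DelayedNashQ:
    """Delayed-Q-style update schedule applied to Nash-Q value targets."""

    def __init__(self, n_states, n_a1, n_a2, gamma=0.95, m=20, eps1=0.01):
        shape = (n_states, n_a1, n_a2)
        self.Q1 = np.ones(shape) / (1 - gamma)   # optimistic initialization
        self.Q2 = np.ones(shape) / (1 - gamma)
        self.count = defaultdict(int)             # samples seen per (s, a1, a2)
        self.acc1 = defaultdict(float)            # accumulated targets, agent 1
        self.acc2 = defaultdict(float)            # accumulated targets, agent 2
        self.learn = defaultdict(lambda: True)    # LEARN flags per (s, a1, a2)
        self.gamma, self.m, self.eps1 = gamma, m, eps1

    def observe(self, s, a1, a2, r1, r2, s_next):
        """Accumulate one sample; attempt an update after m samples."""
        key = (s, a1, a2)
        if not self.learn[key]:
            return
        v1, v2 = stage_pure_nash_value(self.Q1[s_next], self.Q2[s_next])
        self.acc1[key] += r1 + self.gamma * v1
        self.acc2[key] += r2 + self.gamma * v2
        self.count[key] += 1
        if self.count[key] == self.m:
            target1 = self.acc1[key] / self.m
            # Delayed-Q rule: commit the update only if it lowers Q1 by at
            # least eps1; otherwise freeze this pair (simplified: the flag is
            # not re-enabled by later successful updates, unlike delayed Q).
            if self.Q1[key] - target1 >= self.eps1:
                self.Q1[key] = target1 + self.eps1
                self.Q2[key] = self.acc2[key] / self.m + self.eps1
            else:
                self.learn[key] = False
            self.count[key] = 0
            self.acc1[key] = self.acc2[key] = 0.0
```

The optimistic initialization and the eps1-threshold on updates are the ingredients that delayed Q-learning uses to bound the number of suboptimal steps; the Nash-valued target is what carries the idea over to the multi-agent, general-sum setting.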