Reply to Hu et al.: Applying different evaluation standards to humans vs. Large Language Models overestimates AI performance.

Published in: Proceedings of the National Academy of Sciences of the United States of America (2024)

Keyphrases