© 2025 Bryan Caplan
https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
"As further evidence for this hypothesis, we tested it on Codeforces problems from different times in 2021. We found that it could regularly solve problems in the easy category before September 5, but none of the problems after September 12.
In fact, we can definitively show that it has memorized problems in its training set: when prompted with the title of a Codeforces problem, GPT-4 includes a link to the exact contest where the problem appears (and the round number is almost correct: it is off by one). Note that GPT-4 cannot access the Internet, so memorization is the only explanation."
It sounds like GPT-4 was trained on questions similar to your exam questions. Have you tried asking it to generate exam questions like the ones you gave, to get a sense of what it may have seen in training? Another thing to try would be writing new questions that test the same concepts.
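The before/after-cutoff comparison described in the quoted post could be sketched roughly as follows. This is a minimal illustration with made-up problem IDs and invented solved/unsolved outcomes; in practice the `solved` flags would come from actually running the model on each problem, and the exact cutoff date is an assumption based on the quote:

```python
from datetime import date

# Hypothetical records: (problem_id, contest_date, solved_by_model).
# The IDs and outcomes below are invented for illustration only.
results = [
    ("1559A", date(2021, 8, 16), True),
    ("1560B", date(2021, 8, 19), True),
    ("1593A", date(2021, 10, 12), False),
    ("1598B", date(2021, 10, 17), False),
]

# Approximate training-data cutoff suggested by the quoted test.
CUTOFF = date(2021, 9, 5)

def solve_rate(records):
    """Fraction of problems in `records` the model solved."""
    return sum(solved for _, _, solved in records) / len(records)

before = [r for r in results if r[1] < CUTOFF]
after = [r for r in results if r[1] >= CUTOFF]

print(f"solve rate before cutoff: {solve_rate(before):.0%}")
print(f"solve rate after cutoff:  {solve_rate(after):.0%}")
```

A sharp drop in solve rate at the cutoff, as in the quoted Codeforces test, suggests memorization rather than genuine problem-solving ability.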