Mastery and monsters
Spiky intelligence and rote learning
I almost failed one of my first year university mathematics exams. I suspect some people have a vision of university maths as just doing bigger calculations. More numbers. Longer equations. Solve for x, y and z.
But new undergrads soon hit a steep new learning curve with what’s known as ‘mathematical analysis’. Suddenly, maths is no longer just about performing calculations; it’s about building rigorous proofs.
This means being able to solve problems like the one below:
Let (an) be a real sequence. Prove that if (an) converges to both L and M, then L = M.
Now you might be thinking ‘well, if (an) converges to both L and M, then it’s obvious that L = M’. And if you thought this then, much like first-year undergraduate Adam, you’d leave the exam with very few marks¹.
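To give a flavour of what a full answer involves, here is a sketch of the standard argument. Notation and mark schemes vary by course, so treat this as an illustration rather than the exam’s model solution:

```latex
Suppose $(a_n) \to L$ and $(a_n) \to M$ with $L \neq M$, and set
$\varepsilon = \tfrac{1}{2}\lvert L - M\rvert > 0$. By the definition of
convergence there exist $N_1$ and $N_2$ such that
$\lvert a_n - L\rvert < \varepsilon$ for all $n \geq N_1$, and
$\lvert a_n - M\rvert < \varepsilon$ for all $n \geq N_2$. For any
$n \geq \max(N_1, N_2)$, the triangle inequality then gives
\[
  \lvert L - M\rvert \le \lvert L - a_n\rvert + \lvert a_n - M\rvert
  < \varepsilon + \varepsilon = \lvert L - M\rvert,
\]
a contradiction. Hence $L = M$.
```

The point is that every step leans on the definition of convergence, rather than on what feels obvious.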
There’s good reason for demanding such rigour; in the 19th Century, many theorems assumed to be ‘obvious’ would end up collapsing when they encountered concepts like infinity. It would take mathematicians like Karl Weierstrass, Bernhard Riemann and others to put things back on a solid footing, and make sure proofs behaved even when dealing with things that were infinitely small or infinitely large.
Over time, I learned how to do mathematical analysis well and rigorously, and by the end of my degree, these exams produced some of my highest marks. In turn, I’d spend my PhD getting better at Bayesian inference, a topic I hadn’t focused on much as an undergrad. In the process, some previous expertise fell by the wayside. If you’d asked me in sixth form to write down Newton’s equations of motion off the cuff, it would have been easy. But if you’d asked me as a PhD student, I’d have had to look them up.
Exams that were once hard had become easy, and vice versa. I remember one morning during my PhD, sat discussing some questions from that year’s Sixth Term Examination Paper (STEP) with my fellow students. STEP is an exam for A-Level students hoping to get on to leading UK maths degree courses; it’s designed to reach beyond school level, examining university-like mathematical thinking. It’s administered by the University of Cambridge, and PhD students would often help out with marking.
When I’d taken STEP at school, I’d found the physics questions easiest. The main challenge was adapting familiar equations to new questions. Conceptual proof-like problems about properties of numbers or functions seemed much harder. Yet when I looked over that STEP paper years later in the Cambridge coffee room, it hit me that the opposite was now true. The abstract questions now seemed easy and the physics questions drew a blank. I’d forgotten the physics equations I’d rote-learned, while building the logical toolkit I needed to tackle pure maths questions.
In other words, my ability hadn’t increased evenly; it had become spiky. Training to build depth in one area had come at the cost of breadth in another.
This ‘spikiness’ in knowledge is now a common theme in discussions of AI skill. For leading models, the performance profile isn’t rounded: it juts out a long way in some areas and barely registers in others.
This isn’t an accident. AI models are effectively targeting top marks on the same narrow set of exams. And that makes it difficult to define what ‘good’ means.
In a recent post, Maria Sukhareva pointed out that many LLM performance benchmarks aren’t necessarily as impressive as they might seem. Take the AIME 2025 mathematical benchmark, with questions taken from the 2025 American Invitational Mathematics Examination. GPT-5.2 scored 100% in AIME 2025. As Sukhareva notes:
But could OpenAI fine-tune their model on AIME 2025 to get 100%?
They don’t even need to. The questions and answers are all over the internet. These thirty questions are public, they could have just trained on them or fine-tune an already-trained model on it if the dataset was published after knowledge cut-off.
If this is the case, it’s like a student boasting they’ve got top marks on a past exam paper, after having used that same paper as a revision tool.
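To make the ‘training on the test’ worry concrete, one rough heuristic is to measure how much of a benchmark’s text already appears verbatim in the training data, for example via n-gram overlap. Below is a minimal Python sketch of that idea; the benchmark question and corpus are hypothetical placeholders, and this isn’t a description of any particular lab’s decontamination pipeline:

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Word-level n-grams of a piece of text, lower-cased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contamination_score(question: str, corpus_ngrams: set[tuple[str, ...]], n: int = 8) -> float:
    """Fraction of a question's n-grams that also appear in the corpus."""
    q_ngrams = ngrams(question, n)
    if not q_ngrams:
        return 0.0
    return len(q_ngrams & corpus_ngrams) / len(q_ngrams)


# Hypothetical placeholders: one benchmark question and a scraped training corpus.
benchmark_questions = [
    "Find the number of ordered pairs of positive integers (a, b) such that a + b divides ab.",
]
training_corpus = (
    "Forum post: here is this year's paper. Find the number of ordered pairs of "
    "positive integers (a, b) such that a + b divides ab. Solutions welcome below."
)

corpus_ngrams = ngrams(training_corpus)
for q in benchmark_questions:
    print(f"overlap = {contamination_score(q, corpus_ngrams):.2f}  |  {q[:50]}...")
```

A high overlap doesn’t prove a model memorised the answer, but it does mean a top score tells you less than it seems to.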
This can explain why LLMs that appear very good at some narrow tasks can perform very poorly on others. When learning is focused on a narrow region of the space of possible problems, it can deliver impressive results so long as the task remains stable and repetitive. But, much like a maths student relying heavily on past exam experience, it can lead to confident overfitting when the structure of the problem changes.
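The same failure mode shows up in miniature with classical curve fitting. The sketch below is a textbook overfitting demonstration, not a model of any specific LLM: a high-degree polynomial fitted to a handful of points does near-perfectly on the ‘questions’ it has already seen, then goes wildly wrong on values just outside that range.

```python
import numpy as np

rng = np.random.default_rng(0)

# A narrow 'exam syllabus': a few noisy samples of a simple underlying function.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, size=x_train.shape)

# Fit a high-degree polynomial: enough capacity to memorise the training set.
coeffs = np.polyfit(x_train, y_train, deg=9)


def predict(x):
    return np.polyval(coeffs, x)


# Near-perfect on the familiar questions...
train_error = np.mean((predict(x_train) - y_train) ** 2)

# ...but the problem only has to shift slightly for the fit to fall apart.
x_new = np.linspace(1.05, 1.25, 10)
y_new = np.sin(2 * np.pi * x_new)
new_error = np.mean((predict(x_new) - y_new) ** 2)

print(f"error on familiar questions:    {train_error:.4f}")
print(f"error just outside the syllabus: {new_error:.4f}")
```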
This can also be true of human expertise. Great pure mathematicians may struggle with data-driven problems. Strong statisticians may be uncomfortable with physics. Talented physicists may be weak at pure maths.
The reason researchers poured so much effort into the field of mathematical analysis in the 19th Century? They wanted to be confident that their results would actually hold true. They didn’t want awkward counterexamples – or ‘mathematical monsters’, as some called them – to come along in future and trample on their work.
But the spikiness of AI knowledge, and risk of overfitting to narrow tasks, means we don’t currently have that confidence when it comes to artificial skill. What seems like ‘good’ performance in one situation won’t necessarily translate to another. We may get mastery – or we may get a monster.
If you’re interested in reading more about Weierstrass, Riemann and mathematical monsters, you might like my latest book Proof: The Uncertain Science of Certainty.
Cover image: Antoine Dautry
¹ The person who wrote the exam later got a Fields Medal, so I guess I shouldn’t feel too bad that it was hard.


Even leaving aside difficult concepts related to infinity, there are problems whose solution seems obvious, even to highly intelligent and well-educated people (including the legendary mathematician Paul Erdős), but the obvious answer is wrong. For example, it seems obvious in the Monty Hall problem that your probability of winning the car is the same whether or not you switch your initial choice, but in fact the probability doubles if you switch. Some rigorous analysis is required to see this clearly.
My university math professor’s favorite phrase was “it’s intuitively obvious”.