
OpenAI's new AI model has achieved gold-medal-level performance at the 2025 International Mathematical Olympiad (IMO). The unreleased model was evaluated under official test conditions, without access to the internet or any coding tools. It scored 35 out of 42 points, solving five of the six problems (each worth 7 points) and earning just enough to clear the gold medal threshold. The IMO is regarded as the most challenging math competition for high school students worldwide.
Unlike Google DeepMind's task-specific AlphaGeometry 2, OpenAI's new model is built on a general-purpose reasoning system. Alexander Wei, a member of OpenAI's technical staff, stated on X (formerly Twitter):
“We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.”
In other words, the model wasn't trained only for geometry or Olympiad-style problems; it was optimized for broad, general reasoning.
The AI-generated proofs were graded by three former IMO medalists, who reached unanimous agreement on the scores. According to OpenAI, the model wrote complete natural-language solutions, much as human mathematicians build multi-page arguments out of lemmas.
However, critics have raised red flags. NYU professor Gary Marcus stated on X:
“OpenAI has told us the result, but not how it was achieved. That leaves me with many questions.”
The official IMO coordinators have not yet independently verified the result.
OpenAI's announcement came just a few days after DARPA launched an initiative to enable AI to coauthor advanced mathematical research. Sam Altman, CEO of OpenAI, weighed in on the result, noting:
“This is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.”
The DARPA initiative is in keeping with the agency's history of backing foundational technological breakthroughs, such as ARPANET.
OpenAI's experimental result shows the potential of large language models to reason through complex, difficult problems rather than solely generating fluent text. Despite concerns about transparency and reproducibility, it could represent a pivotal moment on the journey toward artificial general intelligence.
If validated, this achievement is likely to change how we perceive machine reasoning and how humans and AI might collaborate to make new discoveries.