Coding

A recent experience with ChatGPT 5.5 Pro

Tim Gowers recently described an experiment in which ChatGPT 5.5 Pro was given a research paper in additive number theory and asked to improve one of its results. The model's performance raises awkward questions about how AI-produced mathematics should be published, and about what it will now mean to train PhD students in the subject.

ChatGPT 5.5 Pro has produced a piece of PhD-level mathematical research in under two hours, improving an upper bound in additive number theory from exponential to polynomial — a result that a human mathematician described as containing an "original and clever" idea that would have taken a week or two of pondering.

The model was given access to a paper by Mel Nathanson on problems in additive number theory, specifically concerning the possible sizes of sumsets given the size of a set of integers. Nathanson had asked whether a bound on the diameter of certain sets could be improved. ChatGPT 5.5 Pro thought for 17 minutes and 5 seconds before providing a construction that yielded a quadratic upper bound, which is best possible. After a further 2 minutes and 23 seconds, it wrote the argument up in LaTeX in the style of a typical mathematical preprint.
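For orientation, the basic objects can be set out in standard notation. The exact quantity Nathanson asks about is more specific than this sketch, so treat it as background rather than as a statement of his problem:

\[
A + A = \{\, a + a' : a, a' \in A \,\}, \qquad \operatorname{diam}(A) = \max A - \min A .
\]

For a set $A$ of $n$ integers one always has $2n - 1 \le |A + A| \le \binom{n+1}{2}$, and questions of this kind ask how small $\operatorname{diam}(A)$ can be when $|A + A|$ is required to take a prescribed value.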

What the model achieved

The model was then asked to tackle a closely related problem involving restricted sumsets, which it did with no trouble. The real test came when it was asked to generalize the result to arbitrary k. The model improved the upper bound from exponential in k to exponential in (1/2 + ε)k for any ε > 0, and then, after being asked to push further, produced a bound polynomial in k. Isaac Rajagopal, an MIT student whose earlier work provided the framework, examined the result and declared it "almost certainly correct" at both the line-by-line level and the level of ideas.
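Schematically, writing $B(k)$ for the quantity being bounded ($B$ and $C$ are illustrative notation here, not taken from the paper), the progression over the successive rounds of prompting was:

\[
B(k) \le C^{k}
\quad\longrightarrow\quad
B(k) \le C^{(1/2 + \varepsilon)k} \ \text{for every } \varepsilon > 0
\quad\longrightarrow\quad
B(k) \le k^{O(1)} .
\]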

The key idea

Rajagopal explained that the model's improvement relied on using k-dissociated sets — sets where certain additive equations have only trivial solutions — to control relations of order up to k. The construction used sets that behave like "half a geometric series squeezed into a polynomial interval," which Rajagopal described as counterintuitive and "completely original." The model built on Rajagopal's own construction methods but replaced exponential-sized components with polynomial-sized ones using a technique from Singer (1938) and Bose–Chowla (1963).
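The article does not spell out the definition, but a standard formalisation of dissociativity runs as follows; the preprint's precise definition may differ in detail. A set $A$ of integers is dissociated if

\[
\sum_{a \in A} \varepsilon_a a = 0 \ \text{ with } \ \varepsilon_a \in \{-1, 0, 1\}
\quad\Longrightarrow\quad
\varepsilon_a = 0 \ \text{ for all } a \in A ,
\]

and $k$-dissociated if this holds for all coefficient vectors with at most $k$ nonzero entries. The role of Singer and Bose–Chowla here is that their classical constructions produce $B_k$ sets (sets whose $k$-element sums are pairwise distinct, a condition closely related to, though not identical with, order-$k$ dissociativity) of size $q$ inside $\{1, \dots, q^{k}\}$: additive independence of order $k$ inside an interval of length polynomial in the size of the set, where the obvious example, a geometric series, needs an exponentially long interval.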

What this means for mathematical research

Tim Gowers, who conducted the experiment, noted that the result would be publishable if produced by a human mathematician. He raised the question of what to do with such content: it is not "AI slop," but submitting it to a journal or arXiv seems pointless. Gowers suggested a separate repository where AI-produced results could live, possibly with moderation by human mathematicians or formalization by proof assistants.

Gowers also reflected on the implications for training PhD students. "The lower bound for contributing to mathematics will now be to prove something that LLMs can't prove," he wrote. He qualified this by noting that students can collaborate with LLMs, and that the model's contributions in this case were not game-changing ideas but a non-trivial extension of existing work. He predicted that by 2029, what it means to undertake research in mathematics will have changed "out of all recognition."

Tradeoffs

The model's initial write-up was in a "slightly rambling LLM-ish style", and a further prompt was needed to turn it into a proper preprint. The result also relied heavily on Rajagopal's existing framework: it was an extension, not a solution from scratch. Gowers noted that the model's success may not generalize to other areas of mathematics, particularly those that require forward reasoning and the ability to discriminate between interesting and uninteresting observations.

Bottom line

ChatGPT 5.5 Pro demonstrated the ability to digest a research paper, identify a place where an existing argument could be improved, and produce a correct, publishable extension — all with minimal human input. The result is a concrete example of LLMs moving from solving known problems to contributing original ideas, albeit within a well-defined framework. For mathematicians, the practical takeaway is that the bar for what constitutes a "gentle problem" for beginners has just been raised.

Similar Articles


Coding 1 min

Visual Studio Code 1.120

Visual Studio Code’s 1.120 update slashes debugging friction with native Data Breakpoints, letting engineers pause execution when specific object properties change—not just memory addresses. The release also bakes in GitHub Copilot-powered inline code completions for Python, JavaScript, and TypeScript, cutting keystrokes by up to 40% in early benchmarks, while a revamped terminal shell integration finally bridges the gap between local and remote workflows.

Coding 1 min

Using Claude Code: The unreasonable effectiveness of HTML

A lowly web markup language has been repurposed as a surprisingly potent tool for natural language processing, with developers leveraging HTML's structural semantics to fine-tune large language models and achieve state-of-the-art performance in tasks like text classification and sentiment analysis. By exploiting HTML's inherent hierarchical organization, researchers have discovered an unorthodox yet effective method for injecting domain knowledge into language models. This unconventional approach has yielded remarkable results, outperforming more traditional methods in several key benchmarks.

Coding 1 min

Over 97% of the 'Linux' Foundation's Budget Goes Not to Linux

A staggering 97.4% of the Linux Foundation's annual budget is allocated to non-Linux projects, raising questions about the organization's name and purpose. The majority of funds are directed towards Kubernetes, a container orchestration system, and other non-Linux initiatives, such as the Confidential Computing Consortium and the Open Networking Foundation. This shift away from Linux development has sparked debate among the open-source community.

Coding 1 min

People Hate AI Art

As AI-generated art faces mounting backlash, a growing chorus of critics is calling for greater transparency in the creative process, citing concerns over authorship and the role of humans in the artistic decision-making loop. The controversy centers on generative image models such as VQ-VAE-2 and diffusion systems, which some argue enable machines to produce convincing, yet unoriginal, works. A proposed solution involves implementing "artist credits" for AI tools, akin to those required for human collaborators.

Coding 1 min

Tesla Model Y Passes NHTSA's New 'Advanced Driver Assistance System' Tests

Tesla's Model Y becomes the first production vehicle to clear the National Highway Traffic Safety Administration's stringent new tests for Advanced Driver Assistance Systems, specifically the 'Level 2+ with Highway Assist' benchmark, which evaluates the vehicle's ability to maintain lane position and adjust speed in response to changing traffic conditions. The tests simulate real-world scenarios, including highway merges and lane changes. This milestone marks a crucial step towards widespread adoption of semi-autonomous driving technology.

Coding 1 min

Show HN: CADara – I made an open-source in-browser CAD

A lone developer's open-source CAD project, CADara, is redefining browser-based computer-aided design with its novel application of WebGL 2.0 and WebGPU, enabling real-time 3D modeling and rendering in a web browser without the need for proprietary plugins or software installations. This breakthrough has significant implications for accessibility and collaboration in the design industry.