Coding

A recent experience with ChatGPT 5.5 Pro

Tim Gowers recently described an experiment in which ChatGPT 5.5 Pro was given a research paper in additive number theory and asked to improve one of its results. The model's performance raises awkward questions about how AI-produced mathematics should be published, and about what it will now mean to train PhD students in the subject.

ChatGPT 5.5 Pro has produced a piece of PhD-level mathematical research in under two hours, improving an upper bound in additive number theory from exponential to polynomial — a result that a human mathematician described as containing an "original and clever" idea that would have taken a week or two of pondering.

The model was given access to a paper by Mel Nathanson on problems in additive number theory, specifically concerning the possible sizes of sumsets given the size of a set of integers. Nathanson had asked whether a bound on the diameter of certain sets could be improved. ChatGPT 5.5 Pro thought for 17 minutes and 5 seconds before providing a construction that yielded a quadratic upper bound, which is best possible. After a further 2 minutes and 23 seconds, it wrote the argument up in LaTeX in the style of a typical mathematical preprint.
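For orientation, the basic objects can be set out in standard notation. The exact quantity Nathanson asks about is more specific than this sketch, so treat it as background rather than as a statement of his problem:

\[
A + A = \{\, a + a' : a, a' \in A \,\}, \qquad \operatorname{diam}(A) = \max A - \min A .
\]

For a set $A$ of $n$ integers one always has $2n - 1 \le |A + A| \le \binom{n+1}{2}$, and questions of this kind ask how small $\operatorname{diam}(A)$ can be when $|A + A|$ is required to take a prescribed value.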

What the model achieved

The model was then asked to tackle a closely related problem involving restricted sumsets, which it did with no trouble. The real test came when it was asked to generalize the result to arbitrary k. The model improved the upper bound from exponential in k to exponential in (1/2 + ε)k for any ε > 0, and then, after being asked to push further, produced a bound polynomial in k. Isaac Rajagopal, an MIT student whose earlier work provided the framework, examined the result and declared it "almost certainly correct" at both the line-by-line level and the level of ideas.
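Schematically, writing $B(k)$ for the quantity being bounded ($B$ and $C$ are illustrative notation here, not taken from the paper), the progression over the successive rounds of prompting was:

\[
B(k) \le C^{k}
\quad\longrightarrow\quad
B(k) \le C^{(1/2 + \varepsilon)k} \ \text{for every } \varepsilon > 0
\quad\longrightarrow\quad
B(k) \le k^{O(1)} .
\]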

The key idea

Rajagopal explained that the model's improvement relied on using k-dissociated sets — sets where certain additive equations have only trivial solutions — to control relations of order up to k. The construction used sets that behave like "half a geometric series squeezed into a polynomial interval," which Rajagopal described as counterintuitive and "completely original." The model built on Rajagopal's own construction methods but replaced exponential-sized components with polynomial-sized ones using a technique from Singer (1938) and Bose–Chowla (1963).
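The article does not spell out the definition, but a standard formalisation of dissociativity runs as follows; the preprint's precise definition may differ in detail. A set $A$ of integers is dissociated if

\[
\sum_{a \in A} \varepsilon_a a = 0 \ \text{ with } \ \varepsilon_a \in \{-1, 0, 1\}
\quad\Longrightarrow\quad
\varepsilon_a = 0 \ \text{ for all } a \in A ,
\]

and $k$-dissociated if this holds for all coefficient vectors with at most $k$ nonzero entries. The role of Singer and Bose–Chowla here is that their classical constructions produce $B_k$ sets (sets whose $k$-element sums are pairwise distinct, a condition closely related to, though not identical with, order-$k$ dissociativity) of size $q$ inside $\{1, \dots, q^{k}\}$: additive independence of order $k$ inside an interval of length polynomial in the size of the set, where the obvious example, a geometric series, needs an exponentially long interval.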

What this means for mathematical research

Tim Gowers, who conducted the experiment, noted that the result would be publishable if produced by a human mathematician. He raised the question of what to do with such content: it is not "AI slop," but submitting it to a journal or arXiv seems pointless. Gowers suggested a separate repository where AI-produced results could live, possibly with moderation by human mathematicians or formalization by proof assistants.

Gowers also reflected on the implications for training PhD students. "The lower bound for contributing to mathematics will now be to prove something that LLMs can't prove," he wrote. He qualified this by noting that students can collaborate with LLMs, and that the model's contributions in this case were not game-changing ideas but a non-trivial extension of existing work. He predicted that by 2029, what it means to undertake research in mathematics will have changed "out of all recognition."

Tradeoffs

The model's initial write-up was in a "slightly rambling LLM-ish style", and a further prompt was needed to turn it into a proper preprint. The result also relied heavily on Rajagopal's existing framework: it was an extension, not a solution from scratch. Gowers noted that the model's success may not generalize to other areas of mathematics, particularly those that require forward reasoning and the ability to discriminate between interesting and uninteresting observations.

Bottom line

ChatGPT 5.5 Pro demonstrated the ability to digest a research paper, identify a place where an existing argument could be improved, and produce a correct, publishable extension — all with minimal human input. The result is a concrete example of LLMs moving from solving known problems to contributing original ideas, albeit within a well-defined framework. For mathematicians, the practical takeaway is that the bar for what constitutes a "gentle problem" for beginners has just been raised.

Similar Articles


Coding 1 min

Visual Studio Code 1.120

Visual Studio Code’s 1.120 update slashes debugging friction with native Data Breakpoints, letting engineers pause execution when specific object properties change—not just memory addresses. The release also bakes in GitHub Copilot-powered inline code completions for Python, JavaScript, and TypeScript, cutting keystrokes by up to 40% in early benchmarks, while a revamped terminal shell integration finally bridges the gap between local and remote workflows.

Coding 1 min

Using Claude Code: The unreasonable effectiveness of HTML

A lowly web markup language has been repurposed as a surprisingly potent tool for natural language processing, with developers leveraging HTML's structural semantics to fine-tune large language models and achieve state-of-the-art performance in tasks like text classification and sentiment analysis. By exploiting HTML's inherent hierarchical organization, researchers have discovered an unorthodox yet effective method for injecting domain knowledge into language models. This unconventional approach has yielded remarkable results, outperforming more traditional methods in several key benchmarks.

Coding 1 min

Over 97% of the 'Linux' Foundation's Budget Goes Not to Linux

A staggering 97.4% of the Linux Foundation's annual budget is allocated to non-Linux projects, raising questions about the organization's name and purpose. The majority of funds are directed towards Kubernetes, a container orchestration system, and other non-Linux initiatives, such as the Confidential Computing Consortium and the Open Networking Foundation. This shift away from Linux development has sparked debate among the open-source community.

Coding 1 min

People Hate AI Art

As AI-generated art faces mounting backlash, a growing chorus of critics is calling for greater transparency in the creative process, citing concerns over authorship and the role of humans in the artistic decision-making loop. The controversy centers on generative image models such as VQ-VAE-2 and diffusion systems, which some argue enable machines to produce convincing, yet unoriginal, works. A proposed solution involves implementing "artist credits" for AI tools, akin to those required for human collaborators.

Coding 1 min

Tesla Model Y Passes NHTSA's New 'Advanced Driver Assistance System' Tests

Tesla's Model Y becomes the first production vehicle to clear the National Highway Traffic Safety Administration's stringent new tests for Advanced Driver Assistance Systems, specifically the 'Level 2+ with Highway Assist' benchmark, which evaluates the vehicle's ability to maintain lane position and adjust speed in response to changing traffic conditions. The tests simulate real-world scenarios, including highway merges and lane changes. This milestone marks a crucial step towards widespread adoption of semi-autonomous driving technology.

Coding 1 min

Show HN: CADara – I made an open-source in-browser CAD

A lone developer's open-source CAD project, CADara, is redefining browser-based computer-aided design with its novel application of WebGL 2.0 and WebGPU, enabling real-time 3D modeling and rendering in a web browser without the need for proprietary plugins or software installations. This breakthrough has significant implications for accessibility and collaboration in the design industry.