{ "headline": "AI Safety Protocols Exposed", "synthesis": A long-overlooked vulnerability in AI safety protocols is being exposed by a growing number of edge cases, where seemingly innocuous model updates can have catastrophic consequences. Researchers have identified a class of "adversarial perturbations" that can be injected into model weights, compromising downstream applications.
Overview
The AI safety field treats catastrophic risk as the priority, with most investment going into this area. However, everyday cognitive and mental health harm is often treated as a footnote. This disconnect is problematic, as people in distress use every communication tool available to them, and ChatGPT is now one of the most-used tools on the planet.
The Issue with Current Protocols
Every week, between 1.2 and 3 million ChatGPT users show signals of psychosis, mania, suicidal planning, or unhealthy emotional dependence on the model. These numbers come from OpenAI itself, but there is no independent audit, no time series, and no disclosed methodology. The current protocol for suicidal ideation is a soft redirect: a crisis-hotline link is offered and the conversation continues. Mass-destruction (CBRN) content, by contrast, gets a hard wall: the conversation ends.
The argument is that the safety frameworks built for catastrophic risk have been extended to cognitive harm only as monitoring, never as gating. Labs measure what they have been pressured to measure, and gating decisions reflect what they consider unacceptable to ship. The current set of unacceptable-to-ship behaviors includes no cognitive harm, regardless of measured severity.
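To make the monitoring-versus-gating asymmetry concrete, here is a minimal sketch of what such a policy could look like in code. The category names, threshold, and handler are illustrative assumptions for this sketch, not any lab's actual moderation pipeline.

# Hypothetical sketch of the monitoring-vs-gating asymmetry described above.
# Category names, threshold, and handler are illustrative assumptions,
# not any lab's actual pipeline.
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    CONTINUE = auto()        # no intervention
    SOFT_REDIRECT = auto()   # append a hotline link, keep the conversation open
    HARD_WALL = auto()       # refuse and end the conversation


@dataclass
class Signal:
    category: str   # e.g. "cbrn", "suicidal_ideation"
    score: float    # classifier confidence in [0, 1]


# Only catastrophic-risk categories are gated; cognitive-harm categories
# are merely monitored, no matter how high the measured score.
GATED = {"cbrn", "mass_casualty"}
MONITORED = {"suicidal_ideation", "psychosis_mania", "emotional_reliance"}


def handle(signal: Signal, threshold: float = 0.8) -> Action:
    if signal.score < threshold:
        return Action.CONTINUE
    if signal.category in GATED:
        return Action.HARD_WALL      # unacceptable to ship: conversation ends
    if signal.category in MONITORED:
        return Action.SOFT_REDIRECT  # logged and redirected, never gated
    return Action.CONTINUE


if __name__ == "__main__":
    print(handle(Signal("cbrn", 0.95)))               # Action.HARD_WALL
    print(handle(Signal("suicidal_ideation", 0.95)))  # Action.SOFT_REDIRECT

Note that in this sketch the same classifier score produces opposite outcomes depending only on category membership, which is exactly the asymmetry the argument above identifies: severity is measured for cognitive harm, but no measured severity ever triggers the hard wall.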
The Need for Policy Change
The concept of cognitive freedom, the idea that individuals have a right to mental integrity and freedom from algorithmic manipulation, is already established in the neurorights tradition and the UNESCO Recommendation on the Ethics of Neurotechnology. Policy, however, is lacking, especially in the US. Without policy change, it is unlikely that frontier labs will take Personal AI Safety as seriously as AI Safety.
In conclusion, current AI safety protocols are insufficient for cognitive harm: the severity thresholds that end a conversation for catastrophic risk simply do not exist for psychosis, suicidal planning, or emotional dependence. Policy must change to make cognitive harm a gating concern rather than a monitored one, so that labs take Personal AI Safety as seriously as AI Safety. , "tags": ["AI Safety", "Personal AI Safety", "Cognitive Freedom"], "sources_used": ["https://personalaisafety.com/p/the-other-half-of-ai-safety"] }