Anthropic Reverses Claude Restrictions After Research Community's Pushback

The restriction was implemented to avoid “accelerating the actors most willing to violate these terms.”

MITSloan ME Editorial June 11, 2026

Topics

Soon after AI startup Anthropic released Claude Fable 5 and Claude Mythos 5 to the public, AI researchers, developers, and policy experts raised concerns. They pointed to a paragraph within Claude Fable 5’s 319-page system card that could cause the models to provide intentionally weakened responses on specific topics.

“We’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design)…Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work,” the lab’s document read.

According to Anthropic, this would have a marginal impact of ~0.03% of traffic, concentrated in fewer than 0.1% of organizations.

The restriction was implemented to avoid “accelerating the actors most willing to violate these terms.”

The move received massive pushback from the AI community on social media, with many arguing that such controls degrade the quality of assistance without user knowledge.

“Anthropic’s latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won’t notice,” AI research firm SemiAnalysis wrote on X.

AI training expert at startup Prime Intellect, Elie Bakouch, took to X to make sense of “how this can be a good decision.” “Lying to the user by modifying the weights/prompt sets a very bad precedent and is extremely unaligned,” he said, adding that Anthropic had made no separate public statement on the change.

Former Anthropic employee Behnam Neyshabur said, “Working on AI for cancer? Sorry, I can’t help you. Working on AI for Alzheimer’s Disease? Sorry, I’m becoming a bit dumb when it comes to the AI part of it.”

Andrej Karpathy, the former OpenAI cofounder and ex-Tesla AI director who recently joined Anthropic, offered a more measured take on Claude Fable 5, describing it as a “super exciting release” and a “major-version-bump-deserving step change forward” on par with the leap from Claude 4.5. That said, he acknowledged that the model “still has quirks that people will run into” and that the safeguards are “a little too trigger-happy for launch,” though he expressed hope they’d be tuned over time.

According to reports, Anthropic will make the safeguards visible in Claude Fable 5. “Starting this week, flagged requests will visibly fall back to Opus 4.8. On the API, any flagged requests will return a reason for their refusal. You will see this every time it happens,” a company spokesperson said.

“A visible safeguard needs to cast a wider net to be more robust, resulting in more requests being incorrectly flagged. We made the wrong tradeoff, and we apologize for not getting the balance right,” the spokesperson added.

Amid this discourse, rival OpenAI is weighing price cuts to sway consumers away from Anthropic. “The company is weighing significant cuts to what it charges for tokens, the unit of measurement artificial-intelligence firms use to bill for their products,” a Wall Street Journal report said, adding that it was “in anticipation of similar cuts the company expects at Anthropic,” according to sources.

OpenAI currently charges consumers tiered monthly subscriptions of $8, $20, and $100 or more for access to its flagship GPT-5.5 models, while Anthropic charges $17 per month for Claude Pro with an annual subscription, and $100 or more per month for Claude Max.

Topics

About the Author

Tags:

Topics

Share