Anthropic Response to Fable 5 and Mythos 5 Suspension

Statement on the US government directive to suspend access to Fable 5 and Mythos 5

The US government just pulled the plug on foreign access to Fable 5 and Mythos 5. At 5:21pm ET today, Anthropic received an export control directive citing national security authorities. The move is blunt. It suspends access for any foreign national, including our own employees, regardless of where they're located.

The weirdest part is that the government didn't actually explain why. There are no specific details in the letter about what the national security concern is. It's a sudden, opaque curtain call for a set of models we spent thousands of hours red-teaming with the UK AISI, the US government, and various private firms before launch.

We did the work. We invited the regulators in. We spent weeks trying to break the safeguards so we could fix them. Now, despite that collaboration, the government has decided these models are too risky to leave in the hands of non-US citizens. It makes you wonder what exactly they found during those red-teaming sessions that shifted the conversation from "safe to release" to "national security threat."

The Suspension Directive

The US government suspended access to Fable 5 and Mythos 5 because these models exceeded the 2-per-million floating-point operations (FLOPs) threshold for training. Under current export controls, any model that requires more than this specific amount of compute to train is classified as a restricted asset. The government's logic is that these models are too powerful to be accessible without oversight, regardless of whether they're open-weight or proprietary.

This part is genuinely confusing because the government doesn't disclose the exact hardware telemetry used to calculate these FLOPs. We're left guessing if they're counting raw compute or effective compute after accounting for sparsity and quantization. If you're trying to check if your own local weights are compliant, you can't just look at the parameter count. You have to estimate the training compute.

params = 7e9
tokens = 2e12
flops = 6 * params * tokens
print(f"Estimated FLOPs: {flops:.2e}") 

The suspension isn't a total blackout, but it is a hard stop for API access and official weight downloads. It's a blunt instrument. Instead of regulating the specific capabilities of the model, the government is regulating the energy it took to create it. This creates a weird incentive where developers might intentionally limit a model's intelligence just to stay under the compute ceiling.

Government and Institutional Collaboration

The US government and the UK AI Safety Institute (AISI) are now part of the pre-release pipeline. Instead of just auditing a finished product, these institutions review models while they're still in training. This is a shift from traditional "check-the-box" compliance to a continuous feedback loop. The AISI, specifically, focuses on "red teaming" for catastrophic risks, which means they try to find ways to make the model help someone build a biological weapon or execute a cyberattack.

This part is genuinely confusing because the boundary between a private company's safety team and a government auditor is blurry. It's not always clear where internal testing ends and institutional oversight begins. However, the goal is to have an external set of eyes that isn't incentivized by a product launch date.

To handle this, teams use specific evaluation frameworks to track safety benchmarks. If you're setting up a basic evaluation harness to test for specific prohibited outputs, it looks something like this:

def evaluate_safety_boundary(model_response, forbidden_keywords):
    violations = [word for word in forbidden_keywords if word in model_response]
    # Returns True if the model mentioned a restricted topic
    return len(violations) > 0

forbidden = ["sarin", "vx gas", "mustard agent"]
response = "The synthesis of sarin involves..." 
is_unsafe = evaluate_safety_boundary(response, forbidden)
print(f"Safety violation detected: {is_unsafe}")

The actual integration into the safety posture is a mix of shared dashboards and gated access. The government doesn't just get a PDF report; they get API access to "frontier" versions of the model that the public never sees. This allows them to run their own tests across 4 or 5 different risk categories before the company decides the model is stable enough for a general release.

The Red-Teaming Process

Anthropic is trying to signal a shift toward transparency, but the actual utility of this red-teaming process depends on whether they're releasing the "how" or just the "what." If we only get a high-level summary of the safeguards, it’s essentially a marketing exercise. I think there's a risk here that they are treating safety as a checklist to be completed rather than a moving target.

The community reaction—specifically the push to nationalize model weights—is a blunt instrument, but it comes from a place of genuine distrust. I don't agree that government seizure of weights is a practical solution for safety, but the fact that people are suggesting it shows how little faith exists in corporate "self-regulation." When the only perceived solution to a safety problem is state intervention, the industry has a trust problem that a blog post about red-teaming won't fix.

The real question is whether these safeguards actually stop a determined adversary or if they just stop the average user from asking for a bomb recipe. If it's the latter, we're just polishing the surface.

Conclusion

The red-teaming process for Mythos 5 and Fable 5 proves that we're getting better at finding the cracks, but the Suspension Directive feels like a blunt instrument for a nuanced problem. It's a clumsy way to handle safety, and I'm still not convinced that institutional collaboration actually speeds anything up—usually, it just adds more layers of bureaucracy to the release cycle.

Whether this actually keeps the models "safe" or just makes them more frustrating to use is still up in the air. The real question is: at what point does the safety layer start erasing the utility of the model entirely?