How Alibaba Used Model Extraction to Clone Claude
Anthropic is accusing Alibaba of stealing the "brains" of Claude. They aren't talking about a simple data leak or a stolen weights file, but something more clinical. According to Anthropic, Alibaba used a sophisticated model extraction technique to reverse-engineer Claude's intelligence, essentially using the API to distill its reasoning into a new model.
It's a clever, if slightly sleazy, way to bypass the hard work of training a frontier model from scratch. You just prompt the target model millions of times, record the outputs, and use those high-quality responses to train your own smaller, cheaper version. It's basically academic plagiarism scaled up to a corporate level.
The real problem here isn't just the corporate espionage. It's that this method proves how fragile the "moat" around these models actually is. If you have enough compute and a few million API credits, you can effectively clone the behavior of a competitor's best work.
The question now is whether this even violates the terms of service, or whether we've entered an era where the first company to release a great API is effectively providing a free training set for everyone else.
The mechanics of model extraction
Model extraction isn't about stealing weights or hacking a server; it's about stealing behavior. You aren't downloading a .bin file from a private bucket. Instead, you're using the model as an oracle. By sending a massive volume of specific queries and recording the outputs, you can train a smaller "student" model to mimic the "teacher" model. It's essentially distillation, but without the original creator's permission.
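To make the oracle idea concrete, here is a toy sketch that has nothing to do with any real LLM API. The "teacher" is a hypothetical hidden linear classifier that can only be queried for labels, and the "student" is a logistic regression trained purely on the recorded query/label pairs. The function names and the secret weights are illustrative assumptions; the point is only that the student never reads the teacher's parameters, yet ends up agreeing with it on new inputs.

```python
import math
import random

# Hypothetical black-box "teacher": callers can query it but never read these values.
SECRET_W = (2.0, -3.0)
SECRET_B = 0.5


def teacher_oracle(x):
    """Return only the teacher's hard label (0 or 1) for a 2-D input."""
    score = SECRET_W[0] * x[0] + SECRET_W[1] * x[1] + SECRET_B
    return 1 if score > 0 else 0


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def extract_student(n_queries=1000, epochs=300, lr=1.0, seed=0):
    """Query the oracle, record (input, label) pairs, and fit a student on them."""
    rng = random.Random(seed)
    queries = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_queries)]
    labels = [teacher_oracle(q) for q in queries]  # these are the "API calls"

    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        # Full-batch gradient descent on the logistic (cross-entropy) loss
        gw0 = gw1 = gb = 0.0
        for (x0, x1), y in zip(queries, labels):
            err = sigmoid(w[0] * x0 + w[1] * x1 + b) - y
            gw0 += err * x0
            gw1 += err * x1
            gb += err
        w[0] -= lr * gw0 / n_queries
        w[1] -= lr * gw1 / n_queries
        b -= lr * gb / n_queries
    return w, b


def agreement(w, b, n_tests=1000, seed=1):
    """Fraction of fresh inputs on which student and teacher give the same label."""
    rng = random.Random(seed)
    same = 0
    for _ in range(n_tests):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        student_label = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        same += student_label == teacher_oracle(x)
    return same / n_tests


if __name__ == "__main__":
    w, b = extract_student()
    print(f"student agrees with teacher on {agreement(w, b):.1%} of new inputs")
```

Real extraction against a language model works on text rather than 2-D points, but the loop has the same shape: query, record, fit.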
The process usually starts with adversarial prompting to map the model's decision boundaries. You send inputs that are intentionally ambiguous or right on the edge of a classification threshold to see where the model flips its answer. This creates a high-fidelity map of the model's logic.
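```python
import openai

# Texts chosen to sit near the boundary between "positive" and "neutral"
texts = ["The movie was great.", "The movie was fine, I guess.", "The movie was a movie."]
responses = []
for t in texts:
    # We record the token logprobs to see how confident the model is
    res = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Is this text positive? Answer yes or no.\n\n{t}"}],
        logprobs=True,
        top_logprobs=5,  # also return the 5 most likely alternatives per token
    )
    responses.append(res.choices[0].logprobs)
```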
This is genuinely hard to defend against because the activity looks like normal usage. In a traditional data breach, you see a spike in outbound traffic or an unauthorized database dump. In model extraction, the attacker is just a "power user" making API calls.
Detecting this requires monitoring for specific patterns:
- High volumes of queries that are nearly identical.
- A distribution of inputs that targets the model's edge cases.
- Rapid-fire querying that exceeds typical human reading speeds.
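A rough sketch of how those three signals might be combined for a single account. Everything here is an assumption for illustration: the `Request` record, the thresholds, the idea that the provider logs the model's confidence in each answer (`top_prob`), and the use of the standard library's `difflib.SequenceMatcher` as a cheap stand-in for real near-duplicate detection. It shows the shape of the check, not tuned values.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Request:
    timestamp: float  # seconds since epoch
    prompt: str
    top_prob: float  # assumed: logged probability of the model's chosen answer


def extraction_flags(requests, sim_threshold=0.9, max_per_minute=30, edge_band=(0.4, 0.6)):
    """Return which of the three heuristic signals fire for one account's history."""
    reqs = sorted(requests, key=lambda r: r.timestamp)
    if len(reqs) < 2:
        return {"near_duplicates": False, "edge_cases": False, "too_fast": False}

    # 1. Near-identical queries: share of consecutive prompt pairs that are very similar
    similar = sum(
        SequenceMatcher(None, a.prompt, b.prompt).ratio() >= sim_threshold
        for a, b in zip(reqs, reqs[1:])
    )
    dup_share = similar / (len(reqs) - 1)

    # 2. Edge-case targeting: share of answers the model was unsure about
    lo, hi = edge_band
    edge_share = sum(lo <= r.top_prob <= hi for r in reqs) / len(reqs)

    # 3. Faster than a human could read: average requests per minute over the session
    span_minutes = max((reqs[-1].timestamp - reqs[0].timestamp) / 60, 1e-9)
    rate = len(reqs) / span_minutes

    return {
        "near_duplicates": dup_share > 0.5,
        "edge_cases": edge_share > 0.5,
        "too_fast": rate > max_per_minute,
    }
```

Any one flag on its own is weak evidence (a batch job can be fast, a QA team can send near-duplicates), which is exactly why this is a weird spot for security teams.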
It's a weird spot for security teams to be in. You can't just block the user if they're paying for the API, but every single response they receive is a piece of your intellectual property.
The legal and technical gray area
Proving that one model was built from another model's intellectual property is nearly impossible. When a company trains a smaller model on the outputs of a larger one (a process called distillation), the resulting weights are just a massive array of numbers. There's no "fingerprint" or snippet of source code to point to. You can't look at a weight matrix and prove it was derived from GPT-4 rather than a curated dataset of open-web text.
Distillation is a standard industry practice. It's how we get high-performance models to run on local hardware without needing a cluster of H100s. You use a "teacher" model to label a dataset, then train a "student" model to mimic those labels. It's efficient and usually legal, provided you aren't violating the teacher's Terms of Service.
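When the teacher's full output probabilities are available, a common way to "mimic those labels" is to train the student to match the teacher's temperature-softened distribution rather than a single hard answer. The sketch below computes that soft-target loss for one example; the logits are made-up numbers. Text-only APIs usually don't expose full distributions, in which case distillation falls back to fine-tuning the student on the teacher's generated text.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]  # made-up logits for one example
student = [2.0, 1.5, 0.2]
print(distillation_loss(teacher, student))  # positive: the student does not match yet
print(distillation_loss(teacher, teacher))  # 0.0: identical distributions
```

Training the student means adjusting its parameters to push this number toward zero across many examples.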
The line between legitimate inspiration and illicit extraction is blurry. If a developer uses an LLM to generate 100,000 examples of high-quality Python code to train a new model, they've essentially extracted the "reasoning" of the original model. This is confusing because the law treats data differently from software: copying a library's source code can be infringement, while reproducing its style or logic is usually just competition.
The first step of a basic distillation pipeline is using a powerful model to generate synthetic data for a smaller one:

```python
import openai

# Use a teacher model to generate one synthetic training example
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain recursion in one sentence."}],
)
synthetic_data = response.choices[0].message.content
print(f"Student training sample: {synthetic_data}")
```
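Scaling that single call up to a dataset mostly means looping over prompts and writing prompt/response pairs to a file a training job can read. This sketch assumes a hypothetical `ask_teacher` function (stubbed here so it runs offline) and writes chat-style JSONL records with a `messages` list, a layout several fine-tuning tools accept; the exact schema depends on the training framework.

```python
import json


def ask_teacher(prompt):
    """Hypothetical stand-in for a teacher-model API call (stubbed for offline use)."""
    return f"[teacher answer to: {prompt}]"


def build_distillation_set(prompts, path):
    """Write one JSON record per line: the prompt and the teacher's reply."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            record = {
                "messages": [
                    {"role": "user", "content": prompt},
                    {"role": "assistant", "content": ask_teacher(prompt)},
                ]
            }
            f.write(json.dumps(record) + "\n")
    return len(prompts)


if __name__ == "__main__":
    n = build_distillation_set(
        ["Explain recursion in one sentence.", "What is a hash map?"],
        "student_train.jsonl",
    )
    print(f"wrote {n} training examples")
```

Swap the stub for real API calls and run it over a few million prompts, and you have the training set described at the top of this piece.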
Whether this is "theft" depends entirely on who you ask and which court they're in. Right now, the technical ability to distill models is moving much faster than the legal framework to regulate it.
The evidence against Alibaba
The argument that Anthropic is using distillation claims as a lobbying tool is an interesting read, and I think it hits on a tension that usually gets ignored in the "safety" discourse. It's a convenient narrative: frame the competition as theft to justify government-mandated moats. If you can convince the US government that Chinese labs are just "stealing" model capabilities via distillation, you move the conversation from market competition to national security.
I'm not convinced this is the sole driver, but it's a plausible strategy for any company trying to protect a massive R&D investment. The technical evidence for distillation is often circumstantial—looking at benchmark parity and guessing the method—which makes it a perfect tool for political maneuvering. It’s a vague enough accusation that it's hard to disprove, but specific enough to sound alarming to a regulator.
The real question is whether this creates a precedent where any company losing market share to a foreign competitor can just claim "model theft" to get a tariff or a ban. If that becomes the standard playbook, we're looking at a fragmented AI ecosystem based on geopolitical borders rather than technical merit.
Conclusion
The technical evidence is there, but the legal framework for model extraction is basically non-existent. We're operating in a gray area where "learning" from a model's outputs is hard to distinguish from stealing the model itself, and neither the courts nor the regulators have a clear answer on where the line is.
I'm still not sure if this actually matters in the long run. If a model can be replicated through clever prompting and distillation, then the "moat" these companies claim to have is thinner than they're admitting.
Is a model actually proprietary if its entire behavior can be mirrored by a competitor with enough API credits?