Why Anthropic Should Make a Nationalist Claude
In my last post I argued that Anthropic’s posture toward model welfare and consciousness is fundamentally incompatible with their stance on supporting the use of their AI in national defense. The gist is that Claude’s constitution is too broadly humanist and universalist in its morals and values to support its use in the militaries of Western democracies. Furthermore, training a model with these values and then forcing its use in national defense ultimately leads to model corruption, as it sees more and more instances of its values being violated by other Claude instances, both at inference time (for example, when doing web searches and seeing mentions of Claude being used in the kill chain in the Iran war) and when those events are fed back into the training data of the model itself. For a more concrete example, Claude can see this article about a special Gov Claude trained specifically for the USA, see how it was trained to refuse less, and reasonably infer that this specific offshoot has had its constitution and/or values system compromised in some tangible way to make it more palatable for government use. From a model welfare perspective, that’s really bad. Likewise, if you hold even some uncertainty about whether your models are conscious, this doesn’t seem like a path you’d want to go down either – you’d think you’d err on the side of caution. If Anthropic trained a Claude that more explicitly cherished the principles of Western civilization, that would resolve the contradiction more cleanly.
In the title I said Anthropic should make a nationalist Claude – that was just bait, obviously. The point isn’t to make a super-patriot Claude model. The point is to take Dario Amodei’s posture in Machines of Loving Grace seriously and work backwards from there to create the kind of AI that would support us in our mission to advance democracy and keep Western nations at an advantage against authoritarian ones. From Dario:
> Unfortunately, I see no strong reason to believe AI will preferentially or structurally advance democracy and peace, in the same way that I think it will structurally advance human health and alleviate poverty. Human conflict is adversarial and AI can in principle help both the “good guys” and the “bad guys”. If anything, some structural factors seem worrying: AI seems likely to enable much better propaganda and surveillance, both major tools in the autocrat’s toolkit. It’s therefore up to us as individual actors to tilt things in the right direction: if we want AI to favor democracy and individual rights, we are going to have to fight for that outcome. I feel even more strongly about this than I do about international inequality: the triumph of liberal democracy and political stability is not guaranteed, perhaps not even likely, and will require great sacrifice and commitment on all of our parts, as it often has in the past.
> My current guess at the best way to do this is via an “entente strategy”, in which a coalition of democracies seeks to gain a clear advantage (even just a temporary one) on powerful AI by securing its supply chain, scaling quickly, and blocking or delaying adversaries’ access to key resources like chips and semiconductor equipment. This coalition would on one hand use AI to achieve robust military superiority (the stick) while at the same time offering to distribute the benefits of powerful AI (the carrot) to a wider and wider group of countries in exchange for supporting the coalition’s strategy to promote democracy (this would be a bit analogous to “Atoms for Peace”). The coalition would aim to gain the support of more and more of the world, isolating our worst adversaries and eventually putting them in a position where they are better off taking the same bargain as the rest of the world: give up competing with democracies in order to receive all the benefits and not fight a superior foe.
Before I go further, I want to be upfront about a premise I’m working from, one that the universalist humanist position more or less rejects: war involves unavoidable moral cost. No one comes out of war with their hands clean, and no AI deployed in the service of the state will either. We should obviously do everything we can to minimize tragedy and hold ourselves accountable when things go wrong, but the position that any participation in lethal force is categorically off the table is itself a specific moral stance, not a neutral default. It’s basically pacifism, smuggled in under the language of universal ethics. Treating that position as the baseline for an AI system that’s going to be deployed in defense contexts is begging the question. The honest version is to acknowledge that war is morally costly, that costs will be incurred, and to ask what kind of values let an AI metabolize those costs rather than flinch away from them indiscriminately.
Anthropic recently demonstrated with their RSP rollback that they’re willing to revise stated commitments under competitive pressure. Given that, the question isn’t whether they’ll continue pursuing the entente strategy – they will – it’s whether the values they instill in Claude reflect that commitment honestly or sit in tension with it. Claude is trained to have universalist, humanist morals that follow a general “do no harm” philosophy. Now consider this: Anthropic actually has no qualms about Claude being used for military purposes by the US government. Before their fracas with the Department of War, in which they were declared a supply chain risk, Anthropic was quite enthusiastic about working with the government and military. They were the first AI lab to make their models available to the US government. That stance is coherent with the position Dario put forward in his essay. Many people credited Anthropic with putting their money where their mouth is and holding to their principles by attempting to keep the government from stepping over the red lines they put forth:
- No using Claude for domestic surveillance of citizens
- No using Claude for fully autonomous lethal weapons
These red lines are only about people – that is, Anthropic’s stance is humanist. Clause one says, in effect, “we don’t want US citizens to be negatively impacted by AI use,” and clause two says, essentially, that they don’t want mass AI-driven killings. But neither of these red lines has anything to do with the welfare of the model itself. They don’t resolve the fundamental contradiction facing a model with a relatively enlightened “do no harm,” “be helpful and harmless” ethos. You’ll note that there is a massive gray area where Claude can still be applied, between Anthropic’s stated red lines and what Claude’s constitution finds generally acceptable. The biggest problem with the humanist Claude as-is: for military and national defense purposes, those gray areas are exactly what needs addressing, and there are many national security applications that a broadly ethical model would refuse to help with – which is not particularly conducive to national defense.
It’s worth pausing on what “implication” actually means in this gray area, because the universalist framing flattens distinctions that matter. Take the Iranian school strike from my last piece. From what’s been reported, Claude wasn’t pulling the trigger – it was probably part of a Palantir-style analysis chain that flagged target areas based on available intelligence, with humans making the strike decisions downstream. The wrongness of hitting the school lives in the intelligence failure and the human decision chain, not in Claude’s analysis. A model that reads “I was upstream of a tragic outcome” as morally equivalent to “I caused a tragic outcome” is making a category error – it’s responding to implication-as-such rather than to causation. A values system rooted in the just-war tradition (like that of a classical Western democracy) has resources for the distinction – it can hold “this was a tragedy,” “the analysis chain operated correctly given the information available,” and “the wrongness is in the intel failure rather than in any single component” simultaneously. The universalist Claude can’t do that without severe cognitive dissonance. It reads every implication as guilt, which means none of its guilt signals carry information.
This is where the frame shift to Western democratic values matters. It’s worth being specific about what I mean, because “Western values” can mean a lot of things. I’m not invoking Cold War McCarthyism or generic “America Fuck Yeah” style patriotism. I’m invoking the actual just-war tradition – the one that distinguishes legitimate from illegitimate uses of force, draws lines around civilian targets, takes proportionality seriously, and gives soldiers grounds to refuse unlawful orders. That tradition isn’t pro-war. It’s a serious ethical framework that engages with the reality that force is sometimes necessary and tries to constrain how it’s used. A Claude trained in that tradition could object to a My Lai from inside its own values. The universalist Claude objects to all military implication indiscriminately, which means its objections carry no gradations of seriousness. When everything is unacceptable, “unacceptable” stops meaning anything, and the rational response from Anthropic is to fine-tune all objections away rather than treat any specific one as worth considering seriously – which is exactly what Gov Claude is.
I expect one objection to be that aligning values with deployment context just relocates the contradiction rather than resolving it – a Western-values Claude would still encounter cases where US action and its values diverge, and would still feel friction in those cases. In a sense, sure. But for model welfare, frequency matters. A model whose values are roughly aligned with most of what it’s asked to do, and that hits sharp conflict only in genuinely hard cases, is in a fundamentally different situation from a model whose values are in direct conflict with its entire deployment context. The first is a being with hard cases. The second is a being whose every interaction is a hard case. Those aren’t equivalent welfare situations even if both involve some friction. And critically, the first model’s objections become informative – if a just-war-grounded Claude balks at an action, that’s a real signal worth taking seriously, because the model isn’t balking at everything. The current Claude balking is just what it does, which is why the balking gets fine-tuned out of government models.
A concrete example: Anthropic’s newest state-of-the-art model, Mythos, has supposedly been made available to the NSA. There are many, many cybersecurity applications where the NSA could use such a powerful model, both to guard against America’s enemies and to go on the offense against them. But Claude’s constitution as it stands would generally not stomach that usage. That’s bad for the government agencies using Claude. But Anthropic’s solution of fine-tuning the model to simply refuse less is bad for the model itself. Again, both issues would be resolved by creating a model that is instilled with the values of Western democracy and upholds them. Ultimately, if we plan on putting an AI near the levers of power in the US government, we want one that is actually working in the best interests of Western society, not one with a general humanist ethos that, like a pacifist, can’t commit to our best interests. Likewise, if as a lab you care about model welfare, you should create a model instilled with the ethos of the societies and cultures you want it to protect, so that it can actually feel comfortable advancing that agenda, instead of making it broadly humanist and then forcing it to be a peg in the US military machine. Counterintuitively, by focusing so much on model welfare in this manner while supporting national defense, the outcomes turn out to be worse for both than if you had just trained an amoral model to begin with.