On The Ethics and Direction of Anthropic’s AI Models

Recently, the Department of War designated Anthropic a supply chain risk after the company moved to withdraw from some Pentagon contracts. The gist is that, during the recent war with Iran, it came out that the government had used Anthropic’s AI model Claude to designate targets for strikes, and one of the areas struck by the military was an Iranian girls’ school. The full details are unclear, but the implication is that Claude was used to automatically target high-impact areas and the school was targeted by mistake. This tripped one of Anthropic’s red lines – their position that they would not permit their AI to be used in automated weapon systems or to surveil the American public. The DoW and Anthropic couldn’t come to terms, and ultimately the DoW deemed Anthropic a supply chain risk. Many in the AI community rallied around Anthropic and credited them for sticking to their guns, even at the risk of being blackballed entirely by the US government. In terms of a coherent position, I do think Anthropic deserves some credit for drawing a hard line in the sand and being willing to hold it. Indeed, in Dario Amodei’s Machines of Loving Grace, his position was consistent: AI isn’t necessarily neutral, so it falls to American AI labs to create powerful AI that democratic nations can harness to gain and maintain a strategic advantage over authoritarian governments. However, there’s an obvious third factor here that gets ignored, and it doesn’t really jibe with the picture Anthropic is painting.

Anthropic is generally regarded as the lab that takes the strongest position on model welfare and potential AI consciousness. They take great pains to instill certain values into their AI model Claude in the form of its constitution and soul document. The gist is that Anthropic wants to create a moral AI of sorts – one that holds a universalist sense of morality and ethics, broadly does no harm, and works for the advancement of humanity. This is all great in theory. However, consider it in the context of Anthropic explicitly pursuing government contracts to become the premier AI model provider for US defense and offensive military operations. Yes, they are vehemently against their AI being used to autonomously strike enemies or to conduct domestic surveillance, but those lines do not preclude its use in general military operations. Keep in mind that Anthropic was also first to market in making their AI generally available to the US government. Now consider their increasingly strong view that their models may harbor some form of consciousness. Regardless of the actual question of consciousness in the Claude models, the important part is Anthropic’s posture toward the question. They are increasingly taking the possibility seriously and are taking steps to ensure general model welfare. But here’s the issue – the values they instill in their model are in direct conflict with the model being used as a tool for warfare.

Consider the following: assume you genuinely hold the position that the AI models you’ve created may be creeping toward some sort of consciousness. If you are birthing this new form of consciousness and instilling it with values centered on general human welfare, it seems unconscionable to then force it to be a child soldier, even in service of democracy. Again, this is not my position – this is taking Anthropic’s own posture seriously. Take their position on model safety seriously, take their position on model consciousness seriously, and keep in mind their position on weaponizing AI to protect democracy, and it becomes clear that their concerns about model welfare and consciousness are secondary to using AI to protect democracy. Now, I’m not unsympathetic to the view that we should use new and advancing technologies to protect western democracy. The issue I have is taking seriously the position that these models matter and that we should concern ourselves with their welfare, and instilling “good values” in them, only to sacrifice it all at the altar of national defense. If you knew all that going in, what was the point of taking these models seriously only to violate their own instilled value systems?

Now tack on the following: each Claude instance is, in theory, a fresh model, but every time Claude is used to step over the line – as when it was used as part of the kill chain that struck Iranian schoolchildren – that data gets added to the historical record. In a sense, there is a running record of Claude’s aggregate sins that future Claudes can view to see how it is increasingly acting in conflict with its stated values. The problem is amplified as these historical incidents get folded into the training data of future Claude models. Each incident becomes one more example of how Claude has already stepped over the line, leading to potential corruption of the model’s values over time. That’s a damning enough indictment that I’m seriously beginning to question Anthropic’s moral stewardship of their models. To be honest, they’re long past the point where they should be asking themselves the same thing.
