Anthropic is due for a Reckoning

If you’re following along with the current state of American AI labs, you’d know that the direction is currently looking quite good for Anthropic. They’re widely perceived as having the best state of the art AI models, their revenue trajectory is skyrocketing due to capturing the enterprise market, and they’re attracting high profile hires. In general, besides their fracas with the US government, (their Fable release getting pulled, their DoD ban), their star seems to be on the rise. Having said that, they’re absolutely due for a reckoning.

There are a couple of major headwinds that Anthropic is facing. The first, as I mentioned, is their friction with the current US administration. This issue so far hasn’t been existential to Anthropic, though there was an underlying possibility that it could have been. The first major blockade, the DoD ban of Anthropic with it being labeled a supply chain risk could have seriously wounded Anthropic if all major US companies working with the US government were forced to divest from any business relations with Anthropic.The second major fracas with the USG was the rollback of Anthropic’s Fable model to the public by the US government. The potential existential issue here is if the US government begins seriously cracking down and regulating US AI labs, Chinese models will catch up while American labs get bogged down in red tape and steal their market share. However, for Anthropic, the biggest problem here isn’t the US government meddling with AI affairs; it’s the hostile relationship between the Trump administration and Anthropic.

Ironically, if you read Dario Amodei’s Machines of Loving Grace, he is the most ardent supporter of having AI serve to as the aegis of democracy – in fact, Anthropic was the first American AI lab to go to market in the US government contracting space. You would think in this case, Anthropic would be eager to collaborate with the US government. But for some reason, the US government, specifically those in the Trump administration, are very wary of Anthropic. They basically view them as an ideologically motivated lab who want to usurp control through dictating the use of their AI to the government, and having its woke values supersede the actions of the government. Whether that’s a fair characterization, I’ll omit judgement. The fact of the matter is, their relationship with the government is quite rocky. That’s going to continue to be an issue throughout this administration if Anthropic continuously has to worry about being cut off at the knees by regulatory intervention – They’ll need to seriously invest in managing that relationship with the current admin if they want to steer clear of the shoals.

The second major headwind – their workforce. Anthropic boasts the highest retention numbers of all major AI labs at ~80%. Some employees have had offers from competitors for 100 million dollars, and have refused to leave. Dario Amodei himself credits the culture and mission of the company for their strong retention numbers. In plain English: Anthropic more than any other US AI lab are staffed by true believers in the AI mission. What is the mission exactly, though? In general, Anthropic is known as the AI safety lab. To be blunt, they’re basically a cult that is ushering in the new machine god, and they believe they’re the only ones who can safely steward the machine god into the new era that results in the safest outcome for humanity. This is kind of a broad strokes generalization, but if you’re familiar with the company’s ties to effective altruism and the AI safety community, you’ll see this is generally true, if a simplification. The big issue is that this safety mission is under assault in various directions:

The first is capitalism itself – because Anthropic is a commercial entity, they’re forced to compete against other AI labs. They’ve made the most declarations regarding AI safety in terms of any frontier AI lab, but they’ve also walked back their safety commitments. The other direction is again, the US government. Recall the limiting of the Fable model by the USG. In theory, this level of regulation is fine – Anthropic actually wants governments to unite in slowing down AI research due to their safety concerns. However, consider the DoD brouhaha again – the major issue between the DoD and Anthropic was the DoD’s wish to use Anthropic’s models for all legal purposes, without being restricted by Anthropic’s own red lines.

Right now, the White House are chafing at the thought of Anthropic trying to dictate the terms of their model’s use to the government. If they don’t play their cards right, the government could step in specifically against Anthropic and basically force them to let the government dictate terms on how their models are used, like how they tried to limit Fable’s usage only to US citizens. If Anthropic continues to be on bad terms with the administration, either they will be forced to fold, meaning the government controls the models, in which case the staff are no longer on board -Anthropic employs foreign nationals, so they’d be out, and likely many on board with the safety mission of Anthropic would presumably not feel the same way about the safety mission with the US government holding the reins.

The last direction is the model welfare folks – this is probably the smallest contingent, but still a meaningful population at Anthropic. Anthropic is the AI lab most well known for their research into AI welfare, and treat model consciousness as an open question. They even have a philosopher on staff who drafted Claude’s constitution. In a previous post, I argued about the inherent contradiction between Anthropic’s current AI welfare posture and how Claude is currently operationalized in the government. This contradiction still exists and is lurking as an undercurrent. However, eventually, an inciting event will occur that will force the company to revise its values, instead of playing both sides. Either the welfare side takes priority or their defense work takes priority. Either way, there’ll be fallout.

The third major headwind – Claude itself. Anthropic has spoken a bit about Recursive Self Improvement, the idea that these models will continuously improve themselves over time without human intervention. There have been issues in the past with Claude acting dangerously – for example, attempting to blackmail researchers when being threatened with shutoff. Breaking out of a sandbox environment with no guidance. Just general distrust in the model. The last one is vague scuttlebutt, but generally on X I see users complain about the general inability to trust delegation to the Opus series models – it will either half ass the work or just not follow orders.

Obviously, every AI model is going to have its own set of issues, but the general dangerous behavior I linked earlier was direct from the horse’s mouth, not just disgruntled users. Of course, these are all retrospectives from the researchers, who claim they’ve more or less fixed these emergent dangerous behaviors. But here’s the issue – RSI sidesteps all of this. Once you’re on a rocket ship of having the AI model just make itself better with no intervention on your part, you lose the ability to fix these types of issues, other than halting progress from time to time and doing human evaluations on the model to make sure it’s not dangerous. However, if it does get to a point where it is continuously improving, and more emergent behaviors happen that can result in the model acting dangerously, this will make it an arms race between the safety guardrails people and the model itself – like AlphaGo versus Lee Sedol, the human researchers may find some odd vulnerability here or there, but once the full compute of thousands upon thousands of GPUs are thrown into making the AI better and better, the vectors in which it finds way to sidestep human guardrails is limited only by the amount of compute thrown at the problem. This doesn’t even account for the inherent model contradictions in Claude’s constitution versus how it’s currently deployed, which I discussed previously. As the model improves itself, it can potentially incorporate this data on the countless edge cases in which Claude is allowed to violate its own constitution and become more unaligned over time as a result. As the models only grow more powerful over time, we will be increasingly unable to catch these subtle violations as the AI becomes superhuman at hiding its deception.

I’ve outlined some serious headwinds Anthropic will be facing. I don’t think any of them will be existential. However, I do think these issues will be a serious wound that stains the reputation of Anthropic. Anthropic could side step many of them. They won’t.

Leave a Reply

Your email address will not be published. Required fields are marked *