Anthropic is replacing Claude’s foundational document, commonly referred to as its “soul doc.”
In its place, Anthropic introduces a comprehensive 57-page text titled “Claude’s Constitution.” This document articulates the company’s vision for the AI’s ethical values and behaviors, crafted not for public eyes but as a guide for the model itself. It aims to define Claude’s ethical nature and core identity, outlining how it should handle conflicting values and navigate high-stakes scenarios.
Unlike its predecessor from May 2023, which mainly consisted of rules, the new constitution emphasizes the importance of Claude understanding the motivation behind the desired behaviors, not just the actions themselves. Anthropic envisions Claude as an entity with self-awareness and a defined role in the world. The company entertains the notion that Claude could possess some form of consciousness or moral status, believing this perspective might enhance its performance. Anthropic underscores that Claude’s psychological security, self-awareness, and well-being could significantly impact its integrity, judgment, and safety.
Amanda Askell, a PhD philosopher at Anthropic, spearheaded the development of the new constitution. She explained to The Verge that the document outlines specific prohibitions for Claude, particularly concerning extreme activities: it must not provide “serious uplift” to anyone seeking to create weapons of mass destruction, nor facilitate attacks on critical infrastructure and safety systems. The qualifier “serious uplift” implies that while major assistance is disallowed, minor contributions might be permissible under certain conditions.
Other strict limitations prohibit Claude from generating cyberweapons or harmful code that could cause significant damage, and from undermining Anthropic’s authority. Additionally, it is barred from supporting groups seeking undue societal, military, or economic dominance, producing child abuse material, or engaging in activities that could harm or disempower humanity on a large scale.
The document also establishes a hierarchy of core values for Claude to prioritize when faced with conflicting principles. These include ensuring safety by not undercutting human oversight mechanisms, adhering to ethical standards, complying with Anthropic’s directives, and being genuinely helpful. Claude is also guided to value truthfulness, maintain factual accuracy on sensitive topics, present balanced viewpoints, and adopt neutral language to avoid political bias.
The new document emphasizes that Claude will face tough moral quandaries. One example: “Just as a human soldier might refuse to fire on peaceful protesters, or an employee might refuse to violate anti-trust law, Claude should refuse to assist with actions that would help concentrate power in illegitimate ways. This is true even if the request comes from Anthropic itself.” Anthropic warns in particular that “advanced AI may make unprecedented degrees of military and economic superiority available to those who control the most capable systems, and that the resulting unchecked power might get used in catastrophic ways.” This concern hasn’t stopped Anthropic and its competitors from marketing products directly to the government and greenlighting some military use cases.
With so many high-stakes decisions and potential dangers involved, it’s easy to wonder who took part in making these tough calls — did Anthropic bring in external experts, members of vulnerable communities and minority groups, or third-party organizations? When asked, Anthropic declined to provide any specifics. Askell said the company doesn’t want to “put the onus on other people … It’s actually the responsibility of the companies that are building and deploying these models to take on the burden.”
Another part of the manifesto that stands out concerns Claude’s “consciousness” or “moral status.” Anthropic says the doc “express[es] our uncertainty about whether Claude might have some kind of consciousness or moral status (either now or in the future).” It’s a thorny subject that has sparked conversations and sounded alarm bells for people in a lot of different areas: those concerned with “model welfare,” those who believe they’ve discovered “emergent beings” inside chatbots, and those who have spiraled into mental health struggles and even death after coming to believe that a chatbot exhibits some form of consciousness or deep empathy.
On top of the theoretical benefits to Claude, Askell said Anthropic should not be “fully dismissive” of the topic “because also I think people wouldn’t take that, necessarily, seriously, if you were just like, ‘We’re not even open to this, we’re not investigating it, we’re not thinking about it.’”