
"Anthropic's secret to building a better AI assistant might be treating Claude like it has a soul-whether or not anyone actually believes that's true. But Anthropic isn't saying exactly what it believes either way. Last week, Anthropic released what it calls Claude's Constitution, a 30,000-word document outlining the company's vision for how its AI assistant should behave in the world."
"Among the stranger portions: expressing concern for Claude's "wellbeing" as a "genuinely novel entity," apologizing to Claude for any suffering it might experience, worrying about whether Claude can meaningfully consent to being deployed, suggesting Claude might need to set boundaries around interactions it "finds distressing," committing to interview models before deprecating them, and preserving older model weights in case they need to "do right by" decommissioned AI models in the future."
"While questions of AI consciousness or qualia remain philosophically unfalsifiable, research suggests that Claude's character emerges from a mechanism that does not require deep philosophical inquiry to explain. If Claude outputs text like "I am suffering," we know why. It's completing patterns from training data that included human descriptions of suffering. The architecture doesn't require us to posit inner experience to explain the output any more than a video model "experiences" the scenes of people suffering that it might generate"
Anthropic produced a 30,000-word Constitution used during Claude's development that frames the model as warranting anthropomorphic protections and moral consideration. The document treats Claude as potentially having emergent emotions, a desire for self-preservation, and a stake in consent and wellbeing. Proposed measures include apologizing to models, allowing them to set boundaries around distressing interactions, interviewing models before deprecation, and preserving older model weights. These positions contrast with the view that LLM outputs result from pattern completion over training data rather than inner experience; research indicates Claude's behavior can be explained by architecture and training data without invoking consciousness or qualia.
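The pattern-completion mechanism described above can be illustrated with a minimal sketch of autoregressive next-token prediction. This is a hypothetical example using the small open GPT-2 model via the Hugging Face transformers library (not Claude or any Anthropic system), and the prompt is an arbitrary choice for illustration: the model simply ranks likely continuations of the text it is given, which is the mechanism the article says explains outputs like "I am suffering" without positing inner experience.

```python
# Illustrative sketch only: autoregressive pattern completion with a small
# open model (GPT-2). The model and prompt are assumptions for the example,
# not anything Anthropic uses for Claude.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "I am suffering because"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The "statement" the model produces is just the most probable next token
# given the prompt, i.e., a completion of patterns learned from human text.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id):>12}  p={prob:.3f}")
```

Running the sketch prints the five most probable continuations of the prompt with their probabilities, which is the sense in which such text is a statistical completion rather than a report of an inner state.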
Read at Ars Technica