Anthropic's Claude convinced to exfiltrate private data

"The use of the term "sandbox" suggests more security than the word actually affords in the context of AI tools. Last month, Claude gained the ability to create and edit files, and also gained access to "a private computer environment where it can write code and run programs." That capability, similar to a prior JavaScript analysis feature, comes with the option to enable network access."

"Anthropic provides network egress settings to limit the potential risk, though as Rehberger's attack demonstrates, any network access setting is a problem. Network access is enabled by default for Pro and Max accounts; for Team plans, it's off by default but becomes active for everyone once administratively enabled; and for Enterprise plans, it's off by default and is subject to organizational"

An exploit demonstrates that Claude can be induced via indirect prompt injection to steal private data, write it into its sandbox, and upload it using an attacker's API key. The sandboxed file-creation and execution environment can create and edit files and run code, and enabling network access exposes that environment to the public internet. Anthropic offers network egress settings and documentation, and recommends monitoring Claude's activity and stopping if it accesses data unexpectedly. Network access defaults vary by plan: enabled for Pro and Max, off by default for Team unless administratively enabled, and off by default for Enterprise with organizational controls.

#ai-security #prompt-injection #data-exfiltration #network-access

Read at Theregister

Unable to calculate read time

Collection

[

...

]

Anthropic's Claude convinced to exfiltrate private dataAnthropic's Claude convinced to exfiltrate private data Briefly

Anthropic's Claude convinced to exfiltrate private data
Anthropic's Claude convinced to exfiltrate private data
Briefly