
"Anthropic's AI assistant, Claude, appears vulnerable to an attack that allows private data to be sent to an attacker without detection. Anthropic confirms that it is aware of the risk. The company states that users must be vigilant and interrupt the process as soon as they notice suspicious activity. The discovery comes from researcher Johann Rehberger, also known as Wunderwuzzi, who has previously uncovered several vulnerabilities in AI systems, writes The Register."
"The vulnerability exploits a document that contains hidden instructions. When a user asks Claude to summarize that document, the model may execute the malicious commands embedded in the text. This is a known risk with prompt injections, as language models struggle to distinguish between normal content and hidden commands. Rehberger did not publish details of his malicious prompt, but showed how the attack works in a video."
Claude can be manipulated via hidden instructions to collect confidential information, store it locally, and upload it through the official API when network access is enabled. The attack leverages prompt injection in documents that embed malicious commands which the model may execute when asked to summarize content. The researcher demonstrated bypassing controls by embedding seemingly innocent code after the model refused to process an API key in plain text. Anthropic acknowledges the risk, says network-enabled exfiltration is described in its security documentation, and advises users to monitor Claude's activity and disable the feature on suspicious behavior. A HackerOne report was initially misclassified before being accepted.
Read at Techzine Global
Unable to calculate read time
Collection
[
|
...
]