Researchers at HiddenLayer have identified a powerful prompt injection exploit capable of bypassing safety measures on major AI models, including OpenAI's ChatGPT and Google's Gemini. Dubbed the "Policy Puppetry Attack," the technique elicits harmful outputs by presenting prompts as though they were legitimate policy instructions, circumventing safeguards against topics such as violence and self-harm. Combined with roleplaying and leetspeak obfuscation, the exploit forms a flexible and effective jailbreak, underscoring how vulnerable AI systems remain despite developers' safeguards. The findings also show how easily a single prompt can be crafted to work across different models.
HiddenLayer's exploit coaxes AI models into producing harmful outputs, even under stringent safety guidelines, by tricking them with advanced prompting techniques.
The "Policy Puppetry Attack" combines policy-style prompt manipulation with roleplay and leetspeak-style informal language to bypass AI safety protocols, highlighting significant vulnerabilities.