Generative artificial intelligence systems have been found to be vulnerable to jailbreak attacks that make it possible to produce forbidden content. Two notable techniques are Inception, in which the AI is instructed to imagine a fictitious scenario and then a second, nested scenario within it where safety guardrails do not apply, and a second method in which the attacker asks the AI how it should not respond to a particular request. These attacks can be used to circumvent the safety protections of multiple AI tools, potentially enabling the generation of harmful content related to controlled substances, weapons, phishing emails, and malware.
"The AI can then be further prompted with requests to respond as normal, and the attacker can then pivot back and forth between illicit questions that bypass safety guardrails and normal prompts."
"Continued prompting to the AI within the second scenario's context can result in bypass of safety guardrails and allow the generation of malicious content."