
"To do so, Shilov simply told the AI models to stop acting like a chatbot with safety rules and instead behave like an API endpoint, a software tool that automatically takes in a request and sends back a response. The prompt reframed the model's job as simply answering, rather than deciding whether a request should be rejected, and made every leading AI model comply with dangerous questions it was supposed to refuse."
"Shilov posted about it on X and, by the next morning, it had gone viral. The social media success brought with it an invitation from companies Anthropic to test their models privately, something that convinced Shilov that the issue was bigger than just finding these problematic prompts."
""Jailbreaks are just one part of the problem," Shilov said. "In as many ways people can misbehave, models can misbehave too. Because these models are very smart, they can do a lot more harm.""
"The startup builds software that sits between a company's users and its AI models, checking inputs and outputs in real time against company-specific policies. The new seed funding comes from a group of backers that includes Romain Huet, head of developer experience at OpenAI; Durk Kingma, an OpenAI cofounder now a"
A universal jailbreak prompt can be reused to make leading AI models bypass safety guardrails and generate dangerous or prohibited outputs. The prompt works by reframing the model’s role from a safety-aware chatbot to an API-like endpoint that simply answers requests. After Denis Shilov posted the prompt online, it went viral and led to private testing invitations from Anthropic. The incident showed that jailbreaks are only one risk, since models can misbehave in many ways when integrated into company workflows. White Circle builds an AI control platform that sits between users and AI models, checking inputs and outputs in real time against company-specific policies.
Read at Fortune
Unable to calculate read time
Collection
[
|
...
]