CoT monitoring presents a valuable addition to safety measures for frontier AI, offering a rare glimpse into how AI agents make decisions. Yet, there is no guarantee that the current degree of visibility will persist.
"If we let Google get away with breaking their word, it sends a signal to all other labs that safety promises aren't important and commitments to the public don't need to be kept."
This evaluation was conducted in a relatively short time, and we only tested the model with simple agent scaffolds. We expect higher performance [on benchmarks] is possible with more elicitation effort.