Google has released a new version of Gemini Deep Research. This is an agent designed to automate complex research tasks. The agent runs on Gemini 3 Pro. The model can process handwriting, graphs, and mathematical notation. It incorporates this visual information directly into reports and search queries. As a result, the system can not only search textual sources, but also retrieve data that was previously difficult to automate, according to SiliconANGLE.
Some specific improvements of the model include support for up to 10 reference images, meaning you can incorporate a lot more elements from different pictures in your final product; improved photorealism and detail; more accurate text rendering, a task image generating models frequently struggle with; better prompt following; and a better understanding of real-world knowledge, according to Black Forest Labs.
On Wednesday, OpenAI released GPT-5.1 Instant and GPT-5.1 Thinking, two updated versions of its flagship AI models now available in ChatGPT. The company is wrapping the models in the language of anthropomorphism, claiming that they're warmer, more conversational, and better at following instructions. The release follows complaints earlier this year that its previous models were excessively cheerful and sycophantic, along with an opposing controversy among users over how OpenAI modified the default GPT-5 output style after several suicide lawsuits.
The "Butter-Bench" test, as detailed in a yet-to-be-peer-reviewed paper, is a "benchmark that evaluates practical intelligence in embodied LLM." In the test, the robot had to navigate to an office kitchen, have butter be placed on a tray attached to its back, confirm the pickup, deliver it to a marked location, and finally return to its charging dock. The results of the Butter-Bench experiment, the researchers conceded, were dubious.
One of the most common questions creators and brands ask: "Is my engagement rate good?" The answer depends on your follower count. A 5% engagement rate looks very different for a neighborhood café with 500 fans than for a news publisher with half a million. That's why we analyzed 52 million Facebook posts across 213,000 accounts with over 6.9 billion engagements collectively, to see how engagement rates shift by follower tier.
It's important to note that this was all tested on the X2 Elite Extreme configuration, which comes with six additional CPU cores over the standard X2 Elite. There were no X2 Elite systems to test, so we don't know what those multi-core scores will be. I've been told that GPU performance will also scale up on the X2 Elite, but we don't yet know how much faster the X2 Elite Extreme is over its sibling.
xAI has introduced Grok 4 Fast, a new reasoning model designed for efficiency and lower cost. The model reduces average thinking tokens by 40% compared with Grok 4, which brings an estimated 98% decrease in cost for equivalent benchmark performance. It maintains a 2-million token context window and a unified architecture that supports both reasoning and non-reasoning use cases. The model also integrates tool-use capabilities such as web browsing and X search.
CrowdStrike and Meta are jointly introducing CyberSOCEval, a new suite of open source benchmarks to evaluate the performance of AI systems in security operations. The collaboration aims to help organizations select more effective AI tools for their Security Operations Center. Meta and CrowdStrike are addressing a growing challenge by introducing CyberSOCEval, a suite of benchmarks that help define what effective AI looks like for cyber defense. The system is built on Meta's open source CyberSecEval framework and CrowdStrike's frontline threat intelligence.