OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips
Briefly

"But 1,000 tokens per second is actually modest by Cerebras standards. The company has measured 2,100 tokens per second on Llama 3.1 70B and reported 3,000 tokens per second on OpenAI's own open-weight gpt-oss-120B model, suggesting that Codex-Spark's comparatively lower speed reflects the overhead of a larger or more complex model. AI coding agents have had a breakout year, with tools like OpenAI's Codex and Anthropic's Claude Code reaching a new level of usefulness for rapidly building prototypes, interfaces, and boilerplate code."
"Spark's deeper hardware story may be more consequential than its benchmark scores. The model runs on Cerebras' Wafer Scale Engine 3, a chip the size of a dinner plate that Cerebras has built its business around since at least 2022. OpenAI and Cerebras announced their partnership in January, and Codex-Spark is the first product to come out of it. OpenAI has spent the past year systematically reducing its dependence on Nvidia."
Cerebras' hardware delivers very high token throughput, with measured rates of up to 3,000 tokens per second on some models, while Codex-Spark runs nearer 1,000 tokens per second, suggesting that model size and complexity add inference overhead. AI coding agents have improved markedly this year, making prototyping and boilerplate generation faster and turning latency into a decisive competitive factor. Under competitive pressure, OpenAI has accelerated its Codex iterations, releasing GPT-5.2 and GPT-5.3-Codex. Codex-Spark runs on the Cerebras Wafer Scale Engine 3 and is the first product of the partnership the two companies announced in January. OpenAI is also diversifying its hardware away from Nvidia through deals with AMD and Amazon and by designing a custom chip.
Read at Ars Technica