#model-scaling tag

Anthropic is dropping its signature safety pledge amid a heated AI race

Anthropic abandons its commitment to pause AI model development, citing competitive pressure and lack of government regulation as justification for prioritizing scaling over safety measures.

Artificial intelligence

from TechCrunch

3 months ago

In 2026, AI will move from hype to pragmatism | TechCrunch

2026 shifts AI from brute-force scaling to practical deployment: smaller models, embedded intelligence, human-centered systems, and new architecture research.

from Techzine Global

3 months ago

DeepSeek breakthrough gives LLMs the highways it has long needed

As LLMs cannot grow infinitely large but do improve with size, researchers must find ways to make the technology effective at smaller scales. One well-known method is Mixture-of-Experts, where an LLM activates only a portion of itself to generate a response (text, photo, video) based on a prompt. This makes a larger model effectively smaller and faster during operation. Manifold-Constrained Hyper-Connections (mHC) promises to be even more fundamental. It offers the chance to increase model complexity without the pain points of the past.

Artificial intelligence

from Business Insider

3 months ago

China's DeepSeek kicked off 2026 with a new AI training method that analysts say is a 'breakthrough' for scaling

DeepSeek developed Manifold-Constrained Hyper-Connections (mHC), a training method that enables richer internal model communication while preserving training stability and efficiency as models scale.

Artificial intelligence

from Techzine Global

4 months ago

Why specialized LLMs are the future of generative AI

Specialized LLMs trained on smaller, domain-specific datasets deliver more precise, trustworthy, and secure results than ever-larger general models.

from Business Insider

5 months ago

Databricks CEO says AGI is already here - and Silicon Valley just keeps moving the goalposts

"Everybody would say yes, but we kept moving the goalposts," Ghodsi said in the discussion, which was published Tuesday.

Artificial intelligence

from WIRED

6 months ago

The AI Industry's Scaling Obsession Is Headed for a Cliff

Very large, compute-heavy AI models will likely yield diminishing performance returns over the next decade, while efficiency improvements will make smaller models increasingly capable.

from Ars Technica

7 months ago

Anthropic says its new AI model "maintained focus" for 30 hours on multistep tasks

On Monday, Anthropic released Claude Sonnet 4.5, a new AI language model the company calls its "most capable model to date," with improved coding and computer use capabilities. The company also revealed Claude Code 2.0, a command-line AI agent for developers, and the Claude Agent SDK, which is a tool developers can use to build their own AI coding agents.

Artificial intelligence

from HackerNoon

1 year ago

Empirical Validation of Multi-Token Prediction for LLMs | HackerNoon

Multi-token prediction improves model performance, with gains that grow with model size, while also speeding up inference and helping models learn long-term patterns.

Artificial intelligence

from HackerNoon

56 years ago

Multi-Token Prediction: Architecture for Memory-Efficient LLM Training | HackerNoon

Multi-token prediction enhances language modeling efficacy by allowing simultaneous forecasting of multiple tokens.

Improved model performance scales with increased size.
