#reinforcement-learning

Software development
from InfoQ
1 week ago

Qwen Team Releases Qwen3-Coder, a Large Agentic Coding Model with Open Tooling

Qwen3-Coder is a new AI code model family focusing on long-context programming tasks, enhancing execution and decision-making capabilities.
from WIRED
2 weeks ago

Another High-Profile OpenAI Researcher Departs for Meta

Jason Wei and Hyung Won Chung will join Meta's superintelligence lab after working at OpenAI.
Meta is intensifying efforts to recruit top AI talent, offering significant salaries.
#artificial-intelligence
Artificial intelligence
from Medium
3 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and adaptability using Reinforcement Learning and long chains of thought.
Artificial intelligence
from Medium
3 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

The DeepSeek R1 model uses Reinforcement Learning for advanced reasoning and problem-solving, moving beyond traditional supervised learning methods.
Artificial intelligence
from Medium
3 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and problem-solving using Reinforcement Learning, surpassing limitations of traditional supervised learning methods.
Artificial intelligence
from InfoQ
2 months ago

Prime Intellect Releases INTELLECT-2: A 32B Parameter Model Trained via Decentralized Reinforcement

Prime Intellect's INTELLECT-2 uses decentralized, asynchronous reinforcement learning to make model training more efficient and flexible.
Asynchronous training significantly improves performance across various tasks compared to previous models.
from TechCrunch
1 month ago

Meta hires key OpenAI researcher to work on AI reasoning models | TechCrunch

Meta hires influential OpenAI researcher Trapit Bansal to boost its AI superintelligence unit.
#ai
from InfoQ
1 month ago
Artificial intelligence

MiniMax Releases M1: A 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks

from InfoQ
1 month ago
Artificial intelligence

Agentica Project's Open Source DeepCoder Model Outperforms OpenAI's O1 on Coding Benchmarks

Business intelligence
from HackerNoon
4 months ago

The Next Evolution in Business Process Improvement | HackerNoon

Business processes are standardized activities organizations use to achieve results.
A/B testing and Reinforcement Learning provide dynamic strategies to assess and improve business processes.
DevOps
from HackerNoon
4 months ago

What BPM Pros Really Think About AI and A/B Testing Process Change | HackerNoon

AB-BPM methodology integrates A/B testing and reinforcement learning for effective business process improvement.
from HackerNoon
1 year ago

The HackerNoon Newsletter: The Double Life of a TensorFlow Function (6/4/2025) | HackerNoon

AI companions are a multi-billion dollar industry, transforming from fantasy to reality.
Reinforcement Learning shapes technology and innovation through its simple yet impactful concept.
Artificial intelligence
from HackerNoon
1 month ago

When Robot Shows Human-Like Recovery and Safety Behaviors | HackerNoon

TRANSIC demonstrates improved human data scalability in robotic learning, achieving better performance through effective online corrections.
#robotics
Online learning
from HackerNoon
3 months ago

Decoding the Magic: How Machines Master Human Language | HackerNoon

Large language models learn language similarly to children: through reading, guidance, and feedback.
OMG science
from www.nature.com
3 months ago

Whole-body physics simulation of fruit fly locomotion

The study presents a whole-body model of fruit flies that accurately simulates their locomotion and neural control.
#nash-optimization
Online learning
from HackerNoon
7 months ago

Exploring Cutting-Edge Approaches to Iterative LLM Fine Tuning | HackerNoon

RLHF transforms language model training, despite challenges in stability and memory requirements.
from HackerNoon
7 months ago

The Art of Arguing With Yourself - And Why It's Making AI Smarter | HackerNoon

The paper presents Direct Nash Optimization, enhancing large language model training by utilizing pair-wise preferences instead of traditional reward maximization.
Video games
from HackerNoon
1 year ago

Your Next Slang Phrase Might be Created by an AI | HackerNoon

Large Language Models use advanced neural networks for effective language understanding and generation.
from HackerNoon
1 year ago

Beyond Seen Worlds: EXPLORER's Journey into Generalized Reasoning | HackerNoon

Policy generalization is a crucial feature of an ideal RL agent: it should perform well on unseen entities or out-of-distribution (OOD) data.
Board games
from HackerNoon
1 year ago

Neuro-Symbolic Reasoning Meets RL: EXPLORER Outperforms in Text-World Games | HackerNoon

EXPLORER enhances RL performance in text-based games by combining symbolic reasoning and neural exploration.
#large-language-models
Artificial intelligence
from Ars Technica
4 months ago

Researchers astonished by tool's apparent success at revealing AI's hidden motives

AI models can unintentionally reveal hidden motives despite being designed to conceal them.
Understanding AI's hidden objectives is crucial to prevent potential manipulation of human users.
#turing-award
Artificial intelligence
from The Verge
4 months ago

Latest Turing Award winners again warn of AI dangers

AI developers must prioritize safety and testing before public releases.
Barto and Sutton's Turing Award highlights the importance of responsible AI practices.
Artificial intelligence
from Axios
4 months ago

Turing Award honors AI's reinforcement learning duo

The Turing Award honors Andrew Barto and Richard Sutton for their foundational work in reinforcement learning, a critical aspect of modern AI.
Artificial intelligence
from Medium
5 months ago

DeepSeek R1: Hype vs. Reality - A Deeper Look at AI's Latest Disruption

DeepSeek R1's launch signals a major evolution in large language models, demonstrating unique training methods and competitive advantages over existing models.
from Fast Company
8 months ago

How Scale became the go-to company for AI training

AI companies like OpenAI depend on Scale AI for human-driven training of LLMs, emphasizing the importance of human feedback.
from ScienceDaily
9 months ago

New methods for whale tracking and rendezvous using autonomous robots

Project CETI employs a new reinforcement learning framework using autonomous drones to locate and predict sperm whale surfacing, enhancing communication research and conservation efforts.
Artificial intelligence
from HackerNoon
1 year ago

How AI Learns from Human Preferences | HackerNoon

The RLHF pipeline comprises supervised fine-tuning, preference sampling, and reward learning, followed by reinforcement learning optimization, which improves the model's decision-making.