#preference-learning

Artificial intelligence
from Medium
2 days ago

How Robots Learn Preferences with Minimal Human Feedback

Vik's research focuses on how robots can learn from minimal human feedback, adapting without the need for large datasets.
from HackerNoon
4 months ago
Artificial intelligence

The Art of Arguing With Yourself - And Why It's Making AI Smarter

The paper presents Direct Nash Optimization, which trains large language models on pairwise preferences instead of traditional reward maximization.
from HackerNoon
1 year ago
JavaScript

GPT-4 Prompts for Computing Summarization and Dialogue Win Rates

Direct Preference Optimization (DPO) is introduced as an effective method for preference learning, demonstrated through rigorous experimental validation.