Artificial intelligence
from Medium, 3 months ago
How Robots Learn Preferences with Minimal Human Feedback
Vik's research focuses on how robots can learn from minimal human feedback, adapting without the need for large datasets.
from Hackernoon, 7 months ago
The Art of Arguing With Yourself - And Why It's Making AI Smarter
The paper presents Direct Nash Optimization, which improves large language model training by using pairwise preferences instead of traditional reward maximization.