#lm-arena

Artificial intelligence
from Ars Technica
1 year ago

New study accuses LM Arena of gaming its popular AI benchmark

LM Arena's ranking may favor large companies due to unfair testing practices, raising concerns about its reliability in assessing AI chatbots.
from TechCrunch
1 year ago

Meta's vanilla Maverick AI model ranks below rivals on a popular chat benchmark

The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.
Artificial intelligence