UX design
from Contralabs
16 hours ago
Contra Labs - Human Creativity Benchmark
Evaluator agreement reflects shared best practices, while evaluator disagreement reflects legitimate differences in taste and intent; separating these two signals shows that no current model is both reliably correct and steerable.