#inter-rater-reliability


The problems with running human evals

Result ambiguity can take different forms. The most common is disagreement among raters, which is measured by inter-rater reliability (IRR).
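A common way to quantify IRR for two raters who label the same items is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed agreement rate and p_e is the agreement expected by chance from each rater's label frequencies. The sketch below is an illustrative addition, not necessarily the metric the article itself uses, and the function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items, in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_e == 1:
        # Both raters used one identical label for every item; kappa is
        # undefined (0/0), so this sketch treats it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


a = ["good", "good", "bad", "good", "bad", "bad"]
b = ["good", "bad", "bad", "good", "bad", "good"]
print(round(cohens_kappa(a, b), 3))
```

In this example the two raters agree on four of six items (67%), yet kappa is only about 0.33. Each rater splits evenly between the two labels, so half of that raw agreement would be expected by chance alone.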

Artificial intelligence

Running evaluations is essential for building valuable, safe, and user-aligned AI products.

Human evaluations help capture nuances that automated tests often miss.
