#code-evaluation

Scala
from HackerNoon
1 year ago

Evaluating GPT and Open-Source Models on Code Mutation Tasks | HackerNoon

Closed-source LLMs generally outperform open-source models in key metrics.
GPT-4 excels in usability while GPT-3.5 is best for rapid mutation generation.
from HackerNoon
9 months ago

Inside the Evaluation Pipeline for Code LLMs With LuaUnit | HackerNoon

To streamline and standardize the automated evaluation procedure, we translated the native assertions in MCEVAL to LuaUnit-based assertions, improving consistency across benchmarks.
Scala