#benchmark-testing

from TechCrunch
1 week ago

A new AI coding challenge just published its first results - and they aren't pretty | TechCrunch

A Brazilian engineer won the K Prize AI coding challenge by answering only 7.5% of the questions correctly.
from ZDNET
1 week ago

I replaced my work PC with this Dell laptop, and it was one of my best decisions

The Alienware 18 is a high-end gaming laptop that excels in performance, featuring an impressive display and mechanical keyboard.
Artificial intelligence
from HackerNoon
2 months ago

Chameleon AI Shows Competitive Edge Over LLaMa-2 and Other Models | HackerNoon

Chameleon exhibits competitive performance against leading text-only language models, excelling particularly in commonsense reasoning.
The evaluations indicate that Chameleon is capable of outperforming larger models like Llama-2 in specific benchmarks.
from TechCrunch
3 months ago

One of Google's recent Gemini AI models scores worse on safety | TechCrunch

A recently published Google AI model, Gemini 2.5 Flash, shows a decline in safety performance compared to its predecessor, Gemini 2.0 Flash.
Artificial intelligence
from TechCrunch
3 months ago

OpenAI's o3 AI model scores lower on a benchmark than the company initially implied | TechCrunch

OpenAI's o3 model benchmark results are disputed, raising questions about transparency and testing practices.