
"OpenAI's initial claim of over 25% accuracy for the o3 model on FrontierMath has been contested by independent tests showing only approximately 10% accuracy, raising transparency issues."
"Epoch AI's testing suggests that the discrepancies may stem from differing testing setups: OpenAI's use of more powerful internal configurations, or a different subset of test problems."
The release of OpenAI's o3 AI model has sparked scrutiny after independent tests by Epoch AI revealed significant discrepancies in benchmark results. OpenAI initially claimed o3 could solve over 25% of FrontierMath problems, vastly outperforming competitors. However, Epoch found that the actual score was around 10%. While OpenAI's lower-bound figures matched Epoch's findings, the differences illustrate potential issues in testing transparency and methodology, suggesting that OpenAI might have used more advanced resources or a different problem subset for their evaluations.
Read at TechCrunch