Researchers at Apple have critically evaluated leading AI models from companies such as OpenAI and Google and found that they struggle with logical reasoning in puzzles. In the Tower of Hanoi puzzle, for example, accuracy fell below 80% with seven disks and was nearly non-existent with eight. This inconsistency raises questions about the true capabilities of these models, especially given marketing that suggests superior reasoning skills. The findings suggest that, despite substantial investment in AI technologies, there may be fundamental limits to their reasoning abilities.
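For context, the difficulty of the Tower of Hanoi grows exponentially with the number of disks: the shortest solution for n disks takes 2^n - 1 moves, so seven disks require 127 moves and eight require 255. The minimal Python sketch below, which is illustrative and not code from the Apple study, shows the standard recursive solution and how each extra disk doubles the length of a correct answer.

```python
# Illustrative sketch (not from the Apple paper): the classic recursive
# Tower of Hanoi solution. The optimal move count for n disks is 2**n - 1,
# so each added disk doubles the length of a correct solution.

def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the optimal move sequence for n disks as (from_peg, to_peg) pairs."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)   # move n-1 smaller disks out of the way
    moves.append((source, target))               # move the largest disk to the target peg
    hanoi(n - 1, spare, target, source, moves)   # stack the n-1 disks back on top
    return moves

if __name__ == "__main__":
    for n in (7, 8):
        print(n, "disks:", len(hanoi(n)), "moves")  # prints 127 and 255
```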
The findings amplify ongoing fears that current AI approaches, including 'reasoning' AI models that break down tasks into individual steps, are a dead end.
As the researchers write in the paper: "Through extensive experimentation across diverse puzzles, we show that frontier [large reasoning models] face a complete accuracy collapse beyond certain complexities."