
"Per its listing on AI platform HuggingFace, the software offers 'Intelligent task automation through multi-agent orchestration, API integration, and code generation on enterprise demo applications.' 'Our vision for IBM CUGA is to develop a generalist agent that can be adapted and configured by knowledge workers to perform routine or complex aspects of their work in a safe and trustworthy manner,'"
"However, the lure of automation remains strong and IBM is keen to help. Big Blue's researchers cite CUGA's performance on the WebArena and AppWorld benchmarks - 61.7 percent success rate completing web tasks and 48.2 percent scenario completion rate evaluating API tasks, respectively - and note the agent's scores, which are sufficiently poor to get a human worker fired, presently represent top-tier marks for agents."
CUGA (Configurable Generalist Agent) is IBM's open-source software for automating routine and complex enterprise tasks through multi-agent orchestration, API integration, and code generation. IBM intends it to be configured by knowledge workers to perform aspects of their work in a safe and trustworthy manner. Reported benchmark results show a 61.7% success rate on WebArena web tasks and a 48.2% scenario completion rate on AppWorld API tasks, so the agent still fails roughly 38% of web tasks and just over half of API scenarios, even though these scores currently rank as top-tier for agents. Industry guidance includes advice to block agentic browsers and warnings that many agentic projects may be cancelled for lacking business value. An internal enterprise benchmark was not applied to CUGA, and earlier agents scored poorly on that test, highlighting performance and governance concerns despite continued interest in automation.
Read at The Register