#inference-performance

Business intelligence
from InfoWorld
1 week ago

How I doubled my GPU efficiency without buying a single new card

Profiling revealed that the token generation (decode) phase was underusing the GPU, dragging down overall throughput even though utilization looked high during prompt processing.
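The effect the article describes, a high-looking utilization number during prompt processing masking idle time during decode, can be sketched with a back-of-envelope calculation. All numbers below are hypothetical, not from the article:

```python
# Illustrative sketch: why aggregate GPU utilization can hide
# decode-phase inefficiency. All figures are made up for illustration.

def phase_stats(tokens, seconds, busy_seconds):
    """Return (throughput in tokens/s, fraction of time the GPU was busy)."""
    return tokens / seconds, busy_seconds / seconds

# Prefill (prompt processing): compute-bound, GPU busy nearly the whole time.
prefill_tps, prefill_util = phase_stats(tokens=2048, seconds=0.5, busy_seconds=0.48)

# Decode (token generation): memory-bound, GPU idles between token steps.
decode_tps, decode_util = phase_stats(tokens=256, seconds=4.0, busy_seconds=1.2)

# Aggregated over the whole request, utilization looks unremarkable,
# and the decode bottleneck only shows up in a per-phase breakdown.
aggregate_util = (0.48 + 1.2) / (0.5 + 4.0)

print(f"prefill: {prefill_tps:.0f} tok/s at {prefill_util:.0%} utilization")
print(f"decode:  {decode_tps:.0f} tok/s at {decode_util:.0%} utilization")
print(f"aggregate utilization: {aggregate_util:.0%}")
```

The request spends most of its wall-clock time in the low-utilization decode phase, which is why per-phase profiling, rather than a single averaged utilization figure, surfaces the inefficiency.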
Artificial intelligence
from The Register
3 months ago

OpenAI to serve ChatGPT on Cerebras' AI dinner plates

OpenAI will deploy 750 megawatts of Cerebras wafer-scale accelerators through 2028 in a $10B+ deal to accelerate inference using massive SRAM bandwidth.