#iteration-level-scheduling

Artificial intelligence
from InfoWorld
1 day ago

Maximizing speed: How continuous batching unlocks unprecedented LLM throughput

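Continuous batching schedules work one decoding iteration at a time. Each step generates one token for every active request, and finished requests are swapped out for waiting ones between steps. Keeping the batch full this way keeps the GPU busy and can substantially increase throughput.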
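The scheduling idea in the summary can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not the implementation used by any particular serving system. `Request`, `tokens_left`, `continuous_batching` and `static_batching` are hypothetical names. The sketch assumes every request needs at least one token, treats each decoding step as equal cost, and ignores prefill and KV-cache memory limits. It counts decoding steps for iteration-level scheduling against a static baseline, where each batch runs until its longest request finishes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    tokens_left: int  # hypothetical: tokens still to generate (assumed >= 1)


def continuous_batching(requests, max_batch):
    """Toy iteration-level scheduler; returns (decoding steps, finish order)."""
    waiting = deque(requests)
    active = []
    finished = []
    steps = 0
    while waiting or active:
        # Fill any free batch slots with waiting requests before each step.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        # One decoding iteration: every active request emits one token.
        for req in active:
            req.tokens_left -= 1
        steps += 1
        # Evict finished requests at once so their slots free up next step.
        finished.extend(r.rid for r in active if r.tokens_left == 0)
        active = [r for r in active if r.tokens_left > 0]
    return steps, finished


def static_batching(lengths, max_batch):
    """Baseline: each fixed batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), max_batch):
        steps += max(lengths[i:i + max_batch])
    return steps


lengths = [8, 2, 2, 2]
reqs = [Request(name, n) for name, n in zip("ABCD", lengths)]
print(continuous_batching(reqs, max_batch=2))  # (8, ['B', 'C', 'D', 'A'])
print(static_batching(lengths, max_batch=2))   # 10
```

With one long request and three short ones, the static baseline keeps the short requests C and D waiting until A's 8-token batch drains, for 10 steps in total. The continuous scheduler admits them into the slot B vacates, so everything finishes in 8 steps.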