Batching improves compute utilization for LLM inference, but naive static batching makes short requests wait for the longest sequence in the batch, adding latency and wasting compute. Fine-grained techniques such as iteration-level ("continuous") batching offer a solution.
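As an illustration (a toy simulation, not any particular serving engine's scheduler), the idea behind iteration-level batching is that a finished sequence frees its batch slot immediately, so a waiting request can join at the very next decode step instead of stalling until the whole batch drains:

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """Toy iteration-level scheduler. Each request is (id, decode_steps).
    Finished requests leave the batch immediately and waiting requests
    are admitted at the next step (hypothetical simulation only)."""
    waiting = deque(requests)   # requests not yet admitted
    active = {}                 # request_id -> tokens still to generate
    timeline = []               # which requests ran at each decode step
    while waiting or active:
        # Admit new requests into any free batch slots.
        while waiting and len(active) < max_batch:
            rid, steps_needed = waiting.popleft()
            active[rid] = steps_needed
        timeline.append(sorted(active))
        # One decode iteration: every active request emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:    # done -> slot freed right away
                del active[rid]
    return timeline

steps = continuous_batching(
    [("a", 1), ("b", 3), ("c", 2), ("d", 1), ("e", 2)], max_batch=2
)
print(steps)  # [['a', 'b'], ['b', 'c'], ['b', 'c'], ['d', 'e'], ['e']]
```

In this toy trace the five requests finish in 5 decode steps; a static scheduler that only refills the batch once every member finishes would take 7 for the same workload.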
Etched scores $120M for an ASIC built for transformer models
Etched is developing Sohu, an inference ASIC specialized for serving transformer models; by committing to that single architecture, the company claims a 20x performance advantage over Nvidia's H100.