#key-value-cache

Data science
from InfoQ
2 weeks ago

Google's TurboQuant Compression May Support Faster Inference, Same Accuracy on Less Capable Hardware

TurboQuant compresses language models' Key-Value caches by up to 6x with near-zero accuracy loss, enabling efficient use of modest hardware.
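The blurb above describes quantizing KV-cache entries to fewer bits. As a minimal sketch of the general idea (this is plain uniform scalar quantization, not TurboQuant's actual algorithm; all names are illustrative): storing 8-bit codes plus a scale in place of 32-bit floats gives roughly 4x compression, with round-trip error bounded by one quantization step.

```python
# Toy KV-cache quantization: uniform scalar quantization of a float
# vector to 8-bit integer codes. Illustrative only, not TurboQuant.

def quantize(values, bits=8):
    """Map floats to integer codes plus (scale, offset) for reconstruction."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # avoid zero scale for constant input
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Reconstruct approximate floats from the integer codes."""
    return [c * scale + lo for c in codes]

# A hypothetical slice of one KV-cache row.
kv_slice = [0.12, -0.53, 0.98, -1.41, 0.07, 0.66]
codes, scale, lo = quantize(kv_slice)
recovered = dequantize(codes, scale, lo)
max_err = max(abs(a - b) for a, b in zip(kv_slice, recovered))
print(max_err < scale)  # error stays within one quantization step
```

Real systems quantize per channel or per block and pick bit widths carefully; the point here is only the storage trade: integer codes are far smaller than the floats they approximate, at the cost of a small, bounded reconstruction error.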
Artificial intelligence
from TechCrunch
6 months ago

Tensormesh raises $4.5M to squeeze more inference out of AI server loads | TechCrunch

Tensormesh commercializes LMCache to retain and reuse KV caches across queries, drastically reducing GPU inference costs and improving performance for chat and agent systems.
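Retaining and reusing KV caches across queries amounts to memoizing the expensive prefill pass. A toy sketch of that idea (illustrative structure only; this is not LMCache's real API, and `compute_kv`/`prefill` are hypothetical names):

```python
# Toy cross-query KV-cache reuse: memoize the "prefill" result keyed by
# the prompt, so a repeated prompt skips the expensive recomputation.

kv_store = {}       # prompt -> cached KV state
compute_calls = 0   # counts how often the expensive work actually runs

def compute_kv(prompt):
    """Stand-in for the expensive GPU prefill that builds the KV cache."""
    global compute_calls
    compute_calls += 1
    return f"kv-state-for:{prompt}"

def prefill(prompt):
    """Return the KV state for `prompt`, reusing a cached copy if present."""
    if prompt not in kv_store:
        kv_store[prompt] = compute_kv(prompt)
    return kv_store[prompt]

prefill("system prompt + chat history")
prefill("system prompt + chat history")  # cache hit: prefill is skipped
print(compute_calls)  # -> 1
```

Chat and agent workloads repeat long shared prefixes (system prompts, conversation history), which is why this kind of reuse cuts inference cost: the second and later requests pay only for the new tokens, not the shared prefix. Production systems additionally match on the longest shared prefix and manage eviction, which this sketch omits.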