#model-serving

from Medium
2 weeks ago
DevOps

Serve AI Models with Docker Model Runner: No Code, No Setup

Docker Model Runner makes it easy to serve ML models locally, with no complex setup required.
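As a rough sketch of what "no setup" means in practice, a local session might look like the following. This assumes Docker Desktop with the Model Runner feature enabled; the model name `ai/smollm2` is only an illustrative example.

```shell
# Pull a model image from Docker Hub's ai/ namespace (example model)
docker model pull ai/smollm2

# List locally available models
docker model list

# Run a one-shot prompt against the model
docker model run ai/smollm2 "Summarize what model serving means."
```

The appeal is that models are pulled and run with the same CLI verbs developers already use for containers, so no separate inference server needs to be installed or configured.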
from Hackernoon
1 year ago
Miscellaneous

General Model Serving Systems and Memory Optimizations Explained | HackerNoon

Most model serving systems overlook the autoregressive nature of large language models, limiting their optimization potential.
PagedAttention and KV Cache Manager enhance memory efficiency and performance in LLM serving, especially for autoregressive tasks.
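To make the memory argument concrete, here is a minimal, illustrative sketch of the PagedAttention idea (not the vLLM implementation): the KV cache is split into fixed-size blocks, and each sequence keeps a block table mapping its logical positions to physical blocks, so memory is allocated on demand as tokens are generated instead of being reserved up front for the maximum sequence length. The block size and class names are assumptions for the example.

```python
BLOCK_SIZE = 4  # tokens per physical KV block (assumed value)

class PagedKVCache:
    """Toy block-table allocator illustrating paged KV-cache management."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> tokens written so far

    def append_token(self, seq_id):
        """Record one generated token, allocating a new physical block
        only when the sequence's current block is full."""
        n = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % BLOCK_SIZE == 0:  # first token, or current block is full
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1

    def free(self, seq_id):
        """Return a finished sequence's blocks to the free pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(6):               # autoregressively generate 6 tokens
    cache.append_token("seq-0")
print(len(cache.block_tables["seq-0"]))  # 2 blocks for 6 tokens
cache.free("seq-0")
print(len(cache.free_blocks))            # all 8 blocks back in the pool
```

Because blocks are fixed-size and freed as soon as a sequence finishes, fragmentation stays bounded and memory that a naive serving system would pre-reserve for worst-case lengths can instead be shared across concurrent requests.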