Google's TurboQuant AI compression algorithm can cut LLM memory usage by a factor of six
PolarQuant does most of the compression, while a second step cleans up the rough spots. Google proposes smoothing this out with a technique called Quantized Johnson-Lindenstrauss (QJL).
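The core QJL idea, as described in the literature, is to project a vector with a random Johnson-Lindenstrauss matrix and then keep only the sign of each projected coordinate, i.e. one bit per dimension. A minimal NumPy sketch of that idea (our illustration with made-up dimensions, not Google's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 20000                  # original dim, projected dim (m is large for a clear demo)
S = rng.standard_normal((m, d))   # random Gaussian JL projection matrix

x = rng.standard_normal(d)        # e.g. a cached key vector to be compressed
q = rng.standard_normal(d)        # e.g. an incoming query vector, kept at full precision

bits = np.sign(S @ x)             # quantized key: m sign bits plus the scalar ||x||

# For Gaussian g, E[sign(g @ x) * (g @ q)] = sqrt(2/pi) * <x, q> / ||x||,
# so rescaling the averaged sign products gives an unbiased inner-product estimate.
est = np.sqrt(np.pi / 2) * np.linalg.norm(x) * np.mean(bits * (S @ q))

print(est, x @ q)                 # the estimate tracks the true inner product
```

Storing one bit per projected coordinate (plus a single norm) instead of a full-precision vector is where the memory savings come from; the estimate's accuracy improves as the projected dimension grows.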