Google Shrinks AI Memory With No Accuracy Loss—But There's a Catch
By Jose Antonio Lanz
Published on March 25, 2026.
Google Research has published TurboQuant, a compression algorithm that shrinks a major inference-memory bottleneck by at least 6x with no loss in accuracy. The algorithm targets the KV cache, the memory that stores the key and value vectors for every token a language model has processed, effectively everything it needs to remember during a conversation.

Conventional quantization schemes must store extra "quantization constants" (such as scales and zero points) alongside the compressed data, adding 1 to 2 bits per value of overhead. TurboQuant claims to eliminate this overhead entirely through two sub-algorithms, PolarQuant and QJL (Quantized Johnson-Lindenstrauss), yielding a mathematically unbiased estimator for the attention calculations that drive transformer models.

The catch: TurboQuant does not compress the model's weights. It compresses only the temporary memory that holds mid-session attention computations, so the model itself stays the same size.
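The zero-overhead idea behind QJL-style quantization can be illustrated with a toy sketch: project each key vector with a random Gaussian matrix, keep only the sign bits of the projection (plus a single norm per vector, rather than per-value constants), and estimate attention inner products from those bits. The estimator below is mathematically unbiased. This is a simplified illustration of the general technique, not Google's actual implementation; all names, dimensions, and constants here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 8, 100_000                 # original dim, projection dim (exaggerated for accuracy)

S = rng.standard_normal((m, d))   # shared random JL projection matrix

def quantize_key(k):
    """Compress a key to 1 bit per projected coordinate plus one norm."""
    return np.sign(S @ k), np.linalg.norm(k)

def estimate_dot(q, bits, k_norm):
    """Unbiased estimator of <q, k> from the 1-bit sketch.

    For Gaussian rows s, E[<s,q> * sign(<s,k>)] = sqrt(2/pi) * <q,k> / ||k||,
    so rescaling by ||k|| * sqrt(pi/2) removes the bias.
    """
    return k_norm * np.sqrt(np.pi / 2) * np.mean((S @ q) * bits)

q = rng.standard_normal(d)
k = q + 0.1 * rng.standard_normal(d)   # a key correlated with the query

true = q @ k
est = estimate_dot(q, *quantize_key(k))
```

With a large enough projection dimension, the estimate concentrates around the true inner product even though each projected coordinate of the key is stored in a single bit.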