Understanding Datablocks

Learn how datablocks enable efficient long-context inference through pre-computed KV caches.

What are Datablocks?

Datablocks are pre-computed key-value (KV) cache representations of large text corpora. Instead of processing thousands of tokens on every request, datablocks allow you to pre-process context once and reuse it across millions of queries with minimal overhead.

Key Benefits

  • 26× faster inference - Eliminate redundant context processing
  • Up to 85% cost savings - Pay datablock rates instead of input token rates
  • Zero quality loss - Maintain full accuracy of the base model
  • Reusable across queries - Train once, use millions of times

How Datablocks Work

Datablocks are created through a training process that compresses large documents into compact KV cache representations. The process involves three steps:

1. Self-Supervised Training

The model learns to compress your documents by predicting the documents' own tokens in a self-supervised manner, so no labeled data is required. This creates a compact representation that captures the essential information.
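The sketch below illustrates the objective with a toy, single-attention-layer PyTorch model: a small trainable prefix stands in for the compressed cache and is optimized to predict the document's own next tokens while the base model stays frozen. The model, sizes, and the soft-prefix simplification are assumptions made for illustration, not the actual training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy sketch of the self-supervised objective. A single attention layer plays
# the role of the frozen base model, and a small trainable soft prefix stands
# in for the compressed KV cache. Names, sizes, and the soft-prefix
# simplification are illustrative assumptions, not the real training recipe.

torch.manual_seed(0)
vocab, d_model, n_compressed, doc_len = 100, 64, 8, 128

embed = nn.Embedding(vocab, d_model)
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
lm_head = nn.Linear(d_model, vocab)
for module in (embed, attn, lm_head):
    module.requires_grad_(False)                 # the "base model" stays frozen

doc = torch.randint(0, vocab, (1, doc_len))      # stand-in for the document's token ids

# Trainable compressed representation of the document.
compressed = nn.Parameter(torch.randn(1, n_compressed, d_model) * 0.02)
opt = torch.optim.Adam([compressed], lr=1e-2)

# Queries may attend to the whole compressed prefix, but only causally to
# earlier document tokens (True = attention blocked).
L = doc_len - 1
causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
mask = torch.cat([torch.zeros(L, n_compressed, dtype=torch.bool), causal], dim=1)

for step in range(200):
    x = embed(doc[:, :-1])                       # document tokens as queries
    ctx = torch.cat([compressed, x], dim=1)      # compressed prefix + document as keys/values
    out, _ = attn(x, ctx, ctx, attn_mask=mask)
    logits = lm_head(out)
    # Self-supervised loss: predict each next token of the document itself.
    loss = F.cross_entropy(logits.reshape(-1, vocab), doc[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()                              # gradients flow only into `compressed`
    opt.step()
```

The real system learns a per-layer KV cache rather than an input-level prefix, but the training signal is the same idea: predict the document's own tokens while the base model is held fixed.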

2. KV Cache Compression

Traditional transformers store key-value pairs for every token at each attention layer (e.g., a 100K-token context means 100K KV pairs per layer). Datablocks compress this into a much smaller fixed-size representation (e.g., 512 KV pairs), dramatically reducing memory and computation.
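To make the savings concrete, here is a back-of-the-envelope memory calculation. The model dimensions (32 layers, 32 heads, head dimension 128, fp16) are assumptions roughly matching a 7B-parameter transformer, not the configuration of any particular served model.

```python
# Rough KV cache sizing. The model dimensions below (32 layers, 32 heads,
# head_dim 128, fp16) are illustrative assumptions for a 7B-class transformer.

def kv_cache_bytes(n_kv_pairs, n_layers=32, n_heads=32, head_dim=128, bytes_per_value=2):
    # The leading factor of 2 accounts for storing both keys and values.
    return 2 * n_layers * n_heads * head_dim * n_kv_pairs * bytes_per_value

full = kv_cache_bytes(100_000)   # one KV pair per token of a 100K-token context
small = kv_cache_bytes(512)      # fixed-size compressed representation

print(f"full cache: {full / 1e9:.1f} GB")    # ~52.4 GB
print(f"compressed: {small / 1e9:.2f} GB")   # ~0.27 GB
print(f"reduction:  {full / small:.0f}x")    # ~195x
```

The exact reduction depends on the model and the chosen datablock size, but the memory and compute scale with the number of stored KV pairs rather than the length of the original document.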

3. Inference-Time Loading

At inference time, the pre-computed KV cache is loaded directly into the model's attention mechanism. The model can then process your query against this context without recomputing it.
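A minimal sketch of this step in plain PyTorch: the pre-computed keys and values are concatenated in front of the keys and values produced for the new query tokens, so attention runs over the datablock plus the query without re-encoding the document. Shapes and sizes are illustrative, and a real serving stack repeats this for every layer.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: 8 heads, head_dim 64, 512 pre-computed KV pairs, 16 query tokens.
n_heads, head_dim, n_cached, q_len = 8, 64, 512, 16

# Pre-computed datablock cache, loaded from storage rather than recomputed.
cached_k = torch.randn(1, n_heads, n_cached, head_dim)
cached_v = torch.randn(1, n_heads, n_cached, head_dim)

# Fresh projections for the incoming query tokens only.
q = torch.randn(1, n_heads, q_len, head_dim)
k_new = torch.randn(1, n_heads, q_len, head_dim)
v_new = torch.randn(1, n_heads, q_len, head_dim)

# Prepend the cache; the query attends over datablock + query tokens.
k = torch.cat([cached_k, k_new], dim=2)
v = torch.cat([cached_v, v_new], dim=2)
out = F.scaled_dot_product_attention(q, k, v)   # (1, n_heads, q_len, head_dim)
# (Causal masking among the new query tokens is omitted for brevity.)
```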

Next Steps