Training Datablocks

Learn how to train custom datablocks on your documents for efficient long-context inference.

Overview

Training a datablock creates a compressed KV cache representation of your documents. Once trained, a datablock can be reused across millions of queries without reprocessing the original context, delivering up to 26× faster inference and 85% cost savings.

Training Process

The training process uses self-supervised learning to compress your documents into a fixed-size KV cache. This typically takes 5-15 minutes for 100K tokens, but the one-time cost is amortized across all future uses of the datablock.
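The service performs this step internally, so you never write the training loop yourself. The idea can still be illustrated with a deliberately tiny, self-contained PyTorch sketch: a frozen toy model, a small set of trainable prefix vectors standing in for the compressed KV cache, and a self-supervised objective of reconstructing the document while conditioned on that prefix. None of this reflects the service's actual architecture or hyperparameters.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, hidden, num_learned = 100, 64, 8
doc = torch.randint(0, vocab, (256,))  # stand-in "document" token ids

# Frozen toy "base model": an embedding plus an output head.
embed = torch.nn.Embedding(vocab, hidden)
head = torch.nn.Linear(hidden, vocab)
for p in list(embed.parameters()) + list(head.parameters()):
    p.requires_grad_(False)

# The "datablock": a small set of trainable prefix vectors standing in
# for the full document's KV cache.
prefix = torch.nn.Parameter(0.02 * torch.randn(num_learned, hidden))
optimizer = torch.optim.Adam([prefix], lr=1e-3)

for step in range(1000):
    x = embed(doc[:-1])                         # (255, hidden)
    attn = torch.softmax(x @ prefix.T, dim=-1)  # attend over the learned prefix
    ctx = attn @ prefix                         # (255, hidden)
    logits = head(x + ctx)                      # (255, vocab)
    loss = F.cross_entropy(logits, doc[1:])     # reconstruct the document
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()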

How to Train a Datablock

Use the datablocks training API endpoint to create a new datablock from your documents:

import requests

API_KEY = "your-api-key-here"         # replace with your API key
BASE_URL = "https://api.example.com"  # replace with the actual API host

# Training configuration
training_config = {
    "model": "qwen",  # or "llama"
    "documents": [
        {
            "id": "doc1",
            "text": "Your document content here..."
        }
    ],
    "datablock_name": "my-knowledge-base",
    "parameters": {
        "num_learned_tokens": 512,  # KV cache size (512, 1024, or 2048)
        "num_steps": 1000,          # training iterations
        "learning_rate": 1e-3
    }
}

# Start training
response = requests.post(
    f"{BASE_URL}/api/v1/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=training_config
)
response.raise_for_status()

datablock_id = response.json()["datablock_id"]
print(f"Training started: {datablock_id}")

Step 1: Prepare Your Documents

Format your documents as plain text strings. For best results, use documents between 10K and 100K tokens. You can provide multiple documents; they will be concatenated during training.
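For example, you might assemble the documents list from a folder of text files, with a rough size check. The docs/ path and the 4-characters-per-token rule below are illustrative assumptions, not exact values:

from pathlib import Path

documents = []
for i, path in enumerate(sorted(Path("docs").glob("*.txt"))):  # hypothetical folder
    text = path.read_text(encoding="utf-8")
    approx_tokens = len(text) // 4  # rough heuristic: ~4 characters per token
    if not 10_000 <= approx_tokens <= 100_000:
        print(f"Warning: {path.name} is ~{approx_tokens} tokens; 10K-100K works best")
    documents.append({"id": f"doc{i + 1}", "text": text})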

Step 2: Choose Training Parameters

Select the number of learned tokens (the KV cache size). This determines the compression ratio (a sizing heuristic is sketched after this list):

  • 512 tokens: Highest compression, fastest inference, suitable for focused documents
  • 1024 tokens: Balanced compression and quality, good for most use cases
  • 2048 tokens: Lower compression, highest quality, for complex documents
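If you want to choose a size programmatically, one simple heuristic (our suggestion, not an official rule) is to scale the cache size with document length:

def pick_num_learned_tokens(doc_tokens: int) -> int:
    # Heuristic only: longer documents get a larger (less compressed) cache.
    if doc_tokens <= 20_000:
        return 512
    if doc_tokens <= 60_000:
        return 1024
    return 2048

# e.g. a 50K-token document -> 1024 learned tokens, a ~49x compression ratio
print(pick_num_learned_tokens(50_000))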

Step 3: Monitor Training Progress

Track your training job's status using the status endpoint (see the code example under Monitoring Training below). Training typically completes in 5-15 minutes.

Training Parameters Explained

Parameter            Default   Description
num_learned_tokens   1024      Size of the compressed KV cache
num_steps            1000      Number of training iterations
learning_rate        1e-3      Optimizer learning rate
batch_size           4         Training batch size
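All four parameters can be set in the parameters object of the training request; batch_size was omitted from the earlier example. Presumably any field you leave out falls back to the default above (check the API reference to confirm):

training_config["parameters"] = {
    "num_learned_tokens": 1024,  # default size of the compressed KV cache
    "num_steps": 1000,           # default number of training iterations
    "learning_rate": 1e-3,       # default optimizer learning rate
    "batch_size": 4,             # default training batch size
}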

Monitoring Training

Check the status of your training job using the status endpoint:

# Check training status
response = requests.get(
    f"{BASE_URL}/api/v1/datablocks/{datablock_id}/status",
    headers={"Authorization": f"Bearer {API_KEY}"}
)
response.raise_for_status()

status = response.json()
print(f"Status: {status['status']}")  # training, completed, or failed
print(f"Progress: {status['progress']}%")

Training Best Practices

Document Quality

  • Clean, well-formatted text without excessive markup
  • Remove irrelevant boilerplate and navigation elements
  • Ensure consistent encoding (UTF-8 recommended)
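A minimal cleaning pass along these lines might look like the sketch below (the regexes and the input filename are illustrative; tailor them to your source format):

import re

def clean_text(raw: str) -> str:
    text = re.sub(r"<[^>]+>", " ", raw)     # strip HTML-style markup
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse excessive blank lines
    return text.strip()

with open("raw_page.html", encoding="utf-8") as f:  # hypothetical input file
    cleaned = clean_text(f.read())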

Optimization Tips

  • Start with default parameters before fine-tuning
  • Use 1024 tokens for most general-purpose use cases
  • Monitor loss curves to ensure convergence
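For the first tip, starting with defaults simply means omitting the parameters block from the training request (assuming, per the table above, that the server then applies the defaults):

training_config = {
    "model": "qwen",
    "documents": documents,
    "datablock_name": "my-knowledge-base",
    # No "parameters" block: defaults apply (num_learned_tokens=1024,
    # num_steps=1000, learning_rate=1e-3, batch_size=4).
}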

Training Costs

Training is a one-time cost that pays for itself after roughly 500-1000 inference queries.

  • Training cost: approximately $2-5 per datablock for 100K tokens
  • Break-even: after ~500-1000 queries, you start seeing net cost savings
  • Long-term savings: up to 85% reduction in inference costs
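As a rough back-of-the-envelope check, consistent with the numbers above (the per-query saving is an assumed illustrative figure, not a published price):

training_cost = 3.50      # midpoint of the $2-5 range for a 100K-token datablock
saving_per_query = 0.005  # assumed illustrative saving per query
break_even = training_cost / saving_per_query
print(f"Break-even after ~{break_even:.0f} queries")  # ~700, within the 500-1000 range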

Next Steps