Training Datablocks

Learn how to train custom datablocks on your documents for efficient long-context inference.

Overview

Training a datablock creates a compressed KV cache representation of your documents. Once trained, a datablock can be reused across millions of queries without reprocessing the original context, delivering up to 26× faster inference and 85% cost savings.

Training Process

The training process uses self-supervised learning to compress your documents into a fixed-size KV cache. This typically takes 5-15 minutes for 100K tokens, but the one-time cost is amortized across all future uses of the datablock.

How to Train a Datablock

Train a datablock using files you've uploaded to datablocks storage. This ensures your data is securely stored per-user and easily referenceable:

import requests

API_KEY = "YOUR_API_KEY"               # your datablocks API key
BASE_URL = "https://api.example.com"   # placeholder; replace with the API base URL

# Step 1: Upload your flat file data
with open("my_data.txt", "rb") as f:
    file_response = requests.post(
        f"{BASE_URL}/api/v1/files",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"purpose": "datablock-training"}
    )
file_id = file_response.json()["id"]

# Step 2: Train a datablock using the uploaded file
training_config = {
    "model": "qwen",                 # or "llama", "glm-4.6"
    "training_file": file_id,        # Reference your uploaded file
    "datablock_name": "my-knowledge-base",
    "parameters": {
        "num_learned_tokens": 1024,  # KV cache size (512, 1024, or 2048)
        "num_steps": 1000,           # Training iterations
        "learning_rate": 1e-3
    }
}

# Start training
response = requests.post(
    f"{BASE_URL}/api/v1/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=training_config
)

datablock_id = response.json()["datablock_id"]
print(f"Training started: {datablock_id}")

Step 1: Upload Your Data Files

Upload your flat file data (CSV, JSON, TXT, PDF, Markdown) to datablocks storage using the Files API. Files are stored securely in per-user storage and can only be accessed with your API key. For best results, use files with 10K-100K tokens of content. You can upload multiple files and reference them using the training_files array parameter.
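If you have several files, the sketch below uploads each one and passes the resulting IDs through the training_files array described above. The file names are placeholders, and the request shape assumes training_files simply replaces training_file in the training config.

# Upload several files and reference them together (sketch; file names are
# placeholders, and training_files is assumed to take a list of file IDs)
file_ids = []
for path in ["handbook.md", "faq.txt", "pricing.csv"]:
    with open(path, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/api/v1/files",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"purpose": "datablock-training"}
        )
    file_ids.append(resp.json()["id"])

training_config = {
    "model": "qwen",
    "training_files": file_ids,          # multiple uploaded files
    "datablock_name": "my-knowledge-base",
    "parameters": {"num_learned_tokens": 1024}
}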

Step 2: Choose Training Parameters

Select the number of learned tokens (KV cache size). This determines the compression ratio, as the sketch after the list illustrates:

  • 512 tokens: Highest compression, fastest inference, suitable for focused documents
  • 1024 tokens: Balanced compression and quality, good for most use cases
  • 2048 tokens: Lower compression, highest quality, for complex documents
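
To get a feel for the trade-off, the sketch below (illustrative only, not an API call) compares the rough compression ratio for a 100K-token document at each cache size.

# Rough compression-ratio comparison (illustrative only, not part of the API)
doc_tokens = 100_000                     # approximate token count of your documents
for num_learned_tokens in (512, 1024, 2048):
    ratio = doc_tokens / num_learned_tokens
    print(f"{num_learned_tokens} learned tokens ≈ {ratio:.0f}x compression")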

Step 3: Monitor Training Progress

Track your training job status using the status endpoint. Training typically completes in 5-15 minutes.

Training Parameters Explained

Parameter            Default   Description
num_learned_tokens   1024      Size of the compressed KV cache
num_steps            1000      Number of training iterations
learning_rate        1e-3      Optimizer learning rate
batch_size           4         Training batch size
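
Putting the table together, a fully specified configuration might look like the sketch below; it assumes batch_size is accepted alongside the other fields in the same parameters object.

# Fully specified parameters (sketch; assumes batch_size belongs in the same
# parameters object as the other fields listed in the table)
training_config["parameters"] = {
    "num_learned_tokens": 1024,   # size of the compressed KV cache
    "num_steps": 1000,            # number of training iterations
    "learning_rate": 1e-3,        # optimizer learning rate
    "batch_size": 4               # training batch size
}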

Monitoring Training

Check the status of your training job using the status endpoint:

# Check training status
response = requests.get(
    f"{BASE_URL}/api/v1/datablocks/{datablock_id}/status",
    headers={"Authorization": f"Bearer {API_KEY}"}
)

status = response.json()
print(f"Status: {status['status']}")  # training, completed, or failed
print(f"Progress: {status['progress']}%")

Training Best Practices

Document Quality

  • Clean, well-formatted text without excessive markup
  • Remove irrelevant boilerplate and navigation elements
  • Ensure consistent encoding (UTF-8 recommended)

Optimization Tips

  • Start with default parameters before fine-tuning
  • Use 1024 tokens for most general-purpose use cases
  • Monitor loss curves to ensure convergence

Training Costs

Training is a one-time cost that typically pays for itself within the first several hundred to thousand inference queries.

  • Training cost: approximately $2-5 per datablock for 100K tokens
  • Break-even: after ~500-1000 queries, you start seeing cost savings
  • Long-term savings: up to 85% reduction in inference costs
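
To see how the break-even point follows from these figures, here is a back-of-the-envelope calculation. The per-query dollar amounts are hypothetical placeholders, not published pricing; only the $2-5 training cost and the 85% savings figure come from this page.

# Back-of-the-envelope break-even estimate (per-query prices are hypothetical)
training_cost = 5.00                    # one-time cost, upper end of the $2-5 estimate
cost_full_context = 0.01                # assumed per-query cost with the raw 100K-token context
cost_with_datablock = cost_full_context * 0.15   # ~85% cheaper per query
queries_to_break_even = training_cost / (cost_full_context - cost_with_datablock)
print(f"Break-even after ~{queries_to_break_even:.0f} queries")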

Next Steps