# Training Datablocks
Learn how to train custom datablocks on your documents for efficient long-context inference.
## Overview
Training a datablock creates a compressed KV cache representation of your documents. Once trained, a datablock can be reused across millions of queries without reprocessing the original context, delivering up to 26× faster inference and 85% cost savings.
## Training Process
The training process uses self-supervised learning to compress your documents into a fixed-size KV cache. This typically takes 5-15 minutes for 100K tokens, but the one-time cost is amortized across all future uses of the datablock.
## How to Train a Datablock
Use the datablocks training API endpoint to create a new datablock from your documents:
```python
import requests

# Placeholders: replace with your deployment's base URL and API key
BASE_URL = "https://your-deployment.example.com"
API_KEY = "your-api-key"

# Training configuration
training_config = {
    "model": "qwen",  # or "llama"
    "documents": [
        {
            "id": "doc1",
            "text": "Your document content here..."
        }
    ],
    "datablock_name": "my-knowledge-base",
    "parameters": {
        "num_learned_tokens": 512,  # KV cache size (512, 1024, or 2048)
        "num_steps": 1000,          # Training iterations
        "learning_rate": 1e-3
    }
}

# Start training
response = requests.post(
    f"{BASE_URL}/api/v1/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=training_config
)

datablock_id = response.json()["datablock_id"]
print(f"Training started: {datablock_id}")
```

### Step 1: Prepare Your Documents
Format your documents as plain text strings. For best results, use documents between 10K and 100K tokens. You can provide multiple documents; they will be concatenated during training.
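For example, you could assemble the `documents` list for the training payload from local text files (a minimal sketch; the `docs/` directory and file layout are illustrative, not part of the API):

```python
from pathlib import Path

# Illustrative: build the "documents" payload from local .txt files
documents = []
for path in sorted(Path("docs").glob("*.txt")):
    text = path.read_text(encoding="utf-8")  # UTF-8 recommended
    documents.append({"id": path.stem, "text": text})
```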
### Step 2: Choose Training Parameters
Select the number of learned tokens (KV cache size). This determines the compression ratio; a short selection sketch follows the list:
- 512 tokens: Highest compression, fastest inference, suitable for focused documents
- 1024 tokens: Balanced compression and quality, good for most use cases
- 2048 tokens: Lower compression, highest quality, for complex documents
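If you want to set this in code, one way is to map the qualitative guidance above onto the three sizes (a sketch; the category names and the mapping are assumptions, and `training_config` is the dictionary from the training example):

```python
# Illustrative mapping of the guidance above onto the three supported sizes
KV_CACHE_SIZES = {
    "focused": 512,   # highest compression, fastest inference
    "general": 1024,  # balanced compression and quality
    "complex": 2048,  # lower compression, highest quality
}

training_config["parameters"]["num_learned_tokens"] = KV_CACHE_SIZES["general"]
```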
### Step 3: Monitor Training Progress
Track your training job status using the status endpoint. Training typically completes in 5-15 minutes.
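For example, a simple polling loop (a sketch that reuses the `BASE_URL`, `API_KEY`, and `datablock_id` variables from the training example; the 30-second interval is arbitrary):

```python
import time

import requests

# Poll the status endpoint until training finishes or fails
while True:
    response = requests.get(
        f"{BASE_URL}/api/v1/datablocks/{datablock_id}/status",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    status = response.json()
    print(f"{status['status']}: {status['progress']}%")
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(30)
```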
## Training Parameters Explained
| Parameter | Default | Description |
|---|---|---|
| num_learned_tokens | 1024 | Size of the compressed KV cache |
| num_steps | 1000 | Number of training iterations |
| learning_rate | 1e-3 | Optimizer learning rate |
| batch_size | 4 | Training batch size |
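For reference, here is a `parameters` block with all four fields from the table above set explicitly to their documented defaults; it can be passed as `training_config["parameters"]`:

```python
# The four training parameters at their documented defaults
parameters = {
    "num_learned_tokens": 1024,  # size of the compressed KV cache, in tokens
    "num_steps": 1000,           # number of training iterations
    "learning_rate": 1e-3,       # optimizer learning rate
    "batch_size": 4,             # training batch size
}
```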
## Monitoring Training
Check the status of your training job using the status endpoint:
```python
# Check training status
response = requests.get(
    f"{BASE_URL}/api/v1/datablocks/{datablock_id}/status",
    headers={"Authorization": f"Bearer {API_KEY}"}
)

status = response.json()
print(f"Status: {status['status']}")       # training, completed, or failed
print(f"Progress: {status['progress']}%")
```

## Training Best Practices
### Document Quality
- Clean, well-formatted text without excessive markup
- Remove irrelevant boilerplate and navigation elements
- Ensure consistent encoding (UTF-8 recommended); see the cleanup sketch below
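A minimal cleanup sketch along these lines (the tag-stripping regex and normalization choices are illustrative, not a prescribed preprocessing pipeline):

```python
import re
import unicodedata

def clean_document(raw: str) -> str:
    """Illustrative cleanup: strip markup, normalize Unicode and whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)        # drop HTML-style tags
    text = unicodedata.normalize("NFC", text)  # consistent Unicode form
    text = re.sub(r"[ \t]+", " ", text)        # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)     # collapse long runs of blank lines
    return text.strip()
```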
### Optimization Tips
- Start with default parameters before fine-tuning
- Use 1024 tokens for most general-purpose use cases
- Monitor loss curves to ensure convergence (see the sketch below)
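The status payload shown above only includes `status` and `progress`; if your deployment also exposes per-step loss, a quick plot is enough to check convergence (the `loss_history` field name below is hypothetical):

```python
import matplotlib.pyplot as plt

# Hypothetical: "loss_history" is not a documented status field; adapt to
# however your deployment exposes per-step training loss.
losses = status.get("loss_history", [])
if losses:
    plt.plot(losses)
    plt.xlabel("Training step")
    plt.ylabel("Loss")
    plt.title("Datablock training loss")
    plt.show()
```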
## Training Costs
Training is a one-time cost that pays for itself after a few hundred to a thousand inference queries.
- Training cost: approximately $2-5 per datablock for 100K tokens
- Break-even: after ~500-1000 queries, you start seeing cost savings (see the estimate below)
- Long-term savings: up to 85% reduction in inference costs
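As a back-of-the-envelope check on the break-even figure (every per-query number below is an illustrative assumption, not published pricing):

```python
# Illustrative break-even estimate; only the ~$2-5 training cost and the
# "up to 85%" savings figure come from the numbers above.
training_cost = 3.50             # one-time datablock training cost ($)
cost_without_datablock = 0.006   # assumed per-query cost on the raw context ($)
savings_rate = 0.85              # up to 85% cheaper with a datablock
saving_per_query = cost_without_datablock * savings_rate

break_even = training_cost / saving_per_query
print(f"Break-even after ~{break_even:.0f} queries")  # ~686 with these assumptions
```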