Quick Start
Get started with datablocks in under 5 minutes. Train a datablock and start making efficient inference requests.
1. Get Your API Key
Sign up for a datablocks account and generate an API key from your dashboard.
Signing up starts a free trial that includes 1M free tokens.
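Rather than pasting the key into every snippet, one option is to read it from the environment. The sketch below is illustrative only: the variable name `DATABLOCKS_API_KEY` and the helper `auth_headers` are assumptions, not part of the datablocks client. Only the `Authorization: Bearer` header format comes from the examples in this guide.

```python
import os


def auth_headers(env_var: str = "DATABLOCKS_API_KEY") -> dict:
    """Build the Authorization header from an environment variable.

    The variable name is hypothetical; use whatever name suits your setup.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be passed as `headers=auth_headers()` to the `requests` calls in the steps that follow.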
2. Install the Client Library
Install the datablocks Python client:
pip install datablocks-client
3. Upload Your Data Files
First, upload your flat file data (CSV, JSON, TXT, PDF, etc.) to datablocks storage. Files are stored securely per-user:
import requests

# Upload your data file
with open("my_document.txt", "rb") as f:
    response = requests.post(
        "https://trydatablocks.com/api/v1/files",
        headers={"Authorization": "Bearer "},  # append your API key after "Bearer "
        files={"file": f},
        data={"purpose": "cartridge-training"}
    )

file_id = response.json()["id"]
print(f"File uploaded! File ID: {file_id}")
# Output: File uploaded! File ID: file-abc123

Your files are stored in per-user storage and can only be accessed with your API key. Supported formats: .txt, .csv, .json, .jsonl, .pdf, .md (max 512MB per file).
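A local check before uploading can catch an unsupported extension or an oversized file without a round trip to the API. This is a minimal sketch, not part of the datablocks client: the helper name `check_upload` is hypothetical, and it assumes "512MB" means 512 × 1024 × 1024 bytes. The server's own validation remains authoritative.

```python
from pathlib import Path

# Extensions listed in this guide as supported formats.
SUPPORTED_EXTENSIONS = {".txt", ".csv", ".json", ".jsonl", ".pdf", ".md"}
# Assumption: the documented "512MB" limit is interpreted as 512 MiB.
MAX_FILE_BYTES = 512 * 1024 * 1024


def check_upload(path: str) -> Path:
    """Raise ValueError if the file looks unsuitable for upload; else return its Path."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {p.suffix or '(none)'}")
    size = p.stat().st_size  # raises FileNotFoundError if the file is missing
    if size > MAX_FILE_BYTES:
        raise ValueError(f"File is {size} bytes; the limit is {MAX_FILE_BYTES}")
    return p
```

Calling `check_upload("my_document.txt")` before opening the file keeps the upload snippet unchanged while failing early on obvious problems.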
4. Train Your First Datablock
Reference your uploaded file(s) to train a datablock and create a reusable KV cache:
import requests

# Train the datablock using your uploaded file
response = requests.post(
    "https://trydatablocks.com/api/v1/datablocks/train",
    headers={"Authorization": "Bearer "},
    json={
        "model": "qwen",
        "training_file": file_id,  # Reference the file you uploaded
        "datablock_name": "my-first-datablock",
        "parameters": {
            "num_learned_tokens": 1024  # KV cache size
        }
    }
)

datablock_id = response.json()["datablock_id"]
print(f"Training started! Datablock ID: {datablock_id}")

Training typically completes in 5-15 minutes. You'll receive a datablock ID to use for inference. For multiple files, use "training_files": [file_id_1, file_id_2] instead.
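The single-file and multi-file variants differ only in which key carries the file IDs. The helper below sketches one way to build the request body for either case. The function `build_train_payload` is hypothetical; the field names (`training_file`, `training_files`, `datablock_name`, `parameters.num_learned_tokens`) are taken from the examples in this guide.

```python
from typing import Optional, Sequence


def build_train_payload(
    file_ids: Sequence[str],
    datablock_name: str,
    model: str = "qwen",
    num_learned_tokens: Optional[int] = None,
) -> dict:
    """Return the JSON body for the train request, for one or several files."""
    if not file_ids:
        raise ValueError("At least one file ID is required")
    payload = {"model": model, "datablock_name": datablock_name}
    if len(file_ids) == 1:
        payload["training_file"] = file_ids[0]
    else:
        payload["training_files"] = list(file_ids)
    # Only send "parameters" when the caller overrides the KV cache size.
    if num_learned_tokens is not None:
        payload["parameters"] = {"num_learned_tokens": num_learned_tokens}
    return payload
```

The result can be passed as `json=build_train_payload([file_id], "my-first-datablock", num_learned_tokens=1024)` in the training request.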
5. Run Inference with Your Datablock
Once training completes, use your datablock for fast, cost-effective inference:
import requests

response = requests.post(
    "https://trydatablocks.com/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer ",
        "Content-Type": "application/json"
    },
    json={
        "model": "qwen",
        "messages": [
            {
                "role": "user",
                "content": "What are the main findings in this document?"
            }
        ],
        "datablocks": [
            {
                "id": datablock_id,
                "source": "wandb"
            }
        ]
    }
)

answer = response.json()["choices"][0]["message"]["content"]
print(answer)

What You Just Achieved
Next Steps
Training Guide
Learn advanced training parameters and best practices for optimal compression.
Inference Guide
Explore advanced usage patterns like batch processing and multi-datablock queries.
Examples & Use Cases
See real-world applications across legal, medical, financial, and coding domains.
Try the Playground
Test the API interactively with our web-based playground tool.
Complete Example
Here's a complete end-to-end example with file upload, training, and inference:
import requests
import time

API_KEY = ""
BASE_URL = "https://trydatablocks.com/api/v1"

# Step 1: Upload your data file
with open("my_document.txt", "rb") as f:
    file_response = requests.post(
        f"{BASE_URL}/files",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"purpose": "cartridge-training"}
    )

file_id = file_response.json()["id"]
print(f"File uploaded: {file_id}")

# Step 2: Train datablock using the uploaded file
train_response = requests.post(
    f"{BASE_URL}/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "training_file": file_id,
        "datablock_name": "example-datablock"
    }
)
datablock_id = train_response.json()["datablock_id"]

# Step 3: Wait for training (poll status)
while True:
    status_response = requests.get(
        f"{BASE_URL}/datablocks/{datablock_id}/status",
        headers={"Authorization": f"Bearer {API_KEY}"}
    )
    status = status_response.json()["status"]
    if status == "completed":
        break
    elif status == "failed":
        raise Exception("Training failed")
    time.sleep(30)  # Check every 30 seconds

# Step 4: Run inference
inference_response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "messages": [{"role": "user", "content": "Summarize the key points"}],
        "datablocks": [{"id": datablock_id, "source": "wandb"}]
    }
)
print(inference_response.json()["choices"][0]["message"]["content"])
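The polling loop in Step 3 runs until the status becomes "completed" or "failed", so a job that never reaches either state would block forever. The sketch below adds a timeout. It is an illustration under stated assumptions, not the official client behavior: the helper `wait_for_training` and its parameters are hypothetical, and the 30-minute default is simply a generous margin over the 5-15 minute estimate given earlier. The status lookup is passed in as a callable so the loop logic stays independent of the HTTP call.

```python
import time
from typing import Callable


def wait_for_training(
    fetch_status: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll fetch_status() until it returns "completed"; raise on failure or timeout."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError("Training failed")
        if clock() >= deadline:
            raise TimeoutError(f"Training not finished after {timeout_s} seconds")
        sleep(poll_interval_s)
```

With the names from the complete example, `fetch_status` could be a small function that performs the `requests.get` call against `f"{BASE_URL}/datablocks/{datablock_id}/status"` and returns `response.json()["status"]`.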