Quick Start

Get started with datablocks in under 5 minutes. Train a datablock and start making efficient inference requests.

1. Get Your API Key

Sign up for a datablocks account and generate an API key from your dashboard. New accounts start with a free trial that includes 1M free tokens.
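
A common convention (not required by the API) is to keep the key in an environment variable instead of in source code; the variable name below is just an example:

import os

# Read the key from the environment, e.g. after running:
#   export DATABLOCKS_API_KEY="your-api-key-here"
YOUR_API_KEY = os.environ["DATABLOCKS_API_KEY"]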

2. Install the Client Library

Install the datablocks Python client. The examples below call the HTTP API directly through the requests library, so install that as well:

pip install datablocks-client requests

3. Train Your First Datablock

Train a datablock on your document to create a reusable KV cache:

import requests

# Replace these placeholders with your own values
YOUR_API_KEY = "your-api-key-here"  # or load it from the environment as shown above
BASE_URL = "https://YOUR_DATABLOCKS_HOST/api/v1"  # the API host for your account

# Your document content
document = """
[Your long document content here - can be up to 100K tokens]
This could be research papers, technical documentation, legal contracts,
medical records, or any other long-form content you need to query repeatedly.
"""

# Train the datablock
response = requests.post(
  f"{BASE_URL}/datablocks/train",
  headers={"Authorization": f"Bearer {YOUR_API_KEY}"},
  json={
    "model": "qwen",
    "documents": [{"id": "doc1", "text": document}],
    "datablock_name": "my-first-datablock",
    "parameters": {
      "num_learned_tokens": 1024  # KV cache size
    }
  }
)
response.raise_for_status()  # fail fast on auth or request errors

datablock_id = response.json()["datablock_id"]
print(f"Training started! Datablock ID: {datablock_id}")

Training typically completes in 5-15 minutes. You'll receive a datablock ID to use for inference.
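
While you wait, you can check progress through the status endpoint (the same call the complete example below uses for polling). This assumes the YOUR_API_KEY, BASE_URL, and datablock_id values from the previous step:

import requests

status_response = requests.get(
  f"{BASE_URL}/datablocks/{datablock_id}/status",
  headers={"Authorization": f"Bearer {YOUR_API_KEY}"}
)
print(status_response.json()["status"])  # "completed" when ready, "failed" on error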

4. Run Inference with Your Datablock

Once training completes, use your datablock for fast, cost-effective inference:

import requests

# BASE_URL, YOUR_API_KEY, and datablock_id are defined in the previous steps
response = requests.post(
  f"{BASE_URL}/chat/completions",
  headers={
    "Authorization": f"Bearer {YOUR_API_KEY}",
    "Content-Type": "application/json"
  },
  json={
    "model": "qwen",
    "messages": [
      {
        "role": "user",
        "content": "What are the main findings in this document?"
      }
    ],
    "datablocks": [
      {
        "id": datablock_id,
        "source": "wandb"
      }
    ]
  }
)
response.raise_for_status()

answer = response.json()["choices"][0]["message"]["content"]
print(answer)

What You Just Achieved

26× faster inference - No need to reprocess the document on every query
Up to 85% cost savings - Pay for datablock loading instead of input tokens
Reusable across millions of queries - Train once, use forever (see the sketch below)
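
That reuse is just the inference call from step 4 with different prompts. A minimal sketch, assuming the BASE_URL, YOUR_API_KEY, and datablock_id from the steps above (the example questions are arbitrary):

import requests

questions = [
  "What are the main findings in this document?",
  "What limitations or caveats does the document mention?",
  "Summarize the document in three sentences.",
]

for question in questions:
  response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {YOUR_API_KEY}"},
    json={
      "model": "qwen",
      "messages": [{"role": "user", "content": question}],
      # The same trained datablock is attached to every request
      "datablocks": [{"id": datablock_id, "source": "wandb"}]
    }
  )
  print(response.json()["choices"][0]["message"]["content"])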

Complete Example

Here's a complete end-to-end example combining training and inference:

import requests
import time

API_KEY = "your-api-key-here"
BASE_URL = "https://YOUR_DATABLOCKS_HOST/api/v1"  # replace with your API host

# Step 1: Train datablock
train_response = requests.post(
  f"{BASE_URL}/datablocks/train",
  headers={"Authorization": f"Bearer {API_KEY}"},
  json={
    "model": "qwen",
    "documents": [{"id": "doc1", "text": "Your long document..."}],
    "datablock_name": "example-datablock"
  }
)
train_response.raise_for_status()
datablock_id = train_response.json()["datablock_id"]

# Step 2: Wait for training to finish (poll status every 30 seconds)
while True:
  status_response = requests.get(
    f"{BASE_URL}/datablocks/{datablock_id}/status",
    headers={"Authorization": f"Bearer {API_KEY}"}
  )
  status = status_response.json()["status"]
  if status == "completed":
    break
  elif status == "failed":
    raise RuntimeError("Datablock training failed")
  time.sleep(30)

# Step 3: Run inference against the trained datablock
inference_response = requests.post(
  f"{BASE_URL}/chat/completions",
  headers={"Authorization": f"Bearer {API_KEY}"},
  json={
    "model": "qwen",
    "messages": [{"role": "user", "content": "Summarize the key points"}],
    "datablocks": [{"id": datablock_id, "source": "wandb"}]
  }
)

print(inference_response.json()["choices"][0]["message"]["content"])