Chat Completions API

Generate chat completions using datablocks for efficient long-context inference.

POST /api/v1/chat/completions

Overview

The Chat Completions API generates responses using language models augmented with datablocks. Datablocks are pre-computed KV caches that store large amounts of context (documents, code repositories, conversations) in compact form, enabling 26× faster inference while maintaining quality.

What are Datablocks?

Datablocks are lightweight KV cache representations of large text corpora, trained using a self-study approach. Instead of passing thousands of tokens of context on every request, you load a datablock once and reuse it for all subsequent queries.
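Putting this together, a first request might look like the sketch below. It builds the JSON body described on this page; the server URL and the datablock id are hypothetical placeholders, and the actual POST is shown only as a comment.

```python
import json

# Hypothetical request body for POST /api/v1/chat/completions.
# "acme/reports/run_42" is a placeholder datablock id, not a real one.
payload = {
    "messages": [
        {"role": "user", "content": "Summarize the key findings."}
    ],
    "datablocks": [
        {"id": "acme/reports/run_42"}  # loaded once, cached for later requests
    ],
    "max_tokens": 256,
    "temperature": 0.0,
}

body = json.dumps(payload)

# To send it (assuming a server at localhost:8000):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/api/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# response = urllib.request.urlopen(req)
```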

Request Body

| Parameter | Type | Required | Description |
|---|---|---|---|
| `messages` | array | Yes | Array of message objects with `role` and `content` |
| `datablocks` | array | Yes | Array of datablock objects to load (must specify at least one) |
| `model` | string | No | Model identifier (default: `"default"`) |
| `max_tokens` | integer | No | Maximum tokens to generate (default: `256`) |
| `temperature` | number | No | Sampling temperature, `0.0`–`2.0` (default: `0.0`) |
| `stream` | boolean | No | Stream responses (default: `false`) |

Datablock Object

| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | Yes | Datablock identifier (e.g., `"username/project/run_id"`) |
| `source` | string | No | Source location: `"wandb"`, `"huggingface"`, or `"local"` (default: `"wandb"`) |
| `force_redownload` | boolean | No | Force re-downloading the datablock (default: `false`) |
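As a concrete illustration, here is a datablock object with every optional field spelled out at its documented default (the id itself is a made-up example):

```python
# A datablock object with all fields set explicitly.
# The id follows the "username/project/run_id" pattern; this one is fictional.
datablock = {
    "id": "alice/legal-corpus/run_7",
    "source": "wandb",           # "wandb" (default), "huggingface", or "local"
    "force_redownload": False,   # default; set True to bypass the server cache
}
```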

Best Practices

Reuse Datablocks

Datablocks are cached on the server. Once loaded, subsequent requests using the same datablock ID will be significantly faster.
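One way to take advantage of this, sketched below assuming a simple JSON-over-HTTP client, is to fix the datablock list once and vary only the messages; only the first request pays the loading cost.

```python
import json

# Hypothetical datablock id; the server loads it once, then serves from cache.
DATABLOCKS = [{"id": "alice/legal-corpus/run_7"}]

def build_request(question: str, max_tokens: int = 256) -> str:
    """Build a chat-completions body that reuses the same cached datablock."""
    return json.dumps({
        "messages": [{"role": "user", "content": question}],
        "datablocks": DATABLOCKS,
        "max_tokens": max_tokens,
    })

# Every call references the same datablock id, so requests after the first
# skip the load step entirely.
first = build_request("What does clause 4 cover?")
second = build_request("List the termination conditions.")
```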

Choose Appropriate max_tokens

Set max_tokens based on your use case. Shorter responses are faster and more cost-effective.

Multiple Datablocks

You can load multiple datablocks in a single request to combine context from different sources.
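For example (both ids are placeholders), the `datablocks` array simply lists each source to combine:

```python
import json

# Combining context from two sources in one request.
# Both ids are hypothetical; "source" defaults to "wandb" when omitted.
payload = {
    "messages": [
        {"role": "user", "content": "Compare the two documents."}
    ],
    "datablocks": [
        {"id": "alice/legal-corpus/run_7"},
        {"id": "alice/financials/run_3", "source": "huggingface"},
    ],
}

body = json.dumps(payload)
```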

Related Documentation