Examples & Use Cases

Real-world applications of datablocks across coding-agent, legal, medical, and financial workflows.

Coding Agents

Modern coding agents must understand massive codebases that often exceed 100K tokens. Datablocks enable these agents to maintain full repository context without sacrificing speed or incurring prohibitive costs.

GLM 4.6 - Codebase Understanding

GLM 4.6 excels at understanding large codebases. With datablocks, you can train on entire repositories and query specific implementation patterns, dependencies, or architectural decisions.

# Assumes `import requests` and that API_URL and API_KEY are already set
# Train on entire codebase
train_response = requests.post(
  f"{API_URL}/datablocks/train",
  headers={"Authorization": f"Bearer {API_KEY}"},
  json={
    "model": "glm-4.6",
    "documents": [{"id": "repo", "text": entire_codebase}],
    "datablock_name": "my-repo-context"
  }
)
datablock_id = train_response.json()["datablock_id"]

# Query architectural patterns
inference = requests.post(
  f"{API_URL}/chat/completions",
  headers={"Authorization": f"Bearer {API_KEY}"},
  json={
    "model": "glm-4.6",
    "messages": [{
      "role": "user",
      "content": "Explain the authentication flow"
    }],
    "datablocks": [{"id": datablock_id, "source": "wandb"}]
  }
)

Qwen3 Coder - Multi-File Refactoring

Qwen3 Coder specializes in code generation and refactoring. Train datablocks on your codebase to enable context-aware refactoring suggestions that understand cross-file dependencies.

# Train on project files
train_response = requests.post(
  f"{API_URL}/datablocks/train",
  json={
    "model": "qwen",
    "documents": [{"id": "src", "text": all_source_files}],
    "datablock_name": "refactor-context"
  }
)
datablock_id = train_response.json()["datablock_id"]

# Request refactoring
response = requests.post(
  f"{API_URL}/chat/completions",
  json={
    "model": "qwen",
    "messages": [{
      "role": "user",
      "content": "Refactor the API layer to use async/await"
    }],
    "datablocks": [{"id": datablock_id, "source": "wandb"}]
  }
)

GPT OSS - Open Source Analysis

Analyze open source projects and understand implementation patterns. Train datablocks on popular OSS repositories to answer questions about best practices and design decisions.

# Train on OSS repository
train_response = requests.post(
  f"{API_URL}/datablocks/train",
  json={
    "model": "gpt-oss",
    "documents": [{"id": "oss", "text": oss_repo_content}],
    "datablock_name": "react-internals"
  }
)
datablock_id = train_response.json()["datablock_id"]

# Query implementation details
response = requests.post(
  f"{API_URL}/chat/completions",
  json={
    "model": "qwen",
    "messages": [{
      "role": "user",
      "content": "How does React handle reconciliation?"
    }],
    "datablocks": [{"id": datablock_id, "source": "wandb"}]
  }
)

DeepSeek 3.1 - Advanced Code Analysis

DeepSeek 3.1 provides deep code understanding and vulnerability detection. Use datablocks to maintain context across security audits and complex code reviews.

# Train on security-critical code
train_response = requests.post(
  f"{API_URL}/datablocks/train",
  json={
    "model": "deepseek-3.1",
    "documents": [{"id": "auth", "text": auth_module_code}],
    "datablock_name": "security-audit"
  }
)
datablock_id = train_response.json()["datablock_id"]

# Security analysis
response = requests.post(
  f"{API_URL}/chat/completions",
  json={
    "model": "qwen",
    "messages": [{
      "role": "user",
      "content": "Find potential security vulnerabilities"
    }],
    "datablocks": [{"id": datablock_id, "source": "wandb"}]
  }
)
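All four examples above follow the same two-step pattern: train a datablock, then reference it at inference. If you repeat the pattern often, it can be wrapped in small helpers. This is a hypothetical convenience sketch, not part of the API itself: the payload shapes and endpoint paths mirror the snippets above, and the `API_URL`/`API_KEY` values are placeholders.

```python
import requests

API_URL = "https://api.example.com/v1"  # placeholder; use your actual endpoint
API_KEY = "your-api-key"                # placeholder

def train_payload(model: str, doc_id: str, text: str, name: str) -> dict:
    """JSON body for POST /datablocks/train, shaped like the examples above."""
    return {
        "model": model,
        "documents": [{"id": doc_id, "text": text}],
        "datablock_name": name,
    }

def query_payload(model: str, question: str, datablock_ids: list) -> dict:
    """JSON body for POST /chat/completions with one or more datablocks attached."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "datablocks": [{"id": db, "source": "wandb"} for db in datablock_ids],
    }

def train_datablock(model: str, doc_id: str, text: str, name: str) -> str:
    """Train a datablock and return its id for later inference calls."""
    r = requests.post(
        f"{API_URL}/datablocks/train",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=train_payload(model, doc_id, text, name),
    )
    r.raise_for_status()
    return r.json()["datablock_id"]
```

`train_datablock` returns the id that the inference calls above pass in their `datablocks` list.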

Legal

Legal professionals work with documents that regularly exceed standard context windows. Datablocks enable comprehensive analysis of contracts, case law, and regulatory documents.

Contract Analysis (Longformer, 2020)

"Long documents such as legal contracts and government briefs cannot be processed by standard Transformers"

Train datablocks on legal contracts to enable rapid clause extraction, risk analysis, and compliance checking across thousands of pages of legal text.

# Train on legal contract
contract_text = load_contract("merger-agreement.pdf")  # 150K tokens

train_response = requests.post(
  f"{API_URL}/datablocks/train",
  json={
    "model": "qwen",
    "documents": [{"id": "contract", "text": contract_text}],
    "datablock_name": "merger-agreement-db",
    "parameters": {"num_learned_tokens": 2048}  # Higher quality for legal
  }
)
datablock_id = train_response.json()["datablock_id"]

# Query specific clauses
response = requests.post(
  f"{API_URL}/chat/completions",
  json={
    "model": "qwen",
    "messages": [{
      "role": "user",
      "content": "What are the termination clauses and penalties?"
    }],
    "datablocks": [{"id": datablock_id, "source": "wandb"}]
  }
)

# Cost savings: $0.50 vs $9.00 per query (94% reduction)

Multi-Document Case Analysis (LegalBench-Long, 2023)

"Many legal tasks require models to consider several long documents jointly, often exceeding 100k tokens"

Combine multiple datablocks to analyze complex cases involving discovery documents, depositions, and legal briefs.

# Train separate datablocks for each document
documents = ["complaint.pdf", "answer.pdf", "discovery.pdf"]
datablock_ids = []

for doc in documents:
  response = requests.post(
    f"{API_URL}/datablocks/train",
    json={
      "model": "qwen",
      "documents": [{"id": doc, "text": load_pdf(doc)}],
      "datablock_name": f"case-{doc}"
    }
  )
  datablock_ids.append(response.json()["datablock_id"])

# Multi-document analysis
response = requests.post(
  f"{API_URL}/chat/completions",
  json={
    "model": "qwen",
    "messages": [{
      "role": "user",
      "content": "Identify contradictions across all documents"
    }],
    "datablocks": [
      {"id": db_id, "source": "wandb"} for db_id in datablock_ids
    ]
  }
)

Healthcare & Medicine

Medical workflows involve extensive patient histories and clinical documentation. Datablocks enable longitudinal patient analysis while maintaining HIPAA-compliant processing.

Longitudinal Patient History (Med-PaLM, 2023)

"Clinical encounters accumulate into very long patient histories that exceed typical LLM context limits"

Train datablocks on complete patient histories to enable accurate diagnosis support and treatment planning based on years of medical records.

# Train on patient history (10 years, 80K tokens)
patient_history = aggregate_patient_records(patient_id)

train_response = requests.post(
  f"{API_URL}/datablocks/train",
  json={
    "model": "qwen",
    "documents": [{"id": patient_id, "text": patient_history}],
    "datablock_name": f"patient-{patient_id}",
    "parameters": {"num_learned_tokens": 2048}
  }
)
datablock_id = train_response.json()["datablock_id"]

# Clinical decision support
response = requests.post(
  f"{API_URL}/chat/completions",
  json={
    "model": "qwen",
    "messages": [{
      "role": "user",
      "content": "Summarize relevant history for current symptoms"
    }],
    "datablocks": [{"id": datablock_id, "source": "wandb"}]
  }
)

# Privacy: Datablocks processed in secure environment

Multi-Visit EHR Analysis (ClinicalNoteQA, 2023)

"Real-world EHRs contain hundreds of notes per patient, requiring models capable of long-context reasoning"

Process extensive EHR documentation including progress notes, lab results, imaging reports, and specialist consultations to support clinical workflows.

# Train on comprehensive EHR
ehr_notes = load_ehr_notes(patient_id, visits=50)  # 120K tokens

train_response = requests.post(
  f"{API_URL}/datablocks/train",
  json={
    "model": "qwen",
    "documents": [{"id": "ehr", "text": ehr_notes}],
    "datablock_name": "ehr-comprehensive"
  }
)
datablock_id = train_response.json()["datablock_id"]

# Clinical question answering
response = requests.post(
  f"{API_URL}/chat/completions",
  json={
    "model": "qwen",
    "messages": [{
      "role": "user",
      "content": "When did the patient first report chest pain?"
    }],
    "datablocks": [{"id": datablock_id, "source": "wandb"}]
  }
)

# Performance: 26× faster than reprocessing all notes

Finance & Regulatory

Financial analysts need to process extensive regulatory filings and multi-quarter reports. Datablocks enable efficient analysis of hundreds of pages of financial documentation.

Financial Filings Analysis (GovReport, 2021)

"Summaries must be generated from extremely long government and financial documents, often hundreds of pages"

Train datablocks on 10-K filings, earnings transcripts, and regulatory documents to enable rapid analysis of financial health, risk factors, and competitive positioning.

# Train on annual 10-K filing (200+ pages, 150K tokens)
filing_10k = load_sec_filing("TSLA", year=2023, form="10-K")

train_response = requests.post(
  f"{API_URL}/datablocks/train",
  json={
    "model": "qwen",
    "documents": [{"id": "10k", "text": filing_10k}],
    "datablock_name": "tsla-10k-2023",
    "parameters": {"num_learned_tokens": 2048}
  }
)
datablock_id = train_response.json()["datablock_id"]

# Financial analysis queries
queries = [
  "What are the primary risk factors?",
  "Summarize revenue by segment",
  "What are the debt covenants?"
]

for query in queries:
  response = requests.post(
    f"{API_URL}/chat/completions",
    json={
      "model": "qwen",
      "messages": [{"role": "user", "content": query}],
      "datablocks": [{"id": datablock_id, "source": "wandb"}]
    }
  )
  print(f"Q: {query}\nA: {response.json()['choices'][0]['message']['content']}\n")

# Cost comparison:
# Traditional: 150K tokens × $0.60/1M × 100 queries = $9.00
# Datablocks: $3.00 training + $0.10 inference = $3.10 total (66% savings)
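The arithmetic in this comparison is easy to reproduce. A quick sketch using the same illustrative prices and token counts as the example above (these are example figures, not published rates):

```python
tokens = 150_000       # 10-K filing size, from the example above
price_per_m = 0.60     # $ per 1M input tokens when resending the filing each time
queries = 100

traditional = tokens / 1_000_000 * price_per_m * queries  # filing reprocessed per query
datablocks = 3.00 + 0.10                                  # one-time training + inference

savings = 1 - datablocks / traditional
print(f"traditional=${traditional:.2f}  datablocks=${datablocks:.2f}  savings={savings:.0%}")
# → traditional=$9.00  datablocks=$3.10  savings=66%
```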

Performance Across Use Cases

| Use Case                | Document Size | Speedup | Cost Savings |
|-------------------------|---------------|---------|--------------|
| Coding Agent (GLM 4.6)  | 100K tokens   | 26×     | 85%          |
| Legal Contract          | 150K tokens   | 26×     | 94%          |
| Patient History         | 80K tokens    | 26×     | 82%          |
| EHR Multi-Visit         | 120K tokens   | 26×     | 88%          |
| Financial 10-K          | 150K tokens   | 26×     | 66%          |
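One way to read these numbers: training is a one-time fixed cost, so datablocks pay off once a document is queried more than a handful of times. A rough break-even sketch using the finance figures above (all values illustrative, with per-query costs derived from the 150K-token, 100-query example):

```python
import math

training_cost = 3.00          # one-time datablock training
per_query_traditional = 0.09  # 150K tokens × $0.60 per 1M tokens, resent every query
per_query_datablock = 0.001   # $0.10 of inference spread over 100 queries

# Number of queries after which the one-time training cost is recovered
break_even = training_cost / (per_query_traditional - per_query_datablock)
print(f"datablocks become cheaper after ~{math.ceil(break_even)} queries")
# → datablocks become cheaper after ~34 queries
```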

Ready to Get Started?