# Examples & Use Cases
Real-world applications of datablocks across coding-agent, legal, healthcare, and financial workflows.
## Coding Agents
Modern coding agents must reason over codebases that often exceed 100K tokens. Datablocks let these agents maintain full repository context without sacrificing speed or incurring prohibitive per-query costs.
### GLM 4.6 - Codebase Understanding
GLM 4.6 excels at understanding large codebases. With datablocks, you can train on entire repositories and query specific implementation patterns, dependencies, or architectural decisions.
```python
import requests

# Assumes API_URL and API_KEY are configured for your deployment
# Train on the entire codebase
train_response = requests.post(
    f"{API_URL}/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "glm-4.6",
        "documents": [{"id": "repo", "text": entire_codebase}],  # repo source as one string
        "datablock_name": "my-repo-context"
    }
)
datablock_id = train_response.json()["datablock_id"]

# Query architectural patterns
inference = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "glm-4.6",
        "messages": [{
            "role": "user",
            "content": "Explain the authentication flow"
        }],
        "datablocks": [{"id": datablock_id, "source": "wandb"}]
    }
)
```
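If you train datablocks for more than one repository, a small wrapper keeps the calls uniform. This is a minimal convenience sketch, not part of the API: it assumes only the `/datablocks/train` endpoint and the `datablock_id` response field shown on this page.

```python
import requests

def train_datablock(model: str, doc_id: str, text: str, name: str) -> str:
    """Hypothetical convenience helper: train a datablock and return its id."""
    response = requests.post(
        f"{API_URL}/datablocks/train",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": model,
            "documents": [{"id": doc_id, "text": text}],
            "datablock_name": name,
        },
    )
    response.raise_for_status()  # fail fast on HTTP errors
    return response.json()["datablock_id"]

# e.g. datablock_id = train_datablock("glm-4.6", "repo", entire_codebase, "my-repo-context")
```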
### Qwen3 Coder - Multi-File Refactoring

Qwen3 Coder specializes in code generation and refactoring. Train a datablock on your codebase to get context-aware refactoring suggestions that account for cross-file dependencies. (One way to assemble `all_source_files` is sketched after this example.)
```python
# Train on project files
train_response = requests.post(
    f"{API_URL}/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "documents": [{"id": "src", "text": all_source_files}],
        "datablock_name": "refactor-context"
    }
)
datablock_id = train_response.json()["datablock_id"]

# Request refactoring
response = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "messages": [{
            "role": "user",
            "content": "Refactor the API layer to use async/await"
        }],
        "datablocks": [{"id": datablock_id, "source": "wandb"}]
    }
)
```
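The snippets above pass the whole codebase as a single string (`entire_codebase`, `all_source_files`). One way to assemble such a string, as a minimal sketch that assumes a local checkout of plain-text source files:

```python
from pathlib import Path

def collect_sources(root: str, suffixes: tuple = (".py", ".ts", ".go")) -> str:
    """Concatenate source files under `root`, prefixing each with its
    path so the model can reference file locations in its answers."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"# FILE: {path}\n{path.read_text(errors='ignore')}")
    return "\n\n".join(parts)

all_source_files = collect_sources("./src")
```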
### GPT OSS - Open Source Analysis

Analyze open-source projects and understand their implementation patterns. Train datablocks on popular OSS repositories to answer questions about best practices and design decisions.
```python
# Train on an OSS repository
train_response = requests.post(
    f"{API_URL}/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",  # substitute the model id for your GPT OSS deployment
        "documents": [{"id": "oss", "text": oss_repo_content}],
        "datablock_name": "react-internals"
    }
)
datablock_id = train_response.json()["datablock_id"]

# Query implementation details
response = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "messages": [{
            "role": "user",
            "content": "How does React handle reconciliation?"
        }],
        "datablocks": [{"id": datablock_id, "source": "wandb"}]
    }
)
```
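Completions come back in the `choices[0].message.content` shape used later on this page. A hypothetical `ask` helper, sketched under that assumption, makes repeated queries against a trained datablock one-liners:

```python
import requests

def ask(question: str, datablock_id: str, model: str = "qwen") -> str:
    """Hypothetical helper: query a datablock and return the answer text."""
    response = requests.post(
        f"{API_URL}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": question}],
            "datablocks": [{"id": datablock_id, "source": "wandb"}],
        },
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

print(ask("How does React handle reconciliation?", datablock_id))
```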
### DeepSeek 3.1 - Advanced Code Analysis

DeepSeek 3.1 provides deep code understanding and vulnerability detection. Use datablocks to maintain context across security audits and complex code reviews.
```python
# Train on security-critical code
train_response = requests.post(
    f"{API_URL}/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",  # substitute the model id for your DeepSeek 3.1 deployment
        "documents": [{"id": "auth", "text": auth_module_code}],
        "datablock_name": "security-audit"
    }
)
datablock_id = train_response.json()["datablock_id"]

# Security analysis
response = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "messages": [{
            "role": "user",
            "content": "Find potential security vulnerabilities"
        }],
        "datablocks": [{"id": datablock_id, "source": "wandb"}]
    }
)
```
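Training on large inputs can take a while, so production callers usually add explicit timeouts and error checks. A minimal sketch using only standard `requests` features; the 600-second timeout is an arbitrary assumption to tune for your documents:

```python
import requests

try:
    train_response = requests.post(
        f"{API_URL}/datablocks/train",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "qwen",
            "documents": [{"id": "auth", "text": auth_module_code}],
            "datablock_name": "security-audit",
        },
        timeout=600,  # assumed ceiling; large documents train slowly
    )
    train_response.raise_for_status()
except requests.RequestException as err:
    raise SystemExit(f"Datablock training failed: {err}")

datablock_id = train_response.json()["datablock_id"]
```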
## Legal & Compliance

Legal professionals work with documents that regularly exceed standard context windows. Datablocks enable comprehensive analysis of contracts, case law, and regulatory documents.
### Contract Analysis (Longformer, 2020)

> "Long documents such as legal contracts and government briefs cannot be processed by standard Transformers"
Train datablocks on legal contracts to enable rapid clause extraction, risk analysis, and compliance checking across thousands of pages of legal text.
```python
# Train on a legal contract
contract_text = load_contract("merger-agreement.pdf")  # ~150K tokens
train_response = requests.post(
    f"{API_URL}/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "documents": [{"id": "contract", "text": contract_text}],
        "datablock_name": "merger-agreement-db",
        "parameters": {"num_learned_tokens": 2048}  # higher quality for legal text
    }
)
datablock_id = train_response.json()["datablock_id"]

# Query specific clauses
response = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "messages": [{
            "role": "user",
            "content": "What are the termination clauses and penalties?"
        }],
        "datablocks": [{"id": datablock_id, "source": "wandb"}]
    }
)

# Cost savings: $0.50 vs $9.00 per query (94% reduction)
```
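`load_contract` above (and `load_pdf` in the next example) are placeholder helpers. A minimal sketch using the pypdf library, which is an assumption; any PDF-to-text tool will do:

```python
from pypdf import PdfReader  # assumption: pip install pypdf

def load_contract(path: str) -> str:
    """Extract plain text from a PDF, page by page."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

load_pdf = load_contract  # the multi-document example below reuses the same logic
```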
"Many legal tasks require models to consider several long documents jointly, often exceeding 100k tokens"
Combine multiple datablocks to analyze complex cases involving discovery documents, depositions, and legal briefs.
```python
# Train a separate datablock for each document
documents = ["complaint.pdf", "answer.pdf", "discovery.pdf"]
datablock_ids = []
for doc in documents:
    response = requests.post(
        f"{API_URL}/datablocks/train",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "qwen",
            "documents": [{"id": doc, "text": load_pdf(doc)}],
            "datablock_name": f"case-{doc}"
        }
    )
    datablock_ids.append(response.json()["datablock_id"])

# Multi-document analysis
response = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "messages": [{
            "role": "user",
            "content": "Identify contradictions across all documents"
        }],
        "datablocks": [
            {"id": db_id, "source": "wandb"} for db_id in datablock_ids
        ]
    }
)
```
## Healthcare & Medicine

Medical workflows involve extensive patient histories and clinical documentation. Datablocks enable longitudinal patient analysis while keeping processing HIPAA-compliant.
### Longitudinal Patient History (Med-PaLM, 2023)

> "Clinical encounters accumulate into very long patient histories that exceed typical LLM context limits"
Train datablocks on complete patient histories to enable accurate diagnosis support and treatment planning based on years of medical records.
```python
# Train on a patient history (10 years, ~80K tokens)
patient_history = aggregate_patient_records(patient_id)
train_response = requests.post(
    f"{API_URL}/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "documents": [{"id": patient_id, "text": patient_history}],
        "datablock_name": f"patient-{patient_id}",
        "parameters": {"num_learned_tokens": 2048}
    }
)
datablock_id = train_response.json()["datablock_id"]

# Clinical decision support
response = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "messages": [{
            "role": "user",
            "content": "Summarize relevant history for current symptoms"
        }],
        "datablocks": [{"id": datablock_id, "source": "wandb"}]
    }
)

# Privacy: datablocks are processed in a secure environment
```
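`aggregate_patient_records` is a placeholder. A minimal sketch that assumes visit notes stored as JSON files with `date` (ISO format) and `note` fields, a hypothetical layout to adapt to your records system:

```python
import json
from pathlib import Path

def aggregate_patient_records(patient_id: str, records_dir: str = "./records") -> str:
    """Concatenate a patient's visit notes in chronological order."""
    visits = []
    for path in Path(records_dir, patient_id).glob("*.json"):
        record = json.loads(path.read_text())
        visits.append((record["date"], f"[{record['date']}] {record['note']}"))
    return "\n\n".join(note for _, note in sorted(visits))  # ISO dates sort correctly
```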
"Real-world EHRs contain hundreds of notes per patient, requiring models capable of long-context reasoning"
Process extensive EHR documentation including progress notes, lab results, imaging reports, and specialist consultations to support clinical workflows.
```python
# Train on a comprehensive EHR
ehr_notes = load_ehr_notes(patient_id, visits=50)  # placeholder loader; ~120K tokens
train_response = requests.post(
    f"{API_URL}/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "documents": [{"id": "ehr", "text": ehr_notes}],
        "datablock_name": "ehr-comprehensive"
    }
)
datablock_id = train_response.json()["datablock_id"]

# Clinical question answering
response = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "messages": [{
            "role": "user",
            "content": "When did the patient first report chest pain?"
        }],
        "datablocks": [{"id": datablock_id, "source": "wandb"}]
    }
)

# Performance: 26× faster than reprocessing all notes
```
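The 26× figure compares querying a trained datablock against reprocessing every note on each request. To measure latency on your own workload, a short sketch that assumes the `ask` helper defined earlier:

```python
import time

start = time.perf_counter()
answer = ask("When did the patient first report chest pain?", datablock_id)
elapsed = time.perf_counter() - start
print(f"Answered in {elapsed:.2f}s: {answer[:200]}")
```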
## Finance & Regulatory

Financial analysts need to process extensive regulatory filings and multi-quarter reports. Datablocks enable efficient analysis of hundreds of pages of financial documentation.
### Financial Filings Analysis (GovReport, 2021)

> "Summaries must be generated from extremely long government and financial documents, often hundreds of pages"
Train datablocks on 10-K filings, earnings transcripts, and regulatory documents to enable rapid analysis of financial health, risk factors, and competitive positioning.
```python
# Train on an annual 10-K filing (200+ pages, ~150K tokens)
filing_10k = load_sec_filing("TSLA", year=2023, form="10-K")  # placeholder loader
train_response = requests.post(
    f"{API_URL}/datablocks/train",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen",
        "documents": [{"id": "10k", "text": filing_10k}],
        "datablock_name": "tsla-10k-2023",
        "parameters": {"num_learned_tokens": 2048}
    }
)
datablock_id = train_response.json()["datablock_id"]

# Financial analysis queries
queries = [
    "What are the primary risk factors?",
    "Summarize revenue by segment",
    "What are the debt covenants?"
]
for query in queries:
    response = requests.post(
        f"{API_URL}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "qwen",
            "messages": [{"role": "user", "content": query}],
            "datablocks": [{"id": datablock_id, "source": "wandb"}]
        }
    )
    print(f"Q: {query}\nA: {response.json()['choices'][0]['message']['content']}\n")

# Cost comparison:
# Traditional: 150K tokens × $0.60/1M × 100 queries = $9.00
# Datablocks: $3.00 training + $0.10 inference = $3.10 total (66% savings)
```
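The arithmetic above generalizes to a simple break-even rule: training pays off once the one-time fee is amortized across enough queries. A sketch using this example's assumed rates ($0.60 per 1M input tokens, $3.00 training, roughly $0.001 per datablock query); substitute your actual pricing:

```python
def breakeven_queries(doc_tokens: int, price_per_mtok: float = 0.60,
                      training_cost: float = 3.00,
                      per_query_cost: float = 0.001) -> float:
    """Number of queries after which datablock training pays for itself."""
    traditional_per_query = doc_tokens * price_per_mtok / 1_000_000
    return training_cost / (traditional_per_query - per_query_cost)

# 150K-token 10-K: pays off after ~34 queries
print(f"Break-even at {breakeven_queries(150_000):.0f} queries")
```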
## Performance Across Use Cases

| Use Case | Document Size | Speedup | Cost Savings |
|---|---|---|---|
| Coding Agent (GLM 4.6) | 100K tokens | 26× | 85% |
| Legal Contract | 150K tokens | 26× | 94% |
| Patient History | 80K tokens | 26× | 82% |
| EHR Multi-Visit | 120K tokens | 26× | 88% |
| Financial 10-K | 150K tokens | 26× | 66% |