Best Practices

Guidelines for optimal performance and cost-efficiency with datablocks.

Document Preparation

Optimal Document Size

Train datablocks on documents of 10K to 100K tokens for the best compression and quality.

Too small (<5K tokens): Minimal benefit from compression
Optimal (10K-100K tokens): Best compression ratio and quality
Too large (>100K tokens): Consider splitting into multiple datablocks
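The size bands above can be checked before training. The following is a minimal sketch, not part of any datablocks API: `estimate_tokens` is a hypothetical whitespace-based approximation (a real tokenizer will give different counts), and the `below_optimal` label for the 5K to 10K range is an assumption, since the guidance above does not classify that range.

```python
from typing import List

# Thresholds taken from the size guidance above (in tokens).
MIN_USEFUL_TOKENS = 5_000
OPTIMAL_MIN_TOKENS = 10_000
OPTIMAL_MAX_TOKENS = 100_000


def estimate_tokens(text: str) -> int:
    """Rough estimate: count whitespace-separated words.

    Assumption for illustration only; swap in the tokenizer of the
    model you actually use.
    """
    return len(text.split())


def classify_size(token_count: int) -> str:
    """Map a token count onto the size bands described above."""
    if token_count < MIN_USEFUL_TOKENS:
        return "too_small"
    if token_count < OPTIMAL_MIN_TOKENS:
        # Not classified by the guidance; label chosen for this sketch.
        return "below_optimal"
    if token_count <= OPTIMAL_MAX_TOKENS:
        return "optimal"
    return "too_large"


def split_document(text: str, max_tokens: int = OPTIMAL_MAX_TOKENS) -> List[str]:
    """Split a document into chunks of at most max_tokens estimated tokens.

    Splits on word boundaries only; a real pipeline would likely prefer
    section or paragraph boundaries.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

A document classified as `too_large` can be passed to `split_document`, and each resulting chunk trained as its own datablock.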

Cost Optimization

Cache Reuse

Datablocks are cached server-side. The first load takes longer, but subsequent uses are nearly instant. Design your application to maximize datablock reuse.
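One way to encourage reuse on the application side is to load each datablock once and hand out the same handle afterwards. This is only a sketch: `FakeClient` and its `load_datablock` method are hypothetical stand-ins, not a real client library, and the in-process `lru_cache` complements the server-side cache rather than replacing it.

```python
from functools import lru_cache


class FakeClient:
    """Hypothetical stand-in for a datablock client; counts load calls."""

    def __init__(self) -> None:
        self.load_count = 0

    def load_datablock(self, datablock_id: str) -> dict:
        # A real client would contact the service here (the slow first load).
        self.load_count += 1
        return {"id": datablock_id}


client = FakeClient()


@lru_cache(maxsize=32)
def get_datablock(datablock_id: str) -> dict:
    """Return a datablock handle, loading it only on first request."""
    return client.load_datablock(datablock_id)


first = get_datablock("contract-2024")
second = get_datablock("contract-2024")  # served from the in-process cache
```

After both calls, `client.load_count` is 1, because the second request never reaches the client.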

Batch Processing

When processing multiple queries against the same context, keep the datablock loaded and batch your requests for maximum efficiency.
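A simple way to prepare such batches is to group pending queries by the datablock they target, so each datablock is loaded once per batch. The sketch below assumes requests arrive as `(datablock_id, query)` pairs; that shape and the identifiers are illustrative, not part of any datablocks API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_datablock(
    requests: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Group queries by datablock ID, preserving arrival order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for datablock_id, query in requests:
        grouped[datablock_id].append(query)
    return dict(grouped)


# Hypothetical incoming requests, interleaved across two datablocks.
requests = [
    ("manual-v2", "How do I reset the device?"),
    ("pricing", "What does the team plan cost?"),
    ("manual-v2", "What is the warranty period?"),
]

batches = group_by_datablock(requests)
# Each key can now be loaded once and all of its queries sent together.
```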

Quality Considerations

Datablocks maintain the same quality as the base model. Results still depend on the training documents, so ensure they are:

Well-formatted and clean
Relevant to your use case
Representative of the queries you'll ask
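For the "well-formatted and clean" point, a light normalization pass before training might look like the sketch below. The specific rules (unify line endings, drop control characters, trim trailing whitespace, collapse runs of blank lines) are assumptions about what "clean" means here, not requirements stated by the datablocks service.

```python
import re


def clean_document(text: str) -> str:
    """Apply basic whitespace and control-character cleanup."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop control characters, keeping newline and tab.
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)
    # Remove trailing spaces and tabs at the end of each line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Collapse three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Relevance and representativeness cannot be checked mechanically this way; they still need a manual review against the queries you expect.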