Best Practices
Guidelines for optimal performance and cost-efficiency with datablocks.
Document Preparation
Optimal Document Size
Train datablocks on documents between 10K and 100K tokens for the best compression and quality.
- Too small (<5K tokens): Minimal benefit from compression
- Optimal (10K-100K tokens): Best compression ratio and quality
- Too large (>100K tokens): Consider splitting into multiple datablocks
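The size bands above can be checked before training. The following is a minimal sketch, not part of any datablocks API: the function names, the `below_optimal` label for the 5K-10K range that the guidance does not address, and the choice to split oversized documents into near-equal parts are all assumptions made for illustration. It also assumes the document has already been tokenized with whatever tokenizer your model uses.

```python
from typing import List

# Thresholds taken from the size guidance above.
MIN_USEFUL_TOKENS = 5_000
OPTIMAL_MIN_TOKENS = 10_000
OPTIMAL_MAX_TOKENS = 100_000


def classify_size(token_count: int) -> str:
    """Map a token count to one of the size bands (labels are hypothetical)."""
    if token_count < MIN_USEFUL_TOKENS:
        return "too_small"
    if token_count < OPTIMAL_MIN_TOKENS:
        # The guidance does not describe 5K-10K; this label is an assumption.
        return "below_optimal"
    if token_count <= OPTIMAL_MAX_TOKENS:
        return "optimal"
    return "too_large"


def split_tokens(tokens: List[str], max_tokens: int = OPTIMAL_MAX_TOKENS) -> List[List[str]]:
    """Split a token list into the fewest near-equal chunks of at most max_tokens."""
    n = len(tokens)
    if n <= max_tokens:
        return [tokens]
    num_chunks = -(-n // max_tokens)  # ceiling division
    base, extra = divmod(n, num_chunks)
    chunks: List[List[str]] = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks carry one additional token each.
        size = base + (1 if i < extra else 0)
        chunks.append(tokens[start:start + size])
        start += size
    return chunks
```

Splitting into near-equal parts, rather than filling each chunk to the limit, avoids a small trailing chunk that could fall below the optimal range. A real splitter would likely also respect section or paragraph boundaries.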
Cost Optimization
Cache Reuse
Datablocks are cached server-side. The first load takes longer, but subsequent uses of the same datablock are nearly instant. Design your application so that requests reuse the same datablock wherever possible.
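One way to encourage reuse on the application side is to keep a single handle per datablock and hand it out to every caller. This is a sketch under stated assumptions: `DatablockRegistry` and the `loader` callable are hypothetical names, and the loader stands in for whatever call your client actually uses to load a datablock.

```python
from typing import Callable, Dict, Generic, TypeVar

H = TypeVar("H")  # type of the handle returned by the (hypothetical) loader


class DatablockRegistry(Generic[H]):
    """Keeps one handle per datablock ID so each datablock is loaded only once."""

    def __init__(self, loader: Callable[[str], H]) -> None:
        # `loader` is an assumption: any function that loads a datablock by ID.
        self._loader = loader
        self._handles: Dict[str, H] = {}

    def get(self, datablock_id: str) -> H:
        """Return the existing handle, loading the datablock on first use."""
        if datablock_id not in self._handles:
            self._handles[datablock_id] = self._loader(datablock_id)
        return self._handles[datablock_id]
```

With this in place, only the first `registry.get("contracts-2024")` call would pay the load cost; later calls with the same ID return the stored handle. The ID shown is illustrative.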
Batch Processing
When processing multiple queries against the same context, keep the datablock loaded and batch your requests for maximum efficiency.
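The batching pattern can be sketched as grouping queries by datablock, so that each datablock is loaded once and all of its queries run before moving on. The `load` and `ask` callables below are hypothetical placeholders for your client's load and query calls, not a documented API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Request = Tuple[str, str]  # (datablock_id, query)


def group_by_datablock(requests: Iterable[Request]) -> Dict[str, List[str]]:
    """Collect queries under the datablock they target, preserving query order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for datablock_id, query in requests:
        grouped[datablock_id].append(query)
    return dict(grouped)


def run_batched(
    requests: Iterable[Request],
    load: Callable[[str], Any],      # hypothetical: loads a datablock by ID
    ask: Callable[[Any, str], str],  # hypothetical: runs one query against a loaded datablock
) -> Dict[Request, str]:
    """Load each datablock once, then run every query that targets it."""
    answers: Dict[Request, str] = {}
    for datablock_id, queries in group_by_datablock(requests).items():
        handle = load(datablock_id)
        for query in queries:
            answers[(datablock_id, query)] = ask(handle, query)
    return answers
```

Interleaved requests such as A, B, A then trigger two loads instead of three, because both queries for A run against a single loaded datablock.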
Quality Considerations
Datablocks maintain the same quality as the base model. Results still depend on the training documents, so ensure they are:
- Well-formatted and clean
- Relevant to your use case
- Representative of the queries you'll ask