Dynamic Savings Slider
Control your cost savings vs. output quality with an intuitive slider control.
Overview
The Savings Slider lets you choose how much Korad.AI optimizes your requests:
- Maximum Savings (90%) — Aggressive compression, minimal quality loss
- Balanced (50-70%) — Smart optimization, recommended default
- Quality Focus (20-30%) — Minimal compression, maximum fidelity
How It Works
graph LR
A[Your Prompt] --> B{Savings Level}
B -->|High| C[Aggressive Summarization]
B -->|Medium| D[Smart Context Compression]
B -->|Low| E[Minimal Optimization]
C --> F[Cheaper Model Routing]
D --> F
E --> F
F --> G[Reduced Token Usage]
Setting Your Savings Level
Via Dashboard
- Go to korad.ai/dashboard
- Adjust the savings slider
- Your preference is saved automatically
Via API (Coming Soon)
client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[...],
korad_settings={
"savings_level": "high" # "low", "medium", "high"
}
)
Savings Levels
Maximum Savings
90% cost reduction
- Aggressive context summarization
- Maximum model downgrading
- Best for: Large document analysis, background jobs
korad_settings = {"savings_level": "maximum"}
High Savings
70% cost reduction
- Strong context compression
- Smart model selection
- Best for: Code generation, data processing
korad_settings = {"savings_level": "high"}
Balanced
50% cost reduction (Recommended)
- Smart optimization
- Quality-preserving compression
- Best for: General use, chat applications
korad_settings = {"savings_level": "medium"}
Quality Focus
30% cost reduction
- Minimal optimization
- Maximum output fidelity
- Best for: Creative writing, nuanced responses
korad_settings = {"savings_level": "low"}
What Gets Optimized?
1. Context Compression
Long conversations are intelligently summarized:
Original: 15,000 tokens
Compressed: 3,000 tokens (80% savings)
Quality loss: < 5%
2. Model Routing
Tasks are routed to cost-effective models:
| Task | Original Model | Optimized Model | Savings |
|---|---|---|---|
| Simple questions | Claude Sonnet | Claude Haiku | 70% |
| Code completion | Claude Sonnet | Optimized Sonnet | 40% |
| Complex reasoning | Claude Opus | Claude Sonnet | 50% |
3. Token Reduction
Redundant tokens are removed:
# Before (user prompt)
"""
Please explain what quantum computing is, including:
- The basic principles
- How quantum bits work
- Why it's faster than classical computing
- Current applications
"""
# After (optimized)
"""
Explain quantum computing: principles, qubits, advantages vs classical,
applications.
"""
Real-World Examples
Document Analysis
# Analyzing a 100-page document
# Original: 50,000 tokens → $0.30 (Sonnet)
# With Maximum Savings: 8,000 tokens → $0.03 (Haiku)
# Savings: 90%
Code Generation
# Generating a React component
# Original: 2,000 tokens → $0.012
# With Balanced: 1,200 tokens → $0.005
# Savings: 58%, identical output
Chat Application
# 10-turn conversation about coding
# Original: 8,000 tokens cumulative → $0.048
# With High Savings: 2,500 tokens cumulative → $0.012
# Savings: 75%, responses stay coherent
Monitoring Your Savings
Every API response includes savings metrics:
response = client.messages.create(...)
print(f"Savings: {response.korad_metrics.savings_percent}%")
print(f"Original cost: ${response.korad_metrics.original_cost:.4f}")
print(f"Your cost: ${response.korad_metrics.your_cost:.4f}")
Best Practices
- Start with Balanced — Recommended for most use cases
- Test different levels — Run A/B tests for your specific use case
- Monitor quality — Check that optimization doesn't hurt your use case
- Use Maximum Savings — For background jobs, document analysis
- Use Quality Focus — For creative work, nuanced responses
Pro Tip
You can set different savings levels for different API keys! Use one key for production (Balanced) and another for batch jobs (Maximum).
Technical Details
Summarization Algorithm
Korad.AI uses a hybrid approach:
- Extract key entities — Names, dates, technical terms
- Preserve code blocks — Untouched for accuracy
- Summarize prose — Context-aware compression
- Maintain thread structure — Conversation flow preserved
Quality Metrics
We measure quality impact across dimensions:
| Dimension | Impact at High Savings |
|---|---|
| Factual accuracy | < 2% degradation |
| Code correctness | No change |
| Conversation coherence | < 5% degradation |
| Response relevance | No change |