
Context Optimization

Reduce token usage while maintaining quality through intelligent context optimization.

Overview

Context Optimization automatically reduces token count by:

  • Removing redundant information
  • Compressing repetitive patterns
  • Optimizing prompt structure
  • Preserving critical information

Optimization Techniques

1. Redundancy Removal

Duplicate content is identified and removed:

# Before (3000 tokens)
"""
User asked: "How do I implement OAuth?"
I provided: Full OAuth implementation guide
User asked: "Can you show me the code?"
I provided: Complete code example
User asked: "How do I test it?"
I provided: Testing procedures
User asked AGAIN: "How do I implement OAuth?"
I provided: Reference to previous answer
"""

# After (800 tokens)
"""
OAuth implementation covered (messages 1-12).
Code example: auth.py (lines 1-150)
Testing: test_auth.py
Latest: User asked same question again → Referenced previous answer
"""

2. Pattern Compression

Repetitive patterns are compressed:

# Before (1500 tokens)
[
    {"role": "user", "content": "Fix this bug"},
    {"role": "assistant", "content": "Here's the fix..."},
    {"role": "user", "content": "Thanks, now fix this other bug"},
    {"role": "assistant", "content": "Here's the fix..."},
    # ... repeated 20 times
]

# After (300 tokens)
[
    {"role": "system", "content": "Debugging session: 20 bugs fixed"},
    {"role": "user", "content": "Fix this new bug"},
]
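
A rough sketch of how such a pass could work (a hypothetical helper, not Korad.AI's actual implementation): fold completed user/assistant pairs into a single summary message and keep only the newest turn.

def compress_repeated_pairs(messages: list[dict], keep_last: int = 1) -> list[dict]:
    """Illustrative only: fold older user/assistant turns into one summary message."""
    older, latest = messages[:-keep_last], messages[-keep_last:]
    resolved = sum(1 for m in older if m["role"] == "assistant")
    summary = {
        "role": "system",
        # A real system would generate a task-specific summary here,
        # e.g. "Debugging session: 20 bugs fixed".
        "content": f"Earlier conversation compressed: {resolved} requests resolved.",
    }
    return [summary] + latest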

3. Structure Optimization

Prompt structure is optimized:

# Before: Verbose system prompt
"""
You are an expert Python developer with 10 years of experience.
You have worked on large-scale systems and know best practices.
Please provide clear, concise answers with code examples.
When suggesting solutions, consider:
- Performance implications
- Security concerns
- Maintainability
- Scalability
"""

# After: Optimized
"""
Python expert. Prioritize: performance, security, maintainability.
Provide code examples.
"""

What Gets Optimized

Safe to Optimize

Content Type                  Safe to Compress
Verbose instructions          ✅ Yes
Repetitive patterns           ✅ Yes
Redundant context             ✅ Yes
Long examples (if similar)    ✅ Yes

Never Optimized

Content Type          Always Preserved
User code             ✅ Preserved
API keys/secrets      ✅ Preserved
Latest messages       ✅ Preserved
Critical decisions    ✅ Preserved
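
A simplified sketch of how these rules could be expressed (hypothetical category names; the real classifier is internal to Korad.AI):

# Hypothetical preservation policy mirroring the tables above.
PRESERVE_ALWAYS = {"user_code", "api_secret", "latest_message", "critical_decision"}
SAFE_TO_COMPRESS = {"verbose_instruction", "repetitive_pattern", "redundant_context", "similar_example"}

def may_compress(content_type: str) -> bool:
    """Return True only for content the tables above mark as safe to compress."""
    if content_type in PRESERVE_ALWAYS:
        return False
    return content_type in SAFE_TO_COMPRESS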

Optimization Levels

Conservative (20-30% reduction)

  • Remove obvious redundancy
  • Compress verbose instructions
  • Preserve most context

Balanced (40-60% reduction)

  • Aggressive redundancy removal
  • Pattern compression
  • Structure optimization

Aggressive (70-80% reduction)

  • Maximum compression
  • Risk: Some context loss
  • Best for: Well-defined tasks
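
One way to picture the three levels is as a set of tuning parameters. The values below are illustrative, derived from the reduction ranges above rather than from a published spec.

# Illustrative parameters per level; target reductions match the ranges above.
OPTIMIZATION_LEVELS = {
    "conservative": {"target_reduction": 0.25, "pattern_compression": False, "context_loss_risk": "minimal"},
    "balanced":     {"target_reduction": 0.50, "pattern_compression": True,  "context_loss_risk": "low"},
    "aggressive":   {"target_reduction": 0.75, "pattern_compression": True,  "context_loss_risk": "some"},
}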

Configuration

Via Savings Slider

The savings slider controls the optimization level:

  • Quality Focus → Conservative optimization
  • Balanced → Balanced optimization
  • Maximum Savings → Aggressive optimization
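
In other words, the slider positions map one-to-one onto the optimization levels described above. A minimal sketch of that mapping:

# Slider position -> optimization level (mapping as listed above).
SLIDER_TO_LEVEL = {
    "quality_focus":   "conservative",
    "balanced":        "balanced",
    "maximum_savings": "aggressive",
}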

Via API (Coming Soon)

client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[...],
    korad_settings={
        "optimization": {
            "level": "balanced",  # conservative, balanced, aggressive
            "preserve_code": True,
            "preserve_latest_n": 5
        }
    }
)

Examples

Long Documentation

# Before: 5000 tokens
"""
User: Here's my API documentation (50 pages)
Assistant: I'll analyze it...
[Full documentation included in context]
"""

# After: 800 tokens
"""
User: API documentation provided (50 pages)
Key endpoints:
- GET /users - List users
- POST /users - Create user
- PUT /users/:id - Update user
- DELETE /users/:id - Delete user
Auth: JWT required
Rate limit: 100 req/min
"""

Code Review Session

# Before: 8000 tokens (multiple files)
[
    {"role": "user", "content": "Review this file..."},
    {"role": "assistant", "content": "Here's my review..."},
    # ... repeated for 20 files
]

# After: 1500 tokens
[
    {
        "role": "system",
        "content": "Code review session: 20 files reviewed. Issues found: 15 critical, 30 minor."
    },
    {"role": "user", "content": "Review this new file..."},
]

Performance Impact

Metric               Before         After          Improvement
Avg request size     8000 tokens    3200 tokens    60% reduction
Avg response time    2.5s           1.8s           28% faster
Cost per request     $0.048         $0.019         60% savings
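
The percentages in the table follow directly from the raw numbers; a quick check:

# Quick check of the figures above.
token_reduction = (8000 - 3200) / 8000      # 0.60  -> 60% reduction
latency_gain    = (2.5 - 1.8) / 2.5         # 0.28  -> 28% faster
cost_savings    = (0.048 - 0.019) / 0.048   # ~0.60 -> ~60% savings
print(f"{token_reduction:.0%}, {latency_gain:.0%}, {cost_savings:.0%}")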

Monitoring

Track optimization effectiveness:

response = client.messages.create(...)

if hasattr(response, 'korad_optimization'):
    print(f"Original size: {response.korad_optimization.original_tokens}")
    print(f"Optimized size: {response.korad_optimization.optimized_tokens}")
    print(f"Reduction: {response.korad_optimization.reduction_percent}%")

Best Practices

1. Start with Balanced

# Good: Let Korad.AI optimize automatically (balanced optimization)
savings_level = "medium"

2. Monitor Quality

# Check if optimization affects your use case
# Run A/B tests with different levels
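
A very simple A/B harness could look like the sketch below. It assumes you supply your own run_task (sends the prompt at a given savings level) and score (quality metric) functions; neither is part of any published API.

import random

def ab_test(prompts, run_task, score, levels=("low", "medium"), trials=20):
    """Compare average quality scores across savings levels.

    run_task(prompt, savings_level=...) and score(prompt, answer) are supplied by you.
    """
    results = {level: [] for level in levels}
    for _ in range(trials):
        prompt = random.choice(prompts)
        level = random.choice(levels)
        answer = run_task(prompt, savings_level=level)
        results[level].append(score(prompt, answer))
    return {level: sum(s) / len(s) for level, s in results.items() if s}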

3. Preserve Critical Context

# For important decisions, use quality focus (conservative optimization)
savings_level = "low"

4. Test Before Committing

# Try aggressive optimization on non-critical tasks first
savings_level = "high"
