Quick Reference Guide
| Optimizer | Best For | Examples Needed | Speed | Performance |
|---|---|---|---|---|
| None (Baseline) | Simple tasks | None | Fastest | Baseline |
| BootstrapFewShot | General improvement | 10-100 | Fast | Good |
| KNNFewShot | Context-sensitive | 100+ | Medium | Good |
| MIPRO | Maximum performance | 20-200 | Slow | Excellent |
| Fine-Tuning | Production, cost-sensitive | 1000+ | Very Slow | Excellent |
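Whichever row looks right, it is worth scoring the un-optimized program first so there is a concrete baseline to beat. A minimal sketch, assuming a `program` (any dspy.Module) and a `devset` of dspy.Example objects, using DSPy's built-in exact-match metric:

```python
import dspy
from dspy.evaluate import answer_exact_match

# Score the un-optimized program: this is the "None (Baseline)" row above.
evaluator = dspy.Evaluate(
    devset=devset,               # list of dspy.Example objects with labeled answers
    metric=answer_exact_match,   # built-in exact-match metric
    display_progress=True,
)
baseline_score = evaluator(program)
print("Baseline:", baseline_score)
```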
Decision Framework
Step 1: Analyze Your Constraints
- Data constraints: How many examples do you have? What is their quality and diversity?
- Time budget: Minutes, hours, or days available for optimization?
- Performance target: How much accuracy improvement do you need?
- Task complexity: Simple classification or complex reasoning?
Use Case Recommendations
Use Case 1: Quick Prototype
Scenario: Building an MVP with 50 examples and a 2-day deadline
Recommendation: BootstrapFewShot with max_bootstrapped_demos=8

```python
optimizer = BootstrapFewShot(
    metric=answer_accuracy,
    max_bootstrapped_demos=8,
    max_labeled_demos=4,
)
prototype = optimizer.compile(SupportBot(), trainset=examples)
```
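The snippet assumes an `answer_accuracy` metric. DSPy metrics are plain functions with the signature `(example, prediction, trace=None)`; a minimal sketch of what it could look like here (the `answer` field name is an assumption about the SupportBot signature):

```python
def answer_accuracy(example, prediction, trace=None):
    # Count the prediction as correct if the gold answer appears in it (case-insensitive).
    return example.answer.strip().lower() in prediction.answer.strip().lower()
```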
Use Case 2: Enterprise RAG System
Scenario: 10,000 examples, 95%+ accuracy required
Recommendation: MIPRO with auto="heavy"; consider fine-tuning to reduce serving cost

```python
# Stage 1: Quick baseline
baseline = BootstrapFewShot(metric=f1_score).compile(
    LegalRAG(), trainset=trainset[:1000]
)

# Stage 2: Advanced optimization
optimizer = MIPRO(metric=weighted_metric, auto="heavy")
optimized = optimizer.compile(LegalRAG(), trainset=trainset)
```
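Stage 2 assumes a `weighted_metric`. For a high-stakes RAG system this might blend answer correctness with a groundedness check; a hypothetical sketch (the `answer` and `citations` field names are assumptions, not DSPy APIs):

```python
def weighted_metric(example, prediction, trace=None):
    # Correctness: the gold answer string appears in the predicted answer.
    correct = float(example.answer.lower() in prediction.answer.lower())
    # Groundedness: the prediction cites at least one retrieved passage.
    grounded = float(bool(getattr(prediction, "citations", None)))
    score = 0.7 * correct + 0.3 * grounded
    # During bootstrapping (trace is not None), only accept near-perfect examples.
    return score >= 0.9 if trace is not None else score
```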
Use Case 3: Real-time Classification
Scenario: 1,000+ requests/sec with a <100 ms latency budget
Recommendation: KNNFewShot with cached embeddings, or a fine-tuned small model

```python
optimizer = KNNFewShot(
    k=3,
    similarity_fn=semantic_similarity,
    cache_embeddings=True,  # Speed optimization
)
classifier = optimizer.compile(ContentModerator(), trainset=examples)
```
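The `semantic_similarity` function is not shown; one common implementation is cosine similarity over sentence embeddings. A sketch using the sentence-transformers package (the model choice, and the two-argument signature the snippet above appears to expect, are assumptions):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

_embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast embedding model

def semantic_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts' embeddings (1.0 means near-identical meaning)."""
    a, b = _embedder.encode([text_a, text_b])
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```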
Expected Performance Patterns
| Optimizer | Accuracy Gain | Compile Time | Best For |
|---|---|---|---|
| Baseline | 0% | < 1s | Quick testing |
| BootstrapFewShot | 5-15% | 1-5 min | Most tasks |
| KNNFewShot | 5-12% | 1-2 min | Context tasks |
| MIPRO | 10-25% | 5-30 min | Complex tasks |
| Fine-Tuning | 15-30% | 1-4 hrs | Production |
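These numbers are indicative; compile time and accuracy gain depend heavily on the task, model, and metric. A quick way to measure both on your own data, assuming an `optimizer`, `program`, `trainset`, `valset`, and an `evaluate(program, valset)` helper like the one sketched later in this guide:

```python
import time

# Time one compile run and measure the accuracy gain over the un-optimized program.
start = time.perf_counter()
compiled = optimizer.compile(program, trainset=trainset)
compile_minutes = (time.perf_counter() - start) / 60

gain = evaluate(compiled, valset) - evaluate(program, valset)
print(f"Compile time: {compile_minutes:.1f} min | accuracy gain: {gain:+.3f}")
```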
Progressive Optimization Strategy
Start simple and progressively add optimization:
```python
def progressive_optimization(program, trainset, valset):
    """Start simple and progressively add optimization."""
    stages = [
        {"name": "Baseline", "optimizer": None},
        {"name": "BootstrapFewShot",
         "optimizer": BootstrapFewShot(metric=accuracy_metric,
                                       max_bootstrapped_demos=4)},
        {"name": "KNNFewShot",
         "optimizer": KNNFewShot(k=3)},
        {"name": "MIPRO",
         "optimizer": MIPRO(metric=accuracy_metric, auto="medium")},
    ]

    best_program = program
    best_score = 0

    for stage in stages:
        print(f"\n=== Stage: {stage['name']} ===")

        if stage["optimizer"]:
            # Compile the current best program with this stage's optimizer
            compiled = stage["optimizer"].compile(best_program, trainset=trainset)
        else:
            compiled = program

        # Keep whichever compiled program scores best on the validation set
        score = evaluate(compiled, valset)
        print(f"Score: {score:.3f}")

        if score > best_score:
            best_score = score
            best_program = compiled
            print("✓ New best model!")

    return best_program
```
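The function assumes an `accuracy_metric` and an `evaluate(program, valset)` helper. A minimal, version-agnostic sketch of both, plus a call (the `answer` field and the `MyProgram` module are assumptions):

```python
def accuracy_metric(example, prediction, trace=None):
    return example.answer.lower() == prediction.answer.lower()

def evaluate(program, valset):
    # Average the metric over the validation set (returns a value in 0.0-1.0).
    scores = [accuracy_metric(ex, program(**ex.inputs())) for ex in valset]
    return sum(scores) / len(scores)

best = progressive_optimization(MyProgram(), trainset, valset)
```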
Optimization Order Effects
When combining strategies, order matters significantly:
Optimal order: fine-tuning → prompt optimization. This achieves roughly a 3.5x improvement beyond the individual approaches.
Suboptimal order: prompt optimization → fine-tuning. This achieves only about a 1.8x improvement, because prompts tuned for the base model don't transfer well to the fine-tuned model.
```python
# OPTIMAL ORDER: fine-tune first, then optimize prompts
finetuned = finetune(base_model, trainset)
dspy.settings.configure(lm=finetuned)

optimizer = MIPRO(metric=accuracy, auto="medium")
compiled = optimizer.compile(program, trainset=trainset)
# Result: ~3.5x improvement
```
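The `finetune(...)` call above is left abstract. One way to realize it is DSPy's BootstrapFinetune teleprompter, which bootstraps traces from the training set and fine-tunes the underlying LM. The sketch below is a rough shape only; constructor and compile arguments have shifted across DSPy releases, so check your installed version:

```python
from dspy.teleprompt import BootstrapFinetune

# Fine-tune stage: bootstrap demonstrations, then fine-tune the student LM on them.
# `accuracy`, `program`, and `trainset` are assumed from the snippet above.
finetuner = BootstrapFinetune(metric=accuracy)
finetuned_program = finetuner.compile(program, trainset=trainset)

# Then run the MIPRO step above, compiling `finetuned_program` instead of `program`.
```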
Synergy Quantification
Combined optimization achieves synergistic effects:
| Task | Baseline | Fine-Tuning Only | Prompt Opt. Only | Combined | Synergy |
|---|---|---|---|---|---|
| MultiHopQA | 12% | 28% | 20% | 45% | 3.5x |
| GSM8K Math | 11% | 32% | 22% | 55% | 2.8x |
| AQuA | 9% | 35% | 28% | 69% | 3.4x |
Key insight: Combined optimization exceeds the sum of the individual improvements; this is synergy!
Quick Decision Tree
```
Starting optimization?
│
├── Have < 20 examples?
│   └── Use: BootstrapFewShot (or no optimization)
│
├── Have 20-100 examples?
│   ├── Need max performance? → MIPRO
│   └── Need speed? → BootstrapFewShot
│
├── Have 100+ examples?
│   ├── Context-sensitive task? → KNNFewShot
│   └── Complex reasoning? → MIPRO
│
└── Have 1000+ examples AND production needs?
    └── Consider: Fine-tuning + MIPRO
```
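If you prefer this logic in code, a small helper can encode the same tree (the boolean flags are assumptions standing in for the questions above):

```python
def choose_optimizer(num_examples, need_max_performance=False,
                     context_sensitive=False, production=False):
    """Rough translation of the decision tree above; a starting point, not a verdict."""
    if num_examples >= 1000 and production:
        return "Fine-tuning + MIPRO"
    if num_examples >= 100:
        return "KNNFewShot" if context_sensitive else "MIPRO"
    if num_examples >= 20:
        return "MIPRO" if need_max_performance else "BootstrapFewShot"
    return "BootstrapFewShot (or no optimization)"
```

For example, `choose_optimizer(50, need_max_performance=True)` returns "MIPRO", matching the middle branch of the tree.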