Chapter 5 Β· Section 6

Choosing Optimizers

Decision guide, trade-offs, and optimization synergy analysis.

~15 min read

πŸ“‹ Quick Reference Guide

Optimizer          Best For                     Examples Needed   Speed       Performance
None (Baseline)    Simple tasks                 None              Fastest     Baseline
BootstrapFewShot   General improvement          10-100            Fast        Good
KNNFewShot         Context-sensitive tasks      100+              Medium      Good
MIPRO              Maximum performance          20-200            Slow        Excellent
Fine-Tuning        Production, cost-sensitive   1000+             Very slow   Excellent

🎯 Decision Framework

Step 1: Analyze Your Constraints

πŸ“Š Data constraints: How many examples do you have, and of what quality and diversity?

⏱️ Time budget: Do you have minutes, hours, or days for optimization?

🎯 Performance target: How much accuracy improvement do you need?

πŸ”§ Task complexity: Is this simple classification or complex reasoning?

πŸ“¦ Use Case Recommendations

Use Case 1: Quick Prototype πŸš€

Scenario: Building an MVP with 50 examples and a two-day deadline

Recommendation: BootstrapFewShot with max_bootstrapped_demos=8

optimizer = BootstrapFewShot(
    metric=answer_accuracy,
    max_bootstrapped_demos=8,
    max_labeled_demos=4
)
prototype = optimizer.compile(SupportBot(), trainset=examples)

Use Case 2: Enterprise RAG System 🏒

Scenario: 10,000 examples; 95%+ accuracy required

Recommendation: MIPRO with auto="heavy", consider fine-tuning for cost

# Stage 1: Quick baseline
baseline = BootstrapFewShot(metric=f1_score).compile(
    LegalRAG(), trainset=trainset[:1000]
)

# Stage 2: Advanced optimization
optimizer = MIPRO(metric=weighted_metric, auto="heavy")
optimized = optimizer.compile(LegalRAG(), trainset=trainset)

Use Case 3: Real-Time Classification ⚑

Scenario: 1,000+ requests/sec with <100 ms latency

Recommendation: KNNFewShot with embedding caching, or a fine-tuned small model

optimizer = KNNFewShot(
    k=3,
    similarity_fn=semantic_similarity,
    cache_embeddings=True  # Speed optimization
)
classifier = optimizer.compile(ContentModerator(), trainset=examples)

πŸ“ˆ Expected Performance Patterns

Optimizer          Accuracy Gain   Compile Time   Best For
Baseline           0%              < 1 s          Quick testing
BootstrapFewShot   5-15%           1-5 min        Most tasks
KNNFewShot         5-12%           1-2 min        Context tasks
MIPRO              10-25%          5-30 min       Complex tasks
Fine-Tuning        15-30%          1-4 hrs        Production
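One rough way to weigh these trade-offs is accuracy gain per compile minute. The sketch below uses the midpoints of the ranges in the table above; the `profiles` structure and the per-minute metric are ours, purely for illustration, not a DSPy utility.

```python
# Rough cost/benefit comparison using midpoints of the ranges in the
# table above. Illustrative only; real gains depend on task and data.
profiles = {
    "BootstrapFewShot": {"gain_pct": (5 + 15) / 2, "compile_min": (1 + 5) / 2},
    "KNNFewShot":       {"gain_pct": (5 + 12) / 2, "compile_min": (1 + 2) / 2},
    "MIPRO":            {"gain_pct": (10 + 25) / 2, "compile_min": (5 + 30) / 2},
    "Fine-Tuning":      {"gain_pct": (15 + 30) / 2, "compile_min": (60 + 240) / 2},
}

for name, p in profiles.items():
    rate = p["gain_pct"] / p["compile_min"]  # accuracy points per compile minute
    print(f"{name:17s} ~{rate:.2f} pts/min")
```

By this crude measure the lightweight optimizers win on throughput, while MIPRO and fine-tuning pay their longer compile times back only when the extra accuracy actually matters.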

πŸ”„ Progressive Optimization Strategy

Start simple and progressively add optimization:

def progressive_optimization(program, trainset, valset):
    """Start simple and progressively add optimization."""
    stages = [
        {"name": "Baseline", "optimizer": None},
        {"name": "BootstrapFewShot",
         # Demo limits are constructor arguments, not compile() kwargs.
         "optimizer": BootstrapFewShot(metric=accuracy_metric,
                                       max_bootstrapped_demos=4)},
        {"name": "KNNFewShot",
         "optimizer": KNNFewShot(k=3)},
        {"name": "MIPRO",
         "optimizer": MIPRO(metric=accuracy_metric, auto="medium")},
    ]
    
    best_program = program
    best_score = 0
    
    for stage in stages:
        print(f"\n=== Stage: {stage['name']} ===")
        
        if stage['optimizer']:
            compiled = stage['optimizer'].compile(best_program, trainset=trainset)
        else:
            compiled = program
        
        score = evaluate(compiled, valset)
        print(f"Score: {score:.3f}")
        
        if score > best_score:
            best_score = score
            best_program = compiled
            print("βœ“ New best program!")
    
    return best_program

πŸ“ Optimization Order Effects

When combining strategies, order matters significantly:

βœ…

Optimal order: Fine-tuning β†’ Prompt Optimization

Achieves up to 3.5x improvement beyond either approach alone.

❌

Suboptimal order: Prompt Optimization β†’ Fine-tuning

Achieves only ~1.8x improvement, because prompts tuned against the base model don't transfer well to the fine-tuned one.

# OPTIMAL ORDER: fine-tune first, then prompt-optimize
finetuned = finetune(base_model, trainset)  # finetune() stands in for your fine-tuning pipeline
dspy.settings.configure(lm=finetuned)       # route all modules through the tuned model

optimizer = MIPRO(metric=accuracy, auto="medium")
compiled = optimizer.compile(program, trainset=trainset)
# Result: up to 3.5x improvement over either step alone

πŸ”— Synergy Quantification

Combined optimization achieves synergistic effects:

Task         Baseline   Fine-Tune Only   Prompt Opt. Only   Combined   Synergy
MultiHopQA   12%        28%              20%                45%        3.5x
GSM8K Math   11%        32%              22%                55%        2.8x
AQuA         9%         35%              28%                69%        3.4x
πŸ’‘ Key insight: combined optimization exceeds the sum of the individual improvements; that is the synergy.
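The claim can be checked directly against the table above: for each task, compare the combined gain over baseline with the sum of the two individual gains. The snippet is a sketch; the `results` list simply restates the table's figures.

```python
# Check the "exceeds the sum of its parts" claim against the table above.
# Tuples are (task, baseline, fine_tune_only, prompt_opt_only, combined), in percent.
results = [
    ("MultiHopQA", 12, 28, 20, 45),
    ("GSM8K Math", 11, 32, 22, 55),
    ("AQuA",        9, 35, 28, 69),
]

for task, base, ft, po, both in results:
    sum_of_parts = (ft - base) + (po - base)  # adding the two separate gains
    combined_gain = both - base
    print(f"{task}: combined gain {combined_gain} vs sum of parts {sum_of_parts} "
          f"-> synergistic: {combined_gain > sum_of_parts}")
```

For every task in the table the combined gain beats the sum of the individual gains, which is exactly what the insight above asserts.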

🌲 Quick Decision Tree

Starting optimization?
β”‚
β”œβ”€β”€ Have < 20 examples?
β”‚   └── Use: BootstrapFewShot (or no optimization)
β”‚
β”œβ”€β”€ Have 20-100 examples?
β”‚   β”œβ”€β”€ Need max performance? β†’ MIPRO
β”‚   └── Need speed? β†’ BootstrapFewShot
β”‚
β”œβ”€β”€ Have 100+ examples?
β”‚   β”œβ”€β”€ Context-sensitive task? β†’ KNNFewShot
β”‚   └── Complex reasoning? β†’ MIPRO
β”‚
└── Have 1000+ examples AND production needs?
    └── Consider: Fine-tuning + MIPRO
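The tree above can be sketched as a small helper function. The name `recommend_optimizer` and its boolean flags are ours for illustration, not a DSPy API:

```python
def recommend_optimizer(n_examples: int,
                        context_sensitive: bool = False,
                        max_performance: bool = False,
                        production: bool = False) -> str:
    """Map the decision tree above onto simple flags. Illustrative only."""
    if n_examples >= 1000 and production:
        return "Fine-Tuning + MIPRO"
    if n_examples >= 100:
        # Context-sensitive tasks favor retrieval of similar demos.
        return "KNNFewShot" if context_sensitive else "MIPRO"
    if n_examples >= 20:
        return "MIPRO" if max_performance else "BootstrapFewShot"
    return "BootstrapFewShot (or no optimization)"

print(recommend_optimizer(50))                            # -> BootstrapFewShot
print(recommend_optimizer(500, context_sensitive=True))   # -> KNNFewShot
```

Encoding the tree this way also makes the thresholds easy to adjust as you learn how your own task responds to each optimizer.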

πŸ“ Key Takeaways

Start with BootstrapFewShotβ€”it's fast and effective for most tasks

Use MIPRO when maximum performance is critical

KNNFewShot excels at context-sensitive tasks with large datasets

Order matters: Fine-tune first, then prompt optimize

Combined optimization achieves synergistic (3x+) improvements