Prerequisites
- Previous Section: BootstrapFewShot (Few-shot optimization)
- Chapter 4: Evaluation (Metrics and validation)
- Concept: Evolutionary algorithms (helpful but not required)
- Knowledge: Understanding of prompt engineering basics
Introduction to COPRO
COPRO (Chain-of-thought PROmpt optimization) is an advanced DSPy optimizer that uses evolutionary search to discover and refine optimal instructions for your language model programs. Unlike BootstrapFewShot, which focuses on selecting good demonstrations, COPRO specifically targets instruction optimization: finding the best way to describe your task to the language model.
Core Innovation: Prompts that work well for humans may not be optimal for LMs. COPRO solves this by letting the LM generate, evaluate, and evolve its own instructions.
How It Works
- Generate Candidates: Uses an LM to propose diverse instruction variations.
- Evaluate: Tests each candidate against your metric on a training set.
- Evolve: Selects top performers and mutates them to create better versions.
- Converge: Repeats this process to find the optimal prompt.
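The four steps above can be pictured as a toy loop. This is a hypothetical sketch, not DSPy's actual implementation: `mutate` and `score` are stand-ins for the LM call that rewrites an instruction and for evaluating it with your metric on a training set.

```python
import random

random.seed(0)  # deterministic for illustration

SUFFIXES = [" Be concise.", " Think step by step.", " Answer with one word."]

def mutate(instruction):
    # Stand-in for the LM rewriting an instruction.
    return instruction + random.choice(SUFFIXES)

def score(instruction):
    # Stand-in for scoring an instruction with your metric on a trainset;
    # here, instructions mentioning "step" happen to score higher.
    return instruction.count("step") + len(instruction) / 1000

def evolve(seed, breadth=10, depth=3, keep=3):
    survivors = [seed]
    for _ in range(depth):                  # one generation per depth step
        pool = [mutate(s) for s in survivors for _ in range(breadth)]
        pool.sort(key=score, reverse=True)  # evaluate every candidate
        survivors = pool[:keep]             # select the top performers
    return survivors[0]

best = evolve("Classify text sentiment.")
print(best)
```

In real COPRO the mutation and scoring steps are LM calls and metric evaluations, so `breadth` and `depth` directly control cost.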
Cost-Aware Optimization
COPRO is designed to be efficient:
- Adaptive Evaluation: Spends more compute on promising candidates.
- Early Termination: Stops bad search paths quickly.
- Budget Management: Respects your constraints on total optimization cost.
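Early termination can be pictured as follows: while scoring a candidate on the training set, abandon it as soon as even perfect scores on every remaining example could not beat the current best. This is a hypothetical sketch of the idea, not DSPy's internal code.

```python
def evaluate_with_cutoff(per_example_scores, best_total):
    """Score a candidate, stopping early once it can no longer win."""
    total = 0.0
    for i, s in enumerate(per_example_scores):
        total += s
        remaining = len(per_example_scores) - i - 1
        if total + remaining < best_total:  # even all-1.0 scores can't catch up
            return None                     # prune this candidate early
    return total

print(evaluate_with_cutoff([1.0, 1.0, 0.5, 1.0], best_total=0.0))  # 3.5
print(evaluate_with_cutoff([0.0, 0.0, 1.0, 1.0], best_total=3.5))  # None
```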
Basic Usage
Simple Classification Example
Here is how to use COPRO to optimize a sentiment classifier:
```python
import dspy
from dspy.teleprompt import COPRO

# 1. Define your signature and module
class SentimentClassifier(dspy.Signature):
    """Classify text sentiment."""
    text: str = dspy.InputField()
    sentiment: str = dspy.OutputField(desc="positive, negative, or neutral")

classifier = dspy.Predict(SentimentClassifier)

# 2. Define training data (20-50 examples recommended);
# with_inputs() marks which fields are inputs vs. labels
trainset = [
    dspy.Example(text="I love this!", sentiment="positive").with_inputs("text"),
    dspy.Example(text="Terrible service.", sentiment="negative").with_inputs("text"),
    # ... add more examples
]

# 3. Define the metric
def sentiment_accuracy(example, pred, trace=None):
    return example.sentiment.lower() == pred.sentiment.lower()

# 4. Configure COPRO
copro = COPRO(
    metric=sentiment_accuracy,
    breadth=10,  # Candidates per generation
    depth=3      # Number of generations
)

# 5. Compile
optimized_classifier = copro.compile(classifier, trainset=trainset)

# 6. Use the optimized classifier
result = optimized_classifier(text="This exceeded all expectations!")
print(result.sentiment)
```
Advanced Configuration
Fine-tune COPRO's behavior for your specific needs.
| Parameter | Description | Default | Recommended for Reasoning |
|---|---|---|---|
| `breadth` | Candidates per generation | 10 | 15-20 |
| `depth` | Number of generations | 3 | 3-5 |
| `init_temperature` | Creativity for initial candidates | 1.4 | 1.5 |
| `prompt_model` | LM used to generate prompts | None (uses task model) | GPT-4 / stronger model |
Using a Stronger Teacher Model
A common strategy is to use a powerful model (like GPT-4) to generate prompt candidates, but optimize them for a smaller, faster model (like GPT-3.5 or a local model).
```python
# Use GPT-4 to propose instructions
prompt_generator = dspy.LM(model="openai/gpt-4")

# Optimize for GPT-3.5
target_model = dspy.LM(model="openai/gpt-3.5-turbo")
dspy.configure(lm=target_model)

copro = COPRO(
    metric=your_metric,
    breadth=12,
    depth=4,
    prompt_model=prompt_generator  # Teacher model generates candidate instructions
)

optimized_program = copro.compile(program, trainset=trainset)
```
COPRO vs. Other Optimizers
| Feature | COPRO | BootstrapFewShot | MIPRO |
|---|---|---|---|
| Primary Target | Instructions | Demonstrations | Both |
| Method | Evolutionary Search | Bootstrap Sampling | Bayesian Optimization |
| Best For | Instruction-sensitive tasks, Reasoning | Tasks with good examples | Complex pipelines, Max performance |
| Speed | Medium | Fast | Slow |
Recommendation: Use COPRO when you have a tricky reasoning task where the exact wording of the prompt matters significantly, or when you have limited data for demonstrations.
Real-World Applications
1. Medical Triage
In high-stakes domains like healthcare, instruction precision is critical. COPRO can evolve prompts that strictly adhere to safety protocols.
```python
# Metric penalizes dangerous errors heavily
def triage_metric(example, pred, trace=None):
    correct = pred.level == example.level
    # Critical penalty for under-triaging emergencies
    if example.level == "emergency" and pred.level != "emergency":
        return 0.0
    return 1.0 if correct else 0.5

copro = COPRO(
    metric=triage_metric,
    breadth=15,
    depth=5,
    init_temperature=1.2  # Lower temperature for stability
)
```
2. Legal Analysis
For tasks requiring specific vocabulary and structure, COPRO can find the "magic words" that align the model with legal standards.
Best Practices
- Diverse Training Data: Ensure your 20-50 training examples cover various edge cases. If they are all simple, COPRO won't learn to handle complexity.
- Meaningful Metrics: Your metric determines the direction of evolution. A binary (0/1) metric provides less signal than a continuous score (0.0-1.0) or a weighted multi-component metric.
- Start Conservative: Begin with breadth=8, depth=2 to check for quick wins. If promising, expand to breadth=15, depth=5.
- Combine Optimizers: You can optimize instructions with COPRO first, then run BootstrapFewShot on the result to add demonstrations.