Chapter 5

COPRO: Chain-of-Thought Prompt Optimization

Discover the best instructions for your DSPy programs using evolutionary search and cost-aware optimization.

πŸ“‹ Prerequisites

  • Previous Section: BootstrapFewShot (Few-shot optimization)
  • Chapter 4: Evaluation (Metrics and validation)
  • Concept: Evolutionary algorithms (helpful but not required)
  • Knowledge: Understanding of prompt engineering basics

Introduction to COPRO

COPRO (Chain-of-thought PROmpt optimization) is an advanced DSPy optimizer that uses evolutionary search to discover and refine optimal instructions for your language model programs. Unlike BootstrapFewShot, which focuses on selecting good demonstrations, COPRO specifically targets instruction optimizationβ€”finding the best way to describe your task to the language model.

πŸ’‘

Core Innovation: Prompts that work well for humans may not be optimal for LMs. COPRO solves this by letting the LM generate, evaluate, and evolve its own instructions.

How It Works

  1. Generate Candidates: Uses an LM to propose diverse instruction variations.
  2. Evaluate: Tests each candidate against your metric on a training set.
  3. Evolve: Selects top performers and mutates them to create better versions.
  4. Converge: Repeats this process to find the optimal prompt.
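The four steps above can be sketched as a small loop in plain Python. This is a toy, greedy variant that keeps a single parent per generation, not COPRO's actual implementation; `propose_variants` and `score` are hypothetical stand-ins for the LM's mutation step and the metric evaluation:

```python
def optimize_instruction(seed, propose_variants, score, breadth=10, depth=3):
    """Toy generate/evaluate/evolve loop over instruction candidates.

    propose_variants(instruction, n) -> n mutated instruction strings
    score(instruction) -> float quality on the training set
    """
    best, best_score = seed, score(seed)
    for _ in range(depth):                           # generations
        candidates = propose_variants(best, breadth) # 1. generate
        scored = [(score(c), c) for c in candidates] # 2. evaluate
        top_score, top = max(scored)                 # 3. select the winner
        if top_score > best_score:                   # 4. keep it if it improves
            best, best_score = top, top_score
    return best, best_score
```

In the real optimizer, both the proposal step (an LM rewriting the instruction) and the scoring step (running your metric over the training set) are expensive, which is why the cost controls below matter.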

Cost-Aware Optimization

COPRO is designed to be efficient:

  • Adaptive Evaluation: Spends more compute on promising candidates.
  • Early Termination: Stops bad search paths quickly.
  • Budget Management: Respects your constraints on total optimization cost.
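One way to picture these savings is successive halving: score every candidate on a small sample first, then spend the larger evaluation budget only on the survivors. This is a generic sketch of the idea, not COPRO's internal code; `score_on` is a hypothetical helper that averages your metric over `n` training examples:

```python
def successive_halving(candidates, score_on, sample_sizes=(5, 20, 100)):
    """Evaluate candidates on growing samples, halving the pool each round.

    score_on(candidate, n) -> average metric score over n training examples.
    Weak candidates are dropped early, before the expensive large-sample runs.
    """
    pool = list(candidates)
    for n in sample_sizes:
        ranked = sorted(pool, key=lambda c: score_on(c, n), reverse=True)
        pool = ranked[: max(1, len(ranked) // 2)]  # keep the top half
    return pool[0]  # best surviving candidate
```

With 8 candidates and sample sizes (5, 20, 100), only one candidate ever pays for the full 100-example evaluation, instead of all 8.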

πŸ’» Basic Usage

Simple Classification Example

Here is how to use COPRO to optimize a sentiment classifier:

import dspy
from dspy.teleprompt import COPRO

# Configure the task LM first (the model name here is just an example)
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# 1. Define your signature and module
class SentimentClassifier(dspy.Signature):
    """Classify text sentiment."""
    text: str = dspy.InputField()
    sentiment: str = dspy.OutputField(desc="positive, negative, or neutral")

classifier = dspy.Predict(SentimentClassifier)

# 2. Define training data (20-50 examples recommended)
trainset = [
    dspy.Example(text="I love this!", sentiment="positive").with_inputs("text"),
    dspy.Example(text="Terrible service.", sentiment="negative").with_inputs("text"),
    # ... add more examples
]

# 3. Define metric
def sentiment_accuracy(example, pred, trace=None):
    return example.sentiment.lower() == pred.sentiment.lower()

# 4. Configure COPRO
copro = COPRO(
    metric=sentiment_accuracy,
    breadth=10,  # Candidates per generation
    depth=3      # Number of generations
)

# 5. Compile (eval_kwargs is forwarded to the internal Evaluate call)
optimized_classifier = copro.compile(
    classifier, trainset=trainset,
    eval_kwargs={"num_threads": 4, "display_progress": True},
)

# 6. Usage
result = optimized_classifier(text="This exceeded all expectations!")
print(result.sentiment)

βš™οΈ Advanced Configuration

Fine-tune COPRO's behavior for your specific needs.

| Parameter | Description | Default | Recommended for Reasoning |
|---|---|---|---|
| `breadth` | Candidates per generation | 10 | 15-20 |
| `depth` | Number of generations | 3 | 3-5 |
| `init_temperature` | Creativity for initial candidates | 1.4 | 1.5 |
| `prompt_model` | LM used to generate prompts | None (uses task model) | GPT-4 / stronger model |
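For budgeting, a rough rule of thumb: each generation evaluates `breadth` candidates, and each evaluation scores every training example, so total metric (and task-LM) calls grow as roughly breadth × depth × len(trainset). This is a back-of-the-envelope upper bound, not an exact accounting of the implementation:

```python
def estimate_metric_calls(breadth, depth, trainset_size):
    """Rough upper bound on metric evaluations for one COPRO-style run."""
    return breadth * depth * trainset_size

# Default settings (breadth=10, depth=3) on 50 examples:
print(estimate_metric_calls(10, 3, 50))   # 1500 metric calls
# The reasoning-oriented settings roughly triple the bill:
print(estimate_metric_calls(20, 5, 50))   # 5000 metric calls
```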

Using a Stronger Teacher Model

A common strategy is to use a powerful model (like GPT-4) to generate prompt candidates, but optimize them for a smaller, faster model (like GPT-3.5 or a local model).

# Use GPT-4 to propose instructions
prompt_generator = dspy.LM(model="openai/gpt-4")

# Optimize for GPT-3.5
target_model = dspy.LM(model="openai/gpt-3.5-turbo")
dspy.configure(lm=target_model)

copro = COPRO(
    metric=your_metric,
    breadth=12,
    depth=4,
    prompt_model=prompt_generator  # Teacher model
)

optimized_program = copro.compile(
    program, trainset=trainset, eval_kwargs={"num_threads": 4}
)

πŸ†š COPRO vs. Other Optimizers

| Feature | COPRO | BootstrapFewShot | MIPRO |
|---|---|---|---|
| Primary Target | Instructions | Demonstrations | Both |
| Method | Evolutionary Search | Bootstrap Sampling | Bayesian Optimization |
| Best For | Instruction-sensitive tasks, reasoning | Tasks with good examples | Complex pipelines, max performance |
| Speed | Medium | Fast | Slow |

Recommendation: Use COPRO when you have a tricky reasoning task where the exact wording of the prompt matters significantly, or when you have limited data for demonstrations.

🌍 Real-World Applications

1. Medical Triage

In high-stakes domains like healthcare, instruction precision is critical. COPRO can evolve prompts that strictly adhere to safety protocols.

# Metric penalizes dangerous errors heavily
def triage_metric(example, pred, trace=None):
    correct = pred.level == example.level
    # Critical penalty for under-triaging emergencies
    if example.level == "emergency" and pred.level != "emergency":
        return 0.0 
    return 1.0 if correct else 0.5

copro = COPRO(
    metric=triage_metric,
    breadth=15, 
    depth=5, 
    init_temperature=1.2 # Lower temp for stability
)
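To see how this metric steers the search, the snippet below re-defines it and scores a few hypothetical cases, with `SimpleNamespace` standing in for DSPy example and prediction objects:

```python
from types import SimpleNamespace

def triage_metric(example, pred, trace=None):
    correct = pred.level == example.level
    # Critical penalty for under-triaging emergencies
    if example.level == "emergency" and pred.level != "emergency":
        return 0.0
    return 1.0 if correct else 0.5

gold = SimpleNamespace(level="emergency")
assert triage_metric(gold, SimpleNamespace(level="emergency")) == 1.0  # exact match
assert triage_metric(gold, SimpleNamespace(level="urgent")) == 0.0     # dangerous miss
routine = SimpleNamespace(level="routine")
assert triage_metric(routine, SimpleNamespace(level="urgent")) == 0.5  # safe-side error
```

Because a dangerous miss scores 0.0 while a safe-side error still earns 0.5, evolution is pushed toward instructions that err on the side of caution.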

2. Legal Analysis

For tasks requiring specific vocabulary and structure, COPRO can find the "magic words" that align the model with legal standards.
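A hedged sketch of what such a metric might look like: a continuous score that rewards outputs containing required terms of art. The term list, field names (`analysis`, `conclusion`), and weighting are illustrative assumptions, not drawn from any legal standard:

```python
REQUIRED_TERMS = ("plaintiff", "defendant", "jurisdiction")  # illustrative only

def legal_structure_metric(example, pred, trace=None):
    """Fraction of required terms present, halved if the gold conclusion
    is missed. A toy weighting, purely for illustration."""
    text = pred.analysis.lower()
    coverage = sum(term in text for term in REQUIRED_TERMS) / len(REQUIRED_TERMS)
    matches_conclusion = example.conclusion.lower() in text
    return coverage if matches_conclusion else coverage * 0.5
```

A continuous metric like this gives COPRO a gradient to climb: an instruction that elicits two of the three required terms scores higher than one that elicits none, even before the conclusion is right.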

✨ Best Practices

  • Diverse Training Data: Ensure your 20-50 training examples cover various edge cases. If they are all simple, COPRO won't learn to handle complexity.
  • Meaningful Metrics: Your metric determines the direction of evolution. A binary (0/1) metric provides less signal than a continuous score (0.0-1.0) or a weighted multi-component metric.
  • Start Conservative: Begin with breadth=8, depth=2 to check for quick wins. If promising, expand to breadth=15, depth=5.
  • Combine Optimizers: You can optimize instructions with COPRO first, then use BootstrapFewShot on the result to add demonstrations.
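The signal difference between a binary and a continuous metric is easiest to see on a near-miss answer. The token-overlap scorer below is a toy assumption for illustration, not a recommended production metric:

```python
def binary_metric(gold, pred):
    return 1.0 if pred == gold else 0.0

def continuous_metric(gold, pred):
    """Token-overlap (Jaccard) score in [0, 1]: partial credit for near misses."""
    g, p = set(gold.lower().split()), set(pred.lower().split())
    return len(g & p) / len(g | p) if g | p else 1.0

gold = "the revenue grew 12 percent"
near_miss = "revenue grew 12"
# binary_metric: 0.0 -- a near miss looks identical to garbage.
# continuous_metric: 0.6 -- the optimizer can tell it is getting warmer.
```

Under the binary metric, two instructions that produce a near miss and a wild guess look equally bad, so evolution has nothing to climb; the continuous score ranks them, which is exactly the signal COPRO needs.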