The Example Class
DSPy uses the Example class to represent individual data points for
training and evaluation.
Basic Example Creation
import dspy
# Create a simple example
example = dspy.Example(
    question="What is the capital of France?",
    answer="Paris"
)
# Access fields
print(example.question) # "What is the capital of France?"
print(example.answer) # "Paris"
The with_inputs() Method
The with_inputs() method is critical: it tells DSPy
which fields are inputs and which are expected outputs:
import dspy
# Create example and mark which fields are inputs
example = dspy.Example(
    question="What is the capital of France?",
    answer="Paris"
).with_inputs("question")
# Now DSPy knows:
# - "question" is an INPUT (given to the module)
# - "answer" is an OUTPUT (expected result for evaluation)
# Access input fields
print(example.inputs()) # An Example holding only the input field "question"
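Tip: Call with_inputs() immediately after creating an Example and
keep the object it returns, because the method returns a marked copy
rather than modifying the original. Without it, DSPy optimizers and
evaluators cannot tell which fields to pass to your program and which
to treat as expected outputs.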
Loading Datasets
From Python Dictionaries
import dspy
# Raw records with short keys
raw_data = [
    {"q": "What is 2+2?", "a": "4"},
    {"q": "What is 3*3?", "a": "9"},
]
# Convert to DSPy Examples, renaming keys and marking "question" as the input
dataset = [
    dspy.Example(question=item["q"], answer=item["a"]).with_inputs("question")
    for item in raw_data
]
From Hugging Face
from dspy.datasets import DataLoader
# Load from Hugging Face Hub
loader = DataLoader()
raw_data = loader.from_huggingface(
    dataset_name="squad",
    split="train",
    fields=("question", "context", "answers"),
    input_keys=("question", "context")
)
Train/Dev/Test Splits
Keeping the splits separate is essential for valid evaluation: a score
measured on examples the optimizer has already seen overstates real
performance.
| Split | Purpose | Usage |
|---|---|---|
| Training | Optimize prompts/demonstrations | Used by optimizer |
| Development | Tune hyperparameters, iterate | Used during development |
| Test | Final unbiased evaluation | Used once at the end |
import random
# `data` is assumed to be a list of at least 1,000 Examples
# Shuffle in place with a fixed seed for reproducibility
random.Random(42).shuffle(data)
# Split into sets
trainset = data[:200] # 200 for training
devset = data[200:500] # 300 for development
testset = data[500:1000] # 500 for testing
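The fixed slices above return short or empty splits without warning when the dataset has fewer than 1,000 items. A minimal sketch of a helper that checks the requested sizes first is shown here; `split_dataset` and its parameters are hypothetical names, not part of DSPy, and plain integers stand in for Example objects.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    data: Sequence[T],
    train_size: int,
    dev_size: int,
    test_size: int,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle a copy of `data` and cut it into three non-overlapping splits."""
    total = train_size + dev_size + test_size
    if total > len(data):
        raise ValueError(
            f"Requested {total} examples but only {len(data)} are available."
        )
    shuffled = list(data)  # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:train_size]
    dev = shuffled[train_size:train_size + dev_size]
    test = shuffled[train_size + dev_size:total]
    return train, dev, test


# Stand-in data: integers instead of dspy.Example objects (assumption)
records = list(range(1000))
trainset, devset, testset = split_dataset(records, 200, 300, 500)
print(len(trainset), len(devset), len(testset))
```

Because the helper shuffles a copy, the original list keeps its order, and the same seed always yields the same three splits.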
Data Quality Checklist
Check Required Fields
Ensure every example has the necessary input and output fields.
Remove Duplicates
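Remove repeated examples, especially any that appear in more than one
split: a test example that also sits in the training set leaks its
answer and inflates the final score.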
Verify Inputs Marked
Double-check that with_inputs() has been called on every
example.
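The first two checks can be run on raw records before they are converted to Examples. The sketch below is one possible approach, not a DSPy utility: `clean_records`, `REQUIRED_FIELDS`, and the field names are assumptions for illustration, and duplicates are detected by comparing whitespace-trimmed, lowercased field values.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Assumed field names; adjust to match your own dataset
REQUIRED_FIELDS = ("question", "answer")

Record = Dict[str, Any]


def clean_records(
    records: Iterable[Record],
    required: Tuple[str, ...] = REQUIRED_FIELDS,
) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Split records into kept ones and (record, reason) rejections."""
    seen = set()
    kept: List[Record] = []
    rejected: List[Tuple[Record, str]] = []
    for record in records:
        values = [record.get(name) for name in required]
        # Reject records where a required field is absent or blank
        if any(v is None or not str(v).strip() for v in values):
            rejected.append((record, "missing field"))
            continue
        # Normalize values so trivial variations count as duplicates
        key = tuple(str(v).strip().lower() for v in values)
        if key in seen:
            rejected.append((record, "duplicate"))
            continue
        seen.add(key)
        kept.append(record)
    return kept, rejected


raw = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "what is 2+2? ", "answer": "4"},  # duplicate after normalization
    {"question": "What is 3*3?", "answer": ""},    # blank answer
    {"question": "What is 3*3?", "answer": "9"},
]
kept, rejected = clean_records(raw)
print(len(kept), [reason for _, reason in rejected])
```

Running the cleanup before shuffling and splitting means a duplicate cannot end up in two different splits.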