Business Challenge
DDI (Development Dimensions International) needed to scale its leadership assessments. Human scoring was accurate but expensive and slow, with a 24-48 hour turnaround.
DSPy Optimization Pipeline
DDI built a DSPy pipeline that breaks the assessment into three steps: response analysis, scoring, and report generation.
Behavioral Assessment Pipeline
Python
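import dspy

class BehavioralAssessmentPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        # One DSPy predictor per stage, each declared by its signature string.
        self.response_analyzer = dspy.ChainOfThought("question, response -> analysis")
        self.scorer = dspy.Predict("analysis, criteria -> scores")
        self.report_generator = dspy.ChainOfThought("scores, framework -> report")

    def forward(self, question, response, framework):
        # Predictors take keyword arguments named after the signature's input fields.
        analysis = self.response_analyzer(question=question, response=response).analysis
        # The assessment framework is passed to the scorer as its criteria.
        scores = self.scorer(analysis=analysis, criteria=framework).scores
        return self.report_generator(scores=scores, framework=framework)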
Prompt Optimization
Using `BootstrapFewShot`, they optimized the pipeline's prompts against expert human scores. This raised recall from 0.43 to 0.98.
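The optimization step might look roughly like the sketch below. It is not DDI's actual code. The metric name `score_agreement`, the helper `optimize`, and the score format (a dict mapping competency names to ratings) are assumptions made for illustration, as is the idea that the program being optimized returns a prediction with a `scores` field. The `BootstrapFewShot` constructor, its `compile` method, and the `metric(example, pred, trace=None)` convention come from DSPy.

```python
from types import SimpleNamespace


def score_agreement(example, pred, trace=None):
    """Fraction of competencies where the predicted rating equals the expert's.

    Follows DSPy's metric convention: metric(example, pred, trace=None).
    Assumption: scores are dicts of competency name -> rating.
    """
    expert = example.scores
    predicted = pred.scores
    if not expert:
        return False if trace is not None else 0.0
    matches = sum(1 for name, rating in expert.items() if predicted.get(name) == rating)
    agreement = matches / len(expert)
    # While bootstrapping (trace is set), accept only demos that fully match the expert.
    if trace is not None:
        return agreement == 1.0
    return agreement


def optimize(program, trainset):
    """Compile `program` with BootstrapFewShot against expert-labelled examples."""
    # Imported lazily so the metric can be exercised without DSPy installed.
    from dspy.teleprompt import BootstrapFewShot

    optimizer = BootstrapFewShot(metric=score_agreement, max_bootstrapped_demos=4)
    return optimizer.compile(program, trainset=trainset)


# Checking the metric on a made-up example; no language model is needed for this part.
expert = SimpleNamespace(scores={"coaching": 4, "delegation": 2})
automated = SimpleNamespace(scores={"coaching": 4, "delegation": 3})
print(score_agreement(expert, automated))  # 0.5
```

Calling `optimize` requires a configured language model and a `trainset` of `dspy.Example` objects whose inputs are marked with `with_inputs(...)`. Returning a strict boolean when `trace` is set is a common way to keep only high-quality bootstrapped demonstrations.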
Impact
- 17,000x Faster Delivery (seconds vs. days)
- 95% Cost Reduction
- 95% Scoring Agreement with experts
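As a sanity check on the speed figure, the arithmetic below works out the automated latency implied by a 17,000x speedup. It assumes the speedup is measured against the 24-48 hour human turnaround quoted in the Business Challenge section. The function name `implied_latency_seconds` is made up for this sketch.

```python
def implied_latency_seconds(human_hours, speedup=17_000):
    """Automated latency implied by a given speedup over a human turnaround."""
    return human_hours * 3600 / speedup


for hours in (24, 48):
    seconds = implied_latency_seconds(hours)
    print(f"{hours} h human turnaround -> about {seconds:.1f} s automated")
```

Under that assumption the automated pipeline would return results in roughly 5 to 10 seconds, which is consistent with the "seconds vs. days" description.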