The Challenge
Writing captions for scientific figures is hard. The caption must be technically precise and match the specific writing style of the paper (or journal). Generic image captioning models fail here because they lack context and domain knowledge.
Two-Stage Pipeline
We address this with a two-stage pipeline:
- Context-Aware Generation: A module retrieves relevant text from the paper (e.g., surrounding the figure reference) to understand what the figure shows.
- Stylistic Refinement: A second module rewrites basic captions to match the target author's style using few-shot examples from their previous work.
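The two stages can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the function names `retrieve_figure_context` and `build_style_prompt`, the regex used to find figure mentions, and the prompt wording are all hypothetical. The first function collects the paragraphs that reference a figure; the second assembles a few-shot prompt from an author's earlier captions.

```python
import re
from typing import List


def retrieve_figure_context(paragraphs: List[str], figure_number: int) -> List[str]:
    """Return the paragraphs that mention the figure, e.g. 'Figure 3' or 'Fig. 3'."""
    # (?!\d) keeps "Figure 3" from also matching "Figure 30".
    pattern = re.compile(
        rf"\bFig(?:ure|\.)?\s*{figure_number}(?!\d)", re.IGNORECASE
    )
    return [p for p in paragraphs if pattern.search(p)]


def build_style_prompt(draft_caption: str, author_captions: List[str]) -> str:
    """Assemble a few-shot prompt asking a model to restyle the draft caption."""
    # Hypothetical prompt layout: one line per example from the author's prior work.
    examples = "\n".join(f"Example caption: {c}" for c in author_captions)
    return (
        "Rewrite the draft caption in the style of the examples.\n"
        f"{examples}\n"
        f"Draft caption: {draft_caption}\n"
        "Rewritten caption:"
    )
```

In a real pipeline the retrieved paragraphs would feed the generation module and the prompt would be sent to a language model; both steps are omitted here.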
Stage 1: Category-Specific Optimization
We use MIPROv2 to optimize prompts specifically for different figure types (e.g., bar charts, scatter plots, diagrams).
import dspy

def optimize_for_category(training_data, category):
    # Keep only the examples for this figure type (e.g. "bar_graph")
    category_data = [x for x in training_data if x.category == category]
    # Compile a module specialized for this category;
    # caption_quality_metric and CategoryCaptionModule are defined elsewhere
    optimizer = dspy.MIPROv2(metric=caption_quality_metric)
    return optimizer.compile(CategoryCaptionModule(), trainset=category_data)
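One way to use a per-category optimizer is to run it once for each figure type and then route each new figure to the matching module. The sketch below assumes that workflow; the helper names `group_by_category`, `build_category_modules`, and `pick_module` are hypothetical, and the `optimize` argument stands in for any function with the `(training_data, category)` signature.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable


def group_by_category(examples: Iterable[Any]) -> Dict[str, list]:
    """Bucket training examples by their `category` attribute."""
    buckets: Dict[str, list] = defaultdict(list)
    for ex in examples:
        buckets[ex.category].append(ex)
    return dict(buckets)


def build_category_modules(
    examples: Iterable[Any], optimize: Callable[[list, str], Any]
) -> Dict[str, Any]:
    """Run the optimizer once per figure type and keep the compiled modules."""
    groups = group_by_category(examples)
    return {cat: optimize(items, cat) for cat, items in groups.items()}


def pick_module(modules: Dict[str, Any], category: str, fallback: Any) -> Any:
    """Select the module for a figure type, falling back for unseen categories."""
    return modules.get(category, fallback)
```

The fallback matters because a figure classifier may emit a category that had no training examples; a generic, unoptimized module is a reasonable default in that case.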
Results
The specialized pipeline improved both content and style metrics:
- ROUGE-1 Recall: +8.3%
- Style Consistency: ~45% improvement in BLEU scores against author profile examples.
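For readers unfamiliar with the first metric, ROUGE-1 recall is the fraction of reference unigrams that also appear in the generated caption. The sketch below is a simplified illustration, assuming lowercase whitespace tokenization and no stemming; standard ROUGE implementations tokenize differently, so it will not reproduce the numbers above exactly. The function name `rouge1_recall` is hypothetical.

```python
from collections import Counter


def rouge1_recall(reference: str, candidate: str) -> float:
    """Fraction of reference unigrams also present in the candidate (clipped counts)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0  # An empty reference has nothing to recall.
    # Each reference token is credited at most as often as it occurs in the candidate.
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / total


reference = "accuracy of each model on the test set"
candidate = "accuracy on the test set"
print(rouge1_recall(reference, candidate))  # 5 of 8 reference tokens matched: 0.625
```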