Chapter 5

InPars+: Synthetic Data for IR

Generate high-quality synthetic queries for information retrieval systems.

Key Innovations

  • CPO Fine-tuning: Improves generator quality with Contrastive Preference Optimization
  • DSPy Dynamic Optimization: Real-time prompt adaptation based on retrieval performance
  • 60% Reduced Filtering: Higher initial query quality means fewer generated queries have to be filtered out
  • Neural IR Integration: Works seamlessly with neural re-rankers
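
To make the first bullet more concrete, here is a minimal sketch of a per-pair CPO loss as it is commonly formulated: a preference term that pushes the preferred query's log-probability above the rejected one, plus a negative log-likelihood term on the preferred query. The function name `cpo_loss`, its arguments, and the default `beta` are illustrative assumptions and are not taken from the InPars+ implementation.

```python
import math


def cpo_loss(logp_preferred: float, logp_rejected: float, beta: float = 0.1) -> float:
    """Illustrative CPO loss for one (preferred, rejected) query pair.

    logp_preferred / logp_rejected are assumed to be the generator's summed
    token log-probabilities of each query given the document.
    """
    margin = beta * (logp_preferred - logp_rejected)
    # Preference term: -log(sigmoid(margin)), computed as softplus(-margin)
    # in a numerically stable form.
    x = -margin
    prefer_term = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    # Behavior-cloning term: negative log-likelihood of the preferred query.
    nll_term = -logp_preferred
    return prefer_term + nll_term


# The loss is lower when the model already favors the preferred query.
good = cpo_loss(logp_preferred=-1.0, logp_rejected=-5.0)
bad = cpo_loss(logp_preferred=-5.0, logp_rejected=-1.0)
print(good < bad)
```

In practice the log-probabilities would come from the generator model over a batch of pairs; the scalar version above only shows the shape of the objective.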

CPO Query Generator

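import dspy


class GenerateQueries(dspy.Signature):
    """Generate diverse, relevant search queries that would retrieve the document."""

    document: str = dspy.InputField()
    num_queries: int = dspy.InputField(desc="number of unique queries to generate")
    queries: str = dspy.OutputField(desc="one query per line")


class CPOQueryGenerator(dspy.Module):
    def __init__(self, model_name="mistralai/Mistral-7B"):
        super().__init__()
        # Stored for reference only; DSPy takes the language model from its
        # global settings (for example, via dspy.configure).
        self.model_name = model_name
        # dspy.Predict expects a signature, not a free-form prompt template.
        self.query_generator = dspy.Predict(GenerateQueries)

    def generate_queries(self, document, num_queries=5):
        # _parse_queries and _ensure_diversity are helper methods of this class.
        result = self.query_generator(document=document, num_queries=num_queries)
        queries = self._parse_queries(result.queries)
        return self._ensure_diversity(queries)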
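
The `generate_queries` method relies on two helpers, `_parse_queries` and `_ensure_diversity`, whose bodies do not appear in the listing. The sketch below shows one plausible way they could work, written as standalone functions. Everything here is an assumption made for illustration: the function names, the one-query-per-line output format, the token-level Jaccard measure, and the `max_overlap` threshold.

```python
import re


def parse_queries(raw: str) -> list[str]:
    """Split raw model output into queries, assuming one query per line."""
    queries = []
    for line in raw.splitlines():
        # Strip leading list markers such as "1.", "2)", "-", "*", or "•".
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


def jaccard(a: set, b: set) -> float:
    """Token-set Jaccard similarity; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def ensure_diversity(queries: list[str], max_overlap: float = 0.8) -> list[str]:
    """Keep a query only if it is not too similar to any query already kept."""
    kept, kept_tokens = [], []
    for query in queries:
        tokens = set(query.lower().split())
        if all(jaccard(tokens, other) < max_overlap for other in kept_tokens):
            kept.append(query)
            kept_tokens.append(tokens)
    return kept


raw_output = "1. what is dense retrieval\n2) What is dense retrieval\n- how do re-rankers work\n"
print(ensure_diversity(parse_queries(raw_output)))
```

Inside the class, `_parse_queries` and `_ensure_diversity` could simply delegate to these functions. Lexical overlap is a cheap stand-in here; an embedding-based similarity would catch paraphrases that share few tokens.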