Chapter 6 · Section 2

RAG Systems

Building intelligent document Q&A pipelines.

~45 min read

What is RAG?

Retrieval-Augmented Generation (RAG) systems combine the strengths of information retrieval with language generation to answer questions based on large collections of documents.

Pipeline: Question → Retrieve → Generate → Answer

Building a Basic RAG System

In DSPy, a RAG system is a Module that uses a dspy.Retrieve component alongside a generation component (dspy.Predict or dspy.ChainOfThought).

import dspy

class BasicRAG(dspy.Module):
    def __init__(self, num_passages=5):
        super().__init__()
        # 1. Define Retrieval Component
        self.retrieve = dspy.Retrieve(k=num_passages)
        
        # 2. Define Generation Component
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # 1. Retrieve Phase
        context = self.retrieve(question).passages
        
        # 2. Generation Phase
        prediction = self.generate_answer(context=context, question=question)
        
        return dspy.Prediction(
            context=context,
            answer=prediction.answer
        )
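
Before this module can run, DSPy needs a language model and a retrieval model configured. A minimal sketch, assuming a recent DSPy release; the model name and the ColBERTv2 demo index are illustrative placeholders for whatever LM and retrieval backend you actually use:

# Configure an LM and a retriever (both choices here are illustrative)
lm = dspy.LM("openai/gpt-4o-mini")
rm = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")
dspy.configure(lm=lm, rm=rm)

rag = BasicRAG(num_passages=5)
result = rag(question="What is the capital of France?")
print(result.answer)
print(result.context)  # the passages the answer was grounded in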

Advanced RAG Techniques

Real-world questions often require more than simple keyword search. Here are two powerful patterns.

1. Multi-Stage RAG

Perform a broad retrieval first, then re-rank or filter the results down to the most relevant passages.

class AdvancedRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=10)
        self.rerank = dspy.Predict("query, documents -> ranked_documents")
        self.generate = dspy.ChainOfThought("question, context -> answer")

    def forward(self, question):
        # 1. Broad Retrieval (get 10 candidate passages)
        initial_docs = self.retrieve(question).passages

        # 2. Re-rank: the predictor works on text, so pass the passages as one
        #    newline-separated block and split its output back into passages
        reranked = self.rerank(
            query=question,
            documents="\n".join(initial_docs)
        )
        ranked = [p.strip() for p in reranked.ranked_documents.split("\n") if p.strip()]
        top_context = ranked[:3]

        # 3. Generate the answer from the top 3 passages
        prediction = self.generate(question=question, context=top_context)
        return dspy.Prediction(context=top_context, answer=prediction.answer)
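
The string signature above forces the passages through a plain-text round trip. If your DSPy version supports typed, class-based signatures, a sketch like the following (the class and field names are my own) keeps the passages as a list end to end:

class RerankPassages(dspy.Signature):
    """Order the candidate passages from most to least relevant to the query."""

    query: str = dspy.InputField()
    passages: list[str] = dspy.InputField(desc="candidate passages from broad retrieval")
    ranked_passages: list[str] = dspy.OutputField(desc="the same passages, most relevant first")

# Drop-in replacement for the string signature used in AdvancedRAG:
# self.rerank = dspy.Predict(RerankPassages)
# top_context = self.rerank(query=question, passages=initial_docs).ranked_passages[:3]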

2. Query Expansion

Sometimes the user's question isn't the best search query. Use an LLM step to improve the search terms.

class QueryExpansionRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.expand_query = dspy.ChainOfThought("question -> search_terms")
        self.retrieve = dspy.Retrieve(k=5)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # 1. Expand Query
        search_terms = self.expand_query(question=question).search_terms
        
        # 2. Retrieve using optimized terms
        context = self.retrieve(search_terms).passages
        
        # 3. Generate
        return self.generate(context=context, question=question)
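
Assuming the LM and retriever configured earlier, a quick way to see what the expansion step buys you is to print the generated search terms next to the original question (the question below is just an illustration):

rag = QueryExpansionRAG()
question = "Why is my sourdough starter not rising?"
print(rag.expand_query(question=question).search_terms)  # usually broader, keyword-style terms
print(rag(question=question).answer)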

Optimizing RAG Systems

RAG pipelines are excellent candidates for DSPy optimization because prompts for "retrieval query generation" and "answering based on context" are hard to hand-tune.

Optimization Tip: Use MIPRO (exposed as dspy.MIPROv2 in current DSPy releases) if your RAG pipeline has multiple steps (like query expansion + re-ranking + generation), since it optimizes the prompts of every predictor in the chain jointly.

# Define a metric that checks the answer is correct AND grounded in the retrieved context
def rag_metric(example, pred, trace=None):
    # Correct: the gold answer appears in the predicted answer
    correct = example.answer.lower() in pred.answer.lower()
    # Grounded: the gold answer also appears in at least one retrieved passage
    grounded = any(example.answer.lower() in passage.lower() for passage in pred.context)
    return correct and grounded

# Compile (MIPROv2 tunes the prompts of every predictor in the pipeline)
optimizer = dspy.MIPROv2(metric=rag_metric, auto="light")
compiled_rag = optimizer.compile(AdvancedRAG(), trainset=trainset)
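
The compile call above assumes a trainset of labeled dspy.Example objects whose input field matches the pipeline's forward signature. A minimal sketch (the questions and answers are illustrative; a real run benefits from a few dozen examples):

# Each example marks "question" as the input; "answer" is the gold label the metric checks
trainset = [
    dspy.Example(question="Who wrote Pride and Prejudice?", answer="Jane Austen").with_inputs("question"),
    dspy.Example(question="In what year did Apollo 11 land on the Moon?", answer="1969").with_inputs("question"),
]

# Score the compiled pipeline on a held-out dev set built the same way
devset = [
    dspy.Example(question="Who painted the Mona Lisa?", answer="Leonardo da Vinci").with_inputs("question"),
]
evaluate = dspy.Evaluate(devset=devset, metric=rag_metric, num_threads=4, display_progress=True)
evaluate(compiled_rag)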