What is RAG?
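Retrieval-Augmented Generation (RAG) systems answer questions over large document collections by combining information retrieval with language generation: they first retrieve the passages most relevant to a question, then have a language model generate an answer using those passages as context.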
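To make the two halves concrete, here is a dependency-free toy. Documents are scored by how many words they share with the question, the top k become the context, and that context is pasted into a prompt that a language model would then complete. The word-overlap scorer and the prompt template are illustrative assumptions (real systems typically use BM25 or dense embeddings), and no model is called here.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercased set of words; punctuation is dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Retrieval step: the k documents sharing the most words with the question."""
    q = tokenize(question)
    # sorted() is stable, so documents with equal scores keep corpus order
    return sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    """Augmentation step: prepend the retrieved passages to the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

corpus = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Mount Fuji is the highest mountain in Japan.",
    "Paris is the capital of France.",
]
question = "When was the Eiffel Tower completed?"
passages = retrieve(question, corpus, k=1)
# A language model would complete this prompt (the generation step)
print(build_prompt(question, passages))
```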
Building a Basic RAG System
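In DSPy, a RAG system is written as a dspy.Module that pairs a dspy.Retrieve component, which fetches the top-k passages from the configured retrieval model, with a generation component (dspy.Predict or dspy.ChainOfThought) that answers the question using those passages.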
```python
import dspy

class BasicRAG(dspy.Module):
    def __init__(self, num_passages=5):
        super().__init__()
        # 1. Define Retrieval Component
        self.retrieve = dspy.Retrieve(k=num_passages)
        # 2. Define Generation Component
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # 1. Retrieve Phase
        context = self.retrieve(question).passages
        # 2. Generation Phase
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(
            context=context,
            answer=prediction.answer
        )
```
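Because forward only relies on self.retrieve(question).passages and self.generate_answer(...).answer, either component can be swapped out when checking the data flow offline. The sketch below mirrors BasicRAG in plain Python without importing DSPy. StubRetrieve, StubGenerate, and SimpleRAG are hypothetical stand-ins, not DSPy classes, and the stub "generator" simply returns the first passage instead of calling a language model.

```python
from types import SimpleNamespace

class StubRetrieve:
    """Hypothetical stand-in for dspy.Retrieve: returns the first k passages."""
    def __init__(self, corpus, k=5):
        self.corpus, self.k = corpus, k

    def __call__(self, query):
        # Same shape the module expects: an object with a .passages list
        return SimpleNamespace(passages=self.corpus[: self.k])

class StubGenerate:
    """Hypothetical stand-in for the generator: answers with the first passage."""
    def __call__(self, context, question):
        return SimpleNamespace(answer=context[0] if context else "I don't know.")

class SimpleRAG:
    """Plain-Python mirror of BasicRAG with injectable components."""
    def __init__(self, retrieve, generate):
        self.retrieve = retrieve
        self.generate_answer = generate

    def forward(self, question):
        context = self.retrieve(question).passages  # 1. retrieve
        pred = self.generate_answer(context=context, question=question)  # 2. generate
        return SimpleNamespace(context=context, answer=pred.answer)

docs = ["Paris is the capital of France.", "Berlin is in Germany."]
rag = SimpleRAG(StubRetrieve(docs, k=1), StubGenerate())
out = rag.forward("What is the capital of France?")
print(out.answer)  # Paris is the capital of France.
```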
Advanced RAG Techniques
Real-world questions often need more than a single retrieval pass feeding a single generation step. Two useful patterns are multi-stage retrieval and query expansion.
1. Multi-Stage RAG
Retrieve broadly first to maximize recall, then re-rank or filter the candidates so that only the most relevant passages reach the generator.
```python
class AdvancedRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=10)
        # Typed fields so ranked_documents comes back as a list, not one string
        self.rerank = dspy.Predict(
            "query, documents: list[str] -> ranked_documents: list[str]"
        )
        self.generate = dspy.ChainOfThought("question, context -> answer")

    def forward(self, question):
        # 1. Broad retrieval (10 passages)
        initial_docs = self.retrieve(question).passages
        # 2. Re-rank, then keep only the top 3
        reranked = self.rerank(query=question, documents=initial_docs)
        top_context = reranked.ranked_documents[:3]
        # 3. Generate the answer from the top passages
        prediction = self.generate(question=question, context=top_context)
        # Return the context as well, so metrics can check grounding
        return dspy.Prediction(context=top_context, answer=prediction.answer)
```
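The broad-then-narrow idea can be illustrated without an LLM. In this dependency-free sketch, stage 1 scores passages by shared words and ignores word order (cheap, high recall), and stage 2 re-scores only the surviving candidates by how many of the query's adjacent word pairs they contain in order (more selective). Both scoring rules are illustrative assumptions standing in for a vector search and an LLM or cross-encoder re-ranker; they are not what dspy.Retrieve or the dspy.Predict re-ranker do internally.

```python
import re

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def bigrams(toks: list[str]) -> set[tuple[str, str]]:
    return set(zip(toks, toks[1:]))

def broad_retrieve(query: str, corpus: list[str], k: int = 10) -> list[str]:
    """Stage 1 (recall): bag-of-words overlap; cheap, ignores word order."""
    q = set(tokens(query))
    return sorted(corpus, key=lambda d: len(q & set(tokens(d))), reverse=True)[:k]

def rerank(query: str, candidates: list[str], top_n: int = 3) -> list[str]:
    """Stage 2 (precision): prefer passages containing the query's word pairs in order."""
    q_toks = tokens(query)
    q_pairs, q_words = bigrams(q_toks), set(q_toks)

    def score(doc: str) -> tuple[int, int]:
        d_toks = tokens(doc)
        # Primary key: in-order word pairs shared with the query; tie-break: word overlap
        return (len(q_pairs & bigrams(d_toks)), len(q_words & set(d_toks)))

    return sorted(candidates, key=score, reverse=True)[:top_n]

corpus = [
    "New pizza place opens in York",
    "Best pizza in New York",
    "New York pizza guide",
    "York Minster history",
]
candidates = broad_retrieve("new york pizza", corpus, k=10)
print(candidates[:3])  # first three tie on overlap (3 words each), so corpus order is kept
print(rerank("new york pizza", candidates, top_n=3))
# ['New York pizza guide', 'Best pizza in New York', 'New pizza place opens in York']
```

Keeping k larger than top_n is what makes the pattern work: the cheap stage only has to get the right passage somewhere into the candidate pool, and the more expensive stage then runs on a handful of candidates instead of the whole corpus.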
2. Query Expansion
Sometimes the user's question isn't the best search query. An extra LLM step can rewrite it into better search terms before retrieval.
```python
class QueryExpansionRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.expand_query = dspy.ChainOfThought("question -> search_terms")
        self.retrieve = dspy.Retrieve(k=5)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # 1. Expand Query
        search_terms = self.expand_query(question=question).search_terms
        # 2. Retrieve using optimized terms
        context = self.retrieve(search_terms).passages
        # 3. Generate
        return self.generate(context=context, question=question)
```
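Why rewriting the query helps shows up even with plain word matching: a question phrased as "fix my car" shares no words with a passage about "automobile repair". In this sketch, a hand-written synonym table (a hypothetical stand-in for the LLM's expand_query step) adds related terms before retrieval; the retriever is a toy word-overlap matcher, not dspy.Retrieve.

```python
import re

# Hypothetical synonym table standing in for the LLM query-expansion step
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "fix": ["repair", "maintenance"],
}

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def expand_query(question: str) -> str:
    """Append known synonyms of the question's words (sorted for determinism)."""
    extra = [syn for word in sorted(tokens(question)) for syn in SYNONYMS.get(word, [])]
    return " ".join([question, *extra])

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: up to k passages sharing at least one word, best first."""
    q = tokens(query)
    hits = [doc for doc in corpus if q & tokens(doc)]
    return sorted(hits, key=lambda doc: len(q & tokens(doc)), reverse=True)[:k]

corpus = ["Automobile repair manual for beginners", "Bicycle maintenance checklist"]
question = "How do I fix my car?"
print(retrieve(question, corpus))                # [] : no shared words
print(retrieve(expand_query(question), corpus))  # both passages, repair manual first
```

Appending terms rather than replacing the question keeps the original words as a fallback; the QueryExpansionRAG module instead retrieves with the generated search_terms alone, which relies on the LLM not dropping important words.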
Optimizing RAG Systems
RAG pipelines are excellent candidates for DSPy optimization because the prompts for generating retrieval queries and for answering from retrieved context are hard to tune by hand.
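Optimization Tip: Use MIPROv2 (dspy.MIPROv2 in current DSPy releases) if your RAG pipeline has multiple steps (such as query expansion, re-ranking, and generation). It proposes instructions and few-shot examples for every predictor in the program and searches over their combinations together, scoring each combination with your end-to-end metric.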
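Optimizing the whole chain at once matters because the best choice for one step can depend on another. The toy below has two stages, each with a few candidate "instructions" modeled as plain functions, and scores every combination end to end with one metric, so the winning pair is chosen jointly. The exhaustive search, the lookup table, and the candidate functions are all illustrative assumptions; MIPRO-style optimizers propose candidates with an LM and search them with Bayesian optimization rather than trying every combination.

```python
from itertools import product

# Toy training data: (question, gold answer)
trainset = [
    ("What is the capital of France?", "paris"),
    ("What is the capital of Japan?", "tokyo"),
]
# Hypothetical lookup table standing in for a retriever
FACTS = {"france": "Paris", "japan": "Tokyo"}

# Candidate "instructions" for stage 1: how to turn the question into a search key
STAGE1 = {
    "first_word": lambda q: q.split()[0].lower(),
    "last_word": lambda q: q.rstrip("?").split()[-1].lower(),
}
# Candidate "instructions" for stage 2: how to format the retrieved fact as an answer
STAGE2 = {
    "as_is": lambda fact: fact,
    "lowercase": lambda fact: fact.lower(),
}

def score(s1_name: str, s2_name: str) -> float:
    """End-to-end metric: exact-match accuracy of the whole two-stage chain."""
    hits = 0
    for question, gold in trainset:
        fact = FACTS.get(STAGE1[s1_name](question), "")
        hits += STAGE2[s2_name](fact) == gold
    return hits / len(trainset)

# Joint search: every (stage 1, stage 2) pair is scored on the full pipeline
best = max(product(STAGE1, STAGE2), key=lambda pair: score(*pair))
print(best, score(*best))  # ('last_word', 'lowercase') 1.0
```

With stage 1 fixed at first_word, both stage 2 candidates score 0, so tuning stage 2 in isolation would get no signal at all; only evaluating the chain as a whole shows that lowercase is the right choice.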
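```python
from dspy.evaluate import answer_exact_match, answer_passage_match

# Metric: the answer must be correct AND grounded in the retrieved context
def rag_metric(example, pred, trace=None):
    correct = answer_exact_match(example, pred)  # normalized exact match vs. gold answer
    # Grounding needs the program to return its passages as `context`
    if getattr(pred, "context", None) is None:
        return correct
    return correct and answer_passage_match(example, pred)  # gold answer in a passage

# trainset: list of dspy.Example(question=..., answer=...).with_inputs("question")
optimizer = dspy.MIPROv2(metric=rag_metric, auto="light")
compiled_rag = optimizer.compile(AdvancedRAG(), trainset=trainset)
```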
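What a correctness-plus-grounding metric checks can be spelled out without DSPy. This sketch normalizes text the way SQuAD-style exact-match scoring does (lowercase, strip punctuation and the articles a/an/the, collapse whitespace), then requires both that the predicted answer matches the gold answer and that the gold answer occurs in at least one retrieved passage. Plain dicts stand in for dspy.Example and dspy.Prediction, and the helper names are hypothetical, not DSPy's internals.

```python
import re
import string

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred_answer: str, gold: str) -> bool:
    return normalize(pred_answer) == normalize(gold)

def passage_match(passages: list[str], gold: str) -> bool:
    """Grounding proxy: the gold answer appears inside some retrieved passage."""
    g = normalize(gold)
    return any(g in normalize(p) for p in passages)

def rag_metric(example: dict, pred: dict) -> bool:
    return exact_match(pred["answer"], example["answer"]) and passage_match(
        pred["context"], example["answer"]
    )

example = {"question": "What is the capital of France?", "answer": "Paris"}
good = {"answer": "Paris.", "context": ["Paris is the capital of France."]}
ungrounded = {"answer": "Paris", "context": ["Berlin is the capital of Germany."]}
print(rag_metric(example, good), rag_metric(example, ungrounded))  # True False
```

The substring check is deliberately simple and can over-match (a gold answer of "paris" also matches "parisian"); a stricter grounding check would compare whole tokens or ask an LLM judge whether the passage supports the answer.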