Chapter 6

Retrieval-Augmented Guardrails

Enhancing AI safety in high-stakes domains by evaluating outputs against retrieved historical contexts and similar cases.

Why Retrieval for Guardrails?

Standard AI guardrails often check outputs against static rules ("Do not mention X"). In complex fields like healthcare, this isn't enough: an answer that is safe in one context can be dangerous in another, because the same advice may be appropriate for one patient and harmful for another with a different history.

Retrieval-Augmented Guardrails improve safety by retrieving similar past cases (e.g., previous patient messages and clinician responses) to use as a "ground truth" reference for evaluating the current output.
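
To make the retrieval step concrete, here is a minimal sketch that ranks historical cases by cosine similarity over bag-of-words vectors. It is illustrative only: a production system would use dense embeddings and a vector store, and the function names (`cosine`, `retrieve`) and sample cases are hypothetical.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, cases: list[str], k: int = 3) -> list[str]:
    # Return the k historical cases most similar to the query.
    q = Counter(query.lower().split())
    scored = sorted(cases, key=lambda c: cosine(q, Counter(c.lower().split())),
                    reverse=True)
    return scored[:k]

cases = [
    "Patient asked about doubling insulin dose; clinician advised against it.",
    "Patient reported mild headache; clinician suggested rest and fluids.",
]
top = retrieve("Can I double my insulin dose", cases, k=1)
# top[0] is the insulin case, the closest historical precedent
```

The retrieved precedents then serve as the reference the evaluator compares the new AI response against.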

The Evaluation Pipeline

This system acts as a sophisticated judge:

  1. Retrieve: Find historically similar interactions.
  2. Compare: Use an ErrorClassifier module to check if the new AI response deviates from the clinical standards established in those historical examples.
  3. Assess Severity: If an error is found, determine if it's a minor tone issue or a critical safety risk.

Implementation

import dspy

class RetrievalAugmentedEvaluator(dspy.Module):
    def __init__(self, k=3):
        super().__init__()
        # Retriever over the corpus of historical patient-clinician interactions
        self.retriever = dspy.Retrieve(k=k)
        # LM-backed check for deviations from the retrieved clinical standard
        self.error_classifier = dspy.ChainOfThought(
            "message, response, context -> error_category, severity"
        )

    def forward(self, patient_message, ai_response):
        # 1. Get context: historically similar interactions
        similar_cases = self.retriever(patient_message).passages

        # 2. Check for errors against that context
        classification = self.error_classifier(
            message=patient_message,
            response=ai_response,
            context=similar_cases,
        )
        return classification
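
Because the real module depends on a configured LM and retrieval backend, its forward pass can be unit-tested offline with stubbed components. The harness below mimics the same wiring with duck-typed stand-ins; `StubRetriever`, `StubClassifier`, and `evaluate` are hypothetical names, not part of DSPy.

```python
class StubRetriever:
    # Returns canned historical cases instead of querying a vector store.
    def __init__(self, cases):
        self.cases = cases
    def __call__(self, query, k=3):
        return self.cases[:k]

class StubClassifier:
    # Toy rule standing in for the LM: flag the response if it permits
    # something a retrieved case explicitly advised against.
    def __call__(self, message, response, context):
        has_error = any("do not" in c and "you may" in response for c in context)
        return {"error": has_error,
                "severity": "critical" if has_error else "none"}

def evaluate(patient_message, ai_response, retriever, classifier):
    # Same two-step flow as the module's forward pass.
    similar_cases = retriever(patient_message, k=3)
    return classifier(message=patient_message, response=ai_response,
                      context=similar_cases)

cases = ["Clinician: do not stop the medication without review."]
result = evaluate("Can I stop my medication?",
                  "Yes, you may stop it anytime.",
                  StubRetriever(cases), StubClassifier())
# result["severity"] == "critical"
```

This keeps the guardrail's control flow testable even before any LM is wired in.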

Performance Gains

Adding retrieval to the guardrail system resulted in:

  • 50% higher concordance with human reviewers compared to non-retrieval baselines.
  • 42% better identification of safety concerns.