Chapter 8 · Case Study 9

Replit Code Repair

Using DSPy to fix code bugs automatically by generating synthetic training data for LSP diagnostics.

~20 min read

Business Challenge

At Replit, only 10% of the code errors reported as diagnostics over the Language Server Protocol (LSP) came with an automated fix. Developers had to debug the remaining 90% by hand.

Data Pipeline Architecture

Replit used DSPy to synthesize code fixes. The pipeline has three stages: it analyzes the diagnostic, synthesizes a fix as a diff, and then verifies that fix.

Code Repair Pipeline

Python
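import dspy


class CodeRepairPipeline(dspy.Module):
    def __init__(self):
        super().__init__()  # initialize dspy.Module's internal state
        self.diagnostic_analyzer = dspy.ChainOfThought("code, error -> analysis")
        self.fix_synthesizer = dspy.ChainOfThought("code, analysis -> diff")
        self.fix_verifier = dspy.Predict("code, diff -> valid")

    def forward(self, code_file, error_line, error_message):
        # DSPy modules take keyword arguments named after the signature's input fields,
        # so the line number and message are folded into the single "error" field.
        error = f"line {error_line}: {error_message}"
        analysis = self.diagnostic_analyzer(code=code_file, error=error).analysis
        diff = self.fix_synthesizer(code=code_file, analysis=analysis).diff
        verification = self.fix_verifier(code=code_file, diff=diff)
        # Return the proposed diff together with the verifier's verdict.
        return dspy.Prediction(diff=diff, valid=verification.valid)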
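
The pipeline's forward method expects a source file, an error line, and an error message, while a language server reports errors as LSP Diagnostic objects. The sketch below shows one way to bridge the two. The helper name diagnostic_to_inputs and the sample diagnostic are hypothetical, and treating error_line as one-based is an assumption; the only facts relied on are the Diagnostic fields defined by the LSP specification (range, severity, source, message) and that LSP positions are zero-based.

```python
from typing import Any


def diagnostic_to_inputs(source_text: str, diagnostic: dict[str, Any]) -> dict[str, Any]:
    """Map one LSP Diagnostic (as a decoded JSON dict) to forward()'s arguments."""
    # LSP positions are zero-based; convert to the one-based numbers editors display.
    # (Assumption: the pipeline wants one-based line numbers.)
    error_line = diagnostic["range"]["start"]["line"] + 1
    return {
        "code_file": source_text,
        "error_line": error_line,
        "error_message": diagnostic["message"],
    }


# Hypothetical example: an undefined name on the second line of a small file.
source = "def area(r):\n    return 3.14 * r * radius\n"
diagnostic = {
    "range": {
        "start": {"line": 1, "character": 22},
        "end": {"line": 1, "character": 28},
    },
    "severity": 1,  # 1 = Error in the LSP specification
    "source": "pyright",
    "message": '"radius" is not defined',
}

inputs = diagnostic_to_inputs(source, diagnostic)
print(inputs["error_line"], inputs["error_message"])
```

With a language model configured in DSPy, the resulting dictionary could be passed to the pipeline as CodeRepairPipeline()(**inputs).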

Synthetic Data Generation

Using this pipeline, Replit generated a dataset of over 100,000 synthetic fixes and used it to train a smaller, 7B-parameter model that could run efficiently in production.
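
One plausible shape for that generation loop is sketched below: run a repair function over many (file, diagnostic) examples and keep only the fixes the verifier accepts. This is an illustration under stated assumptions, not Replit's actual process. The source does not say how fixes were filtered or stored, so the verifier-based filter, the JSON Lines record format, and the names is_valid, generate_records, and stub_repair are all hypothetical. The stub stands in for the LM-backed pipeline so the example runs offline.

```python
import json
from typing import Callable, Iterable, Iterator


def is_valid(verdict: object) -> bool:
    """Interpret a verifier verdict, which may be a bool or free text such as "yes"."""
    if isinstance(verdict, bool):
        return verdict
    return str(verdict).strip().lower() in {"true", "yes", "valid"}


def generate_records(examples: Iterable[dict], repair: Callable[..., dict]) -> Iterator[str]:
    """Yield one JSON line per example whose proposed fix passes verification."""
    for example in examples:
        result = repair(**example)
        if is_valid(result["valid"]):
            # Keep the inputs alongside the accepted diff as one training record.
            yield json.dumps({**example, "diff": result["diff"]})


def stub_repair(code_file: str, error_line: int, error_message: str) -> dict:
    """Hypothetical stand-in for the LM-backed pipeline (no model calls)."""
    if "not defined" in error_message:
        diff = "-    return 3.14 * r * radius\n+    return 3.14 * r * r"
        return {"diff": diff, "valid": "yes"}
    return {"diff": "", "valid": "no"}


examples = [
    {
        "code_file": "def area(r):\n    return 3.14 * r * radius\n",
        "error_line": 2,
        "error_message": '"radius" is not defined',
    },
    {
        "code_file": "x = (1\n",
        "error_line": 1,
        "error_message": "'(' was never closed",
    },
]

records = list(generate_records(examples, stub_repair))
print(len(records), "record(s) kept out of", len(examples))
```

Filtering on the verifier's verdict trades dataset size for quality: rejected fixes never reach the training set, so the smaller model learns only from repairs that passed the check.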

Impact

  • 35% Automated Fix Rate (up from 10%)
  • 50,000 Daily Fixes suggested to users
  • 68% User Acceptance Rate