CI for AI systems
Detect behavioral regressions before they hit production. Automated testing for non-deterministic AI systems.
Regrada captures LLM interactions, runs evaluations against test cases, and detects when your AI's behavior changes between commits.
$ regrada test
Running test cases...
✓ greeting.hello
✓ refund.lookup
✗ customer.onboarding
Policy violation: assertions (min_pass_rate: 1.0)
Total: 3 | Passed: 2 | Failed: 1
What Regrada Does
LLMs are non-deterministic — they don't fail loudly, they change quietly. Regrada catches behavior changes before deployment.
1. Record LLM API calls via an HTTP proxy (regrada record)
2. Convert traces into YAML test cases (regrada accept)
3. Run cases against baselines and enforce policies (regrada test)
Without Regrada
✗ Model updates break production
✗ Prompt changes cause silent failures
✗ No way to catch regressions early
✗ Manual testing is slow and incomplete
With Regrada
✓ Regressions caught in CI
✓ Every change is validated
✓ Automated behavioral testing
✓ Ship with confidence
Core Features
Everything you need to test and validate your AI systems.
Policy-Based Detection
Configurable policies for assertions, PII detection, text variance, refusal rates, and latency thresholds. Define policies as code and enforce them in CI.
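As a rough sketch, a policy file could look like the fragment below. The field names are illustrative, not Regrada's actual schema; only the min_pass_rate value mirrors the assertion policy shown in the terminal output above.

```yaml
# regrada.yaml — hypothetical policy configuration (field names are illustrative)
policies:
  assertions:
    min_pass_rate: 1.0      # every assertion must pass (matches the failure above)
  pii:
    detect: [email, phone]  # flag responses that leak PII
  text_variance:
    max_drift: 0.15         # tolerated output drift vs. the baseline
  refusal_rate:
    max: 0.05               # fail if refusals exceed 5% of cases
  latency:
    p95_ms: 2000            # p95 response-time threshold in milliseconds
```

Because the file lives in the repo, policy changes are reviewed and versioned like any other code change.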
YAML Test Cases
Define test cases with structured inputs and assertions such as text-contains checks, maximum character counts, and JSON Schema validation. Portable files stored in your repo.
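A test case might look something like this sketch, reusing the refund.lookup case name from the terminal output above; the exact keys are assumptions, not Regrada's documented format.

```yaml
# cases/refund.lookup.yaml — hypothetical test case (illustrative schema)
case: refund.lookup
input:
  messages:
    - role: user
      content: "What is the status of my refund for order 1042?"
assertions:
  - type: text_contains     # response must mention the topic
    value: "refund"
  - type: max_chars         # keep answers concise
    value: 800
  - type: json_schema       # structured-output validation
    schema:
      type: object
      required: [status]
```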
HTTP Proxy Recording
Capture LLM API traffic with HTTPS MITM proxy. Records to JSONL with session metadata and redaction presets. Zero code changes required.
CI/CD Enforcement
First-class GitHub Actions integration with automatic PR comments, checks that fail on regressions, and detailed test output. Works with any CI system.
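A minimal workflow could be wired up as below. The GitHub Actions syntax is standard; the assumption is that regrada test exits non-zero on a policy violation, which is what makes the check block the merge.

```yaml
# .github/workflows/regrada.yml — sketch; the regrada invocation is assumed
name: regrada
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Regrada tests
        run: regrada test   # a non-zero exit fails the check and blocks the merge
```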
Model-Agnostic
Automatically detects and captures calls to OpenAI, Anthropic, Azure OpenAI, Google AI, Cohere, Ollama, and custom endpoints.
Baseline Modes
Store and compare baselines flexibly with local filesystem snapshots or git refs. Baselines keyed by case, provider, model, and params.
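A baseline-mode setting might look like the following sketch; the keys and the example snapshot key are illustrative assumptions based on the description above.

```yaml
# Hypothetical baseline configuration (illustrative field names)
baseline:
  mode: git     # compare against snapshots at a git ref
  ref: main     # alternatively, mode: local uses filesystem snapshots
# Each snapshot is keyed by (case, provider, model, params), e.g.:
#   refund.lookup / openai / gpt-4o / temperature=0
```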
How It Works
A simple workflow that integrates into your existing CI/CD pipeline.
Record Traffic
Run regrada record to capture LLM API calls via an HTTP proxy
Accept Traces
Run regrada accept to convert recorded traces into YAML test cases and baseline snapshots
Run Tests
Run regrada test to execute cases, diff against baselines, and evaluate policies
Enforce in CI
Integrate with GitHub Actions to block merges on policy violations
Tests Pass
All checks successful, ready to deploy
Tests Fail
Regression detected, review changes
Who It's For
• AI startups shipping fast
• Teams running LLMs in production
• Infra / platform engineers
• Enterprises with compliance requirements
If AI is part of your critical path, you need Regrada.
Why Regrada
• Traditional tests can't catch LLM behavior changes
• Model updates and prompt changes need the same rigor as code
• Catching regressions in CI is faster and cheaper than debugging in production
Regrada makes AI systems testable and reliable.
Pricing
Start free and scale as you grow. No hidden fees.
Starter
For individual developers exploring Regrada
Team
For small teams running AI features in production
Scale
For companies scaling CI and AI workflows
Test your AI like you test your code.
Catch behavioral regressions before they reach production.