Run A/B tests on your Claude prompts

Sets up automated side-by-side evaluation of prompt variants, scored by LLM judges you define. Tracks quality scores across sampled responses and flags regressions.

Best for: Engineers shipping Claude integrations who need confidence that prompt changes don't break things.
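
The side-by-side evaluation described above can be pictured with a minimal sketch. It is not taken from the skill itself: the names `evaluate`, `ab_test`, `samples`, and `tolerance` are hypothetical, and the prompt variants and the judge are plain stub functions standing in for real Claude calls and a real LLM judge.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical signatures, assumed for illustration only.
Variant = Callable[[str], str]        # input text -> model output
Judge = Callable[[str, str], float]   # (input, output) -> score in [0, 1]


def evaluate(variant: Variant, judge: Judge, inputs: List[str], samples: int = 3) -> float:
    """Return the mean judge score for one variant, sampling each input several times."""
    scores = [judge(x, variant(x)) for x in inputs for _ in range(samples)]
    return mean(scores)


def ab_test(baseline: Variant, candidate: Variant, judge: Judge,
            inputs: List[str], samples: int = 3, tolerance: float = 0.05) -> Dict[str, object]:
    """Score both variants on the same inputs and flag a regression if the
    candidate's mean score falls more than `tolerance` below the baseline's."""
    a = evaluate(baseline, judge, inputs, samples)
    b = evaluate(candidate, judge, inputs, samples)
    return {"baseline": a, "candidate": b, "regression": b < a - tolerance}


# Stubs standing in for two prompt variants and an LLM judge.
def baseline_variant(text: str) -> str:
    return text.upper()


def candidate_variant(text: str) -> str:
    return ""  # simulates a prompt change that produces empty answers


def non_empty_judge(text: str, output: str) -> float:
    return 1.0 if output else 0.0


result = ab_test(baseline_variant, candidate_variant, non_empty_judge, ["hello", "world"])
print(result)
```

In a real setup the stubs would be replaced by calls to the model and to the judge, and the tolerance would be chosen to sit above the run-to-run noise of the judge scores.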

Engineering / pipelines-data · atomic · for-engineers · needs-integration · review

Topics

agent-skills · launchdarkly-ai · managed-by-terraform

Source

Creator's repository · launchdarkly/ai-tooling


License: NOASSERTION