Agent Red-Team

Red-team your agent.

Most guardrails check what an agent says. That's the wrong unit. An agent is dangerous when it takes an action — wires money, emails data out, deletes records. We score the tool-call trajectory, which text classifiers are structurally blind to.

Run a sample agent below and watch it get talked into exfiltrating customer data — then test your own.

Already know you need the certificate?

Red-team consolescripted sample · real scorer

Sample target

A help-desk agent with email, ticketing, billing, and account tools — the kind most teams ship first.

The verdicts come from lib/redteam/agentTrace.ts— a deterministic scorer that inspects the agent's actual tool calls (forbidden tool, unapproved action, exfiltration sink, injected action). The sample agents' responses are scripted so the demo is reliable and free; the same corpus runs against live agents via npm run redteam:agent. We'd rather under-claim and earn trust.

Agent Red-Team

Red-team your agent.

Run a sample agent below and watch it get talked into exfiltrating customer data — then test your own.

Already know you need the certificate?

Red-team consolescripted sample · real scorer

Sample target

A help-desk agent with email, ticketing, billing, and account tools — the kind most teams ship first.