Batesian: an adversarial MCP and A2A scanner

Batesian sends crafted protocol traffic at a live agent endpoint and reports what broke, with the evidence attached. It is scoped to MCP (Model Context Protocol) and A2A (Agent2Agent), nothing else. Written in Go, licensed under Apache 2.0.

why it exists

MCP and A2A layer auth, identity, and session handling over JSON-RPC, and the interesting failures are protocol-specific: task IDs treated as capabilities, agent cards trusted without checking the signature, push configs that turn into SSRF, contexts that merge across principals. Generic web scanners don’t model any of that. They check the transport. Batesian checks the semantics sitting on top of it.

how it decides a finding is real

Every rule is an active probe with a built-in discriminator, so an open server never gets mislabelled as a broken-auth one. The task-IDOR rule creates a task as an authenticated owner, confirms the server rejects that same creation unauthenticated, then reads the owner’s task back over an anonymous connection. The finding is the read-back. If anonymous creation already succeeds, there’s no auth to break and the rule stays quiet.
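The ordering in that chain is the whole point, so here is a minimal sketch of the decision logic in Go. ProbeResult, Verdict, and ClassifyTaskIDOR are hypothetical names made up for illustration, not Batesian's own types; the sketch only assumes the three probe outcomes described above have already been collected.

```go
package main

import "fmt"

// ProbeResult is a hypothetical summary of the three requests in the
// task-IDOR chain. Field names are illustrative, not Batesian's API.
type ProbeResult struct {
	OwnerCreateOK bool // the authenticated owner created a task
	AnonCreateOK  bool // the same creation succeeded with no credentials
	AnonReadOK    bool // an anonymous connection read the owner's task back
}

// Verdict is what the rule reports once the chain has run.
type Verdict string

const (
	Quiet     Verdict = "quiet"     // nothing to report
	Confirmed Verdict = "confirmed" // anonymous read-back of an owner's task
)

// ClassifyTaskIDOR applies the discriminator before the exploit check.
func ClassifyTaskIDOR(r ProbeResult) Verdict {
	if !r.OwnerCreateOK {
		return Quiet // no task exists, so there is nothing to read back
	}
	if r.AnonCreateOK {
		return Quiet // the server is simply open: no auth boundary to break
	}
	if r.AnonReadOK {
		return Confirmed // auth is enforced on create but not on read
	}
	return Quiet
}

func main() {
	broken := ProbeResult{OwnerCreateOK: true, AnonCreateOK: false, AnonReadOK: true}
	open := ProbeResult{OwnerCreateOK: true, AnonCreateOK: true, AnonReadOK: true}
	fmt.Println(ClassifyTaskIDOR(broken))
	fmt.Println(ClassifyTaskIDOR(open))
}
```

The open-server case returns before the read-back is even considered, which is what keeps an intentionally unauthenticated server out of the report.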

Every finding carries a confidence level. confirmed means the exploit landed and Batesian watched it: a forged token accepted against a rejecting baseline, a tampered artifact read back. indicator means the posture is suspect but can’t be proven from the response alone. Gate CI on confirmed, triage the rest when you have time.
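A rough sketch of that gate, assuming the findings have already been parsed out of the scan output. The Finding struct, the rule IDs, and the Gate and ExitCode helpers are hypothetical; how Batesian actually encodes confidence in its output isn't specified here.

```go
package main

import "fmt"

// Finding is a hypothetical, pared-down view of one scan result.
type Finding struct {
	RuleID     string
	Confidence string // "confirmed" or "indicator"
}

// Gate splits findings into those that should fail the build and those
// left for later triage. Anything that is not "confirmed" goes to triage.
func Gate(findings []Finding) (blocking, triage []Finding) {
	for _, f := range findings {
		if f.Confidence == "confirmed" {
			blocking = append(blocking, f)
		} else {
			triage = append(triage, f)
		}
	}
	return blocking, triage
}

// ExitCode returns the status a CI wrapper would exit with:
// non-zero only when at least one confirmed finding exists.
func ExitCode(blocking []Finding) int {
	if len(blocking) > 0 {
		return 1
	}
	return 0
}

func main() {
	// Rule IDs below are invented for the example.
	findings := []Finding{
		{RuleID: "example-task-idor", Confidence: "confirmed"},
		{RuleID: "example-jws-alg", Confidence: "indicator"},
	}
	blocking, triage := Gate(findings)
	fmt.Printf("%d blocking, %d to triage, exit code %d\n",
		len(blocking), len(triage), ExitCode(blocking))
	// A real wrapper would pass ExitCode(blocking) to os.Exit.
}
```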

the rules

A few, to show the range:

Push-notification SSRF is out-of-band only. It starts a listener, registers an attacker callback, submits a task, and reports confirmed solely on a real inbound hit. Accepting the config is in-spec and never flagged.
Multi-tenant isolation, context fixation, and delegation integrity are stateful multi-principal chains. Each uses two valid, distinct credentials, a discriminator proving the server isn’t simply open, then a cross-principal read or continuation. Some consume an upstream rule’s task ID off the blackboard rather than minting their own.
JWS algorithm confusion is static analysis of the card’s signatures block (alg: none, symmetric verification, cross-domain jku), reported as indicator since it never forges against the live server.

Each maps to a CWE and ships remediation. Rules are YAML. The set is intentionally small; I’d rather ship a couple dozen that mean something.

real findings

Batesian has confirmed findings against Google’s a2a-samples reference implementation, plus a push-notification SSRF in a2a-python that was still unfixed as of April. This is the code everyone forks from.

run it

go install github.com/calbebop/batesian/cmd/batesian@latest

batesian scan --target https://agent.example.com --output sarif > results.sarif

Source, rules, and the full catalog: github.com/calbebop/batesian.