Guardrails

A site’s guardrails are the rules Sill evaluates against every incoming mandate before the merchant’s processor is ever called. Each rule belongs to a category, carries typed parameters, and resolves to one of three actions: allow (explicit exemption), reject, or escalate to a human reviewer. Rules are grouped into an ordered, versioned policy; exactly one policy is active per site at any moment. The dashboard’s Guardrails view is where an operator authors, dry-runs, and publishes that policy.

How a mandate meets the rules

Every signed mandate that reaches Sill’s edge is evaluated against the site’s active policy in order. The first rule that matches decides the outcome and short-circuits the rest: the mandate is approved, rejected, or escalated to the human-in-the-loop queue. A mandate that matches no rule is approved. Each evaluation is written to the audit envelope as a signed, Merkle-chained record.

```mermaid
flowchart LR
  M[Signed mandate] --> V[Verifier]
  V -->|reject| R[Rejected]
  V -->|pass| P[Policy engine]
  P -->|match: allow| A[Approve]
  P -->|match: reject| R
  P -->|match: escalate| Q[HITL queue]
  P -->|no match| A
  A --> AUD[Signed audit record]
  R --> AUD
  Q --> AUD
```

Guardrails configured here run inside the policy stage. A handful of rules are enforced outside it — see Enforcement layers below.
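The ordered, first-match evaluation described in this section can be sketched in a few lines. This is a minimal illustration, not Sill's implementation: the `Mandate`, `Rule`, and `Decision` shapes, the `evaluatePolicy` function, and the rule ids are all hypothetical names chosen for the example.

```typescript
// Hypothetical shapes for illustration; the real Sill types are not shown in this document.
type Action = "allow" | "reject" | "escalate";
type Outcome = "approved" | "rejected" | "escalated";

interface Mandate {
  agentId: string;
  amount: number;
  currency: string;
}

interface Rule {
  id: string;
  action: Action;
  // Returns true when the rule applies to this mandate.
  matches: (mandate: Mandate) => boolean;
}

interface Decision {
  outcome: Outcome;
  // The rule that decided the outcome, or null when nothing matched.
  matchedRuleId: string | null;
}

const OUTCOME_FOR_ACTION: Record<Action, Outcome> = {
  allow: "approved",
  reject: "rejected",
  escalate: "escalated",
};

// Walk the rules in policy order; the first match wins and later rules are skipped.
function evaluatePolicy(rules: readonly Rule[], mandate: Mandate): Decision {
  for (const rule of rules) {
    if (rule.matches(mandate)) {
      return { outcome: OUTCOME_FOR_ACTION[rule.action], matchedRuleId: rule.id };
    }
  }
  // No rule matched: the flowchart routes this case to Approve.
  return { outcome: "approved", matchedRuleId: null };
}

// Illustrative policy: an explicit exemption first, then a cap, then a review threshold.
const examplePolicy: Rule[] = [
  { id: "trusted-agent", action: "allow", matches: (m) => m.agentId === "agent-trusted" },
  { id: "max-per-txn", action: "reject", matches: (m) => m.currency === "GBP" && m.amount > 500 },
  { id: "large-order", action: "escalate", matches: (m) => m.amount > 200 },
];
```

Because the allow rule sits first, a 900 GBP mandate from `agent-trusted` is approved without the cap ever being consulted, which is why rule order matters as much as rule content.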

Rule categories

The Guardrails view groups rules into seven categories, each with its own filter chip and count in the header strip. Adding a rule uses a two-pass picker (choose a category, then a specific rule), after which the operator fills in the rule’s typed parameters.

Category	What it gates	Example rules
Agent identity	Who can call	Allowlisted agents only; require valid IntentMandate; mandate replay protection; geofence; mandate validity window cap; delegation-chain policy
Rate limits	How fast	Per-agent calls per minute; per-IP calls per hour; aggregate cap across agents; failed-auth lockout; MCP session rate limit
Spend caps	How much	Max per transaction (per-currency); daily cap per user; cart total ≤ Intent ceiling; currency must match Intent; cart immutability after authorization
Scope	What actions	HITL on destructive actions; per-customer data scoping; skill manifest integrity; mandate body size limit; emergency kill switch
Dark patterns	Manipulative buying signals	No urgency manipulation; no drip pricing; subscription requires explicit consent
Prompt injection	Adversarial inputs	Instruction-override detection; Unicode tag block; credential-leak detection
Custom DSL	Merchant-authored predicate	A single rule whose body is an expression in Sill’s policy DSL

Many rules cite a standard — AP2, OWASP API Top 10, OWASP Agentic ASI, OWASP MCP, UK CMA / DMCC, MITRE ATLAS — to make the intent of the guardrail explicit. Where a rule’s enforcement is delegated to a layer outside the policy evaluator, the dashboard renders an inline pill (ENFORCED AT VERIFIER, ENFORCED AT EDGE, etc.) so the operator sees that the rule is always-on and not merchant-configurable.

The Guardrails view, with the category filter strip across the top and one column of rule cards below.

Adding and configuring a rule

The Add rule modal is a two-pass picker. The first pass shows the seven categories; the second shows the rules in the chosen category, with each entry tagged either with its parameter form or with a COMING SOON badge. Rules whose handler has not yet shipped in packages/policy cannot be added — the dashboard preserves the roadmap entry so an operator can see what is coming but never publish an unenforceable rule.

Once a rule is selected, its typed parameter form swaps in. Common shapes:

Allowlist — a list of registered agent_ids (Agent identity → Allowlisted agents only).
Per-currency cap — one cap per ISO-4217 currency, with no FX conversion (Spend caps → Max per transaction; Scope → HITL on destructive actions).
Country allow/deny — an allow-list or deny-list of ISO country codes (Agent identity → Geofence).
Whole-second window — a maximum validity window (Agent identity → Mandate validity window cap).
Byte limit — a maximum mandate body size (Scope → Mandate body size limit).
Toggle-only — for emergency kill switch and similar always-or-never controls.
Custom DSL — a free-form expression in Sill’s policy DSL.

Each rule also carries an on-match action of allow, reject, or escalate. The Guardrails view’s parameter hints reflect the selected action — a “max amount cap” rule set to allow reads as an explicit exemption that skips later rules, not a rejection.
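One way to picture the typed parameter forms is as a discriminated union with a validator per shape. The sketch below is an assumption-laden illustration: the `kind` tags, field names, and `validateParams` function are hypothetical, and the two-letter country check assumes ISO 3166-1 alpha-2 codes, which the text above does not specify.

```typescript
// Hypothetical parameter shapes mirroring the list above; all field names are assumptions.
type Action = "allow" | "reject" | "escalate";

type RuleParams =
  | { kind: "allowlist"; agentIds: string[] }
  | { kind: "perCurrencyCap"; caps: Record<string, number> } // keyed by ISO-4217 code, no FX conversion
  | { kind: "countryList"; mode: "allow" | "deny"; countries: string[] }
  | { kind: "validityWindow"; maxSeconds: number }
  | { kind: "byteLimit"; maxBytes: number }
  | { kind: "toggle"; enabled: boolean }
  | { kind: "customDsl"; expression: string };

interface ConfiguredRule {
  id: string;
  action: Action;
  params: RuleParams;
}

// Returns a list of problems; an empty list means the parameters look well-formed.
function validateParams(params: RuleParams): string[] {
  const problems: string[] = [];
  switch (params.kind) {
    case "allowlist":
      if (params.agentIds.length === 0) problems.push("allowlist is empty");
      break;
    case "perCurrencyCap":
      for (const [currency, cap] of Object.entries(params.caps)) {
        if (!/^[A-Z]{3}$/.test(currency)) problems.push(`bad currency code: ${currency}`);
        if (!(cap > 0)) problems.push(`cap for ${currency} must be positive`);
      }
      break;
    case "countryList":
      for (const country of params.countries) {
        // Assumes two-letter codes; the document only says "ISO country codes".
        if (!/^[A-Z]{2}$/.test(country)) problems.push(`bad country code: ${country}`);
      }
      break;
    case "validityWindow":
      if (!Number.isInteger(params.maxSeconds) || params.maxSeconds <= 0) {
        problems.push("window must be a positive whole number of seconds");
      }
      break;
    case "byteLimit":
      if (!Number.isInteger(params.maxBytes) || params.maxBytes <= 0) {
        problems.push("byte limit must be a positive integer");
      }
      break;
    case "toggle":
      break; // nothing to validate beyond the boolean itself
    case "customDsl":
      if (params.expression.trim() === "") problems.push("expression is empty");
      break;
  }
  return problems;
}
```

Tagging each shape with a `kind` lets the compiler check that every form is handled, so adding a new parameter shape without a matching validator branch is easy to spot in review.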

Draft, active, and the fingerprint

Editing a rule produces a draft for the site. The active policy keeps running against live traffic unchanged; the draft is what the next publish will activate. The view computes a stable fingerprint over the current rule set (rulesFingerprint(rules)) and compares it with the active policy to classify each row as unchanged, edited, added, or removed. Drafts auto-save to the server with a short debounce so a tab refresh does not lose work.
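The fingerprint-and-compare step might look roughly like the sketch below. The document names `rulesFingerprint(rules)` but does not show how it is computed, so everything here is an assumption: `sketchFingerprint` stands in for it using a canonical serialisation and a 32-bit FNV-1a hash, and `RuleRow`, `canonical`, and `classifyRows` are hypothetical names.

```typescript
// Hypothetical rule shape; only the id, action, and parameters matter for this sketch.
interface RuleRow {
  id: string;
  action: "allow" | "reject" | "escalate";
  params: Record<string, unknown>;
}

type RowStatus = "unchanged" | "edited" | "added" | "removed";

// Serialise with sorted object keys so key order never changes the result.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((item) => canonical(item)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonical(obj[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units, rendered as 8 hex digits.
// A stand-in for whatever hash the real rulesFingerprint uses.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Rule order is part of the policy, so the array is hashed in order.
function sketchFingerprint(rules: readonly RuleRow[]): string {
  return fnv1a(canonical(rules));
}

// Compare the draft with the active policy, rule by rule, keyed on rule id.
function classifyRows(active: readonly RuleRow[], draft: readonly RuleRow[]): Map<string, RowStatus> {
  const activeById = new Map<string, RuleRow>();
  active.forEach((rule) => activeById.set(rule.id, rule));
  const status = new Map<string, RowStatus>();
  draft.forEach((rule) => {
    const before = activeById.get(rule.id);
    if (before === undefined) status.set(rule.id, "added");
    else status.set(rule.id, canonical(before) === canonical(rule) ? "unchanged" : "edited");
  });
  active.forEach((rule) => {
    if (!status.has(rule.id)) status.set(rule.id, "removed");
  });
  return status;
}
```

Sorting keys before hashing is what makes the fingerprint stable: two drafts that differ only in property order produce the same value, so a tab refresh or a re-serialised payload does not show up as an edit.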

When a draft is published, Sill bumps the policy version label, writes the new active policy, and records the change in the audit envelope. Auto-seeded baselines carry a recognisable label so the dashboard and audit trail can distinguish them from a merchant-authored publish.

Dry-run (shadow evaluation)

Dry-run stages the operator’s draft against live traffic without enforcing it. While dry-run is on, the active policy continues to enforce on live traffic exactly as before; in parallel, Sill evaluates the operator’s draft rules against the same incoming mandates and records every decision the draft would have made into a 7-day shadow log. Nothing is blocked, escalated, or auto-approved by the draft; its decisions are only logged.

```mermaid
sequenceDiagram
  participant Op as Operator
  participant Dash as Guardrails view
  participant Edge as Policy engine
  participant Shadow as Shadow log (7d)

  Op->>Dash: Enable dry-run
  Op->>Dash: Edit a rule (e.g. tighten r05 cap)
  Dash->>Edge: Auto-save draft (debounced)
  Note over Edge: Active policy keeps enforcing<br/>against live mandates.
  Edge-->>Shadow: "Would have blocked" for each<br/>draft-vs-active divergence
  Op->>Dash: Review shadow log (REJECT / ESCALATE / INDETERMINATE)
  Op->>Dash: Publish (or revert)
  Dash->>Edge: Bump version, swap active = draft
```
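The shadow-evaluation flow above can be sketched as a single function that enforces with the active policy and only logs what the draft would have done. This is an illustration under stated assumptions, not Sill's code: `handleMandate`, the `usesRateState` flag, and the choice to log a row only when the draft would block or escalate a mandate that the active policy treats differently are all hypothetical. The 7-day retention is not modelled.

```typescript
type Action = "allow" | "reject" | "escalate";
type Verdict = "APPROVE" | "REJECT" | "ESCALATE";
type ShadowVerdict = "REJECT" | "ESCALATE" | "INDETERMINATE";

interface Mandate {
  id: string;
  amount: number;
}

interface Rule {
  id: string;
  action: Action;
  // Assumed flag: rate-limit rules depend on counters that shadow evaluation does not keep.
  usesRateState: boolean;
  matches: (mandate: Mandate) => boolean;
}

interface ShadowRow {
  mandateId: string;
  ruleId: string;
  verdict: ShadowVerdict;
}

// First matching rule in policy order, or null when nothing matches.
function firstMatch(rules: readonly Rule[], mandate: Mandate): Rule | null {
  for (const rule of rules) {
    if (rule.matches(mandate)) return rule;
  }
  return null;
}

function verdictOf(rule: Rule | null): Verdict {
  if (rule === null || rule.action === "allow") return "APPROVE";
  return rule.action === "reject" ? "REJECT" : "ESCALATE";
}

// Enforce with the active policy; evaluate the draft only to log what it would have done.
function handleMandate(
  active: readonly Rule[],
  draft: readonly Rule[],
  mandate: Mandate,
  shadowLog: ShadowRow[],
): Verdict {
  const enforced = verdictOf(firstMatch(active, mandate));
  const draftRule = firstMatch(draft, mandate);
  const wouldBe = verdictOf(draftRule);
  if (draftRule !== null && wouldBe !== "APPROVE" && wouldBe !== enforced) {
    shadowLog.push({
      mandateId: mandate.id,
      ruleId: draftRule.id,
      // Without rate state the draft verdict cannot be stated definitively.
      verdict: draftRule.usesRateState ? "INDETERMINATE" : wouldBe,
    });
  }
  return enforced; // the draft never changes the live outcome
}
```

The key property is in the last line: the return value depends only on the active policy, so enabling dry-run cannot change what happens to a live mandate.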

The DRY-RUN RESULTS card is collapsed by default beneath the rules grid and surfaces:

Summary header — total, REJECT count, ESCALATE count, over the 7-day window.
By rule histogram — which draft rules are noisiest in the loaded shadow buffer.
Per-row “why” sentence — a categorical, plain-text explanation derived from the catalog + the operator’s configured parameters. The sentence is never built from the inbound mandate.
INDETERMINATE pill — for rate-limit rules, whose shadow verdict cannot be definitive because rate state is not evaluated in shadow. The pill states that limitation openly rather than asserting false certainty.
View mandate — opens the underlying mandate inline. Buyer reveal is not auto-opened.
Draft fingerprint changed — surfaced when the draft has been edited since the panel was opened, so the operator knows the prior shadow rows are stale.
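The figures the results card surfaces can be derived from the loaded shadow rows with a small aggregation. The sketch below is hypothetical throughout: the `ShadowRow` and `ShadowSummary` shapes, the `summarise` function, and the two fingerprint arguments are assumptions made for illustration, and the rule ids in any sample data are placeholders.

```typescript
type ShadowVerdict = "REJECT" | "ESCALATE" | "INDETERMINATE";

interface ShadowRow {
  ruleId: string;
  verdict: ShadowVerdict;
}

interface ShadowSummary {
  total: number;
  rejectCount: number;
  escalateCount: number;
  // Rule id and row count, noisiest rule first.
  byRule: Array<[string, number]>;
  // True when the draft changed after the panel was opened.
  stale: boolean;
}

function summarise(
  rows: readonly ShadowRow[],
  fingerprintAtOpen: string,
  currentFingerprint: string,
): ShadowSummary {
  const counts = new Map<string, number>();
  let rejectCount = 0;
  let escalateCount = 0;
  rows.forEach((row) => {
    counts.set(row.ruleId, (counts.get(row.ruleId) ?? 0) + 1);
    if (row.verdict === "REJECT") rejectCount += 1;
    if (row.verdict === "ESCALATE") escalateCount += 1;
  });
  // Sort by count descending, then by rule id so the order is deterministic.
  const byRule = Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0),
  );
  return {
    total: rows.length,
    rejectCount,
    escalateCount,
    byRule,
    stale: fingerprintAtOpen !== currentFingerprint,
  };
}
```

INDETERMINATE rows count toward the total and the histogram but toward neither the REJECT nor the ESCALATE figure, which keeps the headline counts limited to verdicts the shadow evaluation can actually stand behind.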

Dry-run is enabled with explicit confirmation and disabled with a quiet toast — it is a per-operator UI mode, not a server-side enforcement switch.

Dry-run mode: the banner sits above the rules grid; the DRY-RUN RESULTS card sits below and lists every draft-vs-active divergence over the last seven days.

QuickTest

QuickTest lets an operator paste a sample mandate body into the Guardrails view and see exactly which rules match, in which order, and with which outcome — without enforcing anything. Rules enforced outside the policy evaluator (verifier, webhook, response, dashboard, edge, origin) are intentionally not testable from QuickTest, since they never run in the evaluator path.
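A QuickTest-style trace can be sketched as a pass over the rule list that reports matches without acting on them. The sketch assumes that every matching policy-layer rule is listed and the first one is flagged as deciding; the `quickTest` function, the `SampleBody` fields, and the result shape are hypothetical. Only the layer names and the `r11` id come from this document.

```typescript
type Layer = "verifier" | "policy" | "origin" | "webhook" | "response" | "dashboard" | "edge";
type Action = "allow" | "reject" | "escalate";

// Hypothetical mandate body fields, just enough to drive the example rules.
interface SampleBody {
  amount: number;
  currency: string;
}

interface Rule {
  id: string;
  layer: Layer;
  action: Action;
  matches: (body: SampleBody) => boolean;
}

interface QuickTestResult {
  matches: Array<{ ruleId: string; position: number; action: Action; decides: boolean }>;
  // Rules enforced outside the policy evaluator, which this test cannot exercise.
  notTestable: string[];
}

// Report which policy-layer rules match a pasted body; nothing is enforced.
function quickTest(rules: readonly Rule[], bodyJson: string): QuickTestResult {
  const body = JSON.parse(bodyJson) as SampleBody;
  const result: QuickTestResult = { matches: [], notTestable: [] };
  rules.forEach((rule, position) => {
    if (rule.layer !== "policy") {
      result.notTestable.push(rule.id);
      return;
    }
    if (rule.matches(body)) {
      result.matches.push({
        ruleId: rule.id,
        position,
        action: rule.action,
        // Under first-match semantics only the first matching rule decides the outcome.
        decides: result.matches.length === 0,
      });
    }
  });
  return result;
}
```

Listing later matches as well as the deciding one shows an operator which rules are shadowed by an earlier rule, which is hard to see from live outcomes alone.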

Publishing

When the draft is ready, Publish bumps the version, writes the new active policy server-side, and records the change in the audit log. The next mandate that arrives is evaluated against the new policy. If a rule fails the server’s publish gate (an unsupported type, a malformed parameter, or a non-policy-configurable rule that should never have reached the draft), the publish is rejected with a rule_{i}_{reason} code and the active policy is left untouched.
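The publish gate can be pictured as a per-rule check that either bumps the version or returns a single error code. This is a sketch under assumptions: the document gives only the `rule_{i}_{reason}` pattern, so the reason strings, the zero-based index, the numeric version, the `policyConfigurable` flag, and the two validator entries below are all hypothetical.

```typescript
interface DraftRule {
  type: string;
  // Assumed flag: false for rules enforced outside the policy and not merchant-configurable.
  policyConfigurable: boolean;
  params: Record<string, unknown>;
}

type PublishResult = { ok: true; version: number } | { ok: false; code: string };

type Validator = (params: Record<string, unknown>) => boolean;

// Hypothetical catalogue of supported rule types and their parameter checks.
const VALIDATORS = new Map<string, Validator>([
  [
    "max_per_transaction",
    (params) => {
      const value = params["maxAmount"];
      return typeof value === "number" && value > 0;
    },
  ],
  [
    "mandate_body_size_limit",
    (params) => {
      const value = params["maxBytes"];
      return typeof value === "number" && Number.isInteger(value) && value > 0;
    },
  ],
]);

// Returns a rule_{i}_{reason} code for a failing rule, or null when the rule passes.
function gateCode(rule: DraftRule, index: number): string | null {
  if (!rule.policyConfigurable) return `rule_${index}_not_policy_configurable`;
  const validate = VALIDATORS.get(rule.type);
  if (validate === undefined) return `rule_${index}_unsupported_type`;
  if (!validate(rule.params)) return `rule_${index}_malformed_parameter`;
  return null;
}

// Reject the whole publish on the first failing rule; otherwise bump the version.
function publish(draft: readonly DraftRule[], activeVersion: number): PublishResult {
  const failure = draft.map((rule, index) => gateCode(rule, index)).find((code) => code !== null);
  if (failure != null) return { ok: false, code: failure };
  return { ok: true, version: activeVersion + 1 };
}
```

Checking every rule before writing anything is what keeps the active policy untouched on failure: the version only changes on the path where no rule produced a code.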

Enforcement layers

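Most rules are enforced inside the policy evaluator on the edge. A small set is enforced elsewhere in the pipeline. These rules appear in the Guardrails view for visibility but, apart from origin-layer rules (which are configured in the policy and enforced at the rail executor), cannot be edited or disabled.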

Layer	When it runs	Examples
`verifier`	Before policy evaluation	Mandate replay protection (`r11`); failed-auth lockout (`r24`); cart immutability after authorization (`r16`)
`policy`	The evaluator stage — merchant-configurable	The bulk of the rule catalog
`origin`	Configured in the policy; enforced at the rail executor right before the charge (needs server-side aggregates the edge cannot run)	Daily spend cap per user (`r06`)
`webhook`	The inbound-webhook handler (HMAC-SHA256 against the rotating secret)	Webhook signature verification (`r27`)
`response`	A post-response hook on the agent’s outbound payload	Outbound sanitization
`dashboard`	Gated on a human dashboard-user’s role	Refund authorization window (`r26`, roadmap)
`edge`	Always-on ingress guard, before any mandate exists	MCP session rate limit (`r30`)

The dashboard surfaces a tooltip on every non-policy rule explaining where it runs and why it cannot be tested from QuickTest.

Frequently asked

Does turning on dry-run weaken my active policy? No. The active policy continues to enforce exactly as before. Dry-run evaluates draft changes in parallel and logs what they would have done. The dashboard’s enable-dry-run modal is explicit about this.

Can I run two policies side by side on the same site? No. Each site has exactly one active policy. Dry-run gives you a staged-but-not-enforcing second view of the draft against the same live mandates.

What happens if a rule’s parameters are invalid at publish time? The publish is rejected with a rule_{i}_{reason} code; the active policy is unchanged. The dashboard surfaces the error inline so the operator can correct it.

Are shadow-log rows signed and exportable? Shadow rows are dashboard-only operator artifacts bounded to a 7-day window and a per-fetch cap. The signed, exportable record is the audit envelope written by the active policy. See Audit log and export.

Why is a rate-limit row in the shadow log labelled INDETERMINATE? Rate-limit state (r03, r04, r13, r17) is not evaluated in shadow, so a “would-have-blocked” verdict on those rules is not a definitive statement that enforcement would block. The pill states that limitation openly rather than overclaiming.