Refunds

A refund in Sill is an agent-initiated return that travels the same signed pipeline as the original purchase. A signed request_refund mandate enters the edge, the active policy evaluates it, a human reviewer clears the escalation, and the merchant’s existing processor (Stripe today; Shopify in test mode) executes the refund. The refund outcome is written to the signed, Merkle-chained audit envelope, together with the original mandate it references. Sill never custodies funds: the processor holds the payment method, performs the refund, and pays the buyer back. Sill issues the signed authorization and the audit record.

How a refund flows

A refund is a second mandate that points back at the original. The agent cannot name an arbitrary amount. The refund amount is derived server-side as min(signed cap, original settled total), and the refund settles on the same rail that settled the original.

```mermaid
sequenceDiagram
    autonumber
    participant Agent
    participant Edge as Sill edge
    participant Reviewer as Human reviewer
    participant Origin as Sill origin
    participant Processor as Stripe (or Shopify)
    participant Audit as Audit envelope

    Agent->>Edge: Signed refund mandate (request_refund + original_mandate_id)
    Edge->>Edge: Verify signature, identity, and active policy (r07 → escalate)
    Edge->>Reviewer: Pause and route to dashboard queue
    Reviewer-->>Edge: approve
    Edge->>Origin: Enqueue refund dispatch
    Origin->>Origin: Pre-rail gates (rail selection, tenancy, refund-on-refund)
    Origin->>Processor: Refund call against original charge or order
    Processor-->>Origin: succeeded / failed
    Origin->>Audit: Signed refund record + settlement evidence
    Audit-->>Agent: Verifiable refund outcome
```

The refund mandate shape

A refund mandate is a normal Sill mandate with intent.action = "request_refund" and a closed, narrowly typed intent block. The agent may not assert order details such as line totals, taxes, or buyer identity. Those are resolved server-side from the original mandate’s own records. This is deliberate: it forecloses an “amount-smuggling” attack in which the agent invents what it allegedly bought.

```json
{
  "envelope": {
    "alg": "EdDSA",
    "kid": "agent-key-id-…"
  },
  "signed": {
    "mandate_id": "mnd_01J9F4Z9R6Q7K3P8Y2N5T6V1W2",
    "principal": { "type": "human", "ref": "buyer:opaque-id" },
    "agent": { "agent_id": "agent_anthropic_claude" },
    "site": { "site_id": "01EXAMPLE00000000000000000" },
    "intent": {
      "action": "request_refund",
      "original_mandate_id": "mnd_01J9C8P7K4V2H5F6Q8R9T3N1M2",
      "reason_code": "requested_by_customer",
      "scope": "full",
      "max_amount": 1.50,
      "currency": "USD",
      "merchant": "Example Merchant"
    },
    "issued_at": "2026-06-22T14:30:00Z"
  },
  "signature": "…"
}
```

Allowed reason_code values (closed set): requested_by_customer, defective, not_received, wrong_item, other.

The scope field is "full" in v1. The edge projector parses a { "line_items": [...] } shape, but the Stripe and Shopify refund executors ship only the full path today. Any other scope aborts with scope_unsupported before any rail call.
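The closed intent block can be illustrated with a minimal TypeScript sketch of the two checks described above. The first is an edge-side parse that rejects any field outside the closed set. The second is an executor-side scope check that aborts anything other than "full". The allowed-key list is assumed from the example mandate, and parseRefundIntent and executorScopeCheck are hypothetical names. Sill's real validator may differ.

```typescript
// Hypothetical sketch: validating the closed request_refund intent block.
// The allowed-key list is assumed from the example mandate; the real schema may differ.
const REASON_CODES = [
  "requested_by_customer", "defective", "not_received", "wrong_item", "other",
] as const;
type ReasonCode = (typeof REASON_CODES)[number];

type RefundScope = "full" | { line_items: unknown[] };

interface RefundIntent {
  action: "request_refund";
  original_mandate_id: string;
  reason_code: ReasonCode;
  scope: RefundScope;
  max_amount: number;
  currency: string;
  merchant: string;
}

const ALLOWED_INTENT_KEYS = new Set([
  "action", "original_mandate_id", "reason_code", "scope", "max_amount", "currency", "merchant",
]);

// Edge-side parse: any key outside the closed set (line totals, taxes, buyer identity) is rejected.
function parseRefundIntent(raw: Record<string, unknown>): RefundIntent {
  for (const key of Object.keys(raw)) {
    if (!ALLOWED_INTENT_KEYS.has(key)) throw new Error(`unexpected intent field: ${key}`);
  }
  if (raw.action !== "request_refund") throw new Error("not a refund intent");
  if (typeof raw.original_mandate_id !== "string") throw new Error("original_mandate_id required");
  if (!(REASON_CODES as readonly unknown[]).includes(raw.reason_code)) {
    throw new Error("reason_code outside the closed set");
  }
  const scope = raw.scope;
  const isLineItems =
    typeof scope === "object" && scope !== null &&
    Array.isArray((scope as { line_items?: unknown }).line_items);
  if (scope !== "full" && !isLineItems) throw new Error("malformed scope");
  const cap = raw.max_amount;
  if (typeof cap !== "number" || !Number.isFinite(cap) || cap <= 0) {
    throw new Error("max_amount must be a positive, finite cap");
  }
  if (typeof raw.currency !== "string" || typeof raw.merchant !== "string") {
    throw new Error("currency and merchant are required");
  }
  return raw as unknown as RefundIntent;
}

// Executor-side check: v1 ships only the full path, so any other scope aborts before a rail call.
function executorScopeCheck(intent: RefundIntent): "ok" | "failed_scope_unsupported" {
  return intent.scope === "full" ? "ok" : "failed_scope_unsupported";
}
```

Splitting the checks this way matches the page's description: a line_items scope passes the edge parse but still never reaches a rail.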

max_amount is a cap, never an assertion of the original total. The executor computes amount_minor = min(round(max_amount * 100), original.amount_minor) from server-side state. If the agent’s cap is below the settled total, the cap bounds the refund. If the cap exceeds the settled total, the settled total bounds it.
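The cap arithmetic fits in a small helper. It assumes a two-decimal currency such as USD, with 100 minor units per unit, and uses Math.round as the round step. refundAmountMinor is a hypothetical name, not Sill's executor API.

```typescript
// Hypothetical helper for amount_minor = min(round(max_amount * 100), original.amount_minor).
// Assumes a two-decimal currency such as USD (100 minor units per major unit).
function refundAmountMinor(maxAmount: number, originalAmountMinor: number): number {
  if (!Number.isFinite(maxAmount) || maxAmount <= 0) {
    throw new Error("max_amount must be a positive, finite cap");
  }
  if (!Number.isInteger(originalAmountMinor) || originalAmountMinor < 0) {
    throw new Error("original amount must be a non-negative integer in minor units");
  }
  // Math.round absorbs binary floating-point error: 0.29 * 100 is 28.999999999999996 in JS.
  const capMinor = Math.round(maxAmount * 100);
  return Math.min(capMinor, originalAmountMinor);
}

console.log(refundAmountMinor(1.5, 150)); // 150: cap equals the settled total
console.log(refundAmountMinor(1.0, 150)); // 100: bounded by the cap
console.log(refundAmountMinor(5.0, 150)); // 150: bounded by the settled total
```

The agent only ever supplies the first argument. The second comes from the original charge_state or shopify_order_state row, so no value the agent signs can raise the refund above what actually settled.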

Admission and the original-order pin

Before a refund mandate is admitted, the origin checks that the referenced original_mandate_id:

Belongs to the same site as the refund. This same-site pin is enforced in the application-level (JS) match, because under row-level security the SQL filter is scoped to the whole account rather than to the site.
Has a settlement-authorizing decision: either approved or escalated_approved.
Actually settled on a refund-capable rail (charge paid for Stripe; order paid for Shopify).

If any check fails, the refund never reaches a rail call. The resulting state is a terminal failed_original_not_charged (or failed_tenancy_violation) row on refund_state, and a signed audit record describing the abort.
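The admission checks can be sketched against a flattened view of the original mandate's records. The mapping of a cross-site reference to failed_tenancy_violation, and of every other failure to failed_original_not_charged, is inferred from the paragraph above. OriginalMandate and admitRefund are hypothetical names.

```typescript
// Hypothetical view of the original mandate's server-side records.
interface OriginalMandate {
  siteId: string;
  decision: string;                          // e.g. "approved", "escalated_approved"
  settlementRail: "stripe" | "shopify" | null;
  settledPaid: boolean;                      // charge paid (Stripe) or order paid (Shopify)
}

type Admission =
  | { admitted: true }
  | { admitted: false; terminalState: "failed_tenancy_violation" | "failed_original_not_charged" };

const SETTLEMENT_AUTHORIZING = new Set(["approved", "escalated_approved"]);

function admitRefund(refundSiteId: string, original: OriginalMandate): Admission {
  // Same-site pin, checked in application code because the RLS-scoped SQL read is account-wide.
  if (original.siteId !== refundSiteId) {
    return { admitted: false, terminalState: "failed_tenancy_violation" };
  }
  // The original must carry a settlement-authorizing decision.
  if (!SETTLEMENT_AUTHORIZING.has(original.decision)) {
    return { admitted: false, terminalState: "failed_original_not_charged" };
  }
  // The original must have actually settled on a refund-capable rail.
  if (original.settlementRail === null || !original.settledPaid) {
    return { admitted: false, terminalState: "failed_original_not_charged" };
  }
  return { admitted: true };
}
```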

Policy and HITL

In the default policy set, a request_refund mandate matches rule r07 (destructive action requires human review), which escalates it to the dashboard. The mandate is paused and no rail call runs. The refund appears in the reviewer queue alongside a server-resolved view of the original order: the original audit record id, the line items at settlement, and a link that reveals the buyer block through the existing access-logged decrypt path. No PII rides inline on the queue wire.

A user with the reviewer, admin, or owner role approves or rejects the refund. The resolution is appended to the audit envelope before any rail call runs. See Human in the loop for the full reviewer surface.

The reviewer queue for a refund escalation: the original-order panel renders server-resolved evidence; the agent’s intent carries only the closed refund triplet.
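The escalation and reviewer gate can be sketched as follows. The rule id and role names come from this page. The PolicyDecision and AuditEntry shapes are assumptions for illustration, and so is the in-memory array that stands in for the audit envelope.

```typescript
// Hypothetical sketch of the default escalation and the reviewer gate.
type PolicyDecision = { effect: "escalate"; rule: "r07" } | { effect: "continue" };

function evaluateDefaultPolicy(action: string): PolicyDecision {
  // r07: destructive action requires human review.
  return action === "request_refund" ? { effect: "escalate", rule: "r07" } : { effect: "continue" };
}

const REVIEW_ROLES = new Set(["reviewer", "admin", "owner"]);

interface AuditEntry {
  mandateId: string;
  resolution: "approve" | "reject";
  role: string;
}

// Returns true when the refund dispatch should be enqueued.
function resolveEscalation(
  mandateId: string,
  role: string,
  resolution: "approve" | "reject",
  audit: AuditEntry[],
): boolean {
  if (!REVIEW_ROLES.has(role)) throw new Error(`role ${role} cannot resolve escalations`);
  // The resolution is recorded before any rail call could be dispatched.
  audit.push({ mandateId, resolution, role });
  return resolution === "approve";
}
```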

Rail selection and execution

A refund settles on the same rail that settled the original. The settlement-rail claim on the original mandate’s audit record drives dispatcher selection:

| Original rail | Refund executor | Status |
|---|---|---|
| `stripe` | `stripe-refund-executor` (calls Stripe Refunds API) | Live-mode validated at dogfood scope |
| `shopify` | `shopify-refund-executor` (calls Shopify Admin GraphQL) | Test-mode only; live-mode gate not flipped |

The closed set of refund-capable rails ({stripe, shopify}) is the single source of truth consumed by both:

The agent card backing predicate that decides whether request_refund may be advertised at all.
The post-commit dispatcher selection in the mandate queue worker.

Adding a future rail without wiring both paths is a compile error.
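One common way to turn a missing rail into a compile error is a single `as const` tuple plus an exhaustive switch whose default branch assigns to `never`. This is a sketch of that technique, not Sill's source. canAdvertiseRefund and refundExecutorFor are hypothetical names.

```typescript
// Sketch: one closed tuple feeds both the agent-card predicate and dispatcher selection.
const REFUND_RAILS = ["stripe", "shopify"] as const;
type RefundRail = (typeof REFUND_RAILS)[number];

function isRefundRail(rail: string): rail is RefundRail {
  return (REFUND_RAILS as readonly string[]).includes(rail);
}

// Agent card: advertise request_refund only if some configured rail can refund.
function canAdvertiseRefund(configuredRails: string[]): boolean {
  return configuredRails.some(isRefundRail);
}

// Dispatcher: adding a rail to REFUND_RAILS without a case here fails to compile,
// because the new rail is no longer assignable to `never` in the default branch.
function refundExecutorFor(rail: RefundRail): string {
  switch (rail) {
    case "stripe":
      return "stripe-refund-executor";
    case "shopify":
      return "shopify-refund-executor";
    default: {
      const unhandled: never = rail;
      throw new Error(`no refund executor for ${String(unhandled)}`);
    }
  }
}
```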

Stripe refunds

The Stripe path resolves the original charge_state row by (site_id, original_mandate_id) and requires the original charge to be in a paid state. It then calls Stripe’s refund endpoint against the recorded stripe_charge_id, or against the recorded payment intent when the charge id is unset. The merchant’s Connect account, taken from the stored integration, is passed in the Stripe-Account header.
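The sketch below builds the Stripe call without sending it. It is a form-encoded POST to https://api.stripe.com/v1/refunds with either charge or payment_intent, an amount in minor units, and the Stripe-Account header for the connected account. The ChargeStateRow shape and buildStripeRefundRequest are hypothetical. Check Stripe's Refunds API reference before relying on any parameter.

```typescript
// Sketch only: builds (but does not send) a Stripe Refunds API request for a Connect account.
interface ChargeStateRow {
  stripeChargeId: string | null;
  stripePaymentIntentId: string | null;
  paid: boolean;
}

interface StripeRefundRequest {
  url: string;
  headers: Record<string, string>;
  body: string; // application/x-www-form-urlencoded
}

function buildStripeRefundRequest(
  row: ChargeStateRow,
  amountMinor: number,
  connectAccountId: string,
  secretKey: string,
): StripeRefundRequest {
  if (!row.paid) throw new Error("original_not_charged");
  const params = new URLSearchParams();
  // Prefer the recorded charge id; fall back to the payment intent when it is unset.
  if (row.stripeChargeId) params.set("charge", row.stripeChargeId);
  else if (row.stripePaymentIntentId) params.set("payment_intent", row.stripePaymentIntentId);
  else throw new Error("no charge or payment intent recorded");
  params.set("amount", String(amountMinor));
  return {
    url: "https://api.stripe.com/v1/refunds",
    headers: {
      Authorization: `Bearer ${secretKey}`,
      "Stripe-Account": connectAccountId,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: params.toString(),
  };
}
```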

Live mode requires three independent gates: the SILL_STRIPE_LIVE_GATE_PASSED deploy-pipeline secret, the per-account stripe_mode = live row, and the merchant’s decrypted live credential. Test-mode and live-mode credentials are dual-bound — a live event signed with the test webhook secret (or vice versa) is rejected.
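The three live-mode gates form a plain conjunction. The dual binding can be expressed as agreement between the secret that verified a webhook and the event's livemode flag, which Stripe sets on every event. Both functions below are hypothetical sketches under those assumptions, and the input field names are illustrative.

```typescript
// Hypothetical inputs; field names are illustrative.
interface LiveGateInputs {
  deployGatePassed: boolean;          // SILL_STRIPE_LIVE_GATE_PASSED set by the deploy pipeline
  accountStripeMode: "test" | "live"; // per-account stripe_mode row
  liveCredential: string | null;      // merchant's decrypted live credential, if present
}

// The gates are independent; failing any one keeps the account out of live mode.
function liveModeAllowed(g: LiveGateInputs): boolean {
  return g.deployGatePassed && g.accountStripeMode === "live" && g.liveCredential !== null;
}

// Dual binding: the secret that verified the webhook must agree with the event's livemode flag.
function webhookModeConsistent(verifiedWith: "test" | "live", eventLivemode: boolean): boolean {
  return (verifiedWith === "live") === eventLivemode;
}
```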

Shopify refunds

The Shopify path mirrors the Stripe shape against shopify_order_state, resolves the order id from the original, and calls the Shopify Admin GraphQL refundCreate mutation. The live-mode gate for the Shopify rail is separate from the Stripe gate and has not been flipped — Shopify refunds today are test-mode only. See the overview for the honest bounds.

Idempotency and the refund-on-refund exclusion

Refunds live on a rail-agnostic refund_state table with two structural protections:

A UNIQUE (site_id, mandate_id) index makes the executor’s INSERT … ON CONFLICT DO NOTHING the idempotency lock for the refund mandate itself. Replaying the same mnd_… produces the same terminal state, never a second rail call.
A partial UNIQUE (site_id, original_mandate_id) WHERE state IN (non-failed…) index is the refund-on-refund exclusion: one non-failed refund per settled original, in v1. A second refund attempt against an already-refunded original raises a unique violation (SQLSTATE 23505). The executor then surfaces already_executed evidence with the existing stripe_refund_id echoed back, and again makes no second rail call.

Both gates run before any external API call. The executor never lets an exception escape to the queue worker. Every failure returns structured settlement evidence and a terminal refund_state row.
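The two indexes interact in a way that an in-memory simulation can show. This sketch is illustrative only. It assumes the partial index's "non-failed" set excludes refund_failed and every failed_* state. RefundStateTable is a hypothetical stand-in for the Postgres table, not Sill's code.

```typescript
// In-memory stand-in for the two unique indexes on refund_state (illustrative, not the real SQL).
interface RefundRow {
  siteId: string;
  mandateId: string;
  originalMandateId: string;
  state: string;
}

// Assumption: the partial index's "non-failed" set excludes refund_failed and every failed_* state.
const isFailed = (state: string): boolean => state === "refund_failed" || state.startsWith("failed_");

type Claim =
  | { kind: "claimed"; row: RefundRow }                 // new row; the executor may continue
  | { kind: "replay"; row: RefundRow }                  // same mandate again: return existing state
  | { kind: "already_executed"; existing: RefundRow };  // another live refund holds this original

class RefundStateTable {
  private readonly rows: RefundRow[] = [];

  // Mirrors INSERT ... ON CONFLICT (site_id, mandate_id) DO NOTHING plus the partial unique index.
  claim(siteId: string, mandateId: string, originalMandateId: string): Claim {
    const sameMandate = this.rows.find((r) => r.siteId === siteId && r.mandateId === mandateId);
    if (sameMandate) return { kind: "replay", row: sameMandate };

    const liveForOriginal = this.rows.find(
      (r) => r.siteId === siteId && r.originalMandateId === originalMandateId && !isFailed(r.state),
    );
    // In Postgres, this is where the partial index raises SQLSTATE 23505.
    if (liveForOriginal) return { kind: "already_executed", existing: liveForOriginal };

    const row: RefundRow = { siteId, mandateId, originalMandateId, state: "pending_local_init" };
    this.rows.push(row);
    return { kind: "claimed", row };
  }
}
```

Neither the replay branch nor the already_executed branch inserts a row, so neither can lead to a rail call. A failed refund releases the original for a fresh attempt under a new mandate.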

Refund-state machine

```text
pending_local_init
  ├─→ pending_stripe_call   ─→ refund_succeeded            (terminal)
  │                          ─→ refund_failed              (terminal)
  ├─→ pending_shopify_call  ─→ refund_succeeded            (terminal)
  │                          ─→ refund_failed              (terminal)
  ├─→ failed_tenancy_violation     (terminal — ZERO rail call)
  ├─→ failed_rate_limited          (terminal — ZERO rail call)
  ├─→ failed_original_not_charged  (terminal — ZERO rail call)
  └─→ failed_scope_unsupported     (terminal — v1 ships `full` only)
```

Every UPDATE carries a WHERE state IN (…) predicate, so a stale writer cannot regress a terminal row.
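The transitions above fit in a lookup table. The guarded UPDATE then becomes a check that the current state is a legal predecessor of the target. A sketch, with hypothetical names:

```typescript
type RefundState =
  | "pending_local_init" | "pending_stripe_call" | "pending_shopify_call"
  | "refund_succeeded" | "refund_failed"
  | "failed_tenancy_violation" | "failed_rate_limited"
  | "failed_original_not_charged" | "failed_scope_unsupported";

// Allowed next states; terminal states have none.
const NEXT: Record<RefundState, readonly RefundState[]> = {
  pending_local_init: [
    "pending_stripe_call", "pending_shopify_call", "failed_tenancy_violation",
    "failed_rate_limited", "failed_original_not_charged", "failed_scope_unsupported",
  ],
  pending_stripe_call: ["refund_succeeded", "refund_failed"],
  pending_shopify_call: ["refund_succeeded", "refund_failed"],
  refund_succeeded: [],
  refund_failed: [],
  failed_tenancy_violation: [],
  failed_rate_limited: [],
  failed_original_not_charged: [],
  failed_scope_unsupported: [],
};

// Mirrors UPDATE ... SET state = $next WHERE state IN (<predecessors of $next>):
// returns false (zero rows updated) when the current state is not a legal predecessor.
function guardedUpdate(row: { state: RefundState }, next: RefundState): boolean {
  if (!NEXT[row.state].includes(next)) return false;
  row.state = next;
  return true;
}
```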

Webhook reconciliation

Stripe also emits asynchronous events that affect refund state independently of Sill’s executor:

| Event | Handler outcome |
|---|---|
| `charge.refunded` | Marks the original `charge_state` as `refunded`; sets `refunded: true` and `refunded_minor` on the original audit record’s `discovery_context.settlement`. |
| `charge.dispute.created` | Marks the original `charge_state` as `disputed`; records the dispute reason and amount; logged at `error` severity for ops attention within Stripe’s evidence-submission window. |

Both handlers require the connected-account id on the event to match the integration on file. On a mismatch, the handler logs at critical severity and acknowledges with HTTP 200 without mutating any state. A webhook for a charge Sill did not create is likewise acknowledged without a write.
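A sketch of the reconciliation rules follows. It assumes the Stripe signature was already verified, and it returns the mutation to apply rather than writing it. The fields read from data.object follow Stripe's Charge object (amount_refunded) and Dispute object (charge, reason, amount). The dispute_reason and dispute_amount_minor keys and the function name are hypothetical.

```typescript
// Sketch of Connect webhook reconciliation. Assumes the Stripe signature has already been verified.
interface StripeEventLike {
  type: string;
  account?: string; // connected account id on Connect events
  data: {
    object: { id: string; amount_refunded?: number; charge?: string; reason?: string; amount?: number };
  };
}

type Severity = "critical" | "error";
type ChargeMutation = {
  chargeId: string;
  status: "refunded" | "disputed";
  settlement: Record<string, unknown>;
};

// Returns the mutation to apply (or null). The response is always 200, which tells Stripe not to retry.
function reconcileWebhook(
  event: StripeEventLike,
  integrationAccountId: string,
  knownChargeIds: Set<string>,
  log: (severity: Severity, message: string) => void,
): { httpStatus: 200; mutation: ChargeMutation | null } {
  const ack = (mutation: ChargeMutation | null) => ({ httpStatus: 200 as const, mutation });

  if (event.account !== integrationAccountId) {
    log("critical", `connected account mismatch on ${event.type}`);
    return ack(null);
  }
  const obj = event.data.object;
  if (event.type === "charge.refunded") {
    // data.object is a Charge.
    if (!knownChargeIds.has(obj.id)) return ack(null); // not a charge Sill created
    return ack({
      chargeId: obj.id,
      status: "refunded",
      settlement: { refunded: true, refunded_minor: obj.amount_refunded ?? 0 },
    });
  }
  if (event.type === "charge.dispute.created") {
    // data.object is a Dispute; its charge field points at the disputed charge.
    const chargeId = obj.charge;
    if (chargeId === undefined || !knownChargeIds.has(chargeId)) return ack(null);
    log("error", `dispute opened on ${chargeId}: ${obj.reason ?? "unknown"}`);
    return ack({
      chargeId,
      status: "disputed",
      settlement: { dispute_reason: obj.reason, dispute_amount_minor: obj.amount },
    });
  }
  return ack(null);
}
```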

What the audit envelope records

The refund audit record carries — under discovery_context.settlement — a kind: "refund" evidence block discriminated by rail:

```json
{
  "rail": "stripe",
  "kind": "refund",
  "outcome": "succeeded",
  "dispatched_at": "2026-06-22T14:31:08Z",
  "original_mandate_id": "mnd_01J9C8P7K4V2H5F6Q8R9T3N1M2",
  "stripe_refund_id": "re_3Ti4JaEAXJFotMa3...",
  "stripe_charge_id": "ch_3Ti4JaEAXJFotMa3...",
  "stripe_payment_intent_id": "pi_3Ti4JaEAXJFotMa3...",
  "stripe_mode": "live",
  "refund_state_id": "rfs_01J9F5...",
  "refunded_minor": 150,
  "original_total_minor": 150,
  "currency": "USD",
  "reason_code": "requested_by_customer"
}
```

The record is ed25519-signed and Merkle-chained alongside every other mandate the site processes. It is independently verifiable against the public JWKS using the same JCS + detached JWS recipe the agent card and ARD catalog use.
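Signature verification can be sketched with Node's built-in crypto. The sketch assumes the recipe is RFC 8785 (JCS) canonicalization of the record plus an RFC 7515 Appendix F detached JWS. In that format the compact form is header..signature, and the signing input uses the base64url-encoded canonical payload. Both assumptions and all helper names are illustrative, and the sketch covers only the signature, not the Merkle chain. In practice the public key would come from the JWKS entry, for example via createPublicKey({ key: jwk, format: "jwk" }).

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Minimal JCS-style canonicalizer: sorted keys (UTF-16 code-unit order, which is what
// the default sort uses) and ECMAScript JSON serialization for primitives.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    if (typeof value === "number" && !Number.isFinite(value)) throw new Error("non-finite number");
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(",")}}`;
}

const b64url = (buf: Buffer): string => buf.toString("base64url");

// Illustrative signer, included so the verifier can be exercised end to end.
function signDetachedJws(record: Json, privateKey: KeyObject, kid: string): string {
  const protectedB64 = b64url(Buffer.from(JSON.stringify({ alg: "EdDSA", kid }), "utf8"));
  const payloadB64 = b64url(Buffer.from(canonicalize(record), "utf8"));
  const signature = sign(null, Buffer.from(`${protectedB64}.${payloadB64}`, "ascii"), privateKey);
  return `${protectedB64}..${b64url(signature)}`; // payload segment left empty (detached)
}

function verifyDetachedJws(record: Json, detachedJws: string, publicKey: KeyObject): boolean {
  const parts = detachedJws.split(".");
  if (parts.length !== 3 || parts[1] !== "") return false;
  const [protectedB64, , signatureB64] = parts;
  try {
    const header = JSON.parse(Buffer.from(protectedB64, "base64url").toString("utf8"));
    if (header.alg !== "EdDSA") return false;
  } catch {
    return false;
  }
  // Re-derive the payload from the record itself, so any change to the record breaks verification.
  const payloadB64 = b64url(Buffer.from(canonicalize(record), "utf8"));
  const signingInput = Buffer.from(`${protectedB64}.${payloadB64}`, "ascii");
  return verify(null, signingInput, publicKey, Buffer.from(signatureB64, "base64url"));
}

// Demo with a throwaway Ed25519 key pair.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const evidence = { rail: "stripe", kind: "refund", outcome: "succeeded", refunded_minor: 150 };
const jws = signDetachedJws(evidence, privateKey, "demo-kid");
console.log(verifyDetachedJws(evidence, jws, publicKey)); // true
console.log(verifyDetachedJws({ ...evidence, refunded_minor: 15000 }, jws, publicKey)); // false
```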

Outcome values: succeeded, reconciled, failed, aborted, already_executed, approved_but_rail_disabled. Reason values (on a non-success outcome) include the executor’s terminal reason — scope_unsupported, original_not_charged, already_executed, rate_limited_by_self — or a Stripe / Shopify ConnectorErrorClass.
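Outcomes form a closed set, so a consumer of the audit record can reject unknown values outright. The following is a minimal parser sketch. The Shopify evidence fields are not documented on this page, so only the common fields are typed. parseRefundEvidence and RefundEvidence are hypothetical names.

```typescript
const REFUND_OUTCOMES = [
  "succeeded", "reconciled", "failed", "aborted", "already_executed", "approved_but_rail_disabled",
] as const;
type RefundOutcome = (typeof REFUND_OUTCOMES)[number];

// Common fields only; rail-specific ids (e.g. stripe_refund_id) pass through untyped.
interface RefundEvidence {
  rail: "stripe" | "shopify";
  kind: "refund";
  outcome: RefundOutcome;
  original_mandate_id: string;
  reason?: string; // executor terminal reason or connector error class on non-success outcomes
  [extra: string]: unknown;
}

function parseRefundEvidence(raw: unknown): RefundEvidence {
  if (typeof raw !== "object" || raw === null) throw new Error("evidence must be an object");
  const e = raw as Record<string, unknown>;
  if (e.kind !== "refund") throw new Error("not a refund evidence block");
  if (e.rail !== "stripe" && e.rail !== "shopify") throw new Error(`unknown rail: ${String(e.rail)}`);
  if (!(REFUND_OUTCOMES as readonly unknown[]).includes(e.outcome)) {
    throw new Error(`unknown outcome: ${String(e.outcome)}`);
  }
  if (typeof e.original_mandate_id !== "string") throw new Error("original_mandate_id missing");
  if (e.reason !== undefined && typeof e.reason !== "string") throw new Error("reason must be a string");
  return e as RefundEvidence;
}
```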

Honest bounds

Stripe live rail. The signed-mandate → policy → refund → audit pipeline has cleared and refunded real live-mode Stripe charges end-to-end on a single Sill-controlled merchant. This is dogfood validation, not multi-merchant production refund volume.
Shopify live rail. Not flipped. Refunds against Shopify orders run test-mode only today. Flipping the live-mode gate is a separate founder/ops decision and requires Shopify Payments in live mode plus the corresponding processor agreement.
Scope. Refund executors today ship the full scope only. The mandate shape accepts { "line_items": [...] } but the rail aborts with failed_scope_unsupported.
One non-failed refund per original. Partial / multi-tranche refunds are not in v1.
PCI posture. Refunds never see a raw PAN. Sill handles only opaque processor tokens (Stripe pm_* / charge ids / payment-intent ids). Architecture and a CI grep gate enforce this; Sill holds no PCI attestation today and claims none.

Common questions

Can an agent refund any amount it wants?

No. The agent’s max_amount is a cap. The actual refund amount is min(round(max_amount * 100), original.amount_minor), computed server-side from the original charge_state or shopify_order_state row. The agent has no way to assert a fabricated original total.

What if the same refund mandate is replayed?

The (site_id, mandate_id) unique index makes the executor’s INSERT-first idempotency lock fire — the second attempt returns the existing terminal state and produces no second rail call.

What if a second refund is requested against an already-refunded original?

The partial unique index on (site_id, original_mandate_id) raises a constraint violation; the executor surfaces already_executed evidence with the existing stripe_refund_id echoed back. No second rail call. Multi-tranche refunds are not in v1.

What happens if the original mandate was never charged?

The refund aborts with failed_original_not_charged and a signed audit record describing the abort. No rail call runs.

Can refunds skip human review?

In the default policy set, request_refund matches rule r07 (destructive action) and escalates. A merchant policy can be authored to approve refunds automatically under specific conditions, but the shipped default routes every refund through a reviewer.

What about disputes and chargebacks?

charge.dispute.created is handled at the webhook layer: the original charge_state is marked disputed, the dispute reason and amount are recorded on the original audit record’s settlement evidence, and the event is logged at error severity for ops attention. Sill does not generate dispute evidence on the merchant’s behalf — that is the merchant’s responsibility inside the Stripe Dashboard.