Identifying agents
Sill identifies AI-agent traffic at the edge by matching each request’s identification signals — primarily the User-Agent string and an optional Sill-defined client hint — against a seeded identity registry of known agents (Anthropic, OpenAI, Google, and others). Every visiting client is then placed into one of three classes (matched_agent, unknown_agent, human_likely) and the classification is persisted onto the signed audit record, so the dashboard reads a stable value rather than re-classifying at query time.
Identification is informational, not authoritative. A malicious actor can claim any User-Agent. Discovery surfaces who the client says it is; cryptographic authorization for actions (ed25519-signed mandates) is the Transactional path.
The identity registry
The registry is a small set of AgentIdentitySeed records. Each record carries a stable agent_id, an organization, a user_agent_pattern (a regex), and, optionally, a sec_ch_ua_sill_agent exact-match string. In the framing of the A2A spec ecosystem and Google’s AP2 mandate work, this is the identity layer; the intent and proof layers live in the Transactional mode.
The registry is loaded from two sources, merged with a strict precedence rule:
- Bundled registry (floor). A read-only set compiled into the edge build. It is always present.
- Managed KV namespace (additive only). Sill’s origin may add new agents under unseen agent_ids by writing to a Workers KV namespace bound at the edge. KV can never override or remove a bundled entry. A collision is ignored and logged.
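The precedence rule can be sketched as a small merge function. This is a minimal illustration, not Sill's loader: the `mergeRegistry` helper, the `MergeResult` shape, and the example patterns are assumptions, and only the field names come from the record description above.

```typescript
// Field names follow the AgentIdentitySeed description; the exact
// production type is an assumption.
interface AgentIdentitySeed {
  agent_id: string;
  organization: string;
  user_agent_pattern: string; // regex source
  sec_ch_ua_sill_agent?: string; // optional exact-match client hint
}

interface MergeResult {
  merged: AgentIdentitySeed[];
  ignoredCollisions: string[]; // KV agent_ids that were already taken
}

// Hypothetical helper: bundled entries are the floor, KV may only add
// entries under agent_ids that have not been seen yet.
function mergeRegistry(
  bundled: AgentIdentitySeed[],
  kvEntries: AgentIdentitySeed[]
): MergeResult {
  const seen: Record<string, boolean> = {};
  const merged: AgentIdentitySeed[] = [];
  const ignoredCollisions: string[] = [];
  const has = (id: string): boolean =>
    Object.prototype.hasOwnProperty.call(seen, id);

  for (const seed of bundled) {
    seen[seed.agent_id] = true;
    merged.push(seed);
  }
  for (const seed of kvEntries) {
    if (has(seed.agent_id)) {
      // Collision: the existing entry stays; the caller can log the id.
      ignoredCollisions.push(seed.agent_id);
      continue;
    }
    seen[seed.agent_id] = true;
    merged.push(seed);
  }
  return { merged, ignoredCollisions };
}
```

Because the bundled entries are inserted first and never replaced, a bad KV write can at worst fail to add an agent; it cannot change how a bundled agent is matched.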
A boot-time canary asserts that the named-organization anchors (agent_anthropic_claude, agent_openai_gpt, agent_google_gemini) are present on every isolate. A missing canary id emits an error-level structured log; serving behavior is unchanged, because the bundled floor is the safety net.
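A canary check of this kind might look like the sketch below. The three anchor ids and the log field names (registry_canary_missing, missing_agent_ids, loaded_count, kv_present) are taken from this page; the `checkCanary` function, the `CanaryLog` type, and the `level` field name are assumptions made for illustration.

```typescript
const CANARY_AGENT_IDS: string[] = [
  "agent_anthropic_claude",
  "agent_openai_gpt",
  "agent_google_gemini",
];

interface CanaryLog {
  level: "error"; // field name is an assumption
  event: "registry_canary_missing";
  missing_agent_ids: string[];
  loaded_count: number;
  kv_present: boolean;
}

// Hypothetical helper: returns a log payload when an anchor is missing,
// or null when every anchor is present. It never changes serving behavior.
function checkCanary(
  loadedAgentIds: string[],
  kvPresent: boolean
): CanaryLog | null {
  const missing = CANARY_AGENT_IDS.filter(
    (id) => loadedAgentIds.indexOf(id) === -1
  );
  if (missing.length === 0) {
    return null;
  }
  return {
    level: "error",
    event: "registry_canary_missing",
    missing_agent_ids: missing,
    loaded_count: loadedAgentIds.length,
    kv_present: kvPresent,
  };
}
```

The payload carries only ids and counts, which keeps patterns and keys out of the logs.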
Seeded agents
The bundled registry covers the agent-mode traffic Sill expects to see on merchant sites today, plus the social link-preview crawlers that commonly appear in audit logs (so the merchant does not have to triage them as “unknown agent”):
| Organization | Agent id | Matches |
|---|---|---|
| Anthropic | agent_anthropic_claude | ClaudeBot (training crawler) |
| Anthropic | agent_anthropic_claude_user | Claude-User (user-initiated fetch) |
| OpenAI | agent_openai_gpt | GPTBot (training crawler) |
| OpenAI | agent_openai_chatgpt_user | ChatGPT-User (browse tool) |
| OpenAI | agent_openai_chatgpt | ChatGPT/... (mobile in-app fetch) |
| OpenAI | agent_openai_searchbot | OAI-SearchBot (ChatGPT Search) |
| Google | agent_google_gemini | Google-Extended |
| Perplexity | agent_perplexity | PerplexityBot |
| Microsoft | agent_microsoft_bingbot | bingbot (also grounds Copilot) |
| Meta | agent_meta_externalagent | meta-externalagent (Llama training) |
| Meta | agent_meta_externalfetcher | meta-externalfetcher (Meta AI user fetch) |
| Apple | agent_apple_extended | Applebot-Extended |
| DuckDuckGo | agent_duckduckgo_assistbot | DuckAssistBot |
| Mistral | agent_mistral_user | MistralAI-User |
| X (Twitter) | agent_x_twitterbot | Twitterbot (link preview) |
| Meta | agent_meta_facebook_external | facebookexternalhit (link preview) |
| Meta | agent_meta_facebot | Facebot (link preview) |
| LinkedIn | agent_linkedin_bot | LinkedInBot (link preview) |
| Slack | agent_slack_link_expander | Slackbot-LinkExpanding (link unfurl) |
| Discord | agent_discord_bot | Discordbot (embed preview) |
Note that ChatGPT Atlas (OpenAI’s agentic browser) sends a standard Chrome User-Agent with no public, cryptographically signed identity. OpenAI’s own documentation states Atlas “cannot be reliably detected with simple user agent filters.” Atlas visits will appear as human_likely and cannot be distinguished from a real Chrome user without a signed-mandate handshake (the Transactional path).
Match precedence
Given a visiting request, the matcher tries signals in this order:
1. Sec-CH-UA-Sill-Agent exact match. A Sill-defined client hint that a registered agent (or the embed script’s own propagation) can send to assert identity. This is the strongest signal: an exact match against the registry wins immediately.
2. User-Agent regex match. Each registry entry’s user_agent_pattern is compiled to a RegExp per snapshot (cached for the lifetime of the loaded registry). The pattern uses a \b word-boundary anchor so Mozilla-wrapped UAs like Mozilla/5.0 ... (compatible; bingbot/2.0; ...) still match. If multiple patterns hit, the longest source pattern wins (most specific).
3. No match. The discovery record still ships, with agent_id omitted and match_signal = 'none'.
```mermaid
flowchart TD
    A[Inbound request] --> B{sec_ch_ua_sill_agent<br/>exact match?}
    B -- yes --> M[matched agent]
    B -- no --> C{User-Agent regex<br/>match?}
    C -- one or more --> D[longest pattern wins] --> M
    C -- none --> N[no agent_id]
    M --> K[classify visitor and append to signed audit log]
    N --> K
```
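The precedence order can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not the production matcher: `compileRegistry`, `matchAgent`, the example patterns, the unquoted hint value, and the "sec_ch_ua_sill_agent" value for match_signal are all hypothetical. Only "user_agent_pattern" and "none" appear as match_signal values on this page.

```typescript
interface AgentIdentitySeed {
  agent_id: string;
  organization: string;
  user_agent_pattern: string;
  sec_ch_ua_sill_agent?: string;
}

interface CompiledSeed {
  seed: AgentIdentitySeed;
  regex: RegExp;
}

interface MatchResult {
  matched: boolean;
  agent_id?: string;
  // "sec_ch_ua_sill_agent" is an assumed value for the client-hint signal.
  match_signal: "sec_ch_ua_sill_agent" | "user_agent_pattern" | "none";
}

// Compile each pattern once per registry snapshot.
function compileRegistry(seeds: AgentIdentitySeed[]): CompiledSeed[] {
  return seeds.map((seed) => ({
    seed,
    regex: new RegExp(seed.user_agent_pattern),
  }));
}

function matchAgent(
  compiled: CompiledSeed[],
  userAgent: string,
  sillAgentHint?: string
): MatchResult {
  // 1. Client hint: an exact match wins immediately.
  if (sillAgentHint !== undefined) {
    for (const c of compiled) {
      if (c.seed.sec_ch_ua_sill_agent === sillAgentHint) {
        return {
          matched: true,
          agent_id: c.seed.agent_id,
          match_signal: "sec_ch_ua_sill_agent",
        };
      }
    }
  }
  // 2. User-Agent regex: among all hits, the longest source pattern wins.
  let best: CompiledSeed | null = null;
  for (const c of compiled) {
    if (!c.regex.test(userAgent)) {
      continue;
    }
    if (
      best === null ||
      c.seed.user_agent_pattern.length > best.seed.user_agent_pattern.length
    ) {
      best = c;
    }
  }
  if (best !== null) {
    return {
      matched: true,
      agent_id: best.seed.agent_id,
      match_signal: "user_agent_pattern",
    };
  }
  // 3. No match: agent_id is omitted.
  return { matched: false, match_signal: "none" };
}

// Illustrative patterns only; these are not the bundled registry's patterns.
const exampleRegistry = compileRegistry([
  { agent_id: "agent_microsoft_bingbot", organization: "Microsoft", user_agent_pattern: "\\bbingbot\\b" },
  { agent_id: "agent_openai_chatgpt", organization: "OpenAI", user_agent_pattern: "\\bChatGPT\\b" },
  { agent_id: "agent_openai_chatgpt_user", organization: "OpenAI", user_agent_pattern: "\\bChatGPT-User\\b", sec_ch_ua_sill_agent: "ChatGPT-User" },
]);
```

With these example patterns, a User-Agent containing ChatGPT-User/1.0 hits both OpenAI patterns, and the longer \bChatGPT-User\b pattern decides the result.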
Visitor classification
The MATCHED vs NOT-MATCHED binary is too coarse for triage: a human browsing the site and a never-before-seen AI agent both register as not-matched, but a merchant treats them differently. The edge runs a second pass, classifyVisitor, that assigns each record one of three visitor_class values:
- matched_agent: the registry matched. The agent is identified; the merchant already knows who this is.
- unknown_agent: bot-shaped but unrecognized. Worth investigating. This bucket also captures any client that sent a Sec-CH-UA-Sill-Agent claim Sill could not match (an attempted-but-unverified self-identification is surfaced, never laundered as human).
- human_likely: a Mozilla/-prefixed User-Agent carrying a known browser engine token (Chrome, Safari, Firefox, Edge, Opera, Version) and no bot keyword. Background noise; the merchant can ignore it.
The heuristic is conservative: anything ambiguous stays unknown_agent. False positives toward “unknown agent” are recoverable; false negatives toward “human” would mask a real bot.
visitor_class is persisted onto the audit draft at ingest so dashboard queries read a stable value. If the heuristic changes later, only new records pick up the new classification — there is no retroactive reclassification of past records.
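A conservative classifier along these lines could be sketched as follows. The function name classifyVisitor and the three class names come from this page; the input shape, the exact engine-token regex, and especially the bot-keyword list are assumptions chosen for illustration.

```typescript
type VisitorClass = "matched_agent" | "unknown_agent" | "human_likely";

interface ClassifyInput {
  registryMatched: boolean;
  userAgent: string;
  sillAgentHint?: string; // raw Sec-CH-UA-Sill-Agent value, if one was sent
}

// Engine tokens as listed in the text; the regex form is an assumption.
const ENGINE_TOKEN = /\b(?:Chrome|Safari|Firefox|Edge|Opera|Version)\b/;
// Hypothetical bot-keyword list; the real list is not documented here.
const BOT_KEYWORD = /bot|crawl|spider/i;

function classifyVisitor(input: ClassifyInput): VisitorClass {
  if (input.registryMatched) {
    return "matched_agent";
  }
  // An unmatched self-identification claim is never treated as human.
  if (input.sillAgentHint !== undefined && input.sillAgentHint !== "") {
    return "unknown_agent";
  }
  const ua = input.userAgent;
  const looksLikeBrowser =
    ua.indexOf("Mozilla/") === 0 &&
    ENGINE_TOKEN.test(ua) &&
    !BOT_KEYWORD.test(ua);
  // Anything ambiguous stays unknown_agent.
  return looksLikeBrowser ? "human_likely" : "unknown_agent";
}
```

Every condition for human_likely must hold at once, so a failure of any single check falls through to unknown_agent, which is the recoverable direction for an error.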
What gets recorded
For each Discovery beacon, the edge writes a DiscoveryDraftRecord to the replication queue. The identity-relevant fields on that record are:
```json
{
  "schema_version": 1,
  "draft_id": "drf_01J9...",
  "site_id": "01J9...",
  "evaluated_at": "2026-06-22T18:04:11.219Z",
  "observed_at": "2026-06-22T18:04:11.190Z",
  "identification_match": {
    "matched": true,
    "agent_id": "agent_openai_chatgpt_user",
    "match_signal": "user_agent_pattern",
    "visitor_class": "matched_agent"
  },
  "identification_input": {
    "surface": "embed",
    "user_agent": "Mozilla/5.0 ... ChatGPT-User/webprod-20260601",
    "client_hints": {
      "sec_ch_ua_sill_agent": "\"ChatGPT-User\""
    },
    "referrer_origin": "https://example-merchant.com"
  },
  "edge_meta": {
    "cf_ray": "8a1f9c2d4e0e6c3a-IAD",
    "cf_colo": "IAD",
    "worker_version": "b80915eb"
  }
}
```

The origin’s consumer drains the queue, validates the draft, and persists it into the signed, Merkle-chained audit envelope. The record is then visible in the dashboard’s audit log and exportable as part of the audit bundle. See Audit log and export.
What is not recorded
The Discovery beacon is deliberately narrow. Sill does not record raw IPs, full URLs, or query strings on the draft record:
- The Cloudflare colo code (cf_colo, e.g. IAD) is the only location signal kept, sourced from request.cf.colo. The cf-ipcountry header is explicitly avoided.
- page_context.page_path_hash is omitted in Phase 1; only the referrer_origin (scheme + host) is kept.
- An agent_card_claim.jws_compact (if presented) is recorded verbatim into the draft but not verified at the edge in Discovery mode. Signature verification of signed mandates is the Transactional path.
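Reducing a referrer to scheme + host can be done with the standard URL API, as in the sketch below. The `toReferrerOrigin` helper is hypothetical, and restricting it to http and https is an assumption. Note that URL.origin keeps a non-default port, which may or may not match what Sill stores.

```typescript
// Hypothetical helper: keep only the origin of a referrer URL, dropping
// the path, query string, fragment, and any embedded credentials.
function toReferrerOrigin(referrer: string): string | null {
  try {
    const url = new URL(referrer);
    // Assumption: only web origins are meaningful for a merchant page.
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      return null;
    }
    return url.origin;
  } catch {
    // Unparseable input yields no referrer_origin at all.
    return null;
  }
}
```

For example, https://example-merchant.com/products/42?utm_source=x reduces to https://example-merchant.com, so neither the page path nor the query string reaches the draft record.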
What appears in the dashboard
Each matched_agent record renders with the agent’s organization label. unknown_agent renders as a generic bot glyph (a triage prompt). human_likely renders as a generic person glyph and is filtered out of agent-only views.
Audit-log rows, showing how the three visitor classes render side by side.
Reporting view: per-agent breakdown over the selected window, keyed on visitor_class.
Operational behavior
A few details that matter when reading the live system:
- In-isolate cache. The merged registry is cached for 5 minutes per Workers isolate. A KV-managed addition is visible at the edge within 5 minutes (plus deploy and KV-propagation lag).
- Safe degradation. If the KV binding is absent, or the KV list throws, the loader returns the bundled floor. Identification continues against the named-organization anchors.
- Per-isolate canary. A missing canary id emits event: registry_canary_missing at error once per isolate, with missing_agent_ids, loaded_count, and kv_present, but never seed bodies, patterns, or keys.
- Validator drops. A KV record is dropped (and the floor absorbs the loss) if it is malformed, if its user_agent_pattern exceeds 256 chars, or if the pattern matches the ReDoS-prone nested-quantifier shape ((...+)+, (...*)*, etc.). Bundled patterns are not subject to this check; they are in-tree and reviewed.
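The validator drops could be sketched as below. The 256-character limit and the nested-quantifier shape come from the list above; the `validateKvSeed` helper, the `DropReason` names, and the detection regex are assumptions. The regex is a simple heuristic (a group that contains + or * and is itself followed by + or *), not a complete ReDoS analysis.

```typescript
const MAX_PATTERN_LENGTH = 256;

// Hypothetical heuristic for shapes such as (a+)+ or (.*)*.
const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*]/;

type DropReason = "malformed" | "pattern_too_long" | "redos_shape";

// Returns the reason a KV record should be dropped, or null to accept it.
function validateKvSeed(raw: unknown): DropReason | null {
  if (typeof raw !== "object" || raw === null) {
    return "malformed";
  }
  const rec = raw as Record<string, unknown>;
  const agentId = rec["agent_id"];
  const organization = rec["organization"];
  const pattern = rec["user_agent_pattern"];
  if (
    typeof agentId !== "string" ||
    typeof organization !== "string" ||
    typeof pattern !== "string"
  ) {
    return "malformed";
  }
  if (pattern.length > MAX_PATTERN_LENGTH) {
    return "pattern_too_long";
  }
  if (NESTED_QUANTIFIER.test(pattern)) {
    return "redos_shape";
  }
  try {
    new RegExp(pattern); // a pattern that does not compile is malformed
  } catch {
    return "malformed";
  }
  return null;
}
```

Dropping a record only removes a KV addition, so a rejected entry leaves identification running against the bundled floor.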
Frequently asked
Can a hostile party impersonate a named agent by spoofing the User-Agent?
Yes, for Discovery. Identification at the edge is informational — anyone can send any User-Agent. Sill never elevates an identification signal to authorization for Discovery. For actions that move money, the signed mandate path requires an ed25519 signature from a registered key, verified against the registry’s public_keys.
What if a new agent appears that is not in the registry?
It will be classified as unknown_agent and recorded. Operators can add it to the KV-managed registry, after which the edge picks it up within the 5-minute cache window. Bundled entries are read-only; updates to a named-organization anchor ship via an edge redeploy.
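The 5-minute window can be pictured as a per-isolate TTL cache around the registry loader. The sketch below is an assumption about the general technique, not Sill's code: `createRegistryCache` is hypothetical, the loader is synchronous for brevity (a real KV read would be asynchronous), and the clock is injected so the behavior is easy to test.

```typescript
const REGISTRY_TTL_MS = 5 * 60 * 1000; // 5 minutes, per the text

interface CachedRegistry<T> {
  value: T;
  loadedAt: number;
}

// Hypothetical helper: state captured in the closure lives only as long
// as the isolate that created it.
function createRegistryCache<T>(load: () => T, now: () => number): () => T {
  let cached: CachedRegistry<T> | null = null;
  return function get(): T {
    const t = now();
    let entry = cached;
    if (entry === null || t - entry.loadedAt >= REGISTRY_TTL_MS) {
      // First use or expired snapshot: reload and remember when.
      entry = { value: load(), loadedAt: t };
      cached = entry;
    }
    return entry.value;
  };
}
```

An addition written to KV just after a reload is therefore not seen by that isolate until the snapshot expires, which is where the 5-minute bound comes from.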
Why does Bingbot appear under “AI agents”?
Bingbot is the canonical Bing crawler and the retrieval surface that grounds Microsoft Copilot’s web answers. Treating it as an identifiable agent makes Copilot-driven traffic legible to merchants.
Why do link-preview crawlers (Twitterbot, Slack, Discord, Facebook) appear in the registry?
They are not AI agents, but they commonly appear in audit logs when a user shares a merchant URL. Identifying them as link-preview crawlers means the merchant does not need to triage them as unknown agents.
Are the agent identifications signed?
The identification of a visiting agent is not itself signed (the agent provides a User-Agent; Sill matches). The audit record that captures the identification is part of the append-only, ed25519-signed, Merkle-chained audit envelope. The per-site agent card Sill publishes on behalf of the merchant is also signed and independently verifiable.
See also
- Embed script — how the identification signals reach Sill’s edge.
- Agent card — the per-site signed agent card Sill publishes for inbound agents to read.
- MCP server — Sill’s per-site MCP endpoint, also signed at the agent-card pointer.
- Audit log and export — where identified agents appear in the dashboard and how to export the record.
- Audit envelope — the signed, Merkle-chained log that holds the records.
- Verify a signature — independently verifying any Sill-signed surface.
- External: A2A protocol, Google AP2 mandates, Model Context Protocol, RFC 8032 (ed25519), OWASP LLM Top 10, MITRE ATLAS.