Trust Spoke · Cloudflare
SPOKE · TRUST + IDENTITY + COMPLIANCE

Cloudflare AI Agent Verification — Bot Management, Web Bot Auth, and the Trust Layer Between Your Store and the Open Internet.

User-Agent headers can be spoofed by any scraper in a single line of code. IP ranges are shared across cloud tenants and rotate without notice. robots.txt is a policy declaration, not an enforcement mechanism — bad actors read it and ignore it. This is the bot identity problem at the edge, and it is the direct reason Cloudflare AI agent verification exists. By combining cryptographic HTTP Message Signatures under RFC 9421 with ML bot scoring, the Verified Bots program, and the new Signed Agents classification, Cloudflare turns the question "is this really an OpenAI buyer agent?" from a probabilistic guess into a cryptographically verifiable answer — before the request ever reaches your origin. This spoke covers the edge access control and bot identity layer. For application-layer identity — authorizing what an authenticated agent is allowed to do once it clears the edge — see OAuth for Agents. Three honest data points going in: RFC 9421 is a ratified IETF Proposed Standard (February 2024); Web Bot Auth is an Internet-Draft at revision -05 and is not yet an RFC; Pay-Per-Crawl is in private beta with no published GA date.

RFC 9421: ratified Feb 2024; the signing standard under Web Bot Auth
1–99: bot score range; ML + Heuristics + JSD on every request
100K: Workers free-tier requests/day; MCP at the edge, no bill
Aug '25: Signed Agents launched (ChatGPT agent, Goose, Browserbase)
§1 · The Bot Identity Problem at the Edge

Why User-Agent strings and IP ranges are not authentication.

When a request arrives at your Cloudflare-proxied origin with User-Agent: ChatGPT-User, you have a genuine identity problem. The User-Agent header is self-declared and unenforceable: any Python script can write that string in one line. Historically, operators supplemented user-agent inspection with IP address range validation — checking whether the source IP belongs to a CIDR block published by OpenAI, Anthropic, or Google. That mechanism is structurally brittle.
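
To make the "one line" claim concrete, here is a minimal sketch using only Python's standard library. The URL is a placeholder and the request is built but never sent; the point is only that the header value is whatever the client chooses to write.

```python
import urllib.request

# Build (but do not send) a request that claims to be ChatGPT-User.
# www.example.com is a placeholder; nothing here touches the network.
req = urllib.request.Request(
    "https://www.example.com/products/running-shoes",
    headers={"User-Agent": "ChatGPT-User"},
)

# urllib stores header names via str.capitalize(), hence "User-agent".
claimed_identity = req.get_header("User-agent")
print(claimed_identity)
```

Nothing in the request proves the claim, which is why the rest of this spoke relies on signatures rather than on header inspection.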

IP ranges inside a major cloud provider's address space are shared among many tenants, and individual addresses are reassigned without warning as the underlying fleet rotates. Any scraper operating from the same cloud provider, or routing through a VPN or residential proxy network, can appear to originate from the same CIDR blocks as a legitimate crawler. As Cloudflare's engineering team stated directly in their May 2025 Web Bot Auth announcement: "the logic around IP address ranges representing a product or group of users is brittle." That is not a competitive claim; it is an accurate description of how shared cloud IP space actually works.

robots.txt compounds the enforcement gap. It is a policy document — it declares what bots should do under a voluntary social convention. It cannot prevent any actor from reading it and ignoring it. A scraper that spoofs ChatGPT-User and disregards your robots.txt directives has violated a convention, not bypassed any technical control.

The Practical Consequence

For a US merchant running a Shopify or WooCommerce storefront: every pricing page, product catalog, and customer review section is exposed to traffic that looks like a legitimate AI buyer agent but may be a competitor's data harvesting script. Without cryptographic verification at the edge, there is no reliable signal to distinguish them at request time.

This is the gap that Cloudflare AI agent verification fills. The trust layer operates at the reverse proxy — before traffic touches your origin — and stacks three reinforcing mechanisms: cryptographic HTTP Message Signatures (Web Bot Auth), machine-learning bot scoring, and the Verified Bots / Signed Agents programs. Together, they move identity verification from policy to cryptography.

Signal | What It Can Tell You | What It Cannot Tell You | Spoofable?
User-Agent header | What the client claims to be | What it actually is | Yes: trivially, one line of code
Source IP / CIDR range | Which cloud provider's network the request originates from | Which tenant or user within that cloud generated it | Effectively yes: shared tenancy, VPN, residential proxies
robots.txt | Your declared crawl preference | Whether the bot will comply | N/A: it is a policy, not enforcement
Web Bot Auth (RFC 9421 signature) | Cryptographically verified identity of the bot operator's domain | Individual user identity (that is OAuth's job; see /agentmall_spoke_oauth) | No: requires the private key
ML bot score (1–99) | Probability of automated origin based on behavioral fingerprinting | Definitive identity: it is probabilistic | Partially: sophisticated evasion can raise scores
Gap 1

Spoofed User-Agents

Any script can present as ChatGPT-User, Googlebot, or any other crawler. User-agent string rules create a false sense of security for operators who rely on them.

Gap 2

Shared IP Space

Cloud provider CIDR blocks are multi-tenant. An address adjacent to, or recently released from, a crawler's published range can belong to an unrelated customer of the same cloud provider.

Gap 3

Policy vs. Enforcement

robots.txt establishes norms respected by compliant bots. It provides zero technical barrier to non-compliant scrapers or training crawlers that choose to disregard it.

§2 · Web Bot Auth + RFC 9421 HTTP Message Signatures

The spec stack: what is ratified, what is still a draft.

Honesty Marker — Critical Distinction

RFC 9421 is a ratified IETF Proposed Standard, published February 2024. It defines HTTP Message Signatures as a general cryptographic mechanism. Web Bot Auth (draft-meunier-web-bot-auth-architecture) is an IETF Internet-Draft at revision -05. It is not endorsed by the IETF and is not yet an RFC. Do not conflate them. RFC 9421 is stable; the Web Bot Auth profile built on top of it is still evolving in the standards process. (re-verify IETF draft status before launch)

RFC 9421 — The Foundation

RFC 9421, HTTP Message Signatures, was published by the IETF in February 2024, authored by A. Backman, J. Richer, and M. Sporny. It is an active, ratified specification, not a draft. RFC 9421 defines how to create and verify cryptographic signatures over HTTP message components: request method, target URI, query string, and individual header fields. The message body is not signed directly; it is covered by signing a Content-Digest header field. The signing algorithms, component identifier format, and verification procedure are all normatively defined by the RFC. This is the stable layer that Web Bot Auth builds on.

Web Bot Auth — The Application Profile

Web Bot Auth is Cloudflare's application-layer profile of RFC 9421, scoped specifically to bot and agent authentication. The architecture spec — draft-meunier-web-bot-auth-architecture — is an Internet-Draft at the IETF Datatracker, currently at revision -05 as of this writing. It is not endorsed by the IETF as a standard, though Cloudflare has stated the IETF is considering forming a working group around it. The profile adds three things on top of RFC 9421: the tag="web-bot-auth" parameter in Signature-Input, the Signature-Agent header pointing to the bot's key directory domain, and the well-known URL convention at /.well-known/http-message-signatures-directory for public key discovery. (re-verify IETF draft status before launch)
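
The key discovery convention can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cloudflare's implementation: it assumes the Signature-Agent value is a bare host name (as in the example request in this section), that the directory is a JWKS-style object with a "keys" array, and that keyid is the RFC 7638 thumbprint of the Ed25519 key. The function names and the sample key are hypothetical.

```python
import base64
import hashlib
import json
from typing import Optional

WELL_KNOWN_PATH = "/.well-known/http-message-signatures-directory"


def key_directory_url(signature_agent: str) -> str:
    """Build the directory URL from a bare Signature-Agent host name."""
    host = signature_agent.strip().strip('"')
    if host.startswith("https://"):
        host = host[len("https://"):]
    return f"https://{host.rstrip('/')}{WELL_KNOWN_PATH}"


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint of an Ed25519 (OKP) public key."""
    # Only the required members are hashed, serialised in lexicographic
    # order with no whitespace, then passed through SHA-256.
    required = {"crv": jwk["crv"], "kty": jwk["kty"], "x": jwk["x"]}
    canonical = json.dumps(required, separators=(",", ":"), sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def find_key(directory: dict, keyid: str) -> Optional[dict]:
    """Return the directory key whose thumbprint equals keyid, if any."""
    for jwk in directory.get("keys", []):
        if jwk_thumbprint(jwk) == keyid:
            return jwk
    return None


# Sample public key for illustration only; not a real operator key.
sample_key = {
    "kty": "OKP",
    "crv": "Ed25519",
    "x": "11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo",
}
sample_directory = {"keys": [sample_key]}
print(key_directory_url("signer.example.com"))
print(find_key(sample_directory, jwk_thumbprint(sample_key)) is sample_key)
```

Because the thumbprint is derived from the key material itself, a verifier can match keyid against the directory without trusting any label the bot supplies.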

How Request Signing Works

The mechanism follows a public-key infrastructure model with a well-known key directory — no per-site shared secret, no pre-registration with individual websites:

  1. Key generation. The bot operator generates an Ed25519 key pair. The JWK-formatted public key is published at /.well-known/http-message-signatures-directory on their domain (e.g., operator.openai.com).
  2. Request signing. Before each HTTP request, the bot signs the target authority (@authority) and the signature-agent header using the private key. The Signature-Input header encodes the validity window (created and expires timestamps), the key ID (a JSON Web Key Thumbprint), and the tag="web-bot-auth" parameter.
  3. Verification at the edge. Cloudflare's edge checks for the presence of Signature, Signature-Input, and Signature-Agent headers. It fetches the corresponding public key from the bot's key directory, reconstructs the signature base, and verifies the signature using Ed25519. The expires parameter bounds the replay window: a captured signed request is rejected once the expiration timestamp has passed.
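
The validity-window part of step 3 can be sketched as follows. This is a simplification: it pulls created and expires out of Signature-Input with a regular expression instead of a full Structured Fields parser, ignores clock skew, and the function names are hypothetical.

```python
import re
import time
from typing import Optional


def parse_int_param(signature_input: str, name: str) -> Optional[int]:
    """Extract an integer parameter such as created or expires."""
    match = re.search(rf";\s*{name}=(\d+)", signature_input)
    return int(match.group(1)) if match else None


def within_validity_window(signature_input: str, now: Optional[int] = None) -> bool:
    """True only while created <= now < expires; missing values fail closed."""
    created = parse_int_param(signature_input, "created")
    expires = parse_int_param(signature_input, "expires")
    if created is None or expires is None:
        return False
    current = int(time.time()) if now is None else now
    return created <= current < expires


sig_input = (
    'sig=("@authority" "signature-agent");created=1700000000;'
    'expires=1700011111;keyid="ba3e64==";tag="web-bot-auth"'
)
print(within_validity_window(sig_input, now=1700000300))
```

A request replayed after the expires timestamp fails this check even though its signature is still mathematically valid.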
Signed HTTP Request — Real Example (from Cloudflare Web Bot Auth announcement)
GET /path/to/resource HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 Chrome/113.0.0 MyBotCrawler/1.1
Signature-Agent: signer.example.com
Signature-Input: sig=("@authority" "signature-agent");\
               created=1700000000;\
               expires=1700011111;\
               keyid="ba3e64==";\
               tag="web-bot-auth"
Signature: sig=abc==

The Signature-Agent header points to the key directory: crawler.search.google.com for Google Search, operator.openai.com for OpenAI Operator, workers.dev for Cloudflare Workers-hosted agents. The expires timestamp, typically 5 minutes (300 seconds) after creation in reference implementations, is what limits how long a captured signature can be replayed.
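
Both signer and verifier derive the same signature base from the request before signing or checking it. The sketch below rebuilds that string for the example request above, following the RFC 9421 layout of one line per covered component plus a final "@signature-params" line. It stops short of the Ed25519 operation itself, because Python's standard library has no Ed25519 support; the function name is hypothetical.

```python
def build_signature_base(authority: str, signature_agent: str,
                         created: int, expires: int, keyid: str) -> str:
    """Signature base for the components ("@authority" "signature-agent")."""
    params = (
        '("@authority" "signature-agent")'
        f";created={created};expires={expires}"
        f';keyid="{keyid}";tag="web-bot-auth"'
    )
    lines = [
        f'"@authority": {authority}',
        f'"signature-agent": {signature_agent}',
        f'"@signature-params": {params}',
    ]
    # Lines are joined with a single newline and there is no trailing newline.
    return "\n".join(lines)


base = build_signature_base(
    authority="www.example.com",
    signature_agent="signer.example.com",
    created=1700000000,
    expires=1700011111,
    keyid="ba3e64==",
)
print(base)
```

The bytes of this string are what the private key signs and what the edge verifies, so any change to the host, the Signature-Agent value, or the timestamps invalidates the signature.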

Implementation Libraries

Cloudflare publishes two open-source libraries for signing and verifying Web Bot Auth requests.

A live debugging environment is available at https://http-message-signatures-example.research.cloudflare.com for testing implementations against real verification logic.

Integration into the Verified Bots Program

As of July 2025, Cloudflare integrated HTTP Message Signatures directly into its Verified Bots program. Bots applying with well-formed Message Signatures are prioritized and approved more quickly than those using legacy IP validation. Once approved, Cloudflare automatically validates signatures at the edge across all plans. For site owners, no additional configuration is required — if signature validation passes, the traffic is marked verified and available as a WAF rule field. Message Signature verification for origins was available for Free and Pro plans as of this writing, with Business and Enterprise rollout in progress. (re-verify before launch)

Standard / Draft | Status | Published | Defines | Stable?
RFC 9421, HTTP Message Signatures | IETF Proposed Standard (ratified) | February 2024 | General cryptographic HTTP request signing; Ed25519 + other algorithms; component identifiers; verification procedure | Yes: ratified RFC
draft-meunier-web-bot-auth-architecture-05 | IETF Internet-Draft, revision -05 | Active draft (NOT yet RFC) | Web Bot Auth profile: tag="web-bot-auth", Signature-Agent header, /.well-known/http-message-signatures-directory key discovery | No: draft, subject to change (re-verify)
Ed25519 (RFC 8032) | Informational RFC (IRTF stream) | January 2017 | Edwards-Curve Digital Signature Algorithm; the signing algorithm used by Web Bot Auth | Yes
JWK Thumbprint (RFC 7638) | IETF RFC, ratified | September 2015 | Deterministic key identifier used as keyid parameter in Signature-Input | Yes
§3 · Cloudflare Bot Management Product Surface

Plan tiers, bot score mechanics, and what Enterprise actually unlocks.

Cloudflare's bot protection is not a single product — it is a tiered capability stack. Understanding which tier you are on is prerequisite to understanding which WAF fields and rule expressions are available to you. The most common operator mistake is writing WAF rules that reference cf.bot_management.score on a Pro plan, where those fields simply do not exist.

Plan | Product Name | Detection Scope | Key Capabilities | Pricing (re-verify before launch)
Free | Bot Fight Mode | Simple bots from cloud hosting providers + headless browsers | Computationally expensive challenge; Block AI bots toggle; AI Labyrinth honeypot; robots.txt enforcement | $0/month (re-verify)
Pro | Super Bot Fight Mode | Simple bots + headless browsers | Customer-chosen block/challenge; Bot Report analytics; Definitely Automated setting; Verified bots pass-through; Static resource protection; JavaScript Detections | $20/month billed annually (re-verify)
Business | Super Bot Fight Mode | Same as Pro | Adds Likely Automated (score 2–29) setting; all Pro features | $200/month billed annually (re-verify)
Enterprise + Bot Management add-on | Bot Management | Simple + sophisticated bots, headless browsers, domain-specific anomalies | Full bot scoring (cf.bot_management.score 1–99); JA3/JA4 fingerprinting; Bot Analytics; Detection IDs; path-specific rules; all lower-tier features | Custom; add-on to Enterprise contract (re-verify)

Bot Score: The ML Engine

The bot score runs from 1 to 99 and is the output of up to four layered detection engines. The scale represents the estimated probability of human origin — lower scores mean more likely automated.

Score Range | Classification | Detection Engine | Recommended WAF Action | False Positive Risk
1 | Definitively automated | Heuristics (high-confidence pattern match) | Block | Very low
2–29 | Likely automated | Machine Learning engine | Managed Challenge | Low: challenge allows real users through
30–99 | Likely human | ML engine | Allow | N/A
N/A (boolean) | Verified bot | Verified Bots program + Web Bot Auth | Skip all bot rules (allow through) | None: cryptographically verified

Three detection engines feed into the score: Heuristics — pattern matching against a database of known malicious fingerprints, setting score to 1 for high-confidence automated detections; Machine Learning — the primary engine for sophisticated bots, producing scores in the 2–99 range based on billions of requests processed daily; JavaScript Detections (JSD) — lightweight client-side injection that detects headless browsers. A fourth optional engine, Anomaly Detection, learns a domain-specific baseline and flags outliers. Cloudflare explicitly advises against enabling Anomaly Detection on commerce domains with API traffic or significant traffic variability — the false positive rate in those scenarios is high.
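
The score bands and recommended actions from the table above reduce to a small decision function. This is a sketch of the recommendation logic only, not Cloudflare's rule engine; the function and action names are hypothetical.

```python
def recommended_action(score: int, verified_bot: bool = False) -> str:
    """Map a bot score (1-99) and the verified-bot flag to an action."""
    if verified_bot:
        return "skip_bot_rules"      # verified bots bypass score-based rules
    if not 1 <= score <= 99:
        raise ValueError("bot score must be between 1 and 99")
    if score == 1:
        return "block"               # heuristics: definitely automated
    if score <= 29:
        return "managed_challenge"   # ML: likely automated
    return "allow"                   # ML: likely human


for sample in (1, 15, 29, 30, 99):
    print(sample, recommended_action(sample))
```

Checking the verified flag before the score mirrors the table: a verified crawler is automated by definition, so scoring it like unknown traffic would always block it.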

The __cf_bm cookie smooths bot scores across a user session, reducing false positives for real users who exhibit browsing patterns the ML engine might otherwise score below 30. It is set automatically by Cloudflare — no configuration required on your part.

JA3 / JA4 Fingerprinting

cf.bot_management.ja3_hash and cf.bot_management.ja4 are available as WAF rule fields in Bot Management for Enterprise. JA3 is a TLS fingerprint: an MD5 hash over the protocol version, cipher suites, extensions, elliptic curves, and point formats that a client offers in its ClientHello. It identifies the TLS stack of the client software, so it stays stable when a bot rotates user agents or IP addresses. A scraper running from a cloud host that spoofs ChatGPT-User will still present a JA3 fingerprint consistent with a scripted HTTP client (Python's requests library, curl, or a headless Chromium build), not with OpenAI's production crawler infrastructure. This fingerprint mismatch is the core detection mechanism in the worked example in §8.
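
For intuition, a JA3 value can be computed by hand: the five ClientHello field groups are written as decimal numbers, joined with dashes inside a group and commas between groups, and the resulting string is hashed with MD5. The sketch below uses made-up sample values, omits the GREASE filtering that real implementations apply, and is not how Cloudflare computes the field internally.

```python
import hashlib
from typing import Sequence


def ja3_string(tls_version: int, ciphers: Sequence[int], extensions: Sequence[int],
               curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """Comma-separated field groups; values inside a group joined by dashes."""
    groups = (ciphers, extensions, curves, point_formats)
    fields = [str(tls_version)] + ["-".join(str(v) for v in g) for g in groups]
    return ",".join(fields)


def ja3_hash(tls_version: int, ciphers: Sequence[int], extensions: Sequence[int],
             curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """MD5 hex digest of the JA3 string."""
    raw = ja3_string(tls_version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Two hypothetical clients that send the same User-Agent but offer
# different cipher suites in their ClientHello.
client_a = ja3_hash(771, [4865, 4866], [0, 10], [29, 23], [0])
client_b = ja3_hash(771, [4865], [0, 10], [29, 23], [0])
print(client_a != client_b)
```

The User-Agent never enters the calculation, which is why changing it does nothing to the fingerprint.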

AI Labyrinth

Available on all plans including Free, AI Labyrinth is a next-generation honeypot that serves AI-generated nonsense content to unauthorized scrapers, drawing them into an infinite maze of fake pages. Human visitors and compliant bots never see these pages — AI Labyrinth content is served only to bots that follow unlinked honeypot links. Bots that enter the labyrinth are identified, and their fingerprints are fed back into Cloudflare's ML training pipeline, improving detection for all customers globally. Enable it via Security → Bots → Configure Bot Fight Mode → Enable AI Labyrinth.

Free Tier Coverage

Bot Fight Mode + AI Labyrinth + Web Bot Auth signature verification + AI Crawl Control covers the most common bot threat vectors for small-to-medium commerce operators at zero cost. You do not need Enterprise to meaningfully reduce unauthorized AI scraping.

§4 · AI Crawl Control + Pay-Per-Crawl

From visibility to monetization — the crawl economics layer.

AI Crawl Control (Formerly AI Audit)

AI Crawl Control is available on all Cloudflare plans with zero configuration required. It provides a real-time dashboard showing which AI crawlers are hitting your domain, what URLs they are fetching, whether they are respecting your robots.txt directives, and whether they are classified as buyer agents, search indexers, or training crawlers. The dashboard navigation changed from "AI Audit" to "AI Crawl Control" — re-verify the current navigation path in your dashboard, as Cloudflare updates product naming frequently.

Feature | What It Shows / Does | Plan Availability | Config Required?
Activity dashboard | Breakdown of AI crawler requests by provider, bot type, and most-fetched URL paths | All plans | No
robots.txt compliance tracking | Which crawlers follow your directives; enforcement rules for those that do not | All plans | No
Allow / Block / Charge controls | Per-crawler policies across three action types | All plans | No
Export to CSV | Audit trail against content licensing agreements | All plans | No
Pay-Per-Crawl integration | "Charge" policy that routes crawlers through HTTP 402 payment flow | Private beta (re-verify) | Beta signup required
Important Scope Limitation

AI Crawl Control shows you known, classified AI bots from Cloudflare's directory. It does not show unknown scrapers that are not in that directory — those appear in Security Analytics and Bot Analytics (Enterprise), not in AI Crawl Control. Cross-reference both dashboards for complete visibility.

Pay-Per-Crawl (Private Beta)

Pay-Per-Crawl is currently in private beta. Cloudflare acts as Merchant of Record — you set a price per request for your domain, and Cloudflare handles billing aggregation and financial settlement with registered crawler operators. The technical mechanism repurposes HTTP 402 Payment Required.

Reactive Flow — Crawler Discovers Price First
# 1. Crawler requests resource without payment intent
GET /products/running-shoes.html HTTP/1.1
Host: www.your-store.com

# 2. Cloudflare responds: payment required at stated price
HTTP/1.1 402 Payment Required
crawler-price: USD 0.001

# 3. Crawler retries with exact-price confirmation
GET /products/running-shoes.html HTTP/1.1
Host: www.your-store.com
crawler-exact-price: USD 0.001

# 4. Cloudflare charges and responds with content
HTTP/1.1 200 OK
crawler-charged: USD 0.001
server: cloudflare
Proactive Flow — Crawler Declares Max Price Intent
# 1. Crawler sends request with max-price header
GET /products/running-shoes.html HTTP/1.1
Host: www.your-store.com
crawler-max-price: USD 0.002

# 2a. If configured price <= crawler-max-price: charge and respond
HTTP/1.1 200 OK
crawler-charged: USD 0.001
server: cloudflare

# 2b. If configured price > crawler-max-price: payment required
HTTP/1.1 402 Payment Required
crawler-price: USD 0.005

Publisher controls allow setting a flat per-request price across the entire domain — path-level pricing was not available in the beta as of this writing (re-verify). For each known crawler, publishers choose Allow (free access), Charge (at the configured price), or Block (no access). If a crawler has no billing relationship with Cloudflare, choosing "Charge" is functionally equivalent to HTTP 403 — the crawler is told a paid relationship is possible, but no content is served.
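
The reactive and proactive flows above can be modeled as one decision function. This is a sketch of the header logic as described in this section, not Cloudflare's implementation. It assumes lowercase header names, USD-only prices, and that a mismatched crawler-exact-price is answered with a fresh 402 quote (the source does not specify that case). The function names are hypothetical.

```python
from decimal import Decimal
from typing import Dict, Tuple


def parse_price(value: str) -> Decimal:
    """Parse a header value such as 'USD 0.001'."""
    currency, amount = value.split()
    if currency != "USD":
        raise ValueError("only USD is handled in this sketch")
    return Decimal(amount)


def pay_per_crawl_response(configured: Decimal,
                           headers: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, response headers) for a crawler on a 'Charge' policy."""
    quote = {"crawler-price": f"USD {configured}"}
    charged = {"crawler-charged": f"USD {configured}"}
    exact = headers.get("crawler-exact-price")
    max_price = headers.get("crawler-max-price")
    if exact is not None:
        # Reactive flow, step 3: crawler confirms the quoted price.
        # Assumption: a mismatch is answered with a new quote.
        return (200, charged) if parse_price(exact) == configured else (402, quote)
    if max_price is not None and configured <= parse_price(max_price):
        # Proactive flow, step 2a: charge the configured price, not the maximum.
        return 200, charged
    # No payment intent, or the maximum is below the configured price.
    return 402, quote


print(pay_per_crawl_response(Decimal("0.001"), {}))
print(pay_per_crawl_response(Decimal("0.001"), {"crawler-max-price": "USD 0.002"}))
```

Using Decimal rather than float keeps sub-cent prices exact when comparing the configured price against what the crawler offered.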

Pay-Per-Crawl enforcement runs after existing WAF policies and Bot Management rules — your current security posture is not bypassed to charge a crawler. Crawler operators must register with Cloudflare via Web Bot Auth (Ed25519 key pair, public key directory, HTTP Message Signatures on each request) to participate. This registration requirement is the anti-spoofing control — a bad actor cannot impersonate a registered crawler and trigger fraudulent payments because the cryptographic signature cannot be forged without the private key.

Beta Status

Pay-Per-Crawl is in private beta. Beta signup is available at https://www.cloudflare.com/paypercrawl-signup/ or via your Cloudflare account executive. The product is expected to evolve significantly before general availability. Re-verify beta status, pricing mechanics, and registered crawler list before planning any revenue model around it.

§5 · Bot Taxonomy: Buyers, Indexers, Training Crawlers

Three categories, different economics — and a fourth just arrived.

The single most operationally consequential distinction in Cloudflare AI agent verification is that bots from the same AI provider, OpenAI being the clearest case, serve fundamentally different purposes and should receive different access policies. The taxonomy anchors on three OpenAI bots as the example, then extends to all providers.

Category | Example Bot | User Agent | Purpose | Traffic Value | Recommended Default Action
AI Assistant (Buyer Agent) | ChatGPT-User, Claude-User, Perplexity-User | ChatGPT-User | Real-time user-directed requests: a user asking ChatGPT to check product availability, compare prices, or complete a purchase | Highest: direct transactional intent | Allow (verify with Web Bot Auth signature)
AI Search Indexer | OAI-SearchBot, Claude-SearchBot, PerplexityBot | OAI-SearchBot | Building and updating search indexes for ChatGPT's search features; surfaces your content in AI-powered search results | High: referral and discovery value | Allow with rate limiting or Pay-Per-Crawl
AI Training Crawler | GPTBot, ClaudeBot, Bytespider, CCBot | GPTBot | Training or fine-tuning AI foundation models; no direct referral traffic to your site | Low to negative: no direct referral benefit; IP / content extraction risk | Block or Charge via Pay-Per-Crawl
Critical Operator Mistake

Treating all AI bots from the same provider as equivalent — blocking GPTBot and inadvertently also blocking OAI-SearchBot and ChatGPT-User via an overly broad "block all AI bots" toggle — will remove your site from ChatGPT search results and block buyer agents from transacting with your storefront. Use per-bot controls in AI Crawl Control, not blanket category blocks.
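
The difference between a blanket block and per-bot control can be sketched as a lookup table. The bot names and default actions come from the taxonomy table above; the dictionary layout, action labels, and function name are hypothetical, and the lookup is meant to run only after the bot's identity has been verified, since the name alone is spoofable.

```python
from typing import Dict, Optional

BOT_CATEGORIES: Dict[str, str] = {
    "ChatGPT-User": "ai_assistant",
    "Claude-User": "ai_assistant",
    "Perplexity-User": "ai_assistant",
    "OAI-SearchBot": "ai_search",
    "Claude-SearchBot": "ai_search",
    "PerplexityBot": "ai_search",
    "GPTBot": "ai_training",
    "ClaudeBot": "ai_training",
    "Bytespider": "ai_training",
    "CCBot": "ai_training",
}

DEFAULT_ACTIONS: Dict[str, str] = {
    "ai_assistant": "allow",
    "ai_search": "allow_rate_limited",
    "ai_training": "block",
}


def action_for(bot_name: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Per-bot override first, then the category default, then bot scoring."""
    if overrides and bot_name in overrides:
        return overrides[bot_name]
    category = BOT_CATEGORIES.get(bot_name)
    if category is None:
        return "use_bot_score"  # unknown bots fall back to score-based rules
    return DEFAULT_ACTIONS[category]


# Three bots from one provider end up with three different outcomes.
for name in ("ChatGPT-User", "OAI-SearchBot", "GPTBot"):
    print(name, action_for(name))
```

Keying the policy on the individual bot keeps a decision about GPTBot from silently changing what happens to OAI-SearchBot or ChatGPT-User.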

Cloudflare's Five Responsible AI Bot Principles

Cloudflare's responsible AI bot principles framework (published September 2025) formalizes this taxonomy into five required behaviors for bots that want to be classified as "good actors." Cloudflare's position — stated directly — is that combining multiple declared purposes in a single bot forces operators into false trade-offs and violates Principle #3 (single declared purpose). The three declared purposes are: Search (building an index, providing results), AI-input (real-time retrieval-augmented grounding for generative answers), and Training (fine-tuning models). (re-verify principle list as Cloudflare publishes updates)

Signed Agents: A Fourth Category (August 2025)

As of August 2025, Cloudflare introduced a Signed Agents classification for user-directed agents that use Web Bot Auth signatures and comply with Cloudflare's signed agent policy. The first cohort includes ChatGPT agent, Goose from Block, Browserbase, and Anchor Browser. Signed agents differ from verified bots in that they are user-directed: the same agent platform (e.g., Browserbase's remote browser infrastructure) may be used by thousands of different end users, each directing the agent toward different goals. The Signed Agents program cryptographically attests to the platform, not the individual user — individual user identity remains an application-layer concern for OAuth (see /agentmall_spoke_oauth). Enterprise customers can take action on signed agents as a group in security rules. (re-verify rule field availability — announced as "coming soon" in August 2025)

Category | Who Controls the Bot? | Cloudflare Program | Identity Method | WAF Field (Enterprise)
Verified Bot | Single legal entity (e.g., OpenAI directs all GPTBot traffic) | Verified Bots program | Web Bot Auth signature or IP validation | cf.bot_management.verified_bot
Signed Agent | Platform-level (user-directed; many users, one signing platform) | Signed Agents program (Aug 2025) | Web Bot Auth signature (required) | Coming soon (re-verify)
Unknown Bot | Unknown: not in Verified Bots directory | N/A | ML + Heuristics scoring | cf.bot_management.score
Human visitor | N/A | N/A | Score 30–99 + no bot signals | Allow by default
§6 · Cloudflare Workers + Tunnel as MCP Server Hosting Path

Edge-native MCP hosting with bot protection built in.

Cloudflare Workers provides the fastest path to a bot-management-protected MCP endpoint without managing servers. Deploying an MCP server on Workers means your server runs inside Cloudflare's global network — co-located with the same bot detection pipeline that evaluates every incoming request. The WAF rules and Bot Management policies you have configured on your zone apply to agent traffic hitting your MCP endpoint before any application code runs.

Hosting Option | Use Case | Bot Protection | Cost Floor | Private Origin?
Workers (serverless) | Stateless MCP tool endpoints; lightweight commerce logic | Full WAF + Bot Management on all traffic | Free tier: 100K req/day, 10ms CPU (re-verify) | No: Workers runs in Cloudflare's network
Workers + Durable Objects | Stateful MCP sessions; long-running agent interactions | Full WAF + Bot Management | Paid Workers plan; Durable Objects billed per request + storage (re-verify) | No
Cloudflare Tunnel (cloudflared) | Private origin (on-prem, VPC) exposed as public MCP endpoint | Full WAF + Bot Management; origin never publicly routable | Free on all plans (re-verify) | Yes: origin stays private
Zero Trust MCP Server Portals (Enterprise) | Centralized gateway for all enterprise MCP connections | Access policies + MFA + device posture before any MCP request | Cloudflare One / SASE pricing (re-verify) | Yes

Workers: The Bot-Protected MCP Endpoint

A Workers-hosted MCP server is reachable at a URL such as https://your-store-mcp.workers.dev/sse. For the WAF and Bot Management rules configured on your zone to apply, serve the Worker from a route or custom domain on that zone; a workers.dev hostname is not part of your zone (re-verify). The workers-oauth-provider library handles the OAuth 2.1 provider side, including Dynamic Client Registration (RFC 7591) and Authorization Server Metadata (RFC 8414), so your MCP server can authenticate agent clients out of the box. Upstream OAuth access tokens (e.g., from Shopify or GitHub) are stored encrypted in Workers KV; only a narrower, scoped token is issued to the MCP client, limiting blast radius if that token is compromised. In April 2025, Cloudflare announced generally available remote MCP servers, with partnerships with Auth0, Stytch, and WorkOS for authentication and authorization. (re-verify GA status and partner list)

Workers Free Tier Limits (re-verify at developers.cloudflare.com/workers/platform/limits/)
# Free tier
Requests:    100,000 / day
CPU time:    10 ms per request
Memory:      128 MB

# Paid Workers plan (Workers Paid)
Requests:    No cap (billed per million after included)
CPU time:    30 seconds default; up to 5 minutes for long-running
Memory:      128 MB standard; higher limits available
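
To judge whether the free tier is enough for an MCP endpoint, the daily cap can be turned into a sustained rate and a per-session budget. The 100,000 requests/day figure is the one quoted above (re-verify); the session counts in the example are invented for illustration.

```python
FREE_REQUESTS_PER_DAY = 100_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average sustained rate the free tier supports over a full day.
average_rps = FREE_REQUESTS_PER_DAY / SECONDS_PER_DAY


def fits_free_tier(sessions_per_day: int, requests_per_session: int) -> bool:
    """True if the expected daily request volume stays within the cap."""
    return sessions_per_day * requests_per_session <= FREE_REQUESTS_PER_DAY


print(round(average_rps, 3))        # about 1.157 requests per second
print(fits_free_tier(2_000, 40))    # 80,000 requests/day
print(fits_free_tier(3_000, 40))    # 120,000 requests/day
```

Agent sessions tend to issue many tool calls each, so the per-session multiplier usually matters more than the raw visitor count.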

Tunnel: Private Origin with Public MCP

If your commerce logic lives on a private origin (an on-premise server, a private VPC, or Shopify's internal API behind a network boundary), Cloudflare Tunnel lets you expose a public MCP endpoint without a publicly routable IP on your origin. The cloudflared daemon runs on your private infrastructure and creates outbound-only connections to Cloudflare's global network. Your firewall only needs to allow outbound traffic from cloudflared, so all inbound connections to the origin can be blocked at the network level. Publicly, the MCP endpoint resolves to Cloudflare, which applies your full WAF and Bot Management ruleset before proxying traffic through the tunnel.

Hybrid MCP Architecture Pattern
AI Agent
   ↓
Cloudflare Edge
  ├─ Web Bot Auth signature verification (RFC 9421)
  ├─ Bot score: ML + Heuristics + JSD
  ├─ WAF custom rules (Bot Management Enterprise)
  └─ AI Crawl Control category classification
   ↓
[Cloudflare Tunnel (cloudflared)]
   ↓
Private Origin
  ├─ Shopify Storefront API
  ├─ WooCommerce REST API
  └─ Custom headless commerce backend
   ↓
OAuth 2.1 app-layer identity
(see /agentmall_spoke_oauth)
   ↓
/agents.json capability manifest
(see /agentmall_spoke_agents_page)

Zero Trust MCP Server Portals (Open Beta, August 2025)

For enterprise deployments, MCP Server Portals (part of Cloudflare One / SASE, announced August 2025, open beta) provide a centralized gateway for all MCP connections. This layer requires corporate identity provider authentication (Cloudflare Access) before an agent reaches any MCP server, enforces MFA and device posture checks, provides aggregated MCP request logs across all servers, and presents each user with only the curated list of servers and tools they are authorized to use. This is the enterprise analog of OAuth scope enforcement, applied at the access gateway before a request reaches any MCP server. (re-verify open beta status and feature set)

§7 · The Trust Layer Interaction Map

How bot identity, OAuth, and capability discovery form a complete stack.

The edge bot verification layer is not the complete picture; it is the first gate in the layered stack mapped in the table below. Understanding where Cloudflare's role ends and where OAuth (application-layer identity) begins is critical for building a coherent trust architecture. Cloudflare verifies who is at the edge. OAuth verifies what that actor is authorized to do once they pass the edge.

Layer | Component | What It Answers | How Cloudflare Edge Interacts | Cross-Reference
Pre-auth edge | Web Bot Auth (RFC 9421 signature verification) | "Is this request cryptographically linked to a registered bot operator?" | Verifies Ed25519 signature against public key at /.well-known/http-message-signatures-directory; marks cf.bot_management.verified_bot | Cloudflare Verified Bots / Signed Agents programs
Edge scoring | Bot score 1–99 (ML + Heuristics + JSD) | "How likely is this request to be automated?" | Assigns cf.bot_management.score; enables WAF rule branching; available on Bot Management for Enterprise | Bot Management documentation
Edge policy | AI Crawl Control + WAF custom rules | "What access policy applies to this bot category on this path?" | Per-category allow/block/charge; path-specific WAF expressions; AI Labyrinth for unknown scrapers | AI Crawl Control; Pay-Per-Crawl (private beta)
App-layer auth | OAuth 2.1 (PKCE / client_credentials) | "What is this authenticated agent authorized to do?" | Cloudflare Workers host the OAuth server; workers-oauth-provider handles RFC 7591 + RFC 8414; token scopes define API access | /agentmall_spoke_oauth
Capability discovery | /agents.json / Agents Page | "What tools and scopes does this endpoint support?" | The capability manifest can reference the Signature-Agent key directory URL, signaling Web Bot Auth support to compliant clients | /agentmall_spoke_agents_page
Monetization | Pay-Per-Crawl (private beta) | "Has this crawler agreed to pay for access?" | HTTP 402 + crawler-price headers; Cloudflare as Merchant of Record; enforced after WAF policies | /agentmall_roadmap

Request Flow Diagram

Full Request Lifecycle — Edge to Origin
Internet Request
       ↓
[Cloudflare Edge]
  ├─ Web Bot Auth: Does request carry valid RFC 9421 signature?
  │    YES → Mark cf.bot_management.verified_bot = true (or signed_agent)
  │    NO  → Pass to ML + Heuristics pipeline
  ├─ Bot Score: 1–99
  │    1 = definitely automated → Block (if rule configured)
  │    2–29 = likely automated  → Challenge or Block
  │    30–99 = likely human     → Allow
  ├─ AI Crawl Control: Which category is this?
  │    AI Assistant (Buyer)   → Allow / no challenge
  │    AI Search (Indexer)    → Allow or Pay-Per-Crawl
  │    AI Training (Crawler)  → Block or Pay-Per-Crawl
  └─ WAF Custom Rules: Path-specific overrides
       /api/products  → allow cf.verified_bot_category = "AI Assistant"
       /api/checkout  → require OAuth token (→ /agentmall_spoke_oauth)
       /sitemap.xml   → allow all verified bots
       ↓
[Origin / MCP Server / Workers]
  ├─ OAuth 2.1 identity (app-layer): what can this agent do?
  │    (see /agentmall_spoke_oauth)
  └─ Response + /agents.json capability manifest
       (see /agentmall_spoke_agents_page)

| AgentMall Layer | What It Is | Cloudflare Role |
|---|---|---|
| Structured Data (Layer 1) | Product feeds, schema.org, llms.txt | AI Crawl Control governs indexer access; Pay-Per-Crawl monetizes it |
| API Endpoint (Layer 2) | REST/GraphQL commerce API | WAF + Bot Management protect the endpoint; rate limiting governs throughput |
| MCP Tool Description (Layer 3) | MCP server with tool definitions | Workers hosts it; Tunnel connects private origins; Zero Trust gates enterprise access |
| UCP Compatibility (Layer 4) | Universal Checkout Protocol alignment | Buyer agent identification (ChatGPT-User as Signed Agent) is the trust prerequisite for UCP transactions |

§8 · Worked Example: Shopify Storefront on Cloudflare

Two requests, same User-Agent, different outcomes.

Shopify + Cloudflare Architecture Note

Shopify's infrastructure already uses Cloudflare. Standard Shopify storefronts run behind Shopify's Cloudflare configuration, not yours — you cannot add custom WAF or Bot Management rules to Shopify-managed traffic. The scenario below applies cleanly to: (1) headless storefronts with a custom front-end on Cloudflare, (2) custom commerce APIs hosted outside Shopify's platform, or (3) MCP servers on Workers that call Shopify's APIs from the server side. Re-verify Shopify + Cloudflare compatibility before production deployment.

Request A — Legitimate ChatGPT-User (OpenAI ChatGPT Agent)

  1. OpenAI's ChatGPT agent sends GET /products/running-shoes with User-Agent: ChatGPT-User, Signature-Agent: operator.openai.com, and a valid Ed25519 Signature and Signature-Input header.
  2. Cloudflare's edge detects the Web Bot Auth headers and begins signature verification.
  3. Cloudflare fetches the key directory at operator.openai.com/.well-known/http-message-signatures-directory, retrieves the public key matching the keyid parameter, and verifies the signature using Ed25519.
  4. Signature verification passes. The request is marked cf.bot_management.verified_bot = true with cf.verified_bot_category = "AI Assistant".
  5. WAF Rule 1 (cf.bot_management.verified_bot → Skip all remaining custom rules) fires. Request passes through to origin.
  6. Origin serves product page. ChatGPT agent delivers product information to the user who initiated the query.

Request B — Scraper Spoofing ChatGPT-User

  1. A Python requests script sends GET /products/running-shoes with User-Agent: ChatGPT-User. No Signature, Signature-Input, or Signature-Agent headers are present.
  2. Cloudflare's edge checks for Web Bot Auth headers — absent. Falls through to ML + Heuristics pipeline.
  3. ML engine analyzes request features: no browser-consistent TLS fingerprint (JA3 hash matches a known Python/requests pattern, not a browser or OpenAI's known JA3 fingerprint); no browser session signals; cold-start request pattern. Bot score: 2.
  4. WAF Rule 3 fires: (cf.bot_management.score gt 1 and cf.bot_management.score lt 30) → Managed Challenge.
  5. The script cannot pass a Managed Challenge — it requires JavaScript execution in a real browser context. Request is abandoned.
  6. If AI Labyrinth is enabled and the scraper previously followed any honeypot links, its fingerprint may already be in the blocklist — Rule 2 (cf.bot_management.score eq 1 → Block) may fire even before Rule 3.
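
Both walkthroughs fork at step 2, on whether the three Web Bot Auth headers are present. The fork can be sketched as below. This is a minimal illustration, not Cloudflare's implementation: the function names are hypothetical, headers are assumed to arrive in a Map with lowercase keys, and the Signature-Agent value is assumed to be either a bare host (as in Request A) or a quoted https URL.

```typescript
// Header names are taken from the walkthrough; everything else is illustrative.
const WEB_BOT_AUTH_HEADERS = ["signature", "signature-input", "signature-agent"] as const;

type EdgePath = "verify-signature" | "ml-and-heuristics";

// True only when all three Web Bot Auth headers are present and non-empty.
function hasWebBotAuthHeaders(headers: Map<string, string>): boolean {
  return WEB_BOT_AUTH_HEADERS.every((name) => (headers.get(name) ?? "").trim() !== "");
}

// Step 2 of both requests: signed traffic goes to verification,
// everything else falls through to bot scoring.
function firstEdgeStep(headers: Map<string, string>): EdgePath {
  return hasWebBotAuthHeaders(headers) ? "verify-signature" : "ml-and-heuristics";
}

// Step 3 of Request A: derive the key directory URL from Signature-Agent.
// Assumption: optional surrounding quotes and an optional scheme are stripped.
function keyDirectoryUrl(signatureAgent: string): string {
  const host = signatureAgent
    .trim()
    .replace(/^"|"$/g, "")
    .replace(/^https?:\/\//, "")
    .replace(/\/+$/, "");
  return `https://${host}/.well-known/http-message-signatures-directory`;
}
```

Note that a matching User-Agent plays no part in the decision: Request B carries the same User-Agent as Request A and still takes the scoring path.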

WAF Rule Configuration (Bot Management for Enterprise)

Rules 1 through 3 are the baseline from Cloudflare's official challenge-bad-bots documentation; Rules 4 and 5 are additive examples for commerce paths. Deploy the rules in the order shown: custom rules are evaluated top to bottom, and the first terminating action (Block, Managed Challenge) ends evaluation for that request, so the verified-bot skip has to come first.

Rule 1 — Skip Verified Bots (Highest Priority)
Expression: (cf.bot_management.verified_bot)
Action:     Skip — All remaining custom rules

# This rule must be FIRST. A verified bot (signed agent, registered crawler)
# should bypass all bot score rules. If this rule is placed after score-based
# block rules, verified bots will be incorrectly blocked before skip fires.

Rule 2 — Block Definitively Automated
Expression: (cf.bot_management.score eq 1)
Action:     Block

# Score 1 = Heuristics engine high-confidence detection.
# Blocking carries very low false-positive risk.

Rule 3 — Challenge Likely Automated
Expression: (cf.bot_management.score gt 1 and cf.bot_management.score lt 30)
Action:     Managed Challenge

# Score 2-29 = ML engine output, likely automated.
# Managed Challenge allows real users through; stops most bots.

Rule 4 — Allow AI Buyer Agents Through to Checkout API (Additive)
Expression: (cf.verified_bot_category eq "AI Assistant" and
             starts_with(http.request.uri.path, "/api/checkout"))
Action:     Skip — All remaining custom rules

# Ensures buyer agents (ChatGPT-User, Claude-User) can reach
# checkout endpoints even on paths with additional restrictions.

Rule 5 — Block Training Crawlers on Product Pages
Expression: (http.user_agent contains "GPTBot" and not cf.bot_management.verified_bot)
Action:     Block

# Blocks unverified GPTBot; allows verified GPTBot (if you choose to allow it).
# Use this pattern if you want to block training crawlers that are not
# properly registered in Cloudflare's Verified Bots program.

Plan Requirement

Rules using cf.bot_management.score, cf.bot_management.verified_bot, and cf.bot_management.ja3_hash require Bot Management for Enterprise. On Free/Pro/Business plans, use the built-in Bot Fight Mode toggles in the dashboard rather than custom rules using these WAF fields — the fields simply are not available.

| Request Type | Web Bot Auth Headers? | Bot Score | Rule Fired | Outcome |
|---|---|---|---|---|
| Legitimate ChatGPT agent (Web Bot Auth signed) | Yes (valid Ed25519 signature) | N/A (verified_bot = true) | Rule 1 (skip) | Allowed through to origin |
| Python script spoofing ChatGPT-User | No | 2 (JA3 mismatch, cold-start pattern) | Rule 3 (Managed Challenge) | Challenge fails; request abandoned |
| Known scraper (Heuristics engine) | No | 1 (definitively automated) | Rule 2 (Block) | Hard blocked |
| Real human browser | N/A | 72 (likely human) | No bot rules fire | Allowed through to origin |
| GPTBot training crawler (unverified) | No | Varies | Rule 5 (Block) if user_agent contains "GPTBot" | Blocked on product pages |
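
To make the ordering dependency concrete, here is a small simulation of top-to-bottom rule evaluation. It is a simplified model under stated assumptions, not Cloudflare's rules engine: the RequestSignals shape and the evaluate function are hypothetical, a Skip is treated as "allow", Rule 4 is omitted because it only matters when later rules exist, and the signed agent is given an illustrative score of 5 to show what would happen without the skip.

```typescript
type RuleAction = "skip" | "block" | "managed_challenge";
type Outcome = "allow" | "block" | "managed_challenge";

interface RequestSignals {
  userAgent: string;
  score: number;        // bot score, 1-99
  verifiedBot: boolean; // stands in for cf.bot_management.verified_bot
}

interface Rule {
  name: string;
  matches: (req: RequestSignals) => boolean;
  action: RuleAction;
}

const skipVerified: Rule = { name: "Rule 1", matches: (r) => r.verifiedBot, action: "skip" };
const blockScore1: Rule = { name: "Rule 2", matches: (r) => r.score === 1, action: "block" };
const challengeLikely: Rule = {
  name: "Rule 3",
  matches: (r) => r.score > 1 && r.score < 30,
  action: "managed_challenge",
};
const blockUnverifiedGptBot: Rule = {
  name: "Rule 5",
  matches: (r) => r.userAgent.includes("GPTBot") && !r.verifiedBot,
  action: "block",
};

const baseline: Rule[] = [skipVerified, blockScore1, challengeLikely, blockUnverifiedGptBot];

// The first matching rule decides the outcome; later rules never run.
function evaluate(rules: Rule[], req: RequestSignals): { rule: string | null; outcome: Outcome } {
  for (const rule of rules) {
    if (!rule.matches(req)) continue;
    const outcome: Outcome = rule.action === "skip" ? "allow" : rule.action;
    return { rule: rule.name, outcome };
  }
  return { rule: null, outcome: "allow" }; // no bot rule fired
}

const signedAgent: RequestSignals = { userAgent: "ChatGPT-User", score: 5, verifiedBot: true };
// Correct order: Rule 1 fires and the agent is allowed.
console.log(evaluate(baseline, signedAgent));
// Skip placed last: Rule 3 fires first and the same agent is challenged.
console.log(evaluate([blockScore1, challengeLikely, skipVerified], signedAgent));
```

The second call is the failure mode to avoid: nothing about the request changed, only the position of the skip rule.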
§9 · Common Mistakes

Eight ways Cloudflare bot rules break in production.

1. Wrong rule order — verified bot skip placed after score-based block

The problem: score-based rules such as cf.bot_management.score lt 30 catch all likely-automated traffic. AI buyer agents (ChatGPT-User) that are verified bots carry cf.bot_management.verified_bot = true, but that only helps if your skip rule fires first. If Rule 1 (skip verified bots) is placed after Rule 2 (block score eq 1) or Rule 3 (challenge score 2–29), verified bots are blocked or challenged before the skip executes. Cloudflare's rules engine evaluates in strict top-to-bottom order. The fix: Rule 1 must always be the first rule and must skip all remaining custom rules when cf.bot_management.verified_bot is true. See the baseline in §8.

2. Conflating User-Agent string matching with bot identity verification

The problem: writing a WAF rule like http.user_agent contains "ChatGPT-User" → Allow, and treating that as authentication. Any scraper can write that string. User-agent-based allow rules create a spoofing bypass by design. The fix: use cf.bot_management.verified_bot (Enterprise) or cf.verified_bot_category eq "AI Assistant" for identity-based rules. On non-Enterprise plans, rely on the Block AI bots toggle — maintained by Cloudflare against a curated directory — rather than home-grown user-agent matching.

3. Enabling Bot Fight Mode on an API-first or headless storefront without exempting own traffic

The problem: Bot Fight Mode and Super Bot Fight Mode apply across all traffic on the domain. Automated traffic from your own mobile app, partner APIs, webhook integrations, or health-check monitors will be challenged or blocked. A 2am deployment pipeline hitting your staging endpoint will trigger a Managed Challenge. The fix: on Bot Management for Enterprise, use not starts_with(http.request.uri.path, "/api") exclusions in custom rules. On lower plans, where fingerprint fields such as cf.bot_management.ja3_hash are not available (see Plan Requirement in §8), review the bot settings carefully and consider exempting specific IP ranges for known-good automation before enabling Bot Fight Mode site-wide.

4. Deploying Pay-Per-Crawl without verifying crawler registration

The problem: Pay-Per-Crawl requires both parties — publisher and crawler — to have Cloudflare accounts with configured payment details. A publisher who sets a price for an unregistered crawler will see a functional block (HTTP 403-equivalent behavior), not a paid crawl. Setting "Charge" for an unregistered crawler is ambiguous and may confuse your analytics. The fix: confirm which crawlers have active billing relationships with Cloudflare before configuring "Charge" policies. Use "Block" explicitly for crawlers without relationships. (re-verify registered crawler list as the beta expands — the list is expected to grow significantly)

5. Relying solely on AI Crawl Control and missing unknown scrapers

The problem: AI Crawl Control shows you known, classified AI bots from Cloudflare's directory. It does not show unknown scrapers that are not in that directory. Those appear in Security Analytics and Bot Analytics (Enterprise), not in AI Crawl Control. Operators who use AI Crawl Control as their sole monitoring tool systematically undercount unauthorized scraping. The fix: cross-reference AI Crawl Control data with Security Analytics. Enable AI Labyrinth — it surfaces unknown scrapers by luring them into honeypot paths and adding their fingerprints to the blocklist.

6. Not registering your own bot or agent with the Verified Bots program

The problem: if you operate your own AI agent — a commerce assistant, price comparison bot, or inventory checker — and it visits other Cloudflare-protected sites, it will score poorly on the ML engine and may be challenged or blocked. The ML engine sees an automated HTTP client with no browser signals and assigns a low bot score. The fix: submit your bot via the Verified Bots submission form in your Cloudflare account (Account Home → three dots → Configurations → Verified Bots). If your bot signs requests using Web Bot Auth, select "Request Signature" as the verification method — it is prioritized and approved more quickly than legacy IP validation.

7. Enabling Anomaly Detection on a commerce domain with API traffic or traffic spikes

The problem: Anomaly Detection learns a traffic baseline for your domain and flags outliers as score 1 (definitively automated). On an e-commerce site with large API traffic, promotional traffic spikes, or a Cloudflare for SaaS multi-tenant setup, the baseline is inherently noisy — and real users during a flash sale or product launch may be blocked en masse. Cloudflare explicitly advises against enabling Anomaly Detection on these domain types. The fix: do not enable Anomaly Detection on commerce domains with variable traffic. Use the ML engine and Heuristics pipeline instead — they handle legitimate variability without learning a fixed baseline.

8. Skipping the Signed Agents program consideration for MCP server deployments

The problem: if your MCP server is designed to be called by user-directed AI agents (ChatGPT agent, Claude agent, Goose), and those agents are registered in Cloudflare's Signed Agents directory, you need WAF rules that correctly identify and handle them. If you have generic score lt 30 → challenge rules without a prior signed-agent pass-through rule, those agents will be challenged and your MCP tool calls will silently fail at the network layer — with no error surfaced to the end user. The fix: add an explicit rule to pass signed agents (AI Assistant category + verified_bot) before your generic bot score rules. Monitor the signed agents WAF field when Cloudflare makes it generally available — announced as "coming soon" in August 2025. (re-verify field name and plan availability before launch)

§10 · FAQ

Frequently asked questions.

Is Web Bot Auth the same as RFC 9421?

No, but it is built on it. RFC 9421 is the IETF ratified standard for HTTP Message Signatures — it defines the general cryptographic signing and verification mechanism. Web Bot Auth is Cloudflare's application-layer profile of RFC 9421, scoped specifically to bot and agent authentication. It adds the tag="web-bot-auth" parameter, the Signature-Agent header pointing to the bot's key directory, and the well-known URL convention for public key discovery. The IETF Internet-Draft draft-meunier-web-bot-auth-architecture specifies the Web Bot Auth profile on top of RFC 9421. (re-verify draft status)

Can I use Web Bot Auth / Verified Bots on a free Cloudflare plan?

Yes, with important caveats. Cloudflare's edge signature verification (checking incoming Web Bot Auth headers from known bots) is available on Free and Pro plans and is currently being ramped to Business and Enterprise. The bot being verified must already be in Cloudflare's Verified Bots program. However, the cf.bot_management.verified_bot WAF rule field requires Bot Management for Enterprise — on lower plans, protection comes from the built-in Bot Fight Mode toggles, not custom WAF rules using bot score fields. (re-verify before launch)

What happens to a signed agent request if the signature is expired?

Cloudflare's signature verification checks the expires parameter in Signature-Input. If the current time is after the expiration timestamp, verification fails. The request falls back to existing bot identification and mitigation mechanisms — meaning it gets scored by the ML engine and Heuristics pipeline, not automatically passed. The expiration window is typically 5 minutes (300 seconds) in reference implementations, protecting against replay attacks. Legitimate agents re-sign each request; cached signatures are not valid.
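
The expiry check itself is simple to sketch. The snippet below is a minimal illustration rather than Cloudflare's verifier: the parser is deliberately naive (it assumes no ";" or ")" inside parameter values), the function names are hypothetical, and the caller supplies the current time in Unix seconds.

```typescript
interface SignatureParams {
  created?: number;
  expires?: number;
  keyid?: string;
  tag?: string;
}

// Reads the parameters that follow the covered-components list in a
// Signature-Input value such as:
//   sig1=("@authority" "signature-agent");created=...;expires=...;keyid="...";tag="web-bot-auth"
function parseSignatureParams(signatureInput: string): SignatureParams {
  const afterList = signatureInput.slice(signatureInput.indexOf(")") + 1);
  const params: SignatureParams = {};
  for (const part of afterList.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    const key = part.slice(0, eq).trim();
    const raw = part.slice(eq + 1).trim();
    if (key === "created" || key === "expires") params[key] = Number(raw);
    if (key === "keyid" || key === "tag") params[key] = raw.replace(/^"|"$/g, "");
  }
  return params;
}

type Freshness = "valid" | "expired" | "missing-timestamps";

// A signature is rejected once the current time is after `expires`.
function checkFreshness(params: SignatureParams, nowSeconds: number): Freshness {
  const { created, expires } = params;
  if (created === undefined || expires === undefined) return "missing-timestamps";
  if (Number.isNaN(created) || Number.isNaN(expires)) return "missing-timestamps";
  return nowSeconds > expires ? "expired" : "valid";
}
```

With a 300-second window, a signature created at t is still accepted at t + 300 and rejected at t + 301, which is why a captured signature is of little use to a replaying scraper.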

How is Pay-Per-Crawl different from just charging crawlers via a paywall on my own origin?

The core difference is that Pay-Per-Crawl is enforced at the Cloudflare edge, not at your origin. This means: (1) you do not need to deploy any code on your origin server; (2) Cloudflare acts as Merchant of Record and handles billing aggregation; (3) the anti-spoofing guarantee comes from Web Bot Auth cryptographic verification — only crawlers that have registered with Cloudflare and signed their requests can be charged or allowed through. A paywall on your origin cannot cryptographically distinguish a registered crawler from a spoofing scraper; Cloudflare's edge can. The tradeoff: Pay-Per-Crawl is in private beta and currently limited to domain-wide flat pricing. (re-verify before launch)
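
From the crawler's side, the exchange can be sketched as a single decision after each response. Treat this as an illustration under assumptions, since the product is in private beta: the crawler-price header comes from the layer table earlier in this spoke, while the crawler-exact-price retry header and the "USD 0.01" value format are taken from Cloudflare's launch announcement as I understand it and should be re-verified. The function and type names are hypothetical.

```typescript
interface Price {
  currency: string;
  amount: number;
}

// Parses a value shaped like "USD 0.01" (assumed format).
function parsePrice(value: string): Price | null {
  const match = /^([A-Z]{3})\s+(\d+(?:\.\d+)?)$/.exec(value.trim());
  if (!match) return null;
  const [, currency = "", amount = ""] = match;
  return { currency, amount: Number(amount) };
}

type CrawlDecision =
  | { action: "use-response" }
  | { action: "retry"; headers: Record<string, string> }
  | { action: "give-up"; reason: string };

// Decides what a paying crawler does after a response from a priced zone.
function decideAfterResponse(
  status: number,
  crawlerPrice: string | null,
  budget: Price,
): CrawlDecision {
  if (status !== 402) return { action: "use-response" };
  if (crawlerPrice === null) return { action: "give-up", reason: "402 without crawler-price" };
  const quoted = parsePrice(crawlerPrice);
  if (quoted === null) return { action: "give-up", reason: "unparseable crawler-price" };
  if (quoted.currency !== budget.currency || quoted.amount > budget.amount) {
    return { action: "give-up", reason: "quoted price exceeds budget" };
  }
  // Echo the quoted price back to accept it (assumed header name).
  return { action: "retry", headers: { "crawler-exact-price": crawlerPrice.trim() } };
}
```

The retry only has meaning when the request is also signed with Web Bot Auth; without the signature the edge has no verified party to bill.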

Does blocking GPTBot hurt my site's visibility in ChatGPT search?

It depends on which bot you block. GPTBot is OpenAI's training crawler — blocking it prevents your content from being used in future model training. It does not directly affect OAI-SearchBot (which indexes your content for ChatGPT's search features) or ChatGPT-User (which is a real-time user-directed agent). Blocking OAI-SearchBot would remove your site from ChatGPT search results. Blocking ChatGPT-User would block user-directed agents from accessing your storefront in real time. These are separate bots with separate user agents and separate Cloudflare detection IDs — use the per-bot controls in AI Crawl Control, not blanket "block all AI bots."

My Shopify store is already on Cloudflare (Shopify uses Cloudflare). Can I add my own Bot Management rules?

Standard Shopify storefronts run behind Shopify's Cloudflare configuration, not yours. You cannot add custom WAF or Bot Management rules to traffic that Shopify controls. If you add your custom domain through your own Cloudflare account on top of Shopify, you may create double-proxying issues. Cloudflare Bot Management rules in the context described in this document apply most cleanly to: (1) headless storefronts with a custom front-end on Cloudflare, (2) custom commerce APIs (not Shopify's) behind Cloudflare, or (3) MCP servers hosted on Cloudflare Workers that call Shopify's APIs from the server side. For direct Shopify front-ends, Shopify's built-in bot reporting (Winter 2024 update) is the appropriate tool.

What is the difference between the Signed Agents program and the Verified Bots program?

Verified Bots are bots operated by a single legal entity that directs all the traffic — Googlebot, Bingbot, the OpenAI training crawler. The entity owns the bot and controls what it does. Signed Agents are user-directed: the agent acts on behalf of an individual user, meaning the same agent infrastructure (e.g., Browserbase's remote browser) might be used by thousands of different users, each directing the agent toward different goals. The Signed Agents program cryptographically attests to the platform sending the signed requests (Browserbase, OpenAI's Operator, Goose), not the individual user. Individual user identity is an application-layer concern handled by OAuth (see /agentmall_spoke_oauth).

Do I need Cloudflare Enterprise to protect an MCP server from scraping?

No, but capabilities scale with plan tier. On a Free or Pro plan using Cloudflare Workers: you get Bot Fight Mode (challenging obviously automated traffic), AI Labyrinth (honeypot for unauthorized AI scrapers), Block AI bots toggle, and Web Bot Auth signature verification (for registered bots). These are meaningful controls. What you cannot do on lower plans: build WAF rules based on cf.bot_management.score (bot score fields require Bot Management for Enterprise), write path-specific bot rules, use JA3/JA4 fingerprint matching, or access Bot Analytics. For a small-to-medium commerce operator, Bot Fight Mode + AI Crawl Control + AI Labyrinth covers the most common threat vectors. Upgrade to Bot Management when you need fine-grained path-specific rules or are seeing sophisticated evasion (rotating IPs, JA3 variability).

§11 · Step-by-Step

The 30-day rollout, in five steps.

Each step mirrors the HowTo JSON-LD at the top of this page word for word.

Step 1 — Audit your current AI bot traffic

Before writing any rules, establish a baseline. In the Cloudflare dashboard, navigate to the AI Crawl Control tab for your domain. Review the breakdown by AI provider, bot category, and most-fetched URL paths. Export the CSV. Identify: (a) which AI Assistants (buyer agents) are hitting your product and checkout paths, (b) which AI Search crawlers are indexing your content, (c) which Training crawlers are accessing your domain, and (d) whether any crawlers are ignoring your robots.txt directives. This baseline prevents you from blocking legitimate buyer agent traffic when you tighten rules.

Step 2 — Configure Bot Fight Mode and AI Labyrinth (All Plans)

In the Cloudflare dashboard, go to Security → Bots → Configure Bot Fight Mode. Enable the Block AI bots toggle to block known training crawlers that you do not wish to allow. Enable AI Labyrinth to trap unauthorized scrapers and feed fingerprint data to Cloudflare's ML training pipeline. Enable Instruct AI bot traffic with robots.txt to have Cloudflare manage robots.txt directives that state your policy to bots in its directory; as with any robots.txt, this is a declaration that compliant bots honor rather than edge enforcement, so pair it with the Block AI bots toggle. These three toggles require no rule authoring and update automatically as Cloudflare adds new bot signatures. (re-verify toggle names and behavior before launch)

Step 3 — Deploy baseline WAF rules for bot traffic (Enterprise: Bot Management Add-on)

If you have Bot Management for Enterprise, deploy the three-rule baseline from §8 in order: (1) Skip verified bots, (2) Block score = 1, (3) Managed Challenge score 2–29. Rule 3 needs no separate verified-bot exclusion because Rule 1 has already skipped verified bots. Then add path-specific overrides: allow cf.verified_bot_category = "AI Assistant" through to your API endpoints and MCP server path; block or challenge generic automation on /admin, /cart, and /checkout paths. Use Bot Analytics to review the impact on traffic before tightening thresholds.

Step 4 — Deploy MCP server on Cloudflare Workers with OAuth (All Plans)

Create a new Worker using npm create cloudflare@latest -- my-mcp-server --template=cloudflare/ai/demos/remote-mcp-authless (my-mcp-server is a placeholder project name) or the OAuth-enabled template. Add the workers-oauth-provider library for OAuth 2.1 support. Configure your tool definitions to check cf.verified_bot_category or session-level identity claims (from OAuth) before returning sensitive data (inventory, pricing APIs, checkout). Deploy with wrangler deploy. Your MCP endpoint at your-worker.workers.dev/sse is immediately behind Cloudflare's WAF and receives the bot detection fields on every request. For private origins, deploy cloudflared on your infrastructure and connect via a Tunnel, pointing the public-facing Cloudflare hostname to the tunnel.
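
One way to structure the per-tool check is a small pure function that the tool handler calls before doing any work. The sketch below is one possible policy, not a prescribed one: public tools accept either a verified AI Assistant or a matching OAuth scope, while sensitive tools always require the scope. The types are local stand-ins rather than the Workers runtime types, and how you populate them from the incoming request (for example from the request's cf object and from the validated OAuth token) is an assumption to verify against the Workers documentation.

```typescript
// Local stand-ins for edge signals and OAuth claims (hypothetical shapes).
interface EdgeSignals {
  verifiedBot: boolean;
  verifiedBotCategory?: string;
}

interface CallerContext {
  edge: EdgeSignals;
  oauthScopes: string[];
}

type ToolSensitivity = "public" | "sensitive";

type Decision =
  | { allowed: true }
  | { allowed: false; status: 401 | 403; reason: string };

function authorizeToolCall(
  sensitivity: ToolSensitivity,
  requiredScope: string,
  caller: CallerContext,
): Decision {
  // A valid OAuth scope is sufficient for any tool.
  if (caller.oauthScopes.includes(requiredScope)) return { allowed: true };

  // Edge identity alone is enough only for public, read-only tools.
  const isVerifiedAssistant =
    caller.edge.verifiedBot && caller.edge.verifiedBotCategory === "AI Assistant";
  if (sensitivity === "public" && isVerifiedAssistant) return { allowed: true };

  if (sensitivity === "sensitive") {
    return { allowed: false, status: 401, reason: `OAuth scope "${requiredScope}" required` };
  }
  return { allowed: false, status: 403, reason: "caller is not a verified AI Assistant" };
}
```

Keeping the decision separate from the handler makes the split visible: the edge answers who is calling, and the OAuth scope answers what that caller may do.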

Step 5 — Register your bot with Cloudflare and implement Web Bot Auth (If You Operate a Bot or Agent)

Generate an Ed25519 key pair. Publish the JWK public key at https://yourdomain.com/.well-known/http-message-signatures-directory. Sign the key directory URL itself using web-bot-auth (TypeScript or Rust library) to prove ownership. Register via the Verified Bots submission form in your Cloudflare account (Account Home → three dots → Configurations → Verified Bots), selecting "Request Signature" as the verification method. Configure every outbound HTTP request from your bot to include Signature, Signature-Input, and Signature-Agent headers. Test your implementation against https://http-message-signatures-example.research.cloudflare.com. Once approved, your bot will pass Cloudflare's signature verification at all sites it visits and will be excluded from challenges.
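
The signing step is easier to reason about once the string being signed is visible. The sketch below builds the RFC 9421 signature base and the matching Signature-Input value for a request. It is a partial illustration: the choice of covered components ("@authority" and "signature-agent") follows my reading of the Web Bot Auth draft and should be checked against the current revision, the keyid and host values are placeholders, and the actual Ed25519 signing is left to a library such as web-bot-auth.

```typescript
interface SigningInput {
  authority: string;      // host of the site being requested
  signatureAgent: string; // Signature-Agent header value exactly as sent
  keyid: string;          // identifier of the key in your published directory
  created: number;        // Unix seconds
  expires: number;        // Unix seconds
}

// Serializes the covered components and signature parameters.
function signatureParams(input: SigningInput): string {
  return (
    `("@authority" "signature-agent")` +
    `;created=${input.created}` +
    `;expires=${input.expires}` +
    `;keyid="${input.keyid}"` +
    `;tag="web-bot-auth"`
  );
}

// RFC 9421 signature base: one line per covered component, then the
// "@signature-params" line, joined by "\n" with no trailing newline.
function signatureBase(input: SigningInput): string {
  return [
    `"@authority": ${input.authority}`,
    `"signature-agent": ${input.signatureAgent}`,
    `"@signature-params": ${signatureParams(input)}`,
  ].join("\n");
}

// Value for the Signature-Input header; the label "sig1" is arbitrary.
function signatureInputHeader(input: SigningInput, label = "sig1"): string {
  return `${label}=${signatureParams(input)}`;
}
```

The Ed25519 signature is computed over the UTF-8 bytes of signatureBase(...) and sent base64-encoded between colons in the Signature header under the same label, for example sig1=:...:. Because created and expires are part of the signed string, every request needs a fresh signature.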

§12 · Continue the Guide

The trust layer continues at every spoke.

Edge bot verification is the first gate. These spokes cover what happens downstream — identity after the edge, capability discovery, and the full trust framework for agent-ready commerce.

The Window

The edge is the first gate. Build it before the agents arrive.

Cloudflare AI agent verification is deployable today — Bot Fight Mode + AI Labyrinth + Web Bot Auth signature verification at zero cost on a free plan. Pay-Per-Crawl is in private beta. The Signed Agents program launched in August 2025. The infrastructure is not theoretical: it is running on Cloudflare's global network right now, processing every request that passes through your zone. The operators who configure it today — who audit their AI bot traffic, set their per-category policies, and register their own agents with the Verified Bots program — will have a defensible, monetizable edge layer when buyer agent traffic reaches commercial scale. The roadmap has the full picture of what comes after.

Open the AgentMall Roadmap →