Citare Tools · Free
See exactly which AI bots visit your site
Every fetch by GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, and ~16 other AI crawlers — verified against each provider’s published IP ranges. Defendable in writing on every row.
Free reference. Live tracking + install snippet generator included with Citare Pulse plan and above.
Foolproof verification
Three layers run on every hit
Layer 1
UA pattern match
A canonical 25-bot allowlist matches the User-Agent string. Cheap, but spoofable — so this layer alone never sets verified=true.
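A minimal sketch of what a UA allowlist match of this kind could look like. The bot names come from this page; the function name, the returned dictionary shape, and the use of a case-insensitive substring regex are assumptions for illustration, not Citare's actual matcher. Note that the result never marks a hit as verified.

```python
import re

# Illustrative subset of the allowlist (names taken from this page).
# The real list is described as 25 bots; this is not the full list.
AI_BOT_NAMES = [
    "GPTBot",
    "ChatGPT-User",
    "OAI-SearchBot",
    "ClaudeBot",
    "Claude-User",
    "PerplexityBot",
    "Perplexity-User",
]

# One compiled alternation; re.escape guards against regex metacharacters.
_UA_RE = re.compile("|".join(re.escape(n) for n in AI_BOT_NAMES), re.IGNORECASE)
_CANONICAL = {n.lower(): n for n in AI_BOT_NAMES}


def match_bot(user_agent):
    """Return a candidate match for a User-Agent string, or None.

    A UA match is only a claim, so ip_verified is always False here.
    """
    m = _UA_RE.search(user_agent or "")
    if m is None:
        return None
    return {"bot": _CANONICAL[m.group(0).lower()], "ip_verified": False}
```

Because the User-Agent header is set by the client, anything returned here is a candidate only; the later layers decide whether the hit counts.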
Layer 2
IP-range CIDR check
Source IP matched against the bot provider's published JSON of IP ranges (OpenAI, Google, Perplexity, Apple, Microsoft). Match → verified.
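The CIDR check can be sketched with Python's standard ipaddress module. The ranges below are reserved documentation prefixes (192.0.2.0/24, 2001:db8::/32), used as placeholders; they are not any provider's real ranges, and the function names are hypothetical.

```python
import ipaddress

# Placeholder CIDRs from the reserved documentation blocks.
# In practice these would be loaded from the provider's published JSON.
EXAMPLE_PROVIDER_CIDRS = ["192.0.2.0/24", "2001:db8::/32"]


def load_networks(cidrs):
    """Parse CIDR strings into network objects (IPv4 and IPv6)."""
    return [ipaddress.ip_network(c, strict=False) for c in cidrs]


def ip_in_ranges(ip, networks):
    """True if the source IP falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net for net in networks)


networks = load_networks(EXAMPLE_PROVIDER_CIDRS)
```

A hit whose UA claims a bot but whose IP falls outside that provider's ranges would stay unverified under this scheme.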
Layer 3
Reverse-DNS confirmation
For providers without published IP ranges (Bingbot, Applebot), reverse-DNS the IP, match the hostname suffix, then forward-resolve back to the original IP. All three steps must pass.
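The three reverse-DNS steps can be sketched as below, using only the standard library. The function names are hypothetical, and any hostname suffix passed in (for example ".search.msn.com" for Bingbot) is an assumption that should be confirmed against the provider's own documentation. The lookups are injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket


def _reverse(ip):
    """PTR lookup: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    """A/AAAA lookup: hostname to a set of address strings."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def verify_by_rdns(ip, allowed_suffixes, reverse=_reverse, forward=_forward):
    """Forward-confirmed reverse DNS. All three steps must pass."""
    # Step 1: reverse-resolve the IP to a hostname.
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False
    # Step 2: the hostname must end with an expected provider suffix.
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    # Step 3: the hostname must resolve back to the original IP.
    try:
        resolved = {ipaddress.ip_address(a) for a in forward(host)}
    except (OSError, ValueError):
        return False
    return ipaddress.ip_address(ip) in resolved
```

The forward step matters because whoever controls an IP block also controls its PTR records; only the forward lookup is answered by the provider's own DNS zone.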
Headline metric, locked
WHERE ip_verified=true AND bot_class IN ('training', 'live_search')
Indexing crawlers (Googlebot, Bingbot) and unverified hits ride alongside in side stats, never folded into the headline. Every count is reproducible from raw rows.
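To illustrate how the headline filter separates from the side stats, here is a small sketch against an in-memory SQLite table. The table name crawler_hits, the sample rows, and the 'indexing' class label are assumptions; only the ip_verified and bot_class columns and the filter itself come from this page. SQLite stores booleans as 0/1, so the filter is written as ip_verified = 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crawler_hits (bot TEXT, bot_class TEXT, ip_verified INTEGER)"
)
conn.executemany(
    "INSERT INTO crawler_hits VALUES (?, ?, ?)",
    [
        ("GPTBot", "training", 1),
        ("OAI-SearchBot", "live_search", 1),
        ("ClaudeBot", "training", 0),   # unverified: stays out of the headline
        ("Googlebot", "indexing", 1),   # indexing crawler: side stat only
    ],
)

# Headline metric: verified hits from training or live-search bots only.
headline = conn.execute(
    "SELECT COUNT(*) FROM crawler_hits "
    "WHERE ip_verified = 1 AND bot_class IN ('training', 'live_search')"
).fetchone()[0]

# Side stats, reported next to the headline but never added to it.
unverified = conn.execute(
    "SELECT COUNT(*) FROM crawler_hits WHERE ip_verified = 0"
).fetchone()[0]
indexing = conn.execute(
    "SELECT COUNT(*) FROM crawler_hits WHERE bot_class = 'indexing'"
).fetchone()[0]
```

Since each number is a plain filter over raw rows, anyone holding the same rows can recompute it.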
Works for every customer
Three install paths
Pick whichever fits your hosting. One install per property is enough — fingerprint dedup collapses duplicates if you install multiple. Zero code changes; configuration only.
Vercel Log Drains
SaaS · Next.js · Vercel-hosted
- Open project Settings → Log Drains
- Add JSON destination → paste headers
- Save — verification probe is automatic
Cloudflare Logpush
~40% of sites already behind Cloudflare
- Analytics & Logs → Create Logpush job
- HTTP destination → paste headers
- Save — Cloudflare auth-check is automatic
WordPress mu-plugin
Hostinger · cPanel · managed WP
- Download zip from Settings → AI Crawlers
- Upload to /wp-content/mu-plugins/
- Paste API key in Settings → Citare Crawler
Self-hosted Nginx / Apache? Our generic NDJSON format works with any forwarder (Vector, Logtail, Fluentd) that can POST to a URL.
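For a sense of what an NDJSON forwarder sends, here is a rough sketch: one JSON object per line, posted in a single request body. The field names (ts, ip, ua, path), the endpoint URL, and the Authorization header are all hypothetical; the page does not specify Citare's actual schema or headers, so treat the install snippet from the dashboard as the source of truth.

```python
import json
import urllib.request

# Hypothetical log records; field names are illustrative only.
sample_hits = [
    {"ts": 1700000000, "ip": "192.0.2.10", "ua": "GPTBot/1.2", "path": "/pricing"},
    {"ts": 1700000003, "ip": "192.0.2.11", "ua": "PerplexityBot/1.0", "path": "/"},
]


def to_ndjson(records):
    """Serialize records as newline-delimited JSON (one object per line)."""
    lines = (json.dumps(r, separators=(",", ":")) + "\n" for r in records)
    return "".join(lines).encode("utf-8")


def build_request(url, api_key, records):
    """Build (but do not send) a POST request carrying an NDJSON body."""
    return urllib.request.Request(
        url,
        data=to_ndjson(records),
        headers={
            "Content-Type": "application/x-ndjson",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


# Placeholder endpoint; sending would be urllib.request.urlopen(request).
request = build_request("https://ingest.example.invalid/logs", "YOUR_API_KEY", sample_hits)
```

Forwarders such as Vector or Fluentd do this batching and posting for you; the sketch only shows the shape of the payload.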
What we don’t hide
Honest about the limits
Anthropic doesn't publish IP ranges yet
Claude visits land as "unverified by design". We surface that status on every row, separately from the headline. Re-evaluated quarterly.
Gemini uses a stealth Chrome UA
Path A's UA-pattern matcher silently drops it. Path C (Cloudflare Logpush) log forwarding catches it via IP-range check regardless of UA.
Cache plugins can short-circuit PHP
W3 Total Cache, LiteSpeed, WP Rocket can serve cached pages before PHP runs. If you have host-level forwarding (Vercel / Cloudflare), prefer that.
On Citare Pulse or above?
Open the dashboard →
Add a property in Settings → AI Crawlers and paste the install snippet for your hosting.
Want this without the full suite?
Email hi@citare.ai →
We offer AI Crawler Tracking piecemeal for select use cases. We’ll be in touch within 24h.
The four-index reality
Each AI engine’s web-search tool grounds against a different underlying search index. ChatGPT uses Bing. Claude uses Brave Search. Gemini and Google AI Overviews use Google’s live index. Perplexity uses its own proprietary index.
Optimising for one doesn’t automatically lift on the others — a brand top-3 on Google may be entirely absent from Brave’s smaller crawl, which means Claude won’t surface it. Most agencies treat AI search as a black box: “we hope ChatGPT mentions you.”
Citare measures the box directly. Every fetch by an AI bot, verified against the provider’s published IP ranges. When you present a monthly report, every count is defendable in writing.
Frequently asked
Which AI bots are tracked?
GPTBot, ChatGPT-User, OAI-SearchBot (OpenAI), ClaudeBot, Claude-User, anthropic-ai (Anthropic), PerplexityBot, Perplexity-User (Perplexity), Google-Extended, Google-CloudVertexBot (Google AI), Applebot-Extended, Amazonbot, Bytespider (TikTok), Meta-ExternalAgent, CCBot (Common Crawl), DuckAssistBot, YouBot, cohere-ai, and ~10 more — the full canonical 25-bot allowlist. Indexing crawlers (Googlebot, Bingbot, Yandex) are tracked separately in side stats, never folded into the AI-citations headline.
How is a hit "verified"?
Three layers run on every hit. (1) UA pattern match against the canonical 25-bot allowlist: cheap but spoofable, so this alone never sets verified=true. (2) IP-range CIDR check against the bot provider's published JSON of ranges (OpenAI, Google, Perplexity, Apple, Microsoft). Match → verified, full IP retained. (3) For providers that don't publish IP ranges (Bingbot, Applebot), reverse-DNS the IP, match the hostname suffix, then forward-resolve back to the original IP. All three of those steps must pass. Hits that fail both layer 2 and layer 3 land as "unverified by design": surfaced in a side stat, never folded into the headline number.
What about Anthropic — why are Claude hits often unverified?
Anthropic doesn't publish IP ranges yet. Their crawlers come from Google Cloud shared infrastructure — verifying off that would false-positive any GCP-hosted bot. Claude visits land as "unverified by design" and we say so explicitly on every row. Re-evaluated quarterly. If/when Anthropic publishes ranges, all historical Claude hits backfill-verify automatically against the new JSON.
What about Gemini? I keep seeing 0 hits even though I know it cites my site.
Gemini uses a stealth Chrome user-agent for grounding fetches, not a published bot string. Path A's UA-pattern matcher silently drops it. Path C (Cloudflare Logpush) catches it via IP-range check against Google's user-triggered-fetchers JSON regardless of UA. If you need complete Gemini coverage, install Path C. This is a known limit and we surface it on every Gemini-relevant view.
Which install path should I pick?
Pick whichever fits your hosting. Vercel Log Drains (best for Next.js, SaaS, or anything Vercel-hosted): Settings → Log Drains → JSON destination. Cloudflare Logpush (best for the ~40% of sites already behind Cloudflare, and the only path that catches Gemini's stealth UA via IP): Analytics & Logs → Create Logpush job → HTTP destination. WordPress mu-plugin (best for Hostinger, cPanel, or managed WP): upload to /wp-content/mu-plugins/ and paste your API key. Fingerprint dedup collapses duplicates if you accidentally install more than one path.
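Fingerprint dedup of the kind mentioned above could work roughly as follows. The page does not say which fields make up the fingerprint; hashing the source IP, User-Agent, path, and timestamp truncated to the second is an assumption made for this sketch, as are the function and field names.

```python
import hashlib


def fingerprint(hit):
    """Hash the fields assumed to identify one physical request."""
    key = "|".join([hit["ip"], hit["ua"], hit["path"], str(int(hit["ts"]))])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def dedupe(hits):
    """Keep the first occurrence of each fingerprint, preserving order."""
    seen = set()
    unique = []
    for hit in hits:
        fp = fingerprint(hit)
        if fp not in seen:
            seen.add(fp)
            unique.append(hit)
    return unique


# The same request reported by two install paths, plus one distinct request.
reported = [
    {"ip": "192.0.2.10", "ua": "GPTBot/1.2", "path": "/", "ts": 1700000000.2, "source": "cloudflare"},
    {"ip": "192.0.2.10", "ua": "GPTBot/1.2", "path": "/", "ts": 1700000000.7, "source": "wordpress"},
    {"ip": "192.0.2.10", "ua": "GPTBot/1.2", "path": "/pricing", "ts": 1700000001.0, "source": "cloudflare"},
]
unique_hits = dedupe(reported)
```

Truncating the timestamp trades precision for tolerance: two paths rarely log the same request at the identical instant, but a bucket that is too wide would merge genuinely separate fetches.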
Is this a free tool I can run right now?
This page is the free reference. The actual tracking dashboard + install snippet generator is included with the Citare Pulse plan and above. If you just want this one feature standalone (without the full Citare suite), email hi@citare.ai — we offer piecemeal access for specific use cases.
Can I trust the counts in a client report?
Yes, every row is defendable. The headline metric is the SQL `WHERE ip_verified=true AND bot_class IN ('training', 'live_search')`. Unverified hits and indexing crawlers ride alongside in side stats and never inflate the headline. If a CFO or skeptical client asks "how do you know that's real?", you can show them the row: source IP, UA string, verification method (CIDR match against provider X's published JSON), and the JSON URL.