Citare Tools · Free
robots.txt Generator
Pick a preset (allow-all · block-training-allow-grounding · block-all-AI) or set per-bot decisions manually across 25+ AI crawlers + search engines. Paste-ready output, with deterministic per-bot User-agent groups.
Free. No signup. Cached 24h.
Per-bot overrides (advanced)
AI bots
| Bot | Provider | Decision |
|---|---|---|
| GPTBot | OpenAI | |
| OAI-SearchBot | OpenAI | |
| ChatGPT-User | OpenAI | |
| ClaudeBot | Anthropic | |
| Claude-User | Anthropic | |
| Google-Extended | Google AI | |
| GoogleOther | Google AI | |
| PerplexityBot | Perplexity | |
| Perplexity-User | Perplexity | |
| Applebot-Extended | Apple | |
| meta-externalagent | Meta | |
| CCBot | Common Crawl | |
| Bytespider | ByteDance / TikTok | |
| Diffbot | Diffbot | |
Search engines
| Bot | Provider | Decision |
|---|---|---|
| Googlebot | Google | |
| Bingbot | Microsoft | |
| DuckDuckBot | DuckDuckGo | |
| YandexBot | Yandex | |
Frequently asked
What's the difference between the 'block training, allow grounding' preset and 'block all AI'?
AI bots play two roles: training (the model reads your content to learn from it) and grounding (the AI fetches your content in real time when a user asks a question). The block-training-allow-grounding preset blocks the training bots (GPTBot, ClaudeBot, Google-Extended) but allows the live-grounding bots (OAI-SearchBot, PerplexityBot, Bingbot). This is the right choice if you want AI citations and traffic but don't want your content training future models. Block-all-AI blocks both, so you disappear from AI search entirely. Use it only for paywalled or genuinely sensitive content.
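As a rough illustration of what this preset amounts to, the sketch below builds one User-agent group per bot from a table of allow/block decisions. It is not Citare's actual generator: the names `BLOCK_TRAINING_ALLOW_GROUNDING` and `build_robots_txt` are hypothetical, the preset lists only the bots named in the answer above, and the output layout (sorted groups separated by blank lines) is an assumption.

```python
# Hypothetical preset: True means allow, False means block.
# Only the bots named in the FAQ answer are listed here.
BLOCK_TRAINING_ALLOW_GROUNDING = {
    "GPTBot": False,           # training
    "ClaudeBot": False,        # training
    "Google-Extended": False,  # training
    "OAI-SearchBot": True,     # live grounding
    "PerplexityBot": True,     # live grounding
    "Bingbot": True,           # live grounding
}


def build_robots_txt(decisions: dict[str, bool]) -> str:
    """Return robots.txt text with one explicit User-agent group per bot."""
    groups = []
    # Sort case-insensitively so the same decisions always yield the same file.
    for bot in sorted(decisions, key=str.lower):
        rule = "Allow: /" if decisions[bot] else "Disallow: /"
        groups.append(f"User-agent: {bot}\n{rule}")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(BLOCK_TRAINING_ALLOW_GROUNDING), end="")
```

Switching to a block-all-AI style preset would, under the same assumptions, just mean setting every AI bot's entry to False.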
Should I allow or block GPTBot?
Allow it for most sites that benefit from organic discovery. Blocking GPTBot opts you out of OpenAI's training corpus but doesn't reduce ChatGPT citation visibility (ChatGPT's live citations come from OAI-SearchBot + Bingbot, separate bots). Block GPTBot only when you've made a deliberate training-opt-out decision — usually paywalled content, copyrighted material, or original research you don't want commoditized. Brand-marketing sites and content marketing operations almost always benefit from allowing GPTBot.
Where do I put the generated robots.txt?
At the root of your site, served at /robots.txt with Content-Type: text/plain. If you're on Vercel, Cloudflare Pages, or Netlify, drop it in your public/ folder (or whichever directory your framework publishes as static files). On WordPress, install a robots.txt plugin or use a custom rewrite rule. On nginx or Apache, the file goes in your document root. Test it after publishing by running the URL through Citare's AI Robots.txt Checker, which fetches your live robots.txt and validates the per-bot rules.
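For a quick local sanity check before using the hosted checker, something like the following can confirm that the file is reachable at /robots.txt and served as text/plain. This is a minimal sketch using only the Python standard library. The helper names `is_plain_text` and `fetch_robots_headers`, the `robots-check-sketch` User-Agent string, and the example domain are all hypothetical, and the script does not validate any per-bot rules.

```python
from urllib.request import Request, urlopen


def is_plain_text(content_type: str) -> bool:
    """True if a Content-Type header value has the media type text/plain."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/plain"


def fetch_robots_headers(site: str, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch <site>/robots.txt and return (HTTP status, Content-Type value).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    missing file shows up as an exception rather than a return value.
    """
    url = site.rstrip("/") + "/robots.txt"
    request = Request(url, headers={"User-Agent": "robots-check-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.status, response.headers.get("Content-Type", "")


if __name__ == "__main__":
    # Replace with your own origin; example.com is only a placeholder.
    status, content_type = fetch_robots_headers("https://example.com")
    print(status, content_type, is_plain_text(content_type))
```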
Why does the generator output a User-agent group for every bot instead of just using User-agent: *?
Because robots.txt rules don't cascade the way most people expect. When a bot's User-Agent matches an explicit User-agent group, ONLY that group's rules apply, and the wildcard User-agent: * group is ignored for that bot. So if you write 'User-agent: GPTBot / Disallow: /' and nothing else, every other AI bot falls back to the wildcard group (or to implicit allow if no wildcard group exists). To control AI bot access reliably, every bot needs its own explicit User-agent group. The Citare generator outputs one per bot so the rules are deterministic.
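The non-cascading behaviour can be seen with Python's standard-library `urllib.robotparser`. The sketch below uses a made-up robots.txt in which the wildcard group blocks /drafts/ and the GPTBot group blocks only /images/. The paths and the `allowed` helper are hypothetical, and Python's parser only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: the wildcard and GPTBot groups carry different rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/

User-agent: GPTBot
Disallow: /images/
"""


def allowed(user_agent: str, path: str) -> bool:
    """Ask the stdlib parser whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for bot in ("GPTBot", "ClaudeBot"):
        for path in ("/drafts/post", "/images/a.png"):
            print(f"{bot:10} {path:15} allowed={allowed(bot, path)}")
```

GPTBot matches its own group, so the wildcard's /drafts/ rule does not reach it and only /images/ is blocked. ClaudeBot has no group of its own, so it falls back to the wildcard group and is blocked from /drafts/ but not from /images/.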