Question 1

What's the difference between the 'block training, allow grounding' preset and 'block all AI'?

Accepted Answer

AI bots play two roles: training (the model reads your content to learn from it) and grounding (the AI fetches your content in real time when a user asks a question). The block-training-allow-grounding preset blocks GPTBot, ClaudeBot, and Google-Extended (the training bots) but allows OAI-SearchBot, PerplexityBot, and Bingbot (the live-grounding bots). This is the right choice if you want AI citations and traffic but don't want your content training future models. Block-all-AI blocks both roles, so compliant AI search products can no longer fetch or cite your pages. Use it only for paywalled or genuinely sensitive content.
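A minimal sketch of what the block-training-allow-grounding preset could produce, written in Python. The bot names follow the answer above, with Perplexity's crawler written as PerplexityBot. The function names `block_training_allow_grounding` and `is_allowed` are hypothetical and are not the generator's actual code. The check uses Python's standard-library parser, which only approximates how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Bot names come from the answer above; the preset logic is an illustrative sketch.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended"]
GROUNDING_BOTS = ["OAI-SearchBot", "PerplexityBot", "Bingbot"]


def block_training_allow_grounding() -> str:
    """Build robots.txt text with one explicit group per bot."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in TRAINING_BOTS]
    groups += [f"User-agent: {bot}\nAllow: /" for bot in GROUNDING_BOTS]
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt: str, bot: str, path: str = "/") -> bool:
    """Check a bot against robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


if __name__ == "__main__":
    text = block_training_allow_grounding()
    print(text)
    for bot in TRAINING_BOTS + GROUNDING_BOTS:
        print(bot, "allowed" if is_allowed(text, bot) else "blocked")
```

Running the script prints the generated file followed by one allowed/blocked line per bot, which is a quick way to confirm the preset does what you intended before publishing it.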

Question 2

Should I allow or block GPTBot?

Accepted Answer

Allow it on most sites that benefit from organic discovery. Blocking GPTBot opts you out of OpenAI's training corpus but doesn't reduce your visibility in ChatGPT citations, because ChatGPT's live citations come from separate bots (OAI-SearchBot and Bingbot). Block GPTBot only when you've made a deliberate training opt-out decision, usually for paywalled content, copyrighted material, or original research you don't want commoditized. Brand-marketing sites and content-marketing operations almost always benefit from allowing GPTBot.
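To see that a GPTBot block leaves the other bots untouched, here is a small check using Python's standard-library `urllib.robotparser`. The helper name `allowed_bots` is hypothetical, and the parser is a stand-in for the real crawlers, which may differ in matching details.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that opts out of GPTBot only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def allowed_bots(robots_txt: str, bots: list[str], path: str = "/") -> dict[str, bool]:
    """Map each bot name to whether it may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


if __name__ == "__main__":
    result = allowed_bots(ROBOTS_TXT, ["GPTBot", "OAI-SearchBot", "Bingbot"])
    for bot, ok in result.items():
        print(bot, "allowed" if ok else "blocked")
```

Under this parser only GPTBot is blocked. OAI-SearchBot and Bingbot match no group and there is no wildcard group, so they fall back to implicit allow.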

Question 3

Where do I put the generated robots.txt?

Accepted Answer

At the root of your site, served at /robots.txt with Content-Type: text/plain. If you're on Vercel / Cloudflare Pages / Netlify, drop it in the folder your project publishes as static files (usually public/). On WordPress, install a robots.txt plugin or use a custom rewrite rule. On nginx/Apache, the file goes in your document root. Test it after publishing by running the URL through Citare's AI Robots.txt Checker, which fetches your live robots.txt and validates the per-bot rules.
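As a quick sanity check before using a full validator, the two serving requirements above (the file is reachable at /robots.txt and is served as text/plain) can be sketched in Python. The function names `check_robots_response` and `fetch_and_check` are hypothetical, and `https://example.com` is a placeholder for your own site.

```python
import urllib.error
import urllib.request


def check_robots_response(status: int, content_type: str) -> list[str]:
    """Return a list of problems with how /robots.txt is being served."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Ignore parameters such as "; charset=utf-8" when comparing the media type.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"expected text/plain, got {media_type or 'no Content-Type'}")
    return problems


def fetch_and_check(site: str) -> list[str]:
    """Fetch <site>/robots.txt and check its status code and Content-Type."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return check_robots_response(resp.status, resp.headers.get("Content-Type", ""))
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the status and headers are still available.
        return check_robots_response(err.code, err.headers.get("Content-Type", ""))


if __name__ == "__main__":
    # Placeholder URL: replace with your own site.
    for problem in fetch_and_check("https://example.com") or ["looks fine"]:
        print(problem)
```

This only checks how the file is served. It does not validate the rules themselves, which is what a per-bot checker is for.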

Question 4

Why does the generator output a User-agent group for every bot instead of just using User-agent: *?

Accepted Answer

Because robots.txt rules don't cascade the way most people expect. When a bot's user agent matches an explicit User-agent group, ONLY that group's rules apply, and the wildcard User-agent: * group is ignored for that bot. So if you write 'User-agent: GPTBot / Disallow: /' and nothing else, every other AI bot falls back to the wildcard group (or to implicit allow if no wildcard group exists). To control AI bot access reliably, every bot needs its own explicit User-agent group. The Citare generator outputs one group per bot so the rules are deterministic.
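The non-cascading behavior can be demonstrated with Python's standard-library parser. In this sketch the paths /private/ and /drafts/ and the helper `can_fetch` are made up for illustration. The file has a wildcard group and a GPTBot group with different rules.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: the wildcard group and the GPTBot group block different paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /drafts/
"""


def can_fetch(bot: str, path: str) -> bool:
    """Evaluate ROBOTS_TXT for one bot and path with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(bot, path)


if __name__ == "__main__":
    for bot in ("GPTBot", "ClaudeBot"):
        for path in ("/private/report", "/drafts/post"):
            verdict = "allowed" if can_fetch(bot, path) else "blocked"
            print(f"{bot:10} {path:16} {verdict}")
```

GPTBot is allowed to fetch /private/report here, because its own group says nothing about that path and the wildcard rule never applies to it. ClaudeBot has no group of its own, so it gets the wildcard rules: blocked from /private/ and allowed on /drafts/.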

AI bots

Bot	Provider
GPTBot	OpenAI
OAI-SearchBot	OpenAI
ChatGPT-User	OpenAI
ClaudeBot	Anthropic
Claude-User	Anthropic
Google-Extended	Google AI
GoogleOther	Google AI
PerplexityBot	Perplexity
Perplexity-User	Perplexity
Applebot-Extended	Apple
meta-externalagent	Meta
CCBot	Common Crawl
Bytespider	ByteDance / TikTok
Diffbot	Diffbot

Search engines

Bot	Provider
Googlebot	Google
Bingbot	Microsoft
DuckDuckBot	DuckDuckGo
YandexBot	Yandex

