Can AI engines even read your site?

Why this is the first question

An AI engine can only cite what it can fetch. If your site blocks its crawlers, no amount of good content matters, because the engine never sees it. Many sites block them without realising, usually through a firewall, CDN or bot-protection rule, and that is one of the most common reasons a brand is never cited.

The crawlers worth knowing

OpenAI: GPTBot (training), OAI-SearchBot (search indexing) and ChatGPT-User (user-directed retrieval).
Anthropic: ClaudeBot (training), Claude-User (fetches a page when a user asks) and Claude-SearchBot (search indexing). Blocking Claude-User can reduce your visibility in answers to user requests.
Perplexity: PerplexityBot (search indexing) and Perplexity-User (user-directed retrieval).
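Google: Googlebot crawls for Search and its AI features. Google-Extended is not a separate crawler but a robots.txt token that controls whether your content is used for Gemini training and grounding; blocking it does not affect Google Search.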
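Because each vendor splits training, search indexing and user-directed fetching across separate tokens, you can opt out of one purpose without disappearing from the others. The Python sketch below assumes you want to block training only. It groups the tokens from the list by purpose (including OpenAI's search crawler, OAI-SearchBot) and writes the matching robots.txt groups. The grouping is this article's summary rather than an official taxonomy, and robots_txt_blocking is a hypothetical helper; check each vendor's documentation before deploying the output.

```python
# Crawler tokens grouped by purpose, as summarised in this article.
# This grouping is an assumption for illustration, not an official taxonomy.
CRAWLERS = {
    "GPTBot": ("OpenAI", "training"),
    "OAI-SearchBot": ("OpenAI", "search"),        # OpenAI's search crawler
    "ChatGPT-User": ("OpenAI", "user"),
    "ClaudeBot": ("Anthropic", "training"),
    "Claude-SearchBot": ("Anthropic", "search"),
    "Claude-User": ("Anthropic", "user"),
    "PerplexityBot": ("Perplexity", "search"),
    "Perplexity-User": ("Perplexity", "user"),
    "Googlebot": ("Google", "search"),            # also feeds Google's AI features
    "Google-Extended": ("Google", "training"),    # training and grounding control
}


def robots_txt_blocking(purposes):
    """Hypothetical helper: build robots.txt text that disallows every
    token whose purpose is in `purposes` and allows everything else."""
    lines = []
    for token, (_vendor, purpose) in CRAWLERS.items():
        if purpose in purposes:
            # One group per blocked token; the blank line ends the group.
            lines += [f"User-agent: {token}", "Disallow: /", ""]
    # Catch-all group: an empty Disallow means nothing is disallowed.
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


print(robots_txt_blocking({"training"}))
```

With {"training"}, the output disallows GPTBot, ClaudeBot and Google-Extended and leaves every other bot allowed by the final catch-all group.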

How access is controlled

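Access is governed by two layers: robots.txt, which tells well-behaved crawlers what they may fetch, and whatever sits in front of your origin (CDN, web application firewall, bot protection), which can block or challenge a request whatever robots.txt says. To allow a bot, simply do not disallow its user-agent. To block one, add a User-agent line for its token followed by a Disallow rule. Do this on every subdomain, because each host serves its own robots.txt. Anthropic's crawlers, for example, honor robots.txt and support the non-standard Crawl-delay directive. Blocking by IP address is unreliable, because it can stop a crawler from reading your robots.txt in the first place.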
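To see what a given robots.txt actually allows, you can run it through Python's standard-library parser, urllib.robotparser. That parser approximates, but may not exactly match, how each vendor interprets the file. The sketch below uses two hypothetical helpers: robots_url, which shows why each subdomain needs its own file, and check_access, which reports whether each token may fetch a page and any Crawl-delay that applies. The example robots.txt is made up for illustration.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Hypothetical helper: robots.txt applies per host, so every
    subdomain is governed by the file at its own root."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def check_access(robots_txt, page_url, tokens):
    """Hypothetical helper: return {token: (allowed, crawl_delay)}
    for one page under the given robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        token: (parser.can_fetch(token, page_url), parser.crawl_delay(token))
        for token in tokens
    }


# Made-up robots.txt: block GPTBot, slow ClaudeBot and keep it out of
# /private/, allow everyone else.
EXAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""

for token, (allowed, delay) in check_access(
    EXAMPLE_ROBOTS, "https://example.com/pricing",
    ["GPTBot", "ClaudeBot", "Claude-User"],
).items():
    print(token, "allowed" if allowed else "blocked", "crawl-delay:", delay)
```

To check a live site instead, create RobotFileParser(robots_url(page)) and call its read() method before querying it. This only tells you what robots.txt says. A CDN or firewall rule can still block a bot that robots.txt allows.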

Check, do not assume

The safest move is to fetch a live page while sending each bot's User-Agent and confirm it returns a real page, not a challenge page or a block such as HTTP 403. Our free check does exactly this for your domain in seconds.
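A minimal sketch of such a check, using only Python's standard library. The User-Agent strings are hypothetical simplified versions that contain each vendor's token. For a closer match, copy the exact strings from each vendor's documentation. The challenge markers and the status codes treated as blocks are assumptions too. Treat the result as an approximation: many CDNs verify genuine crawlers by IP address or reverse DNS, so a request from your own machine that borrows a bot's User-Agent may be handled differently from the real bot.

```python
import urllib.error
import urllib.request

# Tokens to test. Google-Extended is left out: it is a robots.txt token
# only and never appears in a request's User-Agent header.
TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User", "Googlebot",
]

# Assumed markers of a challenge page; extend them for your own CDN/WAF.
CHALLENGE_MARKERS = ("captcha", "verify you are human",
                     "checking your browser", "challenge-platform")


def user_agent_for(token):
    """Hypothetical, simplified User-Agent; real bots send longer strings."""
    return f"Mozilla/5.0 (compatible; {token}/1.0)"


def classify(status, body):
    """Heuristically label a response as a real page, a block, or a challenge."""
    if status in (401, 403, 429, 503):
        return "blocked or challenged"
    if status >= 400:
        return f"error {status}"
    lowered = body.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return "challenge page"
    return "ok"


def fetch_as(url, token, timeout=10.0):
    """Fetch url with the token's User-Agent; return (status, first ~200 KB of body)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent_for(token)})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read(200_000).decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status and body.
        return err.code, err.read(200_000).decode("utf-8", "replace")


def check_site(url):
    """Print one line per token; network failures are reported, not raised."""
    for token in TOKENS:
        try:
            status, body = fetch_as(url, token)
            print(f"{token:18} {status} {classify(status, body)}")
        except OSError as err:  # URLError and timeouts are OSError subclasses
            print(f"{token:18} failed: {err}")
```

Call check_site("https://www.example.com/") with one of your own pages and look for any token that does not report ok.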