
How to Audit Your robots.txt for AI Crawler Access


Why your robots.txt probably needs updating

Most robots.txt files were last edited years ago, written to manage Googlebot and a handful of other well-known crawlers. AI search crawlers in their current form only appeared in 2023 and 2024. As a result, many sites have robots.txt configurations that either accidentally block AI crawlers or leave their access undefined.

Undefined does not automatically mean allowed. When a crawler finds no group naming its user-agent, it falls back to the wildcard group (User-agent: *) if one exists. Only when there is no wildcard group either is it free to crawl everything. A common robots.txt pattern uses a wildcard Disallow that blocks all unrecognised crawlers by default, and in a file like that every AI crawler not explicitly named is blocked.
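A minimal way to see this fallback in action is Python's standard-library parser, urllib.robotparser, which follows the same rule RFC 9309 describes: use the named group if one matches, otherwise the wildcard group, otherwise allow. The helper can_crawl and both robots.txt bodies are illustrative; individual crawlers' parsers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Hypothetical helper: parse a robots.txt body and test one user-agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Only Googlebot is named and there is no wildcard group:
# an unnamed crawler has nothing to fall back to, so it is allowed.
NO_WILDCARD = """\
User-agent: Googlebot
Disallow:
"""

# Same file plus a wildcard group that blocks everything:
# an unnamed crawler now falls back to the wildcard group and is blocked.
WILDCARD_BLOCK = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""

print(can_crawl(NO_WILDCARD, "GPTBot"))        # True
print(can_crawl(WILDCARD_BLOCK, "GPTBot"))     # False
print(can_crawl(WILDCARD_BLOCK, "Googlebot"))  # True
```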

Step 1: read your current robots.txt

Go to yoursite.com/robots.txt in a browser. You will see a plain text file. Look for two things: any User-agent: * wildcard rules, and any Disallow entries. A wildcard Disallow: / blocks every crawler that is not explicitly named elsewhere in the file.

# Common pattern that blocks all unnamed crawlers
User-agent: *
Disallow: /

# Only Googlebot is explicitly allowed
User-agent: Googlebot
Disallow:

In the example above, every AI crawler (GPTBot, ClaudeBot, PerplexityBot and the rest) falls back to the wildcard rule and is blocked, because none of them is named. The site is fully open to Googlebot but closed to every AI crawler that respects robots.txt.
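On a long file, the Step 1 inspection can be scripted. read_groups below is a hypothetical helper, not a library function: it splits a robots.txt body into groups of User-agent lines and their Allow/Disallow rules, ignoring comments and other directives such as Sitemap. Run on the example above, it shows that only Googlebot is named and that the wildcard group disallows everything.

```python
def read_groups(robots_txt: str) -> list[tuple[list[str], list[tuple[str, str]]]]:
    """Split a robots.txt body into (user_agents, rules) groups.

    Consecutive User-agent lines share one group; a User-agent line that
    follows a rule starts a new group. Comments and blank lines are ignored.
    """
    groups: list[tuple[list[str], list[tuple[str, str]]]] = []
    agents: list[str] = []
    rules: list[tuple[str, str]] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # the previous group is complete
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


EXAMPLE = """\
# Common pattern that blocks all unnamed crawlers
User-agent: *
Disallow: /

# Only Googlebot is explicitly allowed
User-agent: Googlebot
Disallow:
"""

groups = read_groups(EXAMPLE)
named = [agent for agents, _ in groups for agent in agents if agent != "*"]
# Rough check: an Allow rule in the same group could still open some paths.
wildcard_blocks_all = any(
    "*" in agents and ("disallow", "/") in rules for agents, rules in groups
)
print(named)                # ['Googlebot']
print(wildcard_blocks_all)  # True
```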

Step 2: identify which AI crawlers you want to allow

The major AI crawlers and their robots.txt user-agent tokens are:

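  • GPTBot: used by OpenAI to collect training data for its models (user-initiated browsing in ChatGPT uses a separate agent, ChatGPT-User)
  • ClaudeBot: used by Anthropic (Claude) to crawl public web content, mainly for model training
  • PerplexityBot: used by Perplexity to index pages for its search results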
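  • Google-Extended: a robots.txt control token rather than a separate crawler; it tells Google whether content it crawls may be used for Gemini model training and grounding, and it does not affect Googlebot or inclusion in Google Search (Google does not document a separate Gemini-Extended token)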
  • meta-externalagent: used by Meta AI
  • Applebot-Extended: a control token that tells Apple whether content crawled by Applebot may be used to train its generative AI models
  • Bytespider: used by ByteDance (TikTok parent) for AI indexing
  • cohere-ai: used by Cohere for AI model training
  • OAI-SearchBot: used by OpenAI for search index building

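There are more than 20 active AI crawlers in total, and some vendors run several with different jobs: OpenAI, for example, separates training (GPTBot), search indexing (OAI-SearchBot) and user-initiated browsing (ChatGPT-User). The ones above cover the most widely used AI platforms.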
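The tokens above can be kept in a list and checked against a robots.txt body in one pass. This sketch assumes a hypothetical audit helper built on urllib.robotparser: for each token it reports whether / is fetchable and whether the file names the token or leaves it to the wildcard group. urllib.robotparser applies a group when its token appears anywhere in the user-agent string, which is looser than the token matching in RFC 9309, so treat the output as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Most of the tokens listed above. Google-Extended and Applebot-Extended are
# control tokens rather than separate crawlers, but robots.txt treats them alike.
AI_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "Applebot-Extended", "meta-externalagent", "Bytespider", "cohere-ai",
]


def audit(robots_txt: str, tokens: list[str], path: str = "/") -> dict[str, str]:
    """Hypothetical helper: allowed/blocked plus explicit/fallback, per token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Tokens that appear on a User-agent line (compared case-insensitively).
    named = {
        line.split(":", 1)[1].split("#", 1)[0].strip().lower()
        for line in robots_txt.splitlines()
        if line.strip().lower().startswith("user-agent:")
    }
    report = {}
    for token in tokens:
        verdict = "allowed" if parser.can_fetch(token, path) else "blocked"
        source = "explicit rule" if token.lower() in named else "no explicit rule"
        report[token] = f"{verdict} ({source})"
    return report


SAMPLE = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: GPTBot
Disallow:
"""

for token, status in audit(SAMPLE, AI_TOKENS).items():
    print(f"{token:20} {status}")
# GPTBot is allowed (explicit rule); every other token falls back
# to the wildcard group and is blocked (no explicit rule).
```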

Step 3: write the correct allow rules

For each AI crawler you want to allow, add an explicit entry with an empty Disallow (which means allow all):

User-agent: GPTBot
Disallow:

User-agent: ClaudeBot
Disallow:

User-agent: PerplexityBot
Disallow:

User-agent: Google-Extended
Disallow:

User-agent: OAI-SearchBot
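Disallow:

Compliant crawlers do not read robots.txt on a "first matching rule wins" basis. Under RFC 9309, and in Google's documented behaviour, each crawler obeys the group that names its user-agent and ignores the wildcard group, wherever the groups appear in the file; within a group, the most specific (longest) matching path rule applies. Placing the explicit AI crawler groups before the wildcard group is still a harmless habit, since some simpler parsers may stop at the first match. Note also that a named group does not inherit the wildcard group's rules: any path you still want excluded, such as a private directory, has to be repeated inside each named group.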
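To check how group order and inheritance play out, the sketch below feeds a file with the wildcard group placed first into urllib.robotparser, which, like Google's documented behaviour, picks a group by user-agent rather than by position. SomeOtherBot is a made-up name standing in for any unnamed crawler. One caveat: inside a group, urllib.robotparser applies the first matching rule rather than the longest match RFC 9309 specifies, so keep Allow/Disallow combinations simple when testing with it.

```python
from urllib.robotparser import RobotFileParser

# Wildcard group deliberately placed FIRST, with an extra path rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot uses its own group even though the wildcard group comes first...
print(parser.can_fetch("GPTBot", "/blog/post"))             # True
# ...and that group does not inherit the wildcard's /private/ rule.
print(parser.can_fetch("GPTBot", "/private/report"))        # True
# A crawler with no named group (hypothetical name) falls back to the wildcard.
print(parser.can_fetch("SomeOtherBot", "/private/report"))  # False
```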

If you want to block a specific AI crawler while allowing others, give that crawler its own group with Disallow: /. You are not required to allow all of them; which platforms to allow is your decision.
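If you maintain allow and block lists, the file can be generated rather than hand-edited. build_robots_txt below is a hypothetical helper: it writes one group per allowed token with an empty Disallow, one per blocked token with Disallow: /, and the wildcard group last, then checks the result with urllib.robotparser. The token choices (blocking Bytespider, for instance) are only an example, and SomeOtherBot is a made-up name for an unnamed crawler.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(allow: list[str], block: list[str],
                     wildcard_rules: list[str]) -> str:
    """Emit one named group per token, then the wildcard group.

    allow          -> empty Disallow (nothing is disallowed)
    block          -> Disallow: / (everything is disallowed)
    wildcard_rules -> raw rule lines applied to every other crawler
    """
    lines: list[str] = []
    for token in allow:
        lines += [f"User-agent: {token}", "Disallow:", ""]
    for token in block:
        lines += [f"User-agent: {token}", "Disallow: /", ""]
    lines += ["User-agent: *", *wildcard_rules]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    allow=["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "OAI-SearchBot"],
    block=["Bytespider"],
    wildcard_rules=["Disallow: /"],
)
print(robots_txt)

# Sanity-check the generated file with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for token in ["GPTBot", "ClaudeBot", "Bytespider", "SomeOtherBot"]:
    print(token, parser.can_fetch(token, "/"))
```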

Step 4: verify the changes

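After updating your robots.txt, open yoursite.com/robots.txt and confirm your new rules are present. Google Search Console's robots.txt report shows which version Google last fetched and any parsing errors, but it only covers Google's crawlers, and the older robots.txt Tester has been retired. To check each AI user-agent, use a dedicated robots.txt testing tool or a parser such as Python's urllib.robotparser.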
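urllib.robotparser can also fetch the live file. check_live_site and check_agents are hypothetical helpers, HTTPS is assumed, and the hostname in the usage comment is a placeholder. The request goes out with Python's own user-agent, and if the server answers it with 401 or 403 the parser treats every URL as disallowed, so a result of all False may reflect bot protection on the robots.txt request rather than the file's rules.

```python
from collections.abc import Iterable
from urllib.robotparser import RobotFileParser

# A subset of the AI tokens listed earlier; extend as needed.
AI_TOKENS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
             "Google-Extended", "Bytespider")


def check_agents(parser: RobotFileParser, tokens: Iterable[str],
                 path: str = "/") -> dict[str, bool]:
    """Return {token: may fetch path} for a parser that has already loaded a file."""
    return {token: parser.can_fetch(token, path) for token in tokens}


def check_live_site(site: str, tokens: Iterable[str] = AI_TOKENS) -> dict[str, bool]:
    """Fetch https://<site>/robots.txt (HTTPS assumed) and check each token against "/"."""
    parser = RobotFileParser(f"https://{site}/robots.txt")
    parser.read()  # network request; a 401/403 response makes can_fetch() return False
    return check_agents(parser, tokens)


# Usage (needs network access; replace the placeholder hostname):
# for token, allowed in check_live_site("example.com").items():
#     print(token, "allowed" if allowed else "blocked")
```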

The SEOFliq AEO and GEO Suite extension audits your robots.txt against all 24 known AI crawlers automatically. Open it on any page of your site and it shows you a complete access report: which crawlers are allowed, which are blocked, and which have no explicit rule. It runs in seconds with no account required.