AI Crawler

GPTBot

OpenAI's web crawler for collecting model training data. It can be allowed or disallowed in robots.txt; ChatGPT search and browsing run under separate OpenAI user agents, so blocking GPTBot chiefly affects training use.

What it is

GPTBot is OpenAI's web crawler that gathers publicly available content for training foundation models. It identifies itself with the GPTBot token in its user-agent string and crawls from IP ranges that OpenAI publishes. ChatGPT's search and user-initiated browsing use separate user agents (OAI-SearchBot and ChatGPT-User), which robots.txt controls independently of GPTBot.
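Because a user-agent string is trivially spoofed, a request is only credibly GPTBot when its source IP also falls inside OpenAI's published ranges. A minimal Python sketch of that two-part check, using the standard `ipaddress` module. The CIDR blocks are placeholder RFC 5737 documentation ranges, not OpenAI's; `is_verified_gptbot` is a hypothetical helper name, and the sample user-agent strings are simplified for illustration.

```python
import ipaddress

# PLACEHOLDER ranges: RFC 5737 documentation blocks, NOT OpenAI's real ranges.
# Replace them with the list OpenAI publishes for GPTBot.
GPTBOT_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_verified_gptbot(user_agent: str, remote_ip: str) -> bool:
    """Hypothetical helper: True only if the UA names GPTBot AND the IP is in a listed range."""
    # Case-insensitive token check on the user-agent header.
    if "gptbot" not in user_agent.lower():
        return False
    addr = ipaddress.ip_address(remote_ip)
    # Membership is False across IP versions, so an IPv6 client never matches an IPv4 range.
    return any(addr in net for net in GPTBOT_RANGES)


print(is_verified_gptbot("Mozilla/5.0 (compatible; GPTBot/1.0)", "192.0.2.10"))   # True
print(is_verified_gptbot("Mozilla/5.0 (compatible; GPTBot/1.0)", "203.0.113.5"))  # False: UA claims GPTBot, IP does not match
```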

Why it matters

Allowing GPTBot lets OpenAI include your content in the data it collects for training, which can influence how well your domain and brand are represented in model knowledge. Blocking it keeps your pages out of future training crawls (it does not remove content already collected) and may, over time, reduce how familiar models are with your brand in generated answers.

How it works

GPTBot honours robots.txt, so access is configured by targeting the GPTBot user-agent with Allow or Disallow rules. You can permit it site-wide, restrict it to specific paths, or block it entirely while leaving other crawlers unaffected.
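Python's standard `urllib.robotparser` offers a quick way to sanity-check rules before deploying them. This sketch assumes a robots.txt that blocks GPTBot site-wide while leaving every other crawler unrestricted; `example.com` and the paths are placeholders. `robotparser` is a reference parser, so individual crawlers may still handle edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: block GPTBot everywhere, leave all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() accepts an iterable of lines

# example.com is a placeholder host; only the URL path is compared against the rules.
print(parser.can_fetch("GPTBot", "https://example.com/articles/post"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/articles/post"))  # True
```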

When it applies

Allow GPTBot when you want your content to inform model knowledge; block it when content is sensitive or paywalled, or when you want to opt out of training use.

Examples

  • robots.txt: User-agent: GPTBot then Disallow: / to block training access entirely
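  • robots.txt: User-agent: GPTBot then Allow: /articles/ and Disallow: /drafts/ to keep drafts out of reach while published articles stay crawlable (paths matching neither rule remain allowed by default)
  • Server log shows repeated GET requests whose user-agent contains GPTBot, arriving from OpenAI's published IP ranges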
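The path-restricted example can be checked the same way. This sketch assumes exactly the two rules from that example and shows that a path matching neither rule, such as `/about`, stays crawlable. Python's parser applies the first matching rule in file order, while RFC 9309 specifies the most specific (longest) match, so keep Allow and Disallow paths from overlapping if you need consistent results across crawlers. The host and paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the path-restricted robots.txt example.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /articles/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/articles/launch", "/drafts/launch", "/about"):
    # example.com is a placeholder host.
    print(path, parser.can_fetch("GPTBot", "https://example.com" + path))
# /articles/launch True
# /drafts/launch False
# /about True   (no rule matches, so access defaults to allowed)
```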

How it is measured

  • Request volume per day from the GPTBot user-agent in server logs
  • Distinct URLs and path depth crawled by GPTBot
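  • Ratio of successful (200) to blocked (403) responses served to GPTBot; a robots.txt Disallow shows up as requests that never arrive rather than as 403s, so 403s appear only where the server itself blocks the user-agent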
  • Source IP verification against OpenAI's published crawler ranges
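Most of these measurements can be computed from ordinary access logs. A minimal sketch, assuming Apache/Nginx combined log format and matching GPTBot by user-agent substring only (pair it with IP verification before trusting the counts). `gptbot_metrics` is a hypothetical helper, and path depth is taken here as the number of non-empty path segments.

```python
import re
from collections import Counter

# Combined log format (assumed): ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def gptbot_metrics(lines):
    """Hypothetical helper: summarise GPTBot traffic from combined-format log lines."""
    per_day = Counter()   # requests per calendar day, keyed like "10/Oct/2024"
    statuses = Counter()  # HTTP status codes served to GPTBot, as strings
    urls = set()          # distinct request paths
    for line in lines:
        m = LOG_RE.match(line)
        # Skip unparseable lines and anything whose UA does not mention GPTBot.
        if not m or "gptbot" not in m.group("ua").lower():
            continue
        per_day[m.group("day")] += 1
        statuses[m.group("status")] += 1
        urls.add(m.group("path"))
    # Path depth = number of non-empty segments, ignoring any query string.
    depth = max(
        (len([s for s in u.split("?")[0].split("/") if s]) for u in urls),
        default=0,
    )
    return {
        "requests_per_day": dict(per_day),
        "distinct_urls": len(urls),
        "max_path_depth": depth,
        "status_counts": dict(statuses),
    }


sample = [
    '192.0.2.10 - - [10/Oct/2024:13:55:36 +0000] "GET /articles/a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.10 - - [10/Oct/2024:14:00:00 +0000] "GET /drafts/b HTTP/1.1" 403 0 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]
print(gptbot_metrics(sample))
```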
