AI Crawler
GPTBot
OpenAI's web crawler, used primarily to gather training data for its foundation models. It can be allowed or disallowed in robots.txt; blocking it opts your content out of training use and may reduce how familiar ChatGPT is with your brand.
What it is
GPTBot is OpenAI's web crawler that gathers publicly available content for training foundation models. Live retrieval for ChatGPT Search is handled by a separate user-agent, OAI-SearchBot. GPTBot identifies itself with a user-agent string containing the GPTBot token and crawls from IP ranges that OpenAI publishes.
Why it matters
Allowing GPTBot widens the range of your content that OpenAI can ingest, which influences how well your domain is represented in model knowledge and citation behaviour. Blocking it keeps your content out of future training corpora (content already collected is not affected) and may reduce long-term familiarity with your brand in generated answers.
How it works
GPTBot honours robots.txt, so access is configured by targeting the GPTBot user-agent with Allow or Disallow rules. You can permit it site-wide, restrict it to specific paths, or block it entirely while leaving other crawlers unaffected.
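A minimal sketch of how such rules can be checked before they are deployed, using Python's standard urllib.robotparser. The robots.txt content, the example.com URLs, and the can_crawl helper are illustrative assumptions, not a real site's configuration. Python's parser applies the first matching rule in a group, which may differ from a crawler's own matching logic when rules overlap; the rules here do not overlap.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may read /articles/ but not /drafts/;
# every other crawler is left unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /articles/
Disallow: /drafts/

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Googlebot"):
        for path in ("/articles/intro", "/drafts/wip"):
            url = "https://example.com" + path
            print(agent, path, can_crawl(agent, url))
```

Running a check like this against a staging copy of robots.txt is one way to confirm that a GPTBot rule does not accidentally restrict other crawlers.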
When it applies
Allow GPTBot when you want your content to inform model knowledge and citation; block it when content is sensitive, paywalled, or you wish to opt out of training use.
Examples
- robots.txt: a User-agent: GPTBot line followed by Disallow: / blocks training access entirely
- robots.txt: a User-agent: GPTBot line followed by Allow: /articles/ and Disallow: /drafts/ exposes only published work
- Server log shows repeated GET requests with a user-agent containing GPTBot, coming from OpenAI's published IP ranges
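The server-log example can be turned into a simple check: a request counts as GPTBot only if the user-agent contains the GPTBot token and the source IP falls inside a published range. The sketch below assumes a hypothetical helper name, is_verified_gptbot, and uses placeholder CIDR blocks from the reserved documentation ranges. They are not OpenAI's real ranges, which should be loaded from the list OpenAI publishes.

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges). These are NOT
# OpenAI's real ranges; substitute the list that OpenAI currently publishes.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def is_verified_gptbot(user_agent: str, client_ip: str,
                       cidr_ranges=EXAMPLE_RANGES) -> bool:
    """True only if the user-agent names GPTBot AND the IP is in a listed range."""
    if "gptbot" not in user_agent.lower():
        return False
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed IP in the log line: treat as unverified.
        return False
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


if __name__ == "__main__":
    sample_agent = "Mozilla/5.0 (compatible; GPTBot/1.0)"  # illustrative string
    print(is_verified_gptbot(sample_agent, "192.0.2.10"))
    print(is_verified_gptbot(sample_agent, "203.0.113.5"))
```

Checking both signals matters because a user-agent string is trivial to spoof, whereas a source IP inside a published range is much harder to fake.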
How it is measured
- Request volume per day from the GPTBot user-agent in server logs
- Distinct URLs and path depth crawled by GPTBot
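- Ratio of successful to blocked responses (200 versus 403) served to GPTBot; a 403 indicates a server- or firewall-level block, since robots.txt rules on their own do not change response codes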
- Source IP verification against OpenAI's published crawler ranges
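The log-based measures above could be computed along these lines. This is a sketch that assumes access logs in the common "combined" format; the sample lines, the regular expression, and the gptbot_metrics helper are illustrative and would need adapting to your server's actual log layout.

```python
import re
from collections import Counter

# Assumed layout: combined log format (IP, identity, user, [time], "request",
# status, size, "referer", "user-agent").
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical sample lines; the IPs are documentation-range placeholders.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Mar/2025:10:00:00 +0000] "GET /articles/intro HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.10 - - [01/Mar/2025:10:00:05 +0000] "GET /drafts/wip HTTP/1.1" 403 199 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.11 - - [02/Mar/2025:09:30:00 +0000] "GET /articles/2025/review HTTP/1.1" 200 8042 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.7 - - [02/Mar/2025:09:31:00 +0000] "GET /articles/intro HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def path_depth(path: str) -> int:
    """Number of non-empty path segments, ignoring any query string."""
    return len([seg for seg in path.split("?")[0].split("/") if seg])


def gptbot_metrics(lines):
    """Summarise GPTBot traffic: daily volume, distinct URLs, depth, statuses."""
    per_day, statuses, paths = Counter(), Counter(), set()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match or "gptbot" not in match["agent"].lower():
            continue  # skip unparseable lines and non-GPTBot traffic
        per_day[match["day"]] += 1
        statuses[match["status"]] += 1
        paths.add(match["path"])
    return {
        "requests_per_day": dict(per_day),
        "distinct_urls": len(paths),
        "max_path_depth": max((path_depth(p) for p in paths), default=0),
        "status_counts": dict(statuses),
    }


if __name__ == "__main__":
    print(gptbot_metrics(SAMPLE_LOG))
```

The user-agent filter here does not verify the source IP, so in practice it would be paired with a check against OpenAI's published ranges before the numbers are trusted.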
Related terms in AI Crawler
- ClaudeBot: Anthropic's web crawler. Powers Claude's retrieval and training. Honours robots.txt; access controls are increasingly material to citation share.
- Google-Extended: Google's robots.txt token for opting out of having your content used in Gemini (formerly Bard), without affecting classical Search ranking. A separate lever from Googlebot.
- OAI-SearchBot: OpenAI's user-agent for ChatGPT Search retrieval (distinct from GPTBot, which is for training). Allowing this is required for inclusion in ChatGPT Search results.
- PerplexityBot: Perplexity's crawler. Allows real-time retrieval for Perplexity answers. Blocking reduces eligibility for Perplexity citation.