AI Crawlers and Your Therapy Website: The robots.txt Guide for 2026
GPTBot, ClaudeBot, OAI-SearchBot — six AI companies are crawling your therapy website right now. Here's which bots to block, which to allow, and exactly what your robots.txt should say to maximize AI discoverability without giving away your clinical content.
Who Is Currently Crawling Your Therapy Website?
Right now, while you're reading this, AI companies are crawling your therapy website. GPTBot from OpenAI. ClaudeBot from Anthropic. PerplexityBot from Perplexity. GoogleOther from Google. Applebot-Extended from Apple. Each is reading your pages, indexing your content, and using it for purposes your robots.txt file — written in a pre-AI era — was never designed to manage.
OpenAI alone tripled its web crawl activity in 2026 following the release of GPT-5. Its search crawler (OAI-SearchBot) now reaches 55% of the indexed web — a figure derived from an analysis of 66 billion bot requests (ALM Corp, 2026). Your therapy website is almost certainly in that crawl.
The problem most therapists face isn't that their website is being crawled — it's that their robots.txt file either doesn't address AI crawlers at all (leaving access ambiguous), or was set up once at site launch and never revisited. AI crawlers in their current form didn't exist before 2023. Any robots.txt written before that is operating on assumptions that no longer hold.
The good news: robots.txt gives you precise, granular control over who can access your site and what they can do with it. You can block AI training while fully welcoming AI search — and for therapists, that distinction matters considerably.
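One way to see which of these crawlers actually reach your site is to count their user-agent tokens in your server's access log. The sketch below is a minimal Python example, assuming each log line contains the visitor's user-agent string somewhere in its text. The `count_ai_crawlers` helper and the sample log lines are hypothetical, and the token list is only a subset of the bots named in this guide.

```python
from collections import Counter

# A subset of the AI crawler tokens named in this guide; extend as needed.
AI_CRAWLER_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "GoogleOther",
]

def count_ai_crawlers(log_lines):
    """Count log lines per AI crawler token (case-insensitive substring match)."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break  # attribute each request to a single crawler
    return counts

# Hypothetical access-log lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Mar/2026:10:00:00 +0000] "GET /about HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.9 - - [01/Mar/2026:10:05:00 +0000] "GET /faq HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.5 - - [01/Mar/2026:10:06:00 +0000] "GET /services/emdr HTTP/1.1" 200 6200 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '198.51.100.7 - - [01/Mar/2026:10:07:00 +0000] "GET / HTTP/1.1" 200 7000 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    for bot, hits in count_ai_crawlers(SAMPLE_LOG).most_common():
        print(f"{bot}: {hits}")
```

User-agent strings can be spoofed, so treat these counts as a rough indication rather than proof of who visited.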
The Critical Distinction: Training Bots vs. Search Bots
Every major AI company operates at least two different types of web crawlers, with completely different purposes — and you can control them independently. Most therapist SEO guides treat all AI bots as one category. They're not.
Training crawlers gather content to build and improve AI models. When ClaudeBot visits your site, it's collecting your clinical descriptions, FAQ answers, blog posts, and approach explanations to feed into Anthropic's training dataset. When GPTBot visits, it does the same for OpenAI. Blocking these bots means your content is not used to train AI models — but it has no effect on whether AI search engines recommend you to prospective clients.
Search and retrieval crawlers index your content in real time to answer user questions. When someone asks ChatGPT "find me an anxiety therapist in Portland," a separate OpenAI crawler — OAI-SearchBot — indexes local websites to find relevant results. When a user asks Claude a question that requires current web information, Claude-User fetches it in real time. These are the crawlers that determine whether you appear in AI answers.
These crawler types use different user-agent strings and are controlled independently in robots.txt. Blocking GPTBot does not block OAI-SearchBot. Blocking ClaudeBot does not block Claude-SearchBot. OpenAI's documentation confirms: "each setting is independent of the others — a webmaster can allow OAI-SearchBot in order to appear in search results while disallowing GPTBot" (OpenAI, 2026).
The strategic framework for therapists is straightforward: block training bots, allow search bots.
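You can check this independence locally before touching a live site. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. The two-rule file, the `is_allowed` helper, and the `yourpractice.com` URL are illustrative assumptions, and real crawlers may handle edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Illustrative two-rule file: block the training bot, allow the search bot.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

def is_allowed(robots_txt, user_agent,
               url="https://yourpractice.com/services/anxiety"):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot"))                # False
    print("OAI-SearchBot allowed:", is_allowed(ROBOTS_TXT, "OAI-SearchBot"))  # True
```

Each `User-agent` group is evaluated separately, which is why the training rule leaves the search rule untouched.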
Should Therapists Block AI Training Crawlers?
Yes — and the reasons are specific to the nature of therapy practice.
Client privacy and content sensitivity. Your website contains clinical descriptions, therapeutic approach explanations, sample intake language, and sometimes client testimonials. This content is sensitive in the healthcare context in a way that a retail website's content is not. Allowing AI companies to use it as training data means your therapeutic methodology gets absorbed into a commercial model, client-adjacent language appears in a training corpus without explicit consent, and your proprietary clinical approach is replicated at scale.
HIPAA boundary considerations. Your website doesn't contain PHI directly — but it describes your modalities, client populations, and clinical focus in ways that, combined with other data sources, can create ambiguity around de-identification. Blocking training crawlers is the conservative choice that aligns with your existing obligations as a licensed mental health professional.
Intellectual property. Your service descriptions, FAQ sections, blog posts, and clinical explanations reflect years of expertise. Training data is commercially valuable to AI companies. Opting out of their training pipelines is your right under robots.txt protocol — and it costs you nothing in terms of AI search visibility.
The critical clarification: blocking training bots does not prevent ChatGPT, Claude, or Perplexity from recommending you. The bots that feed AI search answers (OAI-SearchBot, Claude-SearchBot, PerplexityBot) are separate from the training bots (GPTBot, ClaudeBot). You can block one while actively welcoming the other.
The Complete AI Crawler Map for Therapist Websites
Here is every AI crawler that matters for therapy websites in 2026, its purpose, and the recommended configuration:
| Bot / User-Agent | Company | Purpose | Recommendation | Why It Matters |
|---|---|---|---|---|
| GPTBot | OpenAI | Training data for GPT models | Block | Feeds OpenAI model training; does not affect ChatGPT search results |
| OAI-SearchBot | OpenAI | ChatGPT Search / SearchGPT indexing | Allow | Blocking this makes you invisible to ChatGPT search results |
| ChatGPT-User | OpenAI | Real-time fetching for ChatGPT user queries | Allow | Used when ChatGPT users ask questions requiring live web access |
| ClaudeBot | Anthropic | Training data for Claude models | Block | Feeds Anthropic model training; does not affect Claude search citations |
| Claude-SearchBot | Anthropic | Claude search indexing | Allow | Blocking this makes you invisible to Claude when users ask for therapist recommendations |
| Claude-User | Anthropic | Real-time fetching for Claude user queries | Allow | Used when Claude users ask questions requiring live web access |
| PerplexityBot | Perplexity | Search indexing and user query retrieval | Allow | Primary crawler for Perplexity citations; 780M+ monthly queries (Perplexity, 2026) |
| GoogleOther | Google | AI features (AI Overviews, Gemini) | Allow | Required for Google AI Overview citations; separate from standard Googlebot |
| Applebot-Extended | Apple | Apple Intelligence search indexing | Allow | Powers Apple Intelligence recommendations on iPhone and Mac |
| Bingbot | Microsoft | Bing index (feeds Microsoft Copilot) | Allow | ChatGPT Search uses Bing's index, so Bingbot is a prerequisite for ChatGPT Search visibility |
Note on Google-Extended: this user-agent token controls whether your content is used to train Google's AI models (Gemini, formerly Bard). Like GPTBot and ClaudeBot, it can be blocked without affecting your appearance in Google Search or AI Overviews. To opt out of Google's AI training dataset as well, add User-agent: Google-Extended / Disallow: /, as the recommended file in the next section does.
What Your robots.txt Should Look Like
Check your current robots.txt by typing your website URL plus /robots.txt in your browser (e.g., yourpractice.com/robots.txt). If you get a 404, your site has no robots.txt — all crawlers have access by default, including training bots.
Here is the complete recommended robots.txt for a therapist website:
# Block AI training crawlers — protect clinical content and client privacy
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
# Allow AI search and retrieval crawlers — appear in ChatGPT, Claude, Perplexity, Gemini
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: Claude-SearchBot
Allow: /
User-agent: Claude-User
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: GoogleOther
Allow: /
User-agent: Applebot-Extended
Allow: /
# Standard search engines (required for SEO)
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# Default: allow all other bots not specifically listed above
User-agent: *
Allow: /
Sitemap: https://yourpractice.com/sitemap.xml
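If you maintain more than one site, or expect the bot list to change, it can be easier to generate the file from two lists than to edit it by hand. The sketch below is one possible approach in Python. The `build_robots_txt` helper is hypothetical, the bot lists mirror the configuration above, and the sitemap URL is a placeholder to replace with your own.

```python
# Bot lists mirror the recommended configuration in this guide.
BLOCKED = ["GPTBot", "ClaudeBot", "Google-Extended"]
ALLOWED = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "GoogleOther", "Applebot-Extended",
    "Googlebot", "Bingbot",
]

def build_robots_txt(blocked, allowed, sitemap_url):
    """Build robots.txt text: one group per bot, then a default allow-all group."""
    lines = []
    for bot in blocked:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    for bot in allowed:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    # Default group for every crawler not listed explicitly.
    lines += ["User-agent: *", "Allow: /", ""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Placeholder sitemap URL; replace with your own.
    print(build_robots_txt(BLOCKED, ALLOWED,
                           "https://yourpractice.com/sitemap.xml"))
```

Keeping the policy in two lists means adding a new crawler is a one-line change, and the generated text can be pasted wherever your platform accepts a robots.txt.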
On WordPress: Your robots.txt is managed through the Yoast SEO plugin (SEO → Tools → File Editor), a comparable SEO plugin, or a physical file at your domain root. Any of these can be edited directly: paste the configuration above, replacing the sitemap URL with yours. Note that Settings → Reading only offers a "Discourage search engines from indexing this site" checkbox; it does not let you set rules for individual bots.
On managed platforms (Squarespace, Wix, TherapySites, Brighter Vision): robots.txt access varies by platform. Some allow custom files; others control it centrally. Contact your platform support to request AI crawler configuration — or compare platforms that handle this automatically.
Why Most Therapist robots.txt Files Are Wrong Right Now
The most common problems we see when auditing therapist robots.txt files:
No file at all. Without a robots.txt, all crawlers — including training bots — have unrestricted access. This is the current default for many therapist websites built before AI crawlers existed. It means GPTBot and ClaudeBot are collecting your clinical content without restriction.
A blanket block from a security plugin. Some WordPress security plugins and older therapist site templates include a legacy line: User-agent: * / Disallow: /. This blocks everything — including OAI-SearchBot, PerplexityBot, and GoogleOther. Therapists using this configuration are invisible to AI search despite having a public website.
An outdated AI allowlist. Earlier guidance on AI discoverability — including a section in our own How ChatGPT Recommends Therapists post — recommended explicitly allowing ClaudeBot and GPTBot. That guidance predates the training vs. search bot distinction becoming widely understood. The updated recommendation: block training bots (GPTBot, ClaudeBot), allow search bots (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User). Both goals are now achievable simultaneously.
Set-and-forgotten configuration. AI crawlers in their current form didn't exist before 2023. Any robots.txt written before that date was written without these bots in mind. The field is moving fast — Anthropic formalized its three-bot framework in 2025; OpenAI's search crawler reached 55% web coverage by early 2026. A robots.txt audit should now be part of every therapist website's annual maintenance.
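The first three problems above can be detected automatically. The following is a rough diagnostic sketch in Python using the standard-library `urllib.robotparser`. The `diagnose` helper, its message wording, and the `yourpractice.com` URL are assumptions for illustration. Passing `None` stands for a site that returns a 404 for robots.txt.

```python
from urllib.robotparser import RobotFileParser

SEARCH_BOTS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]
TRAINING_BOTS = ["GPTBot", "ClaudeBot"]

def diagnose(robots_txt, url="https://yourpractice.com/"):
    """Return a list of problems found; robots_txt=None means no file exists."""
    if robots_txt is None:
        return ["no robots.txt: every crawler, including training bots, has access"]

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    problems = []
    blocked_search = [b for b in SEARCH_BOTS if not parser.can_fetch(b, url)]
    open_training = [b for b in TRAINING_BOTS if parser.can_fetch(b, url)]
    if blocked_search:
        problems.append("search bots blocked: " + ", ".join(blocked_search))
    if open_training:
        problems.append("training bots allowed: " + ", ".join(open_training))
    return problems

if __name__ == "__main__":
    # A legacy blanket block hides the site from AI search crawlers too.
    print(diagnose("User-agent: *\nDisallow: /"))
```

An empty list means the file matches the block-training, allow-search policy for the bots checked here. It says nothing about crawlers missing from the two lists.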
What Is llms.txt? The New File That Tells AI What Your Site Is
Beyond robots.txt (which controls who can visit), an emerging standard gives AI systems a curated map of what to read: llms.txt.
Proposed in 2024 and seeing rapid adoption by 2026, llms.txt is a plain text Markdown file you place at yourpractice.com/llms.txt. Unlike robots.txt — which governs access — llms.txt guides AI toward your most important content. Think of it as a handcrafted index: your practice name, a one-paragraph summary, and a curated list of key pages with one-line descriptions of each.
Here is what a therapist's llms.txt might look like:
# Sunridge Therapy — Sarah Miller, LPC
> Licensed Professional Counselor in Portland, OR. Specializing in anxiety,
> trauma (EMDR-certified), and couples therapy. Accepting new clients. In-person
> and telehealth available. Aetna, BCBS, and OHP accepted.
## Services
- [Anxiety Therapy](/services/anxiety): CBT and mindfulness-based approaches for generalized anxiety and panic disorder.
- [EMDR Therapy](/services/emdr): Trauma processing using Eye Movement Desensitization and Reprocessing.
- [Couples Therapy](/services/couples): Gottman Method-informed counseling for communication and conflict.
## About & Credentials
- [About Sarah Miller, LPC](/about): Credentials, clinical philosophy, insurance, and how to get started.
## Resources
- [FAQ](/faq): Common questions about therapy process, fees, insurance, and what to expect in the first session.
When an AI system visits your site, having llms.txt means it can understand your entire practice from a single curated file — rather than crawling each page and assembling a picture from scratch. For Perplexity specifically, which reads llms.txt files when available, this improves citation accuracy: the AI gets your actual specialty description, not a misinterpretation inferred from a page that wasn't written for AI extraction.
Creating a basic llms.txt takes about 20 minutes. Host it at your domain root alongside robots.txt. Keep it under 2,000 words — the goal is curation, not comprehensiveness. Include your top 10-15 pages: services, About, FAQ, and your strongest blog posts.
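If your page list already lives in a spreadsheet or CMS export, the file can be assembled programmatically. Below is a minimal Python sketch that produces the same layout as the example above: a title, a block-quoted summary, and sections of linked pages. The `build_llms_txt` and `word_count` helpers are hypothetical, and the sample data is abbreviated from the Sunridge Therapy example.

```python
def build_llms_txt(title, summary, sections):
    """Assemble llms.txt text.

    sections maps a heading to a list of (link_text, path, description) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        for link_text, path, description in pages:
            lines.append(f"- [{link_text}]({path}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

def word_count(text):
    """Whitespace-delimited word count, for checking the 2,000-word guideline."""
    return len(text.split())

# Abbreviated sample data based on the example above.
EXAMPLE = build_llms_txt(
    "Sunridge Therapy",
    "Licensed Professional Counselor in Portland, OR.",
    {
        "Services": [
            ("Anxiety Therapy", "/services/anxiety",
             "CBT and mindfulness-based approaches."),
        ],
        "Resources": [
            ("FAQ", "/faq", "Common questions about fees and insurance."),
        ],
    },
)

if __name__ == "__main__":
    print(EXAMPLE)
    print("words:", word_count(EXAMPLE))
```

Generating the file from data keeps it in step with your site: when a service page is added or renamed, you update one entry and regenerate.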
How Often Do AI Crawlers Visit? (And Why That Changes Everything)
Most therapists assume AI crawlers work like Google — visiting frequently, re-indexing changes within days. The reality is dramatically different:
| Crawler | Frequency Relative to Google | Practical Implication |
|---|---|---|
| Googlebot | Baseline (frequent — days to weeks) | Changes propagate quickly; iterative improvement works |
| OAI-SearchBot (ChatGPT Search) | ~1,500× less frequent than Google | First impression may persist for months; get it right before it crawls |
| PerplexityBot | Similar to OAI-SearchBot | Content structure and freshness matter on the first pass |
| ClaudeBot | ~60,000× less frequent than Google | Changes take a very long time to propagate into model training (less relevant if you block it) |
Even as OpenAI tripled its crawl activity in 2026, AI search crawlers still visit most individual websites far less often than Googlebot. This has a direct implication for how you think about your robots.txt and content: when an AI search crawler visits, every page needs to already be optimized.
You don't get Google's iterative improvement cycle — publish a page, check rankings, refine it, see improvement within a week. With AI crawlers, the configuration that's in place when they visit is the configuration they work from until their next visit, which may be months away. Correct robots.txt, working schema markup, question-format headings, and FAQ sections all need to be in place before the crawler arrives.
Content freshness also matters. Perplexity prioritizes content published within the last 6-18 months for time-sensitive queries (Discovered Labs, 2026). Publishing regular blog posts on topics like this one signals to AI crawlers that your site is active, increasing both crawl frequency and citation likelihood.
Getting All of This Right Without a Technical Background
Robots.txt editing and llms.txt creation are both achievable without technical expertise — but they require knowing what you're changing. A misconfigured robots.txt can make your site invisible to every search engine simultaneously. Before editing, always:
- Back up your current robots.txt. Copy the existing content to a text file before making changes.
- Test after saving. Google has replaced its older robots.txt Tester with the robots.txt report in Search Console; use that report to confirm Google can fetch and parse your file, then check that Googlebot and Bingbot are allowed and that GPTBot is blocked.
- Verify with a live URL test. After 24 hours, navigate to yourpractice.com/robots.txt in an incognito browser window to confirm the updated file is live.
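The live check in the last step can also be scripted. The sketch below is a rough Python example using only the standard library. It fetches the published file and compares its directives with the copy you saved, ignoring comments and blank lines. The helper names are hypothetical, `yourpractice.com` is a placeholder domain, and some hosts reject scripted requests, in which case the browser check remains the fallback.

```python
import urllib.error
import urllib.request

def robots_url(domain):
    """Build the robots.txt URL for a bare domain such as yourpractice.com."""
    return f"https://{domain}/robots.txt"

def fetch_robots(domain, timeout=10):
    """Return the live robots.txt text, or None if the server answers 404."""
    try:
        with urllib.request.urlopen(robots_url(domain), timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise

def directives(text):
    """Strip comments and blank lines so only the rules are compared."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            rules.append(line)
    return rules

def is_live(saved_text, live_text):
    """True if the live file exists and carries the same rules as the saved copy."""
    return live_text is not None and directives(saved_text) == directives(live_text)

if __name__ == "__main__":
    saved = "User-agent: GPTBot\nDisallow: /\n"
    # Placeholder domain; replace with your own before running.
    print(is_live(saved, fetch_robots("yourpractice.com")))
```

Comparing directives rather than raw text avoids false alarms when a platform rewrites whitespace or strips comments on save.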
For therapists on WordPress using Yoast SEO: navigate to SEO → Tools → File Editor → robots.txt. Paste the configuration from "What Your robots.txt Should Look Like" above, updating your sitemap URL. Save. No FTP or hosting panel access required.
For therapists on managed platforms where robots.txt is not editable: this is a meaningful limitation. AI crawler control is one of several reasons therapists are moving away from closed platforms toward purpose-built sites that handle these configurations automatically. See how WebsiteTherapy compares to TherapySites on technical AI discoverability features.
The robots.txt and llms.txt changes described here are two of roughly 17 signals that determine whether AI engines recommend your practice. For the full picture — schema markup, GBP optimization, Bing indexing, Foursquare, review signals — see The Therapist's Complete Guide to AI Discoverability. For how Google AI Overviews specifically use your site content, see Google AI Overviews for Therapists: How to Get Featured in 2026.
At WebsiteTherapy, every practice site ships with robots.txt pre-configured to block training bots and allow search bots, an auto-generated llms.txt file mapping your services and key pages, and ongoing updates when new AI crawler standards emerge. See how it works or explore pricing.
Sources: ALM Corp, "OpenAI Search Crawler Reaches 55% Web Coverage" (2026); OpenAI Developer Documentation, "Overview of OpenAI Crawlers" (2026); ALM Corp, "Anthropic's Claude Bots and robots.txt Strategy" (2026); Search Engine Journal, "Anthropic's Claude Bots Make Robots.txt Decisions More Granular" (2025); Discovered Labs, "Perplexity Optimization: How to Get Cited" (2026); Botify, "OpenAI Has Tripled Their Crawl of the Web" (2026); Better-Robots.com, "ChatGPT-User vs GPTBot vs OAI-SearchBot: Which OpenAI Control Does What" (2026); llms-txt.org specification (2024).