Effect of AI bot blocking on AI overviews/search
I have a client with a small marketing site on Workers w/Static Assets and a custom domain that is now run through Cloudflare.
He wants to know if Cloudflare's AI bot detection will prevent or degrade his site's ability to appear in all of the different types of AI overviews and searches that are becoming common online now. Understandably, his priority is to be featured as widely as possible.
I've been looking at the Cloudflare documentation and also on Google, but haven't found anyone squarely addressing this.
I'm interested in the following:
- If you block a site from training, does the AI know the site less well and, therefore, feature it less often?
- Can someone explain what effect Cloudflare's AI bot blocking features (for training or otherwise) have on consumers who are using AI to search for information?
- Should all of these features be turned off if maximum exposure is the goal? (And would that increase costs?)
At present, I believe the client has the following features turned on:
- Cloudflare is managing the robots.txt file by adding AI-specific provisions to it
- A Cloudflare-managed rule is active via a toggle to "Block AI Bots", apparently from training on site data
- Cloudflare has said publicly that it's blocking Perplexity, although I see in AI Crawl Control that there's a toggle to allow Perplexity. As a result, I'm not really sure what the default position is on this one now. Have I missed an update or further explanation?
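For reference, one rough way to see what a robots.txt file tells each crawler is to parse it with Python's standard `urllib.robotparser`. The file contents below are hypothetical, not the actual Cloudflare-managed output, and the agent names are just common crawler tokens chosen for illustration. Note that robots.txt is only a request to crawlers; the "Block AI Bots" rule is a separate enforcement layer and is not reflected here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the real managed file will differ.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def crawl_permissions(robots_txt, agents, url="https://example.com/"):
    """Return {agent: True/False} for whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


permissions = crawl_permissions(
    SAMPLE_ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"],
)
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

To check a live site instead, the text of its own `/robots.txt` could be passed in place of the sample string.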
Any and all thoughts and/or pointers to docs, etc., are appreciated.
2 Replies
1- Yes, Cloudflare's AI bot detection can degrade the site's visibility to AI crawlers.
2- Yes, but it depends. If the AI company explicitly trained its models on your website (generally with a person doing RLHF), the site can show up in responses, since the LLM already has some knowledge of your data/site. But in cases where the LLM wants updated info about your site, or has no previous knowledge of it, it will probably try to access your protected website. If it’s blocked, that will certainly impact how the site is featured (though this also depends, since LLMs can have some knowledge of your site from third-party mentions).
3- It will probably return 403 Forbidden to AI crawlers, or return 200 with a bot challenge, depending on your configuration.
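A minimal sketch of how you could check which of those outcomes a given crawler gets. It assumes that Cloudflare marks challenge responses with a `cf-mitigated: challenge` header (worth verifying against the current docs), and the user-agent strings are shortened, illustrative stand-ins for what real crawlers send. Since Cloudflare may also identify bots by IP address, sending a crawler's user agent from your own machine is only a rough probe, not proof of what the real crawler sees.

```python
import urllib.error
import urllib.request

# Illustrative user-agent strings; real crawlers send longer ones.
AI_USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}


def classify(status, headers):
    """Label a response as 'challenged', 'blocked', 'allowed', or 'other'."""
    lowered = {key.lower(): value.lower() for key, value in headers.items()}
    # Assumption: challenge pages carry this header whatever the status code.
    if lowered.get("cf-mitigated") == "challenge":
        return "challenged"
    if status in (401, 403):
        return "blocked"
    if 200 <= status < 300:
        return "allowed"
    return "other"


def probe(url, user_agent):
    """Fetch url with the given User-Agent and classify the response."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return classify(response.status, dict(response.headers.items()))
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the status and headers are on the error.
        return classify(error.code, dict(error.headers.items()))


# Example usage (makes network requests, so it is left commented out):
# for name, user_agent in AI_USER_AGENTS.items():
#     print(name, probe("https://example.com/", user_agent))
```

Running the probe before and after flipping the "Block AI Bots" toggle should show whether that setting changes the result for each user agent.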
4.1 (pricing) - If you are fetching inside a Worker (only Pages are free), yes, it counts and will be charged to you (ref: https://developers.cloudflare.com/workers/platform/pricing/), but the limits are big.
4.2 (should you turn on/off?) - It depends on your intention. If you want to allow your website to be freely scraped by AI bots, yes. If you are afraid of bots that can harm your site, you can customize the bot protection (e.g. only allow AI bots and social media bots, and block any others). In my personal opinion, yes: if your compliance requirements allow it, create a custom rule to allow all AI scrapers to access your website. That can be quite intensive, but it will hardly (almost impossible) surpass your Workers plan limit (100k requests per day).
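To put the 100k figure in perspective, here is a back-of-the-envelope sketch. The limit is the number quoted above; the visitor and crawler counts are made-up examples for a small marketing site, not measurements, so substitute your own analytics numbers.

```python
DAILY_REQUEST_LIMIT = 100_000  # plan limit quoted in the reply above


def daily_headroom(visitor_requests, crawler_requests, limit=DAILY_REQUEST_LIMIT):
    """Requests left in the daily budget after visitors and crawlers."""
    return limit - (visitor_requests + crawler_requests)


def share_of_limit(requests, limit=DAILY_REQUEST_LIMIT):
    """Fraction of the daily budget consumed by the given request count."""
    return requests / limit


# Hypothetical traffic: 2,000 visitor requests and 5,000 crawler requests a day.
visitors, crawlers = 2_000, 5_000
print(f"Headroom: {daily_headroom(visitors, crawlers):,} requests")
print(f"Crawler share of limit: {share_of_limit(crawlers):.1%}")
```

With these example numbers, crawlers would use 5% of the daily budget and leave 93,000 requests spare, so crawler traffic would have to grow many times over before the limit became a concern.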
Thanks. Pretty good answer. I do wish Cloudflare addressed these issues directly. It's not just an esoteric concern, and most companies are small.