AI bot blocking builds momentum

More sites get on board the AI bot blocking train - a trend becoming more widespread, with larger sites going first.

by Rob Corbidge
Published: 14:51, 04 September 2023

Last updated: 14:54, 04 September 2023

Close to 20% of the world's top 1000 websites are now blocking AI training crawlers from their content, new research has shown.

Last week it was revealed that comparatively few of the most popular news sites were blocking AI crawlers such as OpenAI's GPTBot, yet the new data shows that momentum is building and that more sites across categories are implementing blocks.

The data, from AI detection company Originality.ai, shows that:

  • GPTBot blocking increased from 9.1% of sites on August 22 to 12% just one week later, on August 29
  • The Common Crawl Bot (CCBot) is blocked by 6.77% of sites
  • CCBot is blocked about half as often as GPTBot
  • No website in the Top 1000 is blocking Anthropic's AI crawler
  • 18.6% of the Top 1000 websites are blocking at least one AI crawler
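A quick calculation using only the figures reported above puts the one-week jump and the "about half" comparison in context: GPTBot blocking rose by 2.9 percentage points, roughly a third in relative terms, and CCBot's 6.77% works out to about 0.56 of GPTBot's 12%. The short Python sketch below is purely illustrative; the variable names are our own.

```python
# Figures as reported by Originality.ai for the Top 1000 websites (percent of sites).
gptbot_aug22 = 9.1
gptbot_aug29 = 12.0
ccbot = 6.77

# Absolute change in GPTBot blocking over the week, in percentage points.
point_change = gptbot_aug29 - gptbot_aug22

# Relative growth over the week, as a percentage of the August 22 figure.
relative_growth = point_change / gptbot_aug22 * 100

# How often CCBot is blocked compared with GPTBot (1.0 would mean equally often).
ccbot_to_gptbot = ccbot / gptbot_aug29

print(f"GPTBot: +{point_change:.1f} points ({relative_growth:.0f}% relative growth)")
print(f"CCBot is blocked {ccbot_to_gptbot:.2f}x as often as GPTBot")
```

Running it prints a 2.9-point rise (about 32% relative growth) and a CCBot-to-GPTBot ratio of 0.56, which is consistent with the "about half" description.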

Search engine crawling is generally seen as desirable by publishers: it makes their content discoverable and rewards those who take good care of their sitemaps. AI crawling, by contrast, is a new phenomenon that most publishers have not had to consider before.
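The article does not say how sites implement these blocks. The most common mechanism, and the one assumed here, is a `Disallow` rule in the site's robots.txt keyed on the crawler's user-agent token (`GPTBot` for OpenAI, `CCBot` for Common Crawl). The sketch below uses a hypothetical robots.txt, checked with Python's standard `urllib.robotparser`, to show how a publisher can keep welcoming search engine crawlers while turning AI training crawlers away. The rules, the example.com URL and the `build_parser` helper are illustrative assumptions, not any particular site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher: search crawlers are welcome,
# while OpenAI's GPTBot and Common Crawl's CCBot are disallowed site-wide.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string rather than fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "https://www.example.com/news/some-article"
    for agent in ("Googlebot", "GPTBot", "CCBot"):
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

Googlebot falls through to the catch-all `*` group and is allowed, while GPTBot and CCBot match their own groups and are blocked. Note that robots.txt is a voluntary convention: it only keeps out crawlers that choose to honour it.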

The industry viewpoint seems to be hardening against it. There is no real understanding yet of how AI systems will use or credit crawled content in the answers they generate, nor, more generally, of how heavily those systems rely on free, publisher-created content and data to train and improve themselves, and so how much they benefit financially from that content.

A new relationship is being formed between those who create content and those who wish to create new content on the back of it. It would seem obvious that the original creators should have the most control over their content, yet that is precisely what is being fought over.