A dispute has recently flared up between Cloudflare, one of the largest network infrastructure providers, and Perplexity, the AI search startup.
At its core is growing unrest over how AI companies collect and use online data, what an open web means, and whether these companies could significantly shift the landscape of web standards.
What Are Cloudflare's Assertions?
Cloudflare, which operates one of the world's largest CDNs, recently observed what it describes as stealth crawling by Perplexity and de-listed Perplexity's verified web crawler bot. The company claims that Perplexity has been covertly collecting data from websites that explicitly prohibit its bots, violating the directives in those sites' robots.txt files.
The robots.txt file tells crawlers which parts of a website they may access and which they may not. It implements the Robots Exclusion Protocol, which the IETF formalized as an internet standard (RFC 9309) in 2022.
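To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler checks robots.txt before fetching a page, using Python's standard library. The robots.txt content, the bot name "ExampleBot", and the URLs are all hypothetical.

```python
# Minimal sketch: checking whether a crawler may fetch a URL under
# the Robots Exclusion Protocol, using Python's standard library.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one bot is barred from /private/,
# everyone else may crawl everything.
robots_txt = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler consults the parser before each request.
print(parser.can_fetch("ExampleBot", "https://example.com/private/page"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/public/page"))   # True
```

Note that nothing in the protocol enforces this check; compliance is entirely voluntary on the crawler's side, which is exactly why the dispute below exists.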
Cloudflare launched a "pay-per-crawl" service in June that allows sites to charge AI companies for crawling their content.
The company's CEO, Matthew Prince, said, "Cloudflare is giving content creators and publishers more control over how their content is accessed, despite the fact that some mischaracterize user-driven AI assistants as malicious bots."
He further stated that unregulated AI crawling poses an "existential threat" to content creators, noting that 2.5 million sites have chosen to block AI training since July.
In this scenario, Cloudflare states that when Perplexity's declared user agents (PerplexityBot and Perplexity-User) are blocked or hit WAF rules, the service switches to undeclared agents and rotates IP addresses to evade detection and gain access.
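The allegation matters because robots.txt blocking only works against a crawler that identifies itself truthfully. A site wanting to opt out of Perplexity's declared crawlers would publish directives like the following (a hypothetical example, using the bot names from Cloudflare's account above):

```text
# Block Perplexity's declared crawlers site-wide
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
```

If a crawler then presents a generic browser user agent from a rotating pool of IP addresses, these directives no longer match it, which is why network-level defenses such as WAF rules enter the picture at all.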
How Has Perplexity Responded?
Notably, Perplexity has not directly denied using these techniques; rather, it argues the issue stems from a misunderstanding of how AI assistants operate compared to traditional web crawlers.
Perplexity maintains that traditional search engines crawl hundreds of millions of pages to build static indexes, regardless of whether any user has requested them. By contrast, Perplexity's "user-driven" agents fetch content only in response to a specific user request.
Perplexity explained that, unlike traditional web crawlers that systematically index millions of pages, it only pulls and summarizes the content needed to answer a specific query. When Cloudflare stated that Perplexity ignores robots.txt, Pubcon founder Brett Tabke noted that this behavior had already been observable in server logs, and that robots.txt had never been a significant roadblock.
They contend that because this differs significantly from indiscriminate crawling, it should not be subject to the same robots.txt requirements.
The company adds that these requests are transient, made on behalf of an individual to retrieve information, and not used to train AI models. Perplexity also believes Cloudflare is misattributing traffic from BrowserBase, a cloud-based browser service, to Perplexity.
The Industry Context
A conflict is emerging as AI chatbots replace traditional search engines for retrieving information. AI is everywhere, and companies are ramping up to stay ahead of the competition. Google, for instance, has already rolled out AI Overviews, which provide summaries before displaying links to websites.
Although users get a quick answer to their query, this ultimately reduces traffic to publishers. The Cloudflare-Perplexity dispute thus raises a larger, unresolved question: as AI services seek fresh, high-quality data, how should they respect publishers' rights to control and monetize their work?
The Traditional Web Ecosystem Falls Behind
The traditional web ecosystem, in place for decades, sent users to publishers' sites, where creators could earn money through ads or subscriptions. AI-powered answer engines disrupt this model: by offering direct summaries, they keep users from visiting the primary source.
"This is why Cloudflare's bot blockers are concerning," said Tabke, pointing out that publishers relying on Google Search cannot opt out of having their content used for Google's AI training or summaries without losing visibility in search results.
Ultimately, as AI chatbots become the standard way people look for information, they raise many questions despite the benefits they bring.