Perplexity AI Accused of Bypassing Anti-Scraping Measures to Access Restricted Content

Perplexity Allegedly Used Disguised Crawlers to Bypass Robots.txt Across Millions of Requests

Written By : Bhavesh Maurya

Reviewed By : Shovan Roy

Published: 5th Aug, 2025 at 1:05 PM

Updated: 5th Aug, 2025 at 1:05 PM

AI startup Perplexity has come under fire again, this time for allegedly scraping content from websites that had specifically blocked its crawlers. The criticism comes from Cloudflare, a large internet infrastructure company, which claims the startup used evasive methods to get around the anti-scraping measures that website administrators had put in place.

Cloudflare published findings earlier this week alleging that Perplexity’s bots accessed restricted websites in violation of the widely accepted robots.txt standard. A robots.txt file is how a website tells automated bots which areas they are allowed to access.
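For readers unfamiliar with robots.txt, the following is a minimal sketch of how a compliant crawler would read such a file, using Python's standard urllib.robotparser module. The rules, the crawler name ExampleBot, and the example.com URLs are hypothetical and are not taken from Cloudflare's report. The sketch also shows why the standard relies on honest self-identification: rules are matched against the user-agent name a crawler declares.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler everywhere,
# and block everyone else only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://example.com/articles/1"
    # The crawler that declares its real (hypothetical) name is blocked.
    print(is_allowed("ExampleBot", article))
    # The same request under a browser-style user agent only matches
    # the wildcard rules, so the block on ExampleBot does not apply.
    print(is_allowed("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", article))
```

Because robots.txt is advisory, nothing in the file itself stops a crawler that ignores it or that presents a different user agent; enforcement has to happen elsewhere, for example at the network level.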

Cloudflare reports that Perplexity's crawlers disregarded robots.txt directives and tried to conceal their identity through sophisticated methods.

Cloudflare's Findings: Sophisticated Evasion Tactics

Cloudflare said it observed Perplexity's crawlers disguising themselves in several ways. They rotated user-agent strings, which identify the type of browser and device being used to access a site, so that the crawlers appeared to be ordinary browsers such as Google Chrome on macOS. They also switched between IP addresses belonging to different Autonomous System Numbers (ASNs).

Legal experts believe that ignoring robots.txt directives could escalate the broader web-scraping controversy around AI-generated content. Critics argue that Perplexity needs to adopt more transparent and ethical data-sourcing practices. The scope of the operation was sizeable: Cloudflare's analysis found the same methods in use across tens of thousands of websites, resulting in millions of automated requests.

The allegations have reignited debates about AI training methods and data ownership. Cloudflare said the activity was detected through a combination of machine learning analysis and network monitoring, triggered by reports from customers whose sites had been accessed despite having blocked Perplexity’s bots.

Perplexity Denies Allegations

Responding to the claims, Perplexity’s spokesperson Jesse Dwyer dismissed the report as 'a sales pitch,' asserting that the bots named in Cloudflare’s findings did not belong to the company. Dwyer also claimed that the screenshots in Cloudflare's report did not show instances of actual data scraping.

Cloudflare responded to Dwyer, reiterating that controlled testing connected the activity to Perplexity's systems. In response to its findings, Cloudflare removed Perplexity from its approved crawler list and implemented new protections to block this kind of access to publishers' sites.

Broader Implications for the AI Industry

This event adds to the list of controversies involving AI startups and content scraping. Perplexity also drew backlash in 2023 for allegedly plagiarizing journalistic content and using subscription-based materials without permission.

Cloudflare's decision reflects a bigger change in how internet companies are addressing the data collection practices of AI systems. Cloudflare recently launched a marketplace that enables website owners to charge AI companies for using their data, and it continues to offer free tools to block AI training bots.

As generative AI tools continue to rely on huge datasets of content scraped from the internet, questions of digital ownership, consent, and responsible AI development are drawing growing attention.
