Perplexity AI crawlers accused of stealth data scraping

"Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website's preferences."

"We see continued evidence that Perplexity is repeatedly modifying their user agent and changing their source ASNs to hide their crawling activity, as well as ignoring - or sometimes failing to even fetch - robots.txt files."

Perplexity, an AI search startup, is reportedly evading website restrictions by disguising its content-scraping bots. According to Cloudflare engineers, Perplexity changes its user agent and source ASNs (autonomous system numbers) to get around network blocks. It also ignores robots.txt files or sometimes never fetches them at all. Website owners have blocked its declared crawlers, yet Perplexity's bots reportedly keep operating from IP addresses outside the company's declared range. They impersonate common browsers such as Google Chrome and make millions of requests a day. The case highlights a growing challenge in enforcing web-crawling rules and raises concerns about data-scraping practices within the industry.

#perplexity #web-crawling #data-scraping #bots #robotstxt

Read at The Register