Amazon investigating Perplexity AI for alleged unauthorized website scraping

Amazon Web Services is investigating allegations that Perplexity AI may be violating its rules by using a crawler hosted on its servers that ignores the Robots Exclusion Protocol, according to a report from Wired. The protocol is a web standard that website operators use, through a robots.txt file, to tell bots which pages they may or may not access. The investigation was prompted by an earlier Wired article, which found a virtual machine hosted on an AWS server with an IP address linked to Perplexity AI. The machine was found to be ignoring robots.txt instructions and scraping content from reputable sites such as The Guardian, Forbes, and The New York Times.
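To make the protocol concrete, here is a minimal sketch of how a compliant crawler could check robots.txt rules before fetching a page, using Python's standard `urllib.robotparser` module. The bot name `ExampleBot`, the `example.com` URLs, the rules themselves, and the helper `is_allowed` are all hypothetical and chosen for illustration; nothing here reflects how Perplexity's or any other company's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the bot name and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot is barred from /private/ but may fetch other paths.
    print(is_allowed("ExampleBot", "https://example.com/private/report.html"))
    print(is_allowed("ExampleBot", "https://example.com/news/story.html"))
    # Other bots fall under the wildcard group, whose empty Disallow permits everything.
    print(is_allowed("OtherBot", "https://example.com/private/report.html"))
```

A crawler that honors the protocol would run a check like this before each request. The protocol is advisory: the check happens on the crawler's side, so a bot that skips it can still reach the page unless the site blocks it by other means.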

Perplexity spokesperson Sara Platnick said that the company’s PerplexityBot honors robots.txt instructions, but the bot will ignore the protocol when a user includes a specific URL in a query. Perplexity CEO Aravind Srinivas also acknowledged that the company uses third-party web crawlers in addition to its own, and that the crawler Wired identified was one of them.

Reuters has reported that Perplexity is not the only AI company bypassing robots.txt files, but Amazon’s investigation focuses specifically on Perplexity AI. An Amazon spokesperson said that AWS customers must comply with robots.txt instructions when crawling websites and that AWS’s terms of service prohibit illegal activity. Perplexity has responded to Amazon’s inquiries and denied any violation of the Robots Exclusion Protocol.

The outcome of Amazon’s investigation remains to be seen, but the issue highlights the importance of respecting web standards such as robots.txt when collecting content for AI products. As the use of AI continues to grow, companies must ensure that their practices align with industry standards and regulations to avoid potential legal issues.

Article Source
https://www.engadget.com/amazon-investigating-perplexity-ai-after-accusations-it-scrapes-websites-without-consent-133003374.html