Amazon Web Services is Investigating Perplexity AI Regarding Allegations of Web Scraping

Amazon Web Services is investigating Perplexity AI over possible violations of its rules. The investigation centers on allegations that the AI search startup has been crawling websites that explicitly prohibit such access through the Robots Exclusion Protocol, a web standard that lets site owners limit automated access.

Web scraping is the practice of using bots to extract content and data from a website. Once a scraper has pulled the underlying HTML and the data stored in the site's database, it can replicate the entire site's content elsewhere. Perplexity AI, which is backed by the Jeff Bezos family fund and Nvidia and valued at $3 billion, is under scrutiny for allegedly ignoring restrictions set through the Robots Exclusion Protocol.
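To make the extraction step more concrete, here is a minimal Python sketch using only the standard library's `html.parser`. It pulls visible text and link targets out of a small, made-up HTML page. The class name `TextAndLinkExtractor`, the `extract` helper, and the sample markup are hypothetical illustrations, not code from Perplexity or any real crawler.

```python
from html.parser import HTMLParser


class TextAndLinkExtractor(HTMLParser):
    """Hypothetical extractor: collects visible text fragments and <a href> targets."""

    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.links: list[str] = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            # Record the link target so a crawler could follow it later.
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not part of a script or stylesheet.
        if self._skip_depth == 0 and data.strip():
            self.text.append(data.strip())


def extract(html: str) -> tuple[list[str], list[str]]:
    """Parse an HTML string and return (text fragments, link targets)."""
    parser = TextAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.text, parser.links


# Made-up page used only for illustration.
sample = """
<html><head><title>Example story</title><script>var x = 1;</script></head>
<body><h1>Headline</h1><p>First paragraph.</p><a href="/next">Next page</a></body></html>
"""
text, links = extract(sample)
print(text)   # ['Example story', 'Headline', 'First paragraph.', 'Next page']
print(links)  # ['/next']
```

Repeating this over every link a page exposes is, in essence, how a scraper can copy a site's content wholesale.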

The Robots Exclusion Protocol is not legally binding, but many scrapers have historically respected it. Amazon Web Services requires customers to follow robots.txt directives when crawling websites. Its terms of service also prohibit customers from engaging in illegal activities and require them to comply with all applicable laws.
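A crawler that honors robots.txt checks each URL against the site's rules before fetching it. The sketch below shows one way to do that with Python's standard `urllib.robotparser` module. It assumes a hypothetical robots.txt that blocks a made-up crawler called `ExampleAIBot` entirely while allowing other bots everywhere except `/private/`; the bot name, rules, and URLs are illustrative assumptions, not the actual files or user agents involved in this story.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block "ExampleAIBot" everywhere,
# and keep all other bots out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse an in-memory file, no network
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://news.example.com/story"))
print(is_allowed(ROBOTS_TXT, "OtherBot", "https://news.example.com/story"))
print(is_allowed(ROBOTS_TXT, "OtherBot", "https://news.example.com/private/draft"))
```

A production crawler would more likely call `RobotFileParser.set_url()` and `read()` to download the live file. Either way, compliance is voluntary: nothing in the protocol stops a client that skips this check.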

Investigations have reported instances of improper scraping and plagiarism linked to Perplexity AI. Although Condé Nast engineers blocked Perplexity's crawler through their robots.txt files, the company continued to access their servers from an undisclosed IP address, indicating that the scraping was ongoing.

Perplexity AI has been accused of crawling news websites that explicitly block bots from accessing their content. The IP address linked to Perplexity's servers has been observed on sites such as The Guardian, Forbes, and The New York Times. That address was traced to an Elastic Compute Cloud (EC2) instance hosted on Amazon Web Services, which prompted Amazon's investigation.
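The article does not explain how the address was traced. As one illustration, AWS publishes the IP address ranges it uses in a JSON file (`ip-ranges.json`) whose `prefixes` entries carry `ip_prefix`, `region`, and `service` fields, and checking whether an address falls inside one of those CIDR blocks is a simple lookup. The sketch below assumes that shape, but the sample entries use reserved documentation ranges (RFC 5737), not real AWS blocks, and the helper `find_matching_prefixes` is hypothetical.

```python
import ipaddress

# Illustrative entries shaped like the "prefixes" list in AWS's ip-ranges.json.
# ASSUMPTION: these are RFC 5737 documentation ranges, NOT real AWS address blocks.
SAMPLE_PREFIXES = [
    {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "EC2"},
    {"ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "AMAZON"},
]


def find_matching_prefixes(ip: str, prefixes: list[dict]) -> list[dict]:
    """Hypothetical helper: return every prefix whose CIDR block contains ip."""
    addr = ipaddress.ip_address(ip)
    return [p for p in prefixes if addr in ipaddress.ip_network(p["ip_prefix"])]


matches = find_matching_prefixes("192.0.2.44", SAMPLE_PREFIXES)
print([(m["service"], m["region"]) for m in matches])  # [('EC2', 'us-east-1')]
```

A range match only shows that an address belongs to AWS infrastructure; it does not by itself identify which customer is running the instance.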

Perplexity CEO Aravind Srinivas defended the company, saying the reports reflected a “fundamental misunderstanding” of how it operates. He said the IP address used for scraping was managed by a third-party company that provides web crawling and indexing services. Perplexity has not made any operational changes in response to Amazon's concerns.

In response to inquiries, a Perplexity spokeswoman described Amazon's investigation as routine and said the company had not changed its operations. Amazon Web Services is reviewing the matter to determine whether Perplexity is complying with its rules.

Article Source
https://www.techtimes.com/amp/articles/306143/20240628/amazon-web-services-investigates-perplexity-ai-data-scraping-allegations.htm