Amazon Web Services is investigating the data-collection practices of Perplexity AI following reports from Forbes and Wired that the startup has been scraping websites from Amazon's servers without authorization. The reports have raised concerns about the unethical use of data to train AI models. Perplexity has been accused of ignoring robots.txt, the Robots Exclusion Protocol file that website owners use to tell automated crawlers which parts of a site they may not access. Several media outlets, including Forbes and The New York Times, have identified a Perplexity IP address accessing their servers without permission. Perplexity, however, says its crawler respects robots.txt, and despite the investigation it maintains that it complies with Amazon's terms of service.
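To show what "respecting robots.txt" means in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser` module. The robots.txt contents, the "ExampleBot" user agent, and the example.com URLs are hypothetical and chosen only for illustration; they are not taken from any site or crawler named in this story. A well-behaved crawler runs a check like this before fetching each page. The protocol is voluntary, so nothing technically stops a crawler from ignoring the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (not from any real outlet).
# It blocks "ExampleBot" entirely and keeps all other bots out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from a string instead of downloading them with read().
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("ExampleBot", "https://example.com/news/story"))  # False
print(is_allowed("OtherBot", "https://example.com/news/story"))    # True
print(is_allowed("OtherBot", "https://example.com/private/data"))  # False
```

A crawler that disregards robots.txt simply fetches the page without performing this check, or fetches it even when the check returns False.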
The controversy surrounding Perplexity highlights broader issues within the tech industry regarding the use of AI tools for content creation. Companies like Google and OpenAI have faced criticism for using publicly available data to train their AI models without full transparency. Microsoft’s chief AI officer recently claimed that content on the open web is fair game for AI companies to scrape and monetize, leading to concerns about copyright infringement and consent. Some media outlets have taken legal action against AI companies for unauthorized data extraction, while others have chosen to proactively license their content.
Perplexity, backed by Amazon founder Jeff Bezos' family fund and Nvidia, is positioning itself as a competitor to Google by offering an AI-powered "answer engine." The startup's alleged disregard for robots.txt has sparked a backlash that raises wider questions about the tech industry's approach to data privacy and ethical practices. As the investigation continues, the debate over AI tools and content creation is likely to intensify, with consequences for the relationship between tech companies and media outlets.
Article Source
https://uk.pcmag.com/ai/153035/amazon-investigates-perplexity-ai-over-potential-data-scraping-violations