Amazon Web Services is investigating Perplexity AI over its data collection practices. Several media outlets, including Forbes and Wired, have reported that the AI startup is using AWS to scrape web content for its models without consent or compensation. Amazon has confirmed that it is looking into Perplexity’s behavior and emphasized that all AWS customers must honor robots.txt, the file websites use to tell crawlers which pages they may not scrape. Perplexity has also faced criticism for publishing AI-generated news articles based on the work of human journalists, with Forbes accusing the company of “cynical theft” and of creating “copycat stories” without proper citation or acknowledgment. Several AI companies, including Perplexity, OpenAI, and Anthropic, have been called out for ignoring the robots.txt standard. Perplexity in particular has been accused of crawling sites without permission and has received backlash from various news outlets.
Perplexity, which is backed by Jeff Bezos’ family fund and Nvidia, aims to rival Google by offering an AI-powered “answer engine.” Tech companies’ attitudes toward news sites and web content have prompted a broader backlash, with Google and OpenAI admitting to training their AI tools on publicly available data. Microsoft’s AI CEO contends that any content on the “open web” is “fair use” for AI companies, while The New York Times has filed a lawsuit against OpenAI and Microsoft for alleged copyright infringement. While some media outlets are fighting unauthorized scraping by AI companies, others have chosen to license their content proactively.
In the midst of these controversies, OpenAI recently introduced its ChatGPT voice assistant. The evolving landscape of AI and data collection continues to raise ethical and legal concerns within the tech industry, prompting discussions about transparency, consent, and fair use of online content.
Article Source
https://www.pcmag.com/news/amazon-investigates-perplexity-ai-over-potential-data-scraping-violations