TLDRs:
- Reddit has filed a lawsuit against AI startup Perplexity for allegedly scraping content without permission.
- The lawsuit also names SerpApi, Oxylabs, and AWMProxy for bypassing Reddit’s data protections.
- Reddit seeks damages and a court order to block Perplexity from accessing its content.
- The case highlights growing legal and compliance challenges for AI companies using web data.
Reddit has initiated legal action against AI startup Perplexity, accusing the company of unlawfully scraping its platform to train an AI-powered search engine.
The lawsuit, filed in a New York federal court, also targets Lithuania-based Oxylabs, Russia-based AWMProxy, and Texas-based SerpApi.
The social media giant alleges that these companies circumvented its access controls, including rate limits, CAPTCHA protections, and its robots.txt directives. Reddit claims that Perplexity collaborated with at least one of these entities to obtain Reddit content without proper licensing. According to Reddit, even after sending a cease-and-desist letter last year, citations to Reddit material by Perplexity increased fortyfold.
Allegations of Data Evasion
Central to Reddit’s complaint is evidence of active evasion rather than simple public access.
The lawsuit details nearly 3 billion search engine results pages logged in just two weeks of July 2025, including 1.05 billion scraped by SerpApi over seven days using a technique dubbed “Ludicrous Speed Max” to bypass CAPTCHA and IP restrictions.
Reddit is invoking Section 1201 of the Digital Millennium Copyright Act (DMCA), which prohibits circumvention of technological measures that control access to copyrighted material.
These actions, Reddit argues, represent clear violations of its data-protection protocols. Legal experts note that this case could help define boundaries for AI training on publicly accessible content, as the industry faces growing scrutiny over copyright and licensing issues.
AI Companies Brace for Compliance Demands
The Perplexity lawsuit illustrates broader compliance challenges for AI developers. Many firms now rely on third-party monitoring to ensure that web scraping and data acquisition adhere to copyright law.
Reddit itself planted a “test post” accessible only via Google Search to track unauthorized data scraping, highlighting the increasing sophistication of detection techniques.
Companies like Cloudflare have responded by offering AI bot-blocking tools and marketplaces where automated agents can pay for lawful data access. Meanwhile, Perplexity has publicly stated that it “will not tolerate threats against openness and the public interest” and plans to vigorously defend itself in court. Oxylabs and SerpApi have also indicated they will mount defenses, with Oxylabs expressing “shock and disappointment” over the lawsuit.
Licensing and Legal Precedents
The case underscores the importance of licensed content in AI development. Anthropic’s recent $1.5 billion copyright settlement sets a precedent, encouraging AI companies to secure data agreements before training models.
Perplexity has attempted to mitigate risk by sharing 80% of revenue from related content with publishers and enrolling major media companies, such as Gannett, in its Publisher Program.
Industry observers suggest that licensing agreements and careful vendor diligence will become standard practice for AI firms using web data. Retrieval-Augmented Generation (RAG) and other grounded AI approaches offer safer methods of incorporating external content while reducing exposure to litigation.
As the lawsuit proceeds, the outcome could have far-reaching implications for AI startups and large-scale content platforms alike, potentially reshaping how training data is acquired and used in the rapidly expanding field of generative AI.