TL;DR
- Major publishers accuse Meta of using copyrighted books without permission for AI training
- Lawsuit claims millions of textbooks, novels, and papers used in Llama development
- Case intensifies global debate over AI training and fair use boundaries
- Meta joins growing list of AI firms facing copyright-related legal challenges
Meta Platforms Inc. is once again at the center of a growing legal storm over artificial intelligence training data.
The company is now facing a proposed class-action lawsuit filed by several major academic and book publishers, who allege that Meta used millions of copyrighted books and journal articles without permission to train its Llama AI models.
The lawsuit, filed on May 5 in Manhattan federal court, was brought by prominent publishing houses including Elsevier, Cengage, Hachette, Macmillan, and McGraw Hill, alongside author Scott Turow. The plaintiffs argue that Meta’s AI development process relied heavily on protected literary and academic works, which formed a key part of its large language model training pipeline.
The case has quickly become one of the most significant legal challenges facing Meta’s AI ambitions, adding fresh pressure to an already heated global debate over how artificial intelligence systems are trained.
Allegations of mass data use
The publishers claim that Meta’s Llama models were trained using a vast collection of copyrighted materials, including textbooks, scientific research papers, and well-known fictional works. Among the titles cited in the complaint are N.K. Jemisin’s The Fifth Season and Peter Brown’s The Wild Robot.
According to the lawsuit, these materials were used without authorization, and the plaintiffs argue that such usage violates copyright protections. They are seeking financial damages as well as broader legal recognition for copyright holders whose works may have been used in AI training datasets.
The complaint also seeks to represent a wider group of authors and publishers, suggesting that the scope of the alleged infringement could extend far beyond the named plaintiffs.
Fair use debate intensifies
At the heart of the case is a growing legal question: whether using copyrighted material to train AI systems qualifies as “fair use.” This issue has become a central battleground in the technology and publishing industries as AI models increasingly rely on massive datasets scraped from books, articles, and online content.
Similar lawsuits have already been filed against other major AI developers, including OpenAI and Anthropic. Courts have so far delivered mixed early rulings, leaving the legal framework unsettled.
The issue gained further attention after Anthropic reached a reported US$1.5 billion settlement in a related copyright case last year, highlighting how costly these disputes can become for AI companies.
Broader concerns over data sourcing
Beyond the current lawsuit, Meta has previously faced scrutiny over how its Llama models were trained. Court filings in separate proceedings have alleged that Meta used large-scale datasets sourced from online libraries often described as “shadow libraries,” including LibGen and Z-Library.
Some claims suggest that tens of terabytes of data were downloaded via torrents linked to these repositories. The practice reportedly raised concerns inside Meta, with some researchers questioning the ethics of using such sources for training.
There were also reported internal debates about licensing content versus relying on broad fair-use interpretations, highlighting the tension between legal caution and rapid AI development.
Growing legal pressure on AI firms
The Meta lawsuit reflects a wider shift in how courts and publishers are approaching AI training practices. Recent rulings have begun to distinguish between lawfully obtained content used for training and material allegedly sourced through unauthorized or pirated channels.
Legal experts note that while some courts have found AI training can fall under fair use under certain conditions, the use of pirated datasets may significantly weaken that defense.
As more cases move forward, the outcome of lawsuits like the one against Meta could help define the legal boundaries of AI development for years to come.
For now, Meta joins a growing list of tech giants navigating an increasingly complex intersection of artificial intelligence, copyright law, and digital content ownership.


