The Anthropic settlement made headlines because of its size — $1.5 billion is the largest copyright settlement in US history. But Anthropic wasn't alone in using pirated books to train AI. Meta's LLaMA models were trained on the same shadow library datasets. The lawsuits that followed — collectively known as Kadrey v. Meta — could eventually produce a settlement of comparable scale.
Here's what you need to know.
The Dataset at the Center: Books3
To understand the Meta lawsuit, you need to understand Books3.
Books3 is a dataset of approximately 196,000 full-text books assembled in 2020 by researcher Shawn Presser. The books were scraped primarily from Bibliotik, a private torrent tracker for ebooks — meaning the books were pirated material. Books3 was published as part of The Pile, an 800-gigabyte AI training dataset assembled by EleutherAI.
Books3 contains novels, memoirs, self-help books, academic texts, science fiction, literary fiction, and more. The authors who wrote those books never licensed their work for inclusion in an AI training dataset. They received no compensation.
When Meta used Books3 to train its LLaMA models, it copied and processed those ~196,000 books and, in the plaintiffs' view, infringed the copyright of every author whose work was included.
The LLaMA Models and What They Were Trained On
Meta's LLaMA (Large Language Model Meta AI) is a family of openly released ("open-weight") large language models. LLaMA 1 was released in February 2023. LLaMA 2 followed in July 2023, and subsequent versions have continued to expand the model's capabilities.
According to Meta's own documentation and court filings, LLaMA was trained on a corpus that included:
- Books3 (~196,000 full-text books)
- The Pile (including Books3 and other copyrighted material)
- Common Crawl (broad internet text)
- GitHub (code)
- Wikipedia and other encyclopedic sources
The inclusion of Books3 is the central issue in the copyright lawsuit. Meta did not license the books in Books3. It processed them — reading, tokenizing, and training neural network weights on the text — without permission from the authors.
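To make "tokenizing" concrete: training pipelines first convert raw book text into sequences of integer token IDs, and the model's weights are then updated to predict those sequences. The sketch below is a deliberately simplified, hypothetical illustration; real LLaMA training uses a learned subword vocabulary (byte-pair-encoding style), not a word-level map like this.

```python
# Toy illustration of tokenization: turning text into integer IDs.
# This is NOT Meta's actual tokenizer, which uses a learned subword
# vocabulary; it only shows the general idea of text -> token IDs.

def toy_tokenize(text, vocab):
    """Map whitespace-separated words to integer IDs,
    assigning a new ID the first time each word is seen."""
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)  # next free ID
        ids.append(vocab[word])
    return ids

vocab = {}
ids = toy_tokenize("the author wrote the book", vocab)
# repeated words map to the same ID: "the" appears twice -> same integer
```

The point for the lawsuit is simply that this step requires making and processing a full copy of each book's text, which is why plaintiffs frame training as reproduction rather than mere "reading."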
The Kadrey Lawsuit: Who Filed and Why
In July 2023, three authors filed the initial complaint against Meta in the Northern District of California:
Richard Kadrey — best-selling fantasy author known for the Sandman Slim series, spanning more than a dozen novels.
Christopher Golden — prolific horror and dark fantasy author with over 50 novels to his name, including the bestselling Ararat and The Pandora Room.
Sarah Silverman — comedian, actress, and author of the 2010 memoir The Bedwetter.
Their complaint alleged that Meta copied their books without authorization as training data for LLaMA, violating the Copyright Act. They sought statutory damages and injunctive relief.
The Authors Guild and other organizations subsequently filed related actions. The consolidated litigation now represents a class of authors whose books appeared in Meta's training datasets.
What Meta Has Argued
Meta has not rolled over. The company has contested the lawsuits on several grounds:
Fair use. Meta argues that using text to train AI is "transformative" and therefore protected by the fair use doctrine. Anthropic made the same argument; a court found that training on lawfully purchased books could be fair use, but Anthropic settled the claims over pirated copies rather than test them fully at trial.
Lack of substantial similarity. Meta has argued that LLaMA's outputs don't "reproduce" the plaintiffs' works in a copyright-meaningful sense — that training is more like reading than copying.
Procedural challenges. Meta has sought to limit the scope of the class and narrow the claims.
Early court rulings have been mixed. Some claims have been narrowed; others have survived. The core copyright infringement allegations remain active as of April 2026.
Current Status: Active Discovery
The cases are in the discovery phase as of April 2026. No settlement has been announced.
Discovery is the phase where plaintiffs seek information from Meta about its training data, how it was compiled, and how LLaMA was developed and deployed commercially. This process can be contentious — companies typically resist disclosing detailed information about training data — and can take months or years.
The outcome of discovery often determines settlement leverage. If plaintiffs gain access to detailed evidence about what was in LLaMA's training data, Meta's incentive to settle increases.
How This Compares to Anthropic
The Anthropic case and the Meta LLaMA case share almost identical legal DNA:
- Both involve Books3 and shadow library datasets
- Both allege copyright infringement through AI training
- Both cases are in the Northern District of California
- Both involve a class of affected authors
The key difference is that Anthropic settled. Meta has not. That could mean a longer wait for authors — but potentially a comparable payout if and when settlement comes.
The Anthropic settlement ($1.5 billion for ~400,000 books) serves as a useful benchmark. Meta has a market value in the hundreds of billions and has significant resources if courts find substantial liability. A Meta settlement, if it comes, could be structured similarly.
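The benchmark arithmetic is straightforward. Using the figures stated above ($1.5 billion across ~400,000 books, and ~196,000 books in Books3), a rough per-work rate can be computed; the Meta figure below is purely hypothetical, not a prediction of any actual settlement.

```python
# Back-of-envelope benchmark using figures from this article.
# Any Meta number here is hypothetical, for illustration only.

anthropic_settlement = 1_500_000_000  # $1.5 billion settlement
anthropic_books = 400_000             # ~400,000 covered works

per_book = anthropic_settlement / anthropic_books
# roughly $3,750 per covered work

books3_size = 196_000  # ~196,000 books in Books3
hypothetical_meta_total = per_book * books3_size
# ~$735 million at the same per-work rate
```

Actual settlement values depend on class size, liability findings, and negotiation, so this is only a sense of scale.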
What Authors Should Do Now
1. Check the Anthropic case first. The same books that appear in Books3 often appear in the Anthropic Works List. The Anthropic settlement is already approved and moving toward distribution. Check if your works qualify at TrainedOnYou.com/cases/anthropic/check-works.
2. Document your publications. Note which books you've published with their ISBNs and publication dates. This information will be essential when Meta case claims become available.
3. Join the Meta waitlist. Sign up at TrainedOnYou.com/cases/meta and we'll notify you the moment a settlement is announced or claims open.
4. Consult your agent or publisher. If you're represented, ask whether they have any information about your books appearing in Books3 or similar datasets.
The Anthropic case proved that authors can win — or at least settle for substantial money — in these AI copyright disputes. The Meta case follows the same legal logic. Authors who prepare now will be positioned to act quickly when the case resolves.
TrainedOnYou is an independent litigation finance company. We are not affiliated with Meta Platforms, Inc. or any plaintiff in the Kadrey v. Meta litigation. This article is for informational purposes only and does not constitute legal advice.