Background
In May 2023, a group of authors filed a class action against Anthropic, the AI safety company behind the Claude large language model. The core allegation: Anthropic downloaded millions of books from pirate websites — specifically LibGen and PiLiMi — and used them as training data for Claude without authorization or compensation.
The lawsuit does not challenge whether AI training on copyrighted text is fair use in general. (That question is before other courts.) Instead, it focuses specifically on how Anthropic obtained the books — by downloading pirated copies from databases that had themselves copied the books illegally. Judge William Alsup’s preliminary rulings drew a line: using published books as training data might be defensible, but storing pirated copies is not. That distinction framed the liability Anthropic faced.
The Settlement
In August 2025, Anthropic agreed to a $1.5 billion class settlement — the largest copyright settlement in U.S. history. Key terms include:
- $1.5 billion settlement fund, paid in four installments: October 2025, April 2026, September 2026, and September 2027.
- ~482,000 eligible titles, with roughly 92.77% of class works claimed by rightsholders by the fairness hearing.
- ~$3,100 per work (after costs and fees), shared among all rightsholders for each title.
- Destruction of pirated copies: Anthropic must delete all books downloaded from LibGen and PiLiMi datasets.
- No future license: Participating in the settlement does not grant Anthropic any right to use class members’ works going forward.
A fairness hearing was held May 14, 2026. Judge Martínez-Olguín heard from class counsel and seven objectors, praised the settlement’s exceptionally high 92.77% claims rate, and took the matter under submission for final review.
Key Takeaways
- The piracy theory, not the training theory, drove liability. Anthropic’s exposure arose from where it got the training data, not from the act of training itself. The settlement does not resolve whether AI training on lawfully obtained copyrighted material is fair use — that fight continues in other cases.
- $3,100 per book is the new benchmark. The per-work payout, while not a court-ordered damages award, will shape settlement negotiations in every subsequent AI copyright case involving pirated training data.
- Destruction of training data is now a settlement term. Requiring Anthropic to delete the pirated copies sets a precedent: copyright owners can demand not just payment but deletion of infringing training datasets.
- No license means future liability remains. Authors who participate in the settlement are not giving Anthropic a blanket license. Anthropic must negotiate separately for any future use of class members’ works.
Why It Matters
Bartz v. Anthropic is a landmark not just for its dollar amount but for what it signals about the limits of the training-data fair use theory. Anthropic did not settle because AI training is unlawful — it settled because it used pirated books, a separate and harder-to-defend liability. But the $1.5 billion price tag sends a message to the entire AI industry: if you train on pirated content, the exposure is enormous.
For authors and publishers, the settlement establishes that the right to license content for AI training has real monetary value. For AI companies, it underscores the importance of data provenance — knowing not just what you trained on, but how that data was obtained. The era of assuming all publicly available text is free to use for AI training is over.