

Adobe’s foray into artificial intelligence has now resulted in a lawsuit. A newly filed proposed class-action complaint accuses the company of using pirated books to train one of its AI language models.
The lawsuit, filed on behalf of Oregon-based author Elizabeth Lyon, claims Adobe used copyrighted works without permission to develop its SlimLM models.
Adobe’s SlimLM models are a series of lightweight language models designed to assist with document-related tasks on mobile phones. According to the company, the models were pre-trained on SlimPajama-627B, an open-source, multi-corpus dataset published by AI hardware company Cerebras in June 2023.
Adobe has stated that SlimPajama was built as a deduplicated dataset for large-scale language modeling. The complaint, however, claims that SlimPajama was itself built from another, problematic dataset: RedPajama.
Lyon asserts that SlimPajama was created by copying and modifying RedPajama, which incorporated ‘Books3’, an enormous collection of some 191,000 books. Books3 has long been criticized for including copyrighted books obtained without the authors’ consent.
The lawsuit claims that, as a derivative of RedPajama, SlimPajama contains Books3 content, including Lyon’s copyrighted non-fiction writing guidebooks, and on that basis alleges that Adobe trained its model on pirated content.
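The provenance chain at the heart of the complaint (SlimLM trained on SlimPajama, which was derived from RedPajama, which included Books3) can in principle be examined from the dataset itself. The minimal sketch below assumes the publicly hosted cerebras/SlimPajama-627B release on Hugging Face and a per-sample “redpajama_set_name” label recording each sample’s RedPajama sub-corpus; those layout details are assumptions about the public release, not assertions taken from the lawsuit.

```python
# Minimal sketch: tally which RedPajama sub-corpus SlimPajama samples are
# labeled with. Assumes the public "cerebras/SlimPajama-627B" release and a
# per-sample meta["redpajama_set_name"] field; both are assumptions about the
# dataset layout, not facts established in the complaint.
from collections import Counter

from datasets import load_dataset  # pip install datasets

# Stream the corpus so the full ~627B-token dataset is never downloaded.
stream = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

source_counts = Counter()
for i, sample in enumerate(stream):
    # Each sample is expected to record its RedPajama source corpus
    # (e.g. a books-derived subset vs. web crawl data).
    meta = sample.get("meta", {}) or {}
    source_counts[meta.get("redpajama_set_name", "unknown")] += 1
    if i >= 10_000:  # inspect a small sample only
        break

print(source_counts.most_common())
```

A tally like this only shows which source corpora are represented in the dataset; whether any given subset contains a particular author’s copyrighted work is exactly the kind of question the litigation is meant to resolve.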
Books3 is also at the center of other ongoing disputes over AI training data, with several companies facing lawsuits over its use.
In September, authors sued Apple, accusing it of using copyrighted works to train Apple Intelligence. Salesforce was later hit with a lawsuit involving RedPajama.
These cases reflect growing concerns about how AI models are trained and about the extent to which relying on “open-source” data shields firms from copyright infringement claims.
Litigation over AI training data has already produced major settlements. In September, Anthropic agreed to pay $1.5 billion to settle a lawsuit brought by authors who claimed the firm used pirated books to train its Claude chatbot.
The Adobe lawsuit is the latest legal test for the tech industry as courts begin to consider whether generative AI’s use of huge datasets infringes existing copyright law, and whether companies must compensate rights holders when infringement is established.