

Adobe’s foray into artificial intelligence has now resulted in a lawsuit. A newly filed proposed class-action complaint accuses the company of using pirated books to train one of its AI language models.
The lawsuit, filed on behalf of Oregon-based author Elizabeth Lyon, claims Adobe used copyrighted works without permission to develop its SlimLM models.
Adobe’s SlimLM models are a series of lightweight language models designed to assist with document-related tasks on mobile phones. According to the company, the models were pre-trained on SlimPajama-627B, an open-source, multi-corpus dataset published by AI hardware company Cerebras in June 2023.
Adobe has stated that SlimPajama was built as a deduplicated dataset for large-scale language modeling. The complaint, however, claims that SlimPajama was itself built from another, problematic dataset: RedPajama.
Lyon asserts that SlimPajama was created by copying and modifying RedPajama, which incorporated ‘Books3’, an enormous collection of some 191,000 books. Books3 has long been criticized for including copyrighted books obtained without the authors’ consent.
The lawsuit claims that, as a derivative of RedPajama, SlimPajama contains Books3 content, including Lyon’s copyrighted non-fiction writing guidebooks, and on that basis alleges that Adobe trained its model on pirated content.
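The provenance chain at the heart of the complaint (SlimLM trained on SlimPajama, which was derived from RedPajama, which included Books3) can in principle be examined from the dataset itself. The minimal sketch below assumes the publicly hosted cerebras/SlimPajama-627B release on Hugging Face and a per-sample “redpajama_set_name” label recording each sample’s RedPajama sub-corpus; those layout details are assumptions about the public release, not assertions taken from the lawsuit.

```python
# Minimal sketch: tally which RedPajama sub-corpus SlimPajama samples are
# labeled with. Assumes the public "cerebras/SlimPajama-627B" release and a
# per-sample meta["redpajama_set_name"] field; both are assumptions about the
# dataset layout, not facts established in the complaint.
from collections import Counter

from datasets import load_dataset  # pip install datasets

# Stream the corpus so the full ~627B-token dataset is never downloaded.
stream = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

source_counts = Counter()
for i, sample in enumerate(stream):
    # Each sample is expected to record its RedPajama source corpus
    # (e.g. a books-derived subset vs. web crawl data).
    meta = sample.get("meta", {}) or {}
    source_counts[meta.get("redpajama_set_name", "unknown")] += 1
    if i >= 10_000:  # inspect a small sample only
        break

print(source_counts.most_common())
```

A tally like this only shows which source corpora are represented in the dataset; whether any given subset contains a particular author’s copyrighted work is exactly the kind of question the litigation is meant to resolve.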
Books3 is also at the center of other ongoing disputes over AI training data, with several companies facing lawsuits over its use.
In September, authors sued Apple, accusing it of using copyrighted works to train Apple Intelligence. Salesforce was later hit with a lawsuit involving RedPajama.
These cases reflect growing concerns about how AI models are trained and about the extent to which relying on “open-source” data shields firms from copyright infringement claims.
Litigation over AI training data has already produced major settlements. In September, Anthropic agreed to pay $1.5 billion to settle a lawsuit brought by authors who claimed the firm used pirated books to train its Claude chatbot.
The Adobe lawsuit is the latest legal test for the tech industry as courts begin to consider whether generative AI’s use of huge datasets infringes existing copyright law, and whether companies must compensate rights holders when infringement is established.