
Meta CEO Mark Zuckerberg recently defended his company’s use of a copyrighted e-book dataset in a deposition for the ongoing AI copyright case Kadrey v. Meta. The case is part of a broader wave of litigation in which copyright holders have sued AI companies, asking courts to determine whether training AI models on copyrighted materials constitutes ‘fair use’.
Zuckerberg likened Meta’s predicament to YouTube’s struggle to remove unlawful content, and said that it is not always reasonable to rule out a source entirely because some of its content is infringing.
The plaintiffs in the case, who include renowned writers such as Sarah Silverman and Ta-Nehisi Coates, have accused Meta of using pirated materials to train its AI models. They allege that Meta trained its Llama models on LibGen, a site containing pirated e-book copies. LibGen, which hosts links to copyrighted works, has been the subject of numerous lawsuits and fines.
The deposition addressed Meta’s use of LibGen to train at least one version of its Llama models, even though potential legal issues had been flagged inside the company. When asked whether he had heard of LibGen, however, Zuckerberg said he had not: “I haven’t heard of that specific thing.” He acknowledged that Meta should be careful when using copyrighted materials to train AI, but he defended the broader practice of training on large datasets.
Much as Rosenberg’s testimony centered on YouTube’s policies regarding allegedly pirated videos, Zuckerberg defended Meta’s actions by pointing to how YouTube has dealt with the matter. YouTube is known to have hosted videos containing pirated content in the past, but it works to delete such material in order to comply with copyright rules. Zuckerberg argued that a complete prohibition on using copyrighted data to train AI models could be ‘unfair,’ comparing it to banning YouTube because it has hosted stolen music.
Zuckerberg stressed that YouTube and Meta shouldn’t be taken to court every time pirated content appears, especially if the platform tries to take it down. However, he conceded that Meta needs to tread carefully, because training models on pirated data may violate intellectual property rights.
In their amended complaint, the plaintiffs claim that Meta used LibGen not only to train the Llama models but also to cross-reference pirated books against their licensed counterparts. This strategy allegedly helped Meta decide which books were worth licensing. The plaintiffs also allege that Meta researchers tried to conceal the use of pirated content by adding what the complaint calls “supervised samples” to the fine-tuning process for the Llama models.
The amended complaint also alleges that Meta downloaded e-books from Z-Library, another site that distributes pirated material, to train Llama models. Z-Library has itself been the target of legal actions, with allegations that include copyright infringement, wire fraud, and money laundering.