Judge Orders OpenAI to Hand Over 20 Million ChatGPT Logs in NYT Copyright Clash

OpenAI Ordered to Turn Over 20 Million De-Identified ChatGPT Logs After a Judge Ruled the Data Is Essential in a New York Times Copyright Case
Written By:
Kelvin Munene
Reviewed By:
Manisha Sharma

OpenAI has been ordered to turn over 20 million de-identified ChatGPT conversation logs in the New York Times copyright lawsuit. A US magistrate judge in Manhattan ruled that the anonymized logs are relevant to testing allegations that ChatGPT reproduced news content without permission.

Judge Ona T. Wang rejected OpenAI’s request to narrow discovery based on user privacy and operational burden. She found that the 20 million logs represent a small sample compared with the “tens of billions” of records OpenAI stores. According to the ruling, the logs can help evaluate alleged reproductions, damages, and OpenAI’s fair use defenses.

The court directed OpenAI to remove identifying information and to rely on an existing protective order. The logs will be subject to strict confidentiality requirements, including “attorneys’ eyes only” access to much of the data. The judge stated that these safeguards, combined with de-identification, address user privacy concerns.

OpenAI Privacy Concerns Clash With Copyright Discovery

OpenAI argued that producing the ChatGPT logs would expose confidential user information and weaken security practices. The company said that “99.99%” of the requested transcripts do not involve New York Times or MediaNews Group content. It also warned that the demand could encourage similar attempts to obtain private conversations from other AI services.

Publishers and news organizations pushed for access to output data to test how ChatGPT handles their reporting. They said the logs are central to showing whether the system copied or closely tracked their articles. The records also help them challenge OpenAI’s claim that they manipulated prompts to create infringement examples. The court agreed that the sample size is proportional to the needs of the case.

The dispute followed months of arguments over how much user data OpenAI must preserve and produce. Earlier orders required the company to retain a wider range of chat records, including some conversations that users tried to delete. The latest ruling builds on that direction and sets a production timeline.

Ruling Raises Stakes for AI Training Data and Compliance

The copyright lawsuit, filed in 2023, is part of a wider push by authors, media groups, and other rights holders to test how existing copyright law applies when AI companies ingest protected works and generate content based on them. Other technology firms, including Microsoft and Meta, face similar claims in US and European courts.

This discovery order may influence future negotiations over data licensing and generative AI partnerships. It signals that courts may demand detailed output records even when companies cite privacy risks. 

Analytics Insight: Latest AI, Crypto, Tech News & Analysis
www.analyticsinsight.net