OpenAI seeks NYT source material for copyright defense

The dispute stems from a lawsuit filed by The New York Times against OpenAI and Microsoft, alleging copyright infringement.

In a high-stakes legal battle over the use of copyrighted material to train artificial intelligence models, OpenAI is pushing the New York Times to disclose detailed source materials that would prove the originality of its articles. The request, filed in the ongoing lawsuit The New York Times Company v. Microsoft Corp., boils down to one message: “Prove your stories are original.”

OpenAI is asking for an ‘informal discovery conference’. Basically, it wants the judge to force the Times to show its cards. OpenAI, which is defending the case together with Microsoft, says it needs to see the paperwork proving that the Times owns and originated all the articles in dispute.

The legal background

In the lawsuit, filed last year, the New York Times alleges that OpenAI and Microsoft used its copyrighted content to train AI models without proper authorization or compensation. The case is part of a broader wave of legal actions brought by various rightsholders, including record labels, book authors, and visual artists, against AI companies.

The case is now in the discovery phase, during which each party can request evidence from the other: documents, testimony, and other materials that support or refute the copyright infringement claims at the heart of the lawsuit.

OpenAI’s requests

On July 1, OpenAI filed several specific requests for production (RFPs). For instance, RFP 12 demands “documents sufficient to show each and every written work that informed the preparation of each of Your Asserted Works, regardless of its length, format, or medium.” Essentially, OpenAI wants to see the raw materials that Times journalists used to create their articles, including reporter’s notes, interview memos, and records of materials cited.

“Having chosen to put directly at issue how the Times created the works at issue—including the methods, time, labor, and investment—OpenAI has a right to discovery into the same,” OpenAI states in its motion.

OpenAI argues that for the Times to claim copyright infringement, it must show that the contested works are original and not merely derived from other sources. OpenAI cites precedents such as Feist Publications, Inc. v. Rural Telephone Service Co., in which the Supreme Court held that copyright protection extends only to the original elements of a work.

The Times’ objection

On July 3, The New York Times responded, vehemently opposing the demand as irrelevant and overly burdensome. The Times asserts that the expressive nature of its work should be judged by the final articles themselves, not the underlying materials. It also stresses the need to protect its newsgathering process, citing the reporter’s privilege under the First Amendment and the New York Shield Law, which protects journalistic source materials from disclosure.

“OpenAI’s claim that it needs all ‘reporter’s notes, interview memos, records of materials cited, or other ‘files’ for each asserted work’—purportedly to determine whether The Times’s works are in fact protectable intellectual property—is unprecedented and turns copyright law on its head,” the Times argues in its response.

The Times maintains that such invasive discovery is unnecessary and serves no purpose other than to harass the newspaper.

Concerns over chilling effect

The Times warns that granting OpenAI’s requests could set a dangerous precedent, deterring news organizations from pursuing copyright infringement claims in the future for fear of having their confidential newsgathering processes exposed.

“Indeed, given the wildly improper scope of this request, one has to wonder if a chilling effect is exactly what OpenAI, who appears to have stolen from millions of content creators, is hoping for,” the Times writes.

“Permitting OpenAI to investigate The Times’s privileged newsgathering process would have serious negative and far-reaching consequences,” the Times states. It argues that the burden of such discovery is disproportionate to the needs of the case and could undermine the integrity and confidentiality of investigative journalism.

What’s next?

This legal battle highlights the complex issues surrounding copyright law in the age of artificial intelligence. As AI companies train their models on vast amounts of data, including copyrighted works, questions about fair use, originality, and the nature of creative work are becoming increasingly relevant.

The outcome of this case as a whole could have far-reaching implications for both the AI industry and content creators.

The Times is arguably the only plaintiff with enough muscle to actually win this thing. If it pulls it off, it could change the whole game for AI companies snatching up everyone’s work to train their fancy answering machines.

The court has yet to rule on OpenAI’s motion, and it remains to be seen what impact the ruling will have on the broader lawsuit.

You can access OpenAI’s motion to compel the New York Times to provide the requested documents here (pdf), which also outlines additional disputed issues. The New York Times’ counter-response is available here (pdf).

Posted by Alex Ivanovs

Alex is the lead editor at Stack Diary and covers stories on tech, artificial intelligence, security, privacy, and web development. He was previously a lead contributor to the Huffington Post’s Code column.