If you’ve followed the saga between the New York Times and OpenAI, you know it’s messy. But last week, it hit a new twist: a court order that seems to require OpenAI to do something that may be technically impossible, unethical, and a breach of international laws and regulations.
Magistrate Judge Wang, in the Southern District of New York, issued an order telling OpenAI to “preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the Court (in essence, the output log data that OpenAI has been destroying), whether such data might be deleted at a user’s request or because of ‘numerous privacy laws and regulations’ that might require OpenAI to do so.”
What OpenAI will do next is a mystery. Based on my research, it is also unclear what the company is even supposed to do in this novel situation.
The Legal Backdrop
The NYT sued OpenAI and Microsoft for copyright infringement, alleging that their tools reproduced NYT content, sometimes verbatim, after being trained on its work. In litigation like this, the parties are typically required to preserve any information relevant to the lawsuit for discovery and use in the proceedings.
The problem here is that OpenAI processes billions of prompts and responses: some from free users, some from lower-priced tiers, and others from higher-priced consumer, enterprise, and API access tiers. Each of these tiers and access options carries different data privacy expectations under OpenAI's various privacy policies.
To complicate matters further, regulatory compliance obligations differ among states and countries, including Colorado's upcoming AI law, the EU's AI Act, and more.
A hearing is now scheduled to examine whether OpenAI failed to comply with the preservation order, or whether the order itself was too broad to begin with.
A Conflict Between Law and Architecture
This case highlights a growing conundrum in AI litigation: most generative AI systems don't log every user prompt and output. Doing so would raise serious privacy concerns, balloon storage costs, and potentially violate data protection norms. And yet, in the context of litigation, failing to log that data can look like spoliation, the destruction or failure to preserve relevant evidence.
It’s a Catch-22: Save everything, and you risk violating user privacy and creating enormous infrastructure challenges. Save nothing, and you risk court sanctions for failing to preserve potentially relevant material.
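To make the bind concrete, here is a minimal sketch, in Python, of the decision a deletion pipeline has to make once an order like this lands. The tiers, retention windows, and field names are illustrative assumptions on my part, not OpenAI's actual systems.

```python
# Hypothetical sketch: a deletion routine that must honor user deletion
# requests and per-tier retention promises, but where a litigation hold
# overrides both. Tiers, windows, and fields are made up for illustration.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class OutputLog:
    tier: str                      # e.g. "free", "plus", "enterprise", "api"
    created_at: datetime
    user_requested_deletion: bool

# Illustrative retention windows a provider might promise per tier.
RETENTION = {
    "free": timedelta(days=30),
    "plus": timedelta(days=30),
    "enterprise": timedelta(days=0),   # e.g. "we don't retain your content"
    "api": timedelta(days=30),
}

LITIGATION_HOLD = True  # flipped once a preservation order is in effect

def should_delete(log: OutputLog, now: datetime) -> bool:
    """Return True if this log would normally be purged."""
    expired = now - log.created_at > RETENTION[log.tier]
    wants_deletion = log.user_requested_deletion or expired
    if LITIGATION_HOLD:
        # The court order wins: nothing is deleted, even if the user asked
        # or a privacy policy or regulation says it must be.
        return False
    return wants_deletion

# A free-tier log, 90 days old, with an explicit deletion request:
old_log = OutputLog("free", datetime.now(timezone.utc) - timedelta(days=90), True)
print(should_delete(old_log, datetime.now(timezone.utc)))  # False while the hold stands
```

The point of the sketch is the single `if LITIGATION_HOLD` branch: one flag, imposed from outside, silently overrides every privacy commitment encoded above it.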
Why This Matters for AI Regulation
The preservation issue also reveals a regulatory tension. In "May Day! May Day! This is Not the AI Bill You Were Looking For," Phil Nugent wrote about the danger of lawmakers and regulators imposing broad requirements that sound good in theory but don't hold up in practice. The OpenAI preservation order feels like one of those moments, just from a judge instead of lawmakers.
Courts and policymakers increasingly want transparency, explainability, and accountability from AI providers. But achieving that may require tradeoffs that break core design assumptions, or undermine user expectations of privacy.
In some cases, model providers may not even know what their models were trained on, let alone have logs of what those models produced months or years ago. That’s not a cover-up. That’s how modern AI works.
Where Do We Go From Here?
What should companies do? And what should courts expect?
One option is to define retention requirements before disputes arise, creating clear norms around what should be saved, when, and for how long. Another is to build legal carve-outs that account for the limitations of machine learning infrastructure, especially in consumer-facing tools like ChatGPT.
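One way to picture that first option: a retention policy written down as data, agreed and published before any dispute, so a preservation order can be scoped against it rather than against "everything." The categories, durations, and hold scopes below are hypothetical, a sketch of the idea rather than anyone's real policy.

```python
# Hedged sketch of pre-agreed retention norms: what is kept, for how long,
# and how far a litigation hold reaches for each category. All values are
# illustrative assumptions.
RETENTION_POLICY = {
    "consumer_chat_logs":    {"retain_days": 30,  "hold_scope": "named accounts only"},
    "api_request_logs":      {"retain_days": 30,  "hold_scope": "named accounts only"},
    "enterprise_content":    {"retain_days": 0,   "hold_scope": "per contract"},
    "safety_review_samples": {"retain_days": 365, "hold_scope": "full corpus"},
}

def preservation_obligation(category: str) -> str:
    """Describe what a preservation order would reach for this category
    under the pre-agreed policy, instead of 'everything, indefinitely'."""
    policy = RETENTION_POLICY[category]
    return (f"{category}: normally kept {policy['retain_days']} days; "
            f"hold scope: {policy['hold_scope']}")

for category in RETENTION_POLICY:
    print(preservation_obligation(category))
```

The design choice here is that the norms are declarative and reviewable up front, which is exactly what was missing when the court reached for a blanket preservation order.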
At minimum, judges and lawmakers need to understand how these systems actually operate before imposing obligations that defy their architecture.
There’s also an access-to-justice concern here. If compliance with a preservation order requires companies to log everything indefinitely, that favors big players with the money to comply and could freeze smaller or open-source developers out of the space. That’s not good for innovation, and it’s not good for equity.
The next hearing in the case will likely bring more clarity. But it won’t fix the fundamental problem: we’re regulating AI with frameworks designed for email servers and file folders. And that’s not going to cut it.