PIONEER: Copyright in the Era of Generative AI | Dr Andrés Guadamuz
The merits and limitations of current legal cases in Generative AI
I think a bit more realism, and thinking long-term, and a bit less visceral reaction would be a good thing…
Happy Wednesday, all,
Last Friday’s EGAI was all about how society is scrambling to keep pace with the GenAI revolution as people increasingly recognize that it entails more than just ‘productivity gains.’
On Monday, Geoffrey Hinton, AKA the ‘Godfather of AI,’ expressed regret about aspects of his life's work, left Google, and issued a warning about the existential dangers of (Gen)AI.
However, there are more immediate and pressing concerns than AI's long-term existential risks. One significant dilemma is the absence of a legal framework for interpreting AI-generated outputs. If we consider GenAI a driving force for all information, content, and knowledge creation, this gap becomes a major issue.
The uncharted territory of AI-generated output is being mapped rapidly, partly due to the surge in litigation against it, often based on claims of copyright and IP law violations.
To explore these issues and more, I spoke to an exceptional legal scholar in this episode of PIONEERS:
👨‍⚖️ Andrés Guadamuz is an intellectual property law researcher at the University of Sussex. Throughout his career, he has worked on IP issues related to open-source software, software protection, and text and data mining. Andrés and I discuss AI and 'the Law,' including AI-generated art, so-called ‘homoeopathic copyright,’ and the current legal cases against Generative AI.
In this episode we cover:
Ownership, copyright, and IP [01:19]
Current legal cases in Generative AI [02:47]
Lawsuit 1: GitHub Copilot [04:15]
Lawsuit 2: Class action against Stability AI, Midjourney and DeviantArt [09:29]
The problem with copyright maximalism [18:07]
Open vs closed models and the importance of thinking long-term [26:01]
Looking to the future [30:18]
Lawsuit 3: Stability AI & Getty Images [33:05]
Large Language Models (LLMs) and the lack of legal action in text-generation [37:28]
(To dig deeper into this topic, check out 📚Andrés’ blog about all things law & digital tech.)
The problem with ‘homoeopathic copyright’
One of the first class-action lawsuits against Generative AI companies for copyright infringement came in mid-January, when three US artists filed suit against Stability AI, Midjourney, and DeviantArt, accusing them of 'stealing' their work to train their models.
While I sympathize with the artists, this lawsuit could backfire.
Firstly, as Andrés and I discuss, the core claim in the case — the so-called 'derivative' argument — is unlikely to withstand scrutiny in a court of law. The argument goes something like this: if a Generative AI model uses an artist's copyrighted work in its training dataset without permission, then every single output from that model infringes on that artist's copyright.
Not only is this technically incorrect (it misrepresents how diffusion models actually work), but Andrés also refers to this logic as ‘homoeopathic copyright.’ If we accept the reasoning that even a remote trace of ‘something’ in training data (or elsewhere) is enough to infringe copyright, then essentially everyone in the world is guilty of copyright infringement.
The second problem with the derivative argument is that it could ultimately disadvantage smaller creators and indie artists. If virtually everything created with AI can be subject to copyright claims, then large IP holders would effectively monopolize the Generative AI space. This is a bit of a "be careful what you wish for" scenario; fighting for the maximalist copyright position could well backfire for these artists… (and the rest of us).
So, what can individual creators do? Industry-wide solutions will likely be necessary. Implementing licensing, fairness, and remuneration models, as well as technical standards, might help. In Europe, artists who don't want their work included in training datasets can opt out.
Open vs Closed Models
“Whenever you are opening your datasets, you are opening yourself to litigation.”
Another issue we've been exploring in the PIONEER series is the difference between open-source and closed models. It's also relevant here: the open-source community is important for spreading and democratising these technologies, but openness also makes them vulnerable to legal attacks. Stability AI's open-source models have encouraged innovation and broad use, for example, but the openness of their training data has also landed them in legal hot water. (They are the GenAI company named in most of the existing litigation.)
The future (opportunities)
While the lawyers are busy duking it out in court (and Andrés predicts they will be doing that for the next 2-5 years), there are still enormous opportunities for Generative AI.
A relevant analogy is the music industry. We often forget that early players, like Napster, faced legal challenges too. However, the revolution they ignited ultimately paved the way for the streaming services that have become second nature today.
The same applies to Generative AI. Countless revenue streams for artists and creators are yet to be imagined. As we enter this new paradigm, it's essential to recall the shifts we've experienced in recent history, with the rise of social media and streaming platforms being other prime examples.
The impact of these changes on society and the economy is undeniable, but the crucial question is: how can we proactively engage with these changes?
That’s enough food for thought today…
See you on Friday with EGAI… and now for…
Namaste,
Nina