Groundbreaking ideas and research for engaged leaders
Rotman Insights Hub | University of Toronto - Rotman School of Management

How should we deal with copyright and AI?

Joshua Gans

Copyright and AI issues continue to bubble along, with numerous cases making their way through the courts. One issue of note is whether generative AI providers trained on copyright-protected content without permission from the rights owners. This remains an open question: generative AI models are capable of outputting near copies of content without having been trained on that content directly. But regardless, AI providers contend their use of such materials is covered by fair use.

Fair use is quite nuanced and can be difficult to apply; it is where issues such as free speech and copyright protection collide. However, one of the main arguments that the use of content in AI training is covered by fair use is that we have, at least implicitly, decided that the use of content in human training does not infringe copyright.

It goes beyond this, of course.

Take, for example, a person well-versed in a television series taking to social media to answer questions about plot details, characters and key quotes from the series. Is this different from a chatbot answering questions about plot details, characters and key quotes from the same series?

Or consider CliffsNotes: Is a person providing summaries of business books different from summaries generated by AI?

Or how about a fan who replicates their favourite superhero comics by hand – is that materially different from an image-generator that creates comic frames of the same hero?

(In the latter example, it’s possible the fan might not be covered under fair use… Maybe a Disney lawyer could weigh in there.)

There is a sense that “what holds for people” ought to imply “what holds for AI.” However, it does then challenge us to ask, why are humans not liable for infringement in these situations? How does that make sense within the rationale for copyright law?

An economic argument

Original content creators face issues if others can copy their content and, in the process, reduce potential customers’ willingness to pay for that content. We have copyright protection so those creators can say “no” and have that refusal enforceable by law.

At the same time, content can be useful, but copyright protections can stand in the way of that use. A creator wants to earn profits, and so may set monopoly prices that discourage use that might otherwise occur. A creator may also not want to allow use when their content might “leak” or otherwise cause their commercial interests to be harmed.

From an economic perspective, the goal should be to encourage both the creation and use of content. In the end, much of copyright law sensibly places power in the creator’s hands, even if this discourages use, on the theory that if you don’t do that, the content won’t be produced and you won’t have any use of it as a result. This is basically the NYT’s position in its current legal fight against OpenAI. But if I summarized its lead story of the day in written form, would the NYT come after me?

I believe there are two distinct situations to consider. First, in what I call “small AI models,” some specific content is used to train an AI. Because there is a relatively small amount of content, it is feasible for the content creator to identify the use of that content and negotiate with the AI provider. In these situations, the NYT's intuition that copyright protection is the way to go is borne out. It’s needed from a social perspective because it improves content creator incentives, improves the quality of AI training data and allows these to be balanced against harm, if any, to the content creator’s commercial interests.

The second situation, which is the relevant one for most generative AI, is for “large AI models.” In such models, the sheer volume of content is so large that each “bit” has limited value on its own to AI training — although that training would face a problem if none were available — and it is hard to identify ahead of time whether the use of content in training would damage a content provider’s commercial interests, even if such an effect could be measured after the fact. This rules out negotiations over use that might balance the competing interests.

For this situation, whether we want copyright protection or no protection (like free-rein fair use) depends on how valuable content is for AI training in general, and how likely, on average, content creators’ commercial interests are to be harmed. This gives us a clue as to why people don’t get in trouble for the use of content: it is unlikely that such use will harm any content provider. This is not an “it’s too costly to sue everyone” argument. Instead, it is a “there isn’t likely to be damage, so creator incentives aren’t harmed” argument.

A better approach

We should consider another approach, which I call an “ex post fair-use-like mechanism” (yes, I know, it ain’t the catchiest of names, but I’m an economist, not a marketer!). Here’s how it goes:

AI providers use all the content they want to train their AIs. If it turns out that individual content creators’ commercial interests are harmed, they can force the AI provider to pay them for their lost profits (their profits had the AI not existed, less their current profits).
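The damages rule above is a one-line calculation, sketched here with hypothetical numbers; the zero floor is my assumption (the article does not address creators who end up better off):

```python
def lost_profit_damages(profit_without_ai: float, profit_with_ai: float) -> float:
    """Damages under the ex post mechanism: the creator's profit had the AI
    not existed, less their current profit. Floored at zero on the assumption
    that an unharmed (or better-off) creator has no claim."""
    return max(0.0, profit_without_ai - profit_with_ai)

# Hypothetical creator earning 100 before the AI existed, 70 after:
print(lost_profit_damages(100.0, 70.0))   # 30.0 owed by the AI provider
print(lost_profit_damages(100.0, 120.0))  # 0.0 — no harm, no payment
```

Note that, unlike statutory damages, the payment scales exactly with the measured harm and is zero when there is none.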

This is different from normal copyright protection in that the copyright holder can’t prevent the use of content in AI training, and the damages are not statutory or punitive but just for lost profits.

If this can be done, it restores all content creators’ incentives to what they would be if the AI didn’t exist, creates the best possible world for training AIs, and results in more use of content by consumers. Original content creators are effectively insured against loss, and so long as those losses aren’t so high as to wipe out the AI provider, AI training can occur without friction. Call me crazy, but this all looks like it would be a big win for everyone.

There are, of course, some practical challenges. First, can we really measure lost profits? Probably not perfectly, but the question is whether the worst instances could be identified and compensation paid. Second, smaller creators may still struggle to get their due. Finally, this subverts moral-rights arguments for copyright, but perhaps an opt-out system could be built that preserves those rights. When YouTube had to deal with similar issues, this was the type of thing it did.

In the end, I am optimistic that there are ways forward that don’t lead to the doom and gloom scenarios for either content creators or generative AI providers.

This article has been lightly adapted from “How should we deal with copyright and AI?” Subscribe to Mess and Magic for more of Gans’ musings on innovation and entrepreneurship. 

 


Joshua Gans is a professor of strategic management at the Rotman School of Management.