How Does OpenAI's Use of Copyrighted Material Compare to the Google Book Case?

On September 19, the Authors Guild and other plaintiffs filed a class action lawsuit against OpenAI and its related companies, alleging copyright infringement. It will be interesting to see how this case is ultimately decided, given that the Second Circuit, where the OpenAI case was filed, currently treats transformative copying as permissible fair use.

In Authors Guild v. Google, Inc. (the Google case), the Second Circuit reviewed Google's copying of books for the purpose of providing search functionality and displaying the results. The court held that "Google's unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses" because the "purpose of the copying is highly transformative" and "the revelations do not provide a significant market substitute for the protected aspects of the originals." Authors Guild v. Google, Inc., 804 F.3d 202, 229 (2d Cir. 2015).

The plaintiffs allege that ChatGPT can produce unauthorized summaries of the copyrighted text and has previously been able to provide verbatim quotes. Here are some questions/thoughts on what to watch for in the coming litigation.

How does OpenAI store data in the large language model (LLM), and will that use be deemed more or less transformative than in the Google case? In the Google case, the court reasoned that the result of a word search is "different in purpose, character, expression, meaning, and message from the page (and the book) from which it is drawn." My guess is that the OpenAI algorithms are less transformative than Google's search service because the purpose, character, and expression of ChatGPT's output are similar to those of the books themselves, whereas a service that locates a copyrighted book through text search is not.
Do the generative aspects of ChatGPT weigh more heavily in the "significant market substitute" analysis than the search function did in the Google case? Google provided a search and limited-view capability for books, but ChatGPT lets users generate content that could be deemed a weak substitute for the original works.
Is the commercial nature of ChatGPT different from that of the Google case? The Second Circuit noted in the Google case that the commercial nature of a use will not outweigh its transformative purpose absent significant competition with the original. This factor will likely turn on whether ChatGPT's commercial use is seen as competing with the original works.
Will the Supreme Court of the United States (SCOTUS) step in? SCOTUS ultimately declined to hear the Google case, denying certiorari at 578 U.S. 941, 136 S. Ct. 1658 (2016). The public interest in generative AI has been enormous this year, and the economic impact of its legality cannot be overstated. However, SCOTUS usually takes a patient approach to emerging issues, commonly allowing circuit splits to form and the consequences of those splits to play out in the lower courts before setting the law of the land with the benefit of hindsight.

The contents of the datasets OpenAI has used to “train” its LLMs are peculiarly within its knowledge and not publicly disclosed, such that Plaintiffs are unable discern [sic] those contents with perfect accuracy. Plaintiffs make the specific allegations of infringement below based on what is known about OpenAI’s training practices; what is known about the contents, uses, and availability of the pirate book repositories such as LibGen, Bibliotik, and Z-Library; and the results of Plaintiffs’ testing of ChatGPT.

storage.courtlistener.com ...
