The Impact of Copyright on Training Generative AI Models: Pathway to Innovation or Tech Giant Monopoly?

This year, lawsuits against artificial intelligence (AI) companies have come hard and fast as authors protest the use of their copyrighted works to train generative AI models. Comedian and author Sarah Silverman is now at the forefront of a lawsuit against OpenAI, both in her individual capacity and as class representative for other novelists who claim they did not grant OpenAI permission to use their copyrighted content to train its models.[1]

Best known for its chat interface, ChatGPT, OpenAI is owned by an assortment of investors, including Microsoft, Reid Hoffman, and Khosla Ventures.[2] Now at the center of multiple legal disputes, OpenAI’s best bet to weather the incoming claims is the doctrine of fair use. Fair use is a legal defense to a claim of copyright infringement based on the copying of a protected work.[3] In general, use of a copyrighted work, including reproduction of the work, is not considered infringement if the purpose is to comment, criticize, report, teach, conduct research, or contribute to scholarship.[4]

The Lawsuit

Plaintiffs in Silverman v. OpenAI filed suit for monetary compensation on the belief that their works were part of the dataset used to train OpenAI’s models in the general rules of human language.[5] Plaintiffs claim that every single ChatGPT output necessarily infringes the authors’ right to “derivative works,” on the theory that AI outputs rely on a dataset that includes Plaintiffs’ books.[6] Plaintiffs’ complaint argues that “every single ChatGPT output—from a simple response to a question (e.g., ‘Yes’), to the name of the President of the United States, to a paragraph describing the plot, themes, and significance of Homer’s The Iliad—is necessarily an infringing ‘derivative work’ of Plaintiff’s books.”[7] Along this line of logic, every single output would also infringe the derivative work rights of every other work included in the training dataset.[8]

Plaintiffs’ suspicion that their books were used to teach OpenAI’s models how to communicate with users is based on their experience with ChatGPT, which was able to generate summaries of their written works that resemble reviews or book reports, identifying the books’ contents and themes.[9] Of the six causes of action asserted, Plaintiffs rely mainly on the right to derivative works, and on the fact that AI training requires copying a work in order to ingest it and learn from it, to assert a claim for relief.[10] OpenAI’s motion to dismiss, set for December 7, 2023, before Judge Martínez-Olguín, relies mainly on the doctrine of fair use to counter these claims.[11]

The argument that every single output by ChatGPT necessarily infringes the derivative work rights of the authors whose work was used to train the model is flawed and not likely to succeed in court. Output by ChatGPT can range from the answer to a math problem to suggestions for a trip itinerary. The fact of the matter is that most of ChatGPT’s answers, given its current capabilities, are simple displays of factual information. As the Supreme Court made clear in Feist v. Rural Telephone Service and cases since then, factual information such as phone numbers or addresses is not subject to copyright protection.[12] Following that logic, the answer to “Did Sarah Silverman write a book?” or “When was Sarah Silverman’s book published and who is the publisher?” is also not copyrightable. The contents of the book, on the other hand, if output by ChatGPT in response to a prompt in a manner substantially similar to the book itself, would most likely amount to copyright infringement. The distinction here is between the output of authored creative expression and the output of factual truths that exist irrespective of creative authorship. Plaintiffs go so far as to claim that because OpenAI’s model was allegedly trained on their work, the simple act of ChatGPT engaging in simulated human reasoning and written communication produces a derivative work.[13] This argument ignores that, to establish infringement of the right to derivative works, substantial similarity to pre-existing copyrighted material must be proven.[14] On this basis, Plaintiffs’ suit is likely to fail with regard to derivative works.

The training process itself seems to provide more solid ground for Plaintiffs in this case, because content ingestion involves intermediate copying, and copying protected works without a license constitutes copyright infringement.[15] However, it is undisputed that the advancement of generative AI carries a profound impact for the future of technology and the role it plays in people’s lives. Plaintiffs do not dispute this; in fact, they acknowledge the scientific advancement that ChatGPT and OpenAI present to society at large.[16] Given generative AI’s revolutionary capabilities, fair use may be able to easily defeat a reproduction claim under 17 U.S.C. § 106(1). OpenAI’s argument for fair use is strong: under recent Supreme Court precedent (Google v. Oracle), creating “wholesale copies of a work as a preliminary step to develop a new, non-infringing product, even if the new product competes with the original,” does not constitute infringement.[17] OpenAI also appears to have support from legislation: when Congress codified fair use in 1976, it instructed courts to adapt the doctrine’s application to “rapid technological change” as new cases arise.[18]

While a decision in this case won’t be available for some time, maybe not until next year, it is clear that technology, and artificial intelligence in particular, is evolving at a rate that outpaces the courts. This decision will have strong implications for the creation of future models as new developers and young coders enter the scene. Meta has already made its AI model open source for developers to freely download and build on.[19] Will this decision mean that only tech giants with the funds to license training data will succeed, or will we see a new tech boom as more and more smaller players enter the scene? We’ll have to wait and see.

Ignacia Vasquez

Ignacia Vasquez is a second-year J.D. candidate at Fordham University School of Law and a staff member of the Intellectual Property, Media & Entertainment Law Journal. She holds a B.B.A. in Business Administration and Marketing with a double major in Gender Studies from the University of Notre Dame.