AI Copyright - Training & Inputs - AI-Training and Inputs Archives - Cokato Copyright Attorney: The Law Blog of Thomas James

Why Machine Training AI with Protected Works is Not Fair Use

Guest blogger David Newhoff lays out the argument against the claim that training AI systems with copyright-protected works is fair use. David is the author of Who Invented Oscar Wilde? The Photograph at the Center of Modern American Copyright (Potomac Books 2020) and is a copyright advocate/writer at The Illusion of More.

As most copyright watchers already know, two lawsuits were filed at the start of the new year against AI visual works companies. In the U.S., a class action was filed by visual artists against DeviantArt, Midjourney, and Stability AI; and in the UK, Getty Images is suing Stability AI. Both cases allege infringing use of large volumes of protected works fed into the systems to “train” the algorithms. Regardless of how these two lawsuits might unfold, I want to address the broad defense, already being argued in the blogosphere, that training generative AIs with volumes of protected works is fair use. I don’t think so.

Copyright advocates, skeptics, and even outright antagonists generally agree that the fair use exception, correctly applied, supports the broad aim of copyright law to promote more creative work. In the language of the Constitution, copyright “promotes the progress of science,” but a more accurate, modern description would be that copyright promotes new “authorship” because we do not tend to describe literature, visual arts, music, etc. as “science.”

The fair use doctrine, codified in the federal statute in 1976, originated as judge-made law, and from the seminal Folsom v. Marsh to the contemporary Andy Warhol Foundation v. Goldsmith, the courts have restated, in one way or another, their responsibility to balance the first author’s exclusive rights with a follow-on author’s interest in creating new expression. And as a matter of general principle, it is held that the public benefits from this balancing act because the result is a more diverse market of creative and cultural works.

Fair use defenses are case-by-case considerations and while there may be specific instances in which an AI purpose may be fair use, there are no blanket exceptions. More broadly, though, if the underlying goal of copyright’s exclusive rights and the fair use exception is to promote new “authorship,” this is doctrinally fatal to the proposal that training AIs on volumes of protected works favors a finding of fair use. Even if a court holds that other limiting doctrines render this activity by certain defendants to be non-infringing, a fair use defense should be rejected at summary judgment—at least for the current state of the technology, in which the schematic encompassing AI machine, AI developer, and AI user does nothing to promote new “authorship” as a matter of law.

The definition of “author” in U.S. copyright law means “human author,” and there are no exceptions to this anywhere in our history. The mere existence of a work we might describe as “creative” is not evidence of an author/owner of that work unless there is a valid nexus between a human’s vision and the resulting work fixed in a tangible medium. If you find an anonymous work of art on the street, absent further research, it has no legal author who can assert a claim of copyright in the work that would hold up in any court. And this hypothetical emphasizes the point that the legal meaning of “author” is more rigorous than the philosophical view that art without humans is oxymoronic. (Although it is plausible to find authorship in a work that combines human creativity with AI, I address that subject below.)

As a matter of law, the AI machine itself is disqualified as an “author” full stop. And although the AI owner/developer and AI user/customer are presumably both human, neither is defensibly an “author” of the expressions output by the AI. At least with the current state of technologies making headlines, nowhere in the process—from training the AI, to developing the algorithm, to entering prompts into the system—is there an essential link between those contributions and the individual expressions output by the machine. Consequently, nothing about the process of ingesting protected works to develop these systems in the first place can plausibly claim to serve the purpose of promoting new “authorship.”

But What About the Google Books Case?

Indeed. In the fair use defenses AI developers will present, we should expect to see them lean substantially on the holding in Authors Guild v. Google (the Google Books case), a decision which arguably exceeds the purpose of fair use to promote new authorship. The Second Circuit, while acknowledging that it was pushing the boundaries of fair use, found the Google Books tool to be “transformative” for its novel utility in presenting snippets of books; and because that utility necessitates scanning whole books into its database, a defendant AI developer will presumably want to make the comparison. But a fair use defense applied to training AIs with volumes of protected works should fail, even under the highly utilitarian holding in Google Books.

While people of good intent can debate the legal merits of that decision, the utility of the Google Books search engine does broadly serve the interest of new authorship with a useful research tool—one I have used many times myself. Google Books provides a new means by which one author may research the works of another author, and this is immediately distinguishable from the generative AI which may be trained to “write books” without authors. Thus, not only does the generative AI fail to promote authorship of the individual works output by the system, but it fails to promote authorship in general.

Although the technology is primitive for the moment, these AIs are expected to “learn” exponentially and grow in complexity such that AIs will presumably compete with or replace at least some human creators in various fields and disciplines. Thus, an enterprise which proposes to diminish the number of working authors, whether intentionally or unintentionally, should only be viewed as devastating to the purpose of copyright law, including the fair use exception.

AI proponents may argue that “democratizing” creativity (i.e., putting these tools in every hand) promotes authorship by making everyone an author. But aside from the cultural vacuum this illusion of more would create, the user prompting the AI has a high burden to prove authorship, and it would really depend on what he is contributing relative to the AI. As mentioned above, some AIs may evolve as tools such that the human in some way “collaborates” with the machine to produce a work of authorship. But this hypothetical points to the reason why fair use is a fact-specific, case-by-case consideration. AI Alpha, which autonomously creates, or creates mostly without human direction, should not benefit from the potential fair use defense of AI Beta, which produces a tool designed to aid, but not replace, human creativity.

Broadly Transformative? Don’t Even Go There

Returning to the constitutional purpose of copyright law to “promote science,” the argument has already been floated as a talking point that training AI systems with protected works promotes computer science in general and is, therefore, “transformative” under fair use factor one for this reason. But this argument should find no purchase in court. To the extent that one of these neural networks might eventually spawn revolutionary utility in medicine or finance etc., it would be unsuitable to ask a court to hold that such voyages of general discovery fit the purpose of copyright, to say nothing of the likelihood that the adventure strays inevitably into patent law. Even the most elastic fair use findings to date reject such a broad defense.

It may be shown that no work(s) output by a particular AI infringes (copies) any of the works that went into its training. It may also be determined that the corpus of works fed into an AI is so rapidly atomized into data that even fleeting “reproduction” is found not to exist, and, thus, the 106(1) right is not infringed. Those questions are going to be raised in court before long, and we shall see where they lead. But to presume fair use as a broad defense for AI “training” is existentially offensive to the purpose of copyright, and perhaps to law in general, because it asks the courts to vest rights in non-humans, which is itself anathema to caselaw in other areas.[1]

It is my oft-stated opinion that creative expression without humans is meaningless as a cultural enterprise, but it is a matter of law to say that copyright is meaningless without “authors” and that there is no such thing as non-human “authors.” For this reason, the argument that training AIs on protected works is inherently fair use should be denied with prejudice.

*****
n.b.: The Copyright Office has issued New AI Copyright Guidance.

[1] Cetacean Community v. Bush, which held that animals do not have standing in court, was the basis for rejecting PETA’s complaint against photographer Slater for infringing the monkey’s copyright in the “Monkey Selfie” fiasco.

Does AI Infringe Copyright?

A previous blog post addressed the question whether AI-generated creations are protected by copyright. This could be called the “output question” in the artificial intelligence area of copyright law. Another question is whether using copyright-protected works as input for AI generative processes infringes the copyrights in those works. This could be called the “input question.” Both kinds of questions are now before the courts. Minnesota attorney Tom James describes a framework for analyzing the input question.

The Input Question in AI Copyright Law

One of the top three generative-AI issues in copyright law is the question whether AI-generated creations are protected by copyright. This is what I call the “output question.” Another question is whether using copyright-protected works as input for AI generative processes infringes the copyrights in those works. This could be called the “input question.” Both kinds of questions are now before the courts. In this blog post, I describe a framework for analyzing the input question.

The Cases

The Getty Images lawsuit

Getty Images is a stock photograph company. It licenses the right to use the images in its collection to those who wish to use them on their websites or for other purposes. Stability AI is the creator of Stable Diffusion, which is described as a “text-to-image diffusion model capable of generating photo-realistic images given any text input.” In January 2023, Getty Images initiated legal proceedings in the United Kingdom against Stability AI. According to a Getty Images statement, the company claims that Stability AI violated copyrights by using its images and metadata to train AI software without a license.

The independent artists lawsuit

Another lawsuit raising the question whether AI-generated output infringes copyright has been filed in the United States. In Andersen v. Stability AI, a group of visual artists are seeking class action status for claims against Stability AI, Midjourney Inc. and DeviantArt Inc. The artists claim that the companies use their images to train computers “to produce seemingly new images through a mathematical software process.” They describe AI-generated artwork as “collages” made in violation of copyright owners’ exclusive right to create derivative works.

The GitHub Copilot lawsuit

In November 2022, a class action lawsuit was filed in a U.S. federal court against GitHub, Microsoft, and OpenAI. The lawsuit claims the GitHub Copilot and OpenAI Codex coding assistant services use existing code to generate new code. By training their AI systems on open-source programs, the plaintiffs claim, the defendants have infringed the rights of developers who have posted code under open-source licenses that require attribution.

How AI Works

AI, of course, stands for artificial intelligence. Almost all AI techniques involve machine learning. Machine learning, in turn, involves using a computer algorithm to make a machine improve its performance over time, without having to pre-program it with specific instructions. Data is input to enable the machine to do this. For example, to teach a machine to create a work in the style of Vincent van Gogh, many instances of van Gogh’s works would be input. The AI program contains numerous nodes that focus on different aspects of an image. Working together, these nodes will then piece together common elements of a van Gogh painting from the images the machine has been given to analyze. After going through many images of van Gogh paintings, the machine “learns” the features of a typical van Gogh painting. The machine can then generate a new image containing these features.

In the same way, a machine can be programmed to analyze many instances of code and generate new code.
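For readers who like to see the idea in miniature, here is a deliberately simplified sketch in Python. It is a toy under stated assumptions, not a description of how Stable Diffusion, Copilot, or any real system works: each "painting" is reduced to three made-up numeric features, and the names learn_style, generate, and training_set are hypothetical. The program "learns" only the average and spread of each feature across the examples, then produces a new set of features drawn from those statistics.

```python
import random
import statistics


def learn_style(examples):
    """Summarise the examples as one (mean, spread) pair per feature."""
    columns = list(zip(*examples))  # regroup the values feature by feature
    return [(statistics.mean(col), statistics.pstdev(col)) for col in columns]


def generate(style, rng):
    """Draw a new feature vector from the learned per-feature statistics."""
    return [rng.gauss(mean, spread) for mean, spread in style]


# Hypothetical training data: three "paintings", each reduced to three
# invented features (say, swirl density, yellow intensity, brush width).
training_set = [
    [0.80, 0.60, 0.90],
    [0.70, 0.70, 0.80],
    [0.90, 0.50, 1.00],
]

style = learn_style(training_set)
new_work = generate(style, random.Random(0))  # fixed seed for repeatability
print(new_work)
```

In this toy, what the program keeps after "training" is a short list of summary numbers, and the output is a fresh set of values near those averages. Whether real generative systems behave analogously is a factual question the sketch does not answer.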

The input question comes down to this: Does creating or using a program that causes a machine to receive information about the characteristics of a creative work or group of works for the purpose of creating a new work that has the same or similar characteristics infringe the copyright in the creative work(s) that the machine uses in this way?

The Exclusive Rights of Copyright Owners

In the United States, the owner of a copyright in a work has the exclusive rights to:

reproduce (make copies of) it;
distribute copies of it;
publicly perform it;
publicly display it; and
make derivative works based on it.

(17 U.S.C. § 106). A copyright is infringed when a person exercises any of these exclusive rights without the copyright owner’s permission.

Copyright protection extends only to expression, however. Copyright does not protect ideas, facts, processes, methods, systems or principles.

Direct Infringement

Infringement can be either direct or indirect. Direct infringement occurs when somebody directly violates one of the exclusive rights of a copyright owner. Examples would be a musician who performs a copyright-protected song in public without permission, or a cartoonist who creates a comic based on the Batman and Robin characters and stories without permission.

The kind of tool an infringer uses is not of any great moment. A writer who uses word-processing software to write a story that is simply a copy of someone else’s copyright-protected story is no less guilty of infringement merely because the actual typewritten letters were generated using a computer program that directs a machine to reproduce and display typographical characters in the sequence a user selects.

Contributory and Vicarious Infringement

Infringement liability may also arise indirectly. If one person knowingly induces another person to infringe or contributes to the other person’s infringement in some other way, then each of them may be liable for copyright infringement. The person who actually committed the infringing act could be liable for direct infringement. The person who knowingly encouraged, solicited, induced or facilitated the other person’s infringing act(s) could be liable for contributory infringement.

Vicarious infringement occurs when the law holds one person responsible for the conduct of another because of the nature of the legal relationship between them. The employment relationship is the most common example. An employer generally is held responsible for an employee’s conduct, provided the employee’s acts were performed within the course and scope of the employment. Copyright infringement is not an exception to that rule.

Programmer vs. User

Direct infringement liability

Under U.S. law, machines are treated as extensions of the people who set them in motion. A camera, for example, is an extension of the photographer. Any image a person causes a camera to generate by pushing a button on it is considered the creation of the person who pushed the button, not of the person(s) who manufactured the camera, much less of the camera itself. By the same token, a person who uses the controls on a machine to direct it to copy elements of other people’s works should be considered the creator of the new work so created. If using the program entails instructing the machine to create an unauthorized derivative work of copyright-protected images, then it would be the user, not the machine or the software writer, who would be at risk of liability for direct copyright infringement.

Contributory infringement liability

Knowingly providing a device or mechanism to people who use it to infringe copyrights creates a risk of liability for contributory copyright infringement. Under Sony Corp. v. Universal City Studios, however, merely distributing a mechanism that people can use to infringe copyrights is not enough for contributory infringement liability to attach, if the mechanism has substantial uses for which copyright infringement liability does not attach. Arguably, AI has many such uses. For example, it might be used to generate new works from public domain works. Or it might be used to create parodies. (Creating a parody is fair use; it should not result in infringement liability.)

The situation is different if a company goes further and induces, solicits or encourages people to use its mechanism to infringe copyrights. Then it may be at risk of contributory liability. As the United States Supreme Court has said, “one who distributes a device with the object of promoting its use to infringe copyright, as shown by clear expression or other affirmative steps taken to foster infringement, is liable for the resulting acts of infringement by third parties.” Metro-Goldwyn-Mayer Studios Inc. v. Grokster, Ltd., 545 U.S. 913, 919 (2005). (Remember Napster?)

Fair Use

If AI-generated output is found to either directly or indirectly infringe copyright(s), the infringer nevertheless might not be held liable, if the infringement amounts to fair use of the copyrighted work(s) that were used as the input for the AI-generated work(s).

Ever since some rap artists began using snippets of copyright-protected music and sound recordings without permission, courts have embarked on a treacherous expedition to articulate a meaningful dividing line between unauthorized derivative works, on one hand, and unauthorized transformative works, on the other. Although the Copyright Act gives copyright owners the exclusive right to create works based on their copyrighted works (called derivative works), courts have held that an unauthorized derivative work may be fair use if it is “transformative.” This has caused a great deal of uncertainty in the law, particularly since the U.S. Copyright Act expressly defines a derivative work as one that transforms another work. (See 17 U.S.C. § 101: “A ‘derivative work’ is a work based upon one or more preexisting works, . . . or any other form in which a work may be recast, transformed, or adapted.” (emphasis added).)

When interpreting and applying the transformative use branch of Fair Use doctrine, courts have issued conflicting and contradictory decisions. As I wrote in another blog post, the U.S. Supreme Court has recently agreed to hear and decide Andy Warhol Foundation for the Visual Arts v. Goldsmith. It is anticipated that the Court will use this case to attempt to clear up all the confusion around the doctrine. It is also possible the Court might take even more drastic action concerning the whole “transformative use” branch of Fair Use.

Some speculate that the questions the Justices asked during oral arguments in Warhol signal a desire to retreat from the expansion of fair use that the “transformativeness” idea spawned. On the other hand, some of the Court’s recent decisions, such as Google v. Oracle, suggest the Court is not particularly worried about large-scale copyright infringing activity, insofar as Fair Use doctrine is concerned.

Conclusion

To date, it does not appear that there is any direct legal precedent in the United States for classifying the use of mass quantities of works as training tools for AI as “fair use.” It seems, however, that there soon will be precedent on that issue, one way or the other. Several lawsuits raising this generative-AI copyright issue are pending in the courts. In the meantime, generative-AI system users should proceed with caution.
