Why Training AI Machines with Protected Works Is Not Fair Use

… if the underlying goal of copyright’s exclusive rights and the fair use exception is to promote new “authorship,” this is doctrinally fatal to the proposal that training AIs on volumes of protected works favors a finding of fair use.

Guest blogger David Newhoff lays out the argument against the claim that training AI systems with copyright-protected works is fair use. David is the author of Who Invented Oscar Wilde? The Photograph at the Center of Modern American Copyright (Potomac Books 2020) and is a copyright advocate/writer at The Illusion of More.


As most copyright watchers already know, two lawsuits were filed at the start of the new year against AI visual works companies. In the U.S., a class action was filed by visual artists against DeviantArt, Midjourney, and Stability AI; and in the UK, Getty Images is suing Stability AI. Both cases allege infringing use of large volumes of protected works fed into the systems to “train” the algorithms. Regardless of how these two lawsuits might unfold, I want to address the broad defense, already being argued in the blogosphere, that training generative AIs with volumes of protected works is fair use. I don’t think so.

Copyright advocates, skeptics, and even outright antagonists generally agree that the fair use exception, correctly applied, supports the broad aim of copyright law to promote more creative work. In the language of the Constitution, copyright “promotes the progress of science,” but a more accurate, modern description would be that copyright promotes new “authorship” because we do not tend to describe literature, visual arts, music, etc. as “science.”

The fair use doctrine, codified in the federal statute in 1976, originated as judge-made law, and from the seminal Folsom v. Marsh to the contemporary Andy Warhol Foundation v. Goldsmith, the courts have restated, in one way or another, their responsibility to balance the first author’s exclusive rights with a follow-on author’s interest in creating new expression. And as a matter of general principle, it is held that the public benefits from this balancing act because the result is a more diverse market of creative and cultural works.

Fair use defenses are case-by-case considerations, and while there may be specific instances in which an AI purpose is fair use, there are no blanket exceptions. More broadly, though, if the underlying goal of copyright’s exclusive rights and the fair use exception is to promote new “authorship,” this is doctrinally fatal to the proposal that training AIs on volumes of protected works favors a finding of fair use. Even if a court holds that other limiting doctrines render this activity by certain defendants non-infringing, a fair use defense should be rejected at summary judgment—at least for the current state of the technology, in which the schematic encompassing AI machine, AI developer, and AI user does nothing to promote new “authorship” as a matter of law.

The definition of “author” in U.S. copyright law means “human author,” and there are no exceptions to this anywhere in our history. The mere existence of a work we might describe as “creative” is not evidence of an author/owner of that work unless there is a valid nexus between a human’s vision and the resulting work fixed in a tangible medium. If you find an anonymous work of art on the street, absent further research, it has no legal author who can assert a claim of copyright in the work that would hold up in any court. And this hypothetical emphasizes the point that the legal meaning of “author” is more rigorous than the philosophical view that art without humans is oxymoronic. (Although it is plausible to find authorship in a work that combines human creativity with AI, I address that subject below.)

As a matter of law, the AI machine itself is disqualified as an “author” full stop. And although the AI owner/developer and AI user/customer are presumably both human, neither is defensibly an “author” of the expressions output by the AI. At least with the current state of technologies making headlines, nowhere in the process—from training the AI, to developing the algorithm, to entering prompts into the system—is there an essential link between those contributions and the individual expressions output by the machine. Consequently, nothing about the process of ingesting protected works to develop these systems in the first place can plausibly claim to serve the purpose of promoting new “authorship.”

But What About the Google Books Case?

Indeed. In the fair use defenses AI developers will present, we should expect to see them lean substantially on the holding in Authors Guild v. Google (the Google Books case)—a decision which arguably exceeds the purpose of fair use to promote new authorship. The Second Circuit, while acknowledging that it was pushing the boundaries of fair use, found the Google Books tool to be “transformative” for its novel utility in presenting snippets of books; and because that utility necessitates scanning whole books into its database, a defendant AI developer will presumably want to make the comparison. But a fair use defense applied to training AIs with volumes of protected works should fail, even under the highly utilitarian holding in Google Books.

While people of good intent can debate the legal merits of that decision, the utility of the Google Books search engine does broadly serve the interest of new authorship as a research tool—one I have used many times myself. Google Books provides a new means by which one author may research the works of another, and this is immediately distinguishable from a generative AI that may be trained to “write books” without authors. Thus, not only does the generative AI fail to promote authorship of the individual works output by the system, but it fails to promote authorship in general.

Although the technology is primitive for the moment, these AIs are expected to “learn” exponentially and grow in complexity such that they will presumably compete with or replace at least some human creators in various fields and disciplines. Thus, an enterprise that proposes to diminish the number of working authors, whether intentionally or unintentionally, should only be viewed as devastating to the purpose of copyright law, including the fair use exception.

AI proponents may argue that “democratizing” creativity (i.e., putting these tools in every hand) promotes authorship by making everyone an author. But aside from the cultural vacuum this illusion of more would create, the user prompting the AI bears a high burden to prove authorship, one that depends on what the user contributes relative to the machine. As mentioned above, some AIs may evolve as tools such that the human in some way “collaborates” with the machine to produce a work of authorship. But this hypothetical points to the reason why fair use is a fact-specific, case-by-case consideration. AI Alpha, which creates autonomously or mostly without human direction, should not benefit from the potential fair use defense of AI Beta, a tool designed to aid, but not replace, human creativity.

Broadly Transformative? Don’t Even Go There

Returning to the constitutional purpose of copyright law to “promote science,” the argument has already been floated as a talking point that training AI systems with protected works promotes computer science in general and is, therefore, “transformative” under fair use factor one. But this argument should find no purchase in court. To the extent that one of these neural networks might eventually spawn revolutionary utility in medicine or finance, etc., it would be unsuitable to ask a court to hold that such voyages of general discovery fit the purpose of copyright, to say nothing of the likelihood that the adventure strays inevitably into patent law. Even the most elastic fair use findings to date reject such a broad defense.

It may be shown that no work output by a particular AI infringes (copies) any of the works that went into its training. It may also be determined that the corpus of works fed into an AI is so rapidly atomized into data that even fleeting “reproduction” is found not to exist and, thus, that the §106(1) right is not infringed. Those questions are going to be raised in court before long, and we shall see where they lead. But to presume fair use as a broad defense for AI “training” is existentially offensive to the purpose of copyright, and perhaps to law in general, because it asks the courts to vest rights in non-humans, which is itself anathema to case law in other areas.[1]
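To make “atomized into data” concrete, here is a minimal, purely illustrative sketch in Python of the kind of preprocessing a training pipeline performs before a model ever “sees” a text. The toy tokenizer and vocabulary are invented assumptions for illustration only, not a description of any system at issue in these lawsuits:

```python
# Purely illustrative: a toy tokenizer showing how a hypothetical
# training pipeline reduces a text to bare integers. This is an
# assumption about typical preprocessing, not any actual AI system.

text = "In the beginning was the word."

# Build a made-up vocabulary from the words of the text itself.
vocab = {word: i for i, word in enumerate(sorted(set(text.lower().split())))}

# The "work" as the model encounters it: a stream of integers.
token_ids = [vocab[w] for w in text.lower().split()]

print(token_ids)  # [1, 2, 0, 3, 2, 4]

# After training, what persists is a set of numeric weights nudged by
# many such streams, which is why litigants will dispute whether any
# "reproduction" of the work itself ever occurs or persists.
```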

It is my oft-stated opinion that creative expression without humans is meaningless as a cultural enterprise, but it is a matter of law to say that copyright is meaningless without “authors” and that there is no such thing as non-human “authors.” For this reason, the argument that training AIs on protected works is inherently fair use should be denied with prejudice.


[1] Cetacean Community v. Bush, which held that animals do not have standing in court, was the basis for rejecting PETA’s complaint against photographer Slater for infringing the monkey’s copyright in the “Monkey Selfie” fiasco.


A Thousand Cuts: AI and Self-Destruction

David Newhoff comments on generative AI (artificial intelligence) and public policy.

A guest post written by David Newhoff, author of Who Invented Oscar Wilde? The Photograph at the Center of Modern American Copyright (Potomac Books 2020) and a copyright advocate/writer at The Illusion of More.


I woke up the other day thinking about artificial intelligence (AI) in the context of the Cold War and the nuclear arms race, and curiously enough, the next two articles I read about AI made arms race references. Where my pre-caffeinated mind had gone was back to the early 1980s when, as teenagers, we often asked the futile question of why any nation needed to stockpile nuclear weapons in quantities that could destroy the world many times over.

Every generation of adolescents believes—and at times confirms—that the adults have no idea what the hell they’re doing; and watching the MADness of what often seemed like a rapturous embrace of nuclear annihilation was, perhaps, the unifying existential threat that shaped our generation’s worldview. Since then, reasonable arguments have been made that nuclear stalemate has yielded an unprecedented period of relative global peace, but the underlying question remains: Are we powerless to stop the development of new modes of self-destruction?

Of course, push-button extinction is easy to imagine and, in a way, easy to ignore. If something were to go terribly wrong, and the missiles fly, it’s game over in a matter of minutes with no timeouts left. So, it is possible to “stop worrying” if not quite “love the bomb” (h/t Strangelove); but today’s technological threats preface outcomes that are less merciful than swift obliteration. Instead, they offer a slow and seemingly inexorable decline toward the dystopias of science fiction—a future in which we are not wiped out in a flash but instead “amused to death” (h/t Postman) as we relinquish humanity itself to the exigencies of technologies that serve little or no purpose.

The first essay I read about AI, written by Anja Kaspersen and Wendell Wallach for the Carnegie Council, advocates a “reset” in ethical thinking about AI, arguing that giant technology investments are once again building systems with little consideration for their potential effect on people. “In the current AI discourse we perceive a widespread failure to appreciate why it is so important to champion human dignity. There is risk of creating a world in which meaning and value are stripped from human life,” the authors write. Later, they quote Robert Oppenheimer …

It is not possible to be a scientist unless you believe that the knowledge of the world, and the power which this gives, is a thing which is of intrinsic value to humanity, and that you are using it to help in the spread of knowledge, and are willing to take the consequences.

I have argued repeatedly that generative AI “art” is devoid of meaning and value and that the question posed by these technologies is not merely how they might influence copyright law, but whether they should exist at all. It may seem farfetched to contemplate banning or regulating the development of AI tech, but it should not be viewed as an outlandish proposal. If certain AI developments have the capacity to dramatically alter human existence—perhaps even erode what it means to be human—why is this any less a subject of public policy than regulating a nuclear power plant or food safety?

Of course, public policy means legislators, and it is quixotic to believe that any Congress, let alone the current one, could sensibly address AI before the industry causes havoc. At best, the tech would flood the market long before the most sincere, bipartisan efforts of lawmakers could grasp the issues; and at worst, far too many politicians have shown that they would sooner exploit these technologies for their own gain than seek to regulate them in the public interest. “AI applications are increasingly being developed to track and manipulate humans, whether for commercial, political, or military purposes, by all means available—including deception,” write Kaspersen and Wallach. I think it’s fair to read that as Cambridge Analytica 2.0 and to recognize that the parties who used the Beta version are still around—and many have offices on Capitol Hill.

Kaspersen and Wallach predict that we may soon discover that generative AI will have the same effect on education that “social media has had on truth.” In response, I would ask the following: In the seven years since the destructive power of social media became headline news, have those revelations significantly changed the conversation, let alone muted the cyber-libertarian dogma of the platform owners? I suspect that AI in the classroom threatens to exacerbate rather than parallel the damage done by social media to truth (i.e., reason). If social media has dulled Socratic skills with the flavors of narcissism, ChatGPT promises a future that does not remember what Socratic skills used to mean.

And that brings me to the next article I read, in which Chris Gilliard and Pete Rorabaugh, writing for Slate, use “arms race” as a metaphor to criticize technological responses to the prospect of students cheating with AI systems like ChatGPT. Their article begins:

In the classroom of the future—if there still are any—it’s easy to imagine the endpoint of an arms race: an artificial intelligence that generates the day’s lessons and prompts, a student-deployed A.I. that will surreptitiously do the assignment, and finally, a third-party A.I. that will determine if any of the pupils actually did the work with their own fingers and brain. Loop complete; no humans needed. If you were to take all the hype about ChatGPT at face value, this might feel inevitable. It’s not.

In what I feared might be another tech-apologist piece labeling concern about AI a “moral panic,” Gilliard and Rorabaugh make the opposite point. Their criticism of software solutions to mitigate student cheating is that such responses are small thinking, erroneously accepting as a fait accompli that these AI systems are here to stay whether we like it or not. “Telling us that resistance to a particular technology is futile is a favorite talking point for technologists who release systems with few if any guardrails out into the world and then put the onus on society to address most of the problems that arise,” they write.

In other words, here we go again. The ethical, and perhaps legal, challenges posed by AI are an extension of the same conversation we generally failed to have about social media and its cheery promises to be an engine of democracy. “It’s a failure of imagination to think that we must learn to live with an A.I. writing tool just because it was built,” Gilliard and Rorabaugh argue. I would like to agree but am skeptical that the imagination required to reject certain technologies exists outside the rooms where ethicists gather. And this is why I wake up thinking about AI in the context of the Cold War, except of course that the doctrine of Mutually Assured Destruction was rational by contrast.


Photo by the author.

View the original article on The Illusion of More.

Contact attorney Tom James for copyright help

Need help registering a copyright or a group of copyrights in the United States, or enforcing a copyright in the United States? Contact attorney Tom James.
