Court Rules AI Training is Fair Use

Court rules that using copyrighted works to train AI is fair use. Kadrey et al. v. Meta Platforms.

Just days after the first major fair use ruling in a generative-AI case, a second court has determined that using copyrighted works to train AI is fair use. Kadrey et al. v. Meta Platforms, No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025).

The Kadrey v. Meta Platforms Lawsuit

I previously wrote about this lawsuit here and here.

Meta Platforms owns and operates social media services including Facebook, Instagram, and WhatsApp. It is also the developer of a family of large language models (LLMs) called “Llama.” Meta AI, the company’s AI chatbot, is built on Llama.

To train its AI, Meta obtained data from a wide variety of sources. The company initially pursued licensing deals with book publishers, but it turned out that in many cases individual authors owned the copyrights, and, unlike in the music industry, no organization handles collective licensing of rights in book content. Meta then turned to shadow libraries, downloading their databases and using the works without securing licenses. To download them more quickly, Meta used BitTorrent.

Meta trained its AI models to prevent them from “memorizing” and outputting text from the training data, with the result that no more than 50 words and punctuation marks from any given work were reproduced in any given output.

The plaintiffs named in the Complaint are thirteen book authors who have published novels, plays, short stories, memoirs, essays, and nonfiction books. Sarah Silverman, author of The Bedwetter; Junot Diaz, author of The Brief Wondrous Life of Oscar Wao; and Andrew Sean Greer, author of Less, are among the authors named as plaintiffs in the lawsuit. The complaint alleges that Meta downloaded 666 copies of their books without permission and states claims for direct copyright infringement, vicarious copyright infringement, removal of copyright management information in violation of the Digital Millennium Copyright Act (DMCA), and various state law claims. All claims except the ones for direct copyright infringement and violation of the DMCA were dismissed in prior proceedings.

Both sides moved for summary judgment on fair use with respect to the claim that Meta’s use of the copyrighted works to train its AI infringed copyrights. Meta moved for summary judgment on the DMCA claims. Neither side moved for summary judgment on a claim that Meta infringed copyrights by distributing their works (via leeching or seeding).

On June 25, 2025, Judge Chhabria granted Meta’s motion for summary judgment on fair use with respect to AI training; reserved the motion for summary judgment on the DMCA claims for decision in a separate order; and held that the claim of infringing distribution via leeching or seeding “will remain a live issue in the case.”

Judge Chhabria’s Fair Use Analysis

Judge Chhabria analyzed each of the four fair use factors. As is customary, he treated the first (the purpose and character of the use) and fourth (the effect of the use upon the market for the work) factors as the most important of the four.

He disposed of the first factor fairly easily, as Judge Alsup did in Bartz v. Anthropic, finding that the use of copyrighted works to train AI is a transformative use. This finding weighs heavily in favor of fair use. The purpose of Meta’s AI tools is not to generate books for people to read. Indeed, in this case, Meta had installed guardrails to prevent the tools from generating duplicates or near-duplicates of the books on which the AI was trained. Moreover, even though a user could prompt the tools to create a book “in the style of” a specified author, there was no evidence that they could produce an identical work, or a work substantially similar to one on which they had been trained. And writing styles are not copyrightable.

Significantly, the judge held that the use of shadow libraries to obtain unauthorized copies of books does not necessarily destroy a fair use defense. When the ultimate use to be made of a work is transformative, the judge wrote, the downloading of books to further that use is also transformative. This ruling contrasts with those of other judges, who have intimated that using pirated copies of works weighs against, or may even preclude, a finding of fair use.

Unlike some judges, who tend to consider the fair use analysis over and done once transformative use is found, Judge Chhabria recognized that even if the purpose of the use is transformative, its effect on the market for the infringed work still has to be considered.

3 Ways of Proving Adverse Market Effect

The Order lays out three kinds of arguments that may be advanced to establish the adverse effect of an infringing use on the market for the work:

  1. The infringing use creates a market substitute for the work;
  2. The unlicensed use of the work to train AI deprives the copyright owner of a market for licenses to use the work in AI training;
  3. The infringing use dilutes the market for the work by flooding it with competing works.

Market Substitution

In this case, direct market substitution could not be established because Meta had installed guardrails that prevented users from generating copies of works that had been used in the training. Its AI tools were incapable of generating output that could serve as a substitute for the authors’ works.

The Market for AI Licenses

The court refused to recognize the loss of potential profits from licensing the use of a work for AI training purposes as a cognizable harm.

Market Dilution

The argument here would be that the generation of many works that compete in the same market as the original work on which the AI was trained dilutes the market for the original work. Judge Chhabria described this as indirect market substitution.

The copyright owners in this case, however, focused on the first two arguments. They did not present evidence that Meta’s AI tools were capable of generating books; that they do, in fact, generate books; or that the books they generate or could generate compete with the books these authors wrote. There was no evidence of diminished sales of their books.

Market harm cannot be presumed when the AI’s output does not serve as a substitute for the specific books claimed to have been infringed. When the output is transformative, as it was in this case, market substitution is not self-evident.

Judge Chhabria chided the plaintiffs for making only a “half-hearted argument” of a significant threat of market harm. He wrote that they presented “no meaningful evidence on market dilution at all.”

Consequently, he ruled that the fourth fair use factor favored Meta.

Conclusion

The decision in this case is as significant for what the court didn’t do as for what it did. It handed a fair use victory to Meta, but it did not rule out a finding that training AI tools on copyrighted works is not fair use in an appropriate case. The court left open the possibility that a copyright owner with a stronger record might prevail in a different case, and it pointed the way, albeit in dictum: make a strong showing of market dilution.

Such a claim is not far-fetched. See https://www.wired.com/story/scammy-ai-generated-books-flooding-amazon/

AI OK; Piracy Not: Bartz v. Anthropic

Anthropic also acquired infringing copies of works from pirate sites. Judge Alsup ruled that these, and uses made from them, are not fair use.

A federal judge has issued a landmark fair use decision in a generative-AI copyright infringement lawsuit.

In a previous blog post, I wrote about the fair use decision in Thomson Reuters v. ROSS. As I explained there, that case involved a search-and-retrieval AI system, so the holding was not determinative of fair use in the context of generative AI. Now we finally have a decision that addresses fair use in the generative-AI context.

Bartz et al. v. Anthropic PBC

Anthropic is an AI software firm founded by former OpenAI employees. It offers a generative-AI tool called Claude. Like other generative-AI tools, Claude mimics human conversational skills. When a user enters a text prompt, Claude will generate a response that is very much like one a human being might make (except that it is sometimes more knowledgeable). It is able to do this by using large language models (LLMs) that have been trained on millions of books and texts.

Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson are book authors. In August 2024, they sued Anthropic, claiming the company infringed the copyrights in their works. Specifically, they alleged that Anthropic copied their works from pirated and purchased sources, digitized print versions, assembled them into a central library, and used the library to train LLMs, all without permission. Anthropic asserted, among other things, a fair use defense.

Earlier this year, Anthropic filed a motion for summary judgment on the question of fair use.

On June 23, 2025, Judge Alsup issued an Order granting summary judgment in part and denying it in part. It is the first major ruling on fair use in the dozens of generative-AI copyright infringement lawsuits that are currently pending in federal courts.

The Order includes several key rulings.

Books

Digitization

Anthropic acquired both pirated and lawfully purchased printed copies of copyright-protected works and digitized them to create a central e-library. Authors claimed that making digital copies of their works infringed the exclusive right of copyright owners to reproduce their works. (See 17 U.S.C. § 106.)

In the process of scanning print books to create digital versions of them, the print copies were destroyed. Book bindings were stripped so that each individual page could be scanned. The print copies were then discarded. The digital copies were not distributed to others. Under these circumstances, the court ruled that making digital versions of print books is fair use.

The court likened format to a frame around a work, as distinguished from the work itself. As such, a digital version is not a new derivative work. Rather, it is a transformative use of an existing work. So long as the digital version is merely a substitute for a print version a person has lawfully acquired, and so long as the print version is destroyed and the digital version is not further copied or distributed to others, then digitizing a printed work is fair use. This is consistent with the first sale doctrine (17 U.S.C. 109(a)), which gives the purchaser of a copy of a work a right to dispose of that particular copy as the purchaser sees fit.

In short, the mere conversion of a lawfully acquired print book to a digital file to save space and enable searchability is transformative, and so long as the print version is destroyed and the digital version is not further copied or distributed, it is fair use.

AI Training Is Transformative Fair Use

The authors did not contend that Claude generated infringing output. Instead, they argued that copies of their works were used as inputs to train the AI. The Copyright Act, however, does not prohibit or restrict the reading or analysis of copyrighted works. So long as a copy is lawfully purchased, the owner of the purchased copy can read it and think about it as often as he or she wishes.

[I]f someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not.

Order.

Judge Alsup described AI training as “spectacularly” transformative. Id. After considering all four fair use factors, he concluded that training AI on lawfully acquired copyright-protected works (as distinguished from the initial acquisition of copies) is fair use.

Pirating Is Not Fair Use

In addition to lawfully purchasing copies of some works, Anthropic also acquired infringing copies of works from pirate sites. Judge Alsup ruled that these, and uses made from them, are not fair use. The case will now proceed to trial on the issue of damages resulting from the infringement.

Conclusion

Each of these rulings seems, well, sort of obvious. It is nice to have the explanations laid out so clearly in one place, though.

Fair Use Decision in Thomson Reuters v. Ross

A court has handed down the first known ruling (to me, anyway) on “fair use” in the wave of copyright infringement lawsuits against AI companies that are pending in federal courts.

The ruling came in Thomson Reuters v. Ross. Thomson Reuters filed this lawsuit against Ross Intelligence back in 2020, alleging that Ross trained its AI models on Westlaw headnotes to build a competing legal research tool, infringing numerous copyrights in the process. Ross asserted a fair use defense.

In 2023, Thomson Reuters sought summary judgment against Ross on the fair use defense. At that time, Judge Bibas denied the motion. This week, however, the judge reversed himself, knocking out at least a major portion of the fair use defense.

Ross had argued that Westlaw headnotes are not sufficiently original to warrant copyright protection and that, even if they are, the use made of them was fair use. After painstakingly reviewing the headnotes and comparing them with the database materials, Judge Bibas concluded that 2,243 headnotes were sufficiently original to receive copyright protection, that Ross infringed them, and that fair use was not a defense in this instance because the purpose of the use was commercial and the product competed in the same market as Westlaw. Because of that, it was likely to have an adverse impact on the market for Westlaw.

While this might seem to spell the end for AI companies in the many other lawsuits where they are relying on a “fair use” defense, that is not necessarily so. As Judge Bibas noted, the Ross AI was non-generative. Generative AI tools may be distinguishable in the fair use analysis.

I will be presenting a program on Recent Developments in AI Law in New Jersey this summer. This one certainly will merit mention. Whether any more major developments will come to pass between now and then remains to be seen.

New AI Copyright Infringement Lawsuit

Another copyright and trademark infringement lawsuit against an AI company was filed this week. This one pits news article publishers Advance Local Media, Condé Nast, The Atlantic, Forbes Media, The Guardian, Business Insider, LA Times, McClatchy Media Company, Newsday, Plain Dealer Publishing Company, POLITICO, The Republican Company, Toronto Star Newspapers, and Vox Media against AI company Cohere.

The complaint alleges that Cohere made unauthorized use of publisher content in developing and operating its generative AI systems, infringing numerous copyrights and trademarks. The plaintiffs are seeking an injunction and monetary damages.

More copyright stories here.

Case Update: Andersen v. Stability AI

Artists Sarah Andersen, Kelly McKernan, and Karla Ortiz filed a class action lawsuit against Stability AI, DeviantArt, and MidJourney in federal district court alleging causes of action for copyright infringement, removal or alteration of copyright management information, and violation of publicity rights. (Andersen, et al. v. Stability AI Ltd. et al., No. 23-cv-00201-WHO (N.D. Calif. 2023).) The claims relate to the defendants’ alleged unlicensed use of their copyright-protected artistic works in generative-AI systems.

On October 30, 2023, U.S. district judge William H. Orrick dismissed all claims except for Andersen’s direct infringement claim against Stability. Most of the dismissals, however, were granted with leave to amend.

The Claims

McKernan’s and Ortiz’s copyright infringement claims

The judge dismissed McKernan’s and Ortiz’s copyright infringement claims because they did not register the copyrights in their works with the U.S. Copyright Office.

I criticized the U.S. requirement of registration as a prerequisite to the enforcement of a domestic copyright in a U.S. court in a 2019 Illinois Law Review article (“Copyright Enforcement: Time to Abolish the Pre-Litigation Registration Requirement”). This is a uniquely American requirement. Moreover, it does not apply to foreign works. This has resulted in the anomaly that foreign authors have an easier time enforcing the copyrights in their works in the United States than U.S. authors do. Nevertheless, until Congress acts to change this, it is still necessary for U.S. authors to register their copyrights with the U.S. Copyright Office before they can enforce them in U.S. courts.

Since there was no claim that McKernan or Ortiz had registered their copyrights, the judge had no real choice under current U.S. copyright law but to dismiss their claims.

Andersen’s copyright infringement claim against Stability

Andersen’s complaint alleges that she “owns a copyright interest in over two hundred Works included in the Training Data” and that Stability used some of them as training data. Defendants moved to dismiss this claim because it failed to specifically identify which of those works had been registered. The judge, however, determined that her attestation that some of her registered works had been used as training images sufficed for pleading purposes. A motion to dismiss tests the sufficiency of a complaint to state a claim; it does not test the truth or falsity of the assertions made in a pleading. Stability can attempt to disprove the assertion later in the proceeding. Accordingly, Judge Orrick denied Stability’s motion to dismiss Andersen’s direct copyright infringement claim.

Andersen’s copyright infringement claims against DeviantArt and MidJourney

The complaint alleges that Stability created and released a software program called Stable Diffusion and that it downloaded copies of billions of copyrighted images (known as “training images”), without permission, to create it. Stability allegedly used the services of LAION (Large-scale Artificial Intelligence Open Network) to scrape the images from the Internet. Further, the complaint alleges, Stable Diffusion is a “software library” providing image-generating services to the other defendants named in the complaint. DeviantArt offers an online platform where artists can upload their works. In 2022, it released a product called “DreamUp” that relies on Stable Diffusion to produce images. The complaint alleges that artwork the plaintiffs uploaded to the DeviantArt site was scraped into the LAION database and then used to train Stable Diffusion. MidJourney is also alleged to have used the Stable Diffusion library.

Judge Orrick granted the motion to dismiss the claims of direct infringement against DeviantArt and MidJourney, with leave to amend the complaint to clarify the theory of liability.

DMCA claims

The complaint makes allegations about unlawful removal of copyright management information in violation of the Digital Millennium Copyright Act (DMCA). Judge Orrick found the complaint deficient in this respect, but granted leave to amend to clarify which defendant(s) are alleged to have done this, when it allegedly occurred, and what specific copyright management information was allegedly removed.

Publicity rights claims

Plaintiffs allege that the defendants used their names in their products by allowing users to request the generation of artwork “in the style of” each artist. Judge Orrick determined the complaint did not plead sufficient factual allegations to state a claim. Accordingly, he dismissed the claim, with leave to amend. In a footnote, the court deferred to a later time the question whether the Copyright Act preempts the publicity claims.

In addition, DeviantArt filed a motion to strike under California’s Anti-SLAPP statute. The court deferred decision on that motion until after the Plaintiffs have had time to file an amended complaint.

Unfair competition claims

The court also dismissed plaintiffs’ claims of unfair competition, with leave to amend.

Breach of contract claim against DeviantArt

Plaintiffs allege that DeviantArt violated its own Terms of Service in connection with its DreamUp product and the alleged scraping of works users upload to the site. This claim, too, was dismissed with leave to amend.

Conclusion

Media reports have tended to overstate the significance of Judge Orrick’s October 30, 2023 Order. Reports of the death of the lawsuit are greatly exaggerated. It would have been nice if greater attention had been paid to the registration requirement during the drafting of the complaint, but the lawsuit nevertheless is still very much alive. We won’t really know whether it will remain that way unless and until the plaintiffs amend the complaint, which they are almost certainly going to do.

Need help with copyright registration? Contact attorney Tom James.

Another AI lawsuit against Microsoft and OpenAI

A.T. v. OpenAI LP, U.S. District Court for the Northern District of California

Last June, Microsoft, OpenAI and others were hit with a class action lawsuit involving their AI data-scraping technologies. On Tuesday (September 5, 2023) another class action lawsuit was filed against them. The gravamen of both of these complaints is that these companies allegedly trained their AI technologies using personal information from millions of users, in violation of federal and state privacy statutes and other laws.

Among the laws alleged to have been violated are the Electronic Communications Privacy Act, the Computer Fraud and Abuse Act, the California Invasion of Privacy Act, California’s unfair competition law, Illinois’s Biometric Information Privacy Act, and the Illinois Consumer Fraud and Deceptive Business Practices Act. The lawsuits also allege a variety of common law claims, including negligence, invasion of privacy, conversion, unjust enrichment, breach of the duty to warn, and such.

This is just the most recent lawsuit in a growing body of claims against big AI. Many involve allegations of copyright infringement, but privacy is a growing concern. This particular suit is asking for an award of monetary damages and an order that would require the companies to implement safeguards for the protection of private data.

Microsoft reportedly has invested billions of dollars in OpenAI and its app, ChatGPT.

The case is A.T. v. OpenAI LP, U.S. District Court for the Northern District of California, No. 3:23-cv-04557 (September 5, 2023).

Is Microsoft “too big to fail” in court? We shall see.

Generative-AI: The Top 12 Lawsuits

Artificial intelligence (“AI”) is generating more than content; it is generating lawsuits. Here is a brief chronology of what I believe are the most significant lawsuits that have been filed so far.

Most of these allege copyright infringement, but some make additional or other kinds of claims, such as trademark, privacy or publicity right violations, defamation, unfair competition, and breach of contract, among others. So far, the suits primarily target the developers and purveyors of generative AI chatbots and similar technology. They focus more on what I call “input” than on “output” copyright infringement. That is to say, they allege that copyright infringement is involved in the way particular AI tools are trained.

Thomson Reuters Enterprise Centre GmbH et al. v. ROSS Intelligence (May, 2020)

Thomson Reuters Enterprise Centre GmbH et al. v. ROSS Intelligence Inc., No. 20-cv-613 (D. Del. 2020)

Thomson Reuters alleges that ROSS Intelligence copied its Westlaw database without permission and used it to train a competing AI-driven legal research platform. In defense, ROSS has asserted that it only copied ideas and facts from the Westlaw database of legal research materials. (Facts and ideas are not protected by copyright.) ROSS also argues that its use of content in the Westlaw database is fair use.

One difference between this case and subsequent generative-AI copyright infringement cases is that the defendant in this case is alleged to have induced a third party with a Westlaw license to obtain allegedly proprietary content for the defendant after the defendant had been denied a license of its own. Other cases involve generative AI technologies that operate by scraping publicly available content.

The parties filed cross-motions for summary judgment. While those motions were pending, the U.S. Supreme Court issued its decision in Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. ___, 143 S. Ct. 1258 (2023). The parties have now filed supplemental briefs asserting competing arguments about whether and how the Court’s treatment of transformative use in that case should be interpreted and applied in this case. A decision on the motions is expected soon.

Doe 1 et al. v. GitHub et al. (November, 2022)

Doe 1 et al. v. GitHub, Inc. et al., No. 22-cv-06823 (N.D. Calif. November 3, 2022)

This is a class action lawsuit against GitHub, Microsoft, and OpenAI that was filed in November, 2022. It involves GitHub’s Copilot, an AI-powered tool that suggests lines of programming code based on what a programmer has written. The complaint alleges that Copilot copies code from publicly available software repositories without complying with the terms of applicable open-source licenses. The complaint also alleges removal of copyright management information in violation of 17 U.S.C. § 1202, unfair competition, and other tort claims.

Andersen et al. v. Stability AI et al. (January 13, 2023)

Andersen et al. v. Stability AI et al., No. 23-cv-00201 (N.D. Calif. Jan. 13, 2023)

Artists Sarah Andersen, Kelly McKernan, and Karla Ortiz filed this class action lawsuit against generative-AI companies Stability AI, Midjourney, and DeviantArt on January 13, 2023. The lawsuit alleges that the defendants infringed their copyrights by using their artwork without permission to train AI-powered image generators to create allegedly infringing derivative works. The lawsuit also alleges violations of 17 U.S.C. § 1202 and publicity rights, breach of contract, and unfair competition.

Getty Images v. Stability AI (February 3, 2023)

Getty Images v. Stability AI, No. 23-cv-00135-UNA (D. Del. February 3, 2023)

Getty Images has filed two lawsuits against Stability AI, one in the United Kingdom and one in the United States, each alleging both input and output copyright infringement. Getty Images owns the rights to millions of images. It is in the business of licensing rights to use copies of the images to others. The lawsuit also accuses Stability AI of falsifying, removing or altering copyright management information, trademark infringement, trademark dilution, unfair competition, and deceptive trade practices.

Stability AI has moved to dismiss the complaint filed in the U.S. for lack of jurisdiction.

Flora et al. v. Prisma Labs (February 15, 2023)

Flora et al. v. Prisma Labs, Inc., No. 23-cv-00680 (N.D. Calif. February 15, 2023)

Jack Flora and others filed a class action lawsuit against Prisma Labs for invasion of privacy. The complaint alleges, among other things, that the defendant’s Lensa app generates sexualized images from images of fully-clothed people, and that the company failed to notify users about the biometric data it collects and how it will be stored and/or destroyed, in violation of Illinois’s data privacy laws.

Young v. NeoCortext (April 3, 2023)

Young v. NeoCortext, Inc., 2023-cv-02496 (C.D. Calif. April 3, 2023)

This is a publicity rights case. NeoCortext’s Reface app allows users to paste images of their own faces over those of celebrities in photographs and videos. Kyland Young, a former cast member of the Big Brother reality television show, has sued NeoCortext for allegedly violating his publicity rights. The complaint alleges that NeoCortext has “commercially exploit[ed] his and thousands of other actors, musicians, athletes, celebrities, and other well-known individuals’ names, voices, photographs, or likenesses to sell paid subscriptions to its smartphone application, Reface, without their permission.”

NeoCortext has asserted a First Amendment defense, among others.

Walters v. OpenAI (June 5, 2023)

Walters v. OpenAI, LLC, No. 2023-cv-03122 (N.D. Ga. July 14, 2023) (Complaint originally filed in Gwinnett County, Georgia Superior Court on June 5, 2023; subsequently removed to federal court)

This is a defamation action against OpenAI, the company responsible for ChatGPT. The lawsuit was brought by Mark Walters. He alleges that ChatGPT provided false and defamatory misinformation about him to journalist Fred Riehl in connection with a federal civil rights lawsuit against Washington Attorney General Bob Ferguson and members of his staff. ChatGPT allegedly stated that the lawsuit was one for fraud and embezzlement on the part of Mr. Walters. The complaint alleges that Mr. Walters was “neither a plaintiff nor a defendant in the lawsuit,” and that “every statement of fact” pertaining to him in the summary of the federal lawsuit that ChatGPT prepared is false. A New York court recently addressed the question of sanctions for attorneys who submit briefs containing citations to non-existent “precedents” that were entirely made up by ChatGPT. This is the first case to address tort liability for ChatGPT’s notorious creation of “hallucinatory facts.”

In July, 2023, Jeffery Battle filed a complaint against Microsoft in Maryland alleging that he, too, has been defamed as a result of AI-generated “hallucinatory facts.”

P.M. et al. v. OpenAI et al. (June 28, 2023)

P.M. et al. v. OpenAI LP et al., No. 2023-cv-03199 (N.D. Calif. June 28, 2023)

This lawsuit has been brought by underage individuals against OpenAI and Microsoft. The complaint alleges the defendants’ generative-AI products ChatGPT, Dall-E and Vall-E collect private and personally identifiable information from children without their knowledge or informed consent. The complaint sets out claims for alleged violations of the Electronic Communications Privacy Act; the Computer Fraud and Abuse Act; California’s Invasion of Privacy Act and unfair competition law; Illinois’s Biometric Information Privacy Act and Consumer Fraud and Deceptive Business Practices Act; New York General Business Law § 349 (deceptive trade practices); and negligence, invasion of privacy, conversion, unjust enrichment, and breach of duty to warn.

Tremblay v. OpenAI (June 28, 2023)

Tremblay v. OpenAI, Inc., No. 23-cv-03223 (N.D. Calif. June 28, 2023)

This is another copyright infringement lawsuit against OpenAI relating to its ChatGPT tool. In this one, authors allege that ChatGPT was trained on the text of books they and other proposed class members authored, and that it facilitates output copyright infringement. The complaint sets forth claims of copyright infringement, DMCA violations, and unfair competition.

Silverman et al. v. OpenAI (July 7, 2023)

Silverman et al. v. OpenAI, No. 23-cv-03416 (N.D. Calif. July 7, 2023)

Sarah Silverman (comedian/actress/writer) and others allege that OpenAI, by using copyright-protected works without permission to train ChatGPT, committed direct and vicarious copyright infringement, violated 17 U.S.C. § 1202(b), and violated their rights under unfair competition, negligence, and unjust enrichment law.

Kadrey et al. v. Meta Platforms (July 7, 2023)

Kadrey et al. v. Meta Platforms, No. 3:23-cv-03417 (N.D. Cal. July 7, 2023)

The complaint makes the same kinds of allegations as Silverman v. OpenAI, but this time against Meta Platforms, Inc.

J.L. et al. v. Alphabet (July 11, 2023)

J.L. et al. v. Alphabet, Inc. et al. (N.D. Cal. July 11, 2023)

This is a lawsuit against Google and its owner Alphabet, Inc. for allegedly scraping and harvesting private and personal user information, copyright-protected works, and emails, without notice or consent. The complaint alleges claims for invasion of privacy, unfair competition, negligence, copyright infringement, and other causes of action.

On the Regulatory Front

The U.S. Copyright Office is examining the problems associated with registering copyrights in works that rely, in whole or in part, on artificial intelligence. The U.S. Federal Trade Commission (FTC) has suggested that generative AI implicates “competition concerns.” Lawmakers in the United States and the European Union are considering legislation to regulate AI in various ways.

Why Training AI with Protected Works is Not Fair Use

… if the underlying goal of copyright’s exclusive rights and the fair use exception is to promote new “authorship,” this is doctrinally fatal to the proposal that training AIs on volumes of protected works favors a finding of fair use.

Guest blogger David Newhoff lays out the argument against the claim that training AI systems with copyright-protected works is fair use. David is the author of Who Invented Oscar Wilde? The Photograph at the Center of Modern American Copyright (Potomac Books 2020) and is a copyright advocate/writer at The Illusion of More.


As most copyright watchers already know, two lawsuits were filed at the start of the new year against AI visual works companies. In the U.S., a class action was filed by visual artists against DeviantArt, Midjourney, and Stability AI; and in the UK, Getty Images is suing Stability AI. Both cases allege infringing use of large volumes of protected works fed into the systems to “train” the algorithms. Regardless of how these two lawsuits might unfold, I want to address the broad defense, already being argued in the blogosphere, that training generative AIs with volumes of protected works is fair use. I don’t think so.

Copyright advocates, skeptics, and even outright antagonists generally agree that the fair use exception, correctly applied, supports the broad aim of copyright law to promote more creative work. In the language of the Constitution, copyright “promotes the progress of science,” but a more accurate, modern description would be that copyright promotes new “authorship” because we do not tend to describe literature, visual arts, music, etc. as “science.”

The fair use doctrine, codified in the federal statute in 1976, originated as judge-made law, and from the seminal Folsom v. Marsh to the contemporary Andy Warhol Foundation v. Goldsmith, the courts have restated, in one way or another, their responsibility to balance the first author’s exclusive rights with a follow-on author’s interest in creating new expression. And as a matter of general principle, it is held that the public benefits from this balancing act because the result is a more diverse market of creative and cultural works.

Fair use defenses are case-by-case considerations, and while there may be specific instances in which an AI purpose may be fair use, there are no blanket exceptions. More broadly, though, if the underlying goal of copyright’s exclusive rights and the fair use exception is to promote new “authorship,” this is doctrinally fatal to the proposal that training AIs on volumes of protected works favors a finding of fair use. Even if a court holds that other limiting doctrines render this activity by certain defendants to be non-infringing, a fair use defense should be rejected at summary judgment—at least for the current state of the technology, in which the schematic encompassing AI machine, AI developer, and AI user does nothing to promote new “authorship” as a matter of law.

The definition of “author” in U.S. copyright law means “human author,” and there are no exceptions to this anywhere in our history. The mere existence of a work we might describe as “creative” is not evidence of an author/owner of that work unless there is a valid nexus between a human’s vision and the resulting work fixed in a tangible medium. If you find an anonymous work of art on the street, absent further research, it has no legal author who can assert a claim of copyright in the work that would hold up in any court. And this hypothetical emphasizes the point that the legal meaning of “author” is more rigorous than the philosophical view that art without humans is oxymoronic. (Although it is plausible to find authorship in a work that combines human creativity with AI, I address that subject below.)

As a matter of law, the AI machine itself is disqualified as an “author” full stop. And although the AI owner/developer and AI user/customer are presumably both human, neither is defensibly an “author” of the expressions output by the AI. At least with the current state of technologies making headlines, nowhere in the process—from training the AI, to developing the algorithm, to entering prompts into the system—is there an essential link between those contributions and the individual expressions output by the machine. Consequently, nothing about the process of ingesting protected works to develop these systems in the first place can plausibly claim to serve the purpose of promoting new “authorship.”

But What About the Google Books Case?

Indeed. In the fair use defenses AI developers will present, we should expect to see them lean substantially on the holding in Authors Guild v. Google Books—a decision which arguably exceeds the purpose of fair use to promote new authorship. The Second Circuit, while acknowledging that it was pushing the boundaries of fair use, found the Google Books tool to be “transformative” for its novel utility in presenting snippets of books; and because that utility necessitates scanning whole books into its database, a defendant AI developer will presumably want to make the comparison. But a fair use defense applied to training AIs with volumes of protected works should fail, even under the highly utilitarian holding in Google Books.

While people of good intent can debate the legal merits of that decision, the utility of the Google Books search engine does broadly serve the interest of new authorship with a useful research tool—one I have used many times myself. Google Books provides a new means by which one author may research the works of another author, and this is immediately distinguishable from the generative AI which may be trained to “write books” without authors. Thus, not only does the generative AI fail to promote authorship of the individual works output by the system, but it fails to promote authorship in general.

Although the technology is primitive for the moment, these AIs are expected to “learn” exponentially and grow in complexity such that AIs will presumably compete with or replace at least some human creators in various fields and disciplines. Thus, an enterprise which proposes to diminish the number of working authors, whether intentionally or unintentionally, should only be viewed as devastating to the purpose of copyright law, including the fair use exception.

AI proponents may argue that “democratizing” creativity (i.e., putting these tools in every hand) promotes authorship by making everyone an author. But aside from the cultural vacuum this illusion of more would create, the user prompting the AI has a high burden to prove authorship, and it would really depend on what he is contributing relative to the AI. As mentioned above, some AIs may evolve as tools such that the human in some way “collaborates” with the machine to produce a work of authorship. But this hypothetical points to the reason why fair use is a fact-specific, case-by-case consideration. AI Alpha, which autonomously creates, or creates mostly without human direction, should not benefit from the potential fair use defense of AI Beta, which produces a tool designed to aid, but not replace, human creativity.

Broadly Transformative? Don’t Even Go There

Returning to the constitutional purpose of copyright law to “promote science,” the argument has already been floated as a talking point that training AI systems with protected works promotes computer science in general and is, therefore, “transformative” under fair use factor one for this reason. But this argument should find no purchase in court. To the extent that one of these neural networks might eventually spawn revolutionary utility in medicine or finance, etc., it would be unsuitable to ask a court to hold that such voyages of general discovery fit the purpose of copyright, to say nothing of the likelihood that the adventure strays inevitably into patent law. Even the most elastic fair use findings to date reject such a broad defense.

It may be shown that no work(s) output by a particular AI infringes (copies) any of the works that went into its training. It may also be determined that the corpus of works fed into an AI is so rapidly atomized into data that even fleeting “reproduction” is found not to exist, and, thus, the § 106(1) right is not infringed. Those questions are going to be raised in court before long, and we shall see where they lead. But to presume fair use as a broad defense for AI “training” is existentially offensive to the purpose of copyright, and perhaps to law in general, because it asks the courts to vest rights in non-humans, which is itself anathema to caselaw in other areas.[1]

It is my oft-stated opinion that creative expression without humans is meaningless as a cultural enterprise, but it is a matter of law to say that copyright is meaningless without “authors” and that there is no such thing as non-human “authors.” For this reason, the argument that training AIs on protected works is inherently fair use should be denied with prejudice.


[1] Cetacean Community v. Bush, holding that animals do not have standing in court, was the basis for rejecting PETA’s complaint against photographer Slater for allegedly infringing the monkey’s copyright in the “Monkey Selfie” fiasco.