San Francisco, CA – Tech giant Nvidia is facing a lawsuit in San Francisco federal court from a group of authors who claim that the company used their copyrighted works without permission to train its artificial intelligence (AI) platform, NeMo. Authors Brian Keene, Abdi Nazemian, and Stewart O’Nan allege that their books were part of a dataset of 196,640 books used to train NeMo to simulate written language. The dataset was eventually removed after reports of copyright infringement, but the authors argue that Nvidia has admitted to training NeMo on it and thereby infringed their copyrights.
The proposed class action seeks unspecified damages for people whose copyrighted works were used to train NeMo’s large language models (LLMs) within the last three years. LLMs are instrumental in powering AI tools like NeMo, which Nvidia promotes as a fast and affordable way to adopt generative AI.
The works named in the lawsuit include Keene’s “Ghost Walk,” Nazemian’s “Like a Love Story,” and O’Nan’s “Last Night at the Lobster.” The authors assert that their books were part of a collection called “Books3” within a larger dataset known as “The Pile.” According to the lawsuit, Nvidia’s NeMo Megatron AI models were trained on The Pile, which was hosted on a website called Hugging Face. The Books3 collection was removed in October 2023 due to copyright infringement concerns.
Nvidia, a leading AI chipmaker, has seen its stock surge by nearly 600% since the end of 2022, giving the company a market value of around $2.2 trillion. The company has not commented on the pending litigation.
This lawsuit adds Nvidia to a growing list of companies facing legal battles over the use of copyrighted content to train AI models. In addition to lawsuits filed by writers, the New York Times has sued OpenAI and Microsoft over similar concerns.
While the outcome of this lawsuit remains to be seen, its impact on the broader technology industry and its legal implications for the use of copyrighted material in AI research will be closely watched.