Copyright & Intellectual Property Protections with AI
BY MEGHNA SEKARAN
Image Credits: Ascannio on stock.adobe.com.
Generative AI, powered by machine learning models, has become a revolutionary tool across industries, from design to scientific research and healthcare. Its applications range from generating music to creating visual art and programming code. According to recent surveys, almost 90 percent of large businesses worldwide now leverage some form of AI in their day-to-day operations, and a significant portion use generative models such as OpenAI’s GPT-4, Google’s Bard, and Stability AI’s Stable Diffusion in creative processes. As AI becomes more integrated into daily life, individuals and organizations are learning to employ these tools to enhance productivity, creativity, and innovation.
Generative AI models are trained on vast datasets of text, images, music, and more, learning patterns that inform what the AI produces. Companies source this data from a wide range of publicly available and licensed content to introduce variety into training and avoid constraining the pattern-recognition process. During training, the data is fed to the model, allowing it to “learn” the structures, themes, and patterns inherent in the data and to generate outputs that resemble the human-made content it was trained on. However, the training process and data-sourcing methodology raise significant questions about intellectual property (IP) rights, especially when copyrighted works form part of the training dataset. These questions are further complicated by the fact that AI models themselves cannot own intellectual property, leading to ongoing debates over who holds the rights to AI-generated content.
Copyright Law
Intellectual property law includes copyright and patent protection, which play a vital role in safeguarding creators’ rights over their original works, giving creators control over the reproduction, distribution, and adaptation of their content. As it stands, copyright law applies to original works of authorship, including books, music, software, and art in various media. When a human author creates an original work and fixes it (captures it in a sufficiently permanent medium), copyright attaches and the author becomes the legal owner of the work; registration with the Copyright Office formalizes that claim. However, with generative AI now producing works of its own, the question arises: who owns the work produced by an AI model? How should copyright protection be applied? Should legal frameworks evolve beyond permitting only human authorship?
Copyrights for AI-Generated Work
In 2019, Dr. Stephen Thaler, a scientist, inventor, and creator of the AI models DABUS and the Creativity Machine, received international attention. DABUS had developed plans for two inventions, and Thaler submitted them to the U.S. Patent and Trademark Office as well as patent offices in the UK, European Union, South Africa, and Australia. Of these, only South Africa granted the patents; an Australian court initially allowed the applications before that ruling was overturned on appeal, and the remaining offices denied them.
After testing the limits of patent regulations on AI-generated ideas, Thaler turned to copyright for AI work. In June 2022, Thaler sued the U.S. Copyright Office for denying his application to register a work of art created by his AI program, the Creativity Machine. He argued that the Copyright Office’s reasoning ignored traditional property-ownership principles and prior case precedent. The lawsuit followed a trajectory similar to the DABUS cases: a federal district court granted judgment in favor of the Copyright Office, holding that human authorship is a prerequisite for a copyright claim. Because copyrightable work requires human involvement, AI-generated work in which the AI determines the elements of the output cannot be copyrighted, as no human is the primary author. In other countries, such as China, AI-generated works of art are eligible for copyright protection so long as the author demonstrates intellectual investment; Chinese courts recognize a key distinction between content co-created with AI models and content fully generated by AI.
Copyright Infringement Risks
AI poses copyright issues that go beyond authorship. Training AI models often involves large datasets that may contain copyrighted materials, and while some models source content exclusively from publicly available spaces, publicly available does not mean uncopyrighted. This raises the question of whether using such materials without the authors’ explicit permission constitutes copyright infringement. Depending on the model’s purpose, training data may come from publicly available materials, which may or may not have been cleared for use, as well as from privately inputted data. If AI systems use copyrighted works without permission, copyright holders may bring infringement claims, creating complex liability issues: AI developers, businesses using AI tools, and data providers could all be named as stakeholders in the debate over liability. Several infringement claims and cases filed in the past few years have brought these questions to light, yet deciding who bears responsibility when AI models are trained on improperly licensed or unlicensed data remains a gray area.
Under existing regulatory frameworks, a determination of copyright infringement depends on a few key tests and principles. The first and most important is the substantial similarity test, which examines the allegedly infringing work for elements substantially similar to the copyrighted work. For AI-generated content, this means assessing whether the output mirrors copyrighted works closely enough to infringe on the original creator’s rights. The analysis splits into extrinsic and intrinsic tests: the extrinsic test assesses objective similarities between the two works, focusing only on the protectable or unique elements of the copyrighted work and filtering out unprotected aspects, while the intrinsic test is more holistic, comparing the protected elements to the new work based on an “ordinary person’s subjective impression.” This test can make or break an infringement claim, as it sets the strength of the claim from the start. An additional principle is the fair use doctrine, a legal defense that allows certain uses of copyrighted material without permission for purposes like criticism, commentary, or education; whether it applies to AI training is still debated. Open-source licenses are another important factor, as datasets can be subject to specific licenses granting permission for use under specified conditions, and understanding the implications of these licenses is important in determining the legality of AI model training.
Active Cases
Several lawsuits have been filed against OpenAI and other companies, including Stability AI, Bloomberg, and Anthropic. These lawsuits raise claims that copyrighted works, including novels, short stories, and works of art by fiction authors and artists, were used in the companies’ training processes. In the OpenAI lawsuits, numerous defendants have been named beyond OpenAI’s leadership, including representatives of Microsoft, an investor in the company.
In Andersen v. Stability AI, plaintiffs accuse Stability AI of violating the Digital Millennium Copyright Act (DMCA) and other intellectual property rights by using copyrighted materials to train its image-generation model. Plaintiffs allege significant similarities between works produced by Stability AI’s models and the artists’ works, invoking the substantial similarity test. The class action further alleges false endorsement and trade dress claims arising from the use of copyrighted works in training, as well as both direct and induced copyright infringement in the models’ operation, complicating questions of IP rights in generative AI. These claims are based on the creation and functions of Stability AI’s Stable Diffusion and DreamStudio, as well as Midjourney Inc.’s AI tool and other tools built on this model, naming numerous defendants. The case underscores the growing tension between the use of publicly available content in AI models and the rights of the content’s original creators.
In Alter v. OpenAI, plaintiffs claim that OpenAI and Microsoft should be held responsible for using the plaintiffs’ works to train their AI models, ChatGPT and Copilot. This is a consolidated class action encompassing Alter v. OpenAI, Authors Guild v. OpenAI, and Basbanes v. OpenAI. The disputed works include several fiction books and short stories, and the plaintiffs claim that ChatGPT’s outputs infringe their copyrights by improperly summarizing or explicitly reproducing the copyrighted materials. Similar lawsuits continue to be litigated across multiple jurisdictions rather than in a single consolidated proceeding.
Takeaways
As AI continues to redefine innovation, a fundamental truth must be acknowledged: AI systems lack legal personhood and thus cannot be credited as creators in the United States. This creates a gray area regarding who should bear responsibility for a model’s creations: the developers who built the systems, or the users who employ the tools of their own accord. Courts are still grappling with whether AI training processes constitute fair use. Some argue that training on copyrighted works can be justified as transformative use that advances human efficiency and innovation; others contend that using copyrighted materials in training violates authors’ exclusive rights to control the reproduction and distribution of their work.
As businesses continue to integrate generative AI into their operations, the intellectual property risks grow increasingly complex, as the Stability AI lawsuits illustrate. Companies using AI tools need to reconsider licensing agreements, especially when the tools incorporate copyrighted content into their training systems and large language models. Leadership must proactively assess the risks of using AI for idea generation and innovation, remain aware of the liability issues associated with generative AI, and understand who would be held responsible for publishing content based on copyrighted intellectual property. Multinational corporations face a larger challenge, as they must account for varying IP regulations around the world; countries like South Africa and China have adopted significantly different precedents on AI regulation. In China, using AI-generated art and protecting such works could benefit companies, but the same practices cannot be relied upon in the United States or the European Union.
As AI technology continues to advance, the legal frameworks governing technology and innovation must advance with it. Businesses must develop new, adaptable corporate strategies for IP protection that balance the interests of creators, AI developers, and users. Ultimately, how stakeholders navigate these transformations will shape the future of innovation and the structure of accountability in a technology-centered world.
This article was edited by Nidhi Nair and Ryan Balberman.