

Kamini Pandey,

O.P. Jindal Global University


Artificial intelligence is advancing rapidly and influencing many aspects of our lives. As more apps and machines are built with the help of artificial intelligence, producers are training AI on data that is already available on websites. This raises the prime question: is training generative AI an infringement of copyright or not? In this paper we discuss the aspects responsible for infringement of copyright. We also examine the doctrine of fair use, which is central to any discussion of copyright infringement, and the factors that fall under it. Infringement of copyright while training AI is a subject of paramount importance and complexity; it is widely debated, and a number of cases are still pending. We try to offer solutions so that copyright is not infringed while training generative AI. In short, this paper concludes that copyright infringement is taking place on a large scale while training generative AI and sets out possible solutions.

Keywords: Artificial Intelligence, Copyright Infringement, Fair Use


Copyright is a part of intellectual property rights; it is a legal right that protects the owner of a work. Copyright comes into existence as soon as the work is created, and others need the permission of the copyright holder before using the work.

“The sine qua non of copyright is originality”, which means that originality is a precondition of copyright protection. Copyright is therefore important in protecting a person’s original work. Artificial intelligence, on the other hand, combines science and engineering. It is a leading technology of the current era: the simulation of human intelligence in machines programmed by humans, so that they can think, learn, act and solve problems like humans. AI analyses vast amounts of data, which can lead to the intentional or unintentional generation of content that infringes copyright.

The applications of artificial intelligence are wide, so training artificial intelligence on a large scale can lead to copyright infringement; artificial intelligence and copyright law are interconnected. Training AI may violate copyright law because the system can generate work that is substantially similar to the copyright owner’s work. This can be avoided only where the use of the copyrighted work is a fair use, which does not need the permission of the owner.

Recently, however, AI has come to be used on such a massive scale that keeping the use of copyrighted work within fair use is becoming difficult. As a result, there are various cases and issues regarding infringement of copyright while training generative AI.

In this paper we discuss how training generative AI leads to copyright infringement and how violation of copyright by AI can be controlled. We also examine how AI and copyright are interconnected. The paper focuses on exactly how AI infringes copyright, and we refer to various cases in which suits have been filed against AI companies. Finally, we consider how companies developing AI tools can establish strict guidelines to prevent such infringement, along with the legal and ethical discussions needed to address these challenges effectively.


Artificial intelligence (AI) and copyright are topics that are still evolving in legal and ethical terms. For example, if we ask ChatGPT for information about AI, we find similarities with existing material, because the output is a compilation of various articles found on Google or other sources. This raises the question of who the legal creator is: the AI, the creator of the original work, the user, or the developers behind the AI. Creating work using artificial intelligence has important implications for copyright law. When a work is created by AI, there is minimal or no human intervention, and most countries extend copyright only to works created by humans. For copyright protection, a work must meet the criterion of originality; in the case of AI, the question is whether AI possesses originality, since producers training AI rely on existing data and on algorithms created by humans. Hence the prime question: does training generative AI lead to infringement of copyright?

Is Training Generative AI an Infringement of Copyright?

Determining infringement by AI is complex, as we have to consider the roles of AI developers, AI systems and users. Generative AI systems infringe copyright if they produce images or text that are substantially similar to the expressive elements of the works on which they were trained. Copyright’s exclusive rights are limited by fair use and various other doctrines. So the question before us is whether training generative AI falls under the fair use doctrine of copyright law or not. Fair use is a defence to a charge of infringement, and courts consider four factors in deciding whether a use is fair:

  • Purpose and character of the use, including whether the use is commercial or for non-profit purposes.

  • Nature of the copyrighted work.

  • Amount and substantiality of the portion used in relation to the copyrighted work as a whole.

  • Effect of the use on the potential market for the copyrighted work.

In Field v. Google, Google’s copying of web content in order to make it accessible to the public was held to be fair use. Similarly, in A.V. v. iParadigms, the storage of student papers for use in plagiarism detection was held to be fair use, as it served the educational purpose of detecting plagiarism. In both cases, the use of the content by Google and by iParadigms satisfied the four factors, and was therefore a fair use of the copyrighted works. But when it comes to AI, is it a fair use?

Developers of artificial intelligence systems have copied millions of copyrighted works from the internet, relying on the debatable theory of fair use. While training AI, these works are copied multiple times, and the output can resemble the original work. Whether fair use applies to artificial intelligence depends on the circumstances and changes from system to system.

Take the first factor, the purpose and character of the use, which is the argument developers most often make in support of fair use. For this factor we have to analyse whether the AI system is for profit or non-profit, and whether the use of the copyrighted work is transformative. Consider OpenAI tools such as ChatGPT and DALL-E. ChatGPT is trained on a large number of copyrighted texts. The purpose of ChatGPT is educational and it can be used for free, but unlocking new features requires a subscription of $24. Did the training process of ChatGPT, or of any other such tool, violate the copyright of the original authors? The Authors Guild survey reveals that 90% of writers believe that authors should be compensated for the use of their books in training generative AI. If these tools were operating under fair use, there would be no question of 90% of authors asking for compensation; therefore there is infringement of copyright.

Most of the disputes are still pending in the courts and remain debatable, and as the use of technology increases, weighing and finding fair use will become more complex. Courts are still navigating these complexities, which makes it difficult to establish clear guidelines. If content is used for training without authorization, training generative AI can be seen as a violation of copyright law. Here are some factors that give rise to infringement of copyright while training generative AI.

  • Commercial Use:

Using copyrighted content without permission for commercial purposes significantly heightens the risk of legal consequences. Courts tend to scrutinize commercial uses more closely because they involve potential financial gain or may affect the market value of the copyrighted material. This increased scrutiny makes it crucial for businesses and developers to obtain proper licenses or permissions for any copyrighted content used in AI technologies intended for commercial applications. Failure to do so can lead to legal disputes, fines, and damage to a company’s reputation. Legal consultation is essential to ensure compliance with copyright laws when developing AI technologies for commercial purposes. Even non-profit organizations can infringe copyright if they use copyrighted material without permission and outside the fair use exemptions.

  • Use of Copyrighted Data: 

Using copyrighted material in a training dataset without proper authorization or without falling within the bounds of fair use can indeed infringe copyright laws. It's crucial for developers and organizations to be mindful of the sources of their training data and ensure they have the legal right to use the copyrighted material. Failure to do so can lead to legal consequences, including copyright infringement claims and legal actions by the copyright holders. Legal guidance is essential to navigate these complexities effectively.

  • Use Without Explicit Consent:

Using copyrighted works without the explicit permission of the copyright owner constitutes infringement, especially if the intended use is not covered by fair use or other exemptions. Collecting data without consent can also raise legal and ethical concerns regarding privacy and data protection. Many jurisdictions have implemented regulations, such as the EU’s General Data Protection Regulation (GDPR), under which the collection of personal data may require the individual’s explicit consent.

  • Distribution of Copyrighted Output:

Distributing output generated by artificial intelligence without proper authorization can lead to copyright infringement. Whether it is text, images, music, or any other content created by AI, if the output includes copyrighted elements and is distributed or used publicly without the appropriate permissions, it violates the rights of the original copyright holders.

These are some important factors that lead to infringement of copyright while training generative AI. Other factors can also lead to infringement, such as reproduction of expression, derivative works and international considerations. To avoid this infringement of copyright, there are certain steps that AI developers should take.

  • Legal Consultation

AI developers should consult an expert in intellectual property law before developing an AI system. Legal professionals can help developers understand the copyright laws specific to their jurisdiction and can advise on how to avoid infringement while training generative AI. Lawyers can review data usage agreements, advise on lawful data acquisition and privacy regulations, and provide guidance on the fair use doctrine.

  • Compensating Authors 

Compensating authors helps ensure that their works are used lawfully rather than infringed. According to the Authors Guild’s report, 90% of writers believe that authors should be compensated by AI developers for the use of their works in training generative AI.

  • New Regulations

As the production of AI is growing rapidly, states should introduce new laws and regulations. There should be firm legislation that systematically arranges all the laws regarding AI and copyright. Copyright laws already exist, but they should be amended: with the arrival of AI, new regulations and amendments are much needed.

Developers of AI should also take into consideration other measures such as copyright compliance, licensing agreements, fair use assessment, data scrubbing, collaboration with content creators, use of open datasets, seeking permissions and intellectual property protection.


Therefore, yes, training generative AI does infringe copyright. However, this will vary from system to system: some AI systems will fall under fair use and some will not, so infringement depends on the AI and its training process. Developers should avoid such infringement in order to avoid any kind of illegality. There are various factors responsible for infringement of copyright, and there are also solutions to it, as discussed above. As AI is the future of the globe, there will be more focus on resolving issues and cases relating to AI. Copyright, one of the most important intellectual property rights, must be secured while creating and training generative AI.


  1. Copyright and Generative AI: Our Views Today


  3. Artificial Intelligence: What It Is and How It Is Used

  4. AI-Based Modeling: Techniques, Applications and Research Issues Towards Automation, Intelligent and Smart Systems


  6. Artificial intelligence and copyright

  7. India: Legal Implications Of AI-Created Works In India

  8. Survey Reveals 90 Percent of Writers Believe Authors Should Be Compensated 


