Copyright challenges in the training of artificial intelligence (AI) | HÄRTING Rechtsanwälte

The emergence of generative AI models that produce content such as texts and images raises fundamental questions about whether their training methods are permissible under copyright law. The following article sheds light on the legal framework and responsibilities in Switzerland.

31 December 2024

Basic aspects of copyright law

The question of whether copyright-protected works may be used to train artificial intelligence arises because Art. 10 para. 1 CopA grants the rights holder exclusive rights to the work. Answering it requires a look at both the technical means of training AI systems and the legal restrictions.

For an AI to deliver the result the user wants, its developers must first train it. They use training data to define the problem and the desired output, and the model’s parameters are then continuously optimised. Generative AI models use machine learning to learn from extensive data sets, which often include copyrighted material. In the process, the data sets are stored and copies are often created.

This process can in principle constitute a reproduction within the meaning of Art. 10 CopA and therefore requires authorisation, given the rights holder’s exclusive right.

The reproduction right is understood broadly. It covers the uploading and downloading of works as well as the copying of data sets.

The training of AI models can therefore result in the reproduction of copyright-protected content. Where the copy is made with a view to consuming the work, it can be understood as a use within the meaning of Art. 10 CopA.

Without consent, such uses can only be justified by statutory limitations, for example those for scientific research (Art. 24d CopA) or temporary reproductions (Art. 24a CopA). These are discussed below.

Restrictions on the use of a work without consent

Personal use of a work is an exception for which the rights holder’s consent is not required. It is governed by Art. 19 CopA, which permits use of a work in the personal sphere and within a narrow circle.

The wording of the provision alone makes clear that it does not cover the training of AI models.

Another exception is temporary reproduction under Art. 24a CopA, which authorises temporary copies that are technically necessary, remain ephemeral and have no independent economic significance.

However, the training of AI models often requires data to be stored over a long period, and the stored data also has economic value. This can include interim results of AI training or archived older data records. For this reason, this exception does not apply either.

Finally, Art. 24d CopA should be mentioned, which allows the use of works for the purpose of scientific research. This exception is often discussed in connection with the training of AI models.

The first prerequisite is that the copyright-protected works are used for the purpose of scientific research. Universities often rely on this article, but private-sector institutions can do so as well: as long as research is the main objective, an economic benefit may also be pursued.

The second prerequisite is that the reproduction is technically conditioned, meaning that it results from the use of a technical process. AI training generally fulfils these criteria. The final requirement is lawful access to the works. It is argued that this requirement can be met by relying on Art. 39a CopA, the so-called “right to hack”, which can provide lawful access.

On this assumption, the limitation in Art. 24d CopA for the use of copyright-protected content for scientific purposes permits the training of AI models.

The roles of users and those responsible

The developers, as the persons responsible for the AI, train the AI models and are therefore responsible for the legal basis of the training data. This means that they are also responsible for the licensing and origin of the data sets used, and therefore for the copyright-protected works they contain.

The users of the trained AI models usually use them to generate their own content. In doing so, they too must ensure that the content does not infringe any copyrights. Users cannot invoke the lack of transparency of an AI model as a justification, because the rights holder’s copyrights are infringed in this case as well. The works must therefore have been obtained in a lawful manner, as explained above.

Conclusion

Compliance with copyright law is essential for providers and users of AI applications. Providers should obtain clear licences for the training data or carefully check whether a limitation provision applies. Users must ensure that the use of generated content does not infringe the rights of third parties. The development of specific licences for AI training could be a practicable way to reduce legal uncertainty.

Sources

Marmy-Brändli Sandra/Oehri Isabelle: The training of artificial intelligence, sic! 2023, 655 ff.
Copyright and AI: Responsibility of providers and users