Determining the legal qualification of AI system providers
AI system providers who intend to create training datasets with personal data must determine their qualification under the GDPR: they may be qualified as controllers, joint controllers or processors.
Several operators may intervene in the development of an AI system, with varying degrees of involvement in the processing of personal data. In particular, there are:
- AI system providers developing or delegating the development of a system and placing it on the market or putting it into service under their own name or brand, against payment or for free.
- importers, distributors and users of these systems (the users being those who deploy AI systems).
The qualification of the operators involved in each processing, within the meaning of the GDPR, must be analysed on a case-by-case basis.
The controller
The principle
The controller is the natural or legal person who determines the purposes and means of the processing, i.e. who decides on the “why” and “how” of the use of personal data.
The essential means of processing are those which are closely related to the purpose and scope of the processing, such as the type of personal data collected, the hardware and software used for the processing as well as their security, the duration of the processing, the categories of recipients and the categories of data subjects.
In practice
Several indicators can help determine, on a case-by-case basis, who the controller is.
A provider that initiates the development of an AI system and creates the training dataset from data it has selected on its own account may be qualified as a controller.
The same applies to a provider that entrusts the creation of such a dataset to a service provider through sufficiently detailed documented instructions (see the role of the processor below).
It should be noted that in some cases a provider will turn to a third party that has already created a dataset as a controller (on its own initiative). It will then be necessary to identify the processing for which the provider is responsible, such as the re-use, on its own account, of a dataset that has already been created.
Examples of controllers:
- A video streaming platform wants to develop a recommendation AI system. For this purpose, it reuses a dataset of its customers that was originally collected for the purpose of providing the service.
The streaming platform that creates the training dataset is responsible for this new processing, since it has decided on the purpose (training a recommendation AI system) and the essential means of processing (i.e. the dataset it has already collected for another purpose).
- The provider of a conversational agent that trains its large language model (LLM) on data publicly available on the Internet is the controller for the re-use of that publicly available personal data. Indeed, the provider decides both the purpose (offering a conversational agent) and the essential means of processing (selecting the data to be re-used).
- A provider develops an AI system based on a model that was pre-trained on personal data. The provider intends to retrain or adjust the model (through fine-tuning or transfer learning) with a dataset that it has set up on its own initiative. In such a case, that provider will have to be classified as a controller, provided that it pursues a purpose of its own and itself determines the essential means of the processing.
Reuse of data collected by another organisation
When the provider trains its AI system with data collected by another entity, it is necessary to distinguish:
- the data diffuser: the natural or legal person, public or private, who uploads online personal data or a dataset that contains personal data;
- the re-user of the data: the natural or legal person, public or private, who processes such data or datasets with the intention of using them on its own account.
The diffuser and the re-user of the data are, in principle, responsible for separate processing, since each determines the objectives and the essential means of its own processing.
The data diffuser is, in principle, responsible for the public dissemination, while the provider of the AI system that re-uses the data is responsible for its own use of that data. The diffuser is not, in principle, responsible for the re-use of its data. It may, however, lay down conditions for the use of the disseminated data in order to limit its re-use or to provide for certain safeguards.
Example:
An administration makes real estate data publicly available and freely reusable (open data). A company wants to reuse this data to create a training dataset in order to develop an AI system able to predict certain real estate developments in a given area. The diffuser and the re-user are then responsible for separate processing, provided that the two processing operations are independent.
Find out more: Sheet 1 of the guide on the opening and reuse of publicly accessible data.
Joint controllers
The principle
When two or more controllers jointly determine the purposes and means of processing, they are joint controllers.
This qualification can be difficult to establish when several stakeholders influence the determination of the purposes and means of the processing. In particular, stakeholders need to determine whether they process the data for their own, distinct purposes or for a common purpose.
In practice
When the training dataset of an AI system is fed by more than one controller for a jointly defined purpose, the controllers may be qualified as joint controllers.
Case 1: academic hospitals developing an AI system for the analysis of medical images choose to use the same federated learning protocol. This protocol allows them to use data for which they are initially separate controllers, in order to benefit from pooling it.
In case of joint control, the parties must ensure the lawfulness of the processing (i.e. its compliance with the law), including by defining in a transparent manner their respective obligations under an agreement. The form of this agreement is not specified by the GDPR. The agreement must reflect the roles of each of the stakeholders, with joint controllers having to clearly specify “who does what” to ensure the protection of the data processed.
Please note: regardless of the terms of the agreement, the data subject may exercise his or her rights vis-à-vis each of the joint controllers.
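To illustrate the federated learning arrangement mentioned in Case 1, here is a minimal sketch assuming a simple federated averaging scheme (the text above does not specify which protocol the hospitals use). Each site computes a model update on its own records, and only the resulting parameter is pooled. The function names `local_update`, `federated_average` and `train`, the one-parameter linear model and the toy data are all hypothetical, chosen only to keep the example short.

```python
from typing import List, Tuple

# (feature, label) pairs; a toy stand-in for one site's local records (assumption)
Dataset = List[Tuple[float, float]]


def local_update(weight: float, data: Dataset, lr: float = 0.1) -> float:
    """One gradient step fitting y ~ weight * x, using this site's data only."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad


def federated_average(weights: List[float], sizes: List[int]) -> float:
    """Combine local weights, weighted by each site's number of records."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(weights, sizes)) / total


def train(sites: List[Dataset], rounds: int = 50) -> float:
    """Repeat: every site updates locally, then the updates are averaged."""
    weight = 0.0
    for _ in range(rounds):
        local = [local_update(weight, data) for data in sites]
        weight = federated_average(local, [len(d) for d in sites])
    return weight


if __name__ == "__main__":
    hospital_a = [(1.0, 2.0), (2.0, 4.0)]  # hypothetical records held by site A
    hospital_b = [(3.0, 6.0)]              # hypothetical records held by site B
    print(train([hospital_a, hospital_b]))
```

In this sketch the raw records never leave the site that holds them, yet the shared model depends on every site's data and on a protocol chosen in common, which is the kind of jointly defined purpose and means the qualification analysis has to examine.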
The use of a processor
The principle
The processor is the natural or legal person who processes data on behalf of the controller, in the context of a service or provision.
In practice
The qualification of the AI system provider must be assessed on a case-by-case basis.
An AI system provider may be a processor when it develops an AI system on behalf of one of its customers as part of a service. The customer is the controller insofar as it determines the purposes and means of the processing.
In other cases, the AI system provider may be the controller for systems that it designs in order to market them.
An AI system provider may use a service provider to collect and process the data according to its documented instructions (e.g. to collect publicly available data on the Internet, reuse a specific dataset made available online, etc.). That service provider then qualifies as a processor. It is essential for the provider of the AI system, as the controller, to ensure that its processor complies with the GDPR and limits the processing of data to its instructions, in particular by concluding a data processing agreement.
Moreover, using the same dataset for several customers, in the context of separate services, is generally a decisive indication that the provider is responsible for separate processing, at least for the creation of the dataset.