Relying on the legal basis of legitimate interests to develop an AI system

02 July 2024

Controllers will most commonly rely on their legitimate interests for the development of AI systems. This legal basis may only be used, however, where its conditions are met and sufficient safeguards are implemented.

 

This how-to sheet is open for public consultation until 1 September 2024.
This content is a courtesy translation of the original publication in French. In the event of any inconsistencies between the French version and this English translation, please note that the French version shall prevail.

 

Legitimate interest is one of the six legal bases provided for in Article 6 GDPR. 

It is often well suited to the development of AI systems by private bodies, especially when the database used is not built on the consent of individuals (consent is often difficult to obtain at large scale, or when personal data are collected indirectly).

Public bodies, for their part, may rely on their legitimate interests to develop an AI system only where the activities concerned are not strictly necessary for the performance of their specific public tasks but relate to other activities lawfully carried out (such as, for example, human resources management processing).

For more information on the use of legitimate interest by a public body, see in particular the use case illustrated in “How to choose the legal basis for processing? Practical cases with certain processing operations implemented by the CNIL” (French version).

Reliance on legitimate interests is, however, subject to three conditions:

  • The interest pursued by the body must be “legitimate”;
  • The processing must fulfill the condition of “necessity”;
  • The processing must not disproportionately affect the rights and interests of the data subjects, taking into account their reasonable expectations; it is therefore necessary to “balance” the rights and interests at stake in light of the specific conditions under which the processing is implemented.

The controller must verify that its processing complies with these three conditions and, as a good practice, is encouraged to document this assessment. In any event, where a DPIA is necessary, the controller must describe the safeguards provided to limit the possible impacts on the rights of individuals (see the how-to sheet “Carrying out a data protection impact assessment when necessary”).

Other legal bases may also be considered for the development of AI systems (see the how-to sheet “Ensuring the lawfulness of the data processing - Defining a legal basis”).

First condition: the interests pursued must be “legitimate”


Second condition: the processing must be "necessary"


Third condition: ensure that the objective pursued does not threaten the rights and freedoms of individuals


 

| Step | Risks | Measures |
| --- | --- | --- |
| Data collection | In particular for web scraping, where it does not fall within the reasonable expectations of individuals: invasion of privacy; impact on freedom of expression | Dedicated measures (see the focus sheet on web scraping) |
| Model training and data retention | Loss of confidentiality of training data | Anonymisation or pseudonymisation of data during collection; use of synthetic data |
| | Invasion of privacy and loss of confidentiality related to data memorisation/regurgitation in the model | Limiting the risks of memorisation, regurgitation and generation of personal data (see the questionnaire on the application of the GDPR to AI models) |
| | Lack of transparency and opacity of the processing | Increased transparency: transparent development of the AI system and its auditability; effective peer review of model development (see the focus sheet on open source) |
| | Difficulty in guaranteeing the exercise of rights | Facilitation of the exercise of rights: discretionary right to object; reasonable time between the creation of the training dataset and its use; transmission of requests to exercise rights |
| Use of the AI system | Invasion of privacy and loss of confidentiality related to data memorisation/regurgitation in the model; damage to reputation; regurgitation of protected data | Limiting the risks of memorisation, regurgitation and generation of personal data (see the questionnaire on the application of the GDPR to AI models) |
| | Discriminatory biases | Ensuring the quality of the dataset; annotation as soon as the dataset is created; application of filters in the deployment phase |
| | Unlawful reuse | Reuse licences; digital watermarking |
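
As an illustration of the first measure listed for the training step (anonymisation or pseudonymisation of data during collection), the sketch below shows one way direct identifiers might be pseudonymised before records enter a training set, using a keyed hash. This is a minimal Python sketch under stated assumptions, not a CNIL recommendation: the record schema (`user_id`, `email`, `comment`), the e-mail regular expression and the key handling are invented for the example. Note that a keyed hash produces pseudonymised, not anonymised, data: whoever holds the key can re-link the tokens to individuals.

```python
import hashlib
import hmac
import re

# Hypothetical key for the example; in practice it would be stored
# separately from the data, with restricted access.
SECRET_KEY = b"store-this-key-separately-from-the-data"

# Rough pattern for e-mail addresses appearing in free text (assumption).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def pseudonymise(value: str) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same input always yields the same token, so records can still
    be linked for training purposes without exposing the identifier.
    """
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def scrub_record(record: dict) -> dict:
    """Pseudonymise known identifier fields and mask e-mails in free text."""
    cleaned = dict(record)
    for field in ("user_id", "email"):  # hypothetical schema
        if field in cleaned:
            cleaned[field] = pseudonymise(cleaned[field])
    if "comment" in cleaned:
        cleaned["comment"] = EMAIL_RE.sub("<email>", cleaned["comment"])
    return cleaned

print(scrub_record({
    "user_id": "12345",
    "email": "jane.doe@example.org",
    "comment": "You can reach me at jane.doe@example.org",
}))
```

Whether such a measure is a sufficient safeguard depends on the balancing test described above; where data must not be re-linkable at all, anonymisation or the use of synthetic data would be the stronger option.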