Defining a purpose

16 October 2023

Creating a dataset that contains personal data involves processing personal data, which, under the GDPR, must have a purpose that is specified, explicit and legitimate. The CNIL helps you define the purpose(s) of this processing, taking into account the specific features of AI system development.

The principle

The purpose of the processing is the objective pursued by using the personal data. This objective must be specified, that is, determined upstream, as soon as the project is defined, and entered in the record of processing activities. It must also be explicit, that is, clearly stated and understandable. Finally, it must be legitimate, that is, compatible with the tasks of the organisation.

Because the data are collected for a specified and legitimate purpose, they must not be further processed in a manner incompatible with that initial purpose. The principle of purpose limitation thus restricts how the controller may use or reuse these data in the future.

The requirement of a specified, explicit and legitimate purpose is particularly important, as it determines how other GDPR principles apply, including:

  • the principle of transparency: the purpose of the processing must be brought to the attention of the data subjects so that they know why the various data concerning them are being collected and understand how those data will be used;
  • the principle of data minimisation: the data selected must be adequate, relevant and limited to what is necessary for the purposes for which they are processed;
  • the principle of storage limitation: the data may only be retained for a limited period, which must be defined according to the purpose for which they were collected.

In practice

Two cases should be distinguished, since the applicable legal regime depends on whether the operational use of the AI system in the deployment phase is already identified during the development phase:

Case 1: the operational use of the AI system in the deployment phase is identified as early as the development phase

Case 2: the operational use of the AI system in the deployment phase is not clearly defined during the development phase (general-purpose AI system)

Special case: creation of a dataset for scientific research purposes