Ensuring and facilitating the exercise of data subjects’ rights
05 January 2026
Individuals whose data is collected, used or reused to develop an AI system have rights over their data that allow them to maintain control over it. It is the responsibility of the controllers to comply with them and to facilitate their exercise.
The exercise of rights is closely linked to the provision of information on processing operations, which is also an obligation.
General reminder on applicable rules
Data subjects have the following rights over their personal data:
- right of access (Article 15 GDPR);
- right to rectification (Article 16 GDPR);
- right to erasure, also known as the right to be forgotten (Article 17 GDPR);
- right to restriction of processing (Article 18 GDPR);
- right to data portability, where the legal basis for processing is consent or contract (Article 20 GDPR);
- right to object, where the processing is based on legitimate interest or public interest (Article 21 GDPR);
- right to withdraw consent at any time, where the processing of personal data is based on consent (Article 7(3) GDPR).
Data subjects must be able to exercise their rights both:
- on training datasets, and
- on AI models if they are not considered anonymous as specified in EDPB Opinion 28/2024 on certain aspects of data protection related to the processing of personal data in the context of AI models.
While the exercise of a right of access, rectification or erasure on a training dataset raises problems quite comparable to those raised by other large datasets, exercising the same rights on the AI model itself presents particular and unprecedented difficulties. This calls for realistic and proportionate solutions, in order to guarantee people’s rights without preventing innovation in artificial intelligence. The CNIL therefore considers that the GDPR makes it possible to take account of the specificities of AI models when rights are exercised. The complexity and cost of requests that would prove disproportionate are thus factors that can be taken into account.
With regard to the development of AI models or systems: how to respond to data subjects’ rights requests?
In practice, responding to requests for the exercise of rights differs considerably depending on whether they relate to the training data or to the model itself.
The CNIL recommends that controllers clearly inform those exercising their rights about how they interpret their request (on training data or on the AI model) and how they respond to it. This is particularly important when the modification of their data in the training datasets is not reflected (or at least not immediately) in the AI model or system already developed (provided that this is justified by one of the derogations described below).
For the exercise of rights on training datasets
Difficulties in identifying the data subjects
Where the controller does not or no longer need to identify data subjects and can demonstrate that it is unable to do so, it may indicate this in response to data subjects’ rights requests.
This will often be the case for the development and use of training datasets. The provider of an AI system does not, in principle, need to identify the persons whose data are present in its training dataset. When identification data are no longer available, certain metadata and precise information on the source of each data item may make it possible to find people in a training dataset but, again, the retention of this information is not mandatory (on the issue of information on sources, see below). For example:
- An organisation does not have to keep the faces in a set of photographs for the sole purpose of enabling the exercise of rights if compliance with the principle of minimisation requires it to blur them.
- An organisation does not have to keep identification data in the dataset for the sole purpose of allowing individuals to later withdraw their consent if they have been duly informed and have consented in an informed manner (e.g. by being informed that erasure of the data memorised by the AI model will not be possible).
Where the controller is unable to identify the data subjects, it is also not required to retain the training data in order to respond to requests for the exercise of rights (e.g. with a view to retraining an AI model, as discussed further below).
However, the fact that the controller cannot identify the data subjects on its own does not mean that such identification is impossible. In that case, the GDPR allows individuals to provide additional information in order to help controllers identify them and thus exercise their rights.
In this regard, the GDPR does not require any particular form. A data subject could therefore, for example, provide an image or a pseudonym (such as a username under which the data subject is active online). Moreover, only where the organisation has reasonable doubts as to whether the data belong to the applicant may it ask the applicant to provide any document proving that the data in question are indeed his or her own.
Under the principle of data protection by design and the obligation to facilitate the exercise of rights, it is recommended to anticipate such identification difficulties by indicating what additional information may allow data subjects to be identified (e.g. in the case of homonymy). This may include allowing them to provide a particular file (such as an image, audio or video recording), but also other information when this is not possible.
Such additional information, provided by individuals, may not be processed for any purpose other than the exercise of their rights.
With regard to the right to receive a copy of the training data (under the right of access)
The right of access allows any person to obtain, free of charge, a copy of all the data processed concerning him or her. This includes the right to obtain extracts from training datasets where this is necessary to enable the data subject to exercise effectively his or her other rights (CJEU, 4 May 2023, Case C-487/21). In this regard, the CNIL recommends providing personal data, including annotations and associated metadata, in an easily understandable format.
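As an illustration only, the following sketch shows how such an extract could be assembled into an easily readable format; the record fields (subject_id, content, annotations, metadata) are assumptions, not a structure prescribed by the CNIL.

```python
import json

def export_subject_extract(records, subject_id, output_path):
    """Collect the records relating to one data subject, together with their
    annotations and associated metadata, and write them to a human-readable
    JSON file. The record structure is illustrative."""
    extract = [
        {
            "content": r["content"],
            "annotations": r.get("annotations", []),
            "metadata": r.get("metadata", {}),
        }
        for r in records
        if r.get("subject_id") == subject_id
    ]
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(extract, f, ensure_ascii=False, indent=2)
    return len(extract)
```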
Such communication is without prejudice to the rights and freedoms of others, which include in particular the rights of other data subjects, and the intellectual property rights or trade secrets of the dataset holder.
With regard to information on data processing (under the right of access)
Allowing data subjects to know the precise recipients and sources of their data is essential to ensure effective control over their data. This information allows them to exercise their rights vis-à-vis the controllers holding their data. This is particularly important due to the often-complex chain of actors in the development of AI systems.
If the person is present and identifiable in the dataset, the controller must be able to provide all the information required by Article 15 GDPR. Among these, however, information on sources calls for clarification specific to this type of database.
Where the data have not been collected directly from the data subjects (for example, in the case of transmission by a data broker), the right of access makes it possible to obtain ‘any available information as to their source’.
However, as seen above, the controller will not always have to keep such information when it does not or no longer needs to identify the data subjects, and when traceability is not necessary for the purpose or to ensure the proportionality of the processing of the training data.
It will then have to provide any relevant information at its disposal to the data subjects who request it, i.e. on all the sources used.
A case-by-case analysis is necessary to determine what information it is reasonable and proportionate to retain in order to ensure individuals’ rights over their data. Although the GDPR does not go into that level of detail (in particular, in the case of collection from public sources, it does not require the retention of information on each URL used), it is nevertheless necessary to establish sufficient documentation on the sources of the training data in order to remain consistent with the rights guaranteed by the Regulation, in particular Article 15 GDPR, and to allow the controller to demonstrate its compliance with the GDPR.
For example:
Where a medical dataset is re-used to develop a medical imaging analysis AI system, the controller will need to keep information on the source used and, where possible, the contact details of the source controller, in order to be able to provide them to individuals.
The developer of an AI model that has lawfully collected data by scraping a significant number of web pages builds a textual dataset whose very large volume makes it very difficult to search by keywords. In order to verify the presence of indirectly identifying data of persons exercising their right of access, the developer could ask them to provide the URL(s) of the web pages concerned and implement an indexation of the URLs targeted by the scraping, allowing it to navigate easily in the dataset (in the manner of the indexing carried out for Common Crawl), as illustrated below. Retaining the URLs of scraped web pages and indexing the training data is one way to achieve this objective; however, the GDPR is not prescriptive and the controller is only required to provide the information on sources available to it in the context of the right of access.
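As an illustration of the kind of indexing mentioned in this example, the sketch below maps each scraped URL to the documents extracted from it, so that a request accompanied by a URL can be answered without a full-text search; the class and field names are assumptions.

```python
from collections import defaultdict

class ScrapedCorpusIndex:
    """Illustrative URL index over a scraped text corpus: maps each source
    URL to the identifiers of the documents extracted from it."""

    def __init__(self):
        self._by_url = defaultdict(list)

    def add(self, doc_id, source_url):
        # Called once per scraped document at ingestion time.
        self._by_url[source_url].append(doc_id)

    def documents_for(self, url):
        # Document identifiers collected from the URL supplied by the data
        # subject (empty list if the page was never scraped).
        return list(self._by_url.get(url, []))
```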
In any case, information on data sources is useful to demonstrate the compliance of the training dataset. Data controllers will therefore have to keep certain information on sources under the principle of data protection by design. In some cases, the EU AI Act also requires the provider of the AI system or model to document training data, including its sources (Articles 11 and 13 for high-risk systems and Article 53 for general-purpose AI models).
With regard to the rights to rectification, objection and erasure of one’s data contained in a training dataset
Data subjects have the right to rectify the data concerning them. In the present context, this may concern incorrect annotations which the data subjects would like to see corrected.
The right to erasure makes it possible to require the deletion of data in a number of cases provided for in Article 17 GDPR, for example where the data subject reveals to the controller that it holds sensitive data concerning him or her (within the meaning of Article 9 GDPR) for which no derogation justifies the processing.
Furthermore, where the processing is based on a legitimate interest or the performance of a task carried out in the public interest, data subjects may object at any time, on grounds relating to their particular situation. The controller must then cease processing their data, unless it demonstrates compelling legitimate grounds for continuing to do so.
In the event of web scraping of data accessible online, the CNIL encourages the development of technical solutions that would facilitate compliance with the exercise of the right to object prior to data collection. By way of comparison, we can mention the opt-out mechanisms put in place in the field of intellectual property. There are also ‘rejection list’ mechanisms for certain processing operations, which could be transposed where appropriate in the light of the data processing carried out. In this way, the controller could respect the objection of individuals by refraining from collecting their data.
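A minimal sketch of such a pre-collection check, assuming an opt-out list expressed as a set of domains (the list format and function names are assumptions):

```python
from urllib.parse import urlparse

def filter_collection(candidate_urls, opted_out_domains):
    """Illustrative pre-collection check: skip pages whose domain appears on
    an opt-out ('rejection') list, so that objections are respected before
    any data is collected."""
    return [url for url in candidate_urls
            if urlparse(url).netloc not in opted_out_domains]
```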
With regard to the notification of the rights exercised (rectification, restriction and erasure)
Article 19 GDPR provides that a controller shall notify each recipient to whom the personal data have been disclosed of any rectification, erasure of personal data or restriction of processing carried out, unless such communication proves impossible or requires disproportionate effort (depending on the available technologies and implementation costs). Where possible, the CNIL recommends the use of application programming interfaces (APIs) (particularly in the riskiest cases), or at least techniques for logging data downloads.
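A minimal sketch of such download logging, assuming downloads are recorded per dataset version (the file name and fields are assumptions):

```python
import csv
from datetime import datetime, timezone

DOWNLOAD_LOG = "dataset_downloads.csv"  # illustrative file name

def log_download(recipient_contact, dataset_version):
    """Append one row per dataset download so that recipients can later be
    notified of rectifications, erasures or restrictions (Article 19 GDPR)."""
    with open(DOWNLOAD_LOG, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(
            [datetime.now(timezone.utc).isoformat(), recipient_contact, dataset_version]
        )

def recipients_to_notify(dataset_version):
    """Return the contacts that received the affected dataset version."""
    with open(DOWNLOAD_LOG, newline="", encoding="utf-8") as f:
        return sorted({row[1] for row in csv.reader(f) if row[2] == dataset_version})
```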
In case of dataset sharing, it is good practice to inform re-users of the different versions of the dataset, and to provide for a contractual obligation (e.g. in the dataset re-use licence) requiring re-users to pass on the effects of the exercise of the rights to object, rectification or erasure.
For the exercise of rights on models whose processing is subject to the GDPR
Some AI models are trained on datasets containing personal data but are, in themselves, anonymous.
However, in other cases, the model itself contains personal data that can be extracted from it and therefore cannot be considered anonymous (see EDPB Opinion 28/2024 on certain aspects of data protection related to the processing of personal data in the context of AI models). In particular, this is generally the case for large language models which, fed by public sources, can provide information on natural persons, especially public figures.
The rights available to individuals under the GDPR then apply to the model, but there are significant difficulties in implementing them. While certain safeguards will have to be considered in this case, the CNIL considers that the requirements of the GDPR are sufficiently balanced to capture the specificities of AI models. In particular, the constraints faced by AI model providers when rights are exercised over the data their models contain should be considered, such as the difficulty (or even impossibility) of identifying the data of a particular person within the model, or the complexity and cost of retraining a model, which may be disproportionate.
With regard to the identification of the data subject within the model
Where the GDPR applies to the model, two situations can be distinguished:
- Either the presence of the person’s data in the model is obvious (e.g. some AI models are specifically designed to output personal data used for their training);
- Or that is not the case, and the question arises of assessing the likelihood that data concerning that person is in the model. The controller may then be able to demonstrate that it is unable to identify persons within its model within the meaning of Article 11 GDPR.
Once the presence of a person within the model has been identified, the question also arises of identifying the personal data contained therein.
Influence of a person’s data on model parameters
When training the most complex machine learning models, such as neural networks, training data is used to optimise the model parameters. However, due to the large number of iterations required to train the model and the nature of the techniques used (such as gradient descent), the contribution of each piece of data to the model is diffuse, and methods to trace it are still being researched.
The field of machine unlearning, the principle, advantages and limitations of which are described in the Linc article ‘Understanding machine unlearning’ and in the EDPB’s Support Pool of Experts programme document ‘Effective implementation of data subjects’ rights’, is regularly the subject of promising advances. For example, the use of influence functions, which make it possible to keep track of the contribution of each piece of data to the model parameters during training, is one of the techniques that could address this question. These techniques would, for example, make it possible to identify the weights influenced by outliers contained in a dataset and thus correct their values.
Pending further progress of research on these subjects, it does not seem possible today, in the general case, for a controller to identify, for a particular person, all the data concerning him or her that have been memorised by the AI model, unless that person provides additional information. On the other hand, it may sometimes be possible to determine whether the model has learned information about a person by conducting tests and simulated attacks, such as membership inference attacks or attribute reconstruction attacks. These attacks are described in the Linc article ‘Small taxonomy of attacks on AI systems’.
The CNIL warns data controllers about the fact that this position is only valid in the light of the current state of the art, and that it actively encourages research and the development of practices to ensure the exercise of rights and the principle of data protection by design.
It should be noted that there are, however, special cases, such as when the model parameters explicitly contain certain training data (which may be the case for some models such as support vector machines (SVMs) or certain data partitioning or clustering algorithms), or when an AI system is connected to a knowledge base (as in retrieval-augmented generation, or RAG): it will then be technically possible to exercise the rights over the parameters of the model.
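As a purely illustrative example of the membership inference tests mentioned above, the sketch below implements a simplified loss-threshold heuristic; the loss function and reference data are assumptions, and a low loss is only an indication, not proof, of memorisation.

```python
import numpy as np

def loss_based_membership_score(model_loss_fn, candidate_record, reference_losses):
    """Simplified loss-threshold membership inference test: a record whose
    loss under the model is unusually low compared with losses on data known
    to be outside the training set *may* have been seen during training.
    'model_loss_fn' returns the model's loss on a single record (assumption);
    'reference_losses' are losses on records known not to be in the training set."""
    candidate_loss = model_loss_fn(candidate_record)
    # Fraction of non-member records with a higher loss than the candidate:
    # a value close to 1.0 suggests the candidate is unusually well fitted.
    score = float(np.mean(np.asarray(reference_losses) > candidate_loss))
    return candidate_loss, score
```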
As with the exercise of rights over training data, the controller must inform data subjects that it is unable to identify them on its own and, where possible, allow them to provide additional information enabling them to be identified so that they can exercise their rights.
The CNIL recommends that the model provider informs the data subjects of the additional information to be provided in order to identify them.
- Where the training data are still available to the controller, it may be relevant to identify the person within the training data before verifying whether those data have been memorised by the model and are likely to be extracted from it.
- Where the controller no longer has the training data, it may rely on the typology of that data to anticipate the categories of data that may have been memorised, in order to facilitate attempts to identify the person (e.g. through tests and queries on the AI model in question).
In the case of generative AI, the CNIL recommends that the developer establish an internal procedure entailing the querying of the model (e.g. through a list of carefully selected prompts) in order to verify whether it has retained personal data concerning an individual on the basis of the information supplied.
Example:
The developer of a large language model (LLM) who has trained its model on data collected by scraping various websites may indicate to a person wishing to exercise his or her rights that he or she will need to provide the queries (prompts) which, in his or her view, reveal that his or her personal data has been memorised by the AI model.
To that end, the data subject may, for example, if he or she has it, use a copy of the training data relating to him or her in order to verify that certain information can be extracted from the model (for example, by asking the model to complete part of his or her biography accessible on an openly accessible online encyclopedia, in the case of a public figure, or an article from a website).
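As an illustration of such an internal procedure, the sketch below runs the prompts supplied by the data subject and checks whether the model’s outputs reproduce the personal data fragments concerned; the generate function stands for the provider’s own text-generation interface (an assumption), and any match calls for human review rather than being treated as proof of memorisation.

```python
def memorisation_check(generate, prompts, expected_fragments):
    """Run each prompt supplied by the data subject and flag outputs that
    reproduce the corresponding personal data fragment. 'generate' is the
    provider's text-generation function (assumption); 'prompts' and
    'expected_fragments' are aligned lists supplied during the request."""
    findings = []
    for prompt, fragment in zip(prompts, expected_fragments):
        output = generate(prompt)
        findings.append({
            "prompt": prompt,
            "fragment_found": fragment.lower() in output.lower(),
            "output": output,
        })
    return findings
```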
Due to the probabilistic functioning of these models, it is possible that the result provided does not result from a memorisation of training data but from a generation based on correlations learned from other data. These results are sometimes referred to as hallucinations when they are factually unfounded but presented as truthful.
In some cases, it will be important to clearly distinguish the model from the system within which it is integrated. Indeed, the system, which can be seen as a kind of interface between the user and the AI model it integrates, can add different data to the query sent to the model or to its result (e.g. through a web search or from a knowledge base in the case of RAG). It is therefore possible for a person to mistakenly believe that an AI model contains data about him or her, whereas the results in question are produced by other components of the system. It is therefore essential to keep this distinction in mind in order to identify the right controller. For example, if a knowledge base (in the case of RAG) is integrated into the system by its provider, the latter will also have to take into account requests for the exercise of rights concerning it. Conversely, if this knowledge base has been added by a third party, data subjects will have to turn to that third party. A dedicated how-to sheet will soon clarify the issues of responsibility in this area.
With regard to the information on the processing of the model (under the right of access) and the right to obtain a copy of the data
Where the controller has been able to identify the data subject and verify that the data has been memorised, the controller must confirm this.
Where the person has exercised his or her right to obtain a copy of his or her data, the controller should provide him or her with the result of its investigations concerning him or her, in an understandable format, in particular with examples of outputs containing his or her personal data for generative AI systems.
Where it has not been possible to verify the presence of memorisation, but the controller has not been able to rule it out (in particular because of the technical limitations of current methods), the CNIL recommends informing individuals that it cannot be excluded that training data concerning them have been memorised by the model.
If the data controller still has the training data, the CNIL recommends that it respond to the person by confirming that his or her data has been processed for training and that it may have been memorised.
Where applicable, information specific to the model should be provided, such as information on the recipients of the model (Article 15(1)(c)), its retention period or the criteria used to determine it (Article 15(1)(d)), the rights that can be exercised on the model (Article 15(1)(e) and (f)), as well as its provenance where it has not been designed by the controller (Article 15(1)(g)).
With regard to the exercise of the rights to rectification, objection or erasure on the model
Not all rights of individuals over their data are absolute. This is particularly the case for the right to erasure which, in this context, essentially follows from the right to object (the conditions of which are set out more specifically below): a balance must be struck between the right of the individual and any ‘compelling legitimate grounds’ of the controller under the GDPR.
Although identifying data within the parameters of a model presents significant technical difficulties to date, some practical solutions make it possible to respond to the exercise of rights (in particular retraining the model or applying filters), but they are, to date, either particularly costly and difficult to implement, or imperfect.
Thus, whether or not the response to the exercise of a right is proportionate will depend on:
- the sensitivity of the data and the risks that their regurgitation or disclosure would pose to individuals.
For example, a person may wish to object to the processing of his or her data because of hallucinations or inaccuracies concerning him or her that may cause harm (e.g. by wrongly attributing wrongful or criminal conduct to the person or presenting the person as deceased).
- the infringement of the data controller’s freedom to conduct a business, which depends on the practical feasibility and cost of the measures to be taken to comply with the data subject’s request, in particular with regard to a possible retraining of the model (in terms of computational, environmental, human and financial resources).
For example: a large language model is fed by a large number of public sources and properly informs users about the functions of certain public figures. A public figure asks to be erased from the model. The retraining of the models comes at a very high cost. The request for erasure may in principle be rejected.
For example: an AI model designed to assist insurance companies has been trained on a set of real cases. The model is not anonymous, as anonymising it would have undermined its performance. A person, who had not objected to his presence in the training dataset, realises that information about the financial settlement of his claim appears in the model’s results when certain questions are asked. The controller must then, in principle, propose solutions to respond to the request for erasure of the data from the model.
Techniques to ensure people’s rights to their data in AI models are evolving: controllers must remain alert to the fact that some requests, which could be refused today, will have to be processed tomorrow if the state of the art, in particular on re-identification and unlearning techniques, progresses. Without prejudice to effective methods that can be developed by the actors, the CNIL identifies, to date, two main techniques:
- the most effective is retraining, which makes it possible to erase or rectify data at the heart of the model;
- alternatively, where retraining is not possible, filters can be used to address certain effects of the model without deleting data from the model itself.
They will be examined in turn.
Where the controller still has the training data, retraining the model makes it possible to respond to the exercise of rights once the requests have been applied to the dataset.
This retraining may take place periodically in order to handle several requests for the exercise of rights at the same time. In principle, the controller must respond to a request as soon as possible, and at the latest within one month. However, the GDPR provides that this period may be extended by two months in view of the complexity and number of requests (e.g. depending on the extent of the retraining to be carried out), provided that the person is informed thereof. When retraining is not planned or possible within this timeframe, the use of filters (see below) may be considered pending the next training of the model.
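For illustration, a minimal sketch of how pending requests could be accumulated and applied to the dataset before the next scheduled retraining (the request types and record fields are assumptions):

```python
from datetime import date

class PendingRightsRequests:
    """Illustrative queue of rights requests applied in batch at the next
    scheduled retraining."""

    def __init__(self):
        self._pending = []  # (request_date, subject_id, right) tuples

    def register(self, subject_id, right):
        # 'right' is e.g. "erasure", "rectification" or "objection".
        self._pending.append((date.today(), subject_id, right))

    def apply_to_dataset(self, dataset):
        """Remove the records of subjects with pending erasure or objection
        requests before retraining; returns the filtered dataset."""
        to_remove = {s for _, s, r in self._pending if r in ("erasure", "objection")}
        return [rec for rec in dataset if rec.get("subject_id") not in to_remove]
```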
The controller will then have to provide an updated version of the AI model to its users, if necessary by contractually requiring them to use only a regularly updated version. A good practice is to communicate more widely about updates to datasets or models, for example in the dataset documentation or on the provider’s website, to allow data subjects to know the extent to which their requests have been complied with. This also involves encouraging recipients of earlier versions to delete them or replace them with the latest version.
Alternatively, when retraining the model proves disproportionate (temporarily or definitively), the CNIL recommends implementing other types of measures to protect the data and privacy of individuals. While the state of the art is evolving rapidly, the CNIL recommends in particular the use of measures consisting in filtering the outputs of a system in order to respond to the exercise of the rights to rectification, objection or erasure, provided that the controller demonstrates that they are sufficiently effective and robust (i.e. that they cannot be circumvented).
In practice, the most robust measures in this area are not applied to the model itself but to the AI system that implements it, in order to limit its outputs. The model provider must therefore ensure that these measures are effectively put in place, by itself and/or by the users, depending on the deployment mode. A how-to sheet dedicated to the allocation of responsibilities between the provider and the deployer of a non-anonymous AI model will be published soon.
- It is recommended to use general rules preventing the generation of personal data, rather than a ‘blacklist’ of those who have exercised their rights.
- These general rules should seek to detect and pseudonymise the personal data concerned in outputs where the production of personal data is not the intended purpose, for example through named entity recognition techniques (see the sketch after this list).
- Otherwise, a blacklist should be considered. Its security should be ensured, and the impact of the filter on outputs should be assessed, in particular the increase in the risk of a membership inference attack that it may induce by modifying the statistical distribution of outputs, thus making it possible to identify persons who have objected.
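A minimal sketch of the named entity recognition filter mentioned above, using the spaCy library as one possible implementation (the choice of library, pipeline and placeholder is an assumption; a real deployment would need models suited to the languages and entity types at stake and an evaluation of their recall):

```python
import spacy

# Assumes the small English pipeline is installed:
#   pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def pseudonymise_output(text):
    """Detect person names in a system output and replace them with a
    placeholder before the text is returned to the user."""
    doc = nlp(text)
    redacted = text
    # Replace from the end so earlier character offsets remain valid.
    for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
        if ent.label_ == "PERSON":
            redacted = redacted[:ent.start_char] + "[PERSON]" + redacted[ent.end_char:]
    return redacted
```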
The provider of the AI model or system should then provide its deployer with the means to implement those measures to enable it to meet its own obligations.
Good practice
As the technical solutions currently available are not satisfactory in all cases where a model is subject to the GDPR, the CNIL invites providers as a priority to anonymise training data or, if they are unable to do so, to ensure that the AI model is anonymous after its training.
Derogations from the exercise of rights on datasets or on the AI model
Point of vigilance: where it can rely on certain derogations, the controller must inform persons in advance (i.e. at the time of the provision of the information) that their rights are subject to restrictions and explain the reasons for the refusal of the exercise of a right to the persons who have requested it.
Apart from situations in which the controller is unable to identify the data subjects (see the dedicated developments above), the controller may derogate from the exercise of rights in the following cases:
- The request is manifestly unfounded or excessive (Article 12 GDPR).
- The organisation receiving the request is not the controller in question.
- The exercise of one or more rights is excluded by French or European law (within the meaning of Article 23 GDPR).
- For processing operations for scientific or historical research purposes, or for statistical purposes: where the exercise of the rights would be likely to render impossible or seriously impede the attainment of those specific purposes (within the meaning of Article 116 of the Decree implementing the Law on information technology and freedoms).