Artificial intelligence: The CNIL is continuing its work to develop innovative and privacy-protective AI

02 July 2024

By submitting new how-to sheets on the development of artificial intelligence systems for public consultation, the CNIL shows how the General Data Protection Regulation (GDPR) enables the promotion of innovative and responsible AI.

Respond to data protection and privacy issues raised by the development of AI

2022 was marked by a paradigm shift in the deployment and use of AI systems by and for the general public. Since then, the CNIL has observed a growing desire to adopt these technologies in all sectors: health, public services, public security, etc. The CNIL is aware that the adoption of this technology is a major factor for France in terms of competitiveness, innovation and sovereignty in the coming years.

Nevertheless, the European legal framework also aims to ensure a high level of protection of fundamental rights: there are many questions about the legal framework for these technologies and the impacts they can have on individuals. The need for answers is therefore increasingly pressing, in order to enable the development of these technologies within a framework of trust.

Build the link between the GDPR and the European AI Act

While the European regulation on AI has just been adopted and will enter into force on a staggered basis in the coming months, the CNIL wishes to provide legal certainty to the players in the sector by anticipating the relationship between the AI Act and the GDPR. Indeed, the GDPR applies to system providers, independently of the AI Act, when they use personal data for their development.

It is in this context that the CNIL is opening, for the second time, a public consultation to all stakeholders in order to develop its recommendations:

  • the how-to sheets put forward for consultation today address several major issues of innovation and protection: the use of web scraping, a widespread practice in particular for building large language models; the publication of AI models in open source; and the management of people’s rights, which is the cornerstone of the legal framework on personal data.
  • a questionnaire on the application of the GDPR to AI models trained with personal data.

This consultation follows on from a first set of recommendations recently published after a public consultation.

Participate in the public consultation

Consult and exchange to build innovative and responsible AI

The CNIL has had numerous exchanges with stakeholders involved in the design and development of AI systems: companies, research labs, public authorities, employee trade unions, professional federations, etc. They expressed a need for clarification of the legal framework applicable to the most widespread practices in order to build innovative and responsible AI.

Legitimate interest is the most common legal basis for the development of AI systems

While several legal bases may be used to justify the processing of personal data for the purpose of developing an AI system, the “legitimate interest” of the body carrying out the training seems to be the most appropriate in the general case. This basis requires the establishment of a risk assessment for individuals and may require specific implementation conditions to protect individuals and their data. The CNIL proposes concrete elements for data controllers, in particular when using web scraping techniques or publishing AI models in open source.

Web scraping practices can be implemented but must be closely supervised

The development of AI systems in some cases requires access to large datasets that can be built from data collected online. Important technical and organizational safeguards are essential to ensure that rights are respected, as in most cases the persons whose data are used are not informed of the existence of such processing. In order to improve information and facilitate the exercise of rights by individuals, the CNIL proposes to centralise a voluntary register of such practices.

Open source dissemination is a positive practice in many respects and for data protection in particular

The AI ecosystem has historically been built around community sharing and collaboration. This movement is beneficial in that it improves transparency on the functioning of AI models and systems, allows for their discussion and peer review, and fosters their development. However, it accentuates certain risks, such as those of malicious use or those relating to security.

The CNIL considers that open source dissemination is beneficial for data protection in that it increases transparency for individuals. However, this practice presupposes that openness is real and that the necessary safeguards are implemented, in particular as regards the possibilities for re-use offered and the monitoring of models and their evolutions over time in order to allow effective information and exercise of rights.

The information and exercise of people’s rights must be at the heart of stakeholders’ thinking

Informing people and enabling them to exercise their rights effectively is a key issue for AI systems developed using personal data. The CNIL shares insights on the means to be implemented to fulfil these obligations vis-à-vis the data subjects and indicates in which cases derogations could apply. The CNIL also proposes answers concerning certain rights that are at odds with the statistical nature of AI, in particular the right to rectification and the right to erasure.

The applicability of the GDPR to AI models in question

Machine learning is based on the creation of models, which are representations learned from training data. Since about 2010, a field of research in computer science has emerged on the subject of securing AI models, and in particular on the possibilities of memorizing, extracting or regurgitating information from training datasets. These phenomena can have significant implications for the confidentiality of personal data, and the question of whether the GDPR applies to the models themselves arises when they are not considered anonymous. The CNIL therefore invites the professionals affected by this issue to help it develop its future position.

The development of AI systems can be reconciled with the challenges of protecting privacy. Moreover, taking this imperative into account will foster the emergence of devices, tools and applications that are ethical and faithful to European values. This is the condition for citizens to trust these technologies.

See the recommendations