AI is a hot topic, not least among the Data Protection Authorities, which have recently been very active in developing guidance papers aimed at helping organisations grasp and assess the complexities involved. Examples include the ICO’s AI auditing framework, recently opened for consultation, and the Norwegian DPA’s 2018 report on Artificial Intelligence and privacy. These documents look at the implications for data protection and the challenges of making AI compliant with the GDPR. This blog article presents selected considerations and challenges that DPOs should keep in mind when preparing a DPIA for Artificial Intelligence. It does not comprehensively cover all aspects of data protection compliance; rather, it focuses on those that are specific to AI and are at risk of not being addressed in the standard DPIA process.
Data is at the heart of AI, and therefore this topic will always be close to data protection. But it is worth mentioning that AI may bring a whole host of implications in other areas, such as those named in the latest European Commission report: product safety and liability, laws on equal treatment, consumer protection rules, or future accessibility requirements for goods and services. The document heralds an AI conformity assessment, similar to those found in product safety and liability rules, which would come in addition to the DPIA already required by law.
Many shades of AI
In my experience, people often drop AI into a conversation, leaving everyone feeling very sure they know what is being discussed, and I constantly fall into that trap myself. For example, the project proponent will say: “we will share this data with that partner and they will use AI to figure out how to make our process more efficient”, and no further questions are asked. But Artificial Intelligence is a very broad term. It encompasses any software that is built to solve a problem through some intelligence.
If it is written in Python, it’s probably machine learning
If it is written in PowerPoint, it’s probably AI
-tweet of Mat Velloso, Technical Advisor to Satya Nadella
What has been very popular lately is machine learning, and most of the AI projects that land on our desks are likely to relate to that technique. It is one of the subdomains of AI, and in simple terms it means applying algorithms and statistical models to data, so that the system performs a specific task by relying on patterns and inference instead of explicit instructions. It can take the form of supervised or unsupervised learning. Supervised learning involves mapping X to Y in a number of examples and then asking the program to map the rest, based on similarities. In unsupervised learning, nothing is pre-labelled; only the patterns in the data are revealed, and the model is trained to recognize them.
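To make the "map X to Y in a number of examples, then map the rest" idea concrete, here is a minimal sketch of supervised learning using a nearest-neighbour rule. The data points, the labels "A" and "B", and the function name `predict` are all made up for illustration; real projects would use far more data and a proper library.

```python
from math import dist

# Hypothetical labelled examples: each pair is (X, Y), i.e. features and label.
labelled_examples = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((4.0, 4.2), "B"),
    ((3.8, 4.5), "B"),
]


def predict(x):
    """Return the label of the labelled example most similar (closest) to x."""
    _, label = min(labelled_examples, key=lambda pair: dist(pair[0], x))
    return label


# New, unlabelled inputs are mapped to a label based on similarity alone.
print(predict((1.1, 0.9)))
print(predict((4.1, 4.0)))
```

Even in this toy version, the DPIA question is visible: the outcome depends entirely on which examples were labelled and how.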
Finally, deep learning is one of the paradigms of machine learning. While machine learning typically involves two steps, i.e. detection and recognition, in deep learning there are many steps and layers of computation. This is also where neural networks are used, which are said to resemble the complex workings of the neurons in the human brain.
And this brings me to the first important point of this blog: the type of AI will critically determine the type of privacy issues that will need to be addressed. It is important that we do not stop our investigation at the statement that AI will be used, but request a description of the way the technology works, free of buzzwords. For example, a deep learning method will create a black box effect, where we may not fully know on what basis the system has reached a certain conclusion. This is a big problem for transparency and explainability. By contrast, in traditional machine learning, where you label part of the data and let the computer do the rest according to the same pattern, explainability and transparency will not be as difficult to achieve.
The beauty of a living thing is not the atoms that go into it, but the way those atoms are put together.
According to the ICO, the key to successful AI adoption is diverse, well-resourced teams that work together from the very outset of the project and collectively solve the challenges. This way of working may not come naturally, as we have grown used to narrow specializations and teams focusing only on their own turf, sending matters off to another person if something does not neatly fit into our role descriptions. The complexities of AI will inevitably change this and will require teams working together across disciplines such as legal, compliance, privacy, data science, IT development, risk management, security, and so on.
The theory and practice
In practice, the project will be started by highly technical staff, and will carry on like this for a long time. Once the go-to-market plan is set and there is a release deadline, they will start seeking DPO or Compliance approval.
In theory, before starting, the organization should establish an internal governance and risk management framework for its AI projects, map out roles and responsibilities, upskill and train relevant members and, most importantly, involve senior management and the DPO from the start. While this scenario may seem utopian in a fast-paced environment, the approach so often met in practice will not be sufficient to bring an AI project to life in a compliant manner, especially taking into account the requirements for the AI DPIA set out in the ICO guideline.
In the ICO’s words, ‘zero tolerance to risk is not possible’. It has become a cliché, yet one that practical experience shows is very difficult to apply, especially given that decisions on trade-offs or risk acceptance are very often expected to be delivered single-handedly by the DPO. This brings back the importance of internal governance, well-resourced teams, and ideally a body, comprising members of senior management, to help in these decisions. Such governance does not need to be set up per project and could be established for all complex initiatives involving new technologies.
While it is accepted that certain risks will remain, there is no tolerance for failing to assess them properly. It is imperative that risks are identified, managed and mitigated. Each risk should be listed, its likelihood and severity should be scored, and a mitigating action assigned. This needs to be documented in the DPIA.
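As an illustration only, a risk register of this kind could be sketched as follows. The 1 to 3 scales, the "likelihood times severity" score and the example risks are my own assumptions, not something prescribed by the ICO; your DPIA template may use a different scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    likelihood: int  # assumed scale: 1 (remote) to 3 (probable)
    severity: int    # assumed scale: 1 (minimal) to 3 (severe)
    mitigation: str

    @property
    def score(self) -> int:
        # Assumed scoring rule: likelihood multiplied by severity.
        return self.likelihood * self.severity


# Hypothetical entries, loosely based on the issues discussed in this article.
risks = [
    Risk("Biased outcome from unrepresentative training data", 2, 3,
         "Rebalance the dataset and re-test"),
    Risk("Decision logic cannot be explained to individuals", 3, 2,
         "Document general system behaviour and key factors"),
    Risk("Outdated records in the training data", 1, 2,
         "Clean the dataset before training"),
]

# Highest-scoring risks first, so they are discussed and mitigated first.
ranked = sorted(risks, key=lambda r: r.score, reverse=True)
for r in ranked:
    print(f"{r.score}: {r.description} -> {r.mitigation}")
```

The point is not the arithmetic but the discipline: every risk has an owner-ready description, a score and an assigned action that can be copied into the DPIA.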
DPIA for AI may require extra considerations.
Most DPOs in data-dependent companies carry out or assist their organisations with dozens if not hundreds of DPIAs annually. They typically use pre-defined templates or software to aid them. The tools may be perfectly suitable for the task at hand, but they need to be applied with a certain creativity and a degree of caution. The application of AI may bring distinct risks or challenges that have not occurred before. There is a risk that by merely following the known template, we will fail to ask important questions and consequently fail to assess important risks.
The starting point is the same as for all other projects: you need to describe what data you will collect and how, its intended use and storage, and the activity itself, including data flows, data sources, etc. Some of the issues that may be distinct for AI projects are:
1. Analysing the risk that errors in system performance lead to unfair AI outcomes.
2. Documenting any trade-offs you are making (e.g. between explainability and accuracy, or statistical accuracy and minimization).
3. Considering and describing the envisaged human oversight, and how human and algorithmic accuracy of decision-making compare.
4. Seeking and documenting the views of individuals concerned, if necessary.
To ensure fairness in the AI context, the accuracy and quality of the dataset will be of utmost importance. If the quality of the data is poor, the result will be less accurate and the risk of biased decisions will increase. The training dataset should not only be cleaned to exclude outdated records, but also be designed so that it reflects reality as closely as possible.
There is one more element of fairness that is specific to AI systems, namely ‘statistical accuracy’. While the accuracy principle of the GDPR refers to the personal data themselves, in the context of AI statistical accuracy relates to the output. To calculate accuracy, you simply count how often the system’s answer is correct, as a share of all answers. Thus, before deployment, comprehensive testing should be carried out, documenting the number of false positives and false negatives. This is to ascertain and document that the system does not create a high risk of harmful decisions.
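The counting described above can be sketched in a few lines. This is a simplified illustration for a yes/no decision; the test data are invented, and the function names `confusion_counts` and `accuracy` are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for a yes/no decision."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["tp"] += 1   # system said yes, correct answer was yes
        elif p and not a:
            counts["fp"] += 1   # false positive: system said yes wrongly
        elif not p and a:
            counts["fn"] += 1   # false negative: system said no wrongly
        else:
            counts["tn"] += 1   # system said no, correct answer was no
    return counts


def accuracy(counts):
    """Share of all answers that were correct."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Invented test set: the correct answers and what the system predicted.
actual    = [True, True,  False, False, True, False, False, True]
predicted = [True, False, False, True,  True, False, False, True]

counts = confusion_counts(actual, predicted)
print(counts)
print(accuracy(counts))
```

Note that a single accuracy figure hides which kind of error occurs. For a DPIA, a false positive and a false negative may carry very different harm for the individual, which is why both counts should be documented separately.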
One of the methods for improving statistical accuracy is adding more data to the dataset. For example, to reduce the risk of discrimination against minorities, more data representing such minorities can be added to increase the likelihood of an accurate result for this group.
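One way to spot whether a group is underserved is to compare accuracy per group rather than only overall. The sketch below uses invented records and hypothetical group names purely for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, actual, predicted). Returns accuracy per group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, actual, predicted in records:
        total[group] += 1
        if actual == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Invented test results: the smaller group is barely represented.
records = [
    ("majority", 1, 1), ("majority", 0, 0), ("majority", 1, 1), ("majority", 0, 0),
    ("majority", 1, 1), ("majority", 0, 0), ("majority", 1, 0), ("majority", 0, 0),
    ("minority", 1, 0), ("minority", 0, 0),
]

per_group = accuracy_by_group(records)
print(per_group)
```

In this made-up example the overall accuracy is 8 out of 10, which looks acceptable, while the per-group figures reveal a much weaker result for the smaller group. That gap is the kind of finding that would justify adding more representative data.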
In general, the more personal data we feed in, the more statistically accurate the outcome will be. How do we square that with the principle of data minimization? This is where we face a trade-off between GDPR principles, in this case fairness and minimization. Striking the right balance should be driven by context, as our hierarchy of values will depend on the given situation. For example, as demonstrated in the ICO’s Project Explain, people deem explainability more important in a criminal justice scenario than in a healthcare context.
Some types of AI systems will not be very transparent by definition. Hence it will be challenging to apply the transparency principle. In particular, this could happen in a deep learning scenario where a so-called black box effect is observed, whereby we do not really know which factors, and with what weight, led the system to its decision. Even there, your organisation needs to consider the general system behaviour and what influences the result; this needs to be explained. All other elements described in Articles 13 and 14 of the GDPR, including the responsible party (controller) and other details, must be presented.
In particular, a process for human intervention in case of decisions with significant legal effect must be described, as well as the algorithm’s logic behind the decision taken. Translating AI models into everyday language will not be a simple task. For practical tips on implementing transparency, you may refer to the ICO’s publication Explaining decisions made with AI.
Remember that your organisation has a right to keep its IP and trade secrets away from the public. To the extent that fulfilling the transparency provision could affect one of those rights, certain information can be withheld from disclosure.
The 6 key tips for approaching a (D)PIA when Artificial Intelligence is involved:
1. Start with a non-buzz-word description of how the system is intended to work. Describe and document the true nature of the system you are introducing.
2. The DPIA should be a product of diverse teamwork, not the sole effort of the DPO.
3. Organisations should not skimp on building their teams’ competences. Everyone involved in work on artificial intelligence needs to have a broad understanding of the risks and challenges.
4. Engage senior management in decision making and be ready to introduce proper governance.
5. Watch out for common pitfalls such as bias stemming from poor-quality data or a system not working as intended, test statistical accuracy, ensure there is human oversight and, where necessary, seek to involve individuals in a consultation process.
6. Remember trade-offs will be inevitable and you will have to make them. When you do, context will be critical in striking the right balance as it will drive the hierarchy of values/principles.