Why do we need explainable AI?

Machine learning and AI have shown tremendous promise in addressing some of the hardest challenges faced in industry. With more data available now than ever before, we can identify complex relationships present in practical problems and use them to produce predictions, forecasts and insights that benefit businesses. A model whose internal workings are not well understood is often referred to as a “black box”, and many AI models fall under this umbrella due to the non-linear relationships they exploit in the data. When these models are applied to practical problems, however, a black box is no longer sufficient: it is vital that we understand why a model generates the outputs it does.

What did the summer project involve?

To establish a greater understanding of AI outputs, explainable AI has become an active research field, developing methods and tools to understand the complex behaviour of AI models. In the summer of 2022, we explored the limitations and possible extensions of these explainable AI methods through a summer project coordinated between Oxford University and the Smith Institute. Over the course of the summer, Gabriela van Bergen Gonzalez-Bueno joined the Smith Institute and investigated the current state of academic research on explainable AI, extending it to address concerns for industry. She worked under the guidance of Dr Kieran Kalair, a senior mathematical consultant at the Smith Institute with a particular interest in explainable AI who wanted to explore the field further.

What existing tools & methods are there for explainable AI?

There are many tools one can use to understand how an AI model makes decisions and generates outputs. We focused on a model-agnostic and mathematically grounded methodology called Shapley Additive Explanations (SHAP). SHAP originates in game theory and has seen widespread uptake in industry due to its intuitive interpretation and its applicability to any model (Lundberg & Lee, 2017).

[Image: SHAP waterfall plot for a single prediction]
Figure 1: An example application of SHAP to an AI problem. The ‘expected model output’ or ‘baseline’ is marked at the bottom of the plot, taking the value 22.533. The plot shows how an AI model takes a single instance to predict and reaches its final predicted value of 24.019. The ‘features’ the model uses are shown on the y-axis. The feature LSTAT was the most impactful on the model’s output, increasing it by 5.79. The second most impactful feature was RM, decreasing the model output by 2.17. Whilst the features in this example come from a housing dataset, the same breakdown can be produced for any machine-learning model. In another application, the features might relate to weather forecasts or human health metrics. This example is taken from https://github.com/slundberg/shap.
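The game-theoretic origin of SHAP can be made concrete: the Shapley value of a feature averages its marginal contribution over every possible coalition of the other features. The sketch below computes exact Shapley values by brute force for a toy coalition value function; it is purely illustrative and not tied to the dataset in Figure 1 or to any particular SHAP implementation.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, features):
    """Exact Shapley values for a coalition value function `value`,
    which maps a set of 'known' features to a model payout.
    Brute force over all subsets, so only viable for a handful of features."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight for a coalition of this size.
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                # Marginal contribution of feature i when it joins this coalition.
                total += w * (value(set(subset) | {i}) - value(set(subset)))
        phi[i] = total
    return phi
```

By construction the values satisfy the efficiency property: they sum to the value of the full feature set minus the value of the empty set, which is exactly the baseline-to-prediction breakdown shown in Figure 1.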

Why you should take care when using SHAP

There is enormous promise in using SHAP to supplement AI models, particularly in industrial contexts. However, the mathematical underpinnings must be understood and appreciated to ensure the explanations are valid and provide accurate interpretations for use by decision-makers.

There are many methodologies for computing SHAP values, each with benefits and drawbacks depending on the specific problem and model considered. Across all of these methods, the same underlying mathematical problem is faced: we need to determine the output a prediction model would produce if it had only known a subset of the features it was trained on. This presents a practical challenge. Many AI models are not equipped to handle arbitrary missing values – a neural network cannot take half an image and analyse it in the same way it can a full image. Approximations are therefore needed. In the case of SHAP, one can approximate the desired quantity by evaluating the model many times over a set of generated feature vectors and averaging the outputs.
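This averaging step can be sketched as follows. The model, feature names, and background data here are all illustrative stand-ins, not the project's actual setup:

```python
import random

# A hypothetical trained model over (age, height_cm, weight_kg).
def model(age, height, weight):
    return 0.5 * age + 0.1 * height + 0.2 * weight

# Background rows standing in for the training data.
background = [
    (5, 110, 20),
    (30, 170, 70),
    (45, 160, 80),
    (60, 175, 85),
]

def expected_output_knowing_age(age, n_samples=2000, seed=0):
    """Approximate the model output when only 'age' is known:
    fill in the missing features with values drawn from the
    background data, evaluate the model, and average."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        _, height, weight = rng.choice(background)
        total += model(age, height, weight)
    return total / n_samples
```

Note that drawing replacements this way ignores any relationship between the known feature and the filled-in features, which is precisely the issue discussed next.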

This presents a new problem: how do we generate the feature vectors used in this approximation? A naive approach is to select which features we want to omit from the model and randomly replace them with values from the training data. Doing this and averaging the model output over all feature vectors generated in this way is equivalent to assuming that all features the model uses are independent. However, this assumption is often not true. Imagine a model trained to predict blood pressure using data on age, height, and weight. Applying the naive approach, one might evaluate the model on a feature vector representing a 5-year-old child that is 6 feet tall and weighs 80 kilos. Clearly, this can lead to unlikely or even impossible data-points being used when computing model explanations, which calls the validity of their interpretation into question.

Recent academic research attempts to remedy this by inferring the conditional distribution of feature subsets in place of the independence assumption. In the blood pressure example, we would therefore generate feature vectors by asking: “given that the patient I want to generate explanations for is 5 years old, what are likely height and weight values?”
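A simple way to picture conditional generation, using entirely illustrative data and choices, is to draw the missing features only from background records whose known features are close to the instance being explained:

```python
import random

# Illustrative background data: (age, height_cm, weight_kg).
background = [
    (4, 105, 18),
    (6, 115, 21),
    (30, 170, 70),
    (45, 160, 80),
]

def sample_conditional(age, n_samples=5, tolerance=10, seed=0):
    """Generate (age, height, weight) vectors conditioned on age:
    draw the missing features only from background rows whose age
    lies within `tolerance` years of the known value. The hard
    cut-off and tolerance are illustrative simplifications; a real
    method would use a smoother, data-driven notion of similarity."""
    rng = random.Random(seed)
    similar = [row for row in background if abs(row[0] - age) <= tolerance]
    return [(age, h, w) for _, h, w in (rng.choice(similar) for _ in range(n_samples))]
```

Conditioning on an age of 5 now only ever fills in child-like heights and weights, so the impossible 6-foot 5-year-old can no longer arise.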

How best to infer this from the data in the context of explainable AI was the focus of our work over the summer.

How did the summer project explore additional capabilities for explainable AI?

Throughout the project, a methodology was developed to indirectly approximate this conditional distribution for each subset of features required when generating explanations. We focused on a model-agnostic methodology that used the data itself to judge the likelihood of a generated feature vector, rather than assuming a standard statistical distribution that might not capture the complex behaviours in the data. The proposed method ultimately ensures that the explanations generated from SHAP are not uniformly influenced by all generated feature vectors. Instead, the likelihood of each feature vector is accounted for, which ensures the explanations better reflect the practical problem to which the AI model is being applied.

To do this, we designed a process that took the initial feature vectors generated when assuming independence and determined appropriate weightings for each of these vectors that could be incorporated into the expectation, indirectly accounting for dependence in the data. A schematic of the process is shown below.

The key idea behind this approach is that, when expectations are computed, weighting all points uniformly allows unrealistic or impossible points to disproportionately influence the result. Weighting the terms in the expectation appropriately reduces the influence of these unlikely points. The schematic above still leaves questions for implementation:

  • How many data-points should be used when considering the k-nearest data-points to a generated feature vector?
  • What distance function should be used to determine how far data-points lie from the generated feature vectors?
  • Given these average distances between generated feature vectors and their nearest neighbours in the training data, what function should be used to compute weights for use in the expectation?

These questions were investigated throughout the project, with various approaches proposed and tested. Additionally, the results from this approach were compared to the naive approach of assuming independence of features on problems using both simulated and real data. Across a range of datasets and model choices, the proposed method showed an improvement upon the naive approach.

There are two schools of thought on model explanations. One can be “true to the model” and compute SHAP values assuming feature independence: explanations may then be generated from very unlikely or impossible feature vectors, but they reflect what the model would do given those inputs. The alternative is to be “true to the data”, the approach taken in the summer project: structure in the data is accounted for when generating explanations, limiting the influence of highly unlikely data-points. These approaches are discussed in (Aas, Jullum, & Løland, 2021), (Chen, Janizek, Lundberg, & Lee, 2020) and (Frye, et al., 2021). The most appropriate choice of methodology is application-specific; however, ensuring we have robust and reliable ways to generate both forms of explanation is vital as AI becomes more prominent in industrial applications.


Our investigation into the mathematical underpinnings and limitations of existing explainable AI methods may seem like an academic exercise at first glance. However, to truly harness the power of AI in industrial contexts, we need to build trust between domain experts and AI systems. Such systems are becoming increasingly central to modern society, from forecasting and supporting the operation of our power grids, to driverless cars on our roads, to diagnosing our health in medical applications. It is therefore vital that the tools used to generate these explanations are both clearly understood and mathematically defensible. We strive both to improve the underlying tools used in explainable AI and to ensure that their adoption in industry comes with significant mathematical oversight, so that the explanations generated are trustworthy.


Aas, K., Jullum, M., & Løland, A. (2021). Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artificial Intelligence.

Chen, H., Janizek, J. D., Lundberg, S., & Lee, S.-I. (2020). True to the Model or True to the Data? Arxiv.

Frye, C., Mijolla, D. d., Begley, T., Cowton, L., Stanley, M., & Feige, I. (2021). Shapley explainability on the data manifold. Arxiv.

Lundberg, S. M., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. Proceedings of the 31st International Conference on Neural Information Processing Systems, (pp. 4768–4777).