Interpretable machine learning

One of the main differences between machine learning and statistics is that machine learning is focused more on performance, whereas statistics is focused more on interpretability. The differences between statistics and machine learning are something I have written about in the past. However, as machine learning becomes an integral part of many areas of industry and society, the need for transparency and accountability has increased. The last few years have witnessed growing interest in methods that let us peek into the black box and understand how decisions are being made.

Furthermore, the current interest is in what are called model-agnostic methods, that is, methods that work for any kind of model. For example, random forests and gradient boosted trees can produce feature importance scores for individual attributes, and an individual decision tree can explain very easily how a decision was made. However, we are interested in methods that work for any kind of model and scenario.

What is interpretable machine learning?

Interpretable machine learning describes a research niche where the focus is on understanding how machine learning models reach their decisions. Interpretable machine learning can make models more transparent and can help detect and reduce bias.

Right now, the following options are available to us:

  1. Model-agnostic feature importance
  2. Individual conditional expectation plots and partial dependency plots
  3. Global surrogate models
  4. Local surrogate models (LIME)
  5. Shapley value explanations

Model-agnostic feature importance

Fisher, Rudin and Dominici proposed a feature importance method that works for any kind of model. In simple terms, the method works as follows:

  1. Fit a model M on dataset X and measure its error E.
  2. Permute the values of a single feature in X to get X2, and measure the error E2 of M on X2 (without refitting the model).
  3. Calculate the difference E2 – E, or the ratio E2/E. Repeat for every feature; the larger the value, the more important the feature.

In other words, this method destroys the association between one feature and the target variable, and then measures how much worse the predictions become. A simple but effective approach. However, it is not very useful if what we care about is not the model's error, but rather the model's response, or an explanation of how it reached a particular decision.
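
A minimal sketch of this procedure in Python, assuming (purely for illustration) a random forest on the California housing data; note that scikit-learn also ships a ready-made version as sklearn.inspection.permutation_importance:

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Fit a model M on dataset X and measure the baseline error E.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
baseline_error = mean_squared_error(y_test, model.predict(X_test))

# For each feature, permute its values (breaking its association with the
# target), re-score the already-fitted model, and compare E2 with E.
rng = np.random.default_rng(0)
for feature in X_test.columns:
    X_permuted = X_test.copy()
    X_permuted[feature] = rng.permutation(X_permuted[feature].values)
    permuted_error = mean_squared_error(y_test, model.predict(X_permuted))
    print(f"{feature}: importance (E2/E) = {permuted_error / baseline_error:.2f}")
```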

Individual Conditional Expectation plots

Individual conditional expectation plots are a generalisation of partial dependence plots. Partial dependence plots show the effect of a variable on the response, marginalising across all other variables. So, how do we do that? One simple way is to fix the variable of interest at a range of values and, for each value, average the model's predictions across all instances. In the plot below, you can see an example, from scikit-learn, on the California housing dataset. You can see that for the variable HouseAge, the value of the house increases by a small amount as the variable increases, whereas the relationship with the average number of rooms (AveRooms) seems to be a bit more complicated.

[Figure: partial dependence plots for HouseAge and AveRooms on the California housing dataset]
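
A plot of this kind can be produced with scikit-learn's PartialDependenceDisplay; the sketch below assumes a gradient boosting model on the California housing data, roughly mirroring the example above:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Fit a black-box model on the California housing dataset.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = HistGradientBoostingRegressor(random_state=0).fit(X, y)

# Vary only HouseAge and AveRooms, averaging the model's predictions
# across all instances, and plot the resulting curves.
PartialDependenceDisplay.from_estimator(model, X, features=["HouseAge", "AveRooms"])
plt.show()
```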

This is a simple and intuitive method. However, it has a fundamental flaw: what happens if there are clusters of observations within our data where the response variable behaves in different ways? Averaging across all instances leads to a loss of information. This is where individual conditional expectation plots come into play. This type of plot simply breaks down the partial dependence plot into one curve per instance. In the plot below, you can see an example of these plots in action. Around 0, there seem to be three different clusters of instances: for some instances the effect of the variable is negative, for some positive, and for some relatively neutral. The yellow line is equivalent to a partial dependence plot.

[Figure: individual conditional expectation plot example]
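
The same scikit-learn display can draw the individual curves as well; a minimal sketch, again assuming a gradient boosting model on the California housing data:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = HistGradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" draws one curve per (subsampled) instance plus their average,
# which is exactly the partial dependence curve.
PartialDependenceDisplay.from_estimator(
    model, X, features=["HouseAge"], kind="both", subsample=100, random_state=0
)
plt.show()
```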

Global Surrogate Models

Global surrogate models are interpretable models that are trained on the output of a machine learning model and are then used to explain it. A typical example of an interpretable model is a decision tree. By taking the predictions of, say, a deep neural network and training a decision tree on them, we hope to better understand how the network reaches its decisions. Since we can also control the depth of the tree, we can ensure that the final decision tree is understandable by a human. While this idea is easy to implement, the issue is that the surrogate never sees the actual targets, only the output of the machine learning model, so it explains the model rather than the data. Also, it is not clear what the cutoff should be for how well the surrogate approximates the original model. Is 50% of the variance explained enough, or do we need 90%?
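
A minimal sketch of the idea, assuming (for illustration) a random forest as the black box and a depth-limited decision tree as the surrogate:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor, export_text

# The black-box model whose behaviour we want to approximate.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
black_box = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
black_box_predictions = black_box.predict(X)

# The surrogate: a shallow decision tree trained on the black box's
# *predictions*, not on the original targets. Limiting the depth keeps
# the resulting rules human-readable.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, black_box_predictions)

# How much of the black box's behaviour does the surrogate capture?
print("R^2 of surrogate vs. black box:",
      r2_score(black_box_predictions, surrogate.predict(X)))
print(export_text(surrogate, feature_names=list(X.columns)))
```

Printing the tree shows the handful of rules the surrogate uses to mimic the black box, and the R² score quantifies how faithful that mimicry is.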

Local Surrogate Models

Local interpretable model-agnostic explanations (LIME) are a relatively new idea, first formulated in 2016. The idea behind local surrogate models is that we want to explain the predictions of a model for one particular instance. The method works as follows.

  1. Choose your instance of interest.
  2. Perturb the dataset around that instance and get the model's predictions for the perturbed samples.
  3. Weight the perturbed samples according to their distance from the instance of interest.
  4. Fit an interpretable model (for example, a sparse linear model) to the weighted samples.

In the picture below, for example, you can see how removing patches from an image lets us understand which parts of it are most important for successfully classifying it as a frog.

[Figure: LIME example showing which image patches matter for classifying a frog]

The LIME project offers Python code for running local surrogate models.
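
A minimal sketch of how this looks on tabular data, assuming the lime package is installed and using a random forest on the iris dataset purely for illustration:

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# A black-box classifier whose individual predictions we want to explain.
data = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# LIME perturbs the data around one instance, weights the perturbed samples
# by proximity, and fits a sparse linear model to them.
explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=4
)
print(explanation.as_list())  # feature contributions for this one instance
```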

Shapley value explanations

This is a quite interesting method that borrows tools from game theory. It was first formulated for machine learning explanations by Erik Štrumbelj and Igor Kononenko in 2014. The approach treats a prediction as a game in which the features are the players. While the math can get quite complicated, a simple explanation is as follows: we care about why the prediction for an instance differs from the average prediction, so we evaluate the model over many permutations of the features and use these to calculate each feature's contribution to that difference. While the method is computationally intensive, and not as easily interpretable as LIME for example, it is very well grounded theoretically. So, depending on your viewpoint or use case, this can be a particularly interesting property.
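
A minimal sketch using the shap library listed below, assuming (for illustration) a random forest on the California housing data:

```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

# Fit a model whose predictions we want to decompose into feature contributions.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# TreeExplainer implements an efficient Shapley value algorithm for tree
# ensembles; each value is a feature's contribution to the difference
# between a prediction and the average prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])  # a subset, for speed

print("Average prediction (base value):", explainer.expected_value)
print("Feature contributions for the first instance:")
for name, value in zip(X.columns, shap_values[0]):
    print(f"  {name}: {value:+.3f}")
```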

Software for interpretable machine learning

If you are interested in applying some of these techniques yourself, here are some extra resources for interpretable machine learning.

lime: Python implementation of LIME.

shap: Python library for computing Shapley value explanations (SHAP values).

sklearn-expertsys: Highly interpretable classifiers for scikit-learn, producing easily understood decision rules instead of black-box models.

ML Insights: Package for understanding supervised ML models; works with scikit-learn and XGBoost.

FairML: An end-to-end toolbox for auditing predictive models by quantifying the relative significance of the model’s inputs.

iml: R package with individual conditional expectation plots and other interpretability functionality.


Dr. Stylianos Kampakis is the owner and author of The Data Scientist.