
Explainable AI: A guide for making black box machine learning models explainable

In the future, AI will explain itself, and interpretability could boost machine intelligence research. Getting started with the basics is a good way to get there, and Christoph Molnar's book is a good place to start.
Written by George Anadiotis, Contributor

Machine learning is taking the world by storm, helping automate more and more tasks. As digital transformation expands and the volume and coverage of available data grow, machine learning sets its sights on tasks of increasing complexity, and on achieving better accuracy.

But machine learning (ML), which many people conflate with the broader discipline of artificial intelligence (AI), is not without its issues. ML works by feeding historical, real-world data to algorithms used to train models. ML models can then be fed new data and produce results of interest, based on the historical data used to train them.

A typical example is diagnosing medical conditions. ML models can be produced using data such as X-rays and CT scans, and then be fed with new data and asked to identify whether a medical condition is present or not. In situations like these, however, getting an outcome is not enough: we need to know the explanation behind it, and this is where it gets tricky.

Explainable AI

Christoph Molnar is a data scientist and PhD candidate in interpretable machine learning. Molnar has written the book "Interpretable Machine Learning: A Guide for Making Black Box Models Explainable", in which he elaborates on the issue and examines methods for achieving explainability.

Molnar uses the terms interpretable and explainable interchangeably. Notwithstanding the AI/ML conflation, this is a good introduction to explainable AI and how to get there. Well-researched and approachable, the book provides a good overview for experts and non-experts alike. While we summarize findings here, we encourage interested readers to dive in for themselves.

Interpretability can be defined as the degree to which a human can understand the cause of a decision, or the degree to which a human can consistently predict an ML model's result. The higher the interpretability of a model, the easier it is to comprehend why certain decisions or predictions have been made.

Christoph Molnar is a data scientist and PhD candidate in interpretable machine learning. In the book "Interpretable Machine Learning: A Guide for Making Black Box Models Explainable" he elaborates on the issue and examines methods for achieving explainability.

There is no real consensus about what interpretability is in ML, nor is it clear how to measure it, notes Molnar. But there is some initial research on this and an attempt to formulate some approaches for evaluation. Three main levels for the evaluation of interpretability have been proposed:

Application level evaluation (real task): Put the explanation into the product and have it tested by the end user. Evaluating fracture detection software with an ML component, for example, would involve radiologists testing the software directly to evaluate the model. A good baseline for this is always how good a human would be at explaining the same decision.

Human level evaluation (simple task) is a simplified application level evaluation. The difference is that these experiments are carried out with laypersons instead of domain experts. This makes experiments cheaper, and it is easier to find more testers. An example would be to show a user different explanations and have the user choose the best one.

Function level evaluation (proxy task) does not require humans. This works best when the class of model used has already been evaluated by someone else in a human level evaluation. For example, it might be known that the end users understand decision trees. A proxy for explanation quality may be the depth of the tree: shorter trees would get a better explainability score.
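As a rough illustration of a function level evaluation, the sketch below scores decision trees by their depth, treating shallower trees as more explainable. The dataset, library and depth values are assumptions for illustration; the book does not prescribe specific tooling.

```python
# Minimal sketch of a function-level (proxy) evaluation: tree depth as a
# stand-in for explanation quality. Assumes scikit-learn and a toy dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for max_depth in (2, 4, 8):
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    # Shallower trees are assumed to be easier for end users to follow,
    # so a smaller depth scores better on this proxy.
    print(f"max_depth={max_depth}: depth={tree.get_depth()}, "
          f"training accuracy={tree.score(X, y):.3f}")
```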

Molnar includes an array of methods for achieving interpretability, noting however that most of them are intended for the interpretation of models for tabular data. Image and text data require different methods.

Scope of Interpretability

As ML algorithms train models that produce predictions, each step can be evaluated in terms of transparency or interpretability. Molnar distinguishes among Algorithm Transparency, Global Holistic Model Interpretability, Global Model Interpretability on a Modular Level, Local Interpretability for a Single Prediction, and Local Interpretability for a Group of Predictions.

Algorithm transparency is about how the algorithm learns a model from the data and what kind of relationships it can learn. Understanding how an algorithm works does not necessarily provide insight into a specific model the algorithm generates, or into how individual predictions are made. Algorithm transparency only requires knowledge of the algorithm, not of the data or the learned model.

Global holistic model interpretability means comprehending the entire model at once. It's about understanding how the model makes decisions, based on a holistic view of its features and each of the learned components such as weights, other parameters, and structures. Explaining the global model output requires the trained model, knowledge of the algorithm and the data.

Interpretability is a key element for machine intelligence

Getty Images/iStockphoto

While global model interpretability is usually out of reach, there is a good chance of understanding at least some models on a modular level, Molnar notes. Not all models are interpretable at a parameter level, but we can still ask how parts of the model affect predictions. Molnar uses linear models as an example, for which weights only make sense in the context of the other features in the model.
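As a minimal sketch of modular-level interpretation, the snippet below fits a linear regression on standardized features and inspects its weights; with standardized inputs, coefficient magnitudes become roughly comparable across features. The dataset and the scikit-learn usage are assumptions for illustration, not code from the book.

```python
# Modular-level interpretation of a linear model: inspect its weights.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

data = load_diabetes()
X = StandardScaler().fit_transform(data.data)
model = LinearRegression().fit(X, data.target)

# Each weight is the change in the prediction for a one-standard-deviation
# increase in that feature, holding the other features fixed.
for name, weight in sorted(zip(data.feature_names, model.coef_),
                           key=lambda pair: abs(pair[1]), reverse=True):
    print(f"{name:>4}: {weight:+.1f}")
```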

Why did the model make a certain prediction for an instance? This is the question that defines local interpretability for a single prediction. Looking at individual predictions, the behavior of otherwise complex models might be easier to explain. Locally, a prediction might only depend linearly or monotonically on some features, rather than having a complex dependence on them.
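The idea behind local surrogates such as LIME can be sketched from scratch: perturb the instance of interest, query the black box model, and fit a simple weighted linear model that only needs to be faithful near that instance. Everything below (the random forest "black box", the sampling scheme, the proximity kernel) is an assumed illustration rather than the book's own code.

```python
# From-scratch sketch of a local surrogate explanation for a single prediction.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)
black_box = RandomForestRegressor(random_state=0).fit(X, y)  # stand-in "black box"

instance = X[0]
rng = np.random.default_rng(0)

# Perturb the instance, query the black box, and weight samples by proximity.
samples = instance + rng.normal(scale=X.std(axis=0) * 0.5, size=(500, X.shape[1]))
predictions = black_box.predict(samples)
distances = np.linalg.norm(samples - instance, axis=1)
weights = np.exp(-(distances ** 2) / (2 * distances.std() ** 2))

# The surrogate's coefficients approximate the black box around this instance only.
surrogate = Ridge(alpha=1.0).fit(samples, predictions, sample_weight=weights)
print(surrogate.coef_.round(1))
```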

Similarly, local interpretability for a group of predictions is about answering why a model made specific predictions for a group of instances. Model predictions for multiple instances can be explained either with global model interpretation methods (on a modular level) or with explanations of individual instances.

Global methods can be applied by taking the group of instances, treating them as if the group were the complete dataset, and using the global methods with this subset. The individual explanation methods can be used on each instance and then listed or aggregated for the entire group.

Interpretable Models and Model-Agnostic Methods

The easiest way to achieve interpretability, per Molnar, is to use only the subset of algorithms that create interpretable models. Linear regression, logistic regression and decision trees are commonly used interpretable models included in the book. Decision rules, RuleFit, naive Bayes and k-nearest neighbors are also included, the last being the only one not interpretable on a modular level.

Molnar summarizes interpretable model types and their properties. A model is linear if the association between features and target is modeled linearly. A model with monotonicity constraints ensures that the relationship between a feature and the target outcome always goes in the same direction over the entire range of the feature: an increase in the feature value either always leads to an increase or always to a decrease in the target outcome, which makes the model easier to understand.

Some models can automatically include interactions between features to predict the target outcome. Interactions can be included in any type of model by manually creating interaction features. Interactions can improve predictive performance, but too many or too complex interactions can hurt interpretability. Some models handle only regression, some only classification, while others can handle both.

Interpretable machine learning models and their properties. Image: Christoph Molnar
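To make the interaction point above concrete, the sketch below manually adds a product feature so that a plain linear model can capture a relationship that depends on two features jointly. The synthetic data and variable names are invented purely for illustration.

```python
# Manually creating an interaction feature for a linear model.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
size = rng.uniform(20, 200, 500)
price = rng.uniform(1, 10, 500)
revenue = size * price + rng.normal(scale=5, size=500)  # target depends on the product

# Without the interaction term a linear model cannot capture size * price;
# adding the product as an explicit column makes the relationship linear again.
X_plain = np.column_stack([size, price])
X_interaction = np.column_stack([size, price, size * price])

print("R^2 without interaction:",
      round(LinearRegression().fit(X_plain, revenue).score(X_plain, revenue), 3))
print("R^2 with interaction:   ",
      round(LinearRegression().fit(X_interaction, revenue).score(X_interaction, revenue), 3))
```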

There are, however, potential disadvantages in using interpretable models exclusively: predictive performance can be lower compared to other models, and users limit themselves to one type of model. One alternative is to use model-specific interpretation methods, but that also binds users to one model type, and it may be difficult to switch to something else.

Another alternative is model-agnostic interpretation methods, i.e. separating the explanations from the ML model. Their great advantage is flexibility. Developers are free to use any model they like, and anything that builds on a model interpretation, such as a user interface, also becomes independent of the underlying ML model.

Typically, many types of ML models are evaluated to solve a task. When comparing models in terms of interpretability, working with model-agnostic explanations is easier, because the same method can be used for any type of model, notes Molnar. Model, explanation and representation flexibility are desirable properties of model-agnostic explanation systems.

The methods included in the book are partial dependence plots, individual conditional expectation, accumulated local effects, feature interaction, permutation feature importance, global surrogate, local surrogate, anchors, Shapley values and SHAP.
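As one concrete example from that list, permutation feature importance can be computed for any fitted model, which is exactly what makes it model-agnostic. The dataset, model choice and use of scikit-learn's implementation below are assumptions for illustration.

```python
# Model-agnostic example: permutation feature importance on a held-out set.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffling one feature at a time and measuring the drop in test performance
# only needs predictions from a fitted model, regardless of the model type.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean.round(3))
```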

The Future of Interpretability

Molnar's book also examines example-based explanations, which work by selecting particular instances of the dataset to explain model behavior or data distribution. These are mostly model-agnostic, as they make any model more interpretable. Example-based explanations only make sense if we can represent an instance of the data in a humanly understandable way, which works well for images.
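One simple way to illustrate the example-based idea is to answer "this case looks like these known cases" by retrieving the most similar training instances. The nearest-neighbour lookup below is a generic stand-in, not one of the specific example-based methods the book covers.

```python
# Sketch of an example-based explanation: point to similar known instances.
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Explain instance 0 by finding its closest neighbours in feature space.
finder = NearestNeighbors(n_neighbors=4).fit(X_scaled)
distances, indices = finder.kneighbors(X_scaled[:1])

# The first neighbour is the instance itself; the rest are its closest examples.
print("Similar instances:", indices[0][1:], "with labels", y[indices[0][1:]])
```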

As far as deep neural networks go, Molnar notes that using model-agnostic methods is possible. However, there are two reasons why using interpretation methods developed specifically for neural networks makes sense: first, neural networks learn features and concepts in their hidden layers, so special tools are needed to uncover them; second, the gradient can be used to implement interpretation methods that are more computationally efficient than model-agnostic methods.
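A vanilla saliency map is a minimal sketch of the gradient-based idea: backpropagate a prediction score to the input and read the absolute gradient as a sensitivity measure. The PyTorch usage and the tiny untrained model below are assumptions for illustration only.

```python
# Gradient-based interpretation sketch: a vanilla saliency map.
import torch
import torch.nn as nn

# A tiny untrained network, purely to illustrate the mechanics.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()

x = torch.randn(1, 10, requires_grad=True)

# Backpropagate the top class score to the input; the absolute gradient shows
# which input features the prediction is most sensitive to.
scores = model(x)[0]
scores[scores.argmax()].backward()
saliency = x.grad.abs().squeeze()
print(saliency)
```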

Molnar concludes by offering his predictions on the future of interpretability. He believes the focus will be on model-agnostic interpretability tools, as it's much easier to automate interpretability when it is decoupled from the underlying ML model. Automation is already happening in ML, and Molnar sees this trend as continuing and expanding to include not just interpretability, but also data science work.

Molnar notes that many analytical tools are already based on data models, and a switch from analyzing assumption-based, transparent models to analyzing assumption-free black box models is imminent. Using assumption-free black box models has advantages, he notes, and adding interpretability may be the way to have the best of both worlds.

In the future of interpretability, robots and programs will explain themselves, and interpretability could boost machine intelligence research. Getting started with the basics of explainable AI is a good way to get there, and Molnar's book is a good place to start.
