





If you can’t explain it simply, you don’t understand it well enough. — Albert Einstein

Disclaimer: This article draws and expands upon material from (1) Christoph Molnar’s excellent book on Interpretable Machine Learning which I definitely recommend to the curious reader, (2) a deep learning visualization workshop from Harvard ComputeFest 2020, as well as (3) material from CS282R at Harvard University taught by Ike Lage and Hima Lakkaraju, who are both prominent researchers in the field of interpretability and explainability. This article is meant to condense and summarize the field of interpretable machine learning for the average data scientist and to stimulate interest in the subject.

Machine learning systems are increasingly employed in complex, high-stakes settings such as medicine (e.g. radiology, drug development), financial technology (e.g. stock price prediction, digital financial advisors), and even law (e.g. case summarization, litigation prediction). Despite this increased use, we still lack sufficient techniques to explain and interpret the decisions of these algorithms, particularly deep learning models. This can be very problematic in areas where algorithmic decisions must be explainable or attributable to certain features because of laws or regulations (such as the right to explanation), or where accountability is required.

The need for algorithmic accountability has been highlighted many times, most notably by Google’s facial recognition algorithm that labeled some black people as gorillas, and by Uber’s self-driving car that ran a red light. Because Google was unable to fix the algorithm and remove the algorithmic bias behind this issue, it instead solved the problem by removing words relating to monkeys from the Google Photos search engine. This illustrates the alleged black box nature of many machine learning algorithms.

The black box problem is predominantly associated with the supervised machine learning paradigm due to its predictive nature.

Accuracy alone is no longer enough.

Academics in deep learning are acutely aware of this interpretability and explainability problem, and whilst some argue that these models are essentially black boxes, several techniques have been developed in recent years for visualizing aspects of deep neural networks, such as the features and representations they have learned. The term info-besity has been used to describe the difficulty of providing transparency when decisions are based on many individual features, because of the resulting information overload. The field of interpretability and explainability in machine learning has exploded since 2015, and there are now dozens of papers on the subject, some of which can be found in the references.

As we will see in this article, these visualization techniques are not sufficient for completely explaining the complex representations learned by deep learning algorithms, but hopefully, you will be convinced that the black box interpretation of deep learning is not true — we just need better techniques to be able to understand and interpret these models.


The Black Box

All algorithms in machine learning are to some extent black boxes. One of the key ideas of machine learning is that models are data-driven: the model is configured from the data. This fundamentally raises problems such as (1) how we should interpret the models, (2) how to ensure they are transparent in their decision making, and (3) how to make sure the results of the algorithm are fair and statistically valid.

For something like linear regression, the models are very well understood and highly interpretable. When we move to something like a support vector machine (SVM) or a random forest model, things get a bit more difficult. In this sense, there is no white or black box algorithm in machine learning; interpretability exists on a spectrum, a ‘gray box’ of varying grayness.

It just so happens that at the far end of our ‘gray’ area is the neural network. Even further into this gray area is the deep neural network. When you have a deep neural network with 1.5 billion parameters, as the GPT-2 language model has, it becomes extremely difficult to interpret the representations that the model has learned.

In February 2020, Microsoft announced Turing-NLG, then the largest published language model (probably not for long). This network contains 17 billion parameters, around one-fifth of the roughly 86 billion neurons in the human brain (although in a neural network, parameters represent connections, of which there are ~100 trillion in the human brain). Clearly, interpreting a 17 billion parameter neural network will be incredibly difficult, but its performance may be far superior to that of smaller models because it can be trained on huge amounts of data without becoming saturated: a model with more parameters can store more complex representations.


Comparison of Turing-NLG to other deep neural networks such as BERT and GPT-2. Source

Obviously, the representations are there, we just do not understand them fully, and thus we must come up with better techniques to be able to interpret the models. Sadly, it is more difficult than reading coefficients as one is able to do in linear regression!


Neural networks are powerful models, but harder to interpret than simpler and more traditional models.

Often, we do not care how an algorithm came to a specific decision, particularly when it is operationalized in a low-risk environment. In these scenarios, interpretability places no constraint on our choice of algorithm. However, if interpretability is important for our application, as it often is in high-risk environments, then we must accept a tradeoff between accuracy and interpretability.

So what techniques are available to help us better interpret and understand our models? It turns out there are many of these, and it is helpful to make a distinction between what these different types of techniques help us to examine.

Local vs. Global

Techniques can be local, to help us study a small portion of the network, as is the case when looking at individual filters in a neural network.

Techniques can be global, allowing us to build up a better picture of the model as a whole. This could include visualizations of the weight distributions in a deep neural network, or visualizations of how inputs propagate through the network’s layers.

Model-Specific vs. Model-Agnostic

A technique that is highly model-specific is only suitable for a single type of model. For example, layer visualization is only applicable to neural networks, whereas partial dependency plots can be used with many different types of models and would therefore be described as model-agnostic.

Model-specific techniques generally involve examining the structure of algorithms or intermediate representations, whereas model-agnostic techniques generally involve examining the input or output data distribution.


The distinction between different model visualization techniques and interpretability metrics. Source

I will discuss all of the above techniques throughout this article, along with where and how they can be used to provide insight into our models.

Being Right for the Right Reasons

One of the issues that arises from our lack of model explainability is that we do not know what the model has actually learned. This is best illustrated with an apocryphal example (there is some debate about the truth of the story, but the lessons we can draw from it are nonetheless valuable).

Hide and Seek

According to AI folklore, in the 1960s, the U.S. Army was interested in developing a neural network algorithm that was able to detect tanks in images. Researchers developed an algorithm that was able to do this with remarkable accuracy, and everyone was pretty happy with the result.

However, when the algorithm was tested on additional images, it performed very poorly. This confused the researchers, as the results had been so positive during development. After much head-scratching, one of the researchers noticed that in the two sets of training images (with and without tanks), the sky was darker in one set than in the other.

It became clear that the algorithm had not actually learned to detect camouflaged tanks, but was instead looking at the brightness of the sky!

Whilst this story illustrates one of the common criticisms of deep learning, it is true that in a neural network, and especially a deep neural network, you do not really know what the model is learning.

This powerful criticism, together with the increasing importance of deep learning in academia and industry, is what has led to an increased focus on interpretability and explainability. If an industry professional cannot convince their client that they understand what their model is doing, should it really be used when the risks are large, such as financial losses or people’s lives?


At this point, you might be asking yourself how visualization can help us to interpret a model, given that there may be an infinite number of viable interpretations. Defining and measuring what interpretability means is not a trivial task, and there is little consensus on how to evaluate it.

There is no mathematical definition of interpretability. Two proposed definitions in the literature are:

“Interpretability is the degree to which a human can understand the cause of a decision.” — Tim Miller

“Interpretability is the degree to which a human can consistently predict the model’s result.” — Been Kim

The higher the interpretability of a machine learning model, the easier it is for someone to comprehend why certain decisions or predictions have been made. One model is more interpretable than another if its decisions are easier for a human to comprehend. One way we can start to evaluate model interpretability is via a quantifiable proxy.

A proxy is something that is highly correlated with what we are interested in studying but is fundamentally different from the object of interest. Proxies tend to be simpler to measure than the object of interest or, as in this case, simply measurable, whereas the object of interest itself (like interpretability) may not be.

The idea of proxies is prevalent in many fields, one of which is psychology, where they are used to measure abstract concepts. The most famous proxy is probably the intelligence quotient (IQ), which is a proxy for intelligence. Whilst the correlation between IQ and intelligence is not 100%, it is high enough that we can gain some useful information about intelligence from measuring IQ. There is no known way of measuring intelligence directly.

An algorithm that uses dimensionality reduction to let us visualize high-dimensional data in a lower-dimensional space provides us with a proxy for visualizing the data distribution. Similarly, a set of training images provides us with a proxy for the full data distribution of interest, but will inevitably be somewhat different from the true distribution (if you did a good job constructing the training set, it should not differ too much from a given test set).

What about post-hoc explanations?

Post-hoc explanations (explanations produced after the fact) can be useful but are sometimes misleading. They merely provide a plausible rationalization for the behavior of a black box algorithm, not necessarily concrete evidence, and so should be used cautiously. Post-hoc rationalization can be done with quantifiable proxies, and some of the techniques we will discuss do this.

Choosing a Visualization

Designing a visualization requires us to think about the following factors:

  • The audience to whom we are presenting (the who): is this being done for debugging purposes? To convince a client? To convince a peer reviewer for a research article?
  • The objective of the visualization (the what): are we trying to understand the inputs (such as whether EXIF metadata from an image is being read correctly, so that an image does not enter a CNN sideways), the outputs, or the parameter distributions of our model? Are we interested in how inputs evolve through the network, or in a static feature of the network like a feature map or filter?
  • The model being developed (the how): clearly, if you are not using a neural network, you cannot visualize the feature maps of a network layer. Similarly, built-in feature importance is available for some models, such as XGBoost or random forests, but not for others. Thus the choice of model constrains which techniques can be used, and some techniques are more general and versatile than others. Developing multiple models can provide more versatility in what can be examined.

Deep models present unique challenges for visualization: we can answer the same questions about the model, but our method of interrogation must change! Because of the importance of this, we will mainly focus on deep learning visualization for the rest of the article.

Subfields of Deep Learning Visualization

There are largely three subfields of deep learning visualization literature:

  1. Interpretability & Explainability: helping to understand how deep learning models make decisions and their learned representations.
  2. Debugging & Improving: helping model curators and developers construct and troubleshoot their models, with the hope of expediting the iterative experimentation process to ultimately improve performance.
  3. Teaching Deep Learning: helping to educate amateur users about artificial intelligence — more specifically, machine learning.

Why is interpreting a neural network so difficult?

To understand why interpreting a neural network is difficult and non-intuitive, we have to understand what the network is doing to our data.

Essentially, the data we pass to the input layer — this could be an image or a set of relevant features for predicting a variable — can be plotted to form some complex distribution like that shown in the image below (this is only a 2D representation, imagine it in 1000 dimensions).


If we ran this data through a linear classifier, the model would try its best to separate the data, but since our hypothesis class contains only linear functions and the data are not linearly separable, the model would perform poorly.


This is where neural networks come in. The neural network is a very special function. The universal approximation theorem states that a feedforward network with a single hidden layer and a suitable non-linear activation function can approximate any continuous function on a closed, bounded domain to arbitrary accuracy, provided the hidden layer has enough nodes.

It turns out that the more nodes we have, the larger the class of functions we can represent. If we have a network with only ten nodes and are trying to use it to classify a million images, the network will quickly saturate and reach maximum capacity. If we have 10 million parameters, it will be able to learn a much better representation of the data, as the number of non-linear transformations increases. We say this model has a larger model capacity.

People use deep neural networks instead of a single hidden layer because, for some functions, the number of neurons needed in a single-layer network grows exponentially with the required model capacity. Stacking hidden layers significantly reduces the number of neurons needed, but this comes at a cost to interpretability: the deeper we go, the less interpretable the network becomes.

The non-linear transformations of the neural network allow us to remap our data into a linearly separable space. By the final layer of the network, it becomes straightforward to separate our initially non-linear data into two classes using a linear classifier, as illustrated below.


The transformation of a non-linear dataset to one that is linearly separable using a neural network. Source
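To make the idea of remapping concrete, here is a minimal sketch using scikit-learn on a synthetic two-ring dataset (the dataset, sizes, and hyperparameters are illustrative assumptions). It compares a linear classifier on the raw coordinates, the same classifier after one hand-crafted non-linear feature (the squared radius), and a small neural network that has to learn a suitable remapping on its own.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two concentric rings: no straight line separates the classes.
X, y = make_circles(n_samples=600, noise=0.05, factor=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1. Linear classifier on the raw 2D coordinates.
linear_acc = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)

# 2. The same linear classifier after a hand-crafted non-linear remapping:
#    appending the squared radius x1^2 + x2^2 makes the rings separable by a plane.
def remap(X):
    return np.column_stack([X, (X ** 2).sum(axis=1)])

remapped_acc = LogisticRegression().fit(remap(X_tr), y_tr).score(remap(X_te), y_te)

# 3. A small neural network that must learn such a remapping in its hidden layer.
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
mlp_acc = mlp.fit(X_tr, y_tr).score(X_te, y_te)

print(f"linear: {linear_acc:.2f}  remapped: {remapped_acc:.2f}  mlp: {mlp_acc:.2f}")
```

On data like this, one would expect the raw linear model to stay close to chance while the other two approach perfect accuracy; the hand-crafted feature plays the role that the hidden layer learns implicitly.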

The question is, how do we know what is going on within this multi-layer non-linear transformation, which may contain millions of parameters?

Imagine a GAN model (two networks competing against each other in order to mimic the distribution of the input data) working on a dataset of 512×512 images. When images are fed into a neural network, each pixel value becomes a feature. For a grayscale image of this size, the number of features is 262,144 (three times that for an RGB image). This means we are performing perhaps 8 or 9 convolutional and non-linear transformations on over 200,000 features. How can one interpret this?

Go even more extreme, to the case of 1024×1024 images, which NVIDIA’s StyleGAN can generate. Since the number of pixels increases by a factor of four when the image side length doubles, we would have over a million features as our input to the GAN. So we now have a neural network with a million input features, performing convolutional operations and non-linear activations, and doing this over a dataset of hundreds of thousands of images.
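The feature counts above are simple arithmetic; a quick check, assuming one input feature per pixel per color channel:

```python
def n_input_features(side: int, channels: int = 1) -> int:
    """Number of input features for a square image: one per pixel per channel."""
    return side * side * channels

for side in (512, 1024):
    print(side, n_input_features(side), n_input_features(side, channels=3))
# 512 262144 786432
# 1024 1048576 3145728
```

Doubling the side length from 512 to 1024 multiplies the count by four, and an RGB image has three times as many features as a grayscale one.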

Hopefully, I have convinced you that interpreting deep neural networks is profoundly difficult. Although the operations of a neural network may seem simple, they can produce wildly complex outcomes via some form of emergence.


For the remainder of this article, I will discuss visualization techniques that can be used for deep neural networks, since they present the greatest challenge in the interpretability and explainability of machine learning.

Weight Histograms

Weight histograms are applicable to any network, regardless of the input data type, so I have chosen to cover them first. Weight histograms can be very useful in determining the overall distribution of weights across a deep neural network. In general, a histogram displays how often values in each range occur relative to the other ranges. Whether the distribution of weights is uniform, normal, or takes on some other ordered structure can tell us useful information.

For example, if we want to check that all our network layers are learning from a given batch, we can see how the weight distributions change after training on the batch. Whilst this may not seem the most useful visualization at first, we can still gain valuable insight from weight histograms.
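TensorBoard and W&B produce these plots automatically, but the underlying check is easy to sketch. The example below uses scikit-learn’s MLPClassifier as a small stand-in for a deep network (an assumption made to keep it self-contained): it snapshots each layer’s weights, trains on one more batch with partial_fit, and compares per-layer histograms and the mean absolute weight change. A layer whose weights barely move may point to vanishing gradients or a layer that is not being updated.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=256, n_features=20, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), random_state=0)

# The first batch initializes the weights; snapshot them afterwards.
mlp.partial_fit(X[:128], y[:128], classes=np.array([0, 1]))
before = [w.copy() for w in mlp.coefs_]

# Train on one more batch.
mlp.partial_fit(X[128:], y[128:])
after = mlp.coefs_

changes = []
for i, (w0, w1) in enumerate(zip(before, after)):
    # Shared bin edges so the two histograms are directly comparable.
    edges = np.histogram_bin_edges(np.concatenate([w0.ravel(), w1.ravel()]), bins=10)
    h0, _ = np.histogram(w0, bins=edges)
    h1, _ = np.histogram(w1, bins=edges)
    change = np.abs(w1 - w0).mean()
    changes.append(change)
    print(f"layer {i} {w0.shape}: mean |dW| = {change:.2e}")
    print("  before:", h0)
    print("  after: ", h1)
```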

Below are weight and bias histograms for a four-layer network in TensorBoard, TensorFlow’s main visualization tool.


Weight histograms in Tensorboard.

For those of you who are not familiar with it, another tool for plotting weight distributions is Weights and Biases (W&B), a relatively new company specializing in experiment tracking for deep learning. When training a large network such as a GAN with millions of parameters, the experiment tracking provided by W&B is very helpful for logging purposes and offers more functionality than TensorBoard (and is free for those of you in academia).


Weight histograms in Weights and Biases.

Saliency Maps

Going back to the tank problem we discussed previously, how could we troubleshoot this network to ensure the classifier is examining the correct portions of an image to make its predictions? One way to do this is with saliency maps.

Saliency maps were proposed in the 2013 paper “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps,” along with class maximization (discussed later). The idea behind them is fairly simple. First, we compute the gradient of the output class score with respect to the input image. This tells us how the classification changes in response to small changes in each of the input image’s pixels. If the gradient for a pixel is positive, increasing that pixel’s value increases the class score. By visualizing the magnitudes of these gradients, we can examine which pixels are most important for the prediction and check that the regions of the image being examined correspond to the object of interest.


Saliency maps provide a visual representation of the input sensitivity of an output class.

Saliency maps provide us with a method for computing the spatial support of a given class in a given image (image-specific class saliency map). This means that we can look at a classification output from a convolutional network, perform backpropagation, and see which parts of the image were involved in classifying the image as a given class.


Examples of class-specific images and their respective saliency maps for that class. Source

Another simple adjustment to the saliency method known as rectified saliency can be used. This involves clipping negative gradients during the backpropagation step so as to only propagate positive gradient information. Thus, only information related to an increase in output is communicated. You can find more in the paper “Visualizing and Understanding Convolutional Networks.”

Given an image with pixel locations i and j and c color channels (red, green, and blue in RGB images), we backpropagate the output to find the derivative with respect to each pixel value. We then take the maximum absolute value of these derivatives across all color channels and use it as the ij-th value of the saliency map M.

$$M_{ij} = \max_{c} \left| \frac{\partial y}{\partial x_{ijc}} \right|$$

The saliency map M is a 2D image with pixel locations i and j. The value of the map at each point is the maximum absolute value of the derivative found from backpropagation out of all the image color channels, c.
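A minimal NumPy sketch of the gradient-based saliency map described above, assuming a hypothetical two-layer classifier small enough to backpropagate by hand (in practice an autodiff framework would compute the input gradient of a real CNN). The optional rectified flag is a simplified illustration of clipping negative gradients during the backward pass, applied here at the single hidden layer.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, HIDDEN = 8, 8, 3, 16

# Hypothetical tiny classifier for one class: score(x) = w2 . tanh(W1 @ x.ravel())
W1 = rng.normal(scale=0.1, size=(HIDDEN, H * W * C))
w2 = rng.normal(size=HIDDEN)

def class_score(x):
    return w2 @ np.tanh(W1 @ x.ravel())

def input_gradient(x, rectified=False):
    """Backpropagate the class score to every input value x[i, j, c]."""
    h = np.tanh(W1 @ x.ravel())
    delta = w2 * (1.0 - h ** 2)          # gradient w.r.t. the hidden pre-activations
    if rectified:
        delta = np.maximum(delta, 0.0)   # keep only positive gradient signal
    return (W1.T @ delta).reshape(H, W, C)

def saliency_map(x, rectified=False):
    """M[i, j] = max over color channels c of |d score / d x[i, j, c]|."""
    return np.abs(input_gradient(x, rectified)).max(axis=-1)

image = rng.random((H, W, C))
M = saliency_map(image)
print(M.shape)  # (8, 8)
```

Comparing input_gradient against a finite-difference estimate of class_score is a cheap way to confirm the hand-written backward pass.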

Visualizing saliency maps can easily be done in Keras using the keras-vis package’s functions ‘visualize_saliency’ and ‘visualize_saliency_with_losses’.

Occlusion Maps

A similar technique to saliency mapping for discerning the importance of pixels in an image’s prediction is occlusion mapping. In occlusion mapping, we are still developing a map related to an image’s output. However, this time we are interested in how blocking out part of the image affects the prediction output of the image.

Occlusion-based methods systematically occlude (block out) portions of the input image using a grey square and monitor the classifier output. The image below, which shows an image classifier aiming to predict melanoma, clearly shows that the model is localizing the objects within the scene: the probability of the correct class drops significantly when the object is occluded (the heat map is darker in the regions containing the melanoma, because occluding them reduces the classifier’s output).


Occlusion map for a classifier predicting melanoma. Source

Occlusion mapping is fairly simple to implement, as it just involves occluding the image at a given pixel location and saving the prediction output to plot in a heat map. A good implementation by Akshay Chawla is available on GitHub.
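A minimal occlusion-mapping sketch in NumPy. The predict argument is assumed to be any callable that returns the score of the class of interest for a single (H, W, C) image; the patch size, stride, and gray fill value of 0.5 (for images scaled to [0, 1]) are illustrative choices, and toy_predict is a hypothetical stand-in for a trained classifier.

```python
import numpy as np

def occlusion_map(image, predict, patch=4, stride=4, fill=0.5):
    """Slide a gray square over the image and record the class score at each position.

    Low values in the returned map mark regions whose occlusion hurts the prediction.
    """
    H, W = image.shape[:2]
    rows = range(0, H - patch + 1, stride)
    cols = range(0, W - patch + 1, stride)
    heat = np.zeros((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            occluded = image.copy()
            occluded[r:r + patch, c:c + patch] = fill  # gray square
            heat[i, j] = predict(occluded)
    return heat

# Hypothetical "model": its score is the mean brightness of the top-left 4x4 corner.
def toy_predict(img):
    return img[:4, :4].mean()

img = np.ones((16, 16, 3))
heat = occlusion_map(img, toy_predict)
print(heat)  # only the top-left cell drops (to 0.5); all others stay at 1.0
```

A smaller stride gives a finer heat map at the cost of one forward pass per position.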

Class Maximization

One very powerful technique in studying neural networks is class maximization. This allows us to view the exemplar of a class, i.e. the input that would cause the class value of the classifier to be maximized in the output. For image data, we would call this the image exemplar of a class. Mathematically, this corresponds to:

$$x^* = \arg\max_{x} f_c(x)$$

Here, x* corresponds to the image exemplar of class c. This notation says we want the image that gives the maximum possible output for class c, which can be read as asking: what does the perfect c look like?
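In practice the arg max is found by gradient ascent on the input, usually with a regularizer that keeps the input from growing without bound; the Simonyan et al. paper mentioned above uses an L2 penalty. The sketch below assumes a hypothetical two-layer classifier written in NumPy so the gradient can be computed by hand; the step size, number of steps, and penalty strength are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, HIDDEN, N_CLASSES = 64, 32, 5

# Hypothetical tiny classifier: class scores f(x) = W2 @ tanh(W1 @ x)
W1 = rng.normal(scale=0.3, size=(HIDDEN, D))
W2 = rng.normal(size=(N_CLASSES, HIDDEN))

def class_score(x, c):
    """f_c(x): the (pre-softmax) score the toy network assigns to class c."""
    return W2[c] @ np.tanh(W1 @ x)

def class_score_grad(x, c):
    """Gradient of f_c with respect to the input x, by the chain rule."""
    h = np.tanh(W1 @ x)
    return W1.T @ (W2[c] * (1.0 - h ** 2))

def maximize_class(c, steps=500, lr=0.01, l2=0.1):
    """Gradient ascent on f_c(x) - l2 * ||x||^2, starting from a blank input."""
    x = np.zeros(D)
    for _ in range(steps):
        x += lr * (class_score_grad(x, c) - 2.0 * l2 * x)
    return x

x_star = maximize_class(c=2)
print(f"score before: {class_score(np.zeros(D), 2):.3f}, after: {class_score(x_star, 2):.3f}")
```

For image models, x_star would be reshaped back to image dimensions and displayed; stronger priors than an L2 penalty are typically needed to get natural-looking images like those shown below.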

The outputs of this technique on a large-scale classification network are fascinating. Below are some images generated by Nguyen, Yosinski, and Clune in their 2016 paper on deep convolutional network visualization. They performed class maximization on a deep convolutional neural network that was trained on the ILSVRC-2013 dataset.


Images generated from class maximization on a deep convolutional network. Source

Activation Maximization

Similar to class maximization, activation maximization helps us to visualize the exemplars of convolutional filters. Class maximization is a special case of activation maximization in which the output of the classifier’s final softmax layer is maximized. Mathematically, activation maximization can be described as:

$$x^* = \arg\max_{x} h_{l,f}(x)$$

Here, x* corresponds to the exemplar of filter f in hidden layer l of a deep neural network. This notation says we want the input (an image, in the case of a convolutional network) that maximizes the activation of that filter or layer. This is illustrated below for the 8 layers of a deep convolutional neural network.


Images generated from activation maximization on a deep convolutional network. Source
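Activation maximization uses the same gradient-ascent recipe, but the objective is a hidden unit rather than a class score. In this minimal sketch (a hypothetical single layer of tanh units, with an assumed L2 penalty), the optimum for one unit is known up to scale: it points along that unit’s weight vector, so the cosine similarity between the two gives a quick sanity check.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N_FILTERS = 64, 8

# Hypothetical hidden layer: activations a(x) = tanh(W @ x), one row per "filter".
W = rng.normal(size=(N_FILTERS, D))

def activation(x, f):
    return np.tanh(W[f] @ x)

def maximize_activation(f, steps=500, lr=0.01, l2=0.05):
    """Gradient ascent on a_f(x) - l2 * ||x||^2 from a small random start."""
    x = rng.normal(scale=0.001, size=D)
    for _ in range(steps):
        a = activation(x, f)
        grad = (1.0 - a ** 2) * W[f] - 2.0 * l2 * x
        x += lr * grad
    return x

x_star = maximize_activation(f=3)
cosine = x_star @ W[3] / (np.linalg.norm(x_star) * np.linalg.norm(W[3]))
print(f"activation: {activation(x_star, 3):.3f}, cosine with filter weights: {cosine:.3f}")
```

In a deep network the same loop runs through many non-linear layers, so no closed-form answer exists and the resulting exemplars have to be inspected visually, as in the figure above.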

LIME (Local Interpretable Model-Agnostic Explanations)

LIME stands for local interpretable model-agnostic explanations, and it even has its own Python package. Because the method was designed to be model-agnostic, it can be applied to many different machine learning models. It was first presented in papers by Marco Tulio Ribeiro and colleagues, including “Model-Agnostic Interpretability of Machine Learning” and “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” both published in 2016.

Local surrogate models are interpretable models that are used to explain individual predictions of black box machine learning models. LIME is an implementation of local surrogate models.

Surrogate models are trained to approximate the predictions of the underlying black box model.

Instead of training a global surrogate model, LIME focuses on training local surrogate models to explain individual predictions.


Explaining individual predictions to a human decision-maker. Source

In LIME, we perturb the input and analyze how our predictions change. Despite how it may sound, this is very different from occlusion mapping and saliency mapping. Our aim is to approximate the underlying model, f, using an interpretable model, g (such as a linear model with a few coefficients), from a set of possible models, G, in the neighborhood of a given instance defined by a proximity measure, πₓ. We also add a regularizer, Ω, to make sure the interpretable model is as simple as possible. This is illustrated in the equation below.

$$\text{explanation}(x) = \arg\min_{g \in G} \; L(f, g, \pi_x) + \Omega(g)$$

The explanation model for instance x is the model g (e.g. linear regression model) that minimizes loss L (e.g. mean squared error), which measures how close the explanation is to the prediction of the original model f (e.g. an xgboost model), while the model complexity Ω(g) is kept low (e.g. prefer fewer features). G is the family of possible explanations, for example, all possible linear regression models. The proximity measure πₓ defines how large the neighborhood around instance x is that we consider for the explanation.
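The optimization above can be sketched from scratch for tabular data. This is not the lime package’s API; it is a minimal illustration under stated assumptions: Gaussian perturbations around x, an exponential kernel for πₓ, a ridge penalty standing in for Ω(g), and a hypothetical black box f. The weighted R² of g is used as a simple fidelity measure.

```python
import numpy as np
from sklearn.linear_model import Ridge

def black_box(Z):
    """Hypothetical non-linear model f, standing in for e.g. an XGBoost model."""
    return np.sin(Z[:, 0]) + Z[:, 1] ** 2

def lime_tabular(f, x, n_samples=5000, scale=0.1, kernel_width=0.25, alpha=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance of interest.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))
    # 2. Query the black box on the perturbed samples.
    y = f(Z)
    # 3. Proximity weights pi_x: exponential kernel on the distance to x.
    d = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(d ** 2) / kernel_width ** 2)
    # 4. Fit the interpretable model g; the ridge penalty plays the role of Omega(g).
    g = Ridge(alpha=alpha).fit(Z - x, y, sample_weight=weights)
    # Fidelity: weighted R^2 of g against f in the neighborhood of x.
    fidelity = g.score(Z - x, y, sample_weight=weights)
    return g.coef_, fidelity

coef, fidelity = lime_tabular(black_box, np.array([0.0, 1.0]))
print(coef, fidelity)
```

Because the perturbations are centered on x, the coefficients approximate the local slopes of f at x, which here are cos(0) = 1 for the first feature and 2 · 1 = 2 for the second.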

LIME for images works differently than LIME for tabular data and text. Intuitively, it would not make much sense to perturb individual pixels, since many pixels, not just one, contribute to a class prediction. Randomly changing individual pixels would probably not change the predictions by much. Therefore, variations of the images are created by segmenting the image into “superpixels” and turning superpixels off or on.


An image of a cat that has been segmented into superpixels. Source

Superpixels are contiguous groups of pixels with similar colors, and they can be turned off by replacing each of their pixels with a user-defined color such as gray. The user can also specify the probability of turning off a superpixel in each perturbation.
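A sketch of the image perturbation step, assuming a regular grid as a hypothetical stand-in for a real superpixel segmentation (the lime package relies on scikit-image segmentation algorithms such as quickshift instead). Each sample switches superpixels off with probability p_off and grays them out; the binary masks are what the interpretable model g is then fit on, with the black box’s predictions on the perturbed images as targets.

```python
import numpy as np

def grid_segments(height, width, cell):
    """Stand-in for a real superpixel algorithm: label pixels by square grid cell."""
    rows = np.arange(height) // cell
    cols = np.arange(width) // cell
    n_cols = -(-width // cell)  # ceiling division
    return rows[:, None] * n_cols + cols[None, :]

def perturb_superpixels(image, segments, n_samples, p_off=0.5, fill=0.5, seed=0):
    """Return binary masks (1 = superpixel kept) and the corresponding perturbed images."""
    rng = np.random.default_rng(seed)
    n_segments = segments.max() + 1
    masks = (rng.random((n_samples, n_segments)) >= p_off).astype(int)
    images = np.repeat(image[None].astype(float), n_samples, axis=0)
    for k, mask in enumerate(masks):
        off = ~mask.astype(bool)[segments]  # (H, W): pixels whose superpixel is off
        images[k][off] = fill               # gray out those pixels in every channel
    return masks, images

image = np.random.default_rng(1).random((32, 32, 3))
segments = grid_segments(32, 32, cell=8)  # 16 "superpixels"
masks, images = perturb_superpixels(image, segments, n_samples=10)
print(masks.shape, images.shape)  # (10, 16) (10, 32, 32, 3)
```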


Explaining an image classification prediction made by Google’s Inception neural network. The top 3 classes predicted are “Electric Guitar” (p = 0.32), “Acoustic guitar” (p = 0.24) and “Labrador” (p = 0.21). Source

The fidelity measure (how well the interpretable model approximates the black box predictions, given by our loss value L) gives us a good idea of how reliable the interpretable model is in explaining the black box predictions in the neighborhood of the data instance of interest.

LIME is also one of the few methods that works for tabular data, text and images.

Note that we can also build global surrogate models, which follow the same idea but approximate the entire black box model rather than its behavior around a single instance.
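A global surrogate is simple to sketch with scikit-learn: fit an interpretable model to the black box’s predictions rather than to the true labels, then measure how often the two agree. The random forest and depth-3 tree below are illustrative assumptions, and fidelity is measured on the training data for brevity; a held-out set would give a more honest estimate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=6, n_informative=3, random_state=0)

# The "black box" we want to explain.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Global surrogate: train an interpretable model on the black box's predictions,
# not on the true labels, so that it imitates the black box everywhere.
y_bb = black_box.predict(X)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)

# Fidelity: the fraction of inputs on which the surrogate agrees with the black box.
fidelity = (surrogate.predict(X) == y_bb).mean()
print(f"fidelity = {fidelity:.2f}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(X.shape[1])]))
```

The printed tree is the explanation; its usefulness depends on the fidelity being high enough that the tree’s rules actually reflect the black box.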

Partial Dependency Plots

The partial dependence plot shows the marginal effect one or two features have on the predicted outcome of a machine learning model. If we are analyzing the market price of a metal like gold using a dataset with a hundred features, including the value of gold in previous days, we will find that the price of gold has a much higher dependence on some features than others. For example, the gold price might be closely linked to the oil price, whilst not strongly linked to the price of avocados. This information becomes visible in a partial dependency plot.


An example of partial dependency plots for bike rentals with respect to temperature, humidity, and wind speed. Of the three variables, temperature has the strongest effect on the number of bike rentals. Source

Note that this is not the same as a linear regression model. If this was performed on a linear regression model, each of the partial dependency plots would be linear. The partial dependency plot allows us to see the relationship in its full complexity, which may be linear, exponential, or some other complex relationship.
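A partial dependency curve can be computed directly from its definition: fix the feature of interest at each grid value for every row, predict, and average. A minimal NumPy sketch, using a hypothetical model in which feature 0 has a quadratic effect, so the curve is visibly non-linear:

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Average prediction when `feature` is forced to each grid value for every row of X."""
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value                 # same value for every instance
        pd_values.append(predict(X_mod).mean())   # average over the other features
    return np.array(pd_values)

# Hypothetical model: quadratic in feature 0, linear in feature 1.
def model(X):
    return X[:, 0] ** 2 + 3 * X[:, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(-2, 2, 5)
pd0 = partial_dependence(model, X, feature=0, grid=grid)
print(np.round(pd0 - pd0.min(), 3))  # [4. 1. 0. 1. 4.]
```

The recovered curve is the parabola grid², shifted by a constant that comes from averaging the linear term in feature 1.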

One of the main pitfalls of the partial dependency plot is that it can realistically only display one or two features at a time. Thus, visualizing higher-order interactions between multiple variables is difficult.

There is also an inherent assumption that the features are independent, which is often not the case (for example, height and weight, two common variables in medical datasets, are correlated). When features are correlated, the partial dependency plot averages over combinations of feature values that are unlikely or impossible in reality (such as a very tall person with a very low weight). Correlations between variables may also render one of them redundant or cause problems for the algorithm due to multicollinearity. Where this becomes a problem, Accumulated Local Effects (ALE) plots are generally preferred, as they do not suffer from the same pitfalls as partial dependency plots when it comes to correlated features.

To avoid overinterpreting the results in data-sparse feature regions it is helpful to add a rug plot to the bottom of the partial dependency plot to see where data-rich and data-sparse regions are present.

Individual Conditional Expectation (ICE)

ICE is similar to partial dependency plots, except that a separate line is plotted for each instance in the dataset. Thus, the partial dependency plot gives us an averaged view of how the output depends on a feature, whereas ICE lets us see this dependence for each individual instance. This is useful when interactions between features are present: they can be masked when looking at the average result but become very apparent when using ICE.


An example of individual conditional expectation plots for bike rentals with respect to temperature, humidity, and wind speed. None of the plots shows much heterogeneity between instances, so it is unlikely that any significant interaction terms are present. Source

Other variants of ICE plots exist, such as centered and derivative ICE plots, but they essentially provide the same information in different forms.
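ICE curves come from the same computation as the partial dependency plot, just without the final average. The sketch below (NumPy, with a hypothetical model containing a pure interaction x₀·x₁) shows how the average can hide an interaction that the individual curves reveal, and how centered ICE curves are obtained by subtracting each curve’s starting value.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Row i holds the predictions for instance i as `feature` sweeps over `grid`."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        curves[:, j] = predict(X_mod)
    return curves

# Hypothetical model with a pure interaction: the effect of x0 depends on x1.
def model(X):
    return X[:, 0] * X[:, 1]

X = np.column_stack([np.linspace(-1.0, 1.0, 5), np.array([-2.0, -1.0, 0.0, 1.0, 2.0])])
grid = np.linspace(-1.0, 1.0, 3)
ice = ice_curves(model, X, feature=0, grid=grid)
pdp = ice.mean(axis=0)        # the partial dependency curve is the average ICE curve
centered = ice - ice[:, :1]   # centered ICE: every curve starts at zero
print(pdp)  # flat at zero: averaging hides the interaction
print(ice)  # one line per instance, with slopes -2, -1, 0, 1, 2
```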

Shapley Values

The Shapley value is a concept from cooperative game theory, introduced by Lloyd Shapley in 1953. In cooperative game theory, the Shapley value distributes the total payout fairly among the players according to each player’s average marginal contribution over all possible orderings of the players. When applied to machine learning, we treat each feature as a player in a game in which the features work together to produce the prediction, which can be considered the payout. The Shapley value assigns a portion of the payout to each feature based on its contribution to the output value.

For example, if you are looking at house prices and you remove a single feature from the analysis, how does this affect the model prediction? If the predicted value goes down by some amount, we can infer that this feature contributed that much to the prediction. Of course, it is not quite that simple: we must perform this computation for every possible combination of features, which means we need to run 2^x models, where x is the number of features.

Thus, the Shapley value is the average marginal contribution of a feature value across all possible coalitions.

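$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right) \right]$$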

Equation for the Shapley value, ϕ, from cooperative game theory.

This equation may look daunting, so let’s examine it piece by piece from right to left. To find the marginal contribution of feature xᵢ, we take a subset of features, S, that does not contain xᵢ, calculate the model’s prediction using the features in S, and subtract this from the prediction obtained when xᵢ is added to S. We then weight each of these differences according to the number of feature orderings it represents and sum them over all such subsets. The result is essentially the average contribution of a feature for a trained model over every possible subset of features.
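The procedure just described can be written as a brute-force enumeration. This is a minimal sketch, assuming the "model trained on subset S" is abstracted as a hypothetical `value` function that maps a coalition (a frozenset of feature indices) to a payout. The toy game below, with made-up contributions of 10, 20, and 5 plus a bonus of 6 when players 0 and 1 are both present, is purely illustrative. Because every subset of the other features is visited, the cost grows as 2^(x-1) evaluations per feature.

```python
from itertools import combinations
from math import factorial

def exact_shapley(value, n_players):
    """Exact Shapley values by enumerating every coalition.

    `value(S)` returns the payout of coalition S (a frozenset of player
    indices); for a model, S is a subset of features and the payout is a
    prediction made using only those features.
    """
    n = n_players
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight |S|! (|F| - |S| - 1)! / |F|! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                S = frozenset(subset)
                phi[i] += weight * (value(S | {i}) - value(S))  # marginal contribution of i
    return phi

# Hypothetical game: additive contributions plus a bonus when players 0 and 1 cooperate
contributions = [10.0, 20.0, 5.0]

def toy_value(S):
    bonus = 6.0 if {0, 1} <= S else 0.0
    return sum(contributions[j] for j in S) + bonus

phi = exact_shapley(toy_value, 3)  # ≈ [13.0, 23.0, 5.0]
```

The bonus is split equally between players 0 and 1, and the values sum to the payout of the full coalition minus that of the empty one (41).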

This discussion may seem quite abstract, so an example helps. The example in Christoph’s book, which involves house prices, is an excellent one to consider. Suppose the features for predicting an apartment’s price are (1) the size of the apartment (numeric), (2) the proximity to a nearby park (binary), and (3) the floor of the building the apartment is on. To calculate the Shapley value of each feature, we take every possible subset of features and predict the output in each case (including the case with no features). We then sum the weighted marginal contributions of each feature.


All possible feature coalitions that must be considered to calculate the Shapley value for the simple house price prediction model. Source
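For the house price example, one common way to define the payout of a coalition without retraining a model for every subset (a swapped-in technique, not necessarily the one behind the figure) is to fix the coalition's features at the apartment's values and average the model's predictions over a background dataset for the remaining features. In the sketch below, the linear price model, its coefficients, the background rows, and the apartment being explained are all hypothetical. For a linear model this choice has a known closed form, φⱼ = βⱼ(xⱼ minus the background mean of feature j), which makes a convenient sanity check.

```python
from itertools import combinations
from math import factorial
import numpy as np

# Hypothetical features: size in m^2, park nearby (0/1), floor number
COEF = np.array([3_000.0, 20_000.0, 1_000.0])  # made-up coefficients

def price_model(X):
    """Hypothetical linear house-price model."""
    return 50_000.0 + X @ COEF

def coalition_value(model, x, background, S):
    """Mean prediction with features in S fixed to x, the rest taken from background."""
    Z = background.copy()
    idx = sorted(S)
    if idx:
        Z[:, idx] = x[idx]
    return model(Z).mean()

def shapley_for_instance(model, x, background):
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                S = set(subset)
                gain = (coalition_value(model, x, background, S | {i})
                        - coalition_value(model, x, background, S))
                phi[i] += weight * gain
    return phi

background = np.array([[50.0, 0.0, 1.0],
                       [80.0, 1.0, 3.0],
                       [65.0, 0.0, 2.0],
                       [100.0, 1.0, 5.0]])
apartment = np.array([70.0, 1.0, 4.0])

phi = shapley_for_instance(price_model, apartment, background)
```

Here the hypothetical apartment is slightly smaller than the background average, so size pulls its predicted price down, while the nearby park and the higher floor push it up.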

A player can be an individual feature value, e.g. for tabular data, but a player can also be a group of feature values. For example, to explain an image, pixels can be grouped to superpixels and the prediction distributed among them.

As far as I know, there is no official package for Shapley values in Python, but there are some repositories available that have implemented it for machine learning. One such package can be found here.

The main disadvantage of the Shapley value is that it is very computationally expensive and time-consuming for large numbers of features, because the number of possible feature coalitions grows exponentially as the number of features grows linearly. Thus, for applications where the number of features is very large, the Shapley value is typically approximated using a sample of feature permutations.
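A common way to do this approximation is permutation sampling: draw random orderings of the features, add them one at a time, and credit each feature with the change in payout when it joins. The sketch below is a minimal version of that estimator, not the method of any particular package. The `value` function and the toy game (made-up contributions of 10, 20, and 5 plus a bonus of 6 when players 0 and 1 are both present) are hypothetical; the exact Shapley values of this game are 13, 23, and 5, so the estimate can be compared against them.

```python
import random

def sampled_shapley(value, n_players, n_permutations=200, seed=0):
    """Monte Carlo Shapley estimate from randomly sampled feature orderings."""
    rng = random.Random(seed)
    players = list(range(n_players))
    phi = [0.0] * n_players
    for _ in range(n_permutations):
        rng.shuffle(players)
        coalition = frozenset()
        previous = value(coalition)
        for p in players:
            coalition = coalition | {p}
            current = value(coalition)
            phi[p] += current - previous  # marginal contribution of p in this ordering
            previous = current
    return [total / n_permutations for total in phi]

# Hypothetical game: additive contributions plus a bonus when players 0 and 1 cooperate
contributions = [10.0, 20.0, 5.0]

def toy_value(S):
    bonus = 6.0 if {0, 1} <= S else 0.0
    return sum(contributions[j] for j in S) + bonus

estimate = sampled_shapley(toy_value, 3)
```

Each sampled ordering costs x + 1 evaluations of `value`, so the total cost is controlled by `n_permutations` rather than growing exponentially with the number of features.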


Anchors

Anchors were first introduced in a 2018 paper by Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin, the same researchers who created LIME. The method has its own Python package, developed by Marco, and is also available in the ALIBI package for Python.

Anchors address a key shortcoming of local explanation methods such as LIME, which approximate the local behavior of the model with a linear proxy. It is, however, unclear to what extent such an explanation holds up in the region around the instance being explained, since both the model and the data can exhibit non-linear behavior in the neighborhood of the instance. This can easily lead to overconfidence in the explanation and to misleading conclusions about unseen but similar instances. The anchor algorithm tackles this issue by incorporating coverage, the region where the explanation applies, into the optimization problem.
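The two quantities an anchor trades off can be estimated empirically: precision (how often the prediction stays the same when the anchor holds) and coverage (how much of the data the anchor applies to). The sketch below only scores a given candidate rule; the paper searches over candidate rules with a bandit-based beam search, which is not shown. As a simplifying assumption, samples "conditioned on the anchor" are approximated by the rows of a dataset that satisfy the rule. The classifier, data, and rules are hypothetical.

```python
import numpy as np

def anchor_precision_coverage(predict, x, data, rule):
    """Estimate the precision and coverage of a candidate anchor rule.

    `rule` is a list of (feature_index, low, high) predicates; a row satisfies
    the anchor when low < row[feature_index] <= high for every predicate.
    Simplification: samples conditioned on the anchor are approximated by
    the rows of `data` that satisfy the rule.
    """
    mask = np.ones(len(data), dtype=bool)
    for j, low, high in rule:
        mask &= (data[:, j] > low) & (data[:, j] <= high)
    coverage = mask.mean()  # fraction of the data where the rule applies
    if not mask.any():
        return 0.0, 0.0
    target = predict(x[None, :])[0]
    precision = (predict(data[mask]) == target).mean()  # how often the prediction is unchanged
    return precision, coverage

# Hypothetical classifier: predicts 1 only when both x0 and x1 exceed 0.5
def clf(X):
    return ((X[:, 0] > 0.5) & (X[:, 1] > 0.5)).astype(int)

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=(1000, 3))
x = np.array([0.9, 0.8, 0.1])

p1, c1 = anchor_precision_coverage(clf, x, data, [(0, 0.5, 1.0)])                # not sufficient alone
p2, c2 = anchor_precision_coverage(clf, x, data, [(0, 0.5, 1.0), (1, 0.5, 1.0)])  # precision is 1.0
```

Adding the second predicate raises precision to 1.0 at the cost of lower coverage, which is the trade-off the anchor algorithm optimizes.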

Similar to LIME, anchors can be used on text, tabular, and image data. For images, we first segment them into superpixels whilst still maintaining local image structure. The interpretable representation then consists of the presence or absence of each superpixel in the anchor. Several image segmentation techniques can be used to split an image into superpixels; the algorithm supports standard ones (felzenszwalb, slic, and quickshift) and allows the user to provide a custom segmentation function.


Anchor of a beagle being superimposed on other image backgrounds without predictive accuracy being reduced when classified using the Inception network. Source
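The beagle experiment amounts to copying the anchor superpixels onto a different background and checking that the classifier's prediction is unchanged. The sketch below shows only the image-manipulation part with NumPy and omits the classifier call. `grid_segments` is a hypothetical stand-in that splits the image into square cells; a real pipeline would use a segmentation algorithm such as slic or quickshift (for example, scikit-image's `skimage.segmentation.slic` returns a label array of the same kind).

```python
import numpy as np

def grid_segments(height, width, cell):
    """Hypothetical stand-in for slic/quickshift: square superpixels of side `cell`."""
    rows = np.arange(height) // cell
    cols = np.arange(width) // cell
    n_cols = -(-width // cell)  # ceiling division
    return rows[:, None] * n_cols + cols[None, :]

def superimpose_anchor(image, background, segments, anchor_ids):
    """Keep the anchor superpixels of `image`; take every other pixel from `background`."""
    mask = np.isin(segments, list(anchor_ids))
    if image.ndim == 3:  # broadcast the mask over colour channels
        mask = mask[..., None]
    return np.where(mask, image, background)

# Toy 4x4 RGB images: a white "object" image and a black "background" image
image = np.ones((4, 4, 3))
background = np.zeros((4, 4, 3))
segments = grid_segments(4, 4, cell=2)  # four 2x2 superpixels labelled 0..3

composite = superimpose_anchor(image, background, segments, anchor_ids={0})
```

Only the top-left superpixel of `composite` comes from `image`; feeding such composites to the classifier and checking that the label is unchanged is how an image anchor can be tested.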


Counterfactuals

Counterfactuals are the opposite of anchors. Anchors are features that when present are sufficient to anchor a prediction (i.e. prevent it from being changed by altering other features). In the anchor section, we looked at an example where these anchors were superpixels of an image. Every superpixel in the image that was not part of an anchor was, in fact, a counterfactual — we can alter the prediction by altering the counterfactuals, and not by altering the anchors.

Counterfactuals were first proposed in the Wachter et al. 2017 paper titled “Counterfactual explanations without opening the black box: Automated decisions and the GDPR”. The basic idea of counterfactuals is to find the smallest change we can make to the smallest number of features in order to obtain the desired output.

A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output.


What is a counterfactual? It is the smallest change to our feature space that allows us to cross a decision boundary. Source

This may sound like an underdefined task, as there are many ways in which we could alter our instance in order for it to meet our desired output. This phenomenon is known as the ‘Rashomon effect’ and as a result, we must cast our problem in the form of an optimization problem. Firstly, we want to ensure that we change as few features as possible, and change these features by the smallest amount possible, whilst also maintaining instances that are likely given the joint distribution of the data. The loss function for our optimization problem can be cast as

$$L(x, x', y', \lambda) = \lambda \cdot \left(\hat{f}(x') - y'\right)^2 + d(x, x')$$

The loss function to be minimized as part of the counterfactual optimization problem.

The first term of the loss function is the quadratic distance between the model prediction for the counterfactual, f̂(x’), and the desired output, y’. The second term is a distance metric between the original instance x and the counterfactual instance x’. The quadratic term is weighted by a scaling parameter, λ, which balances the importance of reaching the desired prediction against the distance between the original instance x and the counterfactual instance x’.

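The distance metric we use is the Manhattan distance, because the counterfactual should not only be close to the original instance but should also change as few features as possible. The distance function is described as

$$d(x, x') = \sum_{j=1}^{p} \frac{|x_j - x'_j|}{MAD_j}, \qquad MAD_j = \operatorname{median}_{i \in \{1, \dots, n\}} \left( \left| x_{i,j} - \operatorname{median}_{l \in \{1, \dots, n\}} \left(x_{l,j}\right) \right| \right)$$

This is the Manhattan distance scaled using the median absolute deviation (MAD) of each feature, where p is the number of features and n is the number of instances in the dataset.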
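The loss and the MAD-scaled distance described above can be sketched in a few lines of NumPy. This assumes the model is a hypothetical `predict` function that maps a single instance to a scalar output; the toy dataset and linear model are made up. A feature whose MAD is zero would cause a division by zero, which this sketch does not handle.

```python
import numpy as np

def mad(data):
    """Median absolute deviation of each feature (column) in `data`."""
    return np.median(np.abs(data - np.median(data, axis=0)), axis=0)

def mad_manhattan(x, x_cf, mad_per_feature):
    """Manhattan distance with each feature's change divided by its MAD.
    A feature with zero MAD would cause division by zero (not handled here)."""
    return np.sum(np.abs(x - x_cf) / mad_per_feature)

def counterfactual_loss(predict, x, x_cf, y_target, lam, mad_per_feature):
    """lam * (f(x_cf) - y_target)^2 + d(x, x_cf)"""
    return lam * (predict(x_cf) - y_target) ** 2 + mad_manhattan(x, x_cf, mad_per_feature)

# Made-up data: feature 1 varies ten times more than feature 0
data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
scale = mad(data)  # [1.0, 10.0]

x = np.array([3.0, 30.0])
# Moving feature 0 by 1 and moving feature 1 by 10 cost the same: one MAD each
d0 = mad_manhattan(x, np.array([4.0, 30.0]), scale)  # 1.0
d1 = mad_manhattan(x, np.array([3.0, 40.0]), scale)  # 1.0

predict = lambda z: 0.1 * z[0] + 0.01 * z[1]  # hypothetical model
loss = counterfactual_loss(predict, x, np.array([4.0, 30.0]), 1.0, 10.0, scale)  # ≈ 1.9
```

Scaling by the MAD puts features measured in different units on a comparable footing, so "one unit of change" means roughly one typical deviation for every feature.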

With a small scaling parameter, the distance term becomes more important, and we prefer counterfactuals that are close to the original instance. With a large scaling parameter, the prediction term becomes more important, and we are more lenient about how close the counterfactual is to the original instance.

When we run the algorithm, we do not need to select a value for the scaling parameter ourselves. Instead, the authors suggest that the user specify a tolerance, ϵ, which represents how far we will allow the prediction to be from the desired output. This is represented as

$$\left| \hat{f}(x') - y' \right| \le \epsilon$$

An additional constraint of our optimization problem.

Our optimization problem can then succinctly be described as

$$\arg\min_{x'} \max_{\lambda} L(x, x', y', \lambda)$$

Our goal is to find the counterfactual x’ that minimizes our overall loss function whilst varying the scaling parameter λ.

The optimization mechanism for counterfactuals can be described as a ‘growing spheres’ approach, in which the user provides the input instance, x, the desired output, y’, and the tolerance, ϵ. Initially, the scaling parameter, λ, is set to a small value. A random instance within the current ‘sphere’ of allowed counterfactuals is sampled and used as a starting point for optimization until the instance satisfies the constraint above (i.e. the difference between the prediction and the desired output is within our tolerance). We then add this instance to our list of counterfactuals and increase the value of λ, which effectively grows the size of our ‘sphere’. We repeat this process, generating a list of counterfactuals. At the end of the procedure, we select the counterfactual which minimizes the loss function.
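A search of this kind can be sketched with SciPy's `scipy.optimize.minimize`. This is a minimal sketch under several assumptions that are illustrative rather than taken from the paper or from ALIBI: the model is a hypothetical `predict` function returning a scalar; the random starting point is drawn from a normal distribution scaled by each feature's MAD; λ starts at 0.1 and doubles each round until the tolerance is met; Nelder-Mead is used because the L1 distance term is not differentiable; and among the valid candidates from several random starts, the sketch keeps the one closest to x (a simplification of selecting by loss).

```python
import numpy as np
from scipy.optimize import minimize

def mad_manhattan(x, x_cf, mad_per_feature):
    # Manhattan distance with each feature scaled by its median absolute deviation
    return np.sum(np.abs(x - x_cf) / mad_per_feature)

def find_counterfactual(predict, x, y_target, mad_per_feature, rng,
                        eps=0.05, lam=0.1, lam_growth=2.0, max_rounds=30):
    """One run: random start, then re-optimise with a growing lambda until the
    prediction is within eps of y_target. Returns (x_cf, lam) or (None, lam)."""
    def loss(x_cf, lam):
        return lam * (predict(x_cf) - y_target) ** 2 + mad_manhattan(x, x_cf, mad_per_feature)

    x_cf = x + rng.normal(scale=mad_per_feature)  # random starting instance near x
    for _ in range(max_rounds):
        # Nelder-Mead needs no gradients, which suits the non-smooth L1 term
        x_cf = minimize(loss, x_cf, args=(lam,), method="Nelder-Mead").x
        if abs(predict(x_cf) - y_target) <= eps:
            return x_cf, lam
        lam *= lam_growth  # weight the prediction term more heavily
    return None, lam

def counterfactual_search(predict, x, y_target, mad_per_feature, n_starts=5, eps=0.05, seed=0):
    """Repeat the search from several random starts; keep the valid
    counterfactual closest to x under the MAD-scaled distance."""
    rng = np.random.default_rng(seed)
    candidates = []
    for _ in range(n_starts):
        x_cf, _ = find_counterfactual(predict, x, y_target, mad_per_feature, rng, eps=eps)
        if x_cf is not None:
            candidates.append(x_cf)
    if not candidates:
        return None
    return min(candidates, key=lambda c: mad_manhattan(x, c, mad_per_feature))

# Hypothetical linear score; feature 1 has twice the MAD of feature 0,
# so moving feature 1 is "cheaper" under the scaled distance.
score = lambda z: z[0] + z[1]
x = np.array([0.0, 0.0])
x_cf = counterfactual_search(score, x, y_target=1.0, mad_per_feature=np.array([1.0, 2.0]))
```

With these made-up MADs, the counterfactual found should move mostly feature 1, since each unit of change there costs half as much as a unit of change in feature 0.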

Counterfactuals are implemented in the Python package ALIBI, which you can read about here (they also have an alternate description that may be helpful and clearer than my own).

Other Techniques

There are other techniques that I have not touched upon here which I refer the interested reader to. These include, but are not limited to:

Accumulated Local Effects

Feature Importance

Dimensional Reduction Techniques (PCA, t-SNE)

SHapley Additive exPlanations (SHAP)

Model Distillation

A good repository of topics on machine learning interpretability can also be found on this GitHub page which covers papers, lectures, and other blogs with material on the subject.

Final Comments

Deep learning visualization is a complex topic that has only begun to be researched in the last few years. However, it will become more important as deep learning techniques become more integrated into our data-driven society. Many of us may value performance over understanding, but I think that being able to interpret and explain models will provide a competitive edge for individuals and companies in the future; there will certainly be a market for it.

Visualization is neither the only nor the best method of interpreting or explaining the results of deep neural networks, but it is certainly one method, and it can provide useful insight into the decision-making process of complex networks.

“The problem is that a single metric, such as classification accuracy, is an incomplete description of most real-world tasks.” — Doshi-Velez and Kim 2017


References

Here are papers that I referenced in this article as well as papers I think the reader may find informative on the topic of algorithmic interpretability and explainability.

[1] Towards A Rigorous Science of Interpretable Machine Learning — Doshi-Velez and Kim, 2017

[2] The Mythos of Model Interpretability — Lipton, 2017

[3] Transparency: Motivations and Challenges — Weller, 2019

[4] An Evaluation of the Human-Interpretability of Explanation — Lage et al., 2019

[5] Manipulating and Measuring Model Interpretability — Poursabzi-Sangdeh et al., 2018

[6] Interpretable Classifiers Using Rules and Bayesian Analysis: Building a Better Stroke Prediction Model — Letham and Rudin, 2015

[7] Interpretable Decision Sets: A Joint Framework for Description and Prediction — Lakkaraju et al., 2016

[8] Deep Learning for Case-Based Reasoning through Prototypes: A Neural Network that Explains Its Predictions — Li et al., 2017

[9] The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification — Kim et al., 2014

[10] Learning Optimized Risk Scores — Ustun and Rudin, 2017

[11] Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission — Caruana et al., 2015

[12] “Why Should I Trust You?” Explaining the Predictions of Any Classifier — Ribeiro et al., 2016

[13] Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead — Rudin, 2019

[14] Interpretation of Neural Networks is Fragile — Ghorbani et al., 2019

[15] Visualizing Deep Neural Network Decisions: Prediction Difference Analysis — Zintgraf et al., 2017

[16] Sanity Checks for Saliency Maps — Adebayo et al., 2018

[17] A Unified Approach to Interpreting Model Predictions — Lundberg and Lee, 2017

[18] Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) — Kim et al., 2018

[19] Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR — Wachter et al., 2018

[20] Actionable Recourse in Linear Classification — Ustun et al., 2018

[21] Causal Interpretations of Black-Box Models — Zhao and Hastie, 2018

[22] Learning Cost-Effective and Interpretable Treatment Regimes — Lakkaraju and Rudin, 2017

[23] Human-in-the-Loop Interpretability Prior — Lage et al., 2018

[24] Faithful and Customizable Explanations of Black Box Models — Lakkaraju et al., 2019

[25] Understanding Black-box Predictions via Influence Functions — Koh and Liang, 2017

[26] Simplicity Creates Inequity: Implications for Fairness, Stereotypes, and Interpretability — Kleinberg and Mullainathan, 2019

[27] Understanding Neural Networks Through Deep Visualization — Yosinski et al., 2015

[28] Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps — Simonyan, Vedaldi, and Zisserman, 2014

[29] Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks — Nguyen, Yosinski, and Clune, 2016

[30] Explanation in Artificial Intelligence: Insights from the Social Sciences — Miller, 2017

[31] Examples are not Enough, Learn to Criticize! Criticism for Interpretability — Kim, Khanna, and Koyejo, 2016

[32] What’s Inside the Black Box? AI Challenges for Lawyers and Researchers — Yu and Spina Ali, 2019

This article was originally published on Towards Data Science and re-published to TOPBOTS with permission from the author.




Beyond Limits and The Carnrite Group Create Alliance to Drive AI Innovation in Oil & Gas, Utilities, Power and Industrial Sectors.




Beyond Limits, an industrial and enterprise-grade AI technology company built for the most demanding sectors, and The Carnrite Group, a leading management consulting firm focused on the energy and industrials sectors, today announced a strategic alliance.

Under the new multimillion-dollar revenue driving agreement, The Carnrite Group and Beyond Limits will provide strategic consulting services on the state of AI technologies and innovative use cases for Carnrite’s client base across the globe in the oil and gas, utilities, power and industrials sectors. Additionally, The Carnrite Group will receive IP licensing rights to Beyond Limits’ cutting-edge Cognitive AI technology, providing customers with direct access to Beyond Limits’ solutions.

“This is a very exciting time for Beyond Limits to gain such a valuable partner as The Carnrite Group,” said AJ Abdallat, CEO and founder of Beyond Limits. “Through Carnrite’s vast network, we hope to provide valuable guidance and increase awareness of the benefits of AI in critical sectors, including boosting operational insights, improving operating conditions, and ultimately, increase adoption of this next generation technology.”

Many sectors are experiencing a significant surge in demand for AI. This is particularly true in the energy and industrial sectors, where continued commodity price volatility has forced companies to find innovative ways to further reduce costs. The AI market is expected to rise to $7.78 billion by 2024, an increase of 22.49% from 2019.

“The Carnrite Group prides itself on helping clients address complex challenges and make difficult business decisions,” said Al Carnrite, CEO of The Carnrite Group. “Our agreement with Beyond Limits allows us to add their powerful Cognitive AI to our portfolio of consulting services while reinforcing our commitment to offer technologies that create value for our clients.”

Source: AJ Abdallat, CEO and founder, Beyond Limits.



LALAL.AI – AI-Powered High-Quality Audio Splitting | Review




Are you a music lover, musician, sound producer, or someone else in the music industry who keeps experimenting with new vocals to get the best output for your next music track? Are you looking for a platform that cleanly separates vocals and instruments from existing tracks?

“Then LALAL.AI is the preferred next-generation music separation platform for quick and precise stem extraction, making it easy to separate instrumental and vocal tracks.”

Let’s dive into the details.


LALAL.AI was developed by a team of specialists in the emerging fields of artificial intelligence, machine learning, mathematical optimization, and data signal processing. It uses a unique neural network, based on 20TB of data, with a machine-learning algorithm that identifies and extracts vocal tracks and instrumentals from music tracks.

The tool is especially useful for people in the music industry, such as DJs, sound producers, singers, and musicians, and even for karaoke lovers.

With LALAL.AI, users can do many things: separate backing tracks and vocals from songs and podcasts, create karaoke song packs, extract movie lines for translation, and more.

A new processing filter that improves the experience and the signal separation quality has been added; the filter has three processing levels: Mild, Normal, and Aggressive.

Key Features:

> Audio splitting with superior performance

> AI-Powered user-friendly vocal remover

> No third party software involved

> Easy to Use

> API Integration

Process Steps Involved:

> Open LALAL.AI in your browser

> Drag and Drop the audio file of your choice

> Let LALAL.AI do the separation

> Download your tracks separately. (vocal and instrumental)

How It Works (steps involved)

The output file format is the same as the format you uploaded: if you upload an MP3 file, you get the output in MP3, and so on.


Packages Involved:

LALAL.AI currently offers three packages:

> Lite

> Professional

> On-Demand


LALAL.AI is still experimenting with API integration for audio splitting and other purposes, and is sharing new ideas and solutions to make life easier for millions of people. You can check this comparison test with Spleeter here and their latest press releases here and here.




10 Ways Machine Learning Practitioners Can Build Fairer Systems





Skyler Wharton (@skylerwharton)

Software Engineer (ML & Backend) @ Airbnb. My opinions are my own. [they/them]

An introduction to the harm that ML systems cause and to the power imbalance that exists between ML system developers and ML system participants …and 10 concrete ways for machine learning practitioners to help build fairer ML systems.

Image caption: Photo by Koshu Kunii on Unsplash. Image description: Photo of Black Lives Matter protesters in Washington, D.C. — 2 signs say “Black Lives Matter” and “White Silence is Violence.”

Machine learning systems are increasingly used as tools of oppression. All too often, they’re used in high-stakes processes without participants’ consent and with no reasonable opportunity for participants to contest the system’s decisions — like when risk assessment systems are used by child welfare services to identify at-risk children; when a machine learning (or “ML”) model decides who sees which online ads for employment, housing, or credit opportunities; or when facial recognition systems are used to surveil neighborhoods where Black and Brown people live.

ML systems are deployed widely because they are viewed as “neutral” and “objective.”

In reality though, machine learning systems reflect the beliefs and biases of those who design and develop them.

As a result, ML systems mirror and amplify the beliefs and biases of their designers, and are at least as susceptible to making mistakes as human arbiters.

When ML systems are deployed at scale, they cause harm — especially when their decisions are wrong. This harm is disproportionately felt by members of marginalized communities [1]. This is especially evident in this moment, when people protesting as part of the global movement for Black Lives are being tracked by police departments using facial recognition systems [2] and when an ML system was recently used to determine students’ A-level grades in the U.K. after the tests were cancelled due to the pandemic, jeopardizing the futures of poorer students, many of whom are people of color and immigrants [3].

In this post, I’ll describe some examples of harm caused by machine learning systems. Then I’ll offer some concrete recommendations and resources that machine learning practitioners can use to develop fairer machine learning systems. I hope this post encourages other machine learning practitioners to start using and educating their peers about practices for developing fairer ML systems within their teams and companies.

How machine learning systems cause harm

In June 2020, Robert Williams, a Black man, was arrested by the Detroit Police Department because a facial recognition system identified him as the person who committed a recent shoplifting; however, visual comparison of his face to the face in the photo clearly revealed that they weren’t the same person [4].

Nevertheless, Mr. Williams was arrested, interrogated, kept in custody for more than 24 hours, released on bail paid with his own money, and had to go to court before his case was dismissed.

This “accident” significantly harmed Mr. Williams and his family:

  • He felt humiliated and embarrassed. When interviewed by the New York Times about this incident, he said, “My mother doesn’t know about it. It’s not something I’m proud of … It’s humiliating.”
  • It caused lasting trauma to him and his family. Had Mr. Williams resisted arrest — which would have been reasonable given that it was unjust — he could have been killed. As it was, the experience was harrowing. He and his wife now wonder whether they need to put their two young daughters into therapy.
  • It put his job — and thus his ability to support himself and his family — at risk. He could have lost his job, even though his case was ultimately dismissed; companies have fired employees with impunity for far less. Fortunately, his boss was understanding of the situation, but his boss still advised him not to tell others at work.
  • It nearly resulted in him having a permanent criminal record. When Mr. Williams went to court, his case was initially dismissed “without prejudice,” which meant that he could still be charged later. Only after the false positive received widespread media attention did the prosecutor apologize and offer to expunge his record and fingerprints.

The harms caused here by a facial recognition system used by a local police department are unacceptable.

Facebook’s ad delivery system is another example of a harmful machine learning system. In 2019, Dr. Piotr Sapieżyński, a research scientist at Northeastern University, and his collaborators conducted an experiment using Facebook’s own marketing tools to discover how employment ads are distributed on Facebook [5, 6]. Through this experiment, they discovered that Facebook’s ad delivery system, despite neutral targeting preferences, shows significantly different job ads to each user depending upon their gender and race. In other words, even if an advertiser specifies that they want their ad to be seen uniformly by all genders and all races, Facebook’s ad delivery system will, depending on the content of the ad, show the ad to a race- and/or gender-skewed audience.

Specifically, Dr. Sapieżyński and collaborators discovered that women are more likely to receive ads for supermarket, janitor, and preschool jobs, whereas men are more likely to receive ads for taxi, artificial intelligence, and lumber jobs. (The researchers acknowledge that the study was limited to binary genders due to restrictions in Facebook’s advertising tools.) They similarly discovered that Black people are more likely to receive ads for taxi, janitor, and restaurant jobs, whereas white people are more likely to receive ads for secretary, artificial intelligence, and lumber jobs.

Facebook’s ad delivery system is an example of a consumer-facing ML system that causes harm to those who participate in it:

  • It perpetuates and amplifies gender- and race-based employment stereotypes for people who use Facebook. For example, women are shown ads for jobs that have historically been associated with “womanhood” (e.g., caregiving or cleaning jobs); seeing such ads reinforces their own — and also other genders’ — perceptions of jobs that women can or “should” do. This is also the case for the ads shown to Black people.
  • It restricts Black users’ and women users’ access to economic opportunity. The advertisements that Facebook shows to Black people and women are for noticeably lower-paying jobs. If Black people and women do not even know about available higher-paying jobs, then they are unable to apply for and be hired for them.

The harms caused by Facebook’s ad delivery system are also unacceptable.

Broader context

In the case of both aforementioned algorithmic systems, the harm they cause goes deeper: they amplify existing systems of oppression, often in the name of “neutrality” and “objectivity.” In other words, the examples above are not isolated incidents; they contribute to long-standing patterns of harm.

For example, Black people, especially Black men and Black masculine people, have been systematically overpoliced, targeted, and murdered for the last four hundred years. This is undoubtedly still true, as evidenced by the recent murders by the police of George Floyd, Breonna Taylor, Tony McDade, and Ahmaud Arbery and recent shooting by the police of Jacob Blake.

Commercial facial recognition systems allow police departments to more easily and subtly target Black men and masculine people, including to target them at scale. A facial recognition system can identify more “criminals” in an hour than a hundred police officers could in a month, and it can do so less expensively. Thus, commercial facial recognition systems allow police departments to “mass produce” their practice of overpolicing, targeting, and murdering Black people.

Moreover, in 2018, computer science researchers Joy Buolamwini and Dr. Timnit Gebru showed that commercial facial recognition systems are significantly less accurate for darker-skinned people than they are for lighter-skinned people [7]. Indeed, when used for surveillance, facial recognition systems identify the wrong person up to 98% of the time [8]. As a result, when allowed to be used by police departments, commercial facial recognition systems cause harm not only by “scaling” police forces’ discriminatory practices but also by identifying the wrong person the majority of the time.

Facebook’s ad delivery system also amplifies a well-documented system of oppression: wealth inequality by race. In the United States, the median adjusted household income of white and Asian households is 1.6x greater than that of Black and Hispanic households (~$71K vs. $43K), and the median net worth of white households is 13x greater than that of Black households (~$144K vs. $11K) [9]. Thus, by consistently showing ads for only lower-paying jobs to the millions of Black people who use Facebook, Facebook is entrenching and widening the wealth gap between Black people and more affluent demographic groups (especially white people) in the United States. Facebook’s ad delivery system is likely similarly amplifying wealth inequities in other countries around the world.

How collecting labels for machine learning systems causes harm

Harm is not only caused by machine learning systems that have been deployed; harm is also caused while machine learning systems are being developed. That is, harm is often caused while labels are being collected for the purpose of training machine learning models.

For example, in February 2019, The Verge’s Casey Newton released a piece about the working conditions inside Cognizant, a vendor that Facebook hires to label and moderate Facebook content [10]. His findings were shocking: Facebook was essentially running a digital sweatshop.

What he discovered:

  • Employees were underpaid: In Phoenix, AZ, a moderator made $28,800/year (versus the $240,000/year total compensation of a full-time Facebook employee).
  • Working conditions at Cognizant were abysmal: Employees were often fired after making just a few mistakes a week. Since a “mistake” occurred when two employees disagreed about how a piece of content should be moderated, resentment grew between employees. Fired employees often threatened to return to work and harm their old colleagues. Additionally, employees were micromanaged: they got two 15-minute breaks and one 30-minute lunch per day. Much of their break time was spent waiting in line for the bathroom, as often >500 people had to share six bathroom stalls.
  • Employees’ mental health was damaged: Moderators spent most of their time reviewing graphically violent or hateful content, including animal abuse, child abuse, and murders. As a result of watching six hours per day of violent or hateful content, employees developed severe anxiety, often while still in training. After leaving the company, employees developed symptoms of PTSD. While employed, employees had access to only nine minutes of mental health support per day; after they left the company, they had no mental health support from Facebook or Cognizant.

Similar harms are caused by crowdsourcing platforms like Amazon Mechanical Turk, through which individuals, academic labs, or companies submit tasks for “crowdworkers” to complete:

  • Employees are underpaid. Mechanical Turk and other similar platforms are premised on a large amount of unpaid labor: workers are not paid to find tasks, for tasks they start but can’t complete due to vague instructions, for tasks rejected by task authors for often arbitrary reasons, or for breaks. As a result, the median wage for a crowdworker on Mechanical Turk is approximately $2/hour [11]. Workers who do not live in the United States, are women, and/or are disabled are likely to earn much less per hour [12].
  • Working conditions are abysmal. Workers’ income fluctuates over time, so they can’t plan for themselves or their families for the long-term; workers don’t get healthcare or any other benefits; and workers have no legal protections.
  • Employees’ mental health is damaged. Crowdworkers often struggle to find enough well-paying tasks, which causes stress and anxiety. For example, workers report waking up at 2 or 3am in order to get tasks that pay better [11].

Contrary to popular belief, many people who complete tasks on crowdsourcing platforms do so in order to earn the bulk of their income. Thus, people who work for private labeling companies like Cognizant and people who work for crowdsourcing platforms like Mechanical Turk have a similar goal: to complete labeling tasks in a safe and healthy work environment in exchange for fair wages.

Why these harms are happening

At this point, you might be asking yourself, “Why are these harms happening?” The answer is multifaceted: there are many reasons why deployed machine learning systems cause harm to their participants.

When ML systems are used

A big reason that machine learning systems cause harm is the contexts in which they’re used. Because machine learning systems are considered “neutral” and “objective,” they’re often used in high-stakes decision processes as a way to save money. High-stakes decision processes are inherently more likely to cause harm, since a mistake made during the decision process could have a significant negative impact on someone’s life.

At best, introducing a machine learning system into a high-stakes decision process does not affect the probability that the system causes harm; at worst, it increases the probability of harm, due to machine learning models’ tendency to amplify biases against marginalized groups, human complacency around auditing the model’s decisions (since they’re “neutral” and “objective”), and the fact that machine learning models’ decisions are often uninterpretable.

How ML systems are designed

Machine learning systems also cause harm because of how they’re designed. For example, when designing a system, engineers often do not account for the possibility that the system could make an incorrect decision; thus, machine learning systems often do not include a mechanism for participants to feasibly contest the decision or seek recourse.

Whose perspectives are centered when ML systems are designed

Another reason that ML systems cause harm is that the perspectives of people who are most likely to be harmed by them are not centered when the system is being designed.

Systems designed by people will reflect the beliefs and biases, both conscious and unconscious, of those people. Machine learning systems are overwhelmingly built by a very homogeneous group of people: white, Asian-American, or Asian heterosexual cisgender men who are between 20 and 50 years old, who are able-bodied and neurotypical, who are American and/or who live in the United States, and who have a traditional educational background, including a degree in computer science from one of ~50 elite universities. As a result, machine learning systems are biased towards the experiences of this narrow group of people.

Additionally, machine learning systems are often used in contexts that disproportionately involve historically marginalized groups (like predicting recidivism or surveilling “high crime” neighborhoods) or to determine access to resources that have long been unfairly denied to marginalized groups (like housing, employment opportunities, credit and loans, and healthcare). For example, since Black people have historically been denied fair access to healthcare, machine learning systems used in such contexts display similar patterns of discrimination, because they hinge on historical assumptions and data [13]. As a result, unless deliberate action is taken to center the experiences of the groups that ML systems are arbitrating, machine learning systems lead to history repeating itself.

At the intersection of the two points above is a chilling realization: the people who design machine learning systems are rarely the people who are affected by machine learning systems. This closely mirrors the fact that most police officers do not live in the cities where they work [14].

Lack of transparency around when ML systems are used

Harm is also caused by machine learning systems because it’s often unclear when an algorithm has been used to make a decision. This is because companies are not required to disclose when and how machine learning systems are used (much less get participants’ consent), even when the outcomes of those decisions affect human lives. If someone is unaware that they’ve been affected by an ML system, then they can’t attribute harm they may have experienced to it.

Additionally, even if a person knows or suspects that they’ve been harmed by a machine learning system, proving that they’ve been discriminated against is difficult or impossible, since the complete set of decisions made by the ML system is private and thus cannot be audited for discrimination. As a result, harm that machine learning systems cause often cannot be “proven.”

Lack of legal protection for ML system participants

Finally, machine learning systems cause harm because there is currently very little regulatory or legal oversight around when and how machine learning systems are used, so companies, governments, and other organizations can use them to discriminate against participants with impunity.

With respect to facial recognition, this is slowly changing: in 2019, San Francisco became the first major city to ban the use of facial recognition by local government agencies [15]. Since then, several other cities have done the same, including Oakland, CA; Somerville, MA; and Boston, MA [16, 17].

Nevertheless, there are still hundreds of known instances of local government agencies using facial recognition, including at points of entry into the United States like borders and airports and by local police for unspecified purposes [18]. Use of facial recognition systems in these contexts, especially given that the majority of their decisions are likely wrong [8], has real-world consequences, including harassment, unjustified imprisonment, and deportation.

With respect to other types of machine learning systems, there have been few legal advances.

Call to action

Given the contexts in which ML systems are used, the current lack of legal and regulatory oversight for such contexts, and the lack of societal power that people harmed by ML systems tend to have (due to their, e.g., race, gender, disability, citizenship, and/or wealth), ML system developers have far more power than ML system participants.

Image caption: There are huge power imbalances in machine learning system development: ML system developers have more power than ML system participants, and labeling task requesters have more power than labeling agents. Image description: An imbalanced scale on which the ML system developer and labeling task requester weigh more than the ML system participant and labeling agent.

There’s a similar power dynamic between people who design labeling tasks and people who complete labeling tasks: labeling task requesters have more power than labeling agents.

Here, ML system developer is defined as anyone who is involved in the design, development, and deployment of machine learning systems, including machine learning engineers and data scientists and also software engineers from other technical disciplines, product managers, engineering managers, UX researchers, UX writers, lawyers, mid-level managers, and C-suite executives. All of these roles are included to emphasize that even if you don’t work directly on a machine learning system, if you work at a company or organization that uses machine learning systems, then you have the power to effect change on when and how machine learning is used at your company.

Let me be clear: individual action is not enough; we desperately need well-designed legislation to guide when and how ML systems can be used. Importantly, there should be some contexts in which ML systems cannot be used, no matter how “accurate” they are, because the probability of misuse and mistakes is too great, such as police departments using facial recognition systems [19].

Unfortunately, we do not have the necessary legislation and regulation in place yet. In the meantime, as ML system developers, we should take a deliberate, critical look at the ML systems that we, our teams, or our companies own and use.

How to build fairer machine learning systems

If you are a machine learning system developer, especially a machine learning practitioner like an ML engineer or data scientist, here are 10 ways you can help build fairer machine learning systems:


1. When designing a new ML system or evaluating an existing one, ask yourself and your team the following questions about the context in which the system will be or is deployed [20]:

  • What could go wrong when this ML system is deployed?
  • When something goes wrong, who is harmed?
  • How likely is it that something will go wrong?
  • Does the harm disproportionately fall on marginalized groups?

Use your answers to these questions to decide how to proceed. For example, where possible, proactively engineer solutions that prevent harm (e.g., add safeguards such as human intervention and mechanisms for participants to contest system decisions, and inform participants that a machine learning algorithm is being used). Alternatively, if the likelihood and scale of harm are too high, do not deploy the system. Instead, consider pursuing a solution that does not depend on machine learning or that uses machine learning in a less risky way. Deploying a biased machine learning system can cause real-world harm to system participants as well as reputational damage to your company [21, 22, 23].


2. Use best practices for developing fairer ML systems. Machine learning fairness researchers have been designing and testing best practices for several years now. For example, when releasing a dataset for public or internal use, release a datasheet alongside it: a short document that gives consumers of the dataset the information they need to make informed decisions about using it (e.g., the mechanisms or procedures used to collect the data, whether an ethical review process was conducted, and whether the dataset relates to people) [24].

Similarly, when releasing a trained model for public or internal use, release a model card alongside it: a short document that shares information about the model, such as evaluation results (ideally disaggregated across different demographic groups and communities), intended uses, uses to avoid, and insight into the model training process [25].
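To make “disaggregated evaluation results” concrete, here is a minimal Python sketch, assuming binary labels and a hypothetical demographic label for every evaluation example. The helper name `disaggregated_accuracy`, the group names, and the toy data are all made up for illustration; a real model card would typically report several metrics, not only accuracy.

```python
from collections import defaultdict


def disaggregated_accuracy(y_true, y_pred, groups):
    """Return overall accuracy and accuracy computed separately per group.

    y_true, y_pred, and groups are parallel sequences; groups[i] is the
    (hypothetical) demographic label of evaluation example i.
    """
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have the same length")
    if not y_true:
        raise ValueError("no evaluation examples")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    per_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_group


# Toy, made-up evaluation data for two hypothetical groups, "A" and "B".
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall, per_group = disaggregated_accuracy(y_true, y_pred, groups)
print(f"overall accuracy: {overall:.3f}")
for group, acc in sorted(per_group.items()):
    print(f"group {group} accuracy: {acc:.3f}")
```

With this toy data the overall accuracy is 0.625, but group A is classified perfectly while group B is classified correctly only a quarter of the time; reporting the aggregate number alone would hide that gap.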

Finally, consider implementing a company-wide process for internal algorithmic auditing, like the one that Deb Raji, Andrew Smart, and their collaborators proposed in their 2020 paper Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing.


3. Work with your company or organization to develop partnerships with advocacy organizations that represent groups of people that machine learning systems tend to marginalize, in order to responsibly engage marginalized communities as stakeholders. Examples of such organizations include Color Of Change and the NAACP. Then, while developing new machine learning systems or evaluating existing machine learning systems, seek and incorporate their feedback.


4. Hire machine learning engineers and data scientists from underrepresented backgrounds, especially Black people, Indigenous people, Latinx people, disabled people, transgender and nonbinary people, formerly incarcerated people, and people from countries that are underrepresented in technology (e.g., countries in Africa, Southeast Asia, and South America). Note that this will require rethinking how talent is discovered and trained [26]; consider recruiting from historically Black colleges and universities (HBCUs) in the U.S. and from coding and data science bootcamps, or starting an internal program like Slack’s Next Chapter.

On a related note, work with your company to support organizations that foster talent from underrepresented backgrounds, like AI4ALL, Black Girls Code, Code2040, NCWIT, TECHNOLOchicas, TransTech, and Out for Undergrad. Organizations like these are critical for increasing the number of people from underrepresented backgrounds in technology jobs, including in ML/AI jobs, and all of them have a proven track record of success. Additionally, consider supporting organizations like these with your own money and time.


5. Work with your company or organization to sign the Safe Face Pledge, an opportunity for organizations to make public commitments towards mitigating the abuse of facial analysis technology. This pledge was jointly drafted by the Algorithmic Justice League and the Center on Technology & Privacy at Georgetown Law, and has already been signed by many leading ethics and privacy experts.


6. Learn more about the ways in which machine learning systems cause harm. For example, here are seven recommended resources to continue learning:

  1. [Book] Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil (2016)
  2. [Book] Algorithms of Oppression: How Search Engines Reinforce Racism by Safiya Noble (2018)
  3. [Book] Artificial Unintelligence: How Computers Misunderstand the World by Meredith Broussard (2018)
  4. [Book] Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor by Virginia Eubanks (2019)
  5. [Book] Race After Technology: Abolitionist Tools for the New Jim Code by Ruha Benjamin (2019)
  6. [Book] Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass by Mary L. Gray and Siddharth Suri (2019)
  7. [Film] Coded Bias (2020)

Additionally, you can learn more about harms caused by ML systems by reading the work of journalists and researchers who are uncovering biases in machine learning systems. In addition to the researchers and journalists I’ve already named in this essay (e.g., Dr. Piotr Sapieżyński, Casey Newton, Joy Buolamwini, Dr. Timnit Gebru, Deb Raji, Andrew Smart), some examples include Julia Angwin (and anything written by The Markup), Khari Johnson, Moira Weigel, Lauren Kirchner, and anything written by Upturn. The work of these journalists and researchers serves as an important set of case studies in how not to design machine learning systems, which is valuable for ML practitioners who are aiming to develop fair and equitable ML systems.


7. Learn about ways in which existing machine learning systems have been improved to cause less harm. For example, IBM has worked to improve the performance of its commercial facial recognition system with respect to racial and gender bias, Google has worked to reduce gender bias in Google Translate, and Jigsaw (within Google) has worked to change Perspective API (a public API for a hate speech detection algorithm) so that it less often classifies phrases mentioning frequently targeted groups (e.g., Muslims, women, queer people) as hate speech.


8. Conduct an audit of a machine learning system for disparate impact. Disparate impact occurs when a policy or system that is neutral on its face nonetheless adversely affects one group of people more than another. Facebook’s ad delivery system is an example of a system causing disparate impact [5].

For example, you could use Project Lighthouse, a methodology that Airbnb released earlier this year that uses anonymized demographic data to measure user experience discrepancies that may be due to discrimination or bias, or ArthurAI, an ML monitoring framework that also lets you monitor model bias. (Full disclosure: I work at Airbnb.)

Alternatively, hire an algorithmic consulting firm to conduct an audit of a machine learning system that your team or company owns, like O’Neil Risk Consulting & Algorithmic Auditing or the Algorithmic Justice League.
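Whichever route you take, a disparate impact audit usually starts by comparing outcome rates across groups. The following is a minimal Python sketch of that comparison; it is not how Project Lighthouse or ArthurAI work. It assumes binary decisions (1 = favorable outcome), hypothetical group labels, and made-up data. The 0.8 threshold borrows the “four-fifths rule” from US employment-selection guidelines and is only a rough screening heuristic, not a definitive test of discrimination.

```python
from collections import Counter


def selection_rates(decisions, groups):
    """Return the fraction of favorable decisions (1) for each group."""
    if len(decisions) != len(groups):
        raise ValueError("decisions and groups must have the same length")
    totals = Counter(groups)
    favorable = Counter(g for d, g in zip(decisions, groups) if d == 1)
    return {g: favorable[g] / n for g, n in totals.items()}


def disparate_impact_ratio(decisions, groups):
    """Return (lowest rate / highest rate, per-group rates).

    1.0 means every group receives favorable decisions at the same rate;
    smaller values mean some group is adversely affected more than another.
    """
    rates = selection_rates(decisions, groups)
    if not rates:
        raise ValueError("no decisions to audit")
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group received a favorable decision")
    return min(rates.values()) / highest, rates


# Made-up decisions from a hypothetical system that does not use group labels.
decisions = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
groups = ["X", "X", "X", "X", "X", "Y", "Y", "Y", "Y", "Y"]

ratio, rates = disparate_impact_ratio(decisions, groups)
print("selection rates:", rates)
print(f"disparate impact ratio: {ratio:.2f}")

# Assumed screening threshold (the "four-fifths rule"); a low ratio warrants
# a closer look but does not by itself prove discrimination.
if ratio < 0.8:
    print("flag for review: possible disparate impact")
```

With this made-up data, group X receives a favorable decision 80% of the time and group Y only 20% of the time, giving a ratio of 0.25. A low ratio is a signal to investigate further, for example by checking which features drive the gap, not proof of discrimination on its own.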


9. When hiring third-party vendors or using crowdsourcing platforms for machine learning labeling tasks, be critical about whom you choose to support. Inquire about the working conditions of the people who will be labeling for you. Additionally, if possible, make an onsite visit to the vendor to gauge working conditions for yourself. What is their hourly pay? Do they have healthcare and other benefits? Are they full-time employees or contractors? Do they expose their workforce to graphically violent or hateful content? Are there opportunities for career growth and advancement within the company?


10. Give a presentation to your team or company about the harms that machine learning systems cause and how to mitigate them. The more people who understand the harms that machine learning systems cause and the power imbalance that currently exists between ML system developers and ML system participants, the more likely it is that we can effect change on our teams and in our companies.


Finally, the bonus #11 in this list is: if you are eligible to do so in the United States, VOTE. There is so much at stake in this upcoming election, including the rights of BIPOC people, immigrants, women, LGBTQ people, and disabled people as well as, quite literally, the future of our democracy. If you are not registered to vote, please do so now: Register to vote. If you are registered to vote but have not requested your absentee or mail-in ballot, please do so now: Request your absentee ballot. Even though Joe Biden is far from the perfect candidate, we need to elect him and Kamala Harris; this country, the people in it, and so many people around the world cannot survive another four years of a Trump presidency.


Machine learning systems are incredibly powerful tools; unfortunately, they can be agents of harm as well as agents of empowerment. As machine learning practitioners, we have a responsibility to recognize the harm that the systems we build can cause and then act accordingly. Together, we can work toward a world in which machine learning systems are used responsibly, do not reinforce existing systemic biases, and uplift and empower people from marginalized communities.

This piece was inspired in part by Participatory Approaches to Machine Learning, a workshop at the 2020 International Conference on Machine Learning (ICML) that I had the opportunity to attend in July. I would like to deeply thank the organizers of this event for calling attention to the power imbalance between ML system developers and ML system participants and for creating a space to discuss it: Angela Zhou, David Madras, Inioluwa Deborah Raji, Bogdan Kulynych, Smitha Milli, and Richard Zemel. Also published here.


[1] Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil. Published 2016.

[2] NYPD used facial recognition to track down Black Lives Matter activist. The Verge. August 18, 2020.

[3] An Algorithm Determined UK Students’ Grades. Chaos Ensued. Wired. August 15, 2020.

[4] Wrongfully Accused by an Algorithm. The New York Times. June 24, 2020.

[5] Discrimination through Optimization: How Facebook’s Ad Delivery Can Lead to Biased Outcomes. Muhammad Ali, Piotr Sapiezynski, Miranda Bogen, Aleksandra Korolova, Alan Mislove, and Aaron Rieke. CSCW 2019.

[6] Turning the tables on Facebook: How we audit Facebook using their own marketing tools. Piotr Sapiezynski, Muhammad Ali, Aleksandra Korolova, Alan Mislove, Aaron Rieke, Miranda Bogen, and Avijit Ghosh. Talk given at PAML Workshop at ICML 2020.

[7] Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Joy Buolamwini and Timnit Gebru. ACM FAT* 2018.

[8] Facial-recognition software inaccurate in 98% of cases, report finds. CNET. May 13, 2018.

[9] On Views of Race and Inequality, Blacks and Whites Are Worlds Apart: Demographic trends and economic well-being. Pew Research Center. June 27, 2016.

[10] The Trauma Floor: The secret lives of Facebook moderators in America. The Verge. February 25, 2019.

[11] The Internet Is Enabling a New Kind of Poorly Paid Hell. The Atlantic. January 23, 2018.

[12] Worker Demographics and Earnings on Amazon Mechanical Turk: An Exploratory Analysis. Kotaro Hara, Abigail Adams, Kristy Milland, Saiph Savage, Benjamin V. Hanrahan, Jeffrey P. Bigham, and Chris Callison-Burch. CHI Late Breaking Work 2019.

[13] Millions of black people affected by racial bias in health-care algorithms. Nature. October 24, 2019.

[14] Most Police Don’t Live In The Cities They Serve. FiveThirtyEight. August 20, 2014.

[15] San Francisco’s facial recognition technology ban, explained. Vox. May 14, 2019.

[16] Beyond San Francisco, more cities are saying no to facial recognition. CNN. July 17, 2019.

[17] Boston is second-largest US city to ban facial recognition. Smart Cities Dive. July 6, 2020.

[18] Ban Facial Recognition: Map. Accessed August 30, 2020.

[19] Defending Black Lives Means Banning Facial Recognition. Wired. July 10, 2020.

[20] Credit for the framing goes to Dr. Cathy O’Neil, of O’Neil Risk Consulting & Algorithmic Auditing.

[21] Amazon reportedly scraps internal AI recruiting tool that was biased against women. The Verge. October 10, 2018.

[22] Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech. The Verge. January 12, 2018.

[23] Facebook’s ad-serving algorithm discriminates by gender and race. MIT Technology Review. April 5, 2019.

[24] Datasheets for Datasets. Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. ArXiv preprint 2018.

[25] Model Cards for Model Reporting. Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. ACM FAT* 2019.

[26] Combating Anti-Blackness in the AI Community.


Software Engineer (ML & Backend) @ Airbnb. My opinions are my own. [they/them]

