Over the last decade or so, machine learning has increasingly been applied in domains that have a significant and direct impact on human life. In these domains, models often end up learning societal biases and prejudices against certain classes or groups of people. Because the training datasets in many of these applications are biased, the models inherit those biases. Bias in these datasets can be attributed to two different sources:
(1) Historical biases: Certain groups of people were historically discriminated against, and this discrimination is reflected in the dataset;
(2) Sampling biases: The data could be imbalanced across classes or groups of people (e.g., policing and arrest records across different neighborhoods).
Machine learning researchers have established that bias exists in various forms. Accordingly, there are numerous different measures for quantifying bias and many different mitigation algorithms for decreasing one or more bias measures [1, 2, 3]. However, there are still some major challenges:
(1) Bias measures are application- and context-dependent.
(2) Most debiasing algorithms are measure-specific.
(3) Debiasing may require retraining the model.
Retraining can be quite expensive in many real-world scenarios, for example when the same model is deployed in different jurisdictions that have different fairness requirements.
We present a new approach for reducing bias that addresses some of these issues. We treat debiasing as an optimization problem and apply optimization procedures to parts of an already trained machine learning model. For instance, given a trained neural network, we optimize the weights of one or more layers so that they reduce some combination of bias measures without sacrificing performance. In our experiments, optimizing just the last layer alone worked well.
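The idea above can be sketched in a few lines. The following is a minimal toy illustration, not the actual method: the feature matrix stands in for the frozen penultimate-layer activations of a trained network, the objective combines accuracy with one bias measure, and a simple gradient-free random search (our choice, one of many possible optimizers) adjusts only the last-layer weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: X plays the role of the (frozen)
# penultimate-layer activations, y the labels, a a binary protected attribute.
# All data here is synthetic and purely illustrative.
n, d = 500, 8
X = rng.normal(size=(n, d))
a = rng.integers(0, 2, size=n)
y = (X[:, 0] + 0.5 * a + rng.normal(scale=0.5, size=n) > 0).astype(int)

def predict(w):
    # The "last layer": a linear threshold on the fixed features X.
    return (X @ w > 0).astype(int)

def accuracy(w):
    return (predict(w) == y).mean()

def stat_parity_diff(w):
    # One example bias measure: statistical parity difference.
    p = predict(w)
    return abs(p[a == 1].mean() - p[a == 0].mean())

def objective(w, lam=1.0):
    # Combine a performance measure and a bias measure; lam tunes the trade-off.
    return accuracy(w) - lam * stat_parity_diff(w)

# Gradient-free random search over the last-layer weights only;
# everything upstream of the last layer (here, X) stays frozen.
w = rng.normal(size=d)
best = objective(w)
init = best
for _ in range(2000):
    cand = w + 0.1 * rng.normal(size=d)
    val = objective(cand)
    if val > best:
        w, best = cand, val
```

Because only a single layer is optimized and no gradients through the full network are needed, this kind of post-hoc search avoids retraining from scratch.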
This approach has an advantage over many debiasing algorithms: it works on already trained models. It can also be used with any combination of bias and performance measures, and the objective can be tuned to achieve varying amounts of bias reduction.
We tested our approach on the UCI Census Income dataset. The following figures show how our method performs when different bias measures are targeted. First, we looked at reducing two different bias measures, Equal Opportunity Difference and Statistical Parity Difference, for two different protected attributes: race and sex. As the graphs show, our approach successfully debiased each measure individually across the two attributes. For Equal Opportunity Difference, our method even uncovered models with higher performance than the starting model. The accuracies shown are test accuracies.
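For reference, both measures named above can be computed directly from model predictions. The sketch below uses our own function names (not those of any particular fairness library): statistical parity difference compares positive-prediction rates across groups, and equal opportunity difference compares true-positive rates.

```python
import numpy as np

def statistical_parity_difference(y_pred, a):
    # Difference in positive-prediction rates between group a=1 and group a=0.
    return y_pred[a == 1].mean() - y_pred[a == 0].mean()

def equal_opportunity_difference(y_true, y_pred, a):
    # Difference in true-positive rates between group a=1 and group a=0.
    def tpr(g):
        return y_pred[(a == g) & (y_true == 1)].mean()
    return tpr(1) - tpr(0)

# Tiny illustrative example (made-up values).
y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 0])
a      = np.array([1, 1, 1, 1, 0, 0, 0, 0])

spd = statistical_parity_difference(y_pred, a)         # 0.5 - 0.25
eod = equal_opportunity_difference(y_true, y_pred, a)  # 2/3 - 1
```

A value of zero on either measure means the two groups are treated identically under that definition; debiasing drives the magnitude toward zero.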
Then, we looked at debiasing a combination of eight different measures. Surprisingly, our method was able to uncover multiple models with slightly better performance and bias than the original model, as shown in the figures below. Of course, as is known in bias research, not all of the individual measures improved (individual measure values shown on the right side in the figure below).
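The post does not specify how the eight measures are aggregated into one objective; a weighted sum of their absolute values is one plausible scheme, sketched here purely as an assumption.

```python
def combined_bias(measures, weights=None):
    # Aggregate several bias measures into one scalar by a weighted sum of
    # absolute values. This aggregation scheme is our assumption, not
    # necessarily the one used in the original experiments.
    vals = [abs(m) for m in measures]
    if weights is None:
        weights = [1.0] * len(vals)
    return sum(w * v for w, v in zip(weights, vals))

# E.g., two measures (such as statistical parity difference and equal
# opportunity difference) over several protected attributes; values made up.
score = combined_bias([0.25, -1 / 3, 0.1, -0.05])
```

Taking absolute values matters because individual measures can have opposite signs and would otherwise cancel, hiding bias rather than reducing it.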
We recently presented the system at NeurIPS 2019 in the Robust AI for Finance workshop. Stay tuned as we release more results and an improved and much faster method in the coming weeks!
. . .
[1] Verma, Sahil, and Julia Rubin. "Fairness Definitions Explained." In 2018 IEEE/ACM International Workshop on Software Fairness (FairWare), pp. 1–7. IEEE, 2018. http://www.ece.ubc.ca/~mjulia/publications/Fairness_Definitions_Explained_2018.pdf
[2] Bellamy, Rachel K. E., Kuntal Dey, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, et al. "AI Fairness 360: An Extensible Toolkit for Detecting and Mitigating Algorithmic Bias." IBM Journal of Research and Development 63, no. 4/5 (2019): 4–1.
[3] Barocas, Solon, Moritz Hardt, and Arvind Narayanan. "Fairness in Machine Learning." NIPS Tutorial (2017). https://fairmlbook.org/pdf/fairmlbook.pdf