Table 2.
Summary of the main secondary studies on Software Fairness
| Secondary study | Summary | Similarities and differences |
|---|---|---|
| Mehrabi et al., A survey on bias and fairness in machine learning (Mehrabi et al. 2021) | They analyzed the existing literature and defined two taxonomies: (1) the most common definitions of fairness and bias, and (2) the state-of-the-art strategies that researchers proposed to mitigate unfair outcomes in different machine learning application domains. | –Similar objective: Investigate and generalize knowledge on the treatment of fairness in terms of definitions, metrics, strategies, and causes of bias;<br>–Different method and target context: We perform a large-scale survey study involving practitioners working on ML-intensive systems. |
| Pessach and Shmueli, A Review on Fairness in Machine Learning (Pessach and Shmueli 2022) | They performed a systematic literature review focusing on classification tasks, discussed the trade-offs between fairness and model accuracy, and categorized fairness-enhancing mechanisms into pre-processing, in-processing, and post-processing approaches, depending on the stage at which they are applied. | –Similar scope: Investigate and systematize knowledge about fairness treatment strategies, metrics, and trade-offs;<br>–Different method and goal: By performing a large-scale survey study, we aim to understand how practitioners perceive and treat fairness with respect to six other non-functional requirements. |
| Pagano et al., Bias and Unfairness in Machine Learning Models (Pagano et al. 2023) | They conducted a systematic literature review to collect the most commonly used datasets, metrics, techniques, and tools for detecting and mitigating bias. | –We rely on the findings of Pagano et al. to identify the available tools, and we assess whether they are currently leveraged in practice;<br>–Different method and target context: We perform a survey study involving practitioners. |
| Le Quy et al., A survey on datasets for fairness-aware machine learning (Le Quy et al. 2022) | They analyzed datasets provided in the literature to investigate the relationships between protected attributes and class attributes via Bayesian networks. | –Different method and granularity: We conduct a large-scale survey study and collect practitioners’ insights on the common causes of bias in the projects they work on. |
| Bird et al., Fairness-Aware Machine Learning [...] (Bird et al. 2019) | They gave an overview of the lessons learned in the literature and provided the community with a research roadmap toward a fairness-first approach, i.e., a new way of managing fairness from the earliest stages of a typical machine learning development process. | –Different method and goal: We perform a survey study to collect practitioners’ opinions, which we believe will be useful for formulating theoretical fairness-oriented development frameworks. |
| Xivuri and Twinomurinzi, A Systematic Review of Fairness in AI Algorithms (Xivuri and Twinomurinzi 2021) | They analyzed 47 papers to understand how the research community has dealt with fairness in terms of methods, domains, practices, and locations. They highlighted that fairness research currently focuses more on technical and social/human aspects than on economic ones, and that most studies have been conducted in Europe and North America. | –Similar breadth: We consider different geographical locations, application domains, participants’ levels of experience and backgrounds, and roles and job responsibilities;<br>–Different method and target context: We perform a survey study involving practitioners. |
| Shrestha and Das, Exploring gender biases in ML and AI academic research [...] (Shrestha and Das 2022) | They analyzed 120 papers in a systematic literature review on gender bias in machine learning and artificial intelligence, warning that gender-related biases are underexplored and require more attention from the research community. | –Different method and scope: We perform a large-scale survey study to understand in which phases of a typical machine learning pipeline practitioners adopt specific strategies to deal with all kinds of bias and ethical issues. |
| Catania et al., Fairness & friends in the data science era (Catania et al. 2022) | They analyzed the existing literature to assess how researchers have investigated the unethical behaviour of data-driven automated decision systems in the context of complex data science pipelines. | –Different method and goal: We conduct a survey with practitioners to understand how fairness is treated in practice throughout the whole development process. |
| Madaio et al., Assessing the Fairness of AI Systems [...] (Madaio et al. 2022) | They conducted semi-structured interviews and structured workshops with 33 AI practitioners to understand their perspectives on the processes, challenges, and needs involved in developing machine learning systems. | –Similar target context: We are interested in capturing practitioners’ perspectives on the state of the practice;<br>–Different method and generalizability of results: We perform a large-scale survey study involving respondents with diverse backgrounds. |
| Fabris et al., Algorithmic Fairness Datasets: the Story so Far (Fabris et al. 2022) | By surveying the literature, they developed a structured ontology of more than 250 datasets that have been employed for different fairness-critical tasks in over 30 application domains. | –We leverage Fabris et al.’s findings to design our research materials, i.e., the survey questions, by considering the fairness-critical application domains listed in the ontology and giving participants the possibility to list additional contexts they work in;<br>–Different method and goal: We conduct a survey to understand the resources and strategies involved in fairness-critical development scenarios. |
| Saha et al., Measuring non-expert comprehension of ML fairness metrics (Saha et al. 2020) | They proposed a metric to represent non-experts’ comprehension of specific statistical fairness definitions, exploring the relationship between comprehension, sentiment, demographics, and the definitions themselves. They validated the metric via an online survey with non-expert participants, testing its reliability on three statistical fairness definitions, i.e., demographic parity, equal opportunity, and equalized odds. | –Similar method: We administered an online survey to involve industrial practitioners;<br>–Different goal and target context: We surveyed expert practitioners with experience in fairness-critical machine learning projects to collect information on multiple aspects, i.e., the clarity, usefulness, and applicability of different fairness notions, and how relevant fairness is with respect to other non-functional attributes in a typical industrial context. |
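The three statistical fairness definitions tested by Saha et al. (demographic parity, equal opportunity, and equalized odds) can disagree on the same classifier. The following Python sketch is our own illustration, not the comprehension metric proposed by Saha et al.: it computes the absolute between-group gap for each definition, assuming a binary classifier and a single binary protected attribute. The function names are hypothetical, and reporting equalized odds as the larger of the true-positive-rate and false-positive-rate gaps is one common convention rather than the only one.

```python
from typing import Dict, Sequence


def _positive_rate(y_pred: Sequence[int], mask: Sequence[bool]) -> float:
    """Fraction of positive predictions among the samples selected by mask."""
    selected = [p for p, m in zip(y_pred, mask) if m]
    if not selected:
        raise ValueError("empty subgroup: rate is undefined")
    return sum(selected) / len(selected)


def fairness_gaps(y_true: Sequence[int], y_pred: Sequence[int],
                  group: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: absolute between-group gaps for three definitions.

    Assumes 0/1 labels and predictions and a binary protected attribute
    (group[i] is 0 or 1).
    """
    groups = (0, 1)
    # Demographic parity compares P(Y_hat = 1 | A = a) across groups.
    sel = [_positive_rate(y_pred, [g == a for g in group]) for a in groups]
    # Equal opportunity compares true positive rates P(Y_hat = 1 | Y = 1, A = a).
    tpr = [_positive_rate(y_pred, [g == a and y == 1 for g, y in zip(group, y_true)])
           for a in groups]
    # Equalized odds additionally compares false positive rates P(Y_hat = 1 | Y = 0, A = a).
    fpr = [_positive_rate(y_pred, [g == a and y == 0 for g, y in zip(group, y_true)])
           for a in groups]
    return {
        "demographic_parity": abs(sel[0] - sel[1]),
        "equal_opportunity": abs(tpr[0] - tpr[1]),
        # Convention assumed here: report the larger of the TPR and FPR gaps.
        "equalized_odds": max(abs(tpr[0] - tpr[1]), abs(fpr[0] - fpr[1])),
    }


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
    group = [0, 0, 0, 0, 1, 1, 1, 1]
    print(fairness_gaps(y_true, y_pred, group))
    # prints {'demographic_parity': 0.0, 'equal_opportunity': 0.5, 'equalized_odds': 0.5}
```

On this toy data both groups receive positive predictions at the same rate (0.5), so demographic parity holds, yet the true positive rates are 0.5 and 1.0, so equal opportunity and equalized odds are violated. Non-experts (and practitioners) therefore have to reason about which notion fits a given context.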