PLOS ONE. 2023 Feb 21;18(2):e0278751. doi: 10.1371/journal.pone.0278751

The extent of algorithm aversion in decision-making situations with varying gravity

Ibrahim Filiz 1, Jan René Judek 1, Marco Lorenz 2,*, Markus Spiwoks 1
Editor: Christiane Schwieren
PMCID: PMC9942970  PMID: 36809526

Abstract

Algorithms already carry out many tasks more reliably than human experts. Nevertheless, some subjects have an aversion towards algorithms. In some decision-making situations an error can have serious consequences, in others not. In the context of a framing experiment, we examine the connection between the consequences of a decision-making situation and the frequency of algorithm aversion. This shows that the more serious the consequences of a decision are, the more frequently algorithm aversion occurs. Particularly in the case of very important decisions, algorithm aversion thus leads to a reduction of the probability of success. This can be described as the tragedy of algorithm aversion.

Introduction

Automated decision-making or decision aids, so-called algorithms, are becoming increasingly significant for many people’s working and private lives. The progress of digitalization and the growing significance of artificial intelligence in particular mean that efficient algorithms have now already been available for decades (see, for example, [1]). These algorithms already carry out many tasks more reliably than human experts [2]. However, only a few algorithms are completely free of errors. Some areas of application of algorithms can have serious consequences in the case of a mistake–such as autonomous driving (cf. [3]), making medical diagnoses (cf. [4]), or support in criminal proceedings (cf. [5]). On the other hand, algorithms are also used for tasks which might not have serious consequences in the case of an error, such as dating service (cf. [6]), weather forecasts (cf. [7]), and the recommendation of cooking recipes (cf. [8]).

Some economic agents have a negative attitude towards algorithms. This is usually referred to as algorithm aversion (for an overview of algorithm aversion see [9,10]). Many decision-makers thus tend to delegate tasks to human experts or carry them out themselves. This is also frequently the case when it is clearly recognizable that using algorithms would lead to an increase in the quality of the results [11–13].

In decision-making situations which lead to consequences which are not so serious in the case of an error, a behavioral anomaly of this kind does not have particularly significant effects. In the case of a dating service, the worst that can happen is meeting with an unsuitable candidate. In the case of an erroneous weather forecast, unless it is one for seafarers, the worst that can happen is that unsuitable clothing is worn, and if the subject is the recommendation of cooking recipes, the worst-case scenario is a bland meal. However, particularly in the case of decisions which can have serious consequences in the case of a mistake, diverging from the rational strategy would be highly risky. For example, a car crash or a wrong medical diagnosis can, in the worst case, result in someone’s death. Being convicted in a criminal case can lead to many years of imprisonment. In these serious cases, it can be expected that people tend to think more thoroughly about what to do in order to make a reasonable decision. Can algorithm aversion be overcome in serious situations in order to make a decision which maximizes utility and which, at best, can save a life?

Tversky & Kahneman [14] show that decisions can be significantly influenced by the context of the decision-making situation. The story chosen to illustrate the problem influences the salience of the information, which can also lead to an irrational neglect of the underlying mathematical facts. This phenomenon is also referred to as the framing effect (for an overview see [15]). Irrespective of the actual probability of success, subjects do allow themselves to be influenced. This study therefore uses six mathematically identical decision situations with different contexts to examine whether the extent of algorithm aversion can be influenced by a framing effect.

Moreover, it is analyzed precisely which aspects of a decision affect the choice between algorithms and human experts the most. In particular, it is examined whether subjects are prepared to desist from their algorithm aversion in decision-making situations which can have severe consequences (three of the six scenarios). Expectancy theory [16] states that the importance of a task positively influences subjects’ motivation in performing the task. Consistent with this, Mento, Cartledge & Locke [17] show in five experiments that increasing valence of a goal leads to higher goal acceptance and determination to achieve it. Gollwitzer [18] argues that the importance of a task determines the extent to which individuals develop and maintain commitment to the task. Similarly, Gendolla [19] asserts that "outcome valence and importance have effects on expectancy formation," where importance refers to the "personal relevance of events".

If algorithm aversion is due to decisions being made on gut instinct rather than analytically thought through, it should decrease with more meticulous expectancy formation, and increasing motivation and commitment, all of which result from task importance. We thus consider whether there are significantly different frequencies of algorithm aversion depending on whether the decision-making situations can have serious consequences or not.

Literature review

Previous publications have defined the term algorithm aversion in quite different ways (Table 1). These different understandings of the term are reflected in the arguments put forward as well as in the design of the experiments carried out. From the perspective of some scholars, it is only possible to speak of algorithm aversion when an algorithm recognizably provides the option with the highest quality result or probability of success (cf. [10–12,20,21]). However, other scholars consider algorithm aversion to be present as soon as subjects exhibit a fundamental disapproval of an algorithm in spite of its possible superiority (cf. [22–28]).

Table 1. Definitions of algorithm aversion in the literature.

Authors Definition of algorithm aversion
Dietvorst, Simmons & Massey, 2015 [12] "Research shows that evidence-based algorithms more accurately predict the future than do human forecasters. Yet when forecasters are deciding whether to use a human forecaster or a statistical algorithm, they often choose the human forecaster. This phenomenon, which we call algorithm aversion (…)"
Prahl & Van Swol, 2017 [28] "The irrational discounting of automation advice has long been known and a source of the spirited “clinical versus actuarial” debate in clinical psychology research (Dawes, 1979; Meehl, 1954). Recently, this effect has been noted in forecasting research (Önkal et al., 2009) and has been called algorithm aversion (Dietvorst, Simmons, & Massey, 2015)."
Dietvorst, Simmons & Massey, 2018 [29] "Although evidence-based algorithms consistently outperform human forecasters, people often fail to use them after learning that they are imperfect, a phenomenon known as algorithm aversion."
Commerford, Dennis, Joe & Wang, 2019 [30] “(…) algorithm aversion–the tendency for individuals to discount computer-based advice more heavily than human advice, although the advice is identical otherwise.”
Horne, Nevo, O’Donovan, Cho & Adali, 2019 [24] “For example, Dietvorst et al. (Dietvorst, Simmons, and Massey 2015) studied when humans choose the human forecaster over a statistical algorithm. The authors found that aversion of the automated tool increased as humans saw the algorithm perform, even if that algorithm had been shown to perform significantly better than the human. Dietvorst et al. explained that aversion occurs due to a quicker decrease in confidence in algorithmic forecasters over human forecasters when seeing the same mistake occur (Dietvorst, Simmons, and Massey 2015).”
Ku, 2019 [21] “(…) “algorithm aversion”, a term refers by Dietvorst et al. (Dietvorst et al. 2015) means that humans distrust algorithm even though algorithm consistently outperform humans.”
Leyer & Schneider, 2019 [31] “In the particular context of the delegation of decisions to AI-enabled systems, recent findings have revealed a general algorithmic aversion, an irrational discounting of such systems as suitable decision-makers despite objective evidence (Dietvorst, Simmons and Massey, 2018)”
Logg, Minson & Moore, 2019 [25] "(…) human distrust of algorithmic output, sometimes referred to as “algorithm aversion” (Dietvorst, Simmons, & Massey, 2015).1 “; Footnote 1: "while this influential paper [of Dietvorst et al.] is about the effect that seeing an algorithm err has on people’s likelihood of choosing it, it has been cited as being about how often people use algorithms in general."
Önkal, Gönül & De Baets, 2019 [32] “(…) people are averse to using advice from algorithms and are unforgiving toward any errors made by the algorithm (Dietvorst et al., 2015; Prahl & Van Swol, 2017).”
Rühr, Streich, Berger & Hess, 2019 [26] "Users have been shown to display an aversion to algorithmic decision systems [Dietvorst, Simmons, Massey, 2015] as well as to the perceived loss of control associated with excessive delegation of decision authority [Dietvorst, Simmons, Massey, 2018]."
Yeomans, Shah, Mullainathan & Kleinberg, 2019 [27] "(. . .) people would rather receive recommendations from a human than from a recommender system (. . .). This echoes decades of research showing that people are averse to relying on algorithms, in which the primary driver of aversion is algorithmic errors (for a review, see Dietvorst, Simmons, & Massey, 2015)."
Berger, Adam, Rühr & Benlian, 2020 [33] “Yet, previous research indicates that people often prefer human support to support by an IT system, even if the latter provides superior performance–a phenomenon called algorithm aversion.” (…) “These differences result in two varying understandings of what algorithm aversion is: unwillingness to rely on an algorithm that a user has experienced to err versus general resistance to algorithmic judgment.”
Burton, Stein & Jensen, 2020 [10] "(…) algorithm aversion—the reluctance of human forecasters to use superior but imperfect algorithms—(…)"
Castelo, Bos & Lehmann, 2020 [11] "The rise of algorithms means that consumers are increasingly presented with a novel choice: should they rely more on humans or on algorithms? Research suggests that the default option in this choice is to rely on humans, even when doing so results in objectively worse outcomes."
De-Arteaga, Fogliato & Chouldechova, 2020 [34] Algorithm aversion–the tendency to ignore tool recommendations after seeing that they can be erroneous (…)”
Efendić, Van de Calseyde & Evans, 2020 [22] "Algorithms consistently perform well on various prediction tasks, but people often mistrust their advice."; “However, repeated observations show that people profoundly mistrust algorithm-generated advice, especially after seeing the algorithm fail (Bigman & Gray, 2018; Diab, Pui, Yankelevich, & Highhouse, 2011; Dietvorst, Simmons, & Massey, 2015; Önkal, Goodwin, Thomson, Gönül, & Pollock, 2009).”
Erlei, Nekdem, Meub, Anand & Gadiraju, 2020 [35] “Recently, the concept of algorithm aversion has raised a lot of interest (see (Burton, Stein, and Jensen 2020) for a review). In their seminal paper, (Dietvorst, Simmons, and Massey 2015) illustrate that human actors learn differently from observing mistakes by an algorithm in comparison to mistakes by humans. In particular, even participants who directly observed an algorithm outperform a human were less likely to use the model after observing its imperfections.”
Germann & Merkle, 2020 [36] “The tendency of humans to shy away from using algorithms even when algorithms observably outperform their human counterpart has been referred to as algorithm aversion.”
Ireland, 2020 [37] “(…) some researchers find that, when compared to humans, people are averse to algorithms after recording equivalent errors.”
Jussupow, Benbasat & Heinzl, 2020 [38] "(…) literature suggests that although algorithms are often superior in performance, users are reluctant to interact with algorithms instead of human agents–a phenomenon known as algorithm aversion"
Niszczota & Kaszás, 2020 [23] “When given the possibility to choose between advice provided by a human or an algorithm, people show a preference for the former and thus exhibit algorithm aversion (Castelo et al., 2019; Dietvorst et al., 2015, 2016; Longoni et al., 2019).”
Wang, Harper & Zhu, 2020 [39] “(…) people tend to trust humans more than algorithms even when the algorithm makes more accurate predictions.”
Kawaguchi, 2021 [40] “The phenomenon in which people often obey inferior human decisions, even if they understand that algorithmic decisions outperform them, is widely observed. This is known as algorithm aversion (Dietvorst et al. 2015).”
Köbis & Mossink, 2021 [20] “When people are informed about algorithmic presence, extensive research reveals that people are generally averse towards algorithmic decision makers. This reluctance of “human decision makers to use superior but imperfect algorithms” (Burton, Stein, & Jensen, 2019; p.1) has been referred to as algorithm aversion (Dietvorst, Simmons, & Massey, 2015). In part driven by the belief that human errors are random, while algorithmic errors are systematic (Highhouse, 2008), people have shown resistance towards algorithms in various domains (see for a systematic literature review, Burton et al., 2019).”

Another important aspect of how the term algorithm aversion is understood is the question of whether and possibly also how the subjects learn about the superiority of an algorithm. Differing approaches were chosen in previous studies. Dietvorst, Simmons and Massey [12] focus on the gathering of experience in dealing with an algorithm in order to be able to assess its probability of success in comparison to one’s own performance. In a later study, Dietvorst, Simmons and Massey [29] specify the average error of an algorithm. Alexander, Blinder and Zak [41] provide exact details on the probability of success of an algorithm, or they refer to the rate at which other subjects used an algorithm in the past.

In addition, when dealing with algorithms, the way in which people receive feedback is of significance. Can subjects (by using their previous decisions) draw conclusions about the quality and/or success of an algorithm? Dietvorst, Simmons and Massey [12] merely use feedback in order to facilitate experience in dealing with an algorithm. Prahl and Van Swol [28] provide feedback after every individual decision, enabling an assessment of the success of the algorithm. Filiz et al. [42] follow this approach and use feedback after every single decision in order to examine the decrease in algorithm aversion over time.

Other aspects which emerge from the previous definitions of algorithm aversion in the literature are the reliability of an algorithm (perfect or imperfect), the observation of its reliability (the visible occurrence of errors), access to historical data on how the algorithmic forecast was drawn up, the setting (algorithm vs. expert; algorithm vs. amateur; algorithm vs. subject), as well as the extent of the algorithm’s intervention (does the algorithm act as an aid to decision-making, or does it carry out tasks automatically?).

In our view, the superiority of an algorithm (higher probability of success) and the knowledge of this superiority are the decisive aspects. Algorithm aversion can only be present when subjects are clearly aware that not using an algorithm reduces the expected value of their utility and they do not deploy it nevertheless. A decision against the use of an algorithm which is known to be superior reduces the expected value of the subject’s pecuniary utility and thus has to be viewed as a behavioral anomaly (cf. [4345]).

Methods and experimental design

We carry out an economic experiment in the laboratory of the Ostfalia University of Applied Sciences, in which the subjects assume the perspective of a businessperson who offers a service to his/her customers. A decision has to be made on whether this service should be carried out by specialized algorithms or by human experts.

The involvement of students as subjects was approved by the dean’s office of the business faculty and the research commission of the Ostfalia University of Applied Sciences. The economic experiment took place as part of a regular laboratory class. All participants were at least 18 years of age at the time of the experiment and are therefore considered to be of legal age in Germany. The participants had confirmed their consent by registration for the economic experiment in the online portal of the Ostfalia University, which is sufficient according to the dean’s office and the research commission. Before the start of the economic experiment, they were informed again that their participation was completely voluntary and that they could leave at any time.

In this framing approach, six decision-making scenarios are contrasted that entail different degrees of gravity in their potential consequences if they are not executed successfully. We base our experimental approach on the real-world contexts in which algorithms can be used, as described in the introduction, and assume that subjects perceive gravity differently in these contexts. The following services are considered: (1) Driving service with the aid of autonomous vehicles (algorithm) or with the aid of drivers, (2) The evaluation of MRI scans with the help of a specialized computer program (algorithm) or with the aid of doctors, (3) The evaluation of files on criminal cases with the aid of a specialized computer program (algorithm) or with the help of legal specialists, (4) A dating site providing matchmaking with the aid of a specialized computer program (algorithm) or with the support of staff trained in psychology, (5) The selection of recipes for cooking subscription boxes with the aid of a specialized computer program (algorithm) or with the help of staff trained as professional chefs, and (6) The drawing up of weather forecasts with the help of a specialized computer program (algorithm) or using experienced meteorologists (Table 2).

Table 2. Decision-making scenarios.

Decision-making scenarios
(1) Driving service
(2) Evaluation of MRI scans
(3) The assessment of criminal case files
(4) Dating service
(5) Selection of cooking recipes
(6) Drawing up weather forecasts

The six scenarios that are part of this study were identified through a pre-test, in which additional scenarios were also presented from a literature analysis and brainstorming process. The final selection was made based on three criteria: comprehensibility (do the subjects understand what this application area for algorithms is about?), familiarity (do the subjects know the application area from personal experience or from the media?), and scope (are the high and low scope scenarios actually evaluated as such?). The scenarios are selected in such a way that they are relevant in the literature and that the subjects should be familiar with them from public debates or from their own experience. In this way, it is easier for the subjects to immerse themselves in the respective context. Detailed descriptions of the scenarios can be viewed in S3 File.

The study has a between-subjects design. Each subject is only confronted with one of a total of six scenarios. All six scenarios have the same probability of success: the algorithm carries out the service with a probability of success of 70%. The human expert carries out the service with a probability of success of 60%. The participants receive a show-up fee of €2, and an additional payment of €4 if the service is carried out successfully. Since we apply the same mathematical conditions of a successful execution of a service to each scenario, only the contextual framework of the six scenarios varies. A perfectly rational economic subject (homo oeconomicus) decides to use the algorithm in all six scenarios because this leads to the maximization of the expected value of the compensation. The context of the respective scenario does not play any role for a homo oeconomicus, because he exclusively strives to maximize his pecuniary benefit.
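For reference, the expected payoffs implied by this payment scheme can be written out explicitly (a simple worked calculation based on the figures stated above):

```latex
\begin{aligned}
E[\text{payoff} \mid \text{algorithm}] &= 2\,\text{EUR} + 0.70 \times 4\,\text{EUR} = 4.80\,\text{EUR},\\
E[\text{payoff} \mid \text{human expert}] &= 2\,\text{EUR} + 0.60 \times 4\,\text{EUR} = 4.40\,\text{EUR}.
\end{aligned}
```

Choosing the human expert therefore lowers the expected payment by €0.40 per decision.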

Before the experiment begins, all participants have to answer test questions (see S2 File). They have a maximum of two attempts at this. Participants who answer the test questions incorrectly twice are disregarded in the analysis, as the data should not be distorted by subjects who have misunderstood the task. The experiment starts with the participants being asked to assess the gravity of the shown decision-making scenario on a scale from 0 (not serious) to 10 (very serious). This allows us to evaluate the different scenarios based on the gravity perceived by the subjects. In this way, it is possible to assess how subjects perceive the potential effects in the context of one scenario compared to the context of another scenario. In the case of the driving service and the evaluation of MRI scans, it could be a matter of life and death. In the evaluation of documents in the context of criminal cases, it could lead to serious limitations of personal freedom. These scenarios could thus have serious consequences for third parties if they end unfavorably. The situation is different in the case of matchmaking, selecting cooking recipes and drawing up weather forecasts. Even if these tasks are sometimes not accomplished in a satisfactory way, the consequences should usually not be very serious. A date might turn out to be dull, a lunch might be disappointing, or one might be caught in the rain without a jacket. None of these things would be pleasant, but the potential consequences would be trivial.

A homo oeconomicus (a person who acts rationally in economic terms) must–regardless of the context–prefer the algorithm to human experts, because it maximizes his or her financial utility. Every decision in favor of the human experts has to be considered algorithm aversion.

Algorithm aversion is a phenomenon which can occur in a wide range of decision-making situations (cf. [10]). We thus presume that the phenomenon can also be observed in this study. Although the scenarios offer no rational reasons for choosing the human experts, some of the participants will do precisely this. Hypothesis 1 is: Not every subject will select the algorithm. Null hypothesis 1 is therefore: Every subject will select the algorithm.

There is some evidence that the extent of algorithm aversion is influenced by the framing of the conditions under which an algorithm operates. Hou & Jung [46] have subjects complete estimation tasks using algorithms. They vary the description of the algorithm using a framing approach and find that this has a significant impact on the willingness to follow the algorithm’s advice. Utz, Wolfers & Göritz [47] investigate the perspective on a decision. In three scenarios, they use a framing approach to vary whether a subject is the decision maker or the one affected by the decision. The influence of perspective on the choice behavior between human and algorithm is significant only in one of the three scenarios, namely in the distribution of ventilators for Covid-19 patients.

Regarding the importance and consequences of a task, the findings to date are mixed. Castelo, Bos & Lehmann [11] use a vignette study to show that framing is suited to influencing algorithm aversion. A self-reported dislike for or distrust in algorithms appears to various degrees in different contexts of a decision. Likewise, Renier, Schmid Mast & Bekbergenova [48] study, among other things, the relationship between algorithm aversion and the magnitude of a decision for the human who must bear the consequences of the decision. In a vignette study, they vary the magnitude of the consequences that result from an algorithm error. According to their description of the task, the people affected may, for example, be wrongly denied a job contract or a loan. In contrast to Castelo, Bos & Lehmann [11], they conclude that the scope has no influence on the extent of algorithm aversion.

The difference in the results of the mentioned studies already shows that there still seems to be a large knowledge gap here. Sometimes a framing approach seems suitable to change decision behavior in the context of algorithm use, and sometimes not. However, in none of the four studies mentioned above was the algorithm recognizably the most reliable alternative, and none of them offered a performance-related payment to the subjects. Algorithm aversion is therefore not modeled as a behavioral anomaly.

To extend our understanding, we analyze the extent of algorithm aversion in six differently framed decision situations. We believe that a clear financial incentive that models algorithm aversion as a behavioral anomaly will enhance the framing effect. We expect that the frame will have an influence on algorithm aversion analogous to Castelo, Bos & Lehmann [11] if the financial advantage of the algorithm is clearly recognizable. Hypothesis 2 is: The proportion of decisions made in favor of the algorithm will vary significantly between the decision situations perceived as serious and trivial. Null hypothesis 2 is therefore: The proportion of decisions made in favor of the algorithm will not vary significantly between the decision situations perceived as serious and trivial.

In the literature there are numerous indications that framing can significantly influence the decision-making behavior of subjects (cf. [14]). If subjects acted rationally and maximized their utility, neither algorithm aversion nor the framing effect would arise. Nonetheless, real human subjects–as the research in behavioral economics frequently shows–do not act like homo oeconomicus. Their behavior usually tends to correspond more to the model of bounded rationality put forward by Herbert A. Simon [49]. Human beings suffer from cognitive limitations–they fall back on rules of thumb and heuristics. But they do try to make meaningful decisions–as long as this does not involve too much effort. This kind of ‘being sensible’–which is often praised as common sense–suggests that great efforts have to be made when decisions can have particularly serious consequences (for an overview of bounded rationality see [50–52]).

Jack W. Brehm’s motivational intensity theory (see, e.g., [53]) identifies three main determinants of the effort invested in making successful decisions: (1) the importance of the outcome of a successful decision, (2) the degree of difficulty of the task, and (3) the subjective assessment that the task can be successfully accomplished. The more important the outcome of a successful decision, the more pronounced the effort to make a successful decision. The more difficult the task is in relation to the desired outcome, and the lower the prospect of successfully accomplishing the task, the weaker the effort to make a successful decision.

The last two aspects are unlikely to vary much across the six decision situations in this study. The degree of difficulty of the task is consistent in all six cases. All that is required is to weigh the algorithm’s probability of success (70%) against the human expert’s probability of success (60%). This is a simple task—in all six decision situations. It can be assumed that this level of difficulty is perceived as manageable by the subjects—in all six decision situations. However, the importance of the outcome of a decision differs in the six decision situations. Three decision situations have potentially serious consequences, and the other three decision situations have potentially trivial consequences. Thus, it is to be expected that subjects will try harder to make a successful decision in the decisions that involve potentially serious consequences. This is in line with other research that shows that the valence of a goal influences expectancy formation [19] and leads to increasing motivation [16] and commitment to a task [17,18].

This is also consistent with what would generally be recognized as common sense. This everyday common sense, which demands different levels of effort for decision-making situations with different degrees of gravity, could contribute towards the behavioral anomaly of algorithm aversion appearing more seldom in decisions with possible serious consequences than in decisions with relatively insignificant effects. The founding of a company is certainly given much more thought than choosing which television program to watch on a rainy Sunday afternoon. And much more care will usually be invested in the selection of a heart surgeon than in the choice of a pizza delivery service.

The assumption that higher valence of a situation leads to more effort in decision making has already been supported by experimental economics in other contexts. For example, Muraven & Slessareva [54] tell a subset of their subjects that their responses in an effort task will be used for important research projects to combat Alzheimer’s disease. The mere belief that their effort may possibly reduce the suffering of Alzheimer’s patients leads subjects to perform significantly better than in a control group. Since higher task importance may contribute to exerting more effort, we hypothesize that it also leads subjects to focus on the really relevant aspects of a decision (here: the different probabilities of success), thus eventually decreasing algorithm aversion. Hypothesis 3 is thus: The greater the gravity of a decision, the more seldom the behavioral anomaly of algorithm aversion arises. Null hypothesis 3 is therefore: Even when the gravity of a decision-making situation increases, there is no reduction in algorithm aversion.

Results

This economic experiment is carried out from 2 to 14 November 2020 in the Ostfalia Laboratory of Experimental Economic Research (OLEW) of the Ostfalia University of Applied Sciences in Wolfsburg. A total of 143 students of the Ostfalia University of Applied Sciences take part in the experiment. Of these, 91 subjects are male (63.6%), 50 subjects are female (35.0%) and 2 subjects (1.4%) describe themselves as non-binary. Of the 143 participants, 65 subjects (45.5%) study at the Faculty of Business, 60 subjects (42.0%) at the Faculty of Vehicle Technology, and 18 subjects (12.6%) at the Faculty of Health Care. Their average age is 23.5 years.

The experiment is programmed in z-Tree (cf. [55]). Only the lottery used to determine the level of success when providing the service is carried out by taking a card from a pack of cards. In this way we want to counteract any possible suspicion that the random event could be manipulated. The subjects see the playing cards and can be sure that when they choose the algorithm there is a probability of 70% that they will be successful (the pack of cards consists of seven +€4 cards and three ±€0 cards). In addition, they can be sure that if they choose a human expert their probability of success is 60% (the pack of cards consists of six +€4 cards and four ±€0 cards) (see S1 and S2 Figs).
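To illustrate the payoff mechanics of this card lottery, the following minimal simulation sketch reproduces the expected payments in the long run (function and variable names are illustrative, not part of the experimental software):

```python
import random

def draw_payout(success_cards: int, deck_size: int = 10,
                payout: float = 4.0, show_up_fee: float = 2.0) -> float:
    """Draw one card from a deck with `success_cards` winning cards (+4 EUR)
    and the rest paying 0 EUR; the 2 EUR show-up fee is paid in any case."""
    deck = [payout] * success_cards + [0.0] * (deck_size - success_cards)
    return show_up_fee + random.choice(deck)

random.seed(42)
n = 100_000
algorithm_mean = sum(draw_payout(7) for _ in range(n)) / n  # 7 of 10 cards win
expert_mean = sum(draw_payout(6) for _ in range(n)) / n     # 6 of 10 cards win
print(f"algorithm ≈ {algorithm_mean:.2f} EUR, expert ≈ {expert_mean:.2f} EUR")
# converges to roughly 4.80 EUR vs. 4.40 EUR
```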

The time needed for reading the instructions of the game (see S1 File), answering the test questions (see S2 File), and carrying out the task is 10 minutes on average. A show-up fee of €2 and the possibility of a performance-related payment of €4 seem appropriate for the time spent—this is intended to be a sufficient incentive for meaningful economic decisions, and the subjects do give the impression of being focused and motivated.

We provide the subjects with only one contextual framework of a decision situation at a time. Here, the subjects are presented with decision situations in the context of driving service (25 subjects), evaluating MRI scans (24 subjects), assessing criminal case files (22 subjects), dating service (24 subjects), selecting cooking recipes (23 subjects), and drawing up weather forecasts (25 subjects). Subjects were distributed similarly evenly across the contextual decision situations in terms of gender and faculty.

Overall, only 87 out of 143 subjects (60.84%) decide to delegate the service to the (superior) algorithm. A total of 56 subjects (39.16%) prefer to rely on human experts in spite of the lower probability of success. Null hypothesis 1 thus has to be rejected. The result of the chi-square goodness of fit test is highly significant (χ2 (N = 143) = 21.93, p < 0.001). On average, around two out of five subjects thus tend towards algorithm aversion. All subjects should be aware that preferring human experts and rejecting the algorithm reduces the expected value of the performance-related payment. However, the urge to decide against the algorithm is obviously strong in some of the subjects.

To investigate the effects of our framing approach on algorithm aversion (hypothesis 2), we must first examine how subjects evaluate the six differently framed scenarios. The subjects perceive the gravity of the decision situations differently (Fig 1). While in the contextual decision situations driving service (mean gravity of 8.88), evaluation of MRI scans (mean gravity of 9.42) and assessment of criminal case files (mean gravity of 8.68) the potential consequences of not successfully performing the service are perceived as comparatively serious, the contextual decision situations dating service (mean gravity of 6.33), selection of cooking recipes (mean gravity of 7.87) and drawing up weather forecasts (mean gravity of 5.52) show a less pronounced perceived gravity of potential consequences (Table 3).

Fig 1. Violin plots for the assessment of the gravity of the decision-making situations.

Table 3. Evaluation of gravity in a contextual decision situation.

Scenario                        n    Mean gravity   Median   Standard deviation
(1) Driving service             25   8.88           9        1.09
(2) Evaluation of MRI scans     24   9.42           10       1.14
(3) Criminal case files         22   8.68           9        1.78
(4) Dating service              24   6.33           7        2.50
(5) Selection of recipes        23   7.87           8        1.96
(6) Weather forecasts           25   5.52           6        2.57

In the six scenarios, each with identical chances of success for execution by a human expert or an algorithm, subjects decide by whom the service should be performed depending on the context in which the situation is presented. By considering the context, subjects arrive at an assessment of the gravity of the potential consequences of not successfully performing the service (Fig 1). Even though the six scenarios differ considerably from each other in their context, some of them are nevertheless similar in the assessment of their gravity when viewed individually. The perceived gravity of the scenarios as reported by the subjects suggests that the decision situations can be considered in two clusters: decisions with possibly serious consequences (the highest mean gravity scores) and decisions with possibly trivial consequences (the lowest mean gravity scores).

The comparison of the contextual decision situations with possibly serious consequences and those with possibly trivial consequences, as indicated by the mean values of the gravity levels per decision situation, shows that the perceived gravity of the six scenarios is (highly) significantly different when using the Wilcoxon rank-sum test (Table 4). For example, the perceived gravity of the driving service differs from that of the dating service with p < 0.001. Only the scenarios driving service and assessment of criminal case files differ from the scenario recipe selection merely at a significance level of 0.1, as already suggested by their mean gravity. On average, the possible consequences of recipe selection are perceived as slightly more serious, but not as serious as driving service, evaluating MRI scans or criminal case files. Cohen’s d shows how much the means of two samples differ. An effect size of 0.2 corresponds to small effects, 0.5 to medium effects, and 0.8 to large effects [56].

Table 4. Comparison of perceived gravity using Wilcoxon rank-sum test and Cohen’s d.

                              (4) Dating service      (5) Selection of recipes    (6) Weather forecasts
                              p-value*   Cohen's d    p-value*   Cohen's d        p-value*   Cohen's d
(1) Driving service           0.000      1.33         0.064      0.64             0.000      1.70
(2) Evaluation of MRI scans   0.000      1.59         0.000      0.97             0.000      1.95
(3) Criminal case files       0.000      1.07         0.061      0.43             0.000      1.41

*p-values using Wilcoxon rank-sum test.
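As a plausibility check on Table 4, Cohen’s d can be recomputed from the summary statistics in Table 3, assuming the conventional pooled-standard-deviation formulation:

```python
import math

def cohens_d(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# (1) Driving service vs. (4) Dating service, values taken from Table 3
d = cohens_d(8.88, 1.09, 25, 6.33, 2.50, 24)
print(round(d, 2))  # 1.33, matching the corresponding entry in Table 4
```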

This confirms that subjects perceive the consequences of decision contexts to vary in gravity and leads to the classification of decision contexts into cluster A1 (possibly serious consequences: driving service, evaluation of MRI scans, and criminal case files) and cluster A2 (possibly trivial consequences: dating service, selecting cooking recipes, and weather forecasts) that we propose in this framework. The violin plot of the summarized decision situations shows that the subjects rate the gravity higher in contexts with critical consequences for physical integrity than in contexts where it does not matter (Fig 2). However, a direct comparison of the violin plots also shows that the range of decision situations rated as having trivial consequences is wider than that of the others, since some subjects also rate the gravity as very high here.

Fig 2. Violin plots for scenarios with possibly serious and trivial consequences.

Still, the possible consequences of each decision situation from cluster A1 are rated by the subjects as more serious than those from cluster A2. According to this classification, the mean of the perceived gravity for the decision situations with possibly serious consequences (A1) is 9.0 with a standard deviation of 1.37. In contrast, when the gravity of the decision situations with possibly trivial consequences (A2) is evaluated, the mean is 6.54 with a standard deviation of 2.53 (Table 5). The Wilcoxon rank-sum test shows that the gravity of the decision situations in cluster A1 is assessed as significantly higher than that of the decision situations in cluster A2 (z = 6.689; p < 0.001).

Table 5. Evaluation of gravity in clusters A1 and A2.

                      Cluster A1 (serious)   Cluster A2 (trivial)
Median                10                     7
Average               9.00                   6.54
Standard deviation    1.37                   2.53

Furthermore, a difference in the number of decisions in favor of the algorithm between the two clusters can be observed. While 50.7% of the subjects in cluster A1 choose the algorithm, 70.83% in cluster A2 rely on it (for the individual decisions in the contextual decision situations, see Table 6). The chi-square test reveals that null hypothesis 2 has to be rejected (χ2 (N = 143) = 6.08, p = 0.014). The frequency with which algorithm aversion occurs is influenced by the implications involved in the decision-making situation. The framing effect has an impact.

Table 6. Decisions for and against the algorithm.

                                  Total   Decisions for the algorithm   Decisions against the algorithm
                                          n        Percent              n        Percent
Cluster A1 (serious)              71      36       50.70%               35       49.30%
    (1) Driving car               25      13       52.00%               12       48.00%
    (2) Evaluation of MRI scans   24      13       54.17%               11       45.83%
    (3) Criminal case files       22      10       45.45%               12       54.55%
Cluster A2 (trivial)              72      51       70.83%               21       29.17%
    (4) Dating service            24      18       75.00%               6        25.00%
    (5) Selection of recipes      23      14       60.87%               9        39.13%
    (6) Weather forecasts         25      19       76.00%               6        24.00%
Σ (total)                         143     87       60.84%               56       39.16%
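The chi-square test for hypothesis 2 can be reproduced directly from the counts in Table 6; the sketch below assumes a 2×2 test of independence without continuity correction:

```python
from scipy.stats import chi2_contingency

# Decisions for / against the algorithm per cluster (Table 6)
observed = [[36, 35],   # cluster A1 (serious)
            [51, 21]]   # cluster A2 (trivial)

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 6.08, p = 0.014
```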

There may be situations in which people like to act irrationally at times. However, common sense suggests that one should only allow oneself such lapses in situations where serious consequences need not be feared. For example, there is a nice barbecue going on and the host opens a third barrel of beer although he suspects that this will lead to hangovers the next day among some of his guests. In the case of important decisions, however, one should be wide awake and try to distance oneself from reckless tendencies. For example, if the same man visits a friend in hospital whose life would be acutely threatened by drinking alcohol after undergoing a complicated stomach operation, he would be wise to avoid bringing him a bottle of his favorite whisky. This comparison of two examples illustrates what could be described as common sense and would be approved of by most neutral observers.

Nevertheless, the results of the experiment point in the opposite direction. A framing effect sets in, but not in the way one might expect. Whereas in cluster A1 (possibly serious consequences) 49.3% of the subjects do exhibit the behavioral anomaly of algorithm aversion, this is only the case in 29.17% of the subjects in cluster A2 (possibly trivial consequences) (Table 6). It seems that algorithm aversion is all the more pronounced in important tasks.

This is confirmed by a regression analysis which demonstrates the relationship between algorithm aversion and the perceived gravity of a scenario. To perform the regression analysis, we move away from the pairwise consideration of the two clusters and relate how serious a subject perceived the potential consequences of his or her decision to be to the decision that was actually made. This is independent of the decision context; only the perceived gravity and the associated decision are considered. For the possible assessments of the consequences (from 0 = not serious to 10 = very serious), the respective average percentage of the decisions in favor of the algorithm is determined. The decisions of all 143 subjects are included in the regression analysis (Fig 3).

Fig 3. Decisions in favor of the algorithm depending on the perceived gravity of the decision-making situation.

If the common sense described above had an effect, the percentage of decisions for the algorithm would tend to rise from left to right (in other words, with increasing perceived gravity of the decision-making situation). Instead, the opposite can be observed. Whereas in the case of only a low level of gravity (zero and one) 100% of decisions are still made in favor of the algorithm, the proportion of decisions for the algorithm decreases with increasing gravity. In the case of very serious implications (nine and ten), only somewhat more than half of the subjects decide to have the service carried out by an algorithm (Fig 3). If the perceived gravity of a decision increases by one unit, the probability of a decision in favor of the algorithm falls by 3.9 percentage points (t = -2.29; p = 0.023). The 95% confidence interval ranges from -7.27 to -0.54 percentage points. Null hypothesis 3 can therefore not be rejected. In situations which might have serious consequences in the case of an error, algorithm aversion is actually especially pronounced.
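The reported coefficient is consistent with a linear probability model of the individual choice regressed on perceived gravity. The sketch below shows how such a model could be estimated from the supplementary data; the file and column names are assumptions, not the authors’ actual variable names:

```python
import pandas as pd
import statsmodels.api as sm

# Assumed file and column names; the per-subject data are provided in S1 Data (XLSX).
df = pd.read_excel("S1_Data.xlsx")
y = df["chose_algorithm"]                     # assumed coding: 1 = algorithm, 0 = human expert
X = sm.add_constant(df["perceived_gravity"])  # assumed: rating on the 0-10 scale

model = sm.OLS(y, X).fit()  # linear probability model
print(model.summary())
# The paper reports a slope of about -0.039 (t = -2.29, p = 0.023), i.e. roughly
# 3.9 percentage points fewer algorithm choices per additional point of perceived gravity.
```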

Further analysis shows that the choices between algorithms and human experts are also not statistically significantly influenced by gender (χ2 (N = 143) = 2.22, p = 0.136), age (t (N = 143) = -0.44, p = 0.661), mother tongue (χ2 (N = 143) = 2.68, p = 0.102), faculty at which a subject is studying (χ2 (N = 143) = 1.06, p = 0.589), semester (t (N = 143) = 0.63, p = 0.528), or previous participation in economic experiments (χ2 (N = 143) = 0.21, p = 0.644).

The six scenarios differ in numerous aspects. In order to identify the main factors influencing the decisions of the subjects, clusters are formed based on different criteria and examined with regard to differences in the subjects’ selection behavior. There are a total of ten ways to divide six scenarios into two clusters. All ten clustering opportunities are shown in Table 7.

Table 7. Overview of all possible clusters obtained by grouping the frames evenly.

Clustering opportunity   Cluster   Frames         n    Algorithm use   χ2      p-value
A                        A1        (1) (2) (3)    71   50.70%          6.080   0.014
                         A2        (4) (5) (6)    72   70.83%
B                        B1        (1) (2) (4)    73   60.27%          0.020   0.888
                         B2        (3) (5) (6)    70   61.43%
C                        C1        (1) (2) (5)    72   55.56%          1.699   0.192
                         C2        (3) (4) (6)    71   66.20%
D                        D1        (1) (2) (6)    74   60.81%          0.000   0.994
                         D2        (3) (4) (5)    69   60.87%
E                        E1        (1) (3) (4)    71   57.75%          0.566   0.452
                         E2        (2) (5) (6)    72   63.89%
F                        F1        (1) (3) (5)    70   52.86%          3.667   0.056
                         F2        (2) (4) (6)    73   68.49%
G                        G1        (1) (3) (6)    72   58.33%          0.382   0.536
                         G2        (2) (4) (5)    71   63.38%
H                        H1        (1) (4) (5)    72   62.50%          0.168   0.682
                         H2        (2) (3) (6)    71   59.16%
I                        I1        (1) (4) (6)    74   67.57%          2.914   0.088
                         I2        (2) (3) (5)    69   53.62%
J                        J1        (1) (5) (6)    73   63.01%          0.296   0.586
                         J2        (2) (3) (4)    70   58.57%

(1) = Driving service, (2) = Evaluation of MRI scans, (3) = Assessment of criminal case files, (4) = Dating service, (5) = Selection of cooking recipes, (6) = Weather forecasts.
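The ten clustering opportunities in Table 7 are simply the distinct ways of splitting the six scenarios into two groups of three; they can be enumerated as follows:

```python
from itertools import combinations

scenarios = {1, 2, 3, 4, 5, 6}
splits = set()
for group in combinations(sorted(scenarios), 3):
    other = tuple(sorted(scenarios - set(group)))
    # store each pair of complementary clusters in a canonical order
    splits.add(tuple(sorted((group, other))))

print(len(splits))  # 10 distinct ways to form two clusters of three scenarios
for cluster_1, cluster_2 in sorted(splits):
    print(cluster_1, "vs", cluster_2)
```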

The criterion in focus in this study is the scope of a decision (clustering opportunity A). We can group the six scenarios into tasks that have potentially serious consequences if performed incorrectly, e.g., death or unjust imprisonment. These are driving service, evaluation of MRI scans, and assessment of criminal case files (cluster A1). On the other hand, there are tasks where the consequences are trivial if performed poorly. These are dating service, selection of cooking recipes, and weather forecasts (cluster A2). The chi-square test shows that the willingness to use an algorithm is significantly higher in the latter cluster (χ2 (N = 143) = 6.08, p = 0.014). The more serious the consequences of a decision, the less likely subjects are to delegate the decision to an algorithm.

Another aspect is the familiarity with a task (clustering opportunity J). A connection between algorithm aversion and familiarity has been suspected for some time. Luo et al. [57] argue that the more familiar and confident sales agents are in dealing with a task, the more pronounced their algorithm aversion is. Gaube et al. [58] explicitly examine the influence of familiarity with a task on physicians’ algorithm aversion in the context of evaluating human chest radiographs. They contrast experienced radiologists, who have a great deal of routine with this task, with inexperienced emergency physicians. Their results also suggest that algorithm aversion may increase with increasing experience in handling a task. We can group the six scenarios into tasks that are performed frequently, perhaps even daily, by an average person. These are driving a car, selection of cooking recipes, and weather forecasts. Almost every day, each of us commutes to work or other places, decides what to eat, and wonders what the weather will be like during the day. On the other hand, evaluation of MRI scans and assessment of criminal case files are activities that most of us may never have encountered, and a dating service is something that those who are single may use from time to time, and those who are in a relationship (hopefully) not that much. The chi-square test shows no significant difference between the clusters J1 and J2 (χ2 (N = 143) = 0.30, p = 0.586). Thus, the willingness to use an algorithm does not seem to be considerably affected by how often we engage in a particular activity.

Further interesting aspects are whether an algorithm requires an expert to operate it adequately or whether it can also be used by a layperson (clustering opportunity H), whether a task requires human skills, such as empathy, or not (clustering opportunity D), and the maturity of the technology, i.e., whether the use of algorithms is already widespread today or not (clustering opportunity F). Algorithm aversion has been observed both in extremely simple algorithms that a layperson can easily operate by him- or herself and in extremely complex algorithms (numerous examples can be found in [11]). Regarding human skills, Fuchs et al. [59] find that algorithm aversion is particularly high for tasks that are driven more by human skills than by mathematical data analysis. Kaufmann [60] shows that algorithm aversion can occur to a large extent in student performance evaluation, a task that is characterized as requiring a lot of empathy. On the maturity of technology, Nadler & Shestowsky [61] raised the question 17 years ago of whether subjects may become accustomed to using an algorithm the longer it is established in the market.

It turns out that the willingness to choose an algorithm does not depend on the amount of expertise required to operate it (χ2 (N = 143) = 0.17, p = 0.682), nor on the extent to which human skills are involved in the task it is supposed to perform (χ2 (N = 143) = 0.00, p = 0.994). Regarding the maturity of technology, we see that activities that are already automated frequently in practice today, such as making weather forecasts, are also delegated to the algorithm much more often in the experiment (χ2 (N = 143) = 3.67, p = 0.056). However, the difference between the clusters F1 and F2 is not as large as between the clusters A1 and A2. Moreover, there is no significant difference at a significance level of 0.05 in the frequency with which an algorithm is selected in the remaining five clustering opportunities. It therefore seems that, of all the differences between the frames, the gravity of the consequences of a decision is the most important aspect.

Discussion

General

The results are surprising, given that common sense would dictate–particularly in the case of decisions which might have serious consequences–that the option with the greatest probability of success should be chosen. Also, with regard to Brehm’s motivational intensity theory, it can be argued that the importance of the successful execution of the action is not adequately reflected in the subjects’ decisions. In line with Hou & Jung [46] and Castelo, Bos & Lehmann [11], our results also show that a framing approach is suitable to influence decisions to engage an algorithm. The study by Utz, Wolfers & Göritz [47] shows that the preference to use an algorithm in moral scenarios (distribution of ventilators for Covid-19 treatment) is low. In our study, scenarios that were perceived as having potentially serious consequences (driving service, evaluation of MRI scans and criminal case files) and that also raise moral issues likewise show a lower utilization rate of the algorithm. A survey by Grzymek & Puntschuh [62] clearly shows that people are less likely to use an algorithm in decision-making situations with potentially serious consequences, such as diagnosing diseases, evaluating creditworthiness, trading stocks, or pre-selecting job applicants, but more likely to use an algorithm in scenarios with less serious consequences, such as spell-checking, personalizing advertisements, or selecting the best travel route. In contrast, Renier, Schmid Mast & Bekbergenova [48] found no effect of gravity on the extent to which participants demand an improvement to an algorithm.

If subjects allow themselves to be influenced by algorithm aversion to make decisions to their own detriment, they should only do so when they can take responsibility for the consequences with a clear conscience. In cases where the consequences are particularly serious, maximization of the success rate should take priority. But the exact opposite is the case. Algorithm aversion appears most frequently in cases where it can cause the most damage. To this extent it seems necessary to speak of the tragedy of algorithm aversion.

Implications

Our results suggest that algorithm aversion is particularly prevalent where potential errors have dire consequences. This means that algorithm aversion should be addressed especially by those developers, salespeople, and other staff whose areas of responsibility are related to human health and safety. This can be done, for example, through staff training and intensive field testing with potential users. In addition, the results suggest that clever framing of the activity that an algorithm undertakes can make users more likely to use the algorithm. For example, neutral words should be chosen when advertising medical or investment algorithms, rather than unnecessarily pointing out the general risks of such activities.

Limitations and directions for future research

Despite their advantages in establishing causal relationships, framing studies always carry the risk that subjects may have many other associations with the decision-making situations that are not the focus of the study, and yet have an unintended influence on the results. In our study, these are in particular the complexity and subjectivity of the tasks, but also moral aspects, which may be more relevant in the decisions with potentially serious consequences, in which the physical well-being or the freedom of humans is at stake. In addition, we do not focus on the variation in perceived gravity within one scenario (e.g., MRI scans for a life-threatening diagnosis vs. MRI scans for a less severe diagnosis), but rather on the variation in gravity between different scenarios, which could pose a risk with regard to causality. It remains for further studies to vary the gravity within one scenario. Moreover, these aspects may also include the familiarity from the everyday experiences of the subjects, which should be higher, for example, for weather forecasts than for MRI scans. However, these associations do not affect our core result. Our regression analysis only considers the correlation between algorithm aversion and the subjectively perceived gravity of consequences, regardless of the scenario, and finds that higher perceived gravity in general leads to an increase in algorithm aversion.

Second, it should be noted that prior experiences of the subjects and the maturity of the technologies may lead to different expectations regarding the success rates. For example, the use of algorithms for weather forecasting is already advanced and it is to be expected that an algorithm would perform better here than a human. In autonomous driving, on the other hand, the technology is not yet as advanced. However, to ensure the comparability of the scenarios, in our framing approach the probabilities must be identical in all scenarios, which may not always fit the subjects’ expectations. In addition, the success rates are directly given in the instructions of our experiment. In real life, we would first gather our own experience in all these areas to get an idea of when to rely on algorithms and when not to. Moreover, the sample size of our experiment is rather small with 143 participants. We therefore encourage future research efforts to further explore our results in a research design with more practice-oriented conditions and with a larger sample.

Finally, it should be noted that the consequences of the decisions in our experiment would have to be borne by third parties. It would be possible to continue this line of research by giving up the framing approach and modeling a situation where the subjects are directly affected. In this case, different incentives would have to be introduced into the decision situations. Success in scenarios with possible serious consequences would then have to be rewarded with a higher amount than in scenarios with trivial consequences. However, we presume that our results would also be confirmed by an experiment based on this approach, given that it is a between-subjects design in which every subject is only presented with one scenario. Whether one receives €4 or €8 for a successful choice will probably not have a notable influence on the results. Nonetheless, the empirical examination of this assessment is something which will have to wait for future research efforts.

Summary

Many people decide against the use of an algorithm even when it is clear that the algorithm promises a higher probability of success than a human mind. This behavioral anomaly is referred to as algorithm aversion.

The subjects are placed in the position of a businessperson who has to choose whether to have a service carried out by an algorithm or by a human expert. If the service is carried out successfully, the subject receives a performance-related payment. The subjects are informed that using the respective algorithm leads to success in 70% of all cases, while the human expert is only successful in 60% of all cases. In view of the recognizably higher success rate, there is every reason to trust in the algorithm. Nevertheless, just under 40% of the subjects decide to use the human expert and not the algorithm. In this way they reduce the expected value of their performance-related payment and thus manifest the behavioral anomaly of algorithm aversion.

The most important objective of the study is to find out whether decision-making situations of varying gravity can lead to differing frequencies of the occurrence of algorithm aversion. To do this, we choose a framing approach. Six decision-making situations (with potentially serious / trivial consequences) have an identical payment structure. Against this background there is no incentive or reason to act differently in each of the six scenarios. It is a between-subjects approach–each subject is only presented with one of the six decision-making situations.

In the three scenarios with potentially serious consequences for third parties, just under 50% of the subjects exhibit algorithm aversion. In the three scenarios with potentially trivial consequences for third parties, however, less than 30% of the subjects exhibit algorithm aversion.
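Whether such a difference in shares is statistically distinguishable can be checked with a simple test on a 2x2 table. The sketch below is again only illustrative: the cell counts are reconstructed from the reported shares ("just under 50%" and "less than 30%") under the assumption of an even split of the 143 subjects across treatments; they are not the exact figures from the study.

    # Illustrative test of the difference in aversion rates between the two treatments.
    # The counts are reconstructed from the reported shares and an assumed even split
    # of the 143 subjects; they are not the exact figures from the study.
    from scipy.stats import fisher_exact

    averse_serious, n_serious = 35, 72    # assumed: roughly 49% averse in the serious scenarios
    averse_trivial, n_trivial = 20, 71    # assumed: roughly 28% averse in the trivial scenarios

    table = [
        [averse_serious, n_serious - averse_serious],
        [averse_trivial, n_trivial - averse_trivial],
    ]

    odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    print(f"Odds ratio: {odds_ratio:.2f}, p-value: {p_value:.4f}")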

This is a surprising result. If a framing effect occurred at all, it would have been expected to work in the opposite direction. In cases with implications for personal freedom or even danger to life, one should tend to select the algorithm as the option with the better success rate. Instead, algorithm aversion shows itself particularly strongly here. If it is only a matter of arranging a date, drawing up a weather forecast or recommending cooking recipes, the possible consequences are comparatively minor. In a situation of this kind, one could still afford to have irrational reservations about an algorithm. Surprisingly, however, algorithm aversion occurs relatively infrequently in precisely these situations.

This can be described as the tragedy of algorithm aversion: it arises above all in situations in which it can cause particularly serious damage.

Supporting information

S1 Fig

(TIF)

S2 Fig

(TIF)

S1 Data. Framing and algorithm aversion—Supplementary data.

(XLSX)

S1 File. Instructions for the game.

(DOCX)

S2 File. Test questions.

(DOCX)

S3 File. Decision-making situations.

(DOCX)

S4 File. Determination of the random event with the aid of a lottery.

(DOCX)

Acknowledgments

The authors would like to thank the editor, the anonymous reviewers, the participants in the German Association for Experimental Economic Research e.V. (GfeW) Meeting 2021, the participants in the Economic Science Association (ESA) Meeting 2021, and the participants in the PhD seminar at the Georg August University of Göttingen, for their constructive comments and useful suggestions, which were very helpful in improving the manuscript.

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The authors received no specific funding for this work.

References

1. Dawes R., Faust D. & Meehl P. (1989). Clinical versus actuarial judgment, Science, 243(4899), 1668–1674. doi: 10.1126/science.2648573
2. Grove W. M., Zald D. H., Lebow B. S., Snitz B. E., & Nelson C. (2000). Clinical versus mechanical prediction: a meta-analysis, Psychological Assessment, 12(1), 19–30.
3. Shariff A., Bonnefon J. F., & Rahwan I. (2017). Psychological roadblocks to the adoption of self-driving vehicles, Nature Human Behaviour, 1(10), 694–696. doi: 10.1038/s41562-017-0202-6
4. Majumdar A. & Ward R. (2011). An algorithm for sparse MRI reconstruction by Schatten p-norm minimization, Magnetic Resonance Imaging, 29(3), 408–417. doi: 10.1016/j.mri.2010.09.001
5. Simpson B. (2016). Algorithms or advocacy: does the legal profession have a future in a digital world?, Information & Communications Technology Law, 25(1), 50–61.
6. Brozovsky L. & Petříček V. (2007). Recommender System for Online Dating Service, ArXiv, abs/cs/0703042.
7. Sawaitul S. D., Wagh K. & Chatur P. N. (2012). Classification and Prediction of Future Weather by using Back Propagation Algorithm-An Approach, International Journal of Emerging Technology and Advanced Engineering, 2(1), 110–113.
8. Ueda M., Takahata M. & Nakajima S. (2011). User's food preference extraction for personalized cooking recipe recommendation, Proceedings of the Second International Conference on Semantic Personalized Information Management: Retrieval and Recommendation, 781, 98–105.
9. Mahmud H., Islam A. N., Ahmed S. I., & Smolander K. (2022). What influences algorithmic decision-making? A systematic literature review on algorithm aversion, Technological Forecasting and Social Change, 175, 121390.
10. Burton J., Stein M. & Jensen T. (2020). A Systematic Review of Algorithm Aversion in Augmented Decision Making, Journal of Behavioral Decision Making, 33(2), 220–239.
11. Castelo N., Bos M. W. & Lehmann D. R. (2020). Task-dependent algorithm aversion, Journal of Marketing Research, 56(5), 809–825.
12. Dietvorst B. J., Simmons J. P. & Massey C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err, Journal of Experimental Psychology: General, 144(1), 114–126. doi: 10.1037/xge0000033
13. Youyou W., Kosinski M. & Stillwell D. (2015). Computer-based personality judgments are more accurate than those made by humans, Proceedings of the National Academy of Sciences, 112(4), 1036–1040. doi: 10.1073/pnas.1418680112
14. Tversky A. & Kahneman D. (1981). The framing of decisions and the psychology of choice, Science, 211(4481), 453–458. doi: 10.1126/science.7455683
15. Cornelissen J. & Werner M. D. (2014). Putting Framing in Perspective: A Review of Framing and Frame Analysis across the Management and Organizational Literature, The Academy of Management Annals, 8(1), 181–235.
16. Vroom V. (1964). Work and Motivation, New York: John Wiley & Sons.
17. Mento A. J., Cartledge N. D., & Locke E. A. (1980). Maryland vs Michigan vs Minnesota: Another look at the relationship of expectancy and goal difficulty to task performance, Organizational Behavior and Human Performance, 25(3), 419–440.
18. Gollwitzer P. M. (1993). Goal achievement: The role of intentions, European Review of Social Psychology, 4(1), 141–185.
19. Gendolla G. H. (1997). Surprise in the context of achievement: The role of outcome valence and importance, Motivation and Emotion, 21(2), 165–193.
20. Köbis N. & Mossink L. D. (2021). Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry, Computers in Human Behavior, 114, 1–13.
21. Ku C. Y. (2020). When AIs Say Yes and I Say No: On the Tension between AI's Decision and Human's Decision from the Epistemological Perspectives, Információs Társadalom, 19(4), 61–76.
22. Efendić E., Van de Calseyde P. P. & Evans A. M. (2020). Slow response times undermine trust in algorithmic (but not human) predictions, Organizational Behavior and Human Decision Processes, 157(C), 103–114.
23. Niszczota P. & Kaszás D. (2020). Robo-investment aversion, PLoS ONE, 15(9), 1–19. doi: 10.1371/journal.pone.0239277
24. Horne B. D., Nevo D., O'Donovan J., Cho J. & Adali S. (2019). Rating Reliability and Bias in News Articles: Does AI Assistance Help Everyone?, ArXiv, abs/1904.01531.
25. Logg J., Minson J. & Moore D. (2019). Algorithm appreciation: People prefer algorithmic to human judgment, Organizational Behavior and Human Decision Processes, 151(C), 90–103.
26. Rühr A., Streich D., Berger B. & Hess T. (2019). A Classification of Decision Automation and Delegation in Digital Investment Systems, Proceedings of the 52nd Hawaii International Conference on System Sciences, 1435–1444.
27. Yeomans M., Shah A. K., Mullainathan S. & Kleinberg J. (2019). Making Sense of Recommendations, Journal of Behavioral Decision Making, 32(4), 403–414.
28. Prahl A. & Van Swol L. (2017). Understanding algorithm aversion: When is advice from automation discounted?, Journal of Forecasting, 36(6), 691–702.
29. Dietvorst B. J., Simmons J. P. & Massey C. (2018). Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them, Management Science, 64(3), 1155–1170.
30. Commerford B. P., Dennis S. A., Joe J. R., & Wang J. (2019). Complex estimates and auditor reliance on artificial intelligence. doi: 10.2139/ssrn.3422591
31. Leyer M. & Schneider S. (2019). Me, You or AI? How Do We Feel About Delegation, Proceedings of the 27th European Conference on Information Systems (ECIS), 1–17.
32. Önkal D., Gönül M. S., & De Baets S. (2019). Trusting forecasts, Futures & Foresight Science, 1(3–4), 1–10.
33. Berger B., Adam M., Rühr A., & Benlian A. (2020). Watch Me Improve—Algorithm Aversion and Demonstrating the Ability to Learn, Business & Information Systems Engineering, 1–14.
34. De-Arteaga M., Fogliato R., & Chouldechova A. (2020). A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores, Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Paper 509, 1–12.
35. Erlei A., Nekdem F., Meub L., Anand A. & Gadiraju U. (2020). Impact of Algorithmic Decision Making on Human Behavior: Evidence from Ultimatum Bargaining, Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 8(1), 43–52.
36. Germann M. & Merkle C. (2019). Algorithm Aversion in Financial Investing. doi: 10.2139/ssrn.3364850
37. Ireland L. (2020). Who errs? Algorithm aversion, the source of judicial error, and public support for self-help behaviors, Journal of Crime and Justice, 43(2), 174–192.
38. Jussupow E., Benbasat I., & Heinzl A. (2020). Why are we averse towards Algorithms? A comprehensive literature Review on Algorithm aversion, Proceedings of the 28th European Conference on Information Systems (ECIS), https://aisel.aisnet.org/ecis2020_rp/168.
39. Wang R., Harper F. M., & Zhu H. (2020). Factors Influencing Perceived Fairness in Algorithmic Decision-Making: Algorithm Outcomes, Development Procedures, and Individual Differences, Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Paper 684, 1–14.
40. Kawaguchi K. (2021). When Will Workers Follow an Algorithm? A Field Experiment with a Retail Business, Management Science, 67(3), 1670–1695.
41. Alexander V., Blinder C. & Zak P. J. (2018). Why trust an algorithm? Performance, cognition, and neurophysiology, Computers in Human Behavior, 89, 279–288.
42. Filiz I., Judek J. R., Lorenz M. & Spiwoks M. (2021). Reducing Algorithm Aversion through Experience, Journal of Behavioral and Experimental Finance, 31, 100524.
43. Frey B. S. (1992). Behavioural Anomalies and Economics, in: Economics As a Science of Human Behaviour, 171–195.
44. Kahneman D. & Tversky A. (1979). Prospect theory: An analysis of decision under risk, Econometrica, 47(2), 263–291.
45. Tversky A. & Kahneman D. (1974). Judgment under Uncertainty: Heuristics and Biases, Science, 185(4157), 1124–1131. doi: 10.1126/science.185.4157.1124
46. Hou Y. T. Y., & Jung M. F. (2021). Who is the expert? Reconciling algorithm aversion and algorithm appreciation in AI-supported decision making, Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1–25.
47. Utz S., Wolfers L. N., & Göritz A. S. (2021). The effects of situational and individual factors on algorithm acceptance in COVID-19-related decision-making: A preregistered online experiment, Human-Machine Communication, 3, 27–45.
48. Renier L. A., Schmid Mast M., & Bekbergenova A. (2021). To err is human, not algorithmic–Robust reactions to erring algorithms, Computers in Human Behavior, 124, 106879.
49. Simon H. A. (1959). Theories of Decision-Making in Economics and Behavioral Science, The American Economic Review, 49(3), 253–283.
50. Grüne-Yanoff T. (2007). Bounded Rationality, Philosophy Compass, 2(3), 534–563.
51. Hoffrage U., & Reimer T. (2004). Models of bounded rationality: The approach of fast and frugal heuristics, Management Revue, 15(4), 437–459.
52. Lipman B. L. (1995). Information Processing and Bounded Rationality: A Survey, Canadian Journal of Economics, 28(1), 42–67.
53. Brehm J. W., & Self E. A. (1989). The intensity of motivation, Annual Review of Psychology, 40, 109–131. doi: 10.1146/annurev.ps.40.020189.000545
54. Muraven M., & Slessareva E. (2003). Mechanisms of self-control failure: Motivation and limited resources, Personality and Social Psychology Bulletin, 29(7), 894–906. doi: 10.1177/0146167203029007008
55. Fischbacher U. (2007). z-Tree: Zurich Toolbox for Ready-made Economic Experiments, Experimental Economics, 10(2), 171–178.
56. Cohen J. (1992). A power primer, Psychological Bulletin, 112(1), 155–159. doi: 10.1037//0033-2909.112.1.155
57. Luo X., Qin M. S., Fang Z., & Qu Z. (2021). Artificial intelligence coaches for sales agents: Caveats and solutions, Journal of Marketing, 85(2), 14–32.
58. Gaube S., Suresh H., Raue M., Merritt A., Berkowitz S. J., Lermer E., et al. (2021). Do as AI say: susceptibility in deployment of clinical decision-aids, NPJ Digital Medicine, 4(1), 1–8.
59. Fuchs C., Matt C., Hess T., & Hoerndlein C. (2016). Human vs. Algorithmic recommendations in big data and the role of ambiguity, Twenty-second Americas Conference on Information Systems, San Diego, 2016.
60. Kaufmann E. (2021). Algorithm appreciation or aversion? Comparing in-service and pre-service teachers' acceptance of computerized expert models, Computers and Education: Artificial Intelligence, 2, 100028.
61. Nadler J., & Shestowsky D. (2006). Negotiation, information technology, and the problem of the faceless other, Negotiation Theory and Research, 145–172, New York.
62. Grzymek V., & Puntschuh M. (2019). What Europe Knows and Thinks About Algorithms: Results of a Representative Survey, Bertelsmann Stiftung, eupinions, February 2019.

Decision Letter 0

Christiane Schwieren

10 Mar 2022

PONE-D-21-39459

The Tragedy of Algorithm Aversion

PLOS ONE

Dear Dr. Lorenz,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Both reviewers agree that the question you are tackling is important. However, they address some important issues with the paper, too. From my own reading and the comments of the reviewers, I would suggest you look carefully at the recommendations provided that focus on the link with theory. This is something that can be improved rather easily, but should be done in a clear and sensible way. The issues relating to the design are more difficult to address. I agree with reviewer 2 that there are more differences between the conditions than just the gravity of consequences. As your sample size is also not convincing, I am somewhat inclined to suggest you run more experiments, making it a series of experiments by that. You could keep your first set of experiments, but add a new set that tries to handle the issues reviewer 2 is raising. As reviewer 2 also suggests that you could get a better insight into what is happening by looking at your data, you could use this to develop a follow-up experiment that helps to get to a better understanding of which aspects of the differences drive the results. If for some reason it is not possible for you to run more experiments, you should at least improve your analyses and discuss very thoroughly the limits of your design and thus of your results.

Please submit your revised manuscript by Apr 24 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Christiane Schwieren, Dr.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

4. Please ensure that you refer to Figure 4 and 5 in your text as, if accepted, production will need this reference to link the reader to the figure.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper tries to extend the findings related to algorithm aversion by establishing a connection between the consequences of a decision-making circumstance and the frequency of algorithm aversion. The authors conduct a lab experiment with 143 university students, who assume the perspective of a businessperson who has to choose between an algorithm and a human expert to carry out a service. With six decision-making situations that vary in gravity, the authors find that the more severe the consequences, the more frequently participants exhibit algorithm aversion.

I sympathize with the efforts to continue understanding algorithm aversion. Still, I have concerns regarding the theoretical development for establishing the relationship (severity of the consequences – frequency of algorithm aversion) and the robustness of the findings due to limited experiments conducted. I will focus my comments on major and minor concerns in what follows.

Major concerns:

- The theory development relating the severity of the consequences of the decision to the presence of algorithm aversion needs more work. In the introduction, the authors briefly explain the relation by mentioning the framing effect, but the relationship is not clear enough to the reader. In the derivation of hypothesis 3, the authors mention the model of bounded rationality, the cognitive limitations that humans suffer from, and the fact that great efforts are required when decisions have severe consequences. But why do decisions with more severe consequences induce less algorithm aversion, as Hypothesis 3 proposes? Maybe the opposite could also be argued? Being the core of the paper, this relationship deserves a better explanation and a discussion of the foundations behind the possible behavior of decision-makers.

Maybe the authors can derive their theory based on how risky people perceive the task. Some risk definitions may help, like the one proposed by Kaplan and Garrick (1981), considering the “triplet” definition of risk as “scenarios, probabilities and consequences.” Previous research suggests a higher presence of algorithm aversion in riskier circumstances (Dietvorst & Bharti, 2020; Grgic-Hlaca et al., 2019; Kawaguchi, 2020). For a general view, you can look at the high-level influencing factors of algorithm aversion described by Mahmud et al. (2022) in their literature review.

- A better explanation of why the six tasks were chosen, in relation to the severity of the consequences, would be useful. This may help considering that, as the authors mention, in treatment B there is a larger range of severity evaluations, with some subjects assessing the gravity of the decision as very high. Although the authors ask how participants perceive the severity of the different tasks, finding significant differences between both treatment conditions, the tasks are in very different domains. Therefore, phenomena other than the severity of the consequences could affect algorithm use. Using control variables may help increase the robustness of the findings.

- Regarding the experiment conducted, I would prefer a bigger sample than 143 participants considering the experimental design. If we consider that each treatment contains three tasks and that, if I understand correctly, it is a between-subjects design, every task has around 24 participants. More explanation of the sample sizing, with power calculations, would be helpful.

Minor concerns:

- The first paragraphs of the introduction focused on the definition of algorithm aversion. Although the discussion is valuable because I agree that different interpretations could be given to the concept “algorithm aversion” (Berger et al., 2020; Jussupow et al., 2020), I do not think it is the focus of this study. I suggest that the authors motivate in the introduction more directly to their main focus related to algorithm use and the connection between the consequences of a decision and the frequency of algorithm aversion.

- Maybe another title for the paper could guide readers better to what the paper is about rather than “The Tragedy of Algorithm Aversion.”

- Providing supplementary materials such as data, surveys, analyses conducted, and preregistrations (if available) would be helpful. This may help promote further research in related topics, as well as promoting open science practices. Maybe supplementary material is available, and I was not aware.

References

Berger, B., Adam, M., Rühr, A., & Benlian, A. (2020). Watch Me Improve—Algorithm Aversion and Demonstrating the Ability to Learn. Business and Information Systems Engineering, 1–14. https://doi.org/10.1007/s12599-020-00678-5

Dietvorst, B. J., & Bharti, S. (2020). People Reject Algorithms in Uncertain Decision Domains Because They Have Diminishing Sensitivity to Forecasting Error. Psychological Science, 31(10), 1302–1314. https://doi.org/10.1177/0956797620948841

Grgic-Hlaca, N., Engel, C., & Gummadi, K. P. (2019). Human Decision Making with Machine Assistance. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW). https://doi.org/10.1145/3359280

Jussupow, E., Benbasat, I., & Heinzl, A. (2020). Why are we averse towards algorithms? A comprehensive literature review on algorithm aversion. ECIS 2020 Proceedings.

Kaplan, S., & Garrick, B. J. (1981). On The Quantitative Definition of Risk. Risk Analysis, 1(1), 11–27. https://doi.org/10.1111/J.1539-6924.1981.TB01350.X

Kawaguchi, K. (2020). When Will Workers Follow an Algorithm? A Field Experiment with a Retail Business. Management Science. https://doi.org/10.1287/mnsc.2020.3599

Mahmud, H., Islam, A. N., Ahmed, S. I., & Smolander, K. (2022). What influences algorithmic decision-making? A systematic literature review on algorithm aversion. Technological Forecasting and Social Change, 175. https://doi.org/https://doi.org/10.1016/j.techfore.2021.121390

Reviewer #2: The paper explores an interesting research question that has also been adequately derived from existing literature. Moreover, the study arrives at potentially meaningful results, should they be replicable. Unfortunately, there are some doubts about this due to problems with the experimental design.

Major comments:

Lack of control (I)

The experiment consists of 2 treatments, each treatment includes three “decision-making situations” (lines 119-130). This setup seems problematic as all six “situations” differ in many regards from each other (apart from the treatment effect “gravity of the consequences”). This relates to both differences between individual “situations” within one treatment as well as differences between individual “situations” across the two treatments and possible differences between treatments. Overall, these differences could provide alternative explanations for the observed effects that are not controlled for. In more detail:

• Treatment A (possibly serious consequences) consists of the three decision making situations “Autonomous driving”, “Evaluation of MRI scans” and “The assessment of criminal case files”, whereas treatment B (no serious consequences) consists of “Dating service”, “Selection of recipes” and “Drawing up weather forecasts”. The situational descriptions include many factors that could influence the decision of participants and that are often associated with algorithm aversion in the literature.

• For the description of the situations of treatment A this, for example, relates to factors such as morality and ethics (e.g. ethical questions related to decisions of autonomous vehicles in potential accident situations, morality questions in delegating criminal investigations). From the algorithm aversion literature it is also well understood that people in particular tend to distrust algorithms for medical diagnosis (here: MRI scan). In this situation, participants might in addition find it disturbing (or “immoral”) that the decision to rely on an algorithm is not made by a doctor but by the manager of the hospital (lines 511ff.). For the decision situations in treatment B these factors seem to be of less importance.

• Two of the situations of treatment B (dating services and recipes) appear to be more subjective in nature as compared to the situations of treatment A (in particular with regard to autonomous driving, but to a lesser extent also with regard to the other two “situations”). This also constitutes a difference to the less subjective third situation of treatment B (Weather forecast). This third “situation” of treatment B furthermore appears to differ in the degree of complexity. In addition, for weather forecasts participants might “expect” algorithms to be involved. Overall, it seems that participants could potentially be more familiar with the use of algorithms in all three situations of treatment B. In the real world, algorithms are frequently used in such contexts already and real life experience of participants seems more probable here. Complexity, subjectivity and familiarity are all factors identified in the literature to affect algorithm aversion. This list is not exhaustive and more factors not controlled for might play a role.

Lack of control (II)

The authors explain that “The decision-making situations are selected in such a way that the subjects should be familiar with them from public debates or from their own experience. In this way, it is easier for the subjects to immerse themselves in the respective context.” (lines 134-137). Subjects will indeed very likely be familiar with the situation or context of the decisions from the real world (though to different degrees, as explained above). But this results in a loss of control over the experiment and therefore also appears problematic:

• The study defines a clear superiority of algorithms for all six “situations” identically (70% probability of success of the algorithm compared to 60% probability for human experts, lines 139-141). According to the definition of the authors, algorithm aversion only exists if subjects chose human experts despite the superior performance of algorithms. It appears problematic that better performance might not be given for the chosen situations, at least with regard to participants` experience and perceptions of the real world. For autonomous shuttle busses, for example, technological development is still at an early stage and the real world performance is often perceived as being (still) insufficient or at least poorer than that of a human operator. Choosing a human operator might thus be the result of this real-world experience or knowledge and not of algorithm aversion.

• With regard to the chosen (student) subject pool this problem seems to be of particular relevance. Some of the decision situations seem to be directly linked to the content of the study programs of participants (autonomous driving/ health diagnosis). 60 subjects (42.0%) study at the Faculty of Vehicle Technology, and 18 subjects (12.6%) at the Faculty of Health Care (lines 201 and 202).

Problems related to the incentive structure

• With the chosen experimental design, it remains somewhat unclear whether subjects base their decision on the situational description or on the performance factor/ the success probabilities provided in the introduction of the experiment.

• Participants could either understand the missing link between incentive scheme and situational descriptions. In this case it seems probable that some participants understand the game as a choice between two lotteries and decide solely based on the probabilities of the lotteries ignoring the context completely. Or they could base their decisions on the situational descriptions. In this case it would seem likely that participants are influenced by real world experience and not only (if at all) by the success probabilities of the card decks provided. The perceived real world performance might contradict with the probabilities provided. The heterogeneous background of the participants (different study programs that are related to the scenarios) might also play a role here.

• The lotteries are implemented with the help of physical card decks (lines 214/215). This non-digital implementation of the lotteries might further support the “decoupling” of incentive and situational description in this particular context (algorithms).

• Probabilities are “made up” and not created within the experiment.

• It should be noted that the success probabilities are explained in the introduction of the experiment, but not in the situational descriptions. This might affect the salience of this information.

• The incentive scheme does not take differences in gravity into account.

• Lines 336-337: “The differing consequences of the decision-making situations do not affect the subjects themselves, but possibly have implications for third parties.” Such implications for third parties are also purely hypothetical and outside the incentive scheme of the experiment.

Data analysis

The description and analysis of the data seems somewhat incomplete:

• Line 203: How many participants for the individual scenarios?

• Lines 226ff. Gravity check: no information provided on how participants assessed the gravity of the six individual scenarios. The scenarios within a treatment are added up without any further analysis. Differences between the scenarios could also (partially) explain why “subjects perceive the gravity of the decision-making situations significantly differently”.

• Analyze for possible differences resulting from different backgrounds (study programs)

• Analyze for possible gender effects (minor comment)

• Discuss the results of the manipulation check in more detail. Did manipulation really work properly for all six situations (e.g. recipes)?

To conclude: the authors argue that the “decisive advantage of a framing approach is that the influence of a factor can be clearly identified. There is only one difference between the decision-making situations in Treatment A and Treatment B: the gravity of the possible consequences.” (lines 306 -308). I cannot fully agree with this statement. All six situations differ from each other by more than one factor. Also, all three situations of treatment A taken together seem to differ by more than one factor from the situations of treatment B taken together. Some of these factors have been identified in previous literature as being particularly linked to algorithm aversion. In addition, the incentive scheme is not fully convincing. As a result, it to some degree remains unclear whether the observed differences are the result of the treatment effect or of other (uncontrolled for) differences or of real world experience. Some of these problems could possibly be addressed by a more in depth analysis of the data (the regression analysis described in lines 277ff. indicates a promising direction as it does not rely on the aggregation of scenarios into treatments), whereas some of the limitations seem inherent to the experimental design and should, at least, be discussed as such.

Minor comments

• Line 41: for a very recent systematic literature review also see: Hasan Mahmud, A.K.M. Najmul Islam, Syed Ishtiaque Ahmed, Kari Smolander, What influences algorithmic decision-making? A systematic literature review on algorithm aversion, Technological Forecasting and Social Change, Volume 175, 2022, 121390.

• Line 64/table 1: Sometimes a definition is provided that is only an indirect quote of a definition already provided elsewhere in the table. The added value of doing so remains somewhat unclear.

• Lines 93-94: More careful formulation suggested with regard to normative recommendations in particular when other aspects (e.g. ethical considerations) may also be of importance.

• Test questions: Explain what happens if someone answers test questions incorrectly (I assume that question needs to be answered again, but this is not explicitly mentioned in the manuscript).

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Feb 21;18(2):e0278751. doi: 10.1371/journal.pone.0278751.r002

Author response to Decision Letter 0


26 Aug 2022

The reviewers have sent us numerous suggestions for improvement. Unfortunately, our response letter is thus too long to fit in this text box in the submission tool. We have therefore uploaded it as a separate MS Word document and ask you to take a look at our responses there.

Attachment

Submitted filename: Framing and Algorithm Aversion - Rev Response.docx

Decision Letter 1

Christiane Schwieren

26 Sep 2022

PONE-D-21-39459R1

The Extent of Algorithm Aversion in Decision-making Situations with Varying Gravity

PLOS ONE

Dear Dr. Lorenz,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

From my own reading, both reviewers have made very careful and partially also similar comments, and found them partially addressed. Even though I see that adding more data is impossible at this stage (it would have been a major improvement), being clearer in the description of the theory behind the hypotheses is, in my view, key. Adding literature is one thing, and that is certainly an improvement, but considering this literature thoroughly would improve the paper even more. The same holds for the discussion of design choices that might not be considered ideal in hindsight. Every experiment has design choices that, after the fact, are considered not ideal. However, dealing with that in the data analysis - if only by being very careful with respect to causal language - is warranted if it is impossible to add to the data collection. I also see some contradictions in the argumentation of the authors with respect to the vignettes. It is clearly NOT the case that the only thing that changes is gravity - it is always context AND gravity that changes, and thus the results are less clear than the authors claim. I can imagine that the authors might consider this too much change to the paper, but even though PLOS ONE differs in publication criteria from other outlets, it does have a strong commitment to clear and correct data analysis and interpretation of the results - including being careful not to oversell.

Please submit your revised manuscript by Nov 10 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Christiane Schwieren, Dr.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I want to thank the authors for their work, but I do not think the current status of the paper meets the required publication standard. Reading the responses to my comments and the revised manuscript, I realize that the authors have not adequately addressed my major concerns. In particular, regarding my major concern about the theory development, the theory is still not well explained in the introduction, and only adding references to the derivation of hypothesis 3 is not enough to clarify the theoretical relationship between the severity of the consequences and algorithm aversion. Also, I think that 143 participants is a very small sample, as I mentioned before, and the experiment is probably underpowered. Unfortunately, the authors were not able to conduct new experiments. Therefore, I think this paper requires significant rework, and especially the conduct of more experiments, to be published.

Reviewer #2: The authors have considered some of the comments and have improved the paper considerably. However, this does not fully apply to the main problem I have seen (lack of control/ risk of spurious results). In contrast to the authors' explanation, I am still not fully convinced that the design allows for a clear proof of causality with respect to the correlation between gravity and algorithm aversion.

A causal relationship between two variables exists if a variation in the independent variable results in a variation in the dependent variable, keeping all other things equal (ceteris paribus). As explained in my previous comments, the scenarios (or treatments) are quite different from each other, they differ in many aspects. The authors have included a sentence addressing this difference now. But they, in my view, still do not recognize that as a result of these differences the relationship between the two variables could be spurious/ due to changes in a third variable.

In the reply letter, the authors (if understood correctly) imply that their experiment constitutes a vignette study (author response to Reviewer 2 comment 9). I also do not find this argument convincing. While I agree with the authors that a vignette design could prove useful to answer the RQ, I do not believe that such a design has been implemented (the term “vignette” does also not appear in the paper). Vignette surveys (factorial surveys) include a description of a situation, consisting of a systematic combination of characteristics (“dimensions”) and a systematic variation of these dimensions (“levels”). Through experimental variation, unconfounded effects of the dimensions/factors can then be estimated. The experiment at hand does, however, not vary the dimension “gravity” within one scenario (e.g. MRI scans for a life-threatening diagnosis vs. MRI scans for a less severe diagnosis; a weather forecast with possible life-threatening consequences, e.g. for sailors, vs. a weather forecast with less severe consequences such as a beach day which is cancelled), but compares across different scenarios. This difference to the standard vignette experimental design is (again) problematic with regard to causality.

As explained earlier, this problem seems inherent to the experimental design to me. I believe that the experiment can still add value to the literature, but that this limitation (if correctly observed and shared by the other reviewer) is still not addressed sufficiently in the paper.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 2

Christiane Schwieren

23 Nov 2022

The Extent of Algorithm Aversion in Decision-making Situations with Varying Gravity

PONE-D-21-39459R2

Dear Dr. Lorenz,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Christiane Schwieren, Dr.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I appreciate the authors' efforts regarding the derivation of the hypotheses and the further analysis using cluster analysis. Still, as I mentioned previously, it is necessary to conduct more experiments mainly to increase the number of participants. Therefore, if no further experiments are possible, I think this paper does not meet the threshold to be accepted.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

Acceptance letter

Christiane Schwieren

28 Nov 2022

PONE-D-21-39459R2

The Extent of Algorithm Aversion in Decision-making Situations with Varying Gravity

Dear Dr. Lorenz:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Christiane Schwieren

Academic Editor

PLOS ONE
