PLOS Computational Biology. 2021 Nov 29;17(11):e1009590. doi: 10.1371/journal.pcbi.1009590

Crowd control: Reducing individual estimation bias by sharing biased social information

Bertrand Jayles, Clément Sire, Ralf H J M Kurvers
Editor: Theodore Paul Pavlic
PMCID: PMC8659305  PMID: 34843458

Abstract

Cognitive biases are widespread in humans and animals alike, and can sometimes be reinforced by social interactions. One prime bias in judgment and decision-making is the human tendency to underestimate large quantities. Previous research on social influence in estimation tasks has generally focused on the impact of single estimates on individual and collective accuracy, showing that randomly sharing estimates does not reduce the underestimation bias. Here, we test a method of social information sharing that exploits the known relationship between the true value and the level of underestimation, and study whether it can counteract the underestimation bias. We performed estimation experiments in which participants had to estimate a series of quantities twice, before and after receiving estimates from one or several group members. Our purpose was threefold: to study (i) whether restructuring the sharing of social information can reduce the underestimation bias, (ii) how the number of estimates received affects the sensitivity to social influence and estimation accuracy, and (iii) the mechanisms underlying the integration of multiple estimates. Our restructuring of social interactions successfully countered the underestimation bias. Moreover, we find that sharing more than one estimate also reduces the underestimation bias. Underlying our results are a human tendency to herd, to trust larger estimates than one’s own more than smaller estimates, and to follow disparate social information less. Using a computational modeling approach, we demonstrate that these effects are indeed key to explaining the experimental results. Overall, our results show that existing knowledge on biases can be used to dampen their negative effects and boost judgment accuracy, paving the way for combating other cognitive biases threatening collective systems.

Author summary

Humans and animals are subject to a variety of cognitive biases that hamper the quality of their judgments. We study the possibility of attenuating such biases by strategically selecting the pieces of social information to share in human groups. We focus on the underestimation bias, a tendency to underestimate large quantities. In estimation experiments, participants were asked to estimate quantities before and after receiving estimates from other group members. We varied the number of shared estimates and their selection method. Our results show that it is indeed possible to counter the underestimation bias, by exposing participants to estimates that tend to overestimate the group median. Subjects followed the social information more when (i) it was further away from their own estimate, (ii) the pieces of social information showed a high agreement, and (iii) it was on average higher than their own estimate. We introduce a model highlighting the core role of these effects in explaining the observed patterns of social influence and estimation accuracy. The model reproduces all the main experimental patterns well. The success of our method paves the way for testing similar interventions in different social systems to impede other cognitive biases.

Introduction

Human and non-human animal judgments and decisions are characterized by a plethora of cognitive biases, i.e., deviations from assumed rationality in judgment [1]. Biases at the individual level can have negative consequences at the collective level. For instance, Mahmoodi et al. showed that the human tendency to give equal weight to the opinions of individuals (equality bias) leads to suboptimal collective decisions when groups harbor individuals with different competences [2]. Understanding the role of cognitive biases in collective systems is becoming increasingly important in modern digital societies.

The recent advent and rapid rise of information technology has substantially altered human interactions, in particular how social information is shared and processed: people share content and opinions with thousands of contacts on social networks such as Facebook and Twitter [3–5], and rate and comment on sellers and products on websites like Amazon, TripAdvisor, and Airbnb [6–8]. While this new age of social information exchange carries vast potential for enhanced collaborative work [9] and collective intelligence [10–13], it also bears the risk of amplifying existing biases. For instance, the tendency to favor interactions with like-minded people (in-group bias [14]) is reinforced by recommender systems, enhancing the emergence of echo chambers [15] and filter bubbles [16] which, in turn, further increases the risk of opinion polarization. Given the importance of the role of biases in social systems, it is important to develop strategies that can reduce their detrimental impact on judgments and decisions in social information sharing contexts.

One promising, yet hitherto untested, strategy to reduce the detrimental impact of biases is to use prior knowledge on these biases when designing the structure of social interactions. Here, we will test whether such a strategy can be employed to reduce the negative effects of a specific bias on individual and collective judgments in human groups. We use the framework of estimation tasks, which are well-suited to quantitative studies on social interactions [17–20], and focus on the underestimation bias: a well-documented human tendency to underestimate large quantities in estimation tasks [20–30]. The underestimation bias has been reported across various tasks, including estimations of numerosity, population sizes of cities, pricing, astronomical or geological events, and risk judgment [20, 27–30]. To illustrate with one example, in [23] subjects had to estimate the number of dots (varying from 36 to 1010) on paper sheets. Subjects underestimated the actual number of dots in all cases. This study (and others) suggested that the tendency to underestimate large quantities could stem from an internal compression of perceived stimuli [23–25]. The seminal study by Lorenz et al. (2011) showed that the underestimation bias can be amplified by social interactions in human groups, deteriorating judgment accuracy [19].

In the present work, we investigate the effects of different interaction structures, aimed at counteracting the underestimation bias, on individual and collective accuracy (details are given below). Moreover, we investigate how these structures interact with the number of shared estimates in shaping social influence and accuracy. Previous research on estimation tasks has largely overlooked both of these factors. Thus far, research on estimation tasks has mostly discussed the beneficial or detrimental effects of social influence on group performance [19, 31–37]. Moreover, most previous studies focused on the impact of a single piece of social information (one estimate or the average of several estimates), or did not systematically vary their number. In addition, in most studies, subjects received social information from randomly selected individuals (either group members, or participants from former experiments) [17–20, 32, 36–40]. In contrast to these previous works, in many daily choices under social influence, one generally considers not only one, but several sources of social information, and these sources are rarely chosen randomly [41]. Even when not actively selecting information sources, one routinely encounters recommended content (e.g., books on Amazon, movies on Netflix, or videos on YouTube) generated by algorithms which incorporate our “tastes” (i.e., previous choices) and those of (similar) others [42].

Following these observations, we confronted groups with a series of estimation tasks, in which individuals first estimated miscellaneous (large) quantities, and then re-evaluated their estimates after receiving a varying number of estimates τ (τ = 1, 3, 5, 7, 9, and 11) from other group members. Crucially, the shared estimates were selected in three different ways:

  • Random treatment: subjects received personal estimates from τ random other group members. Previous research showed that when individuals in groups receive single, randomly selected estimates, individual accuracy improves because estimates converge, but collective accuracy does not [19, 20]. We hence expected to also find improvements in individual accuracy, but not in collective accuracy, at τ = 1. Furthermore, we expected individual and collective accuracy to increase with the number of shared estimates, as we anticipated subjects to use the social information better with an increasing number of shared estimates [43–45].

  • Median treatment: subjects received as social information the τ estimates from other subjects (excluding their own) whose logarithms—logarithms are more suitable because humans perceive numbers logarithmically (in orders of magnitude) [46]—are closest to the median log estimate m of the group. This selection method thus selects central values of the distribution and removes extreme values. Since median estimates in estimation tasks are typically (but not always) closer to the true value than randomly selected estimates (Wisdom of Crowds) [47–49], we expected higher improvements in accuracy than in the Random treatment.

  • Shifted-Median treatment: as detailed above, humans have a tendency to underestimate large quantities in estimation tasks. Recent works have suggested aggregation measures taking this bias into account, or the possibility of counteracting it using artificially generated social information [20, 27]. Building on this, we here tested a method that exploits prior knowledge on this underestimation bias, by selecting estimates that are likely to reduce its effects. We define, for each group and each question, a shifted (overestimated) value m′ of the median log estimate m that approximates the log of the true value T (thus compensating the underestimation bias), exploiting a relationship between m and log(T) identified from prior studies using similar tasks (for details see Experimental Design). Individuals received the estimates whose logarithms were closest to m′ > m (excluding their own estimate). This selection method also tends to eliminate extreme values, but additionally favors estimates that are slightly above the center of the distribution. Given the overall underestimation bias, values slightly above the center of the distribution are, on average, closer to the true value than values at the center of the distribution. Therefore, we expected the highest improvements in collective and individual accuracy in this treatment. Note that our method uses prior domain knowledge (to estimate the true value of a quantity) but does not require a priori knowledge of the true value of the quantity at hand. That is, the accuracy of the selected estimates is a priori unknown, and they are only statistically expected to be closer to the truth. (A code sketch of all three selection procedures is given after this list.)
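To make the three selection procedures concrete, the following minimal Python sketch implements them under our reading of the design. This is illustrative code, not the authors' implementation; all names are ours, and we assume the group median is computed over all 12 log-estimates, including the focal participant's, which the text does not state explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_social_info(log_estimates, i, tau, treatment, gamma=0.9):
    """Return the tau log-estimates shown to participant i.

    log_estimates: the group's 12 personal log-estimates for one question.
    treatment: 'random', 'median', or 'shifted-median'.
    """
    others = np.delete(log_estimates, i)    # a participant never sees their own estimate
    if treatment == "random":
        return rng.choice(others, size=tau, replace=False)
    m = np.median(log_estimates)            # median log-estimate of the group (our assumption:
                                            # computed over the full group)
    target = m if treatment == "median" else m / gamma   # shifted-median value m' = m / gamma
    order = np.argsort(np.abs(others - target))          # closest to the target first
    return others[order[:tau]]
```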

We first describe the distributions of estimates and sensitivities to social influence in all conditions. Next, we shed light on the key effects influencing subjects’ response to social information, which are: (i) the dispersion of the social information, (ii) the distance between the personal estimate and the social information, and (iii) whether the social information is higher or lower than the personal estimate. We then build a model of social information integration incorporating these findings, and use it to further analyze the impact of the number of shared estimates on social influenceability and estimation accuracy. We find, in accordance with our prediction, that improvements in collective accuracy are indeed highest in the Shifted-Median treatment, demonstrating the success of our method in counteracting the underestimation bias.

Experimental design

Participants were 216 students, distributed over 18 groups of 12 individuals. Each individual was confronted with 36 estimation questions displayed on a touchscreen tablet (all questions and participants’ answers are included as supplementary material). Questions were a mix of general knowledge and numerosity, and involved moderately large to very large quantities. Each question was asked twice: first, subjects were asked to provide their personal estimate Ep. Next, they received as social information the estimate(s) of one or several group member(s), and were asked to provide a second estimate Es (see illustration in Fig A in S1 Appendix). When providing the social information, we varied (i) the number of estimates shown (τ = 1, 3, 5, 7, 9, or 11) and (ii) how they were selected (Random, Median, or Shifted-Median treatments). The subjects were not aware of the three different treatments and were simply told that they would receive τ estimates from the other participants. Each group of 12 individuals experienced each of the 18 unique conditions (i.e., combinations of the number of estimates shared and their selection method) twice. Across all 18 groups, each of the 36 unique questions was asked once in every unique condition, resulting in 12 × 36 = 432 estimates per condition (both before and after social information sharing). Students received course credits for participation and were, additionally, incentivized based on their performance. Full experimental details can be found in S1 Appendix.

Note that a similar experimental design, using similar questions, was used in three previous studies—by partly the same authors [20–22]. We will regularly refer to and make comparisons with these studies, in particular when describing the model, which shares the same backbone structure as in these three studies.

Compensating the underestimation bias

Previous research on estimation tasks has shown that the distributions of raw estimates are generally right-skewed, while the distributions of their logarithms are much more symmetric [19, 31, 39, 50]. Indeed, when considering large values, humans tend to think in terms of orders of magnitude [46], making the logarithm of estimates a natural quantity to consider in estimation tasks [20]. Because the distributions of log-estimates are usually close to symmetric, the distance between the center of these distributions and the truth is often used to measure the quality of collective judgments in such tasks (Wisdom of Crowds) [21]. Although the mean is sometimes used to estimate the center of the distributions of log-estimates, the median is generally a better estimate of it [51], as most distributions are closer to Laplace distributions than to Gaussian distributions [52] (the median and the mean are the maximum likelihood estimators of the center of Laplace and Gaussian distributions, respectively).

Fig 1A shows that, within our domain (data taken from a previous study [20]), there is a linear correlation between the median log estimate m and the log of the true value T: m ≈ γ log(T), where γ ≈ 0.9 is the slope expressing this correlation (the “shifted-median parameter”). In particular, the underestimation bias translates into a value of γ < 1. We found a similar linear relationship in our current study, both when using the same questions as used previously [20] (half of our questions; Fig 1B), and when using new questions (other half; Fig 1C), underlining its consistency. Fig B in S1 Appendix shows that this correlation is also found for general knowledge and numerosity questions, as well as for moderately large and very large quantities.

Fig 1. The relationship between the logarithm of the correct answer and the median of the logarithm of estimates for (a) 98 questions (one dot per question) taken from a former study [20] and (b, c) 36 questions from the current experiment.


Among the 36 questions, 18 were already asked in the above cited study (b) and 18 were new (c). The slopes of the linear regression lines are 0.91 (a), 0.88 (b) and 0.91 (c), underlining the robustness of this linear trend. Note that slopes lower than 1 reflect the underestimation bias.

In the following, we quantify the significance of a statement by estimating, from the bootstrapped distribution of the considered quantity, the probability p0 that the opposite statement is true (see Materials and methods for more details). By analogy with the classical p-value, we chose to define the significance threshold p0 < 0.05. By obtaining the probability p0 that the slopes in Fig 1 are larger than 1, we thus find that slopes are significantly lower than 1 (p0 = 0 for 100,000 bootstrap runs in all three cases, see Fig C top row in S1 Appendix). Likewise, by calculating the probability p0 that the regression slope in one panel in Fig 1 is lower than in another panel (i.e., that their difference is negative), we find that slopes are not significantly different from one another (difference between panel a and panel b: p0 = 0.2; difference between panel c and panel a: p0 = 0.37; difference between panel c and panel b: p0 = 0.17; see Fig C bottom row in S1 Appendix).

For each group and each question, we used this linear relationship to construct a value m′ (the “shifted-median value”) aimed at compensating the underestimation bias, i.e., to approximate the (log of the) truth: m′ = m/γ ∼ log(T), with γ = 0.9. m′ then served as a reference to select the estimates provided to the subjects in the Shifted-Median treatment.
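A minimal sketch of this construction on made-up calibration data (the paper regresses m on log(T) over past questions; whether an intercept was included is not stated, so we use a through-the-origin fit here purely for illustration):

```python
import numpy as np

# Hypothetical calibration data: median log-estimates m and log true
# values from past questions (numbers are made up for illustration).
log_T = np.array([3.0, 4.4, 6.4, 8.0])
m     = np.array([2.7, 4.0, 5.7, 7.3])

# Through-the-origin fit of m ~ gamma * log(T)
gamma = np.sum(m * log_T) / np.sum(log_T ** 2)   # about 0.9 for this toy data

# For a new question, the shifted-median value approximates log(T)
m_new = 4.5
m_prime = m_new / gamma                          # m' = m / gamma
```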

Results

Distribution of estimates

Following previous studies where participants had to estimate a similarly diverse set of quantities, we use the quantity X = log(E/T) to represent estimates, where E is the actual estimate of a quantity and T the corresponding true value [20–22]. This normalization ensures that estimates of different quantities are comparable, and represents the deviation from the truth in terms of orders of magnitude: for instance, an estimate E = 100 for a true value T = 1,000 gives X = −1, i.e., an underestimation by one order of magnitude. In the following, we will, for simplicity, refer to X as “estimates”, with Xp referring to personal estimates and Xs to second estimates. Fig 2 shows the distributions of Xp (filled dots) and Xs (empty dots) in each treatment and number of shared estimates τ.

Fig 2. Probability density function (PDF) of personal estimates Xp (filled dots and solid lines) and second estimates Xs (empty dots and dashed lines) in the Random (black), Median (blue), and Shifted-Median (red) treatments, for each value of τ.


Dots are the data and lines correspond to model simulations.

Confirming previous findings, we find narrower distributions after social information sharing across all 18 conditions (see Fig D in S1 Appendix). This narrowing amounts to second estimates Xs being, on average, closer to the truth than the Xp. The model distributions of Xp (solid lines in Fig 2) are simulated by drawing the Xp from Laplace distributions, the center (median) and width (average absolute deviation from the median) of which are taken from the experimental distribution of estimates for each question. The model distributions of Xs (dashed lines) are the predictions of our model presented below. One additional constraint was added in our simulations of both personal and second estimates: since in our experiment actual estimates Ep,s are always greater than 1, we imposed that Xp,s > −log(T), leading to a faster decay of the distribution for large negative log-estimates. Previous studies have shown that distributions of estimates are indeed well approximated by Laplace distributions [21, 52], and [21] presented a heuristic argument to explain the occurrence of such Laplace distributions in the estimation task context. In Fig E in S1 Appendix, we show the distribution of Xp when all conditions are combined. The good agreement between the data and the simulation further supports the Laplace distribution assumption.
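As an illustration, the personal estimates can be drawn as follows. This is a sketch under our assumptions: the paper does not specify how the constraint Xp > −log(T) is enforced, so we simply redraw out-of-range values; conveniently, the Laplace scale parameter equals the average absolute deviation from the median, so the experimental width can be used directly:

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_personal_estimates(n, center, width, log_T):
    """Draw n log-estimates X_p from a Laplace(center, width), enforcing
    X_p > -log_T (i.e., raw estimates E_p > 1) by redrawing out-of-range values."""
    X = rng.laplace(center, width, size=n)
    while np.any(X <= -log_T):          # redraw values violating the constraint
        bad = X <= -log_T
        X[bad] = rng.laplace(center, width, size=bad.sum())
    return X
```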

Distribution of sensitivities to social influence S

Consistent with heuristic strategies under time and cognitive constraints [53–55], we assume that subjects, in evaluating a series of estimates, focus on the central tendency and dispersion of the estimates that they receive as social information. These assumptions are also supported by other studies on estimation tasks [40, 56, 57]. Consistent with the logarithmic representation and Laplace distribution assumptions, we quantify the perceived central tendency and dispersion by the mean and average absolute deviation from the mean of the logarithms of the pieces of social information received, respectively.

We interpret a subject’s second estimate Xs as the weighted arithmetic mean (the arithmetic mean of the logs is equivalent to the log of the geometric mean) of their personal estimate Xp and of the mean M = log(G) of the estimates received (G is the geometric mean of the actual estimates received): Xs = (1 − S) Xp + S M, where S is defined as the weight subjects assign to M, which we will call the sensitivity to social influence. S can thus equivalently be expressed as S = (Xs − Xp)/(M − Xp). S = 0 implies that a subject keeps their personal estimate, and S = 1 that their second estimate equals the geometric mean of the estimates received. As we will show below, S depends on the number of estimates received and their dispersion.
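In code, S is straightforward to compute from a subject's two estimates and the received log-estimates (illustrative sketch; names are ours):

```python
import numpy as np

def sensitivity(Xp, Xs, received_logs):
    """S = (Xs - Xp) / (M - Xp), where M is the mean of the received
    log-estimates (the log of the geometric mean of the raw estimates)."""
    M = np.mean(received_logs)
    return (Xs - Xp) / (M - Xp)

# A second estimate halfway between Xp = 3.0 and M = 4.0 gives S = 0.5
print(sensitivity(3.0, 3.5, [3.8, 4.0, 4.2]))   # 0.5
```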

In the following analysis of S, we will restrict S to the interval [-1, 2] (for plotting reasons, we actually restrict S to the interval [-1.05, 2.05]), thereby removing large values of S that may disproportionately affect measures based on S, in particular its average. Such large values of S are indeed meaningless as they are contingent on the way S is defined, and do not reflect a massive adjustment from Xp to Xs. Consider, for example, the case where Xp = 5 and M = 5.001. Then, Xs = 5.1 gives S = 100, while Xs is not very different from Xp. Such a restriction amounts to removing about 5.3% of the data.

Fig 3 shows that the distribution of S, in all treatments and values of τ, consists of a peak at S = 0 and a part that resembles a Gaussian distribution. Fig F in S1 Appendix shows the distributions of the fraction of cases where S = 0 per participant and per question, along with the model predictions (see the section devoted to the model). The distribution for participants is broad, but the fair agreement between the model and the experimental data suggests that this variability could mainly result from the probabilistic nature of the distribution of S, and not necessarily from a possible (and likely) variability of the participants’ individual probability to keep their personal estimate (denoted P0 below). On the other hand, the variability of the fraction of cases where S = 0 is much lower between questions than between participants, in both experiment and model, although the agreement there is only qualitative.

Fig 3. Probability density function (PDF) of sensitivities to social influence S in the Random (black), Median (blue), and Shifted-Median (red) treatments, for each value of τ.


Solid lines are experimental data, and dashed lines fits using Eq 2. The experimental probability P0 to keep one’s personal estimate (S = 0) is shown in the top left corner of each graph.

We thus assume that with a constant probability P0, subjects keep their initial estimate (S = 0), and with probability Pg, they draw an S in a Gaussian distribution of mean mg and standard deviation σg. This assumption imposes the following relation:

〈S〉 = Pg mg, i.e., Pg = 〈S〉/mg. (1)

To determine the values of Pg, mg and σg per condition (i.e., treatment and value of τ), we fit the distributions of S with the following distribution (using the “nls” function in R):

f(S) = (1 − Pg) δ(S) + Pg Γ(S, mg, σg), with (2)
Γ(S, mg, σg) = 1/(√(2π) σg) exp[−(S − mg)²/(2σg²)], (3)

where Pg is fixed by Eq 1, δ(S) is the Dirac distribution centered at S = 0, and Γ(S, mg, σg) is the Gaussian distribution of mean mg and standard deviation σg.

Note that in [20, 21], another peak was measured at S = 1, amounting to about 4% of answers. However, in our experiments, this peak was absent in almost all conditions, because when more than one estimate is shared, the second estimate is very unlikely to land exactly on the geometric mean of the social information. We, therefore, did not include it in the fit.
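A simplified version of this fit can be sketched as follows. The paper fixes Pg through Eq 1 and fits with the “nls” function in R; here, for illustration only, we estimate Pg directly from the fraction of nonzero S and fit the Gaussian part to a histogram with SciPy:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(S, mg, sg):
    """Normalized Gaussian of Eq 3."""
    return np.exp(-((S - mg) ** 2) / (2 * sg ** 2)) / (np.sqrt(2 * np.pi) * sg)

def fit_S_distribution(S_values, bins=30):
    """Estimate (Pg, mg, sg) in f(S) = (1 - Pg) delta(S) + Pg Gaussian(S).
    Simplification: Pg is read off as the fraction of nonzero S, and the
    Gaussian is fit to the histogram of the nonzero values."""
    S = np.asarray(S_values, float)
    Pg = np.mean(S != 0)
    density, edges = np.histogram(S[S != 0], bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    (mg, sg), _ = curve_fit(gaussian, centers, density, p0=[0.5, 0.5])
    return Pg, mg, sg
```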

We next analyze the dependence of the fitted parameters Pg, mg and σg on τ in the three treatments.

Dependence of Pg, mg and σg on τ

Fig 4 shows Pg, mg and σg against τ in each treatment. At τ = 1, values are comparable in all treatments. At intermediate values of τ (τ = 3, 5, 7, or 9), we however observe differences between treatments, especially for Pg and mg. Similar to above, we quantify the significance of these treatment differences at intermediate values of τ by calculating (from the bootstrapped distribution of the considered quantity) the probability p0 that the sign of the difference in the average value of the quantity at hand (here Pg, mg or σg) is opposite to that of our claim. For instance, if our claim is that Pg is higher in the Shifted-Median treatment than in the Random treatment (namely, the average value of Pg in the Shifted-Median treatment minus that in the Random treatment is positive), then p0 is the probability that the average value of Pg in the Shifted-Median treatment minus that in the Random treatment is negative (see below). Note that we compare quantities at the treatment level. Namely, we compare functions of τ, and not individual values of τ (find more details in the Materials and methods).

Fig 4. Pg, mg and σg against the number of shared estimates τ, in the Random (black), Median (blue), and Shifted-Median (red) treatments.


Error bars are computed using a bootstrap procedure described in the Materials and methods, and roughly represent one standard error.

We find that Pg and mg are significantly higher in the Median (Pg : p0 = 0.0016; mg : p0 = 0) and Shifted-Median (Pg : p0 = 0; mg : p0 = 0.003) treatments than in the Random treatment, indicating a higher tendency to follow the social information in these treatments. We also find that σg is significantly lower in the Median and Shifted-Median treatments than in the Random treatment (Median: p0 = 0.013; Shifted-Median: p0 = 0.029). Moreover, mg is significantly higher in the Median treatment than in the Shifted-Median treatment (p0 = 0). However, no significant difference in Pg (p0 = 0.13) and σg (p0 = 0.34) is found between the Median and the Shifted-Median treatments. The bootstrapped distributions underlying the calculation of p0 in all cases are given in Fig G in S1 Appendix. Finally, at τ = 11 the three measures are similar across treatments. This was expected since all three treatments are equivalent in this case (i.e., subjects receive all pieces of social information). Note that in [22], a similar Random treatment was conducted, leading to very similar results.

Dependence of the dispersion of the social information σ on τ

One major difference between treatments that could help explain the above results lies in the dispersion of the estimates received as social information, σ = 〈|XSI − M|〉, where XSI denotes the estimates received as social information. Recall that the estimates received in the Median and Shifted-Median treatments were selected by proximity to a specific value (see Experimental Design), and are thus expected to be, on average, more similar to one another (i.e., to have a lower dispersion) than in the Random treatment. Fig 5 shows that, as expected, the average dispersion 〈σ〉 is substantially lower in the Median and Shifted-Median treatments than in the Random treatment.

Fig 5. Average dispersion 〈σ〉 of the estimates received as social information against the number of shared estimates τ, in the Random (black), Median (blue), and Shifted-Median (red) treatments.


〈σ〉 is mostly independent of τ in the Random treatment, while it increases with τ in the Median and Shifted-Median treatments. Dots and error bars are the data, and solid lines correspond to model simulations.

Moreover, 〈σ〉 increases with τ in these treatments, while it remains close to constant in the Random treatment. As expected, 〈σ〉 reaches a similar value in all treatments at τ = 11. We thus expect the dependence of Pg, mg and σg on τ observed in Fig 4 to be mediated by a dependence of these measures on σ. Note that our model reproduces the empirical patterns of Fig 5 very well. We use a “Goodness-of-Fit” (GoF) measure to quantify this agreement:

GoF = (1/Nτ) Στ (Oτ − Mτ)²/Cτ², (4)

where Nτ is the total number of observables (e.g., 5 values of τ here), Oτ and Mτ are respectively the measured observable and the model prediction at any given τ, and Cτ = (στ⁺ + στ⁻)/2, where στ⁺ and στ⁻ are the upper and lower parts of the error bars (the computation of which is detailed in the Materials and methods) at each τ. This measure is analogous to the reduced χ-squared (where errors are assumed to follow Gaussian distributions), and compares the accuracy of the model predictions to the observed fluctuations in the data. Reliable model predictions should result in a GoF of order 1, which is the case for all our figures. In addition to the GoF, we provide the relative error between the model predictions and the observed data, given by:

Relative Error = (1/Nτ) Στ |Oτ − Mτ|/|Mτ|. (5)

The GoF values and relative errors are provided in Table B in S1 Appendix.
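Both measures translate directly into code (illustrative sketch; names are ours):

```python
import numpy as np

def goodness_of_fit(O, M, err_lo, err_hi):
    """GoF of Eq 4: mean squared model-data deviation in units of the
    error-bar size C; values of order 1 indicate reliable predictions."""
    O, M = np.asarray(O, float), np.asarray(M, float)
    C = (np.asarray(err_hi, float) + np.asarray(err_lo, float)) / 2
    return np.mean((O - M) ** 2 / C ** 2)

def relative_error(O, M):
    """Relative error of Eq 5: mean of |O - M| / |M| over the observables."""
    O, M = np.asarray(O, float), np.asarray(M, float)
    return np.mean(np.abs(O - M) / np.abs(M))
```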

Dependence of Pg, mg and σg on the dispersion σ

Fig 6 shows Pg, mg and σg as functions of the average dispersion of estimates received as social information 〈σ〉, for each combination of treatment and value of τ.

Fig 6. Pg, mg and σg against the average dispersion of estimates received as social information 〈σ〉, in the Random (black), Median (blue), and Shifted-Median (red) treatments.


Each dot corresponds to a specific value of τ. Values at τ = 1 were excluded since there is no dispersion at τ = 1. Dashed lines show linear fits per treatment.

We find that Pg and mg decrease linearly with 〈σ〉, reflecting a decreasing tendency to compromise with the social information as the dispersion of the estimates received increases. On the contrary, σg increases linearly with 〈σ〉, suggesting that the diversity of subjects’ responses to social influence increases with the diversity of the pieces of social information received. Note that in the Random treatment, the linear fits are less significant, in particular because the range of 〈σ〉 is smaller than in the two other treatments. Yet, we also implement such linear relations for the Random treatment in the model presented below, although their impact should be weaker than in the other two treatments, for which 〈σ〉 spans a much wider range.

Dependence of S on the dispersion σ: Similarity effect

As described above, Pg and mg combined determine the average sensitivity to social influence. Fig 7 shows how 〈S〉 = Pg mg—with the values for Pg and mg taken from Fig 6A and 6B—varies with the average dispersion of estimates received 〈σ〉.

Fig 7. Pg mg against the average dispersion of estimates received as social information 〈σ〉 in the Random (black), Median (blue), and Shifted-Median (red) treatments.


Each dot corresponds to a specific value of τ. The purple dashed line shows a linear regression over all points: Pg mg decreases linearly with 〈σ〉.

Consistent with Pg and mg in Fig 6, 〈S〉 = Pg mg decreases linearly with 〈σ〉 in all treatments. We call this the similarity effect. Moreover, this linear dependence of 〈S〉 on σ appears to be treatment-independent, as a linear regression over all points fits the data very well.

Note that since we found a linear dependence of Pg (Pg = a + b〈σ〉) and mg (mg = a′ + b′〈σ〉) on 〈σ〉, the dependence of 〈S〉 = Pg mg on 〈σ〉 could have been quadratic. Yet, the quadratic term bb′〈σ〉² is of order 0.2 × 0.2 × 0.5² = 0.01, and thus negligible.

Dependence of S on D = M − Xp: Distance and asymmetry effect

In [20, 21], where subjects received as social information the average estimate of other group members, S depended linearly on the distance D = M − Xp between the personal estimate Xp and the average social information M. This effect is known as the distance effect:

S(D) = α + β |D|. (6)

Fig 8 shows the distance effect for each condition: the farther the social information is from the personal estimate, the more strongly it is taken into account.

Fig 8. Average sensitivity to social influence 〈S〉 against the distance D = M − Xp between the personal estimate Xp and average social information M, in the Random (black), Median (blue), and Shifted-Median (red) treatments for all values of τ.


Dots are the data, and shaded areas represent the error (computed using a bootstrap procedure described in the Materials and methods) around the data. Dashed lines are fits using Eq 7, and dotted lines at the bottom of each panel show the density distribution of the data (in arbitrary units).

For each condition (and in agreement with [22]), we find that the center of the cusp relationship is located at D = D0 < 0, rather than at D = 0. Moreover, the left and right slopes (coined β− and β+, respectively) are not always similar, requiring us to fit the slopes separately. These combined effects result in an asymmetric use of social information, whereby social information that is higher than the personal estimate is weighted more than social information that is lower than the personal estimate. This effect is known as the asymmetry effect, and we will discuss it in more detail below.

Finally, Fig 7 showed that we need to consider the dependence of 〈S〉 on σ. Following Fig 7, we assume this dependence to be linear (with slope β′). Taking these results together, we thus arrive at the following fitting function:

S(D, σ, τ) = α(τ) + β±(τ) |D − D0(τ)| + β′(τ) σ, (7)

where α, β±, β′ and D0 can a priori depend on τ. At τ = 1, σ = 0; therefore, β′ was excluded from the parameter fitting in this case. Further details of the fitting procedure are provided in the Materials and methods.
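Eq 7 can be written compactly as follows (illustrative sketch; the piecewise slope implements the β± notation):

```python
import numpy as np

def S_mean(D, sigma, alpha, beta_minus, beta_plus, beta_prime, D0):
    """Average sensitivity <S>(D, sigma) of Eq 7: a cusp at D = D0 with
    separate slopes beta_minus (D < D0) and beta_plus (D > D0), plus a
    linear term in the dispersion sigma of the social information."""
    beta = np.where(D < D0, beta_minus, beta_plus)
    return alpha + beta * np.abs(D - D0) + beta_prime * sigma
```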

Fig 9 shows the fitted values against τ for each treatment, and suggests that these parameters do not systematically vary with τ. We next introduce a model of social information integration, in which we will, therefore, assume that these parameters are independent of τ, and equal to their average (when τ > 1, see below).

Fig 9. Fitted parameter values of D0, α, β−, β+, and β′ against τ in the Random (black), Median (blue), and Shifted-Median (red) treatments.


Dashed lines correspond to the average over all values of τ > 1. Parameters do not show any clear dependence on τ in each treatment (except possibly for β′ in the Median treatment) and are taken as independent of τ in the model, equal to their experimental mean.

Model of social information integration

The model is based on Eq 7 and is an extension of a model developed in [22] (which itself builds on [20, 21]). The key effect we add is the dependence of subjects’ sensitivity to social influence on the dispersion of estimates received as social information, since the Median and Shifted-Median treatments select relatively similar pieces of social information to share, which heavily impacts social influence (Figs 6 and 7).

The model uses log-transformed estimates X as its basic variable, and each run of the model closely mimics our experimental design. For a given quantity to estimate in a given condition (i.e., treatment and number of shared estimates), 12 agents first provide their personal estimate Xp. Following Fig 2, these personal estimates are drawn from Laplace distributions, the center and width of which are respectively the median mp and dispersion σp = 〈|Xpmp|〉 of the experimental personal estimates of the quantity.

Next, agents receive as social information τ personal estimates from other agents in the group, selected according to the selection procedure of the respective treatment (see Experimental Design). Following Fig 3, agents either keep their personal estimate (S = 0) with probability P0, or draw an S in a Gaussian distribution of mean mg and standard deviation σg with probability Pg. According to Eq 1, Pg = 〈S〉/mg, and P0 = 1 − Pg. The calculation of 〈S〉 is based on the mean M and dispersion σ of these estimates received, and follows Eq 7. We thus obtain:

Pg(D, σ, τ) = S(D, σ, τ)/mg(σ) = (α(τ) + β±(τ) |D − D0(τ)| + β′(τ) σ)/mg(σ). (8)

Finally, once an S is drawn for each agent, agents update their estimate according to:

Xs = (1 − S) Xp + S M. (9)

At τ = 1, the values given to Pg, mg and σg were taken from Fig 4. When more than one estimate is shared (i.e., τ > 1), the linear dependencies of these parameters on the dispersion of the social information 〈σ〉, shown in Fig 6, were used. Similarly, the values of D0, α, β− and β+ at τ = 1 were directly taken from Fig 9, while the values of D0, α, β± and β′ at τ > 1 were averaged over τ, and these averages were implemented in the model. This separation is done because the fitting was qualitatively different for τ > 1 and τ = 1, β′ being absent in the latter (no dispersion at τ = 1).
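Putting the pieces together, one round of the model for a single question might look as follows. This sketch reuses draw_personal_estimates, select_social_info, and S_mean from the earlier sketches; the parameter names in p and the clipping of Pg to [0, 1] are our assumptions, not details given in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_round(mp, sigma_p, log_T, tau, treatment, p, gamma=0.9):
    """One simulated question for a group of 12 agents (tau > 1 case)."""
    Xp = draw_personal_estimates(12, mp, sigma_p, log_T)
    Xs = np.empty(12)
    for i in range(12):
        shown = select_social_info(Xp, i, tau, treatment, gamma)
        M = np.mean(shown)
        sigma = np.mean(np.abs(shown - M))       # dispersion of the social information
        D = M - Xp[i]
        mg = p["a_m"] + p["b_m"] * sigma         # linear dependence on sigma (Fig 6)
        sg = p["a_s"] + p["b_s"] * sigma
        Pg = np.clip(S_mean(D, sigma, p["alpha"], p["beta_minus"],
                            p["beta_plus"], p["beta_prime"], p["D0"]) / mg,
                     0, 1)                       # Eq 8, clipped (our assumption)
        S = rng.normal(mg, sg) if rng.random() < Pg else 0.0   # keep Xp with prob. 1 - Pg
        Xs[i] = (1 - S) * Xp[i] + S * M          # Eq 9
    return Xp, Xs
```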

In addition to this full model, we also evaluated two simpler models, leaving out either the similarity effect (β′σ term) or the asymmetry effect (D0 < 0 and β− ≠ β+), to evaluate the importance of both effects in explaining the empirical patterns. Figs H, I, and J in S1 Appendix show the predictions when excluding the similarity effect, and Figs K, L, and M in S1 Appendix when excluding the asymmetry effect.

All model simulation results shown in the figures are averages over 10,000 runs. The full model reproduces well the distributions of estimates (Fig 2), and the dependence of 〈σ〉 on τ (Fig 5). We now use the model to analyze the impact of τ on sensitivity to social influence and estimation accuracy in each treatment.

Impact of τ on sensitivity to social influence S

Fig 10A shows how 〈S〉 varies with τ in all treatments. We find that in the Median and Shifted-Median treatments, 〈S〉 increases sharply between τ = 1 and τ = 3, before decreasing steadily, consistent with the patterns of Pg and mg in Fig 4 (〈S〉 = Pg mg). In the Random treatment 〈S〉 is largely independent of τ. At τ = 11, all conditions (again) converge.

Fig 10. Average sensitivity to social influence 〈S〉 against (a) the number of shared estimates τ and (b) the average dispersion of estimates received 〈σ〉, in the Random (black), Median (blue), and Shifted-Median (red) treatments.


(a) In the Random treatment, there is only a minor dependence of 〈S〉 on τ. In the Median and Shifted-Median treatments, we find an inverse-U shape relationship with τ. This is due to the similarity effect, as shown in (b): a linear decrease of 〈S〉 with 〈σ〉 when τ > 1. Filled dots are the data, while empty dots and solid lines are model simulations.

These patterns result from the similarity effect shown in Fig 10B: 〈S〉 decreases as the dispersion of the estimates received increases, when τ > 1. While in the Median and Shifted-Median treatments the different levels of τ correspond to different levels of dispersion (Fig 5), and thus different levels of 〈S〉, this effect is not present in the Random treatment. Note that, consistent with the relation 〈S〉 = Pg mg, the experimental values in Fig 10B are the same as those of Fig 6.

The full model is in good agreement with the data (see GoF values in Table B in S1 Appendix). When removing the dependence on σ from the model (and re-fitting the parameters accordingly), the inverse U-shape in the Median and Shifted-Median treatments is attenuated, and the decrease of 〈S〉 with 〈σ〉 is underestimated (Fig H in S1 Appendix). This demonstrates that the similarity effect is key to explaining the patterns of sensitivity to social influence.

Impact of τ on S when D < 0 and D > 0

A more intuitive way to understand the result that D0 < 0 and β+ > β− is that subjects’ sensitivity to social influence is on average higher when D > 0 (i.e., when the average social information received by subjects is higher than their personal estimate) than when D < 0 (i.e., when the average social information received by subjects is lower than their personal estimate). Fig 11 shows this so-called asymmetry effect, which is fairly well captured by the full model (see GoF values in Table B in S1 Appendix).

Fig 11. Average sensitivity to social influence 〈S〉 against the number of shared estimates τ, in the Random (black), Median (blue), and Shifted-Median (red) treatments, when the average social information M is higher than the personal estimate Xp (D = M − Xp > 0; squares) and when it is lower (D < 0; triangles).


Subjects follow the social information more on average when M is higher than Xp, than when it is lower. Filled symbols represent the data, while solid lines and empty symbols are model simulations. Table A in S1 Appendix shows the percentage of cases when D < 0 and D > 0 in all conditions.

Below, we will show that this effect also drives improvements in estimation accuracy after social information sharing. Fig L in S1 Appendix shows that the model without the asymmetry effect is unable to reproduce the higher sensitivity to social influence when D > 0 than when D < 0.

Improvements in estimation accuracy: Herding effect

In line with [20–22], and for a given group in a given condition, we define:

  • the collective accuracy as the absolute value of the median of all individuals’ estimates of all quantities in that group and condition: |Mediani,q(Xi,q)| (where i runs over individuals and q over quantities/questions);

  • the individual accuracy as the median of the absolute values of all individuals’ estimates: Mediani,q(|Xi,q|).

The closer to 0, the better the accuracy. Collective accuracy represents the distance of the median estimate to the truth, and individual accuracy the median distance of individual estimates to the truth (see [20] for a more detailed discussion of the interpretation of these two quantities and their differences).
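Both definitions translate directly into code (illustrative sketch):

```python
import numpy as np

def collective_accuracy(X):
    """Absolute value of the median estimate: distance of the group's
    median to the truth (0 = perfect)."""
    return abs(np.median(X))

def individual_accuracy(X):
    """Median absolute estimate: median distance of individual
    estimates to the truth (0 = perfect)."""
    return np.median(np.abs(X))

# X = log(E/T) pooled over individuals and questions for one condition
X = np.array([-1.2, -0.5, -0.1, 0.3, 0.8])
print(collective_accuracy(X), individual_accuracy(X))   # 0.1 0.5
```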

Fig 12 shows how collective and individual accuracy depend on τ in each treatment. Let Oτ and O′τ denote the collective or individual accuracy at each value of τ, before and after social information sharing, respectively. Since the dependence of both quantities on τ is weak compared to the size of the error bars, we here consider their averages 〈O〉 = (1/Nτ) Στ Oτ and 〈O′〉 = (1/Nτ) Στ O′τ over all values of τ (Nτ = 6). We quantify the improvements in collective or individual accuracy as the positive difference between both averages, 〈O〉 − 〈O′〉, and assess their significance by computing the probability p0 that the improvement is negative.

Fig 12. Collective and individual accuracy against the number of shared estimates τ, before (filled dots) and after (empty circles) social information sharing, in the Random (black), Median (blue), and Shifted-Median (red) treatments.


Values closer to 0 indicate higher accuracy. Solid and dashed lines are model simulations before and after social information sharing, respectively.

We find that collective accuracy improves mildly—but significantly—in the Random and Median treatments (p0 = 0.0002 and 0.0035 respectively, see the bootstrapped distributions in Fig N top row in S1 Appendix), as predicted by the model.

This improvement is due to the asymmetry effect (Fig 11), which partly counteracts the human tendency to underestimate quantities [20, 27–29]. Indeed, giving more weight to social information that is higher than one’s personal estimate shifts second estimates toward higher values, thus improving collective accuracy. The model without the asymmetry effect is unable to predict this improvement in collective accuracy (Fig M in S1 Appendix).

In the Shifted-Median treatment the improvement in collective accuracy is substantially higher and highly significant (p0 = 0 for 10,000 bootstrap runs, see the bootstrapped distribution in Fig N top right panel in S1 Appendix). The improvement in collective accuracy is substantially (and significantly) higher in the Shifted-Median treatment than in the Random treatment (p0 = 0.0018). However, we find no significant difference in the improvement between the Median and Random treatments (p0 = 0.47). The corresponding bootstrapped distributions are shown in Fig O in S1 Appendix.

This higher improvement in the Shifted-Median treatment is a consequence of the selection procedure of the pieces of social information. As shown in Fig 10, participants have a tendency to partially follow the social information (0 < 〈S〉 < 1 in all conditions, a.k.a. herding effect). Although there are no substantial differences in 〈S〉 between the Median and Shifted-Median treatments, the estimates received as social information overestimate the group median in the Shifted-Median treatment. A similar level of 〈S〉 thus shifts second estimates toward higher values (as compared to the Median treatment), thereby partly countering the underestimation bias and boosting collective accuracy.

For the individual accuracy, we find substantial and significant improvements in all treatments (p0 = 0, see Fig N bottom row in S1 Appendix), with slightly (and significantly) higher improvements in the Median and Shifted-Median treatments than in the Random treatment (p0 = 0.028 and 0.027 respectively, see Fig P in S1 Appendix), due to the similarity effect which boosts social information use in these treatments (Fig 10).

This confirms previous studies showing that higher levels of social information use (when 0 < 〈S〉 < 0.5) increase the narrowing of the distribution of estimates (Fig 2), thereby increasing individual accuracy [19, 20]. The model correctly predicts the magnitude of improvements in all treatments (see GoF values in Table B in S1 Appendix).

As a final remark, note that the size of the error bars presented in Fig 12 (and in the figures of the next sections) for each individual condition (treatment and value of τ; before and after social information) could be slightly misleading. Indeed, for each bootstrap run, the 12 data points in each panel of Fig 12 are in fact highly correlated, and our paired analysis presented in Figs N and O in S1 Appendix (bootstrapped distributions of the difference between two considered observables) constitutes a more rigorous and fairer assessment of the significance of our claims presented above.

Impact of D on estimation accuracy

Because subjects behave differently when receiving social information that is higher (D = M − Xp > 0) or lower (D < 0) than their personal estimate, we next study how these different scenarios impact accuracy. Fig 13 shows the individual accuracy for each condition, separating the answers where the personal estimate of a subject was above or below the social information.

Fig 13. Individual accuracy against the number of shared estimates τ, before (filled dots) and after (empty circles) social information sharing, in the Random (black), Median (blue), and Shifted-Median (red) treatments.


The population was separated into subjects’ answers where the average social information received M was lower than their personal estimate Xp (D = M − Xp < 0) and subjects’ answers where the average social information received was higher than their personal estimate (D > 0). Solid and dashed lines are model simulations before and after social information sharing, respectively. Individual accuracy improves marginally for D < 0, but substantially for D > 0.

We find that, in the Random and Median treatments, subjects were more accurate when D < 0 than when D > 0 before social information sharing. This is a consequence of the underestimation bias, as personal estimates in the former (latter) case are, on average, more likely to be above (below) the median estimate of the group—and therefore closer to (farther from) the truth. In the Shifted-Median treatment, however, we observe a more complex pattern: (i) at low values of τ, individual accuracy is worse before social information sharing in this treatment than in the Random and Median treatments when D < 0, while it is better when D > 0. This reversed pattern suggests that the shifted-median values tend, on average, to slightly overestimate the truth; (ii) individual accuracy improves with τ when D < 0, but declines with it when D > 0. As τ increases, the average social information indeed decreases until it is the same as in both other treatments at τ = 11. In all conditions, individual accuracy improves mildly (but significantly) after social information sharing when D < 0 (see Fig Q top row in S1 Appendix), while it improves substantially (and highly significantly) when D > 0 (see Fig Q bottom row in S1 Appendix). The model captures the main trends (or absence of them) well (see GoF and relative errors in Table B in S1 Appendix). Fig R in S1 Appendix shows the equivalent figure for collective accuracy, showing qualitatively similar results.

Note that it may seem puzzling that accuracy before social information sharing is condition dependent. This is because we consider subpopulations, selected according to specific criteria (D > 0 or D < 0). When selecting such subpopulations, nothing prevents inter-individual differences in the accuracy of personal estimates. When considering the whole population, such differences between conditions, by definition, disappear (Fig 12).

Impact of S on estimation accuracy

Finally, we studied how subjects’ sensitivity to social influence affects estimation accuracy, by separating subjects’ answers into those for which S was either below or above the median value of S in that condition. Fig 14 shows individual accuracy for both categories.

Fig 14. Individual accuracy against the number of shared estimates τ, before (filled dots) and after (empty circles) social information sharing, in the Random (black), Median (blue), and Shifted-Median (red) treatments.


In each condition, the subjects’ answers were separated according to their corresponding value of S with respect to the median of S. Solid and dashed lines are model simulations before and after social information sharing, respectively. When S is lower than the median, the subjects tend to keep their initial estimate, and individual accuracy therefore does not change much. When S is higher than the median, the subjects tend to compromise more with the social information, resulting in high improvements.

Subjects in the below-median category provided more accurate personal estimates than those in the above-median category. It is well-known that more accurate individuals use less social information (they are also the most confident subjects in their personal estimate [20]), and this insight has also been used to improve collective estimations [37]. This result is in part related to the distance effect (Fig 8): subjects use social information the least when their initial estimate is close to the average social information, which is itself, on average, close to the truth.

Because subjects in the below-median category disregard, or barely use, social information, they do not (or barely) improve in accuracy after social information sharing. We observe no significant improvement in the Random and Median treatments (p0 = 0.28 and 0.44, respectively), and a marginal improvement in the Shifted-Median treatment, although not clearly significant (p0 = 0.06; see Fig S top row in S1 Appendix). On the contrary, subjects in the above-median category tend to compromise with the social information, thereby substantially improving in individual accuracy after social information sharing, and reaching similar levels of accuracy as the below-median category. Improvements are highly significant in all treatments (p0 = 0, see Fig S bottom row in S1 Appendix).

The model, again, reproduces these findings well, in particular the magnitudes of improvements in all cases (see GoF and relative errors in Table B in S1 Appendix), which are also in agreement with [20–22]. Fig T in S1 Appendix shows the equivalent figure for collective accuracy, showing qualitatively similar patterns, albeit with substantially higher improvements in the Shifted-Median treatment for the above-median category, consistent with Fig 12.

Discussion

We have studied the impact of the number of estimates presented to individuals in human groups, and of the way these estimates are selected, on collective and individual accuracy in estimating large quantities. Our results are driven by four key mechanisms underlying social information integration:

  1. subjects give more weight to the social information when the distance between the average social information and their own personal estimate increases (distance effect). This effect has been found in several previous studies [20–22, 27]. But note that in [17, 58], the authors found that for large distances, the opposite was true (namely, the weight given to advice decreased with distance to the personal estimate);

  2. subjects give more weight to the central tendency of multiple estimates when it is higher than their own personal estimate than when it is lower (asymmetry effect). This asymmetry effect, also found in [22, 27], shifts second estimates toward higher values, thereby partly compensating the underestimation bias and improving collective accuracy. The asymmetry effect suggests that people are able to selectively use social information in order to counterbalance the underestimation bias, even without external intervention (Random treatment). Note that we cannot exclude that this effect might be partly contingent on our experimental design, and that future work may find no such effect, or the opposite effect, when participants are asked to estimate different sets of quantities;

  3. subjects follow social information more when the estimates are more similar to one another (similarity effect). Previous studies have shown that similarity in individuals’ judgments correlates with judgment accuracy [59, 60], suggesting that following pieces of social information more when they are more similar is an adaptive strategy to increase the quality of one’s judgments. Our selection method in the Median and Shifted-Median treatments capitalized on this effect as it selected relatively similar pieces of social information, thereby counteracting the human tendency to underuse social information [20, 61, 62], resulting in higher individual improvement in both treatments than in the Random treatment;

  4. subjects tend to partially copy each other (herding effect), leading to a convergence of estimates after social information sharing, and therefore to an improvement in individual accuracy in all treatments. This effect is adaptive in most real-life contexts, as personal information is often limited and insufficient, such that relying on social information, at least partly, is an efficient strategy to make better judgments and decisions. Moreover, note that contrary to popular opinion, convergence of estimates need not yield negative outcomes (like impairing the Wisdom of Crowds [19, 32, 37]): even if the average opinion is biased, sharing opinions may temper extreme ones and improve the overall quality of judgments [63]. This tendency to follow the social information has another important consequence: it is possible to influence the outcome of collective estimation processes in a desired direction. In the Shifted-Median treatment, we showed that subjects’ second estimates could be “pulled” towards the truth, thus improving collective accuracy. This is an example of nudging, also demonstrated in other contexts [64]. Previous studies have shown that the same tendency can also lead, under certain conditions, to dramatic situations in which everybody copies everybody else indiscriminately (“herd behavior”) [65].

Next, we developed an agent-based model to study the importance of these effects in explaining the observed patterns. The model assumes that subjects have a fast and intuitive perception of the central tendency and dispersion of the estimates they receive, coherent with heuristic strategies under time and computational constraints [53–55], and consistent with previous findings [40, 56, 57]. By using simpler models excluding either the asymmetry effect or the similarity effect, we demonstrated that the above effects are key to explaining the empirical patterns of sensitivity to social influence and estimation accuracy. It is conceivable that the strategies used by people when integrating up to 11 pieces of social information in their decision-making process are very diverse and complex. Yet, despite its relative simplicity, our model is able to capture all the main observed patterns, underlining the core role of these effects in integrating several estimates of large quantities.

Our goal was to test a method to improve the quality of individual and collective judgments in social contexts. The method exploits available knowledge about cognitive biases in a given domain (here the underestimation of large quantities in estimation tasks) to select and provide individuals with relevant pieces of social information to reduce the negative effects of these biases. In [21], the social information presented to the subjects was manipulated in order to improve the accuracy of their second estimates. However, at variance with our present study, the correct answer to each question needed to be known a priori, and was exploited by “virtual influencers” providing (purposefully) incorrect social information to the subjects, specifically selected to counter the underestimation bias. Even though such fake information can help the group perform better, our method avoids such deception, and extends to situations in which the estimation context is known, but not the truth itself. Note that our shifted-median value m′ = m/γ, with γ ≈ 0.9, aimed at approximating the truth. The results of [21] suggest that a slightly lower value of γ (thus aiming at slightly overestimating the truth) could boost improvements in accuracy even further.

Another previous study exploited the underestimation bias by recalibrating personal estimates, thereby also successfully counteracting the underestimation bias [27]. Fig U in S1 Appendix compares our Shifted-Median treatment to a direct recalibration of personal estimates, where all Xp are divided by γ = 0.9. Collective accuracy improves similarly under both methods. Individual accuracy, however, degrades with the recalibration method, while it strongly improves with the Shifted-Median method. Our method thus outperforms a mere recalibration of personal estimates. Moreover, note that recalibrating initial estimates may be useful from an external assessor’s point of view, but does not provide participants with an opportunity to improve their accuracy, individually or collectively.

Our method may, in principle, be applied to different domains. Future work could, for instance, test this method in domains where overestimation dominates, by defining a shifted-median value below the group median; or in domains where the quantities to estimate are negative (or at least not strictly positive) or lower than one (i.e., negative in log). Another interesting direction for future research would be to explore ways to refine our method. Figs V and W in S1 Appendix show that collective and individual accuracy improve more for very large quantities than for moderately large ones, although the levels of underestimation are similar in both cases (Fig B in S1 Appendix). This suggests that the linear relationship between the median (log) estimates and the (log of the) true value may be insufficient to fully characterize this domain of estimation tasks. Considering other distributional properties, such as the dispersion, skewness and kurtosis of the estimates received, could help to fine tune the selection method to further boost accuracy.

Finally, let us point out that our population sample consisted of German undergraduate students. A cross-cultural study using a similar paradigm in France and Japan [20] found similar levels of underestimation in both countries, albeit slightly higher levels of social information use in Japan. This suggests that the underestimation bias we observed is widespread in this domain, although a systematic comparison of the levels of bias and social information use across (sub)populations is still lacking. Filling this gap could represent a major step forward in research on social influenceability and cognitive biases.

To conclude, we believe that the mechanisms underlying social information use in estimation tasks share important commonalities with related fields (e.g., opinion dynamics [66]), and that our method has the potential to inspire research in such fields. For instance, one could imagine reducing the in-group bias by extending the amount of discrepant/opposite views presented to individuals in well-identified opinion groups. Implementing methods similar to ours in recommender systems and page-ranking algorithms may thus work against filter bubbles and echo chambers, and eventually reduce polarization of opinions [67]. Similarly, it is conceivable that the effects of well-known cognitive biases such as the confirmation [68] or overconfidence bias [69] could be dampened by strategically sharing social information.

Materials and methods

Computation of the error bars

The error bars indicate the variability of our results depending on the NQ = 36 questions presented to the subjects. We call x0 the actual measurement of a quantity appearing in the figures, computed over all NQ questions. We then generate the results of Nexp = 1,000 new effective experiments (bootstrap runs). For each effective experiment indexed by n = 1, …, Nexp, we randomly draw N′Q = NQ questions with replacement among the NQ questions asked (so that some questions can appear several times, and others not at all) and recompute the quantity of interest, which now takes the value xn. The upper error bar b+ for x0 is defined so that C = 68.3% (by analogy with the usual standard deviation for a normal distribution) of the xn greater than x0 lie between x0 and x0 + b+. Similarly, the lower error bar b− is defined so that C = 68.3% of the xn lower than x0 lie between x0 − b− and x0. These asymmetric upper and lower confidence intervals are adapted to the case in which the distribution of the xn is unknown and potentially not symmetric.
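For concreteness, a minimal Python sketch of this bootstrap procedure could look as follows. It assumes that the quantity of interest can be recomputed from a vector of per-question values (here simply averaged, whereas the actual statistics used in the paper are more elaborate); all names are illustrative, not the code used in the study.

```python
import numpy as np

def bootstrap_error_bars(per_question_values, n_boot=1000, coverage=0.683, seed=None):
    """Asymmetric bootstrap error bars over questions (sketch)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(per_question_values, dtype=float)
    n_q = len(values)
    # x0: the actual measurement over all N_Q questions (here a simple mean;
    # the paper's statistics would replace this aggregation)
    x0 = values.mean()
    # N_exp effective experiments: resample N_Q questions with replacement
    xn = np.array([values[rng.integers(0, n_q, n_q)].mean()
                   for _ in range(n_boot)])
    above = xn[xn > x0] - x0
    below = x0 - xn[xn < x0]
    # b+ is such that a fraction C of the xn above x0 lie in [x0, x0 + b+]
    b_plus = np.quantile(above, coverage) if above.size else 0.0
    # b- is such that a fraction C of the xn below x0 lie in [x0 - b-, x0]
    b_minus = np.quantile(below, coverage) if below.size else 0.0
    return x0, b_minus, b_plus
```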

Quantification of significance

For any claim whose significance we want to assess, we use the same bootstrap procedure as above and generate the distribution of the relevant quantity. In particular, this method allows us to study paired statistics for the difference between two relevant observables. For instance, in Fig 4, we want to check whether Pg is significantly higher in the Shifted-Median treatment than in the Random treatment at intermediate values of τ (τ = 3, 5, 7, 9). The quantity of interest is therefore the difference between the average value of Pg (over τ = 3, 5, 7, 9) in the Shifted-Median treatment and that in the Random treatment, and we want to show that this quantity is significantly positive. We generate Nexp = 10,000 bootstrap runs for the quantity of interest (Nexp = 100,000 for the distributions related to Fig 1) and obtain the distribution of its possible values. From this distribution, we can then calculate the probability p0 that the difference is negative. More generally, given any claim, we check its significance by estimating the probability p0 that its opposite is true. Given the similarity between this quantifier p0 and the classical p-value (although our approach does not assume Gaussian-distributed errors), we consider a result significant whenever p0 < 0.05. Note that in the context of our bootstrap approach, p0 = 0 means that no occurrence of the event was observed during the Nexp runs; the actual p0 is then estimated to be of the order of, or lower than, 1/Nexp.
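A possible implementation of this significance test, sharing the resampling logic of the error-bar computation above, is sketched below; the `stat_fn` interface is a hypothetical stand-in for recomputing the paired difference of interest from a resampled set of questions.

```python
import numpy as np

def bootstrap_p0(stat_fn, n_questions, n_boot=10_000, seed=None):
    """Estimate p0, the probability that the opposite of a claim holds (sketch).

    stat_fn: hypothetical callable taking an array of resampled question
    indices and returning the quantity whose positivity we test, e.g. the
    difference in average Pg (over tau = 3, 5, 7, 9) between the
    Shifted-Median and Random treatments.
    """
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for n in range(n_boot):
        # resample questions with replacement, as for the error bars
        sample = rng.integers(0, n_questions, n_questions)
        diffs[n] = stat_fn(sample)
    p0 = np.mean(diffs < 0.0)  # fraction of runs where the claim is violated
    # p0 == 0 only means no violation occurred in n_boot runs,
    # i.e. the true p0 is of the order of 1/n_boot or below
    return p0
```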

It is important to keep in mind that we compare functions of τ, and not just results for individual values of τ. Indeed, while differences for each value of τ may not always be significant, differences at the treatment level are often highly significant. Consider for instance collective accuracy in the Shifted-Median treatment of Fig 12. While error bars overlap at all levels of τ, casting doubt on the improvement’s significance at any individual value of τ, the fact that all points after social information sharing lie below the points before social information sharing intuitively suggests that the improvement must be significant, as confirmed by our paired statistical analysis. In fact, even if two observables that we wish to compare fluctuate strongly between bootstrap runs, leading to sizable error bars that may overlap for the two quantities (as in Fig 12), these fluctuations are highly correlated, and the statistics of their difference then properly quantifies their relative magnitude.

Fitting procedure used in Fig 8

Each combination of treatment and number of shared estimates contains 432 estimates. When binning data, one has to trade off the number of bins (more bins reveal more detailed patterns) against their size (larger bins reduce noise). In Fig 8, the noise within each condition was relatively high when using a bin size below 1, whereas bins of size 1 hid the details of the relationship between 〈S〉 and D, especially the location of the bottom of the cusp.

To circumvent this problem, we use a procedure that is well adapted to such situations (previously described in [22]). First, note that a given bin size leaves one free to choose the values on which the bins are centered. For instance, a set of 5 bins centered on −2, −1, 0, 1, and 2 is as valid as a set of 5 bins centered on −2.5, −1.5, −0.5, 0.5, and 1.5, as the same data are used in both cases. The two sets of points produced are replicates of the same data, but we now have 10 points instead of 5.

In each panel of Fig 8, we used such a moving center, starting with the first bin centered at −2 and the last at +2, and producing histograms (of bin size 1) whose centers advance in steps of 0.1. This replicated the data 9 times, giving 10 replicates and 50 points overall, instead of 5. We then removed the values beyond D = 2, keeping 41 points (from D = −2 to D = 2).
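The replication trick can be implemented compactly. The sketch below slides a bin of size 1 in steps of 0.1 between D = −2 and D = 2, which indeed yields 41 binned points (illustrative names and layout; not the study’s code).

```python
import numpy as np

def moving_center_binning(D, S, width=1.0, start=-2.0, stop=2.0, step=0.1):
    """Binned averages of S against D with a sliding bin center (sketch).

    Produces bins of size `width` whose centers advance in steps of `step`,
    replicating the same data across shifted binnings; centers beyond `stop`
    are discarded, leaving 41 points for the default values.
    """
    D, S = np.asarray(D, dtype=float), np.asarray(S, dtype=float)
    centers = np.arange(start, stop + step / 2, step)  # -2.0, -1.9, ..., 2.0
    means = np.full(centers.size, np.nan)
    for i, c in enumerate(centers):
        in_bin = (D >= c - width / 2) & (D < c + width / 2)
        if in_bin.any():
            # e.g. the point at D = 2 averages S over (1.5, 2.5)
            means[i] = S[in_bin].mean()
    return centers, means
```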

Next, we used the following functions to fit the data in Fig 8 and obtain the values of all parameters in each condition. At τ = 1:

Sfit = α + β± |D − D0|, where β± stands for β− when D < D0 and for β+ when D > D0.

At τ > 1 (i.e., including the dispersion σ):

Sfit = α + β± |D − D0| + β′σ.

D0 was first fitted separately, as the location of the minimum of an absolute-value function fitted locally to the data points shown in Fig 8, within the interval [−1.2, 0.2] in each condition. Only in the Random treatment at τ = 3 and 5, and in the Median treatment at τ = 3, was the upper bound taken as 0.6 instead of 0.2 (the lower bound always remained −1.2).

For the fitting of α, β−, β+, and β′, we used all the data comprised within the interval shown in Fig 8, namely [−2.5, 2.5] (bins are of size one, so the dot at D = 2, for instance, shows the average of S between 1.5 and 2.5). Only in a few cases did we slightly restrict the fitting interval in order to obtain better results:

  • Random treatment, τ = 1 and 11: [-1.65, 2.5]

  • Median treatment, τ = 7: [-1.9, 2.5]

  • Shifted-Median treatment, τ = 3: [-2, 2.5]

  • Shifted-Median treatment, τ = 9: [-1.2, 1.5]

We wrote a program to perform the least-squares minimization. Let Q = Σi (Si − Sifit)² = Σi (Si − α − β±|Di − D0| − β′σi)² be the sum, over all the data in the chosen interval (indexed by i), of the squared distances between S and Sfit (with β′ = 0 when τ = 1). Note that the data indexed by i correspond to individual participants’ answers, not to the averaged values shown in Fig 8. This is why the squared distances are not weighted (all individual answers have the same weight). We then set the partial derivatives of Q with respect to α, β−, β+, and β′ (when τ > 1) equal to 0 to obtain the values of these parameters.
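Since D0 is fixed beforehand, the model is linear in (α, β−, β+, β′), and setting the partial derivatives of Q to zero amounts to solving the normal equations of an ordinary (unweighted) least-squares problem. A minimal sketch under these assumptions, with illustrative names (not the program used in the study):

```python
import numpy as np

def fit_cusp(D, S, sigma=None, D0=0.0):
    """Unweighted least-squares fit of the cusp model (sketch).

    With D0 fixed (fitted separately, as in the text), the model
    S_fit = alpha + beta_(+/-) |D - D0| (+ beta' * sigma at tau > 1)
    is linear in its parameters, so the normal equations can be solved
    directly, here via np.linalg.lstsq.
    """
    D, S = np.asarray(D, dtype=float), np.asarray(S, dtype=float)
    dist = np.abs(D - D0)
    left = np.where(D < D0, dist, 0.0)    # carries beta_- (D below the cusp)
    right = np.where(D >= D0, dist, 0.0)  # carries beta_+ (D above the cusp)
    columns = [np.ones_like(D), left, right]
    if sigma is not None:                 # tau > 1: add the dispersion term
        columns.append(np.asarray(sigma, dtype=float))
    X = np.column_stack(columns)
    params, *_ = np.linalg.lstsq(X, S, rcond=None)
    return params  # alpha, beta_-, beta_+ (and beta' when sigma is given)
```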

Supporting information

S1 Appendix. Details of the experimental design and supplementary figures and tables.

Further details about the experimental design are provided (including the questions asked), as well as figures and tables supporting the statistical analysis and the main discussion. Fig A: Experimental procedure for an example question. Fig B: Correlation between median estimate and correct answer for general knowledge vs numerosity questions and for very large vs moderately large quantities. Fig C: Significance analysis of the differences between slopes in all panels of Fig 1 as well as of these slopes being lower than 1. Fig D: Narrowing of the distributions of estimates after social information sharing in Fig 2 and analysis of its significance. Fig E: Probability density function (PDF) of personal estimates Xp for all conditions combined. Fig F: Probability density function (PDF) of the fraction of instances with S = 0 for each participant and each question. Fig G: Significance analysis of the differences in Pg, mg and σg between treatments in Fig 4. Fig H: 〈S〉 against τ and 〈σ〉 for the model without similarity effect. Fig I: 〈S〉 against τ, when D < 0 and D > 0, for the model without similarity effect. Fig J: Collective and individual accuracy against τ for the model without similarity effect. Fig K: 〈S〉 against τ and 〈σ〉 for the model without asymmetry effect. Fig L: 〈S〉 against τ, when D < 0 and D > 0, for the model without asymmetry effect. Fig M: Collective and individual accuracy against τ for the model without asymmetry effect. Fig N: Significance analysis of the improvements in collective and individual accuracy in Fig 12. Fig O: Significance analysis of the difference in improvement in collective accuracy between treatments in Fig 12. Fig P: Significance analysis of the difference in improvement in individual accuracy between treatments in Fig 12. Fig Q: Significance analysis of the improvement in individual accuracy in Fig 13. Fig R: Collective accuracy against τ when D < 0 and when D > 0. Fig S: Significance analysis of the improvement in individual accuracy in Fig 14. Fig T: Collective accuracy against τ when S is below and above Median(S). Fig U: Collective and individual accuracy against τ in the Shifted-Median treatment compared to a simple recalibration of initial estimates. Fig V: Collective accuracy against τ for moderately large and very large quantities. Fig W: Individual accuracy against τ for moderately large and very large quantities. Table A: Distribution of cases when the social information provided to an individual was higher (D > 0) or lower (D < 0) than their personal estimate. Table B: Goodness-of-fit and relative error between the data and the model.

(PDF)

Acknowledgments

We are grateful to Felix Lappe for programming the experiment, and thank Alan Tump, Lucienne Eweleit, Klaus Reinhold, and Oliver Krüger for their support in the organization of our study. We are grateful to the ARC research group for their constructive feedback.

Data Availability

The data supporting the findings of this study are available at figshare: https://doi.org/10.6084/m9.figshare.12472034.v2.

Funding Statement

B.J. and R.K. were partly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC 2002/1 “Science of Intelligence” – project number 390523135. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Ehrlinger J, Readinger WO, Kim B. Decision-making and cognitive biases. In: Reference Module in Neuroscience and Biobehavioral Psychology, Encyclopedia of Mental Health (Second Edition). 2016. pp. 5–12.
  • 2. Mahmoodi A, Bang D, Olsen K, Zhao YA, Shi Z, Broberg K, et al. Equality bias impairs collective decision-making across cultures. Proceedings of the National Academy of Sciences of the USA. 2015;112(12):3835–3840. doi: 10.1073/pnas.1421692112
  • 3. Cha M, Haddadi H, Benevenuto F, Gummadi KP. Measuring user influence in Twitter: The million follower fallacy. Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media. 2010. pp. 10–17.
  • 4. Jansen BJ, Zhang M, Sobel K, Chowdury A. Twitter power: Tweets as electronic word of mouth. Journal of the American Society for Information Science and Technology. 2009;60(11):2169–2188. doi: 10.1002/asi.21149
  • 5. Gonçalves B, Perra N. Social phenomena: From data analysis to models. Heidelberg, New York: Springer International Publishing; 2015.
  • 6. Cheng M, Jin X. What do Airbnb users care about? An analysis of online review comments. International Journal of Hospitality Management. 2019;76(A):58–70. doi: 10.1016/j.ijhm.2018.04.004
  • 7. Schafer JB, Konstan JA, Riedl J. E-commerce recommendation applications. Data Mining and Knowledge Discovery. 2001;5(1–2):115–153. doi: 10.1023/A:1009804230409
  • 8. O’Connor P. User-generated content and travel: A case study on Tripadvisor.com. In: Information and Communication Technologies in Tourism 2008. 2008. pp. 47–58.
  • 9. Fowler JH, Christakis NA. Cooperative behavior cascades in human social networks. Proceedings of the National Academy of Sciences of the USA. 2010;107(12):5334–5338. doi: 10.1073/pnas.0913149107
  • 10. Salminen J. Collective intelligence in humans: A literature review. arXiv:1204.3401. 2012.
  • 11. Bonabeau E. Decisions 2.0: The power of collective intelligence. MIT Sloan Management Review. 2009;50(2):45–52.
  • 12. Woolley AW, Aggarwal I, Malone TW. Collective intelligence and group performance. Current Directions in Psychological Science. 2015;24(6):420–424. doi: 10.1177/0963721415599543
  • 13. Kurvers RHJM, Herzog SM, Hertwig R, Krause J, Carney PA, Bogart A, et al. Boosting medical diagnostics by pooling independent judgments. Proceedings of the National Academy of Sciences of the USA. 2016;113(31):8777–8782. doi: 10.1073/pnas.1601827113
  • 14. Brewer MB. In-group bias in the minimal intergroup situation: A cognitive-motivational analysis. Psychological Bulletin. 1979;86(2):307–324. doi: 10.1037/0033-2909.86.2.307
  • 15. Garrett RK. Echo chambers online: Politically motivated selective exposure among Internet news users. Journal of Computer-Mediated Communication. 2009;14(2):265–285. doi: 10.1111/j.1083-6101.2009.01440.x
  • 16. Flaxman S, Goel S, Rao JM. Filter bubbles, echo chambers, and online news consumption. Public Opinion Quarterly. 2016;80(Special issue):298–320. doi: 10.1093/poq/nfw006
  • 17. Yaniv I. Receiving other people’s advice: Influence and benefit. Organizational Behavior and Human Decision Processes. 2004;93(1):1–13. doi: 10.1016/j.obhdp.2003.08.002
  • 18. Soll JB, Larrick RP. Strategies for revising judgment: How (and how well) people use others’ opinions. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2009;35(3):780–805.
  • 19. Lorenz J, Rauhut H, Schweitzer F, Helbing D. How social influence can undermine the wisdom of crowd effect. Proceedings of the National Academy of Sciences of the USA. 2011;108(22):9020–9025. doi: 10.1073/pnas.1008636108
  • 20. Jayles B, Kim H-r, Escobedo R, Cezera S, Blanchet A, Kameda T, et al. How social information can improve estimation accuracy in human groups. Proceedings of the National Academy of Sciences of the USA. 2017;114(47):12620–12625. doi: 10.1073/pnas.1703695114
  • 21. Jayles B, Escobedo R, Cezera S, Blanchet A, Kameda T, Sire C, et al. The impact of incorrect social information on collective wisdom in human groups. Journal of the Royal Society Interface. 2020;17(170):20200496. doi: 10.1098/rsif.2020.0496
  • 22. Jayles B, Sire C, Kurvers RHJM. Impact of sharing full versus averaged social information on social influence and estimation accuracy. Journal of the Royal Society Interface. 2021;18:20210231.
  • 23. Indow T, Ida M. Scaling of dot numerosity. Perception & Psychophysics. 1977;22(3):265–276. doi: 10.3758/BF03199689
  • 24. Krueger LE. Single judgments of numerosity. Perception & Psychophysics. 1982;31(2):175–182.
  • 25. Izard V, Dehaene S. Calibrating the mental number line. Cognition. 2008;106(3):1221–1247. doi: 10.1016/j.cognition.2007.06.004
  • 26. Crollen V, Castronovo J, Seron X. Under- and over-estimation: A bi-directional mapping process between symbolic and non-symbolic representations of number? Experimental Psychology. 2011;58(1):39–49. doi: 10.1027/1618-3169/a000064
  • 27. Kao AB, Berdahl AM, Hartnett AT, Lutz MJ, Bak-Coleman JB, Ioannou CC, et al. Counteracting estimation bias and social influence to improve the wisdom of crowds. Journal of the Royal Society Interface. 2018;15(141).
  • 28. Lichtenstein S, Slovic P, Fischhoff B, Layman M, Combs B. Judged frequency of lethal events. Journal of Experimental Psychology: Human Learning and Memory. 1978;4(6):551–578.
  • 29. Hertwig R, Pachur T, Kurzenhäuser S. Judgments of risk frequencies: Tests of possible cognitive mechanisms. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2005;31(4):621–642.
  • 30. Scheibehenne B. The psychophysics of number integration: Evidence from the lab and from the field. Decision. Advance online publication. 2018.
  • 31. Mavrodiev P, Tessone CJ, Schweitzer F. Quantifying the effects of social influence. Scientific Reports. 2013;3:1360. doi: 10.1038/srep01360
  • 32. Kerckhove CV, Martin S, Gend P, Rentfrow PJ, Hendrickx JM, Blondel VD. Modelling influence and opinion evolution in online collective behaviour. PLoS ONE. 2016;11(6):e0157685. doi: 10.1371/journal.pone.0157685
  • 33. Becker J, Brackbill D, Centola D. Network dynamics of social influence in the wisdom of crowds. Proceedings of the National Academy of Sciences of the USA. 2017;114(26):E5070–E5076. doi: 10.1073/pnas.1615978114
  • 34. Luo Y, Iyengar G, Venkatasubramanian V. Social influence makes self-interested crowds smarter: An optimal control perspective. IEEE Transactions on Computational Social Systems. 2018;5(1):200–209. doi: 10.1109/TCSS.2017.2780270
  • 35. Faria JJ, Dyer JR, Tosh CR, Krause J. Leadership and social information use in human crowds. Animal Behaviour. 2010;79(4).
  • 36. King AJ, Cheng L, Starke SD, Myatt JP. Is the true ‘wisdom of the crowd’ to copy successful individuals? Biology Letters. 2012;8(2):197–200. doi: 10.1098/rsbl.2011.0795
  • 37. Madirolas G, de Polavieja GG. Improving collective estimations using resistance to social influence. PLOS Computational Biology. 2015;11(11):e1004594. doi: 10.1371/journal.pcbi.1004594
  • 38. Moussaïd M, Kämmer JE, Analytis PP, Neth H. Social influence and the collective dynamics of opinion formation. PLoS ONE. 2013;8(11):e78433. doi: 10.1371/journal.pone.0078433
  • 39. Chacoma A, Zanette DH. Opinion formation by social influence: From experiments to modeling. PLoS ONE. 2015;10(10):e0140406. doi: 10.1371/journal.pone.0140406
  • 40. Yaniv I, Milyavsky M. Using advice from multiple sources to revise and improve judgments. Organizational Behavior and Human Decision Processes. 2007;103(1):104–120. doi: 10.1016/j.obhdp.2006.05.006
  • 41. Rand DG, Arbesman S, Christakis NA. Dynamic social networks promote cooperation in experiments with humans. Proceedings of the National Academy of Sciences of the USA. 2011;108(48):19193–19198. doi: 10.1073/pnas.1108243108
  • 42. Analytis PP, Barkoczi D, Herzog SM. Social learning strategies for matters of taste. Nature Human Behaviour. 2018;2:415–424. doi: 10.1038/s41562-018-0343-2
  • 43. Budescu DV, Rantilla AK. Confidence in aggregation of expert opinions. Acta Psychologica. 2000;104(3):371–398. doi: 10.1016/S0001-6918(00)00037-8
  • 44. Budescu DV, Rantilla AK, Yu H-T, Karelitz TM. The effects of asymmetry among advisors on the aggregation of their opinions. Organizational Behavior and Human Decision Processes. 2003;90(1):178–194. doi: 10.1016/S0749-5978(02)00516-2
  • 45. Budescu DV, Yu H-T. Aggregation of opinions based on correlated cues and advisors. Journal of Behavioral Decision Making. 2007;20(2):153–177. doi: 10.1002/bdm.547
  • 46. Dehaene S, Izard V, Spelke E, Pica P. Log or linear? Distinct intuitions of the number scale in Western and Amazonian indigene cultures. Science. 2008;320(5880):1217–1220. doi: 10.1126/science.1156540
  • 47. Galton F. Vox populi. Nature. 1907;75:450–451. doi: 10.1038/075450a0
  • 48. Surowiecki J. The wisdom of crowds. New York, NY: Anchor Books; 2005.
  • 49. Herzog SM, Litvinova A, Yahosseini KS, Tump AN, Kurvers RHJM. The ecological rationality of the wisdom of crowds. In: Hertwig R, Pleskac TJ, Pachur T, & The Center for Adaptive Rationality. Taming uncertainty. Cambridge, MA: MIT Press; 2019. pp. 245–262.
  • 50. Ioannou CC, Madirolas G, Brammer FS, Rapley HA, de Polavieja GG. Adolescents show collective intelligence which can be driven by a geometric mean rule of thumb. PLoS ONE. 2018;13:e0204462. doi: 10.1371/journal.pone.0204462
  • 51. Han Y, Budescu DV. A universal method for evaluating the quality of aggregators. Judgment and Decision Making. 2019;14(4):395–411.
  • 52. Lobo MS, Yao D. Human judgment is heavy tailed: Empirical evidence and implications for the aggregation of estimates and forecasts. INSEAD working paper series. 2010.
  • 53. Tversky A, Kahneman D. Judgment under uncertainty: Heuristics and biases. Science. 1974;185(4157):1124–1131. doi: 10.1126/science.185.4157.1124
  • 54. Simon HA. Models of bounded rationality. MIT Press; 1982.
  • 55. Gigerenzer G, Gaissmaier W. Heuristic decision making. Annual Review of Psychology. 2011;62:451–482. doi: 10.1146/annurev-psych-120709-145346
  • 56. Harries C, Yaniv I, Harvey N. Combining advice: The weight of a dissenting opinion in the consensus. Journal of Behavioral Decision Making. 2004;17:333–348. doi: 10.1002/bdm.474
  • 57. Molleman L, Tump AN, Gradassi A, Herzog S, Jayles B, Kurvers RHJM, et al. Strategies for integrating disparate social information. Proceedings of the Royal Society B. 2020;287:20202413. doi: 10.1098/rspb.2020.2413
  • 58. Schultze T, Rakotoarisoa AF, Schulz-Hardt S. Effects of distance between initial estimates and advice on advice utilization. Judgment and Decision Making. 2015;10(2):144–171.
  • 59. Kurvers RHJM, Herzog SM, Hertwig R, Krause J, Moussaïd M, Argenziano G, et al. How to detect high-performing individuals and groups: Decision similarity predicts accuracy. Science Advances. 2019;5(11):eaaw9011.
  • 60. Lim J, Lee S-H. Utility and use of accuracy cues in social learning of crowd preferences. PLoS ONE. 2020;15(10):e0240997. doi: 10.1371/journal.pone.0240997
  • 61. Yaniv I, Kleinberger E. Advice taking in decision making: Egocentric discounting and reputation formation. Organizational Behavior and Human Decision Processes. 2000;83(2):260–281. doi: 10.1006/obhd.2000.2909
  • 62. Tump AN, Wolf M, Krause J, Kurvers RHJM. Individuals fail to reap the collective benefits of diversity because of over-reliance on personal information. Journal of the Royal Society Interface. 2018;15:20180155. doi: 10.1098/rsif.2018.0155
  • 63. Davis-Stober CP, Budescu DV, Dana J, Broomell SB. When is a crowd wise? Decision. 2014;1(2):79–101. doi: 10.1037/dec0000004
  • 64. Thaler R, Sunstein C. Nudge: Improving decisions about health, wealth, and happiness. Yale University Press; 2008.
  • 65. Banerjee AV. A simple model of herd behavior. The Quarterly Journal of Economics. 1992;107(3):797–817. doi: 10.2307/2118364
  • 66. Lorenz J. Continuous opinion dynamics under bounded confidence: A survey. International Journal of Modern Physics C. 2007;18(12):1819–1838. doi: 10.1142/S0129183107011789
  • 67. Sunstein CR. #Republic: Divided democracy in the age of social media. Princeton, NJ: Princeton University Press; 2018.
  • 68. Nickerson RS. Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology. 1998;2(2):175–220. doi: 10.1037/1089-2680.2.2.175
  • 69. Dunning D. Self-insight: Roadblocks and detours on the path to knowing thyself. Psychology Press; 2012.
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009590.r001

Decision Letter 0

Natalia L Komarova, Theodore Paul Pavlic

1 May 2020

Dear Dr. Jayles,

Thank you for your submission, "Debiasing the crowd: selectively exchanging social information improves collective decision-making," for consideration at PLOS Computational Biology. The manuscript provides a thought-provoking perspective on mechanisms of information sharing that improve outcomes in collective decision-making systems, particularly for the case of aggregating judgment among a population of individuals prone to estimation bias. It occurred to me that the methodologies and target application used here could potentially resonate with a broad audience cutting across multiple disciplines, which would make PLOS Computational Biology an excellent venue. Consequently, I chose to sample the judgment of a diverse group of reviewers -- with no two reviewers from the same community of literature -- that would be representative of what I think would be the target audience for an eventually published paper. We were fortunate to have so many of this selection of reviewers agree to review the paper, which should be a good sign to the authors that the subject is of broad interest.

Although the reviewers were diverse and independent, their recommendations were consistent with each other (and relatively easy to aggregate) and also in agreement with my personal feeling toward this manuscript. In its current form, I cannot recommend this manuscript for acceptance in PLOS Computational Biology. The collection of reviewers has outlined a wide range of items that need to be clarified, literature that should be referenced (and terms that should be reconsidered), and important questions to be answered. I do not feel like the critiques from the reviewers pose insurmountable challenges, and so I would like to invite you to submit a major revision of your manuscript that addresses the feedback of the reviewers. I want to emphasize (and caution you) that this major revision is not a formality; a major revision will again be reviewed based on its merits, and the ultimate decision for that revision is not clear at this stage.

It is my feeling that most of the necessary revisions can be made by re-structuring the narrative around these empirical results and better contextualizing these results within the background of the literature highlighted by the five reviewers. In this exercise, I hope that you will also take an opportunity to evaluate whether there are additional links that may have been missed beyond the several examples brought up by these reviewers. For example, your manuscript's results about the effect of group size might also be compared and contrasted with Condorcet's jury theorem or other work describing how many voters are necessary to come to a correct or accurate result. As covered by the reviewers, you should also be careful about using loaded language ("herd") and oversimplifications (like saying that no prior information is necessary for the correction parameter even though the correction parameter was estimated from prior information (albeit from a different group)).

Overall, I do not feel like you should be discouraged by this decision. Clearly, there is broad interest in your submitted results, and we are all looking forward to your major revision to see how you have addressed the significant concerns brought up by a clearly very interested group of referees.

Best wishes to you --

Theodore (Ted) P. Pavlic

Guest Editor, PLOS Computational Biology

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Theodore Paul Pavlic

Guest Editor

PLOS Computational Biology

Natalia Komarova

Deputy Editor

PLOS Computational Biology

***********************


Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Jayles and Kurvers investigate how information exchange between group members affects the accuracy of both individual and collective estimations. In particular, they investigate the effect of provision of varying quantities of others’ estimates, and propose a new framework for information exchange based on the shifted-median of previous estimates that improves collective and individual accuracy compared to simply providing estimates from other individuals at random. An agent-based model is used to explore the mechanisms behind this improvement and the dynamics of individual estimate changes.

The central finding in this manuscript is that by ‘leveraging prior knowledge’ about a common underestimation bias, information exchange can be structured to improve collective and individual accuracy. Essentially what this means is that, since we know that people will typically underestimate quantities by a certain proportion, and that they will move towards other estimations they are given, their accuracy can be improved by providing an estimate that is (statistically speaking) likely to be a slight overestimation of the truth.

The study appears well-conducted, and the combination of empirical and modelling work is well constructed. However, as it stands, I do not find the central finding sufficiently significant for publication in PLoS Computational Biology. I would justify this as follows:

1. Although the authors contend that their shifted-median method does not rely on recourse to the truth, this is only true in the sense that the answer to one specific question is unknown. It relies on strong statistical regularities in the relation between individual estimates and the truth. This is what is meant by ‘leveraging prior knowledge about this bias’ in the abstract.

2. Where these regularities apply, it would be more straightforward to simply adjust all individual estimates or the collective estimate (however obtained) directly, rather than by the contrived mechanism of providing individuals with estimates from specially chosen other individuals. There is no sense here that the selective exchange of social information could be generated endogenously from within the group, but instead it is imposed by an external agent. This same external agent could instead manipulate either individual or collective estimates directly.

3. Where these regularities do not apply, there is no reason to think that this method would give improved estimations (and could even make them worse). The authors’ own introduction reveals that human estimation and decision making is prone to many contradictory biases (e.g. pessimism and optimism, L43-44). It is unlikely that one could reliably know in advance whether the specific context lends itself to underestimation (though if one could, see point 2 above).

To make a stronger case for the relevance of these results, this manuscript therefore needs a convincing motivation for:

1. The underestimation bias being widespread, important and reliably present, or identifiable in advance (so that the method works and it is known that it will work)

2. Why it is important to affect individuals’ estimations via the provision of selectively chosen estimations from others, rather than either directly manipulating the original estimates or simply providing alternative information to estimators (e.g. “experience suggests you are likely to be underestimating, consider raising your estimate”).

I would suggest that the authors carefully consider whether this crucial point can be sufficiently motivated before revising the manuscript.

Minor points:

1. It would be interesting to compare the accuracy of the mean as both an individual and collective measure as well as the median. The authors contend that the median of group estimates is more reliable (L132): it may certainly be more robust/less variable for small samples, but it would be useful to see if the stronger effect of the rare large estimates on the mean would counteract the underestimation bias. It would also be good to see if the method used here works for estimation of quantities that are not strictly positive.

2. The modelling work in the manuscript (largely explained in SI) is detailed and shows a good progression of models to explain features of the empirical data. A further suggestion would be to consider how each effect used in the agent-based models can be justified in terms of rational or adaptive behaviour. At the very least, there is an established statistical literature on information integration that could inform the dependence on dispersion of social information.

3. Participants were motivated to be accurate by relatively small financial reward differences based on categories of accuracy. It would be interesting to consider and discuss whether the specific structure of this reward influences the types of estimate received. For example, what if the occasional very large error had a greater or lesser effect on the final reward? Or rewards based on accuracy relative to other participants rather than solely individual accuracy?

4. L44: How can biases be individually rational? I think this requires an example considering that it is unintuitive

5. I felt the manuscript could be clearer in places by using more straightforward English. For example, replacing ‘leveraging’ with ‘using’ and ‘potentialities’ with ‘potential’. This is not to criticise the general standard of English, which is otherwise high.

Reviewer #2: I will state first that I am not in the area of social estimation, and my expertise is more in sequential design of experiments and Bayesian Optimization, which are techniques that could have an important role in the work presented by the authors.

The paper looks into the problem of underestimation bias in a setting in which a group of people is asked to provide an estimate and is subsequently provided with several estimations from the remainder of the group (a subset). The authors use three mechanisms to select how to exchange information: a random mechanism, a median-driven mechanism, and a shifted-median exchange, which takes the underestimation bias into account and compensates for it using a factor γ. The authors proceed to optimize this factor and identify three modeling factors to be considered to study the impact of the amount of exchanged information on individual and collective accuracy. These aspects are herding, asymmetry, and similarity.

The paper was an incredibly interesting read. I believe the authors make a good case for the contribution of their work compared with the available literature, namely the herding, asymmetry, and similarity effects. I can appreciate the modeling value in this. However, there are some aspects which I believe the authors should clarify to make the paper’s contribution clearer and, possibly, stronger.

**What is the effect of the specific population being chosen? Are there aspects of the model that can indicate different outcomes in terms of accuracy, based on the population? Can we have populations that are over- and underestimating?

**It is unclear whether the theory being developed in this paper may apply to non-quantitative cases. The authors refer in multiple places to recommendation systems, but more details should be provided on how the very idea of an underestimation bias, which the authors use to define the coefficient γ, can be extended to cases where no quantitative measures are available.

Detailed comments are attached.

Reviewer #3: This is an interesting paper reporting a model and an empirical study of collective judgment. The study seeks to understand the effects of information sharing in groups, in particular the effects of the amount of information shared and of the process selecting which pieces of information are shared. A particular point of interest from the authors’ perspective is how well a particular process (which they label “shifted median”), designed to counter natural individual judgment “biases”, can improve the quality of the judgments.

I like the topic and the approach. I think the experiment is well designed and, for the most part, it is well analyzed and clearly reported, but, in my view, this version is not ready for publication. Many of my reservations are related to the writing and presentation style, which is imprecise, involves some overgeneralizations, and is, occasionally, sloppy. I will list many of these instances in the order I spotted them in the manuscript, and not necessarily in terms of their importance or severity.

• There is a basic distinction in the literature between judgments and decisions. Decisions involve choices or valuations (usually of competing options) and involve consequences of these actions (often, but not always, monetary). For example – should I invest in A or B? Should I take medication X or Y? How much should I pay for this car, apartment, dress, etc.? Judgments are, as the name indicates subjective estimates of quantities, frequencies, probabilities, etc. that carry no such consequences. This paper is all about judgments and judgment biases, but the authors often refer incorrectly to decisions. This should be corrected throughout the ms.

• The authors refer to a lot of biases without ever properly defining what they mean. In some sense, without proper contextualization, any empirical regularity can be labeled a bias. In the classical work by Tversky and Kahneman biases are defined with reference to a normative model (probability theory) that dictates how judges should act in various circumstances (and even this approach is subject to criticism as in Costello and Watts, 2014), but in many of the cases listed in the paper, I am unsure why things are labeled biases. Many of them (e.g., Optimism, Pessimism) can be explained by other simpler accounts that are totally “unbiased”. This needs to be clearly explained.

• Is there a human tendency to underestimate quantities? I don’t think so! I think that it is fair to say that human judgment is regressive and people tend to over (under) estimate low (high) quantities (see, for example, reference [29] in the paper). This paper focuses primarily on “large” quantities (see the list in appendix), but fails to state this explicitly (exception line 129) and systematically, and creates the false impression that this is a more general pattern. This needs to be corrected throughout.

• There are multiple references to “human tendencies” (see for example line 22 – 23 in the Abstract). I think every one of these (over?) generalizations should be accompanied by some references to back it up.

• I am puzzled by the use of the term “exchange of information”. Every definition I am aware of stresses the bi-directionality of any exchange, but in this context people are only receiving information from others and they can revise / adjust / refine their judgements in light of this new information, but they don’t offer anything in return, so there is no “exchange”. It is true that every subject’s judgments are presented to the others in the group, but it is not clear to me that they know this and that there is any reciprocal thinking involved here. So, I would replace the term exchange with one that describes the setup more accurately.

• I think that the three presentation formats are not presented clearly enough, and I think it is worth explaining, what took me a while to recognize, namely that the Median differs from Random simply because it eliminates extreme values (essentially, trimming) and presents only the X/11 (X = 1,3,..11) central values of the distribution, and that Shifted Median presents the same values, but after a systematic shift.

• The statement on lines 96 – 97 is mathematically wrong (or, maybe not clearly stated): It is easy to show many cases where the expected Random choice is closer to the truth than the Median. Example: Truth = 10; Assume a group of 6 people such that the 5 potential estimates are 1, 2, 3, 4 and 9 (distances from truth = 9, 8, 7, 6 and 1, respectively). If we choose k = 3, the median selects the estimates 2, 3 and 4 with a mean (and median) distance of 7 from Truth. But, if you consider all 10 (equally likely) different ways to choose 3 of the 5, you get Mean(10 Medians) = 7 and Mean(10 Means) = 6.2! And, if you choose k = 1, the median selects 3 with a mean (and median) distance of 7 from Truth. Under a random choice the mean distance to the truth is, again, 6.2! Please clarify / correct.

• In forecasting there is a small literature on re-calibrating probabilities in aggregation (see papers by Baron et al and Turner et al). The shifted median is another, simpler, instance of re-calibration with a twist. In forecasting the transformation is done mechanically and externally after the estimation process. Here the judges are exposed to the re-calibrated judgments of their partners. This brings up an intriguing question. If one was to take the estimates from the Median condition and apply the same shifting transformation, how would these recalibrated aggregates compare to those obtained in the Shifted Median condition? Clearly it is easier to recalibrate things statistically / mechanically, but is it also better?

• I did not fully understand the difference between the collective and individual accuracy measures and I was frustrated by the insufficient and inadequate discussion. I assume that the authors calculate an individual measure for each of the 216 people based on their 36 judgments and that they calculate the collective accuracy for each of the 18 groups based on the group’s (12 members × 36 items =) 432 judgments. That is the way I would have done this, but I am not sure this is what was done. I would like to see a better and clearer description.

• Related: On page 6 in the results section, you write “individual accuracy measures how close individual deviations from the truth are to 0 on average,” while technically your individual accuracy measure is an individual’s median accuracy across questions (correct?), not the mean.

• The items (listed in Appendix) are clearly of two types: The majority is based on general knowledge of actual facts, such as “What is the population of X?”, and a minority asks for a perceptual impression / estimation (e.g., “How many marbles are in the jar?”). It would be nice to present some evidence that the degree of underestimation is similar in the two classes (for example, use different colors for the two in Figure 1) and that the proposed method works equally well for both.

• Line 166: when k = (n-1) = 11 one expects identical responses under the various conditions only in the absence of overweighting of one’s own original judgment (egocentric weighting), which is often seen in the literature (e.g. Yaniv & Kleinberger, 2000) and clearly in the present data (note that most values of S < 0.5).

• To make sense of the asymmetry effect and Figure 3, we need to know the distribution of cases where the social information is lower than, approximately equal to, or higher than one’s own estimate.

• Related: Could a possible explanation for the asymmetric effect be that people, in fact, have some intuition that they tend to underestimate large quantities? Seeing others provide larger estimates than their initial belief may be a sort of cognitive permission to be more liberal with their beliefs about large quantities. While seeing smaller estimates may be seen as typical and expected.

Thinking about the results at a more general level:

Consider the improvements to collective accuracy as predicted by the model in the dashed lines of Figure 2a. Why does the model predict that the random selection method will improve collective accuracy more than the median selection method? This is especially puzzling, since this does not seem to reflect the actual differences between these groups, where it appears the random selection method was more linearly related to the number of estimates exchanged; while the median selection method was flatter across the number of estimates exchanged.

A possible intuition for this result might have to do with how the distance effect parameter was treated. It appears that the relationship between distance and belief updates was treated as linear and increasing, but this is not necessarily the universally observed expectation based on the social persuasion literature. For example, Whittaker (1963) found a curvilinear relationship between distance and belief updates. Other studies have found similar results (Fink, Kaplowitz, & Bauer, 1983; Laroche, 1977; Yaniv & Milyavsky, 2007) and Allahverdyan & Galstyan (2014) proposed a formal model incorporating this effect.

Looking at figure S6d, it does look like a curvilinear parameter might fit the data better than the linear one proposed. When combined with your model’s asymmetry effect, it seems possible that providing random estimates could lead to this somewhat odd result in Figure 2a. Random estimates are more likely to be extreme than median estimates (which are by definition the least extreme available). However, the asymmetry effect parameter means extreme estimates that are lower than the judges’ initial estimates get discounted, while ones that are higher do not. This could induce a somewhat artificial correction for the underestimation bias that isn’t present in the observed data. Treating the distance parameter as non-linear could potentially correct this and may be worth trying.

A somewhat related question is whether there were any effects of bracketing (see Herzog & Hertwig, 2009; Larrick & Soll, 2006; Soll & Mannes, 2011). The authors provide a formalization for how individuals incorporate the social information based on its geometric mean and standard deviation, but do not discuss whether people treat this information differently in cases where the estimates bracket their beliefs (whether their beliefs are within the bounds of the different estimates they receive) or do not. Normatively, in cases where people believe the estimates they receive bracket the truth, they should be more inclined to average and weight that advice fairly heavily; while in cases where the estimates they receive do not bracket the truth they should not (though this normative principle is not always observed in behavior). One could argue that estimates which bracket a judge’s initial estimate could be considered a bracket around their a priori belief about the truth, which would make such brackets qualitatively different from ones that do not. This could also have implications for comparing the random and median conditions. Especially when the number of estimates received is small, the diversity of random estimates may be more likely to bracket a judge’s initial beliefs; while the more homogeneous median estimates may be less likely to.

Thank you for the opportunity to review this thought-provoking paper.

David Budescu

References

Allahverdyan, A. E., & Galstyan, A. (2014). Opinion dynamics with confirmation bias. PloS One, 9(7).

Baron, J., Mellers, B. A., Tetlock, P. E., Stone, E., & Ungar, L. H. (2014). Two reasons to make aggregated probability forecasts more extreme. Decision Analysis, 11(2), 133–145.

Costello, F., & Watts, P. (2014). Surprisingly rational: Probability theory plus noise explains biases in judgment. Psychological Review, 121(3), 463–480.

Fink, E. L., Kaplowitz, S. A., & Bauer, C. L. (1983). Positional discrepancy, psychological discrepancy, and attitude change: Experimental tests of some mathematical models. Communications Monographs, 50(4), 413–430.

Herzog, S. M., & Hertwig, R. (2009). The wisdom of many in one mind: Improving individual judgments with dialectical bootstrapping. Psychological Science, 20(2), 231–237.

Laroche, M. (1977). A model of attitude change in groups following a persuasive communication: An attempt at formalizing research findings. Behavioral Science, 22(4), 246–257.

Larrick, R. P., & Soll, J. B. (2006). Intuitions about combining opinions: Misappreciation of the averaging principle. Management Science, 52(1), 111–127.

Soll, J. B., & Mannes, A. E. (2011). Judgmental aggregation strategies depend on whether the self is involved. International Journal of Forecasting, 27(1), 81–102.

Turner, B. M., Steyvers, M., Merkle, E. C., Budescu, D. V., & Wallsten, T. S. (2014). Forecast aggregation via recalibration. Machine Learning, 95, 261–289.

Whittaker, J. O. (1963). Opinion change as a function of communication-attitude discrepancy. Psychological Reports, 13(3), 763–772.

Yaniv, I., & Milyavsky, M. (2007). Using advice from multiple sources to revise and improve judgments. Organizational Behavior and Human Decision Processes, 103(1), 104–120.

Yaniv, I., & Kleinberg, E. (2000). Advice taking in decision making: Egocentric discounting and reputation formation. Organizational Behavior and Human Decision Processes, 83(2), 260–281.

Reviewer #4: See attached.

Reviewer #5: In this manuscript, the authors extend previous work, by themselves as well as others in the field, to examine how social influence can affect individual and collective accuracy in estimation tasks. Specifically, here they examine how selecting how many, and which, estimates to give to participants can improve decision accuracy by counteracting known estimation biases. Among other findings, they show that providing participants with estimates close to a modified median can substantially improve collective wisdom. They extend a mechanistic model of social influence incorporating the new phenomena identified in this study and find that the model can reproduce, to a large extent, their empirical findings. Furthermore, they use their model to identify an optimal strategy to maximize collective wisdom.

This work is a natural extension of research that has appeared in the past couple of years and makes some important findings by using a clever experimental design that clarifies some outstanding questions related to social influence and the wisdom of crowds. As such, I think that this is an important work that is highly suitable for PLoS Comp Biol. Also, I think that this manuscript is well written and the methods and analyses are sound -- as such, I would recommend publication of this manuscript almost as is. I only have a few minor comments that I hope will improve the clarity of the manuscript:

1. I think the paper Becker et al. (2017) Network dynamics of social influence in the wisdom of crowds. PNAS should probably be cited somewhere, since it speaks to a lot of the same issues as the present manuscript (how who-influences-whom can affect the wisdom of crowds).

2. "our method does not require the a priori knowledge of the truth" (lines 143-144). While I know what the authors mean by this, one could in theory disagree with this statement because the parameterization of their model (i.e., that gamma = 0.9) requires knowledge of some previous truths. However, their method does not require knowledge of the truth for the present estimation task. This could be clarified.

3. line 192: I would argue that the asymmetry effect was described to some extent in reference 44 of this manuscript: Kao et al. (2018) Counteracting estimation bias and social influence to improve the wisdom of crowds. J. R. Soc. Interface. (Disclaimer: since I'm one of the authors of that manuscript, I've signed this review in the pursuit of transparency.) In Figures 5a and 5c of that paper, we described a similar effect, where estimates larger than the focal individual's were weighted more heavily than estimates that were smaller. However, the empirical trend in the present manuscript is somewhat different, with a stronger effect size. In any case, it may be useful to point this out somewhere, to show that this asymmetry effect may be robust and widespread (although I refrain from pushing this point too strongly, since it is a paper that I'm a co-author on).

Sincerely,

Albert Kao

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: The current data availability statement is vague. The authors should specify where/how data will be made available on acceptance, and ideally provide data to reviewers for review.

Reviewer #2: Yes

Reviewer #3: None

Reviewer #4: Yes

Reviewer #5: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

Reviewer #5: Yes: Albert B. Kao

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

Attachment

Submitted filename: Review_PCOMPBIOL_D2000065.pdf

Attachment

Submitted filename: PCOMPBIOL-D-20-00065_Comments_to_Authors.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009590.r003

Decision Letter 1

Natalia L Komarova, Theodore Paul Pavlic

16 Sep 2020

Dear Dr. Jayles,

Thank you very much for submitting your manuscript "Debiasing the crowd: how to select social information for improving collective judgments?" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly revised version that takes into account the reviewers' comments. Please address the reviewers' comments in your revisions.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Theodore Paul Pavlic

Guest Editor

PLOS Computational Biology

Jason Papin

Editor-in-Chief

PLOS Computational Biology

Natalia Komarova

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I am grateful to the authors for providing a detailed response to my previous review, and note that they have made substantial revisions to address reviewer comments. I would also note that having read the other reviews, I find that I am somewhat out of step with the other reviewers in my assessment of the interest/importance of the manuscript. I have no technical objections to the work in the manuscript, but below I would like to justify why I continue to believe the results are below the importance threshold for PLoS CB.

In their response to reviews the authors have repositioned their study as a test of how individual estimates of unknown quantities can be improved in social contexts. They justifiably note that they did not seek to find out how to best combine independent estimates to obtain collective accuracy. However, the text of the manuscript does not accord with the purported focus on improving individual judgements. The title of the manuscript presents improving collective judgements as the primary goal, a pattern that is repeated through the abstract (L8-9, L19, L26) and introduction (L55-56, L57-58, L69). In the results, on the occasions when individual and collective improvements are at odds, collective improvement is favoured (albeit at only minor cost to individual improvement, L272-273). While the innovative experimental treatment (shifted-median) offers clear collective improvements, the benefit at the individual level is far from clear when compared to more standard information exchange (either random or median; Fig 2). Only in the discussion is the revised goal of individual improvement placed at the fore (L361-362), but this is a departure from the large majority of the material that comes before.

As a result of the above, I don’t think the study can be evaluated without putting the collective efficacy of the methodology at the forefront; collective improvements are not presented as a happy consequence of individual improvement, but stand out as the most noted and noteworthy results. In this light, I consider that my previous criticism of the manuscript broadly remains: as long as the sharing of carefully chosen social information has to be coordinated by a central agent that knows the full distribution of initial estimates, it does not offer an efficient alternative method for ‘improving collective judgements’ as offered by the title of the manuscript. Since it is well established that individuals will adjust their estimates towards those they see from others (on average), it is unsurprising that carefully choosing what they see so as to make it (statistically) more accurate will produce more accurate estimates overall. If this is coordinated centrally, it is much the same *as if* individual estimates had been combined in an optimal way. If a mechanism could be designed to make this biased information sharing operate endogenously (thus removing the central organising agent), then this criticism would fall away, but I cannot suggest any way to achieve this, especially as the central coordinator must also decide in advance if this estimation problem is one that belongs to the domain of typical under- or over-estimation.

Reviewer #2: I am happy with the responses and revisions of the authors and I think the paper is publishable in its current form.

Reviewer #3: This is an interesting paper reporting a model and an empirical study of collective judgment. The study seeks to understand the effects of information sharing in groups, and in particular the effects of the amount of information shared and of the selection process for which pieces of information are shared. A particular point of interest from the authors’ perspective is how well a particular process (“shifted median”), designed to counter natural individual judgment biases, can improve the quality of the judgments. I like the topic and the approach. The experiment is well designed and, for the most part, it is well analyzed and clearly reported.

The authors addressed seriously most of my reservations and this version is much better. I have a few outstanding issues/questions:

1. In clarifying the underestimation effect, the authors state “The underestimation bias is a widely documented human tendency to underestimate large quantities (typically larger than 100) in estimation tasks”. Their explanation and qualification with regard to the underestimation effect is improved. However, this example of “typically larger than 100” seems a bit odd. Isn’t there some degree of scale dependence? Would one expect the bias when describing something as a matter of 120 seconds, but not of 2 minutes? The authors proceed to provide domain examples where the underestimation effect could be expected. It strikes me that it might be preferable to expand on one of these examples a bit rather than use a sort of arbitrary, scale-free pseudo-criterion like 100. What is it about population estimates that makes them susceptible to the bias? Can you perhaps describe, or provide an example of, the distribution and its key properties? It also seems like this would serve as good motivation for the log transformations performed (though this is more clearly explicated in the revised version, which I appreciate).

2. The definitions of collective and individual accuracy are much improved. One further suggestion is that they could perhaps use slightly more qualitative motivation, more like what was provided in their response letter: “individual accuracy is a measure of the distance of an individual’s estimates from the truth, and collective accuracy is a measure of how far the central tendency of the estimates of the group is from the truth.” I think this sentence is helpful in what is a critical and potentially confusing distinction, and it is worth including something to this effect in the manuscript.

3. I am puzzled by the prediction regarding the random condition. The idea that people would be insensitive to the number of pieces of advice presented to them is counterintuitive (after all, everyone can do exactly what the authors are doing in the median condition, namely reject/ignore extreme values), and is inconsistent with empirical evidence about the way people aggregate information from multiple sources (e.g., Budescu & Rantilla, 2000; Budescu, Rantilla, Yu & Karelitz, 2003; Budescu & Yu, 2007).

4. Line 296: The tradeoff between bias and other factors in WoC is analyzed in Davis-Stober, Budescu, Dana & Broomell (2014).

5. The authors seek to identify the “optimal” adjustment and refer to how close the shift is to the “true” value. I don’t know how seriously to take the notion that there is a “true” value. The target they use is based on the degree of underestimation observed in a study using a particular set of items/questions, and I am not sure that it would replicate with different items (imagine asking people to estimate distances to various planets). I think some measure of caution and qualification is needed.

6. The one point I remain somewhat unpersuaded about is the explanation regarding the empirical results in Figure 2a. Here is the question and response from the response letter:

16. This is especially puzzling, since this does not seem to reflect the actual differences between these groups, where it appears the random selection method was more linearly related to the number of estimates exchanged; while the median selection method was flatter across the number of estimates exchanged.

We agree that Fig. 2a gives the visual impression that collective improvement in the random treatment increases more linearly than in the median treatment, mostly because the blue point at tau = 1 is way higher than predicted by the model (and expected by us). Apart from this single point, both treatments increase, as predicted, linearly and it is likely that this single point deviated too much from its expected value due to noise, as often happens with limited samples. Though we tested 216 participants over 36 questions, our sample size per unique treatment combination is still limited. So, we believe we need to treat each single point with caution, but can have much more confidence in the general patterns.

While the non-linearity intuition may not have borne out, it still seems like this empirical pattern is more than just a single data point. While τ = 1 may be the most extreme deviation from the model prediction, it looks like the error bars for almost every point in the median condition almost entirely overlap. I’m not sure writing it off as sample size noise in a single treatment condition is fully justified. It is a possible explanation, but beyond the (tested) suggestion about non-linearity, isn’t a more straightforward alternative simply that in the median condition most of the benefit of social information comes from that first piece of information? The Wisdom-of-Crowds theory tells us that the median response has the expectation of being the most accurate single piece of social information available. Each subsequent piece of information (by definition not the median) is expected to be less accurate, so while the similarity and herding effects may tell us that more advice may increase the magnitude of the belief update, the actual expected value of the advice cannot improve. There is a necessary tradeoff between the impact and accuracy of extra information in this condition. In other words, there is possible theoretical reason to expect that, in the median condition, increasing τ should have less added benefit.

The authors’ suggestion of caution is warranted. However, the full extent of their treatment of this issue is to parenthetically note “(though it is unexpectedly high for τ = 1).” I’m not sure this is a sufficient treatment of this result, given how easy it is to come up with intuitive explanations. At the very least, further remarks in the context of replication seem warranted, and it may be possible to test this possible explanation on your data as well, with minor modifications to your model.
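To illustrate this tradeoff, a quick simulation (with purely illustrative numbers, not the authors' data) shows that estimates ranked by closeness to the group median are, in expectation, progressively less accurate:

```python
import numpy as np

rng = np.random.default_rng(0)
n_group, n_trials = 21, 10_000
truth = 0.0                                 # work on the log scale, truth at 0
err_by_rank = np.zeros(n_group)

for _ in range(n_trials):
    est = rng.normal(-0.3, 1.0, n_group)    # illustrative underestimation bias
    rank = np.argsort(np.abs(est - np.median(est)))  # closest-to-median first
    err_by_rank += np.abs(est[rank] - truth)

# Mean absolute error of the 1st, 2nd, ... estimate shared under a
# median-like selection: the marginal piece of advice gets less accurate
print(np.round(err_by_rank / n_trials, 3))
```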

Picky

• Footnote 2: There is no MLE for the “center” of a distribution; there are MLEs for well-defined statistical parameters (mean, median, mode, etc.)

• Line 197: Add (0 ≤ S ≤ 1).

Thank you for the opportunity to review this thought-provoking paper.

David Budescu

References

Budescu, D. V., & Rantilla, A. K. (2000). Confidence in aggregation of expert opinions. Acta Psychologica, 104, 371–398.

Budescu, D. V., Rantilla, A. K., Yu, H., & Karelitz, T. M. (2003). The effects of asymmetry among advisors on the aggregation of their opinions. Organizational Behavior and Human Decision Processes, 90, 178–194.

Budescu, D. V., & Yu, H.-Y. (2007). Aggregation of opinions based on correlated cues and advisors. Journal of Behavioral Decision Making, 20, 153–177.

Davis-Stober, C. P., Budescu, D. V., Dana, J., & Broomell, S. B. (2014). When is a crowd wise? Decision, 1, 79–101.

Reviewer #4: Please see attachment.

Reviewer #5: I appreciate the authors' efforts to respond to the comments and criticisms of _five_ reviewers. I had only minor comments in the last round (reviewer 5), and unsurprisingly the authors have sufficiently addressed them in the current revision. However, I did take the time to read through all of the other reviewers' comments, and the authors' responses to them. I have some comments on the authors' responses to these comments, as well as a couple of new comments on the current version.

1. Some of the other reviewers (reviewer 1, comment 2; reviewer 2, comment 3; reviewer 3, comment 8) have mentioned the broader context of interventions that researchers could use to improve collective accuracy. To all of these comments, the authors responded that they seek to improve individual accuracy, rather than collective accuracy, thereby differentiating their paper from others. However, I disagree with this characterization of their paper. Collective accuracy, in addition to individual accuracy, comprises a major part of their results (e.g., Figure 2a, Figure 5). Furthermore, collective accuracy is mentioned multiple times, for example in their abstract and introduction (e.g., "cognitive biases... can impair the quality of collective judgments and decisions", "our restructuring of social interactions... substantially boosted collective accuracy", "biases at the individual level can have negative consequences at the collective level").

Therefore, dismissing these reviewers' comments as irrelevant seems unjustified. I think that the authors could go in two directions:

(a) Substantially edit their text and results to really focus on improving individual accuracy, as they claim to solely do.

(b) Address some of the comments made by the reviewers, specifically, how their methodology compares with the universe of alternative methods to improve collective accuracy.

To add to this, if the authors are OK with manipulating what social information individuals have access to, then why not just generate fake social information, if the goal is simply to maximize individual/collective accuracy? If the authors now know the rules that individuals on average follow, then they could construct a set of completely fake social information that should push the individual exactly towards the correct answer (on average). Selecting only from the set of real social information seems to be a limitation, especially when group size is small. Moving towards fake social information would provide much more flexibility on the part of researchers. If this is true, then perhaps the "optimising collective and individual improvements" section could be modified with this more general intervention.

Or perhaps there is an ethical reason to not completely make up social information? If so, it's not clear to me why carefully selecting what social information an individual sees is so different from just making up the information. Perhaps there could be some discussion about the ethical considerations in these kinds of interventions.

2. Are the statements made in lines 180-186 backed up by some kind of statistical or quantitative analysis, or just made by "visual inspection"? In particular, the statement that "individual improvement also increases with tau in the Random treatment" seems dubious, as is, to a lesser extent, the statement that "individual improvement is generally higher in the Median and Shifted-median treatments than in the Random treatment."

3. The slopes of the linear regression lines in Figure 1 could be printed in the figure panels themselves, which I think would improve the clarity of this figure.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: None

Reviewer #3: None

Reviewer #4: Yes

Reviewer #5: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Giulia Pedrielli

Reviewer #3: No

Reviewer #4: No

Reviewer #5: Yes: Albert B. Kao

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

Attachment

Submitted filename: PCOMPBIOL-D-20-00065_R1_Comments_to_Authors.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009590.r005

Decision Letter 2

Natalia L Komarova, Theodore Paul Pavlic

22 Jul 2021

Dear Dr. Jayles,

Thank you very much for submitting your manuscript "Debiasing the crowd: how to select social information to improve judgment accuracy?" for consideration at PLOS Computational Biology. I apologize for the delay in the processing of this manuscript. You will find the referee comments below, along with the analysis of the Guest Editor. It is especially important that you address the comments relating to the statistical procedures implemented in your manuscript, as indicated by two of the referees and the Guest Editor.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Natalia L. Komarova

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Comments by the Guest Editor:

1. The contribution of this paper relative to the Interface article from Jayles et al. (2020) [21] and the preprint from Jayles et al. (2020) [48]. As discussed by Reviewer 4, both of these papers have significant overlap with the methods and aims of this paper. Furthermore, despite the overlap in authorship among the papers, they are referenced in the current manuscript as if they were independent support for the foundational arguments used in this manuscript. Instead, it appears that these papers were work done in parallel (potentially as part of one larger project), and this needs to be made more explicit. See detailed comments by Reviewer 4.

2. Several reviewers (explicit statements by Reviewers 4 and 5) have pointed out that the empirical arguments made by the authors lack statistical rigor. Some examples:

2.a) The value of gamma has been taken from a linear regression, but there are no data about the (adjusted) R^2 for this regression nor even a p-value indicating that this value of 0.9 is significantly different from 1.0. The value of gamma has been said to be visually similar across three different data sets, but no statistical test was used to justify that these three data sets are likely to have the same value (and that that value is significantly different from 1.0).

2.b) The experimental design makes use of paired before/after data, but the authors do not formally use any paired statistical analyses that would incorporate statistical blocking to account for variance that otherwise confounds the comparison across the different treatment groups.

2.c) The authors claim that the simulation model reproduces the empirical results well, but they do not attempt a formal goodness-of-fit analysis.

2.d) In general, the authors need to consider formal statistical analyses when making inferences about any empirical data -- especially when small numbers of replications are used. Visual arguments (or even arguments focused only on comparing means) are not convincing. Reviewer 5 points out that the error bars represent a single standard deviation, which is a significant underestimation of confidence intervals. If we visually approximate confidence intervals by doubling the current error bars, the resulting bars show that different treatment conditions have a large overlap in response. Really showing that there is an effect requires a statistical test in this case. Even if a significant effect is demonstrated, the inferred effect size should be discussed.
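As an example for point 2.a, testing the regression slope against 1.0 (rather than the default 0) is straightforward; the sketch below uses simulated stand-in data, since I do not have the authors' numbers:

```python
import numpy as np
from scipy import stats

# Simulated stand-in for the regression behind gamma
# (log median estimate against log true value, 36 questions)
rng = np.random.default_rng(1)
log_truth = rng.uniform(2.0, 8.0, 36)
log_median = 0.9 * log_truth + rng.normal(0.0, 0.3, 36)

res = stats.linregress(log_truth, log_median)
# t-test of H0: slope = 1 (not the default H0: slope = 0)
t = (res.slope - 1.0) / res.stderr
p = 2 * stats.t.sf(abs(t), df=len(log_truth) - 2)
print(f"slope = {res.slope:.3f}, R^2 = {res.rvalue**2:.3f}, p(slope = 1) = {p:.4f}")
```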

In addition, some details of the experimental design pose some confusion. As pointed out by Reviewer 4, the authors state at one point that the FIRST choice of an individual seems to be influenced by the number of estimates shown to that individual. However, the experimental design should be such that the first choice by an individual occurs before any other choices are displayed. This may indicate that some rephrasing is needed, or the experimental design needs to be clarified.

3. As discussed by Reviewer 3, the real value of the model over what is already shown empirically is not clear. As the authors have used the model, it should act as a lens helping to bring clear focus on which of several different hypothetical drivers are likely responsible for differences seen in the empirical data. However, the model does not currently complement the empirical data to provide clarity; it seems to supplement the empirical data and possibly just raise more questions. That said, I can understand how the authors might feel that, without a computational model (and in light of the comments I will make in point 4 below), there may not be much reason this article would belong in PLOS Computational Biology. If the model really is the strong point of the paper, the thing that anchors it to this journal, then its contribution needs to be made clearer. If the model is removed from the paper, then I would recommend the authors lean on the relevance of this study to recommender systems (as they have already done). But, in the end, if the paper really becomes an empirical study of human psychology, it may be a better fit for PLOS ONE instead.

4. The authors have gone to great lengths within the text of the article to focus on how information from the crowd can be used to reduce bias in the individual. Previous comments from reviewers have focused on how this paper is not about collective intelligence so much as leveraging information from an ensemble of other evaluators to help improve outcomes from the next evaluator. Still, the title of the paper starts with "Debiasing the crowd," which suggests that the paper is about designing information sharing mechanisms to improve group outcomes. That is simply not the focus of this paper. The authors might want to consider an analogy to "control variates", a method employed for variance reduction in Monte Carlo methods. In variance reduction, each experimental replication has a multivariate output -- one variable (X) whose mean is to be inferred, and another (Y) with a known mean (Ybar). Control variates use the demonstrated correlation between the two response variables (cov(X,Y), which can be estimated from the data) in a similar way to the "S" variable described by the authors. In particular, the response variable (Y) with the known mean (Ybar) generates a difference from that known mean (Y - Ybar), and that difference can be scaled (with magnitude related to cov(X,Y)) to directly adjust the observed value of the focal variable (X). The authors seem to be asserting that a similar process goes on within the head of an individual when making the second prediction, and their method leverages this to try to reduce the bias in an individual. So, a more accurate title might be something like:

"Crowd Control: How to select social information to improve individual judgment accuracy"

Personally, I prefer titles that state the main results as opposed to posing questions that the reader is promised to find an answer to within. With that in mind, I might suggest something like:

"Crowd Control: Reducing individual estimation bias by sharing biased subsets of evaluations from others"

That said, I find that one thing diluting the value of this article is that it appears to be two things at once. On one hand, it attempts to be a scientific article making inferences about how humans use social information. On the other hand, it attempts to be a design article suggesting how recommender systems (or other technologically enabled systems) might reduce intrinsic bias in the choices made by their users. I think the article would be improved if the authors would focus on one (and possibly leave the other as a short set of comments in discussion related to broader impacts). My personal recommendation would be to focus on better illuminating the four effects that relate to how humans make use of data from existing raters. Then the design comments could be left for (brief) discussion. If this was the focus, then the title of the paper would not be built around the idea of an action ("Debiasing" or "Control") but instead would be constructed to communicate novel psychological insights (e.g., "Human numerical estimation errors are highly sensitive to...").
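To make the control-variate analogy of point 4 concrete, here is a minimal numerical sketch of the adjustment (illustrative simulated data, not the authors' method):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
# Illustrative joint draw: Y has a known mean; X is correlated with Y
Y = rng.normal(5.0, 1.0, n)
X = 2.0 + 0.8 * Y + rng.normal(0.0, 1.0, n)
Y_bar = 5.0                                   # the known mean of Y

c = np.cov(X, Y)[0, 1] / np.var(Y, ddof=1)    # scaling estimated from the data
X_cv = X - c * (Y - Y_bar)                    # control-variate adjusted responses

# Standard error of the mean before vs. after the adjustment
print(X.std(ddof=1) / np.sqrt(n), X_cv.std(ddof=1) / np.sqrt(n))
```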

Thank you for your time and efforts on this manuscript. I hope you find these comments to be constructive. Best wishes to you in your efforts to further revise this manuscript, if you choose to do so.

Theodore (Ted) P. Pavlic

Guest Editor

PLOS Computational Biology

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have revised the manuscript significantly to account for concerns of myself and other reviewers. I think the manuscript should now be accepted in its current form.

Reviewer #3: This is the third time I am reading this paper that reports a model and an empirical study of collective judgments. The study seeks to understand the effects of information sharing in groups, and in particular the effects of the amount of information shared and of the selection process for which pieces of information are shared. A particular point of interest from the authors’ perspective is how well a particular process (“shifted median”), designed to counter natural individual judgment biases, can improve the quality of the judgments. I like the topic and the approach. The experiment is well designed and, for the most part, it is well analyzed and clearly reported. The authors addressed seriously my reservations and this version is better. I still have a few outstanding issues/questions:

In the previous draft of the paper, I found the examples the authors used for the magnitude of the overestimation bias somewhat arbitrary (they referenced the number 100 at some point). They took my suggestion and, instead, focused on a single example, using a dot estimation task, but I still find this slightly opaque and would prefer it were expanded a bit. Can the authors provide a brief explanation of the design of the task, with particular emphasis perhaps on the range of dots participants were expected to estimate? Just slightly more context would go a long way here I feel.

I continue to be puzzled by the prediction regarding the random condition. The notion that people are insensitive to the number of pieces of advice presented to them is counterintuitive (after all, everyone can do exactly what the authors are doing in the median condition, namely reject/ignore extreme values), and is inconsistent with empirical evidence about the way people aggregate information from multiple sources (e.g., Budescu & Rantilla, 2000; Budescu, Rantilla, Yu & Karelitz, 2003; Budescu & Yu, 2007). The authors point out that their results in the random condition confirm this expectation, but this seems to be a bizarre twist of logic. I was not questioning their results, but asking for the justification of the a priori prediction that runs counter to (at least some) data. One can’t defend/justify a prediction simply by saying it was right! Do they imply that the prediction was preceded by the results?

I think we had this argument in a previous round, but I don’t think the claim on lines 114-115 is mathematically correct. If all judgments over (or under) estimate the true value (i.e. all errors have the same sign), it is easy to generate distributions where the expected error (i.e. the error of a randomly selected judge) is smaller than the error of the median. This needs to be clarified.

Line 172: What exactly does “reliable” mean here? Please define and clarify whether this is an empirical or a normative / theoretical argument. BTW Han & Budescu (2019) show the superiority of the median over the mean in a bunch of cases, and cite other papers doing so.

Line 196: Provide some data to make the point: “In X of the 18 distributions considered, the MSE (or some other measure) of the second estimates is smaller than the MSE (or whatever other measure you pick) of the first set of estimates”.

The model. On line 324 you outline a two-stage process: in a certain fraction of cases judges stick to their estimates (S = 0), and in the other cases they draw S from a properly parametrized Normal distribution. As a psychologist, I am bothered by the lack of differentiation between individual- and aggregate-level theorizing. Figure 3 is based on all judges and all items combined, and the spike at S = 0 does not differentiate between the two. My question, as a psychologist, is whether the S = 0 cases represent a subset of judges that stubbornly resist social influence, a subset of items that are so easy that no one looks at the others’ estimates, or a uniform/universal tendency of all judges to stick to their estimates in a fraction of cases (e.g., when they feel very sure about them). It is an easy analysis to run (the % of cases where S = 0 for every judge and every item) that would help clarify this point.
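For concreteness, that analysis is only a few lines of code (sketched here on hypothetical long-format data; the column names are illustrative):

```python
import pandas as pd

# One row per (judge, item) with the weight S given to social information
df = pd.DataFrame({
    "judge": [1, 1, 2, 2, 3, 3],
    "item":  ["q1", "q2", "q1", "q2", "q1", "q2"],
    "S":     [0.0, 0.4, 0.0, 0.0, 0.7, 0.2],
})

# Fraction of S = 0 cases per judge and per item: a spike concentrated in a
# few judges (or a few items) would tell the two accounts apart
frac_zero_by_judge = df.groupby("judge")["S"].apply(lambda s: (s == 0).mean())
frac_zero_by_item = df.groupby("item")["S"].apply(lambda s: (s == 0).mean())
print(frac_zero_by_judge)
print(frac_zero_by_item)
```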

The general discussion: I wish some of the four mechanisms listed would be qualified to reflect the restrictions imposed by the experiment. For example, Schultze, Rakotoarisoa and Schulz-Hardt (2015) show that the distance effect is not always monotonic, and it is possible to imagine cases where people would systematically pay more attention to estimates that are lower than theirs (for example, for very rare and/or very undesirable quantities).

Finally, regarding what was Figure 2 and is now Figure 12: the new figure contains more information, but I’m not sure it’s more informative. What is persuasive about this figure is essentially the same thing as before: when τ is small, median-shifted social information benefits collective accuracy meaningfully more than random social information or the unshifted median. I think this is the most interesting result in the paper and was also clear in the previous version of this figure. I don’t see how the reference line for the model predictions, the solid line, is informative/useful. It appears to be just the expected pre-advice accuracy estimate across all questions and conditions. How is the model’s prediction, which is based on social information, informative in the absence of social information?

More critically, how can we make a comparison with the empirical results when the model prediction is independent of the empirical results by condition? The pre-SI empirical results are condition-specific and don’t correspond particularly well to the reference line, which makes comparison between the post-SI empirical results and the model predictions extremely difficult. One way to address this might be to leave the solid line out entirely and condition the post-SI model predictions on the pre-SI empirical results. Without something like this, it is very difficult to interpret how well the model is reproducing the empirical patterns. Perhaps this was a hidden issue in the previous versions of this figure, as I may have been implicitly assuming this was the case. Otherwise, this seems like an apples-to-oranges comparison.

Overall, the results (e.g. Figure 10 in particular) seem persuasive that the model adequately captures patterns in individual behavior with regard to social information, but Figure 12 stands out as being difficult to interpret with regard to how well the model is capturing empirical patterns in collective accuracy.

Thank you for the opportunity to review this thought-provoking paper.

David Budescu

Han, Y., & Budescu, D. V. (2019). A universal method for evaluating the quality of aggregators. Judgment and Decision Making, 14, 395–411.

Schultze, T., Rakotoarisoa, A., & Schulz-Hardt, S. (2015). Effects of distance between initial estimates and advice on advice utilization. Judgment and Decision Making.

Reviewer #4: Please see PDF attached.

Reviewer #5: I appreciate the authors' work to improve their manuscript, and to address the comments of all of the reviewers. This version of the manuscript looks _significantly_ revised, so I have a few new comments to this version that have to do with the poor statistical treatment of the data.

1. "visual inspection confirms" (l.181) This is a very weak way to compare whether or not two sets of data are statistically different from one another. I suggest that the authors use a more rigorous statistical method to do this comparison.

2. "we find narrower distributions after social information sharing" (l.196) This is really not obvious to me, especially if you ignore the lines (which are 'model simulations') and just look at the datapoints. Again, here a rigorous statistical method to make this claim is needed.

3. "the distributions of Xp (solid lines) are simulated by drawing the Xp from Laplace distributions" (l.198) This is odd. There is a closed-form expression for the PDF of the Laplace distribution, so simulating this distribution is unnecessary. Furthermore, it appears that the authors simulated the distribution once, and plotted the same simulation across all of the panels of figure 2, and all of the stochastic jaggedness is identical across the panels. Why not just plot the exact form of the distribution?

4. "A similar pattern of social influence strength is observed at intermediate values of tau (tau = 3, 5, 7, or 9), where Pg and mg are substantially higher in the Median and Shifted-Median treatments than in the Random treatment. For sigma_g, we observe a higher value in the Random treatment than both other treatments at tau = 3 and 5, but not at higher levels of tau." (l.248-251) This appears to be a poor analysis of the data shown in figure 4. First, we can take the figure at face value. Doing so, we can see that Pg does not appear to be "substantially higher" in the Median and Shifted-Median treatments compared to the Random treatment (the blue and black error bars overlap quite a lot). Similarly, sigma_g does not look higher for the Random treatment compared to the other treatments at tau = 3 (the black error bar overlaps both the blue and red error bars).

Worse, however, is the fact that the error bars represent ONE standard error. If we double the length of the error bars to approximate a 95% confidence interval, we can see that nearly all of the error bars will overlap with one another for most of the figure, rendering the authors' statements about the data in the text unsubstantiated. The authors need to address this.

5. "we find that the center of the cusp relationship is located at D = D0 <0" (l. 292) However, in line 302, we find that "visual inspection was used to fix D0." Again, this is a surprisingly poor method for determining this, and the authors should use a statistically rigorous method instead.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

Reviewer #4: None

Reviewer #5: None

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: Yes: David V Budescu

Reviewer #4: No

Reviewer #5: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Attachment

Submitted filename: PCOMPBIOL-D-20-00065_R2_Comments_to_Authors.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009590.r007

Decision Letter 3

Natalia L Komarova, Theodore Paul Pavlic

25 Oct 2021

Dear Dr. Jayles,

We are pleased to inform you that your manuscript 'Crowd control: reducing individual estimation bias by sharing biased social information' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Theodore P. Pavlic

Guest Editor

PLOS Computational Biology

Natalia Komarova

Deputy Editor

PLOS Computational Biology

***********************************************************

Thank you for your considerable efforts responding to the questions, comments, and concerns of the reviewers and the editing staff. I believe that the current version of the manuscript is ready for larger scrutiny by the broad PLOS Computational Biology audience and will likely be a very thought-provoking contribution.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009590.r008

Acceptance letter

Natalia L Komarova, Theodore Paul Pavlic

8 Nov 2021

PCOMPBIOL-D-20-00065R3

Crowd control: reducing individual estimation bias by sharing biased social information

Dear Dr Jayles,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Katalin Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Details of the experimental design and supplementary figures and tables.

    Further details about the experimental design are provided (including the questions asked), as well as figures and tables supporting the statistical analysis and the main discussion. Fig A: Experimental procedure for an example question. Fig B: Correlation between median estimate and correct answer for general knowledge vs numerosity questions and for very large vs moderately large quantities. Fig C: Significance analysis of the differences between slopes in all panels of Fig 1, as well as of these slopes being lower than 1. Fig D: Narrowing of the distributions of estimates after social information sharing in Fig 2 and analysis of its significance. Fig E: Probability density function (PDF) of personal estimates Xp for all conditions combined. Fig F: Probability density function (PDF) of the fraction of instances with S = 0 for each participant and each question. Fig G: Significance analysis of the differences in Pg, mg and σg between treatments in Fig 4. Fig H: 〈S〉 against τ and 〈σ〉 for the model without similarity effect. Fig I: 〈S〉 against τ, when D < 0 and D > 0, for the model without similarity effect. Fig J: Collective and individual accuracy against τ for the model without similarity effect. Fig K: 〈S〉 against τ and 〈σ〉 for the model without asymmetry effect. Fig L: 〈S〉 against τ, when D < 0 and D > 0, for the model without asymmetry effect. Fig M: Collective and individual accuracy against τ for the model without asymmetry effect. Fig N: Significance analysis of the improvements in collective and individual accuracy in Fig 12. Fig O: Significance analysis of the difference in improvement in collective accuracy between treatments in Fig 12. Fig P: Significance analysis of the difference in improvement in individual accuracy between treatments in Fig 12. Fig Q: Significance analysis of the improvement in individual accuracy in Fig 13. Fig R: Collective accuracy against τ when D < 0 and when D > 0. Fig S: Significance analysis of the improvement in individual accuracy in Fig 14. Fig T: Collective accuracy against τ when S is below and above Median(S). Fig U: Collective and individual accuracy against τ in the Shifted-Median treatment compared to a simple recalibration of initial estimates. Fig V: Collective accuracy against τ for moderately large and very large quantities. Fig W: Individual accuracy against τ for moderately large and very large quantities. Table A: Distribution of cases when the social information provided to an individual was higher (D > 0) or lower (D < 0) than their personal estimate. Table B: Goodness-of-fit and relative error between the data and the model.

    (PDF)

    Attachment

    Submitted filename: Review_PCOMPBIOL_D2000065.pdf

    Attachment

    Submitted filename: PCOMPBIOL-D-20-00065_Comments_to_Authors.pdf

    Attachment

    Submitted filename: Response-to-Reviewers.pdf

    Attachment

    Submitted filename: PCOMPBIOL-D-20-00065_R1_Comments_to_Authors.pdf

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Attachment

    Submitted filename: PCOMPBIOL-D-20-00065_R2_Comments_to_Authors.pdf

    Attachment

    Submitted filename: Response to the Referees.pdf

    Data Availability Statement

    The data supporting the findings of this study are available at figshare: https://doi.org/10.6084/m9.figshare.12472034.v2.

