PLOS One
. 2022 Oct 6;17(10):e0275473. doi: 10.1371/journal.pone.0275473

Person-to-person opinion dynamics: An empirical study using an online game

Johnathan A Adams 1, Gentry White 1,2, Robyn P Araujo 1,3,*
Editor: José Manuel Galán4
PMCID: PMC9536623  PMID: 36201432

Abstract

A model must make verifiable predictions to have scientific value. In opinion dynamics, the study of how individuals exchange opinions with one another, there are many theoretical models of opinion exchange. One of these, the Martins model, differs from the others by using a parameter that is easier to control for in an experiment. In this paper, we design an experiment to verify the Martins model and contribute a novel method to experimental design in opinion dynamics.

Introduction

The field of opinion dynamics has a wide variety of theoretically derived models that potentially describe human interactions and the resulting change in opinions. Despite the appeal of these models, there is a dearth of empirical evidence to support their utility [1]. For a model to be scientifically verifiable, it needs to make testable predictions about the outcome of an experiment. While modern examples of research [2, 3] demonstrate an effective method to investigate opinion dynamics models, most theoretical opinion dynamics models do not offer predictions on behaviours, which makes it challenging, if not impossible, to create controlled experiments that can verify these models [4]. Consider the bounded confidence model [5, 6], which includes a parameter ϵ limiting agent interactions. Certain values of ϵ can create polarisation, but because ϵ is an abstract (and highly subjective) measure in opinion space, it is difficult to create an experimental condition that controls ϵ. In general, the level of abstraction in opinion dynamics models' parameterisations limits the design and implementation of experiments for testing model validity. Further, opinion dynamics models created from data are also difficult to verify because, as stated in [4], models fitted to experimental data rarely make testable predictions about future data. We break this trend by designing and executing an experiment testing the claims made by the Martins model [7].

The Martins model [7] represents opinions as probability density functions: a person has an opinion x ∈ ℝ and an associated uncertainty σ ∈ ℝ+, which together define a Gaussian density with mean x and standard deviation σ. When two agents interact in the Martins model, they share their opinions (and, in the extended model [8], their uncertainties), and the two agents then update their opinion and uncertainty via Bayesian updating. A key parameter in the model is p ∈ [0, 1], the propensity for agents to believe that other agents have useful information and to use that information to update their opinions. When p = 1, consensus is always reached, whereas for values of p < 1 polarisation emerges. The parameter p is much more interpretable and controllable in an experimental setting than the bounded confidence model's ϵ.
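As a rough illustration, the flavour of a single trust-weighted Bayesian update can be sketched as follows. This is a schematic sketch only, not the authors' code; the function name and the exact form of the update are our own simplification, and the published equations appear later in the Model and data predictions section.

```python
import math

def martins_update(x_i, s_i, x_j, s_j, p):
    """Schematic sketch of a trust-weighted Bayesian opinion update.

    Agent i holds opinion x_i with uncertainty s_i; agent j reports
    x_j with uncertainty s_j; p is the prior probability that j's
    report is informative.
    """
    var_sum = s_i ** 2 + s_j ** 2
    # Likelihood of observing x_j under i's belief, if j is informative
    phi = math.exp(-(x_j - x_i) ** 2 / (2 * var_sum)) / math.sqrt(2 * math.pi * var_sum)
    # Posterior probability that j was informative (the model's effective weight)
    p_star = p * phi / (p * phi + (1 - p))
    # The mean moves toward x_j by p_star times the precision-weighted fraction
    x_new = x_i + p_star * (s_i ** 2 / var_sum) * (x_j - x_i)
    return x_new, p_star
```

When p = 1 this reduces to standard Gaussian fusion; as p decreases, distant opinions are increasingly discounted, which is the mechanism behind polarisation for p < 1.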

In this article, we present a comprehensive literature review of previous empirical studies in opinion dynamics. We then outline a design for an experiment that can test whether the Martins model predicts the opinion shifts of individuals. We executed such an experiment and present its results here. We found two distinct phenomena occurring in the experiment: when two individuals are close in opinion, the Martins model made a reasonable prediction of the opinion shift; when two individuals are far apart in opinion, the observed opinion shift followed what would be expected from a discrete opinion choice model. We conclude by discussing our novel results and identifying the limitations of our experiment.

Previous experiments

There is limited evidence of experimental data being used directly in opinion dynamics, either to verify hypotheses based on model predictions or to construct empirical models. This scarcity is partly due to the difficulty of designing an experiment that accurately replicates real-world interactions while controlling the experimental conditions, and it has resulted in empirical data collection in opinion dynamics evolving independently from the theory.

Experimental data collection

Many empirical investigations into opinion dynamics draw inspiration from psychology studies that investigated opinion change [9–12]. All of these studies served as guidelines for the experimental design of later opinion dynamics studies. For example, the study [11] aimed to test two hypotheses: (1) “Extreme members will contribute more to the group discussion than less extreme members. (1a) They will use more words than less extreme members, and (1b) they will take more turns than the less extreme member,” and (2) “There should be greater group polarisation in a group containing an extreme member than in groups not containing an extreme member.” The authors of [11] tested these by dividing 129 participants into 43 groups of three. Participants in each group were asked their opinion and knowledge on the legalisation of marijuana before the experiment. The participants read material related to the legalisation of marijuana and then discussed the issue within their group until the group reached a compromise. After the discussion, participants re-evaluated their opinion. This experimental design formed the basis for the approach to collecting data in the opinion dynamics literature.

Building empirical models

While opinion dynamics had its inception in the 1950s [13], one of the first significant studies focused on developing a model of opinion change from experimental data was published in 2013 [14]. The experimental design in [14] draws direct inspiration from the earlier psychology literature, but the study generated a model of human behaviour from the collected data rather than proving any specific hypothesis. Participants were asked general knowledge questions with a real-number answer, e.g. “How long is the Mississippi river?”, and rated their confidence in their answer on a 1 to 6 scale, with lower values meaning less confidence. Participants only saw one other participant's answer and confidence at a time.

The authors of [14] used the experimental data to create an influence map of the experimental subjects' behaviours. An influence map is a surface in relative opinion and relative uncertainty space, which describes the opinion change of an individual according to their relative opinion and relative uncertainty with a hypothetical interaction partner. The authors used the influence map to create a decision tree model. Depending on where an interaction fell on the influence map, the model specifies three ways an agent could update their opinion after an interaction with another agent: rejecting, where there is no change in opinion; compromising, where the opinion shifts ‘halfway’ towards the other opinion; and adopting, where the opinion changes to the other opinion. The resulting model is related to the bounded confidence model [5, 6] in that the regions of rejecting, compromising and adopting could be used to determine an ϵ, but the influence map of [14] implies a more nuanced picture which the bounded confidence model cannot address.

More modern models like the Martins [7] and relative agreement [15] models produce behaviour similar to that seen in the influence map generated by [14]. But it is difficult to confirm precisely whether models like Martins or relative agreement can accurately predict the behaviour observed in [14]. Specifically, the Martins and relative agreement models treat confidence/uncertainty as continuous values, which conflicts with the discrete 1–6 scale [14] used to measure confidence, making the empirical data incompatible with the theoretical models. In any case, the goal of [14] was to use the data to generate a model, not to verify an existing one.

Other experimental designs

Modern studies have improved on the experimental design of [14]. For example, the work [3] provided a novel contribution in which participants interacted in pairs through digital displays and exchanged their opinions, and each participant could see their interaction partner update their opinion in real time. The new method provided a controlled, yet realistic, environment to test ideas about opinion exchange and revealed new behaviour in which participants became more confident upon observing that their interaction partner changed their opinion. This empirical evidence provides clues for producing theoretical models that can predict opinion exchange more effectively.

More significant is the work of [2]. Specifically, the study [2] investigated how groups assessed threatening objects and developed a model similar to the French model [13] that establishes seven testable predictions, ranging from conditions on how individuals modify their threat assessments to predicting the process by which society might reach consensus. Agents in the study's model, as in the original French model [13], weight their neighbours such that when the model progresses, each agent adopts the weighted average opinion of its neighbours according to the weighting the agent assigned each neighbour. For example, consider a three-agent simulation with Agents 1, 2 and 3 holding opinions x1, x2, x3 respectively, and Agent 1 weighting every agent in the simulation according to the vector [0.2, 0.3, 0.5]. When the model updates, Agent 1's opinion will be 0.2x1 + 0.3x2 + 0.5x3. The study measured these weightings by first giving 100 chips to each participant after the group discussion. Next, participants distributed their chips according to how much they were convinced by other group members that an object was “threatening”. Participants were allowed to keep chips if they were not convinced by the group. The chip distribution provided by each participant directly measured the weightings needed for the modified French model. The study concludes by evaluating the model's ability to predict an individual's threat assessments. The work of [2] demonstrates an effective method to evaluate an opinion dynamics model, which this paper hopes to emulate.
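The weighted-average update just described is simple to state in code. The sketch below (illustrative only; the opinion values are made up) reproduces the three-agent example:

```python
def french_update(opinions, weights):
    """One French-model update step for a single agent: adopt the
    weighted average of all opinions (including the agent's own)
    according to the weights the agent assigns."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * x for w, x in zip(weights, opinions))

# Agent 1 weights Agents 1, 2, 3 by [0.2, 0.3, 0.5]; opinions are illustrative
x_new = french_update([0.1, 0.5, 0.9], [0.2, 0.3, 0.5])  # 0.2*0.1 + 0.3*0.5 + 0.5*0.9
```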

The side of opinion dynamics concerned with measuring the most influential individual has already produced empirical studies that seek to verify theoretical models' predictions. The work of [16] builds on the French model [13] by imposing that the weights of the French model be related to agents' in-degree, i.e. how widely an agent is listened to. This weighting scheme relies on a parameter ρ such that: when ρ = 0, in-degree does not affect opinion dissemination; when ρ = 1, neighbours are weighted in proportion to their in-degree; and when ρ → ∞, agents only listen to the neighbour with the highest in-degree. In the study [17], the authors developed an experiment that isolated the effect of in-degree. Specifically, the authors constructed a social network for participants that controlled for in-degree. The result of the experiment was the rejection of the null hypothesis, i.e. ρ = 0, suggesting that in-degree has a role in opinion dissemination. With the experiment of this study, we seek to accomplish a similar goal at the interpersonal level and ascertain whether mistrust influences interpersonal communication as the Martins model describes.

Materials and methods

In comparison with previous work, we designed this study's experiment to be more abstract. This abstraction allows a more direct comparison between the model variables, i.e. agent uncertainty and opinion, and the data collected from the experiment. In addition, the abstraction minimises the impact of cultural bias. Consider the general knowledge questions used in previous experiments: they limited the pool of participants to those somewhat knowledgeable of the topic, e.g. a question like “How long is the Mississippi river?” limits participants to those from the US. Our experiment naturally avoided this problem. We took advantage of this flexibility to make the study a snowball sample study; participants were encouraged to invite others to participate, increasing the sample size for the study.

Ethics statement

The experimental design and experiment described below were approved by the Queensland University of Technology (QUT) Human Research Ethics Committee (UHREC) as Negligible-Low Risk. Reference number: 2000000739.

Recruitment of participants

As stated previously, our recruitment strategy for this experiment was snowball sampling. We advertised the experiment on the social media websites Facebook and YouTube and through the mailing list and Slack workgroups of the QUT mathematics school. When a participant finished the experiment, we recorded the IP address associated with the device they used as part of the data. We recorded IP addresses to determine the number of unique participants in the experiment; we recorded no other personal data on the participant. Assuming that each IP address corresponds to a unique participant, a total of 257 participants took part in the experiment.

The experiment

The experiment entailed playing a game on the internet, hosted on QUT servers. The goal of the game was to find a hidden dot inside a black box on screen. Participants were given information to find the dot in the context of a social interaction. Participants would first see a blue circle, which was explained as information that was always reliable; the hope was that a participant would internalise the blue circle as their opinion. Next, a participant would see a red circle, which was explained as not being reliable all the time; the idea was that participants would interpret the red circle as a rumour that may be false. Lastly, a participant would draw a new circle in response to the red and blue circles. We directly controlled the reliability of the red circle while informing the participants, which allowed us to control for p in the Martins model. See S1 File for the source code of the website. In total, 3760 games were played.

Participant instructions

Before playing a game, each participant first saw instructions for the game. The instructions described the game much as it is described in this paper, but without probability terminology, e.g. confidence intervals, to avoid confusing the participants. The instructions developed a backstory for the game to encourage participants to role-play so that they would respond realistically. The instructions described an eccentric Flemish trillionaire, Monsieur Dotte, as hosting the game because he wanted the world to indulge his passion for puzzles and social deduction. So, M. Dotte offered a ‘cash prize’ for those who did well at his game of finding the dot. This framing allowed us to communicate specific information to participants, e.g. the reliability of the red circle at different traffic light signals and the 80% chance a reliable circle had of containing the dot, while keeping the scenario plausible in the minds of the participants. It was made clear that there was no monetary reward for playing the game; the ‘cash’ was just their score after finishing the game and held no fiscal value. See S1 Fig for the instructions we gave to the participants on the game website.

The game

Initially, a participant would see an empty black box. The participant would then click on the box, causing the blue circle to appear, e.g. as in S2 Fig. The blue circle represented an 80% confidence interval, which was explained to the participants as an 80% chance of containing the dot. The blue circle was 100% reliable and always gave information on the dot's location. The dot could lie outside the blue circle, but because the blue circle was an 80% confidence interval, the dot would appear close to the circle.

When participants clicked again, the red circle would appear, e.g. as in S3 Fig. Like the blue circle, the red circle purported to be an 80% confidence interval of the dot's position, but the red circle had a probability of being unreliable, i.e. drawn at random and independent of the dot's actual location. To communicate the unreliability of the red circle, a traffic light above the box would light up such that: when the traffic light was red, the red circle had a reliability of 20%; when the light was yellow, a reliability of 50%; and when the light was green, a reliability of 80%. The red circle remained an accurate confidence interval when it was randomly determined to be reliable; otherwise, the red circle was drawn at random inside the box. These reliability probabilities were communicated to the participants in the instructions and were our attempt to quantify p exactly, allowing more definitive model predictions.
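The red-circle mechanics can be sketched as follows. This is a hypothetical reconstruction for illustration only; the actual game code is in S1 File, and the function name and arguments here are our own:

```python
import random

def draw_red_circle(dot, box_w, box_h, reliability, radius):
    """Place the red circle. With probability `reliability` it is
    informative (centred near the true dot, consistent with an 80%
    confidence interval of the given radius); otherwise its centre is
    drawn uniformly at random inside the box, independent of the dot."""
    if random.random() < reliability:
        sd = radius / 1.29  # radius read as an 80% confidence interval
        cx = random.gauss(dot[0], sd)
        cy = random.gauss(dot[1], sd)
    else:
        cx = random.uniform(0, box_w)
        cy = random.uniform(0, box_h)
    return cx, cy
```

Setting `reliability` to 0.2, 0.5 or 0.8 corresponds to the red, yellow and green traffic lights respectively.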

Finally, the game directed the participant to consider the position and reliability of the circles and draw a new circle (hereafter, the ‘user circle’) that they believed contained the dot. After the participant finished drawing the user circle they were scored based on their accuracy (whether the dot was in their circle) and their precision (how small their circle was) relative to the blue circle and were encouraged to play again. Then we recorded the final game states, including the size and position of all three circles, the reliability of the red circle, a unique session id and the participant’s IP address. S4 Fig shows an example of a finished game.

Collected data

The Martins model [7] and its extension [8] predict an individual's shift in opinion based on the parameter p, the opinions x and the uncertainties σ of both individuals involved in an interaction. As part of the experiment, we recorded the specific values of p, x_i, x_j, σ_i and σ_j for every simulated interaction, i.e. every game that a participant played; Table 1 describes how we organised these data.

Table 1. Data collected from the experiment.
Variable name  Description
p  The probability of the red circle giving useful information
x_blue  The x-coordinate of the centre of the blue circle
y_blue  The y-coordinate of the centre of the blue circle
r_blue  The radius of the blue circle
x_red  The x-coordinate of the centre of the red circle
y_red  The y-coordinate of the centre of the red circle
r_red  The radius of the red circle
x_user  The x-coordinate of the centre of the user circle
y_user  The y-coordinate of the centre of the user circle
r_user  The radius of the user circle

The blue circle is the first circle seen by the participant and represents the knowledge already acquired. The red circle is the second circle seen by the participant and represents the opinion and conjecture of another actor. The user circle is the circle drawn by the participant.

Except for p, the data in Table 1 are in units of pixels, whereas the Martins model operates in an abstract (unitless) opinion space. For clarity and comparability, we scaled all the relevant data to remove the unit of pixels. We calculated the scaling factor by finding the radius of the circle whose area equals that of an HD display (1920 by 1080 pixels) and dividing that radius by the number of standard deviations required to produce an 80% confidence interval, giving 634 pixels per standard deviation.
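The scaling factor can be reproduced directly from this description (using the exact two-sided 80% z-value, of which 1.29 is the rounded form):

```python
import math
from statistics import NormalDist

# Radius of the circle whose area equals a 1920 x 1080 display
radius = math.sqrt(1920 * 1080 / math.pi)   # ~812.4 pixels

# Standard deviations from the mean spanning a central 80% interval
z80 = NormalDist().inv_cdf(0.9)             # ~1.2816

pixels_per_sd = radius / z80                # ~634 pixels per standard deviation
```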

Scoring

We encouraged participants to play multiple games by giving the participant a score per attempt. Scoring a participant’s game follows these steps:

  1. Calculate an accuracy A_user ∈ ℝ+ and a precision P_user ∈ ℝ rating for the participant based on the circle they drew. Note that the precision becomes negative as the area of the player's circle approaches the area of the box.

  2. Produce an overall rating for the participant, R_user, as a weighted sum of A_user and P_user with weights w_A and w_P, respectively; in the experiment w_A = 0.1 and w_P = 70, chosen from preliminary experimentation to give intuitive scoring results.

  3. Repeat steps 1 and 2 for the blue circle, producing a rating R_blue for the blue circle.

  4. Find the relative rating R for the participant's circle
    R = R_user − R_blue + R_0,
    where R_0 is the rating given for guessing equally well as the blue circle, e.g. for the experiment R_0 = 2 × 10^−3. Giving a participant a rating relative to the blue circle encourages participants to guess better than the guaranteed information they start with.
  5. Calculate a score S ∈ [0, S_max] as a sigmoid function of R, e.g. the experiment used
    S = S_max / (1 + e^{−R/2}).
    The constant S_max is the maximum score achievable when playing the game, e.g. in the experiment S_max is one hundred thousand.
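Steps 2–5 can be summarised in a short function. This is a sketch using the constants quoted above; the accuracy and precision ratings for the user and the blue circle are assumed to be computed separately:

```python
import math

def game_score(a_user, p_user, a_blue, p_blue,
               w_a=0.1, w_p=70.0, r_0=2e-3, s_max=100_000):
    """Combine accuracy and precision ratings into a score:
    weighted ratings (step 2), a rating relative to the blue
    circle (step 4), then a sigmoid score in [0, s_max] (step 5)."""
    r_user = w_a * a_user + w_p * p_user
    r_blue = w_a * a_blue + w_p * p_blue
    r = r_user - r_blue + r_0
    return s_max / (1 + math.exp(-r / 2))
```

A participant who matches the blue circle's ratings exactly receives R = R_0 and hence a score just above S_max/2.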

Remark. To calculate accuracy and precision we used

A(e, r) = 1/(1/L_max + e) + U_bonus (r − r_min)/r_min, (1)
P(r) = P_factor [1/(π(r − r_min)^2) − π(r − r_min)^2 / (B_area − min[B_area, π(r − r_min)^2])], (2)

where e is the error of the circle, i.e. the distance between the centre of the circle and the dot; r is the radius of the circle; r_min is the radius of the smallest circle that can be drawn; L_max is the maximum accuracy score, achieved when e = 0; U_bonus controls how much the circle radius factors into the accuracy; B_area is the area of the box containing the dot; and P_factor controls how precise a circle of a given area is considered. For the experiment, L_max = 100, U_bonus = 0.01 and P_factor = 1.

We rated accuracy and precision this way for two main reasons. First, the scoring had to be unrelated to the Martins model: if it were related, participants would be encouraged to guess in the way the Martins model predicts, biasing the data. Second, the ratings had to produce a “fair” score by relating it to tangible concepts; e.g. a circle the size of the box would be considered very imprecise and thus give no score, i.e.

P_user → −∞, hence R_user → −∞ and S = 0.

A score that a participant considers fair would encourage them to continue playing, or at the very least not dissuade them.

Model and data predictions

The Martins model predicts that a user's opinion will fall on the line segment connecting the centres of the blue and red circles. The predicted distance from the centre of the blue circle to the user's opinion is

h_expected = p* d / (1 + R_σ^2), (3)

where d is the distance between the centres of the blue and red circles, R_σ is the ratio of the red and blue circles' confidences (here, the ratio of the circles' radii) and p* is a model quantity that depends on d relative to R_σ (see S1 Appendix for more details). Similarly, the difference between the variances of the user circle and the blue circle is predicted to be

k_expected = p* (1/(1 + R_σ^2)) ((1 − p*) d^2/(1 + R_σ^2) − σ_i^2). (4)

We can multiply this quantity by π and 1.29^2 to get the expected change in circle area, where 1.29 is the number of standard deviations from the mean required to construct an 80% confidence interval. These theoretical values can be compared directly with the observed data and assessed for goodness of fit.

We calculate the observed shift towards the red circle, h, and the change in circle area, k, as

h_observed = [(x_blue − x_user)(x_blue − x_red) + (y_blue − y_user)(y_blue − y_red)] / d, (5)
k_observed = (r_blue^2 − r_user^2) π. (6)
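Eqs 5 and 6 amount to projecting the user's displacement onto the blue-to-red direction and differencing the circle areas. In code (a direct transcription of the formulas, with each circle given as an (x, y, r) triple):

```python
import math

def observed_quantities(blue, red, user):
    """Compute h_observed (Eq 5) and k_observed (Eq 6).
    Each argument is a circle as an (x, y, r) tuple."""
    xb, yb, rb = blue
    xr, yr, rr = red
    xu, yu, ru = user
    d = math.hypot(xb - xr, yb - yr)  # distance between blue and red centres
    # Projection of the user's shift onto the blue-to-red direction
    h = ((xb - xu) * (xb - xr) + (yb - yu) * (yb - yr)) / d
    # Change in circle area relative to the blue circle
    k = (rb ** 2 - ru ** 2) * math.pi
    return h, k
```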

To compare the Martins model with the data, we calculate the following: d, the distance between x_i and x_j; σ_i, the uncertainty of the initial belief; R_σ, the ratio of i's and j's confidences; and the Martins model quantity p*. The value d is the distance between the centres of the blue and red circles and is thus

d = √((x_blue − x_red)^2 + (y_blue − y_red)^2).

The circles are presented to the participants as 80% confidence intervals, therefore

σ_i = r_blue / 1.29

where 1.29 is the number of standard deviations from the mean required to get 80% confidence. The ratio of i and j’s confidences is

R_σ = r_blue / r_red.

The quantity p* is a function of d, σ_i and R_σ and is

p* = p φ(d, σ_i √(1 + R_σ^2)) / (p φ(d, σ_i √(1 + R_σ^2)) + (1 − p)), (7)

where

φ(d, σ_i √(1 + R_σ^2)) = (1/(σ_i √(2π(1 + R_σ^2)))) e^{−d^2/(2 σ_i^2 (1 + R_σ^2))}. (8)
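Putting Eqs 3, 4, 7 and 8 together, the model's predictions can be computed as follows. This is a direct transcription of the printed formulas, not the authors' analysis code:

```python
import math

def expected_quantities(p, d, sigma_i, r_sigma):
    """Predicted shift toward the red circle (Eq 3) and predicted
    variance change (Eq 4), via phi (Eq 8) and p* (Eq 7)."""
    one_plus = 1 + r_sigma ** 2
    # Eq 8: Gaussian likelihood of the red circle under the blue belief
    phi = math.exp(-d ** 2 / (2 * sigma_i ** 2 * one_plus)) / (
        sigma_i * math.sqrt(2 * math.pi * one_plus))
    # Eq 7: effective weight given to the red circle
    p_star = p * phi / (p * phi + (1 - p))
    # Eq 3: predicted shift along the blue-to-red segment
    h_expected = p_star * d / one_plus
    # Eq 4: predicted change in variance
    k_expected = p_star * (1 / one_plus) * (
        (1 - p_star) * d ** 2 / one_plus - sigma_i ** 2)
    return h_expected, k_expected, p_star
```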

Intuitively, these predictions mean that even when p is low, we expect participants to shift towards the red circle more, and reduce the size of their circle (relative to the blue) more, when the red and blue circles are close to each other. Likewise, even when p is high, we expect participants to remain close to the blue circle while keeping the same radius when the red and blue circles are very far apart. This is due to the influence of p*: p* controls the degree to which an agent incorporates a new opinion, and p* depends on d, i.e. the difference in opinion, and σ_i^2 + σ_j^2, i.e. the total variance of both agents' opinions. The reliability p only affects the speed at which agents in the model effectively come to trust other agents, and hence does not produce significantly different behaviour for values of p between 0.2 and 0.8 (except for extreme cases like p → 1 [8]). Given the Martins model's predictions on the circles participants draw in the game, we supply the hypotheses

  1. There is no correlation between the observed and expected shifts away from the blue circle.

  2. There is no correlation between the observed and expected change in circle area from the blue circle.

Results

For each game, we calculated the user's predicted shift from the blue to the red circle and the change in their circle area compared to that of the blue circle using the Martins model. We compared the predicted results with the observations in Fig 1 (for the raw dataset see S2 File). Upon initial observation, we see a few outliers in the data where participants drew large circles in random places relative to the blue circle; these participants were likely trying to ‘break’ the game by drawing the largest circle possible. More interestingly, two patterns emerge from the data. The first is the linear relationship we expect to see between the observations and the Martins model's predictions. The second is a tendency to shift and draw a larger circle when the model predicted no change. Likely, two phenomena are simultaneously occurring in this experiment, one that the Martins model can explain and one that it cannot. We suspected that the two phenomena could be divided based on parameters determined in the game, and we developed this filter to separate the data

d < 0.8 (r_blue + r_red). (9)

Eq 9 separates the data based on whether the two circles shown to the participant overlapped by at least 20% of the sum of their radii.
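The filter of Eq 9 is a one-line predicate over the recorded circle geometry:

```python
import math

def circles_overlap_enough(blue, red, frac=0.8):
    """Eq 9: keep a game when the distance between the blue and red
    centres is less than 0.8 times the sum of their radii, i.e. the
    circles overlap by at least 20% of their combined radii.
    Each circle is an (x, y, r) tuple."""
    d = math.hypot(blue[0] - red[0], blue[1] - red[1])
    return d < frac * (blue[2] + red[2])
```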

Fig 1. Scatter plot of expected vs. observed shift towards the red circle and the change in circle area relative to the blue circle. (A) Shift towards the red circle. (B) Change in circle area.

When we apply Eq 9 to the data, we find these results:

  1. When the red and blue circles overlap and there is medium to high reliability, we can reject both hypotheses: (1) There is no correlation between the observed and expected shifts away from the blue circle; and (2) There is no correlation between the observed and expected change in circle area from the blue circle.

  2. When the red and blue circles do not overlap, participants make a discrete choice to either stay with the blue circle, adopt the red circle, or compromise with the red circle (shifting on average 44% of the way when trust is high).

  3. Participants tend to shift away from the red circle when trust is low, and the red and blue circles overlap.

Data explained by the Martins model

Fig 2 is the result of applying Eq 9 to the data. In Fig 2A we can see that the filtered observations broadly match expectation, with the linear model producing an R^2 = 0.14 (see Table 2). In Fig 2B the model performs noticeably worse, with an R^2 = 0.0029 (see Table 2). The p-values for the slopes of the linear models are statistically significant at a Type I error rate of α = 0.05 for the medium and high trust scenarios. This indicates that in these cases there is sufficient evidence to reject our null hypotheses and conclude that the observed results concur with the extended Martins model's predictions. In the low trust scenarios, the p-values are not significant, indicating that in these scenarios the extended Martins model is not a good predictor of the observed behaviour. In the low trust scenarios individuals seem to move away from the red circle, which is counter to the assumptions of the extended Martins model; we elaborate on this negative shifting in its own section. Investigating Fig 2A also reveals the effect of a confounding variable bounding the observed values from below. We suspect this confounding variable is the minimum size at which participants could draw their circle, which creates an artificial limit on circle size reduction.

Fig 2. Expected vs. observed, filtered to games where the blue and red circles overlapped. The solid red line is the line of best fit with intercept set to zero. (A) Shift towards the red circle relative to the blue circle. (B) Change in circle area relative to the blue circle.

Table 2. The slope and R2 values for Figs 2 and 3.

Shift towards Red Change in circle area
Low Mid High Total Low Mid High Total
slope 0.17 0.94 1.14 1.06 −0.02 0.23 1.08 0.27
p-value 0.58 1.06e–43 1.07e–114 1.07e–141 0.91 0.03 6.14e–39 1.16e–5
R^2 0.01 0.09 0.15 0.14 0.01 0.01 0.05 0.003

The low, mid and high headings in the table refer to trust at low, medium and high values respectively; specifically, they refer to parameter values p = 0.2, 0.5 and 0.8. Total takes the data as a whole.

We partitioned the data further by the reliability of the red circle; Fig 3 shows this partition. In general, the model predicts high-trustworthiness interactions more accurately, and most of the outliers in the data lie within the low-trustworthiness games. In particular, Fig 3B for p = 0.2 and p = 0.5 contains most of the unusually large data points compared with p = 0.8. We can explain this outlier behaviour as participants attempting unorthodox strategies to get the highest score, since the red circle is not a reliable source of information in those cases.

Fig 3. Expected vs. observed, filtered to games where the blue and red circles overlapped, separated into different levels of trustworthiness. The solid red line is the line of best fit with intercept set to zero. (A) Shift towards the red circle relative to the blue circle. (B) Change in circle area relative to the blue circle.

Data unexplained by the Martins model

After investigating the data of the overlapping circles, we shifted focus to the data of the non-overlapping circles. Since the red and blue circles were not overlapping in this case, participants would see two distinct circles. We therefore hypothesised that participants were choosing either to stick with the blue circle (what they know to be true), to ‘take a leap of faith’ and adopt the red circle as their new opinion, or to compromise with the red circle and draw their circle in between the red and blue circles. We tested this hunch by scaling the observed shift towards the red circle by the distance d between the blue and red circles, resulting in Fig 4. From Fig 4 we can see that the majority of the unpredicted shifts (73%) were between 0 and d, with most of these close to 0, which is consistent with the proposed explanation that the participants made a discrete choice between three options.

Fig 4. Histogram of the shift towards the red circle relative to the total distance between the red and blue circles, for games where the blue and red circles were not significantly overlapping.

As with the data explained by the Martins model, we can separate the unexplained data based on trustworthiness, resulting in Fig 5. The tendency we expected to see, i.e. sticking with the known by drawing over the blue circle, is confined to the low trust scenarios of Fig 5, i.e. p = 0.2 and p = 0.5, but there is a tendency to compromise and a smaller chance of fully trusting the red circle, which contributes to a rightwards skew, particularly in the medium trust case of p = 0.5. The tendency to adopt or compromise with the red circle is unsurprisingly common in the high trust case of p = 0.8, and we can see distinct peaks when p = 0.8, suggesting that participants are making a discrete choice concerning whether to fully, partially or not trust the red circle. Furthermore, we note that the case p = 0.8 relates closely to the data collected by [14] when participants decided to compromise. In that study [14], compromising participants adopted on average 40% of their interaction partner's opinion into their own, which is congruent with the average opinion shift of 0.44 in Table 3.

Fig 5. Histogram of the shift towards the red circle relative to the total distance between the red and blue circles when the blue and red circles were not significantly overlapping, separated into different levels of trustworthiness.

Table 3. The summary statistics for Figs 4 and 5 (shift towards the red circle as a fraction of the blue-red distance d).

               Low     Mid     High    Total
Mean           0.13    0.22    0.44    0.23
Median         0.02    0.07    0.45    0.06
Min           −2.05   −0.91   −0.47   −2.05
Max            4.21    1.39    2.63    4.21
1st Quartile  −0.01    0.00    0.09    0.00
3rd Quartile   0.15    0.39    0.72    0.42

The low, mid and high headings in the table refer to trust at low, medium and high values respectively; specifically, they refer to parameter values p = 0.2, 0.5 and 0.8. Total takes the data as a whole.

Negative opinion shifting in the results

In the data, a noticeable number of participants shifted away from the red circle, i.e. made a negative opinion shift, which the Martins model does not predict. We have tabulated the negative opinion shifts in Table 4; most occur within five pixels of 0. A notable exception is when trust is low, i.e. p = 0.2, and the red and blue circles overlapped (explained data), but even then the majority of negative shifts were smaller than 50 pixels (10% of the height of the play area and 14% of the width), with only 10% of shifts exceeding 50 pixels. Over both the explained and unexplained data sets, low-trust games resulted in more negative shifts, whereas high-trust games, i.e. p = 0.8, resulted in fewer, and the explained and unexplained data produced similar proportions of negative shifts. We conclude that most of the negative opinion shifts come from participants attempting to draw their circle on the blue circle, i.e. shifting by 0. In low-trust scenarios, however, the negative shift could be more intentional, particularly for the explained data.

Table 4. Number of games that feature negative opinion shifting from the participants.

Observed Shifts (pixels)       Explained (games)         Unexplained (games)       Total
                               Low   Mid   High  Total   Low   Mid   High  Total   (games)
h_observed ≥ 0                 163   454   895   1512    574   447   352   1373    2885
−5 ≤ h_observed < 0             36    58    69    163    141    68    26    235     398
−50 ≤ h_observed < −5           69    82    89    240     95    58    16    169     409
h_observed < −50                31     9    11     51     14     3     0     17      68

The low, mid and high headings in the table refer to trust at low, medium and high values respectively; specifically, they refer to parameter values p = 0.2, 0.5 and 0.8. Total is the total number of games in a particular category: explained, unexplained or across the whole experiment.
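The four categories in Table 4 amount to a simple binning of the observed shift. A minimal sketch (not the authors' code; the category labels are ours):

```python
# Binning of observed shifts (in pixels) into the four Table 4 categories.
# Illustrative reconstruction; the label strings are our own.
from collections import Counter

def shift_category(h_observed):
    if h_observed >= 0:
        return "non-negative"
    if h_observed >= -5:
        return "small negative (within 5 px of 0)"
    if h_observed >= -50:
        return "moderate negative (5-50 px)"
    return "large negative (beyond 50 px)"

# Hypothetical sample of observed shifts, one per game.
sample = [3.0, -2.1, -17.5, -80.0, 12.0]
counts = Counter(shift_category(h) for h in sample)
```

Counting categories this way across the explained and unexplained subsets would reproduce the structure of Table 4.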

Breakdown of individual participant involvement

To ascertain the influence of individual participants on the experimental data, we developed Fig 6, which shows the number of participants against the number of games played. We uniquely identified participants through their IP addresses and used that information to count how many games each participant played. The median number of games played was 10 and the mean was 14.63, suggesting the histogram in Fig 6 is skewed. Furthermore, the top 10% of participants (in terms of games played) are responsible for only 35% of all games played in the experiment, and 80% of participants played 23 or fewer games. The high number of games played by a small number of players presents a potential issue because their “learning” could bias the results if we assume that games are independent trials for the purposes of analysis. That is, if players’ scores improved as they played additional games, the independence assumption would be invalid, tainting the results of our analyses. To investigate whether participants were learning to play the game, we compiled Table 5, showing the mean score for the nth game played. While for any number of attempts there is a broad range of scores, including at the extremes, participants’ average scores do not increase as they play more games, suggesting that participants are not “learning” to play the game better, or at least not learning to improve their scores.
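The per-participant bookkeeping described above can be sketched as follows. This is an illustrative reconstruction with made-up scores, not the authors' pipeline, and real identifiers would be (hashed) IP addresses rather than the letters used here.

```python
# Sketch of the games-per-participant and learning-curve analysis (illustrative).
from collections import defaultdict
from statistics import mean, median

# (player_id, score) pairs in the order games were played; values are made up.
games = [("a", 3000), ("b", 2500), ("a", 3100), ("a", 2900), ("b", 2600)]

# Group scores by player, preserving play order.
games_per_player = defaultdict(list)
for player, score in games:
    games_per_player[player].append(score)

counts = [len(scores) for scores in games_per_player.values()]
median_games = median(counts)

# Mean score for the nth attempt, pooled across players (as in Table 5).
by_attempt = defaultdict(list)
for scores in games_per_player.values():
    for n, score in enumerate(scores, start=1):
        by_attempt[n].append(score)
mean_by_attempt = {n: mean(s) for n, s in sorted(by_attempt.items())}
```

A flat or non-increasing `mean_by_attempt` across attempts is what supports the claim that participants are not learning to improve their scores.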

Fig 6. Histogram of the number of participants by the number of games played.

Table 5. The mean score of participants by attempts.

Attempt Mean Score
1st 3230
2nd 1983
3rd 4896
4th 2533
5th 3563
6th 5072
7th 3195
8th 2028
9th 4358
10th 3690
11th-15th 4148
16th-20th 4039
21st-25th 4212
26th-30th 5124
31st-40th 3136
41st-50th 3007
51st-88th 3909

Discussion

The data we have collected have produced surprising and interesting results. We have identified two different types of behaviour in the experiment. The first type of behaviour is congruent with the predictions made by the Martins model, while the second fell outside the scope of the Martins model, and we were able to distinguish between the two behaviours by developing Eq 9, which divides the data based on the red and blue circle overlap. When investigating the second dataset, we developed Figs 4 and 5, and we concluded that participants treat the problem of finding the dot as a discrete choice, i.e. the dot must be in either the red circle or the blue. Adopting the red circle opinion is contrary to the Martins model, which considers distant opinions “untrustworthy” even with a p close to 1; thus, according to the Martins model, a participant should always ignore the red circle. There are multiple possible causes for participants switching to a discrete-choice mindset: it may be a cognitive bias to simplify the problem, or it may derive from the instructions we gave to participants. The instructions explained that M. Dotte is the one who reveals the blue circle. Despite the instructions explaining that M. Dotte is always reliable, participants might still doubt M. Dotte and not fully internalise the blue circle as their opinion. However, when we compare the unexplained results to the results in the literature, we see striking agreement. We observed in Fig 5 that when reliability is low, participants tended to keep their opinions close to the blue circle, shifting towards the red circle by an average of 0.2d. But with increased reliability, participants began to “compromise” with the red circle by drawing their circle 0.4d of the way from the blue circle to the red. This 0.4 magnitude shift agrees closely with the results in [14] for participants who decided to compromise.
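One way to make the discrete-choice reading concrete is a toy simulation in which each participant picks one of the three options. The choice weights and the 0.4d compromise point below are assumptions loosely motivated by the observed averages, not fitted parameters:

```python
# Toy three-option discrete-choice simulation (illustrative, not a fitted model).
# Higher reliability p puts more weight on compromising/adopting (assumed).
import random

def discrete_choice_shift(p, rng):
    """Relative shift (in units of d) under a three-option discrete choice."""
    stay, compromise, adopt = (1 - p), p * 0.7, p * 0.3  # assumed weights
    r = rng.random() * (stay + compromise + adopt)
    if r < stay:
        return 0.0   # redraw the blue circle
    if r < stay + compromise:
        return 0.4   # compromise point, as in the data and in [14]
    return 1.0       # adopt the red circle outright

rng = random.Random(0)
shifts = [discrete_choice_shift(0.8, rng) for _ in range(10_000)]
mean_shift = sum(shifts) / len(shifts)
```

Under these assumed weights, the high-reliability case produces a mean relative shift in the vicinity of the 0.44 observed in Table 3, illustrating how a discrete mechanism can generate a continuous-looking average.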

The Martins model appears ineffective in predicting the change in circle area. We note in Fig 2B that a confounding variable bounds the observed change in circle area, forcing a minimum value. We posit that the confounding variable is the minimum circle size (a five-pixel radius) relative to the size of the play area, i.e. the box. A linear trend exists in Fig 2B but is cut off by the minimum-circle-size boundary. The outliers are more extreme in Fig 2B than in Fig 2A, and we surmise that the outliers in Fig 2B were participants’ attempts to ‘break’ the game; essentially, participants were testing whether drawing a large circle would net a substantial number of points. When the data are separated based on reliability, it is clear that the Martins model predicts circle-area change more accurately in high-reliability scenarios. In that case, the predicted change in circle area is small enough that participants can draw circles of those sizes, i.e. above the minimum-circle-size boundary.
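The floor imposed by the minimum circle size can be shown with a line of arithmetic; this is our reading of the confound, not the authors' code:

```python
# Illustrative arithmetic for the minimum-circle-size confound: the smallest
# drawable circle has a five-pixel radius, so observed circle areas (and hence
# area changes) are floored by this minimum.
import math

MIN_RADIUS = 5  # pixels, per the game's drawing constraint

def realisable_area(predicted_area):
    """Clamp a predicted circle area to what a participant can actually draw."""
    return max(predicted_area, math.pi * MIN_RADIUS ** 2)

floor = math.pi * MIN_RADIUS ** 2  # roughly 78.5 square pixels
```

Any predicted area below this floor is censored at the boundary, which is the cut visible in Fig 2B.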

There is much debate over whether opinions can be “negatively” influenced, i.e. whether, when the difference between individuals’ opinions is large, the distance between their opinions increases after an interaction. Negative opinion influence is not to be confused with a negative opinion shift, which is an unconditional shift away from an interaction partner’s opinion; negative opinion shifts may occur without negative influence. Some theoretical opinion dynamics models [18–20] rely on negative opinion influence to create polarisation, and others like [21] predict negative opinion shifts resulting from negative opinion influence (but not necessarily resulting polarisation) in a discrete opinion context. In contrast, empirical experiments that have attempted to measure negative opinion influence have so far failed to find it. For example, [22], although finding evidence of negative opinion shifts, found no evidence of negative opinion influence.
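To make the distinction concrete, here is a toy update rule (not the Martins model or any of the cited models) in which opinions attract below a disagreement threshold and repel above it; the second branch is what negative influence would look like in data:

```python
# Toy attraction/repulsion update rule (illustrative only). The convergence
# rate mu and the threshold are arbitrary assumptions.

def update(x, x_partner, mu=0.3, threshold=0.5):
    """Move x towards x_partner if they are close, away if they are far."""
    diff = x_partner - x
    if abs(diff) <= threshold:
        return x + mu * diff   # positive influence: opinions converge
    return x - mu * diff       # negative influence: opinions diverge

close = update(0.4, 0.6)   # small disagreement: shift towards the partner
far = update(0.1, 0.9)     # large disagreement: shift away (negative)
```

A model needs the repulsive branch to produce polarisation endogenously, which is precisely the mechanism the experiments cited above fail to observe.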

The data we collected conform with the results in [22]: we observed negative opinion shifts, but they were localised to when the red and blue circles were overlapping, not when the circles were distant. Most negative shifts resulted from participants attempting to draw onto the blue circle and were within 5 pixels of it. Only when reliability was low and the red and blue circles overlapped did participants appear to shift away from the red circle intentionally. The Martins model only predicts positive opinion shifts and thus does not explain the negative opinion shifting occurring at low reliability. This negative opinion shifting is likely the phenomenon that makes the Martins model a poor predictor of low-reliability interactions. We theorise that participants, when presented with a low-reliability red circle close to the blue circle, believe that the red circle reduces the chance that the dot is in the blue circle and thus move away from the red circle, a consideration absent from the Martins model. We do, however, see similar behaviour in the model developed in [21].

Conclusion

In this paper, we aimed to verify whether the Martins model is an accurate model of opinion exchange. We conclude from the data that the Martins model is accurate only in specific circumstances: we can reject the two null hypotheses when the red and blue circles overlap for medium to high reliability. Furthermore, we identified two phenomena in this experiment: along with the behaviour explained by the Martins model, we observed participants making discrete choices. The discrete-choice behaviour occurred exclusively when the red and blue circles did not significantly overlap. We conjectured that the discrete choice arose from the human need to simplify the problem or from participants not completely trusting the blue circle as their own opinion. Either way, this highlights the multifaceted nature of opinion exchange and illustrates the context-sensitivity of human behaviour. For a model of opinion exchange to sufficiently capture the complexities of interactions, the model would need to navigate the context of an interaction; essentially, the model needs to switch between discrete and continuous opinions when appropriate, creating a complete synthesis of a discrete and a continuous opinion model.

The Martins model predicted the opinion shifts of participants with reasonable precision when considering only the data explained by the model. The R² values for the linear trend lines are low because of the presence of outliers and a significant number of interactions that resulted in negative opinion shifts, but from Fig 2 it is clear that a linear trend exists. We conclude that the Martins model predicts the general behaviour of the participants when there is significant overlap between the red and blue circles. This conclusion is weak, and a more robust experiment with more participants is needed to determine whether the Martins model predicts human behaviour. Due to the simplicity of our experimental design, it should be easy to recreate this experiment at scale.

Supporting information

S1 Fig. The instructions given to every participant.

(TIF)

S2 Fig. An example of the game a participant would have played when exposed to the blue circle.

(TIF)

S3 Fig. An example of the game a participant would have played when exposed to the red circle.

(TIF)

S4 Fig. An example of a completed game for a participant.

(TIF)

S1 File. The experiment’s website source code.

(ZIP)

S2 File. Raw data collected from the experiment.

(CSV)

S1 Appendix. Equations of the Martins model.

(PDF)

Acknowledgments

We would like to acknowledge the contribution from the QUT eResearch team: Ryan Bennett, Mitchell Haring, Yvette Wyborn, Craig Windell and Adam Smith, for their contribution in setting up, running and testing the website involved in this study.

Data Availability

All relevant data are within the paper and its Supporting information files.

Funding Statement

Robyn P. Araujo is the recipient of an Australian Research Council (ARC) (https://www.arc.gov.au/) Future Fellowship (project number FT190100645) funded by the Australian Government. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Sobkowicz P. Modelling Opinion Formation with Physics Tools: Call for Closer Link with Reality. Journal of Artificial Societies and Social Simulation. 2009;12(111). [Google Scholar]
  • 2. Friedkin NE, Proskurnikov AV, Bullo F. Group dynamics on multidimensional object threat appraisals. Social Networks. 2021;65:157–167. doi: 10.1016/j.socnet.2020.12.009 [DOI] [Google Scholar]
  • 3. Pescetelli N, Yeung N. The effects of recursive communication dynamics on belief updating. Proceedings of the Royal Society B: Biological Sciences. 2020;287(1931):20200025. doi: 10.1098/rspb.2020.0025 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Flache A, Mäs M, Feliciani T, Chattoe-Brown E, Deffuant G, Huet S, et al. Models of Social Influence: Towards the Next Frontiers. Journal of Artificial Societies and Social Simulation. 2017;20(4). doi: 10.18564/jasss.3521 [DOI] [Google Scholar]
  • 5. Hegselmann R, Krause U. Opinion dynamics and bounded confidence: Models, analysis and simulation. Journal of Artificial Societies and Social Simulation. 2002;5:1–24. [Google Scholar]
  • 6. Weisbuch G, Deffuant G, Amblard F, Nadal JP. Meet, discuss, and segregate! Complexity. 2002;7(3):55–63. [Google Scholar]
  • 7. Martins ACR. Bayesian updating rules in continuous opinion dynamics models. Journal of Statistical Mechanics: Theory and Experiment. 2009;2009(02):P02017. doi: 10.1088/1742-5468/2009/02/P02017 [DOI] [Google Scholar]
  • 8. Adams J, White G, Araujo R. The Role of Mistrust in the Modelling of Opinion Adoption. Journal of Artificial Societies and Social Simulation. 2021;24(4). doi: 10.18564/jasss.4624 [DOI] [Google Scholar]
  • 9. Lorenz J, Rauhut H, Schweitzer F, Helbing D. How social influence can undermine the wisdom of crowd effect. Proceedings of the National Academy of Sciences. 2011;108(22):9020–9025. doi: 10.1073/pnas.1008636108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Soll JB, Larrick RP. Strategies for revising judgment: How (and how well) people use others’ opinions. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2009;35(3):780–805. [DOI] [PubMed] [Google Scholar]
  • 11. Swol LMV. Extreme members and group polarization. Social Influence. 2009;4(3):185–199. doi: 10.1080/15534510802584368 [DOI] [Google Scholar]
  • 12. Yaniv I. Receiving other people’s advice: Influence and benefit. Organizational Behavior and Human Decision Processes. 2004;93(1):1–13. doi: 10.1016/j.obhdp.2003.08.002 [DOI] [Google Scholar]
  • 13. French JR Jr. A formal theory of social power. Psychological Review. 1956;63(3):181. doi: 10.1037/h0046123 [DOI] [PubMed] [Google Scholar]
  • 14. Moussaïd M, Kämmer JE, Analytis PP, Neth H. Social Influence and the Collective Dynamics of Opinion Formation. PloS one. 2013;8(11):e78433. doi: 10.1371/journal.pone.0078433 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Deffuant G, Amblard F, Weisbuch G, Faure T. How can Extremism Prevail? A Study Based on the Relative Agreement Interaction Model. Journal of artificial societies and social simulation. 2002;5(4). [Google Scholar]
  • 16. Corazzini L, Pavesi F, Petrovich B, Stanca L. Influential listeners: An experiment on persuasion bias in social networks. European Economic Review. 2012;56(6):1276–1288. doi: 10.1016/j.euroecorev.2012.05.005 [DOI] [Google Scholar]
  • 17. Battiston P, Stanca L. Boundedly rational opinion dynamics in social networks: Does indegree matter? Journal of Economic Behavior & Organization. 2015;119:400–421. doi: 10.1016/j.jebo.2015.08.013 [DOI] [Google Scholar]
  • 18. Mason WA, Conrey FR, Smith ER. Situating Social Influence Processes: Dynamic, Multidirectional Flows of Influence Within Social Networks. Personality and Social Psychology Review. 2007;11(3):279–300. doi: 10.1177/1088868307301032 [DOI] [PubMed] [Google Scholar]
  • 19. Kitts JA. Social influence and the emergence of norms amid ties of amity and enmity. Simulation Modelling Practice and Theory. 2006;14(4):407–422. doi: 10.1016/j.simpat.2005.09.006 [DOI] [Google Scholar]
  • 20. Mark NP. Culture and Competition: Homophily and Distancing Explanations for Cultural Niches. American Sociological Review. 2003;68(3):319. doi: 10.2307/1519727 [DOI] [Google Scholar]
  • 21. Martins ACR. Trust in the CODA model: Opinion dynamics and the reliability of other agents. Physics Letters A. 2013;377(37):2333–2339. doi: 10.1016/j.physleta.2013.07.007 [DOI] [Google Scholar]
  • 22. Takács K, Flache A, Mäs M. Discrepancy and Disliking Do Not Induce Negative Opinion Shifts. PLOS ONE. 2016;11(6):e0157948. doi: 10.1371/journal.pone.0157948 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

José Manuel Galán

13 May 2022

PONE-D-22-10361Person-to-person opinion dynamics: an empirical study using an online gamePLOS ONE

Dear Dr. Adams,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Both reviewers find the work meritorious and of interest after a serious and thorough analysis of the work. Notwithstanding some of the objections and possible biases in the design of experiments, the explanations of the other elements pointed out by the reviewers deserve a detailed response before considering the work for potential publication. I look forward to a detailed review of the paper.

Please submit your revised manuscript by Jun 27 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

José Manuel Galán, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Acknowledgments Section of your manuscript: 

"Robyn P. Araujo is the recipient of an Australian Research Council (ARC) Future Fellowship (project number FT190100645) funded by the Australian Government"

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

"Robyn P. Araujo is the recipient of an Australian Research Council (ARC) (https://www.arc.gov.au/) Future Fellowship (project number FT190100645) funded by the Australian Government.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I found the experiment carried out by the authors very interesting to verify opinion models, especially the more abstract approach they have shown in the experiment compared to previous works. I consider this work interesting and innovative enough to be published in this journal. However, below I have proposed some minor changes that I think could improve the presented paper:

- The authors do a very extensive literature review of previous experiments, explaining them in great detail, but I miss much more information about their own. Nowhere is it explained what instructions are given to the participants about how the experiment works, how the scores given to them as a result of a correct guess are calculated or what kind of reward the participants receive in the experiment. I think that all this information should be included in order to better understand the behaviour of the participants, as they can have a great influence on the conclusions. For example, in the discussion section, the authors conclude that “participants are treating the problem of finding the dot as a discret choice, i.e. it must be in either in the red circle or the blue”. This behaviour may be a consequence of the game's instructions not being clear enough.

- The authors explain that the participants were encouraged to play again and that in total 3760 games were played by a total of 257 unique users. It would be very interesting to know the distribution of the total number of games played by each participant, to check if the results could be influenced by someone who has repeated the game too many times. In addition, an analysis could be made of whether there exists a "learning curve" in the participants, i.e. if the score obtained by each participant improves as the number of games increases. This would indicate that the observations are not independent, and therefore would need to be taken into account.

- I suggest the authors to change the name of the variable Δx confusing. This variable represents not only a variation in x, but in the distance between the centres of the two circles. I think another name would be more appropriate.

- The condition (7) presented by the authors does not represent whether the two circles overlap. The correct condition would be Δx < r_blue + r_red. This condition should be corrected or the reason for the choice should be explained.

- I suggest the authors to improve the quality of the images included in the manuscript, as the it is very low. In figure 3d it is not even possible to distinguish correctly the red line and in figure 5 the histograms are not visible because they are partially covered by the legend.

- I found some typos while reading the text (e.g. "radii" just after equation 1 or "Of the data that the Matins model" in the conclusions section). I encourage the authors to read all the text again and correct them.

Reviewer #2: Referee Report – Manuscript PONE-D-22-10361

“Person-to-person opinion dynamics: an empirical study using an online game”

In this work, authors carry out an experiment trying to emulate the Martins model on opinion dynamics. In their experiment, participants must guess the location of a hidden dot in a space. They are given two pieces of information: two circles representing the possible location of the hidden object. Participants first observe a blue circle with 80% accuracy level and then a red circle with either a 20%, 50% or 80% accuracy level. They are finally asked to indicate where they believe the dot is, also by representing a circle. Authors find that the Martins model only explains part of the observed results. In particular, the behaviour when there is a significant overlapping of the blue and red circles. When there is no overlapping, the prediction ability of the model is lower. Authors indicate that the possible explanation of this is that participants may be treating the game as a discrete choice between the red and the blue circles rather than as a continuous choice in all the space.

The paper is novel in its objective of experimentally testing the Martins model, but I have some concerns, especially regarding the experimental design. In what follows, I will present these concerns, along with some recommendations which I hope can help authors improve the quality of their paper.

MAJOR COMMENTS

Introduction:

1. I have missed a summary of the experimental design and a summary of the main results in the introduction.

Previous experiments:

1. I like the way in which you present these previous works but would like to know the relevance of some papers you only mention but do not explain, like references [9], [10] and [12].

2. Furthermore, what is the contribution of your work to these previous papers?

Materials and methods:

1. Where participants incentivized somehow as to elicit truthful behaviour?

2. How did you treat the data of participants who left in the middle of the game? Was this frequent?

The experiment:

1. My main concern with this work comes in this regard. As the game is designed it resembles more a situation of evaluating a perfect signal (blue circle) vs. an imperfect signal (red circle) rather than personal vs. outside information. I know that the blue circle is not perfect, but it gives the highest possible level of accuracy they can have (80%). I personally find it challenging to interpret that the blue circle represents an individual’s initial opinion and that the red circle represents the opinion of another individual. I see it as two signals of different quality whereas opinions have deeper connotations. For instance, if the red circle comes with a green light (80% accuracy), there should be some kind of bias towards the blue circle as it represents one’s opinion. When designing an experiment, it is difficult to find the equilibrium between making something abstract as to minimise certain biases, but at the same time, appliable to the real world. In this case, I find it particularly difficult to transfer this setting to opinion dynamics if the blue circle does not count with a more “personal touch”, let’s say.

2. Be careful with the use of red and blue in the game, as experimental evidence has shown that these colours may carry some political (and mostly unconscious) connotations. You could do a small control with the colours reversed as to discard any colour effect.

3. Be also careful with the order in which participants receive the two pieces of information. There could be an order effect, specially when the red circle comes with a green light, and one of the pieces of information has more relevance because of the order in which this information is shown. I recommend you to check literature about information disclosure in this line. If there is evidence that we usually pay more attention to the first piece of information than to the second one, this could capture the previous idea of the bias towards the blue circle (one’s opinion), that I pointed out before. Otherwise, you could also do a small control where you show the circles in the reversed order and check that your results are robust to the order in which information is disclosed.

4. If I understood correctly, participants could play as many times as possible. If this is the case, I am concerned about how you treat this data. Repeating the game allows for a learning process but if they can repeat it as many times as they wish, you are allowing for different learning degrees. One could leave it after just one try and another one could play 20 times. Are you treating the first’s unique attempt as the 20th attempt of the second participant or are you distinguishing them somehow?

Model and Data Predictions:

1. Some intuition about the data predictions could help the reader follow this section and prepare them for the results.

2. From my point of view, presenting a set of hypotheses to be later tested would also be useful for the reader.

Results:

1. I personally found this section especially challenging to follow. It is the most important part of the paper, and I believe it deserves some special attention. Please reconsider the exposition of your results in this section. Some paragraphs seem repetitive, and other results may go unnoticed. I suggest enumerating your results to make them clearer for the reader.

Discussion and conclusion:

1. These sections clarify the results section. Beyond finding to what extent the Martins model explains these experimental results, I would appreciate it if you could also translate this to the topic at hand: opinion dynamics. What do these results tell us about how we exchange opinions in the real world?

2. Are there any other interpretations of why the Martins model fails to explain behaviour when circles do not overlap? Behavioural biases? Participants not correctly understanding what the best outcome for them was during the experiment?

MINOR COMMENTS

1. Please revise typos in lines 71, 84, 186, 218, 237, 248, 253, 254, 255 and 288.

2. I suggest reading the following paper and related works, in case they are relevant to the experimental literature on opinion dynamics:

- Battiston, P., & Stanca, L. (2015). Boundedly rational opinion dynamics in social networks: Does indegree matter? Journal of Economic Behavior & Organization, 119, 400-421.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Diego Escribano Gómez

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Oct 6;17(10):e0275473. doi: 10.1371/journal.pone.0275473.r002

Author response to Decision Letter 0


11 Jul 2022

Reviewer #1:

- The authors do a very extensive literature review of previous experiments, explaining them in great detail, but I miss much more information about their own. Nowhere is it explained what instructions are given to the participants about how the experiment works, how the scores given to them as a result of a correct guess are calculated or what kind of reward the participants receive in the experiment. I think that all this information should be included in order to better understand the behaviour of the participants, as they can have a great influence on the conclusions. For example, in the discussion section, the authors conclude that “participants are treating the problem of finding the dot as a discrete choice, i.e. it must be in either in the red circle or the blue”. This behaviour may be a consequence of the game's instructions not being clear enough.

Response: We apologise for the unclear instructions and insufficient detail on the game and scoring. We have addressed this by amending the Experiments section, including a subsection discussing the instructions given to the participants in detail. We have also included a subsection on the scoring system, which was designed to engage participants and encourage further play. The reviewer’s concern about the clarity of the instructions is well taken; we would note, however, that the instructions in the actual game were much more explicit than those initially described in the paper, and these instructions are now included explicitly in the revised manuscript.

- The authors explain that the participants were encouraged to play again and that in total 3760 games were played by a total of 257 unique users. It would be very interesting to know the distribution of the total number of games played by each participant, to check if the results could be influenced by someone who has repeated the game too many times. In addition, an analysis could be made of whether there exists a "learning curve" in the participants, i.e. if the score obtained by each participant improves as the number of games increases. This would indicate that the observations are not independent, and therefore would need to be taken into account.

Response: We have now included a new subsection in Results investigating this issue in detail. We would like to thank both reviewers for giving us the opportunity to include this important analysis in the paper. Results show that there is no significant learning over multiple plays.

- I suggest the authors change the name of the variable Δx, which is confusing. This variable represents not only a variation in x but the distance between the centres of the two circles. I think another name would be more appropriate.

Response: We have changed Δx to dx to be clearer and avoid confusion.

- The condition (7) presented by the authors does not represent whether the two circles overlap. The correct condition would be Δx < r_blue + r_red. This condition should be corrected or the reason for the choice should be explained.

Response: We apologise for the lack of clarity in the notation and have re-written the condition as suggested.
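For concreteness, the corrected overlap condition (distance between centres less than the sum of the radii) can be written as a short check. This is our own illustrative sketch, not code from the manuscript, and the variable names are ours:

```python
import math

def circles_overlap(blue_centre, red_centre, r_blue, r_red):
    """Two circles overlap exactly when the distance between their
    centres is strictly less than the sum of their radii."""
    d = math.hypot(blue_centre[0] - red_centre[0],
                   blue_centre[1] - red_centre[1])
    return d < r_blue + r_red
```

Note that tangent circles (distance exactly equal to the sum of radii) are not counted as overlapping under the strict inequality.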

- I suggest the authors improve the quality of the images included in the manuscript, as it is very low. In figure 3d it is not even possible to correctly distinguish the red line, and in figure 5 the histograms are not visible because they are partially covered by the legend.

Response: We have reviewed and reproduced the figures for improved clarity and readability.

- I found some typos while reading the text (e.g. "radii" just after equation 1 or "Of the data that the Matins model" in the conclusions section). I encourage the authors to read all the text again and correct them.

Response: We apologise for the oversight in proofing the manuscript and have thoroughly reviewed the manuscript correcting the typographical errors.

Reviewer #2: Referee Report – Manuscript PONE-D-22-10361

“Person-to-person opinion dynamics: an empirical study using an online game”

MAJOR COMMENTS

Introduction:

1. I have missed a summary of the experimental design and a summary of the main results in the introduction.

Response: We apologise for the omission and have now included a summary of the experimental design and results in the Introduction.

Previous experiments:

1. I like the way in which you present these previous works but would like to know the relevance of some papers you only mention but do not explain, like references [9], [10] and [12].

2. Furthermore, what is the contribution of your work to these previous papers?

Response: We have now expanded the literature review section to contextualise the cited papers and explain their relevance to our paper, as well as the context of our paper with respect to the broader literature.

Materials and methods:

1. Were participants incentivized somehow so as to elicit truthful behaviour?

Response: Participants were not materially incentivised; this was made clear to the participants with a prominent disclaimer on the website. The only incentive is participants’ personal desire to maximise their score in each game. The score was framed in ‘$’, which is part of the game’s backstory designed to engage players and get them personally invested in the game’s outcome.

2. How did you treat the data of participants who left in the middle of the game? Was this frequent?

Response: If a participant left in the middle of a game, we recorded no data; thus we have no record of incomplete games. This is now explicitly stated in the manuscript.

The experiment:

1. My main concern with this work comes in this regard. As the game is designed, it resembles a situation of evaluating a perfect signal (blue circle) vs. an imperfect signal (red circle) rather than personal vs. outside information. I know that the blue circle is not perfect, but it gives the highest possible level of accuracy they can have (80%). I personally find it challenging to interpret the blue circle as representing an individual’s initial opinion and the red circle as the opinion of another individual. I see them as two signals of different quality, whereas opinions have deeper connotations. For instance, if the red circle comes with a green light (80% accuracy), there should be some kind of bias towards the blue circle as it represents one’s opinion. When designing an experiment, it is difficult to strike a balance between making something abstract, so as to minimise certain biases, and at the same time applicable to the real world. In this case, I find it particularly difficult to transfer this setting to opinion dynamics if the blue circle does not come with a more “personal touch”, let’s say.

Response: We appreciate the careful and considered thought the reviewer gave our paper. We should clarify that the score given to the red circle is not a measure of accuracy but a measure of the probability that the circle contains any information, i.e. its reliability as a source of information about the location of the dot. The blue circle is the initial piece of information given to the player and is 100% reliable; hence, the player will likely adopt this as "their" belief about the location of the dot. The game's instructions, which we have now included in the revised manuscript, clearly explained this to the players.

Evaluating the red circle is largely a question of trust, i.e. "do I trust that the red circle is giving me useful information?" Individuals' bias towards the blue circle is measured as h_observed and compared to the value predicted by the modified Martins model, h_expected, which accounts for the reliability of the red circle.

We further agree with the reviewer's comments regarding experimental design challenges, particularly in opinion dynamics. We believe that the backstory given to the players (now included in the new S1 Fig), along with the structure of the game, would engage players and, by design, encourage them to adopt the blue circle as their own opinion, thus leading to a more personal investment in the game.

Player deference to the blue circle as their opinion is borne out by the congruence between player behaviour and the behaviour predicted by the modified Martins model (which assumes that the blue circle is the player's initial opinion). Because the game is an abstract (context-free) exercise, we do not need to take the initial steps seen in other experiments (e.g. [ref]) of measuring individuals' initial or baseline opinions. Instead, we supply this in the form of the blue circle. Asking general knowledge questions, as in Moussaïd 2013, or asking for stances on political issues, relies on measuring those initial opinions/answers, which may introduce external biases. Our abstract approach avoids this issue by directly controlling a participant's initial opinion.

2. Be careful with the use of red and blue in the game, as experimental evidence has shown that these colours may carry some political (and mostly unconscious) connotations. You could do a small control with the colours reversed to rule out any colour effect.

Response: We agree that this presents a potential issue for the manuscript. We believe that the manuscript avoids political bias from the choice of colours because the experiment is abstract enough to avoid any political association. In addition, the participants are sourced online, thereby drawing on an international population, and political parties sharing common colours across different countries may hold very different political viewpoints.

We chose the colours blue and red for accessibility (i.e. using web-safe, high-contrast colours and avoiding issues for dichromats). Political colour bias might contribute to the noise in the data, but we believe that individual colour biases should effectively cancel each other out. Geotagging the IP addresses of the participants to break down the data by country would have violated our ethics approval; thus, it is impossible to determine the location of individuals and any potential political colour bias. We have included additional discussion of these points in our revised discussion section.

3. Be also careful with the order in which participants receive the two pieces of information. There could be an order effect, especially when the red circle comes with a green light, where one of the pieces of information has more relevance because of the order in which it is shown. I recommend checking the literature on information disclosure in this line. If there is evidence that we usually pay more attention to the first piece of information than to the second, this could capture the previous idea of the bias towards the blue circle (one’s opinion) that I pointed out before. Otherwise, you could also do a small control where you show the circles in the reversed order and check that your results are robust to the order in which information is disclosed.

Response: We respectfully emphasize that the blue circle represents the player’s own initial opinion, hence we show the blue circle to the participants first to take advantage of any order bias and cement the blue circle in the participants’ minds as “their” opinion. As we clarified above, the experiment is not designed to measure participants’ interpretation of perfect vs imperfect signals (in which case, randomising the order in which participants are presented with information would have been essential).

4. If I understood correctly, participants could play as many times as they wished. If this is the case, I am concerned about how you treat this data. Repeating the game allows for a learning process, but if they can repeat it as many times as they wish, you are allowing for different degrees of learning. One participant could leave after just one try and another could play 20 times. Are you treating the first participant’s single attempt the same as the second participant’s 20th attempt, or are you distinguishing them somehow?

Response: The reviewer makes an excellent point. We have now added two different analyses to our revised manuscript to address this issue. Specifically, (1) we calculated the average score obtained on every attempt. We found that the average score of the first attempt is neither substantially nor significantly different from the average score of subsequent attempts (see Table 5 in the revised manuscript), suggesting that no learning is occurring and that participants behave the same regardless of the attempt. We therefore consider that treating the first attempt of a participant equally to later attempts is justified. (2) As we now highlight in our revised manuscript, 80% of participants played at most 23 games.
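The kind of per-attempt calculation underlying this learning check can be sketched as follows. This is our own illustration of the idea, not the authors' analysis code; the record layout is an assumption:

```python
from statistics import mean

def mean_score_by_attempt(games):
    """games: iterable of (user_id, attempt_number, score) records.
    Returns the mean score for each attempt number; a roughly flat
    profile across attempts suggests no learning effect."""
    scores = {}
    for _, attempt, score in games:
        scores.setdefault(attempt, []).append(score)
    return {a: mean(s) for a, s in sorted(scores.items())}
```

Comparing the first attempt's mean against later attempts' means (as in the manuscript's Table 5) is then a one-line lookup into the returned dictionary.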

Model and Data Predictions:

1. Some intuition about the data predictions could help the reader follow this section and prepare them for the results.

2. From my point of view, presenting a set of hypotheses to be later tested would also be useful for the reader.

Response: These are excellent suggestions, and we have now updated our manuscript to include a hypothesis at the end of Model and Data Predictions while also elaborating on the intuition of the predictions.

Results:

1. I personally found this section especially challenging to follow. It is the most important part of the paper, and I believe it deserves some special attention. Please reconsider the exposition of your results in this section. Some paragraphs seem repetitive, and other results may go unnoticed. I suggest enumerating your results to make them clearer for the reader.

Response: We appreciate this thoughtful suggestion. We have now carefully revised our Results section and segmented our results into sub-sections. We also added clarity by enumerating our overarching findings in the introduction to our Results section.

Discussion and conclusion:

1. These sections clarify the results section. Beyond finding to what extent the Martins model explains these experimental results, I would appreciate it if you could also translate this to the topic at hand: opinion dynamics. What do these results tell us about how we exchange opinions in the real world?

Response: Polarisation in the Martins model is caused by mistrust, which is encapsulated by the parameter p, i.e. the probability that an individual will share misinformation. However, p does not directly influence an agent's new opinion when interacting, but acts through an intermediate variable p*, the result of applying Bayes' theorem to the model. The variable p* modifies p to reflect how more or less trustworthy an individual is. For instance, if an agent interacts with someone holding a very different opinion, then, according to the Martins model, the agent will believe the other agent less because, through Bayesian inference, the first agent deduces that the other agent is likely to be incorrect. The Martins model thus explains confirmation bias and polarisation as the result of individuals understanding that misinformation exists, and using that knowledge to reject new and different information because, based on their own opinion, the new and different information is more likely to be misinformation.

There are two aspects to our results that illuminate the Martins model. In the first instance, when interacting agents have relatively similar opinions, the Martins model predicts their behaviour well. In the second instance, when the interacting agents have drastically differing opinions, the Martins model is a poor predictor of the outcome of their interaction. The Martins model expects no shift to occur when the circles do not overlap, since that would be clear evidence that the red circle is misinformed. Instead, we see participants compromising or adopting the red opinion, suggesting that when the circles are distinct, participants start thinking in a discrete opinion context. The Martins model considers opinions as continuous; thus the model is insufficient to describe these situations. When there is overlap between the circles, the model is surprisingly accurate in predicting opinion shift, suggesting that the model's explanation of confirmation bias is justified.
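The trust mechanism discussed in this response can be illustrated schematically. The sketch below is our own simplified Bayesian update of this general kind, not the Martins model's exact equations: the agent's belief is taken as a Gaussian, the likelihood of the other agent's statement mixes an informative Gaussian component with a flat misinformation density, and the posterior weight p* of the informative component falls as the two opinions diverge. All parameter names and the 0.5 shift factor are our illustrative choices:

```python
import math

def p_star(x_i, x_j, sigma, p, uniform_density=0.1):
    """Posterior probability that agent j's stated opinion x_j is
    informative, from agent i's point of view.  The likelihood mixes a
    Gaussian centred on x_i (informative, weight 1 - p) with a flat
    misinformation density (weight p)."""
    gauss = (math.exp(-((x_j - x_i) ** 2) / (2 * sigma ** 2))
             / (sigma * math.sqrt(2 * math.pi)))
    informative = (1 - p) * gauss
    return informative / (informative + p * uniform_density)

def updated_opinion(x_i, x_j, sigma, p):
    """Shift towards x_j in proportion to how trustworthy j appears;
    a distant x_j produces almost no shift (confirmation bias)."""
    return x_i + 0.5 * p_star(x_i, x_j, sigma, p) * (x_j - x_i)
```

Under this sketch, a nearby opinion yields p* close to 1 (near-full trust), while a drastically different opinion drives p* toward 0, reproducing the "reject distant information as likely misinformation" behaviour described above.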

2. Are there any other interpretations of why the Martins model fails to explain behaviour when circles do not overlap? Behavioural biases? Participants not correctly understanding what the best outcome for them was during the experiment?

Response: There are multiple possible reasons why the Martins model fails to predict the behaviour, but the simplest explanation is that participants treat the game as a discrete choice problem. We can build on this explanation by speculating that the switch to discrete thinking is motivated by a cognitive bias to simplify the problem. If the participants did not understand the best outcome, we would expect no distinguishable peaks in Figs 4 and 5, yet we do see distinguishable peaks. Participants understood that to maximise their score, they needed to guess better than the blue circle, and they attempted to incorporate the limited information provided by the red circle.

MINOR COMMENTS

1. Please revise typos in lines 71, 84, 186, 218, 237, 248, 253, 254, 255 and 288.

Response: We appreciate the reviewer’s consideration and conducted a thorough proofreading of the manuscript, correcting all typos.

2. I suggest reading the following paper and related works, in case they are relevant to the experimental literature on opinion dynamics:

- Battiston, P., & Stanca, L. (2015). Boundedly rational opinion dynamics in social networks: Does indegree matter? Journal of Economic Behavior & Organization, 119, 400-421.

Response: We thank the reviewer for the suggestion (and complete citation) of the paper by Battiston and Stanca. Although their paper, which focuses on behaviour on social networks, is not directly related to our work, we found many of the authors' insights and comments useful. We now cite this work in our updated literature review.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

José Manuel Galán

19 Sep 2022

Person-to-person opinion dynamics: an empirical study using an online game

PONE-D-22-10361R1

Dear Dr. Adams,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

José Manuel Galán, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Both reviewers consider that most of the issues raised in the original submission have been resolved. Reviewer 1 requests clarification of some minor aspects to make the article easier to understand. Please try to address those suggestions in your final submission, but I consider the paper worthy of publication.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have responded to all the reviewers' comments, and the new version is much clearer. The introduction of the experiment is much more complete, the calculation of the score better explained, and the analysis of the results more detailed. However, I have some questions about the new sections added with respect to the previous version: Scoring and Breakdown of individual participant involvement.

Scoring

1. The authors state that both accuracy and precision are real numbers, so that they can take negative values. I find this a bit counter-intuitive to understand. What would a negative accuracy or precision mean? What are their consequences on the outcome of the game?

2. The authors use values of 0.1 and 70 for the weights associated with accuracy and precision, respectively. Why do you use these particular two values? Is it an arbitrary decision? How would the game's outcome and even the conclusions change if you used another range of values for the analysis?

Breakdown of individual participant involvement

- The analysis presented in Table 5, which concludes there is no "learning" during the rounds, is a bit sparse. These results show that the mean values may be comparable, but they do not describe the distribution of values, quartiles, or outliers. I suggest performing the same analysis but presenting the results for each round in a boxplot so that the information presented is more visual and complete.

Other changes

- I would also suggest the authors change the notation dx to d, since this variable represents the distance in the plane, and not only along the x axis.

- Please revise grammar typos in lines 163 and 164.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Diego Escribano

Reviewer #2: No

**********

Acceptance letter

José Manuel Galán

27 Sep 2022

PONE-D-22-10361R1

Person-to-person opinion dynamics: an empirical study using an online game

Dear Dr. Adams:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. José Manuel Galán

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. The instructions given to every participant.

    (TIF)

    S2 Fig. An example of the game a participant would have played when exposed to the blue circle.

    (TIF)

    S3 Fig. An example of the game a participant would have played when exposed to the red circle.

    (TIF)

    S4 Fig. An example of a completed game for a participant.

    (TIF)

    S1 File. The experiment's website source code.

    (ZIP)

    S2 File. Raw data collected from the experiment.

    (CSV)

    S1 Appendix. Equations of the Martins model.

    (PDF)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All relevant data are within the paper and its Supporting information files.

