Mechanistic interplay between information spreading and opinion polarization

Kleber Andrade Oliveira; Henrique Ferraz de Arruda; Yamir Moreno

doi:10.1093/pnasnexus/pgaf402

2025 Dec 30;5(1):pgaf402.


Editor: Attila Szolnoki

PMCID: PMC12814710 PMID: 41561165

Abstract

We investigate how information-spreading mechanisms affect opinion dynamics and vice versa via an agent-based simulation on adaptive social networks. First, we characterize the impact of reposting on user behavior with limited memory, a feature that introduces novel system states. Then, we build an experiment mimicking information-limiting environments seen on social media platforms and study how the model parameters can determine the configuration of opinions. In this scenario, different posting behaviors may sustain polarization or reverse it. We further show the adaptability of the model by calibrating it to reproduce the statistical organization of information cascades as seen empirically in a microblogging social media platform. Our model combines mechanisms for platform content recommendation, connection rewiring, and limited-attention user behavior, paving the way for a robust understanding of echo chambers as a specialized phenomenon of opinion polarization.

Significance Statement.

Polarization and information diffusion are often studied separately, yet in online environments they are deeply intertwined. We present an agent-based model that incorporates limited memory, reposting, and recommendation algorithms, capturing how real social media feeds shape both cascades of information and opinion dynamics. Small changes in posting or recommendation rules can flip a system from consensus to polarization. Moreover, the parameters that reproduce empirical cascade statistics from the Brexit and vaccine debates also yield polarized opinion states. This mechanistic link provides a framework for understanding echo chambers and suggests practical guides for intervention, advancing both fundamental research and policy-relevant insights into digital democracy.

Introduction

The term echo chamber was popularized by Jamieson and Cappella (1) to describe a bounded media space that amplifies messages while insulating them from rebuttal. With the rise of social media, echo chambers, often conflated with filter bubbles (2), have become central to debates on misinformation, democratic resilience, and collective decision-making (3, 4). These information-limiting environments are thought to exacerbate polarization by restricting exposure to diverse viewpoints (5, 6).

Despite their importance, the mechanisms driving echo chambers remain difficult to pin down. Online behavior arises from a multilayered interplay between individual activity and platform-level algorithms. Features such as feed ranking, recommendation, and follow/unfollow dynamics evolve continuously and are rarely transparent. To understand and eventually mitigate polarization, it is crucial to develop mechanistic models that disentangle user-driven interactions from algorithmic interventions and allow systematic exploration of their effects.

Opinion dynamics models have long provided insights into how interactions shape collective beliefs (7–11). Beyond static interactions, adaptive rewiring shows how homophily can fragment networks into polarized communities (12–15). Recommendation mechanisms can further sharpen polarization, even when rewiring is unbiased (16, 17). Yet, the role of information diffusion itself, including limited attention and reposting, has received comparatively less mechanistic treatment.

Empirical studies show that individuals do not scale their attention with the abundance of available content (18). Instead, reposting, memory limits, and competition among posts produce heavy-tailed popularity distributions and critical-like dynamics (19–21). While such mechanisms are central to the physics of social media (22), their connection to polarization and echo-chamber formation remains underexplored.

Here, we investigate the interplay between information spreading and opinion polarization using an agent-based model on adaptive networks. We extend the framework of de Arruda et al. (16) by equipping agents with limited memory, enabling reposting of past content as in real-world feeds. This extension lets us track information cascades while studying how posting, recommendation, and rewiring interact with opinion dynamics.

We address three questions: (i) How does reposting under limited memory alter the states of consensus and polarization? (ii) Under what conditions do information-limiting environments sustain polarization or allow depolarization? (iii) Can the model reproduce empirical cascade patterns observed in polarized debates, and what does this imply about the mechanisms behind polarization? Through extensive simulations and calibration against datasets on Brexit and vaccine debates (23), we show that the same mechanisms that generate realistic cascade distributions also reproduce polarized opinion states. This provides a mechanistic link between diffusion and polarization.

The agent-based model

Our model extends the one introduced by de Arruda et al. (16). That model represents social media platform interactions and treats opinion as a continuous variable, bounded between $-1$ and $1$, held by each user. Social media users are represented as nodes in an adaptive directed network. A user points to another if they receive content from them (followership relation); that is, content spreads in the direction opposite to the edges.

Each iteration of the model follows successive steps, which act as filters that determine whether a randomly activated user goes to the next step.

In this work, we extend the model by allowing users to post previously existing content. Each piece of content has a fixed opinion value θ, always drawn uniformly at random upon its creation. However, here each user is equipped with a memory list of size α, filled as they receive content from neighbors (i.e. a social media feed). Each user’s memory can be seen as an ordered list of θ values from previous posts.

Memory lists are updated so that the most recently received content is prioritized. We index the memory list as $θ_{c}$, with $c \in \{1, \dots, α\}$, and the lower the index, the sooner the content will be posted. New pieces of content enter at the beginning of the list ($θ_{1}$) as users receive them from their neighbors. When this happens, existing pieces of content move one position down (i.e. $θ_{c + 1} \leftarrow θ_{c}$).

When the list is at full capacity (there are α pieces of content) and new content is received, the piece at the bottom ( $θ_{α}$ ) is removed, and all other pieces move one position lower so that the new one is placed at the top.

When reposting, users choose the first piece of content in their memory lists ( $θ_{1}$ ). Once the content is posted, it moves from the beginning to the end of the list, so it will be picked last. Hence, the opinion of the content which was just posted becomes $θ_{α}$ in the memory list, and each other piece moves to a higher order (i.e. $θ_{c} \leftarrow θ_{c + 1}$ ).
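The memory-list rules above can be sketched as a small data structure. This is a minimal illustration rather than the authors' implementation: the class name `MemoryList` and its method names are hypothetical, and index 0 of the underlying `deque` stands for $θ_{1}$.

```python
from collections import deque


class MemoryList:
    """Bounded feed of content opinions; position 0 plays the role of theta_1."""

    def __init__(self, alpha):
        # A deque with maxlen=alpha discards from the opposite end when full.
        self.items = deque(maxlen=alpha)

    def receive(self, theta):
        # New content enters at the top; if the list is full,
        # the bottom item (theta_alpha) is dropped automatically.
        self.items.appendleft(theta)

    def repost(self):
        # Pick the top item and move it to the bottom so it is picked last.
        if not self.items:
            return None  # assumption: an empty memory yields nothing to repost
        theta = self.items.popleft()
        self.items.append(theta)
        return theta


feed = MemoryList(alpha=3)
for theta in (0.1, -0.4, 0.7, 0.9):
    feed.receive(theta)
# The feed now holds 0.9, 0.7, -0.4 (top to bottom); 0.1 was dropped.
reposted = feed.repost()  # returns 0.9 and moves it to the bottom
```

Using a bounded `deque` keeps both the insertion at the top and the removal at the bottom constant-time, which matters when many users receive content in each iteration.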

We further introduce a new parameter called the innovation probability μ, which is the probability that the posting step of an iteration injects new content into the system instead of reposting from memory. In particular, when $μ = 1$, the base model from de Arruda et al. (16) is recovered regardless of the memory lists.

Each iteration of our model is described by the following steps:

Activating: a user i, chosen uniformly at random, becomes active.
Posting: the active user i either creates new content with probability μ, or chooses the content at the top of their memory list ($θ = θ_{1}$) with probability $1 - μ$. The content is then posted with a probability given by a filter function (Eq. 1) of their opinion $b_{i}$ and the content opinion θ.
Receiving: Each follower j of node i receives the content according to another probabilistic filter function (Eq. 2) of $b_{j}$ , $b_{i}$ and a parameter ϕ. When they do, the content is added to their memory list from the top. The control of this filter through ϕ is a proxy for a social media recommendation algorithm.
Realigning: Each opinion $b_{j}$ of followers who successfully received the content increases or decreases by a constant Δ. The opinions are repelled away from the post with probability $| θ - b_{j} | / 2$ and attracted otherwise.
Rewiring: Each follower j may stop following i and start following another user at random through a probability proportional to the difference between $b_{i}$ and $b_{j}$ .
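As an illustration of the realigning step listed above, the following sketch updates one follower's opinion. It rests on assumptions not stated in the text: "attracted" is read as moving $b_{j}$ by Δ toward θ and "repelled" as moving it by Δ away from θ, the result is clipped to $[-1, 1]$, and when $θ = b_{j}$ the step is arbitrarily taken upward. The function name `realign` is hypothetical.

```python
import random


def realign(b_j, theta, delta, rng=random):
    """Shift opinion b_j by delta toward or away from content opinion theta."""
    # Repulsion probability from the model description: |theta - b_j| / 2.
    p_repel = abs(theta - b_j) / 2.0
    # Direction from b_j to theta (assumption: +1 when they coincide).
    toward = 1.0 if theta >= b_j else -1.0
    step = -toward if rng.random() < p_repel else toward
    # Assumption: opinions are clipped so they stay within [-1, 1].
    return max(-1.0, min(1.0, b_j + step * delta))
```

With this reading, content at the opposite extreme ($|θ - b_{j}| = 2$) always repels, while content that matches the follower's opinion never does.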

These steps are visually summarized in Fig. 1. In our model extension, we implement rules to define user behavior, which were previously studied in Refs. (16, 24). In particular, we set the probabilistic rule acting as the posting filter as

$$P_{p}(θ, b_{i}) = \cos^{2}\left(\frac{π}{2} |θ - b_{i}|\right), \tag{1}$$

where θ is the opinion of the content (whether created or picked from the memory list), and $b_{i}$ is the opinion of the active node i. This is referred to in Ref. (24) as conflicting posting, which is the same as $P_{t}^{pol}$ from Eq. 2 in Ref. (16).

Correspondingly, the receiving filter is given by

$$P_{r}(b_{i}, b_{j}, ϕ) = \cos^{2}\left(\frac{π}{2} |b_{i} - b_{j}| + ϕ\right). \tag{2}$$

This filter was previously studied as the function $P_{d}^{I}$ from Eq. 5 in Ref. (16). We refer to the parameter ϕ as the recommendation control. It determines where the cosine-squared function begins, which enables us to adjust the initial behavior of the recommendation algorithm. That is, it may let information propagate between pairs of users who agree with each other or the opposite. Notice the receiving step does not consider the content opinion θ.
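The two filters and the content choice of the posting step can be written down directly. The sketch below is illustrative only: the function names are hypothetical, new content is assumed to be drawn uniformly from $[-1, 1]$ (the text states only that it is uniform), and a user with an empty memory is assumed to create new content.

```python
import math
import random


def posting_probability(theta, b_i):
    """Eq. 1: cos^2((pi / 2) * |theta - b_i|)."""
    return math.cos(0.5 * math.pi * abs(theta - b_i)) ** 2


def receiving_probability(b_i, b_j, phi):
    """Eq. 2: cos^2((pi / 2) * |b_i - b_j| + phi)."""
    return math.cos(0.5 * math.pi * abs(b_i - b_j) + phi) ** 2


def choose_content(memory_top, mu, rng=random):
    """Posting step: new content with probability mu, else the top of memory."""
    # Assumption: an empty memory (memory_top is None) forces new content.
    if memory_top is None or rng.random() < mu:
        # Assumption: theta is uniform on [-1, 1], the opinion range.
        return rng.uniform(-1.0, 1.0)
    return memory_top
```

For two users with identical opinions, Eq. 2 gives a receiving probability of 1 at $ϕ = 0$ and approximately 0 at $ϕ = π / 2$, which is how ϕ shifts the recommendation from favoring agreement to blocking it.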

Opinion polarization measures

The main system state we are interested in measuring is the distribution of user opinions. In particular, we characterize a state as polarized when the opinion distribution is bimodal.

To measure the bimodality of the opinion distributions, we adopt the bimodality coefficient used by de Arruda et al. (16), which ranges from 0 to 1. The bimodality coefficient $BC(b)$ (25) is defined as

$$BC(b) = \frac{g^{2} + 1}{k + \frac{3 (n - 1)^{2}}{(n - 2)(n - 3)}}, \tag{3}$$

where g is the sample skewness (26), k is the excess kurtosis (26) of the opinion distribution b, and n is the sample size (here, the number of user opinions). The distribution b is typically considered bimodal when $BC(b) > 5 / 9$, as empirically shown in (25).
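A minimal sketch of Eq. 3 follows. It assumes the bias-corrected estimators of skewness and excess kurtosis, a common choice alongside the finite-sample term in the denominator; the text itself only says "sample skewness" and "excess kurtosis", so this choice and the function name `bimodality_coefficient` are assumptions.

```python
import math


def bimodality_coefficient(values):
    """Bimodality coefficient of Eq. 3 for a list of opinions."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    g1 = m3 / m2 ** 1.5        # moment-based skewness
    g2 = m4 / m2 ** 2 - 3.0    # moment-based excess kurtosis
    # Assumption: bias-corrected sample estimators for g and k.
    g = math.sqrt(n * (n - 1)) / (n - 2) * g1
    k = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return (g ** 2 + 1.0) / (k + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Two equally sized camps at the extremes: BC is close to 1, above 5/9.
polarized = [-1.0] * 500 + [1.0] * 500
is_bimodal = bimodality_coefficient(polarized) > 5 / 9
```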

Another way to measure how the opinion distribution changes towards polarization is via the moment ratio diagram, which is composed of bounded versions of the third- and fourth-moment ratios of the distribution called L-moments (27). That is, given the opinion distribution b and its kth order statistic $b_{(k)}$ , its sample rth L-moment $λ_{r}$ is defined by

$$λ_{r} = r^{-1} \binom{n}{r}^{-1} \sum_{b_{(1)} < \dots < b_{(j)} < \dots < b_{(r)}} (-1)^{r - j} \binom{r - 1}{j} b_{(j)}. \tag{4}$$

The moment ratio diagram is composed of the L-skewness $τ_{3} := λ_{3} / λ_{2}$, which is bounded between $-1$ and $1$, and the L-kurtosis $τ_{4} := λ_{4} / λ_{2}$, which is bounded between $-1/4$ and $1$.

We choose these over the conventional moments (which are unscaled) to compare different stages in the evolution of the opinion distribution under the same scale. However, the L-moments share many of the properties of their conventional counterparts. For example, a distribution with high L-skewness would be interpreted as being skewed. We work out a small example with synthetic distributions to show how these measures behave in the Supporting Information S1.

Results

Innovation probability introduces new states

As the first step to characterize our model, we run an experiment on a known parametrization in de Arruda’s model to understand the impact of the innovation probability μ. To this end, we use an Erdős–Rényi network of size $N = 10^{3}$ and mean in-degree $z = 10.21$ , and pick parameters known to lead the system to a consensus scenario when the original model is recovered ( $μ = 1$ ).

However, running the simulation with the same parameters but a different innovation probability ( $μ = 0.1$ ) yields a polarized system. We compare the two configurations for the same number of iterations in Fig. 2. In Fig. 2a, the opinion distributions are plotted against each other. We also show how each network structure is wired in each case, with nodes colored according to their opinions. Figure 2b shows the case for $μ = 1$ , which is statistically the same as the initial network, and opinions are homogeneously spread around 0. Figure 2c reveals that, for the polarized state at $μ = 0.1$ , the network becomes separated into two densely connected groups of largely opposing opinions.

Fig. 2. New system states introduced with the parameter μ: the same parameterization with a different innovation probability μ can lead either to consensus in a homogeneous topology or to two polarized communities. In a), the opinion distributions of the networks shown in the other two panels, where hexagon markers represent the network in (b) and triangle markers the one in (c). In b), the innovation probability is $μ = 1$. In c), $μ = 0.1$.

In order to characterize the impact of the innovation probability μ in the opinion distribution, we run a hundred simulations on the same network for $μ = 0$ (no innovation) and $μ = 1$ (de Arruda’s original model). We then use each of these simulations to produce a trajectory of opinion distributions with 100 points. These trajectories are then compared via the moment ratio diagram of Fig. 3, which uses L-moments to describe these distributions. The L-skewness and L-kurtosis are bounded counterparts to the sample skewness and excess kurtosis. Skewness measures the asymmetry of a distribution, indicating whether the distribution of opinions is skewed. Kurtosis indicates whether the data has heavier or lighter tails compared to a normal distribution, with higher kurtosis indicating more extreme values.

We show in Fig. 3 how the opinion distributions diverge, even though they share the same starting point for both $μ = 0$ and $μ = 1$. This starting point was omitted to improve visibility, but it is found around $(τ_{3}, τ_{4}) = (0.16, 0.16)$, corresponding to an opinion distribution which is approximately uniform (considering the finite-sized network) between $-1$ and $1$.

The final point of the $μ = 1$ trajectories is around (0.65,0.3), while that of the $μ = 0$ trajectories is nearly the extreme coordinates (1,1). The extreme coordinates correspond to distributions accumulated into a few specific values, as $μ = 0$ prevents the creation of posts with different opinions θ once the system is initialized. This is the same as the predominance of very few values of θ across the system (see Supporting Information S2).

Simulating an information-limiting environment

Next, we implement an experiment to investigate opinion dynamics parameterized to hinder the spread of information. This is done by running simulations on a network with two communities via a stochastic block model, where users in one community are initialized with opinion $1$, and those in the other are initialized with opinion $-1$. Each community has 500 users with a mean in-degree and out-degree of $z = 8$, and only 1% of edges bridge the two communities. Users are forced to remain in their community as we disable the possibility of rewiring. Furthermore, we specify an alternate posting filter (Eq. 5) and compare simulation outcomes between this filter and the original one (Eq. 1). See Materials and methods for more details.

The spread of information is measured by looking at the maximum size reached by cascades, which is the same as the number of users who share the same piece of content. In Fig. 4a and b, we show how cascades can grow according to the relationship between the innovation probability μ and the recommendation control ϕ, each for a specific posting filter (Eq. 1 for (a), Eq. 5 for (b)). For each pair of μ and ϕ, we observe the maximum cascade size in each of the 500 simulations and average them to obtain the solid line, with the shaded band spanning one SD above and below the average.

Fig. 4. Effects of model calibration on information spreading in a synthetic information-limiting environment. In a and b), for two different posting filters, the maximum cascade size produced for a range of recommendation control ϕ and five values of innovation probability μ. For each pair of μ and ϕ, we run 500 simulations and obtain the maximum cascade size in each of them; the solid line is the average of the 500 maximum cascade sizes, while the shaded band spans the average plus and minus the corresponding SD. In c and d), again for each respective posting filter, we compare the maximum cascade size with content opinion θ for three values of recommendation control ϕ at a fixed innovation probability $μ = 0.2$. Since each cascade is associated with a content opinion θ, which is continuous, we group cascades into a thousand bins between $-1$ and $1$. From each bin, we extract the maximum cascade size; the resulting values are represented with lines smoothed via the Savitzky–Golay filter (28).

As expected, the lower the innovation probability μ is, the greater the sizes cascades can reach, since fewer posts are competing with each other. Both posting filters have qualitatively similar behavior, with the conflicting posting showing a slightly higher SD for $μ = 0$. Our experiment setup is shown to limit the spread of information around $ϕ = π / 2$, where the receiving filter (Eq. 2) hinders communication within and between communities. Next, we measure how contents spread in this scenario regarding their opinion θ. By fixing $μ = 0.2$ and three values of ϕ, Fig. 4c and d shows that the least permissive recommendation control regime ($ϕ = π / 2$) favors extreme content opinion, while the most permissive ($ϕ = 0$) punishes it. However, the middle regime ($ϕ = 3 π / 8$) displays different behaviors for each posting filter. In Fig. 4c, it noticeably hinders the growth of moderate content (i.e. θ around 0). In Fig. 4d, this effect is less pronounced, meaning the dynamics still permit the spread of content from the entire θ range.

Then, we demonstrate in Fig. 5 how the increase of content variety through the parameter μ affects the final opinion distribution for users locked into the two communities starting with extreme opinions. To do this, we use the bimodality coefficient BC. In Fig. 5a, the conflicting posting filter (Eq. 1) blocks moderate posts (by not allowing their cascades to grow, as seen in Fig. 4c). However, when the aligned posting filter (Eq. 5) is employed (Fig. 5b), the system is depolarized as the variety of content increases with μ. For $ϕ = π / 2$, cascade growth is hindered, and therefore the system remains polarized throughout the range of μ for both filters, well above the line indicating the BC threshold at 5/9. But when the aligned posting filter is used, for $ϕ = 3 π / 8$ and $ϕ = 0$, we see that even though the BC values of the final opinion distribution show a wide dispersion at small μ, they narrow into unimodal opinion distributions once μ reaches 0.3 and remain under the dashed threshold 5/9 for most of the remaining μ range.

Model calibration against data

We implement a calibration task to see how our model conforms to real-world scenarios. Given the availability of network structure, user opinions, and cascade sizes in the datasets, we opt to focus on the latter as the target of our model calibration. That is, we consider the empirical cascade size distribution to be aggregated over time, while the network structure and user opinions are left as degrees of freedom in the time evolution of our model (even though all simulations start from the same network structure).

Thus, we formulate the model calibration as an optimization task to reproduce the closest possible distribution of cascade sizes over our parameter space. In particular, we search for the parameters that control the behavior of the dynamics: the innovation probability μ, the recommendation control ϕ, the opinion variation Δ, and the number of iterations $n_{iter}$ . The precise formulation of the optimization task is given in Materials and methods.

In Fig. 6, we show the results for simulations parameterized according to the lowest values of KS statistic observed, which are at $α = 27$ for the Brexit dataset and $α = 29$ for the VaxNoVax dataset. For values of $α < 30$ , there is no significant difference in the KS results, indicating that α does not play a major role in the model’s dynamic outcomes. More details regarding the optimization are shown in Supporting Information S4.

For both datasets, the optimized models represent most of the cascades found in the real data. Specifically, the values of the KS statistics are on average $(3.617 \pm 0.595) \times 10^{-3}$ for Brexit and $(1.385 \pm 0.146) \times 10^{-2}$ for VaxNoVax. Note that a common threshold below which the KS statistic is considered a good fit is 0.05, and the values obtained here are well below it. We acknowledge a limitation in the tails of the distributions, which are not well represented. However, since the plot axes follow logarithmic scales, the discrepancy between data and simulation appears much more pronounced at the extreme events (see Fig. 6).

To further understand the opinion dynamics and the effect of the cascades on the opinions, we calculate the bimodality coefficient of the resulting opinions $BC (b)$ . On average, $BC (b)$ is $(0.892 \pm 0.007)$ for Brexit and $(0.623 \pm 0.010)$ for VaxNoVax. Both values indicate that the opinion distributions are bimodal, and thus, the opinions are polarized. Interestingly, the parameters that generate cascades compatible with the real data also polarize opinions in the synthetic system. Furthermore, the bimodality of VaxNoVax is consistent with the results found in real data in our previous study (16), reported to be between $0.60$ and $0.67$ .

The methodology used considers that the best result is not necessarily when the dynamics converge to a fixed result. However, for the best results obtained here, we found that $BC$ does not change significantly even when the simulation runs for many more iterations.

Discussion and conclusion

We introduced an agent-based model that captures the two-way coupling between information diffusion and opinion polarization on adaptive social networks. By incorporating limited memory and reposting behavior, the model reflects key features of social media feeds, enabling us to study how cascades of information interact with the dynamics of opinion formation.

Our findings reveal several insights. First, the innovation probability (i.e. the likelihood of introducing new content) acts as a critical switch between consensus and polarization. Low innovation reduces content diversity, allowing entrenched narratives to dominate and splitting the network into polarized camps. Second, in information-limiting environments, polarization can be either sustained or mitigated depending on the posting behavior, even when users cannot rewire their social ties. Third, by calibrating the model to empirical cascade data from Brexit and vaccine debates, we showed that the parameter regimes that best reproduce real cascade statistics also lead to polarized opinion states.

Conceptually, our work offers a mechanistic explanation of echo chambers as emergent phenomena arising from the combined effects of user memory, reposting, and recommendation. Methodologically, it bridges theoretical opinion dynamics with empirical observations, showing that calibrated agent-based models can reproduce not only qualitative patterns but also statistical properties of real data. Practically, our results suggest that interventions targeting platform parameters, such as increasing content diversity or modifying reposting incentives, may help reduce polarization.

While our study advances understanding of polarized online environments, open challenges remain. Extreme events in cascade growth, not fully captured here, call for incorporating bursty dynamics and heterogeneity in user activity (21, 29, 30). Additionally, richer datasets combining structural, temporal, and opinion measures would enable more comprehensive calibration and validation. One may also want to understand how effective interventions can be designed from the insights of our model, which would involve modeling incentives via evolutionary game theory (31, 32). Furthermore, the study of agent-based models powered by large language models (LLMs) is gaining traction in the field of social simulations (33).

In conclusion, polarization and information diffusion cannot be studied in isolation. By tracking how cascades unfold alongside opinion shifts, our model provides a framework for investigating the mechanisms that sustain or dissolve echo chambers and offers a foundation for designing interventions to foster healthier online discourse.

Materials and methods

Datasets

In our study, we consider two datasets published by Minici et al. (23). They were collected from the social platform Twitter under a background of strongly polarized debates. These datasets contain measurements of network structure, cascades of retweets, and opinions of either users or posts. All of these measures are represented in our model, which makes the data good candidates for empirical studies.

The first dataset, named “Brexit,” was measured in the context of the Brexit referendum in the United Kingdom between May and July of 2016. It features a network with 7,589 nodes and 532,460 edges. The number of cascades is 19,963, with minimum and maximum sizes of 2 and 256, respectively. The complementary cumulative distribution function (CCDF) of the cascade sizes is shown in Fig. 6a.

The second dataset, dubbed “VaxNoVax,” was collected during vaccine debates in Italy in 2018. Its associated network has 14,315 nodes and 1,714,180 edges. It contains 43,923 cascades ranging in size from 2 to 1,468 (see Fig. 6b). We also provide the degree distribution of both networks in Supporting Information S3.

Parameter characterization

To produce Figs. 2 and 3, we set up the simulation on an Erdős–Rényi network of size $N = 10^{3}$, with recommendation control $ϕ = π / 2$, opinion variation $Δ = 0.1$ and feed size $α = 1$. Opinions are initialized uniformly at random between $-1$ and $1$, and we execute $10^{5}$ iterations for Fig. 2, and 100 steps of $10 N$ iterations for each value of μ in Fig. 3.

Information-limiting environment experiment

Here, we describe how our experiment in an information-limiting environment is designed and set up. First, we produce a synthetic network with two communities separated by very few edges. We employ the Poisson degree-corrected stochastic block model as implemented in Ref. (34) to generate a network with size $N = 10^{3}$, with both in-degree and out-degree distributions from Poisson with an average $z = 8$ and two modules. Only 1% of the edges connect the two modules.

We want to design a polarized scenario where users are locked into two distinct communities, both separated by opinion and network structure. Hence, this experiment is executed with the rewiring rule disabled, and users have opinions initialized as $1$ in one module and $-1$ in the other. That is, the system always starts fully polarized in this experiment.
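For readers who want to reproduce the flavor of this setup without the generator of Ref. (34), a simplified stand-in can be sketched in plain Python. It is not the degree-corrected stochastic block model used in the paper: it places $N z$ directed edges at random, sending each across modules with probability 0.01, which gives in-degrees and out-degrees that are approximately Poisson with mean z. The function name `two_block_network` and its parameters are hypothetical.

```python
import random


def two_block_network(n=1000, z=8, bridge_fraction=0.01, seed=0):
    """Directed two-module random network with fully polarized opinions."""
    rng = random.Random(seed)
    half = n // 2
    # Nodes 0..half-1 form module 0; nodes half..n-1 form module 1.
    block = [0] * half + [1] * (n - half)
    edges = set()  # (follower, followed): the follower receives content
    while len(edges) < n * z:
        src = rng.randrange(n)
        crosses = rng.random() < bridge_fraction
        target_block = 1 - block[src] if crosses else block[src]
        lo, hi = (0, half) if target_block == 0 else (half, n)
        dst = rng.randrange(lo, hi)
        if dst != src:  # no self-loops; the set removes duplicate edges
            edges.add((src, dst))
    # Opinions start at +1 in one module and -1 in the other.
    opinions = [1.0 if blk == 0 else -1.0 for blk in block]
    return edges, block, opinions
```

Because rewiring is disabled in this experiment, the returned edge set stays fixed for the whole simulation.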

We also highlight an important distinction to the original model setup. Instead of the posting filter from Eq. 1, we opt to use the following:

$$P_{p}(θ, b_{i}) = \begin{cases} \cos^{2}\left(\frac{π}{2} |θ - b_{i}|\right), & \text{if } |θ - b_{i}| \leq 1, \\ 0, & \text{otherwise}. \end{cases} \tag{5}$$

This change means that, in practice, a user i will never post content of opinion θ when the difference $|θ - b_{i}|$ is greater than one. Since each user is initialized with an opinion of either $1$ or $-1$, they will not post any content whose θ has a different sign from their opinion unless it has changed since the beginning of the simulation.
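The difference between the two posting filters is easiest to see side by side. The sketch below implements Eq. 5 directly; the function name `aligned_posting_probability` is hypothetical.

```python
import math


def aligned_posting_probability(theta, b_i):
    """Eq. 5: Eq. 1 truncated to zero when |theta - b_i| exceeds 1."""
    gap = abs(theta - b_i)
    if gap > 1.0:
        return 0.0
    return math.cos(0.5 * math.pi * gap) ** 2
```

For example, a user with $b_{i} = 1$ facing content with $θ = -0.5$ has $|θ - b_{i}| = 1.5$: Eq. 1 gives $\cos^{2}(3 π / 4) = 0.5$, whereas Eq. 5 gives 0. For $|θ - b_{i}| \leq 1$ the two filters coincide.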

Then, we set the opinion variation $Δ = 0.1$ and feed size $α = 1$ and proceed to understand how these features impact the growth of cascades of content posting by systematically testing how parameters react to each other, especially the recommendation control ϕ and innovation probability μ.

To produce Fig. 4a and b, we fix each value of μ and a range of 41 values of ϕ between 0 and π. For each pair of μ and ϕ, we run 500 simulations of $10^{6}$ iterations each and record the maximum cascade size resulting from each. The solid line represents the average of the 500 maximum cascade sizes, and the shaded band spans the average plus and minus the corresponding SD.

We obtain Fig. 4c and d by fixing $μ = 0.2$ , three values of ϕ and running 100 simulations for $10^{6}$ iterations each. Each value of ϕ then generates a large number of cascades: 9,372,911 cascades for $ϕ = 0$ , 8,901,320 cascades for $ϕ = 3 π / 8$ and 5,090,826 cascades for $ϕ = π / 2$ . These are binned through a thousand values of content opinion θ, which we then use to extract a maximum cascade size. Finally, the maximum cascade sizes for each bin of θ are represented together through smoothed lines via the Savitzky–Golay filter (28).
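The binning step described above can be sketched as follows. The input format (pairs of content opinion and cascade size) and the function name `max_cascade_per_bin` are assumptions made for illustration; empty bins are reported as 0, and the Savitzky–Golay smoothing is left out.

```python
def max_cascade_per_bin(cascades, n_bins=1000):
    """Largest cascade size in each of n_bins equal-width bins of theta.

    cascades: iterable of (theta, size) pairs with theta in [-1, 1].
    """
    max_size = [0] * n_bins  # assumption: empty bins are reported as 0
    for theta, size in cascades:
        # Map theta from [-1, 1] to a bin index; theta = 1 goes to the last bin.
        idx = min(int((theta + 1.0) * n_bins / 2.0), n_bins - 1)
        if size > max_size[idx]:
            max_size[idx] = size
    return max_size


# Hypothetical cascades as (theta, size) pairs, grouped into four bins.
profile = max_cascade_per_bin(
    [(-1.0, 3), (-0.75, 5), (0.0, 7), (0.2, 4), (1.0, 2)], n_bins=4
)
# profile == [5, 0, 7, 2]
```

The resulting profile can then be smoothed; `scipy.signal.savgol_filter` is one available implementation of the Savitzky–Golay filter, although the text does not say which implementation or window settings were used.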

Finally, Fig. 5 considers a range of 41 values of μ, from 0.1 to 0.9, and the three previously tested values of ϕ (0, $3 π / 8$ , and $π / 2$ ). At each point given by a combination of parameters, we run 1,000 simulations.

Model calibration task

Our model is calibrated via a search in a multidimensional space made of four parameters, which are μ, ϕ, Δ, and $n_{iter}$ . In particular, we minimize the Kolmogorov–Smirnov (KS) distance via a single-objective genetic algorithm from the implementation in Ref. (35) by considering the feed size α as a fixed parameter when the heuristic is initialized. The values of α are selected considering the KS statistic obtained for a range up to $α = 45$ , found in the Supporting Information S4.

Since the datasets are a small sample (36) of the activity in subsets of the Twitter platform retrieved by the first version of the API, it is a challenging task to accurately estimate the parameters a priori due to their dependence on time. We cannot know how the volume of activity represented in the sample maps to the actual one in the system at the time. In essence, it is not trivial to match one simulation iteration to a time scale in the real world.

As such, two of our parameters encapsulate the time frame we are trying to match: the number of iterations $n_{iter}$ , but also the opinion variation Δ as it modulates how intense changes in user opinion are. Intuitively, a high value of Δ means fewer iterations are needed for the system to reach stable configurations. Once there is a time window set by $n_{iter}$ and Δ, the innovation probability μ is fundamental to fix the maximum cascade size.

Finally, the recommendation control ϕ, mentioned back in section “The agent-based model” as related to how users receive posts of different opinions, can drive the system towards various states of opinion polarization (as observed in previous works (16, 24)).

Let d be the empirical distribution of cascade sizes from data and s the sample distribution of cascade sizes obtained from the simulation. As each dataset has a fixed distribution of cascade sizes, let the empirical cumulative distribution function of d be given by $F_{d} (x)$ . As for the sample distribution s, it is parameterized with the innovation probability μ, the recommendation control ϕ, the opinion variation Δ, and the number of iterations $n_{iter}$ .

We are then interested in finding the simulation parameters that minimize the Kolmogorov–Smirnov (KS) statistic (37) between the two samples. That is, the model calibration task is

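$$
\begin{aligned}
\underset{μ,\, ϕ,\, Δ,\, n_{iter}}{\text{minimize}} \quad & \sup_{x} \left| F_{d} (x) - F_{s} (x, μ, ϕ, Δ, α, n_{iter}) \right| && \text{(6a)} \\
\text{subject to} \quad & 0.01 \leq μ \leq 0.99 && \text{(6b)} \\
& 0 \leq ϕ \leq π && \text{(6c)} \\
& 0 \leq Δ \leq 1 && \text{(6d)} \\
& 1.5 \times 10^{5} \leq n_{iter} \leq 6.5 \times 10^{5} && \text{(6e)}
\end{aligned}
$$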

where $F_{s} (x, μ, ϕ, Δ, α, n_{iter})$ is the empirical cumulative distribution function of cascade sizes obtained from simulations run with those parameters, with the feed size α held fixed.
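As an illustration of the objective in Eq. 6a, the following is a minimal sketch of the two-sample KS statistic between two lists of cascade sizes, written in plain Python. The function names `ecdf` and `ks_distance` and the toy cascade sizes are assumptions made for this example; they are not taken from the DOCES library or from the datasets.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function x -> P(S <= x)."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many observations are <= x
        return bisect_right(ordered, x) / n

    return F


def ks_distance(data_sizes, sim_sizes):
    """Two-sample KS statistic: sup_x |F_d(x) - F_s(x)|."""
    F_d = ecdf(data_sizes)
    F_s = ecdf(sim_sizes)
    # Both ECDFs are right-continuous step functions that only jump at
    # observed values, so the supremum is reached at a pooled sample point.
    support = set(data_sizes) | set(sim_sizes)
    return max(abs(F_d(x) - F_s(x)) for x in support)


# Toy cascade sizes, made up purely for illustration
empirical = [1, 1, 1, 2, 2, 3, 5, 8]
simulated = [1, 1, 2, 2, 2, 4, 4, 9]
print(ks_distance(empirical, simulated))  # 0.125
```

In a calibration loop, `sim_sizes` would be the cascade sizes produced by one simulation run (or a pooled set of runs) for a candidate parameter set, and the returned value would be the quantity to minimize.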

After fixing a value of α, which we obtain by testing the KS statistic for different ranges of parameters (see Supporting Information S4), we obtain an optimal set of parameters with the genetic algorithm. From this set of parameters, Fig. 6 is produced by comparing the empirical cascade size tail distribution in the datasets to the one produced by averaging 100 simulations with the set of parameters.
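To make the search procedure concrete, the sketch below shows a small hand-written genetic algorithm that respects the box constraints of Eqs. 6b to 6e. It is not the implementation of Ref. (35) and should not be read as the exact procedure used for the results: the operators (tournament selection, uniform crossover, Gaussian mutation, elitism), the population settings, and the function `toy_objective`, which stands in for the KS distance of a simulation run, are all assumptions chosen for illustration.

```python
import math
import random

# Box constraints from Eqs. 6b-6e, in the order (mu, phi, delta, n_iter)
BOUNDS = [(0.01, 0.99), (0.0, math.pi), (0.0, 1.0), (1.5e5, 6.5e5)]


def toy_objective(params):
    """Hypothetical stand-in for the KS distance of a simulation run.

    A real calibration would run the model with these parameters (and a
    fixed feed size alpha) and compare simulated and empirical cascade sizes.
    """
    target = (0.3, 1.0, 0.1, 4.0e5)  # arbitrary made-up optimum
    return sum(((p - t) / (hi - lo)) ** 2
               for p, t, (lo, hi) in zip(params, target, BOUNDS))


def random_individual(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(ind, rng, rate=0.2, scale=0.1):
    # Gaussian mutation scaled to each parameter range, clipped to the bounds
    out = []
    for gene, (lo, hi) in zip(ind, BOUNDS):
        if rng.random() < rate:
            gene = max(lo, min(hi, gene + rng.gauss(0.0, scale * (hi - lo))))
        out.append(gene)
    return out


def tournament(pop, scores, rng, k=3):
    # Return the best of k randomly drawn individuals
    picks = rng.sample(range(len(pop)), k)
    return pop[min(picks, key=lambda i: scores[i])]


def genetic_minimize(objective, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [objective(ind) for ind in pop]
        best = pop[min(range(pop_size), key=lambda i: scores[i])]
        children = [best]  # elitism: the current best always survives
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    scores = [objective(ind) for ind in pop]
    i = min(range(pop_size), key=lambda j: scores[j])
    return pop[i], scores[i]


best_params, best_score = genetic_minimize(toy_objective)
print(best_params, best_score)
```

Here $n_{iter}$ is treated as a continuous gene for simplicity; an actual run would round it to an integer before simulating. Because each objective evaluation in the real task requires a full simulation, the population size and number of generations dominate the computational cost.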

These simulations are executed using the empirical network structure from the dataset, with opinions initialized uniformly at random over the entire interval $[-1, 1]$ . Each iteration follows the description of section “The agent-based model,” including the posting filter given by Eq. 1.
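The comparison of tail distributions mentioned above can be sketched as follows. This example assumes that the tail distribution is the complementary cumulative distribution $P(S \geq x)$ and that "averaging" means a point-wise mean over runs evaluated on a common grid of sizes; both are assumptions of this sketch, as are the function names and the toy data.

```python
def tail_distribution(sizes, grid):
    """Empirical P(S >= x) evaluated at each x in `grid`."""
    n = len(sizes)
    return [sum(1 for s in sizes if s >= x) / n for x in grid]


def averaged_tail(runs, grid):
    """Point-wise mean of the tail distributions of several simulation runs."""
    tails = [tail_distribution(run, grid) for run in runs]
    return [sum(column) / len(tails) for column in zip(*tails)]


# Two toy "runs" of cascade sizes, made up for illustration
runs = [[1, 1, 2, 4], [1, 2, 2, 8]]
grid = [1, 2, 4, 8]
print(averaged_tail(runs, grid))  # [1.0, 0.625, 0.25, 0.125]
```

With 100 runs, `runs` would hold one list of cascade sizes per simulation, and the averaged curve would be plotted against the empirical tail distribution on logarithmic axes.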

Contributor Information

Kleber Andrade Oliveira, Social Dynamics Research Lab, Department of Psychology, University of Limerick, Limerick V94 T9PX, Ireland.

Henrique Ferraz de Arruda, ARAID Foundation, Av. de Ranillas 1-D, planta 2a, oficina B, Zaragoza 50018, Spain; Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Mariano Esquillor, s/n, Zaragoza 50018, Spain.

Yamir Moreno, Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Mariano Esquillor, s/n, Zaragoza 50018, Spain; Department of Theoretical Physics, Faculty of Sciences, University of Zaragoza, Pedro Cerbuna, 12, Zaragoza 50009, Spain.

Supplementary Material

Supplementary material is available at PNAS Nexus online.

Funding

K.A.O. was partially supported by the European Union through ERC grant (ID-COMPRESSION, grant number: 101124175). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. Y.M. was partially supported by the Government of Aragón, Spain, and “ERDF A way of making Europe” through grant E36-23R (FENOL), and by Ministerio de Ciencia, Innovación y Universidades, Agencia Española de Investigación (MICIU/AEI/ 10.13039/501100011033) grant no. PID2023-149409NB-I00. We acknowledge the use of the computational resources of COSNET Lab at Institute BIFI, which was funded by Banco Santander (grant Santander-UZ 2020/0274) and the Government of Aragón (grant UZ-164255). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author Contributions

H.F.A. and K.A.O.: conceptualization, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing-original draft, writing-review and editing. Y.M.: conceptualization, investigation, visualization, methodology, writing-original draft, writing-review and editing.

Preprints

This manuscript was posted as a preprint: https://arxiv.org/abs/2410.17151.

Data Availability

The data used in this article were published by Minici et al. (2022) [DOI:10.1145/3511808.3557253] and are available in a GitHub repository at https://github.com/mminici/Echo-Chamber-Detection. They were downloaded on 2023 Feb 21 (Brexit dataset) and 2023 Mar 17 (VaxNoVax dataset). A Python library named DOCES (Dynamical Opinion Clusters Exploration Suite) (38) implements the model described in our paper and is available at https://github.com/hfarruda/doces.

References

1. Jamieson KH, Cappella JN. Echo chamber: Rush Limbaugh and the conservative media establishment. Oxford University Press, New York, NY, USA, 2010.
2. Arguedas AR, Robertson C, Fletcher R, Nielsen R. 2022. Echo chambers, filter bubbles, and polarisation: a literature review. Technical report, Oxford, England.
3. Bak-Coleman JB, et al. 2021. Stewardship of global collective behavior. Proc Natl Acad Sci U S A. 118(27):e2025764118.
4. Lorenz-Spreen P, Oswald L, Lewandowsky S, Hertwig R. 2023. A systematic review of worldwide causal and correlational evidence on digital media and democracy. Nat Hum Behav. 7(1):74–101.
5. Terren L, Borge-Bravo R. 2021. Echo chambers on social media: a systematic review of the literature. Rev Commun Res. 9:99–118. https://www.rcommunicationr.org/index.php/rcr/article/view/16.
6. Kubin E, von Sikorski C. 2021. The role of (social) media in political polarization: a systematic review. Ann Int Commun Assoc. 45(3):188–206.
7. Deffuant G, Neau D, Amblard F, Weisbuch G. 2000. Mixing beliefs among interacting agents. Adv Complex Syst. 3(01n04):87–98.
8. Hegselmann R, Krause U. 2002. Opinion dynamics and bounded confidence: models, analysis and simulation. J Artif Soc Soc Simul. 5(3).
9. Galam S. 2002. Minority opinion spreading in random geometry. Eur Phys J B-Condens Matter Complex Syst. 25(4):403–406.
10. Noorazar H, Vixie KR, Talebanpour A, Hu Y. 2020. From classical to modern opinion dynamics. Int J Mod Phys C. 31(7):2050101.
11. Chen G, et al. 2017. Deffuant model on a ring with repelling mechanism and circular opinions. Phys Rev E. 95(4):042118.
12. Holme P, Newman MEJ. 2006. Nonequilibrium phase transition in the coevolution of networks and opinions. Phys Rev E. 74(5):056108.
13. Blex C, Yasseri T. 2022. Positive algorithmic bias cannot stop fragmentation in homophilic networks. J Math Sociol. 46(1):80–97.
14. Sasahara K, et al. 2021. Social influence and unfollowing accelerate the emergence of echo chambers. J Comput Soc Sci. 4(1):381–402.
15. Han W, Feng Y, Qian X, Yang Q, Huang C. 2020. Clusters and the entropy in opinion dynamics on complex networks. Phys A Stat Mech Appl. 559:125033.
16. Ferraz de Arruda H, et al. 2022. Modelling how social network algorithms can influence opinion polarization. Inf Sci. 588:265–278.
17. Valensise CM, Cinelli M, Quattrociocchi W. 2023. The drivers of online polarization: fitting models to data. Inf Sci. 642(51):119152.
18. Weng L, Flammini A, Vespignani A, Menczer F. 2012. Competition among memes in a world with limited attention. Sci Rep. 2(1):335.
19. Lerman K, Ghosh R. 2010. Information contagion: an empirical study of the spread of news on Digg and Twitter social networks. Proc Int AAAI Confer Web Soc Med. 4(1):90–97.
20. Oliveira DFM, Chan KS, Leonel ED. 2018. Scaling invariance in a social network with limited attention and innovation. Phys Lett A. 382(47):3376–3380.
21. Notarmuzi D, Castellano C, Flammini A, Mazzilli D, Radicchi F. 2022. Universality, criticality and complexity of information propagation in social media. Nat Commun. 13(1):1308.
22. Luca Ciampaglia G, Flammini A, Menczer F. 2015. The production of information in the attention economy. Sci Rep. 5(1):9452.
23. Minici M, Cinus F, Monti C, Bonchi F, Manco G. Cascade-based echo chamber detection. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, CIKM ’22. Association for Computing Machinery, New York, NY, USA; 2022. p. 1511–1520.
24. Ferraz de Arruda H, Oliveira KA, Moreno Y. 2024. Echo chamber formation sharpened by priority users. iScience. 27. 10.1016/j.isci.2024.111098.
25. Pfister R, Schwarz KA, Janczyk M, Dale R, Freeman J. 2013. Good things peak in pairs: a note on the bimodality coefficient. Front Psychol. 4. 10.3389/fpsyg.2013.00700.
26. Kokoska S, Zwillinger D. CRC standard probability and statistics tables and formulae, student edition. 1st ed. CRC Press, Boca Raton, FL, USA, 2000. 10.1201/b16923.
27. Hosking JRM. 2018. L-moments: analysis and estimation of distributions using linear combinations of order statistics. J R Stat Soc Series B Stat Methodol. 52(1):105–124.
28. Savitzky A, Golay MJE. 1964. Smoothing and differentiation of data by simplified least squares procedures. Anal Chem. 36(8):1627–1639.
29. Crane R, Sornette D. 2008. Robust dynamic classes revealed by measuring the response function of a social system. Proc Natl Acad Sci U S A. 105(41):15649–15653.
30. Karsai M, Jo H-H, Kaski K. 2017. Bursty human dynamics, SpringerBriefs in Complexity. 1st ed. Springer Cham, Cham, Switzerland, 10.1007/978-3-319-68540-3.
31. Wang S, Chen X, Xiao Z, Szolnoki A, Vasconcelos VV. 2023. Optimization of institutional incentives for cooperation in structured populations. J R Soc Interface. 20(199):20220653.
32. Cimpeanu T, Santos FC, Moniz Pereira L, Lenaerts T, Anh Han T. 2022. Artificial intelligence development races in heterogeneous settings. Sci Rep. 12(1):1723.
33. Lu Y, Aleta A, Du C, Shi L, Moreno Y. 2024. LLMs and generative agent-based models for complex systems research. Phys Life Rev. 51(e2313925121):283–293.
34. Peixoto TP. 2014. The graph-tool python library. figshare. 10.6084/m9.figshare.1164194.
35. Blank J, Deb K. 2020. pymoo: multi-objective optimization in Python. IEEE Access. 8:89497–89509.
36. Morstatter F, Pfeffer J, Liu H, Carley K. 2021. Is the sample good enough? Comparing data from Twitter’s streaming API with Twitter’s firehose. Proc Int AAAI Confer Web Soc Med. 7(1):400–408.
37. Corder GW, Foreman DI. Nonparametric statistics: a step-by-step approach. 2nd ed. John Wiley & Sons, Nashville, TN, USA, 2014.
38. Ferraz de Arruda H, Andrade Oliveira K, Moreno Y. 2025. Dynamical opinion clusters exploration suite: modeling social media opinion dynamics. SoftwareX. 30(4):102136.
