PLOS One. 2024 Mar 11;19(3):e0300179. doi: 10.1371/journal.pone.0300179

Associative linking for collaborative thinking: Self-organization of content in online Q&A communities via user-generated links

Noa Sher 1,*, Sheizaf Rafaeli 1,2
Editor: Carlos Henrique Gomes Ferreira 3
PMCID: PMC10927134  PMID: 38466733

Abstract

Virtual collaborative Q&A communities generate shared knowledge through the interaction of people and content. This knowledge is often fragmented, and its value as a collective, collaboratively formed product is largely overlooked. Inspired by work on individual mental semantic networks, the current study explores the networks formed by user-added associative links as reflecting an aspect of self-organization within the communities’ collaborative knowledge sharing. Using eight topic-centered Q&A discussions from the Stack Exchange platform, it investigated how associative links form internal structures within the networks. Network analysis tools were used to derive topological indicator metrics of complex structures from the associatively linked networks. The same metrics were extracted from 1000 simulated randomly linked networks of comparable size and growth pattern; bootstrap resampling of these values was used to generate estimated sampling distributions, and 99% confidence intervals were constructed for each metric. The discussion-network indicators were compared against these intervals. Results showed that participant-added associative links largely led to networks that were more clustered and more integrated, and that included posts with more connections, than would be expected in random networks of similar size and growth pattern. The differences were observed to increase over time. In addition, the largest connected subgraphs within the discussion networks were found to be modular. Limited qualitative observations also pointed to the impact of external content-related events on the network structures. The findings strengthen the notion that the networks emerging from associative link sharing resemble other information networks whose internal structures suggest self-organization, laying the ground for further exploration of collaborative linking as a form of collective knowledge organization. The study also underscores the importance of recognizing and leveraging this latent mechanism in both theory and practice.

Introduction

The many environments that promote knowledge sharing, accumulation, and transformation are described by the umbrella term “virtual knowledge collaborations” [1, 2]. Platforms for virtual knowledge collaboration act as a medium for transferring knowledge or ideas from the minds of individuals to the collective level via the sharing of digital artifacts [3, 4]. The amount of knowledge and thought shared within these communities is immense, and they hold great potential for the generation of integrated knowledge [5–7]. Virtual knowledge collaborations range from small-group, task-focused settings such as academic courses to large-scale persistent discussions comprising thousands of participants. This work focuses on the prevalent and fast-growing phenomenon of large-scale collaborative knowledge-sharing discussions, a form of persistent conversation aimed at sharing, distributing, and creating knowledge. These have become an inherent component of knowledge creation and innovation in the educational, academic, professional, and civic arenas [8–11].

A distinction can be made between two kinds of virtual knowledge collaboration environments. On the one hand, some environments are explicitly aimed at producing shared integrated artifacts, like Wikipedia or collaborative academic virtual environments in which students produce a group product. On the other hand, discussion-centered environments revolve around sharing knowledge through open discussions or in Q&A formats, which do not strive to create a unified product [12]. Even so, participants in discussion-centered environments have been found to actively engage in forms of content integration, and this appears to be a substantial component of their knowledge contribution behavior [1]. In collaborative constructions of unified products, participants explicitly negotiate as part of the collaborative knowledge construction process, either through reiterated modifications of the product or through interaction with each other [13, 14]. In open discussion or Q&A settings, however, the digitally mediated discourse itself becomes the substrate for the collaborative formation of new meanings [15], and the co-creation of knowledge is reflected in the interaction between contributors and in the mutual integration of content posted by others [1]. As these discussion-centered environments lack top-down organization and integration of knowledge, emergent organizing mechanisms based on collective participant activity are an instrumental part of the collaborative process [4, 16–18]. They are also one of its notable merits, as they allow novel combinations to appear and so enable collective creativity [17, 19–21]. Establishing the network qualities of associatively linked discussions is a necessary milestone on the road toward identifying and studying mechanisms for user-based content organization within the complex environment of multi-participant discussions.
The current study aimed to demonstrate this point by presenting the structures formed within Q&A discussions’ linked networks that emerge as users incorporate associative links within their responses. From a topological perspective, these links can transform the otherwise fragmented topology of the discussion into a networked one, enabling the emergence of complex organizing structures such as connected sub-networks, clustering, and densely connected modules [22, 23]. In an individual cognitive setting, the emergence of macro-level structures based on local semantic associations can be considered an indication of self-organization [24]. The paper builds on this notion, explores the network formations within the collective content formed through collaborative associative linking in Q&A-based virtual knowledge collaborations, and lays the ground for further study of these networks. This could add to accumulating research on the self-organizing aspect of such collaborations, which has mainly concentrated on the underlying social networks and on other forms of knowledge organization such as tagging networks. In doing so, it illuminates another layer of interactivity within large-scale virtual knowledge collaborations.

Theory and conceptual framework

Self-organization as a component of knowledge-sharing behavior in social Q&A environments

Q&A community participants post individually, rather than working together to create a uniform collective output such as a wiki page [12]. Nevertheless, participants in these discussions continuously interact with each other through the content they contribute, and integrate knowledge or ideas from others’ content into their own [1]. In their most basic format, social Q&As consist of a collection of distinct questions that can receive multiple answers, typically revolving around a topic that defines the discussion’s ontological environment. The discussion network comprises question posts, from which answers branch out, sometimes topped with an additional layer of comments. Q&A platforms often form interactive communities: participants interact with each other and with the content at several levels in an ongoing process [25]. While the participants’ most direct engagement with the discussion consists of either posting questions or answering others’ questions, accumulating research suggests that they also structure and organize the discussion content for the good of the community. Organizing efforts can include explicit activities that introduce shared meanings, such as categorizing [25] or tagging [4, 17, 26, 27]. They can also include social activities that shape the discussion by enhancing the salience of certain questions or answers, e.g. voting or rating [28]. The current study turns the spotlight on another type of content-organizing activity in Q&A discussions, which we refer to as cross-linking: connecting knowledge units (sets of a question with its answers and comments) within a discussion through cross-referencing links, by embedding the URLs of other questions within answers [4, 22]. Participants apply these links to share knowledge and insight regarding relations between different knowledge units and to point out other relevant content within the discussion [16, 29].
Cross-links improve the navigability of the discussion by creating meaningful paths for participants to follow and by directing them toward relevant knowledge. In this sense, they represent another layer of interaction with content that goes beyond replying, upvoting, and other more widely recognized interactions: the sharing of associations. This is consistent with a more contemporary view of participants not only as content providers but also as path designers who lay out navigation routes through the ocean of content for the benefit of their peers [30]. Cross-linking can be regarded as an organic, bottom-up means for organizing shared knowledge by discussion participants: as opposed to content networks produced from textual similarities, the cross-linked network reflects intentional meaning-making and taps into an important portion of the group’s knowledge, namely the explicit knowledge about the associative, logical, or causal connections between the units that compose it [31]. Accordingly, cross-linking can be viewed as a higher-order form of interaction with content, one that impacts the structure of the collective knowledge space and contributes to its organization for the community’s benefit [16–18, 22]. While Q&A discussions do not produce an explicit collective product but are rather a collection of individual contributions, this work aims to demonstrate that by linking related knowledge units together, the community molds this otherwise fragmented collection into an integrative shared product that can be perceived holistically as a network.

From a topological perspective, cross-links can transform the topology of the discussion from a collection of detached tree-shaped formations of questions-answers-comments into a networked one, comprising a "network of networks". While the sub-network composing each knowledge unit has a fully hierarchical topology ("tree-shape"), the overarching "network of networks" is not hierarchical, and so it enables the emergence of complex organizing structures typically found in other information networks such as citation networks or social-reaction networks [32]. These include large connected sub-networks, local clustering, and densely connected modules [21, 22].

Fig 1 depicts a discussion graph shaped by cross-links, compared to a threaded hierarchical discussion comprising only sequential links.

Fig 1. A schematic graph of a threaded hierarchical discussion (a) vs. a cross-linked discussion (b).

To demonstrate the effects of the links and highlight their role in shaping Q&A discussions’ network topologies, we 1) compare indicators of organizing structures in the networks of eight different Q&A discussions from the Stack Exchange platform with the distribution of comparable indicators extracted from simulated null models, in which random links replace the original, participant-produced links, and 2) review how the results of these comparisons evolve over time.

The current study adds to existing work in several ways. First, it addresses large-scale, real-world discussions, each consisting of thousands of posts accumulated over several years. Previous research on cross-link networks has been based either on small-scale discussions in academic environments [21, 29] or on excerpts from large networks, concentrating on the networks surrounding specific knowledge units [22]. Studies addressing entire discussions or larger portions of discussions have explored other types of edges, such as shared tagging [4, 17, 27], social interactions [28, 33, 34], or textual similarities [35]. Second, using the distribution of indicators extracted from comparable null models with the same number of random links allowed us to examine the cross-links’ unique role in shaping the topology of the knowledge network, thereby establishing them as a form of self-organization that relies on user knowledge-sharing behavior. Third, reviewing several different discussions allowed the phenomenon to be explored across different networks [27] produced by different communities, revealing topological differences between discussions. Fourth, adding a timeline to the analyses offers insight into the trends this process may follow, including the relations between evolving network topologies and external events that influence the content of the discourse through the participants’ newly formed associations. This view of cross-linked Q&A discussions also has practical implications, as it can be used to extract shared meanings formed by explicitly added connections.

The topological indicators of emergent organizing network structures

Within Q&A communities, “knowledge units” comprising questions and answers connected through cross-links form networks that can be studied using a network analysis approach [22]. By extracting network metrics, we attempted to demonstrate the emergence of topological organizing structures that would indicate a collaborative structuring of the collective knowledge artifact. The network metrics were used to evaluate network properties and to discover emergent structures by comparing them with equivalent metrics extracted from null models in which random links replaced the cross-links (see the Materials and methods section for more details).

The properties that were investigated were:

  1. Defragmentation of the networks. Cross-links can create connected subgraphs of knowledge units that are viewed by participants as conceptually related (“connected components”). They can also bridge across more loosely related content through various associations and connect subgraphs to an overarching component. This can increase the integration and reduce the fragmentation of the discussion network. The indicators that were used for assessing defragmentation were the size of the largest sub-network of connected units (“giant component”) in comparison to a randomized null model, and the prevalence of large connected components comprising ten knowledge units or more.

  2. The rise of local focal points. While most units have few or no connections to other units, some knowledge units become notably more connected than others and emerge as local focal points of the discussion. The indicators for the emergence of local focal points were the number of units with a degree of at least 4 (i.e., connected to at least four other knowledge units) and the maximal number of units connected to a single unit, both compared to equivalent metrics in a randomized null model. Units that are associated with multiple other units can be regarded as local focal points. The distribution of the indicators extracted from the simulated random networks of similar density was used to verify whether these focal points are an artifact of link distribution that would typically be found in random networks with a similar edge density, or rather a trait of the associative-link networks. The emergence of local focal points within the collective product of the discussion represents an emergent hierarchy in which some knowledge units are more central than others, allowing for more informed navigation of the knowledge space and facilitating the extraction of prominent themes.

  3. Micro-level organization. Within network organizations of content, such as the individual semantic lexicon or Wikipedia, concepts that are associated with each other often share other associations as well [36, 37]. This is a form of micro-level organization because it makes navigation between closely related concepts easier. In network terms, this is expressed as increased triadic closure: if two units are connected and one of them is connected to a third unit, there is an increased probability that the other will also be connected to that unit. This is indicated by a higher clustering coefficient than that of a randomly connected network [38]. Because the more commonly used clustering coefficient depends on the number of potential triangles, the absolute number of triangles was calculated as an additional indicator.

  4. Modular formation. Many real-world networks are modular in structure: smaller clusters of nodes are more densely connected amongst themselves than to the network as a whole [39]. This creates a complex hierarchical organization of the network [38, 40] and forms mesoscopic structures within the network that carry important information at an intermediate scale [41]. Relatively high modularity within the connected subgraphs would thus indicate another level of emergent organization, and would replicate and elaborate on findings from Ye et al. that demonstrated high modularity within a subgraph centered around a “seed” node within the StackOverflow network [22]. Notably, within the individual human mind, a modular organization of the semantic lexicon, or knowledge of the world, is considered more adaptive in terms of accessing relevant information [36]. If the cross-links increase modularity, this could be another indication that the participants’ linking activity helps facilitate effective navigation for other community members.
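The indicators for the first three properties can be computed from a list of knowledge units and cross-links. The following Python sketch is illustrative only (the study itself used the R igraph package), and its function names are hypothetical. The component-size and degree thresholds are parameters, defaulting to the “ten knowledge units or more” and “degree of at least 4” cut-offs given in the text; transitivity is computed as the share of closed triplets among all connected triplets.

```python
from collections import deque
from itertools import combinations


def build_adjacency(nodes, edges):
    """Undirected adjacency sets; unlinked knowledge units stay as isolates."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def component_sizes(adj):
    """Sizes of all connected components, found by breadth-first search."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    return sizes


def triangles_and_transitivity(adj):
    """Triangle count and global clustering coefficient (transitivity).

    Each triangle is seen once from each of its three corners, so the
    number of closed triplets equals 3 * triangles.
    """
    closed = triplets = 0
    for nbrs in adj.values():
        k = len(nbrs)
        triplets += k * (k - 1) // 2  # paths of length two centred here
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed // 3, (closed / triplets if triplets else 0.0)


def structure_indicators(nodes, edges, min_component=10, min_degree=4):
    """Defragmentation, focal-point and micro-level clustering indicators."""
    adj = build_adjacency(nodes, edges)
    n = len(adj)
    sizes = component_sizes(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    triangles, transitivity = triangles_and_transitivity(adj)
    return {
        "giant_component": max(sizes),
        "pct_in_large_components": 100 * sum(s for s in sizes if s >= min_component) / n,
        "pct_high_degree": 100 * sum(d >= min_degree for d in degrees) / n,
        "max_degree": max(degrees),
        "triangles": triangles,
        "transitivity": transitivity,
    }


# Toy example: a 10-unit path 0-...-9 with extra links from unit 0,
# one linked pair (10, 11) and three unlinked units (12, 13, 14).
toy_edges = [(i, i + 1) for i in range(9)] + [(0, 2), (0, 3), (0, 4), (10, 11)]
print(structure_indicators(range(15), toy_edges))
# expected: giant component 10, max degree 4, 3 triangles, transitivity 0.45
```

In the study, each of these values would then be set against the same metric computed on the randomly linked null models rather than read in isolation.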

Table 1 summarizes the network properties and their corresponding metrics:

Table 1. Operationalization of network properties.

Property | Operationalization within the network | Graph metrics
Defragmentation and convergence | Formation of large connected components | Size of the largest connected component; % of nodes in components of size >10
The emergence of local focal points | Knowledge units with multiple connections | % of nodes with degree ≥4; max degree
Micro-level organization | Clustering | Global clustering coefficient; absolute number of fully connected triplets of nodes (triangles)
Meso-level organization | Module formation: distinct areas of denser connectivity within the connected subgraphs | Modularity index
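Assuming the modularity index in Table 1 is the standard Newman and Girvan modularity Q, the sketch below scores a given partition of an undirected, unweighted subgraph (for example, the largest connected component). It does not detect modules: in practice the partition would come from a community-detection algorithm, which is not shown. The function name and the example partition are hypothetical.

```python
def modularity(edges, community):
    """Newman and Girvan modularity Q of an undirected, unweighted graph.

    edges: (u, v) pairs without duplicates or self-loops.
    community: dict mapping every linked node to a module label.
    Q = sum over modules c of [L_c / m - (d_c / (2m)) ** 2], where m is the
    number of edges, L_c the edges inside c, and d_c the total degree of c.
    """
    edges = list(edges)
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Two linked triads joined by one bridging cross-link (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 3))  # → 0.357 (exactly 5/14)
```

Placing all nodes in a single module gives Q = 0, so positive values indicate more within-module links than expected under random rewiring with the same degrees.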

Notably, in the associatively linked knowledge-unit networks, only a relatively small share of units became connected at all. This share varies between communities (see the Materials and methods section for details). This characteristic, which seems inherent to the domain, has implications for network density and hence for network indicators related to density, such as the degree distribution and the share of the largest connected component. The benchmarks used here to indicate network properties were therefore based on the characteristics of the networks in the sample examined, and comparisons were made to simulated randomly linked networks with similar link densities (see the Materials and methods section for more details).

Another important point is that while, as explained above, each discussion can be perceived as a "network of networks" with the knowledge units of question-answers-comments as sub-networks, the indicators were extracted for the overarching network consisting of whole knowledge units rather than for the full network containing the answers and comments branch-outs. While this simplification of the network loses some of the information, such as the number of each question’s answers and comments indicating its collective perceived significance, it does not affect the topology of the cross-linked network and its emergent structures, which were at the focus of the current work.

Research hypotheses

Based on these, we hypothesized that

H1: Metrics representing complex internal structure, namely the percentage of knowledge units with multiple connections, the global clustering coefficient and absolute number of triangles, the size of the largest connected components, and the prevalence of large connected components, will be higher in the real networks than the same metrics produced from corresponding null models with random links. To test this, distributions of the metrics were extracted from 1000 iterations of simulated randomly linked networks; estimated sampling distributions were created from these values using bootstrap resampling, and the indicators of the real networks were compared to confidence intervals formed on the basis of these distributions.

H2: The connected subgraphs formed by the links will display a modular structure. This would indicate that the networks formed by the participants’ cross-linking of knowledge units display emergent complex organizing structures, and that the participants’ cross-linking acts as a self-organization mechanism. We also hypothesized that discussions will vary in the extent of these properties. This would indicate that the resulting network topology is related to the specific characteristics of each community and its content domain.

The combination of H1 and H2 was designed to support the claim that the cross-links introduced by Q&A community participants form organizing structures within the discussion network that may facilitate navigation and thus create a unique form of information network. If the cross-links affect the network structure in a way that contributes to its defragmentation and introduces emergent internal hierarchies, this could imply that the participants collectively engage in a collaborative bottom-up structuring of the virtual knowledge collaboration environment.

To further address the emergent nature of the network structures, we analyzed the topological attributes of the discussion graphs at weekly intervals and compared them to the distribution of indicators extracted from multiple iterations of the respective simulated null models with randomized cross-links. This comparison allowed us to monitor the effects of the cross-links on the discussion network as it evolves and might provide insight into the self-organizing quality of the networked discussions. Consequently, we hypothesized that:

H3: Over time, as the discussion progresses, the disparity between the topological parameters of the real discussion graphs and those of their respective null models with randomized cross-links will tend to increase. This would indicate an ongoing process of organization of content into a collective artifact that derives from cross-linking activity.

Materials and methods

The Stack Exchange community discussions

The study examined online multi-participant discussions from Stack Exchange [42]. Stack Exchange is a network of more than a hundred topic-specific, personal-interest question-answering (Q&A) communities covering a wide variety of topics such as History, Physics, Earth Science, Parenting, Beer, and many more. The largest and best-known Stack Exchange community, Stack Overflow, is centered around programming. Overall, the platform engages millions of people, some posting questions and others answering them asynchronously. Communities vary in size, but many include thousands of content-contributing participants and many thousands more passive participants, and span nearly a decade. Researchers have studied Stack Exchange sites, and especially Stack Overflow, for various purposes. Among those, data from the discussions were used to discover the evolution of topic trends in a developer community [43, 44], to predict answer quality [45] or user participation [46], to identify experts [47], to recommend solutions to programming errors [48], and to analyze social interactions inside the cooperative community [49, 50]. The collaborative tagging system applied within the communities and its implications have also been the focus of previous work [17, 26, 27].

The data from eight Q&A discussions were downloaded from the Stack Exchange Data Dump [51], which, as of August 24, 2021, hosted the entire history of every Stack Exchange community. These datasets, stored in XML format, include questions, answers, and comments, as well as specific information about linked questions: questions that participants noted as referring to each other by embedding hyperlinks in their answers. The data were converted to CSV format using RStudio (2020). The collection and analysis method complied with the terms and conditions for the use of Stack Exchange Data Dump data under the cc-by-sa 4.0 license, with specific limitations listed on the Stack Exchange Data Dump site. The datasets can be found in the OSF registries at https://doi.org/10.17605/OSF.IO/9XVNM.

The communities were selected arbitrarily but according to two guidelines. First, all eight communities revolve around a shared interest in a knowledge domain (“earth science”, “history”), as opposed to communities that revolve around a shared practice (e.g. “graphic design”, “beer brewing”). Second, the selected communities ranged between 5000 and 25000 questions and spanned at least five full years. The data used here were downloaded in August 2021 and include the entire content of each discussion from its onset (determined by the date of the earliest post) to June 7, 2021. The onsets of the analyzed discussions vary, ranging from 2011 to 2015. Table 2 presents the general properties of the discussions: the community topic, the time of onset of the discussion (based on the date of the first questions posted), the discussion’s duration (in days), the number of participants, the number of knowledge units (each comprising a question along with its set of answers and comments), the total number of cross-links and the cross-link:node ratio, and the percentage of knowledge units that are linked to at least one other knowledge unit.

Table 2. Community discussion stats.

Topic | Onset (dd/mm/yyyy) | Duration (days) | Number of participants | Number of knowledge units | No. of cross-links [cross-link:node ratio] | % of linked knowledge units
Politics | 04/12/2012 | 3098 | 3853 | 12776 | 3378 [0.26] | 33.5%
Philosophy | 05/04/2011 | 3708 | 6573 | 16003 | 4070 [0.25] | 28%
Psychology and Neuroscience | 06/06/2011 | 3645 | 3784 | 7277 | 1790 [0.25] | 28%
Earth Science | 15/04/2014 | 2615 | 2998 | 5650 | 1018 [0.18] | 24%
History | 05/05/2011 | 3678 | 5273 | 12829 | 1865 [0.14] | 21%
Health | 31/03/2015 | 2551 | 3810 | 7189 | 669 [0.09] | 13.5%
Biology | 14/12/2011 | 3657 | 11190 | 25986 | 3749 [0.14] | 19.5%
Economics | 18/11/2014 | 2385 | 5510 | 11710 | 1135 [0.1] | 14%

Methodology

Constructing the network graphs

Network graphs are composed of nodes, which represent objects or entities, and of links that connect them. In this work, we constructed network graphs based on data from eight Stack Exchange sites, each covering a specific domain. Each of these websites can be viewed as a discussion held within a specified domain of content, in which new questions (posts) are posted independently, while answers (and comments) are posted in response to questions. In constructing the network for each discussion we analyzed, each question, along with its answers and comments, was regarded as a single “knowledge unit” for analysis purposes [22], and these comprised the networks’ nodes. Administrative posts were not included. The edges consisted of the cross-links embedded within any component of the knowledge units in the form of URLs that refer to other knowledge units within the same site. These links are documented in the site’s data archive (see archives at https://archive.org/details/stackexchange). We refer to these as cross-links. Although a cross-link is created from one question to another, once it is added the website displays the two questions as linked, forming a bi-directional navigation path in which each question’s page directs to the other’s. Participants can move from a post to any other post connected to it by clicking, regardless of the original direction of the link. Accordingly, we decided to treat the network as undirected. On a more theoretical level, the association between concepts and ideas is a two-way street. This research views the discussion as a holistic product of interconnected knowledge and does not address the specific directionality of links between knowledge units.

Using the R igraph package [52], we constructed eight undirected, unweighted network graphs. A negligible number of edges were redundant, that is, the same two questions were connected more than once, probably by different participants; these rare duplicates were removed from the graphs. Questions that were removed by the community were also excluded from the graphs. While participants define most cross-links as indicating that questions are “related”, a minority of them are labeled “duplicates” (between 6% and 14% in the discussions we analyzed). The “duplicate” cross-links were retained, on the view that they represent a specific kind of relationship between knowledge units that is part of the network.
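The construction rules above (one node per question, undirected edges, redundant links dropped, removed questions excluded, “duplicate” links kept) can be sketched in Python. The study used R igraph; this standard-library version, its function name, and the (source, target, link_type) record layout are assumptions for illustration and do not reproduce the Stack Exchange dump schema.

```python
def build_crosslink_graph(question_ids, link_records, removed=frozenset()):
    """Undirected, unweighted cross-link graph over knowledge units.

    question_ids: ids of non-administrative questions (one node each).
    link_records: hypothetical (source_id, target_id, link_type) tuples;
        both the direction and the link type ("related" or "duplicate")
        are ignored, so duplicate-labelled links are kept as ordinary edges.
    removed: ids of questions deleted by the community (excluded).
    Returns (nodes, edges), with every edge stored once as a sorted pair.
    """
    nodes = set(question_ids) - set(removed)
    edges = set()
    for src, dst, _link_type in link_records:
        if src == dst or src not in nodes or dst not in nodes:
            continue  # skip self-references and links to excluded questions
        edges.add((min(src, dst), max(src, dst)))  # A->B and B->A collapse
    return nodes, edges


records = [
    (1, 2, "related"),
    (2, 1, "related"),    # same pair linked again, e.g. by another user
    (2, 3, "duplicate"),  # duplicate-labelled links are retained
    (3, 4, "related"),    # question 4 was removed by the community
]
nodes, edges = build_crosslink_graph([1, 2, 3, 4, 5], records, removed={4})
print(sorted(nodes), sorted(edges))  # → [1, 2, 3, 5] [(1, 2), (2, 3)]
```

Storing each edge as a sorted pair in a set is what makes redundant links disappear automatically, mirroring the decision to treat the network as undirected.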

The null-model graphs

In network-based research, null models are a method for generating the patterns expected from the data in the absence of the process of interest, providing a relevant point of comparison [53]. According to Farine, a good null model meets two demands: 1) the aspect of the network that is of most interest is randomized, while 2) all other aspects of the data are kept as constant as possible. In the current work, the process of interest is the effect of cross-linking on network topology, as a collaborative mechanism for the emergent organization of knowledge. Therefore, the null models we used were created by replacing the cross-links that the participants formed with random links. Notably, adding random links to a network affects its topology, for instance in terms of connectedness and the formation of connected components [23]. Using such models as an anchor for comparison allowed us to isolate the effects of the intentionally constructed links that result from participant activity. To better simulate the creation of the real networks, which were formed gradually, we extracted the size parameters of the real networks, i.e. the number of nodes and the number of links, for each day from the discussions’ initiation. The randomly linked graph for each day was then constructed from the graph of the prior day by adding as many nodes and edges as were added to the real network on that day, the difference being that the additional edges were assigned randomly. Importantly, cross-links in the discussions are not necessarily created simultaneously with nodes, as nodes are entire knowledge units made up of a question and its set of answers, if any. Links are embedded within the answers, and these can be added to the question at any point. Meanwhile, new questions are posted independently of answers, and not all question-answer knowledge units become linked at all.
To mimic the process of cross-linking, pairs of nodes within the “daily” simulated graph were randomly selected to become linked. As in the real cross-linked networks, new links could not be redundant: they could only be introduced between nodes that had not been previously linked. Each simulated “day” thus produces a network based on that of the previous day. Accordingly, the growth pattern of the randomly linked graphs approximately mimics that of the real ones, the difference being the random assignment of new links as opposed to associative links added by participants. This process of creating same-sized graphs with randomly linked nodes was iterated 1000 times for each community’s network, resulting in 1000 simulated networks per community. Each of these graphs was identical to the corresponding real graph in terms of size and number of links and differed only in the assignment of links within the network, as random connections replaced the real ones. Next, for each of the 1000 simulated networks, network metrics were calculated from the topology of the network on the final simulated “day”. This was repeated for each community, so that each real-network topological metric could be compared against an estimated sampling distribution created using bootstrap resampling of the 1000 corresponding values extracted from the simulated null models. Next, 99% confidence intervals were generated for each sampling distribution, to indicate the interval within which each metric would be likely to fall given a randomly linked network. The real metrics were then compared against this interval: a metric falling beyond the confidence interval would indicate with high probability (>99%) that it would not likely have been produced by a randomly linked network of the same size, link distribution, and growth rate.
In turn, this would enable us to reject the null hypothesis and to conclude that the cross-linked networks assume qualities of the emergent organizing structures that are common in other real-world information networks. Bootstrap resampling was used to create the estimated sampling distribution as a means of comparing a single real-world exemplar (the cross-linked discussion network) with a relevant simulated distribution of randomly linked null models, by constructing a confidence interval from iterated simulations [54].
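The day-by-day null-model growth described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R code: the function names `grow_random_network` and `simulate_null_models` are hypothetical, the input is assumed to be two per-day lists (new knowledge units and new cross-links), and every new link is placed between a uniformly chosen pair of existing, not-yet-linked nodes.

```python
import random


def grow_random_network(daily_new_nodes, daily_new_links, seed=None):
    """Grow one null-model graph day by day (hypothetical helper).

    daily_new_nodes[d] and daily_new_links[d] are the numbers of knowledge
    units and cross-links the real discussion gained on day d. Each day the
    new nodes are added first; the new links are then placed between
    uniformly chosen node pairs that are not yet linked (no redundant links).
    """
    rng = random.Random(seed)
    n_nodes = 0
    edges = set()  # undirected edges stored as (smaller id, larger id)
    for new_nodes, new_links in zip(daily_new_nodes, daily_new_links):
        n_nodes += new_nodes
        max_edges = n_nodes * (n_nodes - 1) // 2
        if len(edges) + new_links > max_edges:
            raise ValueError("more links requested than free node pairs")
        added = 0
        while added < new_links:
            u, v = rng.sample(range(n_nodes), 2)
            edge = (min(u, v), max(u, v))
            if edge not in edges:  # reject pairs that are already linked
                edges.add(edge)
                added += 1
    return n_nodes, edges


def simulate_null_models(daily_new_nodes, daily_new_links, iterations=1000, seed=0):
    """Repeat the growth process; returns one final-day graph per iteration."""
    return [
        grow_random_network(daily_new_nodes, daily_new_links, seed=seed + i)
        for i in range(iterations)
    ]
```

For example, `grow_random_network([5, 3, 2], [1, 2, 0], seed=1)` yields a graph of 10 nodes and 3 random links. Rejection sampling keeps the code short; it is efficient for sparse networks such as these, where almost every sampled pair is still free.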

Constructing graph networks for the chronological analysis

For each of the graphs, a series of “snapshots” was created, representing the discussion at the end of each full week since its initiation. Topological metrics were then calculated for each weekly snapshot, as described in the previous section. The metrics included the number of triangles, the maximal degree of a single knowledge unit, and the size of the largest connected component. Next, using the same method described above, 100 graphs in which the cross-links between knowledge units were replaced with random ones were generated for each week. The metrics were calculated for each of the randomized-link graphs, resulting in 100 weekly series for each metric.
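A weekly snapshot series of this kind could be computed along the following lines. The sketch assumes, hypothetically, that each knowledge unit has a creation day and that each cross-link is available as a `(day, u, v)` record, with days counted from the discussion's initiation (day 0). It tracks only the size of the largest connected component, using an incremental union-find structure; the other weekly metrics would be computed on the same cumulative snapshots.

```python
def weekly_largest_component(node_days, links, n_weeks):
    """Largest connected component size at the end of each week (hypothetical).

    node_days: dict mapping a knowledge-unit id to the day it was created.
    links: iterable of (day, u, v) cross-links; link direction is ignored.
    """
    parent, size = {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Node creations (kind 0) are processed before links (kind 1) on the same day.
    events = sorted(
        [(day, 0, node, None) for node, day in node_days.items()]
        + [(day, 1, u, v) for day, u, v in links],
        key=lambda e: (e[0], e[1]),
    )
    largest, result, i = 0, [], 0
    for week in range(n_weeks):
        end_day = 7 * (week + 1)  # snapshot taken at the end of this week
        while i < len(events) and events[i][0] < end_day:
            _, kind, a, b = events[i]
            i += 1
            if kind == 0:
                parent[a], size[a] = a, 1
                largest = max(largest, 1)
            else:
                ra, rb = find(a), find(b)
                if ra != rb:  # merge two components (union by size)
                    if size[ra] < size[rb]:
                        ra, rb = rb, ra
                    parent[rb] = ra
                    size[ra] += size[rb]
                    largest = max(largest, size[ra])
        result.append(largest)
    return result
```

Because components only ever merge as a network grows, the running maximum is always the size of the current largest component, so each week costs only the new events.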

Results

Network topology metrics

As explained above, for each of the discussions, 1000 different networks were generated, composed of an identical number of nodes and links, and with a similar daily development in terms of graph size and the number of links.

Indicators for the properties of each discussion network were derived by comparing each real network metric to the distribution of the same metric across the randomized network graphs. Table 3 describes these metrics:

Table 3. Network metrics.

Metric | Calculation
Global clustering coefficient | The rate of closed triplets out of all connected triplets of nodes in the network, i.e. the rate at which two nodes are linked to each other, given that both are linked to a third node. The clustering coefficient reflects the graph’s transitivity [32], meaning in this case the likelihood that a content unit associated with two other content units becomes part of a triad linked by association. Note that if connected triplets are scarce, this rate can be very high regardless of the total number of triads in the graph.
Number of triangles | The overall number of fully connected triplets, or closed triangles. This complements the global clustering coefficient, as it captures the actual number of associated triads.
Max. degree | The maximal number of cross-links held by a single knowledge-unit node.
Degree ≥4 | The number of knowledge units (nodes) with at least 4 cross-links.
Size of the largest connected component | The size (number of nodes) of the largest weakly connected subgraph, i.e. the largest portion of the network in which every node can be reached from every other node when link direction is ignored.
Nodes in large components | The percentage of knowledge units (nodes) that belong to weakly connected components of at least 10 nodes.
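The metrics in Table 3 can be made concrete with a short, dependency-free Python sketch. The study itself used the R packages igraph and CINNA; the function below is a hypothetical stand-in for illustration only. It assumes nodes numbered `0..n_nodes-1` and treats cross-links as undirected, so weak connectivity reduces to ordinary connectivity.

```python
from collections import defaultdict
from itertools import combinations


def table3_metrics(n_nodes, edges, min_degree=4, min_component=10):
    """Table 3-style metrics for an undirected simple graph (hypothetical helper)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; duplicate pairs collapse in the sets
            adj[u].add(v)
            adj[v].add(u)
    degrees = [len(adj[n]) for n in range(n_nodes)]

    # Each triangle {a < b < c} is counted exactly once, from its smallest node a.
    triangles = sum(
        1
        for u in range(n_nodes)
        for v, w in combinations(sorted(adj[u]), 2)
        if u < v and w in adj[v]
    )
    # Connected triplets (paths of length two) centred on each node.
    triplets = sum(d * (d - 1) // 2 for d in degrees)

    # Connected component sizes via iterative depth-first search.
    seen, comp_sizes = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comp_sizes.append(size)

    in_large = sum(s for s in comp_sizes if s >= min_component)
    return {
        "clustering_coefficient": 3 * triangles / triplets if triplets else 0.0,
        "triangles": triangles,
        "max_degree": max(degrees, default=0),
        "high_degree_nodes": sum(d >= min_degree for d in degrees),
        "largest_component": max(comp_sizes, default=0),
        "pct_in_large_components": 100 * in_large / n_nodes if n_nodes else 0.0,
    }
```

The global clustering coefficient is computed as three times the number of triangles divided by the number of connected triplets, which is the usual definition of global transitivity.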

For each of the discussions, the following metrics were calculated using the R packages igraph [52] and CINNA [55]: the size of the largest connected component, the number of triangles, the global clustering coefficient, the maximal degree, and the number of nodes with a degree of four or more. These were calculated for each of the real discussions and for the corresponding 1000 network graphs with randomized cross-links. For each metric, an estimated sampling distribution was created by bootstrap resampling of the sample of 1000 randomized graphs. Next, the probability that the metric extracted from the real graph belongs to this sampling distribution was calculated, using the R package boot [54, 56]. Table 4 presents the metrics of the real discussion networks, along with the confidence interval (CI) containing an estimated 99% of the distribution of the corresponding metrics from the randomized networks.
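The comparison step can be illustrated with a percentile bootstrap in Python. This is an assumption-laden sketch rather than a reproduction of the analysis: the study used the R package boot, whose interval types and settings may differ, and the function names, the use of the mean as the bootstrapped statistic, and the number of resamples are all hypothetical choices.

```python
import random


def percentile_bootstrap_ci(values, level=0.99, resamples=2000, seed=0):
    """Percentile bootstrap CI for the mean of `values` (e.g. null-model metrics)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the null-model metrics with replacement and record each mean.
    boot_means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(resamples)
    )
    alpha = 1 - level
    lo_idx = int((alpha / 2) * (resamples - 1))
    hi_idx = int((1 - alpha / 2) * (resamples - 1))
    return boot_means[lo_idx], boot_means[hi_idx]


def compare_to_null(real_value, null_values, **kwargs):
    """Report whether a real-network metric lies below, within, or above the CI."""
    lo, hi = percentile_bootstrap_ci(null_values, **kwargs)
    if real_value < lo:
        return "below", (lo, hi)
    if real_value > hi:
        return "above", (lo, hi)
    return "within", (lo, hi)
```

In this reading, a real metric reported as "above" corresponds to a Table 4 value lying to the right of its bracketed interval, as with most of the triangle counts.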

Table 4. Network metrics and their corresponding confidence intervals (CIs).

community | size of the largest component (% of nodes) [CI] | % of nodes in components ≥10 [CI] | number of triangles [CI] | clustering coefficient [CI] | max. degree [CI] | degree ≥4 [CI]
politics | 322 (2.5%) [308, 359] | 11.35 [9.82, 9.9] | 180 [0.12, 0.33] | 0.11 [0.0001, 0.0003] | 18 [7.1, 7.4] | 252 [31, 31.9]
philosophy | 2261 (14.1%) [621, 651] | 14.96 [9.69, 9.76] | 200 [0.26, 0.34] | 0.06 [0.0002, 0.0003] | 35 [7.6, 7.8] | 406 [42, 43]
psychology and neuroscience | 625 (8.6%) [355, 373] | 11.13 [9.56, 9.66] | 173 [0.3, 0.4] | 0.14 [0.0005, 0.0007] | 22 [7.1, 7.3] | 174 [21, 21.7]
Earth science | 41 (0.7%) [44, 47] | 5.22 [3.89, 3.99] | 55 [0.07, 0.11] | 0.15 [0.0003, 0.0005] | 10 [5.7, 5.8] | 56 [4.6, 4.9]
history | 79 (0.6%) [23, 24] | 3.48 [1.19, 1.24] | 49 [0.02, 0.05] | 0.096 [0.0001, 0.0002] | 9 [5.3, 5.4] | 75 [2.9, 3.2]
health | 53 (0.7%) [12.4, 13.1] | 1.6 [0.15, 0.17] | 22 [0.01, 0.04] | 0.11 [0.0002, 0.0006] | 8 [4.3, 4.4] | 32 [0.4, 0.5]
biology | 1392 (5.3%) [34.8, 36.6] | 7.18 [1.96, 2] | 529 [0.03, 0.07] | 0.08 [0.00004, 0.00009] | 85 [5.9, 6] | 336 [9.5, 10]
economics | 46 (0.4%) [16, 17] | 222 [39.5, 42.9] | 33 [0.009, 0.03] | 0.08 [0.00007, 0.0002] | 20 [4.8, 4.9] | 47 [1, 1.7]

In brackets: 99% confidence interval for metric in randomized graphs

The findings presented in Table 4 point out the disparity between the real graphs’ network metrics and the randomized null-model metrics, as most of the real-graph measures fall well outside their corresponding confidence intervals.

The difference in the topological metrics indicates that the networks shaped by the participants’ cross-links differ in topological properties from randomly linked counterparts. These properties include:

1) A higher rate of clustering, indicated by an increased clustering coefficient as well as a marked increase in the absolute number of triangles formed. In all the networks, the average number of triangles across the randomized graphs was near zero, meaning that in most of the randomized graphs no fully connected triplets were formed. Meanwhile, in the real networks, the number of triangles ranged from 22 for the Health community to 529 for the Biology community, and was strongly correlated with the network’s size (Pearson’s r = 0.86). The increased clustering suggests a micro-level organization of the discussion content. While the research did not include systematic content analysis, it is plausible that links between two knowledge units that share a connection to a third unit connect units that are similar in content or revolve around a specific sub-topic. This would also include questions that were viewed as duplicates. Fig 2 presents two examples of such connections from the History community network.

Fig 2. Closed triplets.

Excerpts from the History community network containing closed triplets, with the question titles. Green edges represent links indicating very similar knowledge units (i.e. “duplicates”). See S1 Appendix for links to the original posts.

2) Formation of local focal points (Fig 3), indicated by an increase in the maximal number of connections held by a single node in the real graphs compared to the random graphs, and by a higher number of nodes with four connections or more. In all the networks, the most connected knowledge units in the real graphs had significantly more connections than their counterparts in the randomized graphs. This suggests the formation of knowledge units that are more connected and more central within the network than would have formed randomly, even given a similar daily growth rate. These might be points of interest within the community, knowledge units that refer to more central concepts in the discussion, questions that were repeated multiple times and flagged as duplicates, or any combination of these. The overall presence of more units with a degree of four and above implies that this was not limited to one central unit but reflects multiple local focal points that formed across the network.

Fig 3. Local focal points.

Fig 3(A)–3(C) present examples of such local focal points from the Economics discussion and the Psychology and Neuroscience discussion. See S1 Appendix for links to the original posts.

3) Defragmentation and convergence, indicated by an increase in the share of knowledge units that belong to connected components of ten or more nodes, which was found in all eight discussions, and by the formation of significantly larger connected components than in the randomized networks, which was found in six of the eight discussions (Fig 4). These included the Economics network, in which two large components of similar size were formed (one of 46 nodes and the other of 43). Neither was likely to form within randomly linked networks of the same weekly size and density (the 99% confidence interval for the size of the largest connected component was [16, 17]). The two remaining discussions were those of the Politics and Earth Science communities. In the Politics discussion, the size of the largest connected component was within the confidence interval for the sizes of the largest components produced in the randomly linked graphs. This means that a connected component of similar size was likely to form given the same daily numbers of new knowledge units and cross-links, even if the links were assigned randomly. In the Earth Science discussion network, the largest component was significantly smaller than those found in the randomized networks. This indicates that the cross-links drove the network’s subgraphs to grow or converge into a larger graph less than random links added to the same network would. Notably, the Earth Science community was the smallest of the eight communities observed, both in the size of its network of knowledge units and in its number of participants.

Fig 4. Large connected components.

Components with 10 or more nodes in the real networks compared to networks with randomly produced links. In color: components of 10 nodes or more.

Modularity

The modularity index was calculated for each network’s largest connected component using the Louvain algorithm [57], as implemented in the cluster_louvain function of the igraph R package. The analysis focused on the largest connected components because the modularity of each whole network is trivially very high: the networks are sparse and fragmented into many disconnected components. The maximal modularity index Q found by the Louvain algorithm can range from -1/2 to 1, with values above zero indicating denser connectivity among nodes within modules than with nodes in other modules [58]. Table 5 presents the modularity indices for the largest connected components of each network.
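To make the modularity index concrete, the following Python sketch computes Newman's modularity Q for a given partition of an undirected, unweighted graph, using Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of links inside module c, d_c is the summed degree of its nodes, and m is the total number of links. It does not implement the Louvain optimization that igraph's `cluster_louvain` performs; it only scores a partition, and the function name and input format are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity Q of a partition (hypothetical helper).

    edges: list of undirected (u, v) pairs; membership: dict node -> module label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    inside = defaultdict(int)      # L_c: links with both ends in module c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in module c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single link, splitting the graph into the two triangles gives Q = 5/14 (about 0.357), while putting all nodes in one module gives Q = 0. Louvain searches for the partition that maximizes this quantity.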

Table 5. Modules and modularity of the largest connected components.

community | component size | number of modules | modularity
politics | 322 | 20 | 0.887
philosophy | 2261 | 40 | 0.911
psychology and neuroscience | 625 | 21 | 0.89
Earth science | 41 | 6 | 0.636
history | 79 | 8 | 0.751
health | 53 | 8 | 0.7
biology | 1392 | 38 | 0.876
economics | 46 | 6 | 0.657

The findings presented in Table 5 indicate that the connected components maintain a highly modular inner structure.

Fig 5 displays the Philosophy network’s largest connected component of 2261 nodes. The colors represent different modules.

Fig 5. The modular structure of the Philosophy network’s largest connected component.

Chronological analysis findings

Findings regarding the differences in network metrics suggest that the real networks differed in topology from their counterparts with randomized cross-links. To assess whether these differences grew, shrank, or persisted over time as the discussion networks developed, we used the weekly metrics described in the Materials and Methods section. These included the size of the largest connected component, the maximal degree of a single node, and the number of closed triangles, calculated for the randomly linked networks at “weekly” intervals. Note that these were simulations of networks constructed based on the real weekly characteristics of the networks representing the Stack Exchange discussions. So, for each of the eight communities and each of the three metrics, the data included 100 series of “weekly” observations. The length of each series, in weeks, equaled the duration of the discussion at the time the data were downloaded. This ranged from 322 weeks for the shortest-lived community (Health) to 530 for the longest-lived (Philosophy). All of the metrics increased with the growth of the networks. This was true for both the randomly linked networks and the real networks, with the exception that in some of the randomly linked networks triangles were never formed. However, in every community, triangles were formed in at least some of the randomly linked networks, so overall this metric too had a slightly positive trend. The aim of the chronological analysis was to explore whether the growth of the metrics for the real networks differed significantly from that of the randomly linked networks. To test this, we analyzed the data as follows: based on the 100 weekly series for each metric from the randomly linked networks, we fitted a generalized linear model and estimated a bias-corrected and accelerated (BCa) bootstrap confidence interval for its coefficient [59].
This method generates a bootstrapped confidence interval for the model coefficient by repeatedly resampling “individual subjects” (here, individual simulated series) with replacement, calculating the individual regression coefficients, and using these to construct an estimated sampling distribution from which a confidence interval for the model’s coefficient is extracted. This non-parametric analysis is appropriate for longitudinal (hence dependent) data, as in our case. Next, the regression coefficients were calculated for the metrics drawn from the real networks. Table 6 displays the linear regression coefficients for the week-by-week growth of each of these metrics and, in brackets, the corresponding BCa 99% confidence interval for the coefficients of the randomly linked network metrics (CI). Higher coefficients indicate a steeper trendline, i.e. faster growth.
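The subject-level bootstrap can be sketched in Python as follows: each simulated weekly series is treated as one subject, an ordinary least-squares slope is fitted to each series, and a bias-corrected and accelerated (BCa) interval is built for the mean slope. This is a simplified illustration under stated assumptions (a plain linear trend per series, the mean slope as the statistic, hypothetical function names, at least two series of at least two weeks each), not the generalized linear model the study fitted in R.

```python
import random
from statistics import NormalDist


def ols_slope(series):
    """Least-squares slope of a weekly series against week index 0, 1, 2, ..."""
    n = len(series)
    x_bar = (n - 1) / 2
    y_bar = sum(series) / n
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(series))
    return sxy / sxx


def bca_slope_ci(series_list, level=0.99, resamples=2000, seed=0):
    """BCa bootstrap CI for the mean per-series slope, resampling whole series."""
    slopes = [ols_slope(s) for s in series_list]
    k = len(slopes)
    theta_hat = sum(slopes) / k
    rng = random.Random(seed)
    boot = sorted(sum(rng.choices(slopes, k=k)) / k for _ in range(resamples))

    # Bias correction z0 from the share of bootstrap means below the estimate.
    norm = NormalDist()
    eps = 1 / (2 * resamples)  # keeps the proportion strictly inside (0, 1)
    prop_below = sum(b < theta_hat for b in boot) / resamples
    z0 = norm.inv_cdf(min(max(prop_below, eps), 1 - eps))

    # Acceleration from the jackknife (leave one series out).
    total = sum(slopes)
    jack = [(total - s) / (k - 1) for s in slopes]
    jack_mean = sum(jack) / k
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    def adjusted(q):
        z = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    alpha = 1 - level
    lo_q, hi_q = adjusted(alpha / 2), adjusted(1 - alpha / 2)
    lo = boot[min(int(lo_q * resamples), resamples - 1)]
    hi = boot[min(int(hi_q * resamples), resamples - 1)]
    return lo, hi
```

A real network's slope (fitted with `ols_slope` on its own weekly series) that lies above the returned interval would correspond to the "faster growth" reading of Table 6. Resampling whole series, rather than individual weeks, preserves the dependence between consecutive weekly observations.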

Table 6. Weekly growth regression coefficients for network metrics and the 99% confidence intervals for the regression coefficients of the randomly linked networks.

Community | No. of weeks | size of the largest component: regression coefficient [CI] | max. degree: regression coefficient [CI] | number of triangles: regression coefficient [CI]
politics | 443 | 0.48 [0.47, 0.6] | 0.03 [0.008, 0.009] | 0.37 [0.0002, 0.0008]
philosophy | 529 | 4.41 [0.89, 1.12] | 0.067 [0.0074, 0.0086] | 0.38 [0.0002, 0.0007]
psychology and neuroscience | 520 | 1.22 [0.68, 0.79] | 0.037 [0.009, 0.01] | 0.31 [0.0002, 0.0007]
Earth science | 373 | 0.094 [0.098, 0.13] | 0.01 [0.006, 0.008] | 0.14 [0.0001, 0.0006]
history | 525 | 0.14 [0.035, 0.043] | 0.013 [0.006, 0.008] | 0.099 [0, 0.0002]
health | 323 | 0.123 [0.015, 0.019] | 0.016 [0.003, 0.004] | 0.056 [0, 0.0002]
biology | 523 | 3.34 [0.065, 0.079] | 0.2 [0.09, 0.01] | 1.061 [0.00002, 0.0003]
economics | 340 | 0.11 [0.026, 0.032] | 0.045 [0.004, 0.005] | 0.08 [0, 0.00007]

* p<0.05

**p<0.01

The regression coefficients for the changes over time in the metrics of the real networks were then compared to the simulated distribution of the corresponding coefficients for metrics from the randomly linked networks. This comparison revealed that the number of triangles and the maximal degree of a single node grew, with high probability, faster in the real networks than they would have in randomly linked networks with a similar growth rate in terms of the number of nodes and links. This was found for every one of the eight communities in the study, indicating that over time the real networks tended to cluster more and to produce specific nodes with more connections than randomly linked networks growing at the same pace.

As for the size of the largest connected component, faster growth was found for six of the eight communities: the coefficients for the Philosophy, Psychology and Neuroscience, History, Health, Biology, and Economics communities fell well above their corresponding CIs, indicating faster growth of the largest component than in the randomly linked networks. However, this was not the case for the Earth Science and Politics communities, whose largest components at the time of data collection were also not larger than those of their randomly linked counterparts. The coefficient for the linear growth rate of the Politics community’s largest component fell within the CI for the randomly linked networks, while the Earth Science coefficient was slightly below the CI, indicating slower growth of the largest component than would be expected in a randomly linked network of similar size and link density. A closer look at the growth of the largest components over time reveals different patterns among communities: in some communities the growth appears gradual and nearly linear, as a dominant large component grows by receiving newly connected knowledge units, while in others it grows in abrupt steps, when two or more large components become connected at once during a linking peak. The Politics network belongs to the second category. An even closer look reveals that these peaks coincide with election dates in English-speaking countries (especially the US, but also the UK and Canada). Another such community is Health, in which a giant component emerged only in March 2020, with the rise of the COVID-19 pandemic. In addition, the weekly size of the largest component was determined by extracting the largest component from the network snapshot at the end of each week.
While it is likely that, for the most part, the same components gained volume and remained the largest throughout the progression of the discussion, it is possible that in some of the networks two or more components grew at similar rates, so that the identity of the weekly largest component alternated between weeks, at least in the earlier weeks. Consequently, the nature of the growth of large connected components and its relation to content and external events warrants dedicated, more extensive research. Fig 6 displays the weekly growth of the real networks’ largest connected components vs. the distribution of the largest connected components of the simulated randomly linked networks.

Fig 6. Weekly growth charts of the networks’ largest connected components.

The weekly growth of the largest connected components for each network. In red is the real network, and in shades of gray are the simulated random-link networks.

Discussion

In this work, we used network structural qualities to examine the role of user-generated cross-links as a self-organizing mechanism in large online Q&A discussions. Our cumulative findings indicate that cross-links affect the structure of the discussion network at the micro, mezzo, and macro levels, and do so to a greater extent than would be predicted by a null model with random links replacing the genuine ones. The discussion networks tended to demonstrate a more complex topology than their randomized counterparts. This was indicated by increased clustering, the formation of larger connected components, and the emergence of more highly connected nodes in the networks generated from the real discussion data, in comparison to networks constructed by replacing the cross-links with an equal number of random links, in a process that approximately mimics the growth of the discussion network. We further demonstrated that these effects are generally either maintained or enhanced over time as the discussion progresses, and that their magnitude varies across communities and across stages of community development. The connected subgraphs themselves were found to have a highly modular structure. By comparing the cross-linked networks with corresponding randomly linked networks, we were able to more effectively single out the impact of these purposeful links, as distinct from the general topological effects of an evolving network. Examining eight different communities operating on the same platform provided a further view of how different linking behaviors affect the topologies of the discussions, even under similar conditions in terms of the platform’s affordances.
The chronological analysis of the development of some of the network indicators over time provided another level of observation, and this analysis indicated that, as the Q&A discussions progressed, the difference between the real-graph networks and the randomized model networks tended to grow on all measured aspects in most communities. It also enabled observations on the relations between major events within the content domains and the topology of the discussions, suggesting that notable changes within the content domains (e.g. the eruption of the COVID-19 pandemic or major political events) may be reflected in changes to discussion network topologies. The research adds to a growing literature that looks at Q&A discussions from a networked perspective, with a focus on the artifact layer of the network: the posts and the connections between them. The networks constructed here were not social networks, in the sense that their nodes consist of knowledge units and not of people. However, they were generated through the social activity of sharing knowledge, thoughts, and ideas regarding the connections between knowledge units. This emphasizes the important role that participants can play, by sharing their own associative networks through actively connecting knowledge units, in shaping and structuring the collective knowledge product in a manner that facilitates self-organization. It also highlights some of the possibilities opened up by addressing aspects of collaboration that extend beyond contributing content on the one hand and interacting socially on the other.

Limitations and directions for future research

Some limitations include the arbitrary selection of eight communities to analyze, which could be broadened to include a greater share of the 170+ communities operating within the Stack Exchange network, as well as other relevant platforms. Applying similar analyses to more communities could help either to generalize the findings presented here or to identify their constraints, and also to further explore the differences between communities. Another limitation is the focus on a single mechanism for self-organization, which is part of a more complex networked organizing system that includes tagging by the community as well as a text-based similarity algorithm applied by the platform. This was intentional, to highlight cross-linking as a mechanism that receives less attention. Still, future research could address the multiplex network that can be constructed by combining the different kinds of links, to formulate a more complete view of the knowledge network that emerges from the community’s collaboration. This relates to another limitation: the focus on structural and topological qualities detached from the ontological and semantic qualities of the discussion. Further research should explore the interplay between the two. This was done to a certain extent by Ye et al. for content from the Stack Overflow website, but their analysis was restricted to the content network extending from a single exemplary ‘seed’ unit [22]. Future work could apply topic modeling methods for semantic analyses of entire discussion networks, and then compare these with the network topology formed by the cross-links. The cross-linked network itself could be elaborated to include the subnetworks composing the knowledge units, which include the question, answers, and comments, as a multi-layer network with two different types of edges: sequential links (question-answer-comment) and cross-links.
Analyses of such networks would have to take into account the different types of nodes, i.e. questions, answers, and comments, and the different types of edges, i.e. cross-links vs. hierarchical sequential links.

As the study focused on the network of connected knowledge units, it is important to acknowledge that the majority of the knowledge units, in all of the discussions, did not become connected at all. The share of knowledge units that were linked to others ranged between 14% and 33.5% in the discussions examined, and many of these were linked to only one other unit. This means that looking at the network formed by cross-links leaves out a large part of the content added by the community. Arguably, this could be viewed as a screening process in which a knowledge unit’s linkage serves as some indication of its value to the conversation as a whole. Further work could address this point by examining the factors affecting a unit’s likelihood of becoming connected at all, and specifically of becoming connected to a larger component or even becoming a local focal point with multiple connected units. As explained in the Theory and Conceptual Framework section, the link-to-node ratio of the networks has implications for the emergence of organizing structures: more densely connected networks could potentially lead to more complex structures. However, while the idea of artificially incentivizing cross-linking within the Stack Exchange environment using badges has been raised, it was dismissed. It seems that while the community appreciates cross-links and considers them substantial contributions, the main reason given for refraining from gamifying cross-linking was that it would create an influx of links that would diminish their added value [60]. This makes sense from a network perspective, as gamification might impair the organic nature of the networks’ development. Still, raising participants’ awareness of the value of cross-linking, not through artificial incentives but through community norms and socialization, might contribute to the formation of effectively organized content networks.
Potentially, this point could be studied within these communities through collaborations with community moderators.

In the current design of the Stack Exchange discussions, users have access to the benefits of cross-links as “road signs” directing them to relevant posts within the discussion network. This is realized in two ways. The first is the hyperlink that repliers themselves embed within their answers or comments. The second is more salient and also bi-directional: the platform displays all linked questions alongside the knowledge unit (question, answers, and comments) currently being viewed, regardless of the direction of the link. This enables users to navigate the network along the pathways their peers created for them. The role of cross-links in forming mezzo-level and macro-level network structures, however, is not expressed in the platform’s user interface. Possibly, a feature that feeds information about the current network structure back into the conversation, for example through a visual associative network map of the discussion, could affect the ways users navigate it as well as the community’s perception of its knowledge sharing as a collective product. It could also highlight the participants’ role in organizing the shared content [16, 61]. These potential influences could be further explored. Aside from the potential effects of platform affordances on participant behavior, the role of external influences could also be explored further. The analysis of the growth of connected components over time pointed to the plausible possibility that the discussion network’s topology may also be affected by external developments in the community’s field of interest, with changes reflecting the emergence of new associations among subjects in real life.

Another intriguing direction to follow would be a further inquiry into the dissimilarities between different communities: what might explain a higher or lower tendency to cross-link, as well as changes in this tendency over time? This could be assisted by modeling and comparing the evolution of different discussions, and possibly by creating a generalized model of this process. Exploring participants’ motivations for cross-linking could also help shed light on the process and its outcomes, and on its interplay with other aspects of knowledge-sharing behavior. For instance, some users may be more inclined to link than others. Previous work on smaller-scale discussions has suggested that, as with many online phenomena, a small percentage of participants have a significant influence on shaping the conversation through cross-linking [29].

The current work focused on links within a community. However, answers often contain links to external sources, either other communities within Stack Exchange or links to other websites. Creating an extended network that captures the connections among different communities as well as the external contexts of knowledge could be a fruitful direction.

Conclusion

In the current work, we applied a network approach to the content network created within online multi-participant discussions. This approach was used to demonstrate some of the effects of individual participants’ local actions within the network, which consist of posting and linking, on topological changes in the overall structure of the network. In eight different Q&A communities operating within the Stack Exchange platform, indicators for the emergence of complex structures were found. These point to heightened global cohesiveness and integration, as well as to both micro-level and mezzo-level organizations of the network. The work presented here highlights the role of cross-linking within large-scale online discussions, a largely overlooked form of interaction with knowledge.

The analysis applied to whole networks enabled a broader view of how local linking can have implications for the larger-scale organization of the discussion as a network. The findings suggest that by altering the network topology, the combination of posting and linking can act as a self-organizing mechanism that allows for the formation of complex, non-linear organizing structures. The findings support the view of cross-linked posts within a virtual knowledge collaboration as a unique form of information network [32], joining other recognized information networks such as citation networks or recommender networks. They further strengthen the emerging view that challenges the traditional distinction between the collective product itself and the activities undertaken to coordinate it [62]. So, while from the participant’s point of view cross-links can act as a tool for navigating the discussion, moving from one knowledge unit to another along a path laid by a knowledgeable peer, from the community’s standpoint they may be a means of promoting the formation of a collective knowledge-network product. This highlights the sharing of associative connections as yet another level of interaction with content in virtual knowledge collaborations, serving both path creation and collective knowledge organization. In an individual cognitive setting, the emergence of macro-level structures within networks of content via micro-level connections can be considered an indication of self-organization [24]. In a collaborative setting, such structures may serve as an apt basis for the emergence of novel knowledge and collaborative meaning-making [31]. From a broader perspective, the approach presented here is inspired by work on individual cognitive systems and applied to a collective product formed through multiple interactions.
This direction could be further developed to explore possible manifestations of other cognitive processes at the collective content level.

Supporting information

S1 Appendix. References for Stack Exchange questions mentioned in Figs 2 and 3.

(DOCX)

pone.0300179.s001.docx (16.2KB, docx)

Acknowledgments

We would like to acknowledge the valuable contributions of Dr. Amit Rechavi to this research. Dr. Rechavi’s expertise in network analysis and his input regarding the methodological aspects of the research were invaluable in the completion of this work. We would also like to thank Dr. Neta Gilat for her helpful statistical advice, and Dr. Carmel Kent for her valuable contribution to the conceptualizations that preceded the current work and inspired it.

Data Availability

The data underlying the results presented in the study are available at https://osf.io/9xvnm. The data were originally downloaded from the Stack Exchange Data Dump at https://archive.org/details/stackexchange.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Zhang Y, Zhang M, Luo N, Wang Y, Niu T. Understanding the formation mechanism of high-quality knowledge in social question and answer communities: A knowledge co-creation perspective. Int J Inf Manage. 2019;48: 72–84. doi: 10.1016/j.ijinfomgt.2019.01.022
  • 2. Khansa L, Ma X, Liginlal D, Kim SS. Understanding Members’ Active Participation in Online Question-and-Answer Communities: A Theory and Empirical Analysis. J Manag Inf Syst. 2015;32: 162–203. doi: 10.1080/07421222.2015.1063293
  • 3. Kimmerle J, Cress U, Held C. The interplay between individual and collective knowledge: Technologies for organisational learning and knowledge building. Knowl Manag Res Pract. 2010;8: 33–44. doi: 10.1057/kmrp.2009.36
  • 4. Dankulov MM, Melnik R, Tadić B. The dynamics of meaningful social interactions and the emergence of collective knowledge. Sci Rep. 2015;5: 1–10. doi: 10.1038/srep12197
  • 5. Haythornthwaite C, Kumar P, Gruzd A, Gilbert S, Esteve del Valle M, Paulin D. Learning in the wild: coding for learning and practice on Reddit. Learn Media Technol. 2018;43: 219–235. doi: 10.1080/17439884.2018.1498356
  • 6. Faraj S, Jarvenpaa SL, Majchrzak A. Knowledge collaboration in online communities. Organ Sci. 2011;22: 1224–1239. doi: 10.1287/orsc.1100.0614
  • 7. Faraj S, von Krogh G, Monteiro E, Lakhani KR. Online community as space for knowledge flows. Inf Syst Res. 2016;27: 668–684. doi: 10.1287/isre.2016.0682
  • 8. Bacq S, Geoghegan W, Josefy M, Stevenson R, Williams TA. The COVID-19 Virtual Idea Blitz: Marshaling social entrepreneurship to rapidly respond to urgent grand challenges. Bus Horiz. 2020;63: 705–723. doi: 10.1016/j.bushor.2020.05.002
  • 9. Campbell CD, Challen B, Turner KL, Stewart MI. #DryLabs20: A New Global Collaborative Network to Consider and Address the Challenges of Laboratory Teaching with the Challenges of COVID-19. 2020. doi: 10.1021/acs.jchemed.0c00884
  • 10. Fritz S, Milligan I, Ruest N, Lin J. Building community at distance: a datathon during COVID-19 distance. 2020. doi: 10.1108/DLP-04-2020-0024
  • 11. Jandrić P. Postdigital Research in the Time of Covid-19. Postdigital Sci Educ. 2020; 233–238. doi: 10.1080/08039410.2020.1780714
  • 12. Guan T, Wang L, Jin J, Song X. Knowledge contribution behavior in online Q&A communities: An empirical investigation. Comput Human Behav. 2018;81: 137–147. doi: 10.1016/j.chb.2017.12.023
  • 13. Cress U, Kimmerle J. A systemic and cognitive view on collaborative knowledge building with wikis. Int J Comput Collab Learn. 2008;3: 105–122. doi: 10.1007/s11412-007-9035-z
  • 14. Mayordomo RM, Onrubia J. Work coordination and collaborative knowledge construction in a small group collaborative virtual task. Internet High Educ. 2015;25: 96–104. doi: 10.1016/j.iheduc.2015.02.003
  • 15. Stahl G. Group cognition in computer-assisted collaborative learning. J Comput Assist Learn. 2005;21: 79–90. doi: 10.1111/j.1365-2729.2005.00115.x
  • 16. Kent C, Rafaeli S. How interactive is a semantic network? Concept maps and discourse in knowledge communities. Proceedings of the Annual Hawaii International Conference on System Sciences. 2016. pp. 2095–2104. doi: 10.1109/HICSS.2016.265
  • 17. Andjelković M, Tadić B, Dankulov MM, Rajković M, Melnik R. Topology of innovation spaces in the knowledge networks emerging through questions-and-answers. PLoS One. 2016;11: 1–17. doi: 10.1371/journal.pone.0154655
  • 18. Krohn R, Weninger T. Subreddit Links Drive Community Creation and User Engagement on Reddit. Proc Int AAAI Conf Web Soc Media. 2022;16: 536–547. doi: 10.1609/icwsm.v16i1.19313
  • 19. Yu L, Nickerson JV, Sakamoto Y. Collective Creativity: Where we are and where we might go. Proceedings of collective intelligence. 2012. Available: http://arxiv.org/abs/1204.3890
  • 20. Kimmerle J, Moskaliuk J, Oeberst A, Cress U. Learning and Collective Knowledge Construction With Social Media: A Process-Oriented Perspective. Educ Psychol. 2015;50: 120–137. doi: 10.1080/00461520.2015.1036273
  • 21. Sher N, Kent C, Rafaeli S. Creativity Is Connecting Things: The Role of Network Topology in Fostering Collective Creativity in Multi-Participant Asynchronous Online Discussions. 53rd Hawaii International Conference on System Sciences. 2020. pp. 310–319.
  • 22. Ye D, Xing Z, Kapre N. The structure and dynamics of knowledge network in domain-specific Q&A sites: a case study of stack overflow. Empir Softw Eng. 2017;22: 375–406. doi: 10.1007/s10664-016-9430-z
  • 23. Heylighen F. Complexity and Self-organization. In: Laplante PA, editor. Encyclopedia of Information Systems and Technology. CRC Press; 2015. pp. 250–259.
  • 24. Baronchelli A, Ferrer-i-Cancho R, Pastor-Satorras R, Chater N, Christiansen MH. Networks in Cognitive Science. Trends Cogn Sci. 2013;17: 348–360. doi: 10.1016/j.tics.2013.04.010
  • 25. Adamic LA, Zhang J, Bakshy E, Ackerman MS. Knowledge sharing and yahoo answers: Everyone knows something. Proceeding 17th Int Conf World Wide Web 2008, WWW’08. 2008; 665–674. doi: 10.1145/1367497.1367587
  • 26. Ojeda C, Cvejoski K, Sifa R, Bauckhage C. Inverse dynamical inheritance in stack exchange taxonomies. Proceedings of the 11th International Conference on Web and Social Media, ICWSM 2017. 2017. pp. 644–647. Available: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85029457963&partnerID=40&md5=eccac3f3ac88551038ae2c711d227b2e
  • 27. Fu X, Yu S, Benson AR. Modelling and analysis of tagging networks in Stack Exchange communities. J Complex Networks. 2020;8: cnz045. doi: 10.1093/comnet/cnz045
  • 28. Rechavi A, Rafaeli S. Knowledge and social networks in Yahoo! Answers. Proc Annu Hawaii Int Conf Syst Sci. 2012; 781–789. doi: 10.1109/HICSS.2012.398
  • 29. Sher N, Kent C, Rafaeli S. How ‘Networked’ are Online Collaborative Concept-Maps? Introducing Metrics for Quantifying and Comparing the ‘Networkedness’ of Collaboratively Constructed Content. Educ Sci. 2020;10: 1–16. doi: 10.3390/educsci10100267
  • 30. Rodi GC, Loreto V, Servedio VDP, Tria F. Optimal learning paths in information networks. Sci Rep. 2015;5: 1–10. doi: 10.1038/srep10286
  • 31. Heylighen F, Heath M, Van Overwalle F. The Emergence of Distributed Cognition: a conceptual framework. Proceedings of collective intentionality IV. 2004. p. 20. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.201.9282
  • 32. Newman M. Networks. Oxford: Oxford University Press; 2018.
  • 33. Gašević D, Joksimović S, Eagan BR, Shaffer DW. SENS: Network analytics to combine social and cognitive perspectives of collaborative learning. Comput Human Behav. 2018;92: 562–577. doi: 10.1016/j.chb.2018.07.003
  • 34. Marchegiani L, Brunetta F, Annosi MC. Faraway, Not so Close: The Conditions That Hindered Knowledge Sharing and Open Innovation in an Online Business Social Network. IEEE Trans Eng Manag. 2022;69: 451–467. doi: 10.1109/TEM.2020.2983369
  • 35. Lee AVY, Tan SC. Temporal Analytics with Discourse Analysis: Tracing Ideas and Impact on Communal Discourse. Proc Seventh Int Learn Anal & Knowl Conf. 2017; 120–127. doi: 10.1145/3027385.3027386
  • 36. De Deyne S, Kenett YN, Anaki D, Faust M, Navarro DJ. Large-scale network representations of semantics in the mental lexicon. In: Jones MN, editor. Big Data in Cognitive Science. Routledge/Taylor & Francis Group; 2016. pp. 174–202.
  • 37. Muchnik L, Itzhack R, Solomon S, Louzoun Y. Self-emergence of knowledge trees: Extraction of the Wikipedia hierarchies. Phys Rev E – Stat Nonlinear, Soft Matter Phys. 2007;76: 1–12. doi: 10.1103/PhysRevE.76.016106
  • 38. Ravasz E, Barabási AL. Hierarchical organization in complex networks. Phys Rev E – Stat Physics, Plasmas, Fluids, Relat Interdiscip Top. 2003;67: 7. doi: 10.1103/PhysRevE.67.026112
  • 39. Newman MEJ, Girvan M. Finding and evaluating community structure in networks. 2003; 1–16. doi: 10.1103/PhysRevE.69.026113
  • 40. Bullmore E, Sporns O. Complex brain networks: Graph theoretical analysis of structural and functional systems. Nat Rev Neurosci. 2009;10: 186–198. doi: 10.1038/nrn2575
  • 41. Latora V, Nicosia V, Russo G. Complex Networks: Principles, Methods and Applications. Cambridge University Press; 2017. doi: 10.1017/9781316216002
  • 42. Stack Exchange. 2022 [cited 31 Jan 2022]. Available: https://stackexchange.com/
  • 43. Barua A, Thomas SW, Hassan AE. What are developers talking about? An analysis of topics and trends in Stack Overflow. Empir Softw Eng. 2014;19: 619–654. doi: 10.1007/s10664-012-9231-y
  • 44. Rosen C, Shihab E. What are mobile developers asking about? A large scale study using stack overflow. Empir Softw Eng. 2016;21: 1192–1223. doi: 10.1007/s10664-015-9379-3
  • 45. Anderson A, Huttenlocher D, Kleinberg J, Leskovec J. Discovering value from community activity on focused question answering sites: A case study of stack overflow. Proc ACM SIGKDD Int Conf Knowl Discov Data Min. 2012; 850–858. doi: 10.1145/2339530.2339665
  • 46. Fugelstad P, Dwyer P, Filson Moses J, Kim J, Mannino CA, Terveen L, et al. What Makes Users Rate (Share, Tag, Edit…)? Predicting Patterns of Participation in Online Communities. Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work. 2012. pp. 969–978. doi: 10.1145/2145204.2145349
  • 47. Pal A, Chang S, Konstan JA. Evolution of experts in question answering communities. ICWSM 2012 – Proceedings of the 6th International AAAI Conference on Weblogs and Social Media. 2012. pp. 274–281.
  • 48. Rahman MM, Yeasmin S, Roy CK. Towards a context-aware IDE-based meta search engine for recommendation about programming errors and exceptions. 2014 Softw Evol Week – IEEE Conf Softw Maintenance, Reengineering, Reverse Eng CSMR-WCRE 2014 – Proc. 2014; 194–203. doi: 10.1109/CSMR-WCRE.2014.6747170
  • 49. Treude C, Barzilay O, Storey MA. How do programmers ask and answer questions on the web? (NIER track). Proceedings – International Conference on Software Engineering. 2011. pp. 804–807. doi: 10.1145/1985793.1985907
  • 50. Guerrouj L, Azad S, Rigby PC. The influence of App churn on App success and StackOverflow discussions. 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2015 – Proceedings. IEEE; 2015. pp. 321–330. doi: 10.1109/SANER.2015.7081842
  • 51. Stack Exchange Inc. Stack Exchange Data Dump. 2021. Available: https://archive.org/details/stackexchange
  • 52. Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal, complex Syst. 2006;1695: 1–9.
  • 53. Farine DR. A guide to null models for animal social network analysis. Methods Ecol Evol. 2017;8: 1309–1320. doi: 10.1111/2041-210X.12772
  • 54. Davison AC, Hinkley DV. Bootstrap Methods and Their Applications. Cambridge: Cambridge University Press; 1997. Available: http://statwww.epfl.ch/davison/BMA/
  • 55. Ashtiani M. CINNA: Deciphering Central Informative Nodes in Network Analysis. 2019. Available: https://cran.r-project.org/package=CINNA
  • 56. Canty A, Ripley BD. boot: Bootstrap R (S-Plus) Functions. 2021.
  • 57. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech theory Exp. 2008;2008: P10008.
  • 58. Barabási AL. Network Science. Cambridge University Press; 2016.
  • 59. Deen M, de Rooij M. ClusterBootstrap: An R package for the analysis of hierarchical data using generalized linear models with the cluster bootstrap. Behav Res Methods. 2020;52: 572–590. doi: 10.3758/s13428-019-01252-y
  • 60. Should we have a badge for users pointing out relevant related questions. In: Meta Stack Exchange [Internet]. 2016 [cited 23 Jul 2023]. Available: https://meta.stackexchange.com/questions/274461/should-we-have-a-badge-for-users-pointing-out-relevant-related-questions?noredirect=1&lq=1
  • 61. Kent C, Rechavi A. Deconstructing online social learning: network analysis of the creation, consumption and organization types of interactions. Int J Res Method Educ. 2018; 1–22. doi: 10.1080/1743727X.2018.1524867
  • 62. Bolici F, Howison J, Crowston K. Stigmergic coordination in FLOSS development teams: Integrating explicit and implicit mechanisms. Cogn Syst Res. 2016;38: 14–22. doi: 10.1016/j.cogsys.2015.12.003

Decision Letter 0

Qaisar Shaheen

23 Jan 2023

PONE-D-22-31022

Associative linking for collaborative thinking: self-organization of content in online Q&A communities

PLOS ONE

Dear Dr. Sher,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 09 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Qaisar Shaheen, Ph.D

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Methods section, please include additional information about how your Stack Exchange dataset was collected and ensure that you have included a statement specifying whether the collection and analysis method complied with the terms and conditions for the source of the data.

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

Additional Editor Comments:

Revise the manuscript as per the reviewers' suggestions.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript “Associative linking for collaborative thinking: self-organization of content in online Q&A communities” shows the results of an analysis conducted on a Q&A dataset. The work highlights the topological properties of the corresponding network, obtained by considering knowledge units as nodes and cross-links mentioning other knowledge units as edges. In particular, the obtained networks are compared with random networks used as null models, showing that the topology of real networks is not reflected by the randomized counterpart.

The measures that are used for comparison are the size of largest component, the existence of hubs, clustering, and modularity.

The results that the authors find are not surprising and totally in line with previous works. It is indeed largely known in network science that random networks do not present clustering or other topological structures that are usually found in real networks [1,2,3].

For this reason, I find hypotheses H1 and H2 a bit banal; there was no reason to expect a different result.

I want to highlight that I found nothing wrong in the article, it is just that it should not be presented as a new discovery but just an analysis on a new dataset that confirms results that have been previously found on several types of real networks.

I suggest reorganizing the text and reformulating the scientific question as looking for a confirmation of known results in the Q&A network too.

[1] Newman M. Networks. Oxford university press, 2018.

[2] Barabasi A. Network Science. Cambridge University Press, 2016.

[3] Latora V., Nicosia V., and Russo G. Complex networks: principles, methods and applications. Cambridge University Press, 2017.

Minor:

line 121: it seems that a sentence’s end is missing

Line 152: “organic bottom-up means” —> “organic bottom-up mean”

- Line 140: The sentence “the network consists of content units composed of question posts and their corresponding answers, connected by cross-links embedded by participants within their answers.” is not clear enough to explain how networks are built because the cross-links are explained only later, in the Methodology section.

Reviewer #2: The presentation of the article is commendable, but it shows some deficiencies that must be rectified before a decision is made. These are as follows:

Concern # 1: The article's abstract must reflect the literature article's recommendation.

Concern # 2: The background is too lengthy. Try to compress it.

Concern # 3: Elaborate the concept with suitable diagrams as visual things elaborate the concept far better than the textual.

Concern # 4: Expand the scalability of the article by including the research gap with prior work

Concern # 5: Read and cite the following articles

1.“Work coordination and collaborative knowledge construction in a small group collaborative virtual task”

2.“Faraway, Not So Close: The Conditions That Hindered Knowledge Sharing and Open Innovation in an Online Business Social Network”

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Dr. Rizwan Ali Shah

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Mar 11;19(3):e0300179. doi: 10.1371/journal.pone.0300179.r002

Author response to Decision Letter 0


29 Mar 2023

Editor:

Additional information and a data statement were added (p. 11).

Reviewer 1:

1. The results that the authors find are not surprising and totally in line with previous works. It is indeed largely known in network science that random networks do not present clustering or other topological structures that are usually found in real networks [1,2,3].

For this reason, I find hypotheses H1 and H2 a bit banal; there was no reason to expect a different result.

I want to highlight that I found nothing wrong in the article, it is just that it should not be presented as a new discovery but just an analysis on a new dataset that confirms results that have been previously found on several types of real networks.

I suggest reorganizing the text and reformulating the scientific question as looking for a confirmation of known results in the Q&A network too.

response: Thank you for this opportunity to clarify an important point regarding the goal of the research.

The goal of this work was to use this known quality of real networks to make the point that the linked posts do form information networks, and that this is enabled by the users' individual linking activity.

The hypotheses addressed the ability of user-generated links to introduce organization to the otherwise fragmented knowledge within the environment, and highlighted the role of cross-links within this process.

To further clarify this point and address the properties of the cross-link networks within the context of real-world networks, we've revised the phrasing of the hypotheses (see pp. 9–10).

line 121: it seems that a sentence’s end is missing

response:

Thank you, this was a leftover from a previous version and was removed.

Line 152: “organic bottom-up means” —> “organic bottom-up mean”

response: Means as in means to an end.

- Line 140: The sentence “the network consists of content units composed of question posts and their corresponding answers, connected by cross-links embedded by participants within their answers.” is not clear enough to explain how networks are built because the cross-links are explained only later, in the Methodology section.

response: This is an excellent point; the sentence was rephrased to address and explain cross-links explicitly:

In our case, the network consists of nodes comprising content units of question posts and edges based on cross-links, the hyperlinks embedded by participants within their answers that refer to other content units within the same discussion.

Suggested citations:

[1] Newman M. Networks. Oxford university press, 2018.

[2] Barabasi A. Network Science. Cambridge University Press, 2016.

[3] Latora V., Nicosia V., and Russo G. Complex networks: principles, methods and applications. Cambridge University Press, 2017.

response: Thank you for your highly relevant suggestions. We've integrated both Newman (2018) and Latora et al. (2017) into the manuscript. Barabasi (2016) was already cited within the paper. The additions are as follows:

Newman was cited in the introduction and in the conclusion by integrating the cross-linked discussions into Newman's classification of "networks of information". An important insight from the work by Latora et al. (2017) into the role played by communities within networks was integrated into the manuscript.

Reviewer 2

The article's abstract must reflect the literature article's recommendation

response: The abstract was revised; we hope the revisions reflect your input.

The background is too lengthy. Try to compress it.

response: The background has been shortened.

Elaborate the concept with suitable diagrams as visual things elaborate the concept far better than the textual.

response: We've added another figure to illustrate the concept of cross-links.

Expand the scalability of the article by including the research gap with prior work

response: See the revised version of the paragraph beginning on line 119.

Read and cite the following articles

1.“Work coordination and collaborative knowledge construction in a small group collaborative virtual task”

2.“Faraway, Not So Close: The Conditions That Hindered Knowledge Sharing and Open Innovation in an Online Business Social Network”

response: Thank you for the references; the papers have been integrated into the manuscript (pp. 3 and 5).

Attachment

Submitted filename: Response to Reviewers.docx

pone.0300179.s002.docx (19KB, docx)

Decision Letter 1

Carlos Henrique Gomes Ferreira

14 Jun 2023

PONE-D-22-31022R1

Associative linking for collaborative thinking: self-organization of content in online Q&A communities via user-generated links

PLOS ONE

Dear Dr. Sher,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I would like to emphasize the importance of addressing the points raised by reviewers 1 and 3 as minor revisions, as I genuinely believe that this will result in a manuscript of outstanding quality, which will be highly regarded and valued by the interdisciplinary community. Thoughtfully incorporating the reviewers' comments will help strengthen the potential of your article, making it even more impactful.

Reviewer 3, in particular, has identified critical issues that, if addressed to some extent, can significantly enhance the contribution of your work. I acknowledge the effort and dedication you have invested in this research and encourage you to consider these suggestions toward the manuscript's acceptance. I have full confidence that, by implementing the proposed improvements, the final outcome will be a high-quality manuscript of significant relevance.

I sincerely appreciate the ongoing efforts you will dedicate to enhancing the article, and I look forward to reviewing the revised version, which I am confident will meet the expectations of the scientific community.

Please submit your revised manuscript by Jul 29 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Carlos Henrique Gomes Ferreira, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

********** 

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

********** 

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

********** 

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

********** 

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

********** 

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In my previous review I highlighted that the results reported in this article appear neither new nor surprising; they are actually quite expected. I therefore suggested making this more evident in the paper. The authors agreed, but the only change they made with respect to my concern was to add a couple of sentences to hypotheses H1 and H2, while I would have expected a more substantial change in how the story is told. I suggest modifying the abstract and the introduction in this direction too. Also, it would be useful if the changes in the text were highlighted so that they can be easily identified.

I also want to clarify that the references that I mentioned in my review were only meant to sustain my thesis that the results are not new, not a suggestion to include them in the article.

Reviewer #2: (No Response)

Reviewer #3: The very idea of the manuscript can be stated, in my opinion, as follows: how can cross-links (hyperlinks embedded by participants within their answers that refer to other content units within the same discussion) shape Q&A network topology? It clearly tries to comprehend real-world Q&A networks by means of complex network science, since these features, once identified, can be studied or classified using the tools of this theory. Following the text, one can reckon that cross-links act not as shortcuts but as bridges among different Q&A contents. These cross-links are created by a participant associating two unrelated contents. The null synthetic network used to compare results is then a random network, which in turn attaches links among nodes with a given probability. So the authors are associating cross-links with links placed by chance in the synthetic network counterpart.

Points:

1) From the point of view of network theory, it is a "meta-population" (in network-theory jargon), since inside each sub-network there are a number of nodes associated with questions, answers and comments, and links among them. This was not taken into account. In effect, it was handled simply as nodes connected by chance to other nodes, meaning random links among different contents. I do not think these are surprising or noteworthy findings. The mapping of a real network onto a random one, in this context, does not correspond to the usual modeling in network science.

2) Why not consider a cross-link, in the sense of connecting different sub-networks, in the context of a "network of networks"? That would seem a more natural mapping, since the connectivity distribution of each sub-network seems to play a relevant/substantial role in the attention participants give to "creating" a cross-link with another relevant sub-network. Thus, each layer would correspond to a network with specific content, whereas a cross-link would connect different layers.

3) I doubt whether the concept of a "hub" can be used for a connectivity distribution where most nodes have a single connection, whereas the so-called "hubs" have four links... The Methodology section describes eight websites corresponding to eight domains (networks) of content. Nodes are related to questions, besides answers and comments posted in response to questions. I believe that a more interesting question would be, for example, "what is the mathematical description of the connectivity distribution of cross-links in a Q&A networked system?"

For example, let us take this sentence:

"In all the networks, the most connected content units in the real graphs had significantly more connections than their counterparts in the randomized graphs."

I really do not understand what novel information a magazine's reader can take from this... "The random modeling does not capture the fundamental mechanism of a real Q&A network. So, it is not random, because participants do not choose by chance what they will respond to, ask, or comment on."

Another weak point:

"These included the Economics network in which two large components of similar size were formed (one consisting of 46 nodes and the other of 43)"

I was expecting some statistical treatment of this small number of elements. In my opinion, without such data treatment, a comparison with a statistical ensemble of 1000 elements (the null model) is doubtful, since no strong fluctuations can permeate the network realizations. Remember that for random networks the average degree ⟨k⟩ is approximately k, but not necessarily so in the real-network counterpart.

With all this considered, I do not recommend the current version of the manuscript for publication. I do not believe the work is able to attract readers' attention.

********** 

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Dr. Rizwan Ali Shah

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Mar 11;19(3):e0300179. doi: 10.1371/journal.pone.0300179.r004

Author response to Decision Letter 1


28 Jul 2023

Response to reviewers

We would first like to thank you all for taking the time and making the effort to review this paper. Your feedback has been instrumental. We realize that some major points within the manuscript were not clear enough and needed further clarification. Mainly, it is important to emphasize that this research aimed to enrich our understanding of the collective creation of knowledge, by addressing the largely overlooked perspective of the shared product of virtual knowledge collaborations as a holistic unit. While this was assisted by network science tools, these were not the main thrust of the paper in terms of novelty. We set out to point to the emergent networked nature of collectively created organizing structures, based on shared linking, and not to create a mathematical model for this phenomenon. While this could be a fascinating direction for future work, it is beyond the scope of the current one. We believe that the current literature has not given enough attention to the concept of shared linking as a means of creating a collective networked product, and its significance as a networked phenomenon has to first be explored, introduced, and acknowledged, before further investigation. Accordingly, the paper's aims and scope were further clarified. See the revised abstract, introduction, and discussion sections.

See more detailed responses to points raised within the reviews in the following table (all page numbers apply to the tracked-changes version of the manuscript):

Suggestion / Response

Reviewer 1:

In my previous review, I highlighted that the results reported in this article appear neither new nor surprising; they are actually quite expected. I therefore suggested making this more evident in the paper. The authors agreed, but the only change they made with respect to my concern was to add a couple of sentences to hypotheses H1 and H2, while I would have expected a more substantial change in how the story is told. I suggest modifying the abstract and the introduction in this direction too. Also, it would be useful if the changes in the text were highlighted so that they can be easily identified.

Response: As you have pointed out, the findings regarding the topology of the cross-linked networks are not surprising given the assumption that these qualify as real-world networks. However, the main idea of this research is not to establish these well-known phenomena in the study of real-world networks but rather to harness them to demonstrate that these unique kinds of networks, which have not received much attention in the existing literature of virtual knowledge collaborations, can be regarded as a collective product in the form of a network. This had to first be established, and then further explored by examining the emergence of modularity, the emergence of organizing structures over time, and the qualitative insights regarding the relations between the network properties and concurrent real-world events. As this has not been clear enough in the previous versions of the manuscript, several changes were made, including a re-phrasing of the abstract and the research hypotheses in this direction.

(See the version with highlighted tracked changes, especially the abstract and introduction sections, and page 12.)

Reviewer 3

1) From the point of view of network theory, it is a "meta-population" (in network-theory jargon), since inside each sub-network there are a number of nodes associated with questions, answers and comments, and links among them. This was not taken into account. In effect, it was handled simply as nodes connected by chance to other nodes, meaning random links among different contents. I do not think these are surprising or noteworthy findings. The mapping of a real network onto a random one, in this context, does not correspond to the usual modeling in network science.

2) Why not consider a cross-link, in the sense of connecting different sub-networks, in the context of a "network of networks"? That would seem a more natural mapping, since the connectivity distribution of each sub-network seems to play a relevant/substantial role in the attention participants give to "creating" a cross-link with another relevant sub-network. Thus, each layer would correspond to a network with specific content, whereas a cross-link would connect different layers.

Response: As you correctly point out, the full network of content within the Q&A discussion indeed includes questions, answers, and comments, and so the cross-linked network could be viewed as a "network of networks" comprising the connected content units as an overarching network with the question-answers-comments components as sub-networks. However, these subnetworks are by design hierarchical tree-shaped networks, and so emergent topological structures, which were the focus of this work, cannot appear within them. This phenomenon occurs within the question layer, as questions, and not answers or comments, become linked. For this reason, the more simplified network of content units was fit for purpose. Future work could further dive into other properties such as the number of answers and comments as a metric for a content-unit's significance. This explanation has been added to the paper (see p.6, p.9 in the tracked changes version).

As for the mapping of the network, the randomized simulations were not an attempt to find a mathematical equation describing the network. Their role was in creating an estimated sampling distribution, using bootstrap resampling, to form statistical confidence intervals for the randomly linked networks' metrics, which served as a basis for comparison with the real metrics. Each of these simulations included 1000 iterations, which were used for extracting the metrics and creating the estimated sampling distributions. Importantly, we would like to emphasize here that the current work was not aimed at creating a model of such a network but rather at establishing the "networkedness" of the collective artifact, by demonstrating that it harbors common real-world network properties that point to self-organization, as opposed to a random formation. This point has been added to the paper, with a more detailed explanation of the statistical process, which we agree was not explained properly in the previous version (see changes in the abstract, introduction, and methodological sections).

3) I doubt whether the concept of a "hub" can be used for a connectivity distribution where most nodes have a single connection, whereas the so-called "hubs" have four links... The Methodology section describes eight websites corresponding to eight domains (networks) of content. Nodes are related to questions, besides answers and comments posted in response to questions. I believe that a more interesting question would be, for example, "what is the mathematical description of the connectivity distribution of cross-links in a Q&A networked system?"

For example, let us take this sentence:

"In all the networks, the most connected content units in the real graphs had significantly more connections than their counterparts in the randomized graphs."

I really do not understand what novel information a magazine's reader can take from this... "The random modeling does not capture the fundamental mechanism of a real Q&A network. So, it is not random, because participants do not choose by chance what they will respond to, ask, or comment on."

Response: We acknowledge your comment on the use of the concept of "hubs". Its use was an adaptation, as the cross-linked networks are very sparse, which is an artifact of the domain, and so what would commonly be referred to as "hubs" would be extremely rare, if possible at all. We added a paragraph that explains this and changed the wording accordingly (see p.9).

As for the mathematical equation, as mentioned, this was not the focus of the current work, and its significance is greater for the literature on collective cognition and the collaborative creation of knowledge than for the literature in network science. The main point was to demonstrate how individual activity that consists of sharing links results in changes to the overall structure of the network, which can be considered an organizing process. Future work could dive deeper to discover a mathematical description for this process, and perhaps use that to compare different discussions, as there seems to be notable variance between communities which is worth exploring.

Another weak point:

"These included the Economics network in which two large components of similar size were formed (one consisting of 46 nodes and the other of 43)"

I was expecting some statistical treatment of this small number of elements. In my opinion, without such data treatment, a comparison with a statistical ensemble of 1000 elements (the null model) is doubtful, since no strong fluctuations can permeate the network realizations. Remember that for random networks the average degree ⟨k⟩ is approximately k, but not necessarily so in the real-network counterpart.

response: The description of the statistical procedure was revised and further clarified to address this point. For each real network, 1000 randomly linked networks of similar size and growth rate were generated, and each metric was extracted from every one of them, yielding 1000 corresponding values per metric that act as a sample of randomly linked networks of similar size and growth rate. For each metric and network, these 1000 "observations" were used to create an estimated sampling distribution through bootstrap resampling, from which a 99% confidence interval was constructed. A real-network value falling outside this interval indicates that it is unlikely to be the product of random linking, which strengthens the hypothesis that the cross-links introduce emergent order into the otherwise unorganized (and fragmented) discussions (see additions on pp. 12 and 17).
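The procedure described in this response can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' code: the letter does not say which statistic was bootstrapped, so the sketch assumes the mean of the 1000 simulated metric values and uses a percentile bootstrap. The function names `bootstrap_ci` and `outside_ci` are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=10_000, level=0.99, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Assumption: `values` holds one metric (e.g. clustering) measured on each
    of the simulated randomly linked networks for a single real network.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample the simulated metric values with replacement; keep each mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    alpha = 1.0 - level
    # Indices of the lower and upper percentiles (0.5th and 99.5th for 99%).
    lo_idx = round((alpha / 2) * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return boot_means[lo_idx], boot_means[hi_idx]


def outside_ci(observed, ci):
    """True when the real-network metric falls outside the interval."""
    lo, hi = ci
    return observed < lo or observed > hi
```

Usage: with `sims` holding one metric from each of the 1000 randomized networks, `outside_ci(observed, bootstrap_ci(sims))` flags whether the real network's value lies outside the 99% interval. An interval for the mean of the simulated values is narrower than the spread of the simulated values themselves; taking the 0.5th and 99.5th percentiles of the 1000 raw values directly would be a different, wider reference range.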

Attachment

Submitted filename: Response to reviwers.docx

pone.0300179.s003.docx (19.6KB, docx)

Decision Letter 2

Carlos Henrique Gomes Ferreira

16 Oct 2023

PONE-D-22-31022R2

Associative linking for collaborative thinking: self-organization of content in online Q&A communities via user-generated links

PLOS ONE

Dear Dr. Sher,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 30 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Carlos Henrique Gomes Ferreira, Ph.D.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: All comments have been addressed

Reviewer #4: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #3: Partly

Reviewer #4: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #3: No

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I would have appreciated it if the authors had highlighted in a different color the changes that they made to the text. Anyway, they have modified and improved the text, so I suggest acceptance.

Reviewer #3: I am not convinced by the answers provided by the authors... These answers are essentially devoted to including some explanations in the text. For example:

"We acknowledge your comment on the use of the concept of "hubs". Its use was an adaptation, as the cross-linked networks are very sparse, which is an artifact of the domain, and so what would commonly be referred to as "hubs" would be extremely rare, if possible at all. We added a paragraph that explains this and changed the wording accordingly (see p.9)."

So, essentially, the authors' response was an "adaptation" of what the reviewers had pointed out. It gives the impression of "we are speaking to the reader in the language of complex networks, but not quite".

Another interesting point can be highlighted with the following example:

"As for the mathematical equation, as mentioned, this was not the focus of the current work, and its significance is greater for the literature on collective cognition and the collaborative creation of knowledge than for the literature in network science. The main point was to demonstrate how individual activity that consists of sharing links results in changes to the overall structure of the network, which can be considered an organizing process. Future work could dive deeper to discover a mathematical description for this process, and perhaps use that to compare different discussions, as there seems to be notable variance between communities which is worth exploring."

I am convinced that this work should be sent to another journal with a profile more focused on "case studies", since the text does not seem to be concerned with reproducibility or with revealing the fundamental mechanisms of the studied phenomenon. Moreover, my opinion is the opposite of the authors': not including a universal language, such as mathematics, means attracting the attention of only a restricted group of readers.

In view of these considerations, I do not recommend the manuscript, because I do not believe that it is able to attract the attention of the magazine's readers.

Reviewer #4: The submission "Associative linking for collaborative thinking: self-organization of content in online Q&A communities via user-generated links" presents an analysis of the clustering, modularity, connectivity, and percolation of a suite of eight networks based on the StackExchange (SE) forums. These networks were defined on the set of threads (referred to here as "content units"), and edges were drawn between them based on whether or not a cross-link between thread x and thread y exists anywhere in the question, answers, or comments of the thread. The measurements on these networks are compared to what I would describe as a percolated hypercanonical Erdos-Renyi (ER) model in which the probability of an edge (or the number of links) is pulled from a distribution which matches the number of links created on each day in the data. This is not done in any sort of Monte-Carlo way; it is meant to match the exact percolation rate of each day seen in the data. The results show that, for the Stack Exchange networks, clustering, maximum degree, the number of nodes of degree >= 4, and largest-component size fall outside of the 99% CI for 1000 runs of this percolated hypercanonical ER null model. They also find that the largest connected components exhibit high modularity, and that growth in the largest degree and number of triangles over time exceeds the rates seen in the null model.
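The growth-matched null model summarized in this review can be illustrated with a minimal Python sketch. It is a reconstruction from the reviewer's description, not the authors' implementation: it assumes per-day counts of new content units and new cross-links as input, and places each day's links uniformly at random among the content units that exist so far, with no self-loops or duplicate edges. The function name `growth_matched_random_graph` is hypothetical.

```python
import random


def growth_matched_random_graph(daily_new_nodes, daily_new_links, seed=0):
    """Grow a random graph whose daily growth mirrors a real network.

    daily_new_nodes[d]: content units appearing on day d (assumed input)
    daily_new_links[d]: cross-links added on day d (assumed input)
    Returns (number_of_nodes, set_of_edges), each edge stored as (u, v), u < v.
    """
    rng = random.Random(seed)
    n = 0
    edges = set()
    for new_nodes, new_links in zip(daily_new_nodes, daily_new_links):
        n += new_nodes  # nodes 0..n-1 exist from this day onward
        max_edges = n * (n - 1) // 2
        if len(edges) + new_links > max_edges:
            raise ValueError("more links requested than node pairs available")
        added = 0
        while added < new_links:
            # Draw two distinct existing nodes uniformly at random.
            u, v = rng.sample(range(n), 2)
            edge = (min(u, v), max(u, v))
            if edge not in edges:  # reject duplicates, keep the graph simple
                edges.add(edge)
                added += 1
    return n, edges
```

Calling the function with 1000 different seeds yields an ensemble of randomized networks from which each metric can be collected for comparison with the real network.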

It is my opinion that the strongest point of this article is that they have performed network analysis upon a knowledge-creating system which I have not seen an analysis of yet, and they do well to perform the relevant literature review.

However, there are a number of shortcomings in this paper which must be overcome for it to be ready to publish.

1a. The focus of the introduction and the motivation of this article seems to be to prove that these cross-link SE networks are "real networks." The authors state: "...this would enable us to reject the null hypothesis and to conclude that the cross-linked networks assume qualities that resemble real-world networks." [385-386] which is an odd hypothesis to have because there is no objective criterion for "real networks." A "network" is an ontological object and a "real network" would just be a real system which is examined through a graphical model. It seems that a lot of the conclusions about "emergent organization" rely on some proof that this cross-linked network is a "real network." It would be more salient to frame this as simply examining the features of this network (which is inherently a "real network" because it exists in the world and is being modeled graphically) and suggesting what that says about the system.

2. This work has little to no basis in social-scientific theory about knowledge-contribution behavior, so I am uncertain as to what we have learned from this study besides that the cross-link network likely has more internal structure than a random graph. The question remains: what are we to make of these results? In response to another reviewer who asked a similar question, I see that the response was that the goal is to bring attention to “the concept of shared linking as a means of creating a collective networked product” but none of the papers cited (as far as I have seen) lend any meaning to this term “collective networked product” or its significance, let alone how these network properties map to this concept. The paper cites Barabasi’s 2003 paper to suggest that clustering implies organizing (also it may be worth considering Krioukov’s 2016 PRL paper) but this is very vague. Even a step forward to citing Newman’s definition in his 2005 textbook that ‘clustering implies transitivity’ would lend at least some theoretical advancement, i.e., cross-linking behaviors exhibit some transitivity.

3. There are a number of issues with the null model. It is a bit of an unusual null model to use because it seems to be a percolated G(n,p)/G(n,m) model with the link parameter matching the exact number of each day. The result of this would, ultimately however, still be a G(n,p) model with p=sum(edges)/sum(nodes) at the end. As this scales independently of the number of nodes, this is a dense model (see Def. 1.3 in Random Graphs and Complex Networks by Remco van der Hofstad) and thus I’m not sure this is the appropriate null model to measure the temporal dynamics of the network.

Other smaller things:

a. Including triangles and clustering is redundant

b. I'm uncertain as to why the terms "content units" and "focal points" are introduced and what purpose they serve. It seems that "content unit" refers to a thread in the forum, and “focal unit” is referring to a node of degree >= 4 but there is no justification as to why that was chosen, nor any theory behind it to suggest the significance of such in knowledge-sharing systems.

c. Sometimes the word “sparse” gets used to seemingly describe how disconnected the network is. The ratio presented of links to nodes is actually dense enough to have one connected component and is dense in comparison to many other networks of real systems. In general, however, we measure sparsity and density within-component (so infinite distances get dropped).
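The point that density is usually measured within a component can be illustrated with a short sketch: density is taken as the fraction of possible node pairs that are linked, 2m / (n(n-1)), and is computed once for the whole graph and once for its largest connected component. The helper names and the toy graph are hypothetical and for illustration only.

```python
from collections import defaultdict


def components(nodes, edges):
    """Return the connected components of an undirected graph as node sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:  # iterative graph traversal from `start`
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps


def density(n_nodes, n_edges):
    """Fraction of possible undirected node pairs that are linked."""
    return 0.0 if n_nodes < 2 else 2 * n_edges / (n_nodes * (n_nodes - 1))


def largest_component_density(nodes, edges):
    """Size and density of the largest connected component only."""
    comp = max(components(nodes, edges), key=len)
    inner = [(u, v) for u, v in edges if u in comp and v in comp]
    return len(comp), density(len(comp), len(inner))
```

For six nodes with a triangle, one separate edge and one isolated node, the whole-graph density is 4/15, while the largest component (the triangle) has density 1.0.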

Broadly, my recommendation would be to zoom in and reframe the work so that the genuinely good results obtained can be understood through the lens of theory behind knowledge-sharing, instead of trying to develop novel theory about the relationships between network topology and epistemic knowledge-units alongside obtaining results.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

Reviewer #4: Yes: Sagar Kumar

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Mar 11;19(3):e0300179. doi: 10.1371/journal.pone.0300179.r006

Author response to Decision Letter 2


11 Jan 2024

Rebuttal letter

We would like to thank all of the reviewers who have taken their time and made the effort to read our work and make suggestions. Your contributions have been instrumental.

As reviewer 1 had no further comments in this round, reviewer 2 has not added any further comments after the first round, and reviewer 3 has indicated that they disagree with the concept of the paper, this response addresses the suggestions made by reviewer 4. Please see below:

Reviewer #4: The submission "Associative linking for collaborative thinking: self-organization of content in online Q&A communities via user-generated links" presents an analysis of the clustering, modularity, connectivity, and percolation of a suite of eight networks based on the StackExchange (SE) forums. These networks were defined on the set of threads--referred to here as "content units"--and edges were drawn between them based on whether or not a cross-link exists anywhere in the question, answer, or comments of the thread between thread x and thread y. The measurements on these networks are compared to what I would describe as a percolated hypercanonical Erdos-Renyi (ER) model in which the probability of an edge (or the number of links) is pulled from a distribution which matches the number of links created from each day in the data. This is not done in any sort of Monte-Carlo way, it is meant to match the exact percolation rate of each day seen in the data. The results show that the Stack Exchange networks exhibit clustering, max degree, nodes of degree >= 4, and largest component size to be outside of the 99% CI for 1000 runs of this percolated hypercanonical ER null model. They find also that the largest connected components exhibit high modularity and growth in the largest degree and number of triangles over time exceeds the rates seen in the null model.

It is my opinion that the strongest point of this article is that they have performed network analysis upon a knowledge-creating system which I have not seen an analysis of yet, and they do well to perform the relevant literature review.

However, there are a number of shortcomings in this paper which must be overcome for it to be ready to publish.

1a. The focus of the introduction and the motivation of this article seems to be to prove that these cross-link SE networks are "real networks." The authors state: "...this would enable us to reject the null hypothesis and to conclude that the cross-linked networks assume qualities that resemble real-world networks." [385-386] which is an odd hypothesis to have because there is no objective criterion for "real networks." A "network" is an ontological object and a "real network" would just be a real system which is examined through a graphical model. It seems that a lot of the conclusions about "emergent organization" rely on some proof that this cross-linked network is a "real network." It would be more salient to frame this as simply examining the features of this network (which is inherently a "real network" because it exists in the world and is being modeled graphically) and suggesting what that says about the system.

Authors' response:

Thank you for your input. Since, as you mentioned, this type of information network has scarcely been addressed in the literature, we wanted to first establish that the collection of content units (question and answer sets) connected by user-generated internal references constitutes a network with a structure worth exploring. The wording was revised to better describe this.

2. This work has little to no basis in social-scientific theory about knowledge-contribution behavior, so I am uncertain as to what we have learned from this study besides that the cross-linked network likely has more internal structure than a random graph. The question remains: what are we to make of these results? In response to another reviewer who asked a similar question, I see that the response was that the goal is to bring attention to “the concept of shared linking as a means of creating a collective networked product” but none of the papers cited (as far as I have seen) lend any meaning to this term “collective networked product” or its significance, let alone how these network properties map to this concept. The paper cites Barabasi’s 2003 paper to suggest that clustering implies organizing (also it may be worth considering Krioukov’s 2016 PRL paper) but this is very vague. Even a step forward to citing Newman’s definition in his 2005 textbook that ‘clustering implies transitivity’ would lend at least some theoretical advancement, i.e., cross-linking behaviors exhibit some transitivity.

Authors' response:

The theoretical concept behind the paper is closely connected to knowledge-contribution behavior: the internal complex structures present a form of self-organization (see, for instance, Heylighen, 2015, in Bates & Maack, eds.). This structuring is suggested as a form of knowledge integration that exceeds the integration created within the text or by other means of structuring virtual collaborative knowledge discussions, such as tagging. This point has now been made much more salient within the text, especially in the introduction and theoretical concept sections.

3. There are a number of issues with the null model. It is a bit of an unusual null model to use because it seems to be a percolated G(n,p)/G(n,m) model with the link parameter matching the exact number of each day. The result of this would, ultimately however, still be a G(n,p) model with p=sum(edges)/sum(nodes) at the end. As this scales independently of the number of nodes, this is a dense model (see Def. 1.3 in Random Graphs and Complex Networks by Remco van der Hofstad) and thus I’m not sure this is the appropriate null model to measure the temporal dynamics of the network.

We have given precise mathematical definitions for the notions of graph sequences being highly connected, small worlds, and scale free, extending earlier definitions in van der Hofstad (2010). Our definitions are based upon a summary of the relevant results proven for random graph models. We restrict ourselves to sparse random graphs, i.e., random graphs where the average degree remains bounded as the network size grows (recall Definition 1.3). In recent years, there has been an intensive and highly successful effort to describe asymptotic properties of graphs in the dense setting, where the average degree grows proportionally to the network size. This theory is described in terms of graph limits or graphons, which can be thought of as describing the limit of the rescaled adjacency matrix of the graph. The key ingredient in this theory is Szemerédi’s regularity lemma (see Szemerédi (1978)), which states roughly that one can partition a graph into several more or less homogeneous parts with homogeneous edge probabilities for the edges in between. See the book by Lovász (2012) for more details. We refrain from discussing the dense setting in more detail.
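The sparse/dense distinction quoted above can be made concrete with the Erdős-Rényi G(n, p) model. The following is a small sketch (the function name is hypothetical): with a fixed p, the expected average degree p(n-1) grows with n (the dense setting), whereas with p = c/(n-1) it stays near c as n grows (the sparse setting).

```python
import random


def gnp_average_degree(n, p, seed=0):
    """Sample one Erdős-Rényi G(n, p) graph and return its average degree 2m/n."""
    rng = random.Random(seed)
    # each of the n(n-1)/2 node pairs is linked independently with probability p
    m = sum(1 for i in range(n) for j in range(i + 1, n) if rng.random() < p)
    return 2 * m / n


if __name__ == "__main__":
    for n in (200, 2000):
        dense = gnp_average_degree(n, 0.01)          # fixed p: grows like p(n-1)
        sparse = gnp_average_degree(n, 3 / (n - 1))  # p = c/(n-1): stays near c = 3
        print(n, round(dense, 2), round(sparse, 2))
```

Going from 200 to 2000 nodes, the fixed-p average degree rises roughly tenfold, while the p = 3/(n-1) average degree stays close to 3.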

Authors' response:

The numbers of both edges and nodes in the simulated graphs are identical to those of the original graphs. The simulated graphs were constructed as follows: the real graph's daily additions of nodes and edges were extracted, and the randomly linked simulated graphs were built by gradually adding the daily number of nodes and edges while retaining the structure built on previous days. That is, on each simulated "day" the new nodes were added, and then the new edges were randomly assigned on top of the existing structure from the previous simulated "day" (without repetition: a pair of nodes that was already linked could not be linked again). So, at the end of each simulated "day", there is a graph identical in both its number of nodes and its number of links to the snapshot of the real graph on that day. On the "final day", the numbers of nodes and edges are identical to those of the real graph, and so, of course, is the average degree. This was done to account for the greater chance of earlier nodes becoming connected, as the simulated graphs mimic the growth patterns of the real ones. This was iterated 1000 times, yielding 1000 simulated graphs for each community. So, the difference between the real graphs and their simulated counterparts lies in the assignment of edges, not in their overall distribution over days, which is the same. This creates grounds for a comparison that singles out the contribution of the particular assignment of edges, demonstrating that the way the edges are added has an organizing effect on the network as a whole, with more complex internal structures than would have been produced by assigning the edges randomly.

The main point is to demonstrate that the cross-links, originally designed to ease navigation between related content, have effects in shaping the graph at the meso and macro levels.
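A minimal Python sketch of the growth-matched null model described in the response above follows. It assumes that the per-day counts of new nodes and new edges are already available as a list of `(new_nodes, new_edges)` pairs, and that each new edge joins a uniformly chosen pair of distinct, not-yet-linked nodes among all nodes present that day. The function names `grow_random_graph` and `simulate_ensemble` are hypothetical; this is not the authors' code.

```python
import random


def grow_random_graph(daily_counts, seed=None):
    """Grow a randomly linked graph that mimics daily node/edge additions.

    daily_counts: list of (new_nodes, new_edges) per day (assumed input).
    Returns the final edge set and the (nodes, edges) size of each daily snapshot.
    """
    rng = random.Random(seed)
    n_nodes, edges, snapshots = 0, set(), []
    for new_nodes, new_edges in daily_counts:
        n_nodes += new_nodes  # add the day's nodes first
        max_edges = n_nodes * (n_nodes - 1) // 2
        if len(edges) + new_edges > max_edges:
            raise ValueError("more edges requested than node pairs available")
        added = 0
        while added < new_edges:  # then place the day's edges on top of the existing graph
            u, v = rng.sample(range(n_nodes), 2)  # two distinct existing nodes
            pair = (min(u, v), max(u, v))
            if pair not in edges:  # no repeated links
                edges.add(pair)
                added += 1
        snapshots.append((n_nodes, len(edges)))
    return edges, snapshots


def simulate_ensemble(daily_counts, runs=1000, seed=0):
    """Repeat the growth process `runs` times with independent random seeds."""
    rng = random.Random(seed)
    return [grow_random_graph(daily_counts, rng.randrange(2**32)) for _ in range(runs)]
```

Every simulated snapshot matches the real network's daily node and edge counts by construction, so any difference in the resulting indicators reflects only where the edges were placed. Rejection sampling is adequate here because these networks are far from complete graphs.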

Other smaller things:

a. Including triangles and clustering is redundant

Authors' response:

Global clustering is defined as the proportion of closed triplets ("triangles") out of all connected triplets, i.e., pairs of nodes connected to the same third node. Potentially, if such triplets are very scarce, the clustering coefficient might still be high even when there are few triangles. Forgoing the clustering coefficient, however, removes some of the context and the insight into the concept of transitivity in the self-organization of the content, so we decided to keep both. This is now reflected in the text (Table 3).
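The relationship between the two indicators can be sketched using the standard definition of global clustering (transitivity), 3 × triangles / connected triplets. The function name and toy graphs below are illustrative assumptions. The two graphs contain the same single triangle, but adding a star of leaves around one of its nodes creates many open triplets and lowers the clustering coefficient from 1.0 to 0.25.

```python
from collections import defaultdict
from itertools import combinations


def triangles_and_clustering(edges):
    """Return (triangle count, global clustering coefficient) of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # connected triplets: for each centre node, the number of pairs of its neighbours
    triplets = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    # closed triplets: neighbour pairs that are themselves linked;
    # each triangle is counted once per corner, i.e. three times
    closed = sum(1 for nb in adj.values() for a, b in combinations(nb, 2) if b in adj[a])
    return closed // 3, (closed / triplets if triplets else 0.0)


triangle = [(0, 1), (1, 2), (0, 2)]
print(triangles_and_clustering(triangle))                               # (1, 1.0)
print(triangles_and_clustering(triangle + [(0, 3), (0, 4), (0, 5)]))    # (1, 0.25)
```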

b. I'm uncertain as to why the terms "content units" and "focal points" are introduced and what purpose they serve. It seems that "content unit" refers to a thread in the forum, and “focal unit” is referring to a node of degree >= 4 but there is no justification as to why that was chosen, nor any theory behind it to suggest the significance of such in knowledge-sharing systems.

Authors' response:

The term "content unit" refers to a set consisting of a question with its associated answers (if any) and comments (if any). Addressing the entire question + answers (+ comments) set stems from the perception of the entire set as a distinct unit of content within the network. These units may be richer if they contain more answers or comments, but they stand alone as separate elements regardless. Their inner structure can be hierarchical, with up to three layers (question-answers-comments). They can only be connected to one another through a connective element, in this case a cross-link (although studies have also looked at mutual tags as connectors, for instance Dankulov et al., 2015). Ye et al. (2017) referred to these as knowledge units. As part of the reframing of the work, all references to these were changed to "knowledge units": first, to be consistent with existing work on similar environments, and second, to reflect the standalone value of each question-answers-comments set.

The term "focal point" of the network was used to describe a unit that has become relatively connected to other units. Within the entire dataset, hardly any nodes became connected enough to qualify as "hubs", but some nodes began to show potential to act as local intersections at the crossroads of multiple knowledge units. Two indicators were used to identify the emergence of local focal points: 1. the degree of the most connected unit, and 2. the number of units with several connections (4 or more). While the threshold is somewhat arbitrary, it was designed to capture knowledge units that became connected to several other knowledge units, and so to reflect an emergence of hierarchy: units that are more central than others within the collective knowledge product. This may be useful for navigation, or for extracting themes from the graph by focusing on these nodes. This clarification was added to the manuscript.
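The two focal-point indicators described above (the largest degree, and the number of units with at least 4 connections) can be computed from an edge list as sketched below. The function name and the toy edge list are hypothetical, and the threshold is exposed as a parameter because the response notes that the value 4 is somewhat arbitrary.

```python
from collections import Counter


def focal_point_indicators(edges, threshold=4):
    """Return (max degree, number of nodes with degree >= threshold)."""
    degree = Counter()
    for u, v in edges:  # undirected: each link adds one to both endpoints
        degree[u] += 1
        degree[v] += 1
    max_degree = max(degree.values(), default=0)
    focal = sum(1 for d in degree.values() if d >= threshold)
    return max_degree, focal


# a unit linked to 5 others and another unit linked to 4 others
edges = [(0, i) for i in range(1, 6)] + [(6, j) for j in range(7, 11)]
print(focal_point_indicators(edges))  # (5, 2)
```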

c. Sometimes the word “sparse” gets used to seemingly describe how disconnected the network is. The ratio presented of links to nodes is actually dense enough to have one connected component and is dense in comparison to many other networks of real systems. In general, however, we measure sparsity and density within-component (so infinite distances get dropped).

Authors' response:

This has been corrected throughout the manuscript.

Broadly, my recommendation would be to zoom in and reframe the work so that the genuinely good results obtained can be understood through the lens of theory behind knowledge-sharing, instead of trying to develop novel theory about the relationships between network topology and epistemic knowledge-units alongside obtaining results.

Authors' response:

Thank you for this important suggestion. We have tried to incorporate your suggestions and shift the emphasis toward the role of link-sharing within the collaborative process.

Attachment

Submitted filename: Response to Reviewers.docx

pone.0300179.s004.docx (23.9KB, docx)

Decision Letter 3

Carlos Henrique Gomes Ferreira

23 Feb 2024

Associative linking for collaborative thinking: self-organization of content in online Q&A communities via user-generated links

PONE-D-22-31022R3

Dear Dr. Sher,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Carlos Henrique Gomes Ferreira, Ph.D.

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #5: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #5: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #5: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #5: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #5: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed all of my previous comments. I am happy with the revised version. I suggest acceptance in PLOS ONE.

Reviewer #5: The manuscript titled "Associative Linking for Collaborative Thinking: Self-Organization of Content in Online Q&A Communities" is an insightful exploration of how virtual collaborative Q&A platforms foster collaborative knowledge creation through dynamic interactions between users and content. This study was motivated by the observation that knowledge within these communities is often fragmented, yet the collective value of this collaboratively formed knowledge base is not fully appreciated. Following research on individual mental semantic networks, the authors investigate the self-organizing nature of knowledge sharing within these communities, as manifested in the associative links created by users.

Using data from eight topic-centered discussions on the Stack Exchange platform, network analysis tools are used to examine the structure of the networks formed by these associative links. By comparing topological indicators of these networks (such as clustering coefficients, degrees of integration and the presence of highly connected nodes) with those of 1,000 simulated randomly linked networks, the study reveals a striking pattern. The actual networks exhibit a higher degree of clustering, better integration and more highly connected nodes than their randomly generated counterparts, with these differences increasing over time. In addition, the largest connected subgraphs in these networks exhibit modularity, suggesting a nuanced internal organization.

Furthermore, the study offers initial qualitative observations about how external events that affect content can influence network structures. These findings support the idea that networks formed through associative linking mirror the self-organizing properties of other information networks and highlight the potential of collaborative linking as a mechanism for collective knowledge organization. This research underscores the need to recognize and exploit associative linkage in both theoretical frameworks and practical applications, and makes a compelling case for acceptance for publication.

I've seen the history of the revisions made in response to the previous rounds in the peer review process.

I agree with the other reviewers that the manuscript has been significantly improved in terms of clarity and organization. The authors have addressed the requested changes, and in my evaluation the manuscript is in a good state to be published.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #5: No

**********

Acceptance letter

Carlos Henrique Gomes Ferreira

29 Feb 2024

PONE-D-22-31022R3

PLOS ONE

Dear Dr. Sher,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Carlos Henrique Gomes Ferreira

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. References for Stack Exchange questions mentioned in Figs 2 and 3.

    (DOCX)

    pone.0300179.s001.docx (16.2KB, docx)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0300179.s002.docx (19KB, docx)
    Attachment

    Submitted filename: Response to reviwers.docx

    pone.0300179.s003.docx (19.6KB, docx)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0300179.s004.docx (23.9KB, docx)

    Data Availability Statement

    The data underlying the results presented in the study are available at https://osf.io/9xvnm. Originally downloaded from the Stack Exchange Data Dump at: https://archive.org/details/stackexchange.


    Articles from PLOS ONE are provided here courtesy of PLOS
