PLOS ONE. 2014 Aug 1;9(8):e103023. doi: 10.1371/journal.pone.0103023

How Much Is the Whole Really More than the Sum of Its Parts? 1 ⊞ 1 = 2.5: Superlinear Productivity in Collective Group Actions

Didier Sornette 1,*, Thomas Maillart 2, Giacomo Ghezzi 3
Editor: Matjaz Perc 4
PMCID: PMC4118854  PMID: 25084157

Abstract

In a variety of open source software projects, we document a superlinear growth of production intensity ($R \sim c^{\beta}$) as a function of the number of active developers $c$, with a median value of the exponent $\beta \approx 4/3$ and large dispersions of $\beta$ from slightly less than $1$ up to $2$. For a typical project in this class, doubling the group size typically multiplies the output by a factor $2^{\beta} \approx 2.5$, explaining the title. This superlinear law is found to hold for group sizes ranging from 5 to a few hundred developers. We propose two classes of mechanisms, interaction-based and large deviation, along with a cascade model of productive activity, which unifies them. In this common framework, superlinear productivity requires that the involved social groups function at or close to criticality, or in a “superradiance” mode, in the sense of the appearance of a cooperative process and order involving a collective mode of developers defined by the build-up of correlation between the contributions of developers. In addition, we report the first empirical test of the renormalization of the exponent of the distribution of the sizes of first-generation events into the renormalized exponent of the distribution of clusters resulting from the cascade of triggering over all generations in a critical branching process in the non-mean-field regime. Finally, we document a size effect in the strength and variability of the superlinear effect, with smaller groups exhibiting widely distributed superlinear exponents, some of them characterizing highly productive teams. In contrast, large groups tend to have a smaller superlinearity and less variability.

Introduction

Since at least Aristotle, the adage in the title has permeated human thinking, with prominent influence in psychology (Gestalt theory [1]), biology (brain functions [2], ecological networks [3]), physics (spontaneous symmetry breaking [4] and the “more is different” concept [5]), economics [6], [7] among a wealth of other examples. Prominent among other developments are the fields of complexity science, synergetics and complex adaptive system theory, which strive to understand natural and social systems in terms of a systemic or holistic approach, where the above adage is translated into the scientific concept of emergence that results from repetitive interactions between simple constituting elements in extended out-of-equilibrium adaptive systems. Dealing with groups such as firms and production units, management science also strives to understand when and how a group can be more than the sum of individuals, and to design ways to improve team performance [8]–[11], through the mechanism of complementarity in organization [12], [13] and innovations [14]. Because most activities in our modern environment require coordination and collaborative actions within groups of widely varying sizes, it is the fundamental aspiration of any manager, be it in the public or private sector, to find the gears that could enhance productivity.

Notwithstanding the importance of groups in human culture and civilization since ancient times, we still have a limited understanding of the mechanisms at the origin of group productivity. Moreover, we do not really understand the conditions under which the whole is more than the sum of its parts, nor how to quantify a group's productivity with respect to its different constituents. The bottlenecks hindering progress include the difficulty of quantifying productivity and the obstacles to controlled experiments that would allow for clean conclusions. Indeed, most human groups and systems are entangled in their functioning and objectives, and are rarely amenable to the systematic and continuous observation suitable for rigorous scientific analysis.

To address these problems, we use a source of data in which group cooperation is ubiquitous and can be quantified in great detail, namely the dynamics of production intensity during the development of open source software (OSS) projects. Because OSS development is essentially collective, iterative, and cumulative, and because the overhead costs of interaction are small thanks to the cheap electronic media mediating exchanges between developers, it is particularly well suited to studying potential increases of productivity through interaction and cooperation between several contributing developers.

The next section presents the main empirical evidence of the superlinear production intensity law found for open source software projects. We then present two classes of mechanisms at the origin of superlinear production intensity, which are unified in the cascade model of productive activity. Empirical data tests are found to support the model. We then compare and attempt to reconcile present findings for OSS and the superlinear law previously reported for cities. A discussion section develops the broader implications of our results, and the conclusion section summarises our main results.

Quantification of productivity in open source software projects

We have analyzed the production of Inline graphic open source software projects with sizes ranging from Inline graphic to Inline graphic contributors. Figure 1 shows the complementary cumulative distribution of project sizes in our sample, quantified by the number of developers involved in each project [all source data (Archive S1) and relevant statistics (Table S1), detailed per project, are available in Supporting Information]. The distribution is an approximate power law Inline graphic with exponent Inline graphic, which reflects a large heterogeneity of project sizes, with few projects attracting many developers and a multitude of projects with just a few developers. The simplest generic mechanism for such a power law distribution of human group sizes is proportional growth coupled with birth and death [15], [16], as verified empirically in OSS package reuse [17], in group [18] and in product [19] dynamics.

Figure 1. Distribution of project sizes in our sample quantified by their total number of developers.

Figure 1

The distribution follows approximately a power law with exponent Inline graphic, with an apparent deviation in the tail possibly resulting from an over-sampling bias of large projects. The bend down for small projects is likely the result of an under-sampling bias.

A first idea would be to quantify the total production of each software project (for instance proxied by the number of lines of code, commits or packages) and search for a relationship with the total number of developers involved over the whole project. This is misleading because the total output results from a complex interplay between a time-varying number of involved developers and the intermittent duration and intensity of their contributions. In the extreme limit, a single developer working over a lifetime may produce as much as tens or even hundreds of developers over a few months. The large variability of developer numbers and contributions over time for each project is illustrated by Figure 2, which shows the intermittent dynamics of active contributors and of their productive activity as a function of time (in logarithmic scale).

Figure 2. Typical time series of open source software development (e.g. Apache Web Server) with active contributors (green area) and their productive activity (red area).

Figure 2

For clarity, the time series are represented in logarithmic scale and have been smoothed with a rolling window of Inline graphic days. Over the whole project history, various epochs of productive activity can be found. The background grey areas indicate three levels of the productivity exponent $\beta$ defined by equation (1) (light grey for Inline graphic, grey for Inline graphic and dark grey for Inline graphic) for time windows of 250 days. Blank areas show time windows for which $\beta$ could not be fitted, mainly because the numbers of active contributors (resp. commits) varied strongly over these periods. In other words, superlinear production may have occurred in these periods, but we could not determine it.

To capture more faithfully the effect of cooperation on contributions, we propose to focus on short-term production and group sizes. For each project, we partition its lifetime into time windows of a fixed size that we shift over the whole project duration. We then quantify the production in each window and study its relation to the number of active developers during that same window. As proxies for the production of developers, we could use either lines of code (LOC) or commits. LOCs are straightforward metrics but suffer from the criticism that real production and quality are not in general proportional to the number of code lines. Indeed, excellent contributions are in general characterized by efficient and elegant coding associated with conciseness, and among software developers it is well recognized that the number of LOCs contributed is not a predictor of quality. In open collaboration, however, each innovation step can be seen as a commit uploaded and compounded on an online repository, which keeps track of all changes over time. Each commit reflects the contributor's commitment to expose to the community her proposed solution to an open problem. Commits are the elementary units that get peer-reviewed, tested and eventually integrated into the project knowledge base. Thus, they are a direct measure of the iterative productive process at work in peer production. All commit activities are systematically indexed and timestamped in the project repository.
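The windowing step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the commit log has already been parsed into hypothetical `(timestamp, author)` pairs (for instance from `git log`), uses non-overlapping windows anchored at the first commit, and omits windows without any commit. For each window it returns the production $R$ (number of commits) and the number $c$ of distinct active developers.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def window_counts(commits, window_days=5):
    """Per-window (R, c): R = number of commits, c = number of distinct authors.

    `commits` is an iterable of (timestamp, author) pairs; this input format is
    a hypothetical choice for illustration. Windows are non-overlapping, start
    at the earliest commit, and windows without any commit are omitted.
    """
    commits = sorted(commits)
    if not commits:
        return []
    start = commits[0][0]
    width = timedelta(days=window_days)
    n_commits = defaultdict(int)
    authors = defaultdict(set)
    for ts, author in commits:
        k = (ts - start) // width  # integer index of the window containing ts
        n_commits[k] += 1
        authors[k].add(author)
    return [(n_commits[k], len(authors[k])) for k in sorted(n_commits)]


log = [
    (datetime(2020, 1, 1), "alice"),
    (datetime(2020, 1, 2), "bob"),
    (datetime(2020, 1, 2), "alice"),
    (datetime(2020, 1, 8), "alice"),
]
print(window_counts(log))  # [(3, 2), (1, 1)]
```

Non-overlapping windows keep each commit in exactly one $(R, c)$ pair; an overlapping (sliding) variant would instead assign each commit to every window that covers it.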

Notwithstanding these arguments in favor of using commits as a metric of production, it is useful to test for a possible relation between LOC and the number of commits. Figure 3 documents a robust scaling relationship Inline graphic, with exponents Inline graphic for most of the projects. These findings bolster our confidence that the results reported below are robust and not sensitive to the specific choice of production metric.

Figure 3. Scaling relation Inline graphic between commits and lines of code.

Figure 3

For the Apache Web Server project, the scaling exponent is Inline graphic (Inline graphic, Inline graphic). For the vast majority of projects, the relation between lines of code and commits exhibits the same scaling with Inline graphic, suggesting that we can use either commits or lines of code, as both provide a consistent and therefore robust measure of contribution (and, in addition, that commits may themselves result from cascades of code production).

Figure 4 demonstrates the typical superlinear relationship

$$R \sim c^{\beta}, \qquad (1)$$

where the production $R$ is defined as the total number of commits measured per 5-day time window for the Apache Web Server (http://httpd.apache.org/) and $c$ is the number of active contributors in the same 5-day time window. Contrary to the naive expectation that the production $R$ should be proportional to the number $c$ of developers, Figure 4 documents a superlinear relationship with exponent Inline graphic, significantly larger than the value $\beta = 1$ describing a simple proportionality $R \propto c$. Over all OSS projects studied, the estimated statistical average is $\langle \beta \rangle \approx 4/3$. Since $2^{4/3} \approx 2.5$, this explains the title of this paper. For many projects, $\beta$ is larger than Inline graphic, such as the Apache Web Server project shown in Figure 4, for which Inline graphic. These results are robust with respect to the length of the time windows (from 1 day to 10 days).

Figure 4. Typical superlinear relation in double logarithmic scale of the productive contribution $R$ as a function of active contributors $c$ per 5-day time window for Apache Web Server (http://httpd.apache.org/).

Figure 4

The scaling exponent Inline graphic (Inline graphic and Inline graphic) is shown as the slope of a straight line in double logarithmic scale. The error bars show the 25th and 75th percentiles within log-bins of contributors.
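To make the estimation of relation (1) concrete, here is a minimal sketch that fits $\beta$ as the ordinary least-squares slope of $\log R$ against $\log c$ over per-window pairs $(R, c)$. This is an assumption for illustration: Figure 4 uses log-binned data with percentile error bars, whereas the sketch regresses all windows directly. The helper `doubling_factor` evaluates $2^{\beta}$, the multiplicative change in output when the group size doubles.

```python
import math


def fit_beta(pairs):
    """OLS slope of log(R) on log(c) for (R, c) pairs with R > 0 and c > 0."""
    pts = [(math.log(c), math.log(r)) for r, c in pairs if r > 0 and c > 0]
    if len({x for x, _ in pts}) < 2:
        raise ValueError("need at least two distinct group sizes")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx


def doubling_factor(beta):
    """Output multiplier when the number of active developers doubles."""
    return 2.0 ** beta


# Synthetic check: windows that follow R = 3 * c**1.5 exactly.
synthetic = [(3 * c ** 1.5, c) for c in (2, 4, 8, 16, 32)]
print(round(fit_beta(synthetic), 6))     # 1.5
print(round(doubling_factor(4 / 3), 2))  # 2.52
```

With $\beta \approx 4/3$, doubling the team multiplies output by about 2.52, which is the "1 ⊞ 1 = 2.5" of the title.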

Mechanisms for superlinear production

We consider two classes of mechanisms for superlinear production.

Interaction-based mechanism for superlinear production

There is a variety of channels through which contributors commit more solutions to problems when the community is more active. The peer-review process is more likely to occur when more contributors are active; there are incentives to share early with the community to avoid redundant work; and some problems require collective intelligence to increase their chance of being solved [20], because they require tight coordination among different technical parts of the code [21]. A priori, the number of active developers is an extensive variable, that is, it is additive for independent non-interacting systems. When interactions between developers occur, the observed increasing returns of productive activity imply that the change $\Delta R$ of productivity upon the addition of a developer, due to the existence of interactions, is not a constant but itself grows with the number of active contributors (as $\Delta R \sim c^{\beta - 1}$ with $\beta > 1$, which follows from differentiating (1)). There is thus a remarkable increase of productive activity, not only as the sum of increased individual commits, but also as a result of interactions among active contributors.
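A short numerical illustration of this increasing marginal return, under the assumption that (1) holds exactly as $R = r_1 c^{\beta}$ with a hypothetical prefactor $r_1$ (the output of a single developer): the extra output contributed by the $(c+1)$-th developer is constant for $\beta = 1$, but grows with $c$ for $\beta > 1$, roughly like $\beta r_1 c^{\beta - 1}$.

```python
def marginal_gain(c, beta, r1=1.0):
    """Extra output from adding one developer to a group of c, if R = r1 * c**beta.

    r1 is a hypothetical single-developer output used only for illustration.
    """
    return r1 * ((c + 1) ** beta - c ** beta)


for beta in (1.0, 4 / 3):
    gains = [round(marginal_gain(c, beta), 2) for c in (1, 10, 100)]
    print(beta, gains)
```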

Interactions leading to a phase transition

In standard models of interaction, linearity between the observable and the external driving field, as well as with the number of elements in the system, is the rule (Inline graphic), except at or close to a critical phase transition point. As an illustration, consider the average magnetisation $M$ per spin as a function of the temperature $T$ in a system undergoing a paramagnetic-ferromagnetic phase transition at the critical temperature $T_c$. The standard relation $M = \chi H$ relates linearly the average magnetisation $M$ to the external intensive magnetic field $H$ via the susceptibility $\chi$. Introducing the spatial spin-spin correlation length $\xi$ of the system, it is known that the susceptibility diverges as a power of the correlation length as $\chi \sim \xi^{\gamma/\nu}$, so that

$$M = \chi H \sim \xi^{\gamma/\nu} H, \qquad (2)$$

where $\gamma$ and $\nu$ are two critical exponents related by the hyperscaling relation Inline graphic, where $d$ is the space dimension. Exactly at $T = T_c$, the linear relationship between $M$ and $H$ given by (2) is replaced by the nonlinear relation

$$M \sim H^{1/\delta}, \qquad (3)$$

defining the exponent $\delta$. This means that the collective behaviour of the spins at criticality induces a nonlinear response of the magnetisation $M$ to very small external magnetic fields $H$ (indeed, $H^{1/\delta} \gg H$ for $H \to 0$ and $\delta > 1$). The values of the exponents are Inline graphic in the mean-field regime, which holds at the upper critical dimension $d = 4$. The relationship (3) looks superficially similar to (1) when compared with the standard linear relation $M = \chi H$, but here the magnetic field is an intensive quantity, while relation (1) describes the production intensity as a function of the number of group members, which is an extensive quantity. Actually, a relation similar to (1) can be derived by introducing the finiteness of the spin system and using the theory of finite-size scaling [22]. For a system of finite linear size $L$ and thus finite volume $L^d$, the theory of finite-size critical phenomena implies that relation (2) is replaced by

$$M \sim L^{\gamma/\nu} H, \qquad (4)$$

obtained simply by replacing $\xi$ by $L$. In words, the unique relevant length, which is the correlation length $\xi$ for an infinite system at criticality, becomes the system size. With $N = L^d$, this yields $M \sim N^{\gamma/(d\nu)} H$. Since $M$ is the magnetisation per spin, we obtain that the total magnetisation $M_{\mathrm{tot}} = N M$ of the system with a total number $N$ of spins is given by

$$M_{\mathrm{tot}} \sim N^{1 + \gamma/(d\nu)} H, \qquad (5)$$

that is, the total magnetisation becomes superlinear in the number of spins at or close to criticality, similarly to expression (1). This type of superlinear relationship (5) holds more generally in various models of interacting elements at or close to criticality [23]–[26]. The meaning of criticality is that one action triggers on average one follow-up action, ensuring that the dynamics remains delicately poised between growth and decay, or between order and disorder. Therefore, an explanation of superlinear productivity by the interaction-based mechanism requires elucidating under which circumstances open source projects operate close to or at criticality. Studies of the dynamics of book sales [27], [28] and YouTube video views [29] have shown evidence of these critical triggering effects in large social networks. Open source projects and their online communication platforms coupled with the code repository serve a similar social network role, yet at much smaller scales [30], [31]. Since the above analyses, as well as those presented here, are subject to survival bias (in other words, they are performed on top performers among a much larger database), the existence of criticality in these systems can be interpreted as the signature of a degree of success quantified by significant activity. Specifically, considering a large universe of projects, those that are of interest in the sense of exhibiting significant dynamics in volume and quality are those for which the conditions are met to be close to criticality.

Interactions leading to superradiance-like phenomena

The superlinear dependence of the production intensity on the number of group members has a rather direct analog in the phenomenon of superradiance [32], [33], a coherent effect in many-body systems of $N$ excited emitters that interact with a common light field. When the wavelength of the light is much greater than the separation of the emitters, the emitters interact with the light in a collective and coherent fashion. Rather than radiating independently with a total intensity proportional to $N$, as would be expected for independent emitters, in the most favorable case of perfect coherence the total radiation scales as $N^2$, similarly to the mean-field prediction $\beta = 2$ obtained from expression (13) when the exponent $\alpha$ of the tail distribution of first-generation contributions per developer is larger than or equal to $2$. In more realistic experimental situations, the exponent is smaller than $2$, for instance equal to Inline graphic when the initial fluctuating light field is small [34], or equal to Inline graphic for $N$ two-level atoms placed within an isotropic photonic band-gap material (but it can reach the value Inline graphic for anisotropic 3D band gaps) [35]. In physics, the superradiance effect results from the existence of correlations and interactions between emitters, similarly to the interactions between group members of OSS projects. The interactions and resulting correlations between emitters are mediated by the radiated light, similarly to the correlations between developers mediated by the production of commits. Superradiant emission is a cooperative process involving a collective mode of all the atoms of the sample. In this collective mode, an “order” appears in the system, which can be defined by the build-up of correlation between the dipoles belonging to different atoms. This correlation is quite reminiscent of the spin-spin correlation appearing for example in a ferromagnetic sample [33]. There is in fact a hidden phase transition in which the role of the diverging correlation length is played by the light wavelength, which has to be much larger than the inter-emitter distances.

Moreover, the smaller value of the exponent $\beta$ for large groups and for cities, as documented below, has a straightforward interpretation in the superradiance analogy. Indeed, the maximal number of correlated emitters is limited by the correlation, or coherence, volume. When the number of emitters exceeds the maximal number of those that effectively interact, the superlinear exponent decreases. This is because, for larger numbers of emitters, the system separates into clusters or subgroups that radiate practically independently. In physics, this effect is termed filamentation. The same effect is argued to happen for production intensity, as discussed in the section below entitled “Reconciling present findings and superlinear production in large cities”.

Large deviation mechanism for superlinear production

The second class of mechanisms builds on the evidence of large deviations in the statistics of the production activity $X$ over the whole population of contributors and over the whole life of the project. Figure 5 shows the complementary cumulative distribution $P(>X)$ of all contributions per developer over a long period for the Apache Web Server project. One can observe an approximate power law tail dependence

$$P(>X) \sim X^{-\mu}, \qquad (6)$$

with Inline graphic. Within the epidemic framework presented in the next section, $P(>X)$ will be shown to be equivalent to the statistics of the cluster sizes of contributions following critical cascades [36] (see expression (12)), i.e., when the dynamics of triggering of activity is close to or at the critical point of a branching process. This result, shown for the Apache Web Server project, is representative of the distributions found in other collaborative projects.

Figure 5. Typical distributions of 1st generation daughter events and total number of commits per developer for the Apache Web Server project: (blue squares) Complementary cumulative distribution $P_1(>X)$ of contributions (number of commits) per developer and per 5-day time bin (1st generation daughter events in the language of the epidemic branching process described in the text) with exponent Inline graphic.

Figure 5

(red circles) Complementary cumulative distribution $P(>X)$ of all contributions per developer over a long period of time. $P(>X)$ is equivalent to measuring the cluster sizes of contributions following critical cascades (7). All distributions have been fitted using the maximum likelihood estimator (MLE). The distribution of cascade sizes is characterized by the exponent Inline graphic, compared to the first-generation daughter events distribution with exponent Inline graphic. The results shown here for Apache are representative of the distributions found in other collaborative projects.

In the presence of such power law statistics of contributions characterized by an exponent $\mu < 1$, we show below that the sum of contributions over all developers is controlled by the extreme contributors. The contributions made by these exceptional members of the group are also responsible for the observed superlinear behavior given by (1). This mechanism is reminiscent of the improved group performance that results from the presence of one or a few outstanding individuals [37]. In this case, the largest contributor provides a finite fraction of the whole production over a given time period. This largest contribution (i.e., the “large deviation”) grows superlinearly with the group size [38], [39]. In this situation, the increasing productive activity results from a large heterogeneity of activity per individual: the more contributors $c$ during a production period, the more likely it is to find an extremely large contribution.
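The claim that the largest contributor carries a finite fraction of the total when the tail exponent is below 1 can be checked with a small Monte Carlo sketch. It assumes, purely for illustration, that the $c$ contributions are independent draws from a pure Pareto law $P(>X) = X^{-\mu}$ for $X \ge 1$, sampled by inverse transform, and it averages the share $X_{\max} / \sum_i X_i$ over many simulated groups. For a tail exponent below 1 the share stays sizable as $c$ grows, whereas for an exponent above 1 it shrinks toward zero.

```python
import random


def pareto_draw(mu, rng):
    """Inverse-transform sample with P(>x) = x**(-mu) for x >= 1."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
    return u ** (-1.0 / mu)


def mean_max_share(mu, c, trials=500, seed=0):
    """Average fraction of the group total produced by its largest contributor."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        xs = [pareto_draw(mu, rng) for _ in range(c)]
        acc += max(xs) / sum(xs)
    return acc / trials


for mu in (0.7, 2.0):
    print(mu, [round(mean_max_share(mu, c), 3) for c in (10, 100, 1000)])
```

Independence between developers is a simplifying assumption of the sketch; the cascade model discussed next is what generates the heavy tail in the first place.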

Specifically, starting from expression (6) for the complementary cumulative distribution $P(>X)$, we denote by $p(X) = -dP(>X)/dX$ the corresponding probability density function. Let us call $X_1, X_2, \ldots, X_c$ the total numbers of commits contributed respectively by the developers $1, 2, \ldots, c$, and $X_{\max}$ the largest among the set $\{X_1, \ldots, X_c\}$. A good estimate of $X_{\max}$ is obtained by the condition that the probability $P(>X_{\max})$ to find a developer with a total contribution equal to or larger than $X_{\max}$, times the number $c$ of active developers, is equal to $1$, i.e., by the definition of $X_{\max}$, there should typically be only one developer with such a number of commits. This yields

$$c\, P(>X_{\max}) \simeq 1 \quad \Longrightarrow \quad X_{\max} \sim c^{1/\mu}. \qquad (7)$$

An estimate of the typical total number of commits $S_c = X_1 + \cdots + X_c$ contributed by the $c$ developers can then be obtained as [38], [39]

$$S_c \simeq c \int_{1}^{X_{\max}} X\, p(X)\, dX \sim c\, X_{\max}^{1-\mu} \sim c^{1/\mu}. \qquad (8)$$

We stress that the scaling $S_c \sim c^{1/\mu}$ only holds for $\mu < 1$ and is replaced by $S_c \sim c$, i.e., linearity, for $\mu > 1$. The upper bound in the integral in (8) reflects that the random variables $X_i$ are not larger than $X_{\max}$ by definition of the latter. According to equation (8), the typical total production (number of commits) of $c$ developers is proportional to $c^{1/\mu}$ when their contributions are wildly distributed according to a power law distribution with exponent $\mu < 1$. According to this large deviation mechanism, the superlinear exponent $\beta$ is therefore

$$\beta = \frac{1}{\mu}. \qquad (9)$$
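The large deviation estimate, total commits of $c$ developers growing like $c^{1/\mu}$ for a tail exponent $\mu < 1$ and like $c$ for $\mu > 1$, can likewise be checked numerically. The sketch below again assumes independent pure-Pareto contributions with $X \ge 1$. It estimates the median total $S_c$ for several group sizes (the median is used because the mean diverges when $\mu < 1$) and fits the log-log slope. Finite-size corrections make the fitted slope only approximate.

```python
import math
import random
import statistics


def median_total(mu, c, trials, rng):
    """Median over trials of S_c = X_1 + ... + X_c with P(>x) = x**(-mu), x >= 1."""
    totals = [
        sum((1.0 - rng.random()) ** (-1.0 / mu) for _ in range(c))
        for _ in range(trials)
    ]
    return statistics.median(totals)


def total_scaling_exponent(mu, sizes=(30, 100, 300, 1000), trials=200, seed=1):
    """Least-squares slope of log(median S_c) versus log(c)."""
    rng = random.Random(seed)
    xs = [math.log(c) for c in sizes]
    ys = [math.log(median_total(mu, c, trials, rng)) for c in sizes]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


for mu in (0.75, 1.5):
    predicted = max(1.0, 1.0 / mu)  # superlinear only when mu < 1
    print(mu, round(total_scaling_exponent(mu), 2), "prediction:", round(predicted, 2))
```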

Within this large deviation mechanism, explaining the superlinear productive activity ($\beta > 1$) reduces to explaining the heavy-tailed distribution of commits $X$ per contributor over a long period of time, i.e., amounts to deriving the power law distribution (6) with $\mu < 1$. To this end, the next section proposes a generic model.

Cascading model of productive activity

Both the interaction-based and the large deviation mechanisms can be captured together by a generic cascade process, which is well described by the self-excited Hawkes conditional Poisson process [40]. The Hawkes process models well a variety of social dynamics involving complex human interactions, such as online viral meme propagation [29], gangs and crime in large American cities [41], cyber crime [42] and financial contagion [43]–[45]. The Hawkes process is defined by the intensity $\lambda(t)$ of events (commits) given by

$$\lambda(t) = \lambda_0 + \sum_{i \,:\, t_i < t} \kappa_i \, \phi(t - t_i), \qquad (10)$$

where $t_i$ are the timestamps of past commits, $\lambda_0$ is the spontaneous exogenous rate of commits, $\kappa_i$ is the fertility of commit $i$, which quantifies the number of commits (of first generation) that it can potentially trigger directly, and $\phi(t)$ is the memory kernel, whose integral is normalized to $1$, which weights how much past commit activities influence future ones. The function $\phi(t)$ typically reflects how tasks are prioritized and performed by individuals according to a rational economy in which time is a non-storable resource [46]. Expression (10) states that the number of commits contributed between time $t$ and $t + dt$ results from two sources: (i) an exogenous source $\lambda_0$ representing the spontaneous commits not related to previous commits; (ii) an endogenous term represented by the sum over all commits that were made prior to $t$ and which are susceptible to trigger future commits. An obvious triggering mechanism is debugging: a past commit may attract the attention of a developer who fixes a bug and thus improves the code. Another triggering mechanism arises when a previous commit enables new functionalities and relationships that open novel options for the developers. The Hawkes model is the simplest conditional Poisson process that combines both exogeneity and endogeneity.
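The Hawkes intensity (10) can be evaluated directly once a kernel is chosen. The sketch below assumes, for simplicity only, an exponential memory kernel $\phi(t) = e^{-t/\tau}/\tau$, which integrates to 1 as required; the kernel shape, the time scale `tau`, and the example history are illustrative assumptions, not the kernel estimated for OSS projects. Each past commit is given as a pair $(t_i, \kappa_i)$ of timestamp and fertility, `lam0` is the exogenous rate, and only commits strictly before $t$ contribute.

```python
import math


def hawkes_intensity(t, past_commits, lam0, tau):
    """Conditional intensity lambda(t) = lam0 + sum_i kappa_i * phi(t - t_i).

    past_commits: iterable of (t_i, kappa_i) pairs (timestamp, fertility).
    lam0: exogenous (spontaneous) commit rate.
    tau: time scale of the assumed exponential kernel phi(t) = exp(-t/tau)/tau.
    """
    rate = lam0
    for t_i, kappa_i in past_commits:
        if t_i < t:  # only earlier commits can trigger activity at time t
            rate += kappa_i * math.exp(-(t - t_i) / tau) / tau
    return rate


history = [(0.0, 0.8), (1.0, 1.2), (2.5, 0.5)]
print(hawkes_intensity(3.0, history, lam0=0.1, tau=2.0))
```

Any other kernel normalized to 1 (for instance a slowly decaying power law) can be substituted without changing the structure of the sum.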

The class of Hawkes models can be mapped onto the general class of branching processes [47]. The statistical average fertility $\langle \kappa \rangle$ defines the branching ratio $n = \langle \kappa \rangle$, which is the key parameter. For $n < 1$, $n = 1$ and $n > 1$, the process is respectively sub-critical, critical and super-critical [48], [49]. In the sub-critical regime ($n < 1$), the triggered activity tends to die out exponentially fast and the exogenous source term $\lambda_0$ controls the overall dynamics. At criticality ($n = 1$), on average one commit is triggered in direct lineage by a previous commit, corresponding to a marginal sustainability of the process with infinitesimal exogenous inputs. The super-critical regime ($n > 1$) is characterised by explosive activity that can occur with finite probability. The results derived below are thus fundamentally associated with the existence of a critical phase transition determined by the control variable $n$. The nature of the critical phase transition for this Hawkes model with a distribution of fertilities has been described in Refs. [36], [50], [51]. Interpreting a cluster or connected cascade of triggered contributions in a given branching process as a burst of production in a group of developers, the distribution of contributions is thus mapped onto that of triggered cluster sizes [36].
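The role of the branching ratio $n$ can be illustrated with a minimal Galton-Watson sketch. It assumes, for simplicity, that every commit has a Poisson-distributed number of first-generation daughters with mean $n$ (a thin-tailed offspring law, not the heavy-tailed fertilities discussed below), and it counts the total cluster size, root included. In the sub-critical regime the mean cluster size should approach $1/(1-n)$; at criticality ($n = 1$) cluster sizes become broadly distributed, with a tail $P(\ge s)$ decaying roughly as $s^{-1/2}$. A cap bounds the run time of very large clusters.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplicative method; adequate for small means such as n <= 1."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def cluster_size(n, rng, cap=1000):
    """Total size (root included, capped) of one cascade with Poisson(n) daughters."""
    size, pending = 1, 1
    while pending and size < cap:
        pending -= 1
        daughters = poisson(n, rng)
        size += daughters
        pending += daughters
    return min(size, cap)


def simulate(n, clusters=5000, cap=1000, seed=2):
    rng = random.Random(seed)
    return [cluster_size(n, rng, cap) for _ in range(clusters)]


sub = simulate(0.5)
print("n=0.5 mean size:", sum(sub) / len(sub), "vs 1/(1-n) =", 1 / (1 - 0.5))
crit = simulate(1.0)
for s in (10, 100):
    print(f"n=1 fraction of clusters with size >= {s}:", sum(x >= s for x in crit) / len(crit))
```

A tenfold increase in size lowers the critical tail fraction by roughly $\sqrt{10} \approx 3.2$, the signature of the mean-field exponent $1/2$.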

Let us define the complementary cumulative distribution $P_1(>X)$ of contributions (number of commits) per developer directly triggered by a given past commit, which can be called first-generation daughter commits generated by a mother commit. Consider the case where $P_1(>X)$ is also a power law

$$P_1(>X) \sim X^{-\alpha}. \qquad (11)$$

Close to or at criticality, the distribution of cluster sizes, which is equivalent to the distribution of productive activity $P(>X)$ given by (6), has the exponent $\mu = 1/2$ [52], under the condition that the distribution $P_1(>X)$ of contribution sizes triggered directly by previous contributions (so-called first-generation cascades) decays sufficiently fast, i.e., with $\alpha \geq 2$. The result $\mu = 1/2$ also holds for any distribution $P_1(>X)$ decaying asymptotically faster than a power law [36]. When $1 < \alpha < 2$, the mean-field exponent $\mu = 1/2$ is changed into [36]

$$\mu = \frac{1}{\alpha}. \qquad (12)$$

Together with (9), the superlinear exponent $\beta$ is predicted to be

$$\beta = \frac{1}{\mu} = \alpha, \qquad 1 < \alpha < 2, \qquad (13)$$

that is, equal to the exponent $\alpha$ of the tail distribution of first-generation contributions per developer. For $\alpha \geq 2$, $\mu = 1/2$ and therefore $\beta = 2$. An analytical derivation of the prediction (13) using the Hawkes process (10), which rigorously anchors the large deviation argument of the previous section, is given by Saichev and Sornette [53].
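The chain of predictions (9), (12) and (13) amounts to a simple mapping from the first-generation tail exponent $\alpha$ to the cascade exponent $\mu$ and the superlinear exponent $\beta$. The helper functions below (hypothetical names) encode that mapping. They assume $\alpha > 1$, which is needed for a finite mean fertility and hence for a critical branching ratio $n = 1$ to be reachable.

```python
def cascade_exponent(alpha):
    """Tail exponent mu of critical cascade sizes, given first-generation exponent alpha."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for a finite mean fertility")
    return 1.0 / alpha if alpha < 2 else 0.5  # renormalized 1/alpha, or mean-field 1/2


def superlinear_exponent(alpha):
    """Predicted beta = 1/mu: equals alpha for 1 < alpha < 2, and 2 for alpha >= 2."""
    return 1.0 / cascade_exponent(alpha)


for alpha in (1.2, 1.5, 2.0, 3.0):
    print(alpha, cascade_exponent(alpha), superlinear_exponent(alpha))
```

The saturation at $\beta = 2$ is the same ceiling as the perfectly coherent $N^2$ limit in the superradiance analogy.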

Figure 6 synthesizes the relation between superlinear productive activity, (critical) cascades, the distribution of first-generation triggering and the total distribution of activity per contributor over a sufficiently long period.

Figure 6. Relationship between superlinear productive bursts, cascading dynamics, and heavy-tailed distributions of 1st generation and cumulative contributions.


(A) (light blue) Triggering mechanism generating clusters whose sizes are distributed with the renormalized exponent Inline graphic, starting from the distribution of first-generation “daughter events” with exponent Inline graphic. For the sake of simplicity, we represent one cluster of activity per contributor, but triggering can also occur between contributors, provided that the triggering probability is the same for all contributors. (B) (light green) How the triggering mechanism generates superlinear productive activity Inline graphic as a function of the number of active contributors Inline graphic.

Empirical tests

We now turn to empirical tests of this theory. For each Inline graphic-day period and for each project in our database (Archive S1), we calibrated the power-law tails of two distributions:

  1. the distribution of the total number Inline graphic of commits per contributor over the Inline graphic days, which is taken as a proxy for Inline graphic, with exponent Inline graphic;

  2. the distribution of the number of commits per developer per Inline graphic-day time bin, which is assumed to be a reasonable proxy for the distribution Inline graphic of first-generation production, characterized by the exponent Inline graphic.
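Both samples, together with the per-bin pairs (number of active developers, number of commits) used for the superlinear fits, can be extracted from a commit log by simple binning. The sketch assumes each commit is a (timestamp in seconds, developer identifier) pair restricted to one period, a simplification of the arrays in Archive S1, and aligns bins on multiples of the bin width; bin_commits and period_samples are hypothetical helpers, not the authors' code.

```python
from collections import Counter, defaultdict

DAY = 86_400  # seconds per day


def bin_commits(commits, bin_days):
    """Group commits into consecutive bins of `bin_days` days.

    commits: iterable of (timestamp_in_seconds, developer_id) pairs.
    Bins are aligned on multiples of the bin width from timestamp 0 (an assumption).
    Returns {bin_index: Counter(developer_id -> commits in that bin)}.
    """
    width = bin_days * DAY
    bins = defaultdict(Counter)
    for timestamp, developer in commits:
        bins[int(timestamp // width)][developer] += 1
    return dict(bins)


def period_samples(commits, bin_days):
    """Samples for the two tail fits of one period, plus (c, R) per bin.

    commits: the (timestamp, developer) pairs falling inside one period.
    Returns (totals, per_bin, activity):
      totals   - commits per contributor over the whole period (item 1),
      per_bin  - commits per active developer in each bin, pooled (item 2),
      activity - (active developers c, total commits R) for each bin.
    """
    commits = list(commits)
    totals = Counter(developer for _, developer in commits)
    per_bin, activity = [], []
    for _, counter in sorted(bin_commits(commits, bin_days).items()):
        per_bin.extend(counter.values())
        activity.append((len(counter), sum(counter.values())))
    return list(totals.values()), per_bin, activity


log = [(0, "a"), (10, "a"), (DAY + 5, "b"), (DAY + 6, "a"), (2 * DAY, "b")]
print(period_samples(log, bin_days=1))
```

The first two returned lists feed the two tail fits, and the activity pairs feed the fit of production against the number of active developers.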

For each OSS project, we used the discrete maximum likelihood estimator (MLE) together with a p-value threshold Inline graphic, obtained by bootstrapping, and the Kolmogorov-Smirnov distance Inline graphic to select the ranges over which the calibration is performed [54] (see Table S1 for detailed results for each OSS project analyzed).
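A simplified version of this calibration can be sketched with the discrete approximation to the maximum likelihood estimator given in [54], a ≈ 1 + n [Σ ln(x_i / (x_min - 1/2))]^(-1), for a probability mass function decaying as x^(-a) above x_min; in that convention the exponent of the complementary cumulative distribution is a - 1. The sketch chooses x_min by minimizing the Kolmogorov-Smirnov distance, approximates the fitted distribution function by its continuous counterpart, and omits the bootstrap p-value, so it is not the exact procedure behind Table S1.

```python
import math
import random


def discrete_mle(data, xmin):
    """Approximate discrete MLE [54] of the pmf exponent a (pmf ~ x**-a, x >= xmin)."""
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum, tail


def ks_distance(tail, xmin, a):
    """Largest gap, over observed values, between the empirical and fitted CDFs.

    The fitted CDF uses the continuous approximation
    P(X >= x) ~ ((x - 0.5) / (xmin - 0.5)) ** (1 - a).
    """
    values = sorted(tail)
    n, i, d = len(values), 0, 0.0
    while i < n:
        x = values[i]
        while i < n and values[i] == x:
            i += 1  # skip ties so that i / n is the empirical P(X <= x)
        model = 1.0 - ((x + 0.5) / (xmin - 0.5)) ** (1.0 - a)
        d = max(d, abs(i / n - model))
    return d


def fit_tail(data, min_tail=50):
    """Scan xmin and keep the fit with the smallest KS distance (no bootstrap p-value)."""
    best = None
    for xmin in sorted(set(x for x in data if x >= 1)):
        a, tail = discrete_mle(data, xmin)
        if len(tail) < min_tail:
            break
        d = ks_distance(tail, xmin, a)
        if best is None or d < best[2]:
            best = (xmin, a, d)
    return best  # (xmin, pmf exponent a, KS distance); CCDF exponent is a - 1


def sample_discrete_power_law(n, a, xmin, rng):
    """Approximate discrete power-law sampler from [54]: rounded continuous draws."""
    return [int((xmin - 0.5) * (1.0 - rng.random()) ** (-1.0 / (a - 1.0)) + 0.5)
            for _ in range(n)]


sample = sample_discrete_power_law(3000, 2.5, 10, random.Random(1))
print(fit_tail(sample, min_tail=500))
```

On synthetic data generated with a = 2.5 (CCDF exponent 1.5), the fitted exponent should land close to 2.5; min_tail guards against fits on very few points.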

Figure 5 shows the result for the Apache Web Server project. The fitting procedure confirms the existence of power-law tails for the two empirical distributions, with estimated exponents equal to Inline graphic and Inline graphic, respectively. Within their error bars, these values are compatible with the prediction (12), Inline graphic, which results from the cascades of triggering [36]. This result is typical of the other OSS projects investigated, as shown in Figure 7, albeit with considerable variability. This is expected, since the projects are likely to be characterized by many more dimensions than the production and cascading effects considered here.

Figure 7. Verification of the relationship between Inline graphic, Inline graphic and Inline graphic as predicted by the theory.


(A) Superlinear exponent Inline graphic as a function of Inline graphic, the exponent of the power-law tail distribution of first-generation productivity, for each of the Inline graphic-day periods for which both values could be calibrated. The points are concentrated around Inline graphic, with almost half of them (184 out of 390 values) within the grey area delimited by Inline graphic and Inline graphic. To test for the relations Inline graphic and Inline graphic, we used a bi-Gaussian model. The dotted ellipses show the first three standard deviations around the barycenter, and the black line represents the main axis of the bi-Gaussian model. We also performed a principal component analysis (PCA). The red dotted lines show the main directions of variance obtained with the PCA. Both methods show a positive relation between Inline graphic and Inline graphic only on the second principal component (slope Inline graphic with PCA). (B) Same as panel (A) for the dependence of Inline graphic versus Inline graphic, with a concentration of points in the grey area (86 out of 213 values) and Inline graphic. Both the bi-Gaussian fit and the PCA show strong evidence of a positive relation, with slope Inline graphic for the bi-Gaussian approach and Inline graphic for the PCA.

Figure 7 presents Inline graphic as a function of Inline graphic (panel A) and Inline graphic as a function of Inline graphic (panel B) for all the OSS projects in our database. According to the cascading model of productive activity presented in the previous section, we should have Inline graphic, by (13). Indeed, one can see that Inline graphic, Inline graphic and Inline graphic are clustered around Inline graphic. Almost half of the considered periods (184 of a total of 390) fitted over all projects belong to the regime where Inline graphic and Inline graphic (panel A), and forty percent (86 out of 213) are such that Inline graphic (panel B), as predicted by the theory.

Let us first focus on the relationship between Inline graphic and Inline graphic shown in panel B of Figure 7. Note that the number of estimates of the exponent Inline graphic is significantly smaller than that for Inline graphic, simply because we obtain one data point per Inline graphic-day period for Inline graphic, compared with one data point per Inline graphic-day time bin for Inline graphic. The shaded square represents the domain over which the theory applies (86 out of 213 data points). To test the relation Inline graphic quantitatively, we used a bivariate Gaussian model. The dotted ellipses show the equi-probability levels at one, two and three standard deviations around the barycenter Inline graphic, and the black line represents the principal axis of the bi-Gaussian model. We also performed a principal component analysis (PCA). The red dotted lines show the two main directions of variance obtained with the PCA. Both methods support a positive correlation between Inline graphic and Inline graphic, with slope Inline graphic for the bi-Gaussian approach and Inline graphic for the PCA. To our knowledge, this is the first empirical test of the renormalization of the exponent Inline graphic of first-generation events into the renormalized exponent Inline graphic due to the cascade of triggering over all generations in a critical branching process [36], [52].
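The slope of a principal axis follows from the 2×2 sample covariance matrix of the exponent pairs. The sketch below is plain PCA on unstandardized (x, y) points, for example pairs of first-generation and cluster-size exponent estimates; principal_axis_slopes is a hypothetical helper, and it is not claimed to reproduce the bi-Gaussian fit or the exact PCA settings behind Figure 7.

```python
import math


def principal_axis_slopes(xs, ys):
    """Slopes of the first and second principal axes of the points (x, y).

    The first axis is the direction of largest variance of the 2x2 sample
    covariance matrix; the second axis is orthogonal to it. A vertical axis
    shows up as a very large slope.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Angle of the leading eigenvector: tan(2 * theta) = 2 * sxy / (sxx - syy).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    first = math.tan(theta)
    second = math.tan(theta + math.pi / 2)
    return first, second


xs = [1.0, 1.2, 1.5, 1.7, 2.0]
ys = [2 * x + 1 for x in xs]
print(principal_axis_slopes(xs, ys))  # first slope 2, second slope -1/2 (up to rounding)
```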

The evidence for the relationship between Inline graphic and Inline graphic is presented in panel A of Figure 7. First, one can observe a prevalence of the large-deviation critical interaction regime, as the grey square area delimited by Inline graphic is very densely populated (184 out of 390). Second, as already pointed out, the barycenter of the cloud of data points is on Inline graphic, as expected from theory. However, we find limited support for a clear linear relation between Inline graphic and Inline graphic. The bi-Gaussian model analysis provides the three dotted ellipses showing the first three standard deviations away from the barycenter. The black line representing the main axis of the bi-Gaussian model suggests a negative correlation between Inline graphic and Inline graphic. Using PCA, we find a positive relationship on the second principal component, with slope Inline graphic. These results suggest that very productive projects and periods within projects, characterized by a large superlinear exponent Inline graphic, are likely to be due to more complex interactions between the developers and their mutual triggering than assumed by the simple theory developed above. In particular, a differentiation between same-developer commit triggering and inter-developer commit triggering seems necessary, along the lines of Refs. [19], [55].

Reconciling present findings and superlinear production in large cities

Figure 8 reveals that the clouds of the superlinear production exponent Inline graphic exhibit an interesting regularity as a function of the total number of contributors Inline graphic of an OSS project. The intuition motivating this investigation is the following: while a minimum critical mass of contributors is needed to foster productive bursts, large projects suffer from coordination costs, which may offset the increasing returns of productive activity. Figure 8 (panel A) indeed shows that the superlinear exponent Inline graphic decreases on average with project size. Panel B shows that, for projects of up to Inline graphic contributors, the fraction of Inline graphic-day periods with Inline graphic (superlinear regime) increases as a function of the total number Inline graphic of developers, approximately according to

graphic file with name pone.0103023.e275.jpg (14)

Figure 8. Evolution of the superlinear exponent Inline graphic as a function of project size.


(A) Average superlinear exponent Inline graphic per project as a function of the cumulative number of contributors. The circle size reflects the number of exponents fitted per Inline graphic time window for each project and entering the average. The sampling ranges from Inline graphic (small disks) to Inline graphic (largest disk). Inline graphic exhibits a slightly negative slope Inline graphic as a function of Inline graphic (Inline graphic and Inline graphic). (B) To quantify the prevalence of productive bursts in projects, we compute, for each project, the ratio of the number of periods with superlinear exponent Inline graphic to the total number of Inline graphic periods, as a function of Inline graphic. We distinguish a cluster of points around Inline graphic and Inline graphic (i.e. Inline graphic contributors) with a positive relationship (Inline graphic) between the ratio and Inline graphic. Projects with a large pool of contributors (Inline graphic) are more randomly scattered, with a lower ratio, and do not obey the same relationship, suggesting a different regime.

For Inline graphic, a different regime occurs, characterized by a much smaller fraction of time periods with superlinear productivity (Inline graphic). Taken together, the two panels of Figure 8 support the view that superlinear productivity is the prerogative of relatively small projects with no more than 30–40 developers in total, while larger groups face the difficult challenge of creating and maintaining productive bursts. Unfortunately, the data are too scattered to allow us to draw a firm conclusion on the value(s) towards which Inline graphic converges for large project sizes.

There may be a link between our results and previous studies reporting superlinearity in a completely different class of objects, namely cities. Data from 360 US metropolitan areas have shown that wages, number of patents, GDP and intensity of crime scale superlinearly with population size [production Inline graphic] with an exponent Inline graphic [56], [57]. The value of Inline graphic larger than Inline graphic reflects the fact that productivity per capita increases by about 11% with each doubling of the population [58]. Qualitatively in line with our findings, the superlinearity in our OSS data is significantly stronger (Inline graphic on average, with large variations and some projects characterized by much larger values of Inline graphic) for the smaller projects with no more than 30–40 developers. We note that our results apply to a completely different range of group sizes than the results for cities, which involve populations of tens of thousands to tens of millions of inhabitants.
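The per-doubling figures follow from simple arithmetic: if total production scales as N^beta, doubling N multiplies production by 2^beta and production per capita by 2^(beta - 1). In the sketch, beta ≈ 1.15 is an assumed value for cities, chosen because it reproduces the 11% quoted above, and beta = 4/3 is an illustrative OSS value corresponding to the factor of about 2.5 in the title; doubling_gains is a hypothetical helper.

```python
def doubling_gains(beta: float) -> tuple[float, float]:
    """Return (total output factor, per-capita gain) when group size doubles.

    Assumes production scales as size**beta.
    """
    total_factor = 2.0 ** beta
    per_capita_gain = 2.0 ** (beta - 1.0) - 1.0
    return total_factor, per_capita_gain


for label, beta in (("cities (assumed)", 1.15), ("OSS (illustrative)", 4 / 3)):
    total, gain = doubling_gains(beta)
    print(f"{label}: x{total:.2f} total, {100 * gain:.0f}% per capita")
```

This prints a total factor of about 2.22 with an 11% per-capita gain for beta = 1.15, and about 2.52 with a 26% per-capita gain for beta = 4/3.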

The underlying mechanisms may well be different [59]. For cities, the superlinear scaling of urban productivity demonstrates the importance of cities as centers of enhanced interactions, leading to the generation and exchange of knowledge and the exploitation of innovations [58]. For OSS projects, many other factors come into play, such as diversity and complementarity, the latter describing the fact that doing more of one thing increases the return to doing more of another. Other possible mechanisms include synergies, economies of scale, coordination and leadership, role-model and entrainment effects, motivation, friendship and other psychological factors. However, Figure 8 suggests that these mechanisms dampen out as project size becomes very large, possibly leaving only those still active at the scale of cities.

Expanding on the remark about the different sizes involved in our OSS database compared with cities, we present a simple mechanism and theoretical argument that may explain the smaller superlinear exponent of cities, deriving it from our results for small group sizes. The key idea is that the population of a city can be partitioned into many groups of persons interacting closely within a group and loosely or not at all across groups. Groups can be firms, departments within firms, clubs, and other organizations through which people interact. We assume that, within each group, the superlinear production law (1) holds with the exponent Inline graphic found in our OSS database.

The second ingredient is that group sizes Inline graphic are widely distributed, roughly as Zipf's law [15],

graphic file with name pone.0103023.e306.jpg (15)

where Inline graphic is the probability density function of the group sizes Inline graphic, with Inline graphic if Zipf's law holds exactly, while in general Inline graphic can deviate from Inline graphic for a variety of reasons [16]. Let us assume that a city of total population Inline graphic is made up of Inline graphic groups with memberships of Inline graphic individuals, respectively. The total production of the city is then, according to (1),

graphic file with name pone.0103023.e315.jpg (16)

assuming for the moment and for simplicity that Inline graphic is independent of group sizes. Inline graphic in expression (16) can be estimated as [38], [39]

graphic file with name pone.0103023.e318.jpg (17)

where Inline graphic is the largest group size among the Inline graphic groups, which can be estimated by

graphic file with name pone.0103023.e321.jpg (18)

By conservation of the population, and assuming for simplicity no strong overlap between groups, we have approximately

graphic file with name pone.0103023.e322.jpg (19)

This leads to Inline graphic for Inline graphic and Inline graphic for Inline graphic. In words, a relatively thin tail of the group size distribution (Inline graphic) is associated with a number of groups scaling proportionally to the total city population Inline graphic. In contrast, for a heavy-tailed distribution (Inline graphic), the number of groups scales sublinearly with Inline graphic, as the few largest groups account for a finite fraction of the total population. Substituting into expression (17), this yields Inline graphic, with the exponent Inline graphic obeying three possible regimes.

  1. Inline graphic implies Inline graphic: the production of the whole city as a function of its population is characterized by the same superlinear exponent as the production of each independent group. The mechanism is clear: for Inline graphic, a few of the largest groups dominate the Inline graphic-partition and account for the majority of the city population. The same scaling holds essentially because the city is almost controlled by a single group, and we have assumed the same exponent Inline graphic for all groups. The empirical evidence suggests that this case does not apply.

  2. Inline graphic implies Inline graphic. In this regime, there are still very large groups that contribute to the superlinearity, but their relative number is much smaller than for Inline graphic. The values Inline graphic with Inline graphic can be reconciled with Inline graphic. This exponent is, within error bounds, roughly compatible with the value found for firms in the US, close to Inline graphic [60].

  3. Inline graphic implies Inline graphic, which corresponds to a linear growth of city production with population. In this regime, the overall city production is controlled by the many small groups constituting the city, and there are no scale effects other than proportionality to the number of small groups.
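Our reading of the three regimes can be condensed into a piecewise formula. The notation is assumed, since the equations were lost in extraction: beta is the within-group exponent of (1), mu is the tail exponent of the group-size density (15), taken to decay as s^-(1 + mu) so that mu = 1 is exact Zipf, and the returned value is the exponent of total city production versus population. The middle-regime value beta/mu is our derivation from the standard scaling of sums of heavy-tailed variables: the sum of N terms s_i^beta grows as N^(beta/mu) when mu < beta and as N when mu > beta, while N grows as P^mu for mu < 1 and as P for mu > 1.

```python
def city_exponent(beta: float, mu: float) -> float:
    """Exponent of total city production versus population in the group-partition model.

    Assumptions (our notation): each group of size s produces s**beta with beta > 1,
    and group sizes have a density decaying as s**-(1 + mu).
    """
    if beta <= 1:
        raise ValueError("the argument assumes superlinear groups, beta > 1")
    if mu < 1:
        return beta        # regime 1: the largest groups hold most of the population
    if mu < beta:
        return beta / mu   # regime 2: large groups still dominate total production
    return 1.0             # regime 3: many small groups, linear scaling


for mu in (0.8, 1.16, 2.0):
    print(f"mu={mu}: city exponent {city_exponent(4 / 3, mu):.3f}")
```

For instance, with beta = 4/3 a group-size exponent mu ≈ 1.16 would give a city exponent of about 1.15; these numbers only illustrate the formula. The function is continuous at both boundaries, mu = 1 and mu = beta.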

While this argument is quite naive, it demonstrates the importance of the interplay between the partition of cities into groups, the productivity of such groups and their size distribution. A similar story is likely to be relevant to large OSS projects, groups and firms, which, for reasons ranging from cognitive limitations [61] to efficiency maximization [62], are found to organize into subgroups, often hierarchically [61].

Discussion

In the early days of the industrial revolution, Adam Smith noted how successive efficiency gains in the means of communication helped reach unprecedented pools of resources and lifted some limitations of the labor market through an improved division of labor [63]. The telegraph, the telephone and more recently the Internet have further pushed back the possibilities for knowledge production and for labor organization on the model of collective action [64]. Nowadays, unrelated people spontaneously team up across the world in open collaboration projects and join forces to create knowledge in the form of software, natural language [65] and mathematics [66], as well as to produce tangible goods [67]. These organizations rely primarily on the principles of peer production [68]: (i) task self-selection, (ii) peer review and (iii) iterative improvement, at odds with traditional market and firm production organizations [69]. Expertise can be pulled from a broader community in a timely and appropriate manner towards efficient problem resolution. The present understanding of group performance in social psychology points in the same direction: experiments involving small groups performing coordination tasks [8], [70], problem solving [37] and innovation [14] support the hypothesis that larger groups perform better because more diverse cognitive abilities can be pooled. Group productive activity can also be more than the sum of its parts if members develop social sensitivity towards each other [20]. However, the marginal gain of adding individuals to a group decreases rapidly and becomes negligible beyond five individuals [37], [71], [72]. Similarly, as projects attract larger communities, more coordination is required through social norms and formal governance structures [21], which may in turn reduce the positive effects of peer production [73].

Conclusion

In this paper, we have shown that productive bursts, associated with increasing returns of activity, result from the mechanism of critical triggering of commits among contributors. Specifically, we have shown that production intensity, or production per unit time, grows superlinearly as a function of the number of participants in a group. In practice, we have found a superlinear relationship Inline graphic with Inline graphic between the total number Inline graphic of commits measured per Inline graphic-day time window for different OSS projects and the number Inline graphic of active contributors in the same Inline graphic-day time windows. These results are robust with respect to the length Inline graphic of the time windows, i.e. when varying Inline graphic from 1 day to 10 days.
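Exponents of this kind are commonly estimated by ordinary least squares on the logarithms of the per-window pairs (c, R). The sketch assumes that estimator, which this section does not specify, and skips windows without commits; superlinear_exponent is a hypothetical helper.

```python
import math


def superlinear_exponent(activity):
    """OLS slope of log R against log c over windows with at least one commit.

    activity: iterable of (c, R) pairs, with c the number of active contributors
    and R the number of commits in the same time window.
    Returns (beta, prefactor) for the fit R ~ prefactor * c**beta.
    """
    pts = [(math.log(c), math.log(r)) for c, r in activity if c > 0 and r > 0]
    if len(pts) < 2:
        raise ValueError("need at least two windows with activity")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("the number of active contributors must vary across windows")
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    beta = sxy / sxx
    return beta, math.exp(my - beta * mx)


windows = [(c, 3 * c ** 1.5) for c in (1, 2, 4, 8, 16)]
print(superlinear_exponent(windows))  # approximately (1.5, 3.0)
```

A slope above 1 signals superlinear production; in practice the (c, R) pairs would come from binned commit logs rather than the synthetic windows used here.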

Such critical triggering may operate according to two co-existing mechanisms: interactions and large deviations. These mechanisms have been subjected to falsification tests in three independent ways: (i) documenting the superlinear relationship between productive activity Inline graphic and the number of active contributors Inline graphic, characterized by the scaling exponent Inline graphic; (ii) measuring the power-law tail distribution of first-generation cascades with exponent Inline graphic and checking that it explains the superlinear productivity exponent Inline graphic; and (iii) measuring the power-law tail distribution of production cluster sizes with exponent Inline graphic and verifying that it is approximately equal to Inline graphic, where Inline graphic is the distribution of contributions per developer at short times.

We have found that superlinear productive activity holds for a broad range of project sizes and types, with a slight decrease of the average scaling exponent Inline graphic with the total number of contributors Inline graphic. Productive bursts have been found to occur much more frequently for Inline graphic than in larger projects. The results suggest that size and threshold effects influence the ability to trigger and maintain critical triggering of individual contributions. Indeed, contributions must create enough reaction opportunities to trigger, on average, as many follow-up contributions. Pervasive communication systems (social networks), physical proximity (e.g. cities), or even personal dedication to the project surely help increase the opportunities for a contribution to trigger a follow-up action. On the other hand, large and complex structures with overwhelming communication loads or inadequate governance structures can inhibit the smooth circulation and reuse of knowledge for the sake of further cumulative innovation. The large deviation mechanism provides another take-away lesson: open collaboration does not imply equal work among contributors. On the contrary, productive bursts are the hallmark of a minority of individuals engaging in intense interactions and short-lived contributions of far above average size. Whether these large-deviation contributions pull in engagement by others or, on the contrary, are pushed by the community remains an open question.
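The large-deviation lesson can be illustrated numerically. Assuming independent contributions per developer drawn from a Pareto distribution with P(W > w) = w^(-alpha) for w >= 1 (generated with Python's random.paretovariate) and no triggering between developers, the share of the total produced by the single most productive developer stays large as the group grows when alpha < 1, but shrinks towards zero when alpha > 1. The helper mean_top_share is hypothetical.

```python
import random


def mean_top_share(alpha: float, developers: int, trials: int = 300, seed: int = 0) -> float:
    """Average share of the total output produced by the most productive developer.

    Contributions are i.i.d. Pareto with P(W > w) = w**-alpha for w >= 1.
    """
    rng = random.Random(seed)
    total_share = 0.0
    for _ in range(trials):
        work = [rng.paretovariate(alpha) for _ in range(developers)]
        total_share += max(work) / sum(work)
    return total_share / trials


for alpha in (0.75, 1.5, 3.0):
    shares = [round(mean_top_share(alpha, c), 3) for c in (10, 100, 1000)]
    print(f"alpha={alpha}: top share for 10, 100, 1000 developers = {shares}")
```

Comparing the rows shows the contrast: for alpha < 1 the top contributor's share does not vanish with group size, which is the regime in which a few very large contributions dominate the total.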

Supporting Information

Table S1

Table of summary statistics (comma-separated file), Inline graphic, Inline graphic and Inline graphic, for each project analyzed in this study.

(CSV)

Archive S1

Compressed archive of Python NumPy arrays containing the time series of all commits, including timestamp, user and modified file, for each open source software project analyzed in this study.

(ZIP)

Acknowledgments

We acknowledge useful feedback from Eric Schulte and Ryan Woodard.

Data Availability

The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper and its Supporting Information files.

Funding Statement

One of the authors (T.M.) acknowledges support from the Swiss National Science Foundation (Grant Nr. PA00P2-145368). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Humphrey G (1924) The Psychology of the Gestalt. Journal of Educational Psychology 15: 401. [Google Scholar]
  • 2. Damoiseaux J, Greicius M (2009) Greater than the sum of its parts: a review of studies combining structural connectivity and resting-state functional connectivity. Brain structure & function 213: 525–533. [DOI] [PubMed] [Google Scholar]
  • 3.Jorgensen SE (2012) Introduction to Systems Ecology (Applied Ecology and Environmental Management). CRC Press. Available: http://www.worldcat.org/title/introduction-to-systems-ecology/oclc/651914819. Accesed 2014 Jul 11.
  • 4. Anderson PW (1963) Plasmons, Gauge Invariance, and Mass. Physical Review Online Archive (Prola) 130: 439–442. [Google Scholar]
  • 5. Anderson PW (1972) More Is Different. Science 177: 393–396. [DOI] [PubMed] [Google Scholar]
  • 6.Arthur WB (1994) Increasing Returns and Path Dependence in the Economy (Economics, Cognition, and Society). University of Michigan Press. Available: http://www.worldcat.org/isbn/0472064967. Accessed 2014 Jul 11.
  • 7.Krugman P (1996) The Self Organizing Economy. Wiley-Blackwell, 1 edition. Available: http://www.worldcat.org/isbn/1557866988. Accessed 2014 Jul 11.
  • 8. Tziner A, Eden D (1985) Effects of Crew Composition on Crew Performance: Does the Whole Equal the Sum of Its Parts? Journal of Applied Psychology 70: 85–93. [Google Scholar]
  • 9. Sundstrom E, De Meuse KP, Futrell D (1990) Work teams: Applications and effectiveness. American psychologist 45: 120. [Google Scholar]
  • 10. Cohen SG, Bailey DE (1997) What Makes Teams Work: Group Effectiveness Research from the Shop Floor to the Executive Suite. Journal of Management 23: 239–290. [Google Scholar]
  • 11. Neuman GA, Wright J (1999) Team effectiveness: beyond skills and cognitive ability. Journal of Applied Psychology 84: 376. [DOI] [PubMed] [Google Scholar]
  • 12. Ennen E, Richter A (2010) The Whole Is More Than the Sum of Its Parts Or Is It? A Review of the Empirical Literature on Complementarities in Organizations. Journal of Management 36: 207–233. [Google Scholar]
  • 13. Lin Y, Beyerlein MM (2006) Communities of practice: A critical perspective on collaboration. Advances in Interdisciplinary Studies of Work Teams 12: 53–79. [Google Scholar]
  • 14. Sacramento CA, Chang MWS, West MA (2006) Team innovation through collaboration. Advances in Interdisciplinary Studies of Work Teams 12: 81–112. [Google Scholar]
  • 15.Saichev AI, Malevergne Y, Sornette D (2009) Theory of Zipf's Law and Beyond (Lecture Notes in Economics and Mathematical Systems). Springer, 1st edition. edition. Available: http://www.worldcat.org/isbn/3642029450. Accessed 2014 Jul 11.
  • 16. Malevergne Y, Saichev AI, Sornette D (2013) Zipf's law and maximum sustainable growth. Journal of Economic Dynamics and Control 37: 1195–1212. [Google Scholar]
  • 17.Maillart T, Sornette D, Spaeth S, von Krogh G (2008) Empirical Tests of Zipf's Law Mechanism in Open Source Linux Distribution. Physical Review Letters 101: 218701+. [DOI] [PubMed]
  • 18. Zhang Q, Sornette D (2011) Empirical test of the origin of Zipf's law in growing social networks. Physica A: Statistical Mechanics and its Applications 390: 4124–4130. [Google Scholar]
  • 19. Saichev AI, Maillart T, Sornette D (2013) Hierarchy of temporal responses of multivariate self-excited epidemic processes. The European Physical Journal B 86: 1–19. [Google Scholar]
  • 20. Woolley AW, Chabris CF, Pentland A, Hashmi N, Malone TW (2010) Evidence for a Collective Intelligence Factor in the Performance of Human Groups. Science 330: 686–688. [DOI] [PubMed] [Google Scholar]
  • 21. O'Mahony S, Ferraro F (2007) The Emergence of Governance in an Open Source Community. Academy of Management Journal 50: 1079–1106. [Google Scholar]
  • 22.Cardy JL (1988) Finite-size Scaling, volume 2. Current physics series, Elsevier Science Ltd.
  • 23.Grimmett GR (1994) Probability and phase transition, volume 420. Springer.
  • 24.Liggett TM (2005) Interacting particle systems. Springer.
  • 25.Galam S (2012) What is Sociophysics About? Springer.
  • 26.Domb C, Green MS, Lebowitz JL (1979–2001) Phase transitions and critical phenomena, volume 1–20. Academic Press.
  • 27.Sornette D, Deschâtres F, Gilbert T, Ageon Y (2004) Endogenous Versus Exogenous Shocks in Complex Networks: An Empirical Test Using Book Sale Rankings. Physical Review Letters 93: 228701+. [DOI] [PubMed]
  • 28. Deschatres F, Sornette D (2005) Dynamics of book sales: Endogenous versus exogenous shocks in complex networks. Physical Review E 72: 016112. [DOI] [PubMed] [Google Scholar]
  • 29. Crane R, Sornette D (2008) Robust dynamic classes revealed by measuring the response function of a social system. Proceedings of the National Academy of Sciences 105: 15649–15653. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Madey G, Freeh V, Tynan R (2002) The open source software development phenomenon: An analysis based on social network theory. In: Americas conf. on Information Systems (AMCIS2002). pp. 1806–1813.
  • 31.Crowston K, Howison J (2005) The social structure of free and open source software development. First Monday 10.
  • 32. Dicke RH (1954) Coherence in Spontaneous Radiation Processes. Physical Review 93: 99–110. [Google Scholar]
  • 33. Gross M, Haroche S (1982) Superradiance: An essay on the theory of collective spontaneous emission. Physics Reports 93: 301–396. [Google Scholar]
  • 34.Benedict M (1996) Super-radiance multiatomic coherent emission. Taylor and Francis group, New York.
  • 35. John S, Quang T (1995) Localization of Superradiance near a Photonic Band Gap. Phys Rev Lett 74: 3419–3422. [DOI] [PubMed] [Google Scholar]
  • 36. Saichev AI, Helmstetter A, Sornette D (2005) Power-law distributions of offspring and generation numbers in branching models of earthquake triggering. Pure and Applied Geophysics 162: 1113–1134. [Google Scholar]
  • 37. Shaw ME (1932) A comparison of individuals and small groups in the rational solution of complex problems. The American Journal of Psychology 44: 491–504. [Google Scholar]
  • 38. Bouchaud JP, Georges A (1990) Anomalous diffusion in disordered media: Statistical mechanisms, models and physical applications. Physics Reports 195: 127–293. [Google Scholar]
  • 39.Sornette D (2006) Critical Phenomena in Natural Sciences: Chaos, Fractals, Selforganization and Disorder: Concepts and Tools (Springer Series in Synergetics). Springer, 2nd edition. Available: http://www.worldcat.org/isbn/3540308822. Accessed 2014 Jul 11.
  • 40. Hawkes AG, Oakes D (1974) A Cluster Process Representation of a Self-Exciting Process. Journal of Applied Probability 11: 493–503. [Google Scholar]
  • 41. Mohler GO, Short MB, Brantingham PJ, Schoenberg FP, Tita GE (2011) Self-Exciting Point Process Modeling of Crime. Journal of the American Statistical Association 106: 100–108. [Google Scholar]
  • 42.Baldwin A, Gheyas I, Ioannidis C, Pym D, Willams J (2012) Contagion in Cybersecurity Attacks. In: Workshop of Economics of Information Security (WEIS 2012).
  • 43.Aït-Sahalia Y, Cacho-Diaz J, Laeven RJA (2010) Modeling Financial Contagion Using Mutually Exciting Jump Processes. National Bureau of Economic Research Working Paper Series: 15850+.
  • 44.Filimonov V, Sornette D (2012) Quantifying reflexivity in financial markets: Toward a prediction of flash crashes. Physical Review E 85 : 056108+. [DOI] [PubMed] [Google Scholar]
  • 45. Filimonov V, Bicchetti D, Maystre N, Sornette D (2014) Quantification of the high level of endogeneity and of structural regime shifts in commodity markets. Journal of International Money and Finance 42: 174–192. [Google Scholar]
  • 46.Maillart T, Sornette D, Frei S, Duebendorfer T, Saichev AI (2011) Quantification of deviations from rationality with heavy tails in human dynamics. Physical Review E 83 : 056101+. [DOI] [PubMed] [Google Scholar]
  • 47.Daley DJ, Vere-Jones D (2003) An introduction to the theory of point processes. Springer, 2nd edition. Available: http://www.worldcat.org/isbn/0387213376. Accessed 2014 Jul 11.
  • 48. Helmstetter A, Sornette D (2002) Subcritical and supercritical regimes in epidemic models of earthquake aftershocks. Journal of geophysical research 107 (B10): 2237 doi:10.1029/2001JB001580 [Google Scholar]
  • 49. Helmstetter A, Sornette D (2003) Importance of direct and indirect triggered seismicity in the ETAS model of seismicity. Geophys Res Lett 30 11: 1576+. doi:10.1029/2003GL017670 [Google Scholar]
  • 50. Saichev A, Sornette D (2004) Anomalous Power Law Distribution of Total Lifetimes of Aftershock Sequences. Phys Rev E 70: 046123. [DOI] [PubMed] [Google Scholar]
  • 51. Saichev AI, Sornette D (2010) Generation-by-generation dissection of the response function in long memory epidemic processes. The European Physical Journal B - Condensed Matter and Complex Systems 75: 343–355. [Google Scholar]
  • 52.Harris TE (2002) The theory of branching processes. Courier Dover Publications.
  • 53.Saichev A, Sornette D (2014) Superlinear scaling of offspring at criticality in branching processes. Physical Review E 89 : 012104+. [DOI] [PubMed]
  • 54. Clauset A, Shalizi CR, Newman MEJ (2009) Power-Law Distributions in Empirical Data. SIAM Review 51: 661–703. [Google Scholar]
  • 55. Saichev AI, Sornette D (2011) Generating Functions and Stability Study of Multivariate Self-Excited Epidemic Processes. European Physical Journal B 83: 271–282. [Google Scholar]
  • 56. Bettencourt LM, Lobo J, Helbing D, Kühnert C, West GB (2007) Growth, innovation, scaling, and the pace of life in cities. Proceedings of the National Academy of Sciences of the United States of America 104: 7301–7306. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Bettencourt L, West G (2010) A unified theory of urban living. Nature 467: 912–913. [DOI] [PubMed] [Google Scholar]
  • 58.Bettencourt LMA, Lobo J, Strumsky D, West GB (2010) Urban scaling and its deviations: Revealing the structure of wealth, innovation and crime across cities. PloS one 5. [DOI] [PMC free article] [PubMed]
  • 59. Bettencourt LMA (2013) The Origins of Scaling in Cities. Science 340: 1438–1441.
  • 60. Ramsden JJ, Kiss-Haypál G (2000) Company size distribution in different countries. Physica A: Statistical Mechanics and its Applications 277: 220–227.
  • 61. Zhou WX, Sornette D, Hill RA, Dunbar RIM (2005) Discrete hierarchical organization of social group sizes. Proceedings of the Royal Society B: Biological Sciences 272: 439–444.
  • 62. Toulouse G, Bok J (1978) Principe de moindre difficulté et structures hiérarchiques. Revue Française de Sociologie 19: 391.
  • 63. Smith A (1776) The Wealth of Nations. Bantam Classics. Available: http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0553585975. Accessed 2014 Jul 11.
  • 64. Ostrom E (1990) Governing the commons: the evolution of institutions for collective action. Cambridge University Press. Available: http://www.worldcat.org/isbn/0521405998. Accessed 2014 Jul 11.
  • 65. Wuchty S, Jones BF, Uzzi B (2007) The Increasing Dominance of Teams in Production of Knowledge. Science 316: 1036–1039.
  • 66. Gowers T, Nielsen M (2009) Massively collaborative mathematics. Nature 461: 879–881.
  • 67. Raasch C, Herstatt C, Balka K (2009) On the open design of tangible goods. R&D Management 39: 382–393.
  • 68. Benkler Y (2002) Coase's Penguin, or, Linux and “The Nature of the Firm”. The Yale Law Journal 112: 369.
  • 69. Coase RH (1937) The Nature of the Firm. Economica 4: 386–405.
  • 70. Ingham AG, Levinger G, Graves J, Peckham V (1974) The Ringelmann effect: Studies of group size and group performance. Journal of Experimental Social Psychology 10: 371–384.
  • 71. Gordon K (1924) Group Judgments in the Field of Lifted Weights. Journal of Experimental Psychology 7: 398.
  • 72. Laughlin PR, Johnson HH (1966) Group and individual performance on a complementary task as a function of initial ability level. Journal of Experimental Social Psychology 2: 407–414.
  • 73. Halfaker A, Geiger RS, Morgan JT, Riedl J (2013) The Rise and Decline of an Open Collaboration System. American Behavioral Scientist 57: 664–688.

Supplementary Materials

Table S1

Table (comma-separated file) containing the summary statistics Inline graphic, Inline graphic, and Inline graphic for each project analyzed in this study.

(CSV)

Archive S1

Compressed archive of Python NumPy arrays containing, for each open source software project analyzed in this study, the time series of all commits, including the timestamp, the user, and the file modified.

(ZIP)
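A commit time series of the kind stored in Archive S1 can be related to the superlinear scaling described in the abstract by counting, in each time window, the commits and the distinct active developers, and then fitting a power law between the two. The following is a minimal sketch under stated assumptions: it supposes that timestamps and user identifiers are available as two parallel one-dimensional arrays (the actual layout of the arrays in the archive is not described here), the helper names `window_counts` and `fit_scaling_exponent` and the window length are hypothetical, and the exponent is estimated by ordinary least squares in log-log space, which is not necessarily the estimation procedure used by the authors. The toy data are synthetic and are not taken from the archive.

```python
import numpy as np


def window_counts(timestamps, users, window):
    """Count commits and distinct active developers in consecutive windows.

    Assumption (hypothetical layout): `timestamps` (numeric) and `users`
    (any identifiers) are parallel 1-D arrays with one entry per commit,
    and `window` is the window length in the same units as `timestamps`.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    users = np.asarray(users)
    # Window index of each commit, counted from the earliest commit.
    bins = ((timestamps - timestamps.min()) // window).astype(int)
    n_windows = bins.max() + 1
    # Number of commits per window (windows with no commits get 0).
    commits = np.bincount(bins, minlength=n_windows)
    # Number of distinct users who committed in each window.
    developers = np.array(
        [np.unique(users[bins == k]).size for k in range(n_windows)]
    )
    return commits, developers


def fit_scaling_exponent(commits, developers):
    """Slope of an ordinary least-squares fit of log(commits) on log(developers)."""
    commits = np.asarray(commits, dtype=float)
    developers = np.asarray(developers, dtype=float)
    # Windows without activity cannot enter a log-log fit.
    mask = (commits > 0) & (developers > 0)
    slope, _intercept = np.polyfit(
        np.log(developers[mask]), np.log(commits[mask]), 1
    )
    return slope


# Synthetic toy data: windows of length 10 in which 1, 2 and 3 developers
# produce 1, 4 and 9 commits, i.e. commits = developers**2.
ts = [0] + list(range(10, 14)) + list(range(20, 29))
us = ["a"] + ["a", "b"] * 2 + ["a", "b", "c"] * 3
c, d = window_counts(ts, us, window=10)
beta = fit_scaling_exponent(c, d)
print(c, d, beta)  # beta equals 2 up to floating-point rounding
```

Windows with no commits are excluded from the fit because the logarithm is undefined there; the choice of window length is a modelling assumption that this sketch leaves to the user.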

Data Availability Statement

The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper and its Supporting Information files.


Articles from PLoS ONE are provided here courtesy of PLOS