Proceedings of the National Academy of Sciences of the United States of America. 2015 Aug 10;112(34):E4671–E4680. doi: 10.1073/pnas.1501444112

Quantifying the impact of weak, strong, and super ties in scientific careers

Alexander Michael Petersen
PMCID: PMC4553788  PMID: 26261301

Significance

A scientist will encounter many potential collaborators throughout his/her career. As such, the choice to start or terminate a collaboration can be an important strategic consideration with long-term implications. While previous studies have focused primarily on aggregate cross-sectional collaboration patterns, here we analyze the collaboration network from a researcher’s local perspective along his/her career. Our longitudinal approach reveals that scientific collaboration is characterized by a high turnover rate juxtaposed with surprisingly frequent “life partners.” We show that these extremely strong collaborations have a significant positive impact on productivity and citations—the apostle effect—representing the advantage of “super” social ties characterized by trust, conviction, and commitment.

Keywords: computational social science, cooperation, team science, career evaluation, bibliometrics

Abstract

Scientists are frequently faced with the important decision to start or terminate a creative partnership. This process can be influenced by strategic motivations, as early career researchers are pursuers, whereas senior researchers are typically attractors, of new collaborative opportunities. Focusing on the longitudinal aspects of scientific collaboration, we analyzed 473 collaboration profiles using an egocentric perspective that accounts for researcher-specific characteristics and provides insight into a range of topics, from career achievement and sustainability to team dynamics and efficiency. From more than 166,000 collaboration records, we quantify the frequency distributions of collaboration duration and tie strength, showing that collaboration networks are dominated by weak ties characterized by high turnover rates. We use analytic extreme value thresholds to identify a new class of indispensable super ties, the strongest of which commonly exhibit >50% publication overlap with the central scientist. The prevalence of super ties suggests that they arise from career strategies based upon cost, risk, and reward sharing and complementary skill matching. We then use a combination of descriptive and panel regression methods to compare the subset of publications coauthored with a super tie to the subset without one, controlling for pertinent features such as career age, prestige, team size, and prior group experience. We find that super ties contribute to above-average productivity and a 17% citation increase per publication, thus identifying these partnerships—the analog of life partners—as a major factor in science career development.


Science operates at multiple scales, ranging from the global and institutional scale down to the level of groups and individuals (1). Integrating this system are multiscale social networks that are ripe with structural, social, economic, and behavioral complexity (2). A subset of this multiplex is the scientific collaboration network, which forms the structural foundation for social capital investment, knowledge diffusion, reputation signaling, and important mentoring relations (3–8).

Here we focus on collaborative endeavors that result in scientific publication, a process that draws on various aspects of social ties, e.g., colocation, disciplinary identity, competition, mentoring, and knowledge flow (9). The dichotomy between strong and weak ties is a longstanding point of research (10). However, in “science of science” research, most studies have analyzed macroscopic collaboration networks aggregated across time, discipline, and individuals (11–21). Hence, despite these significant efforts, we know little about how properties of the local social network affect scientists’ strategic career decisions. For example, how might creative opportunities in the local collaboration network impact a researcher’s decision to explore new avenues versus exploiting old partnerships, and what may be the career tradeoffs in the short versus the long term, especially considering that academia is driven by dynamic knowledge frontiers (22, 23)?

Against this background, we develop a quantitative approach for improving our understanding of the role of weak and strong ties, while also uncovering a third classification, the super tie, which we find to occur rather frequently. We analyzed longitudinal career data for researchers from cell biology and physics, together comprising a set of 473 researcher profiles spanning more than 15,000 career years, 94,000 publications, and 166,000 collaborators. To account for prestige effects, we define two groups within each discipline set, facilitating a comparison of top-cited scientists with scientists who are more representative of the entire researcher population (henceforth referred to as “Other”). From the $N_i$ publication records spanning the first $T_i$ career years of each central scientist $i$, we constructed longitudinal representations of each scientist’s coauthorship history.

We adopt an egocentric perspective to track research careers from their inception along their longitudinal growth trajectory. By using a local perspective, we control for the heterogeneity in collaboration patterns that exists both between and within disciplines. We also control for other career-specific collaboration and productivity differences that would otherwise be averaged out by aggregate cross-sectional methods. Thus, by simultaneously leveraging multiple features of the data—resolved over the dimensions of time, individuals, productivity, and citation impact—our analysis contributes to the literature on science careers as well as team activities characterized by dynamic entry and exit of human, social, and creative capital. Given that collaborations in business, industry, and academia are increasingly operationalized via team structures, our findings provide relevant quantitative insights into the mechanisms of team formation (15), efficiency (24), and performance (25, 26).

Our study is organized as follows. The longitudinal nature of a career requires that we start by quantifying the tie between two collaborators from two different perspectives: duration and strength. First we analyze the collaboration duration, $L_{ij}$, defined as the time period between the first and last publication between two researchers $i$ and $j$. Our results indicate that the “invisible college” defined by collaborative research activities (i.e., excluding informal communication channels and arm’s-length associations) is surprisingly dominated by high-frequency interactions lasting only a few years. We then focus our analysis on the collaborative tie strength, $K_{ij}$, defined as the cumulative number of publications coauthored by $i$ and $j$ during the $L_{ij}$ years of activity.

From the entire set of collaborators, we then identify a subset of super tie coauthors: those $j$ with $K_{ij}$ values that are statistically unlikely according to an author-specific extreme value criterion. Because almost all of the researchers we analyzed have more than one super tie, and roughly half of the publications we analyzed include at least one super tie coauthor, we were able to quantify the added value of super ties, for both productivity and citation impact, in two ways: (i) using descriptive measures and (ii) implementing a fixed-effects regression model. Controlling for author-specific features, we find that super ties are associated with increased publication rates and increased citation rates.

We term this finding the “apostle effect,” signifying the dividends generated by extremely strong social ties based upon mutual trust, conviction, and commitment. This term borrows from biblical context, where an apostle represents a distinguished partner selected according to his/her noteworthy attributes from among a large pool of candidates. What we do not connote is any particular power relation (hierarchy) between i and the super tie coauthors, which is beyond the scope of this study. Also, because the perspective is centered around i, our super tie definition is not symmetric, i.e., if j is a super tie of i, i is not necessarily a super tie of j.

Because super ties have significant long-term impact on productivity and citations, our results are important from a career development perspective, reflecting the strategic benefits of cost, risk, and reward sharing via long-term partnership. The implications of research partnerships will become increasingly relevant as more careers become inextricably embedded in team science environments, wherein it can be difficult to identify contributions, signal achievement, and distribute credit. The credit distribution problem has received recent attention from the perspectives of institutional policy (8), team ethics (7), and practical implementation (27–29).

Methods

Our study implements an ego network perspective, centered around each researcher career $i$, with weighted links connecting the central scientist to the peripheral nodes representing his/her collaborators (indexed by $j$). We constructed each ego network using longitudinal publication data from Thomson Reuters Web of Knowledge (TRWOK), comprising 193 biology and 280 physics careers in total. Each career profile is constructed by aggregating the publication, citation, and collaboration metadata over the first $t = 1 \ldots T_i$ years of his/her career. We downloaded the TRWOK data in calendar year $Y_i$, which is the citation count census year. Each disciplinary set includes a subset of 100 highly cited scientists (hereafter referred to as “Top”), selected using a ranking of the top-cited researchers in the high-impact journals Physical Review Letters and Cell. The rest of the researcher profiles (Other) are aggregated across physics and cell biology, with subsets that are specifically active in the domains of graphene, neuroscience, molecular biology, and genomics. The Other dataset only includes $i$ with at least as many publications as the smallest $N_i$ among the top-cited researchers: As such, $N_i \geq 52$ for biology and $N_i \geq 46$ for physics. This facilitates a reasonable comparison between Top and Other, possibly identifying differences attributable to innate success factors. See SI Text for further details on the data methods and selection.

This longitudinal approach leverages author-specific factors, revealing how career paths are affected by idiosyncratic events. To motivate this point, Fig. 1 illustrates the career trajectory of A. Geim, cowinner of the 2010 Nobel Prize in Physics. This schematic highlights three fundamental dimensions of collaboration ties (duration, strength, and impact): (i) each horizontal line indicates the collaboration of length $L_{ij} \equiv t_{ij}^{f} - t_{ij}^{0} + 1$ between $i$ and coauthor $j$, beginning with their first joint publication in year $t_{ij}^{0}$ and ending with their last observed joint publication in year $t_{ij}^{f}$; (ii) the circle color indicates the total number of joint publications, $K_{ij}$, representing our quantitative measure of tie strength; and (iii) the circle size indicates the net citations $C_{ij} = \sum_{p} c_{j,p}$ in $Y_i$, summed over the citations $c_{j,p}$ of all publications $p$ that include $i$ and $j$.
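The three tie measures defined above can be sketched in a few lines of Python. This is only an illustration, not the author's pipeline: the record layout (a list of dicts with `year`, `citations`, and `coauthors` keys) and the function name `tie_measures` are assumptions made for the example.

```python
from collections import defaultdict


def tie_measures(publications):
    """Return {coauthor j: {"L": L_ij, "K": K_ij, "C": C_ij}} for one ego i.

    `publications` is a hypothetical list of dicts, one per publication of
    the central scientist, with keys 'year', 'citations', and 'coauthors'.
    """
    first_year, last_year = {}, {}
    strength = defaultdict(int)    # K_ij: number of joint publications
    citations = defaultdict(int)   # C_ij: citations summed over joint publications
    for pub in publications:
        for j in set(pub["coauthors"]):  # count each coauthor once per paper
            first_year[j] = min(first_year.get(j, pub["year"]), pub["year"])
            last_year[j] = max(last_year.get(j, pub["year"]), pub["year"])
            strength[j] += 1
            citations[j] += pub["citations"]
    return {
        j: {
            "L": last_year[j] - first_year[j] + 1,  # duration in years, inclusive
            "K": strength[j],
            "C": citations[j],
        }
        for j in strength
    }


example = [
    {"year": 2000, "citations": 10, "coauthors": ["A", "B"]},
    {"year": 2004, "citations": 5, "coauthors": ["A"]},
]
measures = tie_measures(example)
```

In this toy input, coauthor "A" appears in 2000 and 2004, so the duration is counted inclusively as five years, matching the "+1" in the definition of $L_{ij}$.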

Fig. 1.

Visualizing the embedding of academic careers in dynamic social networks. A career schematic showing A. Geim’s collaborations, ordered by entry year. Notable career events include the first publication in 2000 with K. S. Novoselov (cowinner of the 2010 Nobel Prize in Physics) and their first graphene publication in 2004. An interesting network reorganization accompanies Geim’s institutional move from Radboud University Nijmegen (The Netherlands) to University of Manchester (United Kingdom) in 2001. Moreover, the rapid accumulation of coauthors following the 2004 graphene discovery signals the new opportunities that accompany reputation growth.

This method of representing a science career, as illustrated in Figs. S1–S3, highlights the variability in collaboration strengths, both between and within career profiles. It is also worth mentioning that because multiple $j$ may contribute to the same $p$, it is possible for coauthor measures to covary. However, for the remainder of the analysis, we focus on the dyadic relations between only $i$ and $j$, leaving the triadic and higher-order team structures as an avenue for future work. For example, it would be interesting to know the likelihood of triadic closure between any two super ties of $i$, signaling coordinated cooperation; or, contrariwise, low triadic closure rates may indicate hierarchical organization around $i$.

Fig. S1.

Complex relations between productivity, collaboration, and impact. A–D are for A. K. Geim, who is characterized by an average collaboration duration of 2.1 y (calculated including the collaborations with $L_{ij}=1$ but excluding the collaborations active in the last 2 y), a characteristic tie strength $\langle K_i \rangle = 3.7$ publications, a collaboration radius of $S_i = 303$ coauthors, and $N_i(2012) = 217$ total publications; E and F are for D. Acemoglu, who is characterized by an average collaboration duration of 1.6 y (also calculated including the collaborations with $L_{ij}=1$ but excluding the collaborations active in the last 2 y), $\langle K_i \rangle = 2.9$ publications, $S_i = 51$ coauthors, and $N_i(2012) = 118$ publications. These schematics demonstrate how the visualization of the dynamic ego network changes if we use publication and citation measures that are normalized by $L_{ij}$, resulting in per-year-of-collaboration (intensity) measures. (A) Collaboration measures calculated per unit time, for comparison with Fig. 1. (B–D) Scatter plots for the profile of A. K. Geim relating collaboration duration ($L_{ij}$) with (B) collaboration strength ($K_{ij}$), (C) citations ($C_{ij}$), and (D) pairwise team size ($W_{ij}$). $W_{ij}$ is the total number of coauthors (nondistinct) on publications including $i$ and $j$, a proxy for pairwise collaborative input, conditioned on $i$ and $j$. The dashed line in each panel represents the ordinary least-squares fit of the log of the variables. As such, the logarithmic slope (scaling exponent) is listed in each panel, and the value in parentheses represents the SE in the last digit reported. (E and F) Economics is a field not traditionally considered to be collaborative at the rates of physics or biology. Nevertheless, prestige and collaboration life cycles are still important factors, independent of discipline. To demonstrate this, we show the career profile of the highly cited economist Daron Acemoglu. Notable landmark achievements are indicated, including the early partnership with James A. Robinson in 2000, and their groundbreaking book, Economic Origins of Dictatorship and Democracy, published in 2005 (48). (E) Net collaboration measures for D. Acemoglu, analogous to Fig. 1. (F) Collaboration measures calculated per unit time, analogous to A.

Fig. S3.

Collaboration life cycle for the (A and C) Other biology and the (B and D) Other physics datasets. (A and B) Average collaboration strength, normalized to peak value, measured $\tau$ years after the initiation of the collaboration tie. (Insets) On log-linear axes, the decay appears as linear, corresponding to an exponential form. (C and D) For each $\{x\}$ group, we show the average and SD (error bar) of $\tau_{1/2}$; we use logarithmically spaced $\{x\}$ groups that correspond by color to the same $\{x\}$ as in A and B. The $\zeta$ value quantifies the scaling of $\tau_{1/2}$ as a function of the normalized coauthor strength $x \equiv K_{ij}/\langle K_i \rangle$. The sublinear ($\zeta < 1$) values indicate that collaborations are distributed over a timescale that grows slower than proportional to $x$; conversely, this means that longer collaborations are more productive, being characterized by increasing marginal returns ($1/\zeta > 1$). Fig. 3 shows the analogous plot for the Top physics and biology datasets; all four datasets exhibit similar features.

Results

Quantifying the Collaboration Lifetime Distribution.

We use $L_{ij}$ to measure the duration of the productive interaction between $i$ and $j$. Across researcher profiles, we find that a remarkable 60−80% of the collaborations have $L_{ij} = 1$ y (see Fig. S4). Considering the overwhelming dominance of the $L_{ij} = 1$ events, in this subsection, we concentrate our analysis on the subset of repeat collaborations with $L_{ij} > 1$ that produced two or more publications. Furthermore, due to censoring bias, $L_{ij}$ values estimated for $j$ who are active around the final career year of the data ($T_i$) may be biased toward small values. To account for this bias, in this subsection, we also exclude those collaborations that were active within the final $L_i^c$-year period, defining $L_i^c$ as an initial average $L_{ij}$ value calculated across all $j$ for each $i$. Then, we calculate a second representative mean value, $\langle L_i \rangle$, which is calculated excluding the $j$ with $L_{ij} = 1$ and the $j$ active in the final $L_i^c$-year period. Fig. 2A shows the probability distribution $P(\langle L_i \rangle)$, with mean values ranging from 4 y to 6 y, consistent with the typical duration of an early career position (e.g., PhD or postdoctoral fellow, assistant professor).
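As a rough illustration of the two-step averaging described above, the following Python sketch computes the initial cutoff $L_i^c$ and then the censoring-adjusted mean $\langle L_i \rangle$. The input layout (a list of first and last joint publication years) and the exact reading of "active within the final period" (last joint publication later than the final year minus $L_i^c$) are assumptions made for the example, not details confirmed by the text.

```python
def mean_duration(ties, final_year):
    """Return (L_c, mean_L) for one researcher.

    `ties` is a hypothetical list of (first_year, last_year) pairs, one per
    coauthor; `final_year` is the last career year covered by the data.
    """
    durations = [last - first + 1 for first, last in ties]
    # Step 1: initial average over all coauthors, used as the censoring window.
    l_c = sum(durations) / len(durations)
    # Step 2: drop single-year ties and ties still active in the final window
    # (assumed here to mean: last joint publication after final_year - l_c).
    kept = [
        last - first + 1
        for first, last in ties
        if last - first + 1 > 1 and last <= final_year - l_c
    ]
    mean_l = sum(kept) / len(kept) if kept else float("nan")
    return l_c, mean_l


ties = [(2000, 2000), (2000, 2004), (2001, 2003), (2009, 2010)]
l_c, mean_l = mean_duration(ties, final_year=2010)
```

In this toy example the single-year tie and the tie ending in 2010 are both excluded, so the adjusted mean is taken over the two remaining collaborations only.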

Fig. S4.

Additional collaboration profile measures. (A) Cumulative distribution of the number of super ties $S_{R,i}$. The mean (vertical lines) and SD are 18±13 (Top biology), 16±13 (Other biology), 7.3±4.8 (Top physics), and 6.8±5.1 (Other physics). The K-S test P value calculated by comparing the biology distributions is 0.12, and, for the physics distributions, it is 0.34; in both cases, the null hypothesis that the two compared datasets arise from the same distribution is not rejected at the 5% level. (B) Cumulative distribution of the empirical (unnormalized) durations $L_{ij}$ (years). The $L_{ij} = 1$ values dominate the distribution, with $P(L_{ij} = 1)$ = 0.73 (Top biology), 0.78 (Other biology), 0.61 (Top physics), and 0.58 (Other physics). Thus, including the $L_{ij} = 1$ values, the mean $L_{ij}$ are 2.2 y (Top biology), 1.8 y (Other biology), 2.7 y (Top physics), and 2.7 y (Other physics). To avoid age cohort bias, collaborations commenced in the final $L_i^c$ period of each career profile are excluded from these distributions. (C) Cumulative distribution of the productivity premium $p_N$ defined in Eq. S1. The mean and SD are 7.6±4.4 (Top biology), 8.4±3.6 (Other biology), 8.9±4.8 (Top physics), and 9.8±4.5 (Other physics). Only the two physics datasets are significantly similar (K-S p=0.35). (D) Cumulative distribution of the citation premium $p_C$ defined in Eq. S5. The mean and SD are 12±10 (Top biology), 13±7 (Other biology), 15±16 (Top physics), and 16±14 (Other physics). The K-S test P values calculated by comparing the two Top datasets and the two Other datasets are both greater than 0.05. An interesting and consistent pattern emerges when considering the distributions of both $p_N$ and $p_C$: The Top scientist profiles have smaller mean values than their counterparts, and the biology profiles have smaller mean values than the physics profiles. The mean, median, and maximum values across all datasets are 14.1, 11.3, and 134, respectively, with all but two values greater than unity. Because the maximum value is an extreme outlier, we truncate the x axes, showing only values of <38, which represents more than 95% of the data.

Fig. 2.

Log-logistic distribution of collaboration duration. (A) The probability distribution $P(\Delta)$ is right-skewed and well fit by the log-logistic pdf defined in Eq. 1. (Insets) The probability distribution $P(\langle L_i \rangle)$ shows that the characteristic collaboration length in physics and biology is typically between 2 y and 6 y. (B) The decrease in the typical collaboration timescale, $\langle \Delta | t \rangle$, reflects how careers transition from being pursuers of collaboration opportunities to attractors of collaboration opportunities.

Establishing statistical regularities across research profiles requires the use of a normalized duration measure, $\Delta_{ij} \equiv L_{ij}/\langle L_i \rangle$, which controls for author-specific collaboration patterns by measuring time in units of $\langle L_i \rangle$. The empirical distributions are right-skewed, with approximately 63% of the data with $L_{ij} < \langle L_i \rangle$ (corresponding to $\Delta_{ij} < 1$). Nevertheless, ∼1% of collaborations last longer than $4\langle L_i \rangle \approx 15$−20 y. Moreover, Fig. 2A shows that the log-logistic probability density function (pdf),

$$P(\Delta) = \frac{(b/a)\,(\Delta/a)^{b-1}}{\left(1 + (\Delta/a)^{b}\right)^{2}},$$ [1]

provides a good fit to the empirical data over the entire range of $\Delta_{ij}$. The log-logistic (Fisk) pdf is a well-known survival analysis distribution with property $\mathrm{Median}(\Delta) = a$. By construction, the mean value $\langle \Delta \rangle \equiv 1$, which reduces our parameter space to just $b$ as $a = \sin(\pi/b)/(\pi/b)$. For each dataset, we calculate $b \approx 2.6$, estimating the parameter using ordinary least squares. Associated with each $P(\Delta)$ is a hazard function representing the likelihood that a collaboration terminates for a given $\Delta_{ij}$. Because $b > 1$, the hazard function is unimodal, with a maximum value occurring at $\Delta_c = a(b-1)^{1/b}$ with bounds $\Delta_c > a$ for $b > 2$ and $\Delta_c > 1$ for $b > 2.83...$; using the best-fit $a$ and $b$ values, we estimate $\Delta_c \approx 0.94$ (Top biology), 1.11 (Other biology), 0.77 (Top physics), and 1.08 (Other physics). Thus, $\Delta_c$ represents a tipping point in the sustainability of a collaboration, because the likelihood that a collaboration terminates peaks at $\Delta_c$ and then decreases monotonically for $\Delta_{ij} > \Delta_c$. This observation lends further significance to the author-specific time scale $\langle L_i \rangle$. The log-logistic pdf is also characterized by asymptotic power-law behavior $P(\Delta) \sim \Delta^{-(b+1)}$ for large $\Delta_{ij}$.
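The relations among the log-logistic parameters can be checked numerically with a short Python sketch. The function names are illustrative; the hazard expression is the standard one for this distribution, i.e., the pdf divided by the survival function $1/(1 + (\Delta/a)^b)$.

```python
import math


def scale_from_shape(b):
    """Scale a that makes the log-logistic mean equal to 1 (requires b > 1)."""
    return math.sin(math.pi / b) / (math.pi / b)


def loglogistic_pdf(delta, a, b):
    """Log-logistic (Fisk) probability density, as in Eq. 1."""
    z = (delta / a) ** b
    return (b / a) * (delta / a) ** (b - 1) / (1 + z) ** 2


def hazard(delta, a, b):
    """Hazard rate: pdf divided by the survival function 1 / (1 + z)."""
    z = (delta / a) ** b
    return (b / a) * (delta / a) ** (b - 1) / (1 + z)


def hazard_peak(a, b):
    """Location of the hazard maximum, a * (b - 1)**(1/b), valid for b > 1."""
    return a * (b - 1) ** (1 / b)


b = 2.6                    # shape value of the order reported in the text
a = scale_from_shape(b)    # scale implied by the unit-mean constraint
delta_c = hazard_peak(a, b)
```

With the shape fixed near 2.6, the implied scale is below 1 and the hazard peak sits slightly below the mean duration, in line with the tipping-point values quoted in the text.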

To determine how the $\Delta_{ij}$ values are distributed across the career, we calculated the mean duration $\langle \Delta | t \rangle$ using a 5-y (sliding window) moving average centered around career age $t$. If the $\Delta_{ij}$ values were distributed independent of $t$, then $\langle \Delta | t \rangle \approx 1$. Instead, Fig. 2B shows a negative trend for each dataset. Interestingly, the $\langle \Delta | t \rangle$ values are consistently larger for the Top scientists, indicating that the relatively short $L_{ij}$ are more concentrated at larger $t$. This pattern of increasing access to short-term collaboration opportunities points to an additional positive feedback mechanism contributing to cumulative advantage (30, 31).

Quantifying the Collaboration Life Cycle.

The $P(\Delta)$ distribution points to the variability of time scales in the scientific collaboration network: although a small number of collaborations last a lifetime, the remainder decay quite quickly in a collaboration environment characterized by a remarkably high churn rate. Because it is possible that a relatively long $L_{ij}$ corresponds to just the minimum two publications, it is also important to analyze the collaboration rate. To this end, we quantify the patterns of growth and decay in tie strength using the more than 166,000 dyadic ($ij$) collaboration records: $K_{ij}(t)$ is the cumulative number of coauthored publications between $i$ and $j$ up to year $t$, and $\Delta K_{ij}(t) = K_{ij}(t) - K_{ij}(t-1)$ is the annual publication rate.

To define a collaboration trajectory that is better suited for averaging, we normalize each individual $\Delta K_{ij}(\tau)$ by its peak value,

$$\Delta K'_{ij}(\tau) \equiv \Delta K_{ij}(\tau)/\mathrm{Max}[\Delta K_{ij}(\tau)].$$ [2]

Here $\tau \equiv \tau_{ij} = t - t_{ij}^{0} + 1$ is the number of years since the initiation of a given collaboration. This normalization procedure is useful for comparing and averaging time series that are characterized by just a single peak.

Expecting that the collaboration trajectories depend on the tie strength, we grouped the individual $\Delta K'_{ij}(\tau)$ according to the normalized coauthor strength, $x_{ij} \equiv K_{ij}/\langle K_i \rangle$. The normalization factor $\langle K_i \rangle = S_i^{-1}\sum_{j=1}^{S_i} K_{ij}$ is calculated across the $S_i$ distinct collaborators (the collaboration radius of $i$), and represents an intrinsic collaboration scale that grows in proportion to both an author’s typical collaboration size and his/her publication rate. We then aggregated the $N_{\{x\}}$ trajectories in each $\{x\}$ group and calculated the average trajectory,

$$\langle \Delta K'_{ij}(\tau|x) \rangle \equiv N_{\{x\}}^{-1} \sum_{\{x\}} \Delta K'_{ij}(\tau|x).$$ [3]
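The peak normalization and group averaging can be sketched as follows. This is an illustrative Python example, not the author's code: trajectories are assumed to be lists of annual publication counts starting at the first year of the collaboration, and shorter trajectories are padded with zeros (an assumption: a tie with no further joint publications contributes a rate of 0 in later years).

```python
def normalize_by_peak(annual_rates):
    """Divide one annual-rate trajectory by its maximum value (Eq. 2 style)."""
    peak = max(annual_rates)
    return [rate / peak for rate in annual_rates]


def average_trajectory(trajectories):
    """Average the peak-normalized trajectories of one {x} group, year by year.

    Trajectories shorter than the longest one are treated as 0 afterwards,
    which is an assumption made for this sketch.
    """
    horizon = max(len(t) for t in trajectories)
    normalized = [normalize_by_peak(t) for t in trajectories]
    return [
        sum(t[k] if k < len(t) else 0.0 for t in normalized) / len(normalized)
        for k in range(horizon)
    ]


# Two hypothetical ties from the same {x} group: annual joint-publication counts.
group = [[1, 2, 1], [4, 2]]
avg = average_trajectory(group)
```

Normalizing before averaging keeps a single very productive tie from dominating the group average, which is the motivation given in the text for dividing by the peak value.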

Indeed, Fig. 3 shows that the collaboration life cycle, i.e., the average peak-normalized trajectory $\langle \Delta K'_{ij}(\tau|x) \rangle$, depends strongly on the relative tie strength $x_{ij} \equiv K_{ij}/\langle K_i \rangle$. The trajectories with $x_{ij} > 12.0$ decay over a relatively long time scale, maintaining a value approximately $0.2\,\mathrm{Max}[\Delta K_{ij}(\tau)]$ even 20 y after initiation, reminiscent of a “research life partner.” The trajectories with $x_{ij} \in [0.9, 1.4]$ represent common collaborations that decay exponentially over the characteristic time scale $\langle L_i \rangle$. A mathematical side note, useful as a modeling benchmark, is the linear decay when plotted on log-linear axes, suggesting a functional form that is exponential for large $\tau$, $\langle \Delta K'_{ij}(\tau|x) \rangle \sim \exp[-\tau/\bar{\tau}]$.

Fig. 3.

Growth and decay of collaboration ties for (A and C) Top biology and (B and D) Top physics. (A and B) Average collaboration intensity, normalized to peak value, measured $\tau_{ij}$ years after the initiation of the collaboration tie. (Insets) On log-linear axes, the decay appears as linear, corresponding to an exponential form. (C and D) For each $\{x\}$ group, we show the average and SD (error bar) of $\tau_{1/2}$; we use logarithmically spaced $\{x\}$ groups that correspond by color to the same $\{x\}$ as in A and B. The $\zeta$ value quantifies the scaling of $\tau_{1/2}$ as a function of the normalized coauthor strength $x_{ij} \equiv K_{ij}/\langle K_i \rangle$. The sublinear ($\zeta < 1$) values indicate that collaborations are distributed over a timescale that grows slower than proportional to $x$; conversely, this means that longer collaborations are relatively more productive, being characterized by increasing marginal returns ($1/\zeta > 1$). Fig. S3 shows the analogous plot for the Other physics and biology datasets; all four datasets exhibit similar features.

We further emphasize the ramifications of the life cycle variation by quantifying the relation between $x_{ij}$ and the collaboration’s half-life $\tau_{1/2}$, defined as the number of years to reach half of the total collaborative output according to the relation $K_{ij}(t = \tau_{1/2}) = K_{ij}/2$. We observe a scaling relation for the average half-life, $\langle \tau_{1/2} \rangle \sim x^{\zeta}$, with $\zeta$ values ranging from 0.4 to 0.5. Sublinear values ($\zeta < 1$) indicate that a collaboration with twice the strength is likely to have a corresponding $\tau_{1/2}$ that is less than doubled. This feature captures the burstiness of collaborative activities, which likely arises from the heterogeneous overlapping of multiple timescales, e.g., the variable contract lengths in science ranging from single-year contracts to lifetime tenure, the overlapping of multiple age cohorts, and the projects and grants themselves, which are typically characterized by relatively short terms. Nevertheless, $dx/d\tau_{1/2} \sim \tau_{1/2}^{(1-\zeta)/\zeta}$ is an increasing function for $\zeta < 1$, indicating increasing marginal returns with increasing $\tau_{1/2}$, further signaling the productivity benefits of long-term collaborations characterized by formalized roles, mutual trust, experience, and group learning, which together can facilitate efficient interactions.
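A small Python sketch can make the half-life and its scaling exponent concrete. Two assumptions are made here and are not taken from the source: with annual data the half-life is discretized as the first year in which the cumulative count reaches at least half of the final $K_{ij}$, and the exponent $\zeta$ is estimated by an ordinary least-squares fit in log-log space.

```python
import math


def half_life(annual_counts):
    """First year tau (counting from 1) at which the cumulative number of
    joint publications reaches at least half of the final total K_ij."""
    total = sum(annual_counts)
    running = 0
    for tau, count in enumerate(annual_counts, start=1):
        running += count
        if 2 * running >= total:
            return tau
    return None  # only reached for an empty or all-zero trajectory


def scaling_exponent(xs, half_lives):
    """OLS slope of log(half-life) against log(x), an estimate of zeta."""
    log_x = [math.log(x) for x in xs]
    log_t = [math.log(t) for t in half_lives]
    mean_x = sum(log_x) / len(log_x)
    mean_t = sum(log_t) / len(log_t)
    cov = sum((u - mean_x) * (v - mean_t) for u, v in zip(log_x, log_t))
    var = sum((u - mean_x) ** 2 for u in log_x)
    return cov / var


tau_half = half_life([1, 3, 2, 2])  # cumulative counts 1, 4, 6, 8 out of 8
# Synthetic check: half-lives generated exactly as x**0.5 give a slope of 0.5.
zeta = scaling_exponent([1, 2, 4, 8], [1, 2 ** 0.5, 2, 8 ** 0.5])
```

The synthetic data in the last line are constructed to follow the power law exactly, so the fit only demonstrates the estimator and says nothing about the empirical values.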

Quantifying the Tie Strength Distribution.

Here we focus on the cross-sectional distribution of tie strengths within the ego network. We use the final tie strength value $K_{ij}$ to distinguish the strong ties ($K_{ij} \geq \langle K_i \rangle$) from the weak ties ($K_{ij} < \langle K_i \rangle$). Fig. 4A shows the cumulative distribution $P(\langle K_i \rangle)$ of the mean tie strength $\langle K_i \rangle$, which can vary over a wide range depending on a researcher’s involvement in large-team science activities. We also quantify the concentration of tie strength using the Gini index $G_i$ calculated from each researcher’s $K_{ij}$ values; the distribution $P(G_i)$ is shown in Fig. 4B. Together, these two measures capture the variability in collaboration strengths across and within disciplines, with physics exhibiting larger $\langle K_i \rangle$ and $G_i$ values.
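The weak/strong split and the Gini index can be illustrated with the Python sketch below. The text does not state which Gini estimator was used; the rank-based sample formula shown here is one common choice and should be read as an assumption, as should the function names.

```python
def split_weak_strong(strengths):
    """Partition tie strengths K_ij around the researcher's mean tie strength."""
    mean_k = sum(strengths) / len(strengths)
    strong = [k for k in strengths if k >= mean_k]
    weak = [k for k in strengths if k < mean_k]
    return weak, strong


def gini(values):
    """Rank-based sample Gini index: 0 for perfect equality, approaching 1
    when a single value carries almost all of the total."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    rank_weighted = sum((rank + 1) * v for rank, v in enumerate(ordered))
    return 2 * rank_weighted / (n * total) - (n + 1) / n


k_values = [1, 1, 1, 9]  # hypothetical K_ij values for one researcher
weak, strong = split_weak_strong(k_values)
g = gini(k_values)
```

A researcher whose publications are spread evenly over coauthors would score near 0, whereas one dominant partnership among many one-off coauthors pushes the index upward.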

Fig. 4.

Characteristic measures of collaboration tie strength. (A) Cumulative distribution of the mean collaboration strength, $\langle K_i \rangle$. The K-S test indicates that the $P(\langle K_i \rangle)$ are similar for biology (p=0.031) and significantly different for physics (p=0.004). Vertical lines indicate median value. (B) Cumulative distribution of $G_i$. The pairwise K-S test indicates that the $P(G_i)$ are similar for biology (p=0.14) but not for physics (p=0.02). Vertical lines indicate the mean value, with physics indicating significantly higher $G_i$ than biology. In (C) biology and (D) physics, for each dataset, the cumulative distribution of normalized collaboration strength $x_{ij}$ shows excellent agreement with the exponential distribution $E(x) = \exp[-x]$ (gray line) over the bulk of the distribution, with the deviations in the tail regime representing less than 0.1% of the data.

Another important author-specific variable is the publication overlap between each researcher and his/her top collaborator. This measure is defined as the fraction of a researcher’s $N_i$ publications including his/her top collaborator, $f_{K,i} = \mathrm{Max}_j[K_{ij}]/N_i$. We observe surprisingly large variation in $f_{K,i}$, with mean and SD in the range of 0.16±0.14 for the Top scientists and 0.36±0.23 for the Other scientists. Across all profiles, the min and max $f_{K,i}$ values are 0.03 and 0.99, respectively, representing nearly the maximum possible variation in observed publication overlap. An example of this limiting scenario is shown in Fig. S2, highlighting the “dynamic duo” of J. L. Goldstein and M. S. Brown, winners of the 1985 Nobel Prize in Physiology or Medicine; Goldstein and Brown published more than 450 publications each, with roughly $100 \times f_{K,i} \approx 95\%$ coauthored together. Remarkably, we find that overlaps larger than 50% are not uncommon, observing $100 \times P(f_K \geq 0.5) \approx 9\%$ (biology) and $100 \times P(f_K \geq 0.5) \approx 20\%$ (physics) of $i$ having more than half of their publications with their strongest collaborator.

Fig. S2.

Visualizing the dynamic collaboration profile of individual researchers: the longitudinal coauthor trajectories of (A) Anderson, (B) Geim, (C) Blackburn, and (D) Goldstein; the cross-sectional rank-citation profiles of (E) Anderson, (F) Geim, (G) Blackburn, and (H) Goldstein. For each discipline, we show the collaboration profile of two Nobel laureates (A. K. Geim and J. L. Goldstein) whose top-cited research was done with their most intense collaborator, and two collaboration profiles for two Nobel laureates (P. W. Anderson and E. H. Blackburn) whose top-cited research did not exhibit this feature. Despite their common achievement, we observe a wide variation in the entry, strength, and saturation of their collaborations. To illustrate the variation in tie strength, both within and between researcher profiles, we show the rank−coauthor profile $K_{ij}(r,t)$, which is defined for any given $t$ by sorting the coauthors in decreasing order by rank $r$, $K_{ij}(r=1,t) \geq K_{ij}(2,t) \geq \ldots \geq K_{ij}(r=S_i,t)$. In this way, $K_{ij}(r,t)$ provides a cross-sectional representation of $K_{ij}(t)$. As such, snapshots of $K_{ij}(r,t)$ taken at different $t$ capture the temporal evolution of a researcher’s tie strength distribution, as illustrated by the gray data points in E–H. (A–D) Longitudinal growth of $K_{ij}(t)$, the cumulative number of publications with coauthor $j$ (colored curves), and the central author’s total number of publications $N_i(t)$ (black curve). To reduce graphical clutter, we truncate each $K_{ij}(t)$ at the year of the last observed collaboration; otherwise, each panel would be dominated by horizontal lines. The gray dashed line indicates $K_i^c$, which distinguishes the $K_{ij}(t)$ trajectories corresponding to super ties. The distance between the vertical yellow line and the right edge of each panel indicates the mean collaboration duration, $\langle L_i \rangle$, for each researcher. (E–H) To convey the dynamics of the rank-coauthor profile, we show snapshots of $K_{ij}(r,t)$ for $t$ = 5 y, 10 y, and 20 y (increasing gray dot size), in addition to the final $K_{ij}(r,t=T_i)$ (colored circles) calculated for the most recently available career year $T_i$. The lower dashed gray line indicates $\langle K_i \rangle$, which separates the weak from the strong ties. The upper dashed gray line indicates $K_i^c$, which distinguishes the $S_{R,i}$ super ties within the subset of strong ties. Recently, the analog of the h-index has been suggested as a way to measure the “author core” derived from the rank-coauthor distribution (49). For all panels, to facilitate visual comparison, the color scale used in the left and right column is the same for each $i$. To identify the coauthors with the highest net citation impact, we plot curves (circles) using thickness (radius) and color that are scaled proportional to $\log \tilde{C}_{i,j}$, which is the log of the total citation share of coauthor $j$ in profile $i$ (see Eq. S4).

However, within a researcher profile, it is likely that more than just the top collaborator was central to his/her career. Indeed, key to our investigation is the identification of the extremely strong collaborators (super ties) that are distinguished within the subset of strong ties. Hence, using the empirical information contained within each researcher’s tie strength distribution, P(Kij), we develop an objective super tie criterion that is author specific. First, to gain a better understanding of the statistical distribution of Kij, we aggregated the tie strength data across all research profiles, using the normalized collaboration strength xij. Fig. 4 C and D shows the cumulative distribution P(x) for each discipline. Each P(x) is in good agreement with the exponential distribution exp[−x] (with mean value ⟨x⟩ = 1 by construction), with the exception of the tail, P(x) ≲ 10^−3, which is home to extreme collaborator outliers. Thus, by a second means in addition to the result for Lij, we find that roughly 2/3 of the ties we analyzed are weak (i.e., the fraction of observations with xij < 1 is given by 1 − 1/e ≈ 0.63).
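As a quick check of the 2/3 figure, and assuming that x follows an exactly exponential density with unit mean (an idealization of the empirical distribution), the weak-tie fraction follows from a one-line integral:

```latex
\Pr(x_{ij} < 1) = \int_0^1 e^{-x}\,dx = 1 - e^{-1} \approx 0.632
```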

Based upon this empirical evidence, we use the discrete exponential distribution as our baseline model, $P(K_{ij}) \propto \exp(-\kappa_i K_{ij})$. We then use extreme statistics arguments to precisely define the author-specific super tie threshold $K_i^c$. The extreme statistic criterion posits that, out of the $S_i$ empirical observations, there should be just a single observation with $K_{ij} > K_i^c$. The threshold $K_i^c$ is operationalized by integrating the tail of $P(K_{ij})$ according to the equation $1/S_i = \sum_{K_{ij} > K_i^c} P(K_{ij}) = \exp(-\kappa_i K_i^c)$, with the analytic relation $\langle K_i \rangle = \sum_{K_{ij}=1}^{\infty} K_{ij} P(K_{ij}) = e^{\kappa_i}/(e^{\kappa_i} - 1) \approx 1 + 1/\kappa_i$ for small $\kappa_i$. In the relatively large $S_i$ limit, $K_i^c$ is given by the simple relation

$$K_i^c = (\langle K_i \rangle - 1)\,\ln S_i.$$ [4]

The advantage of this approach is that Kic is nonparametric, depending only on the observables ⟨Ki⟩ and Si. Thus, the super tie threshold is proportional to ⟨Ki⟩ − 1 (the −1 arises because the minimum Kij value is 1), with a logarithmic factor ln Si reflecting the sample size dependence. This extreme value criterion is generic, and can be derived for any data following a baseline distribution; for a succinct explanation of this analytic method, see page 17 of ref. 32.
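The step from the extreme-statistics condition to Eq. 4 can be sketched as follows. This is a reconstruction from the two relations stated in the text (the tail condition and the small-κ approximation of the mean), not necessarily the author's own derivation.

```latex
\begin{align}
\frac{1}{S_i} &= \exp(-\kappa_i K_i^c)
  \;\Rightarrow\; K_i^c = \frac{\ln S_i}{\kappa_i}, \\
\langle K_i \rangle &\approx 1 + \frac{1}{\kappa_i}
  \;\Rightarrow\; \frac{1}{\kappa_i} \approx \langle K_i \rangle - 1, \\
K_i^c &\approx (\langle K_i \rangle - 1)\,\ln S_i .
\end{align}
```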

In what follows, we label each coauthor j with Kij > Kic a super tie, with indicator variable Rj ≡ 1. The rest of the ties with Kij ≤ Kic have an indicator variable Rj ≡ 0. This method has limitations, specifically in the case that the collaboration profile does not follow an exponential P(Kij). For example, consider the extreme case where every Kij = 1, meaning that Kic = 0 (independent of Si), resulting in all coauthors being super ties (Rj = 1 for all j). This scenario is rare and unlikely to occur for researchers with relatively large Ni and Si, as in our researcher sample.
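The threshold and labeling rule can be illustrated with a minimal Python sketch. The function names and the ten-coauthor profile are hypothetical, chosen only for illustration; the weak/strong cut at the mean ⟨Ki⟩ follows the convention used elsewhere in the text.

```python
import math

def super_tie_threshold(k_values):
    """Eq. 4: K_c = (<K> - 1) * ln(S) for one researcher's tie strengths."""
    s = len(k_values)               # S_i: number of distinct coauthors
    mean_k = sum(k_values) / s      # <K_i>: mean publications per coauthor
    return (mean_k - 1.0) * math.log(s)

def classify_ties(k_values):
    """Label each coauthor as 'weak', 'strong', or 'super' (R_j = 1)."""
    mean_k = sum(k_values) / len(k_values)
    k_c = super_tie_threshold(k_values)
    labels = []
    for k in k_values:
        if k > k_c:
            labels.append("super")
        elif k >= mean_k:
            labels.append("strong")
        else:
            labels.append("weak")
    return labels

# Hypothetical profile: publications shared with each of 10 coauthors
profile = [1, 1, 1, 1, 2, 2, 3, 4, 6, 29]
print(super_tie_threshold(profile))   # (5 - 1) * ln(10), about 9.21
print(classify_ties(profile))         # only the coauthor with 29 papers is 'super'
print(classify_ties([1, 1, 1]))       # degenerate case: K_c = 0, so all are 'super'
```

The last call reproduces the limitation noted above: when every Kij equals 1, the threshold collapses to zero and every coauthor is flagged.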

Quantifying the Prevalence and Impact of Super Ties.

How common are super ties? For each profile, we denote the number of coauthors that are super ties by SR,i (with complement S!R,i = Si − SR,i). Fig. S4 shows that the distribution of SR,i is rather broad, with mean and SD SR,i values 18±13 (Top biology), 16±13 (Other biology), 7.3±4.8 (Top physics), and 6.8±5.1 (Other physics). The super tie coauthor fraction, fR,i = SR,i/Si, measures the super tie frequency on a per-collaborator basis, with mean value ⟨fR⟩ ≈ 0.04 (i.e., typically one super tie for every 25 coauthors). Furthermore, Fig. 5A shows that the distribution P(fR) is common across the four datasets. We tested the universality of the probability distribution P(fR) between the Top and Other researcher datasets using the Kolmogorov-Smirnov (K-S) statistic, which tests the null hypothesis that the data come from the same underlying pdf. The smallest pairwise K-S test P value between any two P(fR) is p = 0.21, indicating that we fail to reject the null hypothesis that the distributions are equal, highlighting that the four datasets are remarkably well matched with respect to the distribution of fR,i.
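For readers unfamiliar with the K-S comparison, the following is a minimal pure-Python sketch of the two-sample K-S statistic, the largest gap between two empirical cumulative distributions. The function name and sample values are hypothetical; the sketch returns only the statistic D, whereas a library routine such as `scipy.stats.ks_2samp` would also return the P value.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    n_a, n_b = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n_a and j < n_b:
        x = min(a[i], b[j])
        # Advance both pointers past every observation <= x
        while i < n_a and a[i] <= x:
            i += 1
        while j < n_b and b[j] <= x:
            j += 1
        d = max(d, abs(i / n_a - j / n_b))
    return d

# Hypothetical super tie fractions f_R for two small groups of researchers
top = [0.02, 0.03, 0.04, 0.05]
other = [0.02, 0.03, 0.04, 0.05]
print(ks_statistic(top, other))   # identical samples give D = 0.0
```

A small D (and a correspondingly large P value) is what the text reports for every pair of datasets.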

Fig. 5.

The frequency of super ties. Vertical lines indicate the distribution mean. (A) Cumulative distribution of the fraction fR,i of the Si coauthors that are super ties. All pairwise comparisons of the distributions have K-S P values greater than 0.21, indicating a common underlying distribution P(fR). (B) Cumulative distribution of the fraction fN,i of publications that include at least one super tie coauthor. The Top scientist distributions show mean values that are significantly smaller than their counterparts. (C) Cumulative distribution of the fraction fK,i of publications coauthored with his/her top collaborator. The mean and SD for biology (Top) is 0.15±0.16, for biology (Other) is 0.31±0.16, for physics (Top) is 0.17±0.13, and for physics (Other) is 0.38±0.26. (D) The mean rate of super ties per new collaboration, λR(t), averaged over all of the profiles in each dataset using observations aggregated over consecutive 3-y periods.

On a per-paper basis, Fig. 5B shows that the fraction of a researcher’s portfolio coauthored with at least one super tie, fN,i, can vary over the entire range of possibilities, with mean and SD 0.50±0.18 (Top biology), 0.74±0.13 (Other biology), 0.42±0.19 (Top physics), and 0.58±0.23 (Other physics). Furthermore, we found that 41% of the Top scientists have fN,i ≥ 0.5. Interestingly, the distributions of fK,i and fN,i indicate that top scientists have lower levels of super tie dependency than their counterparts.

We also analyzed the arrival rate of super ties. For each profile, we tracked the number of super ties initiated in year t and normalized this number by the total number of new collaborations initiated in the same year. This ratio, λR,i(t), estimates the likelihood that a new collaboration eventually becomes a super tie as a function of career age t. For example, using the set of collaborations initiated in each scientist’s first year, we estimate the likelihood that a first-year collaborator (mentor) becomes a super tie at λR(t=1)=8% (Top biology), 16% (Other biology), 14% (Top physics), and 15% (Other physics). Fig. 5D shows the mean arrival rate, λR(t), calculated by averaging over all profiles in each dataset. The super tie arrival rate declines across the career, reaching a 5% likelihood per new collaborator at t=20 and 2.5% likelihood by t=30. The decay is not as fast for the top-cited scientists, possibly reflecting their preferential access to outstanding collaborators. However, the estimate for large t is biased toward smaller values because collaborations initiated late in the career may not have had sufficient time to grow.
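The per-profile ratio λR,i(t) can be sketched in a few lines of Python. The input format (one `(start_year, is_super)` pair per coauthor) and the function name are assumptions made for illustration; the source does not describe its data layout.

```python
from collections import defaultdict

def super_tie_arrival_rate(ties):
    """ties: list of (start_year, is_super) pairs, one per coauthor.
    Returns {career_year: fraction of new ties that became super ties}."""
    started = defaultdict(int)        # new collaborations initiated in year t
    became_super = defaultdict(int)   # those that eventually became super ties
    for start_year, is_super in ties:
        started[start_year] += 1
        if is_super:
            became_super[start_year] += 1
    return {t: became_super[t] / started[t] for t in sorted(started)}

# Hypothetical profile: career year of first joint paper, and super tie flag
ties = [(1, True), (1, False), (1, False), (1, False),
        (2, False), (2, False), (5, True), (5, False)]
print(super_tie_arrival_rate(ties))   # {1: 0.25, 2: 0.0, 5: 0.5}
```

Years with no new collaborations are simply absent from the result, which avoids dividing by zero; averaging these per-profile ratios across researchers would give the dataset-level λR(t).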

In The Apostle Effect I and The Apostle Effect II, we investigate the role of super ties at the microlevel by analyzing productivity at the annual time resolution and the citation impact of individual publications. In SI Text, we provide additional evidence for the advantage of super ties by developing descriptive methods that measure the net productivity and citations of the super ties relative to all other ties.

The Apostle Effect I: Quantifying the Impact of Super Ties on Annual Productivity.

We analyzed each research profile over the career years ti ∈ [6, Min(29, Ti)], separating the data into nonoverlapping Δt-year periods, and neglecting the first 5 y to allow the Lij(t) and Kij(t) sufficient time to grow. We then modeled the dependent variable, ni,t/⟨ni⟩, which is the productivity aggregated over Δt-year periods, normalized by the baseline average calculated over the period of analysis. Recent analysis of assistant and tenured professors has shown that the annual publication rate is governed by slow but substantial growth across the career, with fluctuations that are largely related to collaboration size (24).

To better understand the factors contributing to productivity growth, we include controls for career age t along with four additional variables measuring the composition of collaborators from each Δt-year period. First, we calculated the average number of authors per publication, āi,t, a proxy for labor input, coordination costs, and the research technology level. Second, we calculated the mean duration, L̄i,t, by averaging the Lij(t−Δt) values (from the previous period) across only the j who are active in t, i.e., those coauthors with ΔKij(t) > 0. In this way, we account for the possibility that j was not active in the previous period (t−Δt), in which case Lij(t−Δt) is even smaller than Lij(t) − Δt. Thus, L̄i,t measures the prior experience between i and his/her collaborators. Third, for the same set of coauthors as for L̄i,t, we calculated the Gini index of the collaboration strength, Gi,tK, using the tie strength values up to the previous period, Kij(t−Δt). Gi,tK provides a standardized measure of the dispersion in coauthor activity, with values ranging from 0 (all coauthors published equally in the past with i) to 1 (extreme inequality in prior publication with i). Hence, whereas L̄i,t measures the lifetime of the group’s prior collaborations, Gi,tK measures the concentration of their prior experience. Finally, for each period t, we calculated the contribution of super tie collaborators normalized by the contribution of all other collaborators,

$$\rho_{i,t} \equiv \frac{\sum_{j|R=1} \Delta K_{ij}(t)}{\sum_{j|R=0} \Delta K_{ij}(t)},$$ [5]

accounting for the possibility that the relative contribution of super ties may affect productivity. Although the total coauthor contribution Σj ΔKij(t) is highly correlated with ni,t, the correlation coefficient between ρi,t and ni,t is only 0.07. We only include researchers in this analysis if there are ≥4 data points for which the denominator of Eq. 5 is nonzero.
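Two of the period-level covariates, the Gini index of prior tie strength and the super tie ratio of Eq. 5, can be sketched as follows. The function names and inputs are hypothetical, and the Gini formula is the standard rank-weighted form for sorted data, which the source does not spell out; for n coauthors its maximum is (n − 1)/n, approaching 1 for large groups.

```python
def gini(values):
    """Gini index of non-negative values: 0 means perfectly equal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1)
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

def super_tie_ratio(delta_k, is_super):
    """Eq. 5: papers with super ties divided by papers with all other ties."""
    num = sum(dk for dk, r in zip(delta_k, is_super) if r)
    den = sum(dk for dk, r in zip(delta_k, is_super) if not r)
    if den == 0:
        return None   # ratio undefined; such periods are not usable
    return num / den

# Hypothetical period: one super tie with 1 paper, one other coauthor with 2
print(super_tie_ratio([1, 2], [True, False]))   # 0.5, a third of the total input
print(gini([2, 2, 2, 2]))                       # 0.0, equal prior experience
print(gini([0, 0, 0, 10]))                      # 0.75, concentrated in one coauthor
```

Returning `None` for a zero denominator mirrors the inclusion rule stated above, under which only periods with a nonzero denominator count as data points.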

We implemented a fixed-effects regression of the model

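$$\frac{n_{i,t}}{\langle n_i \rangle} = \beta_{i,0} + \beta_{\bar{a}} \ln \bar{a}_{i,t} + \beta_{\bar{L}} \bar{L}_{i,t} + \beta_{G} G^{K}_{i,t} + \beta_{\rho} \rho_{i,t} + \beta_{t} t_{i,t} + \epsilon_{i,t},$$ [6]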

which accounts for author-specific time-invariant features (βi,0), using robust SEs to account for autocorrelation within each i. Because the predictors are calculated from the same ego profile, covariance is expected; for example, the highest correlation coefficient between any two independent variables is 0.32 between ln āi,t and Gi,tK, because the variance in Kij increases proportional to the sample size (i.e., āi,t). Table 1 shows the results of our model estimates for Δt = 1 year, and Table S1 shows the results for Δt = 3 years. We also ran the regression for all of the datasets together, “All,” and provide standardized coefficients that better facilitate a comparison of the coefficient magnitudes.
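The idea behind the fixed-effects estimator can be illustrated with a deliberately reduced sketch: a single predictor, with each researcher's own mean subtracted from both variables before fitting (the "within" transformation), so that the author-specific intercept βi,0 drops out. The function name and data are hypothetical, and the sketch omits the other four predictors and the robust SEs used in the actual analysis.

```python
from collections import defaultdict

def within_slope(rows):
    """rows: list of (researcher_id, x, y). Returns the fixed-effects slope
    obtained by demeaning x and y within each researcher."""
    groups = defaultdict(list)
    for rid, x, y in rows:
        groups[rid].append((x, y))
    sxy = sxx = 0.0
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx

# Two hypothetical researchers with different baselines but the same slope 0.1
rows = [("a", 0.0, 1.0), ("a", 1.0, 1.1), ("a", 2.0, 1.2),
        ("b", 0.0, 0.5), ("b", 1.0, 0.6), ("b", 2.0, 0.7)]
print(within_slope(rows))   # close to 0.1 despite the different intercepts
```

Pooling the same rows without demeaning would mix the between-researcher difference in baselines into the slope, which is what the author-specific intercept is meant to prevent.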

Table 1.

Parameter estimates for the productivity model for ni,t in Eq. 6 using Δt=1-y-long periods

Dataset           A    ln āt          L̄t             GtK            ρt             t              Nobs.   Adj. R²
All               466  0.002±0.029    −0.054±0.008   1.788±0.134    0.110±0.013    0.029±0.002    8,483   0.19
  (Std. coeff.)        0.002±0.033    −0.140±0.021   0.320±0.024    0.140±0.016    0.049±0.004
  P value              0.943          0.000          0.000          0.000          0.000
Biology (Top)     99   0.123±0.056    0.011±0.018    2.816±0.270    0.111±0.026    0.031±0.003    2,202   0.24
  P value              0.031          0.519          0.000          0.000          0.000
Biology (Other)   95   0.061±0.056    −0.067±0.025   1.654±0.287    0.071±0.023    0.053±0.006    1,467   0.29
  P value              0.275          0.008          0.000          0.003          0.000
Physics (Top)     100  0.146±0.057    −0.047±0.015   2.053±0.287    0.153±0.025    0.022±0.004    2,056   0.15
  P value              0.012          0.002          0.000          0.000          0.000
Physics (Other)   172  0.089±0.050    −0.065±0.013   1.495±0.213    0.101±0.021    0.026±0.005    2,758   0.15
  P value              0.079          0.000          0.000          0.000          0.000

Each fixed-effects model was calculated using robust SEs, implemented by the Huber/White/sandwich method. Values significant at the p ≤ 0.04 level are indicated in boldface. Std. coeff., the estimates of the standardized (beta) coefficients; All, the combination of all datasets.

Table S1.

Apostle effect productivity model (ni,t): Parameter estimates for the fixed-effects regression model in Eq. 6 with Δt=3-y-long periods, using robust SEs implemented by the Huber/White/sandwich method

Dataset           A    ln āt          L̄t             GtK            ρt             t              Nobs.   Adj. R²
All               406  0.127±0.044    0.078±0.013    1.060±0.125    0.152±0.026    0.029±0.003    2,890   0.16
  (Std. coeff.)        0.169±0.059    0.218±0.038    0.268±0.032    0.176±0.030    0.060±0.005
  P value              0.004          0.000          0.000          0.000          0.000
Biology (Top)     99   0.149±0.092    0.059±0.045    3.003±0.406    0.175±0.071    0.035±0.005    782     0.24
  P value              0.110          0.199          0.000          0.016          0.000
Biology (Other)   84   0.126±0.094    0.067±0.041    2.159±0.504    0.080±0.055    0.047±0.008    492     0.31
  P value              0.184          0.104          0.000          0.146          0.000
Physics (Top)     99   0.073±0.112    0.086±0.022    1.918±0.426    0.159±0.036    0.024±0.004    753     0.11
  P value              0.514          0.000          0.000          0.000          0.000
Physics (Other)   124  0.152±0.076    0.072±0.022    1.514±0.327    0.160±0.043    0.025±0.006    863     0.13
  P value              0.047          0.001          0.000          0.000          0.000

See Table 1 for results with Δt = 1. Only profiles with four or more data values were included in the regression. Values significant at the p<0.02 level are indicated in boldface.

We observed a positive coefficient βρ = 0.11±0.01 (p ≤ 0.003 for all datasets), meaning that larger contributions by super ties are associated with above-average productivity. By way of example, consider a scenario where the super ties contribute a third of the total coauthor input, corresponding to ρi,t = 0.5, the average ρi,t value we observed. Consider a second scenario with ρi,t = 1, corresponding to equal input by the super ties and their counterparts (ρi,t ≥ 1 for 14% of the observations). If all other parameters contribute a baseline productivity value 1, then the additional contribution from βρ corresponds to a 100×0.5βρ/(1+0.5βρ) = 5.2% productivity increase. This value is consistent with the 5% productivity spillover observed in a study of star scientists (33).
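The arithmetic behind the 5.2% figure can be reproduced directly from the quantities in this paragraph. The variable names are ours; the values are the pooled estimate and the two scenarios described in the text.

```python
beta_rho = 0.11                   # pooled estimate quoted in the text
delta_rho = 1.0 - 0.5             # moving from rho = 0.5 to rho = 1
baseline = 1.0 + 0.5 * beta_rho   # baseline of 1 plus the super tie term at rho = 0.5
increase = 100 * delta_rho * beta_rho / baseline
print(f"{increase:.1f}%")         # 5.2%
```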

We also found that periods corresponding to higher levels of prior experience are associated with below-average productivity (βL̄ < 0, p ≤ 0.008 for all datasets except for Top biology). Despite the costs associated with tie formation, this result demonstrates that productivity can benefit from collaborator turnover. Nevertheless, above-average productivity is associated with higher inequality in the concentration of prior experience (βG > 0, p<0.001 level for all datasets). Together, these results point to the benefits of strategically pairing new collaborators with incumbent ones to promote the atypical combination of knowledge backgrounds and to achieve higher scientific impact (34). The standardized coefficients in Table 1 indicate that βG is twice as strong as βρ and βL̄; interestingly, βρ and βL̄ have opposite signs yet are balanced in magnitude, suggesting a compensation strategy for group managers.

The age coefficient βt is also positive (p<0.001 level for all datasets), consistent with patterns of steady productivity growth observed for successful research careers (5, 24, 31). Possible explanatory variables to consider in extended analyses are the SD in Kij, a contact frequency (Kij/Lij) measure of tie strength intensity per Granovetter’s original operationalization (10), and absolute calendar year y, variables that we omit here to keep the model streamlined.

The Apostle Effect II: Quantifying the Impact of Super Ties on the Long-Term Citation of Individual Publications.

The impact of super ties on a publication’s long-term citation tally is difficult to measure, because, clearly, older publications have had more time to accrue citations than newer ones—a type of censoring bias—and so a direct comparison of raw citation counts for publications from different years is technically flawed. To address this measurement problem, we map each publication’s citation count ci,p,Y(y) in census year Yi to a normalized z score,

$$z_{i,p,y} \equiv \frac{\ln c_{i,p,Y}(y) - \langle \ln c^{m}_{Y}(y) \rangle}{\sigma[\ln c^{m}_{Y}(y)]}.$$ [7]

This citation measure is well suited for the comparison of publications from different y because zi,p,y is measured relative to the mean log citation count, ⟨ln cYm(y)⟩, of publications from the same year y, in units of the SD, σ[ln cYm(y)] (31). Thus, we take advantage of the fact that the distribution of citations obeys a universal log-normal distribution for p from the same y and discipline (35). In this way, z is defined such that the distribution P(z) is sufficiently time invariant. To confirm this property, we aggregated zi,p,y within successive 8-y periods, and calculated the conditional distributions P(z|y), which are stable and approximately normally distributed over the entire sample period (Fig. S5).
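The normalization in Eq. 7 can be sketched as below. The function name and the baseline counts are hypothetical, and two choices are assumptions not stated in the source: citation counts are taken to be strictly positive (so the logarithm is defined), and the population SD is used.

```python
import math
from statistics import mean, pstdev

def citation_z(citations, baseline_citations):
    """Eq. 7: log citations of one paper, centered and scaled by the
    log-citation mean and SD of baseline papers from the same year."""
    log_base = [math.log(c) for c in baseline_citations]   # requires c > 0
    return (math.log(citations) - mean(log_base)) / pstdev(log_base)

# Hypothetical baseline: citation counts of same-year papers in journal set m
baseline = [10, 100, 1000]
print(citation_z(100, baseline))    # about 0: typical for its cohort
print(citation_z(1000, baseline))   # about 1.22 SDs above the cohort mean
```

Because each paper is compared only with papers of the same vintage, an older paper's extra years of citation accrual no longer inflate its score.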

Fig. S5.

Distribution of normalized citation impact z. Each panel shows the pdf P(z|y) using z values aggregated over successive nonoverlapping 8-y periods. These panels demonstrate the distribution stability of P(z|y) over time, where z is the dependent variable in the citation apostle effect model in Eq. 8.

To define the detrending indices ⟨ln cYm(y)⟩ and σ[ln cYm(y)], we use the baseline journal set m comprising all research articles collected from the journals Nature, Proceedings of the National Academy of Sciences, and Science. We use this aggregation of three multidisciplinary journals only to control for the time-dependent feature of citation counts. We chose these journals as our baseline because they have relatively large impact factors (high citation rates), and so the temporal information contained in ⟨ln cYm(y)⟩ and σ[ln cYm(y)] is less noisy than for other m with lower citation rates. Furthermore, because most publications reach their peak citation rate within 5−10 y after publication (5), we only analyze zi,p,y with y ≤ 2003. In this way, the zi,p,y values we analyze are less sensitive to fluctuations early in the citation lifecycle, in addition to recent paradigm shifts in science such as the Internet, which affects the search, the retrieval, and the citation of prior literature, and the rise of open access publishing.

In our regression model, we use five explanatory variables that are author (i) and publication (p) specific. The first is the number of coauthors, ai,p, which controls for the tendency for publications with more coauthors to receive more citations (4). This variable is also a gross proxy for the level of technology and coordination costs, because larger teams typically reflect endeavors with higher technical challenge distributed across a wider range of skill sets. We use ln ai,p because the range of values is rather broad, appearing to be approximately log-normally distributed in the right tail (7). The second explanatory variable is the dummy variable Ri,p, which takes the value 1 if p includes a super tie and the value 0 otherwise. Remarkably, the percentage of publications including a super tie is rather close to parity for three of the four datasets: 54% (Top biology), 45% (Top physics), 74% (Other biology), and 54% (Other physics). The third variable, ti,p, is the career age of i at the time of publication. The fourth variable, Ni(tp), is the total number of publications up to year ti,p, which is a non-citation-based measure of the central author’s reputation, visibility, and experience within the scientific community. The final explanatory variable is the collaboration radius, Si(tp), which is the cumulative number of distinct coauthors up to ti,p, representing the central author’s access to collaborative resources, as well as an estimate of the number of researchers in the local community who, having published with i, may preferentially cite i. Hence, by including Ni(tp) and Si(tp), we control for two dimensions of cumulative advantage that could potentially affect a publication’s citation tally.

We then implement a fixed-effects regression to estimate the parameters of the citation impact model,

$$z_{i,p} = \beta_{i,0} + \beta_{a} \ln a_{i,p} + \beta_{R} R_{i,p} + \beta_{t} t_{i,p} + \beta_{N} \ln N_i(t_p) + \beta_{S} \ln S_i(t_p) + \epsilon_{i,p},$$ [8]

using the Huber/White/sandwich method to calculate robust SE estimates that account for heteroskedasticity and within-panel serial correlation in the idiosyncratic error term ϵi,p. We excluded publications with yp > 2003, and, in order that the Top and Other datasets are well balanced, we also excluded the Other researchers with fewer than 43 (biology) and 33 (physics) publications (observations) as of 2003. Table 2 lists the (standardized) parameter estimates. We provide the data used for both regression models in Dataset S1.

Table 2.

Parameter estimates for the citation model for zi,p in Eq. 8 using only the publications with yp ≤ 2003

Dataset           A    ln ap          Rp             tp             ln Ni(tp)      ln Si(tp)      Nobs.   Adj. R²
All               377  0.263±0.024    0.202±0.023    −0.061±0.004   0.062±0.066    0.065±0.072    68,589  0.27
  (Std. coeff.)        0.135±0.012    0.129±0.015    −0.039±0.003   0.044±0.046    0.050±0.055
  P value              0.000          0.000          0.000          0.347          0.367
Biology (Top)     100  0.263±0.039    0.213±0.033    −0.029±0.007   0.138±0.102    0.062±0.112    22,135  0.12
  P value              0.000          0.000          0.000          0.177          0.578
Biology (Other)   55   0.579±0.053    0.152±0.066    −0.031±0.015   0.179±0.095    0.211±0.094    4,801   0.20
  P value              0.000          0.026          0.040          0.065          0.029
Physics (Top)     100  0.139±0.043    0.230±0.044    −0.070±0.007   0.277±0.118    0.119±0.135    22,673  0.19
  P value              0.002          0.000          0.000          0.021          0.380
Physics (Other)   122  0.272±0.042    0.235±0.049    −0.060±0.008   0.082±0.095    0.017±0.104    18,980  0.19
  P value              0.000          0.000          0.000          0.389          0.870

Each fixed-effects model was calculated using robust SEs, implemented by the Huber/White/sandwich method. Values significant at the p ≤ 0.04 level are indicated in boldface. Std. coeff., the estimates of the standardized (beta) coefficients; All, the combination of all datasets.

We estimated βR = 0.20±0.02 (p ≤ 0.026 level in each regression), indicating a significant relative citation increase when a publication is coauthored with at least one super tie. The standardized βa and βR coefficients are roughly equal, meaning that increasing ap from 1 (a solo author publication) to e ≈ 3 coauthors produces roughly the same effect as a change in Rp from 0 to 1. Thus, although larger team size correlates with more citations (4), the relative strength of βR stresses the importance of who in addition to how many.

Interestingly, the career age parameter βt = −0.061±0.004 is negative (significant at the p ≤ 0.04 level in each regression), meaning that researchers’ normalized citation impact decreases across the career, possibly due to finite career and knowledge life cycles. This finding is consistent with a large-scale analysis of researcher histories within high-impact journals, which also shows a negative trend in the citation impact across a career (31). Neither the reputation (βN) nor collaboration radius (βS) parameters were consistently statistically significant in explaining zi,p,y, likely because they are highly correlated with tp for established researchers. Modifications to consider in follow-up analysis are controls for the impact factor of the journal publishing p, the absolute year y to account for shifts in citation patterns in the post-Internet era, and removing self-citations from super ties. Unfortunately, this last task requires a substantial increase in data coverage, far beyond the relatively small amount needed to construct individual ego network collaboration profiles.

We develop three additional descriptive methods in SI Text to compare the subset of publications with at least one super tie to the complementary subset of publications without one. These investigations provide further evidence for the apostle effect. First, we defined an aggregate career measure, the productivity premium pN,i (see Eq. S1), which measures the average Kij value among the super ties relative to all of the other collaborators. Second, we defined a similar career measure, the citation premium pC,i (see Eq. S5), which quantifies the average citation impact attributable to super ties relative to all of the other collaborators.

Independent of dataset, we observed rather substantial premium values. For example, the productivity premium has an average value ⟨pN⟩ ≈ 8, meaning that on a per-collaborator basis, productivity with super ties is roughly 8 times higher than with the remaining collaborators. Similarly, the citation premium pC,i is also significantly right-skewed, with average value ⟨pC⟩ ≈ 14, meaning that net citation impact per super tie is 14 times larger than the net citation impact from all other collaborators. We emphasize that pC,i appropriately accounts for team size by using an equal partitioning of citation credit across the ap coauthors, remedying the multiplicity problem concerning citation credit.

Third, we calculated an additional estimation of the publication-level citation advantage due to super ties (Fig. S6). For both biology and physics, we found that the publications with super ties receive roughly 17% more citations than their counterparts. In basic terms, this means that the average publication with a super tie has 21 more citations in biology and 8 more citations in physics than the average publication without a super tie. This is not a tail effect, because the citation boost factor αR = 1.17 applies a multiplicative shift to the entire citation distribution, P(c̃|Rp=1) ≈ P(αR c̃|Rp=0), thereby impacting publications above and below the average.
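The distinction between a multiplicative shift and a tail effect can be illustrated with a toy example. The citation values and the function name are hypothetical; the point is only that when every paper is scaled by the same factor, the ratio of means and the ratio of medians both recover that factor.

```python
from statistics import mean, median

def boost_factor(with_super, without_super):
    """alpha_R as the ratio of mean detrended citations (R_p = 1 over R_p = 0)."""
    return mean(with_super) / mean(without_super)

# Hypothetical detrended citation counts for papers without a super tie
without = [5.0, 10.0, 20.0, 40.0, 125.0]
# A pure multiplicative shift scales every paper, not only the top ones
with_tie = [1.17 * c for c in without]

print(boost_factor(with_tie, without))        # about 1.17 from the means
print(median(with_tie) / median(without))     # about 1.17 from the medians
```

If the advantage came only from a few highly cited papers, the mean ratio would exceed the median ratio instead of matching it.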

Fig. S6.

Comparing the citation distribution for papers with and without super ties: (A and C) Top and Other biology datasets combined, and (B and D) Top and Other physics datasets combined. (A and B) The cumulative citation distribution, P(c̃), of the detrended citations c̃p defined in Eq. S2. The solid orange curve represents P(c̃) for publications with Rp = 1, and the dashed black curve represents P(c̃) for publications with Rp = 0. Pairwise comparisons of the distributions yield K-S P values less than 10^−6, indicating that the distributions are significantly different. The distribution means are indicated by the vertical lines with corresponding numerical value shown in each panel. The ratio between the means yields the value αR ≡ ⟨c̃p|R=1⟩/⟨c̃p|R=0⟩ = 1.17 for biology and 1.16 for physics. Estimating αR using the ratio of the median values yields approximately the same value. Thus, αR represents a 16−17% citation boost for p with Rp = 1, which translates, on average, to a 21-citation difference for biology and an 8-citation difference for physics. (C and D) Scatter plots of the median c̃p,i values for p with Rp = 0 versus the median c̃p,i values for p with Rp = 1. Values are calculated within researcher profiles; thus each dot represents a single researcher. The majority of researchers have c̃R,i > c̃!R,i, with 73% of the biology researchers and 76% of the physics researchers above the (dashed black) y = x line. The μ value estimates the per-publication citation premium that accounts for heterogeneity across i. Because αR ≈ μ, these two methods yield consistent estimates of the citation premium per publication.

Discussion

The characteristic collaboration size in science has been steadily increasing over the last century (4, 7, 21), with consequences at every level of science, from education and academic careers to universities and funding bodies (8). Understanding how this team-oriented paradigm shift affects the sustainability of careers, the efficiency of the science system, and society’s capacity to overcome grand challenges will be of great importance to a broad range of scientific actors, from scientists to science policy makers.

Collaborative activities are also fundamental to the career growth process, especially in disciplines where research activities require a division of labor. This is especially true in biology and physics research, where computational, theoretical, and experimental methods provide complementary approaches to a wide array of problems. As a result, a contemporary research group leader is likely to find the assembly of a team, one composed of individuals with diverse yet complementary skill sets, a daunting task, especially when under constraints to optimize financial resources, valuable facilities, and other material resources. Online social network platforms, such as VIVO (www.vivoweb.org/) and Profiles RNS (profiles.catalyst.harvard.edu/), which serve as match-making recommendation systems, have been developed to ease the challenges of team assembly.

Our analysis indicates that 2/3 of the collaborations analyzed here are weak. Nevertheless, the remaining strong ties represent social capital investments that can indeed have important long-term implications, for example, on information spreading (17), career paths (36), and access to key strategic resources (37). In the private sector, strong ties facilitate access to new growth opportunities, playing an important role in sustaining the competitiveness of firms and employees (38). These considerations further identify why it is important for researchers to understand the opportunities that exist within their local network. Understanding the redundancies in the local network (39) and the interaction capacity of team members (25) can help a group leader optimize group intelligence (26) and monitor team efficiency (24), thereby constituting a source of strategic competitive advantage.

In summary, we developed methods to better understand the diversity of collaboration strengths. We focused on the career as the unit of analysis, operationalized by using an ego perspective so that collaborations, publications, and impact scores fit together into a temporal framework ideal for cross-sectional and longitudinal modeling. Analyzing more than 166,000 collaborations, we found that a remarkable 60−80% of the collaborations last only Lij = 1 year. Within a subset of repeat collaborations (Lij ≥ 2 y), we find that roughly 2/3 of these collaborations last less than a scientist’s average duration ⟨Li⟩ ≈ 5 y, yet 1% last more than 4⟨Li⟩ ≈ 20 y. This wide range in duration and the disparate frequencies of long and short Lij together point to the dichotomy of burstiness and persistence in scientific collaboration. Closer inspection of individual career paths signals how idiosyncratic events, such as changing institutions or publishing a seminal study or book, can have significant downstream impact on the arrival rate of new collaboration opportunities and tie formation (see Fig. 1 and Fig. S1). Also, the frequency of relatively large publication overlap measures (fK,i and fN,i) indicates that career partners occur rather frequently in science.

In the first part of the study, we provided descriptive insights into basic questions such as how long are typical collaborations, how often does a scientist pair up with his/her main collaborator, and what is the characteristic half-life of a collaboration. We also found that as the career progresses, researchers become attractors rather than pursuers of new collaborations. This attractive potential can contribute to cumulative advantage (30, 31), as it provides select researchers access to a large source of collaborators, which can boost productivity and increase the potential for a big discovery.

We operationalized tie strength using an egocentric perspective of the collaboration network. Because the number of publications Kij between the central scientist i and a given coauthor j was found to be exponentially distributed, the mean value ⟨Ki⟩ is a natural author-specific threshold that distinguishes the strong (Kij ≥ ⟨Ki⟩) from the weak ties (Kij < ⟨Ki⟩). Within the subset of strong ties, we identified super tie outliers using an analytic extreme-statistics threshold Kic defined in Eq. 4. Also, because the number of publications produced by a collaboration is highly correlated with its duration, a super tie also represents persistence that is in excess of the stochastic churn rate that is characteristic of the scientific system. On a per-collaborator basis, the fraction of coauthors within a research profile that are super ties (fR,i) was remarkably common across datasets, indicating that super ties occur at an average rate of 1 in 25 collaborators.

There are various candidate explanations for why such extremely strong collaborations exist. Prosocial motivators may play a strong role, i.e., for some researchers, doing science in close community may be more rewarding than going it alone. Also, the search and formation of a compatible partnership requires time and other social capital investment, i.e., networking. Hence, for two researchers who have found a collaboration that leverages their complementarity, the potential benefits of improving on their match are likely outweighed by the long-term returns associated with their stable partnership. Complementarity, and the greater skill set the partnership brings, can also provide a competitive advantage by way of research agility, whereby a larger collective resource base can facilitate rapid adjustments to new and changing knowledge fronts, thereby balancing the risks associated with changing research direction. After all, a first-mover advantage can make a significant difference in a winner-takes-all credit and reward system (2).

Scientists may also strategically pair up to share costs, rewards, and risk across their careers. In this light, an additional incentive to form super ties may be explained, in part, by the benefits of reward sharing in the current scientific credit system, wherein publication and citation credit arising from a single publication are multiplied across the ap coauthors in everyday practice. Considered in this way, the career risk associated with productivity lulls can be reduced if a close partnership is formed. For example, we observed a few “twin profiles” characterized by a publication overlap fraction fK,i between the researcher and his/her top collaborator that was nearly 100%. Moreover, we found that 9% of the biologists and 20% of the physicists shared 50% or more of their papers with their top collaborator. This highlights a particularly difficult challenge for science, which is to develop a credit system that appropriately divides the net credit but, at the same time, does not reduce the incentives for scientists to collaborate (8, 27–29). Thus, it will be important to consider these relatively high levels of publication and citation overlap in the development of quantitative career evaluation measures; otherwise, there is no penalty to discourage coauthor free riding (7).

We concluded the analysis by implementing two fixed-effects regression models to determine the sign and strength of the apostle effect represented by βρ (productivity) and βR (citations). Together, these two coefficients address the fundamental question: Is there a measurable advantage associated with heavily investing in a select group of research partners?

In the first model, we measured the impact of super ties on a researcher’s annual publication rate, controlling for career age, average team size, the prior experience of i with his/her coauthors, and the relative contribution of super ties within year t as measured by ρi,t in Eq. 5. We found larger ρi,t to be associated with above-average productivity (βρ>0), indicating that super ties play a crucial role in sustaining career growth. We also found increased levels of prior experience to be associated with decreased productivity (βL¯<0), suggesting that maintaining older ties conflicts with the potential benefits from mixing new collaborators into the environment. Nevertheless, higher inequality in the concentration of prior experience was found to have a counterbalancing positive effect on productivity (βG>0).

In the second regression model, we analyzed the impact of super ties on the citation impact of individual publications, using the detrended citation measure zi,p,y defined in Eq. 7. This citation measure is normalized within publication year cohorts, thus allowing for a comparison of citation counts for research articles published in different years. We found that publications coauthored with super ties, corresponding to 52% of the papers we analyzed, have a significant increase in their long-term citations (βR>0). In SI Text, we provide additional evidence for the apostle effect, showing that publications with super ties receive 17% more citations. This added value may arise from the extra visibility the publication receives, because the super tie collaborator may also contribute a substantial reputation and future productivity that promote the visibility of the publication. This type of network-mediated reputation spillover is corroborated by a recent study finding a significant citation boost attributable to a researcher’s centrality within the collaboration network (40).

This data-oriented analysis also contributes to the literature on the science of science policy (41), providing insight and guidance in an increasingly metrics-based evaluation system on how to account for individual achievement in team settings. As such, we conclude with some policy recommendations. One particularly relevant scenario is fellowship, tenure, and career award evaluations, where it is a common practice to consider “independence from one’s thesis advisor” as a selection criterion. We show that to assess a researcher’s independence, evaluation committees should also take into consideration the level of publication overlap between a researcher and his/her strongest collaborator(s), e.g., fK,i and fN,i. However, at the same time, the beneficial role of super ties, as we have quantitatively demonstrated, should also be acknowledged and supported. For example, funding programs might consider career awards that are specifically multipolar (8), which would also benefit the research partners in academia who are actually life partners, and who may face the daunting “two-body problem” of coordinating two research careers. Furthermore, understanding the basic levels of publication overlap in science is also important for the ex post facto review of funding outcomes as a means to evaluate the efficiency of science. In large-team settings, measuring the efficiency of a laboratory or project is difficult without a better understanding of how to measure overlapping labor inputs (i.e., collaborator contributions) relative to the project outputs (e.g., publications, patents, etc.). Finally, our study informs early career researchers, who are likely to face important decisions concerning the (possibly strategic) selection of collaborative opportunities, on the positive impact that the right research partner can have on their career’s long-term sustainability and growth.
In all, our results provide quantitative insights into the benefits associated with strong collaborative partnerships, pointing to the added value derived from skill-set complementarity, social trust, and long-term commitment.

SI Text

Aggregate Measures for Supertie Impact

In The Apostle Effect I and The Apostle Effect II, we implemented a regression model that elucidates the role of super ties at the annual level for productivity and at the paper level for citations. To provide additional quantitative evidence for the apostle effect, in this section, we develop additional descriptive measures that compare the contributions by super ties to the contributions from the rest of the collaborators.

Productivity Premium.

A researcher is likely to have a relatively small number of super ties, corresponding on average to 100fR ≈ 4% of his/her coauthors (see Fig. 5A). However, these coauthors, by definition, contribute to a large fraction of the total output of i (corresponding on average to 100fN ≈ 40−75% of all publications; see Fig. 5B). Thus, it is important to know the relative contributions of the super ties to nonsuper ties, because there are typically very many nonsuper tie coauthors whose inputs also contribute to the output of i.

To facilitate a comparison of productivity at the aggregate career level, we first separated the sum of the tie strengths, $K_i^T \equiv \sum_{j=1}^{S_i} K_{ij} = K_{R,i}^T + K_{!R,i}^T$, into the contribution $K_{R,i}^T = \sum_{j|R_j=1} K_{ij}$ from the super ties ($j$ with indicator value $R_j=1$), and the complementary contribution $K_{!R,i}^T = K_i^T - K_{R,i}^T$ from the other ties (with $R_j=0$). We then define the productivity premium as the ratio of the mean tie strengths,

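$$p_{N,i} \equiv \frac{\langle K_{ij} \mid R_j=1 \rangle}{\langle K_{ij} \mid R_j=0 \rangle} = \frac{K_{R,i}^T / S_{R,i}}{K_{!R,i}^T / S_{!R,i}},$$ [S1]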

between the coauthor subsets with $R_j=1$ (totaling $S_{R,i}$ coauthors) and $R_j=0$ (totaling $S_{!R,i} = S_i - S_{R,i}$ coauthors). This quantity increases as the ratio $S_{R,i}/S_{!R,i}$ decreases (smaller $f_{R,i}$) and as the ratio $K_{R,i}^T/K_{!R,i}^T$ increases; its maximum value is equal to the total number of publications published by the central scientist, $N_i$, and it is bounded from below by the minimum value $(K_i^c+1)/K_i^c \approx 1$ for large $K_i^c$.
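The productivity premium of Eq. S1 can be sketched in a few lines. The function name and the example tie strengths below are hypothetical, and the super tie indicator Rj is assumed to be supplied by the caller rather than computed from the threshold in Eq. 4.

```python
def productivity_premium(tie_strengths, is_super):
    """Ratio of the mean tie strength of super ties to that of all other ties (Eq. S1)."""
    super_k = [k for k, r in zip(tie_strengths, is_super) if r]
    other_k = [k for k, r in zip(tie_strengths, is_super) if not r]
    if not super_k or not other_k:
        raise ValueError("both coauthor subsets must be non-empty")
    return (sum(super_k) / len(super_k)) / (sum(other_k) / len(other_k))

# Hypothetical profile: 2 super ties among 10 coauthors.
K = [40, 30, 5, 4, 3, 2, 2, 2, 1, 1]   # K_ij for each coauthor j
R = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]     # super tie indicator R_j
p_N = productivity_premium(K, R)
print(p_N)  # mean 35 for super ties versus mean 2.5 for the rest
```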

Fig. S4C shows the cumulative distribution P(pN). In all cases, we observe 2.5 ≤ pN,i ≤ 33, with average pN,i values between 7 and 10. Interestingly, the Top scientists from biology tend to have smaller pN,i values than the Other scientists (Mann−Whitney difference in median test P value = 0.0008, and K-S difference in distribution test P value = 0.0007). However, the same tests failed to indicate any significant difference for the P(pN) for physics.

Citation Premium.

In economic analyses, to compare nominal prices across time, it is fundamentally necessary to account for price inflation/deflation by means of an appropriate deflator index. For the same reason, it is equally important to use deflators when comparing success measures derived from other socioeconomic systems. In professional sports, for example, the rate of achievement can be era dependent—e.g., the nonstationary home run rate in Major League Baseball is an implication of the steroids era (42, 43). In science, the publication rate in physics and biology is growing at roughly a 5% rate (5). Nevertheless, this persistent growth has been subject to periods of nonstationary growth spurts, such as during the period of the US National Institutes of Health budget doubling between 1998 and 2003 (2). Thus, with these considerations in mind, in developing comparative citation measures, it is important to appropriately account for two nonstationary features of citation credit.

First, there is the time dependence of citations, arising from the fact that papers published in different years are at different points in their citation life cycle in the citation census year Yi. The citation tallies are also affected by the underlying growth of the citation supply—due to “inflation” or “secular growth” of scientific output—which also systematically biases the comparison of raw citation counts for p from different y. Second, it is also important to divide the citation credit among the ap coauthors of each publication p, in this way placing a cap on the net credit introduced by p, and accounting for the slow but steady exponential growth in the mean number of coauthors per paper over time (7).

To address these two underlying trends, we apply two normalizations to the raw citation count cp,Y(y) (measured in census year Yi for a paper p published in year y). First, we “deflated” cp,Y(y) by dividing by the mean citation value for publications from the same year, cYm(y), and then transformed this ratio into the mean citation values for the (arbitrary) baseline year y=2000, giving the rescaled value

$$\tilde{c}_p \equiv c_{p,Y}(y)\,\frac{c_Y^m(2000)}{c_Y^m(y)}.$$ [S2]

This also accounts for the fact that more recent publications have had less time to accrue citations than older publications. Second, we control for trends in team size, choosing a naive approach that divides the c˜p citations into equal shares among the ap coauthors (44). As such, we define the normalized citations credited to coauthor j of p as

$$\tilde{c}_{j,p} \equiv \frac{\tilde{c}_p}{a_p}.$$ [S3]

Similar to the normalization procedure used for the citation z score zi,p,y in Eq. 7, cYm(y) is the average number of citations for publications published in a benchmark set m, choosing m to be the aggregation of articles appearing in the multidisciplinary journals Nature, Proceedings of the National Academy of Sciences, and Science. We restricted our query to publications denoted as “Articles,” which excludes reviews, letters to the editor, corrections, and other content types. We use these high-impact journals because they have high citation rates and hence provide a robust detrending baseline for the time-dependent component of cp,Y(y). Again, the choice of baseline year y=2000 is arbitrary (as is the deflation year 2000 commonly used in economic analyses) and is mainly used to recover the units of citations for the c˜ measure. Because the constant factor cYm(2000) is used for all c˜p values, it does not affect our results. The advantage of c˜p over zi,p,y is that the former is a positive number including the value 0, and hence can be added across p; zi,p,y, however, can be negative and is centered around 0, and, therefore, summing across p has a different interpretation that is not suitable for what follows.
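The two normalizations in Eqs. S2 and S3 amount to a rescaling followed by an equal split. The sketch below assumes a hypothetical lookup table of benchmark mean citations by publication year; the numbers are invented for illustration and are not the values measured from the benchmark journal set m.

```python
def deflated_citations(raw_citations, pub_year, baseline_mean_by_year, base_year=2000):
    """Eq. S2: rescale a raw citation count into base-year citation units."""
    return raw_citations * baseline_mean_by_year[base_year] / baseline_mean_by_year[pub_year]

def coauthor_share(deflated, n_coauthors):
    """Eq. S3: equal split of the deflated citations among the a_p coauthors."""
    return deflated / n_coauthors

# Hypothetical benchmark means for three publication years.
baseline = {1995: 200.0, 2000: 150.0, 2005: 100.0}

c_tilde = deflated_citations(50, 2005, baseline)  # 50 * 150 / 100
share = coauthor_share(c_tilde, 5)                # split among a_p = 5 coauthors
print(c_tilde, share)
```

Because the base-year mean multiplies every deflated value by the same constant, changing `base_year` rescales all results without altering any ratio between them.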

We define the cumulative measure of citation impact for coauthors i and j as

$$\tilde{C}_{i,j} \equiv \sum_{p\ \mathrm{with}\ j} \tilde{c}_{j,p},$$ [S4]

where the sum includes only those publications in the profile of $i$ that also include coauthor $j$. In the extreme case that $j$ is a coauthor of every publication, $K_{ij}=N_i$, this pairwise measure has the upper limit equal to the citation share of the central scientist, $\tilde{C}_{i,i} \equiv \sum_p \tilde{c}_{i,p}$. The sum across all $j$ including $i$, $\tilde{C}_i = \tilde{C}_{i,i} + \sum_j \tilde{C}_{i,j}$, yields the net detrended citation value, which is independent of the distribution of $a_p$.

To define a similar citation premium, we also separated the citations into the contributions from the $S_{R,i}$ super ties and the contributions from the $S_{!R,i}$ nonsuper tie collaborators. Because the total $\tilde{C}_i$ is conserved, we split the $\tilde{C}_{i,j}$ into two groups: The total for the coauthors with $R_j=1$ is $\tilde{C}_{R,i} \equiv \sum_{j|R_j=1} \tilde{C}_{i,j}$, and the total for the remaining coauthors is $\tilde{C}_{!R,i} \equiv \sum_{j|R_j=0} \tilde{C}_{i,j}$. We then define the citation premium to be the ratio of the average citation shares of the coauthors in each subset,

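$$p_{C,i} \equiv \frac{\langle \tilde{C}_{R,i} \rangle}{\langle \tilde{C}_{!R,i} \rangle} = \frac{\tilde{C}_{R,i}/S_{R,i}}{\tilde{C}_{!R,i}/S_{!R,i}},$$ [S5]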

which has a minimum possible value equal to 0 and, in principle, has no upper bound. Fig. S4D shows the distribution of pC,i, with mean, median, and maximum values across all datasets of 14.1, 11.3, and 134, respectively. We observed only two profiles (2 out of 473) with pC,i<1. Thus, using a group-to-group comparison, this measure shows that the relative citation impact contribution of super ties to other ties is significantly greater than unity. There may be a self-selection, because high-quality work may induce follow-up research, presumably with a similar set of collaborators. Hence, the citation premium is also evidence for the value of persistent collaboration, which can leverage and build upon prior experience and cumulative pairwise achievement.
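A compact sketch of Eqs. S4 and S5 follows. It assumes that each publication is stored as a pair of its deflated citation count and the list of coauthors other than the central scientist, so that the team size is the list length plus one; this layout, the function name, and the example values are hypothetical.

```python
from collections import defaultdict

def citation_premium(publications, super_ties):
    """Citation premium of Eq. S5 for one central scientist.

    publications: list of (deflated_citations, coauthors) pairs, where
    coauthors lists every author except the central scientist.
    super_ties: set of coauthor labels with R_j = 1.
    """
    shares = defaultdict(float)
    for c_tilde, coauthors in publications:
        a_p = len(coauthors) + 1           # team size, counting the central scientist
        for j in coauthors:
            shares[j] += c_tilde / a_p     # Eq. S3 share, accumulated as in Eq. S4
    super_shares = [c for j, c in shares.items() if j in super_ties]
    other_shares = [c for j, c in shares.items() if j not in super_ties]
    if not super_shares or not other_shares or sum(other_shares) == 0:
        raise ValueError("premium undefined for this profile")
    mean_super = sum(super_shares) / len(super_shares)
    mean_other = sum(other_shares) / len(other_shares)
    return mean_super / mean_other

# Hypothetical profile with one super tie "S" and two other coauthors.
pubs = [
    (30.0, ["S", "A"]),   # a_p = 3, share of 10 each
    (20.0, ["S"]),        # a_p = 2, share of 10
    (12.0, ["S", "B"]),   # a_p = 3, share of 4 each
]
p_C = citation_premium(pubs, super_ties={"S"})
print(p_C)  # 24 for the super tie versus a mean of 7 for the others
```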

Also of interest, we observe a consistent pattern considering the distributions of both pN,i and pC,i: The Top scientist profiles have smaller mean values than their counterparts, and the biology profiles have smaller mean values than the physics profiles. In the case of productivity, the smaller values for Top scientists may follow from their privileged access to short-term collaboration opportunities. In the case of the citation impact, this pattern may emerge due to the reputation asymmetry of top scientists, who, by way of their prestige, may have more control over their choice of collaborators, possibly aimed at reducing redundancy within the team and reducing the team size, which also increases the citation credit per coauthor, c˜j,p. In large-team efforts, because most collaboration durations are short with relatively small Kij, increasing ap is most likely to decrease pC,i by way of decreasing the numerator and increasing the denominator.

Because pC,i is an aggregate career measure, and the dependent variable zi,p,y in our citation regression model (Eq. 7) is a normalized measure that does not have the dimensionality of citations, it is difficult to use these quantities to measure the citation boost on a per-publication basis. Thus, to estimate the apostle effect on the long-term citation tally of individual publications, we separated the set of publications with at least one super tie coauthor (Rp=1) from the complementary set of publications without any super tie coauthors (Rp=0). To compare p from a similar era, we took all of the publications from the 11-y window 1990−2000. Also, because citation rates are discipline dependent, we distinguished between biology and physics publications. During this period, 62% (7,814) of the p have Rp=1 for biology and 57% (10,128) of the p have Rp=1 for physics. From these well-balanced subsets, we then estimated the citation impact due to Rp=1 in two ways.

First, we calculated the cumulative citation distribution, $P(\tilde{c} \mid R_p)$, for the publications with $R_p=0,1$. Fig. S6 A and B shows each distribution on log-linear axes, which emphasizes the log-normal features of $P(\tilde{c})$. On this log-linear scale, the two distributions are characterized by a horizontal offset, which is visible for the majority of the $\tilde{c}_p$ range. This graphical feature indicates that, in distribution, the $\tilde{c}_p$ for $R_p=1$ are larger by an approximately constant factor $\alpha_R$, i.e., $P(\tilde{c} \mid R_p=1) \approx P(\alpha_R \tilde{c} \mid R_p=0)$. We estimate $\alpha_R$ by comparing the means and the median values of the $P(\tilde{c})$ distributions. For example, the ratio between the means yields the value $\alpha_R = \langle \tilde{c}_p \mid R=1 \rangle / \langle \tilde{c}_p \mid R=0 \rangle = 1.17$ for biology and 1.16 for physics. Estimating $\alpha_R$ using the ratio of the median values yields approximately the same value. Thus, $\alpha_R$ represents a 16−17% citation boost for $p$ with $R_p=1$. For the average-cited $p$, this boost translates to a 21-citation difference for biology and an 8-citation difference for physics. These numbers, however, arise from an aggregated dataset, so it is not necessarily true that $\alpha_R$ is representative of all scientists.
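The two estimates of the boost factor described above (ratio of means and ratio of medians) can be sketched as follows. The citation values are synthetic and constructed with a uniform 20% shift, so they illustrate the calculation only and do not reproduce the reported 16−17% boost.

```python
from statistics import mean, median

def citation_boost(with_super, without_super):
    """Estimate the boost factor as a ratio of means and as a ratio of medians."""
    return (mean(with_super) / mean(without_super),
            median(with_super) / median(without_super))

# Synthetic deflated citation counts for the two publication subsets.
without_super = [10.0, 20.0, 40.0, 80.0, 150.0]
with_super = [1.2 * c for c in without_super]  # uniform 20% shift by construction

alpha_mean, alpha_median = citation_boost(with_super, without_super)
print(round(alpha_mean, 3), round(alpha_median, 3))
```

When the shift between the two distributions is a constant multiplicative factor, as assumed here, both ratios recover the same value; disagreement between them in real data would indicate that the offset is not uniform across the citation range.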

To confirm the per-publication citation premium at the researcher level, we grouped the publications with Rp=0,1 within each profile i. To reduce the sensitivity to fluctuations, we analyzed only the i with at least 10 publications in the Rp=0 subset and at least 10 publications in the Rp=1 subset. Then, to obtain a characteristic citation measure for each of the two Rp=0,1 subsets, we calculated the median value, c˜R,i, for the subset of p with Rp=1, and the median value, c˜!R,i, for the complementary publication subset with Rp=0.

Fig. S6 C and D shows the scatter plot of c˜!R,i and c˜R,i for each i. The line y=x distinguishes the researchers with c˜R,i>c˜!R,i. There is notable heterogeneity across the i in terms of the citation premium from super ties. Nevertheless, the majority of researchers have c˜R,i>c˜!R,i, with 73% of the biology researchers and 76% of the physics researchers above the y=x line. We then obtained a second estimate of the per-publication citation premium by fitting a least-squares model, c˜R,i=μc˜!R,i+ϵ, where ϵ is an ordinary least squares (OLS) error term, obtaining best-fit values μ=1.21±0.06 (biology) and μ=1.24±0.09 (physics).
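The fitted model above has no intercept term as written, in which case the least-squares slope has the closed form Σxy/Σx². The sketch below applies that formula to invented per-researcher medians and also computes the share of researchers above the y=x line; the function name and the data are hypothetical, and the uncertainty on μ is not estimated here.

```python
def fit_slope_through_origin(x, y):
    """Least-squares slope mu for the no-intercept model y = mu * x + error."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain a non-zero value")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx

# Invented per-researcher medians: without (x) and with (y) a super tie coauthor.
medians_without = [10.0, 20.0, 30.0, 40.0]
medians_with = [12.0, 26.0, 29.0, 52.0]

mu = fit_slope_through_origin(medians_without, medians_with)
share_above = sum(yw > yo for yo, yw in zip(medians_without, medians_with)) / len(medians_with)
print(round(mu, 4), share_above)
```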

Thus, these last two methods provide consistent estimates of the citation boost at the publication level, μ ≈ αR, corresponding to a 16−24% citation boost, pointing to a significant long-term citation impact attributable to the presence of super ties.

Data Description

Name Disambiguation Strategy.

We obtained the top-cited researcher publication data using the Distinct Author Sets function provided by TRWOK to increase the likelihood that only publications actually authored by each central author i are analyzed. On a case by case basis, we performed further author disambiguation within each profile. The Other (matched set) profiles were also downloaded from TRWOK, either by using the Distinct Author database option, or by collecting distinct researcher profile data from ResearcherID.com.

In this latter case of ResearcherID.com profiles, we collected biology and physics profiles by querying the database for profiles listing any of the following keywords: graphene, neuroscience, molecular biology, or genomics. For further details on the selection procedure and for extensive analysis of the statistical properties of these datasets, see the data descriptions in refs. 45–47.

The data census year $Y_i$ refers to the calendar year in which the researcher profile data were downloaded. Let $y_i^0$ be the calendar year of his/her first publication and $y_i^f$ be the calendar year of the last observed publication, so that the total number of years of data for $i$ is $T_i = y_i^f - y_i^0 + 1$. Hence, depending on whether the career of $i$ was completed by $Y_i$, there are two possible scenarios relating $Y_i$ and $T_i$: (scenario a) if the researcher $i$ was still active in $Y_i$, then $Y_i = y_i^f = y_i^0 + T_i - 1$ and $\Delta Y_i = Y_i - y_i^f = 0$; or (scenario b) if his/her career terminated at some time before $Y_i$, then $Y_i = y_i^0 + T_i - 1 + \Delta Y_i$, with $\Delta Y_i > 0$ and $T_i$ corresponding to the final career length. The datasets comprise profiles with census year $Y_i$ varying from 2010 to 2012 (47). These relatively small variations in $Y_i$ do not alter the citation results because all citation measures are appropriately detrended to make possible comparisons across time. Moreover, the regression data are longitudinal, meaning that the observations are made according to $t$, and so the results do not depend on $T_i$ or the completeness of the career. Furthermore, the regression models each include an author-level fixed-effect parameter $\beta_{i,0}$ that controls for time-invariant author-specific properties, thereby absorbing factors related to the starting calendar year $y_i^0$ and the lag $\Delta Y_i$.
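The relation between the census year, the career length, and the lag can be checked with a short helper. The function name and the example years are hypothetical.

```python
def career_span(first_year, last_year, census_year):
    """Return (T_i, delta_Y_i) from the first and last publication years and the census year."""
    if not first_year <= last_year <= census_year:
        raise ValueError("expected first_year <= last_year <= census_year")
    career_length = last_year - first_year + 1   # T_i, counted inclusively
    lag = census_year - last_year                # delta_Y_i, zero for an active career
    return career_length, lag

print(career_span(1980, 2012, 2012))  # scenario a, still active: (33, 0)
print(career_span(1960, 1995, 2010))  # scenario b, completed career: (36, 15)
```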

For a given central author i, we aggregate the TRWOK publications and create a registry of surname and first/middle-initial pairs, {Surname, FM}, where FM can consist of one, two, or three alphabetic first-letter character abbreviations α, FM ≡ α1α2α3. Because the number of distinct coauthors per i is relatively small, on the order of 10−1,000 distinct names per profile, we assume that a name disambiguation problem among the coauthors does not introduce significant levels of type 1 “splitting” or type 2 “clumping” disambiguation errors. Hence, we perform a string matching on similar last names and α1, ignoring α2 and α3 so that publications with variable listing of α2 and α3 do not result in a type 1 “profile splitting” error. We then aggregate the publication information into the profile of coauthor j of central author i. Because our approach is egocentric, we do not analyze the publications of j that do not include i. Clearly, this would require nearly comprehensive TRWOK publication data, which is a major data limitation.
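The matching rule described above, which keys each coauthor on surname and first initial only, can be sketched as follows. This is a simplified illustration: it uses exact case-insensitive surname matching rather than matching on similar surnames, and the function names and byline entries are hypothetical.

```python
from collections import defaultdict

def coauthor_key(surname, initials):
    """Key a name by lowercase surname and first initial, ignoring later initials."""
    return (surname.strip().lower(), initials.strip()[:1].upper())

def group_coauthors(bylines):
    """Count publications per coauthor key; each paper is a list of (surname, initials)."""
    counts = defaultdict(int)
    for paper in bylines:
        # A set ensures that a coauthor is counted at most once per paper.
        for key in {coauthor_key(s, fm) for s, fm in paper}:
            counts[key] += 1
    return dict(counts)

# Hypothetical bylines with variable listing of middle initials.
papers = [
    [("Rossi", "MA"), ("Chen", "L")],
    [("Rossi", "M")],
    [("rossi", "MAB"), ("Chen", "LK")],
]
counts = group_coauthors(papers)
print(counts)
```

Ignoring the second and third initials merges the three spellings of the same coauthor into one profile, at the cost of possibly clumping distinct people who share a surname and first initial.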

Matched Profile Selection Criteria.

To account for possible prestige effects, we compared top-cited profiles to a set of Other profiles that we matched within each discipline. To match the datasets, we collected “not top-cited” researcher profiles that had levels of career length and productivity similar to the top-cited profiles. More specifically, we introduced a productivity criterion requiring that an Other profile must have at least as many publications, Ni, as the least productive researcher in the corresponding top-cited dataset: For biology, this minimum threshold value is Min(Ni|topcited)=52, and for physics, it is Min(Ni|topcited)=46. Altogether, our career dataset comprised 100 top-cited and 93 matched profiles from biology, and 100 top-cited and 180 matched profiles from physics.

Throughout our analysis, we introduced various quantities that summarize the career (career length Ti, total publications Ni, etc.) and collaboration pattern (mean duration Li, mean strength Ki, strength Gini coefficient Gi, etc.) of any given research profile i. We found that the Top and Other datasets are statistically well matched with respect to some variables, in the sense that the K-S test does not reject the null hypothesis that the underlying distributions are the same. For example, the super tie coauthor fraction fR,i exhibits the same distribution across all four datasets, as shown in Fig. 5A. Other variables were well matched only within discipline, e.g., Ki, or were well matched only within Top or Other datasets, e.g., fK,i.

One variable worth mentioning, for which the Top and Other datasets were not well matched, was the career length distribution, P(Ti). Because the Top scientists were selected on account of cumulative citation tallies, they are biased toward longer Ti, many of which are completed careers. Because the maximum possible Lij is given by Ti, the Li variables may be biased toward longer values for the top-cited researcher profiles. As such, we avoid making any comparisons on account of this type of measure. Instead, our comparisons in the manuscript are based on more intensive measures, e.g., the super tie coauthor fraction fR, which are less sensitive to biases arising from systematic differences in Ti and ΔYi.

Moreover, our analysis of the apostle effect, by design, avoids the potential bias due to Ti. For example, the productivity premium pN,i and the citation premium pC,i are ratios in which both the numerator and the denominator should have approximately the same dependence on Ti, and so the effect cancels out. In the case of the regression models, the dependent and independent variables are all specific to a particular career year t.

Acknowledgments

The author is grateful for helpful discussions with O. Doria, M. Imbruno, B. Tuncay, and R. Metulini and constructive criticism and keen insights from two anonymous referees. The author also acknowledges feedback from participants in the European Union Cooperation in Science and Technology (COST) Action TD1210 (KnowEscape) workshop on “Quantifying scientific impact: Networks, measures, insights?” and support from the Italian Ministry of Education for the National Research Project (PNR) “Crisis Lab.”

Footnotes

The author declares no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1501444112/-/DCSupplemental.

References

  • 1.Börner K, et al. A multi-level systems perspective for the science of team science. Sci Transl Med. 2010;2(49):49cm24. doi: 10.1126/scitranslmed.3001399. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Stephan P. How Economics Shapes Science. Harvard Univ Press; Cambridge, MA: 2012. [Google Scholar]
  • 3.Nahapiet J, Ghoshal S. Social capital, intellectual capital, and the organizational advantage. Acad Manage Rev. 1998;23(2):242–266. [Google Scholar]
  • 4.Wuchty S, Jones BF, Uzzi B. The increasing dominance of teams in production of knowledge. Science. 2007;316(5827):1036–1039. doi: 10.1126/science.1136099. [DOI] [PubMed] [Google Scholar]
  • 5.Petersen AM, et al. Reputation and impact in academic careers. Proc Natl Acad Sci USA. 2014;111(43):15316–15321. doi: 10.1073/pnas.1323111111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Malmgren RD, Ottino JM, Nunes Amaral LA. The role of mentorship in protégé performance. Nature. 2010;465(7298):622–626. doi: 10.1038/nature09040. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Petersen AM, Pavlidis I, Semendeferi I. A quantitative perspective on ethics in large team science. Sci Eng Ethics. 2014;20(4):923–945. doi: 10.1007/s11948-014-9562-8. [DOI] [PubMed] [Google Scholar]
  • 8.Pavlidis I, Petersen AM, Semendeferi I. Together we stand. Nat Phys. 2014;10:700–702. [Google Scholar]
  • 9.Borgatti SP, Mehra A, Brass DJ, Labianca G. Network analysis in the social sciences. Science. 2009;323(5916):892–895. doi: 10.1126/science.1165821. [DOI] [PubMed] [Google Scholar]
  • 10.Granovetter MS. The strength of weak ties. Am J Sociol. 1973;78(6):1360–1380. [Google Scholar]
  • 11.Newman MEJ. The structure of scientific collaboration networks. Proc Natl Acad Sci USA. 2001;98(2):404–409. doi: 10.1073/pnas.021544898. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Newman MEJ. Scientific collaboration networks. I. Network construction and fundamental results. Phys Rev E Stat Nonlin Soft Matter Phys. 2001;64(1 Pt 2):016131. doi: 10.1103/PhysRevE.64.016131. [DOI] [PubMed] [Google Scholar]
  • 13.Barabasi AL, et al. Evolution of the social network of scientific collaborations. Physica A. 2002;311(34):590–614. [Google Scholar]
  • 14.Newman MEJ. Coauthorship networks and patterns of scientific collaboration. Proc Natl Acad Sci USA. 2004;101(Suppl 1):5200–5205. doi: 10.1073/pnas.0307545100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Guimerà R, Uzzi B, Spiro J, Amaral LAN. Team assembly mechanisms determine collaboration network structure and team performance. Science. 2005;308(5722):697–702. doi: 10.1126/science.1106340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Palla G, Barabási AL, Vicsek T. Quantifying social group evolution. Nature. 2007;446(7136):664–667. doi: 10.1038/nature05670. [DOI] [PubMed] [Google Scholar]
  • 17.Pan RK, Saramäki J. The strength of strong ties in scientific collaboration networks. Europhys Lett. 2012;97(1):18007. [Google Scholar]
  • 18.Martin T, Ball B, Karrer B, Newman MEJ. Coauthorship and citation patterns in the Physical Review. Phys Rev E Stat Nonlin Soft Matter Phys. 2013;88(1):012814. doi: 10.1103/PhysRevE.88.012814. [DOI] [PubMed] [Google Scholar]
  • 19.Ke Q, Ahn YY. Tie strength distribution in scientific collaboration networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2014;90(3):032804. doi: 10.1103/PhysRevE.90.032804. [DOI] [PubMed] [Google Scholar]
  • 20.Börner K, Maru JT, Goldstone RL. The simultaneous evolution of author and paper networks. Proc Natl Acad Sci USA. 2004;101(Suppl 1):5266–5273. doi: 10.1073/pnas.0307625100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Milojević S. Principles of scientific research team formation and evolution. Proc Natl Acad Sci USA. 2014;111(11):3984–3989. doi: 10.1073/pnas.1309723111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.March JG. Exploration and exploitation in organizational learning. Organ Sci. 1991;2(1):71–87. [Google Scholar]
  • 23.Lazer D, Friedman A. The network structure of exploration and exploitation. Adm Sci Q. 2007;52(4):667–694. [Google Scholar]
  • 24.Petersen AM, Riccaboni M, Stanley HE, Pammolli F. Persistence and uncertainty in the academic career. Proc Natl Acad Sci USA. 2012;109(14):5213–5218. doi: 10.1073/pnas.1121429109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Pentland A. The new science of building great teams. Harv Bus Rev. 2012;90:60–69. [Google Scholar]
  • 26.Woolley AW, Chabris CF, Pentland A, Hashmi N, Malone TW. Evidence for a collective intelligence factor in the performance of human groups. Science. 2010;330(6004):686–688. doi: 10.1126/science.1193147. [DOI] [PubMed] [Google Scholar]
  • 27.Stallings J, et al. Determining scientific impact using a collaboration index. Proc Natl Acad Sci USA. 2013;110(24):9680–9685. doi: 10.1073/pnas.1220184110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Allen L, Scott J, Brand A, Hlava M, Altman M. Publishing: Credit where credit is due. Nature. 2014;508(7496):312–313. doi: 10.1038/508312a. [DOI] [PubMed] [Google Scholar]
  • 29.Shen HW, Barabási AL. Collective credit allocation in science. Proc Natl Acad Sci USA. 2014;111(34):12325–12330. doi: 10.1073/pnas.1401992111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Petersen AM, Jung WS, Yang JS, Stanley HE. Quantitative and empirical demonstration of the Matthew effect in a study of career longevity. Proc Natl Acad Sci USA. 2011;108(1):18–23. doi: 10.1073/pnas.1016733108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Petersen AM, Penner O. Inequality and cumulative advantage in science careers: A case study of high-impact journals. EPJ Data Sci. 2014;3:24. [Google Scholar]
  • 32.Krapivsky P, Redner S, Ben-Naim E. A Kinetic View of Statistical Physics. Cambridge Univ Press; Cambridge, UK: 2010. [Google Scholar]
  • 33.Azoulay P, Zivin JSG, Wang J. Superstar extinction. Q J Econ. 2010;125(2):549–589. [Google Scholar]
  • 34.Uzzi B, Mukherjee S, Stringer M, Jones B. Atypical combinations and scientific impact. Science. 2013;342(6157):468–472. doi: 10.1126/science.1240474. [DOI] [PubMed] [Google Scholar]
  • 35.Radicchi F, Fortunato S, Castellano C. Universality of citation distributions: Toward an objective measure of scientific impact. Proc Natl Acad Sci USA. 2008;105(45):17268–17272. doi: 10.1073/pnas.0806977105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Clauset A, Arbesman S, Larremore DB. Systematic inequality and hierarchy in faculty hiring networks. Sci Adv. 2015;1(1):e1400005. doi: 10.1126/sciadv.1400005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Duch J, et al. The possible role of resource requirements and academic career-choice risk on gender differences in publication rate and impact. PLoS One. 2012;7(12):e51332. doi: 10.1371/journal.pone.0051332. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Uzzi B. Embeddedness in the making of financial capital: How social relations and networks benefit firms seeking financing. Am Sociol Rev. 1999;64(4):481–505. [Google Scholar]
  • 39.Burt RS. Structural Holes. Harvard Univ Press; Cambridge, MA: 1992. [Google Scholar]
  • 40. Sarigöl E, Pfitzner R, Scholtes I, Garas A, Schweitzer F. Predicting scientific success based on coauthorship networks. EPJ Data Sci. 2014;3:9.
  • 41. Fealing KH, editor. The Science of Science Policy: A Handbook. Stanford Business Books; Stanford, CA: 2011.
  • 42. Petersen AM, Jung WS, Stanley HE. On the distribution of career longevity and the evolution of home run prowess in professional baseball. Europhys Lett. 2008;83(5):50010.
  • 43. Petersen AM, Penner O, Stanley HE. Methods for detrending success metrics to account for inflationary and deflationary factors. Eur Phys J B. 2011;79(1):67–78.
  • 44. Petersen AM, Wang F, Stanley HE. Methods for measuring the citations and productivity of scientists across time and discipline. Phys Rev E Stat Nonlin Soft Matter Phys. 2010;81(3 Pt 2):036114. doi: 10.1103/PhysRevE.81.036114.
  • 45. Petersen AM, Stanley HE, Succi S. Statistical regularities in the rank-citation profile of scientists. Sci Rep. 2011;1:181. doi: 10.1038/srep00181.
  • 46. Petersen AM, Succi S. The Z-index: A geometric representation of productivity and impact which accounts for information in the entire rank-citation profile. J Informetrics. 2013;7(4):823–832.
  • 47. Penner O, Pan RK, Petersen AM, Kaski K, Fortunato S. On the predictability of future impact in science. Sci Rep. 2013;3:3052. doi: 10.1038/srep03052.
  • 48. Acemoglu D, Robinson JA. Economic Origins of Dictatorship and Democracy. Cambridge Univ Press; Cambridge, UK: 2005.
  • 49. Ausloos M. A scientometrics law about co-authors and their ranking: The co-author core. Scientometrics. 2013;95:895–909.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences