Abstract
One of the main concerns of researchers and institutions is how to assess the future performance of scholars and identify their potential to become successful scientists. In this study, we model scholarly success in terms of the probability of a scholar belonging to a group of highly impactful scholars as determined by their citation trajectory structures. To this end, we developed a new set of impact measures based on a scholar's citation trajectory structure (rather than on absolute citation or h-index rates) that show a stable trend and scale for highly impactful scholars, independent of their field of study, seniority and citation index. These measures were then incorporated as influence factors into logistic regression models and used as features for probabilistic classifiers based on these models to identify the successful scholars in a heterogeneous corpus of 400 of the most and least cited professors from two Israeli universities. From a practical point of view, the study may yield useful insights and serve as an aid to institutions in making promotion decisions, as well as a self-assessment tool for researchers who strive to increase their academic influence and become leaders in their field.
Keywords: Scholarly impact, Universal impact measures, Citation trajectory structure, Logistic regression classifier, Web of Science, Google Scholar
1. Introduction
Assessing and predicting future academic performance of scholars is important, since it determines their status in the academic world [1,2] and influences decisions about promotions and fund allocations [3]. Academic performance is primarily measured based on the number of publications (the scholar's productivity), and the number of citations (the scholar's impact) [4]. Over the years, many evaluation indicators were developed based on those two factors, such as h-index [5,6] and its variants (e.g. [7]). As has been shown in previous research, there is a wide diversity in publication, citation and h-index scores even among prominent scientists, such as tenured professors [8]. These scores strongly depend on the scholar's field of study, source citation index, age, seniority, networking, and other factors [1,8–13]. Hence, several universal impact measures have been proposed that produce normalized scores comparable for scholars from various research fields and career stages [2,14–17].
Yet, measuring a scholar's performance based on a single score cannot provide a comprehensive evaluation. Thus, numerous studies in this field tried to identify characteristic patterns of the performance trajectories of top scholars. Some of these works have devised productivity patterns, such as “first period peakers”, “second period peakers”, or “constant learners” [3,18,19]. Others applied such patterns to analyze citation trajectories [4,20–24]. However, their results were somewhat controversial, probably due to the high complexity and variability of the citation trajectories of successful scholars that may match various types of patterns. In addition, while productivity is, to a substantial extent, directly controlled by a scholar, impact may depend on a wide spectrum of external factors, and thus requires further investigation.
In this study, we propose a new approach where a scholar's impact is not measured by a certain index score, but rather by the probability of a scholar belonging to the group of top impactful scientists. Such classification of scholars is performed based on a set of measures that quantify the structure of their citation trajectories. Since these measures are proportion-based and do not directly rely on the citation rates, scholars with different citation rates can have similar trajectory structures and therefore similar trajectory structure-based scores. The underlying assumption is that successful scholars from different disciplines and seniority levels have similar trajectory structures. The developed measures are incorporated as independent variables in a logistic regression model that identifies the scholar's impact group. Thus, as opposed to previous work, there is no need to select a certain type of pattern, characteristic or index, and the classification is made based on the combination of multiple indicators.
Several recent studies applied multi-feature models to predict the future citation number (or h-index) of a scholar based on his/her past performance [12,25,26]. Slope-based trajectory analysis was formerly used in citation prediction and knowledge diffusion of academic papers [27]. Yet, to the best of our knowledge, no impact prediction model has been developed that applies a set of citation trajectory structure indicators to predict a scholar's impact group (rather than a future citation number).
The research addresses the following questions.
1) How to systematically measure the structure of scholars' citation trajectories independently from their specific citation scores?
2) How similar is the citation number-based classification of successful scholars to the trajectory structure-based classification?
3) Are there differences in trajectory structures and success models based on the data of two prominent citation indices, Web of Science (WoS) and Google Scholar (GS)?
In addition, this study examined the universality of the proposed measures by investigating whether and to what extent their rates are influenced by gender, seniority, affiliation and field of study. To the best of our knowledge, this is the first research aiming to define scholarly success in terms of the probability of a scholar belonging to a group of highly impactful scholars, as determined by their citation trajectory structures.
2. Literature review
2.1. Measures of scholarly performance
Scholarly performance is usually evaluated by measures that combine publication and citation rates of researchers, such as h-index [4], g-index [28], h2 index [29], χ-index [30], and rec-index (or rectangle-index) [7]. These measures consider the work performed by scholars throughout their entire academic career. However, they do not capture the changes in productivity and academic impact of a researcher over time. Moreover, the aggregation of publications and citations into a single number leads to inconsistencies in the ranking of scholars [31], is advantageous for older scholars and varies by field and by citation index [32]. Therefore, two main approaches to overcome these biases and inconsistencies have been presented in literature. The first approach is to normalize the h-index across disciplines by dividing the number of citations per article by the average citation number for the field and dividing the number of author's publications in a year by the average value in the discipline [16] or by dividing the scholar's h-index by the average h-index of the authors in the same discipline [15]. In addition, individual scientists of different seniority can be compared by means of the number of highly impactful papers per career year [14]. However, the up-to-date average h-index or publication values for various disciplines may not be available, thus making the proposed normalization techniques less applicable in practice.
The second approach tried to identify scholarly performance patterns (rather than a single score) by examining their publication and citation trajectories, and their changes over time [19,24], as described below.
2.2. Productivity trajectory patterns
Powers et al. [19] mapped the productivity levels of marketing academicians over a 20-year period, and profiled the medium and high producers according to the time of their productivity peak (e.g. “first period peakers”, “second period peakers”, “mid-career increasers”, etc.), by applying cluster analysis. They found that scholars with four or more publications after career years 3–7 and 8–12 are almost certain to be high producers. Feichtinger et al. [18] tried to develop an optimization model for the career of a scientist. Based on the work of Way et al. [3], they identified four life cycle patterns: typical case – first increasing then decreasing; fading case – steady decrease in publication rates after getting tenure; slump case – decrease in the middle of the career, followed by a productivity revival towards the end of the career; busy case – slow start followed by an accelerated research intensity. Petersen et al. [33] developed a proportional growth model for individual academic careers, by analyzing the academic productivity of three groups of physicists throughout their careers. They found that the distribution of production growth is a symmetric, leptokurtic “tent-shaped” distribution. A more recent study claims that the conventional narrative of an early peak followed by a gradual decline describes only one-fifth of scholars, and that most of them manifest great inconsistency in individual productivity trajectories [3], with some reaching a peak at the beginning of their career and some later. Another recent study [8] found that publication rate patterns may depend on the source database: in WoS, mean publication rates stabilize after 15 years of seniority, while in GS they constantly grow over time.
White et al. [34] aimed to identify the characteristics of research “stars” and found that they have higher academic rank, possess better time management skills, report more time available to conduct research and enjoy higher institutional support. Kelchtermans and Veugelers [35] investigated the ability to preserve top research performance over time among scholars in biomedical and exact sciences. They found that about a quarter achieve top performance at least once in their career, while 5% persistently remain top scholars throughout their career. Li et al. [36] compared the long-term career performance of Nobel laureates with that of ordinary scientists. They found that Nobel laureates are energetic producers, publishing almost twice as many papers as scientists in the comparison group. Also, they are disproportionately likely to have more than one hot streak of high impact papers over the course of their careers. However, a study conducted in Israel, which investigated the career productivity trajectories of Israel Prize laureates, found that academic productivity is unequally distributed even among “star” scientists, a few of whom surpass the others. This demonstrates the “law of limited excellence”, which states that a few star scientists are responsible for most publications [37].
2.3. Citation trajectory patterns
Based upon previous categorizations of publication scores, Weinberger et al. [24] investigated citation growth patterns. They focused on the differences between the most cited and the least cited scholars from various disciplines and found that the most common pattern in both groups of scholars was two-peaked. In addition, they found that the most cited scholars had a higher chance to match profiles that reflect early-career success, while the least cited scholars matched profiles that showed a citation peak later in their careers. This supports earlier findings that a career peak is characterized by one or several highly cited papers [4] produced in the early to middle years of career [23,38,39]. This peak is followed by a slow and gradual decline [40,41]. Similarly, Bjork et al. [21] found that the most common pattern of the citation trajectories among Nobel Prize laureates in economics manifested a peak around the time of the Nobel Prize followed by steady decline. However, Sinatra et al. [4] showed that there is no apparent correlation between the impact of a research paper and its occurrence on the timeline of a career, which they asserted to be completely random. Petersen et al. [22] observed a faster than linear growth in time of cumulative citations for top scientists, by examining the influence of a quantitative reputation measure, defined as the cumulative citation rate across all publications.
The above studies focused on a certain type of publication or citation trajectory feature, such as the relative position (timing) or number of peaks, the overall growth rate, or the general trend (incline or decline) before or after the peak. Few works attempted to combine multiple performance indicators into a multi-feature model to predict the future citation rate or h-index of scholars based on their past performance. Thus, Acuna et al. [25] built a linear regression model with 18 features, including the total number of citations and publications, average number of co-authors per publication, seniority, number of publications in prestigious journals and number of received grants, to successfully predict the h-index score five years later. Weihs and Etzioni [26] showed that future scientific impact (citation number) prediction is possible, even for a 10-year horizon. For the prediction task, they automatically generated 44 features for each scholar, such as publication and citation scores, h-index, and changes in citations, h-index, and mean number of citations per publication over the last two years, and used them in several regression models. Gogoglou and Manolopoulos [12] created clusters of academic peers according to their citation scores and predicted the future impact of the scholars that belong to each group based on several features such as: the publication and citation rates, co-authorship, citers and cited authors, differences in citations and h-index for every ten-year sub-period. They found that the position of the scholar within the social network of citers and cited authors is crucial for future impact. These studies tested their models on relatively homogeneous corpora of scholars (e.g. from the same field and/or similar seniority level). In this paper, we present a new approach for the successful scholar identification from diverse disciplines based on a multi-feature analysis of citation trajectory structures.
2.4. Influence of demographic factors on scholarly performance
Numerous studies have investigated the impact of various demographic factors on academic performance, for example: a scholar's age and seniority [9,39], academic rank [42], field of study [8] and gender [43]. Some studies showed gender differences in terms of scholarly productivity, with men being more productive than women [43–49]. However, other studies [8,50] found that women's performance rates increased and even equaled or surpassed those of men later in their careers, with the earlier gap attributed to motherhood and childcare during the early career period [46,51]. Studies that compared scholarly productivity across disciplines found that the publication rates of natural scientists exceed those of social scientists and humanists [8,52–56]. According to Gogoglou and Manolopoulos [12] and Sarigöl et al. [57], social links and networking of scholars may have an effect on their future impact. In this study, our goal was to develop measures of scholarly impact that are not susceptible to biases toward certain fields, seniority levels or source citation index. Therefore, we also examined the influence of the demographic factors, as mentioned in the above literature, on the proposed measures.
3. Methodology
As discussed in the previous section, measuring the scholar's success by a single indicator (e.g. citation number, h-index) or pattern (e.g. “early/mid/late peaker”, “one/two/three peaks”) may lead to bias and underestimation of certain types of scholars. Past research showed that the number as well as timing of the peaks in the top scientist's career may be quite random [4]. Furthermore, the scholarly performance changes over time in different aspects, and there is no one typical indicator, pattern or trend that characterizes these changes for all the top scholars and distinguishes them from their less impactful peers.
This research proposes a multi-feature analysis of the structure of a scholar's citation trajectories. A citation trajectory is defined by accumulative numbers of citations (represented by Y axis) calculated on a yearly basis (represented by X axis) throughout the scholar's career. This trajectory never declines, since in the worst case, when a scholar has no new citations in a given year, the accumulated citation number in this year remains equal to that of the preceding year. In order to investigate the structure of a citation trajectory, a slope-based trajectory was derived from the scholar's original citation trajectory. The citation slope for a certain year is the number of new citations of the scholar (only) in this year. The slope trajectory reflects the direction (increasing or decreasing compared to previous years) and rate of growth (i.e. how fast the citation number increases or decreases over time). Fig. 1 below demonstrates typical slope trajectories of highly impactful vs. less impactful scholars. Our objective is to investigate and quantify the changes of a slope citation trajectory of the highly vs. less impactful scholars, but not at a certain point or period as in the past research (e.g. “early peaker” – the period of a career peak), but over their entire career (e.g. the overall direction or the overall stability of the slope trajectory). As mentioned in the Introduction, we aim to develop a set of universal measures for scholarly success that produce similar scores for top impactful scientists from different fields and seniority levels, independent of their overall publication and citation scores. Structure reflects the proportions between different regions in the trajectory, computed as relative rather than absolute values, thus two trajectories can have the same structure while having different citation scores. To assess the structure of slope trajectories, the following measures have been defined.
Fig. 1.
Slope trajectories of the typical HI100 and LI100 scholars in each corpus.
3.1. Citation trajectory structure-based measures for scholarly success
First, as a preparatory step, we defined a set of measures that reflected the yearly and overall trends of the citation slope trajectory based on the citation growth rates. We note that these measures may produce biases towards some fields of studies or seniority levels, and so were not included as universal indicators in the developed classification models.
1. Seniority – the scholar's seniority (i.e. active career years), computed as the total length (in years) of the scholar's career trajectory.

2. Slope – the number of new citations that a scholar gained in a given year, i.e. the difference between the overall accumulated citation numbers in a given year and the preceding year, formally defined for a scholar $s$ and year $i$ as:

$$\mathrm{Slope}_i(s) = C_i(s) - C_{i-1}(s),$$

where $C_i(s)$ is the overall number of citations that scholar $s$ received from the beginning of the career up to and including year $i$. Slope is always non-negative.

3. Avg Slope – the average difference in the citation rates of a given year and the preceding year, computed for all pairs of adjacent years in the scholar's career, formally defined for a scholar $s$ with $n = \mathrm{Seniority}(s)$ as:

$$\mathrm{AvgSlope}(s) = \frac{1}{n-1}\sum_{i=2}^{n}\bigl(C_i(s) - C_{i-1}(s)\bigr).$$

4. Growth – the yearly slope growth rate for a scholar $s$, i.e. the difference between the citation slope scores of the given year and the preceding year, formally defined as:

$$\mathrm{Growth}_i(s) = \mathrm{Slope}_i(s) - \mathrm{Slope}_{i-1}(s).$$

We note that Growth can be negative if the Slope value of the current year is lower than that of the preceding year.

5. Avg Growth Percentage – the average yearly slope growth rate as a percentage of the previous year's slope, computed for all pairs of adjacent years in the scholar's career, formally defined for a scholar $s$ as follows:

$$\mathrm{AvgGrowthPercentage}(s) = \frac{100\%}{n-2}\sum_{i=3}^{n}\frac{\mathrm{Growth}_i(s)}{\mathrm{Slope}_{i-1}(s)}.$$
The expectation was that successful scholars will present a slope trajectory that is characterized by a higher rate of average slope growth than less successful scholars. In other words, for more impactful scholars, the citation trajectory grows faster both yearly and over the entire career on average.
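The basic measures above can be illustrated with a minimal Python sketch. It assumes a scholar's career is given as a list of cumulative yearly citation counts; the function names and the sample numbers are hypothetical, and skipping years whose preceding slope is zero is an assumption made here to avoid division by zero, not something stated in the study.

```python
from statistics import mean

def slopes(cumulative):
    """Slope: new citations per year (difference of adjacent cumulative counts)."""
    return [cur - prev for prev, cur in zip(cumulative, cumulative[1:])]

def growths(slope):
    """Growth: difference between the slope of a year and of the preceding year."""
    return [cur - prev for prev, cur in zip(slope, slope[1:])]

def avg_slope(cumulative):
    """Avg Slope: mean yearly slope over all pairs of adjacent years."""
    return mean(slopes(cumulative))

def avg_growth_percentage(cumulative):
    """Avg Growth Percentage: mean of Growth / previous Slope, in percent.

    Assumption: pairs whose previous slope is 0 are skipped.
    """
    s = slopes(cumulative)
    ratios = [(cur - prev) / prev for prev, cur in zip(s, s[1:]) if prev != 0]
    return 100 * mean(ratios)

# Hypothetical five-year career (cumulative citations per year).
career = [2, 5, 11, 15, 25]
print(slopes(career))                           # [3, 6, 4, 10]
print(growths(slopes(career)))                  # [3, -2, 6]
print(avg_slope(career))                        # 5.75
print(round(avg_growth_percentage(career), 2))  # 72.22
```

Because Growth and Avg Growth Percentage are computed from differences and ratios of slopes, two scholars with very different citation totals can still obtain similar values, which is the property the proportion-based measures rely on.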
Based on the above basic measures, the first set of proportion-based measures was defined, aiming to capture the scholar's ability to preserve the positive trend in the citation slope over time. The optimal situation is when the citation slope constantly inclines with increasing growth rate. To this end, the following measures count and compare the numbers of years of positive and negative growth in the citation slope trajectories.
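6. Up Years – the total number of positive growth years:

$$\mathrm{UpYears}(s) = \sum_{i} u_i(s),$$

where, for scholar $s$ and year $i$, $u_i(s) = 1$ if $\mathrm{Growth}_i(s) > 0$ and $u_i(s) = 0$ otherwise.

7. Down Years – the total number of negative growth years:

$$\mathrm{DownYears}(s) = \sum_{i} d_i(s),$$

where $d_i(s) = 1$ if $\mathrm{Growth}_i(s) < 0$ and $d_i(s) = 0$ otherwise.

8. Sum – the difference between the number of positive growth years and the number of negative growth years throughout the scholar's career, formally defined as:

$$\mathrm{Sum}(s) = \mathrm{UpYears}(s) - \mathrm{DownYears}(s).$$

9. Sum Up – the normalized Sum measure, i.e. Sum divided by the number of positive growth years:

$$\mathrm{SumUp}(s) = \frac{\mathrm{Sum}(s)}{\mathrm{UpYears}(s)}.$$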
In the optimal case, when the citation slope constantly inclines with increasing growth rate, Up Years is expected to be close to Seniority and Sum Up to be close to 1.
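As a small illustration of measures 6 to 9, the following Python sketch counts positive and negative growth years in a yearly Growth series. The function name and sample data are hypothetical, and returning `None` for Sum Up when there are no positive growth years is an assumption of this sketch.

```python
def up_down_measures(growth):
    """Compute Up Years, Down Years, Sum and Sum Up from yearly Growth values."""
    up_years = sum(1 for g in growth if g > 0)
    down_years = sum(1 for g in growth if g < 0)
    total = up_years - down_years
    # Assumption: Sum Up is left undefined (None) when there are no up years.
    sum_up = total / up_years if up_years else None
    return {"up_years": up_years, "down_years": down_years,
            "sum": total, "sum_up": sum_up}

# Hypothetical growth series: three up years, one down year, one flat year.
print(up_down_measures([3, -2, 6, 0, 5]))

# A career that never declines reaches the perfect Sum Up score of 1.
print(up_down_measures([1, 2, 3, 4])["sum_up"])  # 1.0

# Sum Up drops below -1 when down years heavily outnumber up years.
print(up_down_measures([1] * 4 + [-1] * 15)["sum_up"])  # -2.75
```

Dividing by Up Years rather than by Seniority means the score is bounded above by 1 but not bounded below, which is why strongly declining trajectories can produce values below −1.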
Then, we mapped the periods of continuous incline and decline in the growth rates of the citation slope throughout the entire scholar's career. Based on these periods, the following measures were defined and calculated for each scholar, estimating the scholar's ability to increase the citation slope rates constantly and continuously.
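10. Avg Length Down Up – the ratio of the average length of continuous negative growth periods and the average length of continuous positive growth periods, formally defined as:

$$\mathrm{AvgLengthDownUp}(s) = \frac{\frac{1}{K}\sum_{k}\mathrm{Length}(k)}{\frac{1}{J}\sum_{j}\mathrm{Length}(j)},$$

where $k$ is a period of one or more adjacent years in $s$'s career when $\mathrm{Growth}_i(s) < 0$, $j$ is a period of one or more adjacent years in $s$'s career when $\mathrm{Growth}_i(s) > 0$, $K$ and $J$ are the numbers of such periods, and $\mathrm{Length}(\cdot)$ is the number of years in a period.

11. Avg Up Growth – the average slope growth per positive growth period, formally defined as:

$$\mathrm{AvgUpGrowth}(s) = \frac{1}{J}\sum_{j}\mathrm{Growth}(j), \qquad \mathrm{Growth}(j) = \sum_{i \in j}\mathrm{Growth}_i(s).$$

12. Avg Down Growth – the average slope growth, based on absolute values of slope growth per negative growth period, formally defined as:

$$\mathrm{AvgDownGrowth}(s) = \frac{1}{K}\sum_{k}\bigl|\mathrm{Growth}(k)\bigr|.$$

13. Avg Growth Down Up – the ratio of the average slope growth per negative growth period and the average slope growth per positive growth period, defined as:

$$\mathrm{AvgGrowthDownUp}(s) = \frac{\mathrm{AvgDownGrowth}(s)}{\mathrm{AvgUpGrowth}(s)}.$$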
The expectation was that successful scholars will keep their citation rates growing for most of their career, and for longer periods than less successful scholars; their slope trajectory's trend will be more stable, and even when it decreases, the slope growth rate of the citation trajectories in the continuous incline periods will be higher than the slope drop in the continuous decline periods. Thus, relatively high scores of measures 6, 8, 9, and 11, and relatively low values of measures 7, 10, 12 and 13 are assumed to be indicative of scholarly success. It can be noted that citation patterns from previous works, such as “two peaks” [24] and “learners” [19] were generalized and quantified by these measures as well.
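The period-based measures 10 to 13 can be sketched in Python as follows. This is an illustrative reading rather than the study's implementation: it assumes that a period's growth is the sum of its yearly Growth values, that zero-growth years end a period without belonging to one, and that the "down" averages are 0 when a career has no decline period. All names and sample values are hypothetical.

```python
from itertools import groupby
from statistics import mean

def split_periods(growth):
    """Split a Growth series into maximal runs of positive and negative years."""
    up_periods, down_periods = [], []
    for sign, run in groupby(growth, key=lambda g: (g > 0) - (g < 0)):
        years = list(run)
        if sign > 0:
            up_periods.append(years)
        elif sign < 0:
            down_periods.append(years)
        # sign == 0: flat years separate periods (assumption of this sketch)
    return up_periods, down_periods

def period_measures(growth):
    ups, downs = split_periods(growth)
    if not ups:
        raise ValueError("no positive growth period; the ratios are undefined")
    avg_len_up = mean([len(p) for p in ups])
    avg_len_down = mean([len(p) for p in downs]) if downs else 0
    avg_up_growth = mean([sum(p) for p in ups])
    avg_down_growth = mean([abs(sum(p)) for p in downs]) if downs else 0
    return {
        "avg_length_down_up": avg_len_down / avg_len_up,
        "avg_up_growth": avg_up_growth,
        "avg_down_growth": avg_down_growth,
        "avg_growth_down_up": avg_down_growth / avg_up_growth,
    }

# Hypothetical series: up periods [3, 2], [6], [5, 1]; down periods [-1, -4], [-2].
print(period_measures([3, 2, -1, -4, 6, -2, 5, 1]))
```

In this example the decline periods are on average slightly shorter than the incline periods (ratio 0.9) and lose less than the incline periods gain (ratio of about 0.62), the combination that the text associates with more successful scholars.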
In addition to measuring the proportions between length and slope growth of incline and decline periods in the citation slope trajectory, their relative position in the scholar's career timeline is another important characteristic of the trajectory structure. Therefore, we chronologically divided each scholar's career trajectory into 5-year sub-periods and computed the Sum and overall Growth measures for each period separately. The maximal number of 5-year periods was 11 for WoS and 8 for GS. These period-based measures provided a generalized universal estimation of patterns of the type “first period peakers”, “second period peakers”, and “third period peakers” [19].
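A minimal sketch of the 5-year sub-period computation is given below. It assumes the input is the yearly Growth series in chronological order, that the last sub-period may be shorter than five years, and that "overall Growth" means the sum of the yearly Growth values within a sub-period; these choices and the function name are assumptions of the sketch.

```python
def five_year_profile(growth, width=5):
    """Return Sum and overall Growth for each consecutive `width`-year sub-period."""
    profile = []
    for start in range(0, len(growth), width):
        chunk = growth[start:start + width]
        up_years = sum(1 for g in chunk if g > 0)
        down_years = sum(1 for g in chunk if g < 0)
        profile.append({"sum": up_years - down_years, "growth": sum(chunk)})
    return profile

# Hypothetical 12-year growth series, split into periods of 5, 5 and 2 years.
for period in five_year_profile([1, 2, -1, 3, 4, -2, -3, 1, 0, 2, 5, 6]):
    print(period)
# {'sum': 3, 'growth': 9}
# {'sum': 0, 'growth': -2}
# {'sum': 2, 'growth': 11}
```

Comparing the per-period values shows where in the career the trajectory accelerated, without having to assign the scholar to a single named pattern.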
3.2. Data collection
The scholars' data was collected from the official websites of two major Israeli universities located in the center of the country, Tel-Aviv University (Shanghai rank of 151–200 in the world, and 4th in the country) and Bar-Ilan University (Shanghai rank of 401–500 in the world, and 5/6th in the country in 2020). After filtering out scholars with very common names or unidentifiable gender, over 800 tenured professors were extracted from all the faculties except for Humanities. The fact that all the scholars were granted tenure and a professorship title (corresponding to the US ranks of Associate Professor, Full Professor and Emeritus) ensures that they are well-established academicians with a successful career [8]. The professors' gender was determined manually based on Israeli naming conventions and, in ambiguous cases, on their personal websites. In addition, the scholars' overall and yearly citation data and h-index were retrieved from the WoS and GS indices. Professors with fewer than 10 listed publications, fewer than 200 citations and an h-index below 10 in the two citation indices were excluded from the analysis.
Finally, in each index the 100 top-cited scholars (denoted as HI100WoS and HI100GS) and the 100 least-cited scholars (denoted as LI100WoS and LI100GS) were used as the study's corpus. 44 of the scholars were included in both HI100WoS and HI100GS, 13 of the scholars were included in both LI100WoS and LI100GS, and 13 of the scholars were included in both HI100WoS and LI100GS, thus the overall number of distinct scholars examined by the study was 330. The reason for choosing the 100 most impactful and 100 least impactful scholars was the 20/80 method that was also employed in previous research (e.g. Refs. [22,33,37,58]).
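The count of distinct scholars follows from subtracting the reported overlaps from the four groups of 100, assuming that each overlapping scholar appears in exactly two groups and that no other overlaps exist:

$$4 \times 100 - (44 + 13 + 13) = 400 - 70 = 330.$$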
The four groups (HI100WoS, HI100GS, LI100WoS, and LI100GS) included representatives from various faculties (WoS: HI100 - 48% from Life Science, 38% from Exact Sciences and Engineering, 14% from Social Sciences; LI100 - 8% from Life Science, 70% from Exact Sciences and Engineering, 22% from Social Sciences; GS: HI100 - 17% from Life Science, 69% from Exact Sciences and Engineering, 14% from Social Sciences; LI100 - 6% from Life Science, 47% from Exact Sciences and Engineering, 47% from Social Sciences), and genders (WoS: HI100 – 81 male, 19 female; LI100 – 81 male, 19 female; GS: HI100 – 89 male, 11 female; LI100 – 85 male, 15 female). However, we wish to note that the citation rates were not normalized by field of study or seniority (as suggested in the literature), and thus there may still be some bias in the four groups towards certain faculties and seniority levels (e.g. more Social Science and young scholars in the LI100 group than in the HI100 group). Yet, we still assume that most of the HI100 group are indeed successful scholars and use them as a baseline (yet not as a ground truth) dataset to build the trajectory structure-based model of scholarly success.
3.3. Data analysis
The measures presented in Section 3.1 were used to assess the impact of individual scholars and further to compute the differences between the scholars, and compare between the HI100 and LI100 subgroups based on the averaged measures’ scores over all the scholars in each group. The statistical analysis comprised the following main stages.
At the first stage, we used descriptive statistics to calculate and comparatively summarize the values of the different measures presented in Section 3.1 for each of the scholar groups. Then, Pearson correlation and one-way ANOVA tests were applied on all the measures' values to learn about their inter-relationships and differences between institutions, faculties, genders and high and low impact scholar groups. We used one-way ANOVA tests to examine the differences between faculties and the various scholar sub-corpora in terms of citation trajectory structure measures, and t-test to investigate the differences across genders and two examined citation indices (WoS and GS). The next stage aimed to build a multi-feature model for becoming a successful scholar. To this end, we performed bivariate logistic regression analysis for each corpus, using the trajectory structure indicators as independent variables, and examined whether and to what extent they contributed to scholar's classification as HI100. A total of 2000 bootstrapped samples were used to check the validity of the model and generate an optimism-corrected C-statistic, which is a less biased method than others in the field, with lower absolute and mean squared errors [59,60]. Notably, independent variables with a strong Pearson correlation rate (>0.7) were not placed together in the same regression model (see the Appendix for the complete Pearson correlation rates between all independent variables that were considered for the regressions), and small-valued measures (Sum Up, Avg Length Down Up and Avg Growth Down Up) were normalized (by adding 1 and multiplying by 100). Then, at the fifth stage, we built an automatic classifier based on a logistic regression function with multiple explanatory variables as features weighted by B values of the corresponding prediction models' indicators (i.e. the selected trajectory structure measures):
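$$z(s) = B_0 + \sum_{i=1}^{m} B_i \cdot v_i(s),$$

where $B_0$ is the constant of the regression model, $B_i$ are the B values of the $m$ selected measures, and $v_i(s)$ are the values of the selected measures for the examined scholar $s$.
Then, we performed normalization, in order to obtain a score in the range between 0 and 1 for the probability of a scholar belonging to each of the scholar groups (LI100 and HI100):

$$\mathrm{PredictionProb}(s) = \frac{e^{z(s)}}{1 + e^{z(s)}}.$$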
All the scholars that obtained a PredictionProb score between 0 and the median were classified as LI100 (group 0) and all the scholars that received a score between the median and 1 were classified as HI100 (group 1). To evaluate the model's classification accuracy, we randomly selected 30% of each sub-corpus as a test set, then trained the developed model on the remaining 70%, and applied it to the test set. Since the citation number-based classification into HI100 and LI100 was used as a baseline for comparison rather than a ground truth classification of scholarly success, the final sixth stage included a qualitative analysis of the classification mismatches, their various demographic and trajectory-based characteristics. This is in order to learn and interpret the differences between their original (citation number-based) and trajectory-based scholar groups.
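The classification step can be sketched in Python as below. The B values, the constant and the scholar data are hypothetical placeholders rather than the fitted coefficients of the study; the sketch assumes the standard logistic function for the normalization and assigns scores exactly equal to the median to LI100, a tie rule the text does not specify.

```python
import math
from statistics import median

def rescale_small(value):
    """Normalization of small-valued measures as described: add 1, multiply by 100."""
    return (value + 1) * 100

def linear_score(values, b_weights, constant=0.0):
    """Weighted sum of measure values using the regression B values."""
    return constant + sum(b * v for b, v in zip(b_weights, values))

def prediction_prob(score):
    """Logistic function mapping a linear score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

def classify_by_median(probs):
    """Label 1 (HI100) above the median probability, otherwise 0 (LI100)."""
    cut = median(probs)
    return [1 if p > cut else 0 for p in probs]

# Hypothetical B values for (Sum Up, Avg Length Down Up) and a hypothetical constant.
B = [0.04, -0.02]
CONSTANT = -5.0

# Hypothetical scholars: (Sum Up, Avg Length Down Up) before rescaling.
raw = [(0.68, 0.37), (0.51, 0.60), (0.90, 0.20), (0.30, 0.90)]
features = [[rescale_small(v) for v in scholar] for scholar in raw]
probs = [prediction_prob(linear_score(f, B, CONSTANT)) for f in features]
print(classify_by_median(probs))  # [1, 0, 1, 0]
```

Thresholding at the median rather than at a fixed 0.5 always splits the scored scholars into two groups of roughly equal size, which matches a corpus built from equally sized HI100 and LI100 groups.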
4. Results
4.1. Descriptive statistics
Our analysis showed that in both citation indices (WoS and GS), the HI100 group was found to be superior to the LI100 group across almost all citation trajectory structure measures. Table 1 presents the differences between the measure scores of the two indices.
Table 1.
The differences between the total citation number, h-index, and the trajectory structure measure scores of the two citation indices.
| Dependent variable | Citation index | Scholar Group | Min Score | Max Score | Mean (SD) | Mean Ratio (HI100/LI100) | t (HI100 vs. LI100) df = 2198 | Mean Ratio (GS/WoS) |
|---|---|---|---|---|---|---|---|---|
| Total Citations | WoS | HI100 | 2304.00 | 18714.00 | 5255.72 (3448.53) | 12.10 | 13.97*** | 1.61 |
| | | LI100 | 206.00 | 612.00 | 434.32 (105.82) | | | |
| | GS | HI100 | 8200.00 | 135550.00 | 17908.90 (16685.89) | 19.49 | 10.18*** | |
| | | LI100 | 232.00 | 1462.00 | 918.70 (376.54) | | | |
| h-index | WoS | HI100 | 13.00 | 63.00 | 32.21 (10.12) | 2.81 | 20.30*** | 1.29 |
| | | LI100 | 10.00 | 16.00 | 11.48 (1.42) | | | |
| | GS | HI100 | 16.00 | 154.00 | 56.27 (20.34) | 3.62 | 19.68*** | |
| | | LI100 | 10.00 | 24.00 | 15.54 (3.78) | | | |
| Avg Slope | WoS | HI100 | 61.18 | 1087.20 | 246.37 (174.50) | 9.50 | 12.62*** | 1.57 |
| | | LI100 | 9.11 | 49.33 | 25.94 (8.51) | | | |
| | GS | HI100 | 205.46 | 4292.21 | 663.22 (568.72) | 14.96 | 10.87*** | |
| | | LI100 | 10.55 | 158.88 | 44.32 (23.22) | | | |
| Avg Growth Percentage | WoS | HI100 | 27.82% | 281.58% | 67.72% (40.90%) | 1.18 | 2.27* | 4.01 |
| | | LI100 | 25.45% | 127.57% | 57.47% (18.97%) | | | |
| | GS | HI100 | 2.55% | 4883.16% | 296.56% (521.64%) | 4.73 | 4.48*** | |
| | | LI100 | 9.25% | 178.66% | 62.67% (33.52%) | | | |
| Up Years | WoS | HI100 | 9.00 | 28.00 | 16.10 (3.57) | 1.48 | 11.22*** | 1.03 |
| | | LI100 | 7.00 | 22.00 | 10.91 (2.95) | | | |
| | GS | HI100 | 4.00 | 28.00 | 19.50 (4.03) | 1.52 | 12.87*** | |
| | | LI100 | 8.00 | 22.00 | 12.82 (3.27) | | | |
| Down Years | WoS | HI100 | 0.00 | 22.00 | 5.07 (3.93) | 0.93 | 0.75 | 1.19 |
| | | LI100 | 0.00 | 18.00 | 5.47 (3.60) | | | |
| | GS | HI100 | 0.00 | 21.00 | 7.62 (4.87) | 1.11 | 1.16 | |
| | | LI100 | 0.00 | 21.00 | 6.87 (4.25) | | | |
| Avg Up Growth | WoS | HI100 | 8.15 | 157.85 | 41.15 (25.45) | 4.49 | 12.38*** | 1.47 |
| | | LI100 | 3.25 | 26.27 | 9.17 (4.42) | | | |
| | GS | HI100 | 26.40 | 752.29 | 99.42 (90.33) | 6.61 | 9.30*** | |
| | | LI100 | 3.34 | 53.27 | 15.05 (8.82) | | | |
| Avg Down Growth | WoS | HI100 | 0.00 | 205.00 | 24.81 (28.41) | 3.50 | 6.16*** | 1.43 |
| | | LI100 | 0.00 | 21.50 | 7.08 (4.49) | | | |
| | GS | HI100 | 0.00 | 328.08 | 47.88 (44.53) | 4.99 | 8.52*** | |
| | | LI100 | 0.00 | 29.38 | 9.59 (5.97) | | | |
| Sum | WoS | HI100 | −6.00 | 23.00 | 10.95 (4.26) | 2.01 | 10.70*** | 1.00 |
| | | LI100 | −2.00 | 12.00 | 5.44 (2.89) | | | |
| | GS | HI100 | −11.00 | 25.00 | 11.88 (5.35) | 2.00 | 9.37*** | |
| | | LI100 | −5.00 | 13.00 | 5.95 (3.39) | | | |
| Sum Up | WoS | HI100 | −0.67 | 1.00 | 0.68 (0.25) | 1.33 | 4.66*** | 0.92 |
| | | LI100 | −0.20 | 1.00 | 0.51 (0.26) | | | |
| | GS | HI100 | −2.75 | 1.00 | 0.59 (0.41) | 1.23 | 2.23** | |
| | | LI100 | −0.31 | 1.00 | 0.48 (0.25) | | | |
| Avg Length Down Up | WoS | HI100 | 0.00 | 2.00 | 0.37 (0.28) | 0.62 | 5.95*** | 1.19 |
| | | LI100 | 0.00 | 1.37 | 0.60 (0.26) | | | |
| | GS | HI100 | 0.00 | 3.75 | 0.45 (0.41) | 0.74 | 3.31** | |
| | | LI100 | 0.00 | 1.31 | 0.61 (0.25) | | | |
| Avg Growth Down Up | WoS | HI100 | 0.00 | 3.71 | 0.69 (0.53) | 0.85 | 1.85 | 0.98 |
| | | LI100 | 0.00 | 1.83 | 0.81 (0.42) | | | |
| | GS | HI100 | 0.00 | 1.65 | 0.58 (0.34) | 0.83 | 2.63** | |
| | | LI100 | 0.00 | 1.46 | 0.70 (0.30) | | | |
*p < 0.05, **p < 0.01, ***p < 0.001.
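The t column of Table 1 can be approximately reproduced from the Mean (SD) column alone. The following is a minimal sketch, assuming a pooled-variance two-sample t-test and 100 scholars in each group; the function name `t_from_summary` is hypothetical and is not taken from the study.

```python
import math


def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance two-sample t statistic computed from summary statistics."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err


# Total Citations, WoS rows of Table 1 (HI100 vs. LI100).
# n = 100 per group is an assumption based on the group names.
t_total_wos = t_from_summary(5255.72, 3448.53, 100, 434.32, 105.82, 100)
print(round(t_total_wos, 2))  # 13.97, matching the printed value
```

Because only rounded means and standard deviations are available, other rows may differ from the printed t values in the second decimal place.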
As demonstrated in Table 1, the HI100 sub-corpus is characterized by significantly higher average slope growth. Compared with LI100 scholars, HI100 scholars also exhibited significantly longer incline periods relative to their decline periods (as reflected by Sum, Sum Up, Up Years, Down Years, and Avg Length Down Up). This indicates overall more stable and positively directed career trajectories among the scholars in this group, compared with those in the LI100 sub-corpus, who have more skewed career trajectories. In addition, 8% of the HI100WoS and 6% of the HI100GS sub-corpora received a perfect Sum Up score (of 1.00), which means that their career was always on the rise, while only 2% of the LI100 scholars received this score in both citation indices. Remarkably, while the inter-index ratio values (GS divided by WoS, displayed in the last column of Table 1) for the general indicators were quite high (between 1.4 and 4), their values for the universal measures based on proportions between different scores (Sum, Sum Up, Avg Length Down Up, Avg Growth Down Up) were all very close to 1. This finding indicates the relative independence of these measures from the citation index.
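The inter-index ratio in the last column of Table 1 appears to be the GS value of Mean Ratio (HI100/LI100) divided by the WoS value. This reading is inferred from the printed numbers rather than stated in the text, so the sketch below is only a consistency check on a subset of rows; `inter_index_ratio` is a hypothetical helper name.

```python
# Mean Ratio (HI100/LI100) copied from Table 1: (WoS, GS, printed GS/WoS).
table1_ratios = {
    "Total Citations": (12.10, 19.49, 1.61),
    "h-index": (2.81, 3.62, 1.29),
    "Avg Growth Percentage": (1.18, 4.73, 4.01),
    "Sum": (2.01, 2.00, 1.00),
    "Sum Up": (1.33, 1.23, 0.92),
    "Avg Length Down Up": (0.62, 0.74, 1.19),
    "Avg Growth Down Up": (0.85, 0.83, 0.98),
}


def inter_index_ratio(wos_ratio, gs_ratio):
    """GS-to-WoS ratio of the HI100/LI100 mean ratios (assumed definition)."""
    return gs_ratio / wos_ratio


recomputed = {
    name: round(inter_index_ratio(wos, gs), 2)
    for name, (wos, gs, _printed) in table1_ratios.items()
}

for name, (_wos, _gs, printed) in table1_ratios.items():
    print(f"{name:22s} recomputed={recomputed[name]:.2f} printed={printed:.2f}")
```

For these rows the recomputed values agree with the printed column to two decimals, with the proportion-based measures clustering around 1.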
As for differences among academic institutions across the citation slope trajectory measures, the results varied between the two citation indices and different measures. In the WoS corpus, the only significant difference between Tel-Aviv University and Bar-Ilan University was in the number of Up Years, in which the former had an advantage (t (198) = 2.86, p = 0.005). However, in the GS corpus, Tel-Aviv University had significantly higher rates in virtually all the indicators, e.g. Avg Slope (t (198) = 2.52, p = 0.01), Up Years (t (198) = 3.58, p < 0.001), Sum rates (t (198) = 4.49, p < 0.001), Sum Up rates (t (198) = 3.58, p < 0.001), Avg Up Growth (t (198) = 2.39, p = 0.02), Avg Length Down Up (t (198) = 3.68, p < 0.001) and Avg Growth Down Up (t (198) = 2.67, p = 0.008).
In terms of gender discrepancies, in WoS we found that men were slightly superior across almost all the measures; however, the differences were not statistically significant. We also observed that all the scholars who received negative Sum or Sum Up scores, which reflect more negative growth years than positive growth years, were men (5 out of 200 in WoS and 7 out of 200 in GS).
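The reading of negative Sum and Sum Up scores can be illustrated with a small sketch. It assumes that Sum is the number of incline years minus the number of decline years, that Sum Up is Sum divided by the number of incline years, and that a year counts as an incline (decline) year when its citation count is higher (lower) than in the previous year. These definitions are inferred from the values in Table 1 and may differ in detail from the formal definitions given earlier in the paper; both function names are hypothetical.

```python
def trajectory_counts(yearly_citations):
    """Count incline and decline years in a yearly citation series (assumed rule)."""
    up_years = down_years = 0
    for previous, current in zip(yearly_citations, yearly_citations[1:]):
        if current > previous:
            up_years += 1
        elif current < previous:
            down_years += 1
    return up_years, down_years


def sum_measures(up_years, down_years):
    """Return (Sum, Sum Up) under the assumed definitions."""
    total = up_years - down_years
    sum_up = total / up_years if up_years else None  # undefined without incline years
    return total, sum_up


# A trajectory that rises every year gets the perfect Sum Up score of 1.0.
rising = trajectory_counts([1, 3, 6, 10, 15])
print(rising, sum_measures(*rising))  # (4, 0) (4, 1.0)

# More decline years than incline years yields negative scores.
print(sum_measures(4, 15))  # (-11, -2.75)
```

The second example was chosen because 4 incline years, a Sum of −11 and a Sum Up of −2.75 all appear as HI100 minima for GS in Table 1, which is consistent with the assumed definitions, although the table does not state that these minima belong to the same scholar.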
In addition, we identified significant differences between the faculties, but for different indicators in each citation index as shown in Table 2. Generally, in WoS, the leading faculty was Life Sciences, and in GS, it was the Exact Sciences faculty (in accordance with the previous research findings based on citation values [8]). However, there were no significant differences between the faculties for (almost) all the proportion-based indicators (e.g. Sum, Sum Up, Avg Growth Down Up, Avg Length Down Up) in both citation indices. This demonstrates the stability of these measures for scholars from different fields of study.
Table 2.
The means, standard deviations and significance of the differences in trajectory structure measures between the faculties in the two corpora.
| Dependent variable | Faculty | Mean WoS (SD) | Mean GS (SD) | F WoS (df = 2, 197) | F GS (df = 2, 197) |
|---|---|---|---|---|---|
| Seniority | Life Sciences | 26.20 (8.00) | 28.65 (11.02) | 14.38*** | 6.98** |
| | Exact Sciences | 19.76 (7.66) | 32.84 (12.82) | | |
| | Social Sciences | 20.28 (5.94) | 26.21 (8.39) | | |
| Total Citations | Life Sciences | 4497.95 (3435.43) | 9415.65 (6407.59) | 10.00*** | 6.72** |
| | Exact Sciences | 2288.82 (3466.92) | 12236.12 (17524.85) | | |
| | Social Sciences | 1942.39 (2328.71) | 4046.07 (7196.28) | | |
| h-index | Life Sciences | 29.64 (11.41) | 41.17 (17.91) | 17.27*** | 15.15*** |
| | Exact Sciences | 19.01 (12.19) | 42.11 (27.43) | | |
| | Social Sciences | 18.22 (10.72) | 22.12 (15.84) | | |
| Avg Slope | Life Sciences | 192.35 (167.85) | 399.35 (324.78) | 5.13** | 5.71** |
| | Exact Sciences | 121.77 (174.67) | 438.28 (588.51) | | |
| | Social Sciences | 91.91 (102.49) | 175.87 (323.23) | | |
| Avg Growth Percentage | Life Sciences | 55.93% (30.05%) | 407.93% (991.46%) | 2.66 | 5.40** |
| | Exact Sciences | 67.32% (36.15%) | 173.96% (223.00%) | | |
| | Social Sciences | 58.79% (18.16%) | 104.28% (121.97%) | | |
| Up Years | Life Sciences | 16.59 (3.72) | 16.26 (5.68) | 27.56*** | 9.31*** |
| | Exact Sciences | 12.10 (3.53) | 17.27 (4.95) | | |
| | Social Sciences | 12.92 (4.18) | 14.02 (3.98) | | |
| Down Years | Life Sciences | 7.09 (4.57) | 6.83 (4.49) | 9.92*** | 5.07** |
| | Exact Sciences | 4.52 (3.30) | 8.07 (4.98) | | |
| | Social Sciences | 4.69 (2.66) | 5.84 (3.31) | | |
| Avg Up Growth | Life Sciences | 34.69 (25.22) | 73.86 (69.35) | 7.04** | 5.78** |
| | Exact Sciences | 22.77 (24.90) | 68.18 (88.91) | | |
| | Social Sciences | 17.50 (15.15) | 30.16 (39.06) | | |
| Avg Down Growth | Life Sciences | 22.80 (25.22) | 28.44 (24.51) | 4.10* | 7.53** |
| | Exact Sciences | 14.03 (22.92) | 36.36 (43.83) | | |
| | Social Sciences | 11.03 (8.50) | 14.34 (18.06) | | |
| Sum | Life Sciences | 9.50 (5.58) | 9.43 (6.51) | 3.60* | 0.84 |
| | Exact Sciences | 7.51 (3.86) | 9.20 (5.51) | | |
| | Social Sciences | 8.22 (4.43) | 8.18 (4.57) | | |
| Sum Up | Life Sciences | 0.55 (0.33) | 0.47 (0.74) | 1.16 | 0.74 |
| | Exact Sciences | 0.61 (0.25) | 0.53 (0.27) | | |
| | Social Sciences | 0.61 (0.22) | 0.57 (0.24) | | |
| Avg Length Down Up | Life Sciences | 0.50 (0.34) | 0.62 (0.75) | 0.20 | 0.86 |
| | Exact Sciences | 0.48 (0.29) | 0.53 (0.26) | | |
| | Social Sciences | 0.47 (0.21) | 0.52 (0.26) | | |
| Avg Growth Down Up | Life Sciences | 0.73 (0.36) | 0.48 (0.27) | 0.13 | 3.71* |
| | Exact Sciences | 0.77 (0.54) | 0.68 (0.34) | | |
| | Social Sciences | 0.74 (0.45) | 0.62 (0.31) | | |
*p < 0.05, **p < 0.01, ***p < 0.001.
The HI100 group was also found to be superior to the LI100 group when examining the division of slope trajectories into sub-periods (11 in WoS and 8 in GS). As shown in Figs. 2–5 below, the HI100 scholars are characterized by significantly higher Sum scores in the first sub-periods (4 in WoS and 5 in GS). A reversed trend can be observed only at very late stages (7th period in WoS and 8th period in GS), when the number of participants is not representative and the differences are not significant; the slope growth rates, however, remain higher for the HI100 scholars across all periods. In addition, the LI100 scholars already show very low Sum rates (less than 1) in the early-middle period of their careers (3rd period in WoS and 4th period in GS), which means that from this point forward their citation slope trajectory is more skewed (with almost the same number of incline and decline years). This corresponds to the significantly higher average Sum rates for the entire career slope trajectories of the HI100 sub-corpus. Figs. 2–5 present the differences between the HI100 and LI100 sub-corpora across the sub-periods' scores in the two citation indices.
Fig. 2.
Differences in Sum rates between the HI100 and LI100 sub-corpora across the sub-periods' scores in WoS corpus.
Fig. 3.
Differences in Sum rates between the HI100 and LI100 sub-corpora across the sub-periods' scores in GS corpus.
Fig. 4.
Differences in growth rates between the HI100 and LI100 sub-corpora across the sub-periods' scores in WoS corpus.
Fig. 5.
Differences in growth rates between the HI100 and LI100 sub-corpora across the sub-periods' scores in GS corpus.
As for differences between academic institutions across sub-period scores, the results varied between the two citation indices. In the WoS corpus, we found Tel-Aviv University to be dominant across almost all sub-periods, yet the differences were not statistically significant. However, in the GS corpus, Tel-Aviv University had significantly higher Sum rates in the 1st period (t (198) = 3.41, p = 0.001), 3rd period (t (196) = 2.10, p = 0.04), and 5th period (t (116) = 2.50, p = 0.01), and significantly higher growth rates in the 3rd period (t (196) = 2.47, p = 0.01). All other periods also showed an advantage for Tel-Aviv University, although it was not significant.
In terms of gender discrepancies, no significant differences were found in WoS. Men had a slight advantage in the Sum rates of the 1st, 2nd, 4th and 11th periods, while women had the advantage in the 3rd period and from the 5th to 10th periods. In the GS corpus, by contrast, we found female dominance across the first three sub-periods, with a significant advantage in the 2nd period (F = 2.64 (1.83), M = 1.84 (1.99), p < 0.05) and 3rd period (F = 2.71 (1.80), M = 1.80 (2.03), p = 0.03). Men then had a non-significant advantage in the 4th to 6th periods, and the trend reversed again in the 7th and 8th periods. The female dominance in the 2nd and 3rd periods indicates that women are at their scholarly prime 6–15 years from the start of their careers. As for sub-periods’ growth rates, no significant differences were found in either corpus.
With respect to faculties, the only significant difference was found in the WoS corpus and only in the 1st period, with the Life Sciences faculty having an advantage in both Sum and growth rates over Exact Sciences and Social Sciences. It also showed a non-significant advantage in the 3rd period. The GS corpus showed a very diverse trend between the faculties. There were no significant differences in Sum rates; in terms of growth rates, however, Life Sciences had a significant advantage in the 1st and 3rd periods, while Exact Sciences had the lead in the 2nd and 4th periods. Thus, overall, the proportion-based Sum measure seems to be quite discipline- and index-independent.
4.2. Classification models
We computed several bivariate logistic regression models with different trajectory structure measures as independent variables, in order to detect the main factors that influence the classification of scholars into the citation number-based impact groups (HI100/LI100). Notably, most of the variables included in the models are proportion-based indicators that do not directly reflect the citation (or h-index) scores, but rather capture the citation trajectory structure characteristic of each group. Tables 3 and 4 show the details of the obtained models.
Table 3.
The logistic regression coefficients for the factors that influence the scholar's group, based on trajectory structure measures without seniority, in the WoS corpus.
| Factors (dependent variable: Scholar group) | B | Bias | S.E. | Exp(B) | BCa 95% Confidence Interval, Lower | BCa 95% Confidence Interval, Upper |
|---|---|---|---|---|---|---|
| Sum Up (*) | −0.04 | −0.003 | 0.02 | 0.96** | −0.07 | −0.02 |
| Avg Growth Down Up (*) | −0.01 | −0.001 | 0.01 | 0.99** | −0.02 | −0.01 |
| 1st period Sum | 0.84 | 0.06 | 0.21 | 2.32*** | 0.46 | 1.54 |
| 2nd period Sum | 0.63 | 0.04 | 0.20 | 1.88** | 0.26 | 2.00 |
| 3rd period Sum | 0.67 | 0.05 | 0.17 | 1.96*** | 0.37 | 1.26 |
*p < 0.05, **p < 0.01, ***p < 0.001, (*) - normalized.
Table 4.
The logistic regression coefficients for the factors that influence the scholar's group, based on trajectory structure measures without seniority, in the GS corpus.
| Factors (dependent variable: Scholar group) | B | Bias | S.E. | Exp(B) | BCa 95% Confidence Interval, Lower | BCa 95% Confidence Interval, Upper |
|---|---|---|---|---|---|---|
| Sum Up (*) | −0.02 | −0.001 | 0.01 | 0.98* | −0.05 | 0.01 |
| Avg Growth Down Up (*) | −0.02 | −0.001 | 0.01 | 0.98* | −0.03 | −0.01 |
| 1st period Sum | 0.29 | 0.01 | 0.17 | 1.34 | −0.05 | 0.70 |
| 2nd period Sum | 0.26 | 0.01 | 0.13 | 1.30* | −0.01 | 0.60 |
| 3rd period Sum | 0.41 | 0.03 | 0.14 | 1.50** | 0.14 | 0.81 |
*p < 0.05, **p < 0.01, ***p < 0.001, (*) – normalized.
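The Exp(B) column in Tables 3 and 4 is the exponential of the coefficient B, i.e. the factor by which the odds of the modeled scholar group are multiplied when the predictor increases by one unit and the other predictors are held fixed. A minimal sketch of this relationship follows; the helper `odds_ratio` is hypothetical, and which group is coded as the modeled outcome is not stated in this section.

```python
import math

# B coefficients copied from Table 3 (WoS) and Table 4 (GS).
coefficients = {
    "WoS": {
        "Sum Up": -0.04,
        "Avg Growth Down Up": -0.01,
        "1st period Sum": 0.84,
        "2nd period Sum": 0.63,
        "3rd period Sum": 0.67,
    },
    "GS": {
        "Sum Up": -0.02,
        "Avg Growth Down Up": -0.02,
        "1st period Sum": 0.29,
        "2nd period Sum": 0.26,
        "3rd period Sum": 0.41,
    },
}


def odds_ratio(b, delta=1.0):
    """Multiplicative change in the odds for a `delta` increase in one predictor."""
    return math.exp(b * delta)


for corpus, factors in coefficients.items():
    for name, b in factors.items():
        print(f"{corpus:3s} {name:20s} B={b:+.2f} Exp(B)={odds_ratio(b):.2f}")
```

Because B is printed with two decimals, some recomputed odds ratios differ from the tabulated Exp(B) by 0.01 (for example, the 3rd period Sum rows).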
The regression model for WoS was found to be significant (Chi2 (5) = 79.80, p < 0.001), with the influence variables explaining 42% (Cox & Snell R2 = 0.42) and 56% (Nagelkerke R2 = 0.56) of the variance. All the examined coefficients were found to be significant.
The regression model for GS was found to be significant (Chi2 (5) = 32.20, p < 0.001), with the influence variables explaining 20% (Cox & Snell R2 = 0.20) and 26% (Nagelkerke R2 = 0.26) of the variance. All the examined coefficients, except for the 1st period Sum, were found to be significant.
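The two pseudo-R2 values can be related to the model chi-square through their standard definitions. The sketch below uses those definitions with an assumed estimation sample of 148 scholars (200 minus the 52 test cases of Tables 5 and 6), of whom 76 are assumed to belong to HI100 (100 minus the 24 HI100 test cases). The sample size is not reported in this section, so these figures and the function names are assumptions made for illustration only.

```python
import math


def cox_snell_r2(model_chi2, n):
    """Cox & Snell R2 from the likelihood-ratio chi-square of the model."""
    return 1.0 - math.exp(-model_chi2 / n)


def nagelkerke_r2(model_chi2, n, n_positive):
    """Nagelkerke R2: Cox & Snell R2 rescaled by its maximum attainable value."""
    p = n_positive / n
    # Log-likelihood of the intercept-only (null) model.
    ll_null = n_positive * math.log(p) + (n - n_positive) * math.log(1.0 - p)
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(model_chi2, n) / max_r2


N_TRAIN = 148  # assumption, not reported in the text
N_HI = 76      # assumption, not reported in the text

for corpus, chi2 in {"WoS": 79.80, "GS": 32.20}.items():
    cs = cox_snell_r2(chi2, N_TRAIN)
    nk = nagelkerke_r2(chi2, N_TRAIN, N_HI)
    print(f"{corpus}: Cox & Snell = {cs:.2f}, Nagelkerke = {nk:.2f}")
```

Under these assumptions the sketch gives approximately 0.42 and 0.56 for WoS and 0.20 and 0.26 for GS, in line with the reported values; a different estimation sample size would shift the results.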
Then, we applied the automatic classifier (PredictionProb) with a 95% confidence interval based on the above models, using a 4-fold cross-validation technique to test the model's classification accuracy. To this end, each of the two datasets of 100 scholars (HI100 and LI100) was randomly partitioned into four subsamples, and each time the model was trained on about 70–75% of the sample and tested on the remaining 25–30%. We used the following values of the convergence criteria for the maximum likelihood estimation algorithm: PIN (0.05), POUT (0.10), ITERATE (20) and CUT (0.5). Tables 5 and 6 present the crosstabulation of the classifier.
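The partitioning step described above can be sketched as follows. This is an illustrative stratified 4-fold split with a 0.5 classification cutoff, not the software used in the study; the scholar identifiers, the function names and the reading of the predicted probability as the probability of HI100 membership are all assumptions.

```python
import random


def stratified_folds(hi_ids, li_ids, k=4, seed=0):
    """Split each group into k random subsamples and build (train, test) pairs."""
    rng = random.Random(seed)
    hi, li = list(hi_ids), list(li_ids)
    rng.shuffle(hi)
    rng.shuffle(li)
    folds = []
    for i in range(k):
        test = hi[i::k] + li[i::k]  # every k-th scholar of each group
        held_out = set(test)
        train = [s for s in hi + li if s not in held_out]
        folds.append((train, test))
    return folds


def classify(probability_hi, cut=0.5):
    """Assign a group from a predicted probability using the cutoff (CUT = 0.5)."""
    return "HI100" if probability_hi >= cut else "LI100"


# Hypothetical identifiers for the two groups of 100 scholars.
hi_scholars = [f"HI-{i:03d}" for i in range(100)]
li_scholars = [f"LI-{i:03d}" for i in range(100)]

folds = stratified_folds(hi_scholars, li_scholars)
for train, test in folds:
    print(len(train), len(test))  # 150 50 for each fold
```

With exactly equal subsamples each fold trains on 75% of the scholars; the test sets in Tables 5 and 6 contain 52 cases, so the partition used in the study was evidently not perfectly even.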
Table 5.
Crosstabulation of the classification WoS model.
| Scholar group | LI100, N | LI100, % | HI100, N | HI100, % | Total, N | Total, % |
|---|---|---|---|---|---|---|
| LI100 | 18 | 64.3% | 10 | 35.7% | 28 | 100.0% |
| HI100 | 2 | 8.3% | 22 | 91.7% | 24 | 100.0% |
| Total | 20 | 38.5% | 32 | 61.5% | 52 | 100.0% |
Chi2 (1) = 17.09, p < 0.001.
Table 6.
Crosstabulation of the classification GS model.
| Scholar group | LI100, N | LI100, % | HI100, N | HI100, % | Total, N | Total, % |
|---|---|---|---|---|---|---|
| LI100 | 21 | 75.0% | 7 | 25.0% | 28 | 100.0% |
| HI100 | 5 | 20.8% | 19 | 79.2% | 24 | 100.0% |
| Total | 26 | 50.0% | 26 | 50.0% | 52 | 100.0% |
Chi2 (1) = 15.17, p < 0.001.
As shown in Table 5, the overall accuracy of the classifier for WoS was 76.9%. It correctly classified 64.3% of the scholars from the LI100 group and 91.7% of the scholars from the HI100 group.
As shown in Table 6, the overall accuracy of the classifier for GS was 76.9%. It correctly classified 75.0% of the scholars from the LI100 group and 79.2% of the scholars from the HI100 group.
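The accuracy figures and the chi-square values reported with Tables 5 and 6 follow directly from the four cell counts of each crosstabulation. The sketch below recomputes them, assuming that the rows are the citation number-based groups, the columns are the groups assigned by the classifier, and the statistic is Pearson's chi-square without continuity correction; `crosstab_metrics` is a hypothetical helper.

```python
def crosstab_metrics(table):
    """Accuracy, per-group hit rates and Pearson chi-square for a 2x2 crosstab.

    table = [[LI100 -> LI100, LI100 -> HI100],
             [HI100 -> LI100, HI100 -> HI100]]
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    accuracy = (a + d) / n
    li_rate = a / (a + b)  # share of LI100 scholars classified as LI100
    hi_rate = d / (c + d)  # share of HI100 scholars classified as HI100
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return accuracy, li_rate, hi_rate, chi2


wos = crosstab_metrics([[18, 10], [2, 22]])  # counts from Table 5
gs = crosstab_metrics([[21, 7], [5, 19]])    # counts from Table 6

for name, (acc, li, hi, chi2) in {"WoS": wos, "GS": gs}.items():
    print(f"{name}: accuracy={acc:.1%}, LI100={li:.1%}, HI100={hi:.1%}, chi2={chi2:.2f}")
```

Both tables yield an overall accuracy of 40/52, or 76.9%, and the recomputed chi-square values are 17.09 for WoS and 15.17 for GS.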
4.3. Qualitative analysis of the mismatches
Finally, we conducted a qualitative analysis to closely examine the scholars that the second trajectory model-based classifiers (presented in Tables 5 and 6) were not able to classify correctly, i.e. those whose trajectory-based classification does not match their citation number-based classification. To this end, PredictionProb (calculated separately for each citation index) was applied to all the scholars in the study corpus. Overall, there were 36 classification mismatches in WoS and 53 in GS.
The first type of mismatches comprised scholars in HI100 who were classified as LI100 by the trajectory-based model. In the WoS corpus, 15 scholars from HI100 were classified in the LI100 group. Most of them were male (11), mostly from the Life Sciences (7) and Exact Sciences (6) faculties (and only two from Social Sciences), with an average seniority rate of 29.4 (±10.2). The average Sum Up value of these scholars was 0.51 (±0.25), the average Avg Growth Down Up was 0.74 (±0.36) and the sub-periods Sum was in the range of 0.87–1.87. In the GS corpus, 26 scholars from HI100 were classified in the LI100 group. Most of them were male (24), and almost all of them (21) were from the Exact Sciences and Engineering faculties. Their average seniority rate was 41.19 (±10.24), average Sum Up value was 0.45 (±0.17), Avg Growth Down Up was 0.72 (±0.35), and the sub-periods Sum was in the range of 0.58–2.46. Thus, the trajectory measure values for these scholars were closer to those of LI100 than to those of HI100 (see Table 1), which explains the classifier's result for them. These findings show that the classifier mostly demoted scholars with relatively high seniority from the faculties that tend to gain higher citation numbers in each citation index, but whose trajectory indicators were quite low.
As demonstrated in Fig. 6, the majority of these mismatches are late bloomers, early-mid career decliners or scholars whose trajectories exhibit mixed incline-decline trends. This shows that in some cases (particularly for scholars with a high seniority), it is possible to reach an overall high citation level even after some substantial drops on the way, but these are not characteristic trajectory structures of successful scholars. On the other hand, it would probably be reasonable to limit the analyzed career length (e.g. to 30 years) to avoid “punishing” scholars for a decline trend in their trajectories after (or close to) retirement. This is especially relevant for GS, where seniority values are generally higher than in WoS (see Table 2), which explains the higher number of mismatches of this type (HI100 scholars classified as LI100) in the GS citation index compared to WoS.
Fig. 6.
Examples for slope trajectories of the HI100 outliers who were classified as LI100.
The second type of mismatches (LI100 classified as HI100) mostly comprised scholars with (almost) constantly increasing citation slope trajectories, which are characteristic of successful scholars, but from faculties that typically gain lower citation scores (as shown in Fig. 7). In the WoS corpus, 21 scholars from the LI100 group were classified as HI100: 15 males and six females. Most of them (14) were from the Exact Sciences and Engineering faculties, four were from Social Sciences and three from Life Sciences, with an average seniority rate of 17.52 (±4.32). Their average trajectory indicator scores were very close to the scores of the HI100 group (see Table 1): Sum Up of 0.66 (±0.28), Avg Growth Down Up of 0.62 (±0.42) and sub-periods Sum in the range of 2.33–3.38, which explains their classification result. In the GS corpus, 27 scholars from the LI100 group were classified in the HI100 group. Most of them were male (19), and most (16) were from the Social Sciences faculty, with eight from Exact Sciences and three from Life Sciences. Their average seniority rate was 21.42 (±7.93), average Sum Up value was 0.67 (±0.19), Avg Growth Down Up was 0.50 (±0.3) and the sub-periods Sum was in the range of 2.3–3.3; as expected, these values are quite close to those of the HI100 group (rather than to LI100). These figures, combined with the observations from their trajectory structures (see Fig. 7), lead us to the conclusion that the typically lower citation scores of these scholars' faculties in each index explain the mismatch between their citation number-based and trajectory-based classifications.
Fig. 7.
Examples for slope trajectories of the LI100 outliers who were classified as HI100.
This result implies that these scholars, if they continue to maintain the same trend, have great potential to become top impactful scholars. There were also five outliers whose trajectories displayed mixed trends and their proportion-based indicators were closer to the LI100 group, but the model classified them as HI100. This requires further refinement and extension of the proposed trajectory-based measures, which is a subject for future work. We also note that three scholars (one in WoS and two in GS) could not be classified, all of them from the LI100 group. This is due to very low rates of seniority (12 on average), resulting in the absence of data regarding the 3rd career trajectory sub-period, which is included in the model.
5. Discussion and conclusions
This study is part of an evolving discipline named “science of science” [61,62], which is used to predict scientific development trends. “Science of science” uses large-scale data dealing with the production of science to search for universal and domain-specific patterns. This is possible due to the vast availability of digital data on scholarly output [61].
This research presented a new quantitative methodology for modeling scholarly success based on a set of measures that capture the structure of scholars' citation slope trajectories.
These measures were used as influence factors in bivariate logistic regression models and probabilistic classifiers based on these models. To test the effectiveness of the developed methodology, we experimented with a corpus of the citation trajectories of 400 tenured professors in Israel retrieved from two citation indices, WoS and GS. We found that the developed proportion-based trajectory measures were quite stable, were not influenced by demographic factors such as field of study, seniority and gender, and presented similar trends and scales for both examined citation indices; they can thus be considered universal. This is in contrast to the other standard impact measures (e.g. the overall or yearly citation number, and h-index) and to measures comprised of a single variable derived from the trajectory structure (e.g. citation slope growth or length of the incline periods), which were found to be more biased towards certain disciplines in different citation indices. In addition, our expectations regarding the general characteristics of the citation slope trajectory structure of successful scholars were confirmed. The HI100 group was found to be superior to the LI100 group in almost all the measures. In particular, we found that although both groups may have a similar number of decline years, the proportion of incline years relative to decline years is much more meaningful in determining scholarly success. Also, while past research reported conflicting conclusions regarding the most influential period for future academic impact (e.g. Refs. [4,19,23,39]), the results of the Sum and Sum Up measures (in general as well as for various sub-periods) showed significant differences between the two scholar groups overall and across at least the first four sub-periods (20 years of activity). This suggests that early academic investment alone is not sufficient, and consistency must be preserved to maintain the top scholarly status, i.e. it is important to maintain a constant and continuous incline with increasing growth rates.
The constructed classifiers showed a 75–79% classification match between the scholarly impact groups based on the citation number rates and on the trajectory structure model. Remarkably, the optimal models of both citation indices comprised the same predictive variables. Finally, the qualitative analysis of the classification mismatches suggests that some HI100 scholars whose slope trajectories continuously decline at some point may lose their high impact status despite gaining quite high overall citation and h-index scores. On the other hand, some scientists, whose slope trajectories are (virtually) constantly on the rise, are identified by our models as good candidates to become highly impactful scholars despite their currently low total citation and h-index values. In practice, since increasing scholarly impact is currently one of the main concerns and goals of every scientist and academic institution, the simple trajectory structure-based measures proposed in this study can be utilized to assess and predict the potential of young and mid-career scientists to become high impact scholars. Furthermore, the trajectory-based measures reflect an individual scholar's trend and impact increase compared to his/her own starting point. Thus, they define a scholar-centered evaluation approach, based on self-improvement, as a complementary type of scholar evaluation that can be combined with the traditional community-centered evaluation, based on comparison to peers.
In future research, we plan to refine the model by analyzing the citation trajectories of a large and balanced corpus of thousands of scholars from different countries, institutions, seniority levels and disciplines. While in the current study we experimented with two extreme groups (most and least cited scholars), it could be interesting to expand the classification task to intermediate citation level groups. In addition, a large-scale longitudinal study that tracks the changes in the slope trajectories of scholars over time will enable the application of more advanced computational tools in order to extend, fine-tune and generalize the initial set of measures and classification models presented in this paper.
Author contribution statement
Maor Weinberger; Maayan Zhitomirsky-Geffet: conceived and designed the experiments; performed the experiments; analyzed and interpreted the data; contributed reagents, materials, analysis tools or data; wrote the paper.
Data availability statement
Data will be made available on request.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- 1. Pan R.K., Fortunato S. Author impact factor: tracking the dynamics of individual scientific impact. Sci. Rep. 2014;4:4880. doi: 10.1038/srep04880.
- 2. Waltman L. A review of the literature on citation impact indicators. J. Inform. 2016;10(2):365–391. doi: 10.1016/j.joi.2016.02.007.
- 3. Way S.F., Morgan A.C., Clauset A., Larremore D.B. The misleading narrative of the canonical faculty productivity trajectory. Proc. Natl. Acad. Sci. U. S. A. 2017;114(44):E9216–E9223. doi: 10.1073/pnas.1702121114.
- 4. Sinatra R., Wang D., Deville P., Song C., Barabasi A.L. Quantifying the evolution of individual scientific impact. Science. 2016;354(6312):596–604. doi: 10.1126/science.aaf5239.
- 5. Hirsch J.E. An index to quantify an individual's scientific research output. Proc. Natl. Acad. Sci. U. S. A. 2005;102(46):16569–16572. doi: 10.1073/pnas.0507655102.
- 6. Hirsch J.E. Does the h index have predictive power? Proc. Natl. Acad. Sci. U. S. A. 2007;104(9):19193–19198. doi: 10.1073/pnas.0707962104.
- 7. Levene M., Fenner T., Bar-Ilan J. Characterisation of the χ-index and the rec-index. Scientometrics. 2019;120(2):885–896. doi: 10.1007/s11192-019-03151-7.
- 8. Weinberger M., Zhitomirsky-Geffet M. Diversity of success: measuring the scholarly performance diversity of tenured professors in the Israeli academia. Scientometrics. 2021;126(4):2931–2970. doi: 10.1007/s11192-020-03823-9.
- 9. Abramo G., D'Angelo C.A., Murgia G. The combined effects of age and seniority on research performance of full professors. Sci. Publ. Pol. 2016;43(3):301–319. doi: 10.1093/scipol/scv037.
- 10. Brizan D.G., Gallagher K., Jahangir A., Brown T. Predicting citation patterns: defining and determining influence. Scientometrics. 2016;108(1):183–200.
- 11. Garfield E. Citation analysis as a tool in journal evaluation. Science. 1972;178(4060):471–479. doi: 10.1126/science.178.4060.471.
- 12. Gogoglou A., Manolopoulos Y. A data-driven unified framework for predicting citation dynamics. IEEE Transactions on Big Data. 2020;6(4):727–740. doi: 10.1109/TBDATA.2018.2884505.
- 13. Ke W. A fitness model for scholarly impact analysis. Scientometrics. 2013;94(3):981–998. doi: 10.1007/s11192-012-0787-5.
- 14. Franceschini F., Galetto M., Maisano D., Mastrogiacomo L. The success-index: an alternative approach to the h-index for evaluating an individual's research output. Scientometrics. 2012;92(3):621–641. doi: 10.1007/s11192-011-0570-z.
- 15. Kaur J., Radicchi F., Menczer F. Universality of scholarly impact metrics. J. Inform. 2013;7(4):924–932. doi: 10.1016/j.joi.2013.09.002.
- 16. Radicchi F., Fortunato S., Castellano C. Universality of citation distributions: toward an objective measure of scientific impact. Proc. Natl. Acad. Sci. U. S. A. 2008;105(45):17268–17272. doi: 10.1073/pnas.0806977105.
- 17. Wildgaard L., Schneider J.W., Larsen B. A review of the characteristics of 108 author-level bibliometric indicators. Scientometrics. 2014;101(1):125–158. doi: 10.1007/s11192-014-1423-3.
- 18. Feichtinger G., Grass D., Kort P.M. Optimal scientific production over the life cycle. J. Econ. Dynam. Control. 2019;108. doi: 10.1016/j.jedc.2019.103752.
- 19. Powers T.L., Swan J.E., Bos T., Patton J.F. Career research productivity patterns of marketing academicians. J. Bus. Res. 1998;42(1):75–86. doi: 10.1016/S0148-2963(97)00099-4.
- 20. Adams J. Early citation counts correlate with accumulated impact. Scientometrics. 2005;63(3):567–581. doi: 10.1007/s11192-005-0228-9.
- 21. Bjork S., Offer A., Soderberg G. Time series citation data: the Nobel Prize in economics. Scientometrics. 2014;98(1):185–196. doi: 10.1007/s11192-013-0989-5.
- 22. Petersen A.M., Fortunato S., Pan R.K., Kaski K., Penner O., Rungi A., Riccaboni M., Stanley H.E., Pammolli F. Reputation and impact in academic careers. Proc. Natl. Acad. Sci. U. S. A. 2014;111(43):15316–15321. doi: 10.1073/pnas.1323111111.
- 23. Silva F.N., Tandon A., Amancio D.R., Flammini A., Menczer F., Milojevic S., Fortunato S. Recency predicts bursts in the evolution of author citations. Quantit. Sci. Stud. 2019;1(3):1298–1308. doi: 10.1162/qss_a_00070.
- 24. Weinberger M., Zhitomirsky-Geffet M., Bouhnik D. Poster Presented at the Association for Information Science and Technology Annual Meeting (ASIS&T 2020), Virtual. 2020. Identifying Citation Growth Patterns of the Top Scholars in Israel.
- 25. Acuna D.E., Allesina S., Kording K.P. Predicting scientific success. Nature. 2012;489(7415):201–202. doi: 10.1038/489201a.
- 26. Weihs L., Etzioni O. In the Proceedings of: the 17th ACM/IEEE Joint Conference on Digital Libraries (JCDL '17), Toronto, ON, Canada. 2017. Learning to predict citation-based impact measures.
- 27. Ding C.G., Hung W.-C., Lee M.-C., Wang H.-J. Exploring paper characteristics that facilitate the knowledge flow from science to technology. J. Inform. 2017;11(1):244–256. doi: 10.1016/j.joi.2016.12.004.
- 28. Egghe L. Theory and practice of the g-index. Scientometrics. 2006;69(1):131–152. doi: 10.1007/s11192-006-0144-7.
- 29. Mitra P. Hirsch-type indices for ranking institutions scientific research output. Curr. Sci. 2006;91(11):1439.
- 30. Fenner T., Harris M., Levene M., Bar-Ilan J. A novel bibliometric index with a simple geometric interpretation. PLoS One. 2018;13(1):1–14. doi: 10.1371/journal.pone.0200098.
- 31. Waltman L., van Eck N.J. The inconsistency of the h-index. J. Am. Soc. Inf. Sci. Technol. 2012;63(2):406–415. doi: 10.1002/asi.21678.
- 32. Hicks D., Wouters P., Waltman L., de Rijcke S., Rafols I. The Leiden Manifesto for research metrics. Nature. 2015;520(7548):429–431. doi: 10.1038/520429a.
- 33. Petersen A.M., Riccaboni M., Stanley H.E., Pammolli F. Persistence and uncertainty in the academic career. Proc. Natl. Acad. Sci. U. S. A. 2012;109(14):5213–5218. doi: 10.1073/pnas.1121429109.
- 34. White C.S., James K., Burke L.A., Allen R.S. What makes a "research star"? Factors influencing the research productivity of business faculty. Int. J. Prod. Perform. Manag. 2012;61(6):584–602. doi: 10.1108/17410401211249175.
- 35. Kelchtermans S., Veugelers R. Top research productivity and its persistence: a survival time analysis for a panel of Belgian scientists. DTEW Res. Report. 2005;576:1–31. doi: 10.1089/dst.2013.0013.
- 36. Li J., Yin Y., Fortunato S., Wang D. Scientific elite revisited: patterns of productivity, collaboration, authorship and impact. J. R. Soc., Interface. 2020;17(165). doi: 10.1098/rsif.2020.0135.
- 37. Yair G., Gueta N., Davidovitch N. The law of limited excellence: publication productivity of Israel Prize laureates in the life and exact sciences. Scientometrics. 2017;113(1):299–311. doi: 10.1007/s11192-017-2465-0.
- 38.Adams J. Early citation counts correlate with accumulated impact. Scientometrics. 2005;63(3):567–581. doi: 10.1007/s11192-005-0228-9. [DOI] [Google Scholar]
- 39.Simonton D.K. Creative productivity: a predictive and explanatory model of career trajectories and landmarks. Psychol. Rev. 1997;104(1):66–89. doi: 10.1037/0033-295X.104.1.66. [DOI] [Google Scholar]
- 40.Clauset A., Larremore D.B., Sinatra R. Data-driven predictions in the science of science. Science. 2017;355(6324):477–480. doi: 10.1126/science.aal4217. [DOI] [PubMed] [Google Scholar]
- 41.Stephan P.E., Levin S.G. Oxford University Press; Oxford, UK: 1992. Striking the Mother Lode in Science: the Importance of Age, Place, and Time. [Google Scholar]
- 42.Campbell P.G., Awe O.O., Maltenfort M.G., Moshfeghi D.M., Leng T., Moshfeghi A.A., Ratliff J.K. Medical school and residency influence on choice of an academic career and academic productivity among neurosurgery faculty in the United States. Clinical article. J. Neurosurg. 2011;115(2):380–386. doi: 10.3171/2011.3.JNS101176. [DOI] [PubMed] [Google Scholar]
- 43.Abramo G., D'Angelo C.A., Caprasecca A. Gender differences in research productivity: a bibliometric analysis of the Italian academic system. Scientometrics. 2009;79(3):517–539. doi: 10.1007/s11192-007-2046-8. [DOI] [Google Scholar]
- 44.Cooper T., Aharony N., Bar-Ilan J., Rabin Margalioth S. Women in academia: a bibliometric perspective. Inf. Res. 2019;24(4). Retrieved from: http://InformationR.net/ir/24-4/colis/colis1926.html
- 45.Kyvik S. Age and scientific productivity: differences between fields of learning. High Educ. 1990;19(1):37–55. doi: 10.1007/BF00142022.
- 46.Kyvik S., Teigen M. Child care, research collaboration, and gender differences in scientific productivity. Sci. Technol. Hum. Val. 1996;21(1):54–71. doi: 10.1177/016224399602100103.
- 47.Raj A., Carr P.L., Kaplan S.E., Terrin N., Breeze J.L., Freund K.M. Longitudinal analysis of gender differences in academic productivity among medical faculty across 24 medical schools in the United States. Acad. Med. 2016;91(8):1074–1079. doi: 10.1097/ACM.0000000000001251.
- 48.Reed D.A., Enders F., Lindor R., McClees M., Lindor K.D. Gender differences in academic productivity and leadership appointments of physicians throughout academic careers. Acad. Med. 2011;86(1):43–47. doi: 10.1097/ACM.0b013e3181ff9ff2.
- 49.Yang G., Villalta J.D., Weiss D.A., Carroll P.R., Breyer B.N. Gender differences in academic productivity and academic career choice among urology residents. J. Urol. 2012;188(4):1286–1290. doi: 10.1016/j.juro.2012.06.022.
- 50.Eloy J.A., Svider P., Chandrasekhar S.S., Husain Q., Mauro K.M., Setzen M., Baredes S. Gender disparities in scholarly productivity within academic otolaryngology departments. Otolaryngology-Head Neck Surg. (Tokyo) 2013;148(2):215–222. doi: 10.1177/0194599812466055.
- 51.Lariviere V., Vignola-Gagne E., Villeneuve C., Gelinas P., Gingras Y. Sex differences in research funding, productivity and impact: an analysis of Quebec university professors. Scientometrics. 2011;87(3):483–498. doi: 10.1007/s11192-011-0369-y.
- 52.Fulton O., Trow M. Research activity in American higher education. Sociol. Educ. 1974;47(1):29–73. doi: 10.2307/2112166.
- 53.Blackburn R.T., Behymer C.E., Hall D.E. Research note: correlates of faculty publications. Sociol. Educ. 1978;51(2):132–141. doi: 10.2307/2112245.
- 54.Wanner R.A., Lewis L.S., Gregorio D.I. Research productivity in academia: a comparative study of the sciences, social sciences and humanities. Sociol. Educ. 1981;54(4):238–253. doi: 10.2307/2112566.
- 55.Sabharwal M. Comparing research productivity across disciplines and career stages. J. Comp. Pol. Anal.: Research and Practice. 2013;15(2):141–163. doi: 10.1080/13876988.2013.785149.
- 56.Stack S. Gender, children and research productivity. Res. High. Educ. 2004;45(8):891–920. doi: 10.1007/s11162-004-5953-z.
- 57.Sarigol E., Pfitzner R., Scholtes I., Garas A., Schweitzer F. Predicting scientific success based on coauthorship networks. EPJ Data Sci. 2014;3:1–16. doi: 10.1140/epjds/s13688-014-0009-x. Article number: 9.
- 58.Garfield E. From the science of science to Scientometrics visualizing the history of science with HistCite software. J. Inform. 2009;3(3):173–179. doi: 10.1016/j.joi.2009.03.009.
- 59.Harrell F.E., Jr., Lee K.L., Mark D.B. Tutorial in biostatistics multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 1996;15(4):361–387. doi: 10.1002/(sici)1097-0258(19960229)15:4<361::aid-sim168>3.0.co;2-4.
- 60.Smith G.C.S., Seaman S.R., Wood A.M., Royston P., White I.R. Correcting for optimistic prediction in small data sets. Am. J. Epidemiol. 2014;180(3):318–324. doi: 10.1093/aje/kwu140.
- 61.Fortunato S., Bergstrom C.T., Borner K., Evans J.A., Helbing D., Milojević S., Petersen A.M., Radicchi F., Sinatra R., Uzzi B., Vespignani A., Waltman L., Wang D., Barabasi A.L. Science of science. Science. 2018;359(6379) doi: 10.1126/science.aao0185.
- 62.Hou J., Pan H., Gou T., Lee I., Kong X., Xia F. Prediction methods and applications in the science of science: a survey. Comp. Sci. Rev. 2019;34 doi: 10.1016/j.cosrev.2019.100197.
Data Availability Statement
Data will be made available on request.







