A novel triangle mapping technique to study the h-index based citation distribution

Chun-Ting Zhang

Sci Rep. 2013 Jan 3;3:1023. doi: 10.1038/srep01023

PMCID: PMC3535691 PMID: 23289065

Abstract

The h-index has received wide attention in recent years. The area under the citation function is divided by the h-index into three parts, representing h-squared, excess and h-tail citations. The h-index by itself carries no information about excess and h-tail citations, which can play an even more dominant role than the h-index in determining the citation curve; it is therefore necessary to examine the relations among them. A triangle mapping technique is proposed here to map the three percentages of these citations onto a point within a regular triangle. By viewing the distribution of mapping points, the shapes of the citation functions can be studied in a perceivable form. As an example, the distribution of the mapping points for the 100 most prolific economists is studied by this technique.

The h-index, proposed by Hirsch¹ for evaluating the academic impact of individual researchers, has received wide attention in recent years. The citations received by all papers of a given researcher can be characterized by a citation distribution function, where the y-axis corresponds to the citations received by a paper and the x-axis represents the paper rank, arranged in descending order of citations (Fig. 1). The distribution of citations versus paper rank is called the citation distribution function or curve. The h-index is the largest rank h for which the paper at that rank has received at least h citations¹. The area under the citation distribution curve is divided by the h-index into two parts: those of the h-core² and the h-tail³. The former is further divided into two parts: excess citations⁴ and h-squared citations¹. As a consequence, the total citations are divided into three parts: h-squared, excess and h-tail citations (Fig. 1). The h-index carries no information about the excess and h-tail citations, keeping only the citations related to the h-index itself (h²). Theoretically, only when h² is dominant among the three parts can the h-index properly reflect the academic performance of the scientist under study; otherwise, the h-index leads to a biased evaluation. Whether h² dominates the citations depends on the shape of the citation distribution function.

Figure 1. The citation distribution curve. The y-axis corresponds to the citations received by a paper, whereas the x-axis represents the paper rank arranged in descending order of citations. The area under the citation distribution curve is divided by the h-index into three parts: h² (h-squared), e² (excess) and t² (h-tail).

As pointed out by Bornmann and co-workers⁵, for an isohindex group (scientists having the same h-index), the associated citation distribution functions may display quite different shapes. Therefore, to apply the h-index fairly, it is necessary to study the shape of the citation distribution functions, and the current study addresses this question by using a triangle mapping technique. One advantage of the citation triangle method is that different shapes of the citation distribution functions can be compared intuitively. By viewing the distribution of mapping points within the triangle, the shapes of the citation distribution functions can be studied in a perceivable manner. Based on this method, we are able to study the degree to which the h-index is properly applicable. It is hoped that the technique presented here will help in using the h-index to evaluate academic performance in a less biased way.

Results

We here propose a novel triangle mapping technique to study the relations among h-squared, excess and h-tail citations. For a regular triangle, the sum of the distances from any interior point to the three sides is a constant, equal to the height of the triangle. The sum of the percentages for h², e² and t² is also a constant, equal to 1. Based on this property, the percentages of these three kinds of citations are mapped onto a point in a regular triangle of height 1 (Fig. 2A). Refer to the Methods section for details.
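The constant-sum property used here (Viviani's theorem) can be checked with a short area argument. In the sketch below, s is the side length of the regular triangle ABC, d_h, d_e and d_t are the distances from an interior point P to the three sides, and h_△ is the height; these symbols are introduced only for this illustration. Joining P to the three vertices splits ABC into three smaller triangles, each with base s:

```latex
\begin{aligned}
\operatorname{Area}(ABC) &= \tfrac12 s\,d_h + \tfrac12 s\,d_e + \tfrac12 s\,d_t
                          = \tfrac12 s\,(d_h + d_e + d_t),\\
\operatorname{Area}(ABC) &= \tfrac12 s\,h_\triangle
\quad\Longrightarrow\quad d_h + d_e + d_t = h_\triangle .
\end{aligned}
```

With the height set to 1, the three distances sum to 1, which is what allows three percentages that sum to 1 to be read as distances to the three sides.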

Figure 2. The citation triangle method in studying h-index based citations. (A) A regular triangle ABC with height equal to 1 and center at O. A Cartesian coordinate system is set up with its origin at O. The three sides of the triangle are denoted by h, e and t, and the distances from an interior point P to them are equal to H, E and T, respectively. Therefore, P is the mapping point for the three real numbers H, E and T. (B) The regular triangle is divided into nine smaller regular triangles. The intervals of H, E and T for each of the 9 smaller triangles are shown in Table 1.

First of all, let us consider two concrete examples. According to the citation information provided by Dodson⁶, the mapping point corresponding to Dodson lies in region No. 4, where the h-index is applicable (Fig. 2B). The second example is the chemist Berni Alder⁴, whose mapping point lies in region No. 5, where the e-index is absolutely dominant (Fig. 2B). This example shows that Alder's h-index severely under-estimates his academic impact, and in this case the e-index should be used together with the h-index for a fair evaluation⁴.

In what follows, we apply the citation triangle method to the citations of the 100 most prolific economists⁷. The data used to derive the corresponding h-index, e-index and t-index were kindly provided by Dr. Tol. From these we calculated the coordinates x and y of the mapping point for each economist. The distribution of the 100 points is shown in Fig. 3A. Only two points lie in region No. 3, an h-index dominant region. Only 11 points (11%) lie above the horizontal line y = 0 (H = 1/3), where the h-index is properly applicable. Accordingly, for the remaining cases (89%), where y < 0 (H < 1/3), the h-index should be used jointly with the e-index, or even the t-index. The average h-index and e-index over the 100 points are 19 and 28.14, respectively, corresponding to average H and E of 0.26 and 0.48. Overall, for a fair and accurate evaluation, the h-index should be used together with the e-index, or even the t-index, for most of the 100 most prolific economists.

Figure 3. Distributions of mapping points in the citation triangle. (A) The distribution of the 100 mapping points for the 100 most prolific economists. Note that only 11 points (11%) lie in the regions where the h-index is applicable (H > 1/3), indicating that the h-index should be used jointly with the e-index, or even the t-index, for the remaining 89 economists. (B) An example demonstrating that the power parameter of the model citation function is one of the key factors determining the position of the mapping point. With the other model parameters fixed, the mapping point moves from region No. 3 (the h-index dominated region) to region No. 6 (the e-index dominated region) as the power parameter changes. Interestingly, the track of the mapping points forms a clockwise rotating curve.

The h-index captures only part of the information in the citation function. The above distribution of the 100 mapping points within the triangle, however, provides more information about the shapes of the corresponding citation functions. For example, mapping points within small triangle No. 5 indicate citation distribution functions that are peaked at the beginning. In contrast, mapping points within small triangle No. 9 indicate citation distribution functions that are flat with a long tail. In both cases, the h-index does not capture the main information of the citation function.

To complement the h-index, Bornmann and co-workers⁵ introduced three parameters, h² upper, h² center and h² lower, which correspond to E, H and T, respectively, in this paper. In other words, the triangle mapping technique provides an intuitive representation of h² upper, h² center and h² lower. Bornmann and co-workers⁵ studied the shapes of the citation distribution functions of three scientists, A, B and C, belonging to an isohindex group with h = 14. For scientist A, E = 0.82, H = 0.15 and T = 0.03. The mapping point lies in small triangle No. 5, an e-index absolutely dominated region. According to Bornmann et al.⁵ and Cole and Cole⁸, scientist A is a perfectionist-type scientist, who has rather few but very highly cited publications. For scientist B, E = 0.39, H = 0.48 and T = 0.13. The mapping point lies in small triangle No. 2, a boundary region between the h-index and e-index dominated regions. According to refs. 5 and 8, scientist B is a prolific-type scientist, who publishes a large number of high-impact papers. For scientist C, E = 0.10, H = 0.33 and T = 0.57. The mapping point lies in small triangle No. 8, a t-index dominated region. Scientist C is called a mass producer⁵,⁸, who publishes a large number of papers that are lowly cited.

As the above analysis shows, the locations of the mapping points carry information about the types of scientists. Therefore, the triangle mapping technique is particularly useful when the academic impact of a large number of scientists is studied. In that case, clustering analysis can be performed on the mapping point locations, and scientists can be classified according to their academic performance.

Recently, Baum introduced a new parameter, called the Excess-Tail Ratio⁹, the ratio of excess citations to h-tail citations (e²/t² in the present notation). Baum found that for most cases he studied the ratio is greater than 1; only in a few cases is it less than 1. The citation distribution functions are peaked when the ratio is greater than 1, whereas they are flat with a long tail when it is less than 1. Therefore, the Excess-Tail Ratio is an appropriate parameter to capture the overall shapes of the citation functions. According to eq. (12), a ratio greater than, equal to or less than 1 corresponds to E > T, E = T or E < T, that is, to a mapping point on one side of the vertical axis x = 0, on the axis, or on the other side, respectively.

Discussion

In what follows, we want to explore the key factors that determine the shape of the citation distribution function. As previously, we assume a simple mathematical model for the citation distribution curve Inline graphic

The total citations received by N papers, Inline graphic , is

Based on eq. (1), it was shown that⁴,¹⁰

However, we should have Inline graphic which leads to

Meanwhile, we have⁴

Using eqs. (1)–(5), we find

Therefore, the condition under which the h-index can be dominant should satisfy Inline graphic , or

To give an intuitive picture, we consider a numerical example. Fixing the other model parameters and letting the power parameter take 12 successive values, we calculate H and E for each case. Using eq. (12), we find 12 mapping points in the triangle, as shown in Fig. 3B. It is interesting to see that as the power parameter increases, the track of the mapping points forms a clockwise rotating curve. This example shows that the power parameter is one of the key factors determining the shape of the citation function. For given values of the other parameters there is a threshold of the power parameter; when the power parameter is less than this threshold, the h-index can no longer be properly applied.
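The effect of a power parameter on H, E and T can be explored with a minimal sketch. It assumes, purely for illustration, a discrete power-law curve c(r) = c1 / r^alpha for ranks r = 1..n_papers; this functional form, the names `c1`, `n_papers` and `alpha`, and the parameter values are assumptions of the sketch and may differ from the model of eq. (1).

```python
def fractions_for_power_law(c1, n_papers, alpha):
    """Return (H, E, T) for the ASSUMED curve c(r) = c1 / r**alpha, r = 1..n_papers."""
    # Integer citation counts; the list is already in descending order for alpha > 0.
    cites = [int(c1 / r ** alpha) for r in range(1, n_papers + 1)]
    # h-index: number of ranks r whose paper has at least r citations.
    h = sum(1 for r, c in enumerate(cites, start=1) if c >= r)
    h2 = h * h                   # h-squared citations
    e2 = sum(cites[:h]) - h2     # excess citations in the h-core
    t2 = sum(cites[h:])          # h-tail citations
    total = h2 + e2 + t2
    return h2 / total, e2 / total, t2 / total


# Hypothetical sweep: larger alpha gives a more peaked curve.
for alpha in (0.5, 1.0, 1.5, 2.0):
    H, E, T = fractions_for_power_law(c1=100, n_papers=100, alpha=alpha)
    print(f"alpha={alpha:.1f}  H={H:.2f}  E={E:.2f}  T={T:.2f}")
```

Under this assumed model, a flat curve (small alpha) puts most citations in the h-tail, while a steep curve (large alpha) concentrates them in the excess part, so the mapping point moves away from the t-index corner and towards the e-index corner.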

The main contribution of this paper is the citation triangle method, by which the shapes of citation distribution functions can be studied in a perceivable form. Based on the distribution of mapping points, the applicability and limitations of the h-index can be studied. Generally, the h-index is not properly applicable in the e-index or t-index dominated regions. In those cases, the h-index should be applied jointly with the e-index or t-index. The proposed mapping technique provides a platform to study the academic impact of a group of scientists, because mathematical methods such as clustering analysis can be used to study the distribution of mapping points, and the academic impact of these scientists can then be classified and compared.

Methods

The h-index was proposed by Hirsch in 2005¹. The set of the h most cited papers of a scientist is called the h-core², in which each of the h papers has received at least h citations. The e-index, proposed by Zhang⁴, is defined as the square root of the excess citations in the h-core over the h² citations used for calculating the h-index. Therefore, the total citations received by the papers in the h-core are equal to h² + e². The h-index divides the total citations of a scientist into two parts: the first belongs to the h-core, whereas the second belongs to the h-tail³. For convenience, we define the square root of the citations received by all papers in the h-tail as the t-index. Therefore, the total number of citations received by all papers of a scientist, C, is composed of three parts: h², e² and t², i.e.,

$$C = h^2 + e^2 + t^2,$$

where h, e and t are the h-, e- and t-index, respectively. Letting

$$H = \frac{h^2}{C}, \qquad E = \frac{e^2}{C}, \qquad T = \frac{t^2}{C},$$

we have

$$H + E + T = 1. \tag{11}$$
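The definitions above can be sketched in a few lines of Python. The function names and the sample citation list are hypothetical and serve only to illustrate how h, e², t² and the percentages H, E and T follow from a list of per-paper citation counts.

```python
def citation_parts(citations):
    """Return (h, e2, t2) for a list of per-paper citation counts."""
    ranked = sorted(citations, reverse=True)
    # h-index: number of ranks r whose paper has at least r citations.
    h = sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)
    e2 = sum(ranked[:h]) - h * h   # excess citations in the h-core
    t2 = sum(ranked[h:])           # citations in the h-tail
    return h, e2, t2


def citation_fractions(citations):
    """Return the percentages (H, E, T), which sum to 1."""
    h, e2, t2 = citation_parts(citations)
    total = h * h + e2 + t2        # equals the total number of citations
    if total == 0:
        raise ValueError("no citations to partition")
    return h * h / total, e2 / total, t2 / total


# Hypothetical researcher with six papers.
sample = [10, 8, 5, 4, 3, 1]
print(citation_parts(sample))      # (4, 11, 4): h = 4, e² = 11, t² = 4
print(citation_fractions(sample))
```

Here the four most cited papers form the h-core with 27 citations, of which 16 are h-squared citations and 11 are excess citations; the remaining two papers contribute 4 h-tail citations.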

For any regular triangle, the sum of the distances from any interior point to the three sides is equal to the height of the triangle. Consider a regular triangle ABC with its height equal to 1 (Fig. 2A). Let the center of the triangle be denoted by O, and let an x-y coordinate system be set up with its origin at O, as shown in Fig. 2A. Based on eq. (11) and this feature of the regular triangle, the set of three real numbers H, E and T is mapped onto a point P(x, y) within the triangle, as shown in Fig. 2A. Simple calculation (eq. 12) shows that y = H − 1/3 and |x| = |E − T|/√3, with the sign of x fixed by which of the sides e and t lies on the left in Fig. 2A.
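A minimal sketch of this mapping is given below. It assumes that side h is the horizontal base and that the vertex where E = 1 lies at the bottom left; the flag `e_vertex_left` (a hypothetical name) mirrors x for the opposite orientation, since the orientation depends on how Fig. 2A is drawn.

```python
import math


def map_to_triangle(H, E, T, e_vertex_left=True):
    """Map percentages (H, E, T) with H + E + T = 1 to a point (x, y).

    The triangle has height 1 and its center at the origin. Side h is
    taken as the base, so y = H - 1/3. The left/right placement of the
    e and t sides is an ASSUMPTION controlled by e_vertex_left.
    """
    if not math.isclose(H + E + T, 1.0, abs_tol=1e-9):
        raise ValueError("H, E and T must sum to 1")
    y = H - 1.0 / 3.0
    x = (T - E) / math.sqrt(3.0)   # large E moves the point to the left
    if not e_vertex_left:
        x = -x
    return x, y


# The three vertices and the center of the triangle.
print(map_to_triangle(1.0, 0.0, 0.0))   # top vertex: x = 0, y = 2/3
print(map_to_triangle(0.0, 1.0, 0.0))   # E vertex: x = -1/sqrt(3), y = -1/3
print(map_to_triangle(1 / 3, 1 / 3, 1 / 3))
```

The vertical coordinate depends only on H, so the horizontal line y = 0 separates points with H > 1/3 from those with H < 1/3 regardless of the orientation chosen for x.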

The triangle can be divided into 9 smaller regular triangles (regions), as shown in Fig. 2B, denoted No. 1 through No. 9. Each region is characterized by specific intervals of the three real numbers H, E and T. For example, in region No. 1, H > 2/3, E < 1/3 and T < 1/3, indicating that h² is absolutely dominant in this region as compared with e² and t². Similarly, in region No. 5, E > 2/3, H < 1/3 and T < 1/3, indicating that e² is absolutely dominant as compared with h² and t². In region No. 9, T > 2/3, H < 1/3 and E < 1/3, indicating that t² is absolutely dominant as compared with h² and e². Furthermore, in region No. 3, 1/3 < H < 2/3, E < 1/3 and T < 1/3, so it is called an h-index dominant region; in region No. 6, 1/3 < E < 2/3, H < 1/3 and T < 1/3, so it is called an e-index dominant region; and in region No. 8, 1/3 < T < 2/3, H < 1/3 and E < 1/3, so it is called a t-index dominant region. Finally, region No. 2 is the boundary region between the h-index and e-index dominant regions, region No. 4 is the boundary region between the h-index and t-index dominant regions, and region No. 7 is the boundary region between the e-index and t-index dominant regions. This description shares the symmetry of the regular triangle and is summarized in Table 1.

Table 1. Intervals of H, E and T for each of the 9 regions (small triangles) within the citation triangle.

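No.	H	E	T	Feature remark
1	> 2/3	< 1/3	< 1/3	The h-index absolutely dominated region
2	1/3 to 2/3	1/3 to 2/3	< 1/3	Boundary between h-index and e-index dominated regions
3	1/3 to 2/3	< 1/3	< 1/3	The h-index dominated region
4	1/3 to 2/3	< 1/3	1/3 to 2/3	Boundary between h-index and t-index dominated regions
5	< 1/3	> 2/3	< 1/3	The e-index absolutely dominated region
6	< 1/3	1/3 to 2/3	< 1/3	The e-index dominated region
7	< 1/3	1/3 to 2/3	1/3 to 2/3	Boundary between e-index and t-index dominated regions
8	< 1/3	< 1/3	1/3 to 2/3	The t-index dominated region
9	< 1/3	< 1/3	> 2/3	The t-index absolutely dominated region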

The three real numbers H, E and T are the percentages of citations associated with the h-, e- and t-index, respectively. In general, the h-index is properly applicable when H is greater than 1/3 (y > 0); otherwise, if H < 1/3 (y < 0), the h-index under-evaluates the academic impact of the researcher concerned. Therefore, the four regions No. 1, No. 2, No. 3 and No. 4 are the regions where the h-index can be properly applied (H > 1/3). Regions No. 2, No. 5, No. 6 and No. 7 are the regions where the e-index can be properly applied (E > 1/3), whereas regions No. 4, No. 7, No. 8 and No. 9 are those where the t-index can be properly applied (T > 1/3). In summary, the h-index can only be properly applied in regions No. 1 through No. 4 (H > 1/3 or y > 0), and it should be applied jointly with the e-index or t-index in the remaining regions No. 5 through No. 9 (H < 1/3 or y < 0).
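The region assignment can be sketched as a small lookup on the thresholds 1/3 and 2/3. The function name `region_number` is hypothetical, and the treatment of points lying exactly on a dividing line (assigned to the lower side, or `None` at the exact center) is a choice of this sketch rather than something the text specifies.

```python
def region_number(H, E, T):
    """Return the region No. (1-9) for percentages H, E, T summing to 1."""
    one_third, two_thirds = 1.0 / 3.0, 2.0 / 3.0
    # Corner regions: one part is absolutely dominant.
    if H > two_thirds:
        return 1
    if E > two_thirds:
        return 5
    if T > two_thirds:
        return 9
    # Remaining regions: which parts exceed 1/3 (order: H, E, T).
    above = (H > one_third, E > one_third, T > one_third)
    table = {
        (True, True, False): 2,    # boundary between h and e
        (True, False, False): 3,   # h-index dominated
        (True, False, True): 4,    # boundary between h and t
        (False, True, False): 6,   # e-index dominated
        (False, True, True): 7,    # boundary between e and t
        (False, False, True): 8,   # t-index dominated
    }
    # None only when no part exceeds 1/3, i.e. at the exact center.
    return table.get(above)


# Scientists A, B and C of Bornmann et al., as quoted in the Results.
print(region_number(H=0.15, E=0.82, T=0.03))   # 5
print(region_number(H=0.48, E=0.39, T=0.13))   # 2
print(region_number(H=0.33, E=0.10, T=0.57))   # 8
```

The three calls reproduce the regions reported in the Results for scientists A, B and C, and a return value of 1, 2, 3 or 4 corresponds to the condition H > 1/3 under which the h-index is properly applicable.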

Author Contributions

CTZ designed the study, performed most of the experiments, analyzed data and wrote the manuscript. All authors reviewed the manuscript.

Acknowledgments

I thank Dr. F. Gao and Dr. K. Song for help in preparing Figures 2–3. I am grateful to Dr. Richard Tol for kindly providing the data used to calculate the e-index and t-index for each of the 100 most prolific economists.

References

1. Hirsch J. E. An index to quantify an individual's scientific research output. P Natl Acad Sci USA 102, 16569–16572 (2005).
2. Rousseau R. New developments related to the Hirsch index. Science Focus 1, 23–25 (2006).
3. Ye F. Y. & Rousseau R. Probing the h-core: an investigation of the tail-core ratio for rank distributions. Scientometrics 84, 431–439 (2010).
4. Zhang C. T. The e-Index, Complementing the h-Index for Excess Citations. Plos One 4, e5429 (2009).
5. Bornmann L., Mutz R. & Daniel H. D. The h index research output measurement: Two approaches to enhance its accuracy. J Informetr 4, 407–414 (2010).
6. Dodson M. V. Citation analysis: Maintenance of h-index and use of e-index. Biochem Bioph Res Co 387, 625–626 (2009).
7. Tol R. S. J. The h-index and its alternatives: An application to the 100 most prolific economists. Scientometrics 80, 317–324 (2009).
8. Cole S. & Cole J. R. Scientific Output and Recognition - Study in Operation of Reward System in Science. Am Sociol Rev 32, 377–390 (1967).
9. Baum J. The Excess-Tail Ratio: Correcting Journal Impact Factors for Citation Quality. SSRN, http://ssm.com/abstract=2038102 (2012).
10. Egghe L. & Rousseau R. An informetric model for the Hirsch-index. Scientometrics 69, 121–129 (2006).
