Supplemental Digital Content is available in the text
Keywords: bibliometric, citation analysis, dashboard, medical subject heading, social network analysis
Abstract
Background:
Publications regarding the 100 top-cited articles in a given discipline are common, but studies reporting the association between article topics and their citations are lacking. Whether or not reviews and original articles have a higher impact factor than case reports is a point for verification in this study. In addition, article topics that can be used for predicting citations have not been analyzed. Thus, this study aims to (1) provide a visualization dashboard for the 100 top-cited articles related to article types and (2) inspect major medical subject headings (i.e., MeSH terms in PubMed) to help predict citations.
Methods:
We searched PubMed Central and downloaded 100 top-cited abstracts in the journal Medicine (Baltimore) since 2011. Four article types and 7 topic categories (denoted by MeSH terms) were extracted from abstracts. Contributors to these 100 top-cited articles were analyzed. Social network analysis and Sankey diagram analysis were performed to identify influential article types and topic categories. MeSH terms were applied to predict the number of article citations. We then examined the prediction power with the correlation coefficients between MeSH weights and article citations.
Results:
The citation counts for the 100 articles ranged from 24 to 135, with an average of 39.1 citations. The most frequent article types were journal articles (82%) and comparative studies (10%), and the most frequent topics were epidemiology (48%) and blood and immunology (36%). The most productive countries were the United States (24%) and China (23%). The most cited article (PMID = 27258521), with 135 citations, was written by Dr Shang from Shandong Provincial Hospital Affiliated to Shandong University (China) in 2016. MeSH terms showed significant prediction power for the number of article citations (correlation coefficient = 0.49, t = 5.62).
Conclusion:
The breakthrough was made by developing dashboards showing the overall concept of the 100 top-cited articles using the Sankey diagram. MeSH terms can be used for predicting article citations. Analyzing the 100 top-cited articles could help future academic pursuits and applications in other academic disciplines.
Social network analysis and Sankey diagram analysis were performed to display the articles related to article types and the topic categories.
This study shows readers that article citations can be predicted using MeSH terms and interpreted using the Sankey diagram.
Many articles related to the 100 top-cited publications used numerous tables and figures to report study findings. Here, several figures and 1 table were enough to present informative messages to readers because the Sankey diagram and hyperlinks were applied to condense the information and link to details on the Internet.
1. Introduction
Studies on scholarly journals focused on research domains (RDs) and research achievements (RAs).[1–3] The former can be clustered by using social network analysis (SNA) and medical subject headings (MeSH terms).[4–6] The latter was evaluated by metrics (e.g., impact factor [IF], h-index,[7] or x-index[8]).
A total of 213 articles contain the keyword “100 cited” within the title in the PubMed database.[9] The topics most often addressed include descriptive statistics (DS), RA across countries/institutes over the years, and RD (on article types or for individual authors).[1,2] For instance, several publications determined the most influential papers using citation analysis in oral lichen planus,[10] prenatal diagnosis,[11] oral leukoplakia research,[12] inflammatory bowel disease,[13] rheumatoid arthritis,[14] and infection in orthopedics.[15] Similarly, some addressed RAs and RDs for journals through bibliometric analyses.[16–19] However, all aforementioned articles applied the same method and provided information limited to DS, RA, and RD. We aim to go beyond this state by adding citation prediction related to article types.
Although bibliometric studies have helped us understand the core concepts in a field of interest and provided guidance for research, 2 aspects have frequently been ignored and criticized: (1) the inability to visualize results through a dashboard highlighting relevant entities[2,20,21] and (2) the lack of a model for predicting the number of article citations for future studies.[22–25]
Previous studies[26] investigated the IF of various types of publications and found that reviews and original articles have higher IFs than case reports. Bhandari and his colleagues[27] reported that rigorous systematic reviews received more than twice the mean number of citations compared with other systematic or narrative summaries. The value of case reports, if defined by IF, would be low because they are rarely cited by others.[28] Meanwhile, the discrepancies of IF across different types of publications need to be verified. Therefore, a practical method to predict citation counts for clinical articles[25] by using the bibliometric approach would be highly beneficial.
The objectives of this study were to analyze the 100 top-cited articles from a journal through a systematic search and apply a novel approach involving (1) a picture highlighting the most outstanding entities and (2) a model capable of predicting the number of citations in the future.
2. Methods
2.1. Data source
Two steps were carried out to organize the study data. First, we searched the PubMed database (Pubmed.com, Pubmed Central (PMC) for short) using the keyword (Medicine (Baltimore) [Journal]) and downloaded 7203 abstracts with 69,598 citations on May 1, 2020 (see the dataset in Supplemental Digital Content file 1).
Second, we selected the 100 top-cited articles since 2011 and categorized them with various filters. Several figures and tables were produced to illustrate (1) the major topics of Medicine (Baltimore), (2) the main contributors according to their origin countries/regions, and (3) the prediction power based on article topic categories with MeSH terms for the number of citations.
Ethical approval was not necessary for this study because all data were obtained from the database publicly available on PMC.
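The journal search in the first step could, for example, be issued programmatically. The sketch below only builds a query URL for the NCBI E-utilities `esearch` endpoint; the article does not state how the download was performed, so the use of E-utilities, the function name `build_search_url`, and the parameter choices are assumptions made for illustration.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumption: the authors may instead
# have used the PubMed website directly).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(journal="Medicine (Baltimore)", retmax=0):
    """Return an esearch URL restricted to one journal (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "term": f'"{journal}"[Journal]',  # journal field tag, as in the text
        "retmode": "json",
        "retmax": str(retmax),  # 0 asks for the hit count only
    }
    return f"{ESEARCH}?{urlencode(params)}"


url = build_search_url()
print(url)
```

Fetching the URL and paging through the results are left out on purpose, because they depend on network access and on NCBI usage limits.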
2.2. Data arrangement for DS and RA
The 100 top-cited articles were categorized into different types (e.g., case reports, clinical trials, comparative studies, and journal articles in PMC). Topic categories referring to the MeSH terms[29] in each article were classified by using SNA.[4–6]
A contingency table was made to show the main contributors from countries/regions. A choropleth map was produced to highlight the most influential countries/regions of origin for authors based on the mean number of citations.
Furthermore, mean citations per article were listed on the basis of article types and topic categories using the pyramid plot.
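The mean citations per article used for the contingency table, the choropleth map, and the pyramid plot amount to a grouped average. A minimal sketch, assuming each article is held as a dictionary; the field names and the sample values are made up for illustration and are not study data.

```python
from collections import defaultdict


def mean_citations_by(records, key):
    """Average the 'citations' field over records that share the same `key` value."""
    totals = defaultdict(lambda: [0, 0])  # group -> [citation sum, article count]
    for rec in records:
        bucket = totals[rec[key]]
        bucket[0] += rec["citations"]
        bucket[1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


# Made-up records; the real data set holds the 100 top-cited articles.
records = [
    {"country": "A", "article_type": "Journal Article", "citations": 40},
    {"country": "A", "article_type": "Comparative Study", "citations": 30},
    {"country": "B", "article_type": "Journal Article", "citations": 50},
]

by_country = mean_citations_by(records, "country")
by_type = mean_citations_by(records, "article_type")
print(by_country, by_type)
```

The same function serves countries/regions, article types, and topic categories, because only the grouping key changes.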
2.3. Data visualization for RD
SNA[4–6] was applied to associate the 100 top-cited articles with topic categories and article types on a visual board to achieve the first goal of this study, highlighting the most outstanding entities.
Pajek software[30] was applied to perform SNA. Relevant entities regarding article types provide insights into the overall concept when presented on a visual board, which was hard to display using the traditional word cloud technique.[31] A Sankey diagram was particularly used to interpret the association of the most dominant entities in the network.
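A Sankey diagram is drawn from a list of weighted links between entities. The sketch below shows one way such links could be derived, assuming that flows run from article type to topic category, that each flow is weighted by citations, and that weak flows are dropped with a threshold. The article does not give these details, so `sankey_links`, `min_weight`, and the sample values are hypothetical.

```python
from collections import Counter


def sankey_links(articles, min_weight=1):
    """Aggregate (article type -> topic) flows weighted by citations.

    Flows lighter than `min_weight` are dropped, mirroring the idea of
    keeping only the dominant relationships in the diagram.
    """
    flows = Counter()
    for art in articles:
        flows[(art["article_type"], art["topic"])] += art["citations"]
    links = [
        {"source": src, "target": dst, "value": value}
        for (src, dst), value in flows.items()
        if value >= min_weight
    ]
    return sorted(links, key=lambda link: link["value"], reverse=True)


# Made-up articles for illustration only.
articles = [
    {"article_type": "Journal Article", "topic": "Epidemiology", "citations": 40},
    {"article_type": "Journal Article", "topic": "Epidemiology", "citations": 30},
    {"article_type": "Journal Article", "topic": "Blood and immunology", "citations": 25},
    {"article_type": "Comparative Study", "topic": "Epidemiology", "citations": 10},
]

links = sankey_links(articles, min_weight=20)
print(links)
```

The resulting source/target/value triples are the usual input format of Sankey plotting tools; the plotting itself is omitted here.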
2.4. Prediction power on article citations
To achieve the second objective of this study, we computed a citation weight for the MeSH terms of each article according to the term proportions and the article citations using Eqs. (1) to (5) below:
[Equation (1)]
where AL denotes the number of MeSH terms in an article. The weighted count of a specific MeSH term is defined in Eq. (2):
[Equation (2)]
where n is equal to 100 in this study. Similarly, the weighted citation of a MeSH term is given in Eq. (3):
[Equation (3)]
where Cj is the citation count of article j. The mean citation for a MeSH term is given in Eq. (4), which is analogous to the IF of a journal:
[Equation (4)]
As such, the weighted MeSH score for an article can be yielded by Eq. (5):
[Equation (5)]
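Because the equation images are not reproduced in this text, the following is a sketch of one plausible reading of the weighting scheme, not the authors' confirmed formulas. It assumes that each MeSH term of an article carries the weight 1/AL, that a term's weighted count and weighted citation are sums of those weights (times Cj for citations) over the n articles, that the term mean is their ratio, and that an article's score is the weight-averaged mean of its terms. All names are hypothetical.

```python
from collections import defaultdict


def mesh_citation_weights(articles):
    """articles: list of (mesh_terms, citations) pairs with unique, non-empty terms.

    Returns (term_mean, scores), where scores[j] is the weighted MeSH
    score of article j under the assumptions stated in the lead-in.
    """
    weighted_count = defaultdict(float)     # assumed Eq. (2): sum of 1/AL per term
    weighted_citation = defaultdict(float)  # assumed Eq. (3): sum of Cj/AL per term
    for terms, citations in articles:
        w = 1.0 / len(terms)                # assumed Eq. (1): equal share per term
        for term in terms:
            weighted_count[term] += w
            weighted_citation[term] += w * citations
    # assumed Eq. (4): mean citation per term, analogous to a journal IF
    term_mean = {t: weighted_citation[t] / weighted_count[t] for t in weighted_count}
    # assumed Eq. (5): weight-averaged term means of each article
    scores = [sum(term_mean[t] for t in terms) / len(terms) for terms, _ in articles]
    return term_mean, scores


# Made-up example with 3 articles and 3 terms.
sample = [(["A", "B"], 40), (["A"], 100), (["B", "C"], 20)]
term_mean, scores = mesh_citation_weights(sample)
print(term_mean, scores)
```

Under this reading, a term that appears in highly cited articles pulls up the score of every article tagged with it, which is what lets the score act as a predictor of citations.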
2.5. Statistics
The correlation coefficient (CC) was used to determine the prediction power between the weighted MeSH terms and the original article citations. The t-value of the CC was calculated using the formula t = CC × sqrt((n − 2)/(1 − CC × CC)). A prediction equation was produced by running a simple regression analysis using MedCalc 9.5.0.0 for Windows (MedCalc Software, Belgium). The significance level was set at a Type I error of 0.05. The study process is presented in an MP4 file in Supplemental Digital Content file 2.
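The t-value formula for a correlation coefficient can be checked with a few lines of code. This is a minimal sketch; the function name `cc_t_value` is hypothetical, and the inputs are the rounded CC of 0.49 and n = 100 given in the text.

```python
import math


def cc_t_value(cc, n):
    """t statistic for a correlation coefficient: cc * sqrt((n - 2) / (1 - cc^2))."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if not -1.0 < cc < 1.0:
        raise ValueError("cc must lie strictly between -1 and 1")
    return cc * math.sqrt((n - 2) / (1.0 - cc * cc))


t = cc_t_value(0.49, 100)
print(round(t, 2))
```

With the rounded CC the result is approximately 5.56, slightly below the reported t = 5.62; a difference of this size would be expected if the reported t was computed from the unrounded coefficient.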
3. Results
3.1. All articles linked on the website
The results of the 100 top-cited articles published in Medicine (Baltimore) since 2011 were included in Reference,[32] where readers were invited to inspect the association among entities in detail.
The citation counts for the 100 articles ranged from 24 to 135 as of May 1, 2020, in PMC, with an average of 39.1 citations. The most productive countries of origin were the United States (24%) and China (23%). The highest number of citations per article was from France (52.6) (Table 1).
Table 1.
Distribution of publications across countries over years.

3.2. Visual representations using a choropleth map
Analysis based on the Chinese provinces and the US states showed that the highest number of citations per article was from Ohio (US) (101), followed by Shanghai (China) (71) and Guangdong (China) (62.2) (Fig. 1).
Figure 1.

Mean number of citations across countries/regions.
3.3. Visual representations using pyramid plots
The majority of articles were journal articles (82%) and comparative studies (10%). Most articles were published in the topic categories of epidemiology (48%), followed by blood and immunology (36%) (Fig. 2).
Figure 2.

Comparison of citations in article types and subcategories.
3.4. Visual representations using SNA and the Sankey diagram
The 3 factors of journal article, blood and immunology, and mortality were combined, and a visual representation with the highest frequency in total counts was provided via the triangle lines linked in Figure 3. All articles denoted by black bubbles were associated with their respective article types and topic categories. Readers are invited to examine the details in Figure 3 by clicking the link at Reference.[33] Once the black bubble of interest is clicked, the article abstract immediately appears on the PubMed website. Three entities (e.g., articles, topic categories, and article types) are jointly displayed with different colors in Figure 3.
Figure 3.

Articles related to article types and subcategories on a display board.
The most cited article (PMID = 27258521), with 135 citations, was written by Dr Shang from Shandong Provincial Hospital Affiliated to Shandong University (China) in 2016.[34]
To highlight the association among entities in a single picture, the Sankey diagram was drawn in Figure 4.[35] Only the most dominant entities with the closest relationships were displayed and connected by curved lines from the left side to the right side. Weaker citation links between entities were removed from the diagram. For instance, the year 2016 is connected only with the highly cited article with PMID = 27258521 (135 citations), which is in turn linked to the next entity, China. Bubbles were colored by cluster and sized by citation weight. As such, the top 3 entities (i.e., journal article, epidemiology, and blood and immunology), connected by a triangle at the middle-top side, are highlighted in the Sankey diagram. Readers are invited to scan the QR code in Figure 4 to examine the details about the entities.
Figure 4.

Association of entities using the Sankey diagram to display.
3.5. Citation prediction using MeSH terms
MeSH terms were evident in generating prediction power on the number of citations (CC = 0.49, t = 5.62).
All weighted MeSH terms calculated by Eqs. (1) to (5) were matched with article citations. We found that MeSH terms had significant prediction power for the number of citations (CC = 0.49, t = 5.62) (Fig. 5). The regression equation is y = 7.5473 + 0.8979x, where y is the number of article citations and x is the MeSH term weight. The slope coefficient was statistically significant (F = 31.65, P < .001).
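The reported regression can be applied directly to a MeSH weight. The sketch below uses the intercept and slope from the text; the function names and the example weight of 40 are hypothetical. It also uses the fact that, in a simple regression with one predictor, the F statistic equals the square of the slope's t statistic.

```python
import math

INTERCEPT = 7.5473  # reported intercept
SLOPE = 0.8979      # reported slope per unit of MeSH weight


def predict_citations(mesh_weight):
    """Predicted citation count from the reported simple regression."""
    return INTERCEPT + SLOPE * mesh_weight


def t_from_f(f_stat):
    """Slope t statistic implied by F in a one-predictor regression (F = t^2)."""
    return math.sqrt(f_stat)


predicted = predict_citations(40.0)  # hypothetical MeSH weight
implied_t = t_from_f(31.65)
print(round(predicted, 2), round(implied_t, 2))
```

The t-value implied by F = 31.65 is about 5.6, in line with the reported t = 5.62.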
Figure 5.

MeSH terms to predict article citations (P < .0001).
3.6. Cluster analysis of MeSH terms
All MeSH terms, with their clusters and with bubbles sized by the citation weight per term, are shown in Figure 6, representing the whole RD concept for the journal Medicine (Baltimore). The term “DiGeorge syndrome” is isolated because it is the single MeSH term of 1 article.[37] It has the largest bubble in Figure 6, representing 107 citations, followed by the 2 terms pathology and mortality, which are linked by red triangle lines in Figure 6.
Figure 6.

MeSH clusters with bubble sizes.
3.7. Creating dashboards on Google Maps
The references[33,35,36] provide links to Figures 3, 4, and 6. Readers are invited to see the detailed information on the dashboards laid on Google Maps.
4. Discussion
In this study, we downloaded articles from PMC and assessed the most cited papers published in Medicine (Baltimore) since 2011 to understand the research characteristics of citations in the past. This paper provides an insight into some of the most influential entities, articles (Fig. 3), and publication-changing chronicles over the years (Table 1).
The topic categories related to article topics were found using the SNA approach and the Sankey diagram. Nevertheless, the topic category of mortality earned the highest mean number of weighted scores denoted by MeSH terms (Fig. 2). The high mean IF in past studies in the field of mortality can be attributed to its important nature in clinical medicine. Similarly, clinical trial as an article type had a higher number of citations compared with others. Meanwhile, case reports had a low IF, which is consistent with the results of a previous study.[28]
4.1. Features of this study
We provided a guideline to analyze the 100 top-cited articles in 3 steps (i.e., DS, RA, and RD). This breakthrough was made using tables and figures (e.g., Table 1 for DS, Figs. 1 and 2 for RA, and Figs. 3–6 for RD). In particular, visualization dashboards were used to present the results, and a link to the website allowed readers to examine the detailed information on their own. For instance, any article with the black bubble in Figure 3 can be clicked on and redirected to PMC for its online abstract. The board equipped with multi-variables (i.e., entities in the Sankey diagram) is superior to the traditional one-variable word cloud technique.[32]
Furthermore, both SNA and MeSH terms[1,29] were sophisticatedly incorporated for interpreting the RA in a succinct representation, as we did in Figures 3 and 4. Visualizing the characteristics of data is important and interesting for readers.[20] Finally, the distinct difference from previous articles[10–15] was that only recent articles collected were included in this study, which improved the relevance and importance for readers.
4.2. Limitations and suggestions
Although the findings are promising based on the above analysis, several potential limitations may encourage further research efforts. First, the study data were downloaded only from PMC, unlike other studies[10–15] that used a combination of the Scientific Citation Index (SCI), Scopus, and Google Scholar. Several journals in PubMed have not been indexed by the SCI database. In addition, certain SCI-indexed journals have not been included in PubMed. These differences in coverage might contribute to a potential bias in the publications and citations obtained in this study.
Second, another bias might exist during citation extractions because the number of citations possibly increased as the date elapsed. The article citations might differ if the time available and the citation sources of the data are disparate.
Third, only articles with MeSH terms were included in this study. Others without MeSH terms were excluded from this study despite a potentially large number of citations present.
Fourth, although our cluster analysis using SNA is unique and useful, including all relevant entities on a board without confusing the readers is challenging. The results could be improved further for clarity and understandability.
Fifth, only a single journal was investigated for its RA, RD, and DS in the current study. Although studies of the 100 top-cited articles across several journals on a specific topic or discipline are frequently seen in the literature, we performed the visual representations and citation prediction only for the 100 top-cited articles in a single journal. The research methods, particularly the dashboards laid on Google Maps, can be applied in similar future studies.
Finally, article citations have many associated factors, such as the number of references and authors. MeSH terms are therefore not the only factor influencing article citations. Thus, other predictors of highly cited articles are worth studying in the future.
5. Conclusion
The study identified the 100 top-cited articles in the journal Medicine (Baltimore) and provided insights for readers. With the novel method of SNA and the Sankey diagram, the bibliometric analysis of 100 top-cited articles can be applied to other academic disciplines in the future.
Acknowledgments
We thank Enago (www.enago.tw) for the English language review of this manuscript.
Author contributions
YC developed the study concept and design. JCJ, YT, TW, and SC analyzed and interpreted the data. YF monitored the process of this study and helped in responding to the reviewers’ advice and comments. WC drafted the manuscript, and all authors provided critical revisions for important intellectual content. The study was supervised by YF. All authors read and approved the final manuscript.
Conceptualization: Yu-Chi Kuo, Tsair-Wei Chien.
Data curation: Jui-Chung John Lin.
Formal analysis: Yu-Chi Kuo, Jui-Chung John Lin.
Investigation: Yu-Tsen Yeh.
Methodology: Tsair-Wei Chien, Shu-Chun Kuo, Yu-Tsen Yeh.
Supervision: Yao Fong.
Validation: Shu-Chun Kuo, Yao Fong.
Visualization: Shu-Chun Kuo.
Supplementary Material
Footnotes
Abbreviations: CC = correlation coefficient, DS = descriptive statistics, IF = impact factor, MeSH = medical subject headings, PMC = Pubmed Central, RA = research achievement, RD = research domain, SCI = Scientific Citation Index, SNA = social network analysis.
How to cite this article: Kuo Y-C, Chien T-W, Kuo S-C, Yeh Y-T, Lin J-C, Fong Y. Predicting article citations using data of 100 top-cited publications in the journal Medicine since 2011: a bibliometric analysis. Medicine. 2020;99:44(e22885).
All data were downloaded from MEDLINE database at pubmed.com.
The authors have no funding and conflicts of interest to disclose.
The datasets generated during and/or analyzed during the current study are publicly available.
References
- [1]. Chien TW, Wang HY, Kan WC, et al. Whether article types of a scholarly journal are different in cited metrics using cluster analysis of MeSH terms to display: a bibliometric analysis. Medicine (Baltimore) 2019;98:e17631.
- [2]. Chien TW, Chang Y, Wang HY. Understanding the productive author who published papers in medicine using the national health insurance database: a systematic review and meta-analysis. Medicine (Baltimore) 2018;97:e9967.
- [3]. Hsieh WT, Chien TW, Kuo SC, et al. Whether productive authors using the national health insurance database also achieve higher individual research metrics: a bibliometric study. Medicine (Baltimore) 2020;99:e18631.
- [4]. DeFosset AR, Mosst JT, Gase LN, et al. The Los Angeles diabetes prevention coalition experience: practical applications of social network analysis to inform coalition building in chronic disease prevention. J Public Health Manag Pract 2020;26:270–9.
- [5]. Govoeyi B, Agbokounou AM, Camara Y, et al. Social network analysis of practice adoption facing outbreaks of African swine fever. Prev Vet Med 2020;179:105008. doi: 10.1016/j.prevetmed.2020.105008. Epub 2020 Apr 20. PMID: 32334132.
- [6]. Prado AM, Pearson AA, Bertelsen NS, et al. Connecting healthcare professionals in Central America through management and leadership development: a social network analysis. Global Health 2020;16:34.
- [7]. Hirsch JE. An index to quantify an individual's scientific research output. Proc Natl Acad Sci U S A 2005;102:16569–72.
- [8]. Fenner T, Harris M, Levene M, et al. A novel bibliometric index with a simple geometric interpretation. PLoS One 2018;13:e0200098.
- [9]. Pubmed Central. Publications related to 100 top-cited articles in Pubmed Central. May 15, 2020, available at https://pubmed.ncbi.nlm.nih.gov/?term=100%5Btitle%5D+and+cited%5Btitle%5D&sort=date.
- [10]. Liu W, Ma L, Song C, et al. Research trends and characteristics of oral lichen planus: a bibliometric study of the top-100 cited articles. Medicine (Baltimore) 2020;99:e18578.
- [11]. Zhang M, Zhou Y, Lu Y, et al. The 100 most-cited articles on prenatal diagnosis: a bibliometric analysis. Medicine (Baltimore) 2019;98:e17236.
- [12]. Liu W, Zhang Y, Wu L, et al. Characteristics and trends of oral leukoplakia research: a bibliometric study of the 100 most cited articles. Medicine (Baltimore) 2019;98:e16293.
- [13]. Chen X, Yang K, Xu Y, et al. Top-100 highest-cited original articles in inflammatory bowel disease: a bibliometric analysis. Medicine (Baltimore) 2019;98:e15718.
- [14]. Yin X, Cheng F, Wang X, et al. Top 100 cited articles on rheumatoid arthritis: a bibliometric analysis. Medicine (Baltimore) 2019;98:e14523.
- [15]. Jiang Y, Hu R, Zhu G. Top 100 cited articles on infection in orthopaedics: a bibliometric analysis. Medicine (Baltimore) 2019;98:e14067.
- [16]. Ahmad P, Elgamal HAM. Citation classics in the journal of endodontics and a comparative bibliometric analysis with the most downloaded articles in 2017 and. J Endod 2020;46:1042–51.
- [17]. Balica A, Kohut A, Tsai TJ, et al. A bibliometric analysis of citation classics in the journal of ultrasound in medicine. J Ultrasound Med 2020;39:1289–97.
- [18]. Fernández-Guerrero IM, Martín-Sánchez FJ, Burillo-Putze G, et al. Analysis of the citation of articles published in the European journal of emergency medicine since its foundation. Eur J Emerg Med 2019;26:65–70.
- [19]. Huh S. Citation analysis of the Korean journal of urology from web of science, scopus, Korean medical citation index, KoreaMed synapse, and Google scholar. Korean J Urol 2013;54:220–8.
- [20]. Shen L, Xiong B, Li W, et al. Visualizing collaboration characteristics and topic burst on international mobile health research: bibliometric analysis. JMIR Mhealth Uhealth 2018;6:e135.
- [21]. Kan WC, Chou W, Chien TW, et al. The most-cited authors who published papers in JMIR Mhealth and Uhealth using the authorship-weighted scheme: bibliometric analysis. JMIR Mhealth Uhealth 2020;8:e11567.
- [22]. Butun E, Kaya M. Predicting citation count of scientists as a link prediction problem. IEEE Trans Cybern 2020;50:4518–29.
- [23]. Park I, Yoon B. Technological opportunity discovery for technological convergence based on the prediction of technology knowledge flow in a citation network. J Informetr 2018;12:1199–222.
- [24]. Lin HF, Wu XF, Zhang YH. SCI citation analysis and impact factor prediction of JZUS-B in 2008. J Zhejiang Univ Sci B 2009;10:77–8.
- [25]. Lokker C, McKibbon KA, McKinlay RJ, et al. Prediction of citation counts for clinical articles at two years using data available within three weeks of publication: retrospective cohort study. BMJ 2008;336:655–7.
- [26]. Rodríguez-Lago L, Molina-Leyva A, Pereiro-Ferreirós M, et al. Influence of article type on the impact factor of dermatology journals. Actas Dermosifiliogr 2018;109:432–8.
- [27]. Bhandari M, Montori VM, Devereaux PJ, et al. Hedges team. Doubling the impact: publication of systematic review articles in orthopaedic journals. J Bone Joint Surg Am 2004;86:1012–6.
- [28]. Nielsen MB, Seitz K. Impact factors and prediction of popular topics in a journal. Ultraschall Med 2016;37:343–5.
- [29]. Eisinger D, Tsatsaronis G, Bundschus M, et al. Automated patent categorization and guided patent search using IPC as inspired by MeSH and PubMed. J Biomed Semantics 2013;4(Suppl 1):S3.
- [30]. de Nooy W, Mrvar A, Batagelj V. Exploratory Social Network Analysis With Pajek: Revised and Expanded. 2nd edn. New York, NY: Cambridge University Press; 2011.
- [31]. Atenstaedt R. Word cloud analysis of the BJGP: 5 years on. Br J Gen Pract 2017;67:231–2.
- [32]. Chien TW. The most cited article in journal Medicine since 2011. August 26, 2020, available at http://www.healthup.org.tw/html100/medicine100.htm.
- [33]. Chien TW. A dashboard for showing the association of article types and articles. May 15, 2020, available at http://www.healthup.org.tw/gps/medicine1002020bcc.htm
- [34]. Shang X, Li G, Liu H, et al. Comprehensive circular RNA profiling reveals that hsa_circ_0005075, a new circular RNA biomarker, is involved in hepatocellular carcinoma development. Medicine (Baltimore) 2016;95:e3811.
- [35]. Chien TW. The Sankey diagram on Google Maps. August 26, 2020, available at http://www.healthup.org.tw/gps/medicinesankey.htm
- [36]. Chien TW. MeSH terms with citation weights. August 26, 2020, available at http://www.healthup.org.tw/gps/medicine1002020bcckey.htm
- [37]. McDonald-McGinn DM, Sullivan KE. Chromosome 22q11.2 deletion syndrome (DiGeorge syndrome/velocardiofacial syndrome). Medicine (Baltimore) 2011;90:1–8.