Abstract
Objective:
To determine the methodologic quality of therapy articles about humans published in ISI surgical journals, and to explore the association between methodologic quality, origin, and subject matter.
Summary Background Data:
It is commonly assumed that ISI journals publish the articles of highest methodologic quality.
Methods:
This is a bibliometric study. All journals listed in the 2002 ISI under the subject heading of “Surgery” were considered. Journals were selected by simple random sampling (Annals of Surgery, The American Surgeon, Archives of Surgery, British Journal of Surgery, European Journal of Surgery, Journal of the American College of Surgeons, Surgery, and World Journal of Surgery). Articles on therapy in humans published in the selected journals were reviewed and analyzed. All clinical study designs were considered; editorials, review articles, letters to the editor, and experimental studies were excluded. The variables considered were place of origin, design, and methodologic quality, which was determined by applying a valid and reliable scale. The review was performed interchangeably and independently by 2 research teams. Descriptive and analytical statistics were used. Statistical significance was defined as P < 0.01.
Results:
A total of 653 articles were studied. Studies came predominantly from the United States and Europe (43.6% and 36.8%, respectively). The subject areas most frequently found were digestive and hepatobiliopancreatic surgery (29.1% and 24.5%, respectively). The mean and median methodologic quality scores of the entire series were 11.6 ± 4.9 points and 11 points, respectively. Methodologic quality was associated with the journal of publication and with geographic origin, but not with subject area.
Conclusions:
The methodologic quality of therapy articles published in the journals analyzed is low; however, statistically significant differences were found among journals. Methodologic quality was associated with origin, but not with subject matter.
The methodologic quality of therapy articles published in some analyzed ISI surgical journals is low; however, statistically significant differences were found among journals, with quality directly proportional to impact factor. The methodologic quality of an article was associated with its geographical origin, but not with its subject matter.
Bibliometric analysis of scientific publications has become an independent area of research and may, with some reservations, be used in the analysis of the output of scientifically active institutions and organizations. The method, however, is far from unproblematic, and its validity as a general benchmarking technique is frequently discussed; in particular, the question of journal impact factors often attracts interest from both authors and publishers.1 The most consulted journals are very commonly those with the highest impact factor (IF), because they are assumed to contain the articles of best methodologic quality.
How is the methodologic quality of a study evaluated? Since 1970, several systems have been used to determine the quality of scientific publications. One of the first was Sackett's proposal of “evidence level.” Like every scale, this one is made up of numeric or alphanumeric “values” whose meaning can be reduced to “good,” “fair,” “bad,” or “very bad”; even today it is a useful semantic reference for the evidence that supports the decision-making process.2,3 However, methodologic quality has to be understood as a multidimensional construct, and no gold standard yet exists to evaluate it. Some reporting recommendations, such as CONSORT (Consolidated Standards of Reporting Trials) for clinical trials4 and STARD (Standards for Reporting of Diagnostic Accuracy) for diagnostic test studies,5 have been published, but none has been designed or validated for evaluating the methodologic quality of articles.
How to approach this topic has been a subject of interest to our working group, which has developed a valid and reliable scale of methodologic quality. This scale has allowed us to carry out bibliometric studies and to appraise the evidence published in the area of therapeutic surgery through systematic reviews of the literature covering different types of design.6–10
The aim of this study is to determine the methodologic quality of articles of therapy on humans published in ISI surgical journals and to explore the association between methodologic quality, origin and subject matter.
MATERIALS AND METHODS
Design: Bibliometric Study
Journal Selection
A simple randomized sampling of the years 2000 to 2004 was conducted to choose 1 year for the analysis. According to this, all journals listed in the 2002 ISI Journal Citation Report under the subject heading of “Surgery” in English were included for analysis. A simple randomized sampling was conducted for selecting surgical journals for the analysis. Review journals such as Surgical Clinics of North America and Current Opinion in Surgery and research journals such as Journal of Surgical Research or Journal of Investigative Surgery were excluded. Accordingly, the journals analyzed were Annals of Surgery, Archives of Surgery, British Journal of Surgery, European Journal of Surgery, Journal of the American College of Surgeons, Surgery, The American Surgeon, and World Journal of Surgery.
Study Inclusion Criteria
Therapy articles published in the above-mentioned journals during 2002 relevant to the adult patient population (>18 years) were analyzed. The designs considered were series of cases, cross-sectional studies, historical and concurrent cohort studies, randomized clinical trials (RCTs), single or double blind, and multicenter RCTs. The number of articles listed in the 2002 ISI Journal Citation Report under the MeSH term “Surgery” [Subheading] was 19,690.11
Study Exclusion Criteria
Editorials, letters to the editor, review articles, clinical guides, systematic reviews, laboratory investigation articles, and studies conducted in pediatrics were excluded.
Studied Variables
The endpoint was the methodologic quality of published articles. The other variables considered were study design, geographic origin of articles, and subject area.
Analysis Methodology
All articles on therapy in humans published in the selected journals were reviewed and analyzed by applying our methodologic quality score. The review was performed interchangeably and independently by 2 review teams of 2 reviewers each (C.M., V.P., M.V., and H.L.). Discrepancies between reviewers and between review teams were resolved by consensus.
Methodologic Quality Score
A valid (face and content validity, and construct validity for extreme groups6,7) and reliable (interobserver reliability8) scale of methodologic quality was used. This scale is composed of 3 items: the first is related to the study design; the second to the population sample size in the study, adjusted according to the presence or absence of sample size justification; and the third to the methodology used (objectives, design, eligibility criteria and their justification). The sum of the 3 items gives a final score that can vary between 6 and 36 points, with 6 points corresponding to the worst methodologic quality and 36 points to the best. The cutoff point for the “methodologic quality” construct is 18 points (Table 1).7,8 Thus, 4 aspects of the methodologic quality construct are evaluated: design, methodology used, sample size, and report.
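As a minimal sketch of how the total score described above could be computed and checked against the cutoff, the following Python snippet sums the three item scores and validates the 6 to 36 range. The function and parameter names are hypothetical, the per-item point values belong to Table 1 and are not reproduced here, and treating a total of exactly 18 as reaching the cutoff is an assumption based on the "score of 18 or higher" grouping used in the Results.

```python
MIN_TOTAL = 6    # worst possible total score reported for the scale
MAX_TOTAL = 36   # best possible total score reported for the scale
CUTOFF = 18      # cutoff point for the methodologic quality construct


def total_quality_score(design_item: int, sample_size_item: int, methodology_item: int) -> int:
    """Sum the three item scores (hypothetical inputs; real values come from Table 1)."""
    total = design_item + sample_size_item + methodology_item
    if not MIN_TOTAL <= total <= MAX_TOTAL:
        raise ValueError(f"total score {total} is outside the {MIN_TOTAL}-{MAX_TOTAL} range")
    return total


def reaches_cutoff(total: int) -> bool:
    """Assumption: a total of 18 or more counts as reaching the cutoff."""
    return total >= CUTOFF


# Illustrative item values only; they are not taken from Table 1.
example_total = total_quality_score(4, 3, 4)
print(example_total, reaches_cutoff(example_total))
```

With these illustrative inputs the total is 11 points, which falls below the 18-point cutoff.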
TABLE 1. Methodologic Quality Score (Final Score 6–36)

Variables Codification
All the variables studied, with the exception of the endpoint or response variable (methodologic quality), are nominal, so they were categorized. Geographic origin of articles: grouping by continents in 6 subgroups (Africa, Asia, Latin America, North America, Europe, Oceania); subject area in 6 subgroups (digestive surgery; hepatobiliopancreatic surgery; endocrine surgery; abdominal wall, breast and soft tissue surgery; thoracic and cardiovascular surgery; and miscellanea). The variable design was categorized by level of evidence criteria.2,3
Sample Size
A sample size of 500 articles was estimated considering all surgical articles of therapy published in 2002 (n = 4833),11 with a 99% confidence level, an expected frequency of good methodologic quality of 70%, and a worst expected result of 65%.
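The article does not state which formula was used for this estimate. As a sketch under that caveat, the usual normal-approximation formula for estimating a proportion, with a finite population correction, can be applied to the reported inputs; the function and parameter names below are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(population: int, expected: float, worst: float, confidence: float) -> int:
    """Sample size for estimating a proportion, with finite population correction.

    Assumption: this is the standard normal-approximation formula, not
    necessarily the exact procedure the authors used.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    margin = abs(expected - worst)                        # absolute precision
    n_infinite = z ** 2 * expected * (1 - expected) / margin ** 2
    n_finite = n_infinite / (1 + (n_infinite - 1) / population)
    return math.ceil(n_finite)


required = sample_size_for_proportion(population=4833, expected=0.70, worst=0.65, confidence=0.99)
print(required)  # 500
```

Under these assumptions the calculation gives about 499.8, which rounds up to the 500 articles reported.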
Ethical Aspects
To assure confidentiality of analyzed articles, centers, and journals, data were coded.
Analysis Plan
Descriptive statistics were calculated (median, average, and standard deviation). Analytical statistics were used for group comparisons (Pearson χ2, Fisher exact test, ANOVA with Bonferroni correction, Kruskal-Wallis, and other nonparametric tests). Statistical significance was defined as P < 0.01.
RESULTS
A total of 653 articles were evaluated and the following designs were found: 469 series of cases (71.9%), 99 cohort studies (15.2%), and 81 clinical trials (12.4%) (Table 2).
TABLE 2. Clinical Designs Verified (n = 653)

The articles analyzed came mainly from North America (285, 43.6%), Europe (240, 36.8%), and Asia (106, 16.2%) (Table 3). The most frequent subject areas were digestive surgery (190, 29.1%) and hepatobiliopancreatic surgery (160, 24.5%) (Table 4).
TABLE 3. Geographic Origin of Published Articles and Observed Methodologic Quality Score Average (n = 653)

TABLE 4. Subject Area of Published Articles and Observed Methodologic Quality Score (n = 653)

The average and median methodologic quality scores in the series were 11.6 ± 4.9 points and 11 points, respectively, with a 95% confidence interval for the average of 11.2–12.0; 90.7% of the articles had a score of less than 18 points. The British Journal of Surgery, Annals of Surgery, and European Journal of Surgery accounted for the largest shares of the articles with a score of 18 or higher (31.1%, 21.3%, and 16.4%, respectively); Annals of Surgery and British Journal of Surgery accounted for the largest shares of the articles with a score of 25 or higher (34.8% and 30.4%, respectively).
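The reported confidence interval can be checked from the summary statistics alone. The sketch below assumes a normal approximation for the mean (the article does not say how the interval was computed), and the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def mean_confidence_interval(mean: float, sd: float, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a mean from summary statistics."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    half_width = z * sd / math.sqrt(n)                    # z times the standard error
    return mean - half_width, mean + half_width


low, high = mean_confidence_interval(mean=11.6, sd=4.9, n=653)
print(f"{low:.1f}-{high:.1f}")  # 11.2-12.0
```

With a standard error of about 0.19 points, the interval agrees with the reported 11.2–12.0.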
The number of articles published by each journal and the average score for each one are detailed in Table 5. Statistically significant differences can be observed in this table and in Figure 1 (Annals of Surgery and British Journal of Surgery have higher average and median scores).
TABLE 5. Surgical Journals and Observed Methodologic Quality Score Average (n = 653) (ANOVA and Bonferroni)


FIGURE 1. Comparison of medians and 25th and 75th percentiles of the methodologic quality of analyzed articles, by journal.
The association between the geographic origin of articles and the methodologic quality was verified. European publications had a better score than those originating in North America (12.7 ± 5.2 vs. 10.7 ± 4.4 points), with a P value of 0.002 (Table 3; Fig. 2). No association between subject area and methodologic quality was observed (Table 4; Fig. 3).

FIGURE 2. Comparison of medians and 25th and 75th percentiles of the methodologic quality of analyzed articles, by continent of origin.

FIGURE 3. Comparison of medians and 25th and 75th percentiles of the methodologic quality of analyzed articles, by subject area (1: digestive surgery; 2: hepatobiliopancreatic surgery; 3: endocrine surgery; 4: abdominal wall, breast and soft tissue surgery; 5: thoracic and cardiovascular surgery; 6: miscellanea).
Regarding the variables of item 3 (Table 1), the most frequent methodologic defects in the articles analyzed were: vaguely stated objectives; the use of designs with a low level of evidence, which for the most part are either not mentioned at all or are difficult to identify in the structure of the manuscript; the lack of eligibility criteria (in more than half of the articles it is the reader who must identify the inclusion criteria, and only rarely are the exclusion criteria mentioned); and the absence of justification of the sample size used, a defect that is even more evident in the RCTs.
DISCUSSION
To begin this discussion, the reader might ask: Why carry out a bibliometric study? The answer is that such a study allows us to understand what exactly is published in a journal. This study must be interpreted as an objective evaluation of methodologic quality, applied in this case to published articles on therapy, one of the areas (together with diagnosis and prognosis) of great interest in the practice of our specialty.
As was set out in the introduction, evaluating the methodologic quality of a study is no easy task. Methodologic quality must be understood as a multidimensional concept, in which it is possible to assess various facets that can be represented as a polygon whose vertices are the items that one can or wishes to evaluate in an article (Fig. 4). Until now, there has been no reference standard, and in this context the MINCIR group at the Department of Surgery at the Universidad de La Frontera has developed an instrument whose validity and usefulness have already been tested in other studies.6–10

FIGURE 4. Multidimensional concept of methodologic quality of an article. Three possibilities of presentation of a published article are simulated. In number 1, an article that complies with almost 100% of the items evaluated is schematized. In numbers 2 and 3, 2 types of articles, in which some items prevail over others, are represented. The variables (design, report, methodology, analysis, and sample size) are different items that have to be individually evaluated and then they compose the construct to which we are referring. Therefore, a lot of different possibilities for the presentation of a published article may exist. For example, an almost perfect article would have a good evaluation in all of the items, and so its figure would expand to fill almost the entire polygon; on the other hand, an article of poor methodologic quality would present an incompletely expanded polygon, since the evaluation of the items is not perfect, thus creating a different kind of geometric figure.
Of the 2057 articles published in 2002 in the journals in this study, 751 (36.5%) were about therapy. A relevant fact to consider is the high proportion of case reports and series of cases (71.9%), a large part of them retrospective in character, leaving very low proportions of concurrent cohort studies (2.6%) and RCTs (12.4%), which represent evidence levels 2 and 1 in therapy, respectively.
On the other hand, it was observed that more than 80% of the studies came from 2 continents, a fact that calls for some comment. The first point concerns the high volume of articles that these continents generate. The second is that this high number plays a double role: on the one hand, it increases production; on the other, it affects the average as well as the median of the methodologic quality of what is published. Although these same continents are the ones that publish the greatest number of high-quality studies, they are also the ones that publish the greatest number of case reports and series of cases, which often reach only 6 and 9 points, respectively, on the scale used. Following this same idea, but from the point of view of the journal in which these articles are published, the Annals of Surgery must be highlighted: it contributes only 65 articles, but its average and median are greater than those of all the other journals (14.6 ± 6.5 points), especially when compared with The American Surgeon, which contributes almost double the number of analyzed studies but with an average score about 5 points lower (9.2 ± 2.9 points). Nevertheless, it must remain perfectly clear that the assessment carried out in this study takes into consideration only one aspect of journal assessment (methodologic quality), and only therapy articles (leaving aside the areas of diagnosis, prognosis, and economic evaluations). Obviously, this study does not attempt to judge the quality of the journals involved, nor the interest or originality of the articles contained in them. Hence, this construct (ie, concept or mini-theory) could become more complex when the policies of each individual journal become involved.
Indeed, in a similar study regarding the adequacy of clinical research in a pediatric journal, which analyzed all the observational studies published within a period of 6 years (n = 300), it was found that 95% were retrospective in nature and that only 25% used a control group. Most studies obtained a lower score than that calculated as the cutoff point. This research was useful for carrying out an intervention that consisted of the implementation of guidelines for peer reviewers based on this journal's new publication standards, with the aim of improving some faulty aspects.12 Others turn to different kinds of “checklists” for the different study types. Checklists are used as reminders of the aspects that each reviewer must go through.13
Among other things, therefore, it could be asked: How much of what we publish can be considered useful and relevant to medical knowledge? A quick look at the 2 most used bibliographic medical indexes, those of the National Library of Medicine in Bethesda and the Excerpta Medica Foundation in Amsterdam, produces the staggering respective figures of 250,000 and 235,000 new articles annually. If one takes into account that only 40% of citations coincide in both indexes, that results in no less than 385,000 articles on medical topics every year. If we restrict ourselves exclusively to journals in the area of surgery (some 150) and count an average of 15 articles for each edition, we reach the conclusion that the figure becomes unwieldy.14 In 1963, de Solla Price formulated his famous theory that scientific production in every field of knowledge grows exponentially, doubling approximately every 15 years.15
One aspect that seemed to be of interest was the search for an association between methodologic quality and subject area, since it could be thought a priori that some disciplines within our specialty could have developed as distinct skill sets. This assumption, at least in this study, was not supported, since no differences were observed between the subgroups generated.
Practical suggestions to help journal readers as well as editors and peer reviewers evaluate the quality of published articles are: understanding that a great number of treated patients is not enough to ensure that a study has internal and external validity; paying attention to the reporting of the fundamental elements of a scientific manuscript (ie, description of a precise objective, a single primary result variable, study design, inclusion and exclusion criteria, and sample size calculation); recognizing designs of high levels of evidence (and their peculiarities) related to the purpose of the study; and being able to develop a manuscript in which a logical path between objective, methodology, results, and conclusion can be identified. Such suggestions would also be highly useful to authors embarking on studies and preparing manuscripts for publication.
Furthermore, these data suggest the need to implement measures, such as educational interventions and guidelines for authors and especially for reviewers (creating, for example, “checklists” for the different study types as reminders of the aspects that each reviewer must go through), based on a series of standard points for manuscripts, with the aim of improving some of the aspects that are lacking.12,13 With this background, we have to consider the quality-versus-quantity dilemma created by the rapidly growing body of medical literature and consider publishing fewer papers but of higher quality.
CONCLUSION
In general, the methodologic quality of the studies on therapy published in surgical journals in 2002 is low, a fact that can be explained by the large proportion of included studies that have a low level of evidence. Statistically significant differences were found in favor of the journals with a greater impact factor (Annals of Surgery and British Journal of Surgery), and methodologic quality was associated with the continental origin of the articles.
Footnotes
Supported in part by the DID-UFRO EP2102 Project.
Reprints: Carlos Manterola, MD, PhD, Department of Surgery, Universidad de La Frontera. Casilla 54-D, Temuco, Chile. E-mail: cmantero@ufro.cl.
REFERENCES
- 1. Saha S, Saint S, Christakis DA. Impact factor: a valid measure of journal quality? J Med Libr Assoc. 2003;91:42–46.
- 2. Sackett DL. Rules of evidence and clinical recommendations on use of antithrombotic agents. Chest. 1986;89(suppl 2):2–3.
- 3. Meakins JL. Innovation in surgery: the rules of evidence. Am J Surg. 2002;183:399–405.
- 4. Begg C, Cho M, Eastwood S, et al. Improving the quality of reports on randomized controlled trials: recommendations of the CONSORT Study Group. Rev Esp Salud Publica. 1998;72:5–11.
- 5. Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann Intern Med. 2003;138:W1–12.
- 6. Manterola C, Pineda V, Vial M, et al. Revisión sistemática de la literatura: propuesta metodológica para su realización. Rev Chil Cir. 2003;55–2:210–214.
- 7. Manterola C, Pineda V, Vial M, et al. Surgery for morbid obesity: selection of operation based on evidence from literature review. Obes Surg. 2005;15:106–113.
- 8. Manterola C, Pineda V, Vial M. Open versus laparoscopic resection in non-complicated colon cancer. A systematic review. Cir Esp. 2005;78:28–33.
- 9. Manterola C, Vial M, Pineda V. Is impact factor an appropriate index to determine the level of evidence of studies on therapeutic procedures in surgery journals. Cir Esp. 2005;78:96–99.
- 10. Manterola C, Busquets J, Pascual M, et al. What is the methodological quality of articles on therapeutic procedures published in Cirugia Espanola? Cir Esp. 2006;79:95–100.
- 11. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? Accessed March 11, 2004.
- 12. Rangel SJ, Kelsey J, Henry MC, et al. Critical analysis of clinical research reporting in pediatric surgery: justifying the need for a new standard. J Pediatr Surg. 2003;38:1739–1743.
- 13. http://www.espalda.org/cientifica/metodo_trabajo/evalua.asp. Accessed February 11, 2005.
- 14. Beasley SW. The value of medical publications: ‘to read them would burden the memory to no useful purpose.’ Aust NZ J Surg. 2000;70:870–874.
- 15. Solla-Price DJ. Little Science: Big Science. New York: Columbia University Press, 1963.
