Journal of the American Medical Informatics Association (JAMIA)
. 2019 Dec 30;27(3):449–456. doi: 10.1093/jamia/ocz209

Understanding the nature and scope of clinical research commentaries in PubMed

James R Rogers 1,#, Hollis Mills 1,#, Lisa V Grossman 1, Andrew Goldstein 2,#, Chunhua Weng 1,✉,#
PMCID: PMC7025356  PMID: 31889182

Abstract

Scientific commentaries are expected to play an important role in evidence appraisal, but it is unknown whether this expectation has been fulfilled. This study aims to better understand the role of scientific commentary in evidence appraisal. We queried PubMed for all clinical research articles with accompanying comments and extracted corresponding metadata. Five percent of clinical research studies (N = 130 629) received postpublication comments (N = 171 556), resulting in 178 882 comment–article pairings, with 90% published in the same journal. We obtained 5197 full-text comments for topic modeling and exploratory sentiment analysis. Topics were generally disease specific, with only a few topics relevant to study appraisal; these appraisal topics were highly prevalent in letters. Of a random sample of 518 full-text comments, 67% had a supportive tone. Based on our results, published commentary, with the exception of letters, most often highlights or endorses previous publications rather than serving as a prominent mechanism for critical appraisal.

Keywords: scientific commentary, scientific communication, publishing, topic modeling, PubMed

INTRODUCTION

In the scientific evidence base, studies can be susceptible to biases or flaws that jeopardize the validity of their results.1–4 Evidence appraisal, the critical evaluation of published studies, plays an important role in differentiating good science from bad science by uncovering problems in research and its communication, such as biased experimental setup, omitted disclosure of particular limitations, and potential scientific misconduct.5–11 Appraisal results enable stakeholders to better judge the reliability of the generated evidence and to select worthwhile findings for implementation in practice or for pursuit as further research avenues.

Evidence appraisals are communicated in a variety of formats.6 In-person meetings, particularly in the form of journal clubs, promote active face-to-face discussion.12–16 Online forums, such as social media and blogs, provide a means for more immediate reactions.17–19 One channel of particular interest is published commentary: communications, such as letters to the editor and editorials, written in reaction to published literature.20–24 PubMed, a search engine used to identify scientific literature primarily indexed in the Medical Literature Analysis and Retrieval System Online (MEDLINE), allows retrieval of comments on published articles.25

Analyzing published commentaries is important because it provides insight into which studies journal audiences consider noteworthy enough (for better or worse) to elicit a published reaction. Understanding the nature of commentaries can help optimize how this communication mechanism is used for improved appraisal of available evidence. Prior analyses have focused on author responses to criticisms of their work, topics covered in letters to the editor, and uses of comments to strengthen and improve a particular tool.22,26–28 This study aims to expand upon that prior work by providing a large-scale descriptive analysis of the patterns and content of scientific commentary on clinical research studies, to better understand its role in evidence appraisal.

MATERIALS AND METHODS

Summary of data collection and processing

To identify clinical research articles with comments, PubMed was queried on August 17, 2018, and potential PubMed IDs (PMIDs) were extracted.29 Details on the query used, extraction strategies, and preprocessing procedures are provided in Supplementary Material Excerpt 1. The search query used Medical Subject Headings (MeSH) to identify articles indexed with a clinical study methodology, such as “Clinical Trial” or “Observational Study.” The query also required that each article have at least 1 comment indexed, as specified through “hascommentin.” No time window limit was applied to the query. The PMIDs extracted from this query served as the clinical research articles for analysis, and their corresponding comments were retrieved by PMID. For both, the following metadata were obtained: PMIDs, publication types (PT), publication dates, journal names, MeSH terms, and, if available, PubMed Central IDs (PMCIDs) and Digital Object Identifiers (DOIs). Full-text extraction with existing procedures was performed using 3 unique sources: National Center for Biotechnology Information (NCBI) Entrez Programming Utilities, PMC Open Access (OA), and non-PMC journals with OA policies.30–33
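The exact query appears in Supplementary Material Excerpt 1; purely as an illustration, a retrieval of this kind could be sketched against the NCBI E-utilities esearch endpoint. The query term below is a simplified stand-in, not the study's actual query:

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_query_url(retmax=100000):
    """Build an E-utilities esearch URL for clinical research articles
    with at least 1 indexed comment.  The term is a simplified stand-in
    for the study's actual query (see Supplementary Material Excerpt 1)."""
    term = ('("Clinical Trial"[PT] OR "Observational Study"[PT]) '
            'AND hascommentin')
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "?" + urlencode(params)

url = build_pubmed_query_url()
```

Fetching this URL would return matching PMIDs, which could then be fed to efetch for metadata; no time restriction is encoded, mirroring the study's unrestricted window.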

To explore the content of the comments, Latent Dirichlet Allocation (LDA) topic modeling with Gibbs sampling was implemented.34,35 To define input terms, common term normalization strategies with a “bag of words” assumption were applied. To select the number of topics, 10-fold cross-validation with perplexity was used, with candidate values of 10, 20, 30, 40, 50, and 100 topics. Once the number of topics was selected, 2 authors (JRR and AG) separately examined the top 20 terms per topic and provided labels. Then, through discussion and consensus, the authors agreed on final labels. If 2 distinct themes were well represented within 1 topic, the topic could be labeled with 2 descriptions (such as “Orthopedics + Scientific Communication”). Repeat topic labels were permissible if topics were not particularly distinguishable. The R package “topicmodels” was utilized.36 Further details on the preprocessing of terms and topic selection are provided in Supplementary Material Excerpt 1.
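The study's preprocessing was performed in R; purely as an illustrative stand-in, the “bag of words” normalization step (lowercasing, punctuation removal, stopword filtering, and a crude suffix stemmer, the latter suggested by truncated terms such as “questionnair” in Table 2) could be sketched as:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; the study's actual list is not reproduced here.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "for", "with"}

def normalize(text):
    """Minimal bag-of-words normalization: lowercase, strip punctuation,
    drop stopwords and very short tokens, and apply a crude suffix stemmer
    (a stand-in for the Porter-style stemming implied by Table 2 terms)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = []
    for tok in tokens:
        if tok in STOPWORDS or len(tok) < 3:
            continue
        for suffix in ("ation", "ing", "es", "s", "e"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return Counter(stems)

bow = normalize("The questionnaires scored pain scores in the knee joints.")
```

The resulting term counts per document would form the document-term matrix passed to the LDA fit, with the number of topics chosen by cross-validated perplexity as described above.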

Data analysis

The analysis focused on 3 components: (1) descriptive statistics; (2) publication pattern analysis; and (3) content analysis. This study was deemed exempt by the Institutional Review Board. Descriptive statistics present pertinent characteristics of the published comments and articles. Publication pattern analysis focuses on characteristics of the extracted relations between comments and articles (ie, comment–article pairings). Content analysis focuses on topics derived from the available full-text comments, specifically the topics discovered and the co-occurrence of topics within comments. The content analysis was further enriched by a small-scale, manual sentiment analysis performed on a random 10% sample of available full-text comments. Two reviewers with training in qualitative methods (JRR and LVG) independently read the comments. Half the comments were reviewed by both reviewers, while the other half were divided evenly for single review. The 2 reviewers identified major criticisms or supporting remarks and then labeled each comment as “generally supportive,” “neutral,” or “generally critical.” “Generally supportive” was defined as a comment with an overall positive tone, such as highlighting the importance of an article or praising its overall conduct. “Neutral” was defined as a comment without a clear sentiment, such as a primarily descriptive reflection on a commented-upon article or a balanced communication of support and criticism without a definitive stance. Finally, “generally critical” was defined as a comment with a negative tone, such as suggesting an article has major flaws or expressing significant disagreement with study execution or interpretation. Kappa statistics were used to assess interrater reliability. The researchers met to resolve conflicts through discussion and adjudicated the results. All analyses were further conducted on 2 subsets of comments: editorials and letters. Subsets were identified using MeSH PT terms, with the exception of the manual sentiment analysis, because the reviewers could directly identify types. Unless specified elsewhere, all data analyses were performed using R 3.4.2.
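The interrater reliability measure used above, Cohen's kappa, corrects observed agreement for the agreement expected by chance from each rater's label frequencies. A minimal sketch with hypothetical labels (the study's own ratings are not reproduced here):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for 2 raters over the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each rater's marginal label rates."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings: S = supportive, N = neutral, C = critical
rater1 = ["S", "S", "N", "C", "S"]
rater2 = ["S", "C", "N", "C", "S"]
kappa = cohens_kappa(rater1, rater2)
```

In this toy example observed agreement is 0.8 and chance agreement 0.36, giving kappa = 0.6875; the study's reported values (78% agreement, kappa 0.54) would be computed the same way over the dual-reviewed comments.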

RESULTS

Descriptive statistics

As of August 17, 2018, only 4.65% of published clinical research articles had at least 1 comment. There were a total of 171 556 unique comments on 130 629 unique articles reporting completed clinical research studies, as shown in Table 1. Comments were published in 3526 unique journals, while articles were published in 3458 unique journals. The most frequent journal both for publishing comments on clinical studies and for receiving commentary was “The New England Journal of Medicine” (9237 comments; 3449 articles). Comments and articles shared consistently common major MeSH headings, which generally focused on either oncology or cardiovascular disease.

Table 1. Descriptive statistics of articles and comments collected (all numbers are presented as “count (%)”)

Characteristic | Overall(a): Comments (n = 171 556), Articles (n = 130 629) | Editorials: Comments (n = 46 644), Articles (n = 48 370) | Letters: Comments (n = 85 252), Articles (n = 62 919)
(Each data row below lists 6 “count (%)” values in that column order.)
Year of Publication
 Before 1990 807 (0.47) 1582 (1.21) 48 (0.10) 66 (0.14) 723 (0.85) 1431 (2.27)
 1990 to 1994 11 173 (6.51) 9861 (7.55) 2185 (4.68) 2405 (4.97) 8346 (9.79) 7440 (11.82)
 1995 to 1999 17 219 (10.04) 14 782 (11.32) 4524 (9.70) 5034 (10.41) 11 374 (13.34) 9652 (15.34)
 2000 to 2004 27 224 (15.87) 22 661 (17.35) 7871 (16.87) 8433 (17.43) 14 654 (17.19) 11 871 (18.87)
 2005 to 2009 37 686 (21.97) 30 714 (23.51) 11 245 (24.11) 12 084 (24.98) 17 204 (20.18) 13 998 (22.25)
 2010 to 2014 48 353 (28.18) 35 221 (26.96) 13 329 (28.58) 13 752 (28.43) 20 457 (24.00) 13 616 (21.64)
 2015 to August 2018 29 027 (16.92) 15 808 (12.10) 7442 (15.95) 6596 (13.64) 12 494 (14.66) 4911 (7.81)
 Missing 67 (0.04) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0)
Common Journalsb
 The New England Journal of Medicine 8471 (4.94) 3449 (2.64) 1981 (4.25) 2338 (4.83) 6242 (7.32) 2635 (3.09)
 Lancet (London) 6298 (3.67) 3306 (2.53) 55 (0.12) 96 (0.20) 4381 (5.14) 2279 (2.67)
 JAMA 3392 (1.98) 1881 (1.44) 1013 (2.17) 1123 (2.32) 2258 (2.65) 1213 (1.42)
 BMJ (Clinical research ed.) 3030 (1.77) 1787 (1.37) 841 (1.80) 893 (1.85) 1896 (2.22) 1021 (1.20)
 Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology 2502 (1.46) 2163 (1.66) 1080 (2.32) 1239 (2.56) 1355 (1.59) 1012 (1.19)
 Critical Care Medicine 2610 (1.52) 2014 (1.54) 1788 (3.83) 1787 (3.69) 769 (0.90) 520 (0.61)
 The Journal of Urology 2368 (1.38) 1312 (1.00) 863 (1.85) 806 (1.67) 498 (0.58) 432 (0.51)
 Journal of the American College of Cardiology 2410 (1.40) 1992 (1.52) 1509 (3.24) 1575 (3.26) 876 (1.03) 617 (0.72)
 Circulation 2144 (1.25) 1819 (1.39) 1055 (2.26) 1103 (2.28) 980 (1.15) 745 (0.87)
 Annals of Internal Medicine 1958 (1.14) 815 (0.62) 361 (0.77) 382 (0.79) 771 (0.90) 467 (0.55)
Common PT Termsc
 Comment 171 489 (99.96) 924 (0.71) 46 644 (100) 78 (0.16) 85 252 (100) 753 (1.20)
 Letter 85 237 (49.68) 1464 (1.12) 2 (<0.01) 77 (0.16) 85 252 (100) 1289 (2.05)
 Editorial 46 644 (27.19) 410 (0.31) 46 644 (100) 46 (0.10) 2 (<0.01) 336 (0.53)
 Journal article 38 234 (22.29) 128 646 (98.48) 4 (<0.01) 48 213 (99.68) 4 (<0.01) 61 204 (97.27)
 Comparative study 7252 (4.23) 64 964 (49.73) 1576 (3.38) 22 418 (46.35) 4626 (5.43) 33 142 (52.67)
 Review 3469 (2.02) 2856 (2.19) 1862 (3.99) 712 (1.47) 85 (0.10) 1612 (2.56)
 Randomized controlled trial 1052 (0.61) 44 119 (33.77) 143 (0.31) 16 847 (34.83) 703 (0.82) 22 246 (35.36)
 Clinical Trial 1397 (0.81) 31 582 (24.18) 193 (0.41) 11 208 (23.17) 1057 (1.24) 18 637 (29.62)
 Multicenter Study 278 (0.16) 31 282 (23.95) 59 (0.13) 14 403 (29.78) 128 (0.15) 13 312 (21.16)
 Evaluation Studies 222 (0.13) 9002 (6.89) 34 (0.07) 3093 (6.39) 119 (0.14) 4089 (6.50)
Common MeSH Headingsc
 Humans 152 748 (89.04) 124 425 (95.25) 44 278 (94.93) 46 464 (96.06) 81 070 (95.09) 60 717 (96.50)
 Female 56 236 (32.78) 98 816 (75.65) 14 966 (32.09) 38 294 (79.17) 28 900 (33.90) 47 215 (75.04)
 Male 48 847 (28.47) 94 622 (72.44) 13 889 (29.78) 37 732 (78.01) 23 352 (27.39) 43 673 (69.41)
 Treatment outcome 9730 (5.67) 31 215 (23.90) 3463 (7.42) 12 546 (25.94) 4724 (5.54) 13 291 (21.12)
 Animals 8586 (5.00) 7957 (6.09) 2865 (6.14) 2641 (5.46) 3031 (3.56) 2725 (4.33)
 Middle aged 4437 (2.59) 72 855 (55.77) 1008 (2.16) 29 897 (61.81) 2843 (3.33) 34 318 (54.54)
 Adult 5559 (3.24) 60 236 (46.11) 1258 (2.70) 21 346 (44.13) 3422 (4.01) 30 625 (48.67)
 Aged 4825 (2.81) 56 418 (43.19) 1290 (2.77) 23 874 (49.36) 2898 (3.40) 26 043 (41.39)
 Prospective studies 1455 (0.85) 24 878 (19.04) 264 (0.01) 10 381 (21.46) 963 (1.13) 12 279 (19.52)
 Adolescent 2572 (1.50) 20 860 (15.97) 699 (1.50) 7096 (14.67) 1396 (1.64) 10 269 (16.32)
 Missing 67 (0.04) 0 (0.00) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0)
Common Major MeSH Headingsc
 Antineoplastic Combined Chemotherapy Protocols/* therapeutic use 1878 (1.09) 2134 (1.63) 543 (1.16) 913 (1.89) 887 (1.04) 915 (1.45)
 Stents* 1151 (0.67) 1102 (0.84) 506 (1.08) 596 (1.23) 404 (0.47) 322 (0.51)
 Quality of Life* 1107 (0.65) 1208 (0.92) 370 (0.79) 528 (1.09) 480 (0.56) 460 (0.73)
 Anti-Bacterial Agents/* therapeutic use 1096 (0.64) 868 (0.66) 290 (0.62) 304 (0.63) 632 (0.74) 475 (0.75)
 Antineoplastic Agents/* therapeutic use 990 (0.58) 849 (0.65) 288 (0.62) 330 (0.68) 435 (0.51) 329 (0.52)
 Hypertension/* drug therapy 893 (0.52) 843 (0.65) 296 (0.63) 387 (0.80) 425 (0.50) 350 (0.56)
 Myocardial Infarction/* therapy 881 (0.51) 789 (0.60) 448 (0.96) 537 (1.11) 330 (0.39) 263 (0.42)
 Breast Neoplasms/* drug therapy 854 (0.50) 773 (0.59) 265 (0.57) 361 (0.75) 460 (0.54) 370 (0.59)
 Antibodies, Monoclonal/* therapeutic use 777 (0.45) 733 (0.56) 232 (0.50) 326 (0.67) 378 (0.44) 333 (0.53)
 Asthma/* drug therapy 712 (0.42) 685 (0.52) 226 (0.48) 291 (0.60) 431 (0.51) 371 (0.59)
 Missing 67 (0.04) 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
Commonly Listed Funding Sources
 Research Support, Non-US Gov’t 5971 (3.48) 61 824 (47.33) 2314 (4.96) 24 842 (51.36) 2050 (2.40) 27 263 (43.33)
 Research Support, NIH, Extramural 2564 (1.49) 12 292 (9.41) 1374 (2.95) 5544 (11.46) 376 (0.44) 3822 (6.07)
 Research Support, US Gov’t, PHS 798 (0.47) 8883 (6.80) 442 (0.95) 3658 (7.56) 97 (0.11) 4324 (6.87)
 Research Support, US Gov’t, Non-PHS 451 (0.26) 3530 (2.70) 194 (0.42) 1290 (2.67) 72 (0.08) 1412 (2.24)
 Research Support, NIH, Intramural 120 (0.07) 536 (0.41) 57 (0.12) 202 (0.42) 16 (0.02) 168 (0.27)

Abbreviations: MeSH, medical subject headings; NIH, National Institutes of Health; PT, publication type; PHS, Public Health Service.

a The sum of all comments is not expected to equal the sum of editorials and letters, because not all comments are indexed into 1 of those 2 categories.

b Journal names were used as presented from extraction; the list of common journals was selected based on comments first (overall).

c When selecting common PT terms, MeSH headings, and major MeSH headings, the top 5 for comments (overall) were selected first, followed by the top 5 for articles (overall) not already included in the prior selection. For PT terms, those related to research support (eg, “Research Support, Non-US Gov’t”) were listed as a separate category from the PT term list.

There were a total of 46 644 unique editorials commenting on 48 370 unique articles and 85 252 unique letters commenting on 62 919 unique articles. The sum of editorials and letters does not equal the sum of all comments, because not all comments can be indexed into 1 of those 2 categories. For example, an invited commentary written by a noneditor, even if written in the style of an editorial, would not be indexed as one because the author is not part of the editorial team. As a result, editorials and letters serve as 2 prominent categories of comments but do not necessarily represent the entirety of all comments. Editorials were published in 1654 unique journals, while the corresponding articles were published in 1653 unique journals. Letters were published in 2606 unique journals, while the corresponding articles were published in 2627 unique journals. Only 2 comments were labeled as both an editorial and a letter. Compared with the overall set, similar patterns persisted for both stratifications, with the exception of “The Lancet (London)” being relatively uncommon in the editorial group. Expanded presentations of published journals and major MeSH headings are available as word clouds in Supplementary Material Figures 1 and 2.

Publication pattern analysis

A total of 178 951 comment–article pairings were found, with 160 503 (90%) occurring within the same journal. The mean and median time from publication of an article to the publication of a comment was 6 months and 4 months, respectively. For editorials, there were 49 856 such pairings, with mean and median times being 2 months and 0 months, respectively. For letters, there were 71 496 such pairings, with mean and median times being 9 months and 7 months, respectively. Figure 1 presents the cumulative incidence of the length of time for all extracted pairings. Overall, the majority of commented articles received comments within 1 year for all stratifications. A visualization that includes all possible time lengths is provided in Supplementary Material 3.
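The cumulative incidence shown in Figure 1 is simply the fraction of comment–article pairings whose publication lag falls at or below each month on the horizon. A sketch with hypothetical lags for illustration (the study's actual pairing data are not reproduced here):

```python
def cumulative_incidence(lags_months, horizon):
    """Fraction of comment-article pairings whose publication lag
    (in whole months) is <= m, for each month m from 0 to `horizon`."""
    n = len(lags_months)
    return [sum(lag <= m for lag in lags_months) / n for m in range(horizon + 1)]

# Hypothetical lags (months from article publication to comment publication)
lags = [0, 0, 2, 4, 7, 9, 14, 30]
curve = cumulative_incidence(lags, 24)
```

Here `curve[0]` captures same-month comments (typical of commissioned editorials) and `curve[12]` the within-1-year majority that the text describes.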

Figure 1. Cumulative incidence of comment–article pairings (for time differences between 0 and 24 months).

Content analysis

Of the 5197 comments that were used for content analysis, 1504 were editorials and 1531 were letters; the remaining 2162 did not readily have an available or common MeSH index term to derive a comment type. Supplementary Material Table 1 provides a summary of descriptive statistics for the full-text comments. The subset had a different distribution of journals represented with the most common being “Critical Care (London)” (674; 13%).

After model development, a 40-topic model was found to be optimal. Table 2 provides labels for each topic as well as the 10 most common terms for that topic (see Supplementary Material Table 2 for the 20 most common terms and their corresponding probabilities). The majority of topics focused on disease content, but a few topics concerned the interpretation or application of study results, such as “General Scientific Evaluation” (Topic 15), “Diagnostic Evaluation” (Topic 17), and “Participant Characteristics” (Topic 34). Figure 2 presents the percentage of full-text comments that contain potential appraisal topics. Of particular note, “General Scientific Evaluation” was present in 15% of all comments, 8% of all editorials, and 26% of all letters. When examining co-occurrences, “General Scientific Evaluation” commonly co-occurred across all topics in letters, whereas other stratifications did not show such strong patterns (Supplementary Material Figure 4 for all comments, Supplementary Material Figure 5 for editorials, and Supplementary Material Figure 6 for letters).

Table 2. Topics with their 10 most common termsa

Topic 1 (Pain Assessment): pain, score, knee, item, scale, instrument, questionnair, joint, replac, osteoarthr
Topic 2 (Biochemistry): concentr, plasma, metabol, acid, exposur, water, metabolit, vitro, enzym, compound
Topic 3 (Diabetes): diabet, insulin, glucos, blood, devic, hypoglycemia, monitor, technolog, glycem, hbac
Topic 4 (Pulmonology): lung, ventil, respiratori, pressur, ard, pulmonari, airway, volum, acut, asthma
Topic 5 (Non-Specific Article Content): http, com, doi, med, tion, org, content, van, cant, signi
Topic 6 (Surgery and Anesthesia): surgeri, procedur, surgic, oper, hospit, postop, cardiac, arrest, techniqu, surgeon
Topic 7 (Genetics): gene, dna, sequenc, express, genom, protein, cell, speci, transcript, rna
Topic 8 (Psychiatric Disease (in Pediatrics)): children, depress, disord, symptom, cognit, pediatr, adult, age, mental, intervent
Topic 9 (Critical Care): mortal, sepsi, icu, day, ill, hospit, admiss, shock, septic, hour
Topic 10 (Cardiology): heart, cardiac, ventricular, left, myocardi, ablat, atrial, dysfunct, arrhythmia, fraction
Topic 11 (Ischemic Disease): stroke, acut, bleed, coronari, infarct, stent, ischem, anticoagul, myocardi, platelet
Topic 12 (Hemodynamics): fluid, blood, pressur, arteri, flow, oxygen, volum, shock, hemodynam, pulmonari
Topic 13 (Ophthalmology + Journal Correspondence): respond, visual, eye, agre, editor, figur, read, letter, retin, thank
Topic 14 (Metabolic Syndrome): weight, obes, metabol, acid, diet, fat, food, intak, loss, nutrit
Topic 15 (General Scientific Evaluation): bias, error, tabl, figur, calcul, correct, assumpt, meta, read, dataset
Topic 16 (Hepatology): liver, vitamin, hepat, hcc, hcv, cirrhosi, pancreat, fibrosi, serum, defici
Topic 17 (Diagnostic Evaluation): score, diagnosi, biomark, diagnost, marker, assay, prognost, diagnos, index, decis
Topic 18 (Oncology): cancer, screen, women, breast, prostat, men, hpv, psa, cervic, androgen
Topic 19 (Signaling Pathways): cell, receptor, protein, signal, express, pathway, regul, inhibit, mice, channel
Topic 20 (Orthopedics + Scientific Communication): bone, joint, surgeon, corr, orthopaed, hip, opinion, fractur, doi, con
Topic 21 (Oncology): cancer, tumor, surviv, chemotherapi, node, recurr, phase, inhibitor, tumour, breast
Topic 22 (Radiology): imag, coronari, mri, lesion, scan, tissu, techniqu, plaqu, volum, arteri
Topic 23 (Health Systems): health, cost, hospit, countri, servic, program, communiti, healthcar, polici, econom
Topic 24 (Autoimmune Disease): cell, immun, inflammatori, anti, cytokin, express, inflamm, antibodi, antigen, arthriti
Topic 25 (Nervous System): brain, neuron, task, memori, neural, network, behavior, connect, cortex, cognit
Topic 26 (Stem Cells + Pregnancy): cell, transplant, infant, pregnanc, women, stem, birth, growth, matern, fetal
Topic 27 (Article Meta Information): usa, intern, conflict, ofth, email, doi, gen, school, health, med
Topic 28 (HIV + Global Health): hiv, infect, resist, sexual, art, drug, malaria, transmiss, circumcis, africa
Topic 29 (Kidney Disease): renal, kidney, aki, ckd, serum, injuri, acut, creatinin, dialysi, gfr
Topic 30 (Sleep Medicine): sleep, osa, dynam, movement, frequenc, space, shape, circadian, clock, sperm
Topic 31 (Behavioral Intervention): intervent, physician, health, educ, train, particip, behavior, decis, program, communic
Topic 32 (Blood + Oxygenation): blood, transfus, oxid, iron, plasma, mitochondri, trauma, injuri, air, coagul
Topic 33 (Gastroenterology): alcohol, drink, gastric, bowel, symptom, ulcer, intestin, pylori, endoscop, opioid
Topic 34 (Participant Characteristics): age, women, older, smoke, sex, ethnic, adjust, white, particip, preval
Topic 35 (Bacteriology): infect, antibiot, resist, cultur, strain, bacteri, pathogen, bacteria, isol, pneumonia
Topic 36 (Genetic Variation): genet, mutat, gene, variant, genotyp, allel, phenotyp, polymorph, frequenc, suscept
Topic 37 (Pharmacology): dose, drug, placebo, agent, day, safeti, week, regimen, phase, arm
Topic 38 (Cardiovascular Disease): cardiovascular, hypertens, pressur, blood, statin, vascular, heart, arteri, blocker, beta
Topic 39 (Physical Exercise + Bone Health): exercis, muscl, bone, physic, fractur, forc, temperatur, skelet, loss, rehabilit
Topic 40 (Immunology): vaccin, virus, antibodi, influenza, membran, protein, vector, viral, energi, immun
a Note that the order of the words within each topic corresponds to higher-likelihood terms for that topic. For example, in “Topic 1: Pain Assessment,” the term “pain” is a more prominent and likely representative term for that topic than “score.” Supplementary Material Table 2 includes probabilities for each term.

Figure 2. Percentage of full-text comments that contain 1 of the potential appraisal topics.a

aOf the 40 topics found, these are the 5 topics that were considered to be a potential appraisal topic. In order for a comment to be included, the topic of interest must have a minimum proportion of at least 0.05 amongst the entire distribution of topics that a comment consists of. For example, if the distribution of a comment has Topic 1 at 0.67, Topic 2 at 0.32, and Topic 3 at 0.01, then only Topic 1 and Topic 2 would meet the 0.05 threshold.
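The 0.05 threshold described in this note can be sketched as a simple filter over a comment's topic distribution (function and variable names are illustrative, not from the study's code):

```python
def appraisal_topics_present(topic_distribution, appraisal_topics, threshold=0.05):
    """Return the appraisal topics whose proportion in a comment's
    LDA topic distribution meets the minimum threshold (0.05 in the study)."""
    return [t for t in appraisal_topics
            if topic_distribution.get(t, 0.0) >= threshold]

# The worked example from the figure note: Topic 1 at 0.67, Topic 2 at 0.32,
# Topic 3 at 0.01 -- only Topics 1 and 2 meet the 0.05 threshold.
dist = {1: 0.67, 2: 0.32, 3: 0.01}
present = appraisal_topics_present(dist, appraisal_topics=[1, 2, 3])
```

Counting, for each of the 5 appraisal topics, the comments where this filter returns it would reproduce the percentages plotted in Figure 2.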

There were 520 comments (10% of all full-text comments) selected for the sentiment analysis. Of them, 260 comments were reviewed by both reviewers, 130 were reviewed by 1 reviewer, and the remaining 130 were reviewed by the other reviewer. Because the manual review process was not limited to the assigned MeSH indices, the reviewers assigned 1 of the following labels to each comment: (1) editorial or other commissioned content; (2) letter to the editor; (3) author reply; or (4) notification of scientific misconduct. During the review process, 7 comments were removed from the set of 520 because they were improperly indexed: either the comment did not match the article it was commenting on or the comment was not actually a comment. Additionally, 5 comments that contained both a letter to the editor and an author’s reply were indexed as 1 comment; these were counted as 2 separate comments for analysis. Altogether, the final total of comments for the manual sentiment analysis was 518 instead of 520.

Of the 518 comments identified for sentiment analysis, 339 (65%) were editorials or other commissioned content, 119 (23%) were letters to the editor, 58 (11%) were author replies, and 2 (<1%) were notifications of scientific misconduct. Figure 3 provides a visual breakdown of the sentiment analysis (Supplementary Material Table 3 provides a tabular format). Overall, 346 (67%) comments were labeled “generally supportive,” 49 (9%) “neutral,” and 123 (24%) “generally critical.” When stratifying by comment type, the following were labeled “generally supportive”: 286 (84%) editorials or other commissioned content; 17 (14%) letters to the editor; 43 (74%) author replies; and 0 (0%) notifications of scientific misconduct. There was an observed agreement of 78% in labeling sentiment, with a kappa statistic of 0.54.

Figure 3. Distribution of sentiment for a subset of the full-text comments reviewed.a

aOnly categories with at least 50 comments are displayed. Specifically, the category of “Notification of Scientific Misconduct” is not displayed, because there were only 2 such comments in the sample (both were labeled as “Generally Critical”).

DISCUSSION

This descriptive study provides an original overview of published commentary on clinical research articles as it relates to evidence appraisal. In general, published commentary has low prevalence in the evidence base and tends to stay within the same journal rather than crossing journals. The journals containing the most comments tend to focus on general medicine rather than a specific disease, were established long ago, and have high impact factors (although citations from the comments can potentially influence this metric). These characteristics likely motivate individuals to contribute commentary and promote discussion within a journal’s audience, but also disincentivize engagement with other audiences.

Taken as a whole, comments demonstrate an overall supportive or promotional tone based on the sentiment identified. Only through stratification by comment type do differences in tone become evident. For example, editorials, which were often published at the same time as the commented-upon article, generally presented the article in the context of other evidence in a relatively supportive manner. In contrast, letters showed evidence of communicating concerns about the published study, such as a lack of clarity in variable definitions or doubts regarding statistical assumptions. This is particularly apparent in how heavily the topic of “General Scientific Evaluation” is represented in letters as compared with editorials. The finding that letters are the most prominent source of appraisal topics is consistent with the expectation that they provide a mechanism for additional research evaluation, as explored in prior work.22,27 Another factor shaping the overall tone is the publication process. Letters usually require review only from the editorial staff, which can ultimately filter out reactions.23 When a letter is published, it can be accompanied by a response from the study authors, which usually addresses the concerns raised in the letter. These responses are expected to present the commented-upon article in a positive manner, as the authors will likely be defending their work, consistent with what was identified in our sentiment analysis. Ultimately, these observations suggest that any analysis of scientific commentary needs to take comment type into careful consideration.

Related to the publication process, any future work should be aware of writing conventions within comments that can muddle their true sentiment. For example, as commonly observed in editorials, a comment can be published to highlight a study while also acknowledging the limitations specified by the study authors. Despite the text mentioning limitations, the primary purpose of such a comment is to emphasize the importance of the study, and it thus conveys a positive sentiment. This is particularly pertinent when considering an automated sentiment analysis strategy, because not all text in a comment is equally important. In light of this challenge, encouraging work on sentence-level analyses of citations can provide a solid foundation for future work.37–39

Regarding potential implications, perhaps the most striking finding is the low prevalence of commentary as a mechanism for evaluating published studies, as less than 5% of clinical research studies had at least 1 comment. This prevalence is further confounded by comment type, such as commissioned work that juxtaposes the study within the context of other work to discuss its relevance. With a constantly growing number of available clinical research studies, prioritizing or incentivizing the production of comments may serve as a worthwhile strategy so that studies are more thoroughly evaluated, complementary to initial peer review. Calls for stronger postpublication review have been expressed previously, highlighting that postpublication comments can foster idea synthesis, reduce potential research waste, and improve overall research quality.17,40 Although this study explores only 1 mechanism of the postpublication process, it provides original insights into the current underutilization of published commentary.

Within this study, we were limited to publication type identifiers and thus could not filter by more granular stratifications, such as whether a comment was invited. Moreover, the results of this analysis may not generalize to comments published outside of PubMed. Similarly, only comments that qualified for our query were extracted; additional comments could have been missed, such as reactions to comments that lack clinical research study-specific indexing. For the content analysis, only a subset of comments was obtained, limiting generalizability to all comments; the sentiment analysis is further affected by this limitation because it used only a sample of that subset. Journals that publish with open-access policies may serve different scientific and practitioner communities and may have different discourse norms. In terms of topic modeling, the models were run on the entirety of each comment’s text rather than on more granular components; applying topic models to paragraph-level text may have revealed more nuanced topics. Different techniques, such as using word embeddings with LDA or applying alternative topic modeling techniques, may also have revealed more nuanced topics.41–43 Finally, both topic labeling and sentiment analysis are susceptible to subjectivity, as different reviewers may provide different interpretations.

CONCLUSION

This study contributes a large-scale analysis of scientific commentary to better understand its role in evidence appraisal. Because the majority of comments appear in the same journals as their target articles, any potential appraisal is likely limited to the audience of that journal. Furthermore, our findings suggest published comments are more likely to adopt a supportive tone unless further stratifications are examined. Ultimately, this descriptive study demonstrates that different types of commentary have different natures, but opportunities exist to utilize this communication mechanism for targeted appraisal.

Copyright/license for publication

The Corresponding Author has the right to grant on behalf of all authors and does grant on behalf of all authors, a worldwide license to the Publishers and its licensees in perpetuity, in all forms, formats, and media (whether known now or created in the future), to i) publish, reproduce, distribute, display, and store the Contribution; ii) translate the Contribution into other languages, create adaptations, reprints, include within collections and create summaries, extracts, and/or abstracts of the Contribution; iii) create any other derivative work(s) based on the Contribution; iv) to exploit all subsidiary rights in the Contribution; v) the inclusion of electronic links from the Contribution to third party material wherever it may be located; and vi) license any third party to do any or all of the above.

FUNDING

This research was funded by National Library of Medicine grants R01LM009886-10 (PI: Weng) and 5T15LM007079-27 (PI: Hripcsak).

AUTHOR CONTRIBUTIONS

JRR contributed to the design of the study, data analysis, and manuscript drafting. HM contributed to the design of the study, data collection, data analysis, and manuscript revisions. LVG contributed to data analysis and manuscript revisions. AG contributed to idea generation, the design of the study, data analysis, and manuscript revisions. CW contributed to the idea generation, design of the study, and manuscript editing, and supervised the research. The corresponding author is the guarantor of the article and attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

APPROVAL

This study did not involve human subjects, making it exempt from Institutional Review Board (IRB) approval.

Supplementary Material

ocz209_Supplementary_Data

ACKNOWLEDGMENTS

We thank Chi Yuan for assistance with data extraction from PubMed.

Conflict of Interest statement

None declared.

DATA SHARING

All data and analysis methods are available upon request.

REFERENCES

  • 1. Ioannidis JPA, Greenland S, Hlatky MA, et al. Increasing value and reducing waste in research design, conduct, and analysis. Lancet 2014; 383 (9912): 166–75.
  • 2. Ioannidis J. Why most published research findings are false. PLOS Med 2005; 2 (8): e124.
  • 3. Rothwell PM. External validity of randomised controlled trials: to whom do the results of this trial apply? Lancet 2005; 365 (9453): 82–93.
  • 4. Lijmer JG, Mol BW, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999; 282 (11): 1061–6.
  • 5. Brænd AM, Straand J, Klovning A. Clinical drug trials in general practice: how well are external validity issues reported? BMC Fam Pract 2017; 18 (1): 113.
  • 6. Goldstein A, Venker E, Weng C. Evidence appraisal: a scoping review, conceptual framework, and research agenda. J Am Med Inform Assoc 2017; 24 (6): 1192–203.
  • 7. Casadevall A, Ellis LM, Davies EW, et al. A framework for improving the quality of research in the biological sciences. mBio 2016; 7 (4): e01256–16.
  • 8. George SL, Buyse M. Data fraud in clinical trials. Clin Investig 2015; 5 (2): 161–73.
  • 9. Savović J, Jones HE, Altman DG, et al. Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Ann Intern Med 2012; 157 (6): 429.
  • 10. Guyatt GH, Oxman AD, Vist G, et al. GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias). J Clin Epidemiol 2011; 64 (4): 407–15.
  • 11. Ferreira-González I, Permanyer-Miralda G, Domingo-Salvany A, et al. Problems with use of composite end points in cardiovascular trials: systematic review of randomised controlled trials. BMJ 2007; 334 (7597): 786.
  • 12. Campbell ST, Kang JR, Bishop JA. What makes journal club effective?—a survey of orthopaedic residents and faculty. J Surg Educ 2018; 75 (3): 722–9. doi: 10.1016/j.jsurg.2017.07.026
  • 13. Ahmadi N, McKenzie ME, MacLean A, et al. Teaching evidence based medicine to surgery residents—is journal club the best format? A systematic review of the literature. J Surg Educ 2012; 69 (1): 91–100.
  • 14. Honey CP, Baker JA. Exploring the impact of journal clubs: a systematic review. Nurse Educ Today 2011; 31 (8): 825–31.
  • 15. Wright J. Journal clubs—science as conversation. N Engl J Med 2004; 351 (1): 10–2.
  • 16. Mazuryk M, Daeninck P, Neumann CM, et al. Daily journal club: an education tool in palliative care. Palliat Med 2002; 16 (1): 57–61.
  • 17. Knoepfler P. Reviewing post-publication peer review. Trends Genet 2015; 31 (5): 221–3.
  • 18. Faulkes Z. The Vacuum Shouts Back: post publication peer review on social media. Neuron 2014; 82 (2): 258–60.
  • 19. Ghosh SS, Klein A, Avants B, et al. Learning from open source software projects to improve scientific review. Front Comput Neurosci 2012; 6: 18.
  • 20. Tierney E, O’Rourke C, Fenton JE. What is the role of ‘the letter to the editor’? Eur Arch Otorhinolaryngol 2015; 272 (9): 2089–93.
  • 21. Collier R. When postpublication peer review stings. CMAJ 2014; 186 (12): 904.
  • 22. Horton R. Postpublication criticism and the shaping of clinical knowledge. JAMA 2002; 287 (21): 2843–7.
  • 23. Winker MA, Fontanarosa PB. Letters: a forum for scientific discourse. JAMA 1999; 281 (16): 1543.
  • 24. Publication Characteristics (Publication Types) with Scope Notes. https://www.nlm.nih.gov/mesh/pubtypes.html Accessed January 18, 2019.
  • 25. MEDLINE, PubMed, and PMC (PubMed Central): How are they different? https://www.nlm.nih.gov/bsd/difference.html Accessed August 13, 2018.
  • 26. Jørgensen L, Paludan-Müller AS, Laursen DRT, et al. Evaluation of the Cochrane tool for assessing risk of bias in randomized clinical trials: overview of published comments and analysis of user practice in Cochrane and non-Cochrane reviews. Syst Rev 2016; 5 (1): 80.
  • 27. Kastner M, Menon A, Straus SE, et al. What do letters to the editor publish about randomized controlled trials? A cross-sectional study. BMC Res Notes 2013; 6 (1): 414.
  • 28. Gøtzsche PC, Delamothe T, Godlee F, et al. Adequacy of authors’ replies to criticism raised in electronic letters to the editor: cohort study. BMJ 2010; 341: c3926.
  • 29. Cock PJA, Antao T, Chang JT, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009; 25 (11): 1422–3.
  • 30. Sayers E. The E-utilities In-Depth: Parameters, Syntax and More. Bethesda (MD): National Center for Biotechnology Information (US); 2017. https://www.ncbi.nlm.nih.gov/books/NBK25499/ Accessed August 13, 2018.
  • 31. Chamberlain S, Pearse W. fulltext: Full Text of “Scholarly” Articles Across Many Data Sources; 2018. https://CRAN.R-project.org/package=fulltext Accessed August 15, 2018.
  • 32. Richardson L. beautifulsoup4; 2018. https://pypi.org/project/beautifulsoup4/ Accessed August 13, 2018.
  • 33. Shinyama Y. PDFMiner; 2014. http://www.unixuser.org/∼euske/python/pdfminer/index.html Accessed August 13, 2018.
  • 34. Blei DM. Probabilistic topic models. Commun ACM 2012; 55 (4): 77–84.
  • 35. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res 2003; 3: 993–1022.
  • 36. Hornik K, Grün B. topicmodels: an R package for fitting topic models. J Stat Softw 2011; 40: 1–30.
  • 37. Kilicoglu H. Biomedical text mining for research rigor and integrity: tasks, challenges, directions. Brief Bioinform 2018; 19 (6): 1400–14. doi: 10.1093/bib/bbx057
  • 38. Xu J, Zhang Y, Wu Y, et al. Citation sentiment analysis in clinical trial papers. AMIA Annu Symp Proc 2015; 2015: 1334–41.
  • 39. Yu B. Automated citation sentiment analysis: what can we learn from biomedical researchers. Proc Am Soc Inf Sci Technol 2013; 50 (1): 1–9.
  • 40. Bastian H. A stronger post-publication culture is needed for better science. PLoS Med 2014; 11 (12): e1001772.
  • 41. Moody CE. Mixing Dirichlet topic models and word embeddings to make lda2vec. arXiv:1605.02019. http://arxiv.org/abs/1605.02019 Accessed March 19, 2019.
  • 42. Lafferty JD, Blei DM. Correlated topic models. In: Weiss Y, Schölkopf B, Platt JC, eds. Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press; 2006: 147–54.
  • 43. Hofmann T. Unsupervised learning by probabilistic latent semantic analysis. Mach Learn 2001; 42 (1/2): 177–96.


Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
