Abstract
Scientific commentaries are expected to play an important role in evidence appraisal, but it is unknown whether this expectation has been fulfilled. This study aims to better understand the role of scientific commentary in evidence appraisal. We queried PubMed for all clinical research articles with accompanying comments and extracted corresponding metadata. Five percent of clinical research studies (N = 130 629) received postpublication comments (N = 171 556), resulting in 178 882 comment–article pairings, with 90% published in the same journal. We obtained 5197 full-text comments for topic modeling and exploratory sentiment analysis. Topics were generally disease specific; only a few were relevant to the appraisal of studies, and those were highly prevalent in letters. Of a random sample of 518 full-text comments, 67% had a supportive tone. Based on our results, published commentary, with the exception of letters, most often highlights or endorses previous publications rather than serving as a prominent mechanism for critical appraisal.
Keywords: scientific commentary, scientific communication, publishing, topic modeling, PubMed
INTRODUCTION
In the scientific evidence base, studies can be susceptible to biases or flaws jeopardizing the validity of their results.1–4 Evidence appraisal, the critical evaluation of published studies, plays an important role in differentiating good science from bad science by uncovering problems in research and its communication, such as biased experimental setup, omitted disclosure of particular limitations, and potential scientific misconduct.5–11 The results of evidence appraisal can enable stakeholders to better judge the reliability of the generated evidence and to select worthwhile findings for implementation in practice or pursuit as further research avenues.
Evidence appraisals are communicated in a variety of formats.6 In-person meetings, particularly in the form of journal clubs, promote active face-to-face discussion.12–16 Online forums, such as social media and blogs, provide a means for more immediate reactions.17–19 One channel of particular interest is the published commentary, a communication written in reaction to published literature. Examples include letters to the editor and editorials.20–24 PubMed, a search engine used to identify scientific literature primarily indexed in the Medical Literature Analysis and Retrieval System Online (MEDLINE), allows retrieval of comments to published articles.25
Analyzing published commentaries is important because it provides insights as to which studies are considered noteworthy enough (for better or worse) by audiences of scientific journals to elicit a published reaction. Understanding the nature of commentaries can help optimize how this communication mechanism is used for improved appraisal of available evidence. Examples of prior analyses focused on author responses to criticisms of their work, topics covered in letters to the editor, and uses of comments to strengthen and improve a particular tool.22,26–28 This study aims to expand upon the prior work and provide a large-scale descriptive analysis of the patterns and content of scientific commentary on clinical research studies to better understand its role in evidence appraisal.
MATERIALS AND METHODS
Summary of data collection and processing
To identify clinical research articles with comments, PubMed was queried on August 17, 2018, and potential PubMed IDs (PMIDs) were extracted.29 Details on the query used, extraction strategies, and preprocessing procedures are provided in Supplementary Material Excerpt 1. The search query involved the use of Medical Subject Headings (MeSH) to identify articles that were indexed with a clinical study methodology, such as “Clinical Trial” or “Observational Study.” The query also required that each article have at least 1 comment indexed, as specified through “hascommentin.” There was no time window limit applied to the query. The extracted PMIDs from this query served as the clinical research articles for analysis, and their corresponding comments were retrieved as PMIDs. For both, the following metadata were obtained: PMIDs, publication types (PT), publication dates, journal names, MeSH terms, and, if available, PubMed Central IDs (PMCIDs) and Digital Object Identifiers (DOIs). Full-text extraction with existing procedures was performed using 3 unique sources: National Center for Biotechnology Information (NCBI) Entrez Programming Utilities, PMC Open Access (OA), and non-PMC journals with OA policies.30–33
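Because the exact query is given only in Supplementary Material Excerpt 1, the following is an illustrative sketch of how such an E-utilities search request might be assembled; the specific search term below is our own guess combining clinical-study publication types with the “hascommentin” filter, not the authors’ actual query.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(retmax: int = 100000) -> str:
    """Assemble an esearch URL for clinical studies with indexed comments.

    The term is a hypothetical reconstruction for illustration only.
    """
    term = (
        '("Clinical Trial"[PT] OR "Observational Study"[PT]) '
        "AND hascommentin"
    )
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"
```

Fetching this URL would return a JSON list of PMIDs, whose metadata could then be retrieved via the efetch endpoint.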
To explore the content of the comments, Latent Dirichlet Allocation (LDA) topic modeling with Gibbs sampling was implemented.34,35 To define input terms, common term normalization strategies with a “bag of words” assumption were applied. To select the number of topics, 10-fold cross-validation and perplexity were utilized. The numbers of possible topics examined were 10, 20, 30, 40, 50, and 100. Once the number of topics was selected, 2 authors (JRR and AG) separately examined the top 20 terms of each topic and provided labels. Then, through discussion and consensus, the authors agreed on final labels. If 2 distinct themes were well represented within 1 topic, the topic could be labeled with 2 descriptions (such as “Orthopedics + Scientific Communication”). Repeat topic labels were permissible if topics were not particularly distinguishable. The R package “topicmodels” was utilized.36 Further details on preprocessing of terms and topic selection are also provided in Supplementary Material Excerpt 1.
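The study used the R package “topicmodels” for Gibbs-sampled LDA; as a language-agnostic illustration of what that sampler does, the toy collapsed Gibbs sampler below is a minimal sketch (not the authors’ implementation, and far below production scale).

```python
import random

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of integer word ids.
    Returns the sampled topic assignment z for every token.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []
    # Random initialization of topic assignments
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)
    # Gibbs sweeps: resample each token's topic from its full conditional
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta)
                    / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z
```

In the study, model selection ran this kind of sampler at each candidate topic count (10 through 100) and chose the count minimizing held-out perplexity under 10-fold cross-validation.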
Data analysis
The analysis focused on 3 components: (1) descriptive statistics; (2) publication pattern analysis; and (3) content analysis. This study was deemed exempt by the Institutional Review Board. Descriptive statistics present pertinent characteristics of the published comments and articles. Publication pattern analysis focuses on characteristics regarding extracted relations between comments and articles (ie, comment–article pairings). Content analysis focuses on topics derived from the available full-text comments, specifically topics discovered and the co-occurrence of topics within comments. The content analysis was further enriched by a small-scale, manual sentiment analysis. The sentiment analysis was performed on a random 10% sample of available full-text comments. Two reviewers with training in qualitative methods (JRR and LVG) independently read the comments. Half the comments were reviewed by both reviewers, while the other half were divided evenly for single review. The 2 reviewers identified major criticisms or supporting remarks and then labeled them as “generally supportive,” “neutral,” or “generally critical.” “Generally supportive” was defined as a comment that had an overall positive tone, such as highlighting the importance of an article or praising the overall conduct. “Neutral” was defined as a comment without a clear sentiment, such as a primarily descriptive reflection of a commented-upon article or a balanced communication of support and criticism without a definitive stance. Finally, “generally critical” was defined as a comment with a negative tone, such as suggesting an article has major flaws or expressing significant disagreements with study execution or interpretation. Kappa statistics were used to assess interrater reliability. The researchers met to resolve conflicts through discussion and adjudicated the results. All analyses were further conducted on 2 subsets of comments: editorials and letters.
Subsets were identified using MeSH PT terms, with the exception of the manual sentiment analysis, because the reviewers could identify comment types directly. Unless specified elsewhere, all data analyses were performed using R 3.4.2.
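The interrater reliability statistic used above (Cohen's kappa for two raters) can be sketched as follows; this is a generic textbook formula, not code from the study, and the label set shown in the test is merely illustrative.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from the raters' marginals
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum((ca[l] / n) * (cb[l] / n) for l in set(ca) | set(cb))
    if expected == 1:  # degenerate case: both raters always use one label
        return 1.0
    return (observed - expected) / (1 - expected)
```

Applied to the study's data, this statistic yielded 0.54 against an observed agreement of 78%, illustrating how kappa discounts agreement expected by chance.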
RESULTS
Descriptive statistics
As of August 17, 2018, only 4.65% of published clinical research articles had at least 1 comment. In total, there were 171 556 unique comments on 130 629 unique articles reporting completed clinical research studies, as shown in Table 1. Comments were published in 3526 unique journals, while articles were published in 3458 unique journals. The most frequent journal for both publishing comments on clinical studies and receiving commentaries was “The New England Journal of Medicine” (9237 comments; 3449 articles). Both comments and articles shared consistently common major MeSH headings, which generally focused on either oncology or cardiovascular disease.
Table 1.
Descriptive statistics of articles and comments collected (all numbers are presented as “count (%)”)
| Characteristic | Overalla: Comments (n = 171 556) | Overalla: Articles (n = 130 629) | Editorials: Comments (n = 46 644) | Editorials: Articles (n = 48 370) | Letters: Comments (n = 85 252) | Letters: Articles (n = 62 919) |
|---|---|---|---|---|---|---|
Year of Publication | ||||||
Before 1990 | 807 (0.47) | 1582 (1.21) | 48 (0.10) | 66 (0.14) | 723 (0.85) | 1431 (2.27) |
1990 to 1994 | 11 173 (6.51) | 9861 (7.55) | 2185 (4.68) | 2405 (4.97) | 8346 (9.79) | 7440 (11.82) |
1995 to 1999 | 17 219 (10.04) | 14 782 (11.32) | 4524 (9.70) | 5034 (10.41) | 11 374 (13.34) | 9652 (15.34) |
2000 to 2004 | 27 224 (15.87) | 22 661 (17.35) | 7871 (16.87) | 8433 (17.43) | 14 654 (17.19) | 11 871 (18.87) |
2005 to 2009 | 37 686 (21.97) | 30 714 (23.51) | 11 245 (24.11) | 12 084 (24.98) | 17 204 (20.18) | 13 998 (22.25) |
2010 to 2014 | 48 353 (28.18) | 35 221 (26.96) | 13 329 (28.58) | 13 752 (28.43) | 20 457 (24.00) | 13 616 (21.64) |
2015 to August 2018 | 29 027 (16.92) | 15 808 (12.10) | 7442 (15.95) | 6596 (13.64) | 12 494 (14.66) | 4911 (7.81) |
Missing | 67 (0.04) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
Common Journalsb | ||||||
The New England Journal of Medicine | 8471 (4.94) | 3449 (2.64) | 1981 (4.25) | 2338 (4.83) | 6242 (7.32) | 2635 (3.09) |
Lancet (London) | 6298 (3.67) | 3306 (2.53) | 55 (0.12) | 96 (0.20) | 4381 (5.14) | 2279 (2.67) |
JAMA | 3392 (1.98) | 1881 (1.44) | 1013 (2.17) | 1123 (2.32) | 2258 (2.65) | 1213 (1.42) |
BMJ (Clinical research ed.) | 3030 (1.77) | 1787 (1.37) | 841 (1.80) | 893 (1.85) | 1896 (2.22) | 1021 (1.20) |
Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology | 2502 (1.46) | 2163 (1.66) | 1080 (2.32) | 1239 (2.56) | 1355 (1.59) | 1012 (1.19) |
Critical Care Medicine | 2610 (1.52) | 2014 (1.54) | 1788 (3.83) | 1787 (3.69) | 769 (0.90) | 520 (0.61) |
The Journal of Urology | 2368 (1.38) | 1312 (1.00) | 863 (1.85) | 806 (1.67) | 498 (0.58) | 432 (0.51) |
Journal of the American College of Cardiology | 2410 (1.40) | 1992 (1.52) | 1509 (3.24) | 1575 (3.26) | 876 (1.03) | 617 (0.72) |
Circulation | 2144 (1.25) | 1819 (1.39) | 1055 (2.26) | 1103 (2.28) | 980 (1.15) | 745 (0.87) |
Annals of Internal Medicine | 1958 (1.14) | 815 (0.62) | 361 (0.77) | 382 (0.79) | 771 (0.90) | 467 (0.55) |
Common PT Termsc | ||||||
Comment | 171 489 (99.96) | 924 (0.71) | 46 644 (100) | 78 (0.16) | 85 252 (100) | 753 (1.20) |
Letter | 85 237 (49.68) | 1464 (1.12) | 2 (<0.01) | 77 (0.16) | 85 252 (100) | 1289 (2.05) |
Editorial | 46 644 (27.19) | 410 (0.31) | 46 644 (100) | 46 (0.10) | 2 (<0.01) | 336 (0.53) |
Journal article | 38 234 (22.29) | 128 646 (98.48) | 4 (<0.01) | 48 213 (99.68) | 4 (<0.01) | 61 204 (97.27) |
Comparative study | 7252 (4.23) | 64 964 (49.73) | 1576 (3.38) | 22 418 (46.35) | 4626 (5.43) | 33 142 (52.67) |
Review | 3469 (2.02) | 2856 (2.19) | 1862 (3.99) | 712 (1.47) | 85 (0.10) | 1612 (2.56) |
Randomized controlled trial | 1052 (0.61) | 44 119 (33.77) | 143 (0.31) | 16 847 (34.83) | 703 (0.82) | 22 246 (35.36) |
Clinical Trial | 1397 (0.81) | 31 582 (24.18) | 193 (0.41) | 11 208 (23.17) | 1057 (1.24) | 18 637 (29.62) |
Multicenter Study | 278 (0.16) | 31 282 (23.95) | 59 (0.13) | 14 403 (29.78) | 128 (0.15) | 13 312 (21.16) |
Evaluation Studies | 222 (0.13) | 9002 (6.89) | 34 (0.07) | 3093 (6.39) | 119 (0.14) | 4089 (6.50) |
Common MeSH Headingsc | ||||||
Humans | 152 748 (89.04) | 124 425 (95.25) | 44 278 (94.93) | 46 464 (96.06) | 81 070 (95.09) | 60 717 (96.50) |
Female | 56 236 (32.78) | 98 816 (75.65) | 14 966 (32.09) | 38 294 (79.17) | 28 900 (33.90) | 47 215 (75.04) |
Male | 48 847 (28.47) | 94 622 (72.44) | 13 889 (29.78) | 37 732 (78.01) | 23 352 (27.39) | 43 673 (69.41) |
Treatment outcome | 9730 (5.67) | 31 215 (23.90) | 3463 (7.42) | 12 546 (25.94) | 4724 (5.54) | 13 291 (21.12) |
Animals | 8586 (5.00) | 7957 (6.09) | 2865 (6.14) | 2641 (5.46) | 3031 (3.56) | 2725 (4.33) |
Middle aged | 4437 (2.59) | 72 855 (55.77) | 1008 (2.16) | 29 897 (61.81) | 2843 (3.33) | 34 318 (54.54) |
Adult | 5559 (3.24) | 60 236 (46.11) | 1258 (2.70) | 21 346 (44.13) | 3422 (4.01) | 30 625 (48.67) |
Aged | 4825 (2.81) | 56 418 (43.19) | 1290 (2.77) | 23 874 (49.36) | 2898 (3.40) | 26 043 (41.39) |
Prospective studies | 1455 (0.85) | 24 878 (19.04) | 264 (0.57) | 10 381 (21.46) | 963 (1.13) | 12 279 (19.52) |
Adolescent | 2572 (1.50) | 20 860 (15.97) | 699 (1.50) | 7096 (14.67) | 1396 (1.64) | 10 269 (16.32) |
Missing | 67 (0.04) | 0 (0.00) | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
Common Major MeSH Headingsc | ||||||
Antineoplastic Combined Chemotherapy Protocols/* therapeutic use | 1878 (1.09) | 2134 (1.63) | 543 (1.16) | 913 (1.89) | 887 (1.04) | 915 (1.45) |
Stents* | 1151 (0.67) | 1102 (0.84) | 506 (1.08) | 596 (1.23) | 404 (0.47) | 322 (0.51) |
Quality of Life* | 1107 (0.65) | 1208 (0.92) | 370 (0.79) | 528 (1.09) | 480 (0.56) | 460 (0.73) |
Anti-Bacterial Agents/* therapeutic use | 1096 (0.64) | 868 (0.66) | 290 (0.62) | 304 (0.63) | 632 (0.74) | 475 (0.75) |
Antineoplastic Agents/* therapeutic use | 990 (0.58) | 849 (0.65) | 288 (0.62) | 330 (0.68) | 435 (0.51) | 329 (0.52) |
Hypertension/* drug therapy | 893 (0.52) | 843 (0.65) | 296 (0.63) | 387 (0.80) | 425 (0.50) | 350 (0.56) |
Myocardial Infarction/* therapy | 881 (0.51) | 789 (0.60) | 448 (0.96) | 537 (1.11) | 330 (0.39) | 263 (0.42) |
Breast Neoplasms/* drug therapy | 854 (0.50) | 773 (0.59) | 265 (0.57) | 361 (0.75) | 460 (0.54) | 370 (0.59) |
Antibodies, Monoclonal/* therapeutic use | 777 (0.45) | 733 (0.56) | 232 (0.50) | 326 (0.67) | 378 (0.44) | 333 (0.53) |
Asthma/* drug therapy | 712 (0.42) | 685 (0.52) | 226 (0.48) | 291 (0.60) | 431 (0.51) | 371 (0.59) |
Missing | 67 (0.04) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) |
Commonly Listed Funding Sources | ||||||
Research Support, Non-US Gov’t | 5971 (3.48) | 61 824 (47.33) | 2314 (4.96) | 24 842 (51.36) | 2050 (2.40) | 27 263 (43.33) |
Research Support, NIH, Extramural | 2564 (1.49) | 12 292 (9.41) | 1374 (2.95) | 5544 (11.46) | 376 (0.44) | 3822 (6.07) |
Research Support, US Gov’t, PHS | 798 (0.47) | 8883 (6.80) | 442 (0.95) | 3658 (7.56) | 97 (0.11) | 4324 (6.87) |
Research Support, US Gov’t, Non-PHS | 451 (0.26) | 3530 (2.70) | 194 (0.42) | 1290 (2.67) | 72 (0.08) | 1412 (2.24) |
Research Support, NIH, Intramural | 120 (0.07) | 536 (0.41) | 57 (0.12) | 202 (0.42) | 16 (0.02) | 168 (0.27) |
Abbreviations: MeSH, medical subject headings; NIH, National Institutes of Health; PT, publication type; PHS, Public Health Service.
aThe sum of all comments is not expected to equal the sum of editorials and letters, because not all comments are indexed into 1 of those 2 categories.
bJournal names were used as presented from extraction; the list of common journals was selected based on comments first (overall).
cWhen selecting common PT terms, MeSH headings, and major MeSH headings, the top 5 for comments (overall) were selected first, followed by the top 5 for articles (overall) that did include terms from the prior selection. For PT terms, those related to research support (eg, “Research Support, Non-US Gov’t”) were listed as a separate category from the PT term list.
There were a total of 46 644 unique editorials commenting on 48 370 unique articles and 85 252 unique letters commenting on 62 919 unique articles. The sum of the editorials and letters does not equal the sum of all comments, because not all comments can be indexed into 1 of those 2 categories. For example, an invited commentary written by a noneditor, even if written in the style of an editorial, would not be indexed as one, because the author is not part of the editorial team. As a result, editorials and letters serve as 2 prominent categories of comments but do not necessarily represent the entirety of all comments. Editorials were published in 1654 unique journals, while the corresponding articles were published in 1653 unique journals. Letters were published in 2606 unique journals, while the corresponding articles were published in 2627 unique journals. Only 2 comments were labeled as both an editorial and a letter. Compared to the overall set, similar patterns persisted for both stratifications, with the exception of “The Lancet (London)” being relatively uncommon in the editorial group. Expanded presentations of published journals and major MeSH headings are available as word clouds in Supplementary Material Figures 1 and 2.
Publication pattern analysis
A total of 178 951 comment–article pairings were found, with 160 503 (90%) occurring within the same journal. The mean and median time from publication of an article to the publication of a comment was 6 months and 4 months, respectively. For editorials, there were 49 856 such pairings, with mean and median times being 2 months and 0 months, respectively. For letters, there were 71 496 such pairings, with mean and median times being 9 months and 7 months, respectively. Figure 1 presents the cumulative incidence of the length of time for all extracted pairings. Overall, the majority of commented articles received comments within 1 year for all stratifications. A visualization that includes all possible time lengths is provided in Supplementary Material 3.
Figure 1.
Cumulative incidence of comment–article pairings (for time differences between 0 to 24 months).
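The cumulative-incidence curve of Figure 1 can be sketched with a simple empirical fraction over publication lags; this function and its inputs are illustrative, not the study's plotting code.

```python
def cumulative_incidence(lags_in_months, horizon=24):
    """Fraction of comment-article pairings with lag <= t, for t = 0..horizon.

    lags_in_months: list of (comment date - article date) lags in whole months.
    """
    n = len(lags_in_months)
    return [sum(lag <= t for lag in lags_in_months) / n
            for t in range(horizon + 1)]
```

Plotting the returned list against t reproduces the shape of Figure 1: editorials concentrate near t = 0 (median lag 0 months), while letters accumulate more gradually (median lag 7 months).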
Content analysis
Of the 5197 comments that were used for content analysis, 1504 were editorials and 1531 were letters; the remaining 2162 did not readily have an available or common MeSH index term to derive a comment type. Supplementary Material Table 1 provides a summary of descriptive statistics for the full-text comments. The subset had a different distribution of journals represented with the most common being “Critical Care (London)” (674; 13%).
After model development, a 40-topic model was found to be optimal. Table 2 provides labels for each topic as well as the 10 most common terms for that topic (see Supplementary Material Table 2 for the 20 most common terms and corresponding probabilities). The majority of topics focused on disease content, but a few topics concerned the interpretation or application of study results, such as “General Scientific Evaluation” (Topic 15), “Diagnostic Evaluation” (Topic 17), and “Participant Characteristics” (Topic 34). Figure 2 presents the percentage of full-text comments that contain potential appraisal topics. Of particular note, “General Scientific Evaluation” was present in 15% of all comments, 8% of all editorials, and 26% of all letters. When examining co-occurrences of topics, “General Scientific Evaluation” commonly co-occurred with other topics in letters, whereas the other stratifications did not show as strong a pattern (Supplementary Material Figure 4 for all comments, Supplementary Material Figure 5 for editorials, and Supplementary Material Figure 6 for letters).
Table 2.
Topics with their 10 most common termsa
Topic 1: Pain Assessment | Topic 2: Biochemistry | Topic 3: Diabetes | Topic 4: Pulmonology | Topic 5: Non-Specific Article Content |
pain, score, knee, item, scale, instrument, questionnair, joint, replac, osteoarthr | concentr, plasma, metabol, acid, exposur, water, metabolit, vitro, enzym, compound | diabet, insulin, glucos, blood, devic, hypoglycemia, monitor, technolog, glycem, hbac | lung, ventil, respiratori, pressur, ard, pulmonari, airway, volum, acut, asthma | http, com, doi, med, tion, org, content, van, cant, signi |
Topic 6: Surgery and Anesthesia | Topic 7: Genetics | Topic 8: Psychiatric Disease (in Pediatrics) | Topic 9: Critical Care | Topic 10: Cardiology |
surgeri, procedur, surgic, oper, hospit, postop, cardiac, arrest, techniqu, surgeon | gene, dna, sequenc, express, genom, protein, cell, speci, transcript, rna | children, depress, disord, symptom, cognit, pediatr, adult, age, mental, intervent | mortal, sepsi, icu, day, ill, hospit, admiss, shock, septic, hour | heart, cardiac, ventricular, left, myocardi, ablat, atrial, dysfunct, arrhythmia, fraction |
Topic 11: Ischemic Disease | Topic 12: Hemodynamics | Topic 13: Ophthalmology + Journal Correspondence | Topic 14: Metabolic Syndrome | Topic 15: General Scientific Evaluation |
stroke, acut, bleed, coronari, infarct, stent, ischem, anticoagul, myocardi, platelet | fluid, blood, pressur, arteri, flow, oxygen, volum, shock, hemodynam, pulmonari | respond, visual, eye, agre, editor, figur, read, letter, retin, thank | weight, obes, metabol, acid, diet, fat, food, intak, loss, nutrit | bias, error, tabl, figur, calcul, correct, assumpt, meta, read, dataset |
Topic 16: Hepatology | Topic 17: Diagnostic Evaluation | Topic 18: Oncology | Topic 19: Signaling Pathways | Topic 20: Orthopedics + Scientific Communication |
liver, vitamin, hepat, hcc, hcv, cirrhosi, pancreat, fibrosi, serum, defici | score, diagnosi, biomark, diagnost, marker, assay, prognost, diagnos, index, decis | cancer, screen, women, breast, prostat, men, hpv, psa, cervic, androgen | cell, receptor, protein, signal, express, pathway, regul, inhibit, mice, channel | bone, joint, surgeon, corr, orthopaed, hip, opinion, fractur, doi, con |
Topic 21: Oncology | Topic 22: Radiology | Topic 23: Health Systems | Topic 24: Autoimmune Disease | Topic 25: Nervous System |
cancer, tumor, surviv, chemotherapi, node, recurr, phase, inhibitor, tumour, breast | imag, coronari, mri, lesion, scan, tissu, techniqu, plaqu, volum, arteri | health, cost, hospit, countri, servic, program, communiti, healthcar, polici, econom | cell, immun, inflammatori, anti, cytokin, express, inflamm, antibodi, antigen, arthriti | brain, neuron, task, memori, neural, network, behavior, connect, cortex, cognit |
Topic 26: Stem Cells + Pregnancy | Topic 27: Article Meta Information | Topic 28: HIV + Global Health | Topic 29: Kidney Disease | Topic 30: Sleep Medicine |
cell, transplant, infant, pregnanc, women, stem, birth, growth, matern, fetal | usa, intern, conflict, ofth, email, doi, gen, school, health, med | hiv, infect, resist, sexual, art, drug, malaria, transmiss, circumcis, africa | renal, kidney, aki, ckd, serum, injuri, acut, creatinin, dialysi, gfr | sleep, osa, dynam, movement, frequenc, space, shape, circadian, clock, sperm |
Topic 31: Behavioral Intervention | Topic 32: Blood + Oxygenation | Topic 33: Gastroenterology | Topic 34: Participant Characteristics | Topic 35: Bacteriology |
intervent, physician, health, educ, train, particip, behavior, decis, program, communic | blood, transfus, oxid, iron, plasma, mitochondri, trauma, injuri, air, coagul | alcohol, drink, gastric, bowel, symptom, ulcer, intestin, pylori, endoscop, opioid | age, women, older, smoke, sex, ethnic, adjust, white, particip, preval | infect, antibiot, resist, cultur, strain, bacteri, pathogen, bacteria, isol, pneumonia |
Topic 36: Genetic Variation | Topic 37: Pharmacology | Topic 38: Cardiovascular Disease | Topic 39: Physical Exercise + Bone Health | Topic 40: Immunology |
genet, mutat, gene, variant, genotyp, allel, phenotyp, polymorph, frequenc, suscept | dose, drug, placebo, agent, day, safeti, week, regimen, phase, arm | cardiovascular, hypertens, pressur, blood, statin, vascular, heart, arteri, blocker, beta | exercis, muscl, bone, physic, fractur, forc, temperatur, skelet, loss, rehabilit | vaccin, virus, antibodi, influenza, membran, protein, vector, viral, energi, immun |
aNote that the order of the words within each topic corresponds to higher-likelihood terms for that particular topic. For example, in “Topic 1: Pain Assessment,” the term “pain” is a more prominent and more likely term for that topic than the term “score.” Supplementary Material Table 2 includes probabilities for each term.
Figure 2.
Percentage of full-text comments that contain 1 of the potential appraisal topics.a
aOf the 40 topics found, these are the 5 topics that were considered potential appraisal topics. For a comment to be included, the topic of interest must account for a proportion of at least 0.05 of the comment’s entire topic distribution. For example, if the distribution of a comment has Topic 1 at 0.67, Topic 2 at 0.32, and Topic 3 at 0.01, then only Topic 1 and Topic 2 would meet the 0.05 threshold.
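The 0.05 inclusion rule described above amounts to a simple filter over a comment's topic distribution; the function below is a sketch of that rule (the function name and dictionary representation are ours).

```python
def topics_meeting_threshold(topic_distribution, threshold=0.05):
    """Return topic ids whose proportion in a comment meets the cutoff.

    topic_distribution: mapping of topic id -> proportion (sums to ~1.0).
    """
    return sorted(t for t, p in topic_distribution.items() if p >= threshold)
```

Applying it to the footnote's example distribution keeps Topics 1 and 2 and drops Topic 3.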
There were 520 comments (10% of all full-text comments) for the sentiment analysis. Of them, 260 comments were reviewed by both reviewers, 130 were reviewed by 1 reviewer, and the remaining 130 were reviewed by the other reviewer. Because the manual review process was not limited by the assigned MeSH indices for labeling, the reviewers specified 1 of the following labels for each comment: (1) editorial or other commissioned content; (2) letter to the editor; (3) author reply; and (4) notification of scientific misconduct. During the review process, 7 comments were removed from the set of 520 because they were improperly indexed: either the comment did not match the article it was commenting on, or it was not actually a comment. Additionally, there were 5 comments that contained both a letter to the editor and an author’s reply but were indexed as 1 comment; these were counted as 2 separate comments for analysis. Altogether, the final total for the manual sentiment analysis was 518 comments rather than 520.
Of the 518 comments identified for sentiment analysis, 339 (65%) were editorial or other commissioned content, 119 (23%) were letters to the editor, 58 (11%) were author replies, and 2 (< 1%) were notifications of scientific misconduct. Figure 3 provides a visual breakdown of the sentiment analysis (while Supplementary Material Table 3 provides a tabular format). Overall, 346 (67%) were labeled as “generally supportive,” 49 (9%) were labeled as “neutral,” and 123 (24%) were labeled as “generally critical.” When stratifying by type of comment, the following totals were found to be “generally supportive”: 286 (84%) editorials or other commissioned content; 17 (14%) letters to editor, 43 (74%) author replies, and 0 (0%) notifications of scientific misconduct. There was an observed agreement of 78% in labeling sentiment, with a kappa statistic of 0.54.
Figure 3.
Distribution of sentiment for a subset of the full-text comments reviewed.a
aOnly categories with at least 50 comments are displayed. Specifically, the category of “Notification of Scientific Misconduct” is not displayed, because there were only 2 such comments in the sample (both were labeled as “Generally Critical”).
DISCUSSION
This descriptive study provides an original overview of published commentary on clinical research articles as it relates to evidence appraisal. In general, published commentary has low prevalence in the evidence base and tends to be confined within the same journal rather than spanning journals. The journals that contain the most comments tend to focus on general medicine rather than a specific disease, were established long ago, and have high impact factors (although citations from the comments can potentially influence this metric). These characteristics likely motivate individuals to contribute commentary and promote discussion within a journal’s audience, but they also disincentivize engagement with other audiences.
Taken as a whole, comments demonstrate an overall supportive or promotional tone based on the sentiment identified. Only through stratification by comment type do differences in tone become evident. For example, editorials, which were often published at the same time as the commented-upon article, generally presented the article in the context of other evidence in a relatively supportive manner. In contrast, letters displayed evidence of communicating concerns about the published study, such as a lack of clarity in variable definitions or doubts regarding statistical assumptions. This becomes particularly apparent in how heavily the topic of “general scientific evaluation” is represented in letters as compared to editorials. The finding that letters are the most prominent source of appraisal topics is consistent with the expectation that they provide a mechanism for additional research evaluation, as explored in prior work.22,27 Another potential factor shaping the overall tone is the publication process. Letters usually require review only from the editorial staff, which can ultimately filter out reactions.23 When a letter is published, it can be accompanied by responses from the study authors, which usually address the concerns the letter raises. These responses are expected to present the commented-upon article in a positive manner, as the authors will likely be defending their work, consistent with what was identified in our sentiment analysis. Ultimately, these observations suggest that any analysis of scientific commentary needs to take comment type into careful consideration.
Related to the publication process, any future work should be aware of writing conventions within comments that can muddle their true sentiment. For example, a comment can be published to highlight a study but also acknowledge the study’s limitations specified by the study authors as commonly observed in an editorial. Despite the text mentioning limitations, the primary purpose of the comment is to focus on the importance of the study and thus provide a positive sentiment. This is particularly pertinent when considering an automated sentiment analysis strategy, because not all text in a comment is equally important. In light of this challenge, there exists encouraging work focused on sentence-level analyses of citations that can provide a solid foundation for future work.37–39
Regarding potential implications, perhaps the most striking finding is the low prevalence of commentary as a mechanism for evaluating published studies, as less than 5% of clinical research studies had at least 1 comment. This prevalence is further confounded by comment type, such as commissioned work that juxtaposes the study within the context of other work to discuss its relevance. With a constantly growing number of clinical research studies available, prioritizing or incentivizing the production of comments may serve as a worthwhile strategy so that studies are more thoroughly evaluated, complementary to initial peer review. Calls for stronger postpublication review have been expressed previously, highlighting that postpublication comments can foster idea synthesis, reduce potential research waste, and improve overall research quality.17,40 Although this study explores only 1 mechanism of the postpublication process, it provides original insights into the current underutilization of published commentary.
Within this study, we were limited to publication type identifiers and thus could not filter by more granular stratifications, such as whether a comment was invited. Moreover, the results of this analysis may not generalize to comments published outside of PubMed. Similarly, only comments that qualified for our query were extracted; additional comments could have been missed, such as reactions to comments that lack clinical research study-specific indexing. For the content analysis, only a subset of comments was obtained, limiting generalizability to all comments; the sentiment analysis is further limited because it used only a sample of that subset. Journals that publish with open-access policies may serve different scientific and practitioner communities and may have different discourse norms. In terms of topic modeling, the models were run on the entirety of each comment’s text rather than on more granular components; applying topic models to paragraph-level text may have revealed more nuanced topics. Different techniques, such as using word embeddings with LDA or applying alternative topic modeling techniques, may also have revealed more nuanced topics.41–43 Finally, both topic labeling and sentiment analysis are susceptible to subjectivity, as different reviewers may provide different interpretations.
CONCLUSION
This study contributes a large-scale analysis of scientific commentary to better understand its role in evidence appraisal. With the majority of comments published in the same journals as their target articles, any potential appraisal is likely limited to the audience of that journal. Furthermore, our findings suggest published comments more often adopt a supportive tone unless further stratifications are examined. Ultimately, this descriptive study demonstrates that different types of commentary have different natures, and that opportunities exist to utilize this communication mechanism for targeted appraisal.
Copyright/license for publication
The Corresponding Author has the right to grant on behalf of all authors and does grant on behalf of all authors, a worldwide license to the Publishers and its licensees in perpetuity, in all forms, formats, and media (whether known now or created in the future), to i) publish, reproduce, distribute, display, and store the Contribution; ii) translate the Contribution into other languages, create adaptations, reprints, include within collections and create summaries, extracts, and/or abstracts of the Contribution; iii) create any other derivative work(s) based on the Contribution; iv) to exploit all subsidiary rights in the Contribution; v) the inclusion of electronic links from the Contribution to third party material wherever it may be located; and vi) license any third party to do any or all of the above.
FUNDING
This research was funded by National Library of Medicine grants R01LM009886-10 (PI: Weng) and 5T15LM007079-27 (PI: Hripcsak).
AUTHOR CONTRIBUTIONS
JRR contributed to the design of the study, data analysis, and manuscript drafting. HM contributed to the design of the study, data collection, data analysis, and manuscript revisions. LVG contributed to data analysis and manuscript revisions. AG contributed to idea generation, the design of the study, data analysis, and manuscript revisions. CW contributed to idea generation, the design of the study, and manuscript editing, and supervised the research. The corresponding author is the guarantor of the article and attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
APPROVAL
This study did not involve human subjects, making it exempt from Institutional Review Board (IRB) approval.
Supplementary Material
ACKNOWLEDGMENTS
We thank Chi Yuan for assistance with data extraction from PubMed.
Conflict of Interest statement
None declared.
DATA SHARING
All data and analysis methods are available upon request.
REFERENCES
- 1. Ioannidis JPA, Greenland S, Hlatky MA, et al. Increasing value and reducing waste in research design, conduct, and analysis. Lancet 2014; 383 (9912): 166–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Ioannidis J. Why most published research findings are false. PLOS Med 2005; 2 (8): e124. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Rothwell PM. External validity of randomised controlled trials: to whom do the results of this trial apply? Lancet 2005; 365 (9453): 82–93. [DOI] [PubMed] [Google Scholar]
- 4. Lijmer JG, Mol BW, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999; 282 (11): 1061–6. [DOI] [PubMed] [Google Scholar]
- 5. Brænd AM, Straand J, Klovning A. Clinical drug trials in general practice: how well are external validity issues reported? BMC Fam Pract 2017; 18 (1): 113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Goldstein A, Venker E, Weng C.. Evidence appraisal: a scoping review, conceptual framework, and research agenda. J Am Med Inform Assoc 2017; 24 (6): 1192–203. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Casadevall A, Ellis LM, Davies EW, et al. A framework for improving the quality of research in the biological sciences. mBio 2016; 7 (4): e01256–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. George SL, Buyse M.. Data fraud in clinical trials. Clin Investig 2015; 5 (2): 161–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Savović J, Jones HE, Altman DG, et al. Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Ann Intern Med 2012; 157 (6): 429. [DOI] [PubMed] [Google Scholar]
- 10. Guyatt GH, Oxman AD, Vist G, et al. GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias). J Clin Epidemiol 2011; 64 (4): 407–15. [DOI] [PubMed] [Google Scholar]
- 11. Ferreira-González I, Permanyer-Miralda G, Domingo-Salvany A, et al. Problems with use of composite end points in cardiovascular trials: systematic review of randomised controlled trials. BMJ 2007; 334 (7597): 786. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Campbell ST, Kang JR, Bishop JA. What makes journal club effective?—a survey of orthopaedic residents and faculty. J Surg Educ 2018; 75 (3): 722–9. [DOI] [PubMed] [Google Scholar]
- 13. Ahmadi N, McKenzie ME, MacLean A, et al. Teaching evidence based medicine to surgery residents-is journal club the best format? A systematic review of the literature. J Surg Educ 2012; 69 (1): 91–100. [DOI] [PubMed] [Google Scholar]
- 14. Honey CP, Baker JA.. Exploring the impact of journal clubs: a systematic review. Nurse Educ Today 2011; 31 (8): 825–31. [DOI] [PubMed] [Google Scholar]
- 15. Wright J. Journal clubs—science as conversation. N Engl J Med 2004; 351 (1): 10–2. [DOI] [PubMed] [Google Scholar]
- 16. Mazuryk M, Daeninck P, Neumann CM, et al. Daily journal club: an education tool in palliative care. Palliat Med 2002; 16 (1): 57–61. [DOI] [PubMed] [Google Scholar]
- 17. Knoepfler P. Reviewing post-publication peer review. Trends Genet 2015; 31 (5): 221–3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Faulkes Z. The Vacuum Shouts Back: post publication peer review on social media. Neuron 2014; 82 (2): 258–60. [DOI] [PubMed] [Google Scholar]
- 19. Ghosh SS, Klein A, Avants B, et al. Learning from open source software projects to improve scientific review. Front Comput Neurosci 2012; 6:18. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Tierney E, O’Rourke C, Fenton JE.. What is the role of ‘the letter to the editor’? Eur Arch Otorhinolaryngol 2015; 272 (9): 2089–93. [DOI] [PubMed] [Google Scholar]
- 21. Collier R. When postpublication peer review stings. CMAJ 2014; 186 (12): 904. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Horton R. Postpublication criticism and the shaping of clinical knowledge. JAMA 2002; 287 (21): 2843–7. [DOI] [PubMed] [Google Scholar]
- 23. Winker MA, Fontanarosa PB. Letters: a forum for scientific discourse. JAMA 1999; 281 (16): 1543. [DOI] [PubMed] [Google Scholar]
- 24. Publication Characteristics (Publication Types) with Scope Notes. https://www.nlm.nih.gov/mesh/pubtypes.html Accessed January 18, 2019.
- 25. MEDLINE, PubMed, and PMC (PubMed Central). How are they different? https://www.nlm.nih.gov/bsd/difference.html Accessed August 13, 2018.
- 26. Jørgensen L, Paludan-Müller AS, Laursen DRT, et al. Evaluation of the Cochrane tool for assessing risk of bias in randomized clinical trials: overview of published comments and analysis of user practice in Cochrane and non-Cochrane reviews. Syst Rev 2016; 5 (1): 80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Kastner M, Menon A, Straus SE, et al. What do letters to the editor publish about randomized controlled trials? A cross-sectional study. BMC Res Notes 2013; 6 (1): 414. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Gøtzsche PC, Delamothe T, Godlee F, et al. Adequacy of authors' replies to criticism raised in electronic letters to the editor: cohort study. BMJ 2010; 341: c3926. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Cock PJA, Antao T, Chang JT, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009; 25 (11): 1422–3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Sayers E. The E-utilities In-Depth: Parameters, Syntax and More. Bethesda (MD): National Center for Biotechnology Information (US); 2017. https://www.ncbi.nlm.nih.gov/books/NBK25499/ Accessed August 13, 2018.
- 31. Chamberlain S, Pearse W.. fulltext: Full Text of “Scholarly” Articles Across Many Data Sources; 2018. https://CRAN.R-project.org/package=fulltext Accessed August 15, 2018.
- 32. Richardson L. beautifulsoup4; 2018. https://pypi.org/project/beautifulsoup4/ Accessed August 13, 2018.
- 33. Shinyama Y. PDFMiner. PDFMiner; 2014. http://www.unixuser.org/∼euske/python/pdfminer/index.html Accessed August 13, 2018.
- 34. Blei DM. Probabilistic topic models. Commun ACM 2012; 55 (4): 77–84. [Google Scholar]
- 35. Blei DM, Ng AY, Jordan MI.. Latent Dirichlet allocation. J Mach Learn Res 2003; 3: 993–1022. [Google Scholar]
- 36. Hornik K, Grün B.. topicmodels: an R package for fitting topic models. J Stat Softw 2011; 40: 1–30. [Google Scholar]
- 37. Kilicoglu H. Biomedical text mining for research rigor and integrity: tasks, challenges, directions. Brief Bioinform 2018; 19 (6): 1400–14. doi: 10.1093/bib/bbx057 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Xu J, Zhang Y, Wu Y, et al. Citation sentiment analysis in clinical trial papers. AMIA Ann Symp Proc 2015; 2015: 1334–41. [PMC free article] [PubMed] [Google Scholar]
- 39. Yu B. Automated citation sentiment analysis: what can we learn from biomedical researchers. Proc Am Soc Inf Sci Technol 2013; 50 (1): 1–9. [Google Scholar]
- 40. Bastian H. A stronger post-publication culture is needed for better science. PLoS Med 2014; 11 (12): e1001772. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. Moody CE. Mixing Dirichlet topic models and word embeddings to make lda2vec. arXiv:1605.02019 [cs]. http://arxiv.org/abs/1605.02019 Accessed March 19, 2019.
- 42. Lafferty JD, Blei DM. Correlated topic models. In: Weiss Y, Schölkopf B, Platt JC, eds. Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press; 2006: 147–54. [Google Scholar]
- 43. Hofmann T. Unsupervised learning by probabilistic latent semantic analysis. Mach Learn 2001; 42 (1/2): 177–96. [Google Scholar]