Cochrane Database Syst Rev. 2008 Oct 8;2008(4):MR000002. doi: 10.1002/14651858.MR000002.pub3

1.1. Analysis.

Comparison 1: Technical editing, Outcome 1: Study results.

Peer review and editing reports
Biddle 1996 All of the following results (reported as means and standard deviations) showed significant changes (p<0.01) with a one‐tailed t test. (Increased ease of reading is indicated by lower Gunning Fog scores but by higher Flesch Reading Ease scores.) 
 Computer analysis 
 Gunning Fog (26 case reports): before editing 18.20 (3.11), after editing 15.98 (3.85) 
 Gunning Fog (33 research reports): before editing 19.36 (2.94), after editing 14.90 (2.63) 
 Flesch Reading Ease (26 case reports): before editing 27.14 (8.60), after editing 33.79 (5.64) 
 Flesch Reading Ease (33 research reports): before editing 24.60 (9.34), after editing 32.45 (6.44) 
 Human analysis 
 Gunning Fog (10 research reports): before editing 18.23 (6.47), after editing 15.85 (7.34) 
 Flesch Reading Ease (10 research reports): before editing 26.92 (5.16), after editing 35.78 (11.37) 
 Report length in words (26 case reports): before editing 2793 (973), after editing 2371 (840) 
 Report length in words (33 research reports): before editing 4842 (1225), after editing 3609 (1043)
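Both readability indices above are simple functions of word, sentence, and syllable counts. As a minimal Python sketch of the standard formulas (an illustration only, not the software used by Biddle 1996), computed from pre-tallied counts:

    def gunning_fog(words, sentences, complex_words):
        # complex_words: words of three or more syllables
        return 0.4 * (words / sentences + 100 * complex_words / words)

    def flesch_reading_ease(words, sentences, syllables):
        return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

Lower Gunning Fog scores and higher Flesch Reading Ease scores both indicate easier reading, which is why the two indices move in opposite directions in the results above.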
George 1994 RA Error rate of 35% in monitored journals and 48% in journals that did not monitor citations, p=0.066
Goodman 1994 The percentage of manuscripts scoring more than 3 on a 5‐point scale rose by 7.3% (95% CI 3.3 to 11.3) from a baseline of 75% (before peer review and editing). The average item score improved by 0.23 points (95% CI 0.07 to 0.39) from a baseline score of 3.5 (out of a possible 5). A subjective 10‐point global quality score did not change significantly after peer review and editing, increasing by 0.29 units (95% CI ‐0.25 to 0.83), p = 0.3. Lower quality manuscripts showed more improvement after peer review and editing than did higher quality manuscripts. 
 The largest changes in the 34‐item instrument after peer review and editing were seen in: 
 ‐ Discussion of study limitations (47% to 65%, p<0.001) 
 ‐ Acknowledgment and justification of generalisations (58% to 79%, p<0.001) 
 ‐ Appropriateness of the strength or tone of the conclusions (71% to 85%, p=0.01) 
 ‐ Use of confidence intervals (65% to 81%, p<0.001).
Hobma 1992 RA Citation error rate of 70% (70/100) in submitted articles compared with 31% (31/100) in published articles
Jackson 2003 RA After requiring a copy of the first page of each reference, the error rate fell from 30% (30/100) in 1985 to 11% (11/100) in 1995
Laccourreye 1999 Median of 1.2 errors per page in 1977, 2.2 in 1987 and 2.5 in 1997. The percentage of articles following the IMRAD (Introduction, Methods, Results And Discussion) structure also showed a statistically significant increase over time, with 100% of the 1997 reports (n=14) following IMRAD. Stricter editorial policies were introduced by the journal in 1990 and Uniform Requirements for Publishing (ICMJE 1991) were also released in 1990.
Lowry 1985 RA Quotation errors in correspondence received: 28% (7/25); in correspondence published: 12% (7/61) 
 Citation errors in correspondence received: 7% (5/67); in correspondence published: 3% (7/248) 
 Overall, 69% of references in letters received were completely accurate compared to 92% in published letters
Pierie 1996 The 14 questions showing significant improvement dealt with: 
 Introduction (background); Methods (setting, definitions); Results (outcome, statistics, understandability, numerical data); Discussion (significance, other 'proof', limitations); General (abstract, length, general medical value, overall). 
 Questions not showing a statistically significant difference dealt with: 
 Introduction (objective); Methods (inclusion, distinction groups, design); Results (description, tables and figures); Discussion (conclusions, importance of conclusions); General (title). 
 The 11 editing questions showing significant improvement dealt with: 
 Readability (readability, style); Methods (setting, design, measurement technique); Results (presentation, tables and graphs, numerical data); Discussion (conclusion); General (title, references). 
 Questions not showing a statistically significant difference dealt with: 
 Readability (terms, organisation); Methods (time); Results (differences); General (abstract).
Pitkin 1999 Journal A : 8 deficient abstracts out of 44 (18%, 95% CI 6 to 30) 
 Journal B: 19 deficient abstracts out of 44 (43%, 95% CI 29 to 58) 
 Journal C: 13 deficient abstracts out of 44 (30%, 95% CI 16 to 43) 
 Journal D: 20 deficient abstracts out of 44 (45%, 95% CI 30 to 59) 
 Journal E: 14 deficient abstracts out of 44 (32%, 95% CI 18 to 45) 
 Journal F: 30 deficient abstracts out of 44 (68%, 95% CI 54 to 82) 
 The chi‐square test shows a statistically significant difference between journals (chi‐square, with 5 degrees of freedom = 31.3, p<0.001).
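The per-journal confidence intervals above are consistent with a normal (Wald) approximation for a binomial proportion; a minimal Python sketch (our illustration, assuming this is how the intervals were derived):

    from math import sqrt

    def wald_ci_percent(deficient, n, z=1.96):
        p = deficient / n
        half = z * sqrt(p * (1 - p) / n)
        return 100 * (p - half), 100 * (p + half)

    print(wald_ci_percent(8, 44))   # Journal A: ~(6.8, 29.6), reported as 6 to 30
    print(wald_ci_percent(30, 44))  # Journal F: ~(54.4, 81.9), reported as 54 to 82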
Pitkin 2000 All types of non‐trivial deficiencies, except unjustified conclusions, showed decreases: 
 Data inconsistent between abstract and text dropped from 8/50 to 5/50 
 Data present in abstract but not present in text dropped from 9/50 to 1/50 
 Abstracts containing both the above deficiencies dropped from 8/50 to 3/50 
 Unjustified conclusions were present in one of the 50 abstracts both before and after the quality improvement initiative.
Roberts 1994 The Gunning Fog score for the main text improved from 17.16 (SD 1.55) before editing to 16.85 (1.42) after editing (p=0.0005); a lower score indicates greater readability, but both scores remained in the 'very difficult' category. The Flesch Reading Ease score was 28.19 (7.89) before editing, improving to 29.11 (7.73) afterwards (p = 0.03); a higher score represents improved readability, although the score after editing only just moved from the 'very difficult' to the 'difficult' category. The number of words per sentence also dropped significantly after peer review and editing, but there was a small overall increase in the length of both the main text and the abstract.
Siegel 2005 Only one of four major journals (BMJ) showed a significant increase in the number of article titles that contained information about study methods (increase from 49% (n=133) in 1995 to 96% (n=112) in 2001, p < 0.001).
Silagy 1998 15 abstracts of Cochrane reviews (CR) edited by the journal Evidence‐Based Medicine (EBM) were shorter than the originals (330 EBM versus 378 CR) and more readable (mean Flesch Reading Ease score 35.9 EBM versus 33.6 CR).
Winker 1999 Over half of a sample of 21 abstracts of accepted articles had deficiencies before the initiative; this dropped to zero out of 27 abstracts afterwards.
Providing instructions to authors
Asano 1995b RA After a requirement for authors to supply the first page of each reference cited, citation error dropped from 48% (45/94) in 1990 to 22% (21/96) in 1994
Fister 2005 Small but statistically significant improvements in completely accurate and technically correct references were seen in both the instructional group and the brief reminder group compared with standard practice. No significant differences were seen for substantive errors (standard practice 437/720 references (61%); brief reminder 311/613 (51%); instructional 365/702 (52%)).
Jackson 2003 RA A significant improvement in citation accuracy from 1985 (30 incorrect references out of 100) to 1995 (11 incorrect references out of 100) is attributed to requiring authors to submit first pages of all references cited in their manuscripts
Karlawish 1999 Quality of reporting was assessed using four measures identified from publications outlining research ethics requirements. Reporting varied across the measures: all 45 papers (100%) reported their study justification, 36 (80%) reported that informed consent had been obtained (or waived), 18 (40%) reported Institutional Review Board review, and 6 (13%) reported nursing home committee review. For articles published in journals giving no instructions (n=9) the average quality score (out of 4) was 1.4; for the group with instructions less than the Uniform Requirements (n=7) it was 2.5; for the group with instructions conforming to the Uniform Requirements (n=24) it was 2.4; and for the group conforming to the Uniform Requirements plus additional instructions (n=5) it was 3.2 (Kruskal‐Wallis chi‐square = 11.2, p = 0.01).
Nishina 1995c RA After authors were instructed to consult original sources for references, citation error dropped from 44% (42/96) in 1990 to 29% (28/97) in 1994
Nishina 2000 RA After authors were instructed to consult original sources for references, citation error dropped from its 1990 level to 26% (25/98) in 1998 and 27% (26/97) in 1999
Pitkin 1998 The types of defects in the 55 defective abstracts were: 
 ‐ inconsistencies between the body of the paper and the abstract (51% of total errors, 95% CI 38% to 64%; n=28) 
 ‐ data in the abstract but not in the body of the paper (29%, 95% CI 17% to 41%; n=15) 
 ‐ both the above defects (15%, 95% CI 10% to 20%; n=8) 
 ‐ unjustified conclusions in the abstract (5%, 95% CI 3% to 7%; n=3). 
 Pitkin also surveyed a small sample of 1995 and 1996 issues of four journals for defects in abstracts. The percentage of defective abstracts ranged from 27% to 65% but the investigators did not attempt to identify the cause of this wide range: 
 New England Journal of Medicine: 27% (3 deficient out of 11 abstracts) 
 JAMA: 50% (7 out of 14) 
 American Journal of Obstetrics and Gynecology: 53% (19 out of 36) 
 Pediatrics: 65% (13 out of 20).
Providing instructions to readers
Gross 1994 With an example, readers identified the correct model in 83% of observations (33/40), compared with 86% (36/42) without an example. For the ability to derive correct values, the corresponding figures were 88% (35/40) and 57% (24/42).
Structuring abstracts
Booth 1997 Overall searching precision (percentage of references retrieved which were relevant) for ten searches in a simulated database was 45% for structured abstracts and 42% for unstructured abstracts. Search precision was better with structured abstracts than unstructured in five of the ten searches, the same in one search and worse in four searches. 
 Overall searching recall (percentage of 'gold standard' (i.e. all relevant) references retrieved) for ten searches was 32% for structured abstracts and 75% for unstructured abstracts. Recall of structured abstracts was worse in nine of the ten searches and the same for one search.
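Precision and recall as used here are the standard retrieval ratios; a minimal Python sketch of the definitions (an illustration, not Booth's search setup):

    def precision_percent(relevant_retrieved, total_retrieved):
        # percentage of retrieved references that are relevant
        return 100 * relevant_retrieved / total_retrieved

    def recall_percent(relevant_retrieved, total_relevant):
        # percentage of all relevant ('gold standard') references retrieved
        return 100 * relevant_retrieved / total_relevant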
Comans 1990 Although structured abstracts (n=15) from Annals of Internal Medicine, BMJ and New England Journal of Medicine were judged to be clear and detailed, they often had the following information missing: 
 ‐ sociodemographic features of patients 
 ‐ patient selection methods 
 ‐ methods of statistical analysis 
 Unstructured abstracts (n=21) from Nederlands Tijdschrift voor Geneeskunde often had the following information missing: 
 ‐ details of objective 
 ‐ setting of the study 
 ‐ sociodemographic features of patients and other patient details 
 ‐ details of methods
Dupuy 2003 In a comparison of abstracts of clinical studies published in 2000 in 3 dermatology journals (Archives of Dermatology, British Journal of Dermatology and the Journal of the American Academy of Dermatology), structured abstracts (n=34) scored significantly better than unstructured abstracts (n=15): 0.71 (SD 0.11) versus 0.56 (SD 0.18), p=0.002. 
 Structured abstracts were longer on average than unstructured abstracts: 256 words (SD 77) versus 169 (SD 65), p<0.001. 
 A strong positive correlation between length and score was observed for unstructured abstracts (Pearson correlation coefficient 0.75, p=0.002), while no such significant correlation was seen for structured abstracts (Pearson correlation coefficient 0.30, p=0.08).
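The length-score relationship reported by Dupuy 2003 is a plain Pearson correlation; a minimal Python sketch of the coefficient (illustrative only, since the per-abstract data are not reported here):

    from math import sqrt
    from statistics import mean

    def pearson_r(lengths, scores):
        # lengths: abstract lengths in words; scores: quality scores
        mx, my = mean(lengths), mean(scores)
        cov = sum((x - mx) * (y - my) for x, y in zip(lengths, scores))
        var_x = sum((x - mx) ** 2 for x in lengths)
        var_y = sum((y - my) ** 2 for y in scores)
        return cov / sqrt(var_x * var_y)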
Harbourt 1995 All 924,478 MEDLINE records for 1989‐1991 were compared with the subset of 3873 records with structured abstracts 
 MeSH: 
 Average of 4 more headings in structured abstracts than in MEDLINE records overall (14.1 structured versus 10.1 overall) 
 Clinical trials: mean of 15.3 headings for structured abstracts (n=581 records) versus overall mean of 13.2 (n=18,495 records) 
 Reviews: mean of 10.1 headings for structured abstracts (n=116 records) versus overall mean of 8.2 (n=92,475 records) 
 Abstract length (n's as for MeSH): 
 Average length of a structured abstract is approximately 700 characters longer than the overall average (1,739.2 structured versus 1,062.8 overall) 
 Clinical trials: mean of 1,826.9 characters for structured abstracts versus overall mean of 1,195.0 
 Reviews: mean of 1,749.1 characters for structured abstracts versus overall mean of 977.3
Hartley 1996a 30 pairs of unstructured and structured (rewritten) abstracts from the British Journal of Educational Psychology were compared for the time taken to search for information in the abstracts: readers searched significantly faster and made significantly fewer errors when using structured abstracts.
Hartley 1996b In a companion study to Hartley 1996a, readers also searched significantly faster and made significantly fewer errors when using structured abstracts, although there was a 'learning' effect apparent in those readers who were allocated structured abstracts before unstructured ones.
Hartley 1996c Over 400 readers stated their preferences for different versions of an abstract which was modified in regard to typography, layout and position on the page. The most preferred version had bold capital letters for subheadings, a line‐space above the main heading and centring of the abstract over the top of the subsequent two‐column article.
Hartley 1997 The readability scores of BMJ and British Journal of Psychiatry (BJP) abstracts published before (20 abstracts from each journal) and after (20 abstracts from each journal) the introduction of structured abstracts showed no significant difference in either the Flesch Reading Ease or Gunning Fog Index (BMJ Flesch t test (one tailed) = 0.12, p = ns; BMJ Gunning Fog 1.03, p = ns; BJP Flesch 0.40, p = ns; Gunning Fog 0.98, p = ns). However, abstract length (number of words) was significantly greater in the structured abstracts (BMJ t test (one tailed) = 3.20, p<0.0005; BJP 2.64, p<0.01). When a single editor rewrote 30 unstructured abstracts as structured abstracts, the readability scores were significantly improved and the abstract length significantly increased (Flesch t test 4.47, p<0.0005; Gunning Fog 2.62, p<0.01; abstract length 5.90, p<0.0005). These results were consistent when 29 unstructured abstracts were rewritten by the original 29 authors (Flesch t test 2.09, p<0.05; Gunning Fog 3.25, p<0.005; abstract length 2.20, p<0.025). 
 When 108 readers were asked to put scrambled sentences of an abstract (with the headings removed) in order, they made fewer errors with structured abstracts (mean 0.69, SD 0.98) than with unstructured ones (mean number of errors 3.40, SD 2.01): t test (two‐tailed) 8.85, p<0.001. However, another study, involving student readers and some differences in how the information was scrambled, did not show differences in most structured versus unstructured abstract comparisons. Sixty‐three readers rated the structured version of a single abstract easier to read on a subjective 10‐point scale than the unstructured version (correlated t = 4.89, df 62, p<0.001, two‐tailed test). The mean score was 6.10 (SD 2.01) for the unstructured version and 7.92 (SD 1.83) for the structured version of the abstract.
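The 'correlated t' statistic reported here is a paired t test; a minimal Python sketch (illustrative, with one rating per reader for each version, not Hartley's actual data):

    from math import sqrt
    from statistics import mean, stdev

    def paired_t(unstructured_ratings, structured_ratings):
        # paired differences, one per reader
        diffs = [s - u for u, s in zip(unstructured_ratings, structured_ratings)]
        return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))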
Hartley 1998 A checklist (based on Taddio 1994) intended to measure the information content of the abstracts also showed improved scores for the structured versions: the mean score for unstructured abstracts was 6.4 (SD 2.8) out of a possible top score of 22, and the mean score for the structured versions was 9.1 (SD 2.6), t = 6.04, p (one‐tailed) < 0.0005. A crude measure suggests that student evaluators took about four minutes to evaluate each unstructured abstract and about three minutes for each structured abstract.
Hartley 2000 30 unstructured abstracts for papers submitted to journals published by the British Psychological Society were rewritten as structured abstracts: 
 the two versions were very similar with regard to accuracy (few inaccuracies in either set of abstracts)
Hartley 2002 When the length of unstructured abstracts was increased or the length of structured abstracts decreased in 15 journals, pagination of articles was not usually affected, except where the journal's pagination policy was to start a new article on the same page as the previous article (a format rarely used in scientific journals).
Hartley 2003 24 unstructured abstracts from the Journal of Educational Psychology were rewritten as structured abstracts: 
 Abstract length, mean: structured 186 words [SD 15] versus unstructured 133 [SD 22], p<0.001 
 Sentence length, mean: structured 20.8 words [SD 3.0] versus unstructured 24.6 [SD 8.3], p<0.02 
 Percentage of passives, mean: structured 23.7 [SD 17.3] versus unstructured 32.7 [SD 22.8], p = ns 
 Flesch reading score, mean: structured 31.1 [SD 12.1] versus unstructured 21.1 [SD 13.7], p<0.001 
 Use of longer words, mean score: structured 35.8 [SD 4.6] versus unstructured 40.0 [SD 5.3], p<0.001 
 Use of common words, mean score: structured 61.1 [SD 6.3] versus unstructured 57.7 [SD 8.6], p<0.01 
 Use of present tense, mean: structured 4.1 [SD 1.9] versus unstructured 2.7 [SD 2.8], p<0.01 
 Information checklist, mean score: structured 9.7 [SD 1.4] versus unstructured 5.5 [SD 1.0], p<0.001 
 Clarity ratings, mean: structured 7.4 [SD 2.0] versus unstructured 6.2 [SD 2.0], p<0.01
Khosrotehrani 2002 Assessed abstract quality in Annales de Dermatologie before and after the introduction of structured abstracts in 1993: 
 Mean scores (based on Narine): 
 1991‐92: 0.72 (SD 0.20), n=8 
 1996: 0.69 (SD 0.12), n=17 
 2000: 0.83 (SD 0.08), n=18 
 Non‐significant trend towards improved scores; reported as p = 0.015, possibly a misprint for p = 0.15
Scherer 1998 A comparison of unstructured and structured abstracts in the Archives of Ophthalmology showed an improved CONSORT abstract 'score' (maximum score = 9) for the structured abstracts (structured mean 6.8 (standard error of the mean (SEM) 0.7), n=9; unstructured mean 4.6 (SEM 0.4), n=17; p=0.008). However, no statistically significant difference in this score was seen for structured abstracts compared with unstructured abstracts in Ophthalmology (structured mean 5.6 (SEM 0.3), n=28; unstructured mean 4.9 (SEM 0.4), n=23). No statistically significant difference was seen for either journal when the CONSORT criteria were scored across the text of the paper rather than just the abstract, and no difference was seen over time (1991/92 compared to 1993/94) for unstructured abstracts in either journal. No statistically significant increase in CONSORT 'score' of the text was seen in either the Archives of Ophthalmology (structured mean score 12.3 (SEM 1.3), n=9; unstructured mean score 15.7 (SEM 1.1), n=17) or Ophthalmology (structured mean score 16.9 (SEM 0.8), n=28; unstructured mean score 16.0 (SEM 0.9), n=23) when papers with structured abstracts were compared to papers with unstructured abstracts. The authors comment that "reporting of the CONSORT criteria in the text was unimpressive".
Taddio 1994 A comparison of 150 unstructured and 150 structured abstracts in three journals (BMJ, JAMA and CMAJ) showed the structured abstracts to be of higher quality, as measured by 33 objective criteria (unstructured mean score 0.57, structured mean score 0.74, p<0.001). Quality scores did not show a statistically significant difference between years (1988 and 1989) or between journals, except for the comparison between unstructured abstracts in BMJ and JAMA, with a lower score for BMJ abstracts, p<0.05. Two journals provided detailed instructions on how to write an abstract while one did not.
Trakas 1997 Statistically significant improvement in the quality of structured abstracts compared to unstructured abstracts, as measured by a checklist of 29 objective criteria (structured mean score 62.5 out of a possible 100 (SD 11.0); unstructured mean score 53.3 (SD 10.0), F = 9.48, p = 0.03). No statistically significant difference was detected between journal types (pharmacy, medical or health economics) or between years (1990, 1991, 1992, 1993, 1994). There was a correlation between the subjective scores given by experienced raters and the quality of abstracts as measured by the set of objective criteria.
Wilczynski 1995 Many search terms were comparable for structured and unstructured abstracts, but some performed better in MEDLINE with structured abstracts, particularly for aetiology and prognosis articles
Wong 2005 Structured abstracts (1991/92 and 2001/02) were of higher quality than unstructured abstracts from 1988/89 issues of the same journals, but no significant improvement in abstract quality was seen between 1991/92 and 2001/02