The Yale Journal of Biology and Medicine. 1976 Sep;49(4):373–383.

A Statistical Analysis of Reviewer Agreement and Bias in Evaluating Medical Abstracts

Domenic V Cicchetti, Harold O Conn
PMCID: PMC2595507  PMID: 997596

Abstract

Observer variability affects virtually all aspects of clinical medicine and investigation. One important aspect, not previously examined, is the selection of abstracts for presentation at national medical meetings. In the present study, 109 abstracts, submitted to the American Association for the Study of Liver Disease, were evaluated by three “blind” reviewers for originality, design-execution, importance, and overall scientific merit. Of the 77 abstracts rated for all parameters by all observers, interobserver agreement ranged between 81 and 88%. However, corresponding intraclass correlations varied between 0.16 (approaching statistical significance) and 0.37 (p < 0.01). Specific tests of systematic differences in scoring revealed statistically significant levels of observer bias on most of the abstract components. Moreover, the mean differences in interobserver ratings were quite small compared to the standard deviations of these differences. These results emphasize the importance of evaluating the simple percentage of rater agreement within the broader context of observer variability and systematic bias.
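The contrast drawn above, high percentage agreement alongside low intraclass correlation, can be illustrated with a minimal sketch. The abstract does not say how agreement was scored or which form of the intraclass correlation was used, so everything here is an assumption: agreement is taken as the share of reviewer pairs whose scores fall within a chosen tolerance, the coefficient is the one-way random-effects form computed from ANOVA mean squares, and the ratings are invented for illustration rather than taken from the study.

```python
from itertools import combinations


def percent_agreement(ratings, tolerance=0):
    """Share of reviewer pairs, pooled over all abstracts, whose scores
    differ by at most `tolerance` points (assumed definition)."""
    hits = 0
    total = 0
    for row in ratings:
        for a, b in combinations(row, 2):
            total += 1
            if abs(a - b) <= tolerance:
                hits += 1
    return hits / total


def icc_oneway(ratings):
    """One-way random-effects intraclass correlation from ANOVA mean squares.

    Rows are abstracts, columns are reviewers. This is one common form of
    the coefficient and is assumed here, not confirmed by the source.
    """
    n = len(ratings)        # number of abstracts
    k = len(ratings[0])     # number of reviewers per abstract
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores on a 1-5 scale: three reviewers, ten abstracts.
# Nearly every score is a 3 or a 4, so reviewers rarely differ by more
# than one point, yet the abstracts are barely distinguished from each other.
hypothetical_ratings = [
    (3, 4, 3), (4, 3, 4), (3, 3, 4), (4, 4, 3), (3, 4, 4),
    (4, 3, 3), (3, 3, 3), (4, 4, 4), (3, 4, 3), (4, 3, 4),
]

if __name__ == "__main__":
    print("agreement within 1 point:",
          percent_agreement(hypothetical_ratings, tolerance=1))
    print("one-way ICC:", icc_oneway(hypothetical_ratings))
```

With scores bunched in a narrow band, every pair of reviewers agrees within one point, while the intraclass correlation stays near zero because the variation between abstracts is no larger than the variation among reviewers of the same abstract. This is the sense in which a simple agreement percentage can overstate reliability.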



