A Statistical Analysis of Reviewer Agreement and Bias in Evaluating Medical Abstracts

Domenic V Cicchetti; Harold O Conn

1976 Sep;49(4):373–383.

PMCID: PMC2595507 PMID: 997596

Abstract

Observer variability affects virtually all aspects of clinical medicine and investigation. One important aspect, not previously examined, is the selection of abstracts for presentation at national medical meetings. In the present study, 109 abstracts, submitted to the American Association for the Study of Liver Disease, were evaluated by three “blind” reviewers for originality, design-execution, importance, and overall scientific merit. Of the 77 abstracts rated for all parameters by all observers, interobserver agreement ranged between 81 and 88%. However, corresponding intraclass correlations varied between 0.16 (approaching statistical significance) and 0.37 (p < 0.01). Specific tests of systematic differences in scoring revealed statistically significant levels of observer bias on most of the abstract components. Moreover, the mean differences in interobserver ratings were quite small compared to the standard deviations of these differences. These results emphasize the importance of evaluating the simple percentage of rater agreement within the broader context of observer variability and systematic bias.
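The contrast the abstract draws between simple percentage agreement and the intraclass correlation can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the ratings are invented (they are not the study's data), agreement is taken to mean identical scores from a pair of raters, and the intraclass correlation is the one-way random-effects form computed from ANOVA mean squares. The abstract does not state which definitions the authors used, so this is not a reconstruction of their analysis.

```python
from itertools import combinations
from statistics import mean, stdev

# Hypothetical scores on a 1-5 scale: one row per abstract, one column per
# reviewer. Invented for illustration; NOT data from the study.
ratings = [
    [3, 3, 3], [3, 3, 4], [3, 3, 3], [4, 3, 3], [3, 3, 3],
    [3, 4, 3], [3, 3, 3], [3, 3, 3], [3, 3, 4], [3, 3, 3],
]


def percent_agreement(rows):
    """Fraction of reviewer pairs, over all abstracts, with identical scores."""
    pairs = [a == b for row in rows for a, b in combinations(row, 2)]
    return sum(pairs) / len(pairs)


def icc_oneway(rows):
    """One-way random-effects intraclass correlation for a single rating."""
    n, k = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    row_means = [mean(row) for row in rows]
    # Between-abstract and within-abstract mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(rows, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def paired_difference(rows, i, j):
    """Mean and standard deviation of reviewer i minus reviewer j scores."""
    diffs = [row[i] - row[j] for row in rows]
    return mean(diffs), stdev(diffs)


print(f"pairwise agreement: {percent_agreement(ratings):.1%}")
print(f"one-way ICC:        {icc_oneway(ratings):.3f}")
print("reviewer 1 vs 2 (mean diff, SD):", paired_difference(ratings, 0, 1))
```

With these made-up scores, about 73% of reviewer pairs agree exactly, yet the intraclass correlation is slightly negative (-0.125). Because nearly every abstract receives the same score, the few disagreements account for most of the variance, so raw agreement overstates how well the reviewers distinguish one abstract from another. The paired-difference helper shows the other pattern the abstract mentions: a mean difference near zero alongside a comparatively large standard deviation.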

