Abstract
Observer variability affects virtually all aspects of clinical medicine and investigation. One important aspect, not previously examined, is the selection of abstracts for presentation at national medical meetings. In the present study, 109 abstracts submitted to the American Association for the Study of Liver Diseases were evaluated by three “blind” reviewers for originality, design-execution, importance, and overall scientific merit. Of the 77 abstracts rated on all parameters by all observers, interobserver agreement ranged between 81% and 88%. However, the corresponding intraclass correlations varied between 0.16 (approaching statistical significance) and 0.37 (p < 0.01). Specific tests of systematic differences in scoring revealed statistically significant levels of observer bias on most of the abstract components. Moreover, the mean differences in interobserver ratings were quite small compared with the standard deviations of these differences. These results emphasize the importance of evaluating the simple percentage of rater agreement within the broader context of observer variability and systematic bias.
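The central contrast in these results, high raw percentage agreement alongside low intraclass correlation, can be illustrated numerically. The sketch below uses synthetic data; the 9-point scale, the within-one-point agreement criterion, and the rater bias values are illustrative assumptions, not the study's actual scoring rules. When ratings cluster in a narrow band, raters frequently agree in absolute terms even though the ratings carry little reliable subject-to-subject signal, which is what the one-way intraclass correlation ICC(1,1) detects.

```python
import random

random.seed(0)

n_subjects, k = 77, 3  # 77 abstracts, 3 raters (as in the study)

# Illustrative assumptions: true merit clusters in a narrow band of a
# 9-point scale; each rater adds a systematic bias plus random noise.
true_scores = [random.choice([4, 5, 6]) for _ in range(n_subjects)]
bias = [0.0, 0.5, -0.5]  # hypothetical systematic rater biases
ratings = [[max(1, min(9, round(t + b + random.gauss(0, 0.8))))
            for b in bias] for t in true_scores]

# Percentage agreement: rater pairs whose scores fall within one point.
pairs = [(0, 1), (0, 2), (1, 2)]
agree = sum(abs(r[i] - r[j]) <= 1 for r in ratings for i, j in pairs)
pct_agreement = 100 * agree / (len(ratings) * len(pairs))

# One-way intraclass correlation, ICC(1,1), from the ANOVA mean squares.
grand = sum(sum(r) for r in ratings) / (n_subjects * k)
ms_between = k * sum((sum(r) / k - grand) ** 2
                     for r in ratings) / (n_subjects - 1)
ms_within = sum((x - sum(r) / k) ** 2
                for r in ratings for x in r) / (n_subjects * (k - 1))
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

print(f"percentage agreement: {pct_agreement:.0f}%")
print(f"ICC(1,1): {icc:.2f}")
```

Because the between-subject variance is small relative to rater noise and bias, the ICC stays modest while the within-one-point agreement remains high, mirroring the pattern the abstract reports.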