Abstract
To guide future investigations, the 2007 Medical Image Perception Society meeting held panel discussions to consider the current state of our knowledge of medical image perception and identify important questions to advance our understanding.
The Medical Image Perception Society (MIPS) is composed of scholars studying the processes of perception and recognition of information in medical images. Membership includes radiologists, psychologists, statisticians, physicists, engineers, and others in this growing research community. The members represent universities, hospitals, private companies, and government agencies (eg, National Institutes of Health [NIH], U.S. Food and Drug Administration). Every 2 years, MIPS holds a scientific conference to exchange current research and to conduct tutorials and workshops. The first conference was held in 1985 and the MIPS was incorporated in 1998. The purpose of the society is to promote medical image perception research. The conference enjoys growing participation and attendance, offering a chance for students to interact with seasoned perception researchers in a retreat-workshop setting.
In 1994, the NIH, the National Cancer Institute, and the Conjoint Committee on Diagnostic Radiology established priorities for medical image perception research (1). The priorities identified were to develop psychophysical models for the detection of abnormalities on medical images, to improve understanding of the mechanisms of perception important for interpreting medical images, to develop aids for interacting with displays that enhance perception, to study alternatives to sequential presentation of cross-sectional imaging data, and to perform methodologic research to improve the evaluation of medical imaging systems. These priorities continue to guide research.
The MIPS meeting periodically dedicates a session of its conference to reconsider these priorities, update them, and formulate new goals for research. For example, articles were published in 1998 that were based on an earlier MIPS conference and updated the aims of medical image perception research (2,3). Those goals included mathematical modeling of the detection task, gaining a better understanding of visual search and the nature of expertise, and developing perceptually based standards for image quality, computer-based aids to image perception, and quantitative methods for describing natural images and for measuring human detection and recognition.
The most recent MIPS conference, held in 2007 at the University of Iowa, formulated new thrusts for medical image perception research. Radiologists work in a changing environment. Analog film images are already rare. The exchange of images from one specialty to another to enhance decision making will continue to increase as information systems become more sophisticated and integrated. New types of digital images, acquired with new technologies and processed in new ways, appear every year. The burgeoning field of molecular imaging (4), which unites molecular biology and in vivo imaging, is likely to play a greater clinical role in the future, but we know little to date about how best to present this new type of image data to clinicians. Regardless of these changes, the interaction of the eye-brain system with visually presented medical image data will remain at the core of radiology. Continued investigation of this complex perceptual recognition and interpretation process will be needed to offer the most useful and effective presentation of imaging information to physicians and to improve their detection and classification of disease.
It is difficult to directly monitor the ways in which image perception research has been incorporated into clinical practice, but it is clear that the numerous studies conducted over the years have had an impact. For example, many of the studies that facilitated the transition from analog film to soft-copy digital image reading and picture archiving and communication systems were based on a fundamental understanding of human perception. The Digital Imaging and Communications in Medicine (DICOM) gray-scale standard display function is based on the principle of perceptual linearization (5). The specific display function selected by DICOM is based on the Barten model and offers the additional advantage of perceptual linearization (6–9). Perceptual linearization optimizes the display by producing a tone scale in which equal changes in driving level yield changes in luminance that are perceptually equivalent across the entire luminance range. Studies have demonstrated that a perceptually linearized display does indeed improve diagnostic accuracy (10), and now it is inconceivable that electronic displays would be used for diagnostic applications without proper DICOM calibration.
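The idea of perceptual linearization can be illustrated with a short sketch. It evaluates the gray-scale standard display function of DICOM part 14 (5), which gives luminance as a function of a just-noticeable-difference (JND) index, and then assigns target luminances to driving levels in equal JND-index steps. The function names, the bisection-based inversion, and the 1 to 400 cd/m² display range in the usage note are illustrative assumptions, not part of the standard or of any calibration product.

```python
import math

# Coefficients of the DICOM PS 3.14 grayscale standard display function.
A, B, C, D, E = -1.3011877, -2.5840191e-2, 8.0242636e-2, -1.0320229e-1, 1.3646699e-1
F, G, H, K, M = 2.8745620e-2, -2.5468404e-2, -3.1978977e-3, 1.2992634e-4, 1.3635334e-3


def gsdf_luminance(j):
    """Luminance in cd/m^2 at JND index j, for 1 <= j <= 1023."""
    x = math.log(j)
    num = A + C * x + E * x**2 + G * x**3 + M * x**4
    den = 1 + B * x + D * x**2 + F * x**3 + H * x**4 + K * x**5
    return 10 ** (num / den)


def jnd_index(luminance, tol=1e-9):
    """Invert gsdf_luminance by bisection (assumed helper; luminance rises with j)."""
    lo, hi = 1.0, 1023.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gsdf_luminance(mid) < luminance:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def calibrated_luminances(l_min, l_max, levels=256):
    """Target luminance per driving level, spaced in equal JND-index steps."""
    j_min, j_max = jnd_index(l_min), jnd_index(l_max)
    step = (j_max - j_min) / (levels - 1)
    return [gsdf_luminance(j_min + i * step) for i in range(levels)]
```

For a hypothetical display spanning 1 to 400 cd/m², `calibrated_luminances(1.0, 400.0)` returns 256 target luminances. Successive driving levels are then separated by the same number of JNDs, so the luminance increments are small at the dark end and large at the bright end.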
To guide future investigations, at the 2007 MIPS meeting panel discussions were held to consider the current state of our knowledge of medical image perception and to identify important questions to advance our understanding. Discussions reviewed the current state of the human perception of medical images, observer modeling, visual search, display issues, and technology evaluation tools. There were two discussion leaders for each of these areas. Some of the general questions considered were: Does new imaging technology obviate or necessitate perception research? How can visual search be studied in advanced imaging? What is the value of observer modeling for clinical radiologists? How will the stimulus be defined for new and innovative observer models? What is the future of computer-aided diagnosis? Are there innovations in the pattern recognition literature that will make a difference? Will computer-aided diagnosis research be reabsorbed by the more general image processing effort? What standards will be applied to color displays, and how will we know whether they perform adequately for gray-scale images? Should there be broader standards for displays used in low-cost reading (eg, personal digital assistants and cell phones)? Is there consensus on the appropriate methods for measuring observer performance? How should the use of more modern data analysis methods be encouraged? What are the limitations of our current receiver operating characteristic (ROC) methods?
The following sections present the results of these discussions.
Human Perception of Medical Images
Background
The pattern recognition provided by radiologists is a fundamental determinant of performance of diagnostic systems (11–13). As long as radiologists interpret medical images, the limitations of human detection of abnormalities will influence the overall performance of diagnostic systems. Abnormalities on medical images are sometimes missed (14–16). We need to know the causes of these errors to find ways of eliminating them (11). We need to better understand the cognitive and perceptual mechanisms that underlie the discovery and reporting of abnormalities, as well as the formulation of diagnoses. By studying how the observer allocates attention across images, we may inform the training of radiologists and the development of effective machine readers and, ultimately, provide better images and display devices.
New Research Questions
(a) In current clinical practice, radiologists rarely get feedback about their errors in diagnosis. Will this change with the Integrated Health Enterprise? If so, will feedback change how radiologists interpret images (17)? Radiologists interpret more imaging studies and more images in each study than in the past (18). This increase in workload creates the potential for errors based on perceptual and cognitive overload. Computer-aided diagnosis (CAD) and other automated image analysis tools may help, but the tools currently do not seem to help all radiologists and are not available for many interpretation tasks (19,20). Perception research may help us to better guide the development of interpretation aids. (b) Because the field of molecular imaging is so new and so different from what radiologists and other clinicians are used to, what viewing formats for molecular imaging are most effective?
Observer Modeling
Background
Observer modeling can be used to assess image quality throughout the imaging chain and to predict observer performance (21–23). Although observer models do not eliminate the need for observer studies, they can be used to narrow the range of experimental conditions that need to be examined with actual observers. In a model of the ideal observer, a theoretical observer takes advantage of all information contained in an image. These models have traditionally served as the basis for evaluating performance of image acquisition. The value of observer models has been underappreciated by the clinical community as shown by the small number of articles published in clinically oriented radiology journals. Greater applicability to clinical radiology has been a recent goal of this field. Newer versions of visual discrimination models more closely simulate the human visual system and recent investigations use more elements of real clinical images and abnormalities rather than just mathematically simulated backgrounds and targets (24).
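A textbook special case shows what an ideal observer computes. The sketch below assumes the simplest setting, a signal that is known exactly and embedded in white Gaussian noise on a known background, where the ideal observer reduces to a matched filter. The function names and the toy signal in the usage note are illustrative assumptions; clinical images do not satisfy these assumptions.

```python
import math


def matched_filter_statistic(image, signal):
    """Decision variable of the ideal observer in this setting: signal template dot image."""
    return sum(g * s for g, s in zip(image, signal))


def ideal_observer_dprime(signal, noise_sigma):
    """Detectability index d' for a known signal in white Gaussian noise (std noise_sigma)."""
    energy = sum(s * s for s in signal)
    return math.sqrt(energy) / noise_sigma


def auc_from_dprime(d_prime):
    """Area under the ROC curve for equal-variance Gaussian decision variables.

    AUC = Phi(d'/sqrt(2)), which equals 0.5 * (1 + erf(d'/2)).
    """
    return 0.5 * (1.0 + math.erf(d_prime / 2.0))
```

For a toy five-pixel signal `[0, 1, 2, 1, 0]` and `noise_sigma = 2.0`, the signal energy is 6, so d' is the square root of 6 divided by 2, and `auc_from_dprime` converts that value into the corresponding area under the ROC curve. Comparing such an upper bound with measured human performance is one way model observers are used to assess image quality.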
New Research Questions
(a) Are observer models useful for studying clinical radiology and radiologists? Can observer models be used to model individual radiologists? (b) Can we use observer models for training? If we could show trainees what an ideal observer would do for each diagnostic modality and task, we might be able to produce more rapid perceptual learning. (c) Most observer modeling is still very basic, using simulated signals in simulated or real backgrounds without visual search. Radiology images rely increasingly on dynamic and complex displays. How can ideal observer models be extended to clinical modalities?
Visual Search
Background
The experienced radiologist processes a substantial amount of diagnostic information in a short time, but not without moving his or her eyes over the images to gather visual information (11–13). Perception researchers have used eye-tracking technology extensively to study visual search in radiology (25–29). If we can understand visual search of medical images, we may be able to explain why errors of diagnosis occur and what can be done to improve performance. We also may be better able to train residents to become expert radiologists (30,31).
New Research Questions
(a) Is eye-tracking technology sufficiently easy to use and robust for routine use in the clinic to provide feedback? (b) Can we begin to study visual search of more complex imaging, such as scrolling through computed tomographic (CT) sections or three-dimensional images? (c) Can CAD be improved by the study of human eye-position recording? Can the focus of CAD on some aspects of images compensate for the lack of attention by human readers?
Display Issues
Background
Ergonomic and human factors issues of displays and display interfaces are becoming more important to radiologists as new technologies emerge and the clinical environment becomes more computer based (32,33). For example, fatigue of radiologists interpreting electronic display images has been considered by a few researchers, but we are only beginning to determine exactly how radiologists and radiology residents are affected (34–36). If they are affected and diagnostic accuracy is impaired as a function of fatigue, we need to understand, for example, whether we need to develop countermeasures at the physiologic level (eg, oculomotor control) or at the decision-making level (eg, improved use of computer-aided decision tools).
New Research Questions
(a) Does the wide variability of the types and configurations of displays offered by different vendors affect the perception of images? For example, does diagnostic accuracy differ as a function of using a 3-megapixel monochrome liquid crystal display versus a 3-megapixel color display? If only color is considered, is diagnostic performance influenced if the color display technology is based on in-plane switching (IPS) using cold cathode fluorescent lamp (CCFL) and light-emitting diode backlights, a vertical alignment using CCFL, or a twisted nematic using CCFL? What are the implications for clinical practice and laboratory research? Perception research should help us understand these variations and optimize the radiology reading environment. (b) Is it worth developing training sets of imaging studies with proved diagnoses to reduce differences among radiologists in detection accuracy?
Technology Evaluation Methodology
Background
Over the years, many improvements in receiver operating characteristic (ROC) methods, as well as other psychophysical and statistical methods, have been introduced at the Far West/MIPS meetings (37–40). Feedback from the MIPS audience has proved useful to developers of new technology evaluation methods because MIPS conference attendees often make extensive use of these methods in their own research.
New Research Initiatives
(a) We need additional ROC designs, such as split-plot and unbalanced designs, and we need to be able to handle missing data. In some multifactor study designs it is not possible to completely randomize the order of the runs or trials within the block or study session, so the randomization becomes restricted. This often results in a generalization of the randomized block design called the split-plot design. Unbalanced designs are studies in which, for example, not all subjects provide responses to every question or not all subjects complete all parts of a study. The result is that there are missing data. In radiology observer studies this can be a common problem, since radiologists are often quite busy and sometimes are not able to complete large studies. Instead of eliminating these readers from the analysis and discarding the data that were acquired, it would be useful to have statistical methods that could deal with the missing data. (b) Scoring limitations of the classic ROC approach are still with us. Patients often have multiple lesions or abnormalities on their images. We are in need of a widely accepted analytic method to cope with multiple responses. Moreover, there are diagnostic tasks that require more than just a binary decision; further exploration of three-class and multiclass decision paradigms is needed. (c) Are there better ways to establish disease state truth?
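The missing-data problem in (a) can be made concrete with a small sketch. It computes the empirical (Wilcoxon) area under the ROC curve for each reader from only the cases that reader actually rated, rather than dropping the reader. The function names, the use of `None` for a missing rating, and the toy ratings are illustrative assumptions. This available-case estimate is not one of the statistical methods called for above: it gives no variance estimate and ignores the reader and case correlations that a multireader analysis must account for.

```python
def empirical_auc(diseased_scores, normal_scores):
    """Wilcoxon (Mann-Whitney) estimate of the area under the ROC curve."""
    wins = 0.0
    for d in diseased_scores:
        for n in normal_scores:
            if d > n:
                wins += 1.0   # diseased case rated higher: correct ordering
            elif d == n:
                wins += 0.5   # tie counts as half
    return wins / (len(diseased_scores) * len(normal_scores))


def reader_auc(ratings, truth):
    """AUC for one reader from the cases that reader rated (None marks a missing rating)."""
    diseased = [r for r, t in zip(ratings, truth) if r is not None and t == 1]
    normal = [r for r, t in zip(ratings, truth) if r is not None and t == 0]
    if not diseased or not normal:
        return None  # not estimable without at least one case of each kind
    return empirical_auc(diseased, normal)


# Toy data: 1 = diseased, 0 = normal; ratings on a 5-point confidence scale.
truth = [1, 1, 1, 0, 0, 0]
reader_a = [5, 4, 3, 2, 1, 3]          # completed every case
reader_b = [4, None, 5, 1, None, 2]    # skipped two cases
```

Here `reader_auc(reader_b, truth)` is estimated from the four cases that reader completed, so the two readers are compared on different case subsets. That mismatch is exactly what purpose-built methods for partially paired data would need to address.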
Benefits to Clinical Radiology
The ultimate goal of researchers involved in the development and evaluation of better hardware and software for the presentation of medical image data to the clinician is not necessarily to provide clinicians with the most “beautiful” image, but rather with the image that allows them to render the most accurate and timely interpretation. It is important to appreciate that clinical image interpretation is a moving target—technology has changed and has altered the nature of image interpretation. Our continued study of medical image perception and the general interaction of clinicians with medical imaging examinations remains a critical element of improving health care.
Acknowledgments
We thank the following discussion leaders for their excellent facilitation of the panel discussions: Craig Abbey, PhD (University of California Santa Barbara); David Channin, MD (Northwestern University); Miguel Eckstein, PhD (University of California Santa Barbara); Stephen Hillis, PhD (University of Iowa); David Manning, PhD (St Martin's College, England); Claudia Mello-Thoms, PhD (University of Pittsburgh).
Received February 7, 2009; revision requested March 16; revision received April 6; final version accepted May 5.
Funding: This research was supported by the National Institutes of Health [grant R13EB007885].
Authors stated no financial relationship to disclose.
References
1. Kundel HL. Medical image perception. Acad Radiol 1995;2(suppl 2):S108–S110
2. Krupinski EA, Kundel HL. Update on long-term goals for medical image perception research. Acad Radiol 1998;5:629–633
3. Krupinski EA, Kundel HL, Judy PF, Nodine CF. The Medical Image Perception Society: key issues for image perception research. Radiology 1998;209:611–612
4. Schober O, Rahbar K, Riemann B. Multimodality molecular imaging: from target description to clinical studies. Eur J Nucl Med Mol Imaging 2009;36:302–314
5. DICOM-14. Digital Imaging and Communications in Medicine (DICOM), part 14: grayscale standard display function. PS 3.14-2006. http://medical.nema.org. Accessed March 27, 2009
6. Blume H. The ACR/NEMA proposal for a grey-scale display function standard. In: Proceedings of SPIE: medical imaging 1996. Vol 2707. Bellingham, Wash: International Society for Optical Engineering, 1996;344–360
7. Blume H, Ho AMK, Stevens F, Steven PM. Practical aspects of grayscale calibration of display systems. In: Proceedings of SPIE: medical imaging 2001. Vol 4323. Bellingham, Wash: International Society for Optical Engineering, 2001;28–41
8. Blume H, Steven P, Cobb M, et al. Characterization of high-resolution liquid crystal displays for medical images, Part I. In: Proceedings of SPIE: medical imaging 2002. Vol 4681. Bellingham, Wash: International Society for Optical Engineering, 2002;271–292
9. Blume H, Steven P, Ho A, et al. Characterization of liquid-crystal displays for medical images: Part 2. In: Proceedings of SPIE: medical imaging 2003. Vol 5029. Bellingham, Wash: International Society for Optical Engineering, 2003;449–473
10. Krupinski EA, Roehrig H. The influence of a perceptually linearized display on observer performance and visual search. Acad Radiol 2000;7:8–13
11. Kundel HL, Nodine CF, Carmody DP. Visual scanning, pattern recognition and decision-making in pulmonary tumor detection. Invest Radiol 1978;13:175–181
12. Thomas EL, Lansdown EL. Visual search patterns of radiologists in training. Radiology 1963;81:288–291
13. Tuddenham WJ, Calvert WP. Visual search patterns in roentgen diagnosis. Radiology 1961;76:255–256
14. Robinson PJ. Radiology's Achilles' heel: error and variation in the interpretation of the roentgen image. Br J Radiol 1997;70:1085–1098
15. Muhm JR, Miller WE, Fontana RS, et al. Lung cancer detection during a screening program using 4-month chest radiographs. Radiology 1983;148:609–615
16. Beam CA, Conant EF, Sickles EA. Correlation of radiologist rank as a measure of skill in screening and diagnostic interpretation of mammograms. Radiology 2006;238:446–453
17. Laming D, Warren R. Improving the detection of cancer in the screening of mammograms. J Med Screen 2000;7:24–30
18. Bhargavan M, Sunshine JH. Workload of radiologists in the United States in 2002–2003 and trends since 1991–1992. Radiology 2005;236:920–931
19. Taylor P, Potts HW. Computer aids and human second reading as interventions in screening mammography: two systematic reviews to compare effects on cancer detection and recall rate. Eur J Cancer 2008;44:798–807
20. Doi K. Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput Med Imaging Graph 2007;31:198–211
21. Barrett HH, Yao J, Rolland JP, Myers KJ. Model observers for assessment of image quality. Proc Natl Acad Sci U S A 1993;90:9758–9765
22. Wagner RF, Brown DG. Unified SNR analysis of medical imaging systems. Phys Med Biol 1985;30:489–518
23. Barrett HH, Myers KJ. Foundations of image science. Hoboken, NJ: Wiley, 2004
24. Johnson JP, Krupinski EA, Nafziger JS, Yan M, Roehrig H. Visually lossless compression of breast biopsy virtual slides for telepathology. In: Proceedings of SPIE: medical imaging 2009. Vol 7263. Bellingham, Wash: International Society for Optical Engineering, 2009;72630N-1–72630N-8
25. Kundel HL, Nodine CF, Krupinski EA, Mello-Thoms C. Using gaze-tracking data and mixture distribution analysis to support a holistic model for the detection of cancers on mammograms. Acad Radiol 2008;15:881–886
26. Kundel HL, Nodine CF, Krupinski EA. Searching for lung nodules: visual dwell indicates locations of false-positive and false-negative decisions. Invest Radiol 1989;24:472–478
27. Kundel HL, Nodine CF, Toto L. Searching for lung nodules: the guidance of visual scanning. Invest Radiol 1991;26:777–781
28. Berbaum KS, Brandser EA, Franken EA, Dorfman DD, Caldwell RT, Krupinski EA. Gaze dwell times on acute trauma injuries missed because of satisfaction of search. Acad Radiol 2001;8:304–314
29. Manning D, Barker-Mill SC, Donovan T, Crawford T. Time-dependent observer errors in pulmonary nodule detection. Br J Radiol 2006;79:342–346
30. Krupinski EA. Visual scanning patterns of radiologists searching mammograms. Acad Radiol 1996;3:137–144
31. Nodine CF, Mello-Thoms C, Kundel HL, Weinstein SP. Time course of perception and decision making during mammographic interpretation. AJR Am J Roentgenol 2002;179:917–923
32. Krupinski EA, Kallergi M. Choosing a radiology workstation: technical and clinical considerations. Radiology 2007;242:671–682
33. Goyal N, Jain N, Rachapalli V. Ergonomics in radiology. Clin Radiol 2009;64:119–126
34. Burling D, Halligan S, Altman DG, Atkin W, Bartram C, et al. CT colonography interpretation times: effect of reader experience, fatigue, and scan findings in a multi-centre setting. Eur Radiol 2006;16:1745–1749
35. Vertinsky T, Forster B. Prevalence of eye strain among radiologists: influence of viewing variables on symptoms. AJR Am J Roentgenol 2005;184:681–686
36. Krupinski EA, Berbaum KS, Caldwell R. Impact of visual fatigue on observer performance. In: Proceedings of SPIE: medical imaging 2009. Vol 7263. Bellingham, Wash: International Society for Optical Engineering, 2009;72631O-1–72631O-8
37. Chakraborty DP, Yoon HJ. JAFROC analysis revisited: figure-of-merit considerations for human observer studies. In: Proceedings of SPIE: medical imaging 2009. Vol 7263. Bellingham, Wash: International Society for Optical Engineering, 2009;72630T-1–72630T-12
38. Paquerault S, Samuelson FW, Myers KJ, Smith RC. Non-localization and localization ROC analyses using clinically based scoring. In: Proceedings of SPIE: medical imaging 2009. Vol 7263. Bellingham, Wash: International Society for Optical Engineering, 2009;72630U-1–72630U-9
39. Gallas BD, Pesce LL. Comparison of ROC methods for partially paired data. In: Proceedings of SPIE: medical imaging 2009. Vol 7263. Bellingham, Wash: International Society for Optical Engineering, 2009;72630V-1–72630V-12
40. Hillis SL, Berbaum KS, Metz CE. Recent developments in the Dorfman-Berbaum-Metz procedure for multireader ROC study analysis. Acad Radiol 2008;15:647–661