Although a reader may have a personal preference for a specific CT colonographic interpretation technique, with proper training for use of two-dimensional and three-dimensional methods, comparable performance can be achieved.
Abstract
Purpose:
To determine whether the reader’s preference for a primary two-dimensional (2D) or three-dimensional (3D) computed tomographic (CT) colonographic interpretation method affects performance when using each technique.
Materials and Methods:
In this institutional review board–approved, HIPAA-compliant study, images from 2531 CT colonographic examinations were interpreted by 15 trained radiologists by using colonoscopy as a reference standard. Through a survey at study start, study end, and 6-month intervals, readers were asked whether their interpretive preference in clinical practice was to perform a primary 2D, primary 3D, or both 2D and 3D interpretation. Readers were randomly assigned a primary interpretation method (2D or 3D) for each CT colonographic examination. Sensitivity and specificity of each method (primary 2D or 3D), for detecting polyps of 10 mm or larger and 6 mm or larger, based on interpretive preference were estimated by using resampling methods.
Results:
Little change was observed in readers’ preferences when comparing them at study start and study end, respectively, as follows: primary 2D (eight and seven readers), primary 3D (one and two readers), and both 2D and 3D (six and six readers). Sensitivity and specificity, respectively, for identifying examinations with polyps of 10 mm or larger for readers with a primary 2D preference (n = 1128 examinations) were 0.84 and 0.86, which was not significantly different from 0.84 and 0.83 for readers who preferred 2D and 3D (n = 1025 examinations) or from 0.76 and 0.82 for readers with a primary 3D preference (n = 378 examinations). When performance by using the assigned 2D or 3D method was evaluated on the basis of 2D or 3D preference, there was no difference among those readers by using their preferred versus not preferred method of interpretation. Similarly, no significant difference among readers or preferences was seen when performance was evaluated for detection of polyps of 6 mm or larger.
Conclusion:
The reader’s preference for interpretive method had no effect on CT colonographic performance.
© RSNA, 2011
Introduction
One of the major issues limiting widespread acceptance of computed tomographic (CT) colonography is the lack of consistently high performance in multiple studies (1–5). The two largest prospective studies to date had excellent results for detecting polyps 10 mm and larger, with sensitivity of 90%–94% and specificity of 86%–96%, by using colonoscopy as a reference standard (1,2). Researchers in other studies, however, have reported sensitivity as low as 55% (4,5). Several possible reasons for this variability include lack of adequate CT colonographic training, lack of stool tagging, suboptimal scanning or interpretation technique, and differences in the patient populations studied.
Currently, training in CT colonographic interpretation is variable, with no consensus about how images should best be evaluated. In fact, there are often two camps of CT colonographic readers: those who prefer a primary two-dimensional (2D) approach and those who prefer a primary three-dimensional (3D) approach. Because lesions can be missed with either technique, some readers may prefer to perform both 2D and 3D evaluations. To our knowledge, no researchers to date have evaluated whether the interpretive preferences of CT colonographic readers affect readers’ performance. For instance, if a reader prefers a primary 2D approach, is that reader’s performance actually better with the use of a primary 2D approach compared with a primary 3D approach? Although the National CT Colonography Trial study results indicated there was no difference in performance by using a primary 2D or a primary 3D technique (1), there was no analysis of whether readers’ preferences affected performance.
The purpose of this study was to determine whether the reader’s preference for a primary 2D or a primary 3D CT colonographic interpretation method affects performance when using each technique.
Materials and Methods
This study was supported by a financial grant from the National Cancer Institute in cooperation with the American College of Radiology Imaging Network (ACRIN). Two authors (C.D.J. and A.K.H.) report holding patents and license agreements with and receiving royalties from GE Healthcare (Waukesha, Wis), which produces CT colonographic software. One author (A.K.H.) did not use GE Healthcare software for data interpretation. Authors who had no conflict of interest had control of the data and the information submitted for publication.
Patient Population and Reference Standard
Asymptomatic participants 50 years or older who were prescheduled for routine colonoscopy were recruited from 15 participating sites across the United States in the ACRIN Trial between February 2005 and December 2006 (1). All sites complied with the Health Insurance Portability and Accountability Act, and approval was obtained from the institutional review board at each site. Enrolled participants completed a clinical CT colonographic examination, followed by colonoscopy.
A participant with a positive CT colonographic test result was defined as a participant with any CT colonographic finding of 5 mm or larger. A participant with a true-positive result, by using colonoscopy as the reference standard, was defined as a participant with at least one histologically confirmed adenoma or adenocarcinoma that was 10 mm or larger or 6 mm or larger, depending on the size threshold analyzed. Lesion size was determined from the pathology report, unless the lesion was not wholly resected; for these lesions, colonoscopy-derived size estimates were used.
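The per-participant definitions above can be illustrated with a minimal sketch. It is a simplified reading of the stated definitions, not the trial's actual analysis code; the `Participant` record, its field names, and the function name are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record; both field names are assumptions for this sketch.
    largest_ctc_finding_mm: float       # 0 if CT colonography showed no finding
    largest_confirmed_lesion_mm: float  # 0 if no confirmed adenoma or adenocarcinoma


def sensitivity_specificity(participants, lesion_threshold_mm, ctc_threshold_mm=5.0):
    """Per-participant sensitivity and specificity against the colonoscopy reference."""
    tp = fn = tn = fp = 0
    for p in participants:
        test_positive = p.largest_ctc_finding_mm >= ctc_threshold_mm
        disease_positive = p.largest_confirmed_lesion_mm >= lesion_threshold_mm
        if disease_positive and test_positive:
            tp += 1
        elif disease_positive:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Calling the function once with `lesion_threshold_mm=10` and once with `lesion_threshold_mm=6` would correspond to the two size thresholds reported in this study.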
CT Colonographic Readers
Fifteen radiologists (nine men, six women; mean age, 43 years; range, 34–55 years) from academic (n = 12) and private (n = 3) practices participated in the ACRIN Trial. The readers’ experience was as follows: Seven readers had extensive experience (>200 CT colonographic cases), four readers had moderate experience (100–200 CT colonographic cases), and four readers had minimal experience (<100 CT colonographic cases). All readers were required to complete a training test before the study to demonstrate their ability to detect 90% of polyps 1 cm and larger in size.
CT colonographic images were evaluated by using one of five self-chosen software platforms (Vital Images, Minnetonka, Minn; Siemens, Malvern, Pa; GE Healthcare; TeraRecon, Foster City, Calif; Viatronix, Stony Brook, NY) and a randomly assigned primary image review method. The two methods used were a primary 2D interpretation with 3D problem solving or a primary 3D interpretation with 2D problem solving.
Survey of Readers’ Preferences
Readers reported their primary interpretation preference through a survey at study activation, every 6 months, and at study end. In addition, if the readers changed their interpretation preference at different intervals during the study, this change was also reported at the time the change occurred. In this way, the reader’s primary interpretive preference could retrospectively be linked to performance for individual cases.
The survey question was as follows: When performing CT colonography in clinical practice (outside the study), do you prefer: (a) primary 2D evaluation, (b) primary 3D evaluation, (c) both — complete 2D and 3D evaluation, or (d) other?
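The retrospective linkage between survey responses and individual reads can be sketched as follows. This sketch assumes that the most recent response recorded on or before the interpretation date applies to a given read; the article does not spell out the exact rule, and the function name and data layout are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def preference_on(read_date, survey_history):
    """Return the preference in force on read_date.

    survey_history is a list of (response_date, preference) tuples sorted by
    date. Returns None if the read predates the first recorded response.
    """
    response_dates = [d for d, _ in survey_history]
    i = bisect_right(response_dates, read_date)
    if i == 0:
        return None
    return survey_history[i - 1][1]
```

For example, a reader who answered "primary 2D" at study activation and reported a change to "both" six months later would have every read in between linked to "primary 2D".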
Comparison of Readers’ Performance on the Basis of Preference
In a previous analysis of these data, it was shown that there was no difference in performance whether a primary 2D or 3D method was used (1). For this study, we evaluated whether there was a difference in performance on the basis of the reader’s preference for performing a primary 2D or a primary 3D read or both. For example, do readers who prefer a primary 3D evaluation outperform readers who prefer a primary 2D evaluation? To answer this question, polyp detection results obtained from the primary study were retrospectively classified into three groups on the basis of preference: (a) readers with a stated preference for a primary 2D interpretation, (b) readers with a stated preference for a primary 3D interpretation, and (c) readers with a stated preference to perform both a 2D and 3D interpretation. Sensitivity and specificity for detecting polyps of 10 mm or larger and 6 mm or larger were compared among the groups.
Subanalysis of Readers’ Performance within Each Preference Group
A second analysis was performed to determine whether readers performed better by using their preferred versus their nonpreferred method. In other words, do readers who have a preference for a primary 2D evaluation actually perform better by using 2D rather than 3D? For this analysis, we compared the performance of all readers in each stated preference group (2D, 3D, or both) for those examinations in which they were assigned a primary 2D read versus their performance for examinations in which they were assigned a primary 3D read.
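One way to picture this subanalysis is as a cross-classification of reads by stated preference and randomly assigned method. The sketch below computes sensitivity per cell from a flat list of reads; the tuple layout and function name are assumptions for illustration, and it ignores the reader-level resampling described under Statistical Analysis.

```python
from collections import defaultdict


def sensitivity_by_cell(reads):
    """Sensitivity for each (preference, assigned_method) cell.

    reads is an iterable of hypothetical tuples:
    (preference, assigned_method, disease_positive, test_positive).
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [detected, disease-positive reads]
    for preference, method, disease_positive, test_positive in reads:
        if disease_positive:
            cell = counts[(preference, method)]
            cell[1] += 1
            if test_positive:
                cell[0] += 1
    return {key: detected / total for key, (detected, total) in counts.items()}
```

Comparing the ("2D", "2D") cell with the ("2D", "3D") cell, for instance, corresponds to asking whether readers who prefer primary 2D do better with their preferred method.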
Statistical Analysis
Individual preferences were linked to each local read according to date of interpretation and date of questionnaire completion. The effect of reader’s preference on accuracy was assessed from all reads linked to a particular preference for each reader. To summarize diagnostic accuracy according to the reader’s preference while accounting for reader correlation, we used resampling methods. We randomly sampled with replacement up to 15 reader estimates and recorded their arithmetic mean. In addition, differences in the mean estimates for each preference were calculated. These bootstrap procedures were repeated 10 000 times for sensitivity and specificity of all preferences and their differences. The point estimate and 95% confidence intervals (CIs) are the 5000th, 250th and 9750th estimates from the ordered mean estimates. Statistical analysis was performed by using software (SAS, version 9.1; SAS Institute, Cary, NC), and a P value of .05 or less was considered to indicate a significant difference.
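The resampling procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the SAS code used in the study: it resamples reader-level estimates with replacement, averages each resample, and reads the point estimate and 95% CI off the ordered means (the 5000th, 250th, and 9750th values when 10 000 replicates are drawn). The function name, the seed, and the use of Python's standard `random` module are choices made for this sketch.

```python
import random


def bootstrap_mean(reader_estimates, n_boot=10_000, seed=0):
    """Return (point estimate, lower 95% limit, upper 95% limit).

    reader_estimates holds one sensitivity or specificity value per reader
    (up to 15 in this study). n_boot should be at least 40 so that the
    lower index is valid.
    """
    rng = random.Random(seed)
    k = len(reader_estimates)
    means = sorted(
        sum(rng.choices(reader_estimates, k=k)) / k for _ in range(n_boot)
    )
    # Integer arithmetic avoids floating-point surprises in the indices.
    point = means[n_boot // 2 - 1]
    lower = means[n_boot * 25 // 1000 - 1]
    upper = means[n_boot * 975 // 1000 - 1]
    return point, lower, upper
```

Resampling at the reader level, rather than at the examination level, is what lets the interval reflect correlation among reads by the same reader.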
Results
Interpretive Preference Groups
Readers who stated a preference for primary 2D read the most CT colonographic studies (n = 1128), followed by readers who preferred 2D and 3D (n = 1025). The fewest studies were interpreted by readers who stated a preference for primary 3D (n = 378).
At both study start and study end, the readers were nearly evenly divided between those who preferred a primary 2D interpretation method (eight at study start, seven at study end) and those who preferred a complete 2D and 3D evaluation (six at study start, six at study end) (Table 1). The fewest readers preferred a primary 3D evaluation (one at study start, two at study end).
Table 1.
Note.—The readers were nearly evenly divided between those who preferred a primary 2D interpretation and those who preferred a complete 2D and 3D evaluation combined. For primary 2D, eight readers preferred the method at study start, and seven readers preferred the method at study end. For primary 3D, one reader preferred the method at study start, and two readers preferred the method at study end. For complete 2D and 3D evaluation combined, six readers preferred the method at study start, and six readers preferred the method at study end.
Comparison of Readers’ Performance on the Basis of Preferences
Our results showed that there was no significant difference in overall performance on the basis of readers’ preferences (Tables 2–4). The sensitivity and specificity values, respectively, for identifying examinations with polyps 10 mm or larger for readers with a primary 2D preference (n = 1128 examinations) were 0.84 and 0.86, and these values were not significantly different from the values of 0.84 and 0.83 for readers with a 2D and 3D preference (n = 1025 examinations) or from the values of 0.76 and 0.82 for readers with a primary 3D preference (n = 378 examinations).
Table 2.
Note.—Resampling methods were used to calculate sensitivity and specificity. No significant differences were observed between 2D and 3D performance within each preference. No significant differences were observed among readers who preferred primary 2D, readers who preferred primary 3D, and readers who preferred both 2D and 3D combined per assigned interpretation method.
Bootstrap method was used to determine 95% CIs.
Table 4.
Note.—Reader preference had no effect on performance. Resampling methods were used to calculate differences in sensitivity and specificity. No significant differences were observed among readers who preferred primary 2D, readers who preferred primary 3D, and readers who preferred both 2D and 3D combined per assigned interpretation method.
Bootstrap method was used to determine 95% CIs.
For examinations with polyps 6 mm or larger (Table 3), sensitivity was lower (0.70–0.75) but specificity was stable (0.84–0.88) for each preference group compared with examinations with polyps of 10 mm or larger.
Table 3.
Note.—Reader preference had no effect on performance. Resampling methods were used to calculate sensitivity and specificity. No significant differences between 2D and 3D performance within each preference were observed. No significant differences were observed among readers who preferred primary 2D, readers who preferred primary 3D, and readers who preferred both 2D and 3D combined per assigned interpretation method.
Bootstrap method was used to determine 95% CIs.
Subanalysis of Readers’ Performance in Each Preference Group
The reader’s preference for a primary interpretation method also did not have a significant effect on performance when either the preferred or nonpreferred technique was used for polyps 10 mm or larger (Table 2) or 6 mm or larger (Table 3). Readers who preferred a primary 2D method performed similarly for detection of polyps 10 mm or larger whether they were assigned a primary 2D (sensitivity, 0.88; specificity, 0.87) or primary 3D (sensitivity, 0.80; specificity, 0.85) interpretation. For readers with a primary 3D preference, there was higher sensitivity but lower specificity by using a primary 3D approach (for 2D, sensitivity of 0.69 and specificity of 0.91; for 3D, sensitivity of 0.89 and specificity of 0.80), but differences were not significant. For readers who preferred a complete 2D and 3D evaluation, sensitivity was higher with a primary 2D approach (for 2D, sensitivity of 0.90 and specificity of 0.84; for 3D, sensitivity of 0.79 and specificity of 0.83), but again the values were not significantly different from the values for the primary 3D approach.
When evaluation of the detection of smaller polyps was considered, the lowest sensitivity (0.63) was seen for readers with a preference for a primary 2D interpretation who were assigned a primary 2D interpretation.
Discussion
Researchers in prior studies of CT colonographic readers have primarily focused on the reader’s performance by using an assigned technique (2D or 3D, reduced colon preparations, virtual dissection, computer-aided detection) (6–12) or the causes of the reader’s error (13–16). Studies about the evaluation of primary 2D versus primary 3D CT colonographic interpretation have had conflicting results. In a previous publication with the data used in this study, it was shown that there were no differences in CT colonographic performance when primary 2D with 3D problem solving or primary 3D with 2D problem solving was used (1). This confirms the finding of a previous study in which the investigators demonstrated that a combined 2D and 3D evaluation was best (11). Results of other studies, however, have suggested that a primary 3D approach is the superior diagnostic method (2,9). This controversy about which is the best diagnostic approach has led to heterogeneous training of CT colonographic readers who, depending on the training program chosen, may be taught that a primary 2D or a primary 3D approach is most accurate.
The superiority of primary 2D or primary 3D CT colonographic interpretation remains unclear, as there are many factors that can affect this evaluation. As mentioned previously, the training program may affect performance. For example, radiologists trained with a primary 3D approach may be better at the use of that technique than at the use of a primary 2D approach simply because they were not well trained with 2D evaluation. The 3D workstation may also affect results. The method that is more time consuming or difficult to use with that workstation may result in increased reader’s fatigue, thus yielding an inferior performance. To eliminate these potential biases for the ACRIN trial, all readers were trained equally by using primary 2D and 3D techniques (17). In addition, unlike in previous studies (3), readers were allowed to use the workstation they used in clinical practice so that the performance issues associated with learning a different workstation did not occur.
Another potential factor influencing performance with a primary 2D or a primary 3D approach may be the reader’s preference, which, to our knowledge, has not been previously evaluated. In discussions among radiologists who perform CT colonography, radiologists are often asked which method they prefer—primary 2D, primary 3D, or both combined. Prior to this study, to our knowledge, there were no data to evaluate whether the reader’s preferred technique actually translated to improved performance. Intuitively, one would assume that a reader’s preferred approach was more reliable or efficient than the alternative for that reader, but does that mean that diagnostic performance is actually better?
When we evaluated all readers in our study, the preferred primary diagnostic method did not translate to significantly improved results. In other words, readers who preferred a primary 2D approach had equivalent diagnostic performance compared with readers who preferred a primary 3D approach or those who preferred both 2D and 3D combined. One reason why no link between preference and performance was found may be the extensive training and testing of the readers with both 2D and 3D techniques prior to participation in the ACRIN trial (1,17). It is possible that training in both approaches can trump any advantages related to software or diagnostic preference. In addition, the readers of this study were more heterogeneous than were the readers in previous studies, including academic and private practice radiologists and both inexperienced and experienced readers. For single-site studies, inherent bias in CT colonographic training and accepted diagnostic approach may lead to false assumptions that the diagnostic approach is superior when in fact there may be suboptimal training or experience with alternate approaches.
Although the overall performance in the comparison of preference groups was not different, there was a trend toward slightly higher sensitivity when readers used their preferred technique. For example, readers with a primary 2D preference did have slightly higher sensitivity when they used a primary 2D versus a primary 3D approach (0.88 vs 0.80); however, this finding was true only for polyps 10 mm and larger and not for smaller lesions. For readers with a primary 3D preference, the sensitivity for lesions of all sizes was higher by using a primary 3D evaluation compared with a 2D evaluation but at the expense of a lower specificity. For readers who preferred to use both 2D and 3D combined, there was a slightly higher sensitivity with a primary 2D approach. Ultimately, however, none of these differences were significant. Therefore, for the readers in this study, their self-identification as a primary 2D or a primary 3D reader or a reader of both combined was not truly related to their ability to detect or exclude polyps with that technique. This finding leads to the question, if preference is not related to performance, why do readers prefer one technique versus the other? While not addressed in this study, possible reasons may be the reader’s experience with 2D or 3D images, familiarity with 3D workstations, or personal experiences. For example, some readers may have a perception that they have missed more polyps in clinical practice by using 3D and therefore personally favor a 2D approach, or vice versa. It can be reassuring for CT colonographic readers that, if equivalent training with both 2D and 3D techniques is achieved, performance with both techniques will also likely be equivalent.
The main limitation of this study is that the results are based on a reader’s preference survey administered at different times rather than on an assessment of the reader’s preference at each examination. It was not possible within the constraints of the study design to administer this survey for each CT colonographic interpretation. Because readers noted each time their preference changed during the course of the study, however, we believe we could accurately link preference and study method retrospectively. It is possible, however, that readers did change their preferred technique without noting it on the survey form, leading to potential errors in analysis. This analysis pools cases within readers’ preference, with no special consideration for the effect of the reader. Another criticism may be that the workstation used could affect 2D and 3D performance and change the results of this study. In the original manuscript (1), however, no differences in performance on the basis of the workstation used were identified, so this factor was unlikely to affect the results of this study. Finally, our sample size and distribution were sufficient to observe large differences in sensitivity (about 30%) and substantial differences in specificity (about 20%), as determined with the 95% CIs (determined with the bootstrap method) of the difference. Although we may not have the power for smaller and more clinically relevant differences, we are assured that any major differences in diagnostic accuracy by using the reader’s preference for 2D and/or 3D review were not missed.
In summary, this study demonstrates that there was no difference in performance among readers who preferred a primary 2D interpretation technique, readers who preferred a primary 3D interpretation technique, or readers who preferred an interpretation technique with both 2D and 3D combined. In addition, readers did not perform substantially better by using their preferred or nonpreferred technique. The results of this study demonstrate that, although a reader may have a personal preference for a specific CT colonographic interpretation technique, with proper training on the use of 2D and 3D methods, comparable performance can be achieved.
Advances in Knowledge.
For CT colonographic examinations with polyps 6 mm or larger and 10 mm or larger, sensitivity and specificity, respectively, were not significantly different when readers with a preference for primary 2D (0.84 and 0.86), a preference for primary 3D (0.76 and 0.82), or a preference for both 2D and 3D (0.84 and 0.83) were compared.
There is no significant difference in CT colonographic performance when readers use their preferred versus their nonpreferred technique.
Implication for Patient Care.
Images from CT colonographic examinations can be interpreted comparably by readers with different interpretation preferences.
Disclosures of Potential Conflicts of Interest: A.H. Financial activities related to the present article: none to disclose. Financial activities not related to the present article: receives royalties for CT colonography software license from GE Healthcare and grants related to the CT750HD scanner not used for this study from GE Healthcare. Other relationships: none to disclose. M.B. Financial activities related to the present article: receives money from institutional grant to Brown University. Financial activities not related to the present article: none to disclose. Other relationships: none to disclose. M.H.C. No potential conflicts of interest to disclose. A.D. Financial activities related to the present article: none to disclose. Financial activities not related to the present article: none to disclose. Other relationships: teaches virtual colonoscopy courses for GE Healthcare. M.D.K. No potential conflicts of interest to disclose. C.O.M. No potential conflicts of interest to disclose. B.S. No potential conflicts of interest to disclose. J.I.C. Financial activities related to the present article: receives grant to Yale University paid by ACRIN. Financial activities not related to the present article: none to disclose. Other relationships: none to disclose. R.G.O. No potential conflicts of interest to disclose. J.L.F. No potential conflicts of interest to disclose. P.Z. Financial activities related to the present article: none to disclose. Financial activities not related to the present article: none to disclose. Other relationships: none to disclose. K.M.H. Financial activities related to the present article: none to disclose. Financial activities not related to the present article: receives Siemens educational grant for Virtual Colonoscopy Web site. Other relationships: none to disclose. K.C. No potential conflicts of interest to disclose. R.B.I. No potential conflicts of interest to disclose. R.A.H. No potential conflicts of interest to disclose. G.C. Financial activities related to the present article: receives grant paid to University of California San Diego. Financial activities not related to the present article: none to disclose. Other relationships: none to disclose. J.Y. No potential conflicts of interest to disclose. B.A.H. Financial activities related to the present article: none to disclose. Financial activities not related to the present article: none to disclose. Other relationships: none to disclose. C.D.J. Financial activities related to the present article: none to disclose. Financial activities not related to the present article: receives grants or grants pending from GE Healthcare, receives money for patents (planned, pending or issued) and royalties from GE Healthcare for CT colonography software, receives payment for development of educational presentations from GE Healthcare. Other relationships: none to disclose.
Acknowledgments
The authors acknowledge the contributions of the participants in the original National CT Colonography Trial (ACRIN 6664).
Received February 4, 2010; revision requested March 16; final revision received September 17; accepted September 27; final version accepted October 4.
Funding: This research was supported by the National Institutes of Health (grant U01 CA79778 S2).
Abbreviations:
- ACRIN: American College of Radiology Imaging Network
- CI: confidence interval
- 3D: three-dimensional
- 2D: two-dimensional
References
- 1. Johnson CD, Chen MH, Toledano AY, et al. Accuracy of CT colonography for detection of large adenomas and cancers. N Engl J Med 2008;359(12):1207–1217
- 2. Pickhardt PJ, Choi JR, Hwang I, et al. Computed tomographic virtual colonoscopy to screen for colorectal neoplasia in asymptomatic adults. N Engl J Med 2003;349(23):2191–2200
- 3. Johnson CD, Toledano AY, Herman BA, et al. Computerized tomographic colonography: performance evaluation in a retrospective multicenter setting. Gastroenterology 2003;125(3):688–695
- 4. Rockey DC, Paulson E, Niedzwiecki D, et al. Analysis of air contrast barium enema, computed tomographic colonography, and colonoscopy: prospective comparison. Lancet 2005;365(9456):305–311
- 5. Cotton PB, Durkalski VL, Pineau BC, et al. Computed tomographic colonography (virtual colonoscopy): a multicenter comparison with standard colonoscopy for detection of colorectal neoplasia. JAMA 2004;291(14):1713–1719
- 6. Summers RM, Frentz SM, Liu J, et al. Conspicuity of colorectal polyps at CT colonography: visual assessment, CAD performance, and the important role of polyp height. Acad Radiol 2009;16(1):4–14
- 7. Petrick N, Haider M, Summers RM, et al. CT colonography with computer-aided detection as a second reader: observer performance study. Radiology 2008;246(1):148–156
- 8. Johnson CD, Manduca A, Fletcher JG, et al. Noncathartic CT colonography with stool tagging: performance with and without electronic stool subtraction. AJR Am J Roentgenol 2008;190(2):361–366
- 9. Pickhardt PJ, Lee AD, Taylor AJ, et al. Primary 2D versus primary 3D polyp detection at screening CT colonography. AJR Am J Roentgenol 2007;189(6):1451–1456
- 10. Kim SH, Lee JM, Eun HW, et al. Two- versus three-dimensional colon evaluation with recently developed virtual dissection software for CT colonography. Radiology 2007;244(3):852–864
- 11. Johnson CD, Fletcher JG, MacCarty RL, et al. Effect of slice thickness and primary 2D versus 3D virtual dissection on colorectal lesion detection at CT colonography in 452 asymptomatic adults. AJR Am J Roentgenol 2007;189(3):672–680
- 12. de Vries AH, Jensch S, Liedenbaum MH, et al. Does a computer-aided detection algorithm in a second read paradigm enhance the performance of experienced computed tomography colonography readers in a population of increased risk? Eur Radiol 2009;19(4):941–950
- 13. Pickhardt PJ. Missed lesions at primary 2D CT colonography: further support for 3D polyp detection [letter]. Radiology 2008;246(2):648–649
- 14. Dachman AH. Response. Radiology 2008;246(2):649
- 15. Slater A, Taylor SA, Tam E, et al. Reader error during CT colonography: causes and implications for training. Eur Radiol 2006;16(10):2275–2283
- 16. Gluecker TM, Fletcher JG, Welch TJ, et al. Characterization of lesions missed on interpretation of CT colonography using a 2D search method. AJR Am J Roentgenol 2004;182(4):881–889
- 17. Fletcher JG, Chen MH, Herman BA, et al. Can radiologist training and testing ensure high performance in CT colonography? lessons from the National CT Colonography Trial. AJR Am J Roentgenol 2010;195(1):117–125