Abstract
To identify discrepancies in fludeoxyglucose positron emission tomography/computed tomography (FDG PET/CT) reports generated by general radiologists and subspecialized oncological radiologists for patients with diffuse large B-cell lymphoma (DLBCL), and to assess if such discrepancies impact patient management.
Two radiologists retrospectively reviewed 72 PET/CT scans of patients with DLBCL referred to our institutions between 2009 and 2011, and recorded the discrepancies between the outside and second-opinion reports regarding multiple preset criteria using kappa statistic (Κ), including the disease stage. A multidisciplinary staging that considered all patient clinical data, pathology, and follow up radiological scans, was considered as standard of reference. A hemato-oncologist, blinded to the reports’ origin, subjectively graded the quality and structure of these reports for each patient to determine if clinical stage and disease activity could be derived accurately from these reports.
Agreement was not, or slightly, achieved between the reports regarding the binary and multilevel criteria (Κ < 0–0.2 and weighted Κ = 0.082, respectively). Second-opinion reviews of PET/CT scans were concordant with the multidisciplinary staging in 78% of cases with an almost perfect agreement (Κ = 0.860). A change in staging was demonstrated in 36% of cases. In addition, 68% of second-opinion reports were assigned the highest grades on quality (grades 4 and 5) by the hemato-oncologist, compared with 15% of outside reports, with no noted agreement (weighted Κ = –0.007).
Second-opinion review of PET/CT scans by sub-specialized oncological radiologists increases accuracy of initial staging, posttreatment evaluation and also the clinical relevance of the radiology reports.
Keywords: diffuse large B-cell lymphoma, FDG PET/CT, non-Hodgkin lymphoma, patient management, radiology report, second opinion
1. Introduction
Recent advances in chemotherapy, immunotherapy and bone marrow transplant for the treatment of patients with lymphoma have placed renewed emphasis on accurate staging and characterization of disease activity and response.[1,2] Imaging with fluorodeoxyglucose positron emission tomography-computed tomography (FDG PET/CT) is now an integral part of the staging and management of most patients with lymphoma.[3,4] In daily practice, these PET/CT scans are interpreted by radiologists or nuclear medicine physicians with a variety of backgrounds and subspecialty training. As in other malignancies, the criteria for staging and response assessment for lymphoma have changed over the past decades,[5,6] and for physicians interpreting these scans it is imperative to be familiar with such criteria to generate a clinically meaningful radiology report. Accurate reporting also requires familiarity with the various imaging patterns in which a disease can present on cross-sectional imaging for the interpreting radiologist. Sometimes, patients present to a secondary or tertiary care center with imaging studies and radiology reports generated in the community or at other institutions. In our institution, it is customary that such imaging studies are stored in the picture archiving and communication system (PACS) and presented to staff radiologists or nuclear medicine physicians for official interpretation prior to any management decision. This same standard applies to pathology specimens and outside pathology reports studies. Indeed, review and re-interpretation of pathology specimens in dedicated hemato-pathologists has been shown to cause a change in diagnosis in 17.8% and 16.4% of patients in years 2001 and 2006, respectively, in lymphoma patients.[7] Similarly, several studies have suggested that interpretation of radiologic imaging studies by expert radiologists is more accurate than interpretation by radiologists without subspecialization,[8–10] and that such re-interpretation of imaging studies may have significant impact on patient stage and management. In the current analysis, we chose to study lymphoma patients as this disease has the largest number of submitted outside PET/CTs to our department. We focused more specifically on patients with reliably FDG avid lymphoma histological subtype, diffuse large B-cell lymphoma (DLBCL), and investigated if review of outside FDG PET/CT studies by subspecialized oncoradiologists would alter patient stage and have implication for treatment response evaluation. We particularly focused on the accurate description of clinical stages and disease activity, using modern classification schemes.[6,11]
2. Methods
2.1. Case selection
This study was approved by the institutional review board with a waiver of informed consent. The analysis was compliant with the Health Insurance Portability and Accountability Act.
PET/CT scans were performed and reported in a variety of outside institutions (including private practice clinics, community hospitals, academic hospitals, and less frequently cancer centers).
Imaging studies and the accompanying radiology report were submitted to our institutional PACS. As shown in Figure 1, we conducted a retrospective data base search using the following inclusion criteria: biopsy-proven DLBCL; outside FDG PET/CT between the years 2009 and 2011 with at least 1 year of radiological and clinical follow-up (recent enough to ensure use of modern PET/CT technology but distant enough to ensure sufficient follow-up); available outside PET/CT report; ability to properly display outside studies and fuse raw axial CT and PET data using our reader software (GE healthcare volume viewer AW server 3.2), and the ability to extract the maximum standardized uptake values (SUVmax) from these imaging data. From the initial sample of 169 patients and scans collected in these 3 years, 72 were found eligible for this current analysis.
2.2. Data review and analysis
In this study, the reports from outside institutions will be referred to as “outside” reports whereas the reports from our institution will be referred to “second-opinion” reports.
Two sub-specialized oncoradiologists (PS and KR) with 6 years of experience and 2 fellowships training in nuclear medicine and oncological imaging, reviewed retrospectively the 72 scans, the original reports and the reports generated by staff radiologists/nuclear medicine physicians at our institution, and evaluated if these reports addressed critical clinical questions, including: sites of disease with exact description of location and size, the intensity of FDG uptake, appropriate classification of disease extent to enable assigning a clinical stage according to Ann Arbor classification, and appropriate measurement and description of residual FDG uptake during or after therapy to enable characterization of response using current recommendations .[6] The 2 radiologists also recorded whether radiology reports contained SUV number standardized reference regions in mediastinum and liver (needed for quality assurance and response assessment), and evaluated the level of confidence with which the outside report distinguished between malignant and benign etiology, elaborated on differential diagnoses, and further recommended management.
All the PET/CT studies were form centers where the radiologists are involved in a wide range of diagnostic imaging studies. None of the radiologists was a dedicated oncological imaging radiologist. We defined a radiologist as a subspecialized oncoradiologist, if he had at least 1 year of oncological imaging fellowship training.
A multidisciplinary staging considering all patients clinical, pathological, and imaging data, and clinical and imaging follow-up was considered as the standard of reference.
In addition, a single hemato-oncologist with 7 years’ experience in managing and treating lymphomas compared the text of the outside PET/CT report to the report generated by our institutional specialized oncologic radiologist or nuclear medicine physician without any information regarding the origin of the report. This physician recorded the clinical stage based on the information that could be derived from the report text and graded the report clarity on a scale from 1 to 5, with “grade 1” being poor, “grade 2” being fair, “grade 3” being average, “grade 4” being good and “grade 5” being excellent. The grade was a subjective measure of the report structure, addressing how easily and accurately the hemato-oncologist was able to extract the information needed for proper staging, restaging and subsequent management.
2.3. Statistical analysis
Kappa statistic (Κ) was used to assess the agreement on PET/CT features from outside and second-opinion reports. Weighted kappa with squared weights was used for features with multiple levels, including grade, mediastinum (yes, no, and unknown) and staging. We then compared the likelihood of the criteria present in the report, the staging, and the grade between the outside reports and second-opinion reports using McNemar test for 2-level features and Bowker's test of symmetry for multiple-level features, respectively. Agreement between radiology staging in each report and the multidisciplinary standard of reference staging was also assessed using weighted kappa. Kappa statistic is interpreted as follows: < 0, no agreement, 0.00–0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; and 0.81–1.00, almost perfect agreement. A test with P < .05 was considered statistically significant. All statistical analyses were performed in software packages SAS 9.4 (SAS Institute Inc., Cary, NC).
3. Results
Of the 72 PET/CT scans analyzed between the years 2009 and 2011, 35 were obtained for initial-staging and 37 were obtained during follow-up and for assessment of response to treatment.
3.1. Head to head comparison of the second-opinion and outside readings
Differences between the second-opinion and outside readings regarding the studied criteria are shown in Table 1. The second-opinion reviews showed no agreement or slight agreement with outside reports for most of the binary criteria (Table 1). For example, there was no agreement for accurate anatomic description of anatomical sites of disease (Κ = –0.054). In addition, the likelihood of presenting each criterion was different between the outside and second-opinion reading (P < .02), except for size and SUVmax measurements (P = .695 and .835 respectively).
Table 1.
3.2. Comparison of quality grading and radiological staging of second-opinion and outside reports
Table 2 shows the discordance in quality of reports generated by second-opinion readers versus outside readers and the related discordance in clinical stage that could be assigned based on these reports. In general, second-opinion reports were judged to be of higher quality by the hemato-oncologist who assigned higher grades (4 and 5) in 49/72 patients (68%), as compared with the outside reports for which only 11/72 (15%) received a grade of 4 or 5 (weighted Κ = –0.007 (–0.123, 0.132).
Table 2.
Discordance between the second-opinion and outside reports regarding disease staging was noted in 26/72 (36%) of patients (weighted Κ = 0.674, 95% confidence interval, CI: 0.469, 0.824). Of these 26 patients, 4 patients (15%) were upstaged from “no evidence of disease” to “stages 3 and 4”; and 3 patients (11%) were downstaged to no evidence of disease. Figure 2 represents a good example, in which second-opinion review demonstrated the exact and correct anatomical location of a hypermetabolic lesion wrongly described on outside PET/CT report, therefore changing initial disease stage. Figure 3 illustrates the case of a stage 1 DLBCL patient with interim outside PET/CT describing reactive lymph nodes as sites of disease.
3.3. Comparison to the standard of reference
Compared with the multidisciplinary staging (Table 3), 56/72 of second opinion reports (78%), were in agreement with this ultimate standard of reference, with an almost perfect agreement (Κ = 0.860). In contrast, only 40/72 (55%) of outside reports were found to be concordant with the standard of reference, with a fair agreement (Κ = 0.513). As an example, Figures 4 and 5 represent a common pitfall and a challenge for every PET/CT reader to discriminate between reactive versus malignant bone marrow uptake.
Table 3.
Focusing specifically on the 32 reports for which discrepancies were noted between outside and second opinion reports, the second-opinion reviews showed a moderate agreement (Κ = 0.699) with the standard of reference and 18 of these 32 reports (56%) were concordant with the results of the multidisciplinary staging (Table 4).
Table 4.
4. Discussion
Recent advances in imaging technology, image interpretation criteria, and clinical management of patients with lymphoma place increasing responsibility on radiologists to provide precise disease evaluation to deliver the best patient care. The diagnostic benefits of FDG PET/CT in initial staging and response evaluation have been established.[12] Residual FDG uptake on PET/CT scans obtained during and after first line therapy is also associated with patient prognosis.[13] A meta-analysis reported a sensitivity and specificity of 72% (95% CI, 61% to 82%) and 100% (95% CI, 97% to 100%) for the detection of residual disease after completion of first-line therapy for aggressive non-Hodgkin lymphomas.[14] Modern guidelines for the application and interpretation of FDG PET/CT in lymphoma were formulated at the in 11th International Conference on Malignant Lymphomas[5] and are now widely applied by oncologists in clinical practice. To ensure optimal patient management, it is imperative that oncologists and radiologists/nuclear medicine physicians “speak the same language” when interpreting and characterizing findings on diagnostic imaging studies, in particular with regard to FDG PET/CT. Based on general impression, discussions in interdisciplinary tumor boards and prior experience with second opinion review of pathology data generated at our center,[7] we hypothesized that interpretation of FDG PET/CT by expert readers specialized in oncologic imaging would provide added benefit to the oncologist treating lymphoma patients. Indeed, our study demonstrates that interpretation of FDG PET/CT by expert readers and generation of standardized reports can improve the accuracy of disease characterization and the quality of reporting. Clarity of language, application of clear criteria, reporting of data points necessary for proper staging and response assessment (such as lesion size, SUVmax, and changes in these during therapy) are clearly appreciated by oncologists, and are in fact critical for proper patient management.
Our results showed major discordance between second-opinion and outside reports for majority of the studied criteria (Table 1). For example, the lack of agreement for accurate anatomic description of anatomical sites of disease makes it difficult for clinicians to judge the extent of disease at staging, and also to compare findings at baseline with findings upon follow-up. In addition, outside report often lacked confidence in differentiating between benign and malignant FDG uptake on scans, and often did not elaborate on potential differential diagnoses for FDG uptake in a particular location, such as lung, skeletal muscle, or bone marrow. This uncertainty would have led to further (often unnecessary) radiologic imaging studies to work-up “equivocal” sites of FDG uptake, leading to potential delay in care, and additional patient exposure and cost. Such additional studies were recommended rarely be second opinion readers.
For follow-up scans, residual FDG uptake is characterized using modern scoring systems,[1,15] which rely on comparison with reference regions in the same scan and reporting of SUV numbers. We found only a slight agreement in the ability to derive a particular score form the 5-point scales when comparing outside and second opinion reports (Κ = 0.024) because most of the outside reports failed to provide the lesion with the highest SUVmax and the SUVmax at sites of residual disease (Κ = –0.016) or failed to include the comparison with FDG uptake in reference regions in mediastinum and liver (Κ = 0.082).
Interestingly, outside and second-opinion reports were also discordant regarding disease stage in 26 of the 72 patients (36%) (weighted Κ = 0.674), with sometimes considerable potential impact on patient management (eg, as shown in Figs. 4 and 5).
Again, second-opinion interpretations were found to be more accurate and clinically relevant. Oncologists appreciated the fact that second-opinion reports were more structured, with 97% of reports receiving a grade of 4 or 5 as compared with outside reports in only 44% (P < .001). Some of the limitations noted in outside reports could be remedied easily by local radiologists and nuclear medicine physician reading PET scans if they were familiar with patterns of disease spread and differential diagnoses, and applied contemporaneous criteria to their interpretation of scans.
In addition, the use of structured and standardized reporting is gaining favor because of its reproducibility and easy read by the referring physicians. Schwartz et al[16] showed that structured radiology reporting might improve patient care by increasing clarity and thoroughness in the communication of imaging findings to referring physicians. When such reports are developed with input from interdisciplinary clinical teams, they receive significantly higher mean ratings for clarity and content by referring physicians when compared with conventional (unstructured, free flowing language) radiology reports. Therefore, we have been using structured reporting for several years in our institution.
Previous studies have looked at the value of second opinion radiology reports in other diseases and have generally found that these reports are highly valued. For example, Lakhman et al[17] demonstrated that second-opinion review of gynecological oncologic MRI affected management in about 20% of patients. Unnecessary surgeries were prevented in about 7% of cases. When compared with histopathology and 6-month follow-up as reference, the second-opinion MRI reviews were correct in 83% of cases with clinically relevant discrepancies. Similarly, Gollub et al[18] found major discordance between outside and second-opinion radiology reports in 24 of 143 body CT scans, with a subsequent change in management for 3.5% of patients. Zan et al[19] evaluated 7,465 neuroradiology studies and showed clinically important discrepancy in interpretation 7.7% of cases (in 2/3 of cases related to lesion detection, and in 1/3 of cases related to patient management). The second-opinion review was more accurate in 84% of studies when compared with the definitive diagnosis. Finally, Dudley et al[20] compared the initial outside and second-opinion CT and MRI reports for 396 patients who were referred to surgical oncologists for initial staging or follow-up and concluded that reports disagreed in 41% of cases; the second-opinion interpretations were more often correct (in 94% of discrepant cases). With regard to PET/CT in general, Tahari et al[21] showed that submitted outside PET/CT exams often lacked technical information needed for appropriate scan interpretation. In another analysis,[22] it was shown that subspecialist review of submitted outside PET/CT examinations resulted in discrepant interpretation in 13% of cases in characterizing FDG uptake as benign or malignant, and the subspecialist review was accurate in 89% of cases.
Our analysis has certain limitations, mainly its retrospective nature. The subspecialized oncoradiologists had access to the original outside report when generating the second-opinion report. Second, the number of studies reviewed was relatively small. Even though we recruited all lymphoma studies submitted over the course of 3 years, the necessity for reasonably long follow-up and ability to review and measure scans with our institutional hardware and software tools limited the initially larger number of potential scans to only 72. Third, we used the agreement in the multidisciplinary meeting and follow-up as standard of reference as biopsy confirmation for each individual lesion is not possible. The hemato-oncologist grading report quality was not aware of the origin of each individual report. However, as we have used structured reporting for some time it is likely that reports generated by second-opinion readers could thus be possibly recognized. Nevertheless, this would not necessarily lead to higher rating of inhouse reports as the oncologist was asked to address a number of clinically relevant criteria in the reports. Another limitation is that the radiology reports were evaluated by a single oncologist from the same institution of the second opinion radiologist; however, the oncologist was blinded to the source of the reports.
Finally, excluding all DLBCL patients whose submitted PET/CT scans lacked the original outside report may have led to selection bias.
In conclusion, second-opinion review of FDG PET/CT in lymphoma patients by subspecialized oncologic radiologists increases accuracy of staging and response evaluation and the clinical significance of the report. For ease of interpretation and comparison with follow-up scans, oncologists favor structured reports, with accurate description of disease sites and disease activity using contemporaneous interpretation criteria.
Footnotes
Abbreviations: DLBCL = diffuse large B cell lymphoma, FDG PET/CT = fludeoxyglucose positron emission tomography/computed tomography, PACS = picture archiving and communication system, SUVmax = maximum standardized uptake value.
This research was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA008748.
The authors declare no conflicts of interest.
References
- [1].Johnson SA, Kumar A, Matasar MJ, et al. Imaging for staging and response assessment in lymphoma. Radiology 2015;276:323–38. [DOI] [PubMed] [Google Scholar]
- [2].Shelly MJ, McDermott S, O’Connor OJ, et al. 18-Fluorodeoxyglucose positron emission tomography/computed tomography in the management of aggressive non-Hodgkin's B-cell lymphoma. ISRN Hematol 2012;2012:456706. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Tatsumi M, Cohade C, Nakamoto Y, et al. Direct comparison of FDG PET and CT findings in patients with lymphoma: initial experience. Radiology 2005;237:1038–45. [DOI] [PubMed] [Google Scholar]
- [4].Terasawa T, Nihashi T, Hotta T, et al. 18F-FDG PET for posttherapy assessment of Hodgkin's disease and aggressive non-Hodgkin's lymphoma: a systematic review. J Nucl Med 2008;49:13–21. [DOI] [PubMed] [Google Scholar]
- [5].Barrington SF, Mikhaeel NG, Kostakoglu L, et al. Role of imaging in the staging and response assessment of lymphoma: consensus of the International Conference on Malignant Lymphomas Imaging Working Group. J Clin Oncol 2014;32:3048–58. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Cheson BD. Staging and response assessment in lymphomas: the new Lugano classification. Chin Clin Oncol 2015;4:5. [DOI] [PubMed] [Google Scholar]
- [7].Matasar MJ, Shi W, Silberstien J, et al. Expert second-opinion pathology review of lymphoma in the era of the World Health Organization classification. Ann Oncol 2012;23:159–66. [DOI] [PubMed] [Google Scholar]
- [8].Eakins C, Ellis WD, Pruthi S, et al. Second opinion interpretations by specialty radiologists at a pediatric hospital: rate of disagreement and clinical implications. AJR. AJR Am J Roentgenol 2012;199:916–20. [DOI] [PubMed] [Google Scholar]
- [9].Lu MT, Tellis WM, Avrin DE. Providing formal reports for outside imaging and the rate of repeat imaging. AJR Am J Roentgenol 2014;203:107–10. [DOI] [PubMed] [Google Scholar]
- [10].Bell ME, Patel MD. The degree of abdominal imaging (AI) subspecialization of the reviewing radiologist significantly impacts the number of clinically relevant and incidental discrepancies identified during peer review of emergency after-hours body CT studies. Abdom Imaging 2014;39:1114–8. [DOI] [PubMed] [Google Scholar]
- [11].Hutchings M, Barrington SF. PET/CT for therapy response assessment in lymphoma. J Nucl Med 2009;50(Suppl 1):21–30. [DOI] [PubMed] [Google Scholar]
- [12].Barrington SF, Kirkwood AA, Franceschetto A, et al. PET-CT for staging and early response: results from the response-adapted therapy in Advanced Hodgkin Lymphoma study. Blood 2016;127:1531–8. [DOI] [PubMed] [Google Scholar]
- [13].Schot B, van Imhoff G, Pruim J, et al. Predictive value of early 18F-fluoro-deoxyglucose positron emission tomography in chemosensitive relapsed lymphoma. Br J Haematol 2003;123:282–7. [DOI] [PubMed] [Google Scholar]
- [14].Zijlstra JM, Lindauer-van der Werf G, Hoekstra OS, et al. 18F-fluoro-deoxyglucose positron emission tomography for post-treatment evaluation of malignant lymphoma: a systematic review. Hematologica 2006;91:522–9. [PubMed] [Google Scholar]
- [15].Itti E, Meignan M, Berriolo-Riedinger A, et al. An international confirmatory study of the prognostic value of early PET/CT in diffuse large B-cell lymphoma: comparison between Deauville criteria and DeltaSUVmax. Eur J Nucl Med Mol Imaging 2013;40:1312–20. [DOI] [PubMed] [Google Scholar]
- [16].Schwartz LH, Panicek DM, Berk AR, et al. Improving communication of diagnostic radiology findings through structured reporting. Radiology 2011;260:174–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Lakhman Y, D’Anastasi M, Micco M, et al. Second-opinion interpretations of gynecologic oncologic MRI examinations by sub-specialized radiologists influence patient care. Eur Radiol 2016;26:2089–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Gollub MJ, Panicek DM, Bach AM, et al. Clinical importance of reinterpretation of body CT scans obtained elsewhere in patients referred for care at a tertiary cancer center. Radiology 1999;210:109–12. [DOI] [PubMed] [Google Scholar]
- [19].Zan E, Yousem DM, Carone M, et al. Second-opinion consultations in neuroradiology. Radiology 2010;255:135–41. [DOI] [PubMed] [Google Scholar]
- [20].Dudley RA, Hricak H, Scheidler J, et al. Shared patient analysis: a method to assess the clinical benefits of patient referrals. Med Care 2001;39:1182–7. [DOI] [PubMed] [Google Scholar]
- [21].Tahari AK, Wahl RL. Quantitative FDG PET/CT in the community: experience from interpretation of outside oncologic PET/CT exams in referred cancer patients. J Med Imaging Radiat Oncol 2014;58:183–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [22].Ulaner GA, Mannelli L, Dunphy M. Value of second-opinion review of outside institution PET-CT examinations. Nucl Med Commun 2017;38:306–11. [DOI] [PMC free article] [PubMed] [Google Scholar]