Abstract
We propose a standardized approach to quantitative molecular imaging (MI) in cancer patients with multiple lesions.
METHODS
Twenty castration-resistant-prostate-cancer patients underwent 18F-FDG and 18F-16β-fluoro-5α-dihydrotestosterone (18F-FDHT) PET/CT scans. Using a 5-point confidence-scale, two readers interpreted co-registered scan-sets on a PET-VCAR (General Electric) workstation. 203 sites/scan (specified in a lexicon) were reviewed. 18F-FDG-positive lesion bookmarks were propagated onto 18F-FDHT studies, then manually accepted or rejected. Discordant-positive 18F-FDHT lesions were similarly bookmarked. Lesional SUVmax was recorded. Tracer and tissue-specific background correction-factors were calculated via receiver-operating-characteristic analysis of 65 scan-sets.
RESULTS
Readers agreed on >99% of 18F-FDG and 18F-FDHT negative-sites. Positive-site agreement was 84% and 85%, respectively. Consensus-lesion SUVmax was highly reproducible (CCC>0.98). Receiver-operating-characteristic curves yielded four correction-factors (SUVmax 1.8-2.6). A novel scatter (“LFG”) plot depicted tumor burden and ΔSUVmax for response assessments.
CONCLUSION
Multi-lesion MI is optimized with a five-step approach incorporating a confidence scale, site lexicon, semi-automated PET software, background-correction and LFG-graphing.
Keywords: Molecular imaging, PET/CT, 18F-FDG, 18F-FDHT, semi-automated
Molecular imaging (MI) with 18F-FDG-PET is widely used for assessing the effect of treatment on tumors (1, 2). Numerous additional agents for imaging the hallmarks of cancer, such as rapid proliferation, apoptosis, amino acid synthesis, hypoxia and more specific molecules expressed on tumors, are under development (3, 4). Evaluation of these potential imaging biomarkers requires a reproducible and expeditious system to identify disease, quantify metabolic activity, and follow the course of each lesion over time, particularly in patients with a multitude of lesions. With this in mind, our group developed a PET image segmentation technique based on adaptive thresholding (5). This method produced precise volume measurements and eliminated the subjectivity of manual contouring. Furthermore, a coordinate system was devised to facilitate longitudinal tumor tracking on serial 18F-FDG scans and characterization of tumor heterogeneity with diverse tracers (6). These tools served as a foundation for semi-automated image-based PET/CT analysis programs now produced by various manufacturers. PET-VCAR (Volume Computer Assisted Reading), an application of the Advantage Workstation (GE Healthcare, Wisconsin USA), is one such program that incorporates precise exam-to-exam co-registration, using the companion CT as a fiducial marker, and threshold-based image segmentation. These features permit unambiguous lesion tracking and efficient analysis of large data sets, essential for streamlining pharmacodynamic and response assessments in clinical trials. In this brief communication, we report a 5-step approach intended to standardize implementation of semi-automated image analysis programs such as PET-VCAR, thereby facilitating the successful co-development of novel MI biomarkers and therapies.
MATERIALS AND METHODS
To further develop and validate our approach, we chose a clinical situation in which two radiotracers were used to image a group of patients with multiple metastatic bone and/or soft-tissue lesions. In the context of an IRB-approved protocol, 65 consecutive patients with progressive castration-resistant-prostate-cancer (CRPC) underwent paired 18F-FDG and 18F-16β-fluoro-5α-dihydrotestosterone (18F-FDHT) PET/CT scans within a 24-hour period. 18F-FDG scans were acquired ~60 minutes after injecting ~370 MBq of 18F-FDG. 18F-FDHT scans were acquired ~40 minutes after injecting ~333 MBq of 18F-FDHT (7). Patients were imaged from skull-base to upper thighs on the Discovery STE PET/CT scanner (GE Medical Systems). Reconstructed images were loaded onto a PET-VCAR workstation. Two experienced readers blindly interpreted a randomized subset of 20 scan-sets. Each reader reviewed 203 sites per scan, pre-specified in an anatomic lexicon (Supplemental Table 1). Scans were first interpreted qualitatively on a five-point confidence scale for the absence or presence of malignancy (0-definitely negative; 1-probably negative; 2-equivocal; 3-probably positive; 4-definitely positive). Foci of activity visually higher than local background and not explained by physiologic/benign processes were considered positive. Sites rated 0-2 were recorded as negative; sites rated 3-4 were recorded as positive. All discrete lesions within positive sites were segmented with the threshold-based isocontour tool set at its default of 42% of SUVmax (5). Coalescing lesions that could not be clearly separated were segmented as one lesion. Lesions occupying two contiguous sites were considered distinct lesions. Paired 18F-FDG and 18F-FDHT scans were automatically co-registered by PET-VCAR using a system of coarse and fine adjustments based on the characteristics of the bone and soft tissue on the companion CT.
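The dichotomized confidence scale and the 42%-of-SUVmax isocontour described above can be illustrated with a minimal sketch. This is not the PET-VCAR algorithm, whose internals are not described here; it assumes a simple region-growing rule on a 2D grid of SUV values (real data are 3D), and the function names `is_positive` and `segment_lesion` are hypothetical.

```python
from collections import deque


def is_positive(confidence):
    """Dichotomize the 0-4 confidence scale: 0-2 negative, 3-4 positive."""
    if confidence not in (0, 1, 2, 3, 4):
        raise ValueError("confidence must be an integer from 0 to 4")
    return confidence >= 3


def segment_lesion(suv, fraction=0.42):
    """Grow a region from the hottest voxel (assumed rule, 4-connectivity).

    Keeps every voxel connected to the SUVmax voxel whose value is at
    least `fraction` of SUVmax. Returns (SUVmax, set of (row, col)).
    """
    rows, cols = len(suv), len(suv[0])
    # Locate the voxel that holds SUVmax; it seeds the region.
    seed = max(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: suv[rc[0]][rc[1]])
    suv_max = suv[seed[0]][seed[1]]
    cutoff = fraction * suv_max
    mask = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in mask and suv[nr][nc] >= cutoff:
                mask.add((nr, nc))
                queue.append((nr, nc))
    return suv_max, mask


# Hypothetical SUV values for one small region of interest.
roi = [
    [1.0, 1.2, 1.1, 1.0],
    [1.1, 6.0, 4.5, 1.0],
    [1.0, 5.0, 10.0, 1.3],
    [0.9, 1.0, 4.3, 5.0],
]
suv_max, mask = segment_lesion(roi)
print(suv_max, len(mask))  # 10.0 6
```

Because the region is grown from the hottest voxel, the voxel containing SUVmax always lies inside the segmented lesion, which is the property the isocontour step relies on.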
Bookmarked regions of interest (ROIs) for 18F-FDG lesions were automatically duplicated and propagated onto the co-registered 18F-FDHT images. Propagated bookmarks were accepted or rejected using the confidence scale, and ROIs were manually adjusted by the reader as needed. Discordant-positive 18F-FDHT lesions were segmented in a similar fashion. SUVmax (body-weight) was obtained for every lesion and catalogued site-by-site. Results of the two readers were compared on a per-site and per-lesion basis to determine interobserver variability. Reproducibility of SUVmax measurements for consensus-lesions was assessed with Bland-Altman plots and calculation of concordance correlation coefficients (CCC) (8). SUVmax reproducibility was further analyzed after correcting for background activity (described next).
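For readers unfamiliar with the two reproducibility statistics, the sketch below computes Lin's concordance correlation coefficient (8) and the Bland-Altman bias with 95% limits of agreement for paired SUVmax readings. It is an illustration only: the function names and the sample readings are hypothetical, and it assumes the conventional definitions (population moments for the CCC, bias ± 1.96 sample standard deviations for the limits), which may differ in detail from the statistical software used in the study.

```python
from statistics import mean, stdev


def concordance_ccc(x, y):
    """Lin's CCC: 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    # Population (1/n) variances and covariance, as in Lin's definition.
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) of agreement for x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation
    return bias, bias - spread, bias + spread


# Hypothetical SUVmax readings of the same four lesions by two readers.
reader_a = [5.0, 3.0, 8.0, 2.0]
reader_b = [5.2, 2.9, 8.1, 2.2]
print(concordance_ccc(reader_a, reader_b))
print(bland_altman(reader_a, reader_b))
```

Unlike the Pearson correlation, the CCC is penalized by a systematic offset between readers: two readers whose values differ by a constant have perfect correlation but a CCC below 1.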
With the rationale that lesional “metabolic” activity is composed of tracer bound in tumor and unbound tracer in stroma, we aimed to subtract the contribution of stromal signal from the measured SUVmax. We hypothesized that establishing a population-based background would provide an approximate measure of stromal signal, and could also serve as a threshold for better discrimination between benign and malignant uptake. To determine this value, all 65 scan-sets were interpreted by consensus. In addition to lesional uptake, SUVmax of background activity was recorded. For bone background, a region of interest (ROI) was placed in the posterior iliac crest, or in other uninvolved bone if the iliac crest harbored tumor. For soft tissue, an ROI was placed in gluteal muscle (chosen for ease of measurement). Tracer and tissue-specific receiver-operating-characteristic (ROC) curves were constructed by plotting background SUVs against lesion SUVs. The point on the curve closest to perfect classification (0,1) was chosen as the background/threshold SUVmax. The four resulting values were applied as a correction-factor for all segmented lesions within respective tracer/tissue categories: (lesion SUVmax) – (background SUVmax). Background-corrected lesions with SUVmax ≤0 were reassigned as PET-negative.
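The threshold selection and correction described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code used in the study: a value is called positive when it exceeds the candidate cut-off, candidate cut-offs are the observed SUVs, and only exact ties in distance are broken in favor of higher specificity. The function names and the sample values are hypothetical.

```python
import math


def roc_threshold(lesion_suvs, background_suvs):
    """Pick the cut-off whose ROC point lies closest to (0, 1).

    Returns (threshold, sensitivity, specificity, distance).
    Assumption: SUV > threshold is called positive.
    """
    best_key, best = None, None
    for t in sorted(set(lesion_suvs) | set(background_suvs)):
        sens = sum(v > t for v in lesion_suvs) / len(lesion_suvs)
        spec = sum(v <= t for v in background_suvs) / len(background_suvs)
        dist = math.hypot(1 - spec, 1 - sens)
        key = (dist, -spec)  # exact ties go to the more specific cut-off
        if best_key is None or key < best_key:
            best_key, best = key, (t, sens, spec, dist)
    return best


def background_correct(lesion_suvmax, threshold):
    """Subtract the background; values <= 0 become PET-negative (None)."""
    corrected = lesion_suvmax - threshold
    return corrected if corrected > 0 else None


# Hypothetical SUVmax values for one tracer/tissue category.
lesions = [3.0, 4.0, 5.0, 1.5]
backgrounds = [1.0, 1.2, 1.4, 2.0]
threshold, sens, spec, dist = roc_threshold(lesions, backgrounds)
print(threshold, sens, spec, dist)  # 2.0 0.75 1.0 0.25
print([background_correct(v, threshold) for v in lesions])
# [1.0, 2.0, 3.0, None]
```

In this toy data set the cut-offs 1.4 and 2.0 are equally distant from (0, 1); the tie-break selects 2.0, trading sensitivity for specificity.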
RESULTS
For the interobserver analysis of 18F-FDG scans, 3852 (94.9%) out of 4060 sites were classified as negative by both readers, 173 (4.1%) as positive by both, and 35 (0.9%) as positive by only one. For the 4060 18F-FDHT sites, the respective classifications were 3838 (94.5%), 189 (4.6%) and 33 (0.8%). This translates to 84.3% (173/208) agreement for positive 18F-FDG sites and 85.1% (189/222) agreement for positive 18F-FDHT sites. As several positive sites contained more than one discrete lesion, the number of recorded lesions was greater than the number of positive sites. The two readers agreed on 80.8% (194/240) of all recorded 18F-FDG lesions, and 78.7% (211/268) of all 18F-FDHT lesions. SUVmax measurements for these consensus-lesions were highly concordant: for 18F-FDG, CCC = 0.994 (95% CI, 0.992-0.996); for 18F-FDHT, CCC = 0.981 (95% CI, 0.976-0.986). Consensus-lesion SUVmax reproducibility is depicted graphically with Bland-Altman plots in Figure 1.
Figure 1.
Bland-Altman plots for (A) 18F-FDG and (B) 18F-FDHT demonstrating very high reproducibility of interobserver consensus-lesion SUVmax measurements. For 18F-FDG: Bias = 0.016, with 95% limits of agreement -0.77 to 0.74. For 18F-FDHT: Bias = -0.015, with 95% limits of agreement -1.56 to 1.53. The wider limits of agreement for 18F-FDHT indicate higher variability in measurements, largely due to two outlying lesions, both of which can be seen on the plot.
The background analysis yielded four separate values with SUVmax 1.8 - 2.6 (Tables 1 and 2). Interobserver reproducibility for background-corrected (bkg-c) consensus-lesion SUVs was nearly identical to the pre-correction scenario: for 18F-FDG(bkg-c), CCC = 0.994 (95% CI, 0.993-0.996); for 18F-FDHT(bkg-c), CCC = 0.979 (95% CI, 0.973-0.985).
Table 1.
Lesion and background data from 65 18F-FDG and 18F-FDHT scan-sets utilized in the ROC-curve background analysis.
| Tissue and tracer | Region | N | Mean SUVmax | Standard Deviation | Minimum SUVmax | Maximum SUVmax |
|---|---|---|---|---|---|---|
| Bone FDG | Lesion | 1079 | 5.6 | 5.4 | 0.6 | 47.2 |
| Bone FDG | Background | 65 | 1.4 | 0.3 | 0.8 | 2.3 |
| Bone FDHT | Lesion | 1014 | 6.3 | 3.9 | 1.0 | 28.5 |
| Bone FDHT | Background | 65 | 1.9 | 0.5 | 0.8 | 2.8 |
| Soft FDG | Lesion | 225 | 5.6 | 3.6 | 0.8 | 22.6 |
| Soft FDG | Background | 50 | 1.2 | 0.4 | 0.5 | 2.0 |
| Soft FDHT | Lesion | 196 | 8.1 | 4.5 | 1.5 | 20.5 |
| Soft FDHT | Background | 50 | 1.4 | 0.5 | 0.5 | 3.0 |
Table 2.
ROC-curve analyses of lesion and background SUVmax data in Table 1. Four distinct tracer and tissue-dependent threshold values were obtained for optimal discrimination between benign and malignant uptake. For any given threshold, a tradeoff exists between sensitivity and specificity. In cases where the distance from the perfect marker was similar for more than one SUVmax value, we opted for greater specificity at the expense of lower sensitivity.
| Tissue and tracer | SUVmax Threshold | Specificity | Sensitivity | Distance from perfect marker |
|---|---|---|---|---|
| Bone FDG | 2.0 | 99.53% | 84.80% | 0.153 |
| Bone FDHT | 2.6 | 99.13% | 90.04% | 0.100 |
| Soft FDG | 1.8 | 94.03% | 94.12% | 0.084 |
| Soft FDHT | 2.3 | 99.47% | 96.94% | 0.031 |
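The "distance from perfect marker" column can be checked from the other two columns if one assumes it is the Euclidean distance from the ROC point (1 − specificity, sensitivity) to the ideal point (0, 1). The short sketch below recomputes it from the rounded percentages in Table 2; the function name is hypothetical, and small differences in the third decimal are expected because the inputs are themselves rounded.

```python
import math


def distance_from_perfect(specificity, sensitivity):
    """Euclidean distance from the ROC point to the ideal point (0, 1)."""
    return math.hypot(1 - specificity, 1 - sensitivity)


# (specificity, sensitivity, reported distance), copied from Table 2.
table2 = {
    "Bone FDG": (0.9953, 0.8480, 0.153),
    "Bone FDHT": (0.9913, 0.9004, 0.100),
    "Soft FDG": (0.9403, 0.9412, 0.084),
    "Soft FDHT": (0.9947, 0.9694, 0.031),
}

for name, (spec, sens, reported) in table2.items():
    computed = distance_from_perfect(spec, sens)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

Under this assumption the recomputed distances agree with the reported values to within about 0.001.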
Representative response data for two patients were graphed on a novel scatter plot designed to facilitate multi-lesion response assessments. We refer to this graph, herein, as the Larson-Fox-Gonen (LFG) Plot (Figures 2 and 3).
Figure 2.
Representative (A) 18F-FDG and (B) 18F-FDHT LFG Plots in a “non-responding” CRPC patient receiving androgen-receptor (AR) targeted therapy. The identity line indicates no change in SUV between baseline and followup (Δ SUV = 0%). The “rays” around the identity line indicate various levels of percent change. New lesions fall on the y-axis when imputing a value of zero for the baseline SUVmax.
In this example, total lesion (n = 51 at baseline) 18F-FDG and 18F-FDHT background-corrected SUV data are plotted, demonstrating marked hypermetabolism at baseline and metabolic progression at 4 weeks (increase in 18F-FDG uptake >50% for several lesions, as well as several new lesions). The 18F-FDHT plot shows concomitant suppression of 18F-FDHT uptake (>75% reduction in most lesions), despite apparent 18F-FDG progression. Corresponding maximum intensity projection (MIP) PET images, at baseline and after 4 weeks of therapy, are found at the right side of the plots.
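The geometry of the LFG Plot can be made concrete with a small sketch that prepares the plotted coordinates. It assumes each lesion has a background-corrected SUVmax at baseline (x-axis) and follow-up (y-axis); new lesions are given a baseline of zero, as described in the caption. Placing resolved lesions at a follow-up value of zero is an additional assumption of this sketch, and the function names and lesion labels are hypothetical.

```python
def lfg_points(baseline, followup):
    """Build (lesion, x, y, percent change) tuples for an LFG-style plot.

    baseline, followup: dicts mapping lesion id -> corrected SUVmax.
    """
    points = []
    for lesion in sorted(set(baseline) | set(followup)):
        x = baseline.get(lesion, 0.0)  # new lesion: falls on the y-axis
        y = followup.get(lesion, 0.0)  # resolved lesion: assumed 0
        # Percent change is undefined when the baseline is zero.
        pct = None if x == 0 else 100.0 * (y - x) / x
        points.append((lesion, x, y, pct))
    return points


def ray_slope(percent_change):
    """Slope of the ray y = m*x for a given percent change in SUV."""
    return 1 + percent_change / 100


baseline = {"a": 8.0, "b": 4.0}
followup = {"a": 2.0, "b": 6.0, "c": 3.0}
for point in lfg_points(baseline, followup):
    print(point)
# ('a', 8.0, 2.0, -75.0)
# ('b', 4.0, 6.0, 50.0)
# ('c', 0.0, 3.0, None)
print(ray_slope(0), ray_slope(-75), ray_slope(50))  # 1.0 0.25 1.5
```

The identity line is the ray with slope 1 (0% change); a lesion lying on the ray with slope 0.25 has lost 75% of its baseline uptake, and one on the ray with slope 1.5 has gained 50%.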
Figure 3.
Representative (A) 18F-FDG and (B) 18F-FDHT LFG Plots in a “responding” CRPC patient receiving AR targeted therapy. Total lesion (n = 61) 18F-FDG and 18F-FDHT background-corrected SUVmax data are graphed, depicting a favorable metabolic response (>75% reduction in 18F-FDG uptake for the majority of lesions) and concomitant suppression of 18F-FDHT uptake (>50% in the majority of lesions). Corresponding MIP images, at baseline and after 4 weeks of therapy, are found at the right side of the plots. Focal activity in the left axilla on the “FDG 4 Week” scan represents artifact (i.e., benign nodal uptake related to the radiotracer injection).
DISCUSSION
MI offers the potential for improved detection of disease and quantitation of alterations in molecular targets. In the context of clinical trials, MI can assist in determining proof-of-mechanism of an experimental drug and, separately, treatment efficacy. A variety of PET-based methods have been proposed for quantitating treatment response, including the recently proposed PERCIST criteria (9). These methods generally recommend assessment of only a selected number of “target” lesions, modeled after structural-based criteria such as RECIST 1.1 (10). However, RECIST-type criteria are largely based on pragmatism, with limited supporting evidence (11-14). In patients with many metastatic lesions, this reductive approach risks overlooking key lesions that are outliers in terms of behavior and potentially responsible for a poor patient outcome. The introduction of semi-automated data analysis programs, such as PET-VCAR, renders it feasible to account for all lesions in outcome assessments, which in turn may help elucidate optimal parameters of response. In addition, this platform can be used to compare the uptake of multiple tracers in various lesions and to monitor similarities and differences in response to treatment.
Our standardized approach to comparative analysis of total-lesion MI builds on the capabilities of these semi-automated systems (Figure 4):
Step 1. A five-point confidence scale is used for the initial qualitative assessment. Overall, there was high interobserver agreement (>99%) with respect to qualitatively classifying the 4060 anatomic sites as negative or positive for both 18F-FDG and 18F-FDHT scans. The agreement rate fell to roughly 84% when focusing only on positive sites, and to 80% when considering all recorded lesions, keeping in mind that some sites contained multiple lesions. An ordinal confidence scale mitigates, but cannot completely resolve, the inherent subjectivity of diagnostic imaging interpretation, irrespective of the workstation used. MI with PET is arguably more prone to interobserver variability than conventional structural imaging. Nevertheless, a recent paper looking at CT interpretation reported major interobserver disagreements in 26-32% of cases (15), supporting the notion that disagreement in qualitative interpretation is unavoidable. To address this problem, we recommend that preliminary training sessions or consensus readouts be integrated into imaging protocols.
Step 2. A standardized lexicon for lesion nomenclature is adopted. A lexicon minimizes ambiguities in lesion assignment, particularly in the context of a total-lesion cataloguing effort. A lexicon also facilitates correlation with more conventional imaging modalities such as bone scan, CT and MRI.
Step 3. Scans are analyzed semi-automatically. (a) Positive lesions are bookmarked with a threshold-based segmentation algorithm. An isocontour tool clearly defines the three-dimensional borders of the lesion, ensuring that the voxel containing the SUVmax is located within the confines of the lesion. (b) PET/CT studies are automatically co-registered. Co-registration enables automatic propagation of lesion bookmarks and facilitates unambiguous lesion tracking. In contrast to the qualitative assessment (Step 1), quantitative agreement was excellent, reflected by very high SUVmax reproducibility for consensus-lesions (CCC > 0.98 for both 18F-FDG and 18F-FDHT). These results are at least comparable to the interobserver reproducibility of SUVmax measurements obtained on a standard workstation (ICC = 0.93) (16).
Step 4. Positive lesions are corrected for background activity. Background correction in this context serves two purposes: (a) to eliminate the contribution of signal from unbound tracer in stroma; (b) to optimally discriminate between benignity and malignancy. We utilized a population-based ROC-curve analysis to establish a standard background level, which was then applied as a correction-factor. Four separate thresholds were calculated to account for the distinct properties of each tracer in bone and soft tissue. When applicable, we opted for greater specificity over sensitivity, given the plethora of lesions.
Step 5. SUV data are graphed on an LFG Plot. This plot allows for representation of large amounts of comparison data, while clearly depicting absolute and percent ΔSUVmax for individual lesions, new lesions, and trends for the total-lesion burden. Individual lesions with aberrant behavior are easily detected.
Figure 4.
Summary diagram of five-step approach to comparative analysis of total-lesion molecular imaging.
A limitation of the study is the lack of a gold standard comparator to confirm the accuracy of the segmented lesions. Nevertheless, the purpose of this brief communication is not to present specific outcome data for 18F-FDG or 18F-FDHT in CRPC. Rather, our goal is to describe a standardized and practical approach for multi-lesion assessments, as an aid for future work with molecular imaging. We intend to further validate the ROC-based background analysis in the context of pending pharmacodynamic and response assessments, as well as with tissue correlation, when available.
CONCLUSION
We have described our approach to the challenging problem of MI-based quantitative analysis of multiple lesions in individual patients or patient populations. We propose that this type of analysis benefits from semi-automated software such as PET-VCAR, which allows for unambiguous lesion tracking and reproducible quantitative assessment. A novel summary plot, the LFG plot, was developed in order to visualize data in a manner that is intuitive and permits easy assessment of treatment response. In future work, we plan to compare the largest group of lesions with smaller subsets of “target” lesions to determine the optimal number needed for prediction of clinical endpoints such as overall survival. Ultimately, we propose that this biologically-sound approach will lead to the qualification of robust imaging biomarkers.
Supplementary Material
ACKNOWLEDGEMENTS
Support for this research came from P50-CA086438 MSKCC Center for Molecular Imaging in Cancer from the National Cancer Institute, and from the Memorial Sloan-Kettering Cancer Center Specialized Program of Research Excellence (SPORE) Grant in Prostate Cancer (P50 CA92629).
References
1. Kelloff GJ, Hoffman JM, Johnson B, et al. Progress and promise of FDG-PET imaging for cancer patient management and oncologic drug development. Clin Cancer Res. 2005 Apr 15;11(8):2785–2808. doi: 10.1158/1078-0432.CCR-04-2626.
2. Shankar LK, Hoffman JM, Bacharach S, et al. Consensus recommendations for the use of 18F-FDG PET as an indicator of therapeutic response in patients in National Cancer Institute Trials. J Nucl Med. 2006 Jun;47(6):1059–1066.
3. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000 Jan 7;100(1):57–70. doi: 10.1016/s0092-8674(00)81683-9.
4. Larson SM, Schoder H. New PET tracers for evaluation of solid tumor response to therapy. Q J Nucl Med Mol Imaging. 2009 Apr;53(2):158–166.
5. Erdi YE, Mawlawi O, Larson SM, et al. Segmentation of lung lesion volume by adaptive positron emission tomography image thresholding. Cancer. 1997 Dec 15;80(12 Suppl):2505–2509. doi: 10.1002/(sici)1097-0142(19971215)80:12+<2505::aid-cncr24>3.3.co;2-b.
6. Erdi YE, Srivastava NC, Humm JL, Larson SM. A Coordinate System for Tumor Identification in Positron Emission Tomography (PET) Imaging. Clin Positron Imaging. 2000 Jul;3(4):131–136. doi: 10.1016/s1095-0397(00)00054-6.
7. Larson SM, Morris M, Gunther I, et al. Tumor localization of 16beta-18F-fluoro-5alpha-dihydrotestosterone versus 18F-FDG in patients with progressive, metastatic prostate cancer. J Nucl Med. 2004 Mar;45(3):366–373.
8. Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989 Mar;45(1):255–268.
9. Wahl RL, Jacene H, Kasamon Y, Lodge MA. From RECIST to PERCIST: Evolving Considerations for PET response criteria in solid tumors. J Nucl Med. 2009 May;50(Suppl 1):122S–150S. doi: 10.2967/jnumed.108.057307.
10. Eisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer. 2009 Jan;45(2):228–247. doi: 10.1016/j.ejca.2008.10.026.
11. Schwartz LH, Mazumdar M, Brown W, Smith A, Panicek DM. Variability in response assessment in solid tumors: effect of number of lesions chosen for measurement. Clin Cancer Res. 2003;9(12):4318–4323.
12. Hillman SL, An MW, O’Connell MJ, et al. Evaluation of the optimal number of lesions needed for tumor evaluation using the response evaluation criteria in solid tumors: a north central cancer treatment group investigation. J Clin Oncol. 2009 Jul 1;27(19):3205–3210. doi: 10.1200/JCO.2008.18.3269.
13. Darkeh MH, Suzuki C, Torkzad MR. The minimum number of target lesions that need to be measured to be representative of the total number of target lesions (according to RECIST). Br J Radiol. 2009 Aug;82(980):681–686. doi: 10.1259/bjr/72829563.
14. Moskowitz CS, Jia X, Schwartz LH, Gonen M. A simulation study to evaluate the impact of the number of lesions measured on response assessment. Eur J Cancer. 2009 Jan;45(2):300–310. doi: 10.1016/j.ejca.2008.11.010.
15. Abujudeh HH, Boland GW, Kaewlai R, et al. Abdominal and pelvic computed tomography (CT) interpretation: discrepancy rates among experienced radiologists. Eur Radiol. 2010 Aug;20(8):1952–1957. doi: 10.1007/s00330-010-1763-1.
16. Jacene HA, Leboulleux S, Baba S, et al. Assessment of interobserver reproducibility in quantitative 18F-FDG PET and CT measurements of tumor response to therapy. J Nucl Med. 2009 Nov;50(11):1760–1769. doi: 10.2967/jnumed.109.063321.