Abstract
Introduction
Response assessment metrics play an important role in clinical trials and routine patient management. For patients with malignant pleural mesothelioma (MPM), the standard for response assessment is image-based measurements of tumor thickness made according to the modified RECIST (Response Evaluation Criteria in Solid Tumors) protocol. To classify tumor response, changes in tumor thickness are compared with the standard RECIST −30% and +20% cut-points for partial response (PR) and progressive disease (PD), respectively, which are not specific to MPM. The purpose of this work is to optimize the correlation between tumor response and patient survival by assessing the validity of existing response criteria in MPM, and proposing alternative criteria.
Methods
CT measurements of tumor thickness were acquired at baseline and throughout treatment for 78 patients undergoing standard of care chemotherapy. Overall survival was correlated with best response and first follow-up response using Harrell’s C statistic. The response criteria for PD and PR were each varied in 1% increments to obtain the optimal classification criteria. The performance was cross-validated using a leave-one-out approach.
Results
Median survival was 14.9 months. The performance of the standard RECIST criteria in correlating response with survival was 0.778, while the optimized performance was obtained with criteria of −64% for PR and +50% for PD, yielding a performance of 0.855. After cross-validation, this performance was slightly reduced to 0.829.
Conclusions
New tumor response classification criteria were obtained for patients with MPM. These criteria improve the correlation between image-based response and patient survival.
I. INTRODUCTION
Malignant pleural mesothelioma (MPM) is a malignancy of the pleural lining separating the lungs and the thoracic wall and is primarily caused by exposure to asbestos [1]. While evidence suggests that the disease may have peaked in the US in the past decade, European incidence is not forecast to peak until the next decade, and incidence in countries that continue to use asbestos in new buildings will continue to increase [2, 3]. These facts, along with the poor prognosis for the disease, highlight the necessity for effective treatments for mesothelioma [4].
Before “effective treatments” may be discovered, a surrogate of efficacy for early clinical drug development is required, which is the purview of current response assessment metrics. The assessment of disease response to therapy is a vital component of oncologic patient care and clinical trials. As Nowak states in a 2005 article, “a decrease in tumor size may or may not achieve palliation in individual patients. However, tumor response is an important surrogate for patient benefit in non-randomized clinical trials where symptom improvement and increased survival are difficult to assess” [5].
The current clinical method for tumor response assessment in mesothelioma is the modified Response Evaluation Criteria in Solid Tumors (RECIST) guidelines, which call for two linear measurements of tumor thickness to be summed from each of three axial sections, primarily in computed tomography (CT) scans [6, 7]. To classify patients into response categories, progressive disease (PD) is a summed measurement increase between scans larger than 20%, partial response (PR) is a summed measurement decrease of 30% or more, and stable disease (SD) is any measurement change between −30% and +20%. These classification criteria are the same as the original RECIST criteria for solid tumors [8]. However, the original RECIST classification criteria are based on an extrapolation of the 1981 World Health Organization (WHO) bi-dimensional criteria, which categorize PD as an increase in the bi-dimensional measurement of 25% or more and PR as a bi-dimensional measurement decrease of 50% or more [9]. These classification criteria for two-dimensional measurements were converted to classification criteria for one-dimensional measurements using an assumption of spherical tumor geometry and subsequently rounded, leading to the current RECIST classification criteria [10, 11]. Incidentally, the WHO −50%/+25% criteria were based in part on previous breast cancer cohort studies that investigated the minimum change in tumor burden that could be identified reliably by physicians through palpation.
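The spherical assumption mentioned above can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the original work: for a sphere whose diameter changes by a given fraction, it computes the implied change in the bi-dimensional product (proportional to diameter squared) and in the volume (proportional to diameter cubed). The function name `equivalent_changes` is a hypothetical choice made for illustration.

```python
def equivalent_changes(diameter_change):
    """Return (bi-dimensional product change, volume change) for a sphere
    whose diameter changes by the given fraction (e.g. -0.30 for -30%)."""
    scale = 1.0 + diameter_change
    # Product of two perpendicular diameters scales with scale**2,
    # volume scales with scale**3.
    return scale ** 2 - 1.0, scale ** 3 - 1.0


for label, change in (("PR", -0.30), ("PD", +0.20)):
    product_change, volume_change = equivalent_changes(change)
    print(f"{label}: diameter {change:+.0%} -> "
          f"product {product_change:+.1%}, volume {volume_change:+.1%}")
```

Under this assumption, the −30% linear cut-point for PR corresponds to a change of about −51% in the bi-dimensional product, which is close to the WHO −50% criterion.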
The history of the RECIST classification criteria casts some doubt on the applicability of such criteria for classification of response in a disease so typically aspherical as mesothelioma (see Figure 1). Indeed, others have investigated alternate volume-equivalent response classification criteria for mesothelioma based on theoretical geometric models other than spheres, such as the lens, crescent, or annulus [12]. On the whole, these geometric models indicate that linear measurements acquired according to the modified RECIST guidelines would more closely approximate the corresponding volume changes seen in tumors of spherical morphology if the definition of stable disease were broader. While these models raise important issues, they are still theoretical derivations. Image-based response assessment is often used as a surrogate for patient benefit in clinical trials, and therefore the most useful (and relevant) response classification criteria would be those developed to maximize the surrogacy of the assessment metric for meaningful outcomes such as survival. The purpose of this work is to optimize the disease-specific response classification criteria by maximizing the association between response assessment and overall survival in patients with malignant pleural mesothelioma treated with chemotherapy.
Figure 1.
The aspherical shape of malignant pleural mesothelioma (solid white arrows) differs fundamentally from the assumptions inherent in the standard RECIST classification criteria.
II. PATIENTS AND METHODS
A. Patients
Imaging and clinical data from 78 patients were obtained retrospectively from a prospective study involving fluorodeoxyglucose positron emission tomography (FDG-PET) and CT imaging of MPM [13]. All patients were over 18 years old with histologically or cytologically confirmed MPM and had not received prior chemotherapy or radiotherapy. Original patient accrual occurred from late 2003 to 2010, and the original study was approved by the local institutional Human Research Ethics Committee at Sir Charles Gairdner Hospital with patients providing written informed consent. The retrospective analysis of the HIPAA-compliant data was approved by both the originating institution’s Human Research Ethics Committee and the Institutional Review Board at The University of Chicago, where the analysis was performed. Because the original study was not a treatment study, patients were treated as clinically indicated. Initially, combination chemotherapy consisted of cisplatin and gemcitabine, and later, when it became available at the original study institution, cisplatin and pemetrexed. Palliative radiotherapy was used when indicated, and a single patient had undergone previous pleurectomy decortication.
B. Imaging
Patients were imaged using helical CT up to 1 month prior to the first cycle of chemotherapy and throughout their treatment regimen (typically after the first cycle, then every two cycles). Images were reconstructed axially with 5-mm slices. CT staging was performed according to the Union for International Cancer Control (UICC) TNM staging system (2002). CT scans were staged by a thoracic radiologist or medical oncologist experienced in mesothelioma imaging, and tumor measurements were made clinically according to the modified RECIST protocol on baseline and all follow-up scans [6]. Pathologic staging was not performed. The clinical measurement protocol dictated that all imaging examinations from an individual patient be measured by the same clinician in an attempt to minimize variability. Initially, radiologic response was classified according to the standard RECIST criteria, where partial response (PR) is a 30% reduction in tumor thickness over baseline, progressive disease (PD) is a tumor thickness increase of 20% over the nadir measurement, and stable disease (SD) is attributed to patients who failed to meet the criteria for either of the other categories.
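The classification rule described above can be sketched as a small function. This is an illustrative sketch under stated assumptions, not the clinical protocol itself: the function name `classify_response` and its parameters are hypothetical, the inclusive comparisons (`>=`, `<=`) are an assumption, and so is the choice to check PD before PR when both conditions could hold.

```python
def classify_response(baseline, nadir, current, pr_cut=-0.30, pd_cut=0.20):
    """Classify one follow-up scan from summed tumor thickness measurements.

    baseline: summed measurement on the baseline scan (must be > 0)
    nadir:    smallest summed measurement seen so far (must be > 0)
    current:  summed measurement on the scan being classified
    """
    change_from_baseline = (current - baseline) / baseline
    change_from_nadir = (current - nadir) / nadir
    # Assumption: progression (relative to nadir) takes precedence.
    if change_from_nadir >= pd_cut:
        return "PD"
    if change_from_baseline <= pr_cut:
        return "PR"
    return "SD"


print(classify_response(baseline=100.0, nadir=100.0, current=65.0))  # shrinkage vs. baseline
print(classify_response(baseline=100.0, nadir=60.0, current=75.0))   # growth vs. nadir
```

Passing different values for `pr_cut` and `pd_cut` allows the same function to be reused with alternative cut-points.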
C. Correlating Response with Survival
To measure the association between patient response classification and survival, a single response category must be assigned for each patient. This single response can be assigned in multiple ways; for instance, the “best response” for the patient achieved during some time interval (usually over the active treatment period) can be used, where PR is “better” than SD, and SD is better than PD. Alternatively, response at a predetermined follow-up time could be equally important. In this study, both the best response and response at the first follow-up scan were investigated.
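The two ways of reducing a patient's scan series to a single category can be sketched as follows. This is a minimal illustration that assumes each patient's follow-up scans have already been classified and are listed in chronological order; the names `RANK`, `best_response`, and `first_follow_up_response` are hypothetical.

```python
# Ordering used for "best response": PR is better than SD, SD better than PD.
RANK = {"PD": 0, "SD": 1, "PR": 2}


def best_response(responses):
    """Return the best category reached over the follow-up scans."""
    return max(responses, key=RANK.__getitem__)


def first_follow_up_response(responses):
    """Return the category assigned at the first follow-up scan."""
    return responses[0]


series = ["SD", "PR", "PD"]  # hypothetical chronological follow-up categories
print(best_response(series), first_follow_up_response(series))
```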
If the response assessment system is to provide an accurate correlate of survival (measured from diagnosis), patients labeled as PR should survive longer than those patients labeled as SD, and both groups should survive longer than patients labeled as PD. The extent to which the desired trend holds true can be measured quantitatively using Harrell’s C statistic [14]. A value of C = 0.5 is equivalent to classification by chance alone, whereas C = 1.0 would indicate perfect separation of response groups with respect to subsequent survival times. According to Harrell, C > 0.65 indicates clinical utility, while C > 0.80 indicates high predictive accuracy [14]. The numerical value of C can be interpreted as the fraction of patient comparisons that would be “concordant.” All analyses were performed using the academic edition of Revolution R Enterprise (version 4.3, based on R version 2.12), and C was implemented in the R package “Hmisc” [15, 16].
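The "fraction of concordant comparisons" interpretation can be illustrated with a simplified concordance calculation. The sketch below is not the Hmisc implementation used in the study; it is a plain Python approximation that assumes (a) a pair is usable only when the shorter survival time is an observed death, (b) pairs with tied survival times are skipped, and (c) pairs with tied response categories count as half concordant. The function name `harrell_c` is hypothetical.

```python
def harrell_c(scores, times, events):
    """Simplified concordance index.

    scores: numeric response rank per patient (higher = expected longer survival)
    times:  survival or follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] == times[j]:
                continue  # assumption: tied times are not compared
            short, long_ = (i, j) if times[i] < times[j] else (j, i)
            if not events[short]:
                continue  # shorter time is censored, so the true order is unknown
            usable += 1
            if scores[short] < scores[long_]:
                concordant += 1.0
            elif scores[short] == scores[long_]:
                concordant += 0.5  # same response category
    if usable == 0:
        raise ValueError("no usable patient pairs")
    return concordant / usable


# Hypothetical example: PD=0, SD=1, PR=2, times in months.
print(harrell_c([0, 1, 2], [5.0, 10.0, 20.0], [1, 1, 1]))
```

With only three response categories, many pairs share a category, so values of C are pulled toward 0.5 by the half-credit given to ties.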
D. Optimization and Cross-Validation
To determine the optimal set of response classification criteria, the PR and PD cutoffs were varied in 1% increments (i.e., the PR cutoff was swept from -100% to 0% in 1% increments, and the PD cutoff was swept from 0% to 100% in 1% increments). For each possible pair of cutoff criteria, the correlation between response and survival was assessed to yield one value of C. By tabulating the values of C across all possible response criteria, the optimal pair of classification criteria was determined. These optimal points represent the classification criteria for which the correlation between response and patient survival is greatest. The criteria derived in this way, using all 78 patients, will be called the “full cohort criteria.”
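The exhaustive sweep described above can be sketched as follows. This is an illustrative sketch, not the study's R code: it assumes each patient is represented as a tuple `(scans, time, event)`, where `scans` lists `(change from baseline, change from nadir)` fractions for each follow-up scan; it uses a simplified concordance calculation; and it resolves ties in C by keeping the first cut-point pair found, which the source does not specify. All names and the toy cohort are hypothetical.

```python
from itertools import product

RANK = {"PD": 0, "SD": 1, "PR": 2}  # higher rank = better response


def classify(chg_base, chg_nadir, pr_cut, pd_cut):
    """Assumed rule: PD is checked first (vs. nadir), then PR (vs. baseline)."""
    if chg_nadir >= pd_cut:
        return "PD"
    if chg_base <= pr_cut:
        return "PR"
    return "SD"


def best_rank(scans, pr_cut, pd_cut):
    """Best response over all follow-up scans, as a numeric rank."""
    return max(RANK[classify(b, n, pr_cut, pd_cut)] for b, n in scans)


def concordance(scores, times, events):
    """Simplified C: tied scores count 0.5, tied times are skipped."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            if times[i] == times[j]:
                continue
            a, b = (i, j) if times[i] < times[j] else (j, i)
            if not events[a]:
                continue  # shorter time is censored, ordering unknown
            usable += 1
            if scores[a] < scores[b]:
                concordant += 1.0
            elif scores[a] == scores[b]:
                concordant += 0.5
    return concordant / usable if usable else float("nan")


def optimize_cutoffs(patients):
    """Sweep PR over -100..0% and PD over 0..100%; return (C, PR %, PD %)."""
    times = [t for _, t, _ in patients]
    events = [e for _, _, e in patients]
    best = None
    for pr_pct, pd_pct in product(range(-100, 1), range(0, 101)):
        scores = [best_rank(s, pr_pct / 100, pd_pct / 100) for s, _, _ in patients]
        c = concordance(scores, times, events)
        if best is None or c > best[0]:  # ties keep the first pair (assumption)
            best = (c, pr_pct, pd_pct)
    return best


# Hypothetical toy cohort: one follow-up scan each, all deaths observed.
toy_patients = [
    ([(-0.70, -0.70)], 30.0, 1),
    ([(-0.40, -0.40)], 10.0, 1),
    ([(-0.10, -0.10)], 12.0, 1),
    ([(0.30, 0.30)], 11.0, 1),
]
print(optimize_cutoffs(toy_patients))
```

Because C changes only when a cut-point crosses an observed measurement change, many neighboring cut-point pairs give identical values of C, so the reported optimum is one representative of a plateau.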
The optimization process requires validation, since the optimal model from the full patient cohort has a strong tendency to yield an overly optimistic prediction rule with respect to predictions on de novo observations not involved in model building. A leave-one-out cross-validation (LOOCV) process may thus generate a more realistic value of C [14, 17]. Using LOOCV, each patient was excluded, one at a time, and the classification criteria were optimized using the other 77 patients. The optimized criteria from these 77 “training” patients were used to test the model on the 78th patient (who had been excluded from model optimization). LOOCV allows each patient to be assigned to a response category using criteria that were derived without knowledge of that particular patient’s tumor measurements, and the LOOCV-based value of C is a better indicator of how well the reported classification criteria will perform for a new, previously unknown patient.
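The leave-one-out procedure can be sketched generically. In the sketch below, `fit` stands in for the cut-point optimization on the training patients and `predict` stands in for classifying the held-out patient with the resulting cut-points; both are hypothetical callables supplied by the caller, and the toy usage at the end is purely illustrative.

```python
def loocv_assign(patients, fit, predict):
    """Assign each patient a category using a model fitted without that patient.

    patients: list of patient records
    fit:      callable taking the training list and returning a fitted model
    predict:  callable taking (model, held_out_patient) and returning a category
    """
    assigned = []
    for i, held_out in enumerate(patients):
        training = patients[:i] + patients[i + 1:]  # all patients except patient i
        model = fit(training)
        assigned.append(predict(model, held_out))
    return assigned


# Toy usage (hypothetical): the "model" is the mean of the training values,
# and a held-out value is flagged when it exceeds that mean.
values = [1.0, 2.0, 3.0, 10.0]
flags = loocv_assign(values,
                     fit=lambda train: sum(train) / len(train),
                     predict=lambda mean, x: x > mean)
print(flags)
```

The cross-validated C would then be computed once, from the list of held-out assignments together with the survival times of all patients.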
Two additional internal validation checks were also performed. First, to evaluate possible dependence of the derived classification rule and C on specific cases or subsets of cases, bootstrap samples from the entire cohort were generated and the optimization procedure repeated, followed by a descriptive summary of the cut-points and C values. Second, to evaluate the performance of the derived rule from the full cohort in hypothetical independent patient cohorts, the rule was applied to random bootstrap sub-cohorts and the performance summarized.
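Both bootstrap checks share the same resampling step, which can be sketched as follows. This is a generic illustration: `statistic` is a hypothetical callable (for example, one that re-optimizes the cut-points or one that evaluates fixed cut-points on the resampled cohort), and the number of bootstrap samples `n_boot` and the random seed are assumptions, since the source does not report them.

```python
import random


def bootstrap_summary(patients, statistic, n_boot=1000, seed=0):
    """Resample patients with replacement and summarize a statistic.

    Returns (mean, standard deviation) of the statistic across bootstrap samples.
    """
    rng = random.Random(seed)
    n = len(patients)
    values = []
    for _ in range(n_boot):
        # Draw n patients with replacement from the full cohort.
        sample = [patients[rng.randrange(n)] for _ in range(n)]
        values.append(statistic(sample))
    mean = sum(values) / n_boot
    spread = (sum((v - mean) ** 2 for v in values) / (n_boot - 1)) ** 0.5
    return mean, spread


# Toy usage (hypothetical): bootstrap the mean of five numbers.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(bootstrap_summary(data, statistic=lambda s: sum(s) / len(s)))
```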
Finally, inferential procedures were carried out to compare the C statistic for the standard RECIST approach with that for the optimized classification criteria. When estimates of C are calculated, standard errors (SE) are also calculated for the metric. However, the estimates of the performance of the standard RECIST classification criteria, Cstd, and the optimized classification criteria performance, Copt, will necessarily be correlated for a given patient sample. Therefore, to compare differences in point estimates of C, one must account for this correlation, since an assumption of independence would result in an overly conservative p-value. Correlation was calculated using a jackknife approach on the LOOCV patient subsets with analysis of variance (ANOVA) modeling, similar to a method previously described for ROC curves [18, 19]. Using this correlation estimate, Copt was then compared with Cstd using a one-sided Z-test with p < 0.05 as the standard for statistical significance.
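The effect of accounting for correlation can be illustrated with the usual Z statistic for the difference of two correlated estimates. The sketch below assumes the standard form in which the variance of the difference is SE1² + SE2² − 2·r·SE1·SE2; the source does not spell out its exact formula, so this form, the function name `one_sided_z_test`, and the value r = 0.5 used in the example are all assumptions made for illustration only.

```python
import math


def one_sided_z_test(c_opt, se_opt, c_std, se_std, r):
    """One-sided Z-test that c_opt exceeds c_std, given correlation r
    between the two estimates. Returns (z, p)."""
    se_diff = math.sqrt(se_opt ** 2 + se_std ** 2 - 2.0 * r * se_opt * se_std)
    z = (c_opt - c_std) / se_diff
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return z, p


# Point estimates and standard errors for best response are from Table 1;
# the correlation values are hypothetical.
z_indep, p_indep = one_sided_z_test(0.855, 0.045, 0.778, 0.048, r=0.0)
z_corr, p_corr = one_sided_z_test(0.855, 0.045, 0.778, 0.048, r=0.5)
print(p_indep > p_corr)  # ignoring a positive correlation gives a larger p-value
```

A positive correlation shrinks the standard error of the difference, which is why treating the two estimates as independent is conservative.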
III. RESULTS
A. Patients and Overall Survival
Of the 78 MPM patients included in this study, 66 were male and 12 were female. The median patient age at study entry was 66 years (range 41–80 years). Most patients (n=56) had epithelioid histology, with a smaller number of patients having biphasic (n=15) or sarcomatoid (n=7) histology. There were a total of 275 CT scans in this study, with a median of four scans per patient (including baseline scans). Eleven patients had only a baseline scan with one follow-up scan, while 25 patients had three scans total, 32 patients had four scans total, and 10 patients had five scans total. The median duration between scans was 45 days. CT staging identified 11 patients as stage I, 3 patients as stage II, 34 patients as stage III, and 30 patients as stage IV. Median survival was 21.8 months for stage I patients, not available for stage II patients (only one death and two censoring events occurred), 14.8 months for stage III patients, and 13.6 months for stage IV patients. The difference in survival between stages was not significant by a log-rank test (p > 0.05).
Median overall survival from diagnosis was 14.9 months (range 2.5–60 months). Of the 78 patients, there were 75 observed deaths, while three patients were lost to follow-up after a median follow-up of 35 months. The overall survival curve is shown in Figure 2.
Figure 2.

Overall survival curve for patient cohort.
B. Optimization of Classification Criteria
Using the standard RECIST classification criteria of −30% for PR and +20% for PD, the correlation between best response and overall survival was Cstdbest = 0.778 with an SE of 0.048. The correlation between first follow-up response for each patient and overall survival was Cstdfirst = 0.655 with an SE of 0.054. After optimization, the new classification criteria derived from the full cohort were −64% for PR and +50% for PD. Optimizing the correlation between response classification and survival resulted in identical criteria using both the best response and first follow-up response per patient. The performance of these full cohort criteria using the best response per patient was Coptbest = 0.855 with an SE of 0.045, and using the first follow-up response per patient, the performance was Coptfirst = 0.932 with an SE of 0.029. These values are summarized, along with their p-values comparing optimized performance with the standard RECIST classification criteria performance, in Table 1.
Table 1.
Correlation scores between patient response and overall survival from diagnosis. All p-values are calculated with reference to the appropriate standard RECIST classification criteria performance (either best response or first follow-up response per patient) and properly account for correlation between values of C.
| Classification Criteria | C | Standard Error | p-value |
|---|---|---|---|
| Best response | | | |
| Standard RECIST (−30%/+20%) | 0.778 | 0.048 | – |
| Optimized (−64%/+50%) | 0.855 | 0.045 | 0.039 |
| Cross Validation | 0.829 | 0.043 | 0.121 |
| First follow-up response | | | |
| Standard RECIST (−30%/+20%) | 0.655 | 0.054 | – |
| Optimized (−64%/+50%) | 0.932 | 0.029 | <0.001 |
| Cross Validation | 0.872 | 0.049 | <0.001 |
Figure 3 plots the best response classification for each patient against overall survival from diagnosis using both the standard RECIST classification criteria and the optimized criteria. It can be seen that after optimization, the classification criteria group patients into only two response categories; the two patients originally classified as PD are now included in the SD category. Furthermore, many of the patients originally classified as PR but having short survival durations are now included in the SD category. Figure 4 plots survival curves for the best response categories using both the standard RECIST classification criteria and the optimized criteria. Using the standard RECIST classification criteria, the median survival for best response PD, SD, and PR was 11.5 months, 11.6 months, and 23.0 months, respectively. Using the optimized classification criteria, the median survival for best response SD and PR was 12.9 months and 24.8 months, respectively.
Figure 3a.

Correlation between best response classification per patient and survival, using the standard RECIST classification criteria.
Figure 4a.

Overall survival from diagnosis by response category, using the standard RECIST −30%/+20% classification criteria and each patient’s best response.
Table 2 shows a cross-tabulation of how patients are categorized using the standard RECIST and optimized classification criteria for both the best response and first follow-up response. For best response, 17 patients (22%) changed classification categories between the standard RECIST criteria and optimized criteria, and for first follow-up response, 10 patients (13%) changed classification categories.
Table 2.
Number of patients in the different response categories using the standard RECIST classification criteria and the optimized −64%/+50% classification criteria. Response classified according to best response is shown in (a), while response classified according to first follow-up response is shown in (b).
(a) Best response

| Standard Classification Criteria | Optimized PR | Optimized SD | Optimized PD |
|---|---|---|---|
| PR | 11 | 15 | 0 |
| SD | 0 | 50 | 0 |
| PD | 0 | 2 | 0 |

(b) First follow-up response

| Standard Classification Criteria | Optimized PR | Optimized SD | Optimized PD |
|---|---|---|---|
| PR | 1 | 8 | 0 |
| SD | 0 | 67 | 0 |
| PD | 0 | 2 | 0 |
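The reclassification counts quoted in the text can be checked directly from the cross-tabulation in Table 2. The sketch below simply sums the off-diagonal cells; the counts are copied from Table 2, while the dictionary layout and the function name `count_changed` are hypothetical.

```python
# Rows: standard RECIST category; columns: optimized category (from Table 2).
best_response_table = {
    "PR": {"PR": 11, "SD": 15, "PD": 0},
    "SD": {"PR": 0, "SD": 50, "PD": 0},
    "PD": {"PR": 0, "SD": 2, "PD": 0},
}
first_follow_up_table = {
    "PR": {"PR": 1, "SD": 8, "PD": 0},
    "SD": {"PR": 0, "SD": 67, "PD": 0},
    "PD": {"PR": 0, "SD": 2, "PD": 0},
}


def count_changed(table):
    """Return (patients whose category changed, total patients)."""
    total = sum(n for row in table.values() for n in row.values())
    unchanged = sum(table[cat][cat] for cat in table)  # diagonal cells
    return total - unchanged, total


for name, table in (("best", best_response_table),
                    ("first follow-up", first_follow_up_table)):
    changed, total = count_changed(table)
    print(f"{name}: {changed} of {total} changed ({changed / total:.0%})")
```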
C. Cross-Validation of Classification Criteria
As indicated in section IID, cross-validation of the optimized classification criteria leads to a more realistic value of model performance, Ccv. Correlating each patient’s cross-validated best response with overall survival, a performance of Ccvbest = 0.829 with an SE of 0.043 was achieved. When the cross-validated first follow-up response was correlated with overall survival, the model performance was Ccvfirst = 0.872 with an SE of 0.049. The LOOCV scheme is more a validation of the optimization process than any one set of optimized criteria, and therefore these C metrics are more realistic estimates of performance without the bias of training and testing a model on the same patient cohort. These C values, along with p-values comparing cross-validated performance with the standard RECIST performance, are summarized in Table 1.
From the first bootstrap internal validation (where the classification criteria were allowed to vary with each independent bootstrap patient sample), the criteria selected for each independent bootstrap sample are summarized as follows. The PR cut-point had a median value of −64%, with a mode of −64% and a mean of −67%, and the PD cut-point had a median value of +50%, with a mode of +50% and a mean of +36%. In the second bootstrap internal validation, where the classification criteria were fixed at −64%/+50%, the mean performance across independent bootstrap samples was Cboot,optbest = 0.852 with an SE of 0.047. For the same independent bootstrap samples, the mean performance of the standard RECIST criteria was Cboot,stdbest = 0.778 with an SE of 0.050. A comparison of these bootstrap performance values and their respective standard errors to the values in Table 1 reveals them to be quite similar.
IV. DISCUSSION
In order to assess patient response to therapy, clinicians have come to rely on image-based measures of tumor burden as a surrogate for “true” patient benefit (i.e., reduced symptom burden or time until a defined event such as death). One common method for image-based assessment is the RECIST paradigm of linear measurements and response classification criteria. While the specific technique used to acquire tumor measurements has been defined in a specific sense for patients with MPM (modified RECIST), the response classification criteria for MPM patients are the same cut-points used for all tumors based on standard RECIST, which defines progressive disease (PD) as a 20% or more increase from measurement nadir, partial response (PR) as a decrease of 30% or more from baseline, and stable disease (SD) as the “middle ground.” However, these classification criteria may not be optimal for any specific disease [20]. Our aim in this study was to optimize the correlation between response classification and overall survival for MPM patients by varying the classification criteria.
The first step in this work was to quantify the relationship between response classification and survival for the standard RECIST classification criteria. If response classification based on linear measurements were “perfectly” associated with survival, every patient classified as PR would live longer than every patient classified as SD, while both classes would live longer than every patient classified as PD. When this relationship holds true, C = 1.0. Using the modified RECIST measurement technique and the standard RECIST −30%/+20% criteria, we found a correlation of Cstd = 0.778 between best patient response and survival.
While the performance of the standard RECIST criteria is within the range of “clinical utility” according to Harrell, performance could be improved by changing the response classification criteria to −64%/+50%. The performance of these criteria was measured as Copt = 0.855. To avoid bias that may result from training and testing on the same group of patients, a cross-validation approach was used to estimate an unbiased performance of Ccv = 0.829, which is in the range of “high predictive accuracy.” While comparing the full cohort performance (0.855) with the standard RECIST performance (0.778) yields a p-value of 0.039, the cross-validated p-value was 0.121. These p-values are calculated by considering the point estimates of C and their respective standard errors as well as the correlation between the two metrics. For a given group of patients, values of C from different classification criteria will be correlated because of the overlap between the discrete response categories. Thus, Copt is significantly larger than Cstd, whereas Ccv is larger than Cstd but not significantly so.
Using the optimized response classification criteria, no patients are classified as having progressive disease as their best response. While this is a byproduct of our particular patient sample, the effective reduction in response categories from three to two is actually in line with phase II trials, where classification into only two categories is common (“responders” and “non-responders”). In fact, if the optimization process above is conducted with only one cut-point to start instead of two, the same −64% criterion is obtained to separate a responders category from a non-responders category. Some care also needs to be taken when interpreting the optimized criteria in terms of first follow-up response. With wider criteria, nearly all patients were classified as SD (77 of 78), since there has usually not been enough time for tumor burden to change dramatically in either direction. The patient imaging series used in this study are from routine clinical practice; patients who had progressive disease, unacceptable toxicity, or wished to stop treatment for other reasons did not continue with CT imaging if care was subsequently palliative only. In this context, baseline imaging with two additional scans equates to four cycles of chemotherapy, and it is not uncommon to cease treatment at this point in the context of substantial toxicity even with stable disease. Patient best response is usually achieved only after a number of treatment cycles, so despite the improved performance of the optimized criteria on the first follow-up response compared with standard RECIST, this study does not go so far as to advocate that all patient response should be assessed after only one follow-up scan. 
In standard clinical treatment, patients are treated as long as is practical and advisable in order to achieve the best response possible, and therefore we believe that the improved performance of the optimized classification criteria for best response is more clinically relevant than the improved performance for first response.
The issue of disease progression is also important in the context of initiating or withdrawing patient treatment. Many clinical trials incorporate progressive disease as an eligibility criterion, use progressive disease as a trigger to cease study treatment, and establish progression-free survival as an important endpoint. All these settings would be impacted by the classification criteria proposed in this study. Because this study was not originally an intervention study, we are unable to determine the impact of the proposed criteria on initiation of patient treatment. Of the 78 patients, 19 experienced disease progression according to the standard RECIST classification criteria at some point during their treatment, with a median time-to-progression of 5.0 months and a median overall survival of 14.9 months. Using the proposed classification criteria, however, only seven patients experienced disease progression at some point during their treatment with a median time-to-progression of 5.9 months and a median overall survival of 14.9 months. Using these revised criteria, patients may be eligible for clinical trials later and stay on treatment longer. In order to validate appropriate criteria for progression, it may be more appropriate to identify tumor thickness changes that correspond to meaningful deterioration in other patient-rated outcomes such as dyspnea, pain, and quality of life.
Previously, theoretical studies explored the possibility of alternate response criteria for MPM by investigating linear measurement cut-points in aspherical geometries [12]. Oxnard et al. obtained classification criteria of −67.9%/+100.1% for an annulus, −51.5%/+45.4% for a lens, and −65.8%/+73.6% for a crescent geometry, with linear measurements made according to the modified RECIST protocol. These alternate criteria are all substantially “wider” than the standard RECIST criteria, as are the optimized criteria we derived in this study; however, the theoretical criteria of Oxnard et al. were all based on volumetric equivalence to −30%/+20% changes in the diameter of a sphere, and the somewhat arbitrary provenance of those original criteria was outlined in section I.
This study sought to identify classification criteria that optimized correlation with overall survival. To fully validate these new response criteria derived from this moderately sized database, they must be tested on larger independent patient cohorts. While the leave-one-out cross-validation used in this study attempts to simulate this process, it is not a substitute for a full independent validation, and future work will seek to validate these proposed response criteria. We also caution that while these criteria predict survival in patients on cytotoxic chemotherapy, it is unclear whether they would be a valid candidate surrogate for survival benefit in patients receiving a targeted therapy. Since the variability in manual measurements has been well documented [21], it is possible that measurements from different observers would have resulted in different optimized classification criteria, and future studies will incorporate measurements from multiple observers to assess inter-observer variability. Finally, because the patient cohort was acquired from a previous study involving FDG-PET imaging for patients with MPM [13], we look forward to investigating joint correlations between response as assessed using FDG-PET imaging parameters (for instance, standardized uptake value or total glycolytic volume) and response as assessed using Modified RECIST measurements in a future study.
To summarize, the current standard for response assessment in patients with malignant pleural mesothelioma is a set of linear tumor thickness measurements acquired according to the modified RECIST protocol. Changes in these tumor measurements are compared with classification criteria, currently defined as −30% for partial response and +20% for progressive disease. Despite the original arbitrary provenance of these cut points, they perform adequately and are within the range of “clinical utility.” However, by changing these criteria to −64% and +50%, respectively, the correlation between tumor response and overall survival is improved. These optimized classification criteria appear better suited to the specific morphology and growth pattern of mesothelioma and may prove useful in the assessment of clinical trials and routine patient care.
Figure 3b.

Correlation between best response classification per patient and survival, using the classification criteria derived from optimization on the full patient cohort. Black diamonds indicate observed events, while red circles indicate losses to follow-up. Performance metrics are summarized in Table 1.
Figure 4b.

Overall survival from diagnosis by response category, using the optimized −64%/+50% response criteria. No patients were classified as having progressive disease with the optimized criteria.
Acknowledgments
Sources of Support: Supported, in part, by The University of Chicago Comprehensive Cancer Center, the Raine Medical Research Foundation, USPHS Grant Number R01CA102085, the Simmons Mesothelioma Foundation, the Kazan Law Firm’s Charitable Foundation, the National Health and Medical Research Council, Australia, and the Cancer Council Western Australia.
Footnotes
Conflicts of Interest: ZEL – none.
SGA – receives royalties and licensing fees through the University of Chicago related to computer-aided diagnosis.
HLK – none. JJD – none. AH – none.
AKN – remunerated speaker for Eli Lilly and received travel funding from Eli Lilly.
REFERENCES
- [1] Robinson BWS, Lake RA. Advances in malignant mesothelioma. N Engl J Med. 2005;353:1591–1603. doi: 10.1056/NEJMra050152.
- [2] Peto J, Decarli A, La Vecchia C, Levi F, Negri E. The European mesothelioma epidemic. Br J Cancer. 1999;79:666–672. doi: 10.1038/sj.bjc.6690105.
- [3] Weill H, Hughes JM, Churg AM. Changing trends in US mesothelioma incidence. Occup Environ Med. 2004;61:438–441. doi: 10.1136/oem.2003.010165.
- [4] Ceresoli GL, Castagneto B, Zucali PA, et al. Pemetrexed plus carboplatin in elderly patients with malignant pleural mesothelioma: Combined analysis of two phase II trials. Br J Cancer. 2008;99:51–56. doi: 10.1038/sj.bjc.6604442.
- [5] Nowak AK. CT, RECIST, and malignant pleural mesothelioma. Lung Cancer. 2005;49(Suppl 1):S37–40. doi: 10.1016/j.lungcan.2005.03.030.
- [6] Byrne MJ, Nowak AK. Modified RECIST criteria for assessment of response in malignant pleural mesothelioma. Ann Oncol. 2004;15:257–260. doi: 10.1093/annonc/mdh059.
- [7] Armato SG III, Entwisle J, Truong MT, et al. Current state and future directions of pleural mesothelioma imaging. Lung Cancer. 2008;59:411–420. doi: 10.1016/j.lungcan.2007.09.027.
- [8] Therasse P, Arbuck SG, Eisenhauer EA, et al. New guidelines to evaluate the response to treatment in solid tumors. J Natl Cancer Inst. 2000;92:205–216. doi: 10.1093/jnci/92.3.205.
- [9] Miller AB, Hoogstraten B, Staquet M, Winkler A. Reporting results of cancer treatment. Cancer. 1981;47:207–214. doi: 10.1002/1097-0142(19810101)47:1<207::aid-cncr2820470134>3.0.co;2-6.
- [10] Michaelis LC, Ratain MJ. Measuring response in a post-RECIST world: From black and white to shades of grey. Nat Rev Cancer. 2006;6:409–414. doi: 10.1038/nrc1883.
- [11] Jaffe CC. Measures of response: RECIST, WHO, and new alternatives. J Clin Oncol. 2006;24:3245–3251. doi: 10.1200/JCO.2006.06.5599.
- [12] Oxnard GR, Armato SG III, Kindler HL. Modeling of mesothelioma growth demonstrates weaknesses of current response criteria. Lung Cancer. 2006;52:141–148. doi: 10.1016/j.lungcan.2005.12.013.
- [13] Nowak AK, Francis RJ, Phillips MJ, et al. A novel prognostic model for malignant mesothelioma incorporating quantitative FDG-PET imaging with clinical parameters. Clin Cancer Res. 2010;16:2409–2417. doi: 10.1158/1078-0432.CCR-09-2313.
- [14] Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15:361–387. doi: 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4.
- [15] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2011.
- [16] Harrell FE Jr, with contributions from many other users. Hmisc: Harrell Miscellaneous. 2010. Programs available from http://CRAN.R-project.org/package=Hmisc.
- [17] Efron B, Gong G. A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Stat. 1983;37:36–48.
- [18] Roe CA, Metz CE. Variance-component modeling in the analysis of receiver operating characteristic index estimates. Acad Radiol. 1997;4:587–600. doi: 10.1016/s1076-6332(97)80210-3.
- [19] Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis. Generalization to the population of readers and patients with the jackknife method. Invest Radiol. 1992;27:723–731.
- [20] Birchard KR, Hoang JK, Herndon JE, Patz EF Jr. Early changes in tumor size in patients treated for advanced stage nonsmall cell lung cancer do not correlate with survival. Cancer. 2009;115:581–586. doi: 10.1002/cncr.24060.
- [21] Armato SG III, Oxnard GR, MacMahon H, Vogelzang NJ, Kindler HL, Kocherginsky M, Starkey A. Measurement of mesothelioma on thoracic CT scans: A comparison of manual and computer-assisted techniques. Med Phys. 2004;31:1105–1115. doi: 10.1118/1.1688211.

