Bone Marrow Transplant. Author manuscript; available in PMC: 2020 Nov 13.
Published in final edited form as: Bone Marrow Transplant. 2014 Aug 25;49(12):1513–1520. doi: 10.1038/bmt.2014.188

NIH response criteria measures are associated with important parameters of disease severity in patients with chronic GVHD

LM Curtis 1,2,11, L Grkovic 2,3,11, SA Mitchell 4, SM Steinberg 5, EW Cowen 6, MB Datiles 7, J Mays 8, C Bassim 8, G Joe 9, LE Comis 9, A Berger 10, D Avila 2, T Taylor 2, D Pulanic 3, K Cole 2, J Baruffaldi 2, DH Fowler 2, RE Gress 2, SZ Pavletic 2
PMCID: PMC7665799  NIHMSID: NIHMS1639843  PMID: 25153693

Abstract

Lack of standardized criteria for measuring therapeutic response remains an obstacle to the development of better treatments for chronic GVHD (cGVHD). This cross-sectional prospective study examined the concurrent and predictive validity of 18 clinician-reported (‘Form A’) and 8 patient-reported (‘Form B’) response measures proposed by the NIH criteria. Concurrent parameters of interest were NIH global score, cGVHD activity, Lee symptom score and SF36 physical component summary (PCS). The patient cohort included 193 adults with moderate-to-severe cGVHD. Measures associated with the largest number of outcomes were lung function score (LFS), 2-min walk, grip strength, 4-point health-care provider (HCP) and patient global scores, 11-point clinician- and patient-reported global symptom severity scores, and Karnofsky performance score (KPS). Measures associated with survival in univariate analyses were entered into a Cox model, in which skin erythema, LFS, KPS, eosinophil count and the interval from cGVHD diagnosis to enrollment were jointly associated with survival. In conclusion, 4-point HCP and patient global scores and 11-point clinician- and patient-reported global symptom severity scores are associated with the majority of concurrent outcomes. Skin erythema is a potentially reversible sign of cGVHD that is associated with survival. These results define a subset of measures that should be prioritized for evaluation in future studies.

INTRODUCTION

Chronic GVHD (cGVHD) affects 40–60% of patients who have undergone allogeneic hematopoietic cell transplantation.1 It is a major cause of nonrelapse mortality and morbidity.2 Prednisone with or without a calcineurin inhibitor is the first-line treatment for cGVHD; however, about half of all patients do not respond to this initial therapy,3,4 and there is no standard second-line therapy.5 At this time, no agent has FDA approval for the treatment of cGVHD. Clinical trials are needed both to optimize currently used therapies and to develop new agents.

One major barrier to designing clinical trials in cGVHD is the lack of standard response criteria. The 2005 NIH consensus project for clinical trials proposed a set of cGVHD-specific and ancillary response measures for validation.6 The proposed clinician-reported (‘Form A’) and patient-reported (‘Form B’) measures were designed for use by stem cell transplant clinicians to quantify change in the most important cGVHD manifestations after therapy. The only therapeutic intervention study published thus far that strictly applied the NIH criteria was conducted in a cohort of 39 steroid-refractory patients.7 Its results suggested that change in the NIH overall response calculation was predictive of survival, and a significant correlation was found between the NIH overall response calculation and the 4-point Couriel clinician-assessed response scale. Baird et al.8 have shown that NIH organ and global severity scores are valid measures of cGVHD disease burden, and that the NIH lung score is the strongest predictor of survival. This paper extends that work by examining the association of the NIH response criteria measures with the NIH cGVHD global score, activity by therapeutic intent, self-assessed physical health and symptom burden. Comparatively little has been reported about the distribution of NIH response measures in large observational cGVHD cohorts. Identifying the most relevant clinical characteristics and reducing the number of items used for response evaluation would help guide further development of response criteria for cGVHD clinical trials.

The NIH cGVHD natural history protocol (NCT01688466) is an observational cohort study in which cross-sectional data are prospectively gathered on patients with a wide spectrum of clinical manifestations of cGVHD. Patients participating in the cohort undergo multispecialty clinical and laboratory cGVHD evaluations and functional assessments, and provide standardized self-reports. The purpose of the present study was to profile the clinician-reported (‘Form A’) and patient-reported (‘Form B’) NIH response measures in a large group of cGVHD patients who had failed prior therapies, and to identify the measures most highly associated with concurrent parameters of disease severity and with subsequent survival, for further development as tools in clinical trials.

MATERIALS AND METHODS

Subjects

Patients were referred for evaluation of cGVHD on the NCI protocol NCT00092235, Natural History of Clinical and Biological Factors Determining Outcomes in Chronic Graft-Versus-Host Disease. The study was approved by the Institutional Review Board of the National Cancer Institute. All patients or their legal representatives provided informed consent in accordance with the Declaration of Helsinki. To be eligible, participants had to have cGVHD diagnosed per NIH criteria, meaning that at least one clinical manifestation diagnostic of cGVHD was present, or that a distinctive manifestation plus a supportive test in the same or another affected organ was identified.9 Other etiologies such as infection, drug reaction and malignancy had to have been ruled out. Patients without cGVHD, with late acute GVHD or who had received therapy for malignancy in the previous 3 months were excluded. All evaluations were collected during a 1-week study visit at the NIH. A total of 193 patients were included in the analysis. This cohort was the subject of our previous analysis, which focused on validation of the NIH organ-specific staging.8

NIH response criteria measures

During study visits, the clinician-assessed cGVHD response form (‘Form A’, see appendix) was completed. Evaluation included the body surface area % (BSA%) of erythematous rash, moveable sclerosis and non-moveable sclerosis; the largest ulcer dimension; the Schirmer tear test without anesthesia; the modified oral mucosa rating score (OMRS); total white blood cell count (WBC); total serum bilirubin; alanine aminotransferase; upper gastrointestinal (GI), esophageal and lower GI symptom scores; lung function score (LFS); total distance walked in 2 min; grip strength in the dominant hand; the 4-point health care provider (HCP) global score; the 11-point clinician-reported cGVHD global symptom severity score; the clinician-reported assessment of change in the patient’s cGVHD in the past month; and the Karnofsky performance status score (KPS). The patient-based assessment of cGVHD (‘Form B’, see appendix) included scoring of skin pruritus, mouth dryness, mouth pain, mouth sensitivity, chief eye symptom, the 4-point patient global score, the 11-point patient-reported cGVHD global symptom severity score and the patient-reported perception of change in cGVHD in the past month.

Clinical outcomes, measures and survival

During the same visit, NIH organ-specific scores (0–3) were assigned to each of eight organs commonly affected by cGVHD (skin, mouth, eyes, GI tract, liver, lungs, joints-fascia and female genital tract).9 The following clinically important concurrent measures were recorded (referred to as ‘outcomes of interest’): NIH global score as defined by Filipovich et al.;9 cGVHD activity by clinician therapeutic intent as previously defined by Grkovic et al.;10 Lee total symptom score;11 and the Short-Form 36 Health Survey (SF36) physical component summary (PCS) score. Survival follow-up information was obtained by telephone calls to the patients and their physicians’ offices or from the Social Security death information website.

Statistical analysis

Univariate associations between parameters and the four outcomes of interest were initially determined by the Wilcoxon rank sum test, Fisher’s exact test, Jonckheere-Terpstra trend test, Cochran-Armitage test for trend or Spearman rank correlation,12,13 as appropriate. Spearman correlations are interpreted as follows: |r| > 0.7, strong correlation; 0.5 < |r| < 0.7, moderately strong correlation; 0.3 < |r| < 0.5, weak-to-moderately strong correlation; |r| < 0.3, weak correlation. On the basis of this univariate screening analysis, parameters found to be potentially associated with severity of outcomes (P < 0.05 and r > 0.35) were evaluated in individual univariate logistic models to decide whether to include them in multiple logistic regression models for the dichotomous outcomes, NIH global severity and activity by therapeutic intent. Multiple logistic regression analyses were then used to determine the association between parameters and these dichotomous outcomes. In addition, multiple regression analysis was used to determine the association between potential predictors and the continuous outcomes, the Lee total symptom score and the SF36 PCS score.
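The correlation bands and the screening rule above can be made concrete. The Python sketch below is a minimal illustration, not the authors' code: it computes a Spearman coefficient as the Pearson correlation of average ranks (tied values share their mean rank), labels it with the strength bands defined above, and applies the P < 0.05 and r > 0.35 screen. Two choices are assumptions: exact boundary values (0.3, 0.5, 0.7), which the text leaves unassigned, are placed in the lower band, and the r threshold is applied to |r| so that inversely oriented measures are not dropped. All function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def correlation_strength(r):
    """Strength bands from the Statistical analysis section (boundaries go to the lower band)."""
    a = abs(r)
    if a > 0.70:
        return "strong"
    if a > 0.50:
        return "moderately strong"
    if a > 0.30:
        return "weak-to-moderately strong"
    return "weak"


def passes_screen(p_value, r, p_cut=0.05, r_cut=0.35):
    """Univariate screen for carrying a parameter forward (using |r| is an assumption)."""
    return p_value < p_cut and abs(r) > r_cut


# Hypothetical toy data: a 0-10 symptom score against a 0-3 organ score.
symptom = [2, 4, 4, 5, 7, 8, 9, 10]
organ = [0, 0, 1, 1, 2, 2, 3, 3]
rho = spearman_rho(symptom, organ)
print(round(rho, 3), correlation_strength(rho))
```

Average ranks are used because ordinal 0–3 and 0–10 scales produce many ties, and plain 1..n ranking would make the coefficient depend on the arbitrary order of tied patients.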

Survival was measured from the date of entry onto the natural history protocol until death or last follow-up. Kaplan–Meier analyses and log-rank tests were used to determine the association between potential predictors and survival after entry onto the protocol. For the Cox model analyses, we began with a large number of factors: those specifically explored in this study, as well as several that were previously reported to be associated with survival in this cohort or in the literature.8,10 Factors under consideration included BSA% erythema, WBC, bilirubin, LFS, 4-point HCP global score, 11-point clinician-assessed global symptom severity score, KPS, eosinophil count, the interval from transplant to enrollment on the present study, walk distance in 2 min, platelet count, the interval from cGVHD diagnosis to enrollment on the present study, age, type of cGVHD onset and NIH skin score. These parameters were then evaluated for their joint association with survival using a Cox proportional hazards model, with the final model determined by backward selection. Of note, BSA% erythema, a continuous parameter, was dichotomized at 3% for the survival analysis (0–3% vs >3% BSA erythema).
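The survival methods above combine a dichotomized covariate, Kaplan–Meier curves and a log-rank comparison. A minimal pure-Python sketch of those three steps follows, written for illustration rather than taken from the study: the function names are hypothetical, the textbook product-limit estimator and two-group log-rank statistic are used, and the toy data are invented. The Cox model and backward selection are not reproduced; a dedicated survival package would normally be used for those.

```python
from math import erfc, sqrt


def erythema_group(bsa_percent):
    """Dichotomize BSA% erythema at 3%, as in the survival analysis: 0-3% vs >3%."""
    return ">3%" if bsa_percent > 3 else "0-3%"


def kaplan_meier(times, events):
    """Product-limit estimate; events[i] is 1 for death, 0 for censoring.

    Returns (time, survival) pairs at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Count every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings at t leave the risk set together
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, P) with 1 degree of freedom."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in pooled if e})
    observed_a = expected_a = variance = 0.0
    for t in death_times:
        n = sum(1 for s, _, _ in pooled if s >= t)
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(e for s, e, _ in pooled if s == t)
        observed_a += sum(e for s, e, g in pooled if s == t and g == 0)
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical toy cohort (months of follow-up; 1 = died, 0 = censored), not study data.
bsa = [0, 2, 5, 12, 1, 30, 3, 8]
months = [40, 36, 10, 8, 44, 5, 30, 20]
died = [0, 0, 1, 1, 0, 1, 0, 1]
groups = [erythema_group(b) for b in bsa]
low = [i for i, g in enumerate(groups) if g == "0-3%"]
high = [i for i, g in enumerate(groups) if g == ">3%"]
high_curve = kaplan_meier([months[i] for i in high], [died[i] for i in high])
chi2, p = log_rank([months[i] for i in low], [died[i] for i in low],
                   [months[i] for i in high], [died[i] for i in high])
print(high_curve)
print(f"log-rank chi2 = {chi2:.2f}, P = {p:.4f}")
```

The sketch removes tied deaths and censorings from the risk set after the death-time update, the usual convention that censoring at t happens just after any deaths at t.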

Except for P-values from univariate survival analyses, which were adjusted to account for the groupings used to separate patients into categories with different prognoses (see the footnote to Supplementary Table 2 for details), all P-values are two-tailed and reported without adjustment for multiple comparisons.

RESULTS

Patient characteristics

Between October 2004 and March 2011, the study enrolled 245 patients. Fifty-two patients were excluded from this analysis: 21 for missing data; 20 pediatric patients (age 1–17), because younger patients could not reliably complete certain patient surveys and assessments; 2 who voluntarily withdrew before completing the study; 6 who did not fulfill criteria for cGVHD; and 3 who were unable to comply with study requirements.

A total of 193 patients remained for the statistical analysis (Table 1). Patient ages ranged from 18 to 70 years, with a median of 48 years. Myeloablative conditioning regimens (n = 108; 56%) were more common than non-myeloablative regimens (n = 84; 44%). Most patients received mobilized peripheral blood stem cells (n = 155; 80%), and 128 patients (66%) had related donors.

Table 1.

Patient demographics

N (% or range)
All 193
Median age, years 48 (18–70)
Sex
 Male 98 (51)
 Female 95 (49)
Disease
 ALL, AML, MDS 82 (42)
 CML, IMF, MPD 31 (16)
 CLL 13 (7)
 HL, NHL 44 (23)
 MM 12 (6)
 AA, PNH 6 (3)
 Other 5 (3)
Conditioning regimen
 Myeloablative 108 (56)
 Non-myeloablative 84 (44)
Stem cell source
 BM 36 (19)
 PBSC 155 (80)
 Cord 2 (1)
Relationship
 Related 128 (66)
 Unrelated 65 (34)
HLA matched
 Yes 160 (83)
 No 28 (15)
 Unknown 5 (2)

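Abbreviations: AA = aplastic anemia; ALL = acute lymphoblastic leukemia; AML = acute myeloid leukemia; BM = bone marrow; CLL = chronic lymphocytic leukemia; CML = chronic myeloid leukemia; HL = Hodgkin’s lymphoma; HLA = human leukocyte antigen; IMF = idiopathic myelofibrosis; MDS = myelodysplastic syndrome; MM = multiple myeloma; MPD = myeloproliferative disorder; NHL = non-Hodgkin’s lymphoma; PBSC = peripheral blood stem cells; PNH = paroxysmal nocturnal hemoglobinuria.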

Progressive-onset cGVHD was the most common presentation (n = 77; 40%), followed by de novo (n = 65; 34%) and quiescent (n = 50; 26%) onset (Table 2). The median time from transplant to cGVHD diagnosis was 6.9 months, and the median time from cGVHD diagnosis to study enrollment was 22.8 months. Two-thirds of patients had severe cGVHD by NIH global score (n = 130; 68%), and patients had received a median of four prior cGVHD treatments.

Table 2.

cGVHD characteristics

N (% or range)
All patients 193
cGVHD onset^a
 Progressive 77 (40)
 Quiescent 50 (26)
 De novo 65 (34)
 Median days from transplant to cGVHD Dx 211 (51–2482)
 Median days from cGVHD Dx to consent 696 (0–6670)
cGVHD organ involvement^b
 Skin 152 (79)
 Joints and fascia 122 (63)
 Eyes 161 (83)
 Mouth 129 (67)
 Lungs 150 (77)
 Liver 95 (49)
 GI 86 (45)
 Genital (females only, n = 95) 45 (47)
 Number of organs affected by cGVHD Median 5 (range 1–8)
 1–2 affected organs 11 (6)
 3–4 affected organs 82 (42)
 5–6 affected organs 77 (40)
 7–8 affected organs 24 (12)
NIH global score^c
 Mild 5 (3)
 Moderate 58 (30)
 Severe 130 (68)
Prior cGVHD systemic treatment regimens
 <2 21 (11)
 2–3 71 (37)
 4–5 59 (31)
 >5 38 (20)
Intensity of immunosuppression^d
 None/mild 47 (24)
 Moderate 72 (37)
 Severe 73 (38)
Activity by therapeutic intent^e
 Active 78 (40)
 Non-active 77 (40)
 Other 38 (20)

Abbreviations: Dx = diagnosis; GI = gastrointestinal; NIH = National Institutes of Health.

a

Definition for cGVHD onset: progressive (acute GVHD progressed directly to cGVHD); quiescent (acute GVHD resolved, then chronic GVHD developed); de novo (acute GVHD never developed).

b

cGVHD organ involvement per NIH Organ Score.9

c

Definition of NIH global score: mild (1–2 organs other than the lungs affected by cGVHD, each with a score of 1); moderate (more than 2 organs with a score of 1, any score of 2, or a lung score of 1); severe (any score of 3 or a lung score of 2).9

d

Definition of intensity of immunosuppression: mild (single-agent prednisone <0.5 mg/kg per day); moderate (single-agent prednisone ≥0.5 mg/kg per day and/or any single agent/modality); high (two or more agents/modalities ± prednisone ≥0.5 mg/kg per day).21

e

Definition of activity by therapeutic intent is as follows: active (increase systemic therapy because cGVHD is worse, substitute systemic therapy because of lack of response or withdraw systemic therapy because of lack of response); non-active (decrease systemic therapy because cGVHD is better, not change current systemic therapy because cGVHD is stable or alter systemic therapy owing to its toxicity); other (either did not receive any immunosuppressive therapy or did not meet any of the criteria).10
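The NIH global score definition in footnote c is a small decision rule, and writing it as code makes the precedence explicit: severe criteria are checked first, then moderate, then mild. The Python sketch below is illustrative, not the authors' scoring software. The organ keys are hypothetical labels for the eight organs listed in the Methods, and excluding the lungs from "mild" follows from a lung score of 1 already qualifying as moderate.

```python
# Hypothetical keys for the eight organs scored in this study.
ORGANS = ("skin", "mouth", "eyes", "gi", "liver", "lungs", "joints_fascia", "genital")


def nih_global_severity(scores):
    """Classify NIH global severity from organ scores (0-3), following footnote c.

    `scores` maps organ key to NIH organ score; organs not listed count as 0.
    """
    unknown = set(scores) - set(ORGANS)
    if unknown:
        raise ValueError(f"unknown organ keys: {sorted(unknown)}")
    values = [scores.get(organ, 0) for organ in ORGANS]
    lung = scores.get("lungs", 0)
    involved = sum(1 for s in values if s >= 1)
    # Severe: any organ scored 3, or a lung score of 2 (or higher).
    if 3 in values or lung >= 2:
        return "severe"
    # Moderate: any organ scored 2, a lung score of 1, or more than 2 organs involved.
    if 2 in values or lung == 1 or involved > 2:
        return "moderate"
    # Mild: 1-2 non-lung organs, each with a score of 1.
    if involved >= 1:
        return "mild"
    return "none"


print(nih_global_severity({"skin": 1, "eyes": 1}))              # mild
print(nih_global_severity({"skin": 1, "eyes": 1, "mouth": 1}))  # moderate
print(nih_global_severity({"lungs": 2}))                        # severe
```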

Distribution of clinician- (‘Form A’) and patient- (‘Form B’) reported measures

Results of clinician assessments are shown in Table 3. Percentages were calculated from the total number of patients included in the analysis (n = 193). Most patients affected by skin erythema, moveable sclerosis, mouth cGVHD and GI abnormalities had scores clustered at the low end of the scale. Of the organ-specific scores, LFS values were the most evenly distributed across the subgroups listed (LFS 2, 3–5, 6–9 and 10–12). For the patient-reported measures (Form B; Table 4), most scores were also clustered at the lower end of the scales (1–3 range), with the exception of the chief eye symptom scores.

Table 3.

Distribution of clinician-reported measures in patients with moderate to severe cGVHD

N (%)
Erythema (BSA%)
 0   79 (41)
 1–10   78 (40)
 11–24   17 (9)
 25–49   13 (7)
 ⩾ 50     5 (3)
Moveable sclerosis (BSA%)
 0    86 (45)
 1–10    62 (32)
 11–24    17 (9)
 25–49    17 (9)
 ⩾ 50  9 (5)
Non-moveable sclerosis (BSA%)
 0 103 (53)
 1–10    31 (16)
 11–24    17 (9)
 25–49    29 (15)
 ⩾ 50    12 (6)
Schirmer w/o anesthesia (mm)
 ⩽ 5 129 (67)
 6–10    36 (19)
 ⩾ 11    23 (12)
Modified OMRS (0–15)
 0    48 (25)
 1–2    83 (43)
 3–10    31 (16)
 ⩾ 11  1 (0.5)
WBC (×10³/μl)
 ⩽ 3.98    20 (10)
 Normal: 3.99–10.04 133 (69)
 ⩾ 10.05    40 (21)
Bilirubin (mg/dl)
 ⩽ 1.0 182 (94)
 >1.0    10 (5)
ALT (U/L)
 Normal: ⩽ 33    86 (45)
 34–68    71 (37)
 69–170    25 (13)
 ⩾ 171    10 (5)
LFS
 2    49 (25)
 3–5    70 (36)
 6–9    56 (29)
 10–12    15 (8)
Upper GI Score
 0 146 (76)
 1    34 (18)
 2    10 (5)
 3  3 (2)
Esophageal Score
 0 137 (71)
 1    37 (19)
 2    12 (6)
 3  7 (4)
Lower GI Score
 0 162 (84)
 1    22 (11)
 2  8 (4)
 3  1 (0.5)
Walk distance in 2 min (feet)
 ⩽ 500    52 (27)
 501–650    85 (44)
 ⩾ 651    41 (21)
Grip strength (lbs)
 ⩽ 50    88 (46)
 51–100    87 (45)
 101–150    10 (5)
KPS (%)
 30–60    21 (11)
 70–80 101 (52)
 90–100    67 (35)
4-point HCP global score
 0  1 (0.5)
 1    38 (20)
 2    63 (33)
 3    67 (35)
11-point clinician-assessed global symptom severity score
  0  2 (1)
 1–3    38 (20)
 4–6    51 (26)
 7–8    76 (39)
 9–10    10 (5)
Clinician change^a
 − 3  1 (1)
 − 2  7 (4)
 − 1    53 (27)
 0    74 (38)
 +1    30 (16)
 +2  8 (4)
 +3  3 (2)

Abbreviations: ALT = alanine aminotransferase; BSA = body surface area; GI = gastrointestinal; HCP = health care provider; KPS = Karnofsky performance status; LFS = lung function score; OMRS = oral mucosa rating scale; WBC = white blood cell count; w/o = without.

a

Clinician change in last month: −3, very much worse; −2, moderately worse; −1, a little worse; 0, about the same; +1, a little better; +2, moderately better; +3, very much better.

Table 4.

Distribution of patient-reported measures in patients with moderate to severe cGVHD

N (%)
Skin itching
 0 55 (28)
 1–3 50 (26)
 4–6 30 (16)
 7–8 10 (5)
 9–10   7 (4)
Mouth dryness
 0 43 (22)
 1–3 57 (30)
 4–6 18 (9)
 7–8 21 (11)
 9–10 12 (6)
Mouth pain
 0 79 (41)
 1–3 37 (19)
 4–6 18 (9)
 7–8   6 (3)
 9–10 10 (5)
Mouth sensitivity
 0 56 (29)
 1–3 42 (22)
 4–6 30 (16)
 7–8   9 (5)
 9–10 13 (7)
Chief Eye Symptom
 0 13 (7)
 1–3 36 (19)
 4–6 44 (23)
 7–8 34 (18)
 9–10 17 (9)
4-point patient global score
 0   0 (0)
 1 42 (22)
 2 75 (39)
 3 31 (16)
11-point patient-reported global symptom severity score
 0   4 (2)
 1–3 31 (16)
 4–6 57 (30)
 7–8 46 (22)
 9–10 12 (6)
Patient change^a
 − 3   4 (2)
 − 2 11 (6)
 − 1 28 (15)
 0 63 (33)
 +1 19 (10)
 +2 16 (8)
 +3   8 (4)
a

Patient change in last month: −3, very much worse; −2, moderately worse; −1, a little worse; 0, about the same; +1, a little better; +2, moderately better; +3, very much better.

The 4-point HCP global scores (Table 3) and the 4-point patient global scores (Table 4) were distributed relatively evenly across the mild (score 1), moderate (score 2) and severe (score 3) categories. More patients had an HCP global score of 3 (n = 67; 35%) than a patient-reported global score of 3 (n = 31; 16%), which may suggest that patients rate their overall cGVHD severity lower than clinicians do. The same trend was observed for the 11-point global symptom severity scores: more patients had severe clinician-reported scores of 7–10 (n = 86; 45%) than severe patient-reported scores of 7–10 (n = 58; 30%).
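The percentages quoted above can be checked directly against the table counts. Every percentage in Tables 3 and 4 uses the full cohort (n = 193) as its denominator, so the rows for a given measure need not sum to 100% when some values are missing (the 4-point patient global score rows, for example, account for 148 patients). A short Python check, with illustrative variable names:

```python
COHORT_N = 193  # denominator used for every percentage in Tables 3 and 4


def pct(count, total=COHORT_N):
    """Percentage of the full cohort, rounded to a whole number as in the tables."""
    return round(100 * count / total)


hcp_global_3 = 67         # 4-point HCP global score of 3 (Table 3)
patient_global_3 = 31     # 4-point patient global score of 3 (Table 4)
clinician_7_10 = 76 + 10  # 11-point clinician-reported score 7-8 plus 9-10 (Table 3)
patient_7_10 = 46 + 12    # 11-point patient-reported score 7-8 plus 9-10 (Table 4)
patient_global_total = sum([0, 42, 75, 31])  # patients with a 4-point patient global score

print(pct(hcp_global_3), pct(patient_global_3))  # 35 16
print(clinician_7_10, pct(clinician_7_10))        # 86 45
print(patient_7_10, pct(patient_7_10))            # 58 30
print(patient_global_total)                       # 148
```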

Association of clinician- and patient-reported measures with clinical outcomes and survival

Univariate analysis examined the association of clinician-reported measures with the concurrent outcomes of interest (Figure 1). The clinician measures associated with the largest number of outcomes were LFS (three outcomes), walk distance in 2 min (three outcomes), grip strength (three outcomes), the 4-point HCP global score (four outcomes), the 11-point clinician-reported cGVHD global symptom severity score (three outcomes) and KPS (four outcomes). BSA% erythema, the Schirmer test, modified OMRS and alanine aminotransferase were not associated with any of the concurrent outcomes included in this analysis.

Figure 1.

Clinician-reported measures and their association with four concurrent outcomes of interest reflective of cGVHD disease burden. Univariate comparison of clinician measures with the four outcomes. For the dichotomous outcomes, P-values are presented as follows: P < 0.001 indicates a highly significant association (solid blocks) and 0.001 ≤ P < 0.05 a trend toward significance (faded blocks). For the continuous outcomes, correlation coefficients are presented as follows: r > 0.5 indicates a strong to moderately strong correlation (solid blocks) and r = 0.3–0.5 a weak to moderately strong correlation (faded blocks). White blocks represent parameters with a weak or absent association with the clinical outcome of interest. ALT = alanine aminotransferase; NIH = National Institutes of Health; PCS = physical component scale; SF36 = short-form 36.
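The shading rule in the figure legends can be stated as a small function. This Python sketch is illustrative only: the function name is hypothetical, and applying the r thresholds to |r| and placing r = 0.5 exactly in the faded band (following "r = 0.3 to 0.5") are our reading of the legend rather than a documented rule.

```python
def figure_block(outcome_type, p_value=None, r=None):
    """Return the shading ("solid", "faded" or "white") for one cell of Figures 1 and 2."""
    if outcome_type == "dichotomous":  # NIH global severity, activity by therapeutic intent
        if p_value < 0.001:
            return "solid"
        return "faded" if p_value < 0.05 else "white"
    if outcome_type == "continuous":  # Lee total symptom score, SF36 PCS
        magnitude = abs(r)  # assumption: the thresholds apply to |r|
        if magnitude > 0.5:
            return "solid"
        return "faded" if magnitude >= 0.3 else "white"
    raise ValueError("outcome_type must be 'dichotomous' or 'continuous'")


print(figure_block("dichotomous", p_value=0.0004))  # solid
print(figure_block("continuous", r=-0.42))          # faded
```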

In the univariate analysis of the patient-reported measures (Figure 2), the 4-point patient global score and the 11-point patient-reported global symptom severity score were associated with the most outcomes of interest (four each). The patient-reported scores for skin pruritus, mouth dryness, mouth pain and mouth sensitivity were associated only with the Lee total symptom score; this was expected because the Lee symptom questionnaire grades the severity of similar symptoms. Results of the multivariable analysis of the joint association of clinician- and patient-reported measures with concurrent outcomes are shown in Supplementary Table 1.

Figure 2.

Patient-reported measures and their association with four concurrent outcomes of interest reflective of cGVHD disease burden. Univariate comparison of patient-reported measures with the four outcomes. For the dichotomous outcomes, P-values are presented as follows: P < 0.001 indicates a highly significant association (solid blocks) and 0.001 ≤ P < 0.05 a trend toward significance (faded blocks). For the continuous outcomes, correlation coefficients are presented as follows: r > 0.5 indicates a strong to moderately strong correlation (solid blocks) and r = 0.3–0.5 a weak to moderately strong correlation (faded blocks). NIH = National Institutes of Health; PCS = physical component scale; SF36 = short-form 36.

In univariate analysis (Supplementary Table 2), survival was associated with the clinician-reported parameters BSA% erythema (P = 0.0018), LFS (P = 0.014) and KPS (P < 0.0001), and with one patient-reported parameter, mouth pain (P = 0.03). In the final Cox model (Table 5), BSA% erythema (P = 0.053), LFS (P = 0.0017), KPS (P = 0.013), time from cGVHD diagnosis to enrollment (P = 0.0060) and eosinophil count (P = 0.0018) were jointly associated with survival. We also explored whether previous types of immunosuppressive treatment were associated with survival; adding this information to the reported Cox model did not improve its predictive ability (data not shown).

Table 5.

Cox multivariable model of factors jointly predicting survival.^a

P-value Hazard ratio 95% CI
Erythema BSA% 0.053 1.995 0.992–4.013
LFS 0.0017 3.436 1.591–7.421
KPS 0.013 0.413 0.206–0.828
Time from cGVHD Dx 0.0060 0.186 0.056–0.617
Eosinophil count 0.0018 3.960 1.672–9.381

Abbreviations: BSA% = body surface area %; CI = confidence interval; Dx = diagnosis; KPS = Karnofsky performance score; LFS = lung function score. In addition to the ‘Form A’ and ‘Form B’ measures (P-values in Supplementary Table 2), other variables considered for inclusion in the final Cox model were walk distance in 2 min (≤438 vs >438 feet: P = 0.025 unadjusted, P = 0.074 adjusted); time from transplant to enrollment (≤60.5 vs >60.5 months: P = 0.021 unadjusted, P = 0.063 adjusted); platelets (≤100 vs >100: P = 0.56); cGVHD onset (progressive vs quiescent + de novo: P = 0.48); age (≤40 vs >40: P = 0.40); and NIH skin score (0–2 vs 3: P = 0.0173).

a

Categories used for comparison: erythema BSA% (0–3% vs >3%: P = 0.0006 unadjusted, P = 0.0018 adjusted); LFS (2–7 vs 8–12: P = 0.0047 unadjusted, P = 0.014 adjusted); KPS (30–70 vs 80–100: P < 0.0001 unadjusted, P < 0.0001 adjusted); time from cGVHD Dx (diagnosis) to enrollment (≤49 vs >49 months: P = 0.021 unadjusted, P = 0.063 adjusted); eosinophils (≤0.5 vs >0.5: P = 0.0035).
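Table 5 reports each hazard ratio with its 95% CI and P-value, which allows a quick internal-consistency check. Assuming the intervals are Wald intervals on the log-hazard scale (the usual default for Cox models, although the method is not stated here), log(HR) should sit at the midpoint of the log CI limits, the standard error is the log CI width divided by 2 × 1.96, and the two-sided P-value follows from z = log(HR)/SE. The Python sketch below (function name hypothetical) applies this to the Table 5 rows. The check is sensitive to transcription slips: a lower limit of 0.0056 for the time-from-diagnosis row would put log(HR) far from the log midpoint, whereas 0.056 reproduces the reported P = 0.0060.

```python
from math import erfc, log, sqrt

Z_975 = 1.959964  # two-sided 95% standard normal quantile


def wald_from_ci(hr, lower, upper):
    """Back-calculate log-scale SE, two-sided Wald P and CI asymmetry from an HR and 95% CI."""
    log_hr = log(hr)
    se = (log(upper) - log(lower)) / (2 * Z_975)
    z = log_hr / se
    p = erfc(abs(z) / sqrt(2))  # two-sided normal P-value
    # A Wald interval is symmetric on the log scale, so log(HR) should sit at its midpoint.
    asymmetry = log_hr - (log(lower) + log(upper)) / 2
    return se, p, asymmetry


table5 = {
    "Erythema BSA%": (1.995, 0.992, 4.013),
    "LFS": (3.436, 1.591, 7.421),
    "KPS": (0.413, 0.206, 0.828),
    "Time from cGVHD Dx": (0.186, 0.056, 0.617),
    "Eosinophil count": (3.960, 1.672, 9.381),
}
for name, (hr, lower, upper) in table5.items():
    se, p, asym = wald_from_ci(hr, lower, upper)
    print(f"{name}: SE={se:.3f} P={p:.4f} asymmetry={asym:+.3f}")
```

A near-zero asymmetry together with a back-calculated P close to the printed one suggests that the row was transcribed consistently.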

DISCUSSION

This cross-sectional prospective study provides a comprehensive view of the distributions of the 2005 NIH cGVHD response criteria measures, and of their associations with cGVHD severity, in a cohort of patients severely affected by cGVHD. First, this study provides detailed frequencies of the clinician-reported (‘Form A’) and patient-reported (‘Form B’) cGVHD-specific and ancillary NIH response measures. Several measures, including skin, oral mucosa, WBC, liver function tests and GI symptoms, show marked clustering of values in the lower ranges of the scales or are normal, suggesting that these particular scales may not be sufficiently informative in many patients with cGVHD. More stringent definitions of response in the lower ranges of these scales (e.g., no possible partial response (PR)) or designation of a sentinel organ (the one causing the most prominent manifestations or symptoms) may also be needed. In contrast, the clinician- and patient-reported 4-point global scores and 11-point global symptom severity scores provided the best distribution of values across the full range of the scales and are therefore likely best at capturing the spectrum of cGVHD manifestations in an individual patient. Developing algorithms that combine organ-specific and global assessment responses into an overall therapy response, as has been done successfully in other systemic inflammatory diseases,14,15 may help in the development of highly relevant cGVHD response measurement tools.

Second, most NIH-proposed measures of response showed statistically significant associations with important clinical outcomes in the univariate analyses. Among clinician-reported measures, the LFS, grip strength, 2-min walk distance, 4-point HCP global score, 11-point clinician-reported global symptom severity score and KPS were associated with the most outcomes of interest. Of the patient-reported measures, the 4-point patient global score and the 11-point patient-reported global symptom severity score were most frequently associated with the four concurrent outcomes. These data indicate that the Form A and Form B NIH-proposed measures of response address clinically relevant manifestations of cGVHD severity, underscore the significance of the clinician- and patient-reported 4-point and 11-point global scales, and support the use and further development of these practical scales as part of cGVHD response criteria tools. A 0–10 scale may be more suitable than a 4-point scale as a response assessment tool, as previous research has shown that 0–10 scales have superior validity, reliability and discriminating power, and are preferred by respondents.16 Of interest, some measures, such as the Schirmer test, modified OMRS and alanine aminotransferase, had no associations with the concurrent outcomes of interest, suggesting that they contribute little to assessing cGVHD severity, therapeutic intent, self-assessed health and cGVHD symptom burden in patients with established moderate or severe cGVHD. However, these measures may have utility in studies of organ-directed therapies or in newly diagnosed patients. Other groups have questioned the utility of the Schirmer test as a responsive measure of cGVHD severity: Inamoto et al.17 reported that Schirmer test results do not correlate significantly with longitudinal changes in clinician- and patient-reported symptoms. The results of the current study strengthen the conclusion that the Schirmer test is likely not a valid response measure for ocular cGVHD, although it may have an important role in ocular cGVHD diagnosis and staging.

Third, in addition to previously reported predictors of survival such as KPS, LFS and eosinophilia,8 this study found that skin erythema was associated with survival jointly with these factors in the multivariable Cox model. Studies in prior eras indicated that the presence of lichenoid skin changes and a higher skin BSA% were associated with poor survival in cGVHD.18 A recent study by Jacobsohn et al.19 in a population significantly affected by erythema demonstrated that worsening of NIH skin scores was associated with poor survival. That finding, together with the association between erythema and survival observed here, supports further investigation of skin erythema as a potentially valid and sensitive measure of response in cGVHD trials. One caveat is that topical therapies can effectively manage this manifestation of cGVHD and may reduce erythema without affecting long-term outcomes. Clinical trial designs should account for this potential confounder by carefully recording, or avoiding altogether, the initiation of new topical therapies.

One important limitation of this study is its cross-sectional design; the findings may not translate to a clinical intervention trial setting, where these measures would be used to assess change and potential clinical benefit. However, this approach allowed concurrent assessment of the frequencies and distributions of all NIH-proposed measures in a large number of patients severely affected by cGVHD, who represent the majority of clinical trial candidates. This information is valuable because there are few data on the potential utility of these measures in the cGVHD patient population. Another limitation is that pediatric patients were excluded because many could not complete self-evaluations or other testing such as the Schirmer test or pulmonary function tests. This underscores the need to develop new assessment tools and age-appropriate strategies for evaluating responses in children.20 Lastly, because the referral pattern favored severely affected patients, our observations may not generalize to less severely affected or newly diagnosed cGVHD populations. However, the data presented here were generated from a cohort with a very high cGVHD disease burden, which represents a patient population that is challenging to treat. Our results can be generalized to patients with moderate or severe cGVHD, a group for whom a clinical trial represents a reasonable treatment option.

In conclusion, this study defines the distribution and significance of clinician-assessed (‘Form A’) and patient-reported (‘Form B’) NIH response measures in a large cohort of severely affected cGVHD patients. The clinician- and patient-reported 4-point global scores and 11-point global symptom severity scores are most closely associated with clinically important outcomes such as cGVHD severity, disease activity, symptom burden and self-assessed physical health. Skin erythema is the most easily identifiable and potentially reversible measure of cGVHD activity that was found to be associated with survival. Future clinical studies should prioritize further development of these response measures to establish standardized surrogate end points for cGVHD therapy intervention studies.

Supplementary Material

Supplemental figures
Supplemental tables

ACKNOWLEDGEMENTS

We thank all patients and their families who participated in the natural history of cGVHD protocol. This research was supported by the Intramural Research Program of the National Institutes of Health (NIH), National Cancer Institute, Center for Cancer Research. This research was also supported by the NIH Clinical Center, Nursing and Patient Care Services. The authors are employees of the United States Government, and this work was carried out in that capacity. The views expressed do not necessarily represent the views of the National Institutes of Health or the United States Government.

Footnotes

CONFLICT OF INTEREST

The authors declare no conflict of interest.

Supplementary Information accompanies this paper on Bone Marrow Transplantation website (http://www.nature.com/bmt)

REFERENCES

1. Lee SJ, Vogelsang G, Flowers ME. Chronic graft-versus-host disease. Biol Blood Marrow Transplant 2003; 9: 215–233.
2. Goldman JM, Majhail NS, Klein JP, Wang Z, Sobocinski KA, Arora M et al. Relapse and late mortality in 5-year survivors of myeloablative allogeneic hematopoietic cell transplantation for chronic myeloid leukemia in first chronic phase. J Clin Oncol 2010; 28: 1888–1895.
3. Lee SJ. Have we made progress in the management of chronic graft-vs-host disease? Best Pract Res Clin Haematol 2010; 23: 529–535.
4. Flowers ME, Storer B, Carpenter P, Rezvani AR, Vigorito AC, Campregher PV et al. Treatment change as a predictor of outcome among patients with classic chronic graft-versus-host disease. Biol Blood Marrow Transplant 2008; 14: 1380–1384.
5. Wolff D, Schleuning M, von Harsdorf S, Bacher U, Gerbitz A, Stadler M et al. Consensus conference on clinical practice in chronic GVHD: second-line treatment of chronic graft-versus-host disease. Biol Blood Marrow Transplant 2011; 17: 1–17.
6. Pavletic SZ, Martin P, Lee SJ, Mitchell S, Jacobsohn D, Cowen EW et al. Measuring therapeutic response in chronic graft-versus-host disease: National Institutes of Health Consensus Development Project on Criteria for Clinical Trials in Chronic Graft-versus-Host Disease: IV. Response Criteria Working Group report. Biol Blood Marrow Transplant 2006; 12: 252–266.
7. Olivieri A, Cimminiello M, Corradini P, Mordini N, Fedele R, Selleri C et al. Long-term outcome and prospective validation of NIH response criteria in 39 patients receiving imatinib for steroid-refractory chronic GVHD. Blood 2013; 122: 4111–4118.
8. Baird K, Steinberg SM, Grkovic L, Pulanic D, Cowen EW, Mitchell SA et al. National Institutes of Health chronic graft-versus-host disease staging in severely affected patients: organ and global scoring correlate with established indicators of disease severity and prognosis. Biol Blood Marrow Transplant 2013; 19: 632–639.
9. Filipovich AH, Weisdorf D, Pavletic S, Socie G, Wingard JR, Lee SJ et al. National Institutes of Health consensus development project on criteria for clinical trials in chronic graft-versus-host disease: I. Diagnosis and staging working group report. Biol Blood Marrow Transplant 2005; 11: 945–956.
10. Grkovic L, Baird K, Steinberg SM, Williams KM, Pulanic D, Cowen EW et al. Clinical laboratory markers of inflammation as determinants of chronic graft-versus-host disease activity and NIH global severity. Leukemia 2012; 26: 633–643.
11. Lee S, Cook EF, Soiffer R, Antin JH. Development and validation of a scale to measure symptoms of chronic graft-versus-host disease. Biol Blood Marrow Transplant 2002; 8: 444–452.
12. Hollander M, Wolfe DA. Nonparametric statistical methods, 2nd edn. Wiley: New York, 1999.
13. Agresti A. Categorical data analysis, 1st edn. Wiley: New York, 1990.
14. Felson DT, Anderson JJ, Boers M, Bombardier C, Furst D, Goldsmith C et al. American College of Rheumatology. Preliminary definition of improvement in rheumatoid arthritis. Arthritis Rheum 1995; 38: 727–735.
15. Best WR, Becktel JM, Singleton JW, Kern F Jr. Development of a Crohn's disease activity index. National Cooperative Crohn's Disease Study. Gastroenterology 1976; 70: 439–444.
16. Preston CC, Colman AM. Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychol (Amst) 2000; 104: 1–15.
17. Inamoto Y, Chai X, Kurland BF, Cutler C, Flowers ME, Palmer JM et al. Validation of measurement scales in ocular graft-versus-host disease. Ophthalmology 2012; 119: 487–493.
18. Akpek G, Zahurak ML, Piantadosi S, Margolis J, Doherty J, Davidson R et al. Development of a prognostic model for grading chronic graft-versus-host disease. Blood 2001; 97: 1219–1226.
19. Jacobsohn DA, Kurland BF, Pidala J, Inamoto Y, Chai X, Palmer JM et al. Correlation between NIH composite skin score, patient-reported skin score, and outcome: results from the Chronic GVHD Consortium. Blood 2012; 120: 2545–2552; quiz 2774.
20. Wiener L, Baird K, Crum C, Powers K, Carpenter P, Baker KS et al. Child and parent perspectives of the chronic graft-versus-host disease (cGVHD) symptom experience: a concept elicitation study. Support Care Cancer 2014; 22: 295–305.
21. Mitchell SA, Leidy NK, Mooney KH, Dudley WN, Beck SL, LaStayo PC et al. Determinants of functional performance in long-term survivors of allogeneic hematopoietic stem cell transplantation with chronic graft-versus-host disease (cGVHD). Bone Marrow Transplant 2010; 45: 762–769.
