A Comparative Evaluation of the Measurement Properties of Three Histological Indices of Mucosal Healing in Ulcerative Colitis: Geboes Score, Robarts Histopathology Index and Nancy Index

Laurent Peyrin-Biroulet; Ethan Arenson; David T Rubin; Corey A Siegel; Scott Lee; F Stephen Laroux; Wen Zhou; Tricia Finney-Hayward; Yuri Sanchez Gonzalez; Alan L Shields

doi:10.1093/ecco-jcc/jjad087

2023 May 24;17(11):1733–1743.

PMCID: PMC10673803 PMID: 37225135

Abstract

Background and Aims

The measurement properties of three histological indices, the Geboes Score [GS], Robarts Histopathology Index [RHI] and Nancy Index [NI], were evaluated among patients with ulcerative colitis to inform the future use of these indices in regulated clinical trials evaluating treatment efficacy hypotheses.

Methods

Analyses were conducted on data from a Phase 3 clinical trial of adalimumab [M14-033, n = 491] and focused on evaluating the measurement properties of the GS, RHI and NI. Specifically, internal consistency and inter-rater reliability, convergent, discriminant and known-group validity, and sensitivity to change were assessed at Baseline, and at Weeks 8 and 52.

Results

Internal consistency for the RHI was lower at Baseline [α = 0.62] than at Weeks 8 [α = 0.82] and 52 [α = 0.81]. The inter-rater reliability values of the RHI [0.91], NI [0.64] and GS [0.53] were excellent, good and fair, respectively. Regarding validity, at Week 52 the RHI and GS showed moderate to strong correlations with the full and partial Mayo scores and the Mayo subscores, whereas the NI showed weak to moderate correlations. Mean scores of all three histological indices differed significantly across known groups defined by Mayo endoscopy subscores and full Mayo scores at Weeks 8 and 52 [p < 0.001].

Conclusions

The GS, RHI and NI are each capable of producing reliable and valid scores that are sensitive to changes in disease activity over time, in patients with moderately to severely active ulcerative colitis. While all three indices demonstrated relatively acceptable measurement properties, the GS and RHI performed better than the NI.

Keywords: Biomarkers, clinical trials, endoscopy

1. Introduction

Important treatment goals in ulcerative colitis [UC] are to induce remission and prevent long-term complications.^1–3 Research in UC has looked beyond symptom-based measures to mucosal healing because it is associated with better clinical outcomes.^1,2,4 For example, mucosal healing has been associated with improved outcomes at 1 year of follow-up, such as higher rates of clinical remission, corticosteroid-free clinical remission, sustained mucosal healing and avoidance of a colectomy.⁵ Mucosal healing has commonly been defined by a Mayo endoscopy subscore of either 0 or ≤1.^2,3

Assessing the mucosa by endoscopy provides information on its visual appearance, whereas the more detailed findings afforded by microscopic examination of biopsy specimens provide additional information that reflects other components of disease status in patients with UC.⁶ In recent research, mucosal healing has been defined by combined endoscopic and histological assessment, but the outcome definitions used in UC are heterogeneous.^7,8 Consistent with this, current US regulatory standards indicate that the concept of mucosal healing requires clear evidence of improvement in both endoscopic and histological assessment of the mucosa.⁹ Furthermore, the treat-to-target consensus named histologically assessed mucosal healing as a consideration in UC.³ Histologically and endoscopically observed responses are therefore receiving increased attention in clinical research and practice.

For histological assessments to be incorporated into clinical research and practice, a validated histological assessment of the mucosa is needed. However, histological assessment of the mucosa in patients with UC poses a variety of challenges, ranging from the varied targets of measurement to the lack of consensus on the criteria that should guide the selection, implementation, use, rater training and scoring of an index; together, these challenges make it difficult to understand the relative performance of existing assessment tools.⁷ Moreover, because these assessments are clinician-reported outcomes completed by trained raters, their measurements [i.e. ratings] should be shown to be reliable [internally, as well as within and between raters], construct-valid and clinically interpretable.^9–11

Further research is needed to establish whether the gains derived from complete endoscopic and histological healing outweigh the patient discomfort, procedure-related risks and increased costs of frequent endoscopies and biopsies.^3,12 Additionally, because the multiplicity of indices used to score histology in UC makes it difficult to compare results between studies,¹³ it has been proposed that researchers identify a core set of outcomes for patients with UC in order to standardize the assessment of efficacy and safety across clinical trials and to link optimal thresholds for disease activity with specific validated outcomes.^3,7 The Geboes Score [GS], Robarts Histopathology Index [RHI] and Nancy Index [NI] are commonly used to assess histopathological disease activity in patients with UC, and their scores can be used to define histological inflammation and remission of the mucosa.^2,6,14 However, the literature lacks comparisons of the relative measurement performance [e.g. reliability and construct-related validity] of these indices among patients with UC, making it difficult for researchers to select the most appropriate index for clinical trials and practice.

The aim of the present study was to perform a measurement-focused evaluation of the GS, RHI and NI scores generated by raters evaluating patients in a Phase 3 trial, M14-033 [NCT02065622], that was designed to evaluate the safety and efficacy of adalimumab in patients with moderately to severely active UC. The results of these analyses can help researchers looking at UC outcomes to select defensible histological indicators of mucosal healing for use in their clinical trials as well as inform their use in clinical practice.

2. Materials and methods

2.1. Study population and design

This measurement-focused evaluation assessed the reliability, construct-related validity and sensitivity to change of the GS, RHI and NI using data from the M14-033 trial [SERENE-UC; NCT02065622]. In the M14-033 trial, patients were randomized to standard-dose [160 mg/80 mg] or high-dose [4 × 160 mg] adalimumab induction therapy, followed by maintenance dosing of 40 mg every other week or 40 mg weekly. Inclusion and exclusion criteria are provided in the Supplementary Materials. The M14-033 trial complied with the ethical standards of the Declaration of Helsinki. Informed consent was obtained from all patients who participated in the trial, and data were de-identified.

Two analysis populations [APs], each a subset of the intent-to-treat [ITT] population of the M14-033 trial, were included in the analyses. AP1 was defined as all ITT patients [n = 491] with GS and RHI scores at Screening, Week 8 and Week 52. AP2 comprised only patients with NI scores: 150 patients selected from the M14-033 trial who had biopsies at Screening, Week 8 and Week 52.

An inter-rater reliability analysis population [Inter-AP1] was also included, comprising 131 patients [of 150] whose biopsies were rated by four pathologists. These patients were selected on the basis of a biopsy from the sigmoid and/or rectum at Baseline, Week 8 and Week 52, and an endoscopy with sigmoid and rectum scores at Week 52. At each timepoint, biopsies were collected from two sites within each segment: first, the area of most severe inflammation or the edge of an ulcer, as determined by endoscopic assessment; and second, an area representative of the average degree of inflammation for the entire segment, as determined by endoscopy. Only biopsies collected from the area of most severe inflammation or the edge of an ulcer were used in this analysis. Four independent pathologists rated the biopsies collected at Screening, Week 8 and Week 52 on the GS and NI; the RHI score was derived from the GS ratings. The selected patients had a relatively uniform distribution of Mayo endoscopy subscores at Week 52.

2.2. Primary histological assessments

The aim of the present inquiry was to develop a deeper understanding of the measurement properties of the GS, RHI and NI.

2.2.1. Geboes Score

The GS assesses features relevant to histological inflammation in UC in order to distinguish among quiescent disease [inactive disease; grade 1], mildly active disease [defined by the presence of polymorphonuclear cells or neutrophils; grades 2 and 3] and moderately to severely active disease [defined by epithelial cell damage; grades 4 and 5].¹⁵ A common critique of the GS is its complexity and difficulty of scoring, which can limit its utility in both clinical research and practice.^15,16 The assessment allows raters to characterize disease severity across six grades [0, 1, 2A/2B, 3, 4 and 5], with each grade further divided into four or five subcategories [e.g. 0.0, 0.1, 0.2 and 0.3; 5.0, 5.1, 5.2, 5.3 and 5.4]. This yields 29 ordered categorical scores between 0 and 5.4, with higher scores denoting greater disease activity [Supplementary Table 1]. In the present analysis, the GS reflects the highest non-zero grade from either the rectum or sigmoid tissue sample at the associated visit and therefore ranges from 0 to 5. Although the GS is technically not a continuous measure, treating it as one is defensible, given that it is derived from a set of 29 ordered categories and that ordinal measures with more than seven categories are commonly treated as continuous.¹⁷
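The reduction of the 29-category GS to a single 0–5 grade per visit can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring program used in M14-033: the dictionary encoding of a biopsy, the function names and the decision to count grades 2A and 2B both as grade 2 are hypothetical choices made for the example.

```python
from typing import Dict, Optional

# Hypothetical encoding: one biopsy sample is a dict mapping a GS grade label to
# its subcategory rating, e.g. {"3": 2} stands for subcategory 3.2.
GRADE_ORDER = ["0", "1", "2A", "2B", "3", "4", "5"]
# Assumption: grades 2A and 2B both count as grade 2 on the 0-5 summary scale.
GRADE_VALUE = {"0": 0, "1": 1, "2A": 2, "2B": 2, "3": 3, "4": 4, "5": 5}


def sample_grade(subcategories: Dict[str, int]) -> int:
    """Highest grade with a non-zero subcategory rating; 0 if every rating is zero."""
    highest = 0
    for label in GRADE_ORDER:
        if subcategories.get(label, 0) > 0:
            highest = max(highest, GRADE_VALUE[label])
    return highest


def visit_gs(rectum: Optional[Dict[str, int]],
             sigmoid: Optional[Dict[str, int]]) -> Optional[int]:
    """Visit-level GS: the higher of the rectum and sigmoid sample grades."""
    grades = [sample_grade(s) for s in (rectum, sigmoid) if s is not None]
    return max(grades) if grades else None


# Rectum reaches grade 3 (subcategory 3.2); sigmoid reaches grade 2B only.
print(visit_gs({"0": 1, "1": 2, "2B": 1, "3": 2}, {"1": 3, "2B": 2}))  # 3
```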

To justify the GS approach used in the present analysis, and to provide confidence that the results generalize to grade-and-subcategory scoring, the two scoring approaches were compared in two steps: first, a chi-squared test of the null hypothesis that there is no association between them; and second, a polychoric correlation to characterize the direction and strength of their relationship. The chi-squared test [χ² = 5221.7; df = 90; p < 0.001] confirmed an association between the scores, and the polychoric correlation coefficient indicated that the relationship was positive and very strong, near perfect (r = 0.98; 95% confidence interval [CI] 0.97–0.98).
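For readers who want to reproduce the first step of this comparison, the Pearson chi-squared statistic for an r × c cross-tabulation of the two scoring approaches can be computed as below. This is a sketch in plain Python with a hypothetical 2 × 2 table; it returns the statistic and degrees of freedom but not a p-value, and it does not estimate the polychoric correlation, which requires fitting latent bivariate-normal thresholds. If SciPy is available, `scipy.stats.chi2_contingency` also returns a p-value, but by default it applies Yates' continuity correction when there is one degree of freedom, so for 2 × 2 tables its statistic will differ from this uncorrected version.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table.

    Assumes every row and column total is positive (drop empty categories first);
    otherwise an expected count of zero would cause a division by zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total x column total / n.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical cross-tabulation: rows = summary grade, columns = subcategory band.
print(chi_square_independence([[10, 0], [0, 10]]))  # (20.0, 1)
```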

2.2.2. Robarts Histopathology Index

The RHI assesses four characteristics of mucosal activity, namely chronic inflammatory infiltrate, lamina propria neutrophils, neutrophils in the epithelium, and erosion or ulceration, each of which is rated on a scale of 0 to 3. The characteristics are weighted [using multiplicative weights of 1, 2, 3 and 5, respectively] and summed to produce a total score ranging from 0 [no disease activity] to 33 [most severe disease activity].¹⁸
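The weighting described above amounts to a simple weighted sum, sketched below. The component names used as dictionary keys are hypothetical labels; the weights and the 0–3 rating range are taken from the text.

```python
# Multiplicative weights as described in the text; the dictionary keys are
# hypothetical names for the four RHI components, each rated 0-3.
RHI_WEIGHTS = {
    "chronic_inflammatory_infiltrate": 1,
    "lamina_propria_neutrophils": 2,
    "neutrophils_in_epithelium": 3,
    "erosion_or_ulceration": 5,
}


def rhi_score(components: dict) -> int:
    """Weighted RHI total (0-33) from four component ratings on a 0-3 scale."""
    for name in RHI_WEIGHTS:
        if components.get(name) not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be an integer rating from 0 to 3")
    return sum(weight * components[name] for name, weight in RHI_WEIGHTS.items())


example = {
    "chronic_inflammatory_infiltrate": 3,
    "lamina_propria_neutrophils": 1,
    "neutrophils_in_epithelium": 2,
    "erosion_or_ulceration": 0,
}
print(rhi_score(example))  # 1*3 + 2*1 + 3*2 + 5*0 = 11
print(rhi_score({name: 3 for name in RHI_WEIGHTS}))  # maximum: 33
```

Because erosion or ulceration carries a weight of 5, a single-step change in that component moves the total more than any other component, which is worth keeping in mind when interpreting RHI change scores.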

2.2.3. Nancy Index

The NI assesses three characteristics of mucosal activity: chronic inflammatory infiltrate, acute inflammatory infiltrate and ulceration.¹⁹ The NI scoring algorithm uses five grades [0, 1, 2, 3 and 4] for each of its three assessment targets. Grade 0 indicates the least severe disease or the absence of significant histological disease and grade 4 indicates the most severe disease. Further details on all primary histological assessments can be found in the Supplementary Materials.
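The three NI targets are usually combined through the decision tree of the original NI publication,¹⁹ in which ulceration is checked first, then the acute inflammatory infiltrate, then the chronic inflammatory infiltrate. The sketch below encodes that tree as it is commonly presented; the category labels are hypothetical, and the branch definitions should be checked against the original publication before any reuse. The usage lines also illustrate the averaging of four raters' grades described in the footnotes to Table 2, using invented ratings.

```python
ACUTE_LEVELS = ("none", "mild", "moderate_or_severe")
CHRONIC_LEVELS = ("absent_or_mild", "moderate_or_marked")


def nancy_index(ulceration: bool, acute_infiltrate: str, chronic_infiltrate: str) -> int:
    """Nancy grade (0-4) from the decision tree; verify branches against ref. 19."""
    if acute_infiltrate not in ACUTE_LEVELS:
        raise ValueError(f"unknown acute infiltrate level: {acute_infiltrate!r}")
    if chronic_infiltrate not in CHRONIC_LEVELS:
        raise ValueError(f"unknown chronic infiltrate level: {chronic_infiltrate!r}")
    if ulceration:
        return 4  # severely active disease
    if acute_infiltrate == "moderate_or_severe":
        return 3  # moderately active disease
    if acute_infiltrate == "mild":
        return 2  # mildly active disease
    if chronic_infiltrate == "moderate_or_marked":
        return 1  # chronic infiltrate without acute infiltrate
    return 0  # no significant histological disease


# Four hypothetical raters grading the same biopsy; the visit-level NI is their mean.
ratings = [
    nancy_index(False, "mild", "moderate_or_marked"),
    nancy_index(False, "mild", "absent_or_mild"),
    nancy_index(False, "moderate_or_severe", "moderate_or_marked"),
    nancy_index(False, "none", "moderate_or_marked"),
]
print(ratings, sum(ratings) / len(ratings))  # [2, 2, 3, 1] 2.0
```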

2.3. Secondary assessments

The following questionnaires and assessments were also administered in the M14-033 trial and were used to support the measurement evaluation of the histological indices: full Mayo score, partial Mayo score, Inflammatory Bowel Disease Questionnaire [IBDQ], Work Productivity and Activity Impairment [WPAI] questionnaire for UC, 36-item Short Form Survey [SF-36v2^®] scores, UC-related symptoms, C-reactive protein [CRP] levels and faecal calprotectin [F-cal] levels.^20–28 Further details on these secondary study assessments are provided in the Supplementary Materials.

2.4. Statistical analyses

The GS, RHI and NI scores were determined at Baseline, Week 8 and Week 52. Continuous variables were described by sample size [n], mean, standard deviation [SD] and number of missing values. Categorical [including ordinal] variables were described by the sample size and the percentage of each response choice, with missing data included in the calculation of percentage.

2.4.1. Score reliability

Internal consistency reliability reflects agreement among the multiple items belonging to a scale; it was therefore computed only for the RHI, whose total is built from several component items, using Cronbach's alpha coefficient [α] on the unweighted component scores at Baseline, Week 8 and Week 52.²⁹ Cronbach's alpha generally ranges from 0 to 1.0, and although there are no universally accepted rules for interpreting α, estimates >0.70 are typically considered sufficient to support an assessment's use for research purposes.¹⁷
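The alpha computation can be made concrete with a short sketch. It implements the standard formula α = k/(k − 1) × [1 − (sum of item variances)/(variance of the total score)] with sample variances throughout; the five patients' component ratings are invented for illustration and are not trial data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of patients, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score),
    using sample variances (n - 1 denominator) throughout.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two patients")
    item_variances = [variance(item) for item in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical unweighted RHI component ratings (0-3) for five patients.
rhi_components = [
    [3, 2, 3, 1],
    [2, 1, 2, 0],
    [3, 3, 3, 3],
    [1, 0, 0, 0],
    [2, 2, 2, 0],
]
print(round(cronbach_alpha(rhi_components), 2))  # 0.93
```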

Inter-rater reliability, the degree of agreement between different raters assessing the same patient at the same point in time, was also assessed. Agreement was quantified with either the weighted kappa [Kw] or the intraclass correlation coefficient [ICC] from a single-measurement, absolute-agreement, two-way mixed-effects model (ICC[2,1]). The following guidelines for interpreting Kw and ICC values were applied: <0.40 = poor, 0.40–0.59 = fair, 0.60–0.74 = good and ≥0.75 = excellent reliability.³⁰
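The ICC(2,1) named above can be computed from the mean squares of a two-way analysis of variance. The sketch below uses the Shrout and Fleiss formula, which gives the same value as the McGraw and Wong absolute-agreement, single-measure ICC under a mixed-effects model; it assumes complete ratings (no missing rater) and does not cover the weighted kappa. The example data are the six-subject, four-judge illustration from Shrout and Fleiss (1979), and the labelling function applies the interpretation bands quoted above, treating a value of exactly 0.75 as excellent.

```python
def icc_2_1(ratings):
    """ICC(2,1): single-measurement, absolute-agreement ICC from a two-way model.

    ratings: list of n subjects, each a list of k ratings (one per rater, no gaps).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def reliability_label(value):
    """Interpretation bands from the text; exactly 0.75 counts as excellent here."""
    if value < 0.40:
        return "poor"
    if value < 0.60:
        return "fair"
    if value < 0.75:
        return "good"
    return "excellent"


judges_example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
                  [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc = icc_2_1(judges_example)
print(round(icc, 2), reliability_label(icc))  # 0.29 poor
```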

2.4.2. Construct-related validity

Reasonably strong associations between related scores, concepts and instruments indicate convergent validity, and weak associations between unrelated scores, concepts and instruments indicate discriminant validity. In the present analysis, Spearman correlations between the GS, RHI, NI and secondary assessments were calculated at Baseline, Week 8 and Week 52. The secondary assessments included in this analysis were the Mayo score, IBDQ, WPAI, SF-36, abdominal pain, bowel urgency, general well-being, CRP levels and F-cal levels. Correlation coefficients were interpreted using the following guidelines: negligible relationship, 0≤|r|≤0.09; weak relationship, 0.10≤|r|≤0.29; moderate relationship, 0.30≤|r|≤0.49; and strong relationship, 0.50≤|r|≤1.^31,32
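A minimal sketch of the Spearman correlation and the strength labels above follows. It ranks each variable, with tied values sharing their average rank, and then takes the Pearson correlation of the ranks, which is also how `scipy.stats.spearmanr` handles ties; the paired values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def correlation_strength(r):
    """Bands from the text: negligible <0.10, weak <0.30, moderate <0.50, else strong."""
    size = abs(r)
    if size < 0.10:
        return "negligible"
    if size < 0.30:
        return "weak"
    if size < 0.50:
        return "moderate"
    return "strong"


# Hypothetical paired data: GS grade and full Mayo score for six patients.
gs = [5, 4, 3, 0, 2, 5]
mayo = [9, 8, 5, 1, 4, 7]
rho = spearman(gs, mayo)
print(round(rho, 2), correlation_strength(rho))  # 0.9 strong
```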

2.4.3. Known-groups analysis

Known-groups analysis contributes to measurement validity conclusions and characterizes the degree to which an assessment generates scores capable of distinguishing among groups hypothesized a priori to be clinically distinct. In the present analysis, known-groups comparisons of the GS, RHI and NI scores were assessed at Baseline, Week 8 and Week 52 using analysis of variance [ANOVA] to test between-group differences among prespecified subgroups at each timepoint. The prespecified subgroups are listed in Supplementary Table 2.
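The known-groups test reduces to a one-way ANOVA F statistic for each index and grouping, sketched below in plain Python. The grouped scores are hypothetical, and the empty group mimics a category with no patients (as for severe disease in Table 6), which has to be dropped before testing. A p-value would come from the F distribution with the returned degrees of freedom, for example via `scipy.stats.f_oneway` if SciPy is available.

```python
def one_way_anova(groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom.

    groups: list of lists of scores, one list per known group; empty groups
    (a category with no patients) are dropped before testing.
    """
    groups = [list(g) for g in groups if len(g) > 0]
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical GS grades grouped by full Mayo score category.
by_full_mayo = [
    [0, 1, 2],  # clinical remission
    [2, 3, 3],  # mild disease
    [4, 5, 4],  # moderate disease
    [],         # severe disease: no patients at this visit
]
print(one_way_anova(by_full_mayo))
```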

2.4.4. Sensitivity to change

Sensitivity-to-change analyses evaluate change scores on a target instrument over time to demonstrate that improvements [or reductions] in those scores correspond to improvements [or reductions] in other measures expected to change. Accordingly, GS, RHI and NI scores were evaluated in two ways. First, descriptive statistics were generated to show how GS, RHI and NI scores changed over time, and standardized effect sizes for those change scores were determined at Weeks 8 and 52. The standardized effect size was derived from Cohen's d and calculated as d = [mean at follow-up − mean at Baseline]/SD at Baseline. Effect sizes were interpreted as no effect [|d| < 0.2], small effect [0.2 ≤ |d| < 0.5], medium effect [0.5 ≤ |d| < 0.8] and large effect [|d| ≥ 0.8].³² Second, a correlational approach was used to evaluate the association between observed changes in GS, RHI and NI scores and changes in secondary assessment scores. Specifically, Spearman correlations were computed between change in GS, RHI and NI scores and change in secondary assessment scores [Mayo score, IBDQ, WPAI, SF-36, abdominal pain, bowel urgency, general well-being, CRP levels and F-cal levels] from Baseline to Weeks 8 and 52. These change-score correlations were described as weak [≤0.30], moderate [≤0.70], strong [≤0.90] and very strong [≤1.00].³³
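The effect-size definition and both sets of interpretation bands can be written as small functions. This is a sketch: the use of the sample (n − 1) SD at Baseline is an assumption, and dividing by the Baseline SD rather than a pooled SD follows the formula given above rather than the textbook Cohen's d. The Baseline and follow-up scores are hypothetical; the final check applies the bands to the Cohen's d values reported in Table 4.

```python
from statistics import mean, stdev


def standardized_change(baseline, follow_up):
    """(mean at follow-up - mean at Baseline) / SD at Baseline, as in the text.

    Uses the sample SD (n - 1) of the Baseline scores (an assumption).
    """
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def effect_size_label(d):
    size = abs(d)
    if size < 0.2:
        return "no effect"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


def change_correlation_label(r):
    """Bands used for change-score correlations (upper bounds are inclusive)."""
    size = abs(r)
    if size <= 0.30:
        return "weak"
    if size <= 0.70:
        return "moderate"
    if size <= 0.90:
        return "strong"
    return "very strong"


# Hypothetical GS grades for four patients at Baseline and Week 52.
d = standardized_change([4, 5, 3, 4], [3, 4, 2, 3])
print(round(d, 2), effect_size_label(d))  # -1.22 large
print([effect_size_label(v) for v in (-0.96, -0.67, -0.39)])  # ['large', 'medium', 'small']
```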

Missing data were not imputed for the GS, RHI, NI, full and partial Mayo scores, IBDQ, WPAI or the UC symptom daily diary; this approach is standard for measurement-focused evaluations of clinical outcome assessments. Patients with Week 8 but no Week 52 histological scores were assumed to have dropped out prematurely and were considered not to have achieved the outcome at Week 52. No adjustments were made for multiplicity of tests. Where significance tests were used, the threshold for statistical significance was p < 0.05 [two-sided] for each test. All analyses were performed with SAS version 9.4 [SAS Institute].

3. Results

3.1. Baseline characteristics

Among the 491 patients included in the analysis, the mean age was 41.2 years [SD = 12.9] and 59.5% [n = 292] were male [Table 1]. Approximately half of the patients [54.8%] had left-sided [distal] UC, and 45.2% had pancolitis [extensive UC]. The mean disease duration was 6.9 years [SD = 7.0 years].

Table 1.

Demographics and clinical characteristics at Baseline

Characteristic	Total [N = 491]^a Mean ± SD or n [%]
Age, years	41.2 ± 12.9
Male	292 [59.5]
Weight [kg]	75.5 ± 18.3
Disease extent
Left-sided [distal] UC	269 [54.8]
Extensive UC/pancolitis	222 [45.2]
Disease duration [years]	6.9 ± 7.0
CRP level [mg/L]^b	9.9 ± 16.0
F-cal levels [µg/g] [N = 474]	2407.2 ± 2559.6
Concomitant medications
Corticosteroid	287 [58.5]
Immunosuppressant	150 [30.5]
Aminosalicylates	388 [79.0]
Biologics	0 [0.0]
Anti-TNF	53 [10.8]
Infliximab^c	53 [10.8]


AP, analysis population; CRP, C-reactive protein; F-cal, faecal calprotectin; GS, Geboes Score; ITT, intent-to-treat; RHI, Robarts Histopathology Index; SD, standard deviation; TNF, tumour necrosis factor; UC, ulcerative colitis.

^aAP1 consists of ITT patients from the M14-033 trial with a GS or RHI score at Baseline and at least one matching follow-up score.

^bMissing value/no response = 17.

^cAll patients on anti-TNF used infliximab.

3.2. GS, RHI and NI score properties

A downward trend in each histological score from Baseline to Week 52 was observed [Table 2; Supplementary Table 1]. For the GS, RHI and NI indices, the mean [±SD] score decreased from 4.1 [±1.0] to 3.1 [±1.7], 16.6 [±7.6] to 11.4 [±9.3], and 2.4 [±1.2] to 1.9 [±1.5], respectively [Table 2]. On the GS, RHI and NI indices, percentages of patients rated as ‘most severe’ decreased over time and those rated as ‘least severe’ increased over time [Table 2].

Table 2.

Characteristics of the GS, RHI and NI scores at Baseline, Week 8 and Week 52

Scores	Baseline [N = 491]^a	Week 8 [N = 491]	Week 52 [N = 372]
GS [0–5]
N	491	446	265
Mean ± SD	4.1 ± 1.0	3.6 ± 1.5	3.1 ± 1.7
Missing/no responses	0	45	107
GS, n [%]
Grade 0: Structural	3 [0.6]	31 [6.3]	40 [10.8]
Grade 1: Chronic inflammatory infiltrate	3 [0.6]	10 [2.0]	5 [1.3]
Grade 2A/2B: Lamina propria neutrophils and eosinophils	24 [4.9]	69 [14.1]	43 [11.6]
Grade 3: Neutrophils in epithelium	92 [18.7]	75 [15.3]	44 [11.8]
Grade 4: Crypt destruction	155 [31.6]	80 [16.3]	58 [15.6]
Grade 5: Erosion or ulceration	214 [43.6]	181 [36.9]	75 [20.2]
Missing/no response	0 [0.0]	45 [9.2]	107 [28.8]
RHI score [0–33]^b
N	491	441	261
Mean ± SD	16.6 ± 7.6	14.1 ± 9.8	11.4 ± 9.3
Missing/no response	0	50	111
RHI component scores, n [%]
Grade 1: Chronic inflammatory infiltrate
1.0 No increase	9 [1.8]	37 [7.5]	46 [12.4]
1.1 Mild but unequivocal increase	23 [4.7]	75 [15.3]	43 [11.6]
1.2 Moderate increase	119 [24.2]	90 [18.3]	58 [15.6]
1.3 Marked increase	340 [69.2]	242 [49.3]	119 [32.0]
Pending	0 [0.0]	27 [5.5]	2 [0.5]
Missing/no response	0 [0.0]	20 [4.1]	104 [28.0]
Grade 2B: Neutrophils in lamina propria
2B.0 None	21 [4.3]	81 [16.5]	74 [19.9]
2B.1 Mild but unequivocal increase	249 [50.7]	194 [39.5]	106 [28.5]
2B.2 Moderate increase	188 [38.3]	146 [29.7]	79 [21.2]
2B.3 Marked increase	33 [6.7]	24 [4.9]	7 [1.9]
Pending	0 [0.0]	27 [5.5]	2 [0.5]
Missing/no response	0 [0.0]	19 [3.9]	104 [28.0]
Grade 3: Neutrophils in epithelium
3.0 None	46 [9.4]	125 [25.5]	100 [26.9]
3.1 <5% crypts involved	7 [1.4]	13 [2.6]	10 [2.7]
3.2 <50% crypts involved	201 [40.9]	142 [28.9]	82 [22.0]
3.3 >50% crypts involved	237 [48.3]	161 [32.8]	71 [19.1]
Pending	0 [0.0]	27 [5.5]	2 [0.5]
Missing/no response	0 [0.0]	23 [4.7]	107 [28.8]
Grade 5: Erosion or ulceration
5.0 No erosion, ulceration, or granulation tissue	296 [60.3]	280 [57.0]	196 [52.7]
5.1 Recovering epithelium + adjacent inflammation	56 [11.4]	37 [7.5]	18 [4.8]
5.2 Probable erosion—focally stripped	6 [1.2]	6 [1.2]	1 [0.3]
5.3 Unequivocal erosion	45 [9.2]	39 [7.9]	12 [3.2]
5.4 Ulcer or granulation tissue	88 [17.9]	84 [17.1]	40 [10.8]
Pending	0 [0.0]	27 [5.5]	2 [0.5]
Missing/no response	0 [0.0]	18 [3.7]	103 [27.7]
Mean NI score [0–4]^c^,^d
N	151	151	151
Mean ± SD	2.4 ± 1.2	2.1 ± 1.4	1.9 ± 1.5
Missing/no response	0	0	0


AP, analysis population; GS, Geboes score; ITT, intent-to-treat; NI, Nancy Index; RHI, Robarts Histopathology Index; SD, standard deviation.

^aAP1 consists of ITT patients from the M14-033 trial with a GS or RHI score at Screening and at least one matching follow-up score.

^bThe RHI score ranges from 0 to 33, with higher scores associated with more severe disease activity. The RHI score is based on the most severe RHI score for a rectum or sigmoid tissue sample at the associated visit.

^cAP2 are patients from the M14-033 trial with an NI score at screening and at least one follow-up.

^dThe average NI score is the mean of an ordinal score that ranges from 0 to 4, with higher scores associated with more severe disease. The average score is based on the mean of four physician NI ratings at the visit.

3.3. Reliability

Internal consistency estimates for the RHI were lower at Baseline [α = 0.62] than at Week 8 [α = 0.82] and Week 52 [α = 0.81; Supplementary Table 3]. Based on ICCs across the four raters of 0.905 for the RHI, 0.640 for the NI and 0.530 for the GS total score, inter-rater reliability was interpreted as excellent [≥0.75], good [0.60–0.74] and fair [0.40–0.59], respectively [Table 3]. Among the GS components, inter-rater reliability was fair for Grades 0, 1, 2B, 4 and 5, and poor for Grade 2A.

Table 3.

Inter-rater reliability of the GS, RHI, NI and component scores at Week 52 [Inter-AP1^a: N = 151]

Score	ICC^b [95% CI]
GS^c	0.530^a [0.496, 0.563]
RHI score^d	0.905^b [0.875, 0.928]
NI score^e	0.640^a [0.605, 0.675]
GS Grade 0: Structural^f	0.468^a [0.427, 0.509]
GS Grade 1: Chronic inflammatory infiltrate^f	0.579^a [0.539, 0.619]
GS Grade 2A: Eosinophils in lamina propria^f	0.270^a [0.214, 0.325]
GS Grade 2B: Neutrophils in lamina propria^f	0.552^a [0.512, 0.591]
GS Grade 4: Crypt destruction^f	0.425^a [0.381, 0.469]
GS Grade 5: Erosion or ulceration^f	0.522^a [0.483, 0.561]
GS severity score^g	0.647^a [0.600, 0.693]


CI, confidence interval; GS, Geboes score; ICC, intraclass correlation coefficient; Inter-AP, inter-rater reliability analysis population; NI, Nancy Index; RHI, Robarts Histopathology Index.

Note: Inter-AP1 are patients from the M14-033 trial with ratings by multiple physicians on the GS, RHI or NI at Week 52.

^aOnly Inter-AP patients with scores rated by at least two physicians were included in the analysis.

^bThe ICC was computed using either the weighted kappa [values marked a] or the single-measurement, absolute-agreement, two-way mixed-effects model [values marked b].

^cThe GS is an ordinal score that ranges from 0 to 5, with higher scores associated with greater levels of inflammation.

^dThe RHI score ranges from 0 to 33, with higher scores associated with more severe disease activity.

^eThe NI score is an ordinal score that ranges from 0 to 4, with higher scores associated with greater disease severity.

^fThe GS components are from the rectum or sigmoid tissue sample associated with the highest GS at the visit.

^gThe GS severity score is an ordinal score that ranges from 1 to 3, with higher scores associated with greater levels of severity.

3.4. Construct-related validity

Overall, for construct-related validity, the performance of both the GS and the RHI was considered good, whereas that of the NI was considered fair [Table 4].

Table 4.

Summary of measure performance

Measurement consideration	GS	RHI	NI
Reliability
Internal consistency	NA	0.81	NA
Inter-rater reliability	0.42	0.87	0.54
Construct validity
Convergent/divergent validity [correlations]
Full Mayo score	0.39 to 0.63	0.43 to 0.68	0.27 to 0.50
IBDQ	–0.16 to –0.38	–0.18 to –0.41	–0.10 to –0.26
WPAI	0.32 to 0.40	0.31 to 0.41	0.17 to 0.36
SF-36v2	–0.26 to –0.19	–0.27 to –0.15	–0.18 to 0.02
Known-groups methods [ANOVA p-values]
Mayo endoscopy score	<0.001	<0.001	<0.001
Full Mayo score	<0.001	<0.001	<0.001
IBDQ quartiles	<0.001	<0.001	0.361
Sensitivity to change [Cohen’s d]	–0.96	–0.67	–0.39


GS, Geboes Score; IBDQ, Inflammatory Bowel Disease Questionnaire; NA, not applicable; NI, Nancy Index; RHI, Robarts Histopathology Index; SF-36, 36-Item Short-Form Survey; WPAI, Work Productivity and Activity Impairment Questionnaire.

3.4.1. Geboes Score

At Week 8, moderate [r ≥ 0.3] to strong [r ≥ 0.5] positive Spearman correlations were found between the GS and the Mayo component scores (r = 0.43 for the stool frequency subscore, 0.37 for the rectal bleeding subscore, 0.53 for the endoscopy subscore and 0.41 for the physician’s global assessment [PGA]), as well as the full and partial Mayo scores [r = 0.54 and 0.49, respectively; Supplementary Table 4]. At Week 52, correlations with the stool frequency, rectal bleeding and endoscopy subscores were similar [r = 0.42, 0.39 and 0.57, respectively], whereas the correlation with the PGA increased [r = 0.53], as did the correlations with the full and partial Mayo scores [r = 0.63 and 0.56, respectively; Table 5].

Table 5.

Spearman correlations between the GS, RHI, NI and secondary assessments at Week 52

Secondary measure	GS [N = 372]		RHI [N = 372]		NI [N = 151]
Secondary measure	N	Corr.	N	Corr.	N	Corr.
Mayo: Mean stool frequency subscore^a	237	0.42	234	0.45	131	0.43
Mayo: Mean rectal bleeding frequency subscore^a	237	0.39	234	0.43	131	0.24
Mayo: Mean endoscopy subscore^a	260	0.57	256	0.62	131	0.53
Mayo: Physician’s Global Assessment^a	238	0.53	235	0.57	132	0.43
Mayo: Full score^a	233	0.63	230	0.68	129	0.56
Mayo: Partial score^a^,^b	237	0.56	234	0.60	131	0.50
IBDQ: Bowel symptoms^c	238	–0.38	235	–0.41	129	–0.33
IBDQ: Systemic symptoms^c	238	–0.26	235	–0.28	129	–0.20
IBDQ: Emotional function^c	239	–0.25	236	–0.26	129	–0.19
IBDQ: Social function^c	234	–0.38	231	–0.41	128	–0.34
IBDQ: Fatigue^c	239	–0.16	236	–0.18	129	–0.16
IBDQ: Total score^c	233	–0.35	230	–0.37	128	–0.28
WPAI: Percentage of work time missed due to UC [absenteeism]^d	254	0.38	250	0.41	124	0.35
WPAI: Percentage of impairment while working due to UC [presenteeism]^d	162	0.39	159	0.40	71	0.39
WPAI: Percentage of overall work impairment due to UC^d	156	0.40	154	0.41	66	0.39
WPAI: Percentage of activity impairment due to UC^d	158	0.32	156	0.31	66	0.22
SF-36v2^®: Physical functioning^e	262	–0.15	258	–0.15	129	–0.19
SF-36v2^®: Role physical^e	262	–0.28	258	–0.29	129	–0.23
SF-36v2^®: Bodily pain^e	262	–0.16	258	–0.17	129	–0.07
SF-36v2^®: General health^e	262	–0.27	258	–0.27	129	–0.21
SF-36v2^®: Vitality^e	262	–0.21	258	–0.21	129	–0.21
SF-36v2^®: Social functioning^e	262	–0.31	258	–0.31	129	–0.25
SF-36v2^®: Role emotional^e	262	–0.11	258	–0.12	129	–0.07
SF-36v2^®: Mental health^e	262	–0.18	258	–0.18	129	–0.13
SF-36v2^®: PCS^e	262	–0.27	258	–0.26	129	–0.22
SF-36v2^®: MCS^e	262	–0.19	258	–0.20	129	–0.13
Mean abdominal pain^f	236	0.25	232	0.26	139	0.20
Percent bowel urgency^f	236	0.40	232	0.43	139	0.28
Mean general well-being^f	236	0.24	232	0.26	139	0.21
CRP levels^g	237	0.32	234	0.33	130	0.16
F-cal levels^g	212	0.60	210	0.62	112	0.51


AP, analysis population; Corr., correlation; CRP, C-reactive protein; F-cal, faecal calprotectin; GS, Geboes Score; IBDQ, Inflammatory Bowel Disease Questionnaire; ITT, intent-to-treat; MCS, Mental Component Summary; NI, Nancy Index; PCS, Physical Component Summary; RHI, Robarts Histopathology Index; SF-36v2^®, 36-item Short Form Survey; UC, ulcerative colitis; WPAI, Work Productivity and Activity Impairment questionnaire.

Note: AP1 are ITT patients from the M14-033 trial with a GS or RHI score at screening and at least one follow-up.

^aMayo scores are of various ranges with higher scores indicative of more severe disease.

^bPartial Mayo score is the full Mayo score excluding the endoscopy subscore.

^cThe IBDQ total score ranges from 32 to 224, with higher scores representing better quality of life.

^dThe WPAI scores are percentages from 0 to 100. Higher scores are indicative of greater impairment.

^eThe SF-36v2^® scores are norm-based, normalized to the US general population with mean 50 and standard deviation of 10. A higher score indicates better functioning or well-being.

^fThe UC symptom daily diary items are of various ranges with higher scores indicating greater severity of UC symptoms on the respective item.

^gCRP and F-cal are assessments of various ranges. Higher scores indicate more severe UC.

The GS had moderate to strong negative correlations with IBDQ domain scores, weak positive correlations with WPAI component scores and weak negative correlations with SF-36 scores [Supplementary Tables 4 and 5].

Among the correlations of the GS with the remaining assessments, convergent validity was best established with F-cal levels [r = 0.47 at Week 8, r = 0.60 at Week 52; Supplementary Tables 4 and 5].

3.4.2. Robarts Histopathology Index

At Week 8, moderate [r ≥ 0.3] to strong [r ≥ 0.5] positive correlations were found for the RHI with the Mayo component scores [0.43 for stool frequency subscore, 0.37 for rectal bleeding subscore, 0.56 for the endoscopy subscore and 0.41 for the PGA], as well as with the full and partial Mayo scores [0.55 and 0.48, respectively; Supplementary Table 4]. Correlations at Week 52 remained relatively unchanged between the RHI and the Mayo stool frequency subscore [r = 0.45], rectal bleeding subscore [r = 0.43] and the endoscopy subscore [r = 0.62, Table 5]. However, there was an increase in the correlation between the RHI and the PGA [r = 0.57] at Week 52. This was associated with an increase in the correlations of RHI and the full Mayo score [r = 0.68] and the partial Mayo score [r = 0.60] at Week 52 [Table 5].

The RHI had moderate negative correlations with IBDQ domain scores, weak positive correlations with WPAI component scores and weak correlations with SF-36 scores [Supplementary Tables 4 and 5]. Among the remaining assessments, the RHI correlated most strongly with F-cal levels at both Week 8 [r = 0.48] and Week 52 [r = 0.62], compared with percentage bowel urgency [r = 0.34 at Week 8, r = 0.43 at Week 52] and CRP levels [r = 0.38 at Week 8, r = 0.33 at Week 52; Table 5 and Supplementary Table 5].

3.4.3. Nancy Index

At Week 8, weak [r ≥ 0.1] to moderate [r ≥ 0.3] positive correlations were found between the NI and the Mayo component scores [r = 0.43 for the stool frequency subscore, r = 0.27 for the rectal bleeding subscore, r = 0.39 for the endoscopy subscore and r = 0.32 for the PGA], as well as the full and partial Mayo scores [r = 0.45 and 0.41, respectively; Supplementary Table 4]. Changes at Week 52 were mixed: the correlation between the NI and the Mayo rectal bleeding subscore decreased [r = 0.24], the correlations with the endoscopy subscore [r = 0.53] and the PGA [r = 0.43] increased, and the correlation with the stool frequency subscore was unchanged [r = 0.43; Table 5]. At Week 52, correlations between the NI and the full and partial Mayo scores increased to 0.56 and 0.50, respectively.

The NI had weak to moderate negative correlations with the IBDQ, weak to moderate positive correlations with WPAI component scores and weak negative correlations with SF-36 domain scores [Supplementary Tables 4 and 5]. As with the GS and RHI, the NI correlated most strongly with F-cal levels at Week 8 [r = 0.51] and Week 52 [r = 0.51; Table 5 and Supplementary Table 4].
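The correlation labels used in this section follow fixed cut-offs on the magnitude of r (weak ≥ 0.1, moderate ≥ 0.3, strong ≥ 0.5). The hypothetical helper below is a minimal sketch of that labelling rule, applied to the Week 52 NI correlations quoted above. The "negligible" label for |r| < 0.1 is an assumption, since the text does not name that band.

```python
from typing import Dict


def correlation_strength(r: float) -> str:
    """Label |r| with the cut-offs quoted in the text (hypothetical helper)."""
    magnitude = abs(r)  # direction is ignored; only the size of r is labelled
    if magnitude >= 0.5:
        return "strong"
    if magnitude >= 0.3:
        return "moderate"
    if magnitude >= 0.1:
        return "weak"
    return "negligible"  # assumed label; the text does not name this band


# Week 52 correlations between the NI and Mayo scores, as reported above.
ni_week52: Dict[str, float] = {
    "rectal bleeding subscore": 0.24,
    "stool frequency subscore": 0.43,
    "PGA": 0.43,
    "endoscopy subscore": 0.53,
    "full Mayo score": 0.56,
    "partial Mayo score": 0.50,
}

for measure, r in ni_week52.items():
    print(f"{measure}: r = {r:.2f} ({correlation_strength(r)})")
```

Taking the absolute value lets the same rule label the negative correlations reported for the IBDQ and SF-36.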

3.5. Known-groups analysis

As anticipated, and as is typically the case with Baseline measures, no significant differences in GS, RHI or NI scores were observed between predefined clinical groups, with one exception: GS scores differed significantly across the full Mayo total score groupings [p = 0.006, Supplementary Table 5]. In contrast, all three histological indices showed statistically significant mean score differences across predefined clinical groupings at Week 8 [p < 0.05; Supplementary Table 6] and Week 52 [p < 0.05; Table 6]. Overall, based on the known-groups analysis, the performance of the GS and RHI was considered excellent, whereas that of the NI was considered good [Table 4].
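The full Mayo total score groupings used in Table 6 amount to a simple banding rule. The sketch below is hypothetical and rests on several assumptions: four subscores each scored 0 to 3; remission taken as a rounded total of 2 or less with no subscore above 1 (the usual Mayo remission definition, consistent with the mild band starting at 3); halves rounded upward when forming the "rounded value" of the total; and totals of 2 or less that fail the subscore rule left unclassified, because the source does not say how such patients were grouped.

```python
import math
from typing import Sequence


def mayo_total_group(subscores: Sequence[float]) -> str:
    """Band a full Mayo total score as in Table 6 (hypothetical helper).

    subscores: stool frequency, rectal bleeding, endoscopy and PGA subscores,
    each assumed to lie between 0 and 3 (diary-based subscores may be means).
    """
    if len(subscores) != 4:
        raise ValueError("expected four Mayo subscores")
    # Round half up (assumption); Python's built-in round() rounds halves to even.
    total = math.floor(sum(subscores) + 0.5)
    if total <= 2:
        # Remission additionally requires that no subscore exceeds 1.
        return "clinical remission" if max(subscores) <= 1 else "unclassified"
    if total <= 5:
        return "mild"
    if total <= 10:
        return "moderate"
    return "severe"


print(mayo_total_group([1, 1, 0, 0]))    # -> clinical remission (total 2)
print(mayo_total_group([1.5, 1, 0, 0]))  # -> mild (total 2.5 rounds half up to 3)
print(mayo_total_group([2, 2, 3, 2]))    # -> moderate (total 9)
```

The rounding convention matters at the boundary: with `round()`, a total of 2.5 would become 2 and could fall into the remission band instead.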

Table 6.

Known-groups comparisons of the GS, RHI and NI at Week 52

Comparison groups	GS [AP1; N = 372]			RHI [AP1; N = 372]			NI [AP2; N = 151]
Comparison groups	N	Mean ± SD^a	p-value^b	N	Mean ± SD^a	p-value^b	N	Mean ± SD^a	p-value^b
Mayo endoscopy subscore^c
Inactive disease	62	1.7 ± 1.5	<0.001	62	3.4 ± 5.1	<0.001	27	0.5 ± 0.9	<0.001
Mild disease	38	2.3 ± 1.7		38	6.3 ± 6.3		18	1.2 ± 1.5
Moderate disease	91	3.7 ± 1.3		90	14.7 ± 8.2		44	2.0 ± 1.5
Severe disease	69	4.1 ± 1.1		66	17.4 ± 8.4		42	2.6 ± 1.2
Full Mayo total score^d
Clinical remission [score ≤2 with no subscore >1]	78	1.8 ± 1.6	<0.001	78	3.9 ± 5.4	<0.001	37	0.6 ± 1.0	<0.001
Mild disease [scores of 3–5]	68	3.1 ± 1.5		65	11.1 ± 7.7		42	2.1 ± 1.4
Moderate disease [scores of 6–10]	79	4.3 ± 1.0		79	18.7 ± 8.4		46	2.6 ± 1.3
Severe disease [scores >10]	0	NA		0	NA		0	NA
IBDQ total score^e
Based on 1st quartile	68	3.8 ± 1.3	<0.001	68	14.6 ± 9.0	<0.001	41	2.2 ± 1.4	0.039
Based on 2nd quartile	66	3.3 ± 1.5		64	12.6 ± 9.6		36	1.9 ± 1.6
Based on 3rd quartile	55	2.4 ± 1.9		54	8.2 ± 8.9		28	1.4 ± 1.5
Based on 4th quartile	44	2.4 ± 1.8		44	7.1 ± 8.3		23	1.3 ± 1.5


ANOVA, analysis of variance; AP, analysis population; GS, Geboes Score; IBDQ, Inflammatory Bowel Disease Questionnaire; ITT, intent-to-treat; NI, Nancy Index; RHI, Robarts Histopathology Index; SD, standard deviation.

Note: AP1 are ITT patients from the M14-033 trial with a GS or RHI score at screening and at least one follow-up.

^aThe NI score ranges from 0 to 4, with higher scores associated with more severe disease activity. Only patients with non-missing responses on the NI score and the anchor were included in the analysis.

^b p-values are from one-way ANOVA testing mean score differences between groups.

^cMayo endoscopy subscore groups are based on the Mayo endoscopy subscore.

^dMayo total score groupings are based on the rounded value of the Mayo total score.

^eThe IBDQ total score ranges from 32 to 224, with higher scores representing better quality of life.
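Footnote b of Table 6 states that the p-values come from one-way ANOVA. As a rough check, the F statistic can be recomputed from the group sizes, means and SDs alone. The sketch below does this for the GS by Mayo endoscopy subscore at Week 52; because the published means and SDs are rounded to one decimal, the result only approximates the statistic computed on patient-level data. Converting F to a p-value needs the F distribution (for example `scipy.stats.f.sf(F, df_between, df_within)`), which is left out to keep the sketch dependency-free.

```python
from typing import List, NamedTuple, Tuple


class GroupSummary(NamedTuple):
    n: int
    mean: float
    sd: float


def anova_from_summary(groups: List[GroupSummary]) -> Tuple[float, int, int]:
    """One-way ANOVA F statistic from per-group N, mean and SD.

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(g.n for g in groups)
    grand_mean = sum(g.n * g.mean for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(g.n * (g.mean - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares recovered from each group's sample SD.
    ss_within = sum((g.n - 1) * g.sd ** 2 for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# GS by Mayo endoscopy subscore at Week 52 (Table 6): inactive, mild, moderate, severe.
gs_endoscopy = [
    GroupSummary(62, 1.7, 1.5),
    GroupSummary(38, 2.3, 1.7),
    GroupSummary(91, 3.7, 1.3),
    GroupSummary(69, 4.1, 1.1),
]

f_stat, df_b, df_w = anova_from_summary(gs_endoscopy)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

With these rounded inputs F comes out at roughly 44 on 3 and 256 degrees of freedom, which is consistent with the reported p < 0.001.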

3.6. Sensitivity to change

From Baseline to Week 8, small effects describe the magnitude of mean change in both the RHI [–2.74 ± 10.75, Cohen’s d = –0.36] and NI scores [–0.34 ± 1.76, d = –0.29], whereas a medium effect describes the magnitude of mean change in the GS scores [–0.57 ± 1.67, d = –0.60; Table 7]. By Week 52, NI mean score changes remained associated with a small effect [–0.55 ± 1.70, d = –0.47], RHI mean score changes were associated with a medium effect [–4.99 ± 11.39, d = –0.67] and GS mean score changes were associated with a large effect [–0.97 ± 1.89, d = –0.96; Table 7].

Table 7.

Change scores and standardized effect sizes on the GS and RHI scores [AP1: Week 8, N = 491; Week 52, N = 372] and NI scores [AP2: N = 151] at Week 8 and Week 52

Index	N	Mean ± SD Baseline	Mean ± SD Week 8/Week 52	Mean change score ± SD	SES^a
Week 8
GS [0–5]^b	446	4.15 ± 0.94	3.58 ± 1.53	–0.57 ± 1.67	–0.60
RHI score [0–33]^c	441	16.80 ± 7.51	14.07 ± 9.76	–2.74 ± 10.75	–0.36
Average NI score [0–4]^d	151	2.44 ± 1.17	2.10 ± 1.40	–0.34 ± 1.76	–0.29
Week 52
GS [0–5]^b	265	4.11 ± 1.01	3.13 ± 1.71	–0.97 ± 1.89	–0.96
RHI score [0–33]^c	261	16.37 ± 7.47	11.38 ± 9.32	–4.99 ± 11.39	–0.67
Average NI score [0–4]^d	151	2.44 ± 1.17	1.89 ± 1.51	–0.55 ± 1.70	–0.47


AP, analysis population; GS, Geboes Score; NI, Nancy Index; RHI, Robarts Histopathology Index; SD, standard deviation; SES, standardized effect size.

^aThe standardized effect size is Cohen’s d, calculated as [(mean at follow-up) − (mean at Baseline)]/SD at Baseline.

^bThe GS is an ordinal score that ranges from 0 to 5, with higher scores associated with greater levels of inflammation. Only patients with non-missing responses on the GS and the anchor were included in the analysis.

^cThe RHI score ranges from 0 to 33, with higher scores associated with more severe disease activity. Only patients with non-missing responses on the RHI and the anchor were included in the analysis.
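Footnote a of Table 7 defines the standardized effect size as the mean change divided by the Baseline SD. The sketch below recomputes it from the rounded Table 7 summaries and labels the magnitude with Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large); the "trivial" label below 0.2 is an assumption. Because the inputs are rounded, the recomputed values agree with the reported SES to within about 0.01.

```python
from typing import NamedTuple


class ChangeRow(NamedTuple):
    label: str
    baseline_mean: float
    baseline_sd: float
    followup_mean: float
    reported_ses: float


def standardized_effect_size(row: ChangeRow) -> float:
    """(mean at follow-up - mean at Baseline) / SD at Baseline (Table 7, footnote a)."""
    return (row.followup_mean - row.baseline_mean) / row.baseline_sd


def cohen_label(d: float) -> str:
    """Cohen's conventional benchmarks; 'trivial' below 0.2 is an assumed label."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "trivial"


# Rounded summary statistics from Table 7.
table7 = [
    ChangeRow("GS, Week 8", 4.15, 0.94, 3.58, -0.60),
    ChangeRow("RHI, Week 8", 16.80, 7.51, 14.07, -0.36),
    ChangeRow("NI, Week 8", 2.44, 1.17, 2.10, -0.29),
    ChangeRow("GS, Week 52", 4.11, 1.01, 3.13, -0.96),
    ChangeRow("RHI, Week 52", 16.37, 7.47, 11.38, -0.67),
    ChangeRow("NI, Week 52", 2.44, 1.17, 1.89, -0.47),
]

for row in table7:
    d = standardized_effect_size(row)
    print(f"{row.label}: recomputed {d:.2f}, reported {row.reported_ses:.2f} ({cohen_label(d)})")
```

The resulting labels (medium, small, small at Week 8; large, medium, small at Week 52) match the effect-size descriptions in the text.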

Overall, at both Week 8 and Week 52, change in GS [r = 0.35 and 0.55, respectively], RHI [r = 0.38 and 0.57, respectively] and NI scores [r = 0.34 and 0.49, respectively] was most strongly related to change in the full Mayo total score. At Week 52, among the Mayo subscale scores, changes in each of the GS, RHI and NI scores were most strongly related to changes in the endoscopy subscore and the PGA [r = 0.50 and 0.45 for the GS, 0.48 and 0.43 for the RHI, and 0.44 and 0.37 for the NI; Table 8].
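The change-score correlations above pair each patient's change in a histological index with the same patient's change in a comparison measure. The sketch below illustrates the mechanics with hypothetical toy data (not trial data). It assumes Pearson's r, since the text does not name the coefficient, and assumes that only patients with both changes available are used; the varying N in Table 8 is consistent with, but does not confirm, pairwise complete-case handling.

```python
import math
from typing import List, Optional, Sequence, Tuple

Score = Optional[float]  # None marks a missing assessment


def change_scores(baseline: Sequence[Score], followup: Sequence[Score]) -> List[Score]:
    """Follow-up minus Baseline per patient; None if either value is missing."""
    if len(baseline) != len(followup):
        raise ValueError("baseline and follow-up must cover the same patients")
    return [None if b is None or f is None else f - b for b, f in zip(baseline, followup)]


def pearson_r(x: Sequence[Score], y: Sequence[Score]) -> Tuple[float, int]:
    """Pearson correlation over patients with both values present; returns (r, n)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("too few complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical toy data for six patients (not from M14-033).
gs_baseline = [4, 5, 4, 3, 5, None]
gs_week52 = [2, 5, 1, 3, 4, 2]
mayo_baseline = [9, 10, 8, 7, 11, 9]
mayo_week52 = [4, 9, 3, 6, None, 5]

r, n = pearson_r(change_scores(gs_baseline, gs_week52),
                 change_scores(mayo_baseline, mayo_week52))
print(f"r = {r:.2f} from {n} complete pairs")
```

Here two of the six toy patients drop out because one of their change scores is missing, so the correlation rests on four pairs.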

At Week 8, change in GS [r = 0.08 to 0.19], RHI [r = 0.13 to 0.18] and NI scores [r = 0.16 to 0.27] was at most weakly related to change in WPAI scores [Supplementary Table 7]. At Week 52, these correlations increased, and change in GS, RHI and NI scores was moderately related to change in WPAI impairment while working [presenteeism; GS r = 0.42, RHI r = 0.35, NI r = 0.35] and overall work impairment [GS r = 0.41, RHI r = 0.34, NI r = 0.42; Table 8].

Table 8.

Correlations of change in GS, RHI and NI average scores with change in secondary assessment scores from Baseline to Week 52

Secondary assessment	GS [AP1; N = 372]		RHI [AP1; N = 372]		NI [AP2; N = 151]
Secondary assessment	N	Corr.	N	Corr.	N	Corr.
Mayo: Mean stool frequency subscore^a	236	0.33	233	0.33	131	0.34
Mayo: Mean rectal bleeding frequency subscore^a	236	0.23	233	0.33	131	0.27
Mayo: Mean endoscopy subscore^a	260	0.50	256	0.48	131	0.44
Mayo: Physician’s Global Assessment^a	238	0.45	235	0.43	132	0.37
Mayo: Full Mayo total score^a	232	0.55	229	0.57	129	0.49
Mayo: Partial Mayo score^a^,^b	236	0.46	233	0.50	131	0.43
IBDQ: Total score^c	230	–0.35	227	–0.27	127	–0.27
WPAI: Percentage of work time missed [absenteeism]^d	250	0.28	246	0.22	121	0.24
WPAI: Percentage of impairment while working [presenteeism]^d	144	0.42	141	0.35	65	0.35
WPAI: Percentage of overall work impairment due to UC^d	141	0.41	139	0.34	60	0.42
WPAI: Percentage of activity impairment due to UC^d	143	0.19	141	0.15	60	0.31
SF-36v2^®: Physical functioning^e	262	–0.13	258	–0.09	129	–0.18
SF-36v2^®: Role physical^e	262	–0.25	258	–0.19	129	–0.20
SF-36v2^®: Bodily pain^e	262	–0.16	258	–0.10	129	–0.16
SF-36v2^®: General health^e	262	–0.19	258	–0.16	129	–0.07
SF-36v2^®: Vitality^e	262	–0.20	258	–0.12	129	–0.27
SF-36v2^®: Social functioning^e	262	–0.21	258	–0.18	129	–0.19
SF-36v2^®: Role emotional^e	262	–0.13	258	–0.08	129	–0.08
SF-36v2^®: Mental health^e	262	–0.12	258	–0.08	129	–0.20
SF-36v2^®: PCS^e	262	–0.24	258	–0.17	129	–0.19
SF-36v2^®: MCS^e	262	–0.17	258	–0.12	129	–0.20
Mean abdominal pain^f	232	0.23	228	0.18	137	0.22
Percent bowel urgency^f	232	0.31	228	0.32	137	0.21
Mean general well-being^f	232	0.26	228	0.19	137	0.23


AP, analysis population; Corr., correlation; GS, Geboes Score; IBDQ, Inflammatory Bowel Disease Questionnaire; ITT, intent-to-treat; MCS, Mental Component Summary; NI, Nancy Index; PCS, Physical Component Summary; RHI, Robarts Histopathology Index; SF-36v2^®, 36-Item Short Form Survey; UC, ulcerative colitis; WPAI, Work Productivity and Activity Impairment questionnaire.

Note: AP1 are ITT patients from the M14-033 trial with a GS or RHI score at screening and at least one follow-up.

^aMayo scores are of various ranges with higher scores indicative of more severe disease.

^bPartial Mayo score is the full Mayo score excluding the endoscopy subscore.

^cThe IBDQ total score ranges from 32 to 224, with higher scores representing better quality of life.

^dThe WPAI scores are percentages from 0 to 100. Higher scores are indicative of greater impairment.

^eThe SF-36v2^® scores are norm-based, normalized to the US general population with mean 50 and standard deviation of 10. A higher score indicates better functioning or well-being.

^fThe UC symptom daily diary items are of various ranges with higher scores indicating greater severity of UC symptoms on the respective item.

The GS, RHI and NI change scores were only weakly related to SF-36 domain change scores at Week 8 [r = –0.26 to –0.05] and Week 52 [r = –0.27 to –0.07; Table 8 and Supplementary Table 7]. Lastly, change in the GS, RHI and NI scores had weak positive correlations with change in patient-reported abdominal pain and general well-being at Week 8 [r = 0.14 to 0.21] and Week 52 [r = 0.18 to 0.26], whereas correlations with change in bowel urgency at Week 52 were moderate for the GS [r = 0.31] and RHI [r = 0.32] and weak for the NI [r = 0.21; Table 8 and Supplementary Table 7].

Based on sensitivity to change, the overall performances of the GS, RHI and NI were considered good [Table 4].

4. Discussion

The GS, RHI and NI are each capable of producing reliable, valid scores that are sensitive to changes in disease activity over time in patients with moderately to severely active UC. Therefore, all three histological measures are justified for use in research settings. While all three indices demonstrated acceptable measurement properties, the GS and RHI performed better than the NI overall. In the context of an active treatment study and relative to Baseline, consistently decreasing mean scores in the GS, RHI and NI over time indicate improvement in UC-related inflammation and other characteristics of microscopically observed disease activity. Additionally, for all three histological measures, percentages of patients rated as most severe decreased over time and percentages of patients rated as least severe increased.

Because disease severity was similar across patients at Baseline, convergent validity could not be well evaluated or demonstrated for the GS, RHI or NI at that timepoint. The RHI demonstrated good internal consistency at Week 8, and this was also observed at Week 52. The GS and RHI showed good convergent and discriminant validity, excellent ability to distinguish known groups and comparable sensitivity to change. The NI demonstrated fair convergent and discriminant validity, good ability to distinguish known severity groups and sensitivity to change relative to concurrent assessments at Weeks 8 and 52.

Inter-rater reliability was assessed on a subset of 150 patients selected to represent the full spectrum of disease severity. Based on ICCs for the RHI, NI and GS total scores, inter-rater reliability was interpreted as excellent, good and fair, respectively. The RHI may have shown better inter-rater reliability because it is a more granular assessment than both the GS and NI. Reliability of the individual GS components was lowest for Grade 2A and higher for the more extreme grades. Inter-rater reliability may be better for the higher GS components because the lower components are associated with less severe disease and are less distinct in appearance.
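The internal consistency results discussed here are summarized with Cronbach's coefficient alpha, α = k/(k − 1) × (1 − Σ component variances / variance of the total score), where k is the number of components. The sketch below is a minimal implementation of that standard formula using sample variances; the component matrix is hypothetical toy data, not RHI data from the trial.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; scores[p][i] is patient p's score on component i."""
    n_items = len(scores[0])
    if n_items < 2 or any(len(row) != n_items for row in scores):
        raise ValueError("need a rectangular matrix with at least two components")
    # Sample variance of each component across patients.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the summed (total) score.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical component scores for six patients on four components.
toy_scores = [
    [3, 3, 2, 3],
    [1, 1, 0, 1],
    [2, 2, 1, 1],
    [0, 0, 0, 0],
    [3, 2, 3, 3],
    [1, 2, 1, 0],
]
print(f"alpha = {cronbach_alpha(toy_scores):.2f}")
```

Alpha rises when components move together across patients, which is one reason a sample with a narrow spread of disease severity, as at Baseline here, can yield lower values.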

All correlations between the three indices and the other assessments were in the expected direction (negative for the SF-36v2^® and IBDQ, and positive for the Mayo, WPAI and other assessments [biomarkers and symptoms collected through a daily diary, such as abdominal pain, general well-being and bowel urgency]). Correlations of the NI with the other clinical, health-related quality of life and patient-reported outcome assessments were slightly lower than those of the GS and RHI.

Although the GS, RHI and NI all demonstrated moderate to strong correlations with F-cal levels at Weeks 8 and 52, these associations were slightly stronger for the GS and RHI. Consistent with this, a previous study showed that patients with histological activity had significantly higher F-cal levels as neutrophil infiltration of the epithelium increased.^34 This association between increasing histology scores and F-cal suggests that neutrophil infiltration in the epithelium may be an appropriate target for further exploration in patients with UC. Furthermore, obtaining a faecal sample is non-invasive and carries less risk to the patient than a biopsy.^3,12

Evaluation of the relationship of these three indices [GS, RHI, NI] to other clinical assessments, including endoscopic outcomes, biomarker levels and patient reports of health status, may help establish a consistent definition of mucosal healing and a set of valid, core outcome indices for use in UC clinical trials.^7,11 Such a definition would facilitate comparison of results between clinical trials and guide researchers and clinicians in the long-term management of UC. In clinical practice, physicians may choose the NI because it is easier to complete and requires less training than the GS. However, current research is examining mucosal healing at a deeper histological level, and more sophisticated histological endpoints require the more granular assessment that the GS provides.

A strength of this study is that it is the first analysis to directly compare the measurement properties of three commonly used histological assessments of mucosal healing, using data from a randomized, controlled Phase 3 clinical trial. Additionally, it is a prospective analysis in which the same sample was scored with all three indices at three timepoints over 1 year, and it includes an assessment of inter-rater reliability. Furthermore, these data provide clinicians and researchers with evidence to support the correct placement of histological healing within subsequent versions of treat-to-target guidelines and its use within registrational trials for new therapies.

Limitations of this study include limited generalizability: the patient cohort was ~80% biologic-naïve, so the findings may not reflect patients with moderate to severe UC who have been treated with at least one biologic or small-molecule advanced therapy. Furthermore, the impact of prior biologic treatment on the microscopic presentation of the mucosa is not yet well documented. In addition, correlations with surgery, hospitalization and colon cancer could not be tested, given the 1-year study duration and the limited number of patients. Finally, while the results support the conclusion that the scores produced by these histological indices are valid and reliable, especially for the RHI and GS, our analysis does not inform how frequently these measures should be used or how they could be used in addition to, or instead of, less invasive methods.

The results of this study demonstrated that the GS, RHI and NI are each capable of producing reliable, construct-valid scores that are sensitive to changes in disease activity over time in patients with moderately to severely active UC. These histological measures [GS, RHI or NI], if administered and scored in a standardized way, can provide important information about disease activity and severity to aid in treatment decisions necessary for the long-term management of UC. However, the measurement-focused analyses performed in this study suggest that the GS and RHI perform better than the NI.

Supplementary Material

jjad087_suppl_Supplementary_Material (additional data file; 62.8 KB, docx)

Acknowledgements

Medical writing services provided by Natalie Mitchell, of Fishawack Facilitate Ltd., part of Fishawack Health, and funded by AbbVie.

Contributor Information

Laurent Peyrin-Biroulet, University Hospital of Nancy-Brabois, Vandoeuvre-lès-Nancy, France.

Ethan Arenson, Adelphi Values, Boston, MA, USA.

David T Rubin, University of Chicago Medicine Inflammatory Bowel Disease Center, Chicago, IL, USA.

Corey A Siegel, Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA.

Scott Lee, University of Washington Medical Center, Seattle, WA, USA.

F Stephen Laroux, AbbVie Bioresearch Center, Worcester, MA, USA.

Wen Zhou, AbbVie Inc., Chicago, IL, USA.

Tricia Finney-Hayward, AbbVie Ltd, Maidenhead, UK.

Yuri Sanchez Gonzalez, AbbVie Inc., Chicago, IL, USA.

Alan L Shields, Adelphi Values, Boston, MA, USA.

Funding

This work was supported by AbbVie, Inc. AbbVie participated in the study design, research, data collection, analysis and interpretation of data, writing, reviewing, and approving the publication. All authors had access to the data results, and participated in the development, review and approval of the paper. No honoraria or payments were made for authorship.

Conflict of Interest

L.P-B. has received personal fees from AbbVie, Allergan, Alma Bio Therapeutics, Amgen, Arena, Biogen, Boehringer Ingelheim, Celgene, Celltrion, Enterome, Ferring, Genentech, Gilead, Hikma, InDex Pharmaceuticals, Janssen, Merck, Nestlé, Pfizer, Pharmacosmos, Roche, Samsung Bioepis, Sandoz, Sterna Biological, Takeda and Tillotts Pharma; and grants from AbbVie, Merck and Takeda. He also holds Clementia Pharmaceuticals stock options. D.T.R. has received grant support from Takeda; and has served as a consultant for AbbVie, Altrubio, Aslan Pharmaceuticals, Athos Therapeutics, Bellatrix Pharmaceuticals, Boehringer Ingelheim, Ltd., Bristol-Myers Squibb, Celgene Chronicles, Corp/Syneos, ClostraBio, Connect BioPharma, Eco R1, Genentech/Roche, Gilead Sciences, Iterative Health, Janssen Pharmaceuticals, Kaleido Biosciences, Lilly, Pfizer, Prometheus Biosciences, Reistone, Seres Therapeutics, Takeda, Target RWE and Trellus Health. C.A.S. reports consulting/advisory board for AbbVie, Amgen, Celgene, Eli Lilly and Company, Janssen, Sandoz, Pfizer, Prometheus, Sebela and Takeda; speaker for AbbVie, Janssen, Pfizer and Takeda; grant support from Crohn’s and Colitis Foundation, AHRQ [1R01HS021747-01], AbbVie, Janssen, Pfizer and Takeda; intellectual property from MiTest Health, LLC and ColonaryConcepts, LLC; and equity interest from MiTest Health and ColonaryConcepts. S.L. has received grant/research support from AbbVie, UCB Pharma, Janssen, Salix, Takeda, Arena and AbGenomics, and has served as a consultant for UCB Cornerstone Health, Janssen, Eli Lilly, Celgene, KCRN Research, Boehringer Ingelheim, Bristol Myers Squibb, Applied Molecular Transport, Arena, Celltrion, Samsung Bioepis and Bridge Biotherapeutics. E.A. and A.L.S. are full-time employees of Adelphi Values, which conducted research on behalf of AbbVie. F.S.L., W.Z., Y.S.G. and T.F-H. are full-time employees of AbbVie and may own AbbVie stock or options.

Author Contributions

Concept and design of study: Y.S.G., A.L.S. Acquisition of data: F.S.L., Y.S.G. Analysis and interpretation of data: E.A., F.S.L., Y.S.G., A.L.S. Critical revision of article for important intellectual content: all authors. Approval of the final submitted version: all authors.

Data Availability

The data underlying this article are available in the article and in its online Supplementary Material.

REFERENCES

1. Danese S, Roda G, Peyrin-Biroulet L. Evolving therapeutic goals in ulcerative colitis: towards disease clearance. Nat Rev Gastroenterol Hepatol 2020;17:1–2.
2. Ungaro R, Colombel J-F, Lissoos T, Peyrin-Biroulet L. A treat-to-target update in ulcerative colitis: a systematic review. Am J Gastroenterol 2019;114:874–83.
3. Turner D, Ricciuto A, Lewis A, et al. An update on the Selecting Therapeutic Targets in Inflammatory Bowel Disease (STRIDE) initiative of the International Organization for the Study of IBD (IOIBD): determining therapeutic goals for treat-to-target strategies in IBD. Gastroenterology 2021;160:1570–83.
4. Peyrin-Biroulet L, Sandborn W, Sands B, et al. Selecting therapeutic targets in inflammatory bowel disease (STRIDE): determining therapeutic goals for treat-to-target. Am J Gastroenterol 2015;110:1324–38.
5. Shah SC, Colombel J-F, Sands BE, Narula N. Mucosal healing is associated with improved long-term outcomes of patients with ulcerative colitis: a systematic review and meta-analysis. Clin Gastroenterol Hepatol 2016;14:1245–55.e8.
6. Pai RK, Jairath V, Casteele NV, Rieder F, Parker CE, Lauwers GY. The emerging role of histologic disease activity assessment in ulcerative colitis. Gastrointest Endosc 2018;88:887–98.
7. Ma C, Panaccione R, Fedorak RN, et al. Heterogeneity in definitions of endpoints for clinical trials of ulcerative colitis: a systematic review for development of a core outcome set. Clin Gastroenterol Hepatol 2018;16:637–47.e13.
8. Peyrin-Biroulet L. Mucosal healing in Crohn’s disease and ulcerative colitis. Gastroenterol Hepatol 2020;16:206–8.
9. U.S. Food and Drug Administration. Ulcerative colitis: clinical trial endpoints guidance for industry. Washington, D.C.: US Food and Drug Administration; 2016.
10. Powers JH III, Patrick DL, Walton MK, et al. Clinician-reported outcome assessments of treatment benefit: report of the ISPOR clinical outcome assessment emerging good practices task force. Value Health 2017;20:2–14.
11. Dave M, Loftus EV Jr. Mucosal healing in inflammatory bowel disease—a true paradigm of success? Gastroenterol Hepatol 2012;8:29–38.
12. D’Angelo F, Felley C, Frossard JL. Calprotectin in daily practice: where do we stand in 2017? Digestion 2017;95:293–301.
13. Arkteg CB, Wergeland Sørbye S, Buhl Riis L, Dalen SM, Florholmen J, Goll R. Real-life evaluation of histologic scores for ulcerative colitis in remission. PLoS One 2021;16:e0248224.
14. Magro F, Doherty G, Peyrin-Biroulet L, et al. ECCO position paper: harmonization of the approach to ulcerative colitis histopathology. J Crohns Colitis 2020;14:1503–11.
15. Geboes K, Riddell R, Öst A, Jensfelt B, Persson T, Löfberg R. A reproducible grading scale for histological assessment of inflammation in ulcerative colitis. Gut 2000;47:404–9.
16. Jauregui-Amezaga A, Geerits A, Das Y, et al. A simplified Geboes score for ulcerative colitis. J Crohns Colitis 2017;11:305–13.
17. Nunnally JC. Psychometric Theory. 3rd ed. New York: Tata McGraw-Hill Education; 1994.
18. Mosli MH, Feagan BG, Zou G, et al. Development and validation of a histological index for UC. Gut 2017;66:50–8.
19. Marchal-Bressenot A, Salleron J, Boulagnon-Rombi C, et al. Development and validation of the Nancy histological index for UC. Gut 2017;66:43–9.
20. Guyatt G, Mitchell A, Irvine EJ, et al. A new measure of health status for clinical trials in inflammatory bowel disease. Gastroenterology 1989;96:804–10.
21. Henriksen M, Jahnsen J, Lygren I, et al. C-reactive protein: a predictive factor and marker of inflammation in inflammatory bowel disease. Results from a prospective population-based study. Gut 2008;57:1518–23.
22. Schroeder KW, Tremaine WJ, Ilstrup DM. Coated oral 5-aminosalicylic acid therapy for mildly to moderately active ulcerative colitis. N Engl J Med 1987;317:1625–9.
23. Walsham NE, Sherwood RA. Fecal calprotectin in inflammatory bowel disease. Clin Exp Gastroenterol 2016;9:21–9.
24. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36): I. Conceptual framework and item selection. Med Care 1992;30:473–83.
25. Yarlas A, Maher SM, Bayliss MS, Lovley A, Cappelleri JC, DiBonaventura MD. Psychometric validation of the work productivity and activity impairment questionnaire in ulcerative colitis: results from a systematic literature review. J Patient-Rep Outcomes 2018;2:1–12.
26. Irvine EJ, Feagan B, Rochon J, et al. Quality of life: a valid and reliable measure of therapeutic efficacy in the treatment of inflammatory bowel disease. Gastroenterology 1994;106:287–96.
27. Naegeli AN, Hunter T, Dong Y, et al. Full, partial, and modified permutations of the Mayo score: characterizing clinical and patient-reported outcomes in ulcerative colitis patients. Crohns Colitis 360 2021;3:otab007.
28. Lewis JD, Chuai S, Nessel L, Lichtenstein GR, Aberra FN, Ellenberg JH. Use of the noninvasive components of the Mayo score to assess clinical response in ulcerative colitis. Inflamm Bowel Dis 2008;14:1660–6.
29. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951;16:297–334.
30. Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess 1994;6:284–90.
31. Cohen J. A power primer. Psychol Bull 1992;112:155–9.
32. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. New York: Taylor and Francis; 2013.
33. Hinkle DE, Wiersma W, Jurs SG. Applied Statistics for the Behavioral Sciences. Boston, MA: Houghton Mifflin College Division; 2003.
34. Magro F, Lopes J, Borralho P, et al. Comparison of different histological indexes in the assessment of UC activity and their accuracy regarding endoscopic outcomes and faecal calprotectin levels. Gut 2019;68:594–603.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

jjad087_suppl_Supplementary_Material

Click here for additional data file.^{(62.8KB, docx)}

Data Availability Statement

The data underlying this article are available in the article and in its online Supplementary Material.

[CIT0001] 1. Danese S, Roda G, Peyrin-Biroulet L.. Evolving therapeutic goals in ulcerative colitis: towards disease clearance. Nat Rev Gastroenterol Hepatol 2020;17:1–2. [DOI] [PubMed] [Google Scholar]

[CIT0002] 2. Ungaro R, Colombel J-F, Lissoos T, Peyrin-Biroulet L.. A treat-to-target update in ulcerative colitis: a systematic review. Am J Gastroenterol 2019;114:874–83. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0003] 3. Turner D, Ricciuto A, Lewis A, et al. an update on the Selecting Therapeutic Targets in Inflammatory Bowel Disease (STRIDE) initiative of the International Organization for the Study of IBD (IOIBD): determining therapeutic goals for treat-to-target strategies in IBD. Gastroenterology 2021;160:1570–83. [DOI] [PubMed] [Google Scholar]

[CIT0004] 4. Peyrin-Biroulet L, Sandborn W, Sands B, et al. Selecting therapeutic targets in inflammatory bowel disease (STRIDE): determining therapeutic goals for treat-to-target. Am J Gastroenterol 2015;110:1324–38. [DOI] [PubMed] [Google Scholar]

[CIT0005] 5. Shah SC, Colombel J-F, Sands BE, Narula N.. Mucosal healing is associated with improved long-term outcomes of patients with ulcerative colitis: a systematic review and meta-analysis. Clin Gastroenterol Hepatol 2016;14:1245–1255.e8. [DOI] [PubMed] [Google Scholar]

[CIT0006] 6. Pai RK, Jairath V, Casteele NV, Rieder F, Parker CE, Lauwers GY.. The emerging role of histologic disease activity assessment in ulcerative colitis. Gastrointest Endosc 2018;88:887–98. [DOI] [PubMed] [Google Scholar]

[CIT0007] 7. Ma C, Panaccione R, Fedorak RN, et al. Heterogeneity in definitions of endpoints for clinical trials of ulcerative colitis: a systematic review for development of a core outcome set. Clin Gastroenterol 2018;16:637–47. e13. [DOI] [PubMed] [Google Scholar]

[CIT0008] 8. Peyrin-Biroulet L. Mucosal healing in Crohn’s disease and ulcerative colitis. Gastroenterol Hepatol 2020;16(4):206–208. [PMC free article] [PubMed] [Google Scholar]

[CIT0009] 9. U.S. Food and Drug Administration. Ulcerative colitis: clinical trial endpoints guidance for industry. Washington, D.C.: US Food and Drug Administration; 2016. [Google Scholar]

[CIT0010] 10. Powers JH III, Patrick DL, Walton MK, et al. Clinician-reported outcome assessments of treatment benefit: report of the ISPOR clinical outcome assessment emerging good practices task force. Value Health 2017;20:2–14. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0011] 11. Dave M, Loftus EV Jr. Mucosal healing in inflammatory bowel disease—a true paradigm of success? Gastroenterol Hepatol 2012;8:29-38. [PMC free article] [PubMed] [Google Scholar]

[CIT0012] 12. D’Angelo F, Felley C, Frossard JL.. Calprotectin in daily practice: where do we stand in 2017? Digestion 2017;95:293–301. [DOI] [PubMed] [Google Scholar]

[CIT0013] 13. Arkteg CB, Wergeland Sørbye S, Buhl Riis L, Dalen SM, Florholmen J, Goll R.. Real-life evaluation of histologic scores for ulcerative colitis in remission. PLoS One 2021;16:e0248224. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0014] 14. Magro F, Doherty G, Peyrin-Biroulet L, et al. ECCO position paper: harmonization of the approach to ulcerative colitis histopathology. J Crohns Colitis 2020;14:1503–11. [DOI] [PubMed] [Google Scholar]

[CIT0015] 15. Geboes K, Riddell R, Öst A, Jensfelt B, Persson T, Löfberg R.. A reproducible grading scale for histological assessment of inflammation in ulcerative colitis. Gut 2000;47:404–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0016] 16. Jauregui-Amezaga A, Geerits A, Das Y, et al. A simplified Geboes score for ulcerative colitis. J Crohns Colitis 2017;11:305–13. [DOI] [PubMed] [Google Scholar]

[CIT0017] 17. Nunnally JC. Psychometric theory 3E. New York: Tata McGraw-Hill Education; 1994. [Google Scholar]

18. Mosli MH, Feagan BG, Zou G, et al. Development and validation of a histological index for UC. Gut 2017;66:50–8.

19. Marchal-Bressenot A, Salleron J, Boulagnon-Rombi C, et al. Development and validation of the Nancy histological index for UC. Gut 2017;66:43–9.

20. Guyatt G, Mitchell A, Irvine EJ, et al. A new measure of health status for clinical trials in inflammatory bowel disease. Gastroenterology 1989;96:804–10.

21. Henriksen M, Jahnsen J, Lygren I, et al. C-reactive protein: a predictive factor and marker of inflammation in inflammatory bowel disease. Results from a prospective population-based study. Gut 2008;57:1518–23.

22. Schroeder KW, Tremaine WJ, Ilstrup DM. Coated oral 5-aminosalicylic acid therapy for mildly to moderately active ulcerative colitis. N Engl J Med 1987;317:1625–9.

23. Walsham NE, Sherwood RA. Fecal calprotectin in inflammatory bowel disease. Clin Exp Gastroenterol 2016;9:21–9.

24. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36): I. Conceptual framework and item selection. Med Care 1992;30:473–83.

25. Yarlas A, Maher SM, Bayliss MS, Lovley A, Cappelleri JC, DiBonaventura MD. Psychometric validation of the work productivity and activity impairment questionnaire in ulcerative colitis: results from a systematic literature review. J Patient Rep Outcomes 2018;2:1–12.

26. Irvine EJ, Feagan B, Rochon J, et al. Quality of life: a valid and reliable measure of therapeutic efficacy in the treatment of inflammatory bowel disease. Gastroenterology 1994;106:287–96.

27. Naegeli AN, Hunter T, Dong Y, et al. Full, partial, and modified permutations of the Mayo score: characterizing clinical and patient-reported outcomes in ulcerative colitis patients. Crohns Colitis 360 2021;3:otab007.

28. Lewis JD, Chuai S, Nessel L, Lichtenstein GR, Aberra FN, Ellenberg JH. Use of the noninvasive components of the Mayo score to assess clinical response in ulcerative colitis. Inflamm Bowel Dis 2008;14:1660–6.

29. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951;16:297–334.

30. Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess 1994;6:284–90.

31. Cohen J. A power primer. Psychol Bull 1992;112:155–9.

32. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. New York: Taylor and Francis; 2013.

33. Hinkle DE, Wiersma W, Jurs SG. Applied Statistics for the Behavioral Sciences. Boston, MA: Houghton Mifflin College Division; 2003.

34. Magro F, Lopes J, Borralho P, et al. Comparison of different histological indexes in the assessment of UC activity and their accuracy regarding endoscopic outcomes and faecal calprotectin levels. Gut 2019;68:594–603.


