PLOS ONE. 2021 Mar 8;16(3):e0248224. doi: 10.1371/journal.pone.0248224

Real-life evaluation of histologic scores for Ulcerative Colitis in remission

Christian Børde Arkteg 1,*, Sveinung Wergeland Sørbye 2, Lene Buhl Riis 3,4, Stig Manfred Dalen 2, Jon Florholmen 1,5, Rasmus Goll 1,5
Editor: Valérie Pittet 6
PMCID: PMC7939352  PMID: 33684168

Abstract

Background

Histological evaluation of ulcerative colitis (UC) patients has been debated ever since the first description of the disease, and its role in follow-up has never been fully established. Recent evidence suggests that histology adds accuracy when evaluating whether a patient is in remission. Unfortunately, there are several different histological indices, and it is difficult to compare outcomes where different scores are applied. Histopathological evaluation is prone to subjective biases, despite the use of indices. In addition, these indices are developed by expert IBD pathologists but applied at large by general pathologists. Therefore, we evaluated the three most applied histological indices for UC on samples from patients in remission to compare test qualities and estimate their usefulness for identifying remission by both general and GI-specialized pathologists.

Method

Mucosal biopsies from 41 UC patients in clinical and endoscopic remission were collected as part of a larger study on UC. Three pathologists blinded to the patients’ clinical status evaluated them using the Geboes score (GS), Nancy Index (NI) and Robarts Histopathological Index (RHI). We calculated the agreement between the pathologists using the intraclass correlation coefficient (ICC) and visualized it with ICC plots and Bland-Altman plots. Associations between clinical factors and histological category were analysed by Fisher’s exact test.

Results

The ICC values for GS, RHI and NI were 0.85, 0.73 and 0.70, respectively. The limits of agreement were ±6.1, ±4.0 and ±1.4 for GS, RHI and NI, respectively. Mayo endoscopic subgrade and UC clinical score did not show an association with any histological score. Despite clinical and endoscopic remission, 7–35% of the patients displayed histological inflammation at a level classified as active disease, depending on the index and cut-off.

Conclusion

A substantial number of UC patients in clinical and endoscopic remission display inflammation at the histological level, but the ability to classify these patients accurately and consistently could be improved.

Introduction

Ulcerative colitis (UC) is a chronic disease of the colon with relapsing-remitting characteristics. The introduction of targeted antibodies, such as anti-TNF, directed against key pro-inflammatory mediators, has improved patient outcomes and lowered colectomy rates [1, 2]. However, the medication is expensive and has serious side effects, such as lowered immune competence against certain infections and cancers [3]. Therefore, finding optimal criteria for remission is important, not only for the patients’ health but also from a health-economic perspective.

There is no universally applied definition of the state of remission, but usually only clinical or endoscopy-based scores are applied. The current treatment goal is a partial Mayo score/SCCAI ≤1 and mucosal healing (MH), which is defined by a Mayo endoscopic score (MES) of ≤1 [4–6]. However, recommendations are moving towards a definition of MH that includes only a MES/Ulcerative Colitis Endoscopic Index of Severity (UCEIS) of 0 [7].

Histology adds a dimension to the evaluation of remission which can be beneficial. This was illustrated by a relapse prediction model that included both histologic and endoscopic activity. The model could predict relapse better than endoscopy alone [8]. Histology can detect subclinical inflammation despite endoscopically normal/near-normal mucosa, and this inflammation increases the risk of an unfavourable outcome, such as relapse or neoplasia [9–13]. The European Crohn’s and Colitis Organisation (ECCO) has recently published guidance on this topic [14]. However, the multiple scoring indices for histopathology in UC make it difficult to compare results between papers [15]. In addition, most of these indices lack thorough validation [16, 17]. Geboes Score (GS), Nancy Index (NI) and Robarts Histopathological Index (RHI) are among the few that are partly or fully validated, and they vary in complexity and in the features they evaluate [18–20]. The ECCO position paper recommends NI for clinical practice and observational studies. For histology to be of use in determining remission, certain criteria must be fulfilled: A. It must add information on the inflammatory state not otherwise obtained. B. It must reliably and accurately identify these signs. C. Its use must yield a benefit in patient outcome. This paper focuses on the first two criteria, but also explores the relationship between histology grades and clinical parameters.

The US Food and Drug Administration now recommends that histopathology should be included as an endpoint in new trials. Therefore, there is an urgent need to define the histopathological remission state so it can be applied in trials and in the clinic. To address this, we evaluated the properties of the three most validated histological indices in a population defined to be in remission according to the current recommendations.

Material and methods

Study population

This study is part of the Advanced Study of Inflammatory Bowel Disease (ASIB), a prospective study at the University Hospital of Northern Norway, Tromsø. All study participants gave written, informed consent. The study and the storage of biological material were approved by the Regional Committees for Medical and Health Research Ethics, division North (REK Nord ID: 2012/1349).

The selected participants were previously diagnosed with UC according to diagnostic recommendations [5]. An overview of baseline characteristics is presented in Table 1. Sample collection was performed at routine endoscopy for patients in remission from August 2013 to April 2016. The most frequent clinical indications were follow-up for cancer screening and de-escalation of treatment. Inclusion criteria were age between 18 and 80 years and clinical and endoscopic remission, defined as a Mayo clinical score/Ulcerative Colitis Clinical Score (UCCS) of 0 or 1 and a Mayo endoscopic score (MES) of 0 or 1. Patients with a total Mayo score above 1 or with rectal bleeding were not included. IBD medication was not an inclusion or exclusion factor.

Table 1. Baseline characteristics.

| Characteristic | UC remission |
| --- | --- |
| Number of patients | 41 |
| Gender (M/F) | 16/25 |
| Age (mean) | 43 |
| Biopsy location (Rectum/Sigmoid/Other) | 22/13/6 |
| Average endoscopic score (MES) | 0.24 |
| Average clinical score (UCCS) | 0.15 |
| Median Robarts Histopathological Index | 1 |
| Median Nancy Index | 0 |
| Median Geboes Score | 4 |
| Average disease duration | 8.8 years |

Histology

All biopsies were formalin fixed immediately after sampling and embedded in paraffin. Multiple 3-μm sections were cut with a Microm microtome (HM355S, ThermoFisher, Tudor Rd, Runcorn WA7 1TA, United Kingdom) and stained with haematoxylin and eosin. In cases of multiple biopsies from one patient, the highest-scoring biopsy was included in the analysis. Slides were examined by three pathologists (SWS, SMD and LBR) blinded to the endoscopic score and biopsy location. SWS and SMD are general pathologists who evaluate 200–300 GI samples yearly, of which about 20–30 are IBD related. LBR works mainly with GI samples and sees around 180–360 IBD samples yearly. The final score for a biopsy is the average of the three pathologists’ scores. Two of the pathologists are located at the University Hospital of Northern Norway, while the third is located at Herlev Hospital in Denmark. SWS and LBR evaluated the slides using white light microscopy, while SMD evaluated the slides digitally, after scanning them with a Pannoramic 250 Flash III (3DHISTECH Ltd. Budapest, Öv u. 3, 1141, Hungary) at 40x and viewing them in CaseViewer 2.3. To evaluate intra-rater variability and explore the difference between light microscopy and digital microscopy, SWS evaluated the slides a second time, digitally, after a 2-month interval. All pathologists were sent a scoring protocol to improve the coherence of rating.

The definition of remission across the three indices is not settled, and different studies have used different cut-offs. GS cut-offs range from 13 to 7 (see Table A in S1 Appendix for GS continuous vs original) and RHI cut-offs from <6 to ≤1 [21–26]. While the developers of NI suggest that ≤1 should be the cut-off, 0 is also applied in some papers [27]. As the cut-off values are debatable, it is of interest to explore the impact these definitions would have on a population in clinical/endoscopic remission. Therefore, we used two separate definitions of histological remission, one strict and one relaxed. To be in line with previous research and to exclude mucosal neutrophils and basal plasmacytosis, the strict cut-off was GS <7, RHI <4 with no points allowed for neutrophils in either the epithelium or the lamina propria, and NI = 0 [14, 27–30]. The relaxed cut-offs for remission for NI and RHI are the developers’ recommendations (NI <2, RHI <6). For GS, the relaxed cut-off is the widely applied GS <13 [30].
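The two remission definitions in this paragraph can be written out as a small lookup, which makes the effect of the cut-off choice on a single biopsy easy to see. The following is a minimal sketch in Python, not the authors' code: the names `CUTOFFS` and `classify` are hypothetical, NI = 0 is expressed as NI < 1, and applying the neutrophil rule only to the strict RHI definition is an assumed reading of the text.

```python
# Cut-offs as stated in the paragraph above; a score strictly below the
# cut-off counts as histological remission. NI = 0 is written as NI < 1.
CUTOFFS = {
    "strict": {"GS": 7, "RHI": 4, "NI": 1},
    "relaxed": {"GS": 13, "RHI": 6, "NI": 2},
}


def classify(index, score, definition, neutrophil_points=0):
    """Return 'Remission' or 'Active' for one biopsy score (hypothetical helper)."""
    in_remission = score < CUTOFFS[definition][index]
    # Assumed reading: under the strict definition, any RHI points for
    # neutrophils (epithelium or lamina propria) rule out remission.
    if definition == "strict" and index == "RHI" and neutrophil_points > 0:
        in_remission = False
    return "Remission" if in_remission else "Active"


# The same biopsy can change category with the definition used.
for definition in ("strict", "relaxed"):
    print(definition, classify("GS", 8, definition))
```

For example, a GS of 8 is classified as active under the strict definition but as remission under the relaxed one.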

Statistics

All statistics were performed with RStudio Version 1.2.5019. Inter- and intra-rater calculations were done with the “irr” and “KappaGUI” packages. Inter-rater agreement on ordinal/continuous variables was assessed with a two-way random, average-score intraclass correlation coefficient (ICC) for consistency (C,3). Intra-rater agreement was assessed with a two-way random, single-score ICC for absolute agreement (A,1). On categorical variables, Fleiss’ kappa was applied and evaluated according to Landis et al.: < 0: Poor agreement, 0.01–0.20: Slight agreement, 0.21–0.40: Fair agreement, 0.41–0.60: Moderate agreement, 0.61–0.80: Substantial agreement, 0.81–1.00: Almost perfect agreement [31]. Bland-Altman plots were calculated with mean squared error according to the method proposed by Jones et al. [32]. All scores were standardized by dividing them by their theoretical maximum and then square-root transformed because of skewness in the raw data. The standardization makes the limits of agreement (LOA) directly comparable. We investigated systematic rating differences between raters with the Kruskal-Wallis rank sum test. If the test was significant, we performed a sub-analysis to identify which raters differed. The sub-analysis was performed as pairwise comparisons using the Wilcoxon rank sum test with p-values adjusted for multiple comparisons (Benjamini and Hochberg). Relationships between two dichotomous variables were assessed with the chi-square test or Fisher’s exact test, depending on group sizes. These statistical tests were performed with the “rstatix” package for R.
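To illustrate the difference between the two ICC forms named above, the sketch below computes both from the mean squares of a two-way layout (biopsies as rows, raters as columns). It is a plain-Python illustration under stated assumptions, not the `irr` implementation used in the study: the function names are hypothetical, the formulas are the standard ones for ICC(C,k) and ICC(A,1), and the ratings in the example are invented.

```python
def mean_squares(ratings):
    """Two-way ANOVA mean squares; ratings[i][j] = score of biopsy i by rater j."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-biopsy mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return n, k, msr, msc, mse


def icc_consistency_average(ratings):
    """ICC(C,k): consistency of the average of k raters."""
    _, _, msr, _, mse = mean_squares(ratings)
    return (msr - mse) / msr


def icc_agreement_single(ratings):
    """ICC(A,1): absolute agreement of a single rating."""
    n, k, msr, msc, mse = mean_squares(ratings)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Invented example: the second rater scores every biopsy one point higher.
example = [[1, 2], [2, 3], [3, 4], [4, 5]]
print(round(icc_consistency_average(example), 2))
print(round(icc_agreement_single(example), 2))
```

In the invented example the consistency ICC is 1.0 while the absolute-agreement ICC is about 0.77, because only the latter penalizes a systematic offset between raters.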

Results

In total, 41 biopsies from 41 UC patients in clinical and endoscopic remission were evaluated by two general pathologists and one GI-specialized pathologist using all three scoring indices. Only five biopsies were evaluated as 0 by all three pathologists across all three indices. Median scores for the indices were 7, 4 and 1 for GS, RHI and NI, respectively. Between 7 and 15% of all the samples still exhibited histological activity to such a degree that they would be classified as active disease with a relaxed histological remission definition (GS <13, RHI <5, NI <2). With a stricter remission definition (GS <7, RHI <4, NI <1), the share of active disease increases to 22–32% of all samples. There was a systematic difference between the three pathologists: LBR rated higher on average than SWS and SMD with GS, but with a similar standard deviation (S1 Table). This difference was significant in a Kruskal-Wallis rank sum test for both GS and RHI (S1 Fig and S2 Table).

Agreement between raters

The inter-rater ICC values for the individual features vary from poor (<0.50) to excellent (>0.90) according to the classification suggested by Koo et al. (Table 2) [33]. Agreement for features describing severe inflammation is likely over- or under-estimated because few samples displayed those features. Only GS achieves good agreement, while RHI and NI achieve moderate agreement. The intra-rater evaluation displayed better results, as the final score for the three indices ranged from good to excellent (0.78–0.92, Table 2). Fig 1 is an ICC plot illustrating agreement between raters for each slide on the final score for each index. Modified Bland-Altman (BA) plots displayed limits of agreement of ±0.53, ±0.59 and ±0.35 for GS, NI and RHI, respectively (Fig 2). Transformed back to the original scales, these correspond to ±6.1, ±1.4 and ±4.0. There is a tendency towards higher agreement at the extremes of the scores, although the difference is not large.
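The back-transformation can be checked by hand if one assumes that it simply reverses the standardization described in the Statistics section (square the value, then multiply by the theoretical maximum). The sketch below does this in Python. The maxima of 22 for the continuous GS, 4 for NI and 33 for RHI are assumptions taken from the usual ranges of these indices, not values stated in this paper, and the function names are hypothetical.

```python
import math

# Assumed theoretical maxima (not stated in the paper).
THEORETICAL_MAX = {"GS": 22, "NI": 4, "RHI": 33}


def standardize(score, index):
    """Divide by the theoretical maximum, then take the square root."""
    return math.sqrt(score / THEORETICAL_MAX[index])


def back_transform(value, index):
    """Reverse the standardization: square, then rescale to the raw range."""
    return value ** 2 * THEORETICAL_MAX[index]


# Limits of agreement on the standardized scale, as reported in the text.
reported_loa = {"GS": 0.53, "NI": 0.59, "RHI": 0.35}
for index, loa in reported_loa.items():
    print(index, round(back_transform(loa, index), 2))
```

Under these assumptions the results are about 6.18, 1.39 and 4.04, close to the reported ±6.1, ±1.4 and ±4.0. Because the square-root transform is non-linear, a back-transformed limit is best read as an approximate scale rather than a fixed width at every score level.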

Table 2. ICC values for histological feature agreement.

| Feature | ICC Inter | ICC Intra | N. patient* |
| --- | --- | --- | --- |
| Geboes Score | | | |
| Grade 0 Structural architectural changes | 0.65 (0.42–0.80) | 0.95 (0.91–0.97) | 26 |
| Grade 1 Chronic inflammatory infiltrate | 0.83 (0.71–0.90) | 0.78 (0.62–0.88) | 26 |
| Grade 2A Eosinophils | 0.65 (0.42–0.80) | -0.04 (-0.33–0.26) | 15 |
| Grade 2B Lamina propria neutrophils | 0.77 (0.61–0.87) | 0.89 (0.80–0.94) | 4 |
| Grade 3 Neutrophils in epithelium | 0.89 (0.81–0.94) | 1.00 | 4 |
| Grade 4 Crypt destruction | -0.03 (-0.73–0.42) | 0.00 (-0.30–0.30) | 3 |
| Grade 5 Erosion/ulcus | 0.10 (-0.51–0.49) | NA | 3 |
| Final Grade | 0.85 (0.75–0.91) | 0.96 (0.93–0.98) | 35 |
| Robarts Histopathological Index | | | |
| Chronic inflammatory infiltrate | 0.83 (0.71–0.90) | 0.77 (0.62–0.87) | 26 |
| Lamina propria neutrophils | 0.77 (0.61–0.87) | 0.79 (0.64–0.88) | 4 |
| Neutrophils in epithelium | 0.89 (0.81–0.94) | 1.00 | 4 |
| Erosion/Ulceration | 0.09 (-0.53–0.48) | NA | 3 |
| Final Grade | 0.73 (0.54–0.85) | 0.96 (0.93–0.98) | 26 |
| Nancy Index | | | |
| Chronic inflammatory cells | 0.42 (0.02–0.67) | 0.38 (0.10–0.61) | 5 |
| Acute inflammatory cells | 0.79 (0.64–0.88) | 0.86 (0.75–0.92) | 10 |
| Ulceration | -0.04 (-0.74–0.41) | NA | 2 |
| Final Grade | 0.70 (0.50–0.83) | 0.86 (0.75–0.92) | 13 |

*Number of patients with a score >0.

Fig 1. ICC dot plot.

The plot illustrates how the Final grade for each slide is scored by the three pathologists. The size of each circle indicates how many raters gave the same score.

Fig 2. Modified Bland-Altman plot.

The plot indicates the limits of agreement (LOA) for the Final grade of each index. It shows less dispersion at high and low average values, indicating higher agreement there. The absolute scores were standardized and then square-root transformed, which makes the LOA directly comparable.

Remission aid and clinical application

Next, we evaluated the inter-rater properties with two different cut-offs for remission, relaxed (GS <13, RHI <5, NI <2) and strict (GS <7, RHI <3 and NI <1). The strict definition resulted in a doubling of the number of patients classified as having active disease with GS and NI, but no change with RHI (S2 Fig). Both cut-offs showed similar kappa values, ranging from fair to moderate agreement (Table 3). NI and RHI performed slightly better than GS with the strict cut-off.

Table 3. Inter-rater agreement evaluated with Fleiss’ kappa.

| Index | Strict | Relaxed |
| --- | --- | --- |
| GS | 0.30 | 0.57 |
| NI | 0.44 | 0.44 |
| RHI | 0.48 | 0.47 |

GS, Geboes Score; NI, Nancy Index; RHI, Robarts Histopathological Index.
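For readers who want to see how a value such as those in Table 3 arises, the following sketch computes Fleiss' kappa for several raters and maps it to the Landis and Koch categories quoted in the Statistics section. It is an illustration only: the function names are hypothetical, the counts are invented and are not study data, and the study itself used R packages rather than this code.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who put biopsy i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Share of all ratings that fall in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per biopsy, then averaged over biopsies.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


def landis_koch(kappa):
    """Label a kappa value using the bands quoted in the Statistics section."""
    if kappa < 0:
        return "Poor"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost perfect"


# Invented counts for six biopsies, three raters, columns = (Remission, Active).
example = [[3, 0], [3, 0], [2, 1], [0, 3], [1, 2], [3, 0]]
kappa = fleiss_kappa(example)
print(round(kappa, 2), landis_koch(kappa))
```

With these invented counts kappa is 0.5, which falls in the moderate band; by the same bands the GS values in Table 3 read as fair (0.30) and moderate (0.57).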

Thereafter, we investigated whether the histological category (Active or Remission) differed between patients with high (MES = 1 and UCCS = 1) and low (MES = 0 and UCCS = 0) endoscopic and clinical grades. Neither clinical grade nor endoscopic grade showed a significant association with the histological category, regardless of strict or relaxed cut-off (S3 Table).
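The test behind this comparison can be sketched for a single 2×2 table (for instance MES 0 versus MES 1 against Remission versus Active). The Python function below computes a two-sided Fisher's exact p-value from the hypergeometric distribution by summing all tables that are no more probable than the observed one. The function name and the counts are hypothetical; the actual counts are in S3 Table and are not reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table at most as probable as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts, not study data:
# rows = low / high grade, columns = Remission / Active.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```

A large p-value, as with these hypothetical counts, gives no evidence of dependence between grade and histological category, which is the pattern the study reports.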

To control for potential confounding factors, we investigated differences in histology scores by biopsy location and IBD medication. The distribution of biopsy locations was rectum (n = 22), sigmoid (n = 13), and other (n = 6). The Kruskal-Wallis test showed no significant difference between the locations (S3 Fig), and the Wilcoxon rank sum test showed no effect of medication on the histological scores (S4 Table).

Discussion

This observational study evaluates the performance of the three most validated histological scores for UC in a remission setting. The main findings are poor to excellent inter-rater agreement across the features of the three histological scores, and fair to moderate inter-rater agreement for determining remission. The patients were defined as being in remission according to the current guidelines (i.e. clinical and endoscopic remission). Nevertheless, a substantial number showed histologic inflammatory activity, indicating that histology can unveil inflammatory features in a population of patients pre-selected as being in remission on clinical and endoscopic findings. This is in line with previous publications [34, 35].

Compared to previous research, our results show lower concordance between raters. Jairath et al. reported inter-rater ICCs of 0.88, 0.86 and 0.80 for GS, RHI and NI, respectively [36], and Marchal-Bressenot et al. achieved an ICC value of 0.86 when developing the NI [20]. Mosli et al. achieved 0.82 when developing RHI [19]. The GS method paper applied pairwise Cohen’s kappa and is not comparable with our results [18]. This difference could be the result of either different interpretations of the scores between our raters or observational errors. All raters were sent the same scoring protocol (S1 Appendix) to improve the coherence of rating. It could be argued that a scoring protocol is not sufficient to ensure coherent rating from general pathologists. We argue that this is the actual situation in most hospitals outside specialized tertiary centres. Thus, our results show the real-life utility of the scores. Because only patients in remission were included, only the lower range of the histologic scales is represented, and this may be viewed as a “stress test” of the scores for this specific patient group. Consequently, lower inter-rater agreement is expected.

The results show discrepancies in the severe inflammatory features due to the low number of samples with such features. This is not the case for the eosinophil feature of GS and the structural architectural changes feature. The number of eosinophils in a healthy colon varies greatly between subjects depending on age and location [37]. There is no recommendation for an acceptable cut-off for eosinophils per segment of colon in UC remission, and evaluating this feature coherently is therefore challenging. Normal variation is also one of the challenges when evaluating structural architectural changes. Because this grade covers several histological features, such as crypt branching and mucin depletion, an overall assessment leaves too many features to subjective interpretation. This is especially so because “grade 0.0 is indicated the absence of any abnormality.”, which is almost never the case. It could be interpreted as disease-specific abnormality, but the need for subjective interpretation of numerous features will challenge coherence between raters. GS is the only index to include such a feature.

Another subgrade that scored low in inter-rater agreement is Nancy Grade 1, chronic inflammatory cells. It is difficult to distinguish a moderate-to-severe amount of chronic inflammation from acute inflammatory features. There are seldom signs of severe chronic inflammation without concomitant presence of neutrophils, whose absence defines grade 1: “Grade 1 corresponds to the lack of mucosal neutrophils, a pivotal marker of disease activity, even though moderate or severe chronic inflammation can be present” [20]. This makes the grade a source of variation and in many cases redundant.

Our intra-rater evaluation was as good as or better than the inter-rater values, except for the eosinophil feature. Interestingly, there was a clearer difference between the raters than between modalities (white light microscopy or digitally scanned slides), as suggested by the intra-rater results and the Kruskal-Wallis test (S1 Fig). This indicates that these methods can be used interchangeably for NI and RHI, which do not evaluate eosinophils specifically. The high intra-rater score and the relatively low inter-rater score indicate that a central-rater approach to IBD pathology could be beneficial. Standardization of extraction, preparation and scanning is easier to achieve than extensive training of pathologists.

The modified Bland-Altman analysis identified the same pattern as the Kruskal-Wallis analysis: one pathologist rates higher than the two others, although the standard deviations are similar (S1 Table). The obvious explanation is the difference in IBD-related experience between LBR and the general pathologists. This indicates that experience gives a different understanding of the scores, which seems to affect accuracy but not necessarily precision, as the IQR/SD are similar between raters. The LOA gives the amount by which the raters can be discordant with the mean estimated score. The results show better agreement for RHI than for NI and GS. This could be a result of RHI being developed from GS by selecting the features with the best agreement. The absolute LOA values show that all the scores are rather insensitive to minor differences, which can explain the drop in agreement when dichotomizing the indices from continuous variables to “Active” and “Remission”. The agreement appears to be better at low and high average scores for all three indices. An explanation could be that it is easier to rate the extremes of the distribution than the middle. This is unfortunate, as the cut-off for remission is in the low-middle of the distribution.

In our data we defined two cut-off values for histologic remission in order to investigate whether there would be a difference between the two groups in relation to other clinical features of importance. There was no dependence between the clinical scores (MES and UCCS) and histologic category for any of the indices. This is important because a high degree of dependence between clinical scores and histology would render one of the factors redundant, as one factor could predict the other. By being independent, they can complement each other. Previous studies are conflicting in their reports of the relationship between clinical scores and histologic scores [9, 12, 18, 21, 38].

Strengths and limitations

Our study has several limitations, first and foremost the different modalities used to evaluate the slides and the blinding to biopsy location. The evaluation of eosinophils was challenged by two factors: the blinding of the pathologists to biopsy location and the different modalities of observation in the intra-rater analysis. The significance of eosinophils in UC can be debated, as the two more recent indices do not include them as a separate category, and a marked increase in eosinophils without other inflammatory cells suggests eosinophilic colitis rather than UC. Despite the poor intra-rater value for eosinophils, the other categories had good intra-rater ICC, which suggests that the error introduced by scanning the samples is small. Unfortunately, our biopsies were not orientated after collection, so the pathologists could not reliably evaluate basal plasmacytosis, defined as plasma cells between the base of the crypts and the muscularis mucosae [39]. Nevertheless, plasma cells fall under the category of chronic cell infiltrate, which is evaluated in all indices. One could argue that a scoring protocol is insufficient education to achieve accurate evaluation by a general pathologist.

The strength of our study is its approximation of a real-world setting, where patients are under different treatment regimens and in different clinical settings, which makes the findings representative of the IBD remission population.

Conclusion

Our study evaluated the reliability of histology scores in order to estimate their usefulness in clinical decisions. We found moderate to good agreement between raters when using three of the most common histological scoring indices, but with a LOA that could be improved. Unfortunately, when the scores are dichotomised into active and remission, the agreement falls to fair to moderate. Therefore, without more extensive training or the adoption of a central-rater approach, the current histological indices should be used with caution when deciding remission.

Supporting information

S1 Fig. Difference between raters.

Significant differences between raters were tested with the Wilcoxon rank sum test with Benjamini-Hochberg adjusted p-values.

(PDF)

S2 Fig. Bar plot.

Difference in patients classified as remission or active according to relaxed or strict definition of remission.

(PDF)

S3 Fig. Histologic score by biopsy location.

No difference was found between biopsy locations.

(PDF)

S1 Table. Descriptive statistics for raters.

(DOCX)

S2 Table. Kruskal-Wallis rank sum test on raters, by indices.

(DOCX)

S3 Table. Fisher exact test between the UCCS and MES scores and histologic category.

The table shows that histological category is independent of whether the sample was collected from a MES/UCCS 0 or 1 patient. This was true for both the Strict and the Relaxed category.

(DOCX)

S4 Table. Wilcoxon rank sum test on the effect of medication on histological scores.

(DOCX)

S1 Appendix. Scoring aid provided to the pathologists.

(DOCX)

S1 Data

(CSV)

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

This work was funded by Northern Norway Regional Health Authority ID: SFP1134-13, HNF1517-20 and HNF1468-19. The publication charges for this article have been funded by a grant from the publication fund of UiT The Arctic University of Norway. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Sandborn WJ, Rutgeerts P, Feagan BG, Reinisch W, Olson A, Johanns J, et al. Colectomy Rate Comparison After Treatment of Ulcerative Colitis With Placebo or Infliximab. Gastroenterology. 2009;137(4):1250–60. doi: 10.1053/j.gastro.2009.06.061
2. Viscido A, Papi C, Latella G, Frieri G. Has infliximab influenced the course and prognosis of acute severe ulcerative colitis? Biologics. 2019;13:23–31. doi: 10.2147/BTT.S179006
3. Siegel CA, Hur C, Korzenik JR, Gazelle GS, Sands BE. Risks and benefits of infliximab for the treatment of Crohn’s disease. Clinical gastroenterology and hepatology: the official clinical practice journal of the American Gastroenterological Association. 2006;4(8):1017–24; quiz 976. doi: 10.1016/j.cgh.2006.05.020
4. Peyrin-Biroulet L, Sandborn W, Sands BE, Reinisch W, Bemelman W, Bryant RV, et al. Selecting Therapeutic Targets in Inflammatory Bowel Disease (STRIDE): Determining Therapeutic Goals for Treat-to-Target. The American journal of gastroenterology. 2015;110(9):1324–38. doi: 10.1038/ajg.2015.233
5. Maaser C, Sturm A, Vavricka SR, Kucharzik T, Fiorino G, Annese V, et al. ECCO-ESGAR Guideline for Diagnostic Assessment in IBD Part 1: Initial diagnosis, monitoring of known IBD, detection of complications. Journal of Crohn’s & colitis. 2019;13(2):144–64. doi: 10.1093/ecco-jcc/jjy113
6. Lamb CA, Kennedy NA, Raine T, Hendy PA, Smith PJ, Limdi JK, et al. British Society of Gastroenterology consensus guidelines on the management of inflammatory bowel disease in adults. Gut. 2019:gutjnl-2019-318484. doi: 10.1136/gutjnl-2019-318484
7. Ungaro R, Colombel J-F, Lissoos T, Peyrin-Biroulet L. A Treat-to-Target Update in Ulcerative Colitis: A Systematic Review. American Journal of Gastroenterology. 2019;114(6):874–83. doi: 10.14309/ajg.0000000000000183
8. Bryant RV, Burger DC, Delo J, Walsh AJ, Thomas S, von Herbay A, et al. Beyond endoscopic mucosal healing in UC: histological remission better predicts corticosteroid use and hospitalisation over 6 years of follow-up. Gut. 2016;65(3):408–14. doi: 10.1136/gutjnl-2015-309598
9. Kim DB, Lee KM, Lee JM, Chung YY, Sung HJ, Paik CN, et al. Correlation between Histological Activity and Endoscopic, Clinical, and Serologic Activities in Patients with Ulcerative Colitis. Gastroenterology research and practice. 2016;2016:5832051. doi: 10.1155/2016/5832051
10. Iacucci M, Fort Gasia M, Hassan C, Panaccione R, Kaplan GG, Ghosh S, et al. Complete mucosal healing defined by endoscopic Mayo subscore still demonstrates abnormalities by novel high definition colonoscopy and refined histological gradings. Endoscopy. 2015;47(8):726–34. doi: 10.1055/s-0034-1391863
11. Frieri G, Galletti B, Di Ruscio M, Tittoni R, Capannolo A, Serva D, et al. The prognostic value of histology in ulcerative colitis in clinical remission with mesalazine. Therapeutic Advances in Gastroenterology. 2017;10(10):749–59. doi: 10.1177/1756283X17722926
12. Lobatón T, Bessissow T, Ruiz-Cerulla A, De Hertogh G, Bisschops R, Guardiola J, et al. Prognostic value of histological activity in patients with ulcerative colitis in deep remission: A prospective multicenter study. United European Gastroenterology Journal. 2018;6(5):765–72. doi: 10.1177/2050640617752207
13. Gupta RB, Harpaz N, Itzkowitz S, Hossain S, Matula S, Kornbluth A, et al. Histologic inflammation is a risk factor for progression to colorectal neoplasia in ulcerative colitis: a cohort study. Gastroenterology. 2007;133(4):1099–105; quiz 340–1. doi: 10.1053/j.gastro.2007.08.001
14. Magro F, Doherty G, Peyrin-Biroulet L, Svrcek M, Borralho P, Walsh A, et al. ECCO Position Paper: Harmonisation of the approach to Ulcerative Colitis Histopathology. Journal of Crohn’s & colitis. 2020.
15. Boal Carvalho P, Cotter J. Mucosal Healing in Ulcerative Colitis: A Comprehensive Review. Drugs. 2017;77(2):159–73. doi: 10.1007/s40265-016-0676-y
16. Pai RK, Jairath V, Vande Casteele N, Rieder F, Parker CE, Lauwers GY. The emerging role of histologic disease activity assessment in ulcerative colitis. Gastrointestinal Endoscopy. 2018;88(6):887–98. doi: 10.1016/j.gie.2018.08.018
17. Mosli MH, Feagan BG, Sandborn WJ, D’Haens G, Behling C, Kaplan K, et al. Histologic evaluation of ulcerative colitis: a systematic review of disease activity indices. Inflammatory bowel diseases. 2014;20(3):564–75. doi: 10.1097/01.MIB.0000437986.00190.71
18. Geboes K, Riddell R, Ost A, Jensfelt B, Persson T, Lofberg R. A reproducible grading scale for histological assessment of inflammation in ulcerative colitis. Gut. 2000;47(3):404–9. doi: 10.1136/gut.47.3.404
19. Mosli MH, Feagan BG, Zou G, Sandborn WJ, D’Haens G, Khanna R, et al. Development and validation of a histological index for UC. Gut. 2017;66(1):50–8. doi: 10.1136/gutjnl-2015-310393
20. Marchal-Bressenot A, Salleron J, Boulagnon-Rombi C, Bastien C, Cahn V, Cadiot G, et al. Development and validation of the Nancy histological index for UC. Gut. 2017;66(1):43–9. doi: 10.1136/gutjnl-2015-310187
21. Jauregui-Amezaga A, Geerits A, Das Y, Lemmens B, Sagaert X, Bessissow T, et al. A Simplified Geboes Score for Ulcerative Colitis. Journal of Crohn’s and Colitis. 2017;11(3):305–13. doi: 10.1093/ecco-jcc/jjw154
22. Sandborn WJ, Feagan BG, Wolf DC, D’Haens G, Vermeire S, Hanauer SB, et al. Ozanimod Induction and Maintenance Treatment for Ulcerative Colitis. The New England journal of medicine. 2016;374(18):1754–62. doi: 10.1056/NEJMoa1513248
23. Zenlea T, Yee EU, Rosenberg L, Boyle M, Nanda KS, Wolf JL, et al. Histology Grade Is Independently Associated With Relapse Risk in Patients With Ulcerative Colitis in Clinical Remission: A Prospective Study. The American journal of gastroenterology. 2016;111(5):685–90. doi: 10.1038/ajg.2016.50
24. Azad S, Sood N, Sood A. Biological and histological parameters as predictors of relapse in ulcerative colitis: a prospective study. Saudi journal of gastroenterology: official journal of the Saudi Gastroenterology Association. 2011;17(3):194–8.
25. Magro F, Lopes J, Borralho P, Lopes S, Coelho R, Cotter J, et al. Comparison of different histological indexes in the assessment of UC activity and their accuracy regarding endoscopic outcomes and faecal calprotectin levels. Gut. 2019;68(4):594–603. doi: 10.1136/gutjnl-2017-315545
26. Pai RK, Khanna R, D’Haens GR, Sandborn WJ, Jeyarajah J, Feagan BG, et al. Definitions of response and remission for the Robarts Histopathology Index. Gut. 2018:gutjnl-2018-317547.
27. Magro F, Lopes J, Borralho P, Dias CC, Afonso J, Ministro P, et al. Comparison of the Nancy Index With Continuous Geboes Score: Histological Remission and Response in Ulcerative Colitis. Journal of Crohn’s and Colitis. 2020. doi: 10.1093/ecco-jcc/jjaa010
28. Riley SA, Mani V, Goodman MJ, Dutt S, Herd ME. Microscopic activity in ulcerative colitis: what does it mean? Gut. 1991;32(2):174–8. doi: 10.1136/gut.32.2.174
29. Bitton A, Peppercorn MA, Antonioli DA, Niles JL, Shah S, Bousvaros A, et al. Clinical, biological, and histologic parameters as predictors of relapse in ulcerative colitis. Gastroenterology. 2001;120(1):13–20. doi: 10.1053/gast.2001.20912
30. Bessissow T, Lemmens B, Ferrante M, Bisschops R, Van Steen K, Geboes K, et al. Prognostic value of serologic and histologic markers on clinical relapse in ulcerative colitis patients with mucosal healing. The American journal of gastroenterology. 2012;107(11):1684–92. doi: 10.1038/ajg.2012.301
31. Landis JR, Koch GG. The Measurement of Observer Agreement for Categorical Data. Biometrics. 1977;33(1):159–74.
  • 32.Jones M, Dobson A, O’Brian S. A graphical method for assessing agreement with the mean between multiple observers using continuous measures. International Journal of Epidemiology. 2011;40(5):1308–13. 10.1093/ije/dyr109 [DOI] [PubMed] [Google Scholar]
  • 33.Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016;15(2):155–63. 10.1016/j.jcm.2016.02.012 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Narang V, Kaur R, Garg B, Mahajan R, Midha V, Sood N, et al. Association of endoscopic and histological remission with clinical course in patients of ulcerative colitis. Intest Res. 2018;16(1):55–61. 10.5217/ir.2018.16.1.55 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Rosenberg L, Nanda KS, Zenlea T, Gifford A, Lawlor GO, Falchuk KR, et al. Histologic Markers of Inflammation in Patients With Ulcerative Colitis in Clinical Remission. Clinical Gastroenterology and Hepatology. 2013;11(8):991–6. 10.1016/j.cgh.2013.02.030 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Jairath V, Peyrin-Biroulet L, Zou G, Mosli M, Vande Casteele N, Pai RK, et al. Responsiveness of histological disease activity indices in ulcerative colitis: a post hoc analysis using data from the TOUCHSTONE randomised controlled trial. Gut. 2019;68(7):1162–8. 10.1136/gutjnl-2018-316702 [DOI] [PubMed] [Google Scholar]
  • 37.Yantiss RK. Eosinophils in the GI tract: How many is too many and what do they mean? Modern Pathology. 2015;28(1):S7–S21. 10.1038/modpathol.2014.132 [DOI] [PubMed] [Google Scholar]
  • 38.Lemmens B, Arijs I, Van Assche G, Sagaert X, Geboes K, Ferrante M, et al. Correlation between the endoscopic and histologic score in assessing the activity of ulcerative colitis. Inflammatory bowel diseases. 2013;19(6):1194–201. 10.1097/MIB.0b013e318280e75f [DOI] [PubMed] [Google Scholar]
  • 39.Villanacci V, Antonelli E, Reboldi G, Salemme M, Casella G, Bassotti G. Endoscopic biopsy samples of naive "colitides" patients: role of basal plasmacytosis. Journal of Crohn’s & colitis. 2014;8(11):1438–43. 10.1016/j.crohns.2014.05.003 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Valérie Pittet

14 Dec 2020

PONE-D-20-32264

Real life evaluation of histologic scores for Ulcerative Colitis in remission

PLOS ONE

Dear Dr. Arkteg,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 28 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Valérie Pittet, PhD

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Methods section, please provide additional information about the participant recruitment method and the demographic details of your participants. Please ensure you have provided sufficient details to replicate the analyses such as: the recruitment date range (month and year).

3. Please include the sequences of the primers used in your qPCR experiment in the supplementary data.

4. Thank you for including your ethics statement: "The study and storage of biological material was approved of by the Regional Ethical Committee (REK Nord ID:2012/1349)."

Please amend your current ethics statement to include the full name of the ethics committee/institutional review board(s) that approved your specific study.

Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”).

For additional information about PLOS ONE ethical requirements for human subjects research, please refer to http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research.

5. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors performed a timely study on the use of different histologic scoring systems to assess disease activity in patients with ulcerative colitis. They chose to study the three scoring systems that are most reported and recommended in literature. Biopsy samples from a total of 41 patients in clinical remission were scored independently by three general pathologists blinded to clinical and endoscopic findings. The authors show that the correlation between histologic disease activity and both clinical and endoscopic measures of disease activity was poor. This was also true for TNF levels measured by qPCR. Importantly, they show an inter-observer correlation (inter-class correlation) for the different scoring systems of 0.85, 0.73 and 0.70, respectively. This correlation was considerably lower when assessing histologic remission. The authors conclude that the relatively high inter-observer variation, compared to the available literature, may be due to differences in interpretation of the scoring systems between participating pathologists and/or the method used to calculate the variation.

The major strength of this study is that it simulates a possible future clinical practice where not only gastrointestinal pathologists with an interest in IBD will be tasked to assess histologic remission status in biopsies from patients in clinical and endoscopic remission.

The ECCO has recently published guidelines on the use of histologic scoring systems. For clinical practice, they recommend to use the Nancy Index as the other scoring systems are deemed too complex in this setting. Please integrate this in the discussion.

It is common in articles describing inter-observer variation to inform readers on the experience of the pathologists. Please provide information regarding the length of experience and some insight into the number of IBD-related biopsies assessed annually.

From a biological point of view, the definitions of histologic remission are very different when comparing scoring systems. For instance, in the strict cut-off for histologic remission GS <2A does not correspond to NI =0 but rather to NI ≤1. Although the explanation by the authors is intuitive, the differences do not allow a valid comparison at the moment. Please adjust. Regarding the RHI <3, it is better to use an RHI ≤3 with subscores of 0 for lamina propria neutrophils and neutrophils in the epithelium.

The GS and the continuous GS are used interchangeably, which is confusing. Please use one system consistently.

The assessment of eosinophils is somewhat problematic. First, the fact that the pathologists were blinded to biopsy location may limit the assessment validity. Moreover, eosinophils were scored differently on scans compared to slides. Please address this in more detail in the discussion.

The authors state that intra-observer variation was assessed by one pathologist scoring the same biopsies on glass slides and using scans. Therefore, this variation cannot be quantified, as some (or indeed all) variation may be due to the difference between glass and scan. It does not seem appropriate to report the intra-observer variation in this context.

I miss a more thorough discussion on why the inter-observer variation was high compared to published data. I feel that sending four articles as a "scoring protocol" may serve as a common knowledge base, but is insufficient to introduce a new histologic parameter or scoring system in clinical practice. Moreover, the trend towards subspecialisation may result in dedicated gastrointestinal pathologists to be present in the majority of pathology institutes. I feel these important points warrant more discussion.

The authors performed “gene quantification” for TNF, as mentioned in the Materials and Methods section. I assume they measured mRNA levels. The integration of this data in the manuscript is limited. For instance, how many patients received anti-TNF therapy? Does this influence TNF mRNA levels? I do not feel this data adds value at the moment, consider removing it.

Please provide standard deviations and interquartile ranges where appropriate in Table 1.

Reviewer #2: In this manuscript, the authors evaluate the NHI, Geboes, and RHI among patients with clinical and endoscopic remission, in attempt to determine agreement between raters for each of the indices. Furthermore, they attempt to determine associations between clinical factors and histologic scores. Overall, the paper is trying to ask an important question about whether any of these indices have reliable inter-rater agreement, and furthermore, if the index rating can be further associated with clinical variables. However this paper is problematic for several reasons.

(1) The paper, overall, requires substantial revision to improve its readability, organization, and clarity. The introduction is wordy, and in many places several sentences are employed when one might have been used. Furthermore, the problem this paper plans to address (comparative utility of the three indices) is not well-set up, and the background and references need to be significantly strengthened, with a well-reasoned argument for why this problem is of importance. The materials/methods section suffers from a similar problem with some of the methods poorly defined. The histologic section, in particular, needs to be re-written with clarification of why specific cut-off values were chosen, further details on how pathologists rated slides (i.e. how many days separated each reading of the same slide?).

(2) The sample size here is small (41 patients) and inter-rater agreement has been previously assessed for these three indices using larger cohorts. Consequently, there is a problem of novelty here. What does this paper reveal that has not already been shown? The introduction should set this up with greater clarity so that the results section can address why this paper is novel, especially given the smaller sample size compared to prior efforts.

(3) The authors make the argument that assessment of histology is important for the ascertainment of clinical outcomes, however they do not evaluate clinical outcomes here. Instead, they examine cross-sectional clinical variables (MES 1 vs MES 0) and histologic grade and find no differences. Persistent histologic activity in MES 1 vs MES 0 has been well-established in larger cohorts; the failure to find a difference here is likely related to the smaller sample size. Given that, it remains unclear why these results should be of value in this draft. Furthermore, normalization of TNF levels was utilized as a surrogate for a predictor for long term remission; histologic grades were compared with high or low levels of TNF. This is an unusual covariate to examine, especially given the negative results here. I would raise questions as to why this covariate was specifically included, and would necessitate the authors include reasons as to why this - versus other potentially more useful covariates like calprotectin - was utilized.

Reviewer #3: This work is based on a cohort of UC patients in endoscopic remission and followed in one tertiary care center, whose aim was to assess histological activity based on the Geboes score, the Robarts and the Nancy indexes. I feel that some points need to be clarified.

- The authors observed lower concordance compared to the literature. The most convincing explanation is that the inclusion criteria used in this study differ from the literature, since only patients in endoscopic remission were included. It has been shown that histological markers of acute inflammation such as ulceration have a higher inter-reader agreement than markers of mild inflammation such as eosinophils. Please include this point in the discussion section.

- Since only patients in endoscopic remission were included, the results cannot be generalized to the overall UC population and the conclusion seems overstated (page 19 l. 272 'Therefore, using the current histological indices for deciding remission should be done with caution.') Please modify.

- It is not clear whether 41 patients (abstract) or 41 biopsies (page 13. l. 143) from X patients were assessed. Please clarify.

- In case of multiple biopsies from the same patient, how was histological activity assessed? As the maximum histological activity from any biopsy?

- Please include in the abstract that patients in clinical and endoscopic remission were included.

- It would have been interesting to have data on how long it takes to assess each score. It might be important for pathologists, since the process could be time-consuming.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Aart Mookhoek

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Mar 8;16(3):e0248224. doi: 10.1371/journal.pone.0248224.r002

Author response to Decision Letter 0


28 Jan 2021

Reviewer 1

The ECCO has recently published guidelines on the use of histologic scoring systems. For clinical practice, they recommend to use the Nancy Index as the other scoring systems are deemed too complex in this setting. Please integrate this in the discussion.

- True, this will be incorporated, thank you for this suggestion.

It is common in articles describing inter-observer variation to inform readers on the experience of the pathologists. Please provide information regarding the length of experience and some insight into the number of IBD-related biopsies assessed annually.

- Thank you for pointing this out, it will be addressed in the method chapter.

From a biological point of view, the definitions of histologic remission are very different when comparing scoring systems. For instance, in the strict cut-off for histologic remission GS <2A does not correspond to NI =0 but rather to NI ≤1. Although the explanation by the authors is intuitive, the differences do not allow a valid comparison at the moment. Please adjust. Regarding the RHI <3, it is better to use an RHI ≤3 with subscores of 0 for lamina propria neutrophils and neutrophils in the epithelium.

- We agree that our strict cut-off for RHI was wrong, and we have adjusted it as the reviewer suggested, which is also in line with the suggestion of Pai et al. (Pai, 2018). The result is a slight improvement in the Fleiss' kappa value for RHI.

- Regarding the comparison between Geboes and Nancy, we agree that it is problematic; nevertheless, we believe it has enough merit to be retained. The main indicator of active inflammation is the presence of neutrophils. An absence of neutrophils will correspond to Geboes <2A.0 and Nancy = 0, because a moderate to marked increase in chronic infiltrate is almost always accompanied by an increase in neutrophils. In addition, ECCO’s position paper makes the same comparison (Magro, 2020). Having the same cut-off makes our results more comparable to this paper and to some of Magro’s earlier papers (Journal of Crohn's and Colitis, 2020, 1–5 and Journal of Crohn's and Colitis, 2020, 169–175).
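
The adjusted strict RHI cut-off and the Fleiss' kappa referred to above can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the scores are hypothetical, and the names `is_strict_rhi_remission` and `fleiss_kappa` are introduced here for illustration only. Each rater's reading is assumed to be available as a total RHI plus the two neutrophil subscores.

```python
from collections import Counter


def is_strict_rhi_remission(rhi_total, lp_neutrophils, epi_neutrophils):
    """Strict cut-off described in the response: RHI <= 3 with subscores of 0
    for lamina propria neutrophils and for neutrophils in the epithelium."""
    return rhi_total <= 3 and lp_neutrophils == 0 and epi_neutrophils == 0


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated by the same number
    of raters. `ratings[i]` holds the category labels given to subject i."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(row) for row in ratings]
    categories = sorted({label for row in ratings for label in row})

    # Share of all assignments that fell into each category.
    p_cat = [sum(c[cat] for c in counts) / (n_subjects * n_raters)
             for cat in categories]
    # Observed agreement per subject, then averaged over subjects.
    p_subj = [(sum(c[cat] ** 2 for cat in categories) - n_raters)
              / (n_raters * (n_raters - 1)) for c in counts]
    p_obs = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)

    if p_exp == 1.0:
        # Every rating is in one category: kappa is undefined.
        return float("nan")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical readings: one row per biopsy, one tuple per pathologist,
# given as (RHI total, lamina propria neutrophils, epithelial neutrophils).
hypothetical_scores = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(3, 0, 0), (3, 1, 0), (2, 0, 0)],
    [(6, 1, 1), (8, 2, 1), (5, 1, 1)],
    [(1, 0, 0), (4, 1, 0), (5, 0, 1)],
    [(2, 0, 0), (2, 0, 0), (3, 0, 0)],
]

remission_calls = [[is_strict_rhi_remission(*reading) for reading in biopsy]
                   for biopsy in hypothetical_scores]
print("Fleiss' kappa for strict RHI remission:", fleiss_kappa(remission_calls))
```

Dichotomising each reading before computing kappa mirrors the idea that agreement on the remission category, rather than on the raw score, is what is being assessed.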

The GS and the continuous GS are used interchangeably, which is confusing. Please use one system consistently.

- Thank you for pointing this out; it is addressed in the final revised manuscript. All mentions of the Geboes score now refer to the continuous Geboes score, with the exception of the ICC values. It is easier to understand the value of the ICC analysis if the indices are divided into their subcategories and presented with their respective ICC values. In this way, one can easily see which subcategory has substantial variation. To facilitate understanding when both are used, we have provided a table in S2 appendix where the two indices are listed against each other.
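
As an illustration of how an ICC value for one index or subcategory might be obtained, the sketch below computes a two-way random-effects, absolute-agreement, single-rater ICC from a biopsies-by-pathologists table. The exact ICC form used in the manuscript is not stated in this correspondence, so that choice is an assumption; the scores are hypothetical and the function name `icc_2_1` is introduced here for illustration only.

```python
def icc_2_1(scores):
    """Two-way random-effects, absolute-agreement, single-rater ICC.
    `scores[i][j]` is the score given to subject i by rater j."""
    n = len(scores)          # subjects (biopsies)
    k = len(scores[0])       # raters (pathologists)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between-subject mean square
    ms_cols = ss_cols / (k - 1)              # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical scores for one subcategory: rows are biopsies,
# columns are the three pathologists.
hypothetical_subscores = [
    [0, 1, 0],
    [2, 2, 1],
    [1, 1, 1],
    [3, 2, 3],
    [0, 0, 1],
]
print("ICC for the hypothetical subcategory:", icc_2_1(hypothetical_subscores))
```

Running the same function once per subcategory would give one ICC per row of such a table, which is the kind of breakdown the response describes.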

The assessment of eosinophils is somewhat problematic. First, the fact that the pathologists were blinded to biopsy location may limit the assessment validity. Moreover, eosinophils were scored differently on scans compared to slides. Please address this in more detail in the discussion.

- Yes, this is a problem, and we have addressed it more thoroughly in our discussion.

The authors state that intra-observer variation was assessed by one pathologist scoring the same biopsies on glass slides and using scans. Therefore, this variation cannot be quantified, as some (or indeed all) variation may be due to the difference between glass and scan. It does not seem appropriate to report the intra-observer variation in this context.

- The reviewer makes a good point. With our approach, it is difficult to distinguish the source of the observed variation. Our argument is that when the variation is small even though two different modalities were used, it is unlikely that it would be greater if the same modality were used twice; it is therefore appropriate to present our results. Our intra-observer values are on par with those reported previously (Mosli, 2017).

I miss a more thorough discussion on why the inter-observer variation was high compared to published data. I feel that sending four articles as a "scoring protocol" may serve as a common knowledge base, but is insufficient to introduce a new histologic parameter or scoring system in clinical practice. Moreover, the trend towards subspecialisation may result in dedicated gastrointestinal pathologists to be present in the majority of pathology institutes. I feel these important points warrant more discussion.

- This is an interesting notion. The reason for this approach is to simulate the real world in hospitals outside the specialized central hospitals. There is, as the reviewer noted, a movement towards more subspecialisation, but this movement is slow, and years will pass before there are GI pathologists in the majority of hospitals in Scandinavia. Nevertheless, we see a clear difference between the general pathologist and the GI-specialized pathologists, which raises the interesting possibility of a centralized reader. The methods of extraction, preparation and scanning could more easily be standardized. And, as we showed, with the exception of eosinophils, scanned biopsies do not introduce much variation. Therefore, we will modify our manuscript to introduce the suggestion of a centralized reader rather than extensive training of general pathologists.

The authors performed “gene quantification” for TNF, as mentioned in the Materials and Methods section. I assume they measured mRNA levels. The integration of this data in the manuscript is limited. For instance, how many patients received anti-TNF therapy? Does this influence TNF mRNA levels? I do not feel this data adds value at the moment, consider removing it.

- We agree that the TNF portion of the manuscript is a bit off topic; therefore, we have removed any mention of it from the manuscript.

Reviewer 2

The paper, overall, requires substantial revision to improve its readability, organization, and clarity. The introduction is wordy, and in many places several sentences are employed when one might have been used. Furthermore, the problem this paper plans to address (comparative utility of the three indices) is not well-set up, and the background and references need to be significantly strengthened, with a well-reasoned argument for why this problem is of importance. The materials/methods section suffers from a similar problem with some of the methods poorly defined. The histologic section, in particular, needs to be re-written with clarification of why specific cut-off values were chosen, further details on how pathologists rated slides (i.e. how many days separated each reading of the same slide?).

- Thank you for addressing this, we will work on improving the readability of the paper. We have revised the introduction and the method chapter to make it clearer, more readable and to the point. Thank you for helping us improve the overall quality of our paper.

The sample size here is small (41 patients) and inter-rater agreement has been previously assessed for these three indices using larger cohorts. Consequently, there is a problem of novelty here. What does this paper reveal that has not already been shown? The introduction should set this up with greater clarity so that the results section can address why this paper is novel, especially given the smaller sample size compared to prior efforts.

- This is a valid argument. We believe that our focus on patients in remission, and the fact that the evaluation is performed by general as well as GI-specialized pathologists, makes our paper different from these large cohorts. They tend to focus on UC as a whole and not on patients in remission. In addition, their histology is evaluated only by expert or extensively IBD-trained pathologists.

The authors make the argument that assessment of histology is important for the ascertainment of clinical outcomes, however they do not evaluate clinical outcomes here. Instead, they examine cross-sectional clinical variables (MES 1 vs MES 0) and histologic grade and find no differences. Persistent histologic activity in MES 1 vs MES 0 has been well-established in larger cohorts; the failure to find a difference here is likely related to the smaller sample size. Given that, it remains unclear why these results should be of value in this draft. Furthermore, normalization of TNF levels was utilized as a surrogate for a predictor for long term remission; histologic grades were compared with high or low levels of TNF. This is an unusual covariate to examine, especially given the negative results here. I would raise questions as to why this covariate was specifically included, and would necessitate the authors include reasons as to why this - versus other potentially more useful covariates like calprotectin - was utilized.

- The reviewer makes a very good argument. The issues surrounding the TNF data were also noted by another reviewer; we have therefore decided to remove the data from the manuscript.

We mentioned in the introduction that histology needs to fulfil three criteria in order to be useful: it must be able to identify signs of inflammation, it must be reliable, and it must have an effect on outcome. We explicitly state that we focus on the first two criteria. While larger studies than ours exist, with different results, we believe they do not negate our results, as our focus is different. In addition, the association between histology score and MES is important for addressing our first point. If histology provides additional information, it would not show any difference between MES 1 and MES 0 in a Fisher's exact test, for the following reason: if two factors separate a population into exactly the same partitions (i.e. all MES 0 patients are in the histologic remission group), then one of the factors is redundant. When they are independent (i.e. a non-significant Fisher's exact test), they can complement each other. Therefore, there is value in investigating whether the MES 1 and MES 0 populations differ with regard to histology category. This point was a bit confusing in the original text. We have changed some of the statistical tests to make it clearer in the revised text.
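
The reasoning about MES and histology category can be illustrated with a small sketch of a two-sided Fisher's exact test on a 2×2 table. The counts below are hypothetical and are not the study's data, and the function name `fisher_exact_two_sided` is introduced here for illustration only. The p-value is taken as the summed probability of all tables, with the same margins, that are no more likely than the observed one, which is one common definition of the two-sided test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Integer comparison avoids floating-point ties between tables.
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / comb(n, col1)


# Hypothetical table. Rows: MES 0, MES 1.
# Columns: histologic remission, histologic activity.
p_overlap = fisher_exact_two_sided(20, 8, 9, 4)

# Hypothetical extreme case in which both factors split the patients
# identically, so that one of them carries no extra information.
p_identical = fisher_exact_two_sided(28, 0, 0, 13)

print("similar histology in MES 0 and MES 1:", p_overlap)
print("identical partitions:", p_identical)
```

In the first table the histology categories are spread similarly across MES 0 and MES 1, so the test gives no evidence of association; in the second, the two factors partition the patients identically and the p-value becomes very small.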

Reviewer 3

The authors observed lower concordance compared to the literature. The most convincing explanation is that the inclusion criteria used in this study differ from the literature, since only patients in endoscopic remission were included. It has been shown that histological markers of acute inflammation such as ulceration have a higher inter-reader agreement than markers of mild inflammation such as eosinophils. Please include this point in the discussion section.

- Thank you for your input, your notion will be incorporated in the final manuscript.

Since only patients in endoscopic remission were included, the results cannot be generalized to the overall UC population and the conclusion seems overstated (page 19 l. 272 'Therefore, using the current histological indices for deciding remission should be done with caution.') Please modify.

- This is true; we will adjust the manuscript accordingly. Thank you for making us aware of this problem.

It is not clear whether 41 patients (abstract) or 41 biopsies (page 13. l. 143) from X patients were assessed. Please clarify.

- This will be clarified in the final draft. Thank you for bringing this to our attention.

- In case of multiple biopsies from the same patient, how was histological activity assessed? As the maximum histological activity from any biopsy?

- This is explained under the subheading “Histology”, in lines 89-90.

- Please include in the abstract that patients in clinical and endoscopic remission were included.

- Thank you for helping us improve the paper; we will revise the abstract accordingly.

- It would have been interesting to have data on how long it takes to assess each score. It might be important for pathologists, since the process could be time-consuming.

- This is a very interesting point, but unfortunately we did not record the time used for evaluating each slide. All the pathologists agreed that they spent the most time on Geboes and the least time on Nancy, but we cannot confirm this with data.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Valérie Pittet

23 Feb 2021

Real life evaluation of histologic scores for Ulcerative Colitis in remission

PONE-D-20-32264R1

Dear Dr. Arkteg,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Valérie Pittet, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for addressing my comments concerning the manuscript. I feel that the changes made based on the comments from the different reviewers have improved the manuscript.

I continue to disagree with the argumentation regarding the use of a Nancy Index score of 0 as the definition of histologic remission. In my experience, a Nancy Index of 1 is not so rare. However, I agree with the authors that this definition has gained some traction in recent literature.

Reviewer #3: Thank you for providing the extensive revisions requested by the editor and both reviewers. I feel that my queries have been adequately addressed.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Aart Mookhoek

Reviewer #3: No

Acceptance letter

Valérie Pittet

26 Feb 2021

PONE-D-20-32264R1

Real-life evaluation of histologic scores for Ulcerative Colitis in remission

Dear Dr. Arkteg:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

PD Dr. Valérie Pittet

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Difference between raters.

    Significant differences between raters were tested with the Wilcoxon rank sum test with Benjamini-Hochberg adjusted p-values.

    (PDF)

    S2 Fig. Bar plot.

    Difference in patients classified as remission or active according to relaxed or strict definition of remission.

    (PDF)

    S3 Fig. Histologic score by biopsy location.

    No difference was found between biopsy locations.

    (PDF)

    S1 Table. Descriptives for raters.

    (DOCX)

    S2 Table. Kruskal-Wallis rank sum test on raters, by indices.

    (DOCX)

    S3 Table. Fisher exact test between the UCCS and MES scores and histologic category.

    The table shows that histological category is independent of whether the sample was collected from a MES/UCCS 0 or 1 patient. This was true for both the Strict and the Relaxed category.

    (DOCX)

    S4 Table. Wilcoxon rank sum test on the effect of medication on histological scores.

    (DOCX)

    S1 Appendix. Scoring aid provided to the pathologists.

    (DOCX)

    S1 Data

    (CSV)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.


    Articles from PLoS ONE are provided here courtesy of PLOS
