Abstract
Background:
Hidradenitis suppurativa (HS) is a chronic, inflammatory skin disease with a large impact on patients’ health-related quality of life. However, reliable and consistent outcome measures to assess the body surface area (BSA) affected by HS have not been established.
Objective:
To develop and assess the reliability and validity of a novel outcome instrument for assessment of HS surface area in a clinical trial setting.
Design:
Qualitative interviews and focus groups were conducted July to August 2015 and October 2017 to January 2018. The measure was evaluated during a single-day grading session with HS patients in April 2018. Participants, who included clinicians and patients, were recruited from academic medical centers in the mid-Atlantic.
Results:
Concept elicitation included input from 10 providers: 60% (n=6) female, 80% (n=8) dermatology specialists, and 20% (n=2) gynecology specialists. Cognitive debriefing was conducted with 11 providers, of whom 82% (n=9) were dermatologists and 18% (n=2) were gynecologists. The evaluation stage included 10 clinicians and 23 patients. The intraclass correlation coefficient (ICC) for inter-rater reliability was 0.60 (95% confidence interval [95CI] 0.44–0.74). The ICC for intra-rater reliability was 0.98 (95CI 0.94–1.00). Transforming the BSA score to an ordinal scale increased inter-rater reliability to 0.75 (95CI 0.62–0.85) or 0.76 (95CI 0.62–0.85), depending on the scoring approach. All scores demonstrated concurrent validity, with statistically significant correlations with extant scoring methods.
Conclusions:
The novel scale is a reliable and valid HS outcome instrument and may capture a wide range of patients by assessing body surface area. Future research is necessary to demonstrate its responsiveness.
Keywords: hidradenitis suppurativa, acne inversa, measurement, clinical trial, outcome instrument, disease severity, assessment
Introduction
Hidradenitis suppurativa (HS) is a chronic, inflammatory skin disease of unknown etiology that causes swollen nodules as well as draining abscesses, sinus tracts, and fistulas, primarily in the skin folds. The painful lesions make walking, sitting, and working difficult or impossible, and open areas result in drainage that can be embarrassing and uncomfortable.(1, 2) HS has a large negative impact on health-related quality of life (HRQOL).(3–5) Because of this impact, a growing body of research focuses on developing therapies and studying their efficacy. It is critical to characterize the effects of treatments on HS. However, although there have been numerous attempts over the last three decades to develop instruments to measure HS, some tools have never been validated(6, 7) or were developed within clinical trials without prior rigorous testing, and research suggests this may bias study results.(8)
Recently, an international multi-stakeholder group reached consensus on a core outcome set of domains for HS clinical trials and made recommendations for future assessment of physical signs of HS.(9–11) The recommendations specifically suggested that assessment of physical signs of HS include assessment of ulceration, edema, erythema, anatomic location, and surface area.(9) In contrast, the vast majority of existing HS assessment instruments rely on specific terminology for lesion morphology, discriminating among lesion morphologies, and counting the number of each type of lesion, all of which can be problematic.(12–19) First, the terminology used to classify lesions has been vague and agreement among experts on interpretation has been poor.(20) Second, physicians’ ability to clinically classify lesion type is poor, as judged by sonographic evaluation.(21) Third, counting lesions of any type can be difficult, even for clinical lesions that are relatively well defined, such as acne(22) and actinic keratosis(23). Supporting this, recent studies have shown that Hurley staging and other common measures used in clinical trials have generally moderate intra-rater and inter-rater reliability.(24, 25) The current study therefore describes the development and validation of the Severity and Area Score for Hidradenitis (SASH), a clinician-reported outcome instrument designed to measure the clinical signs of HS that are expected to respond to therapy.
Methods
Study design
A multi-site, mixed-methods design was used and included four phases: (1) concept elicitation interviews and focus groups, (2) measure development, (3) cognitive debriefing, and (4) measurement evaluation. The methods were aligned with guidance from the US Food and Drug Administration for development of clinician-reported outcome measures.(26) IRB approval was obtained from each participating institution. All participants gave informed consent prior to participation in the study.
Study Sample
Participants were recruited from the PaTH network, an integrated clinical data research network with individual patient-level health data from the University of Pittsburgh Medical Center, Penn State Hershey Medical Center, Johns Hopkins Medical Center, and Geisinger Health System. Patients with HS were identified based on a search of the PaTH network electronic records. A patient was considered to have HS if she or he received the International Classification of Diseases (ICD)-9 or ICD-10 code for HS (705.83 or L73.2, respectively) twice within five years. Patients were considered eligible if they had had HS for at least three months (defined as multiple, recurrent inflammatory papules, nodules, abscesses or sinus tracts in the axilla, groin, and/or buttocks), had active lesions in the prior 3 months, and were 18 years of age or older. Patients were excluded if they were not fluent in English or were considered part of a protected population (e.g. prisoners or people with a severe mental impairment that would interfere with responding to questionnaires). The study was open to pregnant women as no invasive procedures were to take place. Each person with HS was contacted by written invitation and telephone call. Study personnel reviewed the study criteria with interested patients and enrolled those who were eligible. Clinicians familiar with HS, who practice dermatology, gynecology, or surgery, were recruited through network or “snowball” sampling by the PI (JSK) or by a dermatologist at each study site.(27)
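The case definition above (an HS diagnosis code recorded twice within five years) can be illustrated with a short sketch. This is not the PaTH network's actual query: the function name `meets_hs_case_definition`, the input format, and the reading of "within five years" as any two coded encounters no more than five calendar years apart are assumptions made for illustration.

```python
from datetime import date

HS_CODES = {"705.83", "L73.2"}  # ICD-9 and ICD-10 codes for HS, as given in the text


def meets_hs_case_definition(encounters, window_years=5):
    """Return True if at least two HS-coded encounters fall within the window.

    `encounters` is a hypothetical list of (date, icd_code) tuples.
    Assumption: two codes recorded on the same date count as two occurrences.
    """
    hs_dates = sorted(d for d, code in encounters if code in HS_CODES)
    # With sorted dates it is enough to compare each date with its successor.
    for first, second in zip(hs_dates, hs_dates[1:]):
        try:
            limit = first.replace(year=first.year + window_years)
        except ValueError:  # 29 February mapped onto a non-leap year
            limit = first.replace(year=first.year + window_years, day=28)
        if second <= limit:
            return True
    return False
```

The remaining eligibility criteria (disease duration, recent active lesions, age, language) require clinical review and are not captured by this code-based screen.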
Concept Elicitation
The design of the SASH was guided by concepts elicited from semi-structured interviews conducted with clinicians who treat HS, the recommendations of the international consensus(9), and review of the literature and outcome instruments used in dermatology as well as other disciplines. Clinicians who practice dermatology, gynecology or surgery were emailed an invitation to participate, which included a description of study aims and procedures. People who contacted the research team, were willing to give informed consent, and were fluent in English were recruited. Participants received a stipend for their participation. Clinician recruitment occurred in June and July 2015. Interviews were performed in person by one interviewer (JSK) in July and August 2015. All interviews were conducted face-to-face and lasted for approximately 60 minutes. Semi-structured interviews were performed using an interview guide (Supplemental 1), which allowed for topic consistency and flexibility. During the discussion, clinicians were given paper copies of existing HS outcome measures and clinical images of HS. All interviews were tape-recorded and transcribed verbatim by a trained transcriptionist. The sample size was guided by our intent to interview until thematic saturation (i.e. when no new information or themes are observed). Data analysis was performed by one investigator (JSK) with NVivo 11 software (QSR International, Burlington, MA). Transcripts were reviewed line by line, and words, phrases, and passages related to clinical findings of active HS and to fluctuations in HS activity or severity were coded. Interview data were also analyzed for concept frequency, defined as the total number of subjects or groups who mentioned a concept at least once.
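Concept frequency as described above, the number of participants who mentioned a concept at least once, can be sketched in a few lines. The input format (one `(participant_id, concept)` pair per coded passage) and the function name are hypothetical; the study itself used NVivo for coding.

```python
from collections import defaultdict


def concept_frequency(coded_segments, n_participants):
    """Count participants who mentioned each concept at least once.

    `coded_segments` is a hypothetical iterable of (participant_id, concept)
    pairs, one per coded transcript passage.
    Returns {concept: (count, percent of participants)}.
    """
    mentioned_by = defaultdict(set)
    for participant_id, concept in coded_segments:
        mentioned_by[concept].add(participant_id)  # repeated mentions count once
    return {
        concept: (len(ids), round(100 * len(ids) / n_participants))
        for concept, ids in mentioned_by.items()
    }
```

Using sets means that a participant who raises the same concept several times still contributes a single endorsement, which matches the "at least once" rule.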
Outcome Measure Development
SASH development was guided by the concepts related to active disease and recommended in the international consensus(9), specifically ulceration, edema, erythema, anatomic location, and surface area. The initial measure was based on the extant literature to guide items, item response options, and instructions. Outcome measures for skin conditions(28–31) that have a rating of body surface area and scores for the intensity of associated signs and the literature describing challenges or pitfalls of these scales were reviewed.
Cognitive debriefing
Cognitive debriefing interviews and focus groups were conducted to evaluate the relevance and design of the SASH, including the concepts evaluated by the measure (content or face validity) and the ability of the target audience (English-fluent clinicians) to understand and complete the SASH. Clinicians, specifically dermatologists as the primary users of this measure, were recruited through email in August 2017 and interviewed in person by one interviewer (JSK) from October 2017 through January 2018 at each study site. Clinicians received a stipend for their participation. During the discussion, clinicians were given copies of the SASH, extant HS severity measures, and clinical images of HS. We used Think Aloud and Random Probe techniques to explore the strengths, weaknesses, and limitations of the tools (Supplemental 2).(32, 33) The sample size was guided by our intent to interview until thematic saturation (i.e. when no new information or themes are observed). All interviews or focus groups were conducted face-to-face and lasted for approximately 60 minutes. Interviews were audio-recorded and then transcribed by a trained transcriptionist. Data analysis was performed with NVivo 11 software (QSR International, Burlington, MA). Data coding was performed by one investigator (JSK) and aimed to identify all difficulties with the measure; changes were made as indicated.
Measurement Evaluation
The SASH was evaluated in a multi-rater study to investigate rater reliability and construct validity (convergent and divergent). People with HS were again recruited, and participants with a broad range of disease severity, affected body sites, and skin color were purposefully selected. Clinicians were recruited and a one-day event was conducted in April 2018. All clinicians completed a training session with images of HS and the outcome instruments in order to become familiar with them. Instruments concurrently completed by the clinicians included the SASH, Hurley staging(34), and the modified Sartorius scale(35), from which the abscess nodule (AN) count was also derived. Concurrent criterion validity was assessed based on correlation with these measures. Patients completed demographic questions and the Dermatology Life Quality Index (DLQI), a widely used skin-specific health-related quality of life tool.(36) Divergent validity was assessed based on comparison of the SASH and the DLQI.
Statistical Methods:
Descriptive statistics were calculated for the demographics of the samples and to characterize the scores of the outcome measures. To assess inter-rater and intra-rater reliability we used the intraclass correlation coefficient (ICC). Intra-rater reliability was evaluated by having each rater complete a second rating of one participant; to minimize bias, clinicians were not informed of this until the end. An ICC of 0.5 was considered minimally acceptable, an ICC of 0.51–0.75 moderate, and an ICC > 0.75 excellent.(37, 38) An a priori sample size calculation indicated that 12 raters with 18 patients per rater would provide adequate power (80%) to detect an intraclass correlation based on a null hypothesis of an ICC of 0.5 and an alternative hypothesis of an ICC of 0.8, using an F-test with a significance level of 0.05. Concurrent criterion validity was assessed with partial Spearman correlation coefficients, which adjusted for the different providers. Data were collected on paper forms, entered by two study team members into REDCap (Research Electronic Data Capture), a secure, web-based application designed to support data capture for research studies, and double-checked for accuracy.(39) Statistical analysis was performed using SAS statistical software (SAS Inc., Cary NC).
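To make the reliability statistic concrete, the sketch below computes one common form of the ICC and applies the interpretation bands stated above. The text does not say which ICC form was used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single rater) is an assumption, as is the fully crossed layout in which every rater scores every subject. The function names are illustrative and the analysis in the study was run in SAS.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per subject, each holding one score
    per rater (assumes a complete, fully crossed design).
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(icc):
    """Apply the bands stated in the text (0.5 is the minimally acceptable value)."""
    if icc < 0.5:
        return "below minimally acceptable"
    if icc <= 0.75:
        return "moderate"
    return "excellent"
```

Under these bands, the reported inter-rater ICC of 0.60 would be labelled moderate and the intra-rater ICC of 0.98 excellent.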
Results
SASH Development
A total of 10 clinicians participated in the concept elicitation phase (Table 1). Clinicians described 9 major themes/concepts that should be considered when designing an outcome measure for HS in clinical trials (Table 2). The themes that arose were related to assessment of the extent of disease, including evaluation of surface area, as well as signs of disease activity such as erythema, induration, and open skin. SASH design was guided by these results and the extant literature. Cognitive debriefing interviews were conducted with 11 clinicians (Table 1) using the initial version of the instrument (Supplemental 3). Themes from the data are found in Table 3. Formatting changes were made to improve clinician use of the instrument, including alterations to the figures indicating the body sites to improve clinician interpretation of site demarcations. Similarly, figures depicting the range of findings for the severity indices were removed and verbal descriptors were added to improve clinician interpretation. The rating for body surface area was altered to reduce the mental effort required of the clinician. Determining the proportion of a body site that is involved requires estimating both the affected BSA and the total BSA for the site; in the revised instrument, clinicians determine the affected BSA only, and a reference for the total BSA of each included site is provided.
Table 1.
Description of samples for the concept elicitation, pilot testing, and measurement evaluation phases
| Characteristic | Concept elicitation phase: clinicians | Cognitive debriefing phase: clinicians | Measurement evaluation: clinicians | Measurement evaluation: people with HS |
|---|---|---|---|---|
| Total participants, n | 10 | 11 | 10 | 23 |
| Years post-training, mean (range) | 17.8 (1–37) years | 11.5 (1–25) years | Age: 42.4 (25–60) | |
| Sex, n (%) | | | | |
| Female | 6 (60.0%) | 8 (72.7%) | 6 (60%) | 18 (78.3%) |
| Male | 4 (40.0%) | 3 (27.3%) | 4 (40%) | 5 (21.7%) |
| Race | | | | |
| White | 10 (100%) | 7 (63.6%) | 10 (100%) | 18 (78.3%) |
| Black | -- | 1 (9.1%) | -- | 5 (21.7%) |
| Asian | -- | 3 (27.3%) | -- | -- |
| Bi- or Multiracial | -- | -- | -- | -- |
| Ethnicity | | | | |
| Hispanic | -- | 0 (0%) | 0 (0%) | 3 (13%) |
| Non-Hispanic | 10 (100%) | 11 (100%) | 10 (100%) | 20 (87%) |
| Discipline | | | | |
| Dermatology | 8 (80%) | 9 (81.8%) | 10 (100%) | NA |
| Gynecology | 2 (20%) | 2 (18.2%) | -- | NA |
| Surgery | -- | -- | -- | NA |
| DLQI | | | | |
| Mean (SD) | NA | NA | NA | 12.3 (9.0) |
| Treatment, n (%) | | | | |
| Prior surgery | NA | NA | NA | 5 (21.7%) |
| Adalimumab | NA | NA | NA | 3 (13%) |
| Spironolactone | NA | NA | NA | 6 (26.1%) |
| Antimicrobial wash | NA | NA | NA | 6 (26.1%) |
| Oral antibiotic | NA | NA | NA | 9 (39.1%) |
| Hurley Staging, n (%)* | | | | |
| Stage I | NA | NA | NA | 10 (43.5%) |
| Stage II | NA | NA | NA | 8 (34.8%) |
| Stage III | NA | NA | NA | 5 (21.7%) |
DLQI: Dermatology Life Quality Index; NA: not applicable
*Hurley staging approach varies by rater, as noted below, and participants are classified by majority rating.
Table 2.
Themes from concept elicitation and evidence for thematic saturation
| Theme | N (%) endorsed | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Extent measurement – body surface area | 8 (80%) | X | X | X | X | X | X | X | X | | |
| Extent measurement – counting lesions | 5 (50%) | X | X | X | X | X | | | | | |
| Extent measurement – differentiating body sites | 7 (70%) | X | X | X | X | X | X | X | | | |
| Difficulty with extant tools – difficulty interpreting | 9 (90%) | X | X | X | X | X | X | X | X | X | |
| Morphology – definitions and challenges applying these | 8 (80%) | X | X | X | X | X | X | X | X | | |
| Characteristics to measure – erythema | 10 (100%) | X | X | X | X | X | X | X | X | X | X |
| Characteristics to measure – induration | 8 (80%) | X | X | X | X | X | X | X | X | | |
| Characteristics to measure – drainage, skin defect | 6 (60%) | X | X | X | X | X | X | | | | |
| Characteristics to measure – scarring, damage | 5 (50%) | X | X | X | X | X | | | | | |
P: participant
Table 3.
Pilot testing results with major themes, evidence for saturation, and resultant changes
| Major Theme or Concept | N (%) | Examples | Response (SASH changes) |
|---|---|---|---|
| Difficulty scoring body surface area (BSA) percentage | 9 (82%) | “so it’s not percent of total [bsa]. It’s each individual body site. Got it…if you ask me for a total it’s easier if you give me what an arm is; 9%” “I mean I kind of can try and get a percentage out of it, but I don’t know how good I am at that.” “I would go back to maybe even the palmar methods for burn scores.” | |
| Challenges discriminating among body sites | 11 (100%) | “we’re doing suprapubic as abdomen or…in the picture it looks like you clumped it into thigh.” “so you’re calling mons, [the] perineum? Pubic or groin, I think” “the crease that would be under the pannus, then that would be abdomen.” “what’s the difference [in the diagram] between back and buttocks?” “this [diagram] gets confusing. I feel like you kind of need a diagram with A, B, C, D.” | |
| Body sites assessed and issues of sided-ness | 6 (55%) | “separating right and left because for axilla, some people might be asymmetrical, whereas places like the breast, people are more symmetrical.” “I like your suggestion [another participant] to separate the things we think of as right and left limbs.” “I wonder if you put thighs and then did right/left instead of front/back. I’m worried it would be hard to put left and right together.” | |
| Clarifying purpose and intended use of the instrument | 8 (73%) | “is this tool used in the presence of the patient or an image?” “if somebody is using this in the context of research primarily that they are going to have time..it’s not going to be clinic, [I’d] be an hour behind” | |
| Intensity score for worst or as an average | 5 (45%) | “you’re going to have variation in that one image of all of this.” “I was going to capture a global average for the whole area. I think that would be fine.” | |
| Erythema score: Challenges with naming and image | 11 (100%) | “more likely to underestimate the score for patients with dark skin… we might need a different set of scoring ranges [colors] for people with darker skin. Or maybe you just want to talk about this as inflammation instead of erythema” | |
| Skin Induration score: concerns about application to scar rather than inflammation | 9 (82%) | “you’re getting a measure of mostly active disease but also scarring. So hypertrophic and keloidal scarring.” “I think the pictures don’t lend anything here other than confusion for me because I don’t know what that is….looks like somebody’s fingernails.” | |
| Open skin surface score: issues with response options | 11 (100%) | “So a little unclear…all the disease activity, what percent of it is open or ulcerated? It’s going to be pretty small most of the time. I feel like I’d have a hard time saying that these areas equal 6% of the axilla” “I think the pictures don’t lend anything here other than confusion for me because I don’t know what that is. It looks like the sun.” | |
| Drainage not assessed | 2 (18%) | “If treatment was working, when things aren’t purulent. I mean you’re trying to capture that with open skin…If they’re open and clean wounds, those tend to do better than the ones [that are] actively draining.” | |
The final SASH (Supplemental 4) consists of an assessment of the extent of HS in terms of the body surface area (BSA) involved by HS activity. BSA is assessed for several distinct anatomical areas, separately for the right and the left of some sites, and includes body sites not expressly measured in many extant measures. Intensity scores are assigned for each site based on three findings indicative of HS inflammatory activity: inflammatory color change (redness or violaceous coloration), induration from inflammation, and open skin surface (erosion, ulceration or protuberant granulation-like tissue). These three findings are scored on a 4-point item response scale to describe the average intensity for each body site.
Measurement Evaluation
Assessment of the measurement properties of the SASH was conducted with the help of 10 clinicians and 23 people with HS. Characteristics of the participants are presented in Table 1. Multiple investigators informally reported after the assessment session that they liked the SASH and favored it over the existing disease severity measures.
The concurrent criterion validity of the SASH with other clinician-reported measures was good, with correlations ranging from 0.68–0.79 (Table 4). Divergent validity, assessed by correlation with the DLQI, showed a negative correlation of 0.41 (95% confidence interval [CI] −0.002 – 0.70) that was not statistically significant.
Table 4.
Clinimetric assessment of outcome measures
| HS measure | Inter-Rater Reliability, ICC (95CI) | Intra-Rater Reliability, ICC (95CI) | Convergent Validity,* Spearman Rho (95CI) |
|---|---|---|---|
| SASH | 0.60 (0.44–0.74) | 0.98 (0.94–1.00) | -- |
| Modified Sartorius | 0.61 (0.45–0.75) | 0.81 (0.50–0.95) | 0.79 (0.73–0.84) |
| AN count | 0.26 (0.14–0.43) | 0.89 (0.68–0.97) | 0.71 (0.63–0.77) |
| Hurley stage | 0.41 (0.25–0.58) | 0.92 (0.77–0.98) | 0.68 (0.60–0.75) |
AN Count: Abscess nodule count; ICC: Intraclass Correlation Coefficient; 95CI: 95% Confidence Interval.
*Correlation of the SASH with each measure.
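The convergent validity column reports partial Spearman correlations adjusted for rater. One way such an adjustment can be carried out is sketched below: rank both scores, remove each rater's mean rank (which is equivalent to regressing the ranks on rater indicator variables), and correlate the residuals. This is an illustrative assumption about the approach; the exact SAS procedure and options used in the study are not stated, and all function names here are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def center_within_groups(values, groups):
    """Subtract each group's mean; same as regressing out group indicators."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def partial_spearman(x, y, raters):
    """Spearman correlation of x and y after removing rater-level differences."""
    rx = center_within_groups(average_ranks(x), raters)
    ry = center_within_groups(average_ranks(y), raters)
    sxy = sum(a * b for a, b in zip(rx, ry))
    sxx = sum(a * a for a in rx)
    syy = sum(b * b for b in ry)
    return sxy / sqrt(sxx * syy)
```

The adjustment matters when raters differ systematically in how high they score: without it, between-rater offsets can mask or inflate the agreement between two instruments.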
The SASH demonstrated high intra-rater reliability on test and re-test of the same subject, reflected in an ICC of 0.98 (95CI=0.94–1.00). This value was higher than the intra-rater reliabilities of the other measures (Table 4). The median difference between the 1st and 2nd ratings for the SASH was 0.4 points and the mean difference was 1.7 points. In a regression predicting the 2nd rating from the 1st rating, the intercept was small and not significantly different from zero (intercept=1.5, p=.21) and the slope was close to 1 (b=0.91, p<.0001).
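The test-retest summary above (median and mean differences, plus a regression of the second rating on the first) can be reproduced for any set of paired ratings with the sketch below. The function name is hypothetical, and treating the reported differences as absolute differences is an assumption, since the text does not say whether they were signed.

```python
from statistics import mean, median


def retest_summary(first, second):
    """Summarise agreement between paired first and second ratings."""
    # Assumption: differences are taken as absolute values.
    diffs = [abs(b - a) for a, b in zip(first, second)]
    mx, my = mean(first), mean(second)
    # Ordinary least squares for: second = intercept + slope * first
    sxx = sum((a - mx) ** 2 for a in first)
    sxy = sum((a - mx) * (b - my) for a, b in zip(first, second))
    slope = sxy / sxx
    intercept = my - slope * mx
    return {
        "median_abs_diff": median(diffs),
        "mean_abs_diff": mean(diffs),
        "slope": slope,
        "intercept": intercept,
    }
```

Perfect test-retest agreement would give an intercept of 0 and a slope of 1, so the reported values are read against those two reference points.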
There was acceptable inter-rater reliability (ICC=0.60 [95CI=0.44–0.74]). This was similar to the inter-rater reliability of the modified Sartorius score and higher than that of the AN count and Hurley staging (Table 4). The impact of established body surface area scoring methods on inter-rater reliability was explored by transforming the percentage of involvement for each body site from a raw number to an integer score from 0 to 6 points, using two approaches (Table 5). Both approaches were borrowed from psoriasis severity tools.(29, 40, 41) Both transformations improved inter-rater reliability, from ICC=0.60 (95CI=0.44–0.74) to ICC=0.75 (95CI=0.62–0.85) or 0.76 (95CI=0.62–0.85). The increased inter-rater reliability gained by these score transformations will need to be weighed against the potential impact on responsiveness in future studies.
Discussion
HS causes substantial morbidity and has a large negative influence on quality of life.(3, 42) Reliable and validated outcome measures are crucial to identify treatments that can impact the condition. The ideal outcome instrument should be reliable, accurate, and responsive to changes in the disease over time. Outcome instruments with more measurement error and lower reliability may increase the risk of false-negative results in studies (type II error) and abandonment of potentially effective therapies.(43, 44) Outcome measures with lower reliability and higher measurement error would require larger sample sizes, and thus higher costs, in order to detect a statistically significant difference in treatment studies.(45) Many HS severity instruments have been based on terminology that can be interpreted differently and on lesion counts that have doubtful reliability.(18, 20) In this study, we evaluated multiple aspects of reliability and validity for the newly proposed SASH in evaluating HS, as compared to lesion-count based instruments (modified Sartorius and AN count).
Few studies have investigated the inter-rater reliability of HS outcome instruments, and our findings showed that instruments dependent on lesion counting, the modified Sartorius score and AN count, had lower reliability than the SASH. In a prior study, the Hidradenitis Suppurativa Score had an inter-rater reliability of 0.95 for four raters, though the sample contained few patients with severe disease and scores were more similar among the raters when scoring less severe disease.(46) The inter-rater reliability of raters using the Hidradenitis Suppurativa Clinical Response (HiSCR) measure, which incorporates the AN count, was evaluated in two studies and was 0.44 (0.29–0.63)(25) and 0.38–0.67(47). In our study, the inter-rater reliability of the AN count was 0.26 (0.14–0.43), which is not dissimilar to the lower range of the prior studies. The inter-rater reliability of the SASH improved from 0.60 to 0.75 or 0.76 when the score for BSA for each site was changed from a body surface area percentage to an ordinal score. This may mitigate some of the relatively small differences in the scores; however, it will be important to evaluate the performance of these scoring methods in larger studies and their impact on the responsiveness of the instrument. Lastly, inter-rater reliability can be improved with rater training, such as an online didactic and scored practice session.(48) Studies of psoriasis outcome instruments showed significant improvement in reliability after training.(49, 50)
The intra-rater, or test-retest, reliability of the SASH was very high (0.98 [95CI=0.94–1.00]) and in this study was higher than that of all other outcome instruments. The intra-rater reliability of the AN count in this study was 0.89 (95CI=0.68–0.97), which is similar to the estimate from an earlier study (0.92 [95CI=0.83–0.97]).(47) The intra-rater reliabilities of the components of the HiSCR, namely counts of AN and draining fistulas, were 0.91 (95CI=0.88–0.93) and 0.95 (95CI=0.93–0.96), respectively. Another recent scoring system, which is also dependent on lesion counts, had an intra-rater reliability of 0.95 (95CI=0.92–0.98).(19)
This study should be interpreted in light of its limitations. First, while the development of the instrument included the input from providers at four different institutions, testing of the SASH occurred at a single academic medical center amongst dermatology providers. Second, the responsiveness of the tool has not been established and it will be important for clinical instruments to capture changes related to the disease course and therapies. Third, there are a number of challenges when applying such a measure to skin of color due to the challenges of identifying erythema. Fourth, fistulas (lesions unique to HS), damage, scarring and dyspigmentation were not included in this particular measure as the focus was to evaluate findings that are expected to improve in the setting of a clinical trial; however, scarring may unintentionally contribute to the induration score and this will need further study. Lastly, while we had a great deal of enthusiasm from patients to participate in the full-day measurement evaluation session, we experienced some difficulty in recruiting the number of providers indicated by the sample size calculation and recruited 10 of the 12 that were called for.
This study describes the development and measurement evaluation of the SASH, a novel clinical assessment of HS severity, which incorporates body surface area, ulceration, induration, and inflammatory color change (e.g. erythema). These particular elements of the SASH align with a number of the core outcomes set by the international stakeholders as recommendations to assess the physical signs of HS.(9) The SASH demonstrated acceptable inter-rater reliability, which improved when BSA was scored on an ordinal scale, excellent intra-rater reliability, convergent validity, and acceptability, and it captured a wide range of body sites and cutaneous morphologies. The SASH should be considered as an outcome instrument for the evaluation of disease severity in future studies.
Supplementary Material
Table 5.
Impact on inter-rater reliability of alternative scoring approaches for body surface area
| Score | SASH PASI method | SASH Alternate method |
|---|---|---|
| 0 | 0% of involved area | 0% of involved area |
| 1 | < 10% of involved area | 1–2% of involved area |
| 2 | 10–29% of involved area | 3–5% of involved area |
| 3 | 30–49% of involved area | 6–10% of involved area |
| 4 | 50–69% of involved area | 11–21% of involved area |
| 5 | 70–89% of involved area | 22–46% of involved area |
| 6 | 90–100% of involved area | 47–100% of involved area |
| SASH Inter-Rater Reliability [ICC (95CI)] | | |
| Untransformed* 0.60 (0.44–0.74) | 0.75 (0.62–0.85) | 0.76 (0.62–0.85) |
ICC: Intraclass Correlation Coefficient; 95CI: 95% Confidence Interval; PASI: Psoriasis Area and Severity Index
*Untransformed ICC was derived from scores using raw BSA involvement without transformation into an ordinal score (0–6).
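The two transformations in Table 5 amount to binning each body site's percentage involvement into a 0–6 score. The sketch below encodes the published cut-offs; the function and constant names are hypothetical, and the handling of values that fall between the listed integer bins (for example 2.5% under the alternate method) is an assumption, since the table does not cover them.

```python
from bisect import bisect_right

# Lower bound (percent involvement) at which scores 2..6 begin, read from Table 5.
PASI_CUTS = [10, 30, 50, 70, 90]
ALTERNATE_CUTS = [3, 6, 11, 22, 47]


def bsa_to_ordinal(percent, cuts):
    """Map a body site's percent involvement (0-100) to an ordinal score 0-6."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent == 0:
        return 0
    # Assumption: any non-zero value below the first cut scores 1, and
    # fractional values between bins fall into the lower bin.
    return 1 + bisect_right(cuts, percent)
```

For example, 5% involvement scores 1 under the PASI-style cut-offs but 2 under the alternate cut-offs, which spread low-involvement sites over more categories. How site-level scores are combined into a total is not described in this section, so the sketch stops at the per-site score.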
What’s already known about this topic?
The major HS disease activity scales rely on lesion counts and have moderate to good reliability. Surface area is one of the physical signs included in the Core Outcome Set for HS, but is not a part of existing HS disease activity scales.
What does this study add?
A novel disease severity scale, the Severity and Area Score for Hidradenitis (SASH), was developed and its psychometric properties were assessed. Inter-rater reliability was high (0.75 and 0.76) when body surface area was scored on an ordinal scale, and intra-rater reliability was excellent (0.98). The SASH score also demonstrated convergent validity with extant instruments.
Acknowledgement:
The authors would like to thank and acknowledge the clinicians and individuals with hidradenitis suppurativa who took part in this study and whose time and effort made this possible.
Funding Sources: Dr Kirby received funding from the Agency for Healthcare Research and Quality for this research (K08HS024585). Use of REDCap through Penn State is supported by NIH/NCATS Grant Number UL1 TR000127 and UL1 TR002014 through The Penn State Clinical & Translational Research Institute, Pennsylvania State University CTSA
Footnotes
Conflicts of Interest: Dr Kirby has been a speaker for AbbVie, advisory board participant for AbbVie, and consultant to Incyte and ChemoCentryx
IRB approval status: Reviewed and approved by Penn State IRB: STUDY00002716 and STUDY00006806; Johns Hopkins School of Medicine IRB: IRB00118850; Geisinger IRB: 2017–0338; and University of Pittsburgh IRB: EXT17060318
References
- 1. Jemec GB, Heidenheim M, Nielsen NH. Hidradenitis suppurativa--characteristics and consequences. Clinical and experimental dermatology. 1996;21(6):419–23.
- 2. Matusiak L, Bieniek A, Szepietowski JC. Psychophysical aspects of hidradenitis suppurativa. Acta Derm Venereol.90(3):264–8.
- 3. Wolkenstein P, Loundou A, Barrau K, Auquier P, Revuz J. Quality of life impairment in hidradenitis suppurativa: a study of 61 cases. J Am Acad Dermatol. 2007;56(4):621–3.
- 4. Matusiak L, Bieniek A, Szepietowski JC. Hidradenitis suppurativa markedly decreases quality of life and professional activity. J Am Acad Dermatol.62(4):706–8, 8 e1.
- 5. Kouris A, Platsidaki E, Christodoulou C, Efstathiou V, Dessinioti C, Tzanetakou V, et al. Quality of Life and Psychosocial Implications in Patients with Hidradenitis Suppurativa. Dermatology (Basel, Switzerland). 2016;232(6):687–91.
- 6. Adams DR, Yankura JA, Fogelberg AC, Anderson BE. Treatment of hidradenitis suppurativa with etanercept injection. Arch Dermatol. 2010;146(5):501–4.
- 7. Giamarellos-Bourboulis EJ, Pelekanou E, Antonopoulou A, Petropoulou H, Baziaka F, Karagianni V, et al. An open-label phase II study of the safety and efficacy of etanercept for the therapy of hidradenitis suppurativa. Br J Dermatol. 2008;158(3):567–72.
- 8. Marshall M, Lockwood A, Bradley C, Adams C, Joy C, Fenton M. Unpublished rating scales: a major source of bias in randomized controlled trials of treatment for schizophrenia. Br J Psych. 2000;176:249–52.
- 9. Thorlacius L, Ingram JR, Villumsen B, Esmann S, Kirby JS, Gottlieb AB, et al. A core domain set for hidradenitis suppurativa trial outcomes: an international Delphi process. Br J Dermatol. 2018.
- 10. Thorlacius L, Ingram JR, Garg A, Villumsen B, Esmann S, Kirby JS, et al. Protocol for the development of a core domain set for hidradenitis suppurativa trial outcomes. BMJ open. 2017;7(2):e014733.
- 11. Thorlacius L, Garg A, Ingram JR, Villumsen B, Theut Riis P, Gottlieb AB, et al. Towards global consensus on core outcomes for Hidradenitis Suppurativa research: An update from the HISTORIC consensus meetings I and II. Br J Dermatol. 2017.
- 12. Grant A, Gonzalez T, Montgomery MO, Cardenas V, Kerdel FA. Infliximab therapy for patients with moderate to severe hidradenitis suppurativa: a randomized, double-blind, placebo-controlled crossover trial. J Am Acad Dermatol. 2010;62(2):205–17.
- 13. Sartorius K, Emtestam L, Jemec GB, Lapins J. Objective scoring of hidradenitis suppurativa reflecting the role of tobacco smoking and obesity. Br J Dermatol. 2009;161(4):831–9.
- 14. Sartorius K, Lapins J, Emtestam L, Jemec GB. Suggestions for uniform outcome variables when reporting treatment effects in hidradenitis suppurativa. Br J Dermatol. 2003;149(1):211–3.
- 15. Chiricozzi A, Faleri S, Franceschini C, Caro RD, Chimenti S, Bianchi L. AISI: A New Disease Severity Assessment Tool for Hidradenitis Suppurativa. Wounds: a compendium of clinical research and practice. 2015;27(10):258–64.
- 16. Zouboulis CC, Tzellos T, Kyrgidis A, Jemec GBE, Bechara FG, Giamarellos-Bourboulis EJ, et al. Development and validation of the International Hidradenitis Suppurativa Severity Score System (IHS4), a novel dynamic scoring system to assess HS severity. Br J Dermatol. 2017;177(5):1401–9.
- 17. Kimball AB, Jemec GB, Yang M, Kageleiry A, Signorovitch JE, Okun MM, et al. Assessing the validity, responsiveness and meaningfulness of the Hidradenitis Suppurativa Clinical Response (HiSCR) as the clinical endpoint for hidradenitis suppurativa treatment. Br J Dermatol. 2014;171(6):1434–42.
- 18. Ingram JR, Hadjieconomou S, Piguet V. Development of core outcome sets in hidradenitis suppurativa: systematic review of outcome measure instruments to inform the process. Br J Dermatol. 2016;175(2):263–72.
- 19. Hessam S, Scholl L, Sand M, Schmitz L, Reitenbach S, Bechara FG. A Novel Severity Assessment Scoring System for Hidradenitis Suppurativa. JAMA Dermatol. 2018;154(3):330–5.
- 20. Lipsker D, Severac F, Freysz M, Sauleau E, Boer J, Emtestam L, et al. The ABC of Hidradenitis Suppurativa: A Validated Glossary on how to Name Lesions. Dermatology (Basel, Switzerland). 2016;232(2):137–42.
- 21. Wortsman X, Moreno C, Soto R, Arellano J, Pezo C, Wortsman J. Ultrasound in-depth characterization and staging of hidradenitis suppurativa. Dermatol Surg. 2013;39(12):1835–42.
- 22. Lucky AW, Barber BL, Girman CJ, Williams J, Ratterman J, Waldstreicher J. A multirater validation study to assess the reliability of acne lesion counting. J Am Acad Dermatol. 1996;35(4):559–65.
- 23. Ianhez M, Fleury Junior LF, Bagatin E, Miot HA. The reliability of counting actinic keratosis. Archives of dermatological research. 2013;305(9):841–4.
- 24. Ovadja ZN, Schuit MM, van der Horst C, Lapid O. Inter- and Intrarater Reliability of the Hurley Staging for Hidradenitis Suppurativa. Br J Dermatol. 2018.
- 25. Thorlacius L, Garg A, Riis PT, Nielsen SM, Bettoli V, Ingram JR, et al. Interrater agreement and reliability of outcome measurement instruments and staging systems used in hidradenitis suppurativa. Br J Dermatol. 2019.
- 26. US Food & Drug Administration. Clinical Outcome Assessment Qualification Program. US Department of Health and Human Services. Available from: https://www.fda.gov/Drugs/DevelopmentApprovalProcess/DrugDevelopmentToolsQualificationProgram/ucm284077.htm
- 27. Merriam S. Qualitative research: A guide to design and implementation. San Francisco, CA: Jossey-Bass; 2009.
- 28. Albrecht J, Taylor L, Berlin JA, Dulay S, Ang G, Fakharzadeh S, et al. The CLASI (Cutaneous Lupus Erythematosus Disease Area and Severity Index): an outcome instrument for cutaneous lupus erythematosus. J Invest Dermatol. 2005;125(5):889–94.
- 29. Fredriksson T, Pettersson U. Severe psoriasis--oral therapy with a new retinoid. Dermatologica. 1978;157(4):238–44.
- 30. Hanifin JM, Thurston M, Omoto M, Cherill R, Tofte SJ, Graeber M. The eczema area and severity index (EASI): assessment of reliability in atopic dermatitis. EASI Evaluator Group. Exp Dermatol. 2001;10(1):11–8.
- 31. Oranje AP, Stalder JF, Taieb A, Tasset C, de Longueville M. Scoring of atopic dermatitis by SCORAD using a training atlas by investigators from different disciplines. ETAC Study Group. Early Treatment of the Atopic Child. Pediatric allergy and immunology: official publication of the European Society of Pediatric Allergy and Immunology. 1997;8(1):28–34.
- 32. Fonteyn ME, Kuipers B, Grobe S. A description of think aloud method and protocol analysis. Qualitative Health Research. 1993;3(4):430–41.
- 33. Branch JL. Investigating the information-seeking processes of adolescents: The value of using think-alouds and think afters. Library and Information Science Research. 2000;22(4):371–92.
- 34. Hurley H. Axillary hyperhidrosis, apocrine bromhidrosis, hidradenitis suppurativa and familial benign pemphigus. Surgical approach. In: Roenigk R, Roenigk H, editors. Dermatologic Surgery, Principles and Practice. New York, New York: Marcel Dekker; 1989.
- 35. Revuz J. Modifications et mode d’emploi du score de Sartorius pour évaluer la gravité de l’hidradénite suppurée. Ann Dermatol Venereol. 2007;134:173–80.
- 36. Finlay AY, Khan GK. Dermatology Life Quality Index (DLQI)--a simple practical measure for routine clinical use. Clinical and experimental dermatology. 1994;19(3):210–6.
- 37. Portney LG, Watkins MP. Foundations of clinical research: applications to practice. New Jersey: Prentice Hall; 2000.
- 38. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–8.
- 39. Harris P, Taylor R, Thielke R, Payne J, Gonzalez N, Conde J. Research electronic data capture (REDCap) - A metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–81.
- 40. Langley RG, Ellis CN. Evaluating psoriasis with Psoriasis Area and Severity Index, Psoriasis Global Assessment, and Lattice System Physician’s Global Assessment. J Am Acad Dermatol. 2004;51(4):563–9.
- 41. Jacobson CC, Kimball AB. Rethinking the Psoriasis Area and Severity Index: the impact of area should be increased. Br J Dermatol. 2004;151(2):381–7.
- 42. Esmann S, Jemec GB. Psychosocial impact of hidradenitis suppurativa: a qualitative study. Acta Derm Venereol.91(3):328–32.
- 43. Cleary TA, Linn RL. Error of measurement and the power of a statistical test. Br J Math Stat Psychol. 1969;22:49–55.
- 44. Murphy KR, Myors B. Statistical power analysis. Mahwah, NJ: Lawrence Erlbaum; 1998.
- 45. Perkins DO, Wyatt RJ, Bartko JJ. Penny-wise and pound-foolish: the impact of measurement error on sample size requirements in clinical trials. Biological psychiatry. 2000;47(8):762–6.
- 46. Sartorius K, Killasli H, Heilborn J, Jemec GB, Lapins J, Emtestam L. Interobserver variability of clinical scores in hidradenitis suppurativa is low. Br J Dermatol. 2010;162(6):1261–8.
- 47. Kimball AB, Ganguli A, Fleischer A. Reliability of the hidradenitis suppurativa clinical response in the assessment of patients with hidradenitis suppurativa. J Eur Acad Dermatol Venereol. 2018.
- 48. Sadler ME, Yamamoto RT, Khurana L, Dallabrida SM. The impact of rater training on clinical outcomes assessment data: a literature review. International Journal of Clinical Trials. 2017;4(3):101–10.
- 49. Youn SW, Choi CW, Kim BR, Chae JB. Reduction of inter-rater and intra-rater variability in psoriasis area and severity index assessment by photographic training. Ann Dermatol Venereol. 2015;27(5):557–62.
- 50. Armstrong AW, Parsi K, Schupp CW, Mease PJ, Duffin KC. Standardizing training for psoriasis measures: Effectiveness of an online training video on psoriasis area and severity index assessment by physician and patient raters. JAMA Dermatology. 2013;149(5):577–82.
