Dermatology and Therapy. 2024 Mar 15;14(3):643–669. doi: 10.1007/s13555-024-01114-2

Development and Psychometric Validation of a Patient-Reported Outcome Measure to Assess the Signs and Symptoms of Chronic Hand Eczema: The Hand Eczema Symptom Diary (HESD)

Sonja Molin 1, Lotte Seiding Larsen 2, Peter Joensson 2, Marie Louise Oesterdal 2, Rob Arbuckle 3, Laura Grant 3, George Skingley 3, Marie L A Schuttelaar 4
PMCID: PMC10965865  PMID: 38485862

Abstract

Introduction

Chronic Hand Eczema (CHE) is an inflammatory skin disease of the hands. The Hand Eczema Symptom Diary (HESD) is a new patient-reported outcome measure of worst severity of core CHE signs/symptoms. This study aimed to evaluate content and psychometric validity of the HESD.

Methods

The HESD was developed based on the literature and concept elicitation interviews. Qualitative cognitive debriefing interviews were conducted with CHE patients to assess relevance and understanding of items, response options and recall period. Psychometric properties of the HESD (item performance, dimensionality, reliability, validity, responsiveness and estimation of meaningful change thresholds) were then assessed, first using data from a phase 2b trial (NCT03683719), and confirmed using data from the first 280 participants completing the 16-week treatment phase of a phase 3 trial (NCT04871711).

Results

Cognitive debriefing supported item refinement and removal of items and confirmed all items were well understood and relevant to patients. Item properties and dimensionality analyses in the phase 2b data supported removal of additional items, resulting in the 6-item HESD included in the phase 3 trial. Unidimensionality was supported by inter-item correlations (all > 0.70) and Rasch analysis. Internal consistency (Cronbach’s alpha = 0.96) and test-retest reliability (Intraclass Correlation Coefficient > 0.89) results were very strong. Construct validity was supported by moderate correlations with concurrent measures (0.53–0.64) and significant differences between severity groups (p < 0.001). Large effect sizes for mean change scores in participants that improved and significant differences between change groups indicated the ability to detect change. Anchor-based analyses supported within-individual responder definitions of ≥ 4-points for improvements in 7-day average HESD scores.

Conclusion

The HESD is the first CHE-specific, patient-reported outcome measure of CHE signs/symptoms developed and validated in line with regulatory guidance. This article provides evidence of strong content validity and psychometric validity and shows improvements of ≥ 4 points on 7-day average HESD scores represent clinically meaningful, important changes.

Trial Registration

NCT03683719, NCT04871711.

Supplementary Information

The online version contains supplementary material available at 10.1007/s13555-024-01114-2.

Keywords: Adults, Chronic Hand Eczema, Delgocitinib cream, DELTA 1 trial, Interview, Itch and (skin) pain, Moderate to severe, Phase 3, Psychometric validation, Qualitative research

Key Summary Points

Why carry out this study?
Existing patient-reported outcome measures (PROMs) are not expected to be fit for purpose from a regulatory perspective to assess Chronic Hand Eczema (CHE) sign/symptom severity in clinical trials of treatment interventions for CHE and clinical practice for the management of CHE.
Other existing PROMs used in CHE are either not disease specific (e.g., the Dermatology Life Quality Index) or primarily provide an assessment of quality of life rather than a comprehensive assessment of the key signs/symptoms associated with CHE (e.g., the Quality of Life in Hand Eczema).
This study aimed to assess the content validity and psychometric validity of the Hand Eczema Symptom Diary (HESD), a new PROM designed to assess CHE sign/symptom severity and measure changes in these over time in CHE clinical trials and clinical practice.
What was learned from the study?
Content validity of the 6-item HESD (assessing itch, pain, cracked skin, redness, dryness and flaking) was confirmed in cognitive debriefing interviews with CHE patients and psychometric validation activities provided strong evidence of construct validity, reliability and the ability to detect change for the HESD Itch score, HESD Pain score and the HESD score and established within-individual responder definitions for those scores.

Introduction

Chronic Hand Eczema (CHE) is an inflammatory skin disease of the hands or wrists, often caused by contact dermatitis and characterized by recurring flares and poor prognosis [1]. One-year prevalence of CHE is estimated to be 4.7–9.1% [2–4], and the life-time prevalence of severe to very severe HE has been reported to be 1.9% [2]. Approximately 70.0% of CHE patients have moderate to severe disease, which can persist over several years [5]. CHE is characterized by signs of erythema, infiltration, hyperkeratosis and vesicles. Secondary signs include scaling, fissures and erosions, and the condition may be exacerbated by infections [3, 6]. Core symptoms include itch and pain, which are reported to negatively impact patients’ psychological wellbeing, physical functioning, daily life activities and the ability to work [1, 7–10].

Patient-reported outcome measures (PROMs) are valuable to support clinical trial endpoints for new therapies alongside clinical endpoints. Regulatory guidance indicates that fit-for-purpose PROMs must measure ‘concepts of interest’ that are clinically relevant and important to how patients ‘feel and function’, with evidence of validity, reliability and ability to detect change in the target population [11–14]. Qualitative research comprising a literature review and concept elicitation (CE) interviews with 20 CHE patients in the USA and five expert dermatologists was conducted to better understand the lived experience of CHE patients [9]. The signs/symptoms most frequently reported by patients included itch, pain, cracked skin, redness, thickened skin, swelling and bleeding.

The key signs and symptoms identified in this previous qualitative work were used to draft items to form the Hand Eczema Symptom Diary (HESD). The HESD is a new, CHE-specific PROM designed to assess the worst severity of CHE signs and symptoms for use in clinical trials of treatment interventions for CHE and in clinical practice for the management of CHE. Other existing measures of symptoms and/or health-related quality of life (HRQoL) used in CHE are either not specific to CHE (e.g., the Dermatology Life Quality Index [DLQI]) or have limitations that make them less likely to be accepted by regulatory authorities such as the US FDA as a measure of symptom severity. The Quality of Life in Hand Eczema (QOLHEQ) [15], for example, is a PROM that includes some measurement of the signs/symptoms of CHE but is primarily an assessment of HRQoL.

The aims of this research were to evaluate content validity of the HESD through qualitative patient interviews and then to evaluate the psychometric properties and estimate within-individual responder definition thresholds for meaningful change using data from two clinical trials.

Methods

Study Design

This study consisted of two core activities: qualitative cognitive debriefing (CD) interviews with adult CHE patients to evaluate content validity of the HESD and psychometric validation using data from two clinical trials, pooled across treatment groups.

The study activities were performed in accordance with the Helsinki Declaration of 1964 and its later amendments. All participants provided written informed consent indicating their data will be used for medical research purposes and that the results may be published. Ethical approval and oversight for the qualitative interviews were provided by Copernicus Group Independent Review Board (CGIRB; reference ADE1-17-162). Ethical approval for the psychometric validation activities was obtained as part of the phase 2b (NCT03683719) and phase 3 (NCT04871711) trials from the independent institutional review boards (IRB) and ethics committees.

Instrument Development

The initial draft HESD included 13 items designed to assess the severity of CHE signs/symptoms. Patients rate each sign/symptom, at its worst, over the past 24 h using a 0–10 numeric rating scale (NRS). The decision to focus on ‘worst severity’ was to align with the US FDA’s stated preference for the assessment of symptoms ‘at their worst’ when label claims are sought [13]. A 24-h recall period was selected as CHE signs/symptoms can fluctuate daily, and the measure was intended to track sign/symptom severity over time. For the 0–10 NRS, two options for the upper anchor label were tested (i.e., ‘severe’ and ‘worst imaginable’) during the CD interviews, with the aim of selecting the most appropriate option from the patient perspective. The lower anchor was ‘no (sign/symptom)’ for all HESD items. The HESD was subject to multiple rounds of CD and revision, including input from expert dermatologists at key stages to ensure clinical insight into the interpretation of findings (Fig. 1).

Fig. 1. HESD development overview

Qualitative Interviews

The content validity of the draft 13-item HESD was assessed in semi-structured, qualitative, face-to-face interviews with adults with CHE in the US. The interviews comprised CD activities to assess whether the HESD was understood and relevant and captured all sign/symptom concepts important to CHE patients. The interviews were approximately 60 min in total, with 30 min focused on CD of the HESD (the remainder was focused on debriefing an impact measure). The interviews were conducted in two rounds by trained qualitative interviewers, allowing for modifications and testing of the updated instrument between rounds.

Sample and Recruitment

Participants for the interviews were identified by a recruitment agency in the US (MedQuest, New York, NY, USA) via referring primary care physicians and dermatology specialists. To be eligible, patients had to be at least 18 years of age and have a clinician-confirmed diagnosis of CHE defined according to the guidelines for management of HE (ICD-10 codes: L20, L23, L24, L25, L30) [16]. Quotas were used to ensure the inclusion of patients with different levels of disease severity according to a standard physician global assessment (PGA) and from different regions of the USA to account for potential seasonal and regional differences. To capture a diversity of patient perspectives, quotas were also set with the view to recruit a mix of patients with allergic, irritant and atopic CHE subtypes. A total of 20 CHE patients (split across the different subtypes) were expected to be sufficient to assess content validity [17]. All interview participants were compensated for their participation.

Interview Procedure

Participants were asked to complete the HESD on a handheld device using a ‘think aloud’ approach and to share their thoughts as they read each instruction/item and selected each response. Participants were asked detailed questions about their interpretation and understanding of the instructions and item wording, relevance of concepts and the appropriateness of response options and recall period.

To determine the most appropriate wording to use for the response scale, half the participants (n = 10) were first debriefed on the ‘severe’ upper anchor label, whereas the other half (n = 10) were first debriefed on the ‘worst imaginable’ upper anchor label. Following debriefing of the HESD in its entirety, participants were then presented with one or more items with the opposite upper anchor label and asked to indicate their preference and why. Usability of the handheld device was also explored.

Qualitative Analysis

All interviews were audio-recorded and transcribed verbatim, with identifiable information redacted. Transcripts were imported into Atlas.Ti Scientific Software for analysis [18]. Qualitative analysis of the verbatim transcripts involved assigning dichotomous codes to each item, instruction, response option and recall period to indicate whether it was understood, relevant and appropriate and why.

Psychometric Validation

The 11-item version of the HESD that emerged from the content validity interviews was included as an exploratory endpoint in a phase 2b dose-ranging trial conducted to evaluate the efficacy and safety of delgocitinib cream in patients with mild to severe CHE (NCT03683719). Data from the trial were used to support item reduction, development of scoring and initial evaluation of psychometric properties (see Table S1 for more detail on methods). Analyses were conducted in the full sample (N = 258). The study design and eligibility criteria for the phase 2b trial have been previously described [19].

Psychometric results from the phase 2b trial were reviewed by the US FDA, who recommended deletion of additional items (those assessing signs/symptoms least commonly reported) and the use of Patient-Global Impression (PGI) items that were more specific to the targeted measurement concepts as part of the anchor-based analyses conducted to confirm within-individual responder definitions for the HESD scores (e.g., for the HESD Itch score it was recommended to use a PGI-Severity [PGI-S] item asking about severity of itch rather than just severity of CHE). Following this feedback, additional items were removed resulting in the 6-item HESD and specific PGI items were developed for itch and pain.

The 6-item HESD was included as a secondary endpoint in a phase 3, randomized, double-blind, vehicle-controlled, parallel-group, multi-site trial evaluating the efficacy and safety of twice-daily application of delgocitinib cream 20 mg/g versus cream vehicle over a 16-week treatment period in adults with moderate to severe CHE (NCT04871711; DELTA 1). Data from the first 280 participants with an IGA-CHE score at Baseline and Week 16 were used to perform confirmatory evaluation of the measurement properties of the 6-item HESD, including anchor- and distribution-based analyses to support interpretation of change. The HESD was completed on an electronic device (eDiary) starting at least 7 days prior to Baseline and then every evening until Week 16.

Eligibility Criteria

Participants in the phase 3 trial were required to have: a diagnosis of CHE, defined as HE that has persisted for > 3 months or returned twice or more within the last 12 months; moderate to severe CHE at screening and Baseline according to the Investigator Global Assessment for Chronic Hand Eczema (IGA-CHE; score of 3 or 4); a HESD Itch score (weekly average) of ≥ 4 points for the 7 days preceding Baseline; and a documented recent history of inadequate response to treatment with topical corticosteroids (TCS) (at any time within 1 year before the screening visit) or documentation that TCS were otherwise medically inadvisable (e.g., due to important side effects or safety risks).

Trial Assessments

Other clinical outcome assessment (COA) measures administered alongside the HESD during the phase 3 trial were used to support construct validity analyses, define participants with stable CHE for test-retest reliability analysis and define participants who experienced change. Details of the other COA measures included in analyses are provided in Table 1.

Table 1.

Overview of other relevant COA instruments included in the phase 3 clinical trial

Assessment Description
Investigator’s Global Assessment of Chronic Hand Eczema (IGA-CHE)  The IGA-CHE allows investigators to assess overall disease severity at one given timepoint and consists of a 5-level severity scale (i.e., 0 = ‘clear’, 1 = ‘almost clear’, 2 = ‘mild’, 3 = ‘moderate’, 4 = ‘severe’). Each severity level on the scale is characterized in terms of the clinical characteristics of erythema, scaling, hyperkeratosis, vesiculation, oedema and fissures. The IGA-CHE was administered at all site visits (Baseline and Weeks 1, 2, 4, 8, 12 and 16).
Patient Global Assessment (PaGA) The PaGA is a single-item, patient-reported global assessment of HE severity with a 5-level scale (0 = ‘clear’, 1 = ‘almost clear’, 2 = ‘mild’, 3 = ‘moderate’ and 4 = ‘severe’). The PaGA was administered during site visits at Baseline and Weeks 1, 2, 4, 8, 12 and 16.
Patient Global Impression of Severity (PGI-S) Three single-item PGI-S measures were included in the phase 3 trial, which corresponded with the concepts assessed by the HESD Itch score (PGI-S Itch), HESD Pain score (PGI-S Pain) and HESD score (HESD PGI-S). Each PGI-S asked participants to choose the response that best described the severity of their itch/pain/HE signs and symptoms over the past week, respectively, using a 4-point verbal descriptor response scale (‘none’, ‘mild’, ‘moderate’, ‘severe’). The PGI-S measures were administered at Baseline and Weeks 2, 4, 8 and 16.
Patient Global Impression of Change (PGI-C) Three single-item PGI-C measures were included in the phase 3 trial, which also corresponded with the concepts assessed by the HESD Itch score (PGI-C Itch), HESD Pain score (PGI-C Pain) and HESD score (HESD PGI-C). Each PGI-C asked participants to choose the response that best described the overall change in their itch/pain/HE signs and symptoms since starting the trial medication, respectively, using a 5-point response scale (‘much better’, ‘a little better’, ‘no change’, ‘a little worse’, ‘much worse’). The PGI-C measures were administered at Weeks 2, 4, 8 and 16.
Dermatology Life Quality Index (DLQI) [20] The DLQI is a validated dermatology-specific questionnaire widely used as a trial endpoint in dermatological conditions. It consists of 10 items addressing the patient’s perception of the impact of their skin disease on 6 different aspects of their quality of life over the last week, including symptoms and feelings, daily activities, leisure, work or school, personal relationships and treatment. The total score ranges from 0 to 30, where a high score is indicative of poor quality of life. A change in DLQI score of at least 4 points in adults is considered clinically meaningful [21]. The DLQI was administered at Baseline and Weeks 1, 4, 8, 12 and 16.

Statistical Methods

Table 2 details the psychometric analyses used in this study. All analyses were conducted in the psychometric analysis population (comprised of the first 280 participants randomized with an IGA-CHE completion at Baseline and Week 16) and using data from Week 4, unless otherwise specified. Week 4 was selected as this timepoint was expected to provide a greater distribution of scores than other timepoints such as Baseline or Week 16. The psychometric analysis population sample size was determined by calculating the sample size that would be required to achieve a mean change in HESD scores for the moderately improved anchor group with a 95% confidence interval (CI) width of less than 1 (< 25.0% of the expected mean change of 4). This calculation assumed the same proportion of participants achieving ‘moderately improved’ and the same standard deviation (SD) as the phase 2b sample, leaving room for dropout/assumption violation. All statistical analyses were detailed a priori in a psychometric analysis plan and conducted in accordance with European Medicines Agency (EMA) and US FDA standards and US FDA guidance for PROMs and other COAs [11, 14, 22, 23].
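
The precision-based reasoning behind the sample size can be illustrated with a short calculation. This is a minimal sketch, assuming a normal-approximation 95% CI (z = 1.96); the SD and the proportion of 'moderately improved' participants below are hypothetical placeholders, since the phase 2b values and the exact method used are not given in this section.

```python
import math

def n_for_ci_width(sd, max_width, z=1.96):
    """Smallest n for which a normal-approximation 95% CI for a mean
    has a total width (upper minus lower limit) below max_width."""
    # width = 2 * z * sd / sqrt(n) < max_width  =>  n > (2 * z * sd / max_width) ** 2
    return math.floor((2 * z * sd / max_width) ** 2) + 1

def total_sample_size(sd, max_width, anchor_group_proportion):
    """Scale the anchor-group n up to the full sample, given the expected
    share of participants who fall in that anchor group."""
    return math.ceil(n_for_ci_width(sd, max_width) / anchor_group_proportion)

# Hypothetical inputs (NOT from the source): SD of change = 2.0,
# 25% of participants classified as 'moderately improved'.
n_group = n_for_ci_width(sd=2.0, max_width=1.0)
n_total = total_sample_size(sd=2.0, max_width=1.0, anchor_group_proportion=0.25)
print(n_group, n_total)
```

A t-based interval would give a slightly larger n; the margin for dropout mentioned above would be added on top of either figure.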

Table 2.

Summary of psychometric analyses performed in the phase 3 clinical trial data

Analysis Description
Stage 1: Item properties
Quality of completion The quality of completion of the HESD was evaluated to identify any items with unexpectedly high levels of missing data. Missing data at the form-level and item-level were summarized at Baseline and Weeks 2, 4, 8, 12 and 16.
Item response distributions

Response distributions for each HESD item were examined to assess the frequency and percentage of each endorsed response and to identify any response options that were overly favored or evidence of unexpected or skewed distributions. This was assessed at Baseline and Weeks 2, 4, 8, 12 and 16.

Percentages of minimum and maximum responses were also calculated to examine floor and ceiling effects for all items. A floor effect was defined as a high percentage of patients endorsing the response option 0 and a ceiling effect was defined as a high percentage of patients endorsing the response option 10. A substantial effect was defined as > 15% of respondents. Items with a substantial proportion of patients scoring at floor/ceiling were flagged for further consideration.
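
The floor/ceiling rule described above can be sketched as follows. The function name and the example responses are illustrative assumptions; only the 0–10 scale and the > 15% criterion come from the text.

```python
def floor_ceiling(responses, minimum=0, maximum=10, threshold=15.0):
    """Return the percentage of responses at the scale minimum and maximum,
    and whether each percentage exceeds the flagging threshold."""
    n = len(responses)
    if n == 0:
        raise ValueError("no responses supplied")
    floor_pct = 100.0 * sum(r == minimum for r in responses) / n
    ceiling_pct = 100.0 * sum(r == maximum for r in responses) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_flag": floor_pct > threshold,
        "ceiling_flag": ceiling_pct > threshold,
    }

# Hypothetical responses to one item on the 0-10 NRS (illustrative only).
example = [0, 0, 1, 3, 4, 5, 5, 6, 7, 10]
print(floor_ceiling(example))
```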

Stage 2: Dimensionality and scoring
Inter-item correlations Inter-item correlations were examined for each pair of items in the HESD to ensure each item measured a distinct concept without any redundancy. Items that correlated very highly with one another (≥ 0.90) were flagged for review.
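
A minimal sketch of the redundancy check, assuming Pearson correlations (the correlation type is not stated in this row) and using made-up item scores:

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_redundant_pairs(items, cutoff=0.90):
    """items: dict of item name -> scores, respondents in the same order.
    Returns the item pairs whose correlation is at or above the cutoff."""
    flagged = []
    for a, b in combinations(items, 2):
        r = pearson(items[a], items[b])
        if r >= cutoff:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical data: 'pain' is a perfect linear function of 'itch'.
data = {
    "itch": [2, 4, 6, 8, 10],
    "pain": [1, 2, 3, 4, 5],
    "redness": [5, 1, 4, 2, 3],
}
print(flag_redundant_pairs(data))
```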
Rasch analysis

Rasch analysis was conducted to confirm the underlying unidimensional structure of the HESD and assess item performance.

Item fit and person fit statistics were examined, with values between 0.5 and 1.5 considered optimal for infit mean square (MNSQ) and values between 0.5 and 2.0 considered optimal for outfit MNSQ [24].

Item characteristic curves were evaluated to graphically represent the probability of a participant selecting each response category (response option) for each item given the latent severity as estimated by the Rasch model. Evidence of overlapping categories (suggesting there may be too many response options), or disordered curves (suggesting that the response options are not behaving as intended) was flagged for consideration.

Yen’s Q3 local dependence indices were produced, with the expectation that item residual correlations should deviate by < 0.30 from the average residual correlation; a larger deviation indicates that the responses to one item depend on the responses to another (i.e., the items are locally dependent) [25–27].

Item-person maps presenting the location of both respondents and items on the same latent trait were examined to evaluate the spread of item difficulty parameters and their ability to capture the severity of participants in the population.

Stage 3: Reliability and validity of scores
Reliability
Internal consistency reliability

Internal consistency reliability of the HESD score was evaluated using Cronbach’s alpha coefficient (≥ 0.70 for good internal consistency) to assess the homogeneity of items within the HESD score [28].

The impact of item removal on internal consistency reliability was examined by calculating Cronbach’s alpha with each item removed from the HESD score in turn. If the removal of an item causes the alpha value to notably increase, then that item may not be fitting well within its domain.

Corrected item-total correlations were also calculated by computing the Pearson correlation coefficient of each item with the sum of the remaining items within its corresponding score. Items with a correlation < 0.40 were considered evidence that the item did not fit well with the other items [29].
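
The three internal consistency statistics described above can be sketched with the standard formulas. The helper names and the example data are illustrative assumptions, not the trial analysis code.

```python
import math
from statistics import variance

def pearson(x, y):
    """Pearson correlation of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def cronbach_alpha(items):
    """items: one score list per item, respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def alpha_if_deleted(items):
    """Cronbach's alpha recomputed with each item removed in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items."""
    result = []
    for i, col in enumerate(items):
        rest = [sum(scores) for scores in zip(*(items[:i] + items[i + 1:]))]
        result.append(pearson(col, rest))
    return result

# Hypothetical scores for three items from five respondents.
items = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [1, 3, 3, 5, 5]]
print(cronbach_alpha(items), alpha_if_deleted(items), corrected_item_total(items))
```

Alpha-if-deleted needs at least three items, because alpha is undefined for a single remaining item.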

Test-retest reliability

Test-retest reliability was evaluated for the HESD Itch score (weekly average), HESD Pain score (weekly average) and HESD score (weekly average) by examining the stability of scores between Week 2 and 4 and between Week 4 and 8 in patients defined as having ‘stable’ CHE based on other assessments in the trial.

Subgroups of patients with ‘stable’ CHE were defined as patients with no change in the relevant PGI-S measure, no change on the PaGA or no change in the IGA-CHE in separate analyses.

Intra-class correlation coefficients (ICCs) were calculated and evaluated using pre-specified cut-off criteria: < 0.50 indicating poor reliability, 0.5–0.75 indicating moderate reliability, 0.75–0.90 indicating good reliability and > 0.90 indicating excellent reliability [30]. Pearson’s correlation coefficients were also calculated.
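
A sketch of the test-retest calculation for two occasions. The ICC form is not stated in this section, so a two-way random-effects, absolute-agreement, single-measure ICC is assumed here; the handling of values that fall exactly on a cut-off is also an assumption.

```python
def icc_absolute_agreement(time1, time2):
    """Two-way random-effects, absolute-agreement, single-measure ICC
    for paired scores from two occasions (assumed form)."""
    n, k = len(time1), 2
    rows = list(zip(time1, time2))
    grand = (sum(time1) + sum(time2)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(time1) / n, sum(time2) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)              # between-participant mean square
    ms_cols = ss_cols / (k - 1)              # between-occasion mean square
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def classify_icc(icc):
    """Apply the pre-specified cut-offs; boundary values go to the lower band
    except 0.50 and 0.75, which go to the higher band (assumption)."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"

# Hypothetical weekly average scores for stable participants.
week2 = [6.1, 4.0, 7.3, 2.9, 5.5]
week4 = [5.8, 4.4, 7.0, 3.1, 5.9]
icc = icc_absolute_agreement(week2, week4)
print(round(icc, 3), classify_icc(icc))
```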

Construct validity
Convergent validity

Convergent validity was evaluated by calculating polyserial and Spearman’s correlations of the HESD Itch score, HESD Pain score and HESD score with the DLQI total score, DLQI symptoms and feelings score and DLQI item 1 (which assesses itch, pain, soreness and stinging).

Convergent validity evaluates the relationship with other measures that assess similar or related concepts. Correlations of < 0.50 were defined a priori as ‘weak’, those ≥ 0.50 and < 0.70 were defined as ‘moderate’, those ≥ 0.70 and < 0.90 were defined as ‘strong’, and those ≥ 0.90 were considered ‘very strong’ [31].

The following relationships were hypothesized to exist:

 DLQI total score > 0.30 with HESD Itch score, HESD Pain score and HESD score

 DLQI symptoms and feelings subscale > 0.50 with HESD Itch score, HESD Pain score and HESD score

 DLQI item 1: itch, pain, soreness and stinging > 0.50 with HESD Itch score and HESD Pain score
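
The interpretation bands and hypothesised thresholds listed above can be expressed as a small lookup. In this sketch, a value of exactly 0.70 is treated as 'strong', and the observed correlations passed in are hypothetical.

```python
def classify_correlation(r):
    """Label a correlation using the a priori bands (absolute value used)."""
    a = abs(r)
    if a < 0.50:
        return "weak"
    if a < 0.70:
        return "moderate"
    if a < 0.90:
        return "strong"
    return "very strong"

# Hypothesised minimum correlations with the HESD scores, keyed by DLQI score.
HYPOTHESES = {
    "DLQI total": 0.30,
    "DLQI symptoms and feelings": 0.50,
    "DLQI item 1": 0.50,
}

def check_hypotheses(observed):
    """observed: dict of DLQI score name -> observed correlation.
    Returns whether each observed value exceeds its hypothesised threshold."""
    return {name: observed[name] > HYPOTHESES[name] for name in observed}

# Hypothetical observed correlations (illustrative only).
observed = {"DLQI total": 0.55, "DLQI symptoms and feelings": 0.62, "DLQI item 1": 0.48}
print({k: classify_correlation(v) for k, v in observed.items()})
print(check_hypotheses(observed))
```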

Known-groups analysis

The known-groups method was used to evaluate differences in HESD Itch, HESD Pain and HESD total score among groups of patients expected to differ in severity. Known-groups were defined using scores on the PaGA, IGA-CHE, PGI-S Itch, PGI-S Pain and HESD PGI-S.

Mean HESD Itch, HESD Pain and HESD total scores were compared between severity groups, with F-test one-way ANOVAs used to test the mean score differences between groups. The pre-specified criterion for known-groups validity was considered met if statistically significant differences (p < 0.05) in mean HESD Itch, HESD Pain and HESD total scores were observed between the known groups, and scores increased monotonically as expected.

Between-group effect sizes (ES) were also calculated as a measure of the magnitude of differences in scores between groups. The following pre-specified cut-offs were used to interpret the magnitude of each ES: small (ES = 0.20), moderate (ES = 0.50) and large (ES = 0.80) [32].
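
A sketch of the known-groups statistics. The one-way ANOVA F statistic follows the textbook formula; the effect size is assumed to be a standardized mean difference with a pooled SD (the exact ES formula is not given here), and the 'negligible' label for values below 0.20 is an added convenience.

```python
import math
from statistics import mean, variance

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA across lists of scores."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def pooled_sd_effect_size(a, b):
    """Standardized mean difference (b minus a) using the pooled SD."""
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (
        len(a) + len(b) - 2
    )
    return (mean(b) - mean(a)) / math.sqrt(pooled_var)

def es_label(es):
    """Interpret an effect size against the 0.20 / 0.50 / 0.80 cut-offs."""
    a = abs(es)
    if a >= 0.80:
        return "large"
    if a >= 0.50:
        return "moderate"
    if a >= 0.20:
        return "small"
    return "negligible"

# Hypothetical HESD scores for three severity groups (illustrative only).
mild, moderate, severe = [1, 2, 3], [2, 3, 4], [3, 4, 5]
print(one_way_anova_f([mild, moderate, severe]))
print(es_label(pooled_sd_effect_size(mild, severe)))
```

The p-value for the F statistic needs the F distribution, which the standard library does not provide; a statistics package would supply it in practice.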

Ability to detect change

Ability to detect change was assessed for the HESD Itch (weekly average), HESD Pain (weekly average) and HESD score (weekly average) using data from Baseline to Week 16 in the ability to detect change analysis population.

Within-group effect sizes [32, 33] and between-group one-way ANOVA F-test were calculated to evaluate the magnitude and significance of differences in change scores between each group, respectively.

Patients were categorized into ‘improved’, ‘no change’ and ‘worsened’ groups as follows:

PaGA, IGA-CHE, PGI-S Itch, PGI-S Pain, HESD PGI-S:

 Improved: ≥ 1-level improvement

 Stable: No change

 Worsened: ≥ 1-level worsening

PGI-C Itch, PGI-C Pain, HESD PGI-C:

 Improved: ‘Much better’ or ‘A little better’

 Stable: ‘No change’

 Worsened: ‘Much worse’ or ‘A little worse’
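
The grouping rules above amount to two small mappings. The sketch assumes severity anchors are coded as integers where a higher level means more severe (as for the 0–4 IGA-CHE and PaGA scales); the numeric coding of the PGI-S is an assumption.

```python
PGIC_GROUPS = {
    "much better": "improved",
    "a little better": "improved",
    "no change": "stable",
    "a little worse": "worsened",
    "much worse": "worsened",
}

def group_from_levels(baseline_level, followup_level):
    """Severity anchors (PaGA, IGA-CHE, PGI-S): any change of one level
    or more counts as improved or worsened."""
    if followup_level < baseline_level:
        return "improved"
    if followup_level > baseline_level:
        return "worsened"
    return "stable"

def group_from_pgic(response):
    """PGI-C anchors: map the verbal response to a change group."""
    return PGIC_GROUPS[response.strip().lower()]

print(group_from_levels(3, 2), group_from_pgic("A little worse"))
```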

Interpretation of scores
Anchor-based methods

Anchor-based analyses were performed in the psychometric analysis population for the HESD Itch (weekly average), HESD Pain (weekly average) and HESD score (weekly average) using data on change from Baseline to Week 16.

First, the suitability of proposed anchors was tested using a polyserial correlation coefficient to establish the relationship between the anchor categories and change in HESD scores. Anchors with correlations of < 0.3 were not taken forward for analysis [34].

Within-individual change thresholds were recommended by considering the mean change in PROM score for participants classified as either ‘moderately’ or ‘minimally’ improved according to PGI-S (Itch, Pain and HESD), PGI-C (Itch, Pain and HESD), PaGA and IGA-CHE.

Empirical cumulative distribution function (eCDF) and probability density function (PDF) plots were used to allow various proposed responder definitions generated to be evaluated simultaneously.

Between-group differences in mean change PROM score were calculated for participants as defined above. The minimal important difference (MID) estimate for each anchor was defined as the difference in mean change score between minimal improvement and no change groups.

Additionally, to guide triangulation of estimates (but not explicitly define them), a correlation-weighted average was calculated, where estimates were weighted by the observed correlations between change in anchor and score as follows:

$$M_{\text{weighted}} = \frac{\sum_{i=1}^{n} |r_i|\, x_i}{\sum_{i=1}^{n} |r_i|}$$

where x denotes each [absolute] estimate and r denotes the [absolute] correlation coefficient of each anchor-scale combination, for each i of n total estimates. Fisher’s z transformation was applied to the correlation coefficients [35].
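
The MID estimate and the correlation-weighted average can be sketched as below. How Fisher's z transformation enters the weighting is not spelled out in the text; this sketch assumes the weights are the z-transformed absolute correlations, atanh(|r|), and offers the untransformed |r| as an alternative. All input values are hypothetical.

```python
import math
from statistics import mean

def mid_estimate(change_minimal_improvement, change_no_change):
    """Difference in mean change score between the minimally improved
    and the no-change anchor groups."""
    return mean(change_minimal_improvement) - mean(change_no_change)

def correlation_weighted_average(estimates, correlations, fisher_z=True):
    """Weighted mean of absolute estimates, weighted by absolute correlations.
    With fisher_z=True the weights are atanh(|r|) (assumed interpretation)."""
    weights = [math.atanh(abs(r)) if fisher_z else abs(r) for r in correlations]
    return sum(w * abs(x) for w, x in zip(weights, estimates)) / sum(weights)

# Hypothetical threshold estimates from three anchors and their correlations
# with change in score (illustrative values only).
estimates = [-3.6, -4.2, -4.0]
correlations = [0.45, 0.62, 0.55]
print(round(correlation_weighted_average(estimates, correlations), 2))
```

Because the weights only shift emphasis between anchors, the result always lies between the smallest and largest absolute estimate.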

Distribution-based methods

Distributional properties of the HESD Itch (weekly average), HESD Pain (weekly average) and HESD score (weekly average) were used to provide an indication of the amount of change beyond measurement error that may be considered meaningful.

Estimates were calculated as 0.5 of the standard deviation (SD) at baseline [36, 37] and the standard error of measurement (SEM). The SEM was calculated as the SD at baseline multiplied by the square root of one minus the reliability of the score at baseline [$\mathrm{SD} \times (1 - r)^{1/2}$]. The ICC calculated as part of test-retest reliability analyses using the PGI-S anchor was used for the reliability coefficient.
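
Both distribution-based estimates follow directly from the baseline SD. The sketch assumes the sample SD (n − 1 denominator); the baseline scores and the reliability value are hypothetical.

```python
import math
from statistics import stdev

def half_sd(baseline_scores):
    """0.5 x SD at baseline."""
    return 0.5 * stdev(baseline_scores)

def standard_error_of_measurement(baseline_scores, reliability):
    """SEM = SD at baseline x sqrt(1 - reliability)."""
    return stdev(baseline_scores) * math.sqrt(1 - reliability)

# Hypothetical baseline weekly average scores and test-retest ICC.
baseline = [5.2, 6.8, 7.1, 4.9, 8.3, 6.0]
print(round(half_sd(baseline), 2))
print(round(standard_error_of_measurement(baseline, reliability=0.90), 2))
```

The two estimates coincide when the reliability is 0.75; with higher reliability the SEM is the smaller of the two.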

Results

Qualitative Interviews

Interview Sample Characteristics

A total of 20 adults with CHE from the US were interviewed (n = 10 in each interview round). The interview sample represented a diverse range of demographic and clinical characteristics (Table 3).

Table 3.

Demographic and clinical characteristics of interview participants

Description Number of interview participants (N = 20)
Gender, n (%)
 Female 14 (70.0%)
 Male 6 (30.0%)
Age
 Mean (range) 46.6 (18–69)
Ethnicity, n (%)
 Hispanic or Latinx 3 (15.0%)
 Non-Hispanic or Latinx 17 (85.0%)
Race, n (%)
 White/Caucasian 13 (65.0%)
 Black/African American 5 (25.0%)
 Asian or Pacific Islander 1 (5.0%)
 Indian 1 (5.0%)
Work status, n (%)
 Working full or part time 16 (80.0%)
 Retired 2 (10.0%)
 Full time homemaker 1 (5.0%)
 Unemployed 1 (5.0%)
Occupation, n (%)
 Office/administration work 6 (30.0%)
 Unspecified 5 (25.0%)
 Physician 1 (5.0%)
 Waitress 1 (5.0%)
 Cleaner 1 (5.0%)
 Eye technician 1 (5.0%)
 Horse groomer 1 (5.0%)
 Nurse 1 (5.0%)
 Sales 1 (5.0%)
 Housekeeper 1 (5.0%)
 Chef 1 (5.0%)
Highest level of education, n (%)
 Some high school 3 (15.0%)
 Completed high school 5 (25.0%)
 Some years of college 3 (15.0%)
 Undergraduate or bachelor’s degree 8 (40.0%)
 Graduate degree 1 (5.0%)
Clinician reported hand eczema subtype, n (%)
 Atopic hand eczema 9 (45.0%)
 Allergic contact dermatitis 6 (30.0%)
 Irritant contact dermatitis 5 (25.0%)
Physician Global Assessment of Disease Severity (PGA), n (%)
 Mild 5 (25.0%)
 Moderate 9 (45.0%)
 Severe 6 (30.0%)
Patient Global Assessment of Disease Severity (PaGA), n (%)
 Mild 5 (25.0%)
 Moderate 7 (35.0%)
 Severe 7 (35.0%)
 Missing 1 (5.0%)
Time since diagnosis of hand eczema, n (%)
 < 1 year 1 (5.0%)
 1–4 years 3 (15.0%)
 5–10 years 9 (45.0%)
 11–15 years 2 (10.0%)
 > 15 years 5 (25.0%)

Cognitive Debriefing of HESD

The items developed to assess all sign/symptom concepts identified as important during CE were generally found to be relevant and well understood (Table 4). Items assessing itch, redness, roughness and dryness were relevant for all participants, regardless of CHE subtype. Most participants correctly understood and accurately used the recall period throughout the HESD (≥ 75.0% for each item). All participants (n = 20/20) demonstrated an understanding of both the ‘severe’ and ‘worst imaginable’ options for the upper anchor label of the response scale. However, most (n = 10/16, 62.5%) reported a preference for the ‘severe’ option as it better reflected how they perceived their experience of signs/symptoms (n = 5), and it was easier for them to understand (n = 3). All participants asked found the touchscreen to be responsive (n = 6/6), the device light and easy to hold (n = 10/10), the font size appropriate and easy to read (n = 13/13) and navigating between the items to be easy (n = 13/13).

Table 4.

Overview of HESD item understanding and relevance

HESD item Reported during CE CD understanding CD relevance Example supportive quotes
S P Total
Itch 19/20 1/20 20/20 20/20 20/20 “…I'm itchy all the time.” (Female aged 65 with severe atopic CHE)
Burning 13/20 13/20 20/20 18/20 “…burning means like it’s just a really hot sensation” (Male aged 57 with mild allergic CHE)
Pain 12/20 6/20 18/20 20/20 18/20 “…it’s painful…pain in there that you just can’t stand it.” (Female aged 33 with moderate atopic CHE)
Cracked skin 11/20 8/20 19/20 20/20 19/20 “…basically like splitting or separation of the skin, um, and like it’s visible or I can feel it.” (Male aged 41 with severe allergic CHE)
Redness 17/20 3/20 20/20 20/20 20/20 “…redness, how red. How, how itching, you know, changes color from your normal skin to the redness.” (Female aged 33 with moderate atopic CHE)
Heat 7/20 5/20 12/20 19/20 14/18 “…it’s mostly when you’re scratching on the hand is when it starts, you know, kind of getting warm or hot.” (Male aged 38 with moderate atopic CHE)
Dry skin 17/20 1/20 18/20 20/20 20/20 “Dryness to me is when it feels really rough and like not calloused…Like you could hear the roughness of it.” (Female aged 18 with mild atopic CHE)
Swelling 8/20 8/20 16/20 20/20 14/20 “…I get to scratching, and they swell up.” (Male aged 51 with moderate irritant CHE)
Blisters 6/20 5/20 11/20 19/20 14/20 “…when I think of blisters I think of actual puffiness…and maybe having a little liquid in it.” (Male aged 41 with severe allergic CHE)
Bleeding 9/20 6/20 15/20 20/20 13/19 “…any time my hands crack and they split then you could, you could see the bleeding.” (Female aged 53 with moderate atopic CHE)
Thickening 6/20 11/20 17/20 19/20 15/20 “Thickening, I think just like calluses and stuff, like roughness.” (Female aged 54 with mild irritant CHE)
Flaking 10/20 4/20 14/20 20/20 18/20 “…it’s just basically thin layers of your skin, dried skin coming off…It’s just little pieces of dried skin that peels off.” (Female aged 40 with severe atopic CHE)
Roughness 12/20 5/20 17/20 20/20 20/20 “It’s the, the texture of your skin. Like where if you have normal hand texture it’s smooth and soft. This is not smooth or soft.” (Female aged 40 with severe atopic CHE)
Hardnessa 10/20 1/20 11/20 10/10 8/10 “The skin… definitely gets hard. You can feel it.” (Male aged 61 with mild irritant CHE)
Oozing/weepinga 5/20 6/20 11/20 10/10 6/10 “It’s kind of the same as blisters… sometimes it comes like even when you scratch it and when you open it, like when you have a crack and then it starts maybe oozing like a little water.” (Female aged 25 with moderate allergic CHE)

CD Cognitive Debriefing; CE Concept Elicitation; HESD Hand Eczema Symptom Diary; S concepts mentioned spontaneously; P concepts only mentioned when probed by the interviewer

aItems only debriefed in round 2 (n = 10)

Findings from the first round of interviews suggested several signs/symptoms were closely related or conceptually equivalent. To explore item redundancy and inform item reduction, seven participants (n = 7/10) in round two were asked to group the dermatological signs/symptoms they perceived to overlap conceptually. The most frequently reported conceptual overlaps were between flaking and cracking (n = 6/7), heat and burning (n = 5/7), hardness and roughness (n = 5/7) and hardness and thickening (n = 5/7).

Based on the findings from round one, input from the expert dermatologists and further consideration of the earlier CE findings, two additional items were added to assess concepts reported during the interviews not captured by the initial HESD (‘skin hardness’ and ‘oozing/weeping’). The original 13 items were retained for further evaluation in round two without revisions.

Additional modifications were made to the HESD following round two. Items assessing skin ‘hardness’ and ‘roughness’ were removed due to their similarity/overlap with skin ‘thickening’. Thickening was retained as it reflected the terminology used most frequently by participants when describing this sign/symptom. The item assessing ‘blisters’ was also removed because of its low relevance during CE and input from the expert dermatologists that blisters are relatively rare and typically representative of severe disease and therefore would not be relevant to most CHE patients. As blisters do not change rapidly from day to day and are a sign that is easily observed by clinicians, it was decided they would be more appropriate to evaluate in clinician-completed assessments, such as the IGA-CHE and Hand Eczema Severity Index (HECSI). Additionally, the item assessing ‘heat’ was removed due to its low relevance during CE and overlap with ‘burning’. This resulted in the 11-item HESD taken forward for psychometric validation in the phase 2b trial.

Modifications to HESD Following Initial Psychometric Validation and Regulatory Feedback

Based on the findings of the initial psychometric validation using the phase 2b data but importantly also considering the findings of the qualitative research and clinical input, the HESD was reduced from 11 to 8 items. Items assessing ‘bleeding’ and ‘oozing/weeping’ were removed because of poor item performance (i.e., heavily skewed response distributions across all timepoints [range 42.9–78.2%]; see Fig. S1) and low correlations (range 0.36–0.66; Table S2) with other items in the measure. Expert clinical input suggested these signs/symptoms were more characteristic of a severe CHE flare rather than daily experience and could likely be adequately assessed through clinician-completed assessments. Additionally, the qualitative findings and high correlations (r = 0.90; Table S2) suggested ‘pain’ and ‘burning’ were very closely conceptually related to the point of redundancy. Expert clinical input confirmed that ‘pain’ was more important to assess from a clinical perspective and therefore ‘burning’ was removed.

Following US FDA feedback, the HESD was further reduced from 8 to 6 items. Items assessing ‘swelling’ and ‘thickening’ were removed because the US FDA had questioned their importance to CHE patients, given that < 50.0% of interview participants spontaneously reported experience of these symptoms (Table 4) and response distributions for the phase 2b trial indicated that > 15.0% reported ‘no sign or symptom’ at Baseline for these items (Fig. S1). This resulted in the 6-item HESD brought forward for psychometric validation in the phase 3 trial (Fig. 2).

Fig. 2.


Six-item HESD conceptual framework

Psychometric Validation of the HESD in Phase 3 Clinical Trial

For the psychometric results, we focus on the results from the evaluation of the final 6-item version in the phase 3 clinical trial. However, results from the validation of the 8-item HESD using the phase 2b data were comparable and equally strong and are summarized in Table S1.

Trial Sample Characteristics

Key demographic and clinical characteristics of the phase 3 sample are provided in Table 5. The sample included more female (65.7%) than male participants and most were clinically classified as Fitzpatrick skin types II or III (43.2% and 41.1%, respectively).

Table 5.

Demographic and clinical characteristics for the psychometric analysis population in the phase 3 trial

Demographics Psychometric analysis population (N = 280)
Gender—n (%)
 Female 184 (65.7%)
 Male 96 (34.3%)
Age
 n 280
 Mean (SD) 43.3 (14.3)
 Median 44
 Q1, Q3 31, 55
 Min, max 19, 77
 Missing 0
Race—n (%)
 American Indian or Alaska Native 1 (0.4%)
 Asian 12 (4.3%)
 Black or African American 2 (0.7%)
 White 247 (88.2%)
 Multiple 1 (0.4%)
 Not reported 16 (5.7%)
 Other 1 (0.4%)
Ethnic origin—n (%)
 Hispanic or Latinx 10 (3.6%)
 Not Hispanic or Latinx 255 (91.1%)
 Not reported 15 (5.4%)
Fitzpatrick skin type—n (%)
 Type I 14 (5.0%)
 Type II 121 (43.2%)
 Type III 115 (41.1%)
 Type IV 26 (9.3%)
 Type V 3 (1.1%)
 Type VI 1 (0.4%)
IGA-CHE—n (%)
 3—Moderate 189 (67.5%)
 4—Severe 91 (32.5%)

Psychometric analysis population defined as the first 280 participants randomised with IGA-CHE completion at Week 16. Some countries such as France are not allowed to report ethnicity and race data. Where this is the case the 'Not reported' option is used; this is different from missing data

SD standard deviation

Stage 1: Item Properties

Quality of completion. Quality of completion was relatively high across all timepoints, decreasing only slightly over time. For the 7 days prior to the Baseline visit, 80.7% (n = 226) had no missing HESD daily item scores, and for the 7 days prior to the Week 16 visit, 72.9% (n = 204) had no missing HESD daily item scores. Over 94.0% had at least four complete days of HESD daily scores for a given week and most participants with missing data were missing just 1 or 2 days in each week.

Item-response distributions. For all items, post-Baseline item responses were distributed across the response scale and in line with expectations (see Figures S2, S3 and S4). Ceiling effects (> 15.0% scoring the highest possible score) were only detected during the Baseline week for ‘cracked skin’ at Day −2 (16.0%) and Day −1 (16.1%) and for ‘dryness’ from Day −7 to Day −1 (17.1–18.9%). This was limited and not a concern as participants were expected to have more severe CHE at entry. Starting at Week 4, floor effects (> 15.0% scoring the lowest possible score) became dominant, reflecting improvements associated with treatment.
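The floor and ceiling checks described above can be sketched in a few lines. This is a minimal illustration only: it assumes the 11-point NRS runs from 0 to 10, the function names are hypothetical, and the example responses are invented rather than taken from the trial.

```python
from typing import Dict, Sequence, Tuple


def floor_ceiling_rates(
    scores: Sequence[int], low: int = 0, high: int = 10
) -> Tuple[float, float]:
    """Return the share of responses at the lowest and highest scale points."""
    n = len(scores)
    if n == 0:
        raise ValueError("no responses supplied")
    floor_rate = sum(1 for s in scores if s == low) / n
    ceiling_rate = sum(1 for s in scores if s == high) / n
    return floor_rate, ceiling_rate


def flag_effects(scores: Sequence[int], threshold: float = 0.15) -> Dict[str, bool]:
    """Flag a floor or ceiling effect when more than `threshold` of responses
    sit at the corresponding end of the scale (15.0% in the text above)."""
    floor_rate, ceiling_rate = floor_ceiling_rates(scores)
    return {"floor": floor_rate > threshold, "ceiling": ceiling_rate > threshold}


# Invented single-day responses for one item (not trial data).
example = [10, 10, 8, 7, 9, 6, 10, 5, 7, 8]
print(flag_effects(example))
```

In practice the check would be repeated per item and per diary day, which is how day-specific ceiling effects such as those reported for the Baseline week would surface.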

Stage 2: Dimensionality and Scoring

Inter-item correlations. Inter-item correlations were calculated using data from Day −1 of Week 4 to provide an initial exploration of dimensionality. All inter-item correlations were moderate to high (0.71–0.89), and no correlations exceeded 0.90, suggesting all items are closely related but no item-pairs were so closely related as to suggest redundancy (Table S3).

Rasch analysis. Rasch analysis provided further support for a single unidimensional structure of the HESD, with all items showing acceptable fit statistics (see Online Appendix 1 for full details). Some evidence of person misfit was found for the infit and outfit statistics for the overall model; however, the patterns were not of concern given the low proportion of participants with high fit statistics. Item characteristic curves provided evidence the response scale for each item was appropriate, with distinct peaks for most response options on the 11-point NRS. Independence between item responses was demonstrated for all items except ‘dryness’ and ‘flaking’.

Stage 3: Reliability and Validity of Scores

Internal consistency reliability. Internal consistency was examined using data from Day −1 of Week 4 to assess the homogeneity of items belonging to the HESD weekly average score (Table S4). Cronbach’s alpha was high (0.96) and well above the a priori threshold of > 0.70, indicating good internal consistency. Calculation of the alpha coefficient with each item deleted in turn resulted in slightly lower Cronbach’s alpha values for all items (0.95–0.96), providing support for retaining all items.
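For readers less familiar with the statistic, Cronbach's alpha and the "alpha if item deleted" check can be sketched as follows. This is an illustrative implementation of the standard formula, not the analysis code used in the study; the function names are hypothetical and the data are simulated.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for a 2-D array (rows = participants, columns = items)."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)       # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)   # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)


def alpha_if_item_deleted(items) -> list:
    """Recompute alpha with each item (column) removed in turn."""
    x = np.asarray(items, dtype=float)
    return [cronbach_alpha(np.delete(x, j, axis=1)) for j in range(x.shape[1])]


# Simulated data: six items driven by one latent severity trait plus noise.
rng = np.random.default_rng(0)
trait = rng.normal(5.0, 2.0, size=200)
simulated = trait[:, None] + rng.normal(0.0, 1.0, size=(200, 6))

print(round(cronbach_alpha(simulated), 2))
print([round(a, 2) for a in alpha_if_item_deleted(simulated)])
```

If removing an item raised alpha noticeably, that item would be a candidate for removal; values that stay the same or drop slightly, as reported above, support keeping every item.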

Test-retest reliability. Test-retest reliability results for participants defined as stable were ‘excellent’ (ICC range 0.89–0.94) for the weekly average HESD Itch score, HESD Pain score and HESD score between Week 2 and 4 and between Week 4 and 8, irrespective of the anchor used to define stability. Pearson’s correlation coefficients (range 0.84–0.91) were similar to ICCs, providing further evidence of strong test-retest reliability for the HESD scores (Table 6). The Spearman-Brown Prophecy Formula, used to approximate the reliability of the averaged measurement when creating a weekly summary score from < 7 days, confirmed that a summary score based on just 4 days of data would still have good reliability (> 0.83).
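The Spearman-Brown step can be illustrated with a short sketch. The general formula predicts the reliability of a score when the number of contributing measurements is multiplied by a factor n: R_new = nR / (1 + (n − 1)R). The 7-day reliability used below (0.90) is a hypothetical input chosen for illustration, not a value taken from the study analysis, and the function name is an assumption.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when the number of measurements is multiplied
    by `length_factor` (a value below 1 shortens the score)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical reliability of a full 7-day average.
seven_day_reliability = 0.90

# Stepping down from 7 days to 4 days means a length factor of 4/7.
four_day_reliability = spearman_brown(seven_day_reliability, 4 / 7)
print(round(four_day_reliability, 3))  # 0.837
```

Under this assumed input, a 4-day average retains a reliability of about 0.84, which is the kind of result that would justify scoring weeks with some missing diary days.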

Table 6.

Weekly average HESD Itch, HESD Pain and HESD score intra-class correlation coefficient estimates of test-retest reliability

HESD score/period Stability definition N (%) ICC estimate 95% confidence intervals Pearson correlation coefficient
Lower Upper
HESD Itch score
 Week 2–4 PaGA 143 (51.1%) 0.91 0.77 0.96 0.89
IGA-CHE 160 (57.1%) 0.91 0.83 0.95 0.87
Itch PGI-S 149 (53.2%) 0.90 0.71 0.95 0.88
 Week 4–8 PaGA 145 (51.8%) 0.93 0.88 0.96 0.89
IGA-CHE 165 (58.9%) 0.92 0.87 0.95 0.87
Itch PGI-S 165 (58.9%) 0.91 0.84 0.95 0.87
HESD Pain score
 Week 2–4 PaGA 143 (51.1%) 0.91 0.81 0.95 0.88
IGA-CHE 160 (57.1%) 0.89 0.81 0.93 0.84
Pain PGI-S 146 (52.1%) 0.90 0.74 0.95 0.88
 Week 4–8 PaGA 145 (51.8%) 0.92 0.88 0.94 0.85
IGA-CHE 165 (58.9%) 0.93 0.89 0.95 0.87
Pain PGI-S 150 (53.6%) 0.94 0.90 0.96 0.90
HESD score
 Week 2–4 PaGA 143 (51.1%) 0.93 0.80 0.96 0.91
IGA-CHE 160 (57.1%) 0.91 0.81 0.95 0.87
HESD PGI-S 154 (55.0%) 0.92 0.77 0.96 0.90
 Week 4–8 PaGA 145 (51.8%) 0.93 0.88 0.96 0.89
IGA-CHE 165 (58.9%) 0.92 0.87 0.95 0.88
HESD PGI-S 161 (57.5%) 0.94 0.89 0.97 0.91

PaGA population consists of patients that reported a change in PaGA = 0 between Week 2 and 4. IGA-CHE population consists of patients that reported a change in IGA-CHE = 0 between Week 2 and 4. HESD PGI-S population consists of patients that reported a change in HESD PGI-S = 0 between Week 2 and 4. PaGA population consists of patients that reported a change in PaGA = 0 between Week 4 and 8. IGA-CHE population consists of patients that reported a change in IGA-CHE = 0 between Week 4 and 8. HESD PGI-S population consists of patients that reported a change in HESD PGI-S = 0 between Week 4 and 8

ICC intra-class correlation coefficient, IGA-CHE investigator global assessment of severity of Chronic Hand Eczema, PaGA patient global assessment of disease severity, HESD hand eczema symptom diary, PGI-S patient global impression of severity
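As an illustration of how an ICC of the kind shown in Table 6 can be obtained, the sketch below computes a two-way, absolute-agreement, single-measurement ICC from a participants-by-timepoints matrix. The specific ICC form is an assumption made for illustration, since this section does not state which form was used; the function name and the example scores are hypothetical.

```python
import numpy as np


def icc_absolute_agreement(scores) -> float:
    """Two-way, absolute-agreement, single-measurement ICC.

    `scores` is a 2-D array with one row per participant and one column
    per timepoint (for example, weekly averages at Week 2 and Week 4).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()   # between participants
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()   # between timepoints
    ss_error = ((x - grand_mean) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Invented weekly averages for eight "stable" participants at two timepoints.
stable_scores = [
    [6.1, 5.9], [3.0, 3.4], [7.7, 7.1], [4.4, 4.6],
    [2.0, 2.3], [8.6, 8.0], [5.3, 5.0], [1.4, 1.1],
]
print(round(icc_absolute_agreement(stable_scores), 2))
```

Restricting the calculation to participants whose anchor score did not change, as in Table 6, is what makes the coefficient interpretable as test-retest reliability rather than as sensitivity to real change.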

Convergent validity. Correlations were examined between the HESD Itch score, HESD Pain score and HESD score and the DLQI (total score, symptoms and feelings subscale, and item 1: itch, pain, soreness, stinging) at Week 4 (Table 7). All convergent correlations were moderate (range 0.53–0.64) for all three HESD scores and exceeded the hypothesized threshold, providing evidence of convergent validity.

Table 7.

Correlation of HESD scores with convergent measures at Week 4

Measure Convergent measures and correlations with HESD
Spearman correlation coefficients Polyserial correlation coefficients
DLQI total score DLQI symptoms and feelings subscale DLQI item 1: itch, pain, soreness, stinging
HESD Itch score 0.53 (n = 268)a 0.59 (n = 268)a 0.58 (n = 268)a
HESD Pain score 0.58 (n = 268)a 0.62 (n = 268)a 0.60 (n = 268)a
HESD score 0.59 (n = 268)a 0.64 (n = 268)a 0.60 (n = 268)a

Population includes all patients in the psychometric analysis population without form level missing data at Week 4. All measures are scored so that higher scores mean worse Chronic Hand Eczema severity. A very strong correlation is defined as ≥ 0.90, strong correlation as ≥ 0.70 but < 0.90, moderate correlation as ≥ 0.50 but < 0.70 and small correlation as < 0.50

HESD hand eczema symptom diary, DLQI Dermatology life quality index

aCorrelation strength met or exceeded the hypothesised level

Known-group validity. HESD Itch scores, HESD Pain scores and HESD scores were compared among groups who differed in severity as defined based on their Patient Global Assessment (PaGA), IGA-CHE and respective Itch, Pain and HESD PGI-S scores. For the HESD Itch, there was a pattern of significantly higher mean scores (indicating worse itch) for participants who also scored higher (worse) on the PaGA, IGA-CHE and Itch PGI-S (p < 0.001 for all scores), with expected monotonic increases across severity groups (Table 8). Effect sizes indicated differences between adjacent groups were moderate to large (ES > 0.69). Similar patterns of results were found for the HESD Pain scores (Table S5) and HESD scores (Table S6), providing evidence of known groups validity for all three HESD scores.

Table 8.

Known groups validity for the HESD itch scores at Week 4

Item/score anchor n Mean HESD itch score (SD) Median HESD itch score Between groups effect sizea p valueb
PaGA
 Responses 0–1: clear—almost clear [reference group] 45 1.7 (1.89) 1.3  < 0.001
 Response 2: mild 94 4.1 (2.30) 4.1 1.09
 Response 3: moderate 98 5.6 (2.11) 5.8 0.69
 Response 4: severe disease 31 7.4 (1.69) 7.9 0.87
IGA-CHE
 Responses 0–1: clear-almost clear [reference group] 31 2.5 (2.75) 1.6  < 0.001
 Response 2: mild 110 4.0 (2.49) 4.1 0.59
 Response 3: moderate 102 5.5 (2.24) 5.5 0.62
 Response 4: severe disease 26 6.8 (2.05) 7.3 0.63
Itch PGI-S
 None [reference group] 38 1.7 (2.15) 0.7  < 0.001
 Mild 126 4.0 (2.21) 4.1 1.07
 Moderate 76 6.0 (1.83) 6.2 0.95
 Severe 28 7.9 (1.34) 8.1 1.12

Population includes all patients in the psychometric analysis population without form level missing data at Week 4

aCalculated using Hedges' g between adjacent groups. Hedges' g is calculated as the difference in means divided by the pooled standard deviation

bThe statistical significance (threshold p < 0.05) of differences in scores between groups was calculated using the F-test of one-way ANOVAs

SD standard deviation, ANOVA analysis of variance, HESD hand eczema symptom diary, IGA-CHE investigator global assessment of severity of Chronic Hand Eczema, PaGA patient global assessment of disease severity, PGI-S patient global impression of severity
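The between-group effect sizes in Table 8 can be approximated from the published summary statistics. The sketch below follows the footnote definition (difference in means divided by the pooled standard deviation) and offers the usual small-sample correction as an optional, assumed extra; the function names are hypothetical. Because the inputs are the rounded means and SDs printed in the table, the result will not match the reported value exactly.

```python
import math


def pooled_sd(n1: int, sd1: float, n2: int, sd2: float) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def hedges_g(mean1, sd1, n1, mean2, sd2, n2, small_sample_correction=False):
    """Standardised difference (group 2 minus group 1) using the pooled SD.

    With small_sample_correction=True the estimate is multiplied by
    1 - 3 / (4 * (n1 + n2) - 9); whether the study applied this correction
    is not stated in this section.
    """
    g = (mean2 - mean1) / pooled_sd(n1, sd1, n2, sd2)
    if small_sample_correction:
        g *= 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)
    return g


# PaGA 'clear/almost clear' (n = 45) versus 'mild' (n = 94), values from Table 8.
g = hedges_g(1.7, 1.89, 45, 4.1, 2.30, 94)
print(round(g, 2))
```

With these rounded inputs the estimate comes out at roughly 1.10, close to the 1.09 reported in Table 8; the small gap is consistent with rounding of the published means and SDs.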

Ability to detect change. Changes in HESD Itch scores, HESD Pain scores and HESD scores were compared among participants defined as “improved”, “stable” and “worsened” on the PaGA, IGA-CHE and the respective Itch, Pain and HESD PGI-S and PGI-C between Baseline and Week 16. These results provide evidence the three HESD scores can detect change over time, regardless of the rating used to define change. As shown in Table 9, for all anchors there was strong evidence that the HESD Itch score is responsive to improvements over time, with consistently large within-group effect sizes (ES ≥ 2.67) compared with moderate-large effect sizes in the ‘stable’ group (ES range 0.73–1.12). The number of participants who worsened was extremely low for the PaGA, IGA-CHE and Itch PGI-S, and results for these groups should therefore be interpreted with caution. Differences between change groups were statistically significant, and effect sizes were mostly large between groups. Results were similar and equally strong for the HESD Pain scores (Table S7) and HESD scores (Table S8).

Table 9.

HESD itch scores ability to detect change between Baseline and Week 16

Grouping variable n Mean HESD itch change score (SD) Median HESD itch change score (min–max) Within groups effect sizea Between groups effect sizeb Between groups p valuec
PaGA
 ≥ 1-level improvement 179 −4.32 (2.49) −4.71 (−10.1) −2.73 −1.28  < 0.001
 Change score = 0 62 −1.16 (2.42) −0.77 (−8.4) −0.73
 ≥ 1-level worsening 15 −0.96 (1.78) −1.14 (−4.3) −0.61 0.08
IGA-CHE
 ≥ 1-level improvement 181 −4.23 (2.60) −4.71 (−10.2) −2.67 −0.96  < 0.001
 Change score = 0 74 −1.77 (2.46) −1.29 (−9.4) −1.12
 ≥ 1-level worsening 10 −0.05 (1.33) −0.07 (−2.2) −0.03 0.73
Itch PGI-S
 ≥ 1-level improvement 177 −4.39 (2.48) −4.71 (−10.1) −2.77 −1.33  < 0.001
 Change score = 0 66 −1.21 (2.14) −0.93 (−7.4) −0.77
 ≥ 1-level worsening 12 0.07 (1.93) 0.07 (−5.3) 0.04 0.61
Itch PGI-C
 ‘A little better’ + ‘Much better’ 169 −4.50 (2.44) −4.86 (−10.2) −2.84 −1.33  < 0.001
 ‘No change’ 60 −1.36 (2.07) −1.14 (−6.4) −0.86
 ‘A little worse’ + ‘Much worse’ 32 −1.06 (2.52) −0.33 (−8.3) −0.67 0.14

aMean change score divided by the SD of the score at the earlier of the two timepoints

bBetween-group effect size to evaluate the magnitude of differences between groups. Calculated using Hedges’ g as the difference between 2 groups. Hedges’ g is calculated as the difference in means divided by the pooled standard deviation. Each value in this column relates to the difference between the group in that row and the stable group for that anchor

cF-test from a one-way ANOVA

SD: Standard Deviation; ANOVA: Analysis of variance; IGA-CHE: Investigator global assessment of severity of Chronic Hand Eczema; PaGA: Patient global assessment of disease severity; HESD: Hand eczema symptom diary; HESD PGI-S: Hand eczema symptom diary patient global impression of severity; HESD PGI-C: Hand eczema symptom diary patient global impression of change

Interpretation of scores. Anchor correlations revealed that all anchors were sufficiently correlated with their target scores (range 0.54–0.70) to support meaningful change analysis. Mean change in HESD scores was calculated for participants defined as ‘minimally’ or ‘moderately’ improved on the PGI-S (Itch, Pain, HESD), PGI-C (Itch, Pain and HESD), PaGA and IGA-CHE to define within-group change thresholds. This provided a range of values that could be considered plausible as the meaningful change threshold for a within-individual responder definition: HESD Itch score (−2.8 to −5.4), HESD Pain score (−2.4 to −5.4) and HESD score (−2.7 to −5.2). A correlation weighted average of these estimates was then produced, weighted according to the strength of anchor correlation with the target score. This produced values of −4.3 for the HESD Itch score, −4.4 for the HESD Pain score and −4.2 for the HESD score. Consultation of empirical cumulative distribution function (eCDF) and probability density function (PDF) plots showed these thresholds would classify > 50.0% of ‘moderately’ and ‘much improved’ participants (based on the anchors) as improved while classifying few ‘stable’ participants as improved (see Fig. 3 as an example; see Figs. S5–S9 in the online supplementary material for the remaining eCDF plots). Between-group minimal important differences (MIDs) were also estimated with plausible ranges suggested as −1.45 to −2.50 for HESD Itch score, −0.87 to −3.04 for HESD Pain score and −1.54 to −2.51 for HESD score.
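The correlation-weighted averaging step can be sketched as below. The weighting scheme shown (weights equal to the absolute anchor correlation) is an assumption about how such an average is commonly formed, the function name is hypothetical, and the anchor estimates and correlations are invented values, not the study's anchor-level results.

```python
from typing import Sequence


def correlation_weighted_average(
    estimates: Sequence[float], correlations: Sequence[float]
) -> float:
    """Average anchor-based change estimates, weighting each estimate by the
    absolute correlation between its anchor and the target score."""
    if len(estimates) != len(correlations) or len(estimates) == 0:
        raise ValueError("need one correlation per estimate")
    weights = [abs(r) for r in correlations]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


# Invented mean change estimates from three anchors and their correlations.
anchor_estimates = [-3.0, -4.0, -5.0]
anchor_correlations = [0.5, 0.6, 0.7]

print(round(correlation_weighted_average(anchor_estimates, anchor_correlations), 2))
```

Anchors that track the target score more closely pull the combined threshold towards their own estimate, which is the rationale for weighting rather than taking a simple mean.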

Fig. 3.


Empirical cumulative distribution function of HESD Itch change scores by Itch PGI-S group at Week 16

The estimates derived from these anchor-based methods were triangulated to form a recommended threshold for a responder definition. Estimates were summarized on forest plots to identify convergence around a small range of values (see Fig. 4 as an example; see Figs. S10 and S11 in the online supplementary material for the remaining forest plots). A within-individual responder definition of −4.0 for each HESD score was recommended for consistency with thresholds used for similar scores in related populations [38] as well as for ease of use/interpretation.
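Applying the recommended responder definition to diary data might look like the following sketch. The −4.0 threshold comes from the text above; the rule that a weekly average needs at least 4 days of data is an assumption suggested by the reliability findings rather than a documented scoring rule, and all names and example values are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

MIN_DAYS = 4                 # assumed minimum days needed for a weekly average
RESPONDER_THRESHOLD = -4.0   # recommended within-individual change threshold


def weekly_average(
    daily_scores: Sequence[Optional[float]], min_days: int = MIN_DAYS
) -> Optional[float]:
    """Average the available daily scores; None marks a missing diary day."""
    observed = [s for s in daily_scores if s is not None]
    if len(observed) < min_days:
        return None
    return mean(observed)


def is_responder(
    baseline_week: Sequence[Optional[float]],
    followup_week: Sequence[Optional[float]],
) -> Optional[bool]:
    """True if the weekly average improved by at least 4 points.

    Returns None when either weekly average cannot be computed.
    """
    baseline = weekly_average(baseline_week)
    followup = weekly_average(followup_week)
    if baseline is None or followup is None:
        return None
    return (followup - baseline) <= RESPONDER_THRESHOLD


# Invented diary entries for one participant.
baseline_days = [8, 7, None, 9, 8, 8, 7]
week16_days = [3, 2, 3, None, None, 4, 3]
print(is_responder(baseline_days, week16_days))
```

Returning None instead of False for unscorable weeks keeps missing data distinct from non-response, leaving the handling of such cases to the analysis plan.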

Fig. 4.


Forest plot showing different within-group mean change and distribution-based estimates of meaningful change for the HESD Itch score

Discussion

The HESD was developed based on in-depth qualitative research with the population of interest (CHE patients), thus following best practice methods for development of PROMs [9, 11–14]. As reported here, in-depth cognitive interviews were then used to confirm strong item relevance and comprehension, confirming the content validity of the measure. This was followed by psychometric evaluation using data from two clinical trials. All items were found to be highly relevant to most patients in both the qualitative and psychometric work and well understood by patients in the qualitative interviews. There were low levels of missing data throughout the phase 3 trial, providing evidence the diary is not overly burdensome to complete daily, even for a 16-week trial. Evaluation of item response distributions suggests the response scale can capture variability in severity of CHE signs/symptoms as well as changes in sign/symptom severity over time.

The final 6-item measure demonstrated strong validity, reliability and the ability to detect change over time in the specific context of adult CHE patients with moderate to severe signs/symptoms. Inter-item correlations provided evidence that all items are adequately related and can be grouped into a single unidimensional summary score. The underlying structure of the HESD was further supported by Rasch analysis, with all items showing acceptable item fit statistics. Internal consistency reliability was very high and not improved by removing any items in the scale, providing further support the HESD score (weekly average) is unidimensional and that all items assess a single underlying trait. Test-retest reliability results for all three HESD scores (weekly average) were very strong, irrespective of the timepoints used or how stability of CHE severity was defined.

Convergent validity findings were consistent with a priori hypotheses concerning relationships with the HESD scores and measures of related concepts (DLQI total score, DLQI symptoms and feelings subscale and DLQI item 1: itch, pain, soreness, stinging). Known-group analyses provided evidence the HESD scores can discriminate among patients who differ in CHE severity. The expected pattern of monotonically increasing HESD scores across severity groups was identified and effect sizes indicated differences between adjacent groups were moderate to large. Importantly, the HESD scores were shown to be sensitive to improvements in CHE severity, with large effect sizes within groups defined as ‘improved’ and between ‘improved’ and ‘stable’ groups. The triangulation of various anchor-based methods supported a responder definition of −4.0 as the threshold for defining within-individual clinically meaningful improvements in all three HESD scores in moderate to severe CHE patients.

Strengths of this research include the capture of insights from 40 CHE patients through separate qualitative concept elicitation (n = 20) [9] and cognitive debriefing (n = 20) interviews during instrument development and content validity testing. Results from the psychometric validation conducted in two separate clinical trial populations were very consistent, providing confidence that the measurement properties are robust and not specific to a single sample. Furthermore, psychometric evaluation of the HESD was conducted in accordance with US FDA guidance for assessing measurement properties of PROMs [11, 14, 22, 23].

However, there are a few limitations to the study. The HESD was developed and tested only with patients in the US. To establish broader relevance across countries, there would be value in assessing content validity in a non-US population. For both psychometric validation populations, participants were predominantly white/Caucasian. Future confirmation of psychometric validity in larger proportions of non-white participants would be of value. Furthermore, although the interview sample included participants with three of the most prominent subtypes of CHE (i.e., allergic, atopic and irritant), it was not feasible to include all subtypes in the qualitative research. However, findings were consistent across the three prominent subtypes, providing some confidence in the generalizability of findings. Finally, all psychometric evaluation to date has been performed in clinical trial samples. If the HESD is to be used in real-world studies or in general clinical practice, further evaluation in a ‘real-world’ sample would be beneficial to confirm generalizability of the measurement properties.

Conclusion

The HESD is the first CHE-specific PRO diary measure developed and validated in line with regulatory guidance to specifically assess the severity of core signs and symptoms of CHE. Content validity of the HESD was confirmed in CD interviews with CHE patients, and psychometric validation activities showed evidence of strong construct validity, reliability and the ability to detect change for the HESD Itch score, HESD Pain score and the HESD score and that an improvement of ≥ 4 points in 7-day average HESD scores represents a clinically meaningful, important change.

Supplementary Information

Below is the link to the electronic supplementary material.

Acknowledgements

The authors thank the investigators and patients who participated in this study.

Medical Writing, Editorial and Other Assistance

Kate Burrows contributed to the design, data collection and analysis of the qualitative cognitive interviews. Sam Wratten and Amy Jones contributed to the interpretation of the phase 2 psychometric results. Piper (formerly Philip) Fromy contributed to the analysis of the phase 2 psychometric data. Medical writing support, including assisting authors with the development of the manuscript drafts and incorporation of comments, was provided by Alyson Young of Adelphi Values (Bollington, Macclesfield, UK), supported by LEO Pharma A/S, according to Good Publication Practice guidelines [39].

Author contributions

Rob Arbuckle, Laura Grant and Lotte Seiding Larsen were involved throughout the design, data collection and analysis of the qualitative cognitive interviews. Rob Arbuckle, Laura Grant, George Skingley and Lotte Seiding Larsen contributed to the design of both sets of psychometric analyses. Psychometric analyses were performed by George Skingley. All authors contributed to interpretation of study results described in the manuscript and provided input into the manuscript and reviewed and approved the final manuscript.

Funding

Adelphi Values was commissioned by LEO Pharma A/S to conduct this research and the sponsor contributed to the study design, data collection and preparation of the manuscript for publication. The sponsor is funding the journal’s rapid service fees.

Data availability

The datasets generated and/or analysed from the qualitative interviews are not publicly available in order to protect participant confidentiality. The datasets generated during and/or analysed during psychometric validation are available from the corresponding author on reasonable request.

Declarations

Conflict of interest

Sonja Molin has received honoraria as consultant/advisor or speaker and/or grants from Abbvie, Almirall, Aralez, Arcutis, Basilea, Bausch and Lomb, Boehringer-Ingelheim, Bristol Myers Squibb, Evidera, Galderma, GSK, Incyte, LEO Pharma A/S, Lilly, Novartis, Pfizer, Sanofi, Sun Pharma and UCB. She is currently investigator for Novartis and LEO Pharma A/S. Lotte Seiding Larsen was an employee of LEO Pharma A/S at the time the work was conducted and now is an employee of H. Lundbeck A/S, Valby, Denmark. Peter Joensson and Marie Louise Oesterdal are employees of LEO Pharma A/S, Ballerup, Denmark. Rob Arbuckle, Laura Grant and George Skingley are employees of Adelphi Values, a health outcomes agency contracted by LEO Pharma A/S to conduct the research. Marie Louise Schuttelaar has been a consultant, advisory board member, investigator, and/or speaker for Sanofi, Genzyme, Regeneron Pharmaceuticals, Inc., Pfizer, LEO Pharma A/S, Eli Lilly, Galderma, AbbVie, Novartis, and Amgen. SM, SLS and MLS received no honoraria related to the development of this publication.

Ethical approval

All participants provided informed consent indicating their data will be used for medical research purposes and the study results may be published. The studies were performed in accordance with the Helsinki Declaration of 1964 and its later amendments. Ethical approval and oversight for the qualitative interviews was obtained from Copernicus Group Independent Review Board (CGIRB; reference ADE1-17-162). Ethical approval for the psychometric validation activities was obtained as part of the phase 2b (NCT03683719) and phase 3 (NCT04871711) trials from the independent institutional review board (IRB) and ethics committees listed in Table S9 and Table S10, respectively.

Footnotes

Prior publication: This manuscript is based on work that has been previously presented as a poster at the 2023 EADV Congress in Berlin, Germany (11–14 October).

References

  • 1. Agner T, Elsner P. Hand eczema: epidemiology, prognosis and prevention. J Eur Acad Dermatol Venereol. 2020;34(S1):4–12. doi: 10.1111/jdv.16061.
  • 2. Voorberg AN, Loman L, Schuttelaar MLA. Prevalence and severity of hand eczema in the Dutch general population: a cross-sectional, questionnaire study within the lifelines cohort study. Acta Derm Venereol. 2022;102:adv0626. doi: 10.2340/actadv.v101.432.
  • 3. Thyssen JP, Schuttelaar MLA, Alfonso JH, et al. Guidelines for diagnosis, prevention, and treatment of hand eczema. Contact Dermat. 2022;86(5):357–378. doi: 10.1111/cod.14035.
  • 4. Quaade AS, Simonsen AB, Halling AS, Thyssen JP, Johansen JD. Prevalence, incidence, and severity of hand eczema in the general population—a systematic review and meta-analysis. Contact Dermat. 2021;84(6):361–374. doi: 10.1111/cod.13804.
  • 5. Lerbaek A, Kyvik KO, Ravn H, Menné T, Agner T. Clinical characteristics and consequences of hand eczema—an 8-year follow-up study of a population-based twin cohort. Contact Dermat. 2008;58(4):210–216. doi: 10.1111/j.1600-0536.2007.01305.x.
  • 6. Diepgen TL, Agner T, Aberer W, et al. Management of chronic hand eczema. Contact Dermat. 2007;57(4):203–210. doi: 10.1111/j.1600-0536.2007.01179.x.
  • 7. Silverberg JI, Guttman-Yassky E, Agner T, et al. Chronic hand eczema guidelines from an expert panel of the international eczema council. Dermatitis. 2021;32(5):319–326. doi: 10.1097/der.0000000000000659.
  • 8. Zalewski A, Krajewski PK, Szepietowski JC. Prevalence and characteristics of itch and pain in patients suffering from chronic hand eczema. J Clin Med. 2023;12(13):4198. doi: 10.3390/jcm12134198.
  • 9. Grant L, Seiding Larsen L, Burrows K, et al. Development of a conceptual model of chronic hand eczema (CHE) based on qualitative interviews with patients and expert dermatologists. Adv Ther. 2020;37(2):692–706. doi: 10.1007/s12325-019-01164-5.
  • 10. Lynde C, Guenther L, Diepgen TL, et al. Canadian hand dermatitis management guidelines. J Cutan Med Surg. 2010;14(6):267–284. doi: 10.2310/7750.2010.09094.
  • 11. FDA. Patient-focused drug development: selecting, developing, or modifying fit-for-purpose clinical outcome assessments: guidance for industry, food and drug administration staff, and other stakeholders. [Draft Guidance]. 2022. https://www.fda.gov/media/159500/download. Accessed July 20 2023.
  • 12. FDA. Patient-focused drug development: methods to identify what is important to patients: guidance for industry, food and drug administration staff, and other stakeholders. 2022. https://www.fda.gov/media/131230/download. Accessed July 20 2023.
  • 13. FDA. Patient-focused drug development: incorporating clinical outcome assessments into endpoints for regulatory decision-making: guidance for industry, food and drug administration staff, and other stakeholders. [Draft Guidance]. 2023. https://www.fda.gov/media/166830/download. Accessed July 20 2023.
  • 14. FDA. Patient-focused drug development: collecting comprehensive and representative input: guidance for industry, food and drug administration staff, and other stakeholders. 2020. https://www.fda.gov/media/139088/download. Accessed July 20 2023.
  • 15. Ofenloch RF, Weisshaar E, Matterne U, Diepgen T, Apfelbacher C. Health-related quality of life in hand eczema: international development of a new instrument. Dermatol Beruf Umwelt. 2012;60(3):102–105. doi: 10.5414/DBX00202.
  • 16. Diepgen TL, Elsner P, Schliemann S, et al. Guideline on the management of hand eczema ICD-10 Code: L20. L23. L24. L25. L30. J Dtsch Dermatol Ges. 2009;7(Suppl 3):S1–16. doi: 10.1111/j.1610-0387.2009.07061.x.
  • 17. Willis GB. Cognitive interviewing: a tool for improving questionnaire design. Chennai: Sage Publications; 2004.
  • 18. Hwang S. Utilizing qualitative data analysis software: a review of Atlas.ti. Soc Sci Comput Rev. 2008;26(4):519–527. doi: 10.1177/0894439307312485.
  • 19. Worm M, Thyssen JP, Schliemann S, et al. The pan-JAK inhibitor delgocitinib in a cream formulation demonstrates dose response in chronic hand eczema in a 16-week randomized phase IIb trial. Br J Dermatol. 2022;187(1):42–51. doi: 10.1111/bjd.21037.
  • 20. Finlay AY, Khan GK. Dermatology Life Quality Index (DLQI)—a simple practical measure for routine clinical use. Clin Exp Dermatol. 1994;19(3):210–216. doi: 10.1111/j.1365-2230.1994.tb01167.x.
  • 21. Basra MK, Salek MS, Camilleri L, Sturkey R, Finlay AY. Determining the minimal clinically important difference and responsiveness of the Dermatology Life Quality Index (DLQI): further data. Dermatology. 2015;230(1):27–33. doi: 10.1159/000365390.
  • 22. FDA. Patient-reported outcome measures: use in medical product development to support labeling claims. Guidance for Industry. 2009. https://www.fda.gov/downloads/drugs/guidances/ucm193282.pdf. Accessed July 20 2023.
  • 23. EMA. Reflection paper on the regulatory guidance for the use of health-related quality of life (HRQL) measures in the evaluation of medicinal products. 2005. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003637.pdf. Accessed July 20 2023.
  • 24. Linacre JM. Optimizing rating scale category effectiveness. J Appl Meas. 2002;3(1):85–106.
  • 25. Yen WM. Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Appl Psychol Meas. 1984;8(2):125–145. doi: 10.1177/014662168400800201.
  • 26. Cano S, Chrea C, Salzberger T, et al. Development and validation of a new instrument to measure perceived risks associated with the use of tobacco and nicotine-containing products. Health Qual Life Outcomes. 2018;16(1):192. doi: 10.1186/s12955-018-0997-5.
  • 27. Christensen KB, Makransky G, Horton M. Critical values for Yen's Q(3): identification of local dependence in the Rasch model using residual correlations. Appl Psychol Meas. 2017;41(3):178–194. doi: 10.1177/0146621616677520.
  • 28. Nunnally JC, Bernstein IH. Psychometric theory. 3. Cambridge: McGraw-Hill; 2010.
  • 29. Fayers PM, Machin D. Quality of life: the assessment, analysis and reporting of patient-reported outcomes. New York: Wiley; 2015.
  • 30. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–163. doi: 10.1016/j.jcm.2016.02.012.
  • 31. Hinkle DE, Wiersma W, Jurs SG. Applied statistics for the behavioral sciences. Boston: Houghton Mifflin; 2003.
  • 32. Cohen J. Statistical power analysis for the behavioral sciences. Cambridge: Academic Press; 2013.
  • 33. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care. 1989;27(3 Suppl):S178–S189. doi: 10.1097/00005650-198903001-00015.
  • 34. Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61(2):102–109. doi: 10.1016/j.jclinepi.2007.03.012.
  • 35. Trigg A, Griffiths P. Triangulation of multiple meaningful change thresholds for patient-reported outcome scores. Qual Life Res. 2021;30(10):2755–2764. doi: 10.1007/s11136-021-02957-4.
  • 36. Norman GR, Sloan JA, Wyrwich KW. Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care. 2003;41(5):582–592. doi: 10.1097/01.Mlr.0000062554.74615.4c.
  • 37. Osoba D. The clinical value and meaning of health-related quality-of-life outcomes in oncology. In: Gotay CC, Snyder C, Lipscomb J, editors. Outcomes assessment in cancer: measures, methods and applications. Cambridge: Cambridge University Press; 2004. pp. 386–405.
  • 38. Yosipovitch G, Reaney M, Mastey V, et al. Peak Pruritus Numerical Rating Scale: psychometric validation and responder definition for assessing itch in moderate-to-severe atopic dermatitis. Br J Dermatol. 2019;181(4):761–769. doi: 10.1111/bjd.17744.
  • 39. DeTora LM, Toroser D, Sykes A, et al. Good Publication Practice (GPP) Guidelines for company-sponsored biomedical research: 2022 update. Ann Intern Med. 2022;175(9):1298–1304. doi: 10.7326/m22-1460.

Data Availability Statement

The datasets generated and/or analysed from the qualitative interviews are not publicly available in order to protect participant confidentiality. The datasets generated during and/or analysed during psychometric validation are available from the corresponding author on reasonable request.


