Abstract
Pemphigus is a group of rare and potentially fatal autoimmune blistering diseases that are associated with auto-antibodies that target intercellular adhesion molecules. The incidence of pemphigus varies among populations, with the lowest incidence in Switzerland and Finland at 0.6–0.76 per million per year and the highest in Jewish communities at 16.1–32 per million per year. Pemphigus is associated with devastating morbidity and, despite advances in our understanding of the disease and a widening array of therapeutic options, no cure exists. The delay in the development of a cure may in part be attributed to the absence of a standardized and completely validated severity outcome measure to allow for high-quality multicenter controlled studies. Such a tool is necessary to define best practice in clinical studies, allow for accurate comparisons between study results, justify drug use within the clinical setting, and reduce the cost burden that is associated with the use of ineffective therapies. The use of outcome measures that are not validated creates an opportunity to construct measures that favor particular treatments and thus produce false conclusions. According to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) group, validation of a measurement instrument requires investigating its responsiveness, reliability, and validity. More than 116 outcome measures exist to assess pemphigus severity, of which the Pemphigus Disease Area Index (PDAI), Autoimmune Bullous Skin Disorder Intensity Score (ABSIS), and Pemphigus Vulgaris Activity Score (PVAS) are the most comprehensively validated. With regard to validity and reliability, PDAI was not surpassed by ABSIS or PVAS. Data indicate that ABSIS is more reliable than PVAS, whereas PVAS seems to have greater validity, although the results are not consistent. PDAI, ABSIS, and PVAS have not yet had their responsiveness analyzed, which should be the next step to completely validate the outcome measures and conclusively determine which measure is superior.
Keywords: autoimmune blistering diseases, pemphigus, outcome measures, validation, pemphigus disease area index, autoimmune bullous skin disorder intensity score
Introduction
This literature review will discuss the key features of pemphigus, illustrate the significance of validated scoring systems, outline previous responsiveness studies for other dermatological scoring systems, and discuss existing outcome measures for pemphigus. The purpose is to understand which scores are better for use in studies and clinical practice and what research remains to be conducted in this area.
Background
Pemphigus is a group of autoimmune vesiculobullous diseases that are associated with auto-antibodies that target intercellular adhesion molecules. The majority of these auto-antibodies are immunoglobulin (Ig) G that target the ectodomain of desmosomal cadherins and in doing so cause loss of keratinocyte-to-keratinocyte adhesion (i.e., acantholysis). Acantholysis leads to blister formation in the epidermis, and patients may develop cutaneous flaccid bullae, erosions, or pustules and/or mucosal erosions. The mechanism whereby IgG auto-antibodies induce keratinocyte detachment is still widely debated. The two principal theories are steric hindrance, which implies direct interference with desmosomal adhesion, and triggering of intracellular signalling, which causes loss of keratinocyte adhesion (Evangelista et al., 2015). There is mounting evidence to support both theories and it is likely that both are significant to the pathogenesis of pemphigus.
There are multiple pemphigus subtypes that possess characteristic clinical, histological, and immunologic features, seemingly due to distinctive desmosomal protein targets. These subtypes include pemphigus vulgaris (PV), pemphigus foliaceus (PF), endemic pemphigus foliaceus or fogo selvagem (FS), paraneoplastic pemphigus (PNP), and IgA pemphigus (Evangelista et al., 2015). A diagnosis of pemphigus relies on clinical features, findings from lesional and perilesional biopsy (histopathology and direct immunofluorescence, respectively), and serology (indirect immunofluorescence and enzyme-linked immunosorbent assay [ELISA]; Table 1).
Table 1.
Subtype | Epidemiology | Clinical features | Histopathology | Direct immunofluorescence | Indirect immunofluorescence | ELISA | Variants |
---|---|---|---|---|---|---|---|
PV | | | Suprabasilar split with acantholysis | Intercellular IgG deposition | Intercellular IgG deposition. Preferred substrate: monkey esophagus | | |
PF | | | Subcorneal split with acantholysis | Intercellular IgG deposition | Intercellular IgG deposition. Preferred substrate: normal human skin or monkey esophagus | Dsg 1 autoantibodies | |
PNP | | | Intraepidermal clefting with acantholysis. Dense lichenoid infiltrate + interface dermatitis + necrotic keratinocytes | Intercellular and/or basement membrane zone deposition of C3 and/or IgG | Intercellular IgG deposition. Preferred substrate: rat bladder | | |
IgA pemphigus, subcorneal pustular dermatosis type | Any age | | Subcorneal clefting and pustules + nominal acantholysis. Mixed dermal infiltrate | Intercellular IgA deposition | Negative in 50%. Intercellular IgA deposition. Preferred substrate: monkey esophagus | | |
IgA pemphigus, intraepidermal neutrophilic dermatosis type | Any age | | Intraepidermal pustules + nominal acantholysis. Mixed dermal infiltrate | Intercellular IgA deposition | Intercellular IgA deposition. Preferred substrate: monkey esophagus | | |
Dsg = desmoglein; ELISA = enzyme-linked immunosorbent assay; FS = fogo selvagem; IgA = immunoglobulin A; IgG = immunoglobulin G; PV = pemphigus vulgaris; PF = pemphigus foliaceus; PNP = paraneoplastic pemphigus. Sources: Evangelista et al., 2015, Hertl et al., 2006, Hertl and Sitaru, 2015, Mihai and Sitaru, 2007, Oiso et al., 2002, Santoro et al., 2013, Tsuruta et al., 2011, Yeh et al., 2003.
The incidence of pemphigus varies markedly between populations. This variability is due to the different genetic backgrounds and trigger factors that are associated with particular geographical locations. Most epidemiological studies concur that persons with Jewish ancestry are the most at risk of developing PV. However, the quality of these studies is limited by their retrospective design and inability to ensure inclusion of all patients (Schmidt et al., 2015).
Mortality associated with PV and PF dropped dramatically from 75% to 30% with the introduction of corticosteroid treatment in the 1950s. Subsequently, adjuvant use of immunosuppressant drugs in the 1980s decreased mortality rates further to approximately 5% of the study populations. Most recently, studies in Taiwan and the United Kingdom have demonstrated that a patient’s risk of death compared to a healthy control is 2-3 times greater, primarily because of infections and particularly pneumonia and septicemia (Huang et al., 2012, Langan et al., 2008, Schmidt et al., 2015).
Importance of scoring systems
Measurement in medicine is impeded by the absence of a consensus on the best instruments to characterize disease severity. This results in non-comparable study outcomes, conceivably false conclusions, and non-evidence-based practice. Scoring systems in dermatology are particularly challenging given the shortage of radiographic and laboratory findings that are known to correlate with disease severity (Gaines and Werth, 2008). Thus, generic instruments such as the Physician Global Assessment (PGA) are often utilized. The advantage of generic instruments is their versatility, but their poor reproducibility, reliance on physician experience with the condition, and inability to capture the severity of illnesses that are localized to small areas (e.g., acne) are significant disadvantages (Albrecht and Werth, 2007). Disease-specific scoring systems provide superior accuracy and sensitivity compared with generic scoring systems, as has been shown with the Psoriasis Area and Severity Index (PASI) and Scoring Atopic Dermatitis (SCORAD) (Schram et al., 2012, Weisman et al., 2003).
For pemphigus, there is a definite shortage of multicenter controlled studies, which is widely attributed to the difficulty in objectively comparing therapeutic outcomes. A systematic literature review counted more than 116 different outcome measures for pemphigus severity that were used in 96 articles published during the preceding 25 years (Martin and Murrell, 2006). A standardized and validated scoring system is required to address this issue and should be used universally to: (1) quantify disease severity and progression for interventions in clinical studies (Gaines and Werth, 2008) and allow multidisciplinary discussion of cases (Loh et al., 2014); (2) justify drug use in clinical settings (Gaines and Werth, 2008); and (3) reduce financial costs by identifying and ceasing ineffective treatments (Gaines and Werth, 2008).
Importance of validation
Validation studies illustrate the responsiveness, reliability, and practicality of a tool with regard to its intended measure (Streiner et al., 2008). The use of unvalidated tools creates an opportunity to produce incorrect study conclusions by utilizing scoring systems that are constructed specifically to favor particular treatments, as shown in a systematic review of schizophrenia scoring systems by Marshall et al. (2000). The review found that studies with unpublished scales were more likely to support a treatment over control.
To address the issue of unvalidated measurement tools, an international Delphi study was held from 2006 to 2007 with 43 health status measurement experts, collectively known as the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) group. The COSMIN group worked to establish a consensus on the measurement properties to include when validating Health-Related Patient-Reported Outcomes (HR-PROs) and their definitions, as well as standards and design requirements for the evaluation of the defined measurement properties.
The study resulted in the isolation of three quality domains: reliability, validity, and responsiveness. Within each quality domain, there were one or more measurement properties (Fig. 1; Mokkink et al., 2006, Mokkink et al., 2010a, Mokkink et al., 2010b). While the study was designed specifically for HR-PROs such as quality of life measures, the key principles, quality domains, and definitions are applicable to validate disease severity tools. However, some of the described measurement properties in the COSMIN study, for example cross cultural validity, are not applicable to disease severity tools and thus have been excluded from our discussion.
Quality domains
Reliability
Reliability ensures that the instrument is free from measurement error and provides insight into the inherent noise or variability of the score. The reliability quality domain contains three measurement properties: internal consistency (degree of association between the items), reliability (determined via consistency in score values between observers [inter-rater] and by a single observer with sufficient delay between two scorings [intra-rater]), and measurement error (changes in score that are not reflective of true changes in the intended construct [e.g., Standard Error of Measurement or SEM]).
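As an illustration of the measurement error property, the sketch below computes the SEM from a reliability coefficient using one commonly used formula, SEM = SD × √(1 − reliability). This formula is not stated in the text above and is offered only as an assumption; the function name and the input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Return SEM = SD * sqrt(1 - reliability).

    `sd` is the standard deviation of the observed scores and
    `reliability` is a coefficient such as an ICC, expected in [0, 1].
    """
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability coefficient must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical inputs, not taken from any study cited in this review.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
print(round(sem, 2))
```

A higher reliability coefficient yields a smaller SEM, so a smaller observed change in score can be distinguished from noise.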
Validity
The validity quality domain also consists of three measurement properties. The first, content validity, determines whether the content of the instrument reflects the intended construct and includes the subjective measure of face validity. For pemphigus, a key question that concerns content validity is whether added weighting for site increases validity, given that site distribution plays a role in severity, or whether lesion count alone is sufficient. The second measurement property is construct validity, which aims to establish that the instrument results represent the intended construct by assessing internal relationships and relationships to other instruments for the same construct. It can also identify discrepancies between pertinent groups. Finally, criterion validity measures accuracy by determining how well the outcome measure reflects a gold standard to ensure minimal systematic and random bias.
Responsiveness
The domain of responsiveness has only one eponymous measurement property, which describes the capability of the instrument to detect change over time. It aims to assess how well the outcome measure detects true changes in the disease state rather than measurement error. It is otherwise referred to as sensitivity to change or discriminant validity and has significant ramifications for the conclusions that are drawn with regard to the efficacy of therapies in clinical studies.
Criteria in addition to those described by the COSMIN group are feasibility and cutoffs. Feasibility refers to the time taken to complete the scoring and the resources and/or costs needed to implement the instrument, which may have considerable implications on the outcome measure’s practicality. Disease severity cutoffs allow for a differentiation between mild, moderate, and severe disease status and may have important implications in clinical practice when identifying appropriate therapies and within clinical studies when drawing meaningful comparisons.
Table 2 illustrates the degree of validation of commonly-used dermatological scoring systems by key measurement properties.
Table 2.
ABSIS = Autoimmune Bullous Skin Disorder Intensity Score; BPDAI = Bullous Pemphigoid Disease Area Index; CDASI = Cutaneous Dermatomyositis Disease Area and Severity Index; CLASI = Cutaneous Lupus Erythematosus Disease Area and Severity Index; EBDASI = Epidermolysis Bullosa Disease Activity and Scarring Index; EASI = Eczema Area and Severity Index; MCID = minimal clinically-important difference; PASI = Psoriasis Area and Severity Index; PDAI = Pemphigus Disease Area Index; SCORAD = Scoring Atopic Dermatitis.
1Rahbar et al., 2014; 2Murrell et al., 2008; 3Boulard et al., 2016; 4Shimizu et al., 2014; 5Pfutze et al., 2007; 6Gourraud et al., 2012; 7Puzenat et al., 2010; 8Tofte et al., 1998; 9Hanifin et al., 2001; 10Breuer et al., 2004; 11Schram et al., 2012; 12Sartorius et al., 2010; 13Loh et al., 2014; 14Jain et al., 2016; 15Wijayanti et al., 2014, Wijayanti et al., 2016; 16Murrell et al., 2012; 17Patsatsi et al., 2012; 18Lévy-Sitbon et al., 2014; 19Bonilla-Martinez et al., 2008; 20Albrecht et al., 2005; 21Klein et al., 2011; 22Goreshi et al., 2012; 23Anyanwu et al., 2015; 24Stalder et al., 1993; 25Schram et al., 2012; 26Angelova-Fischer et al., 2005; 27Langley and Ellis, 2004; 28Zhao et al., 2015.
Previous responsiveness studies for dermatological scoring systems
Responsiveness essentially concerns the validity of the change score, that is, whether its direction and magnitude correlate with the expected results. To determine responsiveness, a longitudinal study design is required with at least two measurements administered and a validated external reference to indicate whether the patient’s condition is improving, remaining stable, or deteriorating. The external reference is needed to distinguish whether an unchanged score means that the patient was truly stable, that the instrument is not responsive, or that the comparator instrument is of poor quality.
The COSMIN study divided the design requirements into cases with a gold standard (items 1-7 and 15-18) and cases without a gold standard (items 1-14; Fig. 2). In gold standard cases with dichotomous results, the preferred method to assess responsiveness is to relate the change scores to the gold standard using a receiver operating characteristic (ROC) curve, or to use sensitivity and specificity if the study instrument scores are also dichotomous (Mokkink et al., 2010a).
The COSMIN group also emphasizes the need for specific hypotheses to be formulated beforehand and stated in the method section when a gold standard is not available. The hypothesis should predict direction (positive or negative) and magnitude (absolute or relative). A detailed hypothesis reduces bias by preventing retrospective analysis and alternative explanations for weak correlations where the conclusion should be that the instrument is not responsive.
Table 3 outlines previous responsiveness studies on commonly-used dermatological scoring systems and the methods employed to establish responsiveness.
Table 3.
Instrument | Authors | Year | Sample Size | Method for Responsiveness |
---|---|---|---|---|
EBDASI | Jain et al. | 2016 | 36 | Utilized distribution and anchor-based methods. Distribution-based: Mean change of scores, standardized response mean, and standardized effect size utilized to illustrate the magnitude of change in activity and damage scores. Anchor-based: Pearson’s correlation coefficient utilized to determine the degree of correlation between change in EBDASI score and Likert scale of change. |
BPDAI | Wijayanti et al. | 2016 | 32 | Physician subjective assessment: improved, stable, deteriorated. Paired t test on BPDAI scores to assess statistical significance. To be considered responsive, the change had to be statistically significant in the improved and deteriorated groups and not statistically significant in the stable group. |
BPDAI | Patsatsi et al. | 2012 | 39 | Correlated BPDAI to BP180 titers at baseline, 3-month, and 6-month interval using Spearman’s rho correlation. |
CLASI | Bonilla-Martinez et al. | 2008 | 8 | Utilized: correlations, linear regressions, and Wilcoxon rank sum and 1-sided signed rank exact tests. The difference between baseline score and day 56 (change scores) were recorded for CLASI activity and damage, and each correlated via Pearson correlation coefficient with the change score of |
CDASI | Goreshi et al. | 2012 | 35 | Included the two consecutive visits with the greatest variance in PGA-activity for analysis. Responsiveness was measured via Standardized Response Mean (SRM), SRM = ratio of the mean differences (i.e., CDASI score before and after clinical change was noted) to the standard deviation of the differences |
EASI | Breuer et al. | 2004 | Pimecrolimus (n = 129), control (n = 66) | Following treatment with 1% pimecrolimus, EASI, IGA, and SCORAD dropped significantly compared with the vehicle/control group, as shown by t-test. Between-group comparisons established via Cochran-Mantel-Haenszel test. Close correlation between pairs of EASI, IGA, and SCORAD shown via Pearson test. |
EASI, SCORAD | Schram et al. | 2012 | 143 | Mean scores of EASI and SCORAD were correlated to mean scores of IGA and PGA within each treatment group per time point. Then ROC was utilized. |
BPDAI = Bullous Pemphigoid Disease Area Index; CDASI = Cutaneous Dermatomyositis Disease Area and Severity Index; CLASI = Cutaneous Lupus Erythematosus Disease Area and Severity Index; EBDASI = Epidermolysis Bullosa Disease Activity and Scarring Index; EASI = Eczema Area and Severity Index; IGA = investigator global assessment; PGA = physician global assessment; ROC = receiver operating characteristic; SCORAD = Scoring Atopic Dermatitis.
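The standardized response mean described for the CDASI study in Table 3 (the mean of the paired differences divided by the standard deviation of those differences) can be sketched as follows. This is a minimal illustration rather than the analysis code of any cited study; the function name and the scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(before, after):
    """Return SRM = mean(after - before) / SD(after - before).

    `before` and `after` hold paired scores for the same patients
    at two visits.
    """
    if len(before) != len(after):
        raise ValueError("paired score lists must have equal length")
    diffs = [a - b for b, a in zip(before, after)]
    if len(diffs) < 2:
        raise ValueError("at least two patients are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("SRM is undefined when all differences are equal")
    return mean(diffs) / sd


# Hypothetical severity scores for four patients at two visits.
baseline = [20, 30, 25, 40]
follow_up = [10, 22, 20, 28]
srm = standardized_response_mean(baseline, follow_up)
print(round(srm, 2))
```

With this sign convention a negative SRM indicates that scores fell between visits; the absolute value expresses the size of the change relative to its variability.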
Minimal clinically-important difference
It is important to distinguish minimal clinically-important difference (MCID) from responsiveness. MCID is the smallest change in an instrument score that correlates to a meaningful clinical difference. It is an inappropriate measure of responsiveness because it is simply about the interpretation of a change score as opposed to validity (Mokkink et al., 2010a). MCID usually revolves around patient perception although variations can include MCID obtained through a clinical report, change in clinical parameter, and effect size. For disease severity outcome measures, it is standard for MCID to be derived from some form of physician global assessment or a related tool. There are up to nine methods to identify MCID, some that anchor solely on external criteria while others utilize internal values. The results can vary enormously based on the method used (Cook, 2008). Table 4 illustrates previous studies and methods utilized to establish MCID.
Table 4.
Instrument | Authors | Year | Sample Size | Method for MCID |
---|---|---|---|---|
EBDASI | Jain et al. | 2016 | 36 | Pearson correlation coefficient > 0.3 between Likert scale and EBDASI, thus sufficient to determine MCID. |
BPDAI | Wijayanti et al. | 2016 | 32 | Average signed change in BPDAI of responders (determined by physician subjective assessment: improved, deteriorated, stable). Confirmed via ROC at/around this cut-off value |
CLASI | Bonilla-Martinez et al. | 2008 | 8 | Clinical cut points which represent minimal clinically meaningful change (responders) were determined |
CDASI | Anyanwu et al. | 2015 | 128 | Utilized PGA-VAS with a clinical cut point of 2 for responders and less than 2 for non-responders. Used ROC curve to determine the change score which correlated with responders |
EASI, SCORAD | Schram et al. | 2012 | Data from three randomized controlled studies on atopic eczema treatments, n = 143 | Responders were defined as an improvement or decline greater than or equal to 1 in PGA and IGA. ROC utilized. > 0.7 = fair, > 0.8 = good, > 0.9 = excellent responsiveness |
BPDAI = Bullous Pemphigoid Disease Area Index; CDASI = Cutaneous Dermatomyositis Disease Area and Severity Index; CLASI = Cutaneous Lupus Erythematosus Disease Area and Severity Index; EBDASI = Epidermolysis Bullosa Disease Activity and Scarring Index; EASI = Eczema Area and Severity Index; IGA = investigator global assessment; MCID = minimal clinically-important difference; PGA = physician global assessment; ROC = receiver operating characteristic; SCORAD = Scoring Atopic Dermatitis; VAS = visual analogue scale.
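Several of the studies in Table 4 used an anchor to label patients as responders and then a ROC analysis to find the change score that best separates responders from non-responders. The sketch below shows one way this could be done, choosing the cutoff that maximizes the Youden index (sensitivity + specificity − 1). The use of the Youden index here is an assumption for illustration, and the function name and data are hypothetical.

```python
def youden_cutoff(changes, is_responder):
    """Return (cutoff, J) for the cutoff with the highest Youden index.

    A patient is predicted to be a responder when the improvement in
    score is greater than or equal to the cutoff. Ties in J are broken
    in favor of the smallest cutoff.
    """
    positives = sum(1 for r in is_responder if r)
    negatives = len(is_responder) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both responders and non-responders are required")

    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(changes)):
        true_pos = sum(1 for x, r in zip(changes, is_responder) if r and x >= cutoff)
        true_neg = sum(1 for x, r in zip(changes, is_responder) if not r and x < cutoff)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical improvements (baseline minus follow-up score) and
# anchor-based responder labels for eight patients.
improvement = [1, 2, 3, 4, 6, 7, 8, 9]
responder = [False, False, False, True, False, True, True, True]
cutoff, j = youden_cutoff(improvement, responder)
print(cutoff, j)
```

Because the result depends on the anchor, the tie-breaking rule, and the sample, different methods can yield different MCID values, which is consistent with the variability noted by Cook (2008).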
Scoring systems for pemphigus
Pemphigus Disease Area Index
PDAI was published by the International Pemphigus Definitions Group (IDPG) in 2008 (Fig. 3). The IDPG held five consensus meetings between 2006 and 2008 to establish consensus definitions and develop a scoring system for pemphigus that was modeled on the Cutaneous Lupus Erythematosus Disease Area and Severity Index. The IDPG panel consisted of experts on autoimmune blistering diseases, led by Victoria Werth and Dedee Murrell (Murrell et al., 2008).
PDAI scores can range from 0 to 263, comprising 250 points for disease activity (120 for skin, 10 for scalp, and 120 for mucosa) and 13 points for damage. For activity, the size and number of lesions in each area play a role in the calculation of points assigned. The damage score reflects post-inflammatory hyperpigmentation. A considerable advantage of this scoring system is its sensitivity to small lesion numbers, which increases inter-rater reliability (Zhao and Murrell, 2015). Furthermore, PDAI does not take body surface area (BSA) or lesion type into account; both are arduous to evaluate, cannot capture mild amounts of disease activity, and can potentially exacerbate small variations between raters.
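The composition of the PDAI total can be checked with a short sketch. It takes the component maxima as 120 for skin, 10 for scalp, and 120 for mucosa, the values for which the activity subtotal equals the stated 250 points, plus 13 for damage. The function and the example patient are hypothetical, and the sketch only sums component scores; it does not reproduce the per-site scoring rules of the instrument.

```python
# Maximum points per activity component; 120 + 10 + 120 = 250.
PDAI_ACTIVITY_MAX = {"skin": 120, "scalp": 10, "mucosa": 120}
PDAI_DAMAGE_MAX = 13


def pdai_total(activity: dict, damage: int) -> int:
    """Validate component scores and return activity plus damage."""
    for site, limit in PDAI_ACTIVITY_MAX.items():
        value = activity.get(site, 0)
        if not 0 <= value <= limit:
            raise ValueError(f"{site} activity must be between 0 and {limit}")
    if not 0 <= damage <= PDAI_DAMAGE_MAX:
        raise ValueError(f"damage must be between 0 and {PDAI_DAMAGE_MAX}")
    return sum(activity.get(site, 0) for site in PDAI_ACTIVITY_MAX) + damage


max_activity = sum(PDAI_ACTIVITY_MAX.values())
max_total = max_activity + PDAI_DAMAGE_MAX

# Hypothetical patient with mild disease.
example = pdai_total({"skin": 14, "scalp": 2, "mucosa": 9}, damage=1)
print(max_activity, max_total, example)
```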
Autoimmune Bullous Skin Disorder Intensity Score
ABSIS is a generic outcome measure for autoimmune blistering diseases (AIBD) produced by the German Blistering Group (Pfutze et al., 2007). The scores can vary from 0 to 206, of which 150 points represent skin involvement, 11 points oral involvement, and 45 points subjective discomfort. ABSIS uses the rule of nines and rule of palms to establish BSA, and BSA and lesion type are weighting factors.
Pemphigus Vulgaris Activity Score
PVAS was created by Chams-Davatchi et al. (2013) and produces scores between 0 and 18 with 11 points for cutaneous and 7 points for mucosal involvement. Lesion type, lesion number, and distribution all contribute to the score (Fig. 4). Compared with PDAI, PVAS places less emphasis on the head and greater emphasis on the limbs. PVAS also takes into account Nikolsky’s sign and thus is more susceptible to variability based on the expertise of the rater.
Harman’s scoring system
Harman’s scoring system was created by Harman et al. (2001) in the United Kingdom, and scores are based on the number of skin and oral erosions (Fig. 5). Harman et al. compared these scores with anti-desmoglein (Dsg) 1 and anti-Dsg3 ELISA results and noted a correlation between severity and Dsg antibody levels. The use of this scoring system is limited by the lack of validation studies and the poor sensitivity to BSA involvement and anatomical distribution, as scores are awarded irrespective of site and size.
We could not identify any studies to illustrate the reliability, validity, responsiveness, feasibility, or severity cutoffs for the Harman grading system.
Validation studies on pemphigus scoring systems
Rosenbach et al. conducted a study at the University of Pennsylvania to demonstrate the inter- and intra-rater reliability and convergent validity of PDAI and ABSIS. Ten dermatologists who specialize in AIBD scored 15 patients with pemphigus using the PDAI, ABSIS, and PGA scoring systems. The study was limited because the majority of patients had stable disease. Nonetheless, the study demonstrated that PDAI has strong intra- and inter-rater reliability with an intra-class correlation coefficient (ICC) of 0.98 (95% confidence interval [CI]: 0.96-1.0) and 0.76 (95% CI: 0.61-0.91), respectively. ABSIS intra- and inter-rater reliability had an ICC of 0.80 (95% CI: 0.65-0.96) and 0.77 (95% CI: 0.63-0.91), respectively. Thus, inter-rater reliability for PDAI and ABSIS was almost indistinguishable. However, it is important to emphasize that, despite this, PDAI differed in its ability to capture low disease activity (both Dsg and ELISA negative). The difference in inter-rater reliability between PDAI and ABSIS became more apparent when only the objective skin activity scores were compared: ICC of 0.86 (95% CI: 0.76-0.95) for PDAI versus 0.39 (95% CI: 0.17-0.60) for ABSIS.
This study also illustrated good convergent validity between PDAI and PGA with a Spearman’s rho correlation of 0.6 (95% CI: 0.49-0.71) compared with the poorer convergent validity of ABSIS of 0.43 (95% CI: 0.30-0.55; Rosenbach et al., 2009). Independent of this study, Chams-Davatchi et al. (2013) asked five experts to score 50 patients with PV to illustrate that PVAS has a superior convergent validity of 0.75 with PGA.
Rahbar et al. (2014) conducted a study independent of the IDPG and German blistering group and produced unbiased results of ABSIS and PDAI in comparison with their PVAS scoring system. The study had a sizeable sample of 100 patients with active lesions. The study produced higher values for inter-rater reliability, which may be due to the larger sample and may also reflect a learning curve among dermatologists in the application of scoring systems over the preceding 5 years. The results showed an ICC of 0.98 (95% CI: 0.97-0.98), 0.97 (95% CI: 0.96-0.98), and 0.93 (95% CI: 0.9-0.95) for PDAI, ABSIS, and PVAS inter-rater reliability, respectively, and thus illustrated that PDAI and ABSIS are the most reproducible with almost identical ICC values. However, this study also considered ICC values by range, and the lower range (anti-Dsg1 and anti-Dsg3 negative; n = 10) was only statistically significant for PDAI with an ICC of 0.96 (95% CI: 0.93-0.98). This illustrates that PDAI is more reliable for low disease activity than PVAS and ABSIS.
Convergent validity against anti-Dsg1 titers was highest for PDAI, with Spearman's rho correlations of 0.67 (p < 0.001), 0.33 (p = 0.002), and 0.52 (p < 0.01) for PDAI, ABSIS, and PVAS, respectively. Convergent validity as determined by anti-Dsg3 titers was poor for all three instruments, with correlation coefficients of 0.35 (p = 0.001), 0.33 (p = 0.002), and 0.35 (p = 0.001) for PDAI, ABSIS, and PVAS, respectively (Rahbar et al., 2014).
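Convergent validity in these studies was expressed as a Spearman's rho correlation between a severity score and a comparator such as antibody titers or PGA. A minimal sketch of that statistic, computed as the Pearson correlation of average ranks, is given below. It is an illustration only; the function names and the data are hypothetical and are not drawn from the cited studies.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Return Spearman's rho as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("two equally long samples of size >= 2 are required")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical severity scores and antibody titers for five patients.
scores = [5, 12, 30, 44, 60]
titers = [20, 15, 80, 150, 120]
rho = spearman_rho(scores, titers)
print(round(rho, 2))
```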
Two studies have investigated severity cutoffs for PDAI to divide patients based on disease severity into mild, moderate, and severe categories. The first study was conducted in Japan and utilized the physician’s subjective impression of the disease state (mild, moderate, severe) and correlated this with the PDAI score to establish cutoffs using the Youden Index. The values obtained were identified as mild (0-8), moderate (9-24), or severe (≥ 25; Shimizu et al., 2014). In contrast, an independent French study by Boulard et al. (2016) calculated severity cutoffs by identifying scores that correlated with the 25th and 75th percentiles of the scores, which resulted in significantly higher cutoffs. The results obtained were identified as mild (0-14), moderate (15-44), or severe (≥ 45). In their discussion, the researchers justify this discrepancy by sample selection, stating that Shimizu et al. recruited newly-diagnosed and previously-treated patients with minimal severe cases while they enlisted newly-diagnosed, non-treated cases and thus with greater severity (Boulard et al., 2016).
It is also possible that there are population-based variabilities or that the differing methods make one study more vulnerable to skew. No reviews or studies exist that outline the best method to calculate severity cutoffs, and it seems that the method can have significant implications for the results, particularly if the population is concentrated in a particular subgroup. Boulard et al. also obtained cutoffs of 17 and 53 to distinguish the three groups for ABSIS using the same method.
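The practical effect of the two sets of PDAI cutoffs reported above can be shown with a short sketch that classifies the same score under each scheme. The cutoff values are those quoted from Shimizu et al. (2014) and Boulard et al. (2016); the function name and the example scores are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the moderate and severe categories for PDAI,
# as quoted in the text above.
PDAI_CUTOFFS = {
    "Shimizu et al., 2014": (9, 25),   # mild 0-8, moderate 9-24, severe >= 25
    "Boulard et al., 2016": (15, 45),  # mild 0-14, moderate 15-44, severe >= 45
}
LABELS = ("mild", "moderate", "severe")


def classify(score, cutoffs):
    """Return the severity label for a score given two lower bounds."""
    if score < 0:
        raise ValueError("score cannot be negative")
    return LABELS[bisect_right(cutoffs, score)]


# Hypothetical PDAI scores classified under both schemes.
for score in (10, 30, 50):
    labels = {study: classify(score, c) for study, c in PDAI_CUTOFFS.items()}
    print(score, labels)
```

A score of 30, for example, is severe under the first scheme and moderate under the second, which illustrates why the choice of cutoffs matters when study populations are compared.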
The mean time to complete PDAI, ABSIS and PVAS is 2.9 minutes (standard deviation [SD] 1.3 min), 1.9 minutes (SD 1.1 min) and 1.1 minutes (SD 0.7 min), respectively, which makes PVAS the fastest instrument to complete (Rahbar et al., 2014). To date, there are no responsiveness studies on any of the pemphigus scoring systems (Table 5).
Table 5.
ABSIS = Autoimmune Bullous Skin Disorder Intensity Score; CI = confidence interval; Dsg = desmoglein; ICC = intra-class correlation coefficient; PDAI = Pemphigus Disease Area Index; PGA = physician global assessment; PVAS = Pemphigus Vulgaris Activity Score; SD = standard deviation.
Conclusion
Despite the significant morbidity associated with pemphigus, there is a shortage of multicenter controlled studies to facilitate evidence-based practice due to the rarity of the disease and the inability to objectively compare therapeutic outcomes. PDAI and ABSIS are promising scoring systems that have proven to be valid and reliable; however, to complete their validation, responsiveness must be assessed. MCID and reaffirmation of cutoff values would also provide significant information that may improve the utility of the instruments in clinical practice.
Footnotes
Conflicts of interest: D. F. Murrell was a co-author/senior author on some of the outcome measures listed (PDAI, BPDAI).
Funding sources: Australian Blistering Diseases Foundation grant and Independent Learning Program of the University of New South Wales Australia.
IRB status: The study received ethical approval from the South Eastern Sydney Local Health District Human Research Ethics Committee – Northern Network (11/STG/134).
References
- Albrecht J., Werth V.P. Development of the CLASI as an outcome instrument for cutaneous lupus erythematosus. Dermatol Ther. 2007;20:93–101. doi: 10.1111/j.1529-8019.2007.00117.x.
- Albrecht J., Taylor L., Berlin J.A., Dulay S., Ang G., Fakharzadeh S. The CLASI (Cutaneous LE Disease Area and Severity Index): an outcome instrument for cutaneous lupus erythematosus. J Invest Dermatol. 2005;125:889–894. doi: 10.1111/j.0022-202X.2005.23889.x.
- Angelova-Fischer I., Bauer A., Hipler U.C., Petrov I., Kazandjieva J., Bruckner T. The objective severity assessment of atopic dermatitis (OSAAD) score: Validity, reliability and sensitivity in adult patients with atopic dermatitis. Br J Dermatol. 2005;153:767–773. doi: 10.1111/j.1365-2133.2005.06697.x.
- Anyanwu C.O., Fiorentino D.F., Chung L., Dzuong C., Wang Y., Okawa J. Validation of the Cutaneous Dermatomyositis Disease Area and Severity Index: Characterizing disease severity and assessing responsiveness to clinical change. Br J Dermatol. 2015;173:969–974. doi: 10.1111/bjd.13915.
- Bonilla-Martinez Z.L., Albrecht J., Troxel A.B., Taylor L., Okawa J., Dulay S. The cutaneous lupus erythematosus disease area and severity index: A responsive instrument to measure activity and damage in patients with cutaneous lupus erythematosus. Arch Dermatol. 2008;144:173–180. doi: 10.1001/archderm.144.2.173.
- Boulard C., Duvert Lehembre S., Picard-Dahan C., Kern J.S., Zambruno G., Feliciani C. Calculation of cut-off values based on the Autoimmune Bullous Skin Disorder Intensity Score (ABSIS) and Pemphigus Disease Area Index (PDAI) pemphigus scoring systems for defining moderate, significant and extensive types of pemphigus. Br J Dermatol. 2016;175:142–149. doi: 10.1111/bjd.14405.
- Breuer K., Braeutigam M., Kapp A., Werfel T. Influence of pimecrolimus cream 1% on different morphological signs of eczema in infants with atopic dermatitis. Dermatology. 2004;209:314–320. doi: 10.1159/000080855.
- Chams-Davatchi C., Rahbar Z., Daneshpazhooh M., Mortazavizadeh S.M., Akhyani M., Esmaili N. Pemphigus vulgaris activity score and assessment of convergent validity. Acta Med Iran. 2013;51:224–230.
- Cook C.E. Clinimetrics corner: The Minimal Clinically Important Change Score (MCID): A necessary pretense. J Man Manip Ther. 2008;16:E82–E83. doi: 10.1179/jmt.2008.16.4.82E.
- Evangelista F., Culton D.A., Diaz L.A. Desmosomal proteins as autoantigens in pemphigus. In: Murrell D.F., editor. Blistering diseases: Clinical features, pathogenesis, treatment. Springer; Sydney: 2015. pp. 55–65.
- Gaines E., Werth V.P. Development of outcome measures for autoimmune dermatoses. Arch Dermatol Res. 2008;300:3–9. doi: 10.1007/s00403-007-0813-2.
- Goreshi R., Okawa J., Rose M., Feng R., Lee L.A., Hansen C.B. Evaluation of reliability, validity, and responsiveness of the CDASI and the CAT-BM. J Invest Dermatol. 2012;132:1117–1124. doi: 10.1038/jid.2011.440.
- Gourraud P.A., Le Gall C., Puzenat E., Aubin F., Ortonne J.P., Paul C.F. Why statistics matter: Limited inter-rater agreement prevents using the psoriasis area and severity index as a unique determinant of therapeutic decision in psoriasis. J Invest Dermatol. 2012;132:2171–2175. doi: 10.1038/jid.2012.124. [DOI] [PubMed] [Google Scholar]
- Hanifin J.M., Thurston M., Omoto M., Cherill R., Tofte S.J., Graeber M. The eczema area and severity index (EASI): Assessment of reliability in atopic dermatitis. EASI Evaluator Group. Exp Dermatol. 2001;10:11–18. doi: 10.1034/j.1600-0625.2001.100102.x. [DOI] [PubMed] [Google Scholar]
- Harman K.E., Seed P.T., Gratian M.J., Bhogal B.S., Challacombe S.J., Black M.M. The severity of cutaneous and oral pemphigus is related to desmoglein 1 and 3 antibody levels. Br J Dermatol. 2001;144:775–780. doi: 10.1046/j.1365-2133.2001.04132.x. [DOI] [PubMed] [Google Scholar]
- Hertl M., Eming R., Veldman C. T cell control in autoimmune bullous skin disorders. J Clin Invest. 2006;116:1159–1166. doi: 10.1172/JCI28547. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hertl M., Sitaru C. Pathogenesis, clinical manifestations, and diagnosis of pemphigus. 2015. http://www.uptodate.com/contents/pathogenesis-clinical-manifestations-and-diagnosis-of-pemphigus UpToDate. Available from:
- Huang Y.H., Kuo C.F., Chen Y.H., Yang Y.W. Incidence, mortality, and causes of death of patients with pemphigus in Taiwan: A nationwide population-based study. J Invest Dermatol. 2012;132:92–97. doi: 10.1038/jid.2011.249. [DOI] [PubMed] [Google Scholar]
- Jain S.V., Harris A.G., Su J.C., Orchard D., Warren L.J., McManus H. The Epidermolysis Bullosa Disease Activity and Scarring Index (EBDASI): Grading disease severity and assessing responsiveness to clinical change in epidermolysis bullosa. J Eur Acad Dermatol Venereol. 2016 doi: 10.1111/jdv.13953. [Epub ahead of print] [DOI] [PMC free article] [PubMed] [Google Scholar]
- Klein R., Moghadam-Kia S., LoMonico J., Okawa J., Coley C., Taylor L. Development of the CLASI as a tool to measure disease severity and responsiveness to therapy in cutaneous lupus erythematosus. Arch Dermatol. 2011;147:203–208. doi: 10.1001/archdermatol.2010.435. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Langan S.M., Smeeth L., Hubbard R., Fleming K.M., Smith C.J., West J. Bullous pemphigoid and pemphigus vulgaris—incidence and mortality in the UK: Population based cohort study. BMJ. 2008;337:a180. doi: 10.1136/bmj.a180. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Langley R.G., Ellis C.N. Evaluating psoriasis with Psoriasis Area and Severity Index, Psoriasis Global Assessment, and Lattice System Physician's Global Assessment. J Am Acad Dermatol. 2004;51:563–569. doi: 10.1016/j.jaad.2004.04.012. [DOI] [PubMed] [Google Scholar]
- Lévy-Sitbon C., Barbe C., Plee J., Goeldel A.L., Antonicelli F., Reguiai Z. Assessment of bullous pemphigoid disease area index during treatment: A prospective study of 30 patients. Dermatology. 2014;229:116–122. doi: 10.1159/000362717. [DOI] [PubMed] [Google Scholar]
- Loh C.C., Kim J., Su J.C., Daniel B.S., Venugopal S.S., Rhodes L.M. Development, reliability, and validity of a novel Epidermolysis Bullosa Disease Activity and Scarring Index (EBDASI) J Am Acad Dermatol. 2014;70:89–97. doi: 10.1016/j.jaad.2013.09.041. [DOI] [PubMed] [Google Scholar]
- Marshall M., Lockwood A., Bradley C., Adams C., Joy C., Fenton M. Unpublished rating scales: A major source of bias in randomised controlled trials of treatments for schizophrenia. Br J Psychiatry. 2000;176:249–252. doi: 10.1192/bjp.176.3.249. [DOI] [PubMed] [Google Scholar]
- Martin L., Murrell D.F. Measuring the immeasurable: A systematic review of outcome measures in pemphigus. Australas J Dermatol. 2006;47:A32–A33. [Google Scholar]
- Mihai S., Sitaru C. Immunopathology and molecular diagnosis of autoimmune bullous diseases. J Cell Mol Med. 2007;11:462–481. doi: 10.1111/j.1582-4934.2007.00033.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mokkink L.B., Terwee C.B., Knol D.L., Stratford P.W., Alonso J., Patrick D.L. Protocol of the COSMIN study: COnsensus-based Standards for the selection of health Measurement INstruments. BMC Med Res Methodol. 2006;6:2. doi: 10.1186/1471-2288-6-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mokkink L.B., Terwee C.B., Patrick D.L., Alonso J., Stratford P.W., Knol D.L. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: An international Delphi study. Qual Life Res. 2010;19:539–549. doi: 10.1007/s11136-010-9606-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mokkink L.B., Terwee C.B., Patrick D.L., Alonso J., Stratford P.W., Knol D.L. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737–745. doi: 10.1016/j.jclinepi.2010.02.006. [DOI] [PubMed] [Google Scholar]
- Murrell D.F., Dick S., Ahmed A.R., Amagai M., Barnadas M.A., Borradori L. Consensus statement on definitions of disease, end points, and therapeutic response for pemphigus. J Am Acad Dermatol. 2008;58:1043–1046. doi: 10.1016/j.jaad.2008.01.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Murrell D.F., Daniel B.S., Joly P., Borradori L., Amagai M., Hashimoto T. Definitions and outcome measures for bullous pemphigoid: Recommendations by an international panel of experts. J Am Acad Dermatol. 2012;66:479–485. doi: 10.1016/j.jaad.2011.06.032. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Oiso N., Yamashita C., Yoshioka K., Amagai M., Komai A., Nagata Y. IgG/IgA pemphigus with IgG and IgA antidesmoglein 1 antibodies detected by enzyme-linked immunosorbent assay. Br J Dermatol. 2002;147:1012–1017. doi: 10.1046/j.1365-2133.2002.04984.x. [DOI] [PubMed] [Google Scholar]
- Patsatsi A., Koletsa T., Sotiriadis D., Kartsios C., Papaconstantinou A., Kostopoulos I. Pyoderma gangrenosum in a patient with primary cutaneous peripheral T-cell lymphoma, not otherwise specified. Eur J Dermatol. 2012;22:792–793. doi: 10.1684/ejd.2012.1862. [DOI] [PubMed] [Google Scholar]
- Pfutze M., Niedermeier A., Hertl M., Eming R. Introducing a novel Autoimmune Bullous Skin Disorder Intensity Score (ABSIS) in pemphigus. Eur J Dermatol. 2007;17:4–11. doi: 10.1684/ejd.2007.0090. [DOI] [PubMed] [Google Scholar]
- Puzenat E., Bronsard V., Prey S., Gourraud P.A., Aractingi S., Bagot M. What are the best outcome measures for assessing plaque psoriasis severity? A systematic review of the literature. J Eur Acad Dermatol Venereol. 2010;24(Suppl. 2):10–16. doi: 10.1111/j.1468-3083.2009.03562.x. [DOI] [PubMed] [Google Scholar]
- Rahbar Z., Daneshpazhooh M., Mirshams-Shahshahani M., Esmaili N., Heidari K., Aghazadeh N. Pemphigus disease activity measurements: Pemphigus disease area index, autoimmune bullous skin disorder intensity score, and pemphigus vulgaris activity score. JAMA Dermatol. 2014;150:266–272. doi: 10.1001/jamadermatol.2013.8175. [DOI] [PubMed] [Google Scholar]
- Rosenbach M., Murrell D.F., Bystryn J.C., Dulay S., Dick S., Fakharzadeh S. Reliability and convergent validity of two outcome instruments for pemphigus. J Invest Dermatol. 2009;129:2404–2410. doi: 10.1038/jid.2009.72. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Santoro F.A., Stoopler E.T., Werth V.P. Pemphigus. Dent Clin N Am. 2013;57:597–610. doi: 10.1016/j.cden.2013.06.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sartorius K., Emtestam L., Lapins J., Johansson O. Cutaneous PGP 9.5 distribution patterns in hidradenitis suppurativa. Arch Dermatol Res. 2010;302:461–468. doi: 10.1007/s00403-010-1028-5. [DOI] [PubMed] [Google Scholar]
- Schmidt E., Borradori L., Joly P. Epidemiology of Autoimmune Bullous Diseases. In: Murrell D.F., editor. Blistering diseases: Clinical features, pathogenesis, treatment. Springer; Sydney: 2015. pp. 251–263. [Google Scholar]
- Schram M.E., Spuls P.I., Leeflang M.M., Lindeboom R., Bos J.D., Schmitt J. EASI, (objective) SCORAD and POEM for atepic eczema: Responsiveness and minimal clincally important difference. Allergy. 2012;67:99–106. doi: 10.1111/j.1398-9995.2011.02719.x. [DOI] [PubMed] [Google Scholar]
- Shimizu T., Takebayashi T., Sato Y., Niizeki H., Aoyama Y., Kitajima Y. Grading criteria for disease severity by pemphigus disease area index. J Dermatol. 2014;41:969–973. doi: 10.1111/1346-8138.12649. [DOI] [PubMed] [Google Scholar]
- Stalder J.F., Dutartre H., Laruche G., Litoux P. Photoprotection in children. Ann Dermatol Venereol. 1993;120:485–488. [PubMed] [Google Scholar]
- Streiner D.L., Norman G.R., Cairney J. 4th ed. Oxford; Oxford: 2008. Health measurement scales: A practical guide to their development and use. [Google Scholar]
- Tofte S.J., Graeber M., Cherill R., Omoto M., Thurston M., Hanifin J.M. Eczema area and severity index (EASI): a new tool to evaluate atopic dermatitis. J Eur Acad Dermatol Venereol. 1998;11:S197. doi: 10.1034/j.1600-0625.2001.100102.x. [DOI] [PubMed] [Google Scholar]
- Tsuruta D., Ishii N., Hamada T., Ohyama B., Fukuda S., Koga H. IgA pemphigus. Clin Dermatol. 2011;29:437–442. doi: 10.1016/j.clindermatol.2011.01.014. [DOI] [PubMed] [Google Scholar]
- Weisman S., Pollack C.R., Gottschalk R.W. Psoriasis disease severity measures: Comparing efficacy of treatments for severe psoriasis. J Dermatolog Treat. 2003;14:158–165. doi: 10.1080/09546630310013360. [DOI] [PubMed] [Google Scholar]
- Wijayanti A, Zhao CY, Boettiger D, Ishii N, Hashimoto T, Murrell DF. Reliability, validity, and responsiveness of Bullous Pemphigoid Disease Area Index (BPDAI). Paper presented at the Cutaneous Biologic Meeting; Australasian Society of Dermatological Research, North Stradbroke Island 2014. http://cutaneous-2014.p.asnevents.com.au/days/2014-09-23/abstract/18743
- Wijayanti A., Zhao C.Y., Boettiger D., Chiang Y.Z., Ishii N., Hashimoto T. The reliability, validity and responsiveness of two disease scores (BPDAI and ABSIS) for bullous pemphigoid: Which one to use? Acta Derm Venereol. 2016 doi: 10.2340/00015555-2473. [Epub ahead of print] [DOI] [PubMed] [Google Scholar]
- Yeh S.W., Ahmed B., Sami N., Razzaque A.A. Blistering disorders: Diagnosis and treatment. Dermatol Ther. 2003;16:214–223. doi: 10.1046/j.1529-8019.2003.01631.x. [DOI] [PubMed] [Google Scholar]
- Zhao C.Y., Murrell D.F. Outcome measures for autoimmune blistering diseases. J Dermatol. 2015;42:31–36. doi: 10.1111/1346-8138.12711. [DOI] [PubMed] [Google Scholar]
- Zhao C.Y., Tran A.Q., Lazo-Dizon J.P., Kim J., Daniel B.S., Venugopal S.S. A pilot comparison study of four clinician-rated atopic dermatitis severity scales. Br J Dermatol. 2015;173:488–497. doi: 10.1111/bjd.13846. [DOI] [PubMed] [Google Scholar]