Abstract
The 5-Domain Niemann-Pick Type C Clinical Severity Scale (5DNPCCSS) is used in clinical practice and trials. While psychometric data support the clinical meaningfulness of its concepts and its interrater reliability, more information is needed to support its construct validity. Here, we evaluated the convergent validity of the Cognition, Speech, and Fine Motor domains. Data from 121 individuals with NPC were drawn from several studies conducted at two U.S. sites. Direct standardized assessments included the Nine-Hole Peg Test or Purdue Pegboard, the Formulated Sentences subscale of the Clinical Evaluation of Language Fundamentals, and the age-appropriate Wechsler IQ test or the Mullen Scales of Early Learning. The 5DNPCCSS domains were significantly related in the expected directions to their respective direct assessments, supporting their construct validity. In combination with previous evidence for the Ambulation and Swallow domains, these results support the fitness for purpose of the 5DNPCCSS for clinical studies in NPC. ClinicalTrials.gov: NCT00344331, NCT01747135, NCT02534844
Keywords: Niemann-Pick Type C, NPCCSS, clinical outcome assessment, validity
Niemann-Pick disease, type C (NPC) is an autosomal recessive neurodegenerative disorder characterized by endolysosomal storage of unesterified cholesterol in brain, liver, spleen and other peripheral tissues.1 NPC is caused by pathological variants in NPC1 (90–95%) or NPC2 (~5%) that yield deficient function of the corresponding proteins, which normally bind and transport cholesterol; the resulting deficits in intracellular cholesterol trafficking lead to both toxic lipid accumulation in lysosomes and a relative deficiency of cholesterol in the rest of the cell.2 The incidence of NPC is estimated at about 1/100,000, although older-onset cases are likely under-ascertained.1,3 Age of onset ranges from birth to the eighth decade, with a highly heterogeneous phenotype including psychiatric, neurological, hearing, pulmonary, and visceral signs that vary with age at presentation: hepatosplenomegaly is more prevalent in perinatal- and infantile-onset forms, and psychiatric disease is more common in teenage- and adult-onset cases.2,4 The classic juvenile form of NPC involves insidious onset in mid-childhood of vertical supranuclear gaze palsy (VSGP), clumsiness, cognitive decline and/or fine motor impairments, evolving to overt dementia, ataxia, intention tremor, dystonia, spasticity, dysarthria, dysphagia, hearing loss, seizures and gelastic cataplexy.5 NPC is relentlessly progressive; death in childhood-onset cases typically occurs in the teens or early twenties, and later-onset cases progress more slowly. There were no FDA-approved treatments for NPC until the recent approvals of arimoclomol (in combination with miglustat) and N-acetyl-L-leucine. Neither treatment has been shown to halt disease progression fully or over the long term. Intrathecal and intravenous 2-hydroxypropyl-beta-cyclodextrin are still in development for NPC, and other advances such as gene therapy are on the horizon.
As with many other rare diseases, the identification of outcome measures that are clinically relevant to a heterogeneous population and sufficiently responsive to treatment has been a major challenge. The NPC Clinical Severity Scale (NPC-CSS-17) was developed at the National Institutes of Health.6 The initial NPC-CSS had 17 domains that were rated by a clinician on a Likert-style scale. Clinical experience guided the assignment of weights to each domain, with major domains rated on a scale of 0 – 5 and minor domains rated 0 – 2; the domains were summed for a total score (range 0 – 61). The NPC-CSS-17 was first evaluated in two cohorts, one prospective natural history sample (n=18; aged 4 – 51, median 8.5 years) and a second sample from a retrospective medical chart review (n=19; age 2 – 38, median 14 years).6 The authors found support for the validity of the scale in the pattern of increasing total scores both within person over time and across participants of increasing age. Internal consistency was supported by generally high domain-total correlations, with the exception of the Hearing domain. Strong interrater reliability was also observed across three raters. Since its publication, the NPC-CSS-17 has been used in nearly every study of NPC, and a recent Delphi study with 19 experts found agreement for the NPC-CSS-17 as the first choice to assess severity of disease in clinical trial settings.7
In the years since its publication, researchers and regulators have noted that the wide-ranging composition of the NPC-CSS-17, meant to capture the full range of clinical presentation, might reduce its ability to detect an effect of intervention. Item content such as auditory brainstem response testing results, seizures, movement disorders, and psychiatric symptoms are either diagnostic biomarkers or treated concomitantly and could therefore confound the assessment of treatment efficacy. To address these concerns, 23 clinicians and 49 patients and caregivers were surveyed to determine which of the NPC-CSS-17 domains they regarded as most clinically important.8,9 Both clinicians and patients/caregivers chose the same five domains: Ambulation, Fine Motor Skills, Swallow, Cognition, and Speech.
Several independent studies have produced support for the psychometric profile of the five-domain version (5DNPCCSS). Strong inter- and intra-rater reliability was observed for both live10 and video-based4 ratings of the 5DNPCCSS. Three studies yielded evidence for the criterion validity of the 5DNPCCSS score via strong correlation with the 17-domain NPC-CSS total score,9,10 for the convergent validity of the Fine Motor Skills and Ambulation domain-level scores via moderate-to-strong correlations with the 9-Hole Peg Test and the Scale for Assessment and Rating of Ataxia,9 and for the convergent validity of the Swallow domain via moderate correlations with the ASHA National Outcomes Measurement System and an adaptation of the Rosenbek Penetration/Aspiration Scale.11 However, no convergent validity evidence has been reported for the Cognition and Speech domains. Existing work also supports the validity of change in the 5DNPCCSS score, with a one-unit change in any domain endorsed as clinically meaningful.9
The goal of the current analysis is to add to the literature supporting the validity of the 5DNPCCSS as a clinical outcome assessment of disease severity by examining the relationships of the Fine Motor, Speech, and Cognition domain-level scores with direct, objective assessments of these constructs.
Participants and Methods
Participants
Participants were drawn from several studies performed at two sites at which participants with clinically and biochemically confirmed NPC were followed with regular clinical evaluations, including a natural history study for NPC (NCT00344331), a phase 1/2a trial of intrathecal 2-hydroxypropyl-beta-cyclodextrin (adrabetadex, NCT01747135), a phase 2b/3 trial of adrabetadex (NCT02534844), and an expanded access program for treatment of NPC with adrabetadex (IND119856). Each of these studies was approved by the Institutional Review Board (IRB) of the National Institutes of Health (06-CH-0186, 13-CH-0001, 16-CH-0016) or the Rush University IRB (ORAs 13092305, 15081309). Written consent, and verbal assent when appropriate, was obtained for all participants. Some participants were enrolled in more than one study. Data, indexed by unique subject identifier and visit date, were compiled across all studies, and an individual encounter was included for analysis if (a) the 5DNPCCSS was administered and (b) at least one of the direct assessments (described below) was administered. Although the assessments corresponding to a study visit were usually completed on the same date, a window of up to 180 days between the 5DNPCCSS and a direct assessment was allowed to accommodate occasions when testing occurred on separate days.
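The inclusion rule can be sketched in code to make the 180-day window concrete. This is a minimal illustration under stated assumptions, not the study's data-management procedure: the function name `match_encounters`, the (subject, date) tuple layout, and the choice to pair each rating with the nearest-in-time assessment are hypothetical; the text specifies only the 180-day window.

```python
from datetime import date

WINDOW_DAYS = 180  # maximum allowed gap between a rating and an assessment


def match_encounters(rating_visits, assessment_visits, window_days=WINDOW_DAYS):
    """Pair each 5DNPCCSS rating with the nearest direct assessment for the same subject.

    Both inputs are lists of (subject_id, visit_date) tuples. Ratings with no
    assessment within `window_days` are dropped, mirroring the inclusion rule.
    Returns a list of (subject_id, rating_date, assessment_date) tuples.
    """
    assessments_by_subject = {}
    for subject_id, visit_date in assessment_visits:
        assessments_by_subject.setdefault(subject_id, []).append(visit_date)

    pairs = []
    for subject_id, rating_date in rating_visits:
        in_window = [
            d for d in assessments_by_subject.get(subject_id, [])
            if abs((d - rating_date).days) <= window_days
        ]
        if in_window:
            nearest = min(in_window, key=lambda d: abs((d - rating_date).days))
            pairs.append((subject_id, rating_date, nearest))
    return pairs


ratings = [("S1", date(2020, 1, 1)), ("S2", date(2020, 6, 1))]
assessments = [("S1", date(2020, 3, 1)), ("S1", date(2020, 1, 10)), ("S2", date(2021, 6, 1))]
print(match_encounters(ratings, assessments))  # only S1 is kept; S2's test is 365 days away
```

This sketch allows one assessment to be paired with more than one rating; how the study handled such cases is not stated.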
Measures
A battery of neurodevelopmental and medical assessments was performed for each participant, dependent on the specific protocol. A subset of these were selected for the current analysis, as described below. All assessments were performed by trained clinicians or medical doctors. Given the time range and number of studies included in the analysis, the study staff may have varied even within site.
5DNPCCSS.
The 5DNPCCSS was rated on the basis of caregiver interview and direct examination of the participant, supplemented in some cases by the results of the assessments described below; such results were used almost exclusively for the Cognition domain rating. The 5DNPCCSS comprises the Ambulation, Fine Motor Skills, Swallow, Cognition, and Speech domains, each with a point range of 0 – 5. Only the Fine Motor Skills, Cognition, and Speech domains are addressed in the current analysis. Participants with ratings of 5 on the Fine Motor Skills domain were not able to complete the fine motor direct assessments and were excluded from analysis. The Speech domain focuses predominantly on dysarthria; however, at more severe levels of impairment, when speech tends to be sparse, cognitive ability, receptive language, and non-verbal communication also enter into the scoring. Further, individuals receiving a score of 3 or 5 on the Speech domain could not complete the direct speech assessment described below and were excluded from analysis. Thus, the Speech results correspond only to ratings of 0 – 2. The rating scales for the Fine Motor Skills and Cognition domains, which do not use consecutive values, were converted to consecutive numbers for analysis (see Table 1).
Table 1.
5DNPCCSS Domains and Direct Assessments Used for Convergent Validity Analysis
5DNPCCSS Domain | Direct Assessment and Scores | Original Rating | Rating Text | Score Used for Analysis |
---|---|---|---|---|
Fine Motor Skills | Nine-hole Peg Dominant and Non-dominant Z-scores or Purdue Pegboard Dominant and Non-dominant Z-scores | 0 | Normal | 0 |
| | 1 | Slight dysmetria/dystonia | 1 |
| | 2 | Mild dysmetria/dystonia | 2 |
| | 4 | Moderate dysmetria/dystonia | 3 |
| | 5 | Severe dysmetria/dystonia | Excluded (a) |
Cognition | Nonverbal and verbal developmental quotient of the Mullen Scales of Early Learning (MSEL); Nonverbal and verbal IQ from the Wechsler Preschool and Primary Scale of Intelligence (WPPSI), the Wechsler Intelligence Scale for Children (WISC), or the Wechsler Adult Intelligence Scale (WAIS) | 0 | Normal | 0 |
| | 1 | Mild learning delay, grade-appropriate for age | 1 |
| | 3 | Moderate learning delay; individualized curriculum or modified work setting | 2 |
| | 4 | Severe delay/plateau; no longer in school or able to work, some loss of cognitive function | 3 |
| | 5 | Minimal cognitive function | 4 |
Speech | Clinical Evaluation of Language Fundamentals (CELF) Formulated Sentences Z-score | 0 | Normal | 0 |
| | 1 | Mild dysarthria | 1 |
| | 2 | Severe dysarthria | 2 |
| | 3 | Nonverbal/functional communication skills for needs | Excluded (a) |
| | 5 | Minimal communication | Excluded (a) |

(a) Participants who received this score were unable to complete the direct assessment. This score is not evaluated in the analysis.
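The recoding in Table 1 amounts to a lookup from original rating to analysis score. A minimal sketch follows; the dictionary layout and the `recode` helper are hypothetical, and `None` stands for ratings excluded because the direct assessment could not be completed.

```python
# Original 5DNPCCSS rating -> consecutive score used for analysis (Table 1).
# None marks ratings excluded because the direct assessment could not be completed.
RECODE = {
    "fine_motor": {0: 0, 1: 1, 2: 2, 4: 3, 5: None},
    "cognition": {0: 0, 1: 1, 3: 2, 4: 3, 5: 4},
    "speech": {0: 0, 1: 1, 2: 2, 3: None, 5: None},
}


def recode(domain, rating):
    """Return the analysis score for a rating, or None if the rating is excluded."""
    table = RECODE[domain]
    if rating not in table:
        # e.g. 3 is not a defined Fine Motor Skills rating
        raise ValueError(f"{rating} is not a valid {domain} rating")
    return table[rating]


print(recode("cognition", 3))  # 2
```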
Cognitive Assessments.
Cognitive ability was directly assessed with a developmental or traditional IQ test, depending on the age and ability of the participant. Developmental testing was used for young children and for older children who could not achieve a basal on IQ testing. The Mullen Scales of Early Learning (MSEL) is a developmental direct assessment normed for children aged 0 – 68 months. To operationalize performance on the MSEL, we used developmental quotients (DQs), calculated as 100 × (age equivalent / chronological age). We chose DQs over standard scores because widespread floor effects in the standard scores obscure variability, and because the MSEL does not produce verbal and nonverbal domains. The verbal developmental quotient (VDQ) uses the average of the Receptive Language and Expressive Language age equivalents, and the nonverbal developmental quotient (NVDQ) uses the average of the Fine Motor and Visual Reception age equivalents. DQs have no normative population distribution but in practice are treated as analogous to standard scores. However, DQs decrease as a function of age regardless of performance,12 so we restricted the analysis of MSEL data to participants within the age range of the test.
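The DQ calculation can be illustrated with a short sketch. The function names and the 40-month example child are hypothetical; the sketch assumes age equivalents and chronological age are both expressed in months and, following the text, averages the two relevant age equivalents before forming the ratio.

```python
MSEL_MAX_AGE_MONTHS = 68  # upper end of the MSEL normative range


def developmental_quotient(age_equivalents_months, chronological_age_months):
    """Ratio DQ: 100 * (mean age equivalent / chronological age), ages in months."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    mean_age_equivalent = sum(age_equivalents_months) / len(age_equivalents_months)
    return 100 * mean_age_equivalent / chronological_age_months


def msel_dqs(receptive, expressive, fine_motor, visual_reception, age_months):
    """Return VDQ and NVDQ, or None if the child is outside the MSEL age range."""
    if age_months > MSEL_MAX_AGE_MONTHS:
        return None  # DQs outside the test's age range are not analyzed
    return {
        "VDQ": developmental_quotient([receptive, expressive], age_months),
        "NVDQ": developmental_quotient([fine_motor, visual_reception], age_months),
    }


# Hypothetical 40-month-old child; age equivalents are invented for illustration.
print(msel_dqs(receptive=30, expressive=26, fine_motor=36, visual_reception=40, age_months=40))
# {'VDQ': 70.0, 'NVDQ': 95.0}
```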
Traditional IQ testing was performed where possible, using the age-appropriate Wechsler test. Children aged 2 years 6 months to 7 years 7 months received the Wechsler Preschool and Primary Scale of Intelligence (WPPSI-IV). Children aged 6 years to 16 years 11 months received the Wechsler Intelligence Scale for Children (WISC-IV or WISC-V), and adolescents and adults aged 16 years or older received the Wechsler Adult Intelligence Scale (WAIS-IV). Children who began study participation before age 6 were switched from the WPPSI to the WISC at age 6 or 7 years; children who joined the study after age 6 received only the WISC. Individuals who began study participation at 16 years or older were given the WAIS; otherwise, they were switched from the WISC to the WAIS between 16 years 0 months and 16 years 11 months. Some participants received the Wechsler Abbreviated Scale of Intelligence; this test was not included in the current analysis. Verbal IQ (VIQ) was operationalized as the Verbal Comprehension Index on all forms, and nonverbal IQ (NVIQ) as the Visual Spatial Composite on the WPPSI-IV, the Nonverbal Index on the WISC-V, and the Perceptual Reasoning Index on the WISC-IV and WAIS-IV. These are norm-referenced standard scores with a population mean of 100, a standard deviation of 15, and a floor of 40.
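The form-selection and index rules above can be summarized as lookup tables. This is a sketch under assumptions: ages are converted to months from the ranges given in the text, the WAIS-IV upper age bound is not modeled, and overlapping eligibility (for example, ages 6 to 7 years 7 months) is left to the enrollment and switching rules described above, which the hypothetical `eligible_forms` helper does not encode.

```python
# Normative age ranges in months, taken from the text (None = no upper bound modeled).
WECHSLER_AGE_RANGES_MONTHS = {
    "WPPSI-IV": (30, 91),    # 2y6m to 7y7m
    "WISC": (72, 203),       # 6y0m to 16y11m
    "WAIS-IV": (192, None),  # 16y0m and older
}

# Index used to operationalize verbal and nonverbal IQ on each form.
IQ_INDICES = {
    "WPPSI-IV": {"VIQ": "Verbal Comprehension Index", "NVIQ": "Visual Spatial Composite"},
    "WISC-IV": {"VIQ": "Verbal Comprehension Index", "NVIQ": "Perceptual Reasoning Index"},
    "WISC-V": {"VIQ": "Verbal Comprehension Index", "NVIQ": "Nonverbal Index"},
    "WAIS-IV": {"VIQ": "Verbal Comprehension Index", "NVIQ": "Perceptual Reasoning Index"},
}


def eligible_forms(age_months):
    """List the Wechsler forms whose normative range includes the given age."""
    forms = []
    for form, (low, high) in WECHSLER_AGE_RANGES_MONTHS.items():
        if age_months >= low and (high is None or age_months <= high):
            forms.append(form)
    return forms


print(eligible_forms(80))  # ['WPPSI-IV', 'WISC']
```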
Fine Motor Assessments.
Fine motor ability was directly assessed with one of two tests. The Nine-Hole Peg Test13,14 requires participants to place nine pegs into evenly spaced holes on a board positioned at the midline of the body. Performance for the dominant and non-dominant hands is operationalized as time to completion (in seconds), which is compared with normative data14 to produce Z-scores. Because longer times reflect poorer performance, higher Z-scores indicate worse performance relative to normative data. The Nine-Hole Peg Test was used only at Rush.
For more able participants, we used the more difficult Purdue Pegboard.15–17 Performance is operationalized as the number of pegs placed within 30 seconds, which is compared with age- and sex-based normative data for ages 5 years and up17 to produce Z-scores. Higher Z-scores indicate better performance relative to normative data.
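The two fine motor tests yield Z-scores with opposite sign conventions, which is why their correlations with the 5DNPCCSS run in opposite directions. A minimal sketch; the function names and the normative means and SDs in the example are hypothetical placeholders, not published norms.

```python
def nine_hole_z(seconds, norm_mean, norm_sd):
    """Completion time: slower than the norm gives a positive (worse) Z."""
    return (seconds - norm_mean) / norm_sd


def purdue_z(pegs_in_30s, norm_mean, norm_sd):
    """Pegs placed in 30 s: fewer than the norm gives a negative (worse) Z."""
    return (pegs_in_30s - norm_mean) / norm_sd


# Placeholder norms for illustration only.
print(nine_hole_z(30, norm_mean=18, norm_sd=3))  # 4.0, i.e. much slower than typical
print(purdue_z(9, norm_mean=14, norm_sd=2))      # -2.5, i.e. fewer pegs than typical
```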
Speech.
The direct assessment of speech was the CELF-4 Formulated Sentences subscale.18 Although this language task does not measure dysarthria, it requires speech output: the person being tested must produce statements that are understandable to the examiner. Formulated Sentences is normed for ages 5 – 21 years and was not administered to children younger than 5 years, regardless of speech ability. It was, however, administered to participants older than the normative range, which required an alternative scoring procedure. Z-scores were derived using the means and standard deviations provided in the CELF-4 manual for the participant's age band. The CELF-4 provides no normative data beyond 21 years 11 months, but research indicates that adults do not differ from adolescents on structured oral language tasks.19 Therefore, for participants older than 21 years 11 months, the mean and standard deviation for the oldest age band (ending at 21 years 11 months) were used. Higher Z-scores indicate better relative performance. The CELF-4 Formulated Sentences subscale was administered only at Rush.
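The age-band lookup with its fallback to the oldest band can be sketched as follows. The `celf_z` name and the band values in the example are hypothetical placeholders, not CELF-4 norms; the sketch assumes bands are given as (maximum age in months, mean, SD) tuples sorted by age.

```python
MIN_AGE_MONTHS = 60  # Formulated Sentences was not given before age 5


def celf_z(raw_score, age_months, bands):
    """Z-score a Formulated Sentences score against age-band norms.

    bands: list of (max_age_months, mean, sd) tuples sorted by max_age_months.
    Ages past the last band (adults older than 21y11m) fall back to the
    oldest band, mirroring the rule described in the text.
    """
    if age_months < MIN_AGE_MONTHS:
        raise ValueError("Formulated Sentences not administered under age 5")
    chosen = bands[-1]  # default: oldest band
    for band in bands:
        if age_months <= band[0]:
            chosen = band
            break
    _, mean, sd = chosen
    return (raw_score - mean) / sd


# Placeholder norms for illustration only; these are NOT CELF-4 values.
example_bands = [(143, 20.0, 5.0), (263, 30.0, 6.0)]
print(celf_z(10, 100, example_bands))  # -2.0
print(celf_z(42, 300, example_bands))  # 2.0 (a 25-year-old scored on the 21y11m band)
```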
Statistical analysis
The goal of this study was to contribute to the psychometric case for the 5DNPCCSS by evaluating its convergent validity with direct assessments of the same constructs, both in terms of status and change in status. Given that the 5DNPCCSS reflects severity relative to typical expectations, we operationalized performance on the direct assessments using normative scores. We specified a multilevel model of the direct assessment as a function of 5DNPCCSS score. To facilitate the evaluation of the relationship between change in the instruments, we separated the between-person and within-person effects of 5DNPCCSS domain on the direct assessment. Following procedures described elsewhere,20 we calculated the average 5DNPCCSS domain-level score for each person (between-person effect) and the difference from that average at each visit (within-person effect). The between-person effect is interpreted as the average difference in direct assessment score between individuals with a one-unit difference in severity score. The within-person effect is interpreted as the expected change in the direct assessment score given a one-unit change in severity score for a given person. These two terms were allowed to interact, testing the hypothesis that a one-unit change in 5DNPCCSS domain score has the same relationship with the direct assessment for all 5DNPCCSS domain severity levels (i.e., that change in the 5DNPCCSS has equal interval properties with respect to the direct assessment).
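The between/within split described above (person-mean centering) can be sketched in plain Python. The `decompose` helper and the toy visits are hypothetical; the resulting columns, together with site, would then be passed to a mixed-model routine (for example, statsmodels' MixedLM) with a random intercept and a random slope for the within-person term. That fitting step is not shown here.

```python
from collections import defaultdict


def decompose(records):
    """Split each visit's domain score into between- and within-person parts.

    records: list of (subject_id, score) tuples, one per visit.
    Returns (subject_id, score, between, within, between * within) per visit,
    where `between` is the person's mean score and `within` is the visit's
    deviation from it; the product is the interaction term.
    """
    scores_by_subject = defaultdict(list)
    for subject_id, score in records:
        scores_by_subject[subject_id].append(score)
    person_means = {s: sum(v) / len(v) for s, v in scores_by_subject.items()}

    rows = []
    for subject_id, score in records:
        between = person_means[subject_id]
        within = score - between
        rows.append((subject_id, score, between, within, between * within))
    return rows


visits = [("A", 1), ("A", 3), ("B", 2)]
for row in decompose(visits):
    print(row)
# ('A', 1, 2.0, -1.0, -2.0)
# ('A', 3, 2.0, 1.0, 2.0)
# ('B', 2, 2.0, 0.0, 0.0)
```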
The assessment of change in this analysis focuses on within-person variability, or whether a 5DNPCCSS score that is higher/lower than the person's average was associated with a higher/lower direct assessment score. Importantly, this is agnostic to time; we did not model change over time (or age) in any assessment, because the focus was on the relationship between contemporaneous ratings on two measures rather than on change in an individual measure over time. Thus, the models did not contain a parameter for time-to-follow-up (i.e., time between repeated observations). For the same reason, and because both the severity scores and direct assessment scores were age-norm referenced to some extent, the models did not include age. We included site (NIH and RUMC) as a design covariate and used a random subject-level intercept and slope for the effect of within-person 5DNPCCSS domain change. All parameter estimates are reported with 95% confidence intervals alongside the test statistics with raw p-values. To account for multiplicity, however, we used a threshold of p < .005 for in-text discussion of statistical significance.
Results
A total of 121 participants were eligible for analysis; for each direct assessment, the analysis sets ranged in size from n=25 (CELF) to n=67 (Purdue) (Table 2). Within each domain, most participants had at least two assessments, and the median [IQR] number of repeated visits ranged from 2 [1, 3] for the MSEL to 7 [1, 13] for the Nine-hole. At their earliest assessment within each domain, the median [IQR] age of participants ranged from 2.29 [1.38, 3.27] years for the MSEL to 18.21 [6.33, 25.83] years for the Wechsler scales. Baseline 5DNPCCSS and direct assessment data are summarized by domain in Table 3. Moderate-to-strong Pearson correlations in the expected directions (absolute values ranging .53 – .79) were observed between all direct assessment scores and their corresponding 5DNPCCSS domain severity scores (see Figures 1 and 2). Initial multilevel modeling indicated high stability of direct assessment scores within person, with ICCs ranging from .70 for Nine-hole Dominant Z-score to .88 for Wechsler NVIQ.
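For readers less familiar with the ICC from a multilevel model, it is the share of total variance that lies between persons. A minimal sketch assuming a random-intercept decomposition; the `icc` name and the variance components in the example are hypothetical, not estimates from this study.

```python
def icc(between_var, within_var):
    """Random-intercept ICC: share of total variance that lies between persons."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("variances must sum to a positive value")
    return between_var / total


# Hypothetical variance components: 7 units between persons, 3 within.
print(icc(7.0, 3.0))  # 0.7, so 70% of the variability is between persons
```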
Table 2.
Participant characteristics by analysis subsample
Characteristic | Cognition: MSEL | Cognition: Wechsler | Speech: CELF | Fine Motor: 9-Hole | Fine Motor: Purdue | Total Sample |
---|---|---|---|---|---|---|
Sample Size, n(% of total sample) | 36 (30%) | 60 (50%) | 25 (21%) | 37 (31%) | 67 (55%) | 121 (100%) |
Site, n (% subsample) | ||||||
NIH | 22 (61%) | 31 (52%) | 0 | 0 | 50 (72%) | 72 (59%) |
Rush | 14 (39%) | 29 (48%) | 25 (100%) | 37 (100%) | 19 (28%) | 51 (41%) |
Total number of assessments | 84 | 172 | 96 | 247 | 270 | NA |
Individuals with repeated (≥2) assessments, n (% subsample) | 20 (56%) | 38 (63%) | 19 (76%) | 26 (70%) | 45 (67%) | NA |
Number of repeated assessments, median [IQR] | 2 [1, 3] | 2 [1, 4] | 4 [2, 4] | 7 [1, 13] | 2 [1, 6] | NA |
Age at first assessment in years, median [IQR] | 2.29 [1.38, 3.27] | 18.21 [6.33, 25.83] | 17.88 [7.42, 21.97] | 12.65 [7.16, 19.99] | 17.25 [10.92, 21.98] | NA |
Total duration of follow-up in years, median [IQR] | 1.6 [1.04, 2.51] | 3.08 [2, 4.54] | 3.14 [2.99, 5.07] | 3.73 [1.37, 5.1] | 2.92 [1.42, 3.67] | |
Sex, n(% subsample) | ||||||
Male | 23 (64%) | 32 (53%) | 15 (60%) | 21 (57%) | 38 (57%) | 69 (57%) |
Female | 13 (36%) | 28 (47%) | 10 (40%) | 16 (43%) | 29 (43%) | 52 (43%) |
Race, n(% subsample) | ||||||
Asian | 0 (0%) | 1 (2%) | 1 (4%) | 2 (5%) | 1 (1%) | 2 (2%) |
Black or African-American | 1 (3%) | 2 (3%) | 1 (4%) | 1 (3%) | 2 (3%) | 4 (3%) |
Multiple | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (1%) | 1 (1%) |
White | 35 (97%) | 54 (90%) | 22 (88%) | 31 (84%) | 60 (90%) | 110 (91%) |
Unknown | 0 (0%) | 3 (5%) | 1 (4%) | 3 (8%) | 3 (4%) | 4 (3%) |
Note: IQR = interquartile range. This table reflects only the visits included in the current analysis, which were determined by the availability of at least one direct assessment and a 5DNPCCSS rating. For direct assessments where more than one subscale was used, the sample size reflects the number of participants with any of the subscales. The total sample includes all participants included in at least one domain of analysis.
Table 3.
Baseline descriptive statistics for 5DNPCCSS and direct assessments
Direct Assessment Domain / Score | n | 5DNPCCSS Domain Score, Mean (SD) | Direct Assessment Score, Mean (SD) | Pearson r (5DNPCCSS Domain vs. Direct Assessment) |
---|---|---|---|---|
Cognition | | | | |
MSEL NVDQ | 33 | 0.61 (0.93) | 89.89 (35.03) | −0.70 |
MSEL VDQ | 34 | 0.53 (0.9) | 79.44 (26.5) | −0.76 |
Wechsler NVIQ | 49 | 1.47 (0.89) | 73.45 (15.32) | −0.61 |
Wechsler VIQ | 60 | 1.4 (0.87) | 83.93 (18.49) | −0.70 |
Speech | ||||
CELF-4 Formulated Sentences Z-score | 25 | 1.08 (0.57) | −2.31 (2.62) | −0.79 |
Fine Motor | ||||
Nine-hole Dominant Z-score | 37 | 1.62 (0.89) | 20.57 (26.11) | 0.53 |
Nine-hole Nondominant Z-score | 34 | 1.59 (0.89) | 22.11 (22.46) | 0.63 |
Purdue Dominant Z-score | 59 | 1.34 (0.86) | −5.03 (2.53) | −0.65 |
Purdue Nondominant Z-score | 60 | 1.37 (0.88) | −5.59 (2.7) | −0.61 |
Note: MSEL = Mullen Scales of Early Learning. NVDQ/IQ = nonverbal developmental/intelligence quotient. VDQ/IQ = verbal developmental/intelligence quotient. CELF = Clinical Evaluation of Language Fundamentals. Baseline refers to the earliest available assessment for a given participant. The 5DNPCCSS is captured as an integer but was treated as a continuous variable in the current analysis plan. For the Nine-hole, higher scores reflect worse performance relative to normative data. For all other direct assessments, higher scores reflect better performance relative to normative data.
Figure 1. Relationship between Wechsler IQ and 5DNPCCSS Cognition Rating at baseline (n=60).
Note: IQ = intelligence quotient. Using the first available assessment from each person, their nonverbal IQ (left) and verbal IQ (right) were plotted against their transformed 5DNPCCSS Cognition score (see Table 1). The color of the point indicates the test used (WAIS = Wechsler Adult Intelligence Scale; WISC = Wechsler Intelligence Scale for Children; WPPSI = Wechsler Preschool and Primary Scale of Intelligence). The population mean for IQ is 100 and the standard deviation is 15. The dotted lines at 100 and 70 show the population mean and two standard deviations below the mean, respectively. A small amount of horizontal jitter was added to the data to reduce overplotting.
Figure 2. Relationship between Mullen DQ and 5DNPCCSS Cognition Rating at baseline (n=36).
Note: DQ = developmental quotient (100 × the ratio of age equivalent to chronological age). The nonverbal (left) and verbal (right) DQs from the Mullen for each person were plotted against their transformed 5DNPCCSS Cognition score (see Table 1). DQs have no population distribution but are often interpreted in the same way as intelligence quotients (IQ). The population distribution of IQ has a mean of 100 and a standard deviation of 15, and the dotted lines in the plot correspond to the average IQ and two standard deviations below average. A small amount of horizontal jitter was added to the data to reduce overplotting. All Mullens included in the analysis were administered within the test's age range.
Each additional point of difference in 5DNPCCSS Cognition was significantly associated with a difference of −11.77 [−15.48, −8.05] in Wechsler NVIQ and −15.26 [−18.92, −11.59] in Wechsler VIQ. There was no significant main effect of change in 5DNPCCSS Cognition on change in either NVIQ or VIQ. For VIQ, there was a trend (p = .008) for this relationship to depend on 5DNPCCSS severity level, such that the expected change in VIQ for each unit change in 5DNPCCSS Cognition was larger and more negative for individuals with less severe starting 5DNPCCSS values. Post-hoc contrasts indicated that this was due to significant relationships between change in 5DNPCCSS Cognition and VIQ for individuals with starting 5DNPCCSS values of 0 (−17.75 [−27.52, −7.99]; t(8) = −4.16, p = .003) and 1 (−9.18 [−14.26, −4.11]; t(7) = −4.26, p = .004) but not for starting values of 2 (p = .78) or 3 (p = .09).
For the MSEL, a one-unit difference in 5DNPCCSS Cognition was significantly associated with a difference in VDQ (−26.89 [−33.53, −20.25]) and NVDQ (−25.94 [−34.31, −17.57]). Within-person change in 5DNPCCSS was not associated with change in MSEL NVDQ, but it was significantly associated with change in MSEL VDQ, such that a one-unit increase in 5DNPCCSS Cognition predicted a 20.36 [7.87, 32.85] decline in VDQ. There was a trend (p = .07), which did not reach statistical significance, for this association to depend on 5DNPCCSS Cognition severity, such that the reduction in VDQ associated with a one-point increase in 5DNPCCSS was smaller for individuals with more severe ratings.
5DNPCCSS Speech severity was significantly and inversely associated with Z-scores on the CELF (parameter estimate: −3.12 [−4.55, −1.68]), but changes in the two scores were unrelated.
The 5DNPCCSS Fine Motor severity score was significantly related to the Purdue Z-scores (Dominant: −1.76 [−2.23, −1.3], Non-dominant: −1.69 [−2.21, −1.18]) and Nine-hole Z-scores (Dominant: 22.11 [11.43, 32.78], Non-dominant: 25.23 [12.8, 37.65]) in the expected directions. However, change in 5DNPCCSS Fine Motor was not associated with change in any Purdue or Nine-hole score.
Discussion
The 5DNPCCSS is a widely used clinical assessment of NPC symptom severity. The domains captured by the 5DNPCCSS are considered meaningful by both clinicians and patients/caregivers,8,9 and good inter- and intra-rater reliability has been documented.4,10 The current study adds to this and other work supporting the validity of the Fine Motor,9 Ambulation,9 and Swallow11 domains by demonstrating the convergent validity of the Cognition, Speech, and Fine Motor domain severity scores with widely used direct assessments of those concepts. In all cases, a 1-point difference in a 5DNPCCSS domain score was associated with a difference in the direct assessment of approximately 1 SD of the sample distribution of that score.
A 1-point difference in 5DNPCCSS severity was associated with approximately 12- and 15-point differences in Wechsler nonverbal and verbal IQ scores, respectively. The population standard deviation of an IQ score is 15 points, and the bands of intellectual disability severity are typically defined using this standard deviation, so these differences are clinically significant. Similarly, a 1-point difference in 5DNPCCSS severity was associated with 26- and 27-point differences in MSEL nonverbal and verbal DQs, respectively. In practice, the DQ is interpreted along the same lines as the IQ, so this represents a very large effect. The difference in magnitude between the Wechsler and MSEL effects is likely a function of the scale of the scores; the Wechsler IQs are bounded by the test floor in a way that the MSEL DQ is not. For both direct assessments of cognition, change in the 5DNPCCSS was not associated with change in the nonverbal index. However, within-person change in 5DNPCCSS Cognition was to some degree associated with change in both verbal IQ and verbal DQ, such that less severely affected participants whose 5DNPCCSS Cognition scores worsened showed more worsening in the direct assessment scores than more severely affected participants. In other words, among more severely affected participants, the 5DNPCCSS Cognition score changed more readily relative to the direct assessment. We interpret these results with caution because relatively few participants experienced change in 5DNPCCSS Cognition scores, but a likely explanation is that the range of cognitive function covered by the mildest Cognition rating is much wider than that covered by the other ratings: an individual with an initially average IQ could lose up to 30 points before dropping into the below-average range (e.g., 115 to 85).
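The clinical-significance argument rests on simple arithmetic: dividing each between-person estimate by the IQ population SD of 15. A minimal sketch using the Wechsler estimates reported in the Results; the `in_sd_units` helper is hypothetical.

```python
IQ_POPULATION_SD = 15.0  # Wechsler standard scores: mean 100, SD 15


def in_sd_units(estimate, sd=IQ_POPULATION_SD):
    """Express an absolute score difference as a multiple of an SD."""
    return abs(estimate) / sd


# Between-person estimates per 1-point 5DNPCCSS Cognition difference (Results).
for label, b in {"NVIQ": -11.77, "VIQ": -15.26}.items():
    print(label, round(in_sd_units(b), 2))
# NVIQ 0.78
# VIQ 1.02
```

Both effects are close to one population SD, on the order of a full intellectual-disability severity band.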
We hypothesize that one reason within-person change in verbal IQ and DQ was more closely associated with change in 5DNPCCSS Cognition is that declines in nonverbal IQ and DQ may have been more related to changes in the fine motor skills required by the nonverbal tasks than to changes in cognition itself. The participants in both the Wechsler and MSEL samples spanned the full range of 5DNPCCSS Cognition severity scores. However, the analysis was limited to individuals with sufficient ability to take the cognitive test appropriate to their chronological age. Therefore, these results do not necessarily apply to more severely affected individuals with NPC.
An additional methodological point to consider when interpreting these results is that the model treated the between-person effect of 5DNPCCSS score as interval, meaning that the model was constrained to identify a constant mean difference in direct assessment score between 5DNPCCSS levels. This was chosen for consistency with existing research, which overwhelmingly treats the 5DNPCCSS as a continuous variable, and to reduce the number of parameters estimated by the already complex model. However, the within-person effect was allowed to vary by severity level, yielding the findings described above. That the within-person effect was not interval (i.e., the magnitude of the effect of a one-unit change in 5DNPCCSS was smaller for more severe ratings) suggests that the between-person effect may also not be interval.
Among individuals with a 5DNPCCSS Speech severity rating below 3, ratings were significantly and inversely related to CELF Formulated Sentences Z-scores, such that each point of 5DNPCCSS difference was associated with a difference of about 3 points in CELF Z-score. Given that the CELF Z-scores in this sample had a standard deviation of about 2.5 points, this is a large effect. We also corroborated existing data9 by demonstrating the convergent validity of the 5DNPCCSS Fine Motor severity score (ratings below 5) with Z-scores for both the Nine-hole and Purdue tests. Within each test, the relationships were of similar magnitude for the dominant and nondominant hands and were approximately equivalent to 1 SD of the sample distribution of direct assessment scores. This result expands on existing evidence9 for the convergent validity of the 5DNPCCSS by using Z-scores in lieu of raw scores, thereby accounting for age-related effects on performance.
While these results support the validity of the Cognition, Speech, and Fine Motor domains, their generalizability is limited because no single direct assessment of these domains is suitable for the whole range of phenotypic expression in NPC. Indeed, given the broad range of symptoms and high phenotypic variability in NPC, multiple domain ratings such as those in the 5DNPCCSS are required to fully capture disease severity. Further, separate evaluation of each domain accommodates the fact that, while progression at the domain level does correlate with overall progression,10 rates of worsening can differ substantially among domains. The sample composition must also be considered when interpreting the current results. For Speech and Fine Motor, individuals with the most severe 5DNPCCSS scores were excluded from analysis because they did not have sufficient ability to complete the direct assessment. While the full range of 5DNPCCSS Cognition severity scores was represented in the analysis, more severely affected individuals were excluded by virtue of not having within-age-range cognitive assessments. For these reasons, these validity data do not directly apply to patients with the most severe presentation. This is a well-recognized problem in conditions affecting neurodevelopment, especially those associated with significant levels of disability.21 Standardized caregiver-informed assessments such as the Vineland Adaptive Behavior Scales22 are not direct performance-based measures, and thus may be considered less objective, but they cover the full range of ability. Such instruments may therefore provide a more robust assessment of the convergent validity of the 5DNPCCSS in future work, helping to ensure that the results apply to the full range of phenotypic presentation in NPC.
Relationships between change in the 5DNPCCSS severity score and change in the direct assessment were observed only for the Cognition domain, and then only with the direct assessment of verbal ability. Thus, these results do not broadly support the responsivity, or sensitivity to change, of the 5DNPCCSS. However, this analysis did not represent a very robust test. The ICCs for all models were very high, indicating that most variability was between-person rather than within-person. Thus, although the data were longitudinal, there was little change in direct assessments (or in 5DNPCCSS scores) to explain. This may reflect an insufficiently long follow-up period, which averaged about 3 years in this study; except in the earliest-onset patients, NPC symptoms typically progress over many years. More work is needed to properly evaluate the responsivity of the 5DNPCCSS, and this will likely require an effective intervention and/or longer follow-up. Finally, the assessments used here as indicators of convergent validity were also available to the clinicians making the 5DNPCCSS ratings, although this information was regularly used only for the Cognition domain. This was a necessary limitation, as good clinical judgment must make use of all available information, and direct assessment is an important constituent of that information for the Cognition domain. The fact that the correlations between the 5DNPCCSS and the direct assessments were not perfect indicates that clinicians used a range of other information to make their final judgments.
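To make the ICC argument concrete, the sketch below computes a one-way random-effects ICC(1) from repeated scores per person. It is a swapped-in textbook ANOVA estimator for balanced data, not the multilevel-model ICC reported here, and the visit scores in the example are invented. When people differ far more from one another than from their own earlier visits, the ICC approaches 1, leaving little within-person change for a change in 5DNPCCSS score to explain.

```python
from statistics import mean


def icc1(groups: list[list[float]]) -> float:
    """One-way random-effects ICC(1) for balanced repeated measures.

    groups: one list of scores per person, every list of the same length k.
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 people, each with the same k >= 2 visits")
    grand = mean(x for g in groups for x in g)
    person_means = [mean(g) for g in groups]
    # Between-person mean square: spread of person means around the grand mean.
    msb = k * sum((m - grand) ** 2 for m in person_means) / (n - 1)
    # Within-person mean square: spread of visits around each person's own mean.
    msw = sum((x - m) ** 2 for g, m in zip(groups, person_means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


if __name__ == "__main__":
    # Invented scores: three people, two visits each, stable within person.
    visits = [[80, 82], [60, 58], [100, 101]]
    print(round(icc1(visits), 3))  # 0.997
```

Note that this ANOVA estimator can go negative when within-person variation exceeds between-person variation, which a variance-component ICC from a fitted multilevel model cannot.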
In summary, the results of this study add to the growing body of psychometric support4,9,10 for the use of the 5DNPCCSS as a valid measure of severity in people with NPC. However, more evidence is needed to support its use as an index of change, and future psychometric research should focus on longer-term data sets and settings where meaningful change is likely to occur, such as successful treatment trials.
Table 4.
Multilevel model results
| 5DNPCCSS Domain | Direct Assessment Subdomain | Parameter | Intercept | Between-person effect of 5DNPCCSS | Within-person effect of change in 5DNPCCSS | Between × Within |
|---|---|---|---|---|---|---|
| Cognition | Wechsler VIQ | slope | 76.7 [72.02, 81.37] | −15.26 [−18.92, −11.59] | −0.61 [−4.88, 3.66] | 8.57 [3.44, 13.7] |
| | | test | t(61)=32.16, p < .001 | t(59)=−8.16, p < .001 | t(11)=−0.28, p=0.78 | t(10)=3.28, p=0.008 |
| | Wechsler NVIQ | slope | 69.99 [65.23, 74.75] | −11.77 [−15.48, −8.05] | −2.67 [−6.72, 1.37] | 2.38 [−4.9, 9.65] |
| | | test | t(54)=28.83, p < .001 | t(53)=−6.2, p < .001 | t(11)=−1.29, p=0.22 | t(9)=0.64, p=0.54 |
| | MSEL VDQ | slope | 89.8 [82.03, 97.56] | −26.89 [−33.53, −20.25] | −20.36 [−32.85, −7.87] | 8.59 [−0.17, 17.34] |
| | | test | t(37)=22.66, p < .001 | t(35)=−7.94, p < .001 | t(20)=−3.2, p=0.005 | t(19)=1.92, p=0.07 |
| | MSEL NVDQ | slope | 101.13 [91.86, 110.39] | −25.94 [−34.31, −17.57] | −17.55 [−45.55, 10.45] | −1.81 [−23.09, 19.48] |
| | | test | t(32)=21.39, p < .001 | t(30)=−6.07, p < .001 | t(5)=−1.23, p=0.27 | t(4)=−0.17, p=0.88 |
| Speech | CELF Formulated Sentences Z-score | slope | −2.49 [−3.33, −1.66] | −3.12 [−4.55, −1.68] | −0.72 [−2.09, 0.65] | 0.15 [−2.12, 2.43] |
| | | test | t(24)=−5.85, p < .001 | t(24)=−4.25, p < .001 | t(4)=−1.03, p=0.36 | t(4)=0.13, p=0.9 |
| Fine Motor | Purdue Dominant Hand Z-score | slope | −3.61 [−4.27, −2.96] | −1.76 [−2.23, −1.3] | −0.1 [−0.65, 0.46] | −0.22 [−0.81, 0.36] |
| | | test | t(81)=−10.79, p < .001 | t(60)=−7.42, p < .001 | t(10)=−0.34, p=0.74 | t(8)=−0.74, p=0.48 |
| | Purdue Non-dominant Hand Z-score | slope | −4.25 [−4.98, −3.52] | −1.69 [−2.21, −1.18] | −0.44 [−0.98, 0.1] | 0.42 [−0.17, 1] |
| | | test | t(81)=−11.36, p < .001 | t(58)=−6.43, p < .001 | t(14)=−1.61, p=0.13 | t(14)=1.4, p=0.18 |
| | Nine-hole Dominant Hand Z-score | slope | 33.96 [23.37, 44.56] | 22.11 [11.43, 32.78] | 1.31 [−6.73, 9.35] | 1.32 [−7.64, 10.29] |
| | | test | t(35)=6.28, p < .001 | t(36)=4.06, p < .001 | t(218)=0.32, p=0.75 | t(218)=0.29, p=0.77 |
| | Nine-hole Non-dominant Hand Z-score | slope | 38.96 [26.35, 51.57] | 25.23 [12.8, 37.65] | 4.89 [−3.9, 13.69] | 5.26 [−4.48, 15.01] |
| | | test | t(34)=6.06, p < .001 | t(36)=3.98, p < .001 | t(216)=1.09, p=0.28 | t(216)=1.06, p=0.29 |
Note: MSEL = Mullen Scales of Early Learning. NVDQ/IQ = nonverbal developmental/intelligence quotient. VDQ/IQ = verbal developmental/intelligence quotient. CELF = Clinical Evaluation of Language Fundamentals. The between-subject effect of 5DNPCCSS domain was centered at the median value (Wechsler: 2; MSEL: 0; CELF: 1; Purdue: 1; Nine-hole: 2). Thus, the between-within interaction is interpreted as the additional expected change in outcome as a function of change in 5DNPCCSS domain score for a one-unit increase above the median between-subject 5DNPCCSS value. For the Nine-hole, higher scores reflect worse performance relative to normative data. For all other direct assessments, higher scores reflect better performance relative to normative data.
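As a worked reading of the table note, the sketch below plugs the Cognition / Wechsler VIQ point estimates from Table 4 into the fixed-effects part of the model. It assumes the model takes the form intercept + between × (person-level score − median) + within × change + interaction × (person-level score − median) × change. This section does not spell out exactly how the person-level score or the within-person change is defined, so treat both as assumptions; random effects and confidence intervals are ignored, and the function names are hypothetical.

```python
# Cognition domain -> Wechsler VIQ point estimates from Table 4.
INTERCEPT = 76.7
B_BETWEEN = -15.26
B_WITHIN = -0.61
B_INTERACTION = 8.57
MEDIAN_SCORE = 2  # Wechsler centering value given in the table note


def within_slope(person_score: float) -> float:
    """Expected VIQ change per 1-point within-person change in 5DNPCCSS
    Cognition, for a person whose person-level score is `person_score`."""
    return B_WITHIN + B_INTERACTION * (person_score - MEDIAN_SCORE)


def predicted_viq(person_score: float, within_change: float) -> float:
    """Fixed-effects prediction only; random effects are ignored."""
    centered = person_score - MEDIAN_SCORE
    return (INTERCEPT
            + B_BETWEEN * centered
            + B_WITHIN * within_change
            + B_INTERACTION * centered * within_change)


if __name__ == "__main__":
    print(round(within_slope(2), 2))      # -0.61 (person at the median)
    print(round(within_slope(3), 2))      # 7.96 (one point above the median)
    print(round(predicted_viq(3, 0), 2))  # 61.44
```

The second printed value restates the note's interpretation: the interaction estimate (8.57) is the additional expected change per unit of within-person change for someone one point above the median, added to the within-person slope at the median (−0.61).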
Acknowledgement.
This work was possible due to the invaluable contribution of the individuals with NPC1 and their families who have participated in clinical research. Portions of this project were presented at the Ara Parseghian Medical Research Fund and National Niemann-Pick Disease Foundation conferences. The NICHD Clinical Trials Database was the data capture system for this project. We are grateful for the assistance of Patricia Pullen and the NICHD Clinical Trials Database team, which was essential to this research. We appreciate the clerical assistance of Seth Cutler in preparing this report.
Funding statement.
This work was supported by the Hope for Hayley, Samantha’s Search for the Cure, and Chase the Cure funds (EBK). This work was also supported by the Ara Parseghian Medical Research Foundation (NYF, FDP). This work was also supported in part by the Intramural Research Programs of the NIMH (ZIC-MH002961) and NICHD (ZIA HD008989).
Declaration of Conflicting Interests.
Drs. Farmer, Joseph, Giserman-Kiss, and Thurm declare no conflicts of interest. Dr. Berry-Kravis has received funding from Acadia, Alcobra, AMO, Asuragen, Avexis, Biogen, BioMarin, Cydan, Engrail, Erydel, Fulcrum, GeneTx, GW, Healx, Ionis, Jaguar, Kisbee, Lumos, Marinus, Mazhi, Moment Biosciences, Neuren, Neurogene, Neurotrope, Novartis, Orphazyme/Kempharm/Zevra, Ovid, PTC Therapeutics, Retrophin, Roche, Seaside Therapeutics, Taysha, Tetra, Ultragenyx, Yamo, Zynerba, and Vtesse/Sucampo/Mallinckrodt Pharmaceuticals, to consult on trial design or run clinical or lab validation trials in genetic neurodevelopmental or neurodegenerative disorders, all of which is directed to RUMC in support of rare disease programs; Dr. Berry-Kravis receives no personal funds and RUMC has no relevant financial interest in any of the commercial entities listed. Dr. Porter’s research group has received support as part of a Cooperative Research and Development Agreement between NICHD, NIH and Vtesse/Sucampo/Mallinckrodt/Mandos to facilitate the development of 2-hydroxypropyl-β-cyclodextrin for the treatment of individuals with NPC.
Footnotes
Ethical considerations. Participants were enrolled in at least one of the following studies: NCT00344331, NCT01747135, NCT02534844, or an expanded access program (IND119856). Each of these studies was approved by the Institutional Review Board (IRB) of the National Institutes of Health (06-CH-0186, 13-CH-0001, 16-CH-0016) or the Rush University IRB (ORAs 13092305, 15081309).
Consent to participate. Written consent, and verbal assent when appropriate, was obtained for all participants.
Data Availability Statement.
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
1. Vanier MT. Niemann-Pick disease type C. Orphanet Journal of Rare Diseases. 2010;5:1–18.
2. Runz H, Dolle D, Schlitter AM, Zschocke J. NPC-db, a Niemann-Pick type C disease gene variation database. Human Mutation. 2008;29(3):345–350.
3. Wassif CA, Cross JL, Iben J, et al. High incidence of unrecognized visceral/neurological late-onset Niemann-Pick disease, type C1, predicted by analysis of massively parallel sequencing data sets. Genetics in Medicine. 2016;18(1):41–48.
4. Mengel E, Bembi B, del Toro M, et al. Clinical disease progression and biomarkers in Niemann–Pick disease type C: a prospective cohort study. Orphanet Journal of Rare Diseases. 2020;15(1):328. doi: 10.1186/s13023-020-01616-0
5. Imrie J, Dasgupta S, Besley G, et al. The natural history of Niemann–Pick disease type C in the UK. Journal of Inherited Metabolic Disease. 2007;30:51–59.
6. Yanjanin NM, Vélez JI, Gropman A, et al. Linear clinical progression, independent of age of onset, in Niemann–Pick disease, type C. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics. 2010;153B(1):132–140. doi: 10.1002/ajmg.b.30969
7. Evans W, Patterson M, Platt F, et al. International consensus on clinical severity scale use in evaluating Niemann–Pick disease Type C in paediatric and adult patients: results from a Delphi Study. Orphanet Journal of Rare Diseases. 2021;16(1):482. doi: 10.1186/s13023-021-02115-6
8. Cortina-Borja M, te Vruchte D, Mengel E, et al. Annual severity increment score as a tool for stratifying patients with Niemann-Pick disease type C and for recruitment to clinical trials. Orphanet Journal of Rare Diseases. 2018;13(1):143. doi: 10.1186/s13023-018-0880-9
9. Patterson MC, Lloyd-Price L, Guldberg C, et al. Validation of the 5-domain Niemann-Pick type C Clinical Severity Scale. Orphanet Journal of Rare Diseases. 2021;16(1):79. doi: 10.1186/s13023-021-01719-2
10. Farhat N, Bailey L, Friedmann K, et al. Consistently High Agreement Between Independent Raters of Niemann-Pick Type C1 Clinical Severity Scale in Phase 2/3 Trial. Pediatric Neurology. 2022;127:32–38. doi: 10.1016/j.pediatrneurol.2021.11.009
11. Solomon BI, Muñoz AM, Sinaii N, et al. Phenotypic expression of swallowing function in Niemann–Pick disease type C1. Orphanet Journal of Rare Diseases. 2022;17(1):342. doi: 10.1186/s13023-022-02472-w
12. Bishop SL, Farmer C, Thurm A. Measurement of nonverbal IQ in autism spectrum disorder: Scores in young adulthood compared to early childhood. Journal of Autism and Developmental Disorders. 2015;45:966–974.
13. Smith YA, Hong E, Presson C. Normative and validation studies of the Nine-hole Peg Test with children. Perceptual and Motor Skills. 2000;90(3):823–843.
14. Wang Y-C, Bohannon RW, Kapellusch J, Garg A, Gershon RC. Dexterity as measured with the 9-Hole Peg Test (9-HPT) across the age span. Journal of Hand Therapy. 2015;28(1):53–60.
15. Tiffin J, Asher EJ. The Purdue Pegboard: norms and studies of reliability and validity. Journal of Applied Psychology. 1948;32(3):234.
16. Wang Y-C, Magasi SR, Bohannon RW, et al. Assessing dexterity function: a comparison of two alternatives for the NIH Toolbox. Journal of Hand Therapy. 2011;24(4):313–321.
17. Lafayette Instruments. Purdue Pegboard Users Manual. 2015.
18. Semel E, Wiig E, Secord W. Clinical Evaluation of Language Fundamentals. 4th ed. Toronto, Canada: The Psychological Corporation/A brand of Harcourt Assessment, Inc.; 2003.
19. Nippold MA, Frantz-Kaspar MW, Vigeland LM. Spoken language production in young adults: Examining syntactic complexity. Journal of Speech, Language, and Hearing Research. 2017;60(5):1339–1347.
20. Curran PJ, Bauer DJ. The disaggregation of within-person and between-person effects in longitudinal models of change. Annual Review of Psychology. 2011;62:583–619.
21. Farmer C AJK, Berry-Kravis E, Thurm A. Psychometric perspectives on developmental outcome and endpoint selection in treatment trials for genetic conditions associated with neurodevelopmental disorder. In: Esbensen A, Schworer E, eds. International Review of Research in Developmental Disabilities. Elsevier Inc. Academic Press; 2022:1–39.
22. Sparrow SS, Cicchetti DV, Saulnier CA. Vineland-3: Vineland Adaptive Behavior Scales. PsychCorp; 2016.