Abstract
Background
Progressive supranuclear palsy (PSP) is a late‐onset neurodegenerative disease whose severity is challenging to assess. The Progressive Supranuclear Palsy Rating Scale (PSPRS), a 28‐item clinician‐reported scale, is the most established clinical outcome assessment method. Recently, the U.S. Food and Drug Administration (FDA) proposed a subscale of 10 items as an alternative to the full PSPRS.
Objectives
To quantitatively evaluate and compare the properties of full PSPRS and the FDA subscale using item response theory. To develop a progression model of the disease and assess relative merits of study designs and analysis options.
Methods
Data of 979 patients from four interventional trials and two registries were available for analysis. Our investigation was divided into: (1) estimating informativeness of the 28 items; (2) estimating disease progression; and (3) comparing the scales, trial designs, and analysis options with respect to power to detect a clinically relevant treatment effect.
Results
PSPRS item scores had a low pairwise correlation (r = 0.17 ± 0.14) and the items irritability, sleep difficulty, and postural tremor were uncorrelated with the other items. The FDA‐selected items displayed higher correlation (r = 0.35 ± 0.14) and were the basis for a longitudinal item response model including disease progression. Trial simulations indicated that identification of a disease‐modifying treatment effect required less than half the study size if the analysis was based on longitudinal item information compared with total scores at end‐of‐treatment.
Conclusion
A longitudinal item response model based on the FDA‐selected PSPRS items is a promising tool in evaluating treatments for PSP. © 2024 The Author(s). Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
Keywords: progressive supranuclear palsy, progressive supranuclear palsy rating scale, item response theory
Progressive supranuclear palsy (PSP) is a rare neurodegenerative disease characterized by progressive motor symptoms, including difficulties with gait, posture, and eye movements, alongside cognitive and behavioral manifestations such as non‐fluent aphasia, apathy, and impulsivity. The clinical criteria proposed by the National Institute of Neurological Disorders and Stroke and the Society for Progressive Supranuclear Palsy and by Neuroprotection and Natural History in Parkinson Plus Syndromes have been considered the gold standard for diagnosis since 1996. 1 They have since been revised by the Movement Disorder Society (MDS)‐PSP study group to improve sensitivity towards less common variants of PSP, resulting in the MDS‐PSP criteria. 2 While clear advances have been made in diagnosis, assessing PSP severity remains controversial. The clinician‐reported Progressive Supranuclear Palsy Rating Scale (PSPRS) is a widely used clinical assessment tool for this purpose. 3 It is a 28‐item scale that evaluates disability in several domains, including gait, ocular, and mental functions. PSPRS has proven useful for clinical evaluation of the disease, mainly because of its ability to capture patient deterioration over time and the correlation of its total score with survival. 3 Recently, the U.S. Food and Drug Administration (FDA) proposed a subscale of 10 of the 28 PSPRS items. Those 10 items test history, bulbar, and gait performance. Besides selecting items, the FDA proposed rescoring 9 of those 10 items by merging some response categories, which potentially reduces the information content of the selected items. To our knowledge, the rationale behind the selection and the rescoring recommendations has not been publicly communicated.
Therefore, one of the main objectives of this work was to quantitatively compare the suitability and informativeness of the scales: PSPRS, the FDA‐recommended subscale before rescoring (PSPRS‐10), and the FDA‐recommended subscale after rescoring (rPSPRS‐10) as tools for the assessment of the severity of PSP.
Item response theory (IRT) has evolved as a powerful framework to improve the design and analysis of tests composed of individual items, while also enhancing the understanding of how these items relate to the underlying latent (unobserved) variable being measured. 4 IRT uses mathematical models to describe the probability of each categorical response per item given a latent variable representing a person's disability (or ability, depending on the test at hand). For the purpose of comparing the three previously mentioned variants of the PSPRS, IRT offers a quantitative setup to analyze the discrimination ability and informativeness of each item of those scales in the assessment of the underlying disability.
Besides assessing the properties of the scales, IRT can, through its latent variable, be used to build a longitudinal model to describe the progression of the disease severity. While there is substantial literature on the survival of PSP patients and their mean rate of deterioration, 5 , 6 , 7 , 8 to our knowledge PSP progression has not been extensively investigated at the patient level. Therefore, an objective of this work was to build a PSP progression model to predict deterioration at the individual level based on the estimated typical progression and individual covariates.
Finally, an important aspect of the evaluation of PSPRS variants is the power to detect a statistically significant treatment effect, especially since there is currently no known treatment for this debilitating disease. 9 A more powerful scale can be favored as a primary outcome assessment tool to measure treatment efficacy in upcoming clinical trials. Power analysis has been performed previously, testing different PSPRS variants and effect sizes. 10 In this article we also investigate the power to identify an existing treatment effect associated with the three previously mentioned scales (ie, PSPRS‐28, PSPRS‐10, rPSPRS‐10) under several scenarios. We conclude by discussing the overall utility and suitability of the three scales.
Methods
Data
Data (longitudinal PSPRS item‐level data and patient characteristics) from four interventional trials and two registries were pooled for the analysis: (1) the ARISE trial was a randomized, three‐armed, parallel group trial of 2000 and 4000 mg tilavonemab controlled against placebo, 11 where all arms were included in the analysis; (2) the PASSPORT study was a phase II, randomized, placebo‐controlled trial of gosuranemab, where only placebo arm data were included in this analysis 12 ; (3) TAUROS, a phase II trial of the GSK‐3 inhibitor tideglusib controlled against placebo, where both active and placebo arms were included 13 ; (4) PROSPERA, a randomized, placebo‐controlled trial evaluating rasagiline, where all arms were included in the analysis 14 ; and (5) DescribePSP and (6) ProPSP, both of which are registries of the German multicenter networks for standardized prospective collection of clinical data, imaging data, and biomaterials of patients with PSP. 15
Software
Analyses were performed using NONMEM 7.5 (ICON plc, Dublin, Ireland) and the open source Piraid R package (Pharmacometrics research group, Uppsala University, Uppsala, Sweden). 16 The Perl‐coded program PsN 17 was used to generate simulation‐based diagnostics, run covariate identification algorithms, and perform power simulations.
IRT Modeling
Model development was conducted in three sequential steps: (1) development of the IRT model, (2) development of the disease progression model, and (3) stepwise covariate model building as implemented in PsN. 18
A graded response model was used to describe the data of each item of PSPRS. In this step, each observation time was treated as a separate individual. Equation 1 describes the model parametrization:

$$P(Y_{i,j} \ge s) = \frac{e^{a_j(\psi_i - b_{j,s})}}{1 + e^{a_j(\psi_i - b_{j,s})}} \tag{1}$$

where $Y_{i,j}$ is the score of subject $i$ on item $j$ and $\psi_i$ is the disability of subject $i$. When Eq. 1 with item‐specific values for $a_j$ and $b_{j,s}$ is displayed graphically, it is termed an item characteristic curve (ICC). To provide a reference for the latent variable scale, the distribution of $\psi_i$ was fixed to a mean of zero and a variance of one. $a_j$ represents the discrimination parameter for item $j$ and $b_{j,s}$ denotes the difficulty for the $s$‐th level of the item ($b_{j,s} \le b_{j,s+1}$). The probability of having the score $s$ is then calculated by

$$P(Y_{i,j} = s) = P(Y_{i,j} \ge s) - P(Y_{i,j} \ge s+1) \tag{2}$$
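The graded response model described above can be illustrated with a minimal Python sketch. It assumes the standard logistic form of the cumulative probabilities; the function names and the item parameters (discrimination 2.0, four thresholds) are hypothetical and are not estimates from this analysis.

```python
import math

def cumulative_prob(psi, a, b):
    """P(Y >= s) for one threshold b, assuming a logistic graded response model."""
    return 1.0 / (1.0 + math.exp(-a * (psi - b)))

def category_probs(psi, a, thresholds):
    """Return P(Y = s) for s = 0..len(thresholds); thresholds must be ascending."""
    # P(Y >= 0) = 1 and P(Y >= S + 1) = 0 bracket the cumulative curves.
    cum = [1.0] + [cumulative_prob(psi, a, b) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[s] - cum[s + 1] for s in range(len(thresholds) + 1)]

# Hypothetical item with five score levels (0 to 4), evaluated at average disability.
probs = category_probs(psi=0.0, a=2.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

Plotting `category_probs` over a grid of latent variable values would give the item characteristic curves for this hypothetical item.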
For the longitudinal model, item‐level parameters obtained in step (1) were fixed and parameters of a linear latent variable progression model were estimated:

$$\psi_{i,k} = \psi_{0,i} + \alpha_i \cdot t \tag{3}$$

where $\psi_{i,k}$ is the individual's predicted latent variable at observation $k$, $\psi_{0,i}$ is the latent variable prediction at baseline for a given individual $i$, $\alpha_i$ is the individual's estimated slope, and $t$ is time. The linearity assumption was assessed visually and using model fit criteria 19 (Figs 2 and S4).
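As a rough illustration of how a linear change on the latent scale translates into item scores, the sketch below propagates a typical individual through one year of visits. The slope of 0.884 per year is the typical interventional estimate reported in the Results; the item parameters, the visit schedule in years, and all function names are hypothetical assumptions made for illustration only.

```python
import math

def latent_at(psi0, slope, t_years):
    """Linear latent variable progression: baseline plus slope times time."""
    return psi0 + slope * t_years

def expected_item_score(psi, a, thresholds):
    """Expected score of a graded response item: sum of P(Y >= s) over s >= 1."""
    return sum(1.0 / (1.0 + math.exp(-a * (psi - b))) for b in thresholds)

VISITS = [0.0, 0.25, 0.5, 0.75, 1.0]  # years, i.e. 0, 3, 6, 9, and 12 months

# Typical individual: baseline disability 0 and slope 0.884 per year.
trajectory = [latent_at(0.0, 0.884, t) for t in VISITS]

# Hypothetical item (discrimination 2.0, four thresholds), not a fitted PSPRS item.
scores = [expected_item_score(psi, 2.0, [-1.5, -0.5, 0.5, 1.5]) for psi in trajectory]

for t, psi, score in zip(VISITS, trajectory, scores):
    print(f"t = {t:.2f} y  latent = {psi:.3f}  expected item score = {score:.2f}")
```

The expected score rises along a sigmoid even though the latent variable changes linearly, which is why a linear model on the latent scale can coexist with a bounded total score.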
Covariates were subsequently tested on baseline and slope parameters using stepwise covariate model building forward search using a log‐likelihood ratio test (LLRT) at P < 0.05, followed by backwards search at P < 0.01. The following covariates were explored: disease duration, age, sex, interventional trial: yes/no, and Richardson's syndrome: yes/no. Continuous covariates were assessed for linear relations to baseline and slope.
| (4) |
where for an individual i, are the covariate coefficients for baseline covariates and are the covariate coefficients for slope covariates
FIG. 2.

Pooled‐data visual predictive check (VPC) (diagnostic tool for the progression model) for PSPRS‐10. The 5th and 95th percentiles are highlighted in blue, and the median is highlighted in red. Observed data are shown in circles. Solid lines are the median of the observations. The red shaded area is the 95% confidence interval (CI) of the median model predictions. Dashed lines are the 95th and 5th percentiles of the observations. Blue shaded areas are the 95% CIs of the model predictions of the 95th and 5th percentiles. The yellow x axis labels mark the intervals where the observed data are binned (ie, the observed data are grouped into time intervals). From the binning, the median and the percentiles are calculated for each of those time intervals. PSP, progressive supranuclear palsy; PSPRS, Progressive Supranuclear Palsy Rating Scale. [Color figure can be viewed at wileyonlinelibrary.com]
As a model diagnostic, the estimated function of the IRT model was compared with a generalized additive model (GAM) fit that uses a resampling‐based smoothing function, with observed item‐level data and individual latent variable estimates as dependent and independent variables, respectively. 20 Spearman correlation analysis of observed and simulated scores and residuals was used to diagnose the adequacy of the model to replicate observed patterns in item correlation. Longitudinal model evaluation was performed using simulation‐based visual predictive checks (VPCs) 21 where observed percentiles are compared with model simulated percentiles using individual baseline values of covariates included in the final model. Finally, individual longitudinal predictions were calculated with confidence intervals (CIs) based on the uncertainty of empirical Bayesian estimates.
Fisher Information
Given a random variable x with the probability density function f(x; θ), the Fisher Information is defined as the expectation of the negative second derivative of the log‐likelihood function ℓ(θ|x) with respect to the parameter vector θ and it is a measure of the amount of information that a study design contains about θ. Of particular interest in this study is the assessment of the contribution of individual items to this information.
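Written out, the definition above reads as follows. The second expression is the standard per‐item form for a categorical response with respect to a scalar latent variable, written here with the assumed symbols $\psi$ for the latent variable and $Y_j$ for the score on item $j$; it is given as an illustration and is not necessarily the exact computation behind the reported values.

```latex
% Fisher information for the parameter vector \theta
I(\theta) = -\,\mathbb{E}\!\left[
  \frac{\partial^{2} \ell(\theta \mid x)}{\partial \theta \, \partial \theta^{\top}}
\right]

% Information of a single categorical item j about the latent variable \psi
I_j(\psi) = \sum_{s}
  \frac{\left( \partial P(Y_j = s \mid \psi) / \partial \psi \right)^{2}}
       {P(Y_j = s \mid \psi)}
```

Under conditional independence of the items given $\psi$, the item contributions add up, which is what makes a per‐item share of the total information meaningful.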
Total Score Models
Besides the main item‐based IRT model developed in earlier steps, two total score progression models were built:

(1) An IRT‐informed total score model. Based on the IRT model, an expected total score can be derived as a function of the latent variable (the disability). The derived functions for the expected mean total score and its variability can be approximated using Chebyshev polynomials. 22 Such a procedure has been shown to be advantageous in the analysis of total score data. 21 The IRT‐informed total score model made use of these translation polynomials to predict total score evolution and its variability over time as a function of a linear latent variable progression as in Eq. 3, with an additive residual error variance parameter.

(2) A linear total score model as in Eq. 5:

$$\mathrm{TS}_{i,k} = \mathrm{TS}_{0,i} + \alpha^{\mathrm{TS}}_{i} \cdot t \tag{5}$$

where $\mathrm{TS}_{i,k}$ is the total score prediction for the individual $i$ at observation $k$, $\mathrm{TS}_{0,i}$ is the individual baseline parameter, $\alpha^{\mathrm{TS}}_{i}$ is the individual slope, and $t$ is time.
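The idea of replacing the latent‐variable‐to‐total‐score mapping by a polynomial can be sketched as follows. This is a generic Chebyshev approximation written in plain Python, not the implementation used in the analysis; the two items, the interval of −4 to 4, the number of terms, and all function names are hypothetical.

```python
import math

# Hypothetical items as (discrimination, thresholds); not fitted PSPRS values.
ITEMS = [(2.0, [-1.5, -0.5, 0.5, 1.5]), (1.5, [-1.0, 0.0, 1.0])]

def expected_total_score(psi):
    """Sum over items of the expected graded response score, sum_s P(Y >= s)."""
    return sum(
        1.0 / (1.0 + math.exp(-a * (psi - b)))
        for a, thresholds in ITEMS
        for b in thresholds
    )

def chebyshev_fit(f, lo, hi, n):
    """Coefficients of an n-term Chebyshev approximation of f on [lo, hi]."""
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(0.5 * (hi - lo) * x + 0.5 * (hi + lo)) for x in nodes]
    return [
        (2.0 / n) * sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
        for j in range(n)
    ]

def chebyshev_eval(coeffs, lo, hi, psi):
    """Evaluate the approximation at psi, clamped to the fitted interval."""
    x = max(-1.0, min(1.0, (2.0 * psi - lo - hi) / (hi - lo)))
    theta = math.acos(x)
    # T_j(x) = cos(j * arccos(x)); the constant term enters with weight 1/2.
    return sum(c * math.cos(j * theta) for j, c in enumerate(coeffs)) - 0.5 * coeffs[0]

coeffs = chebyshev_fit(expected_total_score, -4.0, 4.0, 30)
grid = [-4.0 + 0.1 * k for k in range(81)]
max_error = max(
    abs(chebyshev_eval(coeffs, -4.0, 4.0, psi) - expected_total_score(psi)) for psi in grid
)
print(f"largest absolute error on the grid: {max_error:.2e}")
```

Once fitted, the polynomial can stand in for the item‐level sum inside a total score model, so the latent progression can be estimated from total scores alone.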
Trial Designs
Disease‐modifying effects of 0%, 20%, 30%, and 50% were investigated. The 0% change was used for type I error assessment, while the 20%, 30%, and 50% scenarios were used to determine the number of subjects required to achieve 80% power in a parallel design. The power assessment was carried out by simulating 1000 trials of 100, 400, and 500 subjects each for 20%, 30%, and 50% disease‐modifying scenarios, respectively, using the PSPRS‐10 IRT model, with 1:1 (treatment : placebo) random allocation and observations at 0, 3, 6, 9, and 12 months. A full power curve was then constructed using the parametric power estimation algorithm 23 for different analysis models based on longitudinal item or total score data.
Type I error was investigated across the two scales and the three developed models. We used the PSPRS‐10 IRT model to simulate 1000 trials of 50 patients each also in a 1:1 parallel design.
The hypothesis testing was based on a two‐sided LLRT for a fixed effect parameter that represents the change in the progression of the disability as in Eq. 6:
| (6) |
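The decision rule of a log‐likelihood ratio test with one added parameter, and the way power is counted across simulated trials, can be sketched as below. The sketch assumes that the test statistic is the drop in −2 log‐likelihood (the objective function value in NONMEM) between the model without and with the treatment effect parameter, and that it follows a chi‐square distribution with 1 degree of freedom under the null hypothesis. The list of statistics is made up for illustration.

```python
import math

CRITICAL_DELTA_OFV = 3.841458820694124  # 95th percentile of chi-square with 1 df

def llrt_p_value(delta_ofv):
    """P value of a 1-df likelihood ratio test for a given drop in -2 log-likelihood."""
    if delta_ofv <= 0.0:
        return 1.0
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(delta_ofv / 2.0))

def empirical_power(delta_ofvs, alpha=0.05):
    """Fraction of simulated trials in which the treatment effect is significant."""
    rejections = sum(1 for d in delta_ofvs if llrt_p_value(d) < alpha)
    return rejections / len(delta_ofvs)

# Made-up test statistics from ten simulated trials (illustration only).
example_delta_ofvs = [0.4, 2.1, 3.9, 5.6, 7.2, 9.8, 1.0, 4.4, 6.3, 12.5]
power = empirical_power(example_delta_ofvs)
print(f"empirical power: {power:.2f}")
```

With a 0% simulated effect the same count gives the empirical type I error rate, which should be close to the nominal 5% if the test is well calibrated.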
Results
Data
A total of 979 patients were included in this study with a mean number of PSPRS visits of 3.8 ± 1.5. Table 1 shows patient characteristics at first record. Data for 259 patients came from registries. We observed (1) similar age and sex distributions across the studied populations, (2) PSPRS‐28 total score at first record and percentage of Richardson's syndrome phenotype were on average higher for clinical trials than for the registries, and (3) follow‐up duration was generally longer for registries than for the clinical trials.
TABLE 1.
Study population description
| Variable | ARISE | PASSPORT | PROSPERA | TAUROS | DescribePSP (observational) | ProPSP (observational) |
|---|---|---|---|---|---|---|
| Patients (N) | 377 | 161 | 44 | 138 | 127 | 132 |
| Richardson's syndrome phenotype [N, (%)] | 377 (100) | 161 (100) | 44 (100) | 82 (59) | 92 (72) | 99 (75) |
| Sex [n, (%)] | ||||||
| Female | 158 (42) | 72 (45) | 24 (55) | 63 (46) | 53 (42) | 61 (46) |
| Male | 219 (58) | 89 (55) | 20 (45) | 75 (54) | 74 (58) | 71 (54) |
| Age (years, mean ± SD) | ||||||
| Baseline | 68.8 ± 6.8 | 68.9 ± 6.6 | 67.7 ± 5.4 | 68.3 ± 7.0 | 69.8 ± 7.2 | 69.1 ± 7.8 |
| Disease duration [years, median (IQR)] | ||||||
| Baseline | 3.2 (1.7) | 1.2 (1.8) | 3.1 (2.3) | 2.3 (3.1) | 3.2 (2.9) | 3.6 (2.9) |
| Final | 3.9 (1.8) | 2.1 (1.7) | 3.9 (2.2) | 4.0 (2.6) | 5.1 (3.5) | 4.8 (3.4) |
| Delta | 0.8 (0.3) | 1.0 (0.1) | 1.0 (0.2) | 1.2 (0.6) | 1.3 (1.1) | 1.0 (0.8) |
| Number of PSPRS observations per patient [median (IQR)] | 5.0 (1.0) | 3.0 (0.0) | 5.0 (1.3) | 7.0 (3.0) | 2.0 (1.0) | 2.0 (1.0) |
| PSPRS‐28 total score [median (IQR)] | ||||||
| Baseline | 36.0 (16.0) | 38.0 (15.0) | 28.0 (10.3) | 39.0 (13.8) | 29.0 (16.0) | 32.0 (20.8) |
| Final | 45.0 (20.0) | 48.0 (20.0) | 38.0 (13.3) | 49.5 (20.8) | 37.0 (24.5) | 41.0 (23.0) |
| Annualized delta | 9.5 (14.1) | 11.1 (11.1) | 12.3 (8.2) | 11.5 (13.6) | 7.0 (9.8) | 7.6 (10.9) |
| PSPRS‐10 total score [median (IQR)] | ||||||
| Baseline | 16.0 (9.0) | 17.0 (9.0) | 12.5 (4.0) | 19.0 (8.0) | 12.0 (8.0) | 14.0 (10.0) |
| Final | 21.0 (11.0) | 23.0 (9.0) | 17.0 (7.3) | 25.0 (11) | 18.0 (16.0) | 20.0 (12.3) |
| Annualized delta | 5.0 (8.6) | 6.5 (6.0) | 5.9 (6.0) | 6.1 (7.7) | 4.3 (5.1) | 4.4 (5.9) |
| rPSPRS‐10 total score [median (IQR)] | ||||||
| Baseline | 9.0 (7.0) | 10.0 (7.0) | 7.0 (3.0) | 11.0 (7.0) | 7.0 (5.5) | 8.0 (7.0) |
| Final | 14.0 (9.0) | 15.0 (8.0) | 10.0 (6.0) | 16.0 (9.8) | 11.0 (12.0) | 12.0 (10.0) |
| Annualized delta | 4.0 (7.0) | 5.0 (5.0) | 4.0 (5.5) | 5.0 (6.1) | 3.2 (4.0) | 2.7 (4.3) |
Abbreviations: PSP, progressive supranuclear palsy; SD, standard deviation; IQR, interquartile range; PSPRS, Progressive Supranuclear Palsy Rating Scale; rPSPRS, FDA‐recommended PSPRS subscale after rescoring.
The average pairwise Spearman correlation coefficient for PSPRS‐28 item scores at first record was low (r = 0.17 ± 0.13, mean ± SD). The items irritability, sleep difficulty, and postural tremor were uncorrelated with other items, with an average pairwise correlation of r = 0.01 ± 0.05. Items in the gait domain (see Fig. 1 for domain definition) showed a higher pairwise internal correlation (r = 0.53 ± 0.17) than observed in other domains (r = 0.15 ± 0.02). The mean pairwise correlation across domains was moderate (r = 0.35 ± 0.09). All score categories were represented for all items in the dataset, although some categories were sparsely represented; the lowest overall score frequency was 0.6% (21 responses) (Table S11).
FIG. 1.

Item characteristic curves (ICCs) for PSPRS‐28 with Fisher information value. First two rows are the 10 items selected by the U.S. Food and Drug Administration (FDA). The letter between parentheses signifies the PSPRS domain, where (A) is history, (B) is mentation, (C) is bulbar, (D) is ocular motor, (E) is limb motor, and (F) is gait. PSP, progressive supranuclear palsy; PSPRS, Progressive Supranuclear Palsy Rating Scale. [Color figure can be viewed at wileyonlinelibrary.com]
IRT Model for PSPRS‐28
The ICCs for each item of the PSPRS‐28 scale based on the IRT model are shown in Figure 1. These curves represent the probability of a specific score value given the latent variable, which represents disease severity. Three items, irritability, sleep difficulty, and postural tremor, show flat relationships indicating that they are not related to the severity distributions of the other items, in line with the observed lack of correlation. There is a large spread in the Fisher information values across the different items as displayed in Figure 1. The 10 FDA‐selected items display the highest information, 76% of the total Fisher information, despite being only 36% of the items. Among the domains, gait carried the highest information content with 57% of the total information.
Assessment of PSPRS‐10 and rPSPRS‐10
IRT Model
A second IRT model was developed for the FDA‐selected items. Pairwise item correlation for the 10 items in PSPRS‐10 (r = 0.34 ± 0.14) was generally higher than that observed for PSPRS‐28. Figure S1 shows the correlation matrices across FDA‐selected items for both the observed data and those simulated from the IRT model. The IRT model successfully replicated the correlations observed in the real data. The residual correlations between items, displayed in Figure S2, indicate that the simulated data largely agree with the real data with respect to item‐to‐item correlations. However, a significant deviation can be observed between the items dysphagia and dysphagia for solids.
For a given ability, $a_j$ is referred to as discrimination because the higher the parameter value, the higher the probability of distinguishing between levels of the ability, while $b_{j,s}$ is referred to as difficulty because this parameter determines at what ability the response will likely change value. The estimates of the difficulty and discrimination parameters for the PSPRS‐10 IRT model can be found in Table S1. The model ICC fits were compared with those of the GAM fit (Fig. S3). The figure shows good overall agreement between the two fits across all FDA‐selected items. PSPRS‐10 versus rPSPRS‐10 information content can be found in Table S2. A loss of information ranging from 3% to 36% was observed after collapsing responses as per the FDA recommendation.
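Why collapsing response categories can only lose information may be easier to see with a small numerical sketch. It assumes a logistic graded response item, takes the information with respect to the latent variable, and averages it over a standard normal latent distribution; the item parameters and function names are hypothetical, so the resulting percentage is illustrative and does not correspond to any value in Table S2.

```python
import math

def item_information(psi, a, thresholds):
    """Information of one graded response item about the latent variable at psi."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (psi - b))) for b in thresholds] + [0.0]
    info = 0.0
    for s in range(len(cum) - 1):
        p = cum[s] - cum[s + 1]
        # Derivative of each cumulative logistic curve is a * P * (1 - P).
        dp = a * (cum[s] * (1.0 - cum[s]) - cum[s + 1] * (1.0 - cum[s + 1]))
        if p > 0.0:
            info += dp * dp / p
    return info

def expected_information(a, thresholds, lo=-4.0, hi=4.0, steps=400):
    """Trapezoidal average of the item information over a standard normal density."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        psi = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        density = math.exp(-0.5 * psi * psi) / math.sqrt(2.0 * math.pi)
        total += weight * item_information(psi, a, thresholds) * density
    return total * h

full = [-1.5, -0.5, 0.5, 1.5]   # five score levels, 0 to 4
collapsed = [-1.5, 0.5, 1.5]    # levels 1 and 2 merged: the threshold between them is dropped

info_full = expected_information(2.0, full)
info_collapsed = expected_information(2.0, collapsed)
loss = 1.0 - info_collapsed / info_full
print(f"relative information loss after merging: {loss:.1%}")
```

Merging two adjacent levels is equivalent to removing the threshold that separates them, so the collapsed item can never carry more information than the original one.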
Progression Model
The progression for PSPRS‐10 was described by a linear function on the latent variable scale. The range of the estimated disability can vary according to the disease in question and the scale used for evaluation. In the context of this analysis, a range of −4 to 4 as seen in Figure 1 could explain the transition of probabilities from likely lower scores at low disability (towards an estimated value of −4) to likely higher scores at severe disability (towards +4). Table S3 shows the longitudinal parameter estimates for the IRT model. The model estimated an increase in disability of 0.884 per year for the interventional studies and a slower progression, 0.64 per year, for the registry patients. The predicted first record disability had a mean of 0.00 ± 0.81. The stepwise covariate model building yielded selection of the covariates in Table S3, where the only covariate found to significantly affect the slope was whether an individual belonged to an observational study (registry) or an interventional trial.
A VPC with all data pooled for the PSPRS‐10 can be found in Figure 2 where the fit looks good overall. A VPC stratified per study type can be found in Figure S4. The plot shows underprediction at the 2.5th and 97.5th percentiles for the observational data at the beginning of the study but predicts the median well for both interventional and observational data. The overall fit was visually better for the interventional than for the observational studies. Fisher information content and the ICCs for PSPRS‐10 can be found in Figures [Link], [Link]. Individual predictions with 95% CI using the same model can be found in Figure 3.
FIG. 3.

Individual predictions of PSPRS‐10 total score. Patients (A), (B), (C), and (D) were chosen randomly from the interventional trials. The points highlight the observed data. The solid lines show the individual prediction, and the colored areas are the 95% confidence intervals (CI) of the specific individuals and the 95% prediction intervals of the typical individual. PSP, progressive supranuclear palsy; PSPRS, Progressive Supranuclear Palsy Rating Scale. [Color figure can be viewed at wileyonlinelibrary.com]
Trial Designs
Two total score progression models were estimated for PSPRS‐10, an IRT‐informed and a linear mixed‐effect model. The parameter estimates for both models are available in Tables S4 and S5. Table 2 shows the number of subjects required to achieve a power of 80% for PSPRS‐10 and rPSPRS‐10 scales and the IRT, IRT‐informed, and linear models for 1‐year trials with three different hypothesized treatment effects. There were two trends observed in Table 2: PSPRS‐10 had higher power than rPSPRS‐10, and the power was highest for the IRT model and lowest for the linear total score model. In general, all three models had controlled type I error rate (Table S6). Full power curves for PSPRS‐10 can be seen in Figures [Link], [Link], [Link].
TABLE 2.
Sample sizes required for a two‐arm, 1‐year follow‐up trial to detect 20%, 30%, and 50% change in the active arm relative to placebo
| Model | Scale | 20% effect (n) | 30% effect (n) | 50% effect (n) |
|---|---|---|---|---|
| IRT | PSPRS‐10 | 139 | 58 | 19 |
| IRT | rPSPRS‐10 | 154 | 65 | 22 |
| IRT‐end of treatment | PSPRS‐10 | 154 | 65 | 22 |
| IRT‐end of treatment | rPSPRS‐10 | 178 | 75 | 25 |
| IRT‐informed total score | PSPRS‐10 | 169 | 72 | 25 |
| IRT‐informed total score‐end of treatment | PSPRS‐10 | 183 | 79 | 28 |
| Linear total score | PSPRS‐10 | 303 | 124 | 39 |
| Linear total score | rPSPRS‐10 | 338 | 137 | 43 |
| Total score‐end of treatment | PSPRS‐10 | 316 | 131 | 43 |
| Total score‐end of treatment | rPSPRS‐10 | 341 | 140 | 46 |
Data are the required numbers of patients per arm, based on a significance level of 5% and a power of 80%. Sample sizes are calculations based on no dropouts and are generated using the parametric power estimation (PPE) algorithm.
Abbreviations: PSPRS, Progressive Supranuclear Palsy Rating Scale; rPSPRS, FDA‐recommended PSPRS subscale after rescoring; IRT, item response theory.
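The relative sample sizes implied by Table 2 can be checked with a few lines of Python. The numbers are copied from the PSPRS‐10 rows of the table; the dictionary layout and the function name are illustrative choices.

```python
# Per-arm sample sizes copied from Table 2 (PSPRS-10 rows), keyed by effect size in %.
N_PER_ARM = {
    "IRT": {20: 139, 30: 58, 50: 19},
    "Total score-end of treatment": {20: 316, 30: 131, 50: 43},
}

def size_ratio(model, reference, effect):
    """Sample size of `model` relative to `reference` for a given effect size."""
    return N_PER_ARM[model][effect] / N_PER_ARM[reference][effect]

ratios = {
    effect: size_ratio("IRT", "Total score-end of treatment", effect)
    for effect in (20, 30, 50)
}
for effect, ratio in ratios.items():
    print(f"{effect}% effect: longitudinal IRT needs {ratio:.2f} of the end-of-treatment total score size")
```

For all three effect sizes the ratio is about 0.44, in line with the statement that the longitudinal item‐level analysis requires less than half the study size of an end‐of‐treatment total score analysis.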
Discussion
A common justification for combining responses across multiple items of a clinical outcome assessment is that they are all reflective of the severity of an underlying disease. IRT formalizes this concept and allows a quantitative analysis. The FDA terms this a “reflective indicator model,” and a hallmark of an appropriate model is unidimensionality across the different items. 24 Such unidimensionality is evident when a single latent variable, representing disease severity, can explain correlations in the variability across all items of the assessment. For PSPRS, the IRT analyses made it evident that this was not the case. A general reason for lack of unidimensionality can be that some items are highly influenced by processes not related to the disease. This is likely the cause of the poor performance of the items “irritability,” “sleep difficulty,” and “postural tremor.” Such items contribute no signal but considerable noise to the assessment. Therefore, a reduced scale that does not include such items would be preferable. One way to decide whether an item should be included in the assessment is to be guided by the item's informativeness under the IRT model. This can be done by inspecting either the ICCs or the Fisher information value (Fig. 1). Items with low contribution could, with little loss, be omitted. In the PSPRS, the 10 items selected by the FDA contained 76% of the information across items, and the only item that was not included despite a relatively high information was “voluntary downward saccades.”
The IRT analysis restricted to the FDA‐selected items demonstrates a better agreement with the reflective indicator model. All items show a positive correlation with each other (Fig. S1), indicating an underlying common source of variation (ie, disease severity). Also, the remaining variability, as assessed in a residual analysis, showed overall good performance through a lack of positive correlations. However, one item pair, “dysphagia” and “dysphagia for solids,” shows a higher positive residual correlation than expected under unidimensionality (Fig. S2). A refinement of the clinical outcome assessment scale to address this could be to select only one of these two items or to combine them into one item. While the FDA‐selected items show a better agreement with unidimensionality, there are still aspects that may be less desirable. One such feature is that the information content is highly driven by the gait items (“neck rigidity,” “arising from chair,” “gait,” “postural stability,” and “sitting down”), which make up 83% of the information in the assessment.
The informativeness of an item is related both to the number of levels (the more levels the better) and to the ability to differentiate the levels. High values of the discrimination parameters of an IRT model are a sign of clearly distinguishable levels. For the PSPRS‐10, the average value for the discrimination parameters was 2.11 (Table S1). This is a relatively high value when compared with the corresponding average values for other clinical outcome assessment scales. IRT models for the Movement Disorder Society‐Unified Parkinson's Disease Rating Scale (MDS‐UPDRS) in Parkinson's disease, the Alzheimer's Disease Assessment Scale‐Cognitive Subscale (ADAS‐Cog) in Alzheimer's disease, and the Expanded Disability Status Scale (EDSS) in multiple sclerosis report 0.78, 1.76, and 1.62, respectively. 25 , 26 , 27 Further, poorly separated score levels may result in collapse of the probability curves, such that the separation between two adjacent levels disappears. This was not the case for the PSPRS‐10 IRT model; all relevant difficulty parameters were well separated from zero (Table S1, parameters b2–b4). Given these properties of the IRT model for PSPRS‐10, it is not surprising that the rescoring, where some levels are collapsed, results in a loss of information (Table S2).
The latent variable scale is unbounded and disease progression in IRT models can therefore almost always be well described by a linear change over time (eg, references 25 , 26 , 27 ). In the present work, a linear model captured reasonably well both the time trend and the variability across patients (Figs 2, S4, S10). Progression models for total score outcomes typically need to be nonlinear, at least when capturing disease severities from mild to severe, as the assessment scales are bounded. Such models are more complex and more challenging, especially when utilizing sparse data. An advantage of IRT models is that they draw on the rich item‐level information and yet can predict the time course on both the latent variable and total score scale. To make predictions on the total score level, the underlying ICCs or polynomial approximations can be used. In the present work, Chebyshev polynomials (Fig. S11) were used and the progression rates in the interventional studies were predicted to be 5.5, 6.8, and 4.2 per year at total PSPRS‐10 score values of 10, 20, and 30.
The interventional studies had a higher baseline and a faster progression than the observational studies, in line with the slope estimates and the annualized score changes reported in the Results. The higher baseline is likely related to the inclusion criteria in the interventional studies. The higher progression rate in the interventional studies could be related to a selection process, for example, that patients with rapid progression are more likely to have new assessments.
The investigations on expected trial performance focused on studies assessing disease‐modifying effects, where the progression rate was decreased in the treatment arm. Further, as in previous sample size calculations for PSP, 10 , 28 the focus was on 1‐year, two‐arm trials. Therefore, we explored different treatment effect sizes (20%–50%), scale versions (PSPRS‐10 or rPSPRS‐10), data sources (longitudinal versus end‐of‐treatment data), and analysis models (IRT or total score) under a 1‐year two‐arm setting. PSPRS‐10, as expected, was consistently more powerful than rPSPRS‐10. Using end‐of‐treatment effect analysis instead of the whole‐time profile resulted in significantly less power when using the linear total score models. A longitudinal IRT‐based analysis of PSPRS‐10 data in this study predicted a considerably smaller sample size (38 patients) needed to detect a 50% change in progression compared with a previous estimate (102 patients) based on a PSPRS total score end‐of‐treatment analysis. Similarly, only 276 patients were required to detect a 20% change using an IRT longitudinal PSPRS‐10 model, compared with the previously reported PSPRS total score end‐of‐treatment analysis (618 patients). 10
In summary, we have through analysis of a large database from both interventional and observational studies been able to understand item‐level properties of the PSPRS (Fig. S12, Table S12). Based on these properties, a focus on the items selected in PSPRS‐10 seems motivated. A rescoring of those items, however, is likely to weaken the scale as a tool to identify disease‐modifying treatment effects. Positive impact on the power of an interventional trial is driven by inclusion of patients with low disease severity, long trial duration, longitudinal data, and IRT model analysis.
Financial Disclosures
E.L.P. is employed by and owns stock in Pharmetheus. G.H. was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy within the framework of the Munich Cluster for Systems Neurology (EXC 2145 SyNergy – ID 390857198), the German Federal Ministry of Education and Research (BMBF, CurePML EN2021‐039); the Rainwater Charitable Foundation (Pre‐PSP), the European Joint Programme on Rare Diseases (Improve‐PSP), Deutsche Forschungsgemeinschaft (HO2402/18‐1 MSAomics), Petermax‐Müller Foundation (Etiology and Therapy of Synucleinopathies and Tauopathies), the German Parkinson Society (DPG): ProAPS; has ongoing research collaborations with Roche, UCB, AbbVie; serves as a consultant for AbbVie, Alzprotect, Amylyx, Aprinoia, Asceneuron, Bayer, Bial, Biogen, Biohaven, Epidarex, Ferrer, Kyowa Kirin, Lundbeck, Novartis, Retrotope, Roche, Sanofi, Servier, Takeda, Teva, UCB; received honoraria for scientific presentations from AbbVie, Bayer, Bial, Biogen, Bristol Myers Squibb, Esteve, Kyowa Kirin, Pfizer, Roche, Teva, UCB, Zambon; holds a patent on Treatment of Synucleinopathies (US 10,918,628 B2; EP 17 787 904.6‐1109/3 525 788); received publication royalties from Academic Press, Kohlhammer, Thieme. M.O.K. has ongoing research collaborations with Roche, GSK, Bayer; and serves as a consultant for and holds stock in Pharmetheus. None of the other coauthors have any financial conflicts of interest to declare.
Author Roles
(1) Article Project: A. Conception, B. Design, C. Execution, D. Discussion; (2) Manuscript Preparation: A. Writing, B. Review and Critique; (3) Dataset: A. Acquisition, B. Data Quality Check.
M.G.: 1B, 1C, 1D, 2A, 2B, 3B.
E.L.P.: 1B, 1D, 2B.
E.Y.: 1D, 2B.
F.K.: 1D, 2B.
M.P.: 1D, 2B.
F.H.: 1D, 2B, 3A, 3B.
G.H.: 1A, 1D, 2A, 2B, 3A, 3B.
M.O.K.: 1A, 1B, 1D, 2A, 2B.
Supporting information
Figure S1. PSPRS‐10 pairwise item correlation Spearman coefficients and their 95% confidence intervals at first record, simulated from the item response theory model (upper left triangle) and observed (lower right triangle).
Figure S2. PSPRS‐10 item response theory model residual item pairwise Spearman correlation coefficients and their 95% confidence intervals at first record, simulated versus estimated.
Figure S3. Generalized additive models (GAM)‐based item characteristic curves (ICCs) for PSPRS‐10. The solid red line is the item response theory (IRT)‐based fit, while the grey solid line is the GAM‐based fit to the observed data with the individual latent variable estimates, resampled from the posterior distribution, as independent variable. The grey area is the 95% confidence interval of the GAM fit based on 10 resamples. The GAM fit made use of cumulative logit (proportional odds) link function, where error is accounted for in the response probability distribution.
Figure S4. Stratified visual predictive check for PSPRS‐10. The 5th and 95th percentiles are highlighted in blue, and the median is highlighted in red. Observed data are shown in circles. Solid lines are the median of the observations. Red shaded area is the 95% confidence interval of the median model predictions. Dashed lines are the 95th and 5th percentiles of the observations. Blue shaded areas are the 95% confidence intervals of the model predictions of the 95th and 5th percentiles. The yellow x axis labels mark the intervals where the observed data are binned (ie, the observed data are grouped into time intervals). From the binning, the median and the percentiles are calculated for each of those time intervals. Through model fit, we can deduce that our latent variable linear progression assumption was reasonable. Further, adding a quadratic term to the described prediction Eq. 3 did not result in an improved fit (Bayesian Information Criterion of quadratic 82,315 vs. linear 82,309). No covariate cycling was observed in the stepwise covariate model building algorithm used.
Figure S5. Fisher information content for PSPRS‐10.
Figure S6. Item characteristic curves of PSPRS‐10 item response theory model.
Figure S7. Parametric power estimation‐based power curves for 20% disease‐modifying effect for PSPRS‐10. On the x axis is the total number of subjects required for a parallel two‐armed study based on a significance level of 5% and a power of 80%. Sample sizes are calculations based on no dropouts. IRT label in the plot signifies the item response theory‐based longitudinal item‐level model. EOT is the end of treatment where the underlying model only makes use of the first and the last time points. IRT‐TS is the item response theory informed total score model. Linear‐TS is the linear total score model. The models were only fitted to the interventional trials and their parameter estimates are provided in Tables [Link], [Link]. The parametric power estimation is a power mapping algorithm allowing the estimation of a full power curve from a single size trial simulation. It assumes that the distribution of delta‐objective function value follows a non‐central chi square distribution, with estimation of the non‐centrality parameter, which allows the scaling of predicted power to a full power curve. A two‐sided alpha level of 5% was used.
Figure S8. Parametric power estimation‐based power curves for 30% disease‐modifying effect for PSPRS‐10. On the x axis is the total number of subjects required for a parallel two‐armed study based on a significance level of 5% and a power of 80%. Sample sizes are calculations based on no dropouts. IRT label in the plot signifies the item response theory‐based longitudinal item‐level model. EOT is the end of treatment where the underlying model only makes use of the first and the last time points. IRT‐TS is the item response theory informed total score model. Linear‐TS is the linear total score model. The models were only fitted to the interventional trials and their parameter estimates are provided in Tables [Link], [Link]. The parametric power estimation is a power mapping algorithm allowing the estimation of a full power curve from a single size trial simulation. It assumes that the distribution of delta‐objective function value follows a non‐central chi square distribution, with estimation of the non‐centrality parameter, which allows the scaling of predicted power to a full power curve. A two‐sided alpha level of 5% was used.
Figure S9. Parametric power estimation‐based power curves for 50% disease‐modifying effect for PSPRS‐10. On the x axis is the total number of subjects required for a parallel two‐armed study based on a significance level of 5% and a power of 80%. Sample sizes are calculations based on no dropouts. IRT label in the plot signifies the item response theory‐based longitudinal item‐level model. EOT is the end of treatment where the underlying model only makes use of the first and the last time points. IRT‐TS is the item response theory informed total score model. Linear‐TS is the linear total score model. The models were only fitted to the interventional trials and their parameter estimates are provided in Tables [Link], [Link]. The parametric power estimation is a power mapping algorithm allowing the estimation of a full power curve from a single size trial simulation. It assumes that the distribution of delta‐objective function value follows a non‐central chi square distribution, with estimation of the non‐centrality parameter, which allows the scaling of predicted power to a full power curve. A two‐sided alpha level of 5% was used.
Figure S10. Pooled data visual predictive check for rPSPRS‐10. The 2.5th and 97.5th percentiles are highlighted in blue, and the median is highlighted in red. Observed data are shown in circles. Solid line is the median of the observations. Red shaded area is the 95% confidence interval of the median model predictions. Dashed lines are the 97.5th and 2.5th percentiles of the observations. Blue shaded areas are the 95% confidence intervals of the model predictions of the 97.5th and 2.5th percentiles. The yellow x axis labels mark the intervals where the observed data are binned (ie, the observed data are grouped into time intervals). From the binning, the median and the percentiles are calculated for each of those time intervals.
Figure S11. Item response theory‐based expected total score value and its polynomial approximation versus latent variable (top left panel). Expected standard deviation, true and polynomial approximation (lower left panel). Right hand panels show the derivatives of the left panel functions.
Figure S12. PSPRS‐28 pairwise item correlation Spearman coefficients and their 95% confidence intervals at first record, simulated from the item response theory model (upper left triangle) and observed (lower right triangle).
Table S1. Parameter estimates of item response theory models of PSPRS‐10 and rPSPRS‐10.
Table S2. Information content of the Food and Drug Administration (FDA)‐selected items before and after rescoring (separate model fit).
Table S3. PSPRS‐10 and rPSPRS‐10 models longitudinal parameter estimates.
Table S4. Item response theory informed total score progression model parameter estimates for PSPRS‐10.
Table S5. Linear total score progression model parameter estimates for PSPRS‐10.
Table S6. Type I error of the three progression models across the two scales, based on 1000 simulations of a single trial of 100 and 500 subjects. The calculations are based on a two‐armed design with no dropouts. Most error rates were within the binomial confidence interval of 5% for 1000 trials of [3.81%: 6.53%]. EOT, end of treatment.
Table S7. Parameter estimates of item response theory models of PSPRS‐10 and rPSPRS‐10 (used for power analysis/applied only to interventional studies).
Table S8. PSPRS‐10 and rPSPRS‐10 item response theory models longitudinal parameter estimates (used for power analysis/applied only to interventional studies).
Table S9. Item response theory informed total score progression model parameter estimates for PSPRS‐10 and rPSPRS‐10 (used for power analysis/applied only to interventional studies).
Table S10. Linear total score progression model parameter estimates for PSPRS‐10 and rPSPRS‐10 (used for power analysis/applied only to interventional studies).
Table S11. Occurrence of item scores across all studies and all visits.
Table S12. Information content percentage for PSPRS‐28, all data pooled.
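The parametric power estimation described in the captions of Figures S7 to S9 can be illustrated with a minimal sketch. It is not the implementation used for the analysis: it assumes a likelihood ratio test with one degree of freedom (a single treatment effect parameter), estimates the non‐centrality parameter by the method of moments rather than by the estimation procedure of the original algorithm, and assumes that the non‐centrality parameter scales linearly with the total number of subjects. All function names are hypothetical.

```python
from statistics import NormalDist, mean

_STD_NORMAL = NormalDist()


def estimate_ncp(delta_ofv, df=1):
    """Method-of-moments estimate of the non-centrality parameter.

    For a non-central chi-square variable, E[X] = df + ncp, so the mean
    delta-objective function value minus df estimates ncp (floored at 0).
    This is an assumption of the sketch, not the published estimator.
    """
    return max(mean(delta_ofv) - df, 0.0)


def power_at(n_total, ncp_ref, n_ref, alpha=0.05):
    """Power of a 1-df likelihood ratio test at a total sample size n_total.

    ncp_ref is the non-centrality estimated from trials of size n_ref;
    it is assumed to scale linearly with the number of subjects.
    """
    ncp = ncp_ref * n_total / n_ref
    # A 1-df non-central chi-square is the square of a N(sqrt(ncp), 1)
    # variable, so its tail probability follows from the normal CDF.
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def sample_size_for_power(ncp_ref, n_ref, target=0.80, alpha=0.05,
                          n_max=100_000):
    """Smallest even total sample size (two equal arms) reaching the target."""
    for n_total in range(2, n_max + 1, 2):
        if power_at(n_total, ncp_ref, n_ref, alpha) >= target:
            return n_total
    return None  # target power not reached within n_max subjects


if __name__ == "__main__":
    # Hypothetical delta-OFV values from simulated trials of 100 subjects.
    simulated_delta_ofv = [6.2, 9.8, 8.1, 10.4, 7.5]
    ncp_hat = estimate_ncp(simulated_delta_ofv)
    for n_total in (50, 100, 200):
        print(n_total, round(power_at(n_total, ncp_hat, n_ref=100), 3))
    print(sample_size_for_power(ncp_hat, n_ref=100))
```

Under these assumptions, doubling the non‐centrality parameter obtained at a fixed simulated trial size halves the number of subjects needed for a given power, which is how a single simulated study size maps to a full power curve.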
Acknowledgments
The computations and data handling were enabled by resources in project NAISS 2023‐5‐516 provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at UPPMAX, funded by the Swedish Research Council through Grant Agreement No. 2022‐06725.
We thank Amylyx Pharmaceuticals for providing the PSPRS rescoring guideline, based on an FDA pre‐IND (Investigational New Drug Application) meeting. The modified 10‐item PSPRS considered in this article was recommended to Amylyx Pharmaceuticals in that meeting. However, there is no generally FDA‐approved version of the PSPRS. Instead, the FDA has informally indicated that each PSP trial sponsor must have a pre‐IND meeting where the FDA will provide a PSPRS modification specific to that trial. The findings on the specific PSPRS‐10 considered in this article may not generalize to other such modifications.
This publication is based on research using data from data contributors AbbVie that has been made available through Vivli, Inc. Vivli has not contributed to or approved, and is not in any way responsible for, the contents of this publication.
Relevant conflicts of interest/financial disclosures: G.H. has ongoing research collaborations with Roche, UCB, AbbVie; serves as a consultant for AbbVie, Alzprotect, Amylyx, Aprinoia, Asceneuron, Bayer, Bial, Biogen, Biohaven, Epidarex, Ferrer, Kyowa Kirin, Lundbeck, Novartis, Retrotope, Roche, Sanofi, Servier, Takeda, Teva, UCB; received honoraria for scientific presentations from AbbVie, Bayer, Bial, Biogen, Bristol Myers Squibb, Kyowa Kirin, Pfizer, Roche, Teva, UCB, Zambon; holds a patent on Treatment of Synucleinopathies (US 10,918,628 B2, EP 17 787 904.6‐1109/3 525 788); received publication royalties from Academic Press, Kohlhammer, Thieme. M.O.K. has ongoing research collaborations with Roche, GSK, Bayer; and serves as a consultant for and holds stock in Pharmetheus.
Funding agencies: G.H. was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy within the framework of the Munich Cluster for Systems Neurology (EXC 2145 SyNergy – ID 390857198), European Joint Programme on Rare Diseases (Improve‐PSP), the Niedersächsisches Ministerium für Wissenschaft und Kunst (MWK, ZN3440.TP)/VolkswagenStiftung (Niedersächsisches Vorab), and Petermax‐Müller Foundation (Etiology and Therapy of Synucleinopathies and Tauopathies). Also, M.O.K. and M.G. were funded by the Swedish Research Council grant 2018‐03317.
Data Availability Statement
The data were collected within the context of clinical trials or observational studies and can be requested from the respective owners of the data.
References
- 1. Litvan I, Agid Y, Calne D, Campbell G, Dubois B, Duvoisin RC, et al. Clinical research criteria for the diagnosis of progressive supranuclear palsy (Steele‐Richardson‐Olszewski syndrome): report of the NINDS‐SPSP international workshop. Neurology 1996;47(1):1–9.
- 2. Höglinger GU, Respondek G, Stamelou M, Kurz C, Josephs KA, Lang AE, et al. Clinical diagnosis of progressive supranuclear palsy: the Movement Disorder Society criteria. Mov Disord 2017;32(6):853–864.
- 3. Golbe LI, Ohman‐Strickland PA. A clinical rating scale for progressive supranuclear palsy. Brain 2007;130(6):1552–1565.
- 4. Reise SP, Waller NG. Item response theory and clinical measurement. Annu Rev Clin Psychol 2009;5(1):27–48.
- 5. Glasmacher SA, Leigh PN, Saha RA. Predictors of survival in progressive supranuclear palsy and multiple system atrophy: a systematic review and meta‐analysis. J Neurol Neurosurg Psychiatry 2017;88(5):402–411.
- 6. Guasp M, Molina‐Porcel L, Painous C, Caballol N, Camara A, Perez‐Soriano A, et al. Association of PSP phenotypes with survival: a brain‐bank study. Parkinsonism Relat Disord 2021;84:77–81.
- 7. Payan CAM, Viallet F, Landwehrmeyer BG, Bonnet AM, Borg M, Durif F, et al. Disease severity and progression in progressive supranuclear palsy and multiple system atrophy: validation of the NNIPPS – PARKINSON PLUS SCALE. PLoS One 2011;6(8):e22293.
- 8. Bang J, Lobach IV, Lang AE, Grossman M, Knopman DS, Miller BL, et al. Predicting disease progression in progressive supranuclear palsy in multicenter clinical trials. Parkinsonism Relat Disord 2016;28:41–48.
- 9. Stamelou M, Höglinger G. A review of treatment options for progressive supranuclear palsy. CNS Drugs 2016;30(7):629–636.
- 10. Stamelou M, Schöpe J, Wagenpfeil S, Del Ser T, Bang J, Lobach IY, et al. Power calculations and placebo effect for future clinical trials in progressive supranuclear palsy. Mov Disord 2016;31(5):742–747.
- 11. Höglinger GU, Litvan I, Mendonca N, Wang D, Zheng H, Rendenbach‐Mueller B, et al. Safety and efficacy of tilavonemab in progressive supranuclear palsy: a phase 2, randomised, placebo‐controlled trial. Lancet Neurol 2021;20(3):182–192.
- 12. Dam T, Boxer AL, Golbe LI, Höglinger GU, Morris HR, Litvan I, et al. Safety and efficacy of anti‐tau monoclonal antibody gosuranemab in progressive supranuclear palsy: a phase 2, randomized, placebo‐controlled trial. Nat Med 2021;27(8):1451–1457.
- 13. Tolosa E, Litvan I, Höglinger GU, Burn D, Lees A, Andrés MV, et al. A phase 2 trial of the GSK‐3 inhibitor tideglusib in progressive supranuclear palsy. Mov Disord 2014;29(4):470–478.
- 14. Nuebling G, Hensler M, Paul S, Zwergal A, Crispin A, Lorenzl S. PROSPERA: a randomized, controlled trial evaluating rasagiline in progressive supranuclear palsy. J Neurol 2016;263(8):1565–1574.
- 15. Respondek G, Höglinger GU. DescribePSP and ProPSP: German multicenter networks for standardized prospective collection of clinical data, imaging data, and biomaterials of patients with progressive supranuclear palsy. Front Neurol [Internet] 2021;12:644064. [cited Nov 7, 2023]. Available from: https://www.frontiersin.org/articles/10.3389/fneur.2021.644064
- 16. Piraid [Internet]. Uppsala University, Pharmacometrics Research Group, 2023. [cited Nov 7, 2023]; Available from: https://github.com/UUPharmacometrics/piraid.
- 17. PsN. Home [Internet]. [cited Nov 7, 2023]; Available from: https://uupharmacometrics.github.io/PsN/.
- 18. Jonsson EN, Karlsson MO. Automated covariate model building within NONMEM. Pharm Res 1998;15(9):1463–1468.
- 19. Delattre M, Lavielle M, Poursat MA. A note on BIC in mixed‐effects models. Electron J Statist 2014;8:456–475.
- 20. Lyauk YK, Jonker DM, Lund TM, Hooker AC, Karlsson MO. Item response theory modeling of the International Prostate Symptom Score in patients with lower urinary tract symptoms associated with benign prostatic hyperplasia. AAPS J 2020;22(5):115.
- 21. Bergstrand M, Hooker AC, Wallin JE, Karlsson MO. Prediction‐corrected visual predictive checks for diagnosing nonlinear mixed‐effects models. AAPS J 2011;13(2):143–151.
- 22. Wellhagen GJ, Ueckert S, Kjellsson MC, Karlsson MO. An item response theory‐informed strategy to model total score data from composite scales. AAPS J 2021;23(3):45.
- 23. Ueckert S, Karlsson M, Hooker C. Accelerating Monte Carlo power studies through parametric power estimation. J Pharmacokinet Pharmacodyn 2016;43:223.
- 24. Center for Drug Evaluation and Research. Submitting clinical trial datasets and documentation for clinical outcome assessments using item response theory [Internet]. FDA, 2023. [cited Jul 18, 2024]; Available from: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/submitting-clinical-trial-datasets-and-documentation-clinical-outcome-assessments-using-item.
- 25. Gottipati G, Karlsson MO, Plan EL. Modeling a composite score in Parkinson's disease using item response theory. AAPS J 2017;19(3):837–845.
- 26. The Alzheimer's Disease Neuroimaging Initiative, Ueckert S, Plan EL, Ito K, Karlsson MO, Corrigan B, et al. Improved utilization of ADAS‐Cog assessment data through item response theory based pharmacometric modeling. Pharm Res 2014;31(8):2152–2165.
- 27. Novakovic AM, Krekels EHJ, Munafo A, Ueckert S, Karlsson MO. Application of item response theory to modeling of Expanded Disability Status Scale in multiple sclerosis. AAPS J 2017;19(1):172–179.
- 28. Yousefi E, Gewily M, König F, Höglinger G, Hopfner F, Karlsson MO, et al. Efficiency of multivariate tests in trials in progressive supranuclear palsy [Internet]. arXiv, 2023. [cited Apr 9, 2024]; Available from: http://arxiv.org/abs/2312.08169.