ABSTRACT
Initially proposed for analyzing composite endpoints, the win odds have recently received increasing interest for the analysis of ordinal outcomes. When comparing an ordinal outcome between two groups, the win odds are the odds that a randomly selected subject from the first group has a better outcome than a randomly selected subject from the second group. As such, the win odds are an effect size that is closely related to the Mann–Whitney U test. The win odds can be adjusted for covariates by the probabilistic index model. Here, we aim to extend this model for repeated measurements. We modify the estimation equations of the probabilistic index model to account for within‐subject correlation. Parameter estimation can be conveniently done via some data re‐structuring and the R package geepack. We implement a sandwich‐type estimator to estimate the variance‐covariance matrix. Simulations show that the estimation of the win odds is consistent and the coverage of confidence intervals is close to nominal. We provide an application by reanalyzing a neurological trial for the treatment of Guillain–Barré syndrome (SID‐GBS trial). This extension establishes the win odds as a promising summary measure to compare longitudinal ordinal outcomes. R package lwo is available on GitHub for implementing the proposed method.
Keywords: clinical trials, ordinal longitudinal outcomes, probabilistic index model, win odds
Abbreviations
- GBS: Guillain–Barré syndrome
- PIM: probabilistic index model
- RCT: randomized controlled trial
1. Introduction
Ordinal data are common in many scientific disciplines. For example, in psychometrics, the well‐known Likert scale is often used to measure opinions, attitudes, and motivations. In neurology, ordinal scales are frequently used to assess the health functioning of patients with neurological diseases, e.g., the modified Rankin Scale (mRS) [1] and the Glasgow Outcome Scale Extended (GOS‐E) [2].
If we assign numerical values to the categories of the ordinal scale, we can use statistical methods for numerical data, such as the t‐test and linear regression. Of course, this approach is only appropriate to the extent that the numerical values are a meaningful representation of the categories [3]. Ordinal scales are also often dichotomized so that statistical methods for binary data, such as logistic regression, can be used. Dichotomization is inefficient because it results in a considerable loss of information [4, 5]. It is statistically more efficient and rigorous to use methods that specifically address the discrete, ordered nature of ordinal outcomes. One such method is the proportional odds regression model [6], which is sometimes called a shift analysis. The treatment effect is then quantified as the “common odds ratio” which is interpreted as a general shift to higher (or lower) ordinal categories [7].
Another common choice for the comparison of ordinal outcomes is the rank‐based Mann–Whitney U test [8, 9]. However, it is not so clear what the relevant effect size is. The “probabilistic index” has been suggested as an appropriate choice [10]. Every subject in one group is compared to every subject in the other group to assess who has a better outcome, with ties broken by a coin toss. The probabilistic index is then the probability of a “win” of a subject from the first group over a subject from the second group. The associated odds are referred to as the “win odds” [11]. The straightforward interpretation of these “win statistics” has resulted in their considerable popularity in several fields, including cardiology and neurology [12, 13]. Moreover, a regression framework has been developed to model the effects of covariates on the probabilistic index [14]. This is called the Probabilistic Index Model. It is implemented in the R package pim [15].
Subjects may transition between ordinal categories over time. Studies with a longitudinal design can capture such varying trajectories and characterize changes in the effects of interest [16, 17]. Longitudinal data contain richer information than a single‐time measurement, potentially enhancing the statistical efficiency [18, 19, 20]. Unfortunately, longitudinal information remains underused despite the fact that many studies did have repeated measurements. A review of randomized controlled trials (RCTs) published in top medical journals revealed that while 79% of the RCTs had repeated outcome measurements, only 23% used all outcome data in their primary analyses [21]. Specifically, for ordinal longitudinal data, analyses rarely used both the ordering information and the repeated measurements [22, 23]. Often, either a single‐time‐point analysis was performed on the full ordinal scale, or a longitudinal analysis was done on the dichotomized ordinal scale [23].
Several extensions of the proportional odds model to longitudinal outcomes have been proposed. Within‐subject correlation may be modeled by adding random effects [24] or by using previous outcomes as predictors for the next outcome [22]. Here, our goal is to extend the Probabilistic Index Model to handle repeated measurements.
As a motivating example, we use data from the SID‐GBS trial [25]: A randomized controlled trial that investigated the effect of a second intravenous immunoglobulin dose in patients with severe Guillain–Barré syndrome (GBS). GBS is an acute, immune‐mediated disorder of the peripheral nerves characterized by a monophasic disease course [26]. Patients typically experience a rapid neurological deterioration, followed by a stabilization phase and a slow, often incomplete, recovery. The timing of disease progression varies substantially across individuals, ranging from a few days to several weeks. This heterogeneity leaves room for potential gains through longitudinal analysis.
The remainder of the paper is structured as follows. In Section 2, we provide the statistical background for the win statistics and provide a brief introduction to the Probabilistic Index Model (PIM). In Section 3, we propose our extension of the PIM to the longitudinal setting. We present its operating characteristics in a simulation study in Section 4. In Section 5, we demonstrate the method by re‐analyzing the SID‐GBS trial. We end with a brief discussion. An R package lwo has been developed to implement the proposed method (available at https://github.com/Yongxi‐Long/lwo). All code used to reproduce the simulation study and the SID‐GBS trial analysis is available at https://github.com/Yongxi‐Long/Longitudinal_Win_Odds, and is also included in the Appendix.
2. Statistical Background
The Mann–Whitney U (MWU) test is used to assess whether one distribution tends to yield larger values than the other [8]. Unlike t‐tests, the rank‐based MWU test relies only on the ranks, so it is well suited for the comparison of ordinal data [9]. Suppose we compare two groups of sizes $n_1$ and $n_2$ with outcomes $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$. The MWU statistic $U$ is defined as the number of times that an $X$ value is larger than a $Y$ value. In the presence of ties, which is often the case for ordinal data, the MWU statistic is modified by assigning 1/2 for tied pairs [27].
The MWU statistic may be standardized and compared to the standard normal distribution. Now, let $B$ have the Bernoulli distribution with probability 1/2 (a “coin toss”), independent of the outcomes, and define the event
$$\{X \succ Y\} = \{X > Y\} \cup \{X = Y,\ B = 1\}.$$
The Probabilistic Index (PI) is the probability of this event
$$\mathrm{PI} = P(X \succ Y) = P(X > Y) + \tfrac{1}{2} P(X = Y).$$
It is easy to see that $U/(n_1 n_2)$ is an unbiased estimator of the PI [28]. Therefore, the PI may be viewed as the effect size associated with the MWU test [10, 29]. The null hypothesis of the MWU test is that the two distributions are equal, which immediately implies that PI = 1/2, but the converse does not necessarily hold. So the hypothesis that PI = 1/2 is more general than equality of distributions [11, 28].
The win odds are the odds of the PI, defined as PI/(1‐PI) [11, 30]. Other proposed “win statistics” [31] can also be derived from pairwise comparisons: The win ratio, which is the ratio of win proportions, and the net benefit, which is the difference in win proportions [32, 33]. Both these measures discard the ties. Here, we focus on the PI and win odds to retain the information contained in ties.
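The relation between these win statistics can be illustrated with a small sketch. The function name `win_statistics` and the toy data are hypothetical, and the sketch assumes that larger values are better; it is only meant to show how the four quantities arise from the same pairwise comparisons.

```python
from itertools import product

def win_statistics(group1, group2):
    """Compare every value in group1 with every value in group2 (larger = better)."""
    wins = ties = losses = 0
    for x, y in product(group1, group2):
        if x > y:
            wins += 1
        elif x == y:
            ties += 1
        else:
            losses += 1
    n_pairs = len(group1) * len(group2)
    p_win, p_tie, p_loss = wins / n_pairs, ties / n_pairs, losses / n_pairs
    pi = p_win + 0.5 * p_tie  # probabilistic index: ties count as half a win
    return {
        "pi": pi,
        "win_odds": pi / (1.0 - pi),
        "win_ratio": p_win / p_loss,     # discards ties; undefined without losses
        "net_benefit": p_win - p_loss,   # difference in win proportions
    }

# Toy data: 11 wins, 4 ties and 1 loss among the 16 pairs
stats = win_statistics([3, 2, 2, 1], [2, 1, 1, 0])
print(stats)
```

In this toy example a quarter of the pairs are tied, which is why the win ratio (11) is far larger than the win odds (13/3).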
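When comparing two groups, the PI can be estimated as the proportion of wins plus half of the proportion of ties. However, it is not clear how to account for the effect of covariates. The Probabilistic Index Model (PIM) proposed by Thas et al. [14] allows flexible covariate adjustment by embedding the PI in a regression framework. Now, let $Y$ be the ordinal outcome of interest and $\mathbf{X}$ be a $p$‐dimensional vector of relevant baseline covariates. Consider a sample of $n$ individuals with observations $(Y_i, \mathbf{X}_i)$, $i = 1, \dots, n$. In the simple two‐group comparison, we only need to consider the pairs that are formed by a subject from the first group and a subject from the second group. In the more general set‐up, we need to consider all possible pairs $(i, j)$ with $i < j$ (Figure 1; we may freely choose some ordering). We define the events with the previous Bernoulli random variable $B$
$$\{Y_i \succ Y_j\} = \{Y_i > Y_j\} \cup \{Y_i = Y_j,\ B = 1\},$$
so that
$$P(Y_i \succ Y_j \mid \mathbf{X}_i, \mathbf{X}_j) = P(Y_i > Y_j \mid \mathbf{X}_i, \mathbf{X}_j) + \tfrac{1}{2} P(Y_i = Y_j \mid \mathbf{X}_i, \mathbf{X}_j).$$
We also define the indicator $I_{ij} = \mathbb{1}(Y_i \succ Y_j)$ so that $E(I_{ij} \mid \mathbf{X}_i, \mathbf{X}_j) = P(Y_i \succ Y_j \mid \mathbf{X}_i, \mathbf{X}_j)$.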
FIGURE 1.

Formulation of ten pairs from a hypothetical cohort of five subjects at two visits. The left table presents individual‐level data, and the right table presents pseudo‐outcomes from pairwise comparisons and pseudo‐covariate values as the differences between individual covariate values.
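Following the authors of PIM [14], we make the strong, but practical assumption that the pairwise probabilistic index $P(Y_i \succ Y_j \mid \mathbf{X}_i, \mathbf{X}_j)$, where $Y_i \succ Y_j$ denotes a win of subject $i$ over subject $j$ with ties broken by the coin toss, depends on $(\mathbf{X}_i, \mathbf{X}_j)$ only through the difference $\mathbf{Z}_{ij} = \mathbf{X}_i - \mathbf{X}_j$. That is, we assume $P(Y_i \succ Y_j \mid \mathbf{X}_i, \mathbf{X}_j) = P(Y_i \succ Y_j \mid \mathbf{Z}_{ij})$. To arrive at the probabilistic index model (PIM), we make one further assumption that $P(Y_i \succ Y_j \mid \mathbf{Z}_{ij})$ has the specific form
$$\operatorname{logit}\{P(Y_i \succ Y_j \mid \mathbf{Z}_{ij})\} = \mathbf{Z}_{ij}^\top \boldsymbol{\beta}, \tag{1}$$
where $\boldsymbol{\beta}$ is a $p$‐dimensional vector of regression parameters. By using a logit link, we are modeling the log odds of the probabilistic index, which is the log of win odds. It is a logistic regression model applied to the transformed pair‐level pseudo‐outcomes $I_{ij}$ (the indicator of the event $Y_i \succ Y_j$) and pseudo‐covariates $\mathbf{Z}_{ij}$. Of course, other link functions are also possible [14]. Note that the PIM has no intercept, which is a consequence of the fact that we have only pair‐level comparisons.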
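Thas et al. [14] proposed to estimate $\boldsymbol{\beta}$ by solving the following estimating equations
$$\sum_{i<j} \mathbf{U}_{ij}(\boldsymbol{\beta}) = \sum_{i<j} \frac{\partial \mu_{ij}}{\partial \boldsymbol{\beta}}\, V(\mu_{ij})^{-1} \left(I_{ij} - \mu_{ij}\right) = \mathbf{0}, \tag{2}$$
where $I_{ij}$ is the pseudo‐outcome of pair $(i, j)$, $\mathbf{Z}_{ij}$ its pseudo‐covariate vector, and $\mu_{ij} = E(I_{ij} \mid \mathbf{Z}_{ij}) = \operatorname{expit}(\mathbf{Z}_{ij}^\top \boldsymbol{\beta})$. For simplicity, we use the variance function as if there were no ties [34], i.e., $V(\mu_{ij}) = \mu_{ij}(1 - \mu_{ij})$.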
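As a rough illustration of these estimating equations, the sketch below solves them by Newton–Raphson for the simplest case of a single binary covariate (treatment) and no further adjustment. With a logit link and the binomial variance function, the contribution of a pair reduces to $z_{ij}(I_{ij} - \mu_{ij})$, where $z_{ij}$ is the pseudo‐covariate. The function names and the toy data are hypothetical, ties are scored 1/2 (the expected value of the coin toss), and this is not the implementation in the pim package.

```python
import math
from itertools import combinations

def expit(v):
    return 1.0 / (1.0 + math.exp(-v))

def pair_data(outcomes, treatment):
    """Pseudo-outcome and pseudo-covariate for every pair (i, j) with i < j."""
    pairs = []
    for i, j in combinations(range(len(outcomes)), 2):
        if outcomes[i] > outcomes[j]:
            pseudo = 1.0
        elif outcomes[i] == outcomes[j]:
            pseudo = 0.5  # tie: expected value of the coin toss
        else:
            pseudo = 0.0
        pairs.append((pseudo, treatment[i] - treatment[j]))
    return pairs

def fit_pim(pairs, tol=1e-10, max_iter=50):
    """Newton-Raphson for a scalar coefficient in a no-intercept logit model."""
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0
        info = 0.0
        for pseudo, z in pairs:
            mu = expit(z * beta)
            score += z * (pseudo - mu)        # pair contribution to the score
            info += z * z * mu * (1.0 - mu)   # minus the derivative of the score
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy data: four treated (1) and four control (0) subjects, larger = better
outcomes = [3, 2, 2, 1, 2, 1, 1, 0]
treatment = [1, 1, 1, 1, 0, 0, 0, 0]
beta_hat = fit_pim(pair_data(outcomes, treatment))
win_odds = math.exp(beta_hat)
print(round(win_odds, 4))  # equals the empirical win odds 13/3
```

Pairs of subjects from the same group have a pseudo‐covariate of zero and drop out of the score, so in this unadjusted case the fitted win odds coincide with the empirical two‐group win odds.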
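Even if the individual‐level data are independent, the pair‐level data are dependent. For example, the pseudo‐outcomes $I_{12}$ and $I_{13}$ are dependent because they share subject 1. To account for this dependence, a sandwich estimator is proposed for the variance‐covariance matrix [14]
$$\hat{\boldsymbol{\Sigma}}_{\hat{\boldsymbol{\beta}}} = \left(\sum_{i<j} \frac{\partial \mathbf{U}_{ij}(\hat{\boldsymbol{\beta}})}{\partial \boldsymbol{\beta}^\top}\right)^{-1} \left(\sum_{i<j} \sum_{k<l} \phi_{ij,kl}\, \mathbf{U}_{ij}(\hat{\boldsymbol{\beta}})\, \mathbf{U}_{kl}(\hat{\boldsymbol{\beta}})^\top\right) \left(\sum_{i<j} \frac{\partial \mathbf{U}_{ij}(\hat{\boldsymbol{\beta}})}{\partial \boldsymbol{\beta}^\top}\right)^{-\top}, \tag{3}$$
where $\mathbf{U}_{ij}(\boldsymbol{\beta})$ is the contribution of pair $(i, j)$ to the estimating equations (2) and
$$\phi_{ij,kl} = \begin{cases} 1 & \text{if pairs } (i, j) \text{ and } (k, l) \text{ share at least one subject}, \\ 0 & \text{otherwise}. \end{cases}$$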
The PIM is implemented in the R package pim [15].
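Obtaining win odds from the PIM in Equation (1) is rather straightforward. As an example, consider an RCT. Let $\mathrm{Trt}$ denote the treatment assignment where $\mathrm{Trt} = 1$ is the experimental treatment and $\mathrm{Trt} = 0$ the control condition. Let $\mathrm{Age}$ denote the patients' age, or some other prognostic factor to be adjusted for. The following PIM
$$\operatorname{logit}\{P(Y_i \succ Y_j \mid \mathrm{Trt}_i, \mathrm{Trt}_j, \mathrm{Age}_i, \mathrm{Age}_j)\} = \beta_1 (\mathrm{Trt}_i - \mathrm{Trt}_j) + \beta_2 (\mathrm{Age}_i - \mathrm{Age}_j)$$
gives the win odds of the treatment as $\exp(\beta_1)$, adjusted for age.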
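The step from the model to the win odds can be spelled out for a treated subject $i$ and a control subject $j$ of the same age. The notation here is an assumption made for illustration: $\mathrm{Trt}$ is the treatment indicator, $\beta_1$ and $\beta_2$ are the coefficients of the treatment and age differences, and $Y_i \succ Y_j$ denotes a win of subject $i$ over subject $j$ with ties broken by the coin toss.

```latex
\begin{aligned}
\operatorname{logit} P\left(Y_i \succ Y_j \mid \mathrm{Trt}_i = 1,\ \mathrm{Trt}_j = 0,\ \mathrm{Age}_i = \mathrm{Age}_j\right)
  &= \beta_1 (1 - 0) + \beta_2 \cdot 0 = \beta_1, \\
\text{win odds} = \frac{\mathrm{PI}}{1 - \mathrm{PI}} &= \exp(\beta_1).
\end{aligned}
```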
3. Longitudinal Extension of the PIM
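Suppose now we have a longitudinal study. The ordinal outcome of interest $Y_{it}$ is measured at visits $t = 1, \dots, m$ for individuals $i = 1, \dots, n$. Let $\mathbf{X}_{it}$ be a vector of covariates for the $i$‐th individual at visit $t$. It can include both time‐fixed baseline covariates and time‐dependent covariates, such as biomarker values. We now form pairs within each visit with pseudo‐outcome $I_{ijt} = \mathbb{1}(Y_{it} \succ Y_{jt})$, the indicator that subject $i$ wins over subject $j$ at visit $t$ with ties broken by the coin toss, and pseudo‐covariates $\mathbf{Z}_{ijt} = \mathbf{X}_{it} - \mathbf{X}_{jt}$ (Figure 1). We may not necessarily have all $n(n-1)/2$ pseudo‐pairs at each visit due to unbalanced data. The PIM in Equation (1) can be extended to the longitudinal setting by modeling the mean of the pseudo‐outcome for all visits
$$\operatorname{logit}\{E(I_{ijt} \mid \mathbf{Z}_{ijt})\} = \mathbf{Z}_{ijt}^\top \boldsymbol{\beta}. \tag{4}$$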
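The data restructuring behind this model can be sketched as follows: subjects are paired only within the same visit, and each pair keeps a label that can later serve as the cluster identifier for the repeated pseudo‐outcomes. The function `longitudinal_pairs`, the tuple layout, and the toy records are hypothetical; ties are scored 1/2 and larger outcomes are assumed to be better.

```python
from itertools import combinations

def longitudinal_pairs(records):
    """records: iterable of (subject_id, visit, outcome, treatment) tuples.

    Returns one row per pair of subjects who were both observed at a visit.
    """
    by_visit = {}
    for subject, visit, outcome, treatment in records:
        by_visit.setdefault(visit, []).append((subject, outcome, treatment))

    rows = []
    for visit in sorted(by_visit):
        observed = sorted(by_visit[visit])  # order subjects by identifier
        for (i, y_i, trt_i), (j, y_j, trt_j) in combinations(observed, 2):
            if y_i > y_j:
                pseudo = 1.0
            elif y_i == y_j:
                pseudo = 0.5
            else:
                pseudo = 0.0
            rows.append({
                "pair": (i, j),                   # cluster label of the pair
                "visit": visit,
                "pseudo_outcome": pseudo,
                "pseudo_treatment": trt_i - trt_j,
            })
    return rows

records = [
    (1, 4, 3, 1), (2, 4, 2, 0), (3, 4, 2, 1),
    (1, 8, 4, 1), (2, 8, 4, 0),  # subject 3 missed the week 8 visit
]
rows = longitudinal_pairs(records)
for row in rows:
    print(row)
```

Because subject 3 has no week 8 measurement, only the pair (1, 2) is formed at that visit, which illustrates how unbalanced data reduce the number of pseudo‐pairs.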
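We now have to account for the correlation between the same pair across different visits. Let $\mathbf{I}_{ij} = (I_{ij1}, \dots, I_{ijm})^\top$ collect the pseudo‐outcomes of pair $(i, j)$ over the visits, with mean vector $\boldsymbol{\mu}_{ij}$. We modify the estimating equations in Equation (2) by substituting $\mathbf{U}_{ij}(\boldsymbol{\beta})$ with
$$\mathbf{U}_{ij}(\boldsymbol{\beta}) = \mathbf{D}_{ij}^\top \mathbf{V}_{ij}^{-1} \left(\mathbf{I}_{ij} - \boldsymbol{\mu}_{ij}\right), \tag{5}$$
where $\mathbf{D}_{ij} = \partial \boldsymbol{\mu}_{ij} / \partial \boldsymbol{\beta}^\top$, and $\mathbf{V}_{ij} = \mathbf{A}_{ij}^{1/2} \mathbf{R}(\alpha) \mathbf{A}_{ij}^{1/2}$. $\mathbf{R}(\alpha)$ is a working correlation matrix [35] for repeated outcome measures on the same pair $(i, j)$. $\mathbf{A}_{ij}$ is now an $m \times m$ diagonal matrix with $\mu_{ijt}(1 - \mu_{ijt})$ as the $t$‐th element of the diagonal.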
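To make the role of the working correlation concrete, the sketch below evaluates the score contribution of a single pair observed at two visits, for a scalar coefficient. It assumes the usual generalized estimating equation form, with the working covariance built from the binomial variances and a correlation `r` between the two visits (for two visits, exchangeable and first‐order autoregressive structures coincide). The function name `pair_score` and the input values are hypothetical.

```python
import math

def expit(v):
    return 1.0 / (1.0 + math.exp(-v))

def pair_score(beta, z, pseudo, r):
    """Score contribution of one pair with two visits and a scalar coefficient.

    z: pseudo-covariates at the two visits; pseudo: pseudo-outcomes;
    r: working correlation between the two visits.
    """
    mu = [expit(z_t * beta) for z_t in z]
    a = [m * (1.0 - m) for m in mu]             # diagonal of the variance matrix
    d = [z_t * a_t for z_t, a_t in zip(z, a)]   # derivative of mu w.r.t. beta
    s = math.sqrt(a[0] * a[1])
    det = a[0] * a[1] * (1.0 - r * r)           # determinant of the 2 x 2 V
    v_inv = [[a[1] / det, -r * s / det],
             [-r * s / det, a[0] / det]]
    resid = [p - m for p, m in zip(pseudo, mu)]
    return (d[0] * (v_inv[0][0] * resid[0] + v_inv[0][1] * resid[1])
            + d[1] * (v_inv[1][0] * resid[0] + v_inv[1][1] * resid[1]))

beta, z, pseudo = 0.3, (1, 1), (1.0, 0.5)
score_independence = pair_score(beta, z, pseudo, r=0.0)
score_correlated = pair_score(beta, z, pseudo, r=0.5)
print(score_independence, score_correlated)
```

With `r = 0` the expression reduces to the sum over visits of $z_{ijt}(I_{ijt} - \mu_{ijt})$, that is, to the cross‐sectional score applied visit by visit.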
If the pairs were independent, the covariance matrix would be readily estimated by the regular robust sandwich estimator [36]. But the pairs are sparsely correlated in the sense that some are independent while others are dependent due to sharing the same subject. The standard errors estimated from the regular sandwich estimator will be too small.
The pairs in a longitudinal setting have three different types of correlations. First, there is a within‐subject correlation due to repeated measurements on the same subjects. So the measurements of the pair (1,2) at different visits are correlated (Figure 1). Second, there is a correlation due to sharing the same patient. So pair (1,2) and pair (1,3) at the same visit are correlated. Thirdly, pairs (1, 2) and (1,3) at different visits are also correlated as a result of the two types of correlations.
To address these dependencies, we proceed in two steps. First, in the estimating equation (5), we added a working covariance matrix with a specified working correlation matrix to account for the temporal correlation of repeated measurements in parameter estimation. Second, for variance estimation, the usual “robust sandwich” estimator would suffice if pairs were independent. In that case, the “meat” of the sandwich would simply be the sum of cross‐products of the cluster score contributions, i.e., $\sum_{i<j} \mathbf{U}_{ij}(\hat{\boldsymbol{\beta}})\, \mathbf{U}_{ij}(\hat{\boldsymbol{\beta}})^\top$, where $\mathbf{U}_{ij}$ is the score contribution of pair $(i, j)$.
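However, our setting features a mix of different types of correlation. We modified the “meat” part of the sandwich estimator from Equation (3) by including additional cross‐products of the correlated cluster score functions. This led to the following equation.
$$\hat{\boldsymbol{\Sigma}}_{\hat{\boldsymbol{\beta}}} = \left(\sum_{i<j} \mathbf{D}_{ij}^\top \mathbf{V}_{ij}^{-1} \mathbf{D}_{ij}\right)^{-1} \left(\sum_{i<j} \sum_{k<l} \phi_{ij,kl}\, \mathbf{U}_{ij}(\hat{\boldsymbol{\beta}})\, \mathbf{U}_{kl}(\hat{\boldsymbol{\beta}})^\top\right) \left(\sum_{i<j} \mathbf{D}_{ij}^\top \mathbf{V}_{ij}^{-1} \mathbf{D}_{ij}\right)^{-1}, \tag{6}$$
where $\mathbf{U}_{ij}$, $\mathbf{D}_{ij}$, and $\mathbf{V}_{ij}$ are the cluster score, derivative matrix, and working covariance matrix of pair $(i, j)$ from Equation (5). We do not distinguish the three different types of correlation; instead, we simply use $\phi_{ij,kl}$ to indicate whether pair $(i, j)$ at time $t$ is correlated with pair $(k, l)$ at time $s$. Specifically,
$$\phi_{ij,kl} = \begin{cases} 1 & \text{if pairs } (i, j) \text{ and } (k, l) \text{ share at least one subject, for any visits } t \text{ and } s, \\ 0 & \text{otherwise}. \end{cases}$$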
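The effect of the extra cross‐products can be illustrated with a scalar coefficient. In the sketch below, each pair contributes one cluster score (already summed over its visits), and the indicator equals one whenever two pairs share a subject. The function names, the toy scores, and the value of the “bread” are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def shares_subject(pair_a, pair_b):
    """Indicator phi: 1 if the two pairs have at least one subject in common."""
    return bool(set(pair_a) & set(pair_b))

def sandwich_variance(pairs, scores, bread):
    """Sandwich variance for a scalar coefficient.

    pairs: list of (i, j) labels; scores: cluster scores at the estimate;
    bread: sum over pairs of D' V^{-1} D (a scalar here).
    """
    meat = 0.0
    for pair_a, u_a in zip(pairs, scores):
        for pair_b, u_b in zip(pairs, scores):
            if shares_subject(pair_a, pair_b):  # includes the pair with itself
                meat += u_a * u_b
    return meat / bread ** 2

pairs = [(1, 2), (1, 3), (4, 5), (4, 6)]
scores = [0.5, 0.3, -0.4, -0.4]   # sum to zero, as scores do at the estimate
bread = 2.0

naive = sum(u * u for u in scores) / bread ** 2   # treats pairs as independent
robust = sandwich_variance(pairs, scores, bread)
print(naive, robust)
```

Here pairs (1, 2) and (1, 3) share subject 1 and pairs (4, 5) and (4, 6) share subject 4, so their positively correlated scores add cross‐products to the meat and the resulting variance (0.32) is larger than the naive one (0.165).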
4. Simulation
We set up our simulation study in the context of the SID‐GBS trial [25]. The primary outcome was Guillain–Barré syndrome disability scale (GBS‐DS) at 4 weeks after the start of the standard intravenous immunoglobulin treatment. GBS‐DS is a seven‐category ordinal scale ranging from score 0 to score 6 (Table 1). GBS‐DS was also measured at week 1, 2, 8, 12, and 26. We consider covariate adjustment for age (continuous) and preceding diarrhea (binary, yes/no).
TABLE 1.
Description of the Guillain–Barré syndrome disability scale (GBS‐DS).
| Score | Description |
|---|---|
| 0 | completely normal |
| 1 | mild symptoms or signs, but able to run |
| 2 | can walk independently 10 meters without help, but cannot run |
| 3 | can walk 10 m with help |
| 4 | bedridden or requiring wheelchair |
| 5 | need assisted ventilation |
| 6 | death |
Our estimand of interest is the log win odds at an arbitrary time point after baseline, adjusted for age and preceding diarrhea.
We assessed the performance of the estimator from the longitudinal probabilistic index model (4) using the following metrics: (i) bias; (ii) the difference between the model‐based variance from the sandwich estimator and the Monte Carlo variance; (iii) coverage of the confidence intervals. For each scenario, we ran 10 000 iterations to ensure that the Monte Carlo standard error of the coverage probability was approximately 0.002.
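The Monte Carlo standard error follows from the binomial variance of an estimated proportion. As a quick check, assuming a true coverage probability of $p = 0.95$ and $n_{\text{sim}} = 10\,000$ independent iterations (the symbols $p$ and $n_{\text{sim}}$ are introduced here for illustration):

```latex
\mathrm{SE}_{\text{MC}} = \sqrt{\frac{p(1 - p)}{n_{\text{sim}}}}
  = \sqrt{\frac{0.95 \times 0.05}{10\,000}} \approx 0.0022 .
```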
4.1. Data Generating Process
We used GBS‐DS as our outcome and considered two visits after baseline (week 0): week 4 and week 8. We collapsed the seven‐category GBS‐DS into five categories (combining score 0 with score 1 and score 6 with score 5) because score 0 and score 6 were not observed for the primary outcome of the SID‐GBS trial.
The ordinal outcome GBS‐DS at each time point was generated from a proportional odds model. Its linear predictor contained the time effects and the treatment effects at weeks 4 and 8, together with the prognostic effects of age and preceding diarrhea (parameter values are listed in Table 2). To obtain correlated GBS‐DS measurements on the same patient, we used the genOrdCat function from the simstudy package [37]. It generates correlated values from the logistic distribution using a standard normal copula‐like approach with a supplied correlation matrix.
We took our covariate distribution from the SID‐GBS trial. Age was generated from a normal distribution with a mean of 60 and a standard deviation of 10. The status of preceding diarrhea was generated from a Bernoulli distribution with a mean of 0.4.
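The following sketch mimics this data‐generating process with the Python standard library only. It is not the simstudy implementation: the cutpoints are hypothetical placeholders rather than the intercepts used in the study, the sign convention (a larger latent value gives a higher category) is an assumption, and the effect values are those of the increasing‐effect scenario with no effects at baseline. Correlated logistic errors are obtained by passing first‐order autoregressive normal draws through the normal distribution function and the logistic quantile function.

```python
import math
import random

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ar1_normals(n_visits, rho, rng):
    """Standard normal draws whose correlation decays as rho ** lag."""
    draws = [rng.gauss(0.0, 1.0)]
    for _ in range(n_visits - 1):
        innovation = rng.gauss(0.0, 1.0)
        draws.append(rho * draws[-1] + math.sqrt(1.0 - rho ** 2) * innovation)
    return draws

def simulate_patient(treated, rng, rho=0.3,
                     cutpoints=(-1.5, -0.5, 0.5, 1.5),  # hypothetical values
                     time_effects=(0.0, 0.6, 1.2),      # weeks 0, 4, 8
                     treatment_effects=(0.0, 0.4, 0.8),
                     age_effect=-0.005, diarrhea_effect=0.23):
    age = rng.gauss(60.0, 10.0)
    diarrhea = 1 if rng.random() < 0.4 else 0
    outcomes = []
    for t, z in enumerate(ar1_normals(len(time_effects), rho, rng)):
        u = min(max(normal_cdf(z), 1e-12), 1.0 - 1e-12)
        noise = math.log(u / (1.0 - u))  # logistic error through the copula
        latent = (time_effects[t] + treatment_effects[t] * treated
                  + age_effect * age + diarrhea_effect * diarrhea + noise)
        # category 1..5: one plus the number of cutpoints below the latent value
        outcomes.append(1 + sum(latent > c for c in cutpoints))
    return {"treated": treated, "age": age, "diarrhea": diarrhea, "y": outcomes}

rng = random.Random(2024)
cohort = [simulate_patient(i % 2, rng) for i in range(100)]
print(cohort[0])
```

Thresholding a latent variable with logistic errors is one standard way to obtain outcomes that follow a proportional odds model, which is why a common shift in the linear predictor plays the role of a log odds ratio here.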
4.2. Simulated Scenarios
We considered four different scenarios (Table 2). The first scenario is the null scenario of no treatment effect throughout follow‐up. The second scenario mimics the neutral SID‐GBS trial. The third is a trial with increasing treatment effect over time, and the fourth is a trial with constant treatment effect over time. The within‐subject correlation matrix is autoregressive of order one, AR(1). We varied the autocorrelation coefficient (0.3 or 0.6) to achieve different strengths of within‐subject correlation. For each scenario, we assessed the influence of varying sample sizes (50, 100, and 200). The intercepts of the proportional odds model were set to match the marginal proportions of GBS‐DS categories observed in the SID‐GBS trial.
TABLE 2.
Design parameter values for simulation scenarios.
| Scenario | Description | Sample size | Autocorrelation coefficient | Time effect (week 4) | Time effect (week 8) | Treatment effect (week 4) | Treatment effect (week 8) | Age effect | Preceding diarrhea effect | Win odds (week 4) | Win odds (week 8) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Null | 50, 100, 200 | 0.3, 0.6 | 0.6 | 1.2 | 0 | 0 | −0.005 | 0.23 | 1 | 1 |
| 2 | SID‐GBS trial mimic | 50, 100, 200 | 0.3, 0.6 | 0.6 | 1.2 | −0.08 | −0.16 | −0.005 | 0.23 | 0.95 | 0.91 |
| 3 | Increasing treatment effect | 50, 100, 200 | 0.3, 0.6 | 0.6 | 1.2 | 0.4 | 0.8 | −0.005 | 0.23 | 1.27 | 1.65 |
| 4 | Constant treatment effect | 50, 100, 200 | 0.3, 0.6 | 0 | 0 | 1 | 1 | −0.005 | 0.23 | 1.80 | 1.80 |
The bias and the coverage probabilities, along with the Monte Carlo 95% confidence interval for each simulation scenario, are shown in Figure 2. The coverage probabilities are not systematically influenced by the magnitude of the win odds or the strength of the within‐subject correlation. The sandwich estimator tends to slightly underestimate the variance when the sample size is relatively small (50 subjects). The coverage probabilities converge to the target level as the sample size increases.
FIGURE 2.

Simulation results. (a) Coverage probability of the confidence intervals for the longitudinal win odds at weeks 0, 4, and 8 for the scenarios presented in Table 2. Each scenario has 10 000 Monte Carlo simulated data sets, resulting in a Monte Carlo standard error of about 0.002 for the simulated coverage probability of 0.95. (b) Boxplots of the difference between the estimates of the log win odds and the corresponding estimands at each week for the scenarios presented in Table 2.
5. An Application to the SID‐GBS Trial
As an example, we performed a re‐analysis of the SID‐GBS trial [25] to illustrate our proposed method. In the SID‐GBS trial, the primary analysis of GBS‐DS at week 4 post‐randomization reported a common odds ratio of 1.4 (95% CI: 0.6–3.3) from the proportional odds model, adjusted for pre‐randomization covariates, including age, preceding diarrhea, and baseline MRC sum score. We used data from all six visits (week 1, 2, 4, 8, 12, and 26) after baseline and modeled the trajectory of the treatment effect during follow‐up.
We visualized individual trajectories of GBS‐DS status in Figure 3. We can observe a general trend towards better (lower) GBS‐DS scores after an initial deterioration. This agrees with the natural history of GBS that symptoms peak within 4 weeks, followed by a recovery period for most patients [26]. The white gaps in trajectories indicate that some patients had missed outcome measurements at several visits (Figure 3a). The SID‐GBS trial aimed to evaluate the effect of a second dose of intravenous immunoglobulin in GBS patients with poor prognosis, so we can see that about 80% of patients were in relatively severe status (GBS‐DS categories 4 and 5) at baseline (Figure 3b). The proportion of patients being able to walk (GBS‐DS categories 0–3) at week 26 is almost 80%. The change of the outcome distribution over time is similar between the control and the intervention group.
FIGURE 3.

(a) Line chart of GBS‐DS status over the follow‐up in the SID‐GBS trial, each line represents one patient trajectory. (b) Stacked bar chart of observed proportions of GBS‐DS scores over the follow‐up in the SID‐GBS trial.
First, we fitted multiple single‐time PIMs to estimate the win odds separately for each visit. Then we applied the proposed longitudinal PIM (4) to simultaneously model all visits. We estimated the average win odds over time, as well as the time‐varying win odds where the time (as a categorical variable) interacted with the treatment. Figure 4a presents the three types of win odds, both unadjusted and adjusted for age, preceding diarrhea, and baseline GBS‐DS score. The estimates from different approaches are generally similar and indicate a neutral overall effect. The average win odds over time are close to one. The time‐varying win odds peak between week 4 and week 12, then start to wear off (Figure 4a).
FIGURE 4.

Estimated win odds with point‐wise 95% confidence intervals from different modeling approaches. (a) Cross‐sectional win odds; Longitudinal average win odds; Longitudinal time‐varying win odds. For each type, the unadjusted estimate is followed by the estimate adjusted for age, preceding diarrhea, and baseline GBS‐DS. (b) Spline with a knot at week 2 for the trajectory of win odds, adjusted for age, preceding diarrhea, and baseline GBS‐DS.
Compared to the single‐time PIM, the longitudinal PIM provides a less noisy covariate adjustment, resulting in shorter confidence intervals. This is achieved by estimating a single set of baseline covariate effects across all visits. Additionally, the PIM quantifies the covariate effects in win odds as well, which should be interpreted in the context of pairwise comparisons. For instance, the estimate for the coefficient of preceding diarrhea is 0.04, which means that for two randomly selected patients with the same treatment assignment, age and baseline GBS‐DS score, evaluated at the same time point, the odds that the patient with preceding diarrhea has a better (lower) GBS‐DS score is estimated to be (95% CI: 0.66–1.39), compared to the patient without preceding diarrhea.
We can also model the trend of the treatment effect over time in a continuous way. For example, we modeled the time‐by‐treatment interaction by natural cubic splines, with a knot at week 2 based on previous exploratory visualizations (Figure 4b). Throughout the follow‐up period, no significant treatment benefit of a second immunoglobulin dose over placebo was observed in patients with severe GBS.
6. Discussion
In this paper, we propose an extension of the probabilistic index model to analyze ordinal longitudinal outcomes. The treatment effect is quantified as the win odds, which can be interpreted as the odds of a randomly selected patient from the treatment group having a better outcome than a randomly selected control, with ties split evenly. The straightforward interpretation of the win odds has been argued to be more communicable than the common odds ratio from the proportional odds model [38, 39], which is a weighted average of binary odds ratios across all possible dichotomizations of the ordinal scale. However, note that the win odds are not the same as the causal effect, which is the odds that a randomly selected patient will have a better outcome under treatment than under control. This is not identifiable from randomized controlled trials [40]. By formulating the win odds within a regression framework, the longitudinal PIM offers greater flexibility in adjusting for covariates and handling clustered or repeated measurements, compared to previously proposed stratified win odds approaches [41] and Mann–Whitney‐type tests for clustered data [42].
In monophasic disorders such as the Guillain–Barré syndrome and traumatic brain injury, the timing of outcome assessment is a frequent topic of debate, mainly due to variability in disease progression and recovery trajectories among individuals [43, 44]. A single‐time point analysis can be either too early—before meaningful recovery has occurred, or too late—the treatment effect has waned, to demonstrate a difference. The proposed longitudinal PIM addresses this challenge by providing a more comprehensive assessment of the treatment benefit in time‐sensitive settings.
The PIM is essentially a logistic regression model estimating the conditional mean of the pseudo‐outcomes. Therefore, estimation can be done conveniently via existing software. We use the R package geepack with extra scripting to implement the sandwich‐type estimator for estimating the variance‐covariance matrix separately. We do not distinguish between the three types of correlations but instead assess whether pairs are correlated. It has been shown that subjects do not necessarily need to share the same correlation structure for the sandwich estimator to be consistent [36]. Consequently, the sandwich estimator remains robust even when the correlation structure of the pairs is misspecified. Slight underestimation of the variance is observed under a sample size of 50 due to the small sample bias of the sandwich estimator. The bootstrap can reduce the bias [45], but the coverage is already acceptable in our opinion. We tried Mancl and DeRouen's bias‐corrected robust variance estimator [46], but it did not meaningfully improve the coverage. Application of the proposed method to very small sample sizes is not recommended, due to bias [47] and potential separation issues [48] in logistic regression. Note that for a sample size of $n$ subjects, a total of $n(n-1)/2$ comparisons are made at each visit. The computation time can become a burden with very large sample sizes [15].
The extension of the probabilistic index model is also useful for longitudinal continuous outcomes with outliers or skewed distributions. Because the probabilistic index only exploits ordering information, it is not affected by order‐preserving transformations of the outcomes. However, this robustness may come with the cost in terms of statistical efficiency, as the quantitative information contained in continuous outcomes is not used [28, 49]. Win odds have also received attention for analyzing composite time‐to‐event outcomes, in fields such as cardiology [13]. In that case, one should caution against its dependency on follow‐up time [31]. For instance, win odds may fail to show the long‐term benefit of a more important but less frequent category if the follow‐up time is not sufficient. Moreover, informative censoring can cause bias in win odds [50].
Via this work, we hope to add an additional tool to the analysis of longitudinal ordinal outcomes. More work is needed to compare the performance of the PIM extension to other longitudinal models for ordinal data, such as the proportional odds random effect model [51] and the Markov model [22]. Future research could explore methods for handling unbalanced longitudinal designs with irregular visit schedules, where aligning observations to form pseudo‐pairs at the same time can be challenging. Overall, we consider the extension of win odds as a promising summary measure to compare longitudinal ordinal outcomes.
Funding
This work was supported by Annexon Biosciences.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Appendix S1: sim70536‐sup‐0001‐AppendixS1.pdf.
Appendix S2: sim70536‐sup‐0002‐AppendixS2.pdf.
Data Availability Statement
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
References
- 1. Rankin J., “Cerebral Vascular Accidents in Patients Over the Age of 60: II. Prognosis,” Scottish Medical Journal 2, no. 5 (1957): 200–215. [DOI] [PubMed] [Google Scholar]
- 2. McMillan T., Wilson L., Ponsford J., Levin H., Teasdale G., and Bond M., “The Glasgow Outcome Scale—40 Years of Application and Refinement,” Nature Reviews Neurology 12, no. 8 (2016): 477–485. [DOI] [PubMed] [Google Scholar]
- 3. Liddell T. M. and Kruschke J. K., “Analyzing Ordinal Data With Metric Models: What Could Possibly Go Wrong?,” Journal of Experimental Social Psychology 79 (2018): 328–348. [Google Scholar]
- 4. Roozenbeek B., Lingsma H. F., Perel P., et al., “The Added Value of Ordinal Analysis in Clinical Trials: An Example in Traumatic Brain Injury,” Critical Care 15 (2011): 1–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Taylor A. B., West S. G., and Aiken L. S., “Loss of Power in Logistic, Ordinal Logistic, and Probit Regression When an Outcome Variable Is Coarsely Categorized,” Educational and Psychological Measurement 66, no. 2 (2006): 228–239. [Google Scholar]
- 6. McCullagh P., “Regression Models for Ordinal Data,” Journal of the Royal Statistical Society. Series B, Statistical Methodology 42, no. 2 (1980): 109–127. [Google Scholar]
- 7. Long Y., Wiegers E. J., Jacobs B. C., Steyerberg E. W., and Zwet v E W., “Role of the Proportional Odds Assumption for the Analysis of Ordinal Outcomes in Neurologic Trials,” Neurology 105, no. 8 (2025): e214146. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Mann H. B. and Whitney D. R., “On a Test of Whether One of Two Random Variables Is Stochastically Larger Than the Other,” Annals of Mathematical Statistics 18 (1947): 50–60. [Google Scholar]
- 9. McKnight P. E. and Najab J., “Mann‐Whitney U Test,” in The Corsini Encyclopedia of Psychology (John Wiley & Sons, Ltd., 2010), 1. [Google Scholar]
- 10. Acion L., Peterson J. J., Temple S., and Arndt S., “Probabilistic Index: An Intuitive Non‐Parametric Approach to Measuring the Size of Treatment Effects,” Statistics in Medicine 25, no. 4 (2006): 591–602. [DOI] [PubMed] [Google Scholar]
- 11. Brunner E., Vandemeulebroecke M., and Mütze T., “Win Odds: An Adaptation of the Win Ratio to Include Ties,” Statistics in Medicine 40, no. 14 (2021): 3367–3384. [DOI] [PubMed] [Google Scholar]
- 12. Churilov L., Johns H., and Turc G., ““Tournament Methods” for the Ordinal Analysis of Modified Rankin Scale: The Past, the Present, and the Future,” Stroke 53, no. 10 (2022): 3032–3034, 10.1161/STROKEAHA.122.039614. [DOI] [PubMed] [Google Scholar]
- 13. Ferreira J. P., Jhund P. S., Duarte K., et al., “Use of the Win Ratio in Cardiovascular Trials,” Heart Failure 8, no. 6 (2020): 441–450. [DOI] [PubMed] [Google Scholar]
- 14. Thas O., Neve J. D., Clement L., and Ottoy J. P., “Probabilistic Index Models,” Journal of the Royal Statistical Society, Series B: Statistical Methodology 74, no. 4 (2012): 623–671. [Google Scholar]
- 15. Meys J., De Neve J., Sabbe N., and Guimaraes de Castro Amorim G., “pim: Fit Probabilistic Index Models,” R Package Version 2.0.2, (2020).
- 16. Albert P. S., “Longitudinal Data Analysis (Repeated Measures) in Clinical Trials,” Statistics in Medicine 18, no. 13 (1999): 1707–1732. [DOI] [PubMed] [Google Scholar]
- 17. Fitzmaurice G. M. and Ravichandran C., “A Primer in Longitudinal Data Analysis,” Circulation 118, no. 19 (2008): 2005–2010. [DOI] [PubMed] [Google Scholar]
- 18. Anota A., Barbieri A., Savina M., et al., “Comparison of Three Longitudinal Analysis Models for the Health‐Related Quality of Life in Oncology: A Simulation Study,” Health and Quality of Life Outcomes 12 (2014): 1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Ashbeck E. L. and Bell M. L., “Single Time Point Comparisons in Longitudinal Randomized Controlled Trials: Power and Bias in the Presence of Missing Data,” BMC Medical Research Methodology 16 (2016): 1–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Schober P. and Vetter T. R., “Repeated Measures Designs and Analysis of Longitudinal Data: If at First You Do Not Succeed—Try, Try Again,” Anesthesia & Analgesia 127, no. 2 (2018): 569–575. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Bell M. L., Fiero M., Horton N. J., and Hsu C. H., “Handling Missing Data in RCTs; a Review of the Top Medical Journals,” BMC Medical Research Methodology 14 (2014): 1–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Rohde M. D., French B., Stewart T. G., and Harrell F. E., Jr., “Bayesian Transition Models for Ordinal Longitudinal Outcomes,” Statistics in Medicine 43 (2024): 3539–3561.
- 23. Long Y., Ruiter D. S. C., Luijten L. W., et al., “Statistical Practice of Ordinal Outcome Analysis in Neurologic Trials: A Literature Review,” Neurology 104, no. 4 (2025): e210229.
- 24. Agresti A. and Lang J. B., “A Proportional Odds Model With Subject‐Specific Effects for Repeated Ordered Categorical Responses,” Biometrika 80, no. 3 (1993): 527–534.
- 25. Walgaard C., Jacobs B. C., Lingsma H. F., et al., “Second Intravenous Immunoglobulin Dose in Patients With Guillain‐Barré Syndrome With Poor Prognosis (SID‐GBS): A Double‐Blind, Randomised, Placebo‐Controlled Trial,” Lancet Neurology 20, no. 4 (2021): 275–283.
- 26. Willison H. J., Jacobs B. C., and van Doorn P. A., “Guillain‐Barré Syndrome,” Lancet 388, no. 10045 (2016): 717–727.
- 27. Hemelrijk J., “Note on Wilcoxon's Two‐Sample Test When Ties Are Present,” Annals of Mathematical Statistics 23, no. 1 (1952): 133–135.
- 28. De Schryver M. and De Neve J., “A Tutorial on Probabilistic Index Models: Regression Models for the Effect Size P(Y1 < Y2),” Psychological Methods 24, no. 4 (2019): 403–418.
- 29. Zhou W., “Statistical Inference for P(X < Y),” Statistics in Medicine 27, no. 2 (2008): 257–279.
- 30. Song J., Verbeeck J., Huang B., et al., “The Win Odds: Statistical Inference and Regression,” Journal of Biopharmaceutical Statistics 33, no. 2 (2023): 140–150.
- 31. Dong G., Huang B., Wang D., Verbeeck J., Wang J., and Hoaglin D. C., “Adjusting Win Statistics for Dependent Censoring,” Pharmaceutical Statistics 20, no. 3 (2021): 440–450.
- 32. Pocock S. J., Ariti C. A., Collier T. J., and Wang D., “The Win Ratio: A New Approach to the Analysis of Composite Endpoints in Clinical Trials Based on Clinical Priorities,” European Heart Journal 33, no. 2 (2012): 176–182.
- 33. Buyse M., “Generalized Pairwise Comparisons of Prioritized Outcomes in the Two‐Sample Problem,” Statistics in Medicine 29, no. 30 (2010): 3245–3257.
- 34. De Neve J., “Probabilistic Index Models,” (PhD Thesis), Ghent University, (2013).
- 35. Zeger S. L., Liang K. Y., and Albert P. S., “Models for Longitudinal Data: A Generalized Estimating Equation Approach,” Biometrics 44 (1988): 1049–1060.
- 36. Zeger S. L. and Liang K. Y., “Longitudinal Data Analysis for Discrete and Continuous Outcomes,” Biometrics 42, no. 1 (1986): 121–130.
- 37. Goldfeld K. and Wujciak‐Jens J., “Simstudy: Illuminating Research Methods Through Data Generation,” Journal of Open Source Software 5, no. 54 (2020): 2763, 10.21105/joss.02763.
- 38. Rahlfs V. W., Zimmermann H., and Lees K. R., “Effect Size Measures and Their Relationships in Stroke Studies,” Stroke 45, no. 2 (2014): 627–633.
- 39. Howard G., Waller J. L., Voeks J. H., et al., “A Simple, Assumption‐Free, and Clinically Interpretable Approach for Analysis of Modified Rankin Outcomes,” Stroke 43, no. 3 (2012): 664–669.
- 40. Fay M. P., Brittain E. H., Shih J. H., Follmann D. A., and Gabriel E. E., “Causal Estimands and Confidence Intervals Associated With Wilcoxon‐Mann‐Whitney Tests in Randomized Experiments,” Statistics in Medicine 37, no. 20 (2018): 2923–2937.
- 41. Dong G., Hoaglin D. C., Huang B., et al., “The Stratified Win Statistics (Win Ratio, Win Odds, and Net Benefit),” Pharmaceutical Statistics 22, no. 4 (2023): 748–756.
- 42. Rosner B. and Grove D., “Use of the Mann–Whitney U‐Test for Clustered Data,” Statistics in Medicine 18, no. 11 (1999): 1387–1400.
- 43. Leonhard S. E., Mandarakas M. R., Gondim F. A., et al., “Diagnosis and Management of Guillain–Barré Syndrome in Ten Steps,” Nature Reviews Neurology 15, no. 11 (2019): 671–683.
- 44. Christensen B. K., Colella B., Inness E., et al., “Recovery of Cognitive Function After Traumatic Brain Injury: A Multilevel Modeling Analysis of Canadian Outcomes,” Archives of Physical Medicine and Rehabilitation 89, no. 12 (2008): S3–S15.
- 45. Amorim G., Thas O., Vermeulen K., Vansteelandt S., and De Neve J., “Small Sample Inference for Probabilistic Index Models,” Computational Statistics & Data Analysis 121 (2018): 137–148.
- 46. Mancl L. A. and DeRouen T. A., “A Covariance Estimator for GEE With Improved Small‐Sample Properties,” Biometrics 57, no. 1 (2001): 126–134.
- 47. Hauck W. W., Jr. and Donner A., “Wald's Test as Applied to Hypotheses in Logit Analysis,” Journal of the American Statistical Association 72, no. 360a (1977): 851–853.
- 48. Albert A. and Anderson J. A., “On the Existence of Maximum Likelihood Estimates in Logistic Regression Models,” Biometrika 71, no. 1 (1984): 1–10.
- 49. Liu Q., Shepherd B. E., Li C., and Harrell F. E., Jr., “Modeling Continuous Response Variables Using Ordinal Regression,” Statistics in Medicine 36, no. 27 (2017): 4316–4335.
- 50. Finkelstein D. M. and Schoenfeld D. A., “Graphing the Win Ratio and Its Components Over Time,” Statistics in Medicine 38, no. 1 (2019): 53–61.
- 51. Hedeker D. and Gibbons R. D., “A Random‐Effects Ordinal Regression Model for Multilevel Analysis,” Biometrics 50 (1994): 933–944.
Supplementary Materials
Appendix S1: sim70536‐sup‐0001‐AppendixS1.pdf.
Appendix S2: sim70536‐sup‐0002‐AppendixS2.pdf.
Data Availability Statement
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
