Abstract
Background and Objectives
Parkinson disease (PD) and progressive supranuclear palsy (PSP) are often difficult to differentiate in the clinic. The MR parkinsonism index (MRPI) has been recommended to assist in making this distinction. We aimed to assess the usefulness of this tool in our real-world practice of movement disorders.
Methods
We prospectively obtained MRI scans on consecutive patients with movement disorders with a clinical indication for imaging and obtained measures of MRI regions of interest (ROIs) from our neuroradiologists. The authors reviewed all MRI scans and corrected any errors in the original ROI drawings for this analysis. We retrospectively assigned diagnoses using established consensus criteria from progress notes stored in our electronic medical record. We analyzed the data using multinomial logistic regression models and receiver operating characteristic (ROC) curve analysis to determine the predictive accuracy of the MRI ratios.
Results
MRI measures and consensus diagnoses were available on 130 patients with PD, 54 with PSP, and 77 diagnosed as other. The out-of-sample prediction error rate of our 5 regression models ranged from 45% to 59%. The average sensitivity and specificity of the 5 models in the testing sample were 53% and 80%, respectively. The positive predictive value of an MRPI ≥13.55 (the published cutoff) in our patients was 79%.
Discussion
These results indicate that MRI measures of brain structures were not effective at predicting diagnosis in individual patients. We conclude that the search for a biomarker that can differentiate PSP from PD must continue.
Progressive supranuclear palsy (PSP) is a neurodegenerative disease characterized by vertical ophthalmoplegia, axial rigidity, postural instability, and cognitive dysfunction.1 This disorder is not usually difficult to distinguish from Parkinson disease (PD) late in the course because PD is not associated with downgaze paresis and slowed vertical saccades, which are the hallmark brainstem signs of PSP. However, early in the clinical course, PSP can manifest the classic PD symptoms of asymmetric onset and resting tremor, while still being moderately levodopa responsive.2 When evaluating patients with parkinsonism in the clinic, it is essential to make an accurate diagnosis as soon as possible, even in early disease, to enable accurate prognostic counseling of the patient and for potential enrollment in clinical trials. Early differentiation of PSP from PD and other forms of parkinsonism remains a challenge and is an unmet need.
Early efforts to differentiate PSP from PD used electroencephalography3 and diffusion-weighted MRI4 but were developed based on very small cohorts of patients and were not widely adopted. In 2008, an article was published describing a means for discriminating PSP from PD using measurements on an MRI scan.5 This method involved measuring the area of the pons and midbrain and the width of the superior and middle cerebellar peduncles (SCP and MCP) on a standard MRI. These values were used to calculate the MR parkinsonism index (MRPI), which was defined as follows:
$$\mathrm{MRPI} = \frac{\text{pons}}{\text{midbrain}} \times \frac{\text{MCP}}{\text{SCP}}$$
where pons and midbrain represent the area of these structures in the midline and MCP and SCP represent the width of these structures (averaging the left and right sides). Using receiver operating characteristic (ROC) analysis, the authors developed optimal cutoff values, which, in their sample of 33 patients with PSP and 108 patients with PD, yielded a high sensitivity and specificity, and in the case of the MRPI using a cutoff of 13.55, produced a positive predictive value of 100% for identifying PSP. In 2018, the same group published a refinement of the original MRPI, termed MRPI 2.0, which required additional measurements of the width of the third ventricle and frontal horns of the lateral ventricles.6 Their goal was to use additional MRI measures to enhance their algorithm's ability to distinguish PSP with predominant parkinsonism (oculomotor dysfunction without falls or balance impairment, PSP-P) from PD. When comparing patients with PSP-P (n = 36) with those with PD (n = 56), MRPI 2.0 exhibited a higher sensitivity but lower specificity compared with the original MRPI, resulting in a lower positive predictive value for MRPI 2.0. By contrast, MRPI 2.0 and the original MRPI performed equally well at differentiating PSP Richardson syndrome (oculomotor dysfunction + early falls or balance impairment) from PD, with a positive predictive value of 100%.
Following the publication of the original MRPI, a number of studies have demonstrated its usefulness in differentiating PSP from PD2,7-14 and vascular parkinsonism.9,15 Accordingly, beginning in 2016, our neuroradiologists developed a PSP MRI protocol and began routinely reporting the MRI measures required for calculating the MRPI in our patients with movement disorders. In this report, we show the real-world performance of these measures in our clinical setting.
Methods
Standard Protocol Approvals, Registrations, and Patient Consents
The study protocol and Health Insurance Portability and Accountability Act waiver were reviewed by the institutional review board and determined to meet the criteria for exempt research.
Participants and MRI Structure Measurement
During the study period from April 2016 through December 2020, consecutive patients seen in our movement disorders center with a clinical indication for an MRI scan received the PSP protocol, and the measures required for calculating the MRPI were documented in the report. Imaging was performed on 1 of 2 scanners (a 3T Philips Ingenia and a 3T Siemens Prisma). Details of imaging sequences are presented in eAppendix 1, links.lww.com/CPJ/A414. Figure 1, A–C, shows sample images and drawings (enhanced by an artist) of regions of interest (ROIs) used for this purpose. Patients receiving the PSP protocol during this period were being evaluated for PD, PSP, multiple system atrophy, essential tremor, various forms of dementia, and other varieties of atypical parkinsonism. Clinical indications for MRI scans were to exclude mimicking conditions and evaluate complaints such as cognitive impairment.
Figure 1. Sagittal T1-Weighted Volumetric Spoiled Gradient-Echo MRIs.
(A–C) show measurements (enhanced by an artist) performed by the research team strictly adhering to the method described by Quattrone et al. (D–F) are images from the same patient with drawings made by the original interpreting neuroradiologists demonstrating examples of common deviations from the correct drawings. In (D), the separation between the midbrain and pons is set in a lower-than-expected location. The lines delineating the limits of the pons are not straight parallel lines as shown correctly in (A). As a result, the midbrain is overestimated, and the pons is underestimated by these measurements. Comparing (B) and (E), (E) designates the superior cerebellar peduncle at the thickest part of the peduncle rather than the middle as shown in (B). In (F), the inferior limit of the middle cerebellar peduncle is set in the ventral pons, and this slice is more lateral than it should be as shown correctly in (C). Double-sided arrows indicate the width of the structure shown.
Subsequently, our electronic medical record (EMR) system was queried to identify those patients receiving MRI scans ordered by the movement disorders clinicians during the study period, and these reports were inspected for the presence of documented measures needed to calculate the MRPI. Patients with complete measures were assigned a diagnosis by the data abstractor based on clinical notes as described below. A spreadsheet was assembled containing clinical and demographic information on patients along with the MRI measures of interest originally documented by the interpreting neuroradiologist.
A sample of the scans with measures recorded at the time of original interpretation was reviewed by 2 of the neuroradiologist authors (F.S.F. and B.R.S.) who discovered significant inaccuracies in the original drawings rendering the measures initially documented by the interpreting neuroradiologist potentially unreliable (see Figure 1, D–F, for examples). Accordingly, all MRI scans were reinspected by the authors (neuroradiologists, a neurology resident, and a research intern), and where inaccuracies in drawings were found, the ROIs were redrawn and the relevant measures recalculated using the published methodology.5 For the purposes of this report, only recalculated measures based on redrawings were used, unless the original drawings and calculations were confirmed to be accurate on review by one of the authors.
Data Abstraction
Clinical notes in the EMR were retrospectively reviewed to extract clinical data elements necessary for diagnosis. Patients were classified as having PSP if they met the criteria for probable, possible, or suggestive PSP according to the Movement Disorder Society (MDS) criteria.1 Patients were classified as PD if they met the MDS clinical diagnostic criteria for established or probable PD.16 These classifications of patients with PD and PSP are referred to as the inclusive cohort (as distinguished from the restrictive cohort, described below). Patients who did not fit the classification as PD or PSP were designated other. This included patients with dementia with Lewy bodies, essential tremor, neuroleptic-induced parkinsonism, and other forms of atypical parkinsonism (see eTable 1, links.lww.com/CPJ/A415, for a complete list of diagnoses included in the Other category). Of note, the clinical data abstractor was blinded to the MRI measurements, and accordingly, these parameters were not used to assist with clinical classification.
The MRI measurements collected included the pons area (mm²), midbrain area (mm²), right MCP width (mm), left MCP width (mm), right SCP width (mm), and left SCP width (mm). The average of the right and left MCP width and that of the right and left SCP width were used in the formula to calculate the MRPI.
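As an illustration only, the calculation from the six recorded measurements can be sketched in Python as follows. The class and field names are hypothetical, the numeric values are invented rather than patient data, and the formula assumed is the product of the pons/midbrain area ratio and the averaged MCP/SCP width ratio.

```python
from dataclasses import dataclass


@dataclass
class BrainstemMeasures:
    """Hypothetical container for the six measurements described in the text."""
    pons_area_mm2: float
    midbrain_area_mm2: float
    mcp_right_mm: float
    mcp_left_mm: float
    scp_right_mm: float
    scp_left_mm: float


def mrpi(m: BrainstemMeasures) -> float:
    """MRPI = (pons area / midbrain area) * (mean MCP width / mean SCP width)."""
    mcp = (m.mcp_right_mm + m.mcp_left_mm) / 2.0  # average of right and left
    scp = (m.scp_right_mm + m.scp_left_mm) / 2.0
    return (m.pons_area_mm2 / m.midbrain_area_mm2) * (mcp / scp)


# Invented values for illustration only, not patient data.
example = BrainstemMeasures(500.0, 100.0, 9.0, 9.0, 3.0, 3.0)
print(mrpi(example))  # 15.0
```

Because the index is a product of two ratios, a small midbrain area or a thin SCP each raise the value, which is why drawing errors in either structure propagate directly into the MRPI.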
Analytic Plan
The aim of our project was to examine the performance of the MRI measures, in various combinations and as calculated in the MRPI, to discriminate between the 3 clinical groups: PD, PSP, and other. To examine the discriminant ability of these clinical measures, multinomial logistic regression was performed in which diagnosis was treated as the response. Five different multinomial logistic regression models were fit with different covariates: Model 1 included the pons area, midbrain area, average MCP width, and average SCP width, model 2 incorporated the pons area, midbrain area, and the MCP/SCP ratio, model 3 contained the pons/midbrain ratio, average MCP, and average SCP, model 4 included the pons/midbrain ratio and the MCP/SCP ratio, and model 5 was the MRPI.5
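The five covariate sets can be written out explicitly. The following Python sketch is an illustration under the definitions given above; the function name and argument names are hypothetical, and the inputs are assumed to be the midline areas and the left/right averaged widths.

```python
def model_covariates(pons, midbrain, mcp, scp):
    """Return the covariate list for each of the 5 models described above.

    pons, midbrain: midline areas (mm^2); mcp, scp: left/right averaged widths (mm).
    """
    pons_midbrain = pons / midbrain
    mcp_scp = mcp / scp
    return {
        1: [pons, midbrain, mcp, scp],     # four raw measures
        2: [pons, midbrain, mcp_scp],      # raw areas plus width ratio
        3: [pons_midbrain, mcp, scp],      # area ratio plus raw widths
        4: [pons_midbrain, mcp_scp],       # both ratios as separate covariates
        5: [pons_midbrain * mcp_scp],      # the MRPI as a single covariate
    }


# Invented values for illustration only.
covariates = model_covariates(500.0, 100.0, 9.0, 3.0)
print(covariates[4], covariates[5])  # [5.0, 3.0] [15.0]
```

Models 4 and 5 use the same two ratios; the difference is that model 4 lets the regression weight them separately, whereas model 5 fixes their combination as a product.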
For each model, a training sample consisting of the floor (the largest integer not exceeding the value) of 90% of patients with each diagnosis was randomly selected and used to estimate the multinomial regression model. Then, the resulting model was used to predict the most probable diagnosis based on the observed covariate values for those patients in the training sample and those who were not included in the training sample (known as the testing sample). The predicted diagnosis was determined by computing the probability of belonging to each group based on the observed variables and then assigning a patient to the group for which the probability is the greatest. The predicted diagnoses were compared with the true diagnoses of the training sample and testing sample to provide the in-sample and out-of-sample misclassification rates, respectively. The out-of-sample misclassification rates provide an estimate of how well the multinomial logistic regression model will perform for future patients; the higher the out-of-sample misclassification rate, the worse the performance of the model for its intended purpose of determining diagnosis using MRI measures. This process was repeated 1,000 times with patients randomly assigned to the training and testing sample at each iteration.
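The resampling scheme itself (stratified 90% training split, assignment to the most probable group, and averaged in-sample and out-of-sample misclassification rates) can be sketched in Python as below. This is a minimal sketch, not the authors' code: all function names are hypothetical, and the model-fitting step is abstracted as a callable `fit`. The placeholder `fit_class_frequencies` is deliberately not a multinomial logistic regression; it only makes the sketch runnable without external libraries.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(labels, frac=0.9, rng=random):
    """Put floor(frac * n_k) randomly chosen patients of each diagnosis k in training."""
    by_dx = defaultdict(list)
    for i, dx in enumerate(labels):
        by_dx[dx].append(i)
    train = []
    for idx in by_dx.values():
        train.extend(rng.sample(idx, math.floor(frac * len(idx))))
    chosen = set(train)
    test = [i for i in range(len(labels)) if i not in chosen]
    return train, test


def most_probable(probs):
    """Assign the diagnosis with the greatest predicted probability."""
    return max(probs, key=probs.get)


def error_rate(truth, predicted):
    """Fraction of patients whose predicted diagnosis differs from the true one."""
    return sum(t != p for t, p in zip(truth, predicted)) / len(truth)


def fit_class_frequencies(X, y):
    """Placeholder model (NOT multinomial regression): ignores X, returns class shares."""
    shares = {dx: count / len(y) for dx, count in Counter(y).items()}
    return lambda x: shares


def resample_errors(X, y, fit, n_iter=1000, seed=0):
    """Average in-sample and out-of-sample misclassification over repeated splits."""
    rng = random.Random(seed)
    in_err = out_err = 0.0
    for _ in range(n_iter):
        train, test = stratified_split(y, 0.9, rng)
        model = fit([X[i] for i in train], [y[i] for i in train])
        predict = lambda idx: [most_probable(model(X[i])) for i in idx]
        in_err += error_rate([y[i] for i in train], predict(train))
        out_err += error_rate([y[i] for i in test], predict(test))
    return in_err / n_iter, out_err / n_iter
```

In practice `fit` would be replaced by a routine that estimates the multinomial logistic regression on the training covariates and returns a function mapping a covariate vector to the three group probabilities.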
Next, to explore the ability of the 5 models above to accurately discriminate between PD and PSP, an ROC approach was used only on those patients classified as PD or PSP (excluding other) as was used by the authors who developed the MRPI. To do so, we first fit a logistic regression model using a training data set with the predictor variables defined in the respective model to the response, which was 1 if a patient was diagnosed with PSP and 0 if a patient was diagnosed with PD. An ROC curve was fit to the results to determine the threshold of the fitted value from the logistic regression model, which achieved the maximum summation of the sensitivity and specificity. Then, using the logistic regression model fit using the training set, we estimated the fitted value corresponding to the testing set. The sensitivity and specificity in the testing set were determined for the cutoff, which led to the maximum summation of the sensitivity and specificity. This process was repeated for 1,000 iterations to obtain an indication of the predictive ability of these measurements. This identical ROC approach was repeated to determine the ability of the models to discriminate PSP from alternative diagnoses (which consisted of PD and other).
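The threshold selection step can be illustrated with a short Python sketch. It assumes the fitted values from the logistic regression are already available (the regression fit is omitted), chooses the cutoff on the training set that maximizes sensitivity plus specificity, and then applies that cutoff to the testing set. Function names are hypothetical and the scores are invented.

```python
def sens_spec(scores, is_psp, cutoff):
    """Sensitivity and specificity when a fitted value >= cutoff is called PSP."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, is_psp):
        called_psp = s >= cutoff
        if y and called_psp:
            tp += 1
        elif y:
            fn += 1
        elif called_psp:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, is_psp):
    """Observed score maximizing sensitivity + specificity (lowest one on ties)."""
    return max(sorted(set(scores)), key=lambda c: sum(sens_spec(scores, is_psp, c)))


# Invented fitted values standing in for logistic regression output; 1 = PSP, 0 = PD.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8]
train_psp = [0, 0, 0, 1, 0, 1, 1]
test_scores = [0.3, 0.45, 0.7, 0.2]
test_psp = [1, 0, 1, 0]

cutoff = best_cutoff(train_scores, train_psp)
print(cutoff, sens_spec(test_scores, test_psp, cutoff))  # 0.4 (0.5, 0.5)
```

In this toy example the cutoff looks strong on the training data (sensitivity 1.0, specificity 0.75) but yields only 0.5 and 0.5 on the testing data, which is the kind of gap the training/testing design is meant to expose.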
We then performed an additional analysis repeating the above procedures using only the subset of patients with highest diagnostic certainty (established PD only, probable PSP only, and other) to determine whether this strengthened the multinomial regression models or ROC curves for classification of patients through use of MRI measurements. This subset of patients was termed the restrictive cohort.
Finally, we recalculated the imaging ratios using the published methodology5 to determine the sensitivity, specificity, and positive predictive value of the recommended measures on our population to enable direct comparison with those in the published literature. Demographic and clinical features together with imaging ratios were analyzed by group assignment using Welch t tests.
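For a fixed published cutoff, the three quantities follow directly from the 2 × 2 table of test result against diagnosis. The Python sketch below is illustrative only: the function name is hypothetical, the MRPI values are invented, and the only figure taken from the text is the MRPI cutoff of 13.55, with values at or above the cutoff counted as test positive.

```python
PUBLISHED_MRPI_CUTOFF = 13.55  # MRPI cutoff quoted in the text from the original article


def cutoff_performance(values, is_psp, cutoff):
    """Sensitivity, specificity, and PPV for calling PSP when value >= cutoff."""
    tp = fp = tn = fn = 0
    for v, y in zip(values, is_psp):
        test_positive = v >= cutoff
        if test_positive and y:
            tp += 1
        elif test_positive:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # PPV is undefined when no patient tests positive.
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


# Invented MRPI values for illustration only; 1 = PSP, 0 = not PSP.
mrpi_values = [15.2, 12.0, 14.1, 9.8, 10.5, 13.9]
is_psp = [1, 1, 1, 0, 0, 0]
result = cutoff_performance(mrpi_values, is_psp, PUBLISHED_MRPI_CUTOFF)
print(result)
```

Unlike sensitivity and specificity, the positive predictive value depends on the proportion of PSP in the sample, so it is expected to differ between cohorts with different case mixes even when the cutoff is unchanged.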
Data Availability
Anonymized data not published within this article will be made available by request from any qualified investigator.
Results
Our search of the EMR identified 282 patients for whom the MRI measures of interest were available. After retrospectively classifying patients using consensus criteria for PD and PSP, 12 patients were excluded from the analysis because their final diagnosis was unknown, or they had been diagnosed clinically with PD or PSP but failed to meet the consensus criteria for those diagnoses. Of the remaining 270 patients, 9 patients had poor-quality MRI scans preventing accurate redrawing of ROIs by one of the authors and so these also were excluded from analysis. The remaining 261 patients are the subject of this report. Of these, 130 had PD (63 established and 67 probable), 54 had PSP (41 probable, 6 possible, and 7 suggestive), and 77 had a diagnosis of other. As part of their imaging review, the authors deemed that adjustments to at least one of the measurements were needed in 72% of patients to obtain conformity to the Quattrone criteria.
Table 1 provides basic demographic and clinical data together with mean imaging ratios on the 261 participants grouped by diagnosis (inclusive cohort). The data show that on average, when comparing groups with PSP and other diagnoses, the MRPI and pons/midbrain ratio successfully discriminate PSP from these other diagnoses. Figure 2 displays boxplots of the imaging ratios stratified by diagnosis.
Table 1.
Clinical, Demographic, and Imaging Ratio Features of the 261 Participants With MRI Measures Obtained During the Routine Course of Care for a Movement Disorder
Figure 2. Median Values of Ratios for the 3 Groups (PD, PSP, and other).
Tukey boxplots showing the median (horizontal line), interquartile range (box), range excluding outliers (vertical lines), and outliers (dots) for the 3 diagnostic groups (PD, PSP, and other) for (A) magnetic resonance parkinsonism index (MRPI), (B) middle cerebellar peduncle/superior cerebellar peduncle ratio, and (C) pons/midbrain ratio. PD = Parkinson disease; PSP = progressive supranuclear palsy.
Multinomial Logistic Regression (Prediction of Diagnosis of PD, PSP, or Other)
Our use of multinomial logistic regression was designed to determine whether these MRI measures are useful for prediction of an individual patient's diagnosis, which is the clinically useful application of these measurement techniques. Table 2 presents the average in-sample error (that is, the percentage of incorrectly classified patients in the training sample) and the average out-of-sample error (the percentage of incorrectly classified patients in the testing sample) by model for the inclusive and restrictive cohorts. The results for both cohorts are similar, and because the misclassification rates are high, we conclude that none of the 5 multinomial models accurately predicts diagnostic group membership.
Table 2.
Average Error Rates for the 5 Models for the Inclusive and Restrictive Cohorts

ROC Analysis (Prediction of PSP vs PD Excluding Other)
The results of our analysis based on the ROC curves obtained from the logistic regression model meant to discriminate PSP from PD are shown in Table 3 for the inclusive and restrictive PD and PSP cohorts. We found that model 1, regardless of cohort definition, had the greatest average in-sample area under the curve (AUC). Model 5 was the worst-performing model of those tested in terms of AUC and sacrificed sensitivity for relatively high specificity.
Table 3.
Results of ROC Analysis Considering PSP vs PD Only (Excluding Other) for the Inclusive and Restrictive Cohorts
ROC Analysis (Prediction of PSP vs PD and Other)
The results of our analysis based on the ROC curves obtained from the logistic regression model to discriminate PSP from PD and other are shown in Table 4 for the inclusive and restrictive PD and PSP cohorts. In this analysis, model 1 had the greatest average in-sample AUC for both cohorts. Model 5 was again the worst performing of the models tested, exhibiting very low sensitivity.
Table 4.
Results of ROC Analysis Considering PSP vs PD and Other for the Inclusive and Restrictive Cohorts
Sensitivity, Specificity, and Positive Predictive Value Using Original Published Cutoffs
Using the cutoffs proposed in the original article for the use of MRI measures in the diagnosis of PSP,5 we found overall poor performance of these in our patient population as shown in Table 5. The cutoff values were much too high to perform well in our patients, resulting in high specificity but very low sensitivity and thus overall disappointing positive predictive values.
Table 5.
Sensitivity, Specificity, and PPV of a Diagnosis of PSP in Our Cohort Using Published Cutoff Values

Discussion
The key finding of our study is that the MRPI and multiple models involving the component measures on which the index is based are ineffective for correctly identifying the diagnosis in individual patients. We undertook this study because early in our experience with the MRPI, we encountered patients who clearly had PD but met the criteria for PSP using this tool and vice versa. Previous studies evaluating this tool used ROC analysis demonstrating that in their particular sample of patients, the tool effectively distinguished PD from PSP.2,7,8,10,12-14,17 However, ROC analysis on a given sample of patients is not suitable for prediction of a tool's accuracy in classifying future patients, which was the goal of our approach.
As shown in Table 1 and Figure 2, when evaluating means in diagnostic groups, the pons/midbrain ratio proved the most useful in separating PSP from either PD or other. This finding was not unexpected because the pathology of PSP results in atrophy of the midbrain, which increases this ratio. However, demonstration of a difference in groups is not relevant to determining the diagnosis in an individual patient.
We used multinomial logistic regression as our primary analytic method as opposed to ROC analysis for 2 reasons. First, the response of interest contained 3 potential values (PD, PSP, and other), whereas ROC curves are constructed based on the sensitivity and specificity of differentiating only 2 classes. Second, we were interested in incorporating more than one measure into our decision-making process, with the exception of model 5, and thus had to examine some linear combination of the measures as opposed to exhaustively searching through the potential combinations of cutoffs defined for each measure. Multinomial logistic regression enabled us to estimate the linear combination of defined measures (or predictors) in each model and obtain the probabilities of group membership, which could then be used to measure the predictive accuracy of the model.
In addition to our multinomial logistic regression approach, we also used ROC curve analysis to determine whether our results using that methodology aligned with previous reports. The results in Table 3 showed significantly lower sensitivity and specificity across all models used when compared with the literature where most studies report at least 80% sensitivity and specificity for the ratios. Thus, in our sample of patients with consensus-defined PD and PSP, these MRI measures do not correlate well with diagnosis regardless of the analytic process used.
The primary limitation of our study was the retrospective classification of patients by diagnosis using clinical data reported in progress notes. Although our clinical movement disorders team is well versed on the clinical features of the various parkinsonisms and generally records all relevant information when making a clinical diagnosis, we cannot exclude the possibility that some information may have been omitted on some patients (such as a sufficiently detailed description of extraocular movements) that would affect diagnostic classification using consensus criteria. The ideal study of the suitability of these MRI measures for diagnostic classification would use a large cohort of subjects with autopsy-proven PSP and PD.
Strengths of our study include the unbiased approach to obtaining the MRI measures on all patients with a movement disorder who needed an MRI scan at our center for any reason during the study period. This approach avoided the bias inherent in ordering scans only for patients with more uncertain diagnoses. However, not all patients seen during the study period received an MRI, and those selected for the imaging study had some valid clinical indication for it. In some cases, the indication was diagnostic uncertainty, whereas in others, it was due to concern for cognitive decline necessitating a search for a structural cause unrelated to the primary diagnosis. In any case, we consider it a strength that scans were obtained in this selected group enriched for diagnostic uncertainty because it is this group for which a tool like MRPI would be needed. In addition, the authors also carefully inspected the original drawings of ROIs documented by the original interpreting neuroradiologist and made necessary adjustments in 72% of patients to ensure compliance with the MRPI definition. Another strength was the use of training and testing sets in our analysis and 1,000 iterations of resampling to derive our results. This approach is more suitable for prediction of usefulness of a diagnostic test on future patients than the ROC analyses commonly reported in the literature.
In conclusion, we found that the recommended MRI measures for discrimination of PSP from other forms of parkinsonism performed poorly in our real-world use of these measurements. In addition, the high proportion of patients with at least one noncompliant measurement identified during our study review suggests that the use of such measures outside a strictly controlled research setting is not likely to be useful. The search for a reliable imaging or blood-based diagnostic biomarker for PD, PSP, and other forms of parkinsonism must continue.
Appendix. Authors

Study Funding
The authors report no targeted funding.
Disclosure
The authors report no relevant disclosures. Full disclosure form information provided by the authors is available with the full text of this article at Neurology.org/cp.
References
- 1. Hoglinger GU, Respondek G, Stamelou M, et al. Clinical diagnosis of progressive supranuclear palsy: the movement disorder society criteria. Mov Disord. 2017;32(6):853-864.
- 2. Longoni G, Agosta F, Kostic VS, et al. MRI measurements of brainstem structures in patients with Richardson's syndrome, progressive supranuclear palsy-parkinsonism, and Parkinson's disease. Mov Disord. 2011;26(2):247-255.
- 3. Su PC, Goldensohn ES. Progressive supranuclear palsy. Electroencephalographic studies. Arch Neurol. 1973;29(3):183-186.
- 4. Seppi K, Schocke MF, Esterhammer R, et al. Diffusion-weighted imaging discriminates progressive supranuclear palsy from PD, but not from the Parkinson variant of multiple system atrophy. Neurology. 2003;60(6):922-927.
- 5. Quattrone A, Nicoletti G, Messina D, et al. MR imaging index for differentiation of progressive supranuclear palsy from Parkinson disease and the Parkinson variant of multiple system atrophy. Radiology. 2008;246(1):214-221.
- 6. Quattrone A, Morelli M, Nigro S, et al. A new MR imaging index for differentiation of progressive supranuclear palsy-parkinsonism from Parkinson's disease. Parkinsonism Relat Disord. 2018;54:3-8.
- 7. Eraslan C, Acarer A, Guneyli S, et al. MRI evaluation of progressive supranuclear palsy: differentiation from Parkinson's disease and multiple system atrophy. Neurol Res. 2019;41(2):110-117.
- 8. Heim B, Mangesius S, Krismer F, et al. Diagnostic accuracy of MR planimetry in clinically unclassifiable parkinsonism. Parkinsonism Relat Disord. 2021;82:87-91.
- 9. Kim BC, Choi SM, Choi KH, et al. MRI measurements of brainstem structures in patients with vascular parkinsonism, progressive supranuclear palsy, and Parkinson's disease. Neurol Sci. 2017;38(4):627-633.
- 10. Morelli M, Arabia G, Salsone M, et al. Accuracy of magnetic resonance parkinsonism index for differentiation of progressive supranuclear palsy from probable or possible Parkinson disease. Mov Disord. 2011;26(3):527-533.
- 11. Nigro S, Arabia G, Antonini A, et al. Magnetic Resonance Parkinsonism Index: diagnostic accuracy of a fully automated algorithm in comparison with the manual measurement in a large Italian multicentre study in patients with progressive supranuclear palsy. Eur Radiol. 2017;27(6):2665-2675.
- 12. Picillo M, Tepedino MF, Abate F, et al. Midbrain MRI assessments in progressive supranuclear palsy subtypes. J Neurol Neurosurg Psychiatry. 2020;91(1):98-103.
- 13. Sankhla CS, Patil KB, Sawant N, Gupta S. Diagnostic accuracy of Magnetic Resonance Parkinsonism Index in differentiating progressive supranuclear palsy from Parkinson's disease and controls in Indian patients. Neurol India. 2016;64(2):239-245.
- 14. Zanigni S, Calandra-Buonaura G, Manners DN, et al. Accuracy of MR markers for differentiating progressive supranuclear palsy from Parkinson's disease. Neuroimage Clin. 2016;11:736-742.
- 15. Mostile G, Nicoletti A, Cicero CE, et al. Magnetic resonance parkinsonism index in progressive supranuclear palsy and vascular parkinsonism. Neurol Sci. 2016;37(4):591-595.
- 16. Postuma RB, Berg D, Stern M, et al. MDS clinical diagnostic criteria for Parkinson's disease. Mov Disord. 2015;30(12):1591-1601.
- 17. Nigro S, Morelli M, Arabia G, et al. Magnetic Resonance Parkinsonism Index and midbrain to pons ratio: which index better distinguishes Progressive Supranuclear Palsy patients with a low degree of diagnostic certainty from patients with Parkinson Disease? Parkinsonism Relat Disord. 2017;41:31-36.