Abstract
This paper proposes a disease progression model for early stage Parkinson’s Disease (PD) based on DaTscan images. The model has two novel aspects: first, the model is fully coupled across the two caudates and putamina. Second, the model uses a new constraint called model mirror symmetry (MMS). A full Bayesian analysis, with collapsed Gibbs sampling using conjugate priors, is used to obtain posterior samples of the model parameters. The model identifies PD progression subtypes and reveals novel fast modes of PD progression.
Keywords: Parkinson’s Disease, Disease progression model, Bayesian analysis
1. Introduction
Parkinson’s Disease (PD) is a common neurodegenerative disease characterized by loss of dopaminergic neurons, and accompanied by progressively worsening clinical motor and non-motor symptoms. PD is also a heterogeneous disease; it exhibits vastly different rates of progression in different subjects.
DaTscan imaging is the commercial name for SPECT imaging with ¹²³I-FP-CIT. DaTscans measure the local density of presynaptic dopamine transporters (DaT). Dopaminergic neural loss decreases DaT density and is visible as signal loss in DaTscan images. Our goal is to model the progression of PD as it manifests in DaTscans. We use the Parkinson’s Progression Markers Initiative (PPMI) dataset, described in more detail in Sect. 2.
For quantitative analysis, the intensity in every voxel of a DaTscan is converted to a Striatal Binding Ratio (SBR). The SBR at voxel $v$ is defined as $\mathrm{SBR}_v = (I_v - I_{\mathrm{ref}})/I_{\mathrm{ref}}$, where $I_v$ is the intensity in voxel $v$ and $I_{\mathrm{ref}}$ is the mean (or median) intensity in the occipital lobe [8]. SBR is a measure of DaT density in the voxel. The sum or the mean of the SBR in a brain region is taken as a measure of DaT density in the region.
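A minimal sketch of the SBR computation, assuming the standard ratio $(I_v - I_{\mathrm{ref}})/I_{\mathrm{ref}}$ and using flat Python lists in place of image volumes; the function names `sbr` and `roi_mean_sbr` are hypothetical.

```python
from statistics import mean, median


def sbr(intensities, reference_intensities, use_median=False):
    """Voxel-wise striatal binding ratio (I_v - I_ref) / I_ref."""
    # I_ref: mean (or median) intensity over the occipital reference ROI
    ref = median(reference_intensities) if use_median else mean(reference_intensities)
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return [(i - ref) / ref for i in intensities]


def roi_mean_sbr(intensities, roi_mask, reference_intensities):
    """Mean SBR over the voxels selected by a boolean ROI mask."""
    inside = [i for i, keep in zip(intensities, roi_mask) if keep]
    return mean(sbr(inside, reference_intensities))


# Toy example: four striatal voxels and a flat occipital reference region.
striatum = [30.0, 25.0, 40.0, 35.0]
occipital = [10.0, 10.0, 10.0, 10.0]
print(sbr(striatum, occipital))                                      # [2.0, 1.5, 3.0, 2.5]
print(roi_mean_sbr(striatum, [True, True, False, False], occipital))  # 1.75
```

In practice the mask would come from the atlas ROIs (caudates, putamina) and the reference list from the occipital ROI.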
DaTscan and PD characteristics that are important in modeling early stage PD are listed below. Our model takes these characteristics into account:
Disease Stages: PD progresses along stages called Braak Stages [3]. Early stage PD affects the striatum, with the putamen affected more than the caudate.
Coupled Progress: Early stage PD is also asymmetric; one brain hemisphere is affected more than the other [7]. As the disease progresses, it becomes more symmetric, demonstrating a coupling between ROIs.
Exponential Loss: SBR loss in the striatum due to PD progression is approximately exponential [1,6].
Heterogeneity: PD is a heterogeneous disease, with patients progressing at different rates and exhibiting variable clinical symptoms. Different PD subtypes have been proposed using clinical symptoms (MDS-UPDRS ratings), e.g. rigidity-dominant vs. tremor-dominant [4], or early-onset vs. late-onset [13]. To the best of our knowledge, the existence of image-based progression subtypes has not yet been reported in the literature. One of our goals is to investigate such subtypes.
Besides these previously known properties, we identify a new property called model mirror symmetry (MMS) which is critical in reducing the dimension of the model:
Model Mirror Symmetry: Progression from the asymmetric state to the symmetric state does not depend on the hemisphere that the disease originally affected. This implies that the progression model should remain invariant if left hemisphere voxels (or regions) are swapped with right hemisphere voxels (or regions). We call this model mirror symmetry.
The above properties suggest that PD progression in DaTscan images can be modeled as a mixture of linear dynamical systems (LDS), where the transition matrices of the dynamical systems are constrained to be centrosymmetric. This is explained further in Sect. 3.1. We fit the model using a Bayesian methodology: collapsed Gibbs sampling (Sect. 4.1) and Bayesian model selection for the optimal number of mixture components (Sect. 4.2). Sampling is less prone to entrapment in spurious local maxima, a common problem with maximum-likelihood estimation via the EM algorithm.
Modeling brain disease progression is a relatively new problem. Using prior knowledge (about toxic protein aggregation followed by transmission along neuronal pathways), graph-theoretic approaches have been proposed for Alzheimer’s disease progression, e.g. [14]. Other approaches use discrete event models or generalized linear time models for disease progression [5,9]. Regression of image features against longitudinal clinical scores has also been used for disease progression modeling [17]. Most of these methods are applied to MRI images of Alzheimer’s disease. DaTscans have a lower resolution than MRI images and contain a very different kind of information (DaT density rather than structural information), so MRI-based models are not directly applicable to DaTscans.
2. Data
The PPMI dataset has 449 early stage PD subjects. The subjects are imaged at baseline and at approximately 1, 2, 4, and 5 years. Not all subjects have a complete time series, and the scan times differ slightly between subjects. The PPMI DaTscan images are registered to MNI space, but we found some misregistered images in the data. We preprocessed the PPMI data to eliminate subjects with only a single scan and subjects with misregistered images (detected using a simple non-parametric correlation test). In the end, data from 365 subjects survived and entered the analysis. Of these subjects, 320 had 3 or more scans, 130 had 4 or more scans, and 3 had 5 scans.
Since early stage PD mostly affects the caudate and putamen, we modeled the mean SBR in the two caudates and the two putamina. The MNI atlas was used to identify the caudates and putamina, and a manual ROI (similar to the one in [18]) was created for the occipital lobe (Fig. 1 shows these regions on 9 slices). Figure 2 shows the mean SBR time series for the 365 surviving subjects; the left and right caudates and the left and right putamina are shown in separate plots. For each subject, the sequence of mean SBRs is shown as blue, orange, yellow, and purple arrows, which correspond to data from time 1 to 2, 2 to 3, 3 to 4, and 4 to 5, respectively. Note that, modulo noise, the data are symmetric around the 45° line. Hence any model that fits these data well should remain invariant if the “left” and “right” labels are swapped. This is model mirror symmetry.
Fig. 1.

Caudates (red), putamina (green), and occipital lobe (blue) ROIs superimposed on the mean baseline DaTscan image (36th - 44th slices shown).
Fig. 2.

Time series of the mean SBR in the caudates and putamina. Blue, orange, yellow, purple arrows represent the vectors from time 1 to 2, 2 to 3, 3 to 4, and 4 to 5 respectively. LC = left caudate, RC = right caudate, LP = left putamen, RP = right putamen.
3. Disease Progression Model
3.1. Time Series Model
The four mean SBRs are arranged in a $4 \times 1$ vector $x = (x_{LC}, x_{LP}, x_{RP}, x_{RC})^T$ in the order: left caudate (LC), left putamen (LP), right putamen (RP), and right caudate (RC). Note that $x \in \mathbb{R}^n$ with $n = 4$. This vector is the feature we extract from every image, and our goal is to model the time series of this feature for all subjects.
Keeping in mind that SBR is known to decay exponentially in PD, a continuous-time evolution model for $x$ is the linear differential equation $\frac{dx}{dt} = Ax$, where $A \in \mathbb{R}^{n \times n}$. Since $A$ is not required to be diagonal, this model captures coupled progression between the caudates and putamina.
Next, we account for the fact that the longitudinal time series for each subject is discrete and not necessarily uniformly sampled in time. Letting $x_{i,t}$ denote the discrete time series for the $i$th subject, and letting $\Delta_{i,t}$ denote the difference in time between the $t$th and the $(t+1)$th imaging times, the differential equation can be discretized as $x_{i,t+1} - x_{i,t} = \Delta_{i,t} A x_{i,t} + \epsilon_{i,t}$, where we assume that the noise has a Gaussian distribution, $\epsilon_{i,t} \sim \mathcal{N}(0, \sigma^2 I)$. Denoting the entire time series for the $i$th subject as $X_i = (x_{i,1}, \dots, x_{i,T_i})$, we have
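$$p(X_i \mid A, \sigma^2) = \prod_{t=1}^{T_i - 1} \mathcal{N}\left(x_{i,t+1} - x_{i,t};\ \Delta_{i,t} A x_{i,t},\ \sigma^2 I\right) \tag{1}$$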
where and with estimated from all the .
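The discretized model can be sketched as a per-subject log-likelihood. This is a minimal illustration that assumes forward-Euler steps with isotropic noise $\sigma^2 I$ and conditions on the first scan; the function name `euler_log_likelihood` and the toy numbers are hypothetical, not taken from the authors' code.

```python
import math


def euler_log_likelihood(times, xs, A, sigma2):
    """Sum over scan pairs of log N(x_{t+1} - x_t; Delta_t * A x_t, sigma2 * I).

    times: scan times in years (irregular spacing allowed)
    xs:    one SBR vector per scan, e.g. ordered (LC, LP, RP, RC)
    A:     n x n transition matrix as a list of rows
    Assumes isotropic Gaussian noise and conditions on the first scan.
    """
    n = len(xs[0])
    total = 0.0
    for t in range(len(xs) - 1):
        delta = times[t + 1] - times[t]
        drift = [delta * sum(A[r][c] * xs[t][c] for c in range(n)) for r in range(n)]
        resid = [xs[t + 1][r] - xs[t][r] - drift[r] for r in range(n)]
        total += -0.5 * n * math.log(2 * math.pi * sigma2) - 0.5 * sum(e * e for e in resid) / sigma2
    return total


# Usage: a 4-region subject scanned at 0, 1 and 2.1 years under a toy A = -0.1 I.
A_toy = [[-0.1 if r == c else 0.0 for c in range(4)] for r in range(4)]
xs = [[2.0, 1.2, 1.4, 2.1], [1.8, 1.1, 1.25, 1.9], [1.6, 1.0, 1.1, 1.7]]
print(euler_log_likelihood([0.0, 1.0, 2.1], xs, A_toy, sigma2=0.01))
```

Because each step uses its own $\Delta$, subjects with missing or irregularly timed scans need no interpolation.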
Next, we impose MMS:
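Definition 1. $P \in \mathbb{R}^{n \times n}$ is a symmetric permutation if $(Px)_i = x_{n+1-i}$ for $i = 1, \dots, n$, where $x_i$ refers to the $i$th component of vector $x$. A matrix $A$ is centrosymmetric if $PAx = APx$ for all $x \in \mathbb{R}^n$, where $P$ is a symmetric permutation.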
Because of how LC, LP, RP, and RC are arranged in $x$, swapping right and left hemisphere ROIs corresponds to applying the symmetric permutation $P$ to $x$. If $x(t)$ obeys $\frac{dx}{dt} = Ax$, MMS requires the mirrored trajectory $Px(t)$ to obey the same equation, i.e. $PAx = APx$ for all $x$. Thus, MMS implies that $A$ should be a centrosymmetric matrix. The set of all centrosymmetric matrices forms a subspace of all $n \times n$ matrices. This subspace has dimension $\lceil n^2/2 \rceil$ (8 for $n = 4$), hence the number of parameters for fitting the data is reduced by approximately half. Also, if a centrosymmetric matrix has distinct eigenvalues, its eigenvectors are either symmetric ($Pv = v$) or skew-symmetric ($Pv = -v$) [16]. We will see the importance of this property when interpreting results.
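The MMS constraint can be checked numerically. The plain-Python sketch below (all helper names hypothetical) projects an arbitrary matrix onto the centrosymmetric subspace via $(A + PAP)/2$, counts the subspace dimension, and shows that a centrosymmetric matrix maps symmetric vectors to symmetric vectors and skew-symmetric vectors to skew-symmetric ones, which is why its eigenvectors (for distinct eigenvalues) fall into these two classes. The example matrix is arbitrary, not a fitted $A$.

```python
def exchange(n):
    """Symmetric permutation P with (P x)_i = x_{n+1-i} (0-based: x[n-1-i])."""
    return [[1.0 if j == n - 1 - i else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def is_centrosymmetric(A, tol=1e-12):
    n = len(A)
    return all(abs(A[i][j] - A[n - 1 - i][n - 1 - j]) <= tol for i in range(n) for j in range(n))


def project_centrosymmetric(A):
    """Orthogonal (Frobenius) projection (A + P A P) / 2 onto centrosymmetric matrices."""
    P = exchange(len(A))
    PAP = matmul(matmul(P, A), P)
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(A, PAP)]


def centrosymmetric_dim(n):
    # entries pair up as {(i, j), (n-1-i, n-1-j)}; for odd n the centre entry is unpaired
    return (n * n + 1) // 2


A = project_centrosymmetric([[-0.3, 0.1, 0.0, 0.05],
                             [0.2, -0.5, 0.1, 0.0],
                             [0.0, 0.0, -0.4, 0.1],
                             [0.1, 0.0, 0.2, -0.2]])
sym = [1.0, 2.0, 2.0, 1.0]     # P sym = sym
skew = [1.0, -2.0, 2.0, -1.0]  # P skew = -skew
print(is_centrosymmetric(A), centrosymmetric_dim(4))  # True 8
print(matvec(A, sym))   # again of the form (u, w, w, u)
print(matvec(A, skew))  # again of the form (u, w, -w, -u)
```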
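Let $b_m \in \mathbb{R}^{n^2}$, $m = 1, \dots, n^2/2$, be a vector of the form $(0, \dots, 0, 1, 0, \dots, 0, 1, 0, \dots, 0)^T$ where 1 appears only at the $m$th position and the $(n^2 + 1 - m)$th position and the other components are zero ($n$ is even here). Then, any centrosymmetric matrix $A$ can be represented as $\mathrm{vec}(A) = Ba$, where $\mathrm{vec}(\cdot)$ stacks the columns of a matrix, $B = [b_1, \dots, b_{n^2/2}]$ is the basis, and $a \in \mathbb{R}^{n^2/2}$ contains the coordinates for expressing $A$ in this basis. Hence, the density in (1) can be reorganized as a function of $(a, \sigma^2)$: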
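$$p(X_i \mid a, \sigma^2) = (2\pi\sigma^2)^{-d_i/2} \exp\left\{-\frac{1}{2\sigma^2}\left(a^T \Lambda_i a - 2\eta_i^T a + \gamma_i\right)\right\} \tag{2}$$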
where
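$$\Lambda_i = \sum_{t=1}^{T_i-1} M_{i,t}^T M_{i,t},\quad \eta_i = \sum_{t=1}^{T_i-1} M_{i,t}^T (x_{i,t+1} - x_{i,t}),\quad \gamma_i = \sum_{t=1}^{T_i-1} \lVert x_{i,t+1} - x_{i,t} \rVert^2,\quad M_{i,t} = \Delta_{i,t}\,(x_{i,t}^T \otimes I_n)\,B,\quad d_i = n(T_i - 1) \tag{3}$$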
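The reparameterization in (2) and (3) can be sketched as follows. The code assumes $n$ is even, column-major vectorization, forward-Euler increments with isotropic noise, and pairs coordinate $m$ (0-based) with the mirror index $n^2 - 1 - m$; the names `basis_columns` and `sufficient_stats` are hypothetical. It accumulates per-subject quantities so that $\sum_t \lVert (x_{t+1}-x_t) - M_t a \rVert^2 = a^T\Lambda a - 2\eta^T a + \gamma$.

```python
def basis_columns(x, delta):
    """Columns of M = delta * (x^T kron I_n) B, so that delta * A x = M a for centrosymmetric A.

    Coordinate m (0-based) stands for entry (j, k) with m = k*n + j (column-major) and for its
    mirror entry (n-1-j, n-1-k). Assumes n is even, giving n*n/2 coordinates.
    """
    n = len(x)
    cols = []
    for m in range(n * n // 2):
        k, j = divmod(m, n)
        col = [0.0] * n
        col[j] += delta * x[k]                  # contribution of entry (j, k)
        col[n - 1 - j] += delta * x[n - 1 - k]  # contribution of the mirrored entry
        cols.append(col)
    return cols


def sufficient_stats(times, xs):
    """Return (Lam, eta, gamma, d) for one subject's SBR time series."""
    n = len(xs[0])
    p = n * n // 2
    Lam = [[0.0] * p for _ in range(p)]
    eta = [0.0] * p
    gamma = 0.0
    for t in range(len(xs) - 1):
        cols = basis_columns(xs[t], times[t + 1] - times[t])
        y = [b - a for a, b in zip(xs[t], xs[t + 1])]   # increment x_{t+1} - x_t
        for r in range(p):
            eta[r] += sum(c * v for c, v in zip(cols[r], y))
            for s in range(p):
                Lam[r][s] += sum(c1 * c2 for c1, c2 in zip(cols[r], cols[s]))
        gamma += sum(v * v for v in y)
    return Lam, eta, gamma, n * (len(xs) - 1)
```

With only two scans the $8 \times 8$ matrix `Lam` has rank at most 4, which is the positive semi-definite case mentioned in the text.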
Note that $\Lambda_i$, $\eta_i$, and $\gamma_i$ are functions of $X_i$, but the dependence is not explicitly shown to simplify notation. For Bayesian analysis we will need conjugate priors for $(a, \sigma^2)$ and a posterior predictive probability density for $X_i$. The quadratic form in $a$ in Eq. (2) is either strictly positive definite or positive semi-definite, depending on the length of the time series. In either case, the density belongs to the exponential family and a conjugate prior is available [2]. For completeness, we state the conjugate prior and the posterior for the specific form in (2):
Theorem 1. Suppose $X_1, \dots, X_N$ are random variables that are conditionally independent given $(a, \sigma^2)$, with densities given by Eq. (2) where each $\Lambda_i$ is symmetric. Then, $(a, \sigma^2)$ has a conjugate prior in the form of a normal-inverse-gamma (NIG) distribution
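$$p(a, \sigma^2) = \mathrm{NIG}(a, \sigma^2; \mu_0, \Lambda_0, \alpha_0, \beta_0) = \mathcal{N}(a; \mu_0, \sigma^2 \Lambda_0^{-1})\, \mathrm{IG}(\sigma^2; \alpha_0, \beta_0) \tag{4}$$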
where $\Lambda_0$ is positive definite and $\alpha_0, \beta_0 > 0$. Because the prior is conjugate, the posterior is also NIG
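$$p(a, \sigma^2 \mid X_1, \dots, X_N) = \mathrm{NIG}(a, \sigma^2; \mu_N, \Lambda_N, \alpha_N, \beta_N) \tag{5}$$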
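with $\Lambda_N = \Lambda_0 + \sum_i \Lambda_i$, $\mu_N = \Lambda_N^{-1}\left(\Lambda_0 \mu_0 + \sum_i \eta_i\right)$, $\alpha_N = \alpha_0 + \frac{1}{2}\sum_i d_i$, and $\beta_N = \beta_0 + \frac{1}{2}\left(\mu_0^T \Lambda_0 \mu_0 + \sum_i \gamma_i - \mu_N^T \Lambda_N \mu_N\right)$.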
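The update in Theorem 1 has the same form as the conjugate update for Bayesian linear regression. A minimal plain-Python sketch, assuming each subject is summarized by a tuple `(Lam_i, eta_i, gamma_i, d_i)` of the quadratic-form quantities in (2); `solve` and `nig_posterior` are hypothetical names.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting (M square, nonsingular)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z


def nig_posterior(mu0, Lam0, alpha0, beta0, stats):
    """Conjugate NIG update; stats is a list of (Lam_i, eta_i, gamma_i, d_i), one per subject."""
    p = len(mu0)
    LamN = [list(map(float, row)) for row in Lam0]
    rhs = [sum(Lam0[r][s] * mu0[s] for s in range(p)) for r in range(p)]   # Lam0 mu0
    quad = sum(mu0[r] * rhs[r] for r in range(p))                           # mu0^T Lam0 mu0
    alphaN = float(alpha0)
    for Lam_i, eta_i, gamma_i, d_i in stats:
        for r in range(p):
            rhs[r] += eta_i[r]
            for s in range(p):
                LamN[r][s] += Lam_i[r][s]
        alphaN += 0.5 * d_i
        quad += gamma_i
    muN = solve(LamN, rhs)                           # mu_N = Lam_N^{-1}(Lam0 mu0 + sum eta_i)
    quad -= sum(muN[r] * rhs[r] for r in range(p))   # mu_N^T Lam_N mu_N = mu_N^T rhs
    return muN, LamN, alphaN, beta0 + 0.5 * quad


# Usage: one scalar coordinate, prior N(0, sigma2) x IG(1, 1), one subject's statistics.
print(nig_posterior([0.0], [[1.0]], 1.0, 1.0, [([[2.0]], [2.0], 3.0, 2)]))
```

Because only sums of per-subject statistics enter, the update is additive across subjects, which is what makes the per-cluster sampling step cheap.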
In Eq. (4), $\mu_0, \Lambda_0, \alpha_0, \beta_0$ are parameters of the prior (hyperparameters). We jointly refer to them as $\phi$. A direct consequence of Theorem 1 is that the posterior predictive distribution has a closed form:
Corollary 1. Suppose $X_*$ has density given by Eq. (2) where $\Lambda_*$ is symmetric. Then, the posterior predictive density of $X_*$ given $X_1, \dots, X_N$ is
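$$p(X_* \mid X_1, \dots, X_N) = (2\pi)^{-d_*/2}\, \frac{\Gamma(\alpha_N')}{\Gamma(\alpha_N)}\, \frac{\beta_N^{\alpha_N}}{(\beta_N')^{\alpha_N'}} \left(\frac{|\Lambda_N|}{|\Lambda_N'|}\right)^{1/2} \tag{6}$$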
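where $\Lambda_N', \mu_N', \alpha_N', \beta_N'$ are the posterior parameters of (5) computed from $X_1, \dots, X_N, X_*$, i.e. with the quantities $\Lambda_*$, $\eta_*$, $\gamma_*$, $d_*$ of $X_*$ added to the sums.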
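Equation (6) is a ratio of NIG normalizing constants, so it is convenient to evaluate in log space. The sketch below assumes the same `(Lam, eta, gamma, d)` summary of the new subject as in (2) and (3) and uses a Cholesky factorization for both the determinants and the linear solve; all function names are hypothetical.

```python
import math


def cholesky(M):
    """Lower-triangular L with M = L L^T (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) z = b by forward then back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


def log_predictive(muN, LamN, alphaN, betaN, Lam_new, eta_new, gamma_new, d_new):
    """log p(X_new | X_1..X_N) as a ratio of NIG normalizers (prior = current posterior)."""
    p = len(muN)
    rhsN = [sum(LamN[r][s] * muN[s] for s in range(p)) for r in range(p)]
    Lam_up = [[LamN[r][s] + Lam_new[r][s] for s in range(p)] for r in range(p)]
    rhs_up = [rhsN[r] + eta_new[r] for r in range(p)]
    L_N, L_up = cholesky(LamN), cholesky(Lam_up)
    mu_up = chol_solve(L_up, rhs_up)
    alpha_up = alphaN + 0.5 * d_new
    beta_up = betaN + 0.5 * (sum(m * r for m, r in zip(muN, rhsN)) + gamma_new
                             - sum(m * r for m, r in zip(mu_up, rhs_up)))
    logdet_N = 2.0 * sum(math.log(L_N[i][i]) for i in range(p))
    logdet_up = 2.0 * sum(math.log(L_up[i][i]) for i in range(p))
    return (math.lgamma(alpha_up) - math.lgamma(alphaN)
            + alphaN * math.log(betaN) - alpha_up * math.log(beta_up)
            + 0.5 * (logdet_N - logdet_up) - 0.5 * d_new * math.log(2.0 * math.pi))
```

As a sanity check, with a single coordinate and a single scalar observation ($d = 1$) the result coincides with the Student-t predictive of conjugate Bayesian linear regression.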
3.2. Heterogeneity and Mixture Models
The above model assumes that all subjects satisfy a single differential equation, so it does not account for heterogeneity. Heterogeneity implies that different subjects may be modeled by different transition matrices (as represented by their coordinates $a$) and corresponding noise variances $\sigma^2$. Assuming that there are $K$ distinct $a_k$ and $\sigma_k^2$, $k = 1, \dots, K$, the time series for each subject can be modeled as generated by first picking a latent variable $z_i \in \{1, \dots, K\}$, and then sampling $X_i$ from the distribution $p(X_i \mid a_{z_i}, \sigma_{z_i}^2)$. The density of $X_i$ given all $a_k$ and $\sigma_k^2$ is, of course, a mixture model
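$$p\left(X_i \mid \pi, \{a_k, \sigma_k^2\}_{k=1}^K\right) = \sum_{k=1}^{K} \pi_k\, p(X_i \mid a_k, \sigma_k^2),$$
where $\pi_k = p(z_i = k)$ with $\sum_{k=1}^{K} \pi_k = 1$. We call this a mixture of linear dynamical systems model, or mixture LDS for short. We are interested in estimating $\pi$ and $\{a_k, \sigma_k^2\}_{k=1}^K$ for this model.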
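Since $z_i$ has a categorical distribution, for Bayesian analysis we use its conjugate prior, a Dirichlet distribution $\pi \sim \mathrm{Dir}(\alpha/K, \dots, \alpha/K)$. For $(a_k, \sigma_k^2)$, we use the NIG conjugate prior of Theorem 1, i.e. $(a_k, \sigma_k^2) \sim \mathrm{NIG}(\mu_0, \Lambda_0, \alpha_0, \beta_0)$. Here $\alpha$ and $\phi = (\mu_0, \Lambda_0, \alpha_0, \beta_0)$ are hyperparameters. The mixture LDS model captures all of the characteristics of PD progression listed in Sect. 1.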
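The generative story behind the mixture LDS can be sketched in a few lines: draw a cluster label from $\pi$, then roll the Euler-discretized LDS of that cluster forward from a given baseline. This sketch assumes isotropic Gaussian noise with standard deviation $\sigma_k$ and uses full transition matrices for readability; `simulate_subject` and the toy parameters are hypothetical.

```python
import random


def simulate_subject(A_list, sigma_list, pi, times, x0, rng):
    """Draw z ~ Categorical(pi), then x_{t+1} = x_t + Delta_t A_z x_t + N(0, sigma_z^2 I)."""
    z = rng.choices(range(len(pi)), weights=pi)[0]
    A, sigma = A_list[z], sigma_list[z]
    xs = [list(x0)]
    for t in range(len(times) - 1):
        delta, x = times[t + 1] - times[t], xs[-1]
        xs.append([x[r] + delta * sum(A[r][c] * x[c] for c in range(len(x))) + rng.gauss(0.0, sigma)
                   for r in range(len(x))])
    return z, xs


# Usage: two toy clusters, a slow one and a fast one, on a 2-region state.
slow = [[-0.1, 0.0], [0.0, -0.1]]
fast = [[-0.2, 0.0], [0.0, -0.2]]
rng = random.Random(0)
z, xs = simulate_subject([slow, fast], [0.05, 0.05], [0.7, 0.3], [0.0, 1.0, 2.0, 4.0], [2.0, 1.5], rng)
print(z, xs[-1])
```

Simulated cohorts of this kind are a convenient way to check that a sampler recovers known cluster parameters before applying it to real data.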
4. Bayesian Analysis
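Bayesian analysis of the above model consists of generating samples from the posterior $p(z, \pi, \{a_k, \sigma_k^2\} \mid X)$ using MCMC methods, where $X = (X_1, \dots, X_N)$ and $z = (z_1, \dots, z_N)$ is the vector of latent variables. The overall strategy is to sample the blocks of variables sequentially (a.k.a. Gibbs sampling). We use a collapsed Gibbs sampler to sample $z$ from $p(z \mid X)$, and then sample the rest from $p(\pi, \{a_k, \sigma_k^2\} \mid z, X)$.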
4.1. Collapsed Gibbs Sampling
To sample $z$, we integrate out $\pi$ and $\{a_k, \sigma_k^2\}$ and sample from $p(z \mid X)$. This is collapsed Gibbs sampling, which is well known to converge faster than uncollapsed Gibbs sampling. The samples of $z$ are generated one component at a time based on
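$$p(z_i = k \mid z_{-i}, X) \propto p(z_i = k \mid z_{-i})\; p(X_i \mid X_{-i}, z_i = k, z_{-i}),$$
where $z_{-i} = (z_1, \dots, z_{i-1}, z_{i+1}, \dots, z_N)$ (and similarly for $X_{-i}$). Assuming that $\pi \sim \mathrm{Dir}(\alpha/K, \dots, \alpha/K)$, the first term in the product is easily shown to be $(N_{k,-i} + \alpha/K)/(N - 1 + \alpha)$, where $N_{k,-i} = \sum_{j \neq i} \mathbb{1}(z_j = k)$ and $\mathbb{1}(\cdot)$ equals 1 if its argument is true and zero otherwise. The second term is the posterior predictive density of $X_i$ given the other subjects currently assigned to cluster $k$, and is calculated from (6) according to Corollary 1.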
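One sweep of the collapsed Gibbs sampler visits each subject in turn and resamples its label from the conditional above. In the sketch below the posterior predictive term is passed in as a callback `log_pred(i, k, members)`, a hypothetical interface that would wrap Corollary 1 (for example, the NIG predictive evaluated on the subjects in `members`); only the Dirichlet-count term is implemented explicitly, and labels are 0-based.

```python
import math
import random


def conditional_probs(i, z, K, alpha, log_pred):
    """p(z_i = k | z_{-i}, X) for k = 0..K-1 under a Dir(alpha/K, ..., alpha/K) prior."""
    N = len(z)
    logw = []
    for k in range(K):
        members = [j for j in range(N) if j != i and z[j] == k]
        prior = (len(members) + alpha / K) / (N - 1 + alpha)   # (N_{k,-i} + alpha/K) / (N - 1 + alpha)
        logw.append(math.log(prior) + log_pred(i, k, members))
    m = max(logw)                                # subtract the max to avoid underflow
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return [v / total for v in w]


def gibbs_sweep(z, K, alpha, log_pred, rng):
    """Resample every label once, in place, using the current labels of the other subjects."""
    for i in range(len(z)):
        z[i] = rng.choices(range(K), weights=conditional_probs(i, z, K, alpha, log_pred))[0]
    return z
```

Working with log weights matters here: predictive densities of whole time series are tiny, and exponentiating them directly would underflow.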
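To sample from $p(\pi, \{a_k, \sigma_k^2\} \mid z, X)$, we sample $\pi$ and then $\{a_k, \sigma_k^2\}$. Sampling $\pi$ is straightforward since $\pi \mid z \sim \mathrm{Dir}(\alpha/K + N_1, \dots, \alpha/K + N_K)$ and $N_k = \sum_i \mathbb{1}(z_i = k)$. Sampling $(a_k, \sigma_k^2)$ is done by sampling from the NIG in (5) following Theorem 1, since $p(a_k, \sigma_k^2 \mid z, X) = p(a_k, \sigma_k^2 \mid \{X_i : z_i = k\})$.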
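The second half of each iteration samples the remaining parameters given the labels. A minimal sketch, assuming the NIG parameterization $\mathcal{N}(a; \mu, \sigma^2\Lambda^{-1})\,\mathrm{IG}(\sigma^2; \alpha, \beta)$ used in Theorem 1: $\pi$ is drawn by normalizing Gamma variates, $\sigma^2$ as the reciprocal of a Gamma variate, and $a$ through a Cholesky factor of the posterior precision. The function names are hypothetical.

```python
import math
import random


def sample_pi(z, K, alpha, rng):
    """pi | z ~ Dir(alpha/K + N_1, ..., alpha/K + N_K), via normalized Gamma(shape, 1) draws."""
    g = [rng.gammavariate(alpha / K + sum(1 for zi in z if zi == k), 1.0) for k in range(K)]
    s = sum(g)
    return [v / s for v in g]


def cholesky(M):
    """Lower-triangular L with M = L L^T (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_nig(mu, Lam, alpha, beta, rng):
    """Draw sigma2 ~ IG(alpha, beta), then a ~ N(mu, sigma2 * Lam^{-1})."""
    sigma2 = 1.0 / rng.gammavariate(alpha, 1.0 / beta)   # Gamma(shape=alpha, scale=1/beta)
    L = cholesky(Lam)
    w = [rng.gauss(0.0, 1.0) for _ in mu]
    # Solve L^T u = w by back substitution, so u ~ N(0, Lam^{-1}).
    n = len(mu)
    u = [0.0] * n
    for i in reversed(range(n)):
        u[i] = (w[i] - sum(L[k][i] * u[k] for k in range(i + 1, n))) / L[i][i]
    return [m + math.sqrt(sigma2) * v for m, v in zip(mu, u)], sigma2
```

In a full sampler, `mu`, `Lam`, `alpha`, `beta` for cluster $k$ would be the posterior parameters of (5) computed from the subjects with $z_i = k$.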
We use weak priors. For the results reported below, we run 3000 MCMC iterations and discard the first 40% of the samples as burn-in.
4.2. Choosing the Number of Clusters
Our model requires the number of clusters $K$ to be chosen a priori. We choose $K$ using Bayesian model selection as well as cross-validation [10,11]. For cross-validation, we divide the dataset into 10 subsets (10-fold cross-validation). Taking each subset in turn as the test set, we use the remaining data as the training set to infer the parameters, and then evaluate the log-likelihood of the test set.
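The cross-validation protocol can be sketched as follows. The fitting routine and the held-out log-likelihood are passed in as hypothetical callbacks (`fit`, `heldout_loglik`), since the text specifies only the 10-fold split and the test-set log-likelihood; forming folds by a seeded shuffle is an assumption.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation after a seeded shuffle."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])


def cv_log_likelihood(subjects, K, fit, heldout_loglik, k=10):
    """Sum of held-out log-likelihoods over the k folds for a model with K clusters."""
    total = 0.0
    for train, test in kfold_indices(len(subjects), k):
        model = fit([subjects[i] for i in train], K)
        total += sum(heldout_loglik(model, subjects[i]) for i in test)
    return total
```

Splitting by subject (rather than by scan) keeps every held-out time series intact, which matches evaluating the log-likelihood of whole test subjects.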
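For Bayesian model selection, we denote by $M_K$ the model with $K$ components, and $X = (X_1, \dots, X_N)$. Assuming $p(M_K)$ is constant in $K$, we have $p(M_K \mid X) \propto p(X \mid M_K)$. Then, finding the optimal $K$ is equivalent to finding the maximum of $p(X \mid M_K)$ with respect to $K$. The density $p(X \mid M_K)$ is evaluated by the integral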
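$$p(X \mid M_K) = \int p(X \mid \theta, M_K)\, p(\theta \mid M_K)\, d\theta, \qquad \theta = \left(\pi, \{a_k, \sigma_k^2\}_{k=1}^K\right) \tag{7}$$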
The Gibbs sampler discussed above generates samples from $p(\theta \mid X, M_K)$. Using $p(\theta \mid X, M_K)$ as a proposal distribution and the generated samples of $\theta$, we evaluate the integral in (7) by importance sampling.
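A generic importance-sampling estimate of an evidence integral like (7) can be sketched as below. How the proposal density is evaluated for the mixture LDS is not detailed in the text, so the log-density callbacks are hypothetical; the toy check uses a conjugate Gaussian model in which the proposal is the exact posterior, so every importance weight equals the evidence and the estimate is exact up to rounding.

```python
import math
import random


def log_mean_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def log_evidence_is(samples, log_lik, log_prior, log_proposal):
    """Importance-sampling estimate of log p(X) = log E_q[p(X | theta) p(theta) / q(theta)]."""
    return log_mean_exp([log_lik(t) + log_prior(t) - log_proposal(t) for t in samples])


def log_normal(v, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


# Toy check: x | theta ~ N(theta, 1), theta ~ N(0, 1), so p(x) = N(x; 0, 2) and the
# posterior N(x/2, 1/2) is used as the proposal.
x = 0.7
rng = random.Random(1)
draws = [rng.gauss(x / 2, math.sqrt(0.5)) for _ in range(200)]
est = log_evidence_is(draws,
                      lambda t: log_normal(x, t, 1.0),
                      lambda t: log_normal(t, 0.0, 1.0),
                      lambda t: log_normal(t, x / 2, 0.5))
print(est, log_normal(x, 0.0, 2.0))   # agree up to rounding error
```

With an approximate posterior as the proposal, the weights are no longer constant and the quality of the estimate depends on how well the proposal covers the posterior mass.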
5. Results
Recall from Sect. 2 that time series data from 365 subjects survived preprocessing. This is the dataset that we analyze. First we determine the number of clusters, then we use the determined number of clusters to sample from the posterior of the model parameters. Finally, we interpret the mean of the posterior.
5.1. Determine the Number of Clusters
Figure 3 shows Bayesian model selection and 10-fold cross-validation as a function of the number of clusters. In each plot, the blue curve shows the log-likelihood vs. the number of clusters. The log-likelihood behaves similarly in both plots, consistent with the common understanding that empirical Bayes and cross-validation tend to give similar results. We observed that as the number of clusters increased, empty clusters were created; in both plots, the orange curve shows the number of nonempty clusters. Note that the log-likelihood increases with the number of clusters and approximately plateaus from 4 clusters onwards. The one exception is when we initialize with 9 clusters, where Bayesian model selection gives 5 nonempty clusters with a slightly higher log-likelihood value. For cross-validation, the average final number of nonempty clusters (over the 10 folds) remains around 4. Considering these results, we use 4 clusters for the dataset.
Fig. 3.

Model selection using Bayesian model selection (a) and cross-validation (b). The y-axis has two scales, corresponding to the log-likelihood value (blue curve) and the number of nonempty clusters (orange curve).
5.2. Interpreting the Model
Having chosen the number of clusters, Table 1 shows the mean of the posterior distribution of $\pi_k$, $\sigma_k$, and $A_k$, which we take as the “fit” of the model to the data. The table does not show the matrices $A_k$ directly; instead it shows their eigenvalues and eigenvectors, which are easier to interpret, as we discuss below. Figure 4 shows the raw SBR time series of the subjects in each cluster.
Table 1.
Posterior means of $\pi_k$, $\sigma_k$, and $A_k$ obtained by averaging the generated samples.
| Class | k = 1 | k = 2 | k = 3 | k = 4 |
|---|---|---|---|---|
| $\pi_k$ | 0.382 | 0.072 | 0.397 | 0.148 |
| $\sigma_k$ | 0.069 | 0.268 | 0.128 | 0.087 |
| $\lambda_1$; $v_1$ | −0.3; (0.4, −0.6, 0.6, −0.4) | −0.7; (0.5, −0.5, 0.5, −0.5) | −0.4; (−0.4, 0.6, −0.6, 0.4) | −0.6; (0.5, −0.5, 0.5, −0.5) |
| $\lambda_2$; $v_2$ | −0.2; (−0.3, 0.7, 0.7, −0.3) | −0.4; (0.7, −0.1, −0.1, 0.7) | −0.3; (0.5, −0.5, −0.5, 0.5) | −0.4; (−0.2, 0.7, 0.7, −0.2) |
| $\lambda_3$; $v_3$ | −0.2; (−0.6, −0.3, 0.3, 0.6) | −0.4; (−0.7, −0.2, 0.2, 0.7) | −0.1; (−0.6, −0.4, 0.4, 0.6) | −0.2; (−0.6, −0.4, 0.4, 0.6) |
| $\lambda_4$; $v_4$ | −0.1; (−0.6, −0.4, −0.4, −0.6) | −0.1; (−0.6, −0.4, −0.4, −0.6) | −0.1; (0.6, 0.4, 0.4, 0.6) | −0.2; (0.6, 0.4, 0.4, 0.6) |
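$A_k$ is represented by its eigenvalues $\lambda_a$ and eigenvectors $v_a$ ($a = 1, \dots, 4$), with eigenvector components ordered (LC, LP, RP, RC). Eigenvectors are shown to the first significant digit to conserve horizontal space.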
Fig. 4.

SBR time series of the subjects in each of the four clusters obtained by Gibbs sampling on the whole dataset.
To interpret the model, first note that cluster 2 has the largest $\sigma_k$ and the smallest $\pi_k$. Also, the SBR trajectories for this cluster (Fig. 4) are more disorganized than those of the other clusters. Quite likely, this cluster captures residual outliers in the data, so we focus on the remaining clusters. For these clusters, the eigenstructure of the $A_k$ is particularly illuminating. All eigenvalues of the $A_k$ are real and negative, so every eigenmode of the SBR dynamics decays with time. Cluster 1 has the least negative eigenvalues, cluster 4 has the most negative eigenvalues, and cluster 3 is intermediate. Thus cluster 4 captures the fastest-progressing subjects, cluster 1 the slowest, and cluster 3 the intermediate ones. The eigenvalues for cluster 4 are almost twice those for cluster 1, suggesting that DaT loss proceeds approximately twice as fast in cluster 4. Further evidence comes directly from the data: Fig. 5 shows histograms of the magnitude of the initial velocity, calculated from the raw SBRs, for each cluster. The medians of the histograms are 0.17, 0.61, 0.27, and 0.36 for clusters 1 to 4, respectively, which is consistent with this speed ordering.
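The velocity check can be reproduced with a short sketch. The text does not specify exactly how the initial velocity is computed; here it is assumed to be the Euclidean norm of the difference between the first two SBR vectors divided by the time between them, and the function names are hypothetical.

```python
import math
from statistics import median


def initial_speed(times, xs):
    """||x_2 - x_1|| / (t_2 - t_1) for one subject's (LC, LP, RP, RC) SBR vectors."""
    dx = [b - a for a, b in zip(xs[0], xs[1])]
    return math.sqrt(sum(v * v for v in dx)) / (times[1] - times[0])


def median_initial_speed(subjects):
    """Median initial speed over (times, xs) pairs, e.g. all subjects assigned to one cluster."""
    return median(initial_speed(t, xs) for t, xs in subjects)
```

Because this statistic uses only raw SBRs and cluster labels, it gives a check on the speed ordering that does not depend on the fitted transition matrices.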
Fig. 5.

Histogram of starting velocity magnitude for each cluster. The medians for the 4 clusters are 0.17, 0.61, 0.27, 0.36 respectively.
The solution to the linear differential equation $\frac{dx}{dt} = Ax$ is determined by the eigenvalues and eigenvectors of $A$: if $A$ has distinct eigenvalues $\lambda_a$ with eigenvectors $v_a$, then $x(t) = \sum_a c_a e^{\lambda_a t} v_a$, with coefficients $c_a$ fixed by the initial condition. Each eigenvector thus spans a direction along which the time series decays, with the eigenvalue setting the rate (time constant $-1/\lambda_a$). Recall from Sect. 3.1 that the eigenvectors of centrosymmetric matrices are either symmetric or skew-symmetric. This symmetry or skew-symmetry represents symmetry or asymmetry of disease progression. To see this, suppose $v$ is a skew-symmetric eigenvector, say $v = (v_1, v_2, -v_2, -v_1)^T$. Because of how we have arranged the caudates and putamina in $x$, the term $c\,e^{\lambda t} v$ adds opposite amounts to mirror-image regions of the two hemispheres, and it goes to zero at rate $|\lambda|$. In other words, a skew-symmetric eigenvector and its eigenvalue capture how an asymmetry between the two hemispheres decreases to zero. Similarly, a symmetric eigenvector, $v = (v_1, v_2, v_2, v_1)^T$, and its eigenvalue capture how a symmetric component (i.e. a weighted “mean” of the SBRs) decreases to zero. Finally, recall from Sect. 1 that the difference between SBR uptake in the caudate and putamen reflects the extent of PD in each brain hemisphere. Thus if $v_1$ and $v_2$ have opposite signs, then a skew-symmetric eigenvector represents asymmetry in the disease across the hemispheres, while a symmetric eigenvector represents the mean disease in both hemispheres.
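The symmetric/skew-symmetric reading can be illustrated with a deliberately simple centrosymmetric matrix, $A = \lambda_s (I + P)/2 + \lambda_k (I - P)/2$, which acts as $\lambda_s$ on symmetric vectors and as $\lambda_k$ on skew-symmetric ones. This toy $A$ has only two distinct eigenvalues, so it is a simplification of the fitted matrices; the rates and baseline below are illustrative values only, and the helper names are hypothetical.

```python
import math


def split_sym_skew(x):
    """Split x = sym + skew with P sym = sym and P skew = -skew (P reverses LC, LP, RP, RC)."""
    px = x[::-1]
    sym = [(a + b) / 2 for a, b in zip(x, px)]    # hemisphere-averaged pattern
    skew = [(a - b) / 2 for a, b in zip(x, px)]   # left-right difference pattern
    return sym, skew


def evolve_toy(x0, lam_sym, lam_skew, t):
    """x(t) = exp(A t) x0 for the toy A = lam_sym (I + P)/2 + lam_skew (I - P)/2."""
    sym, skew = split_sym_skew(x0)
    return [math.exp(lam_sym * t) * s + math.exp(lam_skew * t) * k for s, k in zip(sym, skew)]


# Baseline with a more affected right putamen; asymmetry decays faster than the mean.
x0 = [2.0, 1.6, 1.0, 2.2]          # (LC, LP, RP, RC), illustrative values
for t in (0.0, 2.0, 4.0):
    xt = evolve_toy(x0, lam_sym=-0.2, lam_skew=-0.6, t=t)
    print(t, round(xt[1] - xt[2], 3), round(sum(xt) / 4, 3))
```

The printed left-right putamen difference shrinks as $e^{-0.6t}$ while the four-region mean shrinks only as $e^{-0.2t}$, mirroring the pattern described for the fitted clusters.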
Applying this interpretation to each cluster of Table 1 gives the following. The most negative eigenvalue in each cluster has a skew-symmetric eigenvector. This implies that the loss of asymmetry of disease across hemispheres is the fastest of all modes of SBR change. Also note that the first eigenvector is numerically quite similar (up to sign) in clusters 1, 3, and 4. The next eigenvalue in each cluster has a symmetric eigenvector. The last eigenvector in all clusters represents the “mean” of all four regions and has the slowest eigenvalue; hence the mean (or net) SBR in the four regions is the slowest-changing indicator of early stage PD.
6. Discussion and Conclusion
The results presented above show that PD progression as manifest in DaTscans is heterogeneous, with one subtype (cluster 4) progressing almost twice as fast as the slowest subtype (cluster 1). This is the first significant finding of this paper. The second, and potentially more interesting, finding is that within each subtype (cluster) the SBR asymmetry across hemispheres changes with a shorter time constant than the mean SBR across hemispheres. Moreover, the change in asymmetry is the fastest change among all linear combinations of SBRs. Whether this finding can be used to create a sensitive disease progression index is an open question that we plan to investigate in the future. Such a disease progression index is likely to have significant implications for clinical trials that use DaTscans.
Model mirror symmetry is also a novel idea with broader implications. Extending the MMS idea to high-dimensional data (more structures, or voxel-level rather than ROI-level modeling) could be challenging as the dimension increases. One possible route to higher dimensions is to require $A$ to be sparse or low rank. This will be the focus of forthcoming research.
In conclusion, this paper proposes a mixture LDS model for PD progression in DaTscans. The model reveals that progression in DaTscans is heterogeneous, with a significant range of progression time constants. The model also suggests that the change in asymmetry may be a more sensitive index of PD progression than the change in mean SBR.
Acknowledgements.
This research is supported by the NIH grant R01NS107328. We gratefully acknowledge discussions with Prof. Sule Tinaz of the Dept. of Neurology, Yale University.
The data used in the preparation of this article was obtained from the Parkinson’s Progression Markers Initiative (PPMI) database (up-to-date information is available at http://www.ppmi-info.org). PPMI – a public-private partnership – is funded by the Michael J. Fox Foundation for Parkinson’s Research and multiple funding partners. The full list of PPMI funding partners can be found at ppmi-info.org/fundingpartners.
References
- 1. Au WL, Adams JR, Troiano A, Stoessel AJ: Parkinson’s disease: in vivo assessment of disease progression using positron emission tomography. Mol. Brain Res 134, 24–33 (2005)
- 2. Bernardo JM, Smith AF: Bayesian Theory. Wiley, Hoboken (2009)
- 3. Braak H, Tredici KD, Rub U, de Vos RA, Steur ENJ, Braak E: Staging of brain pathology related to sporadic Parkinson’s disease. Neurobiol. Aging 24(2), 197–211 (2003)
- 4. Eggers C, Kahraman D, Fink GR, Schmidt M, Timmermann L: Akinetic-rigid and tremor-dominant Parkinson’s disease patients show different patterns of FP-CIT single photon emission computer tomography. Mov. Disord 26(3), 416–423 (2011)
- 5. Fonteijn HM, et al.: An event-based model for disease progression and its application in familial Alzheimer’s disease and Huntington’s disease. NeuroImage 60(3), 1880–1889 (2012)
- 6. Hilker R, et al.: Nonlinear progression of Parkinson disease as determined by serial positron emission tomographic imaging of striatal fluorodopa F 18 activity. Archiv. Neurol 62(3), 378–382 (2005)
- 7. Hoehn MM, Yahr MD: Parkinsonism: onset, progression and mortality. Neurology 17, 427–442 (1967)
- 8. Innis RB, et al.: Consensus nomenclature for in vivo imaging of reversibly binding radioligands. J. Cereb. Blood Flow Metab 27(9), 1533–1539 (2007)
- 9. Jedynak BM, et al.: A computational neurodegenerative disease progression score: method and results with the Alzheimer’s disease Neuroimaging Initiative cohort. Neuroimage 63(3), 1478–1486 (2012)
- 10. Kass RE, Raftery AE: Bayes factors. J. Am. Stat. Assoc 90(430), 773–795 (1995)
- 11. McLachlan GJ, Rathnayake S: On the number of components in a Gaussian mixture model. Wiley Interdisc. Rev.: Data Min. Knowl. Discov 4(5), 341–355 (2014)
- 12. Murphy KP: Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge (2012)
- 13. Quinn N, Critchley P, Marsden CD: Young onset Parkinson’s disease. Mov. Disord 2, 73–91 (1987)
- 14. Raj A, Powell F: Models of network spread and network degeneration in brain disorders. Biol. Psychiatry: Cogn. Neurosci. Neuroimaging 3, 788–797 (2018)
- 15. Sudderth EB: Graphical models for visual object recognition and tracking. Ph.D. thesis, Massachusetts Institute of Technology (2006)
- 16. Weaver JR: Centrosymmetric (cross-symmetric) matrices, their basic properties, eigenvalues, and eigenvectors. Am. Math. Mon 92(10), 711–717 (1985)
- 17. Zhou J, Liu J, Narayan VA, Ye J: Modeling disease progression via fused sparse group lasso. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1095–1103. ACM (2012)
- 18. Zubal IG, Early M, Yuan O, Jennings D, Marek K, Seibyl JP: Optimized, automated striatal uptake analysis applied to SPECT brain scans of Parkinson’s disease patients. J. Nucl. Med 48(6), 857–864 (2007)
