NeuroImage: Clinical. 2018 Mar 10;18:802–813. doi: 10.1016/j.nicl.2018.03.007

Development and validation of a novel dementia of Alzheimer's type (DAT) score based on metabolism FDG-PET imaging

Karteek Popuri a,1, Rakesh Balachandar a, Kathryn Alpert b, Donghuan Lu a, Mahadev Bhalla a, Ian R Mackenzie c, Robin Ging-Yuek Hsiung d, Lei Wang b, Mirza Faisal Beg a,*; the Alzheimer's Disease Neuroimaging Initiative1
PMCID: PMC5988459  PMID: 29876266

Abstract

Fluorodeoxyglucose positron emission tomography (FDG-PET) imaging based 3D topographic brain glucose metabolism patterns from normal controls (NC) and individuals with dementia of Alzheimer's type (DAT) are used to train a novel multi-scale ensemble classification model. This ensemble model outputs a FDG-PET DAT score (FPDS) between 0 and 1 denoting the probability of a subject to be clinically diagnosed with DAT based on their metabolism profile. A novel 7 group image stratification scheme is devised that groups images not only based on their associated clinical diagnosis but also on past and future trajectories of the clinical diagnoses, yielding a more continuous representation of the different stages of DAT spectrum that mimics a real-world clinical setting. The potential for using FPDS as a DAT biomarker was validated on a large number of FDG-PET images (N=2984) obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database taken across the proposed stratification, and a good classification AUC (area under the curve) of 0.78 was achieved in distinguishing between images belonging to subjects on a DAT trajectory and those images taken from subjects not progressing to a DAT diagnosis. Further, the FPDS biomarker achieved state-of-the-art performance on the mild cognitive impairment (MCI) to DAT conversion prediction task with an AUC of 0.81, 0.80, 0.77 for the 2, 3, 5 years to conversion windows respectively.

Keywords: FDG-PET, Glucose metabolism, Dementia of Alzheimer's type (DAT), Multi-scale ensemble classifier

Highlights

  • A novel FDG-PET biomarker indicating probability of dementia of Alzheimer's type (DAT) diagnosis

  • A new 7 group image stratification approach representing different stages along the DAT spectrum

  • Comprehensive biomarker evaluation using 2984 ADNI images achieved a good validation AUC of 0.78.

  • Achieved an AUC of 0.81, 0.80, 0.77 for 2, 3, 5 years to conversion on MCI to DAT conversion prediction

1. Introduction

Alzheimer's disease (AD) is a neurodegenerative disorder characterized by the presence of AD pathology (ADP), such as aberrant deposition of amyloid beta (Aβ) proteins and the appearance of neurofibrillary tangles of tau proteins. The initial symptom of AD is cognitive impairment, notably in the memory domain, which gradually involves other domains and leads to a clinical diagnosis of dementia of Alzheimer's type (DAT). Patients with DAT progress to severe stages of dementia, eventually requiring complete assistance with daily activities. DAT is the most common form of dementia, affecting 1 in 9 people over the age of 65 years (Alzheimer's Association, 2015) and as many as 1 in 3 people over the age of 85 (Hebert et al., 2013). As of 2015, an estimated 46.8 million people were living with dementia, a number projected to reach 131.5 million by 2050 (Prince et al., 2016), implying a very sizeable burden on healthcare systems and caregivers worldwide. This impending public health crisis due to rising DAT cases has prompted drug-development efforts to find treatments for AD that can reduce the severity of ADP or remove it altogether (Cummings et al., 2014; 7; Godyń et al., 2016). However, the success of such treatments ultimately depends on the ability to diagnose DAT as early as possible, before irreversible brain damage occurs. Therefore, in recent years there has been a considerable push towards developing robust biomarkers useful for diagnosing DAT in clinical practice (Weiner et al., 2017).

Fluorodeoxyglucose positron emission tomography (FDG-PET) is a minimally invasive neuroimaging technique for quantifying glucose metabolism in the brain, which indirectly measures the underlying neuronal activity (Mosconi et al., 2010). As metabolic disruptions are hypothesized to precede the appearance of cognitive symptoms in AD (Jack et al., 2013), FDG-PET imaging presents itself as an attractive tool for investigating the metabolism changes triggered by ADP across the entire DAT spectrum, ranging from the presymptomatic phase to the mild cognitive impairment (MCI) stage followed by dementia. Our aim in this work is to develop an automatic method that can aid in the interpretation of the 3D topographic metabolism patterns encoded in FDG-PET images for the purpose of DAT diagnosis. To this end, we devised a supervised machine learning framework that takes as input a FDG-PET image of a subject and outputs a continuous value between 0 and 1, termed the FDG-PET DAT score (FPDS), which indicates the probability that the subject's metabolism profile belongs to the DAT trajectory, i.e., how likely the subject is to be clinically diagnosed with DAT.

One of the main contributions of our work is the introduction of a novel approach for stratifying the imaging data used in the development and validation of the proposed FPDS methodology. Most commonly, imaging biomarker studies employ a 3 group stratification, where the clinical diagnostic labels of NC, MCI and DAT assigned at the time of image acquisition are directly used for grouping the imaging data (Rathore et al., 2017). In contrast, here we present a stratification scheme that groups images based not only on their associated clinical diagnosis but also on past and future clinical diagnoses. Our novel stratification is able to more faithfully represent the different diagnostic trajectories observed in a real-world clinical setting when compared to the stratification depending only on the diagnosis at a single timepoint. For instance, based on our stratification, we can distinguish among NC images that stay NC (stable NC, sNC) from those that convert to MCI (unstable NC, uNC), and from those that convert to DAT (progressive NC, pNC). A similar delineation is also induced among the MCI and DAT images using our stratification scheme. An important contribution in this paper is the design of a novel multi-scale ensemble classification model for the proposed FPDS computation. The ensemble model consists of several individual classifiers trained on features extracted from the FDG-PET image at multiple scales. The probability predictions from each of these individual classifiers regarding the association of the given FDG-PET image with a DAT trajectory are fused together to obtain a more robust final FPDS prediction. Another noteworthy contribution of our work is the exhaustive and comprehensive statistical evaluation approach used to validate the FPDS predictions. 
First, the training model fit was evaluated, and then a pseudo-independent test sample consisting of follow-up images corresponding to the baseline training data was used to obtain a more accurate estimate of the ensemble model's generalization error. Finally, the predictive performance of the FPDS biomarker was evaluated on a large, completely independent validation set of images taken from different stages of the DAT spectrum, demonstrating a strong generalization potential of the reported results. To the best of our knowledge, ours is the largest FDG-PET based imaging biomarker study reported to date. Our study goals align with the phase 3 aims of the structured five-phase AD biomarker development framework that was recently proposed (Garibotto et al., 2017), and the results presented in our paper add to the currently available evidence supporting the use of FDG-PET as a diagnostic tool for DAT (Frisoni et al., 2017).

2. Materials and methods

2.1. Study participants

Data used in the preparation of this article were obtained from the ADNI database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by principal investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early Alzheimer's disease. To date, ADNI has involved 1887 subjects, each assessed at one or more visits. The clinical diagnosis received by these subjects can be broadly categorized as one of NC, MCI and DAT. A detailed description of the ADNI recruitment procedure, image acquisition protocols and diagnostic criteria can be found at www.adni-info.org, and the inclusion criteria are detailed in Petersen et al. (2010).

2.2. Novel database stratification

We devised a novel stratification scheme to distinguish within the NC, MCI and DAT groups based on the past and future clinical diagnoses received by the individual (Table 1). Each of these three groups was further divided into subgroups based on the diagnoses received during follow-up. The subgroups are named according to the convention ‘prefixGroup’, where ‘Group’ is the clinical diagnosis obtained at the imaging visit, and ‘prefix’ signifies the past or future clinical diagnoses of the same individual. Images associated with a clinical diagnosis of NC and a consistent diagnosis of NC during the entire ADNI study period are termed the stable NC (sNC) group. Images associated with a clinical diagnosis of NC from subjects who convert to MCI at future visits are termed unstable NC (uNC). Images associated with a clinical diagnosis of NC from subjects who convert to DAT at future visits are termed progressive NC (pNC). Similarly, images associated with MCI are subgrouped as stable MCI (sMCI) and progressive MCI (pMCI), based on a persistent MCI diagnosis and conversion to a DAT diagnosis, respectively, during subsequent follow-up. Images with a clinical diagnosis of DAT from subjects who joined ADNI at the DAT stage, i.e., who converted to a clinical diagnosis of DAT prior to ADNI recruitment and remained DAT at future ADNI visits, are termed stable DAT (sDAT). Images with a clinical diagnosis of DAT from subjects with a recent past ADNI clinical diagnosis of either NC or MCI, i.e., who converted to DAT within the ADNI visits, are termed early DAT (eDAT). Note that a past or future clinical diagnosis visit may or may not include neuroimaging, but the past or future clinical diagnosis enables an enriched staging of each image given the evolution of the clinical diagnosis.

Table 1.

Novel stratification of ADNI images and associated demographic, clinical & biomarker details. The stratification was based on two criteria, clinical diagnosis of subjects at the time of FDG-PET image acquisition and their longitudinal clinical progression. Each image is assigned a membership of the form ‘prefixGroup’, where ‘Group’ is the clinical diagnosis at imaging visit, and ‘prefix’ signals past or future clinical diagnoses. For e.g., an image is designated as pNC if the subject was assigned a NC diagnosis at that particular imaging visit, but the subject converts to DAT at a future timepoint. The eDAT images are associated with the diagnosis of DAT, but the subject had received NC or MCI status during previous ADNI visits (conversion within ADNI window). Whereas, the sDAT images belong to subjects with a consistent clinical diagnosis of DAT throughout the ADNI study window, hence these individuals have progressed to DAT prior to their ADNI recruitment.

| Dementia trajectory | Group name | Clinical diagnosis at imaging | Clinical progression | N^c [images] | Age^d [years] | MMSE^a,d [max. 30] | CSF^a,d [t-tau/Aβ1−42] |
|---|---|---|---|---|---|---|---|
| DAT−^b | sNC: stable NC^e | NC^a | NC → NC | 753 | 75.44 (5.95) | 29.08 (1.17) | 0.37 (0.26) |
| DAT− | uNC: unstable NC | NC | NC → MCI | 110 | 78.93 (4.91) | 29.05 (1.13) | 0.47 (0.32) |
| DAT− | sMCI: stable MCI | MCI^a | NC → MCI or MCI → MCI | 881 | 75.02 (7.77) | 27.86 (1.95) | 0.55 (0.47) |
| DAT+^b | pNC: progressive NC | NC | NC → MCI → DAT | 58 | 78.20 (4.43) | 28.90 (1.29) | 0.59 (0.27) |
| DAT+ | pMCI: progressive MCI | MCI | NC → MCI → DAT or MCI → DAT | 486 | 74.87 (7.12) | 26.77 (2.06) | 0.88 (0.52) |
| DAT+ | eDAT: early DAT | DAT^a | NC → MCI → DAT or MCI → DAT | 232 | 76.59 (6.77) | 22.25 (4.51) | 0.94 (0.62) |
| DAT+ | sDAT: stable DAT^f | DAT | DAT → DAT | 464 | 75.80 (7.49) | 22.02 (3.64) | 1.03 (0.58) |

^a NC: normal controls, MCI: mild cognitive impairment, DAT: dementia of Alzheimer's type, MMSE: mini mental state examination, CSF: cerebrospinal fluid, t-tau: total tau, Aβ1−42: beta amyloid 1-42.

^b DAT+: on the DAT trajectory, i.e., at some point in time, these subjects will be clinically diagnosed as DAT. DAT−: not on the DAT trajectory and will not get a DAT diagnosis in the ADNI window.

^c A total of 2984 FDG-PET images were taken from 1298 subjects. Number of subjects corresponding to images in each of the groups: sNC (360), uNC (52), sMCI (431), pNC (18), pMCI (205), eDAT (133), sDAT (238). Number of subjects with images across multiple groups: uNC & sMCI (18), pNC & pMCI (7), pNC & eDAT (6), pMCI & eDAT (110), pNC & pMCI & eDAT (2).

^d The mean (standard deviation) age, MMSE score and CSF measure values within each group are given. CSF measures were only available for a subset of images in each of the groups: sNC (384), uNC (48), sMCI (470), pNC (24), pMCI (205), eDAT (66), sDAT (230).

^e Baseline sNC: N=360, Age: 73.81 (6.07), MMSE: 29.05 (1.22), CSF: 0.36 (0.25). Follow-up sNC: N=393, Age: 76.93 (5.44), MMSE: 29.11 (1.11), CSF: 0.39 (0.28).

^f Baseline sDAT: N=238, Age: 74.93 (7.87), MMSE: 23.22 (2.13), CSF: 1.02 (0.58). Follow-up sDAT: N=226, Age: 76.71 (6.97), MMSE: 20.76 (4.40), CSF: 1.06 (0.58).

The proposed stratification offers a key advantage: its subgroups pNC, pMCI, eDAT and sDAT represent various stages of the DAT trajectory. The pNC subgroup is the earliest, the sDAT subgroup is the most advanced, and the pMCI and eDAT subgroups lie between these extremes along the DAT spectrum. These subgroups are denoted as the DAT+ class of images, indicating their trajectory towards DAT. The subjects in the sNC, uNC and sMCI subgroups have no follow-up clinical diagnosis of DAT during the ADNI window; although they could progress to a clinical diagnosis of DAT after ADNI, for the purposes of the analysis in this paper these subgroups are considered not to be on the DAT+ trajectory and are hence denoted as DAT−.
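The labeling rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `stratify` and `trajectory` are hypothetical, a subject's history is assumed to be a chronological list of "NC", "MCI" or "DAT" diagnoses, and reversions (e.g., MCI back to NC) are not treated specially because the text does not describe how they are handled.

```python
from typing import List

# Groups on the DAT trajectory, as listed in the text.
DAT_PLUS = {"pNC", "pMCI", "eDAT", "sDAT"}


def stratify(diagnoses: List[str], visit: int) -> str:
    """Return the 'prefixGroup' label for the image acquired at index `visit`.

    `diagnoses` is the chronological list of clinical diagnoses of one
    subject within the observed study window (hypothetical input format).
    """
    current = diagnoses[visit]
    past, future = diagnoses[:visit], diagnoses[visit + 1:]
    if current == "NC":
        if "DAT" in future:
            return "pNC"   # NC now, DAT at a later visit
        if "MCI" in future:
            return "uNC"   # NC now, MCI (but never DAT) later
        return "sNC"       # NC throughout the remaining window
    if current == "MCI":
        return "pMCI" if "DAT" in future else "sMCI"
    if current == "DAT":
        # Conversion happened inside the window if an earlier visit was not DAT.
        return "eDAT" if any(d != "DAT" for d in past) else "sDAT"
    raise ValueError(f"unknown diagnosis: {current!r}")


def trajectory(group: str) -> str:
    """Map a group label to its dementia trajectory class."""
    return "DAT+" if group in DAT_PLUS else "DAT-"
```

For a subject with the history NC, MCI, DAT, the three visits would be labeled pNC, pMCI and eDAT, which is how one subject can contribute images to several groups.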

2.3. MRI processing

Pre-processing of the 3D structural MPRAGE T1-weighted MRI images from ADNI included standard intensity normalization to remove image geometry distortions arising from gradient non-linearity, B1 calibrations to correct for image intensity non-uniformities and N3 histogram peak sharpening (http://adni.loni.usc.edu/methods/mri-analysis/mri-pre-processing). The pre-processed images were segmented into the gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) tissue regions (Dale et al., 1999) using the Freesurfer software package (https://surfer.nmr.mgh.harvard.edu). A rigorous quality control procedure was employed to manually identify and correct any errors in the automated tissue segmentations following Freesurfer's troubleshooting guidelines. Subsequently, the GM tissue region was parcellated into 85 different anatomical ROIs using Freesurfer's cortical (Desikan et al., 2006) and subcortical (Fischl et al., 2002) labeling pipelines.

2.4. FDG-PET processing

The ADNI FDG-PET images used in this study were pre-processed using a series of steps to mitigate inter-scanner variability and obtain FDG-PET data with a uniform spatial resolution and intensity range for further analysis (http://adni.loni.usc.edu/methods/PET-analysis/pre-processing). Briefly, the original raw FDG-PET frames were co-registered and averaged to obtain a single FDG-PET image, which was then mapped from its native space to a standard 160 × 160 × 96 image grid with 1.5 × 1.5 × 1.5 mm³ voxels. After standardizing the spatial resolution and orientation, the intensity range of the FDG-PET image was normalized such that the average intensity of all the foreground voxels in the image was exactly equal to one. The intensity normalized images were then filtered using scanner-specific filter functions to obtain FDG-PET data at a uniform smoothing level of an isotropic 8 mm full width at half maximum (FWHM) Gaussian kernel.
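The intensity normalization step can be illustrated with a minimal sketch. It assumes the image is given as a flat list of voxel intensities and that foreground voxels are those above a background threshold; the threshold and the function name `normalize_foreground_mean` are assumptions of this example, not details of the ADNI pipeline.

```python
from typing import List


def normalize_foreground_mean(voxels: List[float], background: float = 0.0) -> List[float]:
    """Scale intensities so that the mean over foreground voxels equals 1.

    Foreground is taken to be every voxel strictly above `background`
    (an assumption made for this sketch).
    """
    foreground = [v for v in voxels if v > background]
    if not foreground:
        raise ValueError("image has no foreground voxels")
    mean_fg = sum(foreground) / len(foreground)
    return [v / mean_fg for v in voxels]
```

After this scaling, intensities from different scanners share a common range, which is what the subsequent SUVR computation relies on.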

2.5. Multi-scale patch-wise FDG-PET SUVR features

In order to better localize the average regional glucose metabolism signal, each of the 85 GM ROIs obtained using Freesurfer was further subdivided into smaller volumetric sub-regions or patches. Our previously proposed adaptive surface patch generation method (Raamana et al., 2015), which is based on k-means clustering, was applied to the 3D image domain to obtain a patch-wise parcellation of the GM ROIs. Instead of subdividing each GM ROI into a fixed number of patches, the number of patches per ROI was adaptively determined using the patch size parameter (m), denoting the number of voxels in each patch. This achieves a patch density (patches in ROI / voxels in ROI) that is uniform (1/m) throughout the image domain, which is desirable as it leads to a compact yet rich description of the entire GM tissue region. The scale-space theory framework (Witkin, 1984) argues for storing the signal at multiple scales in the absence of a priori knowledge regarding the appropriate scale at which to analyze the signal. Motivated by this scale-space idea, we generated 16 different levels of patch-wise parcellations, m ∈ {100, 150, 200, 250, 300, 350, 400, 450, 500, 1000, 1500, 2000, 3000, 4000, 5000, 10000}, to obtain a fine to coarse multi-scale representation of the GM region for capturing the regional glucose metabolism signals at different scales. We note that the patch-wise parcellations were initially generated on the standard MNI ICBM 152 non-linear average T1 template (Grabner et al., 2006) (http://nist.mni.mcgill.ca/?p=858) and were then propagated to each of the target MRI images in our dataset using the large deformation diffeomorphic metric mapping (LDDMM) non-rigid registration (Beg et al., 2005). This template-based parcellation approach ensures a one-to-one correspondence between the target image patches, which is required for the construction of a valid multi-scale FDG-PET feature space in the next step.
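To make the role of the patch size parameter concrete, the sketch below derives the number of patches per ROI at each of the 16 scales. It does not reproduce the k-means based patch generation itself; rounding to the nearest integer and enforcing at least one patch per ROI are assumptions of this example, and the ROI voxel counts passed in are hypothetical.

```python
from typing import Dict

# Patch sizes m (voxels per patch) listed in the text, fine to coarse.
PATCH_SIZES = [100, 150, 200, 250, 300, 350, 400, 450, 500,
               1000, 1500, 2000, 3000, 4000, 5000, 10000]


def patches_in_roi(n_voxels: int, m: int) -> int:
    """Number of patches for an ROI so that patch density is close to 1/m."""
    if n_voxels <= 0 or m <= 0:
        raise ValueError("n_voxels and m must be positive")
    # Assumption: round to nearest, and keep at least one patch per ROI.
    return max(1, round(n_voxels / m))


def patch_counts_per_scale(roi_voxels: Dict[str, int]) -> Dict[int, Dict[str, int]]:
    """Patch count for every ROI at every scale in PATCH_SIZES."""
    return {m: {roi: patches_in_roi(n, m) for roi, n in roi_voxels.items()}
            for m in PATCH_SIZES}
```

Under these assumptions, an ROI of 12000 voxels would be split into 120 patches at m = 100 and into a single patch at m = 10000, so small ROIs collapse to one patch at the coarse scales.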

The FDG-PET images were co-registered with their respective MRI images using the inter-modal linear registration facility (Jenkinson et al., 2002) available as part of the FSL-FLIRT program (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLIRT). The quality of the co-registration was visually checked, and the detected failures were corrected by re-running FSL-FLIRT with a narrower rotation angle search range parameter to avoid getting trapped in local minima. The estimated 9 degrees of freedom (DOF) mapping was used to transfer the patch-wise parcellations from the MRI domain onto the FDG-PET domain. The mean FDG-PET image intensity value in each of the mapped patches was used to calculate the patch-wise standardized uptake value ratios (SUVRs), defined as the mean intensity in a given patch divided by the mean intensity in the brainstem, which was chosen as the reference ROI (Herholz et al., 1990; Sanabria-Diaz et al., 2013; Gray et al., 2012; Herholz et al., 2002). This resulted in a total of M = 17 (including the original Freesurfer parcellation) patch-wise FDG-PET SUVR feature vectors that encoded the multi-scale regional glucose metabolism information derived from a given target FDG-PET image.
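The SUVR definition above amounts to a ratio of two means. The following sketch assumes the co-registered image and one parcellation are available as parallel flat lists of voxel intensities and patch labels; the data layout and the name `patch_suvr` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def patch_suvr(intensities: List[float], labels: List[Hashable],
               reference_label: Hashable) -> Dict[Hashable, float]:
    """Mean intensity of each patch divided by the mean intensity of the
    reference region (the brainstem in the text)."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for value, label in zip(intensities, labels):
        sums[label] += value
        counts[label] += 1
    if counts[reference_label] == 0:
        raise ValueError("reference region has no voxels")
    ref_mean = sums[reference_label] / counts[reference_label]
    # One SUVR feature per patch; the reference region itself is excluded.
    return {label: (sums[label] / counts[label]) / ref_mean
            for label in sums if label != reference_label}
```

Repeating this for the Freesurfer parcellation and the 16 patch-wise parcellations would give the M = 17 feature vectors described above.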

2.6. FDG-PET DAT score computation via supervised ensemble learning

A supervised classification framework following the well established ensemble learning paradigm was used to calculate the proposed FDG-PET DAT score from the multi-scale patch-wise SUVR feature vectors. The main idea behind ensemble based supervised classification is to combine several individually trained classifiers together to obtain a single, more robust classification model (Dietterich, 2000). Accordingly, in the proposed framework, classifiers were trained separately on each of the individual multi-scale feature vector spaces to construct a classifier ensemble. Then, a fusion of the multiple predictions from individual classifiers in the ensemble was performed, yielding the ensemble model estimate about the probability of the input multi-scale feature vectors belonging to the DAT+ trajectory. This probabilistic prediction output by the ensemble classification framework was taken to be the proposed FDG-PET DAT score.

The training samples corresponding to the DAT− and DAT+ classes needed for building the ensemble classification model were given by the baseline sNC (N=360) and sDAT (N=238) images respectively (Table 1, footnotes e and f). The proposed M multi-scale patch-wise FDG-PET SUVR feature vectors were extracted from all the training samples. To prevent over-fitting of the ensemble model to the chosen training sample set, the subagging approach (Buhlmann, 2003) was employed to randomly generate F = 100 subsets of training samples. The random sampling was performed using a sampling ratio of γ = 0.8 in a stratified manner to avoid class imbalance, ensuring an equal number of samples from both the DAT− and DAT+ classes, i.e., N_train = 2 × ⌊0.8 × 238⌋ = 380 samples in each of the F training subsets. An ensemble of M × F probabilistic kernel classifiers (Damoulas and Girolami, 2008) was trained, with one classifier for each combination of the M feature spaces and the F training subsets. The classifier training was preceded by a t-statistic based feature selection step to identify the k = ⌊N_train/10⌋ = 38 most discriminative features within the feature vector and thereby address the “curse of dimensionality” issue (Raamana et al., 2015). Each of the M × F = 1700 trained probabilistic kernel classifiers outputs a continuous scalar p_i ∈ [0, 1], i = 1, …, M × F, that denotes the probability of an input feature vector belonging to the DAT+ class (1 − p_i being the DAT− class membership probability). The FDG-PET DAT score is then simply defined as the mean of the DAT+ class probability predictions obtained from each of the M × F classifiers.
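The sample-size arithmetic of the subagging step can be checked with a small sketch. It only draws the class-balanced training subsets; the classifiers themselves are not reproduced. The function name, the fixed random seed and the use of integer image identifiers are assumptions of this example.

```python
import math
import random
from typing import List


def stratified_subsets(neg_ids: List[int], pos_ids: List[int], ratio: float = 0.8,
                       n_subsets: int = 100, seed: int = 0) -> List[List[int]]:
    """Draw class-balanced random subsets without replacement (subagging).

    Each subset holds floor(ratio * size of the smaller class) samples
    from each class, so both classes are equally represented.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    per_class = math.floor(ratio * min(len(neg_ids), len(pos_ids)))
    return [rng.sample(neg_ids, per_class) + rng.sample(pos_ids, per_class)
            for _ in range(n_subsets)]


# Baseline sNC (360) and sDAT (238) images, identified by hypothetical integer ids.
subsets = stratified_subsets(list(range(360)), list(range(1000, 1238)))
n_train = len(subsets[0])           # 2 * floor(0.8 * 238)
k_features = n_train // 10          # features kept by the selection step
n_classifiers = 17 * len(subsets)   # M feature spaces times F subsets
```

With the group sizes from the text this gives 380 samples per subset, 38 selected features and 1700 classifiers, matching the values quoted above.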

In summary, given an unseen “test” sample containing a FDG-PET/MRI image pair, we first extract the M multi-scale patch-wise SUVR feature vectors from the images, and then reduce the dimensionality of each of these feature vectors by retaining only the k most discriminative features that were identified during the training phase. The pruned feature vectors are fed to the previously trained M × F classifier ensemble to obtain M × F probability predictions regarding the DAT+ class membership, which are then averaged to obtain the FDG-PET DAT score corresponding to the given test sample.
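The final fusion step is a plain average of the classifier outputs. The sketch below assumes the M × F probabilities have already been computed; the function names are hypothetical, and treating a score exactly at the threshold as DAT+ is an assumption, since the text only states that a threshold of 0.5 is used.

```python
from typing import Sequence


def fpds(probabilities: Sequence[float]) -> float:
    """FDG-PET DAT score: mean of the DAT+ probabilities from all classifiers."""
    if not probabilities:
        raise ValueError("at least one classifier prediction is required")
    if any(p < 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("probabilities must lie in [0, 1]")
    return sum(probabilities) / len(probabilities)


def predict_trajectory(score: float, threshold: float = 0.5) -> str:
    """Assign a trajectory from the score (tie handling is an assumption)."""
    return "DAT+" if score >= threshold else "DAT-"
```

Because the score is a mean of probabilities, it stays in [0, 1] and can be read as the ensemble's estimate of DAT+ membership.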

3. Results

Our study dataset consisted of 2984 FDG-PET images (with corresponding structural MRI images), belonging to 1294 ADNI subjects, who have undergone imaging and clinical evaluations at one or more longitudinal time points. The images were stratified into one of the 7 study groups based on the clinical diagnosis received at the time of image acquisition and the clinical diagnosis received previously and/or during subsequent follow-up time points (Table 1).

In the proposed stratification scheme, we distinguish among the images that have a clinical diagnosis of NC (sNC, uNC, pNC) at the imaging visit. Within this NC group, there are NC that will stay NC, i.e., stable NC (sNC, N=753 images), convert to MCI, i.e., unstable NC (uNC, N=110 images) or convert to DAT, i.e., progressive NC (pNC, N=58 images), and hence even though all are NC, the images are treated as distinct subgroups of the NC group given the divergent future evolution of their clinical diagnosis. In a similar fashion, we distinguish among the images with a clinical diagnosis of MCI as consisting of those who will continue to stay MCI throughout ADNI, i.e., stable MCI (sMCI, N=881 images), or convert to DAT at a future visit, i.e., progressive MCI (pMCI, N=486 images). Finally, we distinguish among those images that have an associated clinical diagnosis of DAT. Those DAT that had a previous clinical diagnosis of NC or MCI, i.e., joined ADNI as either NC or MCI and converted to DAT during ADNI, are denoted as the early DAT group (eDAT, N=232 images) given their recent conversion, whereas those that joined ADNI with a clinical diagnosis of DAT, so that their conversion was prior to their ADNI recruitment, and remained DAT throughout the ADNI window are designated as the stable DAT group (sDAT, N=464 images). There are 110 individuals with FDG-PET images at both the pMCI and the eDAT stages, i.e., these individuals underwent conversion from MCI to DAT during the ADNI window and this conversion was sampled with neuroimaging.

3.1. Demographic, clinical & biomarker values across groups

The 7 stratified image sets were compared for group-level differences in their associated age, mini mental state exam (MMSE) score and CSF t-tau/Aβ1−42 measure (ratio of total tau to beta amyloid 1-42) values. Pairwise significance testing of the group mean value differences was performed between all the groups, using the t-test in the case of normally distributed data and the Wilcoxon rank sum test for the non-parametric data distribution case. The p-values obtained from each of the pairwise significance tests are reported in Table 2. The statistical significance threshold was set at p <0.001. The mean age was observed to be statistically similar across all the groups except for the uNC and pNC groups which exhibited significantly higher ages. The mean MMSE scores were significantly higher among the sNC, uNC and pNC groups when compared to either the sMCI and pMCI groups or the eDAT and sDAT groups. The DAT− (sNC, uNC, sMCI) groups had significantly lower mean CSF t-tau/Aβ1−42 measures when compared to the DAT+ (pNC, pMCI, eDAT, sDAT) groups apart from the two cases where pNC showed statistically similar CSF t-tau/Aβ1−42 measures compared to uNC and sMCI respectively.

Table 2.

The p-values corresponding to the significance of the pairwise group differences in the age, MMSE score and CSF t-tau/Aβ1−42 measure values among the 7 stratified groups. The t-test or Wilcoxon ranksum test was used depending on if the data followed a normal distribution or not. The cases where the group mean values were significantly (p <0.001) different are highlighted in bold and the cases where data followed a normal distribution are underlined.

Groups Age MMSE CSF
sNC-uNC <0.0001 0.5276 0.0046
sNC-sMCI 0.8034 <0.0001 <0.0001
sNC-pNC <0.0001 0.3760 <0.0001
sNC-pMCI 0.4997 <0.0001 <0.0001
sNC-eDAT 0.0211 <0.0001 <0.0001
sNC-sDAT 0.0932 <0.0001 <0.0001
uNC-sMCI <0.0001 <0.0001 0.5432
uNC-pNC 0.3340 0.6900 0.0170
uNC-pMCI <0.0001 <0.0001 <0.0001
uNC-eDAT 0.0003 <0.0001 <0.0001
uNC-sDAT <0.0001 <0.0001 <0.0001
sMCI-pNC 0.0028 <0.0001 0.0555
sMCI-pMCI 0.6312 <0.0001 <0.0001
sMCI-eDAT 0.0149 <0.0001 <0.0001
sMCI-sDAT 0.1029 <0.0001 <0.0001
pNC-pMCI 0.0005 <0.0001 0.0029
pNC-eDAT 0.0290 <0.0001 0.0055
pNC-sDAT 0.0181 <0.0001 <0.0001
pMCI-eDAT 0.0046 <0.0001 0.8320
pMCI-sDAT 0.0424 <0.0001 0.0047
eDAT-sDAT 0.2709 0.0945 0.1072

Automatic salient ROI selection for FPDS computation

The feature selection phase of the ensemble classification model training identified several ROIs that contained strong discriminatory FDG uptake information useful for separating the DAT− and DAT+ classes. Specifically, each of the individual 1700 classifiers in the ensemble model automatically selected a set of 38 most discriminative ROIs from which the multi-scale patch-wise FDG-PET SUVR features were taken and used to compute the FPDS. In Table 3, selection frequencies of the ROIs chosen by the classifier ensemble are listed. The selection frequency of a ROI is defined as the fraction of the classifiers in the ensemble that chose the particular ROI. Interestingly, ROIs from the left hemisphere exhibited much higher selection frequencies compared to the corresponding right hemisphere ROIs. Further, the cortical ROIs had far greater selection frequencies than the subcortical ROIs. In particular, the isthmus and posterior parts of the cingulate gyrus, the precuneus and the inferior and middle temporal gyri had very high (>90%) total (left and right averaged) selection frequencies.
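The selection frequency defined above can be computed as follows. This sketch assumes each classifier's selection is available as a collection of ROI names; the function name and the input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def selection_frequency(selected_per_classifier: List[Iterable[str]]) -> Dict[str, float]:
    """Percentage of classifiers in the ensemble that selected each ROI."""
    n_classifiers = len(selected_per_classifier)
    if n_classifiers == 0:
        raise ValueError("ensemble is empty")
    # set() ensures an ROI is counted once per classifier even when several
    # of its patches were selected by that classifier.
    counts = Counter(roi for selected in selected_per_classifier for roi in set(selected))
    return {roi: 100.0 * count / n_classifiers for roi, count in counts.items()}
```

The total frequency used for ordering is the average of the left and right values; for the precuneus, for example, (100.00 + 83.88) / 2 = 91.94%.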

Table 3.

Most discriminative ROIs chosen by the ensemble classification model. The ROIs are listed in descending order of their total (left and right averaged) selection frequency. Note that only ROIs with a non-zero selection frequency (selected at least once) are shown.

ROI name Frequency (%) [Left | Right]
Isthmuscingulate 100.00 | 99.65
Precuneus 100.00 | 83.88
Inferiortemporal 99.82 | 83.35
Posteriorcingulate 96.12 | 85.06
Middletemporal 99.35 | 80.71
Inferiorparietal 99.18 | 64.94
Supramarginal 67.41 | 26.06
Entorhinal 57.94 | 32.53
Hippocampus 47.82 | 32.00
Bankssts 27.76 | 15.82
Rostralmiddlefrontal 24.94 | 17.18
Amygdala 22.18 | 17.29
Parahippocampal 28.00 | 10.06
Caudalmiddlefrontal 22.76 | 13.18
Fusiform 24.29 | 0.53
Medialorbitofrontal 12.76 | 10.29
Superiorfrontal 14.29 | 5.94
Superiortemporal 11.94 | 5.24
Lateralorbitofrontal 12.18 | 2.24
Superiorparietal 11.41 | 3.00
Parsopercularis 9.88 | 1.06
Temporalpole 9.35 | 0.18
Rostralanteriorcingulate 5.18 | 0.00
Frontalpole 0.82 | 0.82
Caudate 0.71 | 0.00
Parstriangularis 0.35 | 0.00
Parsorbitalis 0.18 | 0.00

3.2. FPDS distribution among training (sNC and sDAT) groups

In Fig. 1, the distributions of FPDS values among the baseline and follow-up images from the sNC and sDAT groups are shown. As the baseline images were used for training the ensemble model, the FPDS values for the baseline images were determined via the out-of-bag prediction approach to avoid biased estimates. In this approach, the FPDS for a given baseline image was computed by fusing only the predictions from classifiers in the ensemble that did not have the given baseline image as part of their subagging training subset. The follow-up images were not involved in the ensemble model training, so they were treated as unseen test samples and their FPDS values were computed using the standard approach of fusing predictions from all the classifiers in the ensemble. It can be seen from Fig. 1 that the FPDS distributions of the sNC and sDAT groups are very well separated, with an excellent (>0.95) area under the curve (AUC) of the receiver operating characteristic (ROC) in both the baseline and follow-up image cases. Moreover, high specificities and sensitivities (balanced accuracies of ∼0.90) were achieved when using a FPDS threshold of 0.5 to classify the baseline and follow-up images as belonging to either the DAT− or the DAT+ trajectory.
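The out-of-bag computation differs from the standard fusion only in which classifiers contribute. In the sketch below, `training_subsets[i]` is assumed to hold the image identifiers used to train classifier i and `predictions[i]` that classifier's DAT+ probability for the image in question; these names and the data layout are hypothetical.

```python
from typing import Hashable, List, Sequence, Set


def out_of_bag_fpds(image_id: Hashable, training_subsets: List[Set[Hashable]],
                    predictions: Sequence[float]) -> float:
    """Average only the predictions of classifiers that never saw `image_id`."""
    oob = [p for subset, p in zip(training_subsets, predictions)
           if image_id not in subset]
    if not oob:
        raise ValueError("image was part of every training subset")
    return sum(oob) / len(oob)
```

For follow-up images, which belong to no training subset, this reduces to the ordinary mean over all classifiers.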

Fig. 1.


FPDS distribution among the sNC and sDAT images and classification performance obtained in assigning images to either the DAT− or DAT+ trajectory using a 0.5 FPDS threshold. The top row presents the out-of-bag predictions on the baseline images, which were used for training the ensemble model. The bottom row shows ensemble model predictions on the follow-up subgroup. The follow-up images were not part of training and hence were considered as unseen test samples for the purpose of FPDS computation. The (number of images: mean FPDS) is shown for each subgroup. Balanced accuracy is the mean of the sensitivity and specificity measures.

3.3. FPDS distribution among the validation image groups

Imaging data from the uNC and sMCI groups that belong to the DAT− trajectory, along with images from the pNC, pMCI and eDAT groups that are on the DAT+ trajectory constituted the independent validation set used for evaluating the proposed ensemble model framework for FPDS computation. In Fig. 2, FPDS distributions across these independent validation image groups are shown. In general, the mean FPDS values among the DAT− trajectory groups (<0.4) were much lower compared to the FPDS group means across the DAT+ trajectory groups (>0.6), except for the pNC group which had a mean FPDS value of 0.35 which was similar to that of the DAT− groups. It should however also be noted that the pNC group contained far fewer images (N=58) in comparison to the other groups. Overall, there was a good degree of separation between the DAT− and DAT+ FPDS distributions resulting in an AUC of 0.78. Further, this separability translated into a balanced accuracy of 0.70 when the images were classified into either the DAT− or the DAT+ trajectory using a 0.5 FPDS threshold.

Fig. 2.

The FPDS distribution among validation image groups and the classification performance obtained in determining dementia trajectories (DAT− or DAT+) for these images using a 0.5 FPDS threshold. The FPDS histograms corresponding to the groups on the DAT− (uNC, sMCI) and the DAT+ (pNC, pMCI, eDAT) trajectories are stacked together respectively. The (number of images: mean FPDS) is shown for each group. Balanced accuracy is mean of sensitivity and specificity.
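The two summary measures used in these figures, AUC and balanced accuracy at a 0.5 FPDS threshold, can be sketched in a few lines of Python. The scores below are made-up toy values, not ADNI data, and the pairwise AUC formula is one standard way of computing the area under the ROC curve; the paper does not state which implementation was used.

```python
def roc_auc(neg_scores, pos_scores):
    """AUC as the fraction of (DAT+, DAT-) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def balanced_accuracy(neg_scores, pos_scores, threshold=0.5):
    """Mean of sensitivity and specificity; a score above threshold is labeled DAT+."""
    sensitivity = sum(s > threshold for s in pos_scores) / len(pos_scores)
    specificity = sum(s <= threshold for s in neg_scores) / len(neg_scores)
    return (sensitivity + specificity) / 2.0

# Hypothetical FPDS values for images on each trajectory.
toy_dat_neg = [0.1, 0.2, 0.4, 0.7]
toy_dat_pos = [0.3, 0.6, 0.8, 0.9]

toy_auc = roc_auc(toy_dat_neg, toy_dat_pos)
toy_bacc = balanced_accuracy(toy_dat_neg, toy_dat_pos)
```

Because AUC depends only on the ranking of scores, it is independent of the 0.5 threshold, while balanced accuracy is tied to that specific cut-off.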

3.4. FPDS trend across age ranges in validation image groups

The mean FPDS values and classification accuracies (based on 0.5 FPDS threshold) obtained from the validation image subsets taken across different age ranges within the uNC and sMCI groups (DAT− trajectory), and the pNC, pMCI and eDAT groups (DAT+ trajectory) are presented in Fig. 3. The FPDS means in the sMCI group gradually increased from less than 0.2 in the younger age ranges (55–70 years) to greater than 0.5 among the older age ranges (85–95 years). This wide and gradual variation of FPDS values manifested as a steady decrease in the accuracy of identifying the sMCI images as DAT−, from above 0.85 in the younger age ranges (55–70 years) to below 0.5 in the older age ranges (85–95 years). The eDAT group exhibited uniformly high FPDS mean values across all the age ranges, lying in a short interval of 0.74–0.9. Consequently, a majority of the eDAT images were correctly classified as DAT+, leading to a high overall accuracy of 0.89. In contrast to the sMCI and eDAT groups, no consistent age-related patterns of FPDS mean values and classification accuracies were observed among the uNC, pNC and pMCI groups. The pMCI group displayed relatively high FPDS mean values (>0.67) in three disjoint age ranges, 55–60, 70–75 and 85–90 years, and accordingly the classification accuracies of 0.85, 0.71 and 0.81 respectively observed in these age ranges were considerably higher than the overall pMCI group average of 0.68. Surprisingly, the pNC and uNC groups were found to have a similar FPDS mean in the 70–75 years range, and in the following 75–80 and 85–90 years ranges the pNC group had lower FPDS means than the uNC group. This led to mislabeling of most pNC images as DAT−, yielding a very poor overall classification accuracy of 0.28.

Fig. 3.

Age-based analysis of FPDS score: heat map plots showcasing the trend of mean FPDS (top) and classification accuracy (bottom) obtained across different age ranges within each of the validation image groups. The classification accuracies were calculated using a 0.5 FPDS threshold. The number of images in a (image group, age range) is printed on the corresponding heat map cell, while the total number of images within a group is shown in parentheses under each column of the heat maps. The overall mean FPDS and classification accuracy within a group are given above respective heat map columns.

3.5. FPDS versus time to conversion in progressive image groups

In Fig. 4, the mean FPDS values and classification accuracies (based on 0.5 FPDS threshold) computed from image subsets taken across different ranges of time to conversion within the pNC and pMCI groups are shown. The time to conversion is defined as the number of years from the image scan date to the earliest future timepoint at which the subject associated with the image was given a clinical diagnosis of DAT. The pMCI group exhibited relatively high mean FPDS values (0.64–0.71) in the 0–3 years to conversion range. However, in the later time to conversion ranges, especially beyond the 4 years to conversion range, considerably lower FPDS means (0.26–0.46) were observed. Therefore, for the pMCI group, good classification accuracies (0.7–0.78) were only observed in the 0–3 years to conversion range, past which the pMCI images were frequently misclassified as DAT−, reducing the overall accuracy to 0.68. The pNC group showed low FPDS mean values (0.17–0.52) across all the time to conversion ranges, leading to incorrect labeling of more than 72% of the pNC images as DAT− (0.28 overall classification accuracy).
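The time to conversion definition given above can be expressed as a short Python sketch. The record layout (a list of visit date and diagnosis label pairs), the label string "DAT" and the 365.25-day year are assumptions for illustration; they are not taken from the ADNI data format.

```python
from datetime import date

def years_to_conversion(scan_date, diagnoses):
    """Years from the scan date to the earliest later visit with a DAT diagnosis.

    diagnoses: list of (visit_date, label) tuples (hypothetical layout).
    Returns None if the subject has no DAT diagnosis after the scan.
    """
    dat_dates = [visit for visit, label in diagnoses
                 if label == "DAT" and visit > scan_date]
    if not dat_dates:
        return None
    return (min(dat_dates) - scan_date).days / 365.25

# Toy subject: MCI at the first two visits, DAT from 2012 onwards.
toy_visits = [
    (date(2010, 1, 1), "MCI"),
    (date(2011, 1, 1), "MCI"),
    (date(2012, 1, 1), "DAT"),
    (date(2013, 1, 1), "DAT"),
]
toy_ttc = years_to_conversion(date(2010, 1, 1), toy_visits)
```

Images would then be grouped into time to conversion ranges (for example 0–1, 1–2 years) before averaging FPDS within each range.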

Fig. 4.

Heat maps showing variation of mean FPDS (left) and classification accuracy (right) across different time to conversion ranges in the progressive image groups (pNC and pMCI). The time to conversion indicates the number of years from the image scan date to the first clinical diagnosis of DAT for the subject associated with the image. A FPDS threshold of 0.5 was used to calculate the classification accuracies. The number of images in a (image group,time to conversion range) is printed on the corresponding heat map cell, while the total number of images within a group is shown in parentheses under each column of the heat maps. The overall mean FPDS and classification accuracy within a group are given above respective heat map columns.

3.6. Correlation between FPDS and CSF t-tau/Aβ1−42

To investigate the association of FPDS with established ADP measures, Pearson correlation analysis was performed between the CSF t-tau/Aβ1−42 and FPDS values. The correlation results obtained across the training (Fig. 5) and independent validation (Fig. 6) image groups are reported. In the case of training image groups, correlation analysis was performed using the combined set of baseline and follow-up images in each of the sNC and sDAT groups respectively. The spread of t-tau/Aβ1−42 values in the sNC group (0.1–1.54) was relatively narrow compared to the sDAT group (0.15–3.6). Both the sNC and sDAT groups showed a weak yet positive correlation between the t-tau/Aβ1−42 and FPDS, with the sDAT group showing a relatively stronger correlation coefficient (r=0.13) that was also statistically significant (p=0.0489). Among the validation image groups, the t-tau/Aβ1−42 values of the NC groups (uNC and pNC) had a relatively narrow range (0.11–1.47) compared to the other DAT− (sMCI) and DAT+ (pMCI and eDAT) groups. In general, the FPDS was weakly, but positively, correlated with t-tau/Aβ1−42. The correlation coefficient ranged between 0.13 and 0.31 among the various groups considered. However, only the correlation coefficients exhibited by the DAT− groups (uNC and sMCI) were found to be statistically significant (p=0.0380 and p <0.0001), whereas the DAT+ (pNC, pMCI and eDAT) groups only exhibited a trend of positive correlations with r-values in the 0.13–0.24 interval.
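For reference, the Pearson correlation coefficient used in this analysis can be computed as in the following Python sketch. The paired values are invented toy numbers, not CSF or FPDS measurements from the study, and the function name is hypothetical. The significance test (p-value) is omitted here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical paired measurements for four images.
toy_ratio = [0.2, 0.4, 0.6, 0.8]   # t-tau/Abeta1-42
toy_fpds = [0.1, 0.3, 0.2, 0.6]    # FPDS

toy_r = pearson_r(toy_ratio, toy_fpds)
```

A coefficient near 0 indicates a weak linear relationship, which is why values such as r=0.13 are described as weak even when they are statistically significant in a large sample.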

Fig. 5.

Pearson correlation between CSF t-tau/Aβ1−42 and FPDS across the sNC and sDAT images (baseline and follow-up combined). The CSF t-tau/Aβ1−42 measures were only available for a subset of images and their numbers are shown in parentheses. The statistical significance threshold for correlation coefficient (r) was set at p <0.05. The FPDS distribution and classification accuracy obtained using a 0.5 FPDS threshold within the τ/Aβ − (t-tau/Aβ1−42 <= 0.52) and τ/Aβ + (t-tau/Aβ1−42 > 0.52) sub-groups are also shown.

Fig. 6.

Pearson correlation between CSF t-tau/Aβ1−42 and FPDS values across the independent validation image groups. The numbers of images with CSF t-tau/Aβ1−42 measures available are given in parentheses. Correlation coefficient (r) was considered significant at p <0.05. The FPDS distribution and classification accuracy obtained using a 0.5 FPDS threshold within the τ/Aβ − (t-tau/Aβ1−42 <= 0.52) and τ/Aβ + (t-tau/Aβ1−42 > 0.52) sub-groups are also shown.

The potential influence of t-tau/Aβ1−42 on FPDS was further characterized by generating the FPDS distributions among the sub-groups associated with low and high t-tau/Aβ1−42 values respectively. A previously published t-tau/Aβ1−42 cut-off of 0.52 (Duits et al., 2014) was used to define the τ/Aβ − (low-risk AD: t-tau/Aβ1−42 <= 0.52) and τ/Aβ + (high-risk AD: t-tau/Aβ1−42 > 0.52) sub-groups within each of the 7 stratified image groups. In Fig. 5, it can be seen that, for both the sNC and sDAT, the mean FPDS value is lower among the τ/Aβ − compared to the τ/Aβ +. Accordingly, for the sNC which belong to the DAT− trajectory, the classification accuracy (using a 0.5 FPDS threshold) in the τ/Aβ − is higher compared to τ/Aβ +. On the other hand, for the sDAT which are on the DAT+ trajectory, the classification accuracy is higher among the τ/Aβ + instead. A similar trend can also be observed among the validation groups (uNC, sMCI, pNC, pMCI, eDAT), as shown in Fig. 6, where all the groups show lower mean FPDS values among the τ/Aβ − in comparison to τ/Aβ +. Furthermore, the classification accuracies are higher in the τ/Aβ − for the DAT− groups (uNC, sMCI) compared to τ/Aβ +, whereas they are higher in the τ/Aβ + for the DAT+ groups (pNC, pMCI, eDAT).
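The sub-group analysis described above can be sketched in Python as follows. The 0.52 cut-off and the 0.5 FPDS threshold come from the text; the record layout, key names and toy values are assumptions for illustration and do not reproduce the study data.

```python
TAU_ABETA_CUTOFF = 0.52   # t-tau/Abeta1-42 cut-off cited in the text (Duits et al., 2014)
FPDS_THRESHOLD = 0.5      # FPDS above this value is labeled DAT+

def subgroup_summary(records, trajectory):
    """Mean FPDS and classification accuracy within the low and high ratio sub-groups.

    records:    list of (t_tau_abeta_ratio, fpds) tuples for one image group
    trajectory: "DAT-" or "DAT+", the true trajectory of that image group
    """
    groups = {"tau/Abeta-": [], "tau/Abeta+": []}
    for ratio, fpds in records:
        key = "tau/Abeta-" if ratio <= TAU_ABETA_CUTOFF else "tau/Abeta+"
        groups[key].append(fpds)

    summary = {}
    for key, scores in groups.items():
        if not scores:
            summary[key] = None  # no images with CSF measures in this sub-group
            continue
        labeled_pos = sum(s > FPDS_THRESHOLD for s in scores)
        correct = labeled_pos if trajectory == "DAT+" else len(scores) - labeled_pos
        summary[key] = {
            "n": len(scores),
            "mean_fpds": sum(scores) / len(scores),
            "accuracy": correct / len(scores),
        }
    return summary

# Hypothetical DAT- image group: (ratio, FPDS) pairs.
toy_records = [(0.30, 0.1), (0.45, 0.3), (0.52, 0.6), (0.70, 0.4), (0.90, 0.8)]
toy_summary = subgroup_summary(toy_records, "DAT-")
```

In this toy DAT− group the low-ratio sub-group has the lower mean FPDS and the higher accuracy, which mirrors the direction of the pattern reported for the DAT− groups.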

4. Discussion

In this paper we report the development and validation of a novel FDG-PET DAT score (FPDS). We computed the FPDS using a multi-scale supervised ensemble learning approach on FDG-PET images. The FPDS is a single scalar value between 0 and 1. It indicates the probability that the brain metabolism profile captured in a subject's FDG-PET image belongs to the DAT+ trajectory. The FPDS was developed in an ensemble machine-learning paradigm trained on FDG-PET images belonging to sNC and sDAT subjects from the ADNI database. FPDS as a DAT biomarker was then comprehensively validated on a large number of ADNI FDG-PET images (N=2984) across the sNC, uNC, sMCI, pNC, pMCI, eDAT and sDAT stratification.

4.1. Real-world stratification scheme

The proposed stratification of imaging data into the 7 groups (Table 1) provided a clinically relevant perspective for the development of the FPDS framework. In particular, the stratification scheme helped establish a clear delineation between images taken from subjects on the DAT− and DAT+ trajectories, thus formulating DAT biomarker discovery as a supervised machine learning problem of building a classification model that can predict the probability of an image belonging to either the DAT− or DAT+ class. Most previous studies on imaging biomarkers were limited to stratifying images based on the NC, MCI and DAT diagnostic labels assigned at the time of image acquisition (Rathore et al., 2017). However, in recent years there has been interest in developing early stage DAT biomarkers adopting a sMCI/pMCI stratification of images associated with a clinical diagnosis of MCI (Tong et al., 2017). Our novel approach extends this MCI image stratification idea to the entire DAT spectrum by also stratifying the NC and DAT images into the sNC/uNC/pNC and eDAT/sDAT groups respectively. This enabled the validation of the FPDS in a realistic experimental setting that is quite close to a practical clinical setup, where images from the uNC, sMCI, pNC, pMCI and eDAT groups were completely blinded from the trained ensemble classification model. We put forth our stratification approach as an ideal benchmark to evaluate future DAT biomarker methods.

4.2. Characteristics of the stratified groups

The training and validation image sets used in our analysis were found to be unbiased with respect to the associated relevant non-imaging phenotypic information, which justifies omitting non-imaging covariates from the proposed supervised learning framework. The age, MMSE and CSF t-tau/Aβ1−42 values observed across the stratified groups did not reveal any anomalous group difference patterns (Table 1, Table 2) that could potentially confound the proposed FDG-PET imaging based analysis. Most importantly, the sNC and sDAT groups used for training the FPDS model had similar mean ages and, as expected, the sNC group had a significantly higher MMSE but a significantly lower t-tau/Aβ1−42 compared to the sDAT group. Moreover, mean ages among the sMCI, pMCI and eDAT groups in the validation image set were also comparable to the training groups. The other two validation groups, namely uNC and pNC, had slightly, yet statistically significantly, higher mean ages (∼3 years older) than the training groups. However, this significant group difference might just be reflective of a sampling bias, given that the uNC (N=110) and pNC (N=58) groups have considerably fewer images compared to the training groups, sNC (N=753) and sDAT (N=464). The group differences in MMSE and t-tau/Aβ1−42 values between the validation and training groups followed known patterns, where the DAT− groups (uNC and sMCI) showed significantly higher mean MMSE but significantly lower mean t-tau/Aβ1−42 when compared to the sDAT group, whereas the DAT+ groups (pNC, pMCI and eDAT) had significantly lower mean MMSE but significantly higher mean t-tau/Aβ1−42 in comparison to the sNC group.

4.3. FPDS computation model characteristics

Two aspects of the trained ensemble classification model warrant further discussion, viz., the ROIs chosen by the model for FPDS computation (Table 3), and the model's predictive performance on the sNC and sDAT groups (Fig. 1).

The ROIs selected by the ensemble model included parieto-temporal regions along with precuneus and cingulate gyrus. Recent studies have demonstrated the hypometabolism of parieto-temporal regions, including precuneus and posterior cingulate, as the earliest evidence for MCI progression to DAT (Arbizu et al., 2013; Ewers et al., 2014). Further, left hemisphere regions were chosen more often compared to their corresponding contralateral regions. Similar preferential left-sided hypometabolism was reported during early stages of DAT (Brown et al., 2014). Hence, the ROIs chosen by the ensemble model for FPDS computation are consistent with established spatial hypometabolism patterns in DAT.

The ensemble model's FPDS predictions on the sNC and sDAT images were consistent with the fact that these images belong to individuals who are at the extremities of the DAT spectrum, i.e., the FPDS distributions of sNC and sDAT were skewed and only had a small overlap, with AUCs of 0.95 and 0.98 for the baseline and follow-up subgroups respectively. Interestingly, among the sNC images with FPDS>0.5 (misclassified as DAT+), the mean t-tau/Aβ1−42 was found to be slightly higher (0.42 vs 0.36, p=0.2463) than in the sNC images with FPDS<=0.5 (correctly identified as DAT−). In contrast, in sDAT images with FPDS<=0.5 (misclassified as DAT−), the mean t-tau/Aβ1−42 was statistically significantly lower (0.84 vs 1.06, p=0.0073) when compared to sDAT images with FPDS>0.5 (accurately labeled as DAT+). These observations agree with the positive correlations found between the t-tau/Aβ1−42 and FPDS values among the sNC and sDAT groups respectively (Fig. 5). The occurrence of DAT like metabolism patterns (higher FPDS) among the misclassified sNC might be owing to their increased t-tau/Aβ1−42 values, and in a similar manner the shift away from DAT metabolism patterns (lower FPDS) among the misclassified sDAT could be attributed to their relatively lower t-tau/Aβ1−42 values.

The predictive performance of the ensemble model is on par with (or better than) the sNC vs sDAT classification results published in the latest FDG-PET imaging based studies, which showed AUCs ranging from 0.93 on a cohort of 52 sNC and 51 sDAT images (Ye et al., 2016) to 0.97 on a 117 sNC and 113 sDAT cohort (Li et al., 2017). These studies evaluated their classification models using a 10-fold cross-validation scheme, which is known (Bylander, 2002) to produce generalization error estimates (measure of predictive performance on unseen data) similar to those of the out-of-bag prediction scheme that was used for evaluation of the ensemble model on baseline sNC and sDAT images. However, the ensemble model's performance on the follow-up images arguably gives a much better estimate of the generalization error, as the follow-up images were completely hidden during the ensemble model training process and hence can be considered as unseen data, despite their implicit relation to the corresponding baseline images. It is also important to highlight the relatively large sample size of the follow-up image set (393 sNC and 226 sDAT images) in comparison to the cohorts used in previous FDG-PET studies (Ye et al., 2016; Li et al., 2017; Weiner et al., 2017). This further supports confidence in the reported predictive performance of the ensemble model on sNC and sDAT groups.

Comprehensive evaluation of the FPDS computation model

The ensemble classification model's predictive performance evaluated on a large independent validation set of images (N=1767), taken from individuals at different stages of AD spectrum, provided a rigorous and realistic way to assess the potential of using FPDS for DAT diagnosis (Fig. 2). The ensemble model achieved an AUC of 0.78 in discriminating the DAT− (uNC and sMCI) and the DAT+ (pNC, pMCI and eDAT) groups, strongly advocating the consideration of FPDS as a DAT biomarker. A more detailed analysis of the FPDS predictions across the DAT− and DAT+ groups revealed a non-trivial association between the t-tau/Aβ1−42 and FPDS values. The DAT− images with FPDS>0.5 (misclassified as DAT+) had statistically significantly higher mean t-tau/Aβ1−42 (0.71 vs 0.49, p <0.0001) compared to the DAT− images with FPDS<=0.5 (correctly labeled as DAT−). Conversely, the mean t-tau/Aβ1−42 among the DAT+ images with FPDS<=0.5 (misclassified as DAT−) was found to be significantly lower (0.74 vs 0.94, p=0.0015) compared to the DAT+ images with FPDS>0.5 (correctly labeled as DAT+). In light of these findings, along with the positive correlations observed between t-tau/Aβ1−42 and FPDS within each of the DAT− and DAT+ groups (Fig. 6), it can be speculated that the relatively higher t-tau/Aβ1−42 values might be triggering the presence of DAT like metabolism patterns (higher FPDS) in misclassified DAT−. Similarly, the comparatively lower t-tau/Aβ1−42 values could be the underlying cause behind the lack of DAT like metabolism patterns (lower FPDS) among the misclassified DAT+.

The predicted FPDS values for the sMCI images were observed to increase with age (Fig. 3), i.e., images corresponding to older subjects tended to have higher FPDS values compared to images taken from younger subjects. In particular, when comparing the two subgroups of sMCI images whose age ranges were above and below the average sMCI age of ∼ 75 years (Table 1) respectively, the mean FPDS for the images in the older subgroup was found to be significantly greater than the mean FPDS among the images from the younger subgroup (0.45 vs 0.22, p <0.0001). Further, as could be expected based on the statistically significant positive correlation observed between FPDS and t-tau/Aβ1−42 in sMCI (r=0.2526 with p <0.0001, Fig. 6), the older subgroup also showed a significantly higher mean t-tau/Aβ1−42 compared to the younger subgroup (0.61 vs 0.51, p <0.0001). Apart from the sMCI group, none of the other uNC, pNC, pMCI and eDAT groups displayed any apparent age specific FPDS patterns (Fig. 3). Nevertheless, among these groups a trend of positive correlations between t-tau/Aβ1−42 and FPDS was observed (Fig. 6). This suggests a possible age independent causal relationship between t-tau/Aβ1−42 and the occurrence of DAT like metabolism patterns.

In the pMCI group, the predicted FPDS values were found to decrease with longer time to conversion (Fig. 4). Notably, the mean FPDS for the subgroup of images that were within 4 years to conversion was significantly higher than that of the image subgroup whose conversion times exceeded 4 years (0.67 vs 0.43, p <0.0001). Moreover, in concordance with the positive correlation observed between t-tau/Aβ1−42 and FPDS among the pMCI (r=0.1312 with p <0.0609, Fig. 6), the mean t-tau/Aβ1−42 for the within 4 years to conversion subgroup was also significantly higher compared to the subgroup with longer than 4-year conversion times (0.91 vs 0.66, p=0.0229). Based on these findings, it is conceivable that from around 4 years prior to a clinical diagnosis of DAT there might be a noticeable increase in the t-tau/Aβ1−42, causing a prevalence of DAT like metabolism patterns among the pMCI. Beyond the 4-year conversion window, by contrast, a considerable reduction in the appearance of DAT like metabolism patterns in pMCI can be expected. In fact, metabolic disruptions in an earlier NC stage of pMCI were found to be virtually undetectable, as evidenced by the significantly lower mean FPDS of the pNC group compared to the pMCI (0.35 vs 0.63, p <0.0001) and also indicated by the extremely low classification accuracy (0.28, Fig. 4) achieved on pNC images.

MCI conversion prediction - comparison with state-of-the-art

Several FDG-PET image analysis methods have previously been considered for addressing the task of predicting MCI to DAT conversion (Young et al., 2013, Zhu et al., 2014, Cheng et al., 2015a, Cheng et al., 2015b, Lange et al., 2015, Wang et al., 2016, Pagani et al., 2017, Inui et al., 2017, Liu et al., 2017, 2). In these methods, the main idea is to train a binary classification model for separating the MCI into two groups, the sMCI which remain stable and the pMCI that convert to DAT in the future. Aside from the standard approach of using images from the sMCI and pMCI groups as training data (Wang et al., 2016, Pagani et al., 2017, Zhu et al., 2014), some of the methods have augmented the training process with information derived from the sNC and sDAT images as well (Liu et al., 2017, 2, Lange et al., 2015, Cheng et al., 2015a, Cheng et al., 2015b). Further, akin to the proposed approach for training the FPDS computation model, there were a few methods that solely employed the sNC and sDAT images during the training phase (Young et al., 2013, Inui et al., 2017).

In Table 4, the sMCI vs pMCI classification results reported in the aforementioned works are summarized. Table 4 also shows the AUC achieved when using the FPDS to discriminate between the sMCI and the pMCI that are within 2, 3 and 5 years to conversion respectively. The proposed FPDS based approach outperformed almost all of the state-of-the-art methods, achieving an AUC of more than 0.77 in each of the three time to conversion cases. Only Wang et al. (2016) and Pagani et al. (2017) reported higher AUCs than the proposed approach. However, these two methods reported the cross-validated AUC, whereas a more challenging independent validation experiment was used to evaluate the performance of the FPDS approach. Moreover, in Wang et al. (2016) and Pagani et al. (2017), the classification model parameter tuning was done using the testing subsets of the cross-validation splits, which leads to inflated estimates of the classification performance. In fact, to avoid such an optimistic performance evaluation, the other methods reporting cross-validated AUCs (Zhu et al., 2014, Cheng et al., 2015a, Cheng et al., 2015b) used “nested” cross-validation, where the classification model parameters were tuned on the training subsets of the cross-validation splits rather than the testing subsets. Last, it should be highlighted that the better performance of the FPDS approach was demonstrated on a considerably larger sample size (>5x more images) compared to the other methods.

Table 4.

Comparison of sMCI vs pMCI classification performance obtained using FPDS with the state-of-the-art FDG-PET based methods.

Study | sMCI:pMCI [images] | Time to conversion | Evaluation scheme | AUC
Zhu et al. (2014) | 56:43 | 0–2 years | 10-fold CV [a] | 0.774
Cheng et al. (2015a) | 56:43 | 0–2 years | 10-fold CV | 0.734
Cheng et al. (2015b) | 56:43 | 0–2 years | 10-fold CV | 0.741
FPDS | 881:254 | 0–2 years | Independent validation | 0.806
Young et al. (2013) | 96:47 | 0–3 years | Independent validation | 0.767
Lange et al. (2015) | 181:60 | 0–3 years | Independent validation | 0.746
Wang et al. (2016) | 65:64 | 0–3 years | LOOCV [a] | 0.802
Liu et al. (2017) | 108:126 | 0–3 years | Prediction on training set | 0.736
FPDS | 881:362 | 0–3 years | Independent validation | 0.796
Pagani et al. (2017) | 27:95 | 0–5 years | 21-fold CV | 0.911
Inui et al. (2017) | 19:49 | 0–5 years | Independent validation | 0.712
FPDS | 881:442 | 0–5 years | Independent validation | 0.772

[a] CV: cross-validation, LOOCV: leave-one-out cross-validation.

4.4. Limitations and future directions

The results reported in this paper are understandably limited by the ADNI data characteristics. In general, the final (at time of death) clinical diagnosis for ADNI subjects is not known, either because the subjects are still living or because they were not followed up around the time of their death. Consequently, it is possible that some subjects currently determined to be on the DAT− trajectory, i.e., with images belonging to the sNC/uNC/sMCI groups, might receive a clinical diagnosis of DAT in the future. While this is a limitation in our current results, if the final diagnosis for such subjects becomes available, it would be interesting to review whether the ensemble model had actually correctly predicted the sNC/uNC/sMCI images of these subjects as belonging to the DAT+ trajectory (FPDS>0.5). Another limitation of the reported results, as mentioned before, is that the correlation analysis between the FPDS and t-tau/Aβ1−42 values was reported only on a subset of the images owing to partial availability of CSF measures in the ADNI database. In spite of these ADNI data related limitations, it is important to note that both the novel stratification scheme and the ensemble classification framework proposed in our work have a more general applicability and are not specific to the ADNI cohort used in this study. In fact, as part of future work we plan to extend our methodology to incorporate multimodal imaging data and validate it on other relevant AD neuroimaging databases.

Acknowledgments

Funding for this research is gratefully acknowledged from National Science Engineering Research Council (NSERC), Canadian Institutes of Health Research (CIHR), Brain Canada, Pacific Alzheimer's Research Foundation, the Michael Smith Foundation for Health Research (MSFHR), and the National Institute on Aging (R01 AG055121-01A1). We thank Compute Canada for the computational infrastructure provided for the data processing in this study. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. 
ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

References

  1. Alzheimer's Association Alzheimer's disease facts and figures. Alzheimers Dement. 2015;11(3):332. doi: 10.1016/j.jalz.2015.02.003. [DOI] [PubMed] [Google Scholar]
  2. Arbizu J., Prieto E., Martinez-Lage P., Marti-Climent J.M., Garcia-Granero M., Lamet I., Pastor P., Riverol M., Gomez-Isla M.T., Penuelas I., Richter J.A., Weiner M.W., Alzheimer's Disease Neuroimaging Initiative Automated analysis of FDG PET as a tool for single-subject probabilistic prediction and detection of Alzheimer's disease dementia. Eur. J. Nucl. Med. Mol. Imaging. 2013;40(9):1394–1405. doi: 10.1007/s00259-013-2458-z. [DOI] [PubMed] [Google Scholar]
  3. Beg M.F., Miller M.I., Trouvé A., Younes L. Computing large deformation metric mappings via geodesic flows of diffeomorphism. Int. J. Comput. Vis. 2005;61(2):139–157. [Google Scholar]
  4. Brown R.K., Bohnen N.I., Wong K.K., Minoshima S., Frey K.A. Brain PET in suspected dementia: patterns of altered FDG metabolism. Radiographics. 2014;34(3):684–701. doi: 10.1148/rg.343135065. [DOI] [PubMed] [Google Scholar]
  5. Buhlmann P. 2003. Bagging, subagging and bragging for improving some prediction algorithms: Recent Advances and Trends in Nonparametric Statistics. [Google Scholar]
  6. Bylander T. Estimating generalization error on two-class datasets using out-of-bag estimates. Mach. Learn. 2002;48(1/3):287–297. [Google Scholar]
  7. Cheng B., Liu M., Suk H.-I., Shen D., Zhang D., Alzheimer's Disease Neuroimaging Initiative, Alzheimer's Disease Neuroimaging Multimodal manifold-regularized transfer learning for MCI conversion prediction. Brain Imaging Behav. 2015;9(4):913–926. doi: 10.1007/s11682-015-9356-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Cheng B., Liu M., Zhang D., Munsell B.C., Shen D. Domain transfer learning for MCI conversion prediction. IEEE Trans. Biomed. Eng. 2015;62(7):1805–1817. doi: 10.1109/TBME.2015.2404809. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Cummings J.L., Morstorf T., Zhong K. Alzheimer's disease drug-development pipeline: few candidates, frequent failures. Alzheimers Res. Ther. 2014, 7;6(4):37. doi: 10.1186/alzrt269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Dale A.M., Fischl B., Sereno M.I. Cortical surface-based analysis: I. segmentation and surface reconstruction. Neuroimage. 1999;9(2):179–194. doi: 10.1006/nimg.1998.0395. [DOI] [PubMed] [Google Scholar]
  11. Damoulas T., Girolami M.A. Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection. Bioinformatics. 2008;24(10):1264–1270. doi: 10.1093/bioinformatics/btn112. May. [DOI] [PubMed] [Google Scholar]
  12. Desikan R.S., Ségonne F., Fischl B., Quinn B.T., Dickerson B.C., Blacker D., Buckner R.L., Dale A.M., Maguire R.P., Hyman B.T. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage. 2006;31(3):968–980. doi: 10.1016/j.neuroimage.2006.01.021. [DOI] [PubMed] [Google Scholar]
  13. Dietterich T.G. International workshop on multiple classifier systems. Springer; 2000. Ensemble methods in machine learning; pp. 1–15. [Google Scholar]
  14. Duits F.H., Teunissen C.E., Bouwman F.H., Visser P.-J., Mattsson N., Zetterberg H., Blennow K., Hansson O., Minthon L., Andreasen N., Marcusson J., Wallin A., Rikkert M.O., Tsolaki M., Parnetti L., Herukka S.-K., Hampel H., De Leon M.J., Schröder J., Aarsland D., Blankenstein M.A., Scheltens P., van der Flier W.M. The cerebrospinal fluid Alzheimer profile: easily said, but what does it mean? Alzheimers Dement. 2014;10(6):713–723. doi: 10.1016/j.jalz.2013.12.023. 11. [DOI] [PubMed] [Google Scholar]
  15. Ewers M., Brendel M., Rizk-Jackson A., Rominger A., Bartenstein P., Schuff N., Weiner M.W., Alzheimer's Disease Neuroimaging Initiative (ADNI) Reduced FDG-PET brain metabolism and executive function predict clinical progression in elderly healthy subjects. NeuroImage: Clin. 2014;4:45–52. doi: 10.1016/j.nicl.2013.10.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Fischl B., Salat D.H., Busa E., Albert M., Dieterich M., Haselgrove C., van der Kouwe A., Killiany R., Kennedy D., Klaveness S., Montillo A., Makris N., Rosen B., Dale A.M. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron. 2002;33(3):341–355. doi: 10.1016/s0896-6273(02)00569-x. [DOI] [PubMed] [Google Scholar]
  17. Frisoni G.B., Boccardi M., Barkhof F., Blennow K., Cappa S., Chiotis K., Démonet J.-F., Garibotto V., Giannakopoulos P., Gietl A., Hansson O., Herholz K., Jack C.R., Nobili F., Nordberg A., Snyder H.M., Ten Kate M., Varrone A., Albanese E., Becker S., Bossuyt P., Carrillo M.C., Cerami C., Dubois B., Gallo V., Giacobini E., Gold G., Hurst S., Lönneborg A., Lovblad K.-O., Mattsson N., Molinuevo J.-L., Monsch A.U., Mosimann U., Padovani A., Picco A., Porteri C., Ratib O., Saint-Aubert L., Scerri C., Scheltens P., Schott J.M., Sonni I., Teipel S., Vineis P., Visser P.J., Yasui Y., Winblad B. Strategic roadmap for an early diagnosis of Alzheimer's disease based on biomarkers. Lancet Neurol. 2017;16(8):661–676. doi: 10.1016/S1474-4422(17)30159-X. [DOI] [PubMed] [Google Scholar]
  18. Garibotto V., Herholz K., Boccardi M., Picco A., Varrone A., Nordberg A., Nobili F., Ratib O. Clinical validity of brain fluorodeoxyglucose positron emission tomography as a biomarker for Alzheimer's disease in the context of a structured 5-phase development framework. Neurobiol. Aging. 2017;52:183–195. doi: 10.1016/j.neurobiolaging.2016.03.033. [DOI] [PubMed] [Google Scholar]
  19. Godyń J., Jończyk J., Panek D., Malawska B. Therapeutic strategies for Alzheimer's disease in clinical trials. Pharmacol. Rep. 2016;68(1):127–138. doi: 10.1016/j.pharep.2015.07.006. [DOI] [PubMed] [Google Scholar]
  20. Grabner G., Janke A., Budge M., Smith D., Pruessner J., Collins D. Medical Image Computing and Computer-Assisted Intervention-MICCAI 2006. 2006. Symmetric atlasing and model based segmentation: an application to the hippocampus in older adults; pp. 58–66. [DOI] [PubMed] [Google Scholar]
  21. Gray K.R., Wolz R., Heckemann R.A., Aljabar P., Hammers A., Rueckert D. Multi-region analysis of longitudinal FDG-PET for the classification of Alzheimer's disease. NeuroImage. 2012;60(1):221–229. doi: 10.1016/j.neuroimage.2011.12.071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Hebert L.E., Weuve J., Scherr P.A., Evans D.A. Alzheimer disease in the United States (2010–2050) estimated using the 2010 census. Neurology. 2013;80(19):1778–1783. doi: 10.1212/WNL.0b013e31828726f5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Herholz K., Adams R., Kessler J., Szelies B., Grond M., Heiss W.-D. Criteria for the diagnosis of Alzheimer's disease with Positron Emission Tomography. Dement. Geriatr. Cogn. Disord. 1990;1(3):156–164. [Google Scholar]
  24. Herholz K., Salmon E., Perani D., Baron J.-C., Holthoff V., Frölich L., Schönknecht P., Ito K., Mielke R., Kalbe E., Zündorf G., Delbeuck X., Pelati O., Anchisi D., Fazio F., Kerrouche N., Desgranges B., Eustache F., Beuthien-Baumann B., Menzel C., Schröder J., Kato T., Arahata Y., Henze M., Heiss W.-D. Discrimination between Alzheimer dementia and controls by automated analysis of multicenter FDG PET. NeuroImage. 2002;17(1):302–316. doi: 10.1006/nimg.2002.1208. [DOI] [PubMed] [Google Scholar]
  25. Inui Y., Ito K., Kato T., SEAD-J Study Group Longer-term investigation of the value of 18F-FDG-PET and magnetic resonance imaging for predicting the conversion of mild cognitive impairment to Alzheimer's disease: a multicenter study. J. Alzheimers Dis. 2017:1–11. doi: 10.3233/JAD-170395. Preprint. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Jack C.R., Knopman D.S., Jagust W.J., Petersen R.C., Weiner M.W., Aisen P.S., Shaw L.M., Vemuri P., Wiste H.J., Weigand S.D., Lesnick T.G., Pankratz V.S., Donohue M.C., Trojanowski J.Q. Tracking pathophysiological processes in Alzheimer's disease: an updated hypothetical model of dynamic biomarkers. Lancet Neurol. 2013;12(2):207–216. doi: 10.1016/S1474-4422(12)70291-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Jenkinson M., Bannister P., Brady M., Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage. 2002;17(2):825–841. doi: 10.1016/s1053-8119(02)91132-8. [DOI] [PubMed] [Google Scholar]
  28. Lange C., Suppa P., Frings L., Brenner W., Spies L., Buchert R. Optimization of statistical single subject analysis of brain FDG PET for the prognosis of mild cognitive impairment-to-Alzheimer's disease conversion. J. Alzheimers Dis. 2015;49(4):945–959. doi: 10.3233/JAD-150814. [DOI] [PubMed] [Google Scholar]
  29. Li Q., Wu X., Xu L., Chen K., Yao L., Li R. Multi-modal discriminative dictionary learning for Alzheimer's disease and mild cognitive impairment. Comput. Methods Prog. Biomed. 2017;150:1–8. doi: 10.1016/j.cmpb.2017.07.003. [DOI] [PubMed] [Google Scholar]
  30. Liu K., Chen K., Yao L., Guo X. Prediction of mild cognitive impairment conversion using a combination of independent component analysis and the Cox model. Front. Hum. Neurosci. 2017;11:33. doi: 10.3389/fnhum.2017.00033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Mosconi L., Berti V., Glodzik L., Pupi A., De Santi S., de Leon M.J. Pre-clinical detection of Alzheimer's disease using FDG-PET, with or without amyloid imaging. J. Alzheimers Dis. 2010;20(3):843–854. doi: 10.3233/JAD-2010-091504. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Pagani M., Nobili F., Morbelli S., Arnaldi D., Giuliani A., Öberg J., Girtler N., Brugnolo A., Picco A., Bauckneht M., Piva R., Chincarini A., Sambuceti G., Jonsson C., de Carli F. Early identification of MCI converting to AD: a FDG PET study. Eur. J. Nucl. Med. Mol. Imaging. 2017;44(12):2042–2052. doi: 10.1007/s00259-017-3761-x. [DOI] [PubMed] [Google Scholar]
  33. Petersen R.C., Aisen P.S., Beckett L.A., Donohue M.C., Gamst A.C., Harvey D.J., Jack C.R., Jr, Jagust W.J., Shaw L.M., Toga A.W., Trojanowski J.Q., Weiner M.W. Alzheimer's Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurology. 2010;74(3):201–209. doi: 10.1212/WNL.0b013e3181cb3e25. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Prince M., Comas-Herrera A., Knapp M., Guerchet M., Karagiannidou M. Tech. rep. Alzheimer's Disease International; 2016. World Alzheimer Report 2016: Improving healthcare for people living with dementia. Coverage, quality and costs now and in the future. [Google Scholar]
  35. Raamana P.R., Weiner M.W., Wang L., Beg M.F., Alzheimer's Disease Neuroimaging Initiative Thickness network features for prognostic applications in dementia. Neurobiol. Aging. 2015;36:S91–S102. doi: 10.1016/j.neurobiolaging.2014.05.040. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Rathore S., Habes M., Iftikhar M.A., Shacklett A., Davatzikos C. A review on neuroimaging-based classification studies and associated feature extraction methods for Alzheimer's disease and its prodromal stages. NeuroImage. 2017;155:530–548. doi: 10.1016/j.neuroimage.2017.03.057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Sanabria-Diaz G., Martinez-Montes E., Melie-Garcia L., Alzheimer's Disease Neuroimaging Initiative Glucose metabolism during resting state reveals abnormal brain networks organization in the Alzheimer's disease and mild cognitive impairment. PLoS ONE. 2013;8(7):e68860. doi: 10.1371/journal.pone.0068860. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Tong T., Gao Q., Guerrero R., Ledig C., Chen L., Rueckert D., Alzheimer's Disease Neuroimaging Initiative A novel grading biomarker for the prediction of conversion from mild cognitive impairment to Alzheimer's disease. IEEE Trans. Biomed. Eng. 2017;64(1):155–165. doi: 10.1109/TBME.2016.2549363. [DOI] [PubMed] [Google Scholar]
  39. Wang P., Chen K., Yao L., Hu B., Wu X., Zhang J., Ye Q., Guo X. Multimodal classification of mild cognitive impairment based on partial least squares. J. Alzheimers Dis. 2016;54(1):359–371. doi: 10.3233/JAD-160102. [DOI] [PubMed] [Google Scholar]
  40. Weiner M.W., Veitch D.P., Aisen P.S., Beckett L.A., Cairns N.J., Green R.C., Harvey D., Jack C.R., Jagust W., Morris J.C., Petersen R.C., Saykin A.J., Shaw L.M., Toga A.W., Trojanowski J.Q. Recent publications from the Alzheimer's disease neuroimaging initiative: reviewing progress toward improved AD clinical trials. Alzheimers Dement. 2017;13(4):e1–e85. doi: 10.1016/j.jalz.2016.11.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Witkin A. Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP’84. vol. 9. IEEE; 1984. Scale-space filtering: a new approach to multi-scale description; pp. 150–153. [Google Scholar]
  42. Ye T., Zu C., Jie B., Shen D., Zhang D., Alzheimer's Disease Neuroimaging Initiative Discriminative multi-task feature selection for multi-modality classification of Alzheimer's disease. Brain Imaging Behav. 2016;10(3):739–749. doi: 10.1007/s11682-015-9437-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Young J., Modat M., Cardoso M.J., Mendelson A., Cash D., Ourselin S., Alzheimer's Disease Neuroimaging Initiative Accurate multimodal probabilistic prediction of conversion to Alzheimer's disease in patients with mild cognitive impairment. NeuroImage: Clinical. 2013;2(1):735–745. doi: 10.1016/j.nicl.2013.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Zhu X., Suk H.I., Shen D. A novel matrix-similarity based loss function for joint regression and classification in AD diagnosis. NeuroImage. 2014;100:91–105. doi: 10.1016/j.neuroimage.2014.05.078. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from NeuroImage : Clinical are provided here courtesy of Elsevier
