. Author manuscript; available in PMC: 2017 Dec 18.
Published in final edited form as: Proc SPIE Int Soc Opt Eng. 2017 Nov 17;10572:105720J. doi: 10.1117/12.2294537

Deep Learning based Classification of FDG-PET Data for Alzheimers Disease Categories

Shibani Singh a, Anant Srivastava a, Liang Mi a, Richard J Caselli b, Kewei Chen c, Dhruman Goradia c, Eric M Reiman c, Yalin Wang a
PMCID: PMC5733797  NIHMSID: NIHMS901022  PMID: 29263566

Abstract

Fluorodeoxyglucose (FDG) positron emission tomography (PET) measures the decline in the regional cerebral metabolic rate for glucose, offering a reliable metabolic biomarker even in presymptomatic Alzheimer's disease (AD) patients. PET scans provide functional information that is unique and unavailable using other types of imaging. However, the computational efficacy of FDG-PET data alone for the classification of the various AD diagnostic categories has not been well studied. This motivates us to discriminate the various AD diagnostic categories using FDG-PET data alone. Deep learning has improved state-of-the-art classification accuracies in the areas of speech, signal, image, video, and text mining and recognition. We propose novel methods that involve probabilistic principal component analysis on max-pooled data and mean-pooled data for dimensionality reduction, and a multilayer feed-forward neural network which performs binary classification. Our experimental dataset consists of baseline data of subjects including 186 cognitively unimpaired (CU) subjects, 336 mild cognitive impairment (MCI) subjects with 158 Late MCI and 178 Early MCI, and 146 AD patients from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. We measured F1-measure, precision, recall, and negative and positive predictive values with a 10-fold cross validation scheme. Our results indicate that our designed classifiers achieve competitive results, with max-pooling achieving better classification performance than mean-pooled features. Our deep model based research may advance FDG-PET analysis by demonstrating its potential as an effective imaging biomarker of AD.

Keywords: Deep Learning, Multilayer Perceptrons, Alzheimers, Neural Networks, Cross Validation, Dimensionality Reduction, PET

1. Introduction

In the study of Alzheimer's disease (AD), neuroimaging based measures have shown high sensitivity in tracking changes over time and thus have been proposed as possible biomarkers to evaluate AD burden, progression, and response to interventions. In addition to the pathological amyloid and tau imaging measurements for AD, fluorodeoxyglucose (FDG) positron emission tomography (PET) characterizes the cerebral glucose hypometabolism related to AD and AD risk, offering a reliable metabolic biomarker even at the presymptomatic stage. Fig. 1 visualizes the neural activity in normalized PET scans of AD and normal subjects; the central image in each case displays the loss of functionality in AD patients compared to normal subjects. There has been growing interest in studying FDG-PET for AD and AD risk, and particularly in identifying and predicting mild cognitive impairment (MCI). Although numerous analysis tools have been developed, much of the prior work (e.g.1) has relied on voxel-wise analysis corrected for multiple comparisons to discover group-wise differences and the general trend in the data. However, there are a number of issues in extending the group analysis framework to compute AD risk on an individual basis. For example, prior work has shown that the statistically significant pixels obtained in group difference studies do not necessarily carry strong statistical power for predictions.2 To develop effective precision medicine, one needs a system that can measure subtle differences and make robust predictions/classifications on an individual basis. Thus far, it is still challenging to build FDG-PET imaging diagnosis and prognosis systems because of the tremendous difficulty of optimally integrating global functional image information.

Figure 1.

Figure 1

Normalized PET image slices for CU and AD subjects.


Recently deep learning has helped achieve state-of-the-art results in myriad classification problems in the areas of signal, speech, text, image processing and medical imaging.3 Deep learning based feature representation using auto-encoders was recently used to achieve high accuracies using MRI and PET data.4 Deep learning has also been used for classification using MRI and PET data.5 Classification has further been improved by combining multiple imaging modalities to strengthen neuroimaging biomarkers while requiring less labeled data.6 These advances in deep learning research inspire us to develop novel deep learning methods to advance FDG-PET analysis research, which may facilitate their use in preclinical and clinical AD treatment development.

In this work, we propose a novel method that involves dimensionality reduction using Probabilistic Principal Component Analysis on max-pooled data and mean-pooled data, and a Multilayer Feed Forward Neural Network (also known as a Multilayer Perceptron, MLP) which performs binary classification. Fig. 2 shows the pipeline of our system. We validated our algorithm on the Alzheimer's Disease Neuroimaging Initiative (ADNI) baseline dataset (N = 668) consisting of baseline data of subjects including 186 cognitively unimpaired (CU), 336 Mild Cognitive Impairment (MCI) with 158 Late MCI and 178 Early MCI, and 146 AD. The FDG-PET images were processed using SPM7 for alignment, segmentation and normalization. We measured F1-measure, precision, recall, and negative and positive predictive values with 10-fold cross validation. Our results indicate that our designed classifiers achieve competitive results, with max-pooling resulting in better classification performance than mean-pooled features.

Figure 2.

Figure 2

Classification Pipeline. From left to right: pre-processed PET data is the initial input; PET images are normalized; max-pooling/mean-pooling (one for each of the two classification pipelines) is performed on each subject's data to reduce the feature dimensionality from 79×95×79 to 4050×1; probabilistic Principal Component Analysis (PCA) is applied to further reduce the number of features from 4050 to 250 – 300; the reduced feature vector per image is passed to train a Multilayer Feed-forward Neural Network; the neural network assigns class labels as a binary classification problem.

Our work has three main contributions. First, we propose a coherent and efficient deep learning framework that explores the potential of FDG-PET for AD diagnosis. Second, we evaluated our work on a relatively large dataset and achieved competitive results. Third, we exhibit the effective increase in classification performance from the addition of demographic variables (Age, Gender, APOE 1, APOE 2, and FAQ score) to our max-pooled (intensity) data. We conduct thorough comparison experiments against other state-of-the-art FDG-PET analysis methods. Our work may inspire more deep learning based work on FDG-PET analysis and advance preclinical AD research.

2. Data and Methods

We work on FDG-PET data from the ADNI-2 dataset, which contains FDG-PET data that has been manually labeled into diagnostic categories by an expert. The baseline data includes 186 cognitively unimpaired (CU) subjects, 336 Mild Cognitive Impairment (MCI) subjects (158 Late MCI and 178 Early MCI), and 146 AD patients.

2.1 Data and Processing

The size of each FDG-PET image is 79 × 95 × 79. Table 1 shows the age distribution of our subjects. We normalize the data to linearly align all the images into a common space using the software toolkit Statistical Parametric Mapping.7 The normalized FDG-PET images are also of size 79 × 95 × 79, where each value is a voxel intensity. We use the intensity values of the whole brain in our experiments. Each voxel is a feature, and hence the feature dimensionality is 592,895 (fdim) per image data sample. Since the number of data samples n is much smaller than the number of features (n ≪ fdim), we use dimensionality reduction techniques to reduce fdim; this is discussed in the next section. We then use a multilayer perceptron classifier to perform binary classification. A gene called APOE can influence the risk for the more common late-onset type of Alzheimer's. There are three alleles of the APOE gene: APOE2, E3 and E4. The E2 allele is the rarest form of APOE, and carrying even one copy appears to reduce the risk of developing Alzheimer's by up to 40%. APOE3 is the most common allele and does not seem to influence risk. The APOE4 allele, present in approximately 20% of people, increases the risk for Alzheimer's and lowers the age of onset. The National Institutes of Health recommends genetic testing for APOE status to advance drug research in clinical trials. APOE4 is just one of many risk factors for dementia, and its influence can vary across age, gender, race, and nationality.8,9

Table 1. Age Distribution for Subjects.

Category Age ± SD Age Range Males Females
AD 74.74 ± 8.16 56 ∼ 90 85 61
MCI 71.88 ± 7.34 55 ∼ 91 186 150
LMCI 72.50 ± 7.51 55 ∼ 91 84 74
EMCI 71.34 ± 7.20 55 ∼ 88 102 76
CU 73.56 ± 6.25 56 ∼ 89 89 97

We use this information to further enhance our classification performance. We also use the age, gender and FAQ (Functional Activities Questionnaire) scores for each subject. FAQ is an informant-based measure of functional abilities. Informants provide performance ratings of the target person on ten complex higher-order activities. We have the FAQ scores, APOE1 and APOE2 values for each of our subjects.

2.1.1 Extent of Linear Separation

The term “classification” can also be described as finding a clear separation between different classes. This can be thought of as separating two sets of points in a 2D plane by a line. Similarly, if the training examples in an n-dimensional space are linearly separable, we can separate them by constructing an (n−1)-dimensional hyperplane. We use a linear SVM, based on the LIBLINEAR implementation,10 to judge the linear separability of the max-pooled data. Based on this judgement, we then configure the multilayer perceptron to increase the linear separability of the data representation. Table 2 shows the extent of linear separability between each pair of classes. We see that the linear separability is relatively low for EMCI/CU and lowest for LMCI/EMCI; these class pairs also have a smaller temporal gap in disease progression.

Table 2. Linear SVM, an estimate of linear separability.

Class Pair  AD/CU   AD/MCI  CU/MCI  EMCI/AD  LMCI/AD  EMCI/CU  LMCI/EMCI  LMCI/CU
F1-Score    0.9178  0.8433  0.7082  0.8138   0.6882   0.6377   0.6260     0.6507

2.1.2 Sparse Representation of Data

This non-linearity of data can be visualized as a dense representation of the training examples. We need to sparsely represent the data in order to be able to distinguish between different classes. Sparse representations are useful in the following ways (for details please refer to Glorot et al.11):

  • Information disentangling: to disentangle the factors explaining the variations in the data. Let's say we have a densely populated region of data points (having dense representations in small dimensions) from various classes. Learning a model on this representation does not guarantee correct classification of points whose feature values vary slightly from the points the model has trained on. If a representation is both sparse and robust to small changes in feature values, the non-zero features will be conserved throughout training.

  • Efficient variable size representation: Due to the varying amount of information contained in every input, the number of active input neurons will vary. This may help control dimensionality of the representation for every input.

  • Linear Separability: Sparse representations induce linear separability of data due to high dimensionality of sparse representations.

We will see further that Rectified Linear Units, when applied to the neurons in the multilayer perceptron, help the network learn sparse representations of the data.

2.2 Feature Selection using Maxpooling

Feature selection (also called variable selection, attribute selection or variable subset selection) is the process of selecting a subset of relevant features from all the available descriptors of the data. Boureau et al.12 show that max-pooling performs better than other pooling operations. Pooling is widely used to reduce the number of features and help boost classification performance. Since our dataset consists of 668 data samples with 592,895 features each, the number of samples is much smaller than the number of features, and learning on this representation does not yield good classification. We therefore perform max-pooling to greatly reduce the number of features, given a sample count in the hundreds. While the features are pooled, we keep a consistent overlap between the patches, so that for every two consecutive patches we have three max-pooled values; overlapping is necessary to preserve relational information in the features. Max-pooling with overlapping patches of size 10 × 10 × 10 reduces our feature dimensionality to 4050 per data sample, flattening each 3-dimensional PET image into a feature vector.
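As a concrete illustration, the pooling step above can be sketched as follows. The 50% patch overlap (stride of half the patch size) and the handling of partial edge patches are our assumptions, chosen so that a 79 × 95 × 79 volume yields exactly 4050 features; the paper states only the patch size and the resulting feature count.

```python
import numpy as np

def overlapping_maxpool3d(volume, psize=10):
    """Max-pool a 3-D volume with overlapping patches.

    A stride of psize//2 gives three pooled values per two consecutive
    patches; patches at the edges may be partial. Both choices are
    assumptions consistent with the 79x95x79 -> 4050 reduction.
    """
    stride = psize // 2
    dz, dy, dx = volume.shape
    pooled = []
    for z in range(0, max(dz - psize, 0) + stride, stride):
        for y in range(0, max(dy - psize, 0) + stride, stride):
            for x in range(0, max(dx - psize, 0) + stride, stride):
                patch = volume[z:z + psize, y:y + psize, x:x + psize]
                pooled.append(patch.max())
    return np.asarray(pooled)

vol = np.random.default_rng(0).random((79, 95, 79))  # placeholder PET volume
feats = overlapping_maxpool3d(vol, psize=10)
print(feats.shape)  # (4050,)
```

With patch size 10 and stride 5 this gives 15 × 18 × 15 = 4050 pooled values per image, matching the feature dimensionality reported above.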

We then run a linear Support Vector Machine (SVM) on our data to test its linear separability. As shown previously by Asa Ben-Hur et al.,13 running a linear SVM displays the extent of linear separability between the various categories, here on the max-pooled data. For a binary classification problem, it shows how well the classes can be separated by a linear classifier. The hyperplane (a line in 2D) is the classifier's decision boundary: a point is classified according to which side of the hyperplane it falls on, determined by the sign of the discriminant function.
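A minimal sketch of this separability check, using scikit-learn's LIBLINEAR-backed LinearSVC; the data here is a random placeholder where the real inputs would be the max-pooled ADNI features and diagnostic labels.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Placeholder stand-ins for the 4050 max-pooled features and binary labels.
rng = np.random.default_rng(0)
X = rng.random((120, 4050))
y = rng.integers(0, 2, size=120)

clf = LinearSVC(max_iter=5000)  # LIBLINEAR-backed linear SVM
scores = cross_val_score(clf, X, y, cv=10, scoring='f1')
print(scores.mean())  # a high mean F1 indicates near-linear separability
```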

Running the linear SVM on max-pooled data gives the f1-scores shown in Table 2. From the table we see that AD and CU are to a large extent linearly separable (with a few incorrect classifications), while LMCI and EMCI are not, as their f1-score is only 0.626. The patch size was varied for AD/CU classification from 5 × 5 × 5 to 15 × 15 × 15 to compare performance based on patch size for AD and CU subject data. For this experiment we used max-pooled data with age, gender, APOE1, APOE2 and FAQ score for all data samples. In max-pooling, 3-dimensional patches of size psize × psize × psize are extracted uniformly from the 3-dimensional intensity data; we vary the patch size over psize = 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. For the experiments with only max-pooled values and no demographic features, the number of hidden layers is nhidden = 4 with nneurons = [1000, 500, 100, 10] neurons per layer. For the experiments with demographic features in addition to max-pooled values, nhidden = 7 with nneurons = [1000, 800, 600, 400, 200, 100, 10]. Table 3 compares performance on max-pooled data with demographics against max-pooled data alone.

Table 3. Performance Comparison: Patch Size vs F1-Accuracy.

Patch Size (n)   Without Demographics (F1 / Precision / Recall)   With Demographics (F1 / Precision / Recall)
5    0.9312 / 0.9462 / 0.9167    0.9474 / 0.9677 / 0.9278
6    0.9175 / 0.9570 / 0.8812    0.9708 / 0.9839 / 0.9581
7    0.9044 / 0.9409 / 0.8706    0.9710 / 0.9892 / 0.9534
8    0.9271 / 0.9570 / 0.8990    0.9735 / 0.9892 / 0.9583
9    0.9231 / 0.9677 / 0.8824    0.9737 / 0.9946 / 0.9536
10   0.9255 / 0.8867 / 0.9677    0.9735 / 0.9839 / 0.9632
11   0.9251 / 0.9624 / 0.8905    0.9661 / 0.9946 / 0.9391
12   0.9299 / 0.9624 / 0.8995    0.9661 / 0.9946 / 0.9391
13   0.8144 / 0.7822 / 0.8495    0.9561 / 0.9946 / 0.9204
14   0.9199 / 0.9570 / 0.8856    0.9634 / 0.9892 / 0.9388
15   0.9133 / 0.9624 / 0.8689    0.9609 / 0.9892 / 0.9340

We then select two patch sizes (psize = 9 and psize = 10) to experiment with on max-pooled data along with the demographic features. The performance of the two patch sizes is comparable. We proceed with a patch size of 10 × 10 × 10 for all the following experiments.

2.3 Dimensionality Reduction using Probabilistic Principal Component Analysis

In machine learning, dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It introduces a new feature space, of lower dimension than the original, in which the original features are represented.

M.E. Tipping14 proposed a closed-form Maximum Likelihood solution for Probabilistic Principal Component Analysis (PPCA). Principal Component Analysis (PCA) is widely used to transform data into a reduced dimensionality: PCA maximizes the variance of the projected data x, which is represented in a lower dimensional space using a set of orthonormal vectors W. PPCA is the following latent variable model:

z ∼ N(0, I);  x ∼ N(Wz + μ, σ²I),

where x ∈ ℝᵖ is one observation and z ∈ ℝ^q is a latent variable vector, usually with q ≪ p.

The error covariance structure in PPCA is σ²I. The Maximum Likelihood solution for PPCA is obtained as:

W_MLE = U_q (Λ_q − σ²_MLE I)^{1/2} R,

where U_q is a matrix of the q leading principal directions (eigenvectors of the covariance matrix), Λ_q is a diagonal matrix of the corresponding eigenvalues,

σ²_MLE = (1 / (d − q)) Σ_{j=q+1}^{d} λ_j

represents the variance lost in the projection, and R is an arbitrary q × q rotation matrix (corresponding to rotations in the latent space).

Principal Components are generally selected to reduce reconstruction error or to maximize variance. We select the approach of variance maximization to choose the appropriate number of principal components for each of our binary experiments. Table 4 shows our choice of the number of Principal Components chosen using PPCA.

Table 4. #Principal Components Chosen for Each Experiment.

Experiment  AD/CU  AD/MCI  CU/MCI  AD/EMCI  AD/LMCI  CU/LMCI  CU/EMCI  EMCI/LMCI
#PCs        300    400     500     300      300      320      340      310

Fig. 3(a) shows that PPCA, even with 3 components, separates AD from Normal to a great extent, whereas in Fig. 3(b) we see that 3 of the 4050 max-pooled features place AD and Normal close together, with possible overlaps. The cumulative variance displayed in Fig. 3(c) is low (∼ 35%), whereas for 250 components PPCA has a high cumulative variance (∼ 97%), shown in Fig. 3(d). We further reduce the feature dimensionality from 4050 to a count in the hundreds, since training a neural network with a number of features close to the number of samples yields a model that classifies better than one with far more features than samples. Hence we use PPCA to reduce our 4050 max-pooled/mean-pooled features to between 250 and 300 features, the range that captures the most variance under PPCA, as shown in Fig. 3(d).

Figure 3.

Figure 3

PPCA on ADNI2 subject samples for AD and Normal subjects. (a) 3 component PPCA for AD and Normal shows a good separation of AD and Normal subjects (b) Displaying the first 3 dimensions out of 4050 of the maxpooled data, (c) Cumulative Variance for 3 component PCA (d) Cumulative Variance for up to 250 features
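The component-selection step can be sketched with scikit-learn's PCA (whose model underlies Tipping's probabilistic formulation when scoring); the data here is a random placeholder rather than ADNI features, and the sizes mirror the 4050-feature, few-hundred-sample setting.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder: n_samples x pooled-feature matrix standing in for ADNI data.
X = np.random.default_rng(0).random((300, 4050))

# Fit 250 components and inspect cumulative explained variance, as in Fig. 3(d).
pca = PCA(n_components=250).fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)
X_reduced = pca.transform(X)
print(X_reduced.shape)  # (300, 250)
```

On real data, one would pick the smallest component count in the 250–300 range whose cumulative variance is acceptably high, matching the per-experiment choices in Table 4.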

2.4 Multilayer Perceptron

A perceptron produces a single binary output given several binary inputs x1, x2, and so on. Fig. 4 shows a schematic of Rosenblatt's perceptron with m inputs x1, x2, x3, …, xm.

Figure 4. Schematic of Rosenblatt's perceptron.

Figure 4

Rosenblatt* proposed a simple way to compute the output. He assigned weights w1, w2, …, to the inputs, signifying the importance of each input in determining the output. In the modern sense, the perceptron is an algorithm for learning a binary classifier: a function that maps its input x (a real-valued vector) to an output value f(x) (a single binary value):

f(x) = 1 if w · x + b > 0, and 0 otherwise,

where w is a vector of real-valued weights, w · x is the dot product Σᵢ₌₁ᵐ wᵢxᵢ, m is the number of inputs to the perceptron, and b is the bias. The bias shifts the decision boundary away from the origin and does not depend on any input value.

A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. An MLP is trained with a supervised learning technique called backpropagation: there is a training set of input-output pairs, and the network must learn to model the dependency between them. The algorithm consists of two main steps, forward propagation and backward propagation. In the forward pass, the predicted outputs corresponding to the given inputs are evaluated by applying a set of weights to the input data; for the first forward pass, the weights are selected randomly. In the backward pass, partial derivatives of the cost function with respect to the different parameters are propagated back through the network, which measures the margin of error of the output, and the weights are adjusted accordingly to decrease the error.

A one-hidden-layer MLP can be represented by the function f : ℝᴰ → ℝᴸ, where D is the size of the input vector x and L is the size of the output vector f(x):

f(x) = G(b⁽²⁾ + W⁽²⁾ s(b⁽¹⁾ + W⁽¹⁾x)),

with bias vectors b⁽¹⁾, b⁽²⁾, weight matrices W⁽¹⁾, W⁽²⁾ and activation functions G and s.
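A minimal numpy sketch of this one-hidden-layer forward pass; the layer sizes are illustrative, with ReLU and softmax standing in for s and G respectively.

```python
import numpy as np

def relu(z):
    # s: elementwise hidden-layer activation
    return np.maximum(0.0, z)

def softmax(z):
    # G: output activation producing class probabilities
    e = np.exp(z - z.max())
    return e / e.sum()

D, H, L = 250, 100, 2  # input, hidden, output sizes (illustrative)
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((H, D)) * 0.01, np.zeros(H)
W2, b2 = rng.standard_normal((L, H)) * 0.01, np.zeros(L)

def forward(x):
    # f(x) = G(b2 + W2 * s(b1 + W1 x))
    return softmax(b2 + W2 @ relu(b1 + W1 @ x))

p = forward(rng.standard_normal(D))
print(p.sum())  # probabilities sum to 1
```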

2.4.1 Activation Function

We use the Rectified Linear Unit (ReLU) activation function for the activation units in the MLP. The rectifier activation function allows a network to easily obtain sparse representations, hence inducing a sparsity effect on the network. Figure 6 shows how the 1st hidden layer has certain deactivated neurons because the output from their respective ReLU activation functions is zeroed out; these do not contribute as inputs to the second hidden layer, so only 2 of the 6 neurons in the 1st hidden layer contribute as inputs to the 2nd hidden layer. Similarly, sparsity is induced with ReLU in the 2nd hidden layer.

Figure 6.

Figure 6

Rectified Linear Units Inducing Sparsity in a Neural Network, adapted from Bengio et al.15

Experimental results show engaging training behavior for this activation function, especially for deep architectures,15 i.e., where the number of hidden layers in the neural network is 3 or more. This means that training proceeds better when the operating neurons in a network are either off or operating mostly in a linear regime. The MLP trained with backpropagation is a standard algorithm for supervised pattern recognition and the subject of ongoing research in computational neuroscience and parallel distributed processing. MLPs are useful in research for their ability to solve problems stochastically, which often allows approximate solutions to extremely complex problems like classification.

After reducing the feature dimensionality for each sample, we pass the reduced features as inputs to our MLP. We tried various configurations to obtain the best performing models, varying the number of hidden layers and the number of neurons in each layer for every classification experiment. One-hidden-layer MLPs are known to achieve good classification accuracies with x neurons, where x = (inp + out)/2, with inp the number of inputs and out the number of outputs. The activation function for each neuron is linear rectification, which, given an input y, returns f(y) = max(0, y).

The learning rate for the MLP is set to 0.001, and loss minimization (gradient descent optimization) is performed using the Adam (Adaptive Moment Estimation) optimizer.16

2.4.2 Backpropagation

Backpropagation is a method of training artificial Neural Networks used in conjunction with an optimization method (such as gradient descent). Backpropagation calculates a gradient of a loss function with respect to all the weights in the network. The gradient is fed to the optimization method which in turn uses it to update the weights, in an attempt to minimize the loss function. Backpropagation requires a known, desired output for each input value in order to calculate the loss function gradient. It is therefore usually considered to be a supervised learning method, although it is also used in some unsupervised networks such as autoencoders.

Backpropagation is used for computing the error δ^l and the gradient of the cost function. The backpropagation algorithm is as follows:

  1. Input x: set the activation a^1 for the input layer

  2. Feedforward: for each l = 2, 3, …, L, compute z^l = w^l a^{l−1} + b^l and a^l = σ(z^l)

  3. Output error: compute the vector δ^L = ∇_a C ⊙ σ′(z^L)

  4. Backpropagate the error: for each l = L − 1, L − 2, …, 2, compute δ^l = ((w^{l+1})^T δ^{l+1}) ⊙ σ′(z^l)

  5. Output: the gradient of the cost function is given by ∂C/∂w^l_{jk} = a^{l−1}_k δ^l_j and ∂C/∂b^l_j = δ^l_j
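The five steps above can be sketched in numpy for a small two-layer sigmoid network with quadratic cost C = 0.5‖a^L − y‖²; all shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# 1-indexed weight/bias lists mirroring the algorithm (index 0 unused).
W = [None, rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
b = [None, rng.standard_normal(4), rng.standard_normal(2)]
x, y = rng.standard_normal(3), np.array([0.0, 1.0])

# Steps 1-2: feedforward, storing z^l and activations a^l per layer.
a, zs = [x], [None]
for l in (1, 2):
    zs.append(W[l] @ a[l - 1] + b[l])
    a.append(sigmoid(zs[l]))

# Step 3: output error delta^L = grad_a C (.) sigma'(z^L), with grad_a C = a^L - y.
delta = (a[2] - y) * sigmoid(zs[2]) * (1 - sigmoid(zs[2]))
grads_W = {2: np.outer(delta, a[1])}          # step 5 for layer 2
# Step 4: backpropagate the error to layer 1.
delta = (W[2].T @ delta) * sigmoid(zs[1]) * (1 - sigmoid(zs[1]))
grads_W[1] = np.outer(delta, a[0])            # step 5 for layer 1

print(grads_W[1].shape, grads_W[2].shape)  # (4, 3) (2, 4)
```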

The loss minimization problem can be given by:

min_W { L(W) := (1/m) Σ_{i=1}^{m} ℓ(W; x_i, y_i) + λ r(W) },

where {(x_i, y_i)}_{i=1}^{m} are the training instances x_i with corresponding labels y_i; W are the network parameters to learn; ℓ(W; x_i, y_i) is the loss of the network parameterized by W with respect to (x_i, y_i); r(W) is a regularization function (e.g., ‖W‖₂²); and λ > 0 is the regularization weight.

The optimization method must be first-order (updates based on objective value and gradient only) and stochastic (updates based on a subset of training examples):

L_t(W) := (1/b) Σ_{j=1}^{b} ℓ(W; x_{i_j}, y_{i_j}) + λ r(W),

where {(x_{i_j}, y_{i_j})}_{j=1}^{b} is a random mini-batch chosen at iteration t.

The update rule for Adam optimization is:

W_t = W_{t−1} − α M̂_t / (√R̂_t + ε),

where

M_t = β₁ M_{t−1} + (1 − β₁) ∇L_t(W_{t−1}) (1st moment estimate);

R_t = β₂ R_{t−1} + (1 − β₂) (∇L_t(W_{t−1}))² (2nd moment estimate);

M̂_t = M_t / (1 − (β₁)^t) (1st moment bias correction);

R̂_t = R_t / (1 − (β₂)^t) (2nd moment bias correction).

The hyper-parameters are: α > 0, the learning rate (choice: 0.001);

β₁ ∈ [0, 1), the 1st moment decay rate (choice: 0.9);

β₂ ∈ [0, 1), the 2nd moment decay rate (choice: 0.999);

ε > 0, a numerical stability term (typical choice: 10⁻⁸).

Adam adaptively selects a separate learning rate for each parameter. Parameters that would ordinarily receive smaller or less frequent updates receive larger updates with Adam (the reverse is also true). This speeds learning in cases where the appropriate learning rates vary across parameters.
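A toy numpy sketch of the Adam update above, using the stated hyper-parameter choices on a simple quadratic loss L(w) = 0.5‖w‖², whose gradient is w; the loss function is ours for illustration only.

```python
import numpy as np

# Paper's hyper-parameter choices: alpha, beta1, beta2, epsilon.
alpha, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8

def adam_step(w, m, r, t, grad):
    m = beta1 * m + (1 - beta1) * grad       # 1st moment estimate M_t
    r = beta2 * r + (1 - beta2) * grad ** 2  # 2nd moment estimate R_t
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    r_hat = r / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(r_hat) + eps)
    return w, m, r

w = np.array([1.0, -2.0])
m, r = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 101):
    w, m, r = adam_step(w, m, r, t, grad=w)  # grad of 0.5*||w||^2 is w
print(np.abs(w).max() < 2.0)  # True: the iterate shrinks toward the minimum at 0
```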

2.5 Finding an Optimal Configuration

We further experiment to find an optimal deep neural network configuration for the MLP architecture. Our procedure, shown in the algorithm below, is a greedy search in one fixed direction. We first estimate the one-hidden-layer MLP that gives the maximum f1 score when the number of neurons is varied from 5 to 1000. We then fix the number of neurons in the first layer to the best value found, vary the number of neurons in the second layer from 5 to 1000, and again keep the best; we repeat this for as many iterations as the requested number of hidden layers. This rests on the assumption that fixing the previous layers' configuration while varying only the new hidden layer leads to better results, which may not hold. Also, we only vary the number of neurons from 5 to 1000 at intervals of 5; widening these bounds may lead to different results. To investigate this approach, we use it for seven of the binary classification experiments (excluding CU/MCI). The results are shown in Appendix A. This approach reaches better accuracies than a random brute-force search; trying all permutations and combinations for each hidden layer would be better still, but is computationally expensive.


Algorithm Pseudo Code for Estimating an Optimal Configuration for n hidden layers

arrsizes ← emptyList([])
niter ← n
for i ← [1, 2, … niter] do
  maxf1 ← 0
  for numNeurons = 5; numNeurons ≤ 1000; numNeurons += 5 do
   confusionMatrix ← 0
   for kfold in 10-fold Cross Validation do
    model ← MLPClassifier(hiddenLayerSizes=arrsizes.append(numNeurons), activation='ReLU', solver='Adam', maxIterations=1000)
    {Train on k−1 folds and test on the kth fold}
    model.fit([1fold, 2fold, … k−1fold], trainLabels)
    {Add the confusion matrices over the k folds}
    confusionMatrix ← confusionMatrix + confMat(testLabels, model.predict(kth fold))
   end for
   {Store the number of neurons giving the maximum f1 score}
   if confusionMatrix.f1 > maxf1 then
    maxf1 ← confusionMatrix.f1
    bestNumNeurons ← numNeurons
   end if
  end for
  arrsizes[i] ← bestNumNeurons
end for
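The pseudocode can be rendered in Python with scikit-learn roughly as follows; the data is a random placeholder and the search range is shrunk from the paper's 5–1000 (step 5) so the sketch runs quickly.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

# Placeholder data standing in for the PPCA-reduced features and labels.
rng = np.random.default_rng(0)
X = rng.random((100, 20))
y = rng.integers(0, 2, size=100)

layer_sizes = []
for _ in range(2):                        # grow n hidden layers greedily
    best_f1, best_n = -1.0, None
    for n_neurons in range(5, 30, 5):     # paper: range(5, 1001, 5)
        clf = MLPClassifier(hidden_layer_sizes=tuple(layer_sizes + [n_neurons]),
                            activation='relu', solver='adam', max_iter=200)
        # Pooled 10-fold predictions approximate summing the fold confusion matrices.
        pred = cross_val_predict(clf, X, y, cv=10)
        f1 = f1_score(y, pred)
        if f1 > best_f1:
            best_f1, best_n = f1, n_neurons
    layer_sizes.append(best_n)            # fix this layer's width, move on

print(layer_sizes)  # best width found for each hidden layer
```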

3. Results

In this section, we perform experiments to validate our proposed method and evaluate its performance on the ADNI2 dataset. We use FDG-PET baseline scans from the ADNI2 dataset and the software toolkit Statistical Parametric Mapping (SPM)7 to linearly align all the images into a common space. We measure classification accuracy using the f1-measure and perform 10-fold cross validation. Figure 7 shows the ROC curves and AUCs for each of our experiments with and without demographic data.

Figure 7. ROC for Multilayer Perceptron Classifier. The figure on the left is without the addition of demographic features, and on the right is with the addition of demographic features.

Figure 7

Our experiments show that the proposed system is promising for AD diagnosis research. Whether this approach provides more statistical power than other classification work requires careful validation for each application. We anticipate that this work will inspire more deep learning based systems for FDG-PET data analysis. Compared with the best results achieved to date using FDG-PET images only,17 our results show an increase of 4.76% in accuracy for AD/CU classification.

Comparison with other Classification Algorithms

We compare our MLP results with other machine learning algorithms in Table 5. The comparison uses max-pooled data combined with age, gender, APOE 1, APOE 2, and FAQ score information, with 10-fold cross validation for each method.
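A minimal sketch of this comparison, assuming the scikit-learn implementations of the four classifiers; hyperparameters here are library defaults, not the paper's tuned settings, and X and y stand in for the max-pooled features with demographics.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def compare_classifiers(X, y):
    """Mean F1 over 10 stratified folds for each candidate classifier."""
    classifiers = {
        'MLP': MLPClassifier(max_iter=1000, random_state=0),
        'Linear SVM': LinearSVC(random_state=0),
        'SGD': SGDClassifier(random_state=0),
        'GNB': GaussianNB(),
    }
    return {name: cross_val_score(clf, X, y, cv=10, scoring='f1').mean()
            for name, clf in classifiers.items()}
```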

Comparison with Other Dimensionality Reduction Algorithms

Table 6 compares PPCA as a dimensionality reduction technique with other commonly used techniques. The experiment was run on max-pooled data with demographic features (age, gender, APOE, and FAQ scores), using 10-fold cross validation to evaluate performance. PPCA outperforms the other techniques. These accuracies are obtained by trying random configurations of the neural network.
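One way to reproduce this comparison is with scikit-learn, whose PCA implements the maximum-likelihood solution of probabilistic PCA14 and therefore stands in for PPCA here. The pipeline below is a sketch with assumed default hyperparameters; the kernel choice for Kernel PCA is illustrative.

```python
from sklearn.decomposition import PCA, TruncatedSVD, KernelPCA
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def compare_reducers(X, y, n_components=300):
    """Mean 10-fold F1 of an MLP after each dimensionality reduction step."""
    reducers = {
        'PPCA': PCA(n_components=n_components),
        'Truncated SVD': TruncatedSVD(n_components=n_components, random_state=0),
        'Kernel PCA': KernelPCA(n_components=n_components, kernel='rbf'),
    }
    scores = {}
    for name, red in reducers.items():
        # Fitting the reducer inside the pipeline keeps each fold's
        # test data out of the dimensionality-reduction fit
        pipe = make_pipeline(red, MLPClassifier(max_iter=500, random_state=0))
        scores[name] = cross_val_score(pipe, X, y, cv=10, scoring='f1').mean()
    return scores
```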

Table 6. Performance Comparison: PPCA vs Other Dimensionality Reduction Algorithms.

Method               Measure  AD/CU   AD/MCI  CU/MCI  AD/EMCI  AD/LMCI  CU/LMCI  CU/EMCI  LMCI/EMCI
PPCA + MLP           F1       0.9734  0.8954  0.7830  0.8621   0.7790   0.8325   0.72     0.656
                     Prec     0.9839  0.9167  0.8214  0.8562   0.7603   0.9086   0.7742   0.6910
                     Recall   0.9632  0.875   0.7479  0.8681   0.7986   0.7682   0.6729   0.6244
Truncated SVD + MLP  F1       0.9526  0.9053  0.7734  0.7473   0.7596   0.79     0.6667   0.6062
                     Prec     0.9731  0.9673  0.7619  0.7192   0.7466   0.8495   0.7097   0.6573
                     Recall   0.9330  0.8508  0.7853  0.7778   0.7730   0.7383   0.6286   0.5625
Kernel PCA + MLP     F1       0.9659  0.8937  0.7598  0.8489   0.7622   0.8082   0.735    0.64
                     Prec     0.9859  0.9137  0.7530  0.8082   0.7466   0.8495   0.7903   0.6742
                     Recall   0.9436  0.8746  0.7667  0.8940   0.7786   0.7707   0.6869   0.6091

Effect of Demographic Features

We also include age and gender of the subjects as additional features alongside our max-pooled data. The data matrix input to the neural network is then of size n × 4052, where n is the number of samples (training + testing); n is 332 for the CU vs. AD binary classification experiment (186 CU samples, 146 AD samples). Table 7 shows the improvements from adding age and gender (coded 0 for female and 1 for male) to the max-pooled data. Our results clearly show a difference with the addition of just two demographic features, which suggests that adding other features from ADNI subjects (such as MMSE score) will further improve prediction results.
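As a concrete illustration (array names are assumed, not from the original code), the augmented input matrix can be built by horizontally stacking the pooled features with the two demographic columns:

```python
import numpy as np

def add_demographics(pooled, age, gender):
    """pooled: (n, 4050) max-pooled intensity matrix; age: (n,) in years;
    gender: (n,) coded 0 = female, 1 = male.
    Returns the (n, 4052) matrix fed to the network."""
    return np.hstack([pooled,
                      np.asarray(age, dtype=float).reshape(-1, 1),
                      np.asarray(gender, dtype=float).reshape(-1, 1)])
```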

Table 7. Performance Comparison: Maxpooled data vs (Maxpooled+Age+Gender).

MLP                        Measure  AD/CU   AD/MCI  CU/MCI  AD/EMCI  AD/LMCI  CU/LMCI  CU/EMCI  LMCI/EMCI
Maxpooled Data             F1       0.9275  0.8612  0.7527  0.8112   0.7230   0.6976   0.6253   0.6844
                           Prec     0.9624  0.8958  0.8155  0.7945   0.7328   0.7258   0.6505   0.7247
                           Recall   0.895   0.8292  0.6990  0.8286   0.7133   0.6716   0.6020   0.6482
Maxpooled + Age + Gender   F1       0.9326  0.8632  0.7531  0.8211   0.7569   0.7413   0.6064   0.6813
                           Prec     0.9677  0.9018  0.8125  0.8014   0.7466   0.8011   0.6129   0.6966
                           Recall   0.9     0.8278  0.7018  0.8417   0.7676   0.6898   0.6      0.6667
Maxpooled + Age + Gender
 + APOE + FAQ              F1       0.9734  0.8954  0.7830  0.8621   0.7790   0.8325   0.72     0.656
                           Prec     0.9839  0.9167  0.8214  0.8562   0.7603   0.9086   0.7742   0.6910
                           Recall   0.9632  0.875   0.7479  0.8681   0.7986   0.7682   0.6729   0.6244

Effect of Beta Score

We also append AD/CU beta positive/negative values as features, in addition to the demographic features and the max-pooled PET intensity values. The results are shown in Table 8.

Table 8. Addition of AD/CU beta positive/negative with demographic features.

Measure 1 HL 2 HL 3 HL 4 HL 5 HL
HL Config (820) (820,150) (820,150,905) (820,150,905,70) (820,150,905,70,15)
F1 score 0.9520 0.9544 0.9558 0.9522 0.9548
Precision 0.9982 0.9844 0.9712 0.9608 0.9642
Recall 0.9098 0.9262 0.9408 0.9438 0.9455

We follow the algorithm described in the Methods section; our best results for the classification of FDG-PET data with and without demographics are shown in Table 9. These results are obtained with varying MLP configurations, each of which is discussed in this and the previous sections.

Table 9. Summary of Best Results.

Data     Measure   AD/CU   AD/MCI  CU/MCI  AD/EMCI  AD/LMCI  CU/LMCI  CU/EMCI  LMCI/EMCI
No Demo  F1 score  0.9430  0.8743  0.7527  0.8747   0.7706   0.6976   0.6388   0.6844
Demo     F1 score  0.9814  0.9125  0.7858  0.9036   0.8288   0.8325   0.72     0.656

We further add beta positive/negative values for the AD vs. CU comparison experiment along with demographic features. The experiment is performed on max-pooled data from which 300 probabilistic principal components are extracted; the Methods section describes how the number of principal components is selected for each experiment.

Comparison of Max-pooled data with Mean-pooled data

We performed binary classification experiments on two versions of the dataset, one max-pooled and one mean-pooled, and compared them based on the classification performance of a Multilayer Perceptron (MLP) classifier. Table 10 shows that max-pooling achieves better performance in the majority of the binary classification experiments.
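The two pooling schemes can be sketched as follows: each 3-D PET volume is divided into non-overlapping blocks, and either the maximum or the mean intensity of each block is kept as a feature. The block size here is illustrative, not necessarily the paper's exact setting.

```python
import numpy as np

def pool3d(volume, block=2, mode='max'):
    """Downsample a 3-D array by taking the max or mean of each
    non-overlapping block x block x block region."""
    d, h, w = (s // block for s in volume.shape)
    # Crop so every dimension is an exact multiple of the block size
    v = volume[:d * block, :h * block, :w * block]
    blocks = v.reshape(d, block, h, block, w, block)
    op = np.max if mode == 'max' else np.mean
    return op(blocks, axis=(1, 3, 5))
```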

Table 10. Classification comparison for Max-Pooled and Mean-Pooled Data.

Measure    Pooling  AD/CU  AD/MCI  CU/MCI  AD/EMCI  AD/LMCI  CU/LMCI  CU/EMCI  LMCI/EMCI
F-1 score  Max      0.92   0.86    0.75    0.82     0.72     0.70     0.63     0.64
           Mean     0.93   0.86    0.71    0.85     0.74     0.63     0.57     0.61
Precision  Max      0.97   0.90    0.82    0.80     0.73     0.73     0.65     0.64
           Mean     0.96   0.89    0.73    0.87     0.77     0.62     0.59     0.59
Recall     Max      0.87   0.83    0.70    0.84     0.71     0.67     0.60     0.64
           Mean     0.90   0.83    0.70    0.82     0.71     0.65     0.55     0.63
NPV        Max      0.82   0.57    0.37    0.88     0.73     0.58     0.55     0.60
           Mean     0.87   0.58    0.42    0.77     0.66     0.72     0.54     0.70
PPV        Max      0.97   0.90    0.82    0.80     0.73     0.73     0.65     0.64
           Mean     0.96   0.89    0.73    0.87     0.77     0.62     0.60     0.59

4. Conclusion and Future Work

We present a deep learning framework for AD clinical group classification using FDG-PET imaging data. Our findings suggest that deep models may be useful for improving clinical diagnosis and prognosis of AD. In the future, we will refine our framework and apply it to cognitive score and treatment effect prediction problems.

Figure 5. Multilayer Perceptron with 1 hidden layer.

Table 5. Performance Comparison: MLP vs Other Machine Learning Algorithms.

Method                 Measure  AD/CU   AD/MCI  CU/MCI  AD/EMCI  AD/LMCI  CU/LMCI  CU/EMCI  LMCI/EMCI
PPCA + MLP             F1       0.9734  0.8954  0.7830  0.8621   0.7790   0.8325   0.72     0.656
                       Prec     0.9839  0.9167  0.8214  0.8562   0.7603   0.9086   0.7742   0.6910
                       Recall   0.9632  0.875   0.7479  0.8681   0.7986   0.7682   0.6729   0.6244
PPCA + Linear SVM      F1       0.9558  0.8781  0.7625  0.8522   0.7279   0.7413   0.6598   0.6136
                       Prec     0.9892  0.8899  0.75    0.8493   0.7329   0.7473   0.6882   0.6067
                       Recall   0.9246  0.8667  0.7754  0.8552   0.7230   0.7354   0.6337   0.6207
PPCA + SGD Classifier  F1       0.9551  0.8846  0.7463  0.8255   0.6957   0.7287   0.6773   0.5977
                       Prec     0.9731  0.8899  0.7440  0.8425   0.7123   0.7366   0.6828   0.5843
                       Recall   0.9378  0.8794  0.7485  0.8092   0.6797   0.7211   0.6720   0.6118
PPCA + GNB Classifier  F1       0.9080  0.8113  0.7098  0.7451   0.6351   0.6841   0.6067   0.6011
                       Prec     0.8495  0.8125  0.7024  0.7808   0.6438   0.7043   0.6344   0.6180
                       Recall   0.9753  0.8101  0.7173  0.7125   0.6267   0.6650   0.5813   0.5851

Acknowledgments

The research was supported in part by NIH (R21AG049216, RF1AG051710 and U54EB020403) and NSF (DMS-1413417 and IIS-1421165).

Appendix A. Configuring A 5-Hidden-Layer MLP

We studied the relationship between the depth of the deep models and classification performance in a variety of experiments. The results are summarized in Tables 11–17.
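The incremental configurations in these tables suggest a greedy deepening procedure: the best width found for layer i is frozen, then the width of layer i + 1 is searched. A sketch of that procedure, with assumed default hyperparameters rather than the authors' exact code:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def grow_network(X, y, max_depth=5, widths=range(5, 1001, 5)):
    """Greedily add hidden layers one at a time, keeping the width that
    maximizes mean 10-fold F1 at each depth. Returns the (config, F1)
    history for depths 1..max_depth."""
    layers, history = [], []
    for _ in range(max_depth):
        best_f1, best_w = -1.0, None
        for w in widths:
            clf = MLPClassifier(hidden_layer_sizes=tuple(layers) + (w,),
                                max_iter=1000, random_state=0)
            f1 = cross_val_score(clf, X, y, cv=10, scoring='f1').mean()
            if f1 > best_f1:
                best_f1, best_w = f1, w
        layers.append(best_w)  # freeze this layer's width before deepening
        history.append((tuple(layers), best_f1))
    return history
```

Note that, as Tables 11–17 show, the greedy search does not guarantee monotone improvement with depth; F1 often peaks at two to four hidden layers.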

Table 11. Estimating an Optimal Configuration for AD vs. CU Classification.

#HL AD vs CU AD vs CU with Demo
Config F1 Score Config F1 Score
1 (700) 0.9430 (525) 0.9563
2 (700,555) 0.9415 (525,880) 0.9737
3 (700,555,305) 0.9403 (525,880,880) 0.9761
4 (700,555,305,25) 0.9368 (525,880,880,255) 0.9812
5 (700,555,305,25,10) 0.9393 (525,880,880,255,775) 0.9814

Table 12. Estimating an Optimal Configuration for AD vs. MCI Classification.

#HL AD vs MCI AD vs MCI with Demo
Config F1 Score Config F1 Score
1 (85) 0.8684 (160) 0.9086
2 (85,120) 0.8727 (160,270) 0.9125
3 (85,120,110) 0.8743 (160,270,405) 0.9083
4 (85,120,110,625) 0.8734 (160,270,405,350) 0.9086
5 (85,120,110,625,120) 0.8677 (160,270,405,350,215) 0.9140

Table 13. Estimating an Optimal Configuration for EMCI vs. AD Classification.

#HL EMCI vs AD EMCI vs AD with Demo
Config F1 Score Config F1 Score
1 (755) 0.8696 (80) 0.9003
2 (755,55) 0.8736 (80,190) 0.9011
3 (755,55,625) 0.8747 (80,190,380) 0.9036
4 (755,55,625,15) 0.8641 (80,190,380,425) 0.9000
5 (755,55,625,15,585) 0.8644 (80,190,380,425,550) 0.8950

Table 14. Estimating an Optimal Configuration for LMCI vs. AD Classification.

#HL LMCI vs AD LMCI vs AD with Demo
Config F1 Score Config F1 Score
1 (380) 0.7561 (215) 0.8288
2 (380,660) 0.7706 (215,105) 0.8193
3 (380,660,70) 0.7688 (215,105,150) 0.8098
4 (380,660,70,55) 0.7679 (215,105,150,600) 0.8086
5 (380,660,70,55,535) 0.7580 (215,105,150,600,260) 0.8086

Table 15. Estimating an Optimal Configuration for LMCI vs. CU Classification.

#HL LMCI vs CU LMCI vs CU with Demo
Config F1 Score Config F1 Score
1 (915) 0.6512 (560) 0.7324
2 (915,20) 0.6536 (560,490) 0.7774
3 (915,20,690) 0.6539 (560,490,35) 0.7747
4 (915,20,690,170) 0.6471 (560,490,35,40) 0.7735
5 (915,20,690,170,15) 0.6507 (560,490,35,40,75) 0.7671

Table 16. Estimating an Optimal Configuration for EMCI vs. CU Classification.

#HL EMCI vs CU EMCI vs CU with Demo
Config F1 Score Config F1 Score
1 (5) 0.6044 (930) 0.6564
2 (5,25) 0.6192 (930,860) 0.6866
3 (5,25,5) 0.6388 (930,860,130) 0.6961
4 (5,25,5,185) 0.6287 (930,860,130,385) 0.7015
5 (5,25,5,185,5) 0.6293 (930,860,130,385,750) 0.6859

Table 17. Estimating an Optimal Configuration for EMCI vs. LMCI Classification.

#HL EMCI vs LMCI EMCI vs LMCI with Demo
Config F1 Score Config F1 Score
1 (525) 0.6258 (125) 0.6214
2 (525,360) 0.6275 (125,585) 0.6205
3 (525,360,285) 0.6032 (125,585,5) 0.6467
4 (525,360,285,250) 0.5828 (125,585,5,430) 0.6519
5 (525,360,285,250,750) 0.6024 (125,585,5,125) 0.6103

References

  • 1. Reiman EM, Caselli RJ, Chen K, Alexander GE, Bandy D, Frost J. Declining brain activity in cognitively normal apolipoprotein E ε4 heterozygotes: a foundation for using positron emission tomography to efficiently test treatments to prevent Alzheimer's disease. Proceedings of the National Academy of Sciences. 2001;98(6):3334–3339. doi: 10.1073/pnas.061509598.
  • 2. Sun D, van Erp TG, Thompson PM, Bearden CE, Daley M, Kushan L, Hardt ME, Nuechterlein KH, Toga AW, Cannon TD. Elucidating a magnetic resonance imaging-based neuroanatomic biomarker for psychosis: classification analysis using probabilistic brain atlas and machine learning algorithms. Biological Psychiatry. 2009;66(11):1055–1060. doi: 10.1016/j.biopsych.2009.07.019.
  • 3. Hazlett HC, Gu H, Munsell BC, Kim SH, Styner M, Wolff JJ, Elison JT, Swanson MR, Zhu H, Botteron KN, et al. Early brain development in infants at high risk for autism spectrum disorder. Nature. 2017;542(7641):348–351. doi: 10.1038/nature21369.
  • 4. Suk HI, Shen D. Deep learning-based feature representation for AD/MCI classification. International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer; 2013. pp. 583–590.
  • 5. Li F, Tran L, Thung KH, Ji S, Shen D, Li J. A robust deep model for improved classification of AD/MCI patients. IEEE Journal of Biomedical and Health Informatics. 2015;19(5):1610–1616. doi: 10.1109/JBHI.2015.2429556.
  • 6. Liu S, Liu S, Cai W, Che H, Pujol S, Kikinis R, Feng D, Fulham MJ, et al. Multimodal neuroimaging feature learning for multiclass diagnosis of Alzheimer's disease. IEEE Transactions on Biomedical Engineering. 2015;62(4):1132–1140. doi: 10.1109/TBME.2014.2372011.
  • 7. Penny WD, Friston KJ, Ashburner JT, Kiebel SJ, Nichols TE. Statistical Parametric Mapping: The Analysis of Functional Brain Images. Academic Press; 2011.
  • 8. Farrer LA, Cupples LA, Haines JL, Hyman B, Kukull WA, Mayeux R, Myers RH, Pericak-Vance MA, Risch N, van Duijn CM. Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease: a meta-analysis. JAMA. 1997;278(16):1349–1356.
  • 9. Liu CC, Kanekiyo T, Xu H, Bu G. Apolipoprotein E and Alzheimer disease: risk, mechanisms and therapy. Nature Reviews Neurology. 2013;9(2):106–118. doi: 10.1038/nrneurol.2012.263.
  • 10. Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research. 2008;9:1871–1874.
  • 11. Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. AISTATS. 2011;15(106):275.
  • 12. Boureau YL, Ponce J, LeCun Y. A theoretical analysis of feature pooling in visual recognition. Proceedings of the 27th International Conference on Machine Learning (ICML-10); 2010. pp. 111–118.
  • 13. Ben-Hur A, Weston J. A user's guide to support vector machines. Data Mining Techniques for the Life Sciences. 2010:223–239. doi: 10.1007/978-1-60327-241-4_13.
  • 14. Tipping ME, Bishop CM. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B. 1999;61(3):611–622.
  • 15. Bengio Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning. 2009;2(1):1–127.
  • 16. Kingma DP, Ba J. Adam: A method for stochastic optimization. International Conference on Learning Representations (ICLR); 2015.
  • 17. Illán IA, Górriz JM, Ramírez J, Salas-Gonzalez D, López M, Segovia F, Chaves R, Gómez-Río M, Puntonet CG, Alzheimer's Disease Neuroimaging Initiative, et al. 18F-FDG PET imaging analysis for computer aided Alzheimer's diagnosis. Information Sciences. 2011;181(4):903–916.
