Using normative modelling to detect disease progression in mild cognitive impairment and Alzheimer’s disease in a cross-sectional multi-cohort study

Walter H L Pinaya; Cristina Scarpazza; Rafael Garcia-Dias; Sandra Vieira; Lea Baecker; Pedro F da Costa; Alberto Redolfi; Giovanni B Frisoni; Michela Pievani; Vince D Calhoun; João R Sato; Andrea Mechelli

doi:10.1038/s41598-021-95098-0

Sci Rep. 2021 Aug 3;11:15746. doi: 10.1038/s41598-021-95098-0

PMCID: PMC8333350 PMID: 34344910

Abstract

Normative modelling is an emerging method for quantifying how individuals deviate from the healthy population pattern. Several machine learning models have been implemented to develop normative models to investigate brain disorders, including regression, support vector machines and Gaussian process models. With the advance of deep learning technology, the use of deep neural networks has also been proposed. In this study, we assessed normative models based on deep autoencoders using structural neuroimaging data from patients with Alzheimer’s disease (n = 206) and mild cognitive impairment (n = 354). We first trained the autoencoder on an independent dataset (UK Biobank dataset) with 11,034 healthy controls. Then, we estimated how each patient deviated from this norm and established which brain regions were associated with this deviation. Finally, we compared the performance of our normative model against traditional classifiers. As expected, we found that patients exhibited deviations according to the severity of their clinical condition. The model identified medial temporal regions, including the hippocampus, and the ventricular system as critical regions for the calculation of the deviation score. Overall, the normative model had comparable cross-cohort generalizability to traditional classifiers. To promote open science, we are making all scripts and the trained models available to the wider research community.

Subject terms: Alzheimer's disease, Diagnostic markers, Computer science, Biomedical engineering

Introduction

Normative modelling is an emerging method for quantifying and describing how individuals deviate from the expected pattern learned from a population or large sample¹. Recently, this approach has been applied to neuroimaging data to investigate a number of brain disorders, such as attention deficit hyperactivity disorder^{2, 3}, autism spectrum disorder^{4, 5}, schizophrenia^{3, 5, 6} and dementia^{7, 8}. The procedure of normative modelling used in these studies has two steps: (i) first, statistical models are estimated to characterise the typical brain data from a reference cohort; (ii) then, the estimated model is applied to a target clinical cohort in order to quantify the variation (e.g. due to the effect of brain disorders).

Many statistical models have been proposed for normative modelling, including regression, support vector machines and Gaussian process modelling (for an extensive list, see Marquand et al., 2019). In Pinaya et al.⁵, we proposed a normative modelling approach based on the use of deep autoencoders to evaluate psychiatric patients. The use of a deep learning approach^{10, 11} enables models to learn multiple levels of representation about the intricate structure of the data and identify the most important morphological characteristic of the healthy brain. In addition, in Pinaya et al.⁵, the models were able to detect deviations at the level of the individual, with patients with schizophrenia and patients with autism spectrum disorder presenting values significantly higher than the healthy controls (HC).

Similar to psychiatric disorders, the clinical interpretation of magnetic resonance imaging scans can be challenging in the context of neurodegenerative disorders, as brain alterations may be difficult to distinguish from those related to healthy ageing. The identification of disease-related alterations can be particularly difficult in the early stages of a disorder. For this reason, there is growing interest in the development of methods for quantifying deviations of regional brain volumes that can discriminate between healthy and pathological ageing, with the ultimate aim of improving diagnostic and prognostic assessment of neurodegenerative disorders¹². Here, we used the autoencoder normative method⁵ to evaluate the most common type of dementia in the elderly worldwide, Alzheimer’s disease (AD).

First, we trained the normative models using a large number of HC subjects (> 11,000 participants). Then, we assessed the performance of these models using data from patients with a diagnosis of mild cognitive impairment (MCI), the prodromal stage to AD, and patients with a diagnosis of AD. This assessment involved calculating the deviation, i.e. the extent to which subjects deviate from the norm, in five additional datasets composed of patients with MCI, patients with AD, and HC subjects. We had two main hypotheses. First, we hypothesised that the normative models would be robust and sensitive enough to create deviation values that reflect the severity of the brain anatomical alterations due to the disease, i.e. individuals with AD would deviate from normality more than those with MCI. Second, we hypothesised that the main brain regions driving the observed deviation would include the medial temporal cortex and the ventricular system, consistent with the results of previous neuroimaging studies of MCI and AD^{13, 14}. Finally, we compared the performance of the normative approach against traditional classifiers to discriminate the patient groups from the HC group.

Methods

Datasets

In our analysis, we used six datasets: the UK Biobank¹⁵, the Alzheimer’s Disease Neuroimaging Initiative (ADNI)¹⁶, the Australian Imaging Biomarkers and Lifestyle Study of Ageing (AIBL)¹⁷, the Alzheimer’s Disease Repository Without Borders (ARWiBo)^{18, 19}, the Open Access Series of Imaging Studies: Cross-Sectional (OASIS-1)²⁰, and the Minimal Interval Resonance Imaging in Alzheimer's Disease (MIRIAD)²¹.

The UK Biobank is a study that aims to follow the health and well-being of 500,000 volunteer participants across the United Kingdom. From these participants, a subsample was chosen to collect multimodal imaging, including structural neuroimaging. Here, we used an early release of the project’s data comprising 11,034 HC participants. The inclusion criteria for the present study were: (a) subjects who had the data collected in the same MRI scanner (at the Cheadle centre), (b) age between 47 and 73 years old. The only exclusion criterion was previous hospitalization associated with the diagnosis of mental and behavioural disorders, disease of the nervous system, cerebrovascular diseases, benign neoplasm of meninges, brain and other parts of the central nervous system, or injuries to the head. This study (UK Biobank project #40323) was covered by the general ethical approval for UK Biobank studies from the NHS National Research Ethics Service on 17th June 2011 (Ref 11/NW/0382). All methods were carried out in accordance with the approved guidelines and regulations. All UK Biobank participants provided written informed consent. More details about the dataset can be found elsewhere^{15, 22–24}.

The ADNI consortium started in 2003 as a public–private partnership, led by Principal Investigator Michael W. Weiner. Its goal was to verify whether different neuroimaging biomarkers and neuropsychological assessments can be combined to measure the progression of MCI and to study the development of AD. All ADNI participants provided written informed consent, and study protocols were approved by each local site’s institutional review board. All methods were carried out in accordance with the approved guidelines. Further information about ADNI, including full study protocols, complete inclusion and exclusion criteria, and data collection and availability can be found at http://www.adni-info.org/. All methods as stated on the website were performed in accordance with the relevant guidelines and regulations. In this study, we included the structural MRI collected during the ADNI GO, ADNI 2 and ADNI 3 phases. Similar to UK Biobank, we included only subjects aged between 47 and 73 years. The final dataset comprised 517 subjects, of whom 212 were HC, 159 were patients with early MCI (EMCI), 82 were patients with late MCI (LMCI), and 64 were patients with AD. In the ADNI datasets, participants were assigned to these MCI stages based on different levels of impairment on a single episodic memory measure, with the EMCI group showing milder episodic memory impairment than the LMCI group^{25, 26}.

The AIBL dataset was developed to enhance the understanding of the pathogenesis of AD, concentrating on its early diagnosis (more details can be found in Ellis et al., 2009). Ethics approval for the AIBL study and all experimental protocols was provided by the ethics committees of Austin Health, St Vincent’s Health, Hollywood Private Hospital and Edith Cowan University. All experiments and methods were carried out in accordance with the approved guidelines and regulations and all volunteers gave written informed consent before participating in the study. Here, we included the structural MRI of subjects between 47 and 73 years old, to match the age range of the UK Biobank dataset. The final group was composed of 346 subjects, where 262 were HC, 46 were patients with MCI (stage not known), and 38 were patients with AD.

The ARWiBo is a cross-sectional dataset including data from patients and controls enrolled at the Scientific Institute for the Research and Care of Alzheimer’s Disease [Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy]. A multidisciplinary team of neurologists, neuroscientists, image analysts, neurophysiologists, and geneticists is involved in the assessment of patients. As part of their assessment, participants undergo blood drawing (for APOE genotyping), clinical and cognitive evaluations as well as high-resolution MRI scanning (more details can be found in Frisoni et al., 2009 and Galluzzi et al., 2010). Here, we included the structural MRI of subjects between 47 and 73 years old, to match the age range of the UK Biobank dataset. The resulting group was composed of 319 subjects, including 215 HC, 67 patients with MCI (stage not known), and 37 patients with AD. Ethics approval for the ARWiBo study and all experimental protocols was provided by the local ethics committee and all participants signed an informed participation consent. All experiments and methods were carried out in accordance with the approved guidelines and regulations.

The OASIS-1 dataset is the result of a collaborative effort of investigators from a single acquisition site supported by the National Institute on Aging (NIA), the Howard Hughes Medical Institute, the Biomedical Informatics Research Network (BIRN) and the Washington University Alzheimer’s Disease Research Center [Alzheimer’s Disease Research Center (ADRC)]. This collaborative effort aimed to create a freely available MRI dataset for the wider scientific community. The original dataset consisted of a cross-sectional collection of subjects aged 18 to 96. It included participants over the age of 60 who had received a clinical diagnosis of very mild to moderate AD (for more information, please see http://www.oasis-brains.org). In our analysis, we selected data collected from individuals who were between 47 and 73 years old, to match the age range of the UK Biobank dataset. The resulting group was composed of 78 subjects, including 41 HC and 37 patients with AD. Ethics approval for the OASIS-1 study and all experimental protocols was provided by the local ethics committee and all participants signed an informed participation consent. All subjects participated in accordance with guidelines of the Washington University Human Studies Committee. All experiments and methods were carried out in accordance with the approved guidelines and regulations.

The MIRIAD dataset was designed to establish the minimal interval over which it would be feasible to undertake clinical trials in AD using atrophy measured from longitudinal MRI as an outcome measure²¹. Ethical approval for the MIRIAD study (and subsequently its release) was received from the local research ethics committee, and written consent obtained from all participants. All experiments and methods were carried out in accordance with the approved guidelines and regulations. Here, we included the structural MRI of subjects between 47 and 73 years old, to match the age range of the UK Biobank dataset. The resulting group was composed of 48 subjects, including 18 HC and 30 patients with AD.

In the present study, we used the UK Biobank set to train the autoencoders and the ADNI, AIBL, ARWiBo, OASIS-1, and MIRIAD datasets to assess the normative model performance on data from patients with MCI and AD. To perform comparisons between HC and patient groups, we ensured that there were no significant statistical differences regarding age and sex in all five clinical datasets. We assessed each dataset independently using the ANOVA test to verify any differences in age and the Chi-square test of homogeneity to investigate differences in the sex ratios between groups (Tables 1, 2).

Table 1.

Demographic information for the subjects from the UK Biobank dataset, the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, and the Australian Imaging Biomarkers and Lifestyle Study of Ageing (AIBL) dataset. We used ANOVA test and the chi‐square test of homogeneity to test for significant differences in age and sex between healthy controls and patients. Abbreviations: HC = healthy control; EMCI = early mild cognitive impairment; LMCI = late mild cognitive impairment; AD = Alzheimer’s disease; MCI = mild cognitive impairment; SD = standard deviation.

	UK BIOBANK n = 11,034	ADNI n = 517				p	AIBL n = 346			p
	UK BIOBANK n = 11,034	HC n = 212	EMCI n = 159	LMCI n = 82	AD n = 64	p	HC n = 262	MCI n = 46	AD n = 38	p
Age, y						0.87				0.28
Mean ± SD	61.6 ± 7.0	66.6 ± 3.7	66.4 ± 4.2	66.2 ± 5.1	66.6 ± 5.4		68.2 ± 3.2	68.2 ± 3.6	67.3 ± 5.2
Range	[47, 73]	[56, 72]	[56, 73]	[56, 73]	[56, 73]		[60, 73]	[56, 73]	[55, 73]
Sex, n (%)						0.25				0.15
Men	5180 (47)	90 (42)	72 (45)	40 (49)	36 (56)		113 (43)	19 (41)	17 (45)
Women	5854 (53)	122 (58)	87 (55)	42 (51)	28 (44)		149 (57)	27 (59)	21 (55)

Table 2.

Demographic information for the subjects from the Alzheimer’s Disease Repository Without Borders (ARWiBo) dataset, the Open Access Series of Imaging Studies: Cross-Sectional (OASIS-1) dataset, and the Minimal Interval Resonance Imaging in Alzheimer's Disease (MIRIAD) dataset. We used ANOVA test and the chi‐square test of homogeneity to test for significant differences in age and sex between healthy controls and patients. Abbreviations: HC = healthy control; AD = Alzheimer’s disease; MCI = mild cognitive impairment; SD = standard deviation.

	ARWiBo n = 319			p	OASIS-1 n = 78		p	MIRIAD n = 48		p
	HC n = 215	MCI n = 67	AD n = 37	p	HC n = 41	AD n = 37	p	HC n = 18	AD n = 30	p
Age, y				0.16			0.07			0.71
Mean ± SD	65.1 ± 4.4	66.4 ± 5.7	65.1 ± 6.0		68.2 ± 3.8	69.7 ± 3.1		66.7 ± 4.1	66.2 ± 4.6
Range	[57, 73]	[47, 73]	[50, 73]		[61, 73]	[62, 73]		[59, 73]	[56, 73]
Sex, n (%)				0.70			0.30			0.76
Men	86 (40)	23 (34)	14 (38)		11 (27)	14 (38)		7 (39)	13 (43)
Women	129 (60)	44 (66)	23 (62)		30 (73)	23 (62)		11 (61)	17 (57)

MRI processing

We used the FreeSurfer software (version 6.0) to estimate the brain regions’ volumes from the T1 weighted images. This estimation was performed using the “recon-all” command (see Fischl, 2012; Fischl et al., 2002, for more information). During this processing, the cortical surface of each hemisphere was parcellated according to the Desikan–Killiany atlas²⁹ and anatomical volumetric measures were obtained via a whole-brain segmentation procedure (Aseg atlas)²⁸. The final data included the cortical volume for each of the 68 cortical subregions (34 per hemisphere) and the volume of 33 neuroanatomical structures, totalling 101 subregions/structures (the complete list is presented in the supplementary materials).

Normative model

In this paper, we developed the normative model using the adversarial autoencoder (AAE; Fig. 1)^{30, 31}. As an autoencoder, this neural network has an encoder and a decoder. The function of the encoder is to take in an input x and map it into a latent encoding space, creating a latent code h. Then, the goal of the decoder is to reconstruct the input data based on the latent code. The AAE is a blend of this autoencoder framework with adversarial training, which is used in generative adversarial networks modelling³². This autoencoder uses the adversarial training to shape the distribution of the latent code to look similar to a predefined prior distribution. The AAE achieves this desired distribution by incorporating a discriminator network into its structure. In this scheme, the discriminator receives two types of inputs: random numbers sampled from the desired prior distribution, and the latent code. During the training process, the discriminator will make predictions regarding whether its input data was sampled from the prior distribution or the latent code. The adversarial training forces the encoder to produce a latent code space that can fool the discriminator into predicting that the encoded samples are just another sample from the prior distribution.

In this study, we trained the AAE to codify and reconstruct the data of HC subjects. The main idea of this normative approach is that, since the AAE only learns how to reconstruct images from HC individuals, it will be less precise at mapping images from patients, which differ due to the pathological mechanisms of the disorder. As a result, the difference between the reconstructed data and the original data will be larger in patients than HC individuals.

Regarding our model architecture, we used an encoder with two hidden layers with 100 neurons, and a latent code with a size of 20 neurons. The decoder and the discriminator had a similar structure (two hidden layers with 100 neurons). All hidden layers had a leaky ReLU non-linearity³³. The latent code and the decoder’s output layer had a linear activation function.

Normative model training

To train the autoencoder, first, we performed the pre-processing of the brain features. This involved estimating the relative brain region volumes for each subject by dividing the original brain region volumes by the total intracranial volume. Then, we normalised the relative brain region volumes across all the participants in the training set. In this step, we performed a normalisation robust to outliers by subtracting the median value of the relative brain region volume and then scaling the data according to its interquartile range. Centering and scaling was done independently for each brain region. The same relevant statistics (median and interquartile range) were later used to normalise the data from the clinical datasets before feeding them to the model.
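The pre-processing steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study's own code: the function names (`relative_volumes`, `fit_robust_scaler`, `apply_robust_scaler`) are hypothetical, and the quartile interpolation rule (`method="inclusive"`) is an assumption, since the text does not state how the interquartile range was computed.

```python
from statistics import median, quantiles


def relative_volumes(volumes, intracranial_volume):
    """Divide each regional volume by the subject's total intracranial volume."""
    return [v / intracranial_volume for v in volumes]


def fit_robust_scaler(train_rows):
    """Return one (median, IQR) pair per brain region, using the training set only."""
    stats = []
    for column in zip(*train_rows):  # iterate region by region
        q1, _, q3 = quantiles(column, n=4, method="inclusive")  # assumed quartile rule
        stats.append((median(column), q3 - q1))
    return stats


def apply_robust_scaler(rows, stats):
    """Centre by the training median and scale by the training IQR, per region."""
    return [[(x - med) / iqr for x, (med, iqr) in zip(row, stats)] for row in rows]


# Toy data: five "training" subjects, two regions (values are illustrative only).
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
stats = fit_robust_scaler(train)
clinical_scaled = apply_robust_scaler([[7.0, 30.0]], stats)
```

The point of fitting on the training set and reusing `stats` is that clinical subjects are expressed on the scale of the healthy reference cohort. A region with zero interquartile range would need special handling, which this sketch omits.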

In our analyses, we used a conditioned AAE³⁰. This type of autoencoder allows us to influence the model’s reconstruction using the demographic variables, i.e. age and sex. To input these variables into the model, we transformed age and sex into one-hot encoding vectors. After this transformation, each subject has an age vector with 27 positions, where each position corresponds to a year within the range of 47–73 years. In this vector, all positions have value zero except the one that indicates the subject’s age which has a value equal to 1. The subject’s sex was represented in a one-hot encoded vector with two positions, one for male and one for female. The AAE’s decoder used these vectors together with the latent code to reconstruct the brain data. This architecture forces the network to disentangle the label information from the latent code³⁰.
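As an illustration of the conditioning vectors described above, the sketch below builds the 27-position age vector and the 2-position sex vector. The helper names, the use of integer ages, and the index order (male first, female second) are assumptions made for this example.

```python
AGE_MIN, AGE_MAX = 47, 73  # age range stated in the text (27 distinct years)
SEX_CATEGORIES = ("male", "female")  # index order is an assumption


def one_hot_age(age):
    """Return a 27-position vector with a single 1 at the subject's age."""
    if not AGE_MIN <= age <= AGE_MAX:
        raise ValueError(f"age {age} is outside the modelled range")
    vector = [0] * (AGE_MAX - AGE_MIN + 1)
    vector[age - AGE_MIN] = 1
    return vector


def one_hot_sex(sex):
    """Return a 2-position vector, one position per sex."""
    if sex not in SEX_CATEGORIES:
        raise ValueError(f"unknown sex label: {sex!r}")
    return [1 if sex == category else 0 for category in SEX_CATEGORIES]


def conditioning_vector(age, sex):
    """Concatenate both encodings, as would be passed to the decoder with the latent code."""
    return one_hot_age(age) + one_hot_sex(sex)


example = conditioning_vector(60, "female")
```

With a 20-unit latent code, the decoder input in this sketch would have 20 + 27 + 2 = 49 values per subject.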

With the features pre-processed and the conditioning data prepared, we trained the autoencoder to minimise the mean squared value of its reconstruction error using the Adam optimizer³⁴ for 200 epochs. A minibatch approach was used in this gradient descent-based optimizer, with a batch size of 256. The model was trained with a cyclical learning rate³⁵, which allows the training to converge in fewer epochs. We used a base learning rate of 0.0001 and a maximum learning rate of 0.005, chosen using the “LR Range Test”³⁶. The learning rate cycle had a basic triangular shape with a decaying amplitude (gamma = 0.98).
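A triangular cyclical learning rate with a decaying amplitude can be sketched as below. The base rate, maximum rate and gamma come from the text; the half-cycle length (`step_size=2000` iterations) is not reported and is purely an assumption, as is the choice to apply the decay once per cycle rather than once per iteration.

```python
def triangular_decay_lr(iteration, base_lr=1e-4, max_lr=5e-3,
                        step_size=2000, gamma=0.98):
    """Learning rate at a given training iteration.

    The rate rises linearly from base_lr to the peak over step_size
    iterations, then falls back over the next step_size iterations.
    The peak amplitude shrinks by a factor gamma after every full cycle
    (assumed decay schedule).
    """
    cycle = iteration // (2 * step_size)  # index of the current cycle
    position = abs(iteration / step_size - 2 * cycle - 1)  # 1 at the ends, 0 at the peak
    amplitude = (max_lr - base_lr) * gamma ** cycle
    return base_lr + amplitude * max(0.0, 1.0 - position)


start_lr = triangular_decay_lr(0)          # bottom of the first cycle
first_peak = triangular_decay_lr(2000)     # top of the first cycle
second_peak = triangular_decay_lr(6000)    # top of the second cycle, slightly lower
```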

In this study, we assessed the robustness of the autoencoder approach by training it with different simulated sets, using bootstrapping as the resampling method. We created 1,000 bootstrapped sets (each one with n = 11,032) by sampling with replacement from the UK Biobank. These bootstrapped sets were used to train the AAE. With this resampling method, we calculated: the value of the mean deviation (“Analysis of the observed deviations” section) for each group from the ADNI, AIBL, ARWiBo, OASIS-1, and MIRIAD datasets, the discriminative performance of the normative approach (“Analysis of the observed deviations” section), and the deviation from normality of each brain region (“Brain regions deviations” section).
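The resampling loop can be outlined as follows. In the study each bootstrapped set is used to train an AAE; in this sketch that step is replaced by an arbitrary callable (`statistic`, a hypothetical placeholder) so that the example stays self-contained.

```python
import random


def bootstrap_sample(items, rng):
    """Draw len(items) elements from items, with replacement."""
    return rng.choices(items, k=len(items))


def bootstrap_statistics(items, statistic, n_iterations=1000, seed=0):
    """Apply `statistic` to each bootstrapped set and collect the results.

    `statistic` stands in for "train the model on this set and compute a
    quantity of interest"; any callable taking a list works here.
    """
    rng = random.Random(seed)
    return [statistic(bootstrap_sample(items, rng)) for _ in range(n_iterations)]


# Toy usage: distribution of the mean over 50 bootstrapped sets.
toy_values = [1.0, 2.0, 3.0, 4.0]
toy_means = bootstrap_statistics(toy_values, lambda s: sum(s) / len(s), n_iterations=50)
```

The spread of the collected values is what later yields the percentile confidence intervals reported for each group.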

Analysis of the observed deviations

Similar to Pinaya et al.⁵, we processed the data of each subject using the AAE, and we calculated the mean squared error between the reconstruction and the inputted data as the metric of brain deviation (Eq. 1).

$$\text{observed deviation} = \frac{1}{\text{number of regions}} \sum_{i = 1}^{\text{number of regions}} (x_{i} - \hat{x}_{i})^{2} \tag{1}$$

where $x_{i}$ is the normalised value of brain region $i$, $\hat{x}_{i}$ is the autoencoder's reconstructed value of brain region $i$, and $\text{number of regions}$ is the number of cortical regions and neuroanatomical structures used (i.e. $\text{number of regions} = 101$).
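Eq. 1 is a mean squared error over the regions of one subject. A direct transcription, with a hypothetical function name and toy values, might look like this:

```python
def observed_deviation(x, x_hat):
    """Mean squared difference between normalised inputs and their reconstructions."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same length")
    return sum((xi - xhi) ** 2 for xi, xhi in zip(x, x_hat)) / len(x)


# Toy subject with three regions instead of 101 (illustrative values only).
toy_deviation = observed_deviation([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

In the toy case only the third region is reconstructed poorly, so the score is (0 + 0 + 4) / 3.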

In each iteration of the bootstrap method, we used the trained autoencoder to obtain the deviation metric of the subjects from the ADNI, AIBL, ARWiBo, OASIS-1, and MIRIAD datasets. Then, we calculated the difference between the mean deviation scores of each pair of groups. We identified a significant difference between groups if the 95% confidence interval of this difference did not include zero. In addition, we used the subjects’ deviations to obtain the discriminative performance of the autoencoder approach, measured by the area under the receiver operating characteristic curve (AUC).
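Both quantities in this paragraph can be sketched with the standard library. The code assumes a percentile interval taken at the 2.5th and 97.5th percentiles, and computes the AUC through its rank interpretation (the probability that a patient has a larger deviation than a control, with ties counted as one half). The function names and the interpolation rule are assumptions.

```python
from statistics import quantiles


def percentile_interval(bootstrap_values):
    """95% percentile interval of a list of bootstrap estimates."""
    cuts = quantiles(bootstrap_values, n=40, method="inclusive")  # steps of 2.5%
    return cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles


def is_significant(bootstrap_differences):
    """A difference is flagged when its interval excludes zero."""
    low, high = percentile_interval(bootstrap_differences)
    return low > 0 or high < 0


def auc_from_deviations(control_scores, patient_scores):
    """AUC as the fraction of (patient, control) pairs ranked correctly."""
    wins = sum((p > c) + 0.5 * (p == c)
               for p in patient_scores for c in control_scores)
    return wins / (len(patient_scores) * len(control_scores))


toy_interval = percentile_interval(list(range(101)))
toy_auc = auc_from_deviations([0.1, 0.3], [0.2, 0.4])
```

In the toy AUC, three of the four patient and control pairs are ordered correctly, giving 0.75.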

Brain regions deviations

The autoencoder approach can quantify how much each brain region deviated from normality and contributed to the observed deviation. These values were obtained by measuring the difference between the inputted value and its reconstruction. In our study, we quantified the deviation for each subject from the ADNI, AIBL, ARWiBo, OASIS-1, and MIRIAD datasets. Then, in each iteration of the bootstrap method, we calculated the effect size of each brain region deviation—using Cliff’s delta³⁷ value—between the HC group and each patient group. Here we used Cliff’s delta—a non-parametric effect size measure—because the observed deviation presents a gamma distribution.

Comparison against traditional machine learning classification

A further aim of the present study was to compare the performance of our normative model against a traditional classification approach. To measure the performance of the classifiers, we calculated the AUC using the 0.632 + bootstrap method³⁸ with 1,000 iterations. Each clinical dataset (ADNI, AIBL, ARWiBo, OASIS-1, and MIRIAD) was analysed independently using the HC and patient groups to train the classifiers. In addition, the analysis was performed as multiple binary classifications between HC and each clinical group (e.g. HC versus LMCI).

In each iteration, first, we created the bootstrapped set by sampling the original data (from ADNI, AIBL, ARWiBo, OASIS-1, and MIRIAD datasets) with replacement. This bootstrapped set had the same size as the original dataset (for example, when analysing the ADNI dataset to classify healthy controls and patients with Alzheimer’s disease, the bootstrapped set had 212 + 64 = 276 subjects), and it contained repeated subjects (due to replacement). For each iteration, the subjects not included in the bootstrapped set were used as the out-of-bag set (i.e. test set).
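The split into a bootstrapped set and an out-of-bag set can be sketched on subject indices. The function name is hypothetical; the set size of 276 matches the ADNI HC versus AD example in the text.

```python
import random


def bootstrap_split(n_subjects, rng):
    """Return (in_bag, out_of_bag) index lists for one bootstrap iteration.

    in_bag has n_subjects entries drawn with replacement, so it contains
    repeats; out_of_bag holds every subject that was never drawn.
    """
    in_bag = [rng.randrange(n_subjects) for _ in range(n_subjects)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_subjects) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_split(276, rng)  # 212 HC + 64 AD in the ADNI example
```

On average a little over a third of the subjects end up out of bag, because the chance that a given subject is never drawn in n draws is (1 - 1/n)^n, which approaches 1/e ≈ 0.368.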

Next, we obtained the relative brain region volumes of each subject by dividing the original volume by the total intracranial volume. Then, we normalised the values of the relative brain volumes across the subjects. In this normalisation step, we removed the median value of the brain regions and scaled the data according to the interquartile range. Centering and scaling was done independently for each brain region. The same relevant statistics (median and interquartile range) were later used to normalise the out-of-bag set.

To perform the classification analysis, we used a relevance vector machine (RVM)³⁹ with a linear kernel. The RVM is a Bayesian treatment of identical functional form to the Support Vector Machines (SVM)⁴⁰. One advantage of the RVM form over the SVM is that it is not necessary to estimate the error/margin trade-off parameter ‘C’. After we trained the RVM on the bootstrapped set, we used the model to obtain the predicted probability of a subject belonging to the patient class. Using these probabilities, we calculated two AUC values, one for the bootstrapped set (called “resubstitution” metric) and one for the test set (called “out-of-bag” metric). By using the 0.632 + bootstrap method, we minimised the optimistic and pessimistic bias of the estimate and obtained the AUC value (Eq. 2).

$$AUC_{\text{bootstrap}} = \frac{1}{b} \sum_{i = 1}^{b} \left( \omega \cdot AUC_{\text{out-of-bag}, i} + (1 - \omega) \cdot AUC_{\text{resubstitution}, i} \right) \tag{2}$$

where b was the number of iterations and the weight ω was defined considering the relative overfitting rate (full description in Efron and Tibshirani, 1997). To obtain the confidence interval (CI; 95% of confidence), we used the percentile method⁴¹. Next, we compared these confidence intervals with the AUC obtained during the normative approach.
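One possible reading of Eq. 2 is sketched below. It adapts the Efron and Tibshirani weight, 0.632 / (1 - 0.368 R), to AUC values by taking 0.5 as the no-information level and defining the relative overfitting rate R as the share of the above-chance resubstitution AUC that is lost out of bag. Computing the weight separately in each iteration, clipping R to [0, 1], and the function names are all assumptions; the study's implementation may differ in these details.

```python
def relative_overfitting_rate(auc_resub, auc_oob, auc_chance=0.5):
    """R in [0, 1]: 0 means no overfitting, 1 means out-of-bag AUC is at chance or worse."""
    if auc_resub <= auc_chance or auc_oob >= auc_resub:
        return 0.0
    return min((auc_resub - auc_oob) / (auc_resub - auc_chance), 1.0)


def weight_632_plus(rate):
    """Weight given to the out-of-bag estimate; ranges from 0.632 to 1."""
    return 0.632 / (1.0 - 0.368 * rate)


def auc_632_plus(auc_pairs):
    """Average the weighted AUC over iterations; auc_pairs holds (resubstitution, out_of_bag)."""
    total = 0.0
    for auc_resub, auc_oob in auc_pairs:
        omega = weight_632_plus(relative_overfitting_rate(auc_resub, auc_oob))
        total += omega * auc_oob + (1.0 - omega) * auc_resub
    return total / len(auc_pairs)


no_overfit = auc_632_plus([(0.9, 0.9)])      # weight stays at 0.632
full_overfit = auc_632_plus([(1.0, 0.5)])    # weight moves to 1, estimate equals out-of-bag AUC
```

Under this reading, a classifier that fits the bootstrapped set perfectly but performs at chance out of bag is scored entirely on its out-of-bag AUC.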

Finally, we compared the generalization of the classifiers with the results of the autoencoders. In this analysis, we used each trained classifier to predict the group of the subjects from the other clinical datasets. In order to verify if the performance in the independent datasets was significantly different from the normative approach, we calculated the difference between the AUCs of this generalization analysis and the AUCs of the autoencoders. With the 1,000 measures of the difference, we calculated its confidence interval (95% confidence) to verify if this difference is different from zero.

Experiments

We conducted our experiments in Python 3 using the TensorFlow 2.0 library (https://www.tensorflow.org/) and the sklearn_rvm library (https://github.com/Mind-the-Pineapple/sklearn-rvm) developed by Baecker et al.⁴². We have made the code and trained models used in this study publicly available at https://github.com/Warvito/Normative-modelling-using-deep-autoencoders. A Google Colaboratory notebook that calculates the deviation scores of new data is available at https://colab.research.google.com/github/Warvito/Normative-modelling-using-deep-autoencoders/blob/master/notebooks/predict.ipynb.

Results

Comparison of deviation values for healthy controls and patients

Figure 2 shows the mean value of the observed deviation for each group. For the ADNI dataset, we found a mean value of 0.28 ([0.27, 0.32]; 95% CI) for HC; 0.29 ([0.28, 0.35]; 95% CI) for EMCI; 0.32 ([0.30, 0.38]; 95% CI) for LMCI; 0.37 ([0.34, 0.47]; 95% CI) for AD. For the AIBL dataset, we found a mean value of 0.30 ([0.28, 0.33]; 95% CI) for HC; 0.36 ([0.34, 0.42]; 95% CI) for MCI; and 0.40 ([0.36, 0.50]; 95% CI) for AD. For the ARWiBo dataset, we found a mean value of 0.32 ([0.30, 0.38]; 95% CI) for HC; 0.37 ([0.34, 0.47]; 95% CI) for MCI; and 0.46 ([0.40, 0.62]; 95% CI) for AD. For the OASIS-1 dataset, we found a mean value of 0.41 ([0.39, 0.46]; 95% CI) for HC and 0.65 ([0.58, 0.79]; 95% CI) for AD. For the MIRIAD dataset, we found a mean value of 0.26 ([0.24, 0.30]; 95% CI) for HC and 0.48 ([0.41, 0.71]; 95% CI) for AD.

Figure 2.

Mean value of the observed deviation calculated by the autoencoder for each group. The square marker indicates the mean value and the horizontal bars indicate the 95% confidence interval calculated using the percentile method on the bootstrap analysis. Abbreviations: AD = Alzheimer’s disease; EMCI = early mild cognitive impairment; LMCI = late mild cognitive impairment; MCI = mild cognitive impairment; HC = healthy controls; ADNI = Alzheimer’s Disease Neuroimaging Initiative; AIBL = Australian Imaging Biomarkers and Lifestyle Study of Ageing; ARWiBo = Alzheimer's Disease Repository Without Borders; OASIS-1 = Open Access Series of Imaging Studies: Cross-Sectional; MIRIAD = Minimal Interval Resonance Imaging in Alzheimer's Disease.

When we examined the confidence intervals of the observed deviations, we found that the five independent datasets presented mean deviation scores significantly different between groups, with the exception of the comparison between HC and EMCI in the ADNI dataset (difference range [-0.03, 0.00]) and the comparison between MCI and AD in the AIBL dataset (difference range [-0.09, 0.00]) (more details can be found in the supplementary materials).

Normative model performance in discriminative tasks

We examined if the observed deviations could be used to predict if a person belonged to the patient or HC group (Fig. 3) using ROC curves. This revealed that the generated deviation values reflected the severity of the disease. Specifically, based on the AUC, it was possible to discriminate patients with AD vs HC better than patients with MCI vs HC, and to discriminate patients with LMCI vs HC better than patients with EMCI vs HC.

Figure 3.

Discriminative performance of the normative approach. The solid line indicates the mean receiver operating characteristic curve across the bootstrap iterations with the shaded area indicating the 95% confidence interval calculated using the percentile method on the bootstrap analysis. The dashed line indicates the chance level. Abbreviations: AD = Alzheimer’s disease; AUC-ROC = area under the receiver operating characteristic curve; EMCI = early mild cognitive impairment; LMCI = late mild cognitive impairment; MCI = mild cognitive impairment; HC = healthy controls; ADNI = Alzheimer’s Disease Neuroimaging Initiative; AIBL = Australian Imaging Biomarkers and Lifestyle Study of Ageing; ARWiBo = Alzheimer's Disease Repository Without Borders; OASIS-1 = Open Access Series of Imaging Studies: Cross-Sectional; MIRIAD = Minimal Interval Resonance Imaging in Alzheimer's Disease.

Brain regions deviations

Figure 4 presents the Cliff’s delta of each brain region when comparing its deviation in the HC group against the deviation in the patient groups. Only the regions with effect sizes significantly different from zero are shown (complete list presented in the supplementary materials). Among the regions showing significant deviation in patients with AD, we found the lateral ventricles, temporal horns, hippocampus, entorhinal cortex, parahippocampal cortex, and amygdala. A number of these regions also showed a high deviation in patients with MCI, including the lateral ventricles and hippocampus. Finally, we also noted that effect sizes were smaller for the regions identified in patients with MCI relative to those identified in patients with AD.

Traditional machine learning classification

Using the RVM, we verified the performance of a traditional classifier when performing binary classification between HC and patients. For the ADNI dataset, we obtained an AUC = 0.69 ([0.58, 0.77]; 95% CI) when analysing patients with EMCI, an AUC = 0.76 ([0.64, 0.84]; 95% CI) when analysing patients with LMCI, and an AUC = 0.93 ([0.87, 0.97]; 95% CI) when analysing patients with AD. For the AIBL dataset, we obtained an AUC = 0.37 ([0.00, 0.78]; 95% CI) when analysing subjects with MCI, and an AUC = 0.93 ([0.86, 0.93]; 95% CI) when analysing patients with AD. Note that the AUC for the AIBL dataset when analysing MCI had a wide confidence interval. This interval was widened by the presence of overfitting and by the compensatory effect of the 0.632+ bootstrap method, which reduces the bias caused by this overfitting. For the ARWiBo dataset, we obtained an AUC = 0.68 ([0.52, 0.78]; 95% CI) when analysing subjects with MCI, and an AUC = 0.94 ([0.87, 0.98]; 95% CI) when analysing patients with AD. For the OASIS-1 dataset, we obtained an AUC = 0.86 ([0.69, 0.96]; 95% CI) when analysing patients with AD. For the MIRIAD dataset, we obtained an AUC = 0.86 ([0.70, 0.96]; 95% CI) when analysing patients with AD.

To identify significant differences between the performance of the normative models and traditional classifiers, we calculated the 95% confidence interval of the difference in AUC between the two methods. The traditional classifiers were superior to the normative models when predicting the difference between the groups in the ADNI dataset and the difference between HC and AD in the AIBL dataset; in contrast, the performance of the two approaches was comparable for all other comparisons (more details can be found in the supplementary materials).
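The comparison described above can be sketched as a paired bootstrap, in which both methods are scored on the same resampled subjects and the difference is declared significant when its 95% interval excludes zero. This is an illustrative simplification that assumes both methods output one score per subject; it does not reproduce the 0.632+ scheme used for the classifiers, and all names are hypothetical.

```python
import random


def auc(patient_scores, control_scores):
    """AUC as the probability that a patient score exceeds a control score."""
    wins = sum(
        (p > c) + 0.5 * (p == c)
        for p in patient_scores
        for c in control_scores
    )
    return wins / (len(patient_scores) * len(control_scores))


def auc_difference_ci(scores_a, scores_b, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI of AUC(method A) - AUC(method B) on paired resamples.

    labels: 1 for patients, 0 for healthy controls.
    Returns (lower, upper, significant).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    diffs = []
    for _ in range(n_boot):
        # The same resampled subjects are used for both methods.
        p = [rng.choice(pos) for _ in pos]
        n = [rng.choice(neg) for _ in neg]
        auc_a = auc([scores_a[i] for i in p], [scores_a[i] for i in n])
        auc_b = auc([scores_b[i] for i in p], [scores_b[i] for i in n])
        diffs.append(auc_a - auc_b)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper, not (lower <= 0.0 <= upper)
```

Resampling subjects rather than scores keeps the pairing between the two methods, which usually gives a tighter interval for the difference than two independent bootstraps.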

Finally, we examined how a classifier trained on a certain dataset would perform when applied to other datasets (i.e. cross-cohort generalizability). The results of this examination are presented in Tables 3 and 4. When predicting AD, the classifiers had a higher mean performance than the normative approach in most cases (except when the model was trained on the MIRIAD dataset and evaluated on the ARWiBo dataset). However, the difference was not statistically significant in almost half of the cases. When predicting MCI, the classifiers presented a lower mean performance in all cases, but the difference was not statistically significant.

Table 3.

Generalization performance of the classifiers for the classification between HC and patients with Alzheimer’s disease. In this table, the rows indicate the dataset on which the classifier was trained and the columns indicate the dataset on which the performance was tested. The area under the receiver operating characteristic curve is shown with the lower and upper bounds of its 95% confidence interval. Performance significantly different from the normative approach, calculated using the confidence interval of the difference between the approaches across the bootstrap scheme, is indicated by “*”.

	ADNI	AIBL	ARWiBo	OASIS-1	MIRIAD
ADNI	–	0.89 [0.83, 0.93] *	0.88 [0.81, 0.93]	0.84 [0.76, 0.90] *	0.98 [0.95, 1.00]
AIBL	0.88 [0.82, 0.93] *	–	0.89 [0.83, 0.93] *	0.86 [0.80, 0.91] *	0.98 [0.94, 1.00]
ARWiBo	0.88 [0.81, 0.92] *	0.88 [0.83, 0.92] *	–	0.80 [0.72, 0.86]	0.96 [0.91, 0.99]
OASIS-1	0.90 [0.84, 0.93] *	0.89 [0.83, 0.93] *	0.86 [0.77, 0.92]	–	0.96 [0.91, 1.00]
MIRIAD	0.89 [0.83, 0.93] *	0.87 [0.80, 0.91] *	0.82 [0.73, 0.91]	0.83 [0.74, 0.89]	–


Table 4.

Generalization performance of the classifiers for the classification between HC and patients with mild cognitive impairment. In this table, the rows indicate the dataset on which the classifier was trained and the columns indicate the dataset on which the performance was measured. The area under the receiver operating characteristic curve is shown with the lower and upper bounds of its 95% confidence interval. No case had a performance significantly different from the normative approach, calculated using the confidence interval of the difference between the approaches across the bootstrap scheme.

	AIBL	ARWiBo
AIBL	–	0.61 [0.54, 0.67]
ARWiBo	0.59 [0.53, 0.65]	–


Discussion

In this study, we evaluated the performance of the normative modelling approach based on deep autoencoders on data from patients with MCI and AD. Consistent with our first hypothesis, we found that the approach was effective in generating deviation values that reflect the severity of the disease, with patients with AD showing higher deviations than patients with MCI, and patients with LMCI showing larger deviations than patients with EMCI. We also measured how much each brain region deviated from normality and contributed to the observed deviation. Here, we found that regions from the ventricular system and medial temporal lobe were among those making the greatest significant contribution to deviation, consistent with our second hypothesis. Finally, we compared the performance of the normative approach versus a traditional classification approach. Although a higher performance was found for traditional classifiers in most cases, the difference was not statistically significant in the majority of cases.

We have replicated previous findings that the autoencoder is capable of detecting neuroanatomical deviation in individuals with brain disorders⁵. In particular, in each of our five independent datasets, the normative model assigned higher deviation values to patients with AD than to healthy controls. This pattern was expected since the disorder is associated with profound alterations in brain morphometry which were not present in the training set^{13, 14}. In addition, we have expanded these findings by demonstrating for the first time that autoencoders are capable of discriminating between different stages of the disease progression (i.e. EMCI versus LMCI versus AD). In particular, we observed that the MCI group presented intermediate deviation values in three independent datasets (ADNI, AIBL and ARWiBo). These values were also expected since MCI is considered a transitional stage between HC and AD⁴³, and usually presents less brain atrophy than AD⁴⁴. In addition, within the ADNI dataset, the MCI subjects were divided into two categories, EMCI and LMCI. Although individuals in both stages meet the conventional criteria for MCI, EMCI is associated with less pronounced symptoms thought to reflect an earlier point in the clinical spectrum than LMCI. In our analyses, we found that the patients with LMCI had a significantly larger deviation than patients with EMCI (i.e. the confidence interval of the difference between the groups did not include zero), providing further confirmation that deep autoencoders are capable of discriminating between different stages of the disease course.

With the autoencoder-based approach, it was possible to identify the brain regions with the highest deviations from the expected normative values. Consistent with our second hypothesis, the AD group showed high levels of deviation in structures that are part of the ventricular system (such as the lateral ventricles, temporal horns, and third ventricle) and in the medial temporal cortex, including the hippocampus, entorhinal cortex, parahippocampal cortex, and amygdala. Progressive ventricular expansion is one of the most reliable morphological changes in dementia patients, reflecting the increasing atrophy of the brain⁴⁵. Likewise, medial temporal cortex atrophy is among the most consistent findings in neuroimaging studies of AD^{13, 46} and an established marker of AD⁴⁷. While deviations in the MCI group had smaller effect sizes than those in the AD group, there was a high degree of overlap in the hippocampus, parahippocampal cortex and several temporoparietal regions, consistent with previous neuroimaging studies of MCI^{48–51}. The smaller effect size in MCI might be explained by two (not mutually exclusive) factors: (i) an earlier stage in the AD course, hence milder atrophy, and (ii) heterogeneity of the MCI construct. Since MCI patients were not selected based on AD biomarkers (i.e., presence of beta-amyloid and tau protein in the cerebrospinal fluid)⁵², this group will likely include a mixture of AD and non-AD cases, hence the milder/diluted effect.
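To make the notion of a subject-level and a region-level deviation concrete, the sketch below assumes that the deviation is a mean squared reconstruction error between a subject's regional measurements and the autoencoder's output. This is a common choice for autoencoder-based anomaly scores and is an assumption here; the exact metric should be checked against the Methods. The function and region names are hypothetical.

```python
def deviation_score(x, x_hat):
    """Subject-level deviation: mean squared error between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def region_deviations(x, x_hat, region_names):
    """Region-level deviation: squared reconstruction error of each region."""
    return {
        name: (a - b) ** 2
        for name, a, b in zip(region_names, x, x_hat)
    }


# Hypothetical normalised volumes and their reconstruction, for illustration only.
regions = ["hippocampus", "lateral_ventricle", "precentral"]
observed = [1.0, 2.0, 3.0]
reconstructed = [1.0, 2.0, 5.0]
total = deviation_score(observed, reconstructed)
per_region = region_deviations(observed, reconstructed, regions)
```

Under this assumption the subject-level score is the average of the regional terms, so a region with a large squared error is also a large contributor to the overall deviation.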

Finally, we compared the performance of our normative approach with traditional classifiers. The performance of the classifiers was measured under two schemes: on data from the same dataset on which the model was trained, and on data from independent clinical datasets (generalization performance). Although the traditional classifiers had a better mean performance in most cases, the differences between the two approaches were not statistically significant in most of the cases, especially when predicting the subjects from the ARWiBo, OASIS-1 and MIRIAD datasets. This similarity was more evident during the prediction of the patients with MCI (with the exception of the ADNI dataset).

Although we evaluated our method using a range of different datasets, we did not assess the impact of MRI scanners and acquisition parameters. Recent studies have shown that these variables can have a measurable impact on the performance of machine learning models, highlighting the importance of inter-scanner harmonisation^{53, 54}. In particular, MRI scanners and acquisition parameters have been shown to influence the results not only in traditional machine learning classification but also in normative modelling⁵⁵. For this reason, further studies are needed to analyse the influence of inter-scanner harmonisation, which can be implemented using tools such as Neuroharmony⁵⁴ or ComBat⁵⁶, on the performance of autoencoder-based methods.

Unlike in a case–control context, the normative approach does not need to be trained on a dataset with a reasonable balance between HC and patient groups. It is trained using only healthy controls, which enables the use of large cohorts of HC participants^{1, 9}, such as UK Biobank and the Human Connectome Project⁵⁷. Our approach is not linked to any labels during training; this enables its application to an array of clinical tasks (including diagnosis, prognosis, treatment selection and mechanistic inference) for any brain disorder without the need for re-training or fine-tuning. Finally, since our approach involves anomaly detection, it can also work cooperatively with conventional discriminative models to identify and mitigate circumstances where supervised methods could fail catastrophically due to a test example very distinct from the training set (“out-of-distribution” examples). In order to promote open science, we have made all scripts and the trained models available to the wider research community (https://github.com/Warvito/Normative-modelling-using-deep-autoencoders).
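The cooperative use with discriminative models mentioned above could look like the following sketch, in which a classifier's prediction is withheld when a subject's deviation exceeds a threshold derived from reference deviations. The quantile rule, the choice of reference set and the function names are assumptions made for illustration; this procedure was not evaluated in the present study.

```python
def fit_threshold(reference_deviations, quantile=0.95):
    """Deviation value at the given quantile of a reference sample."""
    ordered = sorted(reference_deviations)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def gated_prediction(deviation, classifier_output, threshold):
    """Return the classifier output, or None if the example looks out-of-distribution."""
    if deviation > threshold:
        return None  # Defer to a human reviewer instead of trusting the classifier.
    return classifier_output


# Hypothetical deviations of the classifier's own training subjects.
reference = [0.1, 0.2, 0.3, 0.4]
threshold = fit_threshold(reference, quantile=0.5)
```

In such a setup the reference deviations would most naturally come from the subjects used to train the supervised model, so that the gate reflects what that model has actually seen.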

Supplementary Information

Supplementary Information (1.8 MB, PDF).

Acknowledgements

This study was supported by a Wellcome Trust Innovator Award to Andrea Mechelli (208519/Z/17/Z). WHLP is supported by Wellcome Innovations [WT213038/Z/18/Z]. This work was carried out within the scope of the project "use-inspired basic research", for which the Department of General Psychology of the University of Padova has been recognized as "Dipartimento di eccellenza" by the Ministry of University and Research. JRS was supported by grant 2018/04654-9, São Paulo Research Foundation (FAPESP); and grant 2018/21934-5, São Paulo Research Foundation (FAPESP). VC was supported by NIH RF1AG063153. This research has been conducted using the UK Biobank Resource (Project number: 40323). Data were provided in part by OASIS: Cross-Sectional: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: https://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf. Data used in the preparation of this article were obtained from the Australian Imaging Biomarkers and Lifestyle flagship study of ageing (AIBL) funded by the Commonwealth Scientific and Industrial Research Organisation (CSIRO) which was made available at the ADNI database (www.loni.usc.edu/ADNI). The AIBL researchers contributed data but did not participate in analysis or writing of this report. AIBL researchers are listed at www.aibl.csiro.au. ARWiBo data (www.arwibo.it) were obtained from the NeuGRID4You initiative funded by the European Commission (FP7/2007-2013) under grant agreement no. 283562. The overall goal of ARWiBo is to contribute, through synergy with neuGRID (https://neugrid2.eu), to global data sharing and analysis in order to develop effective therapies, prevention methods and a cure for Alzheimer's and other neurodegenerative diseases. Data used in the preparation of this article were obtained from the MIRIAD database. The MIRIAD investigators did not participate in analysis or writing of this report. The MIRIAD dataset is made available through the support of the UK Alzheimer's Society (Grant RF116). The original data collection was funded through an unrestricted educational grant from GlaxoSmithKline (Grant 6GKC).

Author contributions

W.H.L.P. obtained and organized the MRI data, pre-processed the MRI images, implemented the normative models, performed the group analysis, and drafted and edited the manuscript; C.S. obtained and organized the MRI data and revised the manuscript; R.G.D. implemented scripts to help pre-process the MRI data and revised the manuscript; S.V. revised the manuscript; L.B. revised the manuscript; P.F.C. implemented the traditional classifiers and revised the manuscript; A.R. was responsible for part of the data collection and revised the manuscript; M.P. was responsible for part of the data collection and revised the manuscript; G.B.F. was responsible for part of the data collection; V.D.C. revised the manuscript and was involved in funding efforts; J.R.S. revised the manuscript and was involved in funding efforts; A.M. revised the manuscript, gathered funding, and supervised the project.

Data availability

UK Biobank data are available through a procedure described at http://www.ukbiobank.ac.uk/using-the-resource/. ADNI data are available through an access procedure described at http://adni.loni.usc.edu/data-samples/access-data/. AIBL data are available through an access procedure described at https://aibl.csiro.au/adni/imaging.html. ARWiBo data are available through an access procedure described at https://www.gaaindata.org/partner/ARWIBO. OASIS-1 data are available through an access procedure described at https://www.oasis-brains.org/#data. MIRIAD data are available through an access procedure described at https://www.ucl.ac.uk/drc/research/methods/minimal-interval-resonance-imaging-alzheimers-disease-miriad.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-021-95098-0.

References

1. Marquand AF, Rezek I, Buitelaar J, Beckmann CF. Understanding heterogeneity in clinical cohorts using normative models: Beyond case–control studies. Biol. Psychiatry. 2016;80:552–561. doi: 10.1016/j.biopsych.2015.12.023.
2. Wolfers, T. et al. Individual differences v. the average patient: mapping the heterogeneity in ADHD using normative models. Psychol. Med. 1–10 (2019).
3. Kia, S. M. & Marquand, A. F. Neural processes mixed-effect models for deep normative modeling of clinical neuroimaging data. arXiv Prepr. arXiv:1812.04998 (2018).
4. Zabihi M, et al. Dissecting the heterogeneous cortical anatomy of autism spectrum disorder using normative models. Biol. Psychiatry Cogn. Neurosci. Neuroimaging. 2019;4:567–578. doi: 10.1016/j.bpsc.2018.11.013.
5. Pinaya WHL, Mechelli A, Sato JR. Using deep autoencoders to identify abnormal brain structural patterns in neuropsychiatric disorders: A large-scale multi-sample study. Hum. Brain Mapp. 2019;40:944–954. doi: 10.1002/hbm.24423.
6. Wolfers T, et al. Mapping the heterogeneous phenotype of schizophrenia and bipolar disorder using normative models. JAMA Psychiat. 2018. doi: 10.1001/JAMAPSYCHIATRY.2018.2467.
7. Ziegler G, Ridgway GR, Dahnke R, Gaser C, Initiative ADN. Individualized Gaussian process-based prediction and detection of local and global gray matter abnormalities in elderly subjects. Neuroimage. 2014;97:333–348. doi: 10.1016/j.neuroimage.2014.04.018.
8. Huizinga W, et al. A spatio-temporal reference model of the aging brain. Neuroimage. 2018;169:11–22. doi: 10.1016/j.neuroimage.2017.10.040.
9. Marquand, A. F. et al. Conceptualizing mental disorders as deviations from normative functioning. Mol. Psychiatry 1 (2019).
10. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436. doi: 10.1038/nature14539.
11. Vieira, S., Pinaya, W. H. L., Garcia-Dias, R. & Mechelli, A. Deep neural networks. in Machine Learning 157–172 (Elsevier, 2020).
12. Brewer JB. Fully-automated volumetric MRI with normative ranges: translation to clinical practice. Behav. Neurol. 2009;21:21–28. doi: 10.1155/2009/616581.
13. Busatto GF, Diniz BS, Zanetti MV. Voxel-based morphometry in Alzheimer’s disease. Expert Rev. Neurother. 2008;8:1691–1702. doi: 10.1586/14737175.8.11.1691.
14. Pini L, et al. Brain atrophy in Alzheimer’s disease and aging. Ageing Res. Rev. 2016;30:25–48. doi: 10.1016/j.arr.2016.01.002.
15. Sudlow C, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779. doi: 10.1371/journal.pmed.1001779.
16. Mueller SG, et al. Ways toward an early diagnosis in Alzheimer’s disease: The Alzheimer’s Disease Neuroimaging Initiative (ADNI). Alzheimer’s Dement. 2005;1:55–66. doi: 10.1016/j.jalz.2005.06.003.
17. Ellis KA, et al. The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: Methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer’s disease. Int. Psychogeriatrics. 2009;21:672–687. doi: 10.1017/S1041610209009405.
18. Frisoni GB, et al. Markers of Alzheimer’s disease in a population attending a memory clinic. Alzheimer’s Dement. 2009;5:307–317. doi: 10.1016/j.jalz.2009.04.1235.
19. Galluzzi S, et al. The new Alzheimer’s criteria in a naturalistic series of patients with mild cognitive impairment. J. Neurol. 2010;257:2004–2014. doi: 10.1007/s00415-010-5650-0.
20. Marcus DS, et al. Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J. Cogn. Neurosci. 2007;19:1498–1507. doi: 10.1162/jocn.2007.19.9.1498.
21. Malone IB, et al. MIRIAD—Public release of a multiple time point Alzheimer’s MR imaging dataset. Neuroimage. 2013;70:33–36. doi: 10.1016/j.neuroimage.2012.12.044.
22. Elliott P, Peakman TC. The UK Biobank sample handling and storage protocol for the collection, processing and archiving of human blood and urine. Int. J. Epidemiol. 2008;37:234–244. doi: 10.1093/ije/dym276.
23. Miller KL, et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat. Neurosci. 2016;19:1523. doi: 10.1038/nn.4393.
24. Alfaro-Almagro F, et al. Image processing and quality control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage. 2018;166:400–424. doi: 10.1016/j.neuroimage.2017.10.034.
25. Aisen PS, et al. Clinical Core of the Alzheimer’s Disease Neuroimaging Initiative: Progress and plans. Alzheimer’s Dement. 2010;6:239–246. doi: 10.1016/j.jalz.2010.03.006.
26. Edmonds EC, et al. Early versus late MCI: Improved MCI staging using a neuropsychological approach. Alzheimer’s Dement. 2019;15:699–708. doi: 10.1016/j.jalz.2018.12.009.
27. Fischl B. FreeSurfer. Neuroimage. 2012;62:774–781. doi: 10.1016/j.neuroimage.2012.01.021.
28. Fischl B, et al. Whole brain segmentation. Neuron. 2002;33:341–355. doi: 10.1016/S0896-6273(02)00569-X.
29. Desikan RS, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage. 2006;31:968–980. doi: 10.1016/j.neuroimage.2006.01.021.
30. Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I. & Frey, B. Adversarial autoencoders. arXiv Prepr. arXiv:1511.05644 (2015).
31. Pinaya, W. H. L., Vieira, S., Garcia-Dias, R. & Mechelli, A. Autoencoders. in Machine Learning 193–208 (Elsevier, 2020).
32. Goodfellow, I. et al. Generative adversarial nets. in Advances in neural information processing systems 2672–2680 (2014).
33. Maas, A. L., Hannun, A. Y. & Ng, A. Y. Rectifier nonlinearities improve neural network acoustic models. in Proc. icml vol. 30 3 (2013).
34. Kingma, D. & Ba, J. Adam: A method for stochastic optimization. arXiv Prepr. arXiv:1412.6980 1–15 (2014).
35. Smith, L. N. Cyclical learning rates for training neural networks. in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV) 464–472 (IEEE, 2017).
36. Smith, L. N. A disciplined approach to neural network hyper-parameters: Part 1—Learning rate, batch size, momentum, and weight decay. arXiv Prepr. arXiv:1803.09820 (2018).
37. Cliff N. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychol. Bull. 1993;114:494. doi: 10.1037/0033-2909.114.3.494.
38. Efron B, Tibshirani R. Improvements on cross-validation: The 632+ bootstrap method. J. Am. Stat. Assoc. 1997;92:548–560.
39. Tipping, M. E. The relevance vector machine. in Advances in neural information processing systems 652–658 (2000).
40. Cortes C, Vapnik V. Support-vector networks. Mach. Learn. 1995;20:273–297.
41. Efron B. Nonparametric standard errors and confidence intervals. Can. J. Stat. 1981;9:139–158. doi: 10.2307/3314608.
42. Baecker, L. et al. Brain age prediction: A comparison between machine learning models using region- and voxel-based morphometric data. Hum. Brain Mapp. (2021).
43. Morris JC, et al. Mild cognitive impairment represents early-stage Alzheimer disease. Arch. Neurol. 2001;58:397–405. doi: 10.1001/archneur.58.3.397.
44. Pihlajamaki M, Jauhiainen AM, Soininen H. Structural and functional MRI in mild cognitive impairment. Curr. Alzheimer Res. 2009;6:179–185. doi: 10.2174/156720509787602898.
45. Thompson PM, et al. Mapping hippocampal and ventricular change in Alzheimer disease. Neuroimage. 2004;22:1754–1766. doi: 10.1016/j.neuroimage.2004.03.040.
46. Fox NC, Schott JM. Imaging cerebral atrophy: normal ageing to Alzheimer’s disease. Lancet. 2004;363:392–394. doi: 10.1016/S0140-6736(04)15441-X.
47. Drago V, et al. Disease tracking markers for Alzheimer’s disease at the prodromal (MCI) stage. J. Alzheimer’s Dis. 2011;26:159–199. doi: 10.3233/JAD-2011-0043.
48. Chételat G, et al. Mapping gray matter loss with voxel-based morphometry in mild cognitive impairment. NeuroReport. 2002;13:1939–1943. doi: 10.1097/00001756-200210280-00022.
49. Hämäläinen A, et al. Voxel-based morphometry to detect brain atrophy in progressive mild cognitive impairment. Neuroimage. 2007;37:1122–1131. doi: 10.1016/j.neuroimage.2007.06.016.
50. Pennanen C, et al. A voxel based morphometry study on mild cognitive impairment. J. Neurol. Neurosurg. Psychiatry. 2005;76:11–14. doi: 10.1136/jnnp.2004.035600.
51. Kang DW, Lim HK, Joo S, Lee NR, Lee CU. Differential associations between volumes of atrophic cortical brain regions and memory performances in early and late mild cognitive impairment. Front. Aging Neurosci. 2019;11:245. doi: 10.3389/fnagi.2019.00245.
52. Mulder C, et al. Amyloid-β (1–42), total tau, and phosphorylated tau as cerebrospinal fluid biomarkers for the diagnosis of Alzheimer disease. Clin. Chem. 2010;56:248–253. doi: 10.1373/clinchem.2009.130518.
53. Fortin JP, et al. Harmonization of cortical thickness measurements across scanners and sites. Neuroimage. 2018;167:104–120. doi: 10.1016/j.neuroimage.2017.11.024.
54. Garcia-Dias, R. et al. Neuroharmony: A new tool for harmonizing volumetric MRI data from unseen scanners. Neuroimage 220 (2020).
55. Kia, S. M. et al. Federated Multi-Site Normative Modeling using Hierarchical Bayesian Regression. bioRxiv (2021).
56. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–127. doi: 10.1093/biostatistics/kxj037.
57. Van Essen DC, et al. The WU-Minn human connectome project: An overview. Neuroimage. 2013;80:62–79. doi: 10.1016/j.neuroimage.2013.05.041.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information.^{(1.8MB, pdf)}

Data Availability Statement

[CR1] 1.Marquand AF, Rezek I, Buitelaar J, Beckmann CF. Understanding heterogeneity in clinical cohorts using normative models: Beyond case–control studies. Biol. Psychiatry. 2016;80:552–561. doi: 10.1016/j.biopsych.2015.12.023. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR2] 2.Wolfers, T. et al. Individual differences v. the average patient: mapping the heterogeneity in ADHD using normative models. Psychol. Med. 1–10 (2019). [DOI] [PMC free article] [PubMed]

[CR3] 3.Kia, S. M. & Marquand, A. F. Neural processes mixed-effect models for deep normative modeling of clinical neuroimaging data. arXiv Prepr. arXiv1812.04998 (2018).

[CR4] 4.Zabihi M, et al. Dissecting the heterogeneous cortical anatomy of autism spectrum disorder using normative models. Biol. Psychiatry Cogn. Neurosci. Neuroimaging. 2019;4:567–578. doi: 10.1016/j.bpsc.2018.11.013. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.Pinaya WHL, Mechelli A, Sato JR. Using deep autoencoders to identify abnormal brain structural patterns in neuropsychiatric disorders: A large-scale multi-sample study. Hum. Brain Mapp. 2019;40:944–954. doi: 10.1002/hbm.24423. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR6] 6.Wolfers T, et al. Mapping the heterogeneous phenotype of schizophrenia and bipolar disorder using normative models. JAMA Psychiat. 2018 doi: 10.1001/JAMAPSYCHIATRY.2018.2467. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] 7.Ziegler G, Ridgway GR, Dahnke R, Gaser C, Initiative ADN. Individualized Gaussian process-based prediction and detection of local and global gray matter abnormalities in elderly subjects. Neuroimage. 2014;97:333–348. doi: 10.1016/j.neuroimage.2014.04.018. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Huizinga W, et al. A spatio-temporal reference model of the aging brain. Neuroimage. 2018;169:11–22. doi: 10.1016/j.neuroimage.2017.10.040. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Marquand, A. F. et al. Conceptualizing mental disorders as deviations from normative functioning. Mol. Psychiatry 1 (2019). [DOI] [PMC free article] [PubMed]

[CR10] 10.LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436. doi: 10.1038/nature14539. [DOI] [PubMed] [Google Scholar]

[CR11] 11.Vieira, S., Pinaya, W. H. L., Garcia-Dias, R. & Mechelli, A. Deep neural networks. in Machine Learning 157–172 (Elsevier, 2020).

[CR12] 12.Brewer JB. Fully-automated volumetric MRI with normative ranges: translation to clinical practice. Behav. Neurol. 2009;21:21–28. doi: 10.1155/2009/616581. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Busatto GF, Diniz BS, Zanetti MV. Voxel-based morphometry in Alzheimer’s disease. Expert Rev. Neurother. 2008;8:1691–1702. doi: 10.1586/14737175.8.11.1691. [DOI] [PubMed] [Google Scholar]

[CR14] 14.Pini L, et al. Brain atrophy in Alzheimer’s disease and aging. Ageing Res. Rev. 2016;30:25–48. doi: 10.1016/j.arr.2016.01.002. [DOI] [PubMed] [Google Scholar]

[CR15] 15.Sudlow C, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779. doi: 10.1371/journal.pmed.1001779. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Mueller SG, et al. Ways toward an early diagnosis in Alzheimer’s disease: The Alzheimer’s Disease Neuroimaging Initiative (ADNI) Alzheimer’s Dement. 2005;1:55–66. doi: 10.1016/j.jalz.2005.06.003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17. Ellis KA, et al. The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: Methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer’s disease. Int. Psychogeriatrics. 2009;21:672–687. doi: 10.1017/S1041610209009405.

[CR18] 18. Frisoni GB, et al. Markers of Alzheimer’s disease in a population attending a memory clinic. Alzheimer’s Dement. 2009;5:307–317. doi: 10.1016/j.jalz.2009.04.1235.

[CR19] 19. Galluzzi S, et al. The new Alzheimer’s criteria in a naturalistic series of patients with mild cognitive impairment. J. Neurol. 2010;257:2004–2014. doi: 10.1007/s00415-010-5650-0.

[CR20] 20. Marcus DS, et al. Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J. Cogn. Neurosci. 2007;19:1498–1507. doi: 10.1162/jocn.2007.19.9.1498.

[CR21] 21. Malone IB, et al. MIRIAD—Public release of a multiple time point Alzheimer’s MR imaging dataset. Neuroimage. 2013;70:33–36. doi: 10.1016/j.neuroimage.2012.12.044.

[CR22] 22. Elliott P, Peakman TC. The UK Biobank sample handling and storage protocol for the collection, processing and archiving of human blood and urine. Int. J. Epidemiol. 2008;37:234–244. doi: 10.1093/ije/dym276.

[CR23] 23. Miller KL, et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat. Neurosci. 2016;19:1523. doi: 10.1038/nn.4393.

[CR24] 24. Alfaro-Almagro F, et al. Image processing and quality control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage. 2018;166:400–424. doi: 10.1016/j.neuroimage.2017.10.034.

[CR25] 25. Aisen PS, et al. Clinical Core of the Alzheimer’s Disease Neuroimaging Initiative: Progress and plans. Alzheimer’s Dement. 2010;6:239–246. doi: 10.1016/j.jalz.2010.03.006.

[CR26] 26. Edmonds EC, et al. Early versus late MCI: Improved MCI staging using a neuropsychological approach. Alzheimer’s Dement. 2019;15:699–708. doi: 10.1016/j.jalz.2018.12.009.

[CR27] 27. Fischl B. FreeSurfer. Neuroimage. 2012;62:774–781. doi: 10.1016/j.neuroimage.2012.01.021.

[CR28] 28. Fischl B, et al. Whole brain segmentation. Neuron. 2002;33:341–355. doi: 10.1016/S0896-6273(02)00569-X.

[CR29] 29. Desikan RS, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage. 2006;31:968–980. doi: 10.1016/j.neuroimage.2006.01.021.

[CR30] 30. Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I. & Frey, B. Adversarial autoencoders. arXiv Prepr. arXiv:1511.05644 (2015).

[CR31] 31. Pinaya, W. H. L., Vieira, S., Garcia-Dias, R. & Mechelli, A. Autoencoders. in Machine Learning 193–208 (Elsevier, 2020).

[CR32] 32. Goodfellow, I. et al. Generative adversarial nets. in Advances in neural information processing systems 2672–2680 (2014).

[CR33] 33. Maas, A. L., Hannun, A. Y. & Ng, A. Y. Rectifier nonlinearities improve neural network acoustic models. in Proc. ICML vol. 30 3 (2013).

[CR34] 34. Kingma, D. & Ba, J. Adam: A method for stochastic optimization. arXiv Prepr. arXiv:1412.6980 1–15 (2014).

[CR35] 35. Smith, L. N. Cyclical learning rates for training neural networks. in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV) 464–472 (IEEE, 2017).

[CR36] 36. Smith, L. N. A disciplined approach to neural network hyper-parameters: Part 1—Learning rate, batch size, momentum, and weight decay. arXiv Prepr. arXiv:1803.09820 (2018).

[CR37] 37. Cliff N. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychol. Bull. 1993;114:494. doi: 10.1037/0033-2909.114.3.494.

[CR38] 38. Efron B, Tibshirani R. Improvements on cross-validation: The 632+ bootstrap method. J. Am. Stat. Assoc. 1997;92:548–560.

[CR39] 39. Tipping, M. E. The relevance vector machine. in Advances in neural information processing systems 652–658 (2000).

[CR40] 40. Cortes C, Vapnik V. Support-vector networks. Mach. Learn. 1995;20:273–297.

[CR41] 41. Efron B. Nonparametric standard errors and confidence intervals. Can. J. Stat. 1981;9:139–158. doi: 10.2307/3314608.

[CR42] 42. Baecker, L. et al. Brain age prediction: A comparison between machine learning models using region- and voxel-based morphometric data. Hum. Brain Mapp. (2021).

[CR43] 43. Morris JC, et al. Mild cognitive impairment represents early-stage Alzheimer disease. Arch. Neurol. 2001;58:397–405. doi: 10.1001/archneur.58.3.397.

[CR44] 44. Pihlajamaki M, Jauhiainen AM, Soininen H. Structural and functional MRI in mild cognitive impairment. Curr. Alzheimer Res. 2009;6:179–185. doi: 10.2174/156720509787602898.

[CR45] 45. Thompson PM, et al. Mapping hippocampal and ventricular change in Alzheimer disease. Neuroimage. 2004;22:1754–1766. doi: 10.1016/j.neuroimage.2004.03.040.

[CR46] 46. Fox NC, Schott JM. Imaging cerebral atrophy: normal ageing to Alzheimer’s disease. Lancet. 2004;363:392–394. doi: 10.1016/S0140-6736(04)15441-X.

[CR47] 47. Drago V, et al. Disease tracking markers for Alzheimer’s disease at the prodromal (MCI) stage. J. Alzheimer’s Dis. 2011;26:159–199. doi: 10.3233/JAD-2011-0043.

[CR48] 48. Chételat G, et al. Mapping gray matter loss with voxel-based morphometry in mild cognitive impairment. NeuroReport. 2002;13:1939–1943. doi: 10.1097/00001756-200210280-00022.

[CR49] 49. Hämäläinen A, et al. Voxel-based morphometry to detect brain atrophy in progressive mild cognitive impairment. Neuroimage. 2007;37:1122–1131. doi: 10.1016/j.neuroimage.2007.06.016.

[CR50] 50. Pennanen C, et al. A voxel based morphometry study on mild cognitive impairment. J. Neurol. Neurosurg. Psychiatry. 2005;76:11–14. doi: 10.1136/jnnp.2004.035600.

[CR51] 51. Kang DW, Lim HK, Joo S, Lee NR, Lee CU. Differential associations between volumes of atrophic cortical brain regions and memory performances in early and late mild cognitive impairment. Front. Aging Neurosci. 2019;11:245. doi: 10.3389/fnagi.2019.00245.

[CR52] 52. Mulder C, et al. Amyloid-β (1–42), total tau, and phosphorylated tau as cerebrospinal fluid biomarkers for the diagnosis of Alzheimer disease. Clin. Chem. 2010;56:248–253. doi: 10.1373/clinchem.2009.130518.

[CR53] 53. Fortin JP, et al. Harmonization of cortical thickness measurements across scanners and sites. Neuroimage. 2018;167:104–120. doi: 10.1016/j.neuroimage.2017.11.024.

[CR54] 54. Garcia-Dias, R. et al. Neuroharmony: A new tool for harmonizing volumetric MRI data from unseen scanners. Neuroimage 220 (2020).

[CR55] 55. Kia, S. M. et al. Federated Multi-Site Normative Modeling using Hierarchical Bayesian Regression. bioRxiv (2021).

[CR56] 56. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–127. doi: 10.1093/biostatistics/kxj037.

[CR57] 57. Van Essen DC, et al. The WU-Minn human connectome project: An overview. Neuroimage. 2013;80:62–79. doi: 10.1016/j.neuroimage.2013.05.041.
