Psychoradiology
2022 Mar 9;2(1):287–295. doi: 10.1093/psyrad/kkac001

A systematic analysis of diagnostic performance for Alzheimer's disease using structural MRI

Jiangping Wu 1, Kun Zhao 2, Zhuangzhuang Li 3, Dong Wang 4, Yanhui Ding 5, Yongbin Wei 6, Han Zhang 7, Yong Liu 8,9,
PMCID: PMC10939341  PMID: 38665142

Abstract

Background

Alzheimer's disease (AD) is one of the most common neurodegenerative disorders in the elderly. Although numerous structural magnetic resonance imaging (sMRI) studies have reported diagnostic models that could distinguish AD from normal controls (NCs) with 80–95% accuracy, limited efforts have been made regarding the clinically practical computer-aided diagnosis (CAD) system for AD.

Objective

To explore the potential factors that hinder the clinical translation of the AD-related diagnostic models based on sMRI.

Methods

To systematically review the diagnostic models for AD based on sMRI, we identified relevant studies published in the past 15 years on PubMed, Web of Science, Scopus, and Ovid. To evaluate the heterogeneity and publication bias among those studies, we performed subgroup analysis, meta-regression, Begg's test, and Egger's test.

Results

According to our screening criteria, 101 studies were included. Our results demonstrated that high diagnostic accuracy for distinguishing AD from NC was obtained in recently published studies, accompanied by significant heterogeneity. Meta-analysis showed that many factors contributed to the heterogeneity of the high diagnostic accuracy of AD using sMRI, including but not limited to the following aspects: (i) different datasets; (ii) different machine learning models, e.g. traditional machine learning or deep learning models; (iii) different cross-validation methods, e.g. k-fold cross-validation leads to higher accuracies than leave-one-out cross-validation, but both overestimate the accuracy compared to validation in independent samples; (iv) different sample sizes; and (v) different publication times. We speculate that these complicated variables might be adverse factors for developing a clinically applicable system for the early diagnosis of AD.

Conclusions

Our findings confirmed that previous studies have reported promising results for classifying AD versus NC with different models using sMRI. However, considering the many factors hindering clinical radiology practice, there is still a long way to go before such models can be applied in the clinic.

Keywords: Alzheimer's disease, diagnosis, heterogeneity, sMRI, meta-analysis

Introduction

Alzheimer's disease (AD) is one of the most common neurodegenerative diseases in the elderly and accounts for 60–70% of all dementia cases. More than 55 million people worldwide are living with dementia, and a recent report from Alzheimer's Disease International projected that this number will reach 78 million by 2030 (Gauthier et al., 2021).

AD is one of the top five leading causes of death among people older than 65 years in China (Zhou et al., 2019), and it is usually characterized by memory impairment, aphasia, apraxia, agnosia, visuospatial deficits, and executive dysfunction (2021 Alzheimer's Disease Facts and Figures, 2021). Given the limited progress of pharmacological treatments for AD, convergent evidence suggests that an early diagnosis plays a crucial role in delaying the progression of AD (Dubois et al., 2016; Rasmussen & Langerman, 2019; Vaz & Silvestre, 2020). AD leads to neuronal loss and, ultimately, atrophy: first in the medial temporal lobe (especially in the bilateral hippocampi) and later, in the dementia stage, in widespread cortical areas (Pini et al., 2016; Poulakis et al., 2018; Whitwell et al., 2012). Machine learning algorithms aim to detect these atrophy patterns and robustly distinguish them from normal aging (which also causes some atrophy in the medial temporal lobe). Structural magnetic resonance imaging (sMRI), a powerful technique for noninvasive in vivo imaging of the human brain, has been used successfully to investigate patterns of brain atrophy as an estimate of regional neurodegeneration in AD (Rathore et al., 2020). Many studies have reported high accuracy (i.e. 80–95%) for classifying AD versus normal controls (NC) based on different sMRI features (for reviews, see Chavez-Fumagalli et al., 2021; Rathore et al., 2017), and accuracies even exceed 95% in some diagnostic models based on deep learning (Zhang et al., 2021).
Such models would therefore seem ready for clinical translation to assist with the early diagnosis of AD. Although several computer-aided diagnosis (CAD) tools are already commercially available (https://www.cortechslabs.com/neuroquant; https://www.cneuro.com/cmri; https://mediaire.de/product/mdbrain; https://icometrix.com/products/icobrain-dm; https://jung-diagnostics.de/de/diagnostics), they only assess the volume of specific brain regions, and their output must be further assessed by radiologists. In reality, a CAD system that can automatically distinguish AD among outpatients has rarely been reported.

The primary purpose of the present study is to investigate and discuss the possible reasons hindering the clinical translation of AD-related diagnostic models. First, we performed a systematic review of the relevant studies published between January 2006 and September 2021. Subsequently, a meta-analysis was introduced to quantitatively evaluate the heterogeneity among the included studies. Specifically, the heterogeneity across different stratified frames, including "dataset," "machine learning model," "cross-validation method," "sample size," and "publication time," was carefully evaluated. Finally, we assessed the robustness of the heterogeneity results using different methods (i.e. subgroup analysis, meta-regression, and sensitivity analysis).

Materials and Methods

Search strategy

According to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Liberati et al., 2009; Moher et al., 2009), we conducted a systematic literature review of previous studies that implemented sMRI to classify/predict AD versus NC. To identify the relevant original articles, we adopted the following search strategy: (i) searching the titles and abstracts of English-language articles published from January 2006 to September 2021 on PubMed, Web of Science, Scopus, and Ovid; and (ii) using Boolean operators in an advanced query, with the search terms concatenated as ("classification" OR "diagnostic" OR "predict*") AND ("MRI" OR "Magnetic Resonance Imaging") AND ("Alzheimer*"). Duplicate articles were removed from further analysis.
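For illustration, the query assembly and cross-database deduplication can be sketched as follows; the record fields `doi` and `title` and both function names are our own hypothetical choices, not part of the original protocol:

```python
# Hypothetical sketch of the search-and-deduplication step. The query string
# mirrors the Boolean search terms above; records retrieved from PubMed,
# Web of Science, Scopus, and Ovid are merged and duplicates removed by DOI
# or, failing that, by a normalized title.

def build_query() -> str:
    """Assemble the Boolean query used across the four databases."""
    terms = [
        '("classification" OR "diagnostic" OR "predict*")',
        '("MRI" OR "Magnetic Resonance Imaging")',
        '("Alzheimer*")',
    ]
    return " AND ".join(terms)

def deduplicate(records):
    """Keep one copy per article, matching on DOI when present,
    else on a lowercased, stripped title."""
    seen, unique = set(), []
    for rec in records:
        key = rec.get("doi") or rec["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

In practice, each database exports slightly different metadata, so the matching key would need per-source normalization; the sketch only shows the overall shape of the step.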

Inclusion criteria

To restrict the studies to our aim, we applied a peer-review screening strategy to all retrieved articles (Sarica et al., 2017; Song et al., 2020). In particular, two examiners (J.W. and Z.L.) independently evaluated the full manuscripts using the following criteria: (i) the study focused on the classification of Alzheimer's disease and established diagnostic models based on sMRI; (ii) the total sample size was larger than 20; and (iii) the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values were reported or could be inferred. If J.W. and Z.L. disagreed, a third examiner (Y.L.) acted as arbiter to resolve the conflict. If more than one diagnostic model was reported in a study, we retained only the model with the highest accuracy.
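Criterion (iii) allows the confusion-matrix counts to be inferred when a study reports only sensitivity, specificity, and group sizes. A minimal sketch of that inference (the function name and the rounding convention are our assumptions):

```python
def confusion_counts(n_pos: int, n_neg: int,
                     sensitivity: float, specificity: float):
    """Recover TP, FP, TN, FN from reported sensitivity/specificity and the
    numbers of patients (n_pos) and controls (n_neg)."""
    tp = round(sensitivity * n_pos)   # sensitivity = TP / (TP + FN)
    fn = n_pos - tp
    tn = round(specificity * n_neg)   # specificity = TN / (TN + FP)
    fp = n_neg - tn
    return tp, fp, tn, fn
</n_neg```

For instance, a study reporting 90% sensitivity and 85% specificity on 100 AD patients and 100 NCs yields TP = 90, FP = 15, TN = 85, FN = 10.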

Investigations of heterogeneity

To estimate the performance of the diagnostic models, we first calculated the diagnostic odds ratio (DOR) for each model. The DOR is defined as (TP × TN)/(FP × FN) and is used to measure the effectiveness of a diagnostic test (Glas et al., 2003). Then, a univariate meta-analysis was performed on the DOR with the R package "meta" (https://cran.r-project.org/web/packages/meta/, v.5.0-1) to estimate the heterogeneity and publication bias among studies. The confidence level for all calculations was set to 95% (95% CI) (Chavez-Fumagalli et al., 2021; Song et al., 2020). The I2 statistic was used to reflect the proportion of heterogeneity in the total variation of effect size. The definition of I2 is (Huedo-Medina et al., 2006):

$$ I^{2} = \frac{Q - d}{Q} \times 100\%, \qquad Q = \sum_{i=1}^{k} W_{i}\,(Y_{i} - M)^{2}, $$

where Wi, Yi, and M represent the weight of study i, the effect size of study i, and the summary effect, respectively; i indexes the studies, k is the total number of included studies, and d = k − 1 is the degrees of freedom of Q. The calculation of I2 is based on the random-effects model (Barili et al., 2018; Clark & Linzer, 2014).

A significantly heterogeneous result was reported if I2 > 50% and P < 0.1 (Higgins & Thompson, 2002; Huedo-Medina et al., 2006). To further explore which factors influenced the heterogeneity of diagnostic test accuracy, we performed heterogeneity analyses on the DOR within different stratified frames (i.e. "dataset," "machine learning model," "cross-validation method," "sample size," and "publication time"). In addition, meta-regression analysis was used to explore these factors and quantitatively describe the heterogeneity associated with different variables in the meta-analysis.
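The DOR and I2 computations described above can be sketched in Python. This is a simplified inverse-variance version for illustration, not the R "meta" implementation used in the actual analysis; the 0.5 continuity correction for zero cells is a common convention we assume here:

```python
import math

def log_dor(tp, fp, tn, fn, cc=0.5):
    """Log diagnostic odds ratio, DOR = (TP*TN)/(FP*FN), with a continuity
    correction applied when any cell is zero, and its approximate variance."""
    if 0 in (tp, fp, tn, fn):
        tp, fp, tn, fn = (x + cc for x in (tp, fp, tn, fn))
    estimate = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / tn + 1 / fn
    return estimate, variance

def i_squared(effects, variances):
    """I^2 = (Q - d)/Q * 100 with Q = sum_i W_i (Y_i - M)^2, inverse-variance
    weights W_i, and d = k - 1 degrees of freedom; truncated at 0."""
    w = [1.0 / v for v in variances]
    m = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)  # summary effect M
    q = sum(wi * (yi - m) ** 2 for wi, yi in zip(w, effects))
    d = len(effects) - 1
    return max(0.0, (q - d) / q * 100.0) if q > 0 else 0.0
```

For example, TP = 90, FP = 15, TN = 85, FN = 10 gives DOR = (90 × 85)/(15 × 10) = 51.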

Assessment of reporting bias

Reporting bias, also known as publication bias, has an enormous impact on the effectiveness of a systematic review and its meta-analysis (Dwan et al., 2013). It occurs when studies with favorable results are more likely to be published than studies with unfavorable results (van Enst et al., 2014). The most common way to detect publication bias is to draw a funnel plot and evaluate its asymmetry (Harbord et al., 2006). The trim-and-fill method was used to describe the asymmetry of the funnel plot (Duval & Tweedie, 2000), and Begg's test and Egger's test were introduced to quantitatively assess publication bias via funnel plot asymmetry (Leeflang, 2014; van Enst et al., 2014).
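Egger's test regresses the standardized effect size on precision; an intercept far from zero indicates funnel-plot asymmetry. A minimal sketch of the intercept computation (ordinary least squares only, without the accompanying significance test):

```python
import math

def egger_intercept(effects, variances):
    """Intercept of the Egger regression: (effect / SE) ~ (1 / SE).
    An intercept near zero is consistent with a symmetric funnel plot."""
    se = [math.sqrt(v) for v in variances]
    x = [1.0 / s for s in se]                 # precision
    y = [e / s for e, s in zip(effects, se)]  # standardized effect
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar
```

When every study estimates the same effect, the regression line passes through the origin and the intercept is zero, i.e. no small-study effect is detected.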

Results

A total of 2837 nonduplicate articles were identified. During the screening process, 367 articles were retained after our initial examination of their titles and abstracts. The examiners then double-checked and excluded 204 articles according to the previously mentioned criteria. Of the remaining 163 papers, the TP, FP, TN, and FN were reported or could be correctly inferred in 101 articles. Note that some articles contained multiple validation results; hence, 116 records were finally included in our meta-analysis (Fig. 1).

Figure 1: The workflow diagram of PRISMA in the study.

Preliminary analysis of the diagnostic model

First, we conducted a preliminary exploration of the AD diagnosis models based on sMRI. Among the included papers, the numbers of studies of the three types of diagnostic model [i.e. machine learning (ML), deep learning (DL), and others] were significantly different (60, 26, and 14%, respectively; Fig. 2a). The sample sizes of the ML, DL, and other models were significantly different, with the mean sample size being largest for DL (Fig. 2b). The accuracy of the DL models was markedly higher than that of the other two types of model (all P < 0.001), whereas there was no significant difference between the ML models and the other models (P = 0.149) (Fig. 2c). For the DOR, however, there were substantial differences among all three types of diagnostic model (all P < 0.005) (Fig. 2d). In addition, the cross-validation methods can be divided into four main categories (k-fold, leave-one-out, independent, and others); k-fold cross-validation was the most commonly used and yielded the highest accuracy and DOR (Fig. 2e and f).

Figure 2: (A) The number of different validation methods in different diagnostic models. (B) The differences in the sample size of different diagnostic models. (C) The differences in diagnostic accuracy of different models. (D) The differences in diagnostic accuracy of different validation methods. (E) The differences in diagnostic odds ratio of different models. (F) The differences in diagnostic odds ratio of different validation methods.

As shown in Fig. 3, the study sample size significantly increased in recent years (P < 0.001). Similarly, the diagnostic accuracy also displayed a significant increasing trend along with the year of publication (P < 0.001).

Figure 3: (A) The sample size of AD diagnostic model changes with the year of publication. (B) The trend of the DOR of a diagnostic model for AD with the year of publication. The different colors and sizes of the circles correspond to different models for AD diagnosis and sample sizes of diagnostic models.

Results of assessment for heterogeneity

As shown in Table 1, the overall I2 was 73% (P < 0.001), indicating high heterogeneity among the included studies. To explore the reasons for this heterogeneity, we compared subgroups stratified from five perspectives: "dataset," "machine learning model," "cross-validation method," "sample size," and "publication year." Significant differences were found for all five perspectives (Table 1). Specifically, the I2 of the ADNI dataset (77%) was significantly larger than that of the other datasets (56%) (P = 0.032), indicating that the "dataset" could be one potential source of the high heterogeneity. For the diagnostic model, the I2 of the ML, DL, and other methods was 74% (P < 0.001), 65% (P < 0.001), and <1% (P = 0.650), respectively. Similarly, low heterogeneity was found among the studies using other cross-validation methods (I2 < 1%, P = 0.781). Studies with samples <150 showed lower heterogeneity than studies with larger samples: the I2 was 68% (samples 150–300), 68% (samples 300–450), and 78% (samples >450). As for the publication year, the I2 for 2011–2014, 2015–2018, and 2019–2021 was higher than 65%, whereas the I2 for 2006–2010 was relatively low, probably because few studies from that period were included. Similar patterns were obtained even within the ADNI dataset (Table 1). As shown in Fig. 4a, meta-regression analysis revealed a linear relationship between sample size and log(DOR) (P < 0.001), indicating that sample size could be another potential source of the high heterogeneity. Furthermore, datasets, models, and cross-validation methods showed similar linear relationships with log(DOR) (all P < 0.05).
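The meta-regression of log(DOR) on sample size can be illustrated with a weighted least-squares fit. This is a fixed-effect simplification of the mixed-effects meta-regression run in R, shown only to make the weighting explicit; the function name is ours:

```python
def meta_regression_slope(moderator, effects, variances):
    """Weighted least-squares slope and intercept of effect size (e.g. log DOR)
    on a study-level moderator (e.g. sample size), weighted by 1/variance."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    return slope, ybar - slope * xbar
```

A full meta-regression additionally estimates a between-study variance component and a P value for the slope, which the R "meta"/"metafor" machinery provides.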

Table 1:

Summary of heterogeneity in different subgroups based on a random effect model.

| Subgroup | Type | Number (ADNI + other) | I2 % (ADNI + other) | Subgroup P (ADNI + other) | Number (ADNI) | I2 % (ADNI) | Subgroup P (ADNI) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset | ADNI | 79 | 77 (P < 0.001) | P = 0.032 | | | |
| | other datasets | 37 | 56 (P < 0.001) | | | | |
| Method | machine learning | 70 | 74 (P < 0.001) | P < 0.001 | 48 | 79 (P < 0.001) | P = 0.002 |
| | deep learning | 30 | 65 (P < 0.001) | | 24 | 69 (P < 0.001) | |
| | other method | 16 | <1 (P = 0.650) | | 7 | <1 (P = 0.620) | |
| Validation | k-fold | 72 | 76 (P < 0.001) | P = 0.003 | 59 | 76 (P < 0.001) | P < 0.019 |
| | leave-one-out | 24 | 47 (P < 0.001) | | 11 | 60 (P < 0.001) | |
| | independent | 15 | 80 (P < 0.001) | | 8 | 87 (P < 0.001) | |
| | other validation | 5 | <1 (P = 0.780) | | 1 | not applicable | |
| Sample | 0–150 | 41 | 44 (P = 0.010) | P = 0.008 | 19 | 41 (P = 0.008) | P = 0.025 |
| | 150–300 | 32 | 73 (P < 0.001) | | 24 | 70 (P < 0.001) | |
| | 300–450 | 30 | 74 (P < 0.001) | | 28 | 76 (P < 0.001) | |
| | >450 | 13 | 88 (P < 0.001) | | 8 | 92 (P < 0.001) | |
| Year | 2006–2010 | 6 | <1 (P = 0.510) | P < 0.001 | 2 | 8 (P = 0.300) | P < 0.003 |
| | 2011–2014 | 26 | 68 (P < 0.001) | | 14 | 65 (P < 0.001) | |
| | 2015–2018 | 44 | 68 (P < 0.001) | | 34 | 74 (P < 0.001) | |
| | 2019–2021 | 37 | 78 (P < 0.001) | | 29 | 81 (P < 0.001) | |
| Total heterogeneity | | | 73 (P < 0.001) | | | 77 (P < 0.001) | |

Figure 4: (A) Bubble plot of sample size in meta-regression. Different circles correspond to different studies. The size of the circle is inversely proportional to the variance of the estimated effect value. (B) A Baujat plot explores the heterogeneity in a meta-analysis. Different circles also correspond to different studies.

Sensitivity analysis

We performed a sensitivity analysis to explore whether the results were stable. The Baujat plot has been proposed to detect sources of heterogeneity in a meta-analysis (Anzures-Cabrera & Higgins, 2010; Baujat et al., 2002). As shown in Fig. 4b, the studies falling in the upper right corner of the plot made the largest contributions to the heterogeneity. Note that the heterogeneity remained high (I2 = 68%) after removing those studies, and the I2 remained above 71% when any single included study was dropped. In sum, high heterogeneity persisted among the studies included in our analysis.
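The leave-one-study-out check described above can be sketched as follows. A minimal I2 helper is included so that the snippet is self-contained; it is an inverse-variance simplification, not the R "meta" implementation:

```python
def i2(effects, variances):
    """I^2 from Cochran's Q with inverse-variance weights, truncated at 0."""
    w = [1.0 / v for v in variances]
    m = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - m) ** 2 for wi, yi in zip(w, effects))
    d = len(effects) - 1
    return max(0.0, (q - d) / q * 100.0) if q > 0 else 0.0

def leave_one_out_i2(effects, variances):
    """Recompute I^2 with each study removed in turn; a large drop flags the
    removed study as a major source of heterogeneity (cf. the Baujat plot)."""
    return [
        i2(effects[:i] + effects[i + 1:], variances[:i] + variances[i + 1:])
        for i in range(len(effects))
    ]
```

If every entry of the resulting list stays high, as in our analysis, no single study explains the heterogeneity.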

Reporting bias

As shown in Fig. 5a, the funnel plot was approximately symmetrical, indicating no significant publication bias in the included studies. A large number of studies fell outside the dotted lines (95% CI), which is associated with the heterogeneity among the included studies. The P values of both Begg's test (P = 0.367) and Egger's test (P = 0.068) were >0.05, meaning that we could not reject the null hypothesis that there was no publication bias among the included studies (Fig. 5b).

Figure 5: (A) Funnel plot of the trim-and-fill method. The red circles represent the original studies, and the hollow circles represent studies imputed by the trim-and-fill procedure. (B) A radial plot of Egger's test. The dotted line in the middle represents the regression line fitted under the random-effects model, and the dashed lines on both sides represent the 95% confidence limits.

Discussion

In this paper, we systematically reviewed 101 articles on diagnostic models for AD and found high heterogeneity among them. Our meta-analysis showed that many factors contribute to the heterogeneity of the high diagnostic accuracy of AD using sMRI, including but not limited to the "dataset," "machine learning model," "cross-validation method," "sample size," and "publication time". These complicating variables are probable reasons for the limited clinical translation of models for the early diagnosis of AD.

Structural MRI is one of the most widely used and accessible techniques in the search for imaging biomarkers of AD and has been recommended in clinical guidelines (National Institute for Health and Care Excellence (UK), 2018). Modern machine learning models have become pervasive in the search for neuroimaging-based biomarkers of AD based on sMRI, especially with the enormous contributions of ADNI (http://adni.loni.usc.edu/) and other open datasets (https://www.oasis-brains.org). As mentioned previously, promising results have been achieved for classifying AD versus NC with different models using sMRI (Cheng et al., 2017; Feng et al., 2021; Rathore et al., 2017). In the past decades, much work has been done to translate machine learning models from bench to bedside. Several CAD tools are already commercially available for diagnosing AD, such as NeuroQuant (https://www.cortechslabs.com/neuroquant), cNeuro (https://www.combinostics.com/), mdbrain (https://mediaire.de/en/about/), icobrain dm (https://icometrix.com/products/icobrain-dm), and BIOMETRICA (https://jung-diagnostics.de/de/diagnostics). However, these toolkits can only assess the volume of specific brain regions or certain other brain features, which then need to be further evaluated by radiologists. Therefore, a gap remains between scientific research and clinical radiology practice.

First, the lack of generalizability and of validation in large samples is considered a major challenge for the field. Based on the included studies, we observed that sample sizes have increased quickly in recent years. We also noted that most of these studies are based on ADNI. Although ADNI includes >2000 participants, the data come from around 70 sites and four stages (ADNI1, ADNIGO, ADNI2, and ADNI3). It should be noted that the ADNI website (http://adni.loni.usc.edu/) is not a "click & play" curated dataset but a set of MRI scans and spreadsheets that need to be selected, downloaded, and processed by individual researchers. Depending on the selection criteria, specific diagnostic groups are included; for example, the "early MCI" groups and newer ADNI3 patients show less (hippocampal) atrophy than the "late MCI" groups (Moore et al., 2020). In short, it is still a significantly heterogeneous dataset. A small sample size with leave-one-out or k-fold cross-validation will lead to overfitting and/or out-of-distribution problems for imaging data from different sites with various co-factors. In addition, the accuracy of deep learning models is higher than that of traditional machine learning models. One potential reason is that their complex parameter space makes higher accuracy possible; at the same time, it also presents a challenge for generalization. Notably, several recent studies with hippocampus markers (Ding et al., 2021; Li et al., 2019; Zhao et al., 2020) or deep learning models (Dyrba et al., 2021; Jin et al., 2020; Lian et al., 2020; Qiu et al., 2020) have shown promising performance when introducing independent validation with large multisite samples. We believe that this kind of validation on independent sites will accelerate clinical translation.

Second, the lack of standardized preprocessing has hampered clinical application. Brain measures are also affected by several technical factors, such as the reliability and robustness of MRI scanners, as well as postprocessing pipelines. There are several popular tools, such as FSL, FreeSurfer, CAT12, and SPM. However, even the simplest measure, such as the gray matter volume of the brain, shows minor differences driven by the tool used (Guo et al., 2019; Han et al., 2006; Jovicich et al., 2009; Medawar et al., 2021), which might affect the homogeneity of the models. It is impossible to require all these studies to be performed at a single site, especially for large cohort studies. Therefore, a standard, well-established pipeline for data preprocessing is encouraged for future studies (Esteban et al., 2019; Glasser et al., 2013).

Third, some previous results overestimated clinical applicability because of the highly selected participants involved in the studies. For example, the ADNI dataset is highly preselected in terms of image acquisition and patients via rigorous inclusion and exclusion criteria. However, a patient with cognitive impairment who decides to visit a doctor may well have other disorders, such as diabetes, head trauma, hypertension, or cerebrovascular disease, and these risks would influence a CAD system for AD. Many factors, rather than a single one, are responsible for the significant heterogeneity, and the heterogeneity of AD symptoms itself also seriously challenges highly accurate prediction from brain features and will hamper the development of a robust CAD system for AD (Ferreira et al., 2020; Habes et al., 2020; Machado et al., 2020; B. Zhang et al., 2021). Thus, a large, high-quality longitudinal protocol, validated in independent datasets obtained from different centers, would benefit a future CAD system for AD.

This study has limitations. We only focused on the sMRI studies due to their wide availability as clinical tools. We tried to find the source of heterogeneity and evaluate the study's reliability. Given the high heterogeneity among studies in the current analysis, we focused on meta-regression, subgroup comparison, sensitivity analysis, and the evaluation of publication bias. Even so, there was still high heterogeneity for most subgroups. We speculate that adding other measures such as PET imaging, CSF markers, genetic factors, or lifestyle factors might help to achieve higher predictive power with lower heterogeneity.

Conclusion

In summary, by systematically reviewing the diagnostic models for AD, we found significant differences among the different diagnostic models, i.e. high heterogeneity. This was due to many factors, including but not limited to the "dataset", "machine learning model", "cross-validation method", "sample size", and "publication time". This heterogeneity poses a tremendous challenge for model promotion and clinical translation, and the clinical translation of AD diagnostic models still has a long way to go.

ACKNOWLEDGEMENTS

This work was partially supported by the Beijing Natural Science Funds for Distinguished Young Scholars (No. JQ20036), the Fundamental Research Funds for the Central Universities (No. 2021XD-A03-1), and the National Natural Science Foundation of China (Nos. 81871438 and 82172018).

Contributor Information

Jiangping Wu, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China.

Kun Zhao, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China.

Zhuangzhuang Li, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China.

Dong Wang, School of Information Science and Engineering, Shandong Normal University, Ji'nan, 250014, China.

Yanhui Ding, School of Information Science and Engineering, Shandong Normal University, Ji'nan, 250014, China.

Yongbin Wei, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China.

Han Zhang, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China.

Yong Liu, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China; Center for Artificial Intelligence in Medical Imaging, Beijing University of Posts and Telecommunications, Beijing, 100876, China.

Author Contributions

J.W. and Z.L. analyzed the data and performed the measurements. J.W., K.Z., and Y.L. were principally responsible for preparing the manuscript. H.Z., D.W., Y.W., Y.D., and Y.L. revised the manuscript. Y.L. supervised the project.

Conflict of interests

None of the authors reported biomedical financial interests or potential conflicts of interest.

References

  1. 2021 Alzheimer's Disease Facts and Figures(2021) Alzheimers Dement. 17:327–406. [DOI] [PubMed] [Google Scholar]
  2. Anzures-Cabrera J, Higgins JP (2010) Graphical displays for meta-analysis: an overview with suggestions for practice. Res Synth Methods. 1:66–80. [DOI] [PubMed] [Google Scholar]
  3. Barili F, Parolari A, Kappetein PAet al. (2018) Statistical primer: heterogeneity, random- or fixed-effects model analyses?. Interact Cardiovasc Thorac Surg. 27:317–21. [DOI] [PubMed] [Google Scholar]
  4. Baujat B, Mahe C, Pignon JPet al. (2002) A graphical method for exploring heterogeneity in meta-analyses: application to a meta-analysis of 65 trials. Stat Med. 21:2641–52. [DOI] [PubMed] [Google Scholar]
  5. Chavez-Fumagalli MA, Shrivastava P, Aguilar-Pineda JAet al. (2021) Diagnosis of Alzheimer's disease in developed and developing countries: systematic review and meta-analysis of diagnostic test accuracy. J Alzheimer's Dis Rep. 5:15–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Cheng B, Liu M, Shen Det al. (2017) Multi-domain transfer learning for early diagnosis of Alzheimer's disease. Neuroinformatics. 15:115–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Clark TS, Linzer DA (2014) Should I use fixed or random effects?. Political Sci Res Meth. 3:399–408. [Google Scholar]
  8. Ding Y, Zhao K, Che Tet al. (2021) Quantitative radiomic features as new biomarkers for Alzheimer's disease: an amyloid PET study. Cereb Cortex. 31:3950–61. [DOI] [PubMed] [Google Scholar]
  9. Dubois B, Padovani A, Scheltens Pet al. (2016) Timely diagnosis for Alzheimer's disease: a literature review on benefits and challenges. J Alzheimers Dis. 49:617–31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Duval S, Tweedie R (2000) Trim and fill a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics. 56:455–63. [DOI] [PubMed] [Google Scholar]
  11. Dwan K, Gamble C, Williamson PRet al. (2013) Systematic review of the empirical evidence of study publication bias and outcome reporting bias – an updated review. PLoS ONE. 8:e66844. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Dyrba M, Hanzig M, Altenstein Set al. (2021) Improving 3D convolutional neural network comprehensibility via interactive visualization of relevance maps: evaluation in Alzheimer's disease. Alzheimers Res Ther. 13:191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Esteban O, Markiewicz CJ, Blair RWet al. (2019) fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat Methods. 16:111–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Feng J, Zhang S-W, Chen Let al. (2021) Alzheimer's disease classification using features extracted from nonsubsampled contourlet subband-based individual networks. Neurocomputing. 421:260–72. [Google Scholar]
  15. Ferreira D, Nordberg A, Westman E (2020) Biological subtypes of Alzheimer disease: a systematic review and meta-analysis. Neurology. 94:436–48. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Gauthier S, Rosa-Neto P, Morais JAet al. (2021) World Alzheimer Report 2021: journey through the diagnosis of dementia. London, England: Alzheimer's Disease International. [Google Scholar]
  17. Glas AS, Lijmer JG, Prins MHet al. (2003) The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol. 56:1129–35. [DOI] [PubMed] [Google Scholar]
  18. Glasser MF, Sotiropoulos SN, Wilson JAet al. (2013) The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage. 80:105–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Guo C, Ferreira D, Fink Ket al. (2019) Repeatability and reproducibility of FreeSurfer, FSL-SIENAX and SPM brain volumetric measurements and the effect of lesion filling in multiple sclerosis. Eur Radiol. 29:1355–64. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Habes M, Grothe MJ, Tunc Bet al. (2020) Disentangling heterogeneity in Alzheimer's disease and related dementias using data-driven methods. Biol Psychiatry. 88:70–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Han X, Jovicich J, Salat Det al. (2006) Reliability of MRI-derived measurements of human cerebral cortical thickness: the effects of field strength, scanner upgrade and manufacturer. Neuroimage. 32:180–94. [DOI] [PubMed] [Google Scholar]
  22. Harbord RM, Egger M, Sterne JA (2006) A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Stat Med. 25:3443–57. [DOI] [PubMed] [Google Scholar]
  23. Higgins JP, Thompson SG (2002) Quantifying heterogeneity in a meta-analysis. Stat Med. 21:1539–58. [DOI] [PubMed] [Google Scholar]
  24. Huedo-Medina TB, Sanchez-Meca J, Marin-Martinez Fet al. (2006) Assessing heterogeneity in meta-analysis: q statistic or index?. Psychol Methods. 11:193–206. [DOI] [PubMed] [Google Scholar]
  25. Jin D, Zhou B, Han Y, et al. (2020) Generalizable, reproducible, and neuroscientifically interpretable imaging biomarkers for Alzheimer's disease. Adv Sci (Weinh). 7:2000675.
  26. Jovicich J, Czanner S, Han X, et al. (2009) MRI-derived measurements of human subcortical, ventricular and intracranial brain volumes: reliability effects of scan sessions, acquisition sequences, data analyses, scanner upgrade, scanner vendors and field strengths. Neuroimage. 46:177–92.
  27. Leeflang MM (2014) Systematic reviews and meta-analyses of diagnostic test accuracy. Clin Microbiol Infect. 20:105–13.
  28. Li H, Habes M, Wolk DA, et al. (2019) A deep learning model for early prediction of Alzheimer's disease dementia based on hippocampal magnetic resonance imaging data. Alzheimers Dement. 15:1059–70.
  29. Lian C, Liu M, Zhang J, et al. (2020) Hierarchical fully convolutional network for joint atrophy localization and Alzheimer's disease diagnosis using structural MRI. IEEE Trans Pattern Anal Mach Intell. 42:880–93.
  30. Liberati A, Altman DG, Tetzlaff J, et al. (2009) The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 62:e1–34.
  31. Machado A, Ferreira D, Grothe MJ, et al. (2020) The cholinergic system in subtypes of Alzheimer's disease: an in vivo longitudinal MRI study. Alzheimers Res Ther. 12:51.
  32. Medawar E, Thieleking R, Manuilova I, et al. (2021) Estimating the effect of a scanner upgrade on measures of grey matter structure for longitudinal designs. PLoS ONE. 16:e0239021.
  33. Moher D, Liberati A, Tetzlaff J, et al. (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 6:e1000097.
  34. Moore EE, Liu D, Pechman KR, et al. (2020) Mild cognitive impairment staging yields genetic susceptibility, biomarker, and neuroimaging differences. Front Aging Neurosci. 12:139.
  35. National Institute for Health and Care Excellence (UK) (2018) Dementia: assessment, management and support for people living with dementia and their carers. NICE Clinical Guidelines. London: National Institute for Health and Care Excellence (UK).
  36. Pini L, Pievani M, Bocchetta M, et al. (2016) Brain atrophy in Alzheimer's disease and aging. Ageing Res Rev. 30:25–48.
  37. Poulakis K, Pereira JB, Mecocci P, et al. (2018) Heterogeneous patterns of brain atrophy in Alzheimer's disease. Neurobiol Aging. 65:98–108.
  38. Qiu S, Joshi PS, Miller MI, et al. (2020) Development and validation of an interpretable deep learning framework for Alzheimer's disease classification. Brain. 143:1920–33.
  39. Rasmussen J, Langerman H (2019) Alzheimer's disease – Why we need early diagnosis. Degener Neurol Neuromusc Dis. 9:123–30.
  40. Rathore S, Abdulkadir A, Davatzikos C (2020) Analysis of MRI data in diagnostic neuroradiology. Ann Rev Biomed Data Sci. 3:365–90.
  41. Rathore S, Habes M, Iftikhar MA, et al. (2017) A review on neuroimaging-based classification studies and associated feature extraction methods for Alzheimer's disease and its prodromal stages. Neuroimage. 155:530–48.
  42. Sarica A, Cerasa A, Quattrone A (2017) Random forest algorithm for the classification of neuroimaging data in Alzheimer's disease: a systematic review. Front Aging Neurosci. 9:329.
  43. Song M, Yang Y, Yang Z, et al. (2020) Prognostic models for prolonged disorders of consciousness: an integrative review. Cell Mol Life Sci. 77:3945–61.
  44. van Enst WA, Ochodo E, Scholten RJ, et al. (2014) Investigation of publication bias in meta-analyses of diagnostic test accuracy: a meta-epidemiological study. BMC Med Res Methodol. 14:70.
  45. Vaz M, Silvestre S (2020) Alzheimer's disease: recent treatment strategies. Eur J Pharmacol. 887:173554.
  46. Whitwell JL, Dickson DW, Murray ME, et al. (2012) Neuroimaging correlates of pathologically defined subtypes of Alzheimer's disease: a case-control study. Lancet Neurol. 11:868–77.
  47. Zhang B, Lin L, Wu S, et al. (2021) Multiple subtypes of Alzheimer's disease base on brain atrophy pattern. Brain Sci. 11:278.
  48. Zhang X, Han L, Zhu W, et al. (2021) An explainable 3D residual self-attention deep neural network for joint atrophy localization and Alzheimer's disease diagnosis using structural MRI. IEEE J Biomed Health Inform. In Press.
  49. Zhao K, Ding YH, Han Y, et al. (2020) Independent and reproducible hippocampal radiomic biomarkers for multisite Alzheimer's disease: diagnosis, longitudinal progress and biological basis. Sci Bull. 65:1103–13.
  50. Zhou M, Wang H, Zeng X, et al. (2019) Mortality, morbidity, and risk factors in China and its provinces, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 394:1145–58.

Articles from Psychoradiology are provided here courtesy of Oxford University Press
