AMIA Summits on Translational Science Proceedings
. 2023 Jun 16;2023:544–553.

Exploring Automated Machine Learning for Cognitive Outcome Prediction from Multimodal Brain Imaging using STREAMLINE

Xinkai Wang 1, Yanbo Feng 1, Boning Tong 1, Jingxuan Bao 1, Marylyn D Ritchie 1, Andrew J Saykin 2, Jason H Moore 3, Ryan Urbanowicz 3, Li Shen 1
PMCID: PMC10283099  PMID: 37350896

Abstract

STREAMLINE is a simple, transparent, end-to-end automated machine learning (AutoML) pipeline for easily conducting rigorous machine learning (ML) modeling and analysis. The initial version is limited to binary classification. In this work, we extend STREAMLINE by implementing multiple regression-based ML models, including linear regression, elastic net, group lasso, and L21 norm. We demonstrate the effectiveness of the regression version of STREAMLINE by applying it to the prediction of Alzheimer’s disease (AD) cognitive outcomes using multimodal brain imaging data. Our empirical results demonstrate the feasibility and effectiveness of the newly expanded STREAMLINE as an AutoML pipeline for evaluating AD regression models and for discovering multimodal imaging biomarkers.

Introduction

Alzheimer’s disease (AD) is one of the most common forms of dementia; it slowly destroys memory, thinking skills, and eventually the ability to carry out the simplest tasks. Recent estimates indicate that the disorder may rank as the third leading cause of death for older people, just behind heart disease and cancer. More than 6 million Americans age 65 and older may have Alzheimer’s. Thus, there is surging interest in AD-related analyses, particularly in AD diagnosis and biomarker discovery. As machine learning (ML) has developed into a keystone of modern data science and data analysis, ML tools and strategies have become prevalent in AD prediction research. However, most of these tools and strategies are customized to serve particular datasets, methods, and aims. Moreover, the detailed implementations of ML methods and the evaluation metrics inevitably vary significantly across studies. These discrepancies among ML studies on AD, and the complexity within them, create barriers for researchers outside the field to apply the results and methods in practice, and limit the translation of research findings into real-world applications.

Automated machine learning (AutoML) is a promising approach to relieving this dilemma. AutoML has become an important and productive field in ML in recent years, driven by the growth of computing power. AutoML tools enable researchers to build systems in which hyperparameter optimization, feature selection, model selection, data analysis, algorithm performance comparison, and outcome comparison are automated. For example, STREAMLINE was recently developed as a straightforward, user-friendly, and transparent end-to-end machine learning pipeline following the AutoML strategy [1]. Built around a powerful hyperparameter optimization tool called Optuna [2], STREAMLINE is designed to carefully and rigorously apply various ML algorithms and compare their characteristics, performance, and outcomes. The system comprises data counts, data correlation visualization, univariate feature analysis, data cleaning, cross-validation partitioning, scaling, imputation, pre-modeling feature importance evaluation, feature selection, model training, post-modeling feature importance evaluation, algorithm performance and result comparisons, and a publication-ready report generator. It includes common classification algorithms, such as Naive Bayes, Decision Tree, Random Forest, and Boosting, as well as novel rule-based evolutionary ML models such as eLCS and XCS. Users can simply adjust the necessary paths and parameters, upload data to the input path, and start the whole system with a single click, which makes it easy to operate even for researchers unfamiliar with ML methods.

Despite the efficient and versatile functionality of STREAMLINE, the initial version is designed to solve only classification problems, more specifically binary classification, and cannot address the regression problems common in AD prediction research. To improve its versatility and functionality, STREAMLINE therefore needs to support regression problems as well. With these observations, in this work we extend STREAMLINE by implementing four effective regression-based ML models: Linear Regression, Elastic Net, Group Lasso, and L21 Norm. We demonstrate the effectiveness of our regression STREAMLINE by applying it to cognitive outcome prediction using multimodal brain imaging data from an Alzheimer’s disease study.

Literature Review

Many AutoML platforms have emerged since 2013. The best-known is Auto-WEKA [3], which contains 39 classification algorithms, 3 feature search methods, and 8 feature evaluators, and focuses on combined algorithm selection and hyperparameter optimization. Another well-known work is the auto-sklearn toolkit [4], which contains 15 classification algorithms, 14 feature preprocessing methods, and 4 data preprocessing methods, and leverages meta-learning and Bayesian optimization. However, no well-known AutoML platform is dedicated to regularized regression models for analyzing high-dimensional sparse data. This motivates us to extend STREAMLINE to regression models.

There is also literature applying regularized regression methods to the ADNI data. Zhou et al. proposed a fused sparse group lasso method for AD prediction using the ADNI data [5]. They incorporated biomarkers at common and different time points and enforced temporal smoothness using lasso penalty terms. They predicted ADAS-Cog and MMSE outcomes from MRI features, with normalized Mean Squared Error, Average Correlation Coefficient, and Mean Squared Error as evaluation metrics. Liu et al. [6] proposed a multi-task sparse group lasso framework applied to ADNI data with ADAS-Cog and MMSE outcomes, and reported experimental results for 7 regularized linear regression methods along with a heatmap of regression weights on ROIs. They used root Mean Squared Error and Correlation Coefficient as evaluation metrics.

Materials and Method

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu) [7, 8]. The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early AD. For up-to-date information, see www.adni-info.org.

Study Participants In this work, we analyzed 805 non-Hispanic Caucasian subjects with complete baseline measurements (the first session of diagnosis, or session M-00) of three studied imaging modalities, genotyping data, and visit-matched diagnostic information. Specifically, there are 274 controls (i.e., 196 healthy controls (HC) and 78 normal controls with significant memory concern (SMC)) and 531 cases (i.e., 235 patients with early mild cognitive impairment (EMCI), 162 patients with late mild cognitive impairment (LMCI), and 134 AD patients). Shown in Table 1 are their characteristics.

Table 1:

Participant characteristics in our experiments. There are 805 participants in total, where HC and SMC participants are grouped as controls (N=274), and EMCI, LMCI, and AD participants are grouped as cases (N=531).

Diagnosis            Control      Case         P
Number               274          531          -
Gender (M/F)         125/149      282/249      5.25E-02
Age (mean±sd)        74.84±6.35   72.99±8.05   9.81E-04
Education (mean±sd)  16.44±2.72   15.99±2.73   2.71E-02

P-values were computed using a one-way T-test (except for gender, which used a χ2 test). Bold text denotes p < 0.05.

Imaging and Cognitive Data We were specifically interested in three imaging modalities in ADNI: structural MRI [9] (sMRI, measuring brain morphometry), amyloid PET [10] (AV45, measuring amyloid burden), and FDG-PET [11] (measuring glucose metabolism), since these modalities are widely used in the AD research field. Among them, the sMRI scans were processed with voxel-based morphometry (VBM) using the Statistical Parametric Mapping (SPM) software tool [12]. Generally, all scans were aligned to a T1-weighted template image, segmented into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) maps, normalized to the standard Montreal Neurological Institute (MNI) space as 2×2×2 mm3 voxels, and smoothed with an 8mm FWHM kernel. The FDG-PET and AV45-PET scans were also registered into the same MNI space by SPM. The MarsBaR ROI toolbox [13] was used to group voxels into 116 regions of interest (ROIs). ROI-level measures were calculated by averaging all the voxel-level measures within each ROI. Then, for each imaging modality, the resultant data were matched to each participant’s corresponding visit sessions. As mentioned above, participants in this work included 805 non-Hispanic Caucasian subjects with complete baseline ROI-level measurements of all three modalities and visit-matched diagnostic information; see Table 1 for their characteristics [14]. The outcome measurements studied in this work include four cognitive scores: ADAS 11, ADAS 13, MMSE, and CDR-SB, where each quantifies the progression of AD from a different perspective [15].

Regression STREAMLINE The initial STREAMLINE framework consists of four main stages: (1) data preparation, (2) feature importance estimation and selection, (3) ML modeling, and (4) post-analysis. Our modified version of the STREAMLINE pipeline for regression problems preserves this general structure, with adjustments and expansions in each stage for regression-specific purposes. As shown in Figure 1, the regression pipeline is composed of (1) data preparation, (2) feature importance evaluation, (3) ML modeling, and (4) post-analysis. Data preparation includes (1) data input, which takes one or multiple properly formatted datasets as input for the AutoML system; (2) exploratory analysis, which produces commonly used summary statistics and plots; (3) basic cleaning, which removes instances with missing outcomes and any user-specified features; (4) k-fold cross-validation with three CV strategies: stratified (grouped by the same class), random, and matched (grouped by a user-specified feature column); (5) train-test splitting; (6) feature transformation, which applies mean/SD scaling to training and testing data; and (7) missing value imputation, which uses mode imputation for categorical features and iterative imputation for quantitative features. The prepared training and testing data are then passed to the feature importance evaluation stage, which performs (1) Mutual Information (MI) evaluation and (2) MultiSURF evaluation on the training data features, filtering out unimportant features and saving and plotting the most important ones. After this pre-modeling feature importance evaluation, the training and testing sets are passed to the modeling stage. In the ML modeling stage, we replace the original binary classification algorithms with four regression algorithms: (1) Linear Regression, (2) Elastic Net [16], (3) Group Lasso, and (4) L21 Norm [17, 18, 19, 20, 21]. For each algorithm, there are four sections: (1) hyperparameter optimization using Optuna [2], (2) optimized model training, (3) feature importance (FI) estimation, and (4) performance evaluation on testing data. Each algorithm can be activated or deactivated at the user’s choice. The trained models are saved with Python Pickle at the end of this stage. Finally, the trained models are analyzed in the post-analysis stage to generate FI plots, regression evaluation metrics, and brain region maps of publication-ready quality [1].
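To make the four per-algorithm sections concrete, here is a highly simplified sketch of the modeling stage. The registry, file layout, and use of R2 via model.score are our illustrative assumptions, not STREAMLINE's actual API, and the Optuna tuning step is elided:

```python
import pickle
from sklearn.linear_model import LinearRegression, ElasticNet

# Illustrative registry of two of the four regression algorithms; Group Lasso
# and L21 Norm would be registered the same way.
ALGORITHMS = {
    "Linear Regression": LinearRegression,
    "Elastic Net": ElasticNet,
}

def run_modeling_stage(X_train, y_train, X_test, y_test, active, out_dir="models"):
    """Sketch of the per-algorithm loop: tune, train, estimate FI, evaluate, save."""
    results = {}
    for name in active:
        model = ALGORITHMS[name]()           # (1) hyperparameter tuning would go here
        model.fit(X_train, y_train)          # (2) optimized model training
        importances = getattr(model, "coef_", None)  # (3) FI estimation (sketch)
        score = model.score(X_test, y_test)  # (4) performance on testing data (R^2)
        with open(f"{out_dir}/{name}.pkl", "wb") as f:
            pickle.dump(model, f)            # models saved with Python Pickle
        results[name] = {"score": score, "importance": importances}
    return results
```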

Figure 1:

Regression STREAMLINE Schematics. The pipeline of the modified Regression STREAMLINE is composed of 4 elements: Data Preparation, Feature Importance Evaluation, ML Modeling, Post-Modeling Analysis [1].

Data Preparation To prepare data for calculating feature importance and modeling in STREAMLINE, we first dropped the samples that contain null values for each modality of imaging data and each measurement. Then, we selected the common patient subject IDs from each modality for each measurement. We have three imaging modalities and four cognition measurements in total, so we created 3 × 4 = 12 individual datasets (or combinations of one imaging modality and one measurement) as inputs to the entire STREAMLINE pipeline.

Specifically, in the Exploratory Analysis stage, we output several descriptive results: summary statistics, a histogram of outcome values, the number of instances, the number of (quantitative) features, the number of missing values for each feature, and a feature correlation plot that visualizes the correlation between each pair of features. For data preprocessing, we dropped instances with missing outcome values and performed multivariate or median imputation of missing feature values, depending on the user’s choice. Next, we used the Standard Scaler to scale all the imaging modality data. As a common practice, we set the number of cross-validation folds to five: we randomly selected 80 percent of the data for training and the remaining 20 percent for testing, and repeated this procedure five times (5-fold cross-validation) to generate five training and testing datasets for each combination of measurement and imaging modality.
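The preparation steps above can be sketched as follows; the function name and the choice of median imputation are illustrative (STREAMLINE also offers multivariate imputation), and the imputer and scaler are fit on training data only:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

def prepare_folds(X, y, n_splits=5, seed=42):
    """Drop instances with missing outcomes, then build five 80/20 splits,
    median-imputing and standard-scaling each fold's features."""
    keep = ~np.isnan(y)                      # drop rows with missing outcome
    X, y = X[keep], y[keep]
    folds = []
    for train_idx, test_idx in KFold(n_splits, shuffle=True,
                                     random_state=seed).split(X):
        imputer = SimpleImputer(strategy="median").fit(X[train_idx])
        scaler = StandardScaler().fit(imputer.transform(X[train_idx]))
        X_tr = scaler.transform(imputer.transform(X[train_idx]))
        X_te = scaler.transform(imputer.transform(X[test_idx]))
        folds.append((X_tr, y[train_idx], X_te, y[test_idx]))
    return folds
```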

Extended Regression Methods The four new regression algorithms include single-modal and multi-modal regression methods. For single-modal regression, we developed and applied Linear Regression, Elastic Net, and Group Lasso, and the data in each modality was organized as:

$$X = [X_1, \ldots, X_n, \ldots, X_N]^T$$

where $X_n$ represents the $n$-th ROI feature column containing all subjects in each single-modality dataset. Linear Regression and Elastic Net were imported from the scikit-learn package [22]; Group Lasso was imported from the group-lasso Python package [23].

Group Lasso is the regularization of linear regression that minimizes the following objective function:

$$\|y - Xw\|_2^2 + \lambda \sum_{g \in G} \sqrt{n_g}\,\|w_g\|_2$$

where $X$ is the imaging data matrix, $y$ is the measurement vector, $w$ is the weight vector to be learned, $\lambda$ is the regularization weight, $G$ is the set of groups, $n_g$ is the size of group $g$, and $w_g$ is the weight sub-vector for group $g$. Group Lasso exploits prior knowledge about correlations between features: by penalizing correlated features jointly, it shrinks groups of correlated features together and concentrates weight on less correlated features. The group structure we used consists of symmetric brain regions; that is, we treated each pair of symmetric regions of interest (ROIs) as one group.
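The pipeline itself uses the group-lasso package [23]; the following hand-rolled proximal-gradient sketch is only to make the objective and its block soft-thresholding concrete. The `groups` argument (e.g., pairs of symmetric ROI indices) and the step-size choice are illustrative:

```python
import numpy as np

def group_lasso(X, y, groups, lam=0.1, lr=None, n_iter=500):
    """Proximal gradient descent for ||y - Xw||^2 + lam * sum_g sqrt(n_g) ||w_g||_2.

    groups: list of index lists, e.g. [[0, 1], [2, 3], ...] for paired
    symmetric ROIs.
    """
    n, p = X.shape
    if lr is None:
        lr = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)  # step from Lipschitz bound
    w = np.zeros(p)
    for _ in range(n_iter):
        w = w - lr * 2 * X.T @ (X @ w - y)          # gradient of ||y - Xw||^2
        for g in groups:                            # prox: block soft-thresholding
            thr = lr * lam * np.sqrt(len(g))
            norm = np.linalg.norm(w[g])
            w[g] = 0.0 if norm <= thr else (1 - thr / norm) * w[g]
    return w
```

Because the prox step can zero out an entire group at once, groups whose members contribute little to the prediction vanish jointly, which is the behavior described above.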

For multi-modal regression, we developed and applied the L21 norm as the regularization term for Linear Regression. This L21 algorithm was modified from the package "Diagnosis-guided Multi-modality Phenotype Associations" [24]. The goal of multi-modal learning is to combine the multi-modal imaging data to provide different perspectives on brain changes at different diagnostic stages and thereby improve prediction accuracy. To achieve this, the data from the three modalities were first stacked into a matrix of the form

$$X = \begin{bmatrix} X_1^1 & \cdots & X_N^1 \\ X_1^2 & \cdots & X_N^2 \\ X_1^3 & \cdots & X_N^3 \end{bmatrix}^T$$

where $X_n^1, X_n^2, X_n^3$ each represent the $n$-th region-of-interest (ROI) feature column in the three different modalities. The L21 norm regularization term is expressed as

$$\|W\|_{2,1} = \sum_{j=1}^{n} \|W^j\|_2$$

which applies L2 norm regularization across the three modalities for each ROI and L1 norm regularization across the ROIs. This L21 norm regularization term then regularizes the Linear Regression, minimizing the following objective function:

$$\frac{1}{2}\sum_{m=1}^{M} \|y - X^m W^m\|_2^2 + \lambda \|W\|_{2,1}$$

where $m$ indexes the modalities; $y$ is the vector of cognitive score measurements; $X^m$ is the feature matrix of modality $m$; $W^m$ is the corresponding weight vector; $\lambda$ is the regularization weight; and $W$ is the overall weight matrix.

The reason for implementing the L21-norm-regularized regression is that it can jointly learn from multiple imaging modality datasets, which traditional linear regression cannot. As shown in Figure 2, the L21 Norm groups the imaging data both by imaging modality type (AV45, FDG, VBM) and by the ROI each feature belongs to. By using these two grouping strategies during learning, the L21-regularized regression learns both from all the ROIs within each modality and from each ROI across all modalities, allowing more comprehensive learning from different imaging perspectives. This integration of knowledge from different perspectives can potentially improve the performance of predicting cognitive outcomes.
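As a concrete illustration of this objective (independent of the DGMM-based implementation [24]), a proximal-gradient sketch might look like the following, where the prox step shrinks each ROI row of $W$ jointly across modalities:

```python
import numpy as np

def l21_regression(Xs, y, lam=0.1, n_iter=500):
    """Proximal gradient for 1/2 * sum_m ||y - X_m w_m||^2 + lam * ||W||_{2,1}.

    Xs: list of M (n x p) modality matrices sharing the same ROI ordering.
    Returns W of shape (p, M); row j couples ROI j across all modalities.
    """
    M, p = len(Xs), Xs[0].shape[1]
    L = max(np.linalg.norm(X, 2) ** 2 for X in Xs)  # Lipschitz bound
    lr = 1.0 / L
    W = np.zeros((p, M))
    for _ in range(n_iter):
        for m, X in enumerate(Xs):                  # gradient of 1/2 ||y - X_m w_m||^2
            W[:, m] -= lr * X.T @ (X @ W[:, m] - y)
        norms = np.linalg.norm(W, axis=1)           # prox: row-wise (per-ROI) shrinkage
        scale = np.maximum(0.0, 1 - lr * lam / np.maximum(norms, 1e-12))
        W *= scale[:, None]
    return W
```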

Figure 2:

Grouping strategy behind L21-norm-regularized linear regression.

To complete hyperparameter tuning, we leveraged the existing hyperparameter optimization framework Optuna [2], which is included in the original version of the STREAMLINE pipeline. We used a random search strategy that samples hyperparameters from user-specified lists or intervals for training and testing. We ran 50 trials for each cross-validation fold and each machine learning method, for fast computation without too much compromise in performance.

Performance Metrics and Feature Importance We selected six commonly used regression metrics from scikit-learn [22] and SciPy [25]: Explained Variance Score, Max Error, Mean Absolute Error, Mean Squared Error, Median Absolute Error, and Pearson Correlation.
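The six metrics above map directly onto library calls; a small helper (the function name is ours, not the pipeline's) could look like:

```python
from scipy.stats import pearsonr
from sklearn.metrics import (explained_variance_score, max_error,
                             mean_absolute_error, mean_squared_error,
                             median_absolute_error)

def regression_metrics(y_true, y_pred):
    """Compute the six reported metrics from scikit-learn and SciPy."""
    return {
        "Explained Variance": explained_variance_score(y_true, y_pred),
        "Max Error": max_error(y_true, y_pred),
        "Mean Absolute Error": mean_absolute_error(y_true, y_pred),
        "Mean Squared Error": mean_squared_error(y_true, y_pred),
        "Median Absolute Error": median_absolute_error(y_true, y_pred),
        "Pearson Correlation": pearsonr(y_true, y_pred)[0],
    }
```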

For feature importance, we first employed the existing STREAMLINE framework to calculate Mutual Information and MultiSURF scores prior to the Modeling stage. Both are filter methods, computed from the data before any model is fit: Mutual Information detects univariate associations with the outcome, while MultiSURF detects feature interactions [26].

During the Modeling stage, we also computed the permutation importance of the features as part of the STREAMLINE framework, using the new regression models we developed. This function originally comes from the scikit-learn package. It first applies the fitted model to compute a baseline performance score under a particular scoring metric. Then, for each feature column, it randomly shuffles that column while keeping all other columns fixed, and evaluates the already-fitted model on the permuted data. This process is repeated over many trials, and feature importance is calculated as the average drop in performance score across trials.
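A minimal sketch of this procedure with scikit-learn's permutation_importance, on synthetic data with illustrative parameter choices:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import ElasticNet

# Synthetic example: only feature 0 drives the outcome, so shuffling it
# should cause the largest drop in the scoring metric.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=200)

model = ElasticNet(alpha=0.01).fit(X, y)
result = permutation_importance(model, X, y, scoring="r2",
                                n_repeats=10, random_state=0)
print(result.importances_mean.argmax())  # → 0 (feature 0 dominates)
```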

Results

Pre-Modeling Feature Importance Figure 3 and Figure 4 show the Mutual Information scores and the MultiSURF scores, respectively, for each of the three imaging modalities mentioned above. The figures were automatically generated by the STREAMLINE pipeline. We observe that some regions have relatively high scores in two or more modalities: the Angular gyrus, Parahippocampal gyrus, Hippocampus, and Inferior Temporal gyrus tend to have higher Mutual Information scores, while the Parahippocampal gyrus, Hippocampus, and Caudate nucleus tend to have higher MultiSURF scores.

Figure 3:

Mutual Information Scores for AV45, FDG, and VBM (from left to right)

Figure 4:

MultiSURF Scores for AV45, FDG, and VBM (from left to right)

Performance Table Table 2 shows the performance of each measurement, imaging modality, and regression method on the testing set. The best scores, and the regression methods that perform best on most metrics for each combination of measurement and imaging modality, are shown in bold. For FDG PET scans, Elastic Net performs better on most scoring metrics. For AV45 PET scans and VBM MRI scans, Group Lasso generally performs better, though Elastic Net is sometimes best.

Table 2:

Testing Performance For Each Measurement, Modality, and Method.

Measurement Modality Method Explained Variance Max Error Mean Absolute Error Mean Squared Error Median Absolute Error Pearson Correlation
ADAS 11 AV45 Linear Regression 0.177 21.355 4.342 33.114 3.512 0.494
Elastic Net 0.289 21.841 4.010 29.166 3.208 0.548
Group Lasso 0.295 21.868 3.984 28.954 3.182 0.551
L21 0.267 30.732 8.111 95.848 6.939 0.552
FDG Linear Regression 0.353 17.298 3.937 26.250 3.289 0.629
Elastic Net 0.406 18.576 3.658 24.276 2.780 0.647
Group Lasso 0.401 18.823 3.675 24.453 2.820 0.645
L21 0.380 28.014 7.980 88.709 6.897 0.657
VBM Linear Regression 0.221 20.338 4.233 31.059 3.281 0.528
Elastic Net 0.308 19.872 3.954 28.273 3.155 0.565
Group Lasso 0.314 20.155 3.921 27.903 3.108 0.569
L21 0.252 30.422 8.313 99.438 7.041 0.550
ADAS 13 AV45 Linear Regression 0.264 26.092 6.116 61.514 4.999 0.551
Elastic Net 0.351 26.228 5.679 54.405 4.784 0.594
Group Lasso 0.349 26.238 5.698 54.497 4.876 0.593
L21 0.301 40.566 12.701 219.795 11.123 0.583
FDG Linear Regression 0.368 21.745 5.813 52.958 5.098 0.629
Elastic Net 0.460 22.313 5.258 45.325 4.275 0.682
Group Lasso 0.452 22.927 5.293 45.998 4.428 0.676
L21 0.407 36.844 12.569 207.482 11.59 0.682
VBM Linear Regression 0.288 27.656 6.020 59.543 4.991 0.565
Elastic Net 0.357 27.153 5.671 53.878 4.631 0.598
Group Lasso 0.363 27.755 5.622 53.352 4.590 0.603
L21 0.275 42.243 13.047 230.541 11.816 0.574
MMSE AV45 Linear Regression 0.094 7.867 1.781 5.293 1.396 0.411
Elastic Net 0.250 7.451 1.585 4.378 1.275 0.502
Group Lasso 0.252 7.531 1.581 4.390 1.237 0.502
L21 0.079 29.99 27.378 754.965 28.009 0.328
FDG Linear Regression 0.283 7.077 1.585 4.209 1.299 0.564
Elastic Net 0.363 6.959 1.498 3.731 1.220 0.605
Group Lasso 0.360 7.417 1.674 4.671 1.364 0.603
L21 0.108 29.99 27.301 750.575 27.945 0.347
VBM Linear Regression 0.164 7.614 1.717 4.925 1.404 0.468
Elastic Net 0.262 7.451 1.594 4.318 1.299 0.512
Group Lasso 0.255 7.483 1.636 4.423 1.355 0.506
L21 0.063 29.99 27.447 758.814 27.996 0.275
CDR-SB AV45 Linear Regression 0.216 5.734 1.051 2.015 0.859 0.496
Elastic Net 0.280 6.056 1.004 1.851 0.797 0.530
Group Lasso 0.274 6.062 1.014 1.868 0.817 0.525
L21 0.248 7.376 1.093 2.954 0.658 0.541
FDG Linear Regression 0.297 5.054 1.020 1.819 0.861 0.563
Elastic Net 0.358 5.718 0.958 1.657 0.770 0.600
Group Lasso 0.345 5.820 0.961 1.715 0.759 0.589
L21 0.336 7.137 1.065 2.685 0.673 0.614
VBM Linear Regression 0.222 5.909 1.044 2.001 0.845 0.501
Elastic Net 0.293 6.115 0.994 1.820 0.802 0.543
Group Lasso 0.283 6.132 0.998 1.848 0.795 0.535
L21 0.216 7.614 1.136 3.160 0.642 0.513

One remarkable finding is that one method often dominates across nearly all metrics for each combination of measurement and imaging modality (i.e., each dataset we ran through STREAMLINE). It also matches our intuition that all the regularized regression methods outperform Linear Regression, the baseline method. Since the L21 Norm takes in three datasets per measurement (one per imaging modality) simultaneously, this multi-modal regression method has much higher errors on every error metric than the single-modal methods. However, the L21 Norm achieves the highest Pearson Correlation for the combinations ADAS 11/AV45, ADAS 11/FDG, ADAS 13/FDG, CDR-SB/AV45, and CDR-SB/FDG, demonstrating advantages of this multi-modal regression method over the others.

Post-Modeling Feature Importance Figure 5 shows feature importance heatmaps for the three imaging modalities (AV45, FDG, VBM) with the four cognition measurements (ADAS 11, ADAS 13, CDR-SB, MMSE) after modeling. The figure is based on the normalized permutation feature importance scores from the trained models, where $FI_{\mathrm{normalized}} = (FI - \mathrm{Mean}(FI)) / \mathrm{SD}(FI)$, with the mean and standard deviation taken along each row of the heatmap. Each row represents the 116 ROI imaging features, and each column represents a combination of imaging modality, measurement, and machine learning method. Note that Linear Regression is not sparse, so it assigns importance to many more features than the other ML methods, and it fails to identify the most relevant features as the other methods do. Hence, we only discuss the results produced by Elastic Net, Group Lasso, and L21.

Figure 5:

Feature Importance for AV45, FDG, VBM (from left to right) with four cognition measurements ADAS 11, ADAS 13, CDR-SB, MMSE (from top to bottom)

Another observation is that, except for Linear Regression, all measurements and methods approximately agree on the important features for each imaging modality, which helps corroborate the validity of the measurements and methods we used. According to Figures 3 and 6, for AV45 the methods mainly captured Hippocampus_R, Parahippocampal gyrus_L, and Putamen_L as the ROIs most important for explaining the AD scores; in particular, CDR-SB weights Hippocampus_R most heavily, while the other measurements weight Putamen_L most heavily. For FDG, the methods mainly captured Cingulum Post_L, Parahippocampal gyrus_L, and Angular_L as the most important regions, with all measurements weighting Cingulum Post_L the highest. For VBM, the methods mainly captured Hippocampus_R, Amygdala_L, Angular_R, and Temporal Mid_R as the most important regions, with all measurements weighting Hippocampus_R and Temporal Mid_R the highest. These figures show that the methods agree on the important features. In general, the Cingulum, Hippocampus, Parahippocampal gyrus, Angular gyrus, Putamen, Amygdala, Middle Temporal Pole, and Caudate nucleus tend to have higher permutation importance scores, consistent with the ROIs that had high Mutual Information and MultiSURF scores in the pre-modeling stage. In Figure 6, the FI scores of the best-performing algorithms according to Table 2 are mapped onto human brain regions to visualize the regions contributing to outcome prediction. This consistency further indicates the importance of these ROIs for predicting cognitive outcomes. Comparing the figures further, we notice that the ADAS 11, ADAS 13, and CDR-SB measurements yield higher feature importance values than MMSE, suggesting that CDR-SB and ADAS-Cog may measure the severity of cognitive dysfunction more precisely than the MMSE.

Figure 6:

Brain Region Maps of FI scores of algorithms with the best performance in AV45, FDG, VBM (top to bottom) corresponding to ADAS11, ADAS13, MMSE and CDR-SB (from left to right).

To visualize the feature importance more intuitively, we created brain maps (Figure 6) in three views for each combination of measurement and modality, using the feature importance values derived from the best-performing methods in Table 2. The feature importance values in the brain region maps are consistent with the feature importance heatmaps in Figure 5.

Conclusion

The original STREAMLINE pipeline provides a handy, structured way to compare the performance and feature importance of each machine learning method; its visualizations help intuitively identify common patterns detected by multiple methods. However, the initial version is limited to binary classification. In this work, we have extended STREAMLINE by implementing multiple regression-based ML models: Linear Regression, Elastic Net, Group Lasso, and L21 Norm. We demonstrated the effectiveness of the regression version of STREAMLINE by applying it to the prediction of Alzheimer’s disease (AD) cognitive outcomes using multimodal brain imaging data. Our empirical results show the feasibility and effectiveness of the newly expanded STREAMLINE as an AutoML pipeline for evaluating AD regression models and for discovering multimodal imaging biomarkers. Building on these results, we will continue to leverage the structure of the pipeline to incorporate more models, such as ensemble methods, which are expected to outperform the single models incorporated so far. We will also keep refining the implementation to ensure consistency, transparency, reusability, and customizability of the entire pipeline for rigorous regression-based automated machine learning analyses of complex real-world health problems.

Acknowledgment

This work was supported in part by the National Institutes of Health [U01 AG066833, U01 AG068057, R01 AG071470]. Data used in this study were obtained from the Alzheimer’s Disease Neuroimaging Initiative database (adni.loni.usc.edu), which was funded by NIH U01 AG024904.


References

  • 1. Urbanowicz RJ, Zhang R, Cui Y, Suri P. STREAMLINE: a simple, transparent, end-to-end automated machine learning pipeline facilitating data analysis and algorithm comparison. arXiv preprint arXiv:2206.12002v1. 2022.
  • 2. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: a next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2019.
  • 3. Thornton C, Hutter F, Hoos HH, Leyton-Brown K. Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. Proc. of KDD-2013. 2013:847–55.
  • 4. Feurer M, Klein A, Eggensperger K, Springenberg J, Blum M, Hutter F. Efficient and robust automated machine learning. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 28. Curran Associates, Inc.; 2015. Available from: https://proceedings.neurips.cc/paper/2015/file/11d0e6287202fced83f79975ec59a3a6-Paper.pdf.
  • 5. Zhou J, Liu J, Narayan VA, Ye J. Modeling disease progression via fused sparse group lasso. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2012:1095–103. doi: 10.1145/2339530.2339702.
  • 6. Liu X, Goncalves AR, Cao P, Zhao D, Banerjee A, Alzheimer’s Disease Neuroimaging Initiative, et al. Modeling Alzheimer’s disease cognitive scores using multi-task sparse group lasso. Computerized Medical Imaging and Graphics. 2018;66:100–14. doi: 10.1016/j.compmedimag.2017.11.001.
  • 7. Weiner MW, Veitch DP, Aisen PS, et al. The Alzheimer’s Disease Neuroimaging Initiative: a review of papers published since its inception. Alzheimers Dement. 2013;9(5):e111–94. doi: 10.1016/j.jalz.2013.05.1769.
  • 8. Shen L, Thompson PM, Potkin SG, et al. Genetic analysis of quantitative phenotypes in AD and MCI: imaging, cognition and biomarkers. Brain Imaging Behav. 2014;8(2):183–207. doi: 10.1007/s11682-013-9262-z.
  • 9. Jack CR Jr, Bernstein MA, et al. Update on the magnetic resonance imaging core of the Alzheimer’s Disease Neuroimaging Initiative. Alzheimers Dement. 2010;6(3):212–20. doi: 10.1016/j.jalz.2010.03.004.
  • 10. Jagust WJ, Landau SM, et al. The Alzheimer’s Disease Neuroimaging Initiative 2 PET Core: 2015. Alzheimers Dement. 2015;11(7):757–71. doi: 10.1016/j.jalz.2015.05.001.
  • 11. Jagust WJ, Bandy D, et al. The Alzheimer’s Disease Neuroimaging Initiative positron emission tomography core. Alzheimers Dement. 2010;6(3):221–9. doi: 10.1016/j.jalz.2010.03.003.
  • 12. Ashburner J, Friston KJ. Voxel-based morphometry–the methods. Neuroimage. 2000;11(6):805–21. doi: 10.1006/nimg.2000.0582.
  • 13. Tzourio-Mazoyer N, et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage. 2002;15(1):273–89. doi: 10.1006/nimg.2001.0978.
  • 14. Feng Y, Kim M, Yao X, Liu K, Long Q, Shen L. Deep multiview learning to identify population structure with multimodal imaging. 2020 IEEE 20th Int. Conf. on Bioinfo. and Bioeng. (BIBE). 2020:308–14. doi: 10.1109/bibe50027.2020.00057.
  • 15. Alzheimer’s Disease modeling challenge. http://www.pi4cs.org/qt-pad-challenge.
  • 16. Shen L, Kim S, Qi Y, Inlow M, Swaminathan S, Nho K, et al. Identifying neuroimaging and proteomic biomarkers for MCI and AD via the Elastic Net. Multimodal Brain Image Analysis. 2011;7012:27–34. doi: 10.1007/978-3-642-24446-9_4.
  • 17. Hao X, Bao Y, Guo Y, Yu M, Zhang D, Risacher SL, et al. Multi-modal neuroimaging feature selection with consistent metric constraint for diagnosis of Alzheimer’s disease. Medical Image Analysis. 2020;60:101625. doi: 10.1016/j.media.2019.101625.
  • 18. Wang H, Nie F, Huang H, Risacher S, Ding C, et al. Sparse multi-task regression and feature selection to identify brain imaging predictors for memory performance. IEEE Int Conf Comput Vis. 2011:557–62. doi: 10.1109/ICCV.2011.6126288.
  • 19. Wang H, Nie F, Huang H, Risacher S, et al. Identifying AD-sensitive and cognition-relevant imaging biomarkers via joint classification and regression. Med Image Comput Comput Assist Interv. 2011;14(Pt 3):115–23. doi: 10.1007/978-3-642-23626-6_15.
  • 20. Yan J, Risacher S, Kim S, Simon J, Li T, Wan J, et al. Multimodal neuroimaging predictors for cognitive performance using structured sparse learning. vol. 7509 of Lecture Notes in Computer Science. 2012:1–17.
  • 21. Yan J, Li T, Wang H, Huang H, Wan J, Nho K, et al. Cortical surface biomarkers for predicting cognitive outcomes using group l2,1 norm. Neurobiol Aging. 2015;36(Suppl 1):S185–93. doi: 10.1016/j.neurobiolaging.2014.07.045.
  • 22. Buitinck L, Louppe G, Blondel M, Pedregosa F, Mueller A, Grisel O, et al. API design for machine learning software: experiences from the scikit-learn project. ECML PKDD Workshop: Languages for Data Mining and Machine Learning. 2013:108–22.
  • 23. Moe YM. group-lasso API reference. https://group-lasso.readthedocs.io/en/latest/api reference.html.
  • 24. Shao W. Diagnosis-guided multi-modality phenotype associations. https://github.com/shaoweinuaa/DGMM.
  • 25. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods. 2020;17:261–72. doi: 10.1038/s41592-019-0686-2.
  • 26. Urbanowicz R, Zhang R. STREAMLINE: a simple, transparent, end-to-end automated machine learning pipeline. GitHub. 2022. https://github.com/UrbsLab/STREAMLINE.
