Journal of the American Medical Informatics Association (JAMIA). 2022 Oct 26;30(1):112–119. doi: 10.1093/jamia/ocac204

An analysis of the effects of limited training data in distributed learning scenarios for brain age prediction

Raissa Souza, Pauline Mouches, Matthias Wilms, Anup Tuladhar, Sönke Langner, Nils D Forkert
PMCID: PMC9748540  PMID: 36287916

Abstract

Objective

Distributed learning avoids problems associated with central data collection by training models locally at each site. This can be achieved by federated learning (FL), which aggregates multiple models trained in parallel, or by sequentially training a single model that visits each site, the traveling model (TM). While both approaches have been applied to medical imaging tasks, their performance with limited local data remains unknown. In this study, we specifically analyze FL and TM performance when only very small sample sizes are available per site.

Materials and Methods

A total of 2025 T1-weighted magnetic resonance imaging scans were used to investigate the effect of sample size on FL and TM for brain age prediction. We evaluated the models across scenarios varying the number of samples per site (1, 2, 5, 10, and 20) and the number of training rounds (20, 40, and 200).

Results

Our results demonstrate that the TM outperforms FL for every sample size examined. In the most extreme case, in which each site provided only a single sample, FL achieved a mean absolute error (MAE) of 18.9 ± 0.13 years, while the TM achieved a MAE of 6.21 ± 0.50 years, comparable to central learning (MAE = 5.99 years).

Discussion

Although FL is the more commonly used approach, our study demonstrates that the TM is the better implementation when only small sample sizes are available.

Conclusion

The TM opens new opportunities for applying machine learning models in rare disease and pediatric research and also allows even small hospitals to contribute small datasets.

Keywords: machine learning, distributed learning, brain age prediction

Graphical Abstract


INTRODUCTION

It has been shown that machine learning models, especially deep learning models, can achieve accuracies comparable to those of human experts for many computer-aided diagnosis applications.1,2 However, large and diverse datasets are typically needed to train models that generalize well to unseen data. Training data size and diversity are commonly increased by collecting multisite data into a centralized repository before model training, an approach known as central learning.

Over the years, significant efforts have been made to create large data repositories such as the UK Biobank3 and the Alzheimer’s Disease Neuroimaging Initiative (ADNI),4 among others. Although using such repositories to implement and train central learning models has proven successful, this approach requires significant resources and time to collect and curate the datasets centrally. Moreover, ethical and legal restrictions and time-consuming procedures to implement data-sharing agreements between (international) sites often present considerable barriers to a central data collection approach.

An emerging approach to machine learning model training, known as distributed learning, aims to avoid the need to collect data in a central location by remotely training models at each site. Consequently, it overcomes several regulatory limitations and can increase dataset size and diversity for machine learning training.5

To date, distributed learning for neural networks has been primarily implemented using federated learning (FL), which consists of training independent neural network models at each collaborating site.6–8 After local training, the model weights are transferred to a central server and combined into a global model using an aggregation function such as a simple average, median, or weighted average. Most studies exploring and evaluating FL models8–11 simulate collaborative networks with only a few sites, each providing a relatively large local dataset for training. For instance, healthcare-focused studies8–11 simulated between 4 and 6 sites, with each site providing between 100 and 300 datasets. One limitation of this evaluation setup is that it remains unknown how well FL methods perform when only very few patients are available at a single site (eg, rare diseases, small hospitals, and pediatric cases). However, it is intuitive to assume that FL approaches will probably not perform well in these cases, as training a neural network on very small samples is known to be challenging.

In contrast to FL, a relatively underexplored distributed learning method is the so-called traveling model (TM).12,13 It consists of a single model that is sequentially trained at the collaborating sites. After training at one site, the updated model is passed to the next (with or without a server in between). Thus, no aggregation function is necessary, as only a single model is iteratively updated. In theory, this approach should be more suitable when only small samples are available per site, as the single model sequentially sees all datasets. However, a typical problem of sequential training is that visiting one site at a time can cause catastrophic forgetting, which might also degrade accuracy in these cases.

Sheller et al14 have shown how catastrophic forgetting and small contributions can negatively impact FL models and TMs for brain tumor segmentation. However, in their study, only 2 of 10 sites contributed small datasets (5 and 4 samples). In contrast, this work implements traveling and federated machine learning models and compares both approaches for the extreme case in which sites can only provide very small samples.

Therefore, we utilize brain age prediction using T1-weighted magnetic resonance imaging (MRI) data as our exemplary application scenario. While data to train such brain age prediction models are widely and freely available, we chose this specific application precisely because this data availability allows us to systematically simulate and analyze the effect of small sample sizes at contributing sites. From a clinical perspective, biological brain age prediction is important because it allows calculating the so-called brain age gap, which has been investigated by many researchers15,16 and suggested as a promising biomarker for several neurodegenerative diseases.17 It is determined by calculating the difference between a subject’s biological brain age and chronological age, capturing accelerated or delayed brain aging. A higher risk for neurodegenerative diseases is expected when a subject’s biological brain age considerably exceeds the chronological age. Such models typically use MRI data as input to machine learning regression models. While brain age prediction is not a rare disease or pediatric application, we believe this work’s findings will be transferable to other problems with limited data.
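For concreteness, the brain age gap described above can be written as a simple difference (the notation below is ours, not from the original text):

$$\mathrm{BAG} = \hat{a}_{\mathrm{brain}} - a_{\mathrm{chron}}$$

where $\hat{a}_{\mathrm{brain}}$ is the model-predicted biological brain age, $a_{\mathrm{chron}}$ is the chronological age, and a large positive gap suggests accelerated brain aging.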

The contributions of this work are as follows: (1) the exploration of distributed learning models for very small datasets and (2) the first evaluation of distributed learning architectures for regression tasks.

MATERIALS AND METHODS

Dataset

The data used in this study are a subsample of the data acquired within the Study of Health in Pomerania (SHIP), a general population study of randomly selected participants in Pomerania, Germany.18 The subsample consists of 2025 cross-sectional T1-weighted MRI brain scans of predominantly healthy adults aged between 21 and 82 years (mean: 50 ± 13 years), all acquired on a single scanner.

The SHIP study was approved by the local ethics committee of the University of Greifswald (BB 39/08, June 19, 2008), and written consent was obtained from all participants. Additionally, the MRI scans were anonymized for this study, eliminating the need for additional ethics approval.

Morphological brain features, including structure volumes, surface areas, and thickness measurements, were automatically derived from the T1-weighted MRI data using FastSurfer.19 FastSurfer first segments anatomical structures using a deep learning-based model and then extracts the different morphological measurements through surface reconstruction. This process resulted in 223 features (volume: 99; surface area: 62; thickness: 62) per subject. Figure 1 shows example anatomical structure segmentations for 3 datasets.

Figure 1. Example of structure segmentation using FastSurfer.

Simulated distributed data

It is well known that machine learning models perform better when larger datasets are used during training, which has also been demonstrated for the task of brain age prediction.20 Additionally, class balance is another critical factor for such models to achieve stable training and performance.21 As seen in Table 1, the dataset used in this work underrepresents subjects aged between 20 and 40 years and those over 60 years. To focus specifically on the effect of varying sample sizes across sites, a balanced dataset was composed by randomly selecting 140 subjects for training and 20 subjects for testing from each 10-year age bin. This reduced the initial dataset from 2025 to 960 subjects, with 840 subjects used for training and 120 for testing.

Table 1.

Data distribution

Age bin (years) | Number of subjects | Representation of the dataset (%) | Mean age (SD) | Median age
20–30 | 173 | 8 | 25.79 (2.41) | 26
30–40 | 317 | 15 | 35.83 (2.90) | 36
40–50 | 498 | 24 | 45.45 (2.80) | 45
50–60 | 495 | 24 | 55.41 (2.83) | 55
60–70 | 381 | 18 | 65.44 (2.84) | 65
70+ | 161 | 7 | 74.36 (2.95) | 74

We performed 10 Monte-Carlo cross-validation iterations to prevent selection-biased conclusions from being drawn from a single split into training and testing sets. The global training sets were further divided into subsets representing independent sites in the distributed learning network. This simulation scheme allowed us to vary the size of the simulated local datasets from large to very small (ie, 20, 10, 5, 2, and 1 samples per site).

The sample size directly determines the number of collaborating sites composing our distributed learning network. For instance, when simulating 20 subjects per site, the network consisted of 42 sites; when simulating 1 subject per site, it comprised 840 sites.
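A minimal Python sketch of this simulation setup follows; the function names (balanced_split, make_sites) and the use of NumPy are our assumptions for illustration, not the authors’ code.

```python
import numpy as np

def balanced_split(ages, n_train=140, n_test=20, seed=0):
    """Draw n_train training and n_test testing subjects from each
    10-year age bin (20-30, ..., 70+), yielding 840 train / 120 test."""
    rng = np.random.default_rng(seed)
    bins = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 200)]
    train_idx, test_idx = [], []
    for lo, hi in bins:
        members = np.flatnonzero((ages >= lo) & (ages < hi))
        picked = rng.choice(members, size=n_train + n_test, replace=False)
        train_idx.extend(picked[:n_train])
        test_idx.extend(picked[n_train:])
    return np.asarray(train_idx), np.asarray(test_idx)

def make_sites(train_idx, samples_per_site, seed=0):
    """Partition the 840 training subjects into equally sized sites:
    20 subjects/site -> 42 sites; 1 subject/site -> 840 sites."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(train_idx)
    return order.reshape(-1, samples_per_site)  # one row of indices per site
```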

Base machine learning model

Brain age prediction was conducted using a multilayer perceptron (MLP) operating on the morphological features extracted from the neuroimaging data. Four MLP architectures, with the following numbers of neurons in the hidden layers, were tested: (256); (256, 128); (256, 128, 64); and (256, 128, 64, 32). Architecture here refers to the number of layers and the number of neurons in each layer. The best-performing architecture was selected based on the lowest average mean absolute error (MAE) between chronological and predicted ages across the 10 Monte-Carlo cross-validation iterations of the central learning implementation. It consisted of 2 hidden layers with 256 and 128 neurons, respectively, and an output layer with 1 neuron. Hidden layers used ReLU activations, while the output layer used a linear activation. The hyperparameters were fine-tuned for each experiment, since the Adam optimizer internally uses momentum and the number of samples available during training to compute the gradients and update the weights; the optimal hyperparameters may therefore differ between setups (ie, central learning, FL, TM).
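A minimal sketch of the selected MLP, assuming a TensorFlow/Keras implementation (the framework is not named in the text); the training loss is also an assumption, as only the evaluation metric (MAE) is reported:

```python
import tensorflow as tf

def build_mlp(n_features=223, seed=42):
    """Selected architecture: 2 hidden layers (256, 128) with ReLU and a
    single linear output neuron for age regression. Weights are drawn from
    N(0, 0.1) and biases start at zero, as described in the Methods."""
    init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.1, seed=seed)
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(256, activation="relu",
                              kernel_initializer=init, bias_initializer="zeros"),
        tf.keras.layers.Dense(128, activation="relu",
                              kernel_initializer=init, bias_initializer="zeros"),
        tf.keras.layers.Dense(1, activation="linear",
                              kernel_initializer=init, bias_initializer="zeros"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss="mae")  # assumption: training loss not stated in the text
    return model
```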

The federated learning models

In the FL setup, every collaborating site receives a copy of a global model initialized at a central server. Local training at each site is performed for a predefined number of epochs. Upon completion of local training, the learned weights are sent back to the server. The server then applies the aggregation function, updates the global model, and sends the latest version back to the sites for retraining (Supplementary Figure S1). This process can be repeated for several rounds, with additional rounds expected to improve the global model’s performance.

The aggregation function is a critical element of an FL implementation. A weighted average is often assumed to improve performance21 by accounting for the distribution of data across sites. Since this work simulated sites contributing the same number of datasets, a simple average of the learnt weights was used.
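A minimal sketch of one FL round with simple-average aggregation, reusing the hypothetical build_mlp from above; this is our illustration of the described procedure, not the authors’ implementation:

```python
import numpy as np

def federated_round(global_weights, sites, build_model, epochs):
    """One FL round: every site trains a copy of the global model locally,
    then the server averages the learned weights (a simple average, since
    all simulated sites hold the same number of samples)."""
    local_weights = []
    for x, y in sites:
        model = build_model()
        model.set_weights(global_weights)        # start from the global model
        model.fit(x, y, epochs=epochs, verbose=0)
        local_weights.append(model.get_weights())
    # element-wise average across sites, layer by layer
    return [np.mean(layer, axis=0) for layer in zip(*local_weights)]
```

Repeating this function for the desired number of rounds (eg, 20 rounds of 10 local epochs each) reproduces the 200 total training epochs used in the experiments.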

The traveling models

In the TM setup, the traveling sequence, which defines the order in which each site in the network is visited, is crucial, as an unfavorable sequence can cause a phenomenon called catastrophic forgetting.14 Catastrophic forgetting occurs when the model forgets patterns learnt from data at sites early in the traveling sequence, favoring the data seen at sites towards the end of the sequence. Briefly, after the traveling sequence is determined, the server sends the initialized model to the first site of the sequence. After training is completed at that site, the model with the learnt weights travels to the next site, either directly or via the central server (Supplementary Figure S2). This process is repeated until the model reaches the last site of the sequence, finishing 1 training cycle. A training cycle in the TM corresponds to a round in FL. The traveling process can be repeated for multiple cycles to improve the model’s performance and reduce the risk of catastrophic forgetting.22
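The corresponding TM loop is even simpler, as no aggregation is needed; again, this is a hedged sketch rather than the authors’ code:

```python
def train_traveling_model(model, sites, cycles, epochs_per_site):
    """Sequentially train a single model: one pass over all sites in a
    fixed order is one cycle; repeating cycles can mitigate forgetting."""
    for _ in range(cycles):
        for x, y in sites:          # fixed traveling sequence
            model.fit(x, y, epochs=epochs_per_site, verbose=0)
    return model
```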

Baseline models and evaluation

All models (central model, FL model, and TM) were initialized with the same random weights, drawn with a mean of 0 and standard deviation of 0.1, and their biases were initialized to zero. The Adam optimizer with an initial learning rate of 0.01 was used, and all models were trained for a total of 200 epochs. After the first 10 epochs, an exponential learning rate decay of −0.1 was applied for every subsequent epoch.
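One plausible reading of this schedule, expressed as a Keras callback; interpreting the decay of −0.1 as a per-epoch factor of exp(−0.1) is our assumption:

```python
import math
import tensorflow as tf

def lr_schedule(epoch, lr):
    """Hold the initial learning rate of 0.01 for the first 10 epochs,
    then decay it exponentially for every subsequent epoch."""
    return lr if epoch < 10 else lr * math.exp(-0.1)

scheduler = tf.keras.callbacks.LearningRateScheduler(lr_schedule)
# model.fit(x_train, y_train, epochs=200, callbacks=[scheduler])
```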

To enable a fair comparison, the FL model was evaluated using 20, 40, and 200 rounds, where in each round each local model was trained for 10, 5, or 1 epoch(s), respectively. Thus, the total number of training epochs remained at 200 in all cases.

For the TM, sites were visited in the same order for all cycles; fixing the traveling sequence excludes any bias from varying travel orders. Furthermore, the same numbers of rounds (ie, cycles) and epochs per round were evaluated for the TM. For instance, when performing 20 cycles, the model was trained for 10 epochs per site, while for 40 cycles, it was trained for 5 epochs per site, always allowing the model to see the training data 200 times, the number of epochs the best-performing central learning model was trained with.

In addition to comparing the distributed learning models described above with the central learning model, we compared them with a naive model that always predicts the average age of the training set for every subject in the testing set. This comparison aimed to evaluate whether training led to measurable improvements in the models’ performance. Supplementary Table S1 shows more details.

In accordance with the brain age prediction literature,23 all models were quantitatively evaluated using the MAE, computed as the mean absolute difference between the predicted brain age and the known chronological age. Consequently, MAE values closer to zero indicate better prediction performance.
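Formally, for a test set of size $N$ (here, 120 subjects):

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{a}_i - a_i\right|$$

where $\hat{a}_i$ is the predicted brain age and $a_i$ the chronological age of subject $i$.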

RESULTS

Baseline models

Table 2 summarizes the performance of the baseline models. The central learning model achieved the lowest MAE of 5.99 years, which is in the range of previously published models using similar tabulated data.23 Figure 2 shows both baseline models’ predicted versus chronological ages and their corresponding regression lines. The naive model predicted an age of around 50 years for every subject. In contrast, the central learning model predicted most ages well, as can be seen by the regression line nearly matching the optimal diagonal for most ages. However, the central model slightly overpredicted the ages of subjects under 30 years and underpredicted those of subjects above 70 years. This phenomenon, known as the regression to the mean problem, has previously been observed when evaluating brain age estimation models.24

Table 2.

Summary of baseline models’ performance

Model | MAE (years)
Central learning | 5.99 ± 0.39
Naive | 14.73 ± 0.29

Figure 2. Pink represents the predicted versus chronological ages and the regression line for the central learning model, while blue represents the naive model. The black dashed line represents the optimal diagonal.

Distributed learning models

Figure 3 shows the MAE of the FL models and TMs for different sample sizes at each collaborating site and different numbers of rounds (ie, cycles for TMs). While the performance metrics presented are averages over the 10 Monte-Carlo cross-validation iterations, the scatter plots show the models’ age predictions versus the true ages for all 10 iterations. Supplementary Table S2 shows more details.

Figure 3. Comparison of the federated learning model (FL) and the traveling model (TM) for different sample sizes per contributing site and numbers of rounds. The number of epochs per site was controlled depending on the number of rounds, so each model was allowed to see the whole training dataset 200 times.

Effect of sample size per collaborating sites

Figure 3 shows that the FL models’ performance decreased as the sample size per site was reduced, independent of the number of times the FL models were aggregated. For instance, for both 20 and 40 rounds, the worst performance (MAE of ∼18.9 years) was found in the experiment simulating only a single subject per contributing site. This performance was worse than that of the naive model (MAE of 14.7 years), indicating that the models could not learn meaningful patterns. In contrast, the performance of the TM was not considerably affected when trained with small sample sizes (ie, MAE of 6.78 years for 20 rounds and 6.06 years for 40 rounds with a single subject per contributing site). The TM consistently achieved results comparable to central learning (MAE of 5.99 years) throughout the experiments.

Effect of number of epochs per round

The quantitative results also showed that the ability of the distributed learning models to learn meaningful relationships from the data was affected by the number of epochs the models were trained for per round/cycle. Figure 3A–C shows how the performance of the FL models degraded as the number of epochs before each aggregation step decreased: a higher error was observed when training for fewer epochs per round. In contrast, the performance of the TM even improved when the model was trained at each site for fewer epochs per cycle. Although this effect is less pronounced, it was statistically significant (2-tailed paired t-test, P < .009).

Figure 4 shows the chronological and predicted ages for the experiments in which each site provided 2 or only 1 subject(s) during training (Supplementary Figure S3 shows graphs for every experiment). For 2 subjects per site, Figure 4A and B demonstrates that FL models trained for 10 and 5 epochs per round only achieved good predictions for subjects aged between 40 and 50 years, while the TMs achieved better predictions overall, performing only slightly worse for subjects below 30 and over 70 years. For 1 subject per site, the performance of the traveling and FL models differed substantially. The FL models’ regression lines pointing in the opposite direction indicate that these models learnt wrong relationships from the data, independent of the number of epochs trained. In contrast, the TM achieved results similar to those of the 2-subjects-per-site experiments, independent of the number of epochs trained. In that case, the model reached a high performance level, predicting the subjects’ ages close to the optimal diagonal, as shown in Figure 4C.

Figure 4. Chronological versus predicted age and the corresponding regression lines for experiments where each site provided 2 or 1 subject(s) during training. FL: federated learning model; TM: traveling model.

Figure 5. (A) Performance of the traveling model trained for 1 epoch per round for different numbers of rounds. (B) Predicted ages and regression lines for models trained for 25 and 200 rounds.

Effect of number of rounds

Generally, Figure 3 shows that the FL models achieved better performance when trained for multiple epochs per round, while the TMs achieved better results with a single epoch per cycle. When training the models for a single epoch, 200 rounds/cycles were performed. This finding could indicate that more cycles are necessary to achieve better performance with TMs. However, Figure 5A demonstrates that the TMs performed similarly to central learning even when the number of times the model viewed the whole training set was substantially reduced, to as few as 25 cycles, for each site-level sample size examined (2-tailed paired t-test, P < .0028). Figure 5B shows the predicted ages and regression lines for the TMs trained for 25 and 200 rounds/cycles; the overlapping regression lines demonstrate that TMs can learn with fewer cycles when trained for a single epoch per site.

DISCUSSION

The main finding of this work is that the TM is a promising implementation when only very small sample sizes are available for machine learning training. In extreme cases where each site contributes only a single dataset, the general FL approach is not well suited, while the TM leads to considerably better results. This is essential to allow sites that can only provide limited datasets to still contribute to training powerful machine learning models. Practical scenarios where sites may hold limited data include (1) rare diseases, due to their low prevalence,25 (2) small hospitals, and (3) pediatric cases with considerable developmental differences.26

Although FL is the most popular implementation, our results show that it is ineffective when dealing with very small sample sizes and/or few epochs prior to aggregation. The poor results for FL models when only a single sample was available and/or fewer epochs were performed during training are likely caused by the heavily underdetermined regression problem associated with the brain age prediction task. Even for the simplest linear model, the correct orientation of a regression line cannot be determined from a single data point. Thus, the regression computed from only a single subject is very unstable and heavily influenced by the training scheme, such as the random initialization of the model’s weights. A similar effect was observed by Li et al,21 who noticed that when the distribution of labels in a local training set is limited to a single class, machine learning models are more prone to overfitting in the local training stage and classification accuracy drops to chance level. However, it should be noted that Li et al21 investigated a classification problem rather than a regression problem. In contrast, the lack of data and/or epochs does not affect the TM, as a single model sequentially sees all datasets. As a result, the TMs outperformed FL models in every experiment in our study, especially when a single sample was available and/or fewer epochs were performed at each site. The performance of FL improves as the amount of data and/or the number of epochs per site increases.

Our findings demonstrate that FL models are sensitive to the number of local epochs, which is in line with the findings reported by Li et al.21 Our results suggest that training for multiple epochs per round leads to poor generalization for younger subjects when only 2 or fewer subjects are available per site, possibly due to overfitting. More precisely, overfitting to the lower age extreme occurs because healthy participants between 20 and 30 years exhibit rather minor atrophy, making it more challenging for machine learning models to predict the correct ages; it is therefore likely that the model simply predicted an age of around 20 for many younger subjects who do not exhibit any considerable atrophy. Furthermore, training for a single epoch per site proved even more difficult for FL models, while TMs performed well in that case. Because our experiments allowed the TMs to train for the same total number of epochs as the central learning model, training the TM for a single epoch per site implied 200 cycles. Thus, the good TM performance in this extreme case might suggest that many cycles are necessary, which would increase communication costs. However, we demonstrated that this is not the case, as the TM can perform comparably to central learning while training for substantially fewer cycles. This ability of TMs to converge with fewer cycles at a single epoch per cycle is promising, as it reduces communication costs.

In contrast to a previous study that simulated up to 32 sites,27 this work found that the TM can learn and generalize successfully to unseen data even when sites have only 1 sample available, and that TM performance did not degrade as the number of sites increased. Aside from considerable differences in the overall sample size and the number of simulated sites, the technical details provided in the other study do not allow us to identify relevant technical differences from our setup that may have caused this discrepancy.

A study that simulated a TM using mini-batches for classification tasks argued that fine-tuning the network is vital to prevent overfitting.28 However, fine-tuning requires a validation set, which is very challenging to obtain when each site only provides a very small dataset. Nevertheless, this work found that it is possible to achieve results in line with the brain age prediction literature23 without relying on a validation set when dealing with a balanced training dataset.

There are a few limitations of this work that should be highlighted. First, our work was limited to the brain age prediction application because of the large available dataset, which allowed us to decipher the effect of small sample sizes. Even though brain age prediction is not a rare disease or pediatric application, we believe this work’s findings are translatable to other problems with limited data. Second, the sample used in this study does not account for the variability encountered when dealing with multisite data, as it was collected at a single site and was identically and uniformly distributed. Although this is a limitation, it also eliminates biases: our results only reflect the effect of smaller sample sizes at each site and not site-level variability in the distribution of classes or features, or site-level effects on the relationship between the two. Third, our study implemented only the simple weight-average strategy as the FL aggregation function; more advanced strategies could be explored. Finally, our work focused on regression problems, which are typically harder to solve, especially when considering distributed learning with small local sample sizes. This may explain why there is so little previous work and constitutes one of the main contributions of this study. While classification problems are out of the scope of this work, we will investigate them in the future.

CONCLUSION

This work explored the applicability of distributed learning for extremely small sample sizes in medical image analysis. To the best of our knowledge, this is the first comparison of traveling and FL models for a regression task. Our results demonstrate that the TM outperforms the FL model independent of the sample size available per site. Additionally, the TM achieved results similar to central learning even when trained for a single epoch per site. Thus, the TM might be the best implementation for distributed learning in the case of small local datasets, providing a new opportunity to apply machine learning models in rare diseases, small hospitals, and pediatric research.

FUNDING

This work was supported by the Canada Research Chairs program, the River Fund at Calgary Foundation, the Canadian Institutes of Health Research (CIHR), and the Hotchkiss Brain Institute.

AUTHOR CONTRIBUTIONS

RS, MW, and NDF contributed to the study’s conception. SL contributed to data acquisition, and PM and SL contributed to data curation. RS, PM, and MW analyzed the results. RS wrote the first draft of the manuscript. All authors critically revised the previous versions of the manuscript and approved the final manuscript.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

CONFLICT OF INTEREST STATEMENT

None declared.

Supplementary Material

ocac204_Supplementary_Data

Contributor Information

Raissa Souza, Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada; Biomedical Engineering Graduate Program, University of Calgary, Calgary, Alberta, Canada.

Pauline Mouches, Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada; Biomedical Engineering Graduate Program, University of Calgary, Calgary, Alberta, Canada.

Matthias Wilms, Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada; Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada.

Anup Tuladhar, Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada; Biomedical Engineering Graduate Program, University of Calgary, Calgary, Alberta, Canada.

Sönke Langner, Institute for Diagnostic Radiology and Neuroradiology, Rostock University Medical Center, Rostock, Germany.

Nils D Forkert, Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada; Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada; Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.

Data Availability

The data that support the findings of this study are available from the Study of Health in Pomerania study upon request (https://www2.medizin.uni-greifswald.de/cm/fv/ship/).

REFERENCES

1. Maceachern SJ, Forkert ND. Machine learning for precision medicine. Genome 2021; 64 (4): 416–25.
2. Lo Vercio L, Amador K, Bannister JJ, et al. Supervised machine learning tools: a tutorial for clinicians. J Neural Eng 2020; 17 (6): 062001.
3. Sudlow C, Gallacher J, Allen N, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 2015; 12 (3): e1001779.
4. Jack CR, Bernstein MA, Fox NC, et al. The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods. J Magn Reson Imaging 2008; 27 (4): 685–91.
5. Tuladhar A, Gill S, Ismail Z, et al.; Alzheimer’s Disease Neuroimaging Initiative. Building machine learning models without sharing patient data: a simulation-based analysis of distributed learning by ensembling. J Biomed Inform 2020; 106: 103424.
6. Ng D, Lan X, Yao MMS, et al. Federated learning: a collaborative effort to achieve better medical imaging models for individual sites that have small labelled datasets. Quant Imaging Med Surg 2021; 11 (2): 852–7.
7. Kaissis GA, Makowski MR, Rückert D, et al. Secure, privacy-preserving and federated machine learning in medical imaging. Nat Mach Intell 2020; 2 (6): 305–11.
8. Basodi S, Raja R, Ray B, et al. Federation of brain age estimation in structural neuroimaging data. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2021: 3854–7. doi:10.1109/EMBC46164.2021.9629865.
9. Adnan M, Kalra S, Cresswell JC, et al. Federated learning and differential privacy for medical image analysis. Sci Rep 2022; 12 (1): 1953.
10. Li X, Gu Y, Dvornek N, et al. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. Med Image Anal 2020; 65: 101765.
11. Silva S, Gutman BA, Romero E, et al. Federated learning in distributed medical databases: meta-analysis of large-scale subcortical brain data. In: Proceedings – International Symposium on Biomedical Imaging (ISBI); 2019: 270–4. doi:10.1109/ISBI.2019.8759317.
12. Chang K, Balachandar N, Lam C, et al. Distributed deep learning networks among institutions for medical imaging. J Am Med Inform Assoc 2018; 25 (8): 945–54.
13. Souza R, Aulakh A, Mouches P, et al. A comparative analysis of the impact of data distribution on distributed learning with a traveling model for brain age prediction. In: Park BJ, Deserno TM, eds. Medical Imaging 2022: Imaging Informatics for Healthcare, Research, and Applications. Vol. 12037. SPIE; 2022. doi:10.1117/12.2612728.
14. Sheller MJ, Edwards B, Reina GA, et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci Rep 2020; 10 (1): 1–12.
15. Wilms M, Bannister JJ, Mouches P, et al. Invertible modeling of bidirectional relationships in neuroimaging with normalizing flows: application to brain aging. IEEE Trans Med Imaging 2022; 41 (9): 2331–47.
16. Mouches P, Wilms M, Rajashekar D, et al. Multimodal biological brain age prediction using magnetic resonance imaging and angiography with the identification of predictive regions. Hum Brain Mapp 2022; 43 (8): 2554–66.
17. Baecker L, Garcia-Dias R, Vieira S, et al. Machine learning for brain age prediction: introduction to methods and clinical applications. eBioMedicine 2021; 72: 103600.
18. Völzke H, Alte D, Schmidt CO, et al. Cohort profile: the Study of Health in Pomerania. Int J Epidemiol 2011; 40 (2): 294–307.
19. Henschel L, Conjeti S, Estrada S, et al. FastSurfer – a fast and accurate deep learning based neuroimaging pipeline. Neuroimage 2020; 219: 117012.
20. de Lange AG, Anatürk M, Rokicki J, et al. Mind the gap: performance metric evaluation in brain-age prediction. Hum Brain Mapp 2022; 43 (10): 3113–29.
21. Li Q, Diao Y, Chen Q, et al. Federated learning on non-IID data silos: an experimental study. Published online February 3, 2021. https://arxiv.org/abs/2102.02079v4. Accessed December 13, 2021.
22. Souza R, Tuladhar A, Mouches P, et al. Multi-institutional travelling model for tumor segmentation in MRI datasets. In: Crimi A, Bakas S, eds. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2021. Lecture Notes in Computer Science. Cham: Springer; 2022: 420–32.
23. Nam Y, Jang J, Lee HY, et al. Estimating age-related changes in in vivo cerebral magnetic resonance angiography using convolutional neural network. Neurobiol Aging 2020; 87: 125–31.
24. Liang H, Zhang F, Niu X. Investigating systematic bias in brain age estimation with application to post-traumatic stress disorders. Hum Brain Mapp 2019; 40 (11): 3143–52.
25. Taruscio D, Vittozzi L, Rocchetti A, et al. The occurrence of 275 rare diseases and 47 rare disease groups in Italy. Results from the National Registry of Rare Diseases. Int J Environ Res Public Health 2018; 15 (7): 1470.
26. Rahimzadeh V, Schickhardt C, Knoppers BM, et al. Key implications of data sharing in pediatric genomics. JAMA Pediatr 2018; 172 (5): 476–81.
27. Sheller M, Reina G, Edwards B, et al. Multi-institutional deep learning modeling without sharing patient data: a feasibility study on brain tumor segmentation. Lect Notes Comput Sci 2018; 11383: 92–104.
28. Zerka F, Urovi V, Bottari F, et al. Privacy preserving distributed learning classifiers – sequential learning with small sets of data. Comput Biol Med 2021; 136: 104716.
