Site effects how-to and when: An overview of retrospective techniques to accommodate site effects in multi-site neuroimaging analyses

2022 Oct 31;13:923988. doi: 10.3389/fneur.2022.923988

Sample size	Empirical Bayes can improve the estimation and removal of site effects even in small datasets, provided they include at least 20–30 subjects (32, 76). Samples with fewer than 20 subjects, however, might overstrain the algorithm and lead to unreliable priors and hyperparameters. The problem might be aggravated when added covariates further compartmentalize the data.
Dimensionality of features	Generally, the computational burden of ComBat is low. Feature sets ranging from dozens or hundreds of extracted features up to voxel-wise measures can be passed to a ComBat implementation. For some implementations, 3D files need to be rewritten as [N,1] vectors. ComBat cannot be run on a single feature (see above).
Balanced sample sizes	As pointed out above, a larger total sample size seems to outweigh the degree of balance between sites, which hints toward a rather inclusive strategy.
Distribution of covariates	As site effects and covariate effects across all sites compete with each other in the ComBat model, it is recommended that the distributions of covariates are not disjoint but overlap between sites (11, 33).
Separate handling of different types of features	It may be problematic to combine subsets of features with different ranges and units (for example, cortical thickness [range 1–5 mm] and subcortical volumes [20–100 mm³]) in one dataset for ComBat. This may disturb the standardization step, which is based on the variance pooled across sites and all features. It is thus recommended to harmonize these distinct feature subsets separately, which also preserves the interpretability of the position and units of the ComBat-adjusted values.
Expected non-linear covariate effects	ComBat-GAM might be considered the most flexible tool when the shape of the non-linearity is entirely unknown. For life-span studies, its primary validation paper (11) can thus serve as a guideline. For other studies, ComBat with a set of pre-defined non-linear extensions might be comparably suitable.
Additional harmonization of covariance	This newer extension is worthy of consideration under certain pre-conditions (Chen et al.; 11): (1) data exploration demonstrates that covariance actually differs between sites (scanners), (2) sample sizes are sufficiently large to provide reliable estimates of the covariance, and (3) results are compared to conservative standard ComBat to understand the impact of the additional step.
ComBat in scenarios without full access to subject-level features	Here, distributed ComBat might serve as a workaround to recover power that is lost in classical random-effects (RE) meta-analysis.
Model transfer to unseen cases of known/unknown sites	Unseen cases from sites known to the model can be corrected by ComBat, ComBat-GAM or CovBat. The transfer of the corrective model to data from unseen sites is not directly supported by the ComBat family of methods; it requires additional adaptations (e.g., Neuroharmony (37)). Also see the discussion.
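
Several of the recommendations above can be checked before any harmonization is run. The following is a minimal sketch in Python with NumPy, not part of any ComBat package: the function names (`flatten_volumes`, `small_sites`, `covariate_overlap`) are hypothetical, the threshold of 20 subjects is taken from the "Sample size" row, and the overlap test is a crude range comparison that stands in for a proper inspection of the covariate distributions.

```python
import numpy as np

# Assumption: lower bound taken from the "Sample size" recommendation above.
MIN_SUBJECTS_PER_SITE = 20


def flatten_volumes(volumes):
    """Rewrite each 3D volume as a vector and stack them into a
    (n_voxels, n_subjects) matrix, one column per subject."""
    return np.column_stack([np.asarray(v).reshape(-1) for v in volumes])


def small_sites(site_labels, min_n=MIN_SUBJECTS_PER_SITE):
    """Return {site: n_subjects} for every site with fewer than min_n subjects."""
    sites, counts = np.unique(np.asarray(site_labels), return_counts=True)
    return {s: int(n) for s, n in zip(sites.tolist(), counts.tolist()) if n < min_n}


def covariate_overlap(covariate, site_labels):
    """Return the covariate interval covered by all sites as (low, high),
    or None if the per-site ranges are disjoint."""
    covariate = np.asarray(covariate, dtype=float)
    site_labels = np.asarray(site_labels)
    sites = np.unique(site_labels)
    lows = [covariate[site_labels == s].min() for s in sites]
    highs = [covariate[site_labels == s].max() for s in sites]
    low, high = float(max(lows)), float(min(highs))
    return (low, high) if low <= high else None


# Illustrative, synthetic data only: 45 subjects from two hypothetical sites.
rng = np.random.default_rng(0)
volumes = [rng.normal(size=(4, 4, 3)) for _ in range(45)]
site = np.array(["A"] * 30 + ["B"] * 15)
age = np.concatenate([rng.uniform(20, 60, 30), rng.uniform(50, 80, 15)])

features = flatten_volumes(volumes)        # shape (48, 45)
print(features.shape)
print(small_sites(site))                   # sites below the threshold
print(covariate_overlap(age, site))        # shared age range, or None
```

A site flagged by `small_sites` or a `None` result from `covariate_overlap` does not by itself rule out harmonization; it indicates that the corresponding recommendation in the table deserves a closer look for the dataset at hand.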