Author manuscript; available in PMC: 2022 Nov 14.
Published in final edited form as: Neuroimage. 2022 Jul 30;261:119509. doi: 10.1016/j.neuroimage.2022.119509

A comparison of methods to harmonize cortical thickness measurements across scanners and sites

Delin Sun 1,2,66, Gopalkumar Rakesh 1,2, Courtney C Haswell 1,2, Mark Logue 3,4,5,6, C Lexi Baird 1,2, Erin N O’Leary 7, Andrew S Cotton 7, Hong Xie 7, Marijo Tamburrino 7, Tian Chen 7,8, Emily L Dennis 8,9,10,11, Neda Jahanshad 9, Lauren E Salminen 9, Sophia I Thomopoulos 9, Faisal Rashid 9, Christopher RK Ching 9, Saskia BJ Koch 12,13, Jessie L Frijling 12, Laura Nawijn 12,14, Mirjam van Zuiden 12, Xi Zhu 15,16, Benjamin Suarez-Jimenez 80,15,16, Anika Sierk 17, Henrik Walter 17, Antje Manthey 17, Jennifer S Stevens 18, Negar Fani 18, Sanne JH van Rooij 18, Murray Stein 19, Jessica Bomyea 19, Inga K Koerte 8,20, Kyle Choi 21, Steven JA van der Werff 22,23, Robert RJM Vermeiren 22, Julia Herzog 24, Lauren AM Lebois 25,26, Justin T Baker 27, Elizabeth A Olson 25,28, Thomas Straube 29, Mayuresh S Korgaonkar 30, Elpiniki Andrew 31, Ye Zhu 32,33, Gen Li 32,33, Jonathan Ipser 34, Anna R Hudson 35, Matthew Peverill 36, Kelly Sambrook 37, Evan Gordon 79, Lee Baugh 41,42,43, Gina Forster 41,42,44, Raluca M Simons 42,45, Jeffrey S Simons 43,45, Vincent Magnotta 46, Adi Maron-Katz 47, Stefan du Plessis 48, Seth G Disner 49,50, Nicholas Davenport 49,50, Daniel W Grupe 51, Jack B Nitschke 52, Terri A deRoon-Cassini 53, Jacklynn M Fitzgerald 54, John H Krystal 55,56, Ifat Levy 55,56, Miranda Olff 12,57, Dick J Veltman 58, Li Wang 32,33, Yuval Neria 15,16, Michael D De Bellis 59, Tanja Jovanovic 60, Judith K Daniels 61, Martha Shenton 8,62, Nic JA van de Wee 22,23, Christian Schmahl 24, Milissa L Kaufman 25,63, Isabelle M Rosso 25,28, Scott R Sponheim 49,50, David Bernd Hofmann 29, Richard A Bryant 64, Kelene A Fercho 41,42,43,65, Dan J Stein 34, Sven C Mueller 35, Bobak Hosseini 67, K Luan Phan 67,68, Katie A McLaughlin 69, Richard J Davidson 51,52,70, Christine L Larson 71, Geoffrey May 38,39,40,72, Steven M Nelson 38,39,40,72, Chadi G Abdallah 55,56, Hassaan Gomaa 73, Amit Etkin 47,74, Soraya Seedat 48, Ilan Harpaz-Rotem 55,56, Israel Liberzon 75, Theo GM van Erp 76,77, Yann 
Quidé 81,82, Xin Wang 78, Paul M Thompson 9, Rajendra A Morey 1,2,*
PMCID: PMC9648725  NIHMSID: NIHMS1829637  PMID: 35917919

Abstract

Results of neuroimaging datasets aggregated from multiple sites may be biased by site-specific profiles in participants’ demographic and clinical characteristics, as well as MRI acquisition protocols and scanning platforms. We compared the impact of four different harmonization methods on results obtained from analyses of cortical thickness data: (1) linear mixed-effects model (LME) that models site-specific random intercepts (LMEINT), (2) LME that models both site-specific random intercepts and age-related random slopes (LMEINT+SLP), (3) ComBat, and (4) ComBat with a generalized additive model (ComBat-GAM). Our test case for comparing harmonization methods was cortical thickness data aggregated from 29 sites, which included 1,340 cases with posttraumatic stress disorder (PTSD) (6.2–81.8 years old) and 2,057 trauma-exposed controls without PTSD (6.3–85.2 years old). We found that, compared to the other data harmonization methods, data processed with ComBat-GAM was more sensitive to the detection of significant case-control differences (χ²(3) = 63.704, p < 0.001) as well as case-control differences in age-related cortical thinning (χ²(3) = 12.082, p = 0.007). Both ComBat and ComBat-GAM outperformed LME methods in detecting sex differences (χ²(3) = 9.114, p = 0.028) in regional cortical thickness. ComBat-GAM also led to stronger estimates of age-related declines in cortical thickness (corrected p-values < 0.001), stronger estimates of case-related cortical thickness reduction (corrected p-values < 0.001), weaker estimates of age-related declines in cortical thickness in cases than controls (corrected p-values < 0.001), stronger estimates of cortical thickness reduction in females than males (corrected p-values < 0.001), and stronger estimates of cortical thickness reduction in females relative to males in cases than controls (corrected p-values < 0.001). Our results support the use of ComBat-GAM to minimize confounds and increase statistical power when harmonizing data with non-linear effects, and the use of either ComBat or ComBat-GAM for harmonizing data with linear effects.

Keywords: Data Harmonization, Scanner Effects, Site Effects, Cortical Thickness, ComBat, ComBat-GAM, Linear Mixed-Effects Model, General Additive Model, PTSD

1. Introduction

Large consortia, such as Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) (Thompson et al., 2020), Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) (Hofer et al., 2020), and others have aggregated neuroimaging data acquired on many different scanners and recruited subjects at many different sites to conduct meta- and mega-analyses. By applying standardized analysis pipelines to extremely large datasets of thousands or tens of thousands of samples, consortia improve reliability, enhance reproducibility of results, amass sufficient statistical power to detect relatively small effect sizes, and support the ability to divide samples while retaining the power to delineate subsample (e.g., male vs female or young vs old) and interaction effects. The diverse ethnic, racial, geographic, and clinical demography of consortium data has provided results that are more representative of the wider population while also permitting exploration of clinical and neurobiological subtypes of neuropsychiatric disorders (Dennis et al., 2022; Thompson et al., 2020). Neuroimaging results generated by consortia are more robust and reproducible than studies that are generated by a single laboratory (Koshiyama et al., 2022), provided that consortia apply uniform methods to data originating from multiple sites and scanners.

However, several challenges are posed by the analysis of consortium data. A major concern of consortium-generated results is bias introduced by site-specific acquisition protocols and MRI scanners that may interact with site-specific demographic and clinical profiles (Radua et al., 2020). The challenge of post hoc combination of datasets stems partly from a lack of a priori harmonization of MRI acquisition sequences. Prospective consortia such as NCANDA (Brown et al., 2015), ABCD (Volkow et al., 2018), TRACK-TBI (Hicks et al., 2013), and others have prescribed harmonized acquisition parameters at study outset with the expectation of superior performance and of obviating the need for post-acquisition harmonization. However, even prospective standardization and prescription of acquisition parameters still result in significant variance attributed to sites for relatively short scan durations (e.g., 5 min), which can be reduced significantly by increasing scan duration (e.g., 25 min) (Noble et al., 2017). It remains unclear whether further post hoc harmonization of these datasets may improve sensitivity and power of analyses.

Various methods to harmonize neuroimaging data across sites are gaining acceptance and will become commonplace. However, there is little empirical evidence to support the use of a single method due to the lack of formal comparisons of available methods. In this study, we compared four harmonization methods. First, we tested linear mixed-effects modeling (LME), also known as the mixed-effects mega-analysis (ME-Mega) (Radua et al., 2020), with site as a random intercept (LMEINT) to model the intercept location effects of site on brain measures. Second, we tested LME with both random intercept and age-related random slope for the site covariate (LMEINT+SLP). Third, we used ComBat, a method originally developed to minimize batch effects present in data originating from multiple gene arrays (Johnson et al., 2007), and later adapted for neuroimaging data. ComBat is designed to remove site-associated differences while preserving variation due to biologically relevant variables such as age, sex, and diagnosis (Fortin et al., 2018). ComBat has been widely used to harmonize neuroimaging data including cortical thickness (Fortin et al., 2018), surface area, subcortical volumes (Radua et al., 2020), diffusion tensor imaging (Fortin et al., 2017; Hatton et al., 2020), and resting-state functional connectivity (Yu et al., 2018). Radua et al. (2020) reported that ComBat and LMEINT produced similar results when harmonizing cortical thickness, surface area, and subcortical volumes, while ComBat harmonization led to slightly higher statistical significance when performing between-group comparisons, in a multisite imaging study of schizophrenia. The fourth method, by Pomponio et al. (2020), improves on ComBat by modeling non-linear effects of age with a generalized additive model (GAM). ComBat-GAM allows for varied distributions of scale (multiplicative, or variance) and location (additive, or mean) effects, respectively.

ComBat-GAM was designed to capture age-related non-linearities across the lifespan by fitting a GAM with a penalized nonlinear term. Pomponio et al. (2020) examined cortical and subcortical gray matter volumes without harmonization, harmonized by ComBat, and harmonized by ComBat-GAM in a large sample of 10,477 healthy subjects aggregated from 18 sites who ranged in age from 3 to 96 years. They reported that gray matter volumes harmonized by ComBat-GAM achieved the best performance in an age prediction task that minimized the difference between actual age and predicted age. They also found that ComBat-GAM, compared to other approaches, consistently led to improved prediction accuracy for each dataset in a leave-one-site-out validation experiment. However, Pomponio et al. (2020) only investigated data from healthy participants, which did not involve case-control comparisons, nor formal comparisons to LME methods.

Consequently, the goals of the present study were to investigate (1) the performance of ComBat-GAM for comparing clinical cases to controls, (2) how performance is influenced by age, and (3) how well performance characteristics compare to LMEINT, LMEINT+SLP, and ComBat. Although the random-effects meta-analysis (RE-Meta) has been widely used by ENIGMA projects (Zugman et al., 2022), we did not include RE-Meta in this study because several studies showed that LME and ComBat produce results with greater statistical power than RE-Meta (Boedhoe et al., 2017; Favre et al., 2019; Radua et al., 2020; van Rooij et al., 2018). The increase in power is based on the premise that the site effect being removed represents random noise, and its removal leads to larger effect sizes and greater efficiency requiring fewer subjects to reject the null hypothesis at a pre-specified power.

An important caveat is that performance was measured by the number of brain regions with significant case-control differences. We recognize that neither the method with the greatest number of regions reaching significance nor the method that maximizes the magnitude (absolute value) of regression coefficients reflects the true underlying cortical thickness, the so-called ground truth. However, harmonization can move the values further from the ground truth and still be useful. The main aim of harmonization is to make uncalibrated measurements more comparable to each other. Measurable differences between cases and controls may be masked by scanner bias, and effective harmonization should increase the difference between the distributions of cases and controls. Therefore, it is advisable to count the number of regions that are statistically significant after implementing harmonization. Nonetheless, there is a risk that harmonization may introduce variability that was not present in the original data.

Data aggregated from 29 sites served as our test case for comparing harmonization methods. Subjects’ data was grouped into cases with PTSD (N = 1340) and trauma-exposed controls without PTSD (N = 2057). PTSD is associated with anatomical and functional alterations in widely distributed regions of the brain (Dennis et al., 2022; Logue et al., 2018; Wang et al., 2021). Military service members with PTSD and comorbid mild traumatic brain injury (mTBI) experience faster age-associated decline in cortical thickness than controls (Santhanam et al., 2019; Savjani et al., 2017). We hypothesized significant case-control differences in cortical thickness and age-related cortical thinning would be detectable in more brain regions by utilizing ComBat-GAM relative to LMEINT, LMEINT+SLP, and ComBat.

2. Methods

2.1. Participants

Data were obtained for secondary analysis from the ENIGMA-PGC PTSD Working Group. The dataset originated from 29 sites located on five continents (PTSD, N = 1340; Trauma-Exposed Controls, N = 2057) from a broad age group (6.2–85.2 years old). Three sites were the source of all children and adolescents (Duke De Bellis 9.9 ± 2.5; Leiden University 16.0 ± 1.9; University of Washington 13.2 ± 2.9) and one site was the source of older participants (ADNI-DoD 67.9 ± 3.6), with minimal overlap between the 3 sites with participants under 20 years and sites with participants over 20 years. Only one site contributed both children (Duke University-De Bellis) and adults (Duke University-Morey). Demographic information is summarized in Table 1. Clinical measures and assessment of PTSD are explained in the Supplementary Table S1. The scanner information is listed in Supplementary Table S2. All study sites obtained approval from local institutional review boards or ethics committees. All participants provided written informed consent. Data is available upon request from the corresponding author.

Table 1.

Demographic information per study site.

| Site | Control N | Control Female:Male | Control Age (yrs, mean ± SD) | PTSD N | PTSD Female:Male | PTSD Age (yrs, mean ± SD) |
|---|---|---|---|---|---|---|
| ADNI-DoD (a) | 105 | 1:104 | 70.0 ± 5.2 | 80 | 0:80 | 67.9 ± 3.6 |
| Amsterdam Medical Center | 37 | 18:19 | 39.6 ± 10.0 | 38 | 17:21 | 40.4 ± 9.9 |
| Columbia University | 35 | 23:12 | 35.2 ± 10.6 | 53 | 34:19 | 36.3 ± 9.3 |
| Duke University-De Bellis | 86 | 47:39 | 10.5 ± 2.6 | 29 | 15:14 | 9.9 ± 2.5 |
| Duke University-Morey | 270 | 59:211 | 39.7 ± 10.1 | 114 | 16:98 | 40.7 ± 9.9 |
| Ghent University | 59 | 59:0 | 37.7 ± 12.3 | 8 | 8:0 | 32.6 ± 10.3 |
| University of Groningen | - | - | - | 40 | 40:0 | 38.2 ± 9.7 |
| U.W. Madison-Grupe | 38 | 1:37 | 30.7 ± 6.6 | 19 | 3:16 | 30.4 ± 6.2 |
| Emory University-GTP | 108 | 103:5 | 40.8 ± 12.2 | 66 | 66:0 | 37.0 ± 12.3 |
| INTRuST (b) | 254 | 121:133 | 34.8 ± 13.0 | 104 | 23:81 | 38.6 ± 10.6 |
| U.W. Milwaukee-Larson | 45 | 23:22 | 35.5 ± 11.4 | 19 | 10:9 | 29.2 ± 8.6 |
| Leiden University | 30 | 26:4 | 14.7 ± 1.6 | 22 | 19:3 | 16.0 ± 1.9 |
| University of Mannheim | - | - | - | 48 | 48:0 | 35.9 ± 11.8 |
| Harvard University-McLean | 13 | 13:0 | 35.6 ± 10.5 | 39 | 39:0 | 38.2 ± 12.9 |
| Minneapolis V.A.-Disner | 95 | 6:89 | 33.2 ± 8.6 | 74 | 2:72 | 32.0 ± 7.6 |
| University of Münster | 26 | 21:5 | 26.5 ± 7.4 | 21 | 21:0 | 27.4 ± 7.0 |
| University of Illinois-Chicago | 20 | 0:20 | 34.0 ± 8.9 | 23 | 0:23 | 31.3 ± 9.3 |
| Harvard University-Rosso | 85 | 44:41 | 33.5 ± 9.3 | 20 | 13:7 | 35.3 ± 7.9 |
| University of South Dakota | 44 | 7:37 | 29.9 ± 6.9 | 78 | 17:61 | 28.8 ± 7.1 |
| Stanford University | 1 | 0:1 | 61.0 ± 0 | 68 | 40:28 | 36.9 ± 10.3 |
| Stellenbosch University | 138 | 100:38 | 42.9 ± 14.3 | 120 | 87:33 | 39.4 ± 11.0 |
| University of Toledo | 61 | 27:34 | 34.3 ± 11.6 | 15 | 7:8 | 40.9 ± 9.5 |
| UCAS-Beijing | 36 | 17:19 | 48.2 ± 6.8 | 34 | 21:13 | 51.0 ± 6.7 |
| University of Cape Town | 55 | 55:0 | 28.7 ± 6.4 | 7 | 7:0 | 30.5 ± 7.2 |
| University of Sydney-Westmead | 107 | 71:36 | 40.4 ± 13.2 | 48 | 25:23 | 39.0 ± 11.6 |
| University of Washington | 202 | 105:97 | 14.1 ± 3.0 | 53 | 25:28 | 13.2 ± 2.9 |
| Waco V.A. | 25 | 4:21 | 40.7 ± 11.6 | 41 | 6:35 | 41.0 ± 11.0 |
| University of New Haven | 34 | 3:31 | 34.2 ± 9.8 | 37 | 5:32 | 34.8 ± 9.2 |
| Yale University | 48 | 8:40 | 29.4 ± 8.2 | 22 | 3:19 | 31.8 ± 6.9 |

(a) Alzheimer’s Disease Neuroimaging Initiative - Department of Defense.

(b) Injury & Traumatic Stress Clinical Consortium.

2.2. Imaging data preprocessing

Anatomical brain images were preprocessed at Duke University through a standardized neuroimaging and QC pipeline developed by the ENIGMA Consortium (http://enigma.ini.usc.edu/protocols/imaging-protocols/) (Logue et al., 2018). Cortical thickness measurements were generated using the FreeSurfer software (https://surfer.nmr.mgh.harvard.edu) based on the Destrieux atlas (Destrieux et al., 2010) that contains 74 regions per hemisphere. All sites used FreeSurfer 5.3 for parcellation except ADNI-DoD, Minneapolis VA, and the Waco VA, which used FreeSurfer 6.0, as well as Amsterdam Medical Center and University of South Dakota, which used FreeSurfer 7.1.1 (Supplementary Table S2). Briefly, white matter surfaces were deformed toward the gray matter boundary at each surface vertex. Cortical thickness was calculated based on the average distance between the parcellated portions of white and pial surfaces within each region per participant. In each region, any missing value was replaced by the mean cortical thickness averaged across the same group of participants (either PTSD or trauma-exposed controls) at the same site.
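The missing-value rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the record layout, the field names (`site`, `group`, `thk`), and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def impute_group_site_mean(records, region):
    """Fill missing values of `region` with the mean of the same site and group.

    `records` is a list of dicts (hypothetical layout); a missing value is None.
    A site/group cell with no observed values at all would raise KeyError.
    """
    observed = defaultdict(list)
    for r in records:
        if r[region] is not None:
            observed[(r["site"], r["group"])].append(r[region])
    fill = {key: mean(vals) for key, vals in observed.items()}
    return [
        {**r, region: fill[(r["site"], r["group"])]} if r[region] is None else dict(r)
        for r in records
    ]


# Hypothetical toy data: one PTSD participant at site A lacks a value.
records = [
    {"site": "A", "group": "PTSD", "thk": 2.4},
    {"site": "A", "group": "PTSD", "thk": 2.6},
    {"site": "A", "group": "PTSD", "thk": None},
    {"site": "A", "group": "control", "thk": 2.8},
]
filled = impute_group_site_mean(records, "thk")
```

In this toy case the missing PTSD value at site A is filled from the two observed PTSD values at that site, and the control value does not contribute.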

2.3. ComBat harmonization

ComBat removes the effects of site while preserving inherent biological variance in the data (Fortin et al., 2018). In the present study, PTSD diagnosis, age, and sex were designated as biological variables. The ComBat approach was implemented using R scripts (https://github.com/Jfortin1/ComBatHarmonization) running on RStudio (ver. 1.3.1073) and R (ver. 4.0.2). Unlike implementations of LME models that merge data harmonization and statistical analyses, ComBat and ComBat-GAM perform only harmonization and make harmonized data available to the user.

2.4. ComBat-GAM harmonization

PTSD diagnosis, age, and sex were designated as biological variables, and age was specified as the only smooth term in the model. We employed the default setting so that the empirical Bayes estimates were used for site effects, and there were no custom boundaries for the smoothing terms. The ComBat-GAM approach was implemented using Python (ver. 3.8.5) scripts (https://github.com/rpomponio/neuroHarmonize).

2.5. Distribution of non-harmonized, ComBat harmonized, and ComBat-GAM harmonized data

Pairwise comparisons of non-harmonized, ComBat harmonized, and ComBat-GAM harmonized data using the function pairs() (from the R package emmeans) were applied to the absolute differences between the site-specific mean values and the mean value averaged across sites. The absolute, rather than signed, values of the differences were investigated in order to test whether ComBat and ComBat-GAM harmonization led to more consistent distributions (specifically, smaller differences between the site-specific mean values and the mean across sites). The pairwise comparisons were also applied to site-specific standard deviations for cortical thickness across cortical regions. The p-values were adjusted using Bonferroni correction for three pairwise comparisons (i.e., ComBat vs. non-harmonized, ComBat-GAM vs. non-harmonized, ComBat-GAM vs. ComBat). The effects of harmonization by LME models cannot be observed directly because data harmonization and statistical analyses are inseparable in LME methods.
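The quantity entered into these comparisons can be illustrated with a short sketch. It assumes that "the mean value averaged across sites" is the unweighted mean of the site means, which the text does not state explicitly; the site labels and thickness values are hypothetical, and the pairwise tests themselves (emmeans in R) are not reproduced here.

```python
from statistics import mean


def abs_site_deviations(values_by_site):
    """Absolute difference between each site's mean and the mean of site means."""
    site_means = {site: mean(vals) for site, vals in values_by_site.items()}
    across_sites = mean(list(site_means.values()))  # assumption: unweighted
    return {site: abs(m - across_sites) for site, m in site_means.items()}


# Hypothetical thickness values (mm) for one region, before and after harmonization.
raw = {"A": [2.2, 2.4], "B": [2.6, 2.8], "C": [2.4, 2.6]}
harmonized = {"A": [2.4, 2.6], "B": [2.45, 2.65], "C": [2.4, 2.6]}

raw_dev = abs_site_deviations(raw)
harm_dev = abs_site_deviations(harmonized)

# Smaller deviations after harmonization indicate more consistent site means.
improved = mean(list(harm_dev.values())) < mean(list(raw_dev.values()))
```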

2.6. Statistical models

In all models, we included sex, age, and PTSD diagnosis as fixed factors to estimate their effects on regional cortical thickness, and as covariates for testing interaction effects of interest. Either the age by diagnosis interaction or the sex by diagnosis interaction was included in the models as a fixed factor when the corresponding interaction was of interest. The supplementary materials report on the influence of age² as a fixed factor to estimate effects on regional cortical thickness, and for testing interaction effects. Linear modeling was used to analyze data harmonized by ComBat and data harmonized by ComBat-GAM. Cortical thickness data without harmonization was entered into the LME models. The LMEINT models employed study site as a random factor to model random intercepts. The LMEINT+SLP models included both the site-specific random intercepts and age-related random slopes to reflect different age-related slopes in cortical thickness across sites. Bonferroni correction was employed for multiple testing of 148 cortical regions with a corrected α = 0.0003 (0.05/148). The functions lm() and lmer() (from the R package lme4) were used to calculate the unstandardized regression coefficients for the linear models and the random effects models, respectively. The R package lmerTest was utilized to extract the statistical significance of models. The fitted curves in this manuscript were made using default settings (i.e. loess) of the R ggplot2 function geom_smooth().
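As a worked check of the corrected threshold, the arithmetic can be written out in a few lines. The variable and function names are illustrative only.

```python
import math

n_regions = 148              # 74 Destrieux regions per hemisphere x 2 hemispheres
alpha = 0.05                 # uncorrected significance level
threshold = alpha / n_regions  # Bonferroni-corrected threshold, about 0.000338


def is_significant(p_value):
    """True if a region's p-value survives Bonferroni correction."""
    return p_value < threshold


# The same threshold on the -log10(p) scale used in significance plots.
neg_log10_threshold = -math.log10(threshold)
```

Rounded to four decimal places, the threshold equals the α = 0.0003 quoted in the text.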

The number of regions with significant findings and the magnitude of effect size were compared separately between the 4 harmonization methods. A chi-squared test based on the function chisq.test() (from the native stats package in R) was used to compare the number of cortical regions showing significant effects. The region-specific regression coefficients were compared using repeated-measures ANOVA based on the function aov_ez() (from the R package afex). If the omnibus ANOVA results were statistically significant, then post-hoc pairwise comparisons of the 4 harmonization methods were conducted using the function pairs() (from the R package emmeans). The p-values were adjusted using the Bonferroni method for the 6 pairwise comparisons made with the outputs of the 4 harmonization methods.
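The chi-squared comparison can be illustrated with a hand-computed Pearson statistic on a 2 × 4 table (significant vs. non-significant regions, by method). This is a sketch in Python rather than the R chisq.test() call used in the study; it assumes each method was tested on the same 148 regions, and the counts are the region counts reported in Table 2 and the Results.

```python
def chi_squared_stat(n_significant, n_total):
    """Pearson chi-squared statistic for a 2 x k table.

    n_significant: count of significant regions per method (k methods).
    n_total: regions tested per method (assumed equal for all methods).
    Degrees of freedom are k - 1.
    """
    k = len(n_significant)
    expected_sig = sum(n_significant) / k      # equal column totals
    expected_non = n_total - expected_sig
    stat = 0.0
    for sig in n_significant:
        stat += (sig - expected_sig) ** 2 / expected_sig
        stat += ((n_total - sig) - expected_non) ** 2 / expected_non
    return stat


# Order: LMEINT, LMEINT+SLP, ComBat, ComBat-GAM
diagnosis_counts = [2, 2, 5, 31]
age_counts = [145, 113, 147, 147]

chi2_diagnosis = round(chi_squared_stat(diagnosis_counts, 148), 3)  # 63.704
chi2_age = round(chi_squared_stat(age_counts, 148), 3)              # 89.658
```

Both values agree with the statistics reported for the main effects of diagnosis and age, each on 3 degrees of freedom.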

3. Results

As shown in Fig. 1 and the interactive plot at https://4n8ygg-delin-sun.shinyapps.io/SDL_Shiny/, data distribution and age-related slopes are largely modulated by site. Visual evidence of a non-linear age effect in participants under 20 years originates from 3 sites (Duke-De Bellis; Leiden University; University of Washington). Therefore, it is paramount and meaningful to harmonize the data by removing site effects.

Fig. 1.

Scatter plots of mean cortical thickness averaged across regions for each study site. Data distribution and age-related linear trends are markedly different across sites. Mean cortical thickness averaged across regions is shown to avoid regional biases. Participants are color-coded based on study site.

3.1. Distribution of non-harmonized, ComBat harmonized, and ComBat-GAM harmonized data

Distributions of non-harmonized and harmonized data are shown in Fig. 2. Relative to non-harmonized data, ComBat (controls: range of t-values: [−10.120,−3.225], p-values: [<0.001,0.006] corrected; PTSD: t-values: [−9.653,−3.475], p-values: [<0.001,0.003] corrected; across regions) and ComBat-GAM harmonized data (controls: t-values: [−10.046,−1.856], p-values: [<0.001,0.207] corrected; PTSD: t-values: [−9.590,−2.284], p-values: [<0.001,0.078] corrected; across regions) resulted in smaller differences overall between the site-specific data and the mean across sites. There was no significant difference between ComBat and ComBat-GAM harmonized data (controls: t-values: [−2.373,0.075], p-values: [0.064,0.999] corrected; PTSD: t-values: [−2.183,0.066], p-values: [0.100,0.999] corrected; across regions).

Fig. 2.

Site-specific cortical thickness averaged across regions. Non-harmonized (A), ComBat harmonized (B), and ComBat-GAM harmonized (C) data in participants with PTSD. Non-harmonized (D), ComBat harmonized (E), and ComBat-GAM harmonized (F) data in trauma-exposed controls. The order of sites in the figure is consistent with the order of site names in the legend from top to bottom to facilitate interpretation. Compared to non-harmonized data, ComBat and ComBat-GAM lead to smaller differences between site-specific data and the mean values averaged across sites, and they do not change the site-specific standard deviations for cortical thickness. The effects of harmonization by LME models cannot be shown here because data harmonization and statistical analyses are inseparable in LME methods. Mean cortical thickness averaged across regions is shown to minimize regional biases. The boxplots were made using the default settings of the R ggplot2 function geom_boxplot(). The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles). The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR from the hinge. Data beyond the end of the whiskers are called “outlying” points and are plotted individually.

There was no significant difference in the site-specific standard deviations between any of the data pairings across all regions (controls: t-values: [−0.961,1.995], p-values: [0.154,0.999] corrected; PTSD: t-values: [−1.174,2.354], p-values: [0.066,0.999] corrected; across regions).

3.2. Main effect of age

As shown in Fig. 3, linear age-related trends are evident with ComBat harmonization, whereas non-linear trends are evident with ComBat-GAM harmonization with a dramatic decline in cortical thickness before 20, and a relatively slow decline after 20 years. This pattern holds true for both PTSD and control groups.

Fig. 3.

Scatter plots and non-linear trends of mean cortical thickness averaged across regions. Non-harmonized (A), ComBat harmonized (B), and ComBat-GAM harmonized (C) data in participants with PTSD. Non-harmonized (D), ComBat harmonized (E), and ComBat-GAM harmonized (F) data in controls. Both ComBat and ComBat-GAM reduce variances. ComBat-GAM is superior to ComBat at capturing the age-related non-linear trends in cortical thickness. Mean cortical thickness averaged across regions is shown to avoid biases by particular regions. The fit curves were made based on the default settings (i.e. loess) of the R ggplot2 function geom_smooth(). The shaded regions represent the 95% confidence intervals.

As shown in Figs. 4A and 5, the number of regions showing a significant main effect of age was significantly different across harmonization methods (χ²(3) = 89.658, p < 0.001). Age-related declines in cortical thickness were detected by ComBat-GAM and ComBat in 147 (99.3%) regions, by LMEINT in 145 (98.0%) regions, and by LMEINT+SLP in 113 (76.4%) regions (see Table 2). As shown in Table 3, the ratio of detection (> 95%) indicated that the significant regions detected by one method were also detected by the other methods, except for LMEINT+SLP, which was less efficient (< 80%) at replicating the age effects detected by the other methods.

Fig. 4.

Main effect of age. (A) Negative log-transformed statistical significance, i.e. −log10(p). All four methods can detect multiple regions showing significance. The dashed and solid vertical lines represent thresholds p = 0.05 (uncorrected) and p = 0.05 (Bonferroni corrected), respectively. (B) Magnitude of regression coefficients. ComBat-GAM compared to the other methods provided stronger estimation of age-related cortical thickness reduction. The ordering of regions from top to bottom in both (A) and (B) is by ascending order of regression coefficients from cortical thickness data harmonized by ComBat-GAM. LMEINT, LME models site-specific random intercept. LMEINT+SLP, LME models both site-specific random intercepts and age-related random slopes. The fit curves were made based on the default settings (i.e. loess) of the R ggplot2 function geom_smooth().

Fig. 5.

Regions with significant main effect of age. The color bar represents the magnitude of the regression coefficient. LMEINT+SLP compared to the other methods detected fewer regions showing significant age effect. Cooler colors represent stronger age-related declines in cortical thickness. LMEINT, LME models site-specific random intercept. LMEINT+SLP, LME models both site-specific random intercepts and age-related random slopes.

Table 2.

Percent of 148 regions showing statistical significance.

| Effect | LMEINT | LMEINT+SLP | ComBat | ComBat-GAM | Chi-squared statistic | p (Bonferroni corr.) |
|---|---|---|---|---|---|---|
| Age | 98.0 | 76.4 | 99.3 | 99.3 | 89.658 | <0.001 |
| Diagnosis | 1.4 | 1.4 | 3.4 | 20.9 | 63.704 | <0.001 |
| Age × Diagnosis | 0 | 0 | 0 | 2.7 | 12.082 | 0.007 |
| Sex | 17.6 | 19.6 | 29.1 | 29.1 | 9.114 | 0.028 |
| Sex × Diagnosis | 0 | 0 | 0 | 0 | NA | NA |

Note: LMEINT, LME models site-specific random intercept. LMEINT+SLP, LME models both site-specific random intercepts and age-related random slopes.

Table 3.

Ratio of detection (%) based on number of regions that met Bonferroni-corrected significance.

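| Effect / method in row | LMEINT | LMEINT+SLP | ComBat | ComBat-GAM |
|---|---|---|---|---|
| Effect of Age | | | | |
| LMEINT | - | 77.9 | 100 | 100 |
| LMEINT+SLP | 100 | - | 100 | 100 |
| ComBat | 98.6 | 76.9 | - | 100 |
| ComBat-GAM | 98.6 | 76.9 | 100 | - |
| Effect of Diagnosis | | | | |
| LMEINT | - | 50 | 100 | 50 |
| LMEINT+SLP | 50 | - | 50 | 100 |
| ComBat | 40 | 20 | - | 80 |
| ComBat-GAM | 3.2 | 6.5 | 12.9 | - |
| Age by Diagnosis Interaction | | | | |
| LMEINT | - | NA | NA | NA |
| LMEINT+SLP | NA | - | NA | NA |
| ComBat | NA | NA | - | NA |
| ComBat-GAM | 0 | 0 | 0 | - |
| Effect of Sex | | | | |
| LMEINT | - | 100 | 100 | 100 |
| LMEINT+SLP | 89.7 | - | 100 | 93.1 |
| ComBat | 60.5 | 67.4 | - | 93 |
| ComBat-GAM | 60.5 | 62.8 | 93 | - |
| Sex by Diagnosis Interaction | | | | |
| LMEINT | - | NA | NA | NA |
| LMEINT+SLP | NA | - | NA | NA |
| ComBat | NA | NA | - | NA |
| ComBat-GAM | NA | NA | NA | - |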

Note: The ratio of detection is defined as the proportion of cortical regions showing statistical significance under the method in a given row that were also detected by the method in a given column. A higher ratio of detection means that the method in the column was as effective as the method in the row at detecting significance. NA indicates not available because no significant finding was detected by the method in the row.
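The definition in the note can be expressed as a set overlap. The sketch below is illustrative: the function name and the region labels (`r1` to `r5`) are hypothetical placeholders, chosen so that the set sizes match the 2 and 5 regions reported for LMEINT and ComBat under the main effect of diagnosis.

```python
def ratio_of_detection(row_regions, col_regions):
    """Percent of regions significant under the row method that are also
    significant under the column method; None stands for NA (row set empty)."""
    if not row_regions:
        return None
    return 100 * len(row_regions & col_regions) / len(row_regions)


# Hypothetical region labels; only the set sizes and overlap matter here.
lme_int = {"r1", "r2"}
combat = {"r1", "r2", "r3", "r4", "r5"}

row_lme_col_combat = ratio_of_detection(lme_int, combat)   # 100.0
row_combat_col_lme = ratio_of_detection(combat, lme_int)   # 40.0
row_empty = ratio_of_detection(set(), combat)              # None (NA)
```

The measure is asymmetric: a method that detects a superset of another method's regions scores 100 as the column but less than 100 as the row.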

The regression coefficients were significantly different across harmonization methods (F(1.6, 231.8) = 207.13, p < 0.001). As shown in Fig. 4B and Table 4, ComBat-GAM produced stronger estimates of age-related declines in cortical thickness than the other methods, while the other three methods were not significantly different from each other.

Table 4.

Comparisons of regression coefficients.

| Effect / method in row | LMEINT | LMEINT+SLP | ComBat | ComBat-GAM |
|---|---|---|---|---|
| Effect of Age | | | | |
| LMEINT | - | 0.6e-04 | −0.6e-04 | 10.8e-04 |
| LMEINT+SLP | 0.703 | - | −1.2e-04 | 10.2e-04 |
| ComBat | 0.677 | 0.125 | - | 11.4e-04 |
| ComBat-GAM | <0.001 | <0.001 | <0.001 | - |
| Effect of Diagnosis | | | | |
| LMEINT | - | 0.4e-03 | 1.2e-03 | 6.6e-03 |
| LMEINT+SLP | 0.277 | - | 0.7e-03 | 6.2e-03 |
| ComBat | <0.001 | 0.011 | - | 5.4e-03 |
| ComBat-GAM | <0.001 | <0.001 | <0.001 | - |
| Age by Diagnosis Interaction | | | | |
| LMEINT | - | 3.8e-05 | −0.3e-05 | −44.4e-05 |
| LMEINT+SLP | 0.262 | - | −4.1e-05 | −48.2e-05 |
| ComBat | 0.999 | 0.197 | - | −44.1e-05 |
| ComBat-GAM | <0.001 | <0.001 | <0.001 | - |
| Effect of Sex | | | | |
| LMEINT | - | −1.8e-03 | 0.6e-03 | 1.6e-03 |
| LMEINT+SLP | <0.001 | - | 2.4e-03 | 3.4e-03 |
| ComBat | 0.012 | <0.001 | - | 1.1e-03 |
| ComBat-GAM | <0.001 | <0.001 | <0.001 | - |
| Sex by Diagnosis Interaction | | | | |
| LMEINT | - | 6.0e-04 | −0.4e-04 | 12.0e-04 |
| LMEINT+SLP | 0.056 | - | −6.3e-04 | 6.1e-04 |
| ComBat | 0.999 | 0.037 | - | 12.4e-04 |
| ComBat-GAM | <0.001 | 0.049 | <0.001 | - |

Note: The upper triangle of the matrix contains the differences between regression coefficients from the methods in rows and columns. Higher values mean that the method in the column leads to more negative (i.e., weaker positive, or stronger negative) estimates than the method in the row. The lower triangle contains the corresponding p-values (Bonferroni corrected).

3.3. Main effect of diagnosis

The number of regions showing a main effect of diagnosis was significantly different across harmonization approaches (χ²(3) = 63.704, p < 0.001). As shown in Fig. 6A and Table 2, case-related reductions in cortical thickness were found by ComBat-GAM in 31 (20.9%) regions, by ComBat in 5 (3.4%) regions, and by LMEINT and LMEINT+SLP in 2 (1.4%) regions each. As shown in Fig. 7, the regions discovered by ComBat-GAM include those within the salience network (SN; bilateral insula regions), executive control network (ECN; bilateral intraparietal sulcus and supramarginal gyri), default mode network (DMN; left ventromedial prefrontal cortex, and bilateral precuneus), and bilateral superior and inferior temporal gyri and sulci, which are consistent with previous reports (Shalev et al., 2017). As shown in Table 3, the significant regions detected by LMEINT were also detected by ComBat, and the significant regions detected by LMEINT+SLP were also detected by ComBat-GAM, while the opposite was not true (ratio of detection ≤ 40%).

Fig. 6.

Main effect of diagnosis. (A) Negative log-transformed statistical significance, i.e. −log10 (p). ComBat-GAM compared to the other methods detected more regions showing statistical significance. The dashed and solid vertical lines represent thresholds p = 0.05 (uncorrected) and p = 0.05 (Bonferroni corrected), respectively. (B) Magnitude of regression coefficients. ComBat-GAM compared to the other methods provided stronger estimation of case-related cortical thickness reduction as well as weaker estimation of case-related cortical thickness increase. The ordering of regions from top to bottom in both (A) and (B) is by ascending order of regression coefficients from cortical thickness data harmonized by ComBat-GAM. LMEINT, LME models site-specific random intercept. LMEINT+SLP, LME models both site-specific random intercepts and age-related random slopes. The fit curves were made based on the default settings (i.e. loess) of the R ggplot2 function geom_smooth().

Fig. 7.

Regions with a significant main effect of diagnosis. ComBat-GAM compared to the other methods detected more regions showing significant case-control difference. The color bar represents the magnitude of the regression coefficient. Cooler colors mean lower cortical thickness in PTSD than controls. LMEINT, LME models site-specific random intercept. LMEINT+SLP, LME models both site-specific random intercepts and age-related random slopes.

Regression coefficients were different across harmonization methods (F(1.4, 205.1) = 335.79, p < 0.001). As shown in Fig. 6B and Table 4, ComBat-GAM produced stronger estimates of case-related cortical thickness reduction as well as weaker estimates of case-related cortical thickness increase than the other three methods, and ComBat produced stronger estimates of case-related cortical thickness reduction as well as weaker estimates of case-related cortical thickness increase than the two LME methods.
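The fractional degrees of freedom in this F-test are what a sphericity correction (for example, Greenhouse-Geisser) produces in a repeated-measures ANOVA. As a hedged reading, assuming k = 4 harmonization methods as the repeated factor, n = 148 regions as the units (a count inferred from the reported percentages), and a correction factor that is not reported in this section, the degrees of freedom would follow:

```latex
\begin{align*}
df_1 &= \hat{\varepsilon}\,(k-1), \qquad
df_2 = \hat{\varepsilon}\,(k-1)(n-1), \\
\hat{\varepsilon} &\approx \frac{205.1}{3 \times 147} \approx 0.465
\quad\Rightarrow\quad df_1 \approx 3 \times 0.465 \approx 1.4 .
\end{align*}
```

A correction factor this far below 1 would indicate that the differences between methods have markedly unequal variances across regions, which is why uncorrected degrees of freedom (3, 441) would be inappropriate under this reading.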

3.4. Age by diagnosis interaction

As shown in Fig. 8A, significant age by diagnosis interactions were detected by ComBat-GAM in 4 (2.7%) regions, while no significant interactions were detected by ComBat, LMEINT, or LMEINT+SLP. ComBat-GAM outperformed the other methods in detecting this interaction effect (χ²(3) = 12.082, p = 0.007; see Table 2). As shown in Fig. 9, age-related declines in cortical thickness were slower in cases than controls for 4 regions: right posterior-dorsal part of the cingulate gyrus, right marginal branch of the cingulate sulcus, right inferior temporal gyrus, and right fusiform gyrus. The linear (Fig. S1) and non-linear (Fig. S2) fits of the age-related distributions of cortical thickness harmonized by ComBat-GAM in these regions are shown in the supplementary materials.

Fig. 8.

Interaction of age and diagnosis. (A) Negative log-transformed statistical significance, i.e. −log10 (p). Only ComBat-GAM detected four regions showing statistical significance after correction. The dashed and solid vertical lines represent thresholds p = 0.05 (uncorrected) and p = 0.05 (Bonferroni corrected), respectively. (B) Magnitude of regression coefficients. ComBat-GAM compared to the other methods produced weaker estimates of age-related declines in cortical thickness in cases than controls. The ordering of regions from top to bottom in both (A) and (B) is by ascending order of regression coefficients from cortical thickness data harmonized by ComBat-GAM. LMEINT, LME models site-specific random intercept. LMEINT+SLP, LME models both site-specific random intercepts and age-related random slopes. The fit curves were made based on the default settings (i.e. loess) of the R ggplot2 function geom_smooth().

Fig. 9.

Regions showing a significant age by diagnosis interaction. Only ComBat-GAM detected four regions showing statistical significance. The color bar represents the magnitude of the regression coefficient. Warmer colors mean that age-related declines in cortical thickness are smaller in PTSD than controls.

Regression coefficients differed across harmonization methods (F(1.3, 197.3) = 246.41, p < 0.001). As shown in Fig. 8B and Table 4, ComBat-GAM compared to the other methods produced weaker estimates of age-related declines in cortical thickness in cases than controls, and both ComBat and LMEINT compared to LMEINT+SLP produced weaker estimates of age-related declines in cortical thickness in cases than controls.

3.5. Main effect of sex

The number of regions showing a significant main effect of sex differed significantly across harmonization methods (χ²(3) = 9.114, p = 0.028). As shown in Fig. 10A and Table 2, differences between males and females in cortical thickness were detected by ComBat-GAM and by ComBat in 43 (29.1%) regions each, by LMEINT in 26 (17.6%) regions, and by LMEINT+SLP in 29 (19.6%) regions. As shown in Fig. 11, the analyses based on ComBat-GAM harmonization showed that females had greater cortical thickness than males in bilateral precentral and postcentral regions, bilateral middle cingulate cortex, bilateral superior frontal gyri, bilateral angular gyri, bilateral medial occipito-temporal sulci and lingual sulci, left frontal pole, left superior temporal sulci, and right parahippocampal gyrus. By contrast, males had greater cortical thickness than females in bilateral inferior temporal regions, left rectus, left planum polare of the superior temporal gyrus, left vertical ramus of the anterior segment of the lateral sulcus, bilateral calcarine sulci, left insula, left inferior and middle frontal sulci, left orbital sulci, right ventral posterior cingulate cortex, and right temporal pole. As shown in Table 3, most regions showing statistical significance detected by the LME methods were also detected by ComBat and ComBat-GAM (ratio of detection > 90%), while the opposite was not true (ratio of detection ≤ 70%).

Fig. 10.

Main effect of sex. (A) Negative log-transformed statistical significance, i.e. −log10 (p). The dashed and solid vertical lines represent thresholds p = 0.05 (uncorrected) and p = 0.05 (Bonferroni corrected), respectively. (B) Magnitude of regression coefficients. ComBat-GAM compared to the other methods produced stronger estimates of cortical thickness reduction in females than males as well as weaker estimates of cortical thickness increase in females than males. The ordering of regions from top to bottom in both (A) and (B) is by ascending order of regression coefficients from cortical thickness data harmonized by ComBat-GAM. LMEINT, LME models site-specific random intercept. LMEINT+SLP, LME models both site-specific random intercepts and age-related random slopes. The fit curves were made based on the default settings (i.e. loess) of the R ggplot2 function geom_smooth().

Fig. 11.

Regions with a significant main effect of sex. The color bar represents the magnitude of the regression coefficient. Cooler (warmer) colors indicate lower (higher) cortical thickness in females compared to males.

Regression coefficients were different across harmonization methods (F(1.8, 259.6) = 123.25, p < 0.001). As shown in Fig. 10B and Table 4, ComBat-GAM compared to the other methods produced stronger estimates of cortical thickness reduction in females than males as well as weaker estimates of cortical thickness increase in females than males. ComBat compared to LME methods as well as LMEINT compared to LMEINT+SLP produced stronger estimates of cortical thickness reduction in females than males as well as weaker estimates of cortical thickness increase in females than males.

3.6. Sex by diagnosis interaction

As shown in Fig. 12A, no significant sex by diagnosis interactions were found using data from any of the four methods. Regression coefficients were significantly different across harmonization approaches (F(1.2, 178.3) = 12.40, p < 0.001). As shown in Fig. 12B and Table 4, ComBat-GAM compared to the other methods produced stronger estimates of cortical thickness reduction in females relative to males in cases than controls, as well as weaker estimates of cortical thickness increase in females relative to males in cases than controls. ComBat compared to the LMEINT+SLP method produced stronger estimates of cortical thickness increase in females compared to males in cases than controls, as well as weaker estimates of cortical thickness reduction in females compared to males in cases than controls.

Fig. 12.

Sex by diagnosis interaction. (A) Negative log-transformed statistical significance, i.e. −log10 (p). The dashed and solid vertical lines represent thresholds p = 0.05 (uncorrected) and p = 0.05 (Bonferroni corrected), respectively. None of the four methods detected significant regions. (B) The magnitude of regression coefficients. ComBat-GAM compared to the other methods produced stronger estimates of cortical thickness reduction in females relative to males in cases than controls, as well as weaker estimates of cortical thickness increase in females relative to males in cases than controls. The ordering of regions from top to bottom in both (A) and (B) is by ascending order of regression coefficients from cortical thickness data harmonized by ComBat-GAM. LMEINT, LME models site-specific random intercept. LMEINT+SLP, LME models both site-specific random intercepts and age-related random slopes. The fit curves were made based on the default settings (i.e. loess) of the R ggplot2 function geom_smooth().

3.7. Results after removing sites with children, adolescents, and older participants

To test whether our findings were influenced by the data from children, adolescents, and very old participants, we re-analyzed the data after removing 3 sites with participants under 20 years and one site with older participants (~70 years). ComBat-GAM, ComBat, and LMEINT detected more regions with age-related cortical thinning compared to LMEINT+SLP. Both ComBat and ComBat-GAM detected more regions with sex-related cortical thickness differences than the two LME methods. There was no significant difference among harmonization methods in detecting other effects. For more details, see the Supplementary results section and Tables S4, S5, and S6.

4. Discussion

We compared the performance of four harmonization methods by applying them to cortical thickness data in participants grouped into clinical cases and controls from 29 different sites. The four harmonization methods were LMEINT, LMEINT+SLP, ComBat, and ComBat-GAM. We acknowledge that the number of regions reaching significance by any method does not necessarily reflect the ground truth, but the principal goal of harmonization is to convert uncalibrated measurements from multiple sources to be more comparable to each other. As summarized in Table 2, ComBat-GAM, ComBat, and LMEINT detected more regions with age-related cortical thinning compared to LMEINT+SLP (Figs. 4A and 5). Consistent with our a priori hypothesis, ComBat-GAM harmonization uncovered more regions with significant case-related reductions in cortical thickness (Figs. 6A and 7), and more regions displaying slower rates of age-related cortical thinning in cases than controls compared to the other methods (Figs. 8A and 9). ComBat and ComBat-GAM outperformed LME methods in detecting sex-related differences (Figs. 10A and 11), but not sex by diagnosis interactions (Fig. 12A). As summarized in Table 3, most regions showing significant effects of age and sex detected by LME methods were also detected by ComBat and ComBat-GAM, while the opposite was not true, except that LMEINT performed comparably to ComBat and ComBat-GAM for the main effect of age. Regression coefficients (Table 4) showed that compared to other methods, ComBat-GAM produced stronger estimates of age-related declines in cortical thickness (Fig. 4B), stronger estimates of case-related cortical thickness reduction (Fig. 6B), weaker estimates of age-related declines in cortical thickness in cases than controls (Fig. 8B), stronger estimates of cortical thickness reduction in females than males as well as weaker estimates of cortical thickness increase in females than males (Fig. 10B), stronger estimates of cortical thickness reduction in females relative to males in cases than in controls, and weaker estimates of cortical thickness increase in females relative to males in cases than in controls (Fig. 12B).

ComBat models the expected values of the imaging features as a linear combination of the biological variables and the site effects, with an error term that is modulated by additional site-specific scaling factors (Fortin et al., 2018). It also uses empirical Bayes to improve the estimation of the model parameters in studies with small sample sizes. Radua et al. (2020) used cortical thickness, surface area, and subcortical volume data in cases and controls from ENIGMA-Schizophrenia to compare ComBat to random-effects meta-analysis and random-effects mega-analysis, which we term LMEINT in the present study. They reported that ComBat delivered more statistically significant results than random-effects meta-analyses, and slightly more than LMEINT. However, they did not report results of non-linear age effects on cortical thickness, which are well documented (Frangou et al., 2022; Pomponio et al., 2020; Walhovd et al., 2017), nor did they report on effects of group membership on age-related changes in cortical thickness. By contrast, Pomponio et al. (2020) developed ComBat-GAM to support harmonization of neuroimaging data with non-linearities related to age or other variables by investigating cortical and subcortical gray matter volumes in 10,477 healthy subjects ranging in age from 3 to 96 years collected at 18 sites. They concluded that ComBat-GAM is superior to ComBat at predicting age based on regional volume data. However, Pomponio et al. (2020) investigated only healthy participants and therefore offer no guidance on harmonizing data for case-control comparisons. Finally, prior studies did not report the magnitude of regression coefficients obtained from various harmonization methods, in spite of an urgent plea by researchers to understand how harmonization influences the output of statistical models run on harmonized data.

Our study sought to fill these gaps by formally comparing regression coefficients and the number of regions showing statistically significant results, including case-control differences in cortical thickness across the lifespan. As shown in Fig. 2, ComBat and ComBat-GAM led to smaller differences between site-specific data and the mean values averaged across sites, and they did not change the site-specific standard deviations for cortical thickness. These results demonstrate that both ComBat and ComBat-GAM are effective at minimizing the effects of site without distorting the data distribution. Harmonization with ComBat-GAM was the most effective at detecting case-control differences, as evidenced by significantly more regional findings compared to the other harmonization methods. ComBat-GAM was one of the most effective methods at detecting age effects in cortical thickness, and the only method to uncover regions with different rates of age-related cortical thinning in cases compared to controls. Furthermore, most of the regions showing statistical significance following harmonization with other methods were also detected following ComBat-GAM harmonization. Although we have no collateral information to corroborate the findings from ComBat-GAM harmonization pertaining to case-control differences or age-dependent case-control differences, the literature offers consistent evidence of age-related patterns of cortical thickness across the lifespan (Frangou et al., 2022; Mutlu et al., 2013). One caveat is that motion-related artifact, which is associated with lower cortical thickness measurements, increases with age (Savalia et al., 2017). Consequently, reduced cortical thickness with aging may be partially artifactual. Nonetheless, Fig. 3 shows concrete evidence of erroneous harmonization by ComBat that is handled correctly by ComBat-GAM. Our finding is corroborated by independent studies, which demonstrate that the highest cortical thickness occurs in childhood and that age is negatively correlated with cortical thickness, with a steeper decline up to the third decade of life and a more gradual decline thereafter (Frangou et al., 2022; Mutlu et al., 2013). By contrast, ComBat harmonized the data along a linear pattern with age throughout the lifespan. Thus, ComBat-GAM harmonization may be advantageous for consortia studies with participants of all ages, particularly youth and young adults.

The performance of ComBat-GAM is attributable to its algorithm. LME models assume that the error terms follow the same normal distribution at all sites, which is rarely the case (Radua et al., 2020). ComBat overcomes this shortcoming by assuming different normal distributions at different sites for the error terms (Radua et al., 2020). ComBat-GAM further improves on ComBat by using a normal distribution as the prior for the intercept and an inverse-gamma distribution as the prior for the scale effect of the sites. It also uses a generalized additive model (GAM) to capture non-linear variations in age-related changes in cortical thickness while avoiding overfitting (Pomponio et al., 2020).
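For orientation, the location-and-scale model described by Fortin et al. (2018) and its GAM extension by Pomponio et al. (2020) can be written as follows. The notation is chosen here for illustration and is an assumption, not taken from this article: $i$ indexes site, $j$ participant, and $v$ region; $\mathbf{x}_{ij}$ holds the biological covariates (age, sex, diagnosis); $\gamma_{iv}$ is the additive site effect and $\delta_{iv}$ the multiplicative site effect; starred quantities are empirical Bayes estimates.

```latex
\begin{align*}
\text{ComBat:} \quad
  y_{ijv} &= \alpha_v + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}_v
             + \gamma_{iv} + \delta_{iv}\,\varepsilon_{ijv},
  \qquad \varepsilon_{ijv} \sim \mathcal{N}(0, \sigma_v^2) \\
\text{ComBat-GAM:} \quad
  y_{ijv} &= \alpha_v + f_v(\mathbf{x}_{ij})
             + \gamma_{iv} + \delta_{iv}\,\varepsilon_{ijv} \\
\text{Harmonized value:} \quad
  y_{ijv}^{\text{harm}} &=
  \frac{y_{ijv} - \hat{\alpha}_v - \hat{f}_v(\mathbf{x}_{ij}) - \gamma_{iv}^{*}}
       {\delta_{iv}^{*}}
  + \hat{\alpha}_v + \hat{f}_v(\mathbf{x}_{ij})
\end{align*}
```

ComBat is the special case $f_v(\mathbf{x}_{ij}) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}_v$. In ComBat-GAM, $f_v$ contains a smooth term for age, so a curved age trajectory is retained as biological signal rather than being absorbed into the site terms.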

In our study, participants at most sites were aged 20–60 years, while volunteers from three sites were mostly below 20 years old, and participants from one site were mostly over 70 years old. We found that after removing the data from the four sites with either very young or very old participants, ComBat-GAM was no better than the other harmonization methods at detecting regions with significant case-control differences and age by diagnosis interactions (see supplementary results section). We cannot exclude the possibility that the superiority of ComBat-GAM over the other methods is driven by overfitting data from sites with very young or very old participants. However, Fig. 1 shows that the data distributions of the four sites are consistent with the literature, with steeper cortical thickness declines in youth and flatter age-appropriate declines in older adults (Frangou et al., 2022; Mutlu et al., 2013). Furthermore, the three sites with participants < 20 years old exhibit similar slopes of age-related declines in cortical thickness. Therefore, rather than concluding that ComBat-GAM overfits data from children contributed by specific sites, there is stronger evidence to conclude that ComBat-GAM accurately captures nonlinear age trends in cortical thickness. Data from sites with a larger age range may address this concern more conclusively.

We found slower rates of age-related decline in cortical thickness in cases compared to controls for 4 regions, but only for data harmonized with ComBat-GAM (Figs. 8 and 9). As shown in supplementary Figs. S1 and S2, cases exhibited lower cortical thickness than controls in youth and greater cortical thickness in older adults in the 4 regions. It is possible that PTSD induces more pronounced cortical thinning in youth and delays age-appropriate declines in cortical thickness in older adults. This explanation is partly consistent with previous findings that maltreated youth with versus without chronic PTSD have smaller volumes in the posterior brain structures (De Bellis et al., 2015). More studies are warranted to test whether case-control differences in age-related cortical thinning are overfit by ComBat-GAM.

A study by Ritchie et al. (2018) that examined sex differences in adults from the UK Biobank (2750 females; 2466 males; 44–77 years old) reported thicker cortex across most of the cortex in females than males, except for the right insula. By contrast, harmonization with ComBat-GAM in our study showed that females have greater cortical thickness in prefrontal cortex, inferior parietal regions, and cingulate cortex, whereas males have greater cortical thickness in ventromedial prefrontal cortex, bilateral insula, posterior cingulate areas, and occipital lobe (Fig. 11). This difference may be explained by the much wider age range in the present study (6.2–85.2 years old) compared to Ritchie et al. (2018) (44–77 years old). As shown in Fig. 3, the slopes of age-related changes in cortical thickness are quite different between young (especially < 20 years old) and old participants. The significant differences between males and females in cortical thickness may be driven by the data of relatively young participants. We found that ComBat and ComBat-GAM outperformed LME harmonization methods for detecting differences between females and males. While we did not formally test harmonization methods to detect age-related sex differences in cortical thickness, Frangou et al. (2022) reported that age-related declines in mean cortical thickness were more rapid in males than females in the mid-life group (30–59 years), but not in the early-life group (3–29 years) and late-life group (61–90 years).

The comparison of the regression coefficients showed that the choice of harmonization method may lead to either overestimation or underestimation of the effects of interest, even when the corresponding comparisons of the number of regions exhibiting significant effects were identical between methods. These findings are critical to interpreting statistical outputs. For instance, the estimated magnitude of reduction in cortical thickness per year is biased by the harmonization method being used.

In reporting that ComBat-GAM is more sensitive than other methods, we must specify our narrow definition of “sensitive”: the harmonization method that leads to the maximum number of brain regions with statistically significant effects. In fact, this metric does not necessarily indicate better performance if we adopt a preferred definition, namely the method that produces results that are most consistent with the ground truth. Unfortunately, identifying ground truth is a challenging proposition, but we consider two options that may be informative and feasible. The first option is to acquire MRI scans and calculate cortical thickness from the same group of participants (or “travelling subjects”) on a variety of scanner manufacturers and MRI facilities. However, a sufficient sample size is essential as it must contain (1) a representative number of cases and controls from (2) across the lifespan in (3) participants of both sexes, with (4) scans at each MRI facility and on scanners from each manufacturer. This is required to avoid possible confounds from interactions of scanner type and age, scanner type and diagnosis, and scanner type and sex. A second option is to generate simulated data from a large enough sample of participants, sites, and MRI facilities. The simulated data could be generated by adding characteristic noise, covariance, and bias profiles for each scanner manufacturer and each MRI facility. The simulated data could then be harmonized with several tools of interest to determine the method that produces data that most closely resemble the pre-noised data. Along the same lines, the post-harmonization data and the pre-noised data could be modeled for case-control effects, age effects, and interaction effects. The results of statistical modeling on post-harmonization datasets could be compared to the results from modeling the pre-noised dataset. The harmonization method that leads to results that most closely resemble the results obtained from modeling the pre-noised data would be deemed most faithful to the ground truth. Scanning an appropriate phantom may add value to ascertaining the ground truth, but is unlikely to add value to characterizing the role of age, sex, and diagnosis on harmonization methods.
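The simulation option can be sketched in a few lines. The example below is a toy illustration under stated assumptions, not a validated benchmark: the thickness curve, the site bias and noise profiles, and the function names are all hypothetical, and the "harmonizer" is plain site-mean centering rather than ComBat or ComBat-GAM. It shows only the evaluation loop: generate pre-noised data, add site-specific distortions, harmonize, and score each result by its distance from the pre-noised data.

```python
import math
import random

# Hypothetical site profiles (assumptions for illustration only):
# additive bias in mm and measurement-noise SD in mm for five sites.
SITE_BIAS = [-0.20, -0.10, 0.00, 0.10, 0.20]
SITE_NOISE_SD = [0.03, 0.04, 0.05, 0.06, 0.07]

def true_thickness(age):
    """Hypothetical noise-free thickness (mm): steep decline early, flatter later."""
    return 2.2 + 0.8 * math.exp(-age / 25.0)

def simulate(n_per_site=200, seed=0):
    """Return (site, pre_noised, observed) tuples; all sites share one age range."""
    rng = random.Random(seed)
    rows = []
    for site, (bias, noise_sd) in enumerate(zip(SITE_BIAS, SITE_NOISE_SD)):
        for _ in range(n_per_site):
            truth = true_thickness(rng.uniform(6.0, 85.0))
            rows.append((site, truth, truth + bias + rng.gauss(0.0, noise_sd)))
    return rows

def center_by_site(rows):
    """Toy harmonizer: shift each site so that its mean equals the grand mean."""
    grand_mean = sum(obs for _, _, obs in rows) / len(rows)
    per_site = {}
    for site, _, obs in rows:
        per_site.setdefault(site, []).append(obs)
    site_mean = {s: sum(v) / len(v) for s, v in per_site.items()}
    return [obs - site_mean[site] + grand_mean for site, _, obs in rows]

def rmse(values, rows):
    """Root-mean-square error of `values` against the pre-noised data."""
    squared = [(v - truth) ** 2 for v, (_, truth, _) in zip(values, rows)]
    return math.sqrt(sum(squared) / len(squared))

rows = simulate()
rmse_raw = rmse([obs for _, _, obs in rows], rows)
rmse_centered = rmse(center_by_site(rows), rows)
print(f"RMSE before harmonization: {rmse_raw:.3f} mm")
print(f"RMSE after site centering: {rmse_centered:.3f} mm")
```

Site centering recovers the pre-noised data well here only because every simulated site samples the same age range. When sites differ in age distribution, as in the present study, centering would also remove genuine age-related differences, which is the situation covariate-preserving methods are designed to handle. A fuller benchmark would swap in each harmonization tool of interest and also compare downstream regression coefficients, as described above.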

While our study focused on 4 widely adopted harmonization methods, these represent only a small number in a large array of available methods. There has been a recent explosion in methods that apply machine learning and other advanced multivariate techniques to tackle harmonization. For a more detailed discussion of machine learning in data harmonization, please see the supplementary section “Machine learning in data harmonization”. The dawn of the big data age has heralded the need for harmonization methods that operate well beyond neuroimaging data to flexibly and extensibly harmonize manifold data types from social media, mobile devices, and sensors (Agarwal et al., 2013; Davatzikos, 2019). The rapid proliferation of data harmonization methods and the ubiquity of machine learning applications will require careful vetting and rigorous comparisons between competing methods using standard criteria for ascertaining harmonization performance. The urgent goal of advancing open science will be facilitated by developing and embracing advanced harmonization methods (Foster and Deardorff, 2017).

4.1. Limitations

There are four major limitations in the present study. Firstly, we investigated age-related changes in cortical thickness, but only cross-sectional data were available. New approaches have been developed to harmonize data across scanners and sites as well as longitudinal visits (Beer et al., 2020; Dewey et al., 2019). Age-related cortical thinning estimated by one longitudinal study design was 3 times greater than cortical thinning from a cross-sectional study (Rast et al., 2018). Secondly, we only investigated cortical thickness, which is one of many brain measures that are disturbed in neuropsychiatric disorders. Further studies should investigate the performance of harmonization methods on multi-modal neuroimaging data with various anatomical, diffusion, functional, and clinical/behavioral measures. Thirdly, only three sites contributed participants under 20 years, and only one site contributed participants over 70 years. After removing these data, ComBat-GAM did not outperform other harmonization methods in detecting regions with significant case-control differences and age by diagnosis interactions. Data from sites with a larger age range may address this concern more conclusively. Finally, we applied the same statistical model to the output of all harmonization methods to pinpoint differences between harmonization methods rather than statistical models or the interaction of harmonization methodology and statistical modeling. In the main text, the statistical model includes age, sex, and PTSD diagnosis as fixed factors. This model is simple and widely used in most psychiatric studies. We also consider age by diagnosis and sex by diagnosis interactions because they are frequently tested in the literature. The statistical models listed in the main text may not fully reveal potential influences on cortical thickness, and the optimal statistical model may differ depending on the harmonization method. However, investigating potential interactions of harmonization method and statistical model is well beyond the scope of this study.

5. Conclusion

Cortical thickness data harmonized with ComBat-GAM, relative to LMEINT, LMEINT+SLP, and ComBat, are more sensitive at detecting significant case-control differences and case-control differences that vary by age. Both ComBat and ComBat-GAM outperformed LME methods in detecting significant sex differences. ComBat-GAM provides stronger estimates of age-related declines in cortical thickness, stronger estimates of case-related cortical thickness reduction, weaker estimates of age-related declines in cortical thickness in cases than controls, stronger estimates of cortical thickness reduction in females than males, and stronger estimates of cortical thickness reduction in females compared to males in cases than in controls. Our results support using ComBat-GAM to harmonize cortical thickness data across study sites to recover statistical power potentially lost by instrumental bias.

Supplementary Material

Supplementary material 2
Supplementary material 3
Supplementary material 1

Acknowledgments

DoD W81XWH-10-1-0925; Center for Brain and Behavior Research Pilot Grant; South Dakota Governor’s Research Center Grant; CX001600 VA CDA; NHMRC Program Grant #1073041; R01 MH111671; VISN6 MIRECC; German Research Foundation grant to J. K. Daniels (DA 1222/4-1 and WA 1539/8-2); VA RR&D 1IK2RX000709; NIMH R01-MH043454; NIMH K01-MH122774; NIMH K01 MH118428-01 (Suarez-Jimenez); 5U01AA021681-08; K24MH71434; K24 DA028773; R01 MH63407; K99NS096116; VA RR&D 1K1RX002325; VA RR&D 1K2RX002922; MH101380; ZonMw, the Netherlands organization for Health Research and Development grant to Miranda Olff (40-00812-98-10041); Academic Medical Center Research Council grant to Miranda Olff (110614); VA CSR&D 1IK2CX001680; VISN17 Center of Excellence pilot funding; NIMH R01MH105535; NIMH 1R21MH102634; German Federal Ministry of Education and Research (BMBF RELEASE 01KR1303A); German Research Society (Deutsche Forschungsgemeinschaft, DFG; SFB/TRR 58: C06, C07); R01MH117601; R01AG059874; MJFF 14848; MH098212; MH071537; M01RR00039; UL1TR000454; HD071982; HD085850; R21MH112956; Anonymous Women’s Health Fund; Kasparian Fund; Trauma Scholars Fund; Barlow Family Fund; NIMH K01 MH118467; W81XWH-08-2-0159; Department of Veterans Affairs via support for the National Center for PTSD; NIAAA via its support for (P50) Center for the Translational Neuroscience of Alcohol; NCATS via its support of (CTSA) Yale Center for Clinical Investigation; NIH R01 MH106574; F32MH109274; NIMH 1R21MH102634; R01MH113574; R01-MH103291; BOF 2-4 year project to Sven C. Mueller (01J05415); R01MH105355; Dana Foundation (to Dr. Nitschke); the University of Wisconsin Institute for Clinical and Translational Research; a National Science Foundation Graduate Research Fellowship (to Dr. Grupe); the National Institute of Mental Health (NIMH) R01 MH63407 (to De Bellis), R01 AA12479 (to De Bellis), and R01 MH61744 (to De Bellis); R01-MH043454 and T32-MH018931 (to Dr. Davidson); core grant to the Waisman Center from the National Institute of Child Health and Human Development (P30-HD003352); NIMH K23MH112873; Veterans Affairs Merit Review Program (10/01/08 - 09/30/13); L30 MH114379; South African Medical Research Council “SHARED ROOTS” Flagship Project; Grant-RFA-FSP-01-2013/SHARED ROOTS; South African Research Chair in PTSD from the Department of Science and Technology and the National Research Foundation; US Department of Defense Grant W81XWH08-2-0159 (PI: Stein, Murray B); VA RR&D I01RX000622; CDMRP W81XWH-08-2-0038; South African Medical Research Council; NARSAD Young Investigator; Department of Defense award number W81XWH-12-2-0012; ENIGMA was also supported in part by U54 EB020403 from the Big Data to Knowledge (BD2K) program; R56AG058854; R01MH116147; P41 EB015922; 1R01MH110483; 1R21 MH098198; R01MH105355-01A. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs, the United States Government, or any other funding sources listed here.

Conflicts of Interest

Dr. Abdallah has served as a consultant, speaker and/or on advisory boards for FSV7, Lundbeck, Psilocybin Labs, Genentech and Janssen, and editor of Chronic Stress for Sage Publications, Inc.; he has filed a patent for using mTOR inhibitors to augment the effects of antidepressants (filed on August 20, 2018). Dr. Davidson is the founder and president of, and serves on the board of directors for, the non-profit organization Healthy Minds Innovations, Inc. Dr. Jahanshad, Dr. Thompson and Dr. Ching received partial research support from Biogen, Inc. (Boston, USA) for research unrelated to the content of this manuscript. Dr. Krystal is a consultant for AbbVie, Inc., Amgen, Astellas Pharma Global Development, Inc., AstraZeneca Pharmaceuticals, Biomedisyn Corporation, Bristol-Myers Squibb, Eli Lilly and Company, Euthymics Bioscience, Inc., Neurovance, Inc., FORUM Pharmaceuticals, Janssen Research & Development, Lundbeck Research USA, Novartis Pharma AG, Otsuka America Pharmaceutical, Inc., Sage Therapeutics, Inc., Sunovion Pharmaceuticals, Inc., and Takeda Industries; is on the Scientific Advisory Board for Lohocla Research Corporation, Mnemosyne Pharmaceuticals, Inc., Naurex, Inc., and Pfizer; is a stockholder in Biohaven Pharmaceuticals; holds stock options in Mnemosyne Pharmaceuticals, Inc.; holds patents for Dopamine and Noradrenergic Reuptake Inhibitors in Treatment of Schizophrenia, US Patent No. 5,447,948 (issued September 5, 1995), and Glutamate Modulating Agents in the Treatment of Mental Disorders, U.S. Patent No. 8,778,979 (issued July 15, 2014); and filed a patent for Intranasal Administration of Ketamine to Treat Depression, U.S. Application No. 14/197,767 (filed on March 5, 2014); US application or Patent Cooperation Treaty international application No. 14/306,382 (filed on June 17, 2014); and filed a patent for using mTOR inhibitors to augment the effects of antidepressants (filed on August 20, 2018). Dr. Schmahl is a consultant for Boehringer Ingelheim International GmbH. Dr. Stein has received research grants and/or consultancy honoraria from Lundbeck and Sun. Dr. Lebois reports unpaid membership on the Scientific Committee for the International Society for the Study of Trauma and Dissociation (ISSTD) and spousal license payment for Vanderbilt IP from Acadia Pharmaceuticals unrelated to the topic of this manuscript.

Footnotes

CRediT authorship contribution statement

Delin Sun: Conceptualization, Methodology, Software, Formal analysis, Writing – original draft, Writing – review & editing. Gopalkumar Rakesh: Data curation, Project administration, Funding acquisition. Courtney C. Haswell: Data curation, Project administration, Funding acquisition. Mark Logue: Conceptualization, Methodology, Writing – review & editing. C. Lexi Baird: Data curation, Project administration, Funding acquisition. Erin N. O’Leary: Data curation, Project administration, Funding acquisition. Andrew S. Cotton: Data curation, Project administration, Funding acquisition. Hong Xie: Data curation, Project administration, Funding acquisition. Marijo Tamburrino: Data curation, Project administration, Funding acquisition. Tian Chen: Data curation, Project administration, Funding acquisition. Emily L. Dennis: Data curation, Project administration, Funding acquisition. Neda Jahanshad: Data curation, Project administration, Funding acquisition. Lauren E. Salminen: Data curation, Project administration, Funding acquisition. Sophia I. Thomopoulos: Data curation, Project administration, Funding acquisition. Faisal Rashid: Data curation, Project administration, Funding acquisition. Christopher R.K. Ching: Data curation, Project administration, Funding acquisition. Saskia B.J. Koch: Data curation, Project administration, Funding acquisition. Jessie L. Frijling: Data curation, Project administration, Funding acquisition. Laura Nawijn: Data curation, Project administration, Funding acquisition. Mirjam van Zuiden: Data curation, Project administration, Funding acquisition. Xi Zhu: Data curation, Project administration, Funding acquisition. Benjamin Suarez-Jimenez: Data curation, Project administration, Funding acquisition. Anika Sierk: Data curation, Project administration, Funding acquisition. Henrik Walter: Data curation, Project administration, Funding acquisition. Antje Manthey: Data curation, Project administration, Funding acquisition. Jennifer S. Stevens: Data curation, Project administration, Funding acquisition. Negar Fani: Data curation, Project administration, Funding acquisition. Sanne J.H. van Rooij: Data curation, Project administration, Funding acquisition. Murray Stein: Data curation, Project administration, Funding acquisition. Jessica Bomyea: Data curation, Project administration, Funding acquisition. Inga K. Koerte: Data curation, Project administration, Funding acquisition. Kyle Choi: Data curation, Project administration, Funding acquisition. Steven J.A. van der Werff: Data curation, Project administration, Funding acquisition. Robert R.J.M. Vermeiren: Data curation, Project administration, Funding acquisition. Julia Herzog: Data curation, Project administration, Funding acquisition. Lauren A.M. Lebois: Data curation, Project administration, Funding acquisition. Justin T. Baker: Data curation, Project administration, Funding acquisition. Elizabeth A. Olson: Data curation, Project administration, Funding acquisition. Thomas Straube: Data curation, Project administration, Funding acquisition. Mayuresh S. Korgaonkar: Data curation, Project administration, Funding acquisition. Elpiniki Andrew: Data curation, Project administration, Funding acquisition. Ye Zhu: Data curation, Project administration, Funding acquisition. Gen Li: Data curation, Project administration, Funding acquisition. Jonathan Ipser: Data curation, Project administration, Funding acquisition. Anna R. Hudson: Data curation, Project administration, Funding acquisition. Matthew Peverill: Data curation, Project administration, Funding acquisition. Kelly Sambrook: Data curation, Project administration, Funding acquisition. Evan Gordon: Data curation, Project administration, Funding acquisition. Lee Baugh: Data curation, Project administration, Funding acquisition. Gina Forster: Data curation, Project administration, Funding acquisition. Raluca M. Simons: Data curation, Project administration, Funding acquisition. Jeffrey S. Simons: Data curation, Project administration, Funding acquisition. Vincent Magnotta: Data curation, Project administration, Funding acquisition. Adi Maron-Katz: Data curation, Project administration, Funding acquisition. Stefan du Plessis: Data curation, Project administration, Funding acquisition. Seth G. Disner: Data curation, Project administration, Funding acquisition. Nicholas Davenport: Data curation, Project administration, Funding acquisition. Daniel W. Grupe: Data curation, Project administration, Funding acquisition. Jack B. Nitschke: Data curation, Project administration, Funding acquisition. Terri A. deRoon-Cassini: Data curation, Project administration, Funding acquisition. Jacklynn M. Fitzgerald: Data curation, Project administration, Funding acquisition. John H. Krystal: Data curation, Project administration, Funding acquisition. Ifat Levy: Data curation, Project administration, Funding acquisition. Miranda Olff: Data curation, Project administration, Funding acquisition. Dick J. Veltman: Data curation, Project administration, Funding acquisition. Li Wang: Data curation, Project administration, Funding acquisition. Yuval Neria: Data curation, Project administration, Funding acquisition. Michael D. De Bellis: Data curation, Project administration, Funding acquisition. Tanja Jovanovic: Data curation, Project administration, Funding acquisition. Judith K. Daniels: Data curation, Project administration, Funding acquisition. Martha Shenton: Data curation, Project administration, Funding acquisition. Nic J.A. van de Wee: Data curation, Project administration, Funding acquisition. Christian Schmahl: Data curation, Project administration, Funding acquisition. Milissa L. Kaufman: Data curation, Project administration, Funding acquisition. Isabelle M. Rosso: Data curation, Project administration, Funding acquisition. Scott R. Sponheim: Data curation, Project administration, Funding acquisition. David Bernd Hofmann: Data curation, Project administration, Funding acquisition. Richard A. Bryant: Data curation, Project administration, Funding acquisition. Kelene A. Fercho: Data curation, Project administration, Funding acquisition. Dan J. Stein: Data curation, Project administration, Funding acquisition. Sven C. Mueller: Data curation, Project administration, Funding acquisition. Bobak Hosseini: Data curation, Project administration, Funding acquisition. K. Luan Phan: Data curation, Project administration, Funding acquisition. Katie A. McLaughlin: Data curation, Project administration, Funding acquisition. Richard J. Davidson: Data curation, Project administration, Funding acquisition. Christine L. Larson: Data curation, Project administration, Funding acquisition. Geoffrey May: Data curation, Project administration, Funding acquisition. Steven M. Nelson: Data curation, Project administration, Funding acquisition. Chadi G. Abdallah: Data curation, Project administration, Funding acquisition. Hassaan Gomaa: Data curation, Project administration, Funding acquisition. Soraya Seedat: Data curation, Project administration, Funding acquisition. Ilan Harpaz-Rotem: Data curation, Project administration, Funding acquisition. Israel Liberzon: Data curation, Project administration, Funding acquisition. Theo G.M. van Erp: Data curation, Project administration, Funding acquisition. Yann Quidé: Data curation, Project administration, Funding acquisition. Xin Wang: Data curation, Project administration, Funding acquisition. Paul M. Thompson: Conceptualization, Methodology, Writing – review & editing. Rajendra A. Morey: Conceptualization, Methodology, Writing – review & editing, Investigation, Resources, Data curation, Supervision, Project administration, Funding acquisition.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.neuroimage.2022.119509.

All other authors have no conflicts of interest to declare.
