Human Brain Mapping. 2026 Jan 30;47(2):e70458. doi: 10.1002/hbm.70458

A Systematic Evaluation of the Performance of Multiple Brain Age Algorithms in Two Cohorts of Youth

Cleanthis Michael 1, Natasha S Jones 1, Jamie L Hanson 2,3, Heidi B Westerman 1, Kelly L Klump 4, Colter Mitchell 5,6, Christopher S Monk 1,5, S Alexandra Burt 4, Luke W Hyde 1,5
PMCID: PMC12856713  PMID: 41614428

ABSTRACT

The brain matures rapidly during childhood and adolescence. The environment may calibrate the pace of this process to shape cognition and mental health. Extending its utility as a risk marker from older to younger populations, brain age has been proposed to capture relative brain maturity in youth. Multiple algorithms have been developed to estimate brain age in predominantly White advantaged adults. Whether these models are useful in youth, particularly in more representative cohorts, remains unclear. Here, we systematically compare five influential algorithms (Drobinin, Whitmore, Pyment, Kaufmann, Centile) in two population‐based youth cohorts as a benchmark for future applied research. We examined (a) prediction accuracy (correlation with chronological age, mean absolute error), (b) sensitivity to scanning parameters (acquisition sequence, image quality), demographics (sex, puberty), and genetic similarity (intraclass correlations in pairs of monozygotic twins), and (c) strength of convergence between algorithms. In our primary sample of twins recruited from birth records to represent families in disadvantaged neighborhoods (N = 593; 9–19 years), three algorithms (Drobinin, Pyment, Centile) exhibited strong predictions from structural MRI data (correlations with chronological age = 0.51–0.68, mean absolute error = 1.60–3.02). These algorithms also generated correlated brain age values and gaps, and the expected pattern of strong but not identical intraclass correlations in monozygotic twins. Pyment exhibited the strongest correlation with age and was not sensitive to acquisition sequence, image quality, sex, and puberty. In a second sample of predominantly Black, low‐income youth with a narrow age range (N = 198; 15–17 years), these five algorithms exhibited weak predictions. 
This study raises critical questions about what “brain age” means, how it can best be estimated depending on the research question and study population, and whether it can be universally applied across samples with heterogeneous backgrounds and age ranges that are narrow or misaligned with the training data.

Keywords: adolescence, algorithm validity, brain age, developmental neuroscience, pace of brain development, population neuroscience


Performance of five influential brain age estimation algorithms in two population‐based cohorts of youth. Brain age was quantified using estimates of brain structure, and performance was evaluated based on correlations with chronological age, error indices, sensitivity to demographic characteristics, scanning parameters, genetic similarity, and convergence of predictions across algorithms.


1. Introduction

With rising rates of psychiatric disorders (Keyes et al. 2019), together with technological advances in biological sciences, the past decade has witnessed growing interest in establishing reliable biomarkers of healthy and maladaptive aging and development. One promising type of biomarker involves the divergence between an individual's biological (e.g., telomere length, epigenetics) and chronological age (Cole et al. 2019). Recently, algorithms have been developed to estimate an individual's age from neuroimaging features, called “brain age” (Cole and Franke 2017). Brain age has most commonly been estimated from brain structure using anatomical scans (Franke and Gaser 2019). Deviations between brain age and actual age are referred to as brain age gap (Cole et al. 2019) and characterize how an individual's brain maturation diverges from an expected aging trajectory, with values thought to reflect relatively accelerated (positive values) or delayed (negative values) brain aging (Cole et al. 2019). Brain age was originally developed for older populations and predicts risk for cognitive decline, neurodegenerative diseases such as dementia, and psychiatric conditions such as schizophrenia (Cole et al. 2019). Nevertheless, brain age is increasingly being applied in youth to approximate the pace of brain maturation, either using existing algorithms trained across the lifespan or new algorithms trained in youth. Given the existence of multiple algorithms, the best way to measure brain age remains unclear (Bashyam et al. 2020; Hahn et al. 2021), especially in younger populations (Somerville 2016; Whitmore et al. 2023; Whitmore and Beck 2025).

1.1. Brain Development During Childhood and Adolescence

The human brain exhibits a protracted developmental course, undergoing dramatic maturational remodeling across childhood and adolescence. Cortical thickness increases during the early postnatal period, followed by rapid thinning throughout childhood and adolescence (Fuhrmann et al. 2015; Rakesh et al. 2023; Tamnes et al. 2017). Similarly, cortical surface area, as well as cortical and subcortical volumes, expand across childhood and decline during adolescence (LeWinn et al. 2017; Mills et al. 2014; Rakesh et al. 2023; Wierenga et al. 2014). Despite these normative group‐level patterns of structural brain maturation in youth, meaningful individual differences in the trajectory and pace of neurodevelopment exist (Mills et al. 2021).

Theoretical models postulate that the pace of brain development is critical for adaptation. On the one hand, protracted brain development may provide an extended window during which youth can learn from and adapt to their environment (Tottenham 2020), but could also delay cognitive and socioemotional maturation, elevating risk for psychopathology (Johnson et al. 2016). On the other hand, accelerated brain development may shorten periods of peak plasticity, restricting children's ability to adapt to future experiences (Tottenham 2020). However, faster brain development may also allow youth to independently navigate their environments at an earlier age (Callaghan and Tottenham 2016). While challenges in longitudinal neuroimaging data collection and analysis (Telzer et al. 2018) introduce obstacles to estimating the pace of brain development, brain age has been used in cross‐sectional samples as a parsimonious, data‐driven biomarker of how an individual's trajectory of brain maturation deviates from the norm in youth.

1.2. Application and Validation of Brain Age

Over the last 5 years, an increasing number of studies have developed or implemented algorithms to estimate brain age in youth. These studies have evaluated how brain maturation is associated with developmental outcomes such as cognitive functioning (Erus et al. 2015; Kelly et al. 2022; Whitmore et al. 2023) and risk for psychopathology (Cropley et al. 2021; Keding et al. 2021; Kurth et al. 2022; MacSweeney et al. 2024; Sanford et al. 2022), and how it may be impacted by adversity (Beck et al. 2025; Drobinin et al. 2022; Keding et al. 2021; Rakesh et al. 2021). These studies usually implement different estimation algorithms and have reported inconsistent findings about whether advanced or delayed brain maturation is associated with adversity exposure, enhanced or compromised cognitive performance, or elevated versus reduced psychiatric symptoms. These findings demonstrate an urgent need to understand what brain age represents and how it can best be estimated in youth to appropriately characterize the environmental susceptibility and functional implications of brain maturation.

Parametric and demographic variations in existing algorithms create open questions about how distinct algorithms may perform across different cohorts of youth, particularly those with different demographic backgrounds. Different algorithms are trained on different neuroimaging parcels and features and therefore predict brain age using different aspects of brain organization. Similarly, each model is trained using different machine learning algorithms, such as regularized gradient boosting, deep learning, and Gaussian process regression. Demographically, the vast majority of existing models have been trained and validated in cohorts that primarily comprise adults (Kaufmann et al. 2019; Leonardsen et al. 2022; Yu et al. 2024), which may prevent them from accurately estimating brain age in youth. Most algorithms have also been trained in samples without psychiatric or neurological diagnoses, limiting generalizability to broader populations that intrinsically involve many individuals with these diagnoses. Finally, as many neuroimaging studies rely on convenience samples that are relatively homogenous across certain demographic features (e.g., socioeconomic status, race/ethnicity), findings from one study may not generalize to another study or broadly across a specific population (Falk et al. 2013; Hyde et al. 2024). Thus, even emerging algorithms trained in youth (Drobinin et al. 2022; Whitmore et al. 2023) may be challenging to apply in developmental cohorts with different demographic backgrounds because they were trained on relatively homogenous convenience samples. As such, several debates exist in the literature regarding which estimation algorithms can be adopted to estimate brain age in each study (Bashyam et al. 2020; Hahn et al. 2021), especially for youth (Whitmore and Beck 2025). 
These considerations prompt the need for evaluating algorithms in samples that contain heterogeneity across demographic features but have a clear sampling frame to precisely characterize the representativeness and generalizability of findings.

Recently, our team and others have rigorously compared different brain age algorithms in adults to make empirical recommendations for future research (Bacas et al. 2023; Dörfel et al. 2023; Hanson et al. 2024). These studies demonstrate that different algorithms exhibit different accuracy (correlations with chronological age, error indices such as mean absolute error [MAE] and root mean squared error [RMSE]), reliability (intraclass correlations across repeated scans), and sensitivity to head motion (associations with image quality and head motion). Such rigorous validation studies have not yet been conducted in youth. Evaluating the psychometric properties of different algorithms in youth can inform how future applied studies select or train brain age models and may clarify the currently mixed findings on how brain age gaps are associated with important contexts and outcomes in youth.

1.3. The Current Study

The present study investigated the performance of multiple brain age algorithms in two population‐based cohorts of youth to serve as a benchmark for future applied research. We selected three lifespan algorithms (Kaufmann et al. 2019; Leonardsen et al. 2022; Yu et al. 2024) that are well‐known, used frequently, and previously compared in validation studies in adults (Bacas et al. 2023; Dörfel et al. 2023; Hanson et al. 2024), as well as two recent algorithms developed specifically in youth (Drobinin et al. 2022; Whitmore et al. 2023). Our primary sample involved twins (9–19 years) recruited from birth records who were residing in neighborhoods with above‐average levels of poverty. This sampling frame resulted in a population‐based sample with enriched representation of disadvantaged families historically underrepresented in neuroimaging research and existing brain age models. To characterize algorithm performance, we probed associations and evaluation metrics between brain age and chronological age (i.e., correlations, MAEs, RMSEs). To identify sources of variability in brain age gaps that can inform which algorithms are adopted depending on the research question, we next evaluated each algorithm's sensitivity to scan parameters and demographic characteristics. Furthermore, since monozygotic twins have identical DNA, but different life experiences, they can be leveraged to examine whether these algorithms can detect differences in brain maturation among genetically identical youth while generating predictions that are correlated between twins as expected. Finally, we repeated our analyses in a second population‐based sample of predominantly Black, low‐income youth sampled in a narrower age range (15–17 years) recruited at birth from urban US hospitals to further interrogate the performance of existing brain age algorithms across youth from different cohorts, sampling frames, and demographic composition. 
As limited research has compared how different brain age algorithms perform across the lifespan generally, and in youth specifically, we did not specify a priori hypotheses about the relative performance of each estimation algorithm.

2. Materials and Methods

2.1. Participants

As described previously (Bezek et al. 2024; Michael et al. 2023), participants for the present study were part of the Michigan Twin Neurogenetics Study (MTwiNS), recruited from the Twin Study of Behavioral and Emotional Development—Child (TBED‐C), a project within the broader Michigan State Twin Registry (Burt and Klump 2013). Using birth records, the TBED‐C identified twin families living within 120 miles of East Lansing, MI, including urban (e.g., Detroit, Flint, Lansing), suburban, and rural areas. The TBED‐C included a population‐based arm (528 twin families) with children aged 6–10 years, and an “at‐risk” arm (502 twin families) from the same geographic region, but only recruited from neighborhoods with above‐median levels of poverty (> 10.5% of families in the neighborhood living below the poverty line, the median at study onset; Burt and Klump 2019). In a follow‐up neuroimaging study, MTwiNS recruited families from the “at‐risk” arm, as well as those in the population‐based arm that would have met “at‐risk” criteria (i.e., living in neighborhoods with above‐median levels of poverty). Although TBED‐C and MTwiNS recruited twins to parse genetic and environmental influences on brain development, risk, and resilience, twins are broadly representative of singletons in the population (Willemsen et al. 2021). This dataset can therefore provide important insight into how different brain age algorithms perform in population‐based cohorts of youth.

We have assessed 708 twins from 354 families for MTwiNS. Of these, 600 twins had structural magnetic resonance imaging (MRI) data required for brain age estimations (see Table 1), and seven participants were excluded due to either neurological diagnoses (n = 2) or low MRI data quality (n = 5) (see Section 2.3). Thus, the final sample included 593 participants from 320 families (54.0% male; 80.1% White, 11.6% Black, 0.5% Asian, 1.0% Latino/Latina, 0.7% Native American, 0.8% Pacific Islander, 5.3% “Other” ethnic/racial group membership). Youth were between 9 and 19 years, though 96.3% of the sample was between 10 and 18 years (M [SD] = 14.76 [2.03] years). Relative to excluded participants, included participants were significantly older by an average of 1.02 years (p fdr = 0.003), but did not significantly differ with respect to sex assigned at birth, race/ethnicity, household income, or parental education (all p fdr's > 0.050). Participants' guardians provided informed consent and participants provided assent in compliance with Institutional Review Board policies and American Psychological Association ethical standards in the treatment of human participants.

TABLE 1.

Summary of magnetic resonance imaging data included in the Michigan Twin Neurogenetics Study.

|                                                                          | Number lost | Participants with data |
| ------------------------------------------------------------------------ | ----------- | ---------------------- |
| Original sample                                                          |             | 708                    |
| Declined MRI scan (including declining to remove jewelry/piercings)      | 27          |                        |
| Uncomfortable with MRI scan                                              | 18          |                        |
| Dental (e.g., braces, retainer)                                          | 17          |                        |
| Metal in/on the body ᵃ (including recent surgery)                        | 13          |                        |
| Exceeding scanner size restrictions (e.g., overweight, broad shoulders)  | 5           |                        |
| Major medical/neurological disorder (e.g., autism spectrum, TBI, tumor)  | 8           |                        |
| Incomplete scan (no anatomical scan administered)                        | 10          |                        |
| Total lost                                                               | 98          |                        |
| Sample with completed MRI session                                        |             | 610                    |
| Not processed due to noisy data (e.g., excessive movement)               | 10          |                        |
| Major medical/neurological disorder (e.g., tumor, cortical dysplasia)    | 2           |                        |
| Over 80% of regions with failed visual QC                                | 5           |                        |
| Total lost                                                               | 17          |                        |
| Final sample with imaging data                                           |             | 593                    |

ᵃ Non‐MRI safe implanted medical devices, having BBs/pellets or other nonremovable metal inside of the body, recent surgery, metallic tattoos, unremovable jewelry.

2.2. Neuroimaging Data Acquisition and Processing

Details regarding neuroimaging acquisition and processing pipelines are described in previous work (Bezek et al. 2024; Michael et al. 2023). Briefly, each participant was scanned with one of two research‐dedicated GE Discovery MR750 3T scanners located at the University of Michigan Functional MRI Laboratory. To take advantage of improvements in MRI data acquisition and harmonize our protocol with the Adolescent Brain Cognitive Development (ABCD) Study (Casey et al. 2018), we modified our acquisition protocol after the first 140 families. For the first 140 families (280 twins), high‐resolution T1‐weighted SPGR images were acquired via an 8‐channel head coil (156 1 mm‐thick slices, TI = 500 ms, flip angle = 15°, FOV = 25.6 cm). For the remaining 214 families (428 twins), high‐resolution T1‐weighted SPGR images were acquired via a 32‐channel head coil (208 1 mm‐thick slices, TI = 1060 ms, flip angle = 8°, FOV = 25.6 cm). Anatomical images from both sequences were aligned with the AC‐PC plane. We purposefully retained the same slice thickness to better harmonize the two sequences.

While certain brain age estimation algorithms use raw T1‐weighted scans, others require structural neuroimaging data already processed through FreeSurfer. As such, T1‐weighted scans were processed using FreeSurfer software (version 6.0.0, http://surfer.nmr.mgh.harvard.edu/; Fischl 2012). Briefly, the FreeSurfer automated processing stream implements skull stripping, transformation to standard (Talairach) stereotactic space, subcortical structure labeling, surface extraction, spherical registration, and cortical parcellation. We resampled the imaging data for each participant to an average participant and performed surface smoothing using the qcache command and a 10 mm full‐width half‐maximum Gaussian kernel. We applied the Desikan–Killiany atlas (Desikan et al. 2006) and aseg atlas (Fischl et al. 2002) to parcellate and segment the brain into cortical and subcortical regions of interest, respectively.

2.3. Quality Assessment of Neuroimaging Data

Given extensive evidence that T1‐weighted image quality is systematically associated with estimates of brain structure (Gilmore et al. 2021), we first sought to quality check (QC) our structural MRI data. Using the FreeSurfer output, we first conducted qualitative QC to exclude participants with low‐quality neuroimaging data. We followed the same QC protocol as the ENIGMA consortium (Gao et al. 2024). Briefly, visual QC was performed by trained research assistants who, for each participant, flagged each evaluated region as pass/fail based on whether the segmentation and parcellation profiles were appropriate, overextended, or underextended. Then, quantitative analysis was conducted for each region to confirm the absence of outliers.

To complement our qualitative QC approach, we also extracted the Euler number generated by FreeSurfer (Rosen et al. 2018). The Euler number quantifies the topological complexity of the reconstructed cortical surface and is computed as the number of vertices minus the number of edges plus the number of faces (V − E + F). FreeSurfer generates one Euler number for each hemisphere, so we summed across both hemispheres to derive a single Euler number for the entire brain per participant. The Euler number has previously been shown to converge with human ratings of image “usability” at high accuracy (Rosen et al. 2018). A lower Euler number represents a cortical surface reconstruction with a greater number of holes, and thus lower image quality. We used this value to estimate whether image quality was systematically associated with brain age gaps (see Section 2.5).
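As an illustrative sketch (with hypothetical vertex, edge, and face counts, not values from this study), the per‐hemisphere Euler characteristic and the whole‐brain sum described above can be computed as:

```python
def euler_number(n_vertices: int, n_edges: int, n_faces: int) -> int:
    """Euler characteristic of a triangulated surface: V - E + F.
    A topologically perfect, sphere-like hemisphere surface yields 2;
    each surface defect (hole/handle) lowers the value."""
    return n_vertices - n_edges + n_faces

# Hypothetical per-hemisphere counts for one participant (a closed
# triangulated surface satisfies E = 3F/2 and F = 2V - 4).
lh = euler_number(n_vertices=140_000, n_edges=419_994, n_faces=279_996)
rh = euler_number(n_vertices=141_000, n_edges=422_996, n_faces=281_998)

# Sum across hemispheres for a single whole-brain quality index.
total_euler = lh + rh
```
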

2.4. Brain Age Algorithms

We implemented multiple algorithms to estimate brain age, each trained and validated in either developmental or lifespan samples without psychiatric or other diagnoses. These include the Pyment (Leonardsen et al. 2022), Drobinin (Drobinin et al. 2022), Whitmore (Whitmore et al. 2023), Kaufmann (Kaufmann et al. 2019), and Centile algorithms (Yu et al. 2024). We focused specifically on these algorithms for three main reasons. First, the lifespan algorithms we applied (Pyment, Kaufmann, Centile) were chosen based on: development in influential brain age studies, common use in the literature, large sample size in training datasets, heterogeneity in machine learning approaches, and previous comparison in validation studies in adult cohorts (Bacas et al. 2023; Dörfel et al. 2023; Hanson et al. 2024). Second, given our interest in facilitating the use of brain age to study development, we added two of the first algorithms (Drobinin, Whitmore) that were developed in cohorts of youth (Drobinin et al. 2022; Whitmore et al. 2023). Third, we focused on algorithms with easily accessible, up‐to‐date public software that can be readily disseminated in line with recommendations for open, reproducible science. No features used to train these algorithms were missing from our data.

Each algorithm generated a predicted brain age value. We subsequently subtracted each participant's chronological age from their predicted brain age to generate brain age gaps. Positive brain age gaps represent a brain predicted to be older than expected relative to the training sample, thought to indicate relatively “accelerated” brain maturation. In contrast, negative brain age gaps represent a brain predicted to be younger than expected relative to the training sample, thought to indicate relatively “delayed” brain maturation.
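The gap computation described above amounts to a simple subtraction; a minimal sketch with hypothetical ages (not data from this study):

```python
import numpy as np

# Hypothetical predicted brain ages (years) from one algorithm,
# alongside chronological ages for the same participants.
predicted_brain_age = np.array([15.2, 11.8, 17.5, 14.0])
chronological_age = np.array([14.0, 12.5, 16.0, 14.0])

# Brain age gap = predicted brain age - chronological age.
# Positive -> older-looking brain ("accelerated" maturation);
# negative -> younger-looking brain ("delayed" maturation).
brain_age_gap = predicted_brain_age - chronological_age
```
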

2.4.1. Drobinin Brain Age

The Drobinin algorithm was trained on a developmental sample aged 9–19 years who did not meet criteria for any psychiatric disorders (Drobinin et al. 2022). This algorithm estimated each participant's brain age from 136 cortical features (bilateral cortical gray matter volume and surface area from 34 regions per hemisphere based on the Desikan–Killiany atlas; Desikan et al. 2006), and 53 bilateral global features (e.g., intracranial volume) and subcortical volumes (e.g., amygdala), resulting in 189 total features estimated from FreeSurfer. Machine learning was conducted using the XGBoost algorithm, which uses gradient tree boosting to predict brain age. Two brain age values were extracted, one with and one without bias correction. Bias correction accounts for prediction bias toward the group mean (Smith et al. 2019), which in this context involves slightly overestimating age in younger participants and slightly underestimating age in older participants. Relevant code for this algorithm is located here: https://github.com/GitDro/DevelopmentalBrainAge.
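Bias‐correction procedures vary across algorithms; one common variant of the Smith et al. (2019) approach, sketched here for illustration (not the exact Drobinin implementation), regresses the raw gap on chronological age and removes the fitted trend:

```python
import numpy as np

def bias_correct(predicted_age, chronological_age):
    """Illustrative regression-to-the-mean correction: fit a line to
    the raw gap as a function of chronological age and subtract it,
    leaving a corrected gap that is uncorrelated with age."""
    predicted_age = np.asarray(predicted_age, float)
    chronological_age = np.asarray(chronological_age, float)
    gap = predicted_age - chronological_age
    slope, intercept = np.polyfit(chronological_age, gap, deg=1)
    return gap - (slope * chronological_age + intercept)
```
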

2.4.2. Whitmore Brain Age

The Whitmore algorithm was trained on a sample of early adolescents from the ABCD Study with a narrow age range. Two separate models were trained: one using the baseline cohort (9–10 years) and one using the year‐2 follow‐up cohort (11–12 years). The models were trained using the same approach as the Drobinin algorithm (Drobinin et al. 2022), including XGBoost gradient tree boosting to predict brain age, but this time from only 175 features (as only 175 out of 189 features were available in the ABCD dataset) estimated from FreeSurfer. Three brain age values were extracted: (a) a baseline prediction without bias correction, (b) a baseline prediction with bias correction, and (c) a follow‐up prediction without bias correction. Relevant code for this algorithm is located here: https://github.com/LucyWhitmore/BrainAGE‐Maturation.

2.4.3. Pyment Brain Age

The Pyment algorithm was trained on one of the largest (N = 34,285) and most heterogeneous datasets assembled (21 nonoverlapping, publicly available neuroimaging datasets), with ages ranging from 3 to 95 years, stratified by age and study (Leonardsen et al. 2022). Only participants without psychiatric, neurological, or other diagnoses were included to train the model. Briefly, this algorithm implements a Simple Fully Convolutional Network on T1‐weighted structural MRI images; this deep convolutional neural network architecture came first among 79 participating teams in the 2019 Predictive Analytics Competition for brain age prediction (Gong et al. 2021). T1‐weighted images were partially preprocessed in FreeSurfer (using the ‐autorecon1 command). This algorithm extracts a single brain age value. Additional technical details are available in the original report. Relevant code for this algorithm is located here: https://github.com/estenhl/pyment‐public.

2.4.4. Kaufmann Brain Age

The Kaufmann algorithm was trained on a large (N = 35,474) and heterogeneous sample (combining across 40 neuroimaging datasets), with ages ranging from 3 to 89 years (Kaufmann et al. 2019). Only participants without psychiatric, neurological, or other diagnoses were included to train the model. This algorithm estimated each participant's brain age separately for males versus females, based on 1118 structural features. These features included thickness, surface area, and gray matter volume from a multimodal parcellation of the cortex, subcortex, and cerebellum (Glasser et al. 2016). This algorithm applies gradient tree boosting to predict brain age. We implemented this algorithm by first completing standard processing approaches in FreeSurfer, including motion correction and intensity normalization of T1‐weighted images, removal of non‐brain tissue, transformation to Talairach space, segmentation of white matter and gray matter volumetric structures, and estimation of cortical thickness. We then converted standard FreeSurfer outputs, which use the Desikan–Killiany atlas, to the HCP‐MMP1.0 parcellation to extract the 1118 structural features utilized by the Kaufmann algorithm (tutorial located here: https://cjneurolab.org/2016/11/22/hcp‐mmp1‐0‐volumetric‐nifti‐masks‐in‐native‐structural‐space/). This algorithm extracts a single brain age value. Relevant code for this algorithm is located here: https://github.com/tobias‐kaufmann/brainage.

2.4.5. Centile Brain Age

The Centile BrainAGE 2 model was trained on a large discovery sample (N = 35,683) that pooled structural MRI data from datasets across Australia, East Asia, Europe, and North America, with ages ranging from 5 to 90 years (Yu et al. 2024). This model was in turn validated on independent replication (N = 2101, aged 8–80 years) and longitudinal consistency (N = 377, aged 9–25 years) samples. Similar to the other included algorithms, only healthy controls without psychiatric, medical, or neurological morbidity or cognitive impairment were used to train the model. The algorithm applies support vector regression (SVR) with a radial basis function (RBF) kernel to 150 morphometric features (including measures of cortical thickness, cortical surface area, and subcortical volume) that are extracted from T1‐weighted images processed using standard FreeSurfer pipelines. This algorithm estimates brain age separately for males and females, and extracts two brain age values, one with and one without bias correction. Relevant procedures for this algorithm are located here: https://centilebrain.org/#/brainAge2.

2.5. Statistical Analyses

2.5.1. Performance of Brain Age Estimation Algorithms

We adopted multiple steps to quantitatively evaluate the performance of each algorithm. First, we conducted zero‐order Pearson correlations between brain age and chronological age to assess whether these two measures of aging are closely interrelated. While there is no consensus on the correlation magnitude that would be considered “acceptable,” prior empirical research reports correlations between brain age and chronological age with a magnitude of ~0.90 in adults (Franke and Gaser 2019), compared to a relatively wider magnitude range of 0.50 to 0.90 in youth (Drobinin et al. 2022; Keding et al. 2021; Rakesh et al. 2021). For the purposes of the present study, we consider correlations > 0.50 as demonstrating relatively appropriate performance.
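A minimal sketch of this evaluation step, using hypothetical ages rather than study data:

```python
import numpy as np

def pearson_r(x, y):
    """Zero-order Pearson correlation between two vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical chronological ages and brain age predictions (years).
chronological = np.array([9.0, 11.0, 13.0, 15.0, 17.0, 19.0])
predicted = np.array([10.1, 11.5, 12.2, 16.0, 16.4, 20.3])

r = pearson_r(predicted, chronological)
# Working benchmark from the text: r > 0.50 counts as appropriate.
meets_benchmark = r > 0.50
```
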

Second, we also computed several fit statistics to quantify the precision of brain age predictions: the root mean squared error (RMSE), the coefficient of determination capturing the percentage of variance in brain age explained by chronological age (R‐squared; RSQ), and the MAE. RMSE is the square root of the average squared difference between predicted and true values, while MAE averages the absolute differences between predicted and true values. Both are expressed in the same units as the original data (i.e., years), but because errors are squared before averaging, RMSE penalizes large errors more heavily. Absolute cut‐off values for “good” performance have not been established, as they can vary depending on the age range and scale of the data, among other parameters. However, MAE values ranging between 1 and 5 years have been consistently reported in adult and youth cohorts (Cole and Franke 2017; Drobinin et al. 2022; Keding et al. 2021; Rakesh et al. 2021; Whitmore et al. 2023). Here, we consider MAE values between 1 and 5 years as demonstrating appropriate MAE performance. As RMSE and RSQ are not commonly reported in most existing brain age studies in youth, we evaluated their performance in relative terms against the fit statistics of all the algorithms examined.
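These fit statistics can be sketched as follows (RSQ computed here as the squared Pearson correlation, matching the variance‐explained definition above; the input values are hypothetical):

```python
import numpy as np

def fit_statistics(predicted, actual):
    """MAE, RMSE, and R-squared for brain age predictions (years)."""
    predicted = np.asarray(predicted, float)
    actual = np.asarray(actual, float)
    err = predicted - actual
    mae = float(np.abs(err).mean())          # mean absolute error
    rmse = float(np.sqrt((err ** 2).mean())) # root mean squared error
    rsq = float(np.corrcoef(predicted, actual)[0, 1] ** 2)
    return mae, rmse, rsq

# Hypothetical predictions vs. true ages (years).
mae, rmse, rsq = fit_statistics([11.0, 13.0, 15.0], [10.0, 13.0, 16.0])
```
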

2.5.2. Sensitivity of Brain Age Estimation Algorithms to Scan Parameters

We next sought to investigate the sensitivity of our brain age estimation algorithms to different potential sources of variation and noise. We conducted zero‐order correlations between brain age gaps with (a) neuroimaging acquisition sequence (8‐channel versus 32‐channel head coil; point biserial) and (b) image quality (Euler number; Pearson). Given the wide variability in acquisition sequence across neuroimaging studies and in image quality within and across studies, estimation algorithms that are robust to such sources of variability would likely be more generalizable across different datasets.
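Because the acquisition‐sequence variable is binary, the point‐biserial correlation reduces to a Pearson correlation with a 0/1 predictor; a sketch with hypothetical values:

```python
import numpy as np

# Hypothetical brain age gaps (years) and a binary acquisition code
# (0 = 8-channel coil, 1 = 32-channel coil); values are illustrative.
gaps = np.array([0.8, -0.3, 1.1, 0.2, -0.5, 0.9])
coil = np.array([0, 0, 0, 1, 1, 1])

# With a 0/1 predictor, the point-biserial correlation equals the
# ordinary Pearson correlation between the two variables.
r_pb = float(np.corrcoef(gaps, coil)[0, 1])
# Algorithms robust to acquisition sequence should yield |r_pb| near 0.
```
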

2.5.3. Demographic Variation in Brain Age Estimations

Given established differences in neurodevelopment as a function of sex and puberty, we next examined how brain age estimations from each algorithm are associated with demographic background. Specifically, we conducted zero‐order correlations between brain age gaps for each algorithm with (a) sex assigned at birth (male versus female; point biserial) and (b) pubertal status (Pearson). Pubertal maturation was operationalized using continuous scores from the parent‐reported Pubertal Development Scale (PDS). PDS scores have been shown to correlate with physician ratings of pubertal maturation (Petersen et al. 1988). Two participants did not complete the PDS, resulting in a total sample size of n = 591 for this analysis. These results can inform which algorithms are sensitive to sex and puberty, and thus may be most appropriate depending on the research question of each study (e.g., whether research questions focus on sex differences or pubertal maturation).

2.5.4. Twin Correlations in Brain Age Estimations

We calculated intraclass correlation coefficients (ICC) by estimating how brain age gaps are correlated within pairs of monozygotic twins. Since monozygotic twins have identical DNA but at least some different life experiences (and in our sample, at least a decade to diverge in their maturation), we expected that better‐performing algorithms would be able to differentiate between co‐twins (i.e., ICC < 1.0). At the same time, because monozygotic twins have identical DNA and share many aspects of their environment, their brain maturation should also be relatively similar, such that better‐performing algorithms should result in relatively strong correlations (e.g., ICC > 0.30, though note that this is a relatively arbitrary cut‐off).
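For exchangeable pair members such as co‐twins, the ICC can be computed from a one‐way random‐effects ANOVA decomposition; the sketch below is illustrative (dedicated packages such as pingouin in Python or irr in R implement the full ICC family):

```python
import numpy as np

def twin_icc(twin1, twin2):
    """One-way random-effects ICC for exchangeable pair members:
    ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW), with k = 2 raters
    per pair and MSB/MSW the between- and within-pair mean squares."""
    x = np.column_stack([twin1, twin2]).astype(float)
    n_pairs, k = x.shape
    grand = x.mean()
    pair_means = x.mean(axis=1)
    msb = k * ((pair_means - grand) ** 2).sum() / (n_pairs - 1)
    msw = ((x - pair_means[:, None]) ** 2).sum() / (n_pairs * (k - 1))
    return float((msb - msw) / (msb + (k - 1) * msw))
```
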

2.5.5. Prediction Convergence Across Brain Age Algorithms

Although different algorithms estimate brain age from distinct neuroimaging features, are trained in unique reference populations, and apply heterogeneous machine learning algorithms (Bashyam et al. 2020; Hahn et al. 2021), they all attempt to estimate the same theoretical construct of “brain age.” The predicted brain age values and derived brain age gaps of different algorithms should consequently be correlated. To empirically cross‐reference this theoretical assumption, we conducted zero‐order Pearson correlations among (a) the predicted brain age values and (b) brain age gaps from each estimation algorithm.
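The convergence check above amounts to a pairwise correlation matrix over algorithms. A minimal sketch with hypothetical gap values (a shared signal plus algorithm‐specific noise; the algorithm labels are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical brain age gaps from three algorithms for the same participants
shared = rng.normal(0.0, 1.0, size=300)        # common "brain age" signal
gaps = np.vstack([
    shared + rng.normal(0.0, 0.8, size=300),   # e.g., "Drobinin"
    shared + rng.normal(0.0, 0.8, size=300),   # e.g., "Pyment"
    shared + rng.normal(0.0, 0.8, size=300),   # e.g., "Centile"
])

# Pairwise zero-order Pearson correlations among algorithms (3 x 3 matrix)
mat = np.corrcoef(gaps)
```

Off‐diagonal entries of `mat` quantify how strongly any two algorithms converge on the same construct.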

2.5.6. Analyses in an Independent Population‐Based Cohort of Youth

To further investigate the appropriateness of existing brain age algorithms for cohorts with different identities and backgrounds, we also studied a separate neuroimaging sample: the Study of Adolescent Neural Development (SAND). Briefly, as described in prior work (Hardi et al. 2022; Michael et al. 2024), participants were recruited from the Future of Families and Child Wellbeing Study (FFCWS), a population‐based sample of 4898 youth born in large US cities (population > 200,000), oversampled (3:1) for nonmarital births. Given the demographics of the US and urban hospitals, this sampling frame resulted in high representation of low‐income, ethnoracially minoritized families (Reichman et al. 2001).

Data were collected at birth and at ages 1, 3, 5, 9, and 15 years through telephone and home visits. In a follow‐up sub‐study (SAND), 237 youth aged 15–17 years from Detroit (MI), Toledo (OH), and Chicago (IL) were recruited for neuroimaging. Of these, 208 participants completed the scan and had the structural MRI data necessary for brain age estimation (see Table 2), and 10 participants were excluded due to a substantial coverage cut‐off (n = 5), incidental findings (tumor; n = 1), or failing visual QC for > 80% of brain regions (n = 4). As such, the final sample included 198 participants (M [SD] = 15.85 [0.52] years; 55.1% female). In light of the sampling frame and demographics of these specific cities, particularly Detroit, most participants identified as Black Non‐Hispanic (78.8% Black Non‐Hispanic, 10.6% White, 5.6% Hispanic, 5.0% Other/Multiple). Parents provided written informed consent, and youth provided oral assent across all waves. The FFCWS and SAND were approved by the Institutional Review Boards of Princeton University and the University of Michigan and are in compliance with American Psychological Association ethical standards in the treatment of human participants.

TABLE 2.

Summary of magnetic resonance imaging data included in the Study of Adolescent Neural Development.

Number lost Participants with data
Original sample 237
Declined/missed MRI scan (including declining to remove jewelry/piercings) 8
Dental (e.g., braces, retainer) 12
Metal in/on the body a (including recent surgery) 1
Exceeding scanner size restrictions (e.g., overweight, broad shoulders) 5
Major medical/neurological disorder (e.g., autism spectrum, TBI, tumor) 3
Total lost 29
Sample with completed MRI session 208
Substantial parietal coverage cut‐off 5
Major medical/neurological disorder (e.g., tumor) 1
Over 80% of regions with failed visual QC 4
Total lost 10
Final sample with imaging data 198
a Non‐MRI safe implanted medical devices, having BBs/pellets or other nonremovable metal inside of the body, recent surgery, metallic tattoos, unremovable jewelry.

Details regarding neuroimaging acquisition and processing pipelines are described in previous work (Hardi et al. 2022; Michael et al. 2024). Briefly, participants were scanned with a research‐dedicated GE Discovery MR750 3T scanner with an 8‐channel head coil located at the University of Michigan Functional MRI Laboratory. High‐resolution T1‐weighted SPGR images were acquired (TR = 12 ms, TE = 5 ms, TI = 500 ms, flip angle = 15°, FOV = 26 cm, slice thickness = 1.4 mm, 256 × 192 matrix, 110 slices, voxel size = 1 mm × 1 mm × 1 mm). Anatomical images were aligned with the AC‐PC plane. T1‐weighted scans were processed using FreeSurfer software and quality assessed using pipelines identical to those used in MTwiNS. No features used to train the brain age algorithms were missing from our data. We examined associations between brain age and chronological age, and derived fit statistics to evaluate each algorithm in the SAND cohort.

3. Results

3.1. Algorithm Performance

We first examined the performance of each brain age algorithm in our primary sample using a multistep approach combining descriptive and inferential levels of analysis (see Figure 1A and Table 3). First, we found that most algorithms generated brain age values largely within the age range of our sample, with two exceptions. The Whitmore models systematically underpredicted age by generating brain age values that were restricted within the age range of the training sample. As such, the oldest brain age value (e.g., 10.58 years in the baseline model without bias correction) was highly discrepant from the oldest actual age in our sample (19.75 years). In contrast, the Kaufmann model systematically overpredicted age by generating brain age values that were much higher than the oldest actual age in our sample (53.85 years).

FIGURE 1.

FIGURE 1

Training and predicted age ranges by brain age estimation algorithm and developmental cohort. Schematic illustrates the age range included in each training sample (gray) and each predicted age range (orange), benchmarked to the actual age range (green). Age ranges are depicted for (A) the Michigan Twin Neurogenetics Study (MTwiNS); (B) the Study of Adolescent Neural Development (SAND); (C) the MTwiNS cohort restricted to the age range of the SAND cohort; (D) the MTwiNS cohort restricted to youth who identified as an ethnic/racial minority (i.e., not White); (E) the MTwiNS cohort restricted to youth living below 200% of the poverty line; (F) the MTwiNS cohort restricted to the age range of the Adolescent Brain Cognitive Development (ABCD) Study at baseline; and (G) the MTwiNS cohort restricted to the age range of the ABCD Study at follow‐up (FU). BC, bias corrected.

TABLE 3.

Performance of brain age estimation algorithms in the Michigan Twin Neurogenetics Study.

Brain age algorithm Minimum predicted age Maximum predicted age Correlation with chronological age MAE RSQ RMSE
Drobinin 9.95 18.44 0.51*** 1.60 26.2% 1.96
Drobinin (bias corrected) 6.43 21.88 0.51*** 2.26 26.2% 2.75
Whitmore 9.64 10.58 0.34*** 4.66 11.6% 5.05
Whitmore (bias corrected) 8.55 13.17 0.34*** 3.94 11.6% 4.34
Whitmore (follow‐up) 11.36 12.78 0.41*** 2.94 16.4% 3.32
Pyment 9.09 28.05 0.68*** 1.78 46.2% 2.40
Kaufmann 8.57 53.85 0.44*** 6.61 19.6% 9.22
Centile 7.35 32.31 0.53*** 3.02 27.6% 3.91
Centile (bias corrected) 5.53 32.09 0.61*** 2.88 36.7% 3.70

Note: N = 593. ***p < 0.001.

Abbreviations: MAE, mean absolute error; RMSE, root mean squared error; RSQ, R‐squared.
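The fit indices in the table can be reproduced from predicted and actual ages as follows. Note that the reported RSQ values match the squared Pearson correlation between predicted and actual age (e.g., r = 0.68 → 46.2%), which is the convention assumed in this sketch:

```python
import numpy as np

def fit_metrics(actual, predicted):
    """Return (MAE, RMSE, RSQ) for a set of brain age predictions.

    RSQ is computed as the squared Pearson correlation between predicted
    and actual age, matching the pattern in the table above.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    mae = np.mean(np.abs(err))                 # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))          # root mean squared error
    r = np.corrcoef(actual, predicted)[0, 1]   # correlation with actual age
    return mae, rmse, r ** 2
```

Because RSQ here reflects only covariation, a model can have a high RSQ yet large MAE if its predictions are systematically shifted, which is why both families of indices are reported.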

Second, we investigated the correlation between brain age and chronological age for each algorithm. Effect sizes ranged from substantially lower than those reported in the previous literature to consistent with prior brain age studies in youth. The Pyment‐predicted age exhibited the strongest correlation with actual age (r = 0.68), followed by the Centile (r's = 0.53 and 0.61) and Drobinin predictions (r = 0.51). The Kaufmann‐predicted age showed a weaker correlation (r = 0.44), and the Whitmore predictions exhibited the weakest correlations (r's = 0.34 and 0.41).

Finally, we evaluated different indices of model performance for each algorithm. We found that the Pyment, Centile, and Drobinin algorithms had the smallest errors (MAE, RMSE) and the highest variance explained (RSQ), with errors around the lower bounds of those reported in previous studies (e.g., MAE = 1.78). In contrast, the Whitmore and Kaufmann algorithms had the largest errors (MAE, RMSE) and the lowest variance explained (RSQ), with errors around or outside the upper bounds of those reported in previous studies (e.g., MAE = 6.61). Collectively, these analyses consistently indicate that the Pyment, Drobinin, and Centile algorithms provide the best fit, while the Whitmore and Kaufmann algorithms provide the poorest fit in MTwiNS.
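The "bias corrected" model variants in Table 3 adjust for the well‐known regression‐to‐the‐mean in brain age prediction; the exact procedure for each model is described in its original publication and may differ from the sketch below. One widely used linear correction regresses predicted age on actual age in a reference set and then inverts that line for new predictions:

```python
import numpy as np

def bias_correct(pred_ref, age_ref, pred_new):
    """Linear age-bias correction (one widely used variant; a hypothetical
    sketch, not necessarily the procedure used by any specific model here).

    Fit predicted = slope * actual + intercept in a reference set, then
    invert that line to de-bias new predictions.
    """
    slope, intercept = np.polyfit(age_ref, pred_ref, deg=1)
    return (np.asarray(pred_new, dtype=float) - intercept) / slope
```

When predictions are perfectly linearly biased, the correction recovers the true ages exactly; in practice it removes the systematic over‐ or underprediction at the tails of the age range.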

3.2. Prediction Sensitivity to Scan Parameters

Beyond the performance of each estimation algorithm, we additionally characterized each algorithm's sensitivity to neuroimaging parameters by correlating brain age gaps with acquisition sequence (8‐channel or 32‐channel head coil) and scan quality (Euler number) (see Figure 2A). We found that the developmental algorithms (Drobinin, Whitmore) generated more advanced brain age gaps for the 8‐channel head coil and lower‐quality scans, while two lifespan algorithms (Kaufmann, Centile) generated more advanced brain age gaps for the 32‐channel head coil and higher‐quality scans. Lastly, one lifespan algorithm (Pyment) was not sensitive to acquisition sequence and was minimally sensitive to scan quality, suggesting that this algorithm may be largely robust to systematic variation in scanning parameters that are conceptually unrelated to brain maturation.

FIGURE 2.

FIGURE 2

Correlation maps demonstrating the convergence and sensitivity to neuroimaging and demographic variation of brain age estimation algorithms in the Michigan Twin Neurogenetics Study. (A, B) Correlations between brain age gaps estimated from different algorithms with each other, enclosed by dashed lines, as well as with (A) neuroimaging scan parameters (acquisition sequence and Euler number) and (B) demographic characteristics (sex assigned at birth and pubertal status), enclosed by solid lines. Acquisition sequence was coded as 8‐channel head coil (0) or 32‐channel head coil (1). Sex assigned at birth was coded as male (0) or female (1). (C) Correlations between predicted brain age estimated from different algorithms with each other. BC, bias corrected; FU, follow‐up. N = 593 for all analyses, except correlations with puberty (N = 591).

3.3. Prediction Sensitivity to Demographic Variation

We further assessed each algorithm's sensitivity to demographic variation by correlating brain age gaps with sex and puberty (see Figure 2B). First, we found a generally weak relation between brain age gaps and sex assigned at birth (r's < 0.20), with the exception of one model that trained sex‐specific brain age estimation algorithms (Kaufmann), in which females exhibited more advanced brain age. With respect to pubertal maturation, we found that the developmental algorithms (Drobinin, Whitmore) generated more advanced brain age gaps for youth at earlier pubertal stages, whereas two lifespan algorithms (Kaufmann, Centile) generated more advanced brain age gaps for youth at later pubertal stages. As observed for variation in scan parameters, one algorithm was minimally sensitive to pubertal maturation (Pyment). Taken together, most algorithms were not sensitive to sex differences but varied as a function of pubertal maturation, with the exception of one algorithm that was not sensitive to either parameter (Pyment).

3.4. Prediction Sensitivity to Genetic Similarity

To identify whether algorithm predictions were interrelated within identical twin pairs, we correlated brain age gaps within pairs of monozygotic twins (n = 106 twin pairs). We found that four out of the five algorithms demonstrated the expected pattern of relatively strong, but not identical, correlations. These algorithms included the Drobinin (ICC's = 0.77 and 0.87), Pyment (ICC = 0.64), Kaufmann (ICC = 0.75), and Centile models (ICC's = 0.66 and 0.67). The Whitmore algorithm, however, generated brain age gaps with minimal variation between co‐twins (ICC's = 0.95 and 0.99), leading to concerns about its ability to distinguish similar, albeit distinct, brains.

3.5. Prediction Convergence Across Algorithms

We next assessed whether different algorithms converged in (a) predicted brain age (see Figure 2C) and (b) derived brain age gaps (see Figure 2A,B). Starting with predicted brain age, predictions across algorithms demonstrated convergence that ranged from weak (r = 0.20) to strong (r = 0.72), though most algorithms displayed moderate‐to‐strong convergence. The weakest convergence consistently involved the Whitmore model, suggesting that this algorithm generated brain age predictions that were not aligned with brain age predictions from other algorithms.

Regarding brain age gaps, we again found that convergence between different algorithms ranged from negligible (r = 0.00) to strong (r = 0.77), although most algorithms again demonstrated moderate convergence. The weakest convergence once more involved the Whitmore algorithm, specifically with the Kaufmann, Pyment, and Centile algorithms, in some cases even correlating in the opposite direction (i.e., more advanced brain age estimated from Whitmore models was associated with more delayed brain age estimated from Kaufmann and Centile models).

Nevertheless, when synthesized with our previous findings, these patterns suggest that the brain age algorithms that appropriately fit our data (i.e., Drobinin, Pyment, Centile) demonstrate moderate levels of convergence in both their predicted brain age values and their derived brain age gaps. As all correlation magnitudes were below 0.80, these findings suggest that these algorithms measure largely overlapping but slightly distinct brain age constructs. See Figure 3 for a parsimonious summary of each algorithm's overall performance in MTwiNS.

FIGURE 3.

FIGURE 3

Summary of overall performance of various brain age estimation algorithms in the Michigan Twin Neurogenetics Study. Green cells indicate “high” performance on evaluation indices. Orange cells indicate “intermediate” performance on evaluation indices. Red cells indicate “low” performance on evaluation indices. Performance evaluation criteria were developed for the purposes of providing a parsimonious visual summary of study findings, with potential implications for algorithm selection in future studies. Alignment between brain age and actual age ranges: Performance summarized using absolute criteria. High performance = predicted maximum/minimum within 2 SD's of observed range. Intermediate performance = predicted maximum/minimum within 4 SD's of observed range. Low performance = predicted maximum/minimum > 4 SD's of observed range, or highly restricted predicted age ranges. Correlation between brain age and actual age: Performance summarized using absolute criteria based on prior literature in youth. High performance = r > 0.50. Intermediate performance = r > 0.30. Low performance = r < 0.30. Fit/error statistics: Performance summarized using relative criteria for the primary error metric (mean absolute error). High performance = lower tertile. Intermediate performance = middle tertile. Low performance = upper tertile. Robustness to scan‐related variation: Performance summarized using absolute criteria regarding sensitivity to scanning acquisition sequence (8‐ vs. 32‐channel head coil) and image quality (Euler number). High performance = weak sensitivity (r < 0.20). Intermediate performance = moderate sensitivity (r < 0.50). Low performance = high sensitivity (r > 0.50). Twin pair correlations: Performance summarized using relative criteria regarding intraclass correlations of brain age gaps in monozygotic twin pairs. High performance = lower tertile. Intermediate performance = middle tertile. Low performance = upper tertile.
Convergence with other algorithms: Performance summarized using absolute criteria. High performance = strong correlations (r = 0.30–0.50). Intermediate performance = moderate correlations (r = 0.20–0.30). Low performance = weak correlations (r = 0.00–0.20).

3.6. Analyses in an Independent Population‐Based Cohort of Youth

We subsequently evaluated the performance of the same algorithms in a separate cohort of predominantly Black, low‐income adolescents (see Figure 1B and Table 4). Descriptively, instead of generating brain age values that largely aligned with the age range of this cohort (15.03–17.60 years), all five algorithms generated predictions that substantially differed from this age range (e.g., 10.46–29.33 years for the Centile model). Pyment and Centile models exhibited the strongest, whereas Whitmore models exhibited the weakest, correlations with chronological age. However, correlations between brain age and actual age were consistently in the weak range (all r's ≤ 0.24), with associations in some models not reaching statistical significance. Finally, error metrics (MAE, RMSE) varied widely across algorithms (MAE ranging from 1.00 to 8.53). This relatively weaker performance is also reflected in the negligible variance in chronological age explained by predicted brain age, with RSQ values ranging from 0.4% (Whitmore) to 6.0% (Centile). Overall, these findings suggest that the algorithms we examined may be challenging to apply meaningfully in the SAND cohort.

TABLE 4.

Performance of brain age estimation algorithms in the Study of Adolescent Neural Development.

Brain age algorithm Minimum predicted age Maximum predicted age Correlation with chronological age MAE RSQ RMSE
Drobinin 13.08 18.89 0.16* 1.00 2.5% 1.22
Drobinin (bias corrected) 12.12 22.69 0.16* 2.22 2.5% 2.71
Whitmore 9.83 10.64 0.06 5.61 0.4% 5.64
Whitmore (bias corrected) 9.47 13.49 0.06 4.33 0.4% 4.42
Whitmore (follow‐up) 11.72 12.95 0.10 3.51 1.0% 3.55
Pyment 10.55 25.46 0.22** 2.19 4.8% 2.78
Kaufmann 12.28 51.63 0.15* 8.53 2.4% 11.0
Centile 10.46 29.33 0.20** 2.88 4.2% 3.67
Centile (bias corrected) 10.10 29.02 0.24*** 2.70 6.0% 3.49

Note: N = 198. *p < 0.05; **p < 0.01; ***p < 0.001.

Abbreviations: MAE, mean absolute error; RMSE, root mean squared error; RSQ, R‐squared.

3.7. Factors Explaining Limited Performance of Brain Age Estimation Algorithms in the Second Cohort of Youth

Multiple reasons could account for why the five implemented algorithms may not accurately estimate brain age in the SAND cohort: (a) the narrow age range of SAND, (b) the overrepresentation of racial minorities in SAND, and (c) the overrepresentation of lower‐income families in SAND. These racial and socioeconomic backgrounds are underrepresented in existing brain age training samples. To empirically test these possibilities, we conducted three post hoc analyses in the MTwiNS cohort. First, to examine the role of narrow age ranges, we repeated our analyses after restricting the MTwiNS cohort to the same age range as the SAND cohort (n = 235; see Figure 1C and Table 5). Second, given the limited number of participants within each racial and ethnic group beyond the sample majority (White), we repeated our analyses after restricting the MTwiNS cohort to youth who did not identify as White, therefore representing ethnoracially minoritized families (n = 118; see Figure 1D and Table 6). Third, because the SAND participants have very low average incomes, we repeated our analyses after restricting the MTwiNS cohort to families living below 200% of the poverty line, defined using the income and size of each household (n = 271; see Figure 1E and Table 7). These analyses demonstrated that (a) correlations between brain age and actual age and (b) fit statistics (MAE, RSQ, RMSE) were very similar to the primary analyses in MTwiNS for the ethnoracially‐ and socioeconomically‐constrained subsamples, but were substantially weaker in the age‐constrained subsample. These analyses are consistent with the hypothesis that the narrow age range, as opposed to the low socioeconomic status or racial/ethnic composition, of the SAND cohort may be the primary factor underlying the relatively weak predictions of the applied algorithms in this sample.

TABLE 5.

Performance of brain age estimation algorithms in an age‐constrained subsample (ages 15–17, aligned with the Study of Adolescent Neural Development) in the Michigan Twin Neurogenetics Study.

Brain age algorithm Correlation with chronological age MAE RSQ RMSE
Drobinin 0.24*** 1.65 5.8% 2.01
Drobinin (bias corrected) 0.24*** 2.30 5.8% 2.82
Whitmore 0.19** 5.96 3.5% 6.01
Whitmore (bias corrected) 0.19** 4.99 3.5% 5.09
Whitmore (follow‐up) 0.23*** 3.97 5.3% 4.04
Pyment 0.37*** 1.87 13.6% 2.55
Kaufmann 0.30*** 7.07 8.9% 9.62
Centile 0.24*** 3.03 5.9% 3.90
Centile (bias corrected) 0.29*** 2.96 8.5% 3.79

Note: N = 235. *p < 0.05; **p < 0.01; ***p < 0.001.

Abbreviations: MAE, mean absolute error; RMSE, root mean squared error; RSQ, R‐squared.

TABLE 6.

Performance of brain age estimation algorithms in an ethnoracially‐constrained subsample (participants who did not identify as White) in the Michigan Twin Neurogenetics Study.

Brain age algorithm Correlation with chronological age MAE RSQ RMSE
Drobinin 0.47*** 1.55 21.9% 1.87
Drobinin (bias corrected) 0.47*** 2.08 21.9% 2.51
Whitmore 0.25** 4.07 6.0% 4.52
Whitmore (bias corrected) 0.25** 3.45 6.0% 3.90
Whitmore (follow‐up) 0.33*** 2.47 11.0% 2.87
Pyment 0.68*** 1.51 45.6% 1.97
Kaufmann 0.41*** 8.20 16.9% 11.2
Centile 0.46*** 3.21 21.5% 4.05
Centile (bias corrected) 0.55*** 2.91 30.4% 3.71

Note: N = 118. *p < 0.05; **p < 0.01; ***p < 0.001.

Abbreviations: MAE, mean absolute error; RMSE, root mean squared error; RSQ, R‐squared.

TABLE 7.

Performance of brain age estimation algorithms in a socioeconomically‐constrained subsample (participants living below 200% of the poverty line) in the Michigan Twin Neurogenetics Study.

Brain age algorithm Correlation with chronological age MAE RSQ RMSE
Drobinin 0.53*** 1.58 28.3% 1.94
Drobinin (bias corrected) 0.53*** 2.25 28.3% 2.75
Whitmore 0.34*** 4.28 11.5% 4.75
Whitmore (bias corrected) 0.34*** 3.60 11.5% 4.06
Whitmore (follow‐up) 0.45*** 2.73 20.1% 3.08
Pyment 0.70*** 1.63 49.3% 2.24
Kaufmann 0.41*** 6.58 16.5% 9.32
Centile 0.54*** 3.08 28.6% 3.96
Centile (bias corrected) 0.62*** 2.86 38.0% 3.70

Note: N = 271. *p < 0.05; **p < 0.01; ***p < 0.001.

Abbreviations: MAE, mean absolute error; RMSE, root mean squared error; RSQ, R‐squared.

Our final set of analyses sought to disentangle whether reductions in prediction accuracy in age‐constrained samples reflect (a) a general effect of narrow age ranges in the testing cohort or (b) a specific effect of misalignment between the age ranges included in the training and testing cohorts. We hypothesized that if prediction accuracy is primarily limited by age‐range misalignment, the Whitmore algorithms—trained within 9–10 or 11–12 years—should perform well when tested on comparably aged samples. In contrast, if prediction accuracy is primarily limited by narrow age ranges, the Whitmore models should continue to perform poorly even when the age range of the testing cohort fully overlaps. Finally, as prior analyses indicated that narrow developmental ranges undermine prediction accuracy, we expected the remaining four algorithms (Drobinin, Pyment, Kaufmann, Centile) to exhibit relatively weak predictions in samples restricted to the narrow age range of the Whitmore training cohorts.

To test these explanations, we repeated our analyses in MTwiNS after restricting the age range of the cohort in two additional ways. The first subsample was matched to the age range of the ABCD baseline sample (ages 9–10 years) used to train the baseline Whitmore algorithms (n = 38; see Figure 1F and Table 8), whereas the second subsample was matched to the age range of the ABCD follow‐up sample (ages 11–12 years) used to train the follow‐up Whitmore algorithm (n = 66; see Figure 1G and Table 9). Across both age‐constrained subsamples, performance accuracy was weak for all algorithms, including the Whitmore models trained in fully overlapping age ranges. Synthesizing across all the analyses performed, our findings indicate that both narrow (i.e., Whitmore) and misaligned (i.e., Kaufmann) age ranges may pose primary obstacles to accurate brain age estimation in youth cohorts.

TABLE 8.

Performance of brain age estimation algorithms in an age‐constrained subsample (ages 9–10, aligned with the baseline wave of the Adolescent Brain Cognitive Development Study) in the Michigan Twin Neurogenetics Study.

Brain age algorithm Correlation with chronological age MAE RSQ RMSE
Drobinin −0.08 2.20 0.7% 2.53
Drobinin (bias corrected) −0.08 2.04 0.7% 2.49
Whitmore 0.08 0.48 0.7% 0.56
Whitmore (bias corrected) 0.08 0.62 0.7% 0.79
Whitmore (follow‐up) 0.03 1.54 0.1% 1.60
Pyment 0.28 1.19 8.0% 1.75
Kaufmann 0.12 4.84 1.5% 7.20
Centile 0.12 2.33 1.5% 2.96
Centile (bias corrected) 0.16 1.88 2.7% 2.38

Note: N = 38.

Abbreviations: MAE, mean absolute error; RMSE, root mean squared error; RSQ, R‐squared.

TABLE 9.

Performance of brain age estimation algorithms in an age‐constrained subsample (ages 11–12, aligned with the follow‐up wave of the Adolescent Brain Cognitive Development Study) in the Michigan Twin Neurogenetics Study.

Brain age algorithm Correlation with chronological age MAE RSQ RMSE
Drobinin 0.17 1.35 3.0% 1.68
Drobinin (bias corrected) 0.17 2.32 3.0% 2.79
Whitmore 0.07 2.25 0.5% 2.33
Whitmore (bias corrected) 0.07 1.88 0.5% 2.13
Whitmore (follow‐up) 0.17 0.60 2.9% 0.68
Pyment 0.29* 1.64 8.6% 2.17
Kaufmann 0.19 3.80 3.4% 5.41
Centile 0.21 2.33 4.5% 3.11
Centile (bias corrected) 0.26* 2.22 6.6% 2.88

Note: N = 66. *p < 0.05.

Abbreviations: MAE, mean absolute error; RMSE, root mean squared error; RSQ, R‐squared.

4. Discussion

Despite its growing application, the best way to measure brain age in youth remains unclear. This study systematically compared the performance of five common, influential brain age models in a population‐based adolescent sample recruited from birth records to represent families living in disadvantaged neighborhoods. While Pyment demonstrated the best performance overall, we found that Pyment, Drobinin, and Centile generated brain age values that largely aligned with the age range of our sample, closely tracked with chronological age (r's between 0.51 and 0.68), and had better fit indices (e.g., MAE between 1.60 and 3.02), reflecting effect sizes previously reported in youth (Cole and Franke 2017; Drobinin et al. 2022; Keding et al. 2021; Whitmore et al. 2023). In contrast, Whitmore and Kaufmann generated brain age predictions that were misaligned with the age range of our sample, tracked less well with age (r's between 0.34 and 0.44), and had worse fit indices (e.g., MAE between 3 and 7). Moreover, while some models predicted advanced brain age for our 8‐channel head coil, lower‐quality scans, and earlier pubertal stages (Drobinin, Whitmore), the reverse was observed for others (Centile, Kaufmann). Pyment was not sensitive to acquisition sequence, image quality, or puberty, and most algorithms were not sensitive to sex differences. All algorithms besides Whitmore generated brain age gaps with strong, but not identical, ICCs in monozygotic twins, suggesting they showed expected correlations between identical twins while distinguishing between similar but distinct brains. Models with stronger fit to our data (Pyment, Drobinin, Centile) largely produced converging brain age values and gaps with each other. Finally, all algorithms showed weak brain age predictions in a separate sample of economically and racially minoritized adolescents, with follow‐up analyses suggesting this challenge in prediction was largely due to the narrow age range of this cohort.
This study emphasizes that some brain age algorithms perform better than others and, because they were not all intercorrelated, may each tap somewhat distinct underlying constructs and may not generalize to all studies, particularly those with narrow age ranges. Overall, these findings suggest that brain age algorithms may be best applied in samples with relatively larger developmental ranges that also overlap with the cohort in which the algorithm was trained.

4.1. Comparison of Brain Age Estimation Algorithms

Across numerous evaluation criteria, the Pyment model exhibited the best performance in our primary sample, followed by the Drobinin and Centile models. For example, Pyment showed the strongest correlations with actual age, the largest proportion of variance explained in age, and robustness to acquisition sequence and head motion. Previous comparison studies in adults have similarly identified high accuracy, test–retest reliability, and robustness to head motion in Pyment (Dörfel et al. 2023; Hanson et al. 2024). Several reasons may explain why Pyment was also the best‐performing algorithm in our sample. First, despite disproportionately representing older cohorts, its training sample was large and extended into the age range of our sample. Second, Pyment included multiple acquisition sequences and did not exclude participants for head motion or image quality, which may facilitate robust brain age estimations regardless of noise and scanning variation. Similar factors may account for the overall good performance of Centile. In contrast, Drobinin may generate accurate predictions because it was trained on a cohort of youth with a largely overlapping age range with our sample. This explanation dovetails with evidence indicating that model performance varies considerably depending on the age range of the sample (de Lange et al. 2022; Franke et al. 2010). This consideration is key for studies of youth since most validated algorithms have been trained in adult cohorts with limited representation of young ages. The sensitivity of most algorithms to the age range of the training sample highlights the importance of using algorithms developed in samples with overlapping ages, and emphasizes that caution should be applied when interpreting what “brain age” means in each study.

The Kaufmann and Whitmore models did not strongly predict brain age in our primary sample, evidenced by inaccurate predicted age ranges, weak associations with age, and poor fit. Moreover, the Whitmore algorithm demonstrated nearly perfect ICCs in brain age gaps among monozygotic twins, suggesting that it could not differentiate between brains that should be quite similar, but not perfectly identical. Whitmore was trained using the same algorithm and similar neuroimaging features as Drobinin. Nevertheless, whereas the Drobinin model was trained on a cohort of youth with similar ages to our sample, the Whitmore models were trained on narrower samples aged 9–10 years (ABCD baseline) or 11–12 years (ABCD follow‐up). These differences in training samples may systematically bias the Whitmore model to predict brain age values anchored to the age range of their sample. This result is partly expected as these models were trained to capture short‐term maturational changes circumscribed to early adolescence, rather than long‐term maturational changes occurring across development. However, this finding again suggests that misaligned age ranges between training and applied samples can undermine brain age estimations, regardless of the training algorithm and imaging features. Further, this finding emphasizes limitations long established in the field of machine learning, such as the limited ability of tree‐based algorithms (e.g., XGBoost implemented by Whitmore) to extrapolate predictions outside of the ages included in the training sample (Whitmore and Beck 2025). Regarding Kaufmann, despite being trained on a large, heterogeneous sample that included ages in the range of our data (similar to Pyment), brain age was systematically overpredicted. Differences in training samples, machine learning algorithms, and neuroimaging features may allow the Pyment, but not the Kaufmann, software to accurately predict brain age in our sample.

Even though the predicted brain age values were reasonably intercorrelated across the five algorithms, the brain age gaps were only intercorrelated for four algorithms, including the Pyment, Drobinin, Centile, and Kaufmann models. Conversely, the Whitmore algorithm, which demonstrated weaker performance overall, generated brain age gaps that were weakly, or even negatively, correlated with other algorithms. Moreover, despite being positively intercorrelated, the brain age gaps from the other four models exhibited divergent associations with pubertal and scanning‐related variation. Whereas the Drobinin algorithm predicted more advanced brain age for the 8‐channel head coil, lower scan quality, and earlier pubertal stages, the Kaufmann and Centile models predicted more advanced brain age for the 32‐channel head coil, higher scan quality, and later pubertal stages. The Pyment model was not sensitive to acquisition sequence, head motion, or puberty. These findings are inconsistent with reports that pubertal maturation is positively related to brain age gaps across two algorithms, but align with evidence suggesting opposite links with cognition depending on the age of the training sample (Whitmore et al. 2023). As predictions and phenotypic associations in one algorithm may diverge from others, these findings emphasize the need for careful selection and evaluation of brain age models in youth. For example, whereas Pyment may be flexibly applied regardless of scanning parameters, the lack of associations with sex and puberty suggests that this algorithm may not be appropriate when research questions focus on sex differences or pubertal influences on brain maturation.

To further characterize how existing brain age algorithms perform in multiple cohorts, particularly those with higher representation of economic backgrounds and racial identities that are often excluded from neuroimaging studies, we also compared these algorithms in a second sample of predominantly Black, low‐income adolescents. Surprisingly, the five algorithms we examined generated relatively weak brain age predictions in this second cohort. Importantly, the five models we tested were trained in convenience samples that mainly comprised White adults with less disadvantaged backgrounds, less adversity exposure, and no psychological or medical diagnoses. This demographic and contextual misalignment between training and testing samples likely reduces the representativeness of current algorithms and their ability to generalize to youth from more diverse or disadvantaged backgrounds (Dhamala et al. 2024; Falk et al. 2013; Hyde et al. 2024; Ricard et al. 2023; Wu et al. 2024). Another possibility is that this second cohort had a smaller sample size and a narrow age range (15–17 years), two factors known to impair model performance (de Lange et al. 2022; Jollans et al. 2019). In this case, brain age algorithms may struggle to detect meaningful interindividual differences in brain maturation in samples where the span of ages, and thus developmental variability, is small. Consistent with this hypothesis, follow‐up analyses revealed that constraining the MTwiNS cohort to a narrow age range (15–17 years), but not to ethnoracial minorities or lower‐income youth, dramatically undermined the performance of all brain age algorithms. Additionally, brain age predictions remained relatively weak when the MTwiNS cohort was restricted to match the age range of the Whitmore training models. This observation reinforces that a limited developmental range in testing cohorts, rather than misaligned sociodemographic backgrounds or age ranges alone, constitutes a major obstacle to accurate brain age estimation in youth. These findings underscore the need for the development and validation of brain age algorithms that are trained on population‐based cohorts with wide age ranges, and caution against applying current algorithms to cohorts of youth with narrow or misaligned age spans and underrepresented identities.
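The restriction‐of‐range account above can be sketched with simulated numbers (purely hypothetical values, not MTwiNS or SAND data): even when an algorithm's prediction errors are identical across ages, narrowing the tested age span mechanically shrinks the correlation between predicted and chronological age.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cohort spanning 9-19 years; predicted brain age equals
# chronological age plus a constant ~2-year prediction error.
age = rng.uniform(9, 19, 2000)
predicted = age + rng.normal(0, 2.0, 2000)

full_r = np.corrcoef(age, predicted)[0, 1]

# Restrict the same predictions to a narrow 15-17-year subsample.
narrow = (age >= 15) & (age <= 17)
narrow_r = np.corrcoef(age[narrow], predicted[narrow])[0, 1]

# The error model is unchanged, yet the correlation drops sharply
# because the variance in chronological age is much smaller.
print(round(full_r, 2), round(narrow_r, 2))
```

Under these assumptions the full‐range correlation is strong while the narrow‐range correlation is weak, mirroring the pattern observed when the MTwiNS cohort was constrained to a 15–17‐year window.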

4.2. Implications and Future Recommendations

While brain age is increasingly applied to understand cognitive functioning (Erus et al. 2015; Kelly et al. 2022; Whitmore et al. 2023), psychiatric symptoms (Cropley et al. 2021; Kurth et al. 2022; MacSweeney et al. 2024; Sanford et al. 2022), and environmental influences on neurodevelopment (Beck et al. 2025; Drobinin et al. 2022; Keding et al. 2021; Rakesh et al. 2021) in youth, these studies report highly discrepant findings regarding whether these experiences and outcomes are associated with accelerated or delayed brain maturation. One explanation for these inconsistencies is that nuanced parameters of brain maturation, environmental experiences, and phenotypic outcomes, which remain to be characterized, change whether characteristics of interest are associated with advanced or delayed brain age. Our study and others, however, point to another explanation. Specifically, multiple algorithms have been developed to estimate brain age, and these algorithms exhibit different psychometric properties (Bacas et al. 2023; Dörfel et al. 2023; Hanson et al. 2024) and, in our study, distinct relationships with puberty, image quality, and acquisition sequence. As such, the mixed literature may be a corollary of the different algorithms applied to estimate brain age across studies. This possibility emphasizes the need for continued basic research to establish what each algorithm "means" in each context, and hence how brain age can best be estimated depending on the population of interest, study design, and research question (Whitmore and Beck 2025). Establishing these guidelines can generate more consistent and replicable findings on how brain age relates to environmental experiences and developmental outcomes.

4.3. Limitations

This is the first study to our knowledge to rigorously compare commonly used software packages for estimating brain age in population‐based cohorts of youth, and it can thus serve as a benchmark for future applied research in developmental cohorts. First, however, the extent to which our study can inform brain age studies earlier during infancy and middle childhood, or later during adulthood, is unclear. Second, we focused on five common algorithms with accessible open‐source code to estimate brain age but, as with other indices of biological aging such as DNA methylation (Raffington and Belsky 2022), additional algorithms are continuously under development. These novel models will be important to test in heterogeneous cohorts of youth to continue identifying the best estimation algorithms in these populations. For example, brain age can be estimated using not only brain structure but also patterns of functional connectivity, diffusion‐weighted imaging, or a combination (Keding et al. 2024; Niu et al. 2020). Future research should probe whether multimodal algorithms capturing both structural and functional development are more sensitive to individual differences in brain maturity and thus demonstrate better performance.

In this study, we evaluated the performance of different brain age estimation algorithms in youth using cross‐sectional neuroimaging data. Consequently, unlike evaluation studies in adults (Bacas et al. 2023; Dörfel et al. 2023; Hanson et al. 2024), we did not investigate test–retest reliability or the psychometric properties of longitudinal brain age measures that may be particularly relevant for periods of rapid neurodevelopment. However, interpreting longitudinal changes in brain age among youth presents unique conceptual challenges. Brain development is nonlinear, regionally heterogeneous, and dynamically shaped by environmental and genetic influences that can accelerate or delay maturation across time (Callaghan and Tottenham 2016; Whitmore and Beck 2025). As such, youth who appear neurobiologically "advanced" at one wave may appear "delayed" at another, not necessarily because predictions are unreliable, but because maturational trajectories can shift across waves. Additionally, the majority of existing developmental studies estimate brain age using cross‐sectional data; therefore, establishing how different algorithms perform cross‐sectionally is essential for interpreting the rapidly growing developmental brain age literature. For these reasons, we prioritized evaluating cross‐sectional performance as a critical first step toward informing how brain age can be estimated and interpreted in youth. Future work with closely spaced scans should assess the short‐term test–retest reliability of brain age algorithms, while studies with longer follow‐up intervals can uniquely characterize dynamic patterns of brain maturation across development.

4.4. Conclusions

This study performed a systematic comparison of the performance of five brain age estimation algorithms in two population‐based cohorts of youth. We found that, although the majority of algorithms generated somewhat converging brain age predictions, different algorithms exhibited drastically different levels of performance, sensitivity to pubertal maturation, and robustness to acquisition sequence and image quality. Whereas the Pyment algorithm demonstrated the best performance and was not sensitive to puberty, the Drobinin and Centile algorithms were also strong but were sensitive to acquisition sequence, head motion, and puberty. Meanwhile, the Kaufmann and Whitmore algorithms generated weak predictions in our primary sample. Additionally, none of these estimation algorithms could accurately predict brain age in a second independent cohort, which had a narrow age range and was primarily composed of Black, low‐income youth. These findings provide a data‐driven call for continued basic research on the appropriate estimation of brain age in these populations. Ultimately, this study may inform algorithm selection in applied research that seeks to delineate how brain maturation is associated with adversity exposure, cognitive development, and mental health in youth.

Author Contributions

Cleanthis Michael: conceptualization, methodology, software, formal analysis, writing – original draft, visualization. Natasha S. Jones: methodology, software, writing – review and editing. Jamie L. Hanson: conceptualization, methodology, software, writing – review and editing. Heidi B. Westerman: software, writing – review and editing. Kelly L. Klump: writing – review and editing, funding acquisition. Colter Mitchell: writing – review and editing, funding acquisition. Christopher S. Monk: writing – review and editing, funding acquisition. S. Alexandra Burt: writing – review and editing, funding acquisition. Luke W. Hyde: conceptualization, methodology, writing – review and editing, funding acquisition.

Funding

Research reported in this publication related to MTwiNS was supported by the National Institute of Mental Health (NIMH) and the Office of the Director, National Institutes of Health, under Award Number UH3MH114249, and by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the NIH under Award Number R01HD093334 to S.A.B. and L.W.H. Additional MTwiNS funding was provided by the Avielle Foundation via The Conway Family Award for Excellence in Neuroscience (to L.W.H. and S.A.B.) and a NARSAD Young Investigator Grant from the Brain and Behavior Foundation (to L.W.H.), and institutional funding was provided by the University of Michigan (to L.W.H.) and Michigan State University (to S.A.B.). Research reported in this publication related to SAND was supported by the NIMH (grants R01MH103761 to C.S.M. and R01MH121079 to L.W.H., Co.M., and C.S.M.). Funding for the Future of Families and Child Wellbeing Study was provided by the National Institute of Child Health and Human Development (grants R01‐HD36916, R01‐HD39135, U01‐HD110063, and R01‐HD40421) and a consortium of private foundations. Cl.M. was supported by the Marshall M. Weinberg Fellowship in Cognitive Science. J.L.H. was supported by the National Institute of Mental Health (R21MH128793) and the Learning, Research, & Development Center at the University of Pittsburgh. H.B.W. was supported by the Ruth L. Kirschstein National Research Service Award (F31MH131373) and the Eunice Kennedy Shriver National Institute of Child Health and Human Development Developmental Psychology Training Grant (T32HD007109). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Conflicts of Interest

The authors declare no conflicts of interest.

Michael, C. , Jones N. S., Hanson J. L., et al. 2026. “A Systematic Evaluation of the Performance of Multiple Brain Age Algorithms in Two Cohorts of Youth.” Human Brain Mapping 47, no. 2: e70458. 10.1002/hbm.70458.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

1. Bacas, E., Kahhalé I., Raamana P. R., Pablo J. B., Anand A. S., and Hanson J. L.. 2023. "Probing Multiple Algorithms to Calculate Brain Age: Examining Reliability, Relations With Demographics, and Predictive Power." Human Brain Mapping 44, no. 9: 3481–3492.
2. Bashyam, V. M., Erus G., Doshi J., et al. 2020. "MRI Signatures of Brain Age and Disease Over the Lifespan Based on a Deep Brain Network and 14 468 Individuals Worldwide." Brain 143, no. 7: 2312–2324.
3. Beck, D., Whitmore L., MacSweeney N., et al. 2025. "Dimensions of Early‐Life Adversity Are Differentially Associated With Patterns of Delayed and Accelerated Brain Maturation." Biological Psychiatry 97, no. 1: 64–72.
4. Bezek, J. L., Tillem S., Suarez G. L., et al. 2024. "Functional Brain Network Organization and Multidomain Resilience to Neighborhood Disadvantage in Youth." American Psychologist 79, no. 8: 1123–1138.
5. Burt, S. A., and Klump K. L.. 2013. "The Michigan State University Twin Registry (MSUTR): An Update." Twin Research and Human Genetics 16, no. 1: 344–350.
6. Burt, S. A., and Klump K. L.. 2019. "The Michigan State University Twin Registry (MSUTR): 15 Years of Twin and Family Research." Twin Research and Human Genetics 22, no. 6: 741–745.
7. Callaghan, B. L., and Tottenham N.. 2016. "The Stress Acceleration Hypothesis: Effects of Early‐Life Adversity on Emotion Circuits and Behavior." Current Opinion in Behavioral Sciences 7: 76–81.
8. Casey, B. J., Cannonier T., Conley M. I., et al. 2018. "The Adolescent Brain Cognitive Development (ABCD) Study: Imaging Acquisition Across 21 Sites." Developmental Cognitive Neuroscience 32: 43–54.
9. Cole, J. H., and Franke K.. 2017. "Predicting Age Using Neuroimaging: Innovative Brain Ageing Biomarkers." Trends in Neurosciences 40, no. 12: 681–690.
10. Cole, J. H., Marioni R. E., Harris S. E., and Deary I. J.. 2019. "Brain Age and Other Bodily 'Ages': Implications for Neuropsychiatry." Molecular Psychiatry 24, no. 2: 266–281.
11. Cropley, V. L., Tian Y., Fernando K., et al. 2021. "Brain‐Predicted Age Associates With Psychopathology Dimensions in Youths." Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 6, no. 4: 410–419.
12. de Lange, A.‐M. G., Anatürk M., Rokicki J., et al. 2022. "Mind the Gap: Performance Metric Evaluation in Brain‐Age Prediction." Human Brain Mapping 43, no. 10: 3113–3129.
13. Desikan, R. S., Ségonne F., Fischl B., et al. 2006. "An Automated Labeling System for Subdividing the Human Cerebral Cortex on MRI Scans Into Gyral Based Regions of Interest." NeuroImage 31, no. 3: 968–980.
14. Dhamala, E., Ricard J. A., Uddin L. Q., et al. 2024. "Considering the Interconnected Nature of Social Identities in Neuroimaging Research." Nature Neuroscience 28, no. 2: 222–233.
15. Dörfel, R. P., Arenas‐Gomez J. M., Fisher P. M., et al. 2023. "Prediction of Brain Age Using Structural Magnetic Resonance Imaging: A Comparison of Accuracy and Test–Retest Reliability of Publicly Available Software Packages." Human Brain Mapping 44, no. 17: 6139–6148.
16. Drobinin, V., Van Gestel H., Helmick C. A., Schmidt M. H., Bowen C. V., and Uher R.. 2022. "The Developmental Brain Age Is Associated With Adversity, Depression, and Functional Outcomes Among Adolescents." Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 7, no. 4: 406–414.
17. Erus, G., Battapady H., Satterthwaite T. D., et al. 2015. "Imaging Patterns of Brain Development and Their Relationship to Cognition." Cerebral Cortex 25, no. 6: 1676–1684.
18. Falk, E. B., Hyde L. W., Mitchell C., et al. 2013. "What Is a Representative Brain? Neuroscience Meets Population Science." Proceedings of the National Academy of Sciences of the United States of America 110, no. 44: 17615–17622.
19. Fischl, B. 2012. "FreeSurfer." NeuroImage 62, no. 2: 774–781.
20. Fischl, B., Salat D. H., Busa E., et al. 2002. "Whole Brain Segmentation: Automated Labeling of Neuroanatomical Structures in the Human Brain." Neuron 33, no. 3: 341–355.
21. Franke, K., and Gaser C.. 2019. "Ten Years of BrainAGE as a Neuroimaging Biomarker of Brain Aging: What Insights Have We Gained?" Frontiers in Neurology 10: 789.
22. Franke, K., Ziegler G., Klöppel S., and Gaser C.. 2010. "Estimating the Age of Healthy Subjects From T1‐Weighted MRI Scans Using Kernel Methods: Exploring the Influence of Various Parameters." NeuroImage 50, no. 3: 883–892.
23. Fuhrmann, D., Knoll L. J., and Blakemore S.‐J.. 2015. "Adolescence as a Sensitive Period of Brain Development." Trends in Cognitive Sciences 19, no. 10: 558–566.
24. Gao, Y., Staginnus M., Gao Y., et al. 2024. "Cortical Structure and Subcortical Volumes in Conduct Disorder: A Coordinated Analysis of 15 International Cohorts From the ENIGMA‐Antisocial Behavior Working Group." Lancet Psychiatry 11, no. 8: 620–632.
25. Gilmore, A. D., Buser N. J., and Hanson J. L.. 2021. "Variations in Structural MRI Quality Significantly Impact Commonly Used Measures of Brain Anatomy." Brain Informatics 8, no. 1: 7.
26. Glasser, M. F., Coalson T. S., Robinson E. C., et al. 2016. "A Multi‐Modal Parcellation of Human Cerebral Cortex." Nature 536, no. 7615: 171–178.
27. Gong, W., Beckmann C. F., Vedaldi A., Smith S. M., and Peng H.. 2021. "Optimising a Simple Fully Convolutional Network for Accurate Brain Age Prediction in the PAC 2019 Challenge." Frontiers in Psychiatry 12: 627996.
28. Hahn, T., Fisch L., Ernsting J., et al. 2021. "From 'Loose Fitting' to High‐Performance, Uncertainty‐Aware Brain‐Age Modelling." Brain 144, no. 3: e31.
29. Hanson, J. L., Adkins D. J., Bacas E., and Zhou P.. 2024. "Examining the Reliability of Brain Age Algorithms Under Varying Degrees of Participant Motion." Brain Informatics 11, no. 1: 9.
30. Hardi, F. A., Goetschius L. G., Peckins M. K., et al. 2022. "Differential Developmental Associations of Material Hardship Exposure and Adolescent Amygdala–Prefrontal Cortex White Matter Connectivity." Journal of Cognitive Neuroscience 34, no. 10: 1866–1891.
31. Hyde, L. W., Bezek J. L., and Michael C.. 2024. "The Future of Neuroscience in Developmental Psychopathology." Development and Psychopathology 36: 1–16.
32. Johnson, S. B., Riis J. L., and Noble K. G.. 2016. "State of the Art Review: Poverty and the Developing Brain." Pediatrics 137, no. 4: e20153075.
33. Jollans, L., Boyle R., Artiges E., et al. 2019. "Quantifying Performance of Machine Learning Methods for Neuroimaging Data." NeuroImage 199: 351–365.
34. Kaufmann, T., van der Meer D., Doan N. T., et al. 2019. "Common Brain Disorders Are Associated With Heritable Patterns of Apparent Aging of the Brain." Nature Neuroscience 22, no. 10: 1617–1623.
35. Keding, T. J., Heyn S. A., Russell J. D., et al. 2021. "Differential Patterns of Delayed Emotion Circuit Maturation in Abused Girls With and Without Internalizing Psychopathology." American Journal of Psychiatry 178, no. 11: 1026–1036.
36. Keding, T. J., Russell J. D., Zhu X., He Q., Li J. J., and Herringa R. J.. 2024. "Diverging Effects of Violence Exposure and Psychiatric Symptoms on Amygdala‐Prefrontal Maturation During Childhood and Adolescence." Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 10: 450.
37. Kelly, C., Ball G., Matthews L. G., et al. 2022. "Investigating Brain Structural Maturation in Children and Adolescents Born Very Preterm Using the Brain Age Framework." NeuroImage 247: 118828.
38. Keyes, K. M., Gary D., O'Malley P. M., Hamilton A., and Schulenberg J.. 2019. "Recent Increases in Depressive Symptoms Among US Adolescents: Trends From 1991 to 2018." Social Psychiatry and Psychiatric Epidemiology 54, no. 8: 987–996.
39. Kurth, F., Levitt J. G., Gaser C., et al. 2022. "Preliminary Evidence for a Lower Brain Age in Children With Attention‐Deficit/Hyperactivity Disorder." Frontiers in Psychiatry 13: 1019546.
40. Leonardsen, E. H., Peng H., Kaufmann T., et al. 2022. "Deep Neural Networks Learn General and Clinically Relevant Representations of the Ageing Brain." NeuroImage 256: 119210.
41. LeWinn, K. Z., Sheridan M. A., Keyes K. M., Hamilton A., and McLaughlin K. A.. 2017. "Sample Composition Alters Associations Between Age and Brain Structure." Nature Communications 8, no. 1: 874.
42. MacSweeney, N., Beck D., Whitmore L., et al. 2024. "Multimodal Brain Age Indicators of Internalising Problems in Early Adolescence: A Longitudinal Investigation." Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 10: 475.
43. Michael, C., Gard A. M., Tillem S., et al. 2024. "Developmental Timing of Associations Among Parenting, Brain Architecture, and Mental Health." JAMA Pediatrics 178, no. 12: 1326–1336.
44. Michael, C., Tillem S., Sripada C. S., Burt S. A., Klump K. L., and Hyde L. W.. 2023. "Neighborhood Poverty During Childhood Prospectively Predicts Adolescent Functional Brain Network Architecture." Developmental Cognitive Neuroscience 64: 101316.
45. Mills, K. L., Lalonde F., Clasen L. S., Giedd J. N., and Blakemore S.‐J.. 2014. "Developmental Changes in the Structure of the Social Brain in Late Childhood and Adolescence." Social Cognitive and Affective Neuroscience 9, no. 1: 123–131.
46. Mills, K. L., Siegmund K. D., Tamnes C. K., et al. 2021. "Inter‐Individual Variability in Structural Brain Development From Late Childhood to Young Adulthood." NeuroImage 242: 118450.
47. Niu, X., Zhang F., Kounios J., and Liang H.. 2020. "Improved Prediction of Brain Age Using Multimodal Neuroimaging Data." Human Brain Mapping 41, no. 6: 1626–1643.
48. Petersen, A. C., Crockett L., Richards M., and Boxer A.. 1988. "A Self‐Report Measure of Pubertal Status: Reliability, Validity, and Initial Norms." Journal of Youth and Adolescence 17, no. 2: 117–133.
49. Raffington, L., and Belsky D. W.. 2022. "Integrating DNA Methylation Measures of Biological Aging Into Social Determinants of Health Research." Current Environmental Health Reports 9, no. 2: 196–210.
50. Rakesh, D., Cropley V., Zalesky A., Vijayakumar N., Allen N. B., and Whittle S.. 2021. "Neighborhood Disadvantage and Longitudinal Brain‐Predicted‐Age Trajectory During Adolescence." Developmental Cognitive Neuroscience 51: 101002.
51. Rakesh, D., Whittle S., Sheridan M. A., and McLaughlin K. A.. 2023. "Childhood Socioeconomic Status and the Pace of Structural Neurodevelopment: Accelerated, Delayed, or Simply Different?" Trends in Cognitive Sciences 27, no. 9: 833–851.
52. Reichman, N. E., Teitler J. O., Garfinkel I., and McLanahan S. S.. 2001. "Fragile Families: Sample and Design." Children and Youth Services Review 23, no. 4: 303–326.
53. Ricard, J. A., Parker T. C., Dhamala E., Kwasa J., Allsop A., and Holmes A. J.. 2023. "Confronting Racially Exclusionary Practices in the Acquisition and Analyses of Neuroimaging Data." Nature Neuroscience 26, no. 1: 4–11.
54. Rosen, A. F. G., Roalf D. R., Ruparel K., et al. 2018. "Quantitative Assessment of Structural Image Quality." NeuroImage 169: 407–418.
55. Sanford, N., Ge R., Antoniades M., et al. 2022. "Sex Differences in Predictors and Regional Patterns of Brain Age Gap Estimates." Human Brain Mapping 43, no. 15: 4689–4698.
56. Smith, S. M., Vidaurre D., Alfaro‐Almagro F., Nichols T. E., and Miller K. L.. 2019. "Estimation of Brain Age Delta From Brain Imaging." NeuroImage 200: 528–539.
57. Somerville, L. H. 2016. "Searching for Signatures of Brain Maturity: What Are We Searching For?" Neuron 92, no. 6: 1164–1167.
58. Tamnes, C. K., Herting M. M., Goddings A.‐L., et al. 2017. "Development of the Cerebral Cortex Across Adolescence: A Multisample Study of Inter‐Related Longitudinal Changes in Cortical Volume, Surface Area, and Thickness." Journal of Neuroscience 37, no. 12: 3402–3412.
59. Telzer, E. H., McCormick E. M., Peters S., Cosme D., Pfeifer J. H., and van Duijvenvoorde A. C. K.. 2018. "Methodological Considerations for Developmental Longitudinal fMRI Research." Developmental Cognitive Neuroscience 33: 149–160.
60. Tottenham, N. 2020. "Early Adversity and the Neotenous Human Brain." Biological Psychiatry 87, no. 4: 350–358.
61. Whitmore, L., and Beck D.. 2025. "Current Challenges and Future Directions for Brain Age Prediction in Children and Adolescents." Nature Communications 16, no. 1: 7771.
62. Whitmore, L. B., Weston S. J., and Mills K. L.. 2023. "BrainAGE as a Measure of Maturation During Early Adolescence." Imaging Neuroscience 1: 1–21.
63. Wierenga, L. M., Langen M., Oranje B., and Durston S.. 2014. "Unique Developmental Trajectories of Cortical Thickness and Surface Area." NeuroImage 87: 120–126.
64. Willemsen, G., Odintsova V., de Geus E., and Boomsma D. I.. 2021. "Twin‐Singleton Comparisons Across Multiple Domains of Life." In Twin and Higher‐Order Pregnancies, edited by Khalil A., Lewi L., and Lopriore E., 51–71. Springer International Publishing.
65. Wu, K. C., Hong S., Cross F. L., et al. 2024. "Increasing Diversity in Neuroimaging Research: Participant‐Driven Recommendations From a Qualitative Study of an Under‐Represented Sample." Developmental Cognitive Neuroscience 70: 101474.
66. Yu, Y., Cui H.‐Q., Haas S. S., et al. 2024. "Brain‐Age Prediction: Systematic Evaluation of Site Effects, and Sample Age Range and Size." Human Brain Mapping 45, no. 10: e26768.
