Abstract
The complex biological mechanisms underlying human brain aging remain incompletely understood, involving multiple body organs and chronic diseases. In this study, we used multimodal magnetic resonance imaging and artificial intelligence to examine the genetic architecture of the brain age gap (BAG) derived from gray matter volume (GM-BAG, N=31,557 European ancestry), white matter microstructure (WM-BAG, N=31,674), and functional connectivity (FC-BAG, N=32,017). We identified sixteen genomic loci that reached genome-wide significance (P-value<5×10−8). A gene-drug-disease network highlighted genes linked to GM-BAG for treating neurodegenerative and neuropsychiatric disorders and WM-BAG genes for cancer therapy. GM-BAG showed the highest heritability enrichment for genetic variants in conserved regions, whereas WM-BAG exhibited the highest heritability enrichment in the 5’ untranslated regions; oligodendrocytes and astrocytes, but not neurons, showed significant heritability enrichment in WM and FC-BAG, respectively. Mendelian randomization identified potential causal effects of several exposure variables on brain aging, such as type 2 diabetes on GM-BAG (odds ratio=1.05 [1.01, 1.09], P-value=1.96×10−2) and AD on WM-BAG (odds ratio=1.04 [1.02, 1.05], P-value=7.18×10−5). Overall, our results provide valuable insights into the genetics of human brain aging, with clinical implications for potential lifestyle and therapeutic interventions. All results are publicly available at the MEDICINE knowledge portal: https://labs.loni.usc.edu/medicine.
The advent of artificial intelligence (AI) has provided novel approaches to investigate various aspects of human brain health1,2, such as normal brain aging3, neurodegenerative disorders such as Alzheimer’s disease (AD)4, and brain cancer5. Based on magnetic resonance imaging (MRI), AI-derived measures of the human brain age6–8 have emerged as a valuable biomarker for evaluating brain health. More precisely, the difference between an individual’s AI-predicted brain age and chronological age – brain age gap (BAG) – provides a means of quantifying an individual’s brain health by measuring deviation from the normative aging trajectory. BAG has demonstrated sensitivity to several common brain diseases, clinical variables, and cognitive functions9, presenting the promising potential for its use in the general population to capture relevant pathological processes.
Brain imaging genomics10, an emerging scientific field advanced by both computational statistics and AI, uses imaging-derived phenotypes (IDP11) from MRI and genetics to offer mechanistic insights into healthy and pathological aging of the human brain. Recent large-scale genome-wide association studies (GWAS)11–18 have identified a diverse set of genomic loci linked to gray matter (GM)-IDP from T1-weighted MRI, white matter (WM)-IDP from diffusion MRI [fractional anisotropy (FA), mean diffusivity (MD), neurite density index (NDI), and orientation dispersion index (ODI)], and functional connectivity (FC)-IDP from functional MRI. While previous GWAS19 have associated BAG with common genetic variants [e.g., single nucleotide polymorphism (SNP)], they primarily focused on GM-BAG9,20–22 or did not comprehensively capture the genetic architecture of the multimodal BAG19 via post-GWAS analyses in order to biologically validate the GWAS signals. It is crucial to holistically identify the genetic factors associated with multimodal BAGs (GM, WM, and FC-BAG), where each BAG reflects distinct and/or similar neurobiological facets of human brain aging. Furthermore, dissecting the genetic architecture of human brain aging may determine the causal implications, which is essential for developing gene-inspired therapeutic interventions. Finally, numerous risk or protective lifestyle factors and neurobiological processes may also exert independent, synergistic, antagonistic, sequential, or differential influences on human brain health. Therefore, a holistic investigation of multimodal BAGs is urgent to fully capture the genetics of human brain aging, including the genetic correlation, gene-drug disease network, and potential causality. In this study, we postulate that AI-derived GM, WM, and FC-BAG can serve as robust, complementary endophenotypes23 – close to the underlying etiology – for precise quantification of human brain health.
The present study sought to uncover the genetic architecture of multimodal BAG and explore the causal relationships between protective/risk factors and decelerated/accelerated brain age. To accomplish this, we analyzed multimodal brain MRI scans from 42,089 participants from the UK Biobank (UKBB) study24 and used 119 GM-IDP, 48 FA WM-IDP, and 210 FC-IDP to derive GM, WM, and FC-BAG, respectively. Refer to Method 1 for the selection of the final feature sets for each BAG. We first compared the age prediction performance of different machine learning models using these IDPs. We then performed GWAS to identify genomic loci associated with GM, WM, and FC-BAG in the European ancestry population. In post-GWAS analyses, we constructed a gene-drug-disease network, estimated the genetic correlation with several brain disorders, assessed their heritability enrichment in various functional categories or specific cell types, and calculated the polygenic risk scores (PRS) of the three BAGs. Finally, we performed Mendelian Randomization (MR)25 to infer the causal effects of several clinical traits and diseases on the three BAGs.
Results
In the first section, we objectively compared the age prediction performance of four machine learning methods using these GM, WM, and FC-IDPs (Fig. 1A). To this end, we employed a nested cross-validation (CV) procedure in the training/validation/test dataset (N=4000); an independent test dataset (N=38,089)26,27 was held out – unseen until we finalized the models using only the training/validation/test dataset (Method 1). The four machine learning models included support vector regression (SVR), LASSO regression, multilayer perceptron (MLP), and a five-layer neural network (i.e., three linear layers and one rectified linear unit layer; hereafter, NN)28 (Method 3). The second section focused on the main GWASs using the European ancestry population (31,557<N<32,017) and their sensitivity checks in six scenarios (Method 4A). In the last section, we validated the GWAS findings in several post-GWAS analyses, including genetic correlation, gene-drug-disease network, partitioned heritability, PRS calculation, and Mendelian randomization (Method 4).
Figure 1: Brain age prediction using three MRI modalities and four machine learning models.
A) Multimodal brain MRI data were used to derive imaging-derived phenotypes (IDP) for T1-weighted MRI (119 GM-IDP), diffusion MRI (48 WM-IDP), and resting-state functional MRI (210 FC-IDP). IDPs for each modality are shown here using different colors based on predefined brain atlases or ICA for FC-IDP. B) Linear models achieved lower mean absolute errors (MAE) than non-linear models using support vector regression (SVR), LASSO regression, multilayer perceptron (MLP), and a five-layer neural network (NN). The MAE for the independent test dataset is presented, and the # symbol indicates the model with the lowest MAE for each modality. Error bars represent standard deviation (SD). C) Pearson’s correlation (r) between the predicted brain age and chronological age is computed, and statistical significance (P-value<0.05) - after adjustment for multiple comparisons using the FDR method - is denoted by the * symbol. Error bars represent the 95% confidence interval (CI). D) Scatter plot for the predicted brain age and chronological age. E) Phenotypic correlation (pc) between the GM, WM, and FC-BAG using Pearson’s correlation coefficient (r).
GM, WM, and FC-BAG derived from three MRI modalities
Several findings were observed based on the results from the independent test dataset (N=38,089, Method 1). First, GM-IDP (4.39<mean absolute error (MAE)<5.35; 0.64<r<0.66), WM-IDP (4.92<MAE<7.95; 0.42<r<0.65), and FC-IDP (5.48<MAE<6.05; 0.43<r<0.46) achieved a progressively higher MAE and a lower Pearson’s correlation (r) (Fig. 1B, C, and D). Second, LASSO regression obtained the lowest MAE for GM, WM, and FC-IDP; linear models obtained a lower MAE than non-linear networks (Fig. 1B). Third, all models generalized well from the training/validation/test dataset (N=4000, Method 1) to the independent test dataset. However, simultaneously incorporating WM-IDP from FA, MD, NDI, and ODI resulted in severely overfitting models (Supplementary eTable 1A). The observed overfitting may be attributed to the large number of parameters (N=38,364) in the network or to strong correlations among the diffusion metrics (i.e., FA, MD, ODI, and NDI). Fourth, the experiments stratified by sex did not exhibit substantial differences, except for a stronger overfitting tendency observed in females compared to males using WM-IDP incorporating the four diffusion metrics (Supplementary eTable 1B). Detailed results of the CV procedure, including the training, validation, test performance, and sex-stratified experiments, are presented in Supplementary eTable 1. In all subsequent genetic analyses, we reported the results using BAG derived from the three LASSO models with the lowest MAE in each modality (Fig. 1A), with the “age bias” corrected as in De Lange et al.29.
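The BAG definition (predicted minus chronological age) and the "age bias" correction mentioned above can be illustrated with a short sketch. It assumes the linear correction commonly attributed to de Lange and Cole, in which predicted age is regressed on chronological age and the fitted trend is removed; whether this matches every detail of the procedure in Method 1 is an assumption. The function names and the simulated data are hypothetical, and for brevity the correction is fitted on the same sample it is applied to.

```python
import random
from statistics import mean

def ols_line(x, y):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

def brain_age_gaps(chron_age, pred_age):
    """Return (raw BAG, age-bias-corrected BAG) for each participant."""
    raw = [p - a for p, a in zip(pred_age, chron_age)]
    # Fit pred_age = alpha * chron_age + beta, then remove the fitted trend:
    # corrected prediction = pred + [age - (alpha * age + beta)].
    alpha, beta = ols_line(chron_age, pred_age)
    corrected_pred = [p + (a - (alpha * a + beta))
                      for p, a in zip(pred_age, chron_age)]
    corrected = [cp - a for cp, a in zip(corrected_pred, chron_age)]
    return raw, corrected

# Hypothetical data: predictions regress toward the mean age, which makes
# the raw BAG negatively correlated with chronological age.
random.seed(0)
age = [random.uniform(45, 80) for _ in range(2000)]
pred = [62.5 + 0.6 * (a - 62.5) + random.gauss(0, 4) for a in age]
raw_bag, corrected_bag = brain_age_gaps(age, pred)
r_raw = pearson_r(age, raw_bag)
r_corrected = pearson_r(age, corrected_bag)
print(f"r(age, raw BAG)={r_raw:.2f}; r(age, corrected BAG)={r_corrected:.2f}")
```

In this simulation the raw BAG is strongly and negatively correlated with age, whereas the corrected BAG is uncorrelated with age by construction, which is the property the correction is meant to achieve.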
In the literature, other studies30–33 have thoroughly evaluated age prediction performance using different machine learning models and input features. More et al.34 systematically compared the performance of age prediction of 128 workflows (MAE between 5.23–8.98 years) and showed that voxel-wise feature representation (MAE approximates 5–6 years) outperformed parcel-based features (MAE approximates 6–9 years) using conventional machine learning algorithms (e.g., LASSO regression). Using deep neural networks, Peng et al.30 and Leonardsen et al.31 reported a lower MAE (nearly 2.5 years) with voxel-wise imaging scans. However, we previously showed that a moderately fitting convolutional neural network (CNN) obtained significantly higher differentiation (a larger effect size) than a tightly fitting CNN (a lower MAE) between the disease and health groups35. To summarize, our study’s brain age prediction performance aligns with those reported in the existing literature, considering the utilization of low-dimensional hand-crafted IDPs and conventional machine learning algorithms34.
Finally, we calculated the phenotypic correlation (pc) between GM, WM, and FC-BAG using Pearson’s correlation coefficient. GM-BAG and WM-BAG showed the highest positive correlation (pc=0.38; P-value<1×10−10; N=30,733); GM-BAG (pc=0.09; P-value<1×10−10; N=30,660) and WM-BAG (pc=0.10; P-value<1×10−10; N=31,574) showed weak correlations with FC-BAG (Fig. 1E).
GM, WM, and FC-BAG are associated with sixteen genomic loci
In the European ancestry populations, GWAS (Method 4A) revealed 6, 9, and 1 genomic loci linked to GM (N=31,557), WM (N=31,674), and FC-BAG (N=32,017), respectively (Fig. 2A). The top lead SNP and mapped genes of each locus are presented in Supplementary eTable 2. We also calculated the genomic inflation factor (λ) and the linkage disequilibrium score regression (LDSC) intercept (b)36 to scrutinize the robustness of the GWAS of GM-BAG (λ=1.118; b=1.0016±0.0078), WM-BAG (λ=1.124; b=1.0187±0.0073), and FC-BAG (λ=1.046; b=1.0039±0.006). All LDSC intercepts were close to 1, indicating no substantial genomic inflation. The individual Manhattan and QQ plots of the three GWASs are presented in Supplementary eFigure 3 and are also publicly available at the MEDICINE knowledge portal: https://labs.loni.usc.edu/medicine. Using the genome-wide complex trait analysis (GCTA) software37, we found that the three BAGs were significantly heritable (P-value<1×10−10) after adjusting for multiple comparisons using the Bonferroni method. GM-BAG showed the highest SNP-based heritability (h2=0.47±0.02), followed by WM-BAG (h2=0.46±0.02) and FC-BAG (h2=0.11±0.02).
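To make the two diagnostics above concrete, the following sketch computes the genome-wide significance threshold on the −log10 scale and a genomic inflation factor from a list of P-values. It uses the standard definition of λ as the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null; the function names and the simulated null P-values are hypothetical, and the LDSC intercept itself is not reproduced here because it requires LD scores.

```python
import math
import random
from statistics import NormalDist, median

GENOME_WIDE_P = 5e-8
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
MEDIAN_CHI2_1DF = NormalDist().inv_cdf(0.75) ** 2

def neg_log10(p):
    """P-value on the -log10 scale used in Manhattan plots."""
    return -math.log10(p)

def genomic_inflation(p_values):
    """Lambda: median observed chi-square (1 df) over its null median."""
    nd = NormalDist()
    # A two-sided P-value p corresponds to chi-square = z**2 with z = inv_cdf(p / 2).
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / MEDIAN_CHI2_1DF

# Hypothetical null GWAS: P-values uniform on (0, 1].
random.seed(0)
null_p = [1.0 - random.random() for _ in range(20000)]
lam = genomic_inflation(null_p)
print(f"threshold: -log10(P) > {neg_log10(GENOME_WIDE_P):.2f}; lambda = {lam:.3f}")
```

A λ above 1 can reflect either confounding or true polygenicity, which is why the LDSC intercept, designed to separate the two, is reported alongside it.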
Figure 2: Genome-wide associations of multimodal brain age gaps.
A) Genome-wide associations identified sixteen genomic loci associated with GM (6), WM (9), and FC-BAG (1) using a genome-wide P-value threshold [−log10(P-value) > 7.30]. The top lead SNP and the cytogenetic region number represent each locus. B) Phenome-wide association query from GWAS Catalog38. Independent significant SNPs inside each locus were largely associated with many traits. We further classified these traits into several trait categories, including biomarkers from multiple body organs (e.g., heart and liver), neurological disorders (e.g., Alzheimer’s disease and Parkinson’s disease), and lifestyle risk factors (e.g., alcohol consumption). C) Regional plot for a genomic locus associated with GM-BAG. Color-coded SNPs are decided based on their highest r2 to one of the nearby independent significant SNPs. Gray-colored SNPs are below the r2 threshold. The top lead SNP, lead SNPs, and independent significant SNPs are denoted as dark purple, purple, and red, respectively. Mapped, orange-colored genes of the genomic locus are annotated by positional, eQTL, and chromatin interaction mapping (Method 4B). D) Regional plot for a genomic locus associated with WM-BAG. E) The novel genomic locus associated with FC-BAG did not map to any genes. We used the Genome Reference Consortium Human Build 37 (GRCh37) in all genetic analyses. F) Genetic correlation (gc) between the GM, WM, and FC-BAG using the LDSC software. Abbreviation: AD: Alzheimer’s disease; ASD: autism spectrum disorder; PD: Parkinson’s disease; ADHD: attention-deficit/hyperactivity disorder.
We performed a query in the GWAS Catalog38 for these genetic variants within each locus to understand the phenome-wide association of these identified loci in previous literature (Method 4C). Notably, the SNPs within each locus were linked to other traits previously reported in the literature (Supplementary eFile 1). Specifically, the GM-BAG loci were uniquely associated with neuropsychiatric disorders such as major depressive disorder (MDD), heart disease, and cardiovascular disease. We also observed associations between these loci and other diseases (including anemia), as well as biomarkers from various human organs (e.g., liver) (Fig. 2B). We then performed positional and functional annotations to map SNPs to genes associated with GM, WM, and FC-BAG loci (Method 4B). Fig. 2C–E showcased the regional Manhattan plot of one genomic locus linked to GM, WM, and FC-BAG. A detailed discussion of these exemplary loci, SNPs, and genes is presented in Supplementary eText 1.
Finally, we calculated the genetic correlation (gc) between the GM, WM, and FC-BAG using the LDSC software. GM-BAG and WM-BAG showed the highest positive correlation (gc=0.49; P-value<1×10−10); GM-BAG (gc=0.20; P-value=0.025) and WM-BAG (gc=0.29; P-value=0.005) showed weak correlations with FC-BAG (Fig. 2F). The genetic correlations largely mirror the phenotypic correlations, supporting the long-standing Cheverud’s Conjecture39. We also verified that these genetic correlations exhibited consistency between the two random splits (split1 and split2: 15,778<N<16,008), sharing a similar age and sex distribution (Supplementary eFigure 2).
Sensitivity analyses for the genome-wide associations
We aimed to check the robustness of the main GWASs using the full sample sizes of the European populations (Fig. 2A). To this end, we performed six sensitivity analyses (Method 4A).
Applying the Bonferroni method to correct for multiple comparisons, we noted high concordance rates between the split1 (as discovery, 15,778<N<16,008) and split2 (as replication, 15,778<N<16,008) GWASs. Specifically, for GM-BAG, we observed a concordance rate of 99% [P-value<0.05/3092; 3092 significant SNPs passing the genome-wide P-value threshold (<5×10−8) in the discovery data], and for WM-BAG, the concordance rate reached 100% (P-value<0.05/116). FC-BAG did not achieve significant genome-wide results in the split-sample GWASs (Supplementary eFigure 3 and Supplementary eFile 2).
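The concordance rate described above can be sketched as the share of genome-wide significant discovery SNPs that also pass a Bonferroni-corrected threshold in the replication GWAS. This is a minimal reading of the text; whether the analysis in Method 4A additionally requires a consistent effect direction is not stated in this section, so that check is omitted. The function name, SNP identifiers, and P-values below are hypothetical.

```python
def concordance_rate(discovery_p, replication_p, gw_threshold=5e-8, alpha=0.05):
    """Return (rate, n_hits) given dicts mapping SNP id -> P-value.

    n_hits counts the SNPs below gw_threshold in the discovery GWAS; rate is
    the fraction of them with replication P < alpha / n_hits (Bonferroni).
    """
    hits = [snp for snp, p in discovery_p.items() if p < gw_threshold]
    if not hits:
        return None, 0
    bonferroni = alpha / len(hits)
    # SNPs missing from the replication GWAS count as not replicated.
    replicated = sum(1 for snp in hits if replication_p.get(snp, 1.0) < bonferroni)
    return replicated / len(hits), len(hits)

# Toy example with made-up SNP identifiers.
discovery = {"snp_a": 1e-9, "snp_b": 3e-8, "snp_c": 1e-3, "snp_d": 2e-10}
replication = {"snp_a": 1e-4, "snp_b": 0.03, "snp_d": 1e-6}
rate, n_hits = concordance_rate(discovery, replication)
print(f"{n_hits} discovery hits, concordance rate = {rate:.2%}")
```

With 3092 discovery hits, as reported for GM-BAG, the replication threshold under this definition would be 0.05/3092, matching the bracketed value in the text.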
In sex-stratified GWASs, the concordance rates were 100% (P-value<0.05/3072) for GM-BAG and 88.6% (P-value<0.05/116) for WM-BAG when comparing the male-GWAS (as replication, 14,969<N<15,127) to female-GWAS (as discovery, 16,588<N<16,890). FC-BAG did not achieve significant genome-wide results (Supplementary eFigure 4 and Supplementary eFile 3).
The concordance rates of the GWASs using non-European ancestry populations (as replication, 4646<N<5091) were low compared to the main GWASs using the European population: only 13.78% for GM-BAG and 41.94% for WM-BAG (P-value<0.05) (Supplementary eFigure 5 and Supplementary eFile 4).
A mixed linear model employed via fastGWA40 (as replication, 31,557<N<32,017) obtained 100% concordance rates for GM, WM, and FC-BAG compared to GWAS using PLINK linear regression (Supplementary eFile 5). The genetic loci, genomic inflation factor (λ), and the LDSC intercepts for GM, WM, and FC-BAG were similar between the PLINK and fastGWA analyses (Supplementary eFigure 6).
We found a 100% concordance rate of the SNPs identified for the GM-BAG GWAS using LASSO regression (as discovery, BAG MAE=4.39 years) and SVR (P-value < 0.05/3382, as replication, BAG MAE=4.43 years) (Supplementary eFigure 7 and Supplementary eFile 6). The BAGs derived from the two machine learning models were highly correlated (r=0.99; P-value<1×10−10).
We finally found a 92.43% concordance rate of the SNPs identified in the GM-BAG GWAS using the 119 MUSE ROIs41 (as discovery, BAG MAE=4.39 years) and voxel-wise RAVENS42 maps (as replication, P-value < 0.05/3382, BAG MAE=5.12 years) (Supplementary eFigure 8 and Supplementary eFile 7). The BAGs derived from the two types of features were significantly correlated (r=0.74; P-value<1×10−10). The brain age prediction performance using RAVENS showed marginal overfitting, with an MAE of 4.31 years in the training/validation/test dataset and an MAE of 5.12 years in the independent test dataset.
These findings suggest that our GWASs were robust across sex, random splits, imaging features, GWAS methods, and machine learning methods within European populations; however, their generalizability to non-European populations is limited. All subsequent post-GWAS analyses were conducted using the main GWAS results of European ancestry.
The gene-drug-disease network highlights disease-specific drugs that bind to genes associated with GM and WM-BAG
We investigated the potential “druggable genes”43 from the mapped genes by constructing a gene-drug-disease network (Method 4F). The network connects genes with drugs (or drug-like molecules) targeting specific diseases currently active at any stage of clinical trials.
We revealed clinically relevant associations for 4 and 6 mapped genes associated with GM-BAG and WM-BAG, respectively. The GM-BAG genes were linked to clinical trials for treating heart, neurodegenerative, neuropsychiatric, and respiratory diseases. On the other hand, the WM-BAG genes were primarily targeted for various cancer treatments and cardiovascular diseases (Fig. 3). To illustrate, for the GM-BAG MAPT gene, several drugs or drug-like molecules are currently being evaluated for treating AD. Semorinemab (RG6100), an anti-tau IgG4 antibody, was being investigated in a phase-2 clinical trial (trial number: NCT03828747), which targets extracellular tau in AD, to reduce microglial activation and inflammatory responses44. Another drug is the LMTM (TRx0237) - a second-generation tau protein aggregation inhibitor currently being tested in a phase-3 clinical trial (trial number: NCT03446001) for treating AD and frontotemporal dementia45. Regarding WM-BAG genes, they primarily bind with drugs for treating cancer and cardiovascular diseases. For instance, the PDIA3 gene, associated with the folding and oxidation of proteins, has been targeted for developing several zinc-related FDA-approved drugs for treating cardiovascular diseases. Another example is the MAP1A gene, which encodes microtubule-associated protein 1A. This gene is linked to the development of estramustine, an FDA-approved drug for prostate cancer (Fig. 3). Detailed results are presented in Supplementary eFile 8.
Figure 3: Gene-drug-disease network of multimodal brain age gaps.
The gene-drug-disease network derived from the mapped genes revealed a broad spectrum of targeted diseases and cancer, including brain cancer, cardiovascular system diseases, Alzheimer’s disease, and obstructive airway disease, among others. The thickness of the lines represented the P-values (-log10) from the brain tissue-specific gene set enrichment analyses using the GTEx v8 dataset. We highlight several drugs under the blue-colored and bold text. Abbreviation: ATC: Anatomical Therapeutic Chemical; ICD: International Classification of Diseases.
Multimodal BAG is genetically correlated with AI-derived subtypes of brain diseases
We calculated the genetic correlation using the GWAS summary statistics from 16 clinical traits to examine genetic covariance between multimodal BAG and other clinical traits. The selection procedure and quality check of the GWAS summary statistics are detailed in Method 4D. These traits encompassed common brain diseases and their AI-derived disease subtypes, as well as education and intelligence (Fig. 4A and Supplementary eTable 3). The AI-generated disease subtypes were established in our previous studies utilizing semi-supervised clustering methods46 and IDP from brain MRI scans.
Figure 4: Genetic correlation, partitioned heritability enrichment, and PRS prediction accuracy on multimodal brain age gaps.
A) Genetic correlation (gc) between GM, WM, and FC-BAG and 16 clinical traits. These traits include neurodegenerative diseases (e.g., AD) and their AI-derived subtypes (e.g., AD1 and AD24), neuropsychiatric disorders (e.g., ASD) and their subtypes (ASD1, 2, and 347), intelligence, and education. B) The proportion of heritability enrichment for the 53 functional categories51. We only show the functional categories that survived the correction for multiple comparisons using the FDR method. C) Cell type-specific partitioned heritability estimates. We included gene sets from Cahoy et al.58 for three main cell types (i.e., astrocyte, neuron, and oligodendrocyte). After adjusting for multiple comparisons using the FDR method, the * symbol denotes statistical significance (P-value<0.05). Error bars represent the standard error of the estimated parameters. D) The incremental R2 of the PRS derived by PRS-CS to predict the GM, WM, and FC-BAG in the target/test data (i.e., the split2 GWAS). The y-axis indicates the proportions of phenotypic variation (GM, WM, and FC-BAG) that the PRS can significantly and additionally explain. The x-axis lists the seven P-value thresholds considered. Abbreviation: AD: Alzheimer’s disease; ADHD: attention-deficit/hyperactivity disorder; ASD: autism spectrum disorder; BIP: bipolar disorder; MDD: major depressive disorder; OCD: obsessive-compulsive disorder; SCZ: schizophrenia; CAD: coronary artery disease; CD: Crohn’s disease; BMD: bone mineral density; PD: Parkinson’s disease; SLE: systemic lupus erythematosus; BMI: body mass index; CVD: cardiovascular disease; LDL: low-density lipoprotein cholesterol; MS: multiple sclerosis; AF: Atrial fibrillation.
Our analysis revealed significant genetic correlations between GM-BAG and AI-derived subtypes of AD (AD14), autism spectrum disorder (ASD) (ASD1 and ASD347), schizophrenia (SCZ148), and obsessive-compulsive disorder (OCD)49; WM-BAG and AD1, ASD1, SCZ1, and SCZ2; and FC-BAG and education50 and SCZ1. Detailed results for rg estimates are presented in Supplementary eTable 4. These subtypes, in essence, capture more homogeneous disease effects than the conventional “unitary” disease diagnosis, hence serving as robust endophenotypes23.
Multimodal BAG shows specific enrichment of heritability in different functional categories and cell types
We conducted a partitioned heritability analysis51 to investigate the heritability enrichment of genetic variants related to multimodal BAG in the 53 functional categories (Method 4E). Our results revealed that GM and WM-BAG exhibited significant heritability enrichment across numerous annotated functional categories. Specifically, some categories displayed greater enrichment than others, and we have outlined some in further detail.
For GM-BAG, the regions conserved across mammals, as indicated by the label “conserved” in Fig. 4B, displayed the most notable enrichment of heritability: approximately 2.61% of SNPs were found to explain 0.43±0.07 of SNP heritability (P-value=5.80×10−8). Additionally, transcription start site (TSS)52 regions employed 1.82% of SNPs to explain 0.16±0.05 of SNP heritability (P-value=8.05×10−3). TSS initiates the transcription at the 5’ end of a gene and is typically embedded within a core promoter crucial to the transcription machinery53. The heritability enrichment of histone H3 trimethylation at lysine 4 (denoted as “H3K4me3_peaks” in Fig. 4B) and histone H3 acetylation at lysine 9 (H3K9ac)54 was also large; both marks are known to highlight active gene promoters55. For WM-BAG, 5’ untranslated regions (UTR) used 0.54% of SNPs to explain 0.09±0.03 of SNP heritability (P-value=4.24×10−3). The 5’ UTR is a crucial region of a messenger RNA located upstream of the initiation codon. It is pivotal in regulating transcript translation, with varying mechanisms in viruses, prokaryotes, and eukaryotes.
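As a worked illustration of the figures above, heritability enrichment in partitioned (stratified) LDSC is conventionally the proportion of SNP heritability explained by a category divided by the proportion of SNPs it contains. The sketch below applies that ratio to the rounded point estimates quoted in this paragraph; it ignores the reported standard errors, so the outputs are approximate, and the function and dictionary names are hypothetical.

```python
def enrichment(prop_h2, prop_snps):
    """Fold enrichment: share of SNP heritability over share of SNPs."""
    return prop_h2 / prop_snps

# (proportion of SNP heritability, proportion of SNPs), as quoted in the text.
categories = {
    "GM-BAG, conserved": (0.43, 0.0261),
    "GM-BAG, TSS": (0.16, 0.0182),
    "WM-BAG, 5' UTR": (0.09, 0.0054),
}
fold = {name: enrichment(h2, snps) for name, (h2, snps) in categories.items()}
for name, value in fold.items():
    print(f"{name}: about {value:.1f}-fold enrichment")
```

Because the inputs are rounded, small differences between categories in this back-of-the-envelope calculation should not be over-interpreted.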
Additionally, we examined the heritability enrichment of multimodal BAG in three different cell types (Fig. 4C). WM-BAG (P-value=1.69×10−3) exhibited significant heritability enrichment in oligodendrocytes, one type of neuroglial cells. FC-BAG (P-value=1.12×10−2) showed such enrichment in astrocytes, the most prevalent glial cells in the brain. GM-BAG showed no enrichment in any of these cell types. Our findings are consistent with the current understanding of the molecular and biological characteristics of GM and WM. Oligodendrocytes are primarily responsible for forming the lipid-rich myelin structure, whereas astrocytes play a crucial role in various cerebral functions, such as brain development and homeostasis. Consistent with this, a prior GWAS14 on WM-IDP also identified considerable heritability enrichment in glial cells, especially oligodendrocytes. Detailed results for the 53 functional categories and cell-specific analyses are presented in Supplementary eTable 5.
Prediction ability of the polygenic risk score of the multimodal BAG
We derived the PRS for GM, WM, and FC-BAG using the conventional C+T (clumping plus P-value threshold) approach56 via PLINK and a Bayesian method via PRS-CS57 (Method 4H).
We found that the GM, WM, and FC-BAG-PRS derived from PRS-CS significantly predicted the phenotypic BAGs in the test data (split2 GWAS, 15,697<N<15,940), with an incremental R2 of 2.17%, 1.85%, and 0.19%, respectively (Fig. 4D). Compared to the PRS derived from PRS-CS, the PLINK approach achieved a lower incremental R2 of 0.81%, 0.45%, and 0.14% for GM, WM, and FC-BAG, respectively (Supplementary eFigure 9). Overall, the predictive capacity of PRS is moderate, in line with earlier discoveries involving raw imaging-derived phenotypes, as demonstrated in Zhao et al.13, where PRSs developed for seven selective brain regions were able to explain roughly 1.18% to 3.93% of the phenotypic variance associated with these traits.
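The incremental R2 reported above can be sketched as the difference in R2 between a full model (covariates plus PRS) and a null model (covariates only). The covariates used here (age and sex), the simulated effect sizes, and all function names are assumptions made for illustration; the actual covariate set is described in Method 4H. The example solves the normal equations directly with the standard library, which is adequate for a handful of predictors.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in columns]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    coef = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(coef, design)) for i in range(n)]
    y_mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def incremental_r2(covariates, prs, y):
    """R^2 gained by adding the PRS to a covariates-only model."""
    return r_squared(covariates + [prs], y) - r_squared(covariates, y)

# Hypothetical target sample: BAG depends weakly on a standardized PRS.
random.seed(1)
n = 3000
age = [random.uniform(45, 80) for _ in range(n)]
sex = [float(random.randint(0, 1)) for _ in range(n)]
prs = [random.gauss(0, 1) for _ in range(n)]
bag = [0.02 * a + 0.5 * s + 0.6 * p + random.gauss(0, 4)
       for a, s, p in zip(age, sex, prs)]
inc = incremental_r2([age, sex], prs, bag)
print(f"incremental R2 = {inc:.2%}")
```

The simulated PRS effect was chosen so that the incremental R2 falls in the low single-digit percent range, the same order of magnitude as the values reported for GM-BAG and WM-BAG.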
The potential causal relationships between GM and WM-BAG and other clinical traits
We investigated the potential causal effects of several risk factors (i.e., exposure variable) on multimodal BAG (i.e., outcome variable) using a bidirectional two-sample MR approach59 (Method 4G). We hypothesized that several diseases and lifestyle risk factors might contribute to accelerating or decelerating human brain aging.
We found putative causal effects of triglyceride-to-lipid ratio in very large very-low-density lipoprotein (VLDL)60 [P-value=5.09×10−3, OR (95% CI) = 1.08 (1.02, 1.13), number of SNPs=52], type 2 diabetes61 [P-value=1.96×10−2, OR (95% CI) = 1.05 (1.01, 1.09), number of SNPs=10], and breast cancer62 [P-value=1.81×10−2, OR (95% CI) = 0.96 (0.93, 0.99), number of SNPs=118] on GM-BAG (i.e., accelerated brain age). We also identified causal effects of AD63 [P-value=7.18×10−5, OR (95% CI) = 1.04 (1.02, 1.05), number of SNPs=13] on WM-BAG (Fig. 5A). We subsequently examined the potential inverse causal effects of multimodal BAG (i.e., exposure) on these risk factors (i.e., outcome). However, owing to the restricted power [number of instrumental variables (IV) < 6], we did not observe any significant signals (Supplementary eFigure 10 and Supplementary eFile 9).
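The estimates above come from two-sample MR; the core inverse-variance weighted (IVW) computation can be sketched as follows. This is a simplified fixed-effect version that works on per-SNP summary statistics and converts the pooled effect to an odds ratio by exponentiation, in keeping with the reporting style of this section. The function names and the toy summary statistics are hypothetical, and the software actually used (Method 4G) may apply additional steps, such as random-effects standard errors.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    Each SNP contributes a Wald ratio (beta_outcome / beta_exposure),
    weighted by beta_exposure**2 / se_outcome**2.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se = math.sqrt(1 / total_weight)
    return beta, se

def to_odds_ratio(beta, se, z=1.96):
    """Point estimate and 95% CI on the odds-ratio scale."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Toy instruments: every SNP implies the same causal effect of 0.08.
beta_exposure = [0.10, 0.08, 0.12, 0.05]
beta_outcome = [0.008, 0.0064, 0.0096, 0.004]
se_outcome = [0.002, 0.002, 0.002, 0.002]
beta_ivw, se_ivw = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
or_point, or_low, or_high = to_odds_ratio(beta_ivw, se_ivw)
print(f"OR = {or_point:.2f} ({or_low:.2f}, {or_high:.2f})")
```

An OR above 1 on this scale corresponds to a positive effect of the exposure on BAG (accelerated brain age), and an OR below 1, as reported for breast cancer, to a negative effect.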
Figure 5: Causal inference of multimodal brain age gaps.
A) Causal inference was performed using a two-sample Mendelian Randomization (MR, Method 4G) approach for seven selected exposure variables on three outcome variables (i.e., GM, WM, and FC-BAG). The symbol * denotes statistical significance after correcting for multiple comparisons using the FDR method (N=7); the symbol # denotes tests that passed the nominal significance threshold (P-value<0.05) but did not survive the correction for multiple comparisons. The odds ratio (OR) and the 95% confidence interval (CI) are presented. B) Leave-one-out analysis of the triglyceride-to-lipid ratio on GM-BAG. Each row represents the MR effect (log OR) and the 95% CI by excluding that SNP from the analysis. The red line depicts the IVW estimator using all SNPs. C) Forest plot for the single-SNP MR results. Each line represents the MR effect (log OR) for the triglyceride-to-lipid ratio on GM-BAG using only one SNP; the red line shows the MR effect using all SNPs together. D) Scatter plot for the MR effect sizes of the SNP-triglyceride-to-lipid ratio association (x-axis, SD units) and the SNP-GM-BAG associations (y-axis, log OR) with standard error bars. The slopes of the purple and green lines correspond to the causal effect sizes estimated by the IVW and the MR Egger estimator, respectively. We annotated a potential outlier. E) Funnel plot for the causal effect of the triglyceride-to-lipid ratio on GM-BAG. Each dot represents MR effect sizes estimated using each SNP as a separate instrument against the inverse of the standard error of the causal estimate. The vertical red line shows the MR estimates using all SNPs. We annotated a potential outlier. Abbreviation: AD: Alzheimer’s disease; AST: aspartate aminotransferase; BMI: body mass index; VLDL: very low-density lipoprotein; CI: confidence interval; OR: odds ratio; SD: standard deviation; SE: standard error.
Sensitivity analyses for Mendelian randomization
We performed sensitivity analyses to investigate potential violations of the three IV assumptions (Method 4G). To illustrate this, we showcased the sensitivity analysis results for the causal effect of the triglyceride-to-lipid ratio in VLDL on GM-BAG (Fig. 5B–E). In a leave-one-out analysis, we found that no single SNP overwhelmingly drove the overall effect (Fig. 5B). There was evidence for the presence of minor heterogeneity64 of the causal effect amongst SNPs (Cochran’s Q value=76.06, P-value=5.09×10−3). Some SNPs exerted opposite causal effects compared to the model using all SNPs (Fig. 5C). The scatter plot (Fig. 5D) indicated one obvious SNP outlier (rs11591147), and the funnel plot showed little asymmetry with only an outlier denoted in Fig. 5E (rs4507142). Finally, the MR Egger estimator allows for pleiotropic effects independent of the effect on the exposure of interest (i.e., the InSIDE assumption65). Our results from the Egger estimator showed a small positive intercept (5.21×10−3±2.87×10−3, P-value=0.07) and a lower OR [inverse-variance weighted (IVW): 1.08 (1.02, 1.13) vs. Egger: 1.01 (0.93, 1.10)], which may indicate the presence of directional horizontal pleiotropy for some SNPs. We present sensitivity analyses for other significant exposure variables in Supplementary eFigure 11.
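Two of the sensitivity statistics discussed above, Cochran's Q and the MR Egger intercept, can be illustrated with a brief sketch. MR Egger is written here as a weighted regression of SNP-outcome effects on SNP-exposure effects with a free intercept, after orienting each SNP so that its exposure effect is positive; Cochran's Q sums the weighted squared deviations of each SNP from a given causal estimate. The function names and toy data are hypothetical, and standard errors and P-values are omitted for brevity.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Return (intercept, slope) of the MR Egger weighted regression."""
    # Orient every SNP so that its exposure effect is positive.
    bx = [abs(b) for b in beta_exposure]
    by = [o if e >= 0 else -o for e, o in zip(beta_exposure, beta_outcome)]
    w = [1 / se ** 2 for se in se_outcome]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, bx)) / sw
    my = sum(wi * y for wi, y in zip(w, by)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, bx))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, bx, by))
    slope = sxy / sxx
    return my - slope * mx, slope

def cochran_q(beta_exposure, beta_outcome, se_outcome, causal_estimate):
    """Heterogeneity of per-SNP effects around a causal estimate.

    Under homogeneity, Q is compared with a chi-square distribution with
    (number of SNPs - 1) degrees of freedom.
    """
    return sum((by - causal_estimate * bx) ** 2 / se ** 2
               for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))

# Toy instruments with a constant pleiotropic offset of 0.01 and a causal
# slope of 0.05; the second SNP is coded on the opposite allele.
beta_exposure = [0.10, -0.08, 0.12, 0.05]
beta_outcome = [0.015, -0.014, 0.016, 0.0125]
se_outcome = [0.002, 0.002, 0.002, 0.002]
intercept, slope = mr_egger(beta_exposure, beta_outcome, se_outcome)
q_at_slope = cochran_q(beta_exposure, beta_outcome, se_outcome, slope)
print(f"Egger intercept = {intercept:.4f}, slope = {slope:.4f}, Q = {q_at_slope:.1f}")
```

In this toy example the non-zero intercept absorbs the constant offset, whereas a regression forced through the origin (as in IVW) would fold it into the slope; this is the pattern the comparison between the IVW and Egger ORs is probing.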
To investigate the potential directional pleiotropic effects, we re-analyzed the MR Egger regression by excluding the two outliers identified in Fig. 5D (rs11591147) and E (rs4507142), which led to a slightly increased OR [1.04 (0.96, 1.12)] and a smaller positive intercept (4.41×10−3±2.65×10−3, P-value=0.09). Our findings support that these two outlier SNPs may have a directional pleiotropic effect on GM-BAG. Nevertheless, given the complex nature of brain aging, many other biological pathways may also contribute to human brain aging. For instance, the SNP (rs11591147) was largely associated with other blood lipids, such as LDL cholesterol66, and heart diseases, such as coronary artery disease67. Detailed results obtained from all five MR methods are presented in Supplementary eFile 9.
Discussion
The present study harnessed brain imaging genetics from a cohort of 42,089 participants in UKBB to investigate the underlying genetics of multimodal BAG. Our approach commenced with objectively assessing brain age prediction performance, encompassing various imaging modalities (T1-weighted, diffusion, and resting-state MRI), feature types (ROI vs. voxel), and machine learning algorithms. Subsequently, we conducted genome-wide associations, demonstrating the robustness of identified genetic signals in individuals of European ancestry across diverse factors. Lastly, our study encompassed several post-GWAS analyses, validating the GWAS results, shedding light on the intricate biological processes involved, and uncovering the multifaceted interplay between human brain aging and various health conditions and clinical traits. Our findings unveiled shared genetic factors and unique characteristics – varying degrees of phenotypic and genetic correlation – within BAG across three distinct imaging modalities.
Genetic architecture of GM-BAG
Our genetic results from GM-BAG substantiate that many diseases, conditions, and clinical phenotypes share genetic underpinnings with brain age, perhaps driven by macrostructural changes in GM (e.g., brain atrophy). The locus with the most significant signal (the top lead SNP rs534114641 at 17q21.31) showed substantial association with the traits mentioned above and was mapped to numerous genes associated with various diseases (Fig. 2C). Several previous GM-BAG GWAS20,22 also identified this locus. Among these genes, the MAPT gene, known to encode a protein called tau, is a prominent AD hallmark and implicated in approximately 30 tauopathies, including progressive supranuclear palsy and frontotemporal lobar degeneration68. Our gene-drug-disease network also showed several drugs, such as Semorinemab44, in active clinical trials currently targeting treatment for AD (Fig. 3). The heritability enrichment of GM-BAG was high in several functional categories, with conserved regions being the most prominent. The observed higher heritability enrichment in conserved regions compared to coding regions69 supports the long-standing hypothesis regarding the functional significance of conserved sequences. However, the precise role of many highly conserved non-coding DNA sequences remains unclear70. The genetic correlation results of GM-BAG with subtypes of common brain diseases highlight the promise for the AI-derived subtypes, rather than the “one-for-all” unitary disease diagnosis, as robust endophenotypes23. These findings strongly support the clinical implications of re-evaluating pertinent hypotheses using the AI-derived subtypes in patient stratification and personalized medicine.
The elevated triglyceride-to-lipid ratio in VLDL, an established biomarker for cardiovascular diseases71, is causally associated with higher GM-BAG (accelerated brain age). Therefore, lifestyle interventions that target this biomarker might hold promise as an effective strategy to enhance overall brain health. In addition, we revealed that one unit-increased likelihood of type 2 diabetes has a causal effect on GM-BAG increase. Research has shown that normal brain aging is accelerated by approximately 26% in patients with progressive type 2 diabetes compared with healthy controls72. The protective causal effect of breast cancer on GM-BAG is intriguing in light of existing literature adversely linking breast cancer to brain metastasis73 and chemotherapy-induced cognitive impairments, commonly known as “chemo brain”. In addition, it’s important to exercise caution when considering the potential causal link between breast cancer and GM-BAG, as MR analyses are susceptible to population selection bias74 due to the high breast cancer mortality rate.
Genetic architecture of WM-BAG
The genetic architecture of WM-BAG exhibits strong correlations with cancer-related traits, AD, and physical measures such as BMI, among others. Our phenome-wide association query largely confirms the enrichment of these traits in previous literature. In particular, the DNAJC1 gene, annotated from the most polygenic locus on chromosome 10 (top lead SNP: rs564819152), encodes a protein called heat shock protein 40 (Hsp40) and plays a role in protein folding and the response to cellular stress. This gene is implicated in various cancer types, such as breast, renal, and melanoma (Supplementary eFigure 12). In addition, several FDA-approved drugs have been developed based on these WM-BAG genes for different types of cancer in our gene-drug-disease network (Fig. 3). Our findings provide novel insights into the genetic underpinnings of WM-BAG and their potential relevance to cancer.
Remarkably, one unit-increased likelihood of AD was causally associated with increased WM-BAG. Our Mendelian randomization analysis confirmed the abundant association evidenced by the phenome-wide association query (Fig. 2B). Dementia, such as AD, is undeniably a significant factor contributing to the decline of the aging brain. Evidence suggests that AD is not solely a GM disease; significant microstructural changes can be observed in WM before the onset of cognitive decline75. We also identified a nominal causal significance of BMI [risk effect; P-value=4.73×10−2, OR (95% CI) = 1.03 (1.00, 1.07)] on WM-BAG. These findings underscore the potential of lifestyle interventions and medications currently being tested in clinical trials for AD to improve overall brain health.
Genetic architecture of FC-BAG
The genetic signals for FC-BAG were weaker than those observed for GM and WM-BAG, which is consistent with the age prediction performance and partially corroborates Cheverud’s conjecture: using genetic correlations (Fig. 2F) as proxies for phenotypic correlations (Fig. 1E) when collecting individual phenotypes is expensive and unavailable. A novel genomic locus on chromosome 6 (6q13) harbors an independent variant (rs1204329) previously linked to insomnia76. The top lead SNP, rs5877290, associated with this locus is a novel deletion-insertion mutation type: no known association with any human disease or gene mapping has been established for this SNP. The genetic basis of FC-BAG covaries with educational performance and schizophrenia subtypes. Specifically, parental education has been linked to cognitive ability, and researchers have identified a functional connectivity biomarker between the right rostral prefrontal cortex and occipital cortex that mediates the transmission of maternal education to offspring’s performance IQ77. On the other hand, schizophrenia is a highly heritable mental disorder that exhibits functional dysconnectivity throughout the brain78. AD was causally associated with FC-BAG with nominal significance [risk effect per unit increase; P-value=4.43×10−2, OR (95% CI) = 1.02 (1.00, 1.03), number of SNPs=13] (Fig. 5A). The relationship between functional brain networks and the characteristic distribution of amyloid-β and tau in AD79 provides evidence that AD is a significant factor in the aging brain, underscoring its role as a primary causative agent.
The comparative trend of genetic heritability among GM, WM, and FC-BAG is also consistent with previous large-scale GWAS of multimodal brain IDP. Zhao et al. performed GWAS on GM13, WM14, and FC-IDP18, showing that FC-IDP is less genetically heritable than others. Similar observations were also demonstrated by Elliott et al.11 in the first large-scale GWAS using multimodal IDP from UKBB. The weaker genetic signal observed in FC-BAG may be attributable to several factors. One of the main reasons is the lower signal-to-noise ratio in FC measurements due to the dynamic and complex nature of brain activity, which can make it difficult to accurately measure and distinguish between the true signal and noise. Social-environmental and lifestyle factors can also contribute to the “missing heritability” observed in FC-BAG. For example, stress, sleep patterns, physical activity, and other environmental factors can impact brain function and connectivity80. In contrast, GM and WM measurements are more stable and less influenced by environmental factors, which may explain why they exhibit stronger genetic signals and higher heritability estimates.
Limitations
This study has several limitations. First, deep learning on voxel-wise imaging scans could be employed to enhance brain age prediction performance. Nevertheless, it warrants additional exploration to determine whether the resulting reduction in MAE translates into more robust genome-wide associations, as our previous work has demonstrated that BAGs derived from a CNN with a lower MAE did not exhibit heightened sensitivity to disease effects such as AD35. Second, the generalization ability of the GWAS findings to non-European ancestry is limited, potentially due to small sample sizes and cryptic population stratification. Future investigations can be expanded to encompass a broader spectrum of underrepresented ethnic groups, diverse disease populations, and various age ranges spanning the entire lifespan. This expansion can be facilitated by leveraging the resources of large-scale brain imaging genetic consortia like ADNI81, focused on Alzheimer’s disease, and ABCD82, which centers on brain development during adolescence. Third, it’s important to exercise caution when interpreting the results of this study due to the various assumptions associated with the statistical methods employed, including LDSC and MR. Lastly, it’s worth noting that brain age represents a residual score encompassing measurement error. A recent study83 has underscored the significance of incorporating longitudinal data when calculating brain age. Future research should be conducted once the longitudinal scans from the UK Biobank become accessible to explore this impact on GWASs.
Outlook
In summary, our multimodal BAG GWASs provide evidence that the aging process of the human brain is a complex biological phenomenon intertwined with several organ systems and chronic diseases. We digitized the human brain from multimodal imaging and captured a complete genetic landscape of human brain aging. This opens new avenues for drug repurposing/repositioning and aids in identifying modifiable protective and risk factors that can ameliorate human brain health.
Methods
Method 1: Study populations
UKBB is a population-based study of approximately 500,000 people recruited between 2006 and 2010 from Great Britain. The current study focused on participants from the imaging-genomics population who underwent both an MRI scan and genotyping (genotype array data and the imputed genotype data) under application number 35148. The UKBB study has ethical approval, and the ethics committee is detailed here: https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/governance/ethics-advisory-committee. The study design, phenotype and genetic data availability, and quality check have been published and detailed elsewhere24. Table 1 shows the study characteristics of the present work.
Table 1.
Study characteristics.
| Population (overlap) | T1w MRI | dMRI | rsfMRI | Age(year)* | Sex/female* |
|---|---|---|---|---|---|
| Total (35,261) | 36,304 | 39,661 | 36,858 | 63.64 (45.00, 81.00) | 18,700/53% |
| Training/validation/test (4000) | 4000 | 4000 | 4000 | 63.47 (46.00, 81.00) | 2000/50% |
| Independent test (31,261) | 32,304 | 35,661 | 32,858 | 63.66 (45.00, 81.00) | 16,700/53% |
| GWAS | 31,557 | 31,749 | 32,017 | NA | NA |
The current table presents participants of all ancestries for the age prediction task. We constrained participants with only European ancestry for downstream genetic analyses.
For age and sex, we reported statistics for the overlapping population of the three modalities: 35,261 participants for the entire population, 4000 participants for the training/validation/test dataset, and 31,261 participants for the independent test dataset.
We also showed the number of participants for the GM, WM, and FC-BAG GWAS. In total, our analyses included 42,089 unique participants who had at least one image scan. Abbreviation: dMRI: diffusion MRI; rsfMRI: resting-state functional MRI; T1w MRI: T1-weighted MRI.
To train the machine learning model and compare the performance of the multimodal BAG, we defined the following two datasets:
Training/validation/test dataset: To objectively compare the age prediction performance of different MRI modalities and machine learning models, we randomly sub-sampled 500 (250 females) participants within each decade’s range from 44 to 84 years old, resulting in the same 4000 participants for GM, WM, and FC-IDP. This dataset was used to train machine learning models. In addition, we ensured that the training/validation/test splits were the same in the CV procedure. As UKBB is a general population, we explicitly excluded participants with common brain diseases, including mental and behavioral disorders (ICD-10 code: F; N=2678) and diseases linked to the central nervous system (ICD-10 code: G group; N=3336).
Independent test dataset: The rest of the population for each MRI modality (N=38,089) was set as independent test datasets – unseen until we finalized the training procedure84.
The GM-IDP includes 119 GM regional volumes from the MUSE atlas, consolidated by the iSTAGING consortium. We studied the influence of different WM-IDP features: i) 48 FA values; ii) 108 TBSS-based85 values from FA, MD, ODI, and NDI; iii) 192 skeleton-based mean values from FA, MD, ODI, and NDI. For FC-IDP, 210 ICA-derived functional connectivity components were included. The WM and FC-IDP were downloaded from UKBB (Method 2B and 2C).
Method 2: Image processing
(A): T1-weighted MRI processing:
The imaging quality check is detailed in Supplementary eMethod 2. All images were first corrected for magnetic field intensity inhomogeneity.86 A deep learning-based skull stripping algorithm was applied to remove extra-cranial material. In total, 145 IDPs were generated in gray matter (GM, 119 ROIs), white matter (WM, 20 ROIs), and ventricles (6 ROIs) using a multi-atlas label fusion method41. The 119 GM ROIs were fit to the four machine learning models to derive the GM-BAG.
(B): Diffusion MRI processing:
UKBB has processed diffusion MRI (dMRI) data and released several WM tract-based metrics for the Diffusion Tensor Imaging (DTI) model (single-shell dMRI) and Neurite Orientation Dispersion and Density Imaging (NODDI87) model (multi-shell dMRI). The Eddy88 tool corrected raw images for eddy currents, head motion, and outlier slices. The mean values of FA, MD, ODI, and NDI were extracted from the 48 WM tracts of the “ICBM-DTI-81 white-matter labels” atlas89, resulting in 192 WM-IDP (category code: 134). In addition, a tract-skeleton (TBSS)85 and probabilistic tractography analysis90 were employed to derive weighted-mean measures within the 27 major WM tracts, referred to as the 108 TBSS WM-IDP (category code: 135). Finally, since we observed overfitting – an increase of MAEs from the cross-validated test results to the independent test results – when incorporating features from FA, MD, ODI, and NDI (as detailed in Supplementary eTable 1A), we chose to use only the 48 FA WM-IDPs to train the models for generating WM-BAG.
(C): Resting-state functional MRI processing:
For FC-IDP, we used the 21 × 21 resting-state functional connectivity (full correlation) matrices (data-field code: 25750) from UKBB91,92. UKBB processed rsfMRI data and released 25 whole-brain spatial independent component analysis (ICA)-derived components93; four components were removed as artifactual, leaving 21. This resulted in 210 FC-IDP quantifying pairwise correlations of the ICA-derived components. Details of dMRI and rsfMRI processing are documented here: https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf.
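The count of 210 FC-IDP follows from taking each unordered pair of the 21 retained components once, i.e., the strict upper triangle of the 21 × 21 matrix (21 × 20 / 2 = 210). A minimal sketch of that flattening is given below; the function name and the toy matrix values are illustrative assumptions, not the UKBB pipeline.

```python
from itertools import combinations


def upper_triangle_features(matrix):
    """Flatten the strict upper triangle of a symmetric matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


n_components = 25 - 4  # 25 ICA components minus the 4 artifactual ones

# Toy symmetric "connectivity" matrix; the values are placeholders, not real data.
toy = [[1.0 if i == j else 0.01 * (i + j) for j in range(n_components)]
       for i in range(n_components)]

features = upper_triangle_features(toy)
print(len(features))  # 210
```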
Method 3: Multimodal brain age prediction using machine learning models
GM, WM, and FC-IDP (details of image processing are presented in Method 2) were fit into four machine learning models (linear and non-linear) to predict brain age as the outcome. Specifically, we used SVR, LASSO regression, MLP, and a five-layer neural network (NN: three linear layers and one rectified linear unit layer).
To objectively and reproducibly compare the age prediction performance using different machine learning models and MRI modalities, we adopted a nested CV procedure and included an independent test dataset27. Specifically, the outer loop CV was performed for 100 repeated random splits: 80% of the data were used for training. The remaining 20% was used for validation/testing in the inner loop with a 10-fold CV. In addition, we concealed an independent test dataset – unseen for testing until we finished fine-tuning the machine learning models84 (e.g., hyperparameters for SVR and neural networks). To compare the results of different models and modalities, we showed the mean and empirical standard deviation of the MAE instead of performing any statistical test (e.g., a two-sample t-test). This is because no unbiased variance estimate exists for complex CV procedures (refer to notes from Nadeau and Bengio94).
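The general shape of such a procedure can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this study: a one-feature ridge regression stands in for SVR, LASSO, MLP, and NN; the inner 10-fold CV is assumed to run on the training portion of each outer split to select a hyperparameter; and all function names and the synthetic data are hypothetical.

```python
import random
import statistics


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge regression with one feature and an unpenalized intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx


def mae(model, xs, ys):
    """Mean absolute error of predicted vs. chronological age."""
    slope, intercept = model
    return statistics.fmean([abs(slope * x + intercept - y) for x, y in zip(xs, ys)])


def inner_cv_select(xs, ys, lams, k, rng):
    """Pick the penalty with the lowest mean k-fold validation MAE."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]

    def cv_mae(lam):
        scores = []
        for fold in folds:
            held = set(fold)
            tr = [i for i in idx if i not in held]
            model = fit_ridge_1d([xs[i] for i in tr], [ys[i] for i in tr], lam)
            scores.append(mae(model, [xs[i] for i in fold], [ys[i] for i in fold]))
        return statistics.fmean(scores)

    return min(lams, key=cv_mae)


def nested_cv(xs, ys, lams=(0.0, 1.0, 10.0), n_repeats=100, k=10, seed=0):
    """Repeated random 80/20 outer splits with an inner k-fold CV for tuning."""
    rng = random.Random(seed)
    n = len(xs)
    test_maes = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(0.8 * n)
        tr, te = idx[:cut], idx[cut:]
        xtr, ytr = [xs[i] for i in tr], [ys[i] for i in tr]
        lam = inner_cv_select(xtr, ytr, lams, k, rng)
        model = fit_ridge_1d(xtr, ytr, lam)
        test_maes.append(mae(model, [xs[i] for i in te], [ys[i] for i in te]))
    # Report the mean and empirical SD of the MAE, without a significance test.
    return statistics.fmean(test_maes), statistics.stdev(test_maes)


# Synthetic data: one feature that declines linearly with age, plus noise.
data_rng = random.Random(42)
ages = [data_rng.uniform(45, 81) for _ in range(200)]
feature = [-0.5 * a + data_rng.gauss(0, 2) for a in ages]
mean_mae, sd_mae = nested_cv(feature, ages, n_repeats=20)
print(f"MAE = {mean_mae:.2f} +/- {sd_mae:.2f} years")
```

The brain age gap for a participant would then be the model's predicted age minus the chronological age.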
Method 4: Genetic analyses
Imputed genotype data were quality-checked for downstream analyses. Our quality check pipeline (see below) resulted in 33,541 European ancestry participants and 8,469,833 SNPs. After merging with the multimodal MRI populations, we included 31,557 European participants for GM-BAG, 31,749 participants for WM-BAG, and 32,017 participants for FC-BAG GWAS. Details of the protocol are described elsewhere15,95. We summarize our genetic QC pipeline below. First, we excluded related individuals (up to 2nd-degree) from the complete UKBB sample using the KING software for family relationship inference96. We then removed duplicated variants from all 22 autosomal chromosomes. Individuals whose genetically identified sex did not match their self-acknowledged sex were removed. Other exclusion criteria were: i) individuals with more than 3% of missing genotypes; ii) variants with minor allele frequency (MAF) of less than 1%; iii) variants with larger than 3% missing genotyping rate; iv) variants that failed the Hardy-Weinberg test at 1×10−10. To adjust for population stratification97, we derived the first 40 genetic principal components (PC) using the FlashPCA software98. Details of the genetic quality check protocol are described elsewhere95,99.
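The variant-level thresholds (MAF and missingness) can be illustrated on a toy genotype table. The sketch below is not the PLINK-based pipeline used in this study: it omits the Hardy-Weinberg test, the sample-level filters, and relatedness pruning, and the function name, data layout, and variant labels are hypothetical.

```python
def variant_qc(genotypes, maf_min=0.01, max_missing=0.03):
    """Keep variants passing the missingness and MAF thresholds.

    genotypes maps a variant label to a list of allele counts (0, 1, 2),
    with None marking a missing call.
    """
    kept = []
    for vid, calls in genotypes.items():
        observed = [g for g in calls if g is not None]
        n_missing = len(calls) - len(observed)
        # Drop variants with a missing genotyping rate above the threshold.
        if not observed or n_missing > max_missing * len(calls):
            continue
        freq = sum(observed) / (2 * len(observed))
        maf = min(freq, 1 - freq)
        # Drop variants whose minor allele frequency is below the threshold.
        if maf < maf_min:
            continue
        kept.append(vid)
    return kept


# Toy data for 100 individuals; labels are placeholders, not real variants.
toy = {
    "var_common": [0] * 50 + [1] * 40 + [2] * 10,   # MAF = 0.30, no missing calls
    "var_rare": [0] * 99 + [1],                     # MAF = 0.005
    "var_missing": [None] * 10 + [1] * 90,          # 10% missing calls
}
print(variant_qc(toy))  # ['var_common']
```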
(A): Genome-wide association analysis:
For GWAS, we ran a linear regression using Plink100 for GM, WM, and FC-BAG, controlling for confounders of age, dataset status (training/validation/test or independent test dataset), age-squared, sex, age x sex interaction, age-squared x sex interaction, total intracranial volume, the brain position in the scanner (lateral, transverse, and longitudinal), and the first 40 genetic principal components. The inclusion of these covariates is guided by pioneering neuroimaging GWAS conducted by Zhao et al.13 and Elliott et al.11 We adopted the genome-wide P-value threshold (5 × 10−8) and annotated independent genetic signals considering linkage disequilibrium (see below). We then estimated the SNP-based heritability using GCTA37 with the individual-level genotype data and the same covariates as in GWAS.
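To make the covariate set concrete, the sketch below assembles one participant's covariate row, including the squared and interaction terms. The function name, argument order, and the 0/1 coding of sex and dataset status are assumptions made for illustration; in practice these columns would be written to a covariate file and passed to the GWAS software.

```python
def build_covariates(age, sex, dataset_status, tiv, brain_position, genetic_pcs):
    """Return the covariate row for one participant.

    sex and dataset_status are assumed to be coded 0/1;
    brain_position is (lateral, transverse, longitudinal).
    """
    if len(brain_position) != 3:
        raise ValueError("expected three brain-position coordinates")
    if len(genetic_pcs) != 40:
        raise ValueError("expected 40 genetic principal components")
    age_sq = age ** 2
    return [
        age, dataset_status, age_sq, sex,
        age * sex,        # age x sex interaction
        age_sq * sex,     # age-squared x sex interaction
        tiv,
        *brain_position,
        *genetic_pcs,
    ]


# Placeholder values for a single hypothetical participant.
row = build_covariates(60.0, 1, 0, 1500.0, (0.1, 0.2, 0.3), [0.0] * 40)
print(len(row))  # 50
```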
To check the robustness of our GWAS results using European ancestry, we performed six sensitivity checks, including i) split-sample GWAS by randomly dividing the entire population into two sex and age-matched splits, ii) sex-stratified GWAS for males and females, iii) non-European GWAS, iv) fastGWA40 for a mixed linear model that accounts for cryptic population stratification, v) machine learning-specific GWAS, and vi) feature type-specific GWAS.
(B): Annotation of genomic loci and genes:
The annotation of genomic loci and mapped genes was performed via FUMA101 (https://fuma.ctglab.nl/, version: v1.5.0). For the annotation of genomic loci, we first defined lead SNPs (correlation r2 ≤ 0.1, distance < 250 kilobases) and assigned them to a genomic locus (non-overlapping); the lead SNP with the lowest P-value (i.e., the top lead SNP) was used to represent the genomic locus. For gene mappings, three different strategies were considered. First, positional mapping assigns the SNP to its physically nearby genes (a 10 kb window by default). Second, eQTL mapping annotates SNPs to genes based on eQTL associations. Finally, chromatin interaction mapping annotates SNPs to genes when there is a significant chromatin interaction between the disease-associated regions and nearby or distant genes.101 The definition of top lead SNP, lead SNP, independent significant SNP, and candidate SNP can be found in Supplementary eMethod 1.
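The locus definition can be illustrated with a simplified greedy procedure: walk the genome-wide significant SNPs by ascending P-value, keep those in low LD with every lead SNP already kept, then merge nearby lead SNPs into one locus represented by its top lead SNP. This is a sketch of the idea only and not FUMA's implementation; the function names, the pairwise r2 lookup, and the SNP labels are hypothetical.

```python
def select_lead_snps(snps, r2, r2_max=0.1, p_max=5e-8):
    """Greedily keep significant SNPs whose r2 with every kept lead is <= r2_max."""
    leads = []
    significant = [s for s in snps if s["p"] < p_max]
    for snp in sorted(significant, key=lambda s: s["p"]):
        if all(r2.get(frozenset((snp["id"], lead["id"])), 0.0) <= r2_max
               for lead in leads):
            leads.append(snp)
    return leads


def merge_into_loci(leads, max_gap=250_000):
    """Merge lead SNPs on the same chromosome lying closer than max_gap base pairs."""
    loci = []
    for snp in sorted(leads, key=lambda s: (s["chr"], s["pos"])):
        last = loci[-1] if loci else None
        if last and last[-1]["chr"] == snp["chr"] and snp["pos"] - last[-1]["pos"] < max_gap:
            last.append(snp)
        else:
            loci.append([snp])
    # The lead SNP with the lowest P-value represents each locus.
    return [{"top_lead": min(locus, key=lambda s: s["p"])["id"],
             "leads": [s["id"] for s in locus]} for locus in loci]


# Toy summary statistics; labels and values are placeholders.
toy_snps = [
    {"id": "s1", "chr": 1, "pos": 1_000_000, "p": 1e-12},
    {"id": "s2", "chr": 1, "pos": 1_050_000, "p": 1e-10},  # in high LD with s1
    {"id": "s3", "chr": 1, "pos": 1_200_000, "p": 1e-9},
    {"id": "s4", "chr": 2, "pos": 500_000, "p": 2e-8},
]
toy_r2 = {frozenset(("s1", "s2")): 0.8, frozenset(("s1", "s3")): 0.05}

loci = merge_into_loci(select_lead_snps(toy_snps, toy_r2))
print(loci)
```

In this toy example, s2 is dropped because of its LD with s1, and s1 and s3 fall into one locus because they lie 200 kb apart.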
(C): Phenome-wide association query for genomic loci associated with other traits in the literature:
We queried the significant independent SNPs within each locus in the GWAS Catalog (query date: January 10th, 2023 via FUMA version: v1.5.0) to determine their previously identified associations with other traits. For these associated traits, we further mapped them into several high-level categories for visualization purposes (Fig. 2B).
(D): Genetic correlation:
We used LDSC36 to estimate the pairwise genetic correlation (rg) between GM, WM, and FC-BAG and several pre-selected traits (Supplementary eTable 3) by using the precomputed LD scores from the 1000 Genomes of European ancestry. The following pre-selected traits were included: Alzheimer’s disease (AD), autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), obsessive-compulsive disorder (OCD), major depressive disorder (MDD), bipolar disorder (BIP), schizophrenia (SCZ), education and intelligence, as well as the AI-derived subtypes for AD (AD1 and AD2102), ASD (ASD1, ASD2, and ASD347), and SCZ (SCZ1 and SCZ2103) – serving as more robust endophenotypes than the disease diagnoses themselves. To ensure the suitability of the GWAS summary statistics, we first checked that the selected study’s population was of European ancestry; we then required a moderate SNP-based heritability h2 estimate and excluded the studies with spuriously low h2 (<0.05). Notably, LDSC corrects for sample overlap and provides an unbiased estimate of genetic correlation104. The h2 estimate from LDSC is, in general, lower than that of GCTA because LDSC uses GWAS summary statistics and pre-computed LD information, and the two software packages make slightly different model assumptions105.
(E): Partitioned heritability estimate:
Partitioned heritability analysis estimates the percentage of heritability enrichment explained by annotated genome regions51. First, the partitioned heritability was calculated for 53 main functional categories. The 53 functional categories are not specific to any cell type, including coding, UTR, promoter, and intronic regions. Details of the 53 categories are described elsewhere51 and are also presented in Supplementary eTable 5A. Subsequently, cell type-specific partitioned heritability was estimated using gene sets from Cahoy et al.58 for three main cell types (i.e., astrocyte, neuron, and oligodendrocyte) (Supplementary eTable 5B).
(F): Gene-drug-disease network construction:
We curated data from the DrugBank database (v.5.1.9)106 and the Therapeutic Target Database (updated September 29th, 2021) to construct a gene-drug-disease network. Specifically, we constrained the target to human organisms and included all drugs with active statuses (e.g., patented and approved) but excluded inactive ones (e.g., terminated or discontinued at any phase). To represent the disease, we mapped the identified drugs to the Anatomical Therapeutic Chemical (ATC) classification system for the DrugBank database and the International Classification of Diseases (ICD-11) for the Therapeutic Target Database.
(G): Two-sample Mendelian Randomization:
We investigated whether the clinical traits previously associated with our genomic loci (Fig. 2B) were a cause or a consequence of GM, WM, and FC-BAG using a bidirectional, two-sample MR approach. GM, WM, and FC-BAG are the outcome/exposure variables in the forward/inverse MR, respectively. We applied five different MR methods using the TwoSampleMR R package59, including the inverse variance weighted (IVW), MR Egger107, weighted median108, simple mode, and weighted mode methods. We reported the results of IVW in the main text and the four others in the Supplementary eFile 9. MR relies on a set of crucial assumptions to ensure the validity of its results. These assumptions include the requirement that the chosen genetic instrument exhibits a strong association with the exposure of interest while remaining free from direct associations with confounding factors that could influence the outcome. Additionally, the genetic variant used in MR should be independently allocated during conception and inheritance, guaranteeing its autonomy from potential confounders. Furthermore, this genetic instrument must affect the outcome solely through the exposure of interest without directly impacting alternative pathways that could influence the outcome (no horizontal pleiotropy). The five MR methods handle pleiotropy and instrument validity assumptions differently, offering various degrees of robustness to violations. For example, MR Egger provides a method to estimate and correct for pleiotropy, making it robust in the presence of horizontal pleiotropy. However, it assumes that directional pleiotropy is the only form of pleiotropy present.
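For intuition, the IVW and MR Egger estimators can both be written as weighted regressions of the SNP-outcome effects on the SNP-exposure effects: IVW forces the line through the origin, whereas MR Egger also fits an intercept that absorbs average directional pleiotropy. The sketch below uses the fixed-effect IVW form with first-order weights (1/SE² of the SNP-outcome effect) and synthetic effect sizes; it does not reproduce the TwoSampleMR implementation, and the function names are hypothetical.

```python
import math


def ivw(bx, by, se_y):
    """Fixed-effect IVW: weighted regression of by on bx through the origin."""
    w = [1 / s ** 2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    den = sum(wi * x * x for wi, x in zip(w, bx))
    return num / den, math.sqrt(1 / den)


def mr_egger(bx, by, se_y):
    """MR Egger: the same weighted regression with an intercept term."""
    # Orient each SNP so that its exposure effect is positive.
    signs = [1 if x >= 0 else -1 for x in bx]
    x = [s * v for s, v in zip(signs, bx)]
    y = [s * v for s, v in zip(signs, by)]
    w = [1 / s ** 2 for s in se_y]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx ** 2)
    intercept = (sy - slope * sx) / sw
    return slope, intercept


# Synthetic SNP effects: true slope 0.5 plus a constant pleiotropic offset of 0.01.
bx = [0.10, 0.20, 0.15, 0.30, 0.25]
by = [0.5 * x + 0.01 for x in bx]
se_y = [0.02, 0.03, 0.025, 0.04, 0.03]

ivw_beta, ivw_se = ivw(bx, by, se_y)
egger_slope, egger_intercept = mr_egger(bx, by, se_y)
print(f"IVW OR = {math.exp(ivw_beta):.3f}")
print(f"Egger OR = {math.exp(egger_slope):.3f}, intercept = {egger_intercept:.3f}")
```

With a positive offset added to every SNP-outcome effect, the IVW slope is pulled above the Egger slope, which is the qualitative pattern interpreted as possible directional pleiotropy.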
To ensure an unbiased selection of exposure variables, we followed a systematic procedure guided by the STROBE-MR Statement109. We pre-selected exposure variables across various categories based on our phenome-wide association query. These variables encompassed neurodegenerative diseases (e.g., AD), liver biomarkers (e.g., AST), cardiovascular diseases (e.g., the triglyceride-to-lipid ratio in VLDL), and lifestyle-related risk factors (e.g., BMI). Subsequently, we conducted an automated query for these traits in the IEU GWAS database110, which provides curated GWAS summary statistics suitable for MR, using the available_outcomes() function. We ensured the selected studies used European ancestry populations and shared the same genome build as our GWAS (HG19/GRCh37). Additionally, we manually examined the selected studies to exclude any GWAS summary statistics overlapping with UK Biobank populations to prevent bias stemming from sample overlap111. This process yielded a set of seven exposure variables, comprising AD, breast cancer, type 2 diabetes, renin level, triglyceride-to-lipid ratio, aspartate aminotransferase (AST), and BMI. The details of the selected studies for the instrumental variables (IVs) are provided in Supplementary eTable 6.
We performed several sensitivity analyses. First, a heterogeneity test was performed to check for violations of the IV assumptions. Horizontal pleiotropy was assessed to probe violations of the IV’s exclusivity assumption64 using a funnel plot, single-SNP MR approaches, and the MR Egger estimator107. Moreover, the leave-one-out analysis excluded one instrument (SNP) at a time and assessed the sensitivity of the results to individual SNPs.
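Two of these checks have simple closed forms, sketched below under the same fixed-effect IVW assumption: Cochran's Q sums the weighted squared deviations of the single-SNP Wald ratios from the pooled IVW estimate, and the leave-one-out analysis recomputes the IVW estimate with each SNP removed in turn. Function names, SNP labels, and effect sizes are hypothetical, and the P-value for Q (a chi-square test with the returned degrees of freedom) is not computed here.

```python
def ivw_estimate(bx, by, se_y):
    """Fixed-effect IVW estimate with first-order weights 1 / se_y^2."""
    w = [1 / s ** 2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    den = sum(wi * x * x for wi, x in zip(w, bx))
    return num / den


def cochran_q(bx, by, se_y):
    """Cochran's Q across single-SNP Wald ratios, with its degrees of freedom."""
    beta = ivw_estimate(bx, by, se_y)
    q = sum((x / s) ** 2 * (y / x - beta) ** 2 for x, y, s in zip(bx, by, se_y))
    return q, len(bx) - 1


def leave_one_out(snps, bx, by, se_y):
    """IVW estimate after excluding each SNP in turn."""
    out = {}
    for i, snp in enumerate(snps):
        keep = [j for j in range(len(snps)) if j != i]
        out[snp] = ivw_estimate([bx[j] for j in keep],
                                [by[j] for j in keep],
                                [se_y[j] for j in keep])
    return out


# Synthetic instruments: three SNPs with ratio 0.5 and one outlier with ratio 1.0.
snps = ["snp1", "snp2", "snp3", "snp4"]
bx = [0.10, 0.20, 0.15, 0.30]
by = [0.05, 0.10, 0.075, 0.30]
se_y = [0.02, 0.02, 0.02, 0.02]

q, df = cochran_q(bx, by, se_y)
loo = leave_one_out(snps, bx, by, se_y)
print(f"Q = {q:.2f} on {df} degrees of freedom")
for snp, beta in loo.items():
    print(snp, round(beta, 3))
```

Dropping the outlier (snp4) returns the estimate to 0.5, whereas dropping any other SNP leaves it inflated, which is how a single influential instrument would show up in a leave-one-out plot.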
(H): PRS prediction:
We calculated the PRS using the GWAS results from the split-sample analyses. The weights of the PRS were defined based on split1 data (training/base data), and the split2 GWAS summary statistics were used as the test/target data. The QC steps for the base and target data are as follows: i) removal of duplicated and ambiguous SNPs for the base data; ii) clumping the base GWAS data; iii) pruning to remove highly correlated SNPs in the target data; iv) removal of high heterozygosity samples in the target data; v) removal of duplicated, mismatching and ambiguous SNPs in the target data. After rigorous QC, we employed two methods to derive the three BAG-PRS in the split2 population: i) PLINK with the classic C+T method (clumping + thresholding) and ii) PRS-CS57 with a Bayesian approach.
To determine the “best-fit” PRS P-value threshold, we performed a linear regression using the PRS calculated at different P-value thresholds (0.001, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5), controlling for age, sex, total intracranial volume, brain position during scanning (lateral, transverse, and longitudinal), and the first forty genetic PCs. A null model was established by including only the abovementioned covariates. The alternative model was then constructed by introducing each BAG-PRS as an extra independent variable.
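The thresholding half of the C+T method amounts to a weighted sum of allele dosages over the SNPs whose base-data P-value falls below each threshold. The sketch below assumes clumping has already been applied and uses hypothetical SNP labels, effect sizes, and dosages; the subsequent comparison of the null and alternative regression models is not shown.

```python
THRESHOLDS = (0.001, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)


def prs_at_thresholds(weights, dosages, thresholds=THRESHOLDS):
    """Compute one score per P-value threshold for a single individual.

    weights maps a SNP label to (beta, p) from the base GWAS;
    dosages maps a SNP label to the effect-allele dosage (0 to 2).
    """
    scores = {}
    for t in thresholds:
        scores[t] = sum(beta * dosages.get(snp, 0.0)
                        for snp, (beta, p) in weights.items() if p < t)
    return scores


# Placeholder base-data weights (already clumped) and one target individual.
toy_weights = {"a": (0.2, 1e-4), "b": (-0.1, 0.03), "c": (0.05, 0.4)}
toy_dosages = {"a": 0, "b": 2, "c": 1}

toy_scores = prs_at_thresholds(toy_weights, toy_dosages)
for t, score in toy_scores.items():
    print(t, round(score, 3))
```

Each threshold yields one candidate PRS per individual; the threshold whose PRS adds the most explained variance over the covariate-only model would then be taken as the best fit.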
Supplementary Material
Acknowledgments
We want to express our sincere gratitude to the UK Biobank team for their invaluable contribution to advancing clinical research in our field. The primary funding support for this present study is from the initial funding package provided by Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California for WJ. The iSTAGING consortium is a multi-institutional effort funded by NIA by RF1 AG054409 for DC. This research has been conducted using the UK Biobank Resource under Application Number 35148. We thank Caroline O’Driscoll for her work creating the MEDICINE web portal, which has been instrumental in showcasing and disseminating our scientific findings.
Footnotes
Code Availability
- MLNI: https://anbai106.github.io/mlni/, brain age prediction (V0.1.2)
- MEDICINE: https://labs.loni.usc.edu/medicine, knowledge portal for dissemination and GWAS summary statistics sharing
- MUSE: https://www.med.upenn.edu/sbia/muse.html, image preprocessing for GM-IDP
- PLINK: https://www.cog-genomics.org/plink/, GWAS and PRS
- FUMA: https://fuma.ctglab.nl/, gene mapping, genomic locus annotation
- GCTA: https://yanglab.westlake.edu.cn/software/gcta/#Overview, heritability estimates, and fastGWA
- LDSC: https://github.com/bulik/ldsc, genetic correlation, partitioned heritability
- TwoSampleMR: https://mrcieu.github.io/TwoSampleMR/index.html, MR
- PRS-CS: https://github.com/getian107/PRScs, PRS
Competing Interests
None
Data Availability
The GWAS summary statistics corresponding to this study are publicly available on the MEDICINE knowledge portal (https://labs.loni.usc.edu/medicine).