Skip to main content

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

bioRxiv logoLink to bioRxiv
[Preprint]. 2023 Jun 11:2023.04.13.536818. [Version 2] doi: 10.1101/2023.04.13.536818

The Genetic Heterogeneity of Multimodal Human Brain Age

Junhao Wen 1,2,3,*, Bingxin Zhao 4, Zhijian Yang 2, Guray Erus 2, Ioanna Skampardoni 2, Elizabeth Mamourian 2, Yuhan Cui 2, Gyujoon Hwang 2, Jingxuan Bao 5, Aleix Boquet-Pujadas 6, Zhen Zhou 2, Yogasudha Veturi 7, Marylyn D Ritchie 8, Haochang Shou 2, Paul M Thompson 9, Li Shen 5, Arthur W Toga 3, Christos Davatzikos 2
PMCID: PMC10274645  PMID: 37333190

Abstract

The complex biological mechanisms underlying human brain aging remain incompletely understood, involving multiple body organs and chronic diseases. In this study, we used multimodal magnetic resonance imaging and artificial intelligence to examine the genetic heterogeneity of the brain age gap (BAG) derived from gray matter volume (GM-BAG), white matter microstructure (WM-BAG), and functional connectivity (FC-BAG). We identified sixteen significant genomic loci, with GM-BAG loci showing abundant associations with neurodegenerative and neuropsychiatric traits, WM-BAG loci implicated in cancer and Alzheimer’s disease (AD), and FC-BAG in insomnia. A gene-drug-disease network highlighted genes linked to GM-BAG for treating neurodegenerative and neuropsychiatric disorders and WM-BAG genes for cancer therapy. GM-BAG showed the highest heritability enrichment for genetic variants in conserved regions, whereas WM-BAG exhibited the highest heritability enrichment in the 5’ untranslated regions; oligodendrocytes and astrocytes, but not neurons, showed significant heritability enrichment in WM and FC-BAG, respectively. Notably, Mendelian randomization identified causal risk effects of triglyceride-to-lipid ratio in very low-density lipoprotein and type 2 diabetes on GM-BAG and AD on WM-BAG. Overall, our results provide valuable insights into the genetic heterogeneity of human brain aging, with clinical implications for potential lifestyle and therapeutic interventions.

Main

The advent of artificial intelligence (AI) has provided novel approaches to investigate various aspects of human brain health1,2, such as normal brain aging3, neurodegenerative disorders such as Alzheimer’s disease (AD)4, and brain cancer5. Based on magnetic resonance imaging (MRI), AI-derived measures of the human brain age68 have emerged as a valuable biomarker for evaluating brain health. More precisely, the difference between an individual’s predicted brain age and their chronological age – brain age gap (BAG) – provides a means of quantifying an individual’s brain health by measuring deviation from the normative aging trajectory. This deviation can be caused by adverse factors, such as AD9, leading to an accelerated brain age (i.e., a positive BAG) or protective factors, such as physical activity or education, resulting in a decelerated brain age (i.e., a negative BAG).

Brain imaging genomics10, an emerging scientific field being advanced by both computational statistics and AI, uses imaging-derived phenotypes (IDP11) from MRI and genetic variants to offer mechanistic insights into healthy and pathological aging of the human brain. Recent large-scale genome-wide association studies (GWAS)1118 have identified a diverse set of genomic loci linked to gray matter (GM)-IDP from T1-weighted MRI, white matter (WM)-IDP from diffusion MRI, and functional connectivity (FC)-IDP from functional MRI. While previous GWAS19 have associated BAG with common genetic variants [e.g., single nucleotide polymorphism (SNP)], they primarily focused on GM-BAG9,20. Notably, identifying diverse genetic factors underlying multimodal human brain age – the genetic heterogeneity – and pinpointing their causal consequences are vital for developing gene-inspired therapeutic interventions. Numerous risk or protective lifestyle factors and neurobiological processes may exert independent, synergistic, antagonistic, sequential, or differential influences on human brain health. In this context, we postulate that AI-derived GM, WM, and FC-BAG can serve as robust, complementary endophenotypes21 – close to the underlying etiology – for precision medicine. These AI-derived endophenotypes may offer an avenue for quantifying various dimensions of brain health and creating novel gene-drug-disease networks for drug repurposing22 that may enhance overall brain health.

The present study sought to uncover the genetic heterogeneity of multimodal BAG and explore the causal relationships between protective/risk factors and decelerated/accelerated brain age. To accomplish this, we analyzed multimodal brain MRI scans from 42,089 participants from the UK Biobank (UKBB) study23 and obtained 119 GM-IDP, 48 fractional anisotropy (FA) WM-IDP, and 210 FC-IDP (See Method 1). We first compared the age prediction performance of four machine learning models using these IDPs and then conducted extensive genetic analyses to depict their genetic architectures. We performed GWAS to identify genomic loci associated with GM, WM, and FC-BAG, constructed a gene-drug-disease network, and estimated genetic correlation with several brain disorders and heritability enrichment in various functional categories or specific cell types. Finally, we performed Mendelian Randomization (MR)24 to infer the causal effects of several clinical traits related to other body organs and diseases on multimodal BAG. All results, including the GWAS summary statistics, are publicly accessible through the MEDICINE (Multi-organ biomEDIcal sCIeNcE) knowledge portal: https://labs.loni.usc.edu/medicine.

Results

The phenotypic heterogeneity of multimodal human brain age derived from three MRI modalities and four machine learning models

To compare the age prediction performance of the four machine learning methods based on GM, WM, and FC-IDP (Fig. 1A), we employed a nested cross-validation (CV) procedure and used an independent test dataset25,26 – unseen until we finalized the models using only the training dataset in the nested CV (Method 1). The four machine learning models included support vector regression (SVR), LASSO regression, multilayer perceptron (MLP), and a five-layer neural network (i.e., three linear layers and one ReLu layer; hereafter, NN)27 (Method 2). Other studies have thoroughly evaluated machine learning models for predicting brain age28, but we selected these as they represent methods currently used in the field.

Figure 1: Brain age prediction using three MRI modalities and four machine learning models.

Figure 1:

A) Multimodal brain MRI data were used to derive imaging-derived phenotypes (IDP) for T1-weighted MRI (119 GM-IDP), diffusion MRI (48 WM-IDP), and resting-state functional MRI (210 FC-IDP). IDPs for each modality are shown here using different colors based on predefined brain atlases or ICA for FC-IDP. B) Linear models achieved lower mean absolute errors (MAE) than non-linear models using support vector regression (SVR), LASSO regression, multilayer perceptron (MLP), and a five-layer neural network (NN). The MAE for the independent test dataset is presented, and the # symbol indicates the model with the lowest MAE for each modality. Error bars represent standard deviation (SD). C) Pearson’s correlation (r) between the predicted brain age and chronological age is computed, and statistical significance (P-value<0.05) - after adjustment for multiple comparisons using the FDR method - is denoted by the * symbol. Error bars represent the 95% confidence interval (CI). D) Scatter plot for the predicted brain age and chronological age. E) Participants from the independent test dataset show considerable heterogeneity in the manifestation of GM, WM, and FC-BAG. Colored triangles represent each participant’s absolute maximum magnitude of the multimodal BAG. The presented brain age from the three best-MAE models is corrected for the “age bias”.

Several findings were observed based on the results from the independent test dataset. First, GM-IDP (4.39<MAE<5.35; 0.64<r<0.66), WM-IDP (4.92<MAE<7.95; 0.42<r<0.65), and FC-IDP (5.48<MAE<6.05; 0.43 <r<0.46) achieved gradually higher mean absolute errors (MAE) and smaller Pearson’s correlation (r) (Fig. 1B, C, and D). Second, LASSO regression always obtained the lowest MAE for GM, WM, and FC-IDP; linear models obtained a lower MAE than non-linear networks (Fig. 1B). Third, all models generalized well from the training/validation/test dataset to the independent test dataset. However, simultaneously incorporating WM-IDP from FA, mean diffusivity (MD), neurite density index (NDI), and orientation dispersion index (ODI) resulted in models that were severely overfitted (Supplementary eTable 1 and Supplementary eFigure 1). The observed overfitting may be attributed to a large number of parameters (N=38,364) in the network and the strong correlations among the diffusion metrics (i.e., FA, MD, ODI, and NDI). Fourth, the experiments stratified by sex did not exhibit substantial differences, except for a stronger overfitting tendency observed in females compared to males using WM-IDP incorporating the four diffusion metrics (Supplementary eTable 1). Fifth, we found considerable heterogeneity in the expression of GM, WM, and FC-BAG in this independent dataset (Fig. 1E). Detailed results of the CV procedure, including the training, validation, and test performance, as well as the sex-stratified experiments, are presented in Supplementary eTable 1. These results led us to investigate the underlying genetic determinants responsible for this phenotypic heterogeneity (Fig. 1E). In all subsequent genetic analyses, we reported the results using BAG derived from the three LASSO models with the lowest MAE in each modality (Fig. 1A), with the “age bias” corrected as in De Lange et al.29.

Genomic loci associated with multimodal BAG show different phenome-wide associations

GWAS (Method 4A) revealed 6, 9, and 1 genomic loci linked to GM, WM, and FC-BAG, respectively (Fig. 2A, Supplementary eTable 2). All multimodal BAGs were significantly heritable (Fig. 2F). We performed a phenome-wide association query for these independent variants within each locus (Method 4C). Notably, the independent SNPs within each locus were frequently linked to other traits previously reported in the literature (Supplementary eFile 1). Specifically, the GM-BAG loci were uniquely associated with neuropsychiatric disorders such as major depressive disorder (MDD), heart disease, and cardiovascular disease. We also observed associations between these loci and other diseases (including anemia), as well as biomarkers from various human organs (e.g., liver) (Fig. 2B). As an illustration, within a specific genomic locus (top lead SNP: rs534115641, 17q21.31, P-value=8.44×10−23), the independent significant SNPs were predominantly linked with traits related to AD (rs720740030, rs19949930) and mental health (rs9303521 and rs7225384 for depressed affect31). One mapped gene, MAPT, has a well-established association with increased susceptibility to AD and other degenerative diseases32. The loci for WM-BAG were also found to be associated with physical measures [e.g., body mass index (BMI)], cancer (e.g., brain cancer), and AD-related biomarkers (e.g., tau protein levels, Fig. 2B). Finally, the genomic locus associated with FC-BAG was only associated with insomnia, with the top lead SNP (rs5877290) being novel (Fig. 2E). The genomic loci linked to GM, WM, and FC-BAG were distributed throughout the human genome, with many locations being distinct. In contrast, some others were closely situated across different BAGs. For instance, on chromosome 1, the locus associated with GM-BAG (top lead SNP: rs61732315, 1q32.1) and the locus related to WM-BAG (top lead SNP: rs11118475, 1q32.2) were in proximity (Fig. 2A).

Figure 2: Genome-wide associations of multimodal brain age gaps.

Figure 2:

A) Genome-wide associations identified sixteen genomic loci associated with GM (6), WM (9), and FC-BAG (1) using a genome-wide P-value threshold [−log10(P-value) > 7.30]. The top lead SNP and the cytogenetic region number represent each locus. B) Phenome-wide association query from GWAS Catalog39. Independent significant SNPs inside each locus were largely associated with many traits. We further classified these traits into several trait categories, including biomarkers from multiple body organs (e.g., heart and liver), neurological disorders (e.g., Alzheimer’s disease and Parkinson’s disease), and lifestyle risk factors (e.g., alcohol consumption). C) Regional plot for a genomic locus associated with GM-BAG. Color-coded SNPs are decided based on their highest r2 to one of the nearby independent significant SNPs. Gray-colored SNPs are below the r2 threshold. The top lead SNP, lead SNPs, and independent significant SNPs are denoted as dark-purple, purple, and red, respectively. Mapped, orange-colored genes of the genomic locus are annotated by positional, eQTL, and chromatin interaction mapping (Method 4B). D) Regional plot for a genomic locus associated with WM-BAG. E) The novel genomic locus associated with FC-BAG did not map to any genes. We used the Genome Reference Consortium Human Build 37 (GRCh37) in all genetic analyses. F) The SNP-based heritability estimates40 (h2) are shown for GM, WM, and FC-BAG. Abbreviation: AD: Alzheimer’s disease; ASD: autism spectrum disorder; PD: Parkinson’s disease; ADHD: attention-deficit/hyperactivity disorder.

We then performed positional and functional annotations to map SNPs to genes associated with GM and WM-BAG loci (Method 4B). The genomic locus (top lead SNP: rs534115641, Fig. 2C) linked to GM-BAG was mapped to multiple protein-encoding genes by position, eQTL, and chromatin interaction. The NSF gene, which encodes N-ethylmaleimide-sensitive fusion proteins, plays a key role in transferring membrane vesicles between cellular compartments. This gene has been linked to several conditions, including Parkinson’s disease (PD)33, epithelial ovarian cancer34, cognitive traits35, and fibromuscular dysplasia36. The CRHR1 gene encodes a G protein-coupled receptor, which specifically binds to neuropeptides of the corticotropin-releasing hormone family. These neuropeptides are recognized as key regulators of the hypothalamic-pituitary-adrenal pathway. A prior GWAS37 corroborated the association of this gene with the response to environmental stress, providing substantive support for the engagement of the hypothalamic-pituitary-adrenal axis, the central nervous system, and the endocrine system in the regulation of stress response38. We also identified a highly polygenic genomic locus (top lead SNP: rs564819152, Fig. 2D) for WM-BAG. This locus mapped to the SKIDA1, CASC10, MLLT10, and DNAJC1 genes – all implicated in various types of cancer. In contrast, the FC-BAG locus was novel and did not map to any genes. All mapped genes for GM and WM-BAG are presented in Supplementary eTable 2. These results indicate potential genetic overlap and causal links between the multimodal BAG and other clinical traits.

Sensitivity analyses for the genome-wide associations in split-sample experiments, sex-stratified experiments, and non-European populations

We performed extensive sensitivity analyses to check the robustness of our GWAS results. In the split-sample analysis (Method 4A), we found that two of the six GM-BAG loci (1q41 and 17q21.31; P-value>5×10−8) were replicated in the two splits. While the other four genomic loci did not surpass the genome-wide P-value threshold, they all exhibited local minima and featured the same top lead SNPs or existed in a state of high linkage disequilibrium. Five of the nine loci (1q32.2, 3q28, 4p15.2, 10p12.31, and 12q23.3; P-value>5×10−8) were replicated for WM-BAG in the two splits, and the FC-BAG locus was not replicated. A similar trend was observed for the other loci that did not pass the genome-wide threshold as in GM-BAG (Supplementary eFigure 2).

The sex-stratified GWAS for males and females demonstrated genome-wide consistency for certain loci. In particular, the genomic locus on chromosome 17 (17q21.31, Supplementary eFigure 3) was replicated in males, females, and the main analysis (Fig. 2C) for GM-BAG. Similarly, the locus on chromosome 3 (3q28, Supplementary eFigure 3) was also replicated in males, females, and the main analysis for WM-BAG (Fig. 2C).

In addition, GWAS was conducted on non-European individuals, but the resulting loci were not replicated, which is not unexpected, given the considerably smaller sample size (N<5091) and population stratification stemming from varying ancestral origins (Supplementary eFigure 4).

These findings suggest that the genomic loci exhibited robustness across sex, splits, and within European populations; however, their generalizability to non-European populations within the UKBB cohort is limited. Therefore, further validation of these associations is necessary through replication studies in an independent large-scale dataset.

Gene-drug-disease network highlights disease-specific drugs that bind to genes associated with GM and WM-BAG

We investigated the potential “druggable genes41” from the mapped genes by constructing a gene-drug-disease network (Method 4F). The network connects genes with drugs (or drug-like molecules) targeting specific diseases currently active at any stage of clinical trials. We revealed clinically relevant associations for 4 and 6 mapped genes associated with GM-BAG and WM-BAG, respectively. The GM-BAG genes were linked to clinical trials for treating heart disease, neurodegenerative, neuropsychiatric diseases, and respiratory diseases. On the other hand, the WM-BAG genes were primarily targeted for various cancer treatments and cardiovascular diseases (Fig. 3). To illustrate, for the GM-BAG MAPT gene, several drugs or drug-like molecules are currently being evaluated for treating AD. For example, Semorinemab (RG6100), an anti-tau IgG4 antibody, was being investigated in a phase-2 clinical trial (trial number: NCT03828747), which targets extracellular tau in AD, to reduce microglial activation and inflammatory responses42. Another drug is the LMTM (TRx0237) - a second-generation tau protein aggregation inhibitor currently being tested in a phase-3 clinical trial (trial number: NCT03446001) for treating AD and frontotemporal dementia43. Regarding WM-BAG genes, they primarily bind with drugs for treating cancer and cardiovascular diseases. For instance, the PDIA3 gene, associated with the folding and oxidation of proteins, has been targeted for developing several zinc-related FDA-approved drugs for treating cardiovascular diseases. Another example is the MAP1A gene, which encodes microtubule-associated protein 1A. This gene is linked to the development of estramustine, an FDA-approved drug for prostate cancer (Fig. 3). Detailed results are presented in Supplementary eFile 2.

Figure 3: Gene-drug-disease network of multimodal brain age gaps.

Figure 3:

The gene-drug-disease network derived from the mapped genes revealed a broad spectrum of targeted diseases and cancer, including brain cancer, cardiovascular system diseases, Alzheimer’s disease, and obstructive airway disease, among others. The thickness of the lines represented by the brain tissue-specific gene set enrichment results using the GTEx v8 dataset. We highlight several drugs under blue-colored, bold text to illustrate the main text. Abbreviation: ATC: Anatomical Therapeutic Chemical; ICD: International Classification of Diseases.

Multimodal BAG is genetically correlated with AI-derived subtypes of brain disorders

We calculated the genetic correlation (rg) using the GWAS summary statistics from 16 clinical traits to examine genetic covariance between multimodal BAG and other clinical traits (Method 4D). These traits encompassed common brain diseases and their AI-derived disease subtypes, as well as education and intelligence (Fig. 4A and Supplementary eTable 3). The AI-generated disease subtypes were established in our previous studies utilizing semi-supervised clustering methods44 and IDP from brain MRI scans. These subtypes, in essence, capture more homogeneous disease effects than the conventional “unitary” disease diagnosis, hence presenting robust endophenotypes21 for precision medicine.

Figure 4: Genetic correlation, partitioned heritability enrichment, and PRS prediction accuracy on multimodal brain age gaps.

Figure 4:

A) Genetic correlation (rg) between GM, WM, and FC-BAG and 16 clinical traits. These traits include neurodegenerative diseases (e.g., AD) and their AI-derived subtypes (e.g., AD1 and AD24), neuropsychiatric disorders (e.g., ASD) and their subtypes (ASD1, 2, and 345), intelligence, and education. B) The proportion of heritability enrichment for the 53 functional categories49. We only show the functional categories that survived the correction for multiple comparisons using the FDR method for visualization purposes. C) Cell type-specific partitioned heritability estimates. We included gene sets from Cahoy et al.56 for three main cell types (i.e., astrocyte, neuron, and oligodendrocyte). After adjusting for multiple comparisons using the FDR method, the * symbol denotes statistical significance (P-value<0.05). Error bars represent the standard error of the estimated parameters. D) Prediction accuracy of (externally validated with non-UKBB GWAS for the reference weights) PRS55 of 36 clinical traits on GM, WM, and FC-BAG. The y-axis (R2) indicates the proportions of phenotypic variation (GM, WM, and FC-BAG) that can be additionally explained by the significant PRS (i.e., the incremental R-squared). The x-axis lists the clinical traits that survive the correction for multiple comparisons using the FDR method. Abbreviation: AD: Alzheimer’s disease; ADHD: attention-deficit/hyperactivity disorder; ASD: autism spectrum disorder; BIP: bipolar disorder; MDD: major depressive disorder; OCD: obsessive-compulsive disorder; SCZ: schizophrenia; CAD: coronary artery disease; CD: Crohn’s disease; BMD: bone mineral density; PD: Parkinson’s disease; SLE: systemic lupus erythematosus; BMI: body mass index; CVD: cardiovascular disease; LDL: low-density lipoprotein cholesterol; MS: multiple sclerosis; AF: Atrial fibrillation.

Our analysis revealed significant genetic correlations between GM-BAG and AI-derived subtypes of AD (AD14), autism spectrum disorder (ASD) (ASD1 and ASD345), schizophrenia (SCZ146), and obsessive-compulsive disorder (OCD)47; WM-BAG and AD1, ASD1, SCZ1, and SCZ2; and FC-BAG and education48 and SCZ1. Detailed results for rg estimates are presented in Supplementary eTable 4.

Multimodal BAG shows specific enrichment of heritability in different functional categories and cell types

We conducted a partitioned heritability analysis49 to investigate the heritability enrichment of genetic variants related to multimodal BAG in the 53 functional categories (Method 4E). Our results revealed that GM and WM-BAG exhibited significant heritability enrichment across numerous annotated functional categories. Specifically, some categories displayed greater enrichment than others, and we have outlined some in further detail.

For GM-BAG, the regions conserved across mammals, as indicated by the label “conserved” in Fig. 4B, displayed the most notable enrichment of heritability: approximately 2.61% of SNPs were found to explain 0.43±0.07 of SNP heritability (P-value=5.80×10−8). Additionally, transcription start site (TSS)50 regions employed 1.82% of SNPs to explain 0.16±0.05 of SNP heritability (P-value=8.05×10−3). TSS initiates the transcription at the 5’ end of a gene and is typically embedded within a core promoter crucial to the transcription machinery51. The heritability enrichment of Histone H3 at lysine 4, as denoted for “H3K4me3_peaks” in Fig. 4B, and histone H3 at lysine 9 (H3K9ac)52 were also found to be large and were known to highlight active gene promoters53. For WM-BAG, 5’ untranslated regions (UTR) used 0.54% of SNPs to explain 0.09±0.03 of SNP heritability (P-value=4.24×10−3). The 5’ UTR is a crucial region of a messenger RNA located upstream of the initiation codon. It is pivotal in regulating transcript translation, with varying mechanisms in viruses, prokaryotes, and eukaryotes.

Additionally, we examined the heritability enrichment of multimodal BAG in three different cell types (Fig. 4C). WM-BAG (P-value=1.69×10−3) exhibited significant heritability enrichment in oligodendrocytes, one type of neuroglial cells. FC-BAG (P-value=1.12×10−2) showed such enrichment in astrocytes, the most prevalent glial cells in the brain. GM-BAG showed no enrichment in any of these cells. Our findings are consistent with understanding the molecular and biological characteristics of GM and WM. Oligodendrocytes are primarily responsible for forming the lipid-rich myelin structure, whereas astrocytes play a crucial role in various cerebral functions, such as brain development and homeostasis. Convincingly, a prior GWAS14 on WM-IDP also identified considerable heritability enrichment in glial cells, especially oligodendrocytes. Detailed results for the 53 functional categories and cell-specific analyses are presented in Supplementary eTable 5.

Polygenic risk scores of other diseases weakly predict multimodal BAG

We examined the predictive ability of genetic variants associated with other 36 disease traits on GM, WM, and FC-BAG using PRS prediction54. Polygenic risk scores (PRS) of 36 disease traits, externally validated using non-UKBB samples for the reference weights55, were used to calculate the incremental R-squared (R2) – the proportions of phenotypic variation that PRS can additionally explain, on top of sex, age, genetic principle components, and imaging scan positions (Method 4H). In Fig. 4D, we show the results that survived the correction for multiple comparisons using the FDR method. We illustrated here several most predictive PRS on multimodal BAG.

We found that the PRS for height additionally accounted for 0.29% (P-value=6.05×10−23) of the variance in GM-BAG, and the PRS for bone mineral density accounted for 0.19% (P-value=3.43×10−16). Combining the 36 PRS increased the proportions of variance explained to 1.05% (P-value=0.048) in GM-BAG. For WM-BAG, the PRS for hypertension additionally accounted for 0.098% (P-value=8.72×10−9) of the variance, and the PRS for osteoporosis accounted for 0.095% (P-value=1.50×10−8). All the 36 PRS together increased the proportions of variance explained to 0.58% (P-value=0.63) in WM-BAG. Finally, For FC-BAG, the PRS for ischaemic stroke additionally accounted for 0.15% (P-value=2.12×10−12) of the variance in GM-BAG, and the PRS for hypertension accounted for 0.13% (P-value=1.27×10−10). Combining the 36 PRS increased the proportions of variance explained to 0.37% (P-value=0.43) in FC-BAG. Detailed results are presented in Supplementary eTable 6. Overall, the incremental R2 of these PRS is limited in predicting multimodal BAG, likely due to the relatively small genetic correlations among these traits and the potential contribution of rare variants to the “missing heritability”. Subsequent research is necessary to establish disease subtype-specific PRS that may offer more accurate predictions for personalized brain aging.

Triglyceride, type 2 diabetes, breast cancer, and AD are causally associated with GM and WM-BAG

We investigated the potential causal effects of several risk factors (i.e., exposure variable) on multimodal BAG (i.e., outcome variable) using a bidirectional two-sample MR approach57 (Method 4G). We hypothesized that several diseases and lifestyle risk factors might contribute to the acceleration or deceleration of human brain aging.

We found putative causal effects of triglyceride-to-lipid ratio in very large very-low-density lipoprotein (VLDL)58 [risk effect for per unit increase; P-value=5.09×10−3, OR (95% CI) = 1.08 (1.02, 1.13), number of SNPs=52], type 2 diabetes59 [risk effect; P-value=1.96×10−2, OR (95% CI) = 1.05 (1.01, 1.09), number of SNPs=10], and breast cancer60 [protect effect; P-value=1.81×10−2, OR (95% CI) = 0.96 (0.93, 0.99), number of SNPs=118] on GM-BAG (i.e., accelerated brain age). We also identified causal effects of AD61 [risk effect for per unit increase; P-value=7.18×10−5, OR (95% CI) = 1.04 (1.02, 1.05), number of SNPs=13] on WM-BAG (Fig. 5A). We subsequently examined the potential inverse causal effects of multimodal BAG (i.e., exposure) on these risk factors (i.e., outcome). However, owing to the restricted power [number of instrumental variables (IV) < 6], we did not observe any significant signals (Supplementary eFigure 5 and Supplementary eFile 3).

Figure 5: Causal inference of multimodal brain age gaps.

Figure 5:

A) Causal inference was performed using a two-sample Mendelian Randomization (MR, Method 4G) approach for seven selected exposure variables on three outcome variables (i.e., GM, WM, and FC-BAG). The symbol * denotes statistical significance after correcting for multiple comparisons using the FDR method (N=7); the symbol # denotes the tests passing the nominal significance threshold (P-value=0.05) but did not survive the multiple comparisons. The odds ratio (OR) and the 95% confidence interval (CI) are presented. B) Leave-one-out analysis of the triglyceride-to-lipid ratio on GM-BAG. Each row represents the MR effect (log OR) and the 95% CI by excluding that SNP from the analysis. The red line depicts the IVW estimator using all SNPs. C) Forest plot for the single-SNP MR results. Each line represents the MR effect (log OR) for the triglyceride-to-lipid ratio on GM-BAG using only one SNP; the red line shows the MR effect using all SNPs together. D) Scatter plot for the MR effect sizes of the SNP-triglyceride-to-lipid ratio association (x-axis, SD units) and the SNP-GM-BAG associations (y-axis, log OR) with standard error bars. The slopes of the purple and green lines correspond to the causal effect sizes estimated by the IVW and the MR Egger estimator, respectively. We annotated a potential outlier. E) Funnel plot for the relationship between the causal effect of the triglyceride-to-lipid ratio on GM-BAG. Each dot represents MR effect sizes estimated using each SNP as a separate instrument against the inverse of the standard error of the causal estimate. The vertical red line shows the MR estimates using all SNPs. We annotated a potential outlier. Abbreviation: AD: Alzheimer’s disease; AST: aspartate aminotransferase; BMI: body mass index; VLDL: very low-density lipoprotein; CI: confidence interval; OR: odds ratio; SD: standard deviation; SE: standard error.

We performed sensitivity analyses to investigate potential violations of the three IV assumptions (Method 4G). To illustrate this, we showed the sensitivity analysis results for the causal effect of the triglyceride-to-lipid in VLDL ratio on GM-BAG (Fig. 5BE). In a leave-one-out analysis, we found that no single SNP overwhelmingly drove the overall effect (Fig. 5B). There was evidence for the presence of minor heterogeneity62 of the causal effect amongst SNPs (Cochran’s Q value=76.06, P-value=5.09×10−3). Some SNPs exerted opposite causal effects compared to the model using all SNPs (Fig. 5C). The scatter plot (Fig. 5D) indicated one obvious SNP outlier (rs11591147), and the funnel plot showed little asymmetry with only an outlier denoted in Fig. 5E (rs4507142). Finally, the MR Egger estimator allows for pleiotropic effects independent of the effect on the exposure of interest (i.e., the InSIDE assumption63). Our results from the Egger estimator showed a small positive intercept (5.21×10−3±2.87×10−3, P-value=0.07) and a lower OR [inverse-variance weighted (IVW): 1.08 (1.02, 1.13) vs. Egger: 1.01 (0.93, 1.10)], which may indicate the presence of directional horizontal pleiotropy for some SNPs. We present sensitivity analyses for other significant exposure variables in Supplementary eFigure 6.

To investigate the potential directional pleiotropic effects, we re-analyzed the MR Egger regression by excluding the two outliers identified in Fig. 5D (rs11591147) and E (rs4507142), which led to a slightly increased OR [1.04 (0.96, 1.12)] and a smaller positive intercept (4.41×10−3±2.65×10−3, P-value=0.09). Our findings support that these two outlier SNPs may have a directional pleiotropic effect on GM-BAG. Nevertheless, given the complex nature of brain aging, many other biological pathways may also contribute to human brain aging. For instance, the SNP (rs11591147) was largely associated with other blood lipids, such as LDL cholesterol64, and heart diseases, such as coronary artery disease65. Based on our sensitivity analysis and the consistency of the results obtained from all five MR methods (Supplementary eFile 3), we showed evidence for a putative causal effect of the triglyceride-to-lipid ratio in VLDL on GM-BAG.

Discussion

The current study used multimodal MRI and AI to elucidate the genetic architecture of multimodal BAG, shedding light on the genome-wide influence of common genetic variants on human brain aging. Our results unveil the heterogeneity of this intricate biological process and its multifaceted interplay – via associations and causality – with other health conditions and clinical traits. Quantifying multimodal human brain age and deciphering its heterogeneity may aid in patient management, genetically guided drug development, and potential lifestyle changes and interventions.

Genetic architecture of GM-BAG

Our genetic results from GM-BAG substantiate that many diseases, conditions, and clinical phenotypes share genetic underpinnings with brain age, perhaps driven by macrostructural changes in GM (e.g., brain atrophy). The locus with the highest degree of polygenicity (the top lead SNP rs534114641 at 17q21.31) showed substantial association with the traits mentioned above and was mapped to numerous genes associated with various diseases (Fig. 2C). Several previous GM-BAG GWAS9,63,64 also identified this locus. Among these genes, the MAPT gene, known to encode a protein called tau, is a prominent hallmark in AD and is implicated in approximately 30 tauopathies, including progressive supranuclear palsy and frontotemporal lobar degeneration66. Our gene-drug-disease network also showed several drugs, such as Semorinemab42, in active clinical trials currently targeting treatment for AD (Fig. 3). The heritability enrichment of GM-BAG was high in several functional categories, with conserved regions being the most prominent. The observed higher heritability enrichment in conserved regions compared to coding regions67 supports the long-standing hypothesis regarding the functional significance of conserved sequences. However, the precise role of many highly conserved non-coding DNA sequences remains unclear68. The genetic correlation results of GM-BAG highlight the promise for the AI-derived subtypes, rather than the “one-for-all” unitary disease diagnosis, as robust endophenotypes21. These findings strongly support the clinical implications of re-evaluating pertinent hypotheses using the AI-derived subtypes in patient stratification and personalized medicine.

The elevated triglyceride-to-lipid ratio in VLDL, an established biomarker for cardiovascular diseases69, is causally associated with higher GM-BAG (accelerated brain age). Therefore, lifestyle interventions that target this biomarker might hold promise as an effective strategy to enhance overall brain health. In addition, we revealed that one unit-increased likelihood of type 2 diabetes has a causal effect on GM-BAG increase. Research has shown that normal brain aging is accelerated by approximately 26% in patients with progressive type 2 diabetes compared with healthy controls70. The protective causal effect of breast cancer on GM-BAG is intriguing in light of existing literature adversely linking breast cancer to brain metastasis71 and chemotherapy-induced cognitive impairments, commonly known as “chemo brain”. In addition, MR was proven to be sensitive to population selection bias72.

Genetic architecture of WM-BAG

The genetic architecture of WM-BAG exhibits strong correlations with cancer-related traits, AD, and physical measures such as BMI, among others. Our phenome-wide association query largely confirms the enrichment of these traits in previous literature. In particular, the DNAJC1 gene, annotated from the most polygenic locus on chromosome 10 (top lead SNP: rs564819152), encodes a protein called heat shock protein 40 (Hsp40) and plays a role in protein folding and the response to cellular stress. This gene is implicated in various cancer types, such as breast, renal, and melanoma (Supplementary eFigure 7). In addition, several FDA-approved drug drugs have been developed based on these WM-BAG genes for different types of cancer in our gene-drug-disease network (Fig. 3). Our findings provide novel insights into the genetic underpinnings of WM-BAG and their potential relevance to cancer.

Remarkably, one unit-increased likelihood of AD was causally associated with increased WM-BAG. Our Mendelian randomization analysis confirmed the abundant association evidenced by the phenome-wide association query (Fig. 2B). Dementia, such as AD, is undeniably a significant factor contributing to the decline of the aging brain. Evidence suggests that AD is not solely a GM disease; significant microstructural changes can be observed in WM before the onset of cognitive decline73. We also identified a nominal causal significance of BMI [risk effect; P-value=4.73×10−2, OR (95% CI) = 1.03 (1.00, 1.07)] on WM-BAG. These findings underscore the potential of lifestyle interventions and medications currently being tested in clinical trials for AD to improve overall brain health.

Genetic architecture of FC-BAG

The genetic signals for FC-BAG were weaker than those observed for GM and WM-BAG, which is consistent with the age prediction performance and partially corroborates Cheverud’s conjecture: using phenotypic correlations as proxies for genetic correlations. A novel genomic locus on chromosome 6 (6q.13) harbors an independent variant (rs1204329) previously linked to insomnia74. The top lead SNP, rs5877290, associated with this locus is a novel deletion-insertion mutation type: no known association with any human disease or gene mapping has been established for this SNP. The genetic basis of FC-BAG covaries with educational performance and schizophrenia subtypes. Specifically, parental education has been linked to cognitive ability, and researchers have identified a functional connectivity biomarker between the right rostral prefrontal cortex and occipital cortex that mediates the transmission of maternal education to offspring’s performance IQ75. On the other hand, schizophrenia is a highly heritable mental disorder that exhibits functional dysconnectivity throughout the brain76. AD was causally associated with FC-BAG with nominal significance [risk effect for per unit increase; P-value=4.43×10−2, OR (95% CI) = 1.02 (1.00, 1.03), number of SNPs=13] (Fig. 5A). The relationship between functional brain networks and the characteristic distribution of amyloid-β and tau in AD77 provides evidence that AD is a significant factor in the aging brain, underscoring its role as a primary causative agent.

The comparative trend of genetic heritability among GM, WM, and FC-BAG is also consistent with previous large-scale GWAS of multimodal brain IDP. Zhao et al. performed GWAS on GM13, WM14, and FC-IDP18, showing that FC-IDP is less genetically heritable than others. Similar observations were also demonstrated by Elliot et al.11 in the first large-scale GWAS using multimodal IDP from UKBB. The weaker genetic signal observed in FC-BAG can be attributed to many factors. One of the main reasons is the higher signal-to-noise ratio in FC measurements due to the dynamic and complex nature of brain activity, which can make it difficult to accurately measure and distinguish between the true signal and noise. Social-environmental and lifestyle factors can also contribute to the “missing heritability” observed in FC-BAG. For example, stress, sleep patterns, physical activity, and other environmental factors can impact brain function and connectivity78. In contrast, GM and WM measurements are more stable and less influenced by environmental factors, which may explain why they exhibit stronger genetic signals and higher heritability estimates.

Our multimodal approach provides evidence that the aging process of the human brain is a complex biological phenomenon intertwined with several organs and chronic diseases. This affirms a recent study that establishes a link between heterogeneous aging across various organ systems and the prediction of chronic diseases and mortality at the phenotypic level8. In summary, this study comprehensively depicts the genetic architecture and heterogeneity of multimodal human brain age, opening new avenues for drug repurposing/repositioning and identifying modifiable protective and risk factors that can ameliorate human brain health.

Methods

Method 1: Study populations

UKBB is a population-based study of more than 50,000 people recruited between 2006 and 2010 from Great Britain. The current study focused on participants from the imaging-genomics population who underwent an MRI scan and genome sequencing (genotype array data and the imputed genotype data). The UKBB study has ethical approval, and the ethics committee is detailed here: https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/governance/ethics-advisory-committee. The study design, phenotype and genetic data availability, and quality check have been published and detailed elsewhere23. The current work was performed under application number 35148. Table 1 shows the study characteristics of the present work.

Table 1.

Study characteristics.

Population (overlap) T1w MRI dMRI rsfMRI Age (year)* Sex /female*

Total (35,261) 36,304 39,661 36,858 63.64 (45.00, 81.00) 18,700/53%
Training/validation/test (4000) 4000 4000 4000 63.47 (46.00, 81.00) 2000/50%
Independent test (31,261) 32,304 35,661 32,858 63.66 (45.00, 81.00) 16,700/53%
GWAS 31,557 31,749 32,017 NA NA

The current table presents participants of all ancestries for the age prediction task. We constrained participants with only European ancestry for downstream genetic analyses.

*

For age and sex, we reported statistics for the overlapping population of the three modalities: 35,261 participants for the entire population, 4000 participants for the training/validation/test dataset, and 31,261 participants for the independent test dataset. We also showed the number of participants for the GM, WM, and FC-BAG GWAS. In total, our analyses included 42,089 unique participants that had at least one image scan. Abbreviation: dMRI: diffusion MRI; rsfMRI: resting-state functional MRI; T1w MRI: T1-weighted MRI.

To train the machine learning model and compare the performance of the multimodal BAG, we defined the following two datasets:

  • Training/validation/test dataset: To objectively compare the age prediction performance of different MRI modalities and machine learning models, we randomly sub-sampled 500 (250 females) participants within each decade’s range from 44 to 84 years old, resulting in the same 4000 participants (excluding all major diseased participants, such as AD, MDD, etc.) for GM, WM, and FC-IDP. This dataset was used to train machine learning models. In addition, we ensured that the training/validation/test splits were the same in the CV procedure.

  • Independent test dataset: The rest of the population for each MRI modality was set as independent test datasets – unseen until we finalized the training procedure79.

The GM-IDP includes 119 GM regional volumes from the MUSE atlas, consolidated by the iSTAGING consortium. We studied the influence of different WM-IDP features: i) 48 FA values; ii) 109 TBSS-based80 values from FA, MD, ODI, and NDI; iii) 192 skeleton-based mean values from FA, MD, ODI, and NDI. For FC-IDP, 210 ICA-derived functional connectivity components were included. The WM and FC-IDP were downloaded from UKBB (Method 3B).

Method 2: Multimodal brain age prediction using machine learning models

GM, WM, and FC-IDP (details of image processing are presented in Method 3) were fit into four machine learning models (linear and non-linear) to predict brain age. Specifically, we used SVR, LASSO regression, MLP, and a five-layer neural network (NN: three linear layers and one ReLu layer).

To objectively and reproducibly compare the age prediction performance using different machine learning models and MRI modalities, we adopted a nested CV procedure and included an independent test dataset26. Specifically, the outer loop CV was performed for 100 repeated random splits: 80% of the data were used for training. The remaining 20% was used for validation/testing in the inner loop with a 10-fold CV. In addition, we concealed an independent test dataset – unseen for testing until we finished fine-tuning the machine learning models79 (e.g., hyperparameters for SVR and neural networks). To compare the results of different models and modalities, we showed MAE’s mean and empirical standard deviation instead of performing any statistical test (e.g., a two-sample t-test). This is because no unbiased variance estimate exists for complex CV procedures (refer to notes from Nadeau and Benjio81).

Method 3: Image processing

(A): T1-weighted MRI processing:

The imaging quality check is detailed in Supplementary eMethod 2. All images were first corrected for magnetic field intensity inhomogeneity.82 A deep learning-based skull stripping algorithm was applied to remove extra-cranial material. In total, 145 IDPs were generated in gray matter (GM, 119 ROIs), white matter (WM, 20 ROIs), and ventricles (6 ROIs) using a multi atlas label fusion method.83 The 119 GM ROIs were fit to the four machine learning models to derive the GM-BAG.

(B): Diffusion MRI processing:

UKBB has processed diffusion MRI (dMRI) data and released several WM tract-based metrics for the Diffusion Tensor Imaging (DTI) model (single-shell dMRI) and Neurite Orientation Dispersion and Density Imaging (NODDI84) model (multi-shell dMRI). The Eddy85 tool corrected raw images for eddy currents, head motion, and outlier slices. The mean values of FA, MD, ODI, and NDI were extracted from the 48 WM tracts of the “ICBM-DTI-81 white-matter labels” atlas86, resulting in 192 WM-IDP (category code:134). In addition, a tract-skeleton (TBSS)80 and probabilistic tractography analysis87 were employed to derive weighted-mean measures within the 27 major WM tracts, referred to as the 108 TBSS WM-IDP (category code: 135). Finally, to overcome the overfitting problem due to the high correlations between FA, MD, ODI, and NDI within the same WM tract, we used only the 48 FA WM-IDP to fit the models for brain age prediction.

(C): Resting-state functional MRI processing:

For FC-IDP, we used the 21 × 21 resting-state functional connectivity (full correlation) matrices (data-field code: 25750) from UKBB88,89. UKBB processed rsfMRI data and released 25 whole-brain spatial independent component analysis (ICA)-derived components90; four components were removed due to artifactual components. This resulted in 210 FC-IDP quantifying pairwise correlations of the ICA-derived components. Details of dMRI and rsfMRI processing are documented here: https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf.

Method 4: Genetic analyses

Imputed genotype data were quality-checked for downstream analyses. Our quality check pipeline (see below) resulted in 33,541 European ancestry participants and 8,469,833 SNPs. After merging with the multimodal MRI populations, we included 31,557 European participants for GM-BAG, 31,749 participants for WM-BAG, and 32,017 participants for FC-BAG GWAS. Details of the protocol are described elsewhere15,91.

(A): Genome-wide association analysis:

For GWAS, we ran a linear regression using Plink92 for GM, WM, and FC-BAG, controlling for confounders of age, dataset status (training/validation/test or independent test dataset), age × squared, sex, age × sex interaction, age-squared × sex interaction, total intracranial volume, the brain position in the scanner (lateral, transverse, and longitudinal), and the first 40 genetic principal components. We adopted the genome-wide P-value threshold (5 × 10−8) and annotated independent genetic signals considering linkage disequilibrium (see below). We then estimated the SNP-based heritability using GCTA40 with the same covariates in GWAS.

To check the robustness of our GWAS results, we also performed the i) sex-stratified GWAS for males and females, ii) the split-sample GWAS by randomly dividing the entire population into two sex, age-matched splits, and iii) non-European GWAS.

(B): Annotation of genomic loci and genes:

The annotation of genomic loci and mapped genes was performed via FUMA (https://fuma.ctglab.nl/, version: v1.3.8). For the annotation of genomic loci, we first defined lead SNPs (correlation r2 ≤ 0.1, distance < 250 kilobases) and assigned them to a genomic locus (non-overlapping); the lead SNP with the lowest P-value (i.e., the top lead SNP) was used to represent the genomic locus. For gene mappings, three different strategies were considered. First, positional mapping assigns the SNP to its physically nearby genes (a 10 kb window by default). Second, eQTL mapping annotates SNPs to genes based on eQTL associations. Finally, chromatin interaction mapping annotates SNPs to genes when there is a significant chromatin interaction between the disease-associated regions and nearby or distant genes.93 The definition of top lead SNP, lead SNP, independent significant SNP, and candidate SNP can be found in Supplementary eMethod 1.

(C): Phenome-wide association query for genomic loci associated with other traits in the literature:

We queried the significant independent SNPs within each locus in the GWAS Catalog to determine their previously identified associations with other traits. For these associated traits, we further mapped them into several high-level categories for visualization purposes (Fig. 2B).

(D): Genetic correlation:

We used LDSC94 to estimate the pairwise genetic correlation (rg) between GM, WM, and FC-BAG and several pre-selected traits (Supplementary eTable 3) by using the precomputed LD scores from the 1000 Genomes of European ancestry. The following pre-selected traits were included: Alzheimer’s disease (AD), autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), obsessive-compulsive disorder (OCD), major depressive disorder (MDD), bipolar disorder (BIP), schizophrenia (SCZ), education and intelligence, as well as the AI-derived subtypes for AD (AD1 and AD295), ASD (ASD1, ASD2, and ASD345), and SCZ (SCZ1 and SCZ296) – serving as more robust endophenotypes than the disease diagnoses themselves. To ensure the suitability of the GWAS summary statistics, we first checked that the selected study’s population was European ancestry; we then guaranteed a moderate SNP-based heritability h2 estimate and excluded the studies with spurious low h2 (<0.05). Notably, LDSC corrects for sample overlap and provides an unbiased estimate of genetic correlation97.

(E): Partitioned heritability estimate:

Partitioned heritability analysis estimates the percentage of heritability enrichment explained by annotated genome regions49. First, the partitioned heritability was calculated for 53 main functional categories. The 53 functional categories are not specific to any cell type, including coding, UTR, promoter and intronic regions, etc. Details of the 53 categories are described elsewhere49. Subsequently, cell type-specific partitioned heritability was estimated using gene sets from Cahoy et al.56 for three main cell types (i.e., astrocyte, neuron, and oligodendrocyte).

(F): Gene-drug-disease network construction:

We curated data from the Drug Bank database (v.5.1.9)98 and the Therapeutic Target Database (updated by September 29th, 2021) to construct a gene-drug-disease network. Specifically, we constrained the target to human organisms and included all drugs with active statuses (e.g., patented, approved, etc.) but excluded inactive ones (e.g., terminated or discontinued at any phase). To represent the disease, we mapped the identified drugs to the Anatomical Therapeutic Chemical (ATC) classification system for the Drugbank database and the International Classification of Diseases (ICD-11) for the Therapeutic Target Database.

(G): Two-sample Mendelian Randomization:

We investigated whether the clinical traits previously associated with our genomic loci (Fig. 2B) were a cause or a consequence of GM, WM, and FC-BAG using a bidirectional, two-sample MR approach. GM, WM, and FC-BAG are the outcome/exposure variables in the forward/inverse MR, respectively. We applied five different MR methods using the TwoSampleMR R package57, and reported the results of IVW in the main text and the four others (e.g., Egger estimator) in the supplementary.

We selected exposure variables from different categories based on our phenome-wide association query. These exposure variables include neurodegenerative diseases (e.g., AD), biomarkers from the liver (e.g., AST), cardiovascular diseases (e.g., the triglyceride-to-lipid ratio in VLDL), and lifestyle risk factors (e.g., BMI). We also manually checked these papers and excluded studies that used the UKBB data in the discovery samples, resulting in 7 exposure variables. The characteristics of selected studies for the IVs are presented in Supplementary eTable 7.

We performed several sensitivity analyses. First, a heterogeneity test was performed to check for violating the IV assumptions. Horizontal pleiotropy was estimated to navigate the violation of the IV’s exclusivity assumption62 using a funnel plot, single-SNP MR approaches, and MR Egger estimator99. Moreover, the leave-one-out analysis excluded one instrument (SNP) at a time and assessed the sensitivity of the results to individual SNP.

(H): PRS prediction:

We calculated the incremental R-squared (R2) using each PRS of the 36 clinical traits55 to predict GM, WM, and FC-BAG. A null model was established via a linear regression model, incorporating independent variables such as age, dataset status (training/validation/test or independent test dataset), sex, total intracranial volume, brain position during scanning (lateral, transverse, and longitudinal), and the first 40 genetic principal components. The alternative model was then constructed by introducing each PRS as an extra independent variable. Furthermore, an alternative model was also created, which involved all 36 PRS variables in evaluating their combined effect in predicting GM, WM, and FC-BAG.

Supplementary Material

Supplement 1
media-1.docx (3.2MB, docx)

Acknowledgments

The iSTAGING consortium is a multi-institutional effort funded by NIA by RF1 AG054409. This research has been conducted using the UK Biobank Resource under Application Number 35148.

Footnotes

Competing Interests

None

Code Availability

The software and resources used in this study are all publicly available:

Data Availability

The GWAS summary statistics corresponding to this study are publicly available on the MEDICINE knowledge portal (https://labs.loni.usc.edu/medicine), the FUMA online platform (https://fuma.ctglab.nl/), the GWAS Catalog platform (https://www.ebi.ac.uk/gwas/home), and the IEU GWAS database (https://gwas.mrcieu.ac.uk/).

References

  • 1.Rajpurkar P., Chen E., Banerjee O. & Topol E. J. AI in health and medicine. Nat Med 28, 31–38 (2022). [DOI] [PubMed] [Google Scholar]
  • 2.Hassabis D., Kumaran D., Summerfield C. & Botvinick M. Neuroscience-Inspired Artificial Intelligence. Neuron 95, 245–258 (2017). [DOI] [PubMed] [Google Scholar]
  • 3.Lee J. et al. Deep learning-based brain age prediction in normal aging and dementia. Nat Aging 2, 412–424 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Wen J. et al. Genetic, clinical underpinnings of subtle early brain change along Alzheimer’s dimensions. 2022.09.16.508329 Preprint at 10.1101/2022.09.16.508329 (2022). [DOI]
  • 5.Hollon T. et al. Artificial-intelligence-based molecular classification of diffuse gliomas using rapid, label-free optical imaging. Nat Med 1–5 (2023) doi: 10.1038/s41591-023-02252-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Cole J. H., Marioni R. E., Harris S. E. & Deary I. J. Brain age and other bodily ‘ages’: implications for neuropsychiatry. Mol Psychiatry 24, 266–281 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Jones D. T., Lee J. & Topol E. J. Digitising brain age. The Lancet 400, 988 (2022). [DOI] [PubMed] [Google Scholar]
  • 8.Tian Y. E. et al. Heterogeneous aging across multiple organ systems and prediction of chronic disease and mortality. Nat Med 1–11 (2023) doi: 10.1038/s41591-023-02296-6. [DOI] [PubMed] [Google Scholar]
  • 9.Ning K. et al. Improving brain age estimates with deep learning leads to identification of novel genetic factors associated with brain aging. Neurobiology of Aging 105, 199–204 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Shen L. & Thompson P. M. Brain Imaging Genomics: Integrated Analysis and Machine Learning. Proceedings of the IEEE 108, 125–162 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Elliott L. T. et al. Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature 562, 210–216 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Smith S. M. et al. An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nat Neurosci 24, 737–745 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Zhao B. et al. Genome-wide association analysis of 19,629 individuals identifies variants influencing regional brain volumes and refines their genetic co-architecture with cognitive and mental health traits. Nat Genet 51, 1637–1644 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Zhao B. et al. Common genetic variation influencing human white matter microstructure. Science 372, (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Wen J. et al. Novel genomic loci and pathways influence patterns of structural covariance in the human brain. 2022.07.20.22277727 Preprint at 10.1101/2022.07.20.22277727 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Grasby K. L. et al. The genetic architecture of the human cerebral cortex. Science 367, eaay6690 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Brouwer R. M. et al. Genetic variants associated with longitudinal changes in brain structure across the lifespan. Nat Neurosci 25, 421–432 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Zhao B. et al. Common variants contribute to intrinsic human brain functional networks. Nat Genet 54, 508–517 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Smith S. M. et al. Brain aging comprises many modes of structural and functional change with distinct genetic and biophysical associations. eLife 9, e52677. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Kaufmann T. et al. Common brain disorders are associated with heritable patterns of apparent aging of the brain. Nat Neurosci 22, 1617–1623 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Kendler K. & Neale M. Endophenotype: a conceptual analysis. Mol Psychiatry 15, 789–797 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Reay W. R. & Cairns M. J. Advancing the use of genome-wide association studies for drug repurposing. Nat Rev Genet 22, 658–671 (2021). [DOI] [PubMed] [Google Scholar]
  • 23.Bycroft C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Emdin C. A., Khera A. V. & Kathiresan S. Mendelian Randomization. JAMA 318, 1925–1926 (2017). [DOI] [PubMed] [Google Scholar]
  • 25.Varoquaux G. et al. Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines. NeuroImage 145, 166–179 (2017). [DOI] [PubMed] [Google Scholar]
  • 26.Samper-González J. et al. Reproducible evaluation of classification methods in Alzheimer’s disease: Framework and application to MRI and PET data. NeuroImage 183, 504–521 (2018). [DOI] [PubMed] [Google Scholar]
  • 27.Pedregosa F. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011). [Google Scholar]
  • 28.More S. et al. Brain-age prediction: A systematic comparison of machine learning workflows. NeuroImage 270, 119947 (2023). [DOI] [PubMed] [Google Scholar]
  • 29.de Lange A.-M. G. & Cole J. H. Commentary: Correction procedures in brain-age prediction. Neuroimage Clin 26, 102229 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Jun G. et al. A novel Alzheimer disease locus located near the gene encoding tau protein. Mol Psychiatry 21, 108–117 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Nagel M. et al. Meta-analysis of genome-wide association studies for neuroticism in 449,484 individuals identifies novel genetic loci and pathways. Nat Genet 50, 920–927 (2018). [DOI] [PubMed] [Google Scholar]
  • 32.Desikan R. S. et al. Genetic overlap between Alzheimer’s disease and Parkinson’s disease at the MAPT locus. Mol Psychiatry 20, 1588–1595 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Hamza T. H. et al. Common genetic variation in the HLA region is associated with late-onset sporadic Parkinson’s disease. Nat Genet 42, 781–785 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Kuchenbaecker K. B. et al. Identification of six new susceptibility loci for invasive epithelial ovarian cancer. Nat Genet 47, 164–171 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.de la Fuente J., Davies G., Grotzinger A. D., Tucker-Drob E. M. & Deary I. J. A general dimension of genetic sharing across diverse cognitive traits inferred from molecular data. Nat Hum Behav 5, 49–58 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Georges A. et al. Genetic investigation of fibromuscular dysplasia identifies risk loci and shared genetics with common cardiovascular diseases. Nat Commun 12, 6031 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Nagel M., Speed D., van der Sluis S. & Østergaard S. D. Genome-wide association study of the sensitivity to environmental stress and adversity neuroticism cluster. Acta Psychiatrica Scandinavica 141, 476–478 (2020). [DOI] [PubMed] [Google Scholar]
  • 38.Russell G. & Lightman S. The human stress response. Nat Rev Endocrinol 15, 525–534 (2019). [DOI] [PubMed] [Google Scholar]
  • 39.Buniello A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 47, D1005–D1012 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Yang J., Lee S. H., Goddard M. E. & Visscher P. M. GCTA: A Tool for Genome-wide Complex Trait Analysis. Am J Hum Genet 88, 76–82 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Hopkins A. L. & Groom C. R. The druggable genome. Nat Rev Drug Discov 1, 727–730 (2002). [DOI] [PubMed] [Google Scholar]
  • 42.Antibody-Mediated Targeting of Tau In Vivo Does Not Require Effector Function and Microglial Engagement - PubMed. https://pubmed.ncbi.nlm.nih.gov/27475227/. [DOI] [PubMed]
  • 43.Wilcock G. K. et al. Potential of Low Dose Leuco-Methylthioninium Bis(Hydromethanesulphonate) (LMTM) Monotherapy for Treatment of Mild Alzheimer’s Disease: Cohort Analysis as Modified Primary Outcome in a Phase III Clinical Trial. J Alzheimers Dis 61, 435–457. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Wen J. et al. Subtyping brain diseases from imaging data. Preprint at 10.48550/arXiv.2202.10945 (2022). [DOI] [PubMed] [Google Scholar]
  • 45.Hwang G. et al. Assessment of Neuroanatomical Endophenotypes of Autism Spectrum Disorder and Association With Characteristics of Individuals With Schizophrenia and the General Population. JAMA Psychiatry (2023) doi: 10.1001/jamapsychiatry.2023.0409. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Chand G. B. et al. Two distinct neuroanatomical subtypes of schizophrenia revealed using machine learning. Brain 143, 1027–1038 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.International Obsessive Compulsive Disorder Foundation Genetics Collaborative (IOCDF-GC) and OCD Collaborative Genetics Association Studies (OCGAS). Revealing the complex genetic architecture of obsessive-compulsive disorder using meta-analysis. Mol Psychiatry 23, 1181–1188 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Rietveld C. A. et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–1471 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Finucane H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 47, 1228–1235 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Hoffman M. M. et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res 41, 827–841 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Haberle V. & Stark A. Eukaryotic core promoters and the functional basis of transcription initiation. Nat Rev Mol Cell Biol 19, 621–637 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Trynka G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat Genet 45, 124–130 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Barski A. et al. High-Resolution Profiling of Histone Methylations in the Human Genome. Cell 129, 823–837 (2007). [DOI] [PubMed] [Google Scholar]
  • 54.International Schizophrenia Consortium et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Thompson D. J. et al. UK Biobank release and systematic evaluation of optimised polygenic risk scores for 53 diseases and quantitative traits. 2022.06.16.22276246 Preprint at 10.1101/2022.06.16.22276246 (2022). [DOI] [Google Scholar]
  • 56.Cahoy J. D. et al. A Transcriptome Database for Astrocytes, Neurons, and Oligodendrocytes: A New Resource for Understanding Brain Development and Function. J. Neurosci. 28, 264–278 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Hemani G. et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife 7, e34408 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Borges M. C. et al. Circulating Fatty Acids and Risk of Coronary Heart Disease and Stroke: Individual Participant Data Meta-Analysis in Up to 16 126 Participants. J Am Heart Assoc 9, e013131 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Morris A. P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet 44, 981–990 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.K, M. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Lambert J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat Genet 45, 1452–1458 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Bowden J. et al. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat Med 36, 1783–1802 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Burgess S. & Thompson S. G. Interpreting findings from Mendelian randomization using the MR-Egger method. Eur J Epidemiol 32, 377–389 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Klarin D. et al. Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program. Nat Genet 50, 1514–1523 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Nelson C. P. et al. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat Genet 49, 1385–1391 (2017). [DOI] [PubMed] [Google Scholar]
  • 66.Horie K. et al. CSF tau microtubule-binding region identifies pathological changes in primary tauopathies. Nat Med 28, 2547–2554 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Gusev A. et al. Partitioning Heritability of Regulatory and Cell-Type-Specific Variants across 11 Common Diseases. The American Journal of Human Genetics 95, 535–552 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Stamatoyannopoulos J. A. What does our genome encode? Genome Res 22, 1602–1611 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Nordestgaard B. G. & Varbo A. Triglycerides and cardiovascular disease. Lancet 384, 626–635 (2014). [DOI] [PubMed] [Google Scholar]
  • 70.Antal B. et al. Type 2 diabetes mellitus accelerates brain aging and cognitive decline: Complementary findings from UK Biobank and meta-analyses. Elife 11, e73138 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Wu A. M. L. et al. Aging and CNS Myeloid Cell Depletion Attenuate Breast Cancer Brain Metastasis. Clinical Cancer Research 27, 4422–4434 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Gkatzionis A. & Burgess S. Contextualizing selection bias in Mendelian randomization: how bad is it likely to be? International Journal of Epidemiology 48, 691–701 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Sachdev P. S., Zhuang L., Braidy N. & Wen W. Is Alzheimer’s a disease of the white matter? Curr Opin Psychiatry 26, 244–251 (2013). [DOI] [PubMed] [Google Scholar]
  • 74.Watanabe K. et al. Genome-wide meta-analysis of insomnia prioritizes genes associated with metabolic and psychiatric pathways. Nat Genet 54, 1125–1132 (2022). [DOI] [PubMed] [Google Scholar]
  • 75.Cermakova P. et al. Parental education, cognition and functional connectivity of the salience network. Sci Rep 13, 2761 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Cao H., Zhou H. & Cannon T. D. Functional connectome-wide associations of schizophrenia polygenic risk. Mol Psychiatry 26, 2553–2561 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Yu M., Sporns O. & Saykin A. J. The human connectome in Alzheimer disease — relationship to biomarkers and genetics. Nat Rev Neurol 17, 545–563 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Tost H., Champagne F. A. & Meyer-Lindenberg A. Environmental influence in the brain, human welfare and mental health. Nat Neurosci 18, 1421–1431 (2015). [DOI] [PubMed] [Google Scholar]
  • 79.Wen J. et al. Convolutional neural networks for classification of Alzheimer’s disease: Overview and reproducible evaluation. Medical Image Analysis 63, 101694 (2020). [DOI] [PubMed] [Google Scholar]
  • 80.Smith S. M. et al. Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data. Neuroimage 31, 1487–1505 (2006). [DOI] [PubMed] [Google Scholar]
  • 81.Inference for the Generalization Error | SpringerLink. 10.1023/A:1024068626366. [DOI] [Google Scholar]
  • 82.Tustison N. J. et al. N4ITK: improved N3 bias correction. IEEE Trans. Med. Imaging 29, 1310–1320 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Doshi J. et al. MUSE: MUlti-atlas region Segmentation utilizing Ensembles of registration algorithms and parameters, and locally optimal atlas selection. Neuroimage 127, 186–195 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Zhang H., Schneider T., Wheeler-Kingshott C. A. & Alexander D. C. NODDI: Practical in vivo neurite orientation dispersion and density imaging of the human brain. NeuroImage 61, 1000–1016 (2012). [DOI] [PubMed] [Google Scholar]
  • 85.Smith S. M. et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23 Suppl 1, S208–19 (2004). [DOI] [PubMed] [Google Scholar]
  • 86.Mori S., Wakana S., Nagae-Poetscher L. & van Zijl P. MRI Atlas of Human White Matter. (Elsevier, 2005). [DOI] [PubMed] [Google Scholar]
  • 87.Wakana S. et al. Reproducibility of quantitative tractography methods applied to cerebral white matter. NeuroImage 36, 630–644 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Alfaro-Almagro F. et al. Image processing and Quality Control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage 166, (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Miller K. L. et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature Neuroscience 19, 1523–1536 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Beckmann C. F. & Smith S. M. Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Transactions on Medical Imaging 23, 137–152 (2004). [DOI] [PubMed] [Google Scholar]
  • 91.Wen J. et al. Characterizing Heterogeneity in Neuroimaging, Cognition, Clinical Symptoms, and Genetics Among Patients With Late-Life Depression. JAMA Psychiatry (2022) doi: 10.1001/jamapsychiatry.2022.0020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Purcell S. et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet 81, 559–575 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Watanabe K., Taskesen E., van Bochoven A. & Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun 8, 1826 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Bulik-Sullivan B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291–295 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Wen J. et al. Genetic, clinical underpinnings of subtle early brain change along Alzheimer’s dimensions. 2022.09.16.508329 Preprint at 10.1101/2022.09.16.508329 (2022). [DOI] [Google Scholar]
  • 96.Chand G. B. et al. Two distinct neuroanatomical subtypes of schizophrenia revealed using machine learning. Brain 143, 1027–1038 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Bulik-Sullivan B. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236–1241 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Wishart D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 46, D1074–D1082 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Bowden J., Davey Smith G. & Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol 44, 512–525 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplement 1
media-1.docx (3.2MB, docx)

Data Availability Statement

The GWAS summary statistics corresponding to this study are publicly available on the MEDICINE knowledge portal (https://labs.loni.usc.edu/medicine), the FUMA online platform (https://fuma.ctglab.nl/), the GWAS Catalog platform (https://www.ebi.ac.uk/gwas/home), and the IEU GWAS database (https://gwas.mrcieu.ac.uk/).


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints

RESOURCES