Abstract
Endurance exercise training is known to reduce risk for a range of complex diseases. However, the molecular basis of this effect has been challenging to study and largely restricted to analyses of either few or easily biopsied tissues. Extensive transcriptome data collected across 15 tissues during exercise training in rats as part of the Molecular Transducers of Physical Activity Consortium has provided a unique opportunity to clarify how exercise can affect tissue-specific gene expression and further suggest how exercise adaptation may impact complex disease-associated genes. To build this map, we integrate this multi-tissue atlas of gene expression changes with gene-disease targets, genetic regulation of expression, and trait relationship data in humans. Consensus from multiple approaches prioritizes specific tissues and genes where endurance exercise impacts disease-relevant gene expression. Specifically, we identify a total of 5523 trait-tissue-gene triplets to serve as a valuable starting point for future investigations [Exercise; Transcription; Human Phenotypic Variation].
Subject terms: Data integration, Gene expression, Genome-wide association studies, Transcriptomics
It is known that exercise influences many human traits, but not which tissues and genes are most important. This study connects transcriptome data collected across 15 tissues during exercise training in rats as part of the Molecular Transducers of Physical Activity Consortium with human data to identify traits with similar tissue specific gene expression signatures to exercise.
Introduction
Endurance exercise is associated with multiple positive health outcomes1,2. However, the molecular basis of these positive effects has been challenging to study, with past work restricted to molecular assays in either few or easily accessible tissues3. Even when prior differential analyses have identified exercise-responsive genes, there is often limited evidence for their shared molecular impact on disease. To address this challenge, we have combined the extensive, multi-tissue transcriptome data from the Molecular Transducers of Physical Activity Consortium (MoTrPAC) preclinical endurance exercise training (EET) study in rats4 with data from the Genotype-Tissue Expression (GTEx) project, where genetic differences in expression levels have been previously connected to 114 traits and diseases from publicly available Genome Wide Association Studies (GWAS) distributed across several phenotypic and plausibly exercise-responsive categories5 (note: acronyms and abbreviations used in this paper are summarized in Supplementary Table 1). The MoTrPAC EET study provided differential expression results after treadmill exercise training for both female and male F344 rats, with multiple tissues harvested at 1, 2, 4, and 8 weeks of training. All samples were harvested 48 h after the last exercise bout, and the 8-week time point was taken to correspond to the adapted state, as it allowed for the greatest degree of long-term adaptation to exercise to have occurred, as well as the least degree of unadapted acute response (e.g., inflammation). In rats, as in humans, exercise capacity is a genetic trait with well-studied relationships across a range of human-relevant complex traits and diseases6–8. Combined, these data provided a cross-tissue, whole organism molecular view of adaptation to exercise that is unattainable in human participants.
To assess the relationship of exercise adaptation and complex disease risk in distinct tissues, we applied a combination of heritability and transcriptome-wide association study (TWAS) analyses (Fig. 1a). These analyses are state-of-the-art in human genetics but have yet to be broadly applied cross-species in the context of exercise adaptation. They allow us to investigate exercise adaptation genes and gene sets for their relationship to specific complex diseases. We applied LDSC9, which can accommodate linkage disequilibrium, to estimate the SNP-heritability (h²SNP) captured by sets of exercise adaptation genes10, alongside MESC11, which incorporates both GWAS and Expression Quantitative Trait Loci (eQTL) summary statistics to estimate the proportion of h²SNP mediated by gene expression within and across tissues. Together, these analyses assess the relationship between genetic variability and adaptive exercise training response. Finally, we leveraged published S-PrediXcan12,13 output, which estimates gene by tissue-level associations and directions of effect for specific diseases, to identify genes where changes in gene expression due to exercise adaptation have the potential to alter disease risk.
Using these data and approaches, we identify gene and tissue combinations where expression levels could mediate disease risk and where exercise training had the potential to induce expression differences capable of overwhelming the impact of human standing variation measured in the GTEx study, both overall and with respect to its genetic component. We further assess if specific diseases and traits are enriched for genes differentially expressed in exercise training, both in their overall occurrence and in their directionality of effect. Combining these approaches, we identify specific genes that lie at this intersection of biological relevance as candidates where exercise effects could override expression-mediated disease risk.
Results
Exercise training has unique disease gene signatures across tissues
Exercise training induces differential expression of rat genes across multiple body tissues, and many of these genes can be mapped to human orthologs: 94.5% of all unique, differentially expressed (DE) genes (87–98% across tissues), and 79% of all expressed genes (85–93% across tissues). However, most of these changes exhibit marked tissue specificity. As observed in the main MoTrPAC PASS1B paper4, we found that after long-term exercise training, there was limited overall concordance of adaptive differential expression across tissues in the subset of rat genes with identifiable human orthologs (hereafter ‘genes’, unless otherwise noted). Only two pairs of tissues in females—the skeletal muscles vastus lateralis and gastrocnemius, as well as white adipose and the colon—produced Spearman’s ‘ρ’s at a level greater than 0.3 (Supplementary Fig. 1b). Further, there was little overlap in differentially expressed gene sets (DEGs) corresponding to each tissue (Fig. 1b). Approximately 78% of DE genes were unique and differentially expressed in only one tissue, and 95% of genes matched at most a pair of tissues. Only one pair of tissues showed a Jaccard index > 0.1 (the gastrocnemius and the vastus lateralis, Jaccard Similarity ≈ 0.21). This indicates that unique genes and pathways adapt to exercise training in different tissues and likely impact different subsets of disease-relevant genes. To this point, we observed 370 high-scoring (Open Targets evidence score > 0.8, an arbitrary threshold chosen to select gene × trait relationships with high levels of supporting evidence) disease genes from 251 traits across our 15 surveyed tissues (Fig. 1c) that were consistently responsive to exercise training in both males and females, with an average of 18.2 genes per tissue. When we excluded easily biopsied tissues such as blood, skeletal muscles, and adipose, we found 178 well-established disease genes associated with an adaptive exercise training response. 
This corresponded to 143 traits and included 101 traits without any gene-trait associations in an easily biopsied tissue. Notably, these included well-studied genes such as LDLR (DE in {CORTEX, HIPPOC, SKM-GN, SKM-VL}) and APOB (DE in {COLON, KIDNEY, LUNG}), both confidently associated with hypercholesterolemia; SLC6A8 (DE in {HEART, LIVER, LUNG}), associated with creatine transporter deficiency; FOXP3 (DE in {HEART, SPLEEN}), associated with immune dysregulation; and BRCA2 (DE in ADRNL), associated with breast neoplasia.
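The overlap statistics reported above (the Jaccard index between tissue-level DE gene sets, and the share of DE genes found in only one tissue) can be illustrated with a minimal sketch. The gene symbols and set memberships below are hypothetical placeholders, not values from the MoTrPAC data.

```python
from collections import Counter

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical DE gene sets per tissue (placeholder gene names, illustration only).
deg_sets = {
    "SKM-GN": {"G1", "G2", "G3", "G4"},
    "SKM-VL": {"G3", "G4", "G5"},
    "LUNG": {"G6", "G7"},
}

# Pairwise similarity between the two skeletal muscles.
muscle_jaccard = jaccard(deg_sets["SKM-GN"], deg_sets["SKM-VL"])

# Number of tissues in which each gene is DE, and the fraction DE in exactly one tissue.
tissue_counts = Counter(g for genes in deg_sets.values() for g in genes)
unique_fraction = sum(1 for n in tissue_counts.values() if n == 1) / len(tissue_counts)
```

In this toy input the two muscle sets share two of five distinct genes, and five of the seven genes are DE in a single tissue.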
Exercise effects on regulation of gene expression
We sought to identify where changes in gene expression due to exercise training could potentially overcome either genetic or standing variability measured in GTEx. Here, our hypothesis was that exercise behavior may be more impactful than baseline variance or genetics at these loci for the component of disease risk mediated by gene expression. For each gene and tissue where we could detect non-zero SNP-heritability of expression (h²SNP; IHW α = 0.10), we calculated genetic variance as the product of the estimated heritability and observed total phenotypic variance (Fig. 2). At 8W, we observed an average of 1 (range: 0–10) genes per tissue in at least one sex with effect sizes in trained rats that were > 2 SD of the genetic component of expression variability of the matched sex in humans (SDgeno), and 52 (range: 1–586) genes per tissue with DE > 2 SD of overall expression variability (SDpheno), with the latter set featuring ≈ 50 genes per tissue whose h²SNP could not be significantly distinguished from 0 after multiplicity adjustment. Intersecting these genes with Open Targets, we observed 30 unique > 2 SDpheno DE genes with > 0.8 evidence scores, though only 14 of these were expressed in less accessible tissues. APOB, mentioned above, was included in the latter group (DE in male lung at ≈ +9.7 SDpheno, and in the female colon at ≈ +2.4 SDpheno).
Heritability of complex disease enriched in or near training-responsive genes
We investigated whether exercise specifically modulates any traits or diseases by building on a previous approach14 to identify these effects. First, we computed the trait or disease heritability for gene sets that were differentially expressed due to exercise training in each tissue at 8W in both sexes and in the same direction. We observed the strongest magnitude of enrichments in blood phenotypes in the blood tissue, especially traits corresponding to densities of immune cells (Fig. 3).
Across the 43 traits with at least one significant enrichment at Bonferroni-adjusted α = 0.05, the largest significant enrichment factor corresponded most often to the spleen (22/43 ≈ 51%), especially in Blood (9/14) and Immune (6/7) phenotypes, with an average enrichment factor of ≈2.85 across significant enrichments. The proportion of heritability captured by these gene sets is on the order of ≈ 10% (Supplementary Fig. 2a) and corresponds to broadly independent signals across tissues (Supplementary Fig. 2b–c). This approach provides a general prioritization for assessing which traits or diseases could be most impacted by exercise training. However, we performed simulation experiments using randomly sampled gene sets of equivalent size to our original tissue-specific gene sets. These produced highly similar distributions of p-values to those observed for the empirical data. As such, these results (Fig. 3) should be interpreted less in the framework of null-hypothesis significance testing and more descriptively, as a relative ordering of estimated magnitudes of effect.
PrediXcan-significant genes overlap adaptive training-response genes
We examined the intersection of genes that are differentially expressed at “8w_F1_M1” and “8w_F-1_M-1” (i.e., sex- and direction-consistent after 8 weeks of training) and IHW-significant PrediXcan hits (Fig. 4a). We were able to identify substantial enrichment in many of the traits, trait categories, tissues, and tissue-by-trait pairs through the use of a hierarchical Bayesian model able to partially pool estimates of difference effects towards the means of their respective populations. Here, we see confident (>95% posterior probability) enrichments across all levels of the model hierarchy (Fig. 4e–g). Specifically, we observed confident positive differences in the colon, kidneys, small intestines, spleen, hippocampus, lungs, and heart, in order of decreasing posterior mean, as well as in the Endocrine and Cardiometabolic categories. We also noted several specific trait enrichments across cardiometabolic markers, mainly cholesterol and saturated fatty acids. At the trait × tissue level, posterior estimates were broadly uncertain about the directionality of enrichment in most pairs, with a smaller number showing stronger confidence in positive enrichment (Fig. 4b).
Conversely, none of the frequentist analyses of this overlap produced significant results at FWER α = 0.05 (-log10(0.05) ≈ 1.30, one-sided), with the most significant result corresponding to the multi-tissue GSEA for high cholesterol at an adjusted p-value of ≈ 0.064 (Supplementary Fig. 3, ES ≈ 0.63, log2err ≈ 0.48). However, results were broadly concordant across the two approaches, and more confident posterior distributions corresponded to lower frequentist p-values, with intermediate positive Spearman’s ρs for pairwise and trait-wise comparisons (Supplementary Fig. 3a–c). Frequentist meta-analyses of tissue and trait-category enrichments were in less confident agreement, with the latter showing mild disagreement, though at p ≈ 0.53 (output from stats::cor.test in R).
Exercise induces both more and less disease-like differential gene expression
To identify the direction of training effect in these intersecting gene sets, we queried the posterior output from a second multilevel model, visualizing posterior means for each tissue and tissue-by-trait combinations as a dot plot (Fig. 5). Given the reduced capacity for signal in these data (focal totals no longer being the set of DEGs, but the set of DEGs ∩ PrediXcan hits), we report on confident effects when a posterior mass is >90% to one side of 0. As such, the strongest confident mean enrichments for positive effects were observed in body fat percentage, asthma, and body mass index (BMI), and the strongest mean depletions in standing height and high cholesterol, though of the latter only standing height was “confident". Otherwise, body fat percentage was the only trait with posterior difference >95% in either direction. As traits varied in the degree to which they could be considered harmful or beneficial, we could not evaluate gross tissue effects across traits, but at the tails of each trait’s hyperdistribution, blood, spleen, and the two skeletal muscles had the strongest degree of deviation from null expectation. Additionally, ≈83% of the posterior mass of our GSNP weight parameter θ fell above 0.5, with ≈28% falling above 0.9.
When examining the direction of trajectories for 8-week gene sets linked to the two non-anthropometric traits, we noticed a regression towards a mean proportion of 0.5 across tissues. This is likely due to underlying genes only being differentially expressed at later time points (Fig. 6). Examining which genes and tissues correspond to both high deviation from the mean and relatively large amounts of DE, we observed blood genes associated with lower cholesterol in males (NDUFA13, FADS2, PNKD, AAMP, and OGDH), as well as the male training vastus lateralis gene TMBIM1, the female-specific training gene APOB in colon, and the female training gene ABCG8 in liver. With respect to increased risk of asthma, blood genes again had the largest relative effect sizes in males (BAG6, CCNG, CRAT, PTPA, and FAM89B), with female training genes exhibiting the largest effects in ATP6V1G2 in the vastus lateralis, ENDOU in white adipose, and CCNF in blood.
Discussion
In our study, we have identified multiple tissues and tissue-by-gene pairs where exercise may modify disease risk through gene expression. Despite human-rat differences, our unbiased approach identified multiple results that echo established exercise-disease relationships. However, some findings were unexpected.
Gene sets that responded to exercise were enriched for PrediXcan genes linked with cardiometabolic traits (Fig. 4e, f). The intersection of these genes seems to lean away from disease-like effects (Fig. 5), but we also found disease-like effects for genes associated with asthma and body fat percentage (Fig. 5). These associations, however, did not exhibit intersect sizes larger than expected by chance, and the latter showed only weak evidence of depletion (Fig. 4f). Additionally, when aggregating across traits, several of the most “classically” exercise-responsive tissues (the skeletal muscles and white adipose) appeared to be among the most depleted for PrediXcan hits (Fig. 4d), though no marginal posterior distributions for difference parameters there reached our 95% posterior mass threshold. Overall, estimates for enrichments and depletions in both intersect and directionality effects were small, even for confidently non-zero effects, predominantly varying within 0 ± 0.3 on the log-odds scale (Figs. 4e–g, 5). This corresponds to a maximum difference of ≈7.5% on the probability scale (inv-logit(0.15) - inv-logit(-0.15)), and is consistent with the relatively small deviations observed from the 1-to-1 lines in Fig. 4c–d. Interpretation of these findings on exercise biology should not lose sight of this context: small, subtle, but nevertheless discernible association.
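The conversion from the log-odds scale to the probability scale quoted above can be checked directly. The sketch below assumes a baseline log-odds of 0 (probability 0.5), where the logistic curve is steepest and a fixed log-odds deviation therefore yields its largest probability difference. The deviation of 0.3 is split into +0.15 and -0.15 around that baseline, as in the expression in the text.

```python
import math

def inv_logit(x: float) -> float:
    """Inverse-logit (logistic) function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

deviation = 0.3              # total log-odds difference between the two gene sets
half = deviation / 2.0       # applied as +half and -half around the baseline
max_diff = inv_logit(half) - inv_logit(-half)

print(f"{max_diff:.4f}")     # prints 0.0749, i.e. about 7.5 percentage points
```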
In the case of body fat percentage (BF%), it may be that absent dietary control (for example, when rats are fed ad libitum) genes are regulated in a manner that elicits increased fat storage as an adaptation to higher energy expenditure15. Thus, even though exercise may often be done by humans with the goal of reducing BF% through increased caloric expenditure, an interaction with diet modulates this response. However, white fat itself does not appear to be enriched for positive effects on BF%, with the most pronounced enrichments evident in blood and heart. Similarly, the strongest depletion for positive effects, both within anthropometric traits and overall, occurred in the standing height phenotype. Evidence for exercise effects on height is weak and ambiguous. However, particularly intense exercise may have an attenuating effect on growth16–18, especially under nutritional stress, which may partially underlie the associations observed here.
Asthma also emerged as a trait with transcriptional effects shared with exercise. This may be due to similar etiology between the general asthmatic condition measured by self-report and exercise-induced bronchoconstriction (EIB), where lung epithelial stress from exercise and increased drying and cooling of the airways due to increased ventilation trigger an inflammatory response alongside shortness of breath19. Though no individual tissues were found to be confidently enriched in positive effects for this phenotype (Fig. 5), DE genes in the spleen, a key immune and inflammatory response regulator20, emerged as having the greatest enrichment in h²SNP, comprising nearly four times baseline expectation and accounting for ≈10% of trait heritability overall (Fig. 3, Supplementary Fig. 2a). Moreover, DE genes in the spleen had the highest enrichment across a number of additional immune cell and disease phenotypes, and both eosinophil and basophil counts were found to have moderate genetic correlations with the asthma phenotype, highlighting the recently proposed roles these cell types play in structuring EIB21,22. Finally, even where exercise regulates gene expression in ostensibly “disease-like” directions, it may be that many phenotypes such as those above manifest when inflammatory, hunger-regulating, or other effects of exercise occur without having first been induced by exercise. We hypothesize that by subjecting the body to disease-like stresses, regular exercise elicits adaptation to the symptoms of those diseases, reducing the risk of their manifestation from the disease itself. In this light, the presence of their signal here may also be expected.
At the gene level, several of the highlighted genes in the DEG / PrediXcan intersection were supported by prior literature. In the cholesterol phenotype, FADS223, PNKD24, OGDH25, TMBIM126, APOB27, and ABCG828 have all been implicated previously, while NDUFA13 and AAMP have not. For asthma and / or reduced lung function, links have been drawn to BAG629, CCNF30, and CRAT31, though other genes are mentioned in similar contexts to that explored in this work, also relying on integration of eQTL and GWAS association mapping (e.g., FAM89B32).
A notable limitation of this study may be that, despite their well-established use as an exercise model, rats are separated from humans by nearly 140 million years of evolution33. Comparison of exercise-independent age and sex effects, meanwhile, may be limited by differences in age between individuals in GTEx and MoTrPAC, as most humans in the GTEx v8 dataset were aged 50+34,35 while trained F344 rats were uniformly under eight months of age and therefore well under the age of onset of the F344 rat equivalent of sex-specific, aging-related changes such as menopause36. These results may also have limited portability to non-European populations, as the GTEx sample comprises mostly European-descendant individuals. Identification of rat-human gene orthology is another difficult problem, and important biology almost certainly lies within disease and exercise-responsive genes across species whose correspondence cannot be easily established. But while species differences can complicate interpretation of exercise-induced regulation of orthologous genes, these models remain crucial and provide high levels of experimental compliance and tissue accessibility from individuals who are far more straightforward to motivate. As such, a unique aspect of the MoTrPAC rat exercise training data is the availability of differential expression data across 15 distinct tissues, many impossible or impractical to collect in humans as part of an exercise study. Accelerated rat life history also makes it feasible to conduct experiments on exercise training adaptation on timescales relevant to their lifespan. It is also simpler to regulate rat behavior than human behavior, reducing biases linked to non-compliance and attrition.
We expect future studies can benefit from and expand on this work in several ways. Qualitative sex-specificity, a notable hallmark of exercise adaptation in humans37,38, fell outside the scope considered here, though it is afforded closer treatment in companion publications39. Future causal inferential work may use the genetic correlates of physical activity40 as instruments to infer tissue-specific drivers of phenotypic adaptation41 in humans. But analysis of experimental data from animal models will complement these efforts where genetic effects are weak (Fig. 2a), targeting causality directly to identify how tissue and organ systems adapt to exercise and influence a large variety of human traits and diseases. Finally, we expect that future studies may benefit from our work by evaluating specific loci therein for GxE interactions within large-scale human population biobanks. Combined, MoTrPAC’s EET study provides a large-scale, cross-tissue map of changes in exercise adaptation that enables generating new mechanistic hypotheses on the disease impacts of exercise training.
Methods
This study did not generate novel data, instead relying on data published in previous or concurrent studies. Animal procedures from the concurrent MoTrPAC PASS1B study4 were approved by the University of Iowa’s Institutional Animal Care and Use Committee.
MoTrPAC EET study design
The MoTrPAC42 Endurance Exercise Training Study is described in detail in the landscape manuscript4 (data accessible at https://motrpac-data.org/data-access). In brief, both female and male F344 rats were subjected to treadmill exercise training, with tissues harvested at 1, 2, 4, and 8 weeks of training. All samples were taken 48 h after the last exercise bout, with the 8-week time point taken to correspond to the adapted state. In this work, we leverage data from a total of 738 extracted samples across 15 tissues and 47–50 rats per tissue that were subjected to RNA-sequencing and differential expression analysis.
Differential expression analysis
Differential expression analysis (DEA) is described in detail by the main MoTrPAC manuscript4. Briefly, DEA was performed separately in each sex and tissue using filtered raw counts as input for DESeq243. Likelihood ratio tests (DESeq2::nbinomLRT()) were used to identify genes that changed over the training time course in at least one sex while accounting for RNA-Seq technical covariates (RNA integrity number, median 5′-3′ bias, percent of reads mapping to globin, and percent of PCR duplicates as quantified with Unique Molecular Identifiers). For each gene, male- and female-specific p-values were combined using the Fisher’s sum of logs method. These meta-analytic p-values were adjusted across all RNA-Seq datasets using Independent Hypothesis Weighting (IHW) with tissue as a covariate44. Training-differential genes were selected at 5% IHW α. Given the regression model of each gene described above, contrasts were made between each training timepoint (i.e., 1, 2, 4, or 8 weeks) and the sex-matched sedentary controls using DESeq2::DESeq() to calculate time- and sex-specific summary statistics.
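As an illustration of the p-value combination step above, Fisher's sum-of-logs method can be sketched as follows. This is a standalone Python sketch rather than the pipeline's own code. It uses the closed-form chi-square survival function for even degrees of freedom so that only the standard library is needed, and it requires every input p-value to be strictly greater than 0.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's sum-of-logs method.

    Under the null, X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom (k = number of p-values). For even degrees of freedom
    the survival function has the closed form
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # equals x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total

# Hypothetical male- and female-specific p-values for one gene.
combined = fisher_combine([0.05, 0.05])
```

With a single p-value the function returns that p-value unchanged, which is a convenient sanity check.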
Correlation of differential analysis results
The nominal p-values and log fold-changes from the time- and sex-specific differential expression analysis results were transformed into standard normal random variables using qnorm(p-value / 2, lower.tail = F) * sign(log fold-change) in base R, so that each value carries the direction of the training effect. These “z-scores” were organized into a gene-by-condition matrix, where conditions were tissue, sex, and timepoint combinations. The z-score matrix was filtered to include the set of genes that had no missing values across all conditions. We calculated the Spearman correlation between all pairs of conditions to quantify the concordance of the training effect across conditions.
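The transformation from a two-sided p-value and a log fold-change to a signed z-score can be sketched as below. This is a Python analogue of the R expression in the text, not the authors' code, and the function name is hypothetical. It assumes 0 < p ≤ 1 and, matching R's sign(), maps a log fold-change of exactly 0 to a z-score of 0.

```python
from statistics import NormalDist

def signed_z(p_value: float, log_fold_change: float) -> float:
    """Signed z-score from a two-sided p-value and the direction of effect."""
    # Upper-tail quantile of the standard normal at p/2; by symmetry this is
    # -inv_cdf(p/2), which avoids computing 1 - p/2 for very small p.
    magnitude = -NormalDist().inv_cdf(p_value / 2.0)
    sign = (log_fold_change > 0) - (log_fold_change < 0)
    return magnitude * sign

# A down-regulated gene with a nominal two-sided p-value of 0.05.
z_down = signed_z(0.05, -1.2)
```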
Graphical clustering of differential analysis results
Graphical clustering of differential analysis results is described in detail in the main MoTrPAC EET study manuscript4. All training-differential features at 5% IHW α were clustered into homogeneous patterns using their time- and sex-specific differential analysis z-scores. The statistical details are provided elsewhere4,45–47. Briefly, the expectation-maximization (EM) process of the repfdr algorithm was used to assign one of three simplified states to each z-score: −1 for down-regulation, 0 for null (no change), or 1 for up-regulation45. For each feature and timepoint, the simplified states from each sex were combined into one of nine possible states (−1, 0, or 1 for each sex). For example, the state “F1_M1" represented a feature that was up-regulated in both females (F1) and males (M1) at a given timepoint. Here, to focus on genes with sex-consistent training effects in the trained state, we selected genes that were assigned to the F1_M1 state (up-regulated in both sexes) or the F-1_M-1 state (down-regulated in both sexes) at 8 weeks. To enable comparison between genes expressed in rats and humans, we compiled a MoTrPAC rat-to-human ortholog map from GENCODE and RGD resources4,48,49. The distribution of those genes able to be matched to human orthologs across tissues is summarized in Fig. 1b.
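The state labels described above lend themselves to a short illustration. The sketch below assumes per-sex states have already been assigned (in the source, by repfdr); the helper names and the example genes are hypothetical.

```python
def joint_state(female_state: int, male_state: int) -> str:
    """Combine per-sex states (-1, 0, or 1) into one of nine labels, e.g. 'F1_M1'."""
    return f"F{female_state}_M{male_state}"

def is_sex_consistent(female_state: int, male_state: int) -> bool:
    """True for the two states kept here: up in both sexes or down in both sexes."""
    return female_state == male_state and female_state != 0

# Hypothetical 8-week states per gene: (female, male).
states_8w = {"GENE_A": (1, 1), "GENE_B": (-1, -1), "GENE_C": (1, 0), "GENE_D": (1, -1)}

selected = {
    gene: "8w_" + joint_state(f, m)
    for gene, (f, m) in states_8w.items()
    if is_sex_consistent(f, m)
}
```

Only GENE_A and GENE_B survive the filter, with the node labels “8w_F1_M1” and “8w_F-1_M-1” respectively.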
Open targets intersection
The Open Targets50 database (Release 22.04) was downloaded on June 8th, 2022. Entries in this database represent curated sets of human genes with disease relationships established from multiple sources of evidence. We used the R-package sparklyr51 to cross-reference differentially expressed rat genes to all orthologous Open Targets gene-trait direct associations at different evidence-score thresholds. The abundance of these associations was quantified on both tissue-specific and tissue-shared bases, the latter comprising genes differentially expressed in three or more tissues. A table listing all genes, top trait associations, and corresponding tissues is provided in the Supplementary Files folder of the GitHub repo.
Heritability analyses
We retrieved summary statistics (sumstats) for 114 published GWAS5. Using the program LDSC9,10, we estimated SNP-heritability (h²SNP) for each GWAS, including the default baseline annotation of 53 functional categories. We further estimated h²SNP using MESC11 and, with the provided expression scores meta-analyzed over 48 GTEx tissues, estimated expression-mediated heritability (h²med) for our 114 traits, as well as the ratio h²med / h²SNP.
LDSC was used to estimate the overall proportion of h²SNP and the enrichment in h²SNP across loci within a 100 kb window of all sex-consistent 8w DE gene sets in each tissue, following the “Cell type specific analyses” tutorial. We included here the baseline annotation, as well as an annotation comprising loci within 100 kb of all expressed genes in each tissue. Finally, to assess the sensitivity of tissue-specific results to overlaps in gene sets between tissues, we estimated heritability and heritability enrichment conditional on annotations corresponding to all other tissues alongside the baseline annotation.
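The window-based annotation described above can be illustrated with a toy example. This sketch is not LDSC's own annotation tooling. It assumes that the window extends 100 kb on both sides of the gene body, and the coordinates are hypothetical.

```python
WINDOW_BP = 100_000  # 100 kb window, as stated in the text

def in_gene_window(snp_pos: int, gene_start: int, gene_end: int,
                   window: int = WINDOW_BP) -> bool:
    """True if the SNP lies in the gene body or within `window` bp of either end."""
    return gene_start - window <= snp_pos <= gene_end + window

def annotate(snps, genes):
    """Binary annotation per SNP: 1 if it falls in the window of any gene in the set.

    snps:  list of (chromosome, position)
    genes: list of (chromosome, start, end)
    """
    return [
        int(any(chrom == g_chrom and in_gene_window(pos, start, end)
                for g_chrom, start, end in genes))
        for chrom, pos in snps
    ]

# Hypothetical DE gene and SNP coordinates.
de_genes = [("chr1", 500_000, 520_000)]
snps = [("chr1", 400_000), ("chr1", 399_999), ("chr1", 620_000), ("chr2", 510_000)]
annotation = annotate(snps, de_genes)
```

The nested scan is quadratic in the worst case and is meant only to make the definition concrete; a genome-scale annotation would use sorted intervals.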
Human expression data & effect standardization
To assess the degree to which exercise effects could overcome genetic and phenotypic variability of gene expression in a tissue, we used the GTEx database (version 8)34. To allow for a common scale between exercise DE and measures of gene expression in GTEx, we modified the GTEx pipeline to use a pseudolog (log2(x + 1)) transformation in place of its default inverse-normal transform, otherwise keeping later steps in the pipeline intact. Next, we took the outputted expression matrices and residualized out the provided covariates (sex, the top 5 genotyping principal components, sequencing platform, sequencing protocol, and the suggested number of PEER factors in the GTEx documentation) using the lm() function in base R. On a per-gene basis, we then computed sample variances for each gene in each tissue, pooled across sex to reflect the sex-independent nature of exercise-induced DE. To regularize outlying variance estimates due to sampling effects, we fit an inverse-gamma distribution to tissue-specific sample variances using a maximum goodness-of-fit estimator implemented in the R-package fitdistrplus52 by the function fitdist(). As the inverse-gamma is the conjugate prior of the variance term of a normal distribution with known mean, we adopted an Empirical Bayesian strategy to produce posterior estimates of each gene’s expression variance. To allow for heterogeneity in this term across sex, we did this separately for male- and female-coded individuals in the GTEx study population. Additional details are provided in the Supplementary Methods. Across tissues, these empirical priors are plotted in the denominator of Fig. 2a. For each gene, we then took the square root of the posterior mean of inferred log2 expression variance to estimate within-population standard deviation of the magnitude of gene expression. We then divided estimated exercise DE by these values to produce standardized estimates in units of within-tissue phenotypic standard deviation (SDpheno). Further, these estimates were conditioned on both sex and population (quantile plots in upper panels of Fig. 2a).
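The conjugate update behind this Empirical Bayes step can be sketched as follows. The sketch assumes the textbook case of zero-mean normal residuals with an inverse-gamma prior on the variance. The prior parameters here are hypothetical (in the source they are fit to tissue-specific sample variances with fitdist()), and the exact procedure is the one given in the Supplementary Methods, not this code.

```python
import math

def posterior_mean_variance(residuals, prior_shape: float, prior_scale: float) -> float:
    """Posterior mean of sigma^2 for zero-mean normal data under an InvGamma(shape, scale) prior.

    Conjugate update: shape' = shape + n/2, scale' = scale + sum(x^2)/2,
    and the posterior mean is scale' / (shape' - 1), defined for shape' > 1.
    """
    n = len(residuals)
    sum_sq = sum(r * r for r in residuals)
    post_shape = prior_shape + n / 2.0
    post_scale = prior_scale + sum_sq / 2.0
    if post_shape <= 1.0:
        raise ValueError("posterior mean is undefined for shape <= 1")
    return post_scale / (post_shape - 1.0)

def standardize(log2_fold_change: float, variance: float) -> float:
    """Express an exercise effect in units of phenotypic standard deviation."""
    return log2_fold_change / math.sqrt(variance)

# Hypothetical covariate-adjusted expression residuals for one gene in one tissue.
residuals = [1.0, -1.0, 2.0, -2.0]
post_var = posterior_mean_variance(residuals, prior_shape=3.0, prior_scale=2.0)
effect_sd_pheno = standardize(2.0, post_var)
```

With few samples the posterior mean is pulled toward the prior, which is the regularizing behavior the paragraph describes for outlying variance estimates.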
To estimate the scale of genetic influence on gene expression, we used the software Plink53 and GCTA54, specifically GCTA-GRM55, to estimate the h²SNP of each gene’s expression, using the same covariates as before. In contrast to prior work estimating h²SNP in GTEx’s inverse-normal transformed gene expression matrices56, we focused on obtaining estimates on a gene-specific basis, and so constrained output to be bounded between 0 and 1.
We then took these estimates, which represent the proportion of expression variance able to be explained by linear effects at the SNP level, and multiplied them by the estimates of expression variance, dividing our estimates of exercise-induced DE by the square root of that product to obtain exercise-effect sizes in units of genetic (SNP) standard deviation (SDgeno). Many of these point estimates were at or near 0, resulting in extreme standardized effect sizes. As a further filter, we thresholded on the significance of the heritability estimates (IHW α = 0.10, with tissue as a covariate) to focus on confidently heritable genetically regulated expression. This removed ≈92% of gene × tissue pairs (583,238/632,738), leaving 49,500 for later analysis and use in figures.
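The scaling described above, together with the filter arithmetic, can be written out as a short sketch. The function name and the example inputs are hypothetical. Returning None for a zero genetic variance is a choice made here for illustration; the source instead removes unreliable estimates by significance thresholding.

```python
import math

def sd_geno_effect(log2_fold_change: float, h2_snp: float, expr_variance: float):
    """Exercise effect in units of genetic (SNP) standard deviation.

    Genetic variance is the product of SNP-heritability and total expression
    variance; the effect is divided by its square root.
    """
    genetic_variance = h2_snp * expr_variance
    if genetic_variance <= 0.0:
        return None  # undefined when no genetic variance is estimated
    return log2_fold_change / math.sqrt(genetic_variance)

# Hypothetical gene: log2 fold-change 0.5, heritability 0.25, expression variance 1.0.
example = sd_geno_effect(0.5, h2_snp=0.25, expr_variance=1.0)

# Counts reported in the text for the significance filter.
total_pairs, removed_pairs = 632_738, 583_238
kept_pairs = total_pairs - removed_pairs
removed_fraction = removed_pairs / total_pairs
```

The small denominators produced by near-zero heritability estimates are what generate the extreme standardized effect sizes mentioned in the text.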
Cross-referencing exercise-training genes with human TWAS
To identify specific genes where exercise-training effects may have the potential to mediate traits, we cross-referenced exercise-genes against transcriptome-wide association results (TWAS). Specifically, we downloaded S-PrediXcan13 output5 for 114 GWAS and MASHR-based expression models using GTEx v8, filtered by significance (IHW α = 0.05, with tissue × trait pairs as a covariate), and intersected these hits with genes that were differentially expressed due to exercise at 8W in a sex-consistent manner, i.e., members of the nodes “8w_F1_M1” and “8w_F-1_M-1”. Of the 114 traits, 99 had a nonzero intersect in at least one tissue.
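The cross-referencing step amounts to a set intersection per tissue and trait, which can be sketched as below. The gene identifiers and the third node name ("8w_F1_M0") are hypothetical placeholders; only the two sex-consistent node names come from the text.

```python
def sex_consistent_degs(node_members):
    """Union of genes in the two sex-consistent 8-week nodes named in the text."""
    up_in_both = node_members.get("8w_F1_M1", set())
    down_in_both = node_members.get("8w_F-1_M-1", set())
    return up_in_both | down_in_both

def intersect_with_twas(degs, twas_hits):
    """Genes that are both sex-consistent DEGs and significant TWAS hits."""
    return degs & twas_hits

# Hypothetical node membership for one tissue.
node_members = {
    "8w_F1_M1": {"geneA", "geneB"},
    "8w_F-1_M-1": {"geneC"},
    "8w_F1_M0": {"geneD"},  # hypothetical non-sex-consistent node, ignored
}
# Hypothetical significant S-PrediXcan hits for one trait in the same tissue.
twas_hits = {"geneB", "geneC", "geneD", "geneE"}

overlap = intersect_with_twas(sex_consistent_degs(node_members), twas_hits)
```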
To assess potential enrichments in these intersections, we compared the observed count of S-PrediXcan hits in the DEG sets against those outside the DEG sets, adopting a tractable Binomial approximation to the Bernoulli distribution to test for enrichment or depletion of genes under a multilevel Bayesian model, following ref. 57. This approach allowed us to partially pool information across tissues and traits, avoiding the need for post-hoc multiplicity adjustment58, as multiplicity is explicitly built into the inference model itself through flexible regularization of model parameters towards 0. Specifically, we fit a model of the form:
[Equations 1.1–1.14]
Notation for this model is summarized in Supplementary Table 2, but in brief: the intersect size in tissue i ∈ {1, 2, …, 15} and trait j ∈ {1, 2, …, 99} was binomially distributed, with the number of trials given by the total number of genes in that tissue that were differentially expressed at 8W and expressed at any level in the PrediXcan analysis (i.e., disregarding genes that were not expressed in both samples). The function f() can be any function mapping the real line to the unit interval, but here was the inverse-logit function. On the logit scale, the probability of observing a PrediXcan hit in the DEG set was expressed as a deviation from a mean πi,j, with an equal and opposite deviation applied to the log-odds of observing a PrediXcan hit in the complementary set, defined as all expressed genes that were not differentially expressed at 8W in a sex-consistent manner. This deviation term had four components: a tissue difference βi, a trait difference γj, a tissue × trait difference ϵi,j, and an overall difference α. Adding and subtracting half of the deviation from πi,j to produce the two log-odds, respectively, was done to prevent specifying greater prior uncertainty on one of the two composite probability parameters.
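The symmetric parameterization described above can be illustrated with a small sketch. It assumes that the four difference terms combine additively on the logit scale, which the text implies but does not state explicitly. The function names and numeric inputs are hypothetical, and the priors on each term are not reproduced here.

```python
import math

def inv_logit(x):
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def hit_probabilities(pi_ij, alpha, beta_i, gamma_j, eps_ij):
    """Probabilities of a PrediXcan hit inside and outside the DEG set.

    Half of the total deviation is added to the mean log-odds for the DEG set
    and half is subtracted for the complementary set, so that neither
    probability carries more prior uncertainty than the other.
    """
    deviation = alpha + beta_i + gamma_j + eps_ij  # assumed additive
    p_deg = inv_logit(pi_ij + deviation / 2.0)
    p_non_deg = inv_logit(pi_ij - deviation / 2.0)
    return p_deg, p_non_deg

def binomial_log_pmf(k, n, p):
    """Log-probability of k hits among n genes with per-gene hit probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)

# Hypothetical parameter values: a positive deviation means enrichment in DEGs.
p_deg, p_non_deg = hit_probabilities(pi_ij=-2.0, alpha=0.2, beta_i=0.1,
                                     gamma_j=0.3, eps_ij=0.0)
```

Under this construction the difference between the two log-odds equals the total deviation, so a posterior concentrated above 0 corresponds to enrichment and one concentrated below 0 to depletion.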
The various scale parameters, σ, served to adaptively regularize estimates of each difference term towards their mean. Otherwise, we nested trait difference effects γj in trait category difference effects μk, where k ∈ {1, 2, …, 12} indexes previously designated trait categories5, i.e., members of the set {Psychiatric, Aging, Cardiometabolic, Allergy, Digestive, Immune, Endocrine, Skeletal, Anthropometric, Hair, Blood, Cancer}. If traits in a particular category showed consistent evidence of deviation, partial pooling shrunk estimates towards their respective mean hyperparameters, allowing them to share information to the extent the model could detect information to be shared. We use a similar model structure to express the overall location parameter, πi,j.
Pseudo-replication across tissues and traits amplifies signals that inform higher-level parameters, leading the inference model to mistake interdependent effects as independent evidence for enrichment. When aggregating many Bernoulli random variables to a single binomial, signals of gene interdependence that would otherwise prevent this are lost. To address this, we introduced parameters Σi, Σj, and Σi×j, corresponding to i × i tissue, j × j trait, and (i ⋅ j) × (i ⋅ j) tissue × trait correlation matrices, respectively. For tractability, we then fixed these to maximum-likelihood estimates of each respective gene-wise correlation matrix under a bivariate probit, which we fit marginally across all DEGs, PrediXcan hits (jointly across tissues), and DEG × PrediXcan intersects using the nlm (non-linear minimization) algorithm59 implemented in and accessed through the R-packages stats and optimx60. As we fit each pairwise correlation individually, rather than simultaneously, there was no guarantee that the resulting correlation matrix is positive semi-definite. To ensure this constraint is met and all pairwise correlations are jointly possible, we transformed the pairwise-estimated correlation matrices with Higham’s algorithm61 implemented in the R-package Matrix62 function nearPD() before proceeding further.
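To see why pairwise-estimated correlations may not form a valid matrix, consider the three-variable case, where validity can be checked in closed form. The sketch below is illustrative only and does not implement Higham's algorithm. The function names and example values are hypothetical.

```python
def correlation_determinant_3x3(r12, r13, r23):
    """Determinant of a 3 x 3 correlation matrix with the given off-diagonals."""
    return 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2

def is_valid_3x3_correlation(r12, r13, r23, tol=1e-12):
    """Check positive semi-definiteness of a 3 x 3 correlation matrix.

    With a unit diagonal and every |r| <= 1, the 1 x 1 and 2 x 2 principal
    minors are already non-negative, so the matrix is positive semi-definite
    exactly when its determinant is non-negative.
    """
    pairwise_ok = all(abs(r) <= 1.0 for r in (r12, r13, r23))
    return pairwise_ok and correlation_determinant_3x3(r12, r13, r23) >= -tol

# Each pairwise value is plausible alone, but the second set is jointly impossible:
# two variables both strongly correlated with a third cannot be strongly
# anti-correlated with each other.
consistent = is_valid_3x3_correlation(0.9, 0.9, 0.9)
inconsistent = is_valid_3x3_correlation(0.9, 0.9, -0.9)
```

For larger matrices the same check requires an eigenvalue decomposition, and a routine such as nearPD() replaces an invalid matrix with the nearest valid one.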
We fit this model in Stan63 using CmdStanR64 and in Fig. 4e–g visualize marginal posterior distributions for tissue, trait, and trait category difference effects as violin plots using the R-package vioplot65. Additionally, where the composite difference effect for a particular cell in Fig. 4a finds > 95% of its posterior mass to one side of 0, we colored its upper or lower corner with a red or blue triangle to signify enrichment or depletion of that tissue × trait combination, respectively. To accommodate this and subsequent models’ challenging posterior geometry, we used a non-centered parameterization, running four separate and randomly initialized chains for 2.5 × 10³ warmup and 2.5 × 10³ sampling iterations, with a target acceptance rate δ of 0.95. To diagnose pathologies in the MCMC output and confirm adequate convergence, mixing, and sampling intensity, we used the posterior package66 for MCMC diagnostics, requiring that all model parameters, as well as posterior density and likelihood, receive and both bulk and tail Effective Sample Size (ESS) > 500, in addition to requiring that < 0.05% of iterations end in a divergence.
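The ESS and divergence criteria can be expressed as a simple acceptance check. The sketch below covers only those two criteria and assumes the per-parameter ESS values have already been computed elsewhere (for example with the posterior package). The function name and inputs are hypothetical.

```python
def passes_sampling_checks(ess_bulk, ess_tail, n_divergent,
                           n_chains=4, n_sampling=2500):
    """Apply the ESS > 500 and < 0.05% divergence criteria from the text.

    ess_bulk and ess_tail are lists with one value per monitored quantity.
    """
    total_draws = n_chains * n_sampling
    ess_ok = min(ess_bulk) > 500 and min(ess_tail) > 500
    # 0.05% is 1 in 2000; integer arithmetic avoids floating-point edge cases.
    divergence_ok = n_divergent * 2000 < total_draws
    return ess_ok and divergence_ok

# With four chains of 2500 sampling iterations there are 10,000 draws,
# so at most 4 divergent iterations are tolerated.
accepted = passes_sampling_checks([800.0, 1200.0], [600.0, 900.0], n_divergent=4)
rejected = passes_sampling_checks([800.0, 1200.0], [600.0, 900.0], n_divergent=5)
```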
To complement this analysis, we also performed frequentist Gene Set Enrichment Analysis using the function fgsea() implemented in the R-package fgsea67. Specifically, for each trait × tissue pair, we assessed enrichment of the set of DEGs in that tissue in the list of −log10-transformed PrediXcan p-values of mutually expressed, orthologous genes, applying a Bonferroni correction to the output. To aggregate across these interdependent tests and assess tissue- and trait-level enrichment, we took the harmonic mean of subtest p-values corresponding to each grouping68, applying a similar FWER adjustment to the output (α = 0.05). We leveraged the same meta-analytic procedure over trait-level enrichments to aggregate to trait categories. Finally, we explored an alternative approach to aggregating multi-tissue GSEA within traits. As a more stringent set of multi-tissue responsive genes, we took the set of all genes differentially expressed in three or more tissues. For PrediXcan p-values, we took the harmonic mean of each gene’s PrediXcan p-values across all studied tissues prior to −log10 transformation. These were input into conventional GSEA, and output from all of the above comparisons was visualized in Supplementary Fig. 3.
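The aggregation step can be sketched as an equally weighted harmonic mean of p-values followed by a Bonferroni adjustment. This is a simplified illustration that assumes equal weights and omits any further calibration of the raw harmonic mean. The function names and p-values are hypothetical.

```python
def harmonic_mean_p(p_values):
    """Equally weighted harmonic mean of a list of p-values in (0, 1]."""
    return len(p_values) / sum(1.0 / p for p in p_values)

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical subtest p-values for one trait across tissues.
subtests = [0.01, 0.5]
combined = harmonic_mean_p(subtests)

# Hypothetical family size for the follow-up FWER adjustment.
adjusted = bonferroni(combined, n_tests=2)
```

The harmonic mean is dominated by the smallest p-values, so a single strong subtest can carry a grouping, which is why the combined values are adjusted again afterwards.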
Proportion of disease-like effects
To assess the proportion of DE acting in disease-like directions relative to each phenotype (the product of the direction of DE and the direction of association from PrediXcan), we applied another Bayesian multilevel model, comparing the observed, unweighted frequency of positive effects against a “null” frequency of 0.5 (equivalently, against log-odds of 0):
[Equations 2.1–2.11]
Unlike for their overall frequency, signal for the directionality of effect (more or less disease-like) cannot be shared across traits or within trait categories, as the traits themselves vary in whether they are harmful, neutral, or beneficial. Instead, we performed partial pooling across the scale of these differences, across both trait categories and within traits themselves. Notation for this model is summarized in Supplementary Table 3, but in brief: we estimated overall scale parameters (ρ, δ), and then estimated a log-normally distributed multiplicative factor (λj, γk) to scale each on a trait-wise and trait-category-wise basis, respectively. To accommodate per-trait interdependence in the direction of deviation, we inverted Cheverud’s conjecture69, using our previously estimated GSNP as a proxy for an environmental correlation matrix (Supplementary Fig. 1a). As these genetic correlations were estimated pairwise, positive semi-definiteness (PSD) of the whole correlation matrix is not guaranteed. To satisfy the PSD constraint on GSNP, we substituted the nearest PSD correlation matrix output from Higham’s algorithm61, implemented in the R-package Matrix62 function nearPD(). To allow flexibility in this modeling assumption, we computed a linearly weighted average of this matrix and the identity matrix (I), estimating the weight parameter θ from a flat Beta prior. MCMC sampling parameters were specified and diagnostics performed as previously described.
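The blending of the correlation matrix with the identity can be sketched as below. Which of the two matrices receives weight θ is not stated in the text, so this sketch assumes θ multiplies the correlation matrix and 1 − θ multiplies the identity. The function name and example matrix are hypothetical.

```python
def blend_with_identity(corr, theta):
    """Return theta * corr + (1 - theta) * I for a square correlation matrix.

    Assumes theta in [0, 1]: theta = 1 keeps the estimated correlations,
    theta = 0 treats all traits as independent.
    """
    n = len(corr)
    return [
        [theta * corr[i][j] + (1.0 - theta) * (1.0 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical two-trait correlation matrix.
g_snp = [[1.0, 0.8],
         [0.8, 1.0]]
blended = blend_with_identity(g_snp, theta=0.5)
```

Because both inputs have a unit diagonal, the blend keeps a unit diagonal and only shrinks the off-diagonal correlations towards 0.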
We examined the two non-anthropometric traits with the highest posterior means in Fig. 6, tracing the proportion of effects of the 8W gene set backwards to the first week. Where individual genes are not assigned to a graphical node signifying differential expression, their contribution to the count of positive effects was taken to be 0.5 when calculating the overall proportion. We include in these tissue-specific trajectories the set of genes corresponding to each tissue, their direction of effect on the trait, and their standardized effect size from Fig. 2b. Similar figures to Fig. 6 for all other traits may be found in the GitHub repository mentioned below.
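The counting convention described above, including the value of 0.5 for genes not assigned to a differential expression node, can be sketched as follows. The direction codes and example values are hypothetical, and a positive product of directions is treated as a positive effect, as in the text.

```python
import math

def positive_effect_proportion(de_directions, twas_directions):
    """Proportion of genes whose DE direction times TWAS direction is positive.

    de_directions holds +1, -1, or None, where None marks a gene that is not
    assigned to a DE node at that time point and therefore contributes 0.5.
    """
    total = 0.0
    for de_dir, twas_dir in zip(de_directions, twas_directions):
        if de_dir is None:
            total += 0.5
        elif de_dir * twas_dir > 0:
            total += 1.0
    return total / len(de_directions)

def log_odds(p):
    """Log-odds of a proportion; 0 corresponds to the null frequency of 0.5."""
    return math.log(p / (1.0 - p))

# Hypothetical directions for four genes in one tissue and trait.
de_directions = [1, -1, None, 1]
twas_directions = [1, 1, -1, -1]
proportion = positive_effect_proportion(de_directions, twas_directions)
```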
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We would like to thank Michael Gloudemans, Daniel Nachun, Bob Carpenter, Andrew Gelman, Laurens van de Wiel, Andrew Marderstein, Bruna Balliu, Kim Huffman, and Kate Gates for their valuable input on many parts of the analyses presented above. We would also like to thank Marty Walsh, John Williams, Matt Wheeler, and other members of MoTrPAC for their crucial feedback on this work. Finally, we would like to acknowledge the entire MoTrPAC team, including PASS, CAS and BIC, for their indispensable contributions in generating the exercise-response data used here. MoTrPAC is supported by NIH grants U24OD026629 (MSG, Bioinformatics Center), U24DK112349 (MSG), U24DK112342 (MSG), U24DK112340 (MSG), U24DK112341 (MSG), U24DK112326 (MSG), U24DK112331 (MSG), U24DK112348 (SBM, Chemical Analysis Sites), U01AR071133 (MSG), U01AR071130 (MSG), U01AR071124 (MSG), U01AR071128 (MSG), U01AR071150 (MSG), U01AR071160 (MSG), U01AR071158 (MSG, Clinical Centers), U24AR071113 (MSG, Consortium Coordinating Center), U01AG055133 (MSG), U01AG055137 (MSG), and U01AG055135 (MSG, PASS/Animal Sites). Research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under award number T15LM007033 (N.G.V.).
Author contributions
NGV, NRG, and SBM collectively conceived of and designed the analysis strategy underlying this work. NGV implemented most analysis and figure code, with NRG facilitating access and advising processing of GTEx, GWAS, and MoTrPAC data. NGV and NRG performed testing and validation, as well as compiling online materials. NGV drafted the manuscript, which then received extensive edits and suggestions from NRG and SBM. MSG provided substantive feedback and advice throughout all parts of this work. All authors approved the manuscript prior to submission.
Peer review
Peer review information
Nature Communications thanks Frank Booth, Taylor Head, Kangjin Kim and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
This study did not generate novel data, relying instead on previously or concurrently published data. MoTrPAC PASS1B data (10.1101/2022.09.21.508770) used here have been deposited at https://motrpac-data.org/data-access. Inquiries regarding access to these data should be sent to motrpac-helpdesk@lists.stanford.edu. Further resources are available at motrpac.org and motrpac-data.org. Where it would be difficult to re-host large datasets from GTEx34, Open Targets50, and PrediXcan5, we provide download links in the documentation of the associated code repository. Source data to generate all figures seen here are provided with this paper in the form of *.RData objects. These contain all necessary processed data to fully and quickly reproduce all paper figures using the scripts contained in https://github.com/NikVetr/MoTrPAC_Complex_Traits/tree/main/scripts/figures. Source data are provided with this paper.
Code availability
We provide end-to-end scripts to perform all analyses described above in a GitHub repository70 located at the following URL: https://github.com/NikVetr/MoTrPAC_Complex_Traits. Additionally, we provide scripts to generate all figures, as well as intermediate data files corresponding to compiled results at each level of analysis (MCMC output, Open Targets associations, cross-referenced DEG-PrediXcan intersects, aggregated GCTA output, and relative effect sizes).
Competing interests
S.B.M. is a consultant for BioMarin, MyOme and Tenaya Therapeutics. These companies are broadly interested in treatments for rare and common genetic diseases but had no input on any component of this study. The authors have no other competing interests to declare.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A list of authors and their affiliations appears at the end of the paper.
Contributor Information
Nikolai G. Vetr, Email: nikgvetr@stanford.edu
Stephen B. Montgomery, Email: smontgom@stanford.edu
MoTrPAC Study Group:
Joshua N. Adkins, Brent G. Albertson, David Amar, Mary Anne S. Amper, Jose Juan Almagro Armenteros, Euan Ashley, Julian Avila-Pacheco, Dam Bae, Ali Tugrul Balci, Marcas Bamman, Nasim Bararpour, Elisabeth R. Barton, Pierre M. Jean Beltran, Bryan C. Bergman, Daniel H. Bessesen, Sue C. Bodine, Frank W. Booth, Brian Bouverat, Thomas W. Buford, Charles F. Burant, Tiziana Caputo, Steven Carr, Toby L. Chambers, Clarisa Chavez, Maria Chikina, Roxanne Chiu, Michael Cicha, Clary B. Clish, Paul M. Coen, Dan Cooper, Elaine Cornell, Gary Cutter, Karen P. Dalton, Surendra Dasari, Courtney Dennis, Karyn Esser, Charles R. Evans, Roger Farrar, Facundo M. Fernández, Kishore Gadde, Nicole Gagne, David A. Gaul, Yongchao Ge, Robert E. Gerszten, Bret H. Goodpaster, Laurie J. Goodyear, Marina A. Gritsenko, Kristy Guevara, Fadia Haddad, Joshua R. Hansen, Melissa Harris, Trevor Hastie, Krista M. Hennig, Steven G. Hershman, Andrea Hevener, Michael F. Hirshman, Zhenxin Hou, Fang-Chi Hsu, Kim M. Huffman, Chia-Jui Hung, Chelsea Hutchinson-Bunch, Anna A. Ivanova, Bailey E. Jackson, Catherine M. Jankowski, David Jimenez-Morales, Christopher A. Jin, Neil M. Johannsen, Robert L. Newton, Jr, Maureen T. Kachman, Benjamin G. Ke, Hasmik Keshishian, Wendy M. Kohrt, Kyle S. Kramer, William E. Kraus, Ian Lanza, Christiaan Leeuwenburgh, Sarah J. Lessard, Bridget Lester, Jun Z. Li, Malene E. Lindholm, Ana K. Lira, Xueyun Liu, Ching-ju Lu, Nathan S. Makarewicz, Kristal M. Maner-Smith, D. R. Mani, Gina M. Many, Nada Marjanovic, Andrea Marshall, Shruti Marwaha, Sandy May, Edward L. Melanson, Michael E. Miller, Matthew E. Monroe, Samuel G. Moore, Ronald J. Moore, Kerrie L. Moreau, Charles C. Mundorff, Nicolas Musi, Daniel Nachun, Venugopalan D. Nair, K. Sreekumaran Nair, Michael D. Nestor, Barbara Nicklas, Pasquale Nigro, German Nudelman, Eric A. Ortlund, Marco Pahor, Cadence Pearce, Vladislav A. Petyuk, Paul D. Piehowski, Hanna Pincas, Scott Powers, David M. Presby, Wei-Jun Qian, Shlomit Radom-Aizik, Archana Natarajan Raja, Krithika Ramachandran, Megan E. Ramaker, Irene Ramos, Tuomo Rankinen, Alexander (Sasha) Raskind, Blake B. Rasmussen, Eric Ravussin, R. Scott Rector, W. Jack Rejeski, Collyn Z-T. Richards, Stas Rirak, Jeremy M. Robbins, Jessica L. Rooney, Aliza B. Rubenstein, Frederique Ruf-Zamojski, Scott Rushing, Tyler J. Sagendorf, Mihir Samdarshi, James A. Sanford, Evan M. Savage, Irene E. Schauer, Simon Schenk, Robert S. Schwartz, Stuart C. Sealfon, Nitish Seenarine, Kevin S. Smith, Gregory R. Smith, Michael P. Snyder, Tanu Soni, Luis Gustavo Oliveira De Sousa, Lauren M. Sparks, Alec Steep, Cynthia L. Stowe, Yifei Sun, Christopher Teng, Anna Thalacker-Mercer, John Thyfault, Rob Tibshirani, Russell Tracy, Scott Trappe, Todd A. Trappe, Karan Uppal, Sindhu Vangeti, Mital Vasoya, Elena Volpi, Alexandria Vornholt, Michael P. Walkup, Martin J. Walsh, Matthew T. Wheeler, John P. Williams, Si Wu, Ashley Xia, Zhen Yan, Xuechen Yu, Chongzhi Zang, Elena Zaslavsky, Navid Zebarjadi, Tiantian Zhang, Bingqing Zhao, and Jimmy Zhen
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-024-45966-w.
References
- 1.Ruegsegger GN, Booth FW. Health benefits of exercise. Cold Spring Harbor Perspect. Med. 2018;8:a029694. doi: 10.1101/cshperspect.a029694.
- 2.Fiuza-Luces C, et al. Exercise benefits in cardiovascular disease: beyond attenuation of traditional risk factors. Nat. Rev. Cardiol. 2018;15:731–743. doi: 10.1038/s41569-018-0065-1.
- 3.Amar D, et al. Time trajectories in the transcriptomic response to exercise - a meta-analysis. Nat. Commun. 2021;12:3471. doi: 10.1038/s41467-021-23579-x.
- 4.MoTrPAC Study Group Temporal dynamics of the multi-omic response to endurance exercise training across tissues. Preprint at https://www.biorxiv.org/content/10.1101/2022.09.21.508770v2 (2022).
- 5.Barbeira AN, et al. Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome Biol. 2021;22:49. doi: 10.1186/s13059-020-02252-4.
- 6.Koch LG, Britton SL. Rat models of exercise for the study of complex disease. Methods Mol. Biol. (Clifton, N.J.) 2019;2018:309–317. doi: 10.1007/978-1-4939-9581-3_15.
- 7.Xiao K, et al. Beneficial effects of running exercise on hippocampal microglia and neuroinflammation in chronic unpredictable stress-induced depression model rats. Transl. Psychiatry. 2021;11:461. doi: 10.1038/s41398-021-01571-9.
- 8.Koch LG, et al. Intrinsic aerobic capacity sets a divide for aging and longevity. Circul. Res. 2011;109:1162–1172. doi: 10.1161/CIRCRESAHA.111.253807.
- 9.Bulik-Sullivan BK, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
- 10.Finucane HK, et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 2018;50:621–629. doi: 10.1038/s41588-018-0081-4.
- 11.Yao DW, O’Connor LJ, Price AL, Gusev A. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat. Genet. 2020;52:626–633. doi: 10.1038/s41588-020-0625-2.
- 12.Gamazon ER, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 2015;47:1091–1098. doi: 10.1038/ng.3367.
- 13.Barbeira AN, et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 2018;9:1825. doi: 10.1038/s41467-018-03621-1.
- 14.Balliu B, et al. An integrated approach to identify environmental modulators of genetic risk factors for complex traits. Am. J. Hum. Genet. 2021;108:1866–1879. doi: 10.1016/j.ajhg.2021.08.014.
- 15.Pontzer H, et al. Constrained total energy expenditure and metabolic adaptation to physical activity in adult humans. Curr. Biol. 2016;26:410–417. doi: 10.1016/j.cub.2015.12.046.
- 16.Daly RM, Bass S, Caine D, Howe W. Does training affect growth? Phys. Sportsmed. 2002;30:21–29. doi: 10.3810/psm.2002.10.488.
- 17.Borer KT. The effects of exercise on growth. Sports Med. 1995;20:375–397. doi: 10.2165/00007256-199520060-00004.
- 18.Godfrey RJ, Madgwick Z, Whyte GP. The exercise-induced growth hormone response in athletes. Sports Med. (Auckl. N.Z.) 2003;33:599–613. doi: 10.2165/00007256-200333080-00005.
- 19.Del Giacco SR, Firinu D, Bjermer L, Carlsen K-H. Exercise and asthma: an overview. Eur. Clin. Respir. J. 2015;2:27984. doi: 10.3402/ecrj.v2.27984.
- 20.Bronte V, Pittet MJ. The spleen in local and systemic regulation of immunity. Immunity. 2013;39:806–818. doi: 10.1016/j.immuni.2013.10.010.
- 21.Hallstrand TS, et al. Inflammatory basis of exercise-induced bronchoconstriction. Am. J. Respir. Crit. Care Med. 2005;172:679–686. doi: 10.1164/rccm.200412-1667OC.
- 22.Sastre B, et al. Distinctive bronchial inflammation status in athletes: basophils, a new player. Eur. J. Appl. Physiol. 2013;113:703–711. doi: 10.1007/s00421-012-2475-9.
- 23.Hayashi Y, et al. Ablation of fatty acid desaturase 2 (FADS2) exacerbates hepatic triacylglycerol and cholesterol accumulation in polyunsaturated fatty acid-depleted mice. FEBS Letters. 2021;595:1920–1932. doi: 10.1002/1873-3468.14134.
- 24.Ershov P, et al. Enzymes in the Cholesterol Synthesis Pathway: Interactomics in the Cancer Context. Biomedicines. 2021;9:895. doi: 10.3390/biomedicines9080895.
- 25.Fan Z, et al. Generation of an oxoglutarate dehydrogenase knockout rat model and the effect of a high-fat diet. RSC Adv. 2018;8:16636–16644. doi: 10.1039/c8ra00253c.
- 26.Zhao G-N, et al. Tmbim1 is a multivesicular body regulator that protects against nonalcoholic fatty liver disease in mice and monkeys by targeting the lysosomal degradation of Tlr4. Nat. Med. 2017;23:742–752. doi: 10.1038/nm.4334.
- 27.Davis RA. Cell and molecular biology of the assembly and secretion of apolipoprotein B-containing lipoproteins by the liver. Biochim. Biophys. Acta Mol. Cell Biol. Lipids. 1999;1440:1–31. doi: 10.1016/s1388-1981(99)00083-9.
- 28.Yu X-H, et al. ABCG5/ABCG8 in cholesterol excretion and atherosclerosis. Clin. Chim. Acta. 2014;428:82–88. doi: 10.1016/j.cca.2013.11.010.
- 29.Legaki E, Arsenis C, Taka S, Papadopoulos NG. DNA methylation biomarkers in asthma and rhinitis: are we there yet? Clin. Transl. Allergy. 2022;12:e12131. doi: 10.1002/clt2.12131.
- 30.Song M-K, Kim DI, Lee K. Causal relationship between humidifier disinfectant exposure and Th17-mediated airway inflammation and hyperresponsiveness. Toxicology. 2021;454:152739. doi: 10.1016/j.tox.2021.152739.
- 31.Lepeule J, et al. Gene promoter methylation is associated with lung function in the elderly: the normative aging study. Epigenetics. 2012;7:261–269. doi: 10.4161/epi.7.3.19216.
- 32.Zhu Z, et al. Shared genetic and experimental links between obesity-related traits and asthma subtypes in UK Biobank. J. Allergy Clin. Immunol. 2020;145:537–549. doi: 10.1016/j.jaci.2019.09.035.
- 33.Alvarez-Carretero S, et al. A species-level timeline of mammal evolution integrating phylogenomic data. Nature. 2022;602:263–267. doi: 10.1038/s41586-021-04341-1.
- 34.Lonsdale J, et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013;45:580–585. doi: 10.1038/ng.2653.
- 35.Oliva M, et al. The impact of sex on gene expression across human tissues. Science. 2020;369:eaba3066. doi: 10.1126/science.aba3066.
- 36.Sone K, et al. Changes of estrous cycles with aging in female F344/n rats. Exp. Anim. 2007;56:139–148. doi: 10.1538/expanim.56.139.
- 37.Landen S, et al. Genetic and epigenetic sex-specific adaptations to endurance exercise. Epigenetics. 2019;14:523–535. doi: 10.1080/15592294.2019.1603961.
- 38.Landen S, et al. Physiological and molecular sex differences in human skeletal muscle in response to exercise training. J. Physiol. 2023;601:419–434. doi: 10.1113/JP279499.
- 39.Many, G. M. et al. Sexual dimorphism and the multi-omic response to exercise training in rat subcutaneous white adipose tissue. bioRxiv: Preprint Server Biol. (2023).
- 40.Wang Z, et al. Genome-wide association analyses of physical activity and sedentary behavior provide insights into underlying mechanisms and roles in disease prevention. Nat. Genet. 2022;54:1332–1344. doi: 10.1038/s41588-022-01165-1.
- 41.Sanderson E, et al. Mendelian randomization. Nat. Rev. Methods Primers. 2022;2:1–21. doi: 10.1038/s43586-021-00092-5.
- 42.Sanford JA, et al. Molecular transducers of physical activity consortium (MoTrPAC): mapping the dynamic responses to exercise. Cell. 2020;181:1464–1474. doi: 10.1016/j.cell.2020.06.004.
- 43.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:1–21. doi: 10.1186/s13059-014-0550-8.
- 44.Ignatiadis N, Klaus B, Zaugg JB, Huber W. Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nat. Methods. 2016;13:577–580. doi: 10.1038/nmeth.3885.
- 45.Heller R, Yaacoby S, Yekutieli D. Repfdr: a tool for replicability analysis for genomewide association studies. Bioinformatics. 2014;30:2971–2972. doi: 10.1093/bioinformatics/btu434.
- 46.Heller R, Yekutieli D. Replicability analysis for genome-wide association studies. Ann. Appl. Stat. 2014;8:481–498.
- 47.Efron B. Size, power and false discovery rates. Ann. Stat. 2007;35:1351–1377.
- 48.Frankish A, et al. GENCODE 2021. Nucleic Acids Res. 2021;49:D916–D923. doi: 10.1093/nar/gkaa1087.
- 49.Smith JR, et al. The year of the rat: the rat genome database at 20: a multi-species knowledgebase and analysis platform. Nucleic Acids Res. 2020;48:D731–D742. doi: 10.1093/nar/gkz1041.
- 50.Ochoa D, et al. Open targets platform: supporting systematic drug–target identification and prioritisation. Nucleic Acids Res. 2021;49:D1302–D1310. doi: 10.1093/nar/gkaa1027.
- 51.Luraschi, J. et al. sparklyr: R Interface to Apache Spark. R package version 1.7.7, https://CRAN.R-project.org/package=sparklyr (2022).
- 52.Delignette-Muller ML, Dutang C. Fitdistrplus: an R package for fitting distributions. J. Stat. Softw. 2015;64:1–34.
- 53.Chang CC, et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
- 54.Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
- 55.Yang J, et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010;42:565–569. doi: 10.1038/ng.608.
- 56.Wheeler HE, et al. Survey of the heritability and sparse architecture of gene expression traits across human tissues. PLoS Genet. 2016;12:e1006423. doi: 10.1371/journal.pgen.1006423.
- 57.Gelman, A. et al. Bayesian Data Analysis, 3E (Chapman and Hall/CRC, 2013).
- 58.Gelman A, Hill J, Yajima M. Why we (usually) don’t have to worry about multiple comparisons. J. Res. Educat. Effect. 2012;5:189–211.
- 59.Schnabel RB, Koonatz JE, Weiss BE. A modular system of algorithms for unconstrained minimization. ACM Trans. Math. Softw. 1985;11:419–440.
- 60.Nash, J. C., Varadhan, R. & Grothendieck, G. optimx: Expanded Replacement and Extension of the ’optim’ Function. R package version 10.21, https://CRAN.R-project.org/package=optimx (2022).
- 61.Higham NJ. Computing the nearest correlation matrix—a problem from finance. IMA J. Numer. Anal. 2002;22:329–343.
- 62.Bates, D. & Maechler, M. Matrix. R package version 1.6-5, https://CRAN.R-project.org/package=Matrix (2019).
- 63.Team, S. D. Stan Modeling Language Users Guide and Reference Manual. Version 2.34, https://mc-stan.org (2023).
- 64.Gabry, J. & Češnovar, R. cmdstanr: R Interface to ’CmdStan’. R package version 0.3.0.9000, https://mc-stan.org/cmdstanr/ (2022).
- 65.Adler, D., Kelly, S. T. & Elliott, T. M. vioplot: Violin Plot. R package version 0.4.0, https://CRAN.R-project.org/package=vioplot (2021).
- 66.Bürkner, P., Gabry, J., Kay, M. & Vehtari, A. posterior: Tools for Working with Posterior Distributions. R package version 1.2.2, https://mc-stan.org/posterior/ (2022).
- 67.Korotkevich, G. et al. Fast gene set enrichment analysis. Preprint at https://www.biorxiv.org/content/10.1101/060012v3 (2021).
- 68.Wilson DJ. The harmonic mean p-value for combining dependent tests. Proc. Natl. Acad. Sci. 2019;116:1195–1200. doi: 10.1073/pnas.1814092116.
- 69.Sodini SM, Kemper KE, Wray NR, Trzaskowski M. Comparison of genotypic and phenotypic correlations: Cheverud’s conjecture in humans. Genetics. 2018;209:941–948. doi: 10.1534/genetics.117.300630.
- 70.Vetr, N., Gay, N. & Stephen, M. The impact of exercise on gene regulation in association with complex trait genetics. Version 1.0.0, https://zenodo.org/records/10211801 (2023).