Summary
Regulation of transcription and translation are mechanisms through which genetic variants affect complex traits. Expression quantitative trait locus (eQTL) studies have been more successful at identifying cis-eQTL (within 1 Mb of the transcription start site) than trans-eQTL. Here, we tested the cis component of gene expression for association with observed plasma protein levels to identify cis- and trans-acting genes that regulate protein levels. We used transcriptome prediction models from 49 Genotype-Tissue Expression (GTEx) Project tissues to predict the cis component of gene expression and tested the predicted expression of every gene in every tissue for association with the observed abundance of 3,622 plasma proteins measured in 3,301 individuals from the INTERVAL study. We tested significant results for replication in 971 individuals from the Trans-omics for Precision Medicine (TOPMed) Multi-Ethnic Study of Atherosclerosis (MESA). We found 1,168 and 1,210 cis- and trans-acting associations that replicated in TOPMed (FDR < 0.05) with a median expected true positive rate (π1) across tissues of 0.806 and 0.390, respectively. The target proteins of trans-acting genes were enriched for transcription factor binding sites and autoimmune diseases in the GWAS catalog. Furthermore, we found a higher correlation between predicted expression and protein levels of the same underlying gene (R = 0.17) than observed expression (R = 0.10, p = 7.50 × 10⁻¹¹). This indicates the cis-acting genetically regulated (heritable) component of gene expression is more consistent across tissues than total observed expression (genetics + environment) and is useful in uncovering the function of SNPs associated with complex traits.
Keywords: TWAS, plasma proteome, cis-eQTL, trans-eQTL, heritability, genetic prediction, gene expression, transcription factors, autoimmune diseases
We performed TWAS for thousands of plasma proteins, comparing same-gene, cis, and trans effects across tissues. We show that the heritable component of gene expression correlates more strongly with protein levels than total observed expression does and is therefore more useful in uncovering the functions of SNPs associated with complex traits.
Introduction
The regulation of gene expression and protein abundance are important mechanisms through which many noncoding genome-wide association study (GWAS) SNPs affect traits.1 Expression quantitative trait locus (eQTL) mapping in multiple human tissues has discovered a variety of both distal (trans) and proximal (cis) variants associated with gene expression.2,3,4 While cis-eQTLs tend to have larger effect sizes than trans-eQTLs,5,6 studies have shown that trans-eQTLs account for a larger portion of the heritability of gene expression.5,7 One study on twins found that, on average, trans-eQTLs explained over 3 times as much of the variance in gene expression as cis-eQTLs.8 Furthermore, trans-eQTLs tend to be more tissue specific, suggesting that they play a role in cell type differentiation.4,6
Despite the importance of trans-eQTLs in regulating gene expression, QTL mapping studies have been limited in their ability to detect trans-acting effects due in part to the high multiple testing burden, as well as their comparatively low effect sizes.4,9,10 Methods that minimize the multiple testing burden by prioritizing subsets of variants or grouping trans-genes have proven more successful at identifying trans-eQTLs.9,10,11,12,13 For example, one study that tested the cis-component of gene expression for association with the observed expression of distant genes identified more replicable trans-acting genes than a comparable trans-eQTL study.9 This is because many trans-eQTLs colocalize with cis-eQTLs, and they affect the expression of distant genes through cis-mediators such as a nearby transcription factor (TF) gene.4,5,10
The advent of advanced assay technologies that capture and measure protein abundances has enabled protein quantitative trait locus (pQTL) mapping studies to identify variants associated with the abundance of proteins.14,15 Trans-pQTLs, like eQTLs, tend to have lower effect sizes and be tissue specific.16 Mirroring methods for detecting trans-eQTLs, we hypothesized that cis-prioritization will improve detection of trans-pQTLs.
Here, we applied a transcriptome-wide association study (TWAS) framework to proteomic data, testing the genetically predicted expression of genes for association with the observed abundance of plasma proteins.17 We show that TWAS for protein levels is an effective method for identifying replicable trans-acting associations between predicted transcripts and proteins. We also found a high expected proportion of true positives for associations between the predicted transcripts and protein products of the same underlying gene. Furthermore, using RNA-sequencing data, we show that predicted gene expression better correlates with protein levels than observed gene expression.
Material and methods
This study was approved by the Loyola University Chicago institutional review board (IRB) project #2014. Appropriate informed consent was obtained from human subjects.
Genome and proteome data
Our discovery dataset was from the INTERVAL study, which was conducted on around 50,000 blood donors with European ancestry across England.18 Here, we used data from the 3,301 individuals who had both a genotyping microarray performed (EGA: EGAD00010001544) and a targeted proteome assay run to measure their plasma proteome levels (EGA: EGAD00001004080).14 Data generation and quality control have previously been described by the INTERVAL study.14,18 Briefly, an Affymetrix Axiom UK Biobank array was used for genotyping, and imputation was performed on the Sanger imputation server using a combined 1000 Genomes phase 3-UK10K reference panel.14,19 Genotypes were then filtered for minor allele frequency (MAF) > 0.01 and imputation R² > 0.8.19 The SOMAscan assay used to collect the proteomic data targeted 3,622 plasma proteins.20 The protein levels were log transformed and adjusted for age, sex, duration between blood draw and processing, and the first three genetic principal components (PCs).14
Our replication dataset was from the Trans-omics for Precision Medicine (TOPMed) Multi-Ethnic Study of Atherosclerosis (MESA) multi-omics pilot study. The TOPMed program is a research consortium that aims to improve personalized disease treatments through the study of genetics and other omics traits’ effects on disease traits and drug responses.21 MESA is a community-based cohort study designed to determine the prevalence, determinants, and progression of subclinical cardiovascular disease.22 MESA recruited men and women aged 45–84 free of clinical cardiovascular disease at baseline from six different locations in the United States and from four major race/ethnicity groups, which included African American (AFA), Chinese (CHN), European (EUR), and Hispanic/Latino (HIS).22 Individuals were genotyped as part of the MESA SHARe study (dbGaP: phs000420.v6.p3).22 As previously described, an Affymetrix 6.0 array was used for genotyping, and imputation was performed on the Michigan imputation server using the 1000 Genomes phase 3 v5 reference panel.19,23 Imputed genotypes were then filtered for MAF >0.01 and imputation R² > 0.8.19 Additionally, individuals taking part in the MESA multi-omics pilot study had their plasma proteome measured with a SOMAscan HTS Assay that targeted 1,300 plasma proteins, 1,039 of which overlapped with the proteins tested in the INTERVAL study.15 Protein levels were measured at two time points, exam 1 (2000–2002) and exam 5 (2010–2012). We log transformed each time point and then adjusted for age and sex. We then took the mean of the two time points (if a participant was measured at only one time point, we used that single measurement), performed rank inverse normalization, and adjusted for the first ten genotypic PCs.19 In total, our replication cohort included 971 individuals with genotypes and plasma protein level measurements (AFA, n = 183; CHN, n = 71; EUR, n = 416; HIS, n = 301).
Transcriptome data
For our analysis comparing the genetically regulated transcriptome to the observed transcriptome, we used transcriptomic data from individuals in the MESA multi-omics pilot study. RNA sequencing was performed for individuals from all four populations (AFA, CHN, EUR, and HIS) in three different blood cell types: peripheral blood mononuclear cells (PBMC), CD14+ monocytes, and CD4+ T cells.22 In total, 395 monocyte samples and 397 T cell samples were sequenced at one time point, exam 5, and 1,287 PBMC samples were sequenced over two time points, exam 1 and exam 5. Genes with average transcripts per million (TPM) values <0.1 were filtered out, leaving 18,193 genes with expression measurements in PBMC, monocytes, and T cells. After log transforming each TPM value and adjusting for age and sex as covariates using linear regression and extracting the residuals, we took the mean of the two time points (or the single adjusted log-transformed value if expression levels were only measured once), performed rank-based inverse normal transformation, and adjusted for the first 10 genotype and 10 expression PCs, as described previously.24
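The two middle steps of this normalization (averaging the available time points, then applying a rank-based inverse normal transformation) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, the rank offset `c = 0.5` is an assumption because the text does not state one, tied values are not averaged, and the covariate and PC adjustments are omitted.

```python
from statistics import NormalDist, fmean


def mean_of_timepoints(exam1, exam5):
    """Average two exams per individual; use the single value if one is missing (None)."""
    averaged = []
    for a, b in zip(exam1, exam5):
        available = [v for v in (a, b) if v is not None]
        averaged.append(fmean(available) if available else None)
    return averaged


def rank_inverse_normal(values, c=0.5):
    """Map values to normal quantiles by rank (offset c is an assumed choice)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    standard_normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        # Convert the rank to a probability strictly between 0 and 1.
        prob = (rank - c) / (n - 2 * c + 1)
        transformed[i] = standard_normal.inv_cdf(prob)
    return transformed
```

The transformation keeps the ordering of individuals but forces the values onto a standard normal scale, which limits the influence of outlying measurements on the later linear regressions.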
TWAS for protein levels
We performed TWAS with the software tool, PrediXcan, which leverages eQTL weights to predict genetically regulated expression (GReX) and performs a linear association analysis to correlate GReX with a measured trait.17 We used gene expression prediction models from PredictDB, which were built using the Genotype-Tissue Expression (GTEx) Project’s version 8 release, to impute GReX in 49 different human tissues.6,17,25,26 The models were built using multi-variate adaptive shrinkage in R (MASHR)27 and only include cis-eQTLs with MAF >0.01.28 The number of genes included in each tissue’s gene expression prediction model can be found in Table S1. The GTEx models collapse alternative transcripts into gene-level prediction models, meaning what we refer to as the predicted transcript levels for any one gene may include multiple different mRNA products. In each tissue, we tested genetically predicted transcript levels for association with the observed protein levels of all 3,622 plasma proteins measured in the INTERVAL study. We assessed significance via the Benjamini-Hochberg false discovery rate (FDR) method. Within each of the 49 tissues for which we predicted expression, we used the Qvalue R package to calculate q values for all predicted transcript-protein association tests conducted.29 Transcript-protein pairs with a q value (FDR) <0.05 were considered statistically significant.
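As an illustration of the two steps described above, the sketch below predicts GReX as a weighted sum of allele dosages and then regresses an observed protein level on it. It is not PrediXcan code: the function names, SNP identifiers, and weights are hypothetical, SNPs missing for an individual are treated as dosage 0 (an assumption), and only the t statistic is returned because a p value would additionally require the t distribution with n − 2 degrees of freedom.

```python
from math import sqrt
from statistics import fmean


def predict_grex(dosages, weights):
    """Predicted expression per individual: sum of weight * dosage over model SNPs.

    dosages: list of {snp_id: dosage} dicts, one per individual.
    weights: {snp_id: weight} taken from a prediction model.
    """
    return [
        sum(w * person.get(snp, 0.0) for snp, w in weights.items())
        for person in dosages
    ]


def linear_association(x, y):
    """Slope, Pearson r, and t statistic for a simple regression of y on x."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy)
    # Undefined when |r| == 1 exactly; real data should not hit this case.
    t = r * sqrt((n - 2) / (1 - r * r))
    return slope, r, t
```

In a full analysis this test would be repeated for every gene in every tissue against every measured protein, which is what creates the multiple testing burden addressed next.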
For every transcript-protein pair that we found significant (FDR < 0.05) in INTERVAL, we tested the association for replication using genotypes and protein levels from TOPMed MESA if the protein was measured in both studies. We assessed significance of replicating pairs via the Benjamini-Hochberg FDR method. Within each of the 49 tissues for which we predicted expression, we used the Qvalue R package to calculate q values for all predicted transcript-protein association tests conducted in TOPMed.29 Transcript-protein pairs with a q value (FDR) <0.05 were considered statistically significant.
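For readers unfamiliar with FDR control, the following is a minimal sketch of Benjamini-Hochberg adjusted p values applied to the tests from one tissue. It is not the Qvalue package: Storey q values additionally scale these values by an estimate of π0, which this sketch leaves out (equivalent to assuming π0 = 1). The function name is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p value down so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # rank of this p value when sorted in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A transcript-protein pair would then be called significant when its adjusted value falls below 0.05.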
Calculating the proportion of true positives
The π0 statistic is the estimated proportion of true null hypotheses in a distribution of p values, assuming a uniform distribution of null p values.29 The q value function from the Qvalue R package calculates the π0 statistic from a vector of p values.29 Likewise, the π1 statistic estimates the proportion of true positives given a distribution of p values and is derived from π0 as follows:29
π1 = 1 − π0
We divided the associations we tested in INTERVAL into four categories based on the genomic proximity of the predicted transcript and the target protein: cis-acting, cis-same, cis-different, and trans-acting. We defined cis-acting relationships as those where the transcription start site of the gene that encodes the predicted transcript was within 1 Mb of the transcription start site of the gene that encodes the target protein. Likewise, trans-acting transcript-protein pairs were greater than 1 Mb away from each other or on different chromosomes. We further divided cis-acting relationships into cis-same, where the gene that encodes the predicted transcript was the same as the gene that encodes the target protein, and cis-different, where the predicted transcript and target protein are encoded by different but nearby genes.
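The category definitions above translate directly into a small classifier. The sketch below is illustrative only; the function name, gene identifiers, and the `tss` lookup structure are hypothetical.

```python
def classify_pair(transcript_gene, protein_gene, tss, window=1_000_000):
    """Label a transcript-protein pair as cis-same, cis-different, or trans-acting.

    tss: {gene_id: (chromosome, transcription_start_site_position)}.
    """
    if transcript_gene == protein_gene:
        return "cis-same"
    chrom_t, pos_t = tss[transcript_gene]
    chrom_p, pos_p = tss[protein_gene]
    # Cis requires the same chromosome and start sites within the window.
    if chrom_t == chrom_p and abs(pos_t - pos_p) <= window:
        return "cis-different"
    return "trans-acting"
```

The cis-acting category is simply the union of the pairs labeled cis-same and cis-different.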
For each of these groups in every tissue, we pulled the p values for every tested association and calculated the π0 statistic using the Qvalue R package. While we used the default q value function parameters in INTERVAL, we adjusted the q value parameters when replicating in TOPMed MESA. Because we only tested pairs that we already found significant in INTERVAL, most of the cis-same associations tested in TOPMed MESA returned significant p values, thus the p value distribution in most tissues did not extend all the way to 1. By default, the q value function calculates the average frequency of p values from 0.05 to 1.0 to determine the expected proportion of null p values, so there must be p values throughout this entire range for the function to work. These bounds are controlled by the lambda parameter, which we set from 0.05 to 0.75 instead of the default 0.05 to 1.0 when calculating π0 in TOPMed MESA. With the estimated π0 statistic, we calculated the π1 value for every cis/trans group in every tissue.
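To make the role of the lambda parameter concrete, the sketch below computes the Storey-style estimate π0(λ) = #{p > λ} / (m(1 − λ)) over a grid of λ values and derives π1 = 1 − π0. It is a simplified stand-in for the Qvalue package, which smooths the estimates across the grid; here the estimate at the largest λ is used directly, and all function names are hypothetical.

```python
def lambda_grid(start=0.05, stop=0.75, step=0.05):
    """Evenly spaced lambda values from start to stop, inclusive."""
    n_steps = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


def pi0_by_lambda(pvalues, lambdas):
    """pi0(lambda) for each lambda, capped at 1."""
    m = len(pvalues)
    estimates = []
    for lam in lambdas:
        n_above = sum(1 for p in pvalues if p > lam)
        estimates.append(min(1.0, n_above / (m * (1.0 - lam))))
    return estimates


def estimate_pi1(pvalues, lambdas):
    """pi1 = 1 - pi0, taking pi0 at the largest lambda (no smoothing in this sketch)."""
    return 1.0 - pi0_by_lambda(pvalues, lambdas)[-1]
```

When few or no p values lie above the upper end of the grid, the counts at large λ become sparse and the estimate unstable, which is the motivation for lowering the upper bound in the replication cohort.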
When mapping trans effects, there is danger of false negatives when adjusting for potential confounders via PEER factor or PC correction.30,31 However, failure to remove confounding factors could result in false positive trans associations. In a sensitivity analysis, we compared TWAS results without protein PC adjustment to TWAS results also adjusting the INTERVAL protein matrix by 5–40 PCs, which could control for unknown confounders. We observed consistent π1 statistics across the protein PCs tested and observed consistent counts of significant transcript-protein pair hits (FDR <0.05) for cis-same and cis-different mechanisms with some variability and no clear trend in the trans results (Figure S1). Therefore, we kept the non-protein PC-adjusted results for downstream analyses.
Gene set enrichment analysis of target proteins
We used the web tool, Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA GWAS) to perform a gene set enrichment analysis of all of the protein targets that replicated in TOPMed MESA.32 We tested the targets involved in cis-acting and trans-acting associations separately. For both groups, we tested the target proteins for enrichment (FDR <0.05) of GWAS catalog associations33 and motifs that are known targets of TFs annotated in the Molecular Signatures Database.34,35
Identifying pleiotropic regulatory loci
We defined a pleiotropic regulatory gene as one that is significantly associated with the abundance of more than 50 unique protein targets in INTERVAL. We counted the number of significant target proteins for each gene (FDR <0.05) across all 49 tissues in INTERVAL to identify pleiotropic regulatory genes. We grouped pleiotropic regulatory genes whose transcription start sites were within 200 kb of each other into pleiotropic regulatory loci. For each pleiotropic regulatory locus, we quantified the number of unique protein targets of the genes within that locus along with the number of these targets that were tested in TOPMed MESA and the number of these targets with replicated associations with any of the genes in that locus in TOPMed MESA.
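The definition above can be sketched in two steps: count unique protein targets per gene, then merge nearby pleiotropic genes into loci. This is a hypothetical illustration; in particular, it assumes that genes are chained together whenever neighboring transcription start sites on the same chromosome are within 200 kb, a grouping rule the text does not spell out.

```python
from collections import defaultdict


def pleiotropic_genes(pairs, min_targets=50):
    """Genes with more than min_targets unique protein targets.

    pairs: iterable of (gene, protein) significant pairs pooled across tissues.
    """
    targets = defaultdict(set)
    for gene, protein in pairs:
        targets[gene].add(protein)
    return {g: t for g, t in targets.items() if len(t) > min_targets}


def group_into_loci(genes, tss, max_gap=200_000):
    """Chain genes on the same chromosome whose neighboring TSSs are within max_gap."""
    by_chrom = defaultdict(list)
    for gene in genes:
        chrom, pos = tss[gene]
        by_chrom[chrom].append((pos, gene))
    loci = []
    for chrom in sorted(by_chrom):
        current, last_pos = [], None
        for pos, gene in sorted(by_chrom[chrom]):
            # Start a new locus when the gap to the previous gene is too large.
            if current and pos - last_pos > max_gap:
                loci.append(current)
                current = []
            current.append(gene)
            last_pos = pos
        loci.append(current)
    return loci
```

The unique targets of a locus are then the union of the target sets of its member genes.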
Gene set enrichment analysis of pleiotropic regulators
We used FUMA GWAS to perform a gene set enrichment analysis of the pleiotropic regulatory genes as well as their protein targets.32 We tested the protein targets of each pleiotropic regulatory locus that we discovered in INTERVAL for enrichment (FDR <0.05) of GWAS catalog associations and TF target motifs using all proteins measured in INTERVAL as background. For the pleiotropic regulatory loci with more than one gene, we tested the pleiotropic regulatory genes at that locus for enrichment of GWAS catalog associations and TF target motifs using the union of all genes in each tissue prediction model as background (22,133 genes total).
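Gene set enrichment against a defined background is commonly assessed with a one-sided hypergeometric test, and the sketch below shows that calculation. It is an illustration under that assumption rather than a reproduction of FUMA's implementation; the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, in_set, selected, overlap):
    """P(X >= overlap) when drawing `selected` genes from `background`,
    of which `in_set` belong to the gene set."""
    upper = min(in_set, selected)
    total = comb(background, selected)
    tail = sum(
        comb(in_set, i) * comb(background - in_set, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

The choice of background matters: restricting it to the proteins or genes that were actually tested, as described above, avoids counting the assay's own composition as enrichment.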
Cis-same observed expression association analysis
We performed a linear regression analysis to test observed expression levels for association with observed protein levels. RNA-sequencing data are not available in INTERVAL, but they are available in TOPMed MESA for three blood cell types (PBMC, monocytes, and T cells). In each of these tissues, we leveraged PrediXcan’s linear regression association script to test the observed gene expression of each gene measured in TOPMed MESA for association with the observed abundance of the protein product of that gene if it was also measured in TOPMed MESA. We compared these results to the association of the predicted gene expression of each gene with the observed abundance of the protein product of that gene. The numbers of genes that we tested for cis-same associations in each tissue are listed in Table S2.
As above, we assessed significance via the Benjamini-Hochberg FDR method. Within each of the 49 tissues with gene expression prediction models, as well as the 3 tissues with observed gene expression data, we calculated q values for all the cis-same transcript-protein pairs tested using the Qvalue R package. We further calculated the π1 statistic for the cis-same associations tested in every predicted and observed tissue using the Qvalue R package with a truncated lambda range (0.05–0.75), as described above for TOPMed MESA.
Finally, in every predicted and observed tissue, we calculated the Pearson correlation of gene expression with protein abundance for every gene with a significant cis-same association in any tissue. Because some genes were not included in every prediction model and a different set of genes was measured via RNA sequencing, we were not able to calculate the Pearson correlation of expression and protein levels for every gene in every tissue. To summarize results across tissues, we calculated the maximum correlation values between gene expression and protein levels for every gene across all the predicted tissues and across all the observed tissues.
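The per-gene summary described above can be sketched as follows: compute the Pearson correlation in each tissue where the gene is available, then keep the maximum. The data layout and function names are hypothetical, and the sketch assumes that expression and protein vectors are already aligned across the same individuals.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def max_correlation_by_gene(expression, protein):
    """Maximum expression-protein correlation per gene across tissues.

    expression: {tissue: {gene: [values]}}; protein: {gene: [values]}.
    Genes absent from a tissue are skipped for that tissue.
    """
    best = {}
    for genes in expression.values():
        for gene, values in genes.items():
            if gene not in protein:
                continue
            r = pearson(values, protein[gene])
            if gene not in best or r > best[gene]:
                best[gene] = r
    return best
```

Running this once on the predicted tissues and once on the observed tissues yields the two sets of maxima that are compared gene by gene.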
Results
TWAS for proteins identifies replicable gene-protein associations
We sought to identify both cis- and trans-acting transcriptional regulators of plasma proteins by performing TWAS for protein levels. Using the PrediXcan software framework,17 we tested the genetically regulated component of gene expression (GReX) for association with plasma protein levels. Our discovery set included individuals from the INTERVAL cohort (n = 3,301), and we sought to replicate our findings in the TOPMed MESA cohort (n = 971). For these individuals, we predicted gene expression using previously built prediction models in 49 tissues from the GTEx project (Figure 1A). Then, we calculated the correlation between predicted gene expression and observed protein levels for all 3,622 proteins measured in INTERVAL. We quantified significant transcript-protein pairs as cis- (within 1 Mb of each other) or trans-acting (greater than 1 Mb apart) relationships. We further divided the cis-acting pairs into cis-same, where a transcript is associated with the protein that it encodes, and cis-different, where a transcript is associated with the protein product of a nearby, different gene (Figure 1B).
We identified 3,699 significant (FDR <0.05) unique cis-acting associations for 482 unique proteins (240 cis-different and 242 cis-same) and 13,598 significant (FDR <0.05) unique trans-acting associations for 2,016 unique proteins in INTERVAL (Figures 2A–2D). The TOPMed MESA plasma proteome data included 1,039 proteins that were also measured in INTERVAL. Of the 17,297 significant transcript-protein pairs we discovered in INTERVAL, we tested 8,111 pairs for replication in TOPMed MESA and found 1,168 cis-acting pairs replicated (FDR <0.05) for 218 unique proteins (92 cis-different and 126 cis-same) and 1,210 trans-acting pairs replicated for 239 proteins (FDR <0.05, Figures 2B–2D). On average, the significant cis-acting relationships we discovered in INTERVAL were shared across more tissues than the significant trans-acting relationships we discovered in INTERVAL (Figure 3).
Of the transcript-protein pairs tested in INTERVAL, the trans-acting results had the lowest expected true positive rate (π1) with a median π1 of 0.004 across all 49 tissues, followed by the cis-different results with a median π1 of 0.099, and the cis-same results with a median π1 of 0.278 (Figure 4; Table S3). We have more confidence in the significant results from INTERVAL that were also tested in TOPMed. The median π1 value across tissues increased to 0.390 for trans-acting relationships, 0.783 for cis-different pairs, and 0.888 for cis-same pairs (Figure 4; Table S4). Within mechanism categories, π1 values were largely uncorrelated with the number of transcript-protein pairs tested in each tissue; only cis-same π1 values in INTERVAL correlated with number of tests (Figure S2).
Protein targets of trans-acting genes enriched for transcription factor target motifs and GWAS catalog phenotypes
We first tested the protein targets that replicated in TOPMed MESA, divided into targets of cis-acting genes and targets of trans-acting genes, for enrichment of motifs targeted by TFs. While the cis-targets were not enriched for TF targets, the trans-targets were enriched for motifs targeted by the TFs NFKB2, RELA, NFAT1C, FOXF2, AR, GATA1, and STAT1 (Figure 5; Table S5).
Furthermore, we tested the cis- and trans-targets for enrichment of GWAS catalog associations and found that the trans-targets were enriched for blood protein levels and inflammatory bowel disease, and the cis-targets were enriched for blood protein levels, ankylosing spondylitis, inflammatory bowel disease, and chronic inflammatory diseases (Table S6).
Pleiotropic regulatory regions enriched for TF target motifs and GWAS catalog phenotypes
By quantifying the number of target proteins that each transcript was significantly associated with, we identified several loci that may be involved in the regulation of many different proteins throughout the genome, which we have named “pleiotropic regulatory” loci. Here, we defined a pleiotropic gene as one with more than 50 unique protein targets in INTERVAL. We grouped pleiotropic genes whose transcription start sites are within 200 kb of each other into pleiotropic loci. These loci are represented through the vertical lines of dots in Figures 2A–2C. We discovered 11 distinct pleiotropic regulatory loci in INTERVAL (Table S7). While most of the loci did not have many targets that replicated in TOPMed MESA, there were a few that replicated well, including the C7 locus on chromosome 5, the SKIV2L locus on chromosome 6, the ABO locus on chromosome 9, and the SARM1 locus on chromosome 17 (Table S7). Only one of the 218 tested targets of the largest pleiotropic regulatory locus discovered in INTERVAL, the MYADM locus on chromosome 19, replicated in TOPMed MESA (Table S7).
We performed a gene set enrichment analysis of the protein targets in INTERVAL of each of these pleiotropic regulatory loci. For most of the loci, we found no significant enrichment of TF targets or GWAS catalog associations in the target proteins. However, we found that the target proteins of the ABO locus were enriched (FDR <0.05) for associations with blood protein levels in the GWAS catalog. Furthermore, we found that the target proteins of the C7 locus were enriched (p value: 6.58e-5; adjusted p: 4.02e-2) for a motif (MSigDB: M18461) that is targeted by the TF ARNT. Of the 271 genes in the gene set, we tested 42 in our TWAS, and 13 were targets of the C7 locus. While ARNT had gene expression prediction models in many tissues, it was not significantly associated with any of the targets of the C7 locus in our TWAS analysis.
Additionally, we performed a gene set enrichment analysis of the pleiotropic regulatory genes involved in each locus that comprised more than one gene. Four of five loci tested were enriched for some GWAS catalog associations (Table S8). The HLA locus was enriched for 52 GWAS catalog associations, including a wide variety of immune-related diseases and conditions like neuromyelitis, lymphoma, pneumonia, and more. Only one locus was enriched (FDR <0.05) for TF targets: the SARM1 locus on chromosome 17 was enriched for a motif (MSigDB: M826) targeted by the TF SREBF1. Of the 174 genes in this gene set, we tested 153 in our TWAS, and three were pleiotropic regulators at this locus: POLDIP2, TMEM199, and SUPT6H. While SREBF1 had prediction models in many tissues, it was not significantly associated with any target proteins in our TWAS analysis.
Predicted gene expression correlates better with protein levels than observed gene expression
We used the RNA-sequencing data from TOPMed MESA to test how the correlation of observed gene expression with observed protein abundance compared to that of predicted gene expression with observed protein abundance. For each of the three tissues with observed gene expression data (PBMC, monocytes, and T cells), we tested the abundance of all 1,300 proteins measured for association with the observed expression of the genes that encode the proteins (cis-same gene-protein relationship). We compared these observed expression results to cis-same TWAS results using the GTEx prediction models. We discovered more genes with significant associations between predicted expression and observed protein levels (FDR <0.05) than genes with significant associations between observed gene expression and observed protein levels (FDR <0.05). In total, we discovered 407 genes with a significant cis-same association across all 49 predicted tissues and 121 genes with a significant cis-same association across all three measured tissues. We found a significant cis-same association with both predicted and observed expression for 89 genes, while the rest were unique associations (Figure 6A).
Furthermore, the proportion of true positive cis-same associations (π1) was on average higher across predicted tissues than observed tissues (Figure 6B). The observed tissue with the highest π1 value was PBMC at 0.239, followed by monocytes at 0.193, and T cells at 0.077. Likewise, all but one predicted tissue had a higher π1 than the observed tissues (Table S9). Notably, whole blood, the closest predicted tissue to the observed tissues, had a higher π1 than all three of the observed tissues at 0.331.
Finally, we wanted to see if the correlation of predicted expression and protein abundance was stronger than the correlation of observed gene expression and protein abundance. For the union of genes whose expression, predicted or observed, was significantly (FDR <0.05) associated with protein abundance, we calculated the Pearson correlation of expression and protein levels in every tissue where there was a measurement for both traits. When looking at the maximum correlation values across the predicted and observed tissues separately, we found that GReX on average had a stronger correlation with protein abundance than observed gene expression for significant cis-same genes (Figures 6C and 6D). We found that predicted tissues closely related to blood plasma, such as whole blood and liver, ranked high in terms of median correlation of expression levels and protein levels by gene, while most of the brain tissues had the lowest median correlation of expression levels and protein levels (Figure 7). While median correlation was significantly associated with the number of cis-same genes tested (R² = 0.27, p = 0.00011), we note that whole blood and liver both had higher correlations than expected given the number of genes tested (Figure S3).
Discussion
Here, we applied the TWAS framework to test genetically regulated gene expression for association with measured plasma protein levels in order to discover gene regulatory relationships between both distant (trans-acting) and nearby (cis-acting) genes. Similar to a prior study, which applied trans-PrediXcan to test genetically regulated gene expression for association with observed expression levels, our approach proved more effective at identifying trans-acting effects than a typical QTL study.9 Compared to a trans-pQTL study performed in our discovery cohort (INTERVAL), which found 1,104 proteins with trans-pQTLs (p < 1.5 × 10⁻¹¹),14 our method discovered 2,016 protein targets of trans-acting genes, 239 of which replicated in the much smaller TOPMed MESA cohort. Methods like TWAS, which prioritize cis-eQTL, have been shown to be more effective at discovering trans-acting effects because often trans-eQTL act through cis-mediators like nearby TF genes.10 We found that the protein targets of trans-acting genes were enriched for TF binding sites, while the cis-targets were not, supporting the idea that many trans-effects are driven by TF genes. Furthermore, we found that the cis-acting associations were shared across more tissues than the trans-acting effects, which tended to be more tissue specific, as has been shown in previous eQTL studies.3,7
We identified several loci throughout the genome with strong pleiotropic effects where one gene, or several in linkage disequilibrium, significantly (FDR <0.05) associated with many protein targets throughout the genome. Many of these loci have been identified before, including the ABO, VTN, APOE, CFH, and BCHE loci.14,36,37,38,39 Here, we called these regions pleiotropic regulatory loci and discovered 11 in INTERVAL and 5 that replicated in TOPMed MESA. It has been shown previously that these trans-acting pleiotropic regulator genes are enriched for GWAS traits, suggesting that trans-protein regulation plays an important role in disease variation.9,38 We performed a gene set enrichment analysis of all of the trans-acting genes in each of these pleiotropic regulatory loci as well as the target proteins of each of these pleiotropic regulatory loci. We found that the targets and pleiotropic regulatory genes of many of these loci were enriched for GWAS catalog associations including several autoimmune diseases and other disease phenotypes. Autoimmune disease enrichment is somewhat expected given the proteins in our TWAS were measured in blood plasma. For example, the genes at the CFHR locus were enriched for autoimmune diseases such as IgA nephropathy and age-related macular degeneration, as well as C3 and C4 levels. CFHR genes interact with proteins like C3 and C4 in the complement system, a cascade of proteins important to the immune response system, thus changes in expression of these pleiotropic regulatory genes could lead to the progression of autoimmune diseases.40
We found that our significant results discovered in INTERVAL had a low expected proportion of true positives (π1) across all associations tested, though we have more confidence in the cis-acting results than trans-acting. This is a symptom of an ongoing issue with identifying trans-acting effects; the multiple testing burden is too high due to the high number of associations that must be tested combined with the observation that trans-acting effects are generally smaller than cis-acting effects.3,6,41,42 Nevertheless, we replicated many of our significant associations discovered in INTERVAL in TOPMed MESA, where we found much higher proportions of true positives across all associations tested. In many tissues, we estimated a π1 of nearly 1.0 for the cis-same results, indicating a strong correlation between genetically regulated gene expression levels and observed protein levels. This is in contrast with many studies that have shown a poor correlation between transcript and protein levels of the same underlying gene.43,44,45,46 One of the main issues in correlating expression levels with protein levels is the high fluctuation in these traits due to environmental influence; it has been shown that proteins that can be more reproducibly measured, meaning they are less prone to environmental variation, have a stronger correlation with expression levels.47 Furthermore, genetically predicted expression levels have been shown to strongly correlate with genetically predicted protein levels.48
Here, we show that genetically predicted expression levels correlate better with plasma protein abundance than observed expression levels. This indicates the genetically regulated (heritable) component of gene expression and protein abundance is more consistent across tissues than the non-genetic, i.e., environmental, components. We leveraged the TWAS framework to test both predicted expression in 49 tissues and observed expression in three tissues for association with plasma protein levels in individuals from the TOPMed MESA cohort. Most of the unique associations we discovered with observed expression were also significant when using predicted expression, and we found many unique associations with predicted expression that we did not find with observed expression. Furthermore, we estimated a higher proportion of true positives for our predicted expression results. Even in a tissue-matched scenario (comparing predicted expression in whole blood to observed expression in PBMC), we found a higher proportion of true positive results for predicted expression. Additionally, we found that the Pearson correlation of expression levels with protein levels of the same underlying gene was on average higher when working with predicted expression than with observed expression. We found that tissues that are closely related to blood, like whole blood and liver, which is responsible for secreting many plasma proteins into the bloodstream, had a higher correlation of predicted expression levels and protein levels, which has been shown previously in another cohort.48 Furthermore, the brain tissues tended to have the lowest correlation of expression levels and plasma protein levels, perhaps because of the blood-brain barrier, as has been suggested previously.48
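The comparison of correlations described above can be sketched with a small toy example. The Python code below computes the Pearson correlation of a protein level with (1) a purely genetic expression component and (2) the same component plus environmental noise. All values are constructed for illustration and are not data from this study; the function name `pearson_r` and the variable names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    return cross / math.sqrt(ss_x * ss_y)


# Toy individuals: genetic component of expression (e.g., driven by genotype).
genetic = [0, 1, 2, 0, 1, 2]
# Environmental contribution that is unrelated to the protein level.
environment = [3, -3, 0, -3, 3, 0]
# "Observed" expression is genetics plus environment.
observed = [g + e for g, e in zip(genetic, environment)]
# In this toy setup the protein level tracks only the genetic component.
protein = [2 * g + 1 for g in genetic]

r_predicted = pearson_r(genetic, protein)
r_observed = pearson_r(observed, protein)
```

Because the environmental term adds variance to observed expression without adding covariance with the protein, `r_observed` is lower than `r_predicted` in this construction. This illustrates one mechanism consistent with the results above, under the strong simplifying assumption that the environmental influences on expression and on protein abundance are unrelated.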
A limitation of the study is that our discovery cohort is not ancestrally diverse, consisting entirely of individuals of European descent, while our replication cohort, which is diverse, has a small sample size. Another limitation of this study is the type of proteomic data we used. Our study was not truly proteome wide, as we could only test the proteins measured by the targeted proteome assay. As such, there are likely many regulatory relationships that we were not able to capture due to the limited number of proteins measured in both the INTERVAL and TOPMed studies. Furthermore, we only have proteomic data for plasma proteins, whereas protein levels, like gene expression levels, vary across tissues and cell types. Additionally, the aptamers on the SOMAscan assays used to target specific proteins are known to sometimes have multiple targets, so some of our protein level measurements may represent the abundance of multiple different proteins.19 All protein assays that rely on binding could be affected by protein-altering variants in the aptamer binding site. However, integrating proteomic data with RNA-sequencing transcriptome data alleviates some of these concerns. We note that just 120 of the 3,339 (3.6%) INTERVAL proteins and zero of the 1,335 (0%) TOPMed MESA proteins had protein-altering variants, defined by the Ensembl Variant Effect Predictor,49 in their respective GTEx whole blood transcript prediction models.
Our results highlight the benefits of working with predicted expression over observed expression. First, it is easier to calculate predicted expression than it is to measure observed expression since many more studies have genome-wide genotypes than gene expression data. Also, using the cis-acting genetically regulated (heritable) component of gene expression to discover trans-acting gene effects on protein abundance finds more significant associations than traditional SNP-based pQTL studies. Most importantly, because this heritable component of gene expression more strongly correlates with protein levels than total observed expression, predicted expression is useful in uncovering the function of SNPs associated with complex traits.
Data and code availability
Full summary statistics for all association analyses performed and code for presented results are available at https://github.com/hwittich/TWAS_for_protein. Data from INTERVAL are under controlled access via the European Genome-phenome Archive at https://ega-archive.org/ for both genotypes (EGA: EGAD00010001544) and blood plasma aptamer levels as measured by a SOMAscan assay (EGA: EGAD00001004080). TOPMed MESA data are under controlled access in dbGaP at https://www.ncbi.nlm.nih.gov/gap/. Genotypes are available through accession dbGaP: phs000420.v6.p3, and RNA-sequencing and proteome data are available through accession dbGaP: phs001416.v2.p1.
Acknowledgments
This work is supported by the NIH National Human Genome Research Institute Academic Research Enhancement Award R15 HG009569 (HEW). INTERVAL and TOPMed MESA grants are detailed and acknowledged in the supplemental information.
Declaration of interests
The authors declare no competing interests.
Published: February 5, 2024
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2024.01.006.
References
- 1. Hormozdiari F., Gazal S., van de Geijn B., Finucane H.K., Ju C.J.-T., Loh P.-R., Schoech A., Reshef Y., Liu X., O’Connor L., et al. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat. Genet. 2018;50:1041–1047. doi: 10.1038/s41588-018-0148-2.
- 2. Strunz T., Grassmann F., Gayán J., Nahkuri S., Souza-Costa D., Maugeais C., Fauser S., Nogoceke E., Weber B.H.F. A mega-analysis of expression quantitative trait loci (eQTL) provides insight into the regulatory architecture of gene expression variation in liver. Sci. Rep. 2018;8:5865. doi: 10.1038/s41598-018-24219-z.
- 3. Liu X., Finucane H.K., Gusev A., Bhatia G., Gazal S., O’Connor L., Bulik-Sullivan B., Wright F.A., Sullivan P.F., Neale B.M., Price A.L. Functional Architectures of Local and Distal Regulation of Gene Expression in Multiple Human Tissues. Am. J. Hum. Genet. 2017;100:605–616. doi: 10.1016/j.ajhg.2017.03.002.
- 4. Võsa U., Claringbould A., Westra H.-J., Bonder M.J., Deelen P., Zeng B., Kirsten H., Saha A., Kreuzhuber R., Yazar S., et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 2021;53:1300–1310. doi: 10.1038/s41588-021-00913-z.
- 5. Liu X., Li Y.I., Pritchard J.K. Trans Effects on Gene Expression Can Drive Omnigenic Inheritance. Cell. 2019;177:1022–1034.e6. doi: 10.1016/j.cell.2019.04.014.
- 6. GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330. doi: 10.1126/science.aaz1776.
- 7. Grundberg E., Small K.S., Hedman Å.K., Nica A.C., Buil A., Keildson S., Bell J.T., Yang T.-P., Meduri E., Barrett A., et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 2012;44:1084–1089. doi: 10.1038/ng.2394.
- 8. Ouwens K.G., Jansen R., Nivard M.G., van Dongen J., Frieser M.J., Hottenga J.-J., Arindrarto W., Claringbould A., van Iterson M., Mei H., et al. A characterization of cis- and trans-heritability of RNA-Seq-based gene expression. Eur. J. Hum. Genet. 2020;28:253–263. doi: 10.1038/s41431-019-0511-5.
- 9. Wheeler H.E., Ploch S., Barbeira A.N., Bonazzola R., Andaleon A., Fotuhi Siahpirani A., Saha A., Battle A., Roy S., Im H.K. Imputed gene associations identify replicable trans-acting genes enriched in transcription pathways and complex traits. Genet. Epidemiol. 2019;43:596–608. doi: 10.1002/gepi.22205.
- 10. Yang F., Wang J., Pierce B.L., Chen L.S., GTEx Consortium Identifying cis-mediators for trans-eQTLs across many human tissues using genomic mediation analysis. Genome Res. 2017;27:1859–1871. doi: 10.1101/gr.216754.116.
- 11. Dutta D., He Y., Saha A., Arvanitis M., Battle A., Chatterjee N. Aggregative trans-eQTL analysis detects trait-specific target gene sets in whole blood. Nat. Commun. 2022;13:4323. doi: 10.1038/s41467-022-31845-9.
- 12. Banerjee S., Simonetti F.L., Detrois K.E., Kaphle A., Mitra R., Nagial R., Söding J. Tejaas: reverse regression increases power for detecting trans-eQTLs. Genome Biol. 2021;22:142. doi: 10.1186/s13059-021-02361-8.
- 13. Liu X., Mefford J.A., Dahl A., He Y., Subramaniam M., Battle A., Price A.L., Zaitlen N. GBAT: a gene-based association test for robust detection of trans-gene regulation. Genome Biol. 2020;21:211. doi: 10.1186/s13059-020-02120-1.
- 14. Sun B.B., Maranville J.C., Peters J.E., Stacey D., Staley J.R., Blackshaw J., Burgess S., Jiang T., Paige E., Surendran P., et al. Genomic atlas of the human plasma proteome. Nature. 2018;558:73–79. doi: 10.1038/s41586-018-0175-2.
- 15. Gold L., Ayers D., Bertino J., Bock C., Bock A., Brody E.N., Carter J., Dalby A.B., Eaton B.E., Fitzwater T., et al. Aptamer-Based Multiplexed Proteomic Technology for Biomarker Discovery. PLoS One. 2010;5 doi: 10.1371/journal.pone.0015004.
- 16. Yang C., Farias F.H.G., Ibanez L., Suhy A., Sadler B., Fernandez M.V., Wang F., Bradley J.L., Eiffert B., Bahena J.A., et al. Genomic atlas of the proteome from brain, CSF and plasma prioritizes proteins implicated in neurological disorders. Nat. Neurosci. 2021;24:1302–1312. doi: 10.1038/s41593-021-00886-6.
- 17. Gamazon E.R., Wheeler H.E., Shah K.P., Mozaffari S.V., Aquino-Michaels K., Carroll R.J., Eyler A.E., Denny J.C., Nicolae D.L., et al. GTEx Consortium A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 2015;47:1091–1098. doi: 10.1038/ng.3367.
- 18. Di Angelantonio E., Thompson S.G., Kaptoge S., Moore C., Walker M., Armitage J., Ouwehand W.H., Roberts D.J., Danesh J., et al. INTERVAL Trial Group Efficiency and safety of varying the frequency of whole blood donation (INTERVAL): a randomised trial of 45 000 donors. Lancet. 2017;390:2360–2371. doi: 10.1016/S0140-6736(17)31928-1.
- 19. Schubert R., Geoffroy E., Gregga I., Mulford A.J., Aguet F., Ardlie K., Gerszten R., Clish C., Van Den Berg D., Taylor K.D., et al. Protein prediction for trait mapping in diverse populations. PLoS One. 2022;17 doi: 10.1371/journal.pone.0264341.
- 20. Rohloff J.C., Gelinas A.D., Jarvis T.C., Ochsner U.A., Schneider D.J., Gold L., Janjic N. Nucleic Acid Ligands With Protein-like Side Chains: Modified Aptamers and Their Use as Diagnostic and Therapeutic Agents. Mol. Ther. Nucleic Acids. 2014;3:e201. doi: 10.1038/mtna.2014.49.
- 21. Taliun D., Harris D.N., Kessler M.D., Carlson J., Szpiech Z.A., Torres R., Taliun S.A.G., Corvelo A., Gogarten S.M., Kang H.M., et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature. 2021;590:290–299. doi: 10.1038/s41586-021-03205-y.
- 22. Bild D.E., Bluemke D.A., Burke G.L., Detrano R., Diez Roux A.V., Folsom A.R., Greenland P., Jacob D.R., Jr., Kronmal R., Liu K., et al. Multi-Ethnic Study of Atherosclerosis: Objectives and Design. Am. J. Epidemiol. 2002;156:871–881. doi: 10.1093/aje/kwf113.
- 23. Mogil L.S., Andaleon A., Badalamenti A., Dickinson S.P., Guo X., Rotter J.I., Johnson W.C., Im H.K., Liu Y., Wheeler H.E. Genetic architecture of gene expression traits across diverse populations. PLoS Genet. 2018;14 doi: 10.1371/journal.pgen.1007586.
- 24. Araujo D.S., Nguyen C., Hu X., Mikhaylova A.V., Gignoux C., Ardlie K., Taylor K.D., Durda P., Liu Y., Papanicolaou G., et al. Multivariate adaptive shrinkage improves cross-population transcriptome prediction and association studies in underrepresented populations. HGG Adv. 2023;4 doi: 10.1016/j.xhgg.2023.100216.
- 25. Barbeira A.N., Bonazzola R., Gamazon E.R., Liang Y., Park Y., Kim-Hellmuth S., Wang G., Jiang Z., Zhou D., Hormozdiari F., et al. Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome Biol. 2021;22:49. doi: 10.1186/s13059-020-02252-4.
- 26. Barbeira A.N., Dickinson S.P., Bonazzola R., Zheng J., Wheeler H.E., Torres J.M., Torstenson E.S., Shah K.P., Garcia T., Edwards T.L., et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 2018;9:1825. doi: 10.1038/s41467-018-03621-1.
- 27. Urbut S.M., Wang G., Carbonetto P., Stephens M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 2019;51:187–195. doi: 10.1038/s41588-018-0268-8.
- 28. Barbeira A.N., Melia O.J., Liang Y., Bonazzola R., Wang G., Wheeler H.E., Aguet F., Ardlie K.G., Wen X., Im H.K. Fine-mapping and QTL tissue-sharing information improves the reliability of causal gene identification. Genet. Epidemiol. 2020;44:854–867. doi: 10.1002/gepi.22346.
- 29. Storey J.D., Tibshirani R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100.
- 30. GTEx Consortium Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277.
- 31. Kang H.M., Ye C., Eskin E. Accurate Discovery of Expression Quantitative Trait Loci Under Confounding From Spurious and Genuine Regulatory Hotspots. Genetics. 2008;180:1909–1925. doi: 10.1534/genetics.108.094201.
- 32. Watanabe K., Taskesen E., van Bochoven A., Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 2017;8:1826. doi: 10.1038/s41467-017-01261-5.
- 33. Buniello A., MacArthur J.A.L., Cerezo M., Harris L.W., Hayhurst J., Malangone C., McMahon A., Morales J., Mountjoy E., Sollis E., et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47:D1005–D1012. doi: 10.1093/nar/gky1120.
- 34. Subramanian A., Tamayo P., Mootha V.K., Mukherjee S., Ebert B.L., Gillette M.A., Paulovich A., Pomeroy S.L., Golub T.R., Lander E.S., Mesirov J.P. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
- 35. Mootha V.K., Lindgren C.M., Eriksson K.-F., Subramanian A., Sihag S., Lehar J., Puigserver P., Carlsson E., Ridderstråle M., Laurila E., et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 2003;34:267–273. doi: 10.1038/ng1180.
- 36. Suhre K., Arnold M., Bhagwat A.M., Cotton R.J., Engelke R., Raffler J., Sarwath H., Thareja G., Wahl A., DeLisle R.K., et al. Connecting genetic risk to disease end points through the human blood plasma proteome. Nat. Commun. 2017;8 doi: 10.1038/ncomms14357.
- 37. Gudjonsson A., Gudmundsdottir V., Axelsson G.T., Gudmundsson E.F., Jonsson B.G., Launer L.J., Lamb J.R., Jennings L.L., Aspelund T., Emilsson V., Gudnason V. A genome-wide association study of serum proteins reveals shared loci with common diseases. Nat. Commun. 2022;13:480. doi: 10.1038/s41467-021-27850-z.
- 38. Emilsson V., Ilkov M., Lamb J.R., Finkel N., Gudmundsson E.F., Pitts R., Hoover H., Gudmundsdottir V., Horman S.R., Aspelund T., et al. Co-regulatory networks of human serum proteins link genetics to disease. Science. 2018;361:769–773. doi: 10.1126/science.aaq1327.
- 39. Pietzner M., Wheeler E., Carrasco-Zanini J., Raffler J., Kerrison N.D., Oerton E., Auyeung V.P.W., Luan J., Finan C., Casas J.P., et al. Genetic architecture of host proteins involved in SARS-CoV-2 infection. Nat. Commun. 2020;11:6397. doi: 10.1038/s41467-020-19996-z.
- 40. Zipfel P.F., Wiech T., Stea E.D., Skerka C. CFHR Gene Variations Provide Insights in the Pathogenesis of the Kidney Diseases Atypical Hemolytic Uremic Syndrome and C3 Glomerulopathy. J. Am. Soc. Nephrol. 2020;31:241–256. doi: 10.1681/ASN.2019050515.
- 41. Fairfax B.P., Makino S., Radhakrishnan J., Plant K., Leslie S., Dilthey A., Ellis P., Langford C., Vannberg F.O., Knight J.C. Genetics of gene expression in primary immune cells identifies cell-specific master regulators and roles of HLA alleles. Nat. Genet. 2012;44:502–510. doi: 10.1038/ng.2205.
- 42. Dimas A.S., Deutsch S., Stranger B.E., Montgomery S.B., Borel C., Attar-Cohen H., Ingle C., Beazley C., Gutierrez Arcelus M., Sekowska M., et al. Common regulatory variation impacts gene expression in a cell type dependent manner. Science. 2009;325:1246–1250. doi: 10.1126/science.1174148.
- 43. Gygi S.P., Rochon Y., Franza B.R., Aebersold R. Correlation between Protein and mRNA Abundance in Yeast. Mol. Cell Biol. 1999;19:1720–1730. doi: 10.1128/mcb.19.3.1720.
- 44. Marguerat S., Schmidt A., Codlin S., Chen W., Aebersold R., Bähler J. Quantitative Analysis of Fission Yeast Transcriptomes and Proteomes in Proliferating and Quiescent Cells. Cell. 2012;151:671–683. doi: 10.1016/j.cell.2012.09.019.
- 45. Cheng P., Zhao X., Katsnelson L., Camacho-Hernandez E.M., Mermerian A., Mays J.C., Lippman S.M., Rosales-Alvarez R.E., Moya R., Shwetar J., et al. Proteogenomic analysis of cancer aneuploidy and normal tissues reveals divergent modes of gene regulation across cellular pathways. Elife. 2022;11 doi: 10.7554/eLife.75227.
- 46. Schwanhäusser B., Busse D., Li N., Dittmar G., Schuchhardt J., Wolf J., Chen W., Selbach M. Global quantification of mammalian gene expression control. Nature. 2011;473:337–342. doi: 10.1038/nature10098.
- 47. Upadhya S.R., Ryan C.J. Experimental reproducibility limits the correlation between mRNA and protein abundances in tumor proteomic profiles. Cell Rep. Methods. 2022;2 doi: 10.1016/j.crmeth.2022.100288.
- 48. Zhang J., Dutta D., Köttgen A., Tin A., Schlosser P., Grams M.E., Harvey B., Yu B., Boerwinkle E., et al. CKDGen Consortium Plasma proteome analyses in individuals of European and African ancestry identify cis-pQTLs and models for proteome-wide association studies. Nat. Genet. 2022;54:593–602. doi: 10.1038/s41588-022-01051-w.
- 49. McLaren W., Gil L., Hunt S.E., Riat H.S., Ritchie G.R.S., Thormann A., Flicek P., Cunningham F. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17:122. doi: 10.1186/s13059-016-0974-4.