Abstract
Studying the plasma proteome as the intermediate layer between the genome and the phenome has the potential to identify new disease processes. Here, we conducted a cis-focused proteogenomic analysis of 2,923 plasma proteins measured in 1,180 individuals using antibody-based assays. We 1) identify 256 unreported protein quantitative trait loci (pQTL), 2) demonstrate shared genetic regulation of 224 cis-pQTLs with 575 specific health outcomes, revealing examples for important metabolic diseases, like gastrin releasing peptide as a potential therapeutic target for type 2 diabetes, 3) improve causal gene assignment at 40% (n=192) of overlapping risk loci, and 4) observe convergence of phenotypic consequences of cis-pQTLs and rare loss-of-function gene burden for twelve proteins, like TIMD4 for lipoprotein metabolism. Our findings demonstrate the value of integrating complementary proteomic technologies with genomics even at moderate scale to identify novel mediators of metabolic diseases with the potential for therapeutic interventions.
Introduction
Rare and common sequence variation across the genome contributes to the risk of most human diseases investigated to date (1). However, the translation of the many established and emerging genome-to-phenome links is limited by the uncertainty around the underlying causal genes. This presents a major limitation for experimental follow-up, mechanistic understanding, and use of the emerging genomic evidence in drug development. Different approaches, such as integration of tissue-specific gene expression data (2), experimentally derived functional genomic data such as ChIP-seq or ATAC-seq (3), or functional characterization of candidate variants using CRISPR screens in cellular models (4) have been used to address this gap and to identify likely causal genes at risk loci. However, complex regulatory processes take place at each stage of transcription and translation, which leads to low to moderate correlation between transcript and protein abundance, and cellular models can only approximate complex human biology. Compared to these methods, a proteogenomic approach has the advantage of focusing on the biologically active entity - the protein.
The development of broad-capture proteomic assays, targeting thousands of proteins in parallel, now enables proteogenomic approaches which can efficiently identify causal genes by systematically testing for a shared genetic regulation of protein levels or function and disease susceptibility. This has catalyzed substantial advances in the identification of a) causal genes and proteins underlying established disease ‘loci’, and b) molecular ‘hubs’ that connect the genome not to one but many diseases through the encoded protein (5–18). Previous large-scale proteogenomic studies covering thousands of proteins have almost exclusively used aptamer-based assays (10, 11, 15, 16). Correlations of protein measures from aptamer versus antibody-based technologies have been shown to vary widely, and proteogenomic results were concordant for only around 65% of around 900 overlapping protein targets (14). To date, antibody-based proteomic assays have only been available for selected protein panels at scale (6, 7, 9, 13, 14, 18), but this is changing with the availability of the Olink® Explore 1536 and Olink® Explore Expansion assays measuring ~1,400 proteins each.
The UK Biobank Pharma Proteomics Project (UKB-PPP), which measured ~1,400 proteins using the Olink® Explore 1536 assay in over 50,000 participants, successfully demonstrated the power of scaling up by cataloguing over 10,000 mainly novel protein quantitative trait loci (pQTLs), including high-impact rare variants (19). However, these studies provided few insights about the translational potential of pQTLs to systematically inform candidate gene annotation at known risk loci and, more importantly, to reveal novel biological roles of proteins for human health at scale. UKB-PPP and others did demonstrate that genuine and biologically relevant pQTLs can be discovered in as few as hundreds of individuals (14, 17, 20), suggesting that broader proteomic coverage in even small-scale proteogenomic studies can make substantial advances to the understanding of diseases if integrated with genome-wide association statistics from bespoke, large-scale studies on diverse diseases and broader health characteristics.
Here, we generate antibody-based proteomic data using the Olink® Explore 1536 and Explore Expansion assays to capture 2,923 proteins in 1,180 individuals. We perform genetic fine-mapping at protein coding genes (±500kb) and enhance the understanding of disease mechanisms by systematically integrating cis-pQTLs with thousands of diseases and health measures to (a) refine the candidate causal gene assignment at existing disease susceptibility loci at scale and (b) identify novel disease mechanisms in phenome-wide colocalization analyses.
Results
Fine-mapping cis-pQTLs for 2,923 protein targets
We adopted a Bayesian fine-mapping strategy to identify proximally acting genetic variants (cis-pQTLs, ±500kb around the protein coding gene) that were associated with plasma abundance of 2,923 proteins measured in 1,180 participants of the EPIC-Norfolk cohort (21) (Supplementary Table 1–2). We identified a total of 1,553 independent credible sets for 914 unique protein targets for which sentinel variants reached genome-wide significance (p<5x10-8) when modelled jointly at each protein coding locus (Fig. 1A, Supplementary Table 3). The number of independent credible sets for each protein target ranged between one and eight (mean=1.64, IQR=1-2), illustrating widespread allelic heterogeneity at protein coding loci. This included 256 unreported credible sets (Fig. 1B), 236 of which successfully replicated in an independent test set (Supplementary Table 3) (5–18). We further observed a high replication rate for 590 proteins overlapping with the UKB-PPP effort (89.9%, 910 out of 1,013 Olink® Explore 1536 cis-pQTL credible sets). Notably, 125 of the novel signals were for 101 previously targeted proteins, the majority of which (n=92 proteins) have been targeted using non-antibody-based technologies in sample sizes up to 30 times larger than ours (10, 11, 15, 16) (Fig. 1B).
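The cis definition and significance threshold described above can be written as a simple filter. The following is a minimal sketch only: the `Gene` record, the function names, and the example coordinates are hypothetical and are not taken from the study's pipeline; only the ±500kb window and the p<5x10-8 threshold come from the text.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 500_000   # +/-500 kb around the protein coding gene (from the text)
GENOME_WIDE_P = 5e-8      # genome-wide significance threshold (from the text)


@dataclass
class Gene:
    """Hypothetical gene record; coordinates are in base pairs."""
    chrom: str
    start: int
    end: int


def is_cis(variant_chrom: str, variant_pos: int, gene: Gene,
           window: int = CIS_WINDOW_BP) -> bool:
    """Return True if the variant lies within `window` bp of the gene body."""
    if variant_chrom != gene.chrom:
        return False
    return gene.start - window <= variant_pos <= gene.end + window


def is_significant_cis(variant_chrom: str, variant_pos: int, p_value: float,
                       gene: Gene) -> bool:
    """Combine the cis window with the genome-wide significance threshold."""
    return is_cis(variant_chrom, variant_pos, gene) and p_value < GENOME_WIDE_P


# Illustrative (made-up) coordinates for a gene on chromosome 18
example_gene = Gene(chrom="18", start=1_000_000, end=1_020_000)
print(is_significant_cis("18", 600_000, 1e-12, example_gene))
```

A variant on a different chromosome, or more than 500kb from either gene boundary, is treated as trans and excluded by this filter.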
Figure 1. Genetic associations of 2,923 proteins measured by the Olink Explore 1536 and Olink Explore Expansion platforms in 1,180 individuals.

Previously unreported and reported pQTLs are represented with a filled and hollow circle, respectively. Only the variants which are genome-wide significant (p-value<5x10-8) in the joint model are presented. A. Miami plot representing the independent lead cis-pQTLs identified through Bayesian fine-mapping for 914 unique proteins. Shown are p-values from a linear regression model modelling all identified credible set variants for a given protein target jointly. Top: Lead cis-pQTL signals unreported to date. Bottom: Lead cis-pQTL signals which were in linkage disequilibrium (LD; r2>0.5) with a previously reported pQTL. B. Minor allele frequency vs effect size of unreported pQTL signals, coloured by whether the protein has previously been targeted. Unreported pQTL signals for a previously targeted protein are coloured grey and those for a previously untargeted protein are coloured orange. C. Minor allele frequency vs effect size of unreported pQTL signals, coloured by most severe variant consequence prediction. The colour coding represents the most severe Variant Effect Predictor (73) consequence of the lead cis-pQTL, or variants in LD (r2>0.6) within the protein encoding gene. The most severe consequence is coloured red (Ensembl consequence rank = 1) and the least severe consequence is coloured blue (Ensembl consequence rank = 37). D. Minor allele frequency vs effect size of reported pQTL signals, coloured by most severe variant prediction. The colour coding represents the most severe Variant Effect Predictor consequence of the lead cis-pQTL, or variants in LD (r2>0.6) with the lead cis-pQTL within the protein encoding gene. The most severe consequence is coloured red (Ensembl consequence rank = 1) and the least severe consequence is coloured blue (Ensembl consequence rank = 37). 
Lines are power curves which represent 25% (light grey), 90% (medium grey) and 95% (dark grey) power from the bottom to the top, respectively in our study with 1,180 participants for inverse rank normalized protein level measurements.
We observed that the distributions of effect sizes and minor allele frequencies for unreported cis-pQTLs were comparable to the 1,297 (83.5%) successfully replicated cis-pQTLs (5–18) (Supplementary Table 3), illustrating that complementary proteomic technologies can still identify genetic variants that would have been anticipated to be seen in previous studies with much larger sample sizes (Fig. 1A-D). This included 34 previously unreported cis-pQTLs with a minor allele frequency (MAF) above 5% and large absolute effect sizes (range 0.5-1.57 s.d. per allele), more than half of which (n=19) are unlikely to be a result of altered epitope binding of the affinity reagent and hence illustrate a strong genetic control for selected protein targets. In general, the identified cis-pQTLs explained a median of 10.1% of the variance in adjusted protein abundances (IQR: 4.9% - 22.4%), with lead cis-pQTLs explaining over 50% of the variance for 59 protein targets (Supplementary Table 4).
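To make the relationship between minor allele frequency, effect size, variance explained, and power concrete, the sketch below approximates the power of a single-variant additive test at n=1,180 and p<5x10-8. It assumes a standardized outcome (variance 1, in line with the inverse rank normalized protein levels), Hardy-Weinberg equilibrium, and a normal approximation to the test statistic; the function names are hypothetical and this is not necessarily how the power curves in Fig. 1 were generated.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def variance_explained(maf: float, beta: float) -> float:
    """Fraction of variance explained by an additive variant for a
    standardized outcome: 2 * MAF * (1 - MAF) * beta^2."""
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def approx_power(maf: float, beta: float, n: int = 1180,
                 alpha: float = 5e-8) -> float:
    """Approximate two-sided power of a single-variant test.

    Uses the non-centrality parameter n * r2 / (1 - r2) and a normal
    approximation to the test statistic (an assumption of this sketch).
    """
    r2 = variance_explained(maf, beta)
    if not 0.0 <= r2 < 1.0:
        raise ValueError("variance explained must lie in [0, 1)")
    root_ncp = (n * r2 / (1.0 - r2)) ** 0.5
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return (_STD_NORMAL.cdf(root_ncp - z_crit)
            + _STD_NORMAL.cdf(-root_ncp - z_crit))


for maf, beta in [(0.05, 0.5), (0.30, 0.5), (0.05, 1.0)]:
    print(f"MAF={maf:.2f} beta={beta:.1f} "
          f"r2={variance_explained(maf, beta):.3f} "
          f"power={approx_power(maf, beta):.2f}")
```

Under these assumptions, a variant with MAF of 5% needs an effect of roughly half a standard deviation per allele before power approaches 50%, whereas the same effect at a MAF of 30% is detected almost with certainty.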
Proteins with at least one significant cis-pQTL were enriched for characteristics of secreted proteins, like the presence of disulfide-bonds (odds ratio (OR) [95%-CI]: 4.47 [3.78-5.31]; p-value=3.0x10-74) or glycosylation sites (OR [95%-CI]: 2.22 [1.79-2.75]; p-value=1.1x10-13), but depleted of sites for posttranslational modifications that are important for intracellular signaling, like phosphorylation (OR [95%-CI]: 0.42 [0.34-0.54]; p-value=5.5x10-14) or ubiquitination (OR [95%-CI]: 0.30 [0.19-0.45]; p-value=4.2x10-11).
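Enrichment estimates of this kind are odds ratios from a two-by-two table of proteins with or without a cis-pQTL against proteins with or without a given annotation. The sketch below computes an odds ratio with a Wald 95% confidence interval; the counts are invented for illustration, and the choice of a Wald interval is an assumption (the exact test used in the study is not stated in this section).

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: proteins with a cis-pQTL and with the annotation
    b: proteins with a cis-pQTL and without the annotation
    c: proteins without a cis-pQTL and with the annotation
    d: proteins without a cis-pQTL and without the annotation
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study
or_, lo, hi = odds_ratio_wald(20, 10, 10, 20)
print(f"OR = {or_:.2f} [{lo:.2f}-{hi:.2f}]")
```

An interval that excludes 1 indicates enrichment (above 1) or depletion (below 1) of the annotation among proteins with a cis-pQTL.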
For more than half of the protein targets (n=532) with at least one cis-pQTL, we observed strong evidence of colocalization (PP>80%) between a cis-pQTL and the corresponding gene expression QTL (eQTL) signal in at least one out of 49 tissues of the GTEx resource (Supplementary Table 5). These results suggest altered expression of protein coding genes in one or multiple tissues as a major source for cis associations observed with plasma protein levels.
We finally tested for sex differences in the effects of all identified cis-pQTLs and detected only four variants that passed multiple testing correction (p<0.05/1553; Supplementary Table 6). This included sex-differential effects for proteins most highly expressed in the reproductive system like TEX101 (rs35033974, p-value for sex interaction=1.11x10-8, beta in women [95% CI] = 0.19 [0.08 – 0.30], beta in men = 0.67 [0.55 - 1.88]; testis) and PAEP (rs783768, p-value for sex interaction=4.18x10-10, beta in women [95% CI] = 0.66 [0.48 – 0.69], beta in men = 1.16 [0.17 – 1.26]; endometrium), as well as two proteins (KLK4 and COL28A1) involved in collagen chain trimerization. While not tested here, previous studies with larger sample sizes did not observe any genetic sex-differential effects on the X chromosome (15).
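The multiple testing threshold and a test for a sex difference in effect size can be sketched as follows. This is an illustration under stated assumptions: it compares sex-stratified estimates with a z-test for the difference of two independent effects, which is closely related to, but not necessarily identical to, the interaction model used in the study; all effect sizes in the example are hypothetical.

```python
import math

N_TESTS = 1553                    # number of credible sets tested (from the text)
BONFERRONI_P = 0.05 / N_TESTS     # roughly 3.2e-5


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Recover a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * z)


def sex_difference_test(beta_women: float, se_women: float,
                        beta_men: float, se_men: float):
    """z-test for a difference between two independent effect estimates.

    Returns the z statistic and its two-sided p-value.
    """
    z = (beta_men - beta_women) / math.sqrt(se_women ** 2 + se_men ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical sex-stratified estimates (s.d. per allele), not study results
z, p = sex_difference_test(beta_women=0.20, se_women=0.05,
                           beta_men=0.70, se_men=0.06)
print(f"z = {z:.2f}, p = {p:.2e}, passes Bonferroni: {p < BONFERRONI_P}")
```

The helper `se_from_ci` is included because published results often report confidence intervals rather than standard errors; it assumes the interval is symmetric on the scale of the estimate.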
From genome to phenome via the proteome
The genome is linked to the phenome via the proteome and the translational potential of pQTLs is due to their ability to link insights about the genetic regulation of protein levels and function to diseases (15). We identified 1,110 robust protein – phenotype pairs (Fig. 2; posterior probability [PP] > 80% of a shared genetic signal) comprising 224 protein targets for 575 unique traits by systematically testing for a shared genetic architecture at protein coding loci (±500kb) across the phenome (Supplementary Table 7). This included well-described examples, such as UMOD and kidney disease or established drug targets like PCSK9 and LDL-cholesterol. Notably, we observed evidence for phenotypic consequences of cis-pQTLs for a total of 93 protein targets, which were not observed in our previous study (15), with almost ten times higher sample size but using an aptamer-based technology, clearly highlighting the value of performing proteogenomic studies even for protein targets captured in massive scale studies.
Figure 2. Protein – disease network.

Results from phenome-wide colocalization at protein coding loci (±500kb) are shown. For simplicity, only proteins with at least one binary outcome (i.e., mainly diseases) association are included. Proteins are presented with a square, binary outcomes are presented with large circles, and continuous outcomes are presented with small circles. The colour of the circles represents the trait category. Edges between proteins and phenotypes represent strong evidence for a shared genetic signal (PP>80% and LD between regional sentinel variants >0.8). Effect directions are indicated by the line type (solid = higher protein abundance, increased risk, dashed = higher protein abundance, reduced risk) and derived based on the lead cis-pQTL at the corresponding locus. The full list of colocalization results can be found in Supplementary Table 7 and results can be viewed in full resolution in the Cytoscape session provided in Supplementary Data 1. Abbreviations: GIT, gastrointestinal tract.
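The posterior probability of a shared genetic signal referred to above can be illustrated with the widely used single-causal-variant colocalization framework based on approximate Bayes factors. The sketch below is a simplified, self-contained version of that general approach and is not claimed to be the method or software used in this study; the prior probabilities (1e-4, 1e-4, 1e-5), the prior effect standard deviation (0.15), and all input statistics are assumptions chosen for illustration.

```python
import math


def log_abf(beta: float, se: float, prior_sd: float = 0.15) -> float:
    """Wakefield's log approximate Bayes factor for one variant."""
    z = beta / se
    shrink = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - shrink) + shrink * z * z)


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def logdiffexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) - exp(b)); -inf if b >= a."""
    if b >= a:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities for hypotheses H0..H4.

    H3: both traits have a signal but at distinct variants.
    H4: both traits share a single causal variant.
    """
    s1, s2 = logsumexp(labf1), logsumexp(labf2)
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,
        math.log(p1) + s1,
        math.log(p2) + s2,
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),
        math.log(p12) + s12,
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


# Toy region with five variants; z-scores are invented for illustration
se = 0.04
protein_z = [1.0, 8.0, 1.0, 0.5, 0.0]
disease_z = [0.5, 7.0, 1.0, 0.0, 1.0]
pp = coloc_posteriors([log_abf(z * se, se) for z in protein_z],
                      [log_abf(z * se, se) for z in disease_z])
print("PP for a shared signal (H4):", round(pp[4], 3))
```

When both traits peak at the same variant, nearly all posterior mass falls on H4; when the strongest variants differ, the mass shifts to H3, which is why a threshold such as PP>80% for H4 is used to call a shared signal.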
One of the examples was gastrin releasing peptide (GRP, encoded by GRP), for which we observed strong evidence of colocalization (PP=82.5%) between plasma levels and type 2 diabetes (T2D) risk at an established genome-wide association study (GWAS) locus (18q21) for which different genes had been prioritized, including SEC11C, GRP, and MC4R (22–24). The GRP-increasing G-allele of the lead cis-pQTL (rs1517035; MAF=0.18) was associated with a reduced risk for T2D in the largest T2D study (25) (OR [95% CI] = 0.96 [0.95-0.98], p-value=7.8x10-10). GRP is a neuropeptide named for its ability to stimulate secretion of the gastric acid secretagogue, gastrin, in the stomach (26, 27), but it is likely involved in other metabolic pathways. We obtained strong evidence that GRP likely mediates T2D risk via an effect on overall obesity based on the convergence of evidence from mouse studies (28–30), human trials (31) and human genetic data from this study. Briefly, we established a shared genetic signal between plasma GRP, body mass index and body fat measures, and T2D risk using multi-trait colocalization with coherent effect directions (Fig. 3). GRP induces satiety in mice via its cognate GRP receptor (Grpr) (28, 29), and mice lacking Grpr show impaired glucose tolerance after gastric glucose administration (30) and gain excess body weight under ad libitum conditions (29). These observations have been corroborated by human trials, in which treatment with human recombinant GRP (hrGRP) led to weight loss through reduced food intake (31). In summary, our results motivate investigations into hrGRP for appetite control and body weight lowering to possibly assist in T2D management and remission, an approach similar to recently implemented treatment strategies targeting incretins, like GLP-1, and associated receptors, with preliminary evidence of an additive effect in rats (32).
Figure 3. Stacked regional association plots for the multi-trait colocalization.
Linear and logistic regression models were used to obtain summary statistics presented in this figure. A. Stacked regional association plots for the multi-trait colocalization of the GRP cis-pQTL with gynoid fat, android fat, total body fat, body mass index and type 2 diabetes. The top candidate SNP highlighted by multi-trait colocalization (rs7243357) and lead cis-pQTL for GRP (rs1517035) are in strong LD (r2=0.8). Gynoid fat, android fat and total body fat phenotypes are based on UK Biobank and were analysed in-house using BOLT-LMM (80). B. Stacked regional association plot for the multi-trait colocalization of the FGFR4 cis-pQTL with type 2 diabetes in East Asian populations. Red colouring represents a positive effect direction in reference to the protein-increasing allele for GRP, whereas blue represents an inverse association. The hue of the colour represents the strength of r2 representing the LD structure, as indicated on the legend. European Type 2 diabetes summary statistics were obtained from dbGAP Million Veteran Program (MVP) European subset (ncases= 148,726, ncontrols= 965,732) (25). East Asian Type 2 diabetes summary statistics were obtained from Mahajan et al (2022) (ncases= 56,268, ncontrols= 227,155) (24). The body mass index summary statistics were obtained from Pulit et al. (2019) (n=806,834) (81).
Several risk loci for T2D have been reported to be specific to certain ancestries (22–24). In the absence of strong differences in allele frequencies, such ancestry-specific effects could be caused by a variety of different factors, including environmental factors such as dietary intake. We obtained robust evidence that FGFR4 is the candidate causal gene at the East Asian-specific FGFR4-NSD1 risk locus, supported by a high PP of 97% for a shared genetic signal with plasma levels of the gene product fibroblast growth factor receptor 4 (FGFR4) and cross-ancestral conserved LD between regional sentinel variants (r2>0.96; Fig. 3). The protein-increasing A-allele of the lead cis-pQTL (rs351855, beta= 1.01, p-value=9.8x10-234, EAFEuropean=0.30, EAFEastAsian=0.46) was associated with an increased risk for T2D (OR [95%-CI] = 1.28 [1.17 – 1.40], p-value=1.1x10-7) in East Asians (24). Candidate gene studies have implicated rs351855 (p.G388R) in cancer susceptibility (33–35), and subsequent mechanistic studies showed a gain of function of the mutant FGFR4 by binding signal transducer and activator of transcription 3 (STAT3) (36). While we found no evidence for an association with cancer, there are different studies that support our observation of FGFR4 in T2D-related pathways including hepatic glucose, bile, and lipid metabolism, and possibly insulin signaling in a diet-dependent manner (37–40). Briefly, Fgfr4-/- mice fed a normal chow diet exhibit insulin resistance and impaired glucose tolerance compared to wild-type controls; however, this difference is not observed in high-fat diet fed mice, where both groups showed signs of insulin resistance. A similar masked genetic effect by a high-fat diet is seen with the mutant protein in mice and small observational studies in humans (41).
The ability of diet to obscure genetic effects may explain the ancestry-specific effect in the absence of strong differences in allele frequencies, with high-fat diet conditions being substantially more common in Western-style countries of predominantly European ancestry compared to East Asia (42), in particular Japan, in line with Biobank Japan (p-valueT2D=7.6x10-11) being the largest contributing population to the East Asian T2D meta-analysis (24).
Proteogenomic annotation of genes at reported risk loci
Annotation of the candidate causal genes at disease susceptibility loci is the major bottleneck in the translation of GWAS into biological and possibly clinical insights (43). We exploited the genomic proximity between cis-pQTLs and the protein coding gene for gene annotation by systematically overlapping identified credible sets in this study with reported risk loci (p<5x10-8) from the GWAS catalog (downloaded on 23/03/2022; (1)).
We identified 480 credible sets targeting 395 unique proteins (43.2% of all 914 unique protein targets) for which the lead cis-pQTL or a proxy (r2>0.8) had been reported as regional lead signal for one or more of 5,391 collated traits in the GWAS catalog (Fig. 4 and Supplementary Tab. 8). This included 236 unique protein targets (59.7%) that also had matching evidence for colocalization with gene expression events in at least one tissue, providing additional confidence in candidate causal gene assignment, whereas the remaining ones demonstrate the importance of additional functional genomic layers to facilitate causal gene assignment.
Figure 4. Candidate causal gene assignment at reported GWAS loci using pQTLs. The marked genetic locations on the human karyotypes (chromosomes 1-22) only present the existing GWAS risk loci which overlapped with pQTL loci (n=480).
The locus is coloured orange if the pQTL provides a novel candidate causal gene assignment for one or more traits, light blue if it refines a candidate causal gene from a longer list of reported or closest genes, and dark blue if it confirms the candidate causal gene assignment provided by the GWAS.
For 40% (n=192) of the thereby annotated loci, we prioritized a gene that was different from the one originally reported, of which 50% (n=96) were not the gene nearest to the GWAS sentinel variant. We further refined a longer list of putative causal genes to a single one for an additional 31 cis-regions (6.5%). While systematic testing for a shared genetic architecture using statistical colocalization was not possible due to the general lack of genome-wide summary statistics, about half (49.6%) of the protein targets were also highlighted in our colocalization analysis.
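The annotation logic described here (a lead cis-pQTL or a proxy with r2>0.8 matching a reported GWAS lead variant, followed by a comparison of the protein coding gene with the genes reported for that locus) can be sketched as follows. The function names, the data structures, and the three labels are assumptions that mirror the categories in Fig. 4; the variant identifiers and LD value in the example are hypothetical, and the gene list is used purely as an illustrative input.

```python
def overlaps_gwas(lead_variant, gwas_leads, r2_lookup, threshold=0.8):
    """True if the lead cis-pQTL, or a proxy with r2 above `threshold`,
    is a reported GWAS lead variant.

    r2_lookup maps frozenset({variant_a, variant_b}) to their r2 value.
    """
    for variant in gwas_leads:
        if variant == lead_variant:
            return True
        if r2_lookup.get(frozenset((lead_variant, variant)), 0.0) > threshold:
            return True
    return False


def classify_assignment(pqtl_gene, reported_genes):
    """Label a locus by comparing the pQTL gene with the reported genes.

    'novel'     : the pQTL gene was not among the reported genes
    'refined'   : the pQTL gene is one of several reported genes
    'confirmed' : the pQTL gene is the only reported gene
    """
    reported = {gene.upper() for gene in reported_genes}
    gene = pqtl_gene.upper()
    if gene not in reported:
        return "novel"
    if len(reported) > 1:
        return "refined"
    return "confirmed"


# Hypothetical variant identifiers and LD value
ld = {frozenset(("rsA", "rsB")): 0.92}
if overlaps_gwas("rsA", ["rsB"], ld):
    print(classify_assignment("GRP", ["SEC11C", "GRP", "MC4R"]))
```

In practice the comparison would also need harmonized gene symbols and a rule for loci with several reported traits, which this sketch leaves out.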
These results exemplify the unique potential of cis-pQTLs for gene annotation of loci reported across diseases and traits related to human health (Fig. 4 and Supplementary Tab. 8). For example, we identify DKKL1 as a candidate causal gene for multiple sclerosis (MS), potentially through a role in B-cell hyperactivity, which may provide late genetic evidence for depletion of B-cells being one of the most effective treatments for MS, a therapeutic strategy that originally emerged from clinical and neuropathological studies (44, 45) (see Supplementary Note 1).
Multiple independent genetic variants being associated with the same protein target at the same locus, so-called allelic heterogeneity, provides the highest confidence in gene assignment but can also highlight differential biological roles for the same protein. We observed 73 such protein targets with two or more credible sets including distinct GWAS variants for related and unrelated traits. For example, we observed a segregation of phenotypes across distinct cis-pQTLs for alpha-L-iduronidase encoded at IDUA. Briefly, three out of four detected credible sets contained GWAS risk loci or strong proxies (r2>0.8) for fractures (rs115134980; MAF=16.1%; OR [95% CI] = 0.94 [0.92 − 0.96], p-value=7.4x10-12) (46) and bone mineral density (rs115134980; MAF=16.1%; beta=0.07, p-value=1.8x10-17) (47), waist-to-hip ratio adjusted for BMI (48) (rs11724804; MAF=44.7%; beta=-0.017, p-value=7.6x10-21) and inflammatory diseases (49), as well as type 1 diabetes (50) (rs3796622; MAF=35.2%; OR [95% CI] = 0.93 [0.90-0.96], p-value=1.7x10-7) (Fig. 5). Alpha-L-iduronidase is essential for the breakdown of glycosaminoglycans within lysosomes and rare pathogenic variants within IDUA are known to cause accumulation of glycosaminoglycans in lysosomes (mucopolysaccharidosis type I [MPS-1]). People with MPS-1 present with a wide spectrum of complications, such as skeletal deformities or organomegaly, that has been attributed to the variable impact of mutations on enzyme activity, with nonsense mutations causing the most severe disease (Hurler syndrome) (51). Knock-down of Idua further led to disturbed bone turnover favoring bone mass build-up due to lysosomal overload in osteoblasts (52). While these observations explain bone phenotypes seen for the common cis-pQTL, there are no reports of an elevated risk for inflammatory or autoimmune disease among people with MPS-1 or other evidence from rare variant analysis.
Tissue-dependent effects of common variants might be one explanation for the different phenotypes linked to distinct cis-pQTLs for alpha-L-iduronidase. We observed another example of allelic heterogeneity with distinct phenotypic consequences for Alzheimer’s disease and childhood obesity at the 16q22.1 locus (see Supplementary Note 2).
Figure 5. Allelic heterogeneity at protein coding loci translates into distinct phenotypic consequences at IDUA.
Regional association plots centered around IDUA (±400kb) for plasma alpha-L-iduronidase levels, type 1 diabetes (50), waist-to-hip ratio (WHR) adjusted for body mass index (BMI) (48), and risk of fractures (46). Shown are association statistics (p-values) from genome-wide association analysis, obtained from linear and logistic regression models. Single genetic variants were coloured based on LD with three distinct cis-pQTLs (rs3796522 – orange; rs115134980 – purple; rs11724804 – green). Lead cis-pQTLs are highlighted by hollow diamonds.
Phenotypic convergence of rare gene burden and cis-pQTLs
Much effort and funding have been invested into biobank-scale exome-wide association studies (ExWAS) based on whole-exome sequencing to identify rare deleterious genetic variants and novel disease candidate genes for the development of treatment strategies (53, 54). These studies focused on the rare deleterious end of gene (and protein) dysfunction, but whether common, more subtle, possibly regulatory, effects on the same gene product have similar consequences remains to be established. Such rare to common convergence could, for example, establish dose-response relationships to estimate therapeutic windows for drugs (55). We therefore systematically cross-referenced our cis-based phenome-wide colocalization results with a recent exome-wide gene burden study among ~450,000 UK Biobank participants across almost 4,000 phenotypes (53).
Among 2,939 protein coding genes analyzed in the present study, 40 (1.4%) showed evidence for phenotypic associations with a rare variant gene-burden (p<1x10-6) and statistical colocalization (PP>80%) with a cis-pQTL, whereas 281 and 184 protein coding genes were linked to phenotypes through ExWAS or cis-pQTLs only, respectively (Fig. 6A). Out of the 40 overlapping genes, we observed phenotypic convergence for only 12 genes across 21 phenotypes following manual review to harmonize phenotype definitions (Supplementary Tab. 9). These results clearly exemplify the complementary nature of both approaches and the unique ability of bespoke proteogenomic experiments to prioritize disease mediators and hence putative therapeutic targets.
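The cross-referencing step amounts to intersecting two gene sets defined by the thresholds stated above. The sketch below assumes that each gene has already been summarized by its smallest gene-burden p-value and its largest colocalization posterior probability across phenotypes; the function name, the input structure, and the gene labels are hypothetical. It does not check whether the two lines of evidence point to the same phenotype, which in the study required manual review.

```python
def cross_reference(burden_min_p, coloc_max_pp, p_max=1e-6, pp_min=0.8):
    """Split genes by evidence source.

    burden_min_p: gene -> smallest rare-variant burden p-value across traits
    coloc_max_pp: gene -> largest cis-pQTL colocalization PP across traits
    """
    exwas_genes = {g for g, p in burden_min_p.items() if p < p_max}
    pqtl_genes = {g for g, pp in coloc_max_pp.items() if pp > pp_min}
    return {
        "both": exwas_genes & pqtl_genes,
        "exwas_only": exwas_genes - pqtl_genes,
        "pqtl_only": pqtl_genes - exwas_genes,
    }


# Invented gene labels and statistics for illustration only
burden = {"GENE_A": 1e-9, "GENE_B": 3e-7, "GENE_C": 0.02}
coloc = {"GENE_A": 0.97, "GENE_B": 0.35, "GENE_C": 0.91}
for label, genes in cross_reference(burden, coloc).items():
    print(label, sorted(genes))
```

Applied to the counts reported here, the "both" set would contain 40 genes, with 281 and 184 genes in the ExWAS-only and pQTL-only sets, respectively.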
Figure 6. Phenotypic convergence of rare variant burden and common cis-pQTLs for protein coding genes and TIMD4 as an example.
A. Venn diagram showing the number of genes with a significant rare variant gene burden association (p<1x10-6) with at least one trait (53) in blue and the number of genes with a significant pQTL colocalization (PP>80%) with at least one trait in orange. All 2,939 unique genes covered by Olink Explore 1536 and Explore Expansion assays were investigated. B. Forest plot comparing the effect size estimates between TIMD4 cis-pQTL (rs58198139) and rare TIMD4 loss of function (LoF) gene-burden results (variant group: missense and loss of function variants with a minor allele frequency < 1%) for low density lipoprotein cholesterol, total cholesterol and triglyceride levels. Rare TIMD4 loss of function (LoF) gene-burden (n=454,787) results are shown in blue and TIMD4 cis-pQTL associations (n=1,180) are shown in orange. C. Stacked regional plot of the multi-trait colocalization of TIMD4 cis-pQTL with lymphocyte count, low density lipoprotein cholesterol, and triglycerides. Red colouring represents a positive effect direction relative to the protein-increasing allele for TIMD4, whereas blue represents an inverse association. The hue of the colour represents the strength of r2 representing the LD structure, as indicated on the legend. Linear regression models were used to obtain summary statistics presented in this figure.
We observed a dose-response relationship between putative functional consequences for T-cell immunoglobulin and mucin domain containing 4 (TIMD4) and LDL-cholesterol as well as total triglyceride, but not HDL-cholesterol levels in blood (Fig. 6C). The protein-decreasing T-allele of the lead cis-pQTL (rs58198139) was associated with moderate effects on LDL-cholesterol in UK Biobank (MAF=0.26; betaLDL=0.03, p-valueLDL=7x10-44), likely mediated by altered protein expression, while the cumulative burden of rare loss-of-function variants was associated with substantially higher LDL-cholesterol levels (betaLDL= 0.25, p-valueLDL= 1.51x10-9, variant mask: predicted loss of function and deleterious missense variants with MAF<1%; Fig. 6B). This finding is in line with the locus being one of the earliest discovered loci for polygenic dyslipidemia but with few functional insights gained since (56). TIMD4 is best known for its role in tissue-dependent macrophage efferocytosis of apoptotic cells (57, 58) but does also participate in T-cell activation and recruitment (59). Accordingly, Timd4-/- mice show impaired macrophage phagocytosis and increased lymphocyte cell counts, an observation recapitulated by our phenome-wide colocalization analyses identifying an inverse association for the protein-decreasing T-allele for lymphocyte counts (beta=1.02, p-value=1.4x10-12) with high certainty (PP = 97.5%). Circulating leucocytes and resident M2 macrophages can take up cholesterol from circulating LDL particles and sequestered lipoproteins in the vasculature, but classical pathways, like the LDL-receptor mediated uptake, were shown, at least in mice, to have no substantial effect on plasma LDL-cholesterol levels (60). In contrast, more recent work demonstrated the ability of TIMD4+ adipose tissue macrophages to significantly contribute to the regulation of post-prandial HDL-cholesterol levels in mice (61). 
While there was no difference in triglycerides or non-HDL cholesterol following TIMD4 blockade, TIMD4 blockade inhibited LDL-induced lysosomal activity in vitro, suggesting a role for TIMD4 in peripheral LDL cholesterol processing. These findings provide evidence of a role for TIMD4 in the regulation of systemic lipoprotein metabolism, and taken together with our proteogenomic findings, provide a compelling rationale to explore the role of TIMD4+ macrophages in systemic LDL cholesterol metabolism. We note that we did not observe strong genetic evidence for an association between rare or common genetic variation at TIMD4 and coronary artery disease (CAD; cis-pQTL: protein decreasing T-allele, OR [95% CI] = 1.04 [1.02-1.07], p-value=0.002; gene burden: OR [95% CI] = 1.3 [0.90-1.88], p-value=0.16), which may weaken the expectation that pharmacological modulation of TIMD4 could address the residual burden of CAD despite lipid-lowering treatment.
Discussion
Proteogenomic approaches have the potential to establish a direct link from rare and common variation in or close to protein-encoding genes to human health via the protein product (5–18). Despite recent advances and early successes, the field is still in its infancy with respect to scale and protein capture, with existing broad-capture technologies currently targeting less than a third of all proteins encoded in the human genome (5–18), not capturing posttranslational modifications and not providing absolute protein quantification.
Here, we identified more than 200 unreported cis-pQTLs by capitalizing on recent assay developments. The fact that we identified more than one hundred cis-pQTLs for proteins that have been investigated in studies much larger than ours highlights the need for further orthogonal methods to measure protein targets, as we have outlined previously (14).
We demonstrate that systematic application of cis-pQTLs to large-scale genetic studies of human diseases can 1) guide causal gene annotation at GWAS loci (e.g., DKKL1 for multiple sclerosis), 2) identify pathways that link genes to diseases guided by a protein-phenotype network, and 3) complement gene-burden testing of rare variants to discover novel biology. We highlight specific examples in more detail and share a large number of high-confidence protein–phenotype associations that provide a direct guide for functional follow-up and future investigations of disease-relevant proteins about which little is known to date.
The vast majority (~90%) of genetic variants identified in GWASs reside in non-coding regions of the genome (62), creating a challenge for variant-to-function annotation. While large-scale and tissue-resolved gene expression studies have been pivotal for gene assignment (2, 63, 64), we, in line with previous studies, demonstrated the efficiency and ability of cis-pQTLs to prioritize causal candidate genes, including reassignments at 40% of overlapping loci. In contrast to other annotation approaches, the value of integrating proteogenomic studies lies in the instrumentalization of the likely biological effector molecules. Studies using proteomic profiling in disease-relevant tissues or single cell types are needed to further elucidate the mechanisms underlying the many thousands of unassigned GWAS loci, including intracellular signaling pathways that cannot be readily proxied from blood samples. The same applies to the incomplete coverage of the circulating proteome, which currently prohibits the distinction of pleiotropic regulatory variants from genuinely specific cis-pQTLs.
This study is a powerful demonstration that even moderately sized proteomic studies can result in the identification of novel biology when combined with bespoke analysis pipelines designed for the identification of cis-pQTLs and systematic integration and follow-up with disease GWAS summary statistics. Eventually, multiple different technologies will be needed at scale to capture not only proteins of interest but also the vast spectrum of proteoforms with possibly distinct phenotypic consequences (14). This prediction is supported by power calculations of the UKB-PPP (17), and also by the observation that our study identified cis-pQTLs for genes that are under less evolutionary constraint, as indicated by higher observed/expected scores for missense (+0.15; p-value=5.0×10⁻⁵⁰) and loss-of-function (+0.22; p-value=7.5×10⁻⁵¹) variation in gnomAD (65). This observation is in line with recent findings among eQTL studies (66).
We observed convergence of gene–phenotype associations between ExWAS and our proteogenomic approach only at a small number of genes. Gene identification with overlapping or converging evidence, as shown for TIMD4, provides high confidence about the underlying causal gene, while the incomplete overlap clearly indicates the complementary nature of both approaches for drug target prioritization. An important distinction between both approaches, beyond the different genetic variants covered, is the ability of proteogenomics to emulate protein variation across the whole spectrum of abundance and in some cases function, and not only putative loss-of-function (rarely gain of function), which might explain differences seen in phenotypic consequences between both approaches. In addition, in terms of practicality, integration of pQTLs into colocalization and GWAS loci annotation enabled us to uncover unreported disease biology with a small sample size of 1,180 individuals, whereas substantially larger sample sizes, even millions of individuals, are needed to reach enough power to detect rare variant associations in ExWAS studies for disease endpoints (67).
Our study has some limitations that need to be considered. Affinity-based reagents allow for the quantification of protein abundance but are inherently limited in their ability to quantify protein activity, although a general correspondence between the two can be assumed. This limits the insights into the role of protein targets that a proteogenomic approach can provide. Further, numerous posttranslational modifications can change the function and abundance of proteins but are currently not distinguishable using affinity reagents at scale. We deliberately decided to restrict genetic analysis of protein targets to the corresponding protein coding regions (±500kb) for two reasons: 1) the high biological prior to identify genetic variants directly linked to protein function/abundance, and 2) to increase power for statistical analysis by limiting the multiple testing burden. We have, however, made genome-wide association statistics based on inverse variance weighted meta-analyses across both sets available to enable targeted discoveries for selected proteins. Larger studies to explore the spectrum of trans-pQTLs and longitudinal studies to explore temporal effects of pQTLs are also needed to better understand the impact of proteins on human health.
In summary, we demonstrate the clear potential of broad-capture proteogenomic studies to identify novel biological pathways that link protein-encoding genes to human (metabolic) diseases. Systematic integration of human genomic with proteomic and phenomic data enables such investigation even in relatively moderately sized studies and can help to prioritize targets and indications for the development of safe and effective therapeutic interventions.
Materials and Methods
Study participants
We measured protein levels among 1,200 participants of the European Prospective Investigation into Cancer (EPIC)-Norfolk study, a cohort of middle-aged individuals from the general population of Norfolk, a county in Eastern England, which is a component of EPIC (21). We excluded individuals who were related, of non-European descent or did not have a high-quality proteomic profile, resulting in 1,180 and 1,178 individuals for genetic analyses for the Olink Explore 1536 and Explore Expansion platforms, respectively (Supplementary Tab. 1). The study was approved by the Norfolk Research Ethics Committee (ref. 05/Q0101/191) and all participants gave their informed written consent before entering the study. Information on lifestyle factors and medical history was obtained from questionnaires as reported previously (21). For cost-efficient proteomic profiling, our study consisted of a random sub-cohort (n=755) and a case cohort (n=425) covering a total of seven incident diseases (Supplementary Tab. 2). We performed a replication study in a total of 1,707 participants, again divided into a random sub-cohort (n=1,001) and a case cohort (n=706), applying the same exclusion criteria as mentioned above (Supplementary Tab. 1–2).
Proteomic profiling
We used serum samples from the baseline assessment (1993–1997) that had been stored in liquid nitrogen for proteomic profiling using the Olink Explore 1536 and Explore Expansion platforms, targeting 2,925 unique proteins by 2,943 assays, of which 2,923 unique proteins mapped to a protein-encoding locus in genome assembly GRCh37. The assay has been described in detail elsewhere (68). Briefly, each protein is targeted by two separate, unique antibodies, each of which is labelled with a complementary single-stranded oligonucleotide (proximity extension assay (69)). In these assays, hybridization of the complementary oligonucleotides occurs subsequent to the binding of the antibody pair and can be quantified using next generation sequencing (NGS). NGS read-outs undergo quality control procedures in which internal (incubation, extension and amplification controls) and external (negative, plate and sample controls) controls are included. Normalized protein expression (NPX) units are generated by normalization to the extension control and further normalization to the plate control, and are reported on a log2 scale. We excluded samples that were extreme outliers in a principal component analysis of their entire proteomic profiles. For downstream genetic analysis (fine-mapping and region-based association analysis), we first rank-inverse normal transformed NPX values to achieve robust statistical analyses with comparable effect estimates across proteins. We then corrected the rank-inverse normal transformed values for age, sex, measurement plate, and the first ten genetic principal components using linear regression models. The residuals of this analysis were used throughout the study. For simplicity, we use the term ‘protein levels’ to refer to the relative assay readouts for each protein, although we acknowledge that affinity-based reagents might be affected by genetic or posttranslational modifications of epitope regions.
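The rank-inverse normal transformation applied to the NPX values can be illustrated with a minimal Python sketch. The rank offset of 0.5 and the averaging of tied ranks are assumptions made for this illustration; the text does not state which variant of the transformation was used, and the subsequent covariate adjustment by linear regression is not shown.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to standard-normal quantiles according to their rank.

    Tied values receive the average of their ranks. The offset (0.5 here)
    is an illustrative assumption, not a setting taken from the study.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical NPX values for three samples.
transformed = rank_inverse_normal([3.2, 1.1, 2.5])
```

The transformed values keep the ordering of the input but follow an approximately standard normal distribution, which is what makes effect estimates comparable across proteins.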
Genotyping
EPIC-Norfolk samples (n=21,448) were genotyped on the Affymetrix UK Biobank Axiom array chip by Cambridge Genomic Services, University of Cambridge, UK. Sample and variant QC followed the Affymetrix Best Practices guidelines. Samples were excluded based on DishQC < 0.82 (fluorescence signal contrast), call-rate <97%, heterozygosity outliers and sex discordance checks. Variants were excluded if call-rate <95% or HWE<=1e-6. Monomorphic variants and those with cluster problems detected using Affymetrix SNPolisher were excluded. Genotype imputation was performed using two different reference panels, the Haplotype Reference Consortium (HRC) (release 1) reference panel and the combined UK10K+1000 Genomes Phase 3 reference panel. After pre-imputation QC, 21,044 samples remained for imputation. All SNPs imputed using the HRC reference panel were included, and additional variants imputed using only the UK10K+1000 Genomes reference panel were added to create a combined imputed set. Variants with imputation quality INFO < 0.4 or MAF of < 0.0001 were excluded. All positions are on genome assembly GRCh37. After excluding ancestry outliers, individuals without a high-quality proteomic profile for each panel and pruning the sample set for related individuals, 1,180 and 1,178 individuals were included in proteogenomic analyses for Olink Explore 1536 and Explore Expansion platforms, respectively.
Fine mapping
We used statistical fine-mapping as implemented in the ‘sum of single effects’ model (SuSiE) using individual-level genotype and protein data to identify credible sets at protein-encoding loci (±500kb). Briefly, SuSiE employs a Bayesian framework for variable selection in a multiple regression problem with the aim of identifying sets of independent variants, each of which likely contains the true underlying causal variant (70). We implemented the workflow using the R package susieR (v.0.11.92) with default prior and parameter settings. However, we noticed that SuSiE sometimes reports overlapping credible sets or credible sets that contain variants in high LD with already selected ones. Therefore, we adopted a grid search by first iterating the maximum number of credible sets (L in SuSiE terminology) from 2 to 10 and subsequently selecting the output for the maximum L at which none of the credible sets reported variants in LD (r²>0.1). We further tested for independent effects of all lead credible set variants (selected using the highest posterior inclusion probability) by including them in a joint regression model. We only took forward credible sets that were identified through fine-mapping and also met stringent genome-wide significance (p<5×10⁻⁸) in linear regression models that jointly modelled the effect of lead credible set variants in the locus. In sensitivity analyses, we did not see an effect of the time passed since storage on the identified cis-pQTLs. We used R v.3.6.0 to compute regression models.
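The selection step of the grid search can be sketched as follows. This is a minimal Python illustration, not the authors' R workflow: it assumes that credible sets have already been computed for each value of L (for example with susieR), and it interprets "in LD" as any pair of variants from two different credible sets exceeding r²=0.1. The function names, the `fits` dictionary and the `r2` lookup are hypothetical.

```python
from itertools import combinations


def sets_in_ld(cs_a, cs_b, r2, threshold=0.1):
    """True if any variant pair across two credible sets exceeds the r2 threshold."""
    return any(r2(a, b) > threshold for a in cs_a for b in cs_b)


def select_max_L(fits, r2, threshold=0.1):
    """Pick the largest L whose credible sets are mutually free of LD.

    fits: dict mapping L to a list of credible sets (lists of variant IDs),
          assumed to be precomputed by a fine-mapping tool.
    r2:   callable returning the squared correlation between two variants.
    """
    for L in sorted(fits, reverse=True):
        credible_sets = fits[L]
        clash = any(
            sets_in_ld(a, b, r2, threshold)
            for a, b in combinations(credible_sets, 2)
        )
        if not clash:
            return L, credible_sets
    return None, []
```

Starting from the largest L and stopping at the first clash-free solution keeps as many independent signals as the LD constraint allows.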
Testing for effect modification by sex
To test for potential sex differences in the cis-pQTLs identified in this study, we included an interaction term between the cis-pQTL and sex in a linear regression model with the same adjustments as in the main analysis. We defined significance at a Bonferroni-corrected threshold (p<0.05/1553).
Replication of cis-pQTLs
We tested for independent replication of identified lead variants from credible sets by running the exact same joint models in a separate set of EPIC-Norfolk participants (n=1,707) for which proteome profiling was done at a later timepoint. We considered cis-pQTLs to replicate if they showed directionally concordant associations and further met a stringent Bonferroni-corrected threshold of significance (p<3.21×10⁻⁵). A total of 1,506 out of 1,553 (96.9%) cis-pQTLs fulfilled these criteria. We further observed a strong correlation (r=0.96) of the effect size estimates between the two studies (Supplementary Table 3).
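The two replication criteria can be written as a simple check. The sketch below is illustrative only; the function name is hypothetical and the default threshold is the value reported in the text.

```python
def replicates(beta_discovery, beta_replication, p_replication, threshold=3.21e-5):
    """Return True if a cis-pQTL meets both replication criteria.

    Criteria: same direction of effect in discovery and replication,
    and a replication p-value below the Bonferroni-corrected threshold.
    """
    same_direction = beta_discovery * beta_replication > 0
    return same_direction and p_replication < threshold


# Hypothetical effect estimates for one lead variant.
print(replicates(0.31, 0.27, 2.0e-12))
```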
Region-based association testing
To complement fine-mapping analysis, we computed regional association statistics at protein coding loci (±500kb) using fastGWA software provided by GCTA (v. 1.93.2beta) (71). We used residuals from rank-inverse normal transformed NPX-values corrected for age, sex, plate effect and the first ten genetic principal components. To account for the different selection designs of the sub-cohort and the cases, we performed these analyses within each cohort separately and combined in an inverse-variance fixed-effects meta-analysis in METAL (72).
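For reference, the inverse-variance weighted fixed-effects estimate that combines the per-cohort results weights each effect estimate by the inverse of its squared standard error. The Python sketch below shows the standard formula only; it is not the METAL implementation, and the function name and input values are illustrative.

```python
import math


def ivw_fixed_effects(betas, standard_errors):
    """Combine effect estimates with inverse-variance weights.

    Returns the pooled effect, its standard error and the z-score.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se


# Hypothetical estimates from the sub-cohort and the case cohort.
pooled_beta, pooled_se, z = ivw_fixed_effects([0.20, 0.40], [0.10, 0.10])
```

With equal standard errors the pooled effect is the simple mean of the two estimates, and the pooled standard error shrinks by a factor of the square root of two.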
Gene, Variant, and Protein annotation
We obtained conservation scores for all protein coding genes from gnomAD. We used the Variant Effect Predictor software (73) (version 98.3) with the --pick option to annotate all independent lead variants and proxies (r²>0.6) of identified pQTLs in our data set and report possible functional consequences. We collapsed pQTLs mapping to the same functional variant to reduce redundancy. We further obtained protein characteristics, e.g., glycosylation sites, from UniProt (74). To test for enriched characteristics among proteins with at least one cis-pQTL, we performed Fisher’s exact test using all proteins captured by the Olink Explore 1536 and Explore Expansion platforms as the background.
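The enrichment test can be illustrated with a small, dependency-free sketch of the two-sided Fisher's exact test on a 2×2 table (characteristic present or absent versus protein with or without a cis-pQTL). This is a generic textbook implementation using the hypergeometric distribution, not the code used in the study, and the counts in the example are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )
    return min(1.0, p_value)


# Invented counts: rows = with / without cis-pQTL, columns = feature yes / no.
p = fisher_exact_two_sided(3, 1, 1, 3)
```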
Variance explained
We estimated the variance in protein levels explained by pQTLs for proteins with at least one cis-pQTL by including all cis-pQTLs in a linear regression model, using residual protein levels as outlined in the region-based association testing section. We used the R² of the entire model as an estimate of the variance explained.
Annotation of GWAS catalog loci
We downloaded genome-wide significant summary statistics from the GWAS catalog (accessed 23/03/2022) (1) and tested whether any of the lead credible set variants (lead cis-pQTLs) or proxies (r²>0.8) of the lead cis-pQTLs had been reported to be associated with any non-proteomic trait, that is, omitting any results that related to multiplex proteomic assays. Out of 347,165 entries (n=9,997 unique traits), 212,628 entries (n=5,391 unique traits) passed this and additional filtering steps (missing effect estimates, missing risk allele, and not passing genome-wide significance). For each cis-pQTL – GWAS variant mapping, we compared the reported or mapped gene (closest gene assigned by the GWAS catalog) to the protein-encoding gene at the locus.
Phenome-wide analyses at protein-encoding loci
We performed phenome-wide analyses using statistical colocalization for 914 protein targets where we had evidence for at least one cis-pQTL. To this end, we queried the Open GWAS database (75, 76) using a defined region (±500 kb) around the protein-encoding gene body and tested whether any of the traits in the database showed a high posterior probability (PP) of a shared genetic signal with plasma concentrations of the encoded protein target using statistical colocalization (77). We only tested phenotypes for colocalization that had at least suggestive evidence of association (p<10⁻⁶) with the lead cis-pQTL in the region or a close proxy (r²>0.8). We chose a cut-off of PP>80% to declare that a protein target and a phenotypic trait are highly likely to share a genetic signal at a locus. As there are currently no methods to control the false discovery rate for colocalization screens, we used a conservative prior setting with p12=1×10⁻⁶ and further ensured that regional sentinel variants were in strong LD (r²>0.8). To avoid spurious colocalization results due to imperfect overlap of SNPs, we removed all results for which neither the strongest cis-pQTL nor a sufficient proxy (r²>0.8) was included in the overlapping set of SNPs, or for which fewer than 500 SNPs overlapped. We used the igraph package in R to visualize protein – disease colocalization results as a network to account for cross-disease dependencies established by proteins. In studies where linear regression was used for binary traits for computational efficiency, we used the following formula to report ORs where needed: log(odds ratio) = β / (μ * (1 - μ)), where μ = case fraction.
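The conversion from a linear-regression effect estimate to an odds ratio follows directly from the formula above. The short Python sketch below applies it; the function name and the example numbers are illustrative and not taken from the study.

```python
import math


def beta_to_odds_ratio(beta, case_fraction):
    """Approximate an odds ratio from a linear-model beta for a binary trait.

    Implements log(OR) = beta / (mu * (1 - mu)), with mu the case fraction.
    """
    mu = case_fraction
    if not 0.0 < mu < 1.0:
        raise ValueError("case_fraction must lie strictly between 0 and 1")
    log_odds_ratio = beta / (mu * (1.0 - mu))
    return math.exp(log_odds_ratio)


# Hypothetical example: beta = 0.01 per allele in a study with 10% cases.
odds_ratio = beta_to_odds_ratio(0.01, 0.10)
```

Because μ(1 - μ) is at most 0.25, the same β translates into a larger odds ratio the rarer (or the more common) the cases are in the study.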
Incorporation of gene expression data
We systematically tested for a shared genetic signal between plasma abundances of a protein and gene expression levels (eQTL) of the protein coding gene in 49 tissues from the GTEx project (v8) (78). We used a colocalization framework similar to the one described above but adopted a less stringent p12 prior (p12=1×10⁻⁵) to account for the higher biological prior of genetic signals in the protein-encoding region. All GTEx variant-gene cis-eQTL associations from each tissue were downloaded in January 2020 from https://console.cloud.google.com/storage/browser/gtex-resources.
Phenotypic convergence between pQTL colocalization and rare loss of function gene-burden associations
To compare the phenotypic convergence of rare loss-of-function gene-burden and cis-pQTL colocalization results, we downloaded single-variant and gene-burden results for 3,986 phenotypic outcomes from UK Biobank, as analysed by Backman et al. (2021) (53) (downloaded on 07/12/2021). We filtered the results for 2,939 protein coding genes covered by the Olink Explore 1536 and Explore Expansion platforms. We compared the phenotypic convergence of genes that were significant for at least one phenotypic outcome in the exome-wide association analysis at exome-wide significance (p<1×10⁻⁶) with the pQTLs that showed significant statistical colocalization for at least one trait (PP>80%). If ExWAS results were significant for more than one variant group for the same gene – trait association, we filtered the results to only take forward the most significant finding.
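The final filtering step, keeping only the most significant variant group per gene – trait pair, can be sketched as below. The record layout (keys "gene", "trait", "variant_group", "p") and the example values are hypothetical and chosen only for illustration.

```python
def most_significant_per_gene_trait(records, p_threshold=1e-6):
    """Keep the record with the smallest p-value for each (gene, trait) pair.

    records: iterable of dicts with the hypothetical keys
             "gene", "trait", "variant_group" and "p".
    Only records below the exome-wide significance threshold are considered.
    """
    best = {}
    for record in records:
        if record["p"] >= p_threshold:
            continue
        key = (record["gene"], record["trait"])
        if key not in best or record["p"] < best[key]["p"]:
            best[key] = record
    return list(best.values())


# Invented example with two variant groups for the same gene and trait.
example = [
    {"gene": "GENE_A", "trait": "trait_1", "variant_group": "group_1", "p": 2e-9},
    {"gene": "GENE_A", "trait": "trait_1", "variant_group": "group_2", "p": 5e-8},
    {"gene": "GENE_B", "trait": "trait_2", "variant_group": "group_1", "p": 3e-4},
]
kept = most_significant_per_gene_trait(example)
```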
Multitrait colocalisation
We used hypothesis prioritisation in multi-trait colocalisation (HyPrColoc) (79) at selected protein loci to identify a shared genetic signal across various traits, including gene expression, plasma protein levels, and prioritized phenotypes from the disease-wise colocalization framework. HyPrColoc provides three different types of output for each cluster: 1) a PP that all phenotypes in the cluster share a common genetic signal, 2) a regional association probability, that is, the probability that all the phenotypes share an association with one or more variants in the region, and 3) the proportion of the PP explained by the candidate variant. We considered a genetic signal highly likely to be aligned across various phenotypes if the PP was >80% and report the obtained PPs otherwise.
Supplementary Material
Acknowledgments
The EPIC-Norfolk study (DOI 10.22025/2019.10.105.00004) has received funding from the Medical Research Council (MR/N003284/1 MC-UU_12015/1 and MC_UU_00006/1, N.J.W.) and Cancer Research UK (C864/A14136, N.J.W.). The genetics work in the EPIC-Norfolk study was funded by the Medical Research Council (MC_PC_13048, N.J.W.). We are grateful to all the participants who have been part of the project and to the many members of the study teams at the University of Cambridge including the EPIC-Norfolk investigators, the Study Co-ordination team, the Epidemiology Field, Data and Laboratory teams who have enabled this research. This work was supported in part by MRC Rapid Call (MC_PC_21036, N.J.W., C.L.) and HDRUK Multi-Omics (G107794, C.L.) grants and the UKRI/NIHR Strategic Priorities Award in Multimorbidity Research for the Multimorbidity Mechanism and Therapeutics Research Collaborative (MR/V033867/1, C.L.). Proteomics measurements were also supported by a collaboration agreement between the University of Cambridge and Olink. We thank Philippa Pettingill, Ida Grundberg and Janet Kenyon for their support with quality control on the proteomic data. M.K. is supported by Gates Cambridge Trust. J.C.-Z. is supported by a 4-year Wellcome Trust PhD Studentship and the Cambridge Trust. C.L., E.W., M.P., N.K. and N.J.W. are funded by the Medical Research Council (MC_UU_00006/1). For the purpose of Open Access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.
The authors thank Million Veteran Program (MVP) staff, researchers, and volunteers, who have contributed to MVP, and especially participants who previously served their country in the military and now generously agreed to enroll in the study (see https://www.research.va.gov/mvp/ for more details). We thank Friedemann Paul, Aroon Hingorani and Siamon Gordon for sharing their expertise on disease-specific examples. We thank Gabi Kastenmüller and Maria Anna Wörheide for their support in making genome-wide summary statistics from this study available on omicscience.org.
Footnotes
Author contributions
M.K., M.P. and C.L. designed the analysis and drafted the manuscript. M.K., M.P. and E.W. have performed the bioinformatics analyses. J.C.-Z. and N.D.K. have performed the quality control and data preparation of the proteomic data. S.L. contributed to the interpretation and curation of disease examples. N.J.W. is PI of the EPIC-Norfolk study. All authors contributed to the interpretation of the results and critically reviewed the manuscript.
Competing interests
E.W. is now an employee of AstraZeneca. The remaining authors declare no competing interests.
Data availability
The EPIC-Norfolk data can be requested by bona fide researchers for specified scientific purposes via the study website (https://www.mrc-epid.cam.ac.uk/research/studies/epic-norfolk/). Data will either be shared through an institutional data sharing agreement or arrangements will be made for analyses to be conducted remotely without the need for data transfer.
Fine-mapped summary statistics for protein coding regions can be found here: https://doi.org/10.5281/zenodo.7576293. The genome-wide summary statistics resulting from the meta-analysis between discovery and replication samples (n = 2,887) can be downloaded for all protein targets included in this study from https://omicscience.org/.
Genome-wide association studies for anthropometric phenotypes have been conducted using the UK Biobank resource (application no. 44448). Access to the UK Biobank genotype and phenotype data is open to all approved health researchers (http://www.ukbiobank.ac.uk/).
GWAS Catalogue summary statistics (v.1.0.2) were downloaded (March 2022) from https://www.ebi.ac.uk/gwas/api/search/downloads/studiesalternative. All GTEx variant-gene cis-eQTL associations from each tissue were downloaded (January 2020) from https://console.cloud.google.com/storage/browser/gtex-resources. OpenGWAS summary statistics were accessed via ieugwasr package in R v3.6.0.
Code availability
Associated code and scripts for the analysis are available on GitHub (https://github.com/MRC-Epid/pGWASOlinkEPIC).
References
- 1. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47(D1):D1005–D12. doi: 10.1093/nar/gky1120.
- 2. Barbeira AN, Bonazzola R, Gamazon ER, Liang Y, Park Y, Kim-Hellmuth S, et al. Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome Biol. 2021;22(1):49. doi: 10.1186/s13059-020-02252-4.
- 3. Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020;583(7818):699–710. doi: 10.1038/s41586-020-2493-4.
- 4. Abell NS, DeGorter MK, Gloudemans MJ, Greenwald E, Smith KS, He Z, et al. Multiple causal variants underlie genetic associations in humans. Science. 2022;375(6586):1247–54. doi: 10.1126/science.abj5117.
- 5. Suhre K, Arnold M, Bhagwat AM, Cotton RJ, Engelke R, Raffler J, et al. Connecting genetic risk to disease end points through the human blood plasma proteome. Nat Commun. 2017;8:14357. doi: 10.1038/ncomms14357.
- 6. Sun BB, Maranville JC, Peters JE, Stacey D, Staley JR, Blackshaw J, et al. Genomic atlas of the human plasma proteome. Nature. 2018;558(7708):73–9. doi: 10.1038/s41586-018-0175-2.
- 7. Folkersen L, Fauman E, Sabater-Lleal M, Strawbridge RJ, Frånberg M, Sennblad B, et al. Mapping of 79 loci for 83 plasma protein biomarkers in cardiovascular disease. PLoS Genet. 2017;13(4):e1006706. doi: 10.1371/journal.pgen.1006706.
- 8. Yao C, Chen G, Song C, Keefe J, Mendelson M, Huan T, et al. Genome-wide mapping of plasma protein QTLs identifies putatively causal genes and pathways for cardiovascular disease. Nat Commun. 2018;9(1):3268. doi: 10.1038/s41467-018-05512-x.
- 9. Gilly A, Park YC, Png G, Barysenka A, Fischer I, Bjørnland T, et al. Whole-genome sequencing analysis of the cardiometabolic proteome. Nat Commun. 2020;11(1):6336. doi: 10.1038/s41467-020-20079-2.
- 10. Ferkingstad E, Sulem P, Atlason BA, Sveinbjornsson G, Magnusson MI, Styrmisdottir EL, et al. Large-scale integration of the plasma proteome with genetics and disease. Nat Genet. 2021;53(12):1712–21. doi: 10.1038/s41588-021-00978-w.
- 11. Gudjonsson A, Gudmundsdottir V, Axelsson GT, Gudmundsson EF, Jonsson BG, Launer LJ, et al. A genome-wide association study of serum proteins reveals shared loci with common diseases. Nat Commun. 2022;13(1):480. doi: 10.1038/s41467-021-27850-z.
- 12. Katz DH, Tahir UA, Bick AG, Pampana A, Ngo D, Benson MD, et al. Whole Genome Sequence Analysis of the Plasma Proteome in Black Adults Provides Novel Insights Into Cardiovascular Disease. Circulation. 2022;145(5):357–70. doi: 10.1161/CIRCULATIONAHA.121.055117.
- 13. Png G, Barysenka A, Repetto L, Navarro P, Shen X, Pietzner M, et al. Mapping the serum proteome to neurological diseases using whole genome sequencing. Nat Commun. 2021;12(1):7042. doi: 10.1038/s41467-021-27387-1.
- 14. Pietzner M, Wheeler E, Carrasco-Zanini J, Kerrison ND, Oerton E, Koprulu M, et al. Synergistic insights into human health from aptamer- and antibody-based proteomic profiling. Nat Commun. 2021;12(1):6822. doi: 10.1038/s41467-021-27164-0.
- 15. Pietzner M, Wheeler E, Carrasco-Zanini J, Cortes A, Koprulu M, Wörheide MA, et al. Mapping the proteo-genomic convergence of human diseases. Science. 2021;374(6569):eabj1541. doi: 10.1126/science.abj1541.
- 16. Zhang J, Dutta D, Köttgen A, Tin A, Schlosser P, Grams ME, et al. Plasma proteome analyses in individuals of European and African ancestry identify cis-pQTLs and models for proteome-wide association studies. Nat Genet. 2022;54(5):593–602. doi: 10.1038/s41588-022-01051-w.
- 17. Sun BB, Chiou J, Traylor M, Benner C, Hsu Y-H, Richardson TG, et al. Genetic regulation of the human plasma proteome in 54,306 UK Biobank participants. bioRxiv. 2022:2022.06.17.496443.
- 18. Folkersen L, Gustafsson S, Wang Q, Hansen DH, Hedman A, Schork A, et al. Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat Metab. 2020;2(10):1135–48. doi: 10.1038/s42255-020-00287-2.
- 19. Dhindsa RS, Burren OS, Sun BB, Prins BP, Matelska D, Wheeler E, et al. Influences of rare protein-coding genetic variants on the human plasma proteome in 50,829 UK Biobank participants. bioRxiv. 2022:2022.10.09.511476.
- 20. Enroth S, Johansson A, Enroth SB, Gyllensten U. Strong effects of genetic and lifestyle factors on biomarker variation and use of personalized cutoffs. Nat Commun. 2014;5:4684. doi: 10.1038/ncomms5684.
- 21. Day N, Oakes S, Luben R, Khaw KT, Bingham S, Welch A, et al. EPIC-Norfolk: study design and characteristics of the cohort. European Prospective Investigation of Cancer. Br J Cancer. 1999;80 Suppl 1:95–103.
- 22. Vujkovic M, Keaton JM, Lynch JA, Miller DR, Zhou J, Tcheandjieu C, et al. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multiancestry meta-analysis. Nat Genet. 2020;52(7):680–91. doi: 10.1038/s41588-020-0637-y.
- 23. Spracklen CN, Horikoshi M, Kim YJ, Lin K, Bragg F, Moon S, et al. Identification of type 2 diabetes loci in 433,540 East Asian individuals. Nature. 2020;582(7811):240–5. doi: 10.1038/s41586-020-2263-3.
- 24. Mahajan A, Spracklen CN, Zhang W, Ng MCY, Petty LE, Kitajima H, et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat Genet. 2022;54(5):560–72. doi: 10.1038/s41588-022-01058-3.
- 25. Gaziano JM, Concato J, Brophy M, Fiore L, Pyarajan S, Breeling J, et al. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J Clin Epidemiol. 2016;70:214–23. doi: 10.1016/j.jclinepi.2015.09.016.
- 26. McDonald TJ, Nilsson G, Vagne M, Ghatei M, Bloom SR, Mutt V. A gastrin releasing peptide from the porcine nonantral gastric tissue. Gut. 1978;19(9):767–74. doi: 10.1136/gut.19.9.767.
- 27. McDonald TJ, Jörnvall H, Nilsson G, Vagne M, Ghatei M, Bloom SR, et al. Characterization of a gastrin releasing peptide from porcine non-antral gastric tissue. Biochem Biophys Res Commun. 1979;90(1):227–33. doi: 10.1016/0006-291x(79)91614-0.
- 28. Ladenheim EE, Taylor JE, Coy DH, Moore KA, Moran TH. Hindbrain GRP receptor blockade antagonizes feeding suppression by peripherally administered GRP. Am J Physiol. 1996;271(1 Pt 2):R180–4. doi: 10.1152/ajpregu.1996.271.1.R180.
- 29. Ladenheim EE, Hampton LL, Whitney AC, White WO, Battey JF, Moran TH. Disruptions in feeding and body weight control in gastrin-releasing peptide receptor deficient mice. J Endocrinol. 2002;174(2):273–81. doi: 10.1677/joe.0.1740273.
- 30. Persson K, Gingerich RL, Nayak S, Wada K, Wada E, Ahrén B. Reduced GLP-1 and insulin responses and glucose intolerance after gastric glucose in GRP receptor-deleted mice. Am J Physiol Endocrinol Metab. 2000;279(5):E956–62. doi: 10.1152/ajpendo.2000.279.5.E956.
- 31. Gutzwiller JP, Drewe J, Hildebrand P, Rossi L, Lauper JZ, Beglinger C. Effect of intravenous human gastrin-releasing peptide on food intake in humans. Gastroenterology. 1994;106(5):1168–73. doi: 10.1016/0016-5085(94)90006-x.
- 32. Mhalhal TR, Washington MC, Newman KD, Heath JC, Sayegh AI. Combined gastrin releasing peptide-29 and glucagon like peptide-1 reduce body weight more than each individual peptide in diet-induced obese male rats. Neuropeptides. 2018;67:71–8. doi: 10.1016/j.npep.2017.11.009.
- 33. Frullanti E, Berking C, Harbeck N, Jézéquel P, Haugen A, Mawrin C, et al. Meta and pooled analyses of FGFR4 Gly388Arg polymorphism as a cancer prognostic factor. Eur J Cancer Prev. 2011;20(4):340–7. doi: 10.1097/CEJ.0b013e3283457274.
- 34. Chou CH, Hsieh MJ, Chuang CY, Lin JT, Yeh CM, Tseng PY, et al. Functional FGFR4 Gly388Arg polymorphism contributes to oral squamous cell carcinoma susceptibility. Oncotarget. 2017;8(56):96225–38. doi: 10.18632/oncotarget.21958.
- 35. Xiong SW, Ma J, Feng F, Fu W, Shu SR, Ma T, et al. Functional FGFR4 Gly388Arg polymorphism contributes to cancer susceptibility: Evidence from meta-analysis. Oncotarget. 2017;8(15):25300–9. doi: 10.18632/oncotarget.15811.
- 36. Ulaganathan VK, Sperl B, Rapp UR, Ullrich A. Germline variant FGFR4 p.G388R exposes a membrane-proximal STAT3 binding site. Nature. 2015;528(7583):570–4. doi: 10.1038/nature16449.
- 37. Shin DJ, Osborne TF. FGF15/FGFR4 integrates growth factor signaling with hepatic bile acid metabolism and insulin action. J Biol Chem. 2009;284(17):11110–20. doi: 10.1074/jbc.M808747200.
- 38. Ge H, Zhang J, Gong Y, Gupte J, Ye J, Weiszmann J, et al. Fibroblast growth factor receptor 4 (FGFR4) deficiency improves insulin resistance and glucose metabolism under diet-induced obesity conditions. J Biol Chem. 2014;289(44):30470–80. doi: 10.1074/jbc.M114.592022.
- 39. Wu X, Ge H, Lemon B, Weiszmann J, Gupte J, Hawkins N, et al. Selective activation of FGFR4 by an FGF19 variant does not improve glucose metabolism in ob/ob mice. Proc Natl Acad Sci U S A. 2009;106(34):14379–84. doi: 10.1073/pnas.0907812106.
- 40. Huang X, Yang C, Luo Y, Jin C, Wang F, McKeehan WL. FGFR4 prevents hyperlipidemia and insulin resistance but underlies high-fat diet induced fatty liver. Diabetes. 2007;56(10):2501–10. doi: 10.2337/db07-0648.
- 41. Lutz SZ, Hennige AM, Peter A, Kovarova M, Totsikas C, Machann J, et al. The Gly385(388)Arg Polymorphism of the FGFR4 Receptor Regulates Hepatic Lipogenesis Under Healthy Diet. J Clin Endocrinol Metab. 2019;104(6):2041–53. doi: 10.1210/jc.2018-01573.
- 42. Micha R, Khatibzadeh S, Shi P, Fahimi S, Lim S, Andrews KG, et al. Global, regional, and national consumption levels of dietary fats and oils in 1990 and 2010: a systematic analysis including 266 country-specific nutrition surveys. BMJ. 2014;348:g2272. doi: 10.1136/bmj.g2272.
- 43.Lappalainen T, MacArthur DG. From variant to function in human disease genetics. Science. 2021;373(6562):1464–8. doi: 10.1126/science.abi8207. [DOI] [PubMed] [Google Scholar]
- 44.Cencioni MT, Mattoscio M, Magliozzi R, Bar-Or A, Muraro PA. B cells in multiple sclerosis - from targeted depletion to immune reconstitution therapies. Nat Rev Neurol. 2021;17(7):399–414. doi: 10.1038/s41582-021-00498-5. [DOI] [PubMed] [Google Scholar]
- 45.Granqvist M, Boremalm M, Poorghobad A, Svenningsson A, Salzer J, Frisell T, et al. Comparative Effectiveness of Rituximab and Other Initial Treatment Choices for Multiple Sclerosis. JAMA Neurol. 2018;75(3):320–7. doi: 10.1001/jamaneurol.2017.4011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Morris JA, Kemp JP, Youlten SE, Laurent L, Logan JG, Chai RC, et al. An atlas of genetic influences on osteoporosis in humans and mice. Nat Genet. 2019;51(2):258–66. doi: 10.1038/s41588-018-0302-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Medina-Gomez C, Kemp JP, Trajanoska K, Luan J, Chesi A, Ahluwalia TS, et al. Life-Course Genome-wide Association Study Meta-analysis of Total Body BMD and Assessment of Age-Specific Effects. Am J Hum Genet. 2018;102(1):88–102. doi: 10.1016/j.ajhg.2017.12.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Lotta LA, Wittemans LBL, Zuber V, Stewart ID, Sharp SJ, Luan J, et al. Association of Genetic Variants Related to Gluteofemoral vs Abdominal Fat Distribution With Type 2 Diabetes, Coronary Disease, and Cardiovascular Risk Factors. JAMA. 2018;320(24):2553–63. doi: 10.1001/jama.2018.19329. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Acosta-Herrera M, Kerick M, González-Serna D, Wijmenga C, Franke A, Gregersen PK, et al. Genome-wide meta-analysis reveals shared new loci in systemic seropositive rheumatic diseases. Ann Rheum Dis. 2019;78(3):311–9. doi: 10.1136/annrheumdis-2018-214127. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Robertson CC, Inshaw JRJ, Onengut-Gumuscu S, Chen WM, Santa Cruz DF, Yang H, et al. Fine-mapping, trans-ancestral and genomic analyses identify causal variants, cells, genes and drug targets for type 1 diabetes. Nat Genet. 2021;53(7):962–71. doi: 10.1038/s41588-021-00880-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Clarke LA, Giugliani R, Guffon N, Jones SA, Keenan HA, Munoz-Rojas MV, et al. Genotype-phenotype relationships in mucopolysaccharidosis type I (MPS I): Insights from the International MPS I Registry. Clin Genet. 2019;96(4):281–9. doi: 10.1111/cge.13583. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Kuehn SC, Koehne T, Cornils K, Markmann S, Riedel C, Pestka JM, et al. Impaired bone remodeling and its correction by combination therapy in a mouse model of mucopolysaccharidosis-I. Hum Mol Genet. 2015;24(24):7075–86. doi: 10.1093/hmg/ddv407. [DOI] [PubMed] [Google Scholar]
- 53.Backman JD, Li AH, Marcketta A, Sun D, Mbatchou J, Kessler MD, et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature. 2021;599(7886):628–34. doi: 10.1038/s41586-021-04103-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Szustakowski JD, Balasubramanian S, Kvikstad E, Khalid S, Bronson PG, Sasson A, et al. Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nat Genet. 2021;53(7):942–8. doi: 10.1038/s41588-021-00885-0. [DOI] [PubMed] [Google Scholar]
- 55.Plenge RM, Scolnick EM, Altshuler D. Validating therapeutic targets through human genetics. Nat Rev Drug Discov. 2013;12(8):581–94. doi: 10.1038/nrd4051. [DOI] [PubMed] [Google Scholar]
- 56.Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K, Schadt EE, et al. Common variants at 30 loci contribute to polygenic dyslipidemia. Nat Genet. 2009;41(1):56–65. doi: 10.1038/ng.291. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Lemke G. How macrophages deal with death. Nat Rev Immunol. 2019;19(9):539–49. doi: 10.1038/s41577-019-0167-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Miyanishi M, Tada K, Koike M, Uchiyama Y, Kitamura T, Nagata S. Identification of Tim4 as a phosphatidylserine receptor. Nature. 2007;450(7168):435–9. doi: 10.1038/nature06307. [DOI] [PubMed] [Google Scholar]
- 59.Kuchroo VK, Dardalhon V, Xiao S, Anderson AC. New roles for TIM family members in immune regulation. Nat Rev Immunol. 2008;8(8):577–80. doi: 10.1038/nri2366. [DOI] [PubMed] [Google Scholar]
- 60.Fazio S, Hasty AH, Carter KJ, Murray AB, Price JO, Linton MF. Leukocyte low density lipoprotein receptor (LDL-R) does not contribute to LDL clearance in vivo: bone marrow transplantation studies in the mouse. J Lipid Res. 1997;38(2):391–400. [PubMed] [Google Scholar]
- 61.Magalhaes MS, Smith P, Portman JR, Jackson-Jones LH, Bain CC, Ramachandran P, et al. Role of Tim4 in the regulation of ABCA1. Nat Commun. 2021;12(1):4434. doi: 10.1038/s41467-021-24684-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Alsheikh AJ, Wollenhaupt S, King EA, Reeb J, Ghosh S, Stolzenburg LR, et al. The landscape of GWAS validation; systematic review identifying 309 validated non-coding variants across 130 human diseases. BMC Med Genomics. 2022;15(1):74. doi: 10.1186/s12920-022-01216-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Porcu E, Rüeger S, Lepik K, Santoni FA, Reymond A, Kutalik Z, et al. Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits. Nat Commun. 2019;10(1):3300. doi: 10.1038/s41467-019-10936-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Võsa U, Claringbould A, Westra HJ, Bonder MJ, Deelen P, Zeng B, et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet. 2021;53(9):1300–10. doi: 10.1038/s41588-021-00913-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Gudmundsson S, Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, et al. Addendum: The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2021;597(7874):E3–E4. doi: 10.1038/s41586-021-03758-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Mostafavi H, Spence JP, Naqvi S, Pritchard JK. Limited overlap of eQTLs and GWAS hits due to systematic differences in discovery. bioRxiv. 2022:2022.05.07.491045 [Google Scholar]
- 67.Akbari P, Gilani A, Sosina O, Kosmicki JA, Khrimian L, Fang YY, et al. Sequencing of 640,000 exomes identifies GPR75 variants associated with protection from obesity. Science. 2021;373(6550) doi: 10.1126/science.abf8683. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Zhong W, Edfors F, Gummesson A, Bergström G, Fagerberg L, Uhlén M. Next generation plasma proteome profiling to monitor health and disease. Nat Commun. 2021;12(1):2493. doi: 10.1038/s41467-021-22767-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Assarsson E, Lundberg M, Holmquist G, Björkesten J, Thorsen SB, Ekman D, et al. Homogenous 96-plex PEA immunoassay exhibiting high sensitivity, specificity, and excellent scalability. PLoS One. 2014;9(4):e95192. doi: 10.1371/journal.pone.0095192. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Wang G, Sarkar A, Carbonetto P, Stephens M. A simple new approach to variable selection in regression, with application to genetic fine mapping. Royal Statistical Society. 2020:1273–300. doi: 10.1111/rssb.12388. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Jiang L, Zheng Z, Qi T, Kemper KE, Wray NR, Visscher PM, et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet. 2019;51(12):1749–55. doi: 10.1038/s41588-019-0530-8. [DOI] [PubMed] [Google Scholar]
- 72.Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–1. doi: 10.1093/bioinformatics/btq340. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17(1):122. doi: 10.1186/s13059-016-0974-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Consortium U. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49(D1):D480–D9. doi: 10.1093/nar/gkaa1100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Elsworth B, Lyon M, Alexander T, Liu Y, Matthews P, Hallett J, et al. The MRC IEU OpenGWAS data infrastructure. bioRxiv. 2020:2020.08.10.244293 [Google Scholar]
- 76.Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018;7 doi: 10.7554/eLife.34408. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10(5):e1004383. doi: 10.1371/journal.pgen.1004383. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Consortium G. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369(6509):1318–30. doi: 10.1126/science.aaz1776. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Foley CN, Staley JR, Breen PG, Sun BB, Kirk PDW, Burgess S, et al. A fast and efficient colocalization algorithm for identifying shared genetic risk factors across multiple traits. Nat Commun. 2021;12(1):764. doi: 10.1038/s41467-020-20885-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Loh PR, Tucker G, Bulik-Sullivan BK, Vilhjálmsson BJ, Finucane HK, Salem RM, et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet. 2015;47(3):284–90. doi: 10.1038/ng.3190. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Pulit SL, Stoneman C, Morris AP, Wood AR, Glastonbury CA, Tyrrell J, et al. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Hum Mol Genet. 2019;28(1):166–74. doi: 10.1093/hmg/ddy327. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
The EPIC-Norfolk data can be requested by bona fide researchers for specified scientific purposes via the study website (https://www.mrc-epid.cam.ac.uk/research/studies/epic-norfolk/). Data will either be shared through an institutional data sharing agreement or arrangements will be made for analyses to be conducted remotely without the need for data transfer.
Fine-mapped summary statistics for protein coding regions can be found here: https://doi.org/10.5281/zenodo.7576293. The genome-wide summary statistics resulting from the meta-analysis between discovery and replication samples (n = 2,887) can be downloaded for all protein targets included in this study from https://omicscience.org/.
Genome-wide association studies for anthropometric phenotypes have been conducted using the UK Biobank resource (application no. 44448). Access to the UK Biobank genotype and phenotype data is open to all approved health researchers (http://www.ukbiobank.ac.uk/).
GWAS Catalogue summary statistics (v.1.0.2) were downloaded (March 2022) from https://www.ebi.ac.uk/gwas/api/search/downloads/studiesalternative. All GTEx variant-gene cis-eQTL associations from each tissue were downloaded (January 2020) from https://console.cloud.google.com/storage/browser/gtex-resources. OpenGWAS summary statistics were accessed via the ieugwasr package in R v3.6.0.
Code availability
Associated code and scripts for the analysis are available on GitHub (https://github.com/MRC-Epid/pGWASOlinkEPIC).