PLOS One. 2022 May 23;17(5):e0268815. doi: 10.1371/journal.pone.0268815

Genetically regulated gene expression and proteins revealed discordant effects

Janne Pott 1,2,*, Tarcyane Garcia 1, Stefanie M Hauck 3, Agnese Petrera 3, Kerstin Wirkner 1,2, Markus Loeffler 1,2, Holger Kirsten 1,2, Annette Peters 3,4,5, Markus Scholz 1,2,*
Editor: Jie V Zhao 6
PMCID: PMC9126407  PMID: 35604899

Abstract

Background

Although gene-expression (GE) and protein levels are typically strongly genetically regulated, their correlation is known to be low. Here we investigate this phenomenon by focusing on the genetic background of this correlation in order to understand the similarities and differences in the genetic regulation of these omics layers.

Methods and results

We performed locus-wide association studies of 92 protein levels measured in whole blood for 2,014 samples of European ancestry and found that 66 are genetically regulated. Three female- and one male-specific effects were detected. We estimated the genetically regulated GE for all significant genes in 49 GTEx v8 tissues. A total of 7 proteins showed negative correlations with their respective GE across multiple tissues. Finally, we tested for causal links of GE on protein expression via Mendelian Randomization, and confirmed a negative causal effect of GE on protein level for five of these genes in a total of 63 gene-tissue pairs: BLMH, CASP3, CXCL16, IL6R, and SFTPD. For IL6R, we replicated the negative causal effect on coronary-artery disease (CAD), while its GE was positively linked to CAD.

Conclusion

While total GE and protein levels are only weakly correlated, we found high correlations between their genetically regulated components across multiple tissues. Of note, strong negative causal effects of tissue-specific GE on five protein levels were detected. Causal network analyses revealed that GE effects on CAD risk were in general mediated by protein levels.

Introduction

Several large-scale genome-wide association studies (GWASs) have identified more than 150 genetic risk loci for coronary artery disease (CAD) [1–6]. However, for the majority of these loci, the underlying molecular pathology remains to be elucidated. High-throughput proteomics could contribute to our understanding of molecular pathomechanisms by providing functional causal links between genetic loci, protein expression and cardiovascular disease traits.

While there is typically a strong relationship between genetics and transcriptomics via cis expression quantitative trait loci (eQTLs), some studies have shown only weak correlations between the transcriptomic and the proteomic layer [7–10]. Possible reasons include different half-lives of mRNA and the respective protein, post-transcriptional modifications, and tissue and compartment specificity [11]. Nevertheless, genetic effects on gene expression and on protein levels (protein quantitative trait loci, pQTLs) partly overlap, suggesting common genetic drivers. For example, the Framingham Heart Study detected 26 cis-pQTLs that overlapped with the respective eQTLs in whole blood, liver and heart tissues [12]. He et al. [13] analyzed the liver-specific proteome on a genome-wide scale and found an overlap with known liver eQTLs for about 40% of all tested genes.

In this study, we aimed to identify cis-pQTLs for a panel of 92 biomarkers of CAD measured by proximity extension assays in blood. To characterize these cis loci in more detail, we compared the effects of cis-eQTLs and cis-pQTLs at these loci. For this purpose, we analyzed the overlap of eQTLs and pQTLs by co-localization analyses and tested for association between genetically regulated gene expression (GE) across tissues and the respective blood protein expression (PE). Finally, the identified genetic associations were used to establish causal chains from genetics over transcriptomics and proteomics to CAD via concatenated Mendelian Randomization analyses.

Material and methods

Cohort description

All analyses were performed in participants of the LIFE-Adult study. In LIFE-Adult, 10,000 residents of the city of Leipzig, Germany, were randomly recruited in an age- and sex-stratified manner. All participants underwent in-depth examinations with respect to civilization diseases such as obesity, diabetes, cardiovascular diseases, cognitive impairment and mental disorders, as well as contributing environmental and lifestyle factors. Details can be found in Loeffler et al. [14]. Blood samples were taken from all participants after overnight fasting and were stored in the Leipzig Medical Biobank for subsequent analyses and measurements of genetic, transcriptomic and proteomic data. The overlap of omics data is displayed in S1 Fig in S1 File.

LIFE-Adult meets the ethical standards of the Declaration of Helsinki and is approved by the Ethics Committee of the Medical Faculty of the University Leipzig, Germany (Reg. No 263-2009-14122009). Written informed consent including agreement with genetic analyses was obtained from all participants. A basic description of samples used in this study can be found in Table 1.

Table 1. Basic sample description.

| Variable | Overall (n = 2,014) | Female (n = 974) | Male (n = 1,040) | P-value |
|---|---|---|---|---|
| Age (years) | 62.5 (11.5) | 62.0 (11.3) | 62.9 (11.6) | 2.47E-02 |
| BMI (kg/m²) | 27.7 (4.5) | 27.2 (4.8) | 28.1 (4.1) | 1.02E-06 |
| Current smoker | 320 (16.7%) | 147 (16.2%) | 173 (17.3%) | 5.49E-01 |
| Hypertension^a | 1166 (58.7%) | 496 (51.9%) | 670 (65.1%) | 2.91E-09 |
| Type 2 diabetes^a | 479 (23.8%) | 214 (22.0%) | 265 (25.5%) | 7.25E-02 |
| Statin therapy^b | 319 (15.9%) | 124 (12.8%) | 195 (18.8%) | 2.99E-04 |
| TC (mmol/l) | 5.70 (1.06) | 5.87 (1.06) | 5.53 (1.04) | 3.35E-13 |
| LDL-C (mmol/l) | 3.58 (0.95) | 3.59 (0.96) | 3.57 (0.95) | 5.88E-01 |
| HDL-C (mmol/l) | 1.62 (0.46) | 1.81 (0.46) | 1.44 (0.39) | 2.86E-73 |

For continuous parameters, the unit is given in parentheses, and arithmetic mean and standard deviation values are shown. For binary variables, total numbers and percentages are provided. Differences between sexes were tested with a chi-squared test for all binary parameters and with a Mann-Whitney U test for all continuous parameters. Abbreviations: BMI, body mass index; TC, total cholesterol; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol.

^a Anamnestic, based on medication, or determined by HbA1c > 6.5%;

^b ATC code beginning with C10.

Protein biomarker measurement

For proteomic profiling, we selected EDTA plasma samples of 2,024 elderly LIFE-Adult participants. Measurement of 92 CVD-related protein biomarkers was performed with the proximity extension assay (PEA) [15] using the Olink CVD Panel III. Measurements were performed in 23 batches, each including 88 samples and two identical controls. For eight samples, the Olink measurement failed, and two additional samples were excluded as outliers (Mahalanobis distance > 3 IQR), resulting in N = 2,014 samples available for further analyses.

Measurements were available for all of these samples except for three biomarkers: two assays (BLMH, CTSD) failed on one plate, resulting in N = 1,926 for these traits, and a single missing value of metalloproteinase 4 was mean-imputed. Across all 92 assays of the CVD III panel, the mean intra-assay (within-run) and inter-assay (between-run) coefficients of variation are reported to be 8.1% and 11.5%, respectively. We used normalized protein expression (NPX) units as semi-quantitative traits. Genetic data were available for all samples. An overview of all biomarkers including their distribution and genetic regions is given in S1 Table in S2 File.

Gene expression measurement

Isolated mRNA from whole blood of 3,527 samples was hybridized to Illumina HT-12 v4 Expression BeadChips (Illumina, San Diego, CA, USA) and gene expression (GE) was measured on the Illumina HiScan (47,231 raw GE probes). We then processed the data by log2-transformation, quantile-normalization [16, 17] and correction for batch effects [18] using R/Bioconductor.
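The log2 transformation and quantile normalization steps can be illustrated with a minimal sketch, assuming a probes × samples intensity matrix with strictly positive values. It is written in Python for illustration only (the processing above used R/Bioconductor), ties are ranked by position rather than averaged, and the batch correction step [18] is not shown; the function name is hypothetical.

```python
import numpy as np


def log2_quantile_normalize(raw: np.ndarray) -> np.ndarray:
    """Log2-transform a probes x samples matrix and quantile-normalize its columns.

    Simplification: tied values are ranked by position rather than averaged.
    """
    logged = np.log2(raw)
    # Rank of every value within its own sample (column), 0 = smallest
    ranks = np.argsort(np.argsort(logged, axis=0), axis=0)
    # Reference distribution: mean across samples of the sorted columns
    reference = np.sort(logged, axis=0).mean(axis=1)
    # Each value is replaced by the reference value at the same rank
    return reference[ranks]


raw = np.array([[2.0, 4.0, 8.0],
                [16.0, 2.0, 4.0],
                [8.0, 16.0, 2.0]])
normalized = log2_quantile_normalize(raw)
```

After normalization, every sample (column) has exactly the same set of values, which is the defining property of quantile normalization.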

Probes were excluded if (1) they were expressed in less than 5% of the samples, (2) they were still significantly associated with batch effects, or (3) they could not be mapped to a gene according to Ingenuity Pathway Analysis (IPA, QIAGEN Inc., accessed on 2019-04-04). In total, 20,972 valid GE probes remained, corresponding to 15,950 genes. We then searched for transcripts corresponding to the 92 proteome features of the PEA and found 91 probes passing quality control, matching 68 unique genes.

Samples were removed if (1) the number of detected GE probes deviated more than 3 IQRs from the median, (2) the Mahalanobis distance of several quality characteristics deviated more than 3 IQRs from the median [19], or (3) the Euclidean distance of expression values deviated more than 4 IQRs from the median [16]. Overall, 110 of the 3,527 assayed samples were removed for quality reasons. Of the remaining samples, 3,194 also had high-quality genetic data available.
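The "deviates more than k IQRs from the median" rule can be sketched as follows, here applied to a Mahalanobis distance computed over a per-sample matrix of quality characteristics. This is a minimal sketch under assumptions: which QC characteristics enter the distance is not specified above and is left as an input, and the function names are hypothetical.

```python
import numpy as np


def mahalanobis_distances(features: np.ndarray) -> np.ndarray:
    """Mahalanobis distance of each sample (row) from the multivariate mean."""
    centered = features - features.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(features, rowvar=False))
    # Row-wise quadratic form x^T S^-1 x, then square root
    return np.sqrt(np.einsum("ij,jk,ik->i", centered, cov_inv, centered))


def iqr_outliers(values: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Flag values deviating more than k * IQR from the median."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return np.abs(values - median) > k * (q3 - q1)
```

Usage: `iqr_outliers(mahalanobis_distances(qc_matrix), k=3.0)` for criterion (2), or `k=4.0` on Euclidean distances for criterion (3).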

Genotyping & Imputation

A total of 7,838 participants of LIFE-Adult were genotyped on the genome-wide SNP array Axiom CEU1 (Affymetrix). Genotype calling was performed using the software Affymetrix Power Tools (version 1.20.06). We conducted calling and quality control according to Affymetrix’s best practice steps [20].

SNPs were excluded if (1) their call rate was less than 97%, (2) there was a significant violation of Hardy-Weinberg equilibrium (p < 1×10^-6 for autosomal SNPs, p < 1×10^-4 for X-chromosomal SNPs in women only), (3) there was a significant plate association (p < 1×10^-7), or (4) cluster plot specific parameters according to Affymetrix's recommendation [20].
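The text does not state which Hardy-Weinberg test was applied. As an assumption for illustration, the sketch below uses the common one-degree-of-freedom chi-square goodness-of-fit test on genotype counts (exact tests are often preferred for rare variants) and applies the exclusion thresholds given above; function names are hypothetical.

```python
from math import erfc, sqrt


def hwe_chisq_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """One-df chi-square test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return erfc(sqrt(chi2 / 2.0))


def fails_hwe(p_value: float, x_chromosomal: bool = False) -> bool:
    """Exclusion thresholds from the text (X-chromosomal SNPs tested in women only)."""
    return p_value < (1e-4 if x_chromosomal else 1e-6)
```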

Samples were removed if (1) their signal contrast on the array was low (<0.82), (2) their call rate was less than 97%, (3) the estimated sex differed from the sex retrieved from the databank, (4) cryptic relatedness was observed (>0.6 [21]), or (5) the estimated genetic ethnicity was out of range (>6*SD in any of the first 10 principal components). There were 33 ethnic outliers, which were removed for all further analyses (see S2 Fig in S1 File). After filtering, LIFE-Adult was genetically homogeneous and we therefore refrained from correcting for population stratification via PCs in the main analyses, but included the first ten PCs in a sensitivity analysis of all lead SNPs per protein.

We imputed our SNP data using the 1000 Genomes Phase 3 reference [22], with SHAPEIT [23] v2r900 for prephasing and IMPUTE2 [24] v2.3.2 for genotype estimation. For this study, all SNPs with minor allele frequency (MAF) < 1% or imputation info score < 0.8 were excluded, resulting in 9,033,656 SNPs for further analyses.

Statistical analysis

An overview of our analysis plan is given in S3 Fig in S1 File.

Genetic association analyses for 92 protein biomarkers

For each of the 92 biomarkers, we performed genetic association analyses in the region of the gene coding for the biomarker, i.e., we searched for cis-pQTLs only. The region from gene start minus 500 kb to gene stop plus 500 kb was considered (see S1 Table in S2 File for the assumed gene starts and stops). The primary genetic association analysis was performed in all subjects (n = 2,014) with PE adjusted for age and sex. In a secondary analysis, we ran sex-stratified analyses (n = 974 female, n = 1,040 male) adjusting PE for age. For the analyses, we used the additive frequentist model with expected genotype counts as implemented in PLINK 2.0 [25]. We lifted our data from hg19 to hg38 using the GWAS Summary Statistics harmonization tool [26].
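The actual association tests were run with PLINK 2.0. The sketch below only illustrates the cis-window definition and an additive model of the same kind, i.e., protein level regressed by ordinary least squares on the expected allele dosage (0 to 2) with covariates such as age and sex. Function names are hypothetical, and p-value computation is omitted.

```python
import numpy as np

CIS_WINDOW_BP = 500_000  # gene start - 500 kb to gene stop + 500 kb


def in_cis_window(snp_pos: int, gene_start: int, gene_stop: int,
                  window: int = CIS_WINDOW_BP) -> bool:
    """True if the SNP lies in the cis-region of the gene."""
    return gene_start - window <= snp_pos <= gene_stop + window


def additive_association(dosage, protein, covariates):
    """OLS of protein level on allele dosage (0-2) plus covariates.

    Returns the dosage effect (beta) and its standard error.
    """
    protein = np.asarray(protein, dtype=float)
    n = protein.size
    X = np.column_stack([np.ones(n), dosage, covariates])
    coef, *_ = np.linalg.lstsq(X, protein, rcond=None)
    residuals = protein - X @ coef
    sigma2 = residuals @ residuals / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1], np.sqrt(cov[1, 1])
```

For the sex-stratified runs, the same function would be called separately in women and men with age as the only covariate.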

We pooled all cis-regions of our primary analysis and performed a hierarchical FDR correction as suggested for eQTLs by Peterson et al. [27]. In more detail, we first applied the Benjamini & Hochberg (BH) [28] correction to all SNP associations calculated for a specific PE and identified the SNP with the minimal corrected p-value (Simes p-value). Next, we applied BH to the 92 Simes p-values and tested with α1 = 0.05 to determine the k proteins showing significant associations. We then used α2 = 0.05 × k/92 as the significance threshold at the first level, as proposed by Benjamini & Bogomolov (BB) [29]. The SNP with the lowest significant p-value was denoted as the lead cis-pQTL of the respective protein. We then merged all significant associations and pruned the variants to a subset of markers in approximate linkage equilibrium with each other (r² < 0.1). Linkage disequilibrium (LD) was calculated using all LIFE-Adult participants. Finally, we annotated these independent variants with (1) other nearby genes (Ensembl, ±250 kb of the SNP position) [30], (2) known trait associations from the GWAS Catalog (LD r² > 0.3) [31], (3) known cis-eQTLs (LD r² > 0.3, αcis = 0.05) [32–35], and (4) CADD scores as a measure of deleteriousness [36]. We defined novel loci as regions whose lead SNP was not in LD with a variant reported for blood protein biomarker levels in the GWAS Catalog (LD r² ≤ 0.3).
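The two-level procedure described above can be sketched as follows, assuming one array of cis-SNP p-values per protein. It relies on the fact that the smallest BH-adjusted p-value of a set equals its Simes p-value. This is an illustrative sketch, not the authors' code; function names are hypothetical, and LD pruning and annotation are not included.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, capped at 1)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value downwards enforces monotonicity
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


def hierarchical_fdr(pvals_by_protein, alpha1=0.05):
    """Two-level FDR: Simes per protein, BH across proteins, BB threshold within."""
    names = list(pvals_by_protein)
    adjusted = {name: bh_adjust(pvals_by_protein[name]) for name in names}
    # The smallest BH-adjusted p-value of a protein equals its Simes p-value
    simes = np.array([adjusted[name].min() for name in names])
    selected = [name for name, q in zip(names, bh_adjust(simes)) if q <= alpha1]
    # Benjamini-Bogomolov: alpha2 = alpha1 * k / (number of proteins)
    alpha2 = alpha1 * len(selected) / len(names)
    return {name: np.flatnonzero(adjusted[name] <= alpha2) for name in selected}
```

The result maps each selected protein to the indices of its significant SNPs; the one with the smallest raw p-value would be the lead cis-pQTL.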

For all lead cis-pQTLs, we checked for sex-specific effects on PE and compared effect sizes between females and males by applying t-tests to the beta estimates [37]. We also searched for sex-specific significant loci by applying the same hierarchical FDR correction as described above. Finally, we looked up the eQTL summary statistics of all GTEx v8 [38] tissues for all lead SNPs and their associated genes and compared their effect directions with our pQTL findings. The GTEx data used for the analyses described in this manuscript were obtained from the GTEx Portal on 09.06.2020. We report those for whole blood and the (second) best-associated tissue. In addition, per locus we retrieved the best cis-eQTLs (defined by the lowest p-value per tissue and gene) and calculated pairwise LD (r²) with the respective lead pQTLs. To validate our findings, we performed a whole-blood cis-eQTL analysis in our LIFE data (using n = 3,194 samples with gene expression in whole blood and genetic data).
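The comparison of sex-specific effect sizes can be sketched as the difference of the two estimates divided by the square root of their summed squared standard errors, which is valid because women and men are disjoint samples. As an assumption, the p-value uses a normal approximation; with roughly 1,000 participants per sex, the t and normal distributions practically coincide. The function name is hypothetical.

```python
from math import erfc, sqrt


def sex_difference_test(beta_f, se_f, beta_m, se_m):
    """Compare female and male effect estimates from independent samples."""
    statistic = (beta_f - beta_m) / sqrt(se_f ** 2 + se_m ** 2)
    p_value = erfc(abs(statistic) / sqrt(2.0))  # two-sided, normal approximation
    return statistic, p_value
```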

Co-localization and association analyses between gene-expression and protein levels

In order to investigate the link between gene expression and protein levels in more detail, we performed three locus-wise analyses. First, we performed a pairwise co-localization test [39] between our pQTLs and eQTLs obtained from GTEx v8. This method tests whether two trait associations share the same causal variant, regardless of effect direction. Five hypotheses (H0 to H4) are evaluated in parallel, of which H4 states that the traits share the causal SNP, while H3 assumes two independent signals. As thresholds for co-localization and independence, we used posterior probabilities of ≥ 0.75 for H4 and H3, respectively. The region of co-localization was defined as the position of the lead pQTL ±500 kb. We used the R package "coloc" for this analysis [39].
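The core computation of this kind of co-localization test can be sketched in Python: Wakefield approximate Bayes factors per SNP are combined into posterior probabilities for H0 to H4. This is a sketch under assumptions, not the coloc package itself. It assumes standardized traits (prior effect SD 0.15, coloc's default for quantitative traits) and coloc's default priors p1 = p2 = 1e-4 and p12 = 1e-5. The priors actually used are not stated above, and all function names are hypothetical.

```python
import numpy as np


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor per SNP for a standardized trait."""
    beta, se = np.asarray(beta, float), np.asarray(se, float)
    z2 = (beta / se) ** 2
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (np.log(1.0 - r) + r * z2)


def logsumexp(x):
    x = np.asarray(x, float)
    m = x.max()
    return m + np.log(np.exp(x - m).sum())


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for two traits over the same SNPs."""
    labf1, labf2 = np.asarray(labf1, float), np.asarray(labf2, float)
    pair = labf1[:, None] + labf2[None, :]  # log(ABF1_i * ABF2_j)
    off_diagonal = pair[~np.eye(labf1.size, dtype=bool)]  # i != j: distinct variants
    log_h = np.array([
        0.0,                                                 # H0: no association
        np.log(p1) + logsumexp(labf1),                       # H1: trait 1 only
        np.log(p2) + logsumexp(labf2),                       # H2: trait 2 only
        np.log(p1) + np.log(p2) + logsumexp(off_diagonal),   # H3: two distinct variants
        np.log(p12) + logsumexp(np.diag(pair)),              # H4: one shared variant
    ])
    return np.exp(log_h - logsumexp(log_h))
```

Summing over distinct SNP pairs directly for H3 avoids the numerical cancellation that occurs when it is computed as a difference of two large sums.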

In a second analysis, the summary statistics of all proteomic features with significant cis-pQTLs were used to search for correlations with the respective genetically estimated gene expression (gGE) using the MetaXcan approach [40]. The expression prediction models were downloaded from the GitHub repository [41] (see also PredictDB [42]; GTEx v8 models using the elastic net algorithm). PredictDB contains only models that passed stringent criteria (e.g., number of SNPs used, posterior probability of being an eQTL). Hence, not all gene-tissue combinations were available for this analysis. In total, we tested 2,242 tissue-specific gGE for protein association. To adjust for multiple testing of several tissues per protein, we performed a hierarchical FDR correction as described above, with the tissues per protein as the first level and the analyzed proteins as the second level. We report findings in whole blood and the (second) best-associated tissue.
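MetaXcan (S-PrediXcan) does not need individual-level data. It combines the prediction-model weights with SNP-level summary statistics and an LD reference, approximating the z-score of predicted expression as the sum over SNPs of w_l · (σ_l / σ_g) · z_l, where σ_g² = wᵀΓw and Γ is the SNP covariance matrix. The sketch below implements only this formula and ignores allele harmonization and missing SNPs. Note that a negative model weight reverses the sign of a SNP's contribution. The function name is hypothetical.

```python
import numpy as np


def spredixcan_z(weights, snp_z, snp_cov):
    """Approximate z-score for the association of predicted expression with a trait.

    weights: prediction-model weights per SNP (effect allele harmonized to the GWAS)
    snp_z:   GWAS z-scores (beta / se) of the same SNPs
    snp_cov: covariance matrix of the SNP dosages from an LD reference panel
    """
    w = np.asarray(weights, dtype=float)
    cov = np.asarray(snp_cov, dtype=float)
    sd_snp = np.sqrt(np.diag(cov))
    sd_gene = np.sqrt(w @ cov @ w)  # SD of the predicted expression
    return float(np.sum(w * sd_snp / sd_gene * np.asarray(snp_z, dtype=float)))
```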

Finally, we validated the MetaXcan results obtained for whole blood using our measured gene-expression profiles. Raw gene-expression data were available for 45 of the 64 genes with pQTLs, and paired GE/proteome data were available for 1,048 samples. We estimated both Pearson's correlation and Pearson's partial correlation controlling for sex, age, and the percentages of lymphocytes and monocytes among total white blood cells. We repeated this analysis in the sex-stratified subsets.
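A Pearson partial correlation can be computed by correlating the residuals of both variables after regressing each on the covariates (with intercept). The minimal sketch below uses this approach; the function names are hypothetical. The example data show that a positive raw correlation can become negative once a shared covariate is removed.

```python
import numpy as np


def residualize(values, covariates):
    """Residuals of an OLS regression of values on an intercept plus covariates."""
    values = np.asarray(values, dtype=float)
    design = np.column_stack([np.ones(values.size), covariates])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef


def partial_corr(x, y, covariates):
    """Pearson partial correlation of x and y controlling for the covariates."""
    rx, ry = residualize(x, covariates), residualize(y, covariates)
    return float(np.corrcoef(rx, ry)[0, 1])


c = np.array([1, 2, 3, 4, 5, 6], dtype=float)   # shared covariate
e = np.array([1, -1, 2, -2, 0.5, -0.5])        # variable-specific part
x, y = 2 * c + e, 3 * c - e
```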

Mendelian randomization analyses

To investigate whether the observed associations between GE and PE were causal, we performed Mendelian Randomization (MR) analyses. As MR requires strong instruments, we used the best-associated cis-eQTL per tissue (lowest p-value, p < 5×10^-8; n = 428 SNPs for a total of 58 genes). To adjust for multiple testing, we performed a hierarchical FDR correction as described above, with the tissues as the first level and the analyzed genes as the second level.

Since the proteins on our panel are cardiovascular biomarkers, we also estimated the causal effects of protein levels on CAD. We considered only lead pQTLs reaching p < 5×10^-8 as instruments (n = 48 proteins). MR p-values were adjusted using Bonferroni correction for 48 tests. For proteins with a significant causal effect on CAD, we tested for causal chains GE → PE → CAD [43].

In all analyses, we used the ratio method and estimated the standard error using the first two terms of the delta method [44]. Summary statistics were obtained from our pQTL-analyses, from GTEx v8 [38] and from van der Harst et al. [6].
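For a single instrument, the ratio (Wald) estimate divides the SNP effect on the outcome by the SNP effect on the exposure. The first two terms of the delta method give the variance se_Y²/β_X² + β_Y²·se_X²/β_X⁴. A minimal sketch follows; the p-value uses a normal approximation, and the function name is hypothetical.

```python
from math import erfc, sqrt


def wald_ratio(beta_x, se_x, beta_y, se_y):
    """Ratio estimate of the causal effect of exposure X on outcome Y from one SNP.

    beta_x, se_x: SNP effect on the exposure (e.g. an eQTL)
    beta_y, se_y: SNP effect on the outcome (e.g. a pQTL)
    The standard error uses the first two terms of the delta method.
    """
    estimate = beta_y / beta_x
    se = sqrt(se_y ** 2 / beta_x ** 2 + beta_y ** 2 * se_x ** 2 / beta_x ** 4)
    p_value = erfc(abs(estimate / se) / sqrt(2.0))  # two-sided, normal approximation
    return estimate, se, p_value
```

With one instrument, the sign of the estimate is simply the sign of β_Y/β_X. An eQTL and a pQTL pointing in opposite directions therefore always yield a negative GE → PE estimate.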

Results

An overview of all 92 analyzed proteins, their abbreviations and full names is given in S1 Table in S2 File. In the following, only the gene name abbreviations are used, with names in regular font referring to PE and names in italics to GE. In addition, all main results are included in this table as TRUE/FALSE vectors summarizing the genetic associations, the GE correlations and the causal analyses per protein (for the combined setting).

SNP level results

After applying hierarchical FDR, we detected significant associations in or near the corresponding gene for 64 biomarkers (23,951 unique SNPs; see S2 Table in S2 File for an overview of all Simes p-values). Priority pruning revealed 758 independent SNPs (see S3 Table in S2 File for summary statistics and full annotation with nearby genes, GWAS Catalog traits and enriched pathways, and S4 Table in S2 File for lead pQTLs per protein). Several of these loci (n = 27) had already been described for association with blood protein levels [45–49]. Of the remaining 37 loci, 25 were previously reported for other traits (e.g., lipids, CAD-related traits, or blood fractions), while 12 loci had not been reported for any trait so far. These 37 loci are considered novel for our protein traits. Fig 1 shows a circular plot of all cis-regions with the -log10-transformed association p-values of our study and those of the best GTEx tissue per gene. Although we did not perform a classic GWAS, 16 of the 37 novel and 22 of the other loci also reached the classic genome-wide significance threshold of p < 5×10^-8 (based on the Simes p-value).

Fig 1. Circular plot of cis-associations.


Log-transformed p-values for cis-pQTLs and eQTLs are shown in the green and blue circles, respectively. We obtained the statistics for eQTLs from GTEx and present the results of the tissue with the strongest eQTL per gene (see S3 Table in S2 File). For plotting, the y-axis was truncated at -log(p) = 20, i.e., all larger -log(p) values were set to 20. The red circles mark the classic genome-wide significance threshold (p = 5×10^-8). Gene names are added for loci not yet described for blood protein levels and are colored according to the novelty level (blue: not described for any other trait; black: reported for other traits but not for blood protein levels).

In our sex-stratified approach, we detected 54 proteins significantly associated in men and 48 in women (see S2 Table in S2 File). Five proteins (GDF15, MPO, PAI, OPN, and TFF3) were associated in males but not in females and had lower Simes p-values than in the combined setting, suggesting male-specific loci. Of note, PAI was associated only in the male setting, not in the combined one. Similarly, NOTCH3 was associated only in the female setting, but not in the combined or male setting. In the following, all 66 proteins with an association in at least one setting are analyzed. Of the 110 lead SNPs across all settings (from the combined and, where they yielded other lead SNPs, the sex-stratified analyses), 18 showed a significant difference in effect size between men and women, but only five of them survived multiple testing correction (S4 Fig in S1 File), including NOTCH3 and MPO. The sex-stratified results are reported in S4 Table in S2 File. In our sensitivity analyses additionally adjusting for the first ten principal components, we found no significant bias, as the effect estimates were the same and their p-values increased only slightly. All associations remained significant according to our FDR threshold (see S4 Table in S2 File).

Next, we searched the GTEx database for associations of our 110 lead pQTLs with GE of the corresponding genes. We found 7,714 such associations across all 49 GTEx tissues. When we checked the direction consistency of the eQTLs and pQTLs, we surprisingly detected at least one discordant direction for 41 proteins. For 13 of them, the discordant direction was observed in most of the associated tissues (more than 75% of the tissues in which the eQTL was observed; see Table 2 and Fig 2). Restricted to whole blood, there were ten QTLs with discordant effects. To validate this finding in whole blood, we repeated the eQTL analysis in our LIFE data (GE available for 45 genes). Here, four of the ten SNPs were associated with p < 0.05 and showed discordant effect directions compared with the corresponding pQTL (SFTPD, BLMH, ACP5, and CXCL16). The other SNPs had the same effect direction or showed no significant effect in our data.
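Comparing eQTL and pQTL directions requires both effects to refer to the same effect allele, because eQTL resources and pQTL results may code different alleles as the effect allele. A minimal sketch of the harmonization and the "more than 75% of eQTL tissues" criterion follows, assuming biallelic SNPs with matching allele pairs (strand-ambiguous SNPs are not handled). Function names are hypothetical. The example values are the IL6R estimates from Table 2.

```python
from typing import Sequence


def harmonize_to_pqtl_allele(eqtl_beta: float, eqtl_effect_allele: str,
                             pqtl_effect_allele: str) -> float:
    """Express an eQTL effect relative to the pQTL effect allele (biallelic SNPs)."""
    return eqtl_beta if eqtl_effect_allele == pqtl_effect_allele else -eqtl_beta


def discordant_fraction(pqtl_beta: float, eqtl_betas: Sequence[float]) -> float:
    """Share of harmonized eQTL effects whose sign is opposite to the pQTL effect."""
    discordant = sum(1 for b in eqtl_betas if b * pqtl_beta < 0)
    return discordant / len(eqtl_betas)


def predominantly_discordant(pqtl_beta: float, eqtl_betas: Sequence[float]) -> bool:
    """Mirrors the criterion in the text: discordant in more than 75% of eQTL tissues."""
    return discordant_fraction(pqtl_beta, eqtl_betas) > 0.75


# IL6R (Table 2): pQTL beta 0.421; eQTL betas in whole blood and tibial artery
il6r_fraction = discordant_fraction(0.421, [-0.087, -0.194])
```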

Table 2. Comparison of effect direction of cis-pQTLs from our GWAS and cis-eQTLs from GTEx.

| Protein (ratio) | pQTL SNP (effect allele / EAF) | pQTL beta | pQTL p-value | eQTL beta, GTEx whole blood | eQTL p-value, GTEx whole blood | (Second) best GTEx tissue | eQTL beta, best tissue | eQTL p-value, best tissue |
|---|---|---|---|---|---|---|---|---|
| IL6R (17/18) | rs4129267 (T / 0.380) | 0.421 | 2.96×10^-323 | -0.087 | 5.40×10^-09 | Artery Tib. | -0.194 | 8.41×10^-17 |
| CCL15 (20/21) | rs41436444 (CAGGGCAG / 0.080) | 0.939 | 4.56×10^-292 | - | - | Lung | -0.635 | 1.65×10^-17 |
| CCL16 (3/4) | rs10445391 (G / 0.072) | -0.852 | 8.01×10^-155 | - | - | Thyroid | 0.220 | 2.27×10^-04 |
| SFTPD (42/42) | rs721917 (G / 0.406) | -0.475 | 1.37×10^-95 | 0.113 | 3.39×10^-03 | Artery Tib. | 0.661 | 1.90×10^-43 |
| BLMH (24/26) | rs7214248 (A / 0.346) | 0.193 | 4.29×10^-61 | -0.072 | 3.80×10^-04 | Artery Tib. | -0.221 | 1.02×10^-17 |
| ACP5 (35/37) | rs897811 (C / 0.116) | -0.163 | 1.70×10^-20 | 0.501 | 5.78×10^-19 | Thyroid | 0.376 | 9.71×10^-10 |
| TIMP4 (16/17) | rs392394 (A / 0.782) | 0.140 | 4.54×10^-17 | -0.024 | 4.36×10^-01 | Artery Tib. | -0.239 | 7.95×10^-10 |
| TNFRSF11B (2/2) | rs11300005 (C / 0.493) | 0.073 | 2.70×10^-11 | - | - | Eso. Mus. | -0.133 | 1.32×10^-02 |
| AXL (28/28) | rs3786556 (T / 0.184) | -0.446 | 9.39×10^-16 | -0.002 | 9.68×10^-01 | Artery Tib. | 0.227 | 2.13×10^-13 |
| CPA1 (7/9) | rs35454128 (C / 0.120) | 0.197 | 3.79×10^-08 | 0.117 | 8.18×10^-02 | Adipose Sub. | -0.237 | 1.95×10^-03 |
| CASP3 (14/15) | rs6845294 (A / 0.687) | 0.153 | 1.09×10^-07 | 0.026 | 1.54×10^-01 | Cells fibro. | -0.248 | 5.35×10^-17 |
| CDH5 (3/3) | rs16956504 (C / 0.110) | 0.105 | 1.10×10^-07 | 0.007 | 8.93×10^-01 | Pituitary | -0.161 | 2.14×10^-02 |
| CXCL16 (25/25) | rs145042193 (T / 0.208) | -0.064 | 8.22×10^-07 | 0.104 | 2.47×10^-03 | Cells fibro. | 0.362 | 4.92×10^-17 |

We show results of the 13 genes for which discordant effect directions between the pQTL and most of the respective eQTLs (more than 75% of all significant eQTLs across tissues) were observed. We also report eQTLs of whole blood and the (second) best-associated tissue in GTEx. The effect allele and its frequency (EAF) are given with the respective SNP ID. For four genes (SFTPD, BLMH, ACP5, and CXCL16), we could replicate the discordant effect direction in our LIFE data (see S2 Table in S2 File for more details).

Fig 2. Scatter plot of effect estimates of the 63 lead pQTL-SNPs on gene expression and protein levels.


As the focus was on the direction only, we did not normalize the effect estimates. Only SNPs with a significant eQTL in at least one tissue are displayed (p ≤ 0.05 in GTEx). Results of the 13 genes showing discordant pQTL and eQTL directions in more than 75% of eQTL tissues are labeled (see also Table 2 and S5 Table in S2 File).

Locus level results

Although all lead pQTLs were associated with GE in at least one tissue, the best eQTL was also the best pQTL in only 4% of the GE-tissue combinations, and the two SNPs were in some LD (r² > 0.1) in 30% of them. To determine whether these signals are interrelated, we performed co-localization analyses and tested for association between genetically regulated gene expression and protein levels. A summary of these tissue-specific analyses is shown as a Venn diagram (S5 Fig in S1 File) and in S6 Table in S2 File.
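The LD between the best eQTL and the lead pQTL can be estimated from genotype data as the squared Pearson correlation of allele dosages. This is a common approximation, although haplotype-based r² may differ slightly, and the software used for the LD calculation is not specified above. The minimal sketch below applies the "some LD" threshold used in the text; function names are hypothetical.

```python
import numpy as np


def ld_r2(dosage_a, dosage_b) -> float:
    """LD r^2 estimated as the squared Pearson correlation of allele dosages."""
    r = np.corrcoef(np.asarray(dosage_a, float), np.asarray(dosage_b, float))[0, 1]
    return float(r ** 2)


def in_some_ld(dosage_a, dosage_b, threshold: float = 0.1) -> bool:
    """'Some LD' as used in the text: r^2 above 0.1."""
    return ld_r2(dosage_a, dosage_b) > threshold
```

Because r is squared, the measure does not depend on which allele is counted, so r² = 1 also when one SNP's dosage is the mirror image (2 minus dosage) of the other.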

We observed 50 proteins with at least one shared (PP4 > 0.75) GE signal and 42 with at least one independent signal (PP3 > 0.75) across tissues. A total of 34 proteins showed both shared and independent signals in different tissues. Posterior probabilities for all pairs can be found in S7 Table in S2 File and are displayed in S6 Fig in S1 File. We compared the gene-tissue combinations with shared and independent signals to those with high and low LD between the best eQTL and the pQTL. Among the higher-LD (r² > 0.1) combinations, shared and independent signals were almost equally frequent (n = 212 with PP3 > 0.75, n = 295 with PP4 > 0.75). This shows that LD does not guarantee co-localization. In contrast, low-LD pairs showed a clear trend towards independent signals (n = 189 with PP3 > 0.75, n = 14 with PP4 > 0.75).

We estimated the genetically regulated gene expression in all GTEx v8 tissues using MetaXcan. Here, several SNPs at each gene locus are selected and included in the GE prediction model. The predicted GE was then tested for association with the respective protein. After applying hierarchical FDR, we detected significant associations for 58 of the 64 considered biomarkers in at least one tissue (n = 1,474 significant tests out of 2,242). Most genes showed this association in about half of all tissues (median of 27 associated tissues). Exceptions are LTBR and IL17RA, showing GE-PE associations across many tissues (49 and 42 tissues, respectively).

We compared the MetaXcan results with those of the co-localization analyses. The intersection of significant MetaXcan associations and co-localization results (PP3 or PP4 > 0.75) comprised n = 531 gene-tissue pairs, of which n = 288 showed co-localization and n = 243 indicated independent signals. This shows that the results of MetaXcan-based gene-expression association analysis are only loosely related to those of co-localization. All MetaXcan results are summarized in S8 Table in S2 File.

We checked the direction of the correlation between tissue-specific GE and PE. A total of 42 proteins showed an opposite (negative) direction of effect in at least one tissue. Seven of them showed this negative association in most tissues (see Table 3), including six that were also identified by the comparison of eQTL and pQTL effect directions (see Table 2).

Table 3. Proteins showing predominantly negative correlation between tissue-specific GE and blood PE.

| Protein | #tissues (neg/tot) | Effect | P-value | Tissue |
|---|---|---|---|---|
| IL6R | 25/29 | -2.177 | 7.11×10^-303 | Artery Tib. |
| | | -2.965 | 8.21×10^-197 | WB |
| SFTPD | 33/40 | -6.003 | 7.69×10^-77 | Colon Trans. |
| BLMH | 12/15 | -0.985 | 1.48×10^-58 | Artery Tib. |
| | | -1.052 | 8.11×10^-15 | WB |
| TIMP4 | 29/36 | -0.371 | 8.00×10^-17 | Brain Ant. Cingulate Cortex |
| CASP3 | 25/29 | 0.865 | 1.51×10^-7 | Brain Putamen |
| | | -109.229 | 2.78×10^-7 | WB |
| CXCL16 | 41/41 | -0.242 | 6.70×10^-7 | Kidney Cortex |
| | | -0.067 | 1.54×10^-2 | WB |
| TREML2 | 13/14 | -40.610 | 5.85×10^-6 | Thyroid |
| | | -0.876 | 3.58×10^-4 | WB |

MetaXcan results are shown for the best-associated tissue with negative effect estimate and whole blood (WB), if a prediction model was available. Neg = Number of significant negative correlations across tissues. Tot = Total number of significant associations.

Finally, we compared the MetaXcan-derived GE-protein associations with GE-protein associations based on raw whole-blood GE data from our LIFE study. GE data were available for 45 genes. Among those, we detected 21 significant partial correlations controlling for age, sex and white blood cell composition (S9 Table in S2 File). Comparing these results with the respective MetaXcan results in whole blood, we found eight pairs that were significant in both analyses. For all of them, the effect direction agreed, with PE negatively correlated with both total GE and genetically estimated GE in whole blood.

Causal network of gene expression, protein levels and CAD

We performed Mendelian Randomization analyses of the causal relationship between GE and PE for all tissues for which a strong eQTL (p < 5×10^-8) was available. Accordingly, we tested 58 genes in up to 27 tissues (n = 670 tests in the combined setting). We detected causal links for 51 genes (501 gene-tissue pairs), predominantly with positive effects (364 pairs with a positive causal effect of GE on PE). A summary of all instruments, tissues, and causal effects is given in S10 Table in S2 File.

For 419 gene-tissue pairs, we found both a significant MetaXcan association and a significant MR effect. Among those, 398 showed concordant directions of the GE-PE effect in both analyses (92 pairs with negative effect, 306 with positive effect), i.e., the effect directions largely agree. The 92 pairs with negative effect comprise 20 unique genes. Twelve of them show this relation in more than 75% of associated tissues, including five genes described in the previous sections (BLMH, CASP3, CXCL16, IL6R, and SFTPD). Summaries of the analysis results for these genes are shown in Fig 3, S7 Fig in S1 File, and Table 4.

Fig 3. Overlap of genes with negative correlation to protein levels according to different analysis strategies.


We present the genes with negative protein correlations in more than 75% of the significant tissues according to MetaXcan (MX), negative causal effect estimates according to Mendelian Randomization (MR), and opposite effect directions of eQTLs and pQTLs (see also Tables 2–4).

Table 4. Proteins with predominantly negative causal links of GE and PE.

| Protein | #tissues (neg/tot) | MR causal effect estimate | MR p-value | MetaXcan association effect estimate | MetaXcan p-value | Tissue |
|---|---|---|---|---|---|---|
| IL6R | 5/5 | -0.713 | 5.57×10^-27 | 0.126 | 4.63×10^-06 | Testis |
| | | -3.008 | 1.35×10^-16 | -2.965 | 8.21×10^-197 | WB |
| SFTPD | 22/25 | -0.347 | 2.64×10^-17 | -0.050 | 5.68×10^-04 | Heart AA |
| | | -0.749 | 6.43×10^-06 | | | WB |
| BLMH | 2/2 | -0.873 | 3.07×10^-15 | -0.985 | 1.48×10^-58 | Artery Tib. |
| CCL15 | 15/16 | -0.459 | 1.02×10^-14 | | | Nerve Tib. |
| PI3 | 1/1 | -0.731 | 1.02×10^-08 | | | WB |
| AXL | 6/6 | -0.243 | 6.03×10^-06 | | | Artery Tib. |
| CCL16 | 15/15 | -0.181 | 1.28×10^-05 | | | Liver |
| CASP3 | 5/6 | -0.440 | 1.89×10^-05 | -0.173 | 1.57×10^-06 | Cells fibro. |
| | | 0.914 | 1.83×10^-05 | 0.865 | 1.51×10^-07 | WB |
| TFRC | 1/1 | -0.404 | 7.94×10^-05 | | | Lung |
| CXCL16 | 17/17 | -0.225 | 1.69×10^-04 | -0.213 | 1.68×10^-05 | Thyroid |
| | | -0.097 | 2.97×10^-03 | -0.067 | 1.54×10^-02 | WB |
| CDH5 | 1/1 | -0.148 | 1.92×10^-02 | | | Thyroid |
| CD163 | 1/1 | -0.129 | 3.66×10^-02 | | | Testis |

MR results are shown for the best tissue with a negative effect estimate and for whole blood (WB), if available. We added the respective MetaXcan results for comparison. Neg = Number of significant negative causal estimates across tissues. Tot = Total number of significant MR tests.

Next, we tested the causal link between the biomarkers and coronary artery disease. We restricted the analyses to lead pQTLs with p < 5×10^-8 and available CAD statistics from van der Harst et al. [6]. This left us with 47 biomarkers. Results are given in S11 Table in S2 File, and a scatter plot for all pairs is shown in S8 Fig in S1 File. Four proteins showed a significant effect in the combined setting: IL6R (βIV = -0.094, p = 4.90×10^-14), PCSK9 (βIV = 0.540, p = 4.72×10^-10), TFPI (βIV = -0.116, p = 7.83×10^-5), and AXL (βIV = 0.487, p = 6.09×10^-5). IL6R also had significant causal estimates in the sex-stratified settings. The estimated causal effect of TFPI was also significant in females, but in males it reached only nominal significance, which did not survive multiple testing correction. PCSK9 was also causally linked to CAD in males, while in females the PCSK9 instrument did not reach the significance threshold for inclusion in the MR analysis. For AXL, neither sex-stratified instrument reached the significance threshold, so both were excluded.

Finally, we searched for causal chains from GE over PE to CAD for all four proteins with a significant causal link to CAD that were also causally affected by gene expression (n = 50 tests). The total causal effect estimates of GE on CAD were significant in 26 gene-tissue pairs and negative in four of them (TFPI in tibial artery and in cultured fibroblasts, each in the combined and the female setting). AXL was not causally linked to CAD in any tissue. The indirect effect, i.e. the effect of GE on CAD mediated by PE, was estimated as the product of the GE → PE and PE → CAD effects. These indirect effect estimates were significant for all 50 gene-tissue pairs, and for 45 of them no significant difference between the total and the indirect GE-CAD effect was observed, indicating complete mediation of the GE effect on CAD via PE. We summarized the results in S12 Table in S2 File and displayed the causal chains in Fig 4.
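The product-of-coefficients logic behind the indirect GE → PE → CAD effect can be sketched as follows. This is a minimal Python illustration, assuming the two MR estimates are independent and using a first-order delta-method (Sobel-type) standard error; the exact variance formula used in the study is not stated in this section, and the function name and inputs are hypothetical.

```python
import math
from typing import Tuple


def indirect_effect(b_ge_pe: float, se_ge_pe: float,
                    b_pe_cad: float, se_pe_cad: float) -> Tuple[float, float, float]:
    """Indirect effect of GE on CAD via PE (product of coefficients).

    The standard error uses the first-order delta method (Sobel), which
    assumes the GE -> PE and PE -> CAD estimates are independent.
    Returns (b_indirect, se_indirect, p_value).
    """
    b_ind = b_ge_pe * b_pe_cad
    se_ind = math.sqrt((b_pe_cad * se_ge_pe) ** 2 + (b_ge_pe * se_pe_cad) ** 2)
    z = b_ind / se_ind
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return b_ind, se_ind, p
```

For hypothetical estimates GE → PE = -0.7 (SE 0.1) and PE → CAD = -0.1 (SE 0.02), the indirect effect is +0.07: two negative links yield a positive indirect effect, which is the sign pattern described for IL6R.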

Fig 4. Graphical overview of causal networks from gene expression (GE) over protein expression (PE) to the outcome coronary artery disease (CAD).

Orange arrows indicate a negative causal effect, either of protein level on CAD (IL6R and TFPI) or of gene expression on protein level; blue arrows denote positive links. The settings are indicated by c (combined), m (males) and f (females). Tissues in which gene expression showed a significant indirect effect on CAD are listed next to their gene; tissues in bold additionally showed a significant direct effect.

Discussion

In this work, we performed a genetic cis-association analysis of 92 cardiovascular biomarkers and found 66 of them to be genetically regulated. We used these signals to unravel the relationship between cis-pQTLs and cis-eQTLs by (1) testing for co-localization of signals, (2) analyzing the correlation of genetically regulated GE and PE, and (3) testing for causal effects underlying the GE/PE associations. Finally, we established causal chains of GE, PE and CAD across tissues.

In our study, we focused on cis-effects rather than on a hypothesis-free genome-wide approach. As in eQTL analyses, cis-effects are more likely to be true positives, whereas trans-effects are more often false positives and require much more stringent false positive control, which limits the power of this type of analysis. We checked the GWAS catalog for known associations and found that 37 of our associations are novel. Moreover, for 38 of the 66 associated proteins, our lead SNP achieved genome-wide significance (p<5x10^-8).

For example, PI3 was a novel, genome-wide significant locus that was also associated in the sex-stratified analyses. It showed a significant sex-related effect, i.e. the effect estimate in men was twice that in women. We tested our own GE and eQTL data for this interaction but could not confirm this sex-dimorphism at the GE layer. Moreover, we detected co-localization of eQTLs and pQTLs for men (whole blood), but not for women. In both MetaXcan and MR, the estimates were significant in each sex, but the effect was stronger in men (MR in whole blood: βmen = -0.990, βwomen = -0.481, pIA = 0.002). PI3 codes for elafin, which has been linked to the inflammatory response in atherosclerosis [50] and myocardial infarction [51]. Most atherosclerotic outcomes also show sexual dimorphism, with higher risk for men. This makes elafin an interesting target for further studies of sex-dimorphisms in cardiovascular research.
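The sex-interaction p-value (pIA) compares two sex-specific estimates. A minimal Python sketch of the difference test described by Altman and Bland [37], which we assume underlies such comparisons, treats the two strata as independent samples. The function name is hypothetical, and the standard errors in the example are placeholders because they are not reported in this section, so the resulting p-value is not the reported pIA.

```python
import math
from typing import Tuple


def interaction_test(beta_1: float, se_1: float,
                     beta_2: float, se_2: float) -> Tuple[float, float, float]:
    """Difference between two independent estimates (e.g. men vs women).

    z = (beta_1 - beta_2) / sqrt(se_1^2 + se_2^2), compared with N(0, 1).
    Returns (difference, se_difference, two-sided p-value).
    """
    diff = beta_1 - beta_2
    se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2)
    z = diff / se_diff
    return diff, se_diff, math.erfc(abs(z) / math.sqrt(2.0))
```

For example, `interaction_test(-0.990, 0.15, -0.481, 0.15)` uses the whole-blood MR estimates for men and women quoted above together with placeholder SEs of 0.15 each and yields a difference of -0.509.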

To unravel the relationship between GE and PE, we performed several analyses across different tissues, since plasma PE does not necessarily originate from whole blood. In most cases, the lead pQTL was not the best-associated eQTL. We tested pairwise LD between eQTLs and pQTLs and found most pairs to be in low LD (r²<0.1) across tissues and proteins. When comparing LD results with our co-localization results, we found, as expected, that signals of low-LD pairs were often predicted to be independent. More surprisingly, for the high-LD pairs, independence and co-localization were predicted about equally often, i.e. high LD does not ensure co-localization of the signals. For example, IL17RA and SFTPD had high-LD pairs in 34 and 39 tissues, respectively. IL17RA showed co-localization of these signals in all these tissues. In contrast, for SFTPD, co-localization was refuted for all these signals.
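Pairwise LD between a lead eQTL and a lead pQTL is usually summarized as r². As a minimal Python sketch, assuming allele dosages (0 to 2) of both SNPs are available for the same individuals (in practice, LD is often taken from a reference panel instead), r² can be estimated as the squared correlation of the dosages and compared with the r²<0.1 cut-off for low LD mentioned above. The function names are illustrative.

```python
from typing import Sequence


def ld_r2(dosage_a: Sequence[float], dosage_b: Sequence[float]) -> float:
    """Squared Pearson correlation of two SNPs' allele dosages (0 to 2).

    Genotype-based r^2 is a common approximation of haplotype LD when
    phased data are not used.
    """
    n = len(dosage_a)
    if n != len(dosage_b) or n < 2:
        raise ValueError("need two dosage vectors of equal length >= 2")
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    # The 1/n normalizations cancel, so plain sums of products suffice
    return cov ** 2 / (var_a * var_b)


def is_low_ld(r2: float, threshold: float = 0.1) -> bool:
    """Apply the r^2 < 0.1 cut-off for low LD used in the text."""
    return r2 < threshold
```

For toy genotypes, `ld_r2([0, 0, 2, 2], [0, 2, 0, 2])` returns 0.0 (uncorrelated), while two perfectly correlated dosage vectors return 1.0.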

In general, we observed good agreement between MetaXcan and MR results, although the MetaXcan approach does not establish causality per se and is based on gene models different from the instruments used for MR. In contrast, we observed only a moderate overlap between co-localization results and MetaXcan / MR results. Signals with co-localization but no significant causal estimate could be explained by pleiotropic effects, while causality without co-localization could be explained by locus heterogeneity, i.e. different causal e- and pQTLs. However, a large proportion of the detected associations was found for high-LD pairs of e- and pQTLs (co-localization: 295 of 313 gene-tissue pairs [94.2%]; MetaXcan: 659 of 1,474 [44.7%]; MR: 357 of 501 [71.3%]).
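The co-localization calls discussed here (independent signals versus a shared causal variant) can be illustrated with the approximate-Bayes-factor test of Giambartolomei et al. [39]. The Python sketch below is a minimal re-implementation of those published formulas, not the coloc R package itself. It assumes per-SNP effect estimates and standard errors for the same SNPs in both studies, the package's default priors (p1 = p2 = 1e-4, p12 = 1e-5) and a prior effect SD of 0.15 for quantitative traits; the priors and decision thresholds used in this study are not stated in this section, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_sd: float = 0.15) -> float:
    """Wakefield log approximate Bayes factor of one SNP (quantitative trait)."""
    v = se ** 2            # sampling variance of beta
    w = prior_sd ** 2      # prior variance of the true effect
    r = w / (w + v)
    z2 = (beta / se) ** 2
    return 0.5 * (math.log(1.0 - r) + r * z2)


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in values))


def coloc_posteriors(labf1: Sequence[float], labf2: Sequence[float],
                     p1: float = 1e-4, p2: float = 1e-4,
                     p12: float = 1e-5) -> List[float]:
    """Posterior probabilities of H0..H4 for SNPs present in both studies.

    H0: no association, H1/H2: association with trait 1/2 only,
    H3: two distinct causal variants, H4: one shared causal variant.
    """
    s1 = log_sum_exp(labf1)
    s2 = log_sum_exp(labf2)
    s12 = log_sum_exp([a + b for a, b in zip(labf1, labf2)])
    # H3 sums over SNP pairs i != j: log(exp(s1 + s2) - exp(s12))
    gap = s12 - (s1 + s2)
    l3 = float("-inf") if gap >= 0 else s1 + s2 + math.log(-math.expm1(gap))
    log_h = [
        0.0,
        math.log(p1) + s1,
        math.log(p2) + s2,
        math.log(p1) + math.log(p2) + l3,
        math.log(p12) + s12,
    ]
    denom = log_sum_exp(log_h)
    return [math.exp(x - denom) for x in log_h]
```

A pair would typically be called co-localized when the posterior for H4 is high and independent when H3 dominates; any specific cut-off, such as 0.8, is an illustrative choice here.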

Most interestingly, we detected five gene-protein pairs that consistently showed opposite effect directions of eQTLs and pQTLs, negative correlation of GE and PE (MetaXcan) and negative causal effects (Mendelian Randomization): BLMH, CASP3, CXCL16, IL6R and SFTPD. While BLMH was co-localized in 10 of 12 analyzed tissues, the other four genes had one co-localizing tissue each, with all other tissues suggesting independent signals. We assessed the functional relevance of this observation by a GTEx v8 look-up of gene expression levels by tissue [38]. While BLMH is most highly expressed in skin, it is also expressed in artery tissues, for which we observed negative associations and, in tibial artery, also a negative causal effect. High expression levels in GTEx v8 [38] together with negative links in our analyses were also found for CXCL16 in testis, whole blood, and skin, and for IL6R in skeletal muscle, whole blood, esophagus muscularis, and transverse colon. CASP3 had negative links between GE and PE in several tissues, but also four positive links in brain (substantia nigra), liver, pancreas and whole blood. The negative links were detected in tissues with higher GE, e.g. cultured fibroblasts. SFTPD is highly expressed in lung tissue, in which we found independent e- and pQTL signals and a positive causal estimate. Negative links were observed in other tissues with lower SFTPD expression, e.g. thyroid. Thus, for SFTPD, the relationship of GE and PE could be dominated by individual tissues showing a positive correlation, such as lung, while for the other four genes the negative links occur in tissues with substantial gene expression. This suggests a functional relevance of the negative links for BLMH, CASP3, CXCL16, and IL6R that needs further biological validation.

Consistent negative effects between GE and PE are of particular interest for further studies, since the underlying mechanism is unclear. Possible explanations include (1) tissue-specific protein levels that differ from those measured in whole blood; (2) whole blood acting only as a transport compartment to a specific target tissue; (3) upregulation of pathways in which the protein is further metabolized; (4) post-translational modifications that influence protein degradation; or (5) upregulation of genes in response to increased consumption or degradation of the protein.

Among our results, IL6R showed the most pronounced effects, with negative causal effects in whole blood, testis, tibial artery and transverse colon. We speculate that inflammatory conditions consume IL6R, resulting in low plasma levels, while gene expression increases to counter the IL6R loss. The negative effect of IL6R on CAD was previously reported by Yuan et al. [52] and could be explained by reduced inflammation. However, the GE effect on CAD was positive and mainly mediated by IL6R PE, consistent with the product of the negative GE → PE and the negative PE → CAD effects.

In our MR mediation analyses, we found several indirect causal links of tissue-specific GE over PE in whole blood on CAD. Only for TFPI did we find indirect effects from heart tissues (atrial appendage and left ventricle). Although these two tissues might be more specific for CAD, cis-effects are usually shared across tissues, with the exception of brain tissues [38]. Hence, analyses in other tissues with larger sample sizes, such as muscle and whole blood, might still capture the true contribution of gene expression. In addition, we do not know which tissue our measured proteins originate from. Therefore, it is also possible that increased GE in CAD-unrelated tissues leads to higher blood protein levels, and that the proteins are then transported to heart tissues, where they could increase the risk of an event. The direct effects of AXL detected in four tissues could be false positives, given that the corresponding total effect was not significant. The direct effect of PCSK9 gene expression occurs in adipose tissue (visceral omentum), where PCSK9 is only weakly expressed (TPM = 0.27). This needs further biological validation.

One limitation of this study is its relatively small sample size and, with it, reduced power. We therefore refrained from analyzing trans-pQTLs and focused on cis-effects. Larger studies or meta-analyses are required to resolve this limitation. Trans-associations observed in such studies could then be used as independent instruments for a bidirectional Mendelian Randomization analysis, testing whether protein levels causally affect gene expression (reverse causality), an issue that could not be addressed in our study.

In conclusion, we discovered several causal links of tissue-specific gene-expression and blood protein levels of cardiovascular biomarkers. Observed negative causal relationships are of interest for further studies to unravel the underlying post-transcriptional or pathway-associated regulatory processes. Finally, we established a causal patho-mechanistic network of GE and PE of IL6R, PCSK9, TFPI, and AXL and coronary artery disease providing possible new therapy targets.

Supporting information

S1 File

(PDF)

S2 File

(XLSX)

Acknowledgments

We thank all study participants of the LIFE-Adult study whose personal dedication and commitment have made this project possible. LIFE-Adult genotyping (round 3) was done at the Cologne Center for Genomics (CCG, University of Cologne, Peter Nürnberg and Mohammad R. Toliat). For genotype imputation, compute infrastructure provided by ScaDS (Dresden/Leipzig Competence Center for Scalable Data Services and Solutions) at the Leipzig University Computing Centre was used.

Data Availability

All summary statistics are publicly available from Zenodo (DOI: 10.5281/zenodo.6045694). Scripts used in the secondary analyses are available at https://github.com/GenStatLeipzig/LWAS_Olink. Complete data sets including genetic data cannot be shared publicly due to ethical and legal restrictions, as they would be sufficient to identify study participants; such sharing is not covered by the informed consent form of the LIFE-Adult study. Data are available from the LIFE Research Center (contact via Dr. Matthias Nüchter, Head of Managing Office, E-mail: matthias.nuechter@life.uni-leipzig.de) for researchers who meet the criteria for access to confidential data.

Funding Statement

LIFE-Adult is funded by the Leipzig Research Center for Civilization Diseases (LIFE). LIFE is an organizational unit affiliated to the Medical Faculty of the University of Leipzig. LIFE is funded by means of the European Union, by the European Regional Development Fund (ERDF) and by funds of the Free State of Saxony within the framework of the excellence initiative. Olink measurements were funded by the HI-MAG project "Serum proteome biomarkers as mediators of cardiometabolic disease development" of the Medical Faculty of the University of Leipzig and the Helmholtz Zentrum München. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Nikpay M, Goel A, Won H-H, Hall LM, Willenborg C, Kanoni S, et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet. 2015; 47:1121–30. Epub 2015/09/07. doi: 10.1038/ng.3396.
  • 2. Howson JMM, Zhao W, Barnes DR, Ho W-K, Young R, Paul DS, et al. Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms. Nat Genet. 2017; 49:1113–9. Epub 2017/05/22. doi: 10.1038/ng.3874.
  • 3. Klarin D, Zhu QM, Emdin CA, Chaffin M, Horner S, McMillan BJ, et al. Genetic analysis in UK Biobank links insulin resistance and transendothelial migration pathways to coronary artery disease. Nat Genet. 2017; 49:1392–7. Epub 2017/07/17. doi: 10.1038/ng.3914.
  • 4. Nelson CP, Goel A, Butterworth AS, Kanoni S, Webb TR, Marouli E, et al. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat Genet. 2017; 49:1385–91. Epub 2017/07/17. doi: 10.1038/ng.3913.
  • 5. Verweij N, Eppinga RN, Hagemeijer Y, van der Harst P. Identification of 15 novel risk loci for coronary artery disease and genetic risk of recurrent events, atrial fibrillation and heart failure. Sci Rep. 2017; 7:2761. Epub 2017/06/05. doi: 10.1038/s41598-017-03062-8.
  • 6. van der Harst P, Verweij N. Identification of 64 Novel Genetic Loci Provides an Expanded View on the Genetic Architecture of Coronary Artery Disease. Circ Res. 2018; 122:433–43. Epub 2017/12/06. doi: 10.1161/CIRCRESAHA.117.312086.
  • 7. Chen G, Gharib TG, Huang C-C, Taylor JMG, Misek DE, Kardia SLR, et al. Discordant protein and mRNA expression in lung adenocarcinomas. Mol Cell Proteomics. 2002; 1:304–13. doi: 10.1074/mcp.m200008-mcp200.
  • 8. Pascal LE, True LD, Campbell DS, Deutsch EW, Risk M, Coleman IM, et al. Correlation of mRNA and protein levels: cell type-specific gene expression of cluster designation antigens in the prostate. BMC Genomics. 2008; 9:246. Epub 2008/05/23. doi: 10.1186/1471-2164-9-246.
  • 9. Ghazalpour A, Bennett B, Petyuk VA, Orozco L, Hagopian R, Mungrue IN, et al. Comparative analysis of proteome and transcriptome variation in mouse. PLoS Genet. 2011; 7:e1001393. Epub 2011/06/09. doi: 10.1371/journal.pgen.1001393.
  • 10. Yeung ES. Genome-wide correlation between mRNA and protein in a single cell. Angew Chem Int Ed Engl. 2011; 50:583–5. doi: 10.1002/anie.201005969.
  • 11. Haider S, Pal R. Integrated analysis of transcriptomic and proteomic data. Curr Genomics. 2013; 14:91–110. doi: 10.2174/1389202911314020003.
  • 12. Yao C, Chen G, Song C, Keefe J, Mendelson M, Huan T, et al. Genome-wide mapping of plasma protein QTLs identifies putatively causal genes and pathways for cardiovascular disease. Nat Commun. 2018; 9:3268. Epub 2018/08/15. doi: 10.1038/s41467-018-05512-x.
  • 13. He B, Shi J, Wang X, Jiang H, Zhu H-J. Genome-wide pQTL analysis of protein expression regulatory networks in the human liver. BMC Biol. 2020; 18:97. Epub 2020/08/10. doi: 10.1186/s12915-020-00830-3.
  • 14. Loeffler M, Engel C, Ahnert P, Alfermann D, Arelin K, Baber R, et al. The LIFE-Adult-Study: objectives and design of a population-based cohort study with 10,000 deeply phenotyped adults in Germany. BMC Public Health. 2015; 15:691. Epub 2015/07/22. doi: 10.1186/s12889-015-1983-z.
  • 15. Assarsson E, Lundberg M, Holmquist G, Björkesten J, Thorsen SB, Ekman D, et al. Homogenous 96-plex PEA immunoassay exhibiting high sensitivity, specificity, and excellent scalability. PLoS One. 2014; 9:e95192. Epub 2014/04/22. doi: 10.1371/journal.pone.0095192.
  • 16. Du P, Kibbe WA, Lin SM. lumi: a pipeline for processing Illumina microarray. Bioinformatics. 2008; 24:1547–8. Epub 2008/05/08. doi: 10.1093/bioinformatics/btn224.
  • 17. Schmid R, Baum P, Ittrich C, Fundel-Clemens K, Huber W, Brors B, et al. Comparison of normalization methods for Illumina BeadChip HumanHT-12 v3. BMC Genomics. 2010; 11:349. Epub 2010/06/02. doi: 10.1186/1471-2164-11-349.
  • 18. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007; 8:118–27. Epub 2006/04/21. doi: 10.1093/biostatistics/kxj037.
  • 19. Cohen Freue GV, Hollander Z, Shen E, Zamar RH, Balshaw R, Scherer A, et al. MDQC: a new quality assessment method for microarrays based on quality control reports. Bioinformatics. 2007; 23:3162–9. Epub 2007/10/12. doi: 10.1093/bioinformatics/btm487.
  • 20. Affymetrix. Axiom Analysis Suite. User Guide. 2015. http://www.affymetrix.com/support/technical/byproduct.affx?product=axiomanalysissuite.
  • 21. Wang J. An estimator for pairwise relatedness using molecular markers. Genetics. 2002; 160:1203–15. doi: 10.1093/genetics/160.3.1203.
  • 22. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, et al. A global reference for human genetic variation. Nature. 2015; 526:68–74. doi: 10.1038/nature15393.
  • 23. Delaneau O, Howie B, Cox AJ, Zagury J-F, Marchini J. Haplotype estimation using sequencing reads. Am J Hum Genet. 2013; 93:687–96. doi: 10.1016/j.ajhg.2013.09.002.
  • 24. Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009; 5:e1000529. Epub 2009/06/19. doi: 10.1371/journal.pgen.1000529.
  • 25. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015; 4:7. Epub 2015/02/25. doi: 10.1186/s13742-015-0047-8.
  • 26. Barbeira AN, Vairus L. summary-gwas-imputation: harmonization, liftover, and imputation of summary statistics from GWAS. GitHub repository: IM-Lab; 2020. https://github.com/hakyimlab/summary-gwas-imputation.
  • 27. Peterson CB, Bogomolov M, Benjamini Y, Sabatti C. Many Phenotypes Without Many False Discoveries: Error Controlling Strategies for Multitrait Association Studies. Genet Epidemiol. 2016; 40:45–56. Epub 2015/12/02. doi: 10.1002/gepi.21942.
  • 28. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc B. 1995; 57:289–300. Available from: www.jstor.org/stable/2346101.
  • 29. Benjamini Y, Bogomolov M. Selective inference on multiple families of hypotheses. J R Stat Soc B. 2014; 76:297–318. doi: 10.1111/rssb.12028.
  • 30. Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, et al. Ensembl 2018. Nucleic Acids Res. 2018; 46:D754–D761. doi: 10.1093/nar/gkx1098.
  • 31. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019; 47:D1005–D1012. doi: 10.1093/nar/gky1120.
  • 32. Battle A, Brown CD, Engelhardt BE, Montgomery SB. Genetic effects on gene expression across human tissues. Nature. 2017; 550:204–13. doi: 10.1038/nature24277.
  • 33. Joehanes R, Zhang X, Huan T, Yao C, Ying S-X, Nguyen QT, et al. Integrated genome-wide analysis of expression quantitative trait loci aids interpretation of genomic association studies. Genome Biol. 2017; 18:16. Epub 2017/01/25. doi: 10.1186/s13059-016-1142-6.
  • 34. Kirsten H, Al-Hasani H, Holdt L, Gross A, Beutner F, Krohn K, et al. Dissecting the genetics of the human transcriptome identifies novel trait-related trans-eQTLs and corroborates the regulatory relevance of non-protein coding loci. Hum Mol Genet. 2015; 24:4746–63. Epub 2015/05/27. doi: 10.1093/hmg/ddv194.
  • 35. Westra H-J, Peters MJ, Esko T, Yaghootkar H, Schurmann C, Kettunen J, et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat Genet. 2013; 45:1238–43. Epub 2013/09/08. doi: 10.1038/ng.2756.
  • 36. Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 2019; 47:D886–D894. doi: 10.1093/nar/gky1016.
  • 37. Altman DG, Bland JM. Interaction revisited: the difference between two estimates. BMJ. 2003; 326:219. doi: 10.1136/bmj.326.7382.219.
  • 38. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020; 369:1318–30. doi: 10.1126/science.aaz1776.
  • 39. Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014; 10:e1004383. Epub 2014/05/15. doi: 10.1371/journal.pgen.1004383.
  • 40. Barbeira AN, Pividori M, Zheng J, Wheeler HE, Nicolae DL, Im HK. Integrating predicted transcriptome from multiple tissues improves association detection. PLoS Genet. 2019; 15:e1007889. Epub 2019/01/22. doi: 10.1371/journal.pgen.1007889.
  • 41. Barbeira AN, PrediXcan Team. MetaXcan. GitHub repository: IM-Lab; 2021. https://github.com/hakyimlab/MetaXcan.
  • 42. PredictDB Team. GTEx v8 models on eQTL and sQTL. PredictDB: Im Lab, Genetic Medicine, Department of Medicine, The University of Chicago. [cited 21 May 2021]. https://predictdb.org//post/2021/07/21/gtex-v8-models-on-eqtl-and-sqtl/.
  • 43. Burgess S, Daniel RM, Butterworth AS, Thompson SG. Network Mendelian randomization: using genetic variants as instrumental variables to investigate mediation in causal pathways. Int J Epidemiol. 2015; 44:484–95. Epub 2014/08/22. doi: 10.1093/ije/dyu176.
  • 44. Burgess S, Thompson SG. Mendelian Randomization. Methods for Using Genetic Variants in Causal Estimation. 1st ed. CRC Press; 2015.
  • 45. Sun W, Kechris K, Jacobson S, Drummond MB, Hawkins GA, Yang J, et al. Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD. PLoS Genet. 2016; 12:e1006011. Epub 2016/08/17. doi: 10.1371/journal.pgen.1006011.
  • 46. Suhre K, Arnold M, Bhagwat AM, Cotton RJ, Engelke R, Raffler J, et al. Connecting genetic risk to disease end points through the human blood plasma proteome. Nat Commun. 2017; 8:14357. Epub 2017/02/27. doi: 10.1038/ncomms14357.
  • 47. Folkersen L, Fauman E, Sabater-Lleal M, Strawbridge RJ, Frånberg M, Sennblad B, et al. Mapping of 79 loci for 83 plasma protein biomarkers in cardiovascular disease. PLoS Genet. 2017; 13:e1006706. Epub 2017/04/03. doi: 10.1371/journal.pgen.1006706.
  • 48. Sun BB, Maranville JC, Peters JE, Stacey D, Staley JR, Blackshaw J, et al. Genomic atlas of the human plasma proteome. Nature. 2018; 558:73–9. Epub 2018/06/06. doi: 10.1038/s41586-018-0175-2.
  • 49. Emilsson V, Ilkov M, Lamb JR, Finkel N, Gudmundsson EF, Pitts R, et al. Co-regulatory networks of human serum proteins link genetics to disease. Science. 2018; 361:769–73. Epub 2018/08/02. doi: 10.1126/science.aaq1327.
  • 50. Henriksen PA, Hitt M, Xing Z, Wang J, Haslett C, Riemersma RA, et al. Adenoviral gene delivery of elafin and secretory leukocyte protease inhibitor attenuates NF-kappa B-dependent inflammatory responses of human endothelial cells and macrophages to atherogenic stimuli. J Immunol. 2004; 172:4535–44. doi: 10.4049/jimmunol.172.7.4535.
  • 51. Shavadia JS, Granger CB, Alemayehu W, Westerhout CM, Povsic TJ, Brener SJ, et al. High-throughput targeted proteomics discovery approach and spontaneous reperfusion in ST-segment elevation myocardial infarction. Am Heart J. 2020; 220:137–44. Epub 2019/11/09. doi: 10.1016/j.ahj.2019.09.015.
  • 52. Yuan S, Lin A, He Q-Q, Burgess S, Larsson SC. Circulating interleukins in relation to coronary artery disease, atrial fibrillation and ischemic stroke and its subtypes: A two-sample Mendelian randomization study. Int J Cardiol. 2020; 313:99–104. Epub 2020/03/21. doi: 10.1016/j.ijcard.2020.03.053.

Decision Letter 0

Jie V Zhao

26 Oct 2021

PONE-D-21-20087

Genetically regulated gene expression and proteins revealed discordant effects

PLOS ONE

Dear Dr. Pott,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. You may notice that the reviewer raised some concerns regarding the methodology. I agree with these points; in particular, it would be useful to know whether the use of GTEx V8 can replicate the findings based on GTEx V7, as well as the basis of the outcome selection.

Please submit your revised manuscript by 25 Dec 2021. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jie V Zhao

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

4. Please note that in order to use the direct billing option the corresponding author must be affiliated with the chosen institute. Please either amend your manuscript to change the affiliation or corresponding author, or email us at plosone@plos.org with a request to remove this option.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors did extensive work studying 92 proteins in whole blood from 2,014 participants at both the SNP level and the locus level. Of these, 63 proteins were genetically regulated, and 3 sex-specific effects were identified. The authors then tested for colocalization between pQTLs and eQTLs and found evidence of both independent and shared signals. Additionally, the authors tested for the association of genetically regulated gene expression (GE) across tissues and identified 10 proteins with negative correlations. Finally, the authors tested the causal links of genomics, transcriptome, and proteome to coronary artery disease (CAD) risk and confirmed a negative causal effect of GE on protein level for 5 genes. The main finding was the gene IL6R.

Comment 1: In the statistical analysis, population stratification (PS) is not considered. It would be helpful if the authors could include more details justifying why PS is not a concern in this study. If any PS exists, not adjusting for it could induce significant bias.

Comment 2: GTEx v8 data have been released for quite a while. I wonder why the authors used v7 instead of v8.

Comment 3: The proteins in the study were supposed to be cardiovascular disease (CVD) biomarkers, and the authors tried to establish the link from genomics, transcriptome, and proteome to CAD risk. I wonder why the authors included CVD-unrelated tissues, especially when testing for the causal path involving CAD: how do GE and PE in colon, lung, and testis contribute to the CAD outcome?

Comment 4: The authors may need to review the manuscript and correct typos and formatting issues throughout. For example: 1) in the abstract, CAD is mentioned for the first time and needs to be defined; 2) Table 2 needs to be reformatted, and the minor allele of CCL15 does not look right. In line 238, the results show ten genes instead of eight; 3) gene names should be italicized throughout; 4) Figures 2-4 in the main text are misnumbered as Figures 1-3.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 May 23;17(5):e0268815. doi: 10.1371/journal.pone.0268815.r002

Author response to Decision Letter 0


11 Feb 2022

Editorial Comments:

1. PLOS ONE's style requirements, including those for file naming

Author reply: We thank the editor for his comments and changed the manuscript accordingly.

2. Minimal Data Set / 3. Data Availability

Author reply: Complete data sets including genetic data cannot be made publicly available due to ethical and legal restrictions, as they are sufficient to identify study participants. Such sharing is not covered by the informed consent. However, access to the LIFE-Adult data is possible via project agreements addressed to:

Leipziger Forschungszentrum für Zivilisationserkrankungen (LIFE)

Dr. Matthias Nüchter

Head of Managing Office

Universität Leipzig, Medizinische Fakultät

Philipp-Rosenthal-Str. 27

04103 Leipzig

E-mail: matthias.nuechter@life.uni-leipzig.de

The pQTL summary statistics are publicly available at Zenodo (DOI: 10.5281/zenodo.6045694). If necessary, all R scripts used for this publication can be made available on GitHub.

4. Direct billing option

We confirm that the corresponding authors, Janne Pott & Markus Scholz, are affiliated with the University of Leipzig, which is the chosen institute.

Reviewer #1:

The authors did extensive work studying 92 proteins in whole blood from 2,014 participants at both the SNP level and the locus level. Of these, 63 proteins were genetically regulated, and 3 sex-specific effects were identified. The authors then tested for colocalization between pQTLs and eQTLs and found evidence of both independent and shared signals. Additionally, the authors tested for the association of genetically regulated gene expression (GE) across tissues and identified 10 proteins with negative correlations. Finally, the authors tested the causal links of genomics, transcriptome, and proteome to coronary artery disease (CAD) risk and confirmed a negative causal effect of GE on protein level for 5 genes. The main finding was the gene IL6R.

Comment 1: In the statistical analysis, population stratification (PS) is not considered. It would be helpful if the authors could include more details justifying why PS is not a concern in this study. If any PS exists, not adjusting for it could induce significant bias.

Author reply: The LIFE-Adult study is a long-term, population-based cohort study with participants from Leipzig and the surrounding areas. Consistent with this, genetic homogeneity is high (see new Supplemental Figure 3). We therefore decided to remove the few ethnic outliers and refrained from adjusting for principal components. However, we repeated our association tests for the SNPs passing the hierarchical FDR threshold, adjusting for 10 principal components. The beta estimates were essentially the same, while the standard errors increased slightly. All SNPs remained significant according to our hierarchical FDR threshold (see Supplemental Table 4).

Changes in manuscript: We added a PCA plot of our study data as new Supplemental Figure 3, revealing high homogeneity, and included PC-adjusted statistics for the lead SNPs in Supplemental Table 4. We also described our approach in more detail in the Materials section.

Comment 2: GTEx v8 data have been released for quite a while. I wonder why the authors used v7 instead of v8.

Author reply: We thank the reviewer very much for this suggestion. Previously, we used hg19 SNP data, which is the same build as in GTEx v7. We agree that the most recent version of GTEx should be used and have therefore harmonized our data to hg38 and repeated all our analyses with the harmonized data and GTEx v8.

Changes in manuscript: We updated the methods for the harmonization and GTEx v8 usage. Please note that this has changed all of the tables and figures, while the main message of the paper remains the same.

Comment 3: The proteins in the study were supposed to be cardiovascular disease (CVD) biomarkers, and the authors tried to establish the link from genomics, transcriptome, and proteome to CAD risk. I wonder why the authors included CVD-unrelated tissues, especially when testing for the causal path involving CAD: how do GE and PE in colon, lung, and testis contribute to the CAD outcome?

Author reply: We agree that this should be discussed in more detail. Cis-effects have been reported to be shared across tissues, with the exception of brain tissues [The GTEx Consortium, DOI: 10.1126/science.aaz1776]. Hence, we also used other tissues, some with larger sample sizes, which may still capture the true contribution of gene expression to CAD. Our described links of GE to CAD are mediated by PE, suggesting that increased GE in CAD-unrelated tissues leads to higher blood protein levels, which are then transported to heart tissues, where they increase the risk of a cardiovascular event.

Changes in manuscript: We added a paragraph in the Discussion section.

Comment 4: The authors may need to review the manuscript and correct typos and formatting issues throughout. For example: 1) in the abstract, CAD is mentioned for the first time and needs to be defined; 2) Table 2 needs to be reformatted, and the minor allele of CCL15 does not look right. In line 238, the results show ten genes instead of eight; 3) gene names should be italicized throughout; 4) Figures 2-4 in the main text are misnumbered as Figures 1-3.

Author reply: We are sorry for these typos.

Changes in manuscript: We checked our manuscript after the extensive changes. We believe we have now eliminated all typos and formatting issues.

Attachment

Submitted filename: ResponseToReviewers.docx

Decision Letter 1

Jie V Zhao

26 Apr 2022

PONE-D-21-20087R1

Genetically regulated gene expression and proteins revealed discordant effects

PLOS ONE

Dear Dr. Pott,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. As you can see, one reviewer has pointed out some wording and consistency issues throughout the paper; please revise accordingly.

Please submit your revised manuscript by 10 May. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jie V Zhao

Section Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors did extensive work to revise the manuscript. Now the manuscript looks great. I do not have any further questions.

Reviewer #2: In the revised version of their manuscript, the authors have addressed the comments raised by the reviewer, and in doing so, the quality and clarity of the manuscript have improved substantially.

I have only a couple of (very) minor further comments:

1. Figure numbers in the main text are still not correct.

2. The authors may want to present numbers in a consistent way throughout the manuscript, either as e.g. "2384" or "2,384".

3. It seems that on several occasions there is a typo with r22 instead of r2 (when referring to LD), and in line 313 “data WERE available”.

4. The authors may want to check for consistent use of past tense in the material and methods section, e.g. line 135 and line 196 in the revised version of the manuscript.

5. Data from the 1000 Genomes Project Phase 3 have been used for genotype imputation. I wonder why this rather old dataset was used instead of more up-to-date sources for imputation, e.g. the Haplotype Reference Consortium r1.1.

6. I find the term "best eQTL" a bit unclear and would like the authors to specify whether they refer to strongest/most significant eQTL and/or largest effect size.

7. More cautious wording with regard to Mendelian Randomization may be more adequate, as this method gives an estimate of causation/causality but does not prove or guarantee such a relationship, e.g. line 349 "IL6R was the only protein that showed causality in all three settings".

8. In the part of the Discussion section where the authors refer to TPM values of several genes in several tissues, many new results are reported (that otherwise have only appeared in the supplementary material). I suggest limiting the new data presented here and instead reporting these findings in the Results section.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No



PLoS One. 2022 May 23;17(5):e0268815. doi: 10.1371/journal.pone.0268815.r004

Author response to Decision Letter 1


6 May 2022

Reviewer #1:

The authors did extensive work to revise the manuscript. Now the manuscript looks great. I do not have any further questions.

Authors reply: We thank the reviewer very much for the positive evaluation.

Reviewer #2:

In the revised version of their manuscript, the authors have addressed the comments raised by the reviewer, and in doing so, the quality and clarity of the manuscript have improved substantially.

I have only a couple of (very) minor further comments:

Authors reply: We thank the reviewer very much for the encouraging evaluation and the helpful comments.

Comment 1: Figure numbers in the main text are still not correct.

Author reply: We are sorry for not checking the generated PDF file for this mix-up. For Fig 1, the corresponding text box was accidentally deleted. Hence, when converting to a PDF, the auto-numbering used 1 for Fig 2, and so on.

Changes in manuscript: We added the text box for Fig 1 again.

Comment 2: The authors may want to present numbers in a consistent way throughout the manuscript, either as e.g. "2384" or "2,384".

Author reply: We are sorry for this mix-up and present numbers now consistently as “2,014”.

Changes in manuscript: We changed all numbers so that the thousands separator is consistently a comma.

Comment 3: It seems that on several occasions there is a typo with r22 instead of r2 (when referring to LD), and in line 313 “data WERE available”.

Author reply: We believe this to be an artifact of the marked-up version, as we did not find this typo in the clean manuscript. We accidentally deleted the superscript 2 and added it again. However, in the marked-up mode, a deleted superscript number looks like an underlined number, and hence it reads as r22 instead of r2.

Changes in manuscript: We checked the manuscript for the r22 typo and corrected line 313.

Comment 4: The authors may want to check for consistent use of past tense in the material and methods section, e.g. line 135 and line 196 in the revised version of the manuscript.

Author reply: We are sorry for these inconsistencies and have corrected them in the manuscript.

Changes in manuscript: We checked the Material & Methods section and corrected the grammar if necessary.

Comment 5: Data from the 1000 Genomes Project Phase 3 have been used for genotype imputation. I wonder why this rather old dataset was used instead of more up-to-date sources for imputation, e.g. the Haplotype Reference Consortium r1.1.

Author reply: We were not able to legally update the imputation of the LIFE consortium's genetic data to newer reference panels such as HRC or TOPMed, as these references are not publicly accessible for imputation on our servers. The LIFE genetic data would therefore have to be transferred to, e.g., the Michigan Imputation Server in the United States of America (US) or the Sanger Imputation Server in the United Kingdom (UK). For both the US and the UK, there are still open data privacy concerns, as the level of data protection does not comply with the standards required by the European Union (EU). For the US, the Trans-Atlantic Data Privacy Framework is under development but not yet in place (see https://ec.europa.eu/commission/presscorner/detail/en/ip_22_2087). For the UK, an adequacy decision under the General Data Protection Regulation was adopted by the EU just last June. However, the final decision on imputation on a server in a country outside the EU lies with the LIFE consortium. We will update the imputation of the LIFE genetic data once the consortium agrees, or once an alternative imputation server is available within the EU.

Changes in manuscript: None.

Comment 6: I find the term "best eQTL" a bit unclear and would like the authors to specify whether they refer to strongest/most significant eQTL and/or largest effect size.

Author reply: We meant the cis-eQTL with lowest p-value per gene and tissue.

Changes in manuscript: We added this information to the Materials and Methods section.
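The reply above defines the "best eQTL" as the cis-eQTL with the lowest p-value per gene and tissue. As a minimal sketch of that selection rule (not the authors' actual script; the `EQTL` record, its field names, and the SNP identifiers in the example are hypothetical), the associations can be grouped by (gene, tissue) and the most significant one kept:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class EQTL:
    """One cis-eQTL association (hypothetical record layout)."""
    gene: str
    tissue: str
    snp: str
    pvalue: float


def best_cis_eqtls(eqtls: Iterable[EQTL]) -> Dict[Tuple[str, str], EQTL]:
    """Return, for each (gene, tissue) pair, the cis-eQTL with the lowest p-value.

    Ties are resolved by keeping the first association encountered.
    """
    best: Dict[Tuple[str, str], EQTL] = {}
    for e in eqtls:
        key = (e.gene, e.tissue)
        # Replace the current pick only if this SNP is strictly more significant.
        if key not in best or e.pvalue < best[key].pvalue:
            best[key] = e
    return best


if __name__ == "__main__":
    # Illustrative input only; SNP IDs and p-values are made up.
    rows = [
        EQTL("IL6R", "Liver", "rsA", 1e-5),
        EQTL("IL6R", "Liver", "rsB", 1e-9),
        EQTL("IL6R", "Lung", "rsA", 1e-2),
    ]
    for (gene, tissue), e in sorted(best_cis_eqtls(rows).items()):
        print(gene, tissue, e.snp, e.pvalue)
```

Ranking by p-value rather than by effect size follows the authors' clarification; selecting by largest effect size would instead compare the absolute beta estimates.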

Comment 7: More cautious wording with regard to Mendelian Randomization may be more adequate, as this method gives an estimate of causation/causality but does not prove or guarantee such a relationship, e.g. line 349 "IL6R was the only protein that showed causality in all three settings".

Author reply: We thank the reviewer for this reminder and agree to tone down the statements on causality.

Changes in manuscript: We used more cautious wording in the Results section regarding Mendelian Randomization. In addition, we added some information on why not all settings were tested in this analysis.

Comment 8: In the part of the Discussion section where the authors refer to TPM values of several genes in several tissues, many new results are reported (that otherwise have only appeared in the supplementary material). I suggest limiting the new data presented here and instead reporting these findings in the Results section.

Author reply: In this part of the discussion, we tried to evaluate the functional relevance by checking the published GTEx v8 expression levels of the identified genes by tissue, e.g., whether these negative links only occur in tissues with low expression. Hence, these are not new results. However, we agree to shorten that paragraph by focusing on the main comparisons only.

Changes in manuscript: We shortened the TPM paragraph in the Discussion and clarified its source, GTEx v8.

Attachment

Submitted filename: ResponseToReviewers.docx

Decision Letter 2

Jie V Zhao

10 May 2022

Genetically regulated gene expression and proteins revealed discordant effects

PONE-D-21-20087R2

Dear Dr. Pott,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jie V Zhao

Section Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Jie V Zhao

12 May 2022

PONE-D-21-20087R2

Genetically regulated gene expression and proteins revealed discordant effects

Dear Dr. Pott:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jie V Zhao

Section Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File

    (PDF)

    S2 File

    (XLSX)

    Attachment

    Submitted filename: ResponseToReviewers.docx

    Attachment

    Submitted filename: ResponseToReviewers.docx

    Data Availability Statement

    All summary statistics are publicly available from Zenodo (DOI: 10.5281/zenodo.6045694). Scripts used in the secondary analyses are included at https://github.com/GenStatLeipzig/LWAS_Olink. Complete data sets including genetic data cannot be shared publicly due to ethical and legal restrictions, as they are sufficient to identify study participants. Such sharing is not covered by the informed consent form of the LIFE-Adult study. Data are available from the LIFE Research Center (contact via Dr. Matthias Nüchter, Head of Managing Office, E-mail: matthias.nuechter@life.uni-leipzig.de) for researchers who meet the criteria for access to confidential data.

