Network regression analysis for binary and ordinal categorical phenotypes in transcriptome-wide association studies

Liye Zhang; Tao Ju; Xiuyuan Jin; Jiadong Ji; Jiayi Han; Xiang Zhou; Zhongshang Yuan

doi:10.1093/genetics/iyac153

Genetics. 2022 Oct 13;222(4):iyac153. doi: 10.1093/genetics/iyac153


Editor: H Zhao

PMCID: PMC9713396 PMID: 36227056

Abstract

Transcriptome-wide association studies (TWASs) aim to integrate genome-wide association studies (GWASs) and expression quantitative trait loci (eQTL) mapping studies to explore the gene regulatory mechanisms underlying diseases. Existing TWAS methods primarily focus on 1 gene at a time. However, complex diseases seldom result from the abnormality of a single gene, but rather from a biological network involving multiple genes. In addition, binary or ordinal categorical phenotypes are commonly encountered in biomedicine. We develop a Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study (PoLoNet) to detect the association between a network and a binary or ordinal categorical phenotype. PoLoNet relies on a 2-stage TWAS framework. It first adopts the distribution-robust nonparametric Dirichlet process regression model in the eQTL study to obtain the SNP effect estimates for each gene within the network. Then, PoLoNet uses pointwise mutual information to represent the general relationship among the network nodes of predicted gene expression in the GWAS, followed by an association analysis with all nodes and edges included in a proportional odds logistic model. A key feature of PoLoNet is its ability to simultaneously identify the disease-related network nodes and edges. With extensive realistic simulations, including those under various between-node correlation patterns, we show that PoLoNet provides calibrated type I error control and yields higher power than other existing methods. Finally, we apply PoLoNet to analyze bipolar and major depression status and blood pressure from the UK Biobank to illustrate its benefits in real data analysis.

Keywords: transcriptome-wide association studies, biological network, pointwise mutual information, ordinal categorical phenotypes

Introduction

Transcriptome-wide association studies (TWASs) aim to identify disease-related genes by integrating genome-wide association studies (GWASs) and parallel expression quantitative trait loci (eQTL) mapping studies. They have shown great promise in interpreting GWAS signals and in illuminating the regulatory mechanisms underlying disease susceptibility. TWAS is typically conducted in a 2-stage framework. In the first stage, the genotype effect sizes on gene expression are estimated from an eQTL study. In the second stage, gene expression in the GWAS is predicted using the genotype effect sizes estimated in the first stage, followed by an association analysis between the predicted gene expression and the GWAS trait.

Many statistical methods for TWAS analysis have been developed, including but not limited to PrediXcan (Gamazon et al. 2015), TWAS (Gusev et al. 2016), DPR (Zeng and Zhou 2017), TIGAR (Nagpal et al. 2019), SMR (Zhu et al. 2016), PMR-Egger (Yuan et al. 2020), UTMOST (Hu et al. 2019), MoPMR-Egger (Liu et al. 2021), kTWAS (Cao et al. 2021), and HMAT (Zeng et al. 2021). Some of these methods focus on improving the estimation accuracy of the genotype effect size in the first stage (e.g. PrediXcan, TWAS, DPR, and TIGAR), whereas others focus on improving the association power by using a kernel-type method in the second stage (e.g. kTWAS), by aggregating multiple expression prediction models (e.g. HMAT), or by adopting a likelihood-based inference procedure (e.g. PMR-Egger).

However, almost all available TWAS methods focus on only 1 gene at a time, which may be suboptimal because they fail to account for the correlation among multiple genes. So far, only 2 methods model multiple genes: FOCUS and FOGS (Mancuso et al. 2019; Wu and Pan 2020). FOCUS models multiple genes in the TWAS framework from a Bayesian perspective and obtains region-based credible gene sets that possibly contain the causal gene at a given confidence level. FOGS conceptually transforms the multiple-gene modeling into SNP modeling and performs a conditional analysis of each specific SNP in 1 gene by adjusting for the SNPs of other genes within the same region. Both FOCUS and FOGS exhibit a great advantage over the TWAS methods that model only 1 gene at a time. However, neither of them accounts for the network structure among multiple genes, which may lead to a loss of efficiency.

From the network medicine perspective, complex diseases seldom result from the abnormality of a single gene, but rather from a biological network involving multiple genes (Barabási et al. 2011). A biological network often consists of nodes (e.g. genes) and edges representing functional or physiological interactions between nodes. Identifying the disease-related biological network can provide a better understanding of the network mechanism underlying the disease. Conceptually, both nodes and edges can contribute to the development of disease. However, the relationship between 2 gene nodes is often quite complex, and the challenge is to choose a suitable measure that captures the between-node connection and thus quantifies the edge information. Compared with other differential network methods such as PMNR (Lin et al. 2020), DGCA (McKenzie et al. 2016), and RANK (Alvo et al. 2010), the pointwise mutual information (PMI)-based network regression method has been shown to provide an efficient measure of the correlation between 2 nodes and to perform better in capturing the general relationship between pairs of network nodes (Lin et al. 2020; Wang et al. 2021).

In addition, both binary and ordinal categorical phenotypes are commonly encountered in biomedicine. For example, binary indicators are commonly used to define hypertension status. Ordinal categorical data, which are often collected from tests that measure human behaviors, satisfaction, and preferences, are also a common type of phenotype. For example, ordinal numbers are often used to represent the severity of bipolar and major depression status in the UK Biobank (Bycroft et al. 2018). It would be unreasonable to treat a binary or ordinal categorical phenotype directly as a quantitative trait, because applying linear regression methods to such traits can lose information and reduce power.

In the present study, we present a Proportional Odds LOgistic model for NEtwork regression in TWAS, PoLoNet, to detect the association between 1 specific network and a binary or ordinal categorical phenotype. PoLoNet is developed under a 2-stage TWAS analysis framework to overcome the above challenges. PoLoNet first adopts the distribution-robust Dirichlet process regression (DPR) model in the eQTL study to obtain the SNP effect estimates for each specific gene within a specific network. Then, PoLoNet uses PMI to calculate all the between-node correlations (edges) to capture the general relationship among the network nodes of predicted gene expression, followed by an association analysis with all the nodes and all the edges included in the model. In this sense, PoLoNet effectively accounts for the structure of the network and is able to simultaneously identify the trait-related nodes as well as the trait-related edges. With comprehensive and realistic simulations, we show that PoLoNet provides calibrated type I error control for testing either the node effect or the edge effect and yields higher power than other existing methods. Finally, we analyze blood pressure as well as bipolar and major depression status from the UK Biobank to illustrate the benefits of PoLoNet in real data analysis.

Materials and methods

Pointwise mutual information

The PMI of 2 random variables $X$ and $Y$ is defined as:

PMI (x, y) = \log \frac{p (x, y)}{p (x) p (y)}

(1)

where $p (x, y)$ is the joint distribution of $X$ and $Y$, with $p (x)$ and $p (y)$ being their marginal distributions. The PMI is 0 for all $(x, y)$ if and only if $X$ and $Y$ are independent. In this sense, PMI can capture the general relationship between 2 variables regardless of whether the correlation pattern is linear or not. Calculating PMI requires estimates of the joint density function as well as the marginal density functions. Because it is often hard to prespecify these distributions, we use nonparametric kernel density estimation to avoid the risk of distribution misspecification. Kernel density estimation can approximate the true distribution based on the data at hand and improve the robustness of the PMI estimator. The 2D kernel density estimator of $X$ and $Y$ is defined as:

{\hat{f}}_{H} (z; H) = \frac{1}{n} \sum_{i = 1}^{n} K_{H} (z - Z_{i})

(2)

where $z = {(x, y)}^{T}$ and $Z_{i} = {(X_{i}, Y_{i})}^{T}$, $i = 1, 2, \dots, n$. $H$ is a $2 × 2$ symmetric positive definite bandwidth matrix, and $K$ is a 2D kernel density function. We herein choose the commonly used 2D normal kernel, with $K_{H} (z) = {(2 π)}^{- d / 2} {| H |}^{- 1 / 2} \exp (- \frac{1}{2} z^{T} H^{- 1} z)$ and $d = 2$.
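As an illustration of Equations (1) and (2), the following minimal sketch estimates the PMI at each observed pair with a 2D normal kernel. It is not the PoLoNet implementation: the function names, the diagonal bandwidth matrix $H = diag(h_{x}^{2}, h_{y}^{2})$, the Scott-type bandwidth rule, and the toy data are all assumptions made for the example. With a diagonal $H$, the marginals of the joint kernel estimate are 1D kernel estimates with the same bandwidths, which is what the code uses.

```python
import math
import random
import statistics

def gaussian_pdf(u, h):
    """Density of N(0, h^2) evaluated at u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def pmi_per_sample(xs, ys):
    """Return a kernel-based PMI estimate at each observed pair (xs[j], ys[j])."""
    n = len(xs)
    # Assumed bandwidths: Scott-type rule for d = 2 with H = diag(hx^2, hy^2).
    hx = statistics.stdev(xs) * n ** (-1.0 / 6.0)
    hy = statistics.stdev(ys) * n ** (-1.0 / 6.0)
    pmi = []
    for xj, yj in zip(xs, ys):
        kx = [gaussian_pdf(xj - xi, hx) for xi in xs]
        ky = [gaussian_pdf(yj - yi, hy) for yi in ys]
        f_xy = sum(a * b for a, b in zip(kx, ky)) / n  # joint density estimate
        f_x = sum(kx) / n                               # marginal estimate for X
        f_y = sum(ky) / n                               # marginal estimate for Y
        pmi.append(math.log(f_xy / (f_x * f_y)))
    return pmi

# Toy data (hypothetical): an independent pair and a quadratic dependence.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
y_indep = [rng.gauss(0.0, 1.0) for _ in range(300)]
y_quad = [0.5 * xi ** 2 + 0.1 * rng.gauss(0.0, 1.0) for xi in x]

mean_indep = statistics.fmean(pmi_per_sample(x, y_indep))
mean_quad = statistics.fmean(pmi_per_sample(x, y_quad))
print(mean_indep, mean_quad)
```

Averaging the per-sample values gives a rough mutual information estimate, so the quadratic pair should score clearly higher than the independent pair even though its linear correlation is near zero.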

PoLoNet model

We assume there are a total of $m$ genes. Let $x_{i}$ denote an $n_{1}$-vector of gene expression measurements for the $i$th gene, measured on $n_{1}$ individuals in the eQTL study, and let $G_{x_{i}}$ denote the $n_{1}$ by $p_{i}$ genotype matrix for the cis-SNPs of the $i$th gene in the same study, $i = 1, \dots, m$. In the typical 2-stage TWAS, we first obtain the gene expression prediction weights for each gene by fitting the following model in the eQTL study:

x_{i} = G_{x_{i}} β_{i} + ε_{x_{i}}, (i = 1, \dots, m)

(3)

where $β_{i}$ is a $p_{i}$-vector of SNP effect sizes on the expression of the $i$th gene and $ε_{x_{i}}$ is an $n_{1}$-vector of residual errors, with each element independent and identically distributed from a normal distribution $N (0, σ_{x}^{2})$. We denote $G_{y_{i}}$ as the $n_{2}$ by $p_{i}$ genotype matrix for the same $p_{i}$ cis-SNPs of the $i$th gene in the GWAS. From the above model we obtain the estimator ${\hat{β}}_{i}$ of the SNP effect sizes of the $i$th gene, and then derive the predicted gene expression in the GWAS as ${\tilde{x}}_{i} = {({\tilde{x}}_{i 1}, \dots, {\tilde{x}}_{i n_{2}})}^{T} = G_{y_{i}} {\hat{β}}_{i}$ $(i = 1, \dots, m)$. Note that improving the SNP effect size estimation can substantially increase the performance of TWAS (Zhou et al. 2020). The accuracy of gene expression prediction depends heavily on how close the prior distribution of the SNP effect sizes is to the real genetic architecture. However, the true distribution of the SNP effect sizes is often unknown and is hard to capture fully with parametric models. We therefore prefer the nonparametric DPR to construct the gene expression prediction model, because it can automatically infer the SNP effect size distribution from the data and avoids the risk of model misspecification. In addition, we alternatively adopt the Bayesian sparse linear mixed model (BSLMM), which assumes that the SNP effect sizes follow a mixture of 2 normal distributions, for sensitivity analysis. These $m$ genes form a biological network, with each node representing the gene expression measurement and each edge representing the between-node correlation. For the $j$th $(j = 1, \dots, n_{2})$ individual in the GWAS, let $E_{jlk}$ denote the PMI estimator between the $l$th node and the $k$th node, $Z_{j} = {(Z_{1 j}, Z_{2 j}, \dots, Z_{s j})}^{T}$ denote an $s$-vector of covariates [e.g. sex, top 10 genotype principal components (PCs)], and $y_{j} \in \{1, \dots, C\}$ denote the ordinal categorical phenotype, with $C$ being the number of categories. The PoLoNet model is constructed as follows:

logit (v_{j c}) = ε_{c} - η_{j} = ε_{c} - \sum_{i = 1}^{s} Z_{i j} α_{i} - \sum_{i = 1}^{m} {\tilde{x}}_{i j} γ_{i} - \sum_{l = 1}^{m} \sum_{k > l}^{m} I_{l k} E_{jlk} δ_{l k}

(4)

where

I_{l k} = \begin{cases} 1, & \text{if the } l \text{th gene and the } k \text{th gene are connected in the network} \\ 0, & \text{otherwise} \end{cases}

and $v_{j c} = \Pr (y_{j} \leq c \mid Z_{j}, {\tilde{x}}_{i j}, E_{jlk})$ $(c = 1, \dots, C)$ is the cumulative probability given the covariates, all the nodes of predicted gene expression, and the edges. The cutpoints $ε_{1} < \dots < ε_{C} = \infty$ are used to categorize the data, $α_{i} (i = 1, \dots, s)$ is the coefficient of the $i$th covariate, $γ_{i} (i = 1, \dots, m)$ is the coefficient that represents the causal effect of the $i$th node, and $δ_{l k} (l = 1, \dots, m; k > l)$ is the effect of the edge linking the $l$th gene and the $k$th gene. Note that the model reduces to the logistic model when $C = 2$.
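To make Equation (4) concrete, the sketch below turns a linear predictor $η_{j}$ and a set of finite cutpoints into category probabilities, and shows the reduction to the logistic model for $C = 2$. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def expit(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def linear_predictor(z, alpha, x, gamma, edges, delta):
    """eta_j: covariate, node, and edge contributions for one individual.

    `edges` holds E_jlk only for connected pairs (I_lk = 1), aligned with `delta`.
    """
    return (sum(a * b for a, b in zip(z, alpha))
            + sum(a * b for a, b in zip(x, gamma))
            + sum(a * b for a, b in zip(edges, delta)))

def category_probs(eta, cutpoints):
    """P(y = c) for c = 1..C under logit P(y <= c) = cutpoints[c-1] - eta.

    `cutpoints` lists the C - 1 finite, increasing cutpoints; the last
    cutpoint is infinite, so P(y <= C) = 1 is appended directly.
    """
    cum = [expit(e - eta) for e in cutpoints] + [1.0]
    return [cum[0]] + [cum[c] - cum[c - 1] for c in range(1, len(cum))]

# Hypothetical individual: 2 covariates, 2 nodes, 1 connected edge.
eta = linear_predictor(z=[1.0, 0.0], alpha=[0.5, 0.5],
                       x=[0.2, -0.1], gamma=[0.3, 0.0],
                       edges=[0.4], delta=[0.25])
probs4 = category_probs(eta, cutpoints=[-1.0, 0.0, 1.0])  # C = 4
probs2 = category_probs(eta, cutpoints=[0.0])             # C = 2, logistic model
print(probs4, probs2)
```

Because the linear predictor enters with a negative sign, a larger $η_{j}$ shifts probability toward the higher categories.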

We are interested in estimating and testing the node effects $γ$ as well as the edge effects $δ$. Here we put the inference into the likelihood framework and use a regularized Newton–Raphson algorithm with step halving to obtain the approximate maximum likelihood estimates $\hat{γ}$ and $\hat{δ}$ as well as their standard errors $SE (\hat{γ})$ and $SE (\hat{δ})$. Specifically, the regularization makes the algorithm applicable to ordinal data, and step halving is enforced when the full step causes a decrease in the likelihood function, in which case the step is repeatedly halved until it is small enough to cause an increase in the likelihood (Christensen 2018). Afterwards, we construct the corresponding Wald test statistic and obtain a P-value for hypothesis testing.
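The fitting and testing steps can be sketched for the binary case ($C = 2$), where the model is ordinary logistic regression. This is a simplified illustration, not the routine used by PoLoNet: the small ridge added to the information matrix is only an assumed stand-in for the regularization, the parameterization $\Pr (y = 1) = expit (β_{0} + β_{1} x)$ is chosen for convenience, and all names and simulated values are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def loglik(beta, rows, y):
    """Bernoulli log-likelihood with P(y = 1) = expit(row . beta)."""
    total = 0.0
    for row, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, row))
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total

def score_and_information(beta, rows, y):
    """Gradient of the log-likelihood and the observed information matrix."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for row, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, row))
        mu = 1.0 / (1.0 + math.exp(-eta))
        for a in range(p):
            grad[a] += (yi - mu) * row[a]
            for c in range(p):
                info[a][c] += mu * (1.0 - mu) * row[a] * row[c]
    return grad, info

def fit(rows, y, ridge=1e-8, tol=1e-10, max_iter=50):
    """Newton-Raphson with step halving; returns estimates and standard errors."""
    p = len(rows[0])
    beta = [0.0] * p
    ll = loglik(beta, rows, y)
    for _ in range(max_iter):
        grad, info = score_and_information(beta, rows, y)
        for a in range(p):
            info[a][a] += ridge  # assumed stand-in for the regularization
        step = solve(info, grad)
        scale = 1.0
        cand = [b + s for b, s in zip(beta, step)]
        # Halve the step while it would decrease the likelihood.
        while loglik(cand, rows, y) < ll and scale > 1e-10:
            scale *= 0.5
            cand = [b + scale * s for b, s in zip(beta, step)]
        ll_new = loglik(cand, rows, y)
        done = abs(ll_new - ll) < tol
        beta, ll = cand, ll_new
        if done:
            break
    _, info = score_and_information(beta, rows, y)
    se = []
    for a in range(p):
        unit = [1.0 if i == a else 0.0 for i in range(p)]
        se.append(math.sqrt(solve(info, unit)[a]))  # inverse-information diagonal
    return beta, se

def wald_p_value(estimate, std_error):
    """Two-sided P-value of the Wald statistic under a standard normal reference."""
    return math.erfc(abs(estimate / std_error) / math.sqrt(2.0))

# Simulated check with hypothetical true values: intercept -0.5, slope 0.8.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
rows = [[1.0, xi] for xi in x]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(0.5 - 0.8 * xi)) else 0 for xi in x]
beta_hat, se_hat = fit(rows, y)
p_slope = wald_p_value(beta_hat[1], se_hat[1])
print(beta_hat, se_hat, p_slope)
```

The ordinal case follows the same pattern with the cumulative-logit likelihood and the cutpoints as additional parameters.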

We refer to our model as the Proportional Odds LOgistic model for NEtwork regression in TWAS (PoLoNet), which conducts network regression for either binary or ordinal categorical phenotypes in the TWAS framework. PoLoNet first adopts the DPR model to impute the expression of all genes within a specific network, and then adopts PMI to represent the general between-node correlation pattern. PoLoNet can simultaneously estimate and test both the effect of each gene’s expression and the effect of each between-gene connection in TWAS. PoLoNet is computationally scalable and is implemented in an R package, freely available at https://github.com/Liye222/PoLoNet.

Numeric simulations

Given that there are no existing statistical methods for network regression analysis of either binary or ordinal categorical traits in TWAS, we performed comprehensive simulations to compare PoLoNet with the Proportional odds logistic model using the product moment (PM) to represent the edge of Network in TWAS (PPNT), which essentially replaces PMI with PM within the PoLoNet framework. We first mimicked a realistic TWAS analysis by integrating the GEUVADIS study (Lappalainen et al. 2013) with GWAS data from the UK Biobank (Bycroft et al. 2018) (details for these 2 datasets are provided in the Real data application). Specifically, we obtained genotype data and gene expression data from GEUVADIS ( $n_{1} = 465$ ) and standardized the genotype of each SNP as well as the expression vector to have zero mean and unit SD. For each specific gene, we applied DPR or BSLMM to estimate the SNP effect sizes on gene expression. Then, we obtained genotypes for the same SNPs from the UK Biobank and standardized the genotype vector of each SNP to have zero mean and unit SD. With the standardized genotype matrix and the SNP effect sizes estimated by DPR or BSLMM from the GEUVADIS study, we obtained the predicted gene expression, which was further used to simulate the phenotype. In addition, to avoid the risk of prespecifying the network structure, we selected the pathway of Alzheimer disease (hsa05010–nt06412) from the Kyoto Encyclopedia of Genes and Genomes (KEGG) as the network (Eisen et al. 1998; Ashburner et al. 2000). The final analyzed network included 12 nodes and 13 edges (Fig. 1) after overlapping all the genes on the pathway with those in the UK Biobank. We randomly selected GWAS samples of various sizes ( $n_{2} = 5,000, 10,000, 20,000$ ) from the 337,129 individuals in the UK Biobank.

Fig. 1. — The simulated network structure based on the Alzheimer pathway from KEGG.

We designed the following 4 simulation scenarios with prespecified node or edge:

only a node of network has the effect (e.g. node $X_{5}$ in Fig. 1),
only an edge of network has the effect (e.g. edge $E_{5,10}$ in Fig. 1),
a node and an edge have effects, with the node hanging on the edge (e.g. node $X_{2}$ and edge $E_{2,6}$ in Fig. 1), and
a node and an edge have effects, with the node not hanging on the edge (e.g. node $X_{2}$ and edge $E_{5,10}$ in Fig. 1).

In each simulation scenario, we considered 4 correlation patterns between 2 nodes $X_{i}$ and $X_{j}$, including 1 linear correlation ( $X_{j} = 0.5 X_{i} + ε$ ) and 3 nonlinear correlations (de Siqueira Santos et al. 2014; Lin et al. 2020): a quadratic relationship ( $X_{j} = 0.5 {X_{i}}^{2} + ε$ ), a sine relationship ( $X_{j} = \sin (X_{i} + 1) + ε$ ), and a combination of the quadratic and sine relationships ( $X_{j} = {(\sin X_{i})}^{2} + ε$ ), where $ε$ is the residual error from a standard normal distribution, $ε \sim N (0,1)$. For example, if we set a linear relationship between nodes $X_{5}$ and $X_{10}$, then $X_{10} = 0.5 \cdot X_{5} + ε$, and we set $E_{5,10} = 0.5 \cdot X_{5} \cdot X_{10}$ to represent the edge. If we set a quadratic relationship between $X_{5}$ and $X_{10}$, then $X_{10} = 0.5 \cdot {X_{5}}^{2} + ε$; that is, the nonlinear quadratic relationship between $X_{5}$ and $X_{10}$ can be transformed to a linear relationship between ${X_{5}}^{2}$ and $X_{10}$, and we set $E_{5,10} = 0.5 \cdot {X_{5}}^{2} \cdot X_{10}$ to represent the edge. In addition, we simulated 1 covariate $Z_{1}$ from the standard normal distribution and another covariate $Z_{2}$ from a Bernoulli (0.5) distribution.
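The 4 between-node patterns and the 2 edge terms spelled out above can be sketched as follows. The function names and the simulated parent node are hypothetical, and the edge terms for the sine and combination patterns are not stated explicitly in the text, so the sketch deliberately leaves them out rather than guessing.

```python
import math
import random

# Mean of the child node X_j given the parent node X_i under each pattern.
PATTERNS = {
    "linear": lambda x: 0.5 * x,
    "quadratic": lambda x: 0.5 * x ** 2,
    "sine": lambda x: math.sin(x + 1.0),
    "combination": lambda x: math.sin(x) ** 2,
}

def simulate_child(parent, pattern, rng, noise_sd=1.0):
    """Draw X_j = f(X_i) + eps with eps ~ N(0, noise_sd^2)."""
    f = PATTERNS[pattern]
    return [f(x) + rng.gauss(0.0, noise_sd) for x in parent]

def edge_term(pattern, x_parent, x_child):
    """Simulated edge value for the 2 patterns whose edge is given in the text."""
    if pattern == "linear":
        return 0.5 * x_parent * x_child
    if pattern == "quadratic":
        return 0.5 * x_parent ** 2 * x_child
    raise ValueError("edge term not given in the text for pattern: " + pattern)

rng = random.Random(0)
x5 = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # hypothetical parent node
x10 = simulate_child(x5, "quadratic", rng)
e_5_10 = [edge_term("quadratic", a, b) for a, b in zip(x5, x10)]

# Covariates as described: Z1 ~ N(0, 1) and Z2 ~ Bernoulli(0.5).
z1 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
z2 = [1 if rng.random() < 0.5 else 0 for _ in range(1000)]
```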

For binary phenotype, we set the case–control ratios to be balanced (1:1), moderately unbalanced (1:9), strongly unbalanced (1:99), and extremely unbalanced (1:999). We used the following liability threshold model to generate the binary traits. For subject $j (j = 1, \dots, n_{2})$ ,

{τ_{j}}^{*} = \frac{\exp (\sum_{i = 1}^{s} Z_{i j} α_{i} + \sum_{i = 1}^{m} {\tilde{x}}_{i j} γ_{i} + \sum_{l = 1}^{m} \sum_{k > l}^{m} I_{l k} E_{jlk} δ_{l k})}{1 + \exp (\sum_{i = 1}^{s} Z_{i j} α_{i} + \sum_{i = 1}^{m} {\tilde{x}}_{i j} γ_{i} + \sum_{l = 1}^{m} \sum_{k > l}^{m} I_{l k} E_{jlk} δ_{l k})} .

(5)

We then selected cutpoints based on the given case–control ratios. For example, we can define $y_{j} = 1$ if the latent variable ${τ_{j}}^{*}$ is greater than the cutpoint ${τ_{0.5}}^{*}$ , otherwise $y_{j} = 0$ , where cutpoint ${τ_{0.5}}^{*}$ is the median of $τ^{*} = {({τ_{1}}^{*}, \dots, {τ_{n_{2}}}^{*})}^{T}$ . For ordinal categorical phenotype, we assumed there were 4 categories with ratios to be balanced ( $c_{1} : c_{2} : c_{3} : c_{4} = 1 : 1 : 1 : 1$ ), moderately unbalanced ( $c_{1} : c_{2} : c_{3} : c_{4} = 10 : 1 : 1 : 1$ ), strongly unbalanced ( $c_{1} : c_{2} : c_{3} : c_{4} = 30 : 1 : 1 : 1$ ), and extremely unbalanced ( $c_{1} : c_{2} : c_{3} : c_{4} = 100 : 1 : 1 : 1$ ).

The ordinal categorical phenotype can be simulated as

τ_{j} = η_{j} = \sum_{i = 1}^{s} Z_{i j} α_{i} + \sum_{i = 1}^{m} {\tilde{x}}_{i j} γ_{i} + \sum_{l = 1}^{m} \sum_{k > l}^{m} I_{l k} E_{jlk} δ_{l k}

(6)

The cutpoints were selected in the same way as for the binary phenotype:

y_{j} = \begin{cases} 1, & \text{if } τ_{j} \leq ε_{c}, \ c = 1 \\ c, & \text{if } ε_{c - 1} < τ_{j} \leq ε_{c}, \ c = 2, 3, 4 \end{cases}

(7)
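One way to read the cutpoint selection for both phenotype types is to take the cutpoints as empirical quantiles of the latent values, so that the category sizes match the target ratios. The sketch below does this by ranking; the function name and the simulated latent values are hypothetical. Because the logistic transform in Equation (5) is monotone, ranking $τ_{j}$ or ${τ_{j}}^{*}$ gives the same split.

```python
import random

def assign_categories(tau, ratios):
    """Split subjects into ordered categories 1..C by ranking tau.

    `ratios` gives the relative category sizes from the lowest to the highest
    category; the implied cutpoints are empirical quantiles of tau.
    """
    n = len(tau)
    total = sum(ratios)
    # Cumulative number of subjects at or below each category boundary.
    bounds, running = [], 0
    for r in ratios:
        running += r
        bounds.append(round(n * running / total))
    order = sorted(range(n), key=lambda j: tau[j])
    y = [0] * n
    category = 1
    for rank, j in enumerate(order):
        while rank >= bounds[category - 1]:
            category += 1
        y[j] = category
    return y

rng = random.Random(0)
tau = [rng.gauss(0.0, 1.0) for _ in range(1300)]  # hypothetical latent values

# Ordinal phenotype, moderately unbalanced: c1:c2:c3:c4 = 10:1:1:1.
y_ordinal = assign_categories(tau, [10, 1, 1, 1])

# Binary phenotype with case-control ratio 1:9: the top 10% become cases.
y_binary = [c - 1 for c in assign_categories(tau, [9, 1])]
```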

In the simulations, we fixed the effects of the covariates at 0.5 and assessed the type I error rates by setting the effects of all nodes and edges to 0. For the binary phenotype, we further evaluated empirical power at a significance level of 0.05 by setting the effect of the causal node or edge to 0.001, 0.075, 0.15, 0.225, or 0.3 and the effects of the other nodes and edges to 0. These values are the 0%, 25%, 50%, 75%, and 100% quantiles calculated from the real blood pressure GWAS data. We further calculated phenotype heritability based on the effect sizes of nodes or edges and the between-node relationships in simulations (Lee et al. 2011), with the median and mean of phenotype heritability being about 0.09 and 0.17, respectively (Supplementary Table 1). We also calculated the gene expression heritability of the 12 genes in the Alzheimer disease network using GCTA (Yang et al. 2011), with the median and mean of gene expression heritability being about 0.01 and 0.02, respectively (Supplementary Table 2). For the ordinal categorical phenotype, we evaluated empirical power by setting the effect of the causal node and edge to 0.05 and the effects of the other nodes and edges to 0. In addition, we performed a total of 1,000 simulation replicates per setting for both binary and ordinal categorical phenotypes.

To further reduce the influence of prespecifying the causal node or edge, we conducted additional simulations in which the causal node and edge were randomly selected, under the linear or nonlinear (combination of quadratic and sine) between-node correlation.

Real data application

We applied PoLoNet to perform the network regression for binary or ordinal categorical traits in TWAS analysis. Specifically, we obtained the GEUVADIS data as the gene expression data and examined 2 GWAS traits from the UK Biobank. The detailed data processing is provided below.

The GEUVADIS data contain gene expression measurements for 465 individuals from 5 different populations: CEPH (CEU), Finns (FIN), British (GBR), Toscani (TSI), and Yoruba (YRI). We focused only on protein-coding genes and lncRNAs that are annotated in GENCODE (release 12) (Harrow et al. 2012; Wen et al. 2015). Among these genes, we removed low-expressed genes that have zero counts in at least half of the individuals to obtain a final set of 15,810 genes. We performed PEER normalization to remove confounding effects and unwanted variation following previous studies (Stegle et al. 2012; Zeng and Zhou 2017; Yuan et al. 2020). To remove the remaining population stratification, we quantile normalized the gene expression measurements across individuals within each population to a standard normal distribution, and then further quantile normalized the gene expression measurements to a standard normal distribution across individuals from all 5 populations. Besides the expression data, all individuals also have their genotypes sequenced in the 1000 Genomes Project. We filtered out SNPs that have a Hardy–Weinberg equilibrium (HWE) P-value < $10^{- 4}$ , a genotype call rate <95%, or a minor allele frequency (MAF) <0.001. Finally, a total of 7,072,917 SNPs remained for analysis.
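The 2-step quantile normalization described above can be sketched with a rank-based inverse normal transform. The rank offset of 0.5, the tie handling (ties broken by order of appearance), and the function names are assumptions of this sketch, not details taken from the study.

```python
from statistics import NormalDist

def inverse_normal_transform(values):
    """Map values to standard normal quantiles by rank (ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf((rank - 0.5) / n)  # assumed offset of 0.5
    return out

def two_step_normalize(expression, population):
    """Normalize within each population first, then across all individuals."""
    out = [0.0] * len(expression)
    for pop in sorted(set(population)):
        idx = [i for i, p in enumerate(population) if p == pop]
        transformed = inverse_normal_transform([expression[i] for i in idx])
        for i, v in zip(idx, transformed):
            out[i] = v
    return inverse_normal_transform(out)

# Hypothetical expression values for one gene in 2 populations.
expr = [5.0, 1.0, 3.0, 10.0, 30.0, 20.0]
pops = ["A", "A", "A", "B", "B", "B"]
z = two_step_normalize(expr, pops)
```

The within-population step removes mean and scale differences between populations before the pooled step, which is why population B's larger raw values do not dominate the top ranks in `z`.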

The UK Biobank data consist of 487,298 individuals and 92,693,895 imputed SNPs (Bycroft et al. 2018). We followed the same sample QC procedure as the Neale lab (Supplementary Web Resources) to retain a total of 337,129 individuals of European ancestry. We filtered out SNPs with an HWE P-value < $10^{- 7}$ , a genotype call rate <95%, or an MAF < 0.001 to obtain a total of 13,876,958 SNPs. We integrated the GEUVADIS data with the GWAS data from the UK Biobank for TWAS analysis. For each gene in turn in the GEUVADIS data, we extracted cis-SNPs that are within either 1 Mb upstream of the transcription start site or 1 Mb downstream of the transcription end site. We overlapped these cis-SNPs of genes in GEUVADIS with those obtained from the UK Biobank to obtain the common SNPs.
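The SNP filters and the cis-window selection can be sketched as below. Two points are interpretations rather than statements from the text: a SNP is dropped if it fails any one of the 3 filters, and the cis window is taken to run from 1 Mb before the transcription start site to 1 Mb after the transcription end site, gene body included. All function names, SNP identifiers, and coordinates are hypothetical.

```python
def passes_qc(hwe_p, call_rate, maf, hwe_min=1e-7, call_min=0.95, maf_min=0.001):
    """Keep a SNP only if it passes the HWE, call-rate, and MAF thresholds.

    Defaults follow the UK Biobank thresholds in the text; pass hwe_min=1e-4
    for the GEUVADIS genotypes.
    """
    return hwe_p >= hwe_min and call_rate >= call_min and maf >= maf_min

def cis_snps(snp_positions, tss, tes, window=1_000_000):
    """Return SNP names within `window` bp of the gene span on one chromosome.

    min/max make the window independent of strand orientation.
    """
    lo = min(tss, tes) - window
    hi = max(tss, tes) + window
    return [name for name, pos in snp_positions.items() if lo <= pos <= hi]

# Hypothetical SNP positions (bp) and gene coordinates on a single chromosome.
positions = {"snpA": 500_000, "snpB": 2_100_000, "snpC": 4_500_000}
selected = cis_snps(positions, tss=2_000_000, tes=2_050_000)
```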

Following the ESC/ESH Guidelines for the management of arterial hypertension (Williams et al. 2018), we obtained ordinal categorical blood pressure (a total of 6 categories, Supplementary Table 3) as well as binary blood pressure. Specifically, hypertension is diagnosed at clinic BP $\geq 140 / 90$ mmHg, which is equivalent to $135 / 85$ mmHg for home BP measurement as well as an all-day mean pressure of $130 / 80$ mmHg for ambulatory blood pressure monitoring. We searched KEGG for networks potentially related to blood pressure using the terms “blood pressure” and “hypertension” and included all 22 networks (hsa04610, hsa04270, hsa04614, hsa04924, hsa04925, hsa04926, hsa04960, hsa05415, hsa00140, hsa00590, hsa03320, hsa04010, hsa04020, hsa04064, hsa04150, hsa04260, hsa04350, hsa04370, hsa04750, hsa04910, hsa04931, and hsa04933). After overlapping the network genes with those from the UK Biobank, we finally analyzed a total of 22 networks, including 1,874 gene nodes and 5,059 edges (Supplementary Table 4). The average sample sizes across different categories are 35,194:43,850:53,546:91,042:38,312:10,441 for the ordinal categorical blood pressure and 132,591:139,795 for the binary blood pressure (Supplementary Tables 5 and 6). Blood pressure is a major risk factor for cardiovascular morbidity and mortality (Collaboration 2002), with heritability ranging from 0.3 to 0.5 (Ehret and Caulfield 2013). We conducted the analysis using DPR as the gene expression prediction model, adjusting for sex and the top 10 genotype principal components to remove the population structure in both the PoLoNet and PPNT models.

Secondly, we focused on the UK Biobank GWAS of bipolar and major depression status, where, following the definition from the UK Biobank, 0–5 was used to represent the severity of the disease: no bipolar or depression (0), single probable major depression episode (1), probable recurrent major depression (moderate) (2), probable recurrent major depression (severe) (3), bipolar II disorder (4), and bipolar I disorder (5). Again, we searched for networks potentially related to bipolar and major depression status using the term “depression” and included all 9 networks (hsa04730, hsa05017, hsa04020, hsa04724, hsa04010, hsa04140, hsa04141, hsa04720, and hsa04721). After overlapping these network genes with those from the UK Biobank, we finally analyzed a total of 9 networks, including 1,033 gene nodes and 2,880 edges (Supplementary Table 4). The average sample sizes across different categories are 43,590:4,064:7,629:4,316:339:377 for the ordinal categorical phenotype and 43,590:16,726 for the binary phenotype (Supplementary Tables 5 and 6). Bipolar disorders are a complex group of severe and chronic disorders, with heritability being 70% (McIntyre et al. 2020). In addition, depression has been commonly recognized as the most frequent clinical presentation of bipolar disorder (Hirschfeld 2014). Again, we conducted the analysis using DPR as the gene expression prediction model and took sex and the top 10 genotype principal components into account.

Results

Simulation results for binary phenotype

We first illustrated the performance of both PoLoNet and PPNT in detecting the effect of the node under sample size 10,000, using DPR as the gene expression prediction model in TWAS. The type I error rates of both methods are close to the nominal level, regardless of the correlation patterns among the network nodes and regardless of whether the case–control ratios are balanced or not (Fig. 2a). PoLoNet has power comparable to that of PPNT, and the power of both methods increases as the effect size increases and as the case–control ratio becomes more balanced, regardless of the correlation patterns (Fig. 3, a–d). In addition, similar results can be found when both node and edge have effects, with the causal node hanging on the edge (Supplementary Fig. 1) as well as with the causal node not hanging on the edge (Supplementary Fig. 2).

Fig. 2. — Empirical type I error rates of PoLoNet and PPNT for binary phenotype. Empirical type I error rates (y-axis) at a significance level 0.05 is plotted against different between-node relationship (x-axis). The sample size is 10,000 and DPR is used as the gene expression prediction model in TWAS. We considered 4 case–control ratios [extremely unbalanced (1:999); strongly unbalanced (1:99); moderately unbalanced (1:9); balanced (1:1)] for both methods. a) Detecting the effect of the node under the setting that only a node has the effect. b) Detecting the effect of the edge under the setting that only an edge has the effect.

Fig. 3. — Power of PoLoNet and PPNT for binary phenotype. Power (y-axis) at a significance level 0.05 is plotted against different effect size (0.001, 0.075, 0.15, 0.225, 0.3) (x-axis). The sample size is 10,000 and DPR is used as the gene expression prediction model in TWAS. We considered 4 case–control ratios [extremely unbalanced (1:999), strongly unbalanced (1:99), moderately unbalanced (1:9), balanced (1:1)]. The simulations are considered with 4 correlation patterns including combination (a), sine (b), quadratic (c), and linear (d) when only a node has effect as well as combination (e), sine (f), quadratic (g), and linear (h) when only an edge has effect.

We then evaluated the performance of both PoLoNet and PPNT in detecting the effect of the edge under sample size 10,000, using DPR as the gene expression prediction model in TWAS. The type I error rates of both methods are quite close to the nominal level, regardless of the correlation patterns among different network nodes and regardless of whether the case–control ratios are balanced or not (Fig. 2b). The power of both PoLoNet and PPNT increases as the effect size increases and as the case–control ratio becomes more balanced (Fig. 3, e–h). When the between-node correlation is nonlinear, the power of PoLoNet is much larger than that of PPNT. For example, under the combination of quadratic and sine relationship between the network nodes, the power of PoLoNet reaches 1 when the effect size is 0.225 and the case–control ratio is moderately unbalanced, while it is only 0.042 for PPNT. Although the power of PoLoNet reduces to 0.32 in this setting when the case–control ratio is extremely unbalanced, it is still much higher than the 0.045 for PPNT.

Besides, we also find that the power difference between the 2 methods is related to the pattern of the nonlinear relationship. For example, when the case–control ratios are moderately unbalanced (1:9), strongly unbalanced (1:99), and extremely unbalanced (1:999), the power of PoLoNet is significantly higher than that of PPNT under the combination of quadratic and sine relationship as well as under the quadratic relationship among different network nodes, while the power difference reduces under the sine relationship.

As expected, the power of PoLoNet also depends on the specific correlation pattern among the network nodes. With the increase of the effect size and the case–control ratio becoming more balanced, the power of PoLoNet increases faster than that of PPNT. For example, when the correlation pattern is quadratic and the effect size is 0.075, the power of PoLoNet increases from 0.11 to 1 when the case–control ratio changes from extremely unbalanced to moderately unbalanced, while the power of PPNT increases from 0.02 to 0.41. When the relationship is the combination of quadratic and sine and the case–control ratio is strongly unbalanced, the power of PoLoNet increases from 0.24 to 0.94 when the effect size increases from 0.075 to 0.225, while the power of PPNT increases from 0.04 to 0.05.

In addition, the power advantage of PoLoNet over PPNT persists as long as the correlation pattern is nonlinear, regardless of whether the case–control ratio is balanced. Together, these results illustrate that PoLoNet can efficiently capture the general nonlinear relationship among network nodes. Under the linear correlation pattern, the power of PoLoNet is slightly lower than that of PPNT, presumably because the PM is the gold standard in this case.
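The type I error rates and power values quoted above are rejection proportions over simulation replicates. As a minimal sketch, assuming each replicate yields one P-value for the tested node or edge, the computation could look like the following. The function name, the toy P-values, and the Monte Carlo standard error are illustrative additions and are not taken from the PoLoNet software.

```python
import math


def rejection_rate(p_values, alpha=0.05):
    """Return the fraction of replicates with P < alpha and its Monte Carlo SE.

    Under the null this fraction estimates the type I error rate;
    under an alternative it estimates the power.
    """
    n = len(p_values)
    rate = sum(p < alpha for p in p_values) / n
    # Binomial standard error of a proportion estimated from n replicates.
    se = math.sqrt(rate * (1.0 - rate) / n)
    return rate, se


# Hypothetical P-values from 10 replicates (made up for illustration).
toy_p_values = [0.01, 0.20, 0.03, 0.50, 0.04, 0.70, 0.001, 0.90, 0.06, 0.02]
toy_rate, toy_se = rejection_rate(toy_p_values)
```

The standard error is a reminder that small differences between methods (for example 0.042 versus 0.045) may lie within simulation noise when the number of replicates is modest.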

Similar conclusions hold when both a node and an edge have effects, whether the effecting node hangs on the edge (Supplementary Fig. 3) or not (Supplementary Fig. 4). When the case–control ratio is moderately unbalanced (9:1) and both node and edge have effects with the node hanging on the edge, the power of PPNT counter-intuitively decreases as the effect size increases under the quadratic relationship (Supplementary Fig. 3). This is presumably because the node effect in this setting, to some extent, has an impact on the edge effect. Similar results are also found under different sample sizes (Supplementary Figs. 5–16) and when the gene expression prediction in TWAS is derived from the BSLMM model (Supplementary Figs. 17–34).

Simulation results for ordinal categorical phenotype

Overall, the simulation results of both PoLoNet and PPNT in detecting the node effect for the ordinal categorical phenotype are consistent with those for the binary phenotype. Again, both methods have well-calibrated type I error rates regardless of the correlation pattern among the network nodes, the sample size, and whether the ratio across categories is balanced (Fig. 4, a–d). Under the various simulation scenarios, the power of the 2 methods is almost the same and, as expected, increases as the sample size increases and as the ratio across categories becomes more balanced (Fig. 5, a–d and Supplementary Figs. 35 and 36).

Fig. 5. Power of PoLoNet and PPNT for ordinal categorical phenotype. Power (y-axis) at a significance level of 0.05 is plotted against sample size (x-axis). The effect size is set to 0.05 and DPR is used as the gene expression prediction model in TWAS. We considered 4 ratios across categories [extremely unbalanced (100:1:1:1), strongly unbalanced (30:1:1:1), moderately unbalanced (10:1:1:1), and balanced (1:1:1:1)]. The simulations cover 4 correlation patterns: combination (a), sine (b), quadratic (c), and linear (d) when only a node has an effect, and combination (e), sine (f), quadratic (g), and linear (h) when only an edge has an effect.
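To make the category ratios concrete, the sketch below shows one way a proportional odds model can be tied to a target ratio such as 10:1:1:1. It assumes the parameterization logit P(Y ≤ j) = α_j − η, where η is the linear predictor built from nodes and edges; the sign convention, the function names, and the choice of which category is the large one are assumptions made for illustration and may differ from the simulation design used in this study.

```python
import math


def cumulative_intercepts(ratios):
    """Intercepts alpha_j such that logistic(alpha_j) = P(Y <= j) when eta = 0."""
    total = sum(ratios)
    alphas, cum = [], 0.0
    for r in ratios[:-1]:  # the last cumulative probability is always 1
        cum += r / total
        alphas.append(math.log(cum / (1.0 - cum)))
    return alphas


def category_probabilities(alphas, eta):
    """P(Y = j) for every category under logit P(Y <= j) = alpha_j - eta."""
    cdf = [1.0 / (1.0 + math.exp(-(a - eta))) for a in alphas] + [1.0]
    return [cdf[0]] + [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]


# Moderately unbalanced design from the caption above: 10:1:1:1.
alphas = cumulative_intercepts([10, 1, 1, 1])
baseline = category_probabilities(alphas, 0.0)  # no genetic effect
shifted = category_probabilities(alphas, 0.5)   # hypothetical linear predictor
```

At a zero linear predictor the returned probabilities reproduce the target ratio (10/13, 1/13, 1/13, 1/13). A positive η moves probability mass from the first category toward the higher ones while leaving the intercepts, and therefore the proportional odds structure, unchanged.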

Both methods have well-calibrated type I error rates in detecting the edge effect for the ordinal categorical phenotype (Fig. 4, e–h), regardless of the correlation pattern and of whether the ratio across categories is balanced. The power of both methods increases as the sample size increases and as the ratio across categories becomes more balanced. In addition, PoLoNet shows higher power than PPNT when the correlation among network nodes is nonlinear, regardless of whether the ratio across categories is balanced (Fig. 5, e–g). For example, under the combination of quadratic and sine relationships, the power of PoLoNet is 0.66 when the sample size is 10,000 and the ratio across categories is moderately unbalanced, whereas it is 0.06 for PPNT. Although the power of PoLoNet drops to 0.18 in this setting when the ratio across categories is extremely unbalanced, it is still much higher than the 0.08 of PPNT.

In addition, consistent with the binary phenotype results, the power difference between the 2 methods depends on the specific nonlinear relationship among the network nodes (Fig. 5, e–g). For example, the power of PoLoNet is markedly higher than that of PPNT under the combination of quadratic and sine relationships, whereas the difference shrinks under the sine relationship and under the quadratic relationship.

Again, as the sample size increases and the ratio across categories becomes more balanced, the power of PoLoNet increases faster than that of PPNT. For example, when the correlation pattern is the combination of quadratic and sine and the ratio across categories is strongly unbalanced, the power of PoLoNet increases from 0.17 to 0.77 as the sample size increases from 5,000 to 20,000, whereas the power of PPNT increases from 0.04 to 0.06. Under the same correlation pattern with a sample size of 20,000, the power of PoLoNet increases from 0.77 to 0.91 as the ratio across categories changes from strongly unbalanced to moderately unbalanced, whereas the power of PPNT increases from 0.06 to 0.07.

In addition, the power advantage of PoLoNet over PPNT persists as long as the correlation pattern is nonlinear, regardless of whether the ratio across categories is balanced.

As with the binary phenotype, PoLoNet shows slightly lower power than PPNT under the linear relationship among network nodes (Fig. 5h and Supplementary Figs. 37 and 38). All the results are consistent when using BSLMM as the gene expression prediction model in TWAS (Supplementary Figs. 39–44). In addition, the results obtained when the effecting node and edge are randomly selected are consistent with those obtained when they are prespecified (Supplementary Fig. 45).

Real data application

In total, we analyzed 31 biological networks, of which 22 are potentially related to blood pressure and 9 are potentially related to bipolar disorder and major depression. Because the node test and the edge test are often highly correlated in network regression of TWAS, and the commonly used Bonferroni correction is too stringent for multiple nonindependent tests, we adjusted the P-values for multiple testing using the false discovery rate (FDR) with the Benjamini–Hochberg procedure. We declared significance at an FDR threshold of 0.05, with the FDR calculated per gene network.
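The per-network Benjamini–Hochberg adjustment described above can be sketched as follows. This is a plain-Python illustration, not the code used in the study; the network and gene names and the raw P-values are hypothetical, and whether node and edge tests are pooled within a network before adjustment is left open here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_per_network(tests_by_network, fdr=0.05):
    """Adjust P-values separately within each network and keep the hits."""
    hits = {}
    for network, tests in tests_by_network.items():
        names = list(tests)
        adj = benjamini_hochberg([tests[name] for name in names])
        hits[network] = {name: q for name, q in zip(names, adj) if q <= fdr}
    return hits


# Hypothetical raw P-values for two networks (made up for illustration).
toy = {
    "network_A": {"gene1": 0.001, "gene2": 0.02, "gene3": 0.04, "gene4": 0.30},
    "network_B": {"gene5": 0.20, "gene6": 0.60},
}
toy_hits = significant_per_network(toy)
```

Because the adjustment is applied within each network, the number of tests m differs across networks, so the same raw P-value can pass in a small network and fail in a large one.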

Blood pressure

Table 1 displays the significant genes identified by both methods. Consistent with the simulations, in which PoLoNet and PPNT have comparable performance in detecting the node effect, PoLoNet identified 39 genes and PPNT also identified 39 genes when blood pressure is regarded as a binary phenotype, with 37 genes in common. When blood pressure is regarded as an ordinal categorical phenotype, PoLoNet identified 63 genes and PPNT identified 58 genes, with 56 genes in common.

Table 1.

Effecting nodes for blood pressure, with FDR-corrected P-values in parentheses.

Network	Binary phenotype		Ordinal categorical phenotype
Network	PoLoNet	PPNT	PoLoNet	PPNT
hsa04925	NPPA (0.0000083)	NPPA (0.0000065)	NPPA (0.00000001355)	NPPA (0.0000000135)
			CACNA1D (0.0076038)	CACNA1D (0.0100866)
			CALML6 (0.0422552)	CALML6 (0.0475372)
hsa04926	MAPK14 (0.0125709)	MAPK14 (0.0135514)	MAPK14 (0.008507)	MAPK14 (0.0072977)
hsa04270	NPPA (0.0000131)	NPPA (0.0000089)	NPPA (0.00000002834)	NPPA (0.0000000283)
hsa04610
hsa04614
hsa04924	NPPA (0.0000083)	NPPA (0.0000095)	NPPA (0.00000000758)	NPPA (0.00000000758)
	CREB1 (0.0224878)	CREB1 (0.0219618)	ADRB2 (0.0076037)	ADRB2 (0.0051748)
	CACNA1D (0.0471104)	ADRB2 (0.0365762)	CACNA1D (0.0076037)	CACNA1D (0.0063713)
	ADRB2 (0.0527955)	CACNA1D (0.0365762)	CREB1 (0.017981)	PDE3B (0.0221398)
			PDE3B (0.0214371)	CREB1 (0.0221398)
			CALML6 (0.0280106)	CALML6 (0.0294454)
hsa04960	PIK3R3 (0.0403795)	PIK3R3 (0.0386846)	PIK3R3 (0.0042848)	PIK3R3 (0.0040126)
			NEDD4L (0.0462781)	NEDD4L (0.0481877)
			IGF1 (0.0474213)	IGF1 (0.0481877)
hsa04933	MAPK14 (0.0180681)	AGER (0.0460069)	MAPK14 (0.0144626)	MAPK14 (0.0151749)
	PLCD3 (0.0399671)	MAPK14 (0.0254948)	TGFB2 (0.045379)	PLCD3 (0.0083805)
		PLCD3 (0.0460069)	PLCD3 (0.0049779)	CDKN1B (0.0293716)
			CDKN1B (0.0331786)
hsa04260	CACNB4 (0.0487354)	CACNB4 (0.0482168)	CACNB4 (0.0478903)	CACNB4 (0.0469485)
	MYH7 (0.0006689)	MYH7 (0.0005076)	MYH7 (0.0002048)	MYH7 (0.0001368)
	MYL2 (0.0117386)	MYL2 (0.0119919)	MYL2 (0.0114153)	MYL2 (0.0112197)
	MYL4 (0.000261)	MYL4 (0.0002293)	MYL4 (0.0000001)	MYL4 (0.0000001)
	ATP1A1 (0.0199741)	ATP1A1 (0.0196313)	ATP1A3 (0.0248526)	ATP1A3 (0.0245137)
	ATP1A3 (0.0487354)	ATP1A3 (0.0482168)
hsa04910	PPP1R3C (0.0239664)	PPP1R3C (0.0236558)	MKNK1 (0.0378979)	PPP1R3C (0.0241115)
	PIK3R3 (0.0456463)	PRKAR2A (0.0437213)	PPP1CC (0.0312045)	PRKAR2A (0.0090441)
		PIK3R3 (0.0402483)	PPP1R3C (0.0204466)	RPS6KB1 (0.0090441)
			PRKAR2A (0.01401)	PIK3R3 (0.0056313)
			RPS6KB1 (0.0078237)
			PIK3R3 (0.0078237)
hsa05415	CTSD (0.0430295)	NDUFS3 (0.0002045)	PARP1 (0.0240052)	PARP1 (0.0255786)
	NDUFS3 (0.0002546)	MAPK14 (0.006855)	CTSD (0.0190586)	CTSD (0.0201943)
	MAPK14 (0.0090722)		NDUFS3 (0.0026946)	NDUFS3 (0.0023011)
	RELA (0.0351457)		TGFB2 (0.0240052)	TGFB2 (0.0255786)
			MAPK14 (0.0085807)	MAPK14 (0.0057581)
hsa04750	MAPK14 (0.0133908)	MAPK14 (0.0223228)	MAPK14 (0.0120018)	MAPK14 (0.0222056)
	PIK3R3 (0.0256072)	PIK3R3 (0.0294868)	PIK3R3 (0.0052431)	PIK3R3 (0.0060677)
	PPP1CC (0.020693)	PPP1CC (0.0294868)	PPP1CC (0.0039188)	PPP1CC (0.0067183)
			CALML6 (0.0499864)
hsa04931	PPP1R3C (0.001951)	PPP1R3C (0.0017368)	RPS6KB1 (0.0144681)	RPS6KB1 (0.0127727)
	PTPN11 (0.0169454)	PTPN11 (0.0184986)	PIK3R3 (0.0144681)	PIK3R3 (0.0155044)
			PPP1R3C (0.0024292)	PPP1CC (0.0475085)
			PTPN11 (0.0144681)	PPP1R3C (0.0025617)
				PTPN11 (0.0155044)
hsa04370	MAPK14 (0.0030008)	MAPK14 (0.0022332)	PIK3R3 (0.0128875)	PIK3R3 (0.0095003)
			MAPK14 (0.011384)	MAPK14 (0.0070355)
hsa04350			RPS6KB1 (0.0078612)	RPS6KB1 (0.0051628)
			TGFB2 (0.0362249)	TGFB2 (0.0254927)
hsa04150	WNT3 (0.0175629)	WNT3 (0.006402)	WNT3 (0.0006346)	WNT3 (0.0003605)
	WNT3A (0.0500215)	WNT3A (0.0337752)	RPS6KB1 (0.0247184)	RPS6KB1 (0.0197823)
	SEH1L (0.0500215)	SEH1L (0.0337752)	SEH1L (0.0033181)	SEH1L (0.0016656)
	PIK3R3 (0.0500215)	PIK3R3 (0.0451122)	PIK3R3 (0.0158498)	PIK3R3 (0.0197823)
hsa04064	MAP3K14 (0.0000002)	MAP3K14 (0.0000001)	MAP3K14 (7.7E−09)	MAP3K14 (7.7E−09)
	BLNK (0.0234098)	BLNK (0.0236899)	BLNK (0.0493661)	TRAF2 (0.0023191)
	GADD45A (0.022993)	GADD45A (0.0204957)	TRAF2 (0.0029848)
hsa04020			ADRB2 (0.0440296)	ADRB2 (0.0377802)
			MST1 (0.0191079)	ITPKB (0.0210195)
			ITPKB (0.0151195)	PDGFRA (0.0334299)
			PDGFRA (0.0440296)	PLCD3 (0.0455692)
			PLCD3 (0.0465473)
hsa04010	MAP3K14 (0.0000014)	MAP3K14 (0.0000003)	MAP3K14 (9.57E−09)	MAP3K14 (9.57E−09)
				TRAF2 (0.0463493)
hsa03320	NR1H3 (0.0054105)	NR1H3 (0.0060042)	NR1H3 (0.0497514)
hsa00590	CYP2C19 (0.0306979)	CYP2C19 (0.0164376)	CYP2C19 (0.0012097)	CYP2C19 (0.0005756)
hsa00140	CYP11A1 (0.0428634)	CYP11A1 (0.0433207)	CYP19A1 (0.0215005)	CYP19A1 (0.0275218)
			CYP11A1 (0.0078703)	CYP11A1 (0.006447)

Also consistent with the simulations, in which PoLoNet gained more power than PPNT in detecting the edge effect, PoLoNet identified 71 edges and PPNT identified 66 edges when blood pressure is regarded as a binary phenotype, with 63 edges in common (Table 2). The significant edges identified by PoLoNet but not by PPNT included PRKCE–PRKD3 (P = 0.0054, hsa04925), RAMP2–GNAS (P = 0.0507, hsa04270), PRKACB–TRPV1 (P = 2.588E−10, hsa04750), HSD17B1–HSD17B12 (P = 0, hsa00140), SRD5A1–HSD17B8 (P = 0.0428, hsa00140), HSD17B12–CYP21A2 (P = 8.14E−11, hsa00140), HSD17B12–CYP7B1 (P = 1.15E−09, hsa00140), and UGT1A6–CYP19A1 (P = 0.0115, hsa00140). When blood pressure is regarded as an ordinal categorical phenotype, PoLoNet identified 78 edges and PPNT identified 68 edges, with 67 in common. The significant edges identified by PoLoNet but not by PPNT included PRKCE–PRKD3 (P = 0.0223, hsa04925), RAMP2–GNAS (P = 0.0509, hsa04270), RELA–TNF (P = 0.0162, hsa04933), PRKAG2–ACACB (P = 0.0363, hsa04910), AKT3–TBC1D4 (P = 0.0200, hsa05415), PRKACB–TRPV1 (P = 0.0491, hsa04750), SPHK2–NRAS (P = 0.0205, hsa04370), FZD8–LRP5 (P = 0.0159, hsa04150), PDGFB–FGFR4 (P = 0.0466, hsa04020), UGT1A6–CYP3A5 (P = 0.0417, hsa00140), and UGT1A6–CYP19A1 (P = 0.0399, hsa00140).

Table 2.

Effecting edges for blood pressure, with FDR-corrected P-values in parentheses.

Network	Binary phenotype		Ordinal categorical phenotype
Network	PoLoNet	PPNT	PoLoNet	PPNT
hsa04925	PLCB1–KCNK9 (0)	PLCB1–KCNK9 (0)	PLCB1–KCNK9 (0)	PLCB1–KCNK9 (0)
	GNA11–PLCB2 (3.95E−09)	GNA11–PLCB2 (2.92E−09)	GNA11–PLCB2 (1.141E−17)	GNA11–PLCB2 (1.545E−17)
	GNA11–PLCB1 (0.005448)	GNA11–PLCB1 (0.004804)	GNA11–PLCB1 (0.000178)	GNA11–PLCB1 (0.0002)
	PRKCE–PRKD3 (0.05021)		PRKCE–PRKD3 (0.0223)
hsa04926	AKT3–NOS3 (0)	AKT3–NOS3 (0)	AKT3–NOS3 (0)	AKT3–NOS3 (0)
	NFKB1–VEGFA (1.143E−09)	NFKB1–VEGFA (1.74E−09)	NFKB1–VEGFA (3.528E−19)	NFKB1–VEGFA (8.818E−19)
	NFKBIA–RELA (0.01024)	NFKBIA–RELA (0.010928)	NFKBIA–RELA (0.0001)	NFKBIA–RELA (0.0001)
hsa04270	MYLK4–MYL9 (0)	MYLK4–MYL9 (0)	MYLK4–MYL9 (0)	MYLK4–MYL9 (0)
	MYLK–MYL6 (1.48E−09)	MYLK–MYL6 (8.55E−10)	MYLK–MYL6 (2.794E−18)	MYLK–MYL6 (1.8813E−18)
	MYLK–MYL6B (0.0264738)	MYLK–MYL6B (0.019432)	MYLK–MYL6B (0.0006)	MYLK–MYL6B (0.0005)
	RAMP2–GNAS (0.0507706)		RAMP2–GNAS (0.0510)
hsa04610	F12–PLG (0)	F12–PLG (0)	F12–PLG (0)	F12–PLG (0)
	PLAT–PLG (3.45E−12)	PLAT–PLG (3.12E−12)	PLAT–PLG (2.164E−21)	PLAT–PLG (1.618E−21)
	PLAU–PLG (0.0003694)	PLAU–PLG (0.0003726)	PLAU–PLG (0.00001)	PLAU–PLG (0.00001429)
hsa04614	ACE–ENPEP (0)	ACE–ENPEP (0)	ACE–ENPEP (0)	ACE–ENPEP (0)
	THOP1–ACE (0.0013855)	THOP1–ACE (0.0013621)	THOP1–ACE (0.00005)	THOP1–ACE (0.00005119)
hsa04924	GNAS–ADCY5 (0)	GNAS–ADCY5 (0)	GNAS–ADCY5 (0)	GNAS–ADCY5 (0)
	PRKACB–KCNMA1 (5.46E−10)	PRKACB–KCNMA1 (7.4E−10)	PRKACB–KCNMA1 (6.4E−19)	PRKACB–KCNMA1 (8.55E−19)
	PRKACB–CREB1 (0.0125937)	PRKACB–CREB1 (0.0132)	PRKACB–CREB1 (0.0002)	PRKACB–CREB1 (0.00016)
	GNAI2–ADCY6 (0.0557875)	GNAI2–ADCY6 (0.0593)	GNAS–ADCY6 (0.0498)	GNAS–ADCY6 (0.0469121)
hsa04960	SGK1–KCNJ1 (0)	SGK1–KCNJ1 (0)	SGK1–KCNJ1 (0)	SGK1–KCNJ1 (0)
	NR3C2–KRAS (2.00E−10)	NR3C2–KRAS (1.69E−10)	NR3C2–KRAS (1.887E−19)	NR3C2–KRAS (1.434E−19)
	NR3C2–SGK1 (0.0023932)	NR3C2–SGK1 (0.0021451)	NR3C2–SGK1 (0.00006)	NR3C2–SGK1 (0.00005)
hsa04933	AGER–PLCD3 (0)	AGER–PLCD3 (0)	AGER–PLCD3 (0)	AGER–PLCD3 (0)
	AGER–PLCB3 (0.009526)	AGER–PLCB3 (0.0078223)	AGER–PLCB3 (0.0007)	AGER–PLCB3 (0.0005)
	AGER–PLCB4 (9.297E−10)	AGER–PLCB4 (8.06E−10)	AGER–PLCB4 (9.734E−17)	AGER–PLCB4 (4.827E−17)
			RELA–TNF (0.0162)
hsa04260	ACTC1–MYH6 (0)	ACTC1–MYH6 (0)	ACTC1–MYH6 (0)	ACTC1–MYH6 (0)
	TPM1–ACTC1 (0.0093002)	TPM1–ACTC1 (0.0092292)	TPM1–ACTC1 (0.0003)	TPM1–ACTC1 (0.00026)
	TPM2–ACTC1 (4.69E−10)	TPM2–ACTC1 (4.47E−10)	TPM2–ACTC1 (6.485E−19)	TPM2–ACTC1 (5.948E−19)
hsa04910	TSC1–RHEB (0)	TSC1–RHEB (0)	TSC1–RHEB (0)	TSC1–RHEB (0)
	MAPK8–IRS2 (0.0247373)	MAPK8–IRS2 (0.0236557)	PRKAG2–ACACB (0.036)	MAPK8–IRS2 (0.0023749)
	MAPK9–IRS1 (3.40E−07)	MAPK9–IRS1 (1.32E−07)	MAPK8–IRS2 (0.0025)	MAPK9–IRS1 (3.00E−14)
			MAPK9–IRS1 (6.105E−14)	PIK3CB–PRKCI (0.0380)
			KRAS–BRAF (0.0379)
hsa05415	PRKCA–NOS3 (0)	PRKCA–NOS3 (0)	PRKCA–NOS3 (0)	PRKCA–NOS3 (0)
	PRKCB–TGFB1 (1.91E−10)	PRKCB–TGFB1 (1.88E−10)	AKT3–TBC1D4 (0.0200)	PRKCB–TGFB1 (4.185E−18)
	PRKCB–RELA (0.0006456)	PRKCB–RELA (0.0008)	PRKCB–TGFB1 (2.878E−18)	PRKCB–RELA (0.0001391)
			PRKCB–RELA (0.0001)
hsa04750	P2RY2–GNAQ (0)	P2RY2–GNAQ (0)	P2RY2–GNAQ (0)	P2RY2–GNAQ (0)
	PRKACB–TRPV1 (0.029883)	PTGER2–GNAS (0.0063)	PRKACB–TRPV1 (0.0490)	PTGER2–GNAS (0.00009)
	PTGER2–GNAS (0.0059618)	PTGER4–GNAS (1.64E−10)	PTGER2–GNAS (0.00008)	PTGER4–GNAS (7.272E−20)
	PTGER4–GNAS (2.59E−10)		PTGER4–GNAS (8.357E−20)
hsa04931	MAPK8–IRS1 (0)	MAPK8–IRS1 (0)	MAPK8–IRS1 (0)	MAPK8–IRS1 (0)
	MAPK10–IRS1 (0.0378388)	MAPK10–IRS1 (0.034916)	MAPK10–IRS1 (0.0061)	MAPK10–IRS1 (0.0053753)
	MAPK10–IRS2 (1.83E−08)	MAPK10–IRS2 (1.09E−08)	MAPK10–IRS2 (2.527E−16)	MAPK10–IRS2 (2.237E−16)
hsa04370	CDC42–MAPK14 (0)	CDC42–MAPK14 (0)	CDC42–MAPK14 (0)	CDC42–MAPK14 (0)
	PLCG1–PPP3CA (0.0057084)	PLCG1–PPP3CA (0.0056)	PLCG1–PPP3CA (0.00008)	PLCG1–PPP3CA (0.00008)
	PLCG1–PPP3CB (1.33E−11)	PLCG1–PPP3CB (1.59E−11)	PLCG1–PPP3CB (9.467E−21)	PLCG1–PPP3CB (1.183E−20)
			SPHK2–NRAS (0.0205)
hsa04350	LTBP1–TGFB1 (0)	LTBP1–TGFB1 (0)	LTBP1–TGFB1 (0)	LTBP1–TGFB1 (0)
	CUL1–SMAD3 (1.83E−10)	CUL1–SMAD3 (0.0120854)	CUL1–SMAD3 (0.0004)	CUL1–SMAD3 (0.0004)
	CUL1–SMAD2 (0.016279)	CUL1–SMAD2 (1.23E−10)	CUL1–SMAD2 (7.828E−19)	CUL1–SMAD2 (8.116E−19)
hsa04150	IGF1R–GRB2 (0)	IGF1R–GRB2 (0)	IGF1R–GRB2 (0)	IGF1R–GRB2 (0)
	IRS1–PIK3CA (0.0181419)	GSK3B–TBC1D7 (0.0063)	FZD8–LRP5 (0.0158)	IRS1–PIK3CA (0.0003)
	IRS1–PIK3CB (6.92E−08)	IRS1–PIK3CA (0.0070)	IRS1–PIK3CA (0.0006)	IRS1–PIK3CB (4.272E−16)
		IRS1–PIK3CB (1.54E−08)	IRS1–PIK3CB (2.25E−15)
hsa04064	BIRC2–RIPK1 (0)	BIRC2–RIPK1 (0)	BIRC2–RIPK1 (0)	BIRC2–RIPK1 (0)
	NFKBIA–NFKB1 (0.00090)	NFKBIA–NFKB1 (0.0008)	NFKBIA–NFKB1 (0.000008)	NFKBIA–NFKB1 (0.000007)
	NFKBIA–RELA (1.03E−10)	NFKBIA–RELA (4.30E−11)	NFKBIA–RELA (5.952E−19)	NFKBIA–RELA (2.519E−19)
hsa04020	ADCY1–PRKACA (0)	ADCY1–PRKACA (0)	PDGFB–FGFR4 (0.0466026)	CAMK1D–CALML6 (0.0075)
	CAMK1D–CALM1 (2.8E−07)	CAMK1D–CALM1 (11.8E−07)	CAMK1D–CALML6 (0.0099)	CAMK1D–CALM1 (5.5153E−15)
			CAMK1D–CALM1 (7.31E−15)	ADCY1–PRKACA (0)
			ADCY1–PRKACA (0)
hsa04010	RASGRF1–RRAS2 (0)	RASGRF1–RRAS2 (0)	RASGRF1–RRAS2 (0)	RASGRF1–RRAS2 (0)
	RASGRF1–RRAS (2.8E−08)	RASGRF1–RRAS (1.68E−08)	RASGRF1–NRAS (0.0039)	RASGRF1–NRAS (0.0047)
	GRB2–ERBB2 (0.0038895)	NRAS–MAP3K1 (0.0396)	RASGRF1–RRAS (9.71E−15)	RASGRF1–RRAS (1.24E−14)
hsa03320	PPARA–HMGCS1 (0)	PPARA–HMGCS1 (0)	PPARA–HMGCS1 (0)	PPARA–HMGCS1 (0)
	PPARA–SCD (0.0054104)	PPARA–SCD (0.0060)	PPARA–SCD (0.0000842)	PPARA–SCD (0.00009)
	PPARA–SCD5 (4.21E−09)	PPARA–SCD5 (2.75E−09)	PPARA–SCD5 (1.624E−17)	PPARA–SCD5 (1.1708E−17)
hsa00590	EPHX2–CYP2J2 (0)	EPHX2–CYP2J2 (0)	EPHX2–CYP2J2 (0)	EPHX2–CYP2J2 (0)
	CYP2J2–GPX1 (0.0306979)	CYP2J2–GPX1 (0.035442)	CYP2J2–GPX1 (0.0004405)	CYP2J2–GPX1 (0.0005)
	CYP2J2–GPX2 (1.56E−10)	CYP2J2–GPX2 (1.68E−10)	CYP2J2–GPX2 (2.847E−20)	CYP2J2–GPX2 (5.114E−20)
hsa00140	HSD17B1–HSD17B12 (0)	HSD17B1–UGT1A6 (0)	HSD17B1–UGT1A6 (0)	HSD17B1–UGT1A6 (0)
	HSD17B1–UGT2A3 (0.0015)	HSD17B1–UGT2A3 (0.00163)	HSD17B1–UGT2A3 (0.00004)	HSD17B1–UGT2A3 (0.00005)
	SRD5A1–HSD17B8 (0.0429)	HSD17B1–CYP1A1 (6.01E−11)	HSD17B1–CYP1A1 (4.315E−20)	HSD17B1–CYP1A1 (5.46E−20)
	HSD17B12–CYP21A2 (8.1E−11)		UGT1A6–CYP3A5 (0.0417)	CYP1B1–UGT2B7 (0.0352)
	HSD17B12–CYP7B1 (1.16E−09)		UGT1A6–CYP19A1 (0.0400)
	UGT1A6–CYP19A1 (0.0116)

Further analysis illustrated that the edges identified by PoLoNet but not by PPNT are more likely to reflect a nonlinear relationship (Supplementary Fig. 46, a and b), whereas the edges identified by both methods are more likely to reflect a linear relationship among network nodes (Supplementary Fig. 46c). These results show that PoLoNet can capture both linear and nonlinear relationships in network regression of TWAS.

Bipolar and major depression status

PoLoNet and PPNT have comparable performance in detecting the node effect. Both identified the gene SLC6A12 (P = 0.01043 for PoLoNet and P = 0.01045 for PPNT) in hsa04721 when bipolar and major depression status is regarded as a binary trait, as well as SLC6A12 (P = 0.00678 and P = 0.00688 for PoLoNet and PPNT, respectively) in hsa04721 and BAG1 (P = 0.02647 and P = 0.03440 for PoLoNet and PPNT, respectively) in hsa04141 when the status is regarded as an ordinal categorical phenotype. Neither method identified significant edges when bipolar and major depression status is regarded as a binary phenotype. When it is regarded as an ordinal categorical phenotype, however, PoLoNet identified an effecting edge, CALM1–HRAS (P = 0.042894, hsa04720), whereas PPNT failed to detect any effecting edge.

Again, the scatter plot indicates that the correlation between CALM1 and HRAS is more likely to be nonlinear (Supplementary Fig. 46f). The smaller number of significant signals for bipolar and major depression may be due to the high rate of missing data: in total, 256,348 of the 337,129 individuals in UK Biobank have missing bipolar and major depression status.

Discussion

Identifying biological networks related to complex traits is important for elucidating the network mechanisms underlying complex diseases. Within the TWAS framework, we have presented PoLoNet, a novel network regression method that detects the association between a given network and a binary or ordinal categorical phenotype of interest. PoLoNet relies on DPR to obtain the optimal gene expression prediction weights, introduces PMI to measure the general between-node correlation and, more importantly, can simultaneously identify the specific gene nodes and edges related to the outcome trait. Technically, PoLoNet estimates PMI with nonparametric kernel density estimation to avoid the risk of misspecifying the joint distribution of 2 gene nodes within a network. Comprehensive and realistic simulations illustrate that PoLoNet can effectively capture the general relationship among gene nodes and performs better than the competing method. In addition, the advantage of PoLoNet is robust to different gene expression imputation models in TWAS, as well as to different ratios across the categories of the phenotype.
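As an illustration of PMI estimated by kernel density estimation, the sketch below computes PMI(x, y) = log[f(x, y) / (f(x) f(y))] at each sample point with Gaussian kernels. It is a minimal standard-library example under stated assumptions: the Gaussian product kernel, the Scott-type bandwidths, and the toy data are choices made here for illustration, and the estimator and bandwidth actually used by PoLoNet, as well as how the pointwise values enter the regression, are not specified in this section.

```python
import math
import random


def _std(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def _gauss(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def pmi_kde(x, y):
    """Pointwise mutual information at each sample via Gaussian KDE."""
    n = len(x)
    # Assumed Scott-type bandwidths: n^(-1/5) in 1 dimension, n^(-1/6) in 2.
    hx1, hy1 = _std(x) * n ** (-1 / 5), _std(y) * n ** (-1 / 5)
    hx2, hy2 = _std(x) * n ** (-1 / 6), _std(y) * n ** (-1 / 6)
    pmi = []
    for i in range(n):
        fx = sum(_gauss((x[i] - x[k]) / hx1) for k in range(n)) / (n * hx1)
        fy = sum(_gauss((y[i] - y[k]) / hy1) for k in range(n)) / (n * hy1)
        fxy = sum(
            _gauss((x[i] - x[k]) / hx2) * _gauss((y[i] - y[k]) / hy2)
            for k in range(n)
        ) / (n * hx2 * hy2)
        pmi.append(math.log(fxy / (fx * fy)))
    return pmi


# Toy data (hypothetical): one quadratic pair and one independent pair.
random.seed(0)
n = 150
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y_dep = [xi ** 2 + random.gauss(0.0, 0.3) for xi in x]
y_ind = [random.gauss(0.0, 1.0) for _ in range(n)]
pmi_dep = pmi_kde(x, y_dep)
pmi_ind = pmi_kde(x, y_ind)
mean_dep = sum(pmi_dep) / n
mean_ind = sum(pmi_ind) / n
```

In this toy setting the quadratic pair has essentially zero linear correlation, yet its average PMI would be expected to exceed that of the independent pair, which is the kind of nonlinear dependence a product-moment measure tends to miss.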

Additional simulations also illustrated the power loss incurred when the binary or ordinal categorical phenotype is treated as a quantitative trait (Supplementary Figs. 47 and 48) and the robustness of PoLoNet when some genes are unavailable (Supplementary Figs. 49 and 50). We also demonstrated that PoLoNet is computationally efficient (Supplementary Table 7) and can analyze large datasets with hundreds of thousands of individuals, such as UK Biobank. In addition, we conducted simulations and real data analysis to compare PoLoNet with TIGAR, given that TIGAR has shown an advantage in capturing the SNP effect sizes on gene expression by using DPR in the first TWAS step. The simulations show a power gain of PoLoNet for network regression in TWAS (Supplementary Figs. 51–53), and the real data analysis shows performance similar to that of TIGAR in detecting the node effect (Supplementary Table 8).

One may argue that the PMI among the network nodes could first be estimated from the measured gene expression, rather than the predicted gene expression, given that gene expression data are available in the eQTL study. The PMI estimate could then be regarded as a new exposure, and the standard TWAS analysis could be conducted with the proportional odds logistic model in the second stage. However, this would incur a large prediction error because of the limited sample size of the eQTL study (e.g. only 465 samples in the GEUVADIS data). In addition, unlike traditional TWAS analysis, which chooses the cis-SNPs of each gene as the genotypes, it is hard to determine, both biologically and statistically, which SNPs should serve as the genotypes for the PMI between 2 genes.

Many studies have shown that genetic factors can play important roles in the development of hypertension. In the real data analysis, we found that genes or gene–gene interactions including NPPA, RELA, ADRB1, CREB1, TGFB2, GNAS–ADCY5, and RELA–TNF were significantly associated with blood pressure. NPPA can be regarded as one of the representative genes associated with blood pressure. The NPPA T2238C variant was associated with the modification of antihypertensive medication effects on cardiovascular diseases and blood pressure (Lynch et al. 2008). In addition, the literature supports relationships between blood pressure and RELA (Zhang et al. 2021), ADRB1 (Peng et al. 2009; Kong et al. 2013), CREB1 (Gonzalez et al. 2017; van Zonneveld et al. 2020), and TGFB2 (Walker et al. 2012). The signaling molecule cAMP, whose synthesis is catalyzed by ADCY5, responds to the G protein encoded by GNAS, resulting in an increase of IL6 and thus an effect on hypertension (Cheng et al. 2017; Ruan et al. 2017). This is consistent with our finding that GNAS–ADCY5 affects blood pressure. Besides, miR-138 could improve LPS-induced TNF-α and IL-6 levels by targeting RELA and affecting NF-κB signaling. Abnormal levels of TNF-α and IL-6 would disrupt trophoblast function and lead to preeclampsia, a pregnancy complication characterized by new onset of elevated blood pressure (Yin et al. 2021).

Likewise, we found that SLC6A12, BAG1, and CALM1–HRAS can also play important roles in the pathogenesis of bipolar and major depression status. Previous studies identified SLC6A13 (GAT2) and SLC6A12 (BGT1) as putative Notch/RBP-J pathway targets, which may function downstream of RBP-J to limit the accumulation of GABA in the Schaffer collateral pathway and lead to defects in long-term depression (Liu et al. 2015). Similarly, the literature also supports relationships between other genes [e.g. BAG1 (Szczepankiewicz et al. 2021), CALM1 (Bazwinsky‐Wutschke et al. 2014), and HRAS (Schreiber et al. 2017)] and bipolar disorder and major depression.

PoLoNet is not without limitations. First, the gene network structure is assumed to be prespecified. Learning the network structure often requires determining all the possible edges that best match the data, and a joint probability distribution of all network nodes can be compatible with more than 1 network structure. In practice, however, most biologists can roughly determine the network structure involved in the corresponding biological process, and publicly available databases (e.g. KEGG) can also help to establish the network. Second, PoLoNet directly plugs the estimator of PMI among different predicted gene expressions into the regression model and does not account for the uncertainty in estimating the PMI, which may lead to power loss, especially when the eQTL study is small. PMI also does not account for the direction of the link between gene nodes. Third, PoLoNet is developed only for individual-level data. Extending PoLoNet to summary data is nontrivial, as it is hard to calculate the PMI between 2 network nodes using only summary data. Finally, caution is warranted in interpreting the effects of individual nodes and edges, given the potential for mediation effects within the network.

Acknowledgments

The authors are grateful to the UK Biobank resource.

Funding

This work was supported by the National Natural Science Foundation of China [81872712 and 82173624] and the Natural Science Foundation of Shandong Province [ZR2019ZD02].

Author contributions

ZY conceived the study. LZ, TJ, XJ, and JH processed the data. LZ, TJ, JJ, and ZY verified all the data in the study. LZ and TJ performed the analyses. LZ, TJ, ZY, and XZ wrote the manuscript. All authors read and approved the final manuscript.

Conflicts of interest

We declare that we have no conflict of interest.

Contributor Information

Liye Zhang, Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China; Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China.

Tao Ju, Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China; Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China.

Xiuyuan Jin, Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China; Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China.

Jiadong Ji, Institute for Financial Studies, Shandong University, Jinan, Shandong 250100, China.

Jiayi Han, Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China; Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China.

Xiang Zhou, Department of Biostatistics, The University of Michigan, Ann Arbor, MI 48109, USA.

Zhongshang Yuan, Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China; Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China.

Data Availability

The GEUVADIS data underlying this article are available at https://www.internationalgenome.org/data-portal/data-collection/geuvadis. The UK Biobank data used in our real data analysis are available at https://www.ukbiobank.ac.uk/, under application number 51470. The sample QC procedure of the Neale lab can be obtained at https://github.com/Nealelab/UK_Biobank_GWAS/tree/master/imputed-v2-gwas. Supplementary materials are freely available at https://github.com/Liye222/PoLoNet. Our method is implemented in the R package PoLoNet, freely available at https://github.com/Liye222/PoLoNet.

Literature cited

Alvo M, Liu Z, Williams A, Yauk C.. Testing for mean and correlation changes in microarray experiments: an application for pathway analysis. BMC Bioinformatics. 2010;11(1):1–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
Barabási A-L, Gulbahce N, Loscalzo J.. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011;12(1):56–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bazwinsky‐Wutschke I, Mühlbauer E, Albrecht E, Peschke E.. Calcium‐signaling components in rat insulinoma β‐cells (INS‐1) and pancreatic islets are differentially influenced by melatonin. J Pineal Res. 2014;56(4):439–449. [DOI] [PubMed] [Google Scholar]
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O'Connell J, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–209. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cao C, Kwok D, Edie S, Li Q, Ding B, Kossinna P, Campbell S, Wu J, Greenberg M, Long Q.. kTWAS: integrating kernel machine with transcriptome-wide association studies improves statistical power and reveals novel genes. Brief Bioinform. 2021;22(4):bbaa270. [DOI] [PubMed] [Google Scholar]
Cheng HM, Koutsidis G, Lodge JK, Ashor A, Siervo M, Lara J.. Tomato and lycopene supplementation and cardiovascular risk factors: a systematic review and meta-analysis. Atherosclerosis. 2017;257:100–108. [DOI] [PubMed] [Google Scholar]
Christensen RHB. Cumulative link models for ordinal regression with the R package ordinal. J Stat Softw. 2018;35:1–40. [Google Scholar]
Collaboration PS. Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet. 2002;360(9349):1903–1913. [DOI] [PubMed] [Google Scholar]
de Siqueira Santos S, Takahashi DY, Nakata A, Fujita A. A comparative study of statistical methods used to identify dependencies between gene expression signals. Brief Bioinform. 2014;15(6):906–918.
Ehret GB, Caulfield MJ. Genes for blood pressure: an opportunity to understand hypertension. Eur Heart J. 2013;34(13):951–961.
Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95(25):14863–14868.
Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC, Nicolae DL, Cox NJ, et al.; GTEx Consortium. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet. 2015;47(9):1091–1098.
Gonzalez AA, Salinas-Parra N, Leach D, Navar LG, Prieto MC. PGE2 upregulates renin through E-prostanoid receptor 1 via PKC/cAMP/CREB pathway in M-1 cells. Am J Physiol Renal Physiol. 2017;313(4):F1038–F1049.
Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BWJH, Jansen R, de Geus EJC, Boomsma DI, Wright FA, et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet. 2016;48(3):245–252.
Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012;22(9):1760–1774.
Hirschfeld R. Differential diagnosis of bipolar disorder and major depressive disorder. J Affect Disord. 2014;169:S12–S16.
Hu Y, Li M, Lu Q, Weng H, Wang J, Zekavat SM, Yu Z, Li B, Gu J, Muchnik S, et al.; Alzheimer’s Disease Genetics Consortium. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat Genet. 2019;51(3):568–576.
Kong H, Li X, Zhang S, Guo S, Niu W. The β1-adrenoreceptor gene Arg389Gly and Ser49Gly polymorphisms and hypertension: a meta-analysis. Mol Biol Rep. 2013;40(6):4047–4053.
Lappalainen T, Sammeth M, Friedländer MR, 't Hoen PAC, Monlong J, Rivas MA, Gonzàlez-Porta M, Kurbatova N, Griebel T, Ferreira PG, et al.; Geuvadis Consortium. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501(7468):506–511.
Lee SH, Wray NR, Goddard ME, Visscher PM. Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet. 2011;88(3):294–305.
Lin W, Ji J, Zhu Y, Li M, Zhao J, Xue F, Yuan Z. PMINR: pointwise mutual information-based network regression–with application to studies of lung cancer and Alzheimer’s disease. Front Genet. 2020;11:1–13.
Liu L, Zeng P, Xue F, Yuan Z, Zhou X. Multi-trait transcriptome-wide association studies with probabilistic Mendelian randomization. Am J Hum Genet. 2021;108(2):240–256.
Liu S, Wang Y, Worley PF, Mattson MP, Gaiano N. The canonical Notch pathway effector RBP‐J regulates neuronal plasticity and expression of GABA transporters in hippocampal networks. Hippocampus. 2015;25(5):670–678.
Lynch AI, Boerwinkle E, Davis BR, Ford CE, Eckfeldt JH, Leiendecker-Foster C, Arnett DK. Pharmacogenetic association of the NPPA T2238C genetic variant with cardiovascular disease outcomes in patients with hypertension. JAMA. 2008;299(3):296–307.
Mancuso N, Freund MK, Johnson R, Shi H, Kichaev G, Gusev A, Pasaniuc B. Probabilistic fine-mapping of transcriptome-wide association studies. Nat Genet. 2019;51(4):675–682.
McIntyre RS, Berk M, Brietzke E, Goldstein BI, López-Jaramillo C, Kessing LV, Malhi GS, Nierenberg AA, Rosenblat JD, Majeed A, et al. Bipolar disorders. Lancet. 2020;396(10265):1841–1856.
McKenzie AT, Katsyv I, Song W-M, Wang M, Zhang B. DGCA: a comprehensive R package for differential gene correlation analysis. BMC Syst Biol. 2016;10(1):1–25.
Nagpal S, Meng X, Epstein MP, Tsoi LC, Patrick M, Gibson G, De Jager PL, Bennett DA, Wingo AP, Wingo TS, et al. TIGAR: an improved Bayesian tool for transcriptomic data imputation enhances gene mapping of complex traits. Am J Hum Genet. 2019;105(2):258–266.
Peng Y, Xue H, Luo L, Yao W, Li R. Polymorphisms of the β1-adrenergic receptor gene are associated with essential hypertension in Chinese. Clin Chem Lab Med. 2009;47(10):1227–1231.
Ruan X, Chen T, Wang X, Li Y. Suxiao Jiuxin Pill protects cardiomyocytes against mitochondrial injury and alters gene expression during ischemic injury. Exp Ther Med. 2017;14(4):3523–3532.
Schreiber J, Grimbergen L-A, Overwater I, Vaart T v d, Stedehouder J, Schuhmacher AJ, Guerra C, Kushner SA, Jaarsma D, Elgersma Y. Mechanisms underlying cognitive deficits in a mouse model for Costello Syndrome are distinct from other RASopathy mouse models. Sci Rep. 2017;7(1):1–19.
Stegle O, Parts L, Piipari M, Winn J, Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat Protoc. 2012;7(3):500–507.
Szczepankiewicz D, Narożna B, Celichowski P, Sakrajda K, Kołodziejski P, Banach E, Zakowicz P, Pruszyńska-Oszmałek E, Pawlak J, Wiłkość M, et al. Genes involved in glucocorticoid receptor signalling affect susceptibility to mood disorders. World J Biol Psychiatry. 2021;22(2):149–160.
van Zonneveld AJ, Au YW, Stam W, van Gelderen S, Rotmans JI, Deen PMT, Rabelink TJ, Bijkerk R. MicroRNA-132 regulates salt-dependent steady-state renin levels in mice. Commun Biol. 2020;3(1):1–11.
Walker KA, Cai X, Caruana G, Thomas MC, Bertram JF, Kett MM. High nephron endowment protects against salt-induced hypertension. Am J Physiol Renal Physiol. 2012;303(2):F253–F258.
Wang M, Xie J, Xu S. M6A-BiNP: predicting N6-methyladenosine sites based on bidirectional position-specific propensities of polynucleotides and pointwise joint mutual information. RNA Biol. 2021;18(12):2498–2512.
Wen X, Luca F, Pique-Regi R. Cross-population joint analysis of eQTLs: fine mapping and functional annotation. PLoS Genet. 2015;11(4):e1005176.
Williams B, Mancia G, Spiering W, Agabiti Rosei E, Azizi M, Burnier M, Clement DL, Coca A, de Simone G, Dominiczak A, et al.; ESC Scientific Document Group. 2018 ESC/ESH Guidelines for the management of arterial hypertension: the Task Force for the management of arterial hypertension of the European Society of Cardiology (ESC) and the European Society of Hypertension (ESH). Eur Heart J. 2018;39(33):3021–3104.
Wu C, Pan W. A powerful fine-mapping method for transcriptome-wide association studies. Hum Genet. 2020;139(2):199–213.
Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82.
Yin A, Chen Q, Zhong M, Jia B. MicroRNA-138 improves LPS-induced trophoblast dysfunction through targeting RELA and NF-κB signaling. Cell Cycle. 2021;20(5–6):508–521.
Yuan Z, Zhu H, Zeng P, Yang S, Sun S, Yang C, Liu J, Zhou X. Testing and controlling for horizontal pleiotropy with probabilistic Mendelian randomization in transcriptome-wide association studies. Nat Commun. 2020;11(1):1–14.
Zeng P, Dai J, Jin S, Zhou X. Aggregating multiple expression prediction models improves the power of transcriptome-wide association studies. Hum Mol Genet. 2021;30(10):939–951.
Zeng P, Zhou X. Non-parametric genetic prediction of complex traits with latent Dirichlet process regression models. Nat Commun. 2017;8(1):1–11.
Zhang L, Li Z, Xing C, Gao N, Xu R. Folate reverses NF-κB p65/Rela/IL-6 level induced by hyperhomocysteinemia in spontaneously hypertensive rats. Front Pharmacol. 2021;2107:1–11.
Zhou D, Jiang Y, Zhong X, Cox NJ, Liu C, Gamazon ER. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nat Genet. 2020;52(11):1239–1246.
Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, Montgomery GW, Goddard ME, Wray NR, Visscher PM, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016;48(5):481–487.

