Inferring a directed acyclic graph of phenotypes from GWAS summary statistics

Rachel Zilinskas; Chunlin Li; Xiaotong Shen; Wei Pan; Tianzhong Yang

doi:10.1093/biomtc/ujad039

Biometrics. 2024 Mar 12;80(1):ujad039.

PMCID: PMC10928990 PMID: 38470257

ABSTRACT

Estimating phenotype networks is a growing field in computational biology. It deepens the understanding of disease etiology and is useful in many applications. In this study, we present a method that constructs a phenotype network by assuming a Gaussian linear structure model embedding a directed acyclic graph (DAG). We utilize genetic variants as instrumental variables and show how our method only requires access to summary statistics from a genome-wide association study (GWAS) and a reference panel of genotype data. Besides estimation, a distinct feature of the method is its summary statistics-based likelihood ratio test on directed edges. We applied our method to estimate a causal network of 29 cardiovascular-related proteins and linked the estimated network to Alzheimer’s disease (AD). A simulation study was conducted to demonstrate the effectiveness of this method. An R package sumdag implementing the proposed method, all relevant code, and a Shiny application are available.

Keywords: Alzheimer’s disease (AD), directed acyclic graph (DAG), genome-wide association study (GWAS), likelihood ratio test, proteomics

1. INTRODUCTION

Network analysis has deepened our understanding of biological mechanisms and disease etiologies (Zhang and Itan, 2019). Specifically, protein–protein interaction (PPI) networks that capture the interplay of proteins in the biomolecular systems are vital for normal cell functions (Snider et al., 2015). Disturbance of the normal pattern in the PPI network can be causative or indicative of a disease state. Studies have linked co-regulatory networks of proteins to a variety of complex diseases (Ross and Poirier, 2004; Emilsson et al., 2018). Recently, a network-based method modeling PPI achieved high accuracy rates in cancer prediction (Id et al., 2021). Cheng et al. (2021) further showed that disease-associated variants were significantly enriched in the sequences coding PPI interfaces compared to variants in healthy individuals. Their work also demonstrated associations of PPIs with drug resistance and overall survival, highlighting the use of protein networks for informing genotype-based therapy. Network-based analyses have shown their potential in advancing precision medicine for complex diseases over traditional approaches, which focus on monogenic mutations and independent assessment of risk factors (Napoli et al., 2020).

Network analyses can be categorized into 2 groups. One utilizes only phenotypic data to construct networks. For example, weighted gene co-expression network analysis estimates an undirected network, which is further characterized using dimension reduction techniques (Zhang and Horvath, 2005). The graphical lasso employs penalized methods to estimate a Gaussian graphical model for a large number of variables (Witten et al., 2012). Bayesian network analysis (Friedman et al., 2000) estimates directed acyclic graphs (DAGs), which are widely accepted in biological systems (Ashburner et al., 2000), and recent improvements in its computational approaches have led to much shorter computation times (Liu et al., 2016). The other group of methods uses instrumental variable (IV) techniques to estimate a DAG, assuming a linear structural equation model. Chen et al. (2018) developed a penalized two-stage least squares method to estimate a DAG, assuming known intervention targets. Li et al. (2023) further extended the work to accommodate unknown intervention targets commonly encountered in biological applications.

Individual-level data are required for all the methods above. Such data can be difficult to obtain, especially for human studies, due to logistic limitations and privacy concerns. On the other hand, many genome-wide association studies (GWAS) have shared their summary statistics publicly, generating a rich and valuable data resource. Thus, we propose adapting the network estimation and inference methods of Li et al. (2023) to rely only on GWAS summary statistics and a genetic reference panel, both much more easily accessible. We will show how a DAG can be estimated for cardiovascular-related proteins using a large-scale proteomic GWAS summary dataset, and then link the protein network to Alzheimer’s disease (AD). The algorithm for the proposed work is packaged in R. Our work represents one of the initial attempts to utilize GWAS summary statistics in the construction of a DAG. We expect that our work can facilitate more comprehensive network analysis in studying biological and medical relationships. In addition to inferring PPI networks, our method is readily applicable to understanding the interplay of many other molecular and non-molecular phenotypes, as long as the corresponding GWAS summary statistics are available.

2. METHODS

2.1. Network modeling and data

2.1.1. Directed phenotype network

Our goal is to use genotypes as external interventions to construct and infer a DAG that describes the directed relationships among a set of phenotypes. In the framework of interventional Gaussian DAG (Li et al., 2023), we assume

Y = YU + XW + E, (1)

where Y is the N × P data matrix of P phenotypes, X is the N × Q data matrix of Q genotypes serving as IVs, E is the N × P error matrix with each row sampled from the Gaussian distribution N(0, Σ) with Σ = diag(σ_1², …, σ_P²), and N is the sample size. Note that Equation (1) lacks an intercept because we assume phenotype and genotype are centered at mean 0, which could be easily done with individual-level GWAS data.

In Equation (1), U and W are unknown parameters to be estimated. The P × P matrix U = (u_kj) specifies the network structure such that u_kj ≠ 0 indicates a directed relation from phenotype k to phenotype j. The Q × P matrix W = (w_qp) specifies the targets and strengths of interventions in that w_qp ≠ 0 indicates an interventional relation from genotype q to phenotype p. Let ℰ = {(k, j): u_kj ≠ 0} be the set of directed relations, and ℐ = {(q, p): w_qp ≠ 0} be the set of interventional relations.

2.1.2. Summary statistics and reference panel

In Equation (1), the data matrices Y and X contain individual information, such that each row represents the variables measured on an individual. On the other hand, GWAS summary statistics aggregate N observations into a single measure for each single nucleotide polymorphism (SNP) across the whole genome. This measure is the average effect of having 1 copy of the effect allele of the SNP on the phenotype being studied. It is estimated by the marginal regression coefficient β̂_qp = X_q^T Y_p / X_q^T X_q (with X_q and Y_p the qth and pth columns of X and Y), often reported along with accompanying statistics, such as the corresponding standard error SE(β̂_qp), z-score z_qp, sample size N, reference allele (REF), minor allele frequency (MAF), and P-value. The summary statistics of the Q SNPs in X are included in the GWAS summary data.

As a complement to the summary-level data, a reference panel comprising genotypic data of individuals from a general population provides the correlation structure among the genotypes. Many existing resources can be used for such a reference panel (International HapMap Consortium, 2005; The 1000 Genomes Project Consortium, 2015; Bycroft et al., 2018; Taliun et al., 2021). Given an N_r × Q (centered) reference panel X_r of N_r individuals, we follow the conventional suggestion (Mak et al., 2017) to regularize the genetic correlation matrix R_r computed from X_r, such that the regularized matrix is (1 − s_p) R_r + s_p I, where 0 ≤ s_p ≤ 1 is a real number controlling the degree of regularization.

From the summary statistics and the reference panel, we compute the following quantities that are used for the construction and inference of the directed phenotype network. The subsequent computation also assumes Y and X are centered, which does not influence β̂_qp and the accompanying statistics.

The covariance matrix of genotypes, X^T X / N, is estimated from the (regularized) reference panel.
Let s_q² = X_q^T X_q / N. Then s_q² is estimated by 2 MAF_q (1 − MAF_q), provided that MAF is reported in the summary statistics, or otherwise estimated by the qth diagonal element of the reference-panel covariance matrix.
Given s_q², X_q^T Y_p is estimated by N s_q² β̂_qp.
For Y_p^T Y_p, we use the median, across SNPs, of the per-SNP estimates.
Finally, Y_k^T Y_j is estimated using the null SNPs from GWAS summary statistics, that is, SNPs not marginally associated with Y_k or Y_j. Following Kim et al. (2015), cor(Y_k, Y_j) ≈ cor(z_k, z_j), where z_k and z_j are vectors of z-scores for the null SNPs. Thus, we can rearrange the sample correlation formula for (centered) phenotype variables and plug in our approximation to obtain Y_k^T Y_j ≈ cor(z_k, z_j) (Y_k^T Y_k)^{1/2} (Y_j^T Y_j)^{1/2}. In practice, the SNPs with P-values >0.05 are considered as null SNPs. An alternative method for estimating this correlation (Bulik-Sullivan et al., 2015) is also feasible for GWAS, although it is not used herein.
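The quantities listed above can be illustrated with a minimal Python sketch. It is not the sumdag implementation: the function names are hypothetical, the genotype variance assumes Hardy-Weinberg equilibrium, and the per-SNP formula for Y_p^T Y_p assumes that the reported standard error comes from a no-intercept marginal regression on centered data with N − 1 residual degrees of freedom.

```python
import math
from statistics import median

def genotype_variance_from_maf(maf):
    """Variance of a 0/1/2-coded SNP, assuming Hardy-Weinberg equilibrium."""
    return 2.0 * maf * (1.0 - maf)

def xty_from_summary(beta_hat, s2, n):
    """X_q^T Y_p, using beta_hat = X_q^T Y_p / X_q^T X_q and X_q^T X_q = n * s2."""
    return n * s2 * beta_hat

def yty_from_summary(betas, ses, s2s, n):
    """Median over SNPs of the Y_p^T Y_p value implied by each (beta, se) pair.

    Assumption: se^2 = RSS / ((n - 1) * X_q^T X_q), RSS = Y^T Y - beta^2 * X_q^T X_q.
    """
    per_snp = [n * s2 * ((n - 1) * se ** 2 + b ** 2)
               for b, se, s2 in zip(betas, ses, s2s)]
    return median(per_snp)

def pearson(a, b):
    """Sample correlation, applied here to z-scores of null SNPs."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def ykyj_from_null_z(z_k, z_j, yty_k, yty_j):
    """Y_k^T Y_j from cor(z_k, z_j) and the two diagonal cross-products."""
    return pearson(z_k, z_j) * math.sqrt(yty_k * yty_j)

# Toy check with x = (-1, 0, 1, 0) and y = (-2, 1, 2, -1), both centered, n = 4.
n = 4
beta_hat, s2 = 2.0, 0.5           # X^T Y / X^T X = 4 / 2 and X^T X / n = 2 / 4
se = math.sqrt(2.0 / (3 * 2.0))   # RSS = 2, df = n - 1 = 3, X^T X = 2
xty = xty_from_summary(beta_hat, s2, n)
yty = yty_from_summary([beta_hat], [se], [s2], n)
```

In the toy check the recovered cross-products match the values computed directly from x and y (4 and 10), which is the sense in which summary statistics can stand in for individual-level data here.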

Next, we extend the framework of interventional Gaussian DAG to leverage large-scale GWAS summary statistics.

2.2. Method for network construction

The estimation of interventional Gaussian DAG consists of 3 steps:

(E1) First, we use penalized regressions to estimate the genotype–phenotype association matrix V in the following equation
Y = XV + E(I − U)^{-1}, where V = W(I − U)^{-1}. (2)
(E2) Next, we employ the peeling algorithm (Li et al., 2023) to learn a super-DAG, that is, a directed super-graph without cycles, based on the estimate of V obtained in Step (E1).
(E3) Finally, we estimate U and W through penalized regressions based on the estimated super-DAG in Step (E2).
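The regression of phenotypes on genotypes used in the first step can be read off the structural model. The short derivation below assumes that Equation (1) has the form Y = YU + XW + E; the matrix I − U is invertible because the U of a DAG is, after reordering the phenotypes topologically, strictly triangular and hence nilpotent.

```latex
\begin{aligned}
\mathbf{Y} &= \mathbf{Y}\mathbf{U} + \mathbf{X}\mathbf{W} + \mathbf{E} \\
\mathbf{Y}(\mathbf{I} - \mathbf{U}) &= \mathbf{X}\mathbf{W} + \mathbf{E} \\
\mathbf{Y} &= \mathbf{X}\mathbf{V} + \mathbf{E}(\mathbf{I} - \mathbf{U})^{-1},
\qquad \mathbf{V} = \mathbf{W}(\mathbf{I} - \mathbf{U})^{-1}
  = \mathbf{W}\left(\mathbf{I} + \mathbf{U} + \cdots + \mathbf{U}^{P-1}\right).
\end{aligned}
```

The last expression shows why a nonzero entry of V can arise either from a direct intervention (the term W) or from an intervention on an ancestor (the terms WU, WU², and so on).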

Now, we elaborate on our extensions to accommodate summary statistics.

2.2.1. Estimation of V by truncated Lasso penalized regressions

In Equation (2), the matrix V can be estimated column-wise from

Y_p = X V_p + ε_p, (3)

where Y_p is the data vector of phenotype p, X is the data matrix of genotypes, vector V_p is the pth column of V, and ε_p is the corresponding error vector. Given the summary statistics, we expand the squared error function ‖Y_p − X V_p‖² = Y_p^T Y_p − 2 V_p^T X^T Y_p + V_p^T X^T X V_p, and replace the quantities X^T X, X^T Y_p, and Y_p^T Y_p in Equation (3) with their summary-level estimates from Section 2.1.2. As a result, we estimate V_p through regressions with the Truncated Lasso Penalty (TLP) (Shen et al., 2012) to minimize

(4)

where κ_p > 0 is an integer tuning parameter and J_τ(z) = min(|z|/τ, 1), with τ = τ_p, is the TLP function, which does not penalize the parameters over the threshold τ_p. We use the R package “glmtlp” (Li et al., 2022) to fit the summary-level data regression Equation (4).
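To make the summary-level objective concrete, the sketch below evaluates the expanded squared error from the three cross-product quantities and checks it against the individual-level residual sum of squares on toy data. It assumes, as a reading of Equation (4) rather than a quotation of it, that the TLP enters as a constraint of the form Σ_q J_τ(v_qp) ≤ κ_p; all function names are hypothetical and the code does not call glmtlp.

```python
def tlp(z, tau):
    """Truncated Lasso Penalty J_tau(z) = min(|z| / tau, 1)."""
    return min(abs(z) / tau, 1.0)

def summary_squared_error(v, xtx, xty, yty):
    """||Y_p - X v||^2 computed only from X^T X, X^T Y_p and Y_p^T Y_p."""
    q = len(v)
    quad = sum(v[i] * xtx[i][j] * v[j] for i in range(q) for j in range(q))
    cross = sum(v[i] * xty[i] for i in range(q))
    return yty - 2.0 * cross + quad

def tlp_constraint_ok(v, tau, kappa):
    """Assumed constraint: the summed TLP of the coefficients is at most kappa."""
    return sum(tlp(z, tau) for z in v) <= kappa

# Toy individual-level data (4 individuals, 2 genotypes) used only for the check.
X = [[-1.0, 1.0], [0.0, -1.0], [1.0, 1.0], [0.0, -1.0]]
y = [-2.0, 1.0, 2.0, -1.0]
v = [1.5, 0.5]

xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]
yty = sum(yi * yi for yi in y)

direct = sum((yi - sum(r[i] * v[i] for i in range(2))) ** 2 for r, yi in zip(X, y))
from_summary = summary_squared_error(v, xtx, xty, yty)
```

Both routes give the same value, so a solver only needs the (estimated) cross-products. Coefficients larger than τ each contribute exactly 1 to the summed penalty, which is why an integer κ_p acts roughly as a cap on the number of selected SNPs.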

For implementation, we fix τ_p and choose κ_p ∈ {1, …, Q} individually for each of the P penalized regressions by minimizing the pseudo-BIC (Pattee and Pan, 2020). The pseudo-BIC is computed from the (estimated) sum of squared error of the fitted coefficient vector, the number of nonzero coefficients in that vector, where the fitted vector is the estimate in Equation (4) with tuning parameters (κ_p, τ_p), the sample size N (when N differs, the median is taken), and σ̂_p², the estimated residual variance for phenotype p in Equation (3). When Q is small compared to N as in our application, a consistent estimate for σ_p² can be obtained from the ordinary least squares using all Q genotypes, that is, from the estimate in Equation (4) with κ_p = Q. Letting V̂_p (1 ≤ p ≤ P) be the estimates with the optimally chosen tuning parameters, the final estimate of V is V̂ = (V̂_1, …, V̂_P).
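The tuning-parameter search can be sketched as follows. The exact pseudo-BIC expression is not reproduced in the text above, so this example assumes the usual Gaussian BIC form with a plug-in variance, SSE/σ̂² + log(N) × (number of nonzero coefficients); the function names and the toy numbers are hypothetical.

```python
import math

def pseudo_bic(sse, sigma2, n, n_nonzero):
    """Assumed form: scaled squared error plus a log(n) penalty per nonzero coefficient."""
    return sse / sigma2 + math.log(n) * n_nonzero

def select_kappa(fits, sigma2, n):
    """fits maps kappa -> (sse, n_nonzero); return the kappa with the smallest pseudo-BIC."""
    return min(fits, key=lambda k: pseudo_bic(fits[k][0], sigma2, n, fits[k][1]))

# Toy candidates: going from 1 to 2 SNPs helps a lot, a third SNP barely helps.
fits = {1: (1200.0, 1), 2: (1010.0, 2), 3: (1008.0, 3)}
best_kappa = select_kappa(fits, sigma2=1.0, n=1000)
```

With these numbers the second SNP reduces the error by far more than log(1000) ≈ 6.9, while the third does not, so κ_p = 2 is selected.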

2.2.2. Estimation of super-DAG by the peeling algorithm

Given the estimate V̂, the peeling algorithm (Li et al., 2023) can be used to construct a super-DAG with a phenotype edge set (a superset of the directed relations) and an interventional edge set (a superset of the interventional relations). The key idea is that the sparse pattern of matrix V characterizes the orientations of the relations among the phenotypes. Specifically, it is demonstrated in Li et al. (2023) that v_qp ≠ 0 implies that genotype q intervenes on phenotype p or an ancestor node of phenotype p in the DAG. Thus, if v_qp ≠ 0 and v_qi = 0 for i ≠ p, then phenotype p is a leaf node in the DAG; that is, there is no directed edge from phenotype p to the others. On this basis, we can sequentially identify and remove (ie, peel) the leaf node in the DAG, and construct the two supersets.

Since the peeling algorithm solely depends on V̂, no modification is needed to extend the existing method to accommodate summary-level data.
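The leaf-identification rule can be sketched in a few lines. This is a deliberately simplified illustration, not the full peeling algorithm of Li et al. (2023): it only recovers a peeling order from the support of V and does not build the candidate edge sets. The function name and the toy support matrix are hypothetical.

```python
def peel_order(support):
    """support[q][p] is True when v_qp != 0; returns phenotypes with leaves first.

    A phenotype is peeled when some genotype has it as its only nonzero
    entry among the phenotypes that have not been peeled yet.
    """
    remaining = set(range(len(support[0])))
    order = []
    while remaining:
        leaf = None
        for row in support:
            hits = [p for p in remaining if row[p]]
            if len(hits) == 1:
                leaf = hits[0]
                break
        if leaf is None:
            raise ValueError("no leaf found; some phenotype may lack a valid IV")
        order.append(leaf)
        remaining.remove(leaf)
    return order

# Chain 0 -> 1 -> 2 with one IV per phenotype: an IV on phenotype 0 also
# reaches its descendants 1 and 2, an IV on phenotype 2 reaches only 2.
support = [
    [True, True, True],
    [False, True, True],
    [False, False, True],
]
order = peel_order(support)
```

Reversing the returned order gives a topological order of the phenotypes, which restricts the directions that candidate edges in the super-DAG may take.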

2.2.3. Estimation of U and W

The peeling algorithm yields supersets of the directed and interventional edge sets. To remove the extra edges in these supersets, we consider fitting U and W within a restricted model defined by them.

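From Equation (1), for phenotype p, we have

Y_p = Y U_p + X W_p + E_p, (5)

where U_p, W_p, and E_p denote the pth columns of U, W, and E, and the entries of U_p and W_p outside the supersets are fixed at 0. As in Section 2.2.1, we replace the corresponding quantities with the summary-level data estimates and fit the TLP regression based on Equation (5),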

(6)

where the parameter vector stacks the entries of U_p and W_p allowed by the restricted model. We fix τ_p, and the tuning parameters are selected by pseudo-BIC as described in Section 2.2.1. The estimated columns of U and W (1 ≤ p ≤ P) are aggregated to form the final estimates Û and Ŵ.

Due to penalization, we recommend following the common practice to standardize the variables so that the phenotypes and genotypes are on a comparable scale, which is straightforward once the variance estimates of the phenotypes and genotypes are obtained. Moreover, if only U is of interest, penalization of W is optional.

2.3. Likelihood-based inference for a DAG

We extend the likelihood ratio inference (Li et al., 2023) to quantify the uncertainty of the network structures. As in Li et al. (2023), we consider 2 types of hypothesis testing.

Testing of multiple directed relations. The null hypothesis H₀: u_kj = 0 for each (k, j) ∈ F and alternative hypothesis H_a: u_kj ≠ 0 for some (k, j) ∈ F, where F denotes the set of hypothesized edges. Rejecting H₀ indicates evidence for the presence of some hypothesized relationships in the network.
Testing of a directed pathway. The null hypothesis H₀: u_kj = 0 for some (k, j) ∈ F and alternative hypothesis H_a: u_kj ≠ 0 for each (k, j) ∈ F, where F denotes the edges of the hypothesized pathway. Rejecting H₀ indicates evidence for the presence of the entire directed pathway in the phenotype network.

The procedure for testing multiple directed relations comprises 5 steps.

(T1) Estimate V and use the peeling algorithm to obtain the supersets of directed and interventional relations, as in Section 2.2.
(T2) Identify the set D of non-degenerate edges (Li et al., 2023), which contains D_p, the non-degenerate edges pointed to phenotype p.
(T3) Estimate the parameters U and W under H₀ and H_a, respectively. Specifically, the estimates under H₀ are computed as in the regression (6) with an additional constraint that u_kj = 0 for (k, j) ∈ D. The estimates under H_a are computed from the restricted models (1 ≤ p ≤ P) via regression (6), with the penalties modified accordingly.
(T4) Compute the residual variance estimates σ̂_p² (1 ≤ p ≤ P) from the residual sum of squares of the fitted model.
(T5) Compute the test statistic T as twice the difference between the maximized values of L under H_a and under H₀, where L is the log-likelihood of the model [Equation (1)]. By Li et al. (2023), T is approximately chi-squared distributed with |D| degrees of freedom when the size |D| is <50; the standardized statistic (T − |D|)/√(2|D|) is approximately standard normal when |D| >50. Thus, the P-value is calculated as P(χ²_|D| > T) when |D| <50, and as P(Z > (T − |D|)/√(2|D|)), with Z standard normal, when |D| >50.
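The last step, turning the likelihood ratio statistic into a P-value, can be sketched with the Python standard library alone. The chi-squared tail uses the classical closed-form series for integer degrees of freedom; the normal approximation assumes the standardization (T − d)/√(2d), with d the number of non-degenerate edges. Because the text leaves the case d = 50 unspecified, this sketch treats d ≥ 50 as the large case, which is an assumption. Function names are hypothetical.

```python
import math

def normal_sf(x):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def chi2_sf(x, df):
    """Upper tail of a chi-squared distribution with integer df (closed-form series)."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2.0) * total
    chi = math.sqrt(x)
    pdf = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return 2.0 * normal_sf(chi) + 2.0 * pdf * total

def lrt_p_value(t_stat, n_edges):
    """P-value of the likelihood ratio statistic for n_edges non-degenerate edges."""
    if n_edges < 50:
        return chi2_sf(t_stat, n_edges)
    return normal_sf((t_stat - n_edges) / math.sqrt(2.0 * n_edges))

p_single_edge = lrt_p_value(3.841458820694124, 1)   # close to 0.05
p_many_edges = lrt_p_value(60.0, 60)                # statistic equal to its mean
```

For a single tested edge the familiar 3.84 cutoff gives a P-value of about 0.05, and in the large case a statistic equal to its expected value d gives 0.5.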

The procedure for testing a directed pathway is similar, with minor modifications.

Estimate V, as in Step (T1).
Decompose H₀ into sub-hypotheses, one for each non-degenerate edge (k, j). For each (k, j), implement Steps (T2)–(T5) above to obtain the corresponding P-value PV_(k, j). The final P-value is computed as the maximum of the P-values for the sub-hypotheses, PV = max_(k, j) PV_(k, j).

Of note, testing a directed pathway concerns a composite (null) hypothesis. For any fixed 0 < α < 1, Li et al. (2023) show that the test asymptotically achieves exactly the α significance level for the composite null hypothesis.
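The pathway test combines per-edge P-values by taking their maximum, so the pathway is declared present only when every edge on it is individually significant. A minimal sketch, with a hypothetical function name and made-up P-values:

```python
def pathway_p_value(edge_p_values):
    """edge_p_values maps an edge (k, j) to its single-edge likelihood ratio P-value."""
    if not edge_p_values:
        raise ValueError("a pathway needs at least one edge")
    return max(edge_p_values.values())

# Hypothetical two-edge pathway 1 -> 6 -> 12.
p_path = pathway_p_value({(1, 6): 0.003, (6, 12): 0.0004})
reject_at_5_percent = p_path < 0.05
```

A single non-significant edge is enough to keep the pathway P-value large, which matches the composite null hypothesis that at least one edge on the pathway is absent.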

3. INFERRING CARDIOVASCULAR-RELATED PROTEIN–PROTEIN INTERACTION NETWORK

The role of cardiovascular diseases has been recognized as an important etiologic hallmark of AD (de Bruijn and Ikram, 2014). There are different hypotheses on the various mechanisms underlying the association between AD and cardiovascular diseases (Tini et al., 2020). In this real data application, we constructed a directed PPI network of some cardiovascular-related proteins based on a GWAS of 83 plasma protein biomarkers. We further connected the PPI network to AD through MR analyses.

3.1. GWAS summary statistics for cardiovascular-related proteins

We used the GWAS summary statistics on 83 cardiovascular-related proteins from Folkersen et al. (2017), which came from Wald tests for the association between each SNP and the standardized residuals among 3394 European individuals. Five proteins were excluded from the analysis as their corresponding protein-encoding genes are located on the sex chromosome. The summary statistics were first processed to remove (a) indels; (b) SNPs located within 1 base pair of an indel; (c) SNPs with imputation quality score INFO ≤ 0.8; and (d) SNPs with MAF ≤ 0.05. We then used the following steps to select putative IVs for the proteins:

SNPs were clumped at an r² value of 0.01 using 3000 uncorrelated individuals (individuals with kinship coefficients <0.084) from UK Biobank of European ancestry as the reference panel, such that SNPs were independent of each other for each protein (Bycroft et al., 2018).
Only the SNPs in the clumped data files located within ±1 MB of each protein-encoding gene were considered. In general, cis-regulatory changes will be less pleiotropic (Signor and Nuzhdin, 2018), and thus, these SNPs located close to the genes are more likely to be valid IVs due to the exclusion assumption (Swerdlow et al., 2016; Hemani et al., 2018; Li et al., 2023) (ie, an IV only directly intervenes on 1 primary variable).
To ensure the relevance assumption (Li et al., 2023) was satisfied (ie, IV intervenes on at least 1 primary variable), we only selected SNPs whose P-values were below the GWAS significance threshold (5 × 10⁻⁸). This filtering process led to a total number of 33 SNPs and 23 proteins, with at least 1 putative IV in the final network analysis.
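The filtering logic of these steps can be sketched as below. The example assumes the SNP list has already been clumped (clumping needs linkage disequilibrium from the reference panel and was done with PLINK in the paper), it omits the removal of SNPs within 1 base pair of an indel, and the dictionary keys, gene coordinates, and SNP records are all hypothetical.

```python
def select_cis_ivs(snps, gene_chrom, gene_start, gene_end,
                   window=1_000_000, p_threshold=5e-8):
    """Return rsids passing QC, lying within the cis window, and genome-wide significant."""
    selected = []
    for s in snps:
        passes_qc = (not s["is_indel"]) and s["maf"] > 0.05 and s["info"] > 0.8
        in_cis = (s["chrom"] == gene_chrom
                  and gene_start - window <= s["pos"] <= gene_end + window)
        if passes_qc and in_cis and s["p_value"] < p_threshold:
            selected.append(s["rsid"])
    return selected

def snp(rsid, chrom, pos, maf, info, p_value, is_indel=False):
    """Small helper to build a toy SNP record."""
    return {"rsid": rsid, "chrom": chrom, "pos": pos, "maf": maf,
            "info": info, "p_value": p_value, "is_indel": is_indel}

snps = [
    snp("rsA", "1", 1_500_000, 0.20, 0.95, 1e-12),  # kept
    snp("rsB", "1", 9_000_000, 0.30, 0.99, 1e-20),  # outside the cis window
    snp("rsC", "1", 1_600_000, 0.03, 0.99, 1e-15),  # MAF too low
    snp("rsD", "1", 1_700_000, 0.25, 0.90, 1e-4),   # not genome-wide significant
]
ivs = select_cis_ivs(snps, "1", 2_000_000, 2_050_000)
```

A protein with an empty result has no putative IV and would drop out of the network analysis, which is how the filtering reduces the set of proteins.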

The genetic correlation matrix for the included IVs was estimated based on the same reference panel used in clumping. We calculated the empirical correlation of each pair of proteins as the correlation coefficient of the z-scores of the null SNPs, that is, all autosomal SNPs with MAF ≥ 0.05, INFO ≥ 0.8, and GWAS P-values ≥ 0.05 for both proteins. The number of null SNPs for each pair of proteins ranged from 1 191 204 to 1 223 357. All preparation of the reference panel and GWAS data for both the DAG estimation and MR analysis was done using PLINK version 1.9 (Purcell et al., 2007).

3.2. GWAS summary statistics for AD

We explored the relationship between each of the 23 proteins in Folkersen et al. (2017) and AD. We used the summary statistics of the GWAS for AD from a recent study totaling 111 326 clinically diagnosed or “proxy” AD cases and 677 663 controls (Bellenguez et al., 2022). We removed SNPs with MAF < 0.05 and SNPs not included in the GWAS of Folkersen et al. (2017), and clumped SNPs at r² = 0.01. Among the remaining SNPs, we selected as IVs in the MR analyses only those with a GWAS P-value <5 × 10⁻⁸.

3.3. Results

We constructed a DAG of the 23 proteins as described in Section 2.2. As Folkersen et al. (2017) shared MAF in the summary statistics, we compared them with those in UK Biobank, the reference panel for clumping and estimating the genetic correlation matrix. The absolute difference of the MAF of all IVs ranged from 0.001 to 0.055 with a mean of 0.02, while the correlation was 0.99 (Supplementary Table S1). We further performed MR analysis on each protein to evaluate its relationship with AD using the TwoSampleMR package (Hemani et al., 2018). We used Egger’s test of intercept to examine the exclusion assumption: if a protein had a P-value of the Egger’s test of intercept >0.05/23, there was no evidence of direct/pleiotropic effects, and we used the more powerful MR-IVW method; otherwise, we used MR-Egger (to allow pleiotropic effects of IVs). In any case, we used the P-value cut-off <0.05/23 to declare statistical significance. Supplementary Table S2 contains a complete list of MR results. The protein IL18, which showed marginal significance in both Egger’s test of intercept (P-value = 0.06) and MR-Egger (P-value = 0.07), was a parent node for several proteins related to AD, including ADM, IL1RL1, CTSD, CXCL6, and CXCL16. We further performed the likelihood ratio test on each edge. Edges with P-value <0.05/(23 × 22−56) were considered significant and are shown as solid lines in Figure 1. The number of tests in the Bonferroni correction is bounded by the number of possible edges among all the nodes minus the total number of edges after the peeling algorithm, which is justified in Supplementary Material S2.1. Each edge from IL18 to the 5 AD-associated proteins was highly significant in the likelihood ratio test, thus suggesting that simultaneous testing of the pathway from IL18 to the 5 proteins would be significant.
Previous studies detected increased levels of pro-inflammatory IL18 in both cardiovascular diseases and in brain regions of AD patients (Sutinen et al., 2014). IL18 is known to increase the level of Cdk5 and GSK-3β, which are involved in Tau hyperphosphorylation, and the inhibition of Cdk5 was known to improve AD subjects’ conditions (Calabrò et al., 2021). Our work suggests a possible regulatory role of IL18 on multiple AD-associated proteins. According to OpenTargets.org (Ochoa et al., 2021) for current pharmaceuticals either approved or in development with IL18, this protein is currently a target of an antibody drug to treat diabetes mellitus and a few other conditions; diabetes has long been linked to AD with epidemiological and biological evidence (Barbagallo and Dominguez, 2014). Lastly, we provide a Shiny application that allows users to test any selected proteins in this cardiovascular-related PPI network.
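The two significance thresholds used in this section follow from simple arithmetic, sketched below together with the rule for choosing between MR-IVW and MR-Egger. The variable and function names are hypothetical; the numbers 23 and 56 are taken from the text.

```python
n_proteins = 23
edges_after_peeling = 56

# Bonferroni bound on the number of edge tests: 23 * 22 - 56 = 450.
n_edge_tests = n_proteins * (n_proteins - 1) - edges_after_peeling
edge_alpha = 0.05 / n_edge_tests   # about 1.1e-4
mr_alpha = 0.05 / n_proteins       # about 2.2e-3

def choose_mr_method(egger_intercept_p, alpha=mr_alpha):
    """Use MR-IVW unless Egger's intercept test suggests pleiotropy."""
    return "MR-IVW" if egger_intercept_p > alpha else "MR-Egger"

method_clean = choose_mr_method(0.40)
method_pleiotropic = choose_mr_method(0.0005)
```

An edge is therefore drawn as significant only when its likelihood ratio P-value falls below roughly 1.1 × 10⁻⁴.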

Figure 1. Estimated DAG for 23 proteins based on the GWAS summary statistics of Folkersen et al. (2017). Proteins significantly associated with AD in MR analysis are colored gold. A solid line represents an edge that is statistically significant by the likelihood ratio test whereas a dashed line represents an edge that is not significant.

4. SIMULATION STUDIES

4.1. Simulation settings

We simulated the data assuming a fixed U, W, standardized genotype matrix X, and sampled each row of E independently from N(0, Σ). Then we generated Y from equation: Y = (XW + E)(I − U)^{-1}. Without loss of generality, no intercept was modeled; that is, Y was centered at mean 0. The values of U, W, and Σ are provided in Supplementary Material S1.1, where the structure of the relationship of Y followed a DAG of 15 nodes/phenotypes (Figure 2). The effect sizes of the non-zero components of U ranged from 0.002 to 1.16, with a median of 0.06. All phenotypes had at least 1 valid IV. Twenty-six SNPs were included in the model, with their effect sizes ranging from −2.2 to 2.5 with a median of −0.11. Two SNPs violated the relevance assumption, while the rest were valid IVs. We also varied the effect sizes to be 1/3 and 1/15 of U while keeping W fixed.
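The data-generating step can be sketched without matrix inversion by filling in phenotypes in topological order, which is equivalent to applying (I − U)^{-1} when U encodes a DAG. The toy sizes, effect values, and function name below are hypothetical and much smaller than the 15-phenotype, 26-SNP design of the paper; genotypes are left as 0/1/2 counts rather than standardized.

```python
import random

def simulate_phenotypes(x, u, w, sigma, topo_order, rng):
    """Generate Y satisfying Y = YU + XW + E, one phenotype at a time."""
    n, q, p = len(x), len(w), len(u)
    e = [[rng.gauss(0.0, sigma[j]) for j in range(p)] for _ in range(n)]
    y = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in topo_order:  # parents are filled in before their children
            from_parents = sum(y[i][k] * u[k][j] for k in range(p))
            from_snps = sum(x[i][l] * w[l][j] for l in range(q))
            y[i][j] = from_parents + from_snps + e[i][j]
    return y, e

rng = random.Random(0)
u = [[0.0, 0.5, 0.0], [0.0, 0.0, -0.3], [0.0, 0.0, 0.0]]   # chain 0 -> 1 -> 2
w = [[1.0, 0.0, 0.0], [0.0, 0.8, 0.0], [0.0, 0.0, 1.2]]    # one IV per phenotype
x = [[float(rng.choice([0, 1, 2])) for _ in range(3)] for _ in range(50)]
y, e = simulate_phenotypes(x, u, w, [1.0, 1.0, 1.0], [0, 1, 2], rng)

# Largest violation of the structural equation over all individuals and phenotypes.
max_gap = max(
    abs(y[i][j]
        - sum(y[i][k] * u[k][j] for k in range(3))
        - sum(x[i][l] * w[l][j] for l in range(3))
        - e[i][j])
    for i in range(50) for j in range(3)
)
```

Marginal regressions of each simulated phenotype on each genotype would then give the summary statistics that serve as input to the method.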

Figure 2. True DAG for the simulation study with 15 phenotypes.

The standardized genotype matrix X was obtained from unrelated individuals of European ancestry in UK Biobank (Bycroft et al., 2018). We then calculated the summary statistics using a linear model of each phenotype on each standardized genotype, and inputted the summary statistics into the proposed algorithm. The reference panel was obtained from the UK Biobank European samples, which were not correlated with the simulated samples used to derive the summary statistics. SNPs on chromosome 22 with a Hardy–Weinberg equilibrium test P-value > 0.0001, missing call rate <0.05, and MAF >0.05 were pruned to have r² < 0.01. We then randomly selected 26 SNPs for X. Missing values of SNPs were imputed by their mean. Null SNPs were directly simulated to be independent of each other and have no relationship with Y.

We evaluated the performance of both the network construction and statistical inference for the proposed method. To evaluate the performance of network construction, we examined the false-positive (FP) and false-negative (FN) rates for the estimated U over 200 replications. The sample size of the summary statistics was varied at 3000, 6000, 9000, and 12 000, and the sample size of the reference panel was fixed at 3000. Null SNPs were simulated to have the same sample size as the GWAS summary statistics.
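The two error rates can be computed from the true and estimated adjacency patterns as sketched below. The definitions are an assumption, since the text does not spell them out: the FP rate is taken as the share of true non-edges that are estimated as edges, and the FN rate as the share of true edges that are missed, both over ordered pairs of distinct phenotypes.

```python
def edge_error_rates(true_adj, est_adj):
    """Return (FP rate, FN rate) for directed edges; adj[k][j] is truthy for an edge k -> j."""
    p = len(true_adj)
    fp = fn = n_edges = n_non_edges = 0
    for k in range(p):
        for j in range(p):
            if k == j:
                continue
            if true_adj[k][j]:
                n_edges += 1
                if not est_adj[k][j]:
                    fn += 1
            else:
                n_non_edges += 1
                if est_adj[k][j]:
                    fp += 1
    return fp / n_non_edges, fn / n_edges

# Truth: 0 -> 1 -> 2. Estimate: 0 -> 1 and a spurious 0 -> 2, missing 1 -> 2.
true_adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
est_adj = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
fp_rate, fn_rate = edge_error_rates(true_adj, est_adj)
```

Averaging these two numbers over replications gives curves of the kind reported for the network-construction performance.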

In terms of testing, we examined the empirical Type I error/power of the likelihood ratio tests with increasing sample sizes and varying strength of U for the following 5 scenarios over 1000 replications for the 2 types of testing.

I. Testing 1 or more directed relations:

A1. Type 1 error: testing 1 edge when in truth it was null with H₀: u_{1, 14} = 0 versus H₁: u_{1, 14} ≠ 0.

A2. Power: testing 1 edge when in truth it was not null with H₀: u_{1, 6} = 0 versus H₁: u_{1, 6} ≠ 0.

A3. Power: testing 2 edges together when in truth both were not null with H₀: u_{7, 15} = u_{1, 6} = 0 versus H₁: u_{7, 15} ≠ 0 or u_{1, 6} ≠ 0.

II. Testing of a directed pathway:

B1. Type 1 error: testing whether at least 1 edge was not null when in truth only 1 was not null with H₀: u_{1, 14} = 0 or u_{6, 12} = 0 versus H₁: u_{1, 14} ≠ 0 and u_{6, 12} ≠ 0.

B2. Power: testing whether at least 1 edge was not null when in truth both were not null with H₀: u_{1, 6} = 0 or u_{6, 12} = 0 versus H₁: u_{1, 6} ≠ 0 and u_{6, 12} ≠ 0.

The true strengths of the tested edges were: u_{1, 14} = 0, u_{1, 6} = 0.27, u_{7, 15} = 0.44, and u_{6, 12} = 1.36.

4.2. Simulation results

With the increase of sample size, the constructed networks became closer to the true graph (Figure 3; numeric values in Figure 3a are in Supplementary Tables S3–S6). More specifically, the FP rate of the estimated U was ∼0.05, and the FN rate of the estimated U decreased with the increase of sample size when 15 000 null SNPs were used to estimate the 15 × 15 phenotype correlation matrix. Unsurprisingly, the performance of using the correlation matrix estimated by 15 000 null SNPs was slightly worse than that of using the true correlation matrix (the two are labeled separately in Figure 3). In addition, the FP rate of the estimated U clearly decreased with the increase of effect sizes (from U/15 to U/3 to U). To check the validity of our method, we compared pBIC estimated from summary statistics with BIC estimated from the individual-level data for the same set of penalized regression coefficients. We found the 2 sets of values were highly concordant, and the results from 1 iteration were plotted in Supplementary Figure S2.

Figure 3. Performance of network construction (a) and likelihood ratio tests (b–d) in simulation with varying sample sizes and true effect sizes at three levels (U, U/3, and U/15). Panels b, c, and d represent scenarios A2, A3, and B2, respectively.

In terms of testing, we observed well-controlled Type I error rates for Scenarios A1 and B1 (numeric values presented in Figure 3, b–d are in Supplementary Tables S7–S10), using the true phenotype correlation matrix. We note that the empirical Type I error rates might become conservative when the true correlation matrix is replaced by its estimate derived from 15 000 null SNPs (Supplementary Table S10). In real data analysis of GWAS summary level data, typically a much larger number of null SNPs and various other strategies can be used to better estimate the phenotype correlation matrix (Bulik-Sullivan et al., 2015; Kim et al., 2015; Li et al., 2021); however, the investigation in this direction is out of our scope. Furthermore, empirical power was high for scenarios A2, A3, and B2 and increased with sample size and effect size. The empirical power of jointly testing 2 edges u_{1, 6} and u_{7, 15} (Scenario A3) was larger than that of testing one edge u_{1, 6} alone (Scenario A2).

5. DISCUSSION

In this paper, we present a method to estimate an interventional DAG of phenotypes utilizing linear structural equation models, applicable to GWAS summary statistics in the absence of individual-level data. We demonstrated satisfactory performance in terms of the FP and FN rates in network construction, as well as high empirical power and well-controlled Type I error rates of the likelihood ratio tests. We applied this method to a large-scale proteomic GWAS summary dataset to obtain an estimated DAG of 23 cardiovascular-related proteins and further illustrated the effects of these proteins on AD by MR analysis. These results can be useful in understanding the disease etiology, drug repurposing, and other applications for AD.

We note that the choice of a proper reference panel is just as important for our method as for many other summary-statistics-based methods (Deng and Pan, 2018; Chen et al., 2021; Privé et al., 2022). When constructing the cardiovascular-related PPI network, we used an ancestry-matched reference panel of uncorrelated individuals from UK Biobank with a sample size of 3000, which is close to the sample size of the GWAS of cardiovascular proteins. Furthermore, we clumped SNPs around the cis-region of the gene and only used the genome-wide significant SNPs as IVs. This step aimed not only at selecting at least 1 valid IV for each protein, but also at achieving better estimation of the genetic correlation matrix for the IVs, as the number of total IVs became much smaller than the sample size of the reference panel, that is, Q ≪ N_r. Our analysis was constrained to a single ancestry group and unrelated samples. As the collection of multi-ancestry and related samples increases, it will be of significant research interest to establish networks among these populations, a challenge we anticipate addressing in our future work.

In recent years, a large amount of summary-level data has become widely accessible. On the molecular level, many studies published their summary statistics for SNP-molecular phenotype associations. For example, variant-gene associations in 49 tissues can be directly downloaded from GTExPortal (Consortium, 2020). Beyond the molecular phenotypes, UK Biobank alone provides GWAS summary statistics on >7000 traits, including but not limited to cognitive functions, early life factors, health and medical history, and physical measurement (Bycroft et al., 2018). Our proposed method provides a computational and analytical tool to explore the relationships among multiple phenotype variables by taking advantage of rapid advances in GWAS and other association mappings.

Supplementary Material

ujad039_Supplemental_Files

Web Appendices, Supplementary Tables S1–S10, Figure S1 referenced in Sections 3.3 and 4.2, and data and code (sumDAG algorithm, Shiny app, and the real data analysis code) are available with this paper at the Biometrics website on Oxford Academic. Data and code can also be found on https://github.com/chunlinli/sumdag.

Acknowledgement

Rachel Zilinskas and Chunlin Li have contributed equally to this work. The authors would like to thank the associate editor and the reviewer for their valuable comments. Tianzhong Yang would like to further acknowledge the Children’s Cancer Research Funds and the St. Baldrick’s Foundation Scholar Award.

Contributor Information

Rachel Zilinskas, Statistics and Data Corporation, Tempe, AZ 85288, United States.

Chunlin Li, Department of Statistics, Iowa State University, Ames, IA 50011, United States.

Xiaotong Shen, School of Statistics, University of Minnesota, Minneapolis, MN 55455, United States.

Wei Pan, Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55455, United States.

Tianzhong Yang, Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55455, United States.

FUNDING

This research was supported by NIH grants U01 AG073079, R01 AG065636, R01 AG069895, RF1 AG067924, R01 HL116720, and R01 GM126002 and by the Minnesota Supercomputing Institute at the University of Minnesota.

CONFLICT OF INTEREST

None declared.

DATA AVAILABILITY

We downloaded the summary-level genome-wide association study data in Section 3.1 from https://zenodo.org/record/264128/. The algorithm for the proposed work is packaged in R, available at https://github.com/chunlinli/sumdag and the Biometrics website, along with code used for the simulation studies and real data application. The processed summary-level genome-wide association study data that were used as input for the algorithm for the real data application are also included on GitHub.

References

Ashburner M., Ball C. A., Blake J. A., Botstein D., Butler H., Cherry J. M. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25, 25–29.
Barbagallo M., Dominguez L. J. (2014) Type 2 diabetes mellitus and Alzheimer’s disease. World Journal of Diabetes, 5, 889–893.
Bellenguez C., Küçükali F., Jansen I. E., Kleineidam L., Moreno-Grau S., Amin N. et al. (2022) New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nature Genetics, 54, 412–436.
Bulik-Sullivan B., Finucane H. K., Anttila V., Gusev A., Day F. R., Loh P.-R. et al. (2015) An atlas of genetic correlations across human diseases and traits. Nature Genetics, 47, 1236–1241.
Bycroft C., Freeman C., Petkova D., Band G., Elliott L. T., Sharp K. et al. (2018) The UK Biobank resource with deep phenotyping and genomic data. Nature, 562, 203–209.
Calabrò M., Rinaldi C., Santoro G., Crisafulli C. (2021) The biological pathways of Alzheimer disease: a review. AIMS Neuroscience, 8, 86–132.
Chen C., Ren M., Zhang M., Zhang D. (2018) A two-stage penalized least squares method for constructing large systems of structural equations. Journal of Machine Learning Research, 19, 1–34.
Chen W., Wu Y., Zheng Z., Qi T., Visscher P. M., Zhu Z. et al. (2021) Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors. Nature Communications, 12, 7117.
Cheng F., Zhao J., Wang Y., Lu W., Liu Z., Zhou Y. et al. (2021) Comprehensive characterization of protein–protein interactions perturbed by disease mutations. Nature Genetics, 53, 342–353.
Consortium G. (2020) The GTEx consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.
de Bruijn R. F., Ikram M. A. (2014) Cardiovascular risk factors and future risk of Alzheimer’s disease. BMC Medicine, 12, 1–9.
Deng Y., Pan W. (2018) Improved use of small reference panels for conditional and joint analysis with GWAS summary statistics. Genetics, 209, 401–408.
Emilsson V., Ilkov M., Lamb J. R., Finkel N., Gudmundsson E. F., Pitts R. et al. (2018) Co-regulatory networks of human serum proteins link genetics to disease. Science, 361, 769–773.
Folkersen L., Fauman E., Sabater-Lleal M., Strawbridge R. J., Frånberg M., Sennblad B. et al. (2017) Mapping of 79 loci for 83 plasma protein biomarkers in cardiovascular disease. PLOS Genetics, 13, e1006706.
Friedman N., Linial M., Nachman I., Pe’er D. (2000) Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7, 601–620.
Hemani G., Bowden J., Smith G. D. (2018) Evaluating the potential role of pleiotropy in Mendelian randomization studies. Human Molecular Genetics, 27, 195–208.
Hemani G., Zheng J., Elsworth B., Wade K. H., Haberland V., Baird D. et al. (2018) The MR-base platform supports systematic causal inference across the human phenome. eLife, 7, e34408.
Id J. Q., Chen K., Zhong C., Zhu S., Id X. M. (2021) Network-based protein-protein interaction prediction method maps perturbations of cancer interactome. PLOS Genetics, 17, e1009869.
International HapMap Consortium. (2005) A haplotype map of the human genome. Nature, 437, 1299–1320.
Kim J., Bai Y., Pan W. (2015) An adaptive association test for multiple phenotypes with GWAS summary statistics. Genetic Epidemiology, 39, 651–663.
Li C., Shen X., Pan W. (2023) Inference for a large directed acyclic graph with unspecified interventions. Journal of Machine Learning Research, 24, 1–48.
Li C., Yang Y., Wu C. (2022) Package “glmtlp”. https://cran.r-project.org/web/packages/glmtlp/glmtlp.pdf.
Li T., Ning Z., Shen X. (2021) Improved estimation of phenotypic correlations using summary association statistics. Frontiers in Genetics, 12, 665252.
Liu F., Zhang S.-W., Guo W.-F., Wei Z.-G., Chen L. (2016) Inference of gene regulatory network based on local Bayesian networks. PLOS Computational Biology, 12, e1005024.
Mak T. S. H., Porsch R. M., Choi S. W., Zhou X., Sham P. C. (2017) Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology, 41, 469–480.
Napoli C., Benincasa G., Donatelli F., Ambrosio G. (2020) Precision medicine in distinct heart failure phenotypes: focus on clinical epigenetics. American Heart Journal, 224, 113–128.
Ochoa D., Hercules A., Carmona M., Suveges D., Gonzalez-Uriarte A., Malangone C. et al. (2021) Open targets platform: supporting systematic drug–target identification and prioritisation. Nucleic Acids Research, 49, D1302–D1310.
Pattee J., Pan W. (2020) Penalized regression and model selection methods for polygenic scores on summary statistics. PLOS Computational Biology, 16, e1008271.
Privé F., Arbel J., Aschard H., Vilhjálmsson B. J. (2022) Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores. Human Genetics and Genomics Advances, 3, 100136.
Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M. A., Bender D. et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 81, 559–575.
Ross C. A., Poirier M. A. (2004) Protein aggregation and neurodegenerative disease. Nature Medicine, 10, S10–S17.
Shen X., Pan W., Zhu Y. (2012) Likelihood-based selection and sharp parameter estimation. Journal of the American Statistical Association, 107, 223–232.
Signor S. A., Nuzhdin S. V. (2018) The evolution of gene expression in cis and trans. Trends in Genetics, 34, 532–544.
Snider J., Kotlyar M., Saraon P., Yao Z., Jurisica I., Stagljar I. (2015) Fundamentals of protein interaction network mapping. Molecular Systems Biology, 11, 848.
Sutinen E. M., Korolainen M. A., Häyrinen J., Alafuzoff I., Petratos S., Salminen A. et al. (2014) Interleukin-18 alters protein expressions of neurodegenerative diseases-linked proteins in human SH-SY5Y neuron-like cells. Frontiers in Cellular Neuroscience, 8, 214.
Swerdlow D., Kuchenbaecker K., Shah S., Sofat R., Holmes M., White J. et al. (2016) Selecting instruments for Mendelian randomization in the wake of genome-wide association studies. International Journal of Epidemiology, 45, 1600–1616.
Taliun D., Harris D. N., Kessler M. D., Carlson J., Szpiech Z. A., Torres R. et al. (2021) Sequencing of 53,831 diverse genomes from the NHLBI TOPMed program. Nature, 590, 290–299.
The 1000 Genomes Project Consortium. (2015) A global reference for human genetic variation. Nature, 526, 68–74.
Tini G., Scagliola R., Monacelli F., La Malfa G., Porto I., Brunelli C. et al. (2020) Alzheimer’s disease and cardiovascular disease: a particular association. Cardiology Research and Practice, 2020, 2617970.
Witten D. M., Friedman J. H., Simon N. (2012) New insights and faster computations for the graphical lasso view. Journal of Computational and Graphical Statistics, 20, 892–900.
Zhang B., Horvath S. (2005) A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4, 1128.
Zhang P., Itan Y. (2019) Biological network approaches and applications in rare disease studies. Genes, 10, 797.
