PLOS Genetics. 2025 Jan 7;21(1):e1011519. doi: 10.1371/journal.pgen.1011519

Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes

Deborah Kunkel 1,*, Peter Sørensen 2, Vijay Shankar 3, Fabio Morgante 3,4,*
Editor: Anke Huels 5
PMCID: PMC11741642  PMID: 39775068

Abstract

Polygenic prediction of complex trait phenotypes has become important in human genetics, especially in the context of precision medicine. Recently, mr.mash, a flexible and computationally efficient method that models multiple phenotypes jointly and leverages sharing of effects across such phenotypes to improve prediction accuracy, was introduced. However, a drawback of mr.mash is that it requires individual-level data, which are often not publicly available. In this work, we introduce mr.mash-rss, an extension of the mr.mash model that requires only summary statistics from Genome-Wide Association Studies (GWAS) and linkage disequilibrium (LD) estimates from a reference panel. By using summary data, we achieve the twin goal of increasing the applicability of the mr.mash model to data sets that are not publicly available and making it scalable to biobank-size data. Through simulations, we show that mr.mash-rss is competitive with, and often outperforms, current state-of-the-art methods for single- and multi-phenotype polygenic prediction in a variety of scenarios that differ in the pattern of effect sharing across phenotypes, the number of phenotypes, the number of causal variants, and the genomic heritability. We also present a real data analysis of 16 blood cell phenotypes in the UK Biobank, showing that mr.mash-rss achieves higher prediction accuracy than competing methods for the majority of traits, especially when the data set has smaller sample size.

Author summary

Polygenic prediction refers to the use of an individual’s genetic information (i.e., genotypes) to predict traits (i.e., phenotypes), which are often of medical relevance. It is known that some phenotypes are related and are affected by the same genotypes. When this is the case, it is possible to improve the accuracy of predictions by using methods that model multiple phenotypes jointly and account for shared effects. mr.mash is a recently developed multi-phenotype method that can learn which effects are shared and has been shown to improve prediction. However, mr.mash requires large data sets of genetic and phenotypic information collected at the individual level. Such data are often unavailable due to privacy concerns, or are difficult to work with due to the computational resources needed to analyze data of this size. Our work extends mr.mash to require only summary statistics from Genome-Wide Association Studies, which are usually publicly available, instead of individual-level data. In addition, the computations using summary statistics do not depend on sample size, making the newly developed mr.mash-rss scalable to extremely large data sets. Using simulations and real data analysis, we show that our method is competitive with other methods for polygenic prediction.

Introduction

Predicting complex trait phenotypes from genotypes is a central task of a few branches of quantitative genetics. In agricultural breeding, there is interest in predicting breeding values (estimated breeding values, EBVs) to select the best individuals for reproduction and achieve an increase in performance over generations [1]. In human genetics, predicting medically relevant phenotypes such as disease risk via polygenic scores (PGS) is important to stratify the population and identify individuals with greater genetic risk [2]. Finally, with the advent of transcriptome-wide association studies (TWAS), predicting gene expression as an intermediate step has become of interest [3]. In all these applications, accurate predictions are important. The response to artificial selection is directly proportional to the accuracy of EBVs [4]. Precise identification of individuals at risk for a particular disease requires accurate PGS [2]. The power to discover gene-phenotype associations in TWAS depends on the accuracy of gene expression prediction [5].

Technically, phenotypic prediction is achieved by modeling the phenotype of interest as a multiple regression on genotypes at a set of genetic variants [6]. Both frequentist and Bayesian approaches to multiple regression have been developed for and/or applied to this task, with accuracy spanning from very low to high depending on the genetic architecture of the trait analyzed [7–12]. Multiple phenotypes may be genetically correlated due to pleiotropy (i.e., the sharing of causal variants across traits). In that case, modeling these phenotypes jointly via multivariate multiple regression methods can improve effect size estimates by leveraging effect sharing and, thus, increase prediction accuracy [13–17]. Integrative approaches that combine multiple single-phenotype PGSs across phenotypes have also been shown to improve prediction accuracy [18, 19].

Recently, Morgante et al. (2023) introduced the “Multiple Regression with Multivariate Adaptive Shrinkage” or “mr.mash” [20]. mr.mash is a Bayesian approach to multivariate multiple regression that is able to learn complex patterns of effect sharing across phenotypes directly from the data. This is achieved through the use of flexible priors on the effect sizes across phenotypes and an empirical Bayes (EB) framework to adapt these priors to the data. Computational efficiency is achieved by using Variational Inference (VI) as opposed to the more expensive Markov Chain Monte Carlo (MCMC) methods. For a detailed account of VI and EB in this context, including the advantages, we direct the reader to [20]. Using multi-tissue gene expression prediction from cis-genotypes as an example, the authors showed that mr.mash is competitive in terms of both prediction accuracy and speed [20]. However, while powerful, mr.mash has some limitations. First, mr.mash requires individual-level data, i.e., genotypes and phenotypes for each individual and, mainly for privacy reasons, these data are rarely publicly available [21]. Second, mr.mash does not scale well to datasets with very large sample size, such as modern biobanks. These weaknesses limit the use of mr.mash for PGS prediction in human genetics.

In this work, we overcome both these limitations by introducing “mr.mash Regression with Summary Statistics” or “mr.mash-rss”, an extension of mr.mash that only requires summary-level data. These are effect sizes and their standard errors (or Z-scores) from univariate Genome-Wide Association Studies (GWASs) and Linkage Disequilibrium (LD) estimates from reference panels, which are usually publicly available [21]. mr.mash-rss shares some features with the established Multivariate Adaptive Shrinkage (mash) [22], in that they both use the same mixture-of-multivariate-Normals prior on the effect sizes to leverage effect sharing across conditions (e.g., different phenotypes), and the EB approach to adapt the prior to the data. In fact, both mr.mash and mr.mash-rss “borrow” this framework that was introduced with mash. However, while mash assumes that the input summary statistics come from independent variables (i.e., it does not deal with LD), mr.mash-rss takes a full multivariate multiple regression approach and adjusts effect sizes for both sharing across conditions and correlations among variables (i.e., it takes LD into account). We test mr.mash-rss in the task of PGS prediction for multiple phenotypes jointly via simulations in several scenarios and show that it is competitive in terms of prediction accuracy with currently available methods. We then confirm these results in the analysis of real data for 16 blood cell traits in the UK Biobank [23, 24].

Description of the method

The multivariate multiple regression is used to model the effects of several predictor variables X on multiple responses Y jointly:

$$Y = XB + E, \quad E \sim \mathrm{MN}_{n \times r}(0, I_n, V) \tag{1}$$

where $Y \in \mathbb{R}^{n \times r}$ is the response matrix for $r$ responses (phenotypes in our case) in $n$ individuals, $X \in \mathbb{R}^{n \times p}$ is the predictor matrix for $p$ predictors (genetic variants in our case) in $n$ individuals, $B \in \mathbb{R}^{p \times r}$ is the matrix of effects for $p$ predictors and $r$ responses, and $E \in \mathbb{R}^{n \times r}$ is the matrix of residuals for $r$ responses in $n$ individuals. The residuals follow a Matrix Normal distribution with mean $0$ (an $n \times r$ matrix of zeroes), covariance across individuals $I_n$ (an $n \times n$ identity matrix), and covariance across responses $V$ (an $r \times r$ positive definite matrix).

mr.mash adopts a Bayesian approach by imposing a prior on the effects:

$$b_j \mid w_0, S_0 \sim \sum_{k=1}^{K} w_{0,k} \, N_r(0, S_{0,k}), \quad j = 1, \dots, p. \tag{2}$$

where $b_j$ is an $r$-vector that captures the effects of predictor $j$, and $b_j^\top$ is the $j$th row of $B$. Thus, the effects are assumed to be identically distributed as a mixture of $r$-variate Normals with $K$ components. The prior is determined by $w_0 := (w_{0,1}, \dots, w_{0,K})$, the set of non-negative mixture weights, and $S_0 := \{S_{0,1}, \dots, S_{0,K}\}$, the set of $r \times r$ covariance matrices across responses. The elements of $S_0$ are prespecified and are intended to capture plausible patterns of effect sharing across responses [20].
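To make the mixture prior in Eq (2) concrete, the sketch below draws effect vectors from a small three-component mixture. The three covariance patterns (null, equal effects, independent effects), the weights, and the function name `sample_effects` are illustrative assumptions made here; they are not the covariance matrices or code used by mr.mash.

```python
import random

def sample_effects(p, r, weights, seed=0):
    """Draw p effect vectors of length r from a 3-component mixture.

    Components (illustrative assumptions, not the paper's S_0):
      0: null           S = 0       -> b_j is all zeros
      1: equal effects  S = 1 1^T   -> one draw shared by every response
      2: independent    S = I_r     -> one draw per response
    """
    rng = random.Random(seed)
    effects, labels = [], []
    for _ in range(p):
        # Pick the mixture component according to the mixture weights.
        k = rng.choices(range(3), weights=weights)[0]
        if k == 0:
            b = [0.0] * r
        elif k == 1:
            z = rng.gauss(0.0, 1.0)
            b = [z] * r
        else:
            b = [rng.gauss(0.0, 1.0) for _ in range(r)]
        effects.append(b)
        labels.append(k)
    return effects, labels

effects, labels = sample_effects(p=1000, r=5, weights=[0.9, 0.05, 0.05])
```

With most of the weight on the null component, most simulated predictors have no effect on any response, while the remaining ones show either fully shared or unrelated effects across responses.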

To make the model fit computationally efficient for large datasets, mr.mash approximates $p(B \mid X, Y, V, w_0, S_0)$, the true posterior distribution of the regression coefficients, through variational inference, which uses optimization techniques to find the best approximation within a chosen family of distributions [25]. The optimal approximation is determined by maximizing the evidence lower bound (ELBO), a lower bound on the model’s marginal likelihood. In addition, mr.mash also estimates $w_0$ (and $V$) from the data by maximizing the ELBO, thereby adapting the prior to the data. This whole procedure has been termed variational empirical Bayes [26].
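For reference, the generic form of the ELBO for a model of this kind can be written as follows. Here $q$ denotes the approximating distribution over $B$ and $F$ the ELBO; both symbols are notation introduced for this illustration, and the specific family of $q$ used by mr.mash is described in [20].

```latex
\log p(Y \mid X, V, w_0, S_0)
  \;\geq\; F(q, w_0, V)
  \;=\; \mathbb{E}_{q}\!\left[\log p(Y \mid X, B, V)\right]
  \;-\; D_{\mathrm{KL}}\!\left(q(B) \,\|\, p(B \mid w_0, S_0)\right)
```

The gap between the two sides is the Kullback-Leibler divergence from $q$ to the true posterior, so maximizing $F$ over $q$ tightens the approximation, while maximizing it over $w_0$ (and $V$) performs the empirical Bayes step.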

Extension of mr.mash to summary statistics

Following the approach of [27], we express the updates in mr.mash in terms of sufficient statistics. The likelihood for the mr.mash model is

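$$\ell(B; X, Y, V) := \mathrm{MN}_{n \times r}(Y; XB, I_n, V) = |2\pi V|^{-n/2} \exp\left\{ -\tfrac{1}{2} \operatorname{tr}\left[ V^{-1} \left( Y^\top Y - Y^\top X B - B^\top X^\top Y + B^\top X^\top X B \right) \right] \right\}. \tag{3}$$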

We can see that $X^\top X$, $X^\top Y$, and $Y^\top Y$ are sufficient statistics for the likelihood. Thus, the mr.mash model can be fitted using expressions based only on these sufficient statistics (see S1 Text for detailed derivations) to obtain the same results as using individual-level data $X$ and $Y$.
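The sufficiency claim can be checked numerically. The sketch below evaluates the trace term of the likelihood in two ways, once from individual-level data and once from the three cross-product matrices only, on small random inputs. The dimensions, the matrix chosen for the inverse of V, and the helper names are arbitrary choices made for this illustration.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

rng = random.Random(1)
n, p, r = 50, 3, 2
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
Y = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(n)]
B = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(p)]
V_inv = [[2.0, -0.5], [-0.5, 1.0]]  # inverse of an arbitrary positive definite V

# Route 1: individual-level data, tr[V^{-1} (Y - XB)^T (Y - XB)].
XB = matmul(X, B)
resid = [[y - f for y, f in zip(yrow, frow)] for yrow, frow in zip(Y, XB)]
quad_individual = trace(matmul(V_inv, matmul(transpose(resid), resid)))

# Route 2: sufficient statistics only.
XtX = matmul(transpose(X), X)
XtY = matmul(transpose(X), Y)
YtY = matmul(transpose(Y), Y)
Bt = transpose(B)
BtXtY = matmul(Bt, XtY)                # B^T X^T Y; its transpose is Y^T X B
BtXtXB = matmul(Bt, matmul(XtX, B))    # B^T X^T X B
inner = [
    [YtY[i][j] - BtXtY[j][i] - BtXtY[i][j] + BtXtXB[i][j] for j in range(r)]
    for i in range(r)
]
quad_sufficient = trace(matmul(V_inv, inner))
```

The two quantities agree up to floating-point error, and the second route never touches `X` or `Y` after the cross-products have been formed, which is why its cost does not grow with the number of individuals.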

The sufficient statistics can be recovered from effect sizes and their standard errors (or Z-scores) from GWAS and LD estimates, following steps provided in [27] and S1 Text. We call mr.mash with summary statistics mr.mash-rss. However, it should be noted that while $X^\top Y$ can be recovered exactly, $X^\top X$ is only approximated when LD estimates come from reference panels, rather than from the data that generated the GWAS summary statistics [27]. Thus, using summary data can be seen as fitting the mr.mash model using an approximation to the likelihood in Eq (3) [27]. The quality of the approximation depends on how closely the LD reference panel matches the sample from which the GWAS summary statistics were computed. Quality control should therefore be performed on summary statistics and LD before model fitting [27, 28]. In addition, $Y^\top Y$ may not be available. However, this quantity is not strictly necessary, unless $V$ is estimated within the mr.mash-rss algorithm [27]. While mr.mash has a way to deal with missing values in $Y$, mr.mash-rss assumes that the summary statistics were computed using the same individuals for each response (i.e., there are no missing values in $Y$).
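As an illustration of how a cross-product can be recovered from GWAS output, the sketch below fits a simple linear regression for one standardized variant and one standardized phenotype, then reconstructs the cross-product of x and y from the effect size, its standard error, and the sample size alone. The z-score adjustment used here is the form commonly used in summary-statistics regression work such as [27]; that S1 Text follows exactly these steps is an assumption, and all function names are hypothetical.

```python
import math
import random

def standardize(v):
    """Center to mean 0 and scale to sample variance 1."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / (n - 1))
    return [(a - m) / sd for a in v]

def simple_regression(x, y):
    """OLS slope and standard error for centered x and y (n - 2 residual df)."""
    n = len(x)
    xtx = sum(a * a for a in x)
    xty = sum(a * b for a, b in zip(x, y))
    bhat = xty / xtx
    rss = sum((b - bhat * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / xtx)
    return bhat, se

rng = random.Random(7)
n = 200
x_raw = [float(rng.choice([0, 1, 2])) for _ in range(n)]  # toy genotypes coded 0/1/2
y_raw = [0.3 * g + rng.gauss(0, 1) for g in x_raw]
x, y = standardize(x_raw), standardize(y_raw)

# What a GWAS would report for this variant.
bhat, se = simple_regression(x, y)
z = bhat / se

# Recover x^T y from (z, n) only, assuming standardized x and y.
z_adj = z * math.sqrt((n - 1) / (z * z + n - 2))
xty_recovered = math.sqrt(n - 1) * z_adj

# Ground truth from individual-level data, for comparison.
xty_direct = sum(a * b for a, b in zip(x, y))
```

Under the same standardization, the matrix of cross-products among variants would be approximated by (n - 1) times the LD correlation matrix, which is where a reference panel introduces approximation error.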

The methods introduced in this paper are implemented in the R package [29] mr.mash.alpha, which is available for download at https://github.com/stephenslab/mr.mash.alpha.

Verification and comparison

Simulations using UK Biobank genotypes

We devised a simulation study where the goal was to compare mr.mash-rss and other competing methods at computing PGS for multiple phenotypes from summary data. We used real genotypes from the UK Biobank array data for n = 105,000 nominally unrelated White British individuals that were randomly sampled. After applying a series of filters (see S1 Text for details), the data included p = 595,071 genetic variants.

We simulated r = 5 phenotypes according to three scenarios that differed in the structure of the effect sharing across phenotypes. Causal variants (5,000 for all scenarios) were randomly sampled from all genetic variants.

  1. “Equal Effects”, where each causal variant affects all the phenotypes and has the same effect across phenotypes. The per-phenotype proportion of variance explained by the causal variants or genomic heritability ($h_g^2$) is equal to 0.5.

  2. “Mostly Null”, where the causal variants affect only the first phenotype with $h_g^2$ equal to 0.5, while the remaining phenotypes are affected only by a non-genetic component (i.e., $h_g^2 = 0$).

  3. “Shared Effects in Subgroups”, where the effect of each causal variant is drawn such that it is equally likely to be shared (but not be equal) in phenotypes 1 through 3 or to be shared (but not be equal) in phenotypes 4 and 5. The per-phenotype $h_g^2$ is 0.3 in phenotypes 1–3 and 0.5 in phenotypes 4 and 5.

These three scenarios were similar to those used in [20], but some parameters (e.g., number of causal variants) were modified to reflect more closely the genetic architecture of complex traits, rather than gene expression. We also simulated a few scenarios based on the Equal Effects scenario (i.e., equal effects of the causal variants across phenotypes) to assess the effect of genomic heritability, polygenicity (i.e., number of causal variants), and number of phenotypes modeled on the performance of the methods:

  4. “Low $h_g^2$”, where the per-phenotype $h_g^2$ is 0.2.

  5. “High Polygenicity”, where the number of causal variants is 50,000.

  6. “More Phenotypes”, where the number of simulated phenotypes is 10.
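A common way to hit a target genomic heritability in simulations of this kind is to compute the genetic values first and then choose the residual variance accordingly. The sketch below does this for a single phenotype on toy genotypes. It is an assumption about the general approach, not the authors' procedure (which is described in S1 Text), and the function name `simulate_phenotype` is hypothetical.

```python
import random

def simulate_phenotype(genotypes, effects, h2, rng):
    """Return (y, var_g, var_e) with y = g + e and var_g / (var_g + var_e) = h2.

    Requires 0 < h2 <= 1; a phenotype with zero heritability would be pure noise
    and needs no genetic values at all.
    """
    g = [sum(x * b for x, b in zip(row, effects)) for row in genotypes]
    n = len(g)
    mean_g = sum(g) / n
    var_g = sum((v - mean_g) ** 2 for v in g) / (n - 1)
    # Solve h2 = var_g / (var_g + var_e) for the residual variance.
    var_e = var_g * (1.0 - h2) / h2
    sd_e = var_e ** 0.5
    y = [v + rng.gauss(0.0, sd_e) for v in g]
    return y, var_g, var_e

rng = random.Random(42)
n, p = 500, 20
genotypes = [[float(rng.choice([0, 1, 2])) for _ in range(p)] for _ in range(n)]
effects = [rng.gauss(0.0, 1.0) for _ in range(p)]
y, var_g, var_e = simulate_phenotype(genotypes, effects, h2=0.5, rng=rng)
```

Because the residuals are drawn at random, the realized share of variance explained in any one sample fluctuates around the target value.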

For each of the scenarios above, we simulated 20 replicates. Per-phenotype prediction accuracy was computed as the $R^2$ from the linear regression of the true phenotypes on the predicted phenotypes for the test set individuals, which consisted of 5,000 randomly sampled individuals from the total of 105,000. This metric has the attractive property that its upper bound is $h_g^2$ [30].
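The accuracy metric described above can be sketched as follows. For a simple linear regression with an intercept, the coefficient of determination equals the squared Pearson correlation between the true and predicted phenotypes, which is what this hypothetical helper `r_squared` computes.

```python
def r_squared(y_true, y_pred):
    """R^2 from the simple linear regression of y_true on y_pred.

    Equals the squared Pearson correlation between the two vectors.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    sxx = sum((p - mean_p) ** 2 for p in y_pred)
    syy = sum((t - mean_t) ** 2 for t in y_true)
    return sxy * sxy / (sxx * syy)

y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, [2.0, 4.0, 6.0, 8.0])   # predictions off by a scale factor
noisy = r_squared(y_true, [1.5, 1.0, 3.5, 3.0])
```

Because the regression absorbs any shift or rescaling of the predictions, this metric rewards getting the ranking and relative spacing of individuals right rather than the absolute scale of the predicted values.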

Methods compared

We compared mr.mash-rss to a few competing methods that satisfied the following requirements: (1) can be fitted with only summary data; (2) do not require a validation data set to tune model parameters; (3) for multivariate methods, are able to model at least 5 phenotypes jointly. This resulted in the choice of the following methods:

  1. LDpred2-auto. This is a univariate Bayesian method that imposes a two-component mixture prior on the regression coefficients, consisting of a point-mass at 0 and a zero-centered Normal distribution [10]. This method is labelled “LDpred2” in the results.

  2. SBayesR. This is a univariate Bayesian method that imposes a four-component mixture prior on the regression coefficients, consisting of a point-mass at 0 and three zero-centered Normal distributions, each with a different variance [9]. This method is labelled “SBayesR” in the results.

  3. SmvBayesC. This is a multivariate Bayesian method that imposes a two-component mixture prior on the regression coefficients across phenotypes, consisting of a point-mass at 0 and a zero-centered multivariate Normal distribution [31, 32]. This method allows for each genetic variant to affect any combination of phenotypes. This method is labelled “SmvBayesC” in the results. We also tested a “restrictive” version that allows for each genetic variant to affect all or none of the phenotypes only [14, 31]. This method is labelled “SmvBayesC-rest” in the results.

We also included two 2-step approaches:

  4. MTAG+LDpred2-auto. The first step uses MTAG, which is a multivariate method that adjusts univariate ordinary least squares (OLS) summary statistics based on the (estimated) correlation between the effects across phenotypes [33]. Because MTAG does not account for LD between variants, MTAG-adjusted summary statistics are then fed to LDpred2-auto in the second step. This method is labelled “MTAG+LDpred2” in the results.

  5. wMT-SBLUP. This is a method that uses SBLUP to convert univariate OLS summary statistics into univariate Best Linear Unbiased Predictor (BLUP) estimates in the first step. In the second step, the univariate BLUP estimates for multiple phenotypes are adjusted based on weights that take into account the genetic correlations among phenotypes and the sample size from which the summary statistics were computed [34]. This method is labelled “wMT-SBLUP” in the results.

Each method was fitted for each chromosome separately using summary statistics calculated using only the training set individuals. The summary statistics (i.e., effect sizes and standard errors) were computed from univariate simple linear regression of each phenotype on each genetic variant, one at a time. Each phenotype was quantile normalized before the analysis. LD between each pair of variants was computed using 146,288 nominally unrelated White British individuals that did not overlap with the 105,000 individuals used for the rest of the analyses. Correlations between variants that were more than 3 cM apart were set to 0 to create a “banded” LD matrix [10]. We fitted mr.mash-rss including both “canonical” and “data-driven” covariance matrices (see S1 Text and [20] for details).
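The banding step can be sketched as below: given a correlation matrix and the genetic position of each variant in cM, correlations between variants more than 3 cM apart are set to 0. The toy matrix, the positions, and the function name `band_ld` are made up for illustration; at biobank scale one would use a sparse representation rather than nested lists.

```python
def band_ld(ld, pos_cm, max_dist_cm=3.0):
    """Zero out correlations between variants more than max_dist_cm apart."""
    p = len(ld)
    return [
        [ld[i][j] if abs(pos_cm[i] - pos_cm[j]) <= max_dist_cm else 0.0
         for j in range(p)]
        for i in range(p)
    ]

# Toy 4-variant correlation matrix and genetic positions (hypothetical values).
ld = [
    [1.0, 0.6, 0.2, 0.1],
    [0.6, 1.0, 0.5, 0.3],
    [0.2, 0.5, 1.0, 0.4],
    [0.1, 0.3, 0.4, 1.0],
]
pos_cm = [0.0, 1.5, 3.5, 6.0]
banded = band_ld(ld, pos_cm)
```

In this toy case only the three pairs of neighbouring variants lie within 3 cM of each other, so every other off-diagonal entry is zeroed while the matrix stays symmetric.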

Results

In the Equal Effects scenario (Fig 1A), the three fully multivariate methods (i.e., mr.mash-rss, and the two versions of SmvBayesC) performed better than the two univariate methods. This is expected because the univariate methods assume independence among genetic effects across phenotypes and are unable to learn the pattern of equal genetic effects. Among the multivariate methods, mr.mash-rss produced higher accuracy than SmvBayesC. The “restrictive” version of SmvBayesC performed as well as the unrestricted one because this scenario meets one of the effect combinations allowed by this less flexible method. The two 2-step approaches (i.e., MTAG+LDpred2-auto and wMT-SBLUP) did not perform as well as the other methods. In particular, wMT-SBLUP performed substantially worse than the other methods. We attribute this poor performance to the assumption of an infinitesimal genetic architecture made by wMT-SBLUP, which does not match our simulation scenarios.

Fig 1. Prediction accuracy in simulations with different patterns of effect sharing across phenotypes.

Each panel summarizes the accuracy of the test set predictions in 20 simulations. The thick, black line in each box gives the median $R^2$. The dotted and dashed lines give the maximum accuracy achievable, i.e., the simulated $h_g^2$.

In the Mostly Null scenario (Fig 1B), the genetic effects are present only in the first phenotype. Thus, joint modeling of all the phenotypes is not expected to produce any increase in accuracy compared to a phenotype-by-phenotype analysis. In phenotype 1, while SBayesR and LDpred2-auto were the most accurate methods, mr.mash-rss only had slightly lower mean $R^2$. As for SmvBayesC, the full version performed only slightly worse than mr.mash-rss; however, the “restrictive” version performed much worse. This observation is expected, given that the prior of SmvBayesC “restrictive” only allows for the effects to be present in all or none of the phenotypes. MTAG+LDpred2-auto did a little worse than the other multivariate methods, while wMT-SBLUP performed the worst.

In the Shared Effects in Subgroups scenario (Fig 1C), SBayesR, LDpred2-auto, SmvBayesC, and mr.mash-rss performed very similarly in phenotypes 4 and 5, with SmvBayesC having slightly higher accuracy than the other methods. On the other hand, SmvBayesC and mr.mash-rss outperformed the univariate methods in phenotypes 1–3. This can be explained by the slightly higher sharing of effects across phenotypes, the larger number of phenotypes with shared effects, and the lower $h_g^2$, which make the advantage of a multivariate analysis clearer than in phenotypes 4 and 5. The prior of SmvBayesC “restrictive” is not well-suited for this scenario, which resulted in this method not performing well across phenotypes. Similarly, MTAG’s assumption that all genetic variants have the same effect sharing patterns across traits is clearly violated in this scenario. This resulted in MTAG+LDpred2-auto not performing as well as the other methods, with the exception of wMT-SBLUP, which again performed poorly.

In the Low $h_g^2$ scenario (Fig 2A), the three fully multivariate methods performed better than the two univariate methods. In addition, the relative improvement provided by the multivariate methods was larger than in the Equal Effects scenario with $h_g^2 = 0.5$ (Fig 1A). With smaller signal-to-noise ratio, it is harder to estimate effects accurately. Multivariate methods can borrow information across phenotypes and improve accuracy. In this scenario, MTAG+LDpred2-auto also showed better performance than the univariate methods. mr.mash-rss was the best performing method, while wMT-SBLUP was the worst in this scenario.

Fig 2. Prediction accuracy in simulations with different genetic architecture.

Each panel summarizes the accuracy of the test set predictions in 20 simulations. The thick, black line in each box gives the median $R^2$. The dotted lines give the maximum accuracy achievable, i.e., the simulated $h_g^2$.

In the High Polygenicity scenario (Fig 2B), prediction accuracy achieved by all the methods was much lower than in the Equal Effects scenario with 5,000 causal variants (Fig 1A). This is expected since each causal variant explains a much smaller proportion of phenotypic variance and, consequently, the effects are harder to estimate accurately. However, mr.mash-rss substantially outperformed both univariate and fully multivariate competing methods. SmvBayesC could not adapt well to this scenario, providing accuracies that are similar to or even lower than SBayesR and LDpred2-auto. MTAG+LDpred2-auto was very competitive with mr.mash-rss, essentially matching its performance for almost all phenotypes, while wMT-SBLUP was the worst in this scenario.

In the More Phenotypes scenario (Fig 2C), the results are very similar to the Equal Effects scenario with 5 phenotypes (Fig 1A). The relative improvement in accuracy provided by mr.mash-rss was, however, a little larger in this scenario because the method can borrow information across more phenotypes with equal effects. On the other hand, SmvBayesC “restrictive” did not benefit from the larger number of phenotypes and provided a relative improvement over the univariate methods that was similar to that of the Equal Effects scenario with 5 phenotypes. We could not run the full version of SmvBayesC in this scenario because it was too computationally intensive.

We note that LDpred2-auto had convergence issues when using MTAG-adjusted summary statistics, presumably due to it being designed to use OLS summary statistics. Thus, some trait-scenario combinations are based on fewer than 20 replicates for MTAG+LDpred2-auto.

To evaluate runtime, we selected chromosome 10 as a medium size chromosome and LDpred2-auto, SmvBayesC, and mr.mash-rss as the best performing methods in Shared Effects in Subgroups, i.e., the most complex scenario in our simulations. The results confirm that mr.mash-rss is computationally efficient compared to the other multivariate method (S1 Table).

Robustness to model misspecification

When analyzing real data, it is common to use LD matrices that are computed using a reference panel, rather than the individuals from whom the summary statistics were computed. It is well-known that issues with analyses with summary statistics arise when the reference panel and the GWAS population do not match [27, 35, 36]. Thus, it is important to assess the robustness of mr.mash-rss to the choice of the LD matrix. While we note that all the simulations above used an LD matrix that was computed using a subset of UK Biobank individuals that did not overlap with those used to compute the summary statistics, we also tested the performance of mr.mash-rss with truly external LD matrices. To do so, we used the same setting as the Equal Effects scenario, but computed LD matrices using 503 unrelated European individuals from the 1000G project [37]. This is a very small sample size that has been shown to be problematic when analyzing large GWAS samples [35]. Some preliminary testing highlighted that “banded” LD matrices (as used for the other simulations) resulted in convergence issues for the methods, while “block-diagonal” LD matrices computed as in [38] produced a better performance. Thus, we used the latter for the “External LD” simulation scenario. The results of this analysis are summarized in S1 Fig and show that, as expected, all the methods performed worse compared to the Equal Effects scenario. However, mr.mash-rss remained the best performing method overall.

mr.mash-rss assumes complete sample overlap across phenotypes. However, this might not be the case when analyzing real data. To investigate the robustness of mr.mash-rss to the violation of this assumption, we used the same setting as the Equal Effects scenario and assigned missing values completely at random (MCAR) to individuals in the training set. This was done such that each individual had missing values in any combination of the five phenotypes with equal probability. We simulated two scenarios, one where 20% of the individuals had missing phenotypes and another one where 80% of the individuals had missing phenotypes. The results, summarized in S2 Fig, show that mr.mash-rss’ performance was unchanged in the scenario with fewer missing phenotypes. In the scenario with a larger percentage of missing phenotypes, mr.mash-rss’ prediction accuracy was lower than that of SmvBayesC, but still higher than that of the univariate methods and similar to that of MTAG+LDpred2-auto.

In summary, these analyses show that mr.mash-rss is fairly robust to some model misspecifications.

Applications

Case study: Predicting blood cell traits in the UK Biobank

To evaluate mr.mash-rss on a real data application, we sought to predict blood cell traits from genotypes using the UK Biobank data. The UK Biobank is a dataset of roughly 500,000 individuals with genetic and phenotypic data [39]. We focused on a subset of 16 blood cell traits that have been used for quantitative genetic analyses in previous work [24]. After a series of filters (see S1 Text for details), our data consisted of n = 244,049 individuals and p = 1,054,330 HapMap3 variants, as has been previously recommended [40]. The 244K individuals were split into 5 non-overlapping groups to perform 5-fold cross-validation. Each method was trained on the data from 4 groups and prediction accuracy was computed in the remaining fifth group. This procedure was repeated five times, once for each fold. Given that SmvBayesC is too computationally intensive for this many phenotypes, LDpred2-auto suffered from convergence issues when using MTAG-adjusted summary statistics, and wMT-SBLUP performed poorly in all simulation scenarios and does not account for sample overlap, we compared mr.mash-rss, LDpred2-auto, and SBayesR in the real data application.
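The cross-validation layout described above can be sketched as follows: individuals are shuffled once, dealt into five non-overlapping groups, and each group serves as the test set exactly once. The toy sample size, the seed, and the function name `make_folds` are arbitrary choices for this illustration; how the authors assigned individuals to folds is not specified here.

```python
import random

def make_folds(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k non-overlapping groups."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = make_folds(n=23, k=5)

# One (train, test) split per fold: train on four groups, test on the fifth.
splits = []
for i, test_idx in enumerate(folds):
    train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
    splits.append((train_idx, test_idx))
```

In a summary-statistics workflow, the GWAS effect sizes and standard errors would be recomputed from the training individuals of each split, so that no test individual contributes to the inputs of the prediction model.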

The results of this analysis are summarized in Fig 3 and S2 Table. Overall, the three methods performed similarly. This result is similar to what we found in the “Shared Effects in Subgroups” simulation scenario, which was designed to be reflective of the complex genetic architecture and effect sharing patterns of actual complex traits. However, mr.mash-rss was the most accurate for 14 out of 16 blood cell phenotypes. The relative change in mean prediction accuracy compared to LDpred2-auto ranged from -0.6% (Eosinophil Percentage) to 32.8% (Basophil Percentage), with an average of 5.4% (Table 1). The relative change in mean prediction accuracy compared to SBayesR ranged from -1.9% (Eosinophil Percentage) to 13.9% (Basophil Percentage), with an average of 2.7% (Table 1). The better performance of SBayesR compared to LDpred2-auto may be due to a more flexible prior that can better approximate the actual distribution of the genetic effects.

Fig 3. Prediction accuracy for the 16 blood cell traits in the full UK Biobank data.

The thick, black line in each box gives the median $R^2$.

Table 1. Percentage change in mean $R^2$ of mr.mash-rss relative to LDpred2-auto and SBayesR for the 16 blood cell traits in the full and sampled UK Biobank data.

| Phenotype | Full data: LDpred2-auto | Full data: SBayesR | Sampled data: LDpred2-auto | Sampled data: SBayesR |
|---|---|---|---|---|
| Red Blood Cell Counts (RBC#) | 2.6 | 1.1 | 18.1 | 6.1 |
| Haemoglobin Concentration (HGB) | 0.1 | -1.6 | 13.5 | -0.7 |
| Mean Corpuscular Volume (MCV) | 2.3 | 1.3 | 7.7 | 1.3 |
| Red Blood Cell Volume Distribution Width (RDW) | 4.8 | 3.3 | 19.6 | 7.5 |
| Mean Sphered Cell Volume (MSCV) | 6.1 | 4.4 | 17.2 | 7.6 |
| Reticulocyte Percentage (RET%) | 6.3 | 3.5 | 20.1 | 8.3 |
| High Light Scatter Reticulocytes Percentage (HLR%) | 4.0 | 1.4 | 16.3 | 4.7 |
| Platelet Count (PLT#) | 1.8 | 0.9 | 11.3 | 4.3 |
| Plateletcrit (PCT) | 1.6 | 0.6 | 10.9 | 3.0 |
| Platelet Distribution Width (PDW) | 1.7 | 0.7 | 9.0 | 1.7 |
| White Blood Cell Count (WBC#) | 3.2 | 1.8 | 18.3 | 4.3 |
| Lymphocyte Percentage (LYMPH%) | 6.9 | 5.1 | 32.9 | 11.5 |
| Monocyte Percentage (MONO%) | 3.2 | 1.3 | 7.2 | -1.4 |
| Neutrophil Percentage (NEUT%) | 9.7 | 7.1 | 36.0 | 13.8 |
| Eosinophil Percentage (EO%) | -0.6 | -1.9 | 12.4 | -2.4 |
| Basophil Percentage (BASO%) | 32.8 | 13.9 | 122.3 | 13.5 |

Negative values mark the instances where mr.mash-rss produces lower accuracy than the competing method.

In accordance with the simulations, the improvement in accuracy tended to be largest for phenotypes with lower genomic heritability (though this relationship is only suggestive) as shown in Fig 4. With lower signal-to-noise ratio, leveraging the sharing of effects in a multivariate analysis can give greater improvements. This can be seen, for example, for Neutrophil Percentage ($h_g^2 = 0.16$; S3 Table), which has been shown to share putative causal variants with Lymphocyte Percentage (Fig. 3C in [24]) and is one of the phenotypes showing a greater improvement from using mr.mash-rss. On the other hand, the three platelet phenotypes have higher genomic heritability ($h_g^2 = 0.24$–$0.31$; S3 Table) and despite some sharing of causal variants (Fig. 3C in [24]), the improvements in accuracy from using mr.mash-rss are very small.

Fig 4. Relationship between improvement in prediction accuracy and genomic heritability in the full UK Biobank data.

Phenotypes are plotted along the x-axis by their genomic heritability ($h_g^2$) and along the y-axis by the change in $R^2$ relative to LDpred2-auto (Panel A) and SBayesR (Panel B); that is, $(R^2_{\text{mr.mash-rss}} - R^2_{\text{other method}}) / R^2_{\text{other method}}$. The blue line represents the linear regression fit with 95% confidence bands.

Previous analyses have shown that phenotypes with smaller sample size gain more advantage from multivariate modeling [20, 41]. We hypothesized that more substantial improvements in prediction accuracy from using mr.mash-rss could be obtained with a smaller sample size. Thus, we repeated the same analysis on 75,000 individuals, randomly sampled from the total of 244,049. The results, summarized in Fig 5 and S2 Table, showed that this is indeed the case.

Fig 5. Prediction accuracy for the 16 blood cell traits in the sampled UK Biobank data.

The thick, black line in each box gives the median $R^2$.

In fact, the relative change in mean prediction accuracy compared to LDpred2-auto ranged from 7.2% (Monocyte Percentage) to 122.3% (Basophil Percentage), with an average of 23.3% (Table 1). This is about 4 times larger than the average relative change in mean prediction accuracy using the full data. The relative change in mean prediction accuracy compared to SBayesR ranged from -2.4% (Eosinophil Percentage) to 13.8% (Neutrophil Percentage), with an average of 5.2% (Table 1). This is about 2 times larger than the average relative change in mean prediction accuracy using the full data.
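The averages quoted in the text can be reproduced directly from the per-phenotype values in Table 1. The short check below copies the four columns of that table (in row order) and computes their means and the ratios between the sampled-data and full-data averages; the variable names are chosen here for illustration.

```python
# Percentage changes copied from Table 1, in the table's row order.
table1 = {
    "full_ldpred2": [2.6, 0.1, 2.3, 4.8, 6.1, 6.3, 4.0, 1.8,
                     1.6, 1.7, 3.2, 6.9, 3.2, 9.7, -0.6, 32.8],
    "full_sbayesr": [1.1, -1.6, 1.3, 3.3, 4.4, 3.5, 1.4, 0.9,
                     0.6, 0.7, 1.8, 5.1, 1.3, 7.1, -1.9, 13.9],
    "sampled_ldpred2": [18.1, 13.5, 7.7, 19.6, 17.2, 20.1, 16.3, 11.3,
                        10.9, 9.0, 18.3, 32.9, 7.2, 36.0, 12.4, 122.3],
    "sampled_sbayesr": [6.1, -0.7, 1.3, 7.5, 7.6, 8.3, 4.7, 4.3,
                        3.0, 1.7, 4.3, 11.5, -1.4, 13.8, -2.4, 13.5],
}

# Column means, rounded to one decimal as in the text.
means = {name: round(sum(vals) / len(vals), 1) for name, vals in table1.items()}

# How much larger the average improvement is in the sampled data.
ratio_ldpred2 = means["sampled_ldpred2"] / means["full_ldpred2"]
ratio_sbayesr = means["sampled_sbayesr"] / means["full_sbayesr"]
```

The means come out at 5.4, 2.7, 23.3 and 5.2, matching the averages reported in the text, and the two ratios are roughly 4.3 and 1.9, in line with the "about 4 times" and "about 2 times" statements.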

Case study: Predicting more polygenic traits in the UK Biobank

We then sought to predict eight more polygenic phenotypes. Based on [42], we chose a group of phenotypes that have high pairwise genetic correlations; namely, body mass index (BMI), trunk fat mass (TFM), body fat percentage (BFP), weight, waist circumference, and hip circumference. We also chose two additional phenotypes, namely diastolic blood pressure (DP) and systolic blood pressure (SP), that are highly genetically correlated with each other and are moderately genetically correlated with the phenotypes in the first group. This was meant as a stress test of our method, given that previous studies have shown that this type of variational empirical Bayes method is usually less competitive with dense signals [12, 43].

The results of this analysis are summarized in S3 Fig and show that, as expected, mr.mash-rss was outperformed by the other methods for all the phenotypes in this case study. Previous studies have found the source of the underperformance to be the M-step update for the prior mixture weights [12]. Here, anecdotally, we observed the same phenomenon, whereby the weights on the null component and on the mixture components with very small variance tended to be overestimated, resulting in overshrinkage of the effects. Zabad et al. [12] solved the issue by using a grid search and cross-validation approach to select the combination of mixture weights that maximizes prediction accuracy in a test set. However, this strategy is not feasible for mr.mash-rss, in which the number of mixture components often exceeds 100.

Thus, we used a different strategy to try to improve the performance of mr.mash-rss. In particular, we ran mash, with the same mixture prior as mr.mash-rss, on the summary statistics for a subset of semi-independent LD-pruned genetic variants for all chromosomes. We extracted the estimated mixture weights, set the weight on the null component to 0.5, and rescaled the other mixture weights accordingly (this step was necessary because mash underestimated the null weight, as it was run on a small sample of genetic variants). These mixture weights were fed to mr.mash-rss, which was then constrained to update the mixture weights for only the first 10 iterations. In this way, we maintained the adaptive nature of the empirical Bayes approach without incurring overshrinkage of the effects.
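The weight adjustment described in this paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in the analysis: the function name is hypothetical, the null component is assumed to sit at index 0, and "rescaled accordingly" is interpreted as a proportional rescaling of the non-null weights so that all weights again sum to 1.

```python
def fix_null_weight(weights, null_index=0, null_weight=0.5):
    """Set the null component's weight and proportionally rescale the others.

    Assumption: the non-null weights keep their relative proportions and
    are scaled so that they sum to 1 - null_weight.
    """
    if not 0.0 <= null_weight < 1.0:
        raise ValueError("null_weight must be in [0, 1)")
    rest_total = sum(w for i, w in enumerate(weights) if i != null_index)
    if rest_total <= 0:
        raise ValueError("non-null weights must have positive total mass")
    scale = (1.0 - null_weight) / rest_total
    return [
        null_weight if i == null_index else w * scale
        for i, w in enumerate(weights)
    ]


# Hypothetical mash-style weights: null component first, then three others.
estimated_weights = [0.2, 0.4, 0.3, 0.1]
initial_weights = fix_null_weight(estimated_weights)
```

With this interpretation, the relative ordering of the non-null components learned by mash is preserved, and only the overall balance between null and non-null mass changes.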

The results show that, using this strategy (termed “mr.mash-rss mash” in S3 Fig), the performance of mr.mash-rss improves for every phenotype, becoming comparable to that of the other methods for DP, SP, hip, and weight, and superior for BMI.

Discussion

In this work, we have introduced mr.mash-rss, the summary data version of a recently developed empirical Bayes multivariate multiple regression method [20]. Like mr.mash, mr.mash-rss enjoys (1) the ability to learn patterns of effect sharing across phenotypes; (2) the ability to model dozens of phenotypes jointly; and (3) computational efficiency. Additionally, mr.mash-rss addresses two important limitations of mr.mash: the need for individual-level data and the lack of scalability to biobank-size data.

Through an array of simulations and real data analysis using the UK Biobank, we showed that mr.mash-rss is competitive with state-of-the-art univariate and multivariate PGS methods. Of note, mr.mash-rss outperformed competing methods in 14 out of 16 blood cell phenotypes, although the magnitude of the improvement varied across phenotypes, from modest to substantial. This highlights that the general mr.mash model can adapt to either more sparse (e.g., for gene expression [20]) or more dense (e.g., for complex traits) genetic architectures. We also showed that the improvement in prediction accuracy from using mr.mash-rss increased substantially with a smaller sample size. This holds good promise for improving prediction accuracy for phenotypes that are difficult to measure and in samples of individuals of non-European descent, which are usually much smaller [44]. In addition, the performance of the mr.mash model depends on the accuracy of the “data-driven” covariance matrices [20]. Thus, advances in covariance matrix estimation can potentially lead to improvements in prediction accuracy.

A limitation of mr.mash-rss is that it does not perform as well for very polygenic phenotypes, as highlighted by the second data application. This is a known issue for this type of variational empirical Bayes method [12, 43]. The problem stems from the update of the hyperparameters (the mixture weights, in particular) by maximization of the ELBO, which gets trapped in sub-optimal local optima [12, 45]. This issue can be overcome by using a grid search and cross-validation approach to select the combination of mixture weights that maximizes prediction accuracy in a test set [11, 12]. However, mr.mash-rss usually includes a large number of mixture components, which makes the grid search and cross-validation approach infeasible. We were able to ameliorate this problem by obtaining good initial estimates of the mixture weights with mash, which were then refined with a few iterations of mr.mash-rss. This simple strategy improved the prediction accuracy of mr.mash-rss, making it competitive for most traits analyzed. Future research is needed to find a more principled way to select hyperparameters that works well with arbitrary patterns of sparsity in the genetic architecture of complex traits.
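One way to see why a grid search over mixture weights does not scale is to count candidate weight vectors. The Python sketch below assumes a simple grid in which each weight is a multiple of 1/m and the weights sum to 1; the number of such vectors for K components is the stars-and-bars count C(m + K - 1, K - 1). This grid construction is an assumption for illustration and is not necessarily the grid used in [11, 12].

```python
from math import comb


def n_simplex_grid_points(n_components, n_steps):
    """Count weight vectors whose entries are multiples of 1/n_steps and sum to 1.

    This is the stars-and-bars count C(n_steps + K - 1, K - 1) for K components.
    """
    if n_components < 1 or n_steps < 1:
        raise ValueError("n_components and n_steps must be positive integers")
    return comb(n_steps + n_components - 1, n_components - 1)


# Compare a small mixture with one of the size mentioned in the text (about 100).
for k in (4, 10, 100):
    print(k, n_simplex_grid_points(k, n_steps=10))
```

Each candidate would require a model fit and an evaluation on held-out data, so the count grows out of reach quickly as the number of components increases, even on a coarse grid.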

Another limitation of mr.mash-rss is that it requires the summary statistics to be computed on the same samples for each phenotype. In other words, there should not be missing data in Y in Eq (1). Dealing with arbitrary patterns of missing data in multivariate models is not a trivial problem [46] and is an area where more research is needed. If individual-level data are available, missing values may be imputed before the prediction analysis. In fact, recent work has shown that imputing missing values results in improved prediction accuracy of PGS and power in GWAS [47, 48]. Nonetheless, our results showed that mr.mash-rss is robust to a small to medium amount of missing phenotypes. In addition, in specific cases such as complete sample non-overlap across phenotypes, some simplifications arise that allow models like mr.mash-rss to be fitted efficiently [38].

Our work showed that mr.mash-rss is fairly robust to some forms of model misspecification (i.e., external LD and sample non-overlap). However, model misspecification also arises with the use of “imperfect” summary statistics. For example, when summary statistics come from a meta-analysis of multiple cohorts, sample size is often different among genetic variants, and different biases and noise levels likely affect different cohorts [40]. One way to test the robustness of mr.mash-rss to different sources of model misspecification would be to use truly external summary statistics, possibly from a meta-analysis, and evaluate its performance in an independent cohort.

This work evaluated mr.mash-rss using continuous phenotypes. While the theory behind the method assumes the phenotypes to be continuous, it may be possible to apply mr.mash-rss to case-control phenotypes, in the same way as methods such as LDpred2-auto and SBayesR, which also assume continuous phenotypes. An in-depth investigation of the performance of mr.mash-rss for case-control phenotypes is left for future work.

Supporting information

S1 Table. Summary statistics for runtime (in seconds) on chromosome 10 for the “Shared Effects in Subgroups” scenario.

For LDpred2-auto, the statistics are based on the sum of runtime across phenotypes. Each method was run using 4 CPUs.

(PDF)

pgen.1011519.s001.pdf (52.7KB, pdf)

S2 Table. Mean prediction $R^2$ across test sets for the 16 blood cell traits in the full and sampled UK Biobank data.

(PDF)

pgen.1011519.s002.pdf (68.2KB, pdf)

S3 Table. Mean $h_g^2$ across training sets for the 16 blood cell traits in the full UK Biobank data.

(PDF)

pgen.1011519.s003.pdf (66.8KB, pdf)

S1 Fig. Prediction accuracy in simulations with external LD matrix.

The figure summarizes the accuracy of the test set predictions in 20 simulations of the Equal Effects scenario. The thick, black line in each box gives the median $R^2$. The dotted line gives the maximum accuracy achievable, i.e., the simulated $h_g^2$.

(EPS)

pgen.1011519.s004.eps (23.3KB, eps)

S2 Fig. Prediction accuracy in simulations with missing phenotypes.

Each panel summarizes the accuracy of the test set predictions in 20 simulations of the Equal Effects scenario. Panel A (B) includes the results of a scenario where 20% (80%) of the individuals have missing values in any combination of the 5 phenotypes. The thick, black line in each box gives the median $R^2$. The dotted line gives the maximum accuracy achievable, i.e., the simulated $h_g^2$.

(EPS)

pgen.1011519.s005.eps (39.7KB, eps)

S3 Fig. Prediction accuracy for 8 more polygenic traits in the full UK Biobank data.

The thick, black line in each box gives the median $R^2$.

(EPS)

pgen.1011519.s006.eps (19.2KB, eps)

S1 Text. Detailed methods.

Detailed description of the methods, including: derivations of the mr.mash-rss algorithms; data preparation; simulations; methods compared; data analysis.

(PDF)

pgen.1011519.s007.pdf (352.7KB, pdf)

Acknowledgments

This research was conducted using the UK Biobank Resource under application number 129216. We thank Gao Wang, Yuxin Zou, Peter Carbonetto, and Matthew Stephens for useful discussions.

Data Availability

The genotype and phenotype data used in our analyses are available from UK Biobank (https://www.ukbiobank.ac.uk/). All code implementing the simulations and data analyses, and the compiled results generated from our simulations have been deposited on Zenodo (https://doi.org/10.5281/zenodo.14262333). The methods are implemented in the R package mr.mash.alpha, available for download at https://github.com/stephenslab/mr.mash.alpha.

Funding Statement

Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM146868 to FM. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. PS acknowledges support from Open Discovery Innovation Network (ODIN) under grant number NNF20SA0061466. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Hickey JM, Chiurugwi T, Mackay I, Powell W. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery. Nature Genetics. 2017;49(9):1297–1303. doi: 10.1038/ng.3920
  • 2. Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Medicine. 2020;12(1):1–11. doi: 10.1186/s13073-020-00742-5
  • 3. Wainberg M, Sinnott-Armstrong N, Mancuso N, Barbeira AN, Knowles DA, Golan D, et al. Opportunities and challenges for transcriptome-wide association studies. Nature Genetics. 2019;51(4):592–599. doi: 10.1038/s41588-019-0385-z
  • 4. Walsh B, Lynch M. Evolution and selection of quantitative traits. Oxford University Press; 2018.
  • 5. Cao C, Ding B, Li Q, Kwok D, Wu J, Long Q. Power analysis of transcriptome-wide association study: Implications for practical protocol choice. PLoS Genetics. 2021;17(2):e1009405. doi: 10.1371/journal.pgen.1009405
  • 6. Meuwissen TH, Hayes BJ, Goddard M. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–1829. doi: 10.1093/genetics/157.4.1819
  • 7. de Los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MP. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics. 2013;193(2):327–345. doi: 10.1534/genetics.112.143313
  • 8. Ge T, Chen CY, Ni Y, Feng YCA, Smoller JW. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nature Communications. 2019;10(1):1776. doi: 10.1038/s41467-019-09718-5
  • 9. Lloyd-Jones LR, Zeng J, Sidorenko J, Yengo L, Moser G, Kemper KE, et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nature Communications. 2019;10(1):5086. doi: 10.1038/s41467-019-12653-0
  • 10. Privé F, Arbel J, Vilhjálmsson BJ. LDpred2: better, faster, stronger. Bioinformatics. 2020;36(22-23):5424–5431.
  • 11. Zhang Q, Privé F, Vilhjálmsson B, Speed D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nature Communications. 2021;12(1):4192. doi: 10.1038/s41467-021-24485-y
  • 12. Zabad S, Gravel S, Li Y. Fast and accurate Bayesian polygenic risk modeling with variational inference. The American Journal of Human Genetics. 2023;110(5):741–761. doi: 10.1016/j.ajhg.2023.03.009
  • 13. Ma Y, Zhou X. Genetic prediction of complex traits with polygenic scores: a statistical review. Trends in Genetics. 2021;37(11):995–1011. doi: 10.1016/j.tig.2021.06.004
  • 14. Jia Y, Jannink JL. Multiple-trait genomic selection methods increase genetic value prediction accuracy. Genetics. 2012;192(4):1513–1522. doi: 10.1534/genetics.112.144246
  • 15. Grinberg NF, Wallace C. Multi-tissue transcriptome-wide association studies. Genetic Epidemiology. 2021;45(3):324–337. doi: 10.1002/gepi.22374
  • 16. Rohde PD, Nyegaard M, Kjolby M, Sørensen P. Multi-trait genomic risk stratification for type 2 diabetes. Frontiers in Medicine. 2021;8:711208. doi: 10.3389/fmed.2021.711208
  • 17. Xu C, Ganesh SK, Zhou X. mtPGS: Leverage multiple correlated traits for accurate polygenic score construction. The American Journal of Human Genetics. 2023;110(10):1673–1689. doi: 10.1016/j.ajhg.2023.08.016
  • 18. Albiñana C, Zhu Z, Schork AJ, Ingason A, Aschard H, Brikell I, et al. Multi-PGS enhances polygenic prediction by combining 937 polygenic scores. Nature Communications. 2023;14(1):4702. doi: 10.1038/s41467-023-40330-w
  • 19. Truong B, Hull LE, Ruan Y, Huang QQ, Hornsby W, Martin H, et al. Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases. Cell Genomics. 2024;4(4). doi: 10.1016/j.xgen.2024.100523
  • 20. Morgante F, Carbonetto P, Wang G, Zou Y, Sarkar A, Stephens M. A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes. PLoS Genetics. 2023;19(7):e1010539. doi: 10.1371/journal.pgen.1010539
  • 21. Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nature Reviews Genetics. 2017;18(2):117–127. doi: 10.1038/nrg.2016.142
  • 22. Urbut SM, Wang G, Carbonetto P, Stephens M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nature Genetics. 2019;51(1):187–195. doi: 10.1038/s41588-018-0268-8
  • 23. Vuckovic D, Bao EL, Akbari P, Lareau CA, Mousas A, Jiang T, et al. The polygenic and monogenic basis of blood traits and diseases. Cell. 2020;182(5):1214–1231. doi: 10.1016/j.cell.2020.08.008
  • 24. Zou Y, Carbonetto P, Xie D, Wang G, Stephens M. Fast and flexible joint fine-mapping of multiple traits via the Sum of Single Effects model. bioRxiv. 2023; p. 2023–04.
  • 25. Blei DM, Kucukelbir A, McAuliffe JD. Variational inference: A review for statisticians. Journal of the American Statistical Association. 2017;112(518):859–877. doi: 10.1080/01621459.2017.1285773
  • 26. Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. Journal of Machine Learning Research. 2003;3(Jan):993–1022.
  • 27. Zou Y, Carbonetto P, Wang G, Stephens M. Fine-mapping from summary data with the “Sum of Single Effects” model. PLoS Genetics. 2022;18(7):e1010299. doi: 10.1371/journal.pgen.1010299
  • 28. Chen W, Wu Y, Zheng Z, Qi T, Visscher PM, Zhu Z, et al. Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors. Nature Communications. 2021;12(1):7117. doi: 10.1038/s41467-021-27438-7
  • 29. R Core Team. R: A Language and Environment for Statistical Computing; 2023. Available from: https://www.R-project.org/.
  • 30. Wray NR, Yang J, Hayes BJ, Price AL, Goddard ME, Visscher PM. Pitfalls of predicting complex traits from SNPs. Nature Reviews Genetics. 2013;14(7):507–515. doi: 10.1038/nrg3457
  • 31. Cheng H, Kizilkaya K, Zeng J, Garrick D, Fernando R. Genomic prediction from multiple-trait Bayesian regression methods using mixture priors. Genetics. 2018;209(1):89–103. doi: 10.1534/genetics.118.300650
  • 32. Rohde PD, Fourie Sørensen I, Sørensen P. Expanded utility of the R package, qgg, with applications within genomic medicine. Bioinformatics. 2023;39(11):btad656. doi: 10.1093/bioinformatics/btad656
  • 33. Turley P, Walters RK, Maghzian O, Okbay A, Lee JJ, Fontana MA, et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nature Genetics. 2018;50(2):229–237. doi: 10.1038/s41588-017-0009-4
  • 34. Maier RM, Zhu Z, Lee SH, Trzaskowski M, Ruderfer DM, Stahl EA, et al. Improving genetic prediction by leveraging genetic correlations among human diseases and traits. Nature Communications. 2018;9(1):989. doi: 10.1038/s41467-017-02769-6
  • 35. Benner C, Havulinna AS, Järvelin MR, Salomaa V, Ripatti S, Pirinen M. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. The American Journal of Human Genetics. 2017;101(4):539–551. doi: 10.1016/j.ajhg.2017.08.012
  • 36. Dun Y, Chatterjee N, Jin J, Nishimura A. A Robust Bayesian Method for Building Polygenic Risk Scores using Projected Summary Statistics and Bridge Prior. arXiv preprint arXiv:2401.15014. 2024.
  • 37. Consortium GP, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68. doi: 10.1038/nature15393
  • 38. Spence JP, Sinnott-Armstrong N, Assimes TL, Pritchard JK. A flexible modeling and inference framework for estimating variant effect sizes from GWAS summary statistics. bioRxiv. 2022; p. 2022–04.
  • 39. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–209. doi: 10.1038/s41586-018-0579-z
  • 40. Privé F, Arbel J, Aschard H, Vilhjálmsson BJ. Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores. Human Genetics and Genomics Advances. 2022;3(4). doi: 10.1016/j.xhgg.2022.100136
  • 41. Hu Y, Li M, Lu Q, Weng H, Wang J, Zekavat SM, et al. A statistical framework for cross-tissue transcriptome-wide association analysis. Nature Genetics. 2019;51(3):568–576. doi: 10.1038/s41588-019-0345-7
  • 42. Wu Y, Burch KS, Ganna A, Pajukanta P, Pasaniuc B, Sankararaman S. Fast estimation of genetic correlation for biobank-scale data. The American Journal of Human Genetics. 2022;109(1):24–32. doi: 10.1016/j.ajhg.2021.11.015
  • 43. Kim Y, Wang W, Carbonetto P, Stephens M. A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression. Journal of Machine Learning Research. 2024;25(185):1–59.
  • 44. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics. 2019;51(4):584–591. doi: 10.1038/s41588-019-0379-x
  • 45. Ročková V, George EI. EMVS: The EM approach to Bayesian variable selection. Journal of the American Statistical Association. 2014;109(506):828–846. doi: 10.1080/01621459.2013.869223
  • 46. Little RJ, Rubin DB. Statistical analysis with missing data. vol. 793. John Wiley & Sons; 2019.
  • 47. Dahl A, Thompson M, An U, Krebs M, Appadurai V, Border R, et al. Phenotype integration improves power and preserves specificity in biobank-based genetic studies of major depressive disorder. Nature Genetics. 2023;55(12):2082–2093. doi: 10.1038/s41588-023-01559-9
  • 48. An U, Pazokitoroudi A, Alvarez M, Huang L, Bacanu S, Schork AJ, et al. Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries. Nature Genetics. 2023;55(12):2269–2276. doi: 10.1038/s41588-023-01558-w

Decision Letter 0

Michael P Epstein, Anke Huels

1 Jul 2024

Dear Dr Morgante,

Thank you very much for submitting your Methods entitled 'Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes.' to PLOS Genetics. Also, thank you for your patience with the review process; we apologize for the extensive delay in a decision. 

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. While the manuscript is well written and presents valuable improvements over existing methods like mr.mash and mashR, it cannot be considered a significant advancement in its current form. A more comprehensive comparison with existing methods and an extended real data application are needed to ensure the robustness and applicability of the method in real scenarios. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Anke Huels

Guest Editor

PLOS Genetics

Michael Epstein

Section Editor

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Thank you for the opportunity to review your manuscript titled "Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes." Your work on extending the mr.mash model to utilize summary statistics (mr.mash-rss) is a valuable contribution to the field of polygenic prediction. I have a few suggestions and comments that I hope you find helpful in refining your manuscript.

1. Similarity to Existing Methods:

* Your proposed method, mr.mash-rss, shows significant similarities to existing approaches, particularly mr.mash and mashR. While extending mr.mash to use summary statistics is a valuable adaptation, it would be beneficial to clearly articulate how your work distinguishes itself from these established methods. Highlighting the unique contributions and specific advancements of mr.mash-rss will help emphasize its novelty and impact.

2. Use of Variational Inference and Empirical Bayes:

* The incorporation of variational inference and empirical Bayes in mr.mash-rss is a noteworthy addition. However, the manuscript could benefit from a more detailed explanation of these techniques and their specific advantages in your context. Clarifying how variational inference improves computational efficiency and how empirical Bayes enhances adaptability will provide a clearer understanding of the strengths of your approach.

3. Technical Details and Assumptions:

* The manuscript mentions that while $X^T Y$ can be recovered exactly, $X^T X$ is approximated using LD reference panels. It would be helpful to discuss the potential biases introduced by this approximation, especially when the reference panel does not perfectly match the study population. Additionally, addressing the assumption that summary statistics are computed using the same individuals for each response will clarify the applicability of your method in real-world scenarios where missing data are common.

4. Comprehensive Comparisons:

* Including comparisons with a wider array of existing methods, such as different Bayesian approaches and multi-trait models, will provide a clearer context for the improvements claimed by mr.mash-rss. Evaluating your method across various datasets with different characteristics (e.g., sample sizes, ethnic backgrounds, trait architectures) will strengthen the evidence for its general applicability and robustness.

5. Clarifying Contributions and Novelty:

* Clearly articulating the unique contributions of mr.mash-rss and how they advance the field will enhance the manuscript. Detailed explanations of the innovations in model design and the practical impact of these methodological improvements will help readers appreciate the significance of your work.

6. Consideration of Simpler Approaches:

* It might be valuable to explore whether a simpler approach using mashR could be effective. For instance, one could analytically compute summary statistics and then apply shrinkage using the LD matrix. This could offer a more straightforward and computationally efficient solution while still leveraging the benefits of LD information.

7. Constructive Suggestions:

* Provide more detailed insights into the implementation and advantages of variational inference in mr.mash-rss.

* Discuss the empirical Bayes techniques used and compare them with other adaptive methods.

* Acknowledge and discuss the limitations introduced by the approximation of $X^T X$ and the assumptions made, particularly in diverse population contexts.

* Include a wider range of benchmarks to contextualize the improvements and demonstrate the practical significance of mr.mash-rss.

I hope these suggestions are helpful as you continue to refine your manuscript. Your work contributes valuable insights to the field of polygenic prediction, and I look forward to seeing how it progresses.

Reviewer #2: The authors have demonstrated that mr.mash-rss is a novel method to learn patterns of shared effects across multiple phenotypes using GWAS summary statistics. The manuscript is well written and clearly shows the purpose of the paper. While there is a major advancement in this work, there are several questions I would like to ask:

1. In simulations, I wonder if the authors could simulate using the same HapMap3 variants, as SBayesR and LDpred2 were designed with this set of SNPs.

2. On lines 271-273, the authors mention the application to individuals of non-European descent. I would think this transferability is also due to the assumption of shared causal SNPs between ancestral groups.

3. I wonder how sensitive the method is to GWASes with fewer overlapping samples and out-of-sample LD matrices (e.g., LD from 1000 Genomes).

4. I wonder if the authors can evaluate the method in an independent cohort to demonstrate the power of the method.

Reviewer #3: Kunkel et al. is a very nice paper that I really enjoyed reading. The method is concisely and clearly described, and the work is well organized. The paper extends the mr.mash method so that it can be fitted using only GWAS summary statistics, which, if it works, makes it much more useful in practice (as the authors note). Overall it’s a very nice study; however, I believe it has some potential limitations that could be addressed. Below I provide some suggestions aimed at improving the work, where the main concern is the robustness of the method in real scenarios and its applicability. Also, I think the benchmarks could be extended to consider other potentially more effective multivariate methods.

Comments:

1. It seems to me that the method assumes that the covariance of genetic effects across outcomes (S_0) is known. Do you estimate this using LD score regression? It also seems that the method assumes Y’Y is known. This may not be directly estimable in practice, but it could be obtained using LD score regression. LD score regression does, however, make different assumptions. I would be interested in seeing simulations and real data analyses that examine the performance when the GWAS summary statistics are obtained from different cohorts, using partially overlapping and non-overlapping samples. Such analyses would, in my mind, represent the most useful application scenarios for the method.

2. SBayesR, LDpred1/2, and other iterative methods do not always converge in practice when the summary statistics are somehow imperfect. This is not only an LD reference panel issue, which is obviously important, as you note. E.g., in meta-analyses, sample sizes often vary between SNPs, as not all SNPs may be called or imputed accurately across all cohorts. Also, different cohorts in meta-analyses may be subject to different biases and noise levels. Some of these biases could be examined in simulations, e.g., by varying sample sizes and LD reference quality. However, the best test is using real external GWAS summary statistics. You can examine public single-cohort summary statistics, such as those from FinnGen or deCODE genetics, and predict into UKB.

3. The paper doesn’t examine applications to case-control outcomes. The methods that it compares against are generally all applicable to such outcomes (although the math involved generally assumes that they are quantitative). I see no clear reasons why mr.mash couldn’t be applied to such outcomes, and please explain if that’s not the case. Many disease outcomes are highly correlated, e.g. psychiatric disorders, cardiometabolic disorders, etc.

3.b. The real data considered all have relatively simple genetic architectures. I would also consider outcomes such as height, which could also be correlated to other anthropometric outcomes in the UKB. Or if possible, disease outcomes.

4. There are a couple of multivariate methods that I would recommend also considering. First, there’s SBLUP (Maier et al., Nat Comm 2018). I also recommend comparing with some of the following: MTAG (Turley et al., Nat Genet 2018), GSEM (Grotzinger et al., Nat Hum Behav 2019), or even multi-PGS (Albiñana et al., Nat Comm 2023). I do, however, concede that using some of these might be more complicated in practice, as these are two-step approaches in that you first have to generate multi-trait summary statistics and then fit a single-outcome PGS. However, I suspect that this could be more powerful in some settings, more generalizable, and more robust. (I don’t expect you to compare against all of these, but I recommend some.)

5. There is no runtime comparison. Presenting one would be highly useful for people deciding whether to use the software.

6. It wasn’t clear to me exactly how the causal effects were sampled in the simulations. Did you sample their effects using equation 2? If so, what weights and what value of K did you use? Perhaps trying a couple of combinations of these values would be of interest, to see how the performance varies with them.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Bjarni Vilhjalmsson

Decision Letter 1

Michael P Epstein, Anke Huels

19 Oct 2024

Dear Dr Morgante,

Thank you very much for submitting your Methods entitled 'Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes.' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers were generally satisfied with the revision but had a few minor comments that we ask you to address in a revised manuscript.

We therefore ask you to modify the manuscript according to the review recommendations. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, log into your Editorial Manager account and select the option 'Revise Submission' in the 'Submissions Needing Revision' folder.

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Anke Huels

Guest Editor

PLOS Genetics

Michael Epstein

Section Editor

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: All comments have been addressed.

Reviewer #2: For integrative PRS, I would cite Albinana et al. Nat Comms, 2024 (https://www.nature.com/articles/s41467-023-40330-w) and Truong et al. Cell Genomics, 2024 (https://www.cell.com/cell-genomics/fulltext/S2666-979X(24)00065-X) because they recently proposed complementary methods to perform PRS combination.

The authors nicely addressed all of my comments!

Reviewer #3: Thank you for addressing many of my concerns and the comments made by other reviewers. I believe the manuscript has improved substantially, although I still have a couple of perhaps minor comments.

On your response to comment 2, I still believe this would be the most fair comparison for a summary statistics based method, but perhaps you can highlight this in the discussion.

On your response to comment 3, I don’t understand your reasoning for not applying mr.mash-rss to binary outcomes, especially because you also note that the methods that you compare against do. I suspect that these outcomes would be more polygenic than molecular measurements, and would therefore look like the ones in Supplementary Figure 3. Interestingly, the results in Supplementary Figure 3 suggest that mr.mash-rss underperforms on highly polygenic traits like BMI. It’s ok if mr.mash-rss doesn’t outperform other methods in every way, and I think exploring a few common case-control outcomes would be of interest.

On comment 4, why didn’t you compare against MTAG+SBayesR instead of MTAG+LDpred2, as SBayesR seemed to outperform LDpred2?

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Decision Letter 2

Michael P Epstein, Anke Huels

27 Nov 2024

Dear Dr Morgante,

We are pleased to inform you that your manuscript entitled "Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes." has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Anke Huels

Guest Editor

PLOS Genetics

Michael Epstein

Section Editor

PLOS Genetics

Aimée Dudley

Editor-in-Chief

PLOS Genetics

Anne Goriely

Editor-in-Chief

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-24-00499R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Michael P Epstein, Anke Huels

27 Dec 2024

PGENETICS-D-24-00499R2

Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes

Dear Dr Morgante,

We are pleased to inform you that your manuscript entitled "Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Livia Horvath

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Summary statistics for runtime (in seconds) on chromosome 10 for the “Shared Effects in Subgroups” scenario.

    For LDpred2-auto, the statistics are based on the sum of runtime across phenotypes. Each method was run using 4 CPUs.

    (PDF)

    pgen.1011519.s001.pdf (52.7KB, pdf)
    S2 Table. Mean prediction $R^2$ across test sets for the 16 blood cell traits in the full and sampled UK Biobank data.

    (PDF)

    pgen.1011519.s002.pdf (68.2KB, pdf)
    S3 Table. Mean $h_g^2$ across training sets for the 16 blood cell traits in the full UK Biobank data.

    (PDF)

    pgen.1011519.s003.pdf (66.8KB, pdf)
    S1 Fig. Prediction accuracy in simulations with external LD matrix.

    The figure summarizes the accuracy of the test set predictions in 20 simulations of the Equal Effects scenario. The thick, black line in each box gives the median $R^2$. The dotted line gives the maximum accuracy achievable, i.e., the simulated $h_g^2$.

    (EPS)

    pgen.1011519.s004.eps (23.3KB, eps)
    S2 Fig. Prediction accuracy in simulations with missing phenotypes.

    Each panel summarizes the accuracy of the test set predictions in 20 simulations of the Equal Effects scenario. Panel A (B) includes the results of a scenario where 20% (80%) of the individuals have missing values in any combination of the 5 phenotypes. The thick, black line in each box gives the median $R^2$. The dotted line gives the maximum accuracy achievable, i.e., the simulated $h_g^2$.

    (EPS)

    pgen.1011519.s005.eps (39.7KB, eps)
    S3 Fig. Prediction accuracy for 8 more polygenic traits in the full UK Biobank data.

    The thick, black line in each box gives the median $R^2$.

    (EPS)

    pgen.1011519.s006.eps (19.2KB, eps)
    S1 Text. Detailed methods.

    Detailed description of the methods, including: derivations of the mr.mash-rss algorithms; data preparation; simulations; methods compared; data analysis.

    (PDF)

    pgen.1011519.s007.pdf (352.7KB, pdf)
    Attachment

    Submitted filename: response_to_reviewers.pdf

    pgen.1011519.s008.pdf (103.5KB, pdf)
    Attachment

    Submitted filename: response_to_reviewers.pdf

    pgen.1011519.s009.pdf (59.5KB, pdf)

    Data Availability Statement

    The genotype and phenotype data used in our analyses are available from UK Biobank (https://www.ukbiobank.ac.uk/). All code implementing the simulations and data analyses, and the compiled results generated from our simulations have been deposited on Zenodo (https://doi.org/10.5281/zenodo.14262333). The methods are implemented in the R package mr.mash.alpha, available for download at https://github.com/stephenslab/mr.mash.alpha.


    Articles from PLOS Genetics are provided here courtesy of PLOS
