Genetics, Selection, Evolution : GSE. 2025 Mar 21;57:15. doi: 10.1186/s12711-025-00964-4

Multitrait genome-wide association best linear unbiased prediction of genetic values

Theo Meuwissen, Vinzent Boerner
PMCID: PMC11927129  PMID: 40119282

Abstract

Background

The GWABLUP (Genome-Wide Association based Best Linear Unbiased Prediction) approach used GWA analysis results to differentially weigh the SNPs in genomic prediction, and was found to improve the reliabilities of genomic predictions. However, the proposed multitrait GWABLUP method assumed that the SNP weights were the same across the traits. Here we extended and validated the multitrait GWABLUP method towards using trait specific SNP weights.

Results

In a 3-trait dairy data set, multitrait GWAS estimates of SNP effects and their standard errors were translated into trait specific likelihood ratios for the SNPs having trait effects, and into posterior probabilities using the GWABLUP approach. This produced trait specific prior (co)variance matrices for each SNP, which were applied in a SNP-BLUP model for genomic predictions, implemented in the APEX linear model suite. In a validation population, the trait specific SNP weights resulted in more reliable predictions for all three traits. In particular, for somatic cell count, which was hardly related to the other traits, using the same weights across all traits harmed genomic predictions. The use of trait specific SNP weights overcame this problem.

Conclusions

In multitrait GWABLUP analyses of ~ 30,000 reference population cows, trait specific SNP weights resulted in up to 13% more reliable genomic predictions than unweighted SNP-BLUP, and improved genomic predictions for all three studied traits.

Background

GBLUP (Genomic Best Linear Unbiased Prediction) and SNP (Single Nucleotide Polymorphism)-BLUP [1] are currently the most commonly used genomic prediction methods. GBLUP in particular is relatively simple to use and aligns closely with earlier breeding value evaluation methods. Moreover, these methods are equivalent and yield identical prediction accuracies [2]. Hence, we can choose the method that best meets our needs. In addition, they can be extended to single step predictions, which combine information from genotyped and non-genotyped individuals [3].

A limitation of GBLUP and SNP-BLUP is their assumption that all SNPs contribute equally to the total genetic variance. Bayesian variable selection methods (e.g. BayesA, BayesB, BayesC, and BayesR [1, 4, 5]) allocate more variance or weight to the most important SNPs, but are complex and computationally demanding because they rely on Markov chain Monte Carlo (MCMC) sampling techniques. Recently, GWABLUP was proposed, which uses deterministic weights based on GWAS (Genome Wide Association Study) results [6]. Since the SNP weights are (pre)determined by the GWAS signals, GWABLUP is based either on a weighted SNP-BLUP or on a weighted G-matrix in GBLUP, both of which may be extended to single step methods (ssGWABLUP).

A multitrait extension of GWABLUP was also proposed, assuming that SNP weights are equal across the traits [6]. This implies that all traits are affected by the same QTL (Quantitative Trait Loci). In general, however, different traits will be affected by at least partly different QTL, so the use of the same SNP weights across the traits is suboptimal. Our aim here is to extend the multitrait GWABLUP method to trait specific SNP weights and to compare the results to using equal weights across the traits. The methods are compared in the same dairy cattle data set as in [6], using the APEX linear models suite (www.ghpc.ai), which implements multitrait SNP-BLUP with different (co)variance matrices per SNP and thus allows for different weights per SNP and per trait.

Methods

Data

The 3-trait dairy data set of [6] included the yield deviations (YD) of milk and protein yield and somatic cell count (SCC) and their reliabilities for 32,201 Norwegian Red cows, and was kindly provided by Geno SA (www.geno.no). Estimates of heritabilities and of genetic and environmental correlations of the traits are given in Table 1. In [6], a canonical transformation of the 3 traits was performed [7], which resulted in 3 genetically and environmentally independent canonical traits with standardised environmental variances of 1 and genetic variances of 0.16, 0.29 and 1.44. The data also included imputed HD genotypes on Nsnps = 617,739 SNPs for all 32,201 cows (see [6] for details).
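The canonical transformation mentioned above can be sketched as follows. This is a minimal illustration, not the procedure of [6] or [7]: it uses one common construction (Cholesky whitening of the residual matrix followed by an eigendecomposition), and the matrices G and R below are hypothetical values chosen only so that the example runs, not the Norwegian Red estimates.

```python
import numpy as np

def canonical_transform(G, R):
    """Return T such that T R T' = I and T G T' is diagonal,
    together with the genetic variances of the canonical traits."""
    L = np.linalg.cholesky(R)          # R = L L'
    L_inv = np.linalg.inv(L)
    A = L_inv @ G @ L_inv.T            # genetic (co)variances on the residual-whitened scale
    eigvals, U = np.linalg.eigh(A)     # A = U diag(eigvals) U'
    T = U.T @ L_inv
    return T, eigvals

# Hypothetical 3-trait (co)variance matrices (assumption, for illustration only)
G = np.array([[0.30, 0.20, 0.02],
              [0.20, 0.25, 0.03],
              [0.02, 0.03, 0.15]])
R = np.array([[0.70, 0.40, -0.05],
              [0.40, 0.75, 0.05],
              [-0.05, 0.05, 0.85]])

T, canonical_var = canonical_transform(G, R)
print(np.round(T @ R @ T.T, 6))        # identity: unit residual variances, no residual covariances
print(np.round(T @ G @ T.T, 6))        # diagonal: independent canonical genetic variances
```

Applying T to each cow's vector of the 3 trait records gives canonical traits that can be analysed one at a time; T must be invertible so that results can be back-transformed to the original traits.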

Table 1.

Heritabilities (diagonals), genetic correlations (below diagonals), and residual correlations (above diagonals) of the dairy traits

          Milk    Protein  SCC
Milk      0.26    0.96     −0.17
Protein   0.85    0.20     0.16
SCC       0.10    0.10     0.16

SNP weights

Uniform SNP weights across the traits were obtained from the combined log-likelihood ratios of the GWAS of the 3 canonical traits as described in [6]. The GWAS of the three canonical traits also yielded estimates of the SNP effects and their standard errors for each of the canonical traits. To obtain trait specific SNP weights, these canonical trait SNP effects were back-transformed to original trait SNP effects together with their standard errors. Following [6], GWABLUP log-likelihood ratios per SNP and trait were calculated as:

$$LR_{tj} = 0.5\,\frac{\hat{b}_{tj}^{2}}{se_{tj}^{2}}, \qquad (1)$$

where $LR_{tj}$ is the log-likelihood ratio of SNP j for trait t, $\hat{b}_{tj}$ is the GWAS estimate of the effect of SNP j for trait t, and $se_{tj}$ is the standard error of $\hat{b}_{tj}$. The model averaging of MCMC Bayesian genomic prediction schemes is to some extent mimicked by calculating the moving average of the likelihood ratios, i.e., by averaging the likelihood ratios of the current SNP, the two adjacent SNPs to the left and the two adjacent SNPs to the right, resulting in $\overline{LR}_{tj}$. Following [6], posterior probabilities that the SNPs have non-zero effects were calculated for trait t and SNP j as:

$$PP_{tj} = \frac{\pi\, e^{\overline{LR}_{tj}}}{\pi\, e^{\overline{LR}_{tj}} + (1-\pi)},$$

which serve as trait specific weights for the SNPs, where the prior probability of non-zero SNP effects is π=0.001.
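The weight calculation described above (Eq. 1, the 5-SNP moving average, and the posterior probability) can be sketched for one trait as follows. This is a minimal sketch under stated assumptions: the function name and the toy inputs are hypothetical, the SNPs are assumed to be ordered along a single chromosome with at least five SNPs, and the windows at the two ends are simply shortened, a detail the text does not specify.

```python
import numpy as np

def gwablup_weights(b_hat, se, pi=0.001, half_window=2):
    """Log-likelihood ratios, their moving average, and posterior
    probabilities of non-zero SNP effects for a single trait."""
    b_hat = np.asarray(b_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    lr = 0.5 * b_hat**2 / se**2                         # Eq. (1)
    window = np.ones(2 * half_window + 1)
    # moving sum divided by the number of SNPs actually inside the window
    sums = np.convolve(lr, window, mode="same")
    counts = np.convolve(np.ones_like(lr), window, mode="same")
    lr_bar = sums / counts
    # PP = pi*exp(LR) / (pi*exp(LR) + 1 - pi), rearranged to avoid overflow
    pp = 1.0 / (1.0 + (1.0 - pi) / pi * np.exp(-lr_bar))
    return lr, lr_bar, pp

# Toy GWAS results for 7 SNPs (hypothetical numbers)
b_hat = [0.0, 0.1, 2.0, 0.1, 0.0, 0.0, 0.0]
se = [0.5] * 7
lr, lr_bar, pp = gwablup_weights(b_hat, se)
print(np.round(lr_bar, 3))
print(np.round(pp, 4))
```

A SNP whose smoothed log-likelihood ratio is zero keeps its prior probability π, while large GWAS signals push the posterior probability towards 1.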

Multitrait SNP-BLUP analyses

The model for the multitrait SNP-BLUP is:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\mu} + \mathbf{Z}\mathbf{b} + \mathbf{e}, \qquad (2)$$

where $\mathbf{y}$ is a vector of yield deviations for the 3 traits (ordered by traits within cows); $\mathbf{X} = \mathbf{1}_{n_a} \otimes \mathbf{I}_3$ is a design matrix linking the records to their trait means ($n_a$ is the number of animals); $\boldsymbol{\mu}$ is a vector of estimates of trait means for the 3 traits; $\mathbf{Z} = \mathbf{M} \otimes \mathbf{I}_3$ with $\mathbf{M}$ being an ($n_a$ × Nsnps) matrix of centred allele counts of the cows; $\mathbf{b}$ is a vector of 3Nsnps multitrait SNP effects (ordered by traits within SNPs). The prior distribution of the residuals was $\mathbf{e} \sim N(\mathbf{0}, \mathbf{W} \otimes \mathbf{R})$, where $\mathbf{R}$ is the residual (co)variance matrix of the traits (Table 1) and $\mathbf{W}$ is an ($n_a$ × $n_a$) diagonal matrix with the inverses of the weights of the yield deviations on the diagonal.
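The Kronecker structure of model (2) can be made concrete with a toy example. This is a sketch with made-up sizes and random genotypes; centring by the observed column means is an assumption (the text only states that the allele counts are centred).

```python
import numpy as np

rng = np.random.default_rng(0)
n_animals, n_snps, n_traits = 5, 4, 3          # toy sizes, not the real data

genotypes = rng.integers(0, 3, size=(n_animals, n_snps)).astype(float)  # 0/1/2 allele counts
M = genotypes - genotypes.mean(axis=0)         # centred allele counts (assumed centring)

I_t = np.eye(n_traits)
X = np.kron(np.ones((n_animals, 1)), I_t)      # (3*na x 3): links records to trait means
Z = np.kron(M, I_t)                            # (3*na x 3*Nsnps)

mu = np.array([1.0, 2.0, 3.0])                 # hypothetical trait means
b = rng.normal(scale=0.1, size=n_snps * n_traits)   # SNP effects, traits within SNPs
fitted = X @ mu + Z @ b                        # ordered by traits within animals

# Same quantity written as animals x traits: row i holds the 3 traits of animal i
B = b.reshape(n_snps, n_traits)
print(np.allclose(fitted.reshape(n_animals, n_traits), mu + M @ B))
```

The final check shows why the "traits within cows" and "traits within SNPs" orderings matter: with these orderings the stacked model is just the matrix product M B plus the trait means.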

For the analysis of model (2), the multitrait SNP-BLUP method of [8–10] is used, where in our data all animals are genotyped and pedigree relationships are not used and are thus set to an identity matrix. The (co)variance matrix of the breeding values and SNP effects (traits within cows and SNPs) is modelled by:

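$$\mathrm{Var}\begin{bmatrix}\mathbf{u}\\ \mathbf{b}\end{bmatrix} = \begin{bmatrix}\mathbf{Z}\boldsymbol{\Theta}\mathbf{Z}' + \varepsilon\mathbf{I}_{n_a} & \mathbf{Z}\boldsymbol{\Theta}\\ \boldsymbol{\Theta}\mathbf{Z}' & \boldsymbol{\Theta}\end{bmatrix},$$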

where $\mathbf{u} = \mathbf{Z}\mathbf{b}$ is a vector of multitrait breeding values; ε is a small number (ε = 0.01 was used here) added to regularize the matrix and make $\mathrm{Var}[\mathbf{u}'\ \mathbf{b}']'$ non-singular ($\varepsilon\mathbf{I}_{n_a}$ may be replaced by $\varepsilon\mathbf{A} \otimes \mathbf{G}$, where $\mathbf{G}$ is the genetic (co)variance matrix of the traits, to give some weight to the pedigree relationship matrix $\mathbf{A}$, but this was not done here); and $\boldsymbol{\Theta}$ is a (3Nsnps × 3Nsnps) block-diagonal matrix of (3 × 3)-blocks:

$$\boldsymbol{\Theta} = \bigoplus_{j} \mathbf{V}_{SNPj},$$

where $\bigoplus$ denotes the direct sum across all SNPs; $\mathbf{V}_{SNPj}$ is the (3 × 3) SNP specific (co)variance matrix across the 3 traits, i.e. the SNP effects are a priori assumed unrelated with prior distributions $\mathbf{b}_j \sim N(\mathbf{0}, \mathbf{V}_{SNPj})$, where $\mathbf{b}_j$ is a (3 × 1) vector of the SNP effects of SNP j.

In case of regular multitrait unweighted SNP-BLUP (SNPunw-BLUP), i.e. unweighted SNP effects, the $\mathbf{V}_{SNPj}$ matrices are the same for all SNPs j, i.e. $\mathbf{V}_{SNPj} = \mathbf{V}_{SNP}$, and we have:

$$\mathbf{V}_{SNP} = \mathbf{G} \Big/ \sum_{j=1}^{N_{snps}} 2p_j(1-p_j),$$

where G is the genetic (co)variance matrix of the traits (see Table 1) and pj is the allele-frequency of SNP j.

In case of equal weights across the traits (SNPeqw-BLUP), i.e. the SNP weights are used but weights are equal for all 3 traits as in the multitrait GWABLUP model in [6], the SNP specific (co)variance matrix is:

$$\mathbf{V}_{SNPj} = \mathbf{V}_{SNP}\,\frac{PP_j}{\mathrm{mean}(PP_j)},$$

where PPj is the multitrait posterior probability of SNP j accumulated over the 3 traits, which is based on the sum of the loglikelihood ratios across the 3 canonical traits [6], and mean(PPj) is the average of the PPj’s.

In case of SNP and trait specific weights (SNPtsw-BLUP), the variances of the VSNP matrix are adjusted for the weights but not the correlations implied by VSNP. This is achieved by:

$$\mathbf{V}_{SNPj} = \mathbf{S}_j \mathbf{V}_{SNP} \mathbf{S}_j, \qquad (3)$$

where $\mathbf{S}_j$ is a 3 × 3 diagonal matrix with the t-th element:

$$S_{jtt} = \sqrt{\frac{PP_{tj}}{\mathrm{mean}_j(PP_{tj})}},$$

where $\mathrm{mean}_j(PP_{tj})$ denotes the average of the $PP_{tj}$'s over the SNPs j for trait t.
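The construction of the SNP specific (co)variance matrices in Eq. (3) can be sketched as below. The function name and all inputs are hypothetical. The sketch assumes that the diagonal elements of S_j are the square roots of the scaled posterior probabilities, so that the prior variances (not the standard deviations) scale with PP_tj; this is the reading under which equal posterior probabilities across traits reproduce the SNPeqw-BLUP matrices.

```python
import numpy as np

def snp_covariances(G, p, PP):
    """V_SNPj = S_j V_SNP S_j for every SNP.
    G: (T x T) genetic (co)variances; p: allele frequencies (Nsnps,);
    PP: posterior probabilities (Nsnps x T). Returns (Nsnps x T x T)."""
    V_snp = G / np.sum(2.0 * p * (1.0 - p))         # unweighted SNP (co)variance matrix
    S = np.sqrt(PP / PP.mean(axis=0))               # diagonals of S_j (square root is an assumption)
    return S[:, :, None] * V_snp[None, :, :] * S[:, None, :]

rng = np.random.default_rng(0)
G = np.array([[0.26, 0.19, 0.02],                   # hypothetical genetic (co)variances
              [0.19, 0.20, 0.02],
              [0.02, 0.02, 0.16]])
p = rng.uniform(0.05, 0.95, size=6)                 # toy allele frequencies
PP = rng.uniform(0.001, 1.0, size=(6, 3))           # toy posterior probabilities
V = snp_covariances(G, p, PP)

V_snp = G / np.sum(2.0 * p * (1.0 - p))
corr_G = G[0, 1] / np.sqrt(G[0, 0] * G[1, 1])
corr_V = V[:, 0, 1] / np.sqrt(V[:, 0, 0] * V[:, 1, 1])
print(np.allclose(corr_V, corr_G))                  # correlations are left unchanged
print(np.allclose(V.mean(axis=0).diagonal(), V_snp.diagonal()))  # average variance is preserved
```

Because the weights are divided by their mean over SNPs, the average prior variance per trait stays equal to that of the unweighted model, so the total genetic variance is redistributed over SNPs rather than changed.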

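After constructing $\boldsymbol{\Theta}$, the $\mathrm{Var}[\mathbf{u}'\ \mathbf{b}']'$ covariance structure, and its inverse, Henderson's [11] mixed model equations (MME) were set up as [8]:

$$
\begin{bmatrix}
\mathbf{X}'(\mathbf{W}^{-1}\otimes\mathbf{R}^{-1})\mathbf{X} & \mathbf{X}'(\mathbf{W}^{-1}\otimes\mathbf{R}^{-1}) & \mathbf{0}\\
(\mathbf{W}^{-1}\otimes\mathbf{R}^{-1})\mathbf{X} & (\mathbf{W}^{-1}\otimes\mathbf{R}^{-1}) + \mathbf{I}_{n_a}/\varepsilon & -\mathbf{Z}/\varepsilon\\
\mathbf{0} & -\mathbf{Z}'/\varepsilon & \mathbf{Z}'\mathbf{Z}/\varepsilon + \boldsymbol{\Theta}^{-1}
\end{bmatrix}
\begin{bmatrix}\hat{\boldsymbol{\mu}}\\ \hat{\mathbf{u}}\\ \hat{\mathbf{b}}\end{bmatrix}
=
\begin{bmatrix}\mathbf{X}'(\mathbf{W}^{-1}\otimes\mathbf{R}^{-1})\mathbf{y}\\ (\mathbf{W}^{-1}\otimes\mathbf{R}^{-1})\mathbf{y}\\ \mathbf{0}\end{bmatrix}.
$$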

An efficient iterative double-preconditioned conjugate gradient algorithm [12] was used to solve these equations as implemented in the APEX linear models suite (www.ghpc.ai; [9]), where the block diagonal preconditioner consisted of inverted diagonal blocks of the MME matrix of each level of the effects. This analysis of the SNPunw-BLUP, SNPeqw-BLUP and SNPtsw-BLUP models yielded multitrait estimates of SNP effects and of the animal’s breeding values.
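For readers who want to see the mixed model equations at work, the following toy example assembles them densely and solves them directly. It is only a sketch: all sizes and (co)variance values are made up, a direct solver replaces the iterative solver used in APEX, and the identity matrix in the breeding-value block is taken with one row per record (animals × traits) so that the dimensions match in this toy setting.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, t = 6, 4, 2                                # toy numbers of animals, SNPs, traits
eps = 0.01

M = rng.integers(0, 3, size=(n, m)).astype(float)
M -= M.mean(axis=0)                              # centred allele counts
X = np.kron(np.ones((n, 1)), np.eye(t))
Z = np.kron(M, np.eye(t))

R = np.array([[1.0, 0.3], [0.3, 1.0]])           # hypothetical residual (co)variances
W = np.diag(rng.uniform(0.8, 1.5, size=n))       # hypothetical inverse record weights
V_snp = np.array([[0.05, 0.02], [0.02, 0.04]])   # hypothetical SNP (co)variances
scale = rng.uniform(0.5, 2.0, size=m)            # stand-in for SNP specific weights
Theta = np.zeros((m * t, m * t))
for j in range(m):
    Theta[j*t:(j+1)*t, j*t:(j+1)*t] = scale[j] * V_snp

y = rng.normal(size=n * t)                       # toy records

Ri = np.kron(np.linalg.inv(W), np.linalg.inv(R)) # inverse of W (x) R
I_u = np.eye(n * t)
zeros = np.zeros((t, m * t))
C = np.block([
    [X.T @ Ri @ X, X.T @ Ri,       zeros],
    [Ri @ X,       Ri + I_u / eps, -Z / eps],
    [zeros.T,      -Z.T / eps,     Z.T @ Z / eps + np.linalg.inv(Theta)],
])
rhs = np.concatenate([X.T @ Ri @ y, Ri @ y, np.zeros(m * t)])
sol = np.linalg.solve(C, rhs)
mu_hat, u_hat, b_hat = sol[:t], sol[t:t + n*t], sol[t + n*t:]

# Cross-check against generalised least squares / BLUP computed from the full variance matrix
V = Z @ Theta @ Z.T + eps * I_u + np.kron(W, R)
Vi = np.linalg.inv(V)
mu_gls = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)
b_blup = Theta @ Z.T @ Vi @ (y - X @ mu_gls)
print(np.allclose(mu_hat, mu_gls, atol=1e-6), np.allclose(b_hat, b_blup, atol=1e-6))
```

The cross-check illustrates that the sparse-friendly MME formulation gives the same estimates as the dense variance-matrix formulation, which would be infeasible for hundreds of thousands of SNPs.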

Validation cows

The records of the youngest cows born in 2018 (1988 cows) were used for validation, and their YDs were masked from the above data analyses. The remaining 30,213 cows were used for the training of the models, i.e. they were used for the GWAS analyses and the estimation of SNP effects. These SNP effects were used to obtain multitrait breeding value estimates of the validation cows as:

$$\hat{\mathbf{g}}_v = \mathbf{Z}_v \hat{\mathbf{b}},$$

where $\mathbf{Z}_v$ are the centred genotypes of the validation cows. The squared correlation between the predicted $\hat{\mathbf{g}}_v$ of the 1988 validation cows and their YDs ($\mathbf{y}_v$) was used as an indicator of the reliability of $\hat{\mathbf{g}}_v$. These reliabilities were tested for statistically significant differences (P < 0.05) by bootstrapping [13], which tests for significant differences between $\mathrm{cor}(\mathbf{y}_v, \hat{\mathbf{g}}_v^k)$ and $\mathrm{cor}(\mathbf{y}_v, \hat{\mathbf{g}}_v^l)$, where superscripts denote prediction methods k and l. Bootstrap samples are obtained by sampling validation individuals with replacement, i.e. their $\mathbf{y}_v$, $\hat{\mathbf{g}}_v^k$, and $\hat{\mathbf{g}}_v^l$ values. For each of 10,000 bootstrap data sets, $\mathrm{cor}(\mathbf{y}_v, \hat{\mathbf{g}}_v^k)$ and $\mathrm{cor}(\mathbf{y}_v, \hat{\mathbf{g}}_v^l)$ are calculated, and it is scored in how many of the data sets $\mathrm{cor}(\mathbf{y}_v, \hat{\mathbf{g}}_v^k) > \mathrm{cor}(\mathbf{y}_v, \hat{\mathbf{g}}_v^l)$; the difference is declared statistically significant if this is the case for > 97.5% of the bootstrap data sets.
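The bootstrap comparison of two prediction methods can be sketched as follows. The function name and the simulated data are hypothetical; only the resampling scheme (animals sampled with replacement, both correlations recomputed per sample, 97.5% threshold) is taken from the description above.

```python
import numpy as np

def bootstrap_compare(y, g_k, g_l, n_boot=10_000, seed=0):
    """Fraction of bootstrap samples in which cor(y, g_k) > cor(y, g_l)."""
    rng = np.random.default_rng(seed)
    y, g_k, g_l = (np.asarray(a, dtype=float) for a in (y, g_k, g_l))
    n = y.size
    wins = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample animals with replacement
        r_k = np.corrcoef(y[idx], g_k[idx])[0, 1]
        r_l = np.corrcoef(y[idx], g_l[idx])[0, 1]
        wins += int(r_k > r_l)
    return wins / n_boot

# Simulated validation data (assumption): method k is informative, method l is noise
rng = np.random.default_rng(42)
g_true = rng.normal(size=200)
y_v = g_true + rng.normal(size=200)
g_hat_k = g_true + 0.3 * rng.normal(size=200)
g_hat_l = rng.normal(size=200)

frac = bootstrap_compare(y_v, g_hat_k, g_hat_l, n_boot=2000)
print(frac, "significant" if frac > 0.975 else "not significant")
```

Resampling whole animals keeps the three values of each animal together, so the dependence between the two sets of predictions is respected in the comparison.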

Results and discussion

The analyses of the SNPunw-BLUP, SNPeqw-BLUP and SNPtsw-BLUP models converged in 451, 2446, and 5211 iterations, respectively, using as convergence criterion that the Euclidean norm of the residuals of the equations relative to that of the right-hand side was < $10^{-9}$. Hence, the use of SNP weights, and especially of trait specific SNP weights, slowed down the convergence of the models substantially.
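The convergence criterion can be illustrated with a plain preconditioned conjugate gradient solver. This is a simplified stand-in, not the double-preconditioned algorithm of [12] or the APEX implementation: it uses a single diagonal (Jacobi) preconditioner on a small random symmetric positive definite system, and the function name is hypothetical.

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-9, max_iter=10_000):
    """Preconditioned conjugate gradient for symmetric positive definite A.
    Stops when ||r|| / ||b|| < tol, the relative residual criterion."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_diag * r                      # apply the diagonal preconditioner
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) / b_norm < tol:
            return x, it
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

rng = np.random.default_rng(3)
B = rng.normal(size=(20, 20))
A = B @ B.T + 20.0 * np.eye(20)             # toy symmetric positive definite system
b = rng.normal(size=20)
x, n_iter = pcg(A, b, 1.0 / np.diag(A))
print(n_iter, np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

The number of iterations needed grows with the condition number of the preconditioned system, which is one plausible reason why strongly unequal SNP weights, which spread the entries of the inverse prior (co)variances over several orders of magnitude, slow down convergence.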

Figure 1 shows the posterior probabilities of non-zero effect SNPs, $PP_{tj}$, for each of the traits, which are proportional to the trait specific SNP weights used in the SNPtsw-BLUP analysis. Milk yield has SNPs with $PP_{milk,j} > 0.9$ on chromosomes 5, 6, 12, 14, 16, 19, and 24. Protein yield has SNPs with $PP_{prot,j} > 0.9$ on chromosomes 5, 6, 12, and 19. SCC has SNPs with $PP_{SCC,j} > 0.9$ on chromosomes 6, 15, 19, 21, and 25. Figure 1 shows many posterior probability peaks, especially for milk yield, and the peaks for milk and protein yield overlap substantially, which is much less the case between the yield traits and SCC.

Fig. 1. Manhattan plots of posterior probabilities of major effect SNPs for milk (a) and protein (b) yields and somatic cell count (c)

Table 2 shows the reliabilities of the multitrait genomic predictions, measured as the squared correlations between GEBV and YDs for milk and protein yields and SCC of the 1988 validation cows. The YDs have reliabilities of 0.409, 0.326, and 0.246, respectively [6], i.e., expressed relative to the YD reliabilities, the reliabilities of SNPunw-BLUP are 0.49 (= 0.199/0.409), 0.54, and 0.68, respectively. For milk and protein yields, genomic prediction reliabilities significantly increased by 11–13% using SNP weights, i.e. using the SNPeqw-BLUP and SNPtsw-BLUP models. Using trait specific weights (SNPtsw-BLUP) resulted in somewhat higher reliabilities than SNPeqw-BLUP, although these differences were not statistically significant. SNPtsw-BLUP obtained the highest reliability for all three traits, although the improvement for SCC was minor and not statistically significant.

Table 2.

Reliabilities of genomic predictions measured as the squared correlations between GEBVs and yield deviations of 1988 validation cows

$\mathrm{cor}(\mathbf{y}_v, \hat{\mathbf{g}}_v)^2$ *,**
                  Milk      Protein   SCC
SNPunw-BLUP***    0.199^a   0.178^a   0.168^a
SNPeqw-BLUP***    0.223^b   0.197^b   0.160^a
SNPtsw-BLUP***    0.226^b   0.201^b   0.169^a

*Standard errors of $\mathrm{cor}(\mathbf{y}_v, \hat{\mathbf{g}}_v)^2$ are between 0.006 and 0.008

**Different letters in the superscripts denote statistically significant differences (P < 0.05)

***Subscripts unw, eqw, and tsw mean unweighted, equal weights across the traits, and unequal weights across the traits, respectively
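The relative figures quoted in the text can be recomputed from the rounded values in Table 2 and the YD reliabilities given above. This is a small arithmetic sketch; because the inputs are rounded to three decimals, the recomputed ratios and percentages can differ slightly from those quoted in the text. The dictionary keys are labels chosen here for convenience.

```python
# Squared correlations from Table 2 and YD reliabilities from the text
reliab = {
    "unw": {"milk": 0.199, "protein": 0.178, "scc": 0.168},
    "eqw": {"milk": 0.223, "protein": 0.197, "scc": 0.160},
    "tsw": {"milk": 0.226, "protein": 0.201, "scc": 0.169},
}
yd_rel = {"milk": 0.409, "protein": 0.326, "scc": 0.246}

# Reliability of unweighted SNP-BLUP relative to the reliability of the YDs
relative = {trait: reliab["unw"][trait] / yd_rel[trait] for trait in yd_rel}

# Percentage change of each weighted model compared with unweighted SNP-BLUP
gain = {
    model: {trait: 100.0 * (reliab[model][trait] / reliab["unw"][trait] - 1.0)
            for trait in yd_rel}
    for model in ("eqw", "tsw")
}

for trait in yd_rel:
    print(trait, round(relative[trait], 3),
          round(gain["eqw"][trait], 1), round(gain["tsw"][trait], 1))
```

The sign of the SCC entry for equal weights makes the main point of the comparison visible: equal weights lower the SCC reliability relative to unweighted SNP-BLUP, whereas trait specific weights do not.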

Table 3 shows inflation biases of the multitrait predictions measured as the regression coefficients of the yield deviations on the GEBVs for the 1988 validation cows. The inflation bias was only significant for the SCC analysis without SNP weights, where there was a deflation bias. The analyses that used SNP weights yielded virtually unbiased genomic predictions.

Table 3.

Inflation bias of genomic predictions measured as the regression coefficient of yield deviations on GEBVs for 1988 validation cows

$b(\mathbf{y}_v, \hat{\mathbf{g}}_v)$ *
                 Milk    Protein  SCC
SNPunw-BLUP**    1.08    1.07     1.19
SNPeqw-BLUP**    1.01    1.00     1.06
SNPtsw-BLUP**    0.99    0.98     1.01

*Standard errors of $b(\mathbf{y}_v, \hat{\mathbf{g}}_v)$ are between 0.04 and 0.06

**Subscripts unw, eqw, and tsw mean unweighted, equal weights across the traits, and unequal weights across the traits, respectively
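The two validation statistics reported in Tables 2 and 3 can be computed as in the following sketch. The function name and the simulated data are hypothetical; the statistics themselves (squared correlation, and regression coefficient of yield deviations on GEBVs) follow the table captions.

```python
import numpy as np

def validation_stats(y_v, g_hat_v):
    """Squared correlation between YDs and GEBVs, and the regression
    coefficient of YDs on GEBVs (1 = no inflation or deflation)."""
    y_v = np.asarray(y_v, dtype=float)
    g_hat_v = np.asarray(g_hat_v, dtype=float)
    r = np.corrcoef(y_v, g_hat_v)[0, 1]
    cov = np.cov(y_v, g_hat_v)                 # 2 x 2 sample (co)variance matrix
    slope = cov[0, 1] / cov[1, 1]              # cov(y, g_hat) / var(g_hat)
    return r**2, slope

# Simulated example (assumption): predictions that are shrunk too much
rng = np.random.default_rng(7)
g_true = rng.normal(size=1988)
y_v = g_true + rng.normal(size=1988)
g_hat_v = 0.8 * g_true                         # under-dispersed predictions
r2, slope = validation_stats(y_v, g_hat_v)
print(round(r2, 3), round(slope, 3))           # slope above 1 indicates deflated predictions
```

A slope above 1 means the GEBVs vary too little (deflation), and a slope below 1 means they vary too much (inflation); the squared correlation is unaffected by such rescaling.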

Alternative trait specific SNP weights were recently tested by [10], who used the marker specific scaling values of a previously conducted BayesA analysis [1] as weights. Likewise, [14] used the weights from a non-linear method introduced by [2] and compared them to the heuristic weights $2p_j(1-p_j)\hat{b}_j^2$ suggested by [15], where $\hat{b}_j$ is the SNP effect from an unweighted SNP-BLUP analysis. The $2p_j(1-p_j)\hat{b}_j^2$ weights yielded the highest increase in prediction reliability, of up to 12.7%, which is similar to our current results. The rationale for including the heterozygosity $2p_j(1-p_j)$ in the formula for the SNP weights is not obvious, since variances of SNP effect sizes are not affected by the heterozygosity of the SNPs. However, the standard errors of the SNP effect estimates are approximately proportional to $[2p_j(1-p_j)]^{-1/2}$, which, when inserted into Eq. (1), implies that $2p_j(1-p_j)\hat{b}_j^2$ is approximately proportional to $LR_{tj}$ (although the SNP effect estimates in a GWAS and SNP-BLUP analysis differ). The rationale of the GWABLUP SNP weights is that, in the BayesC model, the expected variance of a SNP effect is proportional to its posterior probability of having a non-zero effect.
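The proportionality argument can be written out in two steps. The derivation below is a sketch under additional textbook assumptions that are not stated in the source: a single-SNP regression GWAS on n animals with residual variance $\sigma_{e,t}^2$ for trait t, centred allele counts, and genotypes in Hardy-Weinberg equilibrium, so that the sum of squared centred allele counts is approximately $n\,2p_j(1-p_j)$.

```latex
\begin{align}
se_{tj}^{2} &\approx \frac{\sigma_{e,t}^{2}}{n\,2p_j(1-p_j)}
  \quad\Longrightarrow\quad
  se_{tj} \propto \left[2p_j(1-p_j)\right]^{-1/2}, \\
LR_{tj} &= 0.5\,\frac{\hat{b}_{tj}^{2}}{se_{tj}^{2}}
  \approx \frac{n}{2\sigma_{e,t}^{2}}\; 2p_j(1-p_j)\,\hat{b}_{tj}^{2}.
\end{align}
```

For a given trait and data set, $n/(2\sigma_{e,t}^{2})$ is the same for all SNPs, so the heuristic weight $2p_j(1-p_j)\hat{b}_j^2$ ranks SNPs in approximately the same way as the log-likelihood ratio of Eq. (1).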

It seems natural to combine the GWAS signals across the traits by a multitrait GWAS [6], which makes optimal use of the data. But multitrait GWAS analyses are computationally rather complicated, and simpler approaches based on single-trait GWAS may be preferred. For instance, the results from single-trait GWAS analyses may be combined by a multitrait meta-GWAS analysis [16], which is based on summary statistics of the single-trait analyses. For SNPtsw-BLUP models, single-trait GWAS analyses may be directly used to provide the trait specific SNP weights. The accuracy of the resulting predictions will depend on the power of the single-trait SNP analyses. If the single-trait SNP analyses are not very powerful (i.e. do not result in clear genome-wide significant QTL signals), the use of a multitrait GWAS analysis may be worthwhile.

Alternatively, a meta-analysis may be used to combine the GWAS signals across several data sets [17] and thereby increase the power of the GWAS analysis. However, GWAS analyses on older or other data, where the LD patterns between SNPs and QTL may differ from those in the current data set, may result in less appropriate SNP weights. Nevertheless, the most important SNPs with consistent LD across the data sets will still be upweighted. SNPs with inconsistent LD will obtain reduced effect estimates, either because their high GWAS signal was combined with a low signal in the genomic prediction data or vice versa. Also, if several GWAS analyses have been applied to partial data sets, they can be combined by a meta-analysis GWAS, where weights of inconsistent SNPs (SNP estimates change sign between analyses) will be substantially reduced and may be artificially set to zero, since such SNPs are clearly not in consistent LD with the QTL. In addition, prior information on SNPs, such as whether they are synonymous mutations, can be implemented as SNP specific π-values, and thus affect $PP_{tj}$.
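One way the ideas in this paragraph could be combined is sketched below. It is an illustration only, not the procedure of [16] or [17]: it uses a standard fixed-effect inverse-variance weighted meta-analysis, flags SNPs whose effect estimates change sign between analyses, sets their weights to zero, and allows SNP specific π-values. The function names and all numbers are hypothetical, and the moving average of the log-likelihood ratios is omitted for brevity.

```python
import numpy as np

def ivw_meta(b, se):
    """Fixed-effect inverse-variance weighted meta-analysis.
    b, se: arrays of shape (n_studies, n_snps)."""
    b = np.asarray(b, dtype=float)
    se = np.asarray(se, dtype=float)
    w = 1.0 / se**2
    b_meta = (w * b).sum(axis=0) / w.sum(axis=0)
    se_meta = np.sqrt(1.0 / w.sum(axis=0))
    signs = np.sign(b)
    consistent = np.all(signs == signs[0], axis=0)   # same sign in every study
    return b_meta, se_meta, consistent

def snp_weights(b_meta, se_meta, consistent, pi):
    """Posterior probabilities with SNP specific priors; inconsistent SNPs get weight 0."""
    lr = 0.5 * (b_meta / se_meta) ** 2
    pp = 1.0 / (1.0 + (1.0 - pi) / pi * np.exp(-lr))
    return np.where(consistent, pp, 0.0)

# Two toy GWAS on three SNPs (hypothetical estimates and standard errors)
b = [[0.2, 0.1, -0.3],
     [0.4, -0.1, -0.1]]
se = [[0.1, 0.1, 0.1],
      [0.1, 0.1, 0.2]]
pi = np.array([0.001, 0.001, 0.0005])                # e.g. a lower prior for a synonymous variant

b_meta, se_meta, consistent = ivw_meta(b, se)
weights = snp_weights(b_meta, se_meta, consistent, pi)
print(np.round(b_meta, 3), consistent, np.round(weights, 4))
```

In this toy example the second SNP changes sign between the two analyses and therefore receives zero weight, while the other two SNPs are weighted by their combined evidence and their prior.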

The trait specific SNP weights applied in Eq. (3) adjust the prior variances of the traits on a per SNP basis, but not the correlations between the traits. Equal correlations between the SNP effects of the traits were also suggested by [8, 10]. A more flexible model would also estimate prior correlations of SNP effects on a per (group of) SNP(s) basis. Gebreyesus et al. [18] estimated prior correlations for groups of SNPs and obtained improved prediction reliabilities for milk composition traits. More research is needed to investigate whether SNP specific prior correlations would increase the reliabilities of the genomic predictions, or whether the standard errors of SNP specific correlations are too large for this approach to be beneficial.

There are two ways to increase the accuracy of genomic predictions: (i) to increase reference population size; and (ii) to (better) locate the causal variants and use this information in genomic prediction [19]. For large reference population sizes > 100,000 and prediction accuracies exceeding 90%, the scope for GWABLUP to further improve predictions is obviously limited. Multitrait GWABLUP concentrates on approach (ii), and may improve prediction reliabilities by up to 13% when reference population sizes are of limited size (tens of thousands), which may be due to small population sizes, no large-scale trait recording for (some of) the traits, or limited numbers of genotyped animals, or any combination of the above.

Conclusions

A multitrait SNP-BLUP model was presented with trait specific SNP weights based on the GWABLUP approach. The model with trait specific SNP weights yielded EBVs with the highest reliability for all three traits analysed. For SCC, the model with identical SNP weights reduced the reliability of the EBV compared to unweighted SNP-BLUP, which was because the SNP weights were dominated by the milk production traits. This problem was remedied by the use of trait specific SNP weights. The multitrait GWABLUP models yielded up to 13% more reliable EBV compared to unweighted multitrait SNP-BLUP.

Acknowledgements

The authors are grateful to GENO SA (www.geno.no) for providing the data, to the Norwegian Research Council for funding of project number 309611, and to the reviewers for their helpful comments.

Author contributions

TM developed the trait specific SNP weights method, and VB implemented the use of SNP specific prior (co)variance in APEX. VB and TM conducted the multitrait GWABLUP analyses. TM drafted the first version of the manuscript and VB and TM revised the later versions. All authors read and approved the final manuscript.

Funding

The authors gratefully acknowledge funding from the Norwegian Research Council project number 309611.

Availability of data and materials

Data are available upon request and approval of Geno SA.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.


References

1. Meuwissen T, Hayes B, Goddard M. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.
2. VanRaden P. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.
3. Legarra A, Christensen O, Aguilar I, Misztal I. Single Step, a general approach for genomic selection. Livest Sci. 2014;166:54–65.
4. Erbe M, Hayes BJ, Matukumalli LK, Goswami S, Bowman PJ, Reich CM, et al. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J Dairy Sci. 2012;95:4114–29.
5. Habier D, Fernando R, Kizilkaya K, Garrick D. Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics. 2011;12:186.
6. Meuwissen T, Eikje LS, Gjuvsland AB. GWABLUP: genome-wide association assisted best linear unbiased prediction of genetic values. Genet Sel Evol. 2024;56:1.
7. Ducrocq V, Besbes B. Solution of multiple trait animal models with missing data on some traits. J Anim Breed Genet. 1993;110:81–92.
8. Liu Z, Goddard ME, Reinhardt F, Reents R. A single-step genomic model with direct estimation of marker effects. J Dairy Sci. 2014;97:5833–50.
9. Boerner V. One for all: LMT - the linear models toolbox. In: Proceedings of the 12th World Congress Applied to Livestock Production: 3–8 July 2022; Rotterdam. 2022.
10. Strandén I, Jenko J. A computationally feasible multi-trait single-step genomic prediction model with trait-specific marker weights. Genet Sel Evol. 2024;56:1.
11. Henderson CR. Applications of linear models in animal breeding: University of Guelph; 1984.
12. Vandenplas J, Calus MPL, Eding H, Vuik C. A second-level diagonal preconditioner for single-step SNPBLUP. Genet Sel Evol. 2019;51:1.
13. Mantysaari E, Koivula M. GEBV Validation Test Revisited. In: Proceedings of the 2012 Interbull technical workshop; 2–3 February 2014; Varona. 2012.
14. Chegini A, Stranden I, Karaman E, Iso-Touru T, Pösö J, Aamand GP, et al. Marker weighting improves single-step genomic prediction reliabilities of udder health traits in Nordic Red and Jersey dairy cattle populations. J Dairy Sci. 2025;108:651–63.
15. Wang H, Misztal I, Aguilar I, Legarra A, Muir WM. Genome-wide association mapping including phenotypes from relatives without genotypes. Genet Res. 2012;94:73–83.
16. Willer CJL, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–1.
17. Van Den Berg I, Xiang R, Jenko J, Pausch H, Boussaha M, Schrooten C, et al. Meta-analysis for milk fat and protein percentage using imputed sequence variant genotypes in 94,321 cattle from eight cattle breeds. Genet Sel Evol. 2020;52:1.
18. Gebreyesus G, Lund MS, Buitenhuis B, Bovenhuis H, Poulsen NA, Janss LG. Modeling heterogeneous (co)variances from adjacent-SNP groups improves genomic prediction for milk protein composition traits. Genet Sel Evol. 2017;49:1.
19. Goddard ME. Can we make genomic selection 100% accurate? J Anim Breed Genet. 2017;134:287–8.
