Genetics, Selection, Evolution : GSE. 2016 Dec 8;48:96. doi: 10.1186/s12711-016-0273-2

Computational strategies for alternative single-step Bayesian regression models with large numbers of genotyped and non-genotyped animals

Rohan L Fernando 1, Hao Cheng 1, Bruce L Golden 2, Dorian J Garrick 1,3
PMCID: PMC5144523  PMID: 27931187

Abstract

Background

Two types of models have been used for single-step genomic prediction and genome-wide association studies that include phenotypes from both genotyped animals and their non-genotyped relatives. The two types are breeding value models (BVM) that fit breeding values explicitly and marker effects models (MEM) that express the breeding values in terms of the effects of observed or imputed genotypes. MEM can accommodate a wider class of analyses, including variable selection or mixture model analyses. The order of the equations that need to be solved and the inverses required in their construction vary widely, and thus the computational effort required depends upon the size of the pedigree, the number of genotyped animals and the number of loci.

Theory

We present computational strategies to avoid storing large, dense blocks of the MME that involve imputed genotypes. Furthermore, we present a hybrid model that fits a MEM for animals with observed genotypes and a BVM for those without genotypes. The hybrid model is computationally attractive for pedigree files containing millions of animals with a large proportion of those being genotyped.

Application

We demonstrate the practicality of both the original MEM and the hybrid model using real data with 6,179,960 animals in the pedigree with 4,934,101 phenotypes and 31,453 animals genotyped at 40,214 informative loci. Using the hybrid model, a single-trait analysis on a desktop computer with four graphics cards took about 3 h to obtain both preconditioned conjugate gradient solutions and 42,000 Markov chain Monte-Carlo (MCMC) samples of breeding values, which allowed inferences to be made from posterior means, variances and covariances. MCMC sampling with the hybrid model required about one quarter of the effort needed with the published MEM.

Conclusions

We present a hybrid model that fits a MEM for animals with genotypes and a BVM for those without genotypes. Its practicality and considerable reduction in computing effort were demonstrated. This model can readily be extended to accommodate multiple traits, multiple breeds, maternal effects, and additional random effects such as polygenic residual effects.

Background

Two types of equivalent mixed linear models are used for whole-genome analyses in livestock [1]. The first type, which we refer to as marker effects models (MEM), includes random effects ($\boldsymbol{\alpha}$) of marker genotype covariates ($\mathbf{M}_g$) in the model [2, 3]. The second type, which we refer to as breeding value models (BVM), includes the breeding values of the animals, $\mathbf{u}_g=\mathbf{M}_g\boldsymbol{\alpha}$, as a random effect that has a covariance computed from $\mathbf{M}_g$ [1, 2, 4–6] rather than from the pedigree.

It was shown that the BVM can be adapted for what is known as single-step genomic best linear unbiased prediction (SS-GBLUP) that combines information from animals with genotypes and from those without genotypes in a single BLUP analysis [7–9]. However, the SS-GBLUP analysis requires computing the inverse of $\mathbf{G}$, which is the matrix of genomic relationships of the animals with genotypes [8, 9]. When the number $N_g$ of genotyped animals exceeds the number of markers, $\mathbf{G}$ is singular, but a full-rank matrix such as $\mathbf{G}^{*}=0.95\mathbf{G}+0.05\mathbf{A}$, with $\mathbf{A}$ being the pedigree-based relationship matrix, might be used in its place. Single-step analyses based on the MEM do not require computing $\mathbf{G}$ or its inverse [10]. Furthermore, Bayesian regression analyses based on the MEM are not limited to assuming a normal prior for $\boldsymbol{\alpha}$, which is implicit in SS-GBLUP; Bayesian regression models can accommodate various priors including the t distribution as in BayesA [3, 11], the double exponential distribution as in Bayesian LASSO [12] or mixtures of the t distribution or the normal distribution [3, 11, 13] as in BayesB or BayesC. However, the MME that correspond to single-step MEM (SS-MEM) types of models contain dense blocks that correspond to the imputed genotypes of animals with missing genotypes [10], and those blocks can be large if many animals have missing genotypes.

Liu et al. [14] developed a single-step method based on the BVM with direct estimation of marker effects (SSME-GBLUP). An advantage of that method over SS-GBLUP is that it does not require computing $\mathbf{G}$ or its inverse. Also, their method can be used for Bayesian regression models [14]. However, the MME for SSME-GBLUP contain expressions that involve the inverse of the pedigree-based relationship matrix, $\mathbf{A}_{gg}$, for the animals with genotypes. This is a dense matrix, and therefore a computational strategy was proposed to avoid computing its inverse, but it requires solving a dense system of equations of order $N_g$ within each round of Jacobi or pre-conditioned conjugate gradient (PCG) iteration for solution of the MME, or within each round of MCMC sampling for Bayesian inference with models such as BayesA or BayesB [3]. Equation (A1) in Legarra and Ducrocq [15] also presents a set of similar MME with marker effects for genotyped animals and breeding values for non-genotyped animals. As with the MME in Liu et al. [14], the advantage of the MME of Legarra and Ducrocq [15] is that they do not require the computation of $\mathbf{G}$ or its inverse, but they do require computing the inverse of $\mathbf{A}_{gg}$. Recently, in some livestock such as dairy cattle, $N_g$ has increased towards a million or more, and thus, solving a dense system of equations of order $N_g$ within each round of iteration will place a heavy burden on SSME-GBLUP in computing time and storage requirements.

The objective of this paper is to present computational strategies for whole-genome analyses based on the SS-MEM that avoid storing large, dense blocks of the MME that involve imputed genotypes. First, we will show this for the MME given in [10]. Second, we will present what we refer to as a hybrid type model (HM) that uses a MEM for the animals with marker genotypes and a BVM for animals without genotypes. The MME that correspond to this model also have dense blocks that correspond to animals with missing genotypes. However, in Bayesian regression analyses based on this hybrid model, storing the dense blocks can be avoided even more efficiently than was the case for the MME given in [10]. Finally, we will present the computer storage and time required for a real application.

Theory

In most genomic analyses, the columns of the matrix $\mathbf{M}_g$ of marker covariates are centered to have zero expectations. This ensures that the vector of breeding values, $\mathbf{u}_g=\mathbf{M}_g\boldsymbol{\alpha}$, has a mean of $\mathbf{0}$. Centering $\mathbf{M}_g$ requires knowing the expected value of the marker covariates for founder animals. Often, these expected values are unknown, but can be incorporated into the model as a location parameter [10, 16]. However, to simplify our presentation without loss of generality we assume that $\mathbf{M}_g$ is a matrix of correctly centered marker covariates.

Marker effects model for single-step Bayesian regression

As in Fernando et al. [10], a MEM for single-step Bayesian regression analyses can be derived from writing the model equation as:

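$$
\begin{bmatrix}\mathbf{y}_n\\ \mathbf{y}_g\end{bmatrix}
=\begin{bmatrix}\mathbf{X}_n\\ \mathbf{X}_g\end{bmatrix}\boldsymbol{\beta}
+\begin{bmatrix}\mathbf{Z}_n & \mathbf{0}\\ \mathbf{0} & \mathbf{Z}_g\end{bmatrix}
\begin{bmatrix}\mathbf{M}_n\boldsymbol{\alpha}+\boldsymbol{\epsilon}\\ \mathbf{M}_g\boldsymbol{\alpha}\end{bmatrix}
+\mathbf{e}, \tag{1}
$$

where the vectors and matrices for animals without genotypes are denoted with a subscript $n$ and those for the animals with genotypes with a subscript $g$. Thus, $\mathbf{y}_n$ and $\mathbf{y}_g$ are the vectors of phenotypic values, $\mathbf{X}_n$ and $\mathbf{X}_g$ are the incidence matrices for the fixed effects, $\boldsymbol{\beta}$, $\mathbf{Z}_n$ and $\mathbf{Z}_g$ are incidence matrices that relate the breeding values of animals, $\begin{bmatrix}\mathbf{M}_n\boldsymbol{\alpha}+\boldsymbol{\epsilon}\\ \mathbf{M}_g\boldsymbol{\alpha}\end{bmatrix}$, to the phenotypic values, $\mathbf{M}_g$ is the matrix of centered marker covariates for animals with genotypes, $\mathbf{M}_n=\mathbf{A}_{ng}\mathbf{A}_{gg}^{-1}\mathbf{M}_g$ is the matrix of imputed marker covariates for animals with missing genotypes, $\boldsymbol{\alpha}$ is the vector of random marker effects, $\boldsymbol{\epsilon}$ is the vector of imputation residuals with null means and covariance matrix proportional to the inverse of $\mathbf{A}^{nn}$, the sub-matrix corresponding to animals with missing genotypes in the inverse of the matrix $\mathbf{A}$ of pedigree-based additive relationships, and $\mathbf{e}$ is a vector of residuals. Throughout, subscripts denote blocks of $\mathbf{A}$ and superscripts denote blocks of $\mathbf{A}^{-1}$. The matrix of imputed genotypes can be more efficiently computed by solving the sparse system of equations [10]:

$$
\mathbf{A}^{nn}\mathbf{M}_n=-\mathbf{A}^{ng}\mathbf{M}_g. \tag{2}
$$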
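As a minimal sketch of Eq. (2), the following toy example imputes marker covariates by solving the sparse system rather than forming $\mathbf{A}_{ng}\mathbf{A}_{gg}^{-1}$. The eight-animal pedigree, the marker values and the function names are hypothetical; a dense inverse of $\mathbf{A}$ is used only because the example is tiny, and SciPy's sparse LU factorization stands in for whatever sparse solver would be used in practice.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

def relationship_matrix(sires, dams):
    """Tabular method for A; parents precede offspring, -1 = unknown parent."""
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            a = 0.0
            if s >= 0:
                a += 0.5 * A[j, s]
            if d >= 0:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
        inbreeding = 0.5 * A[s, d] if (s >= 0 and d >= 0) else 0.0
        A[i, i] = 1.0 + inbreeding
    return A

# Hypothetical toy pedigree (animals 0-2 are founders).
sires = [-1, -1, -1, 0, 0, 3, 3, 5]
dams = [-1, -1, -1, 1, 2, 4, 2, 6]
geno = np.array([3, 5, 7])        # animals with observed genotypes
nong = np.array([0, 1, 2, 4, 6])  # animals without genotypes

A = relationship_matrix(sires, dams)
Ainv = np.linalg.inv(A)           # dense inverse: only viable for a toy case

# Centered marker covariates for the genotyped animals (made-up values).
Mg = np.array([[1.0, -1.0], [0.0, 1.0], [-1.0, 0.0]])

def impute_genotypes(Ainv, nong, geno, Mg):
    """Solve A^{nn} M_n = -A^{ng} M_g (Eq. 2) using a sparse LU factor."""
    Ann = csc_matrix(Ainv[np.ix_(nong, nong)])
    Ang = csc_matrix(Ainv[np.ix_(nong, geno)])
    return splu(Ann).solve(-(Ang @ Mg))

Mn = impute_genotypes(Ainv, nong, geno, Mg)

# Reference: the definition M_n = A_ng A_gg^{-1} M_g, using blocks of A itself.
Mn_direct = A[np.ix_(nong, geno)] @ np.linalg.solve(A[np.ix_(geno, geno)], Mg)
```

Both routes give the same imputed covariates; the sparse route needs only blocks of $\mathbf{A}^{-1}$, which can be built directly from the pedigree.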

Depending on the prior used for $\boldsymbol{\alpha}$, Model (1) can be used for a range of single-step Bayesian regression analyses, including single-step BLUP, BayesA, BayesB, BayesC or Bayesian LASSO [10]. Model (1) and the corresponding analyses assume that the breeding values can be adequately explained by the marker covariates. If that assumption does not hold, a polygenic residual with a mean of zero and a covariance matrix that is proportional to $\mathbf{A}$ can be included as an additional effect in the model.

The MME that correspond to Model (1) for BayesC with π=0 are:

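$$
\begin{bmatrix}
\mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z}\mathbf{M} & \mathbf{X}_n'\mathbf{Z}_n\\
\mathbf{M}'\mathbf{Z}'\mathbf{X} & \mathbf{M}'\mathbf{Z}'\mathbf{Z}\mathbf{M}+\mathbf{I}\frac{\sigma_e^2}{\sigma_\alpha^2} & \mathbf{M}_n'\mathbf{Z}_n'\mathbf{Z}_n\\
\mathbf{Z}_n'\mathbf{X}_n & \mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n & \mathbf{Z}_n'\mathbf{Z}_n+\mathbf{A}^{nn}\frac{\sigma_e^2}{\sigma_g^2}
\end{bmatrix}
\begin{bmatrix}\hat{\boldsymbol{\beta}}\\ \hat{\boldsymbol{\alpha}}\\ \hat{\boldsymbol{\epsilon}}\end{bmatrix}
=\begin{bmatrix}\mathbf{X}'\mathbf{y}\\ \mathbf{M}'\mathbf{Z}'\mathbf{y}\\ \mathbf{Z}_n'\mathbf{y}_n\end{bmatrix}, \tag{3}
$$

where $\mathbf{X}=\begin{bmatrix}\mathbf{X}_n\\ \mathbf{X}_g\end{bmatrix}$, $\mathbf{Z}=\begin{bmatrix}\mathbf{Z}_n & \mathbf{0}\\ \mathbf{0} & \mathbf{Z}_g\end{bmatrix}$, $\mathbf{M}=\begin{bmatrix}\mathbf{M}_n\\ \mathbf{M}_g\end{bmatrix}$, $\mathbf{y}=\begin{bmatrix}\mathbf{y}_n\\ \mathbf{y}_g\end{bmatrix}$, $\sigma_\alpha^2$ is the variance of marker effects, $\sigma_g^2$ is the additive genetic variance, and $\sigma_e^2$ is the residual variance. Equations (3) contain matrix-by-matrix products and matrix-by-vector products involving the dense matrix $\mathbf{M}_n$ of imputed genotypes. We will assume here that $\mathbf{X}'\mathbf{Z}\mathbf{M}$, $\mathbf{M}'\mathbf{Z}'\mathbf{X}$ and $\mathbf{M}'\mathbf{Z}'\mathbf{Z}\mathbf{M}$ are small enough to be stored in memory. Below, we present computing strategies for calculations that involve $\mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n$ or its transpose without storing these large matrices in memory. If the matrices $\mathbf{X}'\mathbf{Z}\mathbf{M}$ and $\mathbf{M}'\mathbf{Z}'\mathbf{X}$ are large, the computing strategies presented below can also be adapted for calculations that involve these matrices, as will be done in our example application.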

Computing strategies

First, we will discuss the calculations necessary to apply PCG to (3). Following this, we will discuss how to use (3) to obtain Markov chain Monte-Carlo (MCMC) samples of the location parameters of Model (1) from their full conditional distributions.

Preconditioned conjugate gradient iteration

The PCG algorithm is widely used to iteratively solve the MME, e.g., [17, 18]. In each iteration of PCG, the left-hand-side of the MME (LHS-MME) is post-multiplied by a vector. However, the LHS-MME given in (3) contains two dense sub-matrices, $\mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n$ and its transpose, that may be too large for storage in memory; the remaining sub-matrices in LHS-MME can be stored in memory either because they are not too large or because they are sparse. In each round of PCG, $\mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n$ needs to be post-multiplied by a vector $\mathbf{q}$ that has the same order as $\boldsymbol{\alpha}$ and the transpose of this matrix by a vector $\mathbf{s}$ that has the same order as $\boldsymbol{\epsilon}$. The first of these products can be done without storing $\mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n$ in memory as follows. Post-multiplying both sides of Eq. (2) by $\mathbf{q}$ gives:

$$
\begin{aligned}
\mathbf{A}^{nn}\mathbf{M}_n\mathbf{q}&=-\mathbf{A}^{ng}\mathbf{M}_g\mathbf{q}\\
\mathbf{A}^{nn}\mathbf{x}&=\mathbf{b},
\end{aligned} \tag{4}
$$

where $\mathbf{x}=\mathbf{M}_n\mathbf{q}$ and $\mathbf{b}=-\mathbf{A}^{ng}(\mathbf{M}_g\mathbf{q})$. Note that for efficient computation, the matrix $\mathbf{M}_g$ is first multiplied by $\mathbf{q}$ and the resulting vector is then premultiplied by the sparse matrix $-\mathbf{A}^{ng}$ to get $\mathbf{b}$. Solving the sparse system (4) gives the product $\mathbf{x}=\mathbf{M}_n\mathbf{q}$ without storing the large dense matrix $\mathbf{M}_n$ in memory, and premultiplying $\mathbf{x}$ by $\mathbf{Z}_n'\mathbf{Z}_n$ gives the first product that is required for PCG. To obtain the second product, note that from Eq. (2),

$$
\mathbf{M}_n'=-\mathbf{M}_g'\mathbf{A}^{gn}(\mathbf{A}^{nn})^{-1}. \tag{5}
$$

Thus, the required product $\mathbf{M}_n'\mathbf{Z}_n'\mathbf{Z}_n\mathbf{s}$ can be written as $-\mathbf{M}_g'\mathbf{A}^{gn}(\mathbf{A}^{nn})^{-1}\mathbf{Z}_n'\mathbf{Z}_n\mathbf{s}$. To compute this efficiently, first the product $\mathbf{b}=\mathbf{Z}_n'\mathbf{Z}_n\mathbf{s}$ is obtained. Then, solving the sparse system:

$$
\mathbf{A}^{nn}\mathbf{x}=\mathbf{b}, \tag{6}
$$

gives $\mathbf{x}=(\mathbf{A}^{nn})^{-1}\mathbf{Z}_n'\mathbf{Z}_n\mathbf{s}$, where $\mathbf{b}$ and $\mathbf{x}$ have been reused to denote intermediate results in these computations. Next, $\mathbf{x}$ is premultiplied by the sparse matrix $-\mathbf{A}^{gn}$ and the resulting vector is premultiplied by $\mathbf{M}_g'$ to get the second product that is required for PCG. The remaining matrix-by-vector products for PCG can be obtained directly because these matrices are stored in memory.
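The two matrix-free products of Eqs. (4) to (6) can be sketched as follows. All matrices here are small random stand-ins (hypothetical, not built from a pedigree): the algebra only requires a symmetric positive-definite $\mathbf{A}^{nn}$ and some $\mathbf{A}^{ng}$. SciPy has no built-in sparse Cholesky factorization, so a sparse LU factor is used in its place; the function names are illustrative.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix
from scipy.sparse.linalg import splu

rng = np.random.default_rng(1)
n_non, n_gen, n_mark, n_rec = 30, 6, 4, 40

# Hypothetical stand-ins for the sparse blocks A^{nn} and A^{ng} of A^{-1}.
T = rng.standard_normal((n_non, n_non)) * (rng.random((n_non, n_non)) < 0.1)
Ann = csc_matrix(T @ T.T + np.eye(n_non))        # symmetric positive definite
Ang = csc_matrix(rng.standard_normal((n_non, n_gen))
                 * (rng.random((n_non, n_gen)) < 0.2))
Mg = rng.standard_normal((n_gen, n_mark))        # dense, but small

# Incidence matrix: each record belongs to one non-genotyped animal.
animal_of_record = rng.integers(0, n_non, size=n_rec)
Zn = csr_matrix((np.ones(n_rec), (np.arange(n_rec), animal_of_record)),
                shape=(n_rec, n_non))
ZtZ = (Zn.T @ Zn).tocsr()                        # diagonal: records per animal

lu = splu(Ann)   # factor once, reuse in every round of iteration

def ZtZ_Mn_times(q):
    """Z_n'Z_n M_n q without forming M_n (Eq. 4)."""
    b = -(Ang @ (Mg @ q))      # dense M_g times q first, then sparse A^{ng}
    x = lu.solve(b)            # x = M_n q
    return ZtZ @ x

def MntZtZ_times(s):
    """M_n'Z_n'Z_n s without forming M_n (Eqs. 5 and 6)."""
    b = ZtZ @ s
    x = lu.solve(b)            # x = (A^{nn})^{-1} Z_n'Z_n s
    return -(Mg.T @ (Ang.T @ x))
```

Each call costs one sparse solve plus sparse and small dense matrix-by-vector products, which is the trade-off the text describes: extra work per iteration in exchange for never holding $\mathbf{M}_n$ in memory.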

We will now describe how these matrices and the right-hand sides involving $\mathbf{M}_n$ can be computed in order to form the other elements of the MME without storing $\mathbf{M}_n$ in memory.

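Consider computing:

$$
\mathbf{M}'\mathbf{Z}'\mathbf{Z}\mathbf{M}=\mathbf{M}_n'\mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n+\mathbf{M}_g'\mathbf{Z}_g'\mathbf{Z}_g\mathbf{M}_g,
$$

without storing $\mathbf{M}_n$ in memory. Let $\mathbf{m}_{n_i}'$ denote row $i$ of $\mathbf{Z}_n\mathbf{M}_n$. Then, $\mathbf{M}_n'\mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n$ can be written as:

$$
\mathbf{M}_n'\mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n=\sum_i\mathbf{m}_{n_i}\mathbf{m}_{n_i}'. \tag{7}
$$

Thus, the matrix product $\mathbf{M}_n'\mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n$ can be computed without storing $\mathbf{M}_n$ in memory if each row of $\mathbf{Z}_n\mathbf{M}_n$ can be obtained without computing the entire matrix. Rearranging (2), row $i$ of $\mathbf{Z}_n\mathbf{M}_n$ can be computed as:

$$
\mathbf{m}_{n_i}'=-\mathbf{e}_i'\mathbf{Z}_n(\mathbf{A}^{nn})^{-1}\mathbf{A}^{ng}\mathbf{M}_g, \tag{8}
$$

where $\mathbf{e}_i'$ is a row vector with 1 in the $i$th position and 0s elsewhere, and the product $\mathbf{e}_i'\mathbf{Z}_n(\mathbf{A}^{nn})^{-1}$ can be obtained by solving the sparse system:

$$
\mathbf{A}^{nn}\mathbf{x}=\mathbf{b}, \tag{9}
$$

where $\mathbf{b}=\mathbf{Z}_n'\mathbf{e}_i$. Note that the solution to (9) gives $\mathbf{x}'=\mathbf{e}_i'\mathbf{Z}_n(\mathbf{A}^{nn})^{-1}$, without having to invert $\mathbf{A}^{nn}$. These row vectors of $\mathbf{Z}_n\mathbf{M}_n$ can also be used to compute $\mathbf{X}_n'\mathbf{Z}_n\mathbf{M}_n$, which is the first term of

$$
\mathbf{X}'\mathbf{Z}\mathbf{M}=\mathbf{X}_n'\mathbf{Z}_n\mathbf{M}_n+\mathbf{X}_g'\mathbf{Z}_g\mathbf{M}_g,
$$

as:

$$
\mathbf{X}_n'\mathbf{Z}_n\mathbf{M}_n=\sum_i\mathbf{x}_i\mathbf{m}_{n_i}', \tag{10}
$$

where $\mathbf{x}_i$ is used here to denote the $i$th column of $\mathbf{X}_n'$. Similarly, the right-hand-side vector $\mathbf{M}'\mathbf{Z}'\mathbf{y}$ can be written as the sum:

$$
\mathbf{M}'\mathbf{Z}'\mathbf{y}=\mathbf{M}_n'\mathbf{Z}_n'\mathbf{y}_n+\mathbf{M}_g'\mathbf{Z}_g'\mathbf{y}_g,
$$

and its first term, which involves $\mathbf{M}_n$, can be computed as:

$$
\mathbf{M}_n'\mathbf{Z}_n'\mathbf{y}_n=\sum_i\mathbf{m}_{n_i}y_i, \tag{11}
$$

where $y_i$ is used to denote the $i$th element of $\mathbf{y}_n$.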

Note that computing $\mathbf{m}_{n_i}'$ corresponding to row $i$ of $\mathbf{Z}_n\mathbf{M}_n$ using Eq. (8) can be done independently of its computation for any other row, and thus, the computations in Eqs. (7), (10), and (11) can be easily parallelized.
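The accumulation in Eqs. (7), (10) and (11) can be sketched as a loop over rows of $\mathbf{Z}_n\mathbf{M}_n$, each obtained from one sparse solve as in Eqs. (8) and (9). The matrices below are small random stand-ins and the helper name `row_of_ZnMn` is hypothetical; a sparse LU factor again substitutes for a sparse Cholesky factor, and the loop is written serially although its iterations are independent.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix
from scipy.sparse.linalg import splu

rng = np.random.default_rng(2)
n_non, n_gen, n_mark, n_rec, n_fix = 20, 5, 3, 25, 2

# Hypothetical stand-ins for A^{nn}, A^{ng}, M_g, Z_n, X_n and y_n.
T = rng.standard_normal((n_non, n_non)) * (rng.random((n_non, n_non)) < 0.1)
Ann = csc_matrix(T @ T.T + np.eye(n_non))
Ang = csc_matrix(rng.standard_normal((n_non, n_gen))
                 * (rng.random((n_non, n_gen)) < 0.3))
Mg = rng.standard_normal((n_gen, n_mark))
Zn = csr_matrix((np.ones(n_rec),
                 (np.arange(n_rec), rng.integers(0, n_non, size=n_rec))),
                shape=(n_rec, n_non))
Xn = rng.standard_normal((n_rec, n_fix))
yn = rng.standard_normal(n_rec)

lu = splu(Ann)   # a single factor, shared by all row computations

def row_of_ZnMn(i):
    """Row i of Z_n M_n from Eqs. (8) and (9), returned as a 1-D array."""
    b = Zn[i].toarray().ravel()        # b = Z_n' e_i
    x = lu.solve(b)                    # x' = e_i' Z_n (A^{nn})^{-1}
    return -(Mg.T @ (Ang.T @ x))

MtZtZM = np.zeros((n_mark, n_mark))
XtZM = np.zeros((n_fix, n_mark))
MtZty = np.zeros(n_mark)
for i in range(n_rec):                 # iterations are independent
    m_i = row_of_ZnMn(i)
    MtZtZM += np.outer(m_i, m_i)       # Eq. (7)
    XtZM += np.outer(Xn[i], m_i)       # Eq. (10)
    MtZty += m_i * yn[i]               # Eq. (11)
```

Only one row of $\mathbf{Z}_n\mathbf{M}_n$ exists at a time, so memory use is governed by the small accumulators rather than by the number of non-genotyped animals.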

Several approaches can be used to compute $\mathbf{m}_{n_i}'$. When the number of non-genotyped animals is less than approximately ten million, one approach for solving Eq. (9) on a typical workstation computer is to obtain a sparse Cholesky factor of $\mathbf{A}^{nn}$ and solve directly for each $\mathbf{m}_{n_i}'$ using forward and backward substitution. Software libraries exist for obtaining a Cholesky factor of large sparse matrices [19] using multiple threads or general purpose graphics processing units (GPU). Once the factor is obtained, independent threads can be used to solve in parallel from a single memory copy of the factor. Note that the Cholesky factor of $\mathbf{A}^{nn}$ may be denser than $\mathbf{A}^{nn}$, depending upon the nature of the relationships between genotyped and non-genotyped animals. For example, if non-genotyped animals comprise only non-parents, then $\mathbf{A}^{nn}$ is diagonal and the solution for $\mathbf{m}_{n_i}'$ is trivial. For larger linear systems with non-genotyped parents where the Cholesky factor is too large, indirect solution using high performance methods on GPU is a practical alternative. The PCG algorithm parallelizes well and performs efficiently on GPU.

MCMC sampling

Gibbs sampling is a widely used MCMC method for inference with Bayesian regression models, e.g., [3, 20, 21]. One of the most time-consuming tasks in these analyses is single-site sampling of the location parameters from their full-conditional distributions. Let $\boldsymbol{\theta}'=\begin{bmatrix}\boldsymbol{\beta}' & \boldsymbol{\alpha}' & \boldsymbol{\epsilon}'\end{bmatrix}$ denote the location parameters in Model (1). Then, following [22], the full-conditional distribution for $\theta_i$ under BayesC with $\pi=0$ is:

$$
\theta_i \mid \text{ELSE} \sim N\!\left(\tilde{\theta}_i,\; c_{ii}^{-1}\sigma_e^2\right),
$$

where ELSE is used to denote all the other parameters in the model and the vector of phenotypes, $\tilde{\theta}_i$ is the solution to:

$$
c_{ii}\tilde{\theta}_i=r_i-\mathbf{c}_i'\boldsymbol{\theta}+c_{ii}\theta_i, \tag{12}
$$

$c_{ii}$ is the $i$th diagonal of the matrix $\mathbf{C}$ that denotes the LHS-MME given in (3), $r_i$ is the right-hand-side element from (3) corresponding to $\theta_i$, and $\mathbf{c}_i'$ is row $i$ of $\mathbf{C}$. As mentioned previously, some sub-matrices of the LHS-MME given in (3) are dense and too large to be stored in memory. However, as explained below, the same strategy used to avoid storing these sub-matrices in PCG calculations can also be used here.
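The single-site update of Eq. (12) can be illustrated on a small system in which the whole coefficient matrix $\mathbf{C}$ fits in memory. This is only a sketch under simplifying assumptions: the matrix `C`, right-hand side `r`, the function name and the unit residual variance are made up for illustration, variances are treated as known, and no matrix-free products are involved.

```python
import numpy as np

def gibbs_single_site(C, r, sigma2_e, n_samples, rng):
    """Single-site Gibbs sampler for theta with full conditionals as in Eq. (12).

    Returns the average of the sampled vectors, which estimates C^{-1} r.
    """
    p = len(r)
    theta = np.zeros(p)
    total = np.zeros(p)
    for _ in range(n_samples):
        for i in range(p):
            # r_i - c_i' theta + c_ii theta_i: right-hand side of Eq. (12)
            rhs = r[i] - C[i] @ theta + C[i, i] * theta[i]
            mean = rhs / C[i, i]
            sd = np.sqrt(sigma2_e / C[i, i])
            theta[i] = rng.normal(mean, sd)
        total += theta
    return total / n_samples

# Hypothetical small system: C = W'W + I, r = W'y.
rng = np.random.default_rng(0)
W = rng.standard_normal((50, 4))
y = rng.standard_normal(50)
C = W.T @ W + np.eye(4)
r = W.T @ y

posterior_mean = gibbs_single_site(C, r, sigma2_e=1.0, n_samples=5000, rng=rng)
```

With enough samples the average of the chain approaches the solution of $\mathbf{C}\boldsymbol{\theta}=\mathbf{r}$, which is why the PCG solution is a natural reference point for checking a sampler of this kind.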

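Consider computing the full conditional mean and variance for $\theta_i=\alpha_j$. Then $c_{ii}$, the $i$th diagonal element from $\mathbf{C}$, is obtained from the $j$th diagonal of $\mathbf{B}=\mathbf{M}'\mathbf{Z}'\mathbf{Z}\mathbf{M}+\mathbf{I}\frac{\sigma_e^2}{\sigma_\alpha^2}$, which can be stored in memory. Similarly, $r_i-\mathbf{c}_i'\boldsymbol{\theta}+c_{ii}\theta_i$ is computed as:

$$
r_i-\mathbf{c}_i'\boldsymbol{\theta}+c_{ii}\theta_i=d_j-\mathbf{b}_j'\boldsymbol{\alpha}+b_{jj}\alpha_j,
$$

where $d_j$ is element $j$ of the vector

$$
\mathbf{d}=\mathbf{M}'\mathbf{Z}'\mathbf{y}-\mathbf{M}'\mathbf{Z}'\mathbf{X}\boldsymbol{\beta}-\mathbf{M}_n'\mathbf{Z}_n'\mathbf{Z}_n\boldsymbol{\epsilon},
$$

$\mathbf{b}_j'$ is row $j$ and $b_{jj}$ is the $j$th diagonal of $\mathbf{B}$. We have already seen how the large, dense matrix $\mathbf{M}_n'\mathbf{Z}_n'\mathbf{Z}_n$ can be multiplied by a vector such as $\boldsymbol{\epsilon}$ without storing this matrix in memory, and this same strategy can be used here to compute $\mathbf{M}_n'\mathbf{Z}_n'\mathbf{Z}_n\boldsymbol{\epsilon}$. The full-conditional distribution for $\alpha_j$, under BayesC with $\pi=0$, becomes:

$$
\alpha_j \mid \text{ELSE} \sim N\!\left(\tilde{\alpha}_j,\; b_{jj}^{-1}\sigma_e^2\right),
$$

where $\tilde{\alpha}_j$ is the solution to:

$$
b_{jj}\tilde{\alpha}_j=d_j-\mathbf{b}_j'\boldsymbol{\alpha}+b_{jj}\alpha_j. \tag{13}
$$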

The right-hand-side of this Eq. (13) is also used for calculations that involve variable selection in BayesB and BayesC when π>0 [21, 23].

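Similarly, to compute the full-conditional mean and variance for $\theta_i=\epsilon_j$, $c_{ii}$ is obtained from the $j$th diagonal of $\mathbf{B}$ that now denotes $\mathbf{B}=\mathbf{Z}_n'\mathbf{Z}_n+\mathbf{A}^{nn}\frac{\sigma_e^2}{\sigma_g^2}$, and

$$
r_i-\mathbf{c}_i'\boldsymbol{\theta}+c_{ii}\theta_i=d_j-\mathbf{b}_j'\boldsymbol{\epsilon}+b_{jj}\epsilon_j,
$$

where $d_j$ is element $j$ of the vector that now denotes:

$$
\mathbf{d}=\mathbf{Z}_n'\mathbf{y}_n-\mathbf{Z}_n'\mathbf{X}_n\boldsymbol{\beta}-\mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n\boldsymbol{\alpha}.
$$

The product $\mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n\boldsymbol{\alpha}$ is obtained as described for PCG calculations. Then, the full-conditional distribution for $\epsilon_j$ becomes:

$$
\epsilon_j \mid \text{ELSE} \sim N\!\left(\tilde{\epsilon}_j,\; b_{jj}^{-1}\sigma_e^2\right),
$$

where $\tilde{\epsilon}_j$ is the solution to

$$
b_{jj}\tilde{\epsilon}_j=d_j-\mathbf{b}_j'\boldsymbol{\epsilon}+b_{jj}\epsilon_j.
$$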

Samples of model effects such as $\boldsymbol{\beta}$, $\boldsymbol{\alpha}$, and $\boldsymbol{\epsilon}$, or their linear functions that represent breeding values, namely $\mathbf{M}_g\boldsymbol{\alpha}$ for genotyped animals or $\mathbf{M}_n\boldsymbol{\alpha}+\boldsymbol{\epsilon}$ for non-genotyped animals, can be accumulated as sums and sums of squares to obtain posterior means and prediction error variances. Alternatively, samples of fitted model effects can be written to a file for post-processing.

Hybrid model for single-step Bayesian regression

The large, dense matrix $\mathbf{M}_n$ appears in the MEM given by Model (1). This is avoided here by using a BVM for animals with missing genotypes rather than expressing their breeding values as the sum of the effects of their imputed marker genotypes plus their separate imputation residuals. The advantages of the MEM, such as allowing for alternative priors for marker effects, are retained by still fitting a MEM but only for animals with genotypes. The hybrid model equation is:

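$$
\begin{bmatrix}\mathbf{y}_n\\ \mathbf{y}_g\end{bmatrix}
=\begin{bmatrix}\mathbf{X}_n\\ \mathbf{X}_g\end{bmatrix}\boldsymbol{\beta}
+\begin{bmatrix}\mathbf{0} & \mathbf{Z}_n\\ \mathbf{Z}_g\mathbf{M}_g & \mathbf{0}\end{bmatrix}
\begin{bmatrix}\boldsymbol{\alpha}\\ \mathbf{u}_n\end{bmatrix}
+\mathbf{e}, \tag{14}
$$

with $\mathbf{u}_n=\mathbf{M}_n\boldsymbol{\alpha}+\boldsymbol{\epsilon}$, and thus, this single-step hybrid model (SS-HM) is equivalent to the SS-MEM (1). To construct the MME for this model (14), we need to invert the covariance matrix corresponding to the random effects, namely $\boldsymbol{\Sigma}=\operatorname{Var}\begin{bmatrix}\boldsymbol{\alpha}\\ \mathbf{u}_n\end{bmatrix}$. That inverse can be obtained by first writing the random effects of (14) as:

$$
\begin{bmatrix}\boldsymbol{\alpha}\\ \mathbf{u}_n\end{bmatrix}
=\begin{bmatrix}\mathbf{I} & \mathbf{0}\\ \mathbf{M}_n & \mathbf{I}\end{bmatrix}
\begin{bmatrix}\boldsymbol{\alpha}\\ \boldsymbol{\epsilon}\end{bmatrix}.
$$

Then, $\boldsymbol{\Sigma}$ can be written as:

$$
\boldsymbol{\Sigma}
=\begin{bmatrix}\mathbf{I} & \mathbf{0}\\ \mathbf{M}_n & \mathbf{I}\end{bmatrix}
\operatorname{Var}\begin{bmatrix}\boldsymbol{\alpha}\\ \boldsymbol{\epsilon}\end{bmatrix}
\begin{bmatrix}\mathbf{I} & \mathbf{M}_n'\\ \mathbf{0} & \mathbf{I}\end{bmatrix}
=\begin{bmatrix}\mathbf{I} & \mathbf{0}\\ \mathbf{M}_n & \mathbf{I}\end{bmatrix}
\begin{bmatrix}\mathbf{I}\sigma_\alpha^2 & \mathbf{0}\\ \mathbf{0} & (\mathbf{A}^{nn})^{-1}\sigma_g^2\end{bmatrix}
\begin{bmatrix}\mathbf{I} & \mathbf{M}_n'\\ \mathbf{0} & \mathbf{I}\end{bmatrix},
$$

and its inverse can be obtained as:

$$
\begin{aligned}
\boldsymbol{\Sigma}^{-1}
&=\begin{bmatrix}\mathbf{I} & -\mathbf{M}_n'\\ \mathbf{0} & \mathbf{I}\end{bmatrix}
\begin{bmatrix}\mathbf{I}\frac{1}{\sigma_\alpha^2} & \mathbf{0}\\ \mathbf{0} & \mathbf{A}^{nn}\frac{1}{\sigma_g^2}\end{bmatrix}
\begin{bmatrix}\mathbf{I} & \mathbf{0}\\ -\mathbf{M}_n & \mathbf{I}\end{bmatrix}\\
&=\begin{bmatrix}\mathbf{I} & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\frac{1}{\sigma_\alpha^2}
+\begin{bmatrix}-\mathbf{M}_n'\\ \mathbf{I}\end{bmatrix}\mathbf{A}^{nn}\frac{1}{\sigma_g^2}
\begin{bmatrix}-\mathbf{M}_n & \mathbf{I}\end{bmatrix}\\
&=\begin{bmatrix}\mathbf{I} & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\frac{1}{\sigma_\alpha^2}
+\begin{bmatrix}\mathbf{M}_n'\mathbf{A}^{nn}\mathbf{M}_n & -\mathbf{M}_n'\mathbf{A}^{nn}\\ -\mathbf{A}^{nn}\mathbf{M}_n & \mathbf{A}^{nn}\end{bmatrix}\frac{1}{\sigma_g^2}.
\end{aligned}
$$

Now, using the result $\mathbf{M}_n=\mathbf{A}_{ng}\mathbf{A}_{gg}^{-1}\mathbf{M}_g=-(\mathbf{A}^{nn})^{-1}\mathbf{A}^{ng}\mathbf{M}_g$ [10] in the off-diagonal blocks of the second term, $\boldsymbol{\Sigma}^{-1}$ becomes:

$$
\boldsymbol{\Sigma}^{-1}
=\begin{bmatrix}
\mathbf{I}\frac{1}{\sigma_\alpha^2}+\mathbf{M}_n'\mathbf{A}^{nn}\mathbf{M}_n\frac{1}{\sigma_g^2} & \mathbf{M}_g'\mathbf{A}^{gn}\frac{1}{\sigma_g^2}\\
\mathbf{A}^{ng}\mathbf{M}_g\frac{1}{\sigma_g^2} & \mathbf{A}^{nn}\frac{1}{\sigma_g^2}
\end{bmatrix},
$$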

and then the MME for the HM (14) can be written as:

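$$
\begin{bmatrix}
\mathbf{X}'\mathbf{X} & \mathbf{X}_g'\mathbf{Z}_g\mathbf{M}_g & \mathbf{X}_n'\mathbf{Z}_n\\
\mathbf{M}_g'\mathbf{Z}_g'\mathbf{X}_g & \mathbf{Q} & \mathbf{M}_g'\mathbf{A}^{gn}\frac{\sigma_e^2}{\sigma_g^2}\\
\mathbf{Z}_n'\mathbf{X}_n & \mathbf{A}^{ng}\mathbf{M}_g\frac{\sigma_e^2}{\sigma_g^2} & \mathbf{Z}_n'\mathbf{Z}_n+\mathbf{A}^{nn}\frac{\sigma_e^2}{\sigma_g^2}
\end{bmatrix}
\begin{bmatrix}\hat{\boldsymbol{\beta}}\\ \hat{\boldsymbol{\alpha}}\\ \hat{\mathbf{u}}_n\end{bmatrix}
=\begin{bmatrix}\mathbf{X}'\mathbf{y}\\ \mathbf{M}_g'\mathbf{Z}_g'\mathbf{y}_g\\ \mathbf{Z}_n'\mathbf{y}_n\end{bmatrix}, \tag{15}
$$

where $\mathbf{Q}=\mathbf{M}_g'\mathbf{Z}_g'\mathbf{Z}_g\mathbf{M}_g+\mathbf{I}\frac{\sigma_e^2}{\sigma_\alpha^2}+\mathbf{M}_n'\mathbf{A}^{nn}\mathbf{M}_n\frac{\sigma_e^2}{\sigma_g^2}$. These equations involve $\mathbf{M}_g$ rather than $\mathbf{M}_n$, except in $\mathbf{Q}$, the diagonal block that corresponds to $\hat{\boldsymbol{\alpha}}$, which has dimension equal to the number of marker covariates, often less than 50,000, regardless of the number of genotyped or non-genotyped animals. Furthermore, we assume here that $\mathbf{X}_g'\mathbf{Z}_g\mathbf{M}_g$ and $\mathbf{M}_g'\mathbf{Z}_g'\mathbf{X}_g$ are small enough to be stored in memory.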
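The closed form for $\boldsymbol{\Sigma}^{-1}$ derived above can be checked numerically on a small example. In this sketch a random symmetric positive-definite matrix stands in for $\mathbf{A}$ (an assumption made only to keep the code short, since the block identities hold for any such matrix), animals are ordered with the non-genotyped ones first, and the variable names and variance values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, g, k = 5, 3, 2   # non-genotyped animals, genotyped animals, markers

# Hypothetical SPD stand-in for A, ordered as [non-genotyped, genotyped].
B = rng.standard_normal((n + g, n + g))
A = B @ B.T + (n + g) * np.eye(n + g)
Ainv = np.linalg.inv(A)

Ann_sup, Ang_sup = Ainv[:n, :n], Ainv[:n, n:]   # blocks of A^{-1}
Ang_sub, Agg_sub = A[:n, n:], A[n:, n:]         # blocks of A

Mg = rng.standard_normal((g, k))
Mn = Ang_sub @ np.linalg.solve(Agg_sub, Mg)     # M_n = A_ng A_gg^{-1} M_g
s2a, s2g = 0.3, 2.0                             # made-up variances

# Sigma = L D L', as in the factorization of Var[alpha; u_n].
L = np.block([[np.eye(k), np.zeros((k, n))], [Mn, np.eye(n)]])
D = np.block([[s2a * np.eye(k), np.zeros((k, n))],
              [np.zeros((n, k)), s2g * np.linalg.inv(Ann_sup)]])
Sigma = L @ D @ L.T

# Closed-form inverse, with off-diagonal blocks expressed through M_g.
Sigma_inv = np.block([
    [np.eye(k) / s2a + Mn.T @ Ann_sup @ Mn / s2g, Mg.T @ Ang_sup.T / s2g],
    [Ang_sup @ Mg / s2g, Ann_sup / s2g]])
```

Multiplying `Sigma` by `Sigma_inv` returns the identity matrix to numerical precision, which confirms that the off-diagonal blocks can be written with $\mathbf{M}_g$ and blocks of $\mathbf{A}^{-1}$ alone.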

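The only difference between Eq. (15) and the MME given by Equation (A1) in Legarra and Ducrocq [15] is in $\mathbf{Q}$. Using the notation in this paper, the matrix expression $\mathbf{M}_n'\mathbf{A}^{nn}\mathbf{M}_n$ that is present in $\mathbf{Q}$ is expressed as $\mathbf{M}_g'(\mathbf{A}^{gg}-\mathbf{A}_{gg}^{-1})\mathbf{M}_g$ in that paper [15], which involves the inverse of $\mathbf{A}_{gg}$ that is difficult to compute. However, these two expressions are identical because $(\mathbf{A}^{gg}-\mathbf{A}_{gg}^{-1})=\mathbf{A}^{gn}(\mathbf{A}^{nn})^{-1}\mathbf{A}^{ng}$ and $\mathbf{M}_n=-(\mathbf{A}^{nn})^{-1}\mathbf{A}^{ng}\mathbf{M}_g$.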
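Written out step by step, and using only the two identities quoted in the preceding paragraph together with the symmetry of $\mathbf{A}^{nn}$, the equality of the two expressions follows as:

```latex
\[
\begin{aligned}
\mathbf{M}_n'\mathbf{A}^{nn}\mathbf{M}_n
 &= \left[-\mathbf{M}_g'\mathbf{A}^{gn}(\mathbf{A}^{nn})^{-1}\right]
    \mathbf{A}^{nn}
    \left[-(\mathbf{A}^{nn})^{-1}\mathbf{A}^{ng}\mathbf{M}_g\right] \\
 &= \mathbf{M}_g'\mathbf{A}^{gn}(\mathbf{A}^{nn})^{-1}\mathbf{A}^{ng}\mathbf{M}_g \\
 &= \mathbf{M}_g'\left(\mathbf{A}^{gg}-\mathbf{A}_{gg}^{-1}\right)\mathbf{M}_g .
\end{aligned}
\]
```

The two minus signs cancel in the first line, and the last line substitutes $\mathbf{A}^{gn}(\mathbf{A}^{nn})^{-1}\mathbf{A}^{ng}=\mathbf{A}^{gg}-\mathbf{A}_{gg}^{-1}$.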

Computing strategies

The matrix $\mathbf{M}_n$ of imputed genotypes does not appear alone in the MME (15); rather, the MME involve the matrix product $\mathbf{M}_n'\mathbf{A}^{nn}\mathbf{M}_n$. However, $\mathbf{M}_n'\mathbf{A}^{nn}\mathbf{M}_n$ can be computed efficiently without needing to store the entire $\mathbf{M}_n$ matrix in memory, in situations when the number of genotyped animals is less than the number of non-genotyped animals. To do so, first note that from Eq. (5), $\mathbf{M}_n'\mathbf{A}^{nn}=-\mathbf{M}_g'\mathbf{A}^{gn}$. Next, column $i$ of $\mathbf{M}_n$ is obtained by solving the sparse system (2) for column $i$, and is then premultiplied by the sparse matrix $\mathbf{A}^{gn}$. This gives column $i$ of the product $\mathbf{A}^{gn}\mathbf{M}_n$, which has the same size as $\mathbf{M}_g$. The columns of $\mathbf{A}^{gn}\mathbf{M}_n$ can be computed one at a time or in parallel. Premultiplying $\mathbf{A}^{gn}\mathbf{M}_n$ by $-\mathbf{M}_g'$ gives:

$$
-\mathbf{M}_g'\mathbf{A}^{gn}\mathbf{M}_n=\mathbf{M}_n'\mathbf{A}^{nn}\mathbf{M}_n. \tag{16}
$$

This product needs to be computed only once to set up the MME, and its order equals the number of marker covariates, which is often much less than the number of genotyped or non-genotyped animals.
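A column-by-column computation of Eq. (16) might look like the sketch below. The sparse blocks are random stand-ins (hypothetical), the function name is illustrative, and a sparse LU factor is used where a sparse Cholesky factor would normally be preferred.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

rng = np.random.default_rng(3)
n_non, n_gen, n_mark = 25, 6, 4

# Hypothetical stand-ins for the sparse blocks A^{nn} and A^{ng} of A^{-1}.
T = rng.standard_normal((n_non, n_non)) * (rng.random((n_non, n_non)) < 0.1)
Ann = csc_matrix(T @ T.T + np.eye(n_non))
Ang = csc_matrix(rng.standard_normal((n_non, n_gen))
                 * (rng.random((n_non, n_gen)) < 0.3))
Mg = rng.standard_normal((n_gen, n_mark))

lu = splu(Ann)

def MnAnnMn_by_columns():
    """M_n' A^{nn} M_n via Eq. (16), holding one column of M_n at a time."""
    AgnMn = np.empty((n_gen, n_mark))       # same size as M_g
    for j in range(n_mark):                 # columns are independent
        mn_j = lu.solve(-(Ang @ Mg[:, j]))  # column j of M_n, from Eq. (2)
        AgnMn[:, j] = Ang.T @ mn_j          # column j of A^{gn} M_n
    return -(Mg.T @ AgnMn)

Q_part = MnAnnMn_by_columns()
```

The largest intermediate is `AgnMn`, whose size matches $\mathbf{M}_g$, so the memory footprint depends on the number of genotyped animals and markers rather than on the number of non-genotyped animals.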

These MME also contain two large, dense sub-matrices, namely $\mathbf{A}^{ng}\mathbf{M}_g$ and its transpose. As described previously, in the PCG iteration and in the Gibbs sampling, these matrices need to be post-multiplied by a vector. When the number of genotyped animals is sufficiently smaller than the number of non-genotyped animals, these matrix-by-vector products can be obtained more efficiently by storing in memory the sparse matrix $\mathbf{A}^{ng}$ and the dense but smaller matrix $\mathbf{M}_g$ rather than their product $\mathbf{A}^{ng}\mathbf{M}_g$. In each round of PCG iteration or Gibbs sampling, the matrix-by-vector product $\mathbf{A}^{ng}\mathbf{M}_g\mathbf{q}$, for example, is obtained by first multiplying the dense matrix $\mathbf{M}_g$ by the vector $\mathbf{q}$ and then premultiplying the result by the sparse matrix $\mathbf{A}^{ng}$. The corresponding calculation for the MEM required solving sparse systems of equations given by Eq. (4) in each round of PCG or Gibbs sampling, in addition to the two matrix-by-vector multiplications that are also required here.
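The ordering of operations described above can be sketched in a few lines. The matrices are random stand-ins and the function names are hypothetical; the final two lines simply multiply the dimensions reported in the application section to show how large the explicit product would be.

```python
import numpy as np
from scipy.sparse import csc_matrix

rng = np.random.default_rng(5)
n_non, n_gen, n_mark = 200, 10, 8

# Hypothetical stand-ins: sparse A^{ng} and dense, smaller M_g.
Ang = csc_matrix(rng.standard_normal((n_non, n_gen))
                 * (rng.random((n_non, n_gen)) < 0.05))
Mg = rng.standard_normal((n_gen, n_mark))

def AngMg_times(q):
    """A^{ng} M_g q: dense M_g times q first, then the sparse A^{ng}."""
    return Ang @ (Mg @ q)

def MgtAgn_times(s):
    """M_g' A^{gn} s, using A^{gn} = (A^{ng})'."""
    return Mg.T @ (Ang.T @ s)

# Numbers held in memory: the two factors versus the explicit product.
stored_in_parts = Ang.nnz + Mg.size
stored_as_product = n_non * n_mark

# Entries of the explicit product for the dimensions of the application:
# 6,148,507 non-genotyped animals by 40,214 markers.
paper_entries = 6_148_507 * 40_214
paper_gigabytes_single = paper_entries * 4 / 1e9
```

No linear system is solved in either product, in contrast to the corresponding products for the MEM, and for dimensions like those of the application the explicit product would need on the order of a terabyte in single precision.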

In situations when the number of genotyped animals exceeds the number of non-genotyped animals, using SS-MEM that explicitly involves $\mathbf{M}_n$ in off-diagonal blocks may be competitive with SS-HM.

Application of hybrid model

An example dataset from the American Simmental Association is used to demonstrate the computing effort to obtain PCG solutions from (15) and the relative computing effort to obtain MCMC samples for the MME of (15) compared to (3). The vector of phenotypes comprised 4,934,101 birth weight observations; there were 6,179,960 animals in the pedigree file; 31,453 animals in the pedigree file were genotyped and 23,290 of those had birth weight observations. After filtering marker covariates for low minor allele frequency, 40,214 marker effects were included in the model. There were 399,036 fixed effects, including the herd-year-season effects defined in the same manner as in the routine national evaluation. To keep this comparison of the computational effort involved in fitting (3) and (15) simple, our application was limited to a single trait, ignoring maternal genetic and permanent environmental effects. Furthermore, we did not include a comparison with SS-GBLUP since that model cannot accommodate mixture priors for marker effects as used in this example.

The analyses were performed using a workstation built on an ASUS X99E WS motherboard, a Xeon E5-1650V3 3.5 GHz processor overclocked to 4.2 GHz, 128 GB of DDR4 ECC RAM at 2133 MHz and four NVidia Titan X GPU, with 9 TB of workspace in a RAID5 configuration comprising four SATA disks. The operating system was Ubuntu 14.04 LTS, and the BOLT software package (http://manual.thetasolutionsllc.com/IntroBolt) built with the CUDA Toolkit 7.5 was used.

The vector $\mathbf{y}$ and the matrices $\mathbf{X}$, $\mathbf{Z}$, $\mathbf{A}^{nn}$, $\mathbf{A}^{gn}$, $\mathbf{A}^{ng}$, and $\mathbf{M}_g$ were built from data files using BOLT tools. Ordering the pedigree file, construction of $\mathbf{A}^{-1}$, including calculation of inbreeding, and its partitioning into blocks representing genotyped and non-genotyped animals took 3 min and required 1.0 Gb of disk storage and 302 Mb of memory. While $\mathbf{A}^{-1}$ was being formed, $\mathbf{y}$, $\mathbf{X}$ and $\mathbf{Z}$ were created in about 10 s and required 38, 78 and 83 Mb of disk storage. When stored in memory, they required 19.7, 59.2 and 59.2 Mb respectively. The matrix $\mathbf{M}_g$ required 4.2 Gb of disk and memory when stored in single precision.

The matrix product $\mathbf{X}_g'\mathbf{Z}_g\mathbf{M}_g$ and its transpose $\mathbf{M}_g'\mathbf{Z}_g'\mathbf{X}_g$ were not explicitly formed; instead, computations involving those terms were done in parts as described previously in this paper. The sparse Cholesky decomposition of the $\mathbf{A}^{nn}$ matrix of order 6,148,507 took just under 4 min. The imputation of $\mathbf{M}_n$, using forward and backward substitution with the Cholesky factor, and its premultiplication by $\mathbf{A}^{gn}$ took just over 35 min using eight parallel processes. The creation of the matrix products $\mathbf{M}_g'\mathbf{Z}_g'\mathbf{Z}_g\mathbf{M}_g$ and $\mathbf{M}_n'\mathbf{A}^{nn}\mathbf{M}_n$ each took about 20 s using 2 GPU after obtaining the imputed values and required 6.2 Gb of disk storage and memory when stored in single precision.

For the analysis using Eq. (15), the PCG solution of the MME stored in double precision took just under 40 min, using a single GPU and diagonal preconditioning. Because the PCG was performed in double precision, just under 18 Gb of memory was required to store all the sub-matrices comprising the left-hand side. The right-hand side required 53 Mb of memory. Additional memory for work space of approximately 4 Gb was required for PCG. Convergence was determined by comparing solutions from every 200 rounds of iteration to solutions from 5000 rounds. By 1800 rounds the correlation and regression of solutions with those from 5000 rounds were very close to one (.99 each). The PCG residual value was near 1.1e-05. The PCG solution does not give the posterior mean of the marker effects for a model with mixture priors; it was used to define starting values for MCMC sampling of all the effects in the MME, with a mixture prior for marker effects.

Starting with the same PCG solution but different random number generator seed values, using 4 parallel chains each drawing 10,500 samples on its own GPU took 70 min to obtain a total of 42,000 Gibbs samples, using π=0.95 and known variance ratios in Eqs. (3) and (15). Each of the parallel Gibbs Sampler jobs shared a single copy in shared memory of the left-hand-side matrices, reducing the memory requirements and reading from disk.

Experience has shown that 40,000 samples after burn-in is sufficient to obtain posterior means of breeding values that are stable for the hybrid model and MEM. However, we confirmed this by sampling four additional parallel chains, each with a length of 250,000 samples after a 5000-sample burn-in. The purpose of these very long chains was to confirm that the 40,000 length post-burn-in chain was sufficient, thus supporting the timings provided here to achieve useful results. The correlation of the posterior means of the breeding values for genotyped animals and non-genotyped animals from the aggregated 40,000 length chain was .99 and 1.0, respectively, with the posterior means from the aggregated 1,000,000 length chain. However, a chain longer than 40,000 may be needed to accurately estimate PEV for animals with intermediate to low accuracies. Because only off-diagonal blocks of the left-hand side are used in the GPU computation for updating the right-hand-sides for each vector of single-site Gibbs samples for $\boldsymbol{\beta}$, $\boldsymbol{\alpha}$ or $\mathbf{u}_n$, and the Gibbs samples were obtained using single precision, the entire left-hand side without the diagonal blocks fit on the GPU. This strategy also allowed the GPU to asynchronously update the right-hand-side while the next set of effects was being sampled using the CPU.

The Gibbs sampler was performed using single precision for storage of the left- and right-hand sides, requiring approximately half as much memory as the 18 Gb required for the PCG which was performed in double precision. Additional memory for work space of approximately 2 Gb was required for the Gibbs sampler. The total time required to assemble the left- and right-hand sides, after the matrix components were formed, was just under 3 min. The total job time for all steps, starting with the raw data, to obtain posterior mean estimates of the MCMC samples of marker effects and MCMC samples of breeding values of the genotyped and non-genotyped animals and their prediction error variances (from the posterior variances of their MCMC samples), took approximately 3 h.

The memory required to store $\mathbf{M}_g$ is determined by the product of the number of animals genotyped and the number of marker covariates. A compressed dense format (CBRC) allows this matrix to be 32 times larger than with the double precision version used above, but increased the computing time for PCG in this example by 25%.

An additional Gibbs sampler run was made with the MME of (3) that used Eq. (4), which requires within each iteration, forward and backward solves using the factor of Ann. The time required to obtain one sample of all effects was 2.0 s. Using the MME of (15) required only 0.44 s for each sample of all effects. Accordingly, the hybrid model has considerable advantage over that of [10]. These two computing approaches should give the same estimates of breeding values as they represent equivalent models as explained in the theory section. The correlations between the MCMC-derived estimates of breeding values between the two approaches were 1.0 for non-genotyped animals and over 0.99 for genotyped animals.

Computational performance of Eq. (15) was compared ignoring the genotypes on approximately half the genotyped animals to demonstrate the effect of the proportion of genotyped animals on computing time. This reduced dataset left 15,694 animals with genotype information of which 11,683 had a birth weight observation. The total number of animals in the pedigree file and number of observations on birth weight were the same as before. After filtering the marker covariates for low minor allele frequency, 40,211 marker loci remained. The time necessary to complete the PCG solver was about 3 min less than the 27 min needed for the larger analysis, which had approximately double the number of genotyped animals. The reduction in time necessary to complete the PCG solver was primarily due to reductions in time used for matrix multiplications involving the smaller matrix $\mathbf{M}_g$. The time necessary to obtain the 42,000 Gibbs samples was reduced by about 20% to 1 h. Imputation required 24 instead of 35 min. Creating $\mathbf{M}_g'\mathbf{Z}_g'\mathbf{Z}_g\mathbf{M}_g$ and $\mathbf{M}_n'\mathbf{A}^{nn}\mathbf{M}_n$ required just under 20 s, the same as before. Thus, doubling the proportion of genotyped animals increased the total job time from about 2.5 to 3 h.

Discussion

Fernando et al. [10] introduced a single-step MEM that is equivalent to SS-GBLUP in the special case when all markers are fitted in the model. It has the advantage compared to SS-GBLUP that it can accommodate a wider class of models with different priors including mixture distributions. However, the MME corresponding to that model include large, dense off-diagonal sub-matrices, $\mathbf{Z}_n'\mathbf{Z}_n\mathbf{M}_n$ and its transpose, between the blocks for marker effects and the imputation residuals. These sub-matrices are prohibitively large from a storage and computational viewpoint when there is a large number of non-genotyped animals. We have shown here that these limitations can be circumvented by representing those sub-matrices as:

$$
-\mathbf{Z}_n'\mathbf{Z}_n(\mathbf{A}^{nn})^{-1}\mathbf{A}^{ng}\mathbf{M}_g
$$

and its transpose, and by doing matrix multiplication in parts. This is possible for large problems but requires repeated solutions of an equation of the form $\mathbf{A}^{nn}\mathbf{x}=\mathbf{b}$. A similar solution is used in every iteration of SS-GBLUP when an APY inverse is exploited [18]. Nevertheless, the model in [10] is practical for realistic problems as demonstrated. It does not require approximations [18] as in SS-GBLUP when large numbers of animals are genotyped.

Here we have introduced a single-step HM that is equivalent to the MEM in [10] and in a special case equivalent to SS-GBLUP. The HM includes marker effects and breeding values, and the off-diagonal sub-matrices comprise the term $\mathbf{A}^{ng}\mathbf{M}_g$. Computations that involve this sub-matrix can be done efficiently in parts without having to solve equations of the form $\mathbf{A}^{nn}\mathbf{x}=\mathbf{b}$.

The off-diagonal sub-matrices in both these models are the same size, the lower off-diagonal matrix being of the order of the number of non-genotyped animals by the number of markers. In SS-HM, this sub-matrix has a more convenient structure for storage and computation than is generally the case for SS-MEM. We have demonstrated its practicality and its considerable reduction in computing effort. This model can be readily extended to accommodate multiple traits, multiple breeds, maternal effects, and additional random effects such as polygenic residual effects.

Authors' contributions

DJG developed the computational strategies for the MEM and also proposed the HM. RLF, HC and DJG derived the MME for the HM. BLG wrote the software and performed all the data analyses. All authors read and approved the final manuscript.

Acknowledgements

We are grateful to the reviewers of this manuscript for several helpful suggestions and for pointing out the relationship between the MME for the hybrid model (15) and the MME in [A1] of Legarra and Ducrocq [15]. The authors thank the American Simmental Association for sharing their data. This work was supported in part by the US Department of Agriculture, Agriculture and Food Research Initiative National Institute of Food and Agriculture Competitive Grant No. 2015-67015-22947.

Competing interests

DJG and BLG are partners in Theta Solutions LLC that developed the BOLT software used in this paper. The authors declare that they have no competing interests.

Contributor Information

Rohan L. Fernando, Email: rohan@iastate.edu

Hao Cheng, Email: haocheng@iastate.edu

Bruce L. Golden, Email: dorian@iastate.edu

Dorian J. Garrick, Email: dorian@iastate.edu

References

  • 1. Strandén I, Garrick DJ. Technical note: Derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit. J Dairy Sci. 2009;92:2971–2975. doi: 10.3168/jds.2008-1929.
  • 2. Fernando RL. Genetic evaluation and selection using genotypic, phenotypic and pedigree information. In: Proceedings of the 6th World Congress on Genetics Applied to Livestock Production: 11–16 January 1998. vol. 26. Armidale; 1998. pp. 329–36.
  • 3. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–1829. doi: 10.1093/genetics/157.4.1819.
  • 4. Nejati-Javaremi A, Smith C, Gibson JP. Effect of total allelic relationship on accuracy of evaluation and response to selection. J Anim Sci. 1997;75:1738–1745. doi: 10.2527/1997.7571738x.
  • 5. Habier D, Fernando RL, Dekkers JCM. The impact of genetic relationship information on genome-assisted breeding values. Genetics. 2007;177:2389–2397. doi: 10.1534/genetics.107.081190.
  • 6. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–4423. doi: 10.3168/jds.2007-0980.
  • 7. Legarra A, Aguilar I, Misztal I. A relationship matrix including full pedigree and genomic information. J Dairy Sci. 2009;92:4656–4663. doi: 10.3168/jds.2009-2061.
  • 8. Christensen OF, Lund MS. Genomic prediction when some animals are not genotyped. Genet Sel Evol. 2010;42:2. doi: 10.1186/1297-9686-42-2.
  • 9. Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci. 2010;93:743–752. doi: 10.3168/jds.2009-2730.
  • 10. Fernando RL, Dekkers JCM, Garrick DJ. A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genet Sel Evol. 2014;46:50. doi: 10.1186/1297-9686-46-50.
  • 11. Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando R. Additive genetic variability and the Bayesian alphabet. Genetics. 2009;183:347–363. doi: 10.1534/genetics.109.103952.
  • 12. de los Campos G, Naya H, Gianola D, Crossa J, Legarra A, Manfredi E, et al. Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics. 2009;182:375–385. doi: 10.1534/genetics.109.101501.
  • 13. Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the Bayesian alphabet for genomic selection. In: Proceedings of the 9th World Congress on Genetics Applied to Livestock Production: 1–6 August 2010. Leipzig; 2010.
  • 14. Liu Z, Goddard ME, Reinhardt F, Reents R. A single-step genomic model with direct estimation of marker effects. J Dairy Sci. 2014;97:5833–5850. doi: 10.3168/jds.2014-7924.
  • 15. Legarra A, Ducrocq V. Computational strategies for national integration of phenotypic, genomic, and pedigree data in a single-step best linear unbiased prediction. J Dairy Sci. 2012;95:4629–4645. doi: 10.3168/jds.2011-4982.
  • 16. Vitezica ZG, Aguilar I, Misztal I, Legarra A. Bias in genomic predictions for populations under selection. Genet Res (Camb). 2011;93:357–366. doi: 10.1017/S001667231100022X.
  • 17. Strandén I, Lidauer M. Solving large mixed linear models using preconditioned conjugate gradient iteration. J Dairy Sci. 1999;82:2779–2787. doi: 10.3168/jds.S0022-0302(99)75535-9.
  • 18. Masuda Y, Misztal I, Tsuruta S, Legarra A, Aguilar I, Lourenco DAL, et al. Implementation of genomic recursions in single-step genomic best linear unbiased predictor for US Holsteins with a large number of genotyped animals. J Dairy Sci. 2016;99:1968–1974. doi: 10.3168/jds.2015-10540.
  • 19. Chen Y, Davis TA, Hager WW, Rajamanickam S. Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Trans Math Softw. 2008;35:22. doi: 10.1145/1391989.1391995.
  • 20. Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics. 2011;12:186. doi: 10.1186/1471-2105-12-186.
  • 21. Fernando R, Garrick D. Bayesian methods applied to GWAS. In: Gondro C, van der Werf J, Hayes B, editors. Genome-wide association studies and genomic prediction. New York: Humana Press; 2013.
  • 22. Sorensen DA, Gianola D. Likelihood, Bayesian, and MCMC methods in quantitative genetics. New York: Springer; 2002.
  • 23. Cheng H, Qu L, Garrick DJ, Fernando RL. A fast and efficient Gibbs sampler for BayesB in whole-genome analyses. Genet Sel Evol. 2015;47:80. doi: 10.1186/s12711-015-0157-x.

Articles from Genetics, Selection, Evolution : GSE are provided here courtesy of BMC
