Multi-allelic haplotype model based on genetic partition for genomic prediction and variance component estimation using SNP markers

Yang Da

doi:10.1186/s12863-015-0301-1

2015 Dec 18;16:144. doi: 10.1186/s12863-015-0301-1

PMCID: PMC4683770 PMID: 26678438

Abstract

Background

The amount of functional genomic information has been growing rapidly but remains largely unused in genomic selection. Genomic prediction and estimation using haplotypes in genome regions with functional elements such as all genes of the genome can be an approach to integrate functional and structural genomic information for genomic selection. Towards this goal, this article develops a new haplotype approach for genomic prediction and estimation.

Results

A multi-allelic haplotype model treating each haplotype as an ‘allele’ was developed for genomic prediction and estimation based on the partition of a multi-allelic genotypic value into additive and dominance values. Each additive value is expressed as a function of h − 1 additive effects, where h = number of alleles or haplotypes, and each dominance value is expressed as a function of h(h − 1)/2 dominance effects. For a sample of q individuals, the upper limit on the number of effects is 2q − 1 for additive effects and is the number of heterozygous genotypes for dominance effects. Additive values are factorized as a product between the additive model matrix and the h − 1 additive effects, and dominance values are factorized as a product between the dominance model matrix and the h(h − 1)/2 dominance effects. The genomic additive relationship matrix is defined as a function of the haplotype model matrix for additive effects, and the genomic dominance relationship matrix is defined as a function of the haplotype model matrix for dominance effects. Based on these results, a mixed model implementation for genomic prediction and variance component estimation that jointly uses haplotypes and single markers is established, including two computing strategies for genomic prediction and variance component estimation with identical results.

Conclusion

The multi-allelic genetic partition fills a theoretical gap in genetic partition by providing general formulations for partitioning multi-allelic genotypic values and provides a haplotype method based on the quantitative genetics model towards the utilization of functional and structural genomic information for genomic prediction and estimation.

Keywords: Haplotype, Genomic selection, Variance component, Heritability, BLUP, REML

Background

Genomic best linear unbiased prediction (GBLUP) using genome-wide single nucleotide polymorphism (SNP) markers can utilize a wealth of theoretical results and computational strategies of best linear unbiased prediction (BLUP) [1], which has become a standard approach for genetic evaluation, with dairy cattle having the most widespread use of BLUP worldwide [2–5]. The implementation of GBLUP within the BLUP framework is made possible by a genomic relationship matrix that replaces the pedigree relationship matrix in BLUP [6]. With the genomic relationship matrix established, genomic estimation of variance components can also readily use the method of restricted maximum likelihood estimation (REML) [7], to be referred to as GREML (genomic REML). Using a quantitative genetics model as the unifying model, the genomic relationship matrix is formulated by equating the covariance of genomic values between two individuals to the corresponding pedigree covariance [8, 9]. Previously defined genomic relationships based on standardization of SNP coding [6, 8, 10, 11] can be considered as special cases of this unifying approach [9]. The quantitative genetics model partitions a genotypic value as the summation of a common mean, breeding value and dominance deviation [12–18]. Using matrix notations, this partition can be expressed as: g = 1μ + a + d = 1μ + W_αα + W_δδ, where μ = common mean, 1 = column vector of 1’s, a = breeding values (additive values), d = dominance deviations (dominance values), α = SNP additive effects, δ = SNP dominance effects, W_α = model matrix of α as a function of SNP allele frequencies, and W_δ = model matrix of δ as a function of SNP allele frequencies. With the factorization of a = W_αα and d = W_δδ, genomic additive relationship is a function of W_αW_α' and genomic dominance relationship is a function of W_δW_δ' [9]. This approach for defining genomic relationships was only available for bi-allelic loci.
Although SNPs are bi-allelic loci, the issue of multi-allelic loci for genomic prediction and estimation arises if each haplotype is treated as an ‘allele’ and the haplotype block containing the haplotypes is treated as a ‘locus’. For a multi-allelic locus, the partition of a genotypic value into additive and dominance values (g = 1μ + a + d) was available [17] and the multi-allelic factorization of a = W_αα and d = W_δδ was available for three alleles [19]. However, general factorization formulations for an arbitrary number of alleles were unavailable, and a method using such multi-allelic haplotype model for genomic prediction and estimation was unavailable.

Haplotype analysis is advantageous over single-locus analysis for several reasons: a haplotype is a functional unit [20], a haplotype contains combined effects of tightly linked cis-acting causal variants [21, 22], a phenotype is affected by multiple causal loci with weak LD (LD = linkage disequilibrium) [23], or a genomic region is subjected to selection with stronger LD than genome regions unaffected by selection [24, 25]. Haplotype analysis has been widely used in genetic and genomic studies [22, 26–28]. Relatively limited studies were available on using haplotypes compared to the literature on using single SNPs for genomic prediction. Methods to define haplotype blocks for genomic prediction included a constant number of SNPs per SNP block [29, 30], fixed block length [31], or LD blocks [32]. Haplotype coding methods for genomic prediction and estimation included 2-1-0 copies of a haplotype in the two-haplotype genotype [30, 33], or maternal or paternal haplotype [29]. Haplotype mixed model methods based on the quantitative genetics model with multi-allelic factorization of additive and dominance values were unavailable for genomic prediction and estimation. Functional genomic information has been growing rapidly but remains largely unused in genomic selection. Simulation study showed that genomic prediction using causal mutations could substantially improve prediction accuracy [34], and using SNPs in transcriptional regions [35] or location specific priors based on QTL mapping results [36] improved prediction accuracy. Haplotype analysis can be a useful tool to account for joint allelic effects unaccounted for by single-SNP analysis and we have obtained encouraging preliminary results of using haplotype analysis of functional genomic information [37, 38].

The purpose of this article is to develop a quantitative genetics based multi-allelic haplotype model as an alternative method to single-SNP analysis towards the integration of functional and structural genomic information for genomic selection. This development includes deriving general multi-allelic partition of genotypic values with factorization for defining genomic relationships using haplotypes, and deriving mixed model formulations for genomic prediction and estimation that can use haplotypes separately or jointly with single SNPs.

Methods

Allelic mean and population mean of multi-allelic genotypic values

A set of m SNP markers is assumed available, and r haplotype blocks are defined from some of the m SNPs across the genome. Each haplotype block is treated as a ‘locus’ and each haplotype within the haplotype block is treated as an ‘allele’. Each locus (haplotype block) is assumed to have h alleles (haplotypes) denoted by A_1, …, A_h, with allele frequency of p_i for A_i, i = 1, …, h, and ∑_{i=1}^{h} p_i = 1. The allelic array in the population is ∑_{i=1}^{h} p_iA_i. Let P_ij = frequency of A_iA_j genotype, ∑_{i=1}^{h} ∑_{j=1}^{h} P_ijA_iA_j = the genotypic array of the population, and g_ij = genotypic value of A_iA_j genotype, i,j = 1,…,h. Hardy-Weinberg equilibrium (HWE) is assumed so that the genotypic array of the population is the squared allelic array, i.e., ∑_{i=1}^{h} ∑_{j=1}^{h} P_ijA_iA_j = (∑_{i=1}^{h} p_iA_i)². Allele frequency of A_i is calculated as:

p_{i} = P_{i i} + \frac{1}{2} \sum_{j \neq i}^{h} P_{i j}

The allelic mean of A_i allele is the weighted mean of all genotypic values with the A_i allele, with each genotypic value weighted by the number of copies of the A_i allele the genotype carries. The general expression of the allelic mean without requiring HWE is a conditional mean [13] and simplifies to a weighted average of genotypic values with allele frequencies as the weights under the HWE assumption [13, 17], i.e.,

μ_{i} = [2 P_{i i} g_{i i} + \sum_{j \neq i}^{h} P_{i j} g_{i j}] / [2 P_{i i} + \sum_{j \neq i}^{h} P_{i j}] = \sum_{j = 1}^{h} p_{j} g_{i j}

The population mean is the mean of all genotypic values in the population. The general formula without requiring HWE and its expression as a weighted average of allelic means with allele frequencies as the weights requiring HWE are:

μ = \sum_{i = 1}^{h} \sum_{j = 1}^{h} P_{i j} g_{i j} = \sum_{i = 1}^{h} p_{i}^{2} g_{i i} + 2 \sum_{i = 1}^{h - 1} \sum_{j = i + 1}^{h} p_{i} p_{j} g_{i j} = \sum_{k = 1}^{h} p_{k} μ_{k}

The expressions of μ_i = ∑_{j=1}^{h} p_jg_ij and μ = ∑_{k=1}^{h} p_kμ_k play an important role in the derivations to factorize additive and dominance values and in defining fundamental genetic parameters of quantitative traits.
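As a minimal sketch of the two expressions above under HWE, the following Python snippet computes the allelic means and the population mean. The frequencies and genotypic values are those of the hypothetical example in Tables 1 and 2; the function and variable names are illustrative assumptions, not part of the original article.

```python
import numpy as np

# Haplotype frequencies (Table 1) and symmetric genotypic values (Table 2).
p = np.array([0.4, 0.3, 0.2, 0.1])
g = np.array([
    [25.0, 18.0, 15.0, 10.0],
    [18.0, 30.0, 33.0, 40.0],
    [15.0, 33.0, 17.0, 12.0],
    [10.0, 40.0, 12.0, 35.0],
])


def allelic_means(p, g):
    """Allelic means under HWE: mu_i = sum_j p_j * g_ij."""
    return g @ p


def population_mean(p, g):
    """Population mean under HWE: mu = sum_k p_k * mu_k."""
    return p @ allelic_means(p, g)


mu_i = allelic_means(p, g)
mu = population_mean(p, g)

# The weighted average of allelic means equals the double sum over genotypes.
mu_double_sum = np.sum(np.outer(p, p) * g)

print(np.round(mu_i, 4))
print(round(float(mu), 4))
```

For these inputs the population mean agrees with the value μ = 22.09 reported in the numerical example.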

Multi-allelic effect, additive effect, additive value

The allelic effect (average effect) of allele A_i (i = 1,…h) is the deviation of the allelic mean from the population mean. From Eqs. 2 and 3, the allelic effect of A_i is:

a_{i} = μ_{i} - μ = \sum_{j \neq i}^{h} p_{j} (μ_{i} - μ_{j}) = \sum_{j \neq i}^{h} p_{j} α_{i j}

where α_ij is the additive effect or the average effect of gene substitution that is the difference between the allelic effects of the two alleles defined by Eq. 4, i.e.,

α_{i j} = a_{i} - a_{j} = μ_{i} - μ_{j} = \sum_{k = 1}^{h} p_{k} (g_{i k} - g_{j k}) = - α_{j i}

For h alleles, h(h − 1)/2 α_ij parameters of Eq. 5 are possible but these parameters are not independent for all ij values. An example of this dependency is:

α_{i j} = α_{1 j} - α_{1 i}

Based on Eq. 6, h-1 independent additive effects can be defined:

α_{1 k} = a_{1} - a_{k} = μ_{1} - μ_{k}, k = 2, \dots, h

where μ₁ = allelic mean of allele 1 that is used as the reference allele (e.g., defining the most frequent allele as ‘allele 1’). It is readily seen that α_ii = 0. The derivation process will allow the presence of α_ii but the final results will be based on the h−1 independent additive effects of α_1k defined by Eq. 7. All the h(h − 1)/2 possible α_ij parameters can be expressed in terms of the h−1 independent α_1k parameters through Eq. 6. The additive value (breeding value) of genotype A_iA_j is the summation of the two allelic effects of the genotype, i.e.,

a_{i j} = a_{i} + a_{j}

Each additive value defined by Eq. 8 will be shown to be a function of all h−1 additive effects defined by Eq. 7.
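The definitions in Eqs. 4–8 can be sketched numerically as follows. This is an illustrative example only, assuming the hypothetical frequencies and genotypic values of Tables 1 and 2 with allele 1 as the reference allele; the variable names are assumptions introduced here.

```python
import numpy as np

# Hypothetical example data (Tables 1 and 2).
p = np.array([0.4, 0.3, 0.2, 0.1])
g = np.array([
    [25.0, 18.0, 15.0, 10.0],
    [18.0, 30.0, 33.0, 40.0],
    [15.0, 33.0, 17.0, 12.0],
    [10.0, 40.0, 12.0, 35.0],
])

mu_i = g @ p          # allelic means (Eq. 2)
mu = p @ mu_i         # population mean (Eq. 3)

# Allelic effects a_i = mu_i - mu (Eq. 4).
a = mu_i - mu

# All pairwise additive effects alpha_ij = mu_i - mu_j (Eq. 5).
alpha = mu_i[:, None] - mu_i[None, :]

# The h - 1 independent additive effects alpha_1k, k = 2..h (Eq. 7).
alpha_1k = alpha[0, 1:]

# Dependency among the alpha_ij: alpha_ij = alpha_1j - alpha_1i (Eq. 6).
dependency_ok = np.allclose(alpha, alpha[0][None, :] - alpha[0][:, None])

# Additive (breeding) values a_ij = a_i + a_j (Eq. 8).
a_ij = a[:, None] + a[None, :]

print(np.round(alpha_1k, 4))
print(np.round(a_ij, 4))
```

The row `alpha[0]` holds the reference-allele effects, so every other α_ij is recovered from it by a single subtraction, which is why only h − 1 parameters are needed.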

Dominance effect and dominance value

Dominance effect of A_iA_j genotype (δ_ij) is the deviation of the heterozygous genotypic value from the average of the two homozygous genotypic values, i.e.,

δ_{i j} = g_{i j} - \frac{1}{2} (g_{i i} + g_{j j})

With the above definition, dominance effect is the unique effect of a heterozygous genotype. Therefore, the number of dominance effects is the same as number of heterozygous genotypes, and the maximum number of dominance effects is h(h − 1)/2. It is readily seen from Eq. 9 that δ_ii = 0. The derivation process will allow the presence of δ_ii but the final results will not have δ_ii. Dominance value or dominance deviation is the deviation of the genotypic value from the common mean and additive value, i.e.,

d_{i j} = g_{i j} - μ - a_{i j}

An important difference between ‘dominance value’ and ‘dominance effect’ is that a homozygous genotype may have non-zero dominance value but always has zero dominance effect. Each dominance value defined by Eq. 10 will be shown to be a function of all h(h − 1)/2 dominance effects defined by Eq. 9.
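The distinction between dominance effects (Eq. 9) and dominance values (Eq. 10) can be illustrated with a short sketch. It assumes the hypothetical data of Tables 1 and 2 and HWE; the variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical example data (Tables 1 and 2).
p = np.array([0.4, 0.3, 0.2, 0.1])
g = np.array([
    [25.0, 18.0, 15.0, 10.0],
    [18.0, 30.0, 33.0, 40.0],
    [15.0, 33.0, 17.0, 12.0],
    [10.0, 40.0, 12.0, 35.0],
])

mu_i = g @ p
mu = p @ mu_i
a = mu_i - mu
a_ij = a[:, None] + a[None, :]

# Dominance effects: delta_ij = g_ij - (g_ii + g_jj) / 2 (Eq. 9).
hom = np.diag(g)
delta = g - 0.5 * (hom[:, None] + hom[None, :])

# Dominance values: d_ij = g_ij - mu - a_ij (Eq. 10).
d = g - mu - a_ij

# Homozygotes have zero dominance effect but may have non-zero dominance value.
print(np.round(np.diag(delta), 4))
print(np.round(np.diag(d), 4))
```

In this example every diagonal entry of `delta` is zero, while the diagonal of `d` is not, which is the difference between the two quantities described above.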

Multi-allelic partition of genotypic value and variance

The genotypic value of a multi-allelic genotype has the same partition as for a bi-allelic locus [17], i.e.,

g_{i j} = μ + a_{i j} + d_{i j}

with E(a_ij) = 0 and E(d_ij) = 0. The multi-allelic genotypic variance (σ²_g) also has the same partition as for a bi-allelic locus [17], i.e., σ²_g = σ²_a + σ²_d, where σ²_a = additive variance, and σ²_d = dominance variance. The multi-allelic haplotype model to be developed starts with the factorization of the additive and dominance values in Eq. 11.

Results and discussion

Factorization of additive and dominance values

From Eqs. 4–7, an allelic effect can be expressed as:

a_{i} = μ_{i} - μ = \sum_{k \neq i}^{h} p_{k} α_{i k} = \sum_{k \neq i}^{h} p_{k} (α_{1 k} - α_{1 i}) = - (1 - p_{i}) α_{1 i} + \sum_{k \neq i}^{h} p_{k} α_{1 k}

where α_1k is defined by Eq. 7. Equation 12 shows that an allelic effect is a function of all h-1 parameters of additive effects denoted by α_1k. The additive values (breeding values) of A_iA_j and A_iA_i genotypes can be expressed as:

\begin{array}{l} a_{i j} = a_{i} + a_{j} = [- (1 - p_{i}) α_{1 i} + \sum_{k \neq i}^{h} p_{k} α_{1 k}] + [- (1 - p_{j}) α_{1 j} + \sum_{k \neq j}^{h} p_{k} α_{1 k}] \\ = - (1 - 2 p_{i}) α_{1 i} - (1 - 2 p_{j}) α_{1 j} + 2 \sum_{k \neq i, j}^{h} p_{k} α_{1 k} \end{array}

a_{i i} = 2 a_{i} = - 2 (1 - p_{i}) α_{1 i} + 2 \sum_{k \neq i}^{h} p_{k} α_{1 k}

In Eqs. 13 and 14, α_1i = 0 if i = 1 and α_1j = 0 if j = 1. From Eqs. 1–3 and 9–10, the dominance value of the A_iA_j genotype can be expressed as:

\begin{array}{l} d_{i j} = g_{i j} - μ - a_{i} - a_{j} = g_{i j} - μ_{i} - μ_{j} + μ = (g_{i j} - μ_{i}) - (μ_{j} - μ) \\ = \sum_{k \neq j}^{h} p_{k} (g_{i j} - g_{i k}) - \sum_{k \neq j}^{h} p_{k} (μ_{j} - μ_{k}) = \sum_{k \neq j}^{h} p_{k} [(g_{i j} - μ_{j}) - (g_{i k} - μ_{k})] \\ = \sum_{k \neq j}^{h} p_{k} [\sum_{f \neq i}^{h} p_{f} (g_{i j} - g_{j f}) - \sum_{f \neq i}^{h} p_{f} (g_{i k} - g_{k f})] \\ = \sum_{k \neq j}^{h} p_{k} \sum_{f \neq i}^{h} p_{f} (g_{i j} - g_{i k} - g_{j f} + g_{k f}) \end{array}

In Eq. 15, the quantity g_ij − g_ik − g_jf + g_kf has two positive terms and two negative terms, and each subscript is associated with a positive term and a negative term. Using this fact and the definition of dominance effect (δ_ij) of Eq. 9 with δ_ii = 0, g_ij − g_ik − g_jf + g_kf can be expressed as:

g_{i j} - g_{i k} - g_{j f} + g_{k f} = δ_{i j} - δ_{i k} - δ_{j f} + δ_{k f}

Combining Eqs. 15 and 16 with Eq. 10 and using p_j = 1 − ∑_{k≠j}^{h} p_k (Eq. 1) yields:

\begin{array}{l} d_{i j} = \sum_{k \neq j}^{h} p_{k} \sum_{f \neq i}^{h} p_{f} (δ_{i j} - δ_{i k} - δ_{j f} + δ_{k f}) \\ = \sum_{k \neq j}^{h} p_{k} [\sum_{f \neq i}^{h} p_{f} (δ_{i j} - δ_{i k}) - \sum_{f \neq i}^{h} p_{f} (δ_{j f} - δ_{k f})] \\ = \sum_{k \neq j}^{h} p_{k} [(1 - p_{i}) (δ_{i j} - δ_{i k}) - \sum_{f \neq i}^{h} p_{f} δ_{j f} + \sum_{f \neq i}^{h} p_{f} δ_{k f}] \\ = (1 - p_{i}) (1 - p_{j}) δ_{i j} - (1 - p_{i}) \sum_{k \neq j}^{h} p_{k} δ_{i k} - \sum_{k \neq j}^{h} p_{k} (\sum_{f \neq i}^{h} p_{f} δ_{j f} - \sum_{f \neq i}^{h} p_{f} δ_{k f}) \\ = (1 - p_{i}) (1 - p_{j}) δ_{i j} - (1 - p_{i}) \sum_{k \neq j}^{h} p_{k} δ_{i k} - (1 - p_{j}) \sum_{f \neq i}^{h} p_{f} δ_{j f} + \sum_{k \neq j}^{h} p_{k} \sum_{f \neq i}^{h} p_{f} δ_{k f} \end{array}

In Eq. 17,

\begin{array}{l} \sum_{k \neq j}^{h} p_{k} \sum_{f \neq i}^{h} p_{f} δ_{k f} = p_{i} p_{j} δ_{i j} + p_{i} \sum_{f \neq i, k}^{h} p_{f} δ_{j f} + p_{j} \sum_{k \neq j, f}^{h} p_{k} δ_{j k} + \sum_{k \neq i, j}^{h} p_{k} \sum_{f \neq k}^{h} p_{f} δ_{k f} \\ = p_{i} p_{j} δ_{i j} + p_{i} \sum_{k \neq i, j}^{h} p_{k} δ_{i k} + p_{j} \sum_{f \neq i, j}^{h} p_{f} δ_{j f} + 2 \sum_{k \neq i, j}^{h - 1} p_{k} \sum_{f = k + 1}^{h} p_{f} δ_{k f} \end{array}

Combining Eqs. 17 and 18 yields:

\begin{array}{l} d_{i j} = g_{i j} - μ - a_{i} - a_{j} = [1 - p_{i} (1 - p_{j}) - p_{j} (1 - p_{i})] δ_{i j} \\ - (1 - 2 p_{i}) \sum_{k \neq i, j}^{h} p_{k} δ_{i k} - (1 - 2 p_{j}) \sum_{f \neq i, j}^{h} p_{f} δ_{j f} + 2 \sum_{k \neq i, j}^{h - 1} p_{k} \sum_{f = k + 1}^{h} p_{f} δ_{k f} \end{array}

d_{i i} = g_{i i} - μ - 2 a_{i} = - 2 (1 - p_{i}) \sum_{k \neq i}^{h} p_{k} δ_{i k} + 2 \sum_{k \neq i}^{h - 1} p_{k} \sum_{f = k + 1}^{h} p_{f} δ_{k f}

Equations 13 and 14 show that each additive value is a function of all h − 1 additive effects defined by Eq. 7, and Eqs. 19–20 show that each dominance value is a function of all h(h − 1)/2 dominance effects defined by Eq. 9. Equations 13 and 14 provide the additive coding and Eqs. 19 and 20 provide the dominance coding of each multi-allelic genotype for the mixed model implementation.

Multi-allelic haplotype model based on multi-allelic genetic partition

Using the results of factorization of additive and dominance values given by Eqs. 13–14 and 19–20, the multi-allelic haplotype model treating each haplotype as an ‘allele’ by Eq. 11 can be expressed as:

g_{i j} = μ + a_{i j} + d_{i j} = μ + \sum_{k = 2}^{h} w_{α}^{i j, k} α_{1 k} + \sum_{k = 1}^{h - 1} \sum_{f = k + 1}^{h} w_{δ}^{i j, k f} δ_{k f}

In w_α^{ij,k}, superscripts ij are for the genotype of A_iA_j and superscript k is for α_1k. In w_δ^{ij,kf}, superscripts ij are for d_ij and superscripts kf are for δ_kf. From Eqs. 13 and 14, the additive coding (w_α^{ij,k}) of a multi-allelic genotype is:

w_{α}^{i j, k} = 2 p_{k} \quad \text{for } i, j \neq k \ (a_{i j} \text{ and } α_{1 k} \text{ do not share allele } k)

w_{α}^{i j, k} = - (1 - 2 p_{k}) \quad \text{for } i \neq j \text{ but } i = k \text{ or } j = k \ (a_{i j} \text{ and } α_{1 k} \text{ share allele } k, \ i \neq j)

w_{α}^{i j, k} = - 2 (1 - p_{k}) \quad \text{for } i = j = k \ (a_{i j} \text{ and } α_{1 k} \text{ share allele } k, \ i = j)

From Eqs. 19 and 20, the dominance coding (w_δ^{ij,kf}) of a multi-allelic genotype is:

w_{δ}^{i j, k f} = 1 - p_{i} (1 - p_{j}) - p_{j} (1 - p_{i}) \quad \text{for } i j = k f \ (d_{i j} \text{ and } δ_{k f} \text{ share 2 alleles})

w_{δ}^{i j, k f} = - p_{k} (1 - 2 p_{i}) \quad \text{for } i \neq j \text{ and } i = f \ (d_{i j} \text{ and } δ_{k f} \text{ share allele } f, \ i \neq j)

w_{δ}^{i j, k f} = - p_{f} (1 - 2 p_{j}) \quad \text{for } i \neq j \text{ and } j = k \ (d_{i j} \text{ and } δ_{k f} \text{ share allele } k, \ i \neq j)

w_{δ}^{i j, k f} = - 2 p_{k} (1 - p_{i}) \quad \text{for } i = j \text{ and } i = f \ (d_{i j} \text{ and } δ_{k f} \text{ share allele } f, \ i = j)

w_{δ}^{i j, k f} = 2 p_{k} p_{f} \quad \text{for } i, j \neq k, f \ (d_{i j} \text{ and } δ_{k f} \text{ share no allele}, \ i = j \text{ or } i \neq j)

For convenience of computer programming, Eqs. 22–24 can be characterized by whether a_ij and α_1k share no common allele (Eq. 22), or 1 common allele when i ≠ j (Eq. 23) or 1 common allele when i = j (Eq. 24). Similarly, between d_ij and δ_kf, Eq. 25 shares two common alleles, Eqs. 26 and 27 share 1 common allele with i ≠ j, Eq. 28 shares one common allele with i = j, and Eq. 29 shares no common allele. In Eqs. 25–29, p_i or p_j is the allele frequency of the shared allele between d_ij and δ_kf and p_k or p_f is the allele frequency of the non-shared allele between d_ij and δ_kf. From Eqs. 21–29, the multi-allelic haplotype model for h(h + 1)/2 possible genotypic values (g) of a given haplotype block with h haplotypes can be expressed as:

g = 1 μ + a_{h} + d_{h} = 1 μ + W_{α h} α_{h} + W_{δ h} δ_{h}

where μ = common mean, 1 = [h(h + 1)/2] × 1 column vector of 1’s, a_h = W_αhα_h = [h(h + 1)/2] × 1 column vector of additive values (breeding values), d_h = W_δhδ_h = [h(h + 1)/2] × 1 column vector of dominance values (dominance deviations), W_αh = [h(h + 1)/2] × (h − 1) model matrix of α_h with w_α^{ij,k} defined by Eqs. 22–24, W_δh = [h(h + 1)/2] × [h(h − 1)/2] model matrix of δ_h with w_δ^{ij,kf} defined by Eqs. 25–29, α_h = (h − 1) × 1 column vector with α_1k defined by Eq. 7, and δ_h = [h(h − 1)/2] × 1 column vector with δ_kf defined by Eq. 9.
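The coding rules of Eqs. 22–29 lend themselves to a direct implementation based on the number of alleles shared between a genotype and an effect. The sketch below is one possible way to build W_αh and W_δh for a single haplotype block; it assumes the hypothetical data of Tables 1 and 2, 0-based allele indices, and genotypes ordered with homozygotes first. The function names are assumptions introduced for illustration.

```python
import numpy as np
from itertools import combinations

# Hypothetical example data (Tables 1 and 2).
p = np.array([0.4, 0.3, 0.2, 0.1])
g = np.array([
    [25.0, 18.0, 15.0, 10.0],
    [18.0, 30.0, 33.0, 40.0],
    [15.0, 33.0, 17.0, 12.0],
    [10.0, 40.0, 12.0, 35.0],
])
h = len(p)

# Genotypes: homozygotes first, then heterozygotes (i < j); 0-based indices.
genotypes = [(i, i) for i in range(h)] + list(combinations(range(h), 2))
# Dominance effects delta_kf for k < f.
effects = list(combinations(range(h), 2))


def additive_coding(p, genotypes):
    """W_alpha rows per genotype, columns per alpha_1k (k = 2..h)."""
    rows = []
    for i, j in genotypes:
        row = []
        for k in range(1, len(p)):
            copies = (i == k) + (j == k)  # copies of allele k in the genotype
            # 0 copies: 2p_k; 1 copy: -(1 - 2p_k); 2 copies: -2(1 - p_k)
            row.append(2 * p[k] - copies)
        rows.append(row)
    return np.array(rows)


def dominance_coding(p, genotypes, effects):
    """W_delta rows per genotype, columns per delta_kf (k < f)."""
    rows = []
    for i, j in genotypes:
        row = []
        for k, f in effects:
            shared = {i, j} & {k, f}
            if len(shared) == 2:        # Eq. 25: two shared alleles
                w = 1 - p[i] * (1 - p[j]) - p[j] * (1 - p[i])
            elif len(shared) == 1:
                s = next(iter(shared))  # shared allele
                n = f if s == k else k  # non-shared allele of delta_kf
                if i != j:              # Eqs. 26-27: heterozygote
                    w = -p[n] * (1 - 2 * p[s])
                else:                   # Eq. 28: homozygote
                    w = -2 * p[n] * (1 - p[s])
            else:                       # Eq. 29: no shared allele
                w = 2 * p[k] * p[f]
            row.append(w)
        rows.append(row)
    return np.array(rows)


mu_i = g @ p
mu = p @ mu_i
alpha_h = mu_i[0] - mu_i[1:]
delta_h = np.array([g[k, f] - 0.5 * (g[k, k] + g[f, f]) for k, f in effects])

W_alpha = additive_coding(p, genotypes)
W_delta = dominance_coding(p, genotypes, effects)

# Eq. 30: g = 1*mu + W_alpha alpha_h + W_delta delta_h
g_rebuilt = mu + W_alpha @ alpha_h + W_delta @ delta_h
g_table = np.array([g[i, j] for i, j in genotypes])
print(np.round(g_rebuilt, 4))
```

The additive coding uses the fact that the three cases of Eqs. 22–24 all equal 2p_k minus the number of copies of allele k carried by the genotype, so a single expression covers them.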

Numerical example of multi-allelic genetic partition

A hypothetical numerical example is used to illustrate the genetic partition of multi-allelic genotypic values described by Eqs. 21–30. Four haplotypes as ‘alleles’ are assumed with frequencies in Table 1 and genotypic values in Table 2. The common mean of the genotypic values using Eq. 3 is: μ = 22.09. The additive effects of the four haplotypes defined by Eqs. 5–7, are:

α_{h}' = [\begin{array}{c} - 7.4 & - 1.1 & - 2.5 \end{array}]',

and the dominance effects defined by Eq. 9 are:

δ_{h}' = [\begin{array}{c} - 9.5 & - 6 & - 20 & 9.5 & 7.5 & - 14 \end{array}]' .

Table 1.

Four hypothetical haplotypes and their frequencies (h = 4)

Haplotype	1	2	3	4
Frequency	0.4	0.3	0.2	0.1

Table 2.

Genotypic values of haplotype genotypes (g_ij = g_ji)

Haplotype	1	2	3	4
1	g₁₁ = 25	g₁₂ = 18	g₁₃ = 15	g₁₄ = 10
2		g₂₂ = 30	g₂₃ = 33	g₂₄ = 40
3			g₃₃ = 17	g₃₄ = 12
4				g₄₄ = 35

Using Eqs. 13–14 and 22–24, the additive values (breeding values) are:

a_{h} = [\begin{array}{c} a_{11} \\ a_{22} \\ a_{33} \\ a_{44} \\ a_{12} \\ a_{13} \\ a_{14} \\ a_{23} \\ a_{24} \\ a_{34} \end{array}] = [\begin{array}{c} 2 p_{2} & 2 p_{3} & 2 p_{4} \\ - 2 (1 - p_{2}) & 2 p_{3} & 2 p_{4} \\ 2 p_{2} & - 2 (1 - p_{3}) & 2 p_{4} \\ 2 p_{2} & 2 p_{3} & - 2 (1 - p_{4}) \\ - (1 - 2 p_{2}) & 2 p_{3} & 2 p_{4} \\ 2 p_{2} & - (1 - 2 p_{3}) & 2 p_{4} \\ 2 p_{2} & 2 p_{3} & - (1 - 2 p_{4}) \\ - (1 - 2 p_{2}) & - (1 - 2 p_{3}) & 2 p_{4} \\ - (1 - 2 p_{2}) & 2 p_{3} & - (1 - 2 p_{4}) \\ 2 p_{2} & - (1 - 2 p_{3}) & - (1 - 2 p_{4}) \end{array}] [\begin{array}{c} α_{12} \\ α_{13} \\ α_{14} \end{array}] = [\begin{array}{c} 0.6 & 0.4 & 0.2 \\ - 1.4 & 0.4 & 0.2 \\ 0.6 & - 1.6 & 0.2 \\ 0.6 & 0.4 & - 1.8 \\ - 0.4 & 0.4 & 0.2 \\ 0.6 & - 0.6 & 0.2 \\ 0.6 & 0.4 & - 0.8 \\ - 0.4 & - 0.6 & 0.2 \\ - 0.4 & 0.4 & - 0.8 \\ 0.6 & - 0.6 & - 0.8 \end{array}] [\begin{array}{c} - 7.4 \\ - 1.1 \\ - 2.5 \end{array}] = [\begin{array}{c} - 5.38 \\ 9.42 \\ - 3.18 \\ - 0.38 \\ 2.02 \\ - 4.28 \\ - 2.88 \\ 3.12 \\ 4.52 \\ - 1.78 \end{array}] .

Using Eqs. 19–20 and 25–29, the dominance values (dominance deviations) are:

\begin{array}{l} d_{h} = [\begin{array}{c} d_{11} \\ d_{22} \\ d_{33} \\ d_{44} \\ d_{12} \\ d_{13} \\ d_{14} \\ d_{23} \\ d_{24} \\ d_{34} \end{array}] = [\begin{array}{cccccc} - 2 p_{2} (1 - p_{1}) & - 2 p_{3} (1 - p_{1}) & - 2 p_{4} (1 - p_{1}) & 2 p_{2} p_{3} & 2 p_{2} p_{4} & 2 p_{3} p_{4} \\ - 2 p_{1} (1 - p_{2}) & 2 p_{1} p_{3} & 2 p_{1} p_{4} & - 2 p_{3} (1 - p_{2}) & - 2 p_{4} (1 - p_{2}) & 2 p_{3} p_{4} \\ 2 p_{1} p_{2} & - 2 p_{1} (1 - p_{3}) & 2 p_{1} p_{4} & - 2 p_{2} (1 - p_{3}) & 2 p_{2} p_{4} & - 2 p_{4} (1 - p_{3}) \\ 2 p_{1} p_{2} & 2 p_{1} p_{3} & - 2 p_{1} (1 - p_{4}) & 2 p_{2} p_{3} & - 2 p_{2} (1 - p_{4}) & - 2 p_{3} (1 - p_{4}) \\ w_{δ}^{12, 12} & - p_{3} (1 - 2 p_{1}) & - p_{4} (1 - 2 p_{1}) & - p_{3} (1 - 2 p_{2}) & - p_{4} (1 - 2 p_{2}) & 2 p_{3} p_{4} \\ - p_{2} (1 - 2 p_{1}) & w_{δ}^{13, 13} & - p_{4} (1 - 2 p_{1}) & - p_{2} (1 - 2 p_{3}) & 2 p_{2} p_{4} & - p_{4} (1 - 2 p_{3}) \\ - p_{2} (1 - 2 p_{1}) & - p_{3} (1 - 2 p_{1}) & w_{δ}^{14, 14} & 2 p_{2} p_{3} & - p_{2} (1 - 2 p_{4}) & - p_{3} (1 - 2 p_{4}) \\ - p_{1} (1 - 2 p_{2}) & - p_{1} (1 - 2 p_{3}) & 2 p_{1} p_{4} & w_{δ}^{23, 23} & - p_{4} (1 - 2 p_{2}) & - p_{4} (1 - 2 p_{3}) \\ - p_{1} (1 - 2 p_{2}) & 2 p_{1} p_{3} & - p_{1} (1 - 2 p_{4}) & - p_{3} (1 - 2 p_{2}) & w_{δ}^{24, 24} & - p_{3} (1 - 2 p_{4}) \\ 2 p_{1} p_{2} & - p_{1} (1 - 2 p_{3}) & - p_{1} (1 - 2 p_{4}) & - p_{2} (1 - 2 p_{3}) & - p_{2} (1 - 2 p_{4}) & w_{δ}^{34, 34} \end{array}] [\begin{array}{c} δ_{12} \\ δ_{13} \\ δ_{14} \\ δ_{23} \\ δ_{24} \\ δ_{34} \end{array}] \\ = [\begin{array}{cccccc} - 0.36 & - 0.24 & - 0.12 & 0.12 & 0.06 & 0.04 \\ - 0.56 & 0.16 & 0.08 & - 0.28 & - 0.14 & 0.04 \\ 0.24 & - 0.64 & 0.08 & - 0.48 & 0.06 & - 0.16 \\ 0.24 & 0.16 & - 0.72 & 0.12 & - 0.54 & - 0.36 \\ 0.54 & - 0.04 & - 0.02 & - 0.08 & - 0.04 & 0.04 \\ - 0.06 & 0.56 & - 0.02 & - 0.18 & 0.06 & - 0.06 \\ - 0.06 & - 0.04 & 0.58 & 0.12 & - 0.24 & - 0.16 \\ - 0.16 & - 0.24 & 0.08 & 0.62 & - 0.04 & - 0.06 \\ - 0.16 & 0.16 & - 0.32 & - 0.08 & 0.66 & - 0.16 \\ 0.24 & - 0.24 & - 0.32 & - 0.18 & - 0.24 & 0.74 \end{array}] [\begin{array}{c} - 9.5 \\ - 6 \\ - 20 \\ 9.5 \\ 7.5 \\ - 14 \end{array}] = [\begin{array}{c} 8.29 \\ - 1.51 \\ - 1.91 \\ 13.29 \\ - 6.11 \\ - 2.81 \\ - 9.21 \\ 7.79 \\ 13.39 \\ - 8.31 \end{array}] \end{array}

The genotypic values calculated as the summation of the additive and dominance values are:

g = 1 μ + a_{h} + d_{h} = [\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{array}] (22.09) + [\begin{array}{c} - 5.38 \\ 9.42 \\ - 3.18 \\ - 0.38 \\ 2.02 \\ - 4.28 \\ - 2.88 \\ 3.12 \\ 4.52 \\ - 1.78 \end{array}] + [\begin{array}{c} 8.29 \\ - 1.51 \\ - 1.91 \\ 13.29 \\ - 6.11 \\ - 2.81 \\ - 9.21 \\ 7.79 \\ 13.39 \\ - 8.31 \end{array}] = [\begin{array}{c} 25 \\ 30 \\ 17 \\ 35 \\ 18 \\ 15 \\ 10 \\ 33 \\ 40 \\ 12 \end{array}] = [\begin{array}{c} g_{11} \\ g_{22} \\ g_{33} \\ g_{44} \\ g_{12} \\ g_{13} \\ g_{14} \\ g_{23} \\ g_{24} \\ g_{34} \end{array}] .

By comparing with the genotypic values in Table 2, the above result verifies that the multi-allelic partition of g = 1μ + a_h + d_h = 1μ + W_αhα_h + W_δhδ_h described by Eqs. 21–30 is correct. With the note that g_ij = g_ji, a_ij = a_ji and d_ij = d_ji, the genotypic variance (σ²_g), additive variance (σ²_a) and dominance variance (σ²_d) are:

\begin{array}{l} σ_{g}^{2} = \sum_{i = 1}^{h} \sum_{j = 1}^{h} p_{i} p_{j} g_{i j}^{2} - μ^{2} = 71.0419 \\ σ_{a}^{2} = \sum_{i = 1}^{h} \sum_{j = 1}^{h} p_{i} p_{j} a_{i j}^{2} = 20.1178 \\ σ_{d}^{2} = \sum_{i = 1}^{h} \sum_{j = 1}^{h} p_{i} p_{j} d_{i j}^{2} = 50.9241 \end{array}

It is readily seen that σ²_g = σ²_a + σ²_d.
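The variance partition can be checked numerically with a short sketch. It assumes the hypothetical data of Tables 1 and 2 and HWE genotype frequencies p_i p_j; the variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical example data (Tables 1 and 2).
p = np.array([0.4, 0.3, 0.2, 0.1])
g = np.array([
    [25.0, 18.0, 15.0, 10.0],
    [18.0, 30.0, 33.0, 40.0],
    [15.0, 33.0, 17.0, 12.0],
    [10.0, 40.0, 12.0, 35.0],
])

mu_i = g @ p
mu = p @ mu_i
a = mu_i - mu
a_ij = a[:, None] + a[None, :]   # additive values
d_ij = g - mu - a_ij             # dominance values

# HWE genotype weights p_i * p_j over all ordered pairs (i, j).
w = np.outer(p, p)

var_g = np.sum(w * g**2) - mu**2
var_a = np.sum(w * a_ij**2)
var_d = np.sum(w * d_ij**2)

print(round(float(var_g), 4), round(float(var_a), 4), round(float(var_d), 4))
```

Because E(a_ij) = 0 and E(d_ij) = 0, no mean correction is needed for the additive and dominance variances; only the genotypic variance subtracts μ².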

Mixed model and multi-allelic genomic relationship matrices

A mixed model to implement the multi-allelic haplotype model of Eq. 30 can be established with appropriate changes of matrix dimensions for W_αh, W_δh, a_h, d_h, α_h and δ_h in Eq. 30. A set of m SNP markers is assumed available, and r haplotype blocks of the m SNPs are defined across the genome. Haplotypes of all individuals are assumed known (e.g., constructed using phasing or imputation software). Each haplotype block is treated as a ‘locus’ and each haplotype within a haplotype block is treated as an ‘allele’. The i-th haplotype block has h_i haplotypes, h_i − 1 additive effects, and n_δi dominance effects or heterozygous genotypes. Let n_α = total number of additive effects of all r haplotype blocks, and n_δ = total number of dominance effects (or heterozygous genotypes) of all r haplotype blocks. Then, n_α = ∑_{i=1}^{r} h_i − r, and n_δ = ∑_{i=1}^{r} n_δi. For a given sample of q individuals, the upper limit on the number of effects is 2q − 1 for additive effects and is the number of heterozygous genotypes for dominance effects. For a sample with N observations on q individuals, the mixed model to implement the multi-allelic haplotype model of Eq. 30 can be expressed as:

y = X b + Z (W_{α h} α_{h} + W_{δ h} δ_{h}) + e

where Z = N × q incidence matrix allocating phenotypic observations to each individual = identity matrix for one observation per individual (N = q), α_h = n_α × 1 column vector of haplotype additive effects, W_αh = q × n_α model matrix of α_h, δ_h = n_δ × 1 column vector for dominance effects of haplotype genotypes, W_δh = q × n_δ model matrix of δ_h, α_s = m × 1 column vector of single-SNP additive effects, b = c × 1 column vector of fixed effects such as herd-year-season in dairy cattle (c = number of fixed effects), and X = N × c model matrix of b. To define two equivalent models with complementary computing advantages and identical GBLUP and GREML results, the mixed model of Eq. 31 needs to be expressed as [8]:

y = X b + Z (T_{α h} α_{h} + T_{δ h} δ_{h}) + e = X b + Z (a_{h} + d_{h}) + e

where a_h = T_αhα_h = multi-allelic genomic breeding values, d_h = T_δhδ_h = multi-allelic genomic dominance values, and each T matrix can be defined by any of the six definitions of genomic relationships we previously discussed and implemented [9]. For simplicity of notation, the T matrices are defined as: T_αh = W_αh/k_αh^{1/2}, T_δh = W_δh/k_δh^{1/2}, where k_αh = the average of diagonal elements of W_αhW_αh', and k_δh = the average of diagonal elements of W_δhW_δh'. The genomic relationship matrices of Eq. 31 can thus be defined as:

A_{h} = T_{α h} T_{α h}' = \text{multi-allelic genomic additive relationship matrix}

D_{h} = T_{δ h} T_{δ h}' = \text{multi-allelic genomic dominance relationship matrix}
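The scaling that defines the T matrices and the relationship matrices of Eqs. 33 and 34 can be sketched as follows. The model matrix used here is randomly generated and purely hypothetical; it only stands in for a q × n_α matrix W_αh, and the function name is an assumption introduced for illustration.

```python
import numpy as np


def genomic_relationship(W):
    """Return (S, T) with T = W / sqrt(k) and S = T T'.

    k is the average of the diagonal elements of W W'.
    """
    k = np.mean(np.sum(W * W, axis=1))  # diag(W W') without forming W W'
    T = W / np.sqrt(k)
    return T @ T.T, T


# Hypothetical model matrix: q = 5 individuals, n_alpha = 8 additive effects.
rng = np.random.default_rng(0)
W_alpha = rng.normal(size=(5, 8))

A_h, T_alpha = genomic_relationship(W_alpha)

# With this scaling the average diagonal element of A_h is 1.
print(round(float(np.mean(np.diag(A_h))), 6))
```

The same function applies unchanged to a dominance model matrix W_δh to obtain D_h.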

Interpretation of multi-allelic and haplotype genomic relationship matrices

The multi-allelic genomic relationships of Eqs. 33 and 34 using multi-allelic markers such as microsatellite markers have the same interpretation and theoretical expectation as using SNP markers that are bi-allelic, e.g., a genomic additive relationship is expected to be twice the coancestry coefficient [8, 9]. Using either multi-allelic or bi-allelic markers under the assumption of no inbreeding, the theoretical expectation of genomic additive relationships is 0.5, 0.5, 0.25 and 0 for parent-offspring, full-sibs, half-sibs and unrelated individuals respectively, and the corresponding theoretical expectation of genomic dominance relationships is 0, 0.25, 0 and 0.

It is important to distinguish single-locus multi-allelic markers such as microsatellite markers from haplotypes where each haplotype is treated as an ‘allele’ and each haplotype block is treated as a ‘locus’, because recombination between loci within a haplotype block generally exists, leading to lower haplotype similarity than single-locus similarity among relatives. As the number of loci increases in each haplotype block, genomic relationships using haplotypes are expected to decrease from those using single-locus markers. Therefore, the utility of haplotype genomic relationships using Eqs. 33 and 34 is for genomic prediction using haplotypes, not for measuring relationships among individuals. The optimal block size and hence the number of haplotypes per block is an important issue for genomic prediction and could be determined by validation studies, as discussed further towards the end of this article.

Two equivalent mixed models with complementary computing strategies

To establish mixed models using multi-allelic markers or haplotypes, assumptions for the first and second moments of the mixed model of Eq. 32 are: E(y) = Xb, E(α_h) = E(δ_h) = E(α_s) = E(δ_s) = 0, Var(α_h) = σ²_αhI_nα, Var(a_h) = G_αh = σ²_αhA_h, Var(δ_h) = σ²_δhI_nδ, Var(d_h) = G_δh = σ²_δhD_h, and Var(e) = R = σ²_eI_N, where σ²_αh = variance of multi-allelic additive effects, σ²_δh = variance of multi-allelic dominance effects, σ²_e = residual variance, and I_nα, I_nδ, I_m and I_N are identity matrices of orders n_α, n_δ, m and N, respectively. All random effects are assumed to be uncorrelated so that the phenotypic variance-covariance matrix is:

V = \mathrm{Var}(y) = Z (G_{α h} + G_{δ h}) Z' + σ_{e}^{2} I_{N} = Z (σ_{α h}^{2} A_{h} + σ_{δ h}^{2} D_{h}) Z' + σ_{e}^{2} I_{N}

To simplify notation for the two equivalent mixed models, terms in Eqs. 32–35 are re-written as α_h = τ₁, δ_h = τ₂; T_αh = T₁, T_δh = T₂; u_i = T_iτ_i, i = 1,2; A_h = S₁, D_h = S₂; and σ²_αh = σ²₁, σ²_δh = σ²₂. Then, Eqs. 32 and 35 can be expressed as:

y = X b + Z \sum_{i = 1}^{2} T_{i} τ_{i} + e = X b + Z \sum_{i = 1}^{2} u_{i} + e

V = \mathrm{Var}(y) = Z (\sum_{i = 1}^{2} σ_{i}^{2} S_{i}) Z' + σ_{e}^{2} I_{N} .

Defining Z_i = ZT_i, an equivalent model to Eqs. 36 and 37 can be written as:

y = X b + \sum_{i=1}^{2} Z_{i} τ_{i} + e \qquad (38)

V = \mathrm{Var}(y) = \sum_{i=1}^{2} σ_{i}^{2} Z_{i} Z_{i}' + σ_{e}^{2} I_{N} \qquad (39)

Equations 36 and 37 will be referred to as Model-I, and Eqs. 38 and 39 as Model-II. Model-I and Model-II are equivalent because both have identical E(y) and V, but they have different computational advantages that complement each other. For each model, two methods can be established for genomic prediction and estimation: the method of conditional expectation (CE) and the method of mixed model equations (MME), yielding a total of four methods for the two equivalent models. Model-I using CE is the best method for large numbers of SNP markers and multiple genetic factors, Model-II using MME is the best method for large numbers of individuals, and Model-I using MME and Model-II using CE have no computing advantage. Therefore, Model-I using CE and Model-II using MME will be used for genomic prediction and estimation. Following our previous naming of these two methods, GBLUP and GREML of Model-I using CE will be referred to as the CE set of formulations, and GBLUP and GREML of Model-II using MME as the QM set of formulations, where QM means ‘q > m’. These two methods yield identical results of prediction and estimation and are applicable to singular genomic relationship matrices. Assuming one observation per individual, CE based on Eqs. 36 and 37 is easier to compute than QM based on Eqs. 38 and 39 approximately when q < c + n_α + n_δ, according to the size of the largest matrix to invert for each method (Table 3). Model-I using MME has no computing advantage over Model-I using CE because of the large coefficient matrix of the MME and the requirement for full-rank relationship matrices, and Model-II using CE has no computing advantage over Model-I using CE because of the large T matrices to store in memory.
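The equivalence of the two models can be checked numerically. The following is a minimal NumPy sketch, not the author's implementation: it assumes S_i = T_iT_i' (the form stated later in this article for the SNP matrices, and taken here as an assumption for the haplotype matrices as well), uses random stand-ins for T₁ and T₂, and picks arbitrary variance values.

```python
import numpy as np

rng = np.random.default_rng(0)
N = q = 6                      # one observation per individual, so Z is an identity matrix
n_alpha, n_delta = 4, 5        # hypothetical numbers of additive and dominance effects

Z = np.eye(N, q)
# Random stand-ins for the model matrices T_1 (additive) and T_2 (dominance)
T = [rng.normal(size=(q, n_alpha)), rng.normal(size=(q, n_delta))]
sigma2 = [0.6, 0.3]            # sigma_1^2, sigma_2^2 (arbitrary illustrative values)
sigma2_e = 1.0

# Model-I (Eq. 37): V = Z (sum_i sigma_i^2 S_i) Z' + sigma_e^2 I_N, assuming S_i = T_i T_i'
S = [Ti @ Ti.T for Ti in T]
V_model1 = Z @ sum(s2 * Si for s2, Si in zip(sigma2, S)) @ Z.T + sigma2_e * np.eye(N)

# Model-II (Eq. 39): V = sum_i sigma_i^2 Z_i Z_i' + sigma_e^2 I_N, with Z_i = Z T_i
Z_list = [Z @ Ti for Ti in T]
V_model2 = sum(s2 * Zi @ Zi.T for s2, Zi in zip(sigma2, Z_list)) + sigma2_e * np.eye(N)

same_V = np.allclose(V_model1, V_model2)
```

Because both models also share E(y) = Xb, agreement of the two V matrices is what makes predictions and variance component estimates from the two models coincide.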

Table 3.

Comparison of computational feasibility of four methods from the two equivalent models with haplotypes and SNPs for GBLUP and GREML

		Method for calculating GBLUP
		Conditional expectation (CE)	Mixed model equations (MME)
Model-I, Eqs. 36 and 37	Largest matrix to invert	V, phenotypic variance-covariance matrix	C, coefficient matrix of MME
	Size of largest matrix to invert	q × q, assuming one observation per individual	(c + 2q) × (c + 2q) for C
	Largest matrix to store in memory	q × q P matrix	(c + 2q) × (c + 2q) for C
	Applicable to singular genomic relationship matrices	Yes, inverse relationship matrices avoided	No, inverse relationship matrices required
Model-II, Eqs. 38 and 39	Largest matrix to invert	V, phenotypic variance-covariance matrix	C, coefficient matrix of MME
	Size of largest matrix to invert	q × q, assuming one observation per individual	(c + n_α + n_δ) × (c + n_α + n_δ) for C
	Largest matrix to store in memory	q × n_α and q × n_δ T matrices, q × q P matrix	(c + n_α + n_δ) × (c + n_α + n_δ) for C
	Applicable to singular genomic relationship matrices	Yes, inverse relationship matrices avoided	Yes, inverse relationship matrices avoided
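
The size comparison in Table 3 can be turned into a small rule of thumb. The helper below is a hypothetical sketch (the function name and return format are illustrative only); it compares the order of the largest matrix to invert for the CE method of Model-I and the MME method of Model-II, assuming one observation per individual.

```python
def largest_matrix_to_invert(q, c, n_alpha, n_delta):
    """Compare the order of the largest matrix to invert for CE (Model-I) and QM (Model-II).

    q = number of individuals, c = number of fixed effects,
    n_alpha, n_delta = numbers of additive and dominance effects.
    """
    ce_order = q                        # V is q x q with one observation per individual
    qm_order = c + n_alpha + n_delta    # order of the MME coefficient matrix C
    smaller = "CE" if ce_order < qm_order else "QM"
    return {"CE": ce_order, "QM": qm_order, "smaller": smaller}


# Many effects, few individuals: CE inverts the smaller matrix
few_individuals = largest_matrix_to_invert(q=1000, c=5, n_alpha=20000, n_delta=30000)
# Many individuals, few effects: QM inverts the smaller matrix
many_individuals = largest_matrix_to_invert(q=100000, c=5, n_alpha=2000, n_delta=3000)
```

This only reflects the matrix-order criterion of Table 3; memory for the T and P matrices and the cost of building each matrix are not modelled.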


Genomic best linear unbiased prediction of genetic values (GBLUP)

Using the CE method of Model-I (Eqs. 36 and 37), GBLUP of the i-th type of genetic values for individuals in the training population is obtained as:

\hat{u}_{i} = σ_{i}^{2} S_{i} Z' V^{-1} (y - X \hat{b}) = σ_{i}^{2} S_{i} Z' P y = S_{i} ε_{i}, \quad i = 1, 2 \qquad (40)

where $\hat{b} = (X' V^{-1} X)^{-} X' V^{-1} y$ = best linear unbiased estimator (BLUE) of fixed non-genetic effects, $P = V^{-1} - V^{-1} X (X' V^{-1} X)^{-} X' V^{-1}$, and $ε_{i} = σ_{i}^{2} Z' V^{-1} (y - X \hat{b}) = σ_{i}^{2} Z' P y$ = q × 1 column vector of regressed phenotypic values of the training population, i.e., a regression of the i-th type of genetic values on the phenotypic values in the training population. Two equivalent methods with identical results can be used to predict genetic values of individuals without phenotypic observations (validation population): placing all individuals with or without records in the same mixed model by setting to zero the Z matrix for the validation population, or calculating predictions separately based on the regressed phenotypic values of the training population [8, 39]. Using the second method, GBLUP of the i-th type of genetic values for individuals in the validation population is calculated as:

\hat{u}_{i0} = σ_{i}^{2} S_{i01} Z' V^{-1} (y - X \hat{b}) = σ_{i}^{2} S_{i01} Z' P y = S_{i01} ε_{i} \qquad (41)

where S_i01 = q₀ × q genomic relationship matrix between the validation and training populations for the i-th type of genetic values (q₀ = number of individuals in the validation population).
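
The CE predictions of Eqs. 40 and 41 can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the GVCBLUP or GVCHAP code: the function name `gblup_ce` is hypothetical, variance components are treated as known, S_i = T_iT_i' and S_i01 = T_i0T_i' are assumed forms, and the data are random stand-ins.

```python
import numpy as np

def gblup_ce(y, X, Z, S, sigma2, sigma2_e, S01=None):
    """CE-type GBLUP: u_i = S_i eps_i and u_i0 = S_i01 eps_i, eps_i = sigma_i^2 Z' P y."""
    N = len(y)
    V = Z @ sum(s2 * Si for s2, Si in zip(sigma2, S)) @ Z.T + sigma2_e * np.eye(N)
    Vinv = np.linalg.inv(V)
    VinvX = Vinv @ X
    # P = V^-1 - V^-1 X (X' V^-1 X)^- X' V^-1, with a pseudo-inverse as generalized inverse
    P = Vinv - VinvX @ np.linalg.pinv(X.T @ VinvX) @ VinvX.T
    Py = P @ y
    eps = [s2 * (Z.T @ Py) for s2 in sigma2]          # regressed phenotypic values
    u_hat = [Si @ e for Si, e in zip(S, eps)]          # training population (Eq. 40)
    u0_hat = None if S01 is None else [S01i @ e for S01i, e in zip(S01, eps)]  # Eq. 41
    return u_hat, u0_hat


# Toy data: q training and q0 validation individuals (random stand-ins for T matrices)
rng = np.random.default_rng(1)
q, q0, n_alpha, n_delta = 8, 3, 5, 4
T1_all = rng.normal(size=(q + q0, n_alpha))
T2_all = rng.normal(size=(q + q0, n_delta))
T1, T10 = T1_all[:q], T1_all[q:]
T2, T20 = T2_all[:q], T2_all[q:]
S = [T1 @ T1.T, T2 @ T2.T]             # assumed S_i = T_i T_i'
S01 = [T10 @ T1.T, T20 @ T2.T]         # assumed S_i01 = T_i0 T_i' (q0 x q)
X, Z = np.ones((q, 1)), np.eye(q)
y = rng.normal(size=q)
sigma2, sigma2_e = [0.6, 0.3], 1.0

u_hat, u0_hat = gblup_ce(y, X, Z, S, sigma2, sigma2_e, S01)
```

Only the N × N matrix V is inverted, and no inverse of S_i is needed, which is why this route remains usable when the genomic relationship matrices are singular.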

Using the QM method (MME method of Model-II of Eqs. 38 and 39), genomic prediction first calculates the GBLUP of haplotype effects and then calculates GBLUP of genetic values. GBLUP of haplotype effects is obtained from solving the following MME:

\left(\begin{array}{cc} X' X & X' Z_{g} \\ Z_{g}' X & Z_{g}' Z_{g} + \oplus_{i=1}^{2} (λ_{i} I_{t_{i}}) \end{array}\right) \left(\begin{array}{c} \hat{b} \\ \hat{τ} \end{array}\right) = \left(\begin{array}{c} X' y \\ Z_{g}' y \end{array}\right) \qquad (42)

where $\hat{τ} = (\hat{τ}_{1}', \hat{τ}_{2}')'$, Z_g = (Z₁, Z₂), λ_i = σ²_e/σ²_i, t_i = n_α and n_δ for i = 1 and 2, respectively, and ⊕ denotes the direct sum that defines a block diagonal matrix. With haplotype effects from Eq. 42, GBLUP of the i-th type of genetic values for individuals in the training and validation populations are obtained as:

\hat{u}_{i} = T_{i} \hat{τ}_{i} \qquad (43)

\hat{u}_{i0} = T_{i0} \hat{τ}_{i} \qquad (44)

where T_i0 = the T_i matrix calculated using SNPs of the validation population. Equations 43 and 44 yield the same results as Eqs. 40 and 41. The prediction of total genotypic values in either the training or the validation population can be obtained from Eqs. 40 and 41 or Eqs. 43 and 44 as: $\hat{g} = \sum_{i=1}^{2} \hat{u}_{i}$ = predicted genotypic values of individuals in the training population, and $\hat{g}_{0} = \sum_{i=1}^{2} \hat{u}_{i0}$ = predicted genotypic values of the validation population. Prediction reliabilities of additive, dominance and genotypic predictions, defined as the squared correlations between the genomic and true values, have the same formulations as the R²_ai, R²_di and R²_gi formulae in [8], and prediction accuracy is obtained as the square root of the reliability estimate.
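
The agreement between the QM and CE predictions can be illustrated numerically. The NumPy sketch below solves the MME of Eq. 42 and compares Eq. 43 with Eq. 40; it assumes known variance components, S_i = T_iT_i', a full-column-rank X (an intercept only), and random stand-ins for the model matrices, so it is a toy check rather than a production solver.

```python
import numpy as np

rng = np.random.default_rng(2)
q, n_alpha, n_delta = 8, 5, 4
X = np.ones((q, 1))                     # intercept only, so c = 1
Z = np.eye(q)                           # one observation per individual
T = [rng.normal(size=(q, n_alpha)), rng.normal(size=(q, n_delta))]
y = rng.normal(size=q)
sigma2, sigma2_e = [0.6, 0.3], 1.0

# QM method: build and solve the MME of Eq. 42 with Z_i = Z T_i
Zg = np.hstack([Z @ Ti for Ti in T])
lam = [sigma2_e / s2 for s2 in sigma2]
# Diagonal of the direct sum of lambda_i * I_{t_i}
ridge = np.concatenate([np.full(Ti.shape[1], l) for Ti, l in zip(T, lam)])
c = X.shape[1]
C = np.block([[X.T @ X, X.T @ Zg],
              [Zg.T @ X, Zg.T @ Zg + np.diag(ridge)]])
rhs = np.concatenate([X.T @ y, Zg.T @ y])
sol = np.linalg.solve(C, rhs)
tau_hat = np.split(sol[c:], [n_alpha])            # (tau_1, tau_2)
u_qm = [Ti @ t for Ti, t in zip(T, tau_hat)]      # Eq. 43

# CE method: Eq. 40 with S_i = T_i T_i' (assumed form)
S = [Ti @ Ti.T for Ti in T]
V = Z @ sum(s2 * Si for s2, Si in zip(sigma2, S)) @ Z.T + sigma2_e * np.eye(q)
Vinv = np.linalg.inv(V)
P = Vinv - Vinv @ X @ np.linalg.pinv(X.T @ Vinv @ X) @ X.T @ Vinv
u_ce = [s2 * Si @ Z.T @ P @ y for s2, Si in zip(sigma2, S)]

methods_agree = all(np.allclose(a, b) for a, b in zip(u_qm, u_ce))
g_hat = sum(u_qm)                                  # total genotypic values
```

Here the MME coefficient matrix has order c + n_α + n_δ = 10 while V has order q = 8; which of the two is cheaper in practice depends on these sizes, as summarized in Table 3.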

Genomic restricted maximum likelihood estimation (GREML) of variance components

Using the CE method of Model-I (Eqs. 36 and 37), the EM type GREML estimates of variance components are:

σ_{i}^{2(k+1)} = σ_{i}^{2(k)} \, y' P^{(k)} Z S_{i} Z' P^{(k)} y / \mathrm{tr}(P^{(k)} Z S_{i} Z'), \quad i = 1, 2 \qquad (45)

σ_{e}^{2(k+1)} = σ_{e}^{2(k)} \, y' P^{(k)} P^{(k)} y / \mathrm{tr}(P^{(k)}) \qquad (46)

where k = iteration number. Using the QM method (Eqs. 38 and 39), the EM type GREML estimates of variance components are

σ_{i}^{2(k+1)} = \hat{τ}_{i}^{(k)\prime} \hat{τ}_{i}^{(k)} / [t_{i} - \mathrm{tr}(C^{ii(k)}) λ_{i}^{(k)}], \quad i = 1, 2 \qquad (47)

σ_{e}^{2(k+1)} = \hat{e}^{(k)\prime} \hat{e}^{(k)} / \{N - [r - \sum_{i=1}^{2} \mathrm{tr}(C^{ii(k)}) λ_{i}^{(k)}]\} \qquad (48)

where r is the rank of the coefficient matrix of Eq. 42, $\hat{e} = y - X \hat{b} - \sum_{i = 1}^{2} Z_{i} {\hat{τ}}_{i}$ , and Cⁱⁱ is defined by:

H^{-1} = (Z_{g}' M Z_{g} + \oplus_{i=1}^{2} λ_{i} I_{t_{i}})^{-1} = \left[\begin{array}{cc} C^{11} & C^{12} \\ C^{21} & C^{22} \end{array}\right] \qquad (49)

where M = I_N − X(X'X)⁻X', and t_i = n_α for i = 1 and t_i = n_δ for i = 2.

The EM-REML algorithm of Eqs. 45–48 is known to be slow but reliably yields non-negative estimates of variance components. The AI-REML algorithm is fast but may be sensitive to starting values of variance components and may fail for extreme heritability levels. Formulations of AI-REML for the multi-allelic haplotype model in this article are straightforward extensions of the formulations we implemented for GVCBLUP [40].
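
The CE form of the EM-type iteration (Eqs. 45 and 46) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GVCBLUP implementation: the function name `em_greml_ce`, the fixed number of iterations in place of a convergence test, the simulated data, and S_i = T_iT_i' are all choices made for the example.

```python
import numpy as np

def em_greml_ce(y, X, Z, S, sigma2, sigma2_e, n_iter=50):
    """EM-type GREML updates of Eqs. 45-46, run for a fixed number of iterations."""
    N = len(y)
    sigma2 = list(sigma2)
    ZSZ = [Z @ Si @ Z.T for Si in S]                 # Z S_i Z', computed once
    for _ in range(n_iter):
        V = sum(s2 * K for s2, K in zip(sigma2, ZSZ)) + sigma2_e * np.eye(N)
        Vinv = np.linalg.inv(V)
        VinvX = Vinv @ X
        P = Vinv - VinvX @ np.linalg.pinv(X.T @ VinvX) @ VinvX.T
        Py = P @ y
        # Eq. 45: sigma_i^2 <- sigma_i^2 * y'P Z S_i Z' P y / tr(P Z S_i Z')
        new_sigma2 = [s2 * (Py @ K @ Py) / np.trace(P @ K)
                      for s2, K in zip(sigma2, ZSZ)]
        # Eq. 46: sigma_e^2 <- sigma_e^2 * y'P P y / tr(P)
        new_sigma2_e = sigma2_e * (Py @ Py) / np.trace(P)
        sigma2, sigma2_e = new_sigma2, new_sigma2_e
    return sigma2, sigma2_e


# Simulated toy data with random stand-ins for the T matrices
rng = np.random.default_rng(3)
q = 30
T1, T2 = rng.normal(size=(q, 10)), rng.normal(size=(q, 8))
X, Z = np.ones((q, 1)), np.eye(q)
y = (1.0 + T1 @ rng.normal(scale=0.7, size=10)
     + T2 @ rng.normal(scale=0.5, size=8) + rng.normal(size=q))

est_sigma2, est_sigma2_e = em_greml_ce(
    y, X, Z, [T1 @ T1.T, T2 @ T2.T], sigma2=[1.0, 1.0], sigma2_e=1.0, n_iter=20)
```

Each update multiplies the current value by a ratio of a quadratic form to a trace, both non-negative for positive starting values, which is consistent with the non-negativity property noted above. A practical implementation would add a convergence criterion and, as the text indicates, may prefer AI-REML for speed.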

Integration of haplotype and single SNP effects in genomic prediction and estimation

Haplotypes and single SNPs can be analyzed jointly for genomic prediction in the same mixed model by adding the single-SNP effects from our previous work [8] to the mixed model of Eq. 31, i.e.,

y = X b + Z (T_{αh} α_{h} + T_{δh} δ_{h} + T_{αs} α_{s} + T_{δs} δ_{s}) + e \qquad (50)

V = \mathrm{Var}(y) = Z (σ_{αh}^{2} A_{h} + σ_{δh}^{2} D_{h} + σ_{αs}^{2} A_{s} + σ_{δs}^{2} D_{s}) Z' + σ_{e}^{2} I_{N} \qquad (51)

where α_s = m × 1 column vector of SNP additive effects, T_αs = q × m model matrix of α_s, δ_s = m × 1 column vector of SNP dominance effects, T_δs = q × m model matrix of δ_s, Var(α_s) = σ²_αsI_m, Var(a_s) = G_αs = σ²_αsA_s, Var(δ_s) = σ²_δsI_m, Var(d_s) = G_δs = σ²_δsD_s, A_s = SNP genomic additive relationship matrix, and D_s = SNP genomic dominance relationship matrix, with A_s = T_αsT_αs' and D_s = T_δsT_δs'. Let α_s = τ₃, δ_s = τ₄; T_αs = T₃, T_δs = T₄; u_i = T_iτ_i, i = 1,…,4; A_s = S₃, D_s = S₄; and σ²_αs = σ²₃, σ²_δs = σ²₄. The GBLUP and GREML formulations that jointly include haplotype and single SNP additive and dominance effects essentially entail extending the range of the subscript i from 2 to 4 for Eqs. 38–50.

GREML estimation using the joint mixed model with haplotype and SNP effects offers flexibility to estimate the heritability for various types of functional genomic information in any given autosome regions based on formulations we implemented in GVCBLUP [40], e.g., the additive and dominance heritabilities of haplotype blocks of all genes, all LD blocks, or all single SNPs. The heritability estimate for each type of genetic effects is $h_{i}^{2} = σ_{i}^{2} / σ_{y}^{2}$, where $σ_{y}^{2} = \sum_{i=1}^{4} σ_{i}^{2} + σ_{e}^{2}$ = phenotypic variance. The total heritability of all types of genetic effects is the sum of all effect heritabilities, i.e., $H^{2} = \sum_{i=1}^{4} h_{i}^{2}$. Genomic heritability estimation has flexibility unavailable from heritability estimation using pedigree relationships: the heritability can be estimated for a single SNP, a chromosome region, or a set of selected SNPs. Using the GREML formulae of Eqs. 35 and 36, the heritability for haplotype block j or SNP set j can be estimated as: $h_{i j}^{2} = ({\hat{τ}}_{i j}' {\hat{τ}}_{i j} / {\hat{τ}}_{i}' {\hat{τ}}_{i}) h_{i}^{2}$ , where ${\hat{τ}}_{i j}$ = subset j of ${\hat{τ}}_{i}$ , i = 1,…,4. Given sufficient computing power and sample sizes for extensive validation studies, these heritability estimates could help identify genomic regions and genes relevant to phenotypes within the framework of genomic prediction.
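
The heritability formulas in the preceding paragraph amount to simple arithmetic once the variance components and effect estimates are available. The sketch below uses hypothetical function names and made-up numbers purely for illustration; none of the values come from this article.

```python
import numpy as np

def heritabilities(sigma2, sigma2_e):
    """Return (h_i^2 for each effect type, total H^2) from variance components."""
    sigma2 = np.asarray(sigma2, dtype=float)
    sigma2_y = sigma2.sum() + sigma2_e       # phenotypic variance
    h2 = sigma2 / sigma2_y
    return h2, h2.sum()

def block_heritability(tau_hat_i, block_index, h2_i):
    """h_ij^2 = (tau_ij' tau_ij / tau_i' tau_i) * h_i^2 for the effects in one block."""
    tau_hat_i = np.asarray(tau_hat_i, dtype=float)
    tau_ij = tau_hat_i[block_index]
    return (tau_ij @ tau_ij) / (tau_hat_i @ tau_hat_i) * h2_i


# Made-up variance components for the four effect types and the residual
h2, H2 = heritabilities([0.2, 0.1, 0.4, 0.1], sigma2_e=1.2)

# Made-up additive haplotype effect estimates split into two blocks
tau_hat_1 = np.array([1.0, 2.0, 2.0, 0.0])
h2_block_a = block_heritability(tau_hat_1, [0], h2[0])
h2_block_b = block_heritability(tau_hat_1, [1, 2, 3], h2[0])
```

When the blocks partition the effects of type i without overlap, the block heritabilities add up to h²_i, so the formula apportions each heritability among genomic regions in proportion to the sums of squared effect estimates.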

Defining haplotype blocks using functional genomic information

The multi-allelic haplotype model can be used for the integration of functional genomic information with genomic prediction and estimation. This integration defines haplotype blocks using functional genomic information under the hypothesis that a chromosome region with functional information requires more than a single point to affect a phenotype, followed by genomic prediction and estimation using a haplotype analysis such as the methods developed in this article. Each gene could be a ‘natural haplotype block’, and the use of gene blocks improved the prediction accuracy for some human phenotypes in our preliminary results [37]. Other types of functional information can also be used to define haplotype blocks, including ChIP-seq sites, DNA methylation sites, CNV, protein interaction, pathway information, GWAS results and selection signatures (Fig. 1). Other than ‘natural haplotype blocks’, the optimal block sizes for functional information with the best prediction accuracy could be determined by extensive validation studies.

Fig. 1 — Integration of functional and structural genomic information for genomic selection. Haplotype blocks are defined using functional genomic information and are analyzed using the multi-allelic haplotype model in this article for genomic prediction and estimation. Single SNPs as structural genomic information can be used jointly with the haplotype analysis. (DHS = DNase I hypersensitive site)

Rare haplotypes, missing genotypic values

The mixed model approach outlined above allows rare haplotypes. In the extreme case of rare haplotypes with one observation per haplotype, or a haplotype frequency of 1/h when h is large, the multi-allelic model with the mixed model implementation is still applicable for additive effects and values. Missing genotypic values are a problem for dominance effects and values. The dominance effect defined by Eq. 9 requires the availability of all three genotypic values of a haplotype pair. Consequently, the dominance effect is undefined if any of these genotypic values is missing. We currently recommend ignoring any haplotype pair with one or more missing genotypic values when defining dominance effects. For large haplotype blocks, nearly all individuals could be heterozygous, so that such large blocks may not contribute to genomic prediction and estimation of dominance effects and values. This loss of dominance information should be a factor to consider in defining the block size.

Conclusions

A multi-allelic haplotype model for genomic prediction and estimation is established using the quantitative genetics model that partitions a multi-allelic genotypic value into additive and dominance values, factorizes each additive value into a product between a function of allele frequencies and additive effect, and factorizes each dominance value into a product between a function of allele frequencies and dominance effect. Haplotype genomic additive and dominance relationship matrices and formulations are then derived for GBLUP and GREML utilizing haplotypes in haplotype blocks. These results fill a gap in the theory of quantitative genetics for multi-allelic genetic partition and provide a haplotype approach within the theory of quantitative genetics towards the integration of functional and structural genomic information for genomic selection.

Availability of supporting data

The only data set used in this article is shown in Tables 1–2.

Acknowledgements

This research was supported by USDA National Institute of Food and Agriculture Grant no. 2011-67015-30333 and by project MN-16-043 of the Agricultural Experiment Station at the University of Minnesota. Dzianis Prakapenka and Chunkao Wang implemented the methodology in this article by the GVCHAP computer program. Cheng Tan and Dzianis Prakapenka evaluated the methodology. John R. Garbe provided summary and discussion of human functional genomic information. Li Ma processed a dataset for methodology evaluation.

Abbreviations

SNP: single nucleotide polymorphism
BLUP: best linear unbiased prediction
GBLUP: genomic BLUP
REML: restricted maximum likelihood estimation
GREML: genomic REML
EM: expectation-maximization
AI-REML: average information REML
CE: conditional expectation
MME: mixed model equations

Footnotes

Competing interests

The author declares to have no competing interests.

References

1. Henderson C. Applications of Linear Models in Animal Breeding. Guelph: University of Guelph; 1984.
2. Fikse W, Philipsson J. Development of international genetic evaluations of dairy cattle for sustainable breeding programs. Anim Genet Resour Inf. 2007;41:29–43.
3. Powell R, VanRaden P. International dairy bull evaluations expressed on national, subglobal, and global scales. J Dairy Sci. 2002;85(7):1863–1868. doi: 10.3168/jds.S0022-0302(02)74260-4.
4. VanRaden P. Invited Review: Selection on Net Merit to Improve Lifetime Profit. J Dairy Sci. 2004;87(10):3125–3131. doi: 10.3168/jds.S0022-0302(04)73447-5.
5. Wiggans G, Misztal I, Van Vleck L. Implementation of an animal model for genetic evaluation of dairy cattle in the United States. J Dairy Sci. 1988;71:54–69. doi: 10.1016/S0022-0302(88)79979-8.
6. VanRaden P. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–4423. doi: 10.3168/jds.2007-0980.
7. Patterson HD, Thompson R. Recovery of inter-block information when block sizes are unequal. Biometrika. 1971;58(3):545–554. doi: 10.1093/biomet/58.3.545.
8. Da Y, Wang C, Wang S, Hu G. Mixed model methods for genomic prediction and variance component estimation of additive and dominance effects using SNP markers. PLoS One. 2014;9(1):e87666. doi: 10.1371/journal.pone.0087666.
9. Wang C, Da Y. Quantitative genetics model as the unifying model for defining genomic relationship and inbreeding coefficient. PLoS One. 2014;9:e114484. doi: 10.1371/journal.pone.0114484.
10. Hayes B, Goddard M. Genome-wide association and genomic selection in animal breeding. Genome. 2010;53(11):876–883. doi: 10.1139/G10-076.
11. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42(7):565–569. doi: 10.1038/ng.608.
12. Fisher RA. The Correlation between Relatives on the Supposition of Mendelian Inheritance. Trans Roy Soc Edinb. 1918;52(02):399–433. doi: 10.1017/S0080456800012163.
13. Fisher RA. Average excess and average effect of a gene substitution. Ann Eugen. 1941;11(1):53–63. doi: 10.1111/j.1469-1809.1941.tb02272.x.
14. Cockerham CC. An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics. 1954;39(6):859. doi: 10.1093/genetics/39.6.859.
15. Kempthorne O. The correlation between relatives in a random mating population. Proc R Soc Lond B Biol Sci. 1954;143(910):103–113. doi: 10.1098/rspb.1954.0056.
16. Lynch M, Walsh B. Genetics and analysis of quantitative traits. Massachusetts: Sinauer Sunderland; 1998.
17. Kempthorne O. An introduction to genetic statistics. New York: Wiley; 1957.
18. Falconer DS, Mackay TFC. Introduction to Quantitative Genetics. 4. Harlow, Essex: Longmans Green; 1996.
19. Álvarez-Castro JM, Yang R-C. Multiallelic models of genetic effects and variance decomposition in non-equilibrium populations. Genetica. 2011;139(9):1119–1134. doi: 10.1007/s10709-011-9614-9.
20. Vormfelde SV, Brockmöller J. On the value of haplotype-based genotype–phenotype analysis and on data transformation in pharmacogenetics and -genomics. Nat Rev Genet. 2007;8(12). doi: 10.1038/nrg1916-c1.
21. Balding DJ. A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006;7(10):781–791. doi: 10.1038/nrg1916.
22. Garnier S, Truong V, Brocheton J, Zeller T, Rovital M, Wild PS, et al. Genome-wide haplotype analysis of cis expression quantitative trait loci in monocytes. PLoS Genet. 2013;9(1):e1003240. doi: 10.1371/journal.pgen.1003240.
23. Morris RW, Kaplan NL. On the advantage of haplotype analysis in the presence of multiple disease susceptibility alleles. Genet Epidemiol. 2002;23(3):221–233. doi: 10.1002/gepi.10200.
24. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21(2):263–265. doi: 10.1093/bioinformatics/bth457.
25. Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, et al. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449(7164):913–918. doi: 10.1038/nature06250.
26. Browning BL, Browning SR. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009;84(2):210–223. doi: 10.1016/j.ajhg.2009.01.005.
27. Scheet P, Stephens M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006;78(4):629–644. doi: 10.1086/502802.
28. Von Holdt BM, Pollinger JP, Lohmueller KE, Han E, Parker HG, Quignon P, et al. Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication. Nature. 2010;464(7290):898–902. doi: 10.1038/nature08837.
29. Calus M, De Roos A, Veerkamp R. Accuracy of genomic selection using different methods to define haplotypes. Genetics. 2008;178(1):553–561. doi: 10.1534/genetics.107.080838.
30. Villumsen T, Janss L, Lund M. The importance of haplotype length and heritability using genomic selection in dairy cattle. J Anim Breed Genet. 2009;126(1):3–13. doi: 10.1111/j.1439-0388.2008.00747.x.
31. Sun X, L. FR, Garrick DJ, Dekkers JCM. Improved accuracy of genomic prediction for traits with rare QTL by fitting haplotypes. Proceedings, 10th World Congress of Genetics Applied to Livestock Production, Vancouver, BC, Canada. https://asas.org/docs/default-source/wcgalp-proceedings-oral/209_paper_9178_manuscript_1682_0.pdf?sfvrsn=2 [Last accessed December 8 2015].
32. Cuyabano BC, Su G, Lund MS. Selection of haplotype variables from a high-density marker map for genomic prediction. Genet Sel Evol. 2015;47(1):1–11. doi: 10.1186/s12711-015-0143-3.
33. Mulder HA, Calus MP, Veerkamp RF. Prediction of haplotypes for ungenotyped animals and its effect on marker-assisted breeding value estimation. Genet Sel Evol. 2010;42:10. doi: 10.1186/1297-9686-42-10.
34. Meuwissen T, Goddard M. Accurate prediction of genetic values for complex traits by whole-genome resequencing. Genetics. 2010;185(2):623–631. doi: 10.1534/genetics.110.116590.
35. Erbe M, Hayes B, Matukumalli L, Goswami S, Bowman P, Reich C, et al. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J Dairy Sci. 2012;95(7):4114–4129. doi: 10.3168/jds.2011-5019.
36. Brøndum RF, Su G, Lund MS, Bowman PJ, Goddard ME, Hayes BJ. Genome position specific priors for genomic prediction. BMC Genomics. 2012;13(1):543. doi: 10.1186/1471-2164-13-543.
37. Da Y, Wang C, Tan C, Prakapenka D, Shigematsu M, Garbe J, Ma L. Multi-allelic haplotype model for genomic prediction and estimation. Abstract P1176. Plant and Animal Genome XXIII, January 10–14, 2015. San Diego. https://pag.confex.com/pag/xxiii/webprogram/Paper14435.html [Last accessed December 8 2015].
38. Tan C, Prakapenka D, Wang C, Ma L, Garbe JR, Hu X, Da Y. Integration of haplotype analysis of functional genomic information with single SNP analysis improved accuracy of genomic prediction. ADSA/ASAS 2015, Orlando, July 12–16 2015. Abstract M84. http://m.jtmtg.org/abs/t/65063 [Last accessed December 8 2015].
39. Henderson C. Best linear unbiased prediction of breeding values not in the model for records. J Dairy Sci. 1977;60(5):783–787. doi: 10.3168/jds.S0022-0302(77)83935-0.
40. Wang C, Prakapenka D, Wang S, Pulugurta S, Runesha HB, Da Y. GVCBLUP: a computer package for genomic prediction and variance component estimation of additive and dominance effects. BMC Bioinformatics. 2014;15(1):270. doi: 10.1186/1471-2105-15-270.

[CR1] 1.Henderson C. Applications of Linear Models in Animal Breeding. Guelph: University of Guelph; 1984. [Google Scholar]

[CR2] 2.Fikse W, Philipsson J. Development of international genetic evaluations of dairy cattle for sustainable breeding programs. Anim Genet Resour Inf. 2007;41:29–43. [Google Scholar]

[CR3] 3.Powell R, VanRaden P. International dairy bull evaluations expressed on national, subglobal, and global scales. J Dairy Sci. 2002;85(7):1863–1868. doi: 10.3168/jds.S0022-0302(02)74260-4. [DOI] [PubMed] [Google Scholar]

[CR4] 4.VanRaden P. Invited Review: Selection on Net Merit to Improve Lifetime Profit. J Dairy Sci. 2004;87(10):3125–3131. doi: 10.3168/jds.S0022-0302(04)73447-5. [DOI] [PubMed] [Google Scholar]

[CR5] 5.Wiggans G, Misztal I, Van Vleck L. Implementation of an animal model for genetic evaluation of dairy cattle in the United States. J Dairy Sci. 1988;71:54–69. doi: 10.1016/S0022-0302(88)79979-8. [DOI] [Google Scholar]

[CR6] 6.VanRaden P. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–4423. doi: 10.3168/jds.2007-0980. [DOI] [PubMed] [Google Scholar]

[CR7] 7.Patterson HD, Thompson R. Recovery of inter-block information when block sizes are unequal. Biometrika. 1971;58(3):545–554. doi: 10.1093/biomet/58.3.545. [DOI] [Google Scholar]

[CR8] 8.Da Y, Wang C, Wang S, Hu G. Mixed model methods for genomic prediction and variance component estimation of additive and dominance effects using SNP markers. PLoS One. 2014;9(1):e87666. doi: 10.1371/journal.pone.0087666. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Wang C, Da Y. Quantitative genetics model as the unifying model for defining genomic relationship and inbreeding coefficient. PLoS ONE. 2014;9:e114484. doi: 10.1371/journal.pone.0114484. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Hayes B, Goddard M. Genome-wide association and genomic selection in animal breeding. Genome. 2010;53(11):876–883. doi: 10.1139/G10-076. [DOI] [PubMed] [Google Scholar]

[CR11] 11.Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42(7):565–569. doi: 10.1038/ng.608. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Fisher RA. The Correlation between Relatives on the Supposition of Mendelian Inheritance. Trans Roy Soc Edinb. 1918;52(02):399–433. doi: 10.1017/S0080456800012163. [DOI] [Google Scholar]

[CR13] 13.Fisher RA. Average excess and average effect of a gene substitution. Ann. Eugen. 1941;11(1):53–63. doi: 10.1111/j.1469-1809.1941.tb02272.x. [DOI] [Google Scholar]

[CR14] 14.Cockerham CC. An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics. 1954;39(6):859. doi: 10.1093/genetics/39.6.859. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.Kempthorne O. The correlation between relatives in a random mating population. Proc. R. Soc. Lond. B Biol. Sci. 1954;143(910):103–113. doi: 10.1098/rspb.1954.0056. [DOI] [PubMed] [Google Scholar]

[CR16] 16.Lynch M, Walsh B. Genetics and analysis of quantitative traits. Massachusetts: Sinauer Sunderland; 1998. [Google Scholar]

[CR17] 17.Kempthorne O. An introduction to genetic statistics. New York: Wiley; 1957. [Google Scholar]

[CR18] 18.Falconer DS, Mackay TFC. Introduction to Quantitative Genetics. 4. Harlow, Essex: Longmans Green; 1996. [Google Scholar]

[CR19] 19.Álvarez-Castro JM, Yang R-C. Multiallelic models of genetic effects and variance decomposition in non-equilibrium populations. Genetica. 2011;139(9):1119–1134. doi: 10.1007/s10709-011-9614-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Vormfelde SV, Brockmöller J: On the value of haplotype-based genotype–phenotype analysis and on data transformation in pharmacogenetics and-genomics. Nature Reviews Genetics 2007, 8(12), doi:10.1038/nrg1916-c1. [DOI] [PubMed]

[CR21] 21.Balding DJ. A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006;7(10):781–791. doi: 10.1038/nrg1916. [DOI] [PubMed] [Google Scholar]

[CR22] 22.Garnier S, Truong V, Brocheton J, Zeller T, Rovital M, Wild PS, et al. Genome-wide haplotype analysis of cis expression quantitative trait loci in monocytes. PLoS Genet. 2013;9(1):e1003240. doi: 10.1371/journal.pgen.1003240. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] 23.Morris RW, Kaplan NL. On the advantage of haplotype analysis in the presence of multiple disease susceptibility alleles. Genet Epidemiol. 2002;23(3):221–233. doi: 10.1002/gepi.10200. [DOI] [PubMed] [Google Scholar]

[CR24] 24.Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21(2):263–265. doi: 10.1093/bioinformatics/bth457. [DOI] [PubMed] [Google Scholar]

[CR25] 25.Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, et al. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449(7164):913–918. doi: 10.1038/nature06250. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Browning BL, Browning SR. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009;84(2):210–223. doi: 10.1016/j.ajhg.2009.01.005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Scheet P, Stephens M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006;78(4):629–644. doi: 10.1086/502802. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR28] 28.Von Holdt BM, Pollinger JP, Lohmueller KE, Han E, Parker HG, Quignon P, et al. Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication. Nature. 2010;464(7290):898–902. doi: 10.1038/nature08837. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR29] 29.Calus M, De Roos A, Veerkamp R. Accuracy of genomic selection using different methods to define haplotypes. Genetics. 2008;178(1):553–561. doi: 10.1534/genetics.107.080838. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR30] 30.Villumsen T, Janss L, Lund M. The importance of haplotype length and heritability using genomic selection in dairy cattle. J Anim Breed Genet. 2009;126(1):3–13. doi: 10.1111/j.1439-0388.2008.00747.x. [DOI] [PubMed] [Google Scholar]

[CR31] 31.Sun X, L. FR, Garrick DJ, Dekkers JCM: Improved accuracy of genomic prediction for traits with rare QTL by fitting haplotypes. Proceedings, 10th World Congress of Genetics Applied to Livestock Production Vancouver, BC, Canada https://asas.org/docs/default-source/wcgalp-proceedings-oral/209_paper_9178_manuscript_1682_0.pdf?sfvrsn=2 [Last accessed December 8 2015].

[CR32] 32.Cuyabano BC, Su G, Lund MS. Selection of haplotype variables from a high-density marker map for genomic prediction. Genet Sel Evol. 2015;47(1):1–11. doi: 10.1186/s12711-015-0143-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Mulder HA, Calus MP, Veerkamp RF. Prediction of haplotypes for ungenotyped animals and its effect on marker-assisted breeding value estimation. Genet Sel Evol. 2010;42:10. doi: 10.1186/1297-9686-42-10. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR34] 34.Meuwissen T, Goddard M. Accurate prediction of genetic values for complex traits by whole-genome resequencing. Genetics. 2010;185(2):623–631. doi: 10.1534/genetics.110.116590. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR35] 35.Erbe M, Hayes B, Matukumalli L, Goswami S, Bowman P, Reich C, et al. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J Dairy Sci. 2012;95(7):4114–4129. doi: 10.3168/jds.2011-5019. [DOI] [PubMed] [Google Scholar]

[CR36] 36.Brøndum RF, Su G, Lund MS, Bowman PJ, Goddard ME, Hayes BJ. Genome position specific priors for genomic prediction. BMC Genomics. 2012;13(1):543. doi: 10.1186/1471-2164-13-543. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Da Y, Wang C, Tan C, Prakapenka D, Shigematsu M, Garbe J, Ma L: Multi-allelic haplotype model for genomic prediction and estimation. Abstract P1176. Plant and Animal Genome XXIII, January 10–14, 2015. San Diego. https://pag.confex.com/pag/xxiii/webprogram/Paper14435.html [Last accessed December 8 2015].

[CR38] 38. Tan C, Prakapenka D, Wang C, Ma L, Garbe JR, Hu X, Da Y. Integration of haplotype analysis of functional genomic information with single SNP analysis improved accuracy of genomic prediction. Abstract M84. ADSA/ASAS 2015, Orlando, July 12–16, 2015. http://m.jtmtg.org/abs/t/65063 [Last accessed December 8 2015].

[CR39] 39. Henderson C. Best linear unbiased prediction of breeding values not in the model for records. J Dairy Sci. 1977;60(5):783–787. doi: 10.3168/jds.S0022-0302(77)83935-0.

[CR40] 40. Wang C, Prakapenka D, Wang S, Pulugurta S, Runesha HB, Da Y. GVCBLUP: a computer package for genomic prediction and variance component estimation of additive and dominance effects. BMC Bioinformatics. 2014;15(1):270. doi: 10.1186/1471-2105-15-270.
