Skip to main content
Genetics, Selection, Evolution : GSE logoLink to Genetics, Selection, Evolution : GSE
. 2016 Apr 23;48:36. doi: 10.1186/s12711-016-0214-0

Mendelian sampling covariability of marker effects and genetic values

Sarah Bonk 1, Manuela Reichelt 1, Friedrich Teuscher 1, Dierck Segelke 2, Norbert Reinsch 1,
PMCID: PMC4842306  PMID: 27107720

Abstract

Background

Measures of the expected genetic variability among full-sibs are of practical relevance, such as in the context of mating decisions. An important application field in animal and plant breeding is the selection and allocation of mates when large or small amounts of genetic variability among offspring are desired, depending on user-specific goals. Estimates of the Mendelian sampling variance can be obtained by simulating gametes from parents with known diplotypes. Knowledge of recombination rates and additive marker effects is also required. In this study, we aimed at developing an exact method that can account for both additive and dominance effects.

Results

We derived parent-specific covariance matrices that exactly quantify the within-family (co-)variability of additive and dominance marker effects. These matrices incorporate prior knowledge of the parental diplotypes and recombination rates. When combined with additive marker effects, they allow the exact derivation of the Mendelian sampling (co-)variances of (estimated) breeding values for several traits, as well for the aggregate genotype. A comparative analysis demonstrated good average agreement between the exact values and the simulation results for a practical dataset (74,353 German Holstein cattle).

Conclusions

The newly derived method is suitable for calculating the exact amount of intra-family variation of the estimated breeding values and genetic values (comprising additive and dominance effects).

Background

The degree of genetic variability among full-sibs is known as Mendelian sampling variance. This variability is due to the inheritance of random samples of alleles from both parents. For a quantitative trait, the amount of this variability depends on the parental degree of heterozygosity, 1-F, where F (F) is the inbreeding coefficient of an individual’s sire (dam), which is derived from the pedigree. Under additivity and with unlinked loci, the Mendelian sampling variance is the sum of two parental contributions, 14σa2(1-F)+14σa2(1-F), where σa2 is the additive genetic variance [1]. The latter expression is of general importance in quantitative genetics, especially in the context of estimating genetic parameters and in genetic evaluations. In certain models (e.g. [2, 3]), it is used explicitly for the relative weighting of observations from progeny of inbred versus non-inbred parents. Moreover, the inverse Mendelian sampling variance plays a pivotal role in direct inversion of the numerator relationship matrix [4].

New methods to track Mendelian sampling variance are based on the availability of phased genotypes (diplotypes) at genetic markers across the genome as a byproduct of genomic selection (e.g. [5, 6]). Single nucleotide polymorphism (SNP) diplotypes of parents differ in terms of three features: the degree of heterozygosity, the genotypes at homozygous loci, and the linkage phase between loci. All of these features have consequences for the variability of gametes that are generated by a particular individual, and thus for the variance among the progeny in a family. A small within-family genetic variation contributes to phenotypic uniformity, which is desired e.g. for birth-weight of piglets (e.g. [7]), while a large Mendelian sampling variability may increase selection opportunity between sibs [6]. When phased genotypes are available, it is possible to simulate a large sample of the population of gametes of a selection candidate by considering recombination within chromosomes, as was demonstrated in a study on 58,035 Holsteins [5]. However, to the best of our knowledge, exact formulae for calculating the Mendelian sampling variance from phased genotypes have not been reported previously.

In this study, we provide the requisite formulae for the exact calculation of within-family genetic variation. The within-family covariance matrix between the additive and dominance effects of all markers can be derived exactly from phased SNP-genotypes and the known genetic distances between markers. Conversion to the within-family variance of (estimated) additive and dominance values for a trait is then achieved via the (estimated) additive and dominance marker effects. We provide a comparison between the results obtained by simulations and the exact method, as well as a brief discussion of the application of this method to the allocation of mates.

Methods

In the following, a breeding population is assumed, where phased SNP-genotypes are available for all potential mating partners, as well as estimates of the SNP-effects for all traits. Furthermore, it is assumed that the genetic distances between all markers are known in terms of their recombination rates, which are summarized in a comprehensive genetic map for all SNPs.

In the first part of this section, we only consider additive marker effects that contribute to the genomic breeding value of an individual:

g=cm, 1

where c is a row vector of genotype indicators (see Eq. 3) and m is a vector of marker effects. We show that the within-family covariance matrix Ω of c, with dimensions equal to the number of marker effects, can be expressed as the sum of two parent-specific covariance matrices Ω and Ω. They define independent parental contributions to the Mendelian sampling variance of genomic breeding values of a trait:

σg2=mΩm. 2

Subsequently we demonstrate that this additive property of the covariance matrix vanishes when dominance marker effects are included.

Definitions and the case of pure additivity

Throughout this study, we exploit the fact that correlations between marker effects are equivalent to correlations between genotype indicators. For the additive effect a (half the average phenotypic difference between homozygotes) at a two-allelic (AB) locus i, the genotype indicator ca,i is given by:

ca,i=1,forgenotypeAA0,forgenotypesABorBA-1,forgenotypeBB, 3

and for dominance effects by:

cd,i=-1,forgenotypesAAorBB1,forgenotypesABorBA, 4

for 1in, where n is the number of markers. See the discussion for a treatment of the coding of additive and dominance effects. Derivation of the covariance is based on the linkage disequilibrium (LD) among all the gametes produced by a parent with a particular diplotype.

A parent has one of the 16 possible diplotypes when two bi-allelic markers are involved, which are represented in Table 1. Pairs of additive and dominance genotype indicators are given in columns 2 to 5. The gametes generated by the parent (comprising one allele at the first locus and one allele at the second locus) follow a probability distribution that depends on the diplotype of the parent. Columns 6 to 9 shows the probabilities of gametes, to which we apply the concept of LD. For each parent, LD can be determined as:

Di,j=pA-ApB-B-pA-BpB-A, 5

where the lower index indicates gametes (see Table 1, column 10).

Table 1.

Ten classes of parental diplotypes with different two-locus genotypes and distributions of produced gametes

Parental diplotype Genotype indicators Probabilities of gametes Characterizing parameters
ca,i ca,j cd,i cd,j pA-A pA-B pB-A pB-B Di,j pi pj
A-AA-A 1 1 −1 −1 1 0 0 0 0 1 1
A-AA-B/A-BA-A 1 0 −1 1 12 12 0 0 0 1 12
A-AB-A/B-AA-A 0 1 1 −1 12 0 12 0 0 12 1
A-BA-B 1 −1 −1 −1 0 1 0 0 0 1 0
B-AB-A −1 1 −1 −1 0 0 1 0 0 0 1
A-AB-B/B-BA-A 0 0 1 1 1-θi,j2 θi,j2 θi,j2 1-θi,j2 1-2θi,j4 12 12
A-BB-A/B-AA-B 0 0 1 1 θi,j2 1-θi,j2 1-θi,j2 θi,j2 -1-2θi,j4 12 12
A-BB-B/B-BA-B 0 −1 1 −1 0 12 0 12 0 12 0
B-AB-B/B-BB-A −1 0 −1 1 0 0 12 12 0 0 12
B-BB-B −1 −1 −1 −1 0 0 0 1 0 0 0

Different diplotypes with identical genotypes at markers i and j are separated by a slash. Genotype indicators are given for additive (ca,i,ca,j) and dominance (cd,i,cd,j) effects at both parental marker genotypes. The probabilities of gametes are specific for each diplotype class and they can be summarized by three characteristic parameters: LD Di,j and the probabilities pi, pj for an A-allele at locus i or j. θi,j is the recombination fraction

In Table 1 the probability of the appearance of allele A at the ith locus is denoted by pi. Note that this definition applies to the diplotypes as well as to the gametes of the parent. The values for pi and pj are given in columns 11 and 12 of Table 1. All entries in Table 1 apply to both the sire (upper index ) and the dam (upper index ).

The joint distribution of genotypes at two bi-allelic marker loci among offspring is outlined in Table 2, using the frequencies of the parental gametes from Table 1. All nine two-locus genotypes are enumerated, together with their underlying diplotypes (the upper haplotype is paternal; columns 1–3). The probability of each ordered diplotype is the product of the two gametic probabilities, and the probability of a two-locus genotype is the sum of the probabilities for all of its possible underlying diplotypes (Table 2, last column). In the next step, the sex-specific probabilities (indexed as or ) of the parental gametes are expressed as functions of the parent-specific LD-parameters and allele frequencies:

pA-A=Di,j+pipj, 6
pA-B=-Di,j+pi1-pj, 7
pB-A=-Di,j+1-pipj, 8

and

pB-B=Di,j+1-pi)(1-pj, 9

for a sire. Expressions for a dam are obtained in the same manner by replacing with .

Table 2.

Two-locus genotype probabilities in a full-sib family

L1 L2 Diplotypes Probabilities
BB BB B-BB-B pB-BpB-B
AB/BA B-AB-B/B-BB-A pB-ApB-B+pB-BpB-A
AA B-AB-A pB-ApB-A
AB/BA BB A-BB-B/B-BA-B pA-BpB-B+pB-BpA-B
AB/BA A-AB-B/B-BA-A/A-BB-A/B-AA-B pA-ApB-B+pB-BpA-A+pA-BpB-A+pB-ApA-B
AA A-AB-A/B-AA-A pA-ApB-A+pB-ApA-A
AA BB A-BA-B pA-BpA-B
AB/BA A-AA-B/A-BA-A pA-ApA-B+pA-BpA-A
AA A-AA-A pA-ApA-A

Nine classes of two-locus genotypes (L1, L2) in the offspring, which all correspond to ordered diplotypes (separated by a slash, where the upper haplotype is paternal) and the probability of each class as a function of the frequencies of parental gametes (superscripts indicate the sex of the parent and subscripts indicate the haplotypes of gametes)

These terms also allow us to rewrite the genotype probabilities of Table 2 as functions of the LD-parameters and allele frequencies (see Table 3). We consider that they are arranged in a three-by-three matrix Z=(zs,q), the rows and columns of which pertain to the genotypes at the first locus and the genotypes at the second locus, respectively (the order of genotypes is BBAB / BA, and AA for both loci); for example, the probability of BBBB is:

z1,1=pB-BpB-B=Di,j+1-pi1-pj·Di,j+1-pi1-pj. 10

Table 3.

Two-locus genotype probabilities as functions of characteristic parameters

L1 L2 Probabilities
BB BB [Di,j+(1-pi)(1-pj)][Di,j+(1-pi)(1-pj)]
AB/BA [-Di,j+(1-pi)pj][Di,j+(1-pi)(1-pj)]+[Di,j+(1-pi)(1-pj)][-Di,j+(1-pi)pj]
AA [-Di,j+(1-pi)pj][-Di,j+(1-pi)pj]
AB/BA BB [-Di,j+pi(1-pj)][Di,j+(1-pi)(1-pj)]+[Di,j+(1-pi)(1-pj)][-Di,j+pi(1-pj)]
AB/BA [Di,j+pipj][Di,j+(1-pi)(1-pj)]+[Di,j+(1-pi)(1-pj)][Di,j+pipj]+[-Di,j+pi(1-pj)][-Di,j+(1-pi)pj]+[-Di,j+(1-pi)pj][-Di,j+pi(1-pj)]
AA [Di,j+pipj][-Di,j+(1-pi)pj]+[-Di,j+(1-pi)pj][Di,j+pipj]
AA BB [-Di,j+pi(1-pj)][-Di,j+pi(1-pj)]
AB/BA [Di,j+pipj][-Di,j+pi(1-pj)]+[-Di,j+pi(1-pj)][Di,j+pipj]
AA [Di,j+pipj][Di,j+pipj]

Nine classes of two-locus genotypes (L1, L2) for offspring in a full-sib family and the probability of each class as a function of the parental LD D and the A-allele frequency pi at locus i (the superscripts indicate the sex of the parent and the subscripts indicate the locus)

The covariance between the additive genotype indicators in Eq. 3 is then determined by:

covca,i,ca,j=s=13q=13ca,sca,qzs,q-s=13ca,szs,q=13ca,qz,q, 11

where the dot indicates summation over all assigned components, and zs, and z,q are the marginal genotype probabilities for the first and second locus, respectively. Furthermore, ca,1 = −1 since the first row and the first column contain genotype BB; ca,2=0 since the second row and the second column contain heterozygous genotypes; and by analogy, ca,3=1. After simplification (using Mathematica [8]), the result obtained for the off-diagonal elements of the covariance matrix Ω is:

covca,i,ca,j=Di,j+Di,j. 12

Note that the LD depends on the recombination fraction θ, which can be converted into a genetic distance x (in Morgan) using Haldane’s mapping function [9]

x=-0.5ln(1-2θ). 13

To complete the covariance matrix Ω=cov(ci,cj)i,j, the variance in the genotype indicator (as defined by Eq. 3) at each locus i is expressed as a function of A-allele frequency pi. The genotype frequencies at locus i are (1-pi)(1-pi) for genotype BB, pi(1-pi)+pi(1-pi) for AB / BA, and pipi for AA. Then,

Eca,i=-1+pi+pi 14

and

Eca,i2=1-pi-pi+2pipi, 15

and thus

varca,i=Eca,i2-E2ca,i=πi+πi 16

with πi=pi(1-pi) and πi=pi(1-pi). Note that the only possible values for var(ca,i) are 0, 14, or 12, since πi and πi can only have values of 0 or 14.

Now we have derived all of the elements of the covariance matrix Ω, so we can express the Mendelian variance for a particular trait as (Eq. 2):

σg2=mΩm.

As already mentioned, this Mendelian sampling variance can be split into the sum of two independent parental contributions:

σg2=mΩm=mΩm+mΩm, 17

where Ω and Ω represent parent-specific covariance matrices for the additive effects of single alleles based on the paternal and maternal gametes. The paternal (maternal) covariance matrix Ω(Ω) contains off-diagonal elements that are equal to Di,j (Di,j) and πi (πi) on the diagonals, according to Eqs. 12 and 16.

The variance var(ca,i) becomes zero at loci for which both parents are homozygous. The corresponding rows and columns of the covariance matrix only contain zeroes, which causes a rank deficiency. The corresponding diagonal and off-diagonal elements in the correlation matrix R are defined as zero (although they are not defined in a strictly mathematical sense) for our purposes in order to maintain rank equality between Ω and R.

Joint additive and dominance genetic effects

The covariance between two dominance effects can be derived in a similar manner as that between additive effects. Again, we consider the genotype probabilities of Table 2 arranged in a three-by-three matrix Z=(zs,q) and we define: cd,1 = −1 since the first row and the first column contain genotype BB; cd,2=1 since the second row and the second column contain heterozygous genotypes; and by analogy, cd,3 = −1. Inserting these dominance codes into Eq. 11 and expanding yield:

covcd,i,cd,j=16Di,jDi,j+4Di,j1-2pi1-2pj+4Di,j1-2pi1-2pj. 18

Analogous to the additive effects, for the variance of the dominance effects we obtain:

varcd,i=4πi+πi-16πiπi, 19

which can take values 1 or 0. In particular, var(cd,i) is equal to 1 if at least one parent is heterozygous, and 0 if both parents are homozygous. Thus, we can conclude that if one marker is homozygous in both parents, the covariance between the dominance effects is equal to 0, which is analogous to previous considerations of the rank deficiency for additive effects. Furthermore, Eqs. 18 and 19 contain the products of the characteristic parameters (Di,j,pi,pj,πi) for both the sire and the dam, which is why Ω and the Mendelian variance can no longer be split into a sum of two separate parental parts when dominance effects are included.

In order to determine the covariance cov(ca,i,cd,j) between the additive and dominance indicators, we assign the additive indicators by Eq. 3 to the first (ith) locus and the dominance indicators by Eq. 4 to the second (jth) locus, i.e.,

covca,i,cd,j=s=13q=13ca,scd,qzs,q-s=13ca,szs,q=13cd,qz,q, 20

which has to be expanded, resulting in:

covca,i,cd,j=2Di,j1-2pj+2Di,j1-2pj. 21

Exchanging the loci yields:

covcd,i,ca,j=covca,j,cd,i=2Di,j1-2pi+2Di,j1-2pi. 22

The rank deficiencies in Ω arise from variances equal to 0 as well as from perfect correlations, which is demonstrated by the two examples of joint correlation matrices for additive and dominance effects shown in the upper part of Fig. 1, as well as their corresponding parental diplotypes. The number of markers is 16 in both examples, which yield correlation matrices with dimensions 32×32. Some diagonal elements of the off-diagonal blocks, which contain correlations between additive and dominance effects, indicate a perfect dependency between the additive and dominance effects at the same locus with correlations of either 1 or −1. After the redundant rows and columns have been deleted from the dominance part of the matrix, the correlation matrices obtained (Fig. 1, bottom) exhibit a block diagonal structure. The remaining covariance matrix for dominance effects corresponds to loci for which both parents are heterozygous (five and seven for the example in Fig. 1). The first block remains unchanged numerically, but it has a different interpretation because it now represents the correlation matrix for a vector ma, which is defined as:

ma=InHnmamd, 23

where ma and md are vectors of the additive and dominance effects for all markers in order of their map position and In is an identity matrix of order n. H is a diagonal matrix with elements:

hi,i=0,ifbothparentsareheterozygousorhomozygous1,ifoneparentisheterozygousandtheotherisBB-1,ifoneparentisheterozygousandtheotherisAA. 24

Fig. 1.

Fig. 1

Correlation matrices for two matings (top) with their respective phased genotypes (middle). The marker distance is 1 cM. The dimension of both matrices is 32 due to the additive and dominance effects at the 16 markers. Block diagonal matrices (bottom) remain after the rows and columns of dominance effects are deleted when there is a perfect correlation between a dominance and additive effect at the same locus

Now, ma,i has the form:

ma,i=ma,i,ifbothparentsareheterozygousorhomozygousma,i+md,i,ifoneparentisheterozygousandtheotherisBBma,i-md,i,ifoneparentisheterozygousandtheotherisAA. 25

As a practical consequence, we can use md (i.e., md with all elements md,i eliminated, where one parent is homozygous at locus i) and ma. Then, σg2 can be computed as:

σg2=mamdΩmamd, 26

where Ω has a block-diagonal structure with unequally sized blocks, as in the example shown in Fig. 1.

Mendelian covariance with multiple traits

The Mendelian covariance between traits (tk,tl) is also of interest, especially when we aim at determining the Mendelian variance of the aggregate genotype (multiple trait breeding goal). If mk and ml are the vectors of marker effects for traits k and l, then the Mendelian covariance σg(tk,tl) between these two traits is

σgtk,tl=mkΩml. 27

The Mendelian variances and covariances for several traits can be collected in the Mendelian covariance matrix:

V=σg2(t1)σg(t1,t2)σg(t1,tN)σg(t1,t2)σg2(t2)σg(t2,tN)σg(t1,tN)σg(t2,tN)σg2(tN), 28

which is given by:

V=MΩM, 29

where each column mk of M is a vector of marker effects for trait k. The Mendelian sampling variance for the aggregate genotype can then be obtained from V and the vector f of the economic weights for all traits:

σgT2=fVf. 30

This quantity has a pivotal role in mating decisions because the total breeding value (defined as the linear combination of single breeding values ft, t=(t1,,tN)) is the most important criterion for selection.

Practical application

We compared the exact method with a recently published simulation approach [5]. The Mendelian sampling variance of gametes was calculated with both methods for each animal from a dataset that included the diplotypes of 74,353 male and female German Holstein cattle. Identical sets of recombination rates and estimates of additive marker effects were used for both methods. These parameters were derived from routine genomic evaluation data. This comparison was done for four traits: fat yield (FKG), protein yield (PKG), somatic cell score (SCS), and the direct genetic effect on stillbirth (SBd).

The exact gametic Mendelian variances σg^2 of the estimated gametic values were calculated for each trait and each animal as:

σg^2=m^Ωm^orσg^2=m^Ωm^ 31

according to Eq. 2, where m^ is the vector of the estimated additive marker effects. The simulation method used 100,000 randomly generated gametes for each animal. Thus, the estimate of the Mendelian variability for the estimated gametic values was:

σ2^g^=o=1Kwom^2K-o=1Kwom^K2, 32

where K = 100,000 is the number of simulated gametes per individual and wo is the vector of the genotype indicators for the oth simulated gamete in the individual under consideration.

The Mendelian covariances between traits were also obtained using the exact method by applying Eq. 29. Furthermore, the four traits were combined with weights of 0.1, 0.4, 0.375, and 0.175 for the traits FKG, PKG, SCS, and SBd, respectively, and the (gametic) Mendelian variances were computed for this aggregate genotype. The covariances and aggregate genotypes were not implemented in the simulation method, so no comparisons could be made with the simulation method for these quantities.

Results

Scatter plots of the estimated σ2^g^-values against their exact σg^2 counterparts are in Fig. 2. The slopes of the linear regression lines were close to unity for all of the traits and the intercepts were small, which indicated a good average agreement between both methods. The variation around the regression line was due to the Monte Carlo error of the simulation.

Fig. 2.

Fig. 2

Scatter plots of Mendelian gametic variances obtained from simulation versus the exact values. Differences between simulated values (each one derived from 100,000 simulated gametes per candidate) and exact values appear as vertical deviations from the linear regression lines (in red) and represent Monte Carlo errors. The traits include fat yield (FKG), protein yield (PKG), somatic cell score (SCS), and direct genetic effect on stillbirth (SBd)

To facilitate a better comparison between the trait combinations, the Mendelian covariances were transformed into correlations and their distributions are in Fig. 3. Interestingly, the sign and magnitude of these correlations exhibited a very high amount of variation between individuals. The Mendelian correlation between FKG and PKG was an exception because most of the values were positive and the distribution was bimodal. This bimodality is a consequence of the DGAT1 gene, for which heterozygous animals led to the smaller peak at correlations below 0.5 and homozygotes were responsible for the larger peak with correlations above 0.5.

Fig. 3.

Fig. 3

Frequency distributions of the Mendelian correlations. The frequency distributions of the correlations between FKG and all the other traits (PKG, SCS, and SBd) are shown on the left, and those between PKG and all the other traits are shown on the right. FKG fat yield, PKG protein yield, SCS somatic cell score, and SBd direct genetic effect on stillbirth

The coefficient of variation of the Mendelian variances of the aggregate genotype was 18.7 %, which was similar to the coefficient of variation of the Mendelian variances of PKG (19.0 %), SCS (20.2 %), and SBd (21.5 %), but somewhat smaller than that of FKG (30.3 %).

Discussion

The derived covariance matrices depend on prior information on the order and genetic distance of markers, as well as the parental diplotypes. They allow the calculation of the within-family variation of the estimated genetic values based on exact considerations of the genotypes, degrees of homozygosity, and linkage phases in the parents. In the classical formula, 14σa2(1-F)+14σa2(1-F), the loss of homozygosity in the parents is considered relative to the base population, where inbreeding coefficients are zero and the Mendelian variance is at maximum (12σa2) for all families. Our covariance matrices, in contrast, mirror the absolute level of marker homozygosity and the Mendelian variance reaches its maximum for fully heterozygous parents with all positive marker alleles in coupling phase. Different linkage phases can make a substantial difference in marker covariability, as shown by the example in Fig. 4, which presents the correlation matrices for two diplotypes with identical 16-marker genotypes but different linkage phases.

Fig. 4.

Fig. 4

Parent-specific correlation matrix Ω of the gametes. The respective phased genotypes are listed below. All markers are heterozygous, with a genetic distance of 1 cM between markers. For the first animal (left), markers 1–4 and 9–12 are in coupling phase, whereas markers 5–8 and 13–16 are in repulsion phase with respect to the first marker. For the second animal (right), all markers are in coupling phase

Fields of application and computational aspects

In general, the method described in this study can be applied to all diploid animals and plants. Of course, all relevant input parameters must be known, such as the marker maps, marker effects, and phased genotypes. Crosses of double haploid (i.e. fully inbred) lines occur as parents in breeding programs for plant species such as e.g. maize. An advantage of such parents is that they provide reliable diplotypes because of the complete homozygosity of genotyped grandparents, whereas the derivation of diplotypes is prone to some degree of phasing error in non-inbred populations [10]. In cases where the phase of some SNPs is only known at a probabilistic level, it may be an option to average the Mendelian sampling variances over all possible linkage phases. Simplifications may be possible, e.g. by taking only the most probable diplotypes into account. However, we did not investigate this question in detail.

For humans and mice, it has been found that marker maps generally differ for male and female parents [11]. All the covariances can be adjusted easily for sex-specific recombination rates, which is achieved most easily in the pure additive case by applying the male and female recombination rates to set up Ω and Ω. In the general case, the LD measures for both the paternal and maternal gametes D and D must be adjusted in order to obtain the adjusted covariances.

The sex of the full sibs matters for the inclusion of sex chromosomes because when all considered progeny are female, the sire can be treated as homozygous at all X-chromosomal loci and the calculation can proceed as usual. When the focus is on male progeny, such as young bulls obtained from elite matings, there is no X-chromosomal paternal contribution to the Mendelian sampling variance. Of course, dominance has no effect on the X-chromosomal Mendelian variance in males, unlike for females.

From a computational viewpoint, the purely additive case is most convenient because the parental contributions to the Mendelian variances and covariances can be calculated for a large list of potential parents and all traits by setting up the parent-specific covariance matrix. Subsequently, the parental contributions only have to be added to the total within-family variance for each mating considered. The computational time required by the exact method was roughly the same as that for the simulation approach. However, the computational demand would increase for the latter case if the Monte Carlo error needs to be reduced further.

Population-averaged Mendelian sampling variances for single traits were previously derived from large numbers of phased genotypes and available estimates of additive marker effects by Cole and VanRaden [6]. Neither simulation of gametes nor covariance matrices were used in this study since loci on the same chromosome were either assumed to be perfectly linked or fully independent. Consequently, their respective results can only be interpreted as upper and lower limits. In another study, Segelke et al. [5] took recombination within chromosomes into account by simulating gametes of individuals with known diplotypes. Parental contributions to the within-family additive genetic variance were expressed as standard deviations of gamete breeding values in a family-specific manner.

Consideration of the aggregate genotype calls for a full Mendelian sampling covariance matrix across traits, which, in the additive case, can also be derived by simulation, but this has not yet been reported in the literature. This requires that genomic breeding values are estimated for each trait of interest and each single simulated gamete and then averages of squares and cross-products are calculated over gametes. If dominance effects are to be included, pairs of paternal and maternal gametes have to be simulated. The simulation-inherent Monte Carlo errors of all single-trait variances and all pair-wise covariances will, of course, propagate and induce a joint Monte Carlo error of the resulting variability in the aggregate genotype.

From a producer’s perspective, phenotypic uniformity of a population of plants or animals is desirable because it facilitates management. Matings with high additive genetic merit and low within-family genetic variance [6] may be attractive to achieve that goal. Dominance—if of some importance for the traits under consideration—could be included for the same purpose. Breeding organizations, in contrast, are probably more interested in offspring with exceptionally high breeding values [6], since e.g. in dairy cattle, semen prices are non-linearly related to the genetic merit of bulls. For a particular mating, the probability that the estimated breeding value of offspring will be greater than a certain threshold can be determined from a normal distribution with family-specific mean and variance. The opportunity to breed the desired animals of top genetic merit can then be maximized by choosing the matings with the highest probabilities among all possible matings, possibly by taking some constraints such as inbreeding into account.

The average observed degree of homozygosity in the German Holstein dataset was 65.3 %, with a range from 25.2 to 88.0 %. These high degrees of homozygosity were exploited for computational speed by deleting the rows and columns for homozygous markers from the covariance matrix and the respective marker effects from m^. Therefore, the dimensions of the remaining vector of marker effects and the remaining covariance matrix were reduced greatly, leading to considerable computational time savings. Note that both parents had to be homozygous in the dominance case in order to reduce matrix dimension in a similar way. Clearly, computational time can be decreased by implementing parallel calculation of individuals and chromosomes. However, in the presence of dominance, each considered mating must be computed with its own covariance matrix, and thus only matings can be parallelized (the chromosomes are unaffected).

Choosing alternative genotype indicators

The formulae for the covariances and correlations are functions of the chosen genotype indicators, for which different options exist [12]. In the present study, we used (1, 0, −1) (Eq. 3) for additive effects and (−1, 1,−1) (Eq. 4) for dominance effects, but other possible indicators include (0, 1, 2) for additive effects and (0, 1, 0) or -12,12,-12 for dominance effects. All these indicators can be transformed into each other by a shift and/or a multiplication by a constant. Simply shifting the indicators does not influence the (co)variance, so it is also possible to use the formulae described in this study when additive marker effects have been estimated via the (0, 1, 2) coding.

However, multiplication of genotype codes by a constant factor does affect the (co)variance. For example, let md~ be the dominance marker effect when the indicator cd~,i=(0,1,0) is used for its estimation. The Mendelian sampling variance is then calculated by ma,d~Ωa,d~ma,d~, where ma,d~ is known, but Ωa,d~ is not known. The type of parameterization does not affect the Mendelian sampling variance, so:

ma,d~Ωa,d~ma,d~=ma,dΩa,dma,d 33

must hold, where the terms on the right-hand side are known. The relationship between indicators cd~ and cd is cd~,i=cd,i+12; hence,

covcd~,i,cd~,j=covcd,i+12,cd,j+12=14covcd,i,cd,j, 34
covcd~,i,ca,j=covcd,i+12,ca,j=12covcd,i,ca,j, 35

and

covca,i,cd~,j=covca,i,cd,j+12=12covca,i,cd,j, 36

and thus the Hadamard product

Ωa,d~=UΩa,d 37

with

U=1121121141 38

transforms the Ωa,d matrix into Ωa,d~. Instead of transforming the covariance matrix Ω, the estimated marker effects can also be transformed, which can be implemented easily by multiplying the marker effects md~ by 12.

Genetic differences can be parameterized in terms of substitution effects and dominance deviations, or in terms of additive and dominance genotype effects, as discussed in detail by Vitezica et al. [12]. For the former, the m^-vector comprises estimates of the allele substitution effects mα and dominance effects md~. Allele substitutions effects are defined as (e.g. [13]):

mα,i=ma,i+md~,i(vi-ui), 39

where ui and vi=1-ui are the population frequencies for alleles A and B, respectively. When only allele substitution effects mα are considered and dominance effects are ignored (e.g. [5]), it is possible to use the covariance matrix described in this study without changes.

If dominance effects are not ignored, the allele substitution effects must be transformed into additive marker effects in order to allow the use of the derived covariance matrix. This transformation is achieved by:

ma,i=mα,i-md~,i(vi-ui)=mα,i-12md,i(vi-ui). 40

The allele frequencies u and v, which are required for the transformation, are already available because they are required for routine estimation.

Conclusions

In this study, we proposed a new method for the exact calculation of Mendelian sampling (co-)variances based on knowledge of phased marker genotypes and marker effect estimates and we derived all the requisite formulae. The method considers inbreeding but also the absolute level of homozygosity, as indicated by the marker genotypes, while it also considers the linkage phase of the markers in both parents.

We demonstrated the applicability of our method by comparing its results with results produced by an established simulation method using a large dairy cattle dataset. We found that both approaches agreed within the range of the Monte Carlo error, which is inherent in the simulation, but which can be fully avoided because the derived covariance matrices represent an infinitely large number of progeny.

Authors' contributions

SB and FT derived the formulae for all the covariances, with the help of NR. FT introduced the use of LD parameters and their clear representation. The theory was implemented in Fortran programs by MR and SB, with some help from NR. DS ran the simulations and the exact calculations at VIT, while SB performed the comparative analysis. The manuscript was drafted by SB, with contributions from all the coauthors. NR was responsible for the general concept and supervised all of the work. All authors read and approved the final manuscript.

Acknowledgements

We thank Friedrich Reinhard (VIT Verden) for fruitful discussions. We thank three anonymous reviewers for their valuable comments. Financial support for SB and MR from BMBF via Phänomics under Grant 0315536G is gratefully acknowledged. The publication of this article was funded by the Open Access fund of the Leibniz Association.

Competing interests

The authors declare that they have no competing interests.

Contributor Information

Sarah Bonk, Email: bonk@fbn-dummerstorf.de.

Manuela Reichelt, Email: reichelt@fbn-dummerstorf.de.

Friedrich Teuscher, Email: teuscher@fbn-dummerstorf.de.

Dierck Segelke, Email: segelke@vit.de.

Norbert Reinsch, Email: reinsch@fbn-dummerstorf.de.

References

  • 1.Dempfle L. Problems in the use of the relationship matrix in animal breeding. In: Gianola D, Hammond K, editors. Advances in statistical methods for genetic improvement of livestock. 1. New York: Springer; 1990. [Google Scholar]
  • 2.Quaas RL, Pollak EJ. Mixed model methodology for farm and ranch beef cattle testing programs. J Anim Sci. 1980;51:1277–1287. [Google Scholar]
  • 3.Neugebauer N, Räder I, Schild HJ, Zimmer D, Reinsch N. Evidence for parent-of-origin effects on genetic variability of beef traits. J Anim Sci. 2010;88:523–532. doi: 10.2527/jas.2009-2026. [DOI] [PubMed] [Google Scholar]
  • 4.Henderson CR. A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics. 1976;32:69–83. doi: 10.2307/2529339. [DOI] [Google Scholar]
  • 5.Segelke D, Reinhardt F, Liu Z, Thaller G. Prediction of expected genetic variation within groups of offspring for innovative mating schemes. Genet Sel Evol. 2014;46:42. doi: 10.1186/1297-9686-46-42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Cole JB, VanRaden PM. Use of haplotypes to estimate Mendelian sampling effects and selection limits. J Anim Breed Genet. 2011;128:446–455. doi: 10.1111/j.1439-0388.2011.00922.x. [DOI] [PubMed] [Google Scholar]
  • 7.Wittenburg D, Guiard V, Teuscher F, Reinsch N. Comparison of statistical models to analyse the genetic effect on within-litter variance in pigs. Animal. 2008;2:1559–1568. doi: 10.1017/S1751731108002851. [DOI] [PubMed] [Google Scholar]
  • 8.Wolfram S. The MATHEMATICA ® book, version 4. Cambridge: Cambridge University Press; 1999. [Google Scholar]
  • 9.Haldane JBS. The combination of linkage values and the calculation of distances between the loci of linked factors. J Genet. 1919;8:299–309. doi: 10.1007/BF02983270. [DOI] [Google Scholar]
  • 10.Browning SR, Browning BL. Haplotype phasing: existing methods and new developments. Nat Rev Genet. 2011;12:703–714. doi: 10.1038/nrg3054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Kong X, Murphy K, Raj T, He C, White PS, Matise TC. A combined linkage-physical map of the human genome. Am J Hum Genet. 2004;75:1143–1148. doi: 10.1086/426405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Vitezica ZG, Varona L, Legarra A. On the additive and dominant variance and covariance of individuals within the genomic selection scope. Genetics. 2013;195:1223–1230. doi: 10.1534/genetics.113.155176. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Falconer D, Mackay T. Introduction to quantitative genetics. 4. Harlow: Longman Group Ltd; 1996. [Google Scholar]

Articles from Genetics, Selection, Evolution : GSE are provided here courtesy of BMC

RESOURCES