On estimation of genetic variance within families using genome-wide identity-by-descent sharing

William G Hill

doi:10.1186/1297-9686-45-32

2013 Sep 3;45(1):32. doi: 10.1186/1297-9686-45-32

PMCID: PMC3871764 PMID: 24007429

Abstract

Background

Traditionally, heritability and other genetic parameters are estimated from between-family variation. With the advent of dense genotyping, it is now possible to compute the proportion of the genome that is shared by pairs of sibs and thus undertake the estimation within families, thereby avoiding environmental covariances of family members. Formulae for the sampling variance of estimates have been derived previously for families with two sibs, which are relevant for humans, but sampling errors are large. In livestock and plants much larger families can be obtained, and simulation has shown sampling variances are then much smaller.

Methods

Based on the assumptions that realised relationship of sibs can be obtained from genomic data and that data are analyzed by restricted maximum likelihood, formulae were derived for the sampling variance of the estimates of genetic variance for arbitrary family sizes. The analysis used statistical differentiation, assuming the variance of relationships is small.

Results

The variance of the estimate of the additive genetic variance was approximately proportional to $1 / (f n^{2} σ_{R}^{2})$, for f families of size n and variance of relationships $σ_{R}^{2}$.

Conclusions

Because the standard error of the estimate of heritability decreased in proportion to family size, the use of within-family information becomes increasingly efficient as the family size increases. There are, however, limitations, such as near-complete confounding of additive and dominance variances in full-sib families.

Background

Quantitative genetic parameters such as heritability have traditionally been estimated from the variation among full- or half-sib families, or from the parent-offspring covariance [1,2]. The covariance among sibs is assumed to be proportional to the pedigree relationship, but relatives may be further correlated because they share a common environment. This problem arises particularly in humans and, although sire families can be used in livestock to minimise the environmental covariance of sibs, these and weaker relationships come at the cost of higher sampling errors of heritability estimates because the correlation between sibs has to be multiplied by the inverse of the relationship to obtain an estimate of heritability. Estimates of heritability from non-pedigreed populations also rely heavily on getting good estimates of pedigree relationship [3], which is difficult unless relationships are very close, and environmental confounding can still be a source of bias.

Although pairs of full-sibs, for example, share half their genome on average, individual pairs do not because of Mendelian sampling of large chromosome segments. Such a discrepancy at pairs of loci is the basis of QTL (quantitative trait locus) mapping using, for example, the method of Haseman and Elston [4], to associate the phenotypic divergence between sibs to differences in marker frequency. Dense marker genomes are now available, and Visscher et al. [5] proposed that the actual or realised relationships between sibs can be estimated from genomic data and the association between the actual relationship and phenotypic similarity used to estimate the genetic covariance within families, thereby eliminating correlations due to shared environment. Visscher and colleagues used data on human dizygotic twins and full-sibs, first from microsatellites [5] and subsequently from SNPs (single nucleotide polymorphisms) [6] to estimate the level of genome sharing and thus trait heritability. In a later paper, Visscher [7] discussed the theory further. However, the sampling error of the estimates of genetic variance was high because the variation in actual relationship was small (typical standard deviation (SD) of 3.9% of the mean of 50% for human full-sibs, as expected from theory [5,7-10]). Since family sizes in humans are also very small, many are needed for precise estimation.

Ødegård and Meuwissen [11] pointed out that the method of Visscher et al. [5] could be used in very large families, such as for fish species, and for which it is not always practical to avoid rearing full-sibs together. They showed by simulation that sampling errors of the resulting estimates of heritability are substantially reduced as family size increases and are smaller with a few large families than with many small families. These results raise the following basic question: for a family of n sibs, is the information content, i.e. the inverse of the sampling variance of the estimate of heritability, approximately proportional to family size n (or e.g. to n -1) or to the number of pairs in the family, ½n(n -1)? The simulation results of Ødegård and Meuwissen [11] indicated the latter. Furthermore, PM Visscher (personal communication) showed that, using genomic relationships estimated from a sample of N individuals from the population, the sampling variance is a function of N². The difference between methods with sampling variances that depend on approximately squares of numbers rather than numbers of individuals is not trivial and clearly has an important impact on their design and potential utility.

The model used by Ødegård and Meuwissen [11] was based on a finite number (80) of genomic blocks that were individually marked, and with trait effects that were identically normally distributed for each block. In this note, we quantify these estimates and show how they depend on the design and variation in realised relationships. We adopt a model in which the realised relationship is continuous over the genome and with trait effects that are uniformly distributed across the genome. To calculate sampling errors, Visscher et al. [5] used regression of the squared phenotypic difference of sibs on the estimated actual relationship from tracking genome segments, whereas Ødegård and Meuwissen [11] used a REML (restricted maximum likelihood) analysis within and between families with estimated realised relationships for a finite number of genome segments. In the present analysis the data were assumed to be analysed by REML. Implications for design of experiments are discussed.

Analysis

Let us assume that the data are from matings of unrelated individuals and comprise f (≥ 1) families each of size n (≥ 2). The extension to variable n is straightforward and deferred meanwhile. The mean (i.e. pedigree) numerator relationship within families is A (e.g. 0.25 for half-sibs or 0.5 for full-sibs) and the within-family variance of actual relationships is $σ_{R}^{2}$ . We also assume that all sibs share the same environment and, for simplicity, as in the work of Visscher et al. [5,6], that additive genetic variance is estimated using only within-family differences; in essence, family effects are regarded as fixed. Therefore information is accumulated independently across families and no bias or sampling error arises due to common environment, albeit at the cost of losing potential between-family genetic information.

Additive model

Initially, we assumed that gene effects were additive but subsequently extended the results to include dominance. The additive genetic variance is $σ_{A}^{2}$ , the residual environmental variance is $σ_{E}^{2}$ , and so the within-family variance is $σ_{W}^{2}$ = (1 - A) $σ_{A}^{2}$ + $σ_{E}^{2}$ . The phenotypic variance is given by $σ_{P}^{2}$ = A $σ_{A}^{2}$ + $σ_{C}^{2}$ + $σ_{W}^{2}$ , where $σ_{C}^{2}$ is the variance due to common environment. In the analysis, it is convenient to parameterise the actual relationship between family members i and j in terms of deviations from mean pedigree-based relationships: r_ij = A_ij - A. The n × n covariance matrix V of observations y within a family of n sibs is then $var (y) = V = I σ_{W}^{2} + R σ_{A}^{2}$ , where I is the identity matrix and elements of R are r_ij, i ≠ j, and r_ii = 0.

The sampling variance of the parameter estimates can be approximated by using a Taylor series expansion in r_ij because these deviations are small, and then taking expectations so as to obtain Fisher’s information matrix S (the inverse of the variance covariance matrix) for the REML estimates of variance components ${\hat{σ}}_{A}^{2}$ and ${\hat{σ}}_{W}^{2}$ , respectively. The derivation is rather complicated, so details are given in Appendix 1. For a family of size n it is shown that:

$$
S = \frac{n - 1}{2 σ_{W}^{4}} \begin{pmatrix} m σ_{R}^{2} & - 2 m σ_{R}^{2} σ_{A}^{2} / σ_{W}^{2} \\ - 2 m σ_{R}^{2} σ_{A}^{2} / σ_{W}^{2} & 1 + 3 m σ_{R}^{2} σ_{A}^{4} / σ_{W}^{4} \end{pmatrix} \tag{1}
$$

where m = n(1 – 2/n + 2/n²). Since between-family relationships are not used, information S_k from family k is merely summed over families, with corresponding elements for family size n_k and m_k, k = 1, … , f. The overall variance-covariance matrix of the estimates is:

$$
C = \begin{pmatrix} \mathrm{var}(\hat{σ}_{A}^{2}) & \mathrm{cov}(\hat{σ}_{A}^{2}, \hat{σ}_{W}^{2}) \\ \mathrm{cov}(\hat{σ}_{A}^{2}, \hat{σ}_{W}^{2}) & \mathrm{var}(\hat{σ}_{W}^{2}) \end{pmatrix} = \Bigl( \sum_{k} S_{k} \Bigr)^{-1}
$$

With f families of equal size, from (1):

$$
C = \frac{2 σ_{W}^{4}}{f (n - 1) m σ_{R}^{2} (1 - m σ_{R}^{2} σ_{A}^{4} / σ_{W}^{4})} \begin{pmatrix} 1 + 3 m σ_{R}^{2} σ_{A}^{4} / σ_{W}^{4} & 2 m σ_{R}^{2} σ_{A}^{2} / σ_{W}^{2} \\ 2 m σ_{R}^{2} σ_{A}^{2} / σ_{W}^{2} & m σ_{R}^{2} \end{pmatrix} \tag{2}
$$
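As an illustrative check (not part of the original analysis), the pooling of information over families and the closed form in Equation (2) can be compared numerically. The function names and the parameter values below ($σ_{A}^{2}$ = 0.5, $σ_{W}^{2}$ = 0.75, $σ_{R}^{2}$ = 0.00153, 20 families of size 50) are assumptions chosen only for the example.

```python
def family_m(n):
    """m = n(1 - 2/n + 2/n^2), as defined below Equation (1)."""
    return n * (1.0 - 2.0 / n + 2.0 / n**2)


def information_matrix(n, var_r, var_a, var_w):
    """Approximate information matrix S of Equation (1) for one family of size n.

    var_r, var_a and var_w stand for sigma_R^2, sigma_A^2 and sigma_W^2.
    """
    m = family_m(n)
    scale = (n - 1) / (2.0 * var_w**2)
    s11 = m * var_r
    s12 = -2.0 * m * var_r * var_a / var_w
    s22 = 1.0 + 3.0 * m * var_r * var_a**2 / var_w**2
    return [[scale * s11, scale * s12], [scale * s12, scale * s22]]


def pooled_covariance(family_sizes, var_r, var_a, var_w):
    """C = (sum_k S_k)^-1; family sizes may differ between families."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for n in family_sizes:
        s = information_matrix(n, var_r, var_a, var_w)
        for i in range(2):
            for j in range(2):
                total[i][j] += s[i][j]
    (a, b), (c, d) = total
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def closed_form_covariance(f, n, var_r, var_a, var_w):
    """Equation (2) for f families of equal size n."""
    m = family_m(n)
    x = m * var_r * var_a**2 / var_w**2
    factor = 2.0 * var_w**2 / (f * (n - 1) * m * var_r * (1.0 - x))
    off = 2.0 * m * var_r * var_a / var_w
    return [[factor * (1.0 + 3.0 * x), factor * off],
            [factor * off, factor * m * var_r]]


# Illustrative values only (assumed, not taken from Table 1).
numeric = pooled_covariance([50] * 20, 0.00153, 0.5, 0.75)
closed = closed_form_covariance(20, 50, 0.00153, 0.5, 0.75)
print(numeric[0][0], closed[0][0])
```

Passing a list of unequal family sizes to `pooled_covariance` gives the variable-n case, for which no closed form is written out in the text.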

For full-sib families (1 - A = ½), the estimate of the environmental variance is $\hat{σ}_{E}^{2} = \hat{σ}_{W}^{2} - \frac{1}{2} \hat{σ}_{A}^{2}$ and hence $\mathrm{var}(\hat{σ}_{E}^{2}) = c_{22} - c_{12} + \frac{1}{4} c_{11}$ and $\mathrm{cov}(\hat{σ}_{A}^{2}, \hat{σ}_{E}^{2}) = c_{12} - \frac{1}{2} c_{11}$, where c_ij are elements of C. Taking just $σ_{A}^{2}$ and $σ_{E}^{2}$ into account, $\hat{σ}_{P}^{2} = \hat{σ}_{A}^{2} + \hat{σ}_{E}^{2}$, and the sampling error of the corresponding heritability estimate, $\hat{h}^{2} = \hat{σ}_{A}^{2} / \hat{σ}_{P}^{2}$, can be approximated using standard formulae for ratios (see e.g. page 818 in [2]). Between-family information, not included in the data used above, has to be incorporated to estimate the phenotypic variance and heritability if common family environment or allowance for non-additive effects is to be included.
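A minimal sketch of the ratio approximation mentioned above, using the usual first-order (delta-method) formula for the variance of a ratio. It assumes full-sib families with no common-environment variance, so that, from the additive model, $σ_{P}^{2} = \frac{1}{2} σ_{A}^{2} + σ_{W}^{2}$; the function names and the numerical values of c_ij are hypothetical.

```python
def ratio_variance(x, y, var_x, var_y, cov_xy):
    """First-order (delta-method) approximation to var(x / y)."""
    ratio = x / y
    return ratio**2 * (var_x / x**2 + var_y / y**2 - 2.0 * cov_xy / (x * y))


def heritability_variance_full_sibs(var_a, var_w, c11, c12, c22):
    """Approximate var(h2) with h2 = var_a / var_p for full sibs.

    Assumes var_p = var_a / 2 + var_w (A = 1/2, no common environment).
    c11, c12, c22 are the elements of C for (sigma_A^2, sigma_W^2).
    """
    var_p = 0.5 * var_a + var_w
    var_of_var_p = 0.25 * c11 + c12 + c22   # var(A/2 + W)
    cov_a_p = 0.5 * c11 + c12               # cov(A, A/2 + W)
    return ratio_variance(var_a, var_p, c11, var_of_var_p, cov_a_p)


# Hypothetical inputs, for illustration only.
print(heritability_variance_full_sibs(0.5, 0.75, 0.04, 0.001, 0.002))
```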

If the quantity $m σ_{R}^{2} σ_{A}^{4} / σ_{W}^{4}$ is small, the determinant of S is dominated by its diagonal elements and $\mathrm{var}(\hat{σ}_{A}^{2})$ simplifies to:

$$
\mathrm{var}(\hat{σ}_{A}^{2}) \approx 1 / s_{11} = 2 σ_{W}^{4} / [f (n - 1) m σ_{R}^{2}] \tag{3}
$$

Hence for families of n = 2 individuals, m = 1 and $\mathrm{var}(\hat{σ}_{A}^{2}) \approx 2 σ_{W}^{4} / (f σ_{R}^{2})$. This corresponds to the formula of Visscher et al. [5] for the sampling variance of the heritability estimate, $2 (1 - t)^{2} / (f σ_{R}^{2})$, where t is the intra-class correlation of family members. As n increases, m(n - 1) = n(n – 3 + 4/n - 2/n²) → n(n – 3) → n². If $σ_{R}^{2}$ is small and n large, then $\mathrm{var}(\hat{σ}_{A}^{2}) \sim 2 σ_{W}^{4} / (f n^{2} σ_{R}^{2})$.
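The approach of (n - 1)m to n(n – 3) and then n² can be seen numerically. The following is a small sketch; the function names are hypothetical and the values passed to `var_sigma_a_approx` are illustrative only.

```python
def family_m(n):
    """m = n(1 - 2/n + 2/n^2)."""
    return n * (1.0 - 2.0 / n + 2.0 / n**2)


def pair_information(n):
    """The factor (n - 1) m that multiplies sigma_R^2 in Equation (3)."""
    return (n - 1) * family_m(n)


def var_sigma_a_approx(f, n, var_r, var_w):
    """Equation (3): 2 sigma_W^4 / [f (n - 1) m sigma_R^2]."""
    return 2.0 * var_w**2 / (f * pair_information(n) * var_r)


# Compare (n - 1) m with n(n - 3) and n^2 as family size grows.
for n in (2, 5, 20, 100, 1000):
    print(n, pair_information(n), n * (n - 3), n * n)
```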

The variation in relationships within a family depends on whether family members are full- or half-sibs, on the total map length (L) of the chromosomes and, to a limited extent, on their individual lengths [5,7,10]. To a good approximation, $σ_{R}^{2}$ ~ 1/(16 L) – 1/(3 L²) for full-sibs and one-half of that for half-sibs [5,7]. For humans, the number of autosomes is 22 and the total map length is 35.9 M, so $σ_{R}^{2}$ is approximately 0.00153 for full-sibs and 0.00077 for half-sibs (SD = 0.039 and 0.028). Therefore, for full-sib families of a species with a map length and chromosome number similar to humans, SE($\hat{σ}_{A}^{2}$) ~ $36 σ_{W}^{2} / \sqrt{f n (n - 3)}$, e.g. 0.28 $σ_{W}^{2}$ for 50 families of size 20 and 0.17 $σ_{W}^{2}$ for 20 families of size 50. Cattle, for example, have 29 autosomes and a map length of 32.5 M [12], so $σ_{R}^{2}$ would be a little larger and the sampling variance of estimates of heritability correspondingly smaller.
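The two worked values quoted above can be reproduced with a few lines of arithmetic. This sketch takes $σ_{R}^{2}$ = 0.00153 as given; the map-length formula alone gives a value close to, but not identical with, that figure. Function names are hypothetical.

```python
import math


def full_sib_var_r(map_length_morgans):
    """Approximate sigma_R^2 for full sibs: 1/(16 L) - 1/(3 L^2)."""
    length = map_length_morgans
    return 1.0 / (16.0 * length) - 1.0 / (3.0 * length**2)


def se_sigma_a_large_n(f, n, var_r, var_w=1.0):
    """SE of the additive variance estimate, in units of sigma_W^2 when var_w = 1.

    Uses the large-n form var ~ 2 sigma_W^4 / [f n (n - 3) sigma_R^2].
    """
    return var_w * math.sqrt(2.0 / (f * n * (n - 3) * var_r))


print(full_sib_var_r(35.9))                            # roughly 0.0015
print(round(se_sigma_a_large_n(50, 20, 0.00153), 2))   # 50 families of size 20
print(round(se_sigma_a_large_n(20, 50, 0.00153), 2))   # 20 families of size 50
```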

Simulation check on approximations

In the analysis in Appendix 1, many simplifying assumptions were made in the Taylor series analysis. As a partial check, simulation was undertaken for a model of 22 chromosomes, each 1.632 M long, i.e. the mean length of human chromosomes, and relationships were simulated with the programme used previously to check formulae for variance in relationships [10]. (The distribution of relationships would be little affected if map lengths varied [10].) The information matrix S was then computed directly from Equation (A1) and from the approximation in Equation (1). For simplicity, however, it was assumed that the contrast matrix K (see below Equation (A1)) was invariant (see examples in Table 1). In general, there was good agreement between the observed and the approximate predicted estimates of sampling variance (Table 1), but this deteriorated as family size increased, with the approximation generally underestimating the sampling variance. This bias would be greater if $σ_{R}^{2}$ were higher. If only a single chromosome were fitted, $σ_{R}^{2}$ would be much greater, but the additive variance contributed by it would be only a fraction of the total and, as the example in Table 1 shows, the approximation remains good. Table 1 also gives predictions based solely on Equation (3), showing a good fit with those obtained directly from Equation (2).

Table 1.

Comparison of var( ${\hat{σ}}_{A}^{2}$ ) predicted from the information matrix directly and from the Taylor series approximation^*

Family	h²	chr	n	Eq (A1)	Eq (1)	Eq (3)
HS	0.5	22	5	174	182	182
HS	0.5	22	15	12.2	11.8	11.7
HS	0.5	22	25	4.15	3.88	3.80
FS	0.25	22	5	94	101	101
FS	0.25	22	15	6.67	6.56	6.53
FS	0.25	22	25	2.26	2.18	2.16
FS	0.5	22	5	82.6	88.6	88.1
FS	0.5	22	15	5.94	5.83	5.69
FS	0.5	22	25	2.04	1.97	1.88
FS	0.75	22	5	71.4	77.1	76.0
FS	0.75	22	15	5.20	5.23	4.90
FS	0.75	22	25	1.81	1.82	1.62
FS	0.04	1	5	4.80	5.26	5.26
FS	0.04	1	15	0.354	0.331	0.330
FS	0.04	1	25	0.127	0.110	0.110

*Predictions were obtained directly by inverting the realised information matrix (eq A1) obtained from sampling relationships, and from the Taylor series approximation eq. (1) using the variance of relationships directly; variances were computed by averaging information over samples of 100 families, but are expressed for a single family, so for f families var( ${\hat{σ}}_{A}^{2}$ ) should be divided by f; predictions using the simplification eq. (3) are shown similarly; results are for half (HS) and full (FS) sib families; h² is the proportion of variance contributed by the fitted chromosomes; chr is the number of chromosomes; chr = 22 denotes the whole genome; chr = 1 denotes a single chromosome.

Dominance

In full-sib families, both additive and dominance variance can, in principle, be estimated. Derivation of the extended information matrix is given in Appendix 2. It depends on the variance $σ_{Q}^{2}$ in dominance relationships (about its mean of ¼) and the covariance between dominance and additive relationships, cov_RQ. However, as Visscher et al. [5] pointed out, the additive and dominance relationships within families are very highly correlated, since the additive coefficient depends on the average number of paternal and maternal genes that are shared identical by descent at a locus and the dominance coefficient on whether both are shared. The regression of dominance on additive relationships (cov_RQ / $σ_{R}^{2}$ ) is equal to 1 and their correlation is approximately 0.9. This implies that, in practice, partitioning $σ_{A}^{2}$ and $σ_{D}^{2}$ using within-family information is probably not feasible and furthermore that if only an additive model is used, the estimate of $σ_{A}^{2}$ is biased upwards by $σ_{D}^{2}$ ; indeed it essentially has expectation $σ_{A}^{2} + σ_{D}^{2}$ .
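To give a rough sense of what a correlation of about 0.9 implies, the leading-order 2 × 2 block of the extended information matrix for ($σ_{A}^{2}$, $σ_{D}^{2}$), which is proportional to [[$σ_{R}^{2}$, cov_RQ], [cov_RQ, $σ_{Q}^{2}$]], can be inverted. The sketch below ignores the coupling with $σ_{W}^{2}$, so it is only indicative; the function name is hypothetical and $σ_{Q}^{2}$ is back-calculated from the stated regression of 1 and correlation of 0.9.

```python
def additive_variance_inflation(var_r, var_q, cov_rq):
    """Ratio of var(sigma_A^2 estimate) when dominance is also fitted to that
    under the additive-only model, using only the leading 2x2 block.

    Equals 1 / (1 - rho^2), where rho is the correlation between additive
    and dominance relationships.
    """
    return var_r * var_q / (var_r * var_q - cov_rq**2)


var_r = 0.00153        # full-sib value quoted in the text
cov_rq = var_r         # regression of dominance on additive relationship = 1
var_q = var_r / 0.81   # implied by a correlation of 0.9 (assumption of this sketch)
print(additive_variance_inflation(var_r, var_q, cov_rq))
```

Under these assumptions the sampling variance of the additive estimate is inflated roughly five-fold, which illustrates why the partition is described as probably not feasible.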

Discussion and conclusions

The analysis shows that the sampling variances of estimates of heritability based on within-family realised relationships fall roughly in proportion to n² as family size n increases, i.e. based on the number of pairwise comparisons among individuals in the family, and in proportion to the number of families. Therefore, when undertaking such an analysis, it is more efficient to use a few very large families, although one might be reluctant to use just one or very few families in case they are atypical [11]. Here, a model of a continuous genome was used, rather than a finite number of independent regions as by Ødegård and Meuwissen [11], and the calculations assumed a fairly even distribution of genetic variance along the genome. If there is much heterogeneity, e.g. a few QTL of large effect, the sampling errors of genetic variance estimates would increase. In the present analysis, we make the assumption that shared segments are identified accurately, for example using Merlin [13].

Ødegård and Meuwissen [11] investigated the effect of selectively genotyping only the individuals with high and low phenotypes within a family, when all phenotypes are included in the REML analysis. The efficiency of this approach was good in terms of sampling errors but estimates of heritability were biased downwards when sample sizes were small. This may reflect insufficient marker coverage of the genes of interest because of lack of linkage disequilibrium, in which case this bias may be hard to avoid, but possibly also bias caused by selection.

They also estimated actual relationships from a finite number of markers and, occasionally, obtained a singular matrix in their simulated replicates [11]. To check the causes, simulated relationships were sampled from a continuous chromosome model [10] and the exact allele sharing was computed. Pairs of individuals can inherit identical non-recombinant short chromosomes, thereby yielding a positive semi-definite relationship matrix (i.e. including zero but not negative eigenvalues). In the unlikely event that this occurs at all chromosomes, the data can still be analysed by REML. Negative eigenvalues were not obtained in our simulations and indeed seem infeasible, because the relationships were jointly sampled. Negative eigenvalues are a consequence of the estimation of weak relationships from marker data and might arise in practice.

A different approach to estimating the genetic variance free of common environment was suggested by Yang et al. [14]. They fitted by regression all the SNPs to data from individuals sampled from the population that are not known to be related and from which any pairs with a relationship above a low threshold have been removed, so as to minimise the chance of shared environment. Such an analysis is expected to give a lower estimate of heritability than the within-family analysis discussed here, however, because marker-associated effects in the population can be missed through incomplete linkage disequilibrium, especially when trait genes have low minor allele frequencies, as indeed seems to be the case [14].

A ‘back of the envelope’ calculation allows a simple comparison of the sampling errors of estimates of additive genetic variance from within families utilising variation in relationship, ${\hat{σ}}_{Aw}^{2}$ , and from between families using ANOVA, ${\hat{σ}}_{Ab}^{2}$ (Appendix 3). Provided the families are not small, $var ({\hat{σ}}_{Aw}^{2}) / var ({\hat{σ}}_{Ab}^{2}) \approx (A^{2} / σ_{R}^{2}) / {[1 + nA σ_{A}^{2} / σ_{W}^{2}]}^{2}$ . With use of half-sib families (A = 1/4) to eliminate maternal effects in the between-family estimate, for a genome of ‘human’ length, $(A^{2} / σ_{R}^{2})$ = (0.25/0.028)² ~ 80. Assuming the heritability is 1/3, such that $A σ_{A}^{2} = \frac{1}{5} σ_{W}^{2}$ , the ratio of variances is approximately 80/(1 + n/5)², equalling 1.0 when n ~ 40. This implies that, with half-sib families of size 40, a similar amount of information would be obtained from within- and between-family data. With fewer larger families, the estimate from within-family information would have the lower standard error. Furthermore, because the within- and between-family estimates use the data in a different way they are, presumably, uncorrelated and so they can be simply combined. However, estimates from both sources may be biased to different extents by common environment, dominance, epistasis, etc., so specific applications require specific consideration.
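The break-even family size in this comparison follows from setting the variance ratio to 1. The sketch below uses the values stated in the text (A = 1/4, SD of relationship 0.028, and $A σ_{A}^{2} / σ_{W}^{2}$ = 1/5) as inputs; the function names are hypothetical.

```python
import math


def variance_ratio(n, a, var_r, a_var_a_over_var_w):
    """var(within-family estimate) / var(between-family estimate),
    approximated as (A^2 / sigma_R^2) / (1 + n A sigma_A^2 / sigma_W^2)^2."""
    return (a**2 / var_r) / (1.0 + n * a_var_a_over_var_w) ** 2


def break_even_family_size(a, var_r, a_var_a_over_var_w):
    """Family size n at which the ratio above equals 1."""
    return (math.sqrt(a**2 / var_r) - 1.0) / a_var_a_over_var_w


n_star = break_even_family_size(0.25, 0.028**2, 0.2)
print(n_star)  # close to 40
print(variance_ratio(n_star, 0.25, 0.028**2, 0.2))
```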

There are other aspects that could be examined. For example, additive and within-family genetic covariances and correlations among traits can be estimated from a multi-trait analysis with the same data structure. Clearly the magnitude of their sampling errors is structured similarly to those of the corresponding variances of the individual traits. Estimation of variation due to any individual autosome can be achieved by fitting just the relationship on this chromosome, and similarly for the sex chromosome [6]. The variance of the corresponding relationships is then much higher and depends on the length of the chromosome, decreasing roughly in proportion to its length. Although $var ({\hat{σ}}_{A}^{2})$ per chromosome is then much smaller, the coefficient of variation of its estimate may be similar to that for the whole genome under the simplest assumption that the contribution by any chromosome to $σ_{A}^{2}$ is roughly proportional to its length.

A problem specific to the within-family approach is the high degree of confounding between additive and dominance effects in full-sib families (albeit there is also complete confounding in a between full-sib family analysis). This is not resolved by estimating $σ_{A}^{2}$ separately from maternal and paternal sharing, since the dominance coefficient is the correlated intersection of these. The point is that, while maternal genomic similarity appears to include only the additive component because only one sire is involved, interactions between sire and dam effects, i.e. dominance, are included. Half-sib families with multiple dams per sire or a cross classified structure are needed, similar to when between-family correlations are used for estimation.

If, for example, a number of males and females are put together for mating in a single environment, then the pedigree can be obtained from genetic markers. Hence, paternal half-sibs, maternal half-sibs and full-sibs can be distinguished and the between-family covariance can be used. Additional information from within-family segregation could be identified via the markers, but this would likely contribute little. For example, in a pen comprising such a diallel structure, the variation in pedigree relationships (A = 0, ¼ or ½) is likely to be much larger than the variation in realised relationships among pairs with the same pedigree relationship.

Epistatic variance provides other associated difficulties of potential confounding and estimation. On a whole-genome basis, the relevant coefficient for the additive × additive variance component is the square of the relationship, which is highly correlated with the additive coefficient. Thus, similar to the analyses between families, obtaining a satisfactory partition between additive and additive × additive or higher order components is probably not feasible. A further problem is potential bias due to epistatic effects in the estimation of additive (e.g. from additive × additive effects) and dominance variance. Although the expected probability that sibs share alleles at pairs of genomic sites is small for the genome as a whole, it is much higher for nearby sites. Thus, if epistatic effects are substantial and predominately cis-acting, this bias could be important. To partially address this, Visscher et al. [6] fitted the mean relationship for each chromosome in a multiple regression model for human height. The variance removed by fitting variation in relationships for each chromosome was essentially the same whether chromosomes were fitted independently or in a joint analysis, indicating little or no interaction between regions on different chromosomes. Extending this more generally needs genomic regions to be defined such that joint identity by descent can be computed.

Within-family analysis, particularly when families are large, has attractive features because it avoids bias due to common environment effects, but it introduces other potential confounding effects, as noted above. It also requires extensive genotyping, with the associated costs. Although in a breeding context this type of information may be available when collecting data to implement genomic prediction and subsequent selection, estimates of the variance components may not in themselves have value beyond what is obtained from the marker-trait associations. This question merits further consideration.

Appendix 1: Derivation of the sampling variance for the additive model

For the REML analysis, the information matrix S, which in turn yields the sampling variances based on S^-1 for the estimates of $σ_{A}^{2}$ and $σ_{W}^{2}$ for each family, is defined by Lynch and Walsh (see page 791 in [2]):

$$
S = \frac{1}{2} \begin{pmatrix} \mathrm{tr}(PRPR) & \mathrm{tr}(PRP) \\ \mathrm{tr}(PRP) & \mathrm{tr}(PP) \end{pmatrix}, \tag{A1}
$$

where tr denotes the trace operator. Matrix P = K’(KVK’)^-1K and the (n – 1) × n matrix K defines contrasts such that KX = 0, where X is the design matrix and, since family members are contemporaneous in the same environment, X is a unit vector (all elements equal to 1). The Helmert contrasts are suitable for K: for i = 1, …, n – 1, $k_{ij} = [i (i + 1)]^{-1/2}$ for j ≤ i, $k_{i, i + 1} = - [i / (i + 1)]^{1/2}$, and $k_{ij} = 0$ for j > i + 1. Note that KK’ = $I_{(n - 1) × (n - 1)}$ and K’K = $I_{n × n} - \frac{1}{n} J_{n × n}$, where all elements of J equal 1, and (K’K)² = K’K.
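The stated properties of the Helmert contrasts are easy to confirm numerically. The following pure-Python sketch (helper names are hypothetical) builds K for n = 5 and forms KK’ and K’K.

```python
import math


def helmert_contrasts(n):
    """(n - 1) x n Helmert contrast matrix K as defined in the text."""
    rows = []
    for i in range(1, n):
        row = [0.0] * n
        for j in range(i):                    # first i entries of row i
            row[j] = 1.0 / math.sqrt(i * (i + 1))
        row[i] = -math.sqrt(i / (i + 1))      # entry in column i + 1
        rows.append(row)
    return rows


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


n = 5
K = helmert_contrasts(n)
KKt = matmul(K, transpose(K))   # expected: identity of order n - 1
KtK = matmul(transpose(K), K)   # expected: I - J/n of order n
print(KKt[0][0], KtK[0][0], KtK[0][1])
```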

The expected information using the Taylor series expansion has terms of the following form:

$$
E(PRPR) = PRPR \, \big|_{R = 0} + \sum_{i \leq j} \frac{\partial (PRPR)}{\partial r_{ij}} \bigg|_{R = 0} E(r_{ij}) + \frac{1}{2} \sum_{i \leq j} \sum_{k \leq l} \frac{\partial^{2} (PRPR)}{\partial r_{ij} \, \partial r_{kl}} \bigg|_{R = 0} E(r_{ij} r_{kl}) + \dots
$$

We note that E(r_ij) = 0 and, assuming independent Mendelian segregation to each offspring, E(r_ij r_kl) = 0 for i ≠ k and/or j ≠ l, and E(r_ij²) = $σ_{R}^{2}$ , where $σ_{R}^{2}$ is the variance in relationship. Differentiating,

$$
\frac{\partial (PRPR)}{\partial r_{ij}} = \frac{\partial P}{\partial r_{ij}} RPR + P \frac{\partial R}{\partial r_{ij}} PR + PR \frac{\partial P}{\partial r_{ij}} R + PRP \frac{\partial R}{\partial r_{ij}}, \tag{A2}
$$

and when evaluated as R → 0, all terms in (A2) become zero. Furthermore, differentiating (A2) to obtain the second derivative, all remaining terms in R are also zero; and as R is linear in r_ij, ∂²R/∂r_ij∂r_kl = 0. Finally, as E(r_ijr_kl) = 0 unless i = k and j = l, E(PRPR) reduces to

$$
E(PRPR) \approx \frac{1}{2} \sum_{i < j} \left( P \frac{\partial R}{\partial r_{ij}} P \frac{\partial R}{\partial r_{ij}} + \frac{\partial R}{\partial r_{ij}} P \frac{\partial R}{\partial r_{ij}} P \right) σ_{R}^{2} .
$$

Let ∂R/∂r_ij = X_ij, with elements x_ij = x_ji = 1 and 0 otherwise; so taking R → 0,

$$
E(PRPR) \approx \frac{1}{2} \sum_{i < j} \left( P X_{ij} P X_{ij} + X_{ij} P X_{ij} P \right) σ_{R}^{2} \tag{A3}
$$

As R → 0, V → I $σ_{W}^{2}$ and P = K’(KVK’)^-1K → (I – $\frac{1}{n}$ J)/ $σ_{W}^{2}$ . Defining further matrices, Y_ij where y_ii = y_jj = 1 and 0 otherwise, and W_ij where w_ik = w_jk = 1, k = 1, …, n, and 0 otherwise, we have X_ijX_ij = Y_ij, JX_ij = JY_ij = W_ij, W_ijW_ij = 2W_ij, and tr(X_ij) = 0, tr(Y_ij) = tr(W_ij) = 2. As the trace operator is commutative, it follows that by summing over the n(n – 1)/2 off-diagonal elements in (A3), all having the same expectation,

$$
\begin{aligned} E[\mathrm{tr}(PRPR)] & \approx \tfrac{1}{2} n (n - 1) \, \mathrm{tr}[(I - J / n) X_{ij} (I - J / n) X_{ij}] \, σ_{R}^{2} / σ_{W}^{4} \\ & \approx \tfrac{1}{2} n (n - 1) \, \mathrm{tr}(Y_{ij} - 2 W_{ij} / n + 2 W_{ij} / n^{2}) \, σ_{R}^{2} / σ_{W}^{4} \\ & \approx n (n - 1) (1 - 2 / n + 2 / n^{2}) \, σ_{R}^{2} / σ_{W}^{4} = (n - 1) m σ_{R}^{2} / σ_{W}^{4} \end{aligned} \tag{A4}
$$

where m = n(1 – 2/n + 2/n²).
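The trace used in (A4) can be verified directly for small n. The sketch below (helper names are hypothetical) forms I – J/n and X_ij explicitly and evaluates tr[(I – J/n)X_ij(I – J/n)X_ij], which should equal 2(1 – 2/n + 2/n²) for any pair i ≠ j.

```python
def centering_matrix(n):
    """I - J/n."""
    return [[(1.0 if a == b else 0.0) - 1.0 / n for b in range(n)]
            for a in range(n)]


def pair_indicator(n, i, j):
    """X_ij: ones in positions (i, j) and (j, i), zeros elsewhere."""
    x = [[0.0] * n for _ in range(n)]
    x[i][j] = x[j][i] = 1.0
    return x


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def trace(a):
    return sum(a[k][k] for k in range(len(a)))


def pair_trace(n, i=0, j=1):
    """tr[(I - J/n) X_ij (I - J/n) X_ij]."""
    mx = matmul(centering_matrix(n), pair_indicator(n, i, j))
    return trace(matmul(mx, mx))


def expected_tr_prpr(n, var_r, var_w):
    """Equation (A4): sum over the n(n - 1)/2 pairs, scaled by sigma_R^2 / sigma_W^4."""
    return 0.5 * n * (n - 1) * pair_trace(n) * var_r / var_w**2


print(pair_trace(5), expected_tr_prpr(5, 1.0, 1.0))
```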

We give less detail for other terms in the information matrix.

$$
\frac{\partial (PRP)}{\partial r_{ij}} = \frac{\partial P}{\partial r_{ij}} RP + P \frac{\partial R}{\partial r_{ij}} P + PR \frac{\partial P}{\partial r_{ij}} .
$$

Non-zero second derivatives must involve differentiation once of P and once of R. Hence

$$
E(PRP) \approx \frac{1}{2} \sum_{i < j} \left( 2 \frac{\partial P}{\partial r_{ij}} \frac{\partial R}{\partial r_{ij}} P + 2 P \frac{\partial R}{\partial r_{ij}} \frac{\partial P}{\partial r_{ij}} \right) σ_{R}^{2} .
$$

Now

$$
\frac{\partial P}{\partial r_{ij}} = - K' (KVK')^{-1} K \frac{\partial V}{\partial r_{ij}} K' (KVK')^{-1} K
$$

and, as R → 0,

$$
\frac{\partial P}{\partial r_{ij}} \to - (I - \tfrac{1}{n} J) X_{ij} (I - \tfrac{1}{n} J) \, σ_{A}^{2} / σ_{W}^{4},
$$

so

$$
E(PRP) \approx \frac{1}{2} \sum_{i < j} - 4 (I - \tfrac{1}{n} J) X_{ij} (I - \tfrac{1}{n} J) X_{ij} (I - \tfrac{1}{n} J) \, σ_{R}^{2} σ_{A}^{2} / σ_{W}^{6} \tag{A5}
$$

As the trace is commutative and I – $\frac{1}{n}$ J is idempotent, putting the last such matrix in (A5) first, we see that:

$$
\begin{aligned} E[\mathrm{tr}(PRP)] & \approx - 2 n (n - 1) (1 - 2 / n + 2 / n^{2}) \, σ_{A}^{2} σ_{R}^{2} / σ_{W}^{6} \\ & = - 2 (n - 1) m σ_{A}^{2} σ_{R}^{2} / σ_{W}^{6} . \end{aligned}
$$

When R = 0, P = (I – $\frac{1}{n}$ J)/ $σ_{W}^{2}$ and tr(PP) = (n – 1)/ $σ_{W}^{4}$ . Now considering the terms in r_ij,

$$
\frac{\partial^{2} (PP)}{\partial r_{ij}^{2}} = \frac{\partial^{2} P}{\partial r_{ij}^{2}} P + 2 \frac{\partial P}{\partial r_{ij}} \frac{\partial P}{\partial r_{ij}} + P \frac{\partial^{2} P}{\partial r_{ij}^{2}} \tag{A6}
$$

with additional terms that become 0 as R → 0.

In (A6),

$$
\frac{\partial P}{\partial r_{ij}} = - K' (KVK')^{-1} K \frac{\partial V}{\partial r_{ij}} K' (KVK')^{-1} K
$$

and

$$
\frac{\partial^{2} P}{\partial r_{ij}^{2}} = 2 K' (KVK')^{-1} K \frac{\partial V}{\partial r_{ij}} K' (KVK')^{-1} K \frac{\partial V}{\partial r_{ij}} K' (KVK')^{-1} K .
$$

And hence, using the commutative property,

$$
\begin{aligned} \mathrm{tr}\left( \frac{\partial^{2} (PP)}{\partial r_{ij}^{2}} \right) & = 6 \, \mathrm{tr}\left( K' (KVK')^{-1} K \frac{\partial V}{\partial r_{ij}} K' (KVK')^{-1} K \frac{\partial V}{\partial r_{ij}} K' (KVK')^{-1} K \right) \\ & = 6 \, \mathrm{tr}\left( (I - \tfrac{1}{n} J) X_{ij} (I - \tfrac{1}{n} J) X_{ij} (I - \tfrac{1}{n} J) \right) σ_{A}^{4} / σ_{W}^{8} . \end{aligned}
$$

Therefore, using previous results,

$$
E[\mathrm{tr}(PP)] \approx (n - 1) / σ_{W}^{4} + 3 (n - 1) m σ_{R}^{2} σ_{A}^{4} / σ_{W}^{8}
$$

thus completing the derivation of the information matrix in Equation (1) of the main text.
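As a check on the coefficient of the second-order term in E[tr(PP)], the pieces above can be combined explicitly. This worked step is a sketch that uses only quantities already defined: the Taylor factor ½, the n(n – 1)/2 pairs with E(r_ij²) = $σ_{R}^{2}$, the factor 6 from the second derivative, and tr[(I – J/n)X_ij(I – J/n)X_ij(I – J/n)] = tr[(I – J/n)X_ij(I – J/n)X_ij] = 2m/n, which follows from (A4) because I – J/n is idempotent.

$$
\begin{aligned} \frac{1}{2} \sum_{i < j} E(r_{ij}^{2}) \, \mathrm{tr}\left( \frac{\partial^{2} (PP)}{\partial r_{ij}^{2}} \right) \bigg|_{R = 0} & = \frac{1}{2} \cdot \frac{n (n - 1)}{2} \cdot σ_{R}^{2} \cdot 6 \cdot \frac{2 m}{n} \cdot \frac{σ_{A}^{4}}{σ_{W}^{8}} \\ & = 3 (n - 1) m \, σ_{R}^{2} σ_{A}^{4} / σ_{W}^{8} \end{aligned}
$$

Multiplying by the factor ½ in (A1) and factoring out (n – 1)/(2 $σ_{W}^{4}$ ) gives the term 3m $σ_{R}^{2} σ_{A}^{4} / σ_{W}^{4}$ in the (2,2) element of Equation (1).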

Appendix 2: Fitting additive and dominance variances

Let $V = I σ_{W}^{2} + R σ_{A}^{2} + Q σ_{D}^{2}$ of dimension n × n, where, for full-sib families, $σ_{W}^{2} = σ_{E}^{2} + \frac{1}{2} σ_{A}^{2} + \frac{3}{4} σ_{D}^{2}$ . Additive and dominance effects of the loci are assumed to be uncorrelated. Let Q with elements q_ij define the departure of the realised dominance correlation of full sibs from the expected ¼, and let $σ_{Q}^{2}$ denote var(q_ij) and similarly cov_RQ denote cov(r_ij, q_ij). The information matrix is now [2]:

$$
S = \frac{1}{2} \begin{pmatrix} \mathrm{tr}(PRPR) & \mathrm{tr}(PRPQ) & \mathrm{tr}(PRP) \\ \mathrm{tr}(PRPQ) & \mathrm{tr}(PQPQ) & \mathrm{tr}(PQP) \\ \mathrm{tr}(PRP) & \mathrm{tr}(PQP) & \mathrm{tr}(PP) \end{pmatrix} .
$$

The term E[tr(PRPR)] ≈ (n -1)m $σ_{R}^{2} / σ_{W}^{4}$ is unchanged from the additive case and, by symmetry,

$$
E[\mathrm{tr}(PQPQ)] \approx (n - 1) m σ_{Q}^{2} / σ_{W}^{4} \quad \text{and} \quad E[\mathrm{tr}(PRPQ)] \approx (n - 1) m \, \mathrm{cov}_{RQ} / σ_{W}^{4} .
$$

The derivative of the term PRP with respect to r_ij remains

$$
\frac{\partial (PRP)}{\partial r_{ij}} = \frac{\partial P}{\partial r_{ij}} RP + P \frac{\partial R}{\partial r_{ij}} P + PR \frac{\partial P}{\partial r_{ij}},
$$

and the expectation of its second derivative with respect to r_ij is unchanged. However, now taking the second derivative with respect to q_ij, we obtain additional terms with non-zero expectation,

$$
\frac{\partial^{2} (PRP)}{\partial r_{ij} \, \partial q_{ij}} = \frac{\partial P}{\partial q_{ij}} \frac{\partial R}{\partial r_{ij}} P + P \frac{\partial R}{\partial r_{ij}} \frac{\partial P}{\partial q_{ij}} .
$$

Hence

E [tr (PRP)] \approx - 2 (n - 1) m (σ_{R}^{2} σ_{A}^{2} + {cov}_{RQ} σ_{D}^{2}) / σ_{W}^{6},

and similarly

E [tr (PQP)] \approx - 2 (n - 1) m ({cov}_{RQ} σ_{A}^{2} + σ_{Q}^{2} σ_{D}^{2}) / σ_{W}^{6} .

The second derivatives of tr(PP) have non-zero expectation when taken twice with respect to r_ij, twice with respect to q_ij, and once with respect to each of the two variables. Hence

\begin{array}{rl} E [tr (PP)] \approx & (n - 1) / σ_{W}^{4} \\ & + 3 (n - 1) m (σ_{R}^{2} σ_{A}^{4} + 2 \, {cov}_{RQ} σ_{A}^{2} σ_{D}^{2} + σ_{Q}^{2} σ_{D}^{4}) / σ_{W}^{8} . \end{array}

The information matrix for a single family is therefore

S = \frac{n - 1}{2 σ_{W}^{4}} (\begin{array}{ccc} m σ_{R}^{2} & m \, {cov}_{RQ} & - 2 m (σ_{R}^{2} σ_{A}^{2} + {cov}_{RQ} σ_{D}^{2}) / σ_{W}^{2} \\ & m σ_{Q}^{2} & - 2 m ({cov}_{RQ} σ_{A}^{2} + σ_{Q}^{2} σ_{D}^{2}) / σ_{W}^{2} \\ \text{symm} & & 1 + 3 m (σ_{R}^{2} σ_{A}^{4} + 2 \, {cov}_{RQ} σ_{A}^{2} σ_{D}^{2} + σ_{Q}^{2} σ_{D}^{4}) / σ_{W}^{4} \end{array}) .

These equations apply to estimates of ${\hat{σ}}_{A}^{2}$ , ${\hat{σ}}_{D}^{2}$ and ${\hat{σ}}_{W}^{2}$ . For full sib families, the estimate of the error variance would be ${\hat{σ}}_{E}^{2} = {\hat{σ}}_{W}^{2} - \frac{1}{2} {\hat{σ}}_{A}^{2} - \frac{3}{4} {\hat{σ}}_{D}^{2}$ , and its sampling error can be computed accordingly from $S^{- 1}$ .

As noted in the main text, cov_RQ = $σ_{R}^{2}$ , so S simplifies to

S = \frac{n - 1}{2 σ_{W}^{4}} (\begin{array}{ccc} m σ_{R}^{2} & m σ_{R}^{2} & - 2 m σ_{R}^{2} (σ_{A}^{2} + σ_{D}^{2}) / σ_{W}^{2} \\ & m σ_{Q}^{2} & - 2 m (σ_{R}^{2} σ_{A}^{2} + σ_{Q}^{2} σ_{D}^{2}) / σ_{W}^{2} \\ \text{symm} & & 1 + 3 m [σ_{R}^{2} (σ_{A}^{2} + 2 σ_{D}^{2}) σ_{A}^{2} + σ_{Q}^{2} σ_{D}^{4}] / σ_{W}^{4} \end{array}) .

However, as $σ_{R}^{2}$ and $σ_{Q}^{2}$ have similar magnitude, S is almost singular and thus the genotypic variance cannot be partitioned into additive and dominance components unless the dataset is very large.
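The near singularity can be illustrated numerically. The sketch below builds the simplified single-family matrix S (with cov_RQ = $σ_{R}^{2}$) and evaluates its determinant. When $σ_{Q}^{2}$ equals $σ_{R}^{2}$ the first two rows coincide and the determinant vanishes up to rounding error. All parameter values, including the value passed for m (which is defined in the main text), are hypothetical placeholders chosen only for illustration, and the function names are not from the paper.

```python
# Illustration of the near singularity of S when var(q_ij) is close to var(r_ij).
# Assumption: all numerical values below are placeholders, not estimates.

def info_matrix(n, m, var_r, var_q, var_a, var_d, var_w):
    """Simplified single-family information matrix S, with cov_RQ = var_r.

    var_a, var_d, var_w stand for sigma_A^2, sigma_D^2, sigma_W^2, so that
    sigma_W^4 is var_w ** 2.
    """
    scale = (n - 1) / (2.0 * var_w ** 2)
    s13 = -2.0 * m * var_r * (var_a + var_d) / var_w
    s23 = -2.0 * m * (var_r * var_a + var_q * var_d) / var_w
    s33 = 1.0 + 3.0 * m * (var_r * (var_a + 2.0 * var_d) * var_a
                           + var_q * var_d ** 2) / var_w ** 2
    rows = [
        [m * var_r, m * var_r, s13],
        [m * var_r, m * var_q, s23],
        [s13, s23, s33],
    ]
    return [[scale * x for x in row] for row in rows]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Placeholder inputs: family size, m, variances (hypothetical values).
n, m = 10, 8.2
var_r, var_a, var_d, var_w = 0.0016, 0.4, 0.1, 0.775

for factor in (1.0, 1.05, 1.5):
    S = info_matrix(n, m, var_r, factor * var_r, var_a, var_d, var_w)
    print("var_q / var_r =", factor, " det(S) =", det3(S))
```

As the ratio of the two variances approaches one the determinant shrinks towards zero, so the elements of S^-1, and hence the sampling variances of the separate additive and dominance estimates, become very large.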

Appendix 3: Comparison of between and within family estimators

Let us assume a balanced one-way ANOVA (which is also REML if there are no unbalanced fixed effects) is used to estimate $σ_{A}^{2}$ , i.e. ${\hat{σ}}_{Ab}^{2}$ = (MSB – MSW)/(nA) where MSB and MSW are the mean squares and A is the pedigree relationship (½ or ¼). It is assumed that there is no environmental correlation among sibs. Hence, with f families each of size n, $var (MSB) = 2 {[σ_{W}^{2} + (n - 1) A σ_{A}^{2}]}^{2} / (f - 1)$ , $var (MSW) = 2 σ_{W}^{4} / [f (n - 1)]$ and, as these are uncorrelated,

var ({\hat{σ}}_{Ab}^{2}) = \frac{2 σ_{W}^{4}}{{(nA)}^{2}} (\frac{{[1 + (n - 1) (A σ_{A}^{2} / σ_{W}^{2})]}^{2}}{f - 1} + \frac{1}{f (n - 1)}) .

For the within-family estimates, $var ({\hat{σ}}_{Aw}^{2})$ is given by (3). Further simplification requires making some assumptions about numbers and size of families. As a first approximation, assume neither is small, so

\begin{array}{l} var ({\hat{σ}}_{Ab}^{2}) \approx \frac{2 σ_{W}^{4} {[1 + nA σ_{A}^{2} / σ_{W}^{2}]}^{2}}{f n^{2} A^{2}}, \quad var ({\hat{σ}}_{Aw}^{2}) \approx \frac{2 σ_{W}^{4}}{f n^{2} σ_{R}^{2}} \\ \text{and} \quad \frac{var ({\hat{σ}}_{Aw}^{2})}{var ({\hat{σ}}_{Ab}^{2})} \approx \frac{A^{2} / σ_{R}^{2}}{{[1 + nA σ_{A}^{2} / σ_{W}^{2}]}^{2}} . \end{array}
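The comparison can be evaluated directly. The following sketch codes the exact balanced-ANOVA variance of the between-family estimate given above, the two large-sample approximations and their ratio. The function names are hypothetical and the numerical inputs (including the value used for $σ_{R}^{2}$) are placeholders for illustration only, not values taken from the paper.

```python
# Between- versus within-family sampling variances of the estimate of sigma_A^2.
# Assumption: var_a, var_w, var_r stand for sigma_A^2, sigma_W^2, sigma_R^2;
# the example inputs are placeholders.

def var_between(f, n, A, var_a, var_w):
    """Exact variance of the between-family (ANOVA) estimate, balanced design."""
    lead = 2.0 * var_w ** 2 / (n * A) ** 2
    t = A * var_a / var_w
    return lead * ((1.0 + (n - 1) * t) ** 2 / (f - 1) + 1.0 / (f * (n - 1)))

def var_between_approx(f, n, A, var_a, var_w):
    """Approximation when neither f nor n is small."""
    return (2.0 * var_w ** 2 * (1.0 + n * A * var_a / var_w) ** 2
            / (f * n ** 2 * A ** 2))

def var_within_approx(f, n, var_r, var_w):
    """Approximate variance of the within-family estimate."""
    return 2.0 * var_w ** 2 / (f * n ** 2 * var_r)

def ratio_approx(n, A, var_a, var_w, var_r):
    """Approximate ratio var(within) / var(between)."""
    return (A ** 2 / var_r) / (1.0 + n * A * var_a / var_w) ** 2

# Placeholder example: full sibs (A = 1/2), hypothetical variances.
A, var_a, var_w, var_r = 0.5, 0.4, 0.7, 0.0016
for n in (2, 20, 200):
    print("n =", n, " ratio =", ratio_approx(n, A, var_a, var_w, var_r))
```

The ratio does not depend on the number of families f in this approximation, and it falls as n increases because the between-family variance is dominated by the term in $nA σ_{A}^{2} / σ_{W}^{2}$ while the within-family variance continues to decline with n².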

Competing interests

The author declares no competing interests.

Author’s contributions

WGH proposed, executed and reported the study.

Acknowledgements

I wish to thank Ian White, Peter Visscher, reviewers and editors for helpful comments.

References

1. Falconer DS, Mackay TFC. Introduction to quantitative genetics. Essex: Longman Group Ltd; 1996.
2. Lynch M, Walsh JB. Genetics and analysis of quantitative traits. Sunderland, MA: Sinauer Associates; 1998.
3. Ritland K. Marker-based method for inferences about quantitative inheritance in natural populations. Evolution. 1996;50:1062–1073. doi: 10.2307/2410647.
4. Haseman JK, Elston RC. The investigation of linkage between a quantitative trait and a marker locus. Behav Genet. 1972;2:3–19. doi: 10.1007/BF01066731.
5. Visscher PM, Medland SE, Ferreira MAR, Morley KI, Zhu G, Cornes BK, Montgomery GW, Martin NG. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2006;2:e41. doi: 10.1371/journal.pgen.0020041.
6. Visscher PM, Macgregor S, Benyamin B, Zhu G, Gordon S, Medland S, Hill WG, Hottenga JJ, Willemsen G, Boomsma DI, Liu YZ, Deng HW, Montgomery GW, Martin NG. Genome partitioning of genetic variation for height from 11,214 sibling pairs. Am J Hum Genet. 2007;81:1104–1110. doi: 10.1086/522934.
7. Visscher PM. Whole genome approaches to quantitative genetics. Genetica. 2009;136:351–358. doi: 10.1007/s10709-008-9301-7.
8. Hill WG. Variation in genetic identity within kinships. Heredity. 1993;71:652–653. doi: 10.1038/hdy.1993.190.
9. Guo SW. Proportion of genome shared identical by descent by relatives: concept, computation, and applications. Am J Hum Genet. 1995;56:1468–1476.
10. Hill WG, Weir BS. Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet Res (Camb). 2011;93:47–64. doi: 10.1017/S0016672310000480.
11. Ødegård J, Meuwissen THE. Estimation of heritability from limited family data using genome-wide identity-by-descent sharing. Genet Sel Evol. 2012;44:16. doi: 10.1186/1297-9686-44-16.
12. Arias JA, Keehan M, Fisher P, Coppieters W, Spelman R. A high density linkage map of the bovine genome. BMC Genet. 2009;10:18. doi: 10.1186/1471-2156-10-18.
13. Abecasis GR, Cherny SS, Cookson WO, Cardon LR. Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002;30:97–101. doi: 10.1038/ng786.
14. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, Goddard ME, Visscher PM. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569. doi: 10.1038/ng.608.

