Genetics. 2011 Jan;187(1):261–270. doi: 10.1534/genetics.110.122333

Complex Genetic Effects in Quantitative Trait Locus Identification: A Computationally Tractable Random Model for Use in F2 Populations

Daisy Zimmer 1, Manfred Mayer 1,1, Norbert Reinsch 1
PMCID: PMC3018320  PMID: 20956610

Abstract

Methodology for mapping quantitative trait loci (QTL) has focused primarily on treating the QTL as a fixed effect. These methods differ from the usual models of genetic variation that treat genetic effects as random. Computationally expensive methods that allow QTL to be treated as random have been explicitly developed for additive genetic and dominance effects. By extending these methods with a variance component method (VCM), multiple QTL can be mapped. We focused on an F2 crossbred population derived from inbred lines and estimated effects for each individual and their corresponding marker-derived genetic covariances. We present extensions to pairwise epistatic effects, which are computationally intensive because a great many individual effects must be estimated. But by replacing individual genetic effects with average genetic effects for each marker class, genetic covariances are approximated. This substantially reduces the computational burden by reducing the dimensions of covariance matrices of genetic effects, resulting in a remarkable gain in the speed of estimating the variance components and evaluating the residual log-likelihood. Preliminary results from simulations indicate competitiveness of the reduced model with multiple-interval mapping, regression interval mapping, and VCM with individual genetic effects in its estimated QTL positions and experimental power.


MAPPING procedures often treat the effects of quantitative trait loci (QTL) as fixed, in particular the maximum likelihood-based method of interval mapping (IM) of Lander and Botstein (1989) and the least-squares regression interval mapping (RIM) of Haley and Knott (1992) and Martínez and Curnow (1992).

Single-QTL approaches with fixed effects were later extended to multiple QTL to avoid the so-called “ghost-QTL” phenomenon (e.g., Haley and Knott 1992) and to improve the power to detect linked QTL in repulsion (e.g., Kao 2000) as well as epistatic QTL (e.g., Jannink and Jansen 2001; Carlborg and Haley 2004). The multiple-interval mapping (MIM) approach of Kao and Zeng (1997) and Kao et al. (1999), an extension of IM, considers fixed additive genetic, dominance, and epistatic QTL effects as parts of the likelihood function for a mixture model in experimental populations. Both MIM and RIM are known to be powerful and well suited to identifying multiple, possibly interacting QTL in mapping experiments. However, in some situations RIM estimates the positions and effects of the QTL less accurately than MIM [e.g., QTL in repulsion (Kao 2000; Mayer et al. 2004; Mayer 2005)].

Considering QTL effects as random in a linear mixed model (LMM) leads to the variance component method (VCM) for QTL mapping. This is often applied in scenarios with a large number of small families as is frequently found in humans (e.g., Haseman and Elston 1972; Xu and Atchley 1995) or in livestock (e.g., Grignola et al. 1996), where a mixture of families with parents of different QTL genotypes is expected to occur. Experiments with multiple line crosses, e.g., F2, are often advocated because of their potential to avoid nondetection of QTL by representing genetic variability of a population by only a few lines—the so-called “genetic drift error” (Xu 1996). Although fixed effect approaches are equivalent in power, at least in situations with a single QTL, VCM are easier to implement and have computational advantages in this context (Xu 1998). Rules for setting up the required QTL allelic relationship matrices from marker data were given by Wang et al. (1995) and Abdel-Azim and Freeman (2001). Marker-based relationship matrices for QTL with additive genetic and nonadditive genetic (dominance, epistasis) gene action in noninbred populations were applied by Liu et al. (2002).

The focus of Xie et al. (1998) was on backcross (BC) and F2 designs descending from inbred lines. For these types of experiments additive genetic and dominance relationship matrices can be calculated from conditional QTL genotype probabilities (given the flanking marker genotypes) for all individuals of the mapping population (as used as regressor variables in RIM). Crepieux et al. (2004) provided a general extension to any type of multicross designs from inbred parents. Furthermore, Li and Cui (2009) demonstrated how VCM can be employed for mapping imprinted QTL in a combination of different BC populations derived from inbred lines.

In this article we first propose extensions of the variance component approach of Xie et al. (1998) to multiple interacting QTL with pairwise epistatic effects. Then, maintaining the focus on inbred line-derived F2 populations, a reduced model is suggested, in which individual genetic effects are replaced by average genetic effects for different marker classes. The covariance matrix of the phenotypes is approximated in different ways, leading to less computational effort.

THEORY

Linear mixed model:

From an F2 generation derived from a cross between inbred lines, one observation per individual is considered. The vector of phenotypes Y (length n) is modeled with respect to additive genetic, dominance, and pairwise epistatic effects of the QTL, whose total number is ν. A pair of QTL is indexed by l and k. The LMM in matrix notation is given as

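$$Y = X\beta + \sum_{l=1}^{\nu} Z_l \left(u_{a_l} + u_{d_l}\right) + \sum_{l=1}^{\nu-1} \sum_{k=l+1}^{\nu} Z_{lk} \left(u_{aa_{lk}} + u_{ad_{lk}} + u_{da_{lk}} + u_{dd_{lk}}\right) + e$$ (1)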

The vector of fixed effects β has the related design matrix X. The random vectors $u_\tau$ with $\tau \in \{a_l, d_l, aa_{lk}, ad_{lk}, da_{lk}, dd_{lk}\}$ denote the additive genetic, the dominance, and the four pairwise epistatic effects (first-order interactions) at QTL l and k. For each τ the length of $u_\tau$ equals the number of F2 individuals n; i.e., all QTL effects differ between individuals. The incidence matrices $Z_l$ and $Z_{lk}$ with dim($Z_l$) = dim($Z_{lk}$) = n × n relate the observations to genetic effects. The residuals are assumed to be independently and identically normally distributed with $e \sim N(0, I\sigma_e^2)$, where I is the identity matrix of order n and $\sigma_e^2$ is the residual variance. The covariances between the normally distributed random genetic effects $u_\tau$ and the residuals e are assumed to be zero, as are the covariances between different types of genetic effects $u_\tau$. The expectations of the QTL effects are E($u_\tau$) = 0 and the variances are $\mathrm{Var}(u_\tau) = V_\tau \sigma_\tau^2$, where $\sigma_\tau^2$ is the related QTL variance and $V_\tau$ is the corresponding expected QTL relationship matrix conditional on the marker genotypes. The phenotypic vector therefore follows a multivariate normal distribution with $Y \sim N(X\beta, V)$. The covariance matrix V is derived conditional on the observed marker genotypes and can be written as

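$$V = \sum_{l=1}^{\nu} Z_l \left(V_{a_l}\sigma_{a_l}^2 + V_{d_l}\sigma_{d_l}^2\right) Z_l' + \sum_{l=1}^{\nu-1} \sum_{k=l+1}^{\nu} Z_{lk} \left(V_{aa_{lk}}\sigma_{aa_{lk}}^2 + V_{ad_{lk}}\sigma_{ad_{lk}}^2 + V_{da_{lk}}\sigma_{da_{lk}}^2 + V_{dd_{lk}}\sigma_{dd_{lk}}^2\right) Z_{lk}' + I\sigma_e^2$$ (2)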

Calculation of covariance matrices:

We follow the approach of Xie et al. (1998) and derive the required genetic covariance matrices of (2) from conditional QTL genotype probabilities and elementary covariance matrices.

Conditional QTL genotype probabilities:

For a particular QTL the F2 generation can be partitioned into nine different marker classes (see Table 2 column headings) conditional on the observed genotype of the flanking markers. QTL alleles originating from the first line are denoted by uppercase letter indexes (Q, H) and those from the second line by lowercase indexes (q, h), and for marker alleles the respective line origins are indicated by numbers (1 and 2). Conditional QTL genotype probabilities depend on flanking marker genotypes and the recombination rates between the markers and QTL and can be derived as described by, e.g., Carbonell et al. (1992, Table 1). We allow for double recombinations and assume Haldane's mapping function (Haldane 1919).

TABLE 2.

Covariances $m_{ii}^{a_l}$ of additive genetic sampling effects within marker class i for different numbers (n) of F2 individuals: QTL in the middle of a 10-cM marker interval

        i = 1    i = 2    i = 3    i = 4    i = 5    i = 6    i = 7    i = 8    i = 9
n       G11/11   G11/12   G11/22   G12/11   G12/12   G12/22   G22/11   G22/12   G22/22
500     0.000    −0.012   −0.487   −0.012   0.000    −0.012   −0.487   −0.012   0.000
1000    0.000    −0.006   −0.243   −0.006   0.000    −0.006   −0.243   −0.006   0.000
2000    0.000    −0.003   −0.122   −0.003   0.000    −0.003   −0.122   −0.003   0.000
3000    0.000    −0.002   −0.081   −0.002   0.000    −0.002   −0.081   −0.002   0.000

Flanking marker genotypes G·/· are indexed by their alleles for each i.

TABLE 1.

Correspondence of elements of additive genetic relationship matrices in the individual model and the equivalent model with additive genetic sampling effects

Case Individual model Equivalent model
Inline graphic Inline graphic Inline graphic
1 Inline graphic Inline graphic Inline graphic
2 Inline graphic Inline graphic Inline graphic
3 Inline graphic Inline graphic 0

Each variable in the second column (individual model) is the sum of the two expressions in the third and fourth columns (equivalent model). Case 1: diagonal elements for marker class i ∈ {1,…,9}; case 2: two individuals with equal marker class i; case 3: two individuals with different marker classes i and j.

Probabilities for the genotypes GQQ, GQq, and Gqq of an individual at the lth QTL conditional on flanking marker information Mi can be collected in a row vector Inline graphic with

graphic file with name M8.gif

where Mi denotes the observed flanking marker genotype i ∈ {1,…, 9} of an individual. We assume that in each marker interval either no or only a single QTL exists. The joint conditional probability for two linked QTL is just the product of both single probabilities if at least one completely informative marker is in between (Rönnegård et al. 2008). Thus, the probability of a two-locus QTL genotype, e.g. GQQHh, given the particular marker genotypes Mi and Nj (i, j ∈ {1,…, 9}) at QTL l and k, respectively, is defined as Pr(GQQHh|Mi,Nj) = Pr(GQQ|Mi)Pr(GHh|Nj). We define Inline graphic as the row vector with all joint conditional QTL genotype probabilities for a pairwise epistatic effect at QTL l and k.
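The conditional probabilities described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' Fortran implementation: the function names (`haldane`, `gamete_prob_Q`, `class_probs`, `joint_probs`) and the encoding of a marker genotype as a pair of line origins such as `(1, 2)` are assumptions made here. It assumes Haldane's mapping function, allows double recombinations, and requires that the QTL does not coincide with both flanking markers.

```python
import math
from itertools import permutations

def haldane(d_cm):
    """Recombination rate for a map distance in cM (Haldane's mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def gamete_prob_Q(left, right, r1, r2):
    """Pr(gamete carries the line-1 QTL allele | its two flanking marker alleles)."""
    with_Q = (1 - r1 if left == 1 else r1) * (1 - r2 if right == 1 else r2)
    with_q = (r1 if left == 1 else 1 - r1) * (r2 if right == 1 else 1 - r2)
    return with_Q / (with_Q + with_q)

def class_probs(geno_left, geno_right, d1, d2):
    """Row vector (Pr(QQ), Pr(Qq), Pr(qq)) given both flanking marker genotypes.

    d1, d2: distances (cM) from the QTL to the left and right marker.
    """
    r1, r2 = haldane(d1), haldane(d2)
    r = r1 + r2 - 2.0 * r1 * r2  # marker-marker recombination rate (no interference)
    # All ordered gamete pairs that are consistent with the unordered marker genotypes.
    pairs = {((l[0], m[0]), (l[1], m[1]))
             for l in permutations(geno_left) for m in permutations(geno_right)}
    num = [0.0, 0.0, 0.0]
    for g1, g2 in pairs:
        w = 1.0
        for g in (g1, g2):  # frequency of each marker haplotype among F1 gametes
            w *= 0.5 * (1 - r) if g[0] == g[1] else 0.5 * r
        p1 = gamete_prob_Q(g1[0], g1[1], r1, r2)
        p2 = gamete_prob_Q(g2[0], g2[1], r1, r2)
        num[0] += w * p1 * p2
        num[1] += w * (p1 * (1 - p2) + (1 - p1) * p2)
        num[2] += w * (1 - p1) * (1 - p2)
    total = sum(num)
    return tuple(x / total for x in num)

def joint_probs(p_l, p_k):
    """Joint two-locus probabilities in the order QQHH, QQHh, QQhh, QqHH, ..., qqhh."""
    return [a * b for a in p_l for b in p_k]

# Nine marker classes in the order i = 1, ..., 9 used in Table 2.
GENOS = [(1, 1), (1, 2), (2, 2)]
MARKER_CLASSES = [(gl, gr) for gl in GENOS for gr in GENOS]

# QTL in the middle of a 10-cM interval, marker class G11/22 (i = 3).
p_mid = class_probs((1, 1), (2, 2), 5.0, 5.0)
```

For a QTL exactly in the middle of the interval, a recombinant gamete is equally likely to carry either QTL allele, so class G11/22 yields the genotype probabilities (0.25, 0.5, 0.25). The product form in `joint_probs` relies on the condition stated above, i.e., at least one completely informative marker between the two QTL.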

Elementary covariance matrices:

As a second ingredient we need elementary covariance matrices between all possible QTL genotypes GQQ, GQq, and Gqq in the F2 populations. The elementary matrices for additive genetic QTL effects A (Xie et al. 1998) and dominance QTL effects D (Smith 1984; Xie et al. 1998) are

graphic file with name M10.gif

We use the Kronecker product (symbol ⊗) of A and D to compute the four different 9 × 9 elementary matrices, AA, AD, DA, and DD, which include covariances between pairwise epistatic effects and correspond to nine genotypes (GQQHH, GQQHh, GQQhh, GQqHH, GQqHh, GQqhh, GqqHH, GqqHh, and Gqqhh) for pairwise QTL combinations.

QTL relationship matrices:

The n × n additive genetic, dominance, and pairwise epistatic relationship matrices for all F2 individuals can be set up for a putative QTL position or combinations thereof with conditional QTL genotype probabilities (Inline graphic and Inline graphic vectors) and elementary matrices (Xie et al. 1998). Relationship coefficients are averages of possible QTL genotype combinations. For the additive genetic relationship matrix Inline graphic we get diagonal elements

graphic file with name M14.gif (3)

and off-diagonals

graphic file with name M15.gif (4)

at the lth QTL. If both individuals s and t belong to the same marker class i, then Inline graphic can be simplified to

graphic file with name M17.gif (5)

because the conditional probabilities are equal. The dominance relationship matrix Inline graphic is set up equivalently, but instead of A the elementary matrix D is used, i.e., Inline graphic and Inline graphic.

We suggest that the pairwise epistatic relationship matrices Inline graphic at the lth and kth QTL are computed analogously to Val using the appropriate Kronecker product of elementary matrices (e.g., AA). Computation of matrix elements is done as in Equations 3 and 4, employing corresponding row vectors Inline graphic. Note that this is equivalent to using Hadamard products of QTL relationship matrices Inline graphic and Inline graphic given that there is at least one completely informative marker between both QTL or no linkage between them (Rönnegård et al. 2008), which is always fulfilled by our assumptions. To ensure positive definiteness of covariance matrices, we assume that locations of putative QTL and markers do not coincide.
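The equivalence between the Kronecker-product formulation and the Hadamard product of single-locus relationship matrices can be checked numerically for an off-diagonal element. The sketch below is only an illustration: the elementary matrix `A` is an assumed form (with genotypic values scaled so that the QTL variance is one unit), because the elementary matrices are not reproduced in this text, and the probability vectors are arbitrary example values.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def bilinear(p, m, q):
    """Bilinear form p M q' for row vectors p and q."""
    return sum(p[i] * m[i][j] * q[j] for i in range(len(p)) for j in range(len(q)))

def joint(p, q):
    """Joint two-locus probabilities when both loci are conditionally independent."""
    return [x * y for x in p for y in q]

# Assumed elementary additive matrix for the genotypes (QQ, Qq, qq).
A = [[1, 0, -1],
     [0, 0, 0],
     [-1, 0, 1]]
AA = kron(A, A)  # 9 x 9 elementary matrix for additive-by-additive effects

# Hypothetical conditional genotype probabilities of individuals s and t
# at QTL l (p_*) and at QTL k (q_*).
p_s, p_t = (0.6, 0.3, 0.1), (0.2, 0.5, 0.3)
q_s, q_t = (0.7, 0.2, 0.1), (0.4, 0.4, 0.2)

# Element from the Kronecker formulation ...
v_kron = bilinear(joint(p_s, q_s), AA, joint(p_t, q_t))
# ... and the same element as a product of single-locus elements (Hadamard product).
v_hadamard = bilinear(p_s, A, p_t) * bilinear(q_s, A, q_t)
```

Both routes give the same value because (p ⊗ q)(A ⊗ A)(p′ ⊗ q′)′ factorizes into the two single-locus bilinear forms whenever the joint probabilities are products of the single-locus probabilities.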

Equivalent model with average genetic effects:

What we have outlined so far is termed “individual model,” because each individual receives its own genetic effects for the different kinds of genetic components. For a particular QTL l the LMM of (1) with only additive genetic effects becomes

$$Y = X\beta + Z_l u_{a_l} + e$$ (6)

with covariance matrix of the phenotypes conditional on the observed marker genotypes

$$V = Z_l V_{a_l} Z_l' \sigma_{a_l}^2 + I\sigma_e^2$$ (7)

A model equivalent to (6) is

graphic file with name M27.gif (8)

where a vector Inline graphic with length nl = 9 (number of different marker classes) of average additive genetic effects for all possible marker genotype classes is considered. An additional random effect Inline graphic of length n appears, termed “additive genetic sampling effect,” and it describes the deviations of the individual additive genetic effects from the average additive genetic effects of marker classes. The dimension of Inline graphic is n × nl. Accordingly, the covariance matrix of the phenotypes can be expressed as

graphic file with name M31.gif (9)

where Inline graphic denotes the reduced nl × nl relationship matrix of the average additive genetic effects at the QTL. The additive genetic variance of the individual model (7) is Inline graphic, which is identical to Inline graphic in (9). The variance of the additive genetic sampling effect is Inline graphic, where Inline graphic denotes the relationship matrix of the additive genetic sampling effect. There are Inline graphic individuals with the same marker genotype i at the QTL. The variance of the average additive genetic effect of a certain marker class i, averaged over Inline graphic individuals, is given in the reduced model as

graphic file with name M39.gif (10)

Equation 10 is valid, because there are Inline graphic diagonal elements and Inline graphic off-diagonal elements in the relationship matrix of the individual additive genetic effects.

The three possible cases appearing in the additive genetic relationship matrix of the individual model are further investigated (see Equations 3–5). First, the variance of an individual additive genetic effect with marker class i is Inline graphic and second, the covariance between two additive genetic effects with the same marker class i is Inline graphic. Then the element Inline graphic for Inline graphic can be written as

graphic file with name M46.gif (11)

The variance of the average additive genetic effect is asymptotically equal to the covariance between individual additive genetic effects of the same marker class i; i.e., Inline graphic. Third, the covariance of additive genetic effects with marker classes i and j is Inline graphic. Now, the covariance of the average additive genetic effects of marker genotypes i and j (i ≠ j) can be expressed as

graphic file with name M49.gif (12)

This is equal to the covariance among the two individual additive genetic effects of marker classes i and j.

The relationship matrix of the additive genetic sampling effects Inline graphic can be determined as the difference between the relationship matrices of additive genetic effects from the individual model (individual genetic effects) and the reduced model (average genetic effects), which are inferred from Equations 7 and 9; i.e., Inline graphic. Generally, Inline graphic(order n) can be written as

graphic file with name M53.gif (13)

if the individuals are arranged by marker class. To study the matrices Inline graphic we assume that each marker genotype appears at least once; i.e., Inline graphic.

Concerning the third case, the additive genetic covariance between a pair of individuals s and t with different marker genotypes i and j equals the difference of (4) and (12): Inline graphic. Therefore, for i ≠ j, Inline graphic in (13) and Inline graphic is a block diagonal matrix if the observations are ordered by marker genotypes. The diagonal block Inline graphic corresponding to marker class i has the order Inline graphic and can be expressed as

graphic file with name M61.gif

The covariance Inline graphic of the additive genetic sampling effects of two individuals s and t given the same marker genotype i (second case) is the difference of (5) and (11):

graphic file with name M63.gif (14)

for Inline graphic. The variance Inline graphic of the additive genetic sampling effect given the marker genotype i (first case) is the difference of (3) and (11),

graphic file with name M66.gif (15)

with Inline graphic. Note that the elements Inline graphic are independent of Inline graphic. However, Inline graphic and Inline graphic depend on conditional genotype probabilities. From (14) and (15) it is obvious that Inline graphic is a function of the covariance of the additive genetic sampling effects from the same marker class i and the corresponding number Inline graphic of observations, Inline graphic.

The calculation of the relationship matrix of the additive genetic effect of the individual model (6) and the reduced model (8) as well as the additive genetic sampling relationship matrix Inline graphic is summarized in Table 1.

If model (6) includes not only additive genetic but also dominance effects, the genetic parameters for average dominance effects and dominance sampling terms can be obtained analogously. The genetic sampling relationship matrices of the pairwise epistatic effects can also be calculated similarly to the additive genetic and dominance effects, but the row vectors Inline graphic that considered the joint conditional QTL genotype probabilities of the lth and kth QTL have to be used. Then nlk different marker classes have to be considered, where nlk = 27 if the QTL are in two adjacent marker intervals and nlk = 81 otherwise.

If we assume that the number of F2 individuals approaches infinity (n → ∞), then the number of individuals per marker class i also increases (Inline graphic). The diagonal elements Inline graphic as well as Inline graphic depend on Inline graphic, where Inline graphic tends to zero for n → ∞. Hence $\lim_{n_i^l \to \infty}$ Inline graphic, where Inline graphic is a diagonal matrix of order n of elements Inline graphic. Therefore, the covariance matrix of the additive genetic sampling effects is asymptotically diagonal.

Reduced model:

Instead of an individual model we developed a reduced model approach, which is an approximation of model (8), with decreased dimension of the relationship matrices. The LMM is Inline graphic, where the residuals are assumed to be independently and identically normally distributed with Inline graphic. Here the F2 individuals are grouped according to their marker genotypes and average genetic effects are estimated for marker classes instead of individual genetic effects, as described in (8). The dimension of the relationship matrices depends on the number of marker classes (nl and nlk), but not on the experiment size n. We call this procedure the reduced model (vs. the individual model).
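The dimension reduction can be made concrete with a toy example. In the sketch below the names `incidence_matrix` and `expand` are hypothetical, and the reduced relationship matrix holds made-up values for three marker classes only (instead of the nine classes of a real F2 interval). The point is merely that the n × n genetic covariance structure implied by the reduced model is fully determined by a small class-level matrix and the class membership of each individual.

```python
def incidence_matrix(class_of_individual, n_classes):
    """n x n_classes matrix relating each observation to its marker class."""
    return [[1 if c == j else 0 for j in range(n_classes)]
            for c in class_of_individual]

def expand(z, v_reduced):
    """Compute Z V Z' for an incidence matrix Z and a class-level matrix V."""
    n, m = len(z), len(v_reduced)
    zv = [[sum(z[s][i] * v_reduced[i][j] for i in range(m)) for j in range(m)]
          for s in range(n)]
    return [[sum(zv[s][j] * z[t][j] for j in range(m)) for t in range(n)]
            for s in range(n)]

# Hypothetical class-level relationship matrix for three marker classes.
v_reduced = [[0.9, 0.0, -0.9],
             [0.0, 0.0, 0.0],
             [-0.9, 0.0, 0.9]]

# Marker class (0-based) of six individuals; n could be arbitrarily large.
classes = [0, 0, 1, 2, 2, 1]
z = incidence_matrix(classes, 3)
v_full = expand(z, v_reduced)

# Storage needed: n * n elements in the individual model, 3 * 3 here.
sizes = (len(classes) ** 2, len(v_reduced) ** 2)
```

Every element of `v_full` is simply the class-level element for the two individuals' marker classes, which is why only the small matrix has to be handled during variance component estimation.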

In general, the reduced model with respect to additive genetic, dominance, and pairwise epistatic effects is

graphic file with name M98.gif (16)

where the residuals are again assumed to be independently and identically normally distributed with Inline graphic. The vectors Inline graphic with Inline graphic consider the average additive genetic, dominance, and pairwise epistatic effects of length nl and nlk.

The calculation of the reduced dominance relationship matrix Inline graphic at the lth QTL is done similarly to the notes above, but A has to be replaced by D. Both Inline graphic and Inline graphic are matrices of order nl, where nl = 9 if the flanking markers are fully informative. The reduced epistatic relationship matrices Inline graphic of the lth and kth QTL are computed analogously to Inline graphic from (10) and (12), but the corresponding Kronecker product is used instead of A and the row vector Inline graphic for the ith and jth marker class is applied.

The difference Inline graphic (asymptotic variance Inline graphic) between the variance of an individual additive genetic effect and the covariance between two additive genetic effects of the same marker class decreases as the distance between flanking markers becomes smaller. Decreasing QTL effects and genetic variances lead to the same effect. In the extreme case, when the marker location and the position of the QTL coincide, the difference Inline graphic is zero and therefore Inline graphic. In this case the covariances of the phenotypes in the reduced and the individual model are identical. Therefore, approximating Inline graphic (or its multilocus equivalent) by Inline graphic seems to be a reasonable choice. Note that Xu and Atchley (1995) and Xu (1998) investigated the inflation of the residual variance through the within-marker genotype QTL variance in the RIM, which is similar to our genetic sampling effects.

The approximation of the individual model by the reduced model relies on two different aspects. First, the covariances Inline graphic between genetic sampling effects (deviation of individual genetic effects from average genetic effects of marker classes) are assumed to be zero. Second, the asymptotic variances Inline graphic of the additive genetic sampling effects are treated as equal for all marker classes. Covariances Inline graphic between additive genetic sampling effects of individuals sharing the same marker class i are shown in Table 2 for an additive QTL in the middle of a 10-cM marker interval in dependence on sample size. The elements Inline graphic were calculated using the expected proportions of each marker genotype according to Equation 14. To make sure that Inline graphic, we used n ≥ 500. For 500 F2 individuals this covariance Inline graphic is ≤1% of the QTL variance and shows a further decline when the sample size increases. Only for marker classes G11/22 and G22/11 is there a very high (negative) covariance (48.7% of the additive genetic variance) and an experiment with >2000 F2 individuals would be required to reach a value <10%. These marker genotypes are rare: together they are expected to occur only twice among 500 F2 individuals. Therefore, omitting these covariances Inline graphic has little effect on the likelihood.

The asymptotic variances Inline graphic of the additive genetic sampling effects for different marker classes are, however, larger than their corresponding covariances and, more importantly, they show considerable variation between more frequent marker classes. The sixth line of Table 3 shows the genetic sampling variances for all marker classes, again for an additive QTL in the middle of a 10-cM marker interval. For the three most frequent marker classes, the genetic sampling variance is at ≤1% of the additive genetic variance (classes 1, 5, and 9) and for another four marker classes it equals 25% (classes 2, 4, 6, and 8), while a 50% value occurs only in the very rare classes (3 and 7). Note that the genetic sampling variances become smaller when the QTL is located closer to the boundary of the marker interval. The genetic sampling effects completely vanish if marker locations and positions of the QTL coincide (Table 3, first line). In such cases, the covariances of the genetic sampling effects are zero and the assumption Inline graphic of the reduced model (16) is exact.

TABLE 3.

Asymptotic variances $\tilde{m}_{ii}^{a_l}$ of additive genetic sampling effects within marker class i for differently sized marker intervals and different QTL positions within marker intervals (cM)

Marker     Position
interval   of QTL     i = 1   i = 2   i = 3   i = 4   i = 5   i = 6   i = 7   i = 8   i = 9
0          0          0.00                            0.00                            0.00
10         1          0.00    0.09    0.18    0.09    0.00    0.09    0.18    0.09    0.00
10         2          0.00    0.16    0.32    0.16    0.01    0.16    0.32    0.16    0.00
10         3          0.00    0.21    0.42    0.21    0.01    0.21    0.42    0.21    0.00
10         4          0.00    0.24    0.48    0.24    0.01    0.24    0.48    0.24    0.00
10         5          0.00    0.25    0.50    0.25    0.01    0.25    0.50    0.25    0.00
20         10         0.02    0.26    0.50    0.26    0.04    0.26    0.50    0.26    0.02
30         15         0.04    0.27    0.50    0.27    0.08    0.27    0.50    0.27    0.04
40         20         0.07    0.29    0.50    0.29    0.13    0.29    0.50    0.29    0.07
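The entries of Tables 2 and 3 can be approximated with a short calculation. The following sketch rests on assumptions that are not taken verbatim from the text: the elementary matrix A is assumed to be [[1, 0, −1], [0, 0, 0], [−1, 0, 1]], the variance of an individual effect is taken as the probability-weighted diagonal of A, the covariance within a marker class as p A p′, and the finite-sample covariance of the sampling effects as the negative asymptotic variance divided by the expected class size. The function names are hypothetical.

```python
import math

def haldane(d_cm):
    """Recombination rate for a map distance in cM (Haldane's mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def marker_classes(interval_cm, qtl_cm):
    """Expected proportions and QTL genotype probabilities of the nine marker classes.

    Returns (freq, probs) with probs[i] = [Pr(QQ), Pr(Qq), Pr(qq)], i = 0..8
    in the column order of Table 2.
    """
    ra, rb = haldane(qtl_cm), haldane(interval_cm - qtl_cm)
    haps = []  # (left allele, right allele, haplotype frequency, Pr(Q | haplotype))
    for left in (1, 2):
        for right in (1, 2):
            with_Q = 0.5 * (1 - ra if left == 1 else ra) * (1 - rb if right == 1 else rb)
            with_q = 0.5 * (ra if left == 1 else 1 - ra) * (rb if right == 1 else 1 - rb)
            haps.append((left, right, with_Q + with_q, with_Q / (with_Q + with_q)))
    freq = [0.0] * 9
    probs = [[0.0, 0.0, 0.0] for _ in range(9)]
    for l1, m1, f1, q1 in haps:          # gamete from the first F1 parent
        for l2, m2, f2, q2 in haps:      # gamete from the second F1 parent
            i = 3 * ((l1 == 2) + (l2 == 2)) + (m1 == 2) + (m2 == 2)
            w = f1 * f2
            freq[i] += w
            probs[i][0] += w * q1 * q2
            probs[i][1] += w * (q1 * (1 - q2) + (1 - q1) * q2)
            probs[i][2] += w * (1 - q1) * (1 - q2)
    probs = [[x / f for x in p] for p, f in zip(probs, freq)]
    return freq, probs

def asymptotic_sampling_variance(p):
    """Variance of an individual effect minus covariance within the marker class."""
    diag = p[0] + p[2]            # sum over genotypes of p_g * A_gg (assumed A)
    off = (p[0] - p[2]) ** 2      # p A p' for the assumed A
    return diag - off

# QTL in the middle of a 10-cM marker interval.
freq, probs = marker_classes(10.0, 5.0)
asymptotic = [asymptotic_sampling_variance(p) for p in probs]
# Finite-sample covariances within marker class for n = 500 (assumed form).
covariance_500 = [-v / (500 * f) for v, f in zip(asymptotic, freq)]
```

Under these assumptions the rounded values of `asymptotic` agree with the row for a 10-cM interval with the QTL at 5 cM in Table 3, and `covariance_500` agrees with the first row of Table 2, which supports the assumed form of A. Calling `marker_classes` with other interval sizes and positions gives the remaining rows.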

The latter considerations suggest, as a further alternative, a weighted approach, where the second part of the approximation inherent in the reduced model, i.e., equal genetic sampling variances for all marker classes, is skipped, while the assumption (first part) of zero covariances for genetic sampling effects within marker class is maintained. For a single additive QTL this results in the following mixed model equations (MME):

graphic file with name M125.gif

where Inline graphic. The variance of the residuals is Inline graphic, where Inline graphic is defined as in (8) and all other symbols as in (1) and (16). The diagonal matrix W of order n has the entries Inline graphic, which differ between observations from different marker classes and are equal for observations from the same marker class i. If more QTL and nonadditive genetic gene actions are considered in the model, then the genetic sampling variances for different QTL and different kinds τ of genetic effects have to be summed to get the entire genetic sampling variance of an observation and wss (sth individual given the marker class i) becomes

graphic file with name M130.gif (17)

where τ ∈ {al, dl, aalk, adlk, dalk, ddlk}. This weighted version of the reduced model retains the advantage of a reduced dimension of the QTL relationship matrices as in the reduced model, but may provide a better approximation of the exact residual log-likelihood-ratio test (RLRT) statistics. If marker location and position of QTL coincide, the weights of (17) are one and W is an identity matrix. The weights of (17) are similar to the weights in the weighted least-squares method of QTL mapping as shown by Xu and Atchley (1995) and Xu (1998).

Coincidence of markers and QTL results in singularity of Inline graphic (identical to A in this case) and was not further considered here. However, this situation can be treated, e.g., by regularization [adding a small quantity to the diagonal elements of Inline graphic (Neumaier 1998)], which has little effect on the test statistics and is easy to implement, by including allelic effects in the model instead of genotypic effects, or by replacing Inline graphic by a reduced rank approximation (Rönnegård et al. 2007) obtained by spectral decomposition.

SIMULATIONS

First, a single F2 family as the simplest case of a combination of multiple line crosses was considered to demonstrate the properties of the reduced model in comparison to the individual model (Xie et al. 1998) and the fixed-effects methods MIM (Kao and Zeng 1997; Kao et al. 1999) and RIM (Haley and Knott 1992; Martínez and Curnow 1992). Experiments from four different scenarios were simulated with 1000 replications per scenario and n = 200 F2 individuals per experiment. Scenarios 1 and 2 consisted of a single additive genetic QTL at 35 cM on a single chromosome of 50 cM length, whereas in the other scenarios (3 and 4) there were two linked QTL with equally sized QTL effects in repulsion. In the fourth scenario chromosome length was extended to 80 cM and an interaction effect was included. For further characteristics of all scenarios see Table 4. The observations were simulated using Cockerham's F2-metric model (Cockerham 1954; Kao and Zeng 2002, Table 3). The relative QTL variance R2 is the proportion of the phenotypic variance Inline graphic explained by the QTL and is Inline graphic.

TABLE 4.

Brief summary of simulated scenarios: the number of QTL ν, length of the chromosome lc (cM), QTL positions P1 and P2 (cM), marker positions (cM), residual variance $\sigma_e^2$, additive genetic effects (a1, a2), and additive-by-additive genetic effects aa12 as well as the relative QTL variance R2 (%)

Scenario   ν   lc   P1   P2   Marker positions         σe2     a1    a2     aa12   R2
1          1   50   35        0, 10, 20, 30, 40, 50    9.529   1.0                 5.0
2          1   50   35        0, 10, 20, 30, 40, 50    1.000   1.0                 33.3
3          2   50   25   35   0, 10, 20, 30, 40, 50    0.181   1.0   −1.0          50.0
4          2   80   35   45   0, 40, 80                1.000   1.0   −1.0   1.0    30.0
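The relative QTL variances of the purely additive scenarios 1 to 3 can be checked by a short calculation. The sketch uses standard F2 results that are assumptions here rather than statements from the text: an additive effect a contributes a variance of a²/2, and two linked additive QTL contribute a covariance of (1 − 2r)·a1·a2/2, with r obtained from Haldane's mapping function. The function name `relative_qtl_variance` is hypothetical, and scenario 4 is left out because its epistatic term needs additional assumptions.

```python
import math

def haldane(d_cm):
    """Recombination rate for a map distance in cM (Haldane's mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def relative_qtl_variance(effects, positions_cm, residual_variance):
    """R2 (%) for additive QTL on one chromosome in an F2 population."""
    genetic = 0.0
    for a_i, p_i in zip(effects, positions_cm):
        for a_j, p_j in zip(effects, positions_cm):
            r = haldane(abs(p_i - p_j))             # r = 0 for i == j gives a^2 / 2
            genetic += a_i * a_j * (1.0 - 2.0 * r) / 2.0
    return 100.0 * genetic / (genetic + residual_variance)

r2_scenario_1 = relative_qtl_variance([1.0], [35.0], 9.529)
r2_scenario_2 = relative_qtl_variance([1.0], [35.0], 1.000)
r2_scenario_3 = relative_qtl_variance([1.0, -1.0], [25.0, 35.0], 0.181)
```

In scenario 3 the two effects have opposite signs, so the negative covariance between the linked QTL removes most of the genetic variance. This is why a small residual variance is needed to reach a relative QTL variance of about 50%.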

In the second part our small simulation study focused on the performance of the reduced vs. the individual model in a situation with multiple families. Four independent F2 families, each with 50 progeny (n = 200), were derived from a population consisting of four different inbred lines, representing all pairwise combinations of QTL genotypes (GQQHH, GQQhh, GqqHH, Gqqhh). For each family F1 individuals were generated from a random pair of inbred lines. Markers were always assumed to be fully informative. In the LMM family means were treated as fixed. Remaining parameters were chosen as previously described for the third scenario (Table 4). For each genetic effect a single (population-specific) variance was assumed. The simulated data can be found in File S1.

Significance thresholds for the null hypothesis of no linked QTL were determined by simulating 1000 experiments of the same size for each scenario, where QTL with the same kind and size of effects were present, but unlinked to the markers. After analyzing these experiments, the 95% quantile of the maximum values of the test statistic from all replications was taken as a significance threshold, specific for each scenario and method, which allowed the determination of experimental power. We performed the residual log-likelihood-ratio test for the reduced and the individual model, the log-likelihood-ratio test for MIM, and the F-test for RIM. Mean QTL positions, root mean squared error (RMSE) of the QTL positions, and their 5% and 95% quantiles were evaluated to characterize the precision of location estimates. For each replication we analyzed positions or combinations thereof, where marker locations and QTL positions did not coincide (step width 1 cM, both QTL in different marker intervals). Therefore, we applied RIM and MIM with the same restrictions as the VCM. Our analyses used the true genetic model for testing for segregating QTL; i.e., the model included only the simulated effects of QTL and no model selection was performed. All calculations were done with self-written Fortran 95 programs in combination with ASReml (Gilmour et al. 2008) for estimation of variance components and evaluation of the restricted maximum-likelihood function (Patterson and Thompson 1971).
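The threshold and power calculation described above amounts to two small steps, sketched here with hypothetical function names. The nearest-rank definition of the empirical quantile is an assumption, because the quantile estimator is not specified in the text, and the input lists stand for the maximum test statistic of each simulated replicate.

```python
def empirical_threshold(null_max_stats, level_percent=95):
    """Empirical quantile of the maximum test statistics from null replicates."""
    ordered = sorted(null_max_stats)
    # Nearest-rank quantile using integer arithmetic (ceiling division).
    rank = -(-level_percent * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]

def experimental_power(max_stats, threshold):
    """Percentage of replicates whose maximum test statistic exceeds the threshold."""
    return 100.0 * sum(s > threshold for s in max_stats) / len(max_stats)

# Toy numbers only: 100 null replicates and 4 replicates with a linked QTL.
null_stats = list(range(1, 101))
threshold = empirical_threshold(null_stats)
power = experimental_power([90, 96, 97, 100], threshold)
```

Because the threshold is taken from the maximum over all tested positions within a replicate, it accounts for the multiple positions scanned along the chromosome, and it has to be recomputed for each scenario and method.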

RESULTS

Results for all simulated single-family scenarios are summarized in Table 5. The experimental power was 100% (scenarios 2 and 3) or nearly so (scenario 4), with the exception of scenario 1, where the experimental power was uniformly at 82% for all methods. There was almost no variation between methods in the mean estimated position in the single-QTL scenarios (1 and 2); even the distributions of the estimates showed identical 5% and 95% quantiles.

Differences between methods became, however, apparent in the two-QTL scenarios. For scenario 3 (two QTL in repulsion, no interactions), MIM resulted in average estimated QTL positions at 24.7 and 34.9 cM, nearly identical to the simulated values at 25 and 35 cM. The RMSEs for positions of the QTL were <1.4 cM for both QTL for MIM and ∼2.0 cM for the individual model, while the reduced model and RIM performed very similarly with RMSEs of ∼4.2 cM. In scenario 3 the reduced model, the individual model, and RIM on average placed the QTL somewhat more toward the ends of the chromosome compared to MIM and the true values, resulting in an overestimation of the distance (true distance: 10 cM) between both QTL, ranging from 2.7 cM (individual model) to 6.7 cM (reduced model). For scenario 4 (two QTL in repulsion with interactions) this overestimation of the distance between the QTL was, however, very similar for all methods at ∼2.0–3.1 cM. The RMSEs for estimated positions of the QTL were between 5.3 and 5.6 cM with little difference between the first and second QTL for RIM as well as the reduced and the individual model. However, the RMSE of MIM at the same time showed the highest deviation of 6.5 cM for the first and the smallest deviation of 3.7 cM for the second QTL.

TABLE 5.

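Average estimates (mean) for QTL positions (P1, P2, in cM) with associated root mean squared error (RMSE) and quantiles together with mean estimates of the residual variance $\hat{\sigma}_r^2$ and the observed power (%) for different scenarios: 200 F2 individuals per simulated experiment and 1000 replications per scenario

| Scenario | Statistic | Reduced model, P1 | Reduced model, P2 | Individual model, P1 | Individual model, P2 | RIM, P1 | RIM, P2 | MIM, P1 | MIM, P2 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Mean | 32.58 | | 32.53 | | 32.73 | | 32.40 | |
| 1 | RMSE | 10.71 | | 10.79 | | 10.78 | | 10.89 | |
| 1 | 5% quantile | 9.00 | | 9.00 | | 9.00 | | 9.00 | |
| 1 | 95% quantile | 48.00 | | 48.00 | | 48.00 | | 48.00 | |
| 1 | $\hat{\sigma}_r^2$ | 9.49 | | 9.38 | | 9.50 | | 9.37 | |
| 1 | Power | 81.70 | | 81.80 | | 81.70 | | 81.90 | |
| 2 | Mean | 34.82 | | 34.87 | | 34.88 | | 34.68 | |
| 2 | RMSE | 2.80 | | 2.66 | | 2.68 | | 2.57 | |
| 2 | 5% quantile | 31.00 | | 31.00 | | 31.00 | | 31.00 | |
| 2 | 95% quantile | 39.00 | | 39.00 | | 39.00 | | 39.00 | |
| 2 | $\hat{\sigma}_r^2$ | 1.04 | | 0.99 | | 1.04 | | 0.99 | |
| 2 | Power | 100.00 | | 100.00 | | 100.00 | | 100.00 | |
| 3 | Mean | 21.67 | 38.38 | 23.70 | 36.70 | 22.22 | 37.82 | 24.70 | 34.89 |
| 3 | RMSE | 4.20 | 4.17 | 2.03 | 2.02 | 4.29 | 4.25 | 1.35 | 1.38 |
| 3 | 5% quantile | 17.00 | 35.00 | 22.00 | 34.00 | 17.00 | 32.00 | 23.00 | 33.00 |
| 3 | 95% quantile | 25.00 | 42.00 | 26.00 | 38.00 | 28.00 | 42.00 | 27.00 | 37.00 |
| 3 | $\hat{\sigma}_r^2$ | 0.27 | | 0.18 | | 0.28 | | 0.19 | |
| 3 | Power | 100.00 | | 100.00 | | 100.00 | | 100.00 | |
| 4 | Mean | 34.03 | 46.22 | 33.56 | 46.70 | 34.13 | 46.09 | 32.29 | 44.44 |
| 4 | RMSE | 5.36 | 5.50 | 5.26 | 5.47 | 5.51 | 5.55 | 6.47 | 3.72 |
| 4 | 5% quantile | 24.50 | 41.00 | 24.00 | 41.00 | 24.00 | 41.00 | 21.00 | 41.00 |
| 4 | 95% quantile | 39.00 | 56.50 | 39.00 | 57.00 | 39.00 | 57.00 | 39.00 | 52.00 |
| 4 | $\hat{\sigma}_r^2$ | 1.21 | | 0.93 | | 1.23 | | 1.04 | |
| 4 | Power | 99.20 | | 99.40 | | 99.60 | | 100.00 | |

Rows with a single value per method (all rows of scenarios 1 and 2; residual variance and power in scenarios 3 and 4) list that value in the P1 column.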

Note that MIM was applied according to the original approach of Kao and Zeng (1997) and Kao et al. (1999), which ignores double recombination events (complete interference) within the marker interval. However, double recombinations were taken into account for RIM and the VCM.

As theory indicated, estimated residual variance components from methods coping better with genetic deviations from the mean of a marker class (MIM, individual model) were smaller in the two-QTL scenarios compared to RIM and the reduced model, where the genetic sampling variance (QTL genotype variability within marker genotype) is part of the residual variance.

The results of the analysis of the multiple families are shown in Table 6. With four families, the accuracy of the estimated QTL positions of the individual model was slightly better than that of the reduced model. However, the difference between the reduced and the individual model was smaller with multiple families than with a single family (scenario 3). The RMSEs for positions of the QTL in Table 6 were larger than the RMSEs of the third scenario in Table 5, because not all families are fully informative. The observed power of the individual and the reduced model again almost reached 100%. As expected, the estimated residual variance of the reduced model was inflated by the within-marker genotype QTL variance.

TABLE 6.

Average estimates (mean) for QTL positions (P1, P2, in cM) with associated root mean squared error (RMSE) and quantiles together with mean estimates of the residual variance $\hat{\sigma}_r^2$ and the observed power (%) for the third scenario with 50 F2 individuals for each of the four families per simulated experiment (1000 replications per scenario)

| Statistic | Reduced model, P1 | Reduced model, P2 | Individual model, P1 | Individual model, P2 |
|---|---|---|---|---|
| Mean | 22.38 | 35.99 | 23.00 | 35.59 |
| RMSE | 6.77 | 4.78 | 6.26 | 4.27 |
| 5% quantile | 8.00 | 28.00 | 9.00 | 28.00 |
| 95% quantile | 28.00 | 45.00 | 28.00 | 44.00 |
| $\hat{\sigma}_r^2$ | 0.21 | | 0.18 | |
| Power | 99.50 | | 99.60 | |

Residual variance and power are single values per model and are listed in the P1 column.

The CPU time required by ASReml (Gilmour et al. 2008) for the reduced and the individual model was 26.7 and 80.1 sec per repetition, respectively, recorded on an HP DL380 G6 (72 GB RAM, 2× XEON X5570, 2.93 GHz, multiuser environment) in a two-QTL scenario with only additive genetic effects (four families); i.e., the individual model required three times as much computing time. For the evaluation of a single QTL (scenario 1 or 2), the run time per repetition of the individual model was seven times that of the reduced model. If the number of individuals and the number of variance components increase, the speed gain of the reduced model relative to the individual model is expected to increase.

Average RLRT profiles from the reduced and the individual model were almost identical for the first scenario with a single QTL (Figure 1A). For two QTL in scenario 3 (Figure 1B), the shapes of the RLRT surfaces from both methods were again very similar, but the average size of the maximum was higher for the individual model (60.62 compared to 44.52). The RLRT surfaces of scenario 4 from the reduced and the individual model as well as the weighted reduced model were nearly identical (results not shown). The likelihood profile of the weighted approach was lower than that of the reduced model, but QTL positions seemed to be estimated more accurately.

Figure 1.—

For a single QTL (scenario 1) average RLRT profiles (A) of the individual model (dashed line) and the reduced model (solid line) nearly coincide, and so do their significance thresholds. When two QTL were present (scenario 3), contour plots (B) of the RLRT surfaces from the reduced model (below diagonal) and the individual model (above diagonal) showed a similar shape, but different absolute heights (respective RLRT maxima 44.52 and 60.62). Averaging was over 1000 replications.

The considerable advantage of the reduced model with respect to computing time is achieved by a smaller number of genetic effects accompanied by a smaller dimension of their associated covariance matrices. Moreover, in contrast to the individual model, this dimension does not depend on the size of the experiment. The amount of computing time saved can be expected to vary somewhat between REML algorithms. Average information (AI) REML (Gilmour et al. 1995; Johnson and Thompson 1995) may be implemented either in an MME-based version or as a variant requiring the inversion of the covariance matrix V of phenotypes, termed the “direct method” by Lee and Van der Werf (2006). These authors recommend the direct method if genetic covariance matrices are dense, for reasons of both speed and numerical stability. Application of the Sherman–Morrison–Woodbury matrix identity (e.g., Henderson and Searle 1981; Xu 1998) to determine the inverse of V results in

$$V^{-1} = (HGH' + R)^{-1} = R^{-1} - R^{-1}H\left(G^{-1} + H'R^{-1}H\right)^{-1}H'R^{-1},$$

where R denotes the covariance matrix of residuals, G is the covariance matrix of all genetic effects (block diagonal), and H is the corresponding incidence matrix. To obtain $V^{-1}$, the inversion of a dense matrix of the same order as G is required, which for the reduced model usually is considerably smaller than the number of observations (e.g., dim(G) = 9 × 9 for a single QTL with additive genetic effects and dim(G) = 36 × 36 for two QTL with additive genetic and dominance effects). In conclusion, the increase in computing speed obtained by the reduced model may differ between algorithms, but is substantial when compared with the individual model, thus broadening the general applicability of the VCM for mapping purposes.
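To make the dimension argument concrete, the following sketch compares a direct inversion of V with the Sherman–Morrison–Woodbury route, in which only matrices of the order of G are inverted. It is a toy check under stated assumptions, not the ASReml implementation: residuals are taken as independent with common variance (R = sigma2 · I), G is assumed nonsingular, and the function names, the small Gauss–Jordan routine, and the example matrices H and G are all hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (adequate for small dense matrices)."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def direct_inverse(H, G, sigma2):
    """Invert V = H G H' + sigma2 * I directly (an n x n inversion)."""
    V = matmul(matmul(H, G), transpose(H))
    for i in range(len(V)):
        V[i][i] += sigma2
    return inverse(V)

def woodbury_inverse(H, G, sigma2):
    """V^{-1} = R^{-1} - R^{-1} H (G^{-1} + H' R^{-1} H)^{-1} H' R^{-1} with R = sigma2 * I.

    Only q x q matrices (the order of G) are inverted.
    """
    n, q = len(H), len(G)
    Ht = transpose(H)
    HtH = matmul(Ht, H)
    Ginv = inverse(G)
    inner = [[Ginv[i][j] + HtH[i][j] / sigma2 for j in range(q)] for i in range(q)]
    core = matmul(matmul(H, inverse(inner)), Ht)  # H (inner)^{-1} H', n x n
    return [[(1.0 if i == j else 0.0) / sigma2 - core[i][j] / sigma2 ** 2
             for j in range(n)] for i in range(n)]

# Hypothetical toy example: 6 observations assigned to 3 classes of genetic effects.
H = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
G = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
sigma2 = 2.0

V_inv_direct = direct_inverse(H, G, sigma2)
V_inv_woodbury = woodbury_inverse(H, G, sigma2)
max_abs_diff = max(abs(x - y) for rd, rw in zip(V_inv_direct, V_inv_woodbury)
                   for x, y in zip(rd, rw))
print(max_abs_diff)
```

In this toy case the direct route inverts a 6 × 6 matrix and the identity route two 3 × 3 matrices; with 200 individuals and a 9 × 9 matrix G, the gap in the size of the matrices to be inverted would be correspondingly larger.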

The amount of possible improvement of the reduced model obtained by accounting for genetic sampling variation within marker classes remains to be investigated. A more comprehensive comparison of methods than presented here is underway to obtain a more complete picture. Despite the limited number of scenarios in our simulations, it can already be concluded that the proposed reduced model may be competitive with other standard methods for mapping of (multiple) QTL not only in terms of computing time, but also in terms of detection power and precision of estimated positions of the QTL.

Acknowledgments

The authors thank the reviewers for their helpful comments and suggestions. This research was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, MA 1553/3-1).

Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.110.122333/DC1.

Available freely online through the author-supported open access option.

References

  1. Abdel-Azim, G., and A. E. Freeman, 2001. A rapid method for computing the inverse of the gametic covariance matrix between relatives for a marked quantitative trait locus. Genet. Sel. Evol. 33: 153–173.
  2. Carbonell, E. A., T. M. Gerig, E. Balansard and M. J. Asins, 1992. Interval mapping in the analysis of nonadditive quantitative trait loci. Biometrics 48: 305–315.
  3. Carlborg, Ö., and C. S. Haley, 2004. Epistasis: Too often neglected in complex trait studies? Nat. Rev. Genet. 5: 618–625.
  4. Cockerham, C. C., 1954. An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics 39: 859–882.
  5. Crepieux, S., C. Lebreton, B. Servin and G. Charmet, 2004. Quantitative trait loci (QTL) detection in multicross inbred designs: recovering QTL identical-by-descent status information from marker data. Genetics 168: 1737–1749.
  6. Gilmour, A. R., R. Thompson and B. R. Cullis, 1995. Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51: 1440–1450.
  7. Gilmour, A. R., B. J. Gogel, B. R. Cullis and R. Thompson, 2008. ASReml User Guide Release 3.0. VSN International, Hemel Hempstead, UK.
  8. Grignola, F. E., I. Hoeschele and B. Tier, 1996. Mapping quantitative trait loci in outcross populations via residual maximum likelihood. I. Methodology. Genet. Sel. Evol. 28: 479–490.
  9. Haldane, J. B. S., 1919. The combination of linkage values, and the calculation of distances between the loci of linked factors. J. Genet. 8: 299–309.
  10. Haley, C. S., and S. A. Knott, 1992. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315–324.
  11. Haseman, J. K., and R. C. Elston, 1972. The investigation of linkage between a quantitative trait and a marker locus. Behav. Genet. 2: 3–19.
  12. Henderson, H. V., and S. R. Searle, 1981. On deriving the inverse of a sum of matrices. SIAM Rev. Soc. Ind. Appl. Math. 23: 53–60.
  13. Jannink, J.-L., and R. Jansen, 2001. Mapping epistatic quantitative trait loci with one-dimensional genome searches. Genetics 157: 445–454.
  14. Johnson, D. L., and R. Thompson, 1995. Restricted maximum likelihood estimation of variance components for univariate animal models using sparse matrix techniques and average information. J. Dairy Sci. 78: 449–456.
  15. Kao, C.-H., 2000. On the differences between maximum likelihood and regression interval mapping in the analysis of quantitative trait loci. Genetics 156: 855–865.
  16. Kao, C.-H., and Z.-B. Zeng, 1997. General formulas for obtaining the MLEs and the asymptotic variance-covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53: 653–665.
  17. Kao, C.-H., and Z.-B. Zeng, 2002. Modeling epistasis of quantitative trait loci using Cockerham's model. Genetics 160: 1243–1261.
  18. Kao, C.-H., Z.-B. Zeng and R. D. Teasdale, 1999. Multiple interval mapping for quantitative trait loci. Genetics 152: 1203–1216.
  19. Lander, E. S., and D. Botstein, 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
  20. Lee, S. H., and J. H. J. Van der Werf, 2006. An efficient variance component approach implementing an average information REML suitable for combined LD and linkage mapping with a general complex pedigree. Genet. Sel. Evol. 38: 25–43.
  21. Li, G., and Y. Cui, 2009. A statistical variance components framework for mapping imprinted quantitative trait locus in experimental crosses. J. Probab. Stat. 2009: 1–27.
  22. Liu, Y., G. B. Jansen and C. Y. Lin, 2002. The covariance between relatives conditional on genetic markers. Genet. Sel. Evol. 34: 657–678.
  23. Martínez, O., and R. N. Curnow, 1992. Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor. Appl. Genet. 85: 480–488.
  24. Mayer, M., 2005. A comparison of regression interval mapping and multiple interval mapping for linked QTL. Heredity 94: 599–605.
  25. Mayer, M., Y. Liu and G. Freyer, 2004. A simulation study on the accuracy of position and effect estimates of linked QTL and their asymptotic standard deviations using multiple interval mapping in an F2 scheme. Genet. Sel. Evol. 36: 455–479.
  26. Neumaier, A., 1998. Solving ill-conditioned and singular linear systems: a tutorial on regularization. SIAM Rev. Soc. Ind. Appl. Math. 40: 636–666.
  27. Patterson, H. D., and R. Thompson, 1971. Recovery of inter-block information when block sizes are unequal. Biometrika 58: 545–554.
  28. Rönnegård, L., K. Mischenko, S. Holmgren and Ö. Carlborg, 2007. Increasing the efficiency of variance component quantitative trait loci analysis by using reduced-rank identity-by-descent matrices. Genetics 176: 1935–1938.
  29. Rönnegård, L., R. Pong-Wong and Ö. Carlborg, 2008. Defining the assumptions underlying modeling of epistatic QTL using variance component methods. J. Hered. 99: 421–425.
  30. Smith, S. P., 1984. Dominance Relationship Matrix and Inverse for an Inbred Population. Mimeo, Department of Dairy Science, Ohio State University, Columbus, OH.
  31. Wang, T., R. L. Fernando, S. van der Beek, M. Grossman and J. A. M. van Arendonk, 1995. Covariance between relatives for a marked quantitative trait locus. Genet. Sel. Evol. 27: 251–274.
  32. Xie, C., D. D. Gessler and S. Xu, 1998. Combining different line crosses for mapping quantitative trait loci using the identical by descent-based variance component method. Genetics 149: 1139–1146.
  33. Xu, S., 1996. Mapping quantitative trait loci using four-way crosses. Genet. Res. 68: 175–181.
  34. Xu, S., 1998. Mapping quantitative trait loci using multiple families of line crosses. Genetics 148: 517–524.
  35. Xu, S., and W. R. Atchley, 1995. A random model approach to interval mapping of quantitative trait loci. Genetics 141: 1189–1197.

Articles from Genetics are provided here courtesy of Oxford University Press
