Abstract
Statistical methods for mapping quantitative trait loci (QTLs) in full-sib forest trees, in which the number of alleles and linkage phase can vary from locus to locus, are still not well established. Previous studies assumed that the QTL segregation pattern was fixed throughout the genome in a full-sib family, despite the fact that this pattern can vary among regions of the genome. In this paper, we propose a method for selecting the appropriate model for QTL mapping based on the segregation of different types of markers and QTLs in a full-sib family. The QTL segregation patterns were classified into three types: test cross (1:1 segregation), F2 cross (1:2:1 segregation) and full cross (1:1:1:1 segregation). Akaike’s information criterion (AIC), the Bayesian information criterion (BIC) and the Laplace-empirical criterion (LEC) were used to select the most likely QTL segregation pattern. Simulations were used to evaluate the power of these criteria and the precision of parameter estimates. A Windows-based software package was developed to implement the QTL mapping method with model selection. A real example is presented to illustrate QTL mapping in forest trees based on an integrated linkage map containing markers with various segregation types. The implications of this method for accurate QTL mapping in outbred species are discussed.
Keywords: full-sib family, interval mapping, model selection, quantitative trait locus
Introduction
Genetic mapping of quantitative trait loci (QTLs) based on genetic linkage maps is a powerful tool for unraveling the genetic architecture of quantitative trait variation in plants, animals and humans. Since the seminal publication on interval mapping by Lander and Botstein (1989), there has been a tremendous development of statistical methods and algorithms for QTL mapping. To make interval mapping more useful, Zeng (1993, 1994) and Jansen and Stam (1994) independently proposed so-called composite interval mapping, in which partial regression analysis is used to separate the effects of multiple linked QTLs. Zeng and collaborators constructed the framework for multiple interval mapping to simultaneously characterize the underlying QTLs (their number, locations, and main and epistatic effects) for a quantitative trait (Kao et al., 1999; Zeng et al., 1999). Xu and colleagues extended interval mapping to map qualitatively inherited traits, such as binary and categorical traits (Xu and Atchley, 1996; Yi and Xu, 2000; Xu et al., 2005). The principle of interval mapping was established for pedigrees initiated with two inbred lines, such as F2, backcross and recombinant inbred line populations. For any two inbred lines, there are only two alleles at each locus, and in the F1 hybrids that transmit gametes to the next generation there is a fixed linkage phase between any two loci. These two features of inbred lines greatly facilitate statistical inference about the QTL location and effects.
In practice, it is difficult or impossible to generate inbred lines for outcrossing species such as forest trees because of their high heterozygosity and long generation intervals. For any two heterozygous individuals, the number of alleles per locus can differ from gene to gene, leading to different segregation patterns when the two individuals are crossed. Wu et al. (2002) listed all possible types of marker segregation in a full-sib family derived from two heterozygous lines. For a given heterozygous line, there is uncertainty about the linkage phase between any pair of loci, i.e., the diplotype when the two homologous chromosomes are considered together. Despite these difficulties, various models and methods for linkage analysis in outcrossing species have been developed through the collective efforts of statisticians and geneticists (Grattapaglia and Sederoff, 1994; Maliepaard et al., 1997; Wu et al., 2002). Lu et al. (2004) derived a general framework that covers all these approaches and allows for linkage analysis between any types of markers by simultaneously estimating the recombination fraction, parental diplotype and gene order. More recently, Tong et al. (2010) described a hidden Markov model approach for multilocus linkage analysis and developed Windows-based software to construct genetic linkage maps with different segregation markers in a full-sib family.
Nevertheless, despite these advances, there has been limited exploration of the modeling and analysis of QTL mapping in outcrossing species. Haley et al. (1994) proposed an approach for mapping outcrossing QTLs in an experimental cross with the F2 type markers. Although this approach was used to detect QTLs in pigs (Andersson et al., 1994), it did not receive widespread acceptance because of its failure to incorporate the linkage phase of the parents and any type of marker segregation. Lin et al. (2003) subsequently proposed a general statistical model for simultaneously estimating the QTL-marker linkage phase, QTL location and QTL effects in an outcrossed family. Although some key statistical issues of the latter model have been investigated, there has been no systematic modeling of QTL segregation patterns.
In this article, we propose a method for selecting the appropriate model for interval mapping of QTLs in a full-sib family derived from two outcrossing parents by considering all possible patterns of QTL segregation, i.e., test cross (1:1 segregation), F2 cross (1:2:1 segregation) and full cross (1:1:1:1 segregation). The most likely QTL segregation pattern for a sample was chosen based on model selection criteria: Akaike’s information criterion (AIC; Akaike, 1974), the Bayesian information criterion (BIC; Schwarz, 1978) and the Laplace-empirical criterion (LEC; McLachlan and Pell, 2000). The method capitalizes on all types of marker segregation and provides simultaneous estimates of the QTL segregation pattern, QTL location and QTL effects. Simulations were used to investigate the statistical behavior of this QTL mapping approach. A Windows-based software package was developed to implement the statistical model for QTL mapping in outbred species, and the usefulness of the method was validated by using an outcrossing forest tree as an example.
Materials and Methods
Segregation pattern
Suppose that two outcrossing lines, P1 and P2, are crossed to generate a full-sib family. The number of different alleles at an informative marker locus in the two parents may be 2, 3 or 4. Maliepaard et al. (1997) showed that the possible combinations of two parental genotypes at an informative marker locus, i.e., segregation types, were ab × aa, aa × ab, ab × ab, ab × cd, ao × ao, ab × ao or ao × ab, where a, b, c and d denote different alleles at a marker locus and o denotes the null allele, with the two characters to the left of the crossing symbol representing the marker genotype of P1 and the two characters on the right representing the marker genotype of P2. The linkage analysis used to estimate recombination and linkage phase inference between any two markers is well-defined (Wu et al., 2002, 2007; Lu et al., 2004; Tong et al., 2010) and allows the construction of an integrated linkage map that can contain any type of segregation markers. Similarly, a QTL may also have up to four alleles and present different segregation types. However, some of the segregation types, such as q1q1 × q1q2 and q1q1 × q2q3, cannot be distinguished from each other because of inadequate information about allelic configurations.
The QTL segregation patterns are generally classified into three types: (1) test cross, in which the segregation type is q1q1 × q1q2 or q1q2 × q1q1 that can generate two genotypes, q1q1 and q1q2 (1:1 segregation), (2) F2 cross, in which the segregation type is q1q2 × q1q2 that can generate three genotypes, q1q1, q1q2 and q2q2 (1:2:1 segregation) and (3) full cross, in which the segregation type is q1q2 × q3q4 that can generate four genotypes, q1q3, q1q4, q2q3, and q2q4 (1:1:1:1 segregation). Each of these QTL segregation types reflects different degrees of information and can be discriminated from the others by using appropriate model selection criteria.
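As a small illustration of these three patterns, the offspring genotype ratios can be enumerated directly from the parental QTL genotypes. The following Python sketch is not part of the original method; the function name `segregation` and the string labels for alleles are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

def segregation(parent1, parent2):
    """Count unordered offspring genotypes from two diploid parents.

    Each parent is a 2-tuple of allele labels. Every gamete carries one
    of the two alleles with equal probability, so the four gamete
    pairings enumerated here are equally likely.
    """
    counts = Counter()
    for a, b in product(parent1, parent2):
        counts["".join(sorted((a, b)))] += 1
    return dict(counts)

test_cross = segregation(("q1", "q1"), ("q1", "q2"))  # 1:1
f2_cross = segregation(("q1", "q2"), ("q1", "q2"))    # 1:2:1
full_cross = segregation(("q1", "q2"), ("q3", "q4"))  # 1:1:1:1
```

The counts returned for the three crosses reproduce the 1:1, 1:2:1 and 1:1:1:1 ratios listed above.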
Conditional probability
Consider two molecular markers and a putative QTL in the interval between the two markers on a chromosome in a diploid full-sib family. We initially assume that there are four alleles at each marker locus and at the QTL, and that the combined genotypes of the two parents at the two markers and the QTL are denoted by a1q1a2 / b1q2b2 and c1q3c2 / d1q4d2, where the slash separates the two haplotypes of a genotype. If r is the recombination fraction between the markers, r1 the recombination fraction between marker 1 and the QTL, and r2 the recombination fraction between the QTL and marker 2, then we have the relationship r = r1 + r2 - 2r1r2, assuming that there is no interference between the two intervals. The frequencies or probabilities of the combined genotypes in the progeny can be easily derived, as shown in Table 1, in which the elements were multiplied by 4. For the other marker and QTL segregation patterns, the probability of each marker and QTL genotype can be obtained by first merging the rows of Table 1 that share the same marker genotype and then the columns that share the same QTL genotype. Once the probabilities of all the marker and QTL genotypes have been obtained, the conditional probability of a QTL genotype given the combined genotype of the two markers can be obtained by dividing the probability of the corresponding marker and QTL genotype by the sum of all the probabilities with the same marker genotype.
Table 1.
Marker genotype | QTL genotype
|
|||
---|---|---|---|---|
q1q3 | q2q3 | q1q4 | q2q4 | |
a1c1 a2c2 | (1 - r1)2 (1 - r2)2 | r1(1 - r1) r2 (1 - r2) | r1(1 - r1) r2 (1 - r2) | r12 r22 |
a1c1 b2c2 | (1 - r1)2 r2 (1 - r2) | r1(1 - r1) (1 -r2)2 | r1(1 - r1)r22 | r12r2(1 - r2) |
a1c1 a2d2 | (1 - r1)2 r2 (1 - r2) | r1(1 - r1) r22 | r1(1 -r1) (1 -r2)2 | r12r2(1 - r2) |
a1c1 b2d2 | (1 - r1)2 r22 | r1(1 - r1) r2 (1 -r2) | r1(1 - r1) r2 (1 - r2) | r12(1 - r2)2 |
b1c1 a2c2 | r1(1 - r1) (1 - r2)2 | (1 - r1)2 r2 (1 - r2) | r12r2(1 - r2) | r1(1 -r1) r22 |
b1c1 b2c2 | r1(1 -r1)r2(1 - r2) | (1 - r1)2 (1 - r2)2 | r12 r22 | r1(1 - r1) r2 (1 - r2) |
b1c1 a2d2 | r1(1 - r1) r2 (1 - r2) | (1 - r1)2 r22 | r12(1 - r2)2 | r1(1 - r1) r2 (1 - r2) |
b1c1 b2d2 | r1(1 - r1) r22 | (1 - r1)2 r2 (1 - r2) | r12r2(1 - r2) | r1(1 - r1) (1 - r2)2 |
a1d1 a2c2 | r1(1 - r1) (1 - r2)2 | r12 r2(1 - r2) | (1 - r1)2 r2 (1 - r2) | r1(1 - r1) r22 |
a1d1 b2c2 | r1(1 - r1) r2 (1 - r2) | r12(1 - r2)2 | (1 - r1)2 r22 | r1(1 - r1) r2 (1 - r2) |
a1d1 a2d2 | r1(1 - r1) r2 (1 - r2) | r12 r22 | (1 - r1)2 (1 - r2)2 | r1(1 - r1) r2 (1 - r2) |
a1d1 b2d2 | r1(1 -r1) r22 | r12r2(1 - r2) | (1 - r1)2 r2 (1 - r2) | r1(1 -r1) (1 - r2)2 |
b1d1 a2c2 | r12(1 - r2)2 | r1(1 - r1) r2 (1 - r2) | r1(1 - r1) r2 (1 - r2) | (1 - r1)2 r22 |
b1d1 b2c2 | r12r2(1 - r2) | r1(1 - r1) (1 - r2)2 | r1(1 - r1) r22 | (1 - r1) 2 r2 (1 - r2) |
b1d1 a2d2 | r12 r2(1 - r2) | r1(1 - r1) r22 | r1(1 - r1) (1 - r2)2 | (1 - r1)2 r2 (1 - r2) |
b1d1 b2d2 | r12 r22 | r1(1 - r1) r2 (1 -r2) | r1(1 - r1) r2 (1 - r2) | (1 - r1)2 (1 - r2)2 |
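The entries of Table 1 and the corresponding conditional probabilities can be generated programmatically rather than read from the table. The Python sketch below handles only the fully informative case (four alleles at both markers and at the QTL) and assumes no interference; the 0/1 encoding of haplotype origin and the function names are illustrative assumptions, not part of the published software.

```python
from itertools import product

def gamete_prob(m1, q, m2, r1, r2):
    """Probability of one gamete under no interference.

    m1, q and m2 are 0/1 flags giving the parental haplotype of origin
    at marker 1, the QTL and marker 2 (0 = first haplotype, e.g. a1q1a2).
    """
    p1 = r1 if m1 != q else 1.0 - r1
    p2 = r2 if q != m2 else 1.0 - r2
    return 0.5 * p1 * p2

def conditional_qtl_probs(marker_p1, marker_p2, r1, r2):
    """P(QTL genotype | marker genotype) for the fully informative case.

    marker_p1 and marker_p2 are (m1, m2) origin flags of the gametes
    inherited from P1 and P2. The keys (0, 0), (1, 0), (0, 1) and (1, 1)
    correspond to q1q3, q2q3, q1q4 and q2q4, respectively.
    """
    joint = {}
    for q_p1, q_p2 in product((0, 1), repeat=2):
        joint[(q_p1, q_p2)] = (
            gamete_prob(marker_p1[0], q_p1, marker_p1[1], r1, r2)
            * gamete_prob(marker_p2[0], q_p2, marker_p2[1], r1, r2)
        )
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}

# Marker genotype a1c1 a2c2 (first row of Table 1) with r1 = r2 = 0.1
probs = conditional_qtl_probs((0, 0), (0, 0), 0.1, 0.1)
```

For the other segregation types, the same joint probabilities would be summed over the indistinguishable marker or QTL classes before normalizing, as described in the text.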
Mixed model
For a given QTL segregation pattern, let J be the number of QTL genotypes (J = 2, 3 or 4). Assume that the quantitative trait is normally distributed with mean μj and variance σ² within the jth QTL genotype (j = 1,..., J). The phenotypic value of the ith individual, yi, then follows a mixture of normal distributions:

$$y_i \sim \sum_{j=1}^{J} p_{j|i}\, N(\mu_j, \sigma^2),$$

where pj|i is the conditional probability of the jth QTL genotype given the marker genotype of the ith individual.
For a sample of n individuals in the full-sib family, the likelihood of the parameter vector, θ = (μ1,..., μJ, σ²), for a specific position on the chromosome, can be written as

$$L(\theta) = \prod_{i=1}^{n} \sum_{j=1}^{J} p_{j|i}\, f(y_i; \mu_j, \sigma^2), \tag{1}$$

where

$$f(y_i; \mu_j, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right]$$

is the density function of a normal distribution.
EM algorithm
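Under the full model, the maximum-likelihood estimates of the parameters can be obtained with a form of the expectation-maximization (EM) algorithm (Dempster et al., 1977). For iteration s + 1, assume that we have the current estimates θ̂(s). In the E-step, we calculate the conditional expectation of the complete-data log likelihood, which involves calculating the posterior probability of individual i having the jth QTL genotype, as

$$P_{j|i}^{(s+1)} = \frac{p_{j|i}\, f\left(y_i; \hat{\mu}_j^{(s)}, \hat{\sigma}^{2(s)}\right)}{\sum_{k=1}^{J} p_{k|i}\, f\left(y_i; \hat{\mu}_k^{(s)}, \hat{\sigma}^{2(s)}\right)}. \tag{2}$$

In the M-step, we maximize the log likelihood by updating the estimates of μj and σ² as

$$\hat{\mu}_j^{(s+1)} = \frac{\sum_{i=1}^{n} P_{j|i}^{(s+1)}\, y_i}{\sum_{i=1}^{n} P_{j|i}^{(s+1)}}, \tag{3}$$

$$\hat{\sigma}^{2(s+1)} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{J} P_{j|i}^{(s+1)} \left(y_i - \hat{\mu}_j^{(s+1)}\right)^2. \tag{4}$$

The EM algorithm is initiated with starting values based on ȳ, the empirical mean of the observations, and the E- and M-steps are repeated until the estimates converge.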
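The E- and M-steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the FsQtlMap implementation: the function names, the convergence tolerance and, in particular, the starting values (all genotype means set to ȳ and the variance set to the sample variance) are assumptions.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at y."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def em_mixture(y, cond_probs, max_iter=500, tol=1e-8):
    """EM for a normal mixture with known mixing proportions.

    y: list of phenotypes; cond_probs[i][j] is the conditional probability
    of QTL genotype j given the marker genotype of individual i.
    Returns (mus, sigma2, loglik), where loglik is the log likelihood
    evaluated at the last E-step.
    """
    n, J = len(y), len(cond_probs[0])
    ybar = sum(y) / n
    # Assumed starting values: every mean at ybar, variance at the sample variance.
    mus = [ybar] * J
    sigma2 = sum((v - ybar) ** 2 for v in y) / n
    old_ll = -math.inf
    ll = old_ll
    for _ in range(max_iter):
        # E-step: posterior probability of each QTL genotype.
        post, ll = [], 0.0
        for yi, p_i in zip(y, cond_probs):
            w = [p * normal_pdf(yi, mu, sigma2) for p, mu in zip(p_i, mus)]
            s = sum(w)
            ll += math.log(s)
            post.append([v / s for v in w])
        if ll - old_ll < tol:
            break
        old_ll = ll
        # M-step: posterior-weighted means, then the pooled variance.
        for j in range(J):
            wj = sum(P[j] for P in post)
            mus[j] = sum(P[j] * yi for P, yi in zip(post, y)) / wj
        sigma2 = sum(P[j] * (yi - mus[j]) ** 2
                     for P, yi in zip(post, y) for j in range(J)) / n
    return mus, sigma2, ll
```

Because the mixing proportions are fixed by the marker data, only the means and the common variance are updated, which is why the M-step has a closed form.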
Hypothesis testing
The null hypothesis of no QTL segregating at the specific position of the chromosome is

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_J,$$

implying that the distribution of the quantitative phenotype does not depend on the genotype of the putative QTL. The corresponding likelihood function is

$$L_0(\theta_0) = \prod_{i=1}^{n} f(y_i; \mu_0, \sigma_0^2), \tag{5}$$

where μ0 and σ0² are the mean and variance of the overall population, respectively, and θ0 = (μ0, σ0²) is the parameter vector.
Under the null model, the maximum likelihood estimates of the parameters can be directly obtained as

$$\hat{\mu}_0 = \bar{y}, \qquad \hat{\sigma}_0^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

The test statistic for the above hypothesis can be expressed as the log-likelihood ratio of the full model over the null model:

$$LR = 2 \ln \frac{L(\hat{\theta})}{L_0(\hat{\theta}_0)}, \tag{6}$$

where θ̂ = (μ̂1, . . . , μ̂J, σ̂²) and θ̂0 = (μ̂0, σ̂0²) are the vectors of maximum likelihood estimates under the full model and null model, respectively. If a peak of the LR profile exceeds a critical threshold, a QTL that controls the trait is declared in the corresponding marker interval. Because LR may not be asymptotically distributed as a chi-square distribution, the genome-wide threshold can be determined empirically by permutation tests (Churchill and Doerge, 1994).
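The null-model likelihood has a closed form, so the LR statistic and a permutation threshold are straightforward to compute once the full-model log likelihood is available. The Python sketch below assumes the conventional factor of 2 in the LR statistic; `max_lr` is a hypothetical user-supplied callable that scans the genome for a given phenotype vector, and the quantile rule is one simple choice among several.

```python
import math
import random

def null_loglik(y):
    """Maximized log likelihood when all individuals share one normal density."""
    n = len(y)
    mu0 = sum(y) / n
    s2 = sum((v - mu0) ** 2 for v in y) / n
    return -0.5 * n * (math.log(2.0 * math.pi * s2) + 1.0)

def lr_statistic(full_loglik, y):
    """LR = 2 [ln L(full model) - ln L(null model)]."""
    return 2.0 * (full_loglik - null_loglik(y))

def permutation_threshold(y, max_lr, n_perm=1000, quantile=0.95, seed=0):
    """Empirical genome-wide threshold in the spirit of Churchill and Doerge (1994).

    max_lr (hypothetical) takes a list of phenotypes and returns the
    largest LR over all scanned positions, with the marker data held fixed.
    """
    rng = random.Random(seed)
    shuffled = list(y)
    stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the phenotype-marker association
        stats.append(max_lr(shuffled))
    stats.sort()
    return stats[min(n_perm - 1, int(quantile * n_perm))]
```

Shuffling phenotypes while keeping genotypes fixed preserves the marker structure, so the resulting distribution of the maximum LR reflects the null hypothesis of no QTL anywhere on the map.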
Model selection
The purpose of model selection is to identify a model that balances goodness-of-fit to the data against model complexity. The maximized likelihood alone cannot be used as a criterion for model selection because, when a simpler model is nested within a more complicated one, the maximized likelihood of the former can never exceed that of the latter. Akaike’s information criterion (AIC; Akaike, 1974) and the Bayesian information criterion (BIC; Schwarz, 1978) are commonly used for model selection. AIC and BIC are defined as

$$\mathrm{AIC} = -2 \ln L(\hat{\theta}) + 2d \tag{7}$$

and

$$\mathrm{BIC} = -2 \ln L(\hat{\theta}) + d \ln n, \tag{8}$$

where L(θ̂) is the maximum likelihood, d the number of parameters to be estimated in the model, and n the sample size. AIC is derived in terms of the Kullback and Leibler (1951) information for the true model with respect to the fitted model, while BIC is based on an integrated likelihood within a Bayesian framework.
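Both criteria are simple functions of the maximized log likelihood. The Python sketch below shows how they could be evaluated for the three candidate patterns; the parameter counts follow the text (J genotype means plus one variance), while the function names and the `select_pattern` helper are illustrative assumptions.

```python
import math

def aic(loglik, d):
    """Akaike's information criterion for maximized log likelihood loglik."""
    return -2.0 * loglik + 2.0 * d

def bic(loglik, d, n):
    """Bayesian information criterion for a sample of size n."""
    return -2.0 * loglik + d * math.log(n)

# Free parameters per QTL segregation pattern: J means plus one variance.
N_PARAMS = {"test cross": 3, "F2 cross": 4, "full cross": 5}

def select_pattern(logliks, n, criterion="bic"):
    """Return the pattern with the smallest criterion value.

    logliks maps each pattern name to its maximized log likelihood at
    one chromosomal position.
    """
    def score(pattern):
        d = N_PARAMS[pattern]
        return aic(logliks[pattern], d) if criterion == "aic" else bic(logliks[pattern], d, n)
    return min(logliks, key=score)
```

With n = 500, each extra parameter costs ln 500 ≈ 6.2 units under BIC but only 2 under AIC, so BIC favors the simpler segregation patterns more strongly.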
In addition to the above two criteria, the Laplace-empirical criterion (LEC; McLachlan and Pell, 2000) was also considered for model selection. LEC not only accounts for the number of parameters in the model but also incorporates prior information on the parameters and the information matrix of the log likelihood function. LEC is defined as

$$\mathrm{LEC} = -2 \ln L(\hat{\theta}) - 2 \ln p(\hat{\theta}) + \ln |I_e(\hat{\theta})| - d \ln(2\pi), \tag{9}$$

where p(θ̂) is the prior probability density of the estimated parameters and Ie(θ̂) is the observed information matrix, i.e., the negative Hessian matrix of the log likelihood, both evaluated at the maximum likelihood estimate vector θ̂. We assumed, as did Roberts et al. (1998), that the parameter μj was uniformly distributed over an interval of length 2σ̂0 for j = 1,..., J, that σ² was uniformly distributed in the interval (0, σ̂0²) and that all parameters were independent. The LEC for our QTL mapping model can therefore be written as

$$\mathrm{LEC} = -2 \ln L(\hat{\theta}) + 2J \ln(2\hat{\sigma}_0) + 2 \ln \hat{\sigma}_0^2 + \ln |I_e(\hat{\theta})| - (J + 1) \ln(2\pi), \tag{10}$$

where J is the number of QTL genotypes for a given QTL segregation pattern. The appendix (in Supplementary Material) provides the details of each element of the matrix used to calculate the determinant of Ie(θ̂).
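For readers who wish to experiment with this criterion, the following Python sketch evaluates a Laplace-type criterion of the form −2 ln L − 2 ln p + ln|I| − d ln(2π), which is the form assumed here for LEC. The information matrix must be supplied by the user (for example, from the appendix formulas or by numerical differentiation); the upper bound σ̂0² of the variance prior and all function names are assumptions.

```python
import math

def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))

def log_uniform_prior(J, sigma0):
    """Log prior density: J means uniform on an interval of length 2*sigma0,
    variance uniform on (0, sigma0**2) (the upper bound is an assumption)."""
    return -J * math.log(2.0 * sigma0) - math.log(sigma0 ** 2)

def lec(loglik, log_prior, info_matrix):
    """Laplace-type criterion; d is the dimension of the information matrix."""
    d = len(info_matrix)
    return (-2.0 * loglik - 2.0 * log_prior
            + log_det_spd(info_matrix) - d * math.log(2.0 * math.pi))
```

Working with the log-determinant through a Cholesky factor avoids overflow when the information matrix has large entries, which is common for sample sizes in the hundreds.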
The approach described above allowed us to choose the model that was most likely to provide the minimum AIC, BIC or LEC among the three QTL segregation patterns for a specific position on a chromosome. The power of AIC, BIC and LEC was assessed through Monte Carlo simulations.
Monte Carlo simulations
To assess the usefulness of the QTL mapping method and model selection for different QTL segregation patterns in a full-sib family, we simulated a 100 cM-long chromosome with six markers evenly spaced along the chromosome. In the notation of Maliepaard et al. (1997), the segregation patterns of the six markers were aa × ab, ab × cd, aa × ab, ab × cd, ab × ab and aa × ab, and the linkage phases between adjacent markers were r, r, r, c × r and c, respectively. One QTL was simulated at position 50 cM and the QTL segregation patterns were assumed to be: (1) q1q1 × q1q2, (2) q1q2 × q1q2 or (3) q1q2 × q3q4, corresponding to the three different QTL segregation patterns.
In the simulation, the effects of the QTL genotypes were set to μ1 = 15 and μ2 = 10 for the test cross segregation pattern, μ1 = 20, μ2 = 16 and μ3 = 10 for the F2 segregation pattern, and μ1 = 20, μ2 = 18, μ3 = 14 and μ4 = 10 for the full cross segregation pattern. The heritability of the QTL was set at h² = 0.10, 0.15, 0.20, 0.30 and 0.50. The variance of the environmental effect, σ², was therefore determined by the genetic variance of the assumed QTL, σg², and the heritability through the relationship h² = σg² / (σg² + σ²). For example, in the test cross segregation pattern, if h² = 0.30, then σ² = 14.6 because in this case σg² = (15 − 10)² / 4 = 6.25. For each case of the simulation, we sampled 500 individuals from a full-sib family with 1000 replicates. The model selection criteria LEC, AIC and BIC were used to select the best model among the three competing models, and the power of these criteria was calculated based on the 1000 replicates. The statistical power for each model was obtained by counting the number of runs out of 1000 replicates in which the model selection was correct and the LR value was greater than an empirical threshold. The threshold of the LR for each model was estimated by an additional 1000 simulations with no QTL segregation. Generally, the 0.95 or 0.99 quantile of the 1000 LR values under the null model was used as the empirical threshold.
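The residual variances used in the simulation follow from simple arithmetic, which can be checked with a short Python sketch. The genotype means and segregation frequencies are taken from the text; the function names and the `PATTERNS` dictionary are illustrative.

```python
def genetic_variance(means, freqs):
    """Variance of genotypic values under the given genotype frequencies."""
    mean = sum(f * m for f, m in zip(freqs, means))
    return sum(f * (m - mean) ** 2 for f, m in zip(freqs, means))

def residual_variance(means, freqs, h2):
    """Solve h2 = vg / (vg + sigma2) for the residual variance sigma2."""
    vg = genetic_variance(means, freqs)
    return vg * (1.0 - h2) / h2

PATTERNS = {
    "test cross": ([15.0, 10.0], [0.5, 0.5]),
    "F2 cross": ([20.0, 16.0, 10.0], [0.25, 0.5, 0.25]),
    "full cross": ([20.0, 18.0, 14.0, 10.0], [0.25, 0.25, 0.25, 0.25]),
}

# Test cross with h2 = 0.30: genetic variance 6.25, residual variance about 14.58
sigma2_test = residual_variance(*PATTERNS["test cross"], 0.30)
```

The same function gives the residual variance for any other combination of pattern and heritability used in the simulation design.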
Software development
We developed a Windows-based software package, designated FsQtlMap, to implement the statistical methods for QTL mapping in a full-sib family. FsQtlMap is written in VC++ 6.0 and runs on Microsoft Windows operating systems, including Windows 2000, 2003, XP, Vista and 7. The software assumes that the segregation pattern of a QTL may be test cross (1:1 segregation), F2 cross (1:2:1 segregation) or full cross (1:1:1:1 segregation) in a full-sib family and uses LEC as the model selection criterion to determine the QTL segregation ratio. A summary of QTL detection and a series of intermediate results are generated and saved in the corresponding files associated with QTL mapping. FsQtlMap uses the free software gnuplot to plot LOD (the base-10 logarithm of the odds) profiles along the linkage groups; the plots are generated in enhanced metafile (EMF) and PostScript (PS) formats. FsQtlMap also provides a function that runs permutation tests to yield the genome-wide LOD threshold for declaring that a given peak of the profile is a QTL under each of the three QTL models. Further details on the data format and operational procedures are provided in the FsQtlMap manual. The software and its manual can be freely downloaded from http://fgbio.njfu.edu.cn/tong/FsQtlMap/FsQtlMap.htm.
A real example
The applicability of our statistical method for mapping QTLs in a full-sib family was demonstrated for a forest tree, specifically an interspecific F1 hybrid population between Populus deltoides and Populus euramericana in Xuchou, Jiangsu Province, China. Ninety-three genotypes randomly selected from the population were used to construct the genetic linkage map based on molecular markers detected by RAPD, AFLP, ISSR, SSR and SNP analysis (Zhang, 2005). The linkage map contained 19 linkage groups and 314 markers, of which 252 segregated in a 1:1 ratio, 7 in a 1:2:1 ratio and 55 in a 1:1:1:1 ratio. The linkage phases of the two parents between any two adjacent markers on the map were also predicted. Our analysis identified QTLs that affected the root number, an adventitious root trait, in all of the 19 linkage groups in the integrated map of P. deltoides and P. euramericana.
Results
Figure 1 compares the power of LEC, AIC and BIC for selecting the true model among the three candidate models. Figure 1a,b indicates that the power of LEC and BIC for selecting the test cross and F2 cross models was higher than that of AIC for all the heritabilities, whereas Figure 1c shows the opposite, i.e., that the power of AIC for selecting the full cross model was higher than that of LEC and BIC. Although BIC showed a slight advantage over LEC for selecting the test cross and F2 cross models, it had drastically lower power than LEC for selecting the full cross model, especially when the heritability of the QTL was ≤ 0.20. Overall, the powers of LEC and BIC were similar in finding the correct model, probably because both criteria are derived from a Bayesian framework for model selection (McLachlan and Pell, 2000). However, LEC uses more information than BIC in that it incorporates not only prior information on the parameters in the model but also the negative Hessian matrix of the log likelihood. The results of these simulations suggest that LEC is the first choice for model selection in mapping QTLs in a full-sib family, a conclusion that agrees well with the findings of model selection theory.
Table 2 provides detailed results on the estimated QTL position, genotypic effects, heritability, and power of model selection using LEC for the three QTL segregation models. The power of model selection increased as the QTL heritability increased and was generally > 90%, except in the case of h² = 0.10 and 0.15 for the full cross model. The level of QTL heritability had a strong effect on the precision of the estimates of the QTL position but only a small effect on the estimates of QTL genotypic effects and QTL heritability. A high QTL heritability yielded estimates of the QTL position that were close to the true value with a small standard deviation. When the QTL heritability was small, as in the case of h² = 0.10, especially for the full cross model, the estimates of QTL position were biased, with a standard deviation of up to 10.19 cM. The average estimates of QTL genotypic effects and heritability were almost equal to the true values, and the standard deviations of the genotypic effects decreased as the heritability increased. The precision of the estimates of QTL position, genotypic effects and heritability decreased as the number of parameters in the model increased. The test cross model yielded the most precise estimates of QTL position, genotypic effects and heritability because it had only three parameters (one for the residual or environmental variance and two for the QTL genotypic effects), while the full cross model, with five parameters, yielded less precise estimates. This difference can be explained by the fact that greater model complexity decreases the precision of the parameter estimates.
Table 2.
QTL segregation pattern | h² | QTL position | μ̂1 | μ̂2 | μ̂3 | μ̂4 | ĥ² | Power |
---|---|---|---|---|---|---|---|---|
Test cross | 0.10 | 49.91 (5.02) | 15.01 (0.51) | 10.00 (0.50) | | | 0.102 (0.028) | 0.959 |
Test cross | 0.15 | 50.00 (3.60) | 15.00 (0.39) | 9.99 (0.40) | | | 0.152 (0.031) | 0.962 |
Test cross | 0.20 | 50.00 (2.96) | 15.01 (0.33) | 10.01 (0.32) | | | 0.202 (0.033) | 0.965 |
Test cross | 0.30 | 50.02 (1.98) | 15.00 (0.25) | 10.00 (0.26) | | | 0.302 (0.036) | 0.966 |
Test cross | 0.50 | 50.02 (1.53) | 15.00 (0.16) | 10.00 (0.17) | | | 0.501 (0.030) | 0.987 |
F2 cross | 0.10 | 51.36 (7.21) | 20.03 (1.17) | 15.97 (0.84) | 9.98 (1.17) | | 0.107 (0.030) | 0.911 |
F2 cross | 0.15 | 50.97 (5.52) | 20.01 (0.95) | 16.00 (0.65) | 10.01 (0.95) | | 0.155 (0.036) | 0.942 |
F2 cross | 0.20 | 50.55 (4.43) | 19.98 (0.76) | 16.00 (0.54) | 10.03 (0.78) | | 0.203 (0.038) | 0.948 |
F2 cross | 0.30 | 50.13 (2.91) | 19.98 (0.57) | 15.99 (0.41) | 10.00 (0.60) | | 0.304 (0.043) | 0.946 |
F2 cross | 0.50 | 50.01 (1.95) | 20.01 (0.38) | 16.00 (0.26) | 10.00 (0.39) | | 0.503 (0.041) | 0.961 |
Full cross | 0.10 | 51.43 (10.19) | 20.12 (1.42) | 18.58 (1.21) | 13.38 (1.22) | 9.97 (1.37) | 0.121 (0.043) | 0.679 |
Full cross | 0.15 | 51.27 (8.27) | 20.09 (1.11) | 18.31 (1.00) | 13.77 (0.99) | 9.95 (1.09) | 0.169 (0.050) | 0.812 |
Full cross | 0.20 | 50.87 (7.22) | 20.02 (0.90) | 18.15 (0.88) | 13.80 (0.85) | 9.98 (0.93) | 0.215 (0.054) | 0.912 |
Full cross | 0.30 | 50.54 (5.45) | 19.97 (0.70) | 18.03 (0.69) | 13.96 (0.67) | 10.00 (0.72) | 0.308 (0.059) | 0.986 |
Full cross | 0.50 | 50.13 (2.86) | 19.98 (0.44) | 18.00 (0.42) | 14.00 (0.41) | 10.03 (0.43) | 0.503 (0.046) | 1.000 |
The linkage map of P. deltoides and P. euramericana was scanned with the three QTL segregation models using the interval mapping method. Figure 2 shows the profiles of the log likelihood ratios (LR) generated by each model to detect QTLs that control the adventitious root trait. The critical values determined at the 1% significance level by 1000 permutation tests (Doerge and Churchill, 1996) were 14.71, 21.78 and 27.54 for the test cross, F2 cross and full cross models, respectively. For each position on the linkage map, the LEC was used to determine the most likely QTL segregation pattern. Six high peaks (A-F) that exceeded the thresholds were detected in the LR profiles (Figure 2). However, since peak E in Figure 2a and peak F in Figure 2b occurred at the same position in marker interval CG/CTT_440R∼TC/CGT_120 on linkage group 3, there were only five distinct candidate QTL positions.
Table 3 summarizes the procedure for selecting the most likely QTL segregation pattern for the five positions in Figure 2. According to the LEC, peaks A, C and F were retained as significant QTL positions because each of them had the lowest LEC value and a significant LR value under the same QTL segregation pattern, whereas peaks B, D and E were not retained because the pattern under which they were detected did not have the lowest LEC value. However, peak F was close to peak C and the two had almost the same genotypic effects, so that the former may be considered a ghost QTL (Martinez and Curnow, 1992; Doerge, 2002). Overall, therefore, two QTLs, i.e., peaks A and C, were concluded to be the significant QTLs responsible for root number.
Table 3.
High peak | Group | Position (cM) | Interval | Assumed QTL pattern | LR | LEC | Inferred QTL pattern | μ̂1 | μ̂2 | μ̂3 | Heritability |
---|---|---|---|---|---|---|---|---|---|---|---|
A | 3 | 20.97 | W_19B∼P_422 | Test cross | 17.60** | 125.52 | Test cross | 1.40 | 1.86 | | 0.185 |
 | | | | F2 cross | 10.40 | 134.47 | | | | | |
 | | | | Full cross | 21.07 | 126.44 | | | | | |
B | 3 | 43.69 | G_1158∼GT/CTC_765R | Test cross | 15.85** | 127.18 | | | | | |
 | | | | F2 cross | 17.58 | 126.56 | | | | | |
 | | | | Full cross | 15.85 | 127.73 | | | | | |
C | 3 | 80.68 | TC/CCG_500∼TC/CAG_150 | F2 cross | 26.97** | 119.58 | F2 cross | 1.13 | 1.55 | 2.40 | 0.703 |
 | | | | Test cross | 18.47 | 124.65 | | | | | |
 | | | | Full cross | 18.47 | 126.92 | | | | | |
D | 3 | 82.68 | TC/CCG_500∼TC/CAG_150 | Test cross | 18.62** | 124.48 | | | | | |
 | | | | F2 cross | 26.87 | 119.50 | | | | | |
 | | | | Full cross | 18.62 | 126.72 | | | | | |
E | 3 | 99.61 | CG/CTT_440R∼TC/CGT_120 | Test cross | 18.79** | 124.38 | | | | | |
F | | | | F2 cross | 23.07** | 121.68 | F2 cross | 1.24 | 1.54 | 2.31 | 0.532 |
 | | | | Full cross | 18.79 | 125.94 | | | | | |

** p < 0.01.
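The selection rule applied to Table 3 can be expressed compactly: a peak detected under a given segregation pattern is retained only if that pattern has the smallest LEC at the position and its LR exceeds the pattern-specific threshold. The Python sketch below encodes the LR and LEC values from Table 3 and the 1% thresholds reported in the text; the data layout and function name are illustrative assumptions, and the later exclusion of peak F as a possible ghost QTL is a separate judgment not encoded here.

```python
# 1% permutation thresholds for the three models, as reported in the text
THRESHOLDS = {"test": 14.71, "F2": 21.78, "full": 27.54}

# peak: (pattern under which the peak was detected, {pattern: (LR, LEC)})
PEAKS = {
    "A": ("test", {"test": (17.60, 125.52), "F2": (10.40, 134.47), "full": (21.07, 126.44)}),
    "B": ("test", {"test": (15.85, 127.18), "F2": (17.58, 126.56), "full": (15.85, 127.73)}),
    "C": ("F2", {"test": (18.47, 124.65), "F2": (26.97, 119.58), "full": (18.47, 126.92)}),
    "D": ("test", {"test": (18.62, 124.48), "F2": (26.87, 119.50), "full": (18.62, 126.72)}),
    "E": ("test", {"test": (18.79, 124.38), "F2": (23.07, 121.68), "full": (18.79, 125.94)}),
    "F": ("F2", {"test": (18.79, 124.38), "F2": (23.07, 121.68), "full": (18.79, 125.94)}),
}

def is_retained(peak_pattern, fits):
    """True if the peak's own pattern has the lowest LEC and a significant LR."""
    best = min(fits, key=lambda pattern: fits[pattern][1])
    lr = fits[peak_pattern][0]
    return best == peak_pattern and lr > THRESHOLDS[peak_pattern]

selected = [name for name, (pattern, fits) in PEAKS.items() if is_retained(pattern, fits)]
```

Applied to these values, the rule retains peaks A, C and F, in agreement with the selection described above.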
Discussion
Thanks to the efforts of many statistical geneticists over the past two decades, genetic linkage maps can now be constructed using molecular markers with different segregation types in full-sib families of species such as forest trees, in which inbred lines are almost impossible to obtain through traditional self-mating for many generations (Maliepaard et al., 1997; Wu et al., 2002; Lu et al., 2004; Tong et al., 2010). Two software packages, JoinMap (Van Ooijen, 2006) and FsLinkageMap (Tong et al., 2010), are available for constructing an integrated genetic linkage map with predicted linkage phase between any two adjacent markers. Based on such genetic linkage maps in outbred species, we have now proposed a method for selecting the appropriate model for detecting QTLs by considering three QTL segregation patterns, i.e., test cross (1:1 segregation), F2 cross (1:2:1 segregation) and full cross (1:1:1:1 segregation). Our method has several advantages for the genetic mapping of complex traits because it accounts for the biological characteristics of forest trees.
First, our QTL mapping method with model selection procedures allows one to choose the most likely QTL segregation pattern of the three assumed patterns. Like molecular markers, QTL segregation may show different patterns throughout the genome in an outcrossing species. Hence, it is reasonable to incorporate different QTL segregation modes into a statistical model for QTL mapping in a full-sib family. However, MapQTL (Van Ooijen, 2009), the only available software that can be used to detect QTLs with data from a full-sib family, assumes that the QTL segregation is fixed as ab × cd. This is the case of the full cross pattern in our statistical model. The shortcoming of MapQTL can be illustrated by the real example described above in which QTLs were detected segregating in test cross and F2 cross patterns. This means that no QTLs would be found if QTL mapping in this example were done with MapQTL.
Second, our QTL mapping method could be done by using genetic linkage maps of outbred species that had been constructed in the past 20 years. For example, in forest trees, many parent-specific linkage maps (Plomion et al., 1995; Wu et al., 2000; Yin et al., 2002; Shepherd et al., 2003; Gan et al., 2003) have been constructed and QTL mapping studies have also been done with the pseudo-test cross strategy first proposed by Grattapaglia and Sederoff (1994). This method has some limitations in QTL mapping in that the linkage phase between adjacent two markers and possible multiple QTL segregation patterns are not considered. The application of our QTL mapping method to these previous data would be expected to yield better results.
Third, the use of LEC as the criterion for identifying QTL segregation patterns is supported not only by the simulation results but also by the nature of the quantity itself. Model selection is an important but very difficult problem that has not been completely resolved for mixture models (McLachlan and Pell, 2000). Although AIC and BIC have been extensively applied in many situations, neither of them consistently selected the correct QTL segregation pattern across all three models in our simulations (Figure 1). Unlike AIC and BIC, LEC contains more information about the model itself: it incorporates not only the number of estimated parameters but also the prior probabilities of the estimated parameters and the negative Hessian matrix of the log likelihood. These characteristics help explain why LEC showed more consistent power than AIC and BIC in model selection.
Finally and most importantly, we have developed a Windows-based software package (FsQtlMap) that allows the immediate implementation of our QTL mapping strategy. Computer packages for QTL mapping, such as MapMaker/QTL (Lincoln and Lander, 1990) and Windows QTL Cartographer (Wang et al., 2010), are well-established and have been extensively used for inbred lines. In contrast, there are no popular statistical tools for QTL mapping in outbred species such as forest trees. Although MapQTL (Van Ooijen, 2009) has been used for QTL mapping in forest trees by some researchers, its application is limited by the assumption that there is only one QTL segregation pattern in a full-sib family. By incorporating the characteristics of outcrossing species, FsQtlMap provides a more flexible computing tool for QTL mapping.
Our new QTL mapping method was applied to real data and successfully detected two QTLs that affect adventitious roots in Populus. One QTL segregated in an F2 cross and had much higher heritability. This finding indicates that the rooting capacity of poplars may be controlled by a major gene that can explain ∼70% of the phenotypic variance. This conclusion is consistent with that of Han et al. (1994).
Acknowledgments
We thank Rongling Wu and two anonymous reviewers for their constructive suggestions and comments on this manuscript. This work was partially supported through a project funded by the Priority Academic Program Development (PAPD) of the Jiangsu Higher Education Institutions and the National Natural Science Foundation of China (grant no. 30872051).
Footnotes
Associate Editor: Everaldo Gonçalves de Barros
Supplementary Material
The following online material is available for this article:
- Appendix: Elements of the information matrix of the log-likelihood.
This material is available as part of the online article from http://www.scielo.br/gmb.
References
- Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974;19:716–723.
- Andersson L, Haley CS, Ellegren H, Knott SA, Johansson M, Andersson K, Andersson-Eklund L, Edfors-Lilja I, Fredholm M, Hansson I, et al. Genetic mapping of quantitative trait loci for growth and fatness in pigs. Science. 1994;263:1771–1774. doi: 10.1126/science.8134840.
- Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics. 1994;138:963–971. doi: 10.1093/genetics/138.3.963.
- Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via EM algorithm. J R Stat Soc Ser B (Methodological). 1977;39:1–38.
- Doerge RW. Mapping and analysis of quantitative trait loci in experimental populations. Nat Rev Genet. 2002;3:43–52. doi: 10.1038/nrg703.
- Doerge RW, Churchill GA. Permutation tests for multiple loci affecting a quantitative character. Genetics. 1996;142:285–294. doi: 10.1093/genetics/142.1.285.
- Gan S, Shi J, Li M, Wu K, Wu J, Bai J. Moderate-density molecular maps of Eucalyptus urophylla S. T. Blake and E. tereticornis Smith genomes based on RAPD markers. Genetica. 2003;118:59–67. doi: 10.1023/a:1022966018079.
- Grattapaglia D, Sederoff R. Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: Mapping strategy and RAPD markers. Genetics. 1994;137:1121–1137. doi: 10.1093/genetics/137.4.1121.
- Haley CS, Knott SA, Elsen JM. Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics. 1994;136:1195–1207. doi: 10.1093/genetics/136.3.1195.
- Han K, Bradshaw HD, Gordon MP, Han KH. Adventitious root and shoot regeneration in vitro is under major gene control in an F2 family of hybrid poplar (Populus trichocarpa × P. deltoides). Forest Genet. 1994;1:139–146.
- Jansen RC, Stam P. High resolution of quantitative traits into multiple loci via interval mapping. Genetics. 1994;136:1447–1455. doi: 10.1093/genetics/136.4.1447.
- Kao C-H, Zeng Z-B, Teasdale RD. Multiple interval mapping for quantitative trait loci. Genetics. 1999;152:1203–1216. doi: 10.1093/genetics/152.3.1203.
- Kullback S, Leibler RA. On information and sufficiency. Ann Math Statist. 1951;22:79–86.
- Lander ES, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989;121:185–199. doi: 10.1093/genetics/121.1.185.
- Lin M, Lou XY, Chang M, Wu RL. A general statistical framework for mapping quantitative trait loci in nonmodel systems: Issue for characterizing linkage phases. Genetics. 2003;165:901–913. doi: 10.1093/genetics/165.2.901.
- Lincoln SE, Lander ES. Mapping Genes Controlling Quantitative Traits Using MAPMAKER/QTL. Technical Report. Whitehead Institute for Biomedical Research; Cambridge: 1990. p. 46.
- Lu Q, Cui YH, Wu RL. A multilocus likelihood approach to joint modeling of linkage, parental diplotype and gene order in a full-sib family. BMC Genetics. 2004;5:e20. doi: 10.1186/1471-2156-5-20.
- Maliepaard C, Jansen J, Van Ooijen JW. Linkage analysis in a full-sib family of an outbreeding plant species: Overview and consequences for applications. Genet Res. 1997;70:237–250.
- Martinez O, Curnow RN. Estimating the locations and the size of the effects of quantitative trait loci using flanking markers. Theor Appl Genet. 1992;85:480–488. doi: 10.1007/BF00222330.
- McLachlan G, Pell D. Finite Mixture Models. John Wiley & Sons; New York: 2000. p. 419.
- Plomion CD, Malley MO, Durel CE. Genomic analysis in maritime pine (Pinus pinaster). Comparison of two RAPD maps using selfed and open-pollinated seeds of the same individual. Theor Appl Genet. 1995;90:1028–1034. doi: 10.1007/BF00222917.
- Roberts SJ, Husmeier D, Rezek I, Penny W. Bayesian approaches to Gaussian modeling. IEEE Trans Pattern Anal Mach Intell. 1998;20:1133–1142.
- Schwarz G. Estimating the dimension of a model. Ann Stat. 1978;6:461–464.
- Shepherd M, Cross M, Dieters MJ, Herry R. Genetic maps for Pinus elliottii var. elliottii and P. caribaea var. hondurensis using AFLP and microsatellite markers. Theor Appl Genet. 2003;106:1409–1419. doi: 10.1007/s00122-002-1185-9.
- Tong CF, Zhang B, Shi JS. A hidden Markov model approach to multilocus linkage analysis in a full-sib family. Tree Genet Genomes. 2010;6:651–662.
- Van Ooijen JW. JoinMap 4, Software for the calculation of genetic linkage maps in experimental populations. Kyazma BV; Wageningen, Netherlands: 2006.
- Van Ooijen JW. MapQTL 6, Software for the mapping of quantitative trait loci in experimental populations of diploid species. Kyazma BV; Wageningen, Netherlands: 2009.
- Wang S, Basten CJ, Zeng Z-B. Windows QTL Cartographer 2.5. 2010. Department of Statistics, North Carolina State University, Raleigh, NC.
- Wu RL, Han YF, Hu JJ, Fang JJ, Li L, Li LM, Zeng Z-B. An integrated genetic map of Populus deltoides based on amplified fragment length polymorphisms. Theor Appl Genet. 2000;100:1249–1256.
- Wu RL, Ma CX, Painter I, Zeng Z-B. Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theor Popul Biol. 2002;61:349–363. doi: 10.1006/tpbi.2002.1577.
- Wu RL, Ma CX, Casella G. Statistical Genetics of Quantitative Traits: Linkage, Maps and QTL. Springer; New York: 2007. p. 365.
- Xu S, Atchley WR. Mapping quantitative trait loci for complex binary diseases using line crosses. Genetics. 1996;143:1417–1424. doi: 10.1093/genetics/143.3.1417.
- Xu C, Li Z, Xu S. Joint mapping of quantitative trait loci for multiple binary characters. Genetics. 2005;169:1045–1059. doi: 10.1534/genetics.103.019406.
- Yi N, Xu S. Bayesian mapping of quantitative trait loci for complex binary traits. Genetics. 2000;155:1391–1403. doi: 10.1093/genetics/155.3.1391.
- Yin TM, Zhang XY, Huang MR, Wang MX, Zhuge Q, Tu SM, Zhu LH, Wu RL. Molecular linkage maps of the Populus genome. Genome. 2002;45:541–555. doi: 10.1139/g02-013.
- Zeng Z-B. Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci. Proc Natl Acad Sci USA. 1993;90:10972–10976. doi: 10.1073/pnas.90.23.10972.
- Zeng Z-B. Precision mapping of quantitative trait loci. Genetics. 1994;136:1457–1468. doi: 10.1093/genetics/136.4.1457.
- Zeng Z-B, Kao C-H, Basten CJ. Estimating the genetic architecture of quantitative traits. Genet Res. 1999;74:279–289. doi: 10.1017/s0016672399004255.
Internet Resources
- gnuplot, http://www.gnuplot.info.
- FsQtlMap manual, http://fgbio.njfu.edu.cn/tong/FsQtlMap/FsQtlMap.htm.
- Zhang B. Constructing genetic linkage maps and mapping QTLs affecting important traits in poplar. 2005. PhD Dissertation, Nanjing Forestry University, Nanjing, China. http://fgbio.njfu.edu.cn/tong/zhang2005.pdf.