Genetics. 2006 Aug;173(4):2339–2356. doi: 10.1534/genetics.105.054775

Mapping Quantitative Trait Loci for Longitudinal Traits in Line Crosses

Runqing Yang *, Quan Tian *, Shizhong Xu †,1
PMCID: PMC1569730  PMID: 16751670

Abstract

Quantitative traits whose phenotypic values change over time are called longitudinal traits. Genetic analyses of longitudinal traits can be conducted using any of the following approaches: (1) treating the phenotypic values at different time points as repeated measurements of the same trait and analyzing the trait under the repeated measurements framework, (2) treating the phenotypes measured from different time points as different traits and analyzing the traits jointly on the basis of the theory of multivariate analysis, and (3) fitting a growth curve to the phenotypic values across time points and analyzing the fitted parameters of the growth trajectory under the theory of multivariate analysis. The third approach has been used in QTL mapping for longitudinal traits by fitting the data to a logistic growth trajectory. This approach applies only to the particular S-shaped growth process. In practice, a longitudinal trait may show a trajectory of any shape. We demonstrate that one can describe a longitudinal trait with orthogonal polynomials, which are sufficiently general for fitting any shaped curve. We develop a mixed-model methodology for QTL mapping of longitudinal traits and a maximum-likelihood method for parameter estimation and statistical tests. The expectation-maximization (EM) algorithm is applied to search for the maximum-likelihood estimates of parameters. The method is verified with simulated data and demonstrated with experimental data from a pseudobackcross family of Populus (poplar) trees.


The genetic variance of a quantitative trait is controlled by the segregation of many genes, each with a small effect. Small genetic effects are collectively called the polygenic effects. In the infinitesimal model, only polygenic effects exist. An alternative to this classical definition of genetic variance of a quantitative trait is the theory that the genetic variances of many quantitative traits are actually controlled by the segregation of one or more major genes plus numerous small genetic effects. Using the latter definition, the phenotype of interest still shows a continuous distribution because of the joint contribution from the polygenic effects and random environmental variation. Such a model is called the oligogenic model, which can be tested using segregation analysis (Elston and Stewart 1971; Morton and MacLean 1974). In fact, the oligogenic model is the basis for QTL mapping because QTL analysis with the current statistical methods and sample sizes (n < 1000) can detect only genes with major or moderate effects (e.g., Haley and Knott 1992; Zeng 1994; Kao et al. 1999).

When a trait is measured repeatedly over time, it is called a time-dependent trait or longitudinal trait. Three approaches are currently available for analyzing longitudinal traits. The first approach is to treat the phenotypic values at different time points as repeated measurements of the same trait and analyze the trait under the repeated measurements framework. The second approach is to treat the phenotypes measured from different time points as different traits and analyze the traits jointly on the basis of the theory of multivariate analysis (Wu et al. 1999). The third approach is to fit a growth curve to the phenotypic values across different time points and analyze the fitted parameters of the growth trajectory under the theory of multivariate analysis in a much reduced dimension (Wu et al. 2002). The method of fitting a growth trajectory is considered to be the optimal one because it treats phenotypes measured over time as different traits, but takes into account the correlation generated by the ordered time points.

Abundant molecular markers are now available for QTL mapping. Many QTL mapping studies have focused on traits measured at a single time point (e.g., Haley and Knott 1992; Zeng 1994). Methods for single-trait QTL mapping cannot be directly adopted for longitudinal traits because phenotypes measured at different time points may be controlled by different sets of genes (Nuzhdin and Pasyukova 1997; Verhaegen et al. 1997; Emebiri et al. 1998; Yan et al. 1998a,b; Wu et al. 1999, 2002a). Several methods have been proposed for multiple-trait analysis in the context of QTL mapping (e.g., Jiang and Zeng 1995; Korol et al. 1995; Ronin et al. 1995; Eaves et al. 1996; Wu et al. 1999; Knott and Haley 2000). The method of Jiang and Zeng (1995) is one of the first methods for multivariate QTL mapping. In practice, the method is suitable only for a few traits. As the number of traits increases, computational time becomes prohibitive. To reduce the dimensionality of the multivariate analysis, a principal component analysis has been proposed (Mangin et al. 1998; Korol et al. 2001), in which the first principal component, called a “super trait,” is used as a representative of the multiple-trait complex. QTL mapping can be performed on the super trait, which, methodologically, is not different from univariate QTL mapping. However, the interpretation of the super trait in a biological setting is challenging.

It is generally agreed that the phenotypic values measured over time are correlated. The correlation may be generated by some underlying biological process. In fact, if the trait were measured at an infinite number of time points, the phenotypic value would form a smooth curve as a function of time. This phenotypic profile is called the growth trajectory, which may be governed by a few growth parameters. Genetic analysis of traits with time-course repeated measurements may be conducted on the few growth parameters. The work of Wu et al. (2002) was among the early studies on QTL mapping based on the growth models. The authors first fit the phenotypes of a longitudinal trait to a growth model (Equations 3 and 4 of Wu et al. 2002) and then treated the estimated growth parameters as multiple traits. QTL mapping was then conducted on the growth parameters using a standard multivariate QTL mapping procedure (Jiang and Zeng 1995). The two-step approach may not be optimal because errors in the estimated growth parameters (the first step) have not been taken into account. Nevertheless, the idea presented by Wu et al. (2002) is fundamentally important in the study of longitudinal traits. Other scientists have been attracted to this important area of QTL mapping. Rongling Wu and his colleagues (Ma et al. 2002; Wu et al. 2002b, 2003, 2004) presented substantial improvement on the method of Weiren Wu and his group (Wu et al. 2002). The improvement was reflected by the change from two steps to a single step. Ma et al. (2002) used a logistic curve to model the growth trajectory, in which the genetic value is described by $g(t) = a/(1 + b\,e^{-rt})$, where t is the time point, a is the asymptotic value of g, r is the relative rate of growth, and b (combined with a) is a parameter describing the starting point of g. The combination of the three parameters, {a, b, r}, determines the actual shape of the growth trajectory. Each of the three parameters is further partitioned into an additive and a dominance effect. If the trait is measured at m time points where m > 3, analysis of the m measurements is always performed in three dimensions, regardless of the value of m. The time-course functional trait analysis replaces a continuous function with a finite number of collected data points to facilitate estimation of the functional parameters.

A growth trajectory is governed by the biological law of growth (West et al. 2001). The shape, in general, is sigmoid or S-like, a monotonic increasing function of time. Many biological curves, however, cannot be described by the monotonic logistic function. For example, the yearly gain of growth in trees is not sigmoid with time; rather, it is a negative exponential. Daily milk production of cows may be better described by a quadratic function of time. A reaction norm (a trait measured in a continuously varying environment) may have an arbitrary shape. The logistic function cannot be applied to these curves as they are not sigmoid.

Even if the biological process has an S-shaped trajectory, the lack of linearity in the logistic regression limits the use of extensively developed inference methods available for linear models. As a result, the EM algorithm developed for the logistic function cannot be generalized to other types of functions. If we follow the approach of logistic analysis and develop methods using a different specialized function for a different curve, a different EM algorithm would be required for each of the functions, because each function involves a different set of parameters with different meanings. This is another area that needs further investigation.

We propose to use orthogonal polynomials to fit the biological curve and perform QTL mapping for longitudinal traits with the following justifications. First, the polynomial analysis is sufficiently flexible for fitting a biological curve in an arbitrary shape. This can be achieved by choosing different orders of the polynomials. Second, the polynomial function is linear in the parameters, although it is nonlinear in time. As a consequence, we can take advantage of the extensively developed inference methods available for linear models. Third, the same EM algorithm developed for a linear mixed-effects model can be applied to all types of biological curves, requiring no modification, which is clearly in contrast to the aforementioned logistic approach.
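The second justification (linearity in the parameters) can be illustrated with a minimal sketch. It assumes NumPy, uses unnormalized Legendre polynomials from `numpy.polynomial.legendre`, and fits a made-up noise-free quadratic trajectory; the data, the order, and all variable names are hypothetical and are not taken from the paper.

```python
import numpy as np
from numpy.polynomial import legendre

# Hypothetical, noise-free trajectory that is not sigmoid: a quadratic in
# standardized time (illustrative only, not data from the paper).
tau = np.linspace(-1.0, 1.0, 10)        # standardized time points in [-1, 1]
y = 3.0 + 0.5 * tau - 2.0 * tau**2

order = 3
# Design matrix: column j holds the (unnormalized) Legendre polynomial P_j(tau).
X = legendre.legvander(tau, order)

# The model y = X @ beta is nonlinear in tau but linear in beta,
# so ordinary least squares gives the coefficients directly.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
```

Because tau^2 = (2 P_2(tau) + 1)/3, the fitted coefficients here should be 7/3, 1/2, −4/3, and 0; no iterative nonlinear optimization is needed, which is the property the mixed-model EM algorithm relies on.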

Orthogonal polynomials have been applied to genetic analysis for longitudinal traits in pedigree data (Henderson 1982; Swalve 2000; Jensen 2001). These analyses utilized the random regression model (RRM) because the regression (polynomial) coefficients of individual animals were treated as random effects. Meyer and Hill (1997) showed the equivalence of the covariance function of Kirkpatrick et al. (1990) to the RRM. A key element in the RRM analysis is the function submodels for the random regression (Misztal et al. 2000). One such orthogonal polynomial is the normalized Legendre polynomial (Kirkpatrick et al. 1990; Meyer 2000), which has been extensively used to evaluate breeding values for various longitudinal traits of large farm animals (Schaeffer 2004). The first published QTL mapping study for longitudinal traits in pedigrees used Legendre orthogonal polynomials (Macgregor et al. 2005), demonstrating the applied importance of this approach. Notably, dominance and epistatic effects of QTL were assumed to be absent in that study. Here we extend this idea to a much more general framework, which includes dominance as well as traditional QTL populations such as the F2 and backcross.

In this study, we take the polynomial approach to mapping QTL for longitudinal traits, but address the problem in the context of line-crossing experiments. Line crosses are major designs of the experiment for genetic analysis. An F2 population derived from the cross of two inbred lines is commonly used genetic material for QTL mapping. We first introduce the theory and methodology based on the F2 mating design. We then conduct a series of simulation experiments to validate the method. Finally, we apply the method to QTL mapping for the growth trajectory in a pseudobackcross family of Populus (poplar) trees and compare the result with that of Ma et al. (2002).

THEORY AND METHODS

Genetic model for longitudinal traits:

On the basis of Mendel's law of inheritance, a segregating locus in an F2 population has three possible genotypes, denoted by AA, Aa, and aa. Let $y_i(\tau)$ be the phenotypic value of individual i measured at time τ (a standardized time point ranging from −1 to +1; see Appendix B for the definition of standardized time point). This phenotypic value $y_i(\tau)$ can be described by the following linear model,

$$y_i(\tau) = \mu(\tau) + x_i\,\alpha(\tau) + z_i\,\delta(\tau) + \xi_i(\tau) + \varepsilon_i(\tau) \tag{1}$$

for $i = 1, \ldots, n$, where n is the number of individuals, μ(τ) is the population mean, $x_i$ is a genotype indicator variable (defined as 1 for AA, 0 for Aa, and −1 for aa) for the ith individual at the genetic locus of interest, $z_i$ is a dominance indicator variable defined as 1 for the heterozygote and −1 for the homozygote, α(τ) is the additive effect, δ(τ) is the dominance effect, $\xi_i(\tau)$ is an individual-specific time-dependent residual error with an i.i.d. $N(0, \sigma_\xi^2(\tau))$ distribution, and $\varepsilon_i(\tau)$ is an individual-specific time-independent experimental error with an i.i.d. $N(0, \sigma^2)$ distribution. This is a mixed-effects model with μ(τ), α(τ), and δ(τ) being treated as the fixed effects and $\xi_i(\tau)$ as the random effect. The purpose of the analysis is to estimate and test α(τ) and δ(τ), the time-dependent functional genetic effects of a hypothesized QTL at a putative position of the genome.

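The genetic variance contributed by the genetic locus of interest at time τ is expressed as

$$\sigma_G^2(\tau) = \sigma_x^2\,\alpha^2(\tau) + \sigma_z^2\,\delta^2(\tau) \tag{2}$$

where $\sigma_x^2 = 1/2$ and $\sigma_z^2 = 1$ under the current definitions of x and z and the assumption of no segregation distortion (see Appendix A for the derivations). The phenotypic variance should be written as

$$\sigma_P^2(\tau) = \sigma_G^2(\tau) + \sigma_\xi^2(\tau) + \sigma^2 \tag{3}$$

where $\sigma_\xi^2(\tau)$ is the variance of the time-dependent random effect and $\sigma^2$ is the residual variance. The proportion of the phenotypic variance contributed by the QTL is

$$h^2(\tau) = \frac{\sigma_G^2(\tau)}{\sigma_P^2(\tau)} \tag{4}$$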
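The variances of the two indicator variables can be checked directly from their definitions in the text (x = 1, 0, −1 for AA, Aa, aa; z = 1 for the heterozygote and −1 for the homozygotes) under 1:2:1 Mendelian segregation. The following is a small sketch of that arithmetic, not the derivation of the appendix; the function names are hypothetical.

```python
from fractions import Fraction

# F2 genotype frequencies under Mendelian segregation without distortion.
freq = {"AA": Fraction(1, 4), "Aa": Fraction(1, 2), "aa": Fraction(1, 4)}
x = {"AA": 1, "Aa": 0, "aa": -1}    # additive indicator, as defined in the text
z = {"AA": -1, "Aa": 1, "aa": -1}   # dominance indicator, as defined in the text

def mean(v):
    """Expectation of an indicator over the three genotypes."""
    return sum(freq[g] * v[g] for g in freq)

def cov(u, v):
    """Covariance of two indicators over the three genotypes."""
    mu_u, mu_v = mean(u), mean(v)
    return sum(freq[g] * (u[g] - mu_u) * (v[g] - mu_v) for g in freq)

var_x = cov(x, x)    # 1/2
var_z = cov(z, z)    # 1
cov_xz = cov(x, z)   # 0, so no cross term enters the genetic variance

def genetic_variance(alpha_t, delta_t):
    """Genetic variance at one time point from the effects at that time."""
    return var_x * alpha_t**2 + var_z * delta_t**2
```

Because the two indicators are uncorrelated, the genetic variance at a given time separates into an additive part, α²(τ)/2, and a dominance part, δ²(τ).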

Orthogonal polynomial for longitudinal traits:

All the model parameters, except σ2, are functions of time. The functional relationships between genetic parameters and the time variable are described with an orthogonal polynomial. Define $\psi(\tau) = [\psi_0(\tau)\;\;\psi_1(\tau)\;\;\cdots\;\;\psi_r(\tau)]$ as a basis of the polynomial with order r. A basis is also called a submodel. A method to construct the basis of an orthogonal polynomial is given in Appendix B. Let us define $\mu = [\mu_0\;\;\mu_1\;\;\cdots\;\;\mu_r]^T$ as a vector of population means, which is time independent. The time-dependent population mean μ(τ) can be described as a linear combination of μ weighted by the basis of the polynomials; i.e., $\mu(\tau) = \psi(\tau)\mu$. Similarly, we can describe other parameters with the same basis; e.g., $\alpha(\tau) = \psi(\tau)\alpha$ and $\delta(\tau) = \psi(\tau)\delta$, where $\alpha = [\alpha_0\;\;\alpha_1\;\;\cdots\;\;\alpha_r]^T$ and $\delta = [\delta_0\;\;\delta_1\;\;\cdots\;\;\delta_r]^T$. The random effect at time τ can also be expressed as a linear function of the time-independent effects, $\xi_i(\tau) = \psi(\tau)\xi_i$, where $\xi_i = [\xi_{i0}\;\;\xi_{i1}\;\;\cdots\;\;\xi_{ir}]^T$ for $i = 1, \ldots, n$. Since $\xi_i$ is treated as a vector of random effects, we assume that $\xi_i$ is i.i.d. $N(0, \Sigma)$, where Σ is an $(r+1) \times (r+1)$ positive definite covariance matrix.

Recall that we are dealing with a longitudinal trait (a trait measured repeatedly over some time interval on the same individual). If the phenotypic value measured at any given time point were treated as an individual trait, for m time points, we would be dealing with m different traits. The vector of genetic effects, say α, would have a dimension of m because each element of vector α represents an effect for a trait. In the polynomial analysis of order r, however, the dimension of vector α is reduced to r + 1 for r + 1 < m. Therefore, the polynomial analysis for longitudinal traits is a special dimension reduction technique.
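A basis of this kind can be sketched as follows. The construction used by the authors is given in their appendix, which is not reproduced here, so two things are assumptions: the linear rescaling of original time onto [−1, 1], and the usual normalized Legendre form, sqrt((2j+1)/2) times the jth Legendre polynomial. The function names and the example time points are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def standardize_time(t, t_min, t_max):
    """Map original time points linearly onto [-1, 1] (assumed standardization)."""
    t = np.asarray(t, dtype=float)
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)

def normalized_legendre_basis(tau, order):
    """Rows are time points, columns are basis functions of degree 0..order.

    Assumes the normalized Legendre form sqrt((2j + 1) / 2) * P_j(tau).
    """
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    P = legendre.legvander(tau, order)                      # unnormalized P_j(tau)
    scale = np.sqrt((2.0 * np.arange(order + 1) + 1.0) / 2.0)
    return P * scale

# Example: five measurement days inside an original time range of [1, 150].
tau = standardize_time([30, 60, 90, 120, 150], t_min=1, t_max=150)
Psi = normalized_legendre_basis(tau, order=3)               # shape (5, 4)
```

With order 3 each curve is described by four coefficients regardless of how many time points are measured, which is the dimension reduction described above.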

With the polynomial reparameterization (writing ψ(τ) for the row vector of basis functions and $\xi_i$ for the vector of time-independent random effects of individual i), model (1) is now rewritten as a linear function of the time-independent parameters,

$$y_i(\tau) = \psi(\tau)\mu + x_i\,\psi(\tau)\alpha + z_i\,\psi(\tau)\delta + \psi(\tau)\xi_i + \varepsilon_i(\tau) \tag{5}$$

The expectation function of $y_i(\tau)$ conditional on the fixed effects is

$$E[y_i(\tau)] = \psi(\tau)\,(\mu + x_i\alpha + z_i\delta) \tag{6}$$

The covariance function between $y_i(\tau)$ and $y_i(\tau')$ conditional on the fixed effects is

$$\mathrm{cov}[y_i(\tau), y_i(\tau')] = \psi(\tau)\,\Sigma\,\psi^T(\tau') + \mathbb{1}(\tau = \tau')\,\sigma^2 \tag{7}$$

where τ′ is another standardized time point, $\mathbb{1}(\tau = \tau') = 1$ for τ = τ′, and $\mathbb{1}(\tau = \tau') = 0$ for τ ≠ τ′. The first term in the right-hand side of Equation 7, $\psi(\tau)\Sigma\psi^T(\tau')$, is actually the covariance between $\xi_i(\tau)$ and $\xi_i(\tau')$ for τ ≠ τ′. When τ = τ′, it becomes the variance of $\xi_i(\tau)$; i.e., $\sigma_\xi^2(\tau) = \psi(\tau)\Sigma\psi^T(\tau)$.

Model (5) is now linear in parameters, which can be estimated under the framework of the mixed-model analysis. We first estimate μ, α, δ, Σ, and σ2 and then use the basis of the polynomial to convert μ, α, δ into their time-dependent functional counterparts.

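The polynomial transformation allows us to express the total genetic variance as a function of the quadratic terms of the time-independent genetic parameters, as shown below (ψ(τ) denotes the row vector of basis functions),

$$\sigma_G^2(\tau) = \tfrac{1}{2}\,\psi(\tau)\,\alpha\alpha^T\psi^T(\tau) + \psi(\tau)\,\delta\delta^T\psi^T(\tau) \tag{8}$$

The phenotypic variance may be rewritten as

$$\sigma_P^2(\tau) = \sigma_G^2(\tau) + \psi(\tau)\,\Sigma\,\psi^T(\tau) + \sigma^2 \tag{9}$$

The proportion of the phenotypic variance contributed by the major gene is

$$h^2(\tau) = \frac{\sigma_G^2(\tau)}{\sigma_P^2(\tau)} \tag{10}$$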

One can examine the behavior of each genotype by evaluating the change of the genotypic value across time. Let Inline graphic, Inline graphic, and Inline graphic be the values of the three genotypes at time τ. They are expressed as

graphic file with name M66.gif (11)
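Given the indicator definitions in the text (x = 1, 0, −1 and z = −1, 1, −1 for AA, Aa, aa), the three genotypic curves follow from the model as μ(τ) + α(τ) − δ(τ), μ(τ) + δ(τ), and μ(τ) − α(τ) − δ(τ). The sketch below evaluates them on a grid. The normalized Legendre basis, the population-mean coefficients, and the function names are assumptions; the additive and dominance coefficients are those listed for QTL1 in Table 1, used purely for illustration.

```python
import numpy as np
from numpy.polynomial import legendre

def basis(tau, order):
    """Normalized Legendre basis (assumed form): sqrt((2j + 1) / 2) * P_j(tau)."""
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    P = legendre.legvander(tau, order)
    return P * np.sqrt((2.0 * np.arange(order + 1) + 1.0) / 2.0)

def genotypic_curves(Psi, mu, alpha, delta):
    """Genotypic values over time for AA, Aa, and aa."""
    m_t, a_t, d_t = Psi @ mu, Psi @ alpha, Psi @ delta
    return {
        "AA": m_t + a_t - d_t,   # x = 1,  z = -1
        "Aa": m_t + d_t,         # x = 0,  z = 1
        "aa": m_t - a_t - d_t,   # x = -1, z = -1
    }

tau = np.linspace(-1.0, 1.0, 5)
Psi = basis(tau, order=3)
mu = np.array([10.0, 2.0, 0.5, 0.1])          # hypothetical mean coefficients
alpha = np.array([1.28, 0.65, 1.52, 1.20])    # QTL1 additive coefficients (Table 1)
delta = np.array([0.97, 0.36, -1.21, 0.77])   # QTL1 dominance coefficients (Table 1)
curves = genotypic_curves(Psi, mu, alpha, delta)
```

Under this coding the difference between the two homozygotes is 2α(τ), and the heterozygote deviates from the homozygote midpoint by 2δ(τ).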

Likelihood function:

The phenotypic values for each individual are collected at m fixed time points, denoted by $\tau_1, \ldots, \tau_m$. We add a subscript to variable τ to indicate that τ now has taken a particular value from one of the m time points. We use an m × 1 vector to denote an array of the phenotypic values for the ith individual,

$$y_i = [\,y_i(\tau_1)\;\;y_i(\tau_2)\;\;\cdots\;\;y_i(\tau_m)\,]^T$$

Define

$$\Psi = \begin{bmatrix} \psi(\tau_1) \\ \psi(\tau_2) \\ \vdots \\ \psi(\tau_m) \end{bmatrix}$$

as an m × (r + 1) matrix, where each row is the polynomial basis of order r evaluated at one time point. In matrix notation, the linear model for vector $y_i$ is

$$y_i = \Psi\mu + x_i\,\Psi\alpha + z_i\,\Psi\delta + \Psi\xi_i + \varepsilon_i \tag{12}$$

where $\varepsilon_i$ is an m × 1 vector for the experimental errors with $\varepsilon_i \sim N(0, I\sigma^2)$, where I is an m × m identity matrix. The conditional expectation of model (12) given the fixed effects is

$$E(y_i) = \Psi\,(\mu + x_i\alpha + z_i\delta) \tag{13}$$

and the variance–covariance matrix is

$$V = \mathrm{var}(y_i) = \Psi\Sigma\Psi^T + I\sigma^2 \tag{14}$$

which applies to all $i = 1, \ldots, n$.
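As a rough numerical illustration of the conditional mean and the structured covariance matrix, the sketch below builds both for one individual. It assumes the mean is the basis matrix times (μ + xα + zδ) and the covariance is ΨΣΨᵀ plus σ² on the diagonal, as implied by the random-effect and residual assumptions in the text. The toy basis [1, τ], all numerical values, and the function name are hypothetical.

```python
import numpy as np

def marginal_moments(Psi, mu, alpha, delta, Sigma, sigma2, x_i, z_i):
    """Mean vector and covariance matrix of one individual's repeated records."""
    mean = Psi @ (mu + x_i * alpha + z_i * delta)
    V = Psi @ Sigma @ Psi.T + sigma2 * np.eye(Psi.shape[0])
    return mean, V

# Toy setting: m = 4 time points and a first-order, unnormalized basis [1, tau].
tau = np.linspace(-1.0, 1.0, 4)
Psi = np.column_stack([np.ones_like(tau), tau])
mu = np.array([5.0, 1.0])
alpha = np.array([0.8, 0.2])
delta = np.array([0.3, -0.1])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])     # covariance of the random regression coefficients
sigma2 = 0.25                      # residual variance

# A heterozygote has x = 0 and z = 1 under the coding in the text.
mean_Aa, V = marginal_moments(Psi, mu, alpha, delta, Sigma, sigma2, x_i=0, z_i=1)
```

V has only (r + 1)(r + 2)/2 + 1 free parameters (four in this toy case), whereas an unstructured m × m covariance matrix would have m(m + 1)/2.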

The models discussed so far are based on known genotypes of the QTL. In practice, the genotypes are not observable and the likelihood function must be constructed by taking into account all three genotypes for all individuals. Let us order the three genotypes, aa, Aa, and AA, as genotypes 1, 2, and 3 (indexed by k), respectively, and use a numerical variable G to denote the three ordered genotypes. Let Inline graphic be the genotype of the major gene for individual i. Before we observe the phenotypic values, we should use markers to infer the probability of Inline graphic. This conditional probability is denoted by Inline graphic, where Inline graphic denotes marker information. The conditional probability Inline graphic varies across individuals because different individuals are supposed to have different marker genotypes. Methods to calculate Inline graphic can be found in Lander and Botstein (1989) for interval mapping and Jiang and Zeng (1997) for multipoint mapping. We take the multipoint method because it facilitates an automatic mechanism to handle missing marker genotypes. Recall that Inline graphic is the conditional expectation of Inline graphic given the genotype of individual i and the population parameters. Since each individual can take one of three genotypes, there are only three distinguishable conditional expectations, denoted by

graphic file with name M92.gif (15)

The probability density of phenotypes for the ith individual with the kth genotype (Inline graphic) is

graphic file with name M94.gif (16)

Let θ = {μ, α, δ, Σ, σ2} be the parameters. The likelihood function is

graphic file with name M95.gif (17)
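The observed-data likelihood is a three-component normal mixture for each individual, weighted by the marker-based genotype probabilities. The sketch below evaluates its logarithm under that reading; it is not the authors' FORTRAN90 program. The argument layout (`Y` as an n × m array, `P` as an n × 3 array of genotype probabilities ordered aa, Aa, AA) and the function names are assumptions.

```python
import numpy as np

def mvn_logpdf(y, mean, V):
    """Log density of a multivariate normal, via a Cholesky factor of V."""
    m = y.shape[0]
    L = np.linalg.cholesky(V)
    resid = np.linalg.solve(L, y - mean)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (m * np.log(2.0 * np.pi) + log_det + resid @ resid)

def observed_loglik(Y, P, Psi, mu, alpha, delta, Sigma, sigma2):
    """Sum over individuals of log( sum_k p_ik * f_k(y_i) )."""
    x = np.array([-1.0, 0.0, 1.0])    # genotypes ordered aa, Aa, AA
    z = np.array([-1.0, 1.0, -1.0])
    V = Psi @ Sigma @ Psi.T + sigma2 * np.eye(Psi.shape[0])
    means = [Psi @ (mu + x[k] * alpha + z[k] * delta) for k in range(3)]
    total = 0.0
    for y_i, p_i in zip(Y, P):
        logf = np.array([mvn_logpdf(y_i, means[k], V) for k in range(3)])
        top = logf.max()              # log-sum-exp shift for numerical stability
        total += top + np.log(np.sum(p_i * np.exp(logf - top)))
    return total
```

Working on the log scale with a shift by the largest component avoids underflow when the number of time points is large and the densities become very small.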

Several numerical methods can be used to search for the maximum-likelihood estimates (MLEs) of the parameters, e.g., the simplex method of Nelder and Mead (1965). We decided to use the EM algorithm (Dempster et al. 1977; Lander and Botstein 1989; Jansen 1993) because explicit equations for the maximization step are available in the complete data situation.

EM algorithm for parameter estimation:

With the EM algorithm, we classify variables into data, missing values, and parameters. In QTL mapping, y = {yi} and Inline graphic are data, θ are parameters, and Inline graphic are variables with missing values. The genotype indicator variables Inline graphic, however, are determined by Inline graphic. So, the missing values can be rewritten as Inline graphic. The likelihood function given in (17) with all the missing values integrated out is called the observed likelihood function. Instead of directly maximizing the observed likelihood function, the EM algorithm deals with the following complete-data likelihood function,

graphic file with name M101.gif (18)

where

graphic file with name M102.gif (19)
graphic file with name M103.gif (20)

and

graphic file with name M104.gif (21)

Note that we introduce a new variable η whose value is determined by Inline graphic as Inline graphic for Inline graphic and Inline graphic for Inline graphic. Because the complete-data likelihood function involves missing values, integration (or expectation) is taken with respect to the missing values, not for the complete-data likelihood function, but for the following log-transformed complete-data likelihood function,

graphic file with name M110.gif (22)

The expectation of Equation 22 with respect to the missing values is

graphic file with name M111.gif (23)

where

graphic file with name M112.gif (24)
graphic file with name M113.gif (25)

and

graphic file with name M114.gif (26)

The EM algorithm requires: (1) finding the partial derivatives of Inline graphic with respect to the parameters, (2) setting these partial derivatives equal to zero, and (3) searching for explicit solutions for the parameters as a function of the missing values. This completes the maximization step. The expectation step requires taking the expectations for all terms that involve the missing values. Since the complete-data likelihood function is just a likelihood function for a typical mixed model, we can simply utilize the existing mixed-model EM algorithm to find the MLE of parameters (Henderson 1986). The following are the EM steps we implement for the mixed-model analysis.

  • Step 0: Set ζ = 0 and initialize all parameters with values in their legal domain, denoted by θ(ζ).

  • Step 1 (E1): Compute the posterior probabilities of the three genotypes for each individual,
    graphic file with name M116.gif (27)
  • Step 2 (E2): Find the posterior distribution of the random effect Inline graphic. This posterior distribution turns out to be a mixture of three normal distributions with a mean
    graphic file with name M118.gif (28)
    and a covariance matrix
    graphic file with name M119.gif (29)
    All parameters appearing in the right-hand side of the equations should be replaced by their current values at iteration ζ. This statement also holds for subsequent steps.
  • Step 3 (E3): Compute all the expectations involved in the following maximization steps (see the next paragraph for detailed expressions of the expectations).

  • Step 4 (M1): Update the population mean
    graphic file with name M120.gif (30)
  • Step 5 (M2): Update the additive effect
    graphic file with name M121.gif (31)
  • Step 6 (M3): Update the dominance effect
    graphic file with name M122.gif (32)
  • Step 7 (M4): Update the covariance matrix of the random effect
    graphic file with name M123.gif (33)
  • Step 8 (M5): Update the residual variance
    graphic file with name M124.gif (34)
  • Step 9: Increment ζ by 1; i.e., let ζ = ζ+1, and repeat steps 1–8 (three E-steps and five M-steps) until Inline graphic holds.
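Step 1 of the list above is an application of Bayes' rule: the marker-based prior probability of each genotype is reweighted by the density of the individual's phenotype vector under that genotype. A minimal sketch follows, assuming the log densities have already been computed at the current parameter values; the function name and array layout (one row per individual, one column per genotype) are hypothetical.

```python
import numpy as np

def posterior_genotype_probs(prior, logf):
    """Posterior genotype probabilities for each individual.

    prior : (n, 3) marker-based genotype probabilities
    logf  : (n, 3) log densities of each phenotype vector under each genotype
    """
    prior = np.asarray(prior, dtype=float)
    logf = np.asarray(logf, dtype=float)
    # Subtract the row maximum before exponentiating to avoid underflow.
    shifted = logf - logf.max(axis=1, keepdims=True)
    weights = prior * np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)
```

When the three densities are equal the posterior equals the prior, so an individual whose phenotype is uninformative keeps its marker-based genotype probabilities.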

In this paragraph, we provide explicit expressions for the expectations of various terms involved in the EM steps. The expectation of the quadratic term of the random effects is

graphic file with name M126.gif (35)

where Inline graphic and Inline graphic are already given in step 2 (Equations 28 and 29). We need to provide only the expectations related to the genotype indicator variables. These expectations are

graphic file with name M129.gif (36)

The EM algorithm was implemented via a FORTRAN90 program, which is available from the authors on request.

Hypothesis tests:

Several hypotheses of interest can be tested with likelihood-ratio test statistics. The first hypothesis is “no segregation of a QTL at the current position,” which is denoted by $H_0\colon \alpha = \delta = 0$. The test statistic for this hypothesis is

$$\mathrm{LR} = -2\,[\,L_0(\tilde\theta) - L_1(\hat\theta)\,] \tag{37}$$

where $L_1(\hat\theta)$ is the log-likelihood function evaluated at $\hat\theta$, the MLE under the full model, and $L_0(\tilde\theta)$ is the log likelihood under the null model evaluated at $\tilde\theta = \{\tilde\mu, \tilde\Sigma, \tilde\sigma^2\}$. Note that $\tilde\theta$ differs from $\hat\theta$ in two aspects: (1) $\tilde\theta$ has a lower dimension because α = δ = 0 has been enforced, and (2) the three parameter sets in $\tilde\theta$ are estimated from the null model and they are different from the counterparts estimated under the full model. The EM algorithm for estimating $\tilde\theta$ is a simple special case of the EM algorithm for the full model. Under the null hypothesis, LR will follow approximately a chi-square distribution with $\mathrm{d.f.} = \dim(\theta) - \dim(\tilde\theta)$, where $\dim(\theta)$ and $\dim(\tilde\theta)$ are the dimensions of θ and $\tilde\theta$, respectively. These two vectors differ by α and δ, each having a dimension of r + 1, which explains why d.f. = 2(r + 1).
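The pointwise test can be sketched with the standard library alone, because the degrees of freedom 2(r + 1) are always even and the chi-square tail probability then has a closed form. The function names are hypothetical, and the log-likelihood values in the usage line are made up. This gives a nominal pointwise P-value only; genome-wide significance in the paper is assessed by permutation or simulation.

```python
import math

def chi2_sf_even_df(stat, df):
    """Upper tail probability of a chi-square variable with even df.

    For df = 2k: P(X > s) = exp(-s/2) * sum_{j=0}^{k-1} (s/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def lr_test(loglik_full, loglik_null, order):
    """Likelihood-ratio statistic, degrees of freedom, and pointwise P-value."""
    stat = -2.0 * (loglik_null - loglik_full)
    df = 2 * (order + 1)          # alpha and delta each contribute order + 1
    return stat, df, chi2_sf_even_df(stat, df)

# Made-up log likelihoods for a third-order polynomial.
stat, df, p_value = lr_test(loglik_full=-100.0, loglik_null=-110.0, order=3)
```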

When $H_0\colon \alpha = \delta = 0$ is rejected, we can further test whether the QTL effects are due to additive or dominance effects. First, we can test the departure from additivity, i.e., the dominance effect. The null hypothesis of “no dominance effect” is $H_0\colon \delta = 0$ and the test statistic is

$$\mathrm{LR}_\delta = -2\,[\,L_0(\tilde\theta_\delta) - L_1(\hat\theta)\,] \tag{38}$$

where $\tilde\theta_\delta = \{\tilde\mu, \tilde\alpha, \tilde\Sigma, \tilde\sigma^2\}$ is the parameter vector estimated under $H_0\colon \delta = 0$.

To test the additive effect, our null hypothesis becomes “no additive effect,” denoted by $H_0\colon \alpha = 0$. The test statistic is

$$\mathrm{LR}_\alpha = -2\,[\,L_0(\tilde\theta_\alpha) - L_1(\hat\theta)\,] \tag{39}$$

where $\tilde\theta_\alpha = \{\tilde\mu, \tilde\delta, \tilde\Sigma, \tilde\sigma^2\}$ is the parameter vector estimated under $H_0\colon \alpha = 0$. Each one of $\mathrm{LR}_\alpha$ and $\mathrm{LR}_\delta$ will approximately follow a chi-square distribution with r + 1 d.f.

It is also possible to test other hypotheses, such as whether a polynomial of order r + 1 is sufficiently better than that of order r and whether the structured V-matrix sufficiently explains the covariances of the experimental errors compared to an unstructured V-matrix. We defer these hypothesis tests to the data analysis and discussion sections.

So far we have discussed hypothesis tests only at one location of the genome. To scan the entire genome for QTL, we test every putative position of the genome with a 1- or 2-cM increment and plot the test statistic against the genome location to form a test statistic profile. Positions that correspond to significant peaks of the profile provide estimates of the QTL locations. The critical value of the test statistic used to declare significance is obtained via the permutation test (Churchill and Doerge 1994).
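The permutation threshold can be sketched generically: shuffle whole phenotype trajectories among individuals while the marker data stay fixed, rescan the genome, record the maximum test statistic, and take an upper quantile of those maxima. In the sketch below, `scan_statistic` is a hypothetical user-supplied callable that returns the test statistic profile for a given phenotype array; the function name, defaults, and seed handling are likewise assumptions.

```python
import numpy as np

def permutation_threshold(phenotypes, scan_statistic, n_perm=1000, level=0.05, seed=0):
    """Genome-wide critical value from the permutation distribution of the maximum.

    phenotypes     : (n, m) array, one row of repeated measurements per individual
    scan_statistic : callable taking a phenotype array and returning the test
                     statistic at every scanned position (markers held fixed)
    """
    rng = np.random.default_rng(seed)
    phenotypes = np.asarray(phenotypes)
    maxima = np.empty(n_perm)
    for b in range(n_perm):
        # Permute rows so each individual's trajectory stays intact while its
        # association with the marker genotypes is broken.
        shuffled = phenotypes[rng.permutation(len(phenotypes))]
        maxima[b] = np.max(scan_statistic(shuffled))
    return np.quantile(maxima, 1.0 - level)
```

Permuting whole rows, rather than individual measurements, preserves the within-individual correlation across time points that the model attributes to Σ.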

DATA ANALYSIS

Simulation studies:

We simulated an F2 population under several different sample sizes. A single chromosome of 200 cM was simulated with 21 evenly spaced markers (10 cM per interval). The position of the first marker was designated as position zero and positions of all other markers were measured as the distances from the first marker. We put three QTL at positions 23, 94, and 187 cM, respectively. The range of the original time points (t) was [1, 150]. The cumulative heritabilities of the three QTL were 0.24, 0.24, and 0.15, respectively, each of which was defined as

graphic file with name M161.gif (40)

where $h^2(\tau)$ is the proportion of the phenotypic variance explained by the QTL at time point τ [see Equation 4 for the definition of $h^2(\tau)$]. The QTL effects that generate such cumulative heritabilities were found by trial and error and are given in Table 1 with Legendre polynomial of order 3 (r = 3). The simulated population mean was

graphic file with name M166.gif

The functional curve for the mean was Inline graphic. The functions of population mean, additive effect, dominance effect, and heritability of the three QTL plotted against time are given in Figure 1, a, b, c, and d, respectively. The individual specific and time-dependent random effect was simulated using Inline graphic, where Inline graphic was generated from a Inline graphic distribution with

graphic file with name M171.gif

which is a Inline graphic positive definite covariance matrix. The residual variance was Inline graphic. The model to generate the phenotypic values was

graphic file with name M174.gif (41)

where the summation indicated the sum of three QTL.

TABLE 1.

QTL parameters used in the simulation experiment

       Additive effect               Dominance effect
QTL    α0     α1      α2     α3      δ0     δ1      δ2      δ3     Position (cM)
1      1.28   0.65    1.52   1.20    0.97   0.36    −1.21   0.77   23
2      1.30   −0.39   1.31   0.85    1.05   −0.27   1.24    0.68   94
3      0.94   −0.64   0.73   0.72    0.82   −0.48   0.67    1.12   187

Figure 1.—

The simulated parameters of QTL1, QTL2, and QTL3 expressed as functions of time: (a) population means μ(τ), (b) additive effects α(τ), (c) dominance effects δ(τ), and (d) heritability $h^2(\tau)$ (proportion of phenotypic variance explained by a locus).

An F2 population was simulated with three different sample sizes, n = 50, 80, and 100. The frequency of measurement of the trait was simulated at two levels, m = 5 and 10. When the trait was measured five times, the interval between two consecutive measurements was 30 days, whereas m = 10 corresponded to an interval of measurement of 15 days. All six combinations of n and m were simulated for comparison. Each of the six combinations (scenarios) was replicated 100 times.

The critical values used to declare statistical significance for QTL in the simulation experiments were obtained empirically by simulating 500 additional replicates under the null model where all QTL effects were set to zero. The empirical critical values under the six scenarios are given in Table 2.

TABLE 2.

Six experimental designs (scenarios; combinations of sample size and frequency of trait measurement) and the empirical critical values of the test statistics when the type I error is 0.05 or 0.01

Scenario   Sample size (n)   Frequency of measurement (m)   Critical value (0.05)   Critical value (0.01)
1          50                5                              93.5                    104.1
2          80                5                              104.0                   112.0
3          100               5                              111.0                   122.8
4          50                10                             96.8                    105.9
5          80                10                             110.1                   120.31
6          100               10                             114.3                   126.3

The average estimated QTL positions, the likelihood-ratio test statistics, and the empirical powers of the three QTL are given in Table 3. The estimated QTL effects (including additive and dominance effects) are given in Table 4. The purpose of the simulations was neither to select the optimal experimental design nor to choose the optimal QTL mapping model; rather, it was to demonstrate the general behavior of the mapping procedure and to validate the computer program. Therefore, the simulations were not exhaustive. The results did show the expected behaviors of a mapping procedure: (1) the power increases as the sample size increases, (2) the standard error of each estimated parameter decreases as the sample size increases, and (3) power also increases and error decreases as the trait is measured more frequently. Therefore, the algorithm converges as expected and the computer program is logically valid. The average likelihood-ratio profile of the 100 replicates under scenario 5 (n = 80 and m = 10) is shown in Figure 2. The three peaks in the likelihood-ratio profile clearly demonstrate the efficiency of the method for scanning multiple QTL. Again, we are not looking at model selection here; rather, we are trying to verify the model fit (Wiltshire et al. 2005). We developed a single-QTL model but applied the method to data containing multiple QTL. This is a typical approach for genome scanning with interval mapping (Lander and Botstein 1989). The estimated positions and effects of the three simulated QTL were very close to the true values (see Table 4). This demonstrates the robustness of the single-QTL model for mapping multiple additive QTL. The robustness may be credited to the covariance matrix Σ included in the model. When QTL1 was evaluated, QTL2 and QTL3 were not taken into account by the model. These two neglected QTL would ordinarily cause extra correlation among the repeated measurements. The extra correlations, however, were absorbed by matrix Σ. Therefore, the estimated μ, Σ, and σ² from the single-QTL model were not comparable to the true values of μ, Σ, and σ² that were used to simulate the three-QTL-controlled longitudinal trait.

TABLE 3.

Average estimates of QTL positions (cM), the likelihood-ratio (LR) test statistics, and the empirical statistical powers of three QTL calculated from 100 replicated simulations

Scenario | QTL1: cM LR Power (5%) Power (1%) | QTL2: cM LR Power (5%) Power (1%) | QTL3: cM LR Power (5%) Power (1%)
1 | 24.06 (0.88) 211.13 (5.20) 99 95 | 95.00 (0.95) 169.97 (4.97) 92 82 | 179.79 (1.50) 121.65 (3.22) 80 72
2 | 22.61 (0.36) 318.09 (7.19) 100 100 | 93.86 (0.95) 242.57 (6.96) 99 92 | 183.41 (0.97) 176.19 (4.24) 98 93
3 | 22.68 (0.30) 413.37 (8.02) 100 100 | 94.61 (0.43) 309.74 (7.24) 100 100 | 184.17 (0.97) 212.84 (5.11) 99 94
4 | 24.04 (0.58) 251.24 (6.50) 100 100 | 93.20 (0.88) 205.46 (5.20) 99 99 | 182.60 (1.23) 140.28 (4.56) 87 79
5 | 23.40 (0.36) 397.70 (4.01) 100 100 | 94.25 (0.60) 299.65 (7.60) 99 99 | 184.62 (0.83) 208.00 (5.71) 99 99
6 | 22.82 (0.26) 512.58 (9.49) 100 100 | 94.51 (0.41) 353.63 (7.32) 100 100 | 186.80 (0.59) 250.41 (5.85) 100 99

The standard deviations obtained from the 100 replicates are given in parentheses.
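The empirical power in Table 3 is the percentage of replicates in which the likelihood-ratio statistic exceeds the empirical critical value. A minimal sketch of that calculation follows; the function name `empirical_power`, the five LR values, and the two thresholds are hypothetical and serve only as an illustration.

```python
def empirical_power(lr_statistics, critical_value):
    """Percentage of replicates whose LR statistic exceeds the critical value."""
    if not lr_statistics:
        raise ValueError("need at least one replicate")
    hits = sum(1 for lr in lr_statistics if lr > critical_value)
    return 100.0 * hits / len(lr_statistics)


# Hypothetical peak LR statistics from five replicates (illustrative only).
lr_peaks = [130.2, 98.4, 121.7, 109.9, 140.3]

# Hypothetical critical values at the 5% and 1% type I error rates.
power_5pct = empirical_power(lr_peaks, critical_value=104.0)
power_1pct = empirical_power(lr_peaks, critical_value=112.0)
```

With 100 replicates, as in the simulations, the same function returns power on the 0 to 100 scale used in the table.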

TABLE 4.

Average estimates of the additive (α) and dominance (δ) effects of three QTL calculated from 100 replicated simulations

Scenario QTL α0 α1 α2 α3 δ0 δ1 δ2 δ3
1 QTL1 1.57 (0.06) 0.59 (0.05) 1.40 (0.07) 1.18 (0.07) 1.05 (0.04) 0.27 (0.03) −0.99 (0.06) 0.84 (0.06)
1 QTL2 1.62 (0.07) −0.31 (0.05) 1.16 (0.08) 0.99 (0.08) 1.05 (0.04) −0.19 (0.04) 1.05 (0.05) 0.71 (0.05)
1 QTL3 1.27 (0.06) −0.61 (0.04) 0.97 (0.08) 0.91 (0.07) 0.72 (0.05) −0.44 (0.03) 0.56 (0.06) 0.95 (0.06)
2 QTL1 1.62 (0.05) 0.55 (0.04) 1.49 (0.06) 1.10 (0.05) 0.98 (0.03) 0.35 (0.03) −1.05 (0.03) 0.81 (0.04)
2 QTL2 1.74 (0.05) −0.28 (0.03) 1.36 (0.06) 0.93 (0.07) 1.03 (0.04) −0.22 (0.02) 1.06 (0.04) 0.70 (0.05)
2 QTL3 1.23 (0.04) −0.66 (0.03) 0.96 (0.05) 0.91 (0.05) 0.89 (0.03) −0.46 (0.02) 0.73 (0.04) 1.18 (0.03)
3 QTL1 1.58 (0.04) 0.55 (0.03) 1.43 (0.05) 1.20 (0.05) 1.07 (0.03) 0.34 (0.02) −1.08 (0.03) 0.87 (0.03)
3 QTL2 1.76 (0.04) −0.30 (0.03) 1.20 (0.05) 0.93 (0.05) 1.10 (0.03) −0.25 (0.02) 1.11 (0.04) 0.70 (0.03)
3 QTL3 1.22 (0.04) −0.67 (0.03) 0.95 (0.05) 0.82 (0.05) 0.77 (0.03) −0.45 (0.02) 0.64 (0.04) 1.07 (0.03)
4 QTL1 1.64 (0.07) 0.50 (0.04) 1.46 (0.06) 1.03 (0.07) 0.93 (0.05) 0.32 (0.03) −1.11 (0.05) 0.71 (0.05)
4 QTL2 1.70 (0.05) −0.30 (0.04) 1.28 (0.06) 0.96 (0.07) 1.03 (0.04) −0.25 (0.03) 1.11 (0.05) 0.74 (0.04)
4 QTL3 1.21 (0.06) −0.64 (0.04) 0.89 (0.07) 0.74 (0.06) 0.84 (0.05) −0.45 (0.03) 0.66 (0.05) 1.07 (0.05)
5 QTL1 1.66 (0.05) 0.50 (0.03) 1.45 (0.04) 1.15 (0.05) 1.00 (0.04) 0.36 (0.02) −1.17 (0.04) 0.76 (0.04)
5 QTL2 1.75 (0.05) −0.32 (0.03) 1.23 (0.04) 0.99 (0.06) 1.09 (0.04) −0.22 (0.02) 1.15 (0.04) 0.72 (0.03)
5 QTL3 1.21 (0.05) −0.71 (0.03) 0.93 (0.05) 0.94 (0.05) 0.84 (0.04) −0.46 (0.02) 0.71 (0.04) 1.18 (0.03)
6 QTL1 1.60 (0.04) 0.53 (0.03) 1.40 (0.04) 1.19 (0.05) 1.06 (0.03) 0.35 (0.02) −1.11 (0.03) 0.80 (0.03)
6 QTL2 1.74 (0.04) −0.33 (0.03) 1.22 (0.04) 0.97 (0.05) 1.07 (0.03) −0.25 (0.02) 1.15 (0.03) 0.73 (0.03)
6 QTL3 1.19 (0.04) −0.71 (0.03) 0.88 (0.04) 0.90 (0.05) 0.77 (0.03) −0.48 (0.02) 0.66 (0.03) 1.09 (0.03)

The standard deviations obtained from the 100 replicates are given in parentheses. The three rows in each scenario represent the three simulated QTL.

Figure 2.—

The average likelihood-ratio test statistic profile obtained from 100 replicated simulations under scenario 5 (n = 80, m = 10) with three QTL. The horizontal line indicates the empirical critical value when the type I error rate was 1%.

Normally, one would expect to see some bias in the estimated effects associated with the Legendre polynomials due to the Beavis effect (Beavis 1994; Xu 2003). The lack of such biases in the simulation experiment is due to the fact that we reported all effects, regardless of the significance of the test. The Beavis effect would be expected to happen if the mean effects were calculated on the basis of a censored sample (containing only the replicates in which significant QTL were detected).
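The contrast between averaging all replicates and averaging only the significant ones can be illustrated with a small simulation. The sketch below is a simplification under stated assumptions: each replicate's estimate is drawn from a normal distribution around a hypothetical true effect, and significance is declared by a simple z-test rather than the likelihood-ratio test used in this study. All names and numeric settings are illustrative.

```python
import random
import statistics


def simulate_estimates(true_effect, std_error, n_replicates, seed=0):
    """Draw one effect estimate per replicate from N(true_effect, std_error^2)."""
    rng = random.Random(seed)
    return [rng.gauss(true_effect, std_error) for _ in range(n_replicates)]


def mean_all_and_censored(estimates, std_error, z_threshold=1.96):
    """Mean over all replicates versus mean over 'significant' replicates only."""
    significant = [e for e in estimates if abs(e) / std_error > z_threshold]
    mean_all = statistics.fmean(estimates)
    mean_censored = statistics.fmean(significant) if significant else float("nan")
    return mean_all, mean_censored, len(significant)


# Hypothetical settings: a modest effect estimated with low power.
estimates = simulate_estimates(true_effect=0.5, std_error=0.4, n_replicates=10000)
mean_all, mean_censored, n_significant = mean_all_and_censored(estimates, std_error=0.4)
```

Here `mean_all` stays near the true effect, whereas `mean_censored` is inflated because only replicates with large estimates pass the threshold.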

To verify the efficiency of the method for estimating μ, Σ, and σ², we simulated an additional data set with exactly the same setup as the original simulation experiment except that we kept only the first QTL (see Table 1 for the parameters of QTL1) in the simulation under n = 80 and m = 10. The experiment was replicated 100 times. The average estimated σ² was Inline graphic, close to the true value of 2.0. The average estimated μ and Σ were

graphic file with name M183.gif

and

graphic file with name M184.gif

respectively, where the numbers following ± are the standard deviations obtained from the 100 replicates. The estimated μ and Σ were also close to the true values of μ and Σ. Overall, the simulation experiment demonstrated that we have developed a working procedure of QTL mapping for longitudinal traits.

For data generated by the single-QTL model, we also performed the logistic regression analysis using FunMap (Ma et al. 2002, 2004). Unfortunately, no QTL was detected in any of the replicates (see Figure 3 for the comparison of the two methods). This comparison is not quite fair because the method of Ma et al. (2002) was not designed to handle curves of arbitrary shape with a general covariance structure. This demonstrates only that extreme departure from the assumption of Ma et al. (2002) will cause the method they developed to fail, and so experimentalists should take care in its application. A fair comparison is deferred to the real data analysis.

Figure 3.—

The average likelihood-ratio test statistic profile obtained from 100 replicated simulations under scenario 5 (n = 80, m = 10) with one QTL using the Legendre polynomial of order 4 proposed in this study (thick lines) as well as the logistic regression method (thin lines). The two horizontal lines indicate the empirical critical values of the two methods when the type I error rate was set at 1%.

We now demonstrate that, with a little modification, the method can handle models with epistatic (interactive) effects of QTL. Wu et al. (2004) recently extended their logistic model to map epistatic effects and published the epistatic model alone in a separate study. For simplicity of presentation, we simulated a backcross population of Inline graphic individuals with test frequencies of m = 5 or m = 10 over a period of 150 days. A single chromosome of 100 cM was simulated and 11 markers were placed evenly on the chromosome (10 cM per interval). Two QTL were simulated at positions 23 and 75 cM, respectively. The effects of these two QTL were defined under the following two models: (1) the additive plus epistatic effect model (A + E model), in which both QTL have their own additive effects and also act interactively to determine the phenotype of the trait, and (2) the epistatic effect only model (E model), in which the two QTL act interactively to determine the phenotype and neither QTL has its own additive effect. These two models were used to generate the data, but the analytical model always assumed the presence of both the additive and epistatic effects and thus always produced estimates for both types of effects. The epistatic model is

graphic file with name M188.gif (42)

where the values of μ, Σ, and σ² remained the same as those given in the previous simulation experiment and the vectors of QTL effects were defined as follows.

For the A + E model, the first QTL has effects (1.28, 0.65, 1.52, 1.20), the second QTL has effects (0.94, −0.64, 0.73, 0.72), and the epistatic effects are (0.65, −0.20, 0.66, 0.43).

For the E model, the corresponding three vectors were defined as (0.00, 0.00, 0.00, 0.00), (0.00, 0.00, 0.00, 0.00), and (0.65, −0.20, 0.66, 0.43). These are the true values listed in Table 6.

The estimated positions of the two QTL were obtained by a two-dimensional grid search for the maximum value of the likelihood function. Once the positions were found, the estimated QTL effects corresponding to the optimal positions were reported as the MLE of the QTL effects. The critical value used for declaration of statistical significance was obtained in the same way as that in the previous simulation experiment.
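The two-dimensional search can be sketched as a loop over all ordered pairs of candidate positions. In the sketch below, `grid_search_two_qtl` and `toy_log_likelihood` are hypothetical names; the toy surface merely stands in for the maximized likelihood of the epistatic model at a given pair of positions, which in practice would be obtained by running the EM algorithm at each grid point.

```python
from itertools import combinations


def grid_search_two_qtl(log_likelihood, positions):
    """Return (best_value, pos1, pos2) over all pairs with pos1 < pos2."""
    best = None
    for pos1, pos2 in combinations(positions, 2):
        value = log_likelihood(pos1, pos2)
        if best is None or value > best[0]:
            best = (value, pos1, pos2)
    return best


def toy_log_likelihood(pos1, pos2):
    """Stand-in surface peaked at 23 and 75 cM (illustrative only)."""
    return -((pos1 - 23.0) ** 2 + (pos2 - 75.0) ** 2)


# Candidate positions in 1-cM steps along a 100-cM chromosome.
grid = list(range(0, 101))
best_value, best_pos1, best_pos2 = grid_search_two_qtl(toy_log_likelihood, grid)
```

Restricting the search to pairs with the first position smaller than the second avoids evaluating each pair twice; the step size of 1 cM is an assumption and trades resolution against computing time.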

Table 5 shows the estimated positions of the two QTL along with the empirical statistical powers. The A + E model reached 100% power at both test frequencies examined. The E model, however, had less power (60% at m = 5 and 94% at m = 10). The estimation errors of the parameters of the two models under different test frequencies also behaved as expected. Table 6 gives the average estimated QTL effects, which were all close to the true values used to simulate the data. Overall, the simulation results validated the method and the program.

TABLE 5.

Means and standard deviations (in parentheses) of the estimated QTL positions under the epistatic genetic effect models obtained from 100 replicated simulations

Model | Test frequency (m) | QTL1 | QTL2 | Power (0.05) (%)
Additive and epistatic | 5 | 23.23 (0.15) | 75.26 (0.14) | 100
Additive and epistatic | 10 | 23.26 (0.13) | 75.32 (0.12) | 100
Epistatic effect only | 5 | 23.2 (0.26) | 74.69 (0.27) | 60
Epistatic effect only | 10 | 23.43 (0.23) | 75.01 (0.27) | 94

The empirical statistical powers at a type I error rate of 0.05 are given in the last column. The true positions of QTL1 and QTL2 are 23 and 75 cM, respectively.

TABLE 6.

Means and standard deviations (in parentheses) of the estimated QTL effects obtained from 100 replicated simulations under the A + E model and the E model

Model | QTL | True value | Estimate (m = 5) | Estimate (m = 10)
Additive and epistatic (A + E) | QTL1 | 1.28 0.65 1.52 1.20 | 1.28 (0.01) 0.65 (0.01) 1.52 (0.02) 1.20 (0.02) | 1.26 (0.01) 0.66 (0.01) 1.53 (0.01) 1.21 (0.01)
Additive and epistatic (A + E) | QTL2 | 0.94 −0.64 0.73 0.72 | 0.95 (0.01) −0.59 (0.01) 0.72 (0.02) 0.71 (0.02) | 0.95 (0.01) −0.65 (0.01) 0.72 (0.01) 0.7 (0.01)
Additive and epistatic (A + E) | QTL1,2 | 0.65 −0.20 0.66 0.43 | 0.64 (0.01) −0.18 (0.01) 0.6 (0.02) 0.37 (0.02) | 0.66 (0.01) −0.17 (0.01) 0.67 (0.01) 0.44 (0.01)
Epistatic (E) | QTL1 | 0.00 0.00 0.00 0.00 | 0.01 (0.01) 0.00 (0.01) 0.01 (0.02) 0.01 (0.02) | 0.01 (0.01) −0.01 (0.01) 0.02 (0.01) 0.00 (0.01)
Epistatic (E) | QTL2 | 0.00 0.00 0.00 0.00 | −0.01 (0.01) 0.00 (0.01) 0.00 (0.02) −0.03 (0.02) | 0.00 (0.01) 0.00 (0.01) −0.01 (0.01) 0.00 (0.01)
Epistatic (E) | QTL1,2 | 0.65 −0.2 0.66 0.43 | 0.63 (0.01) −0.21 (0.01) 0.62 (0.02) 0.42 (0.02) | 0.62 (0.01) −0.22 (0.01) 0.64 (0.01) 0.42 (0.01)

QTL1 and QTL2 represent the additive effects of the two QTL and QTL1,2 represents the epistatic effects between the two QTL. Estimate (m = 5) and estimate (m = 10) represent the estimated effects from data of the two different test frequencies.

Example of real data analysis:

We used the data published by Ma et al. (2002) as an example to demonstrate the application of the new method to real data from a longitudinal trait. The trait was the growth of the stem diameter of poplar trees measured annually for 11 years (m = 11). The mapping population consisted of 78 progeny of a pseudobackcross family (Inline graphic progeny of a hybrid backcrossed to a parent of the hybrid). The method developed for the F2 design was revised to include only two genotypes per locus. The conditional probability of the QTL genotype given marker information was calculated using the multipoint method of Jiang and Zeng (1997). The data contained markers of chromosome 10 only, and thus only one chromosome was scanned.

Before conducting QTL mapping, we used the least-squares method to fit the phenotypic values of diameter growth to the logistic curve and to Legendre polynomials of orders 2, 3, and 4, respectively, for each individual. The average goodness of fit (R²) for the growth trajectories of the 78 individuals showed that the Legendre polynomial with Inline graphic fit the data as well as, or better than, the logistic curve. This justifies the orders chosen for the polynomial analysis in this study and the logistic analysis in Ma et al. (2002). Theoretically, the polynomial approach and the logistic analysis can be compared only if the shape of the growth curve is sigmoid. When the order of the polynomial is Inline graphic, the curve is linear or quadratic, which fits the tree diameter growth curve poorly. The tree diameter growth curve appeared to be S-shaped, and thus a valid comparison of the polynomial and the logistic analyses can be made.
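The per-individual least-squares fit and its R² can be sketched with NumPy's Legendre utilities. This is an illustration under assumptions, not the program used in this study: the growth values are synthetic (generated from a logistic curve with made-up parameters), time is mapped linearly onto [−1, +1], and NumPy's unnormalized Legendre polynomials are used as the basis.

```python
import numpy as np
from numpy.polynomial import legendre


def standardize_time(t):
    """Map the original time points linearly onto [-1, +1]."""
    t = np.asarray(t, dtype=float)
    return -1.0 + 2.0 * (t - t.min()) / (t.max() - t.min())


def fit_legendre(t, y, order):
    """Least-squares Legendre fit; returns the coefficients and R^2."""
    y = np.asarray(y, dtype=float)
    tau = standardize_time(t)
    coef = legendre.legfit(tau, y, deg=order)
    fitted = legendre.legval(tau, coef)
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coef, 1.0 - ss_res / ss_tot


# Synthetic S-shaped trajectory over 11 annual measurements (illustrative only).
years = np.arange(1, 12)
diameter = 25.0 / (1.0 + 20.0 * np.exp(-0.6 * years))

r2_by_order = {order: fit_legendre(years, diameter, order)[1] for order in (2, 3, 4)}
```

Because the fits are nested, R² cannot decrease as the order increases, which is why a penalized criterion is needed to choose the order.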

There are a maximum of 10 different orders of the polynomial to choose from for the m = 11 time points. We fit each of the orders and chose the one with the minimum Bayesian information criterion (BIC) (Schwarz 1978). The BIC for order r was calculated using

graphic file with name M200.gif (43)

where Inline graphic is the MLE of parameters for order r and Inline graphic is the dimension of θ (the number of independent parameters in the model). The BIC scores for the Legendre polynomials of orders 2, 3, 4, and 5 are 1332.6, 920.3, 768.0, and 809.6, respectively. Clearly, r = 4 is the best order. Therefore, all subsequent results are based on the r = 4 Legendre polynomial. Results of fitting other orders of the polynomial are not reported because of space limitations. However, the results of r = 3 and 5 were very close to those of r = 4 (data not shown).
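Order selection by minimum BIC can be sketched as follows. The `bic` function uses the usual Schwarz (1978) form, −2 ln L + k ln n; whether Equation 43 counts observations in exactly this way is an assumption. The dictionary holds the four BIC scores reported above.

```python
import math


def bic(log_likelihood, n_parameters, n_observations):
    """Schwarz criterion in its usual form: -2 ln L + k ln n (smaller is better)."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)


# BIC scores reported in the text for Legendre polynomials of orders 2 to 5.
reported_bic = {2: 1332.6, 3: 920.3, 4: 768.0, 5: 809.6}

# The preferred order is the one with the smallest BIC.
best_order = min(reported_bic, key=reported_bic.get)
```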

Figure 4 gives the likelihood-ratio test statistic profile for chromosome 10. Two peaks are evident in the profile. We conducted a permutation test (with 500 randomly reshuffled samples) to determine the empirical critical values for significance declaration. Both peaks passed the critical values, and thus both QTL were significant. The positions of the two QTL were estimated at 32 and 54 cM, respectively. The two QTL are designated as QTL32 and QTL54, respectively. The estimated QTL effects were

graphic file with name M207.gif

for QTL32 and

graphic file with name M208.gif

for QTL54. The QTL effect functions are expressed as

graphic file with name M209.gif

for QTL32 and

graphic file with name M210.gif

for QTL54. Note that dominance effects cannot be separated from the additive effects in a backcross design. Therefore, the estimated additive effects are actually confounded with the dominance effects. The functions of the two QTL effects are depicted in Figure 5; they show similar dynamic patterns of change. Both QTL appeared to be inactive until year 7, after which their effects increased until year 10, when a plateau was reached.

Figure 4.—

The likelihood-ratio test statistic profile of QTL detection for the diameter growth trajectory of poplar trees using the Legendre polynomial of order 4 proposed in this study. The two horizontal lines represent the empirical critical values at type I error rates of 0.01 (top) and 0.05 (bottom), respectively.
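The empirical critical values come from a permutation test of the kind described by Churchill and Doerge (1994): phenotypes are reshuffled among individuals, the genome scan is repeated, and the maximum statistic of each scan is recorded. The sketch below illustrates the idea only; `toy_scan` is a hypothetical stand-in for the likelihood-ratio scan (it uses a squared difference of genotype-class means on a single trait value), and the marker and trait data are randomly generated.

```python
import random


def permutation_threshold(phenotypes, genotypes, scan_statistic,
                          n_permutations=500, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum scan statistic under reshuffling."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the phenotype-genotype association
        maxima.append(scan_statistic(shuffled, genotypes))
    maxima.sort()
    index = min(len(maxima) - 1, int((1.0 - alpha) * len(maxima)))
    return maxima[index]


def toy_scan(phenotypes, marker_genotypes):
    """Largest squared difference between genotype-class means across markers."""
    best = 0.0
    for marker in marker_genotypes:
        g0 = [y for y, g in zip(phenotypes, marker) if g == 0]
        g1 = [y for y, g in zip(phenotypes, marker) if g == 1]
        if g0 and g1:
            diff = sum(g1) / len(g1) - sum(g0) / len(g0)
            best = max(best, diff * diff)
    return best


# Randomly generated backcross-like data: 78 individuals, 5 markers (illustrative).
data_rng = random.Random(1)
markers = [[data_rng.randint(0, 1) for _ in range(78)] for _ in range(5)]
traits = [data_rng.gauss(0.0, 1.0) for _ in range(78)]

threshold_5pct = permutation_threshold(traits, markers, toy_scan, alpha=0.05)
threshold_1pct = permutation_threshold(traits, markers, toy_scan, alpha=0.01)
```

For a longitudinal trait, each individual's whole phenotype vector would be reshuffled as a unit so that the correlation among repeated measurements is preserved.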

Figure 5.—

Additive effects of the two detected QTL expressed as functions of time.

For comparison, we analyzed the same data with the method of Ma et al. (2002) using FunMap (Ma et al. 2004), a public program for mapping longitudinal traits with the logistic regression method. The covariance structure specified in the program was AR(1), the first-order autoregressive model. The likelihood-ratio profile is given in Figure 6 (the shaded curve). Only one peak passed the critical value, and its location was at 14 cM. The BIC for this analysis was 915.8, which is greater than 768.0, the BIC of the Legendre polynomial of order 4. Therefore, the two methods seem to generate somewhat different results.

Figure 6.—

The likelihood-ratio test statistic profiles of QTL detection for the diameter growth trajectory of poplar trees using the Legendre polynomial of order 4 for QTL effects but fitted with the AR(1) covariance structure for the residual errors (thick lines) and the logistic regression method (thin lines). The two horizontal lines represent the empirical critical values of the two models at a type I error rate of 0.01.

Because the difference between the two methods may be due to the different covariance structures specified, we then modified our method to use the AR(1) covariance structure while still fitting the QTL effects to the Legendre polynomials. We found that the minimum BIC value still occurred when r = 4 (the BIC score was 906.0). This value is greater than 768.0, the BIC of the Legendre polynomial of order 4 with the general covariance structure (Inline graphic). The likelihood-ratio test statistic profile is depicted in Figure 6 (the solid curve). The profile has exactly the same shape as that of Ma et al. (2002). This showed that the difference in results from the two methods was partially due to the difference in the covariance structure. Judging by the BIC values (906.0 vs. 915.8), we may conclude that this hybrid method of the Legendre polynomial with the AR(1) covariance structure is slightly better than the method of Ma et al. (2002).
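For readers unfamiliar with the AR(1) structure, the sketch below builds the matrix in its standard form, with entry (j, k) equal to σ²ρ^|j−k|, and contrasts its parameter count with that of a fully unstructured matrix. The function names and the values of σ² and ρ are illustrative assumptions, not estimates from the poplar data.

```python
def ar1_covariance(n_times, sigma2, rho):
    """n_times x n_times matrix with entries sigma2 * rho ** |j - k|."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[sigma2 * rho ** abs(j - k) for k in range(n_times)]
            for j in range(n_times)]


def n_covariance_parameters(n_times, structure):
    """Number of free parameters under each covariance structure."""
    if structure == "ar1":
        return 2  # sigma2 and rho
    if structure == "unstructured":
        return n_times * (n_times + 1) // 2
    raise ValueError("unknown structure: " + structure)


# Small example with hypothetical sigma2 = 2.0 and rho = 0.5.
cov = ar1_covariance(3, sigma2=2.0, rho=0.5)

# With 11 time points, AR(1) uses 2 parameters and an unstructured matrix 66.
params_ar1 = n_covariance_parameters(11, "ar1")
params_full = n_covariance_parameters(11, "unstructured")
```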

DISCUSSION

We presented a polynomial approach to QTL mapping for longitudinal traits as opposed to the logistic regression approach (Ma et al. 2002). Both methods may be considered as dimension reduction techniques for multivariate analysis. With the polynomial approach, the dimension of the model can vary according to the goodness of fit of the model to the actual data. The BIC value was used to determine the optimal dimension. As a result, the polynomial model is more general and flexible than the logistic regression approach, which has a fixed dimension of three. More importantly, the polynomial model is linear in parameters, which allows well-developed linear model methodology to be fully utilized in the longitudinal trait study. Both methods benefit when the trait is measured more frequently. There is another method for functional data analysis that is quite similar to the polynomial approach, namely, the method of B-splines (de Boor 2001). B-splines are more commonly used in nonparametric data analysis to infer the empirical distribution of a variable and are rarely applied to quantitative genetics. We think that the polynomial approach is more intuitive and easier to understand. Once the polynomial approach is accepted by geneticists, we may further explore the B-splines approach and compare the two competing but similar methods.

For simplicity, we fit all model effects to the same basis (order) of the polynomials. In fact, different types of model effects may have different functional relationships with time. For example, the population mean may be linear in time, whereas the additive effect may be quadratic or cubic in time. This can be taken into account by describing each type of model effect by a polynomial with its own basis. The model with this flexibility may look like

graphic file with name M213.gif (44)

where Inline graphic is the basis with order Inline graphic, which may differ from Inline graphic and the orders of other effects. Allowing the random effects to have a different basis from other effects (Inline graphic) may be very desirable because this will introduce a different structured residual covariance matrix. Further discussion on the structured covariance matrix is given later.

If a major gene model sufficiently describes the data, the residual covariance matrix should have a simple form; e.g., Inline graphic or simply Inline graphic. Polygenic effects and other factors not included in the model may cause a more dense structure of the covariance matrix. Wu et al. (2002b, 2003, 2004) used AR(1), the first-order autoregressive covariance matrix, because AR(1) is one of the simplest structures that allows the authors to develop a nice EM algorithm. Extension to more complicated structures either is impossible or would take substantial effort. With the polynomial analysis, however, we are able to use Inline graphic, a general structured covariance matrix. We can choose the dimension of Inline graphic as low as zero so that Inline graphic and as high as Inline graphic so that Inline graphic (a fully unstructured covariance matrix). This flexibility cannot be enjoyed by the logistic growth curve fitting approach. The optimal dimension of Inline graphic may be found by evaluating the BIC estimates of different dimensions. Again, there is no theoretical difficulty with the polynomial analysis.

Many aspects of agricultural experiments are subject to high levels of uncertainty. Not all plants may have full measurements at all the calibrated time points. Occasionally, a plant may fail to be measured for the phenotype at a certain time point due to oversight of the researchers. This is a typical missing value problem. Some plants may die before they are fully measured for the entire term of the experiment. This is a typical problem of longitudinal study (Diggle et al. 1994). The EM algorithm was particularly designed to handle problems like this. So, there is no additional technique required to handle these missing value problems. One should still include the missing phenotype (as a symbolic quantity) in the model but simply replace all terms involving the missing phenotype by their conditional expectations and proceed with the usual EM iterations. The conditional expectation for the term involving a missing phenotype, say value Inline graphic, should be calculated on the basis of the parameters at the current iteration and a restriction Inline graphic, where Inline graphic are observed values before and after time point j. If the plant dies at time point j (longitudinal data analysis), the restriction should be Inline graphic for all Inline graphic. These restrictions apply only to growth traits. For other types of traits, say daily gain, such restrictions are not necessary.

An alternative approach for dealing with missing values may be more powerful than the aforementioned one. Instead of using the conditional expectations of the terms related to the missing phenotypes, we may model only the observed phenotypes and completely ignore the missing phenotypes. The complication encountered in this alternative approach is that the vector of phenotypic values for an individual with incomplete measurement has a dimension <Inline graphic. The general situation is that the dimension of vector Inline graphic varies across individuals. This will cause the dimension of matrix Inline graphic to vary, which will be reflected by using Inline graphic in place of Inline graphic. There is no theoretical difficulty in incorporating the property of variable dimension into the likelihood analysis. The model bearing this flexibility may be

graphic file with name M236.gif (45)

where Inline graphic now represents the basis of the polynomial with order r but constructed using only the time points with actual measurements for individual i. With the polynomial basis varying across individuals, we can fully take into account the missing value problem. This approach is sufficiently general to handle even more complicated experiments, such as those in which different individuals are measured at different time points during the entire term of the experiment. One should be cautious about standardizing the time point from the original scale to the [−1, +1] scale (see APPENDIX B for the definition of the standardized scale). The initial and ending time points used should be constant across individuals. In other words, one should use Inline graphic as the time that the measurement starts and Inline graphic as the time that the measurement ends during the entire experiment for all individual plants. Other technical problems may occur, which deserve further investigation. The primary objective of this study was to lay the foundation for the polynomial approach to longitudinal trait study. Technical improvements, which would digress from the main focus, are deferred to subsequent studies.
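Building an individual-specific basis from the observed time points only can be sketched as below. This is an illustration under assumptions: the unnormalized Legendre polynomials are generated by the standard three-term recurrence, `t_start` and `t_end` are hypothetical names for the experiment-wide initial and ending time points, and the pattern of missing records is invented.

```python
def legendre_row(tau, order):
    """Unnormalized Legendre polynomials P_0..P_order evaluated at tau."""
    values = [1.0]
    if order >= 1:
        values.append(tau)
    for k in range(1, order):
        # Recurrence: (k + 1) P_{k+1} = (2k + 1) tau P_k - k P_{k-1}
        values.append(((2 * k + 1) * tau * values[k] - k * values[k - 1]) / (k + 1))
    return values


def individual_basis(times, observed, order, t_start, t_end):
    """Rows of the basis for the observed time points of one individual.

    Time is standardized with the experiment-wide t_start and t_end, not with
    the individual's own first and last observed time points.
    """
    rows = []
    for t, is_observed in zip(times, observed):
        if is_observed:
            tau = -1.0 + 2.0 * (t - t_start) / (t_end - t_start)
            rows.append(legendre_row(tau, order))
    return rows


# Hypothetical individual: year 4 was missed and the tree died after year 9.
times = list(range(1, 12))
observed = [True] * 11
observed[3] = False
observed[9] = observed[10] = False

basis_i = individual_basis(times, observed, order=4, t_start=1, t_end=11)
```

The last observed row corresponds to a standardized time of 0.6 rather than +1, which is the point of keeping the initial and ending times constant across individuals.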

We have demonstrated that if the growth curve is indeed sigmoid, the proposed polynomial analysis can still be used because of the great generality of the method. The flexibility of the polynomial analysis in terms of incorporating structured residual covariance matrices makes the method preferable to the logistic approach, which fits only an AR(1) covariance structure. One criticism of the polynomial approach when applied to the sigmoid curve is the interpretability of the polynomial regression coefficients. In logistic models, each parameter has a biological meaning, whereas that is not the case when using a polynomial. However, a logistic curve can be described by a polynomial with order 3 Inline graphic. If a growth curve is indeed fitted with such a polynomial, we can find the functional relationships of the parameters in the logistic curve with respect to the parameters in the polynomial. By doing so, we can interpret the biological meanings of the polynomial parameters. We take the following approach to infer the functional relationships between the logistic parameters and the polynomial parameters. First, we note that the actual shape of a logistic curve is also determined by the initial value of the curve at time zero and the value of the curve at the inflection point. This implies that we can express the logistic parameters (a, b, r) as functions of the two coordinates (the initial and the inflection points) because both determine the shape of the curve. We then find the functional relationships of the values of the two coordinates with respect to the polynomial parameters Inline graphic. Combining the two steps, we can eventually find the relationships between parameters of the two types of curves, i.e., Inline graphic, Inline graphic, and Inline graphic. Derivations of the three functions are provided in APPENDIX C. We now have expressed the logistic parameters as functions of the polynomial parameters. Although the biological meaning of each polynomial regression coefficient alone is not obvious, different combinations (functions) of the polynomial parameters determine the parameters of the logistic curve, which are readily interpretable biologically.

Acknowledgments

We are grateful to two anonymous reviewers for their comments and suggestions, which greatly assisted the revision of this manuscript. This research was supported by the National Institutes of Health grant R01-GM55321, the National Science Foundation grant DBI-0345205 to S.X., and the Chinese National Natural Science Foundation grant 30471236 to R.Y.

APPENDIX A: DERIVATIONS OF QTL VARIANCES IN AN F2 POPULATION

The linear model for the phenotypic value at time τ is

graphic file with name M259.gif (A1)

Definitions of the symbols are given in the main text (see Equation 1). When α(τ) and δ(τ) are treated as fixed effects, the genetic effect,

graphic file with name M260.gif (A2)

has a variance of

graphic file with name M261.gif (A3)

Define Inline graphic, Inline graphic, and Inline graphic as the theoretical variances of the effects in question across individuals within the F2 population. Under the assumption of no segregation distortion, the ratio of the three genotypes (AA, Aa, and aa) is expected to be 1:2:1. On the basis of the definitions of variables x and z presented in the text, we can show that

graphic file with name M266.gif (A4)
graphic file with name M267.gif (A5)

and

graphic file with name M268.gif (A6)

Therefore, the genetic variance contributed by the major gene at time τ is

graphic file with name M269.gif (A7)

The particular values of Inline graphic defined for an F2 population are due to the special way we defined the values of variables x and z for the three genotypes. One can define the values of x and z in any arbitrary way without affecting the results of the analysis. The way we chose for the definitions of x and z is adopted from Xu (1998). In fact, a mathematically more attractive way to define x and z is Inline graphic and Inline graphic for genotypes Inline graphic. The x defined this way appears to be odd, which explains why this system has not been adopted in the literature. However, one can prove that Inline graphic and Inline graphic, leading to

graphic file with name M276.gif (A8)

which is the mathematical attractiveness of the new system. One should be cautious in adopting a new system for the definitions of x and z. If x and z are not orthogonal, then Inline graphic must be incorporated into the formula of genetic variance. Furthermore, if segregation distortion has occurred, the theoretical variances and covariance of variables x and z are not known. One may estimate the variances and covariance from the observed data and use the estimated values instead.
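The variances and covariance of x and z under a 1:2:1 genotype ratio can be checked with exact arithmetic. The sketch below assumes, purely for illustration, the coding x = (1, 0, −1) and z = (0, 1, 0) for genotypes (AA, Aa, aa); this coding is an assumption and may differ from the definitions given in the main text. The variance of the genetic effect follows from the general identity Var(xα + zδ) = α²Var(x) + δ²Var(z) + 2αδCov(x, z).

```python
from fractions import Fraction


def genotype_moments(x_codes, z_codes, frequencies):
    """Var(x), Var(z), and Cov(x, z) for a given coding and genotype frequencies."""
    mean_x = sum(f * x for f, x in zip(frequencies, x_codes))
    mean_z = sum(f * z for f, z in zip(frequencies, z_codes))
    var_x = sum(f * (x - mean_x) ** 2 for f, x in zip(frequencies, x_codes))
    var_z = sum(f * (z - mean_z) ** 2 for f, z in zip(frequencies, z_codes))
    cov_xz = sum(f * (x - mean_x) * (z - mean_z)
                 for f, x, z in zip(frequencies, x_codes, z_codes))
    return var_x, var_z, cov_xz


def qtl_variance(alpha, delta, var_x, var_z, cov_xz):
    """Variance of x*alpha + z*delta at one time point."""
    return alpha ** 2 * var_x + delta ** 2 * var_z + 2 * alpha * delta * cov_xz


# Expected F2 genotype frequencies for AA, Aa, aa without segregation distortion.
f2_frequencies = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# Hypothetical coding, assumed for illustration only.
var_x, var_z, cov_xz = genotype_moments([1, 0, -1], [0, 1, 0], f2_frequencies)
variance_example = qtl_variance(Fraction(2), Fraction(1), var_x, var_z, cov_xz)
```

Under segregation distortion, the same function can be applied with the observed genotype frequencies in place of 1/4, 1/2, and 1/4, in which case the covariance term need not vanish.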

APPENDIX B: BASIS OF THE ORTHOGONAL POLYNOMIAL

Orthogonal polynomial analysis is a general way to describe the functional relationship of one variable relative to another variable. The dependent variable is usually stochastic and follows some kind of distribution. The independent variable is usually calibrated by some fixed time points, which are controlled by the investigator. One can choose the order of the polynomial as high as one less than the total number of fixed time points. The model can then fully describe the data with no error, but such a model has no general use. In contrast, one can choose the order of the polynomial as low as 0, leading to a constant, again a trivial result. If one chooses the order of 1, the functional relationship becomes linear in time. The order of 2 describes a quadratic function, and so on. The rule of thumb is to choose the order as low as possible, provided that the model with this order is able to describe the data sufficiently. In biological science, most curves may be described by an order of less than 5. A curve with an order greater than 5 is difficult to interpret.

The first step in the orthogonal polynomial analysis is to convert the original time point t into a standardized time point τ using the following expression,

graphic file with name M278.gif (B1)

where Inline graphic is the initial time point and Inline graphic is the ending time point. Therefore, time point τ used in the entire text is the standardized time point. Recall that if we choose a basis with order r, then

graphic file with name M281.gif (B2)

The kth Inline graphic component of ψ(τ) is defined as

graphic file with name M283.gif (B3)

where Inline graphic is an integer function. The first five terms of the orthogonal polynomial are Inline graphic, Inline graphic, Inline graphic, Inline graphic, and Inline graphic, respectively (Schaeffer 2004).
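For orientation, the standard forms of the time standardization and of the first five Legendre polynomials are written out below. This is a sketch under assumptions: t_min and t_max are illustrative names for the initial and ending time points, P_k denotes the unnormalized Legendre polynomial, and φ_k is the normalized version commonly used in random regression models. Which of the two forms corresponds to the components of ψ(τ) in Equation B3 is not confirmed here.

```latex
% Standardization of the original time t onto [-1, +1]
\tau = -1 + \frac{2\,(t - t_{\min})}{t_{\max} - t_{\min}}

% Unnormalized Legendre polynomials of orders 0 to 4
P_0(\tau) = 1, \qquad
P_1(\tau) = \tau, \qquad
P_2(\tau) = \tfrac{1}{2}\left(3\tau^{2} - 1\right), \qquad
P_3(\tau) = \tfrac{1}{2}\left(5\tau^{3} - 3\tau\right), \qquad
P_4(\tau) = \tfrac{1}{8}\left(35\tau^{4} - 30\tau^{2} + 3\right)

% Normalized version, with unit integral of the square over [-1, +1]
\phi_k(\tau) = \sqrt{\frac{2k + 1}{2}}\; P_k(\tau)
```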

APPENDIX C: RELATIONSHIP BETWEEN LOGISTIC PARAMETERS AND POLYNOMIAL PARAMETERS

The logistic curve is

graphic file with name M290.gif (C1)

where Inline graphic is chosen for convenience. The initial coordinate is Inline graphic and the coordinate at the inflection point is Inline graphic, which is found by setting Inline graphic and solving for Inline graphic. Letting Inline graphic be the solution for equation Inline graphic, we obtain Inline graphic using Equation C1. Given Inline graphic, we now have three equations with three unknowns (logistic parameters),

graphic file with name M300.gif (C2)

The solutions of Equations C2 are

graphic file with name M301.gif (C3)
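The step from the two coordinates to the logistic parameters can be sketched numerically. The code assumes the common three-parameter form g(t) = a / (1 + b·exp(−rt)) with time starting at zero; for this form the inflection point lies at t* = ln(b)/r, where the curve equals a/2. The function names and the input values are illustrative, and the parameterization in Equation C1 may differ in detail.

```python
import math


def logistic(t, a, b, r):
    """Three-parameter logistic growth curve a / (1 + b * exp(-r * t))."""
    return a / (1.0 + b * math.exp(-r * t))


def logistic_from_coordinates(y_initial, t_inflection, y_inflection):
    """Recover (a, b, r) from the value at t = 0 and the inflection point."""
    a = 2.0 * y_inflection            # the curve equals a / 2 at the inflection
    b = a / y_initial - 1.0           # from g(0) = a / (1 + b)
    if b <= 1.0 or t_inflection <= 0.0:
        raise ValueError("an inflection at t > 0 requires b > 1")
    r = math.log(b) / t_inflection    # from t* = ln(b) / r
    return a, b, r


# Hypothetical coordinates: initial value 1.0; inflection at t = 5 with value 10.
a, b, r = logistic_from_coordinates(y_initial=1.0, t_inflection=5.0, y_inflection=10.0)
```

Evaluating `logistic(0.0, a, b, r)` and `logistic(5.0, a, b, r)` returns the two input values, which confirms that the three parameters reproduce both coordinates.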

We now evaluate the polynomial curve of order 3,

graphic file with name M302.gif (C4)

The coordinate of the initial point is Inline graphic i.e.,

graphic file with name M304.gif (C5)

The second derivative of Inline graphic with respect to τ is

graphic file with name M306.gif (C6)

Setting Inline graphic and solving for τ, we get Inline graphic. Substituting Inline graphic into Equation C4, we have

graphic file with name M310.gif (C7)

We now convert Inline graphic (time point in the standardized scale) back into Inline graphic (time point in the original scale) by

graphic file with name M313.gif (C8)

The simplification is due to Inline graphic. Substituting Inline graphic (Equation C5), Inline graphic (Equation C7), and Inline graphic (Equation C8) into Equations C3, we obtain

graphic file with name M318.gif (C9)
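The steps of this appendix can be traced numerically with a minimal Python sketch. It rests on several assumptions that are not confirmed by the text: the logistic curve is taken to be y(t) = a / (1 + b·exp(-r·t)), the initial time point is 0, and the cubic curve is expressed in normalized Legendre polynomials with scaling sqrt((2k + 1)/2). The function name `cubic_to_logistic` and its arguments are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial import legendre as L
from numpy.polynomial import polynomial as P


def cubic_to_logistic(beta, t_max):
    """Convert order-3 polynomial coefficients to logistic parameters.

    beta  : four coefficients of the normalized Legendre basis (assumed).
    t_max : ending time point on the original scale; the initial time
            point is assumed to be 0.
    Returns (a, b, r) of the assumed curve y(t) = a / (1 + b * exp(-r * t)).
    """
    beta = np.asarray(beta, dtype=float)
    if beta.shape != (4,) or beta[3] == 0.0:
        raise ValueError("need four coefficients with a nonzero cubic term")

    # Rewrite the curve in the power basis: y = c0 + c1*tau + c2*tau^2 + c3*tau^3.
    norm = np.sqrt((2 * np.arange(4) + 1) / 2.0)
    c = L.leg2poly(beta * norm)

    # Initial point: tau = -1 on the standardized scale.
    y0 = P.polyval(-1.0, c)

    # Inflection point: second derivative 2*c2 + 6*c3*tau = 0.
    tau_i = -c[2] / (3.0 * c[3])
    y_i = P.polyval(tau_i, c)

    # Back to the original time scale (initial time point assumed to be 0).
    t_i = (tau_i + 1.0) * t_max / 2.0

    if not (-1.0 < tau_i <= 1.0 and y_i > y0 > 0.0):
        raise ValueError("curve has no usable S-shaped inflection in range")

    # Logistic facts used: y(0) = a / (1 + b), inflection at t = ln(b) / r
    # with height a / 2.
    a = 2.0 * y_i
    b = a / y0 - 1.0
    r = math.log(b) / t_i
    return a, b, r


if __name__ == "__main__":
    # Cubic 5 + 4*tau - tau^3, converted to the normalized Legendre basis.
    norm = np.sqrt((2 * np.arange(4) + 1) / 2.0)
    beta = L.poly2leg([5.0, 4.0, 0.0, -1.0]) / norm
    print(cubic_to_logistic(beta, t_max=10.0))
```

In the usage example the cubic has initial value 2 and an inflection at the midpoint of the time range with height 5, so the matched logistic curve has asymptote a = 10, b = 4, and r = ln(4)/5. The match is exact only at the initial and inflection points; elsewhere the two curves generally differ.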

References

  1. Beavis, W. D., 1994. The power and deceit of QTL experiments: lessons from comparative QTL studies, pp. 250–266 in Proceedings of the Forty-Ninth Annual Corn & Sorghum Industry Research Conference. American Seed Trade Association, Washington, DC.
  2. Churchill, G. A., and R. W. Doerge, 1994. Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971.
  3. De Boor, C., 2001. A Practical Guide to Splines. Springer, New York.
  4. Dempster, A. P., N. M. Laird and D. B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39: 1–38.
  5. Diggle, P. J., K. Y. Liang and S. L. Zeger, 1994. Analysis of Longitudinal Data. Oxford Science Publications/Clarendon Press, Oxford.
  6. Eaves, L. J., M. C. Neale and H. Maes, 1996. Multivariate multipoint linkage analysis of quantitative trait loci. Behav. Genet. 26: 519–525.
  7. Elston, R. C., and J. Stewart, 1971. A general model for the genetic analysis of pedigree data. Hum. Hered. 21: 523–542.
  8. Emebiri, L. C., M. E. Devey, A. C. Matheson and M. U. Slee, 1998. Age-related changes in the expression of QTLs for growth in radiata pine seedlings. Theor. Appl. Genet. 97: 1053–1061.
  9. Haley, C. S., and S. A. Knott, 1992. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315–324.
  10. Henderson, C. R., 1982. Analysis of covariance in the mixed model: higher-level, nonhomogeneous, and random regressions. Biometrics 38: 623–640.
  11. Henderson, C. R., 1986. Recent developments in variance and covariance estimation. J. Anim. Sci. 63: 208–216.
  12. Jansen, R. C., 1993. Interval mapping of multiple quantitative trait loci. Genetics 135: 205–211.
  13. Jensen, J., 2001. Genetic evaluation of dairy cattle using test day models. J. Dairy Sci. 84: 2803–2812.
  14. Jiang, C., and Z-B. Zeng, 1995. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111–1127.
  15. Jiang, C., and Z-B. Zeng, 1997. Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101: 47–58.
  16. Kao, C. H., Z-B. Zeng and R. D. Teasdale, 1999. Multiple interval mapping for quantitative trait loci. Genetics 152: 1203–1216.
  17. Kirkpatrick, M., D. Lofsvold and M. Bulmer, 1990. Analysis of the inheritance, selection and evolution of growth trajectories. Genetics 124: 979–993.
  18. Knott, S. A., and C. S. Haley, 2000. Multitrait least squares for quantitative trait loci detection. Genetics 156: 899–911.
  19. Korol, A. B., Y. I. Ronin and V. M. Kirzhner, 1995. Interval mapping of quantitative trait loci employing correlated trait complexes. Genetics 140: 1137–1147.
  20. Korol, A. B., Y. I. Ronin, A. M. Itskovich, J. Peng and E. Nevo, 2001. Enhanced efficiency of quantitative trait loci mapping analysis based on multivariate complexes of quantitative traits. Genetics 157: 1789–1803.
  21. Lander, E. S., and D. Botstein, 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
  22. Ma, C. X., G. Casella and R. L. Wu, 2002. Functional mapping of quantitative trait loci underlying the character process: a theoretical framework. Genetics 161: 1751–1762.
  23. Ma, C.-X., R. L. Wu and G. Casella, 2004. FunMap: functional mapping of complex traits. Bioinformatics 20: 1808–1811.
  24. Macgregor, S., S. A. Knott, I. White and P. M. Visscher, 2005. Quantitative trait locus analysis of longitudinal quantitative trait data in complex pedigrees. Genetics 171: 1365–1376.
  25. Mangin, B., P. Thoquet and N. Grimsley, 1998. Pleiotropic QTL analysis. Biometrics 54: 88–99.
  26. Morton, N. E., and C. J. MacLean, 1974. Analysis of family resemblance. III. Complex segregation of quantitative traits. Am. J. Hum. Genet. 26: 489–503.
  27. Meyer, K., 2000. Random regressions to model phenotypic variation in monthly weights of Australian beef cows. Livest. Prod. Sci. 65: 19–38.
  28. Meyer, K., and W. G. Hill, 1997. Estimation of genetic and phenotypic covariance functions for longitudinal or ‘repeated’ records by restricted maximum likelihood. Livest. Prod. Sci. 47: 185–200.
  29. Misztal, I., T. Strabel, J. Jamrozik, E. A. Mantysaari and T. H. Meuwissen, 2000. Strategies for estimating the parameters needed for different test day models. J. Dairy Sci. 83: 1125–1134.
  30. Nelder, J. A., and R. Mead, 1965. A simplex method for function minimization. Comput. J. 7: 308–313.
  31. Nuzhdin, S. V., and E. G. Pasyukova, 1997. Sex-specific quantitative trait loci affecting longevity in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 94: 9734–9739.
  32. Ronin, Y. I., V. M. Kirzhner and A. B. Korol, 1995. Linkage between loci of quantitative traits and marker loci: multitrait analysis with a single marker. Theor. Appl. Genet. 90: 776–786.
  33. Schaeffer, L. R., 2004. Application of random regression models in animal breeding. Livest. Prod. Sci. 86: 35–45.
  34. Schwarz, G., 1978. Estimating the dimension of a model. Ann. Stat. 6: 461–464.
  35. Swalve, H. H., 2000. Theoretical basis and computational methods for different test-day genetic evaluation methods. J. Dairy Sci. 83: 1115–1124.
  36. Verhaegen, D., C. Plomion, J. M. Gion, M. Poitel and P. Costa, 1997. Quantitative trait dissection analysis in Eucalyptus using RAPD markers. 1. Detection of QTL in interspecific hybrid progeny, stability of QTL expression across different ages. Theor. Appl. Genet. 95: 597–608.
  37. West, G. B., J. H. Brown and B. J. Enquist, 2001. A general model for ontogenetic growth. Nature 413: 628–631.
  38. Wiltshire, S., A. P. Morris, M. I. McCarthy and L. R. Cardon, 2005. How useful is the fine-scale mapping of complex trait linkage peaks? Evaluating the impact of additional microsatellite genotyping on the posterior probability of linkage. Genet. Epidemiol. 28: 1–10.
  39. Wu, R. L., C. X. Ma, J. Zhu and G. Casella, 2002a. Mapping epigenetic quantitative trait loci (QTL) altering a developmental trajectory. Genome 45: 28–33.
  40. Wu, R. L., C. X. Ma, M. Chang, R. C. Littell, S. S. Wu et al., 2002b. A logistic mixture model for characterizing genetic determinants causing differentiation in growth trajectories. Genet. Res. 79: 235–245.
  41. Wu, R. L., C. X. Ma, M. Chang, C. K. Mark, R. C. Littell et al., 2003. Quantitative trait loci for growth trajectories in Populus. Genet. Res. 81: 51–64.
  42. Wu, R. L., C. X. Ma, M. Lin and G. Casella, 2004. A general framework for analyzing the genetic architecture of developmental characteristics. Genetics 166: 1541–1551.
  43. Wu, W., Y. Zhou, W. Li, D. Mao and Q. Chen, 2002. Mapping of quantitative trait loci based on growth models. Theor. Appl. Genet. 105: 1043–1049.
  44. Wu, W. R., W. M. Li, D. Z. Tang, H. R. Lu and A. J. Worland, 1999. Time-related mapping of quantitative trait loci underlying tiller number in rice. Genetics 151: 297–303.
  45. Xu, S., 1998. Further investigation on the regression method of mapping quantitative trait loci. Heredity 80: 364–373.
  46. Xu, S., 2003. Theoretical basis of the Beavis effect. Genetics 165: 2259–2268.
  47. Yan, J. Q., J. Zhu, C. X. He, M. Benmoussa and P. Wu, 1998a. Quantitative trait loci analysis for the developmental behavior of tiller number in rice (Oryza sativa L.). Theor. Appl. Genet. 97: 267–274.
  48. Yan, J. Q., J. Zhu, C. X. He, M. Benmoussa and P. Wu, 1998b. Molecular dissection of developmental behavior of plant height in rice (Oryza sativa L.). Genetics 150: 1257–1265.
  49. Zeng, Z-B., 1994. Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.

Articles from Genetics are provided here courtesy of Oxford University Press
