Author manuscript; available in PMC: 2013 Dec 9.
Published in final edited form as: Stat Appl Genet Mol Biol. 2012 Nov 22;11(6):10.1515/1544-6115.1675. doi: 10.1515/1544-6115.1675

A Maximum Likelihood Approach to Functional Mapping of Longitudinal Binary Traits

Chenguang Wang, Hongying Li, Zhong Wang, Yaqun Wang, Ningtao Wang, Zuoheng Wang, Rongling Wu
PMCID: PMC3856886  NIHMSID: NIHMS483180  PMID: 23183762

Abstract

Despite their importance in biology and biomedicine, genetic mapping of binary traits that change over time has not been well explored. In this article, we develop a statistical model for mapping quantitative trait loci (QTLs) that govern longitudinal responses of binary traits. The model is constructed within the maximum likelihood framework, in which the association between binary responses is modeled in terms of conditional log odds-ratios. With this parameterization, the maximum likelihood estimates (MLEs) of marginal mean parameters are robust to the misspecification of time dependence. We implement an iterative procedure to obtain the MLEs of QTL genotype-specific parameters that define longitudinal binary responses. The usefulness of the model was validated by analyzing a real example in rice. Simulation studies were performed to investigate the statistical properties of the model, showing that the model has power to identify and map specific QTLs responsible for the temporal pattern of binary traits.

Keywords: binary trait, dynamic trait, functional mapping, maximum likelihood estimate

1 Introduction

Many traits important to biology and biomedicine arise in a dichotomous or binary form. For example, the fusiform rust disease in loblolly pine is described as presence or absence of the formation of galls. Although a binary trait has only two states (yes or no), this type of trait may involve a polygenic, complex component. Genetic mapping based on DNA-based molecular markers has proven to be powerful for detecting and mapping specific genes, known as quantitative trait loci (QTLs), that control a complex trait. Since Lander and Botstein’s (1989) interval mapping was published, there has been tremendous growth in the development of statistical models for QTL mapping (see Wu et al. 2007; Broman and Sen 2009). Xu and colleagues were among the first to develop a series of statistical approaches for mapping binary traits within the maximum likelihood or Bayesian contexts (Xu and Atchley 1996; Yi and Xu 1999a, 1999b, 2000; Xu et al. 2003, 2005). Manichaikul and Broman (2009) incorporated selective genotyping approaches into QTL mapping for binary traits. Visscher et al. (1996) used stochastic simulation to investigate the behavior of binary trait mapping in backcross and F2 populations. Deng et al. (2006) developed finite logistic regression mixture models for interval mapping of binary traits. Kadarmideen et al. (2001) extended Xu and Atchley’s (1996) approach to consider a binary mapping method for outbred lines.

Although there is a considerable body of literature on binary mapping, a significant gap in the area is the lack of dynamic models for mapping QTLs involved in longitudinal processes of binary traits. Longitudinal binary response data are common in many fields from plant and animal breeding to biomedical research. For example, the presence or absence of mastitis is longitudinally recorded in the course of lactation to breed healthy and productive cows (Rekaya et al. 2003; Hinrichs et al. 2011). To test the efficacy of hypnotic drugs in controlling insomnia, a sleep disorder that pervades the human population, a longitudinal clinical trial is designed in which how quickly patients fall asleep after being treated with the drugs is viewed as a binary response, recorded repeatedly in a time course (Ghahroodi et al. 2010). For longitudinal continuous responses, a statistical model, called functional mapping, has been developed to map their genetic architecture (Ma et al. 2002; Wu and Lin 2006; Li and Wu 2010). Functional mapping uses mathematical functions that describe particular biological processes, such as growth trajectories, to quantify the temporal-spatial pattern of genetic control based on genotype-specific mathematical parameters. It has been extended to take advantage of statistical modeling of longitudinal mean-covariance structures using various parametric, nonparametric, or semiparametric approaches, and is thereby equipped with power to test biologically meaningful hypotheses with a minimum set of parameters. It has been widely used to map QTLs that govern body mass growth in mice (Wu et al. 2005) and humans (Li et al. 2009) and, recently, integrated with field experimental designs to detect QTLs controlling plant stem height and root growth trajectories in rice (Zhao et al. 2004), soybean (Li et al. 2010; Wu et al. 2011) and poplar (Zhang et al. 2009).

In this article, we propose a statistical model for mapping QTLs for longitudinal binary traits in an experimental cross. Genetic mapping of binary traits considered at a single time point has been shown to be more difficult than that of continuous counterparts (Hackett and Weller 1995; Lange and Whittaker 2001; Manichaikul and Broman 2009). Mapping longitudinal changes of binary traits, therefore, presents a greater challenge than the continuous-trait mapping described in previous functional mapping models. In longitudinal studies, we measure repeated observations of a response variable and a set of covariates on the same individual over time, so that the response variables are usually correlated. This dependence must be accounted for in order to make correct inferences. To incorporate longitudinal binary traits into a mixture model for functional mapping, we used Fitzmaurice and Laird’s (1993) approach for modeling the association between binary responses in terms of conditional log odds-ratios. We show that functional mapping combined with Fitzmaurice and Laird’s model provides great power to map dynamic QTLs for binary traits. We implement the maximum likelihood approach and EM algorithm to estimate and test QTL genotype-specific parameters that define binary dynamics. Simulation studies were used to investigate the statistical properties of the model and validate its usefulness.

2 Methods

2.1 Experimental design

We consider a population derived from a cross between two parental inbred lines P1 and P2, differing substantially in a longitudinal binary trait of interest. Let markers M, with alleles M and m, and N, with alleles N and n, denote two flanking markers for an interval where a putative QTL is tested. A cross between two parents P1 and P2 is performed to produce an F1 population. The F1 progeny are all heterozygotes with the same genotype MN/mn. The F1 individual is crossed back to P1 or P2 to produce a backcross population. There are four possible marker genotypes in the backcross population. Consider an unobserved QTL Q with alleles Q and q that is located in the interval flanked by markers M and N. The distribution of unobserved QTL genotypes can be inferred from the observed flanking marker genotypes according to the recombination frequencies between them (Wu et al. 2007). The conditional probabilities of the QTL genotypes given marker genotypes are given in Table 1.

Table 1.

Conditional probabilities of a putative QTL given the flanking marker genotypes for a backcross population

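| Marker genotype | Expected frequency | QTL genotype Qq | QTL genotype qq |
|---|---|---|---|
| MN/mn | (1 − r)/2 | (1 − rMQ)(1 − rQN)/(1 − r) | rMQ rQN/(1 − r) |
| Mn/mn | r/2 | (1 − rMQ) rQN/r | rMQ (1 − rQN)/r |
| mN/mn | r/2 | rMQ (1 − rQN)/r | (1 − rMQ) rQN/r |
| mn/mn | (1 − r)/2 | rMQ rQN/(1 − r) | (1 − rMQ)(1 − rQN)/(1 − r) |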

rMQ, rQN are the recombination fractions between the left marker M and the putative QTL, the putative QTL and the right marker N, respectively. The recombination fraction between the two flanking markers is r.
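As a minimal sketch of how the entries of Table 1 can be computed, the function below returns the conditional QTL genotype probabilities for each flanking-marker genotype. It assumes no crossover interference (as stated later in Section 2.5), so that r = rMQ + rQN − 2 rMQ rQN; the function name and dictionary layout are illustrative choices, not part of the original work.

```python
def qtl_conditional_probs(r_mq, r_qn):
    """Return {marker genotype: (Pr(Qq), Pr(qq))} for a backcross interval.

    Assumes no crossover interference, so the recombination fraction
    between the two flanking markers is r = r_mq + r_qn - 2*r_mq*r_qn.
    """
    r = r_mq + r_qn - 2.0 * r_mq * r_qn
    nonrec = 1.0 - r
    return {
        "MN/mn": ((1 - r_mq) * (1 - r_qn) / nonrec, r_mq * r_qn / nonrec),
        "Mn/mn": ((1 - r_mq) * r_qn / r, r_mq * (1 - r_qn) / r),
        "mN/mn": (r_mq * (1 - r_qn) / r, (1 - r_mq) * r_qn / r),
        "mn/mn": (r_mq * r_qn / nonrec, (1 - r_mq) * (1 - r_qn) / nonrec),
    }


# Example: a putative QTL closer to the left marker than to the right one.
for genotype, (p_Qq, p_qq) in qtl_conditional_probs(0.10, 0.15).items():
    print(genotype, round(p_Qq, 4), round(p_qq, 4))
```

Under the no-interference assumption each row sums to one, which is a quick sanity check on any implementation of the table.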

2.2 Genetic model

The two QTL genotypes in the backcross population, Qq and qq, each have a frequency of 1/2. The genetic model for a QTL

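$$G = \begin{bmatrix} G_1 \\ G_0 \end{bmatrix} = \begin{bmatrix} \mu + \tfrac{1}{2}a \\ \mu - \tfrac{1}{2}a \end{bmatrix} \tag{1}$$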

was proposed to specify the relation between a genotypic value G and the genetic parameters, overall mean (μ) and additive effect (a). G1 and G0 denote the genotypic values of genotypes Qq (1) and qq (0), respectively. If two QTLs are considered for epistatic modeling, the genetic model is written as

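$$G = \begin{bmatrix} G_{11} \\ G_{10} \\ G_{01} \\ G_{00} \end{bmatrix} = \begin{bmatrix} \mu + \tfrac{1}{2}a_1 + \tfrac{1}{2}a_2 + \tfrac{1}{4}i_{aa} \\ \mu + \tfrac{1}{2}a_1 - \tfrac{1}{2}a_2 - \tfrac{1}{4}i_{aa} \\ \mu - \tfrac{1}{2}a_1 + \tfrac{1}{2}a_2 - \tfrac{1}{4}i_{aa} \\ \mu - \tfrac{1}{2}a_1 - \tfrac{1}{2}a_2 + \tfrac{1}{4}i_{aa} \end{bmatrix} \tag{2}$$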

where a1 and a2 are the additive effects of the first and second QTL, respectively, and iaa is the additive × additive epistatic effect between the two QTLs.
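The two-QTL model (2) is a linear map between the four genotypic values and the parameters (μ, a1, a2, iaa), so it can be inverted in closed form. The sketch below illustrates both directions; the function names are hypothetical and the inversion formulas follow directly from adding and subtracting the rows of (2).

```python
def two_qtl_genotypic_values(mu, a1, a2, iaa):
    """Genotypic values G[(j1, j2)] of model (2); 1 = Qq, 0 = qq at each QTL."""
    return {
        (1, 1): mu + a1 / 2 + a2 / 2 + iaa / 4,
        (1, 0): mu + a1 / 2 - a2 / 2 - iaa / 4,
        (0, 1): mu - a1 / 2 + a2 / 2 - iaa / 4,
        (0, 0): mu - a1 / 2 - a2 / 2 + iaa / 4,
    }


def solve_two_qtl_effects(G):
    """Recover (mu, a1, a2, iaa) from the four genotypic values."""
    g11, g10, g01, g00 = G[(1, 1)], G[(1, 0)], G[(0, 1)], G[(0, 0)]
    mu = (g11 + g10 + g01 + g00) / 4
    a1 = (g11 + g10 - g01 - g00) / 2   # contrast between genotypes at QTL 1
    a2 = (g11 - g10 + g01 - g00) / 2   # contrast between genotypes at QTL 2
    iaa = g11 - g10 - g01 + g00        # interaction contrast
    return mu, a1, a2, iaa


G = two_qtl_genotypic_values(mu=1.0, a1=0.5, a2=-0.25, iaa=0.125)
print(solve_two_qtl_effects(G))
```

Applying the same contrasts at each time point is one way to obtain time-dependent effects from time-specific genotypic values.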

2.3 Data structure

The data of QTL mapping consists of two parts: phenotypic data and marker information. Often, we also observe covariate information for each progeny; for example, sex or age. We assume that binary response for each of N individuals is observed at T time points. Then, we generate a T × 1 vector Yi = (Yi1, …, YiT )T, where binary random variable Yit = 1 if progeny i has response 1, i.e., presence, at time t, and 0 otherwise. Also, marker information Mi for progeny i is observed for L loci on a linkage group. Let Mi = (Mi1, …, MiL), where Mil = 1 if the marker is heterozygous at locus l and 0 if the marker is homozygous at locus l. Each progeny has a J × 1 covariate vector Xit at time t, and we let Xi = (Xi1, …, XiT )T represent the T × J matrix of covariates for progeny i. Thus, the data for progeny i includes marker and phenotypic observations in (Yi, Mi, Xi).

2.4 Multivariate model for binary responses

Much of this part is derived from Fitzmaurice and Laird’s (1993) work. First, we describe the statistical model for longitudinal binary responses Y = (Y1, …, YT )T. Let X = (X1, …, XT )T denote the matrix of covariates for response Y, where Xt is a J × 1 covariate vector corresponding to response Yt. The marginal distribution of Yt is binary, expressed as

$$f(Y_t \mid X_t) = \exp\left[Y_t \varepsilon_t - \log\{1 + \exp(\varepsilon_t)\}\right] \tag{3}$$

where a logistic link $\varepsilon_t = \log\{\mu_t/(1-\mu_t)\} = X_t^T\beta$ is assumed, with μt = μt(β) = E(Yt) = Pr(Yt = 1|Xt, β) being the probability of presence at time t and β being a J × 1 vector of parameters. The logit link function is a natural choice for binary responses, although any link function could be used. We use μ(β) to denote the vector of marginal probabilities of presence, μ(β) = E(Y) = (μ1, …, μT )T.

Next, following Fitzmaurice and Laird (1993), we use the form of the joint distribution of Y as follows:

$$f(Y \mid \Psi, \Omega) = \exp\{\Psi^T Y + \Omega^T W - A(\Psi, \Omega)\} \tag{4}$$

where $W = (Y_1Y_2, \dots, Y_{T-1}Y_T, \dots, Y_1Y_2\cdots Y_T)^T$ is a vector of two- and higher-way cross-products of Y, $\Psi = (\psi_1, \dots, \psi_T)^T$ and $\Omega = (\omega_{12}, \dots, \omega_{(T-1)T}, \dots, \omega_{12\cdots T})^T$ are vectors of canonical parameters, and A(Ψ, Ω) is a normalizing constant, $\exp\{A(\Psi, \Omega)\} = \sum \exp(\Psi^T Y + \Omega^T W)$, with the summation being over all $2^T$ possible values of Y. Note that μ is a function of both Ψ and Ω. Parameters Ψ and Ω can be straightforwardly interpreted in terms of conditional probabilities. For example,

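$$\begin{aligned} \psi_r &= \operatorname{logit}\{\Pr(Y_r = 1 \mid Y_s = 0,\ s \neq r)\}, & r &= 1, \dots, T, \\ \omega_{rs} &= \log \mathrm{OR}(Y_r, Y_s \mid Y_t = 0,\ t \neq r, s), & r &< s = 1, \dots, T, \end{aligned} \tag{5}$$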

and

$$\omega_{123} = \log \mathrm{OR}(Y_1, Y_2 \mid Y_3 = 1,\ Y_s = 0,\ s > 3) - \log \mathrm{OR}(Y_1, Y_2 \mid Y_3 = 0,\ Y_s = 0,\ s > 3)$$

where

$$\mathrm{OR}(\nu, \eta) = \frac{\Pr(\nu = \eta = 1)\,\Pr(\nu = \eta = 0)}{\Pr(\nu = 1, \eta = 0)\,\Pr(\nu = 0, \eta = 1)}$$

is the odds ratio.
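To make the interpretation of Ψ and Ω concrete, the following sketch enumerates all 2^T outcomes for a small T and evaluates the joint distribution (4) by brute force. The function name, the zero-based time indices and the dictionary encoding of Ω are assumptions made for illustration; the parameter values are arbitrary.

```python
import math
from itertools import product


def joint_distribution(psi, omega):
    """Joint probabilities of Y under model (4), by enumeration.

    psi   : list of T canonical main-effect parameters.
    omega : dict mapping a tuple of (zero-based) time indices, e.g. (0, 1)
            or (0, 1, 2), to the matching association parameter.
    """
    T = len(psi)
    weights = {}
    for y in product((0, 1), repeat=T):
        lin = sum(p * yt for p, yt in zip(psi, y))
        # A cross-product term contributes only when every Y in it equals 1.
        lin += sum(w for idx, w in omega.items() if all(y[k] for k in idx))
        weights[y] = math.exp(lin)
    total = sum(weights.values())  # equals exp{A(Psi, Omega)}
    return {y: w / total for y, w in weights.items()}


psi = [-0.5, 0.2, 0.1]
omega = {(0, 1): 0.7, (0, 2): 0.3, (1, 2): -0.4, (0, 1, 2): 0.0}
p = joint_distribution(psi, omega)

# Conditional log odds-ratio of (Y1, Y2) given Y3 = 0 recovers omega_12.
log_or = math.log(p[(1, 1, 0)] * p[(0, 0, 0)] / (p[(1, 0, 0)] * p[(0, 1, 0)]))
print(round(log_or, 6))
```

Enumeration is only practical for small T, since the table has 2^T cells.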

We assume that Ω is a function of a K×1 parameter vector α = (α1, …, αK)T. In principle, we could use any dependence link function. Yet, a natural choice is a linear link function, Ω = Zα, where Z is a design matrix.

The form of the joint distribution above can model varying degrees of dependence among the Yt. If Ω = 0, the independence model results. If $\Omega = (\omega_{12}, \dots, \omega_{(T-1)T}, \dots, \omega_{12\cdots T})^T$ is left unrestricted, we have a saturated model for the association parameters. Between these extremes, parsimonious models for the time dependence can be considered. For instance, we can obtain a quadratic exponential family or pairwise model by fixing the three- and higher-way association parameters of Ω to zero. In Fitzmaurice and Laird (1993), the expression for the derivative of the log-likelihood with respect to β and α is

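$$\begin{pmatrix} \partial L/\partial\beta \\ \partial L/\partial\alpha \end{pmatrix} = \begin{pmatrix} X^T \Delta V_{11}^{-1}(Y - \mu) \\ Z^T\{W - E(W) - V_{21}V_{11}^{-1}(Y - \mu)\} \end{pmatrix} \tag{6}$$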

where V11 = cov(Y ), V21 = cov(W, Y ), and Δ = diag{var(Yt)} is a (T × T ) diagonal matrix.

2.5 QTL mapping

Let us first consider a QTL bracketed by two markers, with conditional probabilities of QTL genotypes given in Table 1. We assume no interference in crossing over in the testing interval. Multiple QTLs with epistasis can be considered in a similar way.

The joint distribution of (Yi, Qi|Mi, Xi, Zi) for progeny i with QTL genotype Qi is expressed as

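$$f(Y_i, Q_i \mid M_i, X_i, Z_i; \Theta_i) = \prod_{j=0}^{1}\left[p_{ij}\, f(Y_i \mid X_i, Z_i; \Psi_{ij}, \Omega_{ij})\right]^{I(Q_i = j)} = \prod_{j=0}^{1}\left[p_{ij}\exp\left(\Psi_{ij}^T Y_i + \Omega_{ij}^T W_i - A(\Psi_{ij}, \Omega_{ij})\right)\right]^{I(Q_i = j)}$$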

where pij is the conditional probability of QTL genotype j (j = 1 for Qq and 0 for qq) given the marker genotype of progeny i, Ωij = Ziαj, and Ψij is a function of both βj and αj. All parameters are arrayed in Θi = (pij, βj, αj).

In practice, QTL genotype Qi for progeny i is unobservable. The joint distribution for (Yi|Mi, Xi, Zi) is expressed as

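$$\begin{aligned} f(Y_i \mid M_i, X_i, Z_i; \Theta_i) &= \sum_{j=0}^{1}\left[f(Y_i \mid Q_i = j, X_i, Z_i)\Pr(Q_i = j \mid M_i)\right] \\ &= \sum_{j=0}^{1}\left[p_{ij}\, f(Y_i \mid X_i, Z_i; \Psi_{ij}, \Omega_{ij})\right] \\ &= \sum_{j=0}^{1}\left[p_{ij}\exp\left(\Psi_{ij}^T Y_i + \Omega_{ij}^T W_i - A(\Psi_{ij}, \Omega_{ij})\right)\right] \end{aligned}$$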

This is a mixture model of two possible multivariate binary densities with different parameters.

Finally, the observed log-likelihood is

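$$L(\Theta \mid Y, M, X, Z) = \sum_{i=1}^{N}\left\{\log\left[\sum_{j=0}^{1} p_{ij}\exp\left(\Psi_{ij}^T Y_i + \Omega_{ij}^T W_i - A(\Psi_{ij}, \Omega_{ij})\right)\right]\right\}. \tag{7}$$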
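The observed log-likelihood (7) is a sum over progeny of the log of a two-component mixture. A minimal sketch of that computation is given below; it assumes the component densities f(Yi | Qi = j) have already been evaluated elsewhere, and the function name and the (qq, Qq) tuple ordering are illustrative conventions only.

```python
import math


def mixture_loglik(mix_probs, densities):
    """Observed log-likelihood of a two-component mixture.

    mix_probs[i] = (p_i0, p_i1): conditional QTL genotype probabilities
                   of progeny i given its flanking markers (Table 1).
    densities[i] = (f_i0, f_i1): multivariate binary density of Y_i under
                   genotype qq (j = 0) and Qq (j = 1).
    """
    total = 0.0
    for (p0, p1), (f0, f1) in zip(mix_probs, densities):
        total += math.log(p0 * f0 + p1 * f1)
    return total


# Two hypothetical progeny with made-up density values.
print(mixture_loglik([(0.5, 0.5), (0.9, 0.1)], [(0.2, 0.4), (0.1, 0.3)]))
```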

2.6 Hypothesis testing

In QTL mapping, we test whether there is a QTL that controls a longitudinal binary trait at a given position within a marker interval. This can be performed by the hypotheses

$$\begin{cases} H_0: \beta_1 = \beta_0,\ \alpha_1 = \alpha_0 \\ H_1: \text{at least one of the equalities above does not hold} \end{cases} \tag{8}$$

We use the likelihood ratio test (LRT) for the existence of a QTL. The LRT statistic is $-2\log\left[\dfrac{\sup_{\Theta_0} L(\Theta_0 \mid Y, M, X, Z)}{\sup_{\Theta} L(\Theta \mid Y, M, X, Z)}\right]$, where Θ0 and Θ are the parameter spaces under H0 and H1, respectively. The threshold value to reject the null hypothesis cannot be simply chosen from a χ2 distribution because of the violation of regularity conditions of asymptotic theory under H0. Instead, we use permutation tests (Churchill and Doerge 1994) to get the critical value.
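The permutation procedure can be sketched generically as follows: shuffle the response vectors across progeny to break any marker-phenotype association, recompute the maximum test statistic over the scanned positions, and take the upper quantile of the resulting null distribution. In this sketch `max_stat` is a hypothetical callable supplied by the user, and `toy_max_stat` is only a cheap stand-in for the maximum LRT, not the statistic used in this article.

```python
import math
import random


def permutation_threshold(responses, markers, max_stat,
                          n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) critical value of max_stat under permutation."""
    rng = random.Random(seed)
    shuffled = list(responses)      # copy so the caller's list is untouched
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)       # reassign whole response vectors to progeny
        null_stats.append(max_stat(shuffled, markers))
    null_stats.sort()
    k = min(n_perm, math.ceil((1.0 - alpha) * n_perm)) - 1
    return null_stats[k]


def toy_max_stat(responses, markers):
    """Stand-in statistic: largest absolute difference, over loci, in the
    mean number of 'presence' scores between the two marker classes."""
    totals = [sum(y) for y in responses]
    best = 0.0
    for locus in range(len(markers[0])):
        het = [s for s, m in zip(totals, markers) if m[locus] == 1]
        hom = [s for s, m in zip(totals, markers) if m[locus] == 0]
        if het and hom:
            diff = sum(het) / len(het) - sum(hom) / len(hom)
            best = max(best, abs(diff))
    return best
```

Permuting whole response vectors, rather than individual time points, keeps the within-progeny time dependence intact under the null.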

2.7 Maximum likelihood estimation

For a pre-specified testing position, the conditional probability parameters pij can be determined from Table 1. Thus, only βj and αj need to be estimated. This can be done by obtaining the derivative of the log-likelihood of progeny i with respect to βj and αj and the Fisher information matrix (see Appendix A for details). Parameter estimates are obtained under H1 and H0, respectively.

2.8 Epistatic Modeling

Genetic interactions between different QTLs, called epistasis, are thought to play an important role in trait control (Wu et al. 2005). However, there is no study yet that reports the detection of epistasis for longitudinal binary traits. Here we show how to map QTLs for longitudinal changes of binary traits. Model (2) shows the epistatic interactions between two QTLs in a backcross design. By incorporating it into the mixture likelihood (7), we have

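$$L(\Theta \mid Y, M, X, Z) = \sum_{i=1}^{N}\left\{\log\left[\sum_{j_1=0}^{1}\sum_{j_2=0}^{1} p_{ij_1j_2}\exp\left(\Psi_{ij_1j_2}^T Y_i + \Omega_{ij_1j_2}^T W_i - A(\Psi_{ij_1j_2}, \Omega_{ij_1j_2})\right)\right]\right\}, \tag{9}$$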

where j1 = 1, 0 and j2 = 1, 0 are the genotypes of the first and second QTL, both of which can be inferred from marker information using conditional probability, pij1j2; and unknown vector Θ contains (pij1j2, αij1j2, βij1j2) which are used to specify two-QTL genotypic values. Based on the relations (2), we can solve the time-dependent changes of additive effects at each QTL (a1t and a2t) and additive × additive epistatic effect (iaat).

3 Simulation Study

The statistical properties of the new mapping model are examined through simulation studies. Consider a random sample composed of N = 400 progeny from a backcross population. We assume only one QTL on a chromosome with 11 equally spaced markers. The interval between two adjacent markers is 20 cM. The assumed QTL is located in the second interval (25 cM from the left end). For each progeny, measurements are taken at 3 different time points, although the model allows an arbitrary number of time points to be analyzed.

The model for the marginal probability of Yi given Xi, Qi = j and βj is

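$$\Pr(Y_{it} = 1 \mid X_{it}, Q_i = j) = \frac{\exp(\beta_{j0} + \beta_{j1}X_{it} + \beta_{j2}(t-1))}{1 + \exp(\beta_{j0} + \beta_{j1}X_{it} + \beta_{j2}(t-1))}, \quad t = 1, 2, 3, \tag{10}$$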

where Xit is drawn from the distribution bin(1, 0.5). The time dependence of the model is characterized in terms of conditional log odds-ratios, Ωj. The true marginal parameters of the model are determined on the basis of heritability. In a backcross population, genotypes Qq and qq, which occur with equal frequency, have genotypic values G1 and G0, respectively (model (1)). The genetic variance of a trait explained by the QTL is calculated as $\sigma_g^2 = \dfrac{G_1^2 + G_0^2}{2} - \left(\dfrac{G_1 + G_0}{2}\right)^2$. Suppose the binary response corresponds to a latent variable, referred to as the liability, which is considered to be continuous and normally distributed. According to Xu and Atchley (1996), the liability is described as

$$Z_{it}(Q_i = j) = \frac{1}{c}\left(\beta_{j0} + \beta_{j1}X_{it} + \beta_{j2}(t-1)\right) + e_{it},$$

where Zit is the liability of progeny i at time t, eit is the residual with a distribution of N(0, 1), and $c = \pi/\sqrt{3}$. It is assumed that the binary response of progeny i at time t is 1 if Zit is above 0 and 0 otherwise. In this case, $G_1 = \frac{1}{c}\left(\beta_{10} + (\beta_{11} - \beta_{01})X_{it} + (\beta_{12} - \beta_{02})(t-1)\right)$ and $G_0 = \frac{1}{c}\beta_{00}$. By assuming a heritability of 0.1 or 0.4 and a residual (eit) variance of 1, we can determine the values of the marginal parameters (β00, β01, β02, β10, β11, β12) and the association parameters $\Omega_{ji} = (\omega_{ji12}, \omega_{ji13}, \omega_{ji23}, \omega_{ji123})^T = (\alpha, \alpha, \alpha, 0)^T$ (j = 0, 1; i = 1, …, N).
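The genetic variance formula above can be checked numerically, and it simplifies to (G1 − G0)²/4 for two equally frequent genotypes. The sketch below also converts it to a heritability on the liability scale, assuming the usual definition H² = σg²/(σg² + σe²) with σe² = 1; that definition is an assumption here, since the text does not spell it out, and the function names are illustrative.

```python
def genetic_variance(g1, g0):
    """Variance of genotypic values when Qq and qq each have frequency 1/2."""
    return (g1 ** 2 + g0 ** 2) / 2 - ((g1 + g0) / 2) ** 2


def heritability(g1, g0, resid_var=1.0):
    """Assumed definition: genetic variance over total liability variance."""
    vg = genetic_variance(g1, g0)
    return vg / (vg + resid_var)


g1, g0 = 0.4, -0.3
print(genetic_variance(g1, g0), (g1 - g0) ** 2 / 4)  # the two agree
print(heritability(g1, g0))
```

Because only the difference G1 − G0 enters, the heritability fixes the size of the genotypic contrast but not the individual genotypic values.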

The simulated longitudinal binary data were analyzed by the new model, with results given in Figure 1 and Table 2. First, as shown by the maximum LRT, the position of the QTL can be very well estimated even for a small heritability (0.1) (Fig. 1). When the sample size increases from 100 to 400, the QTL is mapped more accurately to its true position. The parameters, (β10, β11, α1) for QTL genotype Qq and (β00, β01, α0) for QTL genotype qq, that define longitudinal trends of the binary trait can be reasonably well estimated. To precisely estimate these parameters, a sample size of 400 is suggested when the heritability is modest, say 0.1. A sample size of 100 appears to suffice if the heritability is high, say 0.4. High heritability in plant and animal genetics can be obtained through controlled experiments that minimize the noise. Simulation studies also indicate that the power of detecting a significant QTL is adequately high, 0.80 or higher, for a modest heritability (say 0.1) with a small sample size (say 100).

Figure 1.


LRT profiles for the search of a QTL over the linkage group under different sample sizes (n) and heritabilities (H2). The 5% critical values are obtained from 200 permutation tests. The arrowed vertical lines indicate the estimated QTL position (the correct position is 25 cM from the left end).

Table 2.

MLEs of the parameters that define QTL genotype-specific longitudinal binary responses. The root mean square errors (RMSE) of the MLEs calculated from 200 simulation replicates are also given.

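| Model | β10 | β11 | α1 | β00 | β01 | α0 | QTL | maxLRT | 5% Cut Point |
|---|---|---|---|---|---|---|---|---|---|
| H2 = 0.1 | | | | | | | | | |
| True | −1 | 0.15 | 0.5 | 0.1 | 0.18 | 0.5 | 25 | - | - |
| MLE (N = 100) | −0.9231 | 0.2021 | 0.4153 | 0.0149 | 0.1604 | 0.6178 | 20 | 14.80 | 12.17 |
| RMSE | 0.1851 | 0.0757 | 0.1274 | 0.1919 | 0.0601 | 0.0835 | | | |
| MLE (N = 400) | −0.9214 | 0.1177 | 0.5035 | −0.0854 | 0.2048 | 0.5212 | 23 | 49.58 | 12.01 |
| RMSE | 0.0969 | 0.0361 | 0.0578 | 0.0938 | 0.0331 | 0.0478 | | | |
| H2 = 0.4 | | | | | | | | | |
| True | −2 | 0.3 | 0.5 | 0.2 | 0.36 | 0.5 | 25 | - | - |
| MLE (N = 100) | −2.0869 | 0.3689 | 0.6332 | −0.1854 | 0.4367 | 0.1647 | 25 | 53.25 | 12.81 |
| RMSE | 0.2563 | 0.0977 | 0.1741 | 0.2598 | 0.1159 | 0.1662 | | | |
| MLE (N = 400) | 1.9836 | 0.2786 | 0.3367 | 0.2586 | 0.3396 | 0.5124 | 25 | 237.76 | 14.44 |
| RMSE | 0.1146 | 0.0511 | 0.1292 | 0.1376 | 0.0471 | 0.0569 | | | |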

4 Worked Example

We used a simple example to demonstrate the usefulness of our new mapping model. A doubled-haploid (DH) population of 123 lines was derived from semi-dwarf IR64 and tall Azucena, for which a genetic linkage map was constructed to cover 12 rice chromosomes (Huang et al. 1997). The DH population was planted in a randomized complete design in the field. Plant height was measured every 10 days, from 10 days after transplanting into the field to the date of plant heading. Zhao et al. (2004) used functional mapping to map QTL × environment interactions for plant height trajectories in the same population. By categorizing DH lines into two groups based on whether plant height is above (1) or below (0) the median, one obtains a binary response for plant height at multiple time points.

By scanning the rice genome, we identified a QTL located at marker adh1 on chromosome 11, which was found to be significant by permutation tests at the chromosome level (Figure 2). Based on Xu and Atchley’s (1996) liability model, we estimated that this QTL explains 1% of the total phenotypic variance for plant height treated as a binary trait. In Zhao et al.’s (2004) analysis, chromosome 11 was also found to harbor a significant QTL that affects environment-dependent variation in plant growth trajectories. Although our categorization of plant height into a binary response aims only to produce a longitudinal binary trait for testing our model, such a treatment is not unreasonable given that the DH progeny were derived from one tall parent, Azucena, and one semi-dwarf parent, IR64. A single gene, semi-dwarf-1 (sd-1), has been characterized to control rice plant height through the gibberellin signaling pathway (Sasaki et al. 2002). The association between the alcohol dehydrogenase marker, adh1, and plant height was also detected in a different mapping population, in which Sripongpangkul et al. (2000) identified the genomic region linked to adh1 that carries QTLs responsible for plant elongation in rice.

Figure 2.


Log-likelihood ratio profile of QTL detection for plant height across rice chromosome 11. The position of the QTL is indicated by the arrow. The horizontal line is the 5% chromosome-wise significance threshold obtained from permutation tests.

5 Discussion

Dynamic changes of complex traits including binary traits are a ubiquitous phenomenon in biology and biomedicine (Rekaya et al. 2003; Hinrichs et al. 2011; Ghahroodi et al. 2010). However, the genetic architecture of dynamic binary traits expressed in a time course has been poorly understood, thus limiting our inference about their developmental regulation in relation to important yield traits in agriculture or disease traits in medicine. In this article, we propose to shed light on the genetic control of dynamic binary traits by developing a statistical model for functional mapping of longitudinal binary responses. Previous studies have focused on the genetic mapping of binary traits collected at single time points (Xu and Atchley 1996; Visscher et al. 1996; Yi and Xu 1999a, b, 2000; Kadarmideen et al. 2001; Xu et al. 2003, 2005; Deng et al. 2006; Manichaikul and Broman 2009), or functional mapping of dynamic traits that continuously vary (Ma et al. 2002; Wu and Lin 2006; Li and Wu 2010). The model presented here unifies these two aspects to better address an important genetic issue.

This unification involves a special consideration for modeling the correlation structure of repeated binary measurements, rather than a simple summation of binary mapping and functional mapping. By making use of Fitzmaurice and Laird’s (1993) model, we integrated the association between binary responses in terms of conditional log odds-ratios within the functional mapping framework, allowing the test and characterization of higher-order associations. This parameterization produces maximum likelihood estimates of the marginal mean parameters that are robust to misspecification of the time dependence, as theoretically proven in Fitzmaurice and Laird (1993). Our simulation studies show that the mapping model integrating Fitzmaurice and Laird’s parameterization provides good accuracy and precision for the estimation of QTL locations and effects on dynamic patterns of binary response.

Statistical modeling and analysis of longitudinal binary responses have received considerable attention owing to the importance of these variables in addressing biological questions (Fitzmaurice and Laird 1993; Lin et al. 2004; Fitzmaurice et al. 2006). The model developed here to map QTLs for longitudinal features of binary traits is a starting point from which new models of various levels of complexity can be developed to handle problems that are closer to reality. Birmingham and Fitzmaurice (2002) discussed a case in which binary responses are measured repeatedly and some subjects drop out from the trial through a mechanism that depends on unobserved responses. Lin et al. (2004) provided an analysis model for longitudinal data with irregular, outcome-dependent follow-up. Many authors have made efforts to model multiple longitudinal variables that include either binary responses or a mix of binary and continuous responses (Zeger and Liang 1986; Sammel et al. 1997; Skrondal and Rabe-Hesketh 2004). Although our approach is based on maximum likelihood, other approaches, like regression models (Haley and Knott 1992), can also be developed to map QTLs for longitudinal binary responses. With all these statistical developments, we will be in an excellent position to ask and address fundamental questions about the genetic control of biological processes that arise as presence or absence over time and space.

Acknowledgments

This work is partially supported by NSF/IOS-0923975 and NIH/UL1RR0330184. We thank Dr. Jun Zhu at Zhejiang University for providing his rice data for us to test our binary mapping model.

Appendix A

A1. Maximum likelihood estimation under the H1

The derivative of the observed log-likelihood (Lo) of progeny i with respect to βj and αj is expressed as

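$$\begin{pmatrix} \partial L_i^o/\partial\beta_1 \\ \partial L_i^o/\partial\alpha_1 \\ \partial L_i^o/\partial\beta_0 \\ \partial L_i^o/\partial\alpha_0 \end{pmatrix} = E\begin{pmatrix} \partial L_i^f/\partial\beta_1 \\ \partial L_i^f/\partial\alpha_1 \\ \partial L_i^f/\partial\beta_0 \\ \partial L_i^f/\partial\alpha_0 \end{pmatrix} = \begin{pmatrix} E(Q_i)\, X_i^T \Delta_{1i} V_{1i,11}^{-1}(Y_i - \mu_{1i}) \\ E(Q_i)\, Z_i^T\{W_i - \nu_{1i} - V_{1i,21}V_{1i,11}^{-1}(Y_i - \mu_{1i})\} \\ (1 - E(Q_i))\, X_i^T \Delta_{0i} V_{0i,11}^{-1}(Y_i - \mu_{0i}) \\ (1 - E(Q_i))\, Z_i^T\{W_i - \nu_{0i} - V_{0i,21}V_{0i,11}^{-1}(Y_i - \mu_{0i})\} \end{pmatrix}$$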

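where E(·) denotes taking expectation with respect to Qi given Yi, i.e.,

$$E(Q_i) = E(Q_i \mid Y_i) = \frac{p_{i1}\exp\left(\Psi_{i1}^T Y_i + \Omega_{i1}^T W_i - A(\Psi_{i1}, \Omega_{i1})\right)}{\sum_{j=0}^{1}\left[p_{ij}\exp\left(\Psi_{ij}^T Y_i + \Omega_{ij}^T W_i - A(\Psi_{ij}, \Omega_{ij})\right)\right]},$$

$L_i^o = L_i(\theta \mid Y_i)$ is the observed log-likelihood for progeny i and $L_i^f = L_i(\theta \mid Y_i, Q_i)$ is the full log-likelihood for this progeny.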
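The quantity E(Qi | Yi) is the posterior probability that progeny i carries genotype Qq, and it weights each progeny's contribution to the score. A minimal sketch, assuming the two component densities at Yi are already available (the function and argument names are illustrative):

```python
def posterior_Qq(p_i1, f_i1, f_i0):
    """Posterior probability of genotype Qq for one progeny.

    p_i1 : prior Pr(Q_i = Qq | markers), from Table 1.
    f_i1 : density of Y_i under genotype Qq.
    f_i0 : density of Y_i under genotype qq.
    """
    num = p_i1 * f_i1
    return num / (num + (1.0 - p_i1) * f_i0)


# With equal priors, the posterior is driven by the density ratio alone.
print(posterior_Qq(0.5, 0.4, 0.2))
```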

Next, we derive the Fisher information matrix under the full log-likelihood function. Let $L^f = \sum_{i=1}^{N} L_i^f$; then

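$$\begin{aligned} E\left[\begin{pmatrix} \partial L^f/\partial\beta_1 \\ \partial L^f/\partial\alpha_1 \\ \partial L^f/\partial\beta_0 \\ \partial L^f/\partial\alpha_0 \end{pmatrix}\begin{pmatrix} \partial L^f/\partial\beta_1 \\ \partial L^f/\partial\alpha_1 \\ \partial L^f/\partial\beta_0 \\ \partial L^f/\partial\alpha_0 \end{pmatrix}^T\right] &= \sum_{i=1}^{N} E\left[\begin{pmatrix} \partial L_i^f/\partial\beta_1 \\ \partial L_i^f/\partial\alpha_1 \\ \partial L_i^f/\partial\beta_0 \\ \partial L_i^f/\partial\alpha_0 \end{pmatrix}\begin{pmatrix} \partial L_i^f/\partial\beta_1 \\ \partial L_i^f/\partial\alpha_1 \\ \partial L_i^f/\partial\beta_0 \\ \partial L_i^f/\partial\alpha_0 \end{pmatrix}^T\right] \\ &= \sum_{i=1}^{N} \operatorname{Diag}\Big[\, p_{i1} X_i^T \Delta_{1i} V_{1i,11}^{-1} \Delta_{1i} X_i,\ \ p_{i1} Z_i^T\left(V_{1i,22} - V_{1i,21}V_{1i,11}^{-1}V_{1i,21}^T\right) Z_i, \\ &\qquad\qquad\quad\ p_{i0} X_i^T \Delta_{0i} V_{0i,11}^{-1} \Delta_{0i} X_i,\ \ p_{i0} Z_i^T\left(V_{0i,22} - V_{0i,21}V_{0i,11}^{-1}V_{0i,21}^T\right) Z_i \Big]. \end{aligned}$$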

The MLEs, (β̂1, α̂1, β̂0, α̂0), can be obtained using the following general Fisher scoring algorithm:

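$$\begin{aligned} \hat\beta_1^{(\tau+1)} &= \hat\beta_1^{(\tau)} + \left(\sum_{i=1}^{N} p_{i1} X_i^T \Delta_{1i}^{(\tau)} \big(V_{1i,11}^{(\tau)}\big)^{-1} \Delta_{1i}^{(\tau)} X_i\right)^{-1} \times \left\{\sum_{i=1}^{N} E(Q_i)^{(\tau)} X_i^T \Delta_{1i}^{(\tau)} \big(V_{1i,11}^{(\tau)}\big)^{-1}\big(Y_i - \mu_{1i}^{(\tau)}\big)\right\}, \\ \hat\alpha_1^{(\tau+1)} &= \hat\alpha_1^{(\tau)} + \left\{\sum_{i=1}^{N} p_{i1} Z_i^T\Big(V_{1i,22}^{(\tau)} - V_{1i,21}^{(\tau)}\big(V_{1i,11}^{(\tau)}\big)^{-1}V_{1i,21}^{T(\tau)}\Big) Z_i\right\}^{-1} \times \left[\sum_{i=1}^{N} E(Q_i)^{(\tau)} Z_i^T\Big\{W_i - \nu_{1i}^{(\tau)} - V_{1i,21}^{(\tau)}\big(V_{1i,11}^{(\tau)}\big)^{-1}\big(Y_i - \mu_{1i}^{(\tau)}\big)\Big\}\right], \\ \hat\beta_0^{(\tau+1)} &= \hat\beta_0^{(\tau)} + \left(\sum_{i=1}^{N} p_{i0} X_i^T \Delta_{0i}^{(\tau)} \big(V_{0i,11}^{(\tau)}\big)^{-1} \Delta_{0i}^{(\tau)} X_i\right)^{-1} \times \left\{\sum_{i=1}^{N} \big(1 - E(Q_i)^{(\tau)}\big) X_i^T \Delta_{0i}^{(\tau)} \big(V_{0i,11}^{(\tau)}\big)^{-1}\big(Y_i - \mu_{0i}^{(\tau)}\big)\right\}, \\ \hat\alpha_0^{(\tau+1)} &= \hat\alpha_0^{(\tau)} + \left\{\sum_{i=1}^{N} p_{i0} Z_i^T\Big(V_{0i,22}^{(\tau)} - V_{0i,21}^{(\tau)}\big(V_{0i,11}^{(\tau)}\big)^{-1}V_{0i,21}^{T(\tau)}\Big) Z_i\right\}^{-1} \times \left[\sum_{i=1}^{N} \big(1 - E(Q_i)^{(\tau)}\big) Z_i^T\Big\{W_i - \nu_{0i}^{(\tau)} - V_{0i,21}^{(\tau)}\big(V_{0i,11}^{(\tau)}\big)^{-1}\big(Y_i - \mu_{0i}^{(\tau)}\big)\Big\}\right]. \end{aligned}$$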

Note that, in the iteration procedure, the Fisher information matrix we used is not the Fisher information matrix under the observed log-likelihood but the one under the full data log-likelihood. That is why we called this algorithm the general Fisher scoring algorithm (see Appendix B). The reason we used the full log-likelihood Fisher information matrix is that it is easier and faster to compute than the observed log-likelihood Fisher information matrix.

Evaluation of the scoring equations above requires estimation of the joint probabilities for each progeny. That is, for any μji and Ωji, νji, Vji11, Vji21, and Vji22 depend on the two-way and higher-way marginal probabilities. In general, there is no closed form expression representing the joint probabilities as a function of μji and Ωji. But we can use an iterative proportional fitting (IPF) algorithm to estimate the two-way and higher-way marginal probabilities (see Appendix C).

This algorithm does not provide us with an estimate of the asymptotic variance-covariance matrix of (β̂1, α̂1, β̂0, α̂0). Yet, we can calculate the Fisher information matrix under the observed log-likelihood using the MLEs obtained from the algorithm. The covariance can be approximated by the inverse of the Fisher information matrix. The sample empirical covariance matrix of the individual scores in any correctly specified model is a consistent estimator of the Fisher information and involves only the first derivatives. Thus, a consistent estimator of the asymptotic variance-covariance matrix of (β̂1, α̂1, β̂0, α̂0) is

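$$\widehat{\operatorname{var}}\begin{pmatrix} \hat\beta_1 \\ \hat\alpha_1 \\ \hat\beta_0 \\ \hat\alpha_0 \end{pmatrix} = \left[\sum_{i=1}^{N}\left\{\begin{pmatrix} \partial L_i^o/\partial\beta_1 \\ \partial L_i^o/\partial\alpha_1 \\ \partial L_i^o/\partial\beta_0 \\ \partial L_i^o/\partial\alpha_0 \end{pmatrix}\begin{pmatrix} \partial L_i^o/\partial\beta_1 \\ \partial L_i^o/\partial\alpha_1 \\ \partial L_i^o/\partial\beta_0 \\ \partial L_i^o/\partial\alpha_0 \end{pmatrix}^T\right\}\right]^{-1} = \left[\sum_{i=1}^{N}\left\{E\begin{pmatrix} \partial L_i^f/\partial\beta_1 \\ \partial L_i^f/\partial\alpha_1 \\ \partial L_i^f/\partial\beta_0 \\ \partial L_i^f/\partial\alpha_0 \end{pmatrix} E\begin{pmatrix} \partial L_i^f/\partial\beta_1 \\ \partial L_i^f/\partial\alpha_1 \\ \partial L_i^f/\partial\beta_0 \\ \partial L_i^f/\partial\alpha_0 \end{pmatrix}^T\right\}\right]^{-1}$$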

A2. Maximum likelihood estimation under the H0

Under H0, there is only one component for the data, so there is only one group of parameters (β, α). We can simply let pi1 = 1 for all i ∈ {1, …, N} and use the same algorithm as above.

Appendix B

To obtain the MLEs of Θ in observed log-likelihood function Lo(Θ|Y ), a classical Fisher scoring algorithm is iterated as follows:

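$$\Theta^{(\tau+1)} = \Theta^{(\tau)} + \left\{V^{-1}\frac{\partial L^o}{\partial\Theta}\right\}_{\Theta = \Theta^{(\tau)}}$$

where $V = E\left[\left(\dfrac{\partial L^o}{\partial\Theta}\right)\left(\dfrac{\partial L^o}{\partial\Theta}\right)^T\right]$.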

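In fact, if V is replaced with any V1 such that V1 > V, the algorithm still works but needs more steps to converge. Since $\operatorname{var}(X) = E(XX^T) - E(X)E(X)^T > 0$, it always holds that $E(XX^T) > E(X)E(X)^T$. Let $L^f(\Theta \mid Y, Q)$ denote the full log-likelihood function. If $X = \partial L^f/\partial\Theta$, we have $E\left[\left(\frac{\partial L^f}{\partial\Theta}\right)\left(\frac{\partial L^f}{\partial\Theta}\right)^T \Big|\, Y\right] > E\left(\frac{\partial L^f}{\partial\Theta} \Big|\, Y\right) E\left(\frac{\partial L^f}{\partial\Theta} \Big|\, Y\right)^T$. Taking expectation with respect to Y on both sides and applying $\frac{\partial L^o}{\partial\Theta} = E\left(\frac{\partial L^f}{\partial\Theta} \Big|\, Y\right)$, we get $E\left[\left(\frac{\partial L^f}{\partial\Theta}\right)\left(\frac{\partial L^f}{\partial\Theta}\right)^T\right] > E\left[\left(\frac{\partial L^o}{\partial\Theta}\right)\left(\frac{\partial L^o}{\partial\Theta}\right)^T\right]$. Thus, a general Fisher scoring algorithm is iterated as follows: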

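$$\Theta^{(\tau+1)} = \Theta^{(\tau)} + \left\{V^{-1}\frac{\partial L^o}{\partial\Theta}\right\}_{\Theta = \Theta^{(\tau)}}$$

where $V = E\left[\left(\dfrac{\partial L^f}{\partial\Theta}\right)\left(\dfrac{\partial L^f}{\partial\Theta}\right)^T\right]$.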
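The claim that a larger information matrix still leads to the same fixed point, only more slowly, can be illustrated on a deliberately simple problem. The sketch below is not the mixture model of this article: it fits an intercept-only logistic model, where the score is n(ȳ − p) and the Fisher information is n p(1 − p), and multiplies the information by a hypothetical factor `inflate` ≥ 1 to mimic replacing V with a larger V1.

```python
import math


def fisher_scoring_intercept(y, inflate=1.0, tol=1e-10, max_iter=1000):
    """Fisher scoring for theta = logit(Pr(Y = 1)) with inflated information.

    inflate = 1 gives classical Fisher scoring; inflate > 1 uses a larger
    'information' value, which shortens every step.
    Returns (theta_hat, number_of_iterations).
    """
    n = len(y)
    ybar = sum(y) / n
    theta = 0.0
    for it in range(1, max_iter + 1):
        p = 1.0 / (1.0 + math.exp(-theta))
        score = n * (ybar - p)
        info = inflate * n * p * (1.0 - p)
        step = score / info
        theta += step
        if abs(step) < tol:
            return theta, it
    return theta, max_iter


y = [1] * 3 + [0] * 7
print(fisher_scoring_intercept(y, inflate=1.0))
print(fisher_scoring_intercept(y, inflate=2.0))
```

Both runs approach logit(0.3); the inflated version takes more iterations, which is the trade-off accepted in exchange for an information matrix that is cheaper to compute.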

Appendix C

The iterative proportional fitting procedure was originally developed by Deming and Stephan (1940) for adjusting the counts in an r × c contingency table to satisfy marginal totals derived from another source. The classical procedure takes a start table, and multiplies the elements of the table by appropriate scaling factors, sequentially adjusting them until they satisfy the desired set of margins. This iterative procedure can be applied to multidimensional tables and is always convergent.

Thus, given $\Omega_i$ and setting $\Psi_i = 0$, we get a starting $2^T$ table, $S_i(\Omega_i)$, which has the specified conditional log odds-ratios. Then we fit the first-order margins $\mu_i$ to $S_i(\Omega_i)$. The procedure iterates as follows:

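$$\begin{aligned} &\Pr(Y_{i1} = j_1, \dots, Y_{i(t-1)} = j_{t-1}, Y_{it} = 1, Y_{i(t+1)} = j_{t+1}, \dots, Y_{iT} = j_T)^{(k+1)} \\ &\quad = \Pr(Y_{i1} = j_1, \dots, Y_{i(t-1)} = j_{t-1}, Y_{it} = 1, Y_{i(t+1)} = j_{t+1}, \dots, Y_{iT} = j_T)^{(k)} \, \frac{\mu_{it}}{\sum_{j_s,\, s \neq t} \Pr(Y_{i1} = j_1, \dots, Y_{it} = 1, \dots, Y_{iT} = j_T)^{(k)}}, \\ &\Pr(Y_{i1} = j_1, \dots, Y_{i(t-1)} = j_{t-1}, Y_{it} = 0, Y_{i(t+1)} = j_{t+1}, \dots, Y_{iT} = j_T)^{(k+1)} \\ &\quad = \Pr(Y_{i1} = j_1, \dots, Y_{i(t-1)} = j_{t-1}, Y_{it} = 0, Y_{i(t+1)} = j_{t+1}, \dots, Y_{iT} = j_T)^{(k)} \, \frac{1 - \mu_{it}}{\sum_{j_s,\, s \neq t} \Pr(Y_{i1} = j_1, \dots, Y_{it} = 0, \dots, Y_{iT} = j_T)^{(k)}}, \end{aligned} \qquad t = 1, \dots, T.$$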

This procedure proportionally adjusts the cells of $S_i(\Omega_i)$ until they satisfy the set of margins defined by $\mu_i$. Finally, it gives a $2^T$ table of cell probabilities, $m_i\{\Omega_i, \mu_i\}$, with margins satisfying $\mu_i$ and with conditional log odds-ratios $\Omega_i$. Using these cell probabilities, we can then calculate updated estimates of $\nu_i$, $V_{i,11}$, $V_{i,21}$, and $V_{i,22}$.
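The IPF steps described in this appendix can be sketched for a single progeny as follows. The function name, the zero-based time indices, the dictionary encoding of Ω and the stopping rule are assumptions made for illustration. Because each adjustment rescales cells by a factor that depends only on one Yt, it changes the main-effect parameters but leaves every conditional log odds-ratio untouched.

```python
import math
from itertools import product


def ipf_joint(mu, omega, tol=1e-12, max_iter=10000):
    """Cell probabilities with first-order margins mu and associations omega.

    mu    : list of T marginal probabilities Pr(Y_t = 1).
    omega : dict mapping tuples of zero-based time indices to conditional
            log odds-ratio parameters, e.g. {(0, 1): 0.5}.
    """
    T = len(mu)
    cells = list(product((0, 1), repeat=T))
    # Starting table: Psi = 0, so only the association terms contribute.
    table = {y: math.exp(sum(w for idx, w in omega.items()
                             if all(y[k] for k in idx))) for y in cells}
    total = sum(table.values())
    table = {y: v / total for y, v in table.items()}
    for _ in range(max_iter):
        worst = 0.0
        for t in range(T):
            m1 = sum(v for y, v in table.items() if y[t] == 1)
            worst = max(worst, abs(m1 - mu[t]))
            for y in cells:  # rescale the Y_t = 1 and Y_t = 0 slices
                table[y] *= mu[t] / m1 if y[t] == 1 else (1 - mu[t]) / (1 - m1)
        if worst < tol:
            break
    return table


tab = ipf_joint([0.3, 0.5, 0.6], {(0, 1): 0.5, (0, 2): 0.5, (1, 2): 0.5})
print(sum(v for y, v in tab.items() if y[0] == 1))  # close to 0.3
```

From the fitted table, the moments E(W), cov(Y), cov(W, Y) and cov(W) follow by direct summation over the 2^T cells.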

References

  1. Birmingham J, Fitzmaurice GM. A pattern-mixture model for longitudinal binary responses with nonignorable nonresponse. Biometrics. 2002;58:989–996. doi: 10.1111/j.0006-341x.2002.00989.x.
  2. Broman KW, Sen S. A Guide to QTL Mapping with R/qtl. Springer; New York: 2009.
  3. Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics. 1994;138:963–971. doi: 10.1093/genetics/138.3.963.
  4. Deming WE, Stephan FF. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann Math Stat. 1940;11:427–444.
  5. Deng WP, Chen HF, Li ZH. A logistic regression mixture model for interval mapping of genetic trait loci affecting binary phenotypes. Genetics. 2006;172:1349–1358. doi: 10.1534/genetics.105.047241.
  6. Fitzmaurice GM, Laird NM. A likelihood-based method for analysing longitudinal binary responses. Biometrika. 1993;80:141–151.
  7. Fitzmaurice GM, Lipsitz SR, Ibrahim JG, Gelber R, Lipshultz S. Estimation in regression models for longitudinal binary data with outcome-dependent follow-up. Biostatistics. 2006;7:469–485. doi: 10.1093/biostatistics/kxj019.
  8. Ghahroodi ZR, Ganjali M, Kazemi I. Models for longitudinal analysis of binary response data for identifying the effects of different treatments on insomnia. Appl Math Sci. 2010;4:3067–3082.
  9. Hackett CA, Weller JI. Genetic mapping of quantitative trait loci for traits with ordinal distributions. Biometrics. 1995;51:1252–1263.
  10. Haley CS, Knott SA. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity. 1992;69:315–324. doi: 10.1038/hdy.1992.131.
  11. Hinrichs D, Bennewitz J, Stamer E, Junge W, Kalm E, Thaller G. Genetic analysis of mastitis data with different models. J Dairy Sci. 2011;94:471–478. doi: 10.3168/jds.2010-3374.
  12. Huang N, Parco A, Mew T, Magpantay G, McCouch S, Guiderdoni E, Xu J, Subudhi P, Angeles ER, Khush GS. RFLP mapping of isozymes, RAPD and QTLs for grain shape, brown planthopper resistance in a doubled haploid rice population. Mol Breed. 1997;3:105–113.
  13. Kadarmideen HN, Janss LLG, Dekkers JCM. Generalized marker regression and interval QTL mapping methods for binary traits in half-sib family designs. J Anim Breed Genet. 2001;118:297–309.
  14. Lander ES, Botstein D. Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989;121:185–199. doi: 10.1093/genetics/121.1.185.
  15. Lange C, Whittaker JC. Mapping quantitative trait loci using generalized estimating equations. Genetics. 2001;159:1325–1337. doi: 10.1093/genetics/159.3.1325.
  16. Li N, Das K, Wu RL. Functional mapping of human growth trajectories. J Theor Biol. 2009;261:33–42. doi: 10.1016/j.jtbi.2009.07.020.
  17. Li Q, Huang ZW, Xu M, Wang CG, Gai JY, Huang YJ, Pang XM, Wu RL. Functional mapping of genotype-environment interactions for soybean growth by a semiparametric approach. Plant Methods. 2010;6:13. doi: 10.1186/1746-4811-6-13.
  18. Lin H, Scharfstein DO, Rosenheck RA. Analysis of longitudinal data with irregular, outcome-dependent follow-up. J Roy Stat Soc Ser B. 2004;66:791–813.
  19. Manichaikul A, Broman KW. Binary trait mapping in experimental crosses with selective genotyping. Genetics. 2009;182:863–874. doi: 10.1534/genetics.108.098913.
  20. Rekaya R, Gianola D, Shook G. Longitudinal random effects models for genetic analysis of binary data with application to mastitis in dairy cattle. Genet Sel Evol. 2003;35:457–468. doi: 10.1186/1297-9686-35-6-457.
  21. Sasaki A, Ashikari M, Ueguchi-Tanaka M, Itoh H, Nishimura A, Swapan D, Ishiyama K, Saito T, Kobayashi M, Khush GS, Kitano H, Matsuoka M. Green revolution: A mutant gibberellin-synthesis gene in rice. New insight into the rice variant that helped to avert famine over thirty years ago. Nature. 2002;416:701–702. doi: 10.1038/416701a.
  22. Sammel MD, Ryan LM, Legler JM. Latent variable models for mixed discrete and continuous outcomes. J Roy Stat Soc Ser B. 1997;59:667–678.
  23. Sripongpangkul K, Posa GBT, Senadhira DW, Brar D, Huang N, Khush GS, Li ZK. Genes/QTLs affecting flood tolerance in rice. Theor Appl Genet. 2000;101:1074–1081.
  24. Skrondal A, Rabe-Hesketh S. Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Chapman & Hall/CRC; Boca Raton, FL: 2004.
  25. Visscher PM, Haley CS, Knott SA. Mapping QTLs for binary traits in backcross and F2 populations. Genet Res. 1996;68:55–63.
  26. Wu RL, Ma CX, Casella G. Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL. Springer-Verlag; New York: 2007.
  27. Wu RL, Ma CX, Hou W, Corva P, Medrano JF. Functional mapping of quantitative trait loci that interact with the hg gene to regulate growth trajectories in mice. Genetics. 2005;171:239–249. doi: 10.1534/genetics.104.040162.
  28. Wu RL, Cao JG, Huang ZW, Wang Z, Gai JY, Vallejos CE. Systems mapping: How to improve the genetic mapping of complex traits through design principles of biological systems. BMC Sys Biol. 2011;5:84. doi: 10.1186/1752-0509-5-84.
  29. Xu S, Atchley WR. Mapping quantitative trait loci for complex binary diseases using line crosses. Genetics. 1996;143:1417–1424. doi: 10.1093/genetics/143.3.1417.
  30. Xu C, Li Z, Xu S. Joint mapping of quantitative trait loci for multiple binary characters. Genetics. 2005;169:1045–1059. doi: 10.1534/genetics.103.019406.
  31. Xu S, Yi N, Burke D, Galecki A, Miller RA. An EM algorithm for mapping binary disease loci: application to fibrosarcoma in a four-way cross mouse family. Genetical Research. 2003;82:127–138. doi: 10.1017/s0016672303006414.
  32. Yi N, Xu S. Mapping quantitative trait loci for complex binary traits in outbred populations. Heredity. 1999a;82:668–676. doi: 10.1046/j.1365-2540.1999.00529.x.
  33. Yi N, Xu S. A random model approach to mapping quantitative trait loci for complex binary traits in outbred populations. Genetics. 1999b;153:1029–1040. doi: 10.1093/genetics/153.2.1029.
  34. Yi N, Xu S. Bayesian mapping of quantitative trait loci for complex binary traits. Genetics. 2000;155:1391–1403. doi: 10.1093/genetics/155.3.1391.
  35. Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42:121–130.
  36. Zhang B, Tong CF, Yin TM, Zhang XY, Zhuge Q, Huang MR, Wang MX, Wu RL. Detection of quantitative trait loci influencing growth trajectories of adventitious roots in Populus using functional mapping. Tree Genet Genom. 2009;5:539–552.
  37. Zhao W, Zhu J, Gallo-Meagher M, Wu RL. A unified statistical model for functional mapping of genotype environment interactions for ontogenetic development. Genetics. 2004;168:1751–1762. doi: 10.1534/genetics.104.031484.
