Evaluating Haplotype Effects in Case-Control Studies via Penalized-Likelihood Approaches: Prospective or Retrospective Analysis?

Megan L Koehler; Howard D Bondell; Jung-Ying Tzeng

doi:10.1002/gepi.20545

Author manuscript; available in PMC: 2011 Dec 1.

Published in final edited form as: Genet Epidemiol. 2010 Dec;34(8):892–911. doi: 10.1002/gepi.20545

PMCID: PMC3208948 NIHMSID: NIHMS331487 PMID: 21104891

Abstract

Penalized likelihood methods have become increasingly popular in recent years for evaluating haplotype-phenotype association in case-control studies. Although a retrospective likelihood is dictated by the sampling scheme, these penalized methods are typically built upon prospective likelihoods due to their modeling simplicity and computational feasibility. It has been well documented that for unpenalized methods, prospective analyses of case-control data can be valid but less efficient than their retrospective counterparts when testing for association, and result in substantial bias when estimating the haplotype effects. For penalized methods, which combine effect estimation and testing in one step, the impact of using a prospective likelihood is not clear. In this work, we examine the consequences of ignoring the sampling scheme for haplotype-based penalized likelihood methods. Our results suggest that the impact of prospective analyses depends on (1) the underlying genetic mode and (2) the genetic model adopted in the analysis. When the correct genetic model is used, the difference between the two analyses is negligible for additive and slight for dominant haplotype effects. For recessive haplotype effects, the more appropriate retrospective likelihood clearly outperforms the prospective likelihood. If an additive model is incorrectly used, as the true underlying genetic mode is unknown a priori, both retrospective and prospective penalized methods suffer from a sizeable power loss and increase in bias. The impact of using the incorrect genetic model is much bigger on retrospective analyses than prospective analyses, and results in comparable performances for both methods. An application of these methods to the Genetic Analysis Workshop 15 rheumatoid arthritis data is provided.

Keywords: haplotype-based association analysis, variable selection, regularized regression, prospective likelihood, retrospective likelihood

INTRODUCTION

Haplotype-based association analysis evaluates the joint effects of closely linked genetic markers on a trait of interest. When compared to its single-marker counterparts, this multi-marker approach can be more powerful for detecting associations when the causal variants are not genotyped [de Bakker et al., 2005; Zaitlen et al., 2007], have low frequency [de Bakker et al., 2005; Schaid, 2004], or exhibit cis-acting effects [Clark, 2004; Schaid, 2004]. A standard approach for performing haplotype-based analysis is to regress the trait value on the haplotypes and test the significance of the regression parameters [Balding, 2006]. In recent years, applying penalized likelihood methods to identify important haplotypic factors has become increasingly popular in the literature. For example, Li et al. [2007] use the least absolute shrinkage and selection operator (LASSO) [Tibshirani, 1996] to perform selection among numerous possible haplotypes resulting from different haplotype window lengths. Guo and Lin [2009] use LASSO regression to evaluate the effects of rare haplotypes and high-dimensional haplotype–environment interactions. Tzeng et al. [2010] use adaptive LASSO regression [Zou, 2006] to study high-dimensional gene-treatment interactions in a haplotype-based pharmacogenetic analysis. These methods introduce a penalty on the regression coefficients and shrink the coefficient estimates of non-important covariates towards zero. The motivation behind using penalized methods in haplotype-based analysis is that while the model space under consideration may be large (e.g., 6 to 16 haplotypic predictors with a sample size of 500 to 1000 [Chen and Kao, 2006; Epstein and Satten, 2003; French et al., 2006; Stram et al., 2003], which can yield 2⁶ to 2¹⁶ possible models), many of the haplotypic predictors are not likely to be associated with the phenotype. In this case, it is more efficient to shrink these effect estimates to zero than to estimate them without shrinkage. This shrinkage leads to a reduction in variance and can increase the power to detect important haplotypic predictors [Guo and Lin, 2009].

Modifications of classic penalized methods have also been developed to perform haplotype-based analysis and attempt to address issues specific to this type of analysis. Tanck and colleagues [Souverein et al., 2006, 2008; Tanck et al., 2003] use a modified version of Ridge regression to stabilize inference for rare haplotypes. By constructing an L₂-norm penalty term on the differences in coefficients of similar haplotypes, the coefficients of rare haplotypes are smoothed towards that of a similar common haplotype. Chen et al. [2009] develop an adaptive penalized likelihood framework to address the precision-efficiency tradeoff encountered in retrospective methods. Motivated by the fact that typical retrospective case-control estimates of haplotype effects are efficient but sensitive to violation of underlying assumptions (e.g., Hardy-Weinberg equilibrium and gene-environment independence), they construct a penalized estimator based on either L₁-norm or L₂-norm penalty that combines the merits of assumption-free estimators (i.e. robust) and assumption-dependent estimators (i.e. efficient). Tzeng and Bondell [2010] modify traditional adaptive LASSO regression by placing an L₁-norm penalty on pair-wise differences of the regression coefficients. This allows for effect comparisons between all pairs of distinct haplotypes, rather than with respect to an arbitrary baseline haplotype, during the estimation process. As a result, the approach is able to sort haplotypes into different groups according to their effect sizes and eliminates the need for a post-hoc pair-wise analysis of haplotype effects. The key of a penalized regression method lies in the form of the penalty – by carefully designing the form of the penalty, one can gear the penalized-likelihood approach towards accomplishing various desired tasks.

Penalized regression methods rely on the underlying data likelihood. When analyzing data from case-control studies, one can implement methods based on a prospective likelihood (modeling the probability of disease status conditional on exposure) or a retrospective likelihood (modeling the probability of exposure conditional on disease status). Under a case-control design, a retrospective likelihood should be used because data are collected based on disease status. However, in practice, it is common for researchers to use a prospective likelihood, as it does not require specifying a model for the joint distribution of the genetic and environmental effects. Bypassing this step makes implementing prospective methods much easier than retrospective methods [Lin et al, 2005]. This approach seems congruent with the well-known result that optimizing the prospective likelihood yields the same inference on the disease model parameters as optimizing the retrospective likelihood [Prentice and Pyke, 1979]. This result requires that the distribution of the covariates be free of restrictions, which does not generally hold in haplotype-based analysis. Haplotypes are not directly observed from unphased genotype data. In order to reconstruct the haplotypes and estimate their effects, some assumptions must be placed on their frequency distribution (typically Hardy-Weinberg equilibrium).

Most of the penalized regression approaches mentioned above utilized a prospective likelihood. It has been well-documented that when using non-penalized regression methods in haplotype-based analysis of case-control data, ignoring the ascertainment scheme can be detrimental. A prospective analysis can lead to a loss of efficiency and severe bias when assessing the haplotype effects [Cordell, 2006; Satten and Epstein, 2004; Stram et al., 2003]. The aim of this work is to determine whether similar consequences occur when using penalized regression for case-control studies. Specifically, we consider the adaptive LASSO penalty, and use simulation studies to examine the relative performance in parameter estimation and model selection between the penalized method using a prospective likelihood and using a retrospective likelihood. Our results suggest that the impact of using a prospective likelihood in place of a retrospective likelihood depends on (1) the underlying genetic mode of the causal variants, and (2) the genetic model used in the analysis. If the correct genetic model is used, then the difference between the two analyses is negligible for additive and slight for dominant haplotype effects. For recessive haplotype effects, the more appropriate retrospective likelihood clearly outperforms the prospective likelihood. If an additive model is used regardless of the underlying genetic mode, then both retrospective and prospective penalized methods suffer from a sizeable power loss and increase in bias. The impact of using the incorrect genetic model is much bigger on retrospective analyses than prospective analyses, and results in comparable performances for both methods. In addition to extensive simulation studies, we present an application to the Genetic Analysis Workshop 15 rheumatoid arthritis data.

METHODS

PROSPECTIVE AND RETROSPECTIVE LIKELIHOODS

Let the vector (Y_i, G_i, E_i) represent the observed data for individual i in a case-control sample of size n. Let Y_i be a binary indicator of disease status where Y_i = 1 if individual i is a case and 0 otherwise. Let G_i denote the unphased genotype of individual i at m biallelic SNPs and E_i denote any environmental covariates measured on individual i. Let H_i represent the vector of haplotype counts for individual i. Although researchers want to investigate the relationship between Y_i and H_i, they only have access to G_i; therefore, the individual’s haplotype set must be inferred from their unphased genotypes.

The relationship between the disease phenotype and the covariates can be characterized by the conditional density function P(Y | H, E). A standard approach for binary trait values is logistic regression which models the conditional probability as

P (Y = y ∣ H, E) = \frac{\exp \{y \cdot (β_{0} + Z (H, E)^{T} β)\}}{1 + \exp \{β_{0} + Z (H, E)^{T} β\}},

where β₀ is an intercept, β is the vector of disease model parameters representing the log-odds ratios, and Z(H, E) is a specified vector-valued function of the vector of haplotype counts H and the vector of environmental covariates E. For example, one can use an identity function for Z(H, E) so that Z(H, E)^T = [H^{*T}, E^T], where H^* is the vector H with the baseline haplotype element removed. Other examples of Z(H, E) are described in the data generation section; the choice depends on the genetic model of the haplotype effects adopted in the analysis. The dimension of β is determined by the dimension of [H^*, E]; it is the sum of the number of haplotypes (excluding the baseline) and the number of environmental covariates included in the model, along with any interaction terms that may be used. The dimension of β is denoted by p. Throughout this work, we assume that the sample size is greater than the dimension of β, i.e., n > p. This is consistent with case-control data sets, which typically have sample sizes on the order of 10² to 10³ and the number of potential predictors on the order of 10 [Chen and Kao, 2006; Epstein and Satten, 2003; French et al., 2006; Stram et al., 2003].
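As a minimal sketch of the logistic model above with the identity coding for Z(H, E), the following computes P(Y = y | H, E) for one individual. The function names, the choice of the first haplotype as baseline, and the numeric values are illustrative assumptions, not taken from the paper.

```python
import math

def design_vector(hap_counts, env, baseline=0):
    """Identity coding Z(H, E): haplotype counts with the baseline entry removed, followed by E."""
    h_star = [c for k, c in enumerate(hap_counts) if k != baseline]
    return h_star + list(env)

def disease_prob(y, hap_counts, env, beta0, beta, baseline=0):
    """P(Y = y | H, E) = exp{y * eta} / (1 + exp{eta}), with eta = beta0 + Z(H, E)^T beta."""
    z = design_vector(hap_counts, env, baseline)
    eta = beta0 + sum(b * zj for b, zj in zip(beta, z))
    return math.exp(y * eta) / (1.0 + math.exp(eta))

# Hypothetical individual: one copy each of haplotypes 1 and 3 (out of 4), no covariates.
H = [1, 0, 1, 0]
beta = [0.0, math.log(2.0), 0.0]   # assumed log-odds ratios for haplotypes 2, 3, 4
p_case = disease_prob(1, H, [], beta0=-2.0, beta=beta)
p_control = disease_prob(0, H, [], beta0=-2.0, beta=beta)
print(p_case, p_control)
```

The two probabilities sum to one, and their ratio equals exp(β₀ + Z^T β), which is the odds interpretation of β used throughout.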

Various likelihood models have been developed to conduct inference about the disease model parameters in haplotype-based analyses while properly accounting for phase uncertainty. The inference can be based on a prospective likelihood or on a retrospective likelihood. In this work, we consider maximum likelihood methods developed by two groups, one focusing on a prospective approach and the other on a retrospective approach. We implement the prospective method developed in Lake et al. [2003]. Their prospective likelihood models P(Y_i | G_i, E_i) and is expressed as

L_{P} = \prod_{i = 1}^{n} P (Y_{i} ∣ G_{i}, E_{i}) = \prod_{i = 1}^{n} \sum_{H_{i} \in S (G_{i})} P (H_{i}, Y_{i} ∣ G_{i}, E_{i}) = \prod_{i = 1}^{n} \sum_{H_{i} \in S (G_{i})} P (Y_{i} ∣ H_{i}, E_{i}) P (H_{i}),

where S(G) is the set of all haplotype pairs consistent with G, $P (H = h) = 2 \prod_{k = 1}^{l} π_{k}^{h_{k}} / h_{k}!$ under the assumption of Hardy-Weinberg equilibrium, h_k is the number of copies of the k^th haplotype in H, π_k is the population frequency of the k^th haplotype, and l is the number of haplotypes included in the disease model. We implement the retrospective method developed in Lin and Zeng [2006]. Their retrospective likelihood models P(G_i, E_i | Y_i) and is expressed as

L_{R} = \prod_{i = 1}^{n} P (G_{i}, E_{i} ∣ Y_{i}) = \prod_{i = 1}^{n} \sum_{H_{i} \in S (G_{i})} P (H_{i}, G_{i}, E_{i} ∣ Y_{i}) \propto \prod_{i = 1}^{n} \sum_{H_{i} \in S (G_{i})} P (Y_{i} ∣ H_{i}, E_{i}) P (H_{i}) P (E_{i} ∣ G_{i}) .

The only difference between the two likelihoods is the conditional density function P(E_i | G_i) found in the retrospective likelihood. The parameters in this model are of no interest to researchers performing haplotype-based association analysis, but they must be estimated in order to make proper inference when using a retrospective design. Specifying a model for this conditional density function and the subsequent maximum likelihood estimation are computationally intensive. As a result, researchers often rely on prospective methods when analyzing case-control data even though retrospective methods are dictated by the ascertainment scheme [Lin and Zeng, 2006].
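To make the sum over S(G) concrete, here is a small sketch that enumerates the haplotype pairs compatible with an unphased genotype and evaluates one individual's contribution to the prospective likelihood. It assumes genotypes coded as per-SNP allele counts (0, 1, 2), haplotypes coded as 0/1 strings, additive coding with haplotype 0 as baseline, Hardy-Weinberg equilibrium, and no environmental covariates; the two-SNP haplotypes and frequencies are made up for illustration.

```python
import math
from itertools import combinations_with_replacement

def compatible_pairs(genotype, haplotypes):
    """S(G): unordered haplotype index pairs whose per-SNP allele sums equal the genotype."""
    pairs = []
    for a, b in combinations_with_replacement(range(len(haplotypes)), 2):
        sums = (int(x) + int(y) for x, y in zip(haplotypes[a], haplotypes[b]))
        if all(s == g for s, g in zip(sums, genotype)):
            pairs.append((a, b))
    return pairs

def hwe_prob(pair, freqs):
    """P(H) under HWE: pi_a^2 for a homozygous pair, 2 * pi_a * pi_b otherwise."""
    a, b = pair
    return freqs[a] ** 2 if a == b else 2.0 * freqs[a] * freqs[b]

def prospective_term(y, genotype, haplotypes, freqs, beta0, beta):
    """Sum over H in S(G) of P(Y = y | H) * P(H); beta[k - 1] is the effect of haplotype k >= 1."""
    total = 0.0
    for a, b in compatible_pairs(genotype, haplotypes):
        eta = beta0 + sum(beta[k - 1] for k in (a, b) if k != 0)
        total += math.exp(y * eta) / (1.0 + math.exp(eta)) * hwe_prob((a, b), freqs)
    return total

haps = ["00", "01", "10", "11"]          # hypothetical two-SNP haplotypes
freqs = [0.4, 0.3, 0.2, 0.1]             # hypothetical population frequencies
print(compatible_pairs((1, 1), haps))    # the double heterozygote is phase-ambiguous
print(prospective_term(1, (1, 1), haps, freqs, beta0=0.0, beta=[0.0, 0.0, 0.0]))
```

A retrospective version would additionally need a model for P(E | G) and the normalization implied by sampling on disease status, which is where the extra computational burden described above comes from.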

HAPLOTYPE ANALYSIS VIA PENALIZED LIKELIHOOD METHODS

While many different penalized likelihood methods can be used in haplotype-based association analysis, we consider the adaptive LASSO (ALASSO) penalty in this work. This approach achieves simultaneous variable selection and parameter estimation and is an oracle procedure. This refers to the fact that the approach asymptotically selects the correct model, and the resulting estimator is root-n consistent and asymptotically normal with the same variance as if the true model were known beforehand [Zou, 2006].

The ALASSO effect estimates are obtained by minimizing a penalized negative log-likelihood. These estimates are expressed as

{\hat{β}}_{λ} = \arg\min_{β} \left\{ - ℓ_{n} (β, φ) + λ \sum_{j = 1}^{p} w_{j} | β_{j} | \right\},

where ℓ_n(β, φ) denotes the log-likelihood, φ is a (possible) set of nuisance parameters (e.g., the haplotype frequencies π_k), λ is the non-negative regularization parameter that controls the amount of shrinkage, and w_j are data-dependent weights. By placing an L₁-norm penalty on the regression coefficients, the ALASSO can set their estimates to exactly zero if the value of λ is large enough. It is this feature that allows the procedure to perform simultaneous variable selection and parameter estimation. Unlike its predecessor the LASSO, the ALASSO places a different penalty on each coefficient through the use of adaptive weights that are inversely related to the relative importance of each coefficient. Consequently, haplotypes with negligible effects receive larger penalties and are more readily shrunk to zero. This allows the effects of associated haplotypes to be estimated more efficiently. Zou [2006] proposed to set the weights as w_j = |β̃_j|^{−γ}, where β̃_j is an initial root-n consistent estimator of β_j and γ > 0 is an additional tuning parameter. In our analysis, we chose γ = 1 and let β̃_j be the maximum likelihood estimate of the haplotype effect computed by haplo.glm in R and HAPSTAT in Linux for the prospective and retrospective likelihoods, respectively [Lake et al., 2003; Lin et al., 2005].
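The effect of the adaptive weights can be seen in a short sketch. It assumes γ = 1 as in the analysis above and uses made-up initial estimates; the function names are illustrative and are not part of haplo.glm or HAPSTAT.

```python
def adaptive_weights(beta_init, gamma=1.0):
    """w_j = |beta_init_j|^(-gamma); small initial estimates receive large weights."""
    if any(b == 0 for b in beta_init):
        raise ValueError("initial estimates are assumed to be nonzero")
    return [abs(b) ** (-gamma) for b in beta_init]

def alasso_penalty(beta, weights, lam):
    """lam * sum_j w_j * |beta_j|, the penalty term added to the negative log-likelihood."""
    return lam * sum(w * abs(b) for w, b in zip(weights, beta))

beta_mle = [0.05, -0.7, 0.01]            # hypothetical unpenalized estimates
weights = adaptive_weights(beta_mle)
print(weights)                           # the near-zero effects are penalized most heavily
print(alasso_penalty(beta_mle, weights, lam=2.0))
```

With γ = 1, each term w_j |β̃_j| equals 1 at the initial estimate, so the penalty evaluated there is λ times the number of coefficients regardless of how the predictors are scaled.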

When performing penalized likelihood methods, it is typical to center and scale the design matrix. Scaling ensures that each column of the design matrix has the same variance and that the resulting estimator is scale equivariant (i.e., multiplying any predictor by a constant simply divides the resulting slope estimate by the same constant; hence the linear predictor remains unchanged). This is desirable so that if, for example, the units of a predictor are changed, such as feet to inches, the resulting predicted values remain unchanged. Often the predictors are also centered, so that in the normal linear regression setting, the intercept can be omitted and the slope parameter estimates are orthogonal to the intercept estimate. However, in generalized linear models such as those considered here, this is not the case; hence the design matrix is typically not centered. Furthermore, in the ALASSO analysis, we also do not scale the imputed haplotype design matrix because the adaptive weights we set (i.e., |β̃_j|^{−γ}) are scale equivariant. The use of scale-equivariant weights automatically forces the resulting estimator to be scale equivariant.

The ALASSO solution (β̂_λ) also depends on the value of λ. The regularization parameter controls the tradeoff between model fit and model sparsity. By including more predictors, one can continually improve the fit on the training data at the cost of reduced interpretability and overfitting. Many model selection criteria, such as Mallows' C_p, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and cross validation [Shao, 1997; Hastie, Tibshirani, and Friedman, 2009; Arlot and Celisse, 2010], can be used to determine the appropriate value of λ from an exhaustive grid search. Because the goal of haplotype-based association analysis is more aligned with selecting the true model than with minimizing prediction error, we use BIC for tuning, which can achieve consistent model selection [Yang, 2005]. BIC is defined as

BIC = - 2 ℓ_{n} ({\hat{β}}_{λ}, \hat{φ}) + {d f}_{λ} \cdot log (n),

where ℓ_n(β̂_λ, φ̂) is the log-likelihood evaluated at the estimated regression coefficients and maximized over φ for a given λ, and df_λ is the degrees of freedom, which equals the number of non-zero elements in (β̂_λ, φ̂). The λ that minimizes BIC is chosen as the regularization parameter, and its corresponding β̂_λ is the ALASSO estimate. For comparison, we also present some of the results using AIC as a tuning method. In the definition of AIC, the penalty on the degrees of freedom is changed from log(n) to 2. As a result, models selected using AIC incur less shrinkage, and the chosen ALASSO estimate will be closer to the unpenalized MLE than those found using BIC.
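The tuning step can be sketched as a comparison of criterion values across a grid of λ values. The log-likelihoods and degrees of freedom below are invented purely to illustrate the mechanics, and `select_lambda` is a hypothetical helper rather than part of any package used in the paper.

```python
import math

def bic(loglik, df, n):
    """BIC = -2 * loglik + df * log(n)."""
    return -2.0 * loglik + df * math.log(n)

def aic(loglik, df):
    """AIC replaces the log(n) penalty per degree of freedom with 2."""
    return -2.0 * loglik + 2.0 * df

def select_lambda(fits, n, criterion="bic"):
    """fits: list of (lam, loglik, df) from a grid search; returns the entry minimizing the criterion."""
    if criterion == "bic":
        return min(fits, key=lambda f: bic(f[1], f[2], n))
    return min(fits, key=lambda f: aic(f[1], f[2]))

# Made-up grid: larger lam gives a sparser fit (smaller df) and a lower log-likelihood.
fits = [(0.0, -500.0, 7), (0.5, -501.0, 3), (2.0, -504.5, 1), (10.0, -520.0, 0)]
print(select_lambda(fits, n=1000, criterion="bic"))
print(select_lambda(fits, n=1000, criterion="aic"))
```

In this toy grid, BIC selects the sparser fit and AIC the denser one, matching the statement that AIC-tuned estimates incur less shrinkage.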

For computational convenience, the objective function created via the least squares approximation (LSA) method was used to calculate the ALASSO solution. The LSA method replaces the objective function of the original ALASSO problem with a least squares objective function [Wang and Leng, 2007]. The method is motivated by a standard Taylor series expansion of −ℓ_n(β,φ) about (β̃,φ̃), the function’s unpenalized minimizer, and shows that the ALASSO estimate has the exact same asymptotic distribution as the estimator given by

{\hat{β}}_{λ} = \arg\min_{β} \left\{ {(β - \tilde{β})}^{T} {\tilde{Σ}}^{- 1} (β - \tilde{β}) + λ \sum_{j = 1}^{p} w_{j} | β_{j} | \right\},

where Σ̃ is the estimated covariance matrix of β̃. Note that the minimizer of the unpenalized least squares objective function is exactly the maximum likelihood estimator. Hence, as with the penalized likelihood, varying the tuning parameter yields a continuous solution path from the MLE to the solution with all coefficients equal to zero. Because the underlying data likelihoods are not quadratic in the regression coefficients, using the alternative least squares objective function greatly reduces the computational costs for finding the ALASSO solution [Wang and Leng, 2007]. Using the LSA method eliminates the need for an iterative procedure to perform optimization of the original likelihood; it only requires one unpenalized fit of the original objective function and then a grid search to determine λ. The final estimate is again chosen by minimizing the BIC.
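One way to minimize the least squares objective for a fixed λ is cyclic coordinate descent with soft-thresholding. The sketch below is an assumption about how such a solver could look, not the implementation used in the paper; it takes the inverse covariance matrix (`precision`) directly and uses made-up inputs.

```python
def soft_threshold(u, t):
    """Shrink u toward zero by t, returning exactly 0.0 when |u| <= t."""
    if u > t:
        return u - t
    if u < -t:
        return u + t
    return 0.0

def lsa_alasso(beta_mle, precision, lam, gamma=1.0, n_sweeps=200, tol=1e-10):
    """Minimize (b - mle)^T P (b - mle) + lam * sum_j w_j |b_j| with w_j = |mle_j|^(-gamma)."""
    p = len(beta_mle)
    weights = [abs(b) ** (-gamma) for b in beta_mle]   # assumes nonzero MLEs
    beta = list(beta_mle)
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Cross term from the other coordinates' current deviations from the MLE.
            r = sum(precision[j][k] * (beta[k] - beta_mle[k]) for k in range(p) if k != j)
            u = beta_mle[j] - r / precision[j][j]       # unpenalized coordinate minimizer
            new = soft_threshold(u, lam * weights[j] / (2.0 * precision[j][j]))
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

beta_mle = [0.8, 0.05, -0.6]                            # hypothetical MLEs
precision = [[100.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 100.0]]
print(lsa_alasso(beta_mle, precision, lam=1.0))
```

Setting λ = 0 returns the MLE unchanged, and a sufficiently large λ sets every coefficient to zero, which are the two ends of the solution path described above. A grid search would call this function once per candidate λ and compare the resulting fits by BIC.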

SIMULATION STUDIES

We performed simulation studies to examine the performance of the ALASSO method under two competing data likelihoods when analyzing case-control data. Specifically, we wanted to determine if using a prospective likelihood in place of the more appropriate retrospective likelihood was detrimental when performing haplotype-based analyses using a penalized likelihood method. To answer this question, we compared the parameter estimation and model selection properties of each approach. For ease of discussion, let aPro refer to ALASSO coupled with a prospective likelihood and aRetro refer to ALASSO coupled with a retrospective likelihood.

SIMULATION SETTINGS

Our simulation studies were based on two haplotype distributions (given in Table 1) studied by Lin and Huang [2007]. These distributions are based on the common haplotypes formed by five SNPs on chromosome 18 in the CEU sample of the HapMap data. The SNPs used to build the first haplotype distribution were in strong linkage disequilibrium, while those used to build the second haplotype distribution were not. Distribution 1 represents a haplotype distribution with a few high frequency haplotypes, while the haplotype frequencies in Distribution 2 are more uniform. Each distribution was normalized so that the haplotype frequencies summed to 1. Because 8 haplotypes define Distribution 1 and 11 haplotypes define Distribution 2, the specific dimension of β is p = 7 and p = 10, respectively.

Table 1.

Haplotype distributions used in simulation studies

	Distribution 1		Distribution 2

Hap ID	Haplotype	Frequency	Haplotype	Frequency
1	00000	0.406	00010	0.131
2	00001	0.213	00001	0.105
3	01111	0.141	10010	0.103
4	10000	0.132	10101	0.100
5	10001	0.055	00100	0.088
6	01000	0.021	10100	0.088
7	01100	0.018	00101	0.086
8	01001	0.014	01101	0.084
9			10001	0.081
10			10000	0.079
11			00000	0.055
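As an illustration of how Table 1 could feed a simulation, the sketch below normalizes the Distribution 1 frequencies, draws a haplotype pair under Hardy-Weinberg equilibrium, and collapses it to an unphased genotype. This is an assumption about one plausible sampling step, not the paper's data generation procedure, and the function names are hypothetical.

```python
import random

# Distribution 1 from Table 1 (haplotype -> frequency).
DIST1 = {
    "00000": 0.406, "00001": 0.213, "01111": 0.141, "10000": 0.132,
    "10001": 0.055, "01000": 0.021, "01100": 0.018, "01001": 0.014,
}

def normalize(freqs):
    """Rescale frequencies so that they sum to 1."""
    total = sum(freqs.values())
    return {h: f / total for h, f in freqs.items()}

def draw_diplotype(freqs, rng):
    """Draw two haplotypes independently, which corresponds to Hardy-Weinberg equilibrium."""
    haps = list(freqs)
    return tuple(rng.choices(haps, weights=[freqs[h] for h in haps], k=2))

def unphased_genotype(diplotype):
    """Per-SNP allele counts (0, 1, or 2); the phase information is discarded."""
    a, b = diplotype
    return tuple(int(x) + int(y) for x, y in zip(a, b))

rng = random.Random(2010)                 # fixed seed for reproducibility
freqs = normalize(DIST1)
pair = draw_diplotype(freqs, rng)
print(pair, unphased_genotype(pair))
print(len(DIST1) - 1)                     # number of non-baseline haplotypes
```

The last line reflects the dimension count given in the text: eight haplotypes with one baseline leave seven haplotype effects.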

	Distribution 1						Distribution 2

		Sim I		Sim II				Sim I		Sim II
Hap ID	Freq	R	C	R/R	R/C	C/C	Freq	R	C	R/R	R/C	C/C
1	0.406	–	–	–	–	–	0.131	–	–	–	–	–
2	0.213	1	1	1	1	1	0.105	1	j	1	j	j
3	0.141	1	1	1	1	j	0.103	1	1	1	1	j
4	0.132	1	j	1	j	j	0.100	1	1	1	1	1
5	0.055	j	1	j	j	1	0.088	1	1	1	1	1
6	0.021	1	1	j	1	1	0.088	1	1	1	1	1
7	0.018	1	1	1	1	1	0.086	1	1	1	1	1
8	0.014	1	1	1	1	1	0.084	1	1	1	1	1
9							0.081	1	1	1	1	1
10							0.079	1	1	j	1	1
11							0.055	j	1	j	j	1

			Model Selection Results		Parameter Estimation Results
			False Positives^a		Bias		MSE

		Model^b	Pro	Retro	Pro	Retro	Pro	Retro
Haplotype Distribution 1	Correct Analysis	Additive	0.025^c (0.005)	0.028 (0.006)	0.000 (0.002)	−0.001 (0.002)	0.002 (0.001)	0.002 (0.001)

		Dominant	0.025 (0.006)	0.033 (0.006)	0.000 (0.002)	0.000 (0.002)	0.002 (0.001)	0.003 (0.001)

		Recessive	0.026 (0.008)	0.029 (0.011)	−0.001 (0.001)	0.003 (0.002)	0.001 (0.001)	0.003 (0.002)

Haplotype Distribution 2	Correct Analysis	Additive	0.062 (0.020)	0.055 (0.017)	0.000 (0.002)	0.000 (0.002)	0.002 (0.001)	0.002 (0.001)

		Dominant	0.059 (0.018)	0.054 (0.014)	0.001 (0.002)	0.001 (0.001)	0.001 (0.001)	0.001 (0.001)

		Recessive	0.057 (0.0014)	0.054 (0.017)	0.000 (0.000)	0.007 (0.005)	0.000 (0.000)	0.007 (0.005)

			Model Selection Results				Parameter Estimation Results
			False Negatives^b		False Positives		Bias		MSE

Mode^c	Freq	OR	Pro	Retro	Pro	Retro	Pro	Retro	Pro	Retro
Additive	Rare	1.3	0.943^d (0.007)	0.945 (0.007)	0.019 (0.004)	0.024 (0.005)	−0.230 (0.004)	−0.231 (0.004)	0.071 (0.001)	0.071 (0.001)
		1.5	0.820 (0.012)	0.805 (0.013)	0.010 (0.007)	0.015 (0.006)	−0.304 (0.005)	−0.298 (0.006)	0.142 (0.003)	0.140 (0.004)
		1.7	0.623 (0.015)	0.614 (0.015)	0.051 (0.008)	0.050 (0.008)	−0.305 (0.010)	−0.304 (0.010)	0.186 (0.004)	0.183 (0.004)
		2.0	0.244 (0.014)	0.239 (0.013)	0.059 (0.008)	0.064 (0.008)	−0.201 (0.010)	−0.200 (0.010)	0.144 (0.006)	0.142 (0.006)

	Common	1.3	0.801 (0.013)	0.792 (0.013)	0.030 (0.005)	0.027 (0.005)	−0.190 (0.005)	−0.188 (0.005)	0.059 (0.001)	0.058 (0.001)
		1.5	0.465 (0.016)	0.460 (0.016)	0.075 (0.006)	0.075 (0.007)	−0.189 (0.005)	−0.187 (0.007)	0.082 (0.002)	0.081 (0.001)
		1.7	0.120 (0.010)	0.112 (0.010)	0.066 (0.009)	0.073 (0.009)	−0.103 (0.006)	−0.101 (0.006)	0.050 (0.003)	0.048 (0.003)
		2.0	0.003 (0.002)	0.003 (0.002)	0.054 (0.008)	0.056 (0.008)	−0.071 (0.005)	−0.072 (0.005)	0.027 (0.001)	0.027 (0.001)

Dominant	Rare	1.3	0.946 (0.007)	0.941 (0.007)	0.018 (0.004)	0.028 (0.005)	−0.230 (0.004)	−0.228 (0.004)	0.072 (0.001)	0.071 (0.001)
		1.5	0.870 (0.011)	0.840 (0.012)	0.025 (0.008)	0.015 (0.007)	−0.324 (0.006)	−0.311 (0.007)	0.151 (0.002)	0.146 (0.002)
		1.7	0.659 (0.015)	0.632 (0.015)	0.034 (0.006)	0.031 (0.006)	−0.317 (0.010)	−0.307 (0.010)	0.196 (0.004)	0.188 (0.004)
		2.0	0.332 (0.015)	0.303 (0.015)	0.064 (0.009)	0.069 (0.009)	−0.251 (0.011)	−0.239 (0.011)	0.184 (0.007)	0.170 (0.007)

	Common	1.3	0.864 (0.011)	0.849 (0.011)	0.024 (0.006)	0.025 (0.005)	−0.207 (0.004)	−0.203 (0.005)	0.063 (0.001)	0.062 (0.001)
		1.5	0.590 (0.016)	0.555 (0.016)	0.060 (0.007)	0.055 (0.007)	−0.231 (0.006)	−0.224 (0.007)	0.101 (0.002)	0.096 (0.002)
		1.7	0.263^e (0.014)	0.206 (0.013)	0.064 (0.008)	0.067 (0.009)	−0.161 (0.008)	−0.145 (0.007)	0.088 (0.004)	0.073 (0.003)
		2.0	0.035 (0.006)	0.025 (0.005)	0.062 (0.008)	0.065 (0.008)	−0.090 (0.006)	−0.091 (0.006)	0.046 (0.003)	0.041 (0.003)

Recessive	Rare	2.0	0.990 (0.003)	0.910 (0.011)	0.010 (0.007)	0.035 (0.013)	−0.693 (0.000)	−0.583 (0.026)	0.480 (0.000)	0.474 (0.005)
		2.5	0.990 (0.003)	0.870 (0.013)	0.010 (0.007)	0.010 (0.007)	−0.916 (0.000)	−0.753 (0.031)	0.840 (0.000)	0.753 (0.016)
		3.0	0.990 (0.002)	0.770 (0.014)	0.010 (0.007)	0.015 (0.009)	−1.099 (0.000)	−0.764 (0.045)	1.207 (0.000)	0.987 (0.030)
		3.5	0.995 (0.001)	0.720 (0.009)	0.005 (0.005)	0.005 (0.005)	−1.253 (0.000)	−0.865 (0.046)	1.569 (0.000)	1.163 (0.047)

	Common	2.0	0.925 (0.008)	0.695 (0.015)	0.000 (0.000)	0.010 (0.007)	−0.603 (0.023)	−0.441 (0.028)	0.467 (0.006)	0.347 (0.014)
		2.5	0.735 (0.014)	0.350 (0.015)	0.010 (0.007)	0.030 (0.012)	−0.549 (0.045)	−0.329 (0.033)	0.711 (0.023)	0.321 (0.027)
		3.0	0.480 (0.016)	0.110 (0.010)	0.015 (0.009)	0.045 (0.016)	−0.440 (0.047)	−0.226 (0.026)	0.631 (0.040)	0.188 (0.026)
		3.5	0.355 (0.015)	0.060 (0.008)	0.030 (0.012)	0.020 (0.010)	−0.367 (0.052)	−0.242 (0.024)	0.671 (0.055)	0.176 (0.026)

		Prospective Analysis			Retrospective Analysis

Model^b	Haplotype	Unpenalized (p-value)	BIC - Penalized	AIC - Penalized	Unpenalized (p-value)	BIC - Penalized	AIC - Penalized
Additive	1121111	0.016 (0.883)	0.000	0.000	0.028 (0.784)	0.000	0.000
	1121112	0.069 (0.883)	0.000	0.000	0.076 (0.871)	0.000	0.000
	1121121	−2.028 (0.000)	−2.023	−2.023	−1.979 (0.000)	−1.978	−1.978
	2212222	−0.016 (0.919)	0.000	0.000	−0.016 (0.922)	0.000	0.000

Dominant	1121111	−0.039 (0.797)	0.000	0.000	0.126 (0.335)	0.000	0.000
	1121112	0.043 (0.927)	0.000	0.000	0.110 (0.814)	0.000	0.000
	1121121	−2.022 (0.000)	−2.018	−2.018	−1.967 (0.000)	−1.861	−1.861
	2212222	−0.026 (0.880)	0.000	0.000	0.018 (0.914)	0.000	0.000

		Prospective Analysis			Retrospective Analysis

Model^b	Haplotype	Unpenalized (p-value)	BIC - Penalized	AIC - Penalized	Unpenalized (p-value)	BIC - Penalized	AIC - Penalized
Additive	1221211211	0.175 (0.096)	0.000	0.120	0.173 (0.098)	0.000	0.152
	2112122122	0.098 (0.539)	0.000	0.000	0.070 (0.656)	0.000	0.000
	2211211211	−0.368 (0.355)	0.000	0.000	−0.451 (0.224)	0.000	−0.407
	2221211211	−0.762 (0.100)	0.000	−0.522	−0.767 (0.094)	0.000	−0.729

Dominant	1221211211	0.283 (0.042)	0.000	0.225	0.246 (0.052)	0.000	0.226
	2112122122	0.128 (0.461)	0.000	0.000	0.072 (0.660)	0.000	0.000
	2211211211	−0.356 (0.371)	0.000	0.000	−0.446 (0.231)	0.000	−0.398
	2221211211	−0.739 (0.111)	0.000	−0.506	−0.768 (0.094)	0.000	−0.727

			Model Selection Results				Parameter Estimation Results
			False Negatives^b		False Positives		Bias		MSE

Mode^c	Freq		Pro	Retro	Pro	Retro	Pro	Retro	Pro	Retro
Additive	R/R	R1^d	1.040^e (0.056)	1.025 (0.056)	0.195 (0.037)	0.140 (0.028)	−0.318 (0.021)	−0.301 (0.022)	0.190 (0.009)	0.184 (0.009)
	R/R	R2	1.040^e (0.056)	1.025 (0.056)	0.195 (0.037)	0.140 (0.028)	−0.227 (0.019)	−0.219 (0.019)	0.125 (0.009)	0.122 (0.009)

	R/C	R1	1.030 (0.033)	0.985 (0.054)	0.120 (0.028)	0.120 (0.028)	−0.470 (0.013)	−0.422 (0.017)	0.257 (0.006)	0.234 (0.007)
	R/C	C1	1.030 (0.033)	0.985 (0.054)	0.120 (0.028)	0.120 (0.028)	−0.417 (0.015)	−0.333 (0.018)	0.219 (0.008)	0.176 (0.009)

	C/C	C1	0.960 (0.037)	0.970 (0.055)	0.095 (0.024)	0.130 (0.029)	−0.467 (0.013)	−0.425 (0.015)	0.251 (0.006)	0.228 (0.007)
	C/C	C2	0.960 (0.037)	0.970 (0.055)	0.095 (0.024)	0.130 (0.029)	−0.452 (0.013)	−0.416 (0.015)	0.235 (0.007)	0.217 (0.008)

Dominant	R/R	R1	1.120 (0.055)	0.975 (0.054)	0.170 (0.028)	0.130 (0.026)	−0.319 (0.021)	−0.279 (0.021)	0.188 (0.009)	0.166 (0.009)
	R/R	R2	1.120 (0.055)	0.975 (0.054)	0.170 (0.028)	0.130 (0.026)	−0.248 (0.021)	−0.229 (0.020)	0.151 (0.009)	0.135 (0.009)

	R/C	R1	1.290^f (0.042)	1.060 (0.052)	0.165 (0.029)	0.210 (0.037)	−0.414 (0.019)	−0.337 (0.021)	0.240 (0.007)	0.197 (0.009)
	R/C	C1	1.290^f (0.042)	1.060 (0.052)	0.165 (0.029)	0.210 (0.037)	−0.391 (0.017)	−0.349 (0.018)	0.210 (0.009)	0.184 (0.009)

	C/C	C1	1.180 (0.046)	1.130 (0.056)	0.075 (0.020)	0.095 (0.023)	−0.456 (0.013)	−0.411 (0.015)	0.243 (0.007)	0.212 (0.008)
	C/C	C2	1.180 (0.046)	1.130 (0.056)	0.075 (0.020)	0.095 (0.023)	−0.434 (0.014)	−0.379 (0.016)	0.229 (0.008)	0.194 (0.009)

Recessive	R/R	R1	2.000 (0.000)	1.610 (0.044)	0.030 (0.012)	0.050 (0.015)	−0.916 (0.000)	−0.749 (0.032)	0.840 (0.000)	0.763 (0.018)
	R/R	R2	2.000 (0.000)	1.610 (0.044)	0.030 (0.012)	0.050 (0.015)	−0.916 (0.000)	−0.642 (0.034)	0.840 (0.000)	0.642 (0.025)

	R/C	R1	1.880 (0.023)	1.400 (0.045)	0.020 (0.010)	0.045 (0.015)	−0.916 (0.000)	−0.736 (0.032)	0.840 (0.000)	0.746 (0.017)
	R/C	C1	1.880 (0.023)	1.400 (0.045)	0.020 (0.010)	0.045 (0.015)	−0.846 (0.024)	−0.517 (0.033)	0.827 (0.009)	0.487 (0.028)

	C/C	C1	1.725 (0.034)	1.330 (0.052)	0.010 (0.007)	0.085 (0.022)	−0.885 (0.019)	−0.632 (0.031)	0.854 (0.014)	0.593 (0.026)
	C/C	C2	1.725 (0.034)	1.330 (0.052)	0.010 (0.007)	0.085 (0.022)	−0.874 (0.017)	−0.604 (0.031)	0.824 (0.007)	0.560 (0.027)
