A General Conditional-Logistic Model for Affected-Relative-Pair Linkage Studies

Jane M Olson

doi:10.1086/302662

1999 Nov 5;65(6):1760–1769. doi: 10.1086/302662


PMCID: PMC1288403 PMID: 10577930

Summary

Model-free LOD-score methods are often employed to detect linkage between marker loci and common diseases, with samples of affected sib pairs. Although extensions of the basic one-disease-locus model have been proposed that allow separate inclusion of other types of affected relative pairs, discordant relative pairs, covariates, or additional disease loci, a unified framework that can handle all of these features has been lacking. In this report, I propose a conditional-logistic parameterization that generalizes easily to include all of these features. Two data examples, one using simulated data and one using type 1 diabetes, illustrate applications of the models.

Introduction

Since Risch (1990b) proposed a LOD-score formulation for affected-sib-pair (ASP) linkage analysis, a variety of extensions and modifications of the basic ASP LOD score have been suggested. For analysis of other types of affected relative pairs (ARPs), Risch (1990b) also proposed LOD-score formulations similar to that for ASPs. Risch (1990b) and Lunetta and Rogus (1998) developed and studied analogous LOD-score models for discordant sib pairs (DSPs), and Rogus and Krolewski (1996) demonstrated that, for diseases with high sibling recurrence risk, DSPs can have, for detection of linkage, power equal to or greater than that of ASPs. Greenwood and Bull (1997, 1999a) proposed a multinomial logistic regression model that easily allows covariates to be included in an ASP analysis and showed that inclusion of important covariates can increase the power to detect linkage.

Multilocus models have also been proposed. Risch (1990a) developed multilocus extensions to the ASP and ARP models and investigated specialized models that are additive or multiplicative in disease penetrance. Cordell et al. (1995) parameterized the two-locus ASP LOD-score model, using joint allele-sharing–specific relative risks as functions of genetic variance components and the disease prevalence and then studied the application of these models to type 1 diabetes. They also developed genetic constraints for the two-locus case. Olson (1997) proposed an alternative parameterization to the ASP and ARP two-locus LOD score, on the basis of genetic variance components divided by the population ASP proportion. Cordell et al. (1999) developed genetic variance-component parameterizations of the multilocus ARP LOD score and studied their application to conditional analysis of multiple loci involved in type 1 diabetes.

Although these extensions and modifications increase the ability of researchers to investigate linkage, no method has yet been proposed that combines all of these features into a unified framework. In this article, I shall develop a conditional-logistic representation of the affected-pair likelihood ratio (LR), one that is valid for any type of ARP or for a sample containing more than one type of ARP. The model is parameterized in terms of the logarithms of allele-sharing–specific relative risks and is equivalent to the Risch (1990b) likelihood models. Because of the parameterization, the model can be easily extended to include the effects of covariates and to model more than one disease locus. In addition, I propose a similar parameterization for discordant relative pairs (DRPs) and show how DRPs can be analyzed both separately and as part of an analysis that also includes ARPs. Two data examples, one that illustrates the inclusion of DRPs and covariates and one that illustrates the multilocus models, are given. First, I develop the conditional-logistic model for the simple case of a one-locus disease without covariates.

Methods

The Conditional-Logistic Likelihood Ratio

I first describe the conditional-logistic model in the case of a single disease locus. For all theoretical development, I assume that allele-sharing probabilities are calculated at the location of the disease locus, using multipoint methods (e.g., those of Kruglyak and Lander 1995; Kruglyak et al. 1996; Idury and Elston 1997) and that there is no disequilibrium between disease and marker loci. Let f_ri be the unconditional (prior) probability that a relative pair of type r shares i alleles identical by descent (IBD), and let f̂_ri be the estimated probability, conditional on available marker data I_m, that the pair shares i alleles IBD, i=0, 1, 2. Also, let π be the number of alleles shared IBD. Let A be the event that both members of a pair are affected, and let A₁ and A₂ be the events that the first and second sibs, respectively, are affected.

Then the likelihood ratio (LR) contribution for an ARP of type r is

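$$\mathrm{LR} = \frac{P(I_m \mid A, r)}{P(I_m \mid r)} = \frac{\sum_{i=0}^{2} \lambda_i \hat{f}_{ri}}{\sum_{i=0}^{2} \lambda_i f_{ri}} , \tag{1}$$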

where λ_i is the relative risk to an individual who shares i alleles IBD with an affected relative. Specifically, λ₀=λ_u (=1) is the relative risk for unrelated individuals, λ₁=λ_o is the offspring relative risk, and λ₂=λ_m is the MZ-twin relative risk.

So that covariates can later be added to the model, I propose to parameterize the model in terms of the logarithms of genetic relative-risk parameters. Specifically, let β_i=log_eλ_i. Then, putting

graphic file with name AJHGv65p1760df2.jpg

where I(·) is the indicator function, we can write the model as

$$\mathrm{LR} = \frac{\sum_{i=0}^{2} \hat{f}_{ri}\, e^{\beta_i}}{\sum_{i=0}^{2} f_{ri}\, e^{\beta_i}} ,$$

subject to β₀=0. Equivalently, we can write the likelihood as

$$L(\beta_1, \beta_2) = P(I_m \mid r)\, \frac{\sum_{i=0}^{2} \hat{f}_{ri}\, e^{\beta_i}}{\sum_{i=0}^{2} f_{ri}\, e^{\beta_i}} ,$$

and the likelihood under the null hypothesis of no linkage as L(β₁=0,β₂=0)=P(I_m|r); however, for a sample of ARPs, it is easier to directly maximize the LOD score, which is obtained by summing the base-10 logarithms of the pair-specific LRs. When markers are fully informative, the model takes the form of a conditional-logistic model in which the denominator sums over the set of “pseudocontrols” that represent the possible IBD outcomes multiplied by their respective prior probabilities. When marker information is incomplete, the numerator also sums over the possible IBD outcomes, so that the model is a mixture of conditional-logistic models, weighted by the observed IBD probabilities.
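As a minimal sketch of the LOD-score computation described above, the Python below evaluates each pair's LR as the ratio of the marker-based IBD estimates to the prior IBD probabilities, each weighted by exp(β_i), and sums the base-10 logarithms over pairs. The function names, the tuple layout, and the example IBD estimates are illustrative assumptions rather than part of the original method description.

```python
import math

# Prior IBD probabilities (0, 1, 2 alleles) for full sibs; other pair types
# would supply their own priors.
SIB_PRIOR = (0.25, 0.5, 0.25)

def pair_lr(f_hat, f_prior, beta1, beta2):
    """Conditional-logistic LR for one affected pair, with beta_0 fixed at 0."""
    weights = (1.0, math.exp(beta1), math.exp(beta2))  # exp(beta_i) = lambda_i
    numerator = sum(w * fh for w, fh in zip(weights, f_hat))
    denominator = sum(w * fp for w, fp in zip(weights, f_prior))
    return numerator / denominator

def lod_score(pairs, beta1, beta2):
    """Sum of base-10 log LRs over a list of (f_hat, f_prior) pairs."""
    return sum(math.log10(pair_lr(fh, fp, beta1, beta2)) for fh, fp in pairs)

# Hypothetical marker-based IBD estimates for three sib pairs.
example_pairs = [
    ((0.1, 0.4, 0.5), SIB_PRIOR),
    ((0.0, 0.5, 0.5), SIB_PRIOR),
    ((0.3, 0.5, 0.2), SIB_PRIOR),
]
print(lod_score(example_pairs, 0.0, 0.0))                      # no linkage: every LR is 1
print(lod_score(example_pairs, math.log(1.5), math.log(2.5)))
```

In an actual analysis the LOD score would be maximized over β₁ and β₂ with a numerical optimizer; the sketch only evaluates it at fixed values.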

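To see that the LR (1) is equivalent to the Risch (1990b) LR,

$$\mathrm{LR} = \sum_{i=0}^{2} z_{ri}\, \frac{\hat{f}_{ri}}{f_{ri}} ,$$

where z_ri=P(π=i|A,r), consider first the case of ASPs. For sib pairs, the parameters in the Risch model are z_s0=1/(4λ_s), z_s1=λ_o/(2λ_s), and z_s2=λ_m/(4λ_s), where λ_s is the sibling relative risk. The Risch LR can therefore be written as

$$\mathrm{LR} = \frac{\hat{f}_{s0} + \lambda_o \hat{f}_{s1} + \lambda_m \hat{f}_{s2}}{\lambda_s} .$$

Now, the denominator in model (1) equals

$$\tfrac{1}{4} + \tfrac{1}{2}\lambda_o + \tfrac{1}{4}\lambda_m = \lambda_s ,$$

and that LR can be written as

$$\mathrm{LR} = \frac{\hat{f}_{s0} + \lambda_o \hat{f}_{s1} + \lambda_m \hat{f}_{s2}}{\lambda_s} ,$$

the same as the Risch LR.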
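The equivalence can also be checked numerically. The sketch below assumes the standard Risch sib-pair sharing probabilities z_s0=1/(4λ_s), z_s1=λ_o/(2λ_s), and z_s2=λ_m/(4λ_s), with λ_s=(1+2λ_o+λ_m)/4, and compares the Risch-form LR with the conditional-logistic form. The function names, relative-risk values, and IBD estimates are arbitrary illustrative choices.

```python
import math

SIB_PRIOR = (0.25, 0.5, 0.25)  # prior IBD probabilities for full sibs

def risch_lr(f_hat, lam_o, lam_m):
    """Risch-form LR for a sib pair: sum over i of z_i * f_hat_i / f_i."""
    lam_s = 0.25 + 0.5 * lam_o + 0.25 * lam_m  # sibling relative risk
    z = (0.25 / lam_s, 0.5 * lam_o / lam_s, 0.25 * lam_m / lam_s)
    return sum(zi * fh / fi for zi, fh, fi in zip(z, f_hat, SIB_PRIOR))

def conditional_logistic_lr(f_hat, beta1, beta2):
    """Conditional-logistic LR with beta_i = log(lambda_i) and beta_0 = 0."""
    w = (1.0, math.exp(beta1), math.exp(beta2))
    num = sum(wi * fh for wi, fh in zip(w, f_hat))
    den = sum(wi * fp for wi, fp in zip(w, SIB_PRIOR))
    return num / den

f_hat = (0.1, 0.3, 0.6)   # hypothetical IBD estimates for one pair
lam_o, lam_m = 1.4, 2.2   # arbitrary relative risks
a = risch_lr(f_hat, lam_o, lam_m)
b = conditional_logistic_lr(f_hat, math.log(lam_o), math.log(lam_m))
print(a, b, math.isclose(a, b))
```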

Model (1) holds for a general ARP of type r; the denominator equals λ_r, and the numerator is the corresponding linear combination of λ_u=1, λ_o, and λ_m. No additional parameters are needed to include other types of ARPs in an ASP linkage analysis, and, unlike the model of Cordell et al. (1999), the disease prevalence need not be prespecified. To ensure that the genetic constraints (Holmans 1993) hold, β₁ and β₂ must be such that β₁⩾0 and β₂⩾log_e(2e^β₁-1)—or, equivalently, λ_m⩾2λ_o-1. If a model with no dominance genetic variance is to be fit, then β₂=log_e(2e^β₁-1)—or, equivalently, λ_m=2λ_o-1.
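The genetic constraints on β₁ and β₂ are easy to encode when setting up a constrained maximization. The following sketch checks whether a candidate (β₁, β₂) lies in the admissible region and returns the boundary value of β₂ that corresponds to no dominance variance. The function names and the tolerance are illustrative assumptions.

```python
import math

def no_dominance_beta2(beta1):
    """beta_2 on the boundary lambda_m = 2 * lambda_o - 1 (requires beta1 >= 0)."""
    return math.log(2.0 * math.exp(beta1) - 1.0)

def satisfies_genetic_constraints(beta1, beta2, tol=1e-12):
    """True when beta1 >= 0 and beta2 >= log(2 * exp(beta1) - 1)."""
    if beta1 < -tol:
        return False
    return beta2 >= no_dominance_beta2(max(beta1, 0.0)) - tol

# lambda_o = 1.5 gives the boundary lambda_m = 2.0
print(satisfies_genetic_constraints(math.log(1.5), math.log(2.5)))  # inside the region
print(satisfies_genetic_constraints(math.log(1.5), math.log(1.8)))  # lambda_m too small
```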

The conditional-logistic model can now be easily extended to include covariates and additional disease loci. I first consider the addition of covariates.

Covariates

Consider the single-locus LR (1), and suppose that we wish to condition on the value of a covariate denoted as “x.” Following the same plan of derivation as has been used for the single-locus model, we can write the LR as

$$\mathrm{LR} = \frac{\sum_{i=0}^{2} \hat{f}_{ri} \exp(\beta_i + \delta_i x)}{\sum_{i=0}^{2} f_{ri} \exp(\beta_i + \delta_i x)} , \tag{2}$$

where δ_i, i=0, 1, 2, are parameters associated with the covariate x, with δ₀=0. There are therefore two additional parameters—δ₁ and δ₂—for every covariate. (If the data set contains no sib pairs, then only one parameter, δ₁, need be added.) The model assumes a log-linear (i.e., multiplicative) effect of the covariate on genetic relative risk, a common, natural, and flexible way to model relative risk in general epidemiology.
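A minimal sketch of the covariate model, assuming the log-linear form in which the relative risk for sharing i alleles is exp(β_i+δ_i x) with β₀=δ₀=0. The function name and the numeric values are hypothetical; the example uses a 0/1 population indicator with δ=-β, so that the second population shows no linkage.

```python
import math

def pair_lr_with_covariate(f_hat, f_prior, beta, delta, x):
    """LR for one pair with lambda_i(x) = exp(beta_i + delta_i * x), i = 1, 2."""
    weights = [1.0] + [math.exp(b + d * x) for b, d in zip(beta, delta)]
    num = sum(w * fh for w, fh in zip(weights, f_hat))
    den = sum(w * fp for w, fp in zip(weights, f_prior))
    return num / den

sib_prior = (0.25, 0.5, 0.25)
f_hat = (0.1, 0.4, 0.5)                  # hypothetical IBD estimates
beta, delta = (0.2, 0.4), (-0.2, -0.4)   # delta cancels beta when x = 1

print(pair_lr_with_covariate(f_hat, sib_prior, beta, delta, x=0))  # linked population
print(pair_lr_with_covariate(f_hat, sib_prior, beta, delta, x=1))  # unlinked: LR is 1
```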

Constraints on regression parameters

Greenwood and Bull (1997, 1999a), who have proposed a multinomial logistic regression model for the inclusion of covariates in an ASP model, have provided an excellent discussion of the problem of constraining the covariate parameters so that Holmans's (1993) genetic constraints are satisfied. They model Inline graphic and . Algebraic manipulations show that the conditional-logistic parameters are linear combinations of the Greenwood-Bull parameters: β₁=(μ₂-μ₁)/log_e2, δ₁=(ν₂-ν₁)/log_e2, β₂=-μ₁, and δ₂=-ν₁. The Greenwood-Bull model is therefore also a conditional-logistic model.

Therefore, the same issues regarding the genetic constraints arise under the conditional-logistic parameterization. I consider two cases. The first is Greenwood and Bull's “average constraint” situation, in which one wishes to constrain the model so that the expected value of the constrained allele-sharing estimates falls within the region of the parameter space consistent with a genetic model. As Greenwood and Bull point out, such a situation would be reasonable for many covariates, in particular continuous covariates such as age. For such covariates, it seems sensible to expect that the genetic constraints be valid “on average” but not sensible to expect them to hold at all values of x, because specific values of x do not necessarily represent subgroups of pairs that can be considered to be separate populations.

The conditional-logistic model also can be constrained in the same manner, with procedures such as those outlined by Greenwood and Bull. In addition, I suggest a conceptually simpler approximation to the “average constraint” condition that requires only that the genetic constraints hold at the mean of x, denoted as “x̄,” but not necessarily for all values of x. If the model is to be constrained only at x̄, then the appropriate constraints are such that β₁+δ₁x̄⩾0 and β₂+δ₂x̄⩾log_e{2[exp(β₁+δ₁x̄)]-1}. To avoid maximization of the LOD score under joint constraints on the β's and δ's, consider the centered covariate x-x̄. For the first constraint, write β₁+δ₁(x-x̄)⩾0, with a similar result for the second constraint. The value of (x-x̄), evaluated at x=x̄, is 0, so, for a centered covariate, no constraints on δ₁ or δ₂ are required, provided that the usual constraints on β₁ and β₂ hold.

More generally, the covariate may be “centered” around any value at which the genetic constraints are assumed to hold. For example, suppose, because of sampling bias, that the covariate sample mean does not estimate the population covariate mean μ and that μ is known from earlier research. The investigator may choose to assume that the genetic constraints hold at μ and enter x-μ into the conditional-logistic model. According to the argument in the preceding paragraph, (x-μ), evaluated at x=μ, is 0, so no constraints on δ₁ and δ₂ are required if the investigator assumes only that the genetic constraints hold at x=μ. Note that only μ must be known; a similar modification of the Greenwood-Bull approach would require that the population distribution of x be known.
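The centering argument can be illustrated with a short sketch: once the covariate is shifted so that the reference point maps to 0, the relative risks at that point depend on the β's alone, whatever the δ's are. The helper names and the example ages are hypothetical.

```python
import math
from statistics import mean

def center(values, at=None):
    """Subtract the sample mean, or a supplied reference value such as a known mu."""
    ref = mean(values) if at is None else at
    return [v - ref for v in values]

def relative_risks(beta, delta, x):
    """(lambda_o, lambda_m) at covariate value x under the log-linear model."""
    return tuple(math.exp(b + d * x) for b, d in zip(beta, delta))

ages = [32.0, 45.0, 51.0, 60.0]   # hypothetical pair-level covariate
centered = center(ages)           # centered at the sample mean
print(centered)

# At the centering point the covariate term vanishes, so the relative risks
# reduce to exp(beta) for any delta.
print(relative_risks((0.2, 0.5), (-0.03, 0.04), 0.0))
```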

For some covariates, such as indicator variables that represent different populations, it is sensible to expect that the genetic constraints hold for all values of x. If so, then the constraints must be such that β₁+δ₁x⩾0 for all x and β₂+δ₂x⩾log_e{2[exp(β₁+δ₁x)]-1} for all x. If the smallest value of x equals 0, then the usual constraints on β₁ and β₂ must hold, and the constraints on δ₁ and δ₂ are

and

graphic file with name AJHGv65p1760df11.jpg

Note that the δ's can still be positive or negative; no directionality of the covariate effect is implied. However, when the smallest value of x is arbitrarily set to equal 0 (as is natural for an indicator variable), the formulas above can be used directly as lower bounds. Otherwise, joint bounding of the β's and δ's becomes more complicated. For example, if values of x are positive and negative (e.g., a coding of −1, 1), then both the upper bound and the lower bound on the δ's are required and different bounds on the β's may be needed.

Because of the genetic constraints, the distribution of LR (i.e., LOD score) tests of linkage and of tests of covariate contribution are often mixtures of χ² distributions and are thus difficult to specify except in simple situations. Use of simulation methods to obtain P values may be preferred, particularly in complex situations. However, I provide some guidelines in the next subsection, using results from Self and Liang (1987).

Covariate types and tests of significance

Covariates may be pair specific or individual specific. For pair-specific covariates, such as a dichotomous indicator for ethnic group, two additional parameters per covariate are fitted. Individual-specific covariates may be incorporated by inclusion of the pair's sum (or mean)—or the pair's difference, or both—into model (2); either two or four additional parameters are then fitted for each individual-specific covariate. In the following, I use the term “covariate” to mean that two additional parameters are fitted.

In general, an LR test that compares the model with a covariate to the model without the covariate can be used to determine whether a covariate contributes significant information about linkage. In essence, such a test is a test of linkage heterogeneity due to that covariate and is only meaningful if linkage exists in the first place. In addition, for the following distributional results to hold, linkage must be present in the population from which the data are sampled, to ensure that the true values of the β's do not fall on the boundaries of the parameter space and that the principle represented in case 4 of Self and Liang (1987) applies. (For a population-indicator covariate, provided that linkage exists in at least one subpopulation, it is always possible to code the covariate so that the true value of the β's is an interior point of the β parameter space.)

Whether genetic constraints are required only at one value of x or for all values of x, the null-hypothesis values of the δ's equal 0, an interior point of the δ parameter space. Therefore, without loss of generality and in the presence of linkage, the LR test for significance of a covariate can be seen to have, asymptotically, a χ²₂ distribution. It should be noted that transformation of the covariate is not required for this distributional result to hold; the purpose of transformation is to simplify the estimation procedure. On the other hand, when one or both of the true values of the β's are close to their boundaries and the sample size is not large enough to ensure good precision in estimation, the quality of the asymptotic approximation may be affected. For this reason, simulations may be preferred for final models or when the contribution of a covariate is of major interest to the investigator.

Overall tests of linkage include both the β parameters and the δ parameters, increasing the number of df in the linkage test and thereby raising the LOD-score cutoff value for every level of significance. For tests of linkage that include covariates, the distribution of the LR statistic is a mixture of χ² distributions. For complex models, simulation-based or exact P values may be preferred, particularly in candidate regions or in other “final-testing” situations. To avoid such computational intensity, one might prefer—for preliminary studies, such as in the setting of a genome scan—approximate P values. As a large-sample approximation for P values, I suggest the use of approximate df. Holmans (1993) showed that the two-parameter LR linkage statistic, under the genetic constraints, has a distribution slightly stochastically smaller than a χ² distribution with 1 df. (For example, a LOD score of 2.3 corresponds to a significance level of .001. The corresponding LR statistic has the value 2.3×2log_e10=10.59, which is smaller than 10.83, the critical value of a χ² with 1 df at the same level of significance.) If all covariates are population-indicator covariates and if the genetic constraints are applied to all subpopulations, then the LR for the model with the covariates is simply the product of all the subpopulation-specific LRs, all maximized under the genetic constraints. Each subpopulation makes an independent contribution to the LR, so that its distribution is stochastically smaller than a χ² distribution with df equal to the number of subpopulations.
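The arithmetic in the example above (LOD score to LR statistic, then comparison with a χ² reference value) can be sketched as follows. The function names are illustrative, and the χ² tail is computed from the standard recurrence for integer df using only the Python standard library. Because the constrained statistics follow mixtures of χ² distributions, these tail probabilities serve only as reference values, as the text notes.

```python
import math

def lod_to_lr_statistic(lod):
    """Convert a LOD score to the LR statistic 2 * log_e(10) * LOD."""
    return 2.0 * math.log(10.0) * lod

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with positive integer df."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                # 2 df: exp(-x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))     # 1 df: erfc(sqrt(x/2))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while 2 * a < df:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

stat = lod_to_lr_statistic(2.3)
print(round(stat, 2))        # 10.59, as in the text
print(chi2_sf(10.83, 1))     # close to .001
print(chi2_sf(stat, 1))      # slightly above .001, since 10.59 < 10.83
```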

When genetic constraints are applied only to x̄ (or some other value), then the LR statistic cannot be easily partitioned. In this case, the null-hypothesis values of the β's fall on the boundaries of the β parameter space and the null-hypothesis values of the δ's constitute an interior point of the δ space. The LR statistic has a distribution that is a mixture of χ² distributions, the largest with df equal to the number of parameters estimated. The largest χ² distribution in this mixture is thus a conservative upper bound. On the other hand, without covariates, constraints on the β's reduce the effective df from 2 to slightly less than 1, and I hypothesize that a similar reduction will be attained in the presence of covariates with “average” constraints. Therefore, one might expect that a χ² distribution with 1+2N_u+N_c df will be a reasonable approximation, where N_u is the number of covariates with “average” or no constraints and N_c is the number of fully constrained covariates. This approximation might be used for genome screening and exploratory data analysis, whereas more time-consuming and mathematically difficult (but more accurate) methods might be reserved for final analyses.

DRPs

DRPs can be analyzed separately or added to a set of ARPs, with use of the conditional-logistic framework. Rogus and Krolewski (1996) have demonstrated that DSPs are more powerful than ASPs when the sibling recurrence risk is >.5. Let D be the event that a DRP is ascertained, and let A₁ and U₂ be the events that the first member is affected and the second member is unaffected. According to the same reasoning that has been used for ARPs, the LR contribution of a DRP can be written as

graphic file with name AJHGv65p1760df12.jpg

after multiplication of the numerator and denominator by P(A₁)/[1-P(A₂)] and summation over the possible IBD states. First consider analysis of DRPs separately. Substitution of DRP-type relative-risk parameters gives

$$\mathrm{LR} = \frac{\sum_{i=0}^{2} \hat{f}_{ri}\, e^{\beta^*_i}}{\sum_{i=0}^{2} f_{ri}\, e^{\beta^*_i}} ,$$

where β^*₀=0, β^*₁⩽0, and β^*₂⩽log_e(2e^{β^*₁}-1). Covariates may be added to the DRP model in a manner analogous to that for ARPs.

Now consider addition of DRPs to a set of ARPs with use of the conditional-logistic framework. Let y_si=P(π=i|D,r=s), i=0, 1, 2. Then, for a DSP, Inline graphic , , and (e.g., see Lunetta and Rogus 1998). Using these expressions, the previous expressions for z_si, i=0, 1, 2, and the constraint , algebraic manipulation gives

graphic file with name AJHGv65p1760df14.jpg

with λ^*_o approaching 1 as λ_s, λ_o, λ_m, and λ^*_m approach 1. Substitution of this expression for λ^*_o in the DRP LR allows the LR for ARPs and DRPs combined to be maximized with respect to three free parameters—λ_o, λ_m, and λ^*_m. (If no sib pairs are present, then two parameters—λ_o and λ^*_o—can be estimated.)

Covariates may be added in the same manner as has been used for ARPs, except that a third covariate parameter, δ^*₂, must be added, to allow λ^*_m to depend on the covariate. The same arguments used in the case of ARPs suggest that, in the presence of linkage, the distribution of the LR test of the covariate is asymptotically χ² with 3 df. In addition, in simulations, LOD-score critical values to determine significance of the three-parameter LOD score that combines ARPs and DRPs, with no covariates, were determined to be .72, 1.02, 1.65, 1.90, and 2.48 at significance levels of .1, .05, .01, .005, .001, and .0001, respectively.

Multilocus Models

Consider a model with M disease loci. For simplicity, we will drop the subscript denoting the type of relative pair under consideration, but note that all relevant quantities must be calculated conditional on relative pair type. Let f_ij…m denote the unconditional probability that the relative pair shares i alleles IBD at the first locus, j alleles at the second locus, and so on, up to m alleles IBD at the Mth locus, for i, j, …, m=0, 1, 2. Let f̂_ij…m be the corresponding estimated allele-sharing probabilities, conditional on available marker data, and let β_ij…m be the corresponding parameters to be estimated. In general, β_ij…m=log_eλ_ij…m, where λ_ij…m is the relative risk for a relative pair sharing i alleles IBD at the first locus, j alleles IBD at the second locus, and so on, up to m alleles at the Mth locus. These allele-sharing–specific λ's were first discussed by Cordell et al. (1995). Note that λ_00…0=1 and β_00…0=0.

If the loci are linked, then the f̂_ij…m must be estimated jointly, and the f_ij…m must be computed by use of the recombination fractions between the loci (also see Farrell 1997). For unlinked loci, these quantities are products of the locus-specific quantities. For example, consider a three-locus model in which loci 1 and 2 are linked but locus 3 is not linked to either locus 1 or locus 2. Then f_ijk=f_ijf_k and f̂_ijk=f̂_ijf̂_k. If all three of the loci are unlinked, then f_ijk=f_if_jf_k and f̂_ijk=f̂_if̂_jf̂_k.
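For the fully unlinked case, the joint allele-sharing probabilities are simple products, which the following sketch builds as a dictionary keyed by the tuple of IBD states. The function name and the per-locus IBD estimates are hypothetical; linked loci would need jointly estimated probabilities, which this sketch does not attempt.

```python
from itertools import product

SIB_PRIOR = (0.25, 0.5, 0.25)  # prior IBD probabilities for full sibs

def joint_ibd_unlinked(*per_locus):
    """Joint IBD distribution for unlinked loci: product of per-locus probabilities."""
    joint = {}
    for states in product(range(3), repeat=len(per_locus)):
        p = 1.0
        for probs, s in zip(per_locus, states):
            p *= probs[s]
        joint[states] = p
    return joint

# Three unlinked loci: priors, then hypothetical marker-based estimates.
f_prior = joint_ibd_unlinked(SIB_PRIOR, SIB_PRIOR, SIB_PRIOR)
f_hat = joint_ibd_unlinked((0.1, 0.4, 0.5), (0.2, 0.5, 0.3), (0.0, 0.3, 0.7))

print(len(f_prior))          # 27 joint states
print(f_prior[(2, 2, 2)])    # 0.25 ** 3
print(sum(f_hat.values()))   # sums to 1 up to rounding
```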

The general LR for M loci can be written as

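$$\mathrm{LR} = \frac{\sum_{i=0}^{2} \sum_{j=0}^{2} \cdots \sum_{m=0}^{2} \hat{f}_{ij\ldots m} \exp(\beta_{ij\ldots m})}{\sum_{i=0}^{2} \sum_{j=0}^{2} \cdots \sum_{m=0}^{2} f_{ij\ldots m} \exp(\beta_{ij\ldots m})} . \tag{3}$$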

Cordell et al. (1999) have given expressions for the λ_ij…m in terms of the genetic-variance components. The general multilocus model (3) includes all single-locus and epistatic-variance components, up to and including M-way epistatic-variance components. Extension of the results reported by Cordell et al. (1995) shows that genetic constraints on multilocus models hold when

graphic file with name AJHGv65p1760df16.jpg

for all i, j,…, m and at all loci 1, 2,…, M. If there is no dominance-variance contribution for the kth locus (in ij…k…m) , then λ_ij…2…m=2λ_ij…1…m-λ_ij…0…m.

Specialized models can also be fitted. For a model multiplicative in penetrance (Risch 1990a), it is well known that relative risks are multiplicative; that is, λ_ij…m=λ⁽¹⁾_iλ⁽²⁾_j…λ^(M)_m, where λ^(l)_w is the marginal relative risk for a pair that shares w alleles IBD at locus l. The multiplicative model is therefore specified by β_ij…m=β⁽¹⁾_i+β⁽²⁾_j+…+β^(M)_m, where β^(l)_w, w=0, 1, 2, are parameters specific for locus l and β^(l)₀=0, for l=1, 2,…, M, so that there are two free parameters for each locus.

A model additive in penetrance (Risch 1990a) is also easily specified, by λ_ij…m=e^{β⁽¹⁾_i}+e^{β⁽²⁾_j}+…+e^{β^(M)_m}-(M-1), where β^(l)_w are locus-specific parameters with β^(l)₀=0 for all l. For the additive model, e^{β^(l)_w} is the relative risk for a pair that shares w alleles IBD at locus l and 0 alleles IBD at all other loci. A two-locus derivation helpful in the interpretation of the new parameterization of the additive model is given in the Appendix.
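The two specialized models differ only in how the locus-specific parameters are combined into joint relative risks. The sketch below builds the table of λ values over all joint IBD states for each model and evaluates a multilocus LR as the ratio of λ-weighted estimated to prior joint sharing probabilities. All function names and parameter values are illustrative assumptions.

```python
import math
from itertools import product

def _with_baseline(betas):
    """Prepend beta_0 = 0 to each locus's (beta_1, beta_2)."""
    return [(0.0,) + tuple(b) for b in betas]

def multiplicative_lambdas(betas):
    """lambda = exp(sum of locus-specific betas) for every joint IBD state."""
    full = _with_baseline(betas)
    return {s: math.exp(sum(full[l][w] for l, w in enumerate(s)))
            for s in product(range(3), repeat=len(full))}

def additive_lambdas(betas):
    """lambda = sum of exp(locus-specific beta) minus (M - 1)."""
    full = _with_baseline(betas)
    m_loci = len(full)
    return {s: sum(math.exp(full[l][w]) for l, w in enumerate(s)) - (m_loci - 1)
            for s in product(range(3), repeat=m_loci)}

def multilocus_lr(f_hat, f_prior, lambdas):
    """Ratio of lambda-weighted estimated to prior joint IBD probabilities."""
    num = sum(lambdas[s] * f_hat[s] for s in lambdas)
    den = sum(lambdas[s] * f_prior[s] for s in lambdas)
    return num / den

betas = [(0.3, 0.7), (0.1, 0.2)]   # hypothetical two-locus parameters
mult = multiplicative_lambdas(betas)
add = additive_lambdas(betas)
print(mult[(0, 0)], add[(0, 0)])   # baseline relative risk is 1 in both models
print(mult[(1, 2)], add[(1, 2)])
```

Both models use two free parameters per locus, so the tables for two loci are driven by four parameters rather than the eight of the full model.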

Covariates can be included in multilocus models by putting λ_ij…m=exp(β_ij…m+δ_ij…mx). Inclusion of a single covariate doubles the number of parameters to be estimated. For full multilocus models with covariates, large sample sizes are needed to have power to detect significant effects. For data sets of moderate size, use of specialized models (such as additive and multiplicative models) provides a means of reducing the number of parameters to be estimated.

Data Examples

Simulated Data: DSPs and a Covariate

I simulated marker data for ASPs and DSPs from two populations—one in which the marker was completely linked to the disease (population L) and one in which the marker was unlinked to the disease (population U). Each of the two samples contained 150 ASPs and 150 DSPs. The frequency of the disease allele was .6, and the penetrances of the disease homozygote, heterozygote, and normal homozygote were .8, .2, and .2, respectively. This genetic model gives a disease prevalence of .42 and a sibling recurrence risk of .50, so that DSPs are expected to contain the same amount of linkage information as do ASPs. Because the disease prevalence is high, large samples are needed to detect evidence of linkage.
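The stated prevalence and sibling recurrence risk can be checked from the generating model. The sketch below assumes Hardy-Weinberg proportions and uses the standard single-locus variance-component result K_s=K+(V_A/2+V_D/4)/K; the function and argument names are illustrative.

```python
def prevalence_and_sib_risk(p, pen_hom, pen_het, pen_normal):
    """Return (prevalence K, sibling recurrence risk K_s) for a biallelic locus.

    p is the disease-allele frequency; penetrances are for the disease
    homozygote, the heterozygote, and the normal homozygote.
    """
    q = 1.0 - p
    k = p * p * pen_hom + 2.0 * p * q * pen_het + q * q * pen_normal
    # Additive and dominance variance of penetrance under Hardy-Weinberg proportions
    alpha = p * (pen_hom - pen_het) + q * (pen_het - pen_normal)
    v_a = 2.0 * p * q * alpha ** 2
    v_d = (p * q * (pen_hom - 2.0 * pen_het + pen_normal)) ** 2
    k_sib = k + (0.5 * v_a + 0.25 * v_d) / k
    return k, k_sib

k, k_sib = prevalence_and_sib_risk(0.6, 0.8, 0.2, 0.2)
print(round(k, 2), round(k_sib, 2))   # 0.42 0.5
```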

I analyzed the ASPs and DSPs both separately and together, with and without the population indicator (population L=0, population U=1) as a covariate. Genetic constraints were assumed to hold for both values of the covariate. For illustration, I also analyzed the two population groups separately. The results are shown in table 1. When ASPs are analyzed separately, evidence for linkage is seen in population L alone, but not in population U alone or in U and L together. Inclusion of the population indicator, in the analysis of the U and L ASPs together, recovers the linkage evidence found in population L alone. Under this parameterization, β̂₁ and β̂₂ are the parameter estimates associated with population L, and β̂₁+δ̂₁ and β̂₂+δ̂₂ are the parameter estimates associated with population U. Both the β's and the δ's model linkage.

Table 1.

LOD Score and Parameter Estimates for ASP and DSP Simulated Data

				Parameter Estimate^a
Data Set	Covariate in Model? (No. of Parameters)	Sample Size	LOD Score	ASP β₁	ASP β₂	DSP β^*₁	DSP β^*₂	ASP δ₁	ASP δ₂	DSP δ^*₁	DSP δ^*₂
ASP:
Linked	No (2)	150	1.03	.204	.374	…	…	…	…	…	…
Unlinked	No (2)	150	.00	.000	.000	…	…	…	…	…	…
Unlinked and linked	No (2)	300	.00	.000	.000	…	…	…	…	…	…
Unlinked and linked	Yes (4)	300	1.03	.203	.372	…	…	−.203	−.372	…	…
DSP:
Linked	No (2)	150	1.04	…	…	.000	−.508	…	…	…	…
Unlinked	No (2)	150	.05	…	…	.000	−.106	…	…	…	…
Unlinked and linked	No (2)	300	.77	…	…	.000	−.298	…	…	…	…
Unlinked and linked	Yes (4)	300	1.09	…	…	.000	−.508	…	…	.000	.402
ASP and DSP:
Unlinked and linked	No (3)	600	.94	.175	.324	…	−.297	…	…	…	…
Unlinked and linked	Yes (6)	600	2.02	.311	.377	…	−.304	−.311	−.377	…	.199


Parameters with an asterisk (*) are DSP parameters; parameters without an asterisk are ASP parameters.

The LOD-score difference between the models with and without the covariate is 1.02 (P=.096, 2 df). When the conservative approximation of the distribution of the overall LR statistic is used, the P value for the LOD score test for linkage (with the covariate) is no greater than .096 (2 df). When DSPs are analyzed separately, the results are similar, except that there is virtually no evidence for linkage in population U and moderate evidence for linkage in U and L together. The LOD-score difference between the models with and without the covariate is .325 (P=.473). The conservative approximation for the overall DSP LOD-score test of linkage gives P⩽.0813.
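For a 2-df χ² reference distribution, the tail probability of the LR statistic 2·log_e(10)·LOD simplifies to 10 raised to the power of minus the LOD score, which gives a quick check of the 2-df P values quoted above. The helper name below is an illustrative assumption.

```python
import math

def p_value_2df(lod):
    """Chi-square (2 df) upper-tail probability for a LOD score or LOD difference."""
    lr_statistic = 2.0 * math.log(10.0) * lod
    return math.exp(-lr_statistic / 2.0)   # algebraically equal to 10 ** (-lod)

checks = [
    ("ASP covariate test", 1.02),    # reported P = .096
    ("DSP covariate test", 0.325),   # reported P = .473
    ("DSP overall test", 1.09),      # reported bound P <= .0813
]
for label, lod in checks:
    print(f"{label}: LOD = {lod}, P = {p_value_2df(lod):.4f}")
```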

When the DSPs are added to the sample of ASPs but no covariate is included, moderate evidence for linkage is obtained (LOD score .94, .05<P<.1). When the covariate is included in the model, evidence for linkage is substantially increased (LOD-score difference 1.08, P=.174, 3 df). The overall LOD score is 2.02, giving an LR statistic of 9.3. The crude approximation suggested for the significance of the overall ARP LOD score does not apply, since there are two additional parameters estimated. As a result, a P value of .024 was obtained by use of the generating model (but with no linkage) to simulate 1,000 replicates, each of which was analyzed under the conditional-logistic model. The overall P value for the test of linkage that includes both DRPs and the covariate is thus smaller than that for any of the separate analyses of ARPs and DRPs, with or without covariates, except when the population subgroups are analyzed separately. Overall, this example shows that inclusion of covariates and DRPs in an ARP analysis can increase evidence for linkage.

Type 1 Diabetes: Multiple Loci

I analyzed data from a genome scan (Mein et al. 1998) of 356 ASPs with type 1 diabetes, a complex disease with multiple genetic and environmental determinants. Three chromosomal locations that showed significant or suggestive linkage in the genome scan were chosen for analysis: IDDM1 (near the HLA locus and D6S291), IDDM2 (near TH/INS, on chromosome 11), and a location near D16S3098. One-, two-, and three-locus models were fitted to these data. The results are shown in table 2. The one-locus LOD scores show a large effect for IDDM1 and modest effects for IDDM2 and D16S3098. For D16S3098, λ_m=2λ_o-1, implying that dominance variance at this locus was 0. Two-locus full, multiplicative, and additive models suggest that the additive model is a poorer fit for all two-locus combinations than is either the multiplicative model or the full model. LOD-score differences between the full (eight-parameter) and multiplicative (four-parameter) models were not large enough to declare the full model to be a significantly better fit.

Table 2.

LOD Scores and Number of Parameters for Three-Locus Analysis of Type 1 Diabetes Regions

	LOD Score (No. of Parameters) for
Analysis and Region(s)	Full Model	Multiplicative Model	Additive Model
One locus:
IDDM1	31.75 (2)	…	…
IDDM2	2.76 (2)	…	…
D16S3098	3.29 (2)	…	…
Two locus:
IDDM1 and IDDM2	35.51 (8)	34.53 (4)	33.87 (4)
IDDM1 and D16S3098	35.57 (8)	34.97 (4)	32.66 (4)
IDDM2 and D16S3098	6.14 (8)	5.95 (4)	5.65 (4)
Three locus:
IDDM1, IDDM2, and D16S3098	39.53 (17)	37.82 (5)	35.01 (5)


Full, multiplicative, and additive three-locus models were also fitted. For these analyses, I assumed that D16S3098 contributes no dominance variance to either main effects or interactions. As a result, the full, multiplicative, and additive models have 17, 5, and 5 free parameters, respectively. Starting values for the full model were obtained by summation of appropriate estimates from the one-locus models (i.e., by assumption of a multiplicative model). Again, the additive model was a poorer fit than the multiplicative model, and the LOD-score difference between the full and multiplicative models is not large enough to declare the full model to be a significantly better fit. The best-fitting, most parsimonious model is therefore the multiplicative model, although a larger sample size is probably needed to ensure that sufficient power exists to detect such a difference if it exists. An upcoming report by Cordell et al. (1999) gives an extensive multilocus analysis (under a variance-components parameterization) of this data set.
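One counting rule that reproduces the parameter totals in table 2 is sketched below. Each locus contributes two free sharing levels, or one when it has no dominance variance, because the relative risk for sharing 2 alleles at that locus is then determined by the relative risks for sharing 1 and 0. This rule is an inference from the stated totals and the no-dominance relation, not a formula given in the text, and the function name is hypothetical.

```python
from math import prod

def parameter_counts(free_levels):
    """free_levels[l] is 2 for a locus with dominance variance, 1 without.

    Returns (full-model count, multiplicative-or-additive count).
    """
    full = prod(k + 1 for k in free_levels) - 1   # free joint states minus the baseline
    reduced = sum(free_levels)                    # locus-specific parameters only
    return full, reduced

print(parameter_counts([2]))         # (2, 2): one locus
print(parameter_counts([2, 2]))      # (8, 4): two loci
print(parameter_counts([2, 2, 1]))   # (17, 5): three loci, no dominance at the third
```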

Discussion

I have proposed a unified conditional-logistic framework for model-free linkage analysis of qualitative traits, using relative pairs. The model easily allows the inclusion, within the same analysis, of more than one type of ARP, DRPs, covariates, and multiple genetic loci. The data examples have shown that, in a sib-pair analysis, inclusion of covariates and DSPs can increase the evidence for linkage. Some guidelines for both determination of the significance of a covariate and an approximation of the distribution of the overall linkage statistic that may be useful in preliminary analyses also have been given. More-extensive simulations will be required in order to determine whether the suggested approximation is useful in practice. For final analyses, simulation-based P values are recommended, to ensure accuracy of the inferences.

Nonetheless, inclusion of covariates increases the number of df of the overall LOD score for linkage, so that only covariates that substantially improve the LOD score will increase the power of the linkage test. In some situations, the number of parameters to be estimated may be reduced. For example, if age or age at onset is to be included as a covariate in a set of sib pairs, it is likely, in many cases, that the mean age of the pair contains much more linkage information than does the difference between the pair's ages, so that it would be prudent to include only the mean age.

Another way in which the number of parameters could be reduced is by putting δ₁=δ₂≡δ. This constraint assumes that λ_o and λ_m both increase by the same factor e^δ for a unit increase in x. For some covariates, this assumption may reasonably approximate the true model. For other covariates, this assumption is not reasonable. Consider, for example, a dichotomous covariate indicating membership in one of two ethnic groups. Suppose that in the first ethnic group (x=0) there is no linkage and that in the second ethnic group (x=1) there is linkage. Then β₁=β₂=0, so that λ_o=λ_m=1 in the first ethnic group. In the second ethnic group, λ_o=e^δ₁ and λ_m=e^δ₂. Putting δ₁=δ₂ implies that λ_o=λ_m (=λ_s), a highly unrealistic assumption. Therefore, great care must be taken when one is considering reducing the number of parameters in the model by putting δ₁=δ₂.
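The ethnic-group example can be made concrete with a short numerical sketch. It takes λ_o=e^(β₁+δ₁x) and λ_m=e^(β₂+δ₂x), as described above, and assumes the standard sib-pair form in which the probability of sharing i alleles IBD is proportional to the prior (1/4, 1/2, 1/4) times the relative risk, with relative risk 1 for sharing 0 alleles. The function names and the value δ=ln 2 are hypothetical choices for illustration.

```python
import math

# Prior IBD-sharing probabilities (0, 1, 2 alleles) for a sib pair.
SIB_PRIOR = (0.25, 0.5, 0.25)


def relative_risks(beta1, beta2, delta1, delta2, x):
    """Return (lambda_o, lambda_m) for covariate value x."""
    lam_o = math.exp(beta1 + delta1 * x)
    lam_m = math.exp(beta2 + delta2 * x)
    return lam_o, lam_m


def expected_sharing(lam_o, lam_m, prior=SIB_PRIOR):
    """Expected IBD-sharing proportions for an affected pair (assumed form)."""
    weights = (prior[0] * 1.0, prior[1] * lam_o, prior[2] * lam_m)
    total = sum(weights)
    return tuple(w / total for w in weights)


# No linkage when x = 0 (beta1 = beta2 = 0); constrained model delta1 = delta2.
delta = math.log(2.0)  # hypothetical value
for x in (0, 1):
    lam_o, lam_m = relative_risks(0.0, 0.0, delta, delta, x)
    print(x, lam_o, lam_m, expected_sharing(lam_o, lam_m))
```

Under the constraint, the group with x=1 is forced to have λ_o=λ_m, whatever the value of δ, which is the restriction that the text warns against.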

The conditional-logistic model assumes that all genotyped pairs have a value at each covariate in the model. If substantial missing data are present for a covariate but not for the allele-sharing, then power to detect linkage would be lessened if one limits the analysis to only those pairs for which covariate values are not missing. To include the allele-sharing information from pairs with missing covariates, one might either replace the missing value with the covariate mean (x̄) or use a multiple-imputation method to find a suitable value to substitute for the missing covariate value. If the covariate values are missing at random, then substitution of the mean would reasonably approximate the average pair. If the covariate values are missing in a nonrandom manner, then parameter estimates might be considerably biased, although biased parameter estimates may be preferable to the loss of power expected if the pairs are excluded from the analysis in their entirety.
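Mean substitution, the simpler of the two options, might be sketched as follows. The function name and the representation of a missing covariate as None are assumptions made for illustration. Multiple imputation is not shown.

```python
from statistics import mean
from typing import List, Optional, Sequence


def impute_with_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing pair-level covariate values (None) with the mean
    of the observed values, so that every genotyped pair can be kept."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed covariate values to average")
    covariate_mean = mean(observed)
    return [covariate_mean if v is None else v for v in values]


# Hypothetical mean ages for three sib pairs; the second is missing.
pair_mean_age = [30.0, None, 50.0]
print(impute_with_mean(pair_mean_age))
```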

I have discussed one way in which DRPs can be included in an analysis of ARPs. An advantage of this approach is that it allows all types of relative pairs to be included in the same analysis. A potential disadvantage is that power may be reduced because additional parameters must be estimated. In this report, I have concentrated on full models that account for all available genetic variability. Previous researchers have proposed, both for ASPs alone (e.g., see Risch 1990b; Whittemore and Tu 1998) and for DSPs alone (Lunetta and Rogus 1998), one-parameter models that have good power for most genetic models. Similar models might be adapted to the conditional-logistic framework and to the combined ARP-DRP model, to provide more-powerful tests under some or most genetic models. In addition to reducing the number of parameters to be estimated, LR statistics derived from these models might have less-complex large-sample distributions. The power of an ARP-DRP analysis might also be reduced if the genetic model generates DRPs with considerably more or less power than ARPs have. Schemes that weight ARPs and DRPs according to their relative linkage information might also be developed and applied in the conditional-logistic framework. Much further research is needed, in order to investigate these issues and to develop more-powerful tests.

Another issue requiring further exploration is the application of complex models to sibship data. For simple ASP models, there remains some controversy about optimal weighting of affected sibships, although recent work (Sham et al. 1997; Greenwood and Bull 1999b) suggests that the common practice of down-weighting sibships with more than two affected individuals is unnecessarily conservative. Nonetheless, there is general agreement that the LOD-score contribution from large affected sibships is at least somewhat biased. For many studies, inclusion of DSPs will increase the number of pairs from each sibship that are analyzed. The potential impact that covariates can have on the degree of bias should also be addressed.

I have shown that the conditional-logistic framework extends easily to both general and specialized multilocus models. Models that mix additive and multiplicative properties are also possible. For example, suppose that three disease loci, denoted as “1,” “2,” and “3,” are such that loci 2 and 3 are additive with respect to each other and that locus 1 interacts with locus 2 multiplicatively and with locus 3 multiplicatively. Such a model would be reasonable if locus 1 and either locus 2 or locus 3 (but not both) are necessary to produce disease and if loci 2 and 3 are such that genetic heterogeneity, which approximates additivity, holds for these two loci. This model can be specified by putting λ_ijk=e^β⁽¹⁾_i(e^β⁽²⁾_j+e^β⁽³⁾_k-1). In this model, the 2,3 and 1,2,3 epistatic-genetic variances are equal to 0. If loci 2 and 3 interact, but not in a multiplicative manner, then one can specify λ_ijk=e^β⁽¹⁾_ie^{β^(2,3)_jk}=e^{β⁽¹⁾_i+β^(2,3)_jk}. In this model, all genetic-variance components are allowed to be positive.
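The first mixed model above can be evaluated directly from its formula. The sketch below assumes that each locus has parameters indexed by the number of alleles shared IBD (0, 1, or 2), with the parameter for sharing 0 fixed at 0. The function name and the numerical values of the β parameters are hypothetical.

```python
import math
from typing import Sequence


def lambda_mixed(beta1: Sequence[float], beta2: Sequence[float],
                 beta3: Sequence[float], i: int, j: int, k: int) -> float:
    """Relative risk for sharing i, j, k alleles IBD at loci 1, 2, 3 when
    loci 2 and 3 are additive and locus 1 acts multiplicatively on both."""
    return math.exp(beta1[i]) * (math.exp(beta2[j]) + math.exp(beta3[k]) - 1.0)


# Hypothetical parameter values; index 0 (no sharing) is fixed at 0.
beta1 = (0.0, 0.3, 0.6)
beta2 = (0.0, 0.2, 0.5)
beta3 = (0.0, 0.1, 0.4)

for i in range(3):
    for j in range(3):
        for k in range(3):
            print(i, j, k, round(lambda_mixed(beta1, beta2, beta3, i, j, k), 4))
```

With no sharing at loci 2 and 3, the expression reduces to the one-locus relative risk for locus 1, and with no sharing at locus 1 it reduces to the additive two-locus form for loci 2 and 3.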

Multiple testing issues arise when researchers perform several analyses using various combinations of covariates, ARPs and DRPs, and multiple loci, particularly if the various combinations are studied as part of an overall genome scan designed to detect susceptibility loci. To help control type I error, I recommend that investigators choose, a priori, a scientifically “reasonable” model with a modest number of parameters to be the primary scan and then report rescans, under other models, as supplemental exploratory analyses. Similarly, complex investigations of signals detected by a primary scan should be treated as exploratory and subject to replication.

The conditional-logistic framework is planned for implementation in SAGE (1999, version 4.0), a genetic-analysis computer software program.

Acknowledgments

This work was supported in part by U.S. Public Health Service grants HG01577, from the National Center for Human Genome Research, and RR03655, from the National Center for Research Resources. I thank Charles Mein's and John Todd's groups, for sharing their family data for type 1 diabetes; Heather Cordell, for providing IBD computations for the diabetes loci and providing me with a copy of her manuscript on multilocus analysis; and Heather Cordell, Robert Elston, and Sanjay Shete, for comments on an earlier version of the manuscript. Some of the results in this report were obtained by use of SAGE, which is supported by National Center for Research Resources grant RR03655.

Appendix A

Expressions for the λ_ij in terms of variance components have been given by Cordell et al. (1995), for a two-locus model. When the epistatic variance components are set equal to 0, to obtain an additive model, the λ_ij are

graphic file with name AJHGv65p1760df17.jpg

where V_{a_l} and V_{d_l} are the additive and dominance variances, respectively, for loci l=1, 2.

Put λ_0j=e^β⁽²⁾_j and λ_i0=e^β⁽¹⁾_i for i,j=0, 1, 2, with β^(l)₀=0, for l=1, 2. When these expressions are substituted for the right-hand sides of the foregoing expressions for the λ_ij, the results are

graphic file with name AJHGv65p1760df180.jpg

or, generally, λ_ij=e^β⁽¹⁾_i+e^β⁽²⁾_j-1, with β^(l)₀=0, for l=1, 2.
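The substitution can be checked in a few lines. The sketch below assumes the usual variance-component form for a two-locus model without epistasis, in which K denotes the disease prevalence and λ_00=1. These specific expressions, and the shorthand a_i^(l), are assumptions introduced for illustration and are not copied from the displayed equations of the original.

```latex
\begin{align*}
\lambda_{ij} &= 1 + \frac{a^{(1)}_i + a^{(2)}_j}{K^2},
\qquad
a^{(l)}_0 = 0,\quad
a^{(l)}_1 = \tfrac{1}{2}V_{a_l},\quad
a^{(l)}_2 = V_{a_l} + V_{d_l}, \\[4pt]
\lambda_{i0} &= 1 + \frac{a^{(1)}_i}{K^2} = e^{\beta^{(1)}_i},
\qquad
\lambda_{0j} = 1 + \frac{a^{(2)}_j}{K^2} = e^{\beta^{(2)}_j}, \\[4pt]
\lambda_{ij} &= 1 + \left(\lambda_{i0} - 1\right) + \left(\lambda_{0j} - 1\right)
= e^{\beta^{(1)}_i} + e^{\beta^{(2)}_j} - 1 .
\end{align*}
```

Under these assumptions, the additive model needs no parameters beyond the one-locus β's, which is why it has the same number of free parameters as the multiplicative model.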

References

Cordell HJ, Todd JA, Bennett ST, Kawaguchi Y, Farrall M (1995) Two-locus maximum LOD score analysis of a multifactorial trait: joint consideration of IDDM2 and IDDM4 with IDDM1 in type 1 diabetes. Am J Hum Genet 57:920–934
Cordell HJ, Wedig GC, Jacobs KB, Elston RC (1999) Multilocus linkage tests based on affected relative pairs. Genet Epidemiol 17:194
Farrall M (1997) Affected sibpair linkage tests for multiple linked susceptibility genes. Genet Epidemiol 14:103–115
Greenwood CMT, Bull SB (1997) Incorporation of covariates into genome scanning using sib-pair analysis in bipolar affective disorder. Genet Epidemiol 14:635–640
Greenwood CMT, Bull SB (1999a) Analysis of affected sib pairs, with covariates—with and without constraints. Am J Hum Genet 64:871–885
Greenwood CMT, Bull SB (1999b) Down-weighting of multiple affected sib pairs leads to biased likelihood-ratio tests, under the assumption of no linkage. Am J Hum Genet 64:1248–1252
Holmans P (1993) Asymptotic properties of affected-sib-pair linkage analysis. Am J Hum Genet 52:362–374
Idury RM, Elston RC (1997) A faster and more general hidden Markov model algorithm for multipoint likelihood calculations. Hum Hered 47:197–202
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347–1363
Kruglyak L, Lander ES (1995) Complete multipoint sib-pair analysis of qualitative and quantitative traits. Am J Hum Genet 57:439–454
Lunetta KL, Rogus JJ (1998) Strategy for mapping minor histocompatibility genes involved in graft-versus-host disease: a novel application of discordant sib pair methodology. Genet Epidemiol 15:595–607
Mein CA, Esposito L, Dunn MG, Johnson GCL, Timms AE, Goy JV, Smith AN, et al (1998) A search for type 1 diabetes susceptibility genes in families from the United Kingdom. Nat Genet 19:297–300
Olson JM (1997) Likelihood-based models for genetic linkage analysis using affected sib pairs. Hum Hered 47:110–120
Risch N (1990a) Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet 46:222–228
Risch N (1990b) Linkage strategies for genetically complex traits. II. The power of affected relative pairs. Am J Hum Genet 46:229–241
Rogus JJ, Krolewski AS (1996) Using discordant sib pairs to map loci for qualitative traits with high sibling recurrence risk. Am J Hum Genet 59:1376–1381
SAGE (1999) Statistical Analysis for Genetic Epidemiology, version 4.0 (currently beta). Computer software package. Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland
Self SG, Liang K-Y (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82:605–610
Sham PC, Zhao JH, Curtis D (1997) Optimal weighting schemes for affected sib-pair analysis of sibship data. Ann Hum Genet 61:61–69
Whittemore AS, Tu I-P (1998) Simple, robust linkage tests for affected sibs. Am J Hum Genet 62:1228–1242
