Abstract
For both model-free and model-based linkage analysis the S.A.G.E. (Statistical Analysis for Genetic Epidemiology) program package has some unique capabilities in analyzing both continuous traits and binary traits with variable age of onset. Here we highlight model-based linkage analysis of a quantitative trait (plasma dopamine β hydroxylase) that is known to be largely determined by monogenic inheritance, using a prior segregation analysis to produce the best fitting model for the trait. For a binary trait with variable age of onset (schizophrenia), we illustrate how using age of onset information to obtain a quantitative susceptibility trait leads to more statistically significant linkage signals, suggesting better power.
Key Words: Age of onset, Best linear unbiased predictor, Haseman-Elston, Individual-specific penetrance values, Multipoint, Power transform, Segregation model for a continuous trait
Introduction
There are many computer programs that implement the various statistical methods of linkage analysis. Our purpose here is to indicate some of the features available in the program package S.A.G.E. (Statistical Analysis for Genetic Epidemiology) [1], explaining how these features are best used to obtain optimal validity and power. We do this by comparing a few of the various options available for two different types of multipoint linkage analysis on one common set of pedigree data. These pedigrees were ascertained because they segregate for schizophrenia, and we chose them for illustration because they were typed for linkage analysis using the Illumina version 4 linkage panel [2] and have values of plasma dopamine β hydroxylase (pDßH) concentration on 284 of the 922 family members. We first briefly describe the main functions of some of the relevant programs available and then describe a model-based analysis of pDßH, a quantitative trait, and a model-free analysis of a binary disease, schizophrenia, with and without using information on age of onset.
S.A.G.E. comprises 17 programs for the analysis of genetic epidemiological data, designed for analyzing pedigree data but also capable of analyzing data on unrelated individuals. Here we first highlight those programs that can be used to perform linkage analysis and then illustrate the use of some of them. Programs are also available for pedigree structure statistics, and marker and relationship error detection. Furthermore, the program ASSOC, though originally designed for association analysis, can also be used for linkage analysis using a TDT (transmission disequilibrium test)-type statistic [3], as well as to estimate heritability.
For model-based linkage analysis, the program MLOD performs multipoint model-based LOD-score linkage analysis on small pedigrees and LODLINK is the two-point equivalent but can accommodate large pedigrees. A unique feature of S.A.G.E. is the program SEGREG, which can perform segregation analysis with the ability to output penetrance files for the model-based linkage programs LODLINK and MLOD.
For model-free linkage analysis, which we define as a linkage analysis where no particular mode of inheritance is assumed for the trait being analyzed, there are three programs. First, SIBPAL analyzes data (qualitative or quantitative) on sibships as a function of marker allele sharing identity-by-descent (IBD), and includes Haseman-Elston (H-E) regression. Second, RELPAL is a new regression-based univariate or multivariate model-free linkage program that extends the H-E approach to multiple relative types and to a two-level analysis, as proposed by Wang and Elston [4]. Both SIBPAL and RELPAL use either the single- or the multi-marker IBD information produced by the S.A.G.E. program GENIBD, which is relatively fast, to investigate linkage at one or more trait loci, including epistatic interactions and covariate effects. Third, LODPAL was designed to perform linkage analysis based on the LOD score formulation for affected sib pairs (ASPs) [5], but has been generalized to use information from both affected and discordant relative pairs in general. The current implementation is that of the general conditional logistic model proposed by Olson [6], with the possibility of the one-parameter model of Goddard et al. [7] based on the min-max statistic of Whittemore and Tu [8]. The model allows for the inclusion of all affected relative pairs, together with covariates or discordant sibling pairs, with the possibility of pooling unaffected relative pairs together with affected relative pairs.
Methods
Model-Based Linkage Analysis of a Quantitative Trait
To perform model-based linkage analysis we need to assume a trait model, or fit the trait model by segregation analysis. The trait model includes the known (or assumed) trait locus allele frequencies and the values of the penetrance functions (which are usually probability mass functions but, in the case of continuous traits, can be probability density functions) for the genotypes at the trait locus. The SEGREG program in S.A.G.E. can fit very general segregation models for both binary and continuous traits, and outputs a ‘type’ file that includes the individual-specific penetrance function values conditional on any fitted segregation model. This type file can then be used for model-based linkage analysis by LODLINK or MLOD. For the quantitative trait pDßH, a low-value dominant model under a square-root transformation was shown to produce the best genome-wide linkage signal, on chromosome 9 at the DßH structural locus [9]. Here we illustrate the fitting of the low-value dominant model with SEGREG, incorporating residual familial correlations and various power transformations and covariates; we then investigate how these factors affect the linkage result.
In SEGREG, we have the option to use Bonney's class D regressive model [10], which allows for the incorporation of familial correlations. The pDßH activities were assumed to have a low-value dominant mode of inheritance; i.e. we assumed a mixture of two density functions, the one with lower mean for genotypes AA and AB, and the one with higher mean for genotype BB. We investigated both sex and schizophrenia status as binary covariates of the genotype means but here only illustrate this for the one covariate sex (neither was significant). SEGREG allows for transforming the trait using the standardized Box-Cox transformation (see Appendix) [11], simultaneously estimating the power parameter together with all the other parameters of the segregation model. It is assumed that, conditional on genotype, after transformation the phenotype y is normally distributed.
Models with various transformations using the general two-parameter standardized Box-Cox transformation (Appendix) were fitted, under which we obtained maximum likelihood estimates (MLEs) of all unknown parameters. First, we set Λ1 = 1 and Λ2 = 1 to fit a model with no transformation. If 0 < Λ1< 1, in some cases Λ2 = 0 could result in erroneous transformed distributions (see Appendix) and so, for the other three power transformations used in this study, we fixed Λ2 to have a small value, 0.1. Then Λ1 = 0.5 approximates the square-root transformation and Λ1 = 0 approximates the logarithmic transformation. We also fitted a model to obtain the MLEs of all parameters other than Λ2, but including Λ1. SEGREG outputs likelihoods for the segregation models using the standardized Box-Cox transformation, i.e. adjusted by the Jacobian of the transformation, with the result that all the likelihoods are for the same (untransformed) data and hence comparable.
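The transformations fitted above can be illustrated with a short Python sketch. It assumes the usual geometric-mean-standardized form of the two-parameter Box-Cox transformation, in which dividing by a power of the geometric mean makes the Jacobian equal to one so that likelihoods under different powers refer to the same untransformed data. The function name and the toy values are hypothetical and are not part of S.A.G.E.

```python
import math


def standardized_box_cox(values, lam1, lam2=0.0):
    """Geometric-mean-standardized two-parameter Box-Cox transform (sketch).

    lam1 is the power parameter and lam2 the shift parameter.
    This is an assumed textbook form, not code taken from SEGREG.
    """
    shifted = [v + lam2 for v in values]
    if any(s <= 0 for s in shifted):
        raise ValueError("all shifted values must be positive")
    # Geometric mean of the shifted values.
    geo_mean = math.exp(sum(math.log(s) for s in shifted) / len(shifted))
    if lam1 == 0:
        # Limiting case: a scaled logarithmic transformation.
        return [geo_mean * math.log(s) for s in shifted]
    scale = lam1 * geo_mean ** (lam1 - 1.0)
    return [(s ** lam1 - 1.0) / scale for s in shifted]


# Toy trait values (hypothetical), with the shift fixed at 0.1 as in the text.
toy = [1.0, 4.0, 9.0]
untransformed = standardized_box_cox(toy, lam1=1.0, lam2=0.1)
square_root = standardized_box_cox(toy, lam1=0.5, lam2=0.1)
logarithm = standardized_box_cox(toy, lam1=0.0, lam2=0.1)
```

With the power set to 1 the transformation only shifts the data, and as the power approaches 0 the result approaches the scaled logarithm, which is why the three fixed settings correspond to no transformation, a square-root transformation and a logarithmic transformation.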
In SEGREG, the following correlations among the residuals from the predicted genotype means can be allowed in a class D regressive model: ρFM, spouse (father-mother); ρMO, mother-offspring; ρFO, father-offspring; and ρSS, sibling. Under the class D assumption, the sibling correlation is not constrained to be due to common parentage alone, as is the case for the class A regressive model; in this latter model the sibling correlation can only be larger than the parent-offspring correlation if the spouse correlation is negative. The residuals are assumed to follow a multivariate normal distribution across family members. Thus the power transformation helps to assure this normality, so that a major gene model is fitted allowing for an approximately normally distributed cumulative effect of various factors, such as polygenes and environmental factors, that are not separately distinguished. In each case we fitted familial correlations as for the usual mixed major-gene/polygenic model, i.e. equal parent-offspring and sib-sib correlations, and no spouse correlation. We also fitted models without any familial correlations, for comparison. For each fitted model, the type file includes each individual's penetrance function values for each genotype after the class D regressive model has been fitted, thus allowing for residual familial correlations, and this was used as an input file for MLOD. We performed model-based multipoint linkage analysis for pDßH activity specifically for chromosome 9. LODs were estimated at the markers and at intervals of 2 cM.
Estimating Marker Allele Frequency and Exact Multipoint IBD Sharing
MLEs of the pedigree founder marker allele frequencies were obtained with the S.A.G.E. program FREQ, using the genotype data available on all the pedigree members. These were used for the model-based linkage analysis of pDßH activity, as well as for estimating multipoint IBD sharing by GENIBD for the model-free analysis of schizophrenia described below; in this case GENIBD was used to estimate exact multipoint IBD sharing distributions for five types of relative pairs, for each marker location and at intervals of 2 cM.
Model-Free Linkage Analysis of a Disease Trait with and without Age of Onset Information
For a binary trait, SIBPAL performs the usual mean test and proportion test of allele sharing among ASPs, as well as the analogous tests for discordant and concordantly unaffected sib pairs. For these tests, estimates are obtained of f0, f1 and f2, the probabilities that sibs share 0, 1, or 2 alleles IBD. For quantitative traits SIBPAL performs various types of H-E regression [4]. For binary traits with a variable age of onset, S.A.G.E. has the capability of using age of onset information in a model-free linkage analysis. The program AGEON obtains MLEs of parameters that determine ‘susceptibility’ to disease as a function of age, producing a quantitative susceptibility trait for each individual.
Let γ be the probability a sib is susceptible, by which we mean that individual will eventually become affected. This implies a sib who is not susceptible will never become affected (even, theoretically, should the sib live infinitely long). Let the cumulative distribution function of the age of onset, a (relevant only for susceptible sibs), be Φ(a). Then the probability that a sib is affected by age a is γΦ(a), and 1 – γΦ(a) is the probability of not being affected by age a. If unaffected by age a, the probability of a sib being susceptible is P(susceptible)P(unaffected at age a | susceptible)/P(unaffected at age a) = γ[1 – Φ(a)] / [1 – γΦ(a)]. Thus the probability a sib is susceptible, conditional on the binary phenotype and age a, is 1 if the sib is affected, and γ[1 – Φ(a)] / [1 – γΦ(a)] if the sib is unaffected at age a. We call this probability the sib's susceptibility trait.
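The susceptibility trait defined above can be computed directly from the affection status, the age and the two model components. The following Python sketch is illustrative only: the function names, the log-normal onset distribution and all numerical values are assumptions made for the example, not estimates from AGEON.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def susceptibility_trait(affected, age, gamma, onset_cdf):
    """Probability of being susceptible given affection status and age.

    gamma is the prior probability of susceptibility and onset_cdf(age)
    is the cumulative age-of-onset distribution among susceptibles.
    """
    if affected:
        return 1.0
    phi = onset_cdf(age)
    return gamma * (1.0 - phi) / (1.0 - gamma * phi)


def example_onset_cdf(age):
    """Hypothetical log-normal onset distribution (illustrative values only)."""
    return normal_cdf((math.log(age) - 3.0) / 0.3)


# An unaffected sib examined young keeps a value close to gamma,
# whereas one examined late in life has a value close to 0.
young = susceptibility_trait(False, 12.0, gamma=0.5, onset_cdf=example_onset_cdf)
old = susceptibility_trait(False, 70.0, gamma=0.5, onset_cdf=example_onset_cdf)
```

The trait therefore spreads unaffected sibs over the interval from 0 to γ according to how much of the risk period they have survived, which is the extra information that a (0, 1) coding discards.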
The affection status of the parents (each of the two parents may be affected, unaffected or unknown) forms a categorical covariate with 6 classes [12] and hence allows estimation of 6 corresponding values of γ. Using both affected and unaffected individuals, AGEON also Box-Cox transforms the age of onset, which is assumed to be normally distributed after transformation. Thus age of onset is assumed to follow a power-normal distribution but, for the same reasons as stated above, λ2 is set to a small value (we used 0.5, the default value). AGEON obtains the MLEs of the mean and variance of the age of onset distribution, together with the 6 values of susceptibility to disease, γ (the probability of ever being affected). The likelihood is also maximized constraining these 6 susceptibilities to be equal, allowing for a likelihood ratio test of this as a null hypothesis.
We performed analyses using SIBPAL for (1) the binary trait affection status, and (2) two different quantitative traits. The mean test and the proportion test were used for the binary trait and H-E regression tests were performed for two quantitative traits – the binary trait as a (0, 1) quantitative trait [13] and the susceptibility trait as produced by the program AGEON.
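As a reminder of what the original H-E regression does, the sketch below regresses the squared sib-pair trait difference on the estimated proportion of alleles shared IBD by ordinary least squares; a negative slope is the evidence for linkage. This is a minimal illustration with hypothetical names and toy data, and it does not reproduce the weighting options or the significance tests implemented in SIBPAL.

```python
def haseman_elston_fit(trait_pairs, ibd_sharing):
    """Least-squares fit of squared sib-pair differences on IBD sharing.

    trait_pairs: list of (y1, y2) trait values, one tuple per sib pair.
    ibd_sharing: estimated proportion of alleles shared IBD per pair.
    Returns (intercept, slope).
    """
    sq_diff = [(y1 - y2) ** 2 for y1, y2 in trait_pairs]
    n = len(sq_diff)
    mean_pi = sum(ibd_sharing) / n
    mean_d = sum(sq_diff) / n
    sxx = sum((p - mean_pi) ** 2 for p in ibd_sharing)
    sxy = sum((p - mean_pi) * (d - mean_d) for p, d in zip(ibd_sharing, sq_diff))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_pi
    return intercept, slope


# Toy (0, 1) trait: concordant pairs share more alleles than discordant pairs.
pairs = [(1, 1), (1, 0), (0, 0), (1, 0), (1, 1), (0, 1)]
sharing = [1.0, 0.0, 0.5, 0.5, 1.0, 0.0]
intercept, slope = haseman_elston_fit(pairs, sharing)
```

For a (0, 1) trait the squared difference is 1 for discordant pairs and 0 for concordant pairs, so the slope contrasts the allele sharing of the two kinds of pair.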
SIBPAL has the option to use either the best linear unbiased predictor (BLUP) of the sibship mean or the overall sample mean when mean-correcting the squared sib-pair sum, which can be combined with the squared sib-pair difference to form the dependent variable. It has been shown that these methods that use a mean-corrected sum are asymptotically more powerful than the original H-E method, which is independent of the population mean [4, 14]. The sibship-specific mean is the mean of the trait values $y_{sj}$ ($j = 1, \ldots, N_s$) for a given sibship $s$ and is estimated as the average

$$\bar{y}_s = \frac{1}{N_s} \sum_{j=1}^{N_s} y_{sj},$$
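where Ns is the number of sibs in a particular sibship. The BLUP of the mean for that sibship, which shrinks the sibship-specific mean $\bar{y}_s$ toward the overall sample mean $\hat{\mu}$, is then

$$\hat{\mu}_s = \hat{\mu} + \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2 / N_s} \left(\bar{y}_s - \hat{\mu}\right),$$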
where σ2b is the variance among sibships and σ2w is the variance within sibships. We used option W4 in SIBPAL, which is asymptotically optimal, to combine the sib-pair sum and the sib-pair difference [15], obtaining both asymptotic and empirical p values from the permutation distribution. For the latter, estimates of IBD sharing, used as the predictor in the regression, are permuted by SIBPAL both within sibships and across sibships of the same size. We compared these results to the original H-E method. In addition, for option W4, we compared the results obtained using the sample mean to the results using the BLUP of the sibship mean.
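The difference between the sample mean and the BLUP of the sibship mean can be made concrete with a small Python sketch. It assumes the standard one-way random-effects form of the BLUP, in which the sibship average is shrunk toward the overall mean by a weight that depends on the among-sibship variance, the within-sibship variance and the sibship size; the function names and numbers are hypothetical and the code is not taken from SIBPAL.

```python
def sibship_mean(values):
    """Plain average of the trait values in one sibship."""
    return sum(values) / len(values)


def blup_sibship_mean(values, overall_mean, var_between, var_within):
    """Shrink the sibship average toward the overall mean (assumed BLUP form).

    var_between: variance among sibships; var_within: variance within sibships.
    """
    n_sibs = len(values)
    weight = var_between / (var_between + var_within / n_sibs)
    return overall_mean + weight * (sibship_mean(values) - overall_mean)


# Three sibs averaging 2.0, overall mean 0.0: the weight is 1 / (1 + 3/3) = 0.5.
example = blup_sibship_mean([1.0, 2.0, 3.0], overall_mean=0.0,
                            var_between=1.0, var_within=3.0)
```

When the among-sibship variance is 0 the predictor reduces to the overall sample mean, and for very large sibships it approaches the sibship average, so the two mean-correction options in the text are the two ends of this shrinkage.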
Results
Segregation Models
An earlier study fitted the segregation model under a square-root transformation by fixing Λ1 = 0.5 and Λ2 = 0 [9]. In this study, we fixed Λ2 = 0.1. Among the four low-value dominant models under the square-root transformation in table 1 (models 1, 2, 3, 4), the two models incorporating familial correlations (models 1 and 3, assuming equal parent-offspring and sib-sib correlations) fitted better according to Akaike's information criterion (AIC) [16] than the two without familial correlations (models 2 and 4). Moreover, including sex as a covariate for the genotype means did not improve the fit, and in fact yielded a worse fitting model (compare the AIC values: model 1 with 3, model 2 with 4), and sex as a covariate was not significant (table 1). Models using either a logarithmic or estimated power transformation fitted these data better than models using a square-root transformation, as judged by their AIC values, but all fitted better than the model with no transformation.
Table 1.
Estimates of the parameters for low-value dominant inheritance under various models, defined by transformations, with (+) and without (-) sex as a covariate, and with (+) and without (-) familial correlations, using 284 individuals with pDßH activities in 70 pedigrees
| Model | 1 | 2 | 3 | 4 | 5 | 6 | 7ᵃ |
|---|---|---|---|---|---|---|---|
| Transformation | Square-root | Square-root | Square-root | Square-root | None | Logarithm | Estimated power (λ1) |
| Familial correlations | + | − | + | − | + | + | + |
| Covariate | + | + | − | − | − | − | − |
| q_Aᵇ | 0.51 ± 0.07 | 0.39 ± 0.05 | 0.52 ± 0.07 | 0.40 ± 0.05 | 0.59 ± 0.04 | 0.11 ± 0.03 | 0.52 ± 0.05 |
| µ_AAᶜ | 8.00 ± 1.07 | 6.41 ± 0.69 | 8.12 ± 1.01 | 6.52 ± 0.69 | 10.64 ± 0.72 | 1.91 ± 0.39 | 1.78 ± 0.29 |
| µ_ABᶜ | 8.00 ± 1.07 | 6.41 ± 0.69 | 8.12 ± 1.01 | 6.52 ± 0.69 | 10.64 ± 0.72 | 1.91 ± 0.39 | 15.41 ± 1.37 |
| µ_BBᶜ | 34.14 ± 4.6 | 26.70 ± 2.1 | 34.71 ± 4.40 | 27.11 ± 2.2 | 42.47 ± 1.62 | 13.99 ± 1.26 | 15.41 ± 1.37 |
| β_sexᵈ | 0.72 ± 1.03 | 1.40 ± 1.04 | − | − | − | − | − |
| ρPO = ρSS | 0.23 ± 0.07 | 0 | 0.23 ± 0.06 | 0 | 0.22 ± 0.05 | 0.13 ± 0.07 | 0.37 ± 0.07 |
| Λ1 | 0.50 | 0.50 | 0.50 | 0.50 | 1 | 0 | 0.01 ± 0.06 |
| AIC | 2,118.04 | 2,123.55 | 2,116.53 | 2,123.34 | 2,210.55 | 2,109.22 | 2,087.74 |

Values are given with standard errors.
ᵃ Converged to a low-value recessive model.
ᵇ Frequency of allele A.
ᶜ Genotype means on the untransformed scale.
ᵈ Coefficient for mean-centered sex, coded as 0 for males and 1 for females.
Model-Based Linkage Analysis
As shown in figure 1, among the four low-value dominant models using the square-root transformation, the two models incorporating familial correlations produced higher linkage signals on chromosome 9 compared with the two models ignoring familial correlations. The model including sex as a covariate for the genotype means produced a lower linkage signal compared to the model without the covariate. These results are consistent with the results from segregation analysis in that the better fitting model produced a better linkage signal. Among all the models in table 1, the best linkage signal on chromosome 9 was produced by model 3, the square-root transformation model incorporating familial correlations but without a sex covariate (fig. 1). The maximum LOD for this model was 5.94.
Fig. 1.
Multipoint model-based linkage on chromosome 9 under the low-value dominant trait models with a square-root transformation (Box-Cox transformation, Λ1 = 0.5, Λ2 = 0.1), with and without residual familial correlations and sex as a covariate of the genotype means.
Comparing linkage results under the dominant models using various transformations (fig. 2), the model using a square-root transformation produced the highest linkage signal, while the model using the estimated transformation power Λ1, which converged to a low-value recessive model, produced the lowest linkage signal. Thus, restricting the model to be dominant for low values, as found in previous studies [17, 18], was necessary to obtain a segregation model useful for linkage analysis.
Fig. 2.
Multipoint model-based linkage on chromosome 9 under the low-value dominant models, with different power transformations. Residual familial correlations were included in the trait model, and no covariate was included. The Box-Cox transformations with power Λ1 = 0.5, 1 and 0 are approximately a square-root transformation, no transformation and a logarithmic transformation, respectively. The model with estimated Λ1 (=0.01) was a low-value recessive model instead of a low-value dominant model because it had the larger likelihood.
Model-Free Linkage Analysis
Model-free linkage analysis of pDßH activity in these pedigrees was reported in [9]. Here we concentrate on schizophrenia as our trait of interest. A brief description of the data is presented in table 2. There were 297 affected individuals, 286 of whom had a recorded age of onset (mean 21.78 years; this information was missing for 11), and 546 unaffected individuals, 478 of whom had a recorded age at exam, i.e. the last age at which the subject was known to be unaffected (mean 57.27 years; this information was missing for 68). In order to increase the amount of information to estimate the age of onset distribution, we imputed the missing ages for those unaffected to be 11, the smallest age at exam in the sample, on the reasonable assumption that this would almost certainly be an age by which these individuals would not have been affected. For the 11 affected individuals missing age of onset, age at exam was available. We now give separate results for schizophrenia considered as a binary trait and as two quantitative traits.
Table 2.
Descriptive statistics for schizophrenia: sex, age of onset, and age at exam by affection status
| | Age of onset (SD), years | Age at exam (SD), years |
|---|---|---|
| Affected (187 M, 110 F) | n = 286; missing: n = 11ᵃ | n = 11ᵃ |
| Mean | 21.78 (6.93) | 45.82 (13.98) |
| Median | 20 | 38 |
| Range | 5–48 | 27–69 |
| Unaffected (276 M, 270 F) | not applicable | n = 478; missing: n = 68 |
| Mean | | 57.27 (16.93) |
| Median | | 59 |
| Range | | 11–106 |

ᵃ Affected subjects missing age of onset but having age at exam.
Binary Trait
For the binary trait affected or unaffected with schizophrenia, the smallest p value was obtained on chromosome 13. The results for the usual ASP mean and proportion tests are plotted for chromosome 13 in figure 3. This figure also shows (top three panels) the analogous tests for discordant sib pairs and concordantly unaffected sib pairs. The min-max statistic of Whittemore and Tu [8] should lie between these two lines in the top panel of figure 3, and analogous statistics should lie between the two lines in the next two panels. At position 94.8 cM from p-ter, the mean allele sharing for concordantly affected and unaffected sib pairs was significantly larger than expected under the null hypothesis (p < 0.01 and p < 0.05, respectively). The results were also significant for concordantly unaffected sib pairs sharing 0 or 2 alleles IBD, as was the result for ASPs sharing 0 alleles IBD.
Fig. 3.
Quantitative Traits
For schizophrenia analyzed as a quantitative (0, 1) trait, the analysis using the BLUP of the sibship mean instead of the sample mean in H-E regression yielded a smaller p value (0.001 vs. 0.002, respectively). The bottom panel of figure 3 shows the H-E regression using the BLUP of the sibship mean; the regression parameter was equal to the allele sharing by concordant sib pairs minus that by discordant sib pairs, and is plotted for f2 + f1/2.
When fitting the susceptibility model with AGEON, the MLE of the power parameter was 0 (natural log transformation). When estimating five susceptibility parameters (γ) (one category had no observations, i.e. there were no sibships where both parents were affected), the category where both parents were unaffected had the lowest estimate of susceptibility (0.53), but the estimate for the case where one parent was unknown and the other parent unaffected was slightly higher than that for one unknown and one affected parent (0.633 vs. 0.630) – i.e. the values of γ were not in a logical order. Twice the difference in the log likelihoods between the model with five susceptibilities compared to equal susceptibilities was not significant (χ2 = 3.91, d.f. = 4), and therefore estimates with the susceptibilities equal were used in the analysis with SIBPAL. The susceptibility for this model was 0.56 and the estimated age of onset distribution had mean 20.9 years and standard deviation 1.92 on the original scale. Using the quantitative susceptibility trait in H-E regression yielded a smaller p value than did the binary unaffected/affected (0, 1) trait (0.00002 vs. 0.001, table 3). The susceptibility trait yielded empirical p values <0.001 on chromosomes 2, 11 and 13, with the smallest p value on chromosome 13. We chose to report p values instead of LOD scores as the LOD score can be biased [19]. Table 4 summarizes these quantitative trait results using both the BLUP of the sibship mean and the sample mean to correct the sib-pair trait sum. Empirical p values were calculated using the permutation option in SIBPAL. In all cases, using the original H-E regression resulted in larger p values than the W4 option.
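The likelihood ratio test of equal susceptibilities reported above can be checked with a few lines of Python. The sketch uses the closed-form upper tail of the chi-square distribution, which is available when the degrees of freedom are even; the function name is hypothetical and only the test statistic and degrees of freedom are taken from the text.

```python
import math


def chi2_upper_tail_even_df(statistic, df):
    """Upper-tail probability of a chi-square variate with even df.

    Uses the identity P(X > x) = exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = statistic / 2.0
    term = 1.0   # k = 0 term of the sum
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Statistic and degrees of freedom from the text: chi-square = 3.91, d.f. = 4.
p_value = chi2_upper_tail_even_df(3.91, 4)  # approximately 0.42
```

A p value of about 0.42 agrees with the statement that allowing the susceptibilities to differ by parental affection class did not significantly improve the fit.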
Table 3.
Most significant p values for tests of f2 and H-E regression (original and the W4 option), chromosome 13
| Analysis | Location, cM | Asymptotic p | Empirical p |
|---|---|---|---|
| ASP | 77.2 | 0.019 | − |
| DSP | 90.9 | 0.052 | − |
| USP | 94.7 | 0.0022 | − |
| H-Eᵃ (0, 1 quantitative) | 94.7 | 0.002 | 0.001 |
| H-E (Susc.ᵇ) | 94.0 | 0.00007 | 0.00002 |
| H-E (Susc.ᶜ) | 94.8 | 0.0001 | 0.00004 |

DSP = Discordant sib pairs; USP = unaffected sib pairs.
ᵇ Susceptibility trait with option W4.
ᶜ Susceptibility trait with original H-E.
Table 4.
Summary of SIBPAL results for quantitative trait analysis (H-E regression)
| Trait | Chromosome | BLUP of sibship mean: position from p-ter | BLUP of sibship mean: empirical p value | Sample mean: position from p-ter | Sample mean: empirical p value |
|---|---|---|---|---|---|
| (0, 1) trait | 13 | 94.8 | 0.001 | 94.8 | 0.002 |
| Susceptibility trait | 2 | 133.3–135.5 | 0.0007 | 128.9 | 0.006 |
| | 11 | 90 | 0.0006 | 90.3 | 0.007 |
| | 13 | 94 | 0.00002 | 94 | 0.0006 |
Discussion
SEGREG and MLOD are two programs in S.A.G.E. that respectively fit segregation models and perform model-based multipoint linkage analysis. SEGREG can fit models for both quantitative and binary traits; it allows for including covariates for genotype means and genotype variances in the case of a quantitative trait, or covariates of susceptibilities for a binary trait. An example of the flexibility of SEGREG for the segregation analysis of a binary trait is given in Sun et al. [20]. SEGREG incorporates a transformation in the likelihood function for a quantitative trait, and can incorporate a multifactorial effect, and hence residual familial correlations, in two distinct ways – via a regressive model as illustrated here and via a special form of the usual mixed major gene/polygenic model to incorporate a finite number of polygenic loci. It also allows for ascertainment by conditioning the likelihood on the phenotypes of a subset of individuals. MLOD can perform model-based linkage based on a trait model fitted by SEGREG. Our study illustrated fitting genetic models for pDßH activity allowing for familial correlations by using a regressive model, as well as including covariates and incorporating various transformations. The program REGRESS [21, 22] and the updated faster program FINESSE [Florence Demenais, personal communication] incorporate regressive models into the LINKAGE program, and they can simultaneously fit a regressive model while performing linkage analysis (‘combined segregation and linkage analysis’), but there are advantages to a two-stage analysis using SEGREG and MLOD separately: incorporating covariates, adjusting for ascertainment, and simultaneously estimating the best transformation. In addition, performing segregation analysis separately enables use of all the data available to obtain the penetrance values and then trim off those individuals with no DNA data for the linkage analysis. 
If genotype means or variances are dependent on some covariates, SEGREG can incorporate such covariates in the model to overcome the difficulty of directly incorporating them in model-based linkage analysis.
Our results indicated that, even with the appropriate square-root transformation and the appropriate low-value dominant penetrance model, the trait model for pDßH activity still needed familial correlations, but not a sex covariate, to yield the highest linkage signal. However, among all the models with various transformations, the best fitting model (model 7 in table 1) did not produce a higher linkage signal; it gave the lowest linkage signal in figure 2. This is consistent with the earlier study showing that the best fitting model may not produce a better linkage result [9]. Note, however, that when Λ1 was estimated in the best fitting model, its standard error was relatively large, i.e. the model is suspect because it has a poorly estimated parameter. This emphasizes the assumption, always made in a model-based linkage analysis, that the genetic model is true. Thus we should be cautious in choosing a model for a model-based linkage analysis.
Our results confirmed accounting for residual familial correlations in the regressive models can increase the power to detect linkage [21]. It is difficult to incorporate a regressive model when parents have missing trait values – typically it is then assumed the grandparent-grandchild residuals are zero. However, the usual mixed model with a latent polygenic component does not require that all parents have trait values. SEGREG could be used to fit a mixed model and hence estimate a residual familial correlation (the same for sibs and parent-offspring) even when parental values are missing (in these data, 38 and 25% of the children have one parent and both parents missing pDßH values, respectively). Using this correlation estimated from the mixed model, we could impute from the siblings’ trait values all missing parental values and then fit a regressive model. Including this information in a linkage analysis via the penetrance function, as we illustrated, would be expected to further improve the linkage signal.
For schizophrenia, analysis of the binary trait yielded nominally significant results on chromosome 13. Using the BLUP of the sibship mean produced a smaller p value than the sample mean. However, the BLUP of the sibship mean may not always lead to the most power [14]; it is expected to do so in large-enough samples (i.e. when the estimate w is good). The susceptibility trait, however, produced a more significant result at the same position (table 4). In addition, the susceptibility trait also produced nominally significant findings on chromosomes 2 and 11. Previous studies have found linkage to chromosome 8 for these data [23]. Other studies have previously reported findings suggesting the importance of chromosome 11 [24, 25, 26, 27, 28, 29] and chromosome 13 [30, 31, 32, 33], and a recent meta-analysis found evidence for linkage to 1, 2q, 3q, 4q, 5q, 8p and 10q [34]. We therefore believe our analysis of susceptibility has probably increased the power of linkage analysis in these data.
Acknowledgements
This work was supported in part by a US Public Health Service Resource grant (RR03655). We wish to thank Dr. Ann E. Pulver of The Johns Hopkins School of Medicine Epidemiology-Genetics Program in Psychiatry as the source of the schizophrenia data and Dr. Joseph F. Cubells of Emory University School of Medicine for the plasma DßH data, obtained in part with support from a US Public Health Service research grant (MH 077233).
Appendix
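The general two-parameter standardized Box-Cox transformation of a continuous variate y is given by

$$
h(y) =
\begin{cases}
\dfrac{(y + \lambda_2)^{\lambda_1} - 1}{\lambda_1 \, G^{\lambda_1 - 1}}, & \lambda_1 \neq 0, \\[2ex]
G \ln (y + \lambda_2), & \lambda_1 = 0,
\end{cases}
$$

where

$$
G = \left[ \prod_{i=1}^{N} (y_i + \lambda_2) \right]^{1/N}
$$

is the geometric mean of the shifted values and N is the number of data values y in the dataset.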
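For the original trait x, assume that after transformation y is normally distributed with mean μ and variance σ2. The relationship between y and x (Box-Cox transformation without inclusion of the Jacobian) is, for λ1 ≠ 0,

$$
y = \frac{(x + \lambda_2)^{\lambda_1} - 1}{\lambda_1}.
$$

The probability density function of y is

$$
f_y(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y - \mu)^2}{2\sigma^2} \right]
$$

and the probability density function of x is

$$
f_x(x) = f_y(y) \left| \frac{dy}{dx} \right|,
$$

where

$$
\frac{dy}{dx} = (x + \lambda_2)^{\lambda_1 - 1}.
$$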
So, for
When performing a segregation (or commingling) analysis, we wish to avoid any transformation that introduces any spurious modes or antimodes. In order to check if the transformation might cause this in fx(x), we deduce the location of the possible modes or antimodes of fx (x) by setting
The solution z is
and, in order for z to have at least one solution, we must have σ2 ≤ 1 + μ.
(1) If Λ1 ≥ 1, z has one positive solution, which means the transformation does not change the number of modes (both fx(x) and fy(y) have one mode, no antimode).
(2) If λ1 < 1, z has two positive solutions, which means that fx(x) = 0 at
Furthermore, when
fx(x) (x ≥ 0) has an antimode, which means that the distribution of a trait x with two suprema was transformed to have one (Appendix fig. 1). Therefore, if we want both fx (x) and fy(y) to have just one supremum, Λ2 should satisfy, when Λ1 < 1,
Appendix Fig. 1.
Illustration of a situation where the Box-Cox transformation changes a distribution fx(x) with an antimode to a unimodal fy(y). The distribution of y is N(0.5, 1), and the transformation is y = (x0.5 − 1)/(0.5) (Λ1 = 0.5, Λ2 = 0).
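The situation in Appendix figure 1 can be explored numerically. The Python sketch below evaluates the density of x implied by a normal y under the Box-Cox relationship without the Jacobian standardization, obtained by the usual change of variables, and then searches a grid for interior local minima (antimodes) and maxima (modes). The function names, grid limits and grid size are arbitrary choices made for illustration and are not part of the original analysis.

```python
import math


def box_cox(x, lam1, lam2=0.0):
    """Box-Cox transform without the geometric-mean standardization."""
    if lam1 == 0:
        return math.log(x + lam2)
    return ((x + lam2) ** lam1 - 1.0) / lam1


def density_x(x, mu, sigma, lam1, lam2=0.0):
    """Density of x when box_cox(x) is N(mu, sigma^2): f_y(y(x)) * dy/dx."""
    y = box_cox(x, lam1, lam2)
    f_y = math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    jacobian = (x + lam2) ** (lam1 - 1.0)
    return f_y * jacobian


def interior_extrema(mu, sigma, lam1, lam2=0.0, lo=1e-3, hi=5.0, n=50000):
    """Grid search for local minima and maxima of the density of x."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    fs = [density_x(x, mu, sigma, lam1, lam2) for x in xs]
    minima, maxima = [], []
    for i in range(1, n):
        if fs[i] < fs[i - 1] and fs[i] < fs[i + 1]:
            minima.append(xs[i])
        elif fs[i] > fs[i - 1] and fs[i] > fs[i + 1]:
            maxima.append(xs[i])
    return minima, maxima


# Setting of Appendix fig. 1: y ~ N(0.5, 1), power 0.5, shift 0.
antimodes, modes = interior_extrema(mu=0.5, sigma=1.0, lam1=0.5, lam2=0.0)
```

Under these settings the grid search locates one antimode close to x = 0.0625 and one mode close to x = 1, with the density rising again as x approaches 0, so a trait distribution that is not unimodal corresponds to a unimodal normal distribution after transformation.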
If the trait values are all positive (x ≥ 0), i.e. xmin = 0, then the restriction on Λ2 when Λ1 < 1 becomes




