Author manuscript; available in PMC: 2015 Sep 1.
Published in final edited form as: Genet Epidemiol. 2014 Jul 8;38(6):483–493. doi: 10.1002/gepi.21814

X-chromosome Genetic Association Test Accounting for X-inactivation, Skewed X-inactivation, and Escape from X-inactivation

Jian Wang 1,2, Robert Yu 1, Sanjay Shete 1,2
PMCID: PMC4127090  NIHMSID: NIHMS590049  PMID: 25043884

Abstract

X-chromosome inactivation (XCI) is the process in which one of the two copies of the X-chromosome in females is randomly inactivated to achieve the dosage compensation of X-linked genes between males and females. That is, 50% of the cells have one allele inactive and the other 50% of the cells have the other allele inactive. However, studies have shown that skewed or non-random XCI is a biological plausibility wherein more than 75% of cells have the same allele inactive. Also, some of the X-chromosome genes escape XCI, i.e., both alleles are active in all cells. Current statistical tests for X-chromosome association studies can either account for random XCI (e.g., Clayton’s approach) or escape from XCI (e.g., PLINK software). Because the true XCI process is unknown and differs across different regions on the X-chromosome, we proposed a unified approach of maximizing likelihood ratio over all biological possibilities: random XCI, skewed XCI, and escape from XCI. A permutation-based procedure was developed to assess the significance of the approach. We conducted simulation studies to compare the performance of the proposed approach with Clayton’s approach and PLINK regression. The results showed that the proposed approach has higher powers in the scenarios where XCI is skewed while losing some power in scenarios where XCI is random or XCI is escaped, with well-controlled type I errors. We also applied the approach to the X-chromosomal genetic association study of head and neck cancer.

Keywords: X-chromosome, X-chromosome inactivation, skewness, escape from X-chromosome inactivation, SNP, genome-wide association study, likelihood ratio

Introduction

The X-chromosome inactivation (XCI) hypothesis, originally proposed by Lyon in 1961,1 states that during early embryonic development in females, one of the two copies of the X-chromosome present in each cell is randomly inactivated to achieve dosage compensation of X-linked genes between males and females.2–9 Because of this random XCI, two copies of the X-chromosome in females do not have twice the effect of a single copy of the X-chromosome in males. Clayton’s approach10 was the first statistical method to take random XCI into account when analyzing X-chromosome genetic data. He proposed two chi-squared tests, a 1-degree-of-freedom test and a 2-degrees-of-freedom test, in which males were treated as homozygous females. Specifically, the three genotypes of females are coded as 0, 1, or 2, while the two genotypes of males are coded as 0 or 2. With this coding strategy, the heterozygous genotype in females falls midway between the two homozygous genotypes on the linear predictor scale,10 which is appropriate because in heterozygous females about 50% of cells have the deleterious allele active while the other 50% of cells have the normal allele active due to random XCI. The 1-degree-of-freedom chi-squared test proposed by Clayton has been shown to be more powerful in previous studies.7,8 Clayton’s approach is also implemented in other software programs for genetic analysis, such as IMPUTE11,12 and MaCH.13

The XCI process is in general random; however, studies have suggested that skewed or non-random XCI is a biological plausibility.3,5,9,14–19 In this study, we denote this phenomenon of skewed X-chromosome inactivation as XCI-S. The skewness of X-chromosome inactivation has been defined using an arbitrary threshold as inactivation of one of the alleles in more than 75% of cells.3,5,20–24 Extreme or severe skewness, which is defined as inactivation of one of the alleles in more than 90% of cells, has also been observed.3,5,9,18,24–29 In a population of phenotypically unaffected females, the percentage of cells with one X-chromosome active can range from 50% (i.e., random XCI) to 100% (i.e., same X-chromosome is active in all cells).18,19 Skewed XCI has been observed in young children, and the skewness increases with age.3,5,14,15,18,24

Multiple studies of complex disorders have shown that the skewed XCI pattern could be more common in affected females than in unaffected females. For example, Plenge et al.16 reported that the XCI-S pattern is a relatively common feature in women with X-linked mental retardation disorders. They found that approximately 50% of affected women demonstrated a markedly skewed XCI pattern, compared with only 10% of female control subjects. Talebizadeh et al.30 showed that the XCI-S pattern was observed in a larger proportion of females in the autism group (33%) than in the control group (11%). Chabchoub et al.23 found that the XCI-S pattern was observed in 34% of rheumatoid arthritis patients and 26% of autoimmune thyroid disease patients, compared to 11% of controls. Two other studies have suggested that the XCI-S pattern is more common in patients with invasive ovarian cancer and young patients with breast cancer than in controls.31,32 Therefore, it is important to account for XCI-S when testing the association between X-chromosome genetic markers and diseases. In such association studies, special consideration is needed because one cannot assume that the genotypic effects for heterozygous females will be midway between the two homozygous genotypes. To our knowledge, no statistical test has been developed to account for skewed X-chromosome inactivation.

Another complexity in analyzing X-chromosome data is the escape from XCI (denoted as XCI-E) outside the pseudo-autosomal regions on the X-chromosome. It is estimated that about 75% of X-linked genes undergo silencing of one copy of the female X-chromosomes as the result of X-chromosome inactivation; however, the remaining genes may escape inactivation, and in those genes both alleles will be active (i.e., no dosage compensation).9,33–36 The XCI-E regions can be analyzed using the standard association tests for autosomal loci, such as allele-counting approaches37 and the regression approach used by PLINK.38 Zheng et al.37 proposed six association tests for X-chromosome genetic markers, using different combinations of tests for male and female samples based on the genotypic counts and allelic counts in cases and controls. PLINK is the most popular software for genome-wide association (GWA) studies and has been widely used in association studies of the X-chromosome.39–41 PLINK performs the association tests for X-chromosome loci in two ways: using only females or using all samples in regression models (linear or logistic) that include sex as a covariate. The first approach might lead to a loss of power because excluding males reduces the sample size. For the regression models, PLINK codes the genotypes assuming the effect of the deleterious allele in males is the same as the effect of the heterozygous genotype in females; that is, the three genotypes of females are coded as 0, 1, or 2, while the two genotypes of males are coded as 0 or 1. Both the PLINK and Zheng et al. approaches account for escape from XCI but ignore the biologically plausible random and skewed XCI mechanisms. On the other hand, Clayton’s approach accounts for random XCI but ignores escape from XCI and skewed XCI.

Because the true underlying XCI process is unknown and differs across different regions on the X-chromosome, we proposed a unified approach that maximizes the likelihood ratio over all such biological possibilities: random XCI, XCI-S, and XCI-E. A permutation-based procedure was developed to assess the significance of the proposed association test. We conducted simulation studies to investigate the performance of the proposed approach and compared it to the 1-degree-of-freedom chi-squared test proposed by Clayton and the PLINK regression approach. The results showed that the proposed association test had higher power than the other two approaches in the scenarios where XCI was skewed while losing some power in scenarios where XCI was random or XCI escape occurred. The type I errors of all three methods were well controlled. We also applied all three approaches to investigate X-chromosome genetic association in head and neck cancer.

Methods

We considered a single nucleotide polymorphism (SNP) on the X-chromosome with two alleles: deleterious allele A and normal allele a. We assumed a binary random variable for the disease of interest and denoted it as Y = {0, 1}, with 0 representing individuals without the disease and 1 representing individuals with the disease. As discussed above, the true underlying XCI process is unknown and differs from region to region on the X-chromosome; therefore, at any given locus on the X-chromosome it is possible to observe one of four biological models: XCI, XCI-S in the direction of the deleterious allele, XCI-S in the direction of the normal allele, and XCI-E. We aimed to account for all of these biological models in our statistical approach for the X-chromosome association test. In particular, for random and non-random X-chromosome inactivation, i.e., XCI and XCI-S, we used a random variable X = {0, 2} to denote alleles a and A, respectively, for males and a random variable X = {0, γ, 2} to denote genotypes (a, a), (A, a), and (A, A), respectively, for females, where γ ∈ [0, 2]. Because we considered both random and non-random XCI in the model, we would not know the true underlying percentage of skewness with certainty. Therefore, instead of using a fixed number for γ (i.e., 1 as in Clayton’s approach), we used a number for γ that varied between 0 and 2 to denote the level of skewness in the heterozygous females. Note that when γ = 1, this coding is the same as in Clayton’s additive genetic model, which assumes a random XCI. When γ takes a value between 1 and 2, this coding assumes a non-random XCI-S skewed towards the deleterious allele. For example, γ = 1.5 represents a scenario where 75% of the cells have the deleterious allele active and the other 25% of the cells have the normal allele active. When γ takes a value between 0 and 1, this coding assumes a non-random XCI-S skewed towards the normal allele. For example, γ = 0.5 represents a scenario where 25% of the cells have the deleterious allele active and the other 75% of the cells have the normal allele active. To account for XCI-E, we used the same coding as the one used by PLINK: for males, we used a binary random variable X = {0, 1} to denote alleles a and A, respectively; for females, we used a categorical random variable X = {0, 1, 2} to denote genotypes (a, a), (A, a), and (A, A), respectively. In this scenario, both copies of the X-chromosome in females are active, so the males carrying the deleterious allele were treated as heterozygous females.
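The coding rules described above can be sketched as a small lookup function. This is only an illustration: the function name `code_genotype`, its argument names, and the string labels "XCI" and "XCI-E" are hypothetical and do not come from the original analysis code.

```python
def code_genotype(n_deleterious, sex, model, gamma=1.0):
    """Numeric coding X for one subject at an X-chromosome SNP.

    n_deleterious: copies of deleterious allele A (0 or 1 for males, 0-2 for females)
    sex:           "M" or "F"
    model:         "XCI"   -> random or skewed inactivation (level set by gamma)
                   "XCI-E" -> escape from inactivation (PLINK-style coding)
    gamma:         coding of heterozygous females under "XCI", in [0, 2]
    """
    if model not in ("XCI", "XCI-E"):
        raise ValueError("model must be 'XCI' or 'XCI-E'")
    if sex == "M":
        if n_deleterious not in (0, 1):
            raise ValueError("males are hemizygous: expected 0 or 1 copies")
        # Males: {0, 2} under XCI / XCI-S, {0, 1} under XCI-E.
        return (2.0 if model == "XCI" else 1.0) * n_deleterious
    if sex != "F":
        raise ValueError("sex must be 'M' or 'F'")
    if n_deleterious not in (0, 1, 2):
        raise ValueError("females carry 0, 1, or 2 copies")
    if model == "XCI-E":
        # Both copies active: plain allele count {0, 1, 2}.
        return float(n_deleterious)
    if not 0.0 <= gamma <= 2.0:
        raise ValueError("gamma must lie in [0, 2]")
    # Females under XCI / XCI-S: {0, gamma, 2}.
    return (0.0, float(gamma), 2.0)[n_deleterious]


# gamma = 1.5 corresponds to 75% of cells with the deleterious allele active.
print(code_genotype(1, "F", "XCI", gamma=1.5))  # heterozygous female
print(code_genotype(1, "M", "XCI"))             # male carrier, treated as homozygous
print(code_genotype(1, "M", "XCI-E"))           # male carrier, treated as heterozygous
```

Setting `gamma=1.0` reproduces Clayton's coding, and `model="XCI-E"` reproduces the PLINK coding.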

Given a case-control sample with N subjects, the association between an SNP on X-chromosome X and disease of interest Y can be expressed using a logistic model:

$$\operatorname{logit}\big(P(Y=1\mid X)\big)=\beta_0+\beta_1 X,$$

where β0 and β1 are regression coefficients, and X ∈ M, where M is a set of different coding values for X based on the sex of the individual and the different XCI processes, defined as

$$M=\left\{X^{F}=\{0,\gamma,2\},\ \gamma\in[0,2];\quad X^{M}_{\{\mathrm{XCI},\,\mathrm{XCI\text{-}S}\}}=\{0,2\};\quad X^{M}_{\mathrm{XCI\text{-}E}}=\{0,1\}\right\},$$

where $X^{F}$ denotes the coding for the three genotypes (a, a), (A, a), and (A, A) for females, $X^{M}$ denotes the coding for the two allele types a and A for males, and the subscript denotes the XCI process.

For each individual, the conditional probability can be written as

$$\pi_i=P(y_i=1\mid x_i)=\frac{\exp(\beta_0+\beta_1 x_i)}{1+\exp(\beta_0+\beta_1 x_i)},\quad i=1,\ldots,N,$$

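where $x_i$ is the observed value of the SNP as coded in M based on the sex of the individual and the underlying XCI process. Given the sample data, the likelihood under the alternative hypothesis is written as

$$L(Y\mid X;\beta_0,\beta_1)=\prod_{i=1}^{N}\left(\frac{\exp(\beta_0+\beta_1 x_i)}{1+\exp(\beta_0+\beta_1 x_i)}\right)^{y_i}\left(\frac{1}{1+\exp(\beta_0+\beta_1 x_i)}\right)^{1-y_i},$$

and the likelihood under the null hypothesis as

$$L(Y\mid\beta_0)=\prod_{i=1}^{N}\left(\frac{\exp(\beta_0)}{1+\exp(\beta_0)}\right)^{y_i}\left(\frac{1}{1+\exp(\beta_0)}\right)^{1-y_i}.$$

The likelihood ratio, therefore, can be expressed as a function of the coding strategy X: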

$$LR(X)=\frac{L(Y\mid X;\beta_0,\beta_1)}{L(Y\mid\beta_0)},\quad X\in M. \tag{1}$$

As discussed above, the underlying biological process for XCI is unknown; therefore, we infer the optimal coding strategy of X that maximizes the likelihood ratio in Equation (1) given the sample data:

$$\underset{X\in M}{\arg\max}\ LR(X)=\frac{L(Y\mid X;\beta_0,\beta_1)}{L(Y\mid\beta_0)}. \tag{2}$$

In the above maximization scheme, we performed a grid search in which the γ value ranged from 0 to 2. Given a fixed coding of X, we can estimate the regression coefficients β0 and β1 by maximizing the likelihood ratio LR as in Equation (1), and the corresponding LR can be calculated. Thus, the maximum LR, or LR*, corresponding to the optimal coding strategy X* given the sample data, can be obtained by enumerating all the coding strategies X ∈ M. Moreover, the effect size (the odds ratio [OR] for the logistic model) of the association between the disease and the SNP can be obtained from the β1 corresponding to LR*, with OR = exp(β1).

Based on the simulation studies, we found that a grid search with a small step size is unnecessary: it has very little impact on the LR values, and a fine grid typically leads to loss of statistical power because of the multiple testing corrections. Therefore, we considered only four coding strategies: one coding for XCI-E and three codings for XCI and XCI-S. Specifically, the value of γ was set to 0, 1, or 2 to represent XCI-S towards the normal allele, random XCI, or XCI-S towards the deleterious allele, respectively.
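The search over the four coding strategies might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the logistic model is fitted with a hand-written Newton-Raphson routine, the log of the likelihood ratio is maximized (equivalent to maximizing LR itself), and all function names and the toy counts are hypothetical.

```python
import math

# Codings indexed by the number of deleterious alleles (females 0-2, males 0-1).
CODINGS = {
    "XCI-E":                    {"F": (0.0, 1.0, 2.0), "M": (0.0, 1.0)},
    "XCI-S toward normal":      {"F": (0.0, 0.0, 2.0), "M": (0.0, 2.0)},  # gamma = 0
    "random XCI":               {"F": (0.0, 1.0, 2.0), "M": (0.0, 2.0)},  # gamma = 1
    "XCI-S toward deleterious": {"F": (0.0, 2.0, 2.0), "M": (0.0, 2.0)},  # gamma = 2
}

def null_loglik(y):
    """Log-likelihood of the intercept-only model (needs both cases and controls)."""
    n, n1 = len(y), sum(y)
    p = n1 / n
    return n1 * math.log(p) + (n - n1) * math.log(1.0 - p)

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit P(y=1|x) = b0 + b1*x; returns (b0, b1, loglik)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            # x is (nearly) constant: fall back to the intercept-only fit.
            p = sum(y) / len(y)
            b0, b1 = math.log(p / (1.0 - p)), 0.0
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    loglik = sum(yi * (b0 + b1 * xi) - math.log1p(math.exp(b0 + b1 * xi))
                 for xi, yi in zip(x, y))
    return b0, b1, loglik

def max_log_lr(sex, n_del, y):
    """Return the best coding name and a dict name -> (log LR, beta1)."""
    ll0 = null_loglik(y)
    results = {}
    for name, code in CODINGS.items():
        x = [code[s][g] for s, g in zip(sex, n_del)]
        _, b1, ll1 = fit_logistic(x, y)
        results[name] = (ll1 - ll0, b1)
    best = max(results, key=lambda k: results[k][0])
    return best, results

def expand(cells):
    """Turn (sex, n_deleterious, n_subjects, n_cases) cells into per-subject lists."""
    sex, n_del, y = [], [], []
    for s, g, n, cases in cells:
        sex += [s] * n
        n_del += [g] * n
        y += [1] * cases + [0] * (n - cases)
    return sex, n_del, y

# Toy counts in which heterozygous females resemble homozygous carriers.
cells = [("F", 0, 100, 30), ("F", 1, 100, 60), ("F", 2, 100, 60),
         ("M", 0, 100, 30), ("M", 1, 100, 60)]
best, results = max_log_lr(*expand(cells))
print(best, math.exp(results[best][1]))  # selected coding and its OR = exp(beta1)
```

Because the same data are scanned under four codings, the maximum is not chi-squared distributed, which is why its significance has to be assessed by permutation.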

Permutation-based Calculation of Empirical P Value

To assess the significance of the statistical test, we proposed a permutation-based procedure to compute the empirical p values. With N subjects, the empirical p value corresponding to the maximum LR* with respect to the optimal coding strategy X* was obtained as follows:

  1. We randomly permuted the values of disease status B times and kept all the other variables (i.e., the SNP) unchanged. By permuting the disease status values, we ensured that there would be no association between the disease and the SNP.

  2. For each permuted disease status, we evaluated the association between the disease and the SNP and obtained the permuted LR_u, u = 1, 2, …, B, corresponding to the optimal strategy X_u.

  3. The empirical p value of LR* was estimated as the proportion of permuted LR_u, u = 1, 2, …, B, that were greater than the observed LR*: (number of LR_u > LR*)/B.
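The three steps above can be sketched generically. The function below is an illustrative assumption rather than the authors' code: `statistic` stands for any function of the disease statuses (in the proposed test it would be the maximum LR over the coding strategies for a fixed SNP), and the toy statistic used in the demonstration is only a stand-in.

```python
import random

def permutation_p_value(y, statistic, n_perm=1000, rng=None):
    """Empirical p value: share of permuted statistics strictly above the observed one.

    y:         list of 0/1 disease statuses (the only variable that is shuffled)
    statistic: callable mapping a status list to a number
    n_perm:    number of permutations B
    """
    rng = rng or random.Random()
    observed = statistic(y)
    permuted = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)            # breaks any disease-SNP association
        if statistic(permuted) > observed:
            exceed += 1
    return exceed / n_perm


# Toy demonstration with a hypothetical stand-in statistic:
# absolute difference in case fraction between carriers and non-carriers.
x = [0] * 20 + [2] * 20
y = [0] * 15 + [1] * 5 + [0] * 5 + [1] * 15

def toy_statistic(status):
    carriers = [s for s, xi in zip(status, x) if xi > 0]
    others = [s for s, xi in zip(status, x) if xi == 0]
    return abs(sum(carriers) / len(carriers) - sum(others) / len(others))

p = permutation_p_value(y, toy_statistic, n_perm=2000, rng=random.Random(0))
print(p)
```

The optimal coding is re-selected inside `statistic` for every permutation, so the selection step is part of the null distribution.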

Simulation Approach

We performed simulation studies to investigate the performance of the proposed statistical test for X-chromosome genetic association studies and compared the approach to Clayton’s 1-degree-of-freedom test and the PLINK regression approach. We considered an associated di-allele SNP to assess the power and another un-associated SNP to assess the type I error rate. In addition to the genetic risk factors, we also included sex in the simulation model as follows:

$$\operatorname{logit}\big(P(Y=1\mid X_1,X_2,X_{sex})\big)=\beta_0+\beta_1 X_1+\beta_2 X_2+\beta_s X_{sex}.$$

In the logistic model, X1 and X2 represent associated SNP1 and unassociated SNP2, and Xsex represents the sex covariate. The minor allele frequency (MAF) for both SNPs was assumed to be 40%. We fixed the regression coefficients at β1 = 0.2624 and β2 = 0, which correspond to ORs of 1.3 and 1, respectively. We assumed that sex was associated with the disease of interest (βs = 0.4055). We used a binary random variable for sex, Xsex = {0, 1}, with either female or male being at increased risk for disease (i.e., coded as 1). The intercept coefficient β0 was set as −2.55. Note that allowing sex to be an independent risk factor, we are considering scenarios with different male and female proportions in cases and controls. Across different scenarios listed in Table 1, the proportions of females in cases varied from 40% to 60%. We also investigated different MAFs in males and females, which has been shown to have an impact on different statistical approaches for X-chromosome genetic association in previous studies7,8. We observed that the largest estimated difference in MAFs of males and females was ~13% based on the head and neck X-chromosomal genetic data. Thus, in some simulation scenarios, we set the MAF as 30% (or 40%) for males and 40% (or 30%) for females, respectively.
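As a quick check of the parameter values quoted above, the coefficients can be converted back to odds ratios, and the intercept to a baseline disease probability. The OR of about 1.5 for sex and the baseline risk are not stated in the text; they follow directly from the stated βs and β0.

```latex
e^{\beta_1} = e^{0.2624} \approx 1.30, \qquad
e^{\beta_2} = e^{0} = 1, \qquad
e^{\beta_s} = e^{0.4055} \approx 1.50,
\\[4pt]
P(Y=1 \mid X_1 = 0,\ X_{sex} = 0)
  = \frac{e^{-2.55}}{1 + e^{-2.55}} \approx 0.072 .
```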

Table 1.

Median odds ratios (ORs) and 95% confidence intervals (CIs) for PLINK regression, Clayton’s 1-degree-of-freedom test, and our approach, based on 100,000 replicates each with 1,000 cases and 1,000 controls. The true ORs for simulation were 1.3 for SNP1 and 1.0 for SNP2.

Entries are median OR (95% CI).

| Biological model | Increased risk* | PLINK, SNP1 | PLINK, SNP2 | Clayton, SNP1 | Clayton, SNP2 | Our approach, SNP1 | Our approach, SNP2 |
|---|---|---|---|---|---|---|---|
| XCI-S to deleterious allele | Male | 1.47 (1.27–1.71) | 1.00 (0.86–1.16) | 1.32 (1.19–1.46) | 1.00 (0.90–1.11) | 1.32 (1.20–1.44) | 1.00 (0.89–1.12) |
| XCI-S to deleterious allele | Female | 1.46 (1.26–1.69) | 1.00 (0.86–1.16) | 1.32 (1.19–1.47) | 1.00 (0.90–1.11) | 1.32 (1.20–1.45) | 0.99 (0.89–1.12) |
| XCI-S to normal allele | Male | 1.40 (1.21–1.63) | 1.00 (0.86–1.16) | 1.29 (1.16–1.42) | 1.00 (0.90–1.11) | 1.31 (1.19–1.45) | 0.99 (0.89–1.12) |
| XCI-S to normal allele | Female | 1.37 (1.19–1.58) | 1.00 (0.86–1.16) | 1.28 (1.16–1.42) | 1.00 (0.90–1.11) | 1.31 (1.18–1.45) | 0.99 (0.89–1.12) |
| Random XCI | Male | 1.44 (1.24–1.67) | 1.00 (0.86–1.16) | 1.30 (1.17–1.44) | 1.00 (0.90–1.11) | 1.31 (1.18–1.45) | 0.99 (0.89–1.12) |
| Random XCI | Female | 1.41 (1.22–1.63) | 1.00 (0.87–1.16) | 1.30 (1.17–1.44) | 1.00 (0.90–1.11) | 1.31 (1.18–1.46) | 1.01 (0.90–1.12) |
| XCI-E | Male | 1.30 (1.12–1.51) | 1.00 (0.86–1.16) | 1.19 (1.07–1.32) | 1.00 (0.90–1.11) | 1.25 (1.11–1.43) | 1.01 (0.89–1.12) |
| XCI-E | Female | 1.30 (1.13–1.50) | 1.00 (0.86–1.16) | 1.20 (1.08–1.33) | 1.00 (0.90–1.11) | 1.26 (1.11–1.44) | 1.00 (0.89–1.12) |

XCI: X-chromosome inactivation; XCI-S: skewed X-chromosome inactivation; XCI-E: escape from X-chromosome inactivation

* Male or female indicates the sex for which the disease risk was higher.

Given these parameters, we first randomly generated the sex for each subject on the basis of the prevalence of males in the general population (i.e., 50%). Because males are hemizygous, the genotypes were simulated conditional on sex according to the different biological models discussed in the Methods section. The disease statuses were then generated based on SNP genotypes and sex. Using this approach, we simulated a large amount of data on the population of interest and then randomly selected 1,000 cases (subjects with the disease) and 1,000 controls (subjects without the disease). We employed the permutation procedure described above to evaluate the empirical p values for our approach based on B = 100,000 permutations. The results for the PLINK regression approach were obtained using version 1.07 of the PLINK software.38 Clayton’s 1-degree-of-freedom test was performed using the R package ‘snpStats’ developed by Clayton.42 The powers and type I error rates reported for the simulation studies were based on 100,000 replicate datasets.
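A simplified sketch of this data-generating process is given below. Several points are assumptions made for illustration and are not stated in the text: the deleterious allele A is taken to be the minor allele, female genotypes are drawn under Hardy-Weinberg proportions, subjects are drawn one at a time until both quotas are filled (instead of sampling from a large pre-generated population), and the unassociated SNP2 is omitted because β2 = 0. The function name and arguments are hypothetical.

```python
import math
import random

def simulate_case_control(n_cases=1000, n_controls=1000, maf=0.4, gamma=1.0,
                          escape=False, beta0=-2.55, beta1=0.2624,
                          beta_sex=0.4055, risk_sex="M", rng=None):
    """Draw subjects from the logistic model until both quotas are filled.

    gamma:   true coding of heterozygous females under XCI / XCI-S
    escape:  if True, generate under XCI-E (males {0, 1}, females {0, 1, 2})
    Returns a list of (sex, n_deleterious, y) tuples, cases first.
    """
    rng = rng or random.Random()
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        sex = "M" if rng.random() < 0.5 else "F"
        if sex == "M":
            g = 1 if rng.random() < maf else 0          # hemizygous: one allele
            x = float(g) if escape else 2.0 * g
        else:
            g = int(rng.random() < maf) + int(rng.random() < maf)
            x = float(g) if escape else (0.0, gamma, 2.0)[g]
        eta = beta0 + beta1 * x + beta_sex * (sex == risk_sex)
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
        bucket, quota = (cases, n_cases) if y else (controls, n_controls)
        if len(bucket) < quota:                          # discard surplus subjects
            bucket.append((sex, g, y))
    return cases + controls


# Small example: XCI-S toward the deleterious allele with gamma = 1.5.
data = simulate_case_control(n_cases=50, n_controls=50, gamma=1.5,
                             rng=random.Random(2014))
print(len(data), sum(y for _, _, y in data))
```

Only the genotype counts are stored, so the same simulated dataset can be re-coded under each candidate XCI model at analysis time.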

Furthermore, to investigate the potential bias in OR estimates obtained using different approaches, we performed additional simulations. Specifically, we simulated true ORs from 1.0 to 3.0 in steps of 0.1, giving a total of 21 ORs for each of the four biological models: random XCI, XCI-S toward either the deleterious or normal allele, and XCI-E. As in the previous simulations, we used an SNP MAF of 40% and a sex covariate with males coded as 1 and females coded as 0 (βs = 0.4055). We reported median estimated ORs based on 500 replicates, each with 1,000 cases and 1,000 controls.

Results

In Table 1, we report the median estimated ORs and their 95% confidence intervals (CIs) for testing the association between X-chromosome SNPs and the disease of interest using PLINK regression, Clayton’s 1-degree-of-freedom test, and the proposed approach. For all four biological models, all three approaches provided accurate OR estimates with comparable 95% CIs when the SNP was not associated with the disease (i.e., SNP2). When the SNP was associated with the disease (i.e., SNP1), the PLINK regression highly over-estimated ORs for most of the scenarios. For example, the estimated median ORs for the XCI-S to the deleterious allele model in males and females at increased risk, respectively, were 1.47 and 1.46, compared to the true OR of 1.3. As expected, the only scenario in which PLINK regression provided accurate ORs was when the simulated biological model was XCI-E. In contrast, our approach and Clayton’s 1-degree-of-freedom test provided accurate OR estimates for most of the scenarios except for the XCI-E biological model. However, our approach was less biased for the XCI-E biological model compared to Clayton’s approach. In this scenario, compared to the true OR of 1.3, Clayton’s approach provided estimated median ORs of 1.19 and 1.20, respectively, for males and females at increased risk, whereas our approach provided estimated median ORs of 1.25 and 1.26, respectively. We also investigated the 95% coverage probabilities for the confidence intervals using the three approaches and observed similar trends (Supplementary Material Table S1).
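One way to read the direction of these biases is to look at males alone. The short derivation below is a back-of-the-envelope interpretation added here, not a calculation from the study, and it ignores how each method pools males and females. If male carriers are generated with one coding but analyzed with another, the per-unit coefficient in males is rescaled:

```latex
% Truth: random or skewed XCI (male carriers X = 2); analysis: PLINK coding (X = 1)
\log\frac{\text{odds}(\text{male carrier})}{\text{odds}(\text{male non-carrier})}
  = 2\beta_1
  \;\Rightarrow\;
  \mathrm{OR}^{\text{male}}_{\text{PLINK}} = e^{2\beta_1} = 1.3^{2} = 1.69,
\\[6pt]
% Truth: XCI-E (male carriers X = 1); analysis: Clayton coding (X = 2)
\log\frac{\text{odds}(\text{male carrier})}{\text{odds}(\text{male non-carrier})}
  = \beta_1
  \;\Rightarrow\;
  \mathrm{OR}^{\text{male}}_{\text{Clayton}} = e^{\beta_1/2} = \sqrt{1.3} \approx 1.14 .
```

The pooled estimates in Table 1 (1.37 to 1.47 for PLINK under XCI and XCI-S, and 1.19 to 1.20 for Clayton under XCI-E) lie between these male-only values and the true OR of 1.3, which is consistent with this reading.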

To further investigate the bias in OR estimates, we performed simulations for a range of ORs. Figure 1 shows the estimated ORs obtained using the different approaches compared to the true ORs used for the simulation of these datasets. Panels (A) to (D) correspond to different biological models. Each panel shows the median ORs based on 500 replicates. For all four of the biological models, our approach provided accurate OR estimates for the entire simulated range of ORs, except when the true model was XCI-E and ORs were relatively small (1.2–1.5). In these scenarios, the different XCI models have very close likelihood ratio values, which limits the ability of our approach to select the true XCI-E model and in turn leads to underestimation of the ORs (Figure 1(D)). PLINK regression provided highly over-estimated ORs except for the XCI-E model, and the magnitude of bias increased as the true ORs increased. For example, when the true OR was 3, PLINK regression gave OR estimates close to 5 for the random and skewed XCI models (Figure 1, panels (A)–(C)). Clayton’s approach provided highly under-estimated ORs for the XCI-E model, and the magnitude of bias increased as the true ORs increased. For example, when the true OR was 3, Clayton’s approach gave an OR estimate close to 2 (Figure 1, panel (D)). Clayton’s approach also provided a slightly over-estimated OR for the scenario of XCI-S toward the deleterious allele when the true ORs were higher than 2 (Figure 1, panel (A)). The proposed approach was thus found to be mostly robust for estimating ORs in different biological models.

Figure 1.

Estimated median odds ratios (ORs) versus true ORs assuming different underlying biological models, using PLINK regression, Clayton’s 1-degree-of-freedom test and our approach, based on 500 replicates each with 1,000 cases and 1,000 controls.

We conducted further simulations to investigate the robustness of our approach, which considered only four coding strategies: one coding for XCI-E and three codings for XCI and XCI-S (see Methods section). Specifically, when generating the data for females, we used X = {0, 1.5, 2} to denote genotypes (a, a), (A, a), and (A, A), respectively, a scenario where 75% of the cells have the deleterious allele active and the other 25% of the cells have the normal allele active. We also considered another scenario where females were coded as X = {0, 0.5, 2}, reflecting 25% of the cells having the deleterious allele active and the other 75% of the cells having the normal allele active. We used two SNPs as defined previously: associated SNP1 and unassociated SNP2, each with an MAF of 40%. The true underlying ORs were set as 1.3 and 1, respectively. The median ORs and 95% CIs, based on 100,000 replicates each with 1,000 cases and 1,000 controls, are reported in Supplementary Material Table S2. As can be seen from Table S2, the four coding strategies that we had used for our approach remained robust with either male or female as the factor increasing the disease risk.

We also investigated the type I error rates for the different approaches using SNP2, which was not associated with the disease. The type I error rates were estimated at nominal significance levels of 0.001 and 0.0005 (Table 2). We observed that, for all scenarios, all three approaches controlled the type I error rates at both nominal significance levels, and the type I error rates were similar for the three approaches. For example, when the underlying biological model was XCI-S toward the deleterious allele and females were at increased risk for the disease, the type I error rates were 0.0008, 0.0011, and 0.0012 at the 0.001 significance level and 0.0004, 0.0005, and 0.0006 at the 0.0005 significance level for PLINK regression, Clayton’s 1-degree-of-freedom test, and our approach, respectively. When the MAFs were different for males (30%) and females (40%), we considered two permutation strategies: permute case-control status using combined male and female data, and permute case-control status within sex-specific strata. Both permutation approaches provided controlled type I error rates (Supplementary Material Table S3).

Table 2.

Type I error rates for PLINK regression, Clayton’s 1-degree-of-freedom test, and our approach at different significance levels, based on 100,000 replicates each with 1,000 cases and 1,000 controls.

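Entries are type I error rates.

| α | Biological model | Increased risk* | PLINK | Clayton | Our approach |
|---|---|---|---|---|---|
| 0.001 | XCI-S to deleterious allele | Male | 0.0010 | 0.0007 | 0.0008 |
| 0.001 | XCI-S to deleterious allele | Female | 0.0008 | 0.0011 | 0.0012 |
| 0.001 | XCI-S to normal allele | Male | 0.0008 | 0.0009 | 0.0014 |
| 0.001 | XCI-S to normal allele | Female | 0.0011 | 0.0012 | 0.0011 |
| 0.001 | Random XCI | Male | 0.0010 | 0.0012 | 0.0011 |
| 0.001 | Random XCI | Female | 0.0009 | 0.0009 | 0.0012 |
| 0.001 | XCI-E | Male | 0.0011 | 0.0010 | 0.0013 |
| 0.001 | XCI-E | Female | 0.0010 | 0.0010 | 0.0010 |
| 0.0005 | XCI-S to deleterious allele | Male | 0.0004 | 0.0004 | 0.0003 |
| 0.0005 | XCI-S to deleterious allele | Female | 0.0004 | 0.0005 | 0.0006 |
| 0.0005 | XCI-S to normal allele | Male | 0.0003 | 0.0004 | 0.0008 |
| 0.0005 | XCI-S to normal allele | Female | 0.0006 | 0.0007 | 0.0007 |
| 0.0005 | Random XCI | Male | 0.0006 | 0.0005 | 0.0003 |
| 0.0005 | Random XCI | Female | 0.0006 | 0.0004 | 0.0006 |
| 0.0005 | XCI-E | Male | 0.0005 | 0.0006 | 0.0008 |
| 0.0005 | XCI-E | Female | 0.0005 | 0.0006 | 0.0005 |

XCI: X-chromosome inactivation; XCI-S: skewed X-chromosome inactivation; XCI-E: escape from X-chromosome inactivation

* Male or female indicates the sex for which the disease risk was higher.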

Power Comparisons

We also investigated the statistical power of each approach using SNP1, which was associated with the disease. The powers were assessed at nominal significance levels of 0.001 and 0.0005 (Table 3). When the true underlying biological model for the simulation was assumed to be XCI-S to either the deleterious or normal allele, our approach had the highest power to identify the associated SNP. For example, when the underlying model was XCI-S to the normal allele and females were at increased risk, the powers were 80.67, 89.73, and 93.72% for PLINK regression, Clayton’s approach, and our approach, respectively, at a significance level of 0.0005. The power loss for PLINK regression was highest when the true biological models were XCI-S.

Table 3.

Power comparisons for PLINK regression, Clayton’s 1-degree-of-freedom test, and our approach at different significance levels, based on 100,000 replicates each with 1,000 cases and 1,000 controls.

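Entries are statistical powers.

| α | Biological model | Increased risk* | PLINK | Clayton | Our approach |
|---|---|---|---|---|---|
| 0.001 | XCI-S to deleterious allele | Male | 96.02% | 97.52% | 98.59% |
| 0.001 | XCI-S to deleterious allele | Female | 95.37% | 96.63% | 98.23% |
| 0.001 | XCI-S to normal allele | Male | 88.73% | 94.42% | 96.98% |
| 0.001 | XCI-S to normal allele | Female | 85.43% | 92.79% | 95.56% |
| 0.001 | Random XCI | Male | 92.92% | 96.12% | 94.95% |
| 0.001 | Random XCI | Female | 91.32% | 94.91% | 94.08% |
| 0.001 | XCI-E | Male | 58.03% | 50.21% | 54.98% |
| 0.001 | XCI-E | Female | 60.90% | 53.23% | 55.41% |
| 0.0005 | XCI-S to deleterious allele | Male | 94.06% | 95.99% | 97.40% |
| 0.0005 | XCI-S to deleterious allele | Female | 93.21% | 94.99% | 97.09% |
| 0.0005 | XCI-S to normal allele | Male | 84.65% | 91.90% | 95.50% |
| 0.0005 | XCI-S to normal allele | Female | 80.67% | 89.73% | 93.72% |
| 0.0005 | Random XCI | Male | 89.87% | 94.24% | 91.36% |
| 0.0005 | Random XCI | Female | 87.94% | 92.62% | 92.15% |
| 0.0005 | XCI-E | Male | 50.51% | 42.43% | 47.94% |
| 0.0005 | XCI-E | Female | 53.38% | 45.76% | 47.65% |

XCI: X-chromosome inactivation; XCI-S: skewed X-chromosome inactivation; XCI-E: escape from X-chromosome inactivation

* Male or female indicates the sex for which the disease risk was higher.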

As expected, when the underlying true biological model for simulation was assumed to be random XCI, Clayton’s 1-degree-of-freedom test always had the highest power, whereas PLINK regression had the lowest power to identify the associated SNP. In this situation, our approach had higher power than PLINK regression but lower power than Clayton’s approach. For example, when females were at increased risk, the powers were 87.94, 92.62, and 92.15% for PLINK regression, Clayton’s approach, and our approach, respectively, at a significance level of 0.0005.

As expected, when the underlying true biological model was assumed to be XCI-E, the PLINK regression approach always had the highest power to detect the associated SNP, whereas Clayton’s 1-degree-of-freedom test always had the lowest power. In this scenario, our approach had higher power than Clayton’s approach but lower power than PLINK regression. For example, when females were at increased risk, the powers were 53.38, 45.76, and 47.65% for PLINK regression, Clayton’s approach, and our approach, respectively, at a significance level of 0.0005.

We also investigated the statistical power of each approach when the MAF for females was higher than the MAF for males (40% vs. 30%). Once again, the powers were assessed at nominal significance levels of 0.001 and 0.0005 (Supplementary Material Table S4). The results from this scenario showed patterns similar to those in Table 3. Furthermore, we again considered two permutation strategies for our approach: permuting case-control status using combined male and female data, and permuting case-control status within sex-specific strata. Both permutation approaches provided similar powers (Table S4). The scenario where the MAF for females was lower than the MAF for males (30% vs. 40%) provided similar results (data not shown).

Head and Neck Cancer X-chromosome Association Test

Next, we applied our approach to a case-control association study of head and neck cancer and X-chromosome genetic variants using data from a head and neck genome-wide association study. The phase 1 analysis included 2,718 individuals, with 1,161 head and neck cancer patients and 1,557 controls frequency-matched to the cases by age (±5 years), sex, residency (by county), and ethnicity. There were 902 males and 259 females in the cases and 986 males and 571 females in the controls. The phase 2 analysis included 3,996 individuals, with 1,031 patients and 2,965 controls. There were 786 males and 245 females in the cases and 1,507 males and 1,458 females in the controls. The head and neck cancer cases were accrued at The University of Texas MD Anderson Cancer Center (UT MD Anderson) and were patients with newly diagnosed, histologically confirmed, previously untreated head and neck cancer, including cancers of the oral cavity, pharynx, and larynx. In both phases, genotyping of cases was conducted using the Illumina HumanOmniExpress-12v1 BeadChip. For the phase 1 analysis, after removing the individuals with discordant sex information, genotypes were available for 1,155 cases. For controls, we used Illumina HumanOmniExpress-12v1 BeadChip genotypes on 531 individuals recruited by UT MD Anderson for the study of head and neck cancers and Illumina Omni1-Quad_v1-0_B BeadChip genotypes on 1,026 individuals previously recruited at UT MD Anderson for the study of cutaneous melanoma.43 After removing the individuals with discordant sex information, genotypes were available for 1,547 individuals. The phase 2 analysis was based on genotyping 1,031 cases ascertained by UT MD Anderson. For phase 2 controls, we used Illumina HumanOmniExpress-12v1 BeadChip genotypes on 643 individuals recruited by UT MD Anderson and Illumina Human1Mv1 BeadChip genotypes on 2,322 European-descendent-only individuals from the Study of Addiction: Genetic and Environment provided by the National Center for Biotechnology Information and downloaded from dbGaP.44 From the second phase data, no individual was removed due to discordant sex information. This case-control study was approved by the institutional review board at UT MD Anderson, and all participants provided written informed consent. In the phase 1 analysis, 14,169 tagging SNPs were genotyped on the X-chromosome; in the phase 2 analysis, 14,371 tagging SNPs were genotyped on the X-chromosome. We excluded SNPs that were missing in more than 10% of the study population. To assess the empirical p values for our approach, we used 1,000,000 permutations in both phases. The fixed and random effect model analyses in the meta-analysis were conducted using version 1.07 of the PLINK software.38

In the phase 1 study, we selected the top 50 SNPs based on the most significant p values obtained using the PLINK regression approach and another top 50 SNPs based on the most significant p values obtained using Clayton’s 1-degree-of-freedom test. In the phase 2 data, a total of 33 SNPs were available from the list of SNPs that were significant using PLINK regression and Clayton’s 1-degree-of-freedom test in phase 1. We then performed a meta-analysis of the 33 SNPs based on the results from the phase 1 and phase 2 data using Fisher’s method and the fixed and random effects models. The resulting combined p values for the three approaches, as well as the corresponding p values for Cochran’s Q statistic and the heterogeneity index I, are reported in Table 4 (ranked by the Fisher’s method p values based on our approach). We also plotted the −log10(meta-analysis p values) for the 33 SNPs against their base-pair positions on the X-chromosome (Figure 2). Given that there are 14,169 SNPs in phase 1 and 14,371 SNPs in phase 2, the chromosome-wide significance level should be approximately 3.5×10^−6. Using the proposed approach, SNP rs12388803 had meta-analysis p values of 2.04×10^−6, 2.83×10^−6, and 2.83×10^−6 using Fisher’s method, the fixed effect model, and the random effect model, respectively, all of which reached the chromosome-wide significance threshold. Using Clayton’s approach, the corresponding meta-analysis p values were 3.74×10^−5, 8.58×10^−6, and 8.58×10^−6, and using PLINK regression, the corresponding meta-analysis p values were 3.22×10^−3, 9.16×10^−4, and 9.16×10^−4. The p values using Clayton’s method approached chromosome-wide significance, whereas the PLINK regression method gave p values that were much less significant.
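The chromosome-wide threshold and a Fisher’s method combined p value of the kind reported here can be reproduced with simple arithmetic. The sketch below assumes that the threshold is a Bonferroni-style correction (0.05 divided by the number of SNPs), which agrees with the stated value of approximately 3.5×10^−6 but is not spelled out in the text. Fisher’s method is implemented with the closed-form survival function of a chi-square distribution with an even number of degrees of freedom. The function names and the two phase-specific p values are hypothetical, since per-phase p values are not reported in this section.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction (assumed here)."""
    return alpha / n_tests


def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, whose survival function is
    exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    tail_sum = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail_sum


# SNP counts taken from the text; alpha = 0.05 is an assumption.
threshold_phase1 = bonferroni_threshold(0.05, 14169)
threshold_phase2 = bonferroni_threshold(0.05, 14371)

# Hypothetical phase 1 and phase 2 p values for a single SNP.
combined = fisher_combined_p([1e-3, 1e-4])

print(threshold_phase1, threshold_phase2, combined)
```

For two studies the combined p value reduces to p1·p2·(1 − ln(p1·p2)), which is a convenient hand check on any row of Table 4 when the per-phase p values are known.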

Table 4.

Results of meta-analysis of SNPs combining results from phases 1 and 2 based on PLINK regression, Clayton’s 1-degree-of-freedom test, and our approach using Fisher’s method, fixed effect model, or random effect model of performing meta-analysis.

rs number | bp | PLINK: Fisher, Fixed, Random, Q, I | Clayton: Fisher, Fixed, Random, Q, I | Our approach: Fisher, Fixed, Random, Q, I
rs12388803 94862551 3.22E-03 9.16E-04 9.16E-04 0.904 0.00 3.74E-05 8.58E-06 8.58E-06 0.889 0.00 2.04E-06 2.83E-06 2.83E-06 0.624 0.00
rs1554987 48621514 4.19E-05 1.52E-04 1.27E-01 0.012 84.07 8.50E-05 2.25E-04 1.19E-01 0.018 82.06 8.38E-06* 1.31E-05 6.41E-03 0.192 41.15
rs2075837 48676839 6.81E-05 1.13E-04 6.35E-02 0.035 77.50 1.53E-04 2.05E-04 5.75E-02 0.049 74.32 1.71E-04 1.14E-04 6.34E-02 0.035 77.45
rs4824286 145929424 1.54E-04 1.17E-03 2.22E-01 0.005 87.30 8.35E-04 2.23E-03 1.72E-01 0.020 81.45 1.83E-04 3.17E-03 2.13E-01 0.001 90.37
rs5906714 48684646 7.28E-05 1.34E-04 7.25E-02 0.031 78.47 1.58E-04 2.23E-04 6.30E-02 0.045 75.10 3.22E-04 1.33E-04 7.20E-02 0.031 78.41
rs5905706 48619002 1.40E-04 6.67E-04 1.86E-01 0.009 85.46 2.82E-04 8.63E-04 1.64E-01 0.016 82.90 3.32E-04 3.43E-03 2.22E-01 0.002 90.13
rs760393 48612615 1.19E-04 3.64E-04 1.33E-01 0.016 82.85 2.55E-04 5.40E-04 1.17E-01 0.026 79.86 3.39E-04 2.46E-03 2.02E-01 0.002 89.43
rs4824284 145929326 1.20E-04 9.15E-04 2.16E-01 0.005 87.31 7.27E-04 3.00E-03 2.23E-01 0.011 84.55 4.09E-04 9.00E-04 2.15E-01 0.005 87.32
rs5906709 48646906 1.51E-04 5.24E-04 1.53E-01 0.014 83.60 3.41E-04 8.16E-04 1.39E-01 0.022 80.92 5.43E-04 3.57E-03 2.18E-01 0.002 89.72
rs2579849 11949003 1.01E-03 4.01E-02 4.75E-01 0.001 90.14 1.89E-03 8.12E-02 5.37E-01 0.002 89.78 9.06E-04 3.61E-02 5.18E-01 0.000 91.92
rs10521783 136444480 6.00E-04 2.38E-01 6.75E-01 0.000 92.48 2.76E-04 2.79E-01 7.33E-01 0.000 93.30 9.24E-04 2.77E-01 7.33E-01 0.000 93.28
rs5904897 145939528 4.11E-04 2.03E-03 2.16E-01 0.009 85.51 1.93E-03 3.76E-03 1.62E-01 0.032 78.23 1.03E-03 4.53E-03 2.06E-01 0.003 88.91
rs5975918 136446557 8.47E-04 2.24E-01 6.61E-01 0.000 92.03 3.52E-04 2.66E-01 7.23E-01 0.000 93.06 1.23E-03 2.65E-01 7.23E-01 0.000 93.03
rs2239477 123545077 9.70E-04 5.19E-03 2.39E-01 0.009 85.50 9.55E-04 5.84E-03 2.66E-01 0.007 86.32 1.23E-03 2.11E-03 2.10E-01 0.008 85.93
rs3373 48567295 6.97E-04 9.03E-02 6.06E-01 0.001 91.50 1.69E-03 1.07E-01 6.12E-01 0.001 90.24 1.33E-03 6.72E-01 5.41E-01 0.000 93.41
rs16995035 148268501 1.13E-03 1.39E-01 5.77E-01 0.001 91.29 5.85E-03 1.37E-01 5.61E-01 0.004 87.88 2.15E-03 1.22E-02 7.12E-01 0.006 86.76
rs6417935 55960724 5.98E-03 6.71E-02 4.41E-01 0.006 86.68 3.94E-03 1.55E-02 2.95E-01 0.012 84.05 2.96E-03 1.49E-03 9.70E-02 0.070 69.55
rs5935185 11930465 2.27E-03 5.12E-03 1.86E-01 0.027 79.67 2.15E-03 6.59E-03 2.34E-01 0.017 82.50 3.24E-03 5.68E-04 7.76E-02 0.111 60.68
rs2105910 57778934 5.19E-03 3.50E-03 3.66E-02 0.159 49.52 1.40E-03 1.13E-03 4.27E-02 0.108 61.34 3.26E-03 1.04E-03 4.60E-02 0.087 65.82
rs7888207 11916455 7.06E-03 1.56E-02 2.52E-01 0.027 79.62 7.10E-03 1.62E-02 2.57E-01 0.026 79.76 3.47E-03 4.64E-03 2.18E-01 0.015 83.18
rs16993411 136486358 3.84E-03 1.38E-01 5.68E-01 0.003 89.00 1.63E-03 1.55E-01 6.34E-01 0.001 90.74 4.14E-03 1.51E-01 6.65E-01 0.001 90.89
rs1231461 12042112 2.02E-03 2.03E-02 3.75E-01 0.005 87.55 4.80E-03 5.18E-02 4.47E-01 0.006 86.66 4.21E-03 2.09E-01 4.24E-01 0.001 91.74
rs5935181 11923987 3.78E-03 1.11E-02 2.72E-01 0.017 82.52 4.09E-03 2.04E-02 3.56E-01 0.010 84.96 5.07E-03 2.32E-02 4.17E-01 0.005 87.64
rs708467 11846155 4.66E-03 1.52E-01 5.85E-01 0.003 88.64 2.96E-03 3.14E-01 7.15E-01 0.001 90.40 5.76E-03 7.62E-01 5.98E-01 0.001 91.69
rs7876455 107260683 6.96E-03 1.07E-02 1.68E-01 0.050 73.88 6.95E-03 7.41E-03 1.09E-01 0.084 66.42 6.63E-03 1.31E-02 1.69E-01 0.029 78.96
rs5978730 15495233 6.19E-03 1.01E-02 1.87E-01 0.044 75.39 3.89E-03 7.62E-03 2.00E-01 0.032 78.27 6.83E-03 4.73E-03 1.75E-01 0.031 78.39
rs1415124 145949281 5.79E-03 3.75E-02 3.89E-01 0.009 85.38 1.14E-02 2.15E-02 2.57E-01 0.035 77.57 8.37E-03 1.49E-02 1.99E-01 0.015 83.21
rs844971 140439277 2.38E-02 4.31E-02 3.08E-01 0.038 76.80 6.52E-03 1.50E-02 2.66E-01 0.025 80.20 9.00E-03 1.45E-02 2.64E-01 0.015 83.27
rs12394374 135076147 5.20E-03 1.69E-01 5.68E-01 0.003 88.59 2.18E-02 4.72E-01 6.50E-01 0.008 85.76 1.05E-02 6.05E-01 5.45E-01 0.001 90.82
rs5935986 15498034 1.01E-02 1.29E-02 1.68E-01 0.064 70.92 6.40E-03 9.77E-03 1.82E-01 0.047 74.72 1.12E-02 1.83E-03 3.86E-02 0.200 39.02
rs10856171 145878179 5.05E-03 7.36E-03 1.58E-01 0.050 74.06 1.56E-02 1.68E-02 1.54E-01 0.085 66.31 1.26E-02 3.62E-02 2.48E-01 0.010 85.10
rs5929779 136040115 8.32E-03 1.70E-02 2.32E-01 0.033 78.07 5.93E-03 1.22E-02 2.15E-01 0.032 78.27 1.30E-02 9.17E-03 1.90E-01 0.042 75.73
rs3761543 48554637 8.79E-03 3.91E-02 3.86E-01 0.014 83.46 4.02E-02 7.13E-02 3.59E-01 0.047 74.66 1.60E-02 2.98E-01 4.32E-01 0.002 89.38
* The phase 1 p value was less than 1.00E-06. The meta-analysis p value was calculated using a p value of 1.00E-06 for phase 1.

bp: base-pair position; Q: p value for Cochran’s Q statistic; I: heterogeneity index

Figure 2.

Values of −log10(meta-analysis p values) of 33 X-chromosome SNPs for head and neck cancer genome-wide association data based on PLINK regression, Clayton’s 1-degree-of-freedom test, and our approach, plotted against their base-pair positions.

For SNP rs12388803, we also investigated potential heterogeneity between the phase 1 and phase 2 data using Cochran’s Q statistic and the heterogeneity index, I. The p values of Cochran’s Q statistic were 0.904, 0.889, and 0.624 for PLINK regression, Clayton’s approach, and our approach, respectively, and the heterogeneity index values were 0 for all three approaches, implying that there is no evidence of heterogeneity for this SNP between the phase 1 and phase 2 studies.
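To make the two heterogeneity measures concrete, the following sketch computes Cochran’s Q and the heterogeneity index from per-study effect estimates and standard errors. It assumes the usual inverse-variance definitions, with I = max(0, (Q − df)/Q) × 100 and df equal to the number of studies minus one; whether the software used in the analysis computes them in exactly this way is an assumption. The function names and the input effect sizes are hypothetical.

```python
import math


def cochran_q(effects, std_errors):
    """Return Cochran's Q and the inverse-variance pooled effect."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    return q, pooled


def heterogeneity_index(q, df):
    """Heterogeneity index in percent, truncated at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def chi2_sf_1df(q):
    """Survival function of a chi-square with 1 df (two studies)."""
    return math.erfc(math.sqrt(q / 2.0))


# Hypothetical log odds ratios and standard errors for two phases.
q_stat, pooled_effect = cochran_q([0.30, 0.28], [0.10, 0.12])
index = heterogeneity_index(q_stat, df=1)
q_p_value = chi2_sf_1df(q_stat)
print(q_stat, index, q_p_value)

# Consistency check against Table 4 under the assumed definitions:
# with two studies, I = 84.07 implies Q = 1 / (1 - 0.8407).
q_from_index = 1.0 / (1.0 - 0.8407)
print(chi2_sf_1df(q_from_index))
```

Under these definitions, any Q smaller than its degrees of freedom gives an index of 0, which is why a SNP with Q-statistic p values near 0.9 is reported with I = 0. The final check gives a p value of about 0.012, in line with the Q and I values reported together for rs1554987 under PLINK regression in Table 4.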

Discussion

The biological process for X-chromosome inactivation is complex. In addition to the random XCI process, non-random, or skewed, XCI has been shown to be a biological plausibility associated with complex disorders. Furthermore, some of the X-linked genes altogether escape XCI. Currently, to our knowledge, there is no method of association testing that accounts for all of the different plausible biological models. To overcome this limitation, we proposed a unified approach for maximizing the likelihood ratio that accounts for the unknown underlying X-chromosome inactivation process, including random XCI, skewed XCI towards either the deleterious or normal allele, and escape from XCI. We also developed a permutation procedure to obtain p values for the proposed approach. We conducted simulation studies to investigate the performance of the proposed approach and compared it to PLINK regression and Clayton’s 1-degree-of-freedom test. We examined multiple scenarios with different plausible biological models (random XCI, XCI-S towards either allele, and XCI-E) and different sexes at increased risk for the disease.

Power comparisons showed that Clayton’s 1-degree-of-freedom test was the most powerful approach when the true underlying biological model was random XCI, but it lost some power when the true underlying biological model was XCI-E or XCI-S. On the other hand, PLINK regression was the most powerful approach when the true underlying biological model was XCI-E but lost power when the true underlying biological model was random XCI or XCI-S. Finally, the proposed approach was the most powerful when the true underlying biological model was XCI-S (towards either the deleterious or normal allele), and it lost a small amount of power when the true underlying biological model was random XCI or XCI-E.

We also investigated the potential bias in the OR estimations for the three approaches. PLINK regression provided upward biased ORs for random XCI and XCI-S models, and the magnitude of over-estimation increased when the true ORs were higher; Clayton’s approach provided under-estimated ORs for the XCI-E model and slightly over-estimated ORs for XCI-S to the deleterious allele model, and the magnitude of bias increased as the true OR values increased. Our approach provided accurate estimations for ORs for all four biological models, except when the true model was XCI-E and ORs were relatively small (1.2–1.5). We also conducted simulation studies using other parameters, including different ORs for the association between sex and disease of interest, different ORs for the disease-associated SNP1, and different MAFs such as 10%, and obtained similar results and conclusions (data not shown).

In addition to reporting our new approach developed for testing the association between X-chromosome SNPs and the disease of interest, we also have compared, for the first time to our knowledge, PLINK regression and Clayton’s approach under scenarios of XCI-S towards either the deleterious or normal allele. We found that in our simulation studies, PLINK regression had more loss of power than Clayton’s approach in general.

We also applied our approach to the case-control association study of head and neck cancer and X-chromosome genetic variants. Based on the meta-analysis combining results from both phases, we found that, using our approach, SNP rs12388803 reached the chromosome-wide significance threshold. Clayton’s test provided p values approaching chromosome-wide significance, and PLINK regression gave p values that were much less significant. The optimal biological model identified for this SNP is XCI-S towards the deleterious allele. This SNP does not belong to any gene region and is not functional. Additional studies are needed to externally validate our findings.

We considered two permutation strategies: permuting case-control status using the combined male and female data, and permuting case-control status within sex-specific strata. Both permutation strategies provided similar results in the simulation studies and in the head and neck X-chromosomal genetic data analysis. However, these findings could be due to the fact that the differences in MAFs between males and females were not very large (≤10%). In other settings this difference could be much larger. Therefore, we recommend performing the permutations within males and females separately. A computer program that analyzes X-chromosomal SNP association using the proposed approach is available at https://sites.google.com/site/jianwangswebsite/xchrom. The computation time of the program depends heavily on the number of permutations conducted and the number of clusters used. For example, to obtain the results reported in Table 4, the program took about 9 hours to conduct 1,000,000 permutations (at the approximate X-chromosome-wide significance level), using multiple high-performance clusters with 3.07 GHz CPUs and 96 GB of memory available at UT MD Anderson, which shows that it is feasible to use our approach for an X-chromosome-wide genetic association study.
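The recommended within-sex permutation can be sketched as follows. This is a minimal illustration, not the authors’ program: the test statistic is a hypothetical stand-in (absolute difference in mean coded genotype between cases and controls) for the maximized likelihood ratio, the add-one form of the empirical p value is an assumption, and all function names and toy data are invented for the example.

```python
import random
from collections import defaultdict


def permute_within_strata(status, strata, rng):
    """Shuffle case-control labels separately inside each stratum (e.g., sex)."""
    indices_by_stratum = defaultdict(list)
    for i, stratum in enumerate(strata):
        indices_by_stratum[stratum].append(i)
    permuted = list(status)
    for indices in indices_by_stratum.values():
        labels = [status[i] for i in indices]
        rng.shuffle(labels)
        for i, label in zip(indices, labels):
            permuted[i] = label
    return permuted


def mean_difference(genotype, status):
    """Hypothetical stand-in statistic, not the paper's likelihood ratio."""
    cases = [g for g, s in zip(genotype, status) if s == 1]
    controls = [g for g, s in zip(genotype, status) if s == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def empirical_p_value(statistic, genotype, status, strata, n_perm, seed=0):
    """Empirical p value from stratified permutations (add-one estimator assumed)."""
    rng = random.Random(seed)
    observed = statistic(genotype, status)
    exceed = 0
    for _ in range(n_perm):
        permuted = permute_within_strata(status, strata, rng)
        if statistic(genotype, permuted) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Toy data (hypothetical): four males followed by four females.
sex = ["M"] * 4 + ["F"] * 4
status = [1, 1, 0, 0, 1, 0, 0, 0]
genotype = [2, 2, 0, 0, 1, 0, 1, 0]
p_value = empirical_p_value(mean_difference, genotype, status, sex, n_perm=200)
print(p_value)
```

Because labels are shuffled only within each sex, the number of male cases and female cases is the same in every permuted data set, so a difference in MAF or in case-control ratio between the sexes cannot by itself generate a spurious signal in the null distribution.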

There are several advantages to the approach proposed in this article. First, the approach was developed based on biologically plausible models. Not only does this approach account for random XCI and escape from XCI, as do Clayton’s approach and PLINK regression, respectively, it also accounts for the skewed XCI pattern, which, to our knowledge, has not been considered in previous X-chromosomal genetic variant association tests. As discussed in the Introduction section, the skewed XCI pattern is a special phenomenon that is more common in affected females in certain complex diseases, whereas random XCI is more common in unaffected females [16,23,30–32]. Therefore, accounting for skewed XCI will increase the power of detecting X-chromosome disease-associated genetic variants. If the genetic association test is conducted within the pseudo-autosomal regions or within genes that have been identified to escape XCI, one may choose to employ PLINK regression for the study. However, for most X-chromosomal regions, the true underlying XCI process is not known with certainty and could differ from region to region; our approach is therefore more robust than Clayton’s approach or PLINK regression.

In genetic association studies, there might be differences in the genetic architecture between females and males. For example, there might be different MAFs, effect sizes, or prevalence values for males and females, different numbers of males and females in the study sample, and different sex ratios in cases and controls [7,8]. Therefore, we recommend always including sex as a covariate when conducting an X-chromosomal genetic association study using our proposed approach. Also, studies have shown that the prevalence of the skewed XCI pattern increases in females with increasing age [3,5,14,15,18,24]; this might be incorporated in the analysis as an interaction between genetic variant and age.

In conclusion, the new approach we propose in this study was developed based on biological plausibility and accounts for all possibilities of the X-chromosome inactivation process. The proposed approach controls the type I error rate and, compared with current approaches, has higher power in scenarios where XCI is skewed, with some loss of power in scenarios where XCI is random or escaped. Finally, the approach is more robust to different XCI processes, including random XCI, XCI-S towards the deleterious or normal allele, and XCI-E, than the existing popular approaches of PLINK regression and Clayton’s 1-degree-of-freedom test for testing the association between X-chromosome SNPs and the disease of interest.

Supplementary Material

Acknowledgments

This work was supported by National Institutes of Health grant 1R01CA131324 (SS), R01DE022891 (SS), and R25DA026120 (SS) and by a faculty fellowship from The University of Texas MD Anderson Cancer Center Duncan Family Institute for Cancer Prevention and Risk Assessment (JW). Funding support for the Study of Addiction: Genetics and Environment (SAGE) was provided through the NIH Genes, Environment and Health Initiative [GEI] (U01 HG004422). SAGE is one of the genome-wide association studies funded as part of the Gene Environment Association Studies (GENEVA) under GEI. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the GENEVA Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Support for collection of datasets and samples was provided by the Collaborative Study on the Genetics of Alcoholism (COGA; U10 AA008401), the Collaborative Genetic Study of Nicotine Dependence (COGEND; P01 CA089392), and the Family Study of Cocaine Dependence (FSCD; R01 DA013423). Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the NIH GEI (U01HG004438), the National Institute on Alcohol Abuse and Alcoholism, the National Institute on Drug Abuse, and the NIH contract “High throughput genotyping for studying the genetic contributions to human disease” (HHSN268200782096C). The datasets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000092.v1.p1 through dbGaP accession number phs000092.v1.p.

References

  • 1. Lyon MF. Gene action in the X-chromosome of the mouse (Mus musculus L.) Nature. 1961;190:372–373. doi: 10.1038/190372a0.
  • 2. Gendrel AV, Heard E. Fifty years of X-inactivation research. Development. 2011;138:5049–5055. doi: 10.1242/dev.068320.
  • 3. Minks J, Robinson WP, Brown CJ. A skewed view of X chromosome inactivation. J Clin Invest. 2008;118:20–23. doi: 10.1172/JCI34470.
  • 4. Starmer J, Magnuson T. A new model for random X chromosome inactivation. Development. 2009;136:1–10. doi: 10.1242/dev.025908.
  • 5. Wong CC, Caspi A, Williams B, Houts R, Craig IW, Mill J. A longitudinal twin study of skewed X chromosome-inactivation. PLoS One. 2011;6:e17873. doi: 10.1371/journal.pone.0017873.
  • 6. Chow JC, Yen Z, Ziesche SM, Brown CJ. Silencing of the mammalian X chromosome. Annu Rev Genomics Hum Genet. 2005;6:69–92. doi: 10.1146/annurev.genom.6.080604.162350.
  • 7. Hickey PF, Bahlo M. X chromosome association testing in genome wide association studies. Genet Epidemiol. 2011;35:664–670. doi: 10.1002/gepi.20616.
  • 8. Loley C, Ziegler A, Konig IR. Association tests for X-chromosomal markers--a comparison of different test statistics. Hum Hered. 2011;71:23–36. doi: 10.1159/000323768.
  • 9. Willard HF. The sex chromosomes and X chromosome inactivation. In: Scriver CR, Beaudet AL, Sly WS, Valle D, Childs B, Vogelstein B, editors. The Metabolic and Molecular Bases of Inherited Disease. New York: McGraw-Hill; 2000. pp. 1191–1221.
  • 10. Clayton D. Testing for association on the X chromosome. Biostatistics. 2008;9:593–600. doi: 10.1093/biostatistics/kxn007.
  • 11. Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009;5:e1000529. doi: 10.1371/journal.pgen.1000529.
  • 12. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39:906–913. doi: 10.1038/ng2088.
  • 13. Li Y, Abecasis GR. Mach 1.0: Rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet. 2006;S79:2290.
  • 14. Busque L, Paquette Y, Provost S, Roy DC, Levine RL, Mollica L, Gilliland DG. Skewing of X-inactivation ratios in blood cells of aging women is confirmed by independent methodologies. Blood. 2009;113:3472–3474. doi: 10.1182/blood-2008-12-195677.
  • 15. Chagnon P, Provost S, Belisle C, Bolduc V, Gingras M, Busque L. Age-associated skewing of X-inactivation ratios of blood cells in normal females: a candidate-gene analysis approach. Exp Hematol. 2005;33:1209–1214. doi: 10.1016/j.exphem.2005.06.023.
  • 16. Plenge RM, Stevenson RA, Lubs HA, Schwartz CE, Willard HF. Skewed X-chromosome inactivation is a common feature of X-linked mental retardation disorders. Am J Hum Genet. 2002;71:168–173. doi: 10.1086/341123.
  • 17. Struewing JP, Pineda MA, Sherman ME, Lissowska J, Brinton LA, Peplonska B, Bardin-Mikolajczak A, Garcia-Closas M. Skewed X chromosome inactivation and early-onset breast cancer. J Med Genet. 2006;43:48–53. doi: 10.1136/jmg.2005.033134.
  • 18. Amos-Landgraf JM, Cottle A, Plenge RM, Friez M, Schwartz CE, Longshore J, Willard HF. X chromosome-inactivation patterns of 1,005 phenotypically unaffected females. Am J Hum Genet. 2006;79:493–499. doi: 10.1086/507565.
  • 19. Belmont JW. Genetic control of X inactivation and processes leading to X-inactivation skewing. Am J Hum Genet. 1996;58:1101–1108.
  • 20. Abkowitz JL, Taboada M, Shelton GH, Catlin SN, Guttorp P, Kiklevich JV. An X chromosome gene regulates hematopoietic stem cell kinetics. Proc Natl Acad Sci U S A. 1998;95:3862–3866. doi: 10.1073/pnas.95.7.3862.
  • 21. Naumova AK, Olien L, Bird LM, Smith M, Verner AE, Leppert M, Morgan K, Sapienza C. Genetic mapping of X-linked loci involved in skewing of X chromosome inactivation in the human. Eur J Hum Genet. 1998;6:552–562. doi: 10.1038/sj.ejhg.5200255.
  • 22. Renault NK, Pritchett SM, Howell RE, Greer WL, Sapienza C, Orstavik KH, Hamilton DC. Human X-chromosome inactivation pattern distributions fit a model of genetically influenced choice better than models of completely random choice. Eur J Hum Genet. 2013. doi: 10.1038/ejhg.2013.84.
  • 23. Chabchoub G, Uz E, Maalej A, Mustafa CA, Rebai A, Mnif M, Bahloul Z, Farid NR, Ozcelik T, Ayadi H. Analysis of skewed X-chromosome inactivation in females with rheumatoid arthritis and autoimmune thyroid diseases. Arthritis Res Ther. 2009;11:R106. doi: 10.1186/ar2759.
  • 24. Sharp A, Robinson D, Jacobs P. Age- and tissue-specific variation of X chromosome inactivation ratios in normal women. Hum Genet. 2000;107:343–349. doi: 10.1007/s004390000382.
  • 25. Busque L, Mio R, Mattioli J, Brais E, Blais N, Lalonde Y, Maragh M, Gilliland DG. Nonrandom X-inactivation patterns in normal females: lyonization ratios vary with age. Blood. 1996;88:59–65.
  • 26. Champion KM, Gilbert JG, Asimakopoulos FA, Hinshelwood S, Green AR. Clonal haemopoiesis in normal elderly women: implications for the myeloproliferative disorders and myelodysplastic syndromes. Br J Haematol. 1997;97:920–926. doi: 10.1046/j.1365-2141.1997.1933010.x.
  • 27. Gale RE, Fielding AK, Harrison CN, Linch DC. Acquired skewing of X-chromosome inactivation patterns in myeloid cells of the elderly suggests stochastic clonal loss with age. Br J Haematol. 1997;98:512–519. doi: 10.1046/j.1365-2141.1997.2573078.x.
  • 28. Hatakeyama C, Anderson CL, Beever CL, Penaherrera MS, Brown CJ, Robinson WP. The dynamics of X-inactivation skewing as women age. Clin Genet. 2004;66:327–332. doi: 10.1111/j.1399-0004.2004.00310.x.
  • 29. Tonon L, Bergamaschi G, Dellavecchia C, Rosti V, Lucotti C, Malabarba L, Novella A, Vercesi E, Frassoni F, Cazzola M. Unbalanced X-chromosome inactivation in haemopoietic cells from normal women. Br J Haematol. 1998;102:996–1003. doi: 10.1046/j.1365-2141.1998.00867.x.
  • 30. Talebizadeh Z, Bittel DC, Veatch OJ, Kibiryeva N, Butler MG. Brief report: non-random X chromosome inactivation in females with autism. J Autism Dev Disord. 2005;35:675–681. doi: 10.1007/s10803-005-0011-z.
  • 31. Buller RE, Sood AK, Lallas T, Buekers T, Skilling JS. Association between nonrandom X-chromosome inactivation and BRCA1 mutation in germline DNA of patients with ovarian cancer. J Natl Cancer Inst. 1999;91:339–346. doi: 10.1093/jnci/91.4.339.
  • 32. Kristiansen M, Langerod A, Knudsen GP, Weber BL, Borresen-Dale AL, Orstavik KH. High frequency of skewed X inactivation in young breast cancer patients. J Med Genet. 2002;39:30–33. doi: 10.1136/jmg.39.1.30.
  • 33. Carrel L, Willard HF. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature. 2005;434:400–404. doi: 10.1038/nature03479.
  • 34. Brown CJ, Carrel L, Willard HF. Expression of genes from the human active and inactive X chromosomes. Am J Hum Genet. 1997;60:1333–1343. doi: 10.1086/515488.
  • 35. Carrel L, Park C, Tyekucheva S, Dunn J, Chiaromonte F, Makova KD. Genomic environment predicts expression patterns on the human inactive X chromosome. PLoS Genet. 2006;2:e151. doi: 10.1371/journal.pgen.0020151.
  • 36. Miller AP, Willard HF. Chromosomal basis of X chromosome inactivation: identification of a multigene domain in Xp11.21-p11.22 that escapes X inactivation. Proc Natl Acad Sci U S A. 1998;95:8709–8714. doi: 10.1073/pnas.95.15.8709.
  • 37. Zheng G, Joo J, Zhang C, Geller NL. Testing association for markers on the X chromosome. Genet Epidemiol. 2007;31:834–843. doi: 10.1002/gepi.20244.
  • 38. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–575. doi: 10.1086/519795.
  • 39. Chung RH, Ma D, Wang K, Hedges DJ, Jaworski JM, Gilbert JR, Cuccaro ML, Wright HH, Abramson RK, Konidari I, et al. An X chromosome-wide association study in autism families identifies TBL1X as a novel autism spectrum disorder candidate gene in males. Mol Autism. 2011;2:18. doi: 10.1186/2040-2392-2-18.
  • 40. Carrasquillo MM, Zou F, Pankratz VS, Wilcox SL, Ma L, Walker LP, Younkin SG, Younkin CS, Younkin LH, Bisceglio GD, et al. Genetic variation in PCDH11X is associated with susceptibility to late-onset Alzheimer’s disease. Nat Genet. 2009;41:192–198. doi: 10.1038/ng.305.
  • 41. Wise AL, Gyi L, Manolio TA. eXclusion: Toward Integrating the X Chromosome in Genome-wide Association Analyses. Am J Hum Genet. 2013;92:643–647. doi: 10.1016/j.ajhg.2013.03.017.
  • 42. Clayton D. snpStats: SnpMatrix and XSnpMatrix classes and methods. R package version 1.2.1. 2011.
  • 43. Amos CI, Wang LE, Lee JE, Gershenwald JE, Chen WV, Fang S, Kosoy R, Zhang M, Qureshi AA, Vattathil S, et al. Genome-wide association study identifies novel loci predisposing to cutaneous melanoma. Hum Mol Genet. 2011;20:5012–5023. doi: 10.1093/hmg/ddr415.
  • 44. Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, Hao L, Kiang A, Paschall J, Phan L, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet. 2007;39:1181–1186. doi: 10.1038/ng1007-1181.
