Whole-genome detection of disease-associated deletions or excess homozygosity in a case–control study of rheumatoid arthritis

Chih-Chieh Wu; Sanjay Shete; Eun-Ji Jo; Yaji Xu; Emily Y Lu; Wei V Chen; Christopher I Amos

doi:10.1093/hmg/dds512

2012 Dec 6;22(6):1249–1261. doi: 10.1093/hmg/dds512

PMCID: PMC3578409 PMID: 23223014

Abstract

In contrast to the many genome-wide association studies, few comprehensive studies of the contribution of copy number variation to complex human disease susceptibility have been performed. Copy number variations are abundant in humans and represent one of the least well-studied classes of genetic variants; in addition, known rheumatoid arthritis susceptibility loci explain only a portion of familial clustering. Therefore, we performed a genome-wide study of association between deletion or excess homozygosity and rheumatoid arthritis using high-density 550 K SNP genotype data from a genome-wide association study. We used a genome-wide statistical method that we recently developed to test each contiguous SNP locus between 868 cases and 1194 controls to detect excess homozygosity or deletion variants that influence susceptibility. Our method is designed to detect statistically significant evidence of deletions or homozygosity at individual SNPs for SNP-by-SNP analyses and to combine the information among neighboring SNPs for cluster analyses. In addition to successfully detecting the known deletion variants in the major histocompatibility complex, we identified 4.3 and 28 kb clusters on chromosomes 10p and 13q, respectively, which were significant at a Bonferroni-type-corrected 0.05 nominal significance level. Independently, we performed analyses using PennCNV, an algorithm for identifying and cataloging copy numbers for individuals based on a hidden Markov model, and identified cases and controls that had chromosomal segments with copy number <2. Using Fisher's exact test for comparing the numbers of cases and controls with copy number <2 per SNP, we identified 26 significant SNPs (protective; more controls than cases) aggregating on chromosome 14 with P-values <10⁻⁸.

INTRODUCTION

Studies of the human genome have demonstrated extensive and widespread copy number variations (CNVs) of DNA sequences, such as deletions, insertions, duplications and complex multi-site variants, that indicate the presence of variable numbers of copies of large genomic regions (mostly >1 kb in size) among individuals. Comprehensive whole-genome reference maps of human CNVs by SNP microarrays and array comparative genomic hybridization have been constructed (1–3). Genomic deletions represent a variant class that is often associated with disease. Three concurrent studies that specifically investigated common deletion polymorphisms in healthy individuals demonstrated that deletion variants of various sizes are ubiquitous; they also provided comprehensive maps of deletions in the human genome (4–6). These studies provided important baseline information to enable the discovery of CNV classes and facilitate whole-genome studies of associations between disease and CNVs.

Deletion variants have long been known to cause microdeletion syndromes, such as DiGeorge syndrome, Prader-Willi syndrome and Wilms tumor (7), and are frequently observed in patients with neurodevelopmental disorders, such as autism and schizophrenia (8–11). Recent discoveries include a common 20 kb deletion upstream of the IRGM gene that is associated with Crohn's disease, a 45 kb deletion upstream of NEGR1 that is associated with the body mass index, and a deletion and duplication of KIR that is associated with HIV-1 control (12–14). A study of CNVs as trait-associated polymorphisms and expression quantitative trait loci that influence phenotype by altering gene regulation demonstrated that they contribute to the genetics of certain disease classes, such as autoimmune disorders and metabolic traits (15). However, controversy exists; it has yet to be fully ascertained to what extent CNVs account for missing heritability that is undetected by genome-wide association studies (1,15–17). In fact, compared with genome-wide association studies, few comprehensive whole-genome studies exist of the contribution of CNVs to susceptibility across a wide variety of common, complex human diseases (15,17). CNVs remain one of the least well-studied classes of genetic variants. More recently, a nucleotide-resolution map of CNVs based on whole-genome DNA sequencing data from 185 individuals in the 1000 Genomes Project was constructed, enabling the discovery, genotyping and imputation of CNVs and serving as a resource for sequencing-based association studies (18).

Rheumatoid arthritis (RA) is a common autoimmune disorder of unknown etiology; it is characterized by the destruction of the synovial joints, resulting in severe disability. It has a complex mode of inheritance and is influenced by both genetic and environmental risk factors. It affects ∼1% of individuals of European ancestry, with an estimated sibling recurrence risk of 5–10 (19–21). In addition to the established susceptibility loci of HLA-DRB1 and PTPN22 (protein tyrosine phosphatase, non-receptor type 22) in patients with severe anti-CCP-positive RA, several associated alleles of modest risk at newly identified loci have been reproducibly discovered in recent genome-wide association studies, including REL, STAT4, TNFAIP3 and BLK. On the basis of estimates of a recent meta-analysis, validated RA risk alleles at major histocompatibility complex (MHC) and non-MHC loci explained ∼12 and 4% of phenotypic variance, respectively; a large portion of heritable variation remains to be discovered (22).

We recently developed a genome-wide statistical method for detecting disease-associated deletion variants or excess homozygosity using high-density SNP genotype data in genome-wide association studies (23). Our method is based on identifying areas in which the excess homozygosity of cases differs from that of controls and is structured to test each contiguous SNP locus across the whole genome between a group of cases and a group of controls from a genome-wide association study. The method has proved to be useful and robust in the presence of linkage disequilibrium. It provides outcomes for SNP-by-SNP analyses and cluster analyses on the basis of combined evidence from multiple neighboring SNPs in case–control studies. Genome-wide association studies are designed to discover individual disease-associated SNPs; in contrast, methods for detecting CNVs and deletions are generally designed to find small chromosomal segments (4,6,23–25). In this study, we used our method to perform a comprehensive genome-wide study of associations between common deletion variants or excess homozygosity and RA susceptibility using an Illumina HumanHap550 array in 868 RA patients and 1194 controls from the North American Rheumatoid Arthritis Consortium (20). The SNP-by-SNP analyses identified individual significant SNPs over the whole genome at a nominal significance level of 10⁻⁸; the cluster analyses detected candidate deleted segments in which at least two neighboring significant SNPs were aggregated. In addition to successfully detecting known deleterious deletion variants on HLA-DRB1 and C4 genes that increase RA risk in the MHC region, we identified additional 4.3 and 28 kb clusters on chromosomes 10p (5 316 846–5 321 159) and 13q (20 783 404–20 811 429), respectively, which were significant at a corrected 0.05 nominal significance level, that is, after adjustment for multiple comparisons.

Independently, we performed analyses using the PennCNV method and identified cases and controls that had chromosomal segments with copy number <2. PennCNV is an algorithm for identifying and cataloging copy numbers for individuals on the basis of a hidden Markov model (25). Using Fisher's exact test to compare the numbers of cases and controls per SNP, we identified 26 significant SNPs (protective; more controls than cases) that were overly aggregated on chromosome 14 with P-values <10⁻⁸ and additional 49 SNPs on chromosomes 2, 14 and 20 with P-values of 10⁻⁵–10⁻⁸. In this report, we extend genome-wide association studies to deletion and excess homozygosity detection for finding additional common genetic variants that influence RA susceptibility. We also provide a strategy and analytical framework that can be used at no additional cost: using SNP and intensity data from genome-wide association studies to detect disease-associated deletion variants or excess homozygosity and identify individual patients with commonly shared disease-associated deletion variants.

RESULTS

For SNP-by-SNP analyses, we performed the z-score test to assess the statistical significance of differences in homozygosity proportions between the 868 cases and 1194 controls on each of 550 K contiguous SNP loci. We found that 535 individual SNPs reached genome-wide significance (defined as P-value <10⁻⁸). Table 1 shows the frequencies of SNP genotypes, missing SNP genotypes, SNPs tested and significant SNPs by chromosome and arm. The number of SNPs tested is the difference in counts between SNP genotypes and missing SNP genotypes.
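As an illustration of the kind of comparison made at each SNP, the following minimal sketch applies a standard pooled two-proportion z-test to homozygosity counts in cases and controls. It is an assumption that this textbook statistic is close to the one used here; the exact test statistic is the one defined in reference (23) and may differ, for example in how the variance is estimated or whether the test is one-sided. The function name and the example counts are hypothetical and are not taken from the study data.

```python
import math

def homozygosity_z_test(hom_cases, n_cases, hom_controls, n_controls):
    """Pooled two-proportion z-test comparing homozygosity rates.

    Returns (z, two-sided P-value). A positive z means that cases are
    homozygous more often than controls at this SNP.
    """
    p_cases = hom_cases / n_cases
    p_controls = hom_controls / n_controls
    pooled = (hom_cases + hom_controls) / (n_cases + n_controls)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
    if se == 0:
        return 0.0, 1.0  # everyone (or no one) is homozygous: nothing to test
    z = (p_cases - p_controls) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Hypothetical counts for a single SNP, using the study's sample sizes.
z, p = homozygosity_z_test(hom_cases=600, n_cases=868,
                           hom_controls=650, n_controls=1194)
print(f"z = {z:.2f}, P = {p:.2e}, below 1e-8: {p < 1e-8}")
```

Repeating such a test at every tested locus and keeping the loci with P-value <10⁻⁸ would give the list of significant SNPs that feeds the cluster analyses.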

Table 1.

Frequencies of SNPs, missing SNPs, SNPs tested and significant SNPs by chromosome and arm

Chromosome	Arm	Number of SNPs genotyped	Number of missing SNPs	Number of SNPs tested^a	Number of significant SNPs^b
1	p	21 533	81	21 452	19
	q	19 396	104	19 292	13
2	p	18 526	98	18 428	9
	q	25 564	105	25 459	22
3	p	18 457	55	18 402	8
	q	18 233	94	18 139	7
4	p	9488	35	9453	3
	q	23 140	105	23 035	10
5	p	9106	37	9069	11
	q	24 506	96	24 410	13
6	p	13 964	47	13 917	81
	q	21 610	67	21 543	12
7	p	13 249	30	13 219	9
	q	15 995	69	15 926	14
8	p	12 222	64	12 158	3
	q	18 768	74	18 694	13
9	p	10 878	31	10 847	6
	q	15 250	45	15 205	19
10	p	9616	18	9598	7
	q	18 715	53	18 662	21
11	p	10 550	49	10 501	11
	q	15 927	46	15 881	13
12	p	8048	43	8005	7
	q	18 317	79	18 238	11
13	p	–	–	–	–
	q	20 242	84	20 158	11
14	p	–	–	–	–
	q	17 951	62	17 889	9
15	p	–	–	–	–
	q	16 166	47	16 119	19
16	p	6382	38	6344	13
	q	10 078	43	10 035	11
17	p	4526	11	4515	10
	q	9501	31	9470	17
18	p	3515	3	3512	4
	q	12 935	63	12 872	9
19	p	3704	5	3699	20
	q	5532	13	5519	12
20	p	6697	17	6680	7
	q	7146	27	7119	17
21	p	1	–	1	–
	q	8050	18	8032	13
22	p	–	–	–	–
	q	8205	33	8172	21
Total				529 669	535

^aThe number of SNPs tested is the difference in counts between SNP genotypes and missing SNPs.

^bThese SNPs were statistically significant by the z-score test at a nominal significance level of 10⁻⁸ for SNP-by-SNP analyses.

Figure 1 displays a graphical summary of outcomes of the genome-wide association scan between deletion variants or excess homozygosity and RA risk in which SNPs are plotted according to corresponding chromosomal locations with the values of –log₁₀(P-values). The largest association signal lies in the MHC region with a maximal aggregation of neighboring significant SNPs. We identified the deleterious deletion variants that encompassed HLA-DRB1 and C4 genes in the MHC region in which deletions and CNVs were previously discovered in RA patients (26,27). Deletions in the HLA-DRB1 region are a common characteristic of HLA class II haplotypes, and the major DR4 and DR9 haplotypes associated with RA belong to a related haplotype family with multiple DRB loci, including several pseudogenes. In contrast, the DR1 haplotypes associated with RA are members of a distinct family of haplotypes that have fewer DRB loci. Thus, we would expect to find copy number differences in DRB genes between RA cases and controls, which contain haplotype families with more variable numbers of DRB loci.

In this study, a cluster was defined as two or more significant SNPs gathered on a short chromosomal segment of pre-determined length on the basis of the SNP-by-SNP analysis outcome in the first stage. Because the tag SNPs genotyped in genome-wide association studies are not uniformly distributed over the whole genome and because gene-sparse regions may have fewer SNPs genotyped and higher probabilities of containing genomic deletions, we used two different cluster criteria to determine the maximal length of a chromosomal segment that can separate adjacent significant SNPs within a cluster. The first criterion is that two successive significant SNPs are separated by 20 or fewer SNP loci; the second is that two successive significant SNPs are separated by a distance of ≤100 kb. Under these criteria, a cluster begins with a significant SNP locus and ends with another significant SNP locus. Clusters can extend continuously in this way to accommodate more than two significant SNPs. The mean distance between adjacent SNPs was 5.39 kb in this application, so a 20-SNP-locus chromosomal segment spans a mean of 107.8 kb. We previously used extensive simulations to demonstrate that our method is effective at detecting disease-associated deletions and excess homozygosity under these cluster criteria (23).
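The two cluster criteria can be sketched as a single pass over the ordered significant SNPs on a chromosome arm. The sketch below is illustrative only: the function name, its arguments and the toy data are hypothetical, and it assumes that "separated by 20 or fewer SNP loci" means a difference of at most 20 in SNP order, which is one possible reading of the criterion.

```python
def find_clusters(significant, positions=None, max_gap_loci=20, max_gap_kb=None):
    """Group significant SNPs on one chromosome arm into clusters.

    significant : sorted list of SNP indices (order along the arm) that
                  passed the SNP-by-SNP test.
    positions   : physical positions in kb, indexed by SNP index; needed
                  only for the distance criterion.
    If max_gap_kb is given, the distance criterion is applied; otherwise
    the SNP-count criterion is applied.
    Returns a list of clusters, each a list of two or more SNP indices.
    """
    clusters, current = [], []
    for idx in significant:
        if current:
            prev = current[-1]
            if max_gap_kb is not None:
                close = positions[idx] - positions[prev] <= max_gap_kb
            else:
                close = idx - prev <= max_gap_loci
            if not close:
                # Gap too large: close the running group, keep it if it
                # holds at least two significant SNPs.
                if len(current) >= 2:
                    clusters.append(current)
                current = []
        current.append(idx)
    if len(current) >= 2:
        clusters.append(current)
    return clusters

# Toy arm with 8 SNPs and a sparsely genotyped stretch between SNPs 2 and 3.
positions_kb = [0, 5, 10, 130, 135, 140, 145, 400]
significant = [0, 2, 3, 6]
print(find_clusters(significant))                                   # SNP-count criterion
print(find_clusters(significant, positions_kb, max_gap_kb=100))     # distance criterion
```

In the toy data the SNP-count criterion yields one cluster, whereas the distance criterion splits it at the sparsely genotyped gap, which is why the two criteria can give different cluster boundaries in regions of uneven SNP density.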

Cluster analysis under the first criterion

Under the first cluster criterion of two successive significant SNPs separated by no more than 20 SNP loci, we identified 14 distinct clusters of neighboring significant SNPs over the whole genome. Each is described in detail in Table 2. Common variants of the first cluster in the MHC region contributed the strongest statistical signal of risk. We found that 54 significant SNPs aggregated on a short segment of 252 contiguous SNP loci in the first cluster. Excluding the two significant SNPs that bound the cluster (one at each end), 52 significant SNPs lay inside this cluster. In this case, T = 13 917, k = 81, w = 252 and x = 54 were used for formula (1). The null probability was k/T = 81/13 917 ≈ 5.82 × 10⁻³, as 13 917 contiguous SNP loci were tested individually on the p arm of chromosome 6 and 81 of them were significant at the nominal significance level of 10⁻⁸ (shown in the 5th and 6th columns and 12th row of Table 1). The exact P-value of this cluster was 2.91 × 10⁻⁶⁶ (Table 2), using formula (1).

Table 2.

Cluster analysis under the first criterion of two successive significant SNPs separated by 20 or fewer SNP loci

Cluster number	1	2	3	4	5	6	7
Chromosome	6p	7q	10p	10q	11p	13q	16p
Position^a (kb)	32 182.782–32 810.427	148 176.586–148 326.588	5 316.846–5 321.159	105 335.191–105 403.030	45 242.379–45 297.296	20 783.404–20 811.429	1 066.544–1 091.324
Cluster size (kb)	627.645	150.002	4.313	67.839	54.917	28.025	24.780
No. of significant SNPs	54	2	2	2	2	2	2
No. of SNPs encompassed by cluster	252	19	2	16	10	8	7
P-value of cluster test	2.91 × 10⁻⁶⁶	1.31 × 10⁻⁴	5.32 × 10⁻⁷	1.50 × 10⁻⁴	4.91 × 10⁻⁵	8.32 × 10⁻⁶	8.76 × 10⁻⁵
Corrected P-value of cluster test	1.61 × 10⁻⁶⁴	0.110	2.55 × 10⁻³	0.175	5.16 × 10⁻²	2.10 × 10⁻²	7.94 × 10⁻²
P-value of scan test^b	3.43 × 10⁻⁷⁰	0.212	8.74 × 10⁻³	0.351	0.103	4.34 × 10⁻²	0.169
Cluster number	8	9	10	11	12	13	14
Chromosome	19p	19p	20q	21q	22q	22q	22q
Position^a (kb)	2 054.962–2 165.057	19 083.070–19 117.870	49 383.424–49 422.842	14 121.682–14 367.339	24 086.564–24 108.959	42 601.072–42 611.432	48 493.142–48 620.780
Cluster size (kb)	110.095	34.800	39.418	245.657	22.395	10.360	127.638
No. of significant SNPs	2	2	2	2	2	2	2
No. of SNPs encompassed by cluster	18	8	8	14	3	10	14
P-value of cluster test	4.22 × 10⁻³	8.01 × 10⁻⁴	1.58 × 10⁻⁴	2.35 × 10⁻⁴	1.98 × 10⁻⁵	2.93 × 10⁻⁴	5.89 × 10⁻⁴
Corrected P-value of cluster test	0.868	0.370	0.141	0.135	5.39 × 10⁻²	0.240	0.344
P-value of scan test^b	–	0.774	0.298	0.264	0.153	–	–

^aBuild 35.

^bThe P-values of the clusters that are not the largest on a chromosome arm are not available and are indicated by the symbol ‘–’ for the scan test.

It is noteworthy that the P-values obtained directly from expression (1) are not corrected for multiple comparisons. A chromosomal segment of the same width as this cluster could occur at other locations along chromosome 6p; we must take this into account when assessing statistical significance using this test for cluster analyses. We used a Bonferroni-type correction, multiplying the P-value by the ratio of T (the total number of SNPs tested over a chromosomal region) to w (the number of SNPs that encompass the cluster of interest) (23). In this case, the corrected P-value is equal to 2.91 × 10⁻⁶⁶ × 13 917/252 = 1.61 × 10⁻⁶⁴. Because only one cluster is present on chromosome 6p, we used the scan test and obtained a P-value of 3.43 × 10⁻⁷⁰ (Table 2). Both the cluster test and scan test demonstrated that this 627 kb clustering segment on chromosome 6p (32 182 782–32 810 427) is highly significant.
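The Bonferroni-type correction is a simple rescaling of the cluster P-value by T/w, and it can be checked against the values reported in Tables 1 and 2. The following minimal sketch uses a hypothetical function name; capping the result at 1 is a conventional safeguard added here and is not stated in the text.

```python
def corrected_cluster_p(p_value, total_snps, cluster_width):
    """Bonferroni-type correction: multiply P by T / w, capped at 1."""
    return min(1.0, p_value * total_snps / cluster_width)

# (label, uncorrected cluster P-value, T = SNPs tested on the arm,
#  w = SNPs encompassed by the cluster), transcribed from Tables 1 and 2.
clusters = [
    ("6p (MHC)", 2.91e-66, 13917, 252),
    ("10p",      5.32e-7,   9598,   2),
    ("13q",      8.32e-6,  20158,   8),
]

for label, p, total, width in clusters:
    print(f"{label}: corrected P = {corrected_cluster_p(p, total, width):.2e}")
```

Under these inputs the computed values agree, to the reported precision, with the corrected P-values listed for clusters 1, 3 and 6 in Table 2.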

We analyzed the remaining 13 distinct clusters using the same approach; the corresponding results are shown in Table 2. In contrast with the first cluster on chromosome 6p, each of these 13 clusters contained exactly two significant SNPs. The clusters of significant SNPs on the 4.3 kb segment of chromosome 10p (5 316 846–5 321 159) and 28 kb segment of chromosome 13q (20 783 404–20 811 429) had corrected P-values of 2.55 × 10⁻³ and 2.10 × 10⁻², respectively; these were significant at a corrected 0.05 nominal significance level, that is, after adjustment for multiple comparisons. The corresponding P-values of the scan test for these two clusters were 8.74 × 10⁻³ and 4.34 × 10⁻². Detailed information on these two clusters is presented in the fourth and seventh columns of Table 2. It is important to determine the pattern of linkage disequilibrium between adjacent significant SNPs in clusters. We used the values of r² to measure the magnitude of linkage disequilibrium between two adjacent significant SNPs on these two clusters. The r² values between significant SNPs were 0.387 for cases and 0.310 for controls on the 4.3 kb cluster of chromosome 10p, and 0.121 for cases and 0.070 for controls on the 28 kb cluster of chromosome 13q. The P-values of the 8th, 13th and 14th clusters of Table 2 for the scan test are not available because the scan test assesses only the largest cluster on a chromosome arm, and the 9th and 12th clusters of Table 2 are the largest on chromosomes 19p and 22q, respectively.

Cluster analysis under the second criterion

Under the second cluster criterion of a maximum distance of ≤100 kb between two adjacent significant SNPs, we identified 14 distinct clusters of neighboring significant SNPs, each of which is described in detail in Table 3. The strongest association signal remained in the MHC region, which contained five distinct clusters on chromosome 6p rather than only the one shown in Table 2. The largest cluster found using the first criterion on chromosome 6p in Table 2 was split into two adjacent clusters (the third and fourth clusters of Table 3) under the second cluster criterion because only three SNPs were genotyped in the 144 kb gap between these two clusters. Both clusters were large (353 kb and 130 kb in size) and highly significant by our cluster test or scan test. The remaining clusters contained exactly two significant SNPs each. Besides the clusters on chromosome 6p, the same two clusters on chromosomes 10p and 13q that were significant under the first cluster criterion were again significant at a corrected 0.05 nominal significance level, using a Bonferroni-type correction.

Table 3.

Cluster analysis under the second criterion of a maximum distance of ≤100 kb between two successive significant SNPs

Cluster number	1	2	3	4	5	6	7
Chromosome	6p	6p	6p	6p	6p	10p	10q
Position^a (kb)	31 133.030–31 203.780	31 652.168–31 723.146	32 182.782–32 536.263	32 680.229–32 810.427	33 194.227–33 293.896	5 316.846–5 321.159	105 335.191–105 403.030
Cluster size (kb)	70.750	70.978	353.481	130.198	99.669	4.313	67.839
No. of significant SNPs	2	2	34	20	2	2	2
No. of SNPs encompassed by cluster	37	23	164	85	46	2	16
P-value of cluster test	1.97 × 10⁻²	7.90 × 10⁻³	8.39 × 10⁻⁴²	1.95 × 10⁻²⁶	5.16 × 10⁻²	5.32 × 10⁻⁷	1.50 × 10⁻⁴
Corrected P-value of cluster test	–	–	7.12 × 10⁻⁴⁰	3.19 × 10⁻²⁴	–	2.55 × 10⁻³	0.175
P-value of scan test^b	–	–	–	5.38 × 10⁻²³	–	8.74 × 10⁻³	0.351
Cluster number	8	9	10	11	12	13	14
Chromosome	11p	13q	16p	19p	20q	22q	22q
Position^a (kb)	45 242.379–45 297.296	20 783.404–20 811.429	1 066.544–1 091.324	19 083.070–19 117.870	49 383.424–49 422.842	24 086.564–24 108.959	42 601.072–42 611.432
Cluster size (kb)	54.917	28.025	24.780	34.800	39.418	22.395	10.360
No. of significant SNPs	2	2	2	2	2	2	2
No. of SNPs encompassed by cluster	10	8	7	8	8	3	10
P-value of cluster test	4.91 × 10⁻⁵	8.32 × 10⁻⁶	8.76 × 10⁻⁵	8.01 × 10⁻⁴	1.58 × 10⁻⁴	1.98 × 10⁻⁵	2.93 × 10⁻⁴
Corrected P-value of cluster test	5.16 × 10⁻²	2.10 × 10⁻²	7.94 × 10⁻²	0.370	0.141	5.39 × 10⁻²	0.240
P-value of scan test^b	0.103	4.34 × 10⁻²	0.169	0.774	0.298	0.153	–

^aBuild 35.

^bThe P-values of the clusters that are not the largest on a chromosome arm are not available and are indicated by the symbol ‘–’ for the scan test.

Four clusters of significant SNPs under the second cluster criterion, shown in Table 3, were statistically significant at a corrected 0.05 nominal significance level, that is, after adjustment for multiple comparisons. However, the two largest clusters (the third and fourth clusters of Table 3) in the MHC region together covered essentially the same segment as the single largest cluster found under the first cluster criterion (the first cluster of Table 2). In conclusion, both our cluster test and scan test identified three clusters of neighboring significant SNPs that were nearly identical under the two cluster criteria at a corrected 0.05 nominal significance level: the known deleterious deletion variants in the MHC region, a 4.3 kb segment of chromosome 10p and a 28 kb segment of chromosome 13q. Because genomic variants are not uniformly distributed and genotyped over the whole genome, it is more prudent in real-data analyses to perform separate association analyses under both cluster criteria than to rely on either criterion alone.

We used the proposed logistic regression framework extension (shown in the Test for SNP-by-SNP Analyses on the First Stage section) to assess the significance of excess homozygosity on these three clusters of significant SNPs, accounting for population stratification. Our analysis showed that 58 of 252 SNPs in the MHC region (32 182.782–32 810.427 kb) were significant at a nominal significance level of 10⁻⁸. In addition, the two significant SNPs in each of the chromosome 10p and 13q clusters remained highly significant using this logistic regression extension. These results indicate that our cluster-based method is robust to population stratification in this application.

In addition, under either cluster criterion, we found that three clusters of significant SNPs were borderline statistically significant. These clusters were located on chromosomes 11p, 16p and 22q and had corrected P-values of 5.16 × 10⁻², 7.94 × 10⁻² and 5.39 × 10⁻², respectively. The corresponding r² values between significant SNPs were <0.01 for cases and controls on chromosomes 11p and 16p and 0.213 for cases and 0.167 for controls on chromosome 22q.

Whole-genome scan of RA-association using PennCNV

The SNP-based statistical method that we developed is designed to detect disease-associated deletion variants or excess homozygosity; it is structured to test each contiguous SNP locus between a group of cases and a group of controls from a genome-wide association study (23). In contrast, PennCNV is an algorithm that calls individual level copy numbers, providing position-specific copy numbers (25). We used PennCNV to obtain whole-genome CNV maps for 891 RA cases and 601 controls that had available intensity data. PennCNV outputs small chromosomal segments with copy numbers other than two. We detected 62 162 CNVs with a median size of ∼54 kb: cases had 44 729 CNVs with a median size of ∼64 kb; and controls had 17 433 with a median size of ∼32 kb.

We first used PennCNV to identify cases and controls that had chromosomal segments with copy number = 0 or 1; we then used Fisher's exact test to assess the statistical significance of association between RA risk and deletions (copy number = 0 or 1 determined by PennCNV) by comparing the numbers of cases and controls per SNP locus. In Figure 2, we present a graphical SNP-by-SNP outcome summary of the whole-genome scan of association between deletions and RA risk in which SNPs are plotted according to corresponding chromosomal locations with the values of –log₁₀(P-values). The P-values of the two-sided Fisher's exact test were calculated and are shown in the figure. We identified 26 significant SNPs (protective; more controls than cases) clustering on chromosome 14 with P-values <10⁻⁸. An amplified display of the values of –log₁₀(P-values) by their corresponding physical position over this small region is shown in Figure 3. In addition, we found 49 SNPs with P-values between 10⁻⁵ and 10⁻⁸: 9 SNPs on chromosome 20 increased RA risk (more cases than controls), 35 SNPs on chromosome 14 and 5 SNPs on chromosome 2 decreased RA risk (more controls than cases). Table 4 shows all 75 SNPs with P-values <10⁻⁵, including their positions, names and exact P-values. We also present the corresponding numbers of cases and controls with copy number = 0 or 1 for each SNP in the table. There were 891 RA cases and 601 controls that had available intensity data for the PennCNV analyses; thus, the numbers of cases and controls with copy number ≠ 0 or 1 can be obtained correspondingly for calculating P-values of the Fisher's exact test. It is noteworthy that, unlike our cluster-based approach, the PennCNV method did not detect known deleterious deletion variants that encompassed HLA-DRB1 and C4 genes in the MHC region.
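The per-SNP comparison can be sketched as a two-sided Fisher's exact test on a 2 × 2 table of deletion carriers and non-carriers in cases and controls. The implementation below is a minimal standard-library sketch with a hypothetical function name; it uses the common convention of summing the probabilities of all tables that are no more likely than the observed one, which is an assumption about how the two-sided P-value was defined in the analysis. The counts in the usage example are those reported for rs11845134 in Table 4.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (cases, controls); columns are deletion carriers and
    non-carriers. The P-value sums the hypergeometric probabilities of
    all tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Integer weights proportional to the hypergeometric probabilities;
    # dividing by comb(row1 + row2, col1) normalises them.
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(row1 + row2, col1)

# rs11845134: 13 of 891 cases and 61 of 601 controls carry copy number 0 or 1.
n_cases, n_controls = 891, 601
del_cases, del_controls = 13, 61
p = fisher_exact_two_sided(del_cases, n_cases - del_cases,
                           del_controls, n_controls - del_controls)
print(f"rs11845134: two-sided P = {p:.2e} (Table 4 reports 4.10e-14)")
```

Working with exact integer weights avoids floating-point ties when deciding which tables count as at least as extreme as the observed one.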

Figure 2. — Genome-wide scan of association between rheumatoid arthritis and deletions (copy number = 0 or 1) defined by PennCNV, using Fisher's exact test. SNPs were plotted according to corresponding chromosomal locations with –log₁₀(P-values), using two-sided Fisher's exact test. We identified 26 significant SNPs overly aggregating on chromosome 14 with P-values <10⁻⁸ and additional 49 SNPs on chromosomes 2, 14 and 20 with P-values of 10⁻⁵–10⁻⁸. These SNPs on chromosomes 2 and 14 were protective; those on chromosomes 20 were associated with increased RA risk.

Figure 3. — Amplification of association scan on chromosome 14, 20.5–23 Mb, between rheumatoid arthritis and deletions (copy number = 0 or 1) defined by PennCNV, using Fisher's exact test. The largest association signal appears on a 165 kb segment of chromosome 14q (21 834 952–21 999 998) in which all 26 significant SNPs lie at the nominal significance level of 10⁻⁸ and spans 59 SNP loci. Twenty-four consecutive SNPs were statistically significant on a 46.5 kb segment of chromosome 14q (21 834 952–21 881 469).

Table 4.

Regions of the genome showing evidence of association between rheumatoid arthritis and deletions (copy number = 1 or 0) by PennCNV

No.	Chromosome	Position	SNP	No. of cases with copy no. = 0,1^a	No. of controls with copy no. = 0,1^a	P-values of two-sided Fisher's exact test^b
1	14	21 852 217	rs11845134	13	61	4.10 × 10⁻¹⁴
2	14	21 849 683	rs7146411	12	58	9.08 × 10⁻¹⁴
3	14	21 850 339	rs3811259	12	58	9.08 × 10⁻¹⁴
4	14	21 850 502	rs11850894	12	58	9.08 × 10⁻¹⁴
5	14	21 834 952	rs12588739	6	43	3.39 × 10⁻¹²
6	14	21 837 485	rs722448	6	43	3.39 × 10⁻¹²
7	14	21 856 055	rs1474477	18	62	5.24 × 10⁻¹²
8	14	21 859 477	rs8007403	24	70	5.28 × 10⁻¹²
9	14	21 860 760	rs916048	25	71	8.69 × 10⁻¹²
10	14	21 857 381	rs10047935	18	60	1.93 × 10⁻¹¹
11	14	21 861 403	rs2204990	26	70	2.56 × 10⁻¹¹
12	14	21 841 092	rs741713	9	46	3.12 × 10⁻¹¹
13	14	21 841 139	rs1076549	9	46	3.12 × 10⁻¹¹
14	14	21 841 963	rs2009858	9	46	3.12 × 10⁻¹¹
15	14	21 838 610	rs3811260	7	42	3.98 × 10⁻¹¹
16	14	21 845 319	rs1540268	11	49	4.11 × 10⁻¹¹
17	14	21 845 708	rs10142594	11	49	4.11 × 10⁻¹¹
18	14	21 864 135	rs17793809	27	70	6.52 × 10⁻¹¹
19	14	21 867 816	rs4981422	27	69	1.18 × 10⁻¹⁰
20	14	21 869 910	rs11627649	27	69	1.18 × 10⁻¹⁰
21	14	21 842 503	rs1467891	9	44	1.28 × 10⁻¹⁰
22	14	21 862 055	rs11847479	26	64	1.37 × 10⁻⁰⁹
23	14	21 878 594	rs4981423	17	52	1.78 × 10⁻⁰⁹
24	14	21 881 469	rs3811256	17	51	3.38 × 10⁻⁰⁹
25	14	21 999 540	rs10162417	22	56	9.05 × 10⁻⁰⁹
26	14	21 999 998	rs10131293	22	56	9.05 × 10⁻⁰⁹
27	14	21 885 790	rs2032442	14	45	1.46 × 10⁻⁰⁸
28	14	21 886 996	rs12436199	14	45	1.46 × 10⁻⁰⁸
29	14	22 000 627	rs2733776	22	54	2.95 × 10⁻⁰⁸
30	14	21 831 090	rs10483271	7	33	5.90 × 10⁻⁰⁸
31	14	21 832 139	rs17198314	7	33	5.90 × 10⁻⁰⁸
32	14	21 832 903	rs17198328	7	33	5.90 × 10⁻⁰⁸
33	14	21 898 729	rs2331662	14	42	9.73 × 10⁻⁰⁸
34	14	21 996 759	rs17794083	26	57	1.05 × 10⁻⁰⁷
35	14	21 995 192	rs1882704	28	58	1.94 × 10⁻⁰⁷
36	14	21 827 106	rs2001022	7	31	2.25 × 10⁻⁰⁷
37	20	35 462 245	rs1570209	96	22	2.45 × 10⁻⁰⁷
38	14	21 994 034	rs2242545	29	59	2.49 × 10⁻⁰⁷
39	14	21 985 656	rs12147516	52	83	2.87 × 10⁻⁰⁷
40	14	21 986 886	rs10483273	52	83	2.87 × 10⁻⁰⁷
41	20	35 442 559	rs6090585	62	9	3.36 × 10⁻⁰⁷
42	20	35 443 071	rs6018199	62	9	3.36 × 10⁻⁰⁷
43	14	21 826 110	rs10129606	7	30	4.42 × 10⁻⁰⁷
44	2	208 064 035	rs918843	34	63	5.25 × 10⁻⁰⁷
45	2	208 064 167	rs918842	34	63	5.25 × 10⁻⁰⁷
46	2	208 064 454	rs2551649	34	63	5.25 × 10⁻⁰⁷
47	2	208 065 237	rs6755425	34	63	5.25 × 10⁻⁰⁷
48	2	208 066 083	rs959668	34	63	5.25 × 10⁻⁰⁷
49	14	21 991 120	rs11848747	32	61	5.32 × 10⁻⁰⁷
50	20	35 440 545	rs12329503	61	9	5.49 × 10⁻⁰⁷
51	20	35 485 009	rs6018428	98	24	6.36 × 10⁻⁰⁷
52	20	35 485 260	rs6018432	98	24	6.36 × 10⁻⁰⁷
53	20	35 438 689	rs6094509	60	9	9.03 × 10⁻⁰⁷
54	20	35 475 054	rs11905013	97	24	9.49 × 10⁻⁰⁷
55	20	35 476 320	rs4810624	97	24	9.49 × 10⁻⁰⁷
56	14	22 009 307	rs8020193	17	42	1.14 × 10⁻⁰⁶
57	14	22 002 896	rs10483275	21	47	1.49 × 10⁻⁰⁶
58	14	21 908 470	rs8014927	14	38	1.79 × 10⁻⁰⁶
59	14	21 822 713	rs17116039	8	30	1.84 × 10⁻⁰⁶
60	14	21 819 582	rs4435168	13	36	2.13 × 10⁻⁰⁶
61	14	21 973 302	rs2141988	49	75	2.30 × 10⁻⁰⁶
62	14	21 973 771	rs3811232	49	75	2.30 × 10⁻⁰⁶
63	14	21 974 905	rs8021297	49	75	2.30 × 10⁻⁰⁶
64	14	22 010 682	rs10483277	15	38	2.87 × 10⁻⁰⁶
65	14	21 970 760	rs6572449	49	74	4.98 × 10⁻⁰⁶
66	14	21 972 830	rs7142158	49	74	4.98 × 10⁻⁰⁶
67	14	21 975 565	rs11623995	49	74	4.98 × 10⁻⁰⁶
68	14	21 976 908	rs11157596	49	74	4.98 × 10⁻⁰⁶
69	14	21 914 810	rs4982619	14	36	5.51 × 10⁻⁰⁶
70	14	21 816 895	rs3811266	13	34	7.13 × 10⁻⁰⁶
71	14	21 817 304	rs4982599	13	34	7.13 × 10⁻⁰⁶
72	14	21 931 475	rs12891257	13	34	7.13 × 10⁻⁰⁶
73	14	21 933 475	rs10142552	13	34	7.13 × 10⁻⁰⁶
74	14	21 928 200	rs3811247	12	33	7.51 × 10⁻⁰⁶
75	14	21 929 322	rs3811244	12	33	7.51 × 10⁻⁰⁶

^aThere were 891 RA cases and 601 controls in the PennCNV analyses. With the numbers of cases and controls with copy number = 0 or 1, the numbers of cases and controls with copy number ≠ 0 or 1 can be obtained correspondingly to calculate P-values of the Fisher's exact test.

^bThe two-sided Fisher's exact test was used to assess the statistical significance of association between RA risk and deletions (copy number = 0 or 1).

The largest association signal appeared on a 165 kb segment of chromosome 14q (21 834 952–21 999 998) that spans 59 SNP loci and contains all 26 SNPs that were significant at the nominal significance level of 10⁻⁸. Notably, we found that 24 consecutive SNPs were statistically significant on a 46.5 kb segment of chromosome 14q (21 834 952–21 881 469). The respective maps of this region for cases and controls, shown in Figure 4, suggest that at least four distinct loci in separate linkage disequilibrium blocks are present on the 46.5 kb segment that accommodates the 24 consecutive significant SNPs; at least eight distinct loci in separate linkage disequilibrium blocks are present on the 165 kb segment of chromosome 14q (21 834 952–21 999 998) that accommodates all 26 significant SNPs. This region contains the T-cell receptor alpha chain locus, which is rearranged in T-cells. Because different T-cells carry different rearrangements, the DNA intensity across this region would be decreased while the calling of heterozygous genotypes would not be altered, which explains the differences between PennCNV and the homozygosity clustering approach.

Figure 4. — (A and B) The haplotype maps of chromosome 14q (21 834 952–22 000 629). The first figure is the haplotype map for cases (A) and second for controls (B). We used the value of D’ to create linkage disequilibrium blocks of these two haplotype maps. These two figures suggest that at least four distinct loci, in separate linkage disequilibrium blocks, are present on the 46.5 kb segment of chromosome 14q (21 834 952–21 881 469); this segment accommodates 24 consecutive significant SNPs. At least eight distinct loci, in separate linkage disequilibrium blocks, are present on the 165 kb segment of chromosome 14q (21 834 952–21 999 998); this segment accommodates all 26 significant SNPs.

We used the proposed logistic regression framework extension (shown in the Test for SNP-by-SNP Analyses on the First Stage section) to assess significance of deletions (copy number = 0 or 1 determined by PennCNV) on the top-signal region of chromosome 14q, accounting for population stratification. Our analysis showed that this region remained highly significant.

In addition, we found that nine consecutive SNPs on a 46.6 kb segment of chromosome 20 (35 438 689–35 485 260) were associated with increased RA risk with P-values of 10⁻⁶ to 10⁻⁷ (shown in bold in Table 4); five consecutive SNPs on a 2 kb segment of chromosome 2 (208 064 035–208 066 083) were associated with decreased RA risk with a P-value of 5.25 × 10⁻⁷. The proto-oncogene tyrosine-protein kinase SRC lies in the 46.6 kb segment of chromosome 20.

Additional analysis outcome of cluster-based and PennCNV methods combined

Twelve RA patients and one control commonly shared a 6.6 kb segment of deletion with copy number = 1 by PennCNV on chromosome 19p (2 060 157–2 066 790) that spans two SNP loci. This segment also lay between two adjacent significant SNPs on chromosome 19p (2 054 962–2 165 057) identified by our cluster-based method (shown on the lower second column of Table 2). This cluster of significant SNPs was not statistically significant at a corrected 0.05 nominal significance level, using a Bonferroni-type correction, by our cluster test. The 12 RA patients commonly shared a 15.4 kb segment on chromosome 19p (2 051 346–2 066 790) that spans four SNP loci. The AP3D1 gene (adaptor-related protein complex 3, delta 1) lies in this region. The Fisher's one-sided (two-sided) exact test for comparing 12/891 versus 1/601 gives a P-value of 1.17 × 10⁻² (1.98 × 10⁻²); significantly more RA cases than controls were observed on this 6.6 kb deleted segment. Supplementary Material, Table S1 provides data on the 12 identified RA patients and 1 control, including their respective affection statuses, copy numbers, deletion segment lengths, starting and ending deletion SNPs and starting and ending physical deletion positions.
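The comparison of 12/891 cases with 1/601 controls can be checked with a short sketch of Fisher's exact test written against the hypergeometric distribution. The function name `fisher_exact_2x2` is hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is an assumption, because the text does not state which two-sided convention was applied.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: deletion (copy number 0 or 1), no deletion.
    Returns (one-sided P for an excess in row 1, two-sided P).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Exact integer weight of every table sharing the observed margins.
    weights = {i: comb(row1, i) * comb(row2, col1 - i) for i in range(lo, hi + 1)}
    total = comb(row1 + row2, col1)
    one_sided = sum(w for i, w in weights.items() if i >= a) / total
    # Assumed two-sided rule: tables no more probable than the observed one.
    two_sided = sum(w for w in weights.values() if w <= weights[a]) / total
    return one_sided, two_sided

# Counts from the text: 12 of 891 cases and 1 of 601 controls carry the deletion.
p_one, p_two = fisher_exact_2x2(12, 891 - 12, 1, 601 - 1)
print(f"one-sided P = {p_one:.3g}, two-sided P = {p_two:.3g}")
```

With these counts the sketch gives values close to the reported 1.17 × 10⁻² and 1.98 × 10⁻². The same function applies to any row of Table 4 once the counts with copy number ≠ 0 or 1 are obtained by subtraction from 891 and 601.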

DISCUSSION

Because known RA susceptibility loci explain only a small portion of familial clustering (22) and because CNVs are abundant in humans and represent one of the least well-studied classes of genetic variants (18), we attempted to account for some of the unexplained heritability by performing a genome-wide study of association between deletions or excess homozygosity and RA risk. We analyzed high-density 550 K SNP genotype data from a genome-wide association study of RA (20). In the SNP-by-SNP analysis using our method (23), we detected the strongest association signal in the MHC region with a maximal aggregation of neighboring significant SNPs at the nominal significance level of 10⁻⁸, which encompasses known deletion variants on HLA-DRB1 and C4 genes. We observed a complex and extensive linkage disequilibrium pattern among significant SNPs in this region.

The subsequent cluster analysis is designed to detect clusters of two or more neighboring significant SNPs overly aggregated on a small chromosomal segment and to test for statistical significance of clustering. In addition to successfully detecting known deleterious deletion variants on HLA-DRB1 and C4 genes in the MHC region (shown in the second column of Table 2), we identified 4.3 and 28 kb clusters of significant SNPs on chromosomes 10p and 13q (shown in the fourth and seventh columns of Table 2) using our cluster test and scan test, which were significant at a corrected 0.05 nominal significance level, adjusted for multiple comparison procedures.

Several RA-associated alleles of modest risk sizes on new loci have been discovered in recent genome-wide association studies. We evaluated the significance status of the neighboring SNPs that encompassed these associated alleles, including PTPN22, STAT4, CTLA4, REL, HLA-DRB1, TNFAIP3, BLK, TRAF1-C5, PRKCQ and CD40. For each associated locus, we evaluated 100 adjacent SNPs (50 SNPs on each side of the locus) from the SNP-by-SNP analysis outcome. Thirty-two significant SNPs encompassed HLA-DRB1; 4 encompassed C4 and 1 (rs2572386) lay 114 kb from BLK. Given a complex and extensive linkage disequilibrium pattern in the MHC region, it may not be surprising that many significant SNPs neighbor HLA-DRB1. Further fine-mapping studies are required to determine whether additional risk deletion variants exist besides HLA-DRB1 and C4 in the MHC region.

Independently, we performed PennCNV analyses and obtained whole-genome CNV maps for 891 RA cases and 601 controls with available intensity data. We first identified cases and controls that had chromosomal segments with copy number = 0 or 1; we then used Fisher's exact test to compare the numbers of cases and controls per SNP locus to test the statistical significance of the association between RA risk and deletions (copy number = 0 or 1 by PennCNV). In Figure 2, we present a graphical SNP-by-SNP outcome summary according to corresponding chromosomal locations with the values of –log₁₀(P-values). We identified 26 significant SNPs aggregating on chromosome 14 with P-values <10⁻⁸ and an additional 49 SNPs on chromosomes 2, 14 and 20 with P-values of 10⁻⁵–10⁻⁸. The SNPs that were found on chromosomes 2 and 14 are protective (more controls than cases); those that were found on chromosome 20 increased RA risk (more cases than controls). The 75 SNPs with P-values <10⁻⁵ are presented in Table 4.

The cluster-based and PennCNV methods are different approaches to investigating the relationships between disease status and deletion variants. The cluster-based method is structured to identify commonly shared excess homozygosity among patients with a genetic disorder, providing strong evidence that the genes in the deleted or excess homozygosity region predispose patients to the disease. It uses a two-stage design to evaluate the association with complex human traits from high-density SNP genotype data in genome-wide association studies (23). The evidence of genomic deletions that are associated with disease is further enhanced by observing successive or neighboring SNPs with excess homozygosity in cases compared with in controls in our cluster-based scheme. In contrast, the PennCNV method is an algorithm for cataloging and identifying copy numbers for individuals, using intensity data on the basis of a hidden Markov model (25). We used PennCNV to identify cases and controls that had chromosomal segments with copy number = 0 or 1 and used Fisher's exact test to assess the statistical significance of association between RA risk and deletions by comparing the numbers of cases and controls per SNP locus. The cluster-based and PennCNV methods may be sensitive to different aspects of data and observation, thus providing different information for discovering associated deletion variants or excess homozygosity in RA patients. Notably, our cluster-based method identified the strongest signals on a chromosomal segment that encompassed known deleterious deletion variants on HLA-DRB1 and C4 genes, but the PennCNV analysis did not detect statistical significance in the MHC region.

We performed another cluster-based analysis using a smaller data set of 851 RA cases and 571 controls that was included in the PennCNV analysis and was a subset of the 868 RA cases and 1194 controls in our original cluster analysis. The cluster-based method remained effective and identified the largest association signal with a maximal aggregation of 50 neighboring significant SNPs in the MHC region. Supplementary Material, Figure S1 displays a graphical summary of outcomes of the genome-wide association scan between deletion variants or excess homozygosity and RA risk; SNPs are plotted, according to corresponding chromosomal locations, with the values of –log₁₀(P-values) on the basis of this smaller data set. A smaller data set is not likely to be the major reason that the PennCNV method failed to detect known deleterious deletion variants in the MHC region.

The cluster-based method also detected a segment on chromosome 19p (2 054 962–2 165 057) that was encompassed by two adjacent significant SNPs but was not statistically significant at a corrected 0.05 nominal significance level, using a Bonferroni-type correction, by our cluster test (shown on the lower second column of Table 2). The PennCNV analysis identified 12 RA patients and 1 control that commonly shared a 6.6 kb segment of copy number = 1 on chromosome 19p (2 060 157–2 066 790) that lay in the segment that was described by our cluster-based approach. We used Fisher's one-sided (two-sided) exact test for comparing cases (12/891) and controls (1/601) and obtained a P-value of 1.17 × 10⁻² (1.98 × 10⁻²): significantly more RA cases than controls were observed on this 6.6 kb chromosomal segment. Supplementary Material, Table S1 presents detailed information on these 13 individuals and their respective deletion segments. Several experimental methods are available to validate deletion variants or excess homozygosity, such as fluorescent in situ hybridization, two-color fluorescence intensity, PCR amplification and quantitative PCR. Biological confirmation and molecular validation on the top-signal chromosomal segments detected by the cluster and PennCNV analyses, including those on chromosomes 10p, 13q, 14q and 19p, are warranted in the future.

In this study, we (i) used our cluster-based method to perform a whole-genome scan of disease-associated deletions or excess homozygosity and identified novel 4.3 and 28 kb clusters on chromosomes 10p and 13q, respectively, at a corrected 0.05 nominal significance level; (ii) used PennCNV and Fisher's exact test to independently perform a whole-genome analysis of association with deletion variants and identified 26 significant SNPs that were overly aggregated on a 165 kb segment of chromosome 14q at a nominal significance level of 10⁻⁸; (iii) identified 12 RA cases and 1 control that commonly shared a 6.6 kb segment with copy number = 1, determined by PennCNV, on chromosome 19p, a segment that was also identified by our cluster-based method; and (iv) proposed a novel logistic regression method to perform additional analyses for deletions and excess homozygosity, accounting for population stratification.

In contrast to the design of genome-wide association studies in which a point-wise approach is used to find individual disease-associated SNPs, segment-wise approaches are generally used to discover small chromosomal CNV segments. Existing SNP-based approaches and algorithms, including our cluster-based method, are structured to identify deletion variants or excess homozygosity through observing aberrant SNP patterns in a run of consecutive SNPs (4,6,23–25). If we find statistically significant evidence of excess homozygosity at individual SNPs for SNP-by-SNP analyses, we use the cluster-based statistical approach to combine information from multiple neighboring SNPs and find a run of tightly adjacent significant SNPs associated with a disease of interest. In this report, we also provide a strategy and analytical framework that can be used, at no additional cost, to detect disease-associated deletion variants or excess homozygosity and identify individual patients with commonly shared disease-associated deletion variants, using SNP and intensity data from a genome-wide association study.

In addition to unbalanced structural variants, low-frequency and rare variants may explain a portion of the missing heritability of many common human diseases. The high-density SNP genotype data in genome-wide association studies are more likely to capture common CNVs than are low-frequency ones. Furthermore, early commercial SNP array platforms were designed to be biased against SNP genotyping near CNV regions. These factors may limit the sensitivity and scope of SNP-based CNV association studies. However, newer generations of SNP arrays have been designed to eliminate much of the bias against capturing genomic segments affected by CNVs and provide higher-resolution maps of CNVs, enabling more effective and efficient CNV association studies using SNPs (28,29). The recent nucleotide-resolution CNV map on the basis of whole-genome DNA sequencing data will further enable robust investigation in sequencing-based CNV association studies (18).

MATERIALS AND METHODS

Study population

To evaluate the potential role of deletion variants of CNVs that influence the case–control status on a whole-genome scale, we used data from the North American Rheumatoid Arthritis Consortium, genotyped on the Illumina HumanHap550 array. The study population consisted of 868 cases and 1194 controls from North America and was previously reported in a genome-wide association study of RA susceptibility loci (20). All patients were anti-CCP-positive and met the criteria for RA adopted by the American College of Rheumatology in 1987. Cases and controls were self-reported as white. Genotyping was performed with the Infinium HumanHap550 SNP assay (Illumina), and 545 080 SNPs were genotyped in samples from cases and controls. SNPs were filtered on the basis of genotype call rate (>95% completeness), minor allele frequency (>0.01) and Hardy–Weinberg proportion (P ≥ 10⁻⁵). Patients and controls whose percentages of missing genotypes were >5%, who had non-European ancestry, who were related, or who had evidence of DNA contamination were removed from the analysis. Written informed consent was obtained from all subjects who provided blood samples, in accordance with protocols approved by the local institutional review boards. More details of the sample collection are described elsewhere (20).

SNP-based statistical method in a two-stage design

Current molecular technologies and SNP genotyping methods have technical challenges that result in relatively limited resolutions; they are not capable of effectively identifying and cataloging CNVs in whole-genome array scans. CNVs and genomic deletions in particular can perturb the collection of SNP genotype data in CNV regions, causing SNP intensity data to cluster poorly and SNP genotypes in the hemizygous deletion regions to be observed as homozygous for the present allele (4,6,24).

We recently proposed and developed a statistical method that uses a two-stage design to detect deletion variants or excess homozygosity that are associated with complex human traits from high-density SNP genotype data in genome-wide association studies. The method was designed for single-SNP analyses on the first stage and utilized evidence from multiple adjacent SNPs combined with a cluster-based approach on the second stage in case–control studies (23). SNP-based methods, including our cluster-based method, are not capable of effectively distinguishing between homozygosity and deletions. The identification of excess homozygosity regions in multiple cases forms the basis of our method. It was structured to detect commonly shared deletion variants or excess homozygosity among patients with a genetic disorder, providing strong evidence that the genes in the deleted region predispose patients to the disease.

Test for SNP-by-SNP analyses on the first stage

We compared the level of homozygosity on each contiguous SNP locus by using normal approximations to test the significance of differences in homozygosity proportions between cases and controls on the first stage. This test infers the presence of genomic deletions associated with disease by assessing the statistical significance of higher homozygosity proportions in cases than in controls. Letting $\hat{p}_1$ and $\hat{p}_2$ be the respective estimates of homozygosity proportions in cases and controls at a single SNP locus and $\hat{p}$ be their weighted average, the normal deviate Z is based on the difference in proportion quantities, $\hat{p}_1 - \hat{p}_2$, divided by its standard error, $\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$, where $n_1$ and $n_2$ represent the sample sizes of cases and controls, respectively. This z-score test can be performed on each contiguous SNP locus along the whole human genome.
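The first-stage test can be sketched as a standard pooled two-proportion z-test, with the one-sided P-value taken from the upper tail because the alternative of interest is higher homozygosity in cases. The function name and the homozygote counts in the example are hypothetical; only the sample sizes (868 cases, 1194 controls) come from the study.

```python
from math import erfc, sqrt

def homozygosity_z_test(hom_cases, n_cases, hom_controls, n_controls):
    """Pooled two-proportion z-test for excess homozygosity in cases.

    Returns (z, one-sided P-value for a higher proportion in cases).
    """
    p1 = hom_cases / n_cases
    p2 = hom_controls / n_controls
    # Weighted average of the two proportions (pooled estimate).
    p = (hom_cases + hom_controls) / (n_cases + n_controls)
    se = sqrt(p * (1 - p) * (1 / n_cases + 1 / n_controls))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value

# Hypothetical homozygote counts at a single SNP locus.
z, p_value = homozygosity_z_test(500, 868, 600, 1194)
print(f"z = {z:.2f}, one-sided P = {p_value:.2e}")
```

Running this function over every SNP locus in map order yields the per-locus significance flags that the second-stage cluster analysis takes as input.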

The above method does not account for covariates in the model (e.g. eigenvectors for population stratification, age and sex). Therefore, we considered a logistic regression framework extension of this approach to assess significance of excess homozygosity or CNV as follows: log[Pr(individual is a case)/Pr(individual is a control)] = b₀ + b₁ × x + b₂ × eigenvectors + b₃ × covariates, where x is the indicator of homozygosity status (or copy number = 0, 1) at a SNP locus for an individual; b₂ is a vector with the same dimension as the number of eigenvectors adjusted for population stratification. In our analyses, we adjusted for the top four significant eigenvectors as performed in the original genome-wide association study (20).

Test for cluster analyses on the second stage

Evidence of disease-associated genomic deletions can be strengthened by observing successive or neighboring SNPs with excess homozygosity in cases compared with controls in our cluster-based scheme. In addition, such clustering can delineate the extent of the minimal regions of common genomic deletions among patients, indicating the critical region of disease. Our cluster test is useful for subsequent and further investigations into the outcomes of SNP-by-SNP analyses on the first stage and is designed to assess the statistical significance of multiple clusters of SNPs with excess homozygosity in cases compared with controls.

Suppose that T SNP loci over a chromosomal region are tested using the z-score test for SNP-by-SNP analyses in which k SNP loci have significantly higher homozygosity proportions in cases than in controls. Consider the frequency of significant SNP loci occurring within a narrow segment of interest compared with the frequency of significant SNP loci over the whole region. Suppose that the narrow segment of interest encompasses w SNP loci, among which x SNP loci have significant excess homozygosity in cases aggregating in this segment. What interests us is to determine whether the observation of x significant SNP loci in the segment that contains w SNP loci is statistically significant compared with the occurrence of k significant SNP loci over T SNP loci. Assuming that each of the T SNP loci tested is independently and equally likely to have a significant excess homozygosity proportion in cases and assuming that X represents the number of significant SNP loci within the segment that contains w SNP loci, the statistical test for cluster analyses is based on the random variable X with a binomial distribution. The P-value formula for this cluster test under the null hypothesis of random allocations of significant SNP loci is expressed as follows:

(1)

where x represents the observed number of significant SNP loci within the segment that contains w SNP loci. A small P-value of expression (1) indicates that the occurrence of x significant SNP loci aggregating in a w-SNP interval cannot be explained by chance alone. Our cluster test is an exact statistical test and has proved to be useful and robust in the presence of linkage disequilibrium (23).
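As an illustration of the quantities T, k, w and x, the sketch below computes an upper-tail binomial probability under one reading of the stated assumptions: each of the w loci in the segment is independently significant with probability k/T, so that X follows a Binomial(w, k/T) distribution. This parameterization is an assumption made for illustration and may differ from the exact form of expression (1); the function name and the example values are hypothetical.

```python
from math import comb

def cluster_p_value(x, w, k, T):
    """P(X >= x) for X ~ Binomial(w, k/T).

    x: observed number of significant SNP loci in the segment
    w: number of SNP loci in the segment
    k: number of significant SNP loci in the whole region
    T: number of SNP loci tested in the whole region
    """
    if not (0 <= x <= w <= T and 0 <= k <= T):
        raise ValueError("require 0 <= x <= w <= T and 0 <= k <= T")
    p = k / T  # assumed per-locus probability of being significant
    return sum(comb(w, i) * p**i * (1 - p) ** (w - i) for i in range(x, w + 1))

# Hypothetical region: 20 significant loci among 1000, 3 of them in a 10-SNP segment.
print(f"cluster P = {cluster_p_value(3, 10, 20, 1000):.2e}")
```

A small value indicates that x significant loci in a w-SNP segment is more aggregation than the region-wide rate k/T would produce by chance.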

Detection of excess homozygosity regions across multiple adjacent or neighboring SNPs forms the basis of this method for cluster analyses. Related statistical methods have been developed for detecting temporal and space-time clusters or anomalies of disease in epidemiology studies (30–33). Our cluster test is designed to assess the statistical significance of multiple clusters of SNP loci with excess homozygosity in cases compared with in controls. In contrast, the scan test is structured to detect the largest cluster. The scan test employs a moving window of pre-determined length and finds the maximum number of cases revealed through the window as it slides over the entire region (34). When only the largest cluster is being assessed or only one observed cluster is present, the scan test is useful. In the case of applications to the human genome, we often find more than one aggregation of neighboring significant SNPs on a chromosome. It is noteworthy that the cluster test in expression (1) gives exact P-values; the P-value formulae of the scan test provide approximate results in most situations. In this report, we provide a P-value for each cluster of significant SNPs in our cluster test and a P-value for the largest cluster of significant SNPs in the scan test on a chromosome arm.
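The moving-window idea behind the scan test can be sketched as follows. The example computes only the scan statistic itself, the maximum number of significant SNP loci seen in any window, and does not implement the approximate P-value formulae of reference (34). Measuring the window in number of consecutive SNP loci (rather than physical distance) is an assumption, and the function name and flags are hypothetical.

```python
def scan_statistic(significant, window):
    """Maximum count of significant loci in any window of consecutive SNPs.

    significant: sequence of 0/1 flags ordered by chromosomal position
    window: window length in number of SNP loci (assumed unit)
    Returns (maximum count, start index of the first window attaining it).
    """
    n = len(significant)
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and the number of loci")
    count = sum(significant[:window])
    best, best_start = count, 0
    for start in range(1, n - window + 1):
        # Slide by one locus: add the entering flag, drop the leaving one.
        count += significant[start + window - 1] - significant[start - 1]
        if count > best:
            best, best_start = count, start
    return best, best_start

# Hypothetical first-stage outcome for ten consecutive SNP loci.
flags = [0, 1, 0, 0, 1, 1, 1, 0, 0, 1]
print(scan_statistic(flags, 3))
```

Because only the single largest window is reported, a chromosome arm with several separate aggregations still produces one scan statistic, which is the limitation that motivates assessing each cluster separately with the cluster test.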

PennCNV method

PennCNV is an algorithm for identifying and cataloging copy numbers for individuals in a hidden Markov model framework: a statistical method that models a Markov process in which the probability of observing a state only depends on the states at previous time points (25). PennCNV uses a first-order hidden Markov model to account for dependence structure between hidden copy numbers at nearby SNPs: the hidden copy number state at each SNP depends only on the copy number state at the immediately preceding SNP. PennCNV integrates multiple sources of information, including total signal intensity, allelic signal intensity ratio, population SNP allele frequency and distance between neighboring SNPs. It was used to experimentally validate and fine-map CNVs in the FBXL7, EYA1 and CTDSPL genes (25). Instead of three distinct states of ‘loss’, ‘normal’ and ‘gain’, PennCNV uses a 6-state definition to model copy numbers from 0 to 4 and copy neutral loss of heterozygosity. PennCNV calculates the probabilities of all six states at each SNP locus and calls copy numbers from the most likely state sequence.
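The notion of calling copy numbers from the most likely state sequence of a first-order hidden Markov model can be illustrated with a toy Viterbi decoder. This is not PennCNV's model: it uses two hidden states instead of six, discretized intensity symbols instead of continuous signal intensity and allelic ratio, and every probability below is invented for illustration.

```python
from math import log

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence under a first-order HMM (log space)."""
    # score[s]: best log-probability of any path ending in state s.
    score = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointer = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + log(trans_p[r][s]))
            new_score[s] = score[prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]

# Toy parameters (all hypothetical): "low" intensity is more likely under a deletion.
states = ["normal", "deletion"]
start_p = {"normal": 0.9, "deletion": 0.1}
trans_p = {"normal": {"normal": 0.95, "deletion": 0.05},
           "deletion": {"normal": 0.10, "deletion": 0.90}}
emit_p = {"normal": {"mid": 0.9, "low": 0.1},
          "deletion": {"mid": 0.2, "low": 0.8}}
intensities = ["mid", "mid", "low", "low", "low", "mid", "mid"]
print(viterbi(intensities, states, start_p, trans_p, emit_p))
```

In this toy setting the three consecutive low-intensity SNPs are decoded as a deletion segment flanked by normal copy number, which mirrors how a run of consistent evidence across neighboring SNPs, rather than any single SNP, drives a copy-number call.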

SUPPLEMENTARY MATERIAL

Supplementary Material is available at HMG online.

FUNDING

The research presented in this manuscript was partially supported by U.S. NIH/NCI grant R03 CA143979 to C.C.W. and NIH grants AR44422 and the Human Pedigree Analysis Resource of P30 CA016772 to C.I.A.

ACKNOWLEDGEMENTS

Dr Peter Gregersen has an extensive publication history focusing on elucidating the mechanisms of action of genetic factors in causing rheumatoid arthritis and other autoimmune conditions. Dr Annette Lee has extensive experience in characterizing genetic contributions to autoimmune diseases. We thank Drs Lee and Gregersen for assisting in the presented research by making data available for this study.

Conflicts of interest statement: None declared.

REFERENCES

1. Conrad D.F., Pinto D., Redon R., Feuk L., Gokcumen O., Zhang Y., Aerts J., Andrews T.D., Barnes C., Campbell P., et al. Origins and functional impact of copy number variation in the human genome. Nature. 2010;464:704–712. doi: 10.1038/nature08516.
2. Redon R., Ishikawa S., Fitch K.R., Feuk L., Perry G.H., Andrews T.D., Fiegler H., Shapero M.H., Carson A.R., Chen W., et al. Global variation in copy number in the human genome. Nature. 2006;444:444–454. doi: 10.1038/nature05329.
3. Wong K.K., deLeeuw R.J., Dosanjh N.S., Kimm L.R., Cheng Z., Horsman D.E., MacAulay C., Ng R.T., Brown C.J., Eichler E.E., Lam W.L. A comprehensive analysis of common copy-number variations in the human genome. Am. J. Hum. Genet. 2007;80:91–104. doi: 10.1086/510560.
4. Conrad D.F., Andrews T.D., Carter N.P., Hurles M.E., Pritchard J.K. A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet. 2006;38:75–81. doi: 10.1038/ng1697.
5. Hinds D.A., Kloek A.P., Jen M., Chen X., Frazer K.A. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat. Genet. 2006;38:82–85. doi: 10.1038/ng1695.
6. McCarroll S.A., Hadnott T.N., Perry G.H., Sabeti P.C., Zody M.C., Barrett J.C., Dallaire S., Gabriel S.B., Lee C., Daly M.J., Altshuler D.M. Common deletion polymorphisms in the human genome. Nat. Genet. 2006;38:86–92. doi: 10.1038/ng1696.
7. Lindsay E.A. Chromosomal microdeletions: dissecting del22q11 syndrome. Nat. Rev. Genet. 2001;2:858–868. doi: 10.1038/35098574.
8. Moreno-De-Luca D., Mulle J.G., Kaminsky E.B., Sanders S.J., Myers S.M., Adam M.P., Pakula A.T., Eisenhauer N.J., Uhas K., Weik L., et al. Deletion 17q12 is a recurrent copy number variant that confers high risk of autism and schizophrenia. Am. J. Hum. Genet. 2010;87:618–630. doi: 10.1016/j.ajhg.2010.10.004.
9. Stefansson H., Rujescu D., Cichon S., Pietilainen O.P., Ingason A., Steinberg S., Fossdal R., Sigurdsson E., Sigmundsson T., Buizer-Voskamp J.E., et al. Large recurrent microdeletions associated with schizophrenia. Nature. 2008;455:232–236. doi: 10.1038/nature07229.
10. Weiss L.A., Shen Y., Korn J.M., Arking D.E., Miller D.T., Fossdal R., Saemundsen E., Stefansson H., Ferreira M.A., Green T., et al. Association between microdeletion and microduplication at 16p11.2 and autism. N. Engl. J. Med. 2008;358:667–675. doi: 10.1056/NEJMoa075974.
11. Yu C.E., Dawson G., Munson J., D'Souza I., Osterling J., Estes A., Leutenegger A.L., Flodman P., Smith M., Raskind W.H., et al. Presence of large deletions in kindreds with autism. Am. J. Hum. Genet. 2002;71:100–115. doi: 10.1086/341291.
12. McCarroll S.A., Huett A., Kuballa P., Chilewski S.D., Landry A., Goyette P., Zody M.C., Hall J.L., Brant S.R., Cho J.H., et al. Deletion polymorphism upstream of IRGM associated with altered IRGM expression and Crohn's disease. Nat. Genet. 2008;40:1107–1112. doi: 10.1038/ng.215.
13. Willer C.J., Speliotes E.K., Loos R.J., Li S., Lindgren C.M., Heid I.M., Berndt S.I., Elliott A.L., Jackson A.U., Lamina C., et al. Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat. Genet. 2009;41:25–34. doi: 10.1038/ng.287.
14. Pelak K., Need A.C., Fellay J., Shianna K.V., Feng S., Urban T.J., Ge D., De L.A., Martinez-Picado J., Wolinsky S.M., et al. Copy number variation of KIR genes influences HIV-1 control. PLoS Biol. 2011;9:e1001208. doi: 10.1371/journal.pbio.1001208.
15. Gamazon E.R., Nicolae D.L., Cox N.J. A study of CNVs as trait-associated polymorphisms and as expression quantitative trait loci. PLoS Genet. 2011;7:e1001292. doi: 10.1371/journal.pgen.1001292.
16. Craddock N., Hurles M.E., Cardin N., Pearson R.D., Plagnol V., Robson S., Vukcevic D., Barnes C., Conrad D.F., Giannoulatou E., et al. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3000 shared controls. Nature. 2010;464:713–720. doi: 10.1038/nature08979.
17. McCarroll S.A. Extending genome-wide association studies to copy-number variation. Hum. Mol. Genet. 2008;17:R135–R142. doi: 10.1093/hmg/ddn282.
18. Mills R.E., Walter K., Stewart C., Handsaker R.E., Chen K., Alkan C., Abyzov A., Yoon S.C., Ye K., Cheetham R.K., et al. Mapping copy number variation by population-scale genome sequencing. Nature. 2011;470:59–65. doi: 10.1038/nature09708.
19. Gregersen P.K., Behrens T.W. Genetics of autoimmune diseases—disorders of immune homeostasis. Nat. Rev. Genet. 2006;7:917–928. doi: 10.1038/nrg1944.
20. Plenge R.M., Seielstad M., Padyukov L., Lee A.T., Remmers E.F., Ding B., Liew A., Khalili H., Chandrasekaran A., Davies L.R., et al. TRAF1-C5 as a risk locus for rheumatoid arthritis—a genomewide study. N. Engl. J. Med. 2007;357:1199–1209. doi: 10.1056/NEJMoa073491.
21. Wordsworth P., Bell J. Polygenic susceptibility in rheumatoid arthritis. Ann. Rheum. Dis. 1991;50:343–346. doi: 10.1136/ard.50.6.343.
22. Stahl E.A., Raychaudhuri S., Remmers E.F., Xie G., Eyre S., Thomson B.P., Li Y., Kurreeman F.A., Zhernakova A., Hinks A., et al. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat. Genet. 2010;42:508–514. doi: 10.1038/ng.582.
23. Wu C.C., Shete S., Chen W.V., Peng B., Lee A.T., Ma J., Gregersen P.K., Amos C.I. Detection of disease-associated deletions in case-control studies using SNP genotypes with application to rheumatoid arthritis. Hum. Genet. 2009;126:303–315. doi: 10.1007/s00439-009-0672-3.
24. Kohler J.R., Cutler D.J. Simultaneous discovery and testing of deletions for disease association in SNP genotyping studies. Am. J. Hum. Genet. 2007;81:684–699. doi: 10.1086/520823.
25. Wang K., Li M., Hadley D., Liu R., Glessner J., Grant S.F., Hakonarson H., Bucan M. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007;17:1665–1674. doi: 10.1101/gr.6861907.
26. Beck S., Trowsdale J. Sequence organisation of the class II region of the human MHC. Immunol. Rev. 1999;167:201–210. doi: 10.1111/j.1600-065x.1999.tb01393.x.
27. Spies T., Sorrentino R., Boss J.M., Okada K., Strominger J.L. Structural organization of the DR subregion of the human major histocompatibility complex. Proc. Natl Acad. Sci. U S A. 1985;82:5165–5169. doi: 10.1073/pnas.82.15.5165.
28. Korn J.M., Kuruvilla F.G., McCarroll S.A., Wysoker A., Nemesh J., Cawley S., Hubbell E., Veitch J., Collins P.J., Darvishi K., et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat. Genet. 2008;40:1253–1260. doi: 10.1038/ng.237.
29. McCarroll S.A., Kuruvilla F.G., Korn J.M., Cawley S., Nemesh J., Wysoker A., Shapero M.H., de Bakker P.I., Maller J.B., Kirby A., et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat. Genet. 2008;40:1166–1174. doi: 10.1038/ng.238.
30. Grimson R.C. Disease clusters, exact distributions of maxima, and P-values. Stat. Med. 1993;12:1773–1794. doi: 10.1002/sim.4780121906.
31. Grimson R.C., Mendelsohn S. A method for detecting current temporal clusters of toxic events through data monitoring by poison control centers. J. Toxicol. Clin. Toxicol. 2000;38:761–765. doi: 10.1081/clt-100102389.
32. Wu C.C., Grimson R.C., Amos C.I., Shete S. Statistical methods for anomalous discrete time series based on minimum cell count. Biom. J. 2008;50:86–96. doi: 10.1002/bimj.200610374.
33. Wu C.C., Grimson R.C., Shete S. Exact statistical tests for heterogeneity of frequencies based on extreme values. Commun. Stat. Simul. 2010;39:612–623. doi: 10.1080/03610910903528335.
34. Wallenstein S., Neff N. An approximation for the distribution of the scan statistic. Stat. Med. 1987;6:197–207. doi: 10.1002/sim.4780060212.

Supplementary Materials

Supplementary Data

supp_22_6_1249__index.html (1.1 KB, html)

supp_dds512_dds512supp.doc (705.5 KB, doc)

supp_dds512_dds512supp_file.xlsx (47.8 KB, xlsx)
