Author manuscript; available in PMC: 2023 Apr 1.
Published in final edited form as: Genet Epidemiol. 2022 Feb 16;46(3-4):145–158. doi: 10.1002/gepi.22444

Integrating External Controls in Case-Control Studies Improves Power for Rare-Variant Tests

Yatong Li 1, Seunggeun Lee 1,2
PMCID: PMC9393083  NIHMSID: NIHMS1775772  PMID: 35170803

Abstract

Large-scale sequencing and genotyping data provide an opportunity to integrate external samples as controls to improve power of association tests. However, due to the systematic differences between genotyped samples from different studies, naively aggregating the controls could lead to inflation in type I error rates. There has been recent effort to integrate external controls while adjusting for batch effect, such as the integrating External Controls into Association Test (iECAT) and its score-based single variant tests. Building on the original iECAT framework, we propose an iECAT-Score region-based test that increases power for rare-variant tests when integrating external controls. This method assesses the systematic batch effect between internal and external samples at each variant and constructs compound shrinkage score statistics to test for the joint genetic effect within a gene or a region, while adjusting for covariates and population stratification. Through simulation studies, we demonstrate that the proposed method controls for type I error rates and improves power in rare-variant tests. The application of the proposed method to the association studies of age-related macular degeneration (AMD) from the International AMD Genomics Consortium (IAMDGC) and UK Biobank revealed novel rare-variant associations in gene DXO. Through the incorporation of external controls, the iECAT methods offer a powerful suite to identify disease-associated genetic variants, further shedding light on future directions to investigate roles of rare variants in human diseases.

Keywords: External control, case-control study, rare variant test, SKAT, GWAS

1. Introduction

Recent advances in genotyping and sequencing technologies have enabled progressively larger sequencing and genotyping projects to identify disease-associated rare variants (Cruchaga et al., 2014). For instance, the Michigan Genomics Initiative has collected genotype data from over 66,000 unrelated individuals within Michigan Medicine; the UK Biobank has produced genome-wide genotype data on approximately 500,000 individuals from the United Kingdom. This rapid increase in the number of genotyped individuals provides a unique opportunity to develop methodologies that leverage large-scale sequencing and genotyping projects, whose data are publicly available, as additional control samples to increase the power of rare-variant association testing in case-control studies.

When combining controls from external studies, systematic batch effects between genotype data from different studies are likely to exist due to differences in sequencing platforms, genotype calling procedures, and population stratification. If left unaddressed, these batch effects can inflate type I error rates. Several recent methodological developments have attempted to address the systematic differences between genotype data from internal and external sources, most of which directly or indirectly use sequence read data. Derkach et al. (Derkach et al., 2014) developed a score test that replaces the called genotype with an expected genotype. The calculation of the expected genotype requires known read depths, base-calling error rates of the sequencing platform, and prior knowledge of allele frequencies. Using the expected genotype accounts for several factors that contribute to the systematic batch effect between genotype data from different studies and thus reduces inflation in type I error rates, but its calculation can be challenging if the posterior genotype likelihoods are not provided in the genotype VCF files. In addition, because it considers the retrospective setting, this method does not allow for covariate adjustment. Extending Derkach's method, Chen and Lin (Chen & Lin, 2018) proposed regression calibration (RC)-based and maximum-likelihood (ML)-based methods to account for differential sequencing errors between cases and controls; these methods allow for effect size estimation, under the assumption that weak confounding from population stratification is the only potential confounder. Hu et al. (Hu, Liao, Johnston, Allen, & Satten, 2016) proposed a likelihood-based method that directly models sequencing reads without calling the genotypes. This method first estimates the single nucleotide variant (SNV) locations and then applies a burden-type test to assess the significance of the association between an SNV and a trait. Hendricks et al. (Hendricks et al., 2018) proposed ProxECAT, which uses allele counts from genotype data to estimate enrichment of rare variants in external controls. Although this method does not require genotype probabilities or sequence read data to be available, it does not include internal control samples in the analyses as a baseline reference and results in consistent inflation in type I error rates. Thus, the authors suggest using a more conservative significance level, which potentially limits the power of the association test.

To address the shortcomings of the above methods, we recently proposed a novel score-based test, the iECAT-Score test, that uses genotype data to integrate external control samples into association tests. We built upon the original iECAT test developed by Lee et al. (Lee, Kim, & Fuchsberger, 2017), which assesses the batch effect and includes external control samples using allele counts from genotype data, and developed a score test that further allows for covariate adjustment. Compared to the iECAT test in its originally presented format, the score tests are more stable and computationally efficient, and they allow adjustment for covariates and population stratification. By applying recent improvements to score tests, including the saddlepoint approximation (SPA) (Dey, Schmidt, Abecasis, & Lee, 2017) and efficient resampling (ER) (Lee, Fuchsberger, Kim, & Scott, 2015) methods, the iECAT-Score test protects the type I error rate in scenarios of case-control imbalance and low minor allele count (MAC).

The iECAT-Score test we previously proposed tests the association between a single variant and disease status. It controls type I error rates while adding samples from external studies to improve the power of association tests. However, in rare-variant association testing, single-variant tests are often underpowered to identify causal rare variants. Hence, in this work, we extend the single-variant iECAT-Score test to burden (B. Li & Leal, 2008), SKAT (Wu et al., 2011) and SKAT-O (Lee, Wu, & Lin, 2012) type tests to test for the combined genetic effects in a gene or region. Similar to the burden, SKAT and SKAT-O tests used in the usual case-control study setting, the iECAT-Score region test aggregates the single-variant test statistics using a weighted linear (burden) or quadratic (SKAT) sum, or a linear combination of both (SKAT-O). Association between the rare variants in the region and the phenotype is then assessed by comparing the compound statistic to a specified distribution under the null hypothesis of no genetic effect.

We organize this article as follows. We first introduce the model for the rare-variant association test in case-control studies using burden, SKAT, and SKAT-O tests, and propose the iECAT-Score region tests that allow for integration of external control samples in case-control association tests. We then describe the simulation studies to assess the type I error rates and power of our proposed methods, as well as their applications to the association studies of age-related macular degeneration (AMD) combining data from the International AMD Genomics Consortium (IAMDGC) (Fritsche et al., 2015) and the UK Biobank (Bycroft et al., 2018). Finally, we present the results from simulation studies and data analyses of the proposed methods, discuss our findings, and provide guidelines for integrating external control samples in case-control studies.

2. Materials and Methods

2.1. iECAT-Score Region-based association test

The iECAT-Score region-based test is a shrinkage-estimation-based test for aggregated genetic effects within a genomic region. At each variant within the region, the iECAT-Score test assesses the batch effect between internal and external control samples and constructs a shrinkage estimator to assess the single-variant genetic effect. The iECAT-Score region-based test then groups the single-variant test statistics to test for association between the joint effect of variants in a region and disease status, using burden, SKAT, and SKAT-O type methods.

Region-Based Test for Genetic Effect

To present the model for a region-based test of the aggregated genetic effect within a gene or a region, we first consider a scenario with no external controls. Consider the internal study of sample size $n$. For subject $i$, let $Y_i = 0/1$ be the dichotomous phenotype for control/case; $X_i = (X_{i1}, X_{i2}, \dots, X_{ip})$ is the covariate vector of length $p$; $G_i = (G_{i1}, G_{i2}, \dots, G_{im})$ is the vector of genotypes at $m$ variants within a region. Hence, $G = (G_1, G_2, \dots, G_n)^T$ is the $n \times m$ genotype matrix for the $n$ subjects at $m$ variants. To relate the phenotype $Y$, the covariates $X$, and the genotype $G$, we consider the following logistic regression model

$$\operatorname{logit}\Pr(Y_i = 1 \mid X_i, G_i) = X_i\alpha + G_i\beta \qquad (1)$$

where the phenotype $Y_i$ follows a Bernoulli distribution, $\alpha$ is a $p \times 1$ vector of coefficients for the covariates, and $\beta = (\beta_1, \beta_2, \dots, \beta_m)^T$ are the regression coefficients for the $m$ variants in a region. We assume that $\beta$ is a random vector with $E(\beta_j) = 0$, $\mathrm{Var}(\beta_j) = w_j^2\tau$, where $w_j$ is the weight assigned to variant $j$, and $\mathrm{corr}(\beta_j, \beta_k) = \rho$ for $j \neq k$, $j, k \in \{1, 2, \dots, m\}$. To test for the association between the phenotypes $Y_i$ and the genotypes within the region, we test the null hypothesis $H_0: \beta = 0$ in Equation (1).

Within the region of $m$ variants, the score test statistic at variant $j$ is given by $S_j = \sum_{i=1}^{n} G_{ij}(Y_i - \hat\mu_i)$, where $\hat\mu_i$ is the maximum likelihood estimate of $\mu_i$ with $\mu = \{\mu_i\} = \{\Pr(Y_i = 1 \mid X_i)\}$ under $H_0$. To test the null hypothesis under the assumption that $\tau = 0$, the burden- and SKAT-type score test statistics can be constructed as

$$Q_B = \Big(\sum_{j=1}^{m} w_j S_j\Big)^2, \quad \text{and} \quad Q_S = \sum_{j=1}^{m} w_j^2 S_j^2,$$

where $w_j$ is the weight assigned to variant $j$ (Wu et al., 2011). The omnibus SKAT-O type test takes the form of a weighted sum of the burden and SKAT test statistics and can be constructed as

$$Q_\rho = (1-\rho)\,Q_S + \rho\,Q_B,$$

where $\rho$ is a parameter between 0 and 1 (Lee et al., 2012).

Under the null hypothesis of no genetic effect, the weighted score vector $WS$, with $S = (S_1, S_2, \dots, S_m)'$, approximately follows a multivariate Gaussian distribution with mean zero and variance $\Phi = W\Sigma W$, where $W = \mathrm{diag}(w_j)$ is the diagonal weight matrix and $\Sigma$ is the covariance matrix of $S$. The covariance matrix is given by $\Sigma = A^{1/2} C A^{1/2}$, with $A = \mathrm{diag}(\mathrm{Var}(S_1), \mathrm{Var}(S_2), \dots, \mathrm{Var}(S_m))$ and $C$ an $m \times m$ correlation matrix of the $m$ variants. The statistic $Q_\rho$ approximately follows a mixture of chi-square distributions under the null hypothesis, and the Davies method (Davies, 1980) can be applied to obtain the p value. As the parameter $\rho$ is unknown, the SKAT-O test adaptively takes the minimum p value over a grid of $\rho$ values, in effect searching for the $\rho$ that maximizes power.
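To make the mixture of chi-square null distribution concrete, the following Python sketch approximates a tail probability by simulation. This is not the Davies method used in the paper: Monte Carlo is swapped in only because it needs no external library, and it cannot resolve p values as small as 2.5e-06 without far more draws. The function name `mixture_chisq_pvalue_mc`, the mixture weights passed in, and the seed are hypothetical.

```python
import random

def mixture_chisq_pvalue_mc(q_obs, mixture_weights, n_draws=20000, seed=1):
    """Monte Carlo estimate of P(sum_j lambda_j * chi2_1 >= q_obs).

    Illustrative stand-in for the Davies method; mixture_weights are the
    lambda_j of the mixture and are assumed to be given.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        # Each term is lambda_j times the square of a standard normal draw.
        q = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in mixture_weights)
        if q >= q_obs:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_draws + 1)

# With a single weight of 1 the mixture is a chi-square with 1 df,
# whose upper 5% point is about 3.84.
p_example = mixture_chisq_pvalue_mc(3.841459, [1.0])
p_trivial = mixture_chisq_pvalue_mc(0.0, [0.5, 2.0])
```

The single-weight case gives a quick sanity check against a known quantile; for genome-wide thresholds an analytic method such as Davies is the practical choice.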

iECAT-Score Region-Based Test

We introduce external control samples to the internal study samples that consist of cases and controls. To each variant, we apply the single-variant iECAT-Score method (Y. Li & Lee, 2021), which integrates external controls to improve power. Let $n_{1I}$, $n_{0I}$, $n_{0E}$ denote the sample sizes of internal cases, internal controls, and external controls, respectively. Similar to the notation in model (1), we let $Y_i = 0/1$ $(i = 1, 2, \dots, n_{1I} + n_{0I} + n_{0E})$ be the dichotomous phenotype for control/case.

At variant $j$ within a region, the score statistic that tests for the association between variant $j$ and the phenotype using exclusively internal samples is given by $S_{int,j} = G_{int,j}^T (Y_{int} - \hat\mu_{int})$. In this equation, $G_{int,j} = (G_{int,1,j}, G_{int,2,j}, \dots, G_{int,(n_{1I}+n_{0I}),j})^T$ is the vector of genotypes at variant $j$ of internal samples; similarly, $Y_{int} = (Y_{int,1}, Y_{int,2}, \dots, Y_{int,(n_{1I}+n_{0I})})^T$ is the vector of phenotypes of internal samples, and $\hat\mu_{int} = (\hat\mu_{int,1}, \hat\mu_{int,2}, \dots, \hat\mu_{int,(n_{1I}+n_{0I})})^T$ is the vector of maximum likelihood estimates of $\mu_{int}$ under the null logistic regression model of no genetic effect built using internal samples only, as in model (1). When external control samples are included assuming no systematic differences between internal and external studies, we construct a score statistic at variant $j$ as $S_{all,j} = G_{all,j}^T (Y_{all} - \hat\mu_{all})$. In this equation, $G_{all,j}$, $Y_{all}$ and $\hat\mu_{all}$ are vectors of length $n_{1I} + n_{0I} + n_{0E}$, denoting genotypes at variant $j$, phenotypes, and expected mean outcome under a null model built on the combined internal and external samples.
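As a small illustration of the two score statistics just defined, the sketch below computes them on toy data. It assumes an intercept-only null model (no covariates), so the fitted mean is simply the case fraction in the sample being analyzed; the function name `score_statistic` and the genotype values are hypothetical.

```python
def score_statistic(genotypes, phenotypes):
    """S = sum_i G_i * (Y_i - mu_hat) at a single variant.

    Assumption for this sketch: no covariates, so the null logistic model
    has only an intercept and mu_hat is the sample case fraction.
    """
    mu_hat = sum(phenotypes) / len(phenotypes)
    return sum(g * (y - mu_hat) for g, y in zip(genotypes, phenotypes))

# Toy minor-allele counts at one variant (hypothetical data).
g_cases = [1, 2]
g_internal_controls = [0, 0]
g_external_controls = [0, 1, 0, 0]

# Internal samples only: cases coded 1, internal controls coded 0.
g_int = g_cases + g_internal_controls
y_int = [1] * len(g_cases) + [0] * len(g_internal_controls)
s_int = score_statistic(g_int, y_int)

# Combined samples: external controls are appended as additional controls.
g_all = g_int + g_external_controls
y_all = y_int + [0] * len(g_external_controls)
s_all = score_statistic(g_all, y_all)
```

Because the case fraction changes when external controls are added, the two statistics are on different scales, which is why the compound statistic needs a sample-size adjustment.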

Using a similar approach to the single-variant iECAT-Score method, we quantify the level of batch effect between internal and external control samples at each variant within a region. Specifically, we test for an association between each genetic variant and whether a control sample belongs to the internal or external study, while adjusting for covariates. We define a new outcome variable $\tilde Y_{IvE} = (\tilde Y_i)$, with $\tilde Y_i = 0/1$ $(i = 1, 2, \dots, n_{0I} + n_{0E})$, to represent a control sample belonging to the external ($\tilde Y_i = 0$) or internal ($\tilde Y_i = 1$) study. Let $X_i = (X_{i1}, X_{i2}, \dots, X_{ip})^T$ be the covariates for the $i$th subject, and $G_{IvE,j} = (G_{1,j}, G_{2,j}, \dots, G_{n_{0I}+n_{0E},j})^T$ be the genotypes at variant $j$ for the $n_{0I} + n_{0E}$ control samples. To test for the relationship between the genetic variant and the source of control samples, we consider the logistic model

$$\operatorname{logit}\Pr(\tilde Y_i = 1 \mid X_i, G_{i,j}) = X_i^T\tilde\alpha + G_{i,j}\tilde\beta$$

A score test statistic can be constructed to test the null hypothesis of no batch effect between the internal and external control samples as $S_{IvE,j} = G_{IvE,j}^T(\tilde Y_{IvE} - \tilde\mu_{IvE})$, where $\mu_{IvE} = \{\mu_{IvE,i}\} = \{\Pr(\tilde Y_i = 1 \mid X_i)\}$ and $\tilde\mu_{IvE,i}$ is the maximum likelihood estimate of $\mu_{IvE,i}$.
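The batch-effect score test can be sketched in the same way. The code below is a minimal illustration that again assumes no covariates, in which case the fitted probability of being an internal control is the internal fraction among controls and the score variance has the closed form used in the comment; `batch_effect_score` and the toy genotypes are hypothetical.

```python
def batch_effect_score(g_internal_controls, g_external_controls):
    """Score statistic and variance for internal-vs-external control status.

    Assumption: intercept-only null model, so
    Var(S) = mu * (1 - mu) * sum_i (G_i - mean(G))^2.
    """
    genotypes = list(g_internal_controls) + list(g_external_controls)
    # Source indicator: 1 for internal controls, 0 for external controls.
    source = [1] * len(g_internal_controls) + [0] * len(g_external_controls)
    n = len(genotypes)
    mu = sum(source) / n
    g_bar = sum(genotypes) / n
    s_ive = sum(g * (y - mu) for g, y in zip(genotypes, source))
    var_ive = mu * (1.0 - mu) * sum((g - g_bar) ** 2 for g in genotypes)
    return s_ive, var_ive

# Same allele frequency in both control groups: the score is zero.
s_same, var_same = batch_effect_score([1, 0], [1, 0, 1, 0])
# Minor allele seen only in internal controls: the score is positive.
s_diff, var_diff = batch_effect_score([1, 1], [0, 0, 0, 0])
```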

Following the single-variant iECAT-Score method, a compound score statistic that tests the hypothesis of no genetic effect at variant $j$ is given by

$$S_{w,j} = a\,\tau_j S_{int,j} + (1-\tau_j)\,S_{all,j} \qquad (2)$$

where $a = \dfrac{(n_{1I}+n_{0I})(n_{1I}n_{0I}+n_{1I}n_{0E})}{n_{1I}n_{0I}(n_{1I}+n_{0I}+n_{0E})}$ adjusts for the different sample sizes used to calculate $S_{int,j}$ and $S_{all,j}$, and $\tau_j = \dfrac{\tau_{1j}}{1+\tau_{1j}}$ with $\tau_{1j} = \dfrac{S_{IvE,j}^2}{\mathrm{Var}(S_{IvE,j})}$ is a variant-specific weight that reflects the level of batch effect between the internal and external control samples at variant $j$. When minor allele frequencies (MAFs) of external controls lie between those of internal cases and internal controls, and the MAFs are such that $\hat\mu_{all}^2/\hat\sigma_{all}^2 > \hat\mu_{int}^2/\hat\sigma_{int}^2$, we let $\tau_j = 0$ following Li and Lee (Y. Li & Lee, 2021). Under the null hypothesis of no genetic effect, $E(S_{w,j}) = 0$ and $\mathrm{Var}(S_{w,j})$ can be calculated using the delta method. Additionally, we update $\mathrm{Var}(S_{w,j})$ to its robust estimate by applying the saddlepoint approximation (SPA) or efficient resampling (ER) method, allowing for scenarios of unbalanced case-control ratio and low MAC.
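The shrinkage in Equation (2) can be traced with a few lines of Python. This is a sketch under stated assumptions: the fraction for the sample-size adjustment is read as (n1I + n0I)(n1I n0I + n1I n0E) / (n1I n0I (n1I + n0I + n0E)), the special rule that sets the weight to zero is omitted, and all function names and input values are hypothetical.

```python
def sample_size_adjustment(n1_int, n0_int, n0_ext):
    """Factor a that rescales S_int to the scale of S_all."""
    numerator = (n1_int + n0_int) * (n1_int * n0_int + n1_int * n0_ext)
    denominator = n1_int * n0_int * (n1_int + n0_int + n0_ext)
    return numerator / denominator

def shrinkage_weight(s_ive, var_ive):
    """tau = tau1 / (1 + tau1), with tau1 = S_IvE^2 / Var(S_IvE)."""
    tau1 = s_ive ** 2 / var_ive
    return tau1 / (1.0 + tau1)

def compound_score(s_int, s_all, s_ive, var_ive, n1_int, n0_int, n0_ext):
    """S_w = a * tau * S_int + (1 - tau) * S_all."""
    a = sample_size_adjustment(n1_int, n0_int, n0_ext)
    tau = shrinkage_weight(s_ive, var_ive)
    return a * tau * s_int + (1.0 - tau) * s_all

a_example = sample_size_adjustment(5000, 5000, 10000)
# No evidence of batch effect: the statistic reduces to S_all.
s_no_batch = compound_score(10.0, 20.0, 0.0, 1.0, 5000, 5000, 10000)
# Moderate batch effect (tau = 0.8): most weight moves to a * S_int.
s_batch = compound_score(10.0, 20.0, 2.0, 1.0, 5000, 5000, 10000)
```

With no covariates, the adjustment factor equals the ratio of n1I(n0I + n0E)/(n1I + n0I + n0E) to n1I n0I/(n1I + n0I), which is 1.5 for the 5,000: 5,000: 10,000 design.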

After obtaining the iECAT-Score statistic at each variant within a region, we test the joint genetic effect of variants by applying burden-, SKAT-, and SKAT-O-type tests to the region. Consider $m$ variants in a region. Let $S_w = (w_1 S_{w,1}, w_2 S_{w,2}, \dots, w_m S_{w,m})'$, where $S_{w,j}$ (written $S_j$ below for brevity) is the single-variant iECAT-Score statistic at variant $j$, $j = 1, 2, \dots, m$. Under the null hypothesis of no genetic effect, $S_w$ approximately follows a multivariate Gaussian distribution with mean zero and variance $\Phi = W\Sigma W$, where $W = \mathrm{diag}(w_j)$ is the diagonal weight matrix and $\Sigma$ is the covariance matrix of $(S_1, S_2, \dots, S_m)'$. We use $w_j = \mathrm{Beta}(\mathrm{MAF}_j; a_1, a_2)$ with $(a_1, a_2) = (1, 25)$ and $\mathrm{MAF}_j$ estimated from the combined samples. This choice of $(a_1, a_2)$ upweights rare variants (MAF less than 1%) while giving adequate nonzero weights to less common variants (MAF 1%–5%) (Wu et al., 2011). The covariance matrix of $(S_1, S_2, \dots, S_m)'$ is given by $\Sigma = A^{1/2} C A^{1/2}$, where $A = \mathrm{diag}(\mathrm{Var}(S_1), \mathrm{Var}(S_2), \dots, \mathrm{Var}(S_m))$ and $C$ is an $m \times m$ correlation matrix of the $m$ variants. As we are interested in maintaining the correlation structure between the variants reflected through the internal sample population, we estimate the correlation matrix $C$ by the empirical correlation between genetic variants within the region using exclusively the internal case and control samples.
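A short Python sketch of the weighting and covariance construction described above follows. It evaluates the Beta(1, 25) density at each MAF and assembles the weighted covariance entry by entry from per-variant variances and a correlation matrix; the function names and the numeric inputs are hypothetical.

```python
import math

def beta_weight(maf, a1=1.0, a2=25.0):
    """Beta(a1, a2) density evaluated at the MAF, used as the variant weight."""
    norm = math.gamma(a1 + a2) / (math.gamma(a1) * math.gamma(a2))
    return norm * maf ** (a1 - 1.0) * (1.0 - maf) ** (a2 - 1.0)

def weighted_covariance(mafs, variances, correlation):
    """Phi = W Sigma W with Sigma = A^(1/2) C A^(1/2), built entry by entry."""
    w = [beta_weight(q) for q in mafs]
    sd = [math.sqrt(v) for v in variances]
    m = len(mafs)
    return [
        [w[j] * sd[j] * correlation[j][k] * sd[k] * w[k] for k in range(m)]
        for j in range(m)
    ]

# Hypothetical inputs for two variants.
mafs = [0.001, 0.01]
variances = [4.0, 9.0]
correlation = [[1.0, 0.2], [0.2, 1.0]]
phi = weighted_covariance(mafs, variances, correlation)
```

With $(a_1, a_2) = (1, 25)$ the density simplifies to $25(1 - \mathrm{MAF})^{24}$, so the weight decreases monotonically as the variant becomes more common.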

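The SKAT statistic is $Q_{SKAT} = \sum_{j=1}^{m} w_j^2 S_j^2$ and the burden test statistic is $Q_{Burden} = \big(\sum_{j=1}^{m} w_j S_j\big)^2$. The weighted average of the SKAT and burden test statistics is $Q_\rho = (1-\rho)\,Q_{SKAT} + \rho\,Q_{Burden}$. Let $R_\rho = (1-\rho)I + \rho\mathbf{1}\mathbf{1}'$ and let $L_\rho$ be its Cholesky factor such that $L_\rho L_\rho' = R_\rho$. Then $Q_\rho = S_w' R_\rho S_w$ asymptotically follows a mixture of chi-square distributions, $\sum_{j=1}^{m} \lambda_j \chi_j^2$, where the $\chi_j^2$ are independent chi-square variables with one degree of freedom and $\lambda_1, \dots, \lambda_m$ are the eigenvalues of $L_\rho' \Phi L_\rho$. We apply the Davies method (Davies, 1980) to obtain the p value for association between the genetic region and phenotype.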
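The relationship between the three region statistics can be checked numerically. The sketch below computes the SKAT, burden, and combined statistics from per-variant scores and weights, and verifies that the combined statistic equals the quadratic form in $R_\rho = (1-\rho)I + \rho\mathbf{1}\mathbf{1}'$; the function names and toy inputs are hypothetical.

```python
def region_statistics(scores, weights, rho):
    """Return (Q_SKAT, Q_Burden, Q_rho) for per-variant scores and weights."""
    weighted = [w * s for w, s in zip(weights, scores)]
    q_skat = sum(z * z for z in weighted)    # sum_j w_j^2 S_j^2
    q_burden = sum(weighted) ** 2            # (sum_j w_j S_j)^2
    q_rho = (1.0 - rho) * q_skat + rho * q_burden
    return q_skat, q_burden, q_rho

def quadratic_form(scores, weights, rho):
    """Compute z' R_rho z with z_j = w_j * S_j and R_rho = (1 - rho) I + rho 11'."""
    z = [w * s for w, s in zip(weights, scores)]
    total = 0.0
    for j in range(len(z)):
        for k in range(len(z)):
            # Diagonal entries of R_rho equal 1, off-diagonal entries equal rho.
            r_jk = 1.0 if j == k else rho
            total += z[j] * r_jk * z[k]
    return total

scores = [1.5, -0.5, 2.0]
weights = [1.0, 2.0, 0.5]
q_skat, q_burden, q_rho = region_statistics(scores, weights, rho=0.5)
q_check = quadratic_form(scores, weights, rho=0.5)
```

Setting `rho=0` recovers the SKAT statistic and `rho=1` the burden statistic, which is why a grid over $\rho$ interpolates between the two tests.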

Minimum P-value Based on Combination of Sw and Sint

Similar to the single-variant iECAT-Score test, the iECAT-Score method may yield a larger p value than using internal samples only, as a result of a large variance estimate (Y. Li & Lee, 2021). Thus, to improve power, we use the minimum p value calculation procedure (Conneely & Boehnke, 2007; Y. Li & Lee, 2021). At variant $j$, $(S_{w,j}, S_{int,j})$ jointly follow a multivariate normal distribution. The p value of the minimum of the p values of $S_{w,j}$ and $S_{int,j}$, i.e., the minP p value, is calculated as the probability of observing one or both p values as small as the smaller of the two under the null hypothesis of no association (Conneely & Boehnke, 2007).
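For two statistics, the minP p value is one minus the probability that both standardized statistics fall inside the critical interval implied by the smaller p value. The sketch below evaluates that probability by one-dimensional Simpson integration of the bivariate normal density, using only the Python standard library. It is an illustration, not the implementation used in the paper: the correlation between the two statistics is assumed to be known and strictly between -1 and 1, and `minp_pvalue` is a hypothetical name.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def minp_pvalue(p_w, p_int, corr, steps=2000):
    """P(at least one of two two-sided p values <= min(p_w, p_int)) under H0.

    Assumes the standardized statistics are bivariate normal with
    correlation corr, |corr| < 1. steps must be even (Simpson's rule).
    """
    p_min = min(p_w, p_int)
    c = _STD.inv_cdf(1.0 - p_min / 2.0)  # two-sided critical value
    scale = math.sqrt(1.0 - corr * corr)

    def integrand(z):
        # Density of Z1 at z times P(|Z2| < c | Z1 = z).
        upper = _STD.cdf((c - corr * z) / scale)
        lower = _STD.cdf((-c - corr * z) / scale)
        return _STD.pdf(z) * (upper - lower)

    h = 2.0 * c / steps
    total = integrand(-c) + integrand(c)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(-c + k * h)
    inside = total * h / 3.0  # P(|Z1| < c and |Z2| < c)
    return 1.0 - inside

p_indep = minp_pvalue(0.01, 0.5, 0.0)
p_corr = minp_pvalue(0.01, 0.5, 0.99)
```

For independent statistics the result reduces to $1 - (1 - p_{min})^2$; as the correlation approaches 1 it approaches $p_{min}$ itself, so the penalty for taking the minimum shrinks when the two tests largely agree.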

For the region-based test, we first obtain the minP p value of $(S_{w,j}, S_{int,j})$, $j = 1, 2, \dots, m$, at each variant to reconstruct the single-variant score statistics, i.e., $S_{minP,j}$, and then combine them for association analysis. One issue with this approach is that estimating $S_{minP,j}$ from the minP p value requires the variance of $S_{minP,j}$ to be specified. To address this, we use the geometric mean of the variances of the score statistics obtained using internal samples only and using the iECAT-Score method, i.e., $\mathrm{Var}(S_{minP,j}) = \sqrt{\mathrm{Var}(S_{w,j}) \times \mathrm{Var}(S_{int,j})}$. This choice of the geometric mean not only keeps $\mathrm{Var}(S_{minP,j})$ on the same scale as $\mathrm{Var}(S_{w,j})$ and $\mathrm{Var}(S_{int,j})$, but also takes into consideration the correlation among $S_{w,j}$, $S_{int,j}$, and $S_{minP,j}$. The minP score statistic $S_{minP,j}$ is then derived as $S_{minP,j} = \sqrt{\mathrm{Var}(S_{minP,j}) \times \chi^2_{quantile}(1 - p_{minP})}$. The score statistics and their variances at each variant are then used to calculate the SKAT-, burden-, and SKAT-O-type statistics for the region.
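The back-transformation from a minP p value to a score statistic can be sketched as follows. The code assumes that the variance is the geometric mean (square root of the product) of the two variances and that the statistic is the square root of that variance times the chi-square (1 df) quantile; it returns the magnitude only, since how the sign is assigned is not specified here. The chi-square quantile is obtained from the standard normal quantile, and `minp_score` is a hypothetical name.

```python
import math
from statistics import NormalDist

def minp_score(p_minp, var_w, var_int):
    """Return (|S_minP|, Var(S_minP)) reconstructed from a minP p value."""
    # Geometric mean of the two variance estimates.
    var_minp = math.sqrt(var_w * var_int)
    # The upper-p quantile of a chi-square with 1 df is the square of the
    # two-sided standard normal critical value.
    z = NormalDist().inv_cdf(1.0 - p_minp / 2.0)
    chi2_quantile = z * z
    return math.sqrt(var_minp * chi2_quantile), var_minp

s_minp, var_minp = minp_score(0.05, var_w=4.0, var_int=1.0)
```

By construction, the squared statistic divided by its variance reproduces the chi-square quantile, so a single-variant test on the reconstructed statistic returns the minP p value.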

2.2. Type I error and power simulations

We conducted simulation studies under various scenarios to evaluate the type I error rates and power of the proposed iECAT-Score region-based test. We used the coalescent simulator COSI (Schaffner et al., 2005) to generate genotype data for 3 kb (3,000 bp) regions of European ancestry, with two combinations of sample sizes $(n_{1I} : n_{0I} : n_{0E})$: (1) 5,000: 5,000: 10,000; (2) 6,667: 3,333: 10,000.

For both type I error and power simulations, we generated binary phenotypes of case/control from the logistic regression model:

$$\operatorname{logit}\Pr(Y = 1 \mid X, G) = \alpha_0 + 0.5X_1 + 0.5X_2 + \beta G$$

where $X_1$ was a binary covariate following a Bernoulli distribution with probability 0.5 of being 1, $X_2$ was a continuous covariate following the standard normal distribution, and $\alpha_0$ was chosen such that the disease prevalence was 0.01. $G$ consists of variants in 3 kb regions randomly selected from the regions generated by the coalescent simulator.
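One way to choose the intercept so that the population prevalence is 0.01 is to solve for it numerically on simulated covariates. The sketch below does this by bisection under the null (genetic effect set to zero). It illustrates the idea and is not the authors' procedure; the function names, sample size, seed, and search bounds are hypothetical.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_covariates(n, seed=7):
    """Draw (X1, X2) with X1 ~ Bernoulli(0.5) and X2 ~ N(0, 1)."""
    rng = random.Random(seed)
    return [(1.0 if rng.random() < 0.5 else 0.0, rng.gauss(0.0, 1.0))
            for _ in range(n)]

def mean_risk(alpha0, covariates):
    """Average disease probability under the model with beta = 0."""
    return sum(expit(alpha0 + 0.5 * x1 + 0.5 * x2)
               for x1, x2 in covariates) / len(covariates)

def solve_intercept(covariates, prevalence=0.01, lo=-20.0, hi=5.0, iterations=60):
    """Bisection on alpha0; mean_risk is increasing in alpha0."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_risk(mid, covariates) < prevalence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

covariates = simulate_covariates(2000)
alpha0 = solve_intercept(covariates)
achieved = mean_risk(alpha0, covariates)
```

Because the covariates raise the average risk, the solved intercept is lower than the naive value logit(0.01).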

We assumed that 3% of the variants were subject to different MAFs in internal and external control samples to mimic the batch effects between internal and external samples. When batch effect existed at a variant, the MAF of the external controls was randomly generated from $\mathrm{Uniform}(0.1q, 4q)$, where $q$ was the MAF of the corresponding variant in the internal samples.
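The batch-effect mechanism in the simulation can be sketched in a few lines. Each variant is independently flagged with probability 0.03, and flagged variants receive an external-control MAF drawn from Uniform(0.1q, 4q); unflagged variants keep the internal MAF. The function name, seed, and example MAFs are hypothetical.

```python
import random

def external_mafs(internal_mafs, batch_fraction=0.03, seed=2022):
    """Return external-control MAFs, perturbing a random subset of variants."""
    rng = random.Random(seed)
    result = []
    for q in internal_mafs:
        if rng.random() < batch_fraction:
            # Batch effect: external MAF between 0.1x and 4x the internal MAF.
            result.append(rng.uniform(0.1 * q, 4.0 * q))
        else:
            result.append(q)
    return result

internal = [0.005] * 1000
external = external_mafs(internal)
n_changed = sum(1 for q, e in zip(internal, external) if e != q)
```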

In type I error simulations, the genetic effect size was $\beta = 0$. We generated $5 \times 10^6$ data sets to evaluate type I error rates at the $1.0 \times 10^{-4}$ and $2.5 \times 10^{-6}$ levels. In power simulations, we randomly selected 5%, 10%, 20%, and 50% of variants with MAF < 1% in the 3 kb region as causal variants. The effect size of causal variants was $\beta = c\,|\log_{10}\mathrm{MAF}|$, where $c = 0.6, 0.46, 0.35, 0.27$ when 5%, 10%, 20%, and 50% of the rare variants were causal, respectively. We assumed that either all causal SNPs had positive effect (homogeneous effect), or 80% had positive effect and 20% negative (heterogeneous effect). We generated 100,000 data sets in each simulation setting and case-control ratio to evaluate power at the significance level of $2.5 \times 10^{-6}$.
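The effect-size rule can be illustrated with a short sketch. It assumes the magnitude is c times the absolute value of log10(MAF), which is the reading under which rarer variants get larger positive effects, and it flips the sign of a random subset for the heterogeneous setting. The function name and seed are hypothetical.

```python
import math
import random

def causal_effects(mafs, c, negative_fraction=0.0, seed=1):
    """Effect sizes c * |log10(MAF)|, with a random subset made negative."""
    rng = random.Random(seed)
    effects = [c * abs(math.log10(q)) for q in mafs]
    n_negative = round(negative_fraction * len(effects))
    # Flip the sign of n_negative randomly chosen causal variants.
    for idx in rng.sample(range(len(effects)), n_negative):
        effects[idx] = -effects[idx]
    return effects

# Homogeneous setting: a variant with MAF 0.001 and c = 0.6.
beta_single = causal_effects([0.001], c=0.6)[0]
# Heterogeneous setting: 20% of ten causal variants get a negative effect.
beta_mixed = causal_effects([0.005] * 10, c=0.35, negative_fraction=0.2)
```

For a variant with MAF 0.001 and c = 0.6 this gives an effect of 1.8 on the log-odds scale.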

2.3. Real data analysis

We applied our proposed method to genotype data from the International AMD Genomics Consortium (IAMDGC) (Fritsche et al., 2015) downloaded from dbGaP (phs001039.v1.p1). The IAMDGC dataset consists of 17,286 cases and 14,377 controls. We used 334,088 unrelated samples from the UK Biobank as external controls, giving 348,465 controls in total (Table 2). We used ICD-9 codes to select samples from the UK Biobank who were free from macular degeneration of the retina and posterior pole of the retina. For both studies, the samples used in our analysis are of European ancestry.

We performed analyses on the genotype data of overlapping variants between the AMD and UK Biobank studies to compare the performance of our proposed iECAT-Score method with the method that solely uses internal samples. We applied the ANNOVAR software (Wang, Li, & Hakonarson, 2010) for gene-based annotation, using the hg19 build downloaded from the UCSC Genome Browser Annotation Database (Haeussler et al., 2018). We included exonic, intronic, splicing, and UTR variants for analyses.

We applied the FRAPOSA software (Zhang, Dey, & Lee, 2020) with the 1000 Genomes reference (The 1000 Genomes Project Consortium, 2015) to obtain population principal component scores. Following Li and Lee (Y. Li & Lee, 2021), we first removed samples with missing genotypes at each variant, and then removed variants that were monomorphic in either the internal samples or the combined internal and external samples. A logistic regression model was used to test for the association between AMD disease status and single common genetic variants shared by the IAMDGC and UK Biobank data sets, adjusting for age, sex, and the first four principal components. We then tested for association between rare variants within genes and the phenotype, conditioning on significant (p value < 1e-06) common variants within a 3 kb region of the gene, based on single-variant association results using the iECAT-Score minP method. In the region-based test, we adjusted for age, sex, and the first four principal components. We compared the performance of iECAT-Score, iECAT-Score minP, the method that exclusively uses internal samples, and the method that naively combines control samples without adjusting for batch effect.

3. Results

3.1. Type I error and power simulations

We present in Table 1 the type I error rates of the proposed methods at the significance levels of 1e-04 and 2.5e-06, for two settings of case-control ratios $(n_{1I} : n_{0I} : n_{0E})$: (1) 5,000: 5,000: 10,000; (2) 6,667: 3,333: 10,000. For each setting, we present type I error rates of SKAT, burden, and SKAT-O type tests using test statistics that are constructed using the following methods: (1) exclusively using internal samples; (2) iECAT-Score method; (3) iECAT-Score minP method; (4) naively combining external controls without adjusting for batch effect. The results show that for all of the SKAT, burden, and SKAT-O type tests, both versions of the iECAT-Score method controlled type I error rates at both significance levels, although the iECAT-Score methods tended to be more conservative than the method that exclusively uses internal samples. If external control samples are naively integrated without adjusting for batch effect, however, substantial inflation of type I error rates is observed.

Table 1.

Type I error rates of score tests using internal samples exclusively, various versions of the iECAT-Score method, and combined internal and external samples without adjusting for batch effect, under different scenarios. Type I error rates were estimated based on $5 \times 10^6$ simulations.

| Internal cases: internal controls: external controls | α level | Test | Internal only | iECAT-Score | iECAT-Score minP | Combined controls |
|---|---|---|---|---|---|---|
| 5000:5000:10000 | 1e-04 | SKAT | 9.73e-05 | 5.27e-05 | 5.27e-05 | 9.13e-02 |
| | | Burden | 9.86e-05 | 5.85e-05 | 2.98e-05 | 2.08e-02 |
| | | SKAT-O | 1.14e-04 | 6.80e-05 | 5.75e-05 | 8.85e-02 |
| | 2.5e-06 | SKAT | 3.10e-06 | 8.61e-07 | 1.89e-06 | 7.15e-02 |
| | | Burden | 2.75e-06 | 1.55e-06 | 8.61e-07 | 1.02e-02 |
| | | SKAT-O | 2.58e-06 | 1.89e-06 | 1.38e-06 | 6.98e-02 |
| 6667:3333:10000 | 1e-04 | SKAT | 1.03e-04 | 4.12e-05 | 5.75e-05 | 1.10e-01 |
| | | Burden | 1.04e-04 | 4.50e-05 | 3.78e-05 | 3.37e-02 |
| | | SKAT-O | 1.22e-04 | 4.67e-05 | 5.57e-05 | 1.08e-01 |
| | 2.5e-06 | SKAT | 3.24e-06 | 1.24e-06 | 2.62e-06 | 9.30e-02 |
| | | Burden | 3.08e-06 | 9.25e-07 | 9.25e-07 | 1.95e-02 |
| | | SKAT-O | 4.32e-06 | 1.08e-06 | 2.31e-06 | 9.10e-02 |

We compared the power of the iECAT-Score method, the iECAT-Score minP method, and the method using internal controls exclusively to assess genetic association at an alpha level of 2.5e-06, using SKAT, burden, and SKAT-O type tests. As the method that naively combines external controls does not control type I error rates, we did not include it in the power comparison. Figure 1 compares the power of the different methods for varying percentages of causal rare variants, when all causal SNPs had positive effect (homogeneous effect). The top three panels show the comparison when the case-control ratios are $(n_{1I} : n_{0I} : n_{0E})$ = 5,000: 5,000: 10,000; the bottom three panels show a similar power comparison for the case-control ratio of 6,667: 3,333: 10,000. In Figure 2, we show the power comparison in a similar manner when 80% of causal variants had positive effect and 20% had negative effect (heterogeneous effect). The results show that in all settings, both the iECAT-Score and iECAT-Score minP methods had improved power over the method that used exclusively internal control samples. Under homogeneous causal effect, the iECAT-Score method had higher power than the iECAT-Score minP method when a small percentage (5%) of rare variants were causal; the iECAT-Score minP method had higher power than the iECAT-Score method when a large percentage (50%) of rare variants were causal. Under heterogeneous causal effect, however, this relationship was reversed.

Figure 1.

Empirical power comparisons of iECAT-Score, iECAT-Score minP, and Internal only methods for varying percentages of causal rare variants, when all causal SNPs had positive effect (homogeneous effect). Shown is the power from simulations using 10,000 internal samples and 10,000 external control samples, with internal case and control ratios 1:1 (top panels) and 2:1 (bottom panels).

Figure 2.

Empirical power comparisons of iECAT-Score, iECAT-Score minP, and Internal only methods for varying percentages of causal rare variants, when 80% of causal variants had positive effect and 20% had negative effect (heterogeneous effect). Shown is the power from simulations using 10,000 internal samples and 10,000 external control samples, with internal case and control ratios 1:1 (top panels) and 2:1 (bottom panels).

We compared p values from the power simulations between the iECAT-Score method, the iECAT-Score minP method, and the method using internal samples only. Figure 3 shows the log10-scaled p values of the SKAT-O test under homogeneous causal effect with 5% causal rare variants for the iECAT-Score method (panel a) and the iECAT-Score minP method (panel b) vs. using internal samples only, when the case-control ratios are $(n_{1I} : n_{0I} : n_{0E})$ = 5,000: 5,000: 10,000. Shown in blue are variants that were not significant at the 1e-06 level when solely using internal samples but showed signals of association when the iECAT-Score method was used; shown in coral are variants that were significant at the 1e-06 level when exclusively using internal samples but were not detected by the iECAT-Score methods. There were some cases where the internal sample-only approach produced smaller p values than iECAT-Score, which is a result of large variance estimates using the iECAT-Score method. However, the iECAT-Score minP approach largely offset this disadvantage by leveraging the minimum p value between the internal sample-only and iECAT-Score p values. In both comparisons, a substantial number of variants showed significance only through the use of the iECAT-Score methods, indicating higher power of the iECAT-Score methods in detecting associations.

Figure 3.

Comparison of p values (in log 10 scale) from analyses using iECAT-Score method (a), iECAT-Score minP method (b), vs. using internal samples only. We simulated the data assuming 5% causal rare variants with homogeneous positive causal effect. Sample sizes of the internal cases: internal controls: external controls are 5,000: 5,000: 10,000. Shown in blue color are variants that were not significant at 1e-06 level by solely using internal samples but showed signals of association when iECAT-Score method was used; shown in coral color are variants that showed significance at 1e-06 level by exclusively using internal samples, but did not get picked up by iECAT-Score methods. Numbers of the color-coded variants are listed in the quadrants.

3.2. Application to Age-Related Macular Degeneration (AMD) Data

We analyzed the association between genes and age-related macular degeneration in IAMDGC, using samples from the UK Biobank as external controls. The UK Biobank data include markers directly genotyped or imputed with the TOPMed reference panel (Taliun et al., 2021). We restricted our analysis to the markers shared by the IAMDGC and UK Biobank samples. All samples used in the analyses are of European ancestry. The proportions of female samples were 41.29%, 43.97%, and 53.87% in internal cases, internal controls, and external controls, respectively (Table 2). We observed that samples with AMD tended to be older than the internal controls, with mean ages of 75.86 years and 70.08 years, respectively. Samples of the external controls had a mean age of 56.75 years.

Table 2.

Descriptive statistics of study subjects from internal (IAMDGC) and external (UK Biobank) studies.

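| Study | Sample size, N: cases | Sample size, N: controls | Sample size, N: total | Female, N (%): cases | Female, N (%): controls | Female, N (%): total | Age, years (SD; min–max): cases | Age, years (SD; min–max): controls | Age, years (SD; min–max): total |
|---|---|---|---|---|---|---|---|---|---|
| IAMDGC (internal) | 17,286 | 14,377 | 31,663 | 7,137 (41.29) | 6,322 (43.97) | 13,459 (42.51) | 75.86 (8.10; 50–90) | 70.08 (9.71; 35–90) | 73.24 (9.32; 35–90) |
| UKB (external) | | 334,088 | 334,088 | | 179,984 (53.87) | 179,984 (53.87) | | 56.75 (8.00; 39–73) | 56.75 (8.00; 39–73) |
| Total | 17,286 | 348,465 | 365,751 | 7,137 (41.29) | 188,039 (53.96) | 198,188 (54.19) | 75.86 (8.10; 50–90) | 57.30 (8.50; 35–90) | 58.17 (9.35; 35–90) |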

A total of 74,676 variants in exonic, intronic, splicing, and UTR regions are present in both the IAMDGC and UK Biobank data sets and were used in single-variant analyses with the iECAT-Score minP single-variant test. We then performed region-based tests on 36,389 rare variants (MAF < 0.01) in 7,640 genes that include at least two rare variants, adjusting for age, sex, and the first four principal components, and conditioning on the top common variant (MAF > 0.01) with p value < 1e-06 from the single-variant test within a 3 kb region of the gene. The QQ plots from the tests integrating external control samples using both versions of the iECAT-Score method and using internal samples exclusively are shown in Figure 4. Both iECAT-Score methods controlled type I error rates. The QQ plots for the internal-only method and the iECAT-Score method show similar patterns; the iECAT-Score minP method, on the other hand, is more conservative than the other two methods, which is expected from applying the minimum p value method.

Figure 4.

The QQ plots for analysis of age-related macular degeneration (AMD) from internal study of International AMD Genomics Consortium (IAMDGC) and external control study of UK Biobank. For better visualization, the maximum of x and y axes in the plots are set to be 9, corresponding to p values of 1e-09.

Table 3 presents the top genes reaching significance at the 6.54e-06 level after Bonferroni correction. The iECAT-Score method detected several AMD-associated genes, including C3 (Maller et al., 2007; Yates, 2007), CFHR5 (Narendra, Pauer, & Hagstrom, 2009), F13B (Keenan et al., 2015), and CFB (Sun, Zhao, & Li, 2012), which are well-known associations for the trait. The association of gene DXO was revealed by applying the iECAT-Score method (p value: 3.62e-06) but did not reach the significance level (p value: 5.18e-05) when only internal samples from the IAMDGC dataset were used (Figure 5). Figure 6 shows the single-variant association p values within the top seven genes that showed association in the conditional rare-variant testing based on the iECAT-Score minP method. We also present the results of the unconditional association tests in the Supplementary Materials (Figure S1 and Table S1).
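The 6.54e-06 level is consistent with a Bonferroni correction over the 7,640 genes tested. The short check below assumes a family-wise level of 0.05, which the text does not state explicitly.

```python
# Bonferroni threshold for the gene-based tests.
ALPHA = 0.05      # assumed family-wise error rate (not stated in the text)
N_GENES = 7640    # genes with at least two rare variants, from the text

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

threshold = bonferroni_threshold(ALPHA, N_GENES)
print(f"{threshold:.2e}")  # 6.54e-06
```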

Table 3.

Genes showing significance (6.54e-06 level after Bonferroni correction) based on the iECAT-Score minP method, jointly testing the rare variants within each gene after conditioning on the top common variant within the 3kb region whose single-variant association p value is less than 1e-06. Shown are the conditional p values from the analysis using internal samples exclusively and from the iECAT-Score minP method, the total minor allele count (MAC), and the p value and minor allele frequency (MAF) of the top variant from single-variant analyses, in the different sample groups.

| No. | Gene | Chr | Set size | Region test p value (internal) | Region test p value (iECAT minP) | Total MAC (internal case) | Total MAC (internal control) | Total MAC (external control) | Top variant p value (internal) | Top variant p value (iECAT minP) | Top variant MAF (internal case) | Top variant MAF (internal control) | Top variant MAF (external control) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C3 | 19 | 29 | 2.16e-25 | 8.31e-24 | 576 | 273 | 7060 | 4.34e-24 | 8.68e-24 | 1.23e-02 | 4.30e-03 | 5.25e-03 |
| 2 | ASPM | 1 | 15 | 2.80e-19 | 1.70e-17 | 739 | 868 | 12642 | 6.73e-11 | 1.35e-10 | 6.71e-03 | 1.22e-02 | 9.41e-03 |
| 3 | PRRC2A | 6 | 32 | 1.50e-06 | 4.95e-07 | 1073 | 686 | 14535 | 7.60e-06 | 1.52e-05 | 1.38e-02 | 9.91e-03 | 7.96e-03 |
| 4 | CFHR5 | 1 | 9 | 6.50e-06 | 5.68e-07 | 470 | 331 | 6682 | 2.35e-05 | 2.70e-06 | 9.68e-03 | 6.40e-03 | 4.87e-03 |
| 5 | F13B | 1 | 4 | 5.75e-07 | 1.11e-06 | 233 | 295 | 5261 | 7.02e-07 | 1.40e-06 | 6.72e-03 | 1.03e-02 | 8.52e-03 |
| 6 | DXO | 6 | 9 | 7.38e-05 | 3.34e-06 | 342 | 209 | 4444 | 2.06e-05 | 1.29e-05 | 2.59e-03 | 1.30e-03 | 1.47e-03 |
| 7 | CFB | 6 | 17 | 3.17e-06 | 3.49e-06 | 983 | 689 | 12934 | 6.48e-06 | 1.29e-05 | 1.42e-02 | 1.04e-02 | 8.10e-03 |

Figure 5.

Genes showing significance (6.54e-06 level after Bonferroni correction) based on the iECAT-Score minP method and the internal-sample-only method, jointly testing the rare variants within each gene after conditioning on the top common variant within the 3kb region whose single-variant association p value is less than 1e-06. Shown in blue are genes that reached the significance level using both the iECAT-Score minP method and the internal-sample-only method. The association of gene DXO (red) was revealed by applying the iECAT-Score method (p value: 3.62e-06) but did not reach the significance level (p value: 5.18e-05) when only internal samples were used.

Figure 6.

P values on the -log10 scale of single variants in the top seven significant genes from the conditional rare-variant gene-based test using the iECAT-Score minP method. The single-variant p values are calculated using the iECAT-Score minP method, conditioned on the significant common variant (p value < 1e-06) within the 3kb region of the gene. The x-axis shows the position of each variant within each gene on its respective chromosome.

4. Discussion

Integrating External Controls into Association Testing (iECAT) is a powerful tool for increasing power in genetic association tests and for discovering genetic variants that predispose to human disease. In this paper, we proposed a region-based iECAT-Score test, which allows for testing the joint effect of rare variants within a gene or a region when integrating genotyped data from external studies. We extended the original iECAT-Score single-variant test to three variant-set tests: burden-, SKAT-, and SKAT-O-type tests. We also took advantage of the minimum p value method to further improve the performance of the iECAT-Score test. Our proposed iECAT-Score variant-set tests are not only able to adjust for covariates and population stratification but are also computationally efficient when applied to large-scale genome-wide association studies. We showed through simulation studies that our proposed methods have controlled type I error rates and improved power for association testing compared to the methods that exclusively use internal samples. The analysis of the AMD data from IAMDGC using UK Biobank exome data as external controls revealed associations that were not found using the IAMDGC data alone.

Our type I error simulation studies showed that both versions of the iECAT-Score variant-set tests are conservative compared to the methods that exclusively use internal samples. Such observations are expected, as the single-variant iECAT-Score test is conservative and both SKAT- and burden-type tests are conservative at low α levels (B. Li & Leal, 2008; Wu et al., 2011). However, the iECAT-Score tests still improve power for disease association testing. Depending on the true proportion of causal rare variants in a region, either the SKAT-type or the burden-type test has higher power, consistent with their performance in the usual region-based rare-variant tests. The optimal SKAT-O test attains the highest power under all possible underlying causal effects of rare variants in the region.
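To illustrate why the burden-type and SKAT-type tests favor different causal architectures, the sketch below computes the standard forms of the two statistics from per-variant score statistics and weights, together with the linear combination on which SKAT-O is based (Lee, Wu, & Lin, 2012). It uses plain score statistics rather than the compound shrinkage scores of the iECAT-Score test, omits the null distributions needed for p values, and all function names and numbers are illustrative assumptions.

```python
def burden_stat(scores, weights):
    """Burden-type statistic: square of the weighted sum of scores."""
    return sum(w * s for w, s in zip(weights, scores)) ** 2

def skat_stat(scores, weights):
    """SKAT-type statistic: sum of squared weighted scores."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))

def skato_combination(scores, weights, rho):
    """Linear combination used by SKAT-O for a fixed rho in [0, 1]."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return (1.0 - rho) * skat_stat(scores, weights) + rho * burden_stat(scores, weights)

weights = [1.0, 1.0, 1.0]          # flat weights, chosen for simplicity
same_direction = [1.0, 2.0, 1.5]   # made-up scores, all effects positive
mixed_direction = [1.0, -2.0, 1.5] # same magnitudes, one opposite sign

print(burden_stat(same_direction, weights))   # 20.25
print(burden_stat(mixed_direction, weights))  # 0.25
print(skat_stat(same_direction, weights))     # 7.25
print(skat_stat(mixed_direction, weights))    # 7.25
```

With effects in one direction the burden statistic accumulates signal, while opposite-sign effects cancel in the sum; the SKAT-type statistic is unchanged by the sign flip. Setting rho to 0 or 1 in `skato_combination` recovers the SKAT-type and burden-type statistics, respectively.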

The analysis of AMD data from IAMDGC using UK Biobank as external controls revealed a rare-variant association of gene DXO that was not identified using the internal samples alone. The Decapping Exoribonuclease (DXO) gene is suggested to be a housekeeping gene whose protein function is as yet unknown (Jiao et al., 2017; Picard-Jean et al., 2018). However, its association with AMD has been reported in a study based on retina eQTL data (Ratnapriya et al., 2019). Although no definitive pathogenic DXO mutations have been found, our analyses shed light on further directions to investigate its roles in the prognosis of AMD.

We noticed that the gain in significant associations from integrating external controls in the real data analysis seems to be smaller than that in the simulated data. Several factors could contribute to this discrepancy. First, additional simulation studies showed that the power gain starts to level off at larger sample sizes. We increased the external control sample size to 50k and 100k, resulting in internal to external sample size ratios of 1:5 and 1:10 (Supplementary Figure S2). We found that integrating additional external controls continues to increase the power, but the additional gain is not as prominent at very large sample sizes. Second, the resulting p values in real data could be affected by the MAF discrepancies between internal and external samples at the variants within the region. As we have previously shown for the single-variant tests, the iECAT-Score method achieves the greatest power when the MAF of external controls is close to the MAF of internal controls; when the MAF of external controls is closer to the MAF of internal cases than to the MAF of internal controls, the improvement in power could be weakened by the increase in noise (Y. Li & Lee, 2021). Thus, the power of the region-based test reflects both the increased power from additional controls and the weakened power from increased noise at the constituent variants in the region.

In our simulation studies, we modeled the batch effect by setting three percent of variants to have differential MAFs between internal and external control samples, with the MAFs of external samples ranging from 10% of to four times those in the internal samples. We also considered an alternative pattern of differential MAF distributions between internal and external samples, in which the majority of variants are subject to small differences in their MAFs, mimicking population stratification between study populations. In these additional simulation studies, we assumed that 90% of the variants had different MAFs in internal and external control samples, and the MAFs of the external controls were randomly generated from Uniform (0.9×q, 1.1×q), where q was the MAF of the corresponding variant in the internal samples. The simulation results (Supplementary Figure S3) show that the iECAT-Score methods controlled type I error rates.
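The alternative batch-effect pattern described above can be sketched as follows. This is only an illustration of the stated sampling rule, not the authors' simulation code: the function name, the seed, the toy MAF values, and the choice to perturb exactly 90% of variants (rather than each variant independently with probability 0.9) are assumptions.

```python
import random

def simulate_external_mafs(internal_mafs, frac_affected=0.9,
                           low=0.9, high=1.1, seed=0):
    """Draw external-control MAFs from Uniform(low*q, high*q) for a
    fraction of variants; the remaining variants keep the internal MAF q."""
    rng = random.Random(seed)
    n = len(internal_mafs)
    n_affected = round(frac_affected * n)
    # Assumption: an exact fraction of variants is chosen at random.
    affected = set(rng.sample(range(n), n_affected))
    external = []
    for j, q in enumerate(internal_mafs):
        if j in affected:
            external.append(rng.uniform(low * q, high * q))
        else:
            external.append(q)
    return external

# Toy internal MAFs (made up), all at or below 0.01.
internal = [0.001 * (j + 1) for j in range(10)]
external = simulate_external_mafs(internal)
for q, q_ext in zip(internal, external):
    print(f"internal {q:.4f} -> external {q_ext:.4f}")
```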

In addition to assessing the performance of our method under an alternative pattern of differential MAFs between internal and external samples, we also checked the stability of the iECAT-Score methods when the minor allele count within a gene/region is low. Starting from the setting in which the coalescent simulator COSI (Schaffner et al., 2005) was used to generate genotype data for 3000-bp regions of European ancestry, we performed additional simulation studies in which we restricted the minor allele frequencies to <0.01 (total number of alternate alleles fewer than 40). The simulation results (Supplementary Figure S4) showed that our methods still controlled the type I error rates, with overall conservative results, indicating that the method does not inflate type I error rates when only a small number of alternate alleles exist within a gene/region.

In this article, we proposed the iECAT-Score region-based test, which can improve power for rare-variant association tests, and applied the methods to the genotyped AMD data from IAMDGC using UK Biobank as external controls. In additional data analyses, we applied our method to the study of myocardial infarction, using whole exome sequence data from the Myocardial Infarction Genetics Exome Sequencing Consortium (Myocardial Infarction Genetics Consortium, 2009) and 5,000 samples from the UK Biobank (Bycroft et al., 2018) (Supplementary Note A). Our results (Supplementary Figure S5) show that the iECAT-Score methods produced calibrated QQ plots and can be applied to both GWAS and sequence data. We should note that, because the iECAT-Score tests assess the batch effect by comparing internal and external control samples, the reliability of this comparison could depend on the quality of the genotype calls. Hence, as sequencing costs continue to drop and large-scale biobanks become available, we ought to be cautious about the quality of genotypes called from sequencing data. The quality of genotype data is subject to many factors, such as read depth, genotype-calling error rates, and quality control (QC) pipelines, all of which could bias the estimation of minor allele frequencies (MAFs) and lead to a batch effect between the two sets of control samples. We will extend our method to account for these factors using sequence data, to better account for the batch effect between samples from different cohorts.

By introducing the iECAT-Score region-based test, we expanded the iECAT framework to jointly test for genetic effects within a gene or a region and to improve power for rare-variant tests. The iECAT methods require that individual-level data, including genotypes and covariates, be available. This requirement might pose some challenges when selecting external controls, but as more public biobank data become available, we believe that the iECAT methods will be easy to apply. The method is implemented in the R package “iECAT”, available in the GitHub repository https://github.com/leeshawn/iECAT.

ACKNOWLEDGMENTS

This work was supported by NIH grant R01-HG008773 and the Brain Pool Plus (BP+, Brain Pool+) Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (2020H1D3A2A03100666). The dataset used for the age-related macular degeneration analysis was obtained from the International AMD Genomics Consortium (IAMDGC) and was downloaded via dbGaP with accession number phs001039.v1.p1. The dataset used for the myocardial infarction analysis was obtained from the Myocardial Infarction Genetics Exome Sequencing Consortium (MIGen) and was downloaded via dbGaP with accession number phs001000.v1.p1. UK Biobank data were accessed under the accession number UKB: 45227.

Footnotes

Conflict of Interest: none declared.

Data Availability Statement

The age-related macular degeneration (AMD) data that support the findings of this study are from the International AMD Genomics Consortium (IAMDGC) and the UK Biobank. The IAMDGC data are available in dbGaP at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001039.v1.p1, reference number [phs001039.v1.p1]. UK Biobank data were accessed from https://www.ukbiobank.ac.uk/ under the accession number UKB: 45227. The myocardial infarction data that support the findings in the supplementary materials are from the Myocardial Infarction Genetics Exome Sequencing Consortium: U. of Leicester and are available in dbGaP at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001000.v1.p1, reference number [phs001000.v1.p1].

REFERENCES

  1. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature, 1–25. doi: 10.1038/s41586-018-0579-z
  2. Chen S, & Lin X (2018). Analysis in Case–Control Sequencing Association Studies with Different Sequencing Depths. Biostatistics, 1–17. doi: 10.1093/biostatistics/kxy073
  3. Conneely KN, & Boehnke M (2007). So Many Correlated Tests, So Little Time! Rapid Adjustment of P Values for Multiple Correlated Tests. The American Journal of Human Genetics, 81(6), 1158–1168. doi: 10.1086/522036
  4. Derkach A, Chiang T, Gong J, Addis L, Dobbins S, Tomlinson I, et al. (2014). Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic. Bioinformatics, 30(15), 2179–2188. doi: 10.1093/bioinformatics/btu19
  5. Dey R, Schmidt EM, Abecasis GR, & Lee S (2017). A Fast and Accurate Algorithm to Test for Binary Phenotypes and Its Application to PheWAS. The American Journal of Human Genetics, 101(1), 37–49. doi: 10.1016/j.ajhg.2017.05.014
  6. Fritsche LG, Igl W, Bailey JNC, Grassmann F, Sengupta S, Bragg-Gresham JL, et al. (2015). A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants. Nature Genetics, 48(2), 134–143. doi: 10.1038/ng.3448
  7. Haeussler M, Zweig AS, Tyner C, Speir ML, Rosenbloom KR, Raney BJ, et al. (2018). The UCSC Genome Browser database: 2019 update. Nucleic Acids Research, 47(D1), D853–D858. doi: 10.1093/nar/gky1095
  8. Hendricks AE, Billups SC, Pike HNC, Farooqi IS, Zeggini E, Santorico SA, et al. (2018). ProxECAT: Proxy External Controls Association Test. A new case-control gene region association test using allele frequencies from public controls. PLOS Genetics, 14(10), e1007591–14. doi: 10.1371/journal.pgen.1007591
  9. Hu Y-J, Liao P, Johnston HR, Allen AS, & Satten GA (2016). Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls. PLOS Genetics, 12(5), e1006040–19. doi: 10.1371/journal.pgen.1006040
  10. Jiao X, Doamekpor SK, Bird JG, Nickels BE, Tong L, Hart RP, & Kiledjian M (2017). 5′ End Nicotinamide Adenine Dinucleotide Cap in Human Cells Promotes RNA Decay through DXO-Mediated deNADding. Cell, 168(6), 1015–1027.e10. doi: 10.1016/j.cell.2017.02.019
  11. Keenan TDL, Toso M, Pappas C, Nichols L, Bishop PN, & Hageman GS (2015). Assessment of Proteins Associated With Complement Activation and Inflammation in Maculae of Human Donors Homozygous Risk at Chromosome 1 CFH-to-F13B. Investigative Ophthalmology & Visual Science, 56(8), 4870–10. doi: 10.1167/iovs.15-17009
  12. Lee S, Fuchsberger C, Kim S, & Scott L (2015). An efficient resampling method for calibrating single and gene-based rare variant association analysis in case–control studies. Biostatistics, 17(1), 1–15. doi: 10.1093/biostatistics/kxv033
  13. Lee S, Kim S, & Fuchsberger C (2017). Improving power for rare-variant tests by integrating external controls. Genetic Epidemiology, 41(7), 610–619. doi: 10.1002/gepi.22057
  14. Lee S, Wu MC, & Lin X (2012). Optimal tests for rare variant effects in sequencing association studies. Biostatistics, 13(4), 762–775. doi: 10.1093/biostatistics/kxs014
  15. Li B, & Leal SM (2008). Methods for Detecting Associations with Rare Variants for Common Diseases: Application to Analysis of Sequence Data. The American Journal of Human Genetics, 83(3), 311–321. doi: 10.1016/j.ajhg.2008.06.024
  16. Li Y, & Lee S (2021). Novel score test to increase power in association test by integrating external controls. Genetic Epidemiology, 2021(45), 293–304. doi: 10.1002/gepi.22370
  17. Maller JB, Fagerness JA, Reynolds RC, Neale BM, Daly MJ, & Seddon JM (2007). Variation in complement factor 3 is associated with risk of age-related macular degeneration. Nature Genetics, 39(10), 1200–1201. doi: 10.1038/ng2131
  18. Myocardial Infarction Genetics Consortium. (2009). Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nature Genetics, 41(3), 334–341. doi: 10.1038/ng.327
  19. Narendra U, Pauer G, & Hagstrom S (2009). Genetic analysis of complement factor H related 5. Molecular Vision, 15, 731–736.
  20. Picard-Jean F, Brand C, Tremblay-Létourneau M, Allaire A, Beaudoin MC, Boudreault S, et al. (2018). 2’-O-methylation of the mRNA cap protects RNAs from decapping and degradation by DXO. PLoS ONE, 13(3), e0193804–14. doi: 10.1371/journal.pone.0193804
  21. Ratnapriya R, Sosina OA, Starostik MR, Kwicklis M, Kapphahn RJ, Fritsche LG, et al. (2019). Retinal transcriptome and eQTL analyses identify genes associated with age-related macular degeneration. Nature Genetics, 1–13. doi: 10.1038/s41588-019-0351-9
  22. Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, & Altshuler D (2005). Calibrating a coalescent simulation of human genome sequence variation. Genome Research, 15, 1576–1583.
  23. Sun C, Zhao M, & Li X (2012). CFB/C2 Gene Polymorphisms and Risk of Age-Related Macular Degeneration: A Systematic Review and Meta-Analysis. Current Eye Research, 37(4), 259–271. doi: 10.3109/02713683.2011.635401
  24. Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, et al. (2021). Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature, 2021(590), 290–299. doi: 10.1038/s41586-021-03205-y
  25. The 1000 Genomes Project Consortium. (2015). A global reference for human genetic variation. Nature, 526(7571), 68–74. doi: 10.1038/nature15393
  26. Wang K, Li M, & Hakonarson H (2010). ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research, 38(16), e164–e164. doi: 10.1093/nar/gkq603
  27. Wu MC, Lee S, Cai T, Li Y, Boehnke M, & Lin X (2011). Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. The American Journal of Human Genetics, 89(1), 82–93. doi: 10.1016/j.ajhg.2011.05.029
  28. Yates JRW (2007). Complement C3 Variant and the Risk of Age-Related Macular Degeneration. The New England Journal of Medicine, 1–9. doi: 10.1056/NEJMoa072618
  29. Zhang D, Dey R, & Lee S (2020). Fast and robust ancestry prediction using principal component analysis. Bioinformatics, 36(11), 3439–3446. doi: 10.1093/bioinformatics/btaa152
  30. [IAMDGC] International Age-Related Macular Degeneration Genomics Consortium (IAMDGC). https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001039.v1.p1
  31. [UK Biobank] UK Biobank. https://www.ukbiobank.ac.uk
  32. [MIGen] Myocardial Infarction Genetics Consortium (MIGen). https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001000.v1.p1
