AMIA Annual Symposium Proceedings. 2022 Feb 21;2021:601–610.

Gene-Based Analysis Reveals Sex-Specific Genetic Risk Factors of COPD

Jaehyun Joo 1, Blanca Himes 1
PMCID: PMC8861659  PMID: 35308900

Abstract

Sex-specific differences have been noted among people with chronic obstructive pulmonary disease (COPD), but whether these differences are attributable to genetic variation is poorly understood. The availability of large biobanks with deeply phenotyped subjects such as the UK Biobank enables the investigation of sex-specific genetic associations that may provide new insights into COPD risk factors. We performed sex-stratified genome-wide association studies (GWAS) of COPD (male: 12,958 cases and 95,631 controls; female: 11,311 cases and 123,714 controls) and found that while most associations were shared between sexes, several regions had sex-specific contributions, including respiratory viral infection-related loci in/near C5orf56 and PELI1. Using the newly developed R package 'snpsettest', we performed gene-based association tests and identified gene-level sex-specific associations, including C5orf56 on 5q31.1, CFDP1/TMEM170A/CHST6 on 16q23.1 and ASTN2/TRIM32 on 9q33.1. Our results identified promising genes to pursue in functional studies to better understand sexual dimorphism in COPD.

Introduction

Chronic obstructive pulmonary disease (COPD) is a public health challenge worldwide [1]. Smoking is the most important COPD risk factor, accounting for 8 out of 10 COPD-related deaths in the U.S., but the relationship is complex and many smokers do not develop COPD [1,2]. In the past, COPD was perceived as a disease affecting mostly older men, but recent evidence shows that its prevalence has increased faster in women than in men, due in part to the increased rate of smoking among women [3,4]. Women may be more susceptible to the harmful effects of cigarette smoke, as some studies have found that women have worse disease outcomes than men for an equivalent quantity of cigarettes consumed, although research on this topic is not conclusive [5-7]. An observational study found that as women who smoked aged, they had an accelerated decline in percent predicted forced expiratory volume in one second (FEV1) and a faster recovery in lung function after smoking cessation than men [8]. In smoking- or FEV1-matched men and women, women with COPD reported more symptoms, including dyspnea and cough [4,6,7,9]. Among people with severe COPD, women had less extensive emphysema and thicker small airway walls relative to luminal perimeters than men [6]. The mechanisms underlying these sex-specific differences in COPD are largely unknown, but they may be influenced by genetic factors. Genome-wide association studies (GWAS) have linked many genetic loci with COPD and have contributed to the growing recognition that multiple biological mechanisms result in the observed lung function changes that receive the diagnostic label of COPD. Uncovering these so-called endotypes of COPD is a major goal of precision medicine that seeks to provide tailored medical care. However, research into sex-specific genetic risk factors that may inform sex-related COPD endotypes is lacking [10].

The UK Biobank is a population-based cohort study with over 500,000 participants, which aims to investigate genetic and nongenetic factors that influence a variety of diseases affecting middle and older aged people [11]. Because the UK Biobank is the largest spirometric study ever conducted in the UK, it is an especially valuable resource to study COPD genetics compared to other biobanks. We previously showed that defining COPD according to Global Initiative for Chronic Obstructive Lung Disease (GOLD) criteria, which rely on lung function measures, yielded UK Biobank GWAS results consistent with those of genetic epidemiology cohorts but quite different from those based on ICD codes or self-reported disease [12]. Given its large sample size and the availability of many health-related measures and spirometry data for most genotyped participants regardless of health status, the UK Biobank provides an unprecedented opportunity to study sexually dimorphic features of COPD [11,13].

Gene-based association methods are a complementary strategy for GWAS that evaluate biologically meaningful units of the genome [14-16]. They detect joint effects of multiple variants whose individual effects may not reach genome-wide significance, and they reduce the multiple-testing burden that arises when millions of markers are assessed simultaneously, as is common in GWAS. In addition, gene-based statistics are often required as input for network- and pathway-based approaches. The versatile gene-based association study (VEGAS) method, which is commonly used to investigate putative genes for a trait of interest [15,17], takes as input variant-level p-values and reference linkage disequilibrium (LD) data and computes gene-based p-values using a simulation-based approach. Although VEGAS is offered as a webserver or local program that can be readily used by anyone with GWAS summary statistics regardless of the underlying study design, its use is hampered in practice when gene-based p-values are very small because the number of simulations required to obtain such p-values is very large. GWAS performed with biobanks that contain a large amount of clinical information linked to biological samples often have sample sizes large enough to produce hundreds to thousands of nominal associations [18-20], which renders the VEGAS approach impractical due to the long run times necessary to obtain gene-based association results.

In this study, we identified sex-specific genetic associations of COPD using UK Biobank data, developed an R package that efficiently estimates gene-based p-values from GWAS summary statistics, and compared SNP-level GWAS results with gene-based results obtained using the newly developed package.

Methods

Study population and COPD definition

Phenotypes of UK Biobank participants were obtained from responses to computer-assisted interviews, self-completed questionnaires, and physical measures obtained during in-person visits as reported in data-fields 31 (sex), 50 (height), 21003 (age), 20160 (ever smoked), 22001 (genetic sex) and 22006 (genetic ethnic grouping). To determine lung function, the 'best' measures per individual of FEV1 and forced vital capacity (FVC) were obtained from pre-bronchodilator spirometry blow volume-time series data following the spirometry quality control steps described in Shrine et al. [21]. COPD affection status was assigned based on spirometric evidence of moderate-to-severe airflow limitation by the modified GOLD criteria [22]: cases had FEV1/FVC < 0.7 and FEV1 < 80% predicted, while controls had FEV1/FVC > 0.7 and FEV1 > 80% predicted.
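The case and control definitions above can be written as a small decision rule. The following is a minimal sketch under the thresholds stated in the text; the function name and the choice to return `None` for participants who meet neither definition are illustrative assumptions, not part of the study's pipeline.

```python
def copd_status(fev1_fvc_ratio, fev1_percent_predicted):
    """Assign COPD status from spirometry using the thresholds in the text.

    Hypothetical helper: returns "case", "control", or None when a
    participant meets neither definition (assumed here to be excluded).
    """
    if fev1_fvc_ratio < 0.7 and fev1_percent_predicted < 80:
        return "case"
    if fev1_fvc_ratio > 0.7 and fev1_percent_predicted > 80:
        return "control"
    return None


# Example inputs chosen for illustration only.
print(copd_status(0.64, 68.3))  # meets the case definition
print(copd_status(0.78, 98.0))  # meets the control definition
print(copd_status(0.65, 85.0))  # meets neither definition
```

Under this rule, a participant with a reduced FEV1/FVC ratio but FEV1 of at least 80% predicted is neither a case nor a control.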

Sex-stratified GWAS

For each sex, a GWAS was performed using individuals who self-identified as white British and had very similar genetic ancestry based on genetic principal components, while excluding those who had a mismatch between self-reported and genetic sex as determined by chromosomal make-up, had sex chromosome configurations that were not XX or XY, or had non-normal heterozygosity and missing rates according to genetic information provided by the UK Biobank team [23]. These procedures resulted in 12,958 cases and 95,631 controls for males and 11,311 cases and 123,714 controls for females. At the variant level, variants with an imputation INFO score < 0.7 or minor allele frequency (MAF) < 0.01 were excluded. Genome-wide association testing was conducted using a generalized mixed model framework implemented in SAIGE [24] (v0.43.3) to account for subject relatedness and fine-scale population structure, while including as covariates age, age-squared (age²), height, ever-smoking status (ever vs never), and the first 4 principal components obtained from genotypes. Pack-years of smoking was not included due to a large number of missing values, as described in Shrine et al. [21]. FUMA [25], a web-based post-GWAS annotation tool, was used to identify independent risk loci meeting the genome-wide significance threshold of p-value < 5 × 10⁻⁸ using the UK Biobank white British reference panel. Genetic correlation between the sex-specific GWAS results was estimated via LD score regression [26], using pre-calculated LD scores based on the 1000 Genomes Project data for European populations [27]. Regional association plots were created with LocusZoom (v1.4) [28]. SNP-by-sex interaction tests were performed for the sex-specific genome-wide significant loci using logistic regression models that included data for all participants and the same covariates used in the sex-specific analyses.
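The variant-level exclusions and the significance threshold described above amount to two simple predicates. The sketch below restates them using the cut-offs from the text; the function and constant names are hypothetical and do not correspond to SAIGE or FUMA options.

```python
GENOME_WIDE_P = 5e-8  # genome-wide significance threshold from the text


def passes_variant_qc(info_score, maf, min_info=0.7, min_maf=0.01):
    """Keep a variant unless INFO < 0.7 or MAF < 0.01 (thresholds from the text)."""
    return info_score >= min_info and maf >= min_maf


def is_genome_wide_significant(p_value):
    """True when the association p-value is below 5e-8."""
    return p_value < GENOME_WIDE_P


# Illustrative values only.
print(passes_variant_qc(info_score=0.95, maf=0.02))   # retained
print(passes_variant_qc(info_score=0.65, maf=0.20))   # excluded on INFO
print(is_genome_wide_significant(4.55e-8))            # below the threshold
```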

Gene-based association tests

For gene-based association tests, we employed the statistical model described in VEGAS [15]. Briefly, the test statistic was defined as $Q = Z^{\top}Z = \sum_{i=1}^{p} z_i^2$, that is, the sum of squared variant-level z-statistics, where $Z = (z_1, \ldots, z_p)^{\top}$ is a vector that follows a multivariate normal distribution with mean vector $0$ and covariance matrix $K$; i.e., $Z \sim N(0, K)$. VEGAS uses Monte Carlo simulations to approximate the distribution of $Q$, and thus, compute gene-based p-values. Instead of that approach, we rewrote $Q = X^{\top}\Lambda X = \sum_{i=1}^{p} \lambda_i x_i^2$, where $K = P\Lambda P^{\top}$ (by the spectral theorem), $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, and $X = \Lambda^{-1/2}P^{\top}Z \sim N(0, I_p)$. This represents a quadratic form in independent central normal variables and its distribution can be evaluated with various methods such as numerical inversion of the characteristic function [29]. We developed the R package snpsettest (https://CRAN.R-project.org/package=snpsettest) based on this latter version of the test statistic while utilizing the algorithm of Davies [30], or saddlepoint approximation [31], to obtain gene-based p-values. We performed gene-based association analyses with snpsettest using gene annotations from GENCODE release 19 [32] for the sex-stratified COPD GWAS results. We considered genes in the following biotypes: protein-coding, immunoglobulin variable chain and T-cell receptor genes, which included 18,774 entries. For each gene, any variants within 20 kb of 5' and 3' UTRs were selected for association tests and the 1000 Genomes phase 3 European reference panel [27] (genotypes of 240 males and 263 females) was used to infer relationships between markers. A significant association was defined as having a Bonferroni-corrected p-value < 0.05. Gene-based results were also obtained with a local instance of VEGAS2 [17] using default parameters (up to one million simulations for p-value calculation) except for gene boundaries for SNP selection that were set to 20 kb of 5' and 3' UTRs of a gene.
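To make the rewritten statistic concrete, the sketch below evaluates the tail probability of a weighted sum of independent 1-df chi-square variables by numerically inverting its characteristic function. It uses Imhof's inversion formula with a plain midpoint rule, which is a stand-in chosen for brevity and is not the Davies algorithm or the saddlepoint approximation used by snpsettest. It assumes a toy gene with two SNPs whose LD correlation is r, so that the eigenvalues of K are 1 + r and 1 - r; all function names are hypothetical.

```python
import math


def gene_statistic(z_scores):
    """Q: the sum of squared variant-level z-statistics."""
    return sum(z * z for z in z_scores)


def ld_eigenvalues_two_snps(r):
    """Eigenvalues of the 2x2 LD correlation matrix [[1, r], [r, 1]]."""
    return [1.0 + r, 1.0 - r]


def imhof_pvalue(q, eigenvalues, upper=200.0, step=0.002):
    """Approximate P(sum_i lambda_i * x_i^2 > q) for independent N(0, 1) x_i.

    Midpoint-rule integration of Imhof's inversion formula, truncated at
    `upper`. The fixed truncation and step make this unsuitable for very
    small p-values; it is an illustration only.
    """
    lams = [lam for lam in eigenvalues if lam > 1e-12]  # drop null directions
    n_steps = int(upper / step)
    total = 0.0
    for k in range(n_steps):
        u = (k + 0.5) * step
        theta = 0.5 * sum(math.atan(lam * u) for lam in lams) - 0.5 * q * u
        rho = 1.0
        for lam in lams:
            rho *= (1.0 + (lam * u) ** 2) ** 0.25
        total += math.sin(theta) / (u * rho)
    p = 0.5 + total * step / math.pi
    return min(max(p, 0.0), 1.0)


# Toy example: two SNPs with z-statistics 2.0 and 1.5 and LD correlation 0.5.
q = gene_statistic([2.0, 1.5])
print(imhof_pvalue(q, ld_eigenvalues_two_snps(0.5)))
```

When r = 0 the two eigenvalues are both 1 and Q follows a chi-square distribution with 2 degrees of freedom, whose tail probability is exp(-Q/2); this gives a convenient check on the numerical integration.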

Results

Sample characteristics

Characteristics of GWAS subjects stratified by sex are provided in Table 1. Male and female COPD cases were older and more likely to have ever smoked than controls. The proportion of COPD cases was higher in males (11.9%) than females (8.4%), which is consistent with a higher proportion of male subjects having a positive smoking history. Female subjects were more likely to have a diagnosis of asthma (ICD-based prevalence of 5.7% in male versus 7.2% in female subjects; self-reported doctor's diagnosis prevalence of 10.9% in male versus 12.5% in female subjects), and a substantially greater proportion of COPD cases had asthma compared to controls. Consistent with the definition of COPD, lung function was lower in cases than controls as measured by percent predicted FEV1 and FEV1/FVC ratio.
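The case proportions quoted above follow directly from the group sizes reported in the Methods. A short arithmetic check, with hypothetical helper names, is:

```python
# Group sizes as reported in the Methods and abstract.
counts = {
    "male": {"case": 12958, "control": 95631},
    "female": {"case": 11311, "control": 123714},
}


def case_percent(sex):
    """Percentage of subjects of the given sex who are COPD cases."""
    group = counts[sex]
    return 100.0 * group["case"] / (group["case"] + group["control"])


def male_percent():
    """Percentage of all GWAS subjects who are male."""
    n_male = sum(counts["male"].values())
    n_female = sum(counts["female"].values())
    return 100.0 * n_male / (n_male + n_female)


print(round(case_percent("male"), 1))    # 11.9
print(round(case_percent("female"), 1))  # 8.4
print(round(male_percent(), 1))          # 44.6
```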

Table 1.

Characteristics of GWAS subjects

| Characteristic | Total | Male, control | Male, case | Female, control | Female, case |
|---|---|---|---|---|---|
| N | 243,614 | 95,631 | 12,958 | 123,714 | 11,311 |
| Age, mean (SD) | 57.6 (8.4) | 57.6 (8.7) | 60.6 (7.7) | 57.0 (8.3) | 60.3 (7.3) |
| Male, % | 44.6 | - | - | - | - |
| Height (cm), mean (SD) | 168.4 (9.1) | 175.7 (6.6) | 175.8 (6.9) | 162.6 (6.1) | 162.7 (6.3) |
| Ever smoker, % | 59.4 | 62.9 | 76.4 | 54.0 | 69.2 |
| Asthma, %: ICD-coded diagnosis* | 6.5 | 3.9 | 19.3 | 5.8 | 21.7 |
| Asthma, %: self-reported doctor's diagnosis† | 11.8 | 8.2 | 31.3 | 10.6 | 32.9 |
| FEV1 percent predicted, median (IQR) | 95.4 (17.4) | 98.0 (16.2) | 68.3 (15.5) | 96.3 (15.4) | 67.5 (16.2) |
| FEV1/FVC ratio, median (IQR) | 0.77 (0.07) | 0.78 (0.06) | 0.64 (0.09) | 0.78 (0.06) | 0.65 (0.08) |

ICD: International Classification of Diseases; IQR: Interquartile range.

* Affection status was assigned based on having any sub-categories of ICD-9 493 and/or ICD-10 J45.

† Affection status was assigned based on the UK Biobank data-fields 22127 (Doctor diagnosed asthma) and 20002 (Non-cancer illness code, self-reported).

Most COPD-associated loci were consistent by sex

We identified 17 and 14 genome-wide significant loci in male and female GWAS, respectively (Figure 1), each of which was previously identified in GWAS of COPD and/or lung function-related traits (e.g., FEV1, FEV1/FVC ratio, and smoking behavior). Nine of these loci were present in male and female GWAS results (Table 2), including the most well-replicated COPD locus near HHIP (Figure 2A). The estimated genetic correlation between male and female results was 0.961 (SE 0.04), demonstrating shared genetic effects overall. Male-specific associations were observed at loci in/near C1orf87, PELI1, ARHGEF3, C5orf56, PPP1R3B, BNC1, and CFDP1, and female-specific associations were observed at loci in/near MECOM, ADAM19, C10orf11, KANSL1, and SOGA2. Regional association plots for these genes suggested that some of the sex-specific associations were not due to differences in statistical power attributable to a decreased sample size, most notably that of the C5orf56 locus (Figure 2B). Results of the SNP-by-sex interaction analyses for lead variants are in Table 3. Consistent with the sex-stratified GWAS, rs2158101 at the male-specific locus in C5orf56 had the lowest interaction p-value (P = 1.16 × 10⁻⁴). The other lead variants in the male-specific loci had interaction p-values < 0.05 except for rs330934 at the PPP1R3B locus. Among the lead variants in female-specific loci, only rs2637261 at the C10orf11 locus had an interaction p-value < 0.05.

Figure 1.

Manhattan plots of sex-stratified COPD GWAS results for (A) male and (B) female subjects. Each locus was annotated with its nearest coding gene. The green horizontal dashed line indicates the genome-wide significance threshold of 5 × 10⁻⁸. Overlapping loci in the GWAS are shown in blue. Sex-specific loci are shown in orange when genome-wide significant, with the corresponding non-significant region shown in red in the alternate sex-specific plot.

Table 2.

Summary statistics for the lead variants at genome-wide significant loci

| rsID | Position* | RA | Male RAF | Male OR | Male P-value | Female RAF | Female OR | Female P-value |
|---|---|---|---|---|---|---|---|---|
| Male risk loci | | | | | | | | |
| rs77714938 | 1:60926112 | C | 0.05 | 1.19 (1.12-1.27) | 3.14×10⁻⁸ | 0.05 | 1.07 (1.01-1.15) | 3.23×10⁻² |
| rs572473905 | 2:64288008 | C | 0.02 | 0.74 (0.67-0.83) | 4.55×10⁻⁸ | 0.02 | 0.98 (0.88-1.10) | 7.74×10⁻¹ |
| 3:57060652_GCCA_G | 3:57060652 | G | 0.16 | 1.11 (1.07-1.15) | 3.71×10⁻⁸ | 0.16 | 1.01 (0.97-1.05) | 6.22×10⁻¹ |
| rs1903003† | 4:89886297 | T | 0.55 | 1.08 (1.06-1.11) | 4.86×10⁻⁹ | 0.54 | 1.08 (1.05-1.11) | 2.87×10⁻⁷ |
| rs34712979† | 4:106819053 | A | 0.26 | 1.18 (1.14-1.22) | 1.17×10⁻²⁵ | 0.26 | 1.18 (1.14-1.22) | 1.59×10⁻²² |
| rs6828540† | 4:145463231 | A | 0.40 | 0.85 (0.83-0.87) | 6.61×10⁻³¹ | 0.40 | 0.87 (0.85-0.90) | 3.73×10⁻²⁰ |
| rs2158101 | 5:131769273 | A | 0.23 | 1.12 (1.09-1.16) | 1.87×10⁻¹² | 0.23 | 1.02 (0.99-1.06) | 1.65×10⁻¹ |
| rs10037493† | 5:147854970 | T | 0.45 | 0.88 (0.86-0.91) | 1.04×10⁻¹⁸ | 0.45 | 0.92 (0.89-0.94) | 3.14×10⁻⁹ |
| rs3210176† | 6:32627850 | C | 0.37 | 1.10 (1.07-1.14) | 9.68×10⁻¹² | 0.38 | 1.10 (1.07-1.13) | 4.20×10⁻¹⁰ |
| rs7753012† | 6:142745883 | G | 0.31 | 0.86 (0.84-0.89) | 1.72×10⁻²² | 0.31 | 0.90 (0.88-0.93) | 1.42×10⁻¹⁰ |
| rs330934 | 8:9013766 | T | 0.33 | 1.08 (1.05-1.12) | 4.35×10⁻⁸ | 0.34 | 1.07 (1.03-1.10) | 3.02×10⁻⁵ |
| rs4743294 | 9:101673848 | G | 0.71 | 0.92 (0.89-0.95) | 3.92×10⁻⁸ | 0.71 | 0.97 (0.94-1.00) | 8.43×10⁻² |
| rs2271804† | 10:12252217 | A | 0.53 | 0.91 (0.89-0.94) | 6.22×10⁻¹¹ | 0.53 | 0.92 (0.89-0.95) | 4.42×10⁻⁹ |
| rs2119568† | 15:71665824 | C | 0.18 | 1.16 (1.12-1.20) | 1.67×10⁻¹⁶ | 0.18 | 1.12 (1.08-1.16) | 5.81×10⁻⁹ |
| rs72740964† | 15:78868636 | A | 0.33 | 1.13 (1.10-1.16) | 1.55×10⁻¹⁶ | 0.33 | 1.11 (1.07-1.14) | 6.61×10⁻¹¹ |
| 15:84034556_CA_C | 15:84034556 | C | 0.29 | 0.91 (0.88-0.94) | 3.55×10⁻⁹ | 0.30 | 0.99 (0.96-1.02) | 4.68×10⁻¹ |
| rs72787160 | 16:75445162 | C | 0.59 | 1.10 (1.07-1.13) | 3.22×10⁻¹² | 0.59 | 1.05 (1.02-1.08) | 7.17×10⁻⁴ |
| Female risk loci | | | | | | | | |
| rs1420472 | 3:168776326 | T | 0.44 | 1.06 (1.03-1.09) | 9.69×10⁻⁶ | 0.44 | 1.09 (1.06-1.12) | 1.13×10⁻⁹ |
| rs2045517† | 4:89870964 | T | 0.41 | 1.06 (1.04-1.09) | 8.86×10⁻⁶ | 0.41 | 1.09 (1.06-1.12) | 9.12×10⁻⁹ |
| rs34712979† | 4:106819053 | A | 0.26 | 1.18 (1.14-1.22) | 1.17×10⁻²⁵ | 0.26 | 1.18 (1.14-1.22) | 1.59×10⁻²² |
| rs6817273† | 4:145492003 | C | 0.40 | 0.85 (0.83-0.88) | 5.75×10⁻²⁹ | 0.40 | 0.87 (0.85-0.90) | 3.66×10⁻²¹ |
| rs6580550† | 5:147856232 | C | 0.45 | 0.89 (0.86-0.91) | 1.84×10⁻¹⁸ | 0.45 | 0.92 (0.89-0.94) | 1.26×10⁻⁹ |
| rs112325689 | 5:156931851 | AGCCGG | 0.34 | 1.07 (1.04-1.10) | 6.66×10⁻⁶ | 0.34 | 1.11 (1.08-1.14) | 7.97×10⁻¹² |
| rs17843608† | 6:32620454 | C | 0.48 | 1.09 (1.06-1.13) | 6.30×10⁻¹⁰ | 0.48 | 1.12 (1.09-1.16) | 5.59×10⁻¹⁴ |
| rs262115† | 6:142817407 | C | 0.31 | 0.87 (0.84-0.90) | 7.36×10⁻²¹ | 0.31 | 0.90 (0.87-0.93) | 1.06×10⁻¹¹ |
| rs2001546† | 10:12285735 | T | 0.52 | 0.92 (0.89-0.94) | 3.60×10⁻¹⁰ | 0.51 | 0.92 (0.89-0.94) | 7.44×10⁻¹⁰ |
| rs2637261 | 10:78320593 | T | 0.51 | 1.04 (1.01-1.07) | 4.35×10⁻³ | 0.51 | 1.09 (1.06-1.12) | 1.65×10⁻⁹ |
| 15:71630062_TGAG_T† | 15:71630062 | T | 0.39 | 1.09 (1.06-1.12) | 1.14×10⁻⁹ | 0.39 | 1.10 (1.07-1.14) | 3.83×10⁻¹¹ |
| 15:78804071_TA_T† | 15:78804071 | T | 0.67 | 0.89 (0.86-0.92) | 4.22×10⁻¹⁵ | 0.66 | 0.89 (0.87-0.92) | 5.67×10⁻¹³ |
| rs113434679 | 17:44126765 | A | 0.20 | 1.06 (1.03-1.10) | 3.77×10⁻⁴ | 0.20 | 1.11 (1.07-1.15) | 2.97×10⁻⁸ |
| 18:8803157_TA_T | 18:8803157 | T | 0.70 | 0.95 (0.92-0.97) | 2.78×10⁻⁴ | 0.70 | 0.91 (0.88-0.94) | 1.07×10⁻⁸ |

RA: Risk allele; RAF: Risk allele frequency; OR: Odds ratio

* Position based on GRCh37, † Overlapping loci according to genomic position

Figure 2.

Regional association plots for the loci in (A) HHIP as an example of a finding present in male and female GWAS and (B) C5orf56, the locus with the greatest sex-specific difference. Left panels display the results of the male GWAS and the right panels display the results of the female GWAS.

Table 3.

SNP-by-sex interaction p-values for non-overlapping lead variants at sex-specific genome-wide significant loci in Table 2

| rsID | Position | Risk allele | Beta | P-value* |
|---|---|---|---|---|
| Male-specific risk loci | | | | |
| rs77714938 | 1:60926112 | C | 0.105 | 2.19×10⁻² |
| rs572473905 | 2:64288008 | T | -0.283 | 2.84×10⁻⁴ |
| 3:57060652_GCCCA_G | 3:57060652 | G | 0.095 | 5.63×10⁻⁴ |
| rs2158101 | 5:131769273 | A | 0.091 | 1.16×10⁻⁴ |
| rs330934 | 8:9013766 | T | 0.016 | 0.441 |
| rs4743294 | 9:101673848 | G | -0.056 | 1.07×10⁻² |
| 15:84034556_CA_C | 15:84034556 | C | -0.082 | 3.69×10⁻⁴ |
| rs72787160 | 16:75445162 | C | 0.048 | 1.76×10⁻² |
| Female-specific risk loci | | | | |
| rs1420472 | 3:168776326 | T | -0.027 | 0.179 |
| rs112325689 | 5:156931851 | AGCCGG | -0.038 | 6.86×10⁻² |
| rs2637261 | 10:78320593 | T | -0.048 | 1.73×10⁻² |
| rs113434679 | 17:44126765 | A | -0.039 | 0.121 |
| 18:8803157_TA_T | 18:8803157 | T | 0.036 | 0.105 |
* P-values from logistic regressions with female as a reference group
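
The interaction coefficients in Table 3 can be related to the stratified odds ratios in Table 2. With female as the reference group, a SNP-by-sex interaction beta is expected to be close to ln(OR in males) minus ln(OR in females). The sketch below illustrates this with three lead variants; the match is only approximate because the odds ratios in Table 2 are rounded and the pooled model shares covariate effects across sexes. The function name is hypothetical.

```python
import math


def approx_interaction_beta(or_male, or_female):
    """Difference in log odds ratios, male minus female (female as reference)."""
    return math.log(or_male) - math.log(or_female)


# (rsID, male OR, female OR from Table 2, interaction beta from Table 3)
examples = [
    ("rs2158101", 1.12, 1.02, 0.091),
    ("rs572473905", 0.74, 0.98, -0.283),
    ("rs77714938", 1.19, 1.07, 0.105),
]

for rsid, or_m, or_f, reported in examples:
    approx = approx_interaction_beta(or_m, or_f)
    print(rsid, round(approx, 3), reported)
```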

The snpsettest package results were comparable to those of VEGAS

Comparison of gene-based association tests obtained with snpsettest versus those of VEGAS found strong agreement between the two: estimated correlations of -log10(p-value) were 0.981 for the male-specific GWAS and 0.991 for the female-specific GWAS (Figure 3). We observed a lack of consistency with VEGAS for genes whose actual p-values were < 10⁻⁶, as expected given that VEGAS was run with at most one million simulations per gene. When only the gene-based p-values ≥ 10⁻⁶ were considered, the correlations were 0.999 for both sets of results. A major advantage of the snpsettest package over VEGAS was speed. On the same laptop (i9-9980HK, 32 GB memory, Ubuntu 18.04 LTS on Windows Subsystem for Linux), VEGAS took 29 and 32 hours to complete the gene-based association tests for 18,774 genes using ~8.8 million GWAS summary statistics of males and females, respectively, while the snpsettest package completed the tests in approximately 9 minutes.
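The disagreement below 10⁻⁶ reflects a general property of simulation-based p-values: with N simulations, the estimate cannot fall below roughly 1/N. The following sketch shows this with a generic Monte Carlo estimator written for illustration; it is not VEGAS's implementation, and the "+1" correction and function names are assumptions.

```python
import random


def empirical_pvalue(observed_q, eigenvalues, n_sim, rng):
    """Monte Carlo p-value for Q = sum_i lambda_i * x_i^2 with x_i ~ N(0, 1).

    Uses the (exceedances + 1) / (n_sim + 1) estimator, so the result is
    never smaller than 1 / (n_sim + 1).
    """
    exceed = 0
    for _ in range(n_sim):
        q = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues)
        if q >= observed_q:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


def smallest_reportable_pvalue(n_sim):
    """Lower bound on the estimate from n_sim simulations."""
    return 1 / (n_sim + 1)


rng = random.Random(0)
# A very strong signal still yields the floor value with 1,000 simulations.
print(empirical_pvalue(100.0, [1.0, 1.0], 1000, rng))
# With one million simulations the floor is about 1e-6.
print(smallest_reportable_pvalue(10**6))
```

An analytic evaluation of the distribution of Q has no such floor, which is why the two approaches diverge only for the most strongly associated genes.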

Figure 3.

Comparison of -log10(P) obtained with the snpsettest package versus VEGAS using the summary statistics from GWAS of (A) male and (B) female subjects. The diagonal line indicates a 1:1 relationship.

Sex-specific gene-based association tests identified two additional loci of interest

Sex-specific gene-based associations were determined as those meeting a Bonferroni-adjusted threshold, while excluding loci in the HLA region (Figure 4). Consistent with the GWAS results, C5orf56 on 5q31.1 had the strongest sex-specific association. Other male-specific associations included OTUD4, ABCE1, ANAPC10 on 4q31.21; FBXO38 on 5q32; and TMEM170A, CFDP1, and CHST6 on 16q23.1. Female-specific associations were ASTN2 and TRIM32 on 9q33.1; LCAT, CTRL, and PSMB10 on 16q22.1; and NSF, CRHR1, and SPPL2C on 17q21.31. Most of the significant genes were in/near the genome-wide significant GWAS loci, but novel sex-specific gene associations on 16q22.1 and 9q33.1 were observed only via gene-level statistics.
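
For reference, a Bonferroni correction over the 18,774 genes tested corresponds to a per-gene threshold of 0.05 / 18,774, or about 2.7 × 10⁻⁶. The sketch below applies that threshold to a pair of male and female gene-based p-values. The labeling rule (significant in one sex and not in the other) is a simplifying assumption made for illustration, the exclusion of the HLA region is not modeled, and the names and example p-values are hypothetical.

```python
N_GENES = 18774
ALPHA = 0.05
BONFERRONI_P = ALPHA / N_GENES  # about 2.7e-6


def classify_gene(p_male, p_female, threshold=BONFERRONI_P):
    """Label a gene by which sex-stratified tests pass the Bonferroni threshold."""
    male_sig = p_male < threshold
    female_sig = p_female < threshold
    if male_sig and female_sig:
        return "shared"
    if male_sig:
        return "male-specific"
    if female_sig:
        return "female-specific"
    return "neither"


# Made-up p-values, for illustration only.
print(BONFERRONI_P)
print(classify_gene(1e-9, 0.2))
print(classify_gene(0.2, 1e-9))
```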

Figure 4.

Results of gene-based association tests. Sex-specific associations between genes and COPD are shown in a pairwise manner. Genes showing a significant sex-specific association are labeled. The vertical and horizontal dashed lines indicate the Bonferroni-corrected p-value = 0.05.

Discussion

COPD is a complex disease with sex-specific differences in susceptibility and presentation [10]. Previous studies by COPDGene (Genetic Epidemiology of COPD) found that sex-related genetic components conferred a higher risk of severe, early-onset COPD in women [33,34], but the genetic contributions that lead to sex divergence in COPD remain poorly understood. In sex-stratified COPD GWAS, we identified 17 genome-wide significant loci for males and 14 for females. Although the number of significant loci identified was smaller than that in our previous study [12] that used all subjects, likely due to decreased statistical power, all loci have been previously associated with COPD and/or lung function-related traits [21,35]. Our results demonstrated that a large proportion of genetic liability is shared between males and females: i) there were 9 overlapping genome-wide significant loci corresponding to genes FAM13A, NPNT, HHIP, HTR4, HLA-DQB1 (HLA-DQA1), GPR126, CDC123, THSD4, and CHRNA5 (HYKK), and ii) there was substantial genetic correlation when considering the effects of all variants, including those not reaching genome-wide significance.

Sex-specific genome-wide significant associations were observed in 8 male-specific and 5 female-specific loci. SNP-by-sex interaction tests for these loci using data for all subjects supported most of the stratified findings except for the male-specific locus in PPP1R3B and female-specific loci in/near MECOM, ADAM19, KANSL1, and SOGA2. The locus in C5orf56 showed the largest difference in patterns of association: it was convincingly associated with COPD in males but no signal was present in females. C5orf56 is a long non-coding RNA gene known as IRF1-AS1 that has not been mechanistically linked to COPD, but the nearby IRF1 gene encodes interferon regulatory factor 1, which has been associated with anti-viral defense in airway epithelium [36]. Given that respiratory viral infections are a common cause of exacerbations of chronic lung diseases and that viruses (e.g., human rhinovirus, respiratory syncytial virus, and influenza) are often detected during COPD exacerbations specifically [37,38], the associated locus may influence susceptibility to COPD via altered responses to virus exposure. Interestingly, the IRF1 locus has been proposed as a strong candidate region for male-specific asthma susceptibility that may be attributable to sex-specific interferon responses [39]. Another male-specific significant locus we observed near PELI1 may be linked with response to airway viruses [40], given that PELI1 is involved in IL-1 signaling and its expression was correlated with the number of exacerbations experienced by patients with obstructive airway disease [41]. Other regions with sex-specific associations in ARHGEF3, C1orf87, and C10orf11 are not easily linked to mechanistic hypotheses based on what is currently known about the function of these genes.

To obtain gene-based associations from SNP-level results, we developed the snpsettest R package that modifies the statistical test in the VEGAS software [15,17] to compute gene-based p-values more efficiently. Although VEGAS can run in a reasonable amount of time if the maximum number of simulations is bounded (e.g., 10⁶), this truncation is not optimal when gene-based association statistics are needed for large-scale GWAS with many significant SNP associations and, consequently, many gene-based associations that are necessary as input for subsequent pathway and network analyses. We demonstrated that our package produces results consistent with those of VEGAS but with a much shorter runtime.

By comparing gene-level associations obtained for sex-specific COPD GWAS, we found that C5orf56 had the strongest sex-specific COPD association, as it was present only in males. Other male-specific associations were found in CFDP1, TMEM170A, and CHST6 located in the non-overlapping significant GWAS locus on 16q23.1. This region has been implicated in coronary heart disease, whose mortality is higher among men [42,43], and was identified as having a potential genetic overlap with COPD susceptibility [35]. Although OTUD4, ABCE1, and ANAPC10 on 4q31.21 and FBXO38 on 5q32 had male-specific associations, they had similar patterns of regional variant associations despite not reaching genome-wide significance levels in the female GWAS. The most prominent female-specific associations were observed in ASTN2 and TRIM32 on 9q33.1, a region previously reported as having sex-specific associations with neurodevelopmental disorders [44]. Moreover, several genes on 16q22.1 were associated with COPD only in females, of which SMPD3 was previously reported as having gene-by-smoking interaction effects on COPD [45], suggesting that the 16q22.1 locus may be involved in sex-related differences in lung function decline between male and female smokers.

Our study is limited in that some sex-specific associations may not have been observed due to insufficient statistical power resulting from decreased sample sizes. We chose to perform stratified analyses for ease of interpretation of association odds ratios, and we note that identification of interactions in general requires more statistical power than that of a standard GWAS because effect sizes of interactions are expected to be smaller than their main effects. Future replication of results in independent cohorts could ensure generalizability of our findings. Our study lacks functional validation of sex-specific genetic components in support of the statistical associations, but future experimental studies can explore our findings to elucidate the mechanisms underlying the observed differences. Another limitation is that the UK Biobank is not representative of the general population with respect to a variety of health characteristics; participants were more likely to be older, to be female, and to be more health-conscious than nonparticipants [46]. Additionally, our results were obtained from participants of European ancestry, and thus, may not generalize to other racial/ethnic groups. Our gene-based association tests did not fully utilize information from the GWAS results since the lead variants in some loci were not necessarily within boundaries of protein-coding genes.

In summary, sex-stratified GWAS of COPD found substantial overlap in the significant risk loci and genetic correlation of male versus female results, but evidence of sex-specific effects was found for several genes, the most prominent of which were C5orf56, CFDP1, TMEM170A, CHST6, ASTN2 and TRIM32. We developed the snpsettest package to conduct gene-based association tests with GWAS summary statistics and identified genes showing a sex-specific association. Contrasting the GWAS and gene-based association test results provided insight into what genetic components can be the topic of future studies aimed at understanding sexual dimorphisms in COPD.

Acknowledgements

This work was supported by National Institutes of Health (NIH) awards R01 HL133433 and R01 HL141991. This research was conducted using the UK Biobank Resource under Application Number 40375.


References

1. Vogelmeier CF, Criner GJ, Martinez FJ, Anzueto A, Barnes PJ, Bourbeau J, et al. Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Lung Disease 2017 Report. GOLD Executive Summary. Am J Respir Crit Care Med. 2017 Jan 27;195(5):557–82. doi: 10.1164/rccm.201701-0218PP.
2. United States Surgeon General. The Health Consequences of Smoking -- 50 Years of progress: A Report of the Surgeon General: (510072014-001) [Internet]. American Psychological Association; 2014. [cited 2019 Aug 9]. Available from: http://doi.apa.org/get-pe-doi.cfm?doi=10.1037/e510072014-001
3. Ntritsos G, Franek J, Belbasis L, Christou MA, Markozannes G, Altman P, et al. Gender-specific estimates of COPD prevalence: a systematic review and meta-analysis. Int J Chron Obstruct Pulmon Dis. 2018 May 10;13:1507–14. doi: 10.2147/COPD.S146390.
4. Gut-Gobert C, Cavaillès A, Dixmier A, Guillot S, Jouneau S, Leroyer C, et al. Women and COPD: do we need more evidence? Eur Respir Rev [Internet]. 2019 Mar 31;28(151). doi: 10.1183/16000617.0055-2018. [cited 2021 Mar 3]. Available from: https://err.ersjournals.com/content/28/151/180055
5. Global Initiative for Chronic Obstructive Lung Disease. Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Lung Disease 2021 Report [Internet]. 2020. Available from: https://goldcopd.org/2021-gold-reports/
6. Martinez FJ, Curtis JL, Sciurba F, Mumford J, Giardino ND, Weinmann G, et al. Sex Differences in Severe Pulmonary Emphysema. Am J Respir Crit Care Med. 2007 Aug 1;176(3):243–52. doi: 10.1164/rccm.200606-828OC.
7. Watson L, Vestbo J, Postma DS, Decramer M, Rennard S, Kiri VA, et al. Gender differences in the management and experience of Chronic Obstructive Pulmonary Disease. Respir Med. 2004 Dec;98(12):1207–13. doi: 10.1016/j.rmed.2004.05.004.
8. Gan WQ, Man SFP, Postma DS, Camp P, Sin DD. Female smokers beyond the perimenopausal period are at increased risk of chronic obstructive pulmonary disease: a systematic review and meta-analysis. Respir Res. 2006 Mar 29;7:52. doi: 10.1186/1465-9921-7-52.
9. de Torres JP, Casanova C, Hernández C, Abreu J, Aguirre-Jaime A, Celli BR. Gender and COPD in patients attending a pulmonary clinic. Chest. 2005 Oct;128(4):2012–6. doi: 10.1378/chest.128.4.2012.
10. Hardin M, Cho MH, Sharma S, Glass K, Castaldi PJ, McDonald M-L, et al. Sex-Based Genetic Association Study Identifies CELSR1 as a Possible Chronic Obstructive Pulmonary Disease Risk Locus among Women. Am J Respir Cell Mol Biol. 2017 Mar;56(3):332–41. doi: 10.1165/rcmb.2016-0172OC.
11. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med [Internet]. 2015 Mar 31;12(3). doi: 10.1371/journal.pmed.1001779. [cited 2019 Jul 20]. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4380465/
12. Joo J, Hobbs BD, Cho MH, Himes BE. Trait Insights Gained by Comparing Genome-Wide Association Study Results using Different Chronic Obstructive Pulmonary Disease Definitions. AMIA Summits Transl Sci Proc. 2020 May 30;2020:278–87.
13. Gupta RP, Strachan DP. Ventilatory function as a predictor of mortality in lifelong non-smokers: evidence from large British cohort studies. BMJ Open. 2017 Jul 1;7(7):e015381. doi: 10.1136/bmjopen-2016-015381.
14. Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet. 2015 Sep;47(9):1091–8. doi: 10.1038/ng.3367.
15. Liu JZ, Mcrae AF, Nyholt DR, Medland SE, Wray NR, Brown KM, et al. A Versatile Gene-Based Test for Genome-wide Association Studies. Am J Hum Genet. 2010 Jul 9;87(1):139–45. doi: 10.1016/j.ajhg.2010.06.009.
16. de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: Generalized Gene-Set Analysis of GWAS Data. PLOS Comput Biol. 2015 Apr 17;11(4):e1004219. doi: 10.1371/journal.pcbi.1004219.
17. Mishra A, Macgregor S. VEGAS2: Software for More Flexible Gene-Based Testing. Twin Res Hum Genet Off J Int Soc Twin Stud. 2015 Feb;18(1):86–91. doi: 10.1017/thg.2014.79.
18. Scott SA, Owusu Obeng A, Botton MR, Yang Y, Scott ER, Ellis SB, et al. Institutional profile: translational pharmacogenomics at the Icahn School of Medicine at Mount Sinai. Pharmacogenomics. 2017 Oct;18(15):1381–6. doi: 10.2217/pgs-2017-0137.
19. Cox N. UK Biobank shares the promise of big data. Nature. 2018 Oct;562(7726):194–5. doi: 10.1038/d41586-018-06948-3.
20. Canela-Xandri O, Rawlik K, Tenesa A. An atlas of genetic associations in UK Biobank. Nat Genet. 2018 Nov;50(11):1593–9. doi: 10.1038/s41588-018-0248-z.
21. Shrine N, Guyatt AL, Erzurumluoglu AM, Jackson VE, Hobbs BD, Melbourne CA, et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat Genet. 2019 Mar;51(3):481–93. doi: 10.1038/s41588-018-0321-7.
  • 22.Hobbs BD, de Jong K, Lamontagne M, Bossé Y, Shrine N, Artigas MS, et al. Genetic loci associated with chronic obstructive pulmonary disease overlap with loci for lung function and pulmonary fibrosis. Nat Genet. 2017 Mar;49(3):426–32. doi: 10.1038/ng.3752. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018 Oct;562(7726):203–9. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet. 2018 Sep;50(9):1335. doi: 10.1038/s41588-018-0184-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Watanabe K, Taskesen E, van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun. 2017 Nov 28;8(1):1826. doi: 10.1038/s41467-017-01261-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh P-R, et al. An Atlas of Genetic Correlations across Human Diseases and Traits. Nat Genet. 2015 Nov;47(11):1236–41. doi: 10.1038/ng.3406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015 Oct;526(7571):68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinforma Oxf Engl. 2010 Sep 15;26(18):2336–7. doi: 10.1093/bioinformatics/btq419. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Duchesne P, De Micheaux P. Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Comput Stat Data Anal. 2010;54:858–62. [Google Scholar]
  • 30.Davies RB. Algorithm AS 155: The Distribution of a Linear Combination of Chi-square Random Variables. J R Stat Soc Ser C Appl Stat. 1980;29(3):323–33. [Google Scholar]
  • 31.Kuonen D. Saddlepoint Approximations for Distributions of Quadratic Forms in Normal Variables. Biometrika. 1999;86(4):929–35. [Google Scholar]
  • 32.Frankish A, Diekhans M, Ferreira A-M, Johnson R, Jungreis I, Loveland J, et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019 Jan 8;47(D1):D766–73. doi: 10.1093/nar/gky955. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Silverman EK, Weiss ST, Drazen JM, Chapman HA, Carey V, Campbell EJ, et al. Gender-Related Differences in Severe, Early-Onset Chronic Obstructive Pulmonary Disease. Am J Respir Crit Care Med. 2000 Dec 1;162(6):2152–8. doi: 10.1164/ajrccm.162.6.2003112. [DOI] [PubMed] [Google Scholar]
  • 34.Foreman MG, Zhang L, Murphy J, Hansel NN, Make B, Hokanson JE, et al. Early-onset chronic obstructive pulmonary disease is associated with female sex, maternal factors, and African American race in the COPDGene Study. Am J Respir Crit Care Med. 2011 Aug 15;184(4):414–20. doi: 10.1164/rccm.201011-1928OC. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Sakornsakolpat P, Prokopenko D, Lamontagne M, Reeve NF, Guyatt AL, Jackson VE, et al. Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. Nat Genet. 2019;51(3):494–505. doi: 10.1038/s41588-018-0342-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Kalinowski A, Galen BT, Ueki IF, Sun Y, Mulenos A, Osafo-Addo A, et al. Respiratory syncytial virus activates epidermal growth factor receptor to suppress interferon regulatory factor 1-dependent interferon-lambda and antiviral defense in airway epithelium. Mucosal Immunol. 2018 May;11(3):958–67. doi: 10.1038/mi.2017.120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Kalinowski A, Ueki I, Min-Oo G, Ballon-Landa E, Knoff D, Galen B, et al. EGFR activation suppresses respiratory virus-induced IRF1-dependent CXCL10 production. Am J Physiol - Lung Cell Mol Physiol. 2014 Jul 15;307(2):L186–96. doi: 10.1152/ajplung.00368.2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Linden D, Guo-Parke H, Coyle PV, Fairley D, McAuley DF, Taggart CC, et al. Respiratory viral infection: a potential “missing link” in the pathogenesis of COPD. Eur Respir Rev [Internet] 2019 Mar 31;28(151) doi: 10.1183/16000617.0063-2018. [cited 2021 Mar 8]. Available from: https://err.ersjournals.com/content/28/151/180063 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Myers RA, Scott NM, Gauderman WJ, Qiu W, Mathias RA, Romieu I, et al. Genome-wide interaction studies reveal sex-specific asthma risk alleles. Hum Mol Genet. 2014 Oct 1;23(19):5251–9. doi: 10.1093/hmg/ddu222. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Marsh EK, Prestwich EC, Marriott HM, Williams L, Hart AR, Muir CF, et al. Pellino-1 regulates the responses of the airway to viral infection. 2020 Aug 31. [cited 2021 Mar 8]; Available from: https://derby.openrepository.com/handle/10545/625059 . [DOI] [PMC free article] [PubMed]
  • 41.Baines KJ, Fu J, McDonald VM, Gibson PG. Airway gene expression of IL-1 pathway mediators predicts exacerbation risk in obstructive airway disease. Int J Chron Obstruct Pulmon Dis. 2017 Feb 8;12:541–50. doi: 10.2147/COPD.S119443. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Gertow K, Sennblad B, Strawbridge RJ, Öhrvik J, Zabaneh D, Shah S, et al. Identification of the BCAR1-CFDP1-TMEM170A Locus as a Determinant of Carotid Intima-Media Thickness and Coronary Artery Disease Risk. Circ Cardiovasc Genet. 2012 Dec 1;5(6):656–65. doi: 10.1161/CIRCGENETICS.112.963660. [DOI] [PubMed] [Google Scholar]
  • 43.Bots SH, Peters SAE, Woodward M. Sex differences in coronary heart disease and stroke mortality: a global assessment of the effect of ageing between 1980 and 2010. BMJ Glob Health. 2017 Mar;2(2):e000298. doi: 10.1136/bmjgh-2017-000298. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Lionel AC, Tammimies K, Vaags AK, Rosenfeld JA, Ahn JW, Merico D, et al. Disruption of the ASTN2/TRIM32 locus at 9q33.1 is a risk factor in males for autism spectrum disorders, ADHD and other neurodevelopmental phenotypes. Hum Mol Genet. 2014 May 15;23(10):2752–68. doi: 10.1093/hmg/ddt669. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Kim W, Prokopenko D, Sakornsakolpat P, Hobbs BD, Lutz SM, Hokanson JE, et al. Genome-Wide Gene-by-Smoking Interaction Study of Chronic Obstructive Pulmonary Disease. Am J Epidemiol [Internet] 2020 Oct 27. [cited 2021 Mar 8];(kwaa227). Available from: [DOI] [PMC free article] [PubMed]
  • 46.Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, et al. Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. Am J Epidemiol. 2017 Nov 1;186(9):1026–34. doi: 10.1093/aje/kwx246. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association
