Abstract
Genetic factors play an important role in cutaneous squamous cell carcinoma (cSCC) risk. Genome-wide association studies have identified 21 single nucleotide polymorphisms (SNPs) associated with cSCC risk. Yet no studies have attempted to quantify the contribution of heritability to cSCC risk by calculating the population attributable risk (PAR) using a combination of all discovered genetic variants. Using an additive multi-locus linear logistic model, we determined the cumulative association of these 21 genetic regions to cSCC PAR. We computed a multi-locus PAR of 62%, suggesting that if the effects of all the risk alleles were removed from a population, the cSCC risk would drop by 62%. Using stratified analysis, we also examined the impact of sex on polygenic risk score, and found that men have an increased relative risk throughout the spectrum of the polygenic risk score. Quantifying the impact of genetic predisposition on the proportion of cancer cases can guide future research decisions and public health policy planning.
Introduction
Cutaneous squamous cell carcinoma (cSCC) is a common form of skin cancer responsible for a substantial public health burden and significant health care costs (Housman et al., 2003). Non-Hispanic whites have a high lifetime prevalence of cSCC, ranging from 7% to 11% (Kallini, Hamed, & Khachemoune, 2015). Previous studies have shown that cSCC risk clusters in families, suggesting a strong heritable component to this common form of skin cancer (Asgari et al., 2015; Hussain et al., 2009), but the contribution of genetics to cSCC risk has not been quantified. Recently, genome-wide association studies among non-Hispanic whites have identified variants in 21 genetic loci that are associated with cSCC risk (Asgari et al., 2016; Chahal et al., 2016; Siiskonen et al., 2016). However, the degree to which these genetic variants affect the overall burden of cSCC is unclear.
One way to measure the contribution of genetic variants to disease risk is to examine the population attributable risk (PAR), which approximates the reduction in incidence that would be achieved if the risk factor were eliminated from that population (Rothman, 2008). Although risk variants cannot be removed from the population, genetic PARs are often used in epidemiologic research to estimate the degree to which a disease can be attributed to the risk variant. The validity of the PAR for measuring genetic contributions to disease risk has been established (Witte et al., 2014), and the measure is bounded between 0% and 100%. We computed a multi-locus PAR for cSCC using 21 published SNPs from large-scale GWAS. To examine how PAR can be modified by a patient variable, we explored the effects of sex on PAR by stratifying our population by sex. Finally, we used data from the 21 published variants to calculate a polygenic risk score. In other cancers, such as breast cancer, risk prediction models that combine a polygenic risk score with epidemiologic risk factors provide substantial risk stratification of the general population (Meads et al., 2012; Maas et al., 2016). Quantifying the genetic contribution of these SNPs to cSCC risk, in combination with clinical and environmental risk exposure data, could help identify individuals at greatest risk of developing cSCC, who would benefit most from enhanced monitoring programs.
Results
cSCC risk was associated with 21 SNPs, all of which were used to compute a multi-locus PAR (Rockhill et al., 1998; Kraft et al., 2009). Not all risk alleles are minor alleles. For SNPs associated with a reduction in cSCC risk (“protective” SNPs), we used the inverse of the OR and the major allele frequency. This way, all variants could be included in the population attributable risk calculation. The multi-locus PAR (Table 1) suggests that if the effects of all risk alleles were removed from a population, there would be a 62% reduction in cSCC cases. The SNPs in IRF4, TYR, RALY, FOXP1, and MC1R used to compute the multi-locus PAR were identified in more than one GWAS and had similar odds ratios, showing consistency across study populations. Prevalent SNPs with moderate to high odds ratios contributed the most to the multi-locus PAR, and included SNPs in the genes SLC45A2, SRC, HERC2, DEF8, and HLADQA1.
Table 1.
cSCC GWAS variants contribute to a multi-locus population attributable risk of 62%
Gene | Chromosome | SNP | Risk Allele | Risk Allele Frequency | Odds Ratio |
---|---|---|---|---|---|
Unknown | 2 | rs192481803 | T | 0.01 | 1.9 |
FOXP1 | 3 | rs62246017 | A | 0.33 | 1.06 |
TPRG1/TP63 | 3 | rs6791479 | T | 0.43 | 1.05 |
ERBB2IP | 5 | rs17247181 | T | 0.10 | 1.34 |
SLC45A2 | 5 | rs35407* | G | 0.96 | 1.69 |
IRF4 | 6 | rs12203592 | T | 0.17 | 1.62 |
HLADQA1 | 6 | rs4455710 | T | 0.38 | 1.17 |
PARK2 | 6 | rs9689649 | C | 0.22 | 1.28 |
AHR | 7 | rs117132860 | A | 0.02 | 1.48 |
ST3GAL1 | 8 | rs9643297 | G | 0.31 | 1.21 |
SEC16A | 9 | rs57994353 | C | 0.30 | 1.12 |
BNC2, CNTLN | 9 | rs10810657* | A | 0.59 | 1.11 |
TYR | 11 | rs1126809 | A | 0.28 | 1.16 |
CADM1-BUD13 | 11 | rs74899442 | C | 0.01 | 2.13 |
OCA2 | 15 | rs1800407 | T | 0.07 | 1.2 |
HERC2 | 15 | rs12916300* | T | 0.74 | 1.14 |
DEF8 | 16 | rs4268748 | C | 0.26 | 1.33 |
MC1R | 16 | rs1805007 | T | 0.07 | 1.46 |
DEF8 | 16 | rs8063761 | T | 0.33 | 1.34 |
RALY-ASIP | 20 | rs6059655 | A | 0.07 | 1.27 |
SRC | 20 | rs754626 | G | 0.25 | 1.26 |
*These SNPs have been reported as protective SNPs; thus, the major allele is shown as the risk allele, together with its corresponding odds ratio (the reciprocal of the protective OR).
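The way per-SNP values like those in Table 1 feed into a multi-locus PAR can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes Hardy-Weinberg genotype frequencies, a multiplicative effect per risk allele, odds ratios used as stand-ins for relative risks, and independent loci combined as 1 − Π(1 − PAR_i). The function names are hypothetical. Only three rows from Table 1 are used (SLC45A2 is entered on its protective scale and flipped), so the output is not expected to match the 21-SNP estimate.

```python
def to_risk_scale(freq, odds_ratio):
    """Re-express a protective allele (OR < 1) as its complementary risk allele."""
    if odds_ratio < 1.0:
        return 1.0 - freq, 1.0 / odds_ratio
    return freq, odds_ratio


def single_snp_par(freq, odds_ratio):
    """PAR for one SNP: 1 - 1 / (mean relative risk over the three genotypes)."""
    p, rr = freq, odds_ratio
    # Genotypes carrying 0, 1, or 2 risk alleles (Hardy-Weinberg assumption).
    genotype_freqs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    # Assumed multiplicative effect per risk allele, with OR approximating RR.
    genotype_rrs = [1.0, rr, rr ** 2]
    mean_rr = sum(f * r for f, r in zip(genotype_freqs, genotype_rrs))
    return 1.0 - 1.0 / mean_rr


def multi_locus_par(snps):
    """Combine per-SNP PARs assuming independent loci: 1 - prod(1 - PAR_i)."""
    remaining = 1.0
    for freq, odds_ratio in snps:
        risk_freq, risk_or = to_risk_scale(freq, odds_ratio)
        remaining *= 1.0 - single_snp_par(risk_freq, risk_or)
    return 1.0 - remaining


# (allele frequency, odds ratio): IRF4, MC1R, and SLC45A2 on its protective scale
example = [(0.17, 1.62), (0.07, 1.46), (0.04, 1 / 1.69)]
print(round(multi_locus_par(example), 3))
```

Because the combined value is driven by the product of allele frequency and effect size, a very common risk allele with a sizeable odds ratio (such as the SLC45A2 row) dominates the result under these assumptions.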
Figure 1 shows the relative risk of cSCC increasing with higher percentiles of the polygenic risk score, based on 21 risk alleles. Sex-specific curves for males and females are shown, which reveal that males have a higher risk than females, particularly at polygenic risk scores that exceed the 40th percentile. This suggests a potential interaction between sex and genetic score. The risk of cSCC is twice the average population risk at the 84th percentile of the polygenic risk score for females and at the 77th percentile for males. A two-fold increase in relative risk is widely used as a benchmark for clinically meaningful increased risk for common diseases, such as cSCC (Roberts et al., 2012).
Figure 1.
cSCC risk with increasing Polygenic Risk Score
Discussion
Different measures can be used to assess how much known genetic factors contribute to overall cSCC risk, including heritability on various scales, sibling relative risk, log relative risk genetic variance, the area under the receiver operating characteristic curve, and the population attributable risk (PAR) (Witte et al., 2014). The utility and limitations of these various metrics have been extensively discussed elsewhere (Witte et al., 2014). Here we focused on the PAR and log relative risk variance (which is used to calculate the population distribution of the polygenic risk score), because of their direct relevance to the clinical utility of these genetic variants for cSCC prevention and screening.
Most polymorphisms that contribute to the multi-locus PAR are within biologically plausible gene candidates for cSCC etiology. IRF4 (Han et al., 2008) is associated with pigmentation phenotypes. DEF8 is associated with expression of a cell cycle progression gene (CDK10) in sun-exposed skin (GTEx Consortium, 2015). Expression of CADM1, a gene that modifies tumor interaction with cell-mediated immunity, is associated with survival in cSCC patients (Liu et al., 2013). Activation of AHR during UV radiation exposure may decrease apoptosis in keratinocytes (Frauenstein et al., 2013) with potential consequences for enhancing cancer risk. ERBB2IP has been shown to activate the Ras pathway (Kolch, 2003). Risk alleles in SLC45A2, a gene associated with skin pigmentation, were also linked to cSCC risk.
While the multi-locus PAR allows quantification of the joint impact of these variants on cSCC risk, this one statistic is of limited utility for screening purposes, as it does not consider known environmental risk factors, such as sun exposure and smoking. However, calculating a polygenic risk score from the combination of these risk alleles helps us to visualize their relationship with cSCC risk, and the score may be a useful screening metric when combined with relevant clinical variables. Although we have demonstrated the potential utility of the polygenic risk score here, this type of screening is not yet at a stage where it could be incorporated into clinical practice. The next phase of risk prediction models should incorporate both clinical and environmental risk factors, combined with a polygenic risk score, to help optimize primary and secondary prevention strategies for cSCC. Ideally, the model would be developed and calibrated in one population and validated in an independent population. Identification of individuals with substantially elevated risk is of paramount importance for early detection and prevention efforts.
The PAR is commonly used to approximate the public health implications of modifying or removing an exposure. Although it is a useful quantifiable estimate of the impact of a causal factor (in this case, genetic risk), it can rapidly approach the upper bound of 100%, if the risk allele frequency and relative risk of the disease are high, and may be more inflated than other measures of genetic risk, such as the heritability of disease liability, approximate heritability, sibling recurrence risk, and overall genetic variance using a log relative risk scale (Witte et al, 2014). The polygenic risk score is based on the excess fraction of disease that is associated with the presence of these risk alleles, but not necessarily the etiologic fraction, or the fraction of disease that truly arises due to these risk alleles. An additional limitation of the multi-locus population attributable risk and the polygenic risk score based on these loci is that they do not account for interactions amongst SNPs. Furthermore, it should be noted that these GWAS were done primarily in non-Hispanic white populations so these risks may not translate to other ethnic groups.
Despite these limitations, the multi-locus population attributable risk and its related polygenic risk score are promising tools for identifying those individuals at greatest risk for cSCC, who would benefit from enhanced monitoring, and those individuals at lowest cSCC risk, for whom enhanced monitoring would be time-consuming, costly, and unnecessary. Development of such a risk prediction tool for cSCC would benefit from further refinement that includes gene-environment interactions and from validation across different populations, but would have the potential to be clinically meaningful, impacting both screening and prevention efforts. Next-generation studies relating the polygenic contribution to cSCC risk will likely incorporate newly discovered rare variants. Future studies may also include variants identified through meta-analyses of existing as well as forthcoming cSCC genome-wide association studies, which will increase both the power and robustness of genetic associations used to derive the polygenic risk score.
Materials and Methods
We searched the published literature by querying two online databases (MEDLINE® and Cochrane) with the key words “squamous cell carcinoma”, “skin” and “genome-wide association” from January 1980 to January 31, 2017, and then evaluated the bibliographies of relevant articles. We identified three GWAS that utilized cohorts from Kaiser Permanente Northern California, the Nurses’ Health Study, the Health Professionals Follow-up Study, the Rotterdam Study, and 23andMe (Asgari et al., 2016; Chahal et al., 2016; Siiskonen et al., 2016). We excluded one study that was primarily a basal cell carcinoma (BCC) GWAS, which tested significant BCC-associated SNP variants for their relationship with cSCC and was not a true cSCC GWAS (Nan et al., 2011). Although our methodology was not a formalized systematic review, we did capture all published cSCC GWAS data. We compiled all SNPs associated with cSCC that replicated in independent populations. Odds ratios for SNPs associated with cSCC were used for the multi-locus population attributable risk calculations, and all studies assumed an additive genetic model. We focused on bi-allelic SNPs, removing one tri-allelic SNP, duplicates, and SNPs in linkage disequilibrium (LD). If the same SNP was reported in more than one GWAS, or if two SNPs at the same locus were in LD (r² > 0.3), we used the OR from the largest study for the multi-locus PAR and polygenic risk score calculations. Odds ratios for individual risk alleles, as well as the multi-locus PAR for all alleles combined, are shown in Table 1. The multi-locus PAR is weighted most heavily by the risk alleles with both the highest prevalence and the largest odds ratios for cSCC. We used the formula below (derived from Rockhill et al., 1998) to compute the multi-locus PAR:
where i indexes SNP, and j indexes genotype at each SNP.
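One form of the multi-locus PAR that is consistent with these index definitions and with the attributable-fraction framework of Rockhill et al. (1998) is sketched below. The symbols are notation assumed here rather than taken from the text: p_{ij} is the population frequency of genotype j at SNP i, and RR_{ij} is the relative risk of that genotype compared with the non-risk homozygote (approximated by the odds ratio). The product form additionally assumes that the SNPs act independently.

```latex
\mathrm{PAR} \;=\; 1 \;-\; \prod_{i=1}^{21} \frac{1}{\sum_{j} p_{ij}\,\mathrm{RR}_{ij}}
```

In this form each SNP contributes the factor 1 / Σ_j p_{ij} RR_{ij}, the fraction of risk that would remain if that SNP's excess risk were removed, so loci with both high risk-genotype frequency and large relative risk shrink the product the most.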
We also estimated the population distribution of polygenic risk scores based on published cSCC GWAS. For risk prediction based on multiple loci, we assumed a log-additive model for the joint effects of SNPs and constructed polygenic risk scores by summing the risk allele counts across SNPs, each weighted by its log odds ratio:
Here βi is the log odds ratio per risk allele at SNP i (the logarithm of the odds ratio given in Table 1), Gi is the count of risk alleles at locus i, and α is chosen so that the average relative risk is 1.
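Written out, a log-additive score consistent with these definitions takes the form below. The label PRS and the explicit index range are notation assumed here; β_i, G_i, and α are as defined above. The second line gives the value of α implied by the constraint that the average relative risk is 1, under the added assumptions of independent loci in Hardy-Weinberg equilibrium with risk allele frequency p_i.

```latex
\begin{aligned}
\log \mathrm{RR} \;=\; \mathrm{PRS} &= \alpha + \sum_{i=1}^{21} \beta_i G_i, \qquad G_i \in \{0, 1, 2\} \\
\mathbb{E}\!\left[e^{\mathrm{PRS}}\right] = 1 \;\;\Longrightarrow\;\; \alpha &= -2 \sum_{i=1}^{21} \log\!\bigl(1 + p_i\,(e^{\beta_i} - 1)\bigr)
\end{aligned}
```

The expression for α follows because, for G_i distributed as Binomial(2, p_i), the expectation of e^{β_i G_i} equals (1 − p_i + p_i e^{β_i})².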
The standard deviation of the polygenic risk score was calculated as follows,
where βi is the log relative risk (RR), approximated by the log OR, for SNP i, and qi is (1 − risk allele frequency). For SNPs that were reported as protective, we used the major allele as the risk allele in our calculations, along with the corresponding cSCC odds ratio for the major allele. The standard deviation of the polygenic risk score can be used to plot the distribution of the relative risk due to known common SNPs. The relative risk of cSCC per polygenic risk score percentile is shown in Figure 1, with lines plotted for both males and females. We have plotted percentiles of the polygenic risk score in relation to cSCC risk, rather than the raw values themselves, to facilitate interpretation of the score relative to the population. The plot for males accounts for the underlying increased risk of cSCC in male subjects (RR = 1.31 for males vs. females) (Whiteman et al., 2016).
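The standard deviation and the percentile curve described above can be sketched in Python. This is an illustration under stated assumptions rather than the procedure used for Figure 1: it assumes Hardy-Weinberg proportions, so that each SNP contributes variance 2·βi²·qi·(1 − qi), approximates the score as normally distributed with its mean set so that the average relative risk is 1, and applies the male-versus-female relative risk of 1.31 as a simple multiplier. Function names are hypothetical, and only three rows from Table 1 are used, so the printed values will not match Figure 1.

```python
import math
from statistics import NormalDist


def prs_sd(snps):
    """SD of the score: sqrt(sum_i 2 * beta_i^2 * q_i * (1 - q_i)), beta_i = ln(OR_i)."""
    variance = 0.0
    for risk_allele_freq, odds_ratio in snps:
        beta = math.log(odds_ratio)
        q = 1.0 - risk_allele_freq
        variance += 2.0 * beta ** 2 * q * (1.0 - q)
    return math.sqrt(variance)


def relative_risk_at_percentile(percentile, sd, baseline_rr=1.0):
    """Relative risk at a score percentile under a normal approximation.

    The mean of the log score is set to -sd^2 / 2 so that the
    population-average relative risk equals 1 (before any multiplier).
    """
    z = NormalDist().inv_cdf(percentile / 100.0)
    return baseline_rr * math.exp(-0.5 * sd ** 2 + sd * z)


# (risk allele frequency, odds ratio): IRF4, MC1R, DEF8 rs4268748 from Table 1
snps = [(0.17, 1.62), (0.07, 1.46), (0.26, 1.33)]
sd = prs_sd(snps)
for pct in (10, 50, 90):
    female = relative_risk_at_percentile(pct, sd)
    # Assumed: sex enters as a constant multiplier on the relative risk.
    male = relative_risk_at_percentile(pct, sd, baseline_rr=1.31)
    print(pct, round(female, 2), round(male, 2))
```

Working in percentiles rather than raw scores keeps the curve comparable across SNP sets, since only the standard deviation of the score enters the calculation.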
Acknowledgments
Sources of support: This work was supported by the National Institutes of Health (R01 CA166672). The authors state no conflict of interest. Maryam Asgari’s institution has received grant funding from Valeant Pharmaceuticals, but the topic is not relevant to this work.
Abbreviations
- cSCC: cutaneous squamous cell carcinoma
- PAR: population attributable risk
- SNP: single nucleotide polymorphism
References
- Asgari MM, Wang W, Ioannidis NM, Itnyre J, Hoffmann T, Jorgenson E, et al. Identification of susceptibility loci for cutaneous squamous cell carcinoma. J Invest Dermatol. 2016;136:930–937. doi: 10.1016/j.jid.2016.01.013.
- Asgari MM, Warton EM, Whittemore AS. Family history of skin cancer is associated with increased risk of cutaneous squamous cell carcinoma. Dermatol Surg. 2015;41(4):481–6. doi: 10.1097/DSS.0000000000000292.
- Chahal HS, Lin Y, Ransohoff KJ, Hinds DA, Wu W, Dai HJ, et al. Genome-wide association study identifies novel susceptibility loci for cutaneous squamous cell carcinoma. Nat Commun. 2016;7:12048. doi: 10.1038/ncomms12048.
- Frauenstein K, Sydlik U, Tigges J, Majora M, Wiek C, Hanenberg H, et al. Evidence for a novel anti-apoptotic pathway in human keratinocytes involving the aryl hydrocarbon receptor, E2F1, and checkpoint kinase 1. Cell Death Differ. 2013;20:1425–1434. doi: 10.1038/cdd.2013.102.
- Hussain SK, Sundquist J, Hemminki K. The effect of having an affected parent or sibling on invasive and in situ skin cancer risk in Sweden. J Invest Dermatol. 2009;129(9):2142–7. doi: 10.1038/jid.2009.31.
- GTEx Consortium. Human genomics. The genotype-tissue expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science. 2015;348:648–660. doi: 10.1126/science.1262110.
- Han J, Kraft P, Nan H, Guo Q, Chen C, Qureshi A, et al. A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation. PLoS Genetics. 2008;4(5). doi: 10.1371/journal.pgen.1000074.
- Housman TS, Feldman SR, Williford PM, Fleischer AB Jr, Goldman ND, Acostamadiedo JM, et al. Skin cancer is among the most costly of all cancers to treat for the Medicare population. J Am Acad Dermatol. 2003;48:425–429. doi: 10.1067/mjd.2003.186.
- Kallini JR, Hamed N, Khachemoune A. Squamous cell carcinoma of the skin: Epidemiology, classification, management, and novel trends. Int J Dermatol. 2015;54:130–140. doi: 10.1111/ijd.12553.
- Kolch W. Erbin: Sorting out ErbB2 receptors or giving ras a break? Sci STKE. 2003;199:pe37. doi: 10.1126/stke.2003.199.pe37.
- Kraft P, Wacholder S, Cornelis MC, Hu FB, Hayes RB, Thomas G, et al. Nat Rev Genet. 2009;10(4):264–9. doi: 10.1038/nrg2516.
- Liu D, Feng X, Wu X, Li Z, Wang W, Tao Y, et al. Tumor suppressor in lung cancer 1 (TSLC1), a novel tumor suppressor gene, is implicated in the regulation of proliferation, invasion, cell cycle, apoptosis, and tumorigenicity in cutaneous squamous cell carcinoma. Tumour Biol. 2013;34:3773–3783. doi: 10.1007/s13277-013-0961-2.
- Maas P, Barrdahl M, Joshi AD, Auer PL, Gaudet MM, Milne RL, et al. Breast cancer risk from modifiable and non-modifiable risk factors among white women in the United States. JAMA Oncol. 2016;2(10):1295–1302. doi: 10.1001/jamaoncol.2016.1025.
- Meads C, Ahmed I, Riley RD. A systematic review of breast cancer incidence risk prediction models with meta-analysis of their performance. Breast Cancer Res Treat. 2012;132(2):365–77. doi: 10.1007/s10549-011-1818-2.
- Rockhill, et al. Am J Public Health. 2008 Dec;98(12):2119.
- Nan H, Xu M, Kraft P, Qureshi AA, Chen C, Guo Q, et al. Genome-wide association study identifies novel alleles associated with risk of cutaneous basal cell carcinoma and squamous cell carcinoma. Hum Mol Genet. 2011;20:3718–3724. doi: 10.1093/hmg/ddr287.
- Roberts NJ, Vogelstein JT, Parmigiani G, Kinzler KW, Vogelstein B, et al. The predictive capacity of personal genome sequencing. Science Translational Medicine. 2012;133(4):1–9. doi: 10.1126/scitranslmed.3003380.
- Rockhill B, Newman B, Weinberg C. Am J Public Health. 1998;88(1):15–19. doi: 10.2105/ajph.88.1.15.
- Rothman K. Modern epidemiology. Philadelphia: Lippincott Williams & Wilkins; 2008.
- Siiskonen SJ, Zhang M, Li WQ, Liang L, Kraft P, Nijsten T, et al. A genome-wide association study of cutaneous squamous cell carcinoma among european descendants. Cancer Epidemiol Biomarkers Prev. 2016;25:714–720. doi: 10.1158/1055-9965.EPI-15-1070.
- Whiteman DC, Thompson BS, Thrift AP, Hughes MC, Muranushi C, et al. A model to predict the risk of keratinocyte carcinomas. Journal of Investigative Dermatology. 2016;136:1247–1254. doi: 10.1016/j.jid.2016.02.008.
- Witte JS, Visscher PM, Wray NR. The contribution of genetic variants to disease depends on the ruler. Nat Rev Genet. 2014;15(11):765–76. doi: 10.1038/nrg3786.