Summary
Esophageal squamous cell carcinoma (ESCC) has a high disease burden in sub-Saharan Africa and has a very poor prognosis. Genome-wide association studies (GWASs) of ESCC in predominantly East Asian populations indicate a substantial genetic contribution to its etiology, but no genome-wide studies have been done in populations of African ancestry. Here, we report a GWAS in 1,686 African individuals with ESCC and 3,217 population-matched control individuals to investigate its genetic etiology. We identified a genome-wide-significant risk locus on chromosome 9 upstream of FAM120A (rs12379660, p = 4.58 × 10−8, odds ratio = 1.28, 95% confidence interval = 1.22–1.34), as well as a potential African-specific risk locus on chromosome 2 (rs142741123, p = 5.49 × 10−8) within MYO1B. FAM120A is a component of oxidative stress-induced survival signals, and the associated variants at the FAM120A locus co-localized with highly significant cis-eQTLs in FAM120AOS in both esophageal mucosa and esophageal muscularis tissue. A trans-ethnic meta-analysis was then performed with the African ESCC study and a Chinese ESCC study in a combined total of 3,699 ESCC-affected individuals and 5,918 control individuals, which identified three genome-wide-significant loci on chromosome 9 at FAM120A (rs12379660, pmeta = 9.36 × 10−10), chromosome 10 at PLCE1 (rs7099485, pmeta = 1.48 × 10−8), and chromosome 22 at CHEK2 (rs1033667, pmeta = 1.47 × 10−9). This indicates the existence of both shared and distinct genetic risk loci for ESCC in African and Asian populations. Our GWAS of ESCC conducted in a population of African ancestry indicates a substantial genetic contribution to ESCC risk in Africa.
Keywords: esophageal squamous cell carcinoma, African cancer genome-wide association study, ESCC genetics, ESCC GWAS, ESCC meta-analysis
Esophageal squamous cell cancer is common in sub-Saharan Africa, but little is known about the contribution of genetic factors to the development of this cancer. We conducted a genome-wide search for such factors in an African population and identified genetic variants that impact the risk of this cancer.
Introduction
Esophageal cancer (EC) is the eighth most common cancer in the world: an estimated 544,100 deaths occurred globally in 2020.1 The primary histologic types are adenocarcinoma and squamous cell carcinoma. Esophageal adenocarcinoma (ADC) is associated with gastric reflux and Barrett’s esophagus and occurs mainly in Western countries,2 whereas esophageal squamous cell carcinoma (ESCC) is more common in the developing world and accounts for 85% of the global incidence of EC.1 High-risk areas for ESCC include central China, northern Iran, and sub-Saharan Africa. Reported age-standardized incidence rates (ASRs) in South and East Africa range from 14 to 32 and 21 to 47 cases per 100,000, respectively.3,4
The etiology of ESCC is complex. Known risk factors in Africa include low socio-economic status, poor nutrition, tobacco smoking, alcohol use, smoke inhalation from cooking fires, and dental fluorosis with tooth loss.5,6,7 In South Africa, case-control studies in urban Soweto confirmed the association with smoking and alcohol, particularly smoking pipe tobacco and consuming home-brewed beer.8,9 The Johannesburg Cancer Study (JCS) reported that the combination of smoking and regular alcohol consumption increased the risk further, from an odds ratio of 0.9 (95% confidence interval [CI]: 0.6–1.4) in non-smokers to 4.4 (95% CI: 3.2–6.1) in smokers.10,11 A meta-analysis reported the prevalence of infection with human papillomavirus (HPV)-16/18 to be 18% in ESCC-affected individuals,12 but extensive analysis of tumor tissues from individuals seropositive for HPV antibodies in high-incidence regions did not find the consistent presence of HPV DNA, HPV mRNA, or p16(INK4a) upregulation.13 Clinically, the development of ESCC is largely asymptomatic, resulting in late presentation with advanced dysphagia and a very poor prognosis. In Africa, treatment is largely palliative, including the insertion of a stent to enable swallowing and radiotherapy or chemotherapy. Median survival in affected individuals from Mozambique was 3.5 months,14 and affected individuals treated with stents in Malawi and Kenya had a median survival of 210 and 250 days, respectively, after treatment.15,16 The high incidence and dismal prognosis for EC constitute a major ongoing public health problem in South and East Africa.
Genome-wide association studies (GWASs) of many diseases have yielded a rich harvest of robust associations, which provided substantial insights into their pathogenesis.17 However, populations of African ancestry, particularly in continental Africa, have been understudied. This limits the access of these populations to the benefits of precision medicine approaches and also the potential of Africa’s rich genomic diversity to contribute to novel discoveries in the genetics of human disease.17 In ESCC, GWAS and targeted genetic studies from Japanese, Chinese, and European populations have reported associations at multiple loci, including ALDH2, ADH1B, PLCE1, PDE4D, STING1, HEATR3, TP53, MTMR3, CASP8, RUNX1, XBP1/CHEK2, CCHCR1, TCN2, TNXB, LTA, CYP26B1, and FASN.18,19,20,21,22,23,24,25,26 In Xhosa speakers from the South African Black (SAB) population, single-nucleotide polymorphisms (SNPs) associated with ESCC from previous GWAS were tested.27,28,29,30 Most SNPs showed no evidence of association, although preliminary evidence was found for an association of a missense variant p.Arg548Leu in PLCE1 (rs17417407)28 and for a SNP in CHEK2 (rs1033667).29
These results suggest there may be substantial differences in the genetic determinants of susceptibility to ESCC in the SAB population as compared to other populations. A substantially larger and broader association study is needed to investigate genetic susceptibility to ESCC in SAB populations. We have therefore carried out a GWAS to investigate the genetic contribution to ESCC in Black South African populations. We then used a trans-ethnic meta-analysis to identify shared and distinct genetic risk loci in African and Chinese populations.
Material and methods
Study design
This case-control study forms part of the larger Evolving Risk Factors for Cancer in African populations (ERICA-SA) study (https://www.samrc.ac.za/intramural-research-units/evolving-risk-factors-cancers-african-populations-erica-sa). The overall schematic and workflows for the study are outlined in Figure S1. Ethical clearance was obtained from the University of the Witwatersrand Human Research Ethics Committee (HREC) (Medical), certificate numbers M160807 and M2111154.
Study participants
Individuals with histologically confirmed ESCC were recruited from three study sites across South Africa: the Johannesburg Cancer Study (JCS) from Soweto and Johannesburg, Gauteng Province;11 the University of Cape Town (UCT), Western Cape Province;27,28 and Grey’s Hospital, Pietermaritzburg, KwaZulu-Natal Province.29 Ethnically matched population control individuals were provided by the H3Africa AWI-Gen study (from Soweto, Gauteng Province, and Dikgale, Limpopo Province)31,32,33 and the JCS (Soweto, Gauteng Province)11 (Table S1). Informed consent was obtained from all study participants.
Bio-sampling
Genomic DNA (gDNA) was isolated from peripheral blood samples collected from all study participants. DNA isolation methodologies were previously described.27,28,34 In summary, gDNA was isolated with either a kit-based DNA extraction with the Qiagen DNA FlexiGene kit or the salting-out method.35 Isolated gDNA was resuspended in TE buffer (0.1 mM EDTA and 10 mM Tris-HCl [pH 8.0]) and stored at −80°C in the HREC-approved Sydney Brenner Institute for Molecular Bioscience biobank at the University of the Witwatersrand until use.36
Genotyping
Genotyping of all gDNA samples was done with the H3Africa Custom African ∼2.3 million SNP Array (Illumina)37 (https://www.h3abionet.org/h3africa-chip). Genotyping of all ESCC-affected individuals and the JCS population control individuals was performed at the Genomics Core Facility, Social Genetics & Development Psychiatry Centre, King’s College London as a part of the ERICA-SA study. Genotyping of the AWI-Gen population control individuals was performed with the Illumina FastTrack Sequencing Service (https://www.illumina.com/services/sequencing-services.html). Raw intensity data files were used for data analysis. Genotype clustering and calling were done for all affected individuals and control individuals with the Illumina Array Analysis Platform Genotyping orchestrated command-line workflow, with the Illumina GenCall algorithm, predefined cluster file, and manifest file provided by Illumina (https://emea.support.illumina.com/downloads/iaap-genotyping-orchestrated-workflow.html). The PLINK software version 1.9 was used for genotype data management.38 The H3ABioNet/H3AGWAS Pipeline Version 3 (https://github.com/h3abionet/h3agwas) was used for data formatting and data quality control (QC). Parameters for the data quality control are outlined in Table S2.
Imputation
The Sanger Imputation Service was used for genotype imputation (https://imputation.sanger.ac.uk/). QC was applied to the dataset before imputation (Table S2). Imputation was done with the African Genome Resource panel with pre-phasing done with EAGLE2. An Impute2 score of 0.6 was used as the post-imputation cut-off. Post imputation, 14,472,257 SNPs were retained for the final imputed dataset, of which 12,843,994 SNPs were imputed SNPs and 1,628,263 were genotyped SNPs originally submitted for imputation. 69,257 SNPs were deleted during the imputation process.
Population sub-structure control
In view of the complex genetic architecture and extensive population substructure present in the SAB populations,39 our study took four approaches to adjust for the population sub-structure within our sample cohort. (1) ADMIXTURE analysis was done with CEU (n = 99, European), CHB (n = 103, Han Chinese, East Asian), and YRI (n = 108, Yoruba, Nigeria) individuals from the 1000 Genomes Project (1KG) as the reference population. For this analysis, 275,209 linkage disequilibrium (LD)-independent SNPs that were present in both the 1KG dataset and the African ESCC GWAS dataset were used. ADMIXTURE analysis was performed for K = 2 to K = 9 with ADMIXTURE v.1.3,40 and K = 6 was found to have the lowest cross-validation error (Figure S2). Study individuals with ≥20% CEU or CHB genetic contributions were excluded. Visualization of the ADMIXTURE results was done with Stata 17.0 (Stata Corp). (2) We performed iterative random sub-sampling testing of control individuals to identify population outlier SNPs.39 We tested a series of random sub-samples of our control populations against each other, including Soweto control individuals vs. Dikgale control individuals, Soweto male control individuals vs. Soweto female control individuals, and Dikgale control individuals vs. Dikgale control individuals. Each model was subjected to 30 permutations and modeled via linear-mixed model (LMM) methods (Gemma v.0.98.1).41 A total of 1,213,862 SNPs that reached a nominal p value of less than 0.05 in ten or more of the 30 permutations under each model were flagged and removed from the final GWAS LMM summary statistics. (3) Eigen decomposition for principal-component (PC) analysis with LD-independent SNPs (100 kb window, 20 SNPs within each window, at an r2 of 0.2) was done with PLINK v.1.9.38 After LD pruning of the genotyped dataset, 428,379 (out of 1,699,678 genotyped SNPs) LD-independent SNPs were retained for PC analysis. PCs 1 to 5 were tested for associations with disease phenotype status with a generalized linear model (GLM). The results were visualized in R (https://www.R-project.org/) via ggplot2.42 (4) A genetic relationship/kinship matrix (GRM) was constructed with ∼1 million genotyped SNPs. The GRM is incorporated into the LMM for the final GWAS modeling.
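The flagging rule in approach (2) can be illustrated with a minimal Python sketch. It assumes the per-permutation p values for each SNP have already been collected from the control-vs-control LMM runs; the function name, the dictionary layout, and the SNP labels are hypothetical and are not taken from the study's pipeline.

```python
def flag_outlier_snps(pvalues_by_snp, alpha=0.05, min_hits=10):
    """Return the SNPs whose control-vs-control p value falls below
    `alpha` in at least `min_hits` of the supplied permutations.

    pvalues_by_snp: dict mapping a SNP identifier to the list of
    p values from the permutations (30 per model in the text).
    """
    flagged = set()
    for snp, pvals in pvalues_by_snp.items():
        # Count how many permutations crossed the nominal threshold.
        hits = sum(1 for p in pvals if p < alpha)
        if hits >= min_hits:
            flagged.add(snp)
    return flagged


# Toy input with hypothetical SNP labels: 30 permutations each.
toy = {
    "snp_a": [0.01] * 10 + [0.50] * 20,  # 10 hits, flagged
    "snp_b": [0.01] * 9 + [0.50] * 21,   # 9 hits, kept
}
print(sorted(flag_outlier_snps(toy)))
```

SNPs returned by such a filter would then be dropped from the case-control summary statistics, on the reasoning that a marker which repeatedly separates random subsets of control individuals is more likely to reflect sub-structure than disease.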
SNP-based heritability estimation (h2g)
We calculated SNP-based heritability estimates (h2g) for ESCC by using restricted maximum likelihood estimations (REMLs) implemented in LDAK v.5.2.43 We performed LDAK weighting to account for LD by using a correlation squared threshold of 0.98. This allowed for the thinning of predictors from 1,699,678 SNPs to 1,097,541 SNPs. A kinship matrix was computed on the thinned predictors and used for the h2g estimation. The h2g for ESCC was estimated on the liability scale with the Globocan 2020 incidence for esophageal cancer in South Africa (age-standardized incidence rate of 6.8/100,000) as a proxy for disease prevalence.3
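For readers unfamiliar with the liability scale, the sketch below shows the standard transformation from an observed-scale (0/1 phenotype) heritability estimate to the liability scale for an ascertained case-control sample. It is an illustration of the usual formula and assumes, rather than confirms, that this is the conversion applied by the software; the function and argument names are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, prevalence, case_fraction):
    """Convert observed-scale SNP heritability to the liability scale.

    h2_obs        : heritability estimated on the 0/1 phenotype
    prevalence    : assumed population prevalence K of the disease
    case_fraction : proportion P of affected individuals in the sample
    """
    nd = NormalDist()
    # Liability threshold above which an individual is affected.
    threshold = nd.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at that threshold.
    z = nd.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_obs * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Inputs taken from the text: prevalence proxy of 6.8 per 100,000 and
# 1,686 affected individuals among 4,903 participants. The observed-scale
# value below is a placeholder, not a number reported by the study.
scaled = observed_to_liability(
    h2_obs=0.10, prevalence=6.8e-5, case_fraction=1686 / 4903
)
print(f"{scaled:.4f}")
```

Because the assumed prevalence enters the formula directly, the liability-scale estimate depends on how well an incidence rate stands in for prevalence.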
Genome-wide association analysis: Linear-mixed modeling
For the LMM, the binary case-control phenotype together with PCs 1–5 as covariates and the kinship matrix was used for the modeling. LMM was done with GEMMA v.0.98.1.41 Odds ratios were approximated with the beta and case-control ratios.44 Data visualization was done in Stata 17.0 (Stata Corp) and R’s ggplot2 (https://www.R-project.org/).42
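The exact odds-ratio formula is given in the cited reference and is not reproduced here. As a hedged illustration only, one widely used first-order approximation divides the LMM effect size on the 0/1 scale by k(1 − k), where k is the proportion of affected individuals in the sample, and exponentiates the result. The function name is hypothetical, and the beta used in the example is a placeholder rather than a value from the study.

```python
from math import exp


def or_from_lmm_beta(beta, n_cases, n_controls):
    """Approximate an odds ratio from a linear-model effect size.

    Assumes log(OR) ~= beta / (k * (1 - k)), with k the fraction of
    affected individuals; reasonable only for small effects.
    """
    k = n_cases / (n_cases + n_controls)
    return exp(beta / (k * (1.0 - k)))


# Sample sizes from the text; beta = 0.05 is illustrative only.
print(round(or_from_lmm_beta(0.05, 1686, 3217), 3))
```

More refined transformations also use the allele frequency, so this sketch should be read as a rough guide to how a beta on the 0/1 scale maps to an odds ratio.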
Permutation testing
Adaptive permutation testing was done with PLINK v.1.9’s permutation algorithm deployed in field-programmable gate arrays (FPGA) designed to run on a cloud-based AWS EC2 FPGA instance.45 Seven hundred million adaptive permutations were performed on the imputed GWAS dataset with the linearized phenotype derived by residualizing binary phenotype and PCs 1–5 under an additive model. Seven hundred million adaptive permutations were the limit of the AWS EC2 FPGA deployment.
Meta-analysis
A fixed-effect, weighted sum of Z score meta-analysis of SNPs common to both our study and the study by Abnet et al.19 was performed with METAL (v.2011-03-25).46 Abnet et al.19 reported the largest Asian ESCC GWAS for which the summary statistics have been made publicly available. Both GWASs were modeled with the same conditions (sex and PCs used as covariates). The meta-analysis was performed with p values and direction of effect, weighted by the sum of Z scores (1,686 African ESCC-affected individuals, 3,217 ethnically matched population control individuals, 2,013 Chinese ESCC-affected individuals, 2,701 ethnically matched control individuals). 4,850,898/7,599,678 SNPs from Abnet et al.19 overlapped with our dataset.
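The sample-size-weighted Z score scheme can be sketched in a few lines of Python. This is a simplified illustration rather than METAL itself: it assumes that each study is weighted by the square root of its total sample size and that the effect allele acts in the same direction in both studies. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_ND = NormalDist()


def signed_z(p_two_sided, direction):
    """Turn a two-sided p value and an effect direction (+1/-1) into a Z score."""
    z = -_ND.inv_cdf(p_two_sided / 2.0)
    return z if direction >= 0 else -z


def weighted_z_meta(studies):
    """Combine studies given as (p_two_sided, direction, n_samples) tuples.

    Each Z score is weighted by sqrt(n); returns (z_meta, p_meta).
    """
    numerator = 0.0
    total_n = 0.0
    for p, direction, n in studies:
        numerator += sqrt(n) * signed_z(p, direction)
        total_n += n
    z_meta = numerator / sqrt(total_n)
    p_meta = 2.0 * _ND.cdf(-abs(z_meta))
    return z_meta, p_meta


# rs12379660, using the per-study p values listed in Table 2 and the
# total sample sizes given in the text (same direction assumed).
z, p = weighted_z_meta([
    (7.89e-8, +1, 1686 + 3217),  # African GWAS
    (1.09e-3, +1, 2013 + 2701),  # Chinese GWAS
])
print(f"z = {z:.2f}, p = {p:.2e}")
```

Under these assumptions the combined p value for rs12379660 is of the same order as the reported pmeta, which shows how two sub-threshold or borderline signals in the same direction reinforce each other.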
Fine mapping and functional analysis of associated variants
We performed regional mapping by using LocusZoom v.1.447 for all top hits with p < 5 × 10−6, with a 400 kb flanking nucleotide window, and by using precomputed LD information from our study population. We performed conditional analysis by using GEMMA41 to assess the independence of association signals identified in the LMM. We used FINEMAP v.1.4.1 to estimate posterior probabilities to identify plausible causal variants within the regions surrounding the top hits.48 We used the FUMA online application (PLMM threshold ≤ 5 × 10−6) to annotate and interpret associated GWAS variants49 and to annotate co-localized expression quantitative trait loci (eQTLs) in the esophageal tissues of interest from GTEx version 8.39 We utilized the REACTOME pathway database to analyze pathway knowledge of top GWAS hits.50,51,52 We used ChIP-Atlas (http://chip-atlas.org)53 to assess the presence/absence of transcriptional regulators, histone modifications, or chromatin accessibility in esophageal tissue or cell lines in the vicinity of top hits with PLMM < 5 × 10−8.
Results
Study participants and dataset
After data quality control, our final study population consisted of 1,686 ESCC-affected individuals and 3,217 control individuals (Table S1). After imputation, SNPs with an Impute2 score of 0.6 or more (Table S3) were retained, returning 14,472,257 SNPs for the final genotyped and imputed dataset; 92.8% of SNPs had an Impute2 score of ≥0.90.
Population substructure in African affected individuals and control individuals
Complex degrees of population structure were present within our affected and control populations enrolled from several sites within South Africa (Table S1). Although all participants self-reported Black African ancestry, we took additional steps to adjust for the sub-structure observed within the study population. We used ADMIXTURE analysis with the 1KG phase 3 reference data54 to exclude individuals with high (≥20%) European (CEU) or East-Asian (CHB) ancestry from our study population (Figure 1). PCs 1 to 5 accounted for the majority of the variance observed (Figure S3) and PCs 1, 2, 4, and 5 were statistically associated with disease phenotype status (p < 0.001). PCs 1 to 5 were selected for incorporation into the LMM as covariates. Overall, our affected and control populations were well matched on PC (Figures 1 and S4). When we compared the PC by study sites, samples from the JCS and Soweto (both affected individuals and control individuals) were closely matched, and individuals from other study sites clustered together (Figures 1 and S4). Lastly, the inclusion of a kinship matrix to account for participant relatedness with ∼1 million markers in the LMM completed the population sub-structure adjustment methodology.
Figure 1.
Admixture plot and principal component plots of ESCC-affected individuals and control individuals
(A) The ADMIXTURE plot (K = 6) of ESCC-affected individuals and control individuals by study sites is shown on the top.
(B) PC plot of study participants by disease phenotype status is shown on the bottom left.
(C) PC plot of study participants by study site is shown on the bottom right. JCS, Johannesburg Cancer Study; UCT, University of Cape Town; UKZN, University of KwaZulu-Natal; 1KG: CEU, 1000 Genomes Northern European; 1KG: CHB, Han Chinese; 1KG: YRI, Yoruba.
Genome-wide association analysis of African ESCC-affected individuals and control individuals
Genotyped and imputed SNPs were tested for association with ESCC via LMM. The LMM method was chosen for its effectiveness in adjusting for genetic inflation from polygenic backgrounds.55 The genomic inflation factor lambda for our LMM was 1.01 and the quantile-quantile plot reflected similar distributions (Figure 2). We identified one genome-wide-significant signal on chromosome 9 (rs12379660, pLMM = 4.58 × 10−8; odds ratio [OR] = 1.28, 95% CI: 1.22–1.34, effect-allele frequencyaffected_individuals [EAFaffected_individuals] = 0.251) near FAM120A and one signal nearing genome-wide significance on chromosome 2 (rs142741123, pLMM = 5.49 × 10−8, OR = 2.66, 95% CI: 2.43–2.89, EAFaffected_individuals = 0.017) in MYO1B (Figure 2; Table 1). Both signals have support from regional associations with SNPs in high LD and with similar permutation p values (chr9:rs10992729, pperm = 9.71 × 10−8; chr2:rs141499985, pperm = 5.57 × 10−8). The lead SNP on chromosome 9, rs12379660, was genotyped rather than imputed and showed discrete genotype clusters (Figure S5). SNPs in LD with this SNP span the entirety of FAM120A and the adjacent FAM120AOS (at 96.2 Mb; Figure 3). The chromosome 2 locus has two independent signals separated by 400 Kb (rs142741123 and rs113702517, r2 = 0.0007). The lead signal in this window (rs142741123) is located within MYO1B (Figure 3) and the second signal (rs113702517) is located within STAT4 (pLMM = 9.37 × 10−7, OR = 0.68, 95% CI: 0.57–0.78, EAFaffected_individuals = 0.05) (Figure 3). These signals are separated by recombination hotspots and are not in LD (r2 = 0.00039). Top SNPs in both chromosome 2 signals have strong support from neighboring SNPs in LD and are likely to be African specific since they are monomorphic in European and Asian populations (Table S4). In addition, both these signals had permutation p values of <1 × 10−7 after 700 million adaptive permutation tests via the PLINK permutation algorithm (Table 1). 
Conditional analysis of rs142741123 with rs113702517 as a covariate and of rs113702517 with rs142741123 as a covariate further confirmed that two loci were independent (rs142741123, pcond = 9.25 × 10−8; rs113702517, pcond = 1.52 × 10−6). Using FINEMAP,48 the 95% credible set contained eight SNPs for rs12379660 on chromosome 9 and two SNPs for rs142741123 on chromosome 2, and these top SNPs had the highest posterior probability as lead SNPs for their respective loci (Table S5). We also found a further 60 SNPs from 29 independent loci with suggestive association to ESCC (pLMM ≤ 5 × 10−6) (Table S6). The estimated genetic heritability (h2g) of ESCC with 1,097,541 genotyped SNPs was 32.67% (standard deviation [SD]: 1.98%) on the liability scale.
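The genomic inflation factor quoted above is conventionally computed as the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null (about 0.4549). The sketch below illustrates that calculation from a list of p values; it is not the study's own code, and the function name is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(pvalues):
    """Estimate lambda from two-sided association p values."""
    nd = NormalDist()
    # A two-sided p value maps to a squared Z score, i.e. a 1-df chi-square.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p values mimic a well-calibrated null distribution.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_like), 3))
```

Values close to 1, such as the 1.01 reported here, indicate that the bulk of the test statistics follow the null distribution and that residual stratification is limited.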
Figure 2.
Quantile-Quantile plot and Manhattan plot of the ESCC GWAS
(A) The QQ plot is shown on the left, with lambda for genomic control of 1.01.
(B) The Manhattan plot with top hits on chromosome 9 and chromosome 2 is shown on the right.
Table 1.
Top SNPs associated with African esophageal squamous cell carcinoma from chromosome 9 and chromosome 2
| rsID | Chr | Position (hg19) | Gene | Alleles (effect/non-effect) | EAF affected individuals | EAF control individuals | OR (95% CI) | pLMM value | pperm value (a) | Impute2 score |
|---|---|---|---|---|---|---|---|---|---|---|
| rs12379660 (b) | 9 | 96182487 | FAM120A | A/G | 0.251 | 0.207 | 1.28 (1.22–1.34) | 4.58 × 10−8 | 3.71 × 10−8 | – (c) |
| rs10992729 | 9 | 96181075 | FAM120A | T/C | 0.254 | 0.212 | 1.26 (1.21–1.32) | 1.64 × 10−7 | 9.71 × 10−8 | 0.995 |
| rs142741123 (b) | 2 | 192207115 | MYO1B | C/T | 0.017 | 0.007 | 2.66 (2.43–2.89) | 5.49 × 10−8 | 3.86 × 10−8 | 0.935 |
| rs141499985 | 2 | 192212930 | MYO1B | T/G | 0.017 | 0.007 | 2.64 (2.41–2.87) | 6.33 × 10−8 | 5.57 × 10−8 | 0.929 |
OR, odds ratio; 95% CI, confidence interval; pLMM, p value from linear mixed model.
(a) p value from 700 million adaptive permutations, see material and methods.
(b) Lead SNP at each locus, with their strongest supporting SNP.
(c) Genotyped SNP.
Figure 3.
Locuszoom plots of top hits in the African ESCC GWAS
(A) rs12379660/FAM120A on chromosome 9.
(B) rs142741123/MYO1B on chromosome 2.
(C) rs113702517/STAT4 on chromosome 2.
(D) rs2834763/RUNX1 on chromosome 21.
Replication of known ESCC risk loci
A total of 33 SNPs from 23 loci were assessed for association in our African GWAS based on prior evidence of association in GWASs with at least suggestive evidence of association at p < 5 × 10−6 (Table S7).19,21,23,25,27,28,56 Of the 33 SNPs, 24 had the same directional effect in the African ESCC GWAS (exact binomial test p = 0.0135). None of the ESCC-associated SNPs reached a Bonferroni p value threshold (p < 0.0015) for 33 SNPs in our study, although three loci showed modest association: rs1614972 at ADH1C (p = 0.011), rs130079 at CCHCR1 (p = 0.037), and rs1033667 at CHEK2 (p = 0.005). The reported top signal at the RUNX1 locus in the Asian population on chromosome 21, rs2014300, was not associated in our study (p = 0.099), but a SNP located just upstream of RUNX1, rs2834762, showed moderate but not genome-wide, evidence of association (p = 6.64 × 10−5, OR = 1.19, 95% CI = 1.13–1.24) (Figure 3).
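The sign-concordance test and the Bonferroni threshold in this paragraph can be checked with a short calculation. The sketch below assumes a two-sided exact binomial test against a null probability of 0.5, which the text does not state explicitly; the function name is hypothetical.

```python
from math import comb


def binom_two_sided_half(successes, trials):
    """Two-sided exact binomial p value against a null proportion of 0.5.

    Under p0 = 0.5 the distribution is symmetric, so the two-sided
    p value is twice the probability of the more extreme tail.
    """
    tail_start = max(successes, trials - successes)
    upper_tail = sum(comb(trials, i) for i in range(tail_start, trials + 1))
    return min(1.0, 2.0 * upper_tail / 2 ** trials)


# 24 of 33 SNPs share the direction of effect.
print(round(binom_two_sided_half(24, 33), 4))  # 0.0135

# Bonferroni threshold for 33 tests at alpha = 0.05.
print(round(0.05 / 33, 4))  # 0.0015
```

Both values agree with those quoted in the paragraph, which supports reading the concordance p value as a two-sided test.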
Trans-ethnic meta-analysis of African and Chinese ESCC GWAS
We carried out a fixed-effect, weighted sum of Z score meta-analysis of our African ESCC GWAS with an ESCC GWAS from the Chinese population to increase the power of the study to identify risk loci and to look for evidence of transferability of risk loci across different ethnicities. Summary statistics from Abnet et al.,19 comprising 2,013 Chinese ESCC-affected individuals and 2,701 ethnically matched control individuals, were used for the meta-analysis. In order to match the modeling conditions of the Chinese GWAS, we remodeled our African ESCC GWAS to include sex and PCs as covariates. This remodeling resulted in a small reduction in the level of significance of the LMM p values. 12 SNPs from three loci reached the genome-wide-significance threshold (Table 2). This includes FAM120A on chromosome 9 led by rs12379660 (pmeta = 9.36 × 10−10), CHEK2 on chromosome 22 led by rs1033667 (pmeta = 1.47 × 10−9), and PLCE1 on chromosome 10 led by rs7099485 (pmeta = 1.48 × 10−8) (Figure 4). The chromosome 9 FAM120A locus was identified in our African ESCC GWAS and not previously reported to be associated with ESCC in the Chinese GWAS. A moderate level of association at this locus was present in the Chinese ESCC GWAS, but it did not reach genome-wide-significance thresholds in previous ESCC GWASs (Figure 5). The chromosome 22 locus at CHEK2 was previously identified in the Chinese ESCC GWAS, with moderate levels of support in the African ESCC GWAS (pAFR = 3.82 × 10−3) (Figure 5). The chromosome 10 locus at PLCE1, previously identified in the Chinese ESCC GWAS, shows genome-wide significance in the meta-analysis, but the signal is primarily driven by the Chinese ESCC GWAS (Figure 5). Lastly, a strong but not genome-wide significant signal on chromosome 2 was found near SLC16A14 led by rs34115901 (pmeta = 1.39 × 10−7). 
A moderate level of association with this SNP was present in both GWASs but did not reach genome-wide-significance thresholds, whereas the increase in power of the meta-analysis provided strengthened evidence of association for this locus (Figure 5). Robust additional support is evident from nearby SNPs within this locus.
Table 2.
Top signals from African-Chinese meta-analysis for esophageal squamous cell carcinoma
| Chr | Position (hg19) | rsID | Allele (a) | EAF affected individuals (CHN) | EAF affected individuals (AFR) | INFO score (CHN) | INFO score (AFR) | p value (CHN) | p value (AFR) (b) | p value meta-analysis | OR | 95% CI | Heterogeneity p value (c) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 230941048 | rs34115901 (d) | A/G | 0.228 | 0.029 | 0.905 | 0.883 | 1.48 × 10−3 | 2.06 × 10−5 | 1.39 × 10−7 | 1.86 | 1.72–2.01 | 0.4764 |
| 2 | 230937668 | rs34590391 | A/G | 0.249 | 0.028 | 0.962 | 0.906 | 4.27 × 10−3 | 5.90 × 10−6 | 1.65 × 10−7 | 1.81 | 1.67–1.96 | 0.2580 |
| 2 | 230940750 | rs35396967 | A/G | 0.246 | 0.030 | 0.937 | 0.883 | 1.94 × 10−3 | 2.17 × 10−5 | 1.97 × 10−7 | 1.81 | 1.68–1.96 | 0.4472 |
| 9 | 96182487 | rs12379660 (d) | A/G | 0.406 | 0.251 | 1.000 | 1.000 | 1.09 × 10−3 | 7.89 × 10−8 | 9.36 × 10−10 | 1.51 | 1.42–1.60 | 0.1533 |
| 9 | 96181075 | rs10992729 | T/C | 0.388 | 0.254 | 0.967 | 0.995 | 1.51 × 10−3 | 2.01 × 10−7 | 2.97 × 10−9 | 1.50 | 1.41–1.60 | 0.1694 |
| 9 | 96179785 | rs1351117 | A/G | 0.408 | 0.254 | 0.974 | 0.994 | 2.20 × 10−3 | 2.75 × 10−7 | 6.12 × 10−9 | 1.48 | 1.39–1.58 | 0.1579 |
| 10 | 96065694 | rs7099485 (d) | C/T | 0.231 | 0.456 | 0.999 | 0.998 | 2.26 × 10−9 | 3.74 × 10−2 | 1.42 × 10−8 | 1.94 | 1.86–2.03 | 0.0049 |
| 10 | 96074157 | rs7903902 | C/T | 0.231 | 0.456 | 0.995 | 0.998 | 3.38 × 10−9 | 4.70 × 10−2 | 2.74 × 10−8 | 1.93 | 1.85–2.01 | 0.0046 |
| 10 | 96063279 | rs3781265 | A/T | 0.231 | 0.459 | 0.998 | 0.996 | 2.26 × 10−9 | 5.48 × 10−2 | 2.76 × 10−8 | 1.94 | 1.86–2.02 | 0.0035 |
| 10 | 96058298 | rs3765524 | T/C | 0.230 | 0.465 | 1.000 | 1.000 | 1.88 × 10−9 | 6.62 × 10−2 | 3.42 × 10−8 | 1.94 | 1.86–2.03 | 0.0027 |
| 10 | 96070375 | rs3781264 | G/A | 0.170 | 0.234 | 1.000 | 1.000 | 6.50 × 10−9 | 4.55 × 10−2 | 4.00 × 10−8 | 2.05 | 1.96–2.14 | 0.0061 |
| 22 | 29130300 | rs1033667 (d) | T/C | 0.279 | 0.442 | 0.993 | 1.000 | 1.28 × 10−8 | 3.82 × 10−3 | 1.47 × 10−9 | 1.84 | 1.76–1.92 | 0.0417 |
| 22 | 29140823 | rs5752783 | C/A | 0.731 | 0.506 | 0.992 | 0.984 | 8.80 × 10−7 | 5.97 × 10−4 | 3.78 × 10−9 | 0.58 | 0.41–0.82 | 0.2682 |
| 22 | 29135121 | rs9613668 | C/G | 0.275 | 0.506 | 0.967 | 0.981 | 1.23 × 10−6 | 6.78 × 10−4 | 5.79 × 10−9 | 1.73 | 1.65–1.81 | 0.2781 |
| 22 | 29105415 | rs5752773 | C/G | 0.779 | 0.419 | 0.998 | 0.993 | 1.62 × 10−7 | 4.42 × 10−3 | 1.20 × 10−8 | 0.55 | 0.34–0.87 | 0.0807 |
Chr2/SLC16A14, Chr9/FAM120A, Chr10/PLCE1, Chr22/CHEK2. EAF, effect allele frequency; OR, odds ratio; 95% CI, confidence interval.
(a) Effect allele/non-effect allele.
(b) p value (AFR) differs from the main LMM as the AFR GWAS was remodeled to match the conditions of the CHN GWAS.
(c) Heterogeneity p value threshold is p < 0.05.
(d) Lead SNP at each locus, with their supporting SNPs.
Figure 4.
Meta-analysis Manhattan plot for African ESCC and Chinese ESCC
Green arrows indicate African-GWAS-driven signals, blue arrows indicate Chinese-GWAS-driven signals.
Figure 5.
Meta-analysis Locuszoom plots of FAM120A, CHEK2, PLCE1, and SLC16A14
The first column contains Locuszoom plots of the meta-analysis, using the African GWAS LD data as reference. The second column contains Locuszoom plots of the African GWAS, using African LD data as reference. The third column contains the Chinese GWAS plots, using Chinese LD data as reference.
Functional analysis with bioinformatics
The two most strongly associated SNPs at the FAM120A locus (rs12379660 and rs10992729) are among the seven highly significant eQTLs for FAM120AOS in esophageal mucosa and the two eQTLs for FAM120AOS in esophageal muscularis (Table S8) in GTEx version 8.39 No eQTL data are available for the rare African-specific SNPs in the MYO1B/STAT4 region. The top associated SNPs at both PLCE1 and CHEK2 are strong eQTLs in esophageal mucosa (Table S9). Via Reactome,50,51,52 FAM120A was found to interact with YBX1, a DNA- and RNA-binding protein involved in various processes, such as translational repression, RNA stabilization, mRNA splicing, DNA repair, and transcriptional regulation.57,58,59,60,61,62 The ChIP-Atlas53 database lookup did not show RNA polymerase binding sites, transcription-factor-binding sites, histone modifications, or DNase I hypersensitive sites in the immediate vicinity (30 kb flanking) of the lead SNP on chromosome 9 in esophageal epithelium or in squamous cell carcinoma cell lines.
Discussion
The genetic contribution to esophageal squamous cell carcinoma disease risk is not well understood. The unique geographical distribution of high-risk regions for ESCC across north-central China to the Caspian Sea and eastern to southern Africa suggests possible shared disease etiology among populations located far apart.3 The common epidemiological risk factors described for ESCC to date do not account for the global burden of the disease.63,64 Previous studies have provided strong evidence for genetic risk factors for ESCC, but most large-scale genetic studies were conducted in Asian populations,20,21,22,23 with smaller candidate gene studies conducted in African populations.65 Our GWAS identified risk loci for ESCC in a South African Black population and provides evidence for a substantial contribution of genetic factors to the risk of ESCC. Our trans-ethnic meta-analysis identified both shared and distinct genetic risk alleles between the SAB and Chinese populations.
Given the genetic diversity and continental and regional population substructure present within the African and SAB populations,66,67,68 the ability to control for this complexity in a GWAS of disease-affected individuals and control individuals is critical to the success of such studies. We used a combined approach of ADMIXTURE analysis, PC analysis, kinship matrix, and iterative testing of the control individuals to adjust for genetic variance within our dataset and to remove SNPs that produced false-positive association signals in the ESCC GWAS resulting from subpopulation-specific differences. This, together with appropriate sample and genotype data quality control, provided a firm foundation for the ESCC GWAS.
We detected strong signals for African ESCC on chromosome 9 and chromosome 2. Several lines of evidence support our lead SNP on chromosome 9, rs12379660, upstream of FAM120A. This association is supported by multiple SNPs in LD across the FAM120A locus, and the top SNPs survived 700 million adaptive permutation tests. Also, this was the strongest signal in the meta-analysis, and the Chinese data increased the significance of the association at rs12379660 to a combined pmeta value of 9.36 × 10−10. This SNP is common in both of these populations (EAFaffected_individuals, 0.40 in CHN vs. 0.25 in AFR), and the evidence is indicative of shared risk alleles for ESCC in the Chinese and African populations at FAM120A. The potential functional significance of the top SNPs at this locus is suggested by their co-localization with highly significant cis-eQTLs in FAM120AOS in both esophageal mucosa and esophageal muscularis tissue, although evidence of regulatory features in the vicinity of these SNPs, such as transcription-factor-binding sites, is currently lacking. FAM120A, also known as the constitutive co-activator of PPAR-gamma-like protein 1, is a key component of the oxidative stress-induced survival signaling pathway. It activates SRC family kinases and enables SRC family kinases to phosphorylate and activate PI3-kinase.39 FAM120A has been reported to be overexpressed in several cancers, including colorectal cancer and head and neck squamous cell carcinoma.69
The signal at MYO1B on chromosome 2 has not been previously reported to be associated with ESCC in non-African populations. The top SNP occurs at a frequency of 1.7% in the SAB population and is monomorphic in other populations (Table S4). The gene of interest in this window, MYO1B, encodes the myosin 1b protein, which has been linked to cellular proliferation and migration in head and neck squamous cell carcinomas and cervical cancer with HPV co-infection.70,71,72 Association at a second, independent locus at STAT4 was also detected within 400 kb of MYO1B. The lead SNPs at these two loci are not in LD. Conditional analysis of the top hit in MYO1B with the lead SNP in STAT4 as a covariate, and vice versa, further supports the independence of these loci, although neither lead SNP reached genome-wide significance in the conditional analysis. STAT4 is involved in cytokine signaling in the immune system by binding to IL12RB2 in the interleukin-12 receptor complex and has recently been identified as a key component of the JAK/STAT signaling pathway in HPV-induced cervical carcinogenesis.73
The trans-ethnic meta-analysis confirmed our previous findings that CHEK2 is a risk factor for ESCC in the SAB population, although with a smaller effect size than in the Chinese population (OR [95% CI] in SAB: 1.11 [1.06–1.16] vs. in CHN: 1.33 [1.20–1.46]). The very strong association observed at this locus, led by rs1033667, in the meta-analysis provides further support for the contribution of CHEK2 variants to the risk of ESCC, and the risk allele is more common in African populations (EAF in affected individuals: 0.28 in CHN vs. 0.44 in AFR). PLCE1, however, which is very strongly associated with ESCC in the Chinese population, shows only nominal evidence of association in the SAB population, with a lowest p value of 0.037 (for rs7099485; EAF in affected individuals: 0.23 in CHN vs. 0.46 in AFR). There are several possible explanations for this. One is that the causal variant at this locus has not been genotyped and is in low LD with the genotyped SNPs at this locus in African populations. Alternatively, the causal variant may have arisen after the out-of-Africa migration,74 or there may be gene-environment interactions involved that are specific to the Chinese population. Even though few of the known ESCC genetic risk factors were replicated in our study, genetics plays a sizable role in the etiology of African ESCC. Heritability analysis estimates the genetic contribution to ESCC at approximately 33% in our African study, using South Africa’s EC incidence data from GLOBOCAN 2020 as a more robust estimate of disease prevalence. This is comparable with the 38% genetic heritability previously estimated for ESCC in Asian populations.75
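The dependence of a heritability estimate on the assumed disease prevalence can be illustrated with the standard transformation from the observed (case-control) scale to the liability scale. The sketch below is a minimal illustration and not the analysis used in this study: the function name, the observed-scale value of 0.20, and the prevalence values are hypothetical, and only the numbers of affected and control individuals (1,686 and 3,217) are taken from the study.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed: float, prevalence: float, case_fraction: float) -> float:
    """Convert observed-scale heritability from a case-control sample to the liability scale.

    prevalence    -- assumed population prevalence K of the disease
    case_fraction -- proportion P of affected individuals in the study sample
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if not 0.0 < case_fraction < 1.0:
        raise ValueError("case_fraction must lie strictly between 0 and 1")
    standard_normal = NormalDist()
    # Liability threshold above which an individual is affected.
    threshold = standard_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at that threshold.
    density = standard_normal.pdf(threshold)
    scale = (prevalence * (1.0 - prevalence)) ** 2 / (
        density ** 2 * case_fraction * (1.0 - case_fraction)
    )
    return h2_observed * scale


N_CASES, N_CONTROLS = 1686, 3217             # sample sizes reported for the African GWAS
CASE_FRACTION = N_CASES / (N_CASES + N_CONTROLS)
HYPOTHETICAL_H2_OBSERVED = 0.20              # hypothetical value, not a study result

for assumed_prevalence in (0.001, 0.005, 0.01):  # hypothetical prevalence values
    h2 = liability_scale_h2(HYPOTHETICAL_H2_OBSERVED, assumed_prevalence, CASE_FRACTION)
    print(f"K = {assumed_prevalence:.3f}: liability-scale h2 = {h2:.3f}")
```

For a rare disease and a fixed observed-scale value, the liability-scale estimate rises with the assumed prevalence, so the choice of prevalence estimate directly affects the reported figure.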
We found a nominally significant association with rs1614972 in ADH1C on chromosome 4q23, a SNP that has been associated with alcohol dependence.76 In studies of East Asian populations, strong gene-environment interactions have been reported for ESCC risk involving common alleles at 4q23 and in ALDH2 at 12q24, which lead to high concentrations of acetaldehyde in alcohol drinkers.77 We could not fully explore this potential interaction in the current study, as alcohol consumption data were not available for our control individuals. The importance of alcohol use in African ESCC etiology has varied between studies, and alcohol intake varies widely among individuals in this region.7,8,9,10 Large datasets with both genetic data and alcohol intake data may be necessary to understand this important risk factor.
Although our African ESCC GWAS consisted of a total of 4,903 individuals, a substantially larger sample size will be required to detect a greater proportion of potential genetic loci associated with ESCC in African populations. A GWAS of ESCC in African populations is being carried out by the African Esophageal Cancer Consortium78 in ESCC-affected individuals and ethnically matched control individuals from east Africa. This expanded dataset will allow future replication and meta-analysis studies of African ESCC and may assist in fine-mapping the associated loci detected in our study.
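The relationship between sample size and the ability to detect modest effects can be sketched with a normal-approximation power calculation for a simple allelic test at the conventional genome-wide threshold of 5 × 10⁻⁸. This is a rough illustration only, not the mixed-model analysis used in the study: the function names, the control allele frequency of 0.30, and the odds ratio of 1.15 are hypothetical, and only the cohort sizes come from the study.

```python
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def case_allele_frequency(control_freq: float, odds_ratio: float) -> float:
    """Allele frequency in affected individuals implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(control_freq: float, odds_ratio: float, n_cases: int,
                       n_controls: int, alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Approximate power of a two-sided two-proportion z-test on allele counts."""
    if not 0.0 < control_freq < 1.0:
        raise ValueError("control_freq must lie strictly between 0 and 1")
    standard_normal = NormalDist()
    case_freq = case_allele_frequency(control_freq, odds_ratio)
    # Each individual contributes two alleles.
    standard_error = (
        case_freq * (1.0 - case_freq) / (2 * n_cases)
        + control_freq * (1.0 - control_freq) / (2 * n_controls)
    ) ** 0.5
    expected_z = abs(case_freq - control_freq) / standard_error
    critical_z = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    return (standard_normal.cdf(expected_z - critical_z)
            + standard_normal.cdf(-expected_z - critical_z))


N_CASES, N_CONTROLS = 1686, 3217                    # cohort sizes reported for this GWAS
HYPOTHETICAL_EAF, HYPOTHETICAL_OR = 0.30, 1.15      # illustrative values, not study results

for multiplier in (1, 2, 4):
    power = allelic_test_power(HYPOTHETICAL_EAF, HYPOTHETICAL_OR,
                               multiplier * N_CASES, multiplier * N_CONTROLS)
    print(f"{multiplier}x cohort: approximate power = {power:.3f}")
```

Under these assumed inputs, power is low at the current cohort size and rises steeply as the cohort grows, which is the general behaviour expected for common variants of modest effect.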
In conclusion, this study provides evidence of a substantial genetic contribution to ESCC risk in an African population and has identified novel and shared risk loci for this important African cancer. It also provides an example of the power of trans-ethnic meta-analysis to identify common and distinct risk loci in populations of diverse ancestry.
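The pooling step behind a two-cohort meta-analysis can be sketched with a fixed-effect inverse-variance model. The example below uses the CHEK2 odds ratios and confidence intervals quoted in the Discussion as inputs, with standard errors back-calculated from the intervals. It is a minimal sketch under those assumptions: the function names are hypothetical, and the published pmeta values came from the authors' own pipeline applied to full summary statistics, so the output is not expected to reproduce them.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96, half-width multiplier of a 95% CI


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float) -> tuple:
    """Recover the log odds ratio and its standard error from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    return beta, se


def fixed_effect_meta(estimates: list) -> dict:
    """Inverse-variance fixed-effect pooling of (log OR, SE) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p value; erfc stays accurate in the far tail.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q measures heterogeneity of effect sizes between cohorts.
    cochran_q = sum(w * (beta - pooled_beta) ** 2
                    for w, (beta, _) in zip(weights, estimates))
    return {"odds_ratio": math.exp(pooled_beta), "se": pooled_se,
            "z": z, "p_value": p_value, "cochran_q": cochran_q}


# CHEK2 odds ratios and 95% CIs as quoted in the Discussion.
sab = log_or_and_se(1.11, 1.06, 1.16)
chn = log_or_and_se(1.33, 1.20, 1.46)

result = fixed_effect_meta([sab, chn])
print(f"pooled OR = {result['odds_ratio']:.3f}, p = {result['p_value']:.2e}, "
      f"Q = {result['cochran_q']:.2f}")
```

The pooled estimate always has a smaller standard error than either input, which is how combining cohorts can strengthen a shared signal; a large Cochran's Q would instead point to effect sizes that differ between populations.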
Acknowledgments
This work was supported by the South African Medical Research Council (SAMRC), the National Health Laboratory Service (to E.S.), and the Cancer Association of South Africa (to C.G.M., C.B.d.V., and C.M.L.). The Evolving Risk Factors for Cancers in African Populations (ERICA-SA) Study and the Johannesburg Cancer Study were supported by the SAMRC (with funds received from the South African National Department of Health [NDoH]) and the UK Medical Research Council (with funds from the UK Government’s Newton Fund) (MRCRFA-SHIP 01-2015). M.I.P. and C.G.M. were jointly supported by the SAMRC with funds received from the NDoH and the MRC UK with funds from the UK Government’s Newton Fund grant #046 and GSK. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the SAMRC, the South African NDoH, or the MRC UK from the UK Government’s Newton Fund. This work was also supported by the following organizations and funding sources: the German Academic Exchange Service-National Research Foundation Joint In-country Scholarship Programme (to W.C.C.); the National Research Foundation and Department of Science and Technology Thuthuka fund (to W.C.C.); the Boehringer Ingelheim Fonds travel grant (to W.C.C.); the Grants, Innovation, and Product Development Unit (GIPD) of the SAMRC; the National Institute for Health Research Maudsley Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London (to C.M.L.); and the South African NDoH (to E.S.). We acknowledge the NIH-funded H3Africa Consortium Collaborative Centre, AWI-Gen, for sharing data for the population controls used in our study (PI: M.R.). It is with great sorrow that we record the death of our colleague and co-author Dr. Elvira Singh on 27 February 2022.
Author contributions
C.G.M., W.C.C., C.M.L., D.B., F.S., and E.S. designed the study. C.G.M., D.B., F.S., E.S., T.W., R.N., C.B.d.V., and C.M.L. acquired the funding. W.C.C., C.B.d.V., L.F., C.S., L.S., C.J.C., F.S., M.R., C.A., M.I.P., and E.S. were responsible for sample acquisition, processing, and management and genotyping. W.C.C., J.T.B., M.H., and Y.S. performed the analysis. W.C.C. and C.G.M. wrote the manuscript. All authors critically reviewed, edited, and approved the manuscript.
Declaration of interests
The authors declare no competing interests.
Published: September 5, 2023
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2023.08.007.
Data and code availability
The African ESCC data used in this study are available to interested researchers through the European Genome-Phenome Archive (EGA), subject to controlled access review by the Data and Biospecimen Access Committee of the University of the Witwatersrand (African ESCC genotype dataset, EGA study accession number EGAS00001007477). The AWI-Gen data used in this study are available to interested researchers through EGA, subject to controlled access review by the Data and Biospecimen Access Committee of the H3Africa Consortium (AWI-Gen genotype dataset, accession number EGAD00010001996). Summary statistics reported in the paper are accessible on the GWAS Catalog (https://www.ebi.ac.uk/gwas/) under accession numbers GCST90271955 and GCST90271956.
References
- 1. Morgan E., Soerjomataram I., Rumgay H., Coleman H.G., Thrift A.P., Vignat J., Laversanne M., Ferlay J., Arnold M. The global landscape of esophageal squamous cell carcinoma and esophageal adenocarcinoma incidence and mortality in 2020 and projections to 2040: New estimates from GLOBOCAN 2020. Gastroenterology. 2022. doi: 10.1053/j.gastro.2022.05.054.
- 2. Enzinger P.C., Mayer R.J. Esophageal Cancer. N. Engl. J. Med. 2003;349:2241–2252. doi: 10.1056/NEJMra035010.
- 3. Sung H., Ferlay J., Siegel R.L., Laversanne M., Soerjomataram I., Jemal A., Bray F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2021;71:209–249. doi: 10.3322/caac.21660.
- 4. Somdyala N.I.M., Parkin D.M., Sithole N., Bradshaw D. Trends in cancer incidence in rural Eastern Cape Province; South Africa, 1998-2012. Int. J. Cancer. 2015;136:E470–E474. doi: 10.1002/ijc.29224.
- 5. Abnet C.C., Buckle G.C., Chen Y., Dawsey S.M., Kayamba V., Mwachiro M.M., Dzamalala C., Fleischer D.E., Kaimila B., Kelly P., et al. Expanding oesophageal cancer research and care in eastern Africa. Nat. Rev. Cancer. 2022;22:253–254. doi: 10.1038/s41568-022-00458-1.
- 6. Menya D., Maina S.K., Kibosia C., Kigen N., Oduor M., Some F., Chumba D., Ayuo P., Middleton D.R.S., Osano O., et al. Dental fluorosis and oral health in the African Esophageal Cancer Corridor: Findings from the Kenya ESCCAPE case-control study and a pan-African perspective. Int. J. Cancer. 2019;145:99–109. doi: 10.1002/ijc.32086.
- 7. Menya D., Kigen N., Oduor M., Maina S.K., Some F., Chumba D., Ayuo P., Osano O., Middleton D.R., Schüz J., McCormack V.A. Traditional and commercial alcohols and esophageal cancer risk in Kenya. Int. J. Cancer. 2019;144:459–469. doi: 10.1002/ijc.31804.
- 8. Segal I., Reinach S.G., de Beer M. Factors associated with oesophageal cancer in Soweto, South Africa. Br. J. Cancer. 1988;58:681–686. doi: 10.1038/bjc.1988.286.
- 9. Pillay V., Isaacson C., Mothobi P., Hale M., Tomar L.K., Tyagi C., Altini M., Choonara Y.E., Kumar P. Carcinogenic nitrosamines in traditional beer as the cause of oesophageal squamous cell carcinoma in black South Africans. S. Afr. Med. J. 2015;105:656–658. doi: 10.7196/SAMJnew.7935.
- 10. Pacella-Norman R., Urban M.I., Sitas F., Carrara H., Sur R., Hale M., Ruff P., Patel M., Newton R., Bull D., Beral V. Risk factors for oesophageal, lung, oral and laryngeal cancers in black South Africans. Br. J. Cancer. 2002;86:1751–1756. doi: 10.1038/sj.bjc.6600338.
- 11. Chen W.C., Singh E., Muchengeti M., Bradshaw D., Mathew C.G., Babb de Villiers C., Lewis C.M., Waterboer T., Newton R., Sitas F., et al. Johannesburg Cancer Study (JCS): contribution to knowledge and opportunities arising from 20 years of data collection in an African setting. Cancer Epidemiol. 2020;65. doi: 10.1016/j.canep.2020.101701.
- 12. Petrelli F., de Santi G., Rampulla V., Ghidini A., Mercurio P., Mariani M., Manara M., Rausa E., Lonati V., Viti M., et al. Human papillomavirus (HPV) types 16 and 18 infection and esophageal squamous cell carcinoma: a systematic review and meta-analysis. J. Cancer Res. Clin. Oncol. 2021;147:3011–3023. doi: 10.1007/s00432-021-03738-9.
- 13. Halec G., Schmitt M., Egger S., Abnet C.C., Babb C., Dawsey S.M., Flechtenmacher C., Gheit T., Hale M., Holzinger D., et al. Mucosal alpha-papillomaviruses are not associated with esophageal squamous cell carcinomas: Lack of mechanistic evidence from South Africa, China and Iran and from a world-wide meta-analysis. Int. J. Cancer. 2016;139:85–98. doi: 10.1002/ijc.29911.
- 14. Come J., Castro C., Morais A., Cossa M., Modcoicar P., Tulsidâs S., Cunha L., Lobo V., Morais A.G., Cotton S., et al. Clinical and Pathologic Profiles of Esophageal Cancer in Mozambique: A Study of Consecutive Patients Admitted to Maputo Central Hospital. J. Glob. Oncol. 2018;4:1–9. doi: 10.1200/JGO.18.00147.
- 15. White R.E., Parker R.K., Fitzwater J.W., Kasepoi Z., Topazian M. Stents as sole therapy for oesophageal cancer: a prospective analysis of outcomes after placement. Lancet Oncol. 2009;10:240–246. doi: 10.1016/S1470-2045(09)70004-X.
- 16. Thumbs A., Borgstein E., Vigna L., Kingham T.P., Kushner A.L., Hellberg K., Bates J., Wilhelm T.J. Self-expanding metal stents (SEMS) for patients with advanced Esophageal cancer in Malawi: An effective palliative treatment. J. Surg. Oncol. 2012;105:410–414. doi: 10.1002/jso.23003.
- 17. Pereira L., Mutesa L., Tindana P., Ramsay M. African genetic diversity and adaptation inform a precision medicine agenda. Nat. Rev. Genet. 2021;22:284–306. doi: 10.1038/s41576-020-00306-8.
- 18. Cui R., Kamatani Y., Takahashi A., Usami M., Hosono N., Kawaguchi T., Tsunoda T., Kamatani N., Kubo M., Nakamura Y., Matsuda K. Functional variants in ADH1B and ALDH2 coupled with alcohol and smoking synergistically enhance esophageal cancer risk. Gastroenterology. 2009;137:1768–1775. doi: 10.1053/j.gastro.2009.07.070.
- 19. Abnet C.C., Freedman N.D., Hu N., Wang Z., Yu K., Shu X.-O., Yuan J.-M., Zheng W., Dawsey S.M., Dong L.M., et al. A shared susceptibility locus in PLCE1 at 10q23 for gastric adenocarcinoma and esophageal squamous cell carcinoma. Nat. Genet. 2010;42:764–767. doi: 10.1038/ng.649.
- 20. Wang L.D., Zhou F.-Y., Li X.M., Sun L.-D., Song X., Jin Y., Li J.M., Kong G.-Q., Qi H., Cui J., et al. Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects identifies a susceptibility locus at PLCE1. Nat. Genet. 2010;42:759–763. doi: 10.1038/ng.648.
- 21. Wu C., Kraft P., Zhai K., Chang J., Wang Z., Li Y., Hu Z., He Z., Jia W., Abnet C.C., et al. Genome-wide association analyses of esophageal squamous cell carcinoma in Chinese identify multiple susceptibility loci and gene-environment interactions. Nat. Genet. 2012;44:1090–1097. doi: 10.1038/ng.2411.
- 22. Wu C., Hu Z., He Z., Jia W., Wang F., Zhou Y., Liu Z., Zhan Q., Liu Y., Yu D., et al. Genome-wide association study identifies three new susceptibility loci for esophageal squamous-cell carcinoma in Chinese populations. Nat. Genet. 2011;43:679–684. doi: 10.1038/ng.849.
- 23. Wu C., Wang Z., Song X., Feng X.-S., Abnet C.C., He J., Hu N., Zuo X.-B., Tan W., Zhan Q., et al. Joint analysis of three genome-wide association studies of esophageal squamous cell carcinoma in Chinese populations. Nat. Genet. 2014;46:1001–1006. doi: 10.1038/ng.3064.
- 24. McKay J.D., Truong T., Gaborieau V., Chabrier A., Chuang S.-C., Byrnes G., Zaridze D., Shangina O., Szeszenia-Dabrowska N., Lissowska J., et al. A Genome-Wide Association Study of Upper Aerodigestive Tract Cancers Conducted within the INHANCE Consortium. PLoS Genet. 2011;7. doi: 10.1371/journal.pgen.1001333.
- 25. Chang J., Zhong R., Tian J., Li J., Zhai K., Ke J., Lou J., Chen W., Zhu B., Shen N., et al. Exome-wide analyses identify low-frequency variant in CYP26B1 and additional coding variants associated with esophageal squamous cell carcinoma. Nat. Genet. 2018;50:338–343. doi: 10.1038/s41588-018-0045-8.
- 26. Yokoyama A., Muramatsu T., Ohmori T., Higuchi S., Hayashida M., Ishii H. Esophageal cancer and aldehyde dehydrogenase-2 genotypes in Japanese males. Cancer Epidemiol. Biomarkers Prev. 1996;5:99–102.
- 27. Bye H., Prescott N.J., Matejcic M., Rose E., Lewis C.M., Parker M.I., Mathew C.G. Population-specific genetic associations with oesophageal squamous cell carcinoma in South Africa. Carcinogenesis. 2011;32:1855–1861. doi: 10.1093/carcin/bgr211.
- 28. Bye H., Prescott N.J., Lewis C.M., Matejcic M., Moodley L., Robertson B., van Rensburg C., Parker M.I., Mathew C.G. Distinct genetic association at the PLCE1 locus with oesophageal squamous cell carcinoma in the South African population. Carcinogenesis. 2012;33:2155–2161. doi: 10.1093/carcin/bgs262.
- 29. Chen W.C., Bye H., Matejcic M., Amar A., Govender D., Khew Y.W., Beynon V., Kerr R., Singh E., Prescott N.J., et al. Association of genetic variants in CHEK2 with oesophageal squamous cell carcinoma in the South African Black population. Carcinogenesis. 2019;40:513–520. doi: 10.1093/carcin/bgz026.
- 30. Matejcic M., Li D., Prescott N.J., Lewis C.M., Mathew C.G., Parker M.I. Association of a deletion of GSTT2B with an altered risk of oesophageal squamous cell carcinoma in a South African population: a case-control study. PLoS One. 2011;6. doi: 10.1371/journal.pone.0029366.
- 31. Ali S.A., Soo C., Agongo G., Alberts M., Amenga-Etego L., Boua R.P., Choudhury A., Crowther N.J., Depuur C., Gómez-Olivé F.X., et al. Genomic and environmental risk factors for cardiometabolic diseases in Africa: methods used for Phase 1 of the AWI-Gen population cross-sectional study. Glob. Health Action. 2018;11. doi: 10.1080/16549716.2018.1507133.
- 32. Ramsay M., Crowther N., Tambo E., Agongo G., Baloyi V., Dikotope S., Gómez-Olivé X., Jaff N., Sorgho H., Wagner R., et al. H3Africa AWI-Gen Collaborative Centre: a resource to study the interplay between genomic and environmental risk factors for cardiometabolic diseases in four sub-Saharan African countries. Glob. Health Epidemiol. Genom. 2016;1. doi: 10.1017/gheg.2016.17.
- 33. H3Africa Consortium, Rotimi C., Abayomi A., Abimiku A., Adabayeri V.M., Adebamowo C., Adebiyi E., Ademola A.D., Adeyemo A., Adu D., et al. Research capacity. Enabling the genomic revolution in Africa. Science. 2014;344:1346–1348. doi: 10.1126/science.1251546.
- 34. Chen W.C., Kerr R., May A., Ndlovu B., Sobalisa A., Duze S.T., Joseph L., Mathew C.G., Babb de Villiers C. The Integrity and Yield of Genomic DNA Isolated from Whole Blood Following Long-Term Storage at −30°C. Biopreserv. Biobank. 2018;16:106–113. doi: 10.1089/bio.2017.0050.
- 35. Miller S.A., Dykes D.D., Polesky H.F. A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res. 1988;16:1215. doi: 10.1093/nar/16.3.1215.
- 36. Soo C.C., Mukomana F., Hazelhurst S., Ramsay M. Establishing an academic biobank in a resource-challenged environment. S. Afr. Med. J. 2017;107:486–492. doi: 10.7196/SAMJ.2017.v107i6.12099.
- 37. Mulder N., Abimiku A., Adebamowo S.N., de Vries J., Matimba A., Olowoyo P., Ramsay M., Skelton M., Stein D.J. H3Africa: current perspectives. Pharmgenomics Pers. Med. 2018;11:59–66. doi: 10.2147/PGPM.S141546.
- 38. Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
- 39. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013;45:580–585. doi: 10.1038/ng.2653.
- 40. Alexander D.H., Novembre J., Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
- 41. Zhou X., Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 2012;44:821–824. doi: 10.1038/ng.2310.
- 42. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag; 2016.
- 43. Speed D., Cai N., UCLEB Consortium, Johnson M.R., Nejentsev S., Balding D.J. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 2017;49:986–992. doi: 10.1038/ng.3865.
- 44. Cook J.P., Mahajan A., Morris A.P. Guidance for the utility of linear models in meta-analysis of genetic association studies of binary phenotypes. Eur. J. Hum. Genet. 2017;25:240–245. doi: 10.1038/ejhg.2016.150.
- 45. Swiel S., Brandenburg J.T., Hayat M., Chen W.C., Cox M.A., Hazelhurst S. FPGA Acceleration of GWAS Permutation Testing. Preprint at bioRxiv. 2022. doi: 10.1101/2022.03.11.483235.
- 46. Willer C.J., Li Y., Abecasis G.R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–2191. doi: 10.1093/bioinformatics/btq340.
- 47. Pruim R.J., Welch R.P., Sanna S., Teslovich T.M., Chines P.S., Gliedt T.P., Boehnke M., Abecasis G.R., Willer C.J. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010;26:2336–2337. doi: 10.1093/bioinformatics/btq419.
- 48. Benner C., Spencer C.C.A., Havulinna A.S., Salomaa V., Ripatti S., Pirinen M. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 2016;32:1493–1501. doi: 10.1093/bioinformatics/btw018.
- 49. Watanabe K., Taskesen E., van Bochoven A., Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 2017;8:1826. doi: 10.1038/s41467-017-01261-5.
- 50. Fabregat A., Sidiropoulos K., Viteri G., Forner O., Marin-Garcia P., Arnau V., D’Eustachio P., Stein L., Hermjakob H. Reactome pathway analysis: a high-performance in-memory approach. BMC Bioinf. 2017;18:142. doi: 10.1186/s12859-017-1559-2.
- 51. Jassal B., Matthews L., Viteri G., Gong C., Lorente P., Fabregat A., Sidiropoulos K., Cook J., Gillespie M., Haw R., et al. The reactome pathway knowledgebase. Nucleic Acids Res. 2020;48:D498–D503. doi: 10.1093/nar/gkz1031.
- 52. Griss J., Viteri G., Sidiropoulos K., Nguyen V., Fabregat A., Hermjakob H. ReactomeGSA - Efficient Multi-Omics Comparative Pathway Analysis. Mol. Cell. Proteomics. 2020;19:2115–2125. doi: 10.1074/mcp.TIR120.002155.
- 53. Zou Z., Ohta T., Miura F., Oki S. ChIP-Atlas 2021 update: a data-mining suite for exploring epigenomic landscapes by fully integrating ChIP-seq, ATAC-seq and Bisulfite-seq data. Nucleic Acids Res. 2022;50:W175–W182. doi: 10.1093/nar/gkac199.
- 54. 1000 Genomes Project Consortium, Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
- 55. Wen Y.-J., Zhang H., Ni Y.-L., Huang B., Zhang J., Feng J.-Y., Wang S.-B., Dunwell J.M., Zhang Y.-M., Wu R. Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief. Bioinform. 2018;19:700–712. doi: 10.1093/bib/bbw145.
- 56. Hyland P.L., Zhang H., Yang Q., Yang H.H., Hu N., Lin S.-W., Su H., Wang L., Wang C., Ding T., et al. Pathway, in silico and tissue-specific expression quantitative analyses of oesophageal squamous cell carcinoma genome-wide association studies data. Int. J. Epidemiol. 2016;45:206–220. doi: 10.1093/ije/dyv294.
- 57. Chen X., Li A., Sun B.-F., Yang Y., Han Y.-N., Yuan X., Chen R.-X., Wei W.-S., Liu Y., Gao C.-C., et al. 5-methylcytosine promotes pathogenesis of bladder cancer through stabilizing mRNAs. Nat. Cell Biol. 2019;21:978–990. doi: 10.1038/s41556-019-0361-y.
- 58. Chattopadhyay R., Das S., Maiti A.K., Boldogh I., Xie J., Hazra T.K., Kohno K., Mitra S., Bhakat K.K. Regulatory role of human AP-endonuclease (APE1/Ref-1) in YB-1-mediated activation of the multidrug resistance gene MDR1. Mol. Cell Biol. 2008;28:7066–7080. doi: 10.1128/MCB.00244-08.
- 59. Gaudreault I., Guay D., Lebel M. YB-1 promotes strand separation in vitro of duplex DNA containing either mispaired bases or cisplatin modifications, exhibits endonucleolytic activities and binds several DNA repair proteins. Nucleic Acids Res. 2004;32:316–327. doi: 10.1093/nar/gkh170.
- 60. Capowski E.E., Esnault S., Bhattacharya S., Malter J.S. Y box-binding factor promotes eosinophil survival by stabilizing granulocyte-macrophage colony-stimulating factor mRNA. J. Immunol. 2001;167:5970–5976. doi: 10.4049/jimmunol.167.10.5970.
- 61. Chen C.Y., Gherzi R., Andersen J.S., Gaietta G., Jürchott K., Royer H.D., Mann M., Karin M. Nucleolin and YB-1 are required for JNK-mediated interleukin-2 mRNA stabilization during T-cell activation. Genes Dev. 2000;14:1236–1248.
- 62. Horwitz E.M., Maloney K.A., Ley T.J. A human protein containing a “cold shock” domain binds specifically to H-DNA upstream from the human gamma-globin genes. J. Biol. Chem. 1994;269:14130–14139.
- 63. Kamangar F., Chow W.-H., Abnet C.C., Dawsey S.M. Environmental causes of esophageal cancer. Gastroenterol. Clin. North Am. 2009;38:27–57. doi: 10.1016/j.gtc.2009.01.004.
- 64. Abnet C.C., Arnold M., Wei W.-Q. Epidemiology of Esophageal Squamous Cell Carcinoma. Gastroenterology. 2018;154:360–373. doi: 10.1053/j.gastro.2017.08.023.
- 65. Simba H., Kuivaniemi H., Lutje V., Tromp G., Sewram V. Systematic Review of Genetic Factors in the Etiology of Esophageal Squamous Cell Carcinoma in African Populations. Front. Genet. 2019;10:642. doi: 10.3389/fgene.2019.00642.
- 66. Choudhury A., Sengupta D., Ramsay M., Schlebusch C. Bantu-speaker migration and admixture in southern Africa. Hum. Mol. Genet. 2021;30:R56–R63. doi: 10.1093/hmg/ddaa274.
- 67. Sengupta D., Choudhury A., Fortes-Lima C., Aron S., Whitelaw G., Bostoen K., Gunnink H., Chousou-Polydouri N., Delius P., Tollman S., et al. Genetic substructure and complex demographic history of South African Bantu speakers. Nat. Commun. 2021;12:2080. doi: 10.1038/s41467-021-22207-y.
- 68. Choudhury A., Aron S., Botigué L.R., Sengupta D., Botha G., Bensellak T., Wells G., Kumuthini J., Shriner D., Fakim Y.J., et al. High-depth African genomes inform human migration and health. Nature. 2020;586:741–748. doi: 10.1038/s41586-020-2859-7.
- 69. Cerami E., Gao J., Dogrusoz U., Gross B.E., Sumer S.O., Aksoy B.A., Jacobsen A., Byrne C.J., Heuer M.L., Larsson E., et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2:401–404. doi: 10.1158/2159-8290.CD-12-0095.
- 70. Zhang H.-R., Lai S.-Y., Huang L.-J., Zhang Z.-F., Liu J., Zheng S.-R., Ding K., Bai X., Zhou J.-Y. Myosin 1b promotes cell proliferation, migration, and invasion in cervical cancer. Gynecol. Oncol. 2018;149:188–197. doi: 10.1016/j.ygyno.2018.01.024.
- 71. Chapman B.V., Wald A.I., Akhtar P., Munko A.C., Xu J., Gibson S.P., Grandis J.R., Ferris R.L., Khan S.A. MicroRNA-363 targets myosin 1B to reduce cellular migration in head and neck cancer. BMC Cancer. 2015;15:861. doi: 10.1186/s12885-015-1888-3.
- 72. Ohmura G., Tsujikawa T., Yaguchi T., Kawamura N., Mikami S., Sugiyama J., Nakamura K., Kobayashi A., Iwata T., Nakano H., et al. Aberrant Myosin 1b Expression Promotes Cell Migration and Lymph Node Metastasis of HNSCC. Mol. Cancer Res. 2015;13:721–731. doi: 10.1158/1541-7786.MCR-14-0410.
- 73. Naeger L.K., McKinney J., Salvekar A., Hoey T. Identification of a STAT4 binding site in the interleukin-12 receptor required for signaling. J. Biol. Chem. 1999;274:1875–1878. doi: 10.1074/jbc.274.4.1875.
- 74. López S., van Dorp L., Hellenthal G. Human Dispersal Out of Africa: A Lasting Debate. Evol. Bioinform. Online. 2015;11:57–68. doi: 10.4137/EBO.S33489.
- 75. Sampson J.N., Wheeler W.A., Yeager M., Panagiotou O., Wang Z., Berndt S.I., Lan Q., Abnet C.C., Amundadottir L.T., Figueroa J.D., et al. Analysis of Heritability and Shared Heritability Based on Genome-Wide Association Studies for Thirteen Cancer Types. J. Natl. Cancer Inst. 2015;107:djv279. doi: 10.1093/jnci/djv279.
- 76. Gelernter J., Kranzler H.R., Sherva R., Almasy L., Koesterer R., Smith A.H., Anton R., Preuss U.W., Ridinger M., Rujescu D., et al. Genome-wide association study of alcohol dependence: significant findings in African- and European-Americans including novel risk loci. Mol. Psychiatry. 2014;19:41–49. doi: 10.1038/mp.2013.145.
- 77. Matejcic M., Gunter M.J., Ferrari P. Alcohol metabolism and oesophageal cancer: a systematic review of the evidence. Carcinogenesis. 2017;38:859–872. doi: 10.1093/carcin/bgx067.
- 78. van Loon K., Mwachiro M.M., Abnet C.C., Akoko L., Assefa M., Burgert S.L., Chasimpha S., Dzamalala C., Fleischer D.E., Gopal S., et al. The African Esophageal Cancer Consortium: A Call to Action. J. Glob. Oncol. 2018;4:1–9. doi: 10.1200/JGO.17.00163.