Cell Genomics. 2024 Jan 9;4(2):100474. doi: 10.1016/j.xgen.2023.100474

Host genetic variants, Epstein-Barr virus subtypes, and the risk of nasopharyngeal carcinoma: Assessment of interaction and mediation

Miao Xu 1,2,19, Ruimei Feng 3,19, Zhonghua Liu 4,19, Xiang Zhou 1,5,19, Yanhong Chen 1, Yulu Cao 1, Linda Valeri 4,6, Zilin Li 2,7, Zhiwei Liu 8, Su-Mei Cao 1, Qing Liu 1, Shang-Hang Xie 1, Ellen T Chang 9,10, Wei-Hua Jia 1, Jincheng Shen 11, Youyuan Yao 12, Yong-Lin Cai 13, Yuming Zheng 13, Zhe Zhang 14, Guangwu Huang 14, Ingemar Ernberg 15, Minzhong Tang 14, Weimin Ye 16,17,∗, Hans-Olov Adami 6,16,18,∗∗, Yi-Xin Zeng 1,∗∗∗, Xihong Lin 2,20,∗∗∗∗
PMCID: PMC10879020  PMID: 38359790

Summary

Epstein-Barr virus (EBV) and human leukocyte antigen (HLA) polymorphisms are well-known risk factors for nasopharyngeal carcinoma (NPC). However, the combined effects of HLA and EBV on the risk of NPC are unknown. We applied a causal inference framework to disentangle interaction and mediation effects between two host HLA SNPs, rs2860580 and rs2894207, and EBV variant 163364 in a population-based case-control study in NPC-endemic southern China. We discovered strong interaction effects between the high-risk EBV subtype and both HLA SNPs on NPC risk (rs2860580, relative excess risk due to interaction [RERI] = 4.08, 95% confidence interval [CI] = 2.03–6.14; rs2894207, RERI = 3.37, 95% CI = 1.59–5.15), accounting for the majority of genetic risk effects. These results indicate that HLA genes and the high-risk EBV have joint effects on NPC risk. Prevention strategies targeting the high-risk EBV subtype would largely reduce NPC risk associated with EBV and host genetic susceptibility.

Keywords: HLA polymorphism, nasopharyngeal carcinoma, genetic susceptibility to cancer, Epstein-Barr virus, causal inference, interaction, mediation

Highlights

  • A causal inference framework applied to disentangle interaction and mediation effects

  • The high-risk EBV subtype and HLA alleles jointly determine the majority of NPC risk

  • The mediated effect through favoring the increased frequency of high-risk EBV is weak

  • Targeting high-risk EBV could largely reduce NPC risk in southern China


Xu et al. revealed that the risk of nasopharyngeal carcinoma (NPC) associated with HLA variants depended on infection with a high-risk EBV subtype, indicating that most NPC risk could theoretically be eliminated by intervening against infection with the high-risk EBV and by routine NPC screening among high-risk EBV carriers for early detection.

Introduction

Although nasopharyngeal carcinoma (NPC) is rare in most parts of the world, it is one of the most common cancers in southern China.1 Epstein-Barr virus (EBV) has long been postulated to be a near-necessary factor for NPC development because it is present in the tumor cells of almost all patients with NPC, and it is the basis for serologic viral antibody and DNA tests that are widely used for screening and early diagnosis of NPC in high-risk populations.2,3,4 Recent studies identified that the EBV subtype that carries the non-synonymous variant at position 163364, encoding the V317M mutation in EBV BALF2 protein, significantly contributes to the overall risk of NPC (p = 2.40E−32, odds ratio [OR] = 6.14), and its distribution is strongly associated with the unique epidemic of NPC in southern China.5 However, because EBV infection is common and NPC is rare, it is widely accepted that other host genetic or environmental factors are also important determinants of NPC risk. Previous studies have implicated both host genetic factors, including SNPs in the human leukocyte antigen (HLA), TERT, CDKN2A/2B, TNFRSF19, MECOM, CIITA, and ITGA9 regions,6,7,8,9 and environmental factors, including cigarette smoking, consumption of salt-preserved fish (a traditional Cantonese food suggested to be an NPC risk factor), and occupational exposure to wood dust,1 in NPC development.

Among the genetic factors, HLA genes have the most consistent and prominent evidence for the association with NPC. Previous genome-wide association studies (GWASs) have identified two host HLA genetic variants, rs2860580 (pGWAS = 1.34E−28, OR = 1.72) and rs2894207 (pGWAS = 1.22E−16, OR = 1.64), to be strongly associated with NPC.6,7,8 Given the a priori knowledge of the central role of HLA in host immune response against virus and the most prominent associations of HLA genes and the high-risk EBV subtypes with NPC, it is postulated that HLA-mediated pathways may cooperate with EBV in NPC development.1 However, genetic and epidemiological evidence is lacking on how HLA genes and EBV act collaboratively to cause NPC. Without this knowledge, the clinical and public health utility of the genetic findings is limited.

Specifically, one possible causal pathway is that certain HLA genes may increase individuals’ vulnerability to the oncogenic effect of high-risk EBV infection and that they synergistically influence the risk of NPC, a form of gene-EBV interaction. The other possible pathway is that the association of HLA genes with NPC risk may be mediated through increasing susceptibility to high-risk EBV infection, a form of mediation effect through EBV. Hence, a comprehensive and quantitative assessment of HLA-EBV genetic interplay is required to provide novel insights into the etiology of NPC and may inform the design of effective prevention strategies. Traditional approaches to disentangling these possible pathways have been limited by their inability to accommodate mediation and interaction effects within a single framework and by inadequate handling of case-control data and confounders.

Here, in a population-based case-control study of NPC conducted in NPC-endemic areas of southern China, we applied an advanced causal inference framework10,11,12,13 to study whether the effects of two important host HLA variants (rs2860580 and rs2894207) on NPC are mediated by the EBV variant 163364 and whether the host variants interact with it. The causal inference method we applied accommodates gene-EBV interaction and mediation and allows us to disentangle mediation and interaction effects within one framework. To ensure the validity and reproducibility of this study,14 we conducted mediation and interaction analyses using a two-phase design: an original study followed by a replication study, in which we replicated the novel findings of significant interactions between the two host HLA variants and the EBV variant from the original study in an independent, non-overlapping dataset. To highlight the public health implication, we further evaluated how much of the genetic effects of the two HLA variants on NPC, whether mediated by EBV or due to their interaction with EBV, can theoretically be eliminated by prevention strategies targeting individuals infected with the high-risk EBV subtype.

Results

Study population characteristics and associations with host and EBV variants

We performed the causal inference analyses in a population-based case-control study conducted in two provinces, Guangdong and Guangxi, in NPC-endemic areas of southern China. Affected individuals and control subjects were frequency matched on sex, 5-year age group, and area of residence, as described in the STAR Methods. Data exclusion criteria are outlined in Figure S1. Briefly, two human SNPs at the HLA locus, rs2860580 and rs2894207, and EBV variant 163364 were genotyped from saliva DNA. The success rate for genotyping both human SNPs was 98.7% (3,906/3,956, 1,683 affected individuals and 2,223 control subjects), whereas the genotyping rate for the EBV variant was 66.9% (2,648/3,956, 1,098 affected individuals and 1,550 control subjects). Hence, the success rate for EBV genotyping is likely dictated by the quantity of EBV DNA in saliva. As shown in the literature, periodic lytic EBV production in the oral epithelium is hypothesized to be the main source of the likely random fluctuation observed for saliva EBV DNA.15,16,17,18 The missingness of EBV genotyping data did not differ by age, the consumption of salt-preserved fish, educational level, rural or urban area of residence, current occupation, selected environmental exposures, or a family history of NPC, and successful EBV genotyping was not correlated with increased NPC risk (Figure S2). A lower rate of missing EBV variant data was found among smokers and men (who were substantially more likely than women to be smokers) (Figure S2B). This pattern is concordant with the observation that smoking stimulates EBV lytic production, which increases the chance of EBV being genotyped.19,20 However, because the relative risk and attributable risk of NPC associated with smoking are relatively small (relative risk of only 1.1–1.5 for smoking19 compared with 6–7 for the high-risk EBV subtype5), any bias caused by smoking in our dataset would be small. Additionally, sex, age at interview, a family history of NPC, salt-preserved fish consumption, smoking, educational level, rural or urban area of residence, current occupation, and environmental exposure were included as covariates in the logistic regression models for the interaction and mediation analyses to control for confounding in the following causal inference analyses (STAR Methods).

Affected individuals and control subjects recruited from Guangdong were used in the original study, while those recruited from Guangxi were used for the replication study. Table 1 summarizes the demographic characteristics of the original and replication study subjects. The original and replication studies had similar distributions for the case-control status, sex, and age. In the pooled dataset, affected individuals were slightly younger than control subjects and were more likely to live in urban areas, to have a first-degree family history of NPC, to be less educated, to have blue-collar jobs, and to be exposed to selected hazardous agents (mostly inhalants; Table 1).

Table 1.

Characteristics of nasopharyngeal carcinoma affected individuals and control subjects among the original, replication, and pooled studies

| Variable | Original study: affected individuals (n = 572), n (%) | Original study: control subjects (n = 696), n (%) | Original study: p | Replication study: affected individuals (n = 497), n (%) | Replication study: control subjects (n = 826), n (%) | Replication study: p | Pooled study: affected individuals (n = 1,069), n (%) | Pooled study: control subjects (n = 1,522), n (%) | Pooled study: p |
|---|---|---|---|---|---|---|---|---|---|
| Age, years | | | 0.103 | | | 0.002 | | | 2.2E−4 |
| Mean (SD) | 48.6 (10.9) | 49.6 (10.7) | 0.108 | 49.2 (10.8) | 51.1 (10.9) | 0.003 | 48.9 (10.8) | 50.4 (10.8) | 3.9E−4 |
| <35 | 58 (10.1) | 52 (7.5) | | 45 (9.1) | 53 (6.4) | | 103 (9.6) | 105 (6.9) | |
| 35–59 | 424 (74.1) | 512 (73.6) | | 366 (73.6) | 568 (68.8) | | 790 (73.9) | 1,080 (71.0) | |
| >59 | 90 (15.7) | 132 (19.0) | | 86 (17.3) | 205 (24.8) | | 176 (16.5) | 337 (22.1) | |
| Sex | | | 0.879 | | | 0.928 | | | 0.819 |
| Male | 431 (75.4) | 527 (75.7) | | 381 (76.7) | 635 (76.9) | | 812 (76.0) | 1,162 (76.4) | |
| Female | 141 (24.7) | 169 (24.3) | | 116 (23.3) | 191 (23.1) | | 257 (24.0) | 360 (23.7) | |
| Education level, years | | | 0.117 | | | 0.102 | | | 0.021 |
| <7 | 229 (40.0) | 247 (35.5) | | 216 (43.5) | 310 (37.5) | | 445 (41.6) | 557 (36.6) | |
| 7–9 | 233 (40.7) | 286 (41.1) | | 179 (36.0) | 327 (39.6) | | 412 (38.5) | 613 (40.3) | |
| ≥10 | 110 (19.2) | 163 (23.4) | | 102 (20.5) | 189 (22.9) | | 212 (19.8) | 352 (23.1) | |
| Residential area | | | 4.6E−6 | | | 0.251 | | | 2.9E−5 |
| Urban | 96 (16.8) | 58 (8.3) | | 65 (13.1) | 89 (10.8) | | 161 (15.1) | 147 (9.7) | |
| Rural | 476 (83.2) | 638 (91.7) | | 432 (86.9) | 737 (89.2) | | 908 (84.9) | 1,375 (90.3) | |
| Salt-preserved fish consumption in 2000–2002 | | | 0.186 | | | 0.027 | | | 0.119 |
| Yearly or less | 390 (68.2) | 450 (64.7) | | 431 (86.7) | 678 (82.1) | | 821 (76.8) | 1,128 (74.1) | |
| Monthly or more | 182 (31.8) | 246 (35.3) | | 66 (13.3) | 148 (17.9) | | 248 (23.2) | 394 (25.9) | |
| NPC history among first-degree relatives | | | 1.9E−8 | | | 7.3E−5 | | | 2.4E−13 |
| No | 487 (85.1) | 653 (93.8) | | 449 (90.3) | 790 (95.6) | | 936 (87.6) | 1,443 (94.8) | |
| Yes | 73 (12.8) | 26 (3.7) | | 39 (7.9) | 22 (2.7) | | 112 (10.5) | 48 (3.2) | |
| Unknown/missing | 12 (2.1) | 17 (2.4) | | 9 (1.8) | 14 (1.7) | | 21 (2.0) | 31 (2.0) | |
| Smoking status | | | 0.732 | | | 0.785 | | | 0.864 |
| Never | 224 (39.2) | 266 (38.2) | | 217 (43.7) | 367 (44.4) | | 441 (41.3) | 633 (41.6) | |
| Ever | 348 (60.8) | 430 (61.8) | | 280 (56.3) | 459 (55.6) | | 628 (58.8) | 889 (58.4) | |
| Current occupation | | | 3.9E−12 | | | 5.5E−4 | | | 1.0E−5 |
| Farmer/unemployed | 208 (36.4) | 232 (33.3) | | 192 (38.6) | 388 (47.0) | | 400 (37.4) | 620 (40.7) | |
| Blue collar | 236 (41.3) | 213 (30.6) | | 190 (38.2) | 265 (32.1) | | 426 (39.9) | 478 (31.4) | |
| White collar | 89 (15.6) | 102 (14.7) | | 59 (11.9) | 118 (14.3) | | 148 (13.8) | 220 (14.5) | |
| Unknown | 39 (6.8) | 149 (21.4) | | 56 (11.3) | 55 (6.7) | | 95 (8.9) | 204 (13.4) | |
| Selected environmental exposureᵃ | | | 6.5E−19 | | | 0.029 | | | 1.3E−16 |
| None | 58 (10.1) | 82 (11.8) | | 35 (7.0) | 85 (10.3) | | 93 (8.7) | 167 (11.0) | |
| Dust exposure | 226 (39.5) | 217 (31.2) | | 246 (49.5) | 326 (39.5) | | 472 (44.2) | 543 (35.7) | |
| Smoke/exhaust exposure | 107 (18.7) | 91 (13.1) | | 125 (25.2) | 167 (20.2) | | 232 (21.7) | 258 (17.0) | |
| Other exposure | 175 (30.6) | 193 (27.7) | | 85 (17.1) | 245 (29.7) | | 260 (24.3) | 438 (28.8) | |
| Unknown/missing | 6 (1.1) | 113 (16.2) | | 6 (1.2) | 3 (0.4) | | 12 (1.1) | 116 (7.6) | |
| EBV infection determined by variant 163364 | | | 0.693 | | | 0.692 | | | 0.890 |
| Multiple strains | 16 (2.8) | 17 (2.4) | | 16 (3.2) | 30 (3.6) | | 32 (3.0) | 47 (3.1) | |
| Single strain | 556 (97.2) | 679 (97.6) | | 481 (96.8) | 796 (96.4) | | 1,037 (97.0) | 1,475 (96.9) | |

Multiple strains represent the genotype CT for EBV variant 163364; single strain represents the genotype C or T for the same variant.

ᵃ Dust exposure includes exposure to wood, metal, textile, leather, cement, and other types of non-soil dust. Smoke/exhaust exposure includes exposure to exhaust of diesel, gasoline, coal, firewood, asphalt/tar, natural gas, and other types of exhaust/smoke. Other environmental exposure includes exposure to wood preservatives, formaldehyde, organic solvents, pesticides, and other types of chemical vapor, as well as sulfuric acid, hydrochloric acid, and other types of acid/alkali.

Table 2 displays the effects of the two host SNPs, rs2860580 and rs2894207, their joint status, and EBV variant 163364 on NPC risk in the original, replication, and pooled studies. No deviation from Hardy-Weinberg equilibrium was detected for the two human SNPs. The two human SNPs were independent of each other, with weak linkage disequilibrium (R² = 0.01) among the control subjects in this study, consistent with previous GWASs.6 In the pooled study, individuals carrying only the risk allele of rs2860580, only the risk allele of rs2894207, and only the risk alleles of both host SNPs had NPC risk increased 1.80-, 1.68-, and 2.12-fold, respectively, compared to the individuals in the reference group indicated in Table 2; the per-risk-allele OR was 1.68 (95% confidence interval [CI] = 1.47–1.91) for rs2860580 and 1.65 (95% CI = 1.40–1.93) for rs2894207, consistent with the published GWAS results (Table S1). EBV variant 163364 was associated with a 6.86-fold increased risk of NPC in the pooled dataset (6.99-fold in the original study and 6.55-fold in the replication study; Table 2).

Table 2.

Association between two host genetic variants or EBV variant 163364 and risk of nasopharyngeal carcinoma

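| Study | Variant | Genotype/group | Affected individuals, n (%) | Control subjects, n (%) | OR (95% CI)ᵃ |
|---|---|---|---|---|---|
| Original study in Guangdong | rs2860580 (risk allele = G) | AA/AG | 242 (42.3) | 402 (57.8) | reference |
| Original study in Guangdong | rs2860580 (risk allele = G) | GG | 330 (57.7) | 294 (42.2) | 1.93 (1.52, 2.44) |
| Original study in Guangdong | rs2894207 (risk allele = T) | CC/CT | 148 (25.9) | 266 (38.2) | reference |
| Original study in Guangdong | rs2894207 (risk allele = T) | TT | 424 (74.1) | 430 (61.8) | 1.70 (1.32, 2.19) |
| Original study in Guangdong | Joint status of rs2860580 and rs2894207ᵇ | Low risk | 303 (53.0) | 493 (70.8) | reference |
| Original study in Guangdong | Joint status of rs2860580 and rs2894207ᵇ | High risk | 269 (47.0) | 203 (29.2) | 2.24 (1.75, 2.85) |
| Original study in Guangdong | EBV 163364 (high-risk subtype = T) | C | 83 (14.5) | 374 (53.7) | reference |
| Original study in Guangdong | EBV 163364 (high-risk subtype = T) | CT/T | 489 (85.5) | 322 (46.3) | 6.99 (5.25, 9.31) |
| Replication study in Guangxi | rs2860580 (risk allele = G) | AA/AG | 213 (42.9) | 459 (55.6) | reference |
| Replication study in Guangxi | rs2860580 (risk allele = G) | GG | 284 (57.1) | 367 (44.4) | 1.73 (1.38, 2.18) |
| Replication study in Guangxi | rs2894207 (risk allele = T) | CC/CT | 111 (22.3) | 269 (32.6) | reference |
| Replication study in Guangxi | rs2894207 (risk allele = T) | TT | 386 (77.7) | 557 (67.4) | 1.74 (1.34, 2.26) |
| Replication study in Guangxi | Joint status of rs2860580 and rs2894207ᵇ | Low risk | 256 (51.5) | 561 (67.9) | reference |
| Replication study in Guangxi | Joint status of rs2860580 and rs2894207ᵇ | High risk | 241 (48.5) | 265 (32.1) | 2.12 (1.68, 2.69) |
| Replication study in Guangxi | EBV 163364 (high-risk subtype = T) | C | 122 (24.6) | 563 (68.2) | reference |
| Replication study in Guangxi | EBV 163364 (high-risk subtype = T) | CT/T | 375 (75.5) | 263 (31.8) | 6.55 (5.07, 8.46) |
| Pooled study | rs2860580 (risk allele = G) | AA/AG | 455 (42.6) | 861 (56.6) | reference |
| Pooled study | rs2860580 (risk allele = G) | GG | 614 (57.4) | 661 (43.4) | 1.80 (1.53, 2.11) |
| Pooled study | rs2894207 (risk allele = T) | CC/CT | 259 (24.2) | 535 (35.2) | reference |
| Pooled study | rs2894207 (risk allele = T) | TT | 810 (75.8) | 987 (64.9) | 1.68 (1.41, 2.01) |
| Pooled study | Joint status of rs2860580 and rs2894207ᵇ | Low risk | 559 (52.3) | 1,054 (69.3) | reference |
| Pooled study | Joint status of rs2860580 and rs2894207ᵇ | High risk | 510 (47.7) | 468 (30.8) | 2.12 (1.79, 2.50) |
| Pooled study | EBV 163364 (high-risk subtype = T) | C | 205 (19.2) | 937 (61.6) | reference |
| Pooled study | EBV 163364 (high-risk subtype = T) | CT/T | 864 (80.8) | 585 (38.4) | 6.86 (5.68, 8.27) |

ᵃ Adjusted for age at interview, sex and smoking joint status, education level, salt-preserved fish consumption in 2000–2002, NPC history among first-degree relatives, rural or urban area of residence, current occupation, and environmental exposure.

ᵇ The joint status of high-risk group: GG for rs2860580 and TT for rs2894207; the joint status of low-risk group: AA/AG for rs2860580 or CC/CT for rs2894207.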
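As a rough sanity check on Table 2, an unadjusted odds ratio can be computed directly from the 2×2 counts. The sketch below is illustrative only: it uses the pooled EBV 163364 counts from the table and a Woolf (log-scale Wald) confidence interval, whereas the ORs reported in the table come from covariate-adjusted logistic regression, so the two are expected to differ slightly. The function name `crude_odds_ratio` is hypothetical.

```python
import math

def crude_odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval."""
    odds_ratio = (exp_cases * unexp_controls) / (unexp_cases * exp_controls)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(
        1 / exp_cases + 1 / unexp_cases + 1 / exp_controls + 1 / unexp_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Pooled counts for EBV 163364 (Table 2):
# CT/T: 864 affected, 585 controls; C: 205 affected, 937 controls
or_ebv, ci_low, ci_high = crude_odds_ratio(864, 205, 585, 937)
print(f"crude OR = {or_ebv:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
```

The crude estimate (about 6.75) is close to the adjusted pooled OR of 6.86, which is consistent with limited confounding by the adjustment covariates for this association.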

Interaction effects between host SNPs and EBV variant

Figure 1 shows ORs, additive interactions, and their 95% CIs for the joint effects of host HLA SNPs rs2860580 and rs2894207 and EBV variant 163364 on NPC risk. Compared to the subjects carrying both protective alleles (AA/AG) of rs2860580 and the low-risk EBV variant (C), those carrying only the susceptible alleles (GG) of rs2860580, only the high-risk EBV variant (CT/T), or both had approximately 1.5-, 6-, and 11-fold increased risk, respectively, in the original and the replication studies (Figure 1A). Similarly, joint ORs and 95% CIs are shown for rs2894207 and EBV variant 163364 (Figure 1B). Importantly, significant additive interactions were observed between the two host SNPs, rs2860580 and rs2894207, and EBV variant 163364 in both the original and replication studies (Figure 1). In the pooled study, the excess risk due to interaction between carrying only host-susceptible HLA alleles and carrying the high-risk EBV subtype (relative excess risk due to interaction [RERI]) was 4.08 (95% CI = 2.03–6.14) for rs2860580 and 3.37 (95% CI = 1.59–5.15) for rs2894207 (Figure 1). Furthermore, in the interaction analyses, we combined the two host genetic variants into one categorical variable by their joint status, which divided the study subjects into two groups: one group carrying only risk alleles of both host SNPs and the other group carrying protective alleles of either host SNP. We identified significant interaction effects between the joint status of the two host SNPs and EBV subtypes in both the original study (RERI = 6.03, 95% CI = 2.06–9.99) and the replication study (RERI = 4.68, 95% CI = 1.13–8.23), and these effects were stronger than when the two host SNPs were analyzed separately (Figure 1C).

Figure 1.

Joint effect and additive interaction between EBV variant 163364 and host HLA SNPs on the risk of nasopharyngeal carcinoma

(A) rs2860580, (B) rs2894207, and (C) their joint status. OR, odds ratio; CI, confidence interval; RERI, relative excess risk due to interaction. Two host SNPs were combined as one categorical variable in the models by their joint status, which divided the study subjects into two groups: one group at higher risk carrying only risk alleles of both host SNPs and the other group at lower risk carrying protective alleles of either host SNP.
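For readers unfamiliar with additive interaction, RERI is conventionally defined from the three joint odds ratios as RERI = OR11 − OR10 − OR01 + 1, where OR11 is the OR for carrying both risk factors and OR10 and OR01 are the ORs for carrying only one. The sketch below applies this standard definition, together with the related synergy index, to the rounded joint ORs quoted in the text for rs2860580 (about 1.5, 6, and 11). These are approximations for illustration, not the model-based estimates in Figure 1, and the function names are hypothetical.

```python
def reri(or_both, or_gene_only, or_ebv_only):
    """Relative excess risk due to interaction: OR11 - OR10 - OR01 + 1.

    A value above 0 indicates a joint effect larger than the sum of the
    two separate effects (super-additive interaction).
    """
    return or_both - or_gene_only - or_ebv_only + 1.0

def synergy_index(or_both, or_gene_only, or_ebv_only):
    """Synergy index: (OR11 - 1) / ((OR10 - 1) + (OR01 - 1)); above 1 suggests synergy."""
    return (or_both - 1.0) / ((or_gene_only - 1.0) + (or_ebv_only - 1.0))

# Rounded joint ORs for rs2860580 and EBV 163364 as quoted in the text
approx_reri = reri(or_both=11.0, or_gene_only=1.5, or_ebv_only=6.0)
approx_s = synergy_index(or_both=11.0, or_gene_only=1.5, or_ebv_only=6.0)
print(approx_reri)          # 4.5
print(round(approx_s, 2))   # 1.82
```

With these rounded inputs the RERI is 4.5, in the same range as the pooled, covariate-adjusted estimate of 4.08 for rs2860580.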

To evaluate the impact of individual alleles of the two variants and produce OR for each risk allele, we performed the interaction analyses using the additive model that compares the effects of carrying homozygous versus heterozygous versus no risk alleles. The interaction effects with EBV variant 163364 on NPC were statistically significant in both the original and replication studies and were 3.08 (95% CI = 1.79–4.37) and 2.57 (95% CI = 1.24–3.90) per risk allele for rs2860580 and rs2894207, respectively, in the pooled study (Figure S3). These results further showed that the findings of highly significant interaction effects between EBV and the two human HLA genetic variants were robust to the assumed genetic models, i.e., both the additive model and the recessive model (comparing the individuals carrying only susceptible alleles versus those carrying protective alleles). Taken together, these coherent results indicate that host-susceptible HLA genes and the high-risk EBV subtype have synergistic interaction effects on the risk of NPC.

Genetic effects mediated by high-risk EBV subtype

The associations between the two host HLA SNPs, rs2860580 and rs2894207, and EBV variant 163364 (Table S2) indicate that the genetic effects might be mediated through increasing the frequency of high-risk EBV subtype. We applied causal mediation analyses, allowing for interaction between host SNPs and EBV (STAR Methods). The mediation effects (indirect effects) for NPC risk through EBV variant 163364 and direct effects are shown in Table 3. Both the original and replication studies, as well as the pooled study, revealed significant direct effects and small, statistically non-significant or weakly significant indirect effects. The direct effect in the pooled dataset was estimated as ORs 1.69 (95% CI = 1.40–2.03) for rs2860580 and 1.56 (95% CI = 1.27–1.90) for rs2894207, whereas the indirect effect was close to 1 (OR = 1.07, 95% CI = 0.98–1.17 for rs2860580; OR = 1.10, 95% CI = 1.00–1.21 for rs2894207). When we combined the two host SNPs as one categorical variable by their joint status and compared the individuals carrying only risk alleles of both host SNPs to those carrying protective alleles of either host SNP, the mediation effect through EBV subtypes (indirect effect) became significant, albeit weak, in the pooled analysis at a 5% significance level (indirect effect, OR = 1.12, 95% CI = 1.02–1.23; Table 3), possibly due to the stronger genetic effects of combining two SNPs together (Table 2).

Table 3.

Direct and indirect effects on nasopharyngeal carcinoma between host SNPs rs2860580 and rs2894207 as well as their joint status and EBV variant 163364

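| Host variant and mediator | Study | Effect | ORᵃ | 95% CIᵃ | p |
|---|---|---|---|---|---|
| rs2860580 and EBV 163364 | Original study | natural direct effect | 1.88 | 1.44, 2.46 | 3.4E−6 |
| rs2860580 and EBV 163364 | Original study | natural indirect effect | 1.02 | 0.90, 1.16 | 0.730 |
| rs2860580 and EBV 163364 | Original study | marginal total effect | 1.92 | 1.44, 2.58 | 1.2E−5 |
| rs2860580 and EBV 163364 | Replication study | natural direct effect | 1.56 | 1.20, 2.02 | 8.0E−4 |
| rs2860580 and EBV 163364 | Replication study | natural indirect effect | 1.13 | 0.99, 1.28 | 0.066 |
| rs2860580 and EBV 163364 | Replication study | marginal total effect | 1.76 | 1.31, 2.35 | 1.5E−4 |
| rs2860580 and EBV 163364 | Pooled study | natural direct effect | 1.69 | 1.40, 2.03 | 2.5E−8 |
| rs2860580 and EBV 163364 | Pooled study | natural indirect effect | 1.07 | 0.98, 1.17 | 0.157 |
| rs2860580 and EBV 163364 | Pooled study | marginal total effect | 1.80 | 1.47, 2.21 | 1.9E−8 |
| rs2894207 and EBV 163364 | Original study | natural direct effect | 1.53 | 1.16, 2.03 | 0.003 |
| rs2894207 and EBV 163364 | Original study | natural indirect effect | 1.13 | 0.99, 1.29 | 0.062 |
| rs2894207 and EBV 163364 | Original study | marginal total effect | 1.73 | 1.27, 2.36 | 5.0E−4 |
| rs2894207 and EBV 163364 | Replication study | natural direct effect | 1.62 | 1.21, 2.17 | 0.001 |
| rs2894207 and EBV 163364 | Replication study | natural indirect effect | 1.08 | 0.94, 1.25 | 0.294 |
| rs2894207 and EBV 163364 | Replication study | marginal total effect | 1.75 | 1.27, 2.41 | 6.1E−4 |
| rs2894207 and EBV 163364 | Pooled study | natural direct effect | 1.56 | 1.27, 1.90 | 1.5E−5 |
| rs2894207 and EBV 163364 | Pooled study | natural indirect effect | 1.10 | 1.00, 1.21 | 0.052 |
| rs2894207 and EBV 163364 | Pooled study | marginal total effect | 1.71 | 1.37, 2.14 | 1.7E−6 |
| Joint status of host SNPsᵇ and EBV 163364 | Original study | natural direct effect | 2.04 | 1.55, 2.69 | 3.5E−7 |
| Joint status of host SNPsᵇ and EBV 163364 | Original study | natural indirect effect | 1.12 | 0.98, 1.27 | 0.088 |
| Joint status of host SNPsᵇ and EBV 163364 | Original study | marginal total effect | 2.28 | 1.68, 3.09 | 1.2E−7 |
| Joint status of host SNPsᵇ and EBV 163364 | Replication study | natural direct effect | 1.90 | 1.46, 2.48 | 2.2E−6 |
| Joint status of host SNPsᵇ and EBV 163364 | Replication study | natural indirect effect | 1.13 | 0.99, 1.29 | 0.067 |
| Joint status of host SNPsᵇ and EBV 163364 | Replication study | marginal total effect | 2.15 | 1.59, 2.91 | 6.2E−7 |
| Joint status of host SNPsᵇ and EBV 163364 | Pooled study | natural direct effect | 1.92 | 1.59, 2.31 | 1.3E−11 |
| Joint status of host SNPsᵇ and EBV 163364 | Pooled study | natural indirect effect | 1.12 | 1.02, 1.23 | 0.020 |
| Joint status of host SNPsᵇ and EBV 163364 | Pooled study | marginal total effect | 2.14 | 1.73, 2.64 | 2.2E−12 |

ᵃ Adjusted for age at interview, sex and smoking joint status, education level, salt-preserved fish consumption in 2000–2002, NPC history among first-degree relatives, rural or urban area of residence, current occupation, and environmental exposure.

ᵇ Two host SNPs were combined as one categorical variable in the models by their joint status, which divided the study subjects into two groups: one group at higher risk carrying only risk alleles of both host SNPs and the other group at lower risk carrying protective alleles of either host SNP.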
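On the odds ratio scale, the total effect in a causal mediation analysis is usually the product of the natural direct and natural indirect effects. The short check below, a sketch that assumes this standard multiplicative relationship holds for the estimates in Table 3, multiplies the rounded pooled-study ORs and compares the product with the reported total effect. Small discrepancies are expected from rounding.

```python
# (natural direct OR, natural indirect OR, reported total OR) from the pooled study in Table 3
pooled_effects = {
    "rs2860580": (1.69, 1.07, 1.80),
    "rs2894207": (1.56, 1.10, 1.71),
    "joint status": (1.92, 1.12, 2.14),
}

products = {}
for name, (direct_or, indirect_or, total_or) in pooled_effects.items():
    # Total effect on the OR scale is approximately direct x indirect
    products[name] = direct_or * indirect_or
    print(f"{name}: direct x indirect = {products[name]:.2f}, reported total = {total_or:.2f}")
```

In each case the product agrees with the reported total effect to within about 0.01, and the indirect factor contributes only a 7%–12% multiplicative increase, which is the sense in which the mediated effect is weak.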

We further used the additive model to assess the mediation effects per risk allele of each HLA SNP. Similarly, the direct effects of both host HLA SNPs were highly significant in both the original and replication studies, while the indirect effects through high-risk EBV variant 163364 were statistically non-significant for rs2860580 (indirect effect, OR = 1.05, 95% CI = 0.98–1.22) and became significant for rs2894207 at a 5% significance level (indirect effect, OR = 1.12, 95% CI = 1.03–1.21; Table S3) in the pooled analysis. In accordance with the results using the recessive model (comparing the individuals carrying only susceptible alleles versus those carrying protective alleles; Table 3), the effect sizes using the additive model per risk allele on NPC through increasing the frequency of the high-risk EBV subtype (Table S3; indirect effect, ORs = 1.05 and 1.12 for rs2860580 and rs2894207, respectively) were small and could not explain the majority of genetic effects for the two HLA SNPs or the EBV variant on NPC. Taken together, these results indicate that the majority of the effects of the host SNPs rs2894207 and rs2860580 on NPC risk might not be mediated by the high-risk EBV subtype.

NPC risk attributable to the high-risk EBV subtype

Evaluating the interaction between host genes and EBV subtypes enables quantification of attributable risk, that is, the potential beneficial impact of preventing infection with the high-risk EBV subtype. Therefore, we applied four-way decomposition to evaluate the proportion of NPC risk due to host-virus interaction and mediation that can be reduced or eliminated by intervention against high-risk EBV. The four-way decomposition method separates the excess relative risk of NPC due to host genetic effects into four parts involving interaction, mediation, both, or neither (Figure 2). The decomposition analysis showed that the excess relative risk of NPC due to pure interaction between HLA SNPs and EBV variant 163364 was significant in the pooled dataset (rs2860580: reference interaction = 0.47, 95% CI = 0.21–0.73; rs2894207: reference interaction = 0.40, 95% CI = 0.15–0.65) and accounted for the largest proportion for both SNPs (Figures 2A and 2B). In the pooled dataset, interaction with the high-risk EBV (reference interaction + mediated interaction) accounted for 66.0% of the total excess risk associated with SNP rs2860580 and 69.2% with rs2894207, comparing individuals carrying only susceptible HLA alleles with those carrying protective alleles. The association of NPC risk with both SNPs mediated through the high-risk EBV (mediated interaction + pure indirect effect) was non-significant in both the original and replication studies (Figures 2A and 2B). When we combined the two HLA SNPs in one model and compared the individuals carrying only risk alleles of both host SNPs to those carrying protective alleles of either host SNP, the excess relative risk of NPC due to the pure interaction effect between HLA SNPs and EBV variant 163364 became even stronger (reference interaction = 0.57, 95% CI = 0.26–0.87 in the pooled dataset; Figure 2C).

Figure 2.

Four-way decomposition of total excess relative risk for nasopharyngeal carcinoma associated with host SNPs

(A) rs2860580, (B) rs2894207, and (C) their joint status. CI, confidence interval; proportion, the proportion of total excess relative risk. Two host SNPs were combined as one categorical variable in the models by their joint status, which divided the study subjects into two groups: one group at higher risk carrying only risk alleles of both host SNPs and the other group at lower risk carrying protective alleles of either host SNP.

Furthermore, by combining the effects of host-virus interaction and mediation (reference interaction + mediated interaction + pure indirect effect) in the pooled dataset, we found that 74.5% and 82.7% of the total excess relative risk associated with carrying only the susceptible alleles of rs2860580 and rs2894207, respectively, can potentially be eliminated by preventing high-risk EBV infection (Figures 2A and 2B). Consistently, in the model using the joint status of the two SNPs, 69.9% of the total excess risk associated with carrying only susceptible alleles of both SNPs can potentially be eliminated by preventing high-risk EBV infection (Figure 2C). The remaining effects independent of high-risk EBV (i.e., the controlled direct effects) would be small (Figure 2).
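The percentages quoted above can be related to one another with simple arithmetic, assuming (as in the standard four-way decomposition) that the controlled direct effect, reference interaction, mediated interaction, and pure indirect effect sum to the total excess relative risk. The sketch below uses only the pooled proportions stated in the text. The derived shares for the pure indirect and controlled direct components are back-calculated for illustration, are not values reported in the paper, and the function name is hypothetical.

```python
def split_excess_risk(interaction_share, eliminated_share):
    """Back-calculate component shares of the total excess relative risk.

    interaction_share: (reference interaction + mediated interaction) / total
    eliminated_share:  (reference interaction + mediated interaction
                        + pure indirect effect) / total
    """
    return {
        "interaction": interaction_share,
        # Pure indirect effect is what mediation adds beyond interaction
        "pure_indirect": round(eliminated_share - interaction_share, 3),
        # Controlled direct effect is what remains if high-risk EBV is prevented
        "controlled_direct": round(1.0 - eliminated_share, 3),
    }

# Pooled proportions quoted in the text
shares = {
    "rs2860580": split_excess_risk(0.660, 0.745),
    "rs2894207": split_excess_risk(0.692, 0.827),
}
for snp, parts in shares.items():
    print(snp, parts)
```

Under this assumption, roughly 25% (rs2860580) and 17% (rs2894207) of the excess relative risk would remain as a controlled direct effect, consistent with the statement that the effects independent of high-risk EBV are small.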

Finally, the conclusion that the risk effects independent of high-risk EBV are relatively small is robust under the additive model. With the additive model (Figure S4), interaction with the high-risk EBV subtype accounted for the largest proportion of the per-risk-allele genetic effects on NPC risk, 61.6% for rs2860580 and 54.8% for rs2894207; 69.0% and 71.1% of the total excess relative risk associated with each risk allele of rs2860580 and rs2894207, respectively, can potentially be eliminated by preventing high-risk EBV infection.

Potential mechanistic interaction between EBV subtypes and HLA alleles

By contributing the majority of disease risk, the strong interaction effect we discovered indicates that NPC risk depends not only on EBV subtypes but also on the HLA alleles of the host. To explore the plausible mechanisms underlying the high-risk EBV subtype and its interaction with host HLA variants on NPC risk, we evaluated the impact of high-risk EBV variant 163364 on protein function. We used AlphaFold2 to predict the structure of BALF2 proteins from the low-risk and high-risk EBV strains.21,22 Interestingly, we observed that BALF2 amino acid 317, encoded by EBV SNP 163364, is located in the neck region, a presumed single-stranded DNA (ssDNA) binding pocket (Figure S5A). Within this pocket, two negatively charged amino acids, D309 and E546 on the opposite side, form a gate structure, facilitating the movement of ssDNA through the binding pocket. Meanwhile, two positively charged amino acids, R322 and K548, dock the ssDNA. In the presence of the V317M mutation, encoded by the high-risk variant 163364, the long side chain of M drives a shift of the α-helix (314–328) harboring M317 and its neighboring loop (308–313), narrowing the gate between D309 and E546, as well as the docking interface between R322 and K548 (Figures S5B–S5D). This narrowed gate and binding pocket due to the M317 high-risk variant may potentially affect DNA movement and viral DNA replication during the lytic cycle.

Furthermore, our findings underscore a substantial additional risk associated with the concurrent presence of the high-risk EBV subtype and susceptible HLA alleles. This association may be related to distinct HLA-mediated T cell immune responses to different EBV subtypes, suggesting a potential for immune evasion by the high-risk EBV subtype from susceptible HLA alleles. Among the HLA alleles associated with NPC in southern Chinese populations, A∗0207 is the most significantly associated risk allele.9,23,24 Thus, we evaluated HLA-A∗0207 binding affinity for the nonamer peptide pairs from high-risk and low-risk EBV subtypes using NetMHCpan-4.1.25 There is no difference in the predicted binding affinity for the high-risk peptide containing BALF2 variant 163364, but some other peptides associated with this variant and NPC risk have a lower binding affinity to A∗0207 in the high-risk strain (Figure S6). Therefore, EBV SNP 163364 may be a marker linked to as-yet-unidentified functional variation in the NPC high-risk strain.

Discussion

Our analyses revealed significant additive interactions on NPC risk between the EBV subtype classified by variant 163364 and host SNPs rs2860580 at HLA-A and rs2894207 at HLA-B/C loci. The evidence was weaker for the EBV-mediated genetic effects associated with the two HLA SNPs. By decomposing the total effects of the host risk alleles into separate and joint effects of interaction and mediation by high-risk EBV subtype, we showed that interaction between host SNPs and EBV accounts for the majority of excess NPC risk conferred by the host genetic variants rs2860580 and rs2894207. Finally, we found that nearly three-quarters of the excess NPC risk attributable to both host HLA SNPs could in theory be eliminated by prevention of the infection with the high-risk EBV subtype through vaccination. Our study thus provides strong evidence of the critical interplay between human genetics and EBV subtype in the etiology of NPC and lays the groundwork for EBV-subtype-specific prevention for reducing the NPC burden in endemic populations.

A strong interaction between host genetic susceptibility and EBV subtypes indicates that the direct effects of the host variants on NPC risk may occur primarily among individuals carrying the high-risk EBV subtype. This risk model is supported by the observation that the ORs associated with rs2860580 and rs2894207 were 1–2 among individuals carrying the low-risk EBV subtype, whereas the ORs associated with the two SNPs were 6–13 among individuals carrying the high-risk EBV subtype. The strong interaction highlights that the susceptible HLA alleles increase the NPC risk associated with high-risk EBV infection, supporting potential immune evasion by the high-risk EBV subtype from susceptible HLA alleles. The association between HLA genes and NPC risk has been confirmed consistently in both candidate-gene studies and several independent GWASs, highlighting HLA-A∗1101 and A∗0207 as the most significantly associated alleles in these investigations.6,9,23,24,26 HLA-A∗1101 is the protective allele, while A∗0207 is the risk allele. The EBV EBNA-3B epitope IVTDFSVIK, restricted to HLA-A∗11, has a high-frequency mutation (IVTDFSVIKN) among southern Chinese populations with a high A∗11 frequency. These mutations are thought to provide selective advantage in the highly A∗11-positive populations.27,28,29 Our findings, revealing an additional risk associated with the co-occurrence of NPC-high-risk EBV and HLA-A∗0207, suggest that the high-risk EBV may carry sequence variations correlated with reduced binding affinity for A∗0207, potentially contributing to an elevated NPC risk among individuals carrying the A∗0207 allele. 
A recent study has also reported the trend of decreased HLA-A∗02 binding affinity with peptides harboring NPC-high-risk mutations.30 Specifically, the LMP-1 YFLEILWRL mutant peptide, which shows association with NPC risk, has been reported to evade recognition by A∗02-restricted epitope (YLLEMLWRL)-specific T cells (Figure S6; Table S4) and to fail to elicit T cell responses in patients with NPC.31,32 Since EBV LMP-1 protein is among the few latent antigens expressed in NPC cells, it is plausible that NPC cells infected with the high-risk EBV subtype possess an enhanced ability to evade HLA-A∗0207-mediated T cell immune surveillance. This scenario could further increase the NPC risk among individuals carrying the susceptible HLA allele and the high-risk EBV subtype. Extensive mapping of T cell epitopes of the high-risk EBV subtype is important for designing EBV vaccines and T cell therapies targeting NPC.

Furthermore, the NPC-derived EBV strain, M81, has been shown to exhibit epitheliotropism and a high level of spontaneous replication in B cells.33 These unique properties of M81 are thought to be related to its epithelial oncogenic potential and are consistent with the observed increase in viral replication preceding NPC onset. Polymorphisms within the NPC-high-risk EBV subtype, particularly in the transactivator protein BZLF1 and its promoter region, the non-coding RNA EBER2, as well as the gene structure of BALF5, have been reported to contribute to these properties.33,34,35,36,37,38 Given the role of BALF2 as the ssDNA binding protein, an essential component of the EBV DNA replication complex, the V317M mutation (variant 163364) in the BALF2 protein of high-risk EBV strains could potentially influence the conformation of the ssDNA interaction surface, thereby altering its function during viral DNA replication. Functional analysis of these high-risk EBV variants will be indispensable to elucidate whether they contribute to the enhanced oncogenicity. Taken together, the distinct viral functional properties between NPC-high-risk and low-risk EBV, coupled with their interplay with HLA genetic factors, suggest that vaccine design aimed at NPC prevention should take into account the genetic variations within the high-risk EBV subtype.

In summary, our findings constitute strong epidemiological evidence for the joint interaction effect between host HLA genes and EBV subtypes on the risk of NPC, thereby providing an illuminating model of the interplay between critical host genetic factors and the virus in NPC carcinogenesis. Notably, the substantial contribution of the interaction with the high-risk EBV subtype to the genetic susceptibility associated with HLA SNPs in NPC suggests that a vaccine targeting high-risk EBV could significantly mitigate NPC risk associated with both viral and host genetic factors within the southern Chinese population. In this context, careful consideration of the genetic diversity specific to the NPC-high-risk EBV subtype is imperative for the development of vaccines aimed at NPC prevention. Moreover, given that both precancerous lesions and early NPC can be treated successfully, routine NPC screening would benefit early disease detection and treatment among individuals carrying the high-risk EBV subtype in endemic populations.

Limitations of the study

First, the weak mediation effect cannot be robustly detected with the current sample size. Using the simulation method proposed by Rudolph et al.,39 we found that under the recessive genetic model, detecting the mediation effect (natural indirect effect [NIE]) of 1.07 for the host SNP rs2860580 through EBV variant 163364 with at least 80% power would require at least 2,390 affected individuals and 3,501 control subjects, more than double the current sample size (1,069 affected individuals and 1,522 control subjects). Under the additive model, the current sample size has the power to detect a modest mediation effect (NIE) of 1.12 for the host SNP rs2894207 but still lacks sufficient power to robustly detect a weak mediation effect (NIE) of 1.05 for rs2860580. For a mediation effect of 1.05 per risk allele of rs2860580 through EBV variant 163364, at least 80% power would require 1,999 affected individuals and 2,846 control subjects. However, the significant interaction effects between the two HLA SNPs and the EBV variant were robust to both the additive and recessive genetic models, indicating a relatively strong HLA-EBV interaction effect on NPC risk. Second, although we have adjusted for major NPC risk factors and several factors related to socioeconomic status, our estimates of interaction and mediation effects might be biased by unadjusted or incompletely adjusted confounders. Our case-control study bases were chosen because of the high incidence of NPC, the relatively stable population, the geographic contiguity, and the comparable level of industrialization. 
The control subjects were randomly sampled (within strata of age and sex) from the same population in each study base using computerized population registries, and the participation rate was high (83%),40 such that the control subjects are a representative sample of the underlying population from which affected individuals were recruited, and they are genetically homogeneous with the affected individuals. Taken together, by using a population-based study design and controlling for confounders, including rural or urban area of residence, current occupation, and environmental exposure, as well as educational level as a proxy for socioeconomic status, the potential confounding related to population stratification, environmental exposure, and spatial dependence could be largely reduced. Moreover, because the more than 6-fold increased NPC risk conferred by the high-risk EBV subtype is far greater than that associated with other known or suspected risk factors, the potential unmeasured confounding bias, if any, is expected to be proportionally small and unlikely to change the direction and significance of the strong interaction effect between host HLA SNPs and EBV. Finally, the current datasets used in this study do not have sufficient power to detect the interaction and mediation effects between EBV and the non-HLA susceptibility SNPs, i.e., the SNPs from the TERT, CDKN2A/2B, TNFRSF19, MECOM, CIITA, and ITGA9 loci. To comprehensively understand the etiology of NPC, a genome-wide gene-EBV interaction analysis as well as the interplay among genes, EBV, and environmental factors merit future study.

STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Biological samples
Human saliva DNA | Biological repository of the NPC Genes, Environment, and EBV (NPCGEE) study (Ye et al.40) | https://doi.org/10.18632/oncotarget.19692

Deposited data
Human genotype data | This paper | National Genomics Data Center (NGDC: GVM000648, https://bigd.big.ac.cn/gvm/getProjectDetail?Project=GVM000648)
EBV genotype data | This paper | National Genomics Data Center (NGDC: GVM000647, https://bigd.big.ac.cn/gvm/getProjectDetail?Project=GVM000647)

Software and algorithms
SAS code: interaction effects | A tutorial on interaction (VanderWeele et al.10); Recommendations for presenting analyses of effect modification and interaction (Knol et al.41) | https://www.hsph.harvard.edu/tyler-vanderweele/tools-and-tutorials/; https://doi.org/10.1093/ije/dyr218
SAS macro: direct and indirect effect | Mediation analysis allowing for exposure–mediator interactions and causal interpretation: Theoretical assumptions and implementation with SAS and SPSS macros (Valeri et al.11) | https://doi.org/10.1037/a0031034 (https://www.hsph.harvard.edu/tyler-vanderweele/tools-and-tutorials/)
SAS code: interaction, mediation, and four-way decomposition analyses | A Unification of Mediation and Interaction: A 4-Way Decomposition (VanderWeele et al.12) | https://doi.org/10.1097/EDE.0000000000000121

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Xihong Lin (xlin@hsph.harvard.edu).

Materials availability

This study did not generate new unique reagents.

Data and code availability

The EBV genotype and human genotype data are available at the National Genomics Data Center (NGDC: GVM000647 and NGDC: GVM000648). This study did not generate any new code. Data web links and code/software used in this paper are also listed in the key resources table.

Experimental model and study participant details

We assessed interaction and mediation effects in a population-based case-control study of NPC based in the Zhaoqing area (including 7 cities/counties) of Guangdong Province and the Wuzhou and Guiping/Pingnan areas (including 6 cities/counties) of Guangxi Zhuang Autonomous Region (Guangxi), between 2010 and 2014. These bases were chosen because of the high incidence of NPC, the relatively stable population, the geographic contiguity, and the comparable industrialized level. This case-control study of NPC followed a population-based and frequency-matched study design. To ensure a high identification rate of NPC cases in the study bases, a rapid case ascertainment system involving a network of local physicians in each study base was built to recruit cases before case enrollment. Control subjects, who were frequency matched to the expected five-year age and sex distribution of the cases, were randomly selected every six months from total population registries in each study base. The overall participation rate was high, 83.8% for cases and 82.7% for controls, respectively. A total of 85.6% of participants enrolled from Zhaoqing area of Guangdong and 85.8% from Wuzhou and Guiping/Pingnan areas of Guangxi were from the rural area. The study design and subject enrollment have been previously described in detail.40

Overall, 1306 and 1248 eligible newly diagnosed NPC cases were recruited in Guangdong and Guangxi, respectively. Through random selection from total population registries, 1356 population-based control subjects in Guangdong and 1292 in Guangxi were identified and enrolled with frequency matching to the cases by sex, 5-year age group, and area of residence. All cases and controls were aged between 20 and 74 and did not have a history of cancer, or congenital or acquired immune deficiency. Each participant completed an in-person, structured interview conducted by a trained interviewer. This study was approved by the institutional ethics committees of all the collaborative institutes. Written informed consent was obtained from each study participant.

Method details

Genotyping of human and EBV variants

We selected two host single nucleotide polymorphisms (SNPs), rs2860580 and rs2894207, and the EBV variant at position 163364, based on published GWASs showing the most consistent statistically significant associations with NPC risk.5,6,7,8 Human and EBV variant genotyping was performed using saliva DNA available from 1710 cases and 2246 controls with the Agena Bioscience MassArray platform, as previously described in detail.42

To assess potential selection bias, the association of missingness of EBV genotyping data with case-control status and a set of covariates was evaluated. Based on the analyses, which suggested limited bias (described in results and Figure S2), participants with missing EBV and host genotype data were excluded. We further excluded the study subjects with missing data on smoking, salted-preserved fish consumption, and educational level. Finally, 572 NPC cases and 696 controls from Guangdong were included in the original study, and a non-overlapping set of 497 NPC cases and 826 controls from Guangxi were included in the replication study (Figure S1). The current original study has been included in the validation phase of the initial study describing the EBV variant 163364 by Xu et al.,5 while the replication dataset of the current study is completely independent of this prior study. Both the original and replication datasets of the current study are completely independent of the initial studies describing the two HLA genetic variants.6,7,8

Prediction of BALF2 structure with AlphaFold2

The structures of BALF2 proteins from the low-risk Akata EBV and the high-risk M81 EBV were predicted using AlphaFold2.21 The conformational prediction was based on the crystal structure of single-stranded DNA-binding protein ICP8 from HSV-1 (PDB code: 1URJ), a homologous protein of BALF2, resolved at 3.0 Å resolution.22 Given ∼25% amino acid sequence similarity between BALF2 and ICP8, we achieved a high confidence in the predicted structure of BALF2. For visualization and in-depth analysis of protein structure, we employed the PyMOL Molecular Graphics System (version 0.99, Schrödinger LLC; http://www.pymol.org/).43 Among the variations characterizing the low-risk Akata and high-risk M81 BALF2 proteins, the high-risk EBV variant 163364 has the most pronounced influence on protein structure.

Prediction of HLA binding affinity with EBV peptides

We predicted HLA-A∗0207 binding affinity for peptide pairs derived from high-risk and low-risk EBV strains. Our analysis focused on the nonamer peptides encompassing the following variants: BALF2 variant 163364, 24 non-synonymous NPC-associated variants that were in moderate to high linkage disequilibrium with 163364 (r² > 0.25), and three non-synonymous variants within two previously reported NPC-associated A∗02 epitopes. Next, we evaluated their binding affinity to HLA-A∗0207 using NetMHCpan-4.1.25 We used a sliding window to consider all unique peptide nonamers containing the NPC-risk-associated amino acid as input for predicting binding affinity. Peptide pairs from high-risk and low-risk EBV strains were retained if either member could be confidently assigned to A∗0207 with an affinity ranking <2%. We identified nine peptide pairs as candidate A∗0207 epitopes, covering 13 variants, including the two reported A∗02 epitopes (Table S4). Comprehensive MHC-I benchmarking suggests that this threshold captures approximately 90% of the epitopes that elicit a T cell response in vivo.44
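The sliding-window step described above can be sketched in a few lines. This is only an illustration: the function names, the toy sequences, and the variant position are hypothetical, and the affinity prediction itself (done with NetMHCpan-4.1 in the study) is not reproduced here.

```python
from typing import List, Tuple


def window_starts(length: int, position: int, k: int = 9) -> range:
    """Start indices of every k-mer that covers the 0-based `position`."""
    if not 0 <= position < length:
        raise ValueError("position lies outside the sequence")
    first = max(0, position - k + 1)
    last = min(position, length - k)
    return range(first, last + 1)


def paired_nonamers(low_risk: str, high_risk: str, position: int,
                    k: int = 9) -> List[Tuple[str, str]]:
    """Unique (low-risk, high-risk) k-mer pairs covering the variant residue."""
    if len(low_risk) != len(high_risk):
        raise ValueError("sequences must be aligned and of equal length")
    pairs: List[Tuple[str, str]] = []
    for start in window_starts(len(low_risk), position, k):
        pair = (low_risk[start:start + k], high_risk[start:start + k])
        if pair not in pairs:  # keep each peptide pair once
            pairs.append(pair)
    return pairs


# Toy 20-residue sequences (hypothetical) differing only at index 10.
low_seq = "ACDEFGHIKLVNPQRSTVWY"
high_seq = "ACDEFGHIKLMNPQRSTVWY"
pairs = paired_nonamers(low_seq, high_seq, position=10)
for low_pep, high_pep in pairs:
    print(low_pep, high_pep)
```

Each pair would then be submitted to the binding predictor, and a pair kept only if at least one member passes the rank threshold stated in the text.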

Quantification and statistical analysis

All statistical analyses were conducted separately in the original study, replication study, and pooled study. Characteristics of NPC cases and controls were compared using the Chi-square test or t-test. Individual or joint odds ratios (ORs) and their 95% confidence intervals (CIs) for the associations between NPC and the two host SNPs, rs2860580 (0 = AA/AG; 1 = GG) and rs2894207 (0 = CC/CT; 1 = TT), the joint status of the two host SNPs (low risk, 0 = AA/AG for rs2860580 or CC/CT for rs2894207; high risk, 1 = GG for rs2860580 and TT for rs2894207), and EBV variant 163364 (0 = C, 1 = CT/T) were evaluated using logistic regression models under the recessive genetic model. Alternatively, the per-risk-allele effects were also evaluated using logistic regression models that employed the additive genetic model for the two HLA SNPs (risk allele = G for rs2860580 and T for rs2894207). In all the logistic models for the interaction and mediation analyses, we adjusted for the same set of covariates, including age at interview (<35, 35–59, or >59 years old), cross-classified sex and smoking status (male never smokers, male ever smokers, and female), education level (<7, 7–9, or ≥10 years), salt-preserved fish consumption in 2000–2002 (yearly or less, or monthly or more), family history among first-degree relatives (yes or no), area of residence (rural or urban), current occupation (unemployed/farmer, blue-collar, white-collar), and selected environmental exposures (none; dust exposures, including exposures to wood, metal, textile, leather, cement, and other types of dust but excluding soil dust; exhaust/smoke exposures, including exposures to diesel, gasoline, coal, firewood, asphalt/tar, natural gas, and other types of exhaust/smoke but excluding dust exposures; and other exposures, including exposures to chemical vapor and acid/alkali but excluding dust and smoke/exhaust exposures). 
Because NPC is a rare disease, with a prevalence of approximately 0.16% in endemic regions of southern China in 2008–2013,45,46 we invoked the rare-outcome assumption and used the OR as an approximation of the relative risk in the interaction and mediation analyses. The OR of NPC (Y) associated with the host genotype (A = a) and the EBV variant (M = m) was defined as $\mathrm{OR}_{am}$; C = c denotes the set of covariates.
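A quick numeric check illustrates why the OR is a close stand-in for the relative risk when the outcome is this rare. The sketch below treats the quoted 0.16% figure as the baseline risk and uses a hypothetical relative risk of 6; both choices are assumptions made only for illustration.

```python
def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Odds ratio comparing two outcome probabilities."""
    return odds(p_exposed) / odds(p_unexposed)


baseline_risk = 0.0016   # about 0.16%, the figure quoted in the text
relative_risk = 6.0      # hypothetical value chosen for illustration
exposed_risk = baseline_risk * relative_risk

or_value = odds_ratio(exposed_risk, baseline_risk)
relative_gap = (or_value - relative_risk) / relative_risk
print(f"RR = {relative_risk:.3f}, OR = {or_value:.3f}, gap = {relative_gap:.2%}")
```

With these inputs the OR overstates the relative risk by less than 1%, which is the sense in which the approximation is used here.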

We estimated the additive interaction effect between the two host genetic variants and the EBV variant on NPC risk as the relative excess risk due to interaction (RERI). We first fit the following logistic regression model for NPC:10

$$\operatorname{logit}\{P(Y=1 \mid a,m,c)\} = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 am + \theta_4 c \qquad \text{(Model 1)}$$

Under the rare outcome assumption, we have10

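$$\mathrm{RERI} = \mathrm{OR}_{11} - \mathrm{OR}_{10} - \mathrm{OR}_{01} + 1 = \exp(\theta_1+\theta_2+\theta_3) - \exp(\theta_1) - \exp(\theta_2) + 1$$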

The analyses of interaction effects were conducted using a published SAS code.10,41
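As a worked illustration of the RERI definition, the following sketch computes it from the three exposure-related coefficients of Model 1. The coefficient values are hypothetical, and the study itself used published SAS code rather than this function.

```python
import math


def reri_from_logit(theta1: float, theta2: float, theta3: float) -> float:
    """RERI = OR11 - OR10 - OR01 + 1 from Model 1 coefficients."""
    or10 = math.exp(theta1)                    # host risk genotype only
    or01 = math.exp(theta2)                    # high-risk EBV only
    or11 = math.exp(theta1 + theta2 + theta3)  # both exposures
    return or11 - or10 - or01 + 1.0


# Hypothetical coefficients: OR10 = 1.5, OR01 = 4.0, product-term OR = 1.5.
theta1 = math.log(1.5)
theta2 = math.log(4.0)
theta3 = math.log(1.5)
reri = reri_from_logit(theta1, theta2, theta3)
print(f"RERI = {reri:.2f}")
```

A RERI above 0 indicates that the joint effect exceeds the sum of the separate effects on the additive scale; a value of 0 corresponds to exact additivity.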

Causal mediation analysis was used to investigate whether EBV variant 163364 mediates the effect of the two host SNPs on the risk of NPC (Y). The analysis decomposes the total effect into a direct and an indirect effect, and these effects were evaluated on the OR scale in a case-control study design setting.11,47 In addition to the logistic regression for the outcome (Model 1), a second logistic regression model for the mediator (EBV subtypes, Model 2) was fitted among controls only:11,13

$$\operatorname{logit}\{P(M=1 \mid a,c)\} = \beta_0 + \beta_1 a + \beta_2 c \qquad \text{(Model 2)}$$

Under the rare outcome assumption, we have

$$\mathrm{OR}^{\mathrm{NDE}} \approx \frac{\exp(\theta_1 a)\{1+\exp(\theta_2+\theta_3 a+\beta_0+\beta_1 a^{*}+\beta_2 c)\}}{\exp(\theta_1 a^{*})\{1+\exp(\theta_2+\theta_3 a^{*}+\beta_0+\beta_1 a^{*}+\beta_2 c)\}},$$
$$\mathrm{OR}^{\mathrm{NIE}} \approx \frac{\{1+\exp(\beta_0+\beta_1 a^{*}+\beta_2 c)\}\{1+\exp(\theta_2+\theta_3 a+\beta_0+\beta_1 a+\beta_2 c)\}}{\{1+\exp(\beta_0+\beta_1 a+\beta_2 c)\}\{1+\exp(\theta_2+\theta_3 a+\beta_0+\beta_1 a^{*}+\beta_2 c)\}}$$

where a = 1 and a* = 0 denote the host high-risk and low-risk genotype, respectively.11 The natural direct effect (NDE) can be interpreted as the effect, on the odds ratio scale, of the host SNPs (exposure variable) on NPC (outcome variable) that is not mediated by EBV variant 163364, whereas the natural indirect effect (NIE) can be interpreted as the causal effect, on the odds ratio scale, of the host SNPs on NPC that is mediated by the high-risk EBV subtype. The analyses of mediation effects were conducted using a published SAS macro with the setting “casecontrol = true”.11
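The closed-form expressions of Valeri and VanderWeele for a binary outcome and binary mediator can be evaluated directly once Model 1 and Model 2 have been fitted. The sketch below is illustrative only: it compares exposure level `a` with a reference level `a_star`, folds the covariate term into a single number `beta2c`, and uses hypothetical coefficient values rather than estimates from the study.

```python
import math


def natural_effects(theta1, theta2, theta3, beta0, beta1,
                    beta2c=0.0, a=1.0, a_star=0.0):
    """Return (OR_NDE, OR_NIE) under the rare-outcome approximation."""
    def mediator_lp(x):
        # Linear predictor of the mediator model (Model 2) at exposure x.
        return beta0 + beta1 * x + beta2c

    nde = (math.exp(theta1 * a)
           * (1 + math.exp(theta2 + theta3 * a + mediator_lp(a_star)))
           ) / (math.exp(theta1 * a_star)
                * (1 + math.exp(theta2 + theta3 * a_star + mediator_lp(a_star))))
    nie = ((1 + math.exp(mediator_lp(a_star)))
           * (1 + math.exp(theta2 + theta3 * a + mediator_lp(a)))
           ) / ((1 + math.exp(mediator_lp(a)))
                * (1 + math.exp(theta2 + theta3 * a + mediator_lp(a_star))))
    return nde, nie


# Hypothetical coefficients, for illustration only.
th1, th2, th3 = math.log(1.5), math.log(4.0), math.log(1.5)
b0, b1 = math.log(0.3 / 0.7), 0.4
or_nde, or_nie = natural_effects(th1, th2, th3, b0, b1)
print(f"OR_NDE = {or_nde:.3f}, OR_NIE = {or_nie:.3f}, "
      f"OR_TE = {or_nde * or_nie:.3f}")
```

The product of the two quantities gives the total effect on the odds ratio scale, and the NIE reduces to 1 when the exposure has no effect on the mediator (`beta1 = 0`).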

In order to concurrently investigate the potential interaction and mediation together, we further conducted a joint causal mediation and interaction analysis by decomposing the total effect (TE) of each host genetic variant on NPC risk into the following four components12: (1) controlled direct effect (CDE), due to high-risk host genotype in the absence of high-risk EBV genotype; (2) reference interaction (INTref), only interaction effect between high-risk host genotype and high-risk EBV genotype; (3) mediated interaction (INTmed), both interaction and mediation effect with high-risk host genotype acting only via high-risk EBV genotype selected by high-risk host genotype; and (4) pure indirect effect (PIE), only mediation effect operating exclusively through high-risk EBV subtype, but suppressing the interaction. The total effect is a sum of the four components:12

TE=CDE+INTref+INTmed+PIE

More detailed calculation formulas for the four components are provided in Section 2.3 of the eAppendix of VanderWeele.12 For this analysis, the two logistic regression models, Model 1 and Model 2, were used, allowing interaction between the host SNPs and the EBV variant in Model 1.

The sum of reference interaction and mediated interaction is the total risk attributable to interaction, and the sum of pure indirect effect and mediated interaction is the total risk attributable to mediation. The sum of reference interaction, mediated interaction, and pure indirect effect is the total risk due to interaction and mediation, which can be potentially eliminated by intervention in high-risk EBV. The analyses of the four-way decomposition of total effect were conducted using the published SAS code.12
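To make the four-way bookkeeping concrete, the sketch below evaluates the standard ratio-scale decomposition for a binary exposure and a binary mediator, with the mediator reference level set to the low-risk EBV subtype. It is a simplified illustration and not the published SAS code: covariates are ignored, ORs are treated as relative risks, and all input values are hypothetical.

```python
def four_way(rr10, rr01, rr11, p_m_a0, p_m_a1):
    """Four-way decomposition of the excess relative risk of the exposure.

    rr10, rr01, rr11: risks relative to the doubly unexposed group.
    p_m_a0, p_m_a1: P(M = 1) among unexposed and exposed individuals.
    """
    # Scaling factor that re-expresses risks relative to the unexposed group.
    kappa = 1.0 / (1.0 + (rr01 - 1.0) * p_m_a0)
    reri = rr11 - rr10 - rr01 + 1.0
    parts = {
        "CDE": kappa * (rr10 - 1.0),
        "INTref": kappa * reri * p_m_a0,
        "INTmed": kappa * reri * (p_m_a1 - p_m_a0),
        "PIE": kappa * (rr01 - 1.0) * (p_m_a1 - p_m_a0),
    }
    parts["TE"] = sum(parts.values())
    return parts


# Hypothetical inputs, for illustration only.
parts = four_way(rr10=1.5, rr01=4.0, rr11=9.0, p_m_a0=0.3, p_m_a1=0.4)
interaction = parts["INTref"] + parts["INTmed"]
mediation = parts["PIE"] + parts["INTmed"]
eliminable = parts["INTref"] + parts["INTmed"] + parts["PIE"]
for name, value in parts.items():
    print(f"{name}: {value:.3f} ({value / parts['TE']:.1%} of TE)")
print(f"share eliminable by removing the mediator: {eliminable / parts['TE']:.1%}")
```

The three derived sums mirror the quantities defined in the paragraph above: risk attributable to interaction, risk attributable to mediation, and the portion that an intervention on the mediator could remove.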

For details of the sample size estimation, see “Limitations of the study.” All statistical tests were two-sided. The 95% CIs and p values in the interaction, mediation, and four-way decomposition analyses were calculated using the delta method.10,11,12,48 Analyses were implemented with SAS 9.4.
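For the RERI, a delta-method interval follows from the gradient of the RERI with respect to the three Model 1 coefficients. The sketch below shows that calculation only; the coefficient values and covariance matrix are hypothetical, and the published SAS code used in the study may differ in detail.

```python
import math


def reri_delta_ci(theta, cov, z=1.959964):
    """RERI with a Wald-type CI from the delta method.

    theta: (theta1, theta2, theta3) from Model 1.
    cov: 3x3 covariance matrix of those three estimates.
    """
    t1, t2, t3 = theta
    or11 = math.exp(t1 + t2 + t3)
    or10 = math.exp(t1)
    or01 = math.exp(t2)
    reri = or11 - or10 - or01 + 1.0
    # Partial derivatives of RERI with respect to theta1, theta2, theta3.
    grad = (or11 - or10, or11 - or01, or11)
    variance = sum(grad[i] * cov[i][j] * grad[j]
                   for i in range(3) for j in range(3))
    se = math.sqrt(variance)
    return reri, reri - z * se, reri + z * se


# Hypothetical estimates and covariance matrix, for illustration only.
theta_hat = (math.log(1.5), math.log(4.0), math.log(1.5))
cov_hat = [[0.04, 0.01, -0.03],
           [0.01, 0.02, -0.01],
           [-0.03, -0.01, 0.05]]
estimate, lower, upper = reri_delta_ci(theta_hat, cov_hat)
print(f"RERI = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

In practice the covariance matrix would come from the fitted logistic regression, restricted to the rows and columns for the two main effects and their product term.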

Acknowledgments

We sincerely thank Dr. Yi Zeng, a virologist who passed away in 2020, for his contributions as a founder member of NPC research and in this case-control study of NPC. We thank the members of the External Advisory Board of the NPC Genes, Environment, and Epstein-Barr Virus Study, including Dr. Curtis Harris (US National Cancer Institute), Dr. Allan Hildesheim (US National Cancer Institute), Dr. Wei-Cheng You (Peking University), and Dr. You-Lin Qiao (Chinese Academy of Medical Sciences), for their guidance of the overall case-control study. This work was supported by the National Key Research and Development Program of China (grant number 2022YFC2305400 to M.X.), the National Natural Science Foundation of China (grant number 82122050 to M.X.), the National Cancer Institute (grant numbers R35 CA197449 and P01 CA134294 to X.L.), the National Institutes of Health (grant number R01CA11587301 to H.-O.A. and Y.-X.Z.), the Swedish Research Council, and the National Natural Science Foundation of China (grant number 8171101281 to W.Y. and W.-H.J.).

Author contributions

X.L., Y.-X.Z., and M.X. conceived and designed the study; H.-O.A., Y.-X.Z., W.Y., and E.T.C. supervised the design and implementation of the population-based case-control study; M.X., X.Z., and Y.Y. performed sample preparation, quality control, and genotyping; M.X., R.F., Zhonghua Liu, X.Z., L.V., Z. Li, Y. Chen, and J.S. conducted statistical analyses; Y. Cao conducted protein structural analysis; S.-M.C., Q.L., W.-H.J., S.-H.X., Zhiwei Liu, Y.-L. Cai, Y.Z., Z.Z., I.E., M.T., and G.H. participated in the case-control study design, subject recruitment, and sample collection; and the manuscript was drafted by M.X., Zhonghua Liu, and R.F. under the supervision of X.L. and was edited by L.V., E.T.C., H.-O.A., W.Y., and X.L.

Declaration of interests

The authors declare no competing interests.

Published: January 9, 2024

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.xgen.2023.100474.

Contributor Information

Weimin Ye, Email: ywm@fjmu.edu.cn.

Hans-Olov Adami, Email: hadami@hsph.harvard.edu.

Yi-Xin Zeng, Email: zengyx@sysucc.org.cn.

Xihong Lin, Email: xlin@hsph.harvard.edu.

Supplemental information

Document S1. Tables S1–S4 and Figures S1–S6
Document S2. Transparent peer review records for Miao Xu et al
Document S3. Article plus supplemental information

References

  • 1.Chang E.T., Ye W., Zeng Y.-X., Adami H.-O. The Evolving Epidemiology of Nasopharyngeal Carcinoma. Cancer Epidemiol. Biomarkers Prev. 2021;30:1035–1047. doi: 10.1158/1055-9965.EPI-20-1702. [DOI] [PubMed] [Google Scholar]
  • 2.Chien Y.C., Chen J.Y., Liu M.Y., Yang H.I., Hsu M.M., Chen C.J., Yang C.S. Serologic markers of Epstein-Barr virus infection and nasopharyngeal carcinoma in Taiwanese men. N. Engl. J. Med. 2001;345:1877–1882. doi: 10.1056/NEJMoa011610. [DOI] [PubMed] [Google Scholar]
  • 3.Liu Y., Huang Q., Liu W., Liu Q., Jia W., Chang E., Chen F., Liu Z., Guo X., Mo H., et al. Establishment of VCA and EBNA1 IgA-based combination by enzyme-linked immunosorbent assay as preferred screening method for nasopharyngeal carcinoma: a two-stage design with a preliminary performance study and a mass screening in southern China. Int. J. Cancer. 2012;131:406–416. doi: 10.1002/ijc.26380. [DOI] [PubMed] [Google Scholar]
  • 4.Chan K.C.A., Woo J.K.S., King A., Zee B.C.Y., Lam W.K.J., Chan S.L., Chu S.W.I., Mak C., Tse I.O.L., Leung S.Y.M., et al. Analysis of Plasma Epstein-Barr Virus DNA to Screen for Nasopharyngeal Cancer. N. Engl. J. Med. 2017;377:513–522. doi: 10.1056/NEJMoa1701717. [DOI] [PubMed] [Google Scholar]
  • 5.Xu M., Yao Y., Chen H., Zhang S., Cao S.-M., Zhang Z., Luo B., Liu Z., Li Z., Xiang T., et al. Genome sequencing analysis identifies Epstein–Barr virus subtypes associated with high risk of nasopharyngeal carcinoma. Nat. Genet. 2019;51:1131–1136. doi: 10.1038/s41588-019-0436-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Bei J.-X., Li Y., Jia W.-H., Feng B.-J., Zhou G., Chen L.-Z., Feng Q.-S., Low H.-Q., Zhang H., He F., et al. A genome-wide association study of nasopharyngeal carcinoma identifies three new susceptibility loci. Nat. Genet. 2010;42:599–603. doi: 10.1038/ng.601. [DOI] [PubMed] [Google Scholar]
  • 7.Bei J.X., Su W.H., Ng C.C., Yu K., Chin Y.M., Lou P.J., Hsu W.L., McKay J.D., Chen C.J., Chang Y.S., et al. A GWAS Meta-analysis and Replication Study Identifies a Novel Locus within CLPTM1L/TERT Associated with Nasopharyngeal Carcinoma in Individuals of Chinese Ancestry. Cancer Epidemiol. Biomarkers Prev. 2016;25:188–192. doi: 10.1158/1055-9965.EPI-15-0144. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Cui Q., Feng Q.-S., Mo H.-Y., Sun J., Xia Y.-F., Zhang H., Foo J.N., Guo Y.-M., Chen L.-Z., Li M., et al. An extended genome-wide association study identifies novel susceptibility loci for nasopharyngeal carcinoma. Hum. Mol. Genet. 2016;25:3626–3634. doi: 10.1093/hmg/ddw200. [DOI] [PubMed] [Google Scholar]
  • 9.Hildesheim A., Apple R.J., Chen C.-J., Wang S.S., Cheng Y.-J., Klitz W., Mack S.J., Chen I.-H., Hsu M.-M., Yang C.-S., et al. Association of HLA Class I and II Alleles and Extended Haplotypes With Nasopharyngeal Carcinoma in Taiwan. J. Natl. Cancer Inst. 2002;94:1780–1789. doi: 10.1093/jnci/94.23.1780. [DOI] [PubMed] [Google Scholar]
  • 10.VanderWeele T.J., Knol M.J. A tutorial on interaction. Epidemiol. Methods. 2014;3:33–72. [Google Scholar]
  • 11.Valeri L., VanderWeele T.J. Mediation analysis allowing for exposure–mediator interactions and causal interpretation: Theoretical assumptions and implementation with SAS and SPSS macros. Psychol. Methods. 2013;18:137–150. doi: 10.1037/a0031034. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.VanderWeele T.J. A Unification of Mediation and Interaction: A 4-Way Decomposition. Epidemiology. 2014;25:749–761. doi: 10.1097/EDE.0000000000000121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.VanderWeele T.J. Oxford University Press; 2015. Explanation in Causal Inference: Methods for Mediation and Interaction. [Google Scholar]
  • 14.National Academies of Sciences, E., and Medicine . The National Academies Press; 2019. Reproducibility and Replicability in Science. [PubMed] [Google Scholar]
  • 15.Kieff E.D., Rickinson A.B. In: Fields’ virology. Knipe D.M., Howley P.M., editors. Lippincott Williams & Wilkins, Wolters Kluwer); 2007. Epstein-Barr Virus and Its Replication; pp. 2603–2654. [Google Scholar]
  • 16.Borza C.M., Hutt-Fletcher L.M. Alternate replication in B cells and epithelial cells switches tropism of Epstein-Barr virus. Nat. Med. 2002;8:594–599. doi: 10.1038/nm0602-594. [DOI] [PubMed] [Google Scholar]
  • 17.Frangou P., Buettner M., Niedobitek G. Epstein-Barr virus (EBV) infection in epithelial cells in vivo: rare detection of EBV replication in tongue mucosa but not in salivary glands. J. Infect. Dis. 2005;191:238–242. doi: 10.1086/426823. [DOI] [PubMed] [Google Scholar]
  • 18.Hadinoto V., Shapiro M., Sun C.C., Thorley-Lawson D.A. The Dynamics of EBV Shedding Implicate a Central Role for Epithelial Cells in Amplifying Viral Output. PLoS Pathog. 2009;5 doi: 10.1371/journal.ppat.1000496. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Xu F.-H., Xiong D., Xu Y.-F., Cao S.-M., Xue W.-Q., Qin H.-D., Liu W.-S., Cao J.-Y., Zhang Y., Feng Q.-S., et al. An Epidemiological and Molecular Study of the Relationship Between Smoking, Risk of Nasopharyngeal Carcinoma, and Epstein–Barr Virus Activation. J. Natl. Cancer Inst. 2012;104:1396–1410. doi: 10.1093/jnci/djs320. [DOI] [PubMed] [Google Scholar]
  • 20.He Y.-Q., Liao X.-Y., Xue W.-Q., Xu Y.-F., Xu F.-H., Li F.-F., Li X.-Z., Zhang J.-B., Wang T.-M., Wang F., et al. Association Between Environmental Factors and Oral Epstein-Barr Virus DNA Loads: A Multicenter Cross-sectional Study in China. J. Infect. Dis. 2019;219:400–409. doi: 10.1093/infdis/jiy542. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A., et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
  • 22. Mapelli M., Panjikar S., Tucker P.A. The crystal structure of the herpes simplex virus 1 ssDNA-binding protein suggests the structural basis for flexible, cooperative single-stranded DNA binding. J. Biol. Chem. 2005;280:2990–2997. doi: 10.1074/jbc.M406780200.
  • 23. Hildesheim A., Wang C.P. Genetic predisposition factors and nasopharyngeal carcinoma risk: a review of epidemiological association studies, 2000-2011: Rosetta Stone for NPC: genetics, viral infection, and other environmental factors. Semin. Cancer Biol. 2012;22:107–116. doi: 10.1016/j.semcancer.2012.01.007.
  • 24. Tang M., Lautenberger J.A., Gao X., Sezgin E., Hendrickson S.L., Troyer J.L., David V.A., Guan L., McIntosh C.E., Guo X., et al. The principal genetic determinants for nasopharyngeal carcinoma in China involve the HLA class I antigen recognition groove. PLoS Genet. 2012;8 doi: 10.1371/journal.pgen.1003103.
  • 25. Reynisson B., Alvarez B., Paul S., Peters B., Nielsen M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. 2020;48:W449–W454. doi: 10.1093/nar/gkaa379.
  • 26. Tse K.P., Su W.H., Chang K.P., Tsang N.M., Yu C.J., Tang P., See L.C., Hsueh C., Yang M.L., Hao S.P., et al. Genome-wide association study reveals multiple nasopharyngeal carcinoma-associated loci within the HLA region at chromosome 6p21.3. Am. J. Hum. Genet. 2009;85:194–203. doi: 10.1016/j.ajhg.2009.07.007.
  • 27. Midgley R.S., Bell A.I., McGeoch D.J., Rickinson A.B. Latent gene sequencing reveals familial relationships among Chinese Epstein-Barr virus strains and evidence for positive selection of A11 epitope changes. J. Virol. 2003;77:11517–11530. doi: 10.1128/JVI.77.21.11517-11530.2003.
  • 28. de Campos-Lima P.-O., Gavioli R., Zhang Q.-J., Wallace L.E., Dolcetti R., Rowe M., Rickinson A.B., Masucci M.G. HLA-A11 epitope loss isolates of Epstein-Barr virus from a highly A11+ population. Science. 1993;260:98–100. doi: 10.1126/science.7682013.
  • 29. Midgley R.S., Bell A.I., Yao Q.-Y., Croom-Carter D., Hislop A.D., Whitney B.M., Chan A.T.C., Johnson P.J., Rickinson A.B. HLA-A11-restricted epitope polymorphism among Epstein-Barr virus strains in the highly HLA-A11-positive Chinese population: incidence and immunogenicity of variant epitope sequences. J. Virol. 2003;77:11507–11516. doi: 10.1128/JVI.77.21.11507-11516.2003.
  • 30. Deng C.M., Wang T.M., He Y.Q., Zhang W.L., Xue W.Q., Li D.H., Yang D.W., Wang Q.L., Liao Y., Diao H., et al. Peptidome-wide association analysis of Epstein-Barr virus identifies epitope repertoires associated with nasopharyngeal carcinoma. J. Med. Virol. 2023;95 doi: 10.1002/jmv.28860.
  • 31. Lin J.C., Cherng J.M., Lin H.J., Tsang C.W., Liu Y.X., Lee S.P. Amino acid changes in functional domains of latent membrane protein 1 of Epstein-Barr virus in nasopharyngeal carcinoma of southern China and Taiwan: prevalence of an HLA A2-restricted 'epitope-loss variant'. J. Gen. Virol. 2004;85:2023–2034. doi: 10.1099/vir.0.19696-0.
  • 32. Duraiswamy J., Burrows J.M., Bharadwaj M., Burrows S.R., Cooper L., Pimtanothai N., Khanna R. Ex vivo analysis of T-cell responses to Epstein-Barr virus-encoded oncogene latent membrane protein 1 reveals highly conserved epitope sequences in virus isolates from diverse geographic regions. J. Virol. 2003;77:7401–7410. doi: 10.1128/JVI.77.13.7401-7410.2003.
  • 33. Tsai M.H., Raykova A., Klinke O., Bernhardt K., Gärtner K., Leung C.S., Geletneky K., Sertel S., Münz C., Feederle R., Delecluse H.J. Spontaneous lytic replication and epitheliotropism define an Epstein-Barr virus strain found in carcinomas. Cell Rep. 2013;5:458–470. doi: 10.1016/j.celrep.2013.09.012.
  • 34. Bristol J.A., Djavadian R., Albright E.R., Coleman C.B., Ohashi M., Hayes M., Romero-Masters J.C., Barlow E.A., Farrell P.J., Rochford R., et al. A cancer-associated Epstein-Barr virus BZLF1 promoter variant enhances lytic infection. PLoS Pathog. 2018;14 doi: 10.1371/journal.ppat.1007179.
  • 35. Li Z., Baccianti F., Delecluse S., Tsai M.-H., Shumilov A., Cheng X., Ma S., Hoffmann I., Poirey R., Delecluse H.-J. The Epstein–Barr virus noncoding RNA EBER2 transactivates the UCHL1 deubiquitinase to accelerate cell growth. Proc. Natl. Acad. Sci. USA. 2021;118 doi: 10.1073/pnas.2115508118.
  • 36. Li Z., Tsai M.-H., Shumilov A., Baccianti F., Tsao S.W., Poirey R., Delecluse H.-J. Epstein–Barr virus ncRNA from a nasopharyngeal carcinoma induces an inflammatory response that promotes virus production. Nat. Microbiol. 2019;4:2475–2486. doi: 10.1038/s41564-019-0546-y.
  • 37. Wang Y., Ungerleider N., Hoffman B.A., Kara M., Farrell P.J., Flemington E.K., Lee N., Tibbetts S.A. A Polymorphism in the Epstein-Barr Virus EBER2 Noncoding RNA Drives In Vivo Expansion of Latently Infected B Cells. mBio. 2022;13 doi: 10.1128/mbio.00836-22.
  • 38. Church T.M., Verma D., Thompson J., Swaminathan S. Efficient Translation of Epstein-Barr Virus (EBV) DNA Polymerase Contributes to the Enhanced Lytic Replication Phenotype of M81 EBV. J. Virol. 2018;92 doi: 10.1128/JVI.01794-17.
  • 39. Rudolph K.E., Goin D.E., Stuart E.A. The Peril of Power: A Tutorial on Using Simulation to Better Understand When and How We Can Estimate Mediating Effects. Am. J. Epidemiol. 2020;189:1559–1567. doi: 10.1093/aje/kwaa083.
  • 40. Ye W., Chang E.T., Liu Z., Liu Q., Cai Y., Zhang Z., Chen G., Huang Q.-H., Xie S.-H., Cao S.-M., et al. Development of a population-based cancer case-control study in southern China. Oncotarget. 2017;8:87073–87085. doi: 10.18632/oncotarget.19692.
  • 41. Knol M.J., VanderWeele T.J. Recommendations for presenting analyses of effect modification and interaction. Int. J. Epidemiol. 2012;41:514–520. doi: 10.1093/ije/dyr218.
  • 42. Zhou X., Cao S.-M., Cai Y.-L., Zhang X., Zhang S., Feng G.-F., Chen Y., Feng Q.-S., Chen Y., Chang E.T., et al. A comprehensive risk score for effective risk stratification and screening of nasopharyngeal carcinoma. Nat. Commun. 2021;12:5189. doi: 10.1038/s41467-021-25402-z.
  • 43. Schrodinger L.L.C. 2015. The PyMOL Molecular Graphics System.
  • 44. Paul S., Croft N.P., Purcell A.W., Tscharke D.C., Sette A., Nielsen M., Peters B. Benchmarking predictions of MHC class I restricted T cell epitopes in a comprehensively studied model system. PLoS Comput. Biol. 2020;16 doi: 10.1371/journal.pcbi.1007757.
  • 45. Ferlay J., Ervik M., Lam F., Colombet M., Mery L., Piñeros M., Znaor A., Soerjomataram I., Bray F. International Agency for Research on Cancer; 2018. Global Cancer Observatory: Cancer Today. https://gco.iarc.fr/today
  • 46. Ji M.F., Sheng W., Cheng W.M., Ng M.H., Wu B.H., Yu X., Wei K.R., Li F.G., Lian S.F., Wang P.P., et al. Incidence and mortality of nasopharyngeal carcinoma: interim analysis of a cluster randomized controlled screening trial (PRO-NPC-001) in southern China. Ann. Oncol. 2019;30:1630–1637. doi: 10.1093/annonc/mdz231.
  • 47. VanderWeele T.J., Vansteelandt S. Odds Ratios for Mediation Analysis for a Dichotomous Outcome. Am. J. Epidemiol. 2010;172:1339–1348. doi: 10.1093/aje/kwq332.
  • 48. Hosmer D.W., Lemeshow S. Confidence interval estimation of interaction. Epidemiology. 1992;3:452–456. doi: 10.1097/00001648-199209000-00012.

Associated Data


Supplementary Materials

Document S1. Tables S1–S4 and Figures S1–S6 (mmc1.pdf, 1 MB)
Document S2. Transparent peer review records for Miao Xu et al. (mmc2.pdf, 1.9 MB)
Document S3. Article plus supplemental information (mmc3.pdf, 5 MB)

Data Availability Statement

The EBV genotype and human genotype data are available at the National Genomics Data Center (NGDC: GVM000647 and NGDC: GVM000648). This study did not generate any new code. Data web links and code/software used in this paper are also listed in the key resources table.


Articles from Cell Genomics are provided here courtesy of Elsevier
