eBioMedicine. 2024 Sep 11;108:105321. doi: 10.1016/j.ebiom.2024.105321

Establishment and validation of circulating cell-free DNA signatures for nasopharyngeal carcinoma detection

Su-Fang Qiu a,b,f, Qing-Zheng Zhang c,f, Zi-Yi Wu a,f, Ming-Zhu Liu a, Qin Ding a,b, Fu-Ming Sun c, Yin Wang c,d, Han-Xuan Yang a,b, Lu Zheng c, Xin Chen a, Lin Wu c, Jian Bai c,d,∗∗∗, Jing-Feng Liu a,b,∗∗, Chuan-Ben Chen a,e,
PMCID: PMC11416236  PMID: 39265506

Summary

Background

Early detection of nasopharyngeal carcinoma (NPC) poses a significant challenge. The absence of highly sensitive and specific diagnostic biomarkers for nasopharyngeal carcinoma contributes to the unfavourable prognosis of NPC patients. Here, we aimed to establish a non-invasive approach for detecting NPC using circulating cell-free DNA (cfDNA).

Methods

We investigated the potential of next-generation sequencing (NGS) of peripheral blood cells as a diagnostic tool for NPC. We collected data on genome-wide nucleosome footprint (NF), 5′-end motifs, fragmentation patterns, CNV information, and EBV content from 553 Chinese subjects, including 234 NPC patients and 319 healthy individuals. Through case–control analysis, we developed a diagnostic model for NPC, and validated its detection capability.

Findings

Our findings revealed that the frequencies of NF, fragmentation, and motifs were significantly higher in NPC patients compared to healthy controls. We developed an NPC score based on these parameters that accurately distinguished NPC cases, staged according to the American Joint Committee on Cancer staging system, from non-NPC cases (validation set: area under the curve (AUC) = 99.9% (95% CI: 99.8%–100%), sensitivity: 98.15%, specificity: 100%). This model showed superior performance over plasma EBV DNA. Additionally, the NPC score effectively differentiated between NPC patients and healthy controls, even after clinical treatment. Furthermore, the NPC score was found to be independent of potential confounders such as age, sex, or TNM stage.

Interpretation

We have developed and verified a non-invasive approach with substantial potential for clinical application in detecting NPC.

Funding

A full list of funding bodies that contributed to this study can be found in Funding section.

Keywords: Biomarkers, Nasopharyngeal carcinoma, cfDNA, Cancer detection


Research in context.

Evidence before this study

Nasopharyngeal carcinoma (NPC) is endemic in East and Southeast Asia. Numerous biomarkers for NPC diagnosis have been identified in retrospective case–control studies, yet none of them were based on analysis of circulating DNA.

Added value of this study

Through case–control analyses, we demonstrated the effectiveness of an integrated diagnostic model based on genome-wide nucleosome footprint (NF), fragmentation, CNV information, and the Epstein–Barr virus (EBV) content of cfDNA data. This model accurately distinguishes patients with NPC from healthy controls.

Implications of all the available evidence

The genomic features of cfDNA have been shown to play a crucial role in NPC detection. This study provides a promising method to detect NPC, potentially improving early diagnosis and patient outcomes.

Introduction

Nasopharyngeal carcinoma (NPC) is a common cancer originating in the nasopharynx, with a notably high prevalence in East and Southeast Asia. The etiology of NPC is strongly correlated with environmental factors, dietary habits, genetic predisposition, and Epstein–Barr virus (EBV) infection. In 2018, approximately 60,558 new NPC cases were diagnosed in China, representing 47.7% of all global cases.1 Approximately 75% of patients are initially diagnosed with advanced NPC, which is associated with a higher risk of recurrence, metastasis, and poor prognosis.2 Conversely, less than 10% of patients with NPC exhibit stage I disease, which is associated with a better prognosis and a high chance of successful treatment, as evidenced by a 5-year overall survival rate of 90% or more.3,4 Although pathological evidence remains the preferred diagnostic method for NPC according to established guidelines, its invasiveness and high cost limit its application in asymptomatic individuals at risk of NPC. This limitation results in a low rate of early detection.5

Efforts have been made to enhance the effectiveness of NPC screening strategies through the utilisation of multiple biomarkers, comprehensive screening strategies, and the discovery of new diagnostic biomarkers.6 Given that most cases of NPC are caused by EBV infection, measuring EBV DNA in pretreatment plasma is valuable for NPC screening and patient management.7 Various methods for EBV detection are employed in clinical decision-making, facilitating diagnosis, optimization of treatment strategies, and disease monitoring.8 Quantitative real-time PCR (qPCR) is the most used method for measuring EBV DNA. However, this method requires reference materials for standardisation, and its quantitative precision can be affected by factors such as the DNA isolation protocol, amplification bias, or cross-contamination. The relatively low accuracy of current diagnostic methods underscores the urgent need to explore non-invasive strategies for detecting NPC.9

Recently, the use of circulating cell-free DNA (cfDNA) has garnered significant attention as a non-invasive method for NPC detection, since it has been found to be more abundant in NPC patients compared to healthy individuals.10 Understanding the correlation between cfDNA levels and tumour characteristics is crucial for comprehending the onset, progression, and dissemination of NPC.11, 12, 13

The use of liquid biopsy via cfDNA analysis is effective for non-invasive cancer detection.14 However, the efficacy of cfDNA as a diagnostic biomarker for NPC has not been fully elucidated. In this case–control study, we enrolled 234 patients diagnosed with NPC and 319 healthy controls from Fujian province, China. We employed next-generation sequencing (NGS) to acquire copy number variation (CNV),15 fragmentation,16 nucleosome footprint (NF),17 5′-end motif,18 and EBV fragmentation19 profiles of cfDNA from enrolled individuals. Using logistic regression, we developed a weighted diagnostic model that integrates four of these features (NF, fragmentation, 5′-end motif, and EBV content). Our findings indicated that the cfDNA diagnostic model effectively distinguished between NPC patients and healthy controls.

Methods

Study population

In accordance with institutional guidelines, all participants provided written informed consent for this study, which was approved by Fujian Cancer Hospital's Ethics Committee (K2022-112-01). Peripheral blood samples were collected between June 2022 and August 2023 from 553 study participants at the Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, China. The following inclusion criteria were used: 1) NPC diagnosis was histologically confirmed, 2) NPC patients had not received any anti-tumour treatment such as chemotherapy, radiotherapy, or immunotherapy before blood collection, and 3) a sufficient amount of high-quality blood sample was available for next-generation sequencing. A total of 234 blood samples were collected before the initiation of treatment from patients with biopsy-confirmed NPC, staged according to the 8th edition of the American Joint Committee on Cancer (AJCC) staging system. Additionally, 319 venous blood samples were obtained from healthy individuals attending the clinic for routine physical examinations, who were matched by age and sex with the NPC patients. Notably, none of the individuals in this control cohort was diagnosed with NPC during the follow-up period leading up to the preparation of this manuscript (the median follow-up time of healthy controls was 7.9 [6.3–12] months). Sex was self-reported by the study participants. To assess the feasibility of disease monitoring, 24 high-quality blood samples were procured one month after definitive treatment. Subsequently, plasma EBV DNA levels of NPC patients were measured at the Department of Clinical Laboratory of Fujian Cancer Hospital.

Sample processing and cfDNA extraction

All venous blood samples were collected in accordance with established standard operating procedures. Specifically, cell-free DNA BCT tubes (Streck, USA) were used to collect 5–10 mL of venous blood. Plasma was isolated by centrifugation at 800×g for 10 min at 4 °C, and only non-haemolytic samples were considered for further processing. Cellular debris was removed by centrifugation at 18,000×g. cfDNA was extracted using the MagMAX Cell-Free DNA Isolation Kit (Thermo Fisher Scientific), following the manufacturer's protocol. The purified cfDNA was then quantified using a Qubit® 4.0 Fluorometer (Life Technologies, USA), and the DNA fragment size distribution was analysed using a Fragment Analyzer (Agilent, USA).

Low-pass whole genome sequencing (WGS) and data processing

Libraries for sequencing were prepared with 5 ng of DNA. The DNA samples underwent end-repair and dA-tailing using a 5X ER/A-Tailing Enzyme Mix, followed by adaptor ligation with WGS Ligase. Adaptor sequences were tailored for compatibility with the Illumina CN500 platform. Post-ligation, the libraries were purified using Agencourt AMPure XP beads (Beckman Coulter, USA). The purified libraries were quantified with the KAPA Library Quantification Kit (Kapa Biosystems, USA), and their size distribution was verified using a Bioanalyzer (Agilent, USA). The sequencing libraries were pooled in equimolar concentrations. Whole genome sequencing (WGS) was performed on the Illumina CN500 platform with paired-end reads of 36 bp, at an average coverage of 1.2x.

FASTQ files were processed with fastp (https://github.com/OpenGene/fastp) to remove adaptors and filter out reads with low average sequencing quality. The filtered reads were then aligned to the hg19 reference genome using the BWA-MEM algorithm to determine the genomic location of each DNA fragment. PCR-induced duplicate reads were mitigated using Sambamba. Additionally, fragments with poor alignment quality, unaligned fragments, and read pairs that were not perfectly matched were removed using Samtools. The filtered DNA fragments were organised by alignment position to facilitate subsequent analyses. Samples with a mapping rate exceeding 95%, a duplicate rate below 10%, and coverage above 50% were considered to pass quality control. All samples met these quality benchmarks.
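The quality-control gate described above can be expressed as a small check. The following is a minimal sketch, not the authors' pipeline: the `SampleQC` container and `passes_qc` helper are hypothetical names, all three metrics are assumed to be stored as fractions between 0 and 1, and "coverage" is assumed to mean the fraction of the genome covered by at least one read.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Per-sample QC metrics, stored as fractions in [0, 1] (assumed layout)."""
    mapping_rate: float    # fraction of reads aligned to the reference
    duplicate_rate: float  # fraction of reads flagged as PCR duplicates
    coverage: float        # fraction of the genome covered (assumed meaning)


def passes_qc(qc: SampleQC) -> bool:
    """Apply the three thresholds stated in the text."""
    return (
        qc.mapping_rate > 0.95
        and qc.duplicate_rate < 0.10
        and qc.coverage > 0.50
    )


if __name__ == "__main__":
    print(passes_qc(SampleQC(mapping_rate=0.98, duplicate_rate=0.05, coverage=0.62)))
```

A sample that misses any single threshold is rejected, which matches the "and" reading of the sentence above.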

WGS-based biomarker identification and integrated model construction

The WGS can identify specific biomarkers associated with NPC by analyzing cell-free DNA (cfDNA). An examination of NF, 5′-end sequence motif, DNA fragmentation characteristics, EBV presence, and copy number variations is included in this analysis. Consequently, we developed a diagnostic model that integrates and weighs the predictive power of four specific features (NF, 5′-end motifs, fragmentation, and EBV content). The efficacy of the final model was subsequently validated using a distinct test dataset. The detailed selection process is shown in Fig. 1.

Fig. 1.

Study design. This figure shows the workflow of the study.

Nucleosome footprint (NF)

The original RNA transcripts of coding genes were examined. The promoter and non-promoter regions within these transcripts were delineated, and the number of sequence reads overlapping these regions was counted using featureCounts. The disparity in read counts between the promoter and non-promoter regions was used to estimate the density of nucleosome distribution. Genes were excluded based on the following criteria: 1) an NF score of zero in more than 10% of all samples, 2) a P-value at or above 0.01 in the Wilcoxon rank-sum test comparing the NPC and healthy control (NC) groups, and 3) a weight of zero assigned by the LASSO algorithm. Ultimately, 432 genes passed these filters and were carried forward for further analysis.

Fragmentations

The complete genome, excluding the Y chromosome, was segmented into 1-Mb bins, yielding a total of 3055 segments. Utilizing the Pysam software, the lengths of the inserted DNA fragments were measured, and the ratio of shorter fragments (between 90 and 150 base pairs) to longer ones (between 151 and 220 base pairs) was evaluated across these segments. Subsequently, the Least Absolute Shrinkage and Selection Operator (LASSO) method was applied to filter out regions with a weight of zero, resulting in the retention of 329 regions for further analysis.
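The per-bin short-to-long fragment ratio can be sketched in plain Python. This is an illustrative sketch only: it assumes fragments have already been extracted (for example with pysam) into `(chromosome, start, length)` tuples, the function name `short_long_ratio` is hypothetical, and returning NaN for bins without long fragments is a choice made here, not something the text specifies.

```python
from collections import defaultdict

BIN_SIZE = 1_000_000          # 1-Mb bins, as in the text
SHORT_RANGE = (90, 150)       # short fragments, inclusive bounds in bp
LONG_RANGE = (151, 220)       # long fragments, inclusive bounds in bp


def short_long_ratio(fragments, bin_size=BIN_SIZE):
    """Return {(chrom, bin_index): short/long ratio} for autosomes and chrX.

    fragments: iterable of (chrom, start, length) tuples (assumed input format).
    """
    counts = defaultdict(lambda: [0, 0])  # [n_short, n_long] per bin
    for chrom, start, length in fragments:
        if chrom in ("chrY", "Y"):        # chromosome Y is excluded
            continue
        key = (chrom, start // bin_size)
        if SHORT_RANGE[0] <= length <= SHORT_RANGE[1]:
            counts[key][0] += 1
        elif LONG_RANGE[0] <= length <= LONG_RANGE[1]:
            counts[key][1] += 1
    return {
        key: (n_short / n_long if n_long else float("nan"))
        for key, (n_short, n_long) in counts.items()
    }


if __name__ == "__main__":
    demo = [("chr1", 100, 120), ("chr1", 200, 140), ("chr1", 300, 170)]
    print(short_long_ratio(demo))
```

Fragments outside both length windows are ignored, so a bin appears in the result only if it holds at least one short or long fragment.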

5′-end motifs

DNA fragment termination positions were determined by aligning each fragment with the reference genome. A total of 256 unique types of 4-mer 5′-end motifs were identified, and their frequencies were determined with pysam, with the exclusion of chromosome Y and ambiguous bases. Motif types were filtered based on the following criteria: 1) a P-value of 0.05 or higher in the Wilcoxon rank-sum test between the NPC and NC groups, and 2) a LASSO-assigned weight of zero. Ultimately, 167 types of motifs were selected for continued analysis.
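Counting 4-mer 5′-end motifs amounts to tallying the first four bases of each fragment and normalising by the total. The sketch below assumes the 5′-end sequences have already been read from the reference in 5′→3′ orientation (reverse-strand fragments already reverse-complemented); `motif_frequencies` is a hypothetical helper name, and motifs containing ambiguous bases are skipped as the text describes.

```python
from collections import Counter
from itertools import product

# All 4^4 = 256 possible 4-mer motifs over A, C, G, T.
ALL_MOTIFS = ["".join(p) for p in product("ACGT", repeat=4)]


def motif_frequencies(fragment_5p_ends):
    """Return {motif: frequency} over all 256 motifs.

    fragment_5p_ends: iterable of strings holding the 5' end of each
    fragment (assumed to be pre-oriented 5'->3').
    """
    counts = Counter()
    for seq in fragment_5p_ends:
        motif = seq[:4].upper()
        # Skip truncated ends and motifs with ambiguous bases such as N.
        if len(motif) == 4 and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    return {m: (counts[m] / total if total else 0.0) for m in ALL_MOTIFS}


if __name__ == "__main__":
    freqs = motif_frequencies(["CCCAGT", "CCCATT", "ACGTAA", "NNNNAC"])
    print(freqs["CCCA"], freqs["ACGT"])
```

Returning all 256 motifs, including those with zero counts, keeps the feature vector the same length for every sample.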

EBV content (EBV)

After extracting the reads that did not match the human genome, these reads were mapped to the wild-type EBV genome (GenBank: AJ507799.2). The number of reads that perfectly matched the EBV genome was counted, and the proportion of EBV reads was calculated.

Copy number variation (CNV)

After segmenting the human genome into 20 kb regions, the average sequencing depth of each region was determined, and GC content correction was applied. A baseline threshold was established for each region from the mean and variance of copy number in the healthy population. Regions that showed significant deviations from the baseline threshold were identified as potential copy number variations. CNVs with lengths greater than 2 Mb were obtained by connecting these regions to adjacent windows. The CNV score was calculated using an equation reported previously, which likely quantifies the magnitude or impact of the CNV, considering factors such as the size of the variation, the number of affected segments, and the degree of deviation from the norm.20
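The baseline comparison and the merging of adjacent windows can be illustrated as follows. This is a rough sketch under stated assumptions: deviation is measured as a z-score against the healthy-control mean and standard deviation, the threshold of 3 is an illustrative value not given in the text, and the function names are hypothetical. The published CNV score equation (reference 20) is not reproduced here.

```python
from statistics import mean, stdev


def bin_z_scores(sample_depths, control_depths_per_bin):
    """Z-score of each bin's depth against the healthy-control baseline."""
    z_scores = []
    for depth, controls in zip(sample_depths, control_depths_per_bin):
        mu, sigma = mean(controls), stdev(controls)
        z_scores.append((depth - mu) / sigma if sigma > 0 else 0.0)
    return z_scores


def flag_bins(z_scores, threshold=3.0):
    """Mark bins whose deviation exceeds the (assumed) threshold."""
    return [abs(z) > threshold for z in z_scores]


def long_segments(flags, bin_size=20_000, min_length=2_000_000):
    """Merge consecutive flagged bins; keep runs longer than min_length.

    Returns (start_bin, end_bin) pairs with end_bin exclusive.
    """
    segments, start = [], None
    for i, flagged in enumerate(list(flags) + [False]):  # sentinel closes last run
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if (i - start) * bin_size > min_length:
                segments.append((start, i))
            start = None
    return segments


if __name__ == "__main__":
    controls = [[0.9, 1.0, 1.1]] * 3
    z = bin_z_scores([1.0, 1.5, 0.4], controls)
    print(flag_bins(z))
```

With 20 kb windows, a run must span more than 100 consecutive flagged bins to exceed 2 Mb.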

NPC score model construction

Utilizing the markers selected by LASSO from the trio of genomic signatures, the support vector machine (SVM) method was used for model construction as previously described.21 The model's parameters were honed through a 10-fold cross-validation process on the training dataset, identifying the cutoff threshold that corresponded to the peak diagnostic accuracy. To achieve an optimal diagnostic model, logistic regression was employed to integrate the outcomes of four individual models by utilizing their predictive scores as input features. This integration was conducted based on both training and validation datasets (Fig. 1).

LASSO was implemented as follows. By constructing a penalty function, a more refined model is obtained that shrinks some coefficients and sets others to zero. Variables with large parameter estimates are shrunk only slightly, whereas variables with small parameter estimates are compressed to zero, which achieves feature dimensionality reduction. This procedure was implemented using the LassoCV() function of the ‘sklearn’ package in Python. The LASSO inputs for NF, Fragment, and Motif included only samples of the training set. The NF input is the value of each gene in each sample after retaining genes with a P-value less than 0.01; the Fragment input is the ratio of each region (number of short fragments/number of long fragments) in each sample after retaining regions with a P-value less than 0.01; and the Motif input is the proportion of each motif in each sample after retaining motifs with a P-value less than 0.05. The output is the set of features whose “lasso.coef_” is not 0, that is, the result after dimensionality reduction.
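The selection step can be illustrated with scikit-learn's `LassoCV`. The sketch below runs on synthetic data in which only the first two features carry signal; the matrix sizes, the random seed, and the 10-fold setting are assumptions chosen for illustration, and none of the study's actual inputs or hyperparameters are reproduced.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic stand-in for a training-set feature matrix (samples x features).
rng = np.random.default_rng(0)
n_samples, n_features = 120, 30
X = rng.normal(size=(n_samples, n_features))
# Labels depend only on features 0 and 1; the remaining columns are noise.
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Cross-validated choice of the L1 penalty strength (cv=10 is an assumption).
lasso = LassoCV(cv=10, random_state=0).fit(X, y)

# Features with a non-zero coefficient survive dimensionality reduction.
selected = np.flatnonzero(lasso.coef_ != 0)

if __name__ == "__main__":
    print("chosen alpha:", lasso.alpha_)
    print("selected feature indices:", selected)
```

Only the columns listed in `selected` would be passed on to the downstream classifier.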

The Support Vector Machine (SVM) method was employed for constructing models based on individual genomic features, utilizing three parameters: (1) the penalty coefficient (C), (2) the kernel function, and (3) the gamma parameter. The input comprised sample data from the training set, and feature selection for each dimension was performed using Least Absolute Shrinkage and Selection Operator (LASSO) for dimensionality reduction. The GridSearchCV() function from the ‘sklearn’ Python package was employed to ascertain the optimal combination of three parameters within the training set, utilizing the 10-fold cross-validation method. In this approach, the training set samples were partitioned into 10 equal subsets, with 9 subsets used for parameter training and fitting, while the remaining subset was utilized for performance validation. Each dimension was trained separately to determine the optimal combination of parameters. The identified optimal parameters were directly applied to the independent validation set samples. Finally, the probability of each sample predicted to be an NPC in each of the three dimensions was used as the output.
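A grid search over C, kernel, and gamma with 10-fold cross-validation might look like the sketch below. It uses synthetic two-class data, and the grid values are illustrative assumptions, since the text does not list the values that were searched. `probability=True` is set so that the fitted model can output a per-sample probability, as described for the three single-feature models.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic training data: 60 "control" and 60 "NPC" samples, 5 features each.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(0.0, 1.0, size=(60, 5)),
    rng.normal(1.5, 1.0, size=(60, 5)),
])
y_train = np.array([0] * 60 + [1] * 60)

# Candidate values are assumptions for illustration only.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", 0.01, 0.1],
}

search = GridSearchCV(
    SVC(probability=True, random_state=0),
    param_grid,
    cv=10,  # 10-fold cross-validation on the training set
)
search.fit(X_train, y_train)

# Probability of the positive class for each sample (column 1).
npc_probability = search.predict_proba(X_train)[:, 1]

if __name__ == "__main__":
    print(search.best_params_, round(search.best_score_, 3))
```

In the workflow described above, the best parameters found on the training set would then be applied unchanged to the validation samples.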

The process of calculating feature weights by logistic regression used training set samples and consisted of the following main steps: (1) establishing a linear combination of the probability values output for each dimension of each sample, with the categorical labels as targets, (2) using the sigmoid function to convert the linear combination into a probability value, (3) using maximum likelihood estimation to determine the weight of each dimension, (4) using gradient descent to update the feature weights, and (5) iterating the optimization until convergence. The 10-fold cross-validation process was similar to that of the SVM method described above. This step completed the integration of all dimensions.
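The five steps above correspond to ordinary logistic regression fitted by gradient descent on the log-likelihood. The following is a minimal, dependency-free sketch: the learning rate and the fixed number of iterations (used in place of a convergence test) are assumptions, and the toy inputs stand in for the per-dimension probabilities of each sample.

```python
import math


def sigmoid(z):
    """Map a linear combination to a probability (step 2)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Fit intercept b and weights w by batch gradient descent (steps 3-5).

    X: list of feature rows (here, per-dimension probabilities in [0, 1]).
    y: list of 0/1 labels.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the negative log-likelihood for one sample.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict(b, w, x):
    """Combined score for one sample (step 1 followed by step 2)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


if __name__ == "__main__":
    X = [[0.9, 0.8, 0.95], [0.8, 0.9, 0.7], [0.2, 0.1, 0.3], [0.1, 0.3, 0.2]]
    y = [1, 1, 0, 0]
    b, w = fit_logistic(X, y)
    print(b, w)
```

The fitted intercept and weights play the same role as the constants in the score formulas reported in the Results.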

Statistical analysis

The sample size of the training cohort was determined to give a power of 80% at a two-sided type I error rate of 0.05, which required at least 175 participants per group (actual enrolment: 180 NPC + 245 NC; actual power: 86.19%, calculated with “pwr.t2n.test()” in the R package ‘pwr’).

The Wilcoxon rank-sum test was used to compare two groups of continuous variables, and Fisher's exact test was used to compare categorical variables. P values were calculated using Python software (version 2.7.14), and P < 0.05 was considered statistically significant. Receiver operating characteristic (ROC) curves were generated to evaluate the performance of the prediction algorithm using the pROC library in R. Sensitivity and specificity were assessed at the cutoff score that maximized the sum of sensitivity and specificity, also using the pROC library in R.
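Choosing the cutoff that maximises the sum of sensitivity and specificity can be sketched without any library. This is a simplified stand-in written in Python, not the pROC implementation: it tests each observed score as a candidate cutoff and calls a sample positive when its score is at or above the cutoff. The function name `best_cutoff` and the toy data are hypothetical.

```python
def best_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising their sum.

    scores: predicted scores; labels: 1 for NPC, 0 for control.
    A sample is called positive when score >= cutoff (assumed convention).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cutoff)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        # Keep the first cutoff reaching the highest sensitivity + specificity.
        if best is None or sens + spec > best[1] + best[2]:
            best = (cutoff, sens, spec)
    return best


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8, 0.9]
    labels = [0, 0, 0, 1, 0, 1, 1]
    print(best_cutoff(scores, labels))
```

On ties, the lowest qualifying cutoff is kept, which favours sensitivity; other tie-breaking rules are equally defensible.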

Role of funding source

The funders had no role in the study design, data collection, experimental workflow, data analysis and interpretation, or manuscript writing.

Results

Patient characteristics

We first obtained 234 and 319 blood samples from NPC patients and healthy controls, respectively (Fig. 1, also see Methods section for details on inclusion criteria). All participants were randomly assigned to two cohorts: a training cohort comprising 180 NPC patients and 245 healthy controls (N = 425), and a validation cohort comprising 54 NPC patients and 74 healthy controls (N = 128). Demographic and clinical information is summarised in Table 1. There were no statistically significant differences in age distribution or sex between the two groups (P > 0.05, chi-squared test). The distribution of all patients across the AJCC stages (8th edition) was as follows: 6 (2.56%) in stage I, 32 (13.68%) in stage II, 97 (41.45%) in stage III, and 96 (41.03%) in stage IV; the stage was not available for the remaining 3 (1.28%) patients.

Table 1.

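Participant characteristics in the training and validation cohorts.

| Characteristic | Training cohort: NPC, No. (%) | Training cohort: Control, No. (%) | Training cohort: P | Validation cohort: NPC, No. (%) | Validation cohort: Control, No. (%) | Validation cohort: P |
| --- | --- | --- | --- | --- | --- | --- |
| Age | | | 0.85 | | | 0.66 |
| <40 | 34 (18.89%) | 47 (19.18%) | | 11 (20.37%) | 16 (21.62%) | |
| 40–60 | 112 (62.22%) | 157 (64.08%) | | 30 (55.56%) | 45 (60.81%) | |
| >60 | 34 (18.89%) | 41 (16.73%) | | 13 (24.07%) | 13 (17.57%) | |
| Gender | | | 0.95 | | | 0.84 |
| Female | 55 (30.56%) | 73 (29.80%) | | 15 (27.78%) | 23 (31.08%) | |
| Male | 125 (69.44%) | 172 (70.20%) | | 39 (72.22%) | 51 (68.92%) | |
| T stage | | | | | | |
| T1 | 35 (19.44%) | | | 16 (29.63%) | | |
| T2 | 28 (15.56%) | | | 4 (7.41%) | | |
| T3 | 80 (44.44%) | | | 18 (33.33%) | | |
| T4 | 36 (20.00%) | | | 14 (25.93%) | | |
| N stage | | | | | | |
| N0 | 20 (11.11%) | | | 6 (11.11%) | | |
| N1 | 73 (40.56%) | | | 23 (42.59%) | | |
| N2 | 43 (23.89%) | | | 14 (25.93%) | | |
| N3 | 43 (23.89%) | | | 9 (16.67%) | | |
| M stage | | | | | | |
| M0 | 176 (97.78%) | | | 48 (88.89%) | | |
| M1 | 3 (1.67%) | | | 3 (5.56%) | | |
| Clinical stage | | | | | | |
| I | 3 (1.67%) | | | 3 (5.56%) | | |
| II | 24 (13.33%) | | | 8 (14.81%) | | |
| III | 78 (43.33%) | | | 19 (35.19%) | | |
| IV | 74 (41.11%) | | | 22 (40.74%) | | |
| NA | 1 (0.56%) | | | 2 (3.70%) | | |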

The P values were determined using the two-tailed χ2 test.

Identification of nasopharyngeal carcinoma cfDNA genomic features

Next, we performed WGS to identify the genomic features of NPC cfDNA. Cluster analysis showed that the read coverage of differential genes had the power to distinguish NPC from healthy controls (Fig. 2a). Fragmentomics constitutes a key molecular characteristic of cfDNA in the plasma. Indeed, we further revealed that NPC-derived cfDNA was more variable in length, and shorter (median size <150 bp), than cfDNA from healthy controls (Fig. 2b). This highlights the potential for differentiating between NPC and non-NPC samples based on cfDNA fragment length. In addition, compared to the size distribution of human autosomal DNA, the EBV DNA fragments in the plasma of patients were found to be shorter, with a peak at 150 bp (Fig. 2c). A total of 167 end-motif patterns were also identified as potential classification parameters for distinguishing NPC from healthy controls (Fig. 2d). Additionally, CNV loads were considerably higher in patients with advanced-stage disease (Fig. 2e). The details of the genomic features are listed in Supplementary Table S1. Collectively, all four features showed diagnostic potential for NPC.

Fig. 2.

a. Heatmap analysis of z-scores of promoters with differential read coverage. b. Distribution of frequencies of different autosomal fragment sizes between control and NPC. c. Distribution of frequencies of autosomal fragment size and EBV fragment size in NPC patients. d. Top 10 different motif frequencies between control and NPC samples. e. Frequencies of CNV in different TNM stages.

Development and validation of an NPC prediction model

We found that the predicted score calculated by NF, fragmentation, or motifs separately in patients were significantly higher than those in healthy controls in both training and validation sets (Fig. 3a–f, Supplementary Table S2). Thus, we generated a risk score using a formula that included the above three genomic features weighted by their regression coefficients in the model as follows: Score = exp(Z)/(1 + exp(Z)), where Z = −2.93+ (1.88 ∗ NF) + (3.19 ∗ Fragmentation) + (1.79 ∗ Motif). The results obtained using this model demonstrated that the score calculated using these genomic features could explain the difference between patients with NPC and healthy controls (Fig. 3g and h). An optimal cutoff value (0.43, AUC = 1.00) was selected via receiver operating characteristic (ROC) curve analysis, which resulted in a strong diagnostic value in differentiating NPC from healthy controls, yielding a sensitivity of 99.44% and a specificity of 96.33% for NPC in the training set (180 NPC and 245 CTRL), and a sensitivity of 96.30% and a specificity of 98.65% in the validation set (54 NPC and 74 CTRL). The differentiation power of the combined method for NPC vs control (training: AUC = 1.00 [0.999–1.00], validation: AUC = 0.999 [0.998–1.00]) was superior to NF (training: AUC = 0.984 [0.968–1.00], validation: AUC = 0.971 [0.946–0.995]), motif (training: AUC = 0.953 [0.933–0.974], validation: AUC = 0.968 [0.941–0.995]), and fragmentation (training: AUC = 0.991 [0.985–0.998], validation: AUC = 0.981 [0.960–1.00]) (Fig. 3i and j).
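The three-feature score can be computed directly from the formula above. The sketch below assumes that NF, Fragmentation, and Motif are the per-feature predicted probabilities (values between 0 and 1) produced by the single-feature models; the function name `npc_score_three_feature` and the example inputs are hypothetical.

```python
import math

CUTOFF = 0.43  # optimal cutoff reported for the three-feature score


def npc_score_three_feature(nf, fragmentation, motif):
    """Score = exp(Z) / (1 + exp(Z)) with the coefficients given in the text."""
    z = -2.93 + 1.88 * nf + 3.19 * fragmentation + 1.79 * motif
    return math.exp(z) / (1.0 + math.exp(z))


def is_positive(score, cutoff=CUTOFF):
    """Call a sample NPC-positive when its score reaches the cutoff."""
    return score >= cutoff


if __name__ == "__main__":
    low = npc_score_three_feature(0.0, 0.0, 0.0)   # all feature models say "control"
    high = npc_score_three_feature(1.0, 1.0, 1.0)  # all feature models say "NPC"
    print(round(low, 3), round(high, 3))
```

With all three inputs at 0 the score is about 0.05, and with all three at 1 it is about 0.98, so the reported cutoff of 0.43 sits between these extremes.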

Fig. 3.

Box and Whisker plot of predicted score calculated by NF, fragmentation, or motif separately in training set (a–c), validation set (d–f) and combined those three features in training or validation (g, h). ROC curves of three genomic signatures and combined method for NPC patients vs control in the training set (i) and validation set (j).

Because NPC patients had significantly higher amounts of plasma EBV DNA (Fig. 4a and b, Supplementary Table S2), we also investigated whether the diagnosis performance increases upon including the EBV fragments calculated as: NPC predicted score = exp(Z)/(1 + exp(Z)), where Z = −2.89 + (1.77 ∗ NF) + (2.35 ∗ Fragmentation) + (1.30 ∗ Motif) + (1.55 ∗ EBV). As anticipated, there was a positive correlation between NPC score levels and clinical stage. Consistent with the progression of clinical stages I-IV, the median NPC score exhibited an upward trend in TNM stage (Fig. 4c and d). Additionally, the AUC of this combined method was 0.999 (95% CI = 0.998–1.00), resulting in a sensitivity of 98.33% and specificity of 99.18% for NPC in the training set. In the validation set, the sensitivity was 98.15% and the specificity was 100%, using a cut-off value of 0.48 (Fig. 4e and f, Supplementary Table S3). These findings suggest that the presence of cell-free EBV nucleotides in plasma improves the accuracy of NPC detection.

Fig. 4.

Box and Whisker plot of predicted score calculated by EBV in training set (a) and validation set (b). Box and Whisker plot of NPC score in training set (c) and validation set (d). e. ROC curves of EBV and NPC score for NPC patients vs control in the training set. f. ROC curves of NPC score and EBV for NPC patients vs control in the validation set.

Detection performance of the NPC score

In the validation set, no healthy controls with NPC scores equal to or exceeding 0.48 were observed. Using this threshold, 53 of 54 patients (98.15%) tested positive. Overall, the NPC score progressively increased from CTRL to NPC (Fig. 5a). In the subgroup analysis, the proportion of positive calls with the NPC score surpassed that of clinical EBV DNA detection (with a detection limit of 500 copies/ml) across various age groups, genders, and TNM stages (Fig. 5b–d). Epstein–Barr virus (EBV) DNA copy number is commonly used for NPC detection, although no universally accepted standard exists. We compared the NPC score with three different thresholds for detecting EBV DNA (0, 500, or 1000 copies/ml), and observed that the NPC score exhibited positive detection rates ranging from 97.92% to 100% in patients across stages I to IV, whereas the EBV DNA detection rates ranged from 16.67% to 89.58% (Fig. 5e). The diagnostic performance of the NPC score is summarised in Supplementary Table S4.

Fig. 5.

a. NPC scores of all enrolled patients. The cutoff value of the NPC score was 0.48. Upper: information of NPC patients. b–d. Proportions of positive and negative calling by NPC score and EBV DNA load in NPC patients of different ages, genders and stages in the validation set. e. Proportions of positive and negative calling by NPC score and EBV DNA load with different detection limits (0, 500 or 1000 copies/ml) in the training + validation set.

Monitoring of NPC disease status by NPC score

According to previous reports, plasma EBV DNA copy number serves as a valuable biomarker for the assessment of disease progression in NPC patients after treatment. To demonstrate the applicability of the NPC score for disease monitoring, 24 plasma samples collected from patients with NPC after treatment were analysed and the NPC score was computed. We found that the plasma EBV DNA copy number significantly decreased at the end of treatment (Fig. 6a). Furthermore, our study demonstrated a significant decrease in the NPC score after radiochemotherapy (Fig. 6b). These findings imply that the NPC score holds promise for monitoring NPC disease status.

Fig. 6.

a. Line chart displaying the change in EBV DNA load measured by qPCR in each individual. b. Line chart displaying the change in auto-NPC score.

Discussion

Most patients with early-stage NPC who are eligible for radiotherapy with curative intent (approximately 95% of cases) achieve a 5-year survival rate of 90% or higher.3 However, most patients are diagnosed with advanced, incurable, metastatic disease. Therefore, it is urgent to address the low positive predictive value of early NPC screening. To this end, minimally invasive biomarker assays are needed to enable earlier detection and monitoring of NPC. The existence of non-invasive detection biomarkers for NPC is substantiated by a multitude of epidemiological findings, biological evidence, and advancements in high-performance technologies.22 Throughout tumorigenesis and progression, various cancer cells acquire many biological characteristics, including DNA amplification, methylation, and somatic mutations, which are often detected in genetically altered DNA.23,24 These alterations may be exploited for disease detection.

Plasma EBV DNA holds great potential for the diagnosis of NPC and is positively correlated with the tumour stage of NPC.25,26 A large-scale prospective screening study showed that plasma EBV DNA can be used for early asymptomatic NPC detection with a sensitivity of 97.1% and specificity of 98.6%.27 A retrospective case–control study reported that anti–BNLF2b total antibody can distinguish patients with NPC with a sensitivity of 94.4% and specificity of 99.6%.28 Additionally, the application of novel technologies in cell-free DNA analysis can specifically identify early NPC.29,30 According to studies of cfDNA detected in the blood of patients with NPC, genomic DNA from tumour cells predominates over EBV virion DNA.31 Here, we devised a non-invasive method using plasma samples to demonstrate the capacity of cfDNA for NPC detection.

We conducted a case–control study and adopted an NGS-based strategy for NPC detection. From the NGS data, we calculated an NF score, a fragmentation score, a 5′-end motif score, and an EBV score for each sample, each of which measures deviation from healthy individuals. These results corroborate previous findings indicating the potential of these unique genomic characteristics as biomarkers for NPC detection.32,33 They also highlight the successful detection of specific tumour-derived nucleic acid fragments. LASSO and the support vector machine (SVM) method were used to screen the markers in the training set for model construction. We thus developed an integrated method for the diagnosis of NPC. The results show that the NPC score efficiently identifies individuals with NPC. Although we enrolled only six patients with stage I NPC, the NPC score was positive in all of them. We demonstrated that the NPC score, which includes whole genome and viral information, is a highly accurate, non-invasive detection strategy for NPC.

In summary, the NPC score developed here showed high accuracy for NPC detection, highlighting its potential as a new strategy for NPC diagnosis and status monitoring. The NPC score fills an urgent need to identify NPC cases that qualify for curative treatment. However, our study had several limitations as well. First, the patients in our cohort were all from the Fujian Province of China. Therefore, the capability of the NPC score to diagnose NPC in other geographical areas requires further investigation. Second, the number of patients with early-stage NPC was rather small since most patients had no clinical symptoms. Hence, our conclusion on NPC score performance requires further confirmation in a large screening cohort.

Contributors

Su-Fang Qiu, Qing-Zheng Zhang and Zi-Yi Wu contributed equally. Su-Fang Qiu: Supervision, Methodology, Resources, Funding acquisition, Writing–original draft. Qing-Zheng Zhang: Methodology, Investigation, Formal analysis; Zi-Yi Wu: Conceptualization, Methodology, Investigation, Writing–original draft; Ming-Zhu Liu, Qin Ding, Han-Xuan Yang & Xin Chen: Sample collection; Fu-Ming Sun: Data Curation; Yin Wang: Validation; Lu Zheng: Investigation; Lin Wu: Project administration; Jian Bai: Supervision, Methodology, Conceptualization; Jing-Feng Liu: Supervision, Methodology, Conceptualization, Funding acquisition; Chuan-Ben Chen: Supervision, Methodology, Resources, Funding acquisition, Writing—review. All authors read and approved the final manuscript.

Data sharing statement

The data that support the findings of this study have been deposited in the CNGB Sequence Archive (CNSA) of the China National GeneBank DataBase (CNGBdb) under accession number CNP0004987.

Declaration of interests

We declare no competing interests.

Acknowledgments

This work was supported by grants from the Science and Technology Program of Fujian Province, China (2018Y2003, 2021Y9230); the Fujian Provincial Clinical Research Center for Cancer Radiotherapy and Immunotherapy (2020Y2012); the National Clinical Key Specialty Construction Program (2021); the Fujian Clinical Research Center for Radiation and Therapy of Digestive, Respiratory and Genitourinary Malignancies; the National Natural Science Foundation of China (82072986); the Major Scientific Research Program for Young and Middle-aged Health Professionals of Fujian Province, China (Grant No. 2021ZQNZD010); the Joint Funds for the Innovation of Science and Technology, Fujian Province (2021Y9196); the High-level Talent Training Program of Fujian Cancer Hospital (2022YNG07); and the Natural Science Foundation of Fujian Province (2023J01121763, 2023J05235). We are grateful to the patients and healthy controls involved in this study.

Footnotes

Appendix A

Supplementary data related to this article can be found at https://doi.org/10.1016/j.ebiom.2024.105321.

Contributor Information

Jian Bai, Email: baijian488@berryoncology.com.

Jing-Feng Liu, Email: drjingfeng@fjmu.edu.cn.

Chuan-Ben Chen, Email: ccb@fjmu.edu.cn.

Appendix A. Supplementary data

Supplementary Table S1

The details of top genomic signatures from CNV, nucleosome footprinting, fragmentation and 5′-end motifs.

mmc1.xlsx (88.3KB, xlsx)

Supplementary Table S2

Summary score for all 533 plasma samples.

mmc2.xlsx (34.2KB, xlsx)

Supplementary Table S3

Diagnostic power of different combinations of genomic features.

mmc3.xlsx (10.2KB, xlsx)

Supplementary Table S4

The diagnostic accuracy of the NPC score.

mmc4.xlsx (12.3KB, xlsx)

References

  • 1. Bray F., Ferlay J., Soerjomataram I., Siegel R.L., Torre L.A., Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424. doi: 10.3322/caac.21492.
  • 2. Tang S.Q., Mao Y.P., Xu C., et al. The evolution of the nasopharyngeal carcinoma staging system over a 10-year period: implications for future revisions. Chin Med J. 2020;133(17):2044–2053. doi: 10.1097/CM9.0000000000000978.
  • 3. Ji M.F., Sheng W., Cheng W.M., et al. Incidence and mortality of nasopharyngeal carcinoma: interim analysis of a cluster randomized controlled screening trial (PRO-NPC-001) in southern China. Ann Oncol. 2019;30(10):1630–1637. doi: 10.1093/annonc/mdz231.
  • 4. Liu Y.P., Lv X., Zou X., et al. Minimally invasive surgery alone compared with intensity-modulated radiotherapy for primary stage I nasopharyngeal carcinoma. Cancer Commun. 2019;39(1):75. doi: 10.1186/s40880-019-0415-3.
  • 5. Huang S.H., O'Sullivan B. Overview of the 8th edition TNM classification for head and neck cancer. Curr Treat Options Oncol. 2017;18(7):40. doi: 10.1007/s11864-017-0484-y.
  • 6. Miller J.A., Le Q.T., Pinsky B.A., Wang H. Cost-effectiveness of nasopharyngeal carcinoma screening with Epstein-Barr virus polymerase chain reaction or serology in high-incidence populations worldwide. J Natl Cancer Inst. 2021;113(7):852–862. doi: 10.1093/jnci/djaa198.
  • 7. Chen Y.P., Chan A.T.C., Le Q.T., Blanchard P., Sun Y., Ma J. Nasopharyngeal carcinoma. Lancet (London, England) 2019;394(10192):64–80. doi: 10.1016/S0140-6736(19)30956-0.
  • 8. He Q., Zhou Y., Zhou J., et al. Clinical relevance of plasma EBV DNA as a biomarker for nasopharyngeal carcinoma in non-endemic areas: a multicenter study in southwestern China. Clin Chim Acta. 2023;541 doi: 10.1016/j.cca.2023.117244.
  • 9. Yip T.T., Ngan R.K., Fong A.H., Law S.C. Application of circulating plasma/serum EBV DNA in the clinical management of nasopharyngeal carcinoma. Oral Oncol. 2014;50(6):527–538. doi: 10.1016/j.oraloncology.2013.12.011.
  • 10. Leon S.A., Shapiro B., Sklaroff D.M., Yaros M.J. Free DNA in the serum of cancer patients and the effect of therapy. Cancer Res. 1977;37(3):646–650.
  • 11. Alix-Panabières C., Pantel K. Liquid biopsy: from discovery to clinical application. Cancer Discov. 2021;11(4):858–873. doi: 10.1158/2159-8290.CD-20-1311.
  • 12. Cheng M.L., Pectasides E., Hanna G.J., Parsons H.A., Choudhury A.D., Oxnard G.R. Circulating tumor DNA in advanced solid tumors: clinical relevance and future directions. CA Cancer J Clin. 2021;71(2):176–190. doi: 10.3322/caac.21650.
  • 13. Azad T.D., Chaudhuri A.A., Fang P., et al. Circulating tumor DNA analysis for detection of minimal residual disease after chemoradiotherapy for localized esophageal cancer. Gastroenterology. 2020;158(3):494–505.e6. doi: 10.1053/j.gastro.2019.10.039.
  • 14. Dudley J.C., Diehn M. Detection and diagnostic utilization of cellular and cell-free tumor DNA. Annu Rev Pathol. 2021;16:199–222. doi: 10.1146/annurev-pathmechdis-012419-032604.
  • 15. Panda A., Yadav A., Yeerna H., et al. Tissue- and development-stage-specific mRNA and heterogeneous CNV signatures of human ribosomal proteins in normal and cancer samples. Nucleic Acids Res. 2020;48(13):7079–7098. doi: 10.1093/nar/gkaa485.
  • 16. Deveson I.W., Gong B., Lai K., et al. Evaluating the analytical validity of circulating tumor DNA sequencing assays for precision oncology. Nat Biotechnol. 2021;39(9):1115–1128. doi: 10.1038/s41587-021-00857-z.
  • 17. Zhao Y., Wang J., Liang F., et al. NucMap: a database of genome-wide nucleosome positioning map across species. Nucleic Acids Res. 2019;47(D1):D163–D169. doi: 10.1093/nar/gky980.
  • 18. Jiang P., Sun K., Peng W., et al. Plasma DNA end-motif profiling as a fragmentomic marker in cancer, pregnancy, and transplantation. Cancer Discov. 2020;10(5):664–673. doi: 10.1158/2159-8290.CD-19-0622.
  • 19. Cristiano S., Leal A., Phallen J., et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature. 2019;570(7761):385–389. doi: 10.1038/s41586-019-1272-6.
  • 20. Davoli T., Xu A.W., Mengwasser K.E., et al. Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell. 2013;155(4):948–962. doi: 10.1016/j.cell.2013.10.011.
  • 21. Chen L., Abou-Alfa G.K., Zheng B., et al. Genome-scale profiling of circulating cell-free DNA signatures for early detection of hepatocellular carcinoma in cirrhotic patients. Cell Res. 2021;31(5):589–592. doi: 10.1038/s41422-020-00457-7.
  • 22. Ahlquist D.A. Universal cancer screening: revolutionary, rational, and realizable. NPJ Precis Oncol. 2018;2:23. doi: 10.1038/s41698-018-0066-x.
  • 23. Hahn W.C., Bader J.S., Braun T.P., et al. An expanded universe of cancer targets. Cell. 2021;184(5):1142–1155. doi: 10.1016/j.cell.2021.02.020.
  • 24. Garraway L.A., Lander E.S. Lessons from the cancer genome. Cell. 2013;153(1):17–37. doi: 10.1016/j.cell.2013.03.002.
  • 25. Ma B.B., King A., Lo Y.M., et al. Relationship between pretreatment level of plasma Epstein-Barr virus DNA, tumor burden, and metabolic activity in advanced nasopharyngeal carcinoma. Int J Radiat Oncol Biol Phys. 2006;66(3):714–720. doi: 10.1016/j.ijrobp.2006.05.064.
  • 26. Lo Y.M., Leung S.F., Chan L.Y., et al. Plasma cell-free Epstein-Barr virus DNA quantitation in patients with nasopharyngeal carcinoma. Correlation with clinical staging. Ann N Y Acad Sci. 2000;906:99–101. doi: 10.1111/j.1749-6632.2000.tb06597.x.
  • 27. Chan K.C.A., Woo J.K.S., King A., et al. Analysis of plasma Epstein-Barr virus DNA to screen for nasopharyngeal cancer. N Engl J Med. 2017;377(6):513–522. doi: 10.1056/NEJMoa1701717.
  • 28. Li T., Li F., Guo X., et al. Anti-Epstein-Barr virus BNLF2b for mass screening for nasopharyngeal cancer. N Engl J Med. 2023;389(9):808–819. doi: 10.1056/NEJMoa2301496.
  • 29. Tian F., Yip S.P., Kwong D.L., Lin Z., Yang Z., Wu V.W. Promoter hypermethylation of tumor suppressor genes in serum as potential biomarker for the diagnosis of nasopharyngeal carcinoma. Cancer Epidemiol. 2013;37(5):708–713. doi: 10.1016/j.canep.2013.05.012.
  • 30. Lam W.K.J., Jiang P., Chan K.C.A., et al. Sequencing-based counting and size profiling of plasma Epstein-Barr virus DNA enhance population screening of nasopharyngeal carcinoma. Proc Natl Acad Sci U S A. 2018;115(22):E5115–E5124. doi: 10.1073/pnas.1804184115.
  • 31. Shamay M., Hand N., Lemas M.V., et al. CpG methylation as a tool to characterize cell-free Kaposi sarcoma herpesvirus DNA. J Infect Dis. 2012;205(7):1095–1099. doi: 10.1093/infdis/jis032.
  • 32. Zviran A., Schulman R.C., Shah M., et al. Genome-wide cell-free DNA mutational integration enables ultra-sensitive cancer monitoring. Nat Med. 2020;26(7):1114–1124. doi: 10.1038/s41591-020-0915-3.
  • 33. Diehl F., Schmidt K., Choti M.A., et al. Circulating mutant DNA to assess tumor dynamics. Nat Med. 2008;14(9):985–990. doi: 10.1038/nm.1789.

Articles from eBioMedicine are provided here courtesy of Elsevier
