Journal of Virology. 2021 Feb 24;95(6):e02354-20. doi: 10.1128/JVI.02354-20

Identification and Complete Validation of Prognostic Gene Signatures for Human Papillomavirus-Associated Cancers: Integrated Approach Covering Different Anatomical Locations

Eun Jung Kwon a,#, Mihyang Ha a,#, Jeon Yeob Jang b,c, Yun Hak Kim d,e
Editor: Rozanne M Sandri-Goldin f
PMCID: PMC8094965  PMID: 33361419

KEYWORDS: human papillomavirus, head and neck cancer, cervical cancer, prognostic biomarker, gene signature

ABSTRACT

Human papillomavirus (HPV) infects squamous epithelium and is a major cause of cervical cancer (CC) and a subset of head and neck cancers (HNC). Virus-induced tumorigenesis, molecular alterations, and related prognostic markers are expected to be similar between the two cancers, but they remain poorly understood. We present integrated molecular analysis of HPV-associated tumors from TCGA and GEO databases and identify prognostic biomarkers. Analysis of gene expression profiles identified common upregulated genes and pathways of DNA replication and repair in the HPV-associated tumors. We established 34 prognostic gene signatures with a universal cutoff value in TCGA-CC using Elastic Net Cox regression analysis. We were able to externally validate our results in the TCGA-HNC and several GEO data sets, and demonstrated prognostic power in HPV-associated HNC, but not in HPV-negative cancers. The HPV-related prognostic and predictive indicator did not discriminate other cancers, except bladder urothelial carcinoma. These results identify and completely validate a highly selective prognostic system and its cross-usefulness in HPV-associated cancers, regardless of the tumor’s anatomical subsite.

IMPORTANCE Persistent infection with high-risk HPV interferes with cell function regulation and causes cell mutations, which accumulate over the long term and eventually develop into cancer. Results of pathway enrichment analysis presumably showed this accumulation of intracellular damage during the chronic HPV-infected state. We used highly advanced statistical methods to identify the most appropriate genes and coefficients and developed the HPV-related prognostic and predictive indicator (HPPI) risk scoring system. We applied the same cutoff value to training and validation sets and demonstrated good prognostic performance in both data sets, and confirmed a consistent trend in external validation. Moreover, HPPI presented significant validation results for bladder cancer suspected to be related to HPV. This suggested that our risk scoring system based on the prognostic gene signature could play an important role in the development of treatment strategies for patients with HPV-related cancer.

INTRODUCTION

Human papillomavirus (HPV) is a double-stranded, circular DNA virus with a tropism for squamous epithelium, and is known to be a causative agent for cervical cancer (CC), one of the leading causes of death in women worldwide (1, 2). The HPV replication cycle is tightly associated with the differentiation of the basal layers of the infected epithelium (3). Once inside the host cell, the HPV genome is transported to the nucleus and replicates upon the differentiation of the basal cells while maintaining a low copy number of viral DNA (4–7). In terminally differentiated cells, HPV replicates its DNA to a high copy number and produces progeny virions (3, 4, 8).

The HPV genome is divided into three sections: (i) the early gene-coding region (E1, E2, E4, E5, E6 and E7), (ii) the late gene-coding region (L1 and L2), and (iii) the long control region (LCR or URR). The E6 and E7 viral oncoproteins play especially important roles in HPV carcinogenesis (9). Overexpression of viral oncogenes E6 and E7 and genetic alterations are known as drivers of mutation in cancer progression (2, 10–12). Thus, HPV is grouped into high- and low-risk HPV according to the potential for cellular transformation, which is managed by E6 and E7 proteins (13). Persistent infection with high-risk HPV interferes with cell function regulation and causes cell mutations, which accumulate over the long term and eventually develop into cancer (5, 14, 15).

Besides CC, it is now widely accepted that subtypes of high-risk HPV are associated with a subset of head and neck cancers (HNC), especially in the oropharyngeal subsite (1, 16, 17). The incidence of HPV-associated HNC is increasing worldwide, which elicits major public health problems (18–21). Patients with HPV-associated oropharyngeal cancers have unique epidemiologic, molecular, and clinical characteristics; HPV-associated and HPV-negative cancers are now classified as different diseases with different clinical staging systems (22–24). One of the most prominent differences in HPV-associated HNC is its favorable prognosis compared with traditional HPV-negative HNC (25, 26). Multiple prospective clinical trials have indicated significantly better overall survival (OS) rates and response to radiation therapy in HPV-associated HNC, eliciting further discussion of treatment de-escalation to reduce long-term side effects (25, 26). Clinical trials for radiation and chemotherapy de-escalation and minimally invasive surgical techniques are currently ongoing for HPV-associated HNC, while insufficient evidence supports less intensive therapy for HPV-negative HNC (22).

Gene signatures are groups of genes within a cell that exhibit unique gene expression patterns. Identifying gene signatures is important because they can be used to select patient groups for whom specific treatments are effective (27). Moreover, it is important to establish accurate risk stratification and identify biomarkers for predicting prognosis to avoid potential undertreatment of patients at risk for recurrence and poor survival (1, 28), and for guiding treatment decisions for HPV-associated HNC (23, 29). The most commonly used method for identifying biomarkers and developing prognostic gene signatures is Cox regression analysis with variable selection (30). However, since modern technologies such as microarrays and SNP genotyping generate data with a huge number of variables, it is necessary to apply prognostic modeling approaches distinct from the existing variable selection methods.

A recent study demonstrated the effectiveness of a therapeutic agent that could be applied to treat mismatch repair-deficient tumors with different anatomical locations and tissues of origin (31). It is worth noting that tumors with similar molecular characteristics could be considered cancer subtypes regardless of their anatomical location, which comprises the classical classification of human cancers. Interestingly, in HPV-associated cancer, developing therapies such as genetically engineered T cells have exhibited promising therapeutic efficacy in early-stage clinical trials regardless of the tumor’s anatomical location (32, 33). Therefore, a comprehensive analysis is warranted to reveal common molecular features in other cancers that are not limited to specific cancer types.

To date, several studies have identified differentially expressed genes (DEGs) while investigating functional and biological process dysfunctions associated with HPV infection (34); however, the DEGs were determined for only one type of cancer (35–38), and common DEGs induced by the virus, regardless of anatomical location, were not identified.

Therefore, the goal of the present study was to identify common DEGs and pathways of HPV-related cancers, independent of their anatomical location, and to identify and completely validate prognostic gene signatures for HPV-associated cancers based on the hypothesis that HPV-associated tumors have similar molecular characteristics. Toward these goals, the established prognostic gene signature for TCGA-CC was validated in other types of tumors, including HPV-associated TCGA-HNC, and validated externally in GEO data sets to confirm our technique. The study workflow is illustrated in Fig. 1.

FIG 1.

Flowchart of the proposed method. BLCA, bladder urothelial carcinoma; CC, cervical cancer; coef, regression coefficient; DEGs, differentially expressed genes; HNC, head and neck cancer; HPPI, HPV-related prognostic and predictive indicator.

RESULTS

Upregulated genes and enriched pathways in HPV-associated cancers.

To identify common DEGs between HPV-associated CC and HNC, we employed data sets from TCGA and GSE65858 databases (Fig. 2A to C). Using significance analysis of microarrays (SAM), a robust and straightforward technique to distinguish significant genes between two or more groups (39), we identified 18 common upregulated genes (ANKRD32, ARHGAP4, C16orf75, C2orf55, CENPK, CIDEB, DDB2, FAM111B, GRIN2C, LIG1, MGA, NEURL1B, NUSAP1, PHF19, POLD1, SP1, UBR7, and WDR76) in HPV-associated cancers (Fig. 2D, Table S1 in the supplemental material). The significantly enriched pathways of common genes were “nucleotide excision repair,” “mismatch repair,” “base excision repair,” and “DNA replication” (Fig. 2E). These results led us to hypothesize the presence of common prognostic genomic biomarkers that centrally control HPV-associated cancers.
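The selection of common upregulated genes can be pictured as thresholding each cohort's DEG table and intersecting the results. The following is a minimal sketch of that idea only: it does not implement SAM, the fold change and q values are invented for illustration, the placeholder genes `GENE_X` and `GENE_Y` are hypothetical, and treating the fold change cutoff as a log-scale value of 1 is an assumption.

```python
# Hypothetical per-cohort DEG tables: gene -> (log fold change, q value).
# Numbers are made up; only LIG1 and POLD1 are gene symbols taken from the text.
cohorts = {
    "TCGA-CC":  {"LIG1": (1.8, 0.001), "POLD1": (1.4, 0.004), "GENE_X": (2.0, 0.002)},
    "TCGA-HNC": {"LIG1": (1.5, 0.003), "POLD1": (1.2, 0.008), "GENE_X": (0.4, 0.200)},
    "GSE65858": {"LIG1": (1.3, 0.002), "POLD1": (1.1, 0.006), "GENE_Y": (1.9, 0.001)},
}

def upregulated(table, min_log_fc=1.0, max_q=0.01):
    """Genes whose fold change meets the cutoff and whose q value is below max_q."""
    return {g for g, (log_fc, q) in table.items() if log_fc >= min_log_fc and q < max_q}

def common_upregulated(cohorts, **kwargs):
    """Intersect the upregulated gene sets of all cohorts."""
    sets = [upregulated(table, **kwargs) for table in cohorts.values()]
    return set.intersection(*sets)

common = sorted(common_upregulated(cohorts))
print(common)
```

A gene that passes the thresholds in only one or two cohorts (here the placeholder genes) drops out of the intersection, which is the behavior the "common DEG" definition requires.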

FIG 2.

Analysis of differentially expressed genes (DEGs). Volcano plot visualizing DEGs between HPV-negative and HPV-positive data sets TCGA-CC (A), TCGA-HNC (B), and GSE65858 (C). The abscissa represents the fold change in gene expression and the ordinate represents −log10 (adjusted P value; q value). A cutoff of a 1-fold change and q value of <0.01 were used as the threshold values to determine DEG significance. Blue and red dots represent downregulated and upregulated genes, respectively; gray dots indicate insignificant DEGs. (D) Gene expression heatmap of 18 common upregulated genes in 3 cohorts: TCGA-CC, TCGA-HNC, and GSE65858, in order from the top. For the gene expression level of TCGA data, a log10 transformation was used. (E) Result of KEGG pathway enrichment analysis of 18 common upregulated genes. Colored squares indicate genes involved in the pathway(s).

Selected variables using Elastic Net and construction of HPPI.

To identify a prognostic gene signature, we employed grouped variable selection using Elastic Net to choose prognostic variables in high-dimensional genomic data from the TCGA database for CC, one of the largest and most reliable data sets regarding HPV-associated tumors. To determine the optimal parameter alpha in variable selection, we compared the C index of the training and validation sets at all values of alpha (0.1 to 1.0) (Fig. S1). Since the C index of the training and validation sets at alpha = 0.99 was notably higher than at any of the surrounding values, we selected 34 genes to calculate the HPV-related prognostic and predictive indicator (HPPI) risk score at alpha = 0.99. We constructed the HPPI risk scoring system using the regression coefficients and expression values of the selected genes from Elastic Net. The selected genes and their regression coefficients are listed in Table 1, and the expression pattern of the selected genes according to HPPI risk score is depicted in Fig. 3A.

TABLE 1.

Selected gene list and regression coefficients

Gene | Official symbol | Full name | Regression coefficient
APOBEC3H | APOBEC3H | Apolipoprotein B mRNA editing enzyme catalytic subunit 3H | −0.00326
C13orf1 | SPRYD7 | SPRY domain containing 7 | 0.000285
C19orf20 | TPGS1 | Tubulin polyglutamylase complex subunit 1 | 0.000593
C6orf211 | ARMT1 | Acidic residue methyltransferase 1 | 7.88E−05
CD46 | CD46 | CD46 molecule | 2.40E−05
CXCL2 | CXCL2 | C-X-C motif chemokine ligand 2 | 0.000165
DLEU1 | DLEU1 | Deleted in lymphocytic leukemia 1 | 5.80E−05
EGLN1 | EGLN1 | Egl-9 family hypoxia inducible factor 1 | 0.000401
ERG | ERG | ETS transcription factor ERG | 2.85E−05
ESM1 | ESM1 | Endothelial cell specific molecule 1 | 1.13E−06
FAM124B | FAM124B | Family with sequence similarity 124 member B | 1.10E−05
GPR143 | GPR143 | G protein-coupled receptor 143 | −0.00022
MTCP1NB | CMC4 | C-X9-C motif containing 4 | −1.98E−05
MTMR2 | MTMR2 | Myotubularin related protein 2 | 6.70E−05
NAMPT | NAMPT | Nicotinamide phosphoribosyltransferase | 1.67E−05
NCSTN | NCSTN | Nicastrin | 2.27E−05
NRP1 | NRP1 | Neuropilin 1 | 9.56E−05
P4HA2 | P4HA2 | Prolyl 4-hydroxylase subunit alpha 2 | 6.23E−05
PEAR1 | PEAR1 | Platelet endothelial aggregation receptor 1 | 0.000379
PGK1 | PGK1 | Phosphoglycerate kinase 1 | 2.83E−06
PLAC1 | PLAC1 | Placenta enriched 1 | 4.42E−05
PRKCDBP | CAVIN3 | Caveolae associated protein 3 | 0.0001
PTPN12 | PTPN12 | Protein tyrosine phosphatase nonreceptor type 12 | 0.000172
RAD9B | RAD9B | RAD9 checkpoint clamp component B | −0.00027
ROR1 | ROR1 | Receptor tyrosine kinase like orphan receptor 1 | 0.002461
SCML1 | SCML1 | Scm polycomb group protein like 1 | −0.0003
SH3GLB2 | SH3GLB2 | SH3 domain containing GRB2 like, endophilin B2 | −9.53E−05
STARD3NL | STARD3NL | STARD3 N-terminal like | 0.000178
THBD | THBD | Thrombomodulin | 1.33E−05
TMEM87B | TMEM87B | Transmembrane protein 87B | 2.12E−05
TYW1 | TYW1 | tRNA-yW synthesizing protein 1 homolog | 0.000216
UBE2NL | UBE2NL | Ubiquitin conjugating enzyme E2 N like (gene/pseudogene) | −0.00213
ZNF134 | ZNF134 | Zinc finger protein 134 | 3.12E−05
ZNF418 | ZNF418 | Zinc finger protein 418 | 0.001514
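To make the scoring rule concrete, the sketch below computes a risk score as the coefficient-weighted sum of expression values and compares it with the universal cutoff of 1.9 reported in the text. It is a sketch under stated assumptions, not the authors' implementation: only four of the 34 coefficients from Table 1 are included, the patient expression values are invented, the plain linear form without an intercept or rescaling is assumed, and assigning "high" strictly above the cutoff and skipping unmeasured genes are illustrative choices.

```python
# Four of the 34 coefficients from Table 1 (gene -> regression coefficient).
COEF = {
    "APOBEC3H": -0.00326,
    "ROR1": 0.002461,
    "UBE2NL": -0.00213,
    "ZNF418": 0.001514,
}
UNIVERSAL_CUTOFF = 1.9  # universal cutoff score reported in the text

def hppi_score(expression, coef=COEF):
    """Weighted sum of expression values; genes missing from `expression` are skipped (assumption)."""
    return sum(c * expression[g] for g, c in coef.items() if g in expression)

def risk_group(score, cutoff=UNIVERSAL_CUTOFF):
    """Label a score as high or low risk; the strict inequality is an assumption."""
    return "high" if score > cutoff else "low"

# Hypothetical normalized expression values for one patient (not real data).
patient = {"APOBEC3H": 100.0, "ROR1": 900.0, "UBE2NL": 50.0, "ZNF418": 400.0}
score = hppi_score(patient)
print(round(score, 4), risk_group(score))
```

Genes with negative coefficients (for example APOBEC3H) lower the score as their expression rises, whereas genes with positive coefficients (for example ROR1) raise it.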

FIG 3.

Development of the HPPI risk scoring system. (A) Standardized gene expression heatmap for the training set (blue = low expression, red = high expression). Columns are ordered by increasing risk scores. (For readability, values below −3 and above 3 are denoted by −3 and 3.) Score distribution, survival status of patients, median and universal cutoff scores, and hazard ratios (HRs) for each cutoff score are illustrated. (B) Forest plots of the expression of each gene in the training set. Rows are ordered by increasing HR. The horizontal axis represents HR with 95% confidence intervals (CI) estimated using the univariate Cox proportional hazards model. Asterisks represent the statistical significance in patient survival outcome by univariate Cox regression: ***, P < 0.001; **, P < 0.01; *, P < 0.05. (C) Violin plot depicting score distribution in each data set; a cutoff score of 1.9 was used to stratify patients into 2 groups. CC, cervical cancer; HNC, head and neck cancer; HPPI, HPV-related prognostic and predictive indicator.

In survival analysis, the hazard ratio (HR) is obtained by dividing the hazard rate of the experimental group by that of the control group. An HR greater than 1 means that the risk in the experimental group is increased, and an HR less than 1 means that it is reduced. A univariate Cox regression was performed to assess the individual impact of each selected gene on survival; the resulting HRs and 95% confidence intervals are shown in Fig. 3B. We chose the cutoff value that maximizes Uno’s C index within the common range (from 0.318 to 3.088) of the HPPI risk scores of the training and validation sets, as described above. The distribution and 5-number summary of HPPI risk scores for the training and validation sets are summarized in Fig. 3C and Table S2.
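The cutoff search can be illustrated by scanning candidate cutoffs and keeping the one with the highest concordance between the resulting risk groups and survival. The sketch below substitutes Harrell's C index for Uno's C index because it is simpler to write out in full; Uno's estimator additionally reweights pairs by the censoring distribution. The scores, follow-up times, event indicators, and candidate cutoffs are all hypothetical.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index.

    A pair (i, j) is comparable when subject i has the shorter follow-up time
    and experienced the event. The pair is concordant when i also has the
    higher risk; ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def best_cutoff(scores, times, events, candidates):
    """Return (cutoff, C index) for the candidate that maximizes the C index."""
    best = None
    for c in candidates:
        groups = [1 if s > c else 0 for s in scores]  # 1 = high risk
        ci = harrell_c(times, events, groups)
        if best is None or ci > best[1]:
            best = (c, ci)
    return best

# Hypothetical risk scores, follow-up times (months), and event indicators.
scores = [0.5, 1.0, 1.5, 2.2, 2.8, 3.0]
times = [60, 55, 48, 20, 12, 30]
events = [0, 0, 1, 1, 1, 1]
best = best_cutoff(scores, times, events, candidates=[1.0, 1.9, 2.5])
print(best)
```

Restricting the candidates to the range of scores shared by the training and validation sets, as the text describes, ensures that the chosen cutoff splits both data sets into two nonempty groups.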

Prognostic values of HPPI in HPV-positive cancers.

To confirm prognostic significance, we performed Kaplan-Meier (KM) survival analysis with log rank test on the TCGA-CC and TCGA-HNC data sets. The universal cutoff value (1.9) was able to significantly discriminate patient risk in both cancer groups (TCGA-CC, HR = 10.028, P < 0.001; TCGA-HNC, HR = 2.577, P < 0.001) (Fig. 4A and B). These results indicate a significant relationship between the high-risk group and a poor prognosis.
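For readers unfamiliar with the KM estimator behind these curves, a minimal sketch follows. It computes only the survival curve for one group and does not include the log rank test; the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival probability) at each event time.

    `events` holds 1 for an observed death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored subjects both leave the risk set
    return curve

# Hypothetical follow-up times (months) and event indicators for one risk group.
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
print(curve)
```

Censored patients lower the number at risk without producing a step in the curve, which is why the curve only changes at observed event times.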

FIG 4.

Prognostic value of the HPPI gene signature in HPV-positive cancers. (A and B) KM plots of overall survival (OS) in data sets for CC (A) and HNC (B). HPPI risk scores stratified patients into two groups based on the universal cutoff score. Hazard ratio (HR), log rank test P value, and number of patients successfully stratified (n) determined from the univariate Cox regression analysis are shown on each KM survival curve. Blue and red KM curves represent predicted low- and high-risk groups, respectively. (C) Time-dependent area under ROC curves at the universal cutoff score in each data set. (D) Distribution of HPPI risk scores divided into two groups of patients based on the universal cutoff score with density plot and histogram. (E) Bar plot comparing the number of patients for other clinical prognostic variables between low- and high-risk groups. CC, cervical cancer; HNC, head and neck cancer; HPPI, HPV-related prognostic and predictive indicator; KM, Kaplan-Meier; OS, overall survival.

The time-dependent area under the curve (AUC) at the universal cutoff score, defined as the HPPI risk scoring system, was assessed for each data set. The results indicated that the universal cutoff score demonstrated good prognostic performance in both data sets with C index values of 0.745 and 0.689 in TCGA-CC and TCGA-HNC, respectively (Fig. 4C).

Next, we briefly described the distribution of the HPPI risk scores and the various subgroups (stage, age, smoking history, gender) by risk groups (Fig. 4D and E). HPPI could divide patients into low-risk (TCGA-CC, n = 226; TCGA-HNC, n = 70) and high-risk (TCGA-CC, n = 43; TCGA-HNC, n = 27) groups. We found that the distribution of the other factors, including stage, age, smoking history, and gender, was mostly homogeneous between the low- and high-risk groups.

Because stage and age could affect overall survival in cancer patients, we further performed subgroup analysis by stage and age. The HPPI risk score predicted overall survival (OS) with considerable accuracy in the subgroups stage I/II, stage III/IV, age = <55, and age = 55+ in TCGA-CC (HR, 15.325, 5.265, 12.071, and 10.465; P < 0.001, P < 0.001, P < 0.001, and P < 0.001; C index, 0.897, 0.886, 0.911, and 0.887, respectively). Except in the stage I/II subgroup, the HPPI risk score also predicted OS with reliable accuracy in the TCGA-HNC subgroups (HR, 741.248, 3.019, 3.313, and 2.384; P = 0.067, P < 0.001, P = 0.017, and P < 0.001; C index, 0.912, 0.644, 0.632, and 0.664, respectively) (Fig. 5).

FIG 5.

Subgroup analysis in the training and validation sets. (A to D and F to I) KM plots of OS for each subgroup of patients in the training and validation sets for stage (A, B, F, and G) and age (C, D, H, and I). HPPI risk scores stratified patients into two groups based on the universal cutoff score. Hazard ratio (HR), log rank test P value, and number of patients successfully stratified (n) determined from the univariate Cox regression analysis are shown at the bottom right of each graph. Blue and red KM curves represent predicted low- and high-risk groups, respectively. (E and J) Time-dependent AUC curves at 5 years for each subgroup of patients in the training (E) and validation sets (J). Red, stage I/II; yellow, stage III/IV; blue, age = <55; green, age = 55+. C index values are described at the bottom right of the graphs. HPPI, HPV-related prognostic and predictive indicator; KM, Kaplan-Meier; OS, overall survival.

In univariate and multivariate Cox regression analyses, HPPI had a significant effect on survival as an independent prognostic factor for HPV-related cancers (Table 2). These results suggest that TCGA-CC and TCGA-HNC groups with a high HPPI risk score may have mechanisms that potentially contribute to disease progression and poor prognosis.

TABLE 2.

Univariate and multivariate Cox regression analysis

Parameter a | Univariate P | Univariate HR | Univariate 95% CI | Multivariate P | Multivariate HR | Multivariate 95% CI
CC (HPV+)
    Age | 0.23 | 1.01 | 0.99–1.03 | 0.35 | 1.01 | 0.99–1.03
    Stage (I/II vs III/IV) | 0.003 c | 2.17 | 1.30–3.62 | 0.25 | 1.38 | 0.79–2.40
    HPPI risk score | <0.001 d | 10.03 | 6.69–15.03 | <0.001 d | 9.75 | 6.48–14.68
HNC (HPV+)
    Age | 0.95 | 1.00 | 0.96–1.04 | 0.62 | 0.99 | 0.95–1.03
    Stage (I/II vs III/IV) | 0.66 | 1.27 | 0.43–3.73 | 0.32 | 1.81 | 0.57–5.80
    HPPI risk score | 0.015 b | 2.58 | 1.20–5.53 | 0.005 c | 4.10 | 1.55–10.87
BLCA
    Age | <0.001 d | 1.03 | 1.02–1.05 | <0.001 d | 1.03 | 1.02–1.05
    Stage (I/II vs III/IV) | <0.001 d | 2.20 | 1.52–3.18 | <0.001 d | 1.97 | 1.35–2.87
    HPPI risk score | 0.008 c | 1.36 | 1.08–1.71 | 0.04 b | 1.27 | 1.01–1.61

a BLCA, bladder urothelial carcinoma; CC, cervical cancer; HNC, head and neck cancer; HPPI, HPV-related prognostic and predictive indicator; HR, hazard ratio; CI, confidence interval.

b Significant at P < 0.05.

c Significant at P < 0.01.

d Significant at P < 0.001.
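The HRs and confidence intervals in Table 2 are related through the Cox regression coefficient: the HR is the exponential of the coefficient, and a Wald-type 95% CI is symmetric on the log scale. The sketch below uses the univariate HPPI row for CC (HR 10.03, CI 6.69 to 15.03) to back-calculate the standard error and rebuild the interval. That the reported intervals are Wald intervals is an assumption, and the function names are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_hr_se(lower, upper, z=Z_95):
    """Standard error of log(HR) implied by a Wald confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def hr_ci(hr, se, z=Z_95):
    """Wald confidence interval for an HR given the standard error of log(HR)."""
    beta = math.log(hr)
    return math.exp(beta - z * se), math.exp(beta + z * se)

# Univariate HPPI risk score, CC (HPV+), from Table 2.
se = log_hr_se(6.69, 15.03)
low, high = hr_ci(10.03, se)
print(round(se, 3), round(low, 2), round(high, 2))
```

The rebuilt interval matches the reported one to within rounding, which is consistent with, although not proof of, a Wald interval on the log scale.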

Evaluation of HPPI in HPV-negative cancers and other types of cancers.

Next, we evaluated the prognostic role of our HPPI scoring system in HPV-negative TCGA-CC and TCGA-HNC. Importantly, KM survival analysis indicated the HPPI scoring system did not predict OS in patients with HPV-negative TCGA-CC (P = 0.49) and TCGA-HNC (P = 0.40) (Fig. 6A and B). These results indicated the HPPI scoring system, based on the identified prognostic gene signature, demonstrated selective predictability in HPV-positive but not in HPV-negative tumors. The distribution of HPPI risk scores and the various subgroups (stage, age, smoking history, gender) by risk groups is shown in Fig. 6C and D. The risk scores for HPV-negative tumors were not clearly stratified, while the distribution of subgroups by risk group was mostly homogeneous.

FIG 6.

Prognostic value of the HPPI gene signature in HPV-negative cancers. (A and B) KM plots of OS in each data set for CC (HPV-negative) (A) and HNC (HPV-negative) (B). HPPI risk scores stratified patients into two groups based on the universal cutoff score. Hazard ratio (HR), log rank test P value, and number of patients successfully stratified (n) determined from univariate Cox regression analysis are shown on each KM survival curve. Blue and red KM curves represent predicted low- and high-risk groups, respectively. (C) Distribution of HPPI risk scores divided into two groups of patients based on the universal cutoff score with density plot and histogram. (D) Bar plot comparing the number of patients for other clinical prognostic variables between low- and high-risk groups. CC, cervical cancer; HNC, head and neck cancer; HPPI, HPV-related prognostic and predictive indicator; KM, Kaplan-Meier; OS, overall survival.

To further evaluate whether HPPI could prognostically classify other types of cancers throughout the body, we comprehensively performed log rank test and univariate Cox regression analysis for 31 cancer subtypes in TCGA data sets. Interestingly, HPPI did not discriminate OS in most cancers except bladder urothelial carcinoma (BLCA) (HR, 1.362; log rank test P = 0.001) (Fig. 7A) and BLCA was the only cancer subtype that was significant in both the log rank test and univariate Cox regression analyses (Cox regression P = 0.008) (Table S3). These results show the selective predictability of HPPI for OS.

FIG 7.

Prognostic value of the HPPI gene signature in BLCA. (A) KM plot of OS. HPPI risk scores stratified patients into two groups based on the universal cutoff score. Hazard ratio (HR), log rank test P value, and number of patients successfully stratified (n) determined from univariate Cox regression analysis are shown on each KM survival curve. Blue and red KM curves represent predicted low- and high-risk groups, respectively. (B) Time-dependent AUC curve at 5 years for all patients and each subgroup of patients in BLCA. Red, stage I/II; yellow, stage III/IV; blue, age = <55; green, age = 55+. C index values are described at the bottom right of the graph. (C) Distribution of HPPI risk scores divided into two groups of patients based on the universal cutoff score with density plot and histogram. (D) Bar plot comparing the number of patients for other clinical prognostic variables between low- and high-risk groups. BLCA, bladder urothelial carcinoma; HPPI, HPV-related prognostic and predictive indicator; KM, Kaplan-Meier; OS, overall survival.

In BLCA, the time-dependent AUC at the universal cutoff score was evaluated. The analyses of all patients and of the subgroups demonstrated relatively good prognostic performance except for the age = <55 subgroup (C index, 0.559, 0.591, 0.539, 0.487, and 0.569, respectively) (Fig. 7B). HPPI significantly divided BLCA patients into low-risk (n = 296) and high-risk (n = 109) groups (Fig. 7C). The high-risk group presented a slightly higher proportion of male, stage IV, and older patients compared with the low-risk group (Fig. 7D).

External validation of the HPPI scoring system.

Finally, to further verify that complete risk stratification with HPPI is possible, the GEO database (CC-GSE39001, HNC-GSE65858, and BLCA-GSE13507) was used. Since the GEO data sets do not use the same platform and normalization method, we applied a different optimal cutoff, maximizing Uno’s C index value, to each data set. We performed KM survival analysis with log rank tests on the CC-GSE39001, HNC-GSE65858, and BLCA-GSE13507 data sets (Fig. S2). Patient survival was significantly stratified by the HPPI risk scoring system in the HNC-GSE65858 cohort (HNC-GSE65858, HR > 10, P = 0.044). In the CC-GSE39001 cohort, although survival analysis did not reach statistical significance, probably due to the low subject number (n = 37), the KM plot was also clearly divided into two risk groups that are consistent with our TCGA analyses (CC-GSE39001, n = 37, HR > 10, P = 0.112). In the BLCA-GSE13507 cohort, although there was a trend of risk stratification, it did not reach statistical significance (BLCA-GSE13507, HR > 10, P = 0.186). Taken together, our HPPI scoring system stratified patients into risk groups in external HPV-positive CC and HNC validation cohorts as well as in TCGA, reaching statistical significance in the HNC cohort.

DISCUSSION

Recent advances in gene sequencing technology have enabled high-throughput molecular analysis of large patient cohorts, which has contributed considerably to our understanding of cancer biology as well as the development of therapeutic biomarkers (23, 40, 41). Genetic biomarkers are now indispensable in therapeutic decision making and clinical trial design for several types of cancers, such as breast cancer (42–45). Currently, there are commercialized prediction kits that have clinical utility. The 70-gene signature Mammaprint test (https://agendia.com/mammaprint/) (46, 47), for example, is a genomic test that analyzes the activity of specific genes in early-stage breast cancer (48, 49). It is currently recommended and used in clinical practice to support decision making for adjuvant treatments all over the world (50). In addition, the 21-gene signature Oncotype DX (ODX) test developed by Genomic Health is currently the most common commercialized prediction kit in the United States (http://www.oncotypedx.com/en-US/Breast/PatientCaregiver/OncoOverview.aspx), and has sold more than 200,000 tests since it entered the U.S. market in 2004 (51). Both the National Comprehensive Cancer Network (NCCN) and the American Society of Clinical Oncology (ASCO) clinical guidelines encourage the use of the ODX test in appropriate conditions (51). Investigations of gene expression signatures in HNC have identified unique molecular characteristics and carcinogenesis of HPV-associated HNC compared with HPV-negative tumors (52). However, although a growing body of knowledge has accumulated regarding prognostic gene expression signatures in HPV-negative HNC, those specific to HPV-associated HNC remain unknown (53, 54). The lack of a well-established cohort is one of the major reasons for the absence of HPV-specific prognostic gene signatures. To overcome this limitation, we used an HPV-associated CC cohort to develop an HPV-specific prognostic gene signature. This is the first study to comprehensively analyze different HPV-associated cancers that exist in different anatomical sites.

Virus-infected host cells operate complex networks that maintain genome integrity by detecting and repairing DNA damage (55–57). Compromise of the DNA damage response (DDR) network that detects DNA lesions causes genomic instability and cancer (55, 57, 58). DNA synthesis is one of the downstream responses in DDR signaling, so it is exploited by some viruses, such as HPV (59), because they do not encode the polymerase necessary for replication and depend instead on the cellular replication machinery to facilitate viral DNA replication (60, 61). Many studies have shown that HPV promotes its productive replication using both the ataxia-telangiectasia mutated (ATM)-dependent and the ataxia telangiectasia and Rad3 related (ATR)-dependent DDR (5, 62–64). However, induction of the DDR network risks inducing processes, such as apoptosis, that do not benefit HPV (59). To prevent these events, the HPV E6 and E7 proteins bind and degrade cellular p53 and pRb, respectively, which are very important tumor suppressors involved in processes such as cell cycle progression, DNA repair, and apoptosis (59, 65, 66). Accordingly, the results of our KEGG pathway enrichment analysis, showing that the upregulated genes in HPV-positive cancers belonged to DNA replication and repair pathways, were in line with previous studies. This is a valuable insight because we demonstrated that DNA damage-related genes were commonly upregulated in HPV-positive cancers, regardless of the anatomical region.

Several HPV-related gene expression profile prognostic prediction systems for the survival of cancer patients have been developed based on clinical data. Stepwise selection using univariate Cox regression analysis is the most commonly used method for developing prognostic gene signatures (67, 68). However, most of the available variable selection methods cannot handle complex interdependence among genes (69–71). These methods demonstrate high accuracy as a prognostic classifier in a training cohort, but exhibit low accuracy and serious overfitting problems when evaluated in a validation cohort (72, 73). Solving these problems requires the regularization of regression models, as in the least absolute shrinkage and selection operator (Lasso) and Ridge models (74). Because these models consider the effects of grouping, they provide a more systematic and statistical approach than the univariate Cox regression model, which only considers the individual effects of variables (41). Lasso is a powerful method to reduce the number of predictors by penalizing the regression coefficients but, in the case of highly correlated groups, it could select only one variable from the group while missing the others (75, 76). To avoid this problem, we utilized Elastic Net, which combines the advantages of both Lasso and Ridge. The regularization method finds an optimal trade-off between accuracy and simplicity, which results in less overfitting and increased likelihood for generalization (71, 77). Most previous studies have applied different cutoffs to each data set, which is not strictly an accurate external validation (78, 79). In contrast, we applied a universal cutoff to all data sets and built a prognostic and predictive risk scoring system for different types of HPV-related cancers.
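The way Elastic Net blends the Lasso and Ridge penalties can be shown with a small function. The sketch below assumes the widely used parameterization in which a mixing parameter alpha weights the L1 term and (1 − alpha)/2 weights the squared L2 term; the text does not state which software or parameterization was used, so this form is an assumption, and the coefficient vector is hypothetical.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic Net penalty: lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).

    alpha = 1 gives the Lasso penalty and alpha = 0 gives the Ridge penalty.
    """
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)

beta = [1.0, -2.0]  # hypothetical coefficient vector
lasso = elastic_net_penalty(beta, lam=1.0, alpha=1.0)
ridge = elastic_net_penalty(beta, lam=1.0, alpha=0.0)
mixed = elastic_net_penalty(beta, lam=1.0, alpha=0.99)
print(lasso, ridge, mixed)
```

Under this parameterization, the alpha = 0.99 selected in the Results is dominated by the Lasso term, with a small Ridge component that helps retain correlated genes together.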

One of the interesting findings of the study is the predictive power of our HPPI scoring system for BLCA prognosis. The traditional risk factors for BLCA are smoking and occupational exposure to chemicals, but not all risk factors for BLCA have been clearly identified (80). Several recent studies have demonstrated that HPV is also associated with a subset of BLCA (81), which concurs with the results of our study, in which significant HPPI results were demonstrated only for BLCA out of 31 cancers in the validation set. Unfortunately, further HPV-related cohort analysis was not possible since HPV infection information was not included for BLCA. However, relatively good prognostic performance was demonstrated for BLCA patients with the HPPI scoring system.

To further support our arguments, we performed external validation using the GEO database. Unfortunately, complete validation was difficult because data sets with sufficient information (number of patients, HPV status, survival information) have not yet been established and because some genes were missing from the GEO data sets (CC-GSE39001, 16 genes; HNC-GSE65858, 1 gene; BLCA-GSE13507, 2 genes). Nonetheless, external validation showed a trend noticeably consistent with our findings, which is expected to be clinically useful. We expect the HPPI scoring system to achieve complete risk stratification when all of these conditions are met.

Taken together, our study established and completely validated a prognostic gene signature that can be effectively applied to different independent HPV-related data sets, unlike studies that demonstrated efficiency in only one data set (82, 83). We suggest that the HPPI risk scoring system, based on 34 prognostic gene signatures, could play a significant role in developing treatment strategies for patients with HPV-associated cancers. This novel method ensures reproducibility and facilitates application to newly diagnosed patients.

MATERIALS AND METHODS

Data acquisition and preprocessing.

RNA-Seq transcriptome sequencing and clinical data sets of 307 CC samples, 546 HNC samples, and 412 BLCA samples were downloaded from the Broad GDAC Firehose (http://gdac.broadinstitute.org/). The gene expression data set was composed of data generated on an Illumina HiSeq 2000 RNA-Seq platform (level 3), and we used the preprocessed RNAseqV2 normalized count expression values based on RNA-Seq by expectation-maximization (RSEM). The CC samples comprised 281 HPV-positive and 22 HPV-negative cases, and the HNC samples comprised 97 HPV-positive and 423 HPV-negative cases.

Similarly, we downloaded the gene expression data sets CC (GSE39001), HNC (GSE65858), and BLCA (GSE13507) from the Gene Expression Omnibus (GEO) database (84) using the R package GEOquery 2.54.1. The GEO data sets were annotated by the Affymetrix Human Gene 1.0 ST Array, the Illumina HumanHT-12 V4.0 expression beadchip, and the Illumina human-6 v2.0 expression beadchip platform, respectively. The CC samples consisted of 62 HPV-positive and 17 healthy cervical epithelium cases, and the HNC samples consisted of 73 HPV-positive and 179 HPV-negative cases.

We used the TCGA-CC, TCGA-HNC, and GSE65858 data sets for DEG analysis. Patients with no survival information were excluded from the final prognostic gene signature development, and the HPV-positive TCGA-CC data set was used as the training set. The TCGA-HNC and TCGA-BLCA sets were used for validation, and the GEO data sets were used for external validation. Detailed patient characteristics of the TCGA and GEO data sets are summarized in Table 3 and Table S4, respectively.

TABLE 3.

Patient characteristics

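| Parameter | Subgroup | CC (HPV+; n = 269)^b | CC (HPV−; n = 21) | HNC (HPV+; n = 97)^b | HNC (HPV−; n = 421) |
|---|---|---|---|---|---|
| Stage | I | 148 | 11 | 3 | 24 |
| | II | 57 | 6 | 11 | 62 |
| | III | 40 | 1 | 10 | 71 |
| | IV | 18 | 3 | 42 | 223 |
| Age | <55 | 196 | 11 | 37 | 112 |
| | ≥55 | 73 | 10 | 60 | 309 |
| Smoking^a | 1 | 132 | 7 | 28 | 89 |
| | 2 | 53 | 8 | 25 | 150 |
| | 3 | 9 | - | 13 | 60 |
| | 4 | 32 | 5 | 30 | 109 |
| | 5 | 4 | - | - | 2 |
| Gender | Female | | | 11 | 124 |
| | Male | | | 86 | 297 |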
^a 1, lifelong nonsmoker (<100 cigarettes smoked in lifetime); 2, current smoker (includes daily smokers and non-daily/occasional smokers); 3, current reformed smoker for >15 years; 4, current reformed smoker for ≤15 years; 5, current reformed smoker, duration not specified.

^b CC, cervical cancer; HNC, head and neck cancer.

Differentially expressed genes according to HPV infection.

DEGs between HPV-positive and HPV-negative patients were assessed in the TCGA-CC and TCGA-HNC cohorts independently using the SAM method in the R package siggenes 1.60.0 (85). First, DEGs were identified in each cancer; then, DEGs common to both cancers were identified using a false discovery rate (FDR) cutoff of 0.01. Among the common DEGs, genes with a fold change of >1 were defined as common upregulated genes, and genes with a fold change of <1 were defined as common downregulated genes. For external validation, we chose an independent cohort, GSE65858, and identified DEGs common to all three cohorts. The R package gplots 3.0.4 was used to draw the plots.
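The selection of common DEGs can be illustrated with a minimal Python sketch. It is not the authors' R code: it assumes that per-gene fold changes and FDR values have already been computed (for example by SAM), that the FDR cutoff is applied as a strict inequality, and that a common gene must change in the same direction in every cohort. The function name `common_degs` and the gene labels are hypothetical.

```python
def common_degs(cohorts, fdr_cutoff=0.01):
    """Return (common_up, common_down) gene lists shared by all cohorts.

    cohorts: list of dicts mapping gene -> (fold_change, fdr).
    A gene counts as up (fold change > 1) or down (fold change < 1)
    only if it passes the FDR cutoff in that cohort.
    """
    up_sets, down_sets = [], []
    for stats in cohorts:
        # Keep only genes that pass the FDR cutoff in this cohort.
        significant = {g: fc for g, (fc, fdr) in stats.items() if fdr < fdr_cutoff}
        up_sets.append({g for g, fc in significant.items() if fc > 1})
        down_sets.append({g for g, fc in significant.items() if fc < 1})
    # Assumption: "common" means same direction in every cohort.
    common_up = set.intersection(*up_sets)
    common_down = set.intersection(*down_sets)
    return sorted(common_up), sorted(common_down)


# Toy values for illustration only (not data from the study).
toy_cc = {"GENE_A": (2.0, 0.001), "GENE_B": (0.5, 0.005),
          "GENE_C": (3.0, 0.2), "GENE_D": (1.5, 0.002)}
toy_hnc = {"GENE_A": (1.8, 0.004), "GENE_B": (0.4, 0.001),
           "GENE_C": (2.5, 0.001), "GENE_D": (0.7, 0.003)}
up_genes, down_genes = common_degs([toy_cc, toy_hnc])
```

Adding a third dictionary to the list corresponds to the external validation step, in which the common set is narrowed to genes shared by all three cohorts.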

Pathway enrichment analysis.

In order to better understand the molecular mechanisms of common DEGs, we performed pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database using NetworkAnalyst 3.0 (https://www.networkanalyst.ca/) (86). FDR of <0.05 was set as the cutoff criterion.

Development of the HPPI risk score.

To identify a prognostic gene signature for HPV-associated cancers, we performed grouped variable selection on the training set (HPV-positive TCGA-CC) using Elastic Net, a regularized high-dimensional Cox regression method that combines the Ridge and least absolute shrinkage and selection operator (Lasso) penalties, as implemented in the R package glmnet 4.0-2. To improve predictive power by reducing the variance of the predictive model, leave-one-out cross validation was used to find the optimal alpha value, which determines the balance between the Ridge and Lasso regression penalties (0 < alpha < 1). Alpha, the mixing parameter, sets how much weight is given to the Ridge (alpha = 0) or Lasso (alpha = 1) penalty.
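How alpha balances the two penalties can be shown with a short Python sketch of the Elastic Net penalty term in the parameterization used by glmnet, lambda × [(1 − alpha)/2 × ‖β‖₂² + alpha × ‖β‖₁]. The overall penalty strength `lam` is not discussed in the text and is introduced here only for illustration. The sketch evaluates the penalty for given coefficients; it does not fit a Cox model.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic Net penalty: lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|)).

    alpha = 0 gives the pure Ridge penalty, alpha = 1 the pure Lasso penalty.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    l1_norm = sum(abs(b) for b in coefs)      # Lasso part
    l2_squared = sum(b * b for b in coefs)    # Ridge part
    return lam * (alpha * l1_norm + 0.5 * (1.0 - alpha) * l2_squared)


# Toy coefficients for illustration only.
toy_coefs = [1.0, -2.0]
ridge_only = elastic_net_penalty(toy_coefs, lam=1.0, alpha=0.0)   # 2.5
lasso_only = elastic_net_penalty(toy_coefs, lam=1.0, alpha=1.0)   # 3.0
mixed = elastic_net_penalty(toy_coefs, lam=1.0, alpha=0.5)        # 2.75
```

Intermediate alpha values keep the Lasso's ability to set coefficients to zero while the Ridge part tends to retain correlated genes together, which is the grouping behavior the method relies on.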

The HPV-related prognostic and predictive indicator (HPPI) risk score was calculated as the sum of the products of the expression value of each selected HPPI gene and its corresponding Cox regression coefficient identified from the training set (HPPI risk score = ∑([gene expression] × [regression coefficient])). The risk score quantifies the likelihood that a patient will experience a particular outcome, such as death. Comparing a patient's risk score with those of patients showing similar clinical characteristics in the validated data helps predict the outcome. We selected the cutoff value for the best risk stratification using the maximal Uno's C index in the common range of training and validation risk scores and stratified patients into high- and low-risk groups (87). Here, the high-risk group denotes patients with a higher risk of death.
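The risk score formula can be written as a minimal Python sketch. The gene names, coefficients, and cutoff below are invented for illustration and are not the published signature. The text does not state whether a score equal to the cutoff is high or low risk, so treating only scores above the cutoff as high risk is an assumption. How missing genes were handled is not described either, so the sketch simply refuses to score a sample that lacks a signature gene.

```python
def hppi_risk_score(expression, coefficients):
    """Sum of (expression value x Cox coefficient) over the signature genes.

    expression: dict gene -> expression value for one patient.
    coefficients: dict gene -> regression coefficient from the training set.
    """
    missing = [gene for gene in coefficients if gene not in expression]
    if missing:
        # Assumption: do not guess values for unmeasured signature genes.
        raise KeyError(f"signature genes missing from sample: {missing}")
    return sum(expression[gene] * coef for gene, coef in coefficients.items())


def stratify(score, cutoff):
    """Assign a risk group; scores above the cutoff are treated as high risk."""
    return "high" if score > cutoff else "low"


# Hypothetical two-gene signature and one toy patient.
toy_coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
toy_patient = {"GENE_A": 4.0, "GENE_B": 2.0, "GENE_C": 9.0}
toy_score = hppi_risk_score(toy_patient, toy_coefficients)   # 0.5*4.0 - 0.25*2.0 = 1.5
toy_group = stratify(toy_score, cutoff=1.0)                  # "high"
```

Because the coefficients are fixed after training, the same function can score any new sample measured on a comparable scale, which is what makes a single shared cutoff possible.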

The C index corresponds to the area under the receiver operating characteristic (ROC) curve (AUC) and indicates the predictive power of a survival prediction model. C index values lie between 0 and 1: a value of 0.5 indicates that a given combination of biomarkers predicts risk no better than chance, and a value of 1 represents perfect accuracy in predicting risk. C index values above 0.9 typically represent biomarker combinations that have the greatest utility in stratifying low- and high-risk patients, whereas values of 0.7 to 0.9 indicate biomarker combinations that have good utility and can potentially be improved (88). The cutoff value selected for risk stratification was called a universal cutoff value because it was applied equally to both the training and validation data sets.
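The idea behind the C index and the cutoff search can be sketched in Python. This is a simplified illustration, not the authors' procedure: it computes Harrell's unweighted concordance, whereas Uno's C index additionally weights pairs by the inverse probability of censoring, and it assumes that candidate cutoffs are the observed scores lying inside a given range (`low`, `high`, standing in for the common range of training and validation scores). All function and argument names are hypothetical.

```python
def harrell_c_index(times, events, scores):
    """Fraction of comparable patient pairs in which the patient who died
    earlier has the higher score (score ties count as half)."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:  # patient i died before j was last seen
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def best_cutoff(times, events, scores, low, high):
    """Scan observed scores in [low, high] and return (cutoff, c_index)
    for the high/low split with the largest concordance."""
    best = None
    for cutoff in sorted({s for s in scores if low <= s <= high}):
        groups = [1 if s > cutoff else 0 for s in scores]
        if len(set(groups)) < 2:
            continue  # a split with an empty group is not a stratification
        c = harrell_c_index(times, events, groups)
        if best is None or c > best[1]:
            best = (cutoff, c)
    return best


# Toy cohort: higher score means earlier death, all deaths observed.
toy_times = [1, 2, 3, 4, 5, 6]
toy_events = [1, 1, 1, 1, 1, 1]
toy_scores = [6, 5, 4, 3, 2, 1]
toy_c = harrell_c_index(toy_times, toy_events, toy_scores)          # 1.0
toy_best = best_cutoff(toy_times, toy_events, toy_scores, 2, 5)     # cutoff 3
```

Dichotomizing the score lowers the attainable concordance because patients within the same group are tied, so the value returned by `best_cutoff` describes the two-group stratification rather than the continuous score.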

Validation of the HPPI risk scoring system.

In order to verify the prognostic value of HPPI in predicting OS, we first applied it to HPV-positive TCGA-HNC, as HNC is known to be related to HPV. Then, we performed the log rank test and Cox regression analysis with HPPI for 31 other cancer data sets in TCGA. Other TCGA data sets were downloaded from the Broad GDAC Firehose (http://gdac.broadinstitute.org/) as previously described, and preprocessed with the same platform and normalized count expression values. Therefore, the cutoff value of HPPI for the training set was applied equally to the other TCGA data sets.

For complete external validation of the HPPI risk scoring system, we finally applied it to GSE39001, GSE65858, and GSE13507. Unlike the TCGA data sets, the GEO data sets were generated on different platforms with differently normalized expression values, so we applied a separate cutoff value, chosen to maximize Uno's C index, to each data set for the best risk stratification.

Statistical analyses.

To predict overall survival (OS), we conducted several survival analyses with the HPPI risk score using the R packages survival 3.1-8, survAUC 1.0-5, and survminer 0.4.6. Event (1) represents death with respect to OS. First, we constructed Kaplan-Meier (KM) plots to describe the survival of patients in the low- and high-risk groups and performed a log rank test to compare the survival distributions between the two groups. Next, we calculated Uno's C index to assess the predictive accuracy of risk stratification with a universal cutoff, which was performed for all patients and for subgroups (stage and age). Finally, we performed univariate and multivariate Cox regression with HPPI risk scores and clinicopathological characteristics (stage and age) to identify predictors that significantly affected survival. All data analyses were performed in R (version 3.6.2), and P < 0.05 was considered to represent a statistically significant difference.
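The first two steps, the Kaplan-Meier estimate and the log rank test, can be illustrated with a self-contained Python sketch. It is a textbook two-group implementation written for this illustration, not the code of the R survival package, and the function names and toy follow-up times are hypothetical. The P value uses the chi-square distribution with 1 degree of freedom, whose upper tail equals erfc(sqrt(x/2)).

```python
import math


def kaplan_meier(times, events):
    """Return [(event_time, survival_probability)] for one group."""
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi_square, p_value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                          # at risk, both groups
        n_a = sum(1 for x, a in zip(times, in_a) if a and x >= t)    # at risk, group A
        d = sum(1 for x, e in zip(times, events) if x == t and e)    # deaths, both groups
        d_a = sum(1 for x, e, a in zip(times, events, in_a) if a and x == t and e)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the deaths in group A at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log rank statistic is undefined for these data")
    chi_square = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2.0))  # chi-square tail, 1 df
    return chi_square, p_value


# Toy follow-up data: event 1 = death, 0 = censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
toy_chi2, toy_p = log_rank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
```

In the toy call, the first group plays the role of a high-risk group whose deaths all precede those of the second group, so the test statistic is large and the P value falls below 0.05.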

Supplementary Material

Supplemental file 1
JVI.02354-20-s0001.pdf (354.7KB, pdf)

ACKNOWLEDGMENTS

This work was supported by National Research Foundation of Korea (NRF) grants 2020R1C1C1011647, NRF-2020R1C1C1003741, NRF-2019R1F1A1061323, and NRF-2018R1A5A2023879.

Author contributions were as follows. J.Y.J. and Y.H.K. initiated the study and guided the work. E.J.K. and M.H. analyzed the experimental data and interpreted the data, supported by all coauthors. J.Y.J., Y.H.K., E.J.K., and M.H. wrote the manuscript with input from all coauthors, and all authors read and approved the final manuscript.

Footnotes

Supplemental material is available online only.

REFERENCES

  • 1. Maxwell JH, Grandis JR, Ferris RL. 2016. HPV-associated head and neck cancer: unique features of epidemiology and clinical management. Annu Rev Med 67:91–101. doi: 10.1146/annurev-med-051914-021907.
  • 2. Harden ME, Munger K. 2017. Human papillomavirus molecular biology. Mutat Res Rev Mutat Res 772:3–12. doi: 10.1016/j.mrrev.2016.07.002.
  • 3. Munger K, Baldwin A, Edwards KM, Hayakawa H, Nguyen CL, Owens M, Grace M, Huh K. 2004. Mechanisms of human papillomavirus-induced oncogenesis. J Virol 78:11451–11460. doi: 10.1128/JVI.78.21.11451-11460.2004.
  • 4. Burd EM. 2003. Human papillomavirus and cervical cancer. Clin Microbiol Rev 16:1–17. doi: 10.1128/cmr.16.1.1-17.2003.
  • 5. Moody CA, Laimins LA. 2010. Human papillomavirus oncoproteins: pathways to transformation. Nat Rev Cancer 10:550–560. doi: 10.1038/nrc2886.
  • 6. McBride AA. 2017. Mechanisms and strategies of papillomavirus replication. Biol Chem 398:919–927. doi: 10.1515/hsz-2017-0113.
  • 7. Graham SV. 2017. The human papillomavirus replication cycle, and its links to cancer progression: a comprehensive review. Clin Sci (Lond) 131:2201–2221. doi: 10.1042/CS20160786.
  • 8. Stubenrauch F, Laimins LA. 1999. Human papillomavirus life cycle: active and latent phases. Semin Cancer Biol 9:379–386. doi: 10.1006/scbi.1999.0141.
  • 9. Pal A, Kundu R. 2019. Human papillomavirus E6 and E7: the cervical cancer hallmarks and targets for therapy. Front Microbiol 10:3116. doi: 10.3389/fmicb.2019.03116.
  • 10. Oyervides-Munoz MA, Perez-Maya AA, Rodriguez-Gutierrez HF, Gomez-Macias GS, Fajardo-Ramirez OR, Trevino V, Barrera-Saldana HA, Garza-Rodriguez ML. 2018. Understanding the HPV integration and its progression to cervical cancer. Infect Genet Evol 61:134–144. doi: 10.1016/j.meegid.2018.03.003.
  • 11. Uyar D, Rader J. 2014. Genomics of cervical cancer and the role of human papillomavirus pathobiology. Clin Chem 60:144–146. doi: 10.1373/clinchem.2013.212985.
  • 12. Noman ASM, Parag RR, Rashid MI, Rahman MZ, Chowdhury AA, Sultana A, Jerin C, Siddiqua A, Rahman L, Shirin A, Nayeem J, Mahmud R, Akther S, Shil RK, Hossain I, Alam S, Chowdhury A, Basher SB, Hasan A, Bithy S, Aklima J, Rahman M, Chowdhury N, Banu T, Karakas B, Yeger H, Farhat WA, Islam SS. 2020. Widespread expression of Sonic hedgehog (Shh) and Nrf2 in patients treated with cisplatin predicts outcome in resected tumors and are potential therapeutic targets for HPV-negative head and neck cancer. Ther Adv Med Oncol 12:1758835920911229. doi: 10.1177/1758835920911229.
  • 13. Durzynska J, Lesniewicz K, Poreba E. 2017. Human papillomaviruses in epigenetic regulations. Mutat Res Rev Mutat Res 772:36–50. doi: 10.1016/j.mrrev.2016.09.006.
  • 14. Duensing S, Munger K. 2004. Mechanisms of genomic instability in human cancer: insights from studies with human papillomavirus oncoproteins. Int J Cancer 109:157–162. doi: 10.1002/ijc.11691.
  • 15. Borcoman E, Le Tourneau C. 2017. Pembrolizumab in cervical cancer: latest evidence and clinical usefulness. Ther Adv Med Oncol 9:431–439. doi: 10.1177/1758834017708742.
  • 16. Gillison ML, Koch WM, Capone RB, Spafford M, Westra WH, Wu L, Zahurak ML, Daniel RW, Viglione M, Symer DE, Shah KV, Sidransky D. 2000. Evidence for a causal association between human papillomavirus and a subset of head and neck cancers. J Natl Cancer Inst 92:709–720. doi: 10.1093/jnci/92.9.709.
  • 17. Berman TA, Schiller JT. 2017. Human papillomavirus in cervical cancer and oropharyngeal cancer: one cause, two diseases. Cancer 123:2219–2229. doi: 10.1002/cncr.30588.
  • 18. Gillison ML, Chaturvedi AK, Anderson WF, Fakhry C. 2015. Epidemiology of human papillomavirus-positive head and neck squamous cell carcinoma. J Clin Oncol 33:3235–3242. doi: 10.1200/JCO.2015.61.6995.
  • 19. Marur S, D'Souza G, Westra WH, Forastiere AA. 2010. HPV-associated head and neck cancer: a virus-related cancer epidemic. Lancet Oncol 11:781–789. doi: 10.1016/S1470-2045(10)70017-6.
  • 20. Chaturvedi AK, Anderson WF, Lortet-Tieulent J, Curado MP, Ferlay J, Franceschi S, Rosenberg PS, Bray F, Gillison ML. 2013. Worldwide trends in incidence rates for oral cavity and oropharyngeal cancers. J Clin Oncol 31:4550–4559. doi: 10.1200/JCO.2013.50.3870.
  • 21. Bansal A, Singh MP, Rai B. 2016. Human papillomavirus-associated cancers: a growing global problem. Int J Appl Basic Med Res 6:84–89. doi: 10.4103/2229-516X.179027.
  • 22. Bhatia A, Burtness B. 2015. Human papillomavirus-associated oropharyngeal cancer: defining risk groups and clinical trials. J Clin Oncol 33:3243–3250. doi: 10.1200/JCO.2015.61.2358.
  • 23. Budach V, Tinhofer I. 2019. Novel prognostic clinical factors and biomarkers for outcome prediction in head and neck cancer: a systematic review. Lancet Oncol 20:e313–e326. doi: 10.1016/S1470-2045(19)30177-9.
  • 24. O'Sullivan B, Huang SH, Su J, Garden AS, Sturgis EM, Dahlstrom K, Lee N, Riaz N, Pei X, Koyfman SA, Adelstein D, Burkey BB, Friborg J, Kristensen CA, Gothelf AB, Hoebers F, Kremer B, Speel EJ, Bowles DW, Raben D, Karam SD, Yu E, Xu W. 2016. Development and validation of a staging system for HPV-related oropharyngeal cancer by the International Collaboration on Oropharyngeal cancer Network for Staging (ICON-S): a multicentre cohort study. Lancet Oncol 17:440–451. doi: 10.1016/S1470-2045(15)00560-4.
  • 25. Ang KK, Harris J, Wheeler R, Weber R, Rosenthal DI, Nguyen-Tan PF, Westra WH, Chung CH, Jordan RC, Lu C, Kim H, Axelrod R, Silverman CC, Redmond KP, Gillison ML. 2010. Human papillomavirus and survival of patients with oropharyngeal cancer. N Engl J Med 363:24–35. doi: 10.1056/NEJMoa0912217.
  • 26. Fakhry C, Westra WH, Li S, Cmelak A, Ridge JA, Pinto H, Forastiere A, Gillison ML. 2008. Improved survival of patients with human papillomavirus-positive head and neck squamous cell carcinoma in a prospective clinical trial. J Natl Cancer Inst 100:261–269. doi: 10.1093/jnci/djn011.
  • 27. Pak K, Oh SO, Goh TS, Heo HJ, Han ME, Jeong DC, Lee CS, Sun H, Kang J, Choi S, Lee S, Kwon EJ, Kang JW, Kim YH. 2020. A user-friendly, web-based integrative tool (ESurv) for survival analysis: development and validation study. J Med Internet Res 22:e16084. doi: 10.2196/16084.
  • 28. Wintergerst L, Selmansberger M, Maihoefer C, Schuttrumpf L, Walch A, Wilke C, Pitea A, Woischke C, Baumeister P, Kirchner T, Belka C, Ganswindt U, Zitzelsberger H, Unger K, Hess J. 2018. A prognostic mRNA expression signature of four 16q24.3 genes in radio(chemo)therapy-treated head and neck squamous cell carcinoma (HNSCC). Mol Oncol 12:2085–2101. doi: 10.1002/1878-0261.12388.
  • 29. Bersani C, Mints M, Tertipis N, Haeggblom L, Sivars L, Ahrlund-Richter A, Vlastos A, Smedberg C, Grun N, Munck-Wikland E, Nasman A, Ramqvist T, Dalianis T. 2017. A model using concomitant markers for predicting outcome in human papillomavirus positive oropharyngeal cancer. Oral Oncol 68:53–59. doi: 10.1016/j.oraloncology.2017.03.007.
  • 30. Xie J, Zeng L. 2010. Group variable selection methods and their applications in analysis of genomic data, p 231–248. In Frontiers in computational and systems biology. Springer, London, UK. doi: 10.1007/978-1-84996-196-7_12.
  • 31. Le DT, Durham JN, Smith KN, Wang H, Bartlett BR, Aulakh LK, Lu S, Kemberling H, Wilt C, Luber BS, Wong F, Azad NS, Rucki AA, Laheru D, Donehower R, Zaheer A, Fisher GA, Crocenzi TS, Lee JJ, Greten TF, Duffy AG, Ciombor KK, Eyring AD, Lam BH, Joe A, Kang SP, Holdhoff M, Danilova L, Cope L, Meyer C, Zhou S, Goldberg RM, Armstrong DK, Bever KM, Fader AN, Taube J, Housseau F, Spetzler D, Xiao N, Pardoll DM, Papadopoulos N, Kinzler KW, Eshleman JR, Vogelstein B, Anders RA, Diaz LA, Jr. 2017. Mismatch repair deficiency predicts response of solid tumors to PD-1 blockade. Science 357:409–413. doi: 10.1126/science.aan6733.
  • 32. Doran SL, Stevanovic S, Adhikary S, Gartner JJ, Jia L, Kwong MLM, Faquin WC, Hewitt SM, Sherry RM, Yang JC, Rosenberg SA, Hinrichs CS. 2019. T-cell receptor gene therapy for human papillomavirus-associated epithelial cancers: a first-in-human, phase I/II study. J Clin Oncol 37:2759–2768. doi: 10.1200/JCO.18.02424.
  • 33. Quayle SN, Girgis N, Thapa DR, Merazga Z, Kemp MM, Histed A, Zhao F, Moreta M, Ruthardt P, Hulot S, Nelson A, Kraemer LD, Beal DR, Witt L, Ryabin J, Soriano J, Haydock M, Spaulding E, Ross JF, Kiener PA, Almo S, Chaparro R, Seidel R, Suri A, Cemerski S, Pienta KJ, Simcox ME. 2020. CUE-101, a novel E7-pHLA-IL2-Fc fusion protein, enhances tumor antigen-specific T-cell activation for the treatment of HPV16-driven malignancies. Clin Cancer Res 26:1953–1964. doi: 10.1158/1078-0432.CCR-19-3354.
  • 34. Reiner A, Yekutieli D, Benjamini Y. 2003. Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19:368–375. doi: 10.1093/bioinformatics/btf877.
  • 35. Costa RL, Boroni M, Soares MA. 2018. Distinct co-expression networks using multi-omic data reveal novel interventional targets in HPV-positive and negative head-and-neck squamous cell cancer. Sci Rep 8:15254. doi: 10.1038/s41598-018-33498-5.
  • 36. Desrichard A, Kuo F, Chowell D, Lee KW, Riaz N, Wong RJ, Chan TA, Morris LGT. 2018. Tobacco smoking-associated alterations in the immune microenvironment of squamous cell carcinomas. J Natl Cancer Inst 110:1386–1392. doi: 10.1093/jnci/djy060.
  • 37. Rachidi S, Wallace K, Day TA, Alberg AJ, Li Z. 2014. Lower circulating platelet counts and antiplatelet therapy independently predict better outcomes in patients with head and neck squamous cell carcinoma. J Hematol Oncol 7:65. doi: 10.1186/s13045-014-0065-5.
  • 38. Sen P, Ghosal S, Hazra R, Arega S, Mohanty R, Kulkarni KK, Budhwar R, Ganguly N. 2020. Transcriptomic analyses of gene expression by CRISPR knockout of miR-214 in cervical cancer cells. Genomics 112:1490–1499. doi: 10.1016/j.ygeno.2019.08.020.
  • 39. Tusher VG, Tibshirani R, Chu G. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98:5116–5121. doi: 10.1073/pnas.091062498.
  • 40. Doherty GJ, Petruzzelli M, Beddowes E, Ahmad SS, Caldas C, Gilbertson RJ. 2019. Cancer treatment in the genomic era. Annu Rev Biochem 88:247–280. doi: 10.1146/annurev-biochem-062917-011840.
  • 41. Kim YH, Jeong DC, Pak K, Goh TS, Lee CS, Han ME, Kim JY, Liangwen L, Kim CD, Jang JY, Cha W, Oh SO. 2017. Gene network inherent in genomic big data improves the accuracy of prognostic prediction for cancer patients. Oncotarget 8:77515–77526. doi: 10.18632/oncotarget.20548.
  • 42. Janiaud P, Serghiou S, Ioannidis JPA. 2019. New clinical trial designs in the era of precision medicine: an overview of definitions, strengths, weaknesses, and current use in oncology. Cancer Treat Rev 73:20–30. doi: 10.1016/j.ctrv.2018.12.003.
  • 43. Sparano JA, Gray RJ, Makower DF, Pritchard KI, Albain KS, Hayes DF, Geyer CE, Jr, Dees EC, Goetz MP, Olson JA, Jr, Lively T, Badve SS, Saphner TJ, Wagner LI, Whelan TJ, Ellis MJ, Paik S, Wood WC, Ravdin PM, Keane MM, Gomez Moreno HL, Reddy PS, Goggins TF, Mayer IA, Brufsky AM, Toppmeyer DL, Kaklamani VG, Berenberg JL, Abrams J, Sledge GW, Jr. 2018. Adjuvant chemotherapy guided by a 21-gene expression assay in breast cancer. N Engl J Med 379:111–121. doi: 10.1056/NEJMoa1804710.
  • 44. Poorvu PD, Gelber SI, Rosenberg SM, Ruddy KJ, Tamimi RM, Collins LC, Peppercorn J, Schapira L, Borges VF, Come SE, Warner E, Jakubowski DM, Russell C, Winer EP, Partridge AH. 2020. Prognostic impact of the 21-gene recurrence score assay among young women with node-negative and node-positive ER-positive/HER2-negative breast cancer. J Clin Oncol 38:725–733. doi: 10.1200/JCO.19.01959.
  • 45. Croce S, Lesluyes T, Valle C, M'Hamdi L, Thebault N, Perot G, Stoeckle E, Noel JC, Fontanges Q, Devouassoux-Shisheboran M, Querleu D, Guyon F, Floquet A, Chakiba C, Mayeur L, Rebier F, MacGrogan GM, Soubeyran I, Le Guellec S, Chibon F. 2020. The NanoCind signature is an independent prognosticator of recurrence and death in uterine leiomyosarcomas. Clin Cancer Res 26:855–861. doi: 10.1158/1078-0432.CCR-19-2891.
  • 46. Drukker CA, Bueno-de-Mesquita JM, Retèl VP, Harten WH, Tinteren H, Wesseling J, Roumen RMH, Knauer M, Veer LJ, Sonke GS, Rutgers EJT, Vijver MJ, Linn SC. 2013. A prospective evaluation of a breast cancer prognosis signature in the observational RASTER study. Int J Cancer 133:929–936. doi: 10.1002/ijc.28082.
  • 47. Soliman H, Shah V, Srkalovic G, Mahtani R, Levine E, Mavromatis B, Srinivasiah J, Kassar M, Gabordi R, Qamar R, Untch S, Kling HM, Treece T, Audeh W. 2020. MammaPrint guides treatment decisions in breast Cancer: results of the IMPACt trial. BMC Cancer 20:81. doi: 10.1186/s12885-020-6534-z.
  • 48. Wuerstlein R, Kates R, Gluz O, Grischke EM, Schem C, Thill M, Hasmueller S, Kohler A, Otremba B, Griesinger F, Schindlbeck C, Trojan A, Otto F, Knauer M, Pusch R, Harbeck N, WSG-PRIMe investigators in Germany, Austria, Switzerland. 2019. Strong impact of MammaPrint and BluePrint on treatment decisions in luminal early breast cancer: results of the WSG-PRIMe study. Breast Cancer Res Treat 175:389–399. doi: 10.1007/s10549-018-05075-x.
  • 49. Slodkowska EA, Ross JS. 2009. MammaPrint 70-gene signature: another milestone in personalized medical care for breast cancer patients. Expert Rev Mol Diagn 9:417–422. doi: 10.1586/erm.09.32.
  • 50. Kunz G. 2011. Use of a genomic test (MammaPrint) in daily clinical practice to assist in risk stratification of young breast cancer patients. Arch Gynecol Obstet 283:597–602. doi: 10.1007/s00404-010-1454-9.
  • 51. Carlson JJ, Roth JA. 2013. The impact of the Oncotype Dx breast cancer assay in clinical practice: a systematic review and meta-analysis. Breast Cancer Res Treat 141:13–22. doi: 10.1007/s10549-013-2666-z.
  • 52. Tonella L, Giannoccaro M, Alfieri S, Canevari S, De Cecco L. 2017. Gene expression signatures for head and neck cancer patient stratification: are results ready for clinical application? Curr Treat Options Oncol 18:32. doi: 10.1007/s11864-017-0472-2.
  • 53. Makarov V, Gorlin A. 2019. Meta-analysis of gene expression for development and validation of a diagnostic biomarker panel for Oral Squamous Cell Carcinoma. Comput Biol Chem 82:74–79. doi: 10.1016/j.compbiolchem.2019.06.008.
  • 54. Qian X, Nguyen DT, Dong Y, Sinikovic B, Kaufmann AM, Myers JN, Albers AE, Graviss EA. 2019. Prognostic score predicts survival in HPV-negative head and neck squamous cell cancer patients. Int J Biol Sci 15:1336–1344. doi: 10.7150/ijbs.33329.
  • 55. Weitzman MD, Lilley CE, Chaurushiya MS. 2010. Genomes in conflict: maintaining genome integrity during virus infection. Annu Rev Microbiol 64:61–81. doi: 10.1146/annurev.micro.112408.134016.
  • 56. Holcomb AJ, Brown L, Tawfik O, Madan R, Shnayder Y, Thomas SM, Wallace NA. 2020. DNA repair gene expression is increased in HPV positive head and neck squamous cell carcinomas. Virology 548:174–181. doi: 10.1016/j.virol.2020.07.004.
  • 57. Giglia-Mari G, Zotter A, Vermeulen W. 2011. DNA damage response. Cold Spring Harb Perspect Biol 3:a000745. doi: 10.1101/cshperspect.a000745.
  • 58. Rawlinson SM, Zhao T, Rozario AM, Rootes CL, McMillan PJ, Purcell AW, Woon A, Marsh GA, Lieu KG, Wang LF, Netter HJ, Bell TDM, Stewart CR, Moseley GW. 2018. Viral regulation of host cell biology by hijacking of the nucleolar DNA-damage response. Nat Commun 9:3057. doi: 10.1038/s41467-018-05354-7.
  • 59. Nilsson K, Wu C, Schwartz S. 2018. Role of the DNA damage response in human papillomavirus RNA splicing and polyadenylation. Int J Mol Sci 19:1735. doi: 10.3390/ijms19061735.
  • 60. Gautam D, Moody CA. 2016. Impact of the DNA damage response on human papillomavirus chromatin. PLoS Pathog 12:e1005613. doi: 10.1371/journal.ppat.1005613.
  • 61. Anacker DC, Moody CA. 2017. Modulation of the DNA damage response during the life cycle of human papillomaviruses. Virus Res 231:41–49. doi: 10.1016/j.virusres.2016.11.006.
  • 62. Moody CA, Laimins LA. 2009. Human papillomaviruses activate the ATM DNA damage pathway for viral genome amplification upon differentiation. PLoS Pathog 5:e1000605. doi: 10.1371/journal.ppat.1000605.
  • 63. Gillespie KA, Mehta KP, Laimins LA, Moody CA. 2012. Human papillomaviruses recruit cellular DNA repair and homologous recombination factors to viral replication centers. J Virol 86:9520–9526. doi: 10.1128/JVI.00247-12.
  • 64. Anacker DC, Gautam D, Gillespie KA, Chappell WH, Moody CA. 2014. Productive replication of human papillomavirus 31 requires DNA repair factor Nbs1. J Virol 88:8528–8544. doi: 10.1128/JVI.00517-14.
  • 65. Lakin ND, Jackson SP. 1999. Regulation of p53 in response to DNA damage. Oncogene 18:7644–7655. doi: 10.1038/sj.onc.1203015.
  • 66. Spriggs CC, Laimins LA. 2017. Human papillomavirus and the DNA damage response: exploiting host repair pathways for viral replication. Viruses 9:232. doi: 10.3390/v9080232.
  • 67. Alsaleem MA, Ball G, Toss MS, Raafat S, Aleskandarany M, Joseph C, Ogden A, Bhattarai S, Rida PCG, Khani F, Davis M, Elemento O, Aneja R, Ellis IO, Green A, Mongan NP, Rakha E. 2020. A novel prognostic two-gene signature for triple negative breast cancer. Mod Pathol 33:2208–2220. doi: 10.1038/s41379-020-0563-7.
  • 68. Li L, Liu J, Xu M, Yu H, Lv C, Cao F, Wang Z, Fu Y, Zhang M, Meng H, Zhang X, Kang L, Zhang Z, Li J, Feng J, Lian X, Yu L, Zhou J. 2020. Treatment response, survival, safety, and predictive factors to chimeric antigen receptor T cell therapy in Chinese relapsed or refractory B cell acute lymphoblast leukemia patients. Cell Death Dis 11:207. doi: 10.1038/s41419-020-2388-1.
  • 69. Goh TS, Lee JS, Il Kim J, Park YG, Pak K, Jeong DC, Oh SO, Kim YH. 2019. Prognostic scoring system for osteosarcoma using network-regularized high-dimensional Cox-regression analysis and potential therapeutic targets. J Cell Physiol 234:13851–13857. doi: 10.1002/jcp.28065.
  • 70. Pak K, Kim YH, Suh S, Goh TS, Jeong DC, Kim SJ, Kim IJ, Han ME, Oh SO. 2019. Development of a risk scoring system for patients with papillary thyroid cancer. J Cell Mol Med 23:3010–3015. doi: 10.1111/jcmm.14208.
  • 71. Sun H, Lin W, Feng R, Li H. 2014. Network-regularized high-dimensional Cox regression for analysis of genomic data. Stat Sin 24:1433–1459. doi: 10.5705/ss.2012.317.
  • 72. Subramanian J, Simon R. 2013. Overfitting in prediction models—is it a problem only in high dimensions? Contemp Clin Trials 36:636–641. doi: 10.1016/j.cct.2013.06.011.
  • 73. Kim K, Zakharkin SO, Allison DB. 2010. Expectations, validity, and reality in gene expression profiling. J Clin Epidemiol 63:950–959. doi: 10.1016/j.jclinepi.2010.02.018.
  • 74. Pereira JM, Basto M, Silva AFd. 2016. The logistic Lasso and Ridge regression in predicting corporate failure. Procedia Economics and Finance 39:634–641. doi: 10.1016/S2212-5671(16)30310-0.
  • 75. Ajana S, Acar N, Bretillon L, Hejblum BP, Jacqmin-Gadda H, Delcourt C, BLISAR Study Group. 2019. Benefits of dimension reduction in penalized regression methods for high-dimensional grouped data: a case study in low sample size. Bioinformatics 35:3628–3634. doi: 10.1093/bioinformatics/btz135.
  • 76. Benner A, Zucknick M, Hielscher T, Ittrich C, Mansmann U. 2010. High-dimensional Cox models: the choice of penalty as part of the model building process. Biom J 52:50–69. doi: 10.1002/bimj.200900064.
  • 77. Tran TP, Ong E, Hodges AP, Paternostro G, Piermarocchi C. 2014. Prediction of kinase inhibitor response using activity profiling, in vitro screening, and elastic net regression. BMC Syst Biol 8:74. doi: 10.1186/1752-0509-8-74.
  • 78. Guo W, Zhu L, Zhu R, Chen Q, Wang Q, Chen JQ. 2019. A four-DNA methylation biomarker is a superior predictor of survival of patients with cutaneous melanoma. Elife 8:e44310. doi: 10.7554/eLife.44310.
  • 79. Yan X, Fu X, Guo ZX, Liu XP, Liu TZ, Li S. 2020. Construction and validation of an eight-gene signature with great prognostic value in bladder cancer. J Cancer 11:1768–1779. doi: 10.7150/jca.38741.
  • 80. Kim SH, Joung JY, Chung J, Park WS, Lee KH, Seo HK. 2014. Detection of human papillomavirus infection and p16 immunohistochemistry expression in bladder cancer with squamous differentiation. PLoS One 9:e93525. doi: 10.1371/journal.pone.0093525.
  • 81. Jorgensen KR, Jensen JB. 2020. Human papillomavirus and urinary bladder cancer revisited. APMIS 128:72–79. doi: 10.1111/apm.13016.
  • 82. Yang S, Wu Y, Wang S, Xu P, Deng Y, Wang M, Liu K, Tian T, Zhu Y, Li N, Zhou L, Dai Z, Kang H. 2020. HPV-related methylation-based reclassification and risk stratification of cervical cancer. Mol Oncol 14:2124–2141. doi: 10.1002/1878-0261.12709.
  • 83. Lohavanichbutr P, Mendez E, Holsinger FC, Rue TC, Zhang Y, Houck J, Upton MP, Futran N, Schwartz SM, Wang P, Chen C. 2013. A 13-gene signature prognostic of HPV-negative OSCC: discovery and external validation. Clin Cancer Res 19:1197–1203. doi: 10.1158/1078-0432.CCR-12-2647.
  • 84.Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A. 2013. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res 41:D991–D995. doi: 10.1093/nar/gks1193. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Pak K, Suh S, Goh TS, Kim SJ, Oh SO, Seok JW, Kim IJ, Kim YH. 2019. BRAF-positive multifocal and unifocal papillary thyroid cancer show different messenger RNA expressions. Clin Endocrinol (Oxf) 90:601–607. doi: 10.1111/cen.13928. [DOI] [PubMed] [Google Scholar]
  • 86.Zhou G, Soufan O, Ewald J, Hancock REW, Basu N, Xia J. 2019. NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis. Nucleic Acids Res 47:W234–W241. doi: 10.1093/nar/gkz240. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Ha M, Jeong H, Roh JS, Lee B, Han ME, Oh SO, Sohn DH, Kim YH. 2019. DYSF expression in clear cell renal cell carcinoma: a retrospective study of 2 independent cohorts. Urol Oncol 37:735–741. doi: 10.1016/j.urolonc.2019.07.007. [DOI] [PubMed] [Google Scholar]
  • 88.Aung MT, Yu Y, Ferguson KK, Cantonwine DE, Zeng L, McElrath TF, Pennathur S, Mukherjee B, Meeker JD. 2019. Prediction and associations of preterm birth and its subtypes with eicosanoid enzymatic pathways and inflammatory markers. Sci Rep 9:17049. doi: 10.1038/s41598-019-53448-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplemental file 1
JVI.02354-20-s0001.pdf (354.7KB, pdf)

Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)