Skip to main content
Disease Markers logoLink to Disease Markers
. 2021 Oct 19;2021:5340240. doi: 10.1155/2021/5340240

Identification of Epithelial-Mesenchymal Transition- (EMT-) Related LncRNA for Prognostic Prediction and Risk Stratification in Esophageal Squamous Cell Carcinoma

Peipei Wang 1, Yueyun Chen 1, Yue Zheng 1, Yang Fu 1, Zhenyu Ding 1,
PMCID: PMC8548124  PMID: 34712369

Abstract

Background

Epithelial-mesenchymal transition (EMT) is significantly associated with the invasion and development of esophageal squamous cell carcinoma (ESCC). However, the importance of EMT-related long noncoding RNA (lncRNA) is little known in ESCC.

Methods

GSE53624 (N = 119) and GSE53622 (N = 60) datasets retrieved from the Gene Expression Omnibus (GEO) database were used as training and external validation cohorts, respectively. GSE53624 and GSE53622 datasets were all sampled from China. Then, the prognostic value of EMT-related lncRNA was comprehensively investigated by weighted coexpression network analysis (WGCNA) and COX regression model.

Results

High expression of PLA2G4E-AS1, AC063976.1, and LINC01592 significantly correlated with the favorable overall survival (OS) of ESCC patients, and LINC01592 had the greatest contribution to OS. Importantly, ESCC patients were divided into low- and high-risk groups based on the optimal cut-off value of risk score estimated by the multivariate COX regression model of these three lncRNA. Patients with high risk had a shorter OS rate and restricted mean survival time (RMST) than those with low risk. Moreover, univariate and multivariate COX regression revealed that risk stratification, age, and TNM were independent prognostic predictors, which were used to construct a nomogram model for individualized and visualized prognosis prediction of ESCC patients. The calibration curves and time-dependent ROC curves in the training and validation cohorts suggested that the nomogram model had a good performance. Interestingly, clear trends indicated that risk score positively correlated with tumor microenvironment (TME) scores and immune checkpoints TIGIT, CTLA4, and BTLA. In addition, the Kyoto Encyclopedia of Genes and Genomes (KEGG) showed that PLA2G4E-AS1, AC063976.1, and LINC01592 were primarily associated with TNF signaling pathway, NF-kappa B signaling pathway, and ECM-receptor interaction.

Conclusion

We developed EMT-related lncRNA PLA2G4E-AS1, AC063976.1, and LINC01592 for prognostic prediction and risk stratification of Chinese ESCC patients, which might provide deep insight for personalized prognosis prediction in Chinese ESCC patients and be potential biomarkers for designing novel therapy.

1. Introduction

Esophageal cancer is one of the top ten malignant tumors globally and the sixth leading cause of cancer-related deaths. Esophageal cancer is mainly prevalent in eastern Asia and eastern and southern Africa [1, 2]. Esophageal squamous cell carcinoma (ESCC) is the most common histological subtype of esophageal cancer, accounting for about 90%. ESCC has a high tendency to be aggressive and metastatic, as well as a high chance of recurrence. Even with multimodality treatments (surgery, radiotherapy, chemotherapy, and targeted therapy), the prognosis of ESCC patients remains poor [3]. Insight into the molecules and mechanisms behind ESCC invasion and metastasis helps deepen the understanding of the disease. It is also urgent to discover novel biomarkers to develop new therapeutic strategies and improve the prognosis of ESCC patients.

Long noncoding RNA (LncRNA) is a group of evolutionarily conserved RNA molecules with more than 200 nucleotides in length, lacking protein-coding ability [4]. Abnormal expression of LncRNA plays a vital role in the occurrence and development of various tumors [59], including ESCC [10, 11]. Many studies have shown that lncRNA plays multiple roles in malignant behaviors such as tumor formation, invasion, migration, and immunogenicity. LncRNA BRCAT54 inhibits the tumorigenesis of non-small -ell lung cancer by binding to RPS9 to regulate JAK-STAT and calcium pathways [12]. Moreover, lncRNA LINC00472 regulates cell stiffness and inhibits the migration and invasion of lung adenocarcinoma by binding to YBX1 [13]. In addition, a study found that LIMIT may be a target for cancer immunotherapy [14]. LncRNA can participate in the occurrence and development of tumors by affecting chromatin remodeling, histone modification, DNA methylation, gene transcription, translation, etc.

Notably, the effect of lncRNA on the epithelial-mesenchymal transition (EMT) of tumor cells has also been confirmed in recent studies [15, 16]. EMT is an essential step in the metastasis of malignant tumors, which can transform epithelial-like cells into a mesenchymal-like cell state. By modifying the adhesion molecules expressed in cells, EMT reduces the adhesion ability of epithelial-derived tumor cells, thereby causing epithelial cells to separate from each other, increasing the metastatic potential of tumor cells, and further resisting antitumor treatments [17, 18]. Accordingly, EMT plays a crucial role in the metastasis of epithelial tumors. Few studies have explored the prognostic value of EMT-related lncRNA in cancer patients [19]. The role of EMT-related lncRNA in ESCC and related mechanisms is little known.

This study comprehensively investigated and validated the prognostic value of EMT-related lncRNA in Chinese ESCC patients from the Gene Expression Omnibus (GEO) database by weighted gene coexpression network analysis (WGCNA), COX proportional hazard regression, and Kaplan-Meier survival analysis. Furthermore, risk stratification and a nomogram model were constructed to personalize and visualize the overall survival (OS) rates of ESCC patients. Additionally, the correlation between weighted EMT-related lncRNA with tumor microenvironment (TME) scores, immune checkpoints (ICs), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways was further explored.

2. Materials and Methods

2.1. ESCC Patients

The transcriptome data of 119 and 60 newly diagnosed ESCC patients in the GSE53624 and GSE53622 datasetswere downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/), assigned as training and validation cohorts, respectively. GSE53624 and GSE53622 datasets were all sampled from China. The clinical characteristics, including age, gender, tumor invasion depth (T), lymph node metastasis (N), tumor node metastasis (TNM) staging, tumor grade, overall survival (OS) time, and events, were obtained and listed in Table S1. The workflow of data analysis in this study was performed according to Figure 1. Since the GEO database is publicly available, no approval from the local ethics committee was required.

Figure 1.

Figure 1

Schematic diagram of the study. The transcriptome sequencing data and clinical information of patients with esophageal squamous cell carcinoma (ESCC) were downloaded from the Gene Expression Omnibus (GEO) database. Different datasets were assigned as training and validation cohorts for weighted coexpression network analysis (WGCNA) and univariate COX proportional hazards regression analysis. Then, the epithelial-mesenchymal transition- (EMT-) related lncRNA determined by multivariate COX regression analysis was used for risk stratification and constructing a nomogram model. Finally, the pathways of EMT-related lncRNA and the correlation between lncRNA and tumor microenvironment (TME) or immune checkpoints (ICs) were explored.

2.2. Acquisition of lncRNA and EMT-Related mRNA

A total of 17,936 lncRNA (version 35) were downloaded from the GENCODE database (https://www.gencodegenes.org/human/) [20]. Furthermore, the 50 hallmark gene sets and the “HALLMARK_EPITHELIAL_MESENCHYMAL_ TRANSITION” gene list, including 200 EMT-related mRNA, were obtained from the Gene Set Enrichment Analysis (GSEA) database (http://www.gsea-msigdb.org/gsea/msigdb/collections.jsp#H) [21, 22].

2.3. Weighted Gene Coexpression Network Analysis (WGCNA)

As an unsupervised machine learning, WGCNA was applied for investigating the correlation between genes. The R package “WGCNA” in R software (version 4.0.2, https://www.r-project.org/) was used to construct a weighted coexpression network between the lncRNA and EMT-related mRNA [23]. In the network, the pairwise Pearson coefficient was used to evaluate the coexpression weight among all genes. The power β of the soft threshold was used to confirm a scale-free network. Notably, the genes with similar expression patterns were clustered into the same color module in the unsupervised coexpression network.

2.4. Nomogram Model

The R packages “foreign” and “rms” in R software (version 4.0.2, https://www.r-project.org/) were used to construct a nomogram model to personalize and visualize the OS rate of ESCC patients [24, 25]. Each variable was assigned a point according to the nomogram model. Then, the total points were obtained by summing the points of all variables for determining the OS rate of an ESCC patient. Finally, time-dependent receiver operating characteristic (ROC) and calibration curves of the training and validation cohort were used to evaluate the accuracy of the nomogram model that predicted the OS rate.

2.5. Estimation of Tumor Microenvironment (TME) Score

The ESTIMATE algorithm was used to calculate the fraction of immune and stromal cells in ESCC tissues based on gene expression levels [26]. The ESTIMATE algorithm was performed using R package “estimate” to calculate TME, immune, and stromal scores in each ESCC patient.

2.6. Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathways

“DOSE,” “http://org.Hs.eg.db,” “topGO,” and “clusterProfiler” packages in R software (version 4.0.2, https://www.r-project.org/) were used to obtain the KEGG pathways of EMT-related lncRNA in ESCC patients.

2.7. Statistical Analysis

All statistical analysis was performed using R software (version 4.0.2, https://www.r-project.org/). A chi-square and Fisher tests were used to compare differences between two groups of categorical variables, as appropriate. Univariate and multivariate Cox proportional hazard regression analysis was performed using package “survival.” The “surv_cutpoint” function in the R package “survminer” was used to determine the optimal cut-point of a gene (Figure S1). Kaplan-Meier curves were compared by the log-rank test. The R package “survRM2” was used to obtain the restricted mean survival time (RMST). Correlation coefficients between two quantitative variables were obtained by Pearson's method. The “survivalROC” package determined the area under the curve (AUC) in the time-dependent ROC curve. A two-tailed P value <0.05 and a P value <0.1 were considered statistically significant and a clear trend, respectively.

3. Results

3.1. WGCNA for lncRNA and EMT-Related mRNA in ESCC Patients

As shown in Table S1, the clinical characteristics were balanced between GSE53624 and GSE53622 datasets (P > 0.05). To identify EMT-related lncRNA in ESCC patients, 188 EMT-related mRNA and 5,506 lncRNA were included in the construction of WGCNA, and the workflow for data analysis was shown in Figure 1. The soft thresholds for building a scale-free network of training and validation cohorts were set to 4 and 5, respectively (Figures 2(a) and 2(b)). Then, a total of 3,900 lncRNA were coexpressed with EMT-related mRNA in the training cohort, which was distributed in 6 modules, including black, blue, brown, green, grey, and turquoise modules (Figure 2(a)). Moreover, 3,742 lncRNA and EMT-related mRNA were showed in 11 coexpression modules, including black, blue, green, greenyellow, grey, magenta, pink, purple, red, turquoise, and yellow, in the validation cohort (Figure 2(b)). Therefore, 3,900 and 3,742 EMT-related lncRNA in the training and validation cohorts, respectively, were used for the following univariate and multivariate COX regression analysis. Notably, the percent of overlapping EMT-related lncRNA between training and validation cohorts was 78.1% (3,045/3,900) and 81.4% (3,045/3,742), respectively.

Figure 2.

Figure 2

Construction of WGCNA between lncRNA and EMT-related genes in the training (a) and validation (b) cohorts. The soft threshold was obtained when the scale-free topology model was fitted to 0.85 (a). The dendrogram was constructed by hierarchical clustering of coexpressed genes (b). The different colors below the dendrogram represented the modules corresponding to the coexpression of lncRNA and EMT-related genes.

3.2. Univariate and Multivariate COX Regression Analysis

After univariate COX regression analysis, 338 and 169 EMT-related lncRNA were significantly associated with the OS of ESCC patients in the training and validation cohorts, respectively (P < 0.05, Figure 3(a)). To further confirm that EMT-related lncRNA had prognostic value in both cohorts, lncRNA with a hazard ratio (HR) > 1 or HR<1 in the univariate regression model was overlapped between the training and validation cohorts, and the results showed that 3 EMT-related lncRNA expression, including PLA2G4E-AS1, AC063976.1, and LINC01592, significantly correlated with the favorable OS of ESCC patients (HR < 1, Figure 3(b)). Then, multivariate COX regression analysis was used for weighted combination of AC063976.1, LINC01592, and PLA2G4E-AS1, which indicated that LINC01592 contributed the greatest to the OS of ESCC patients (coefficient = −0.54). Notably, the time-dependent ROC curve results demonstrated that the multivariate COX regression model performs well in the training cohort (AUC > 0.60, Figure 3(c)). This finding was confirmed in the validation cohort (AUC ≥ 0.70, Figure 3(c)).

Figure 3.

Figure 3

Univariate and multivariate COX regression analysis of EMT-related lncRNA in the training and validation cohorts. (a) The heatmaps displayed the hazard ratio (HR) and 95% confidence interval (CI) of EMT-related lncRNA with P < 0.05 by univariate COX regression analysis in the training (a) and validation (c) cohorts. Blue to red in the color scale represented HR from low to high. HR.95 L: the low value in the 95% confidence interval of HR; HR.95H: the high value in the 95% confidence interval of HR. (b) The overlapping number of EMT-related lncRNA with HR < 1 (a) or HR > 1 (c) in the training and validation cohorts. (c) Multivariate COX regression analysis of EMT-related lncRNA. The contribution of EMT-related lncRNA to the overall survival (OS) of ESCC patients in the training cohort (upper panel). The time-dependent receiver operator characteristic (ROC) curves of the training (b) and validation (d) cohorts were used to evaluate the performance of the multivariate COX regression model.

3.3. Establishment of Risk Stratification for ESCC Patients

To establish a risk stratification for ESCC patients, we first obtain the risk score based on the coefficients of multivariate COX regression, and the formula for calculating the risk score was as follows: risk score = −0.45 × (expression level of AC063976.1) − 0.54 × (expression level of LINC01592) − 0.23 × (expression level of PLA2G4E − AS1) (Figure 3(c)). Based on the optimal prognostic cut-point of risk score -6.57, ESCC patients were divided into high- and low-risk groups. ESCC patients with high risk were significantly associated with poor OS in the training cohort (HR = 3.70, 95% confidence interval (CI): 2.05 to 6.66, P < 0.001, Figure 4(a)). This result was confirmed in the validation cohort (HR = 2.54, 95% CI: 1.27 to 5.07, P = 0.007, Figure 4(c)). Furthermore, high-risk ESCC patients had a shorter RMST than low-risk patients in the training cohort (4-year RMST: 25 (95% CI: 22 to 29) vs. 40 (95% CI: 36 to 44) months) (Figure 4(a)). This result was again confirmed in the validation cohort (4-year RMST: 24 (95% CI: 17 to 31) vs. 37 (95% CI: 32 to 42) months) (Figure 4(c)). Interestingly, The Kaplan-Meier curves indicated that high expression of AC063976.1, LINC01592, and PLA2G4E-AS1 correlated with the favorable OS of ESCC patients in both the training and validation cohorts (P < 0.05, Figures 4(b) and 4(d)). Importantly, risk stratification was an independent prognostic predictor for ESCC patients by univariate and multivariate COX regression analysis in the training cohort (HR = 3.89, 95% CI: 2.13 to 7.11, P < 0.001, Table 1). This was again confirmed in the validation cohort (HR = 2.73, 95% CI: 1.29 to 5.78, P = 0.008, Table 1). In order to determine whether risk stratification has prognostic significance in a random population, Microsoft Excel 2016 was further used to randomly select a portion of samples in each dataset as a training cohort and then treat the other samples as a validation cohort. Patients with a high-risk score were associated with a poor OS in the training cohort of the GSE53624 dataset (HR = 2.67, 95% CI: 1.21 to 5.91, P = 0.012). This result was confirmed in the validation cohort of GSE53624 dataset (HR = 5.16, 95% CI: 2.13 to 12.47, P < 0.001) (Figure 5(a)). Interestingly, high-risk patients had a shorter OS than low-risk patients in the training cohort of the GSE53622 dataset; although, it is not statistically significant at that point (HR = 2.46, 95% CI: 0.91 to 6.66, P = 0.067). This finding was again confirmed in the validation cohort of GSE53622 dataset (HR = 2.52, 95% CI: 0.96 to 6.62, P = 0.054) (Figure 5(b)). This might be due to the small sample size in the GSE53622 dataset. To confirm the risk stratification that could better predict prognosis in which part of the population of ESCC patients, we conducted a subgroup analysis. There is a clear trend in the training and validation cohorts that risk stratification can better predict the prognosis in patients with TNM III/IV stage, N1-3, T3-4, male, and >60 years old. The prognosis can be predicted well through risk stratification regardless of the tumor grade (Figure 6).

Figure 4.

Figure 4

The weighted combination of AC063976.1, LINC01592, and PLA2G4E-AS1 was used for risk stratification of ESCC patients. (a, c) Kaplan-Meier curves (a) and 4-year restricted mean survival time (RMST) (b)–(d) for low- and high-risk groups of ESCC patients in the training (a) and validation (c) cohorts. (b, d) Kaplan-Meier curves were plotted based on the expression levels of AC063976.1 (a), LINC01592 (b), and PLA2G4E-AS1 (c, d) in training (b) and validation (d) cohorts.

Table 1.

Uni- and multivariate regression analysis in ESCC patients.

Variables Univariate COX regression Multivariate COX regression
Training cohort Validation cohort Training cohort Validation cohort
HR (95% CI) P value HR (95% CI) P value HR (95% CI) P value HR (95% CI) P value
Risk stratification
 Low risk Reference Reference Reference Reference
 High risk 3.70 (2.05, 6.66) < 0.001 2.54 (1.27, 5.07) 0.009 3.89 (2.13, 7.11) < 0.001 2.73 (1.29, 5.78) 0.008
Age, years
 ≤ 60 Reference Reference Reference Reference
 >60 1.58 (1.00, 2.50) 0.051 1.97 (0.98, 3.97) 0.057 1.77 (1.10, 2.85) 0.018 1.95 (0.95, 4.00) 0.067
Gender
 Female Reference Reference
 Male 0.83 (0.47, 1.46) 0.513 0.71 (0.31, 1.64) 0.424
T
 T1-2 Reference Reference
 T3-4 0.96 (0.57, 1.64) 0.891 1.55 (0.60, 4.01) 0.369
N
 N0 Reference Reference Reference Reference
 N1-3 2.16 (1.32, 3.54) 0.002 2.02 (0.99, 4.12) 0.053 1.11 (0.57, 2.17) 0.758 2.47 (0.66, 9.24) 0.179
TNM stage
 I/II Reference Reference Reference Reference
 III/IV 2.16 (1.32, 3.54) 0.002 2.01 (1.01, 4.01) 0.046 1.91 (0.99, 3.71) 0.055 0.80 (0.22, 2.99) 0.742
Tumor grade
 Well Reference Reference
 Moderate/poor 1.00 (0.56, 1.80) 0.989 2.04 (0.62, 6.70) 0.239

CI: confidence interval; HR: hazard ratio; N: lymph node metastasis; T: tumor invasion depth.

Figure 5.

Figure 5

The risk stratification was performed in GSE53624 (a) and GSE53622 (b) datasets based on the combination of AC063976.1, LINC01592, and PLA2G4E-AS1. Microsoft Excel 2016 was used to randomly select a portion of samples in each dataset as training cohort (a) and then treat the other samples as validation cohort (b).

Figure 6.

Figure 6

Subgroup analysis of risk stratification in different tumor grades, the tumor, node, metastasis (TNM) stage, lymph node metastasis (N), tumor invasion depth (T), gender, and age in the training and validation cohorts. CI: confident interval; HR: hazard ratio.

3.4. Construction of a Nomogram Visualizing and Personalizing the OS Rate of ESCC Patients

Univariate and multivariate COX regression analysis was used to identify independent prognosis factors for constructing a nomogram model. In addition to risk stratification, age and TNM stage were independent prognostic factors for ESCC patients in both the training and validation cohorts (HR > 1, P < 0.1, Table 1). Accordingly, a nomogram model constructed by the risk stratification, age, and TNM stage could visualize and personalize the 1-, 2-, 3-, and 4-year OS rates of ESCC patients (Figure 7(a)). Details of the points for the variables and OS rates in the nomogram model were listed in Table S2. The time-dependent ROC and calibration curves were further used to evaluate the predicted performance of the nomogram model. Notably, the time-dependent ROC curves illustrated that all the AUCs were ≥0.70 in both the training and validation cohorts (Figure 7(b)). Moreover, calibration curves indicated that the 1-, 2-, 3-, and 4-year OS rates predicted in the nomogram model were highly in line with actual observations in both the training and validation cohorts (Figures 7(c) and 7(d)). These results suggested that the nomogram model had good performance in predicting the OS rate of ESCC patients.

Figure 7.

Figure 7

Construction of the nomogram model. (a) The combination of risk stratification, age, and TNM stage visualized and personalized the OS rate of ESCC patients. After a point was assigned to risk stratification, age, and TNM stage of each patient according to the nomogram model, the total points were obtained to predict the OS rate of ESCC patients. (b) The time-dependent receive operating characteristic curve (ROC) curve was used to evaluate the performance of the nomogram model in the training (a, b) and validation (c, d) cohorts. (c d) The calibration curves were used to evaluate the performance of the nomogram model in predicting the OS rate in training (c) and validation (d) cohorts. The criterion for good performance of the nomogram model was that the predicted OS rate was relatively consistent with the actual OS rate. AUC: the area under the curve.

3.5. KEGG Pathways for EMT-Related lncRNA

Based on WGCNA, 57 and 25 EMT-related genes were coexpressed with AC063976.1, LINC01592, or PLA2G4E-AS1 in the training and validation cohorts, respectively (Figure 8(a)). Then, the KEGG was applied to identify the significant pathway associated with the AC063976.1, LINC01592, or PLA2G4E-AS1 in ESCC patients. The results showed that in the training and validation cohorts, a total of 7 and 8 pathways were enriched, respectively. And three overlapped pathways, including TNF signaling pathway, NF-kappa B signaling pathway, and ECM-receptor interaction, were enriched in both cohorts (Figures 8(c) and 8(d)).

Figure 8.

Figure 8

Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways for the EMT-related lncRNA. (a, b) mRNA that coexpressed with EMT-related lncRNA was obtained from WGCNA in the training (a) and validation (b) cohorts. (c, d) The KEGG pathways for the EMT-related lncRNA in training (c) and validation (d) cohorts. The pathways marked in red represented the overlapped pathways enriched by EMT-related lncRNA in both the training and validation cohorts.

3.6. The Risk Score Was Positively Correlated with the TME Score and Immune Checkpoints (ICs)

Ample reports found that EMT-related genes were enriched in TME, whereas little was known about these genes in ESCC. This prompted us to investigate further the relationship between the risk score based on EMT-related lncRNA and TME score and the expression levels of ICs. TME score was composed of stromal score and immune score. The results suggested that risk score was positively correlated with TME score (R = 0.25, P = 0.007). Further analysis found that the risk score had a positive correlation with immune score (R = 0.16, P = 0.081); although, statistical significance was not reached at this point. Because ICs is closely related to immune score, we further analyzed the correlation between risk score and ICs. Notably, risk score had a significantly positive correlation with BTLA (R = 0.26, P = 0.004) and CTLA4 (R = 0.23, P = 0.012). What is more, there was a clear trend suggested that the risk score was positively correlated with TIGIT (R = 0.15, P = 0.093); although, statistical significance was not reached at this point. However, there was no significant correlation between risk score and PD-1, PD-L1, PD-L2, LAG3, and CD276 (P > 0.1). Interestingly, risk score was also positively correlated with stromal scores (R = 0.31, P < 0.001) (Figures 9(a) and 9(b)). We further analyzed the correlation between risk score and cancer-associated fibroblast- (CAFs-) related genes. The results suggested that the risk score had a significant positive correlation with MGP (R = 0.43, P < 0.001), MFAP5 (R = 0.19, P = 0.036), ITGA11 (R = 0.18, P = 0.050), DCN (R = 0.34, P < 0.001), or ACTA2 (R = 0.36, P < 0.001). Moreover, there was a clear trend showing that risk score was positively correlated with COL11A1 (R = 0.16, P = 0.091) and BMP4 (R = 0.17, P = 0.060); although. the data were not yet significant enough at this point. However, there is no significant correlation between risk score and SPHK1, CSPG4, TGFBI, and TNNC1 (P > 0.1) (Figures 9(a) and 9(b)).

Figure 9.

Figure 9

The correlation between risk score estimated by nomogram and tumor microenvironment (TME) and immune checkpoints (ICs) in the training cohort. (a) Distribution of risk score, TME score, ICs, and cancer-associated fibroblast (CAF) related gene expression level. R package “estimate” was used to calculate TME, immune, and stromal scores in each ESCC patient based on gene expression levels. (b) The relationship between risk score and TME, ICs (a) and CAFs (b). The size of the dot represents the size of the correlation coefficient, and the color scale from blue to yellow represents the P value from large to small.

4. Discussion

ESCC is the prevalent histological subtype of esophageal cancer, with a poor prognosis and prone to distant metastasis [3, 27]. Therefore, exploring potential biomarkers is essential for managing and predicting the prognosis of ESCC patients. Increasing evidence suggests that EMT is highly correlated with cancer progression and metastasis [28]. In recent years, a few studies have investigated the role of EMT-related lncRNA in the prognosis and progression of ESCC [29, 30]. However, based on next-generation transcriptome sequencing, a comprehensive assessment of the prognostic importance, risk stratification, and visualization of OS rates by EMT-related lncRNA in ESCC was little known.

In this study, based on the analysis of two large datasets in the GEO database, the results suggested that the high expression of AC063976.1, LINC01592, or PLA2G4E-AS1 was significantly associated with favorable OS in ESCC patients. In addition, KEGG results indicated that AC063976.1, LINC01592, or PLA2G4E-AS1 were mainly enriched in the TNF signaling pathway, NF-kappa B signaling pathway, and ECM-receptor interaction. AC063976.1, as a novel EMT-related lncRNA, has not yet been explored in cancer. For the first time, we revealed that upregulation of AC063976.1 corrected with a favorable OS of ESCC patients. Li et al. reported that LINC01592 was a protective factor for ESCC patients [31], which was consistent with this study. Furthermore, LINC01592 contributed the greatest to the OS of ESCC patients. Although PLA2G4E-AS1 was downregulated in thyroid carcinoma, its prognostic importance in cancer patients has not been elucidated [32]. The results of this study demonstrated that the high expression of PLA2G4E-AS1 could predict favorable OS of ESCC patients. These findings will provide prognostic information for further exploring the functions and mechanisms of AC063976.1, LINC01592, or PLA2G4E-AS1 in ESCC in the future.

Risk stratification plays a vital role in guiding clinical treatment and management of cancer patients [33]. Notably, the risk stratification constructed by a weighted combination of AC063976.1, LINC01592, and PLA2G4E-AS1 divided ESCC patients into low- and high-risk groups, implying it was also an independent prognostic predictor for ESCC patients. Interestingly, a subgroup analysis found that risk stratification was mainly performed in ESCC patients with TNM III/IV stage, N1-3, T3-4, male, or >60 y. Furthermore, the risk stratification could be used regardless of the tumor grade. Notably, a nomogram model established by the risk stratification, age, and TNM stage could display and visualize the 1-, 2-, 3-, and 4-year OS rates of ESCC patients, which might contribute to the management of individualized treatment.

Previous studies reported that TME was associated with EMT in cancers [34]. Moreover, the stromal microenvironment and CAFs play an important role in EMT [3538]. Hence, the relationship between risk scores calculated by AC063976.1, LINC01592, and PLA2G4E-AS1 and TME was further investigated. In this study, the risk score was positively correlated with TME, immune, and stromal scores. Furthermore, the risk score had a significant positive correlation with CAF genes, including MGP, ITGA11, DCN, ACTA2, COL11A1, or BMP4. However, high expression levels of ICs usually lead to T cell exhaustion in cancers [39, 40]. Interestingly, there was also a positive correlation between risk score and ICs, including BTLA, TIGIT, and CTLA4. These results can be interpreted that the antitumor effect of the high level of immune cell infiltration is offset by the strong immunosuppressive pathway activated by upregulated IC proteins [34, 41] and might provide the likelihood of immunotherapy for high-risk ESCC patients.

This study had several limitations: first, the results of the analysis and validation of transcriptome sequencing data in this study were based on publicly available datasets. Therefore, some important clinical information was incomplete, such as treatment options, which might produce potential biases in conclusions. Secondly, this study did not provide additional validation by original ESCC samples from our clinical center. Finally, EMT-related lncRNA should be validated through in vivo and in vitro experiments in the future.

5. Conclusions

We demonstrated that based on risk stratification constructed by PLA2G4E-AS1, AC063976.1, and LINC01592, ESCC patients were divided into low- and high-risk groups. Moreover, a nomogram model established by the risk stratification, age, and TNM stage could display and visualize the 1-, 2-, 3-, and 4-year OS rates of Chinese ESCC patients. In addition, the risk score was positively correlated with the TME score, ICs, and CAFs. These findings might provide deep insight for personalized prognosis prediction by EMT-related lncRNA in Chinese ESCC patients, and the three lncRNAs might be potential biomarkers for designing novel therapy.

Acknowledgments

This work was supported by the Clinical Research Incubation Project, West China Hospital, Sichuan University (2020HXFH056).

Abbreviations

AUC:

Area under the curve

CI:

Confidence interval

EMT:

Epithelial-mesenchymal transition

ESCC:

Esophageal squamous cell carcinoma

GEO:

Gene expression omnibus

GSEA:

Gene set enrichment analysis

HR:

Hazard ratio

IC:

Immune checkpoint

KEGG:

Kyoto Encyclopedia of Genes and Genomes

lncRNA:

Long noncoding RNA

OS:

Overall survival

RMST:

Restricted mean survival time

ROC:

Receiver operating characteristic

TME:

Tumor microenvironment

TNM:

Tumor invasion depth, lymph node metastasis, tumor node metastasis

WGCNA:

Weighted coexpression network analysis.

Data Availability

The original contributions presented in the study are included in the article/Supplementary Material, and further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors have declared that no competing interest exists.

Supplementary Materials

Supplementary Materials

Figure S1: the optimal cut-points for risk score, AC063976.1, LINC01592, and PLA2G4E-AS1 in the training (a) and validation (b) cohorts. Table S1: clinical characteristics of ESCC patients. Table S2: the points for the nomogram model.

References

  • 1.Sung H., Ferlay J., Siegel R. L., et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: a Cancer Journal for Clinicians . 2021;71(3):209–249. doi: 10.3322/caac.21660. [DOI] [PubMed] [Google Scholar]
  • 2.Lagergren J., Smyth E., Cunningham D., Lagergren P. Oesophageal cancer. Lancet . 2017;390(10110):2383–2396. doi: 10.1016/s0140-6736(17)31462-9. [DOI] [PubMed] [Google Scholar]
  • 3.Chen S. B., Weng H. R., Wang G., et al. Prognostic factors and outcome for patients with esophageal squamous cell carcinoma underwent surgical resection alone: evaluation of the seventh edition of the American joint committee on cancer staging system for esophageal squamous cell carcinoma. Journal of Thoracic Oncology . 2013;8(4):495–501. doi: 10.1097/JTO.0b013e3182829e2c. [DOI] [PubMed] [Google Scholar]
  • 4.Ma L., Bajic V. B., Zhang Z. On the classification of long non-coding RNAs. RNA Biology . 2013;10(6):924–933. doi: 10.4161/rna.24604. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Chen C. T., Wang P. P., Mo W. J., et al. Expression profile analysis of prognostic long non-coding RNA in adult acute myeloid leukemia by weighted gene co-expression network analysis (WGCNA) Journal of Cancer . 2019;10(19):4707–4718. doi: 10.7150/jca.31234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Xu W., Li K., Song C., et al. Knockdown of lncRNA LINC01234 suppresses the tumorigenesis of liver cancer via sponging miR-513a-5p. Frontiers in Oncology . 2020;10, article 571565 doi: 10.3389/fonc.2020.571565. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Lv Y., Wang Y., Song Y., et al. LncRNA PINK1-AS promotes Gαi1-driven gastric cancer tumorigenesis by sponging microRNA-200a. Oncogene . 2021;40(22):3826–3844. doi: 10.1038/s41388-021-01812-7. [DOI] [PubMed] [Google Scholar]
  • 8.Chang K. C., Diermeier S. D., Yu A. T., et al. _MaTAR25_ lncRNA regulates the _Tensin1_ gene to impact breast cancer progression. Nature Communications . 2020;11(1):p. 6438. doi: 10.1038/s41467-020-20207-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Wu Q.-N., Luo X.-J., Liu J., et al. MYC-activated LncRNAMNX1-AS1Promotes the progression of colorectal cancer by stabilizing YB1. Cancer Research . 2021;81(10):2636–2650. doi: 10.1158/0008-5472.can-20-3747. [DOI] [PubMed] [Google Scholar]
  • 10.Zhang C., Xie L., Fu Y., Yang J., Cui Y. lncRNA MIAT promotes esophageal squamous cell carcinoma progression by regulating miR-1301-3p/INCENP axis and interacting with SOX2. Journal of Cellular Physiology . 2020;235(11):7933–7944. doi: 10.1002/jcp.29448. [DOI] [PubMed] [Google Scholar]
  • 11.Li X., Wu Z., Mei Q., et al. Long non-coding RNA HOTAIR, a driver of malignancy, predicts negative prognosis and exhibits oncogenic activity in oesophageal squamous cell carcinoma. British Journal of Cancer . 2013;109(8):2266–2278. doi: 10.1038/bjc.2013.548. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Yang W., Qian Y., Gao K., et al. LncRNA BRCAT54 inhibits the tumorigenesis of non-small cell lung cancer by binding to RPS9 to transcriptionally regulate JAK-STAT and calcium pathway genes. Carcinogenesis . 2021;42(1):80–92. doi: 10.1093/carcin/bgaa051. [DOI] [PubMed] [Google Scholar]
  • 13.Deng X., Xiong W., Jiang X., et al. LncRNA LINC00472 regulates cell stiffness and inhibits the migration and invasion of lung adenocarcinoma by binding to YBX1. Cell Death & Disease . 2020;11(11):p. 945. doi: 10.1038/s41419-020-03147-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Li G., Kryczek I., Nam J., et al. LIMIT is an immunogenic lncRNA in cancer immunity and immunotherapy. Nature Cell Biology . 2021;23(5):526–537. doi: 10.1038/s41556-021-00672-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Jarroux J., Foretek D., Bertrand C., et al. HOTAIR lncRNA promotes epithelial-mesenchymal transition by redistributing LSD1 at regulatory chromatin regions. EMBO Reports . 2021;22(7, article e50193) doi: 10.15252/embr.202050193. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Wang P., Chen W., Ma T., et al. lncRNA lnc-TSI inhibits metastasis of clear cell renal cell carcinoma by suppressing TGF-β-induced epithelial-mesenchymal transition. Mol Ther Nucleic Acids. . 2020;22:1–16. doi: 10.1016/j.omtn.2020.08.003. [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]
  • 17.Taki M., Abiko K., Ukita M., et al. Tumor immune microenvironment during epithelial-Mesenchymal transition. Clinical Cancer Research . 2021;27(17):4669–4679. doi: 10.1158/1078-0432.ccr-20-4459. [DOI] [PubMed] [Google Scholar]
  • 18.Erin N., Grahovac J., Brozovic A., Efferth T. Tumor microenvironment and epithelial mesenchymal transition as targets to overcome tumor multidrug resistance. Drug Resistance Updates . 2020;53, article 100715 doi: 10.1016/j.drup.2020.100715. [DOI] [PubMed] [Google Scholar]
  • 19.Liu C., Hu C., Li J., Jiang L., Zhao C. Identification of epithelial-mesenchymal transition-related lncRNAs that associated with the prognosis and immune microenvironment in colorectal cancer. Frontiers in Molecular Biosciences . 2021;8, article 633951 doi: 10.3389/fmolb.2021.633951. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Frankish A., Diekhans M., Ferreira A. M., et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Research . 2019;47(D1):D766–d773. doi: 10.1093/nar/gky955. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Subramanian A., Tamayo P., Mootha V. K., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America . 2005;102(43):15545–15550. doi: 10.1073/pnas.0506580102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Mootha V. K., Lindgren C. M., Eriksson K. F., et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics . 2003;34(3):267–273. doi: 10.1038/ng1180. [DOI] [PubMed] [Google Scholar]
  • 23.Langfelder P., Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics . 2008;9(1):p. 559. doi: 10.1186/1471-2105-9-559. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Dong D., Fang M. J., Tang L., et al. Deep learning radiomic nomogram can predict the number of lymph node metastasis in locally advanced gastric cancer: an international multicenter study. Annals of Oncology . 2020;31(7):912–920. doi: 10.1016/j.annonc.2020.04.003. [DOI] [PubMed] [Google Scholar]
  • 25.Wang P., Fu Y., Chen Y., et al. Nomogram personalizes and visualizes the overall survival of patients with triple-negative breast cancer based on the immune genome. BioMed Research International . 2020;2020 doi: 10.1155/2020/4029062.4029062 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Yoshihara K., Shahmoradgoli M., Martínez E., et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nature Communications . 2013;4(1):p. 2612. doi: 10.1038/ncomms3612. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Zhang D., Qian C., Wei H., Qian X. Identification of the prognostic value of tumor microenvironment-related genes in esophageal squamous cell carcinoma. Frontiers in Molecular Biosciences . 2020;7, article 599475 doi: 10.3389/fmolb.2020.599475. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Tian Y., Qi P., Niu Q., Hu X. Combined snail and E-cadherin predicts overall survival of cervical carcinoma patients: comparison among various epithelial-mesenchymal transition proteins. Frontiers in Molecular Biosciences . 2020;7:p. 22. doi: 10.3389/fmolb.2020.00022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Gao G. D., Liu X. Y., Lin Y., Liu H. F., Zhang G. J. LncRNA CASC9 promotes tumorigenesis by affecting EMT and predicts poor prognosis in esophageal squamous cell cancer. European Review for Medical and Pharmacological Sciences . 2018;22(2):422–429. doi: 10.26355/eurrev_201801_14191. [DOI] [PubMed] [Google Scholar]
  • 30.Ren P., Zhang H., Chang L., Hong X. D., Xing L. LncRNA NR2F1-AS1 promotes proliferation and metastasis of ESCC cells via regulating EMT. European Review for Medical and Pharmacological Sciences . 2020;24(7):3686–3693. doi: 10.26355/eurrev_202004_20831. [DOI] [PubMed] [Google Scholar]
  • 31.Li W., Liu J., Zhao H. Identification of a nomogram based on long non-coding RNA to improve prognosis prediction of esophageal squamous cell carcinoma. Aging (Albany NY) . 2020;12(2):1512–1526. doi: 10.18632/aging.102697. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Zhang Y., Jin T., Shen H., Yan J., Guan M., Jin X. Identification of long non-coding RNA expression profiles and co-expression genes in thyroid carcinoma based on the Cancer Genome Atlas (TCGA) Database. Medical Science Monitor . 2019;25:9752–9769. doi: 10.12659/msm.917845. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Tallman M. S., Wang E. S., Altman J. K., et al. Acute myeloid leukemia, version 3.2019, NCCN clinical practice guidelines in oncology. Journal of the National Comprehensive Cancer Network . 2019;17(6):721–749. doi: 10.6004/jnccn.2019.0028. [DOI] [PubMed] [Google Scholar]
  • 34.Zhong W., Zhang F., Huang C., Lin Y., Huang J. Identification of epithelial-mesenchymal transition-related lncRNA with prognosis and molecular subtypes in clear cell renal cell carcinoma. Frontiers in Oncology . 2020;10, article 591254 doi: 10.3389/fonc.2020.591254. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Wu X., Ruan L., Yang Y., Mei Q. Analysis of gene expression changes associated with human carcinoma-associated fibroblasts in non-small cell lung carcinoma. Biological Research . 2017;50(1):p. 6. doi: 10.1186/s40659-017-0108-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Wen S., Qu N., Ma B., et al. Cancer-associated fibroblasts positively correlate with dedifferentiation and aggressiveness of thyroid cancer. Oncotargets and Therapy . 2021;Volume 14:1205–1217. doi: 10.2147/ott.s294725. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Nurmik M., Ullmann P., Rodriguez F., Haan S., Letellier E. In search of definitions: cancer-associated fibroblasts and their markers. International Journal of Cancer . 2020;146(4):895–905. doi: 10.1002/ijc.32193. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Calon A., Lonardo E., Berenguer-Llergo A., et al. Stromal gene expression defines poor-prognosis subtypes in colorectal cancer. Nature Genetics . 2015;47(4):320–329. doi: 10.1038/ng.3225. [DOI] [PubMed] [Google Scholar]
  • 39.Chen C., Liang C., Wang S., et al. Expression patterns of immune checkpoints in acute myeloid leukemia. Journal of Hematology & Oncology . 2020;13(1):p. 28. doi: 10.1186/s13045-020-00853-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Chen C., Xu L., Gao R., et al. Transcriptome-based co-expression of BRD4 and PD-1/PD-L1 predicts poor overall survival in patients with acute myeloid leukemia. Frontiers in Pharmacology . 2021;11, article 582955 doi: 10.3389/fphar.2020.582955. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Yao Y., Wang Z. C., Yu D., Liu Z. Role of allergen-specific T-follicular helper cells in immunotherapy. Current Opinion in Allergy and Clinical Immunology . 2018;18(6):495–501. doi: 10.1097/aci.0000000000000480. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Materials

Figure S1: the optimal cut-points for risk score, AC063976.1, LINC01592, and PLA2G4E-AS1 in the training (a) and validation (b) cohorts. Table S1: clinical characteristics of ESCC patients. Table S2: the points for the nomogram model.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material, and further inquiries can be directed to the corresponding authors.


Articles from Disease Markers are provided here courtesy of Wiley

RESOURCES