Scientific Reports. 2018 Jul 2;8:9912. doi: 10.1038/s41598-018-28299-9

Identification of Biomarkers Based on Differentially Expressed Genes in Papillary Thyroid Carcinoma

Jun Han 1,#, Meijun Chen 1,#, Yihan Wang 2, Boxuan Gong 3, Tianwei Zhuang 4, Lingyu Liang 5, Hong Qiao 1,
PMCID: PMC6028435  PMID: 29967488

Abstract

The incidence of papillary thyroid carcinoma (PTC) is increasing rapidly throughout the world. Hence, there is an urgent need to identify more specific and sensitive biomarkers for exploring the pathogenesis of PTC. In this study, three pairs of stage I PTC tissues and matched normal adjacent tissues were sequenced by RNA-Seq, and 719 differentially expressed genes (DEGs) were screened. KEGG pathway enrichment analyses indicated that the DEGs were significantly enriched in 28 pathways. A total of 18 nodes comprising 20 DEGs were identified among the top 10% of nodes (by degree) in the integrated KEGG network. The functions of these DEGs were further analysed by GO. Thirteen selected genes were examined by qRT-PCR in 16 stage I PTC patients and in The Cancer Genome Atlas (TCGA) database. The interactions between the validated DEGs were analysed using protein-protein interaction networks and chromosome localizations. Finally, four newly discovered genes, COMP, COL3A1, ZAP70, and CD247, were found by Spearman's correlation analyses in the TCGA database to be related to PTC clinical phenotypes. These four DEGs might be promising biomarkers for early-stage PTC and provide an experimental foundation for further exploration of the pathogenesis of early-stage PTC.

Introduction

Thyroid carcinoma is the most common malignancy of the endocrine system. Papillary thyroid carcinoma (PTC) is its most common pathological type, accounting for approximately 80% of all thyroid carcinomas1. Its incidence has grown rapidly throughout the world over the past few decades2,3. PTC patients diagnosed at late stages have a five-year survival rate below 60%, and recurrence rates as high as 30% have been reported4. Hence, there is an urgent need to identify more specific and sensitive biomarkers for exploring the pathogenesis of PTC. Known candidates include mutations in the telomerase reverse transcriptase (TERT) promoter, TP53, BRAF, and RAS, as well as other gene mutations that can be used to explore the pathogenesis of thyroid cancer. Molecular markers of genetic and epigenetic changes, and the molecular pathways they affect, can also help in developing targeted therapies5, so identifying PTC-related molecular markers is important for exploring the pathogenesis of PTC.

Tumour-related biomarkers take a variety of forms, including pathological, epigenetic, protein, DNA, and RNA biomarkers. mRNAs, which serve as templates for protein translation, can also be used as biomarkers for exploring the pathogenesis of cancer6. Garcia and colleagues reported that the level of cyclin D1 mRNA in plasma may serve as a marker of clinical outcomes in breast cancer7, and March-Villalba reported that hTERT mRNA was a useful noninvasive tumour marker for the molecular diagnosis of prostate cancer8.

The most commonly used methods for detecting mRNA levels include Northern blotting, the polymerase chain reaction (PCR), RNA in situ hybridization, cDNA microarrays, and high-throughput sequencing. RNA sequencing (RNA-Seq) has become a widely accepted method for measuring gene expression levels9. Compared with gene chips or other sequencing techniques, it provides a more comprehensive way to map and quantify transcriptomes10,11. Although RNA-Seq generates massive amounts of data, bioinformatic methods can analyse these data comprehensively, systematically, and accurately. It is therefore possible to identify key genes associated with human disease from the high-throughput data this technique produces.

The development of bioinformatics, the emergence of various public databases, and the application of analytical strategies have provided powerful tools for identifying and analysing differentially expressed genes. The Gene Ontology (GO) database is currently the most widely used annotation system for gene functions and gene products12. It supports functional enrichment analyses of target genes and provides a better understanding of the relationships between genes and diseases. The KEGG database combines genetic information with functional information and can be used to systematically analyse the relationships between gene functions and enriched pathways13. Protein-protein interaction (PPI) network analysis is also widely used in data processing; it represents the interactions between proteins intuitively, which helps assess the interactions between the corresponding genes more accurately14.

In this study, we performed RNA-Seq and used bioinformatic methods to identify differentially expressed genes (DEGs) in stage I PTC tissues vs. matched normal adjacent tissues. The TCGA database and qRT-PCR were used for double validation. The interactions between DEGs were analysed using protein-protein interaction networks and chromosome localizations. Finally, four newly discovered genes, COMP, COL3A1, ZAP70, and CD247, were found by Spearman's correlation analyses in the TCGA database to be related to PTC clinical phenotypes. The expression level of COMP was significantly and positively correlated with tumour size: the higher the expression, the larger the tumour. In addition, the expression levels of COL3A1, COMP, and ZAP70 were positively correlated with lymph node metastasis. Furthermore, COL3A1 and COMP expression levels were correlated with TNM stage in PTC patients (negatively for COL3A1 and positively for COMP). These four DEGs might be promising biomarkers for early-stage PTC and provide an experimental foundation for further exploration of the pathogenesis of early-stage PTC.

Results

Screening of DEGs based on RNA-Seq

To discover novel genes related to the pathogenesis of papillary thyroid carcinoma (PTC) through differentially expressed genes (DEGs), we selected female patients with stage I PTC. First, three patients who met these criteria were enrolled, a reasonable number for initial RNA-Seq experiments15,16, and their three pairs of stage I PTC tissues and matched normal adjacent tissues were sequenced by RNA-Seq. We then collected as many additional samples as possible: 17 further patients who met the same criteria were enrolled, but tissues from four of them could not be used for qRT-PCR because of RNA degradation, leaving 13. In total, 16 patients were included in this study (Table 1).

Table 1.

Clinical information on 16 PTC patients.

No. Gender Age (years) Tumour diameter (cm) TNM classification Stage
1 female 41 1.00 T1N1M0 Stage I
2 female 39 1.30 T1N1M0 Stage I
3 female 37 2.00 T1N1M0 Stage I
4 female 38 1.40 T1N1M0 Stage I
5 female 57 1.20 T1N0M0 Stage I
6 female 45 0.80 T1N0M0 Stage I
7 female 41 1.50 T1N0M0 Stage I
8 female 42 2.00 T1N1M0 Stage I
9 female 29 1.50 T1N1M0 Stage I
10 female 42 1.20 T1N0M0 Stage I
11 female 47 0.80 T1N0M0 Stage I
12 female 54 0.90 T1N0M0 Stage I
13 female 61 1.80 T1N0M0 Stage I
14 female 26 1.20 T1N0M0 Stage I
15 female 49 0.70 T1N0M0 Stage I
16 female 37 2.00 T4N1M0 Stage I

We obtained 9–11 million reads for each sample after RNA sequencing (RNA-Seq). After removing genes with reads per kilobase of transcript per million mapped reads (RPKM) values <0.5, a total of 13,703 unique genes were detected. We calculated the differences in RPKM values and the fold changes between cancer samples and matched normal adjacent samples, and classified genes with an RPKM difference >10 and a fold change >1.5 as DEGs. Based on this definition, there were 456 upregulated [$E_{ca}(i)/E_{adj}(i) > 1.5$, $i = 1, 2, 3$] and 263 downregulated [$E_{adj}(i)/E_{ca}(i) > 1.5$, $i = 1, 2, 3$] genes, where $E_{ca}(i)$ and $E_{adj}(i)$ denote the expression of a gene in the cancer sample and the matched adjacent sample of pair $i$. These 719 DEGs were regarded as candidate genes for further study; their expression levels are shown as a heat map in Fig. 1A. The results of haematoxylin and eosin (HE) staining of the three sequenced stage I PTC tissues are shown in Fig. 2.
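The screening rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the gene names and RPKM values are hypothetical, the RPKM difference is taken in the direction of the change (cancer minus adjacent for upregulation), the 0.5 detection filter is applied to the maximum RPKM across all six samples, and both thresholds must hold in every one of the three pairs.

```python
from typing import Dict, List, Tuple

MIN_RPKM = 0.5   # detection threshold (assumption: applied to the maximum over samples)
MIN_DIFF = 10.0  # minimum RPKM difference within a pair
MIN_FOLD = 1.5   # minimum fold change within a pair

Pair = Tuple[float, float]  # (E_ca(i), E_adj(i)) for one cancer/adjacent pair


def is_up(ca: float, adj: float) -> bool:
    # ca > MIN_FOLD * adj is E_ca/E_adj > 1.5 rewritten to avoid dividing by zero
    return ca - adj > MIN_DIFF and ca > MIN_FOLD * adj


def is_down(ca: float, adj: float) -> bool:
    return adj - ca > MIN_DIFF and adj > MIN_FOLD * ca


def classify_gene(pairs: List[Pair]) -> str:
    """Return 'up', 'down' or 'none' for one gene measured in every pair."""
    if max(v for pair in pairs for v in pair) < MIN_RPKM:
        return "none"  # not detected in any sample
    if all(is_up(ca, adj) for ca, adj in pairs):
        return "up"
    if all(is_down(ca, adj) for ca, adj in pairs):
        return "down"
    return "none"


# Hypothetical RPKM values for pairs i = 1, 2, 3
expression: Dict[str, List[Pair]] = {
    "GENE_A": [(40.0, 10.0), (55.0, 20.0), (30.0, 12.0)],
    "GENE_B": [(5.0, 30.0), (8.0, 25.0), (2.0, 15.0)],
    "GENE_C": [(12.0, 10.0), (50.0, 20.0), (30.0, 12.0)],  # fails in pair 1
    "GENE_D": [(0.1, 0.2), (0.3, 0.0), (0.4, 0.1)],        # below detection
}

calls = {gene: classify_gene(pairs) for gene, pairs in expression.items()}
print(calls)  # {'GENE_A': 'up', 'GENE_B': 'down', 'GENE_C': 'none', 'GENE_D': 'none'}
```

Requiring the rule in all three pairs, rather than on averaged values, mirrors the "$i = 1, 2, 3$" condition stated in the text.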

Figure 1.

Hierarchical clustering and significantly enriched KEGG pathways of differentially expressed genes. (A) Heat map of DEG expression levels; numbers, sample number; ca, cancer tissue; adj, adjacent normal tissue; exp, gene expression values. The expression level of each gene is represented by a colour range from blue (low) to yellow (high). (B) Significantly enriched KEGG pathways of upregulated genes. (C) Significantly enriched KEGG pathways of downregulated genes.

Figure 2.

Haematoxylin and eosin staining of papillary thyroid carcinoma (PTC) tissues. (A–C) The three PTC samples; 40×, 100×, and 200×, magnification of the observed field.

KEGG signalling pathway enrichment analysis

To investigate DEG-related pathways and reveal potential mechanisms of PTC, we performed KEGG pathway enrichment analyses. The 456 upregulated genes were significantly enriched in 15 pathways and the 263 downregulated genes in 13 pathways (p < 0.05). The significantly enriched KEGG pathways are shown in Fig. 1B,C; the lower the p value, the more significant the enrichment. The upregulated DEGs were involved in multiple tumorigenesis-related pathways, including thyroid cancer, small cell lung cancer, bladder cancer, the p53 signalling pathway, cell adhesion molecules, focal adhesion, adherens junctions, and extracellular matrix (ECM)-receptor interactions. The downregulated genes were significantly involved in KEGG pathways related to autoimmune thyroid disease, natural killer cell-mediated cytotoxicity, cytokine-cytokine receptor interactions, chemokine signalling, B-cell receptor signalling, the intestinal immune network for IgA production, systemic lupus erythematosus, viral myocarditis, asthma, and type I diabetes mellitus.

Integrated KEGG pathway regulatory networks

To further screen DEGs and identify the relationships between genes and diseases, we extracted the relationships between all genes in the 28 KEGG pathways enriched by the DEGs and constructed an integrated KEGG pathway regulatory network.

The network included 857 nodes and 1,224 edges. Each node represented a KEGG data object, which was the product of one or more genes, and each edge represented a relationship between nodes. Different colours represented different pathways; the ten pathways with fewer nodes were uniformly coloured grey. Nodes with more than one colour appeared in more than one pathway. Square nodes contained DEGs, whereas round nodes did not. Node size reflected node degree: the larger the node, the higher the degree (Fig. 3).

Figure 3.

The integrated KEGG pathway regulatory network. Different colours represent different pathways; square nodes represent nodes that include differentially expressed genes (DEGs), and round nodes represent nodes that do not.

Because the network was too large to extract further key information directly, we selected the nodes that contained DEGs and whose degrees ranked in the top 10% of the network. A total of 18 nodes containing 16 upregulated and four downregulated genes were identified (Supplementary Table S1). The upregulated genes were CTNNB1, HRAS, FN1, CCND1, C3, LAMA5, LAMB1, LAMB3, COL1A1, COL3A1, NOTCH4, ITGB4, PXN, COMP, CDKN1A, and ITGA3. The downregulated genes were RAC2, ZAP70, IL2RG, and CD247.
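The node-selection step can be sketched as below. It is a minimal illustration under assumptions not stated in the text: edges are treated as undirected, nodes tied with the degree at the 10% cutoff are all kept, and the node names in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def top_degree_deg_nodes(
    edges: Iterable[Tuple[str, str]],
    deg_nodes: Set[str],
    top_percent: int = 10,
) -> Set[str]:
    """Return DEG-containing nodes whose degree ranks in the top `top_percent`%."""
    degree: Counter = Counter()
    for a, b in edges:  # undirected: each edge adds 1 to both endpoints
        degree[a] += 1
        degree[b] += 1
    if not degree:
        return set()
    ranked = sorted(degree, key=degree.__getitem__, reverse=True)
    # number of top-ranked nodes, rounded up with integer arithmetic
    n_top = max(1, (len(ranked) * top_percent + 99) // 100)
    cutoff = degree[ranked[n_top - 1]]
    top = {node for node in ranked if degree[node] >= cutoff}  # keep ties at the cutoff
    return top & deg_nodes


# Hypothetical 10-node network: one hub linked to nine leaves, plus one extra edge
edges = [("HUB", leaf) for leaf in "abcdefghi"] + [("a", "b")]
print(top_degree_deg_nodes(edges, {"HUB", "c"}))  # {'HUB'}
```

With 857 nodes, as in the integrated network, the same rounding selects at least 86 candidate nodes before the DEG filter is applied.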

GO functional enrichment analysis

Functional enrichment analyses were used to further investigate the functions of the 20 differentially expressed genes identified in the integrated KEGG pathway regulatory network. GO analysis showed that 21 biological processes were significantly overrepresented among these 20 genes (adjusted p < 0.05; Fig. 4). Notably, 11 of the genes were significantly enriched in GO:0007155 (cell adhesion) and 11 in GO:0022610 (biological adhesion).

Figure 4.

Functional enrichment analysis of 20 differentially expressed genes. Red boxes represent the significant enrichment of GO terms.

Validation of DEGs by qRT-PCR and TCGA database

Of the 20 selected DEGs, ten (COMP, COL3A1, LAMA5, LAMB1, PXN, C3, RAC2, ZAP70, IL2RG, and CD247) were associated with PTC for the first time in this study, whereas the other ten had previously been reported to be associated with PTC17–26. All ten newly discovered PTC-related genes were selected for validation, together with three randomly selected previously reported genes (FN1, ITGA3, and LAMB3). We then used qRT-PCR to validate the mRNA levels of these 13 DEGs in stage I PTC tissues and matched normal adjacent tissues from 16 stage I PTC patients. Of the 13 genes, nine showed expression differences consistent with the RNA-Seq data, including all three previously reported genes. COMP (p = 0.0002), COL3A1 (p = 0.0026), FN1 (p < 0.0001), ITGA3 (p = 0.0112), and LAMB3 (p = 0.0005) levels were increased in stage I PTC tissues, whereas RAC2 (p = 0.0405), ZAP70 (p = 0.0121), IL2RG (p = 0.0175), and CD247 (p = 0.0112) were downregulated in tumour tissues (Fig. 5). TCGA RNA sequencing data from 513 PTC samples and 59 normal samples were used as the validation cohort, in which 4,137 DEGs were identified. Of the nine genes verified by qRT-PCR, seven (COMP, COL3A1, FN1, ITGA3, LAMB3, ZAP70, and CD247) were also validated in TCGA, consistent with the RNA-Seq and qRT-PCR data; again, all three previously reported genes were validated (Fig. 5).

Figure 5.

qRT-PCR and TCGA verification results for nine genes. TCGA, TCGA database validation results; PCR, qRT-PCR validation results; can, cancer tissue; nor, adjacent normal tissue; *p < 0.05; **p < 0.01.

PPI networks and chromosome locations

To clarify the interactions between the verified DEGs and to explore the intrinsic mechanisms linking these genes to disease, we constructed a PPI network and analysed the protein interactions among the seven genes (Fig. 6A). There were three pairs of interacting genes: FN1 and COMP, FN1 and ITGA3, and CD247 and ZAP70. Chromosomal positions from the Ensembl database, mapped with Circos, showed that the two genes in each interacting pair were located on different chromosomes (Fig. 6B).

Figure 6.

The protein-protein interaction (PPI) network and the chromosomal distribution of the interacting genes. (A) Yellow dots represent the seven identified differentially expressed genes (DEGs); green dots represent genes that interact with these seven DEGs; lines represent PPIs. (B) Different colours represent different chromosomes; the scale on each chromosome represents the genetic map distance. Lines represent PPIs; red lines connect the three identified pairs of interacting DEGs, and the remaining interactions are shown as grey lines.

Correlation between the DEGs and clinical characteristics of PTC

To determine the correlations between the identified DEGs and PTC clinical characteristics, 504 PTC samples with clinical phenotypic data in the TCGA database were included. Correlations between DEG expression levels and clinical characteristics [tumour size, lymph node metastasis, distant metastasis, and tumour node metastasis (TNM) stage] were analysed using Spearman's correlation, following Gu's method27. A p value < 0.05 was considered statistically significant. As shown in Table 2, COMP expression was significantly and positively correlated with tumour size (r = 0.132, P = 0.003): the higher the expression, the larger the tumour. In addition, COL3A1 (r = 0.151), COMP (r = 0.261), and ZAP70 (r = 0.115) expression levels were positively correlated with lymph node metastasis. Furthermore, COL3A1 and COMP expression levels were correlated with TNM stage, negatively for COL3A1 (r = −0.112, P = 0.012) and positively for COMP (r = 0.107, P = 0.016). We also used t-tests to compare the expression levels of CD247, ZAP70, COMP, and COL3A1 between patients with and without distant metastasis; no significant differences were found between the two groups (P = 0.755, 0.76, 0.837, and 0.306, respectively).

Table 2.

Correlation analysis between 4 identified DEGs and clinical characteristics of PTC.

Gene Statistic Tumour size Lymph node metastasis Distant metastasis TNM stage
CD247 r −0.033 0.089 −0.088 −0.073
P 0.462 0.058 0.137 0.102
COL3A1 r −0.014 0.151** 0.079 −0.112*
P 0.755 0.001 0.179 0.012
COMP r 0.132** 0.261** −0.041 0.107*
P 0.003 1.72 × 10⁻⁸ 0.487 0.016
ZAP70 r −0.036 0.115* −0.048 −0.066
P 0.427 0.014 0.414 0.138

r, Spearman's correlation coefficient; P, two-tailed significance; *P < 0.05; **P < 0.01.

Discussion

Papillary thyroid carcinoma (PTC) is the only histological type of thyroid cancer whose incidence has risen consistently among all ethnic groups over the past three decades28. It is therefore important to identify appropriate biomarkers for exploring the pathogenesis of this cancer.

We identified 719 differentially expressed genes (DEGs), and KEGG pathway enrichment analyses showed that the upregulated genes were significantly enriched in pathways related to focal adhesion, ECM-receptor interactions, adherens junctions, and 12 other pathways, all of which are closely related to cancer. Focal adhesions are large protein complexes linking the cell cytoskeleton with the ECM. They affect many cellular processes, including motility, proliferation, differentiation, regulation of gene expression, and cell survival29. Moreover, the focal adhesion pathway has been found to be significantly associated with cancer in gene expression studies of many cancer types30–32.

To further screen DEGs and explore the relationships between genes and diseases, we constructed the integrated KEGG pathway regulatory network and selected the DEGs in nodes whose degrees ranked in the top 10% of the network, yielding 20 DEGs. GO functional enrichment analyses of these 20 DEGs indicated that they were significantly enriched in 21 GO terms closely related to cancer, such as regulation of cell proliferation, cell adhesion, biological adhesion, and cell-substrate adhesion.

To confirm the reliability of our methods and experimental data, we screened the DEGs related to stage I PTC and validated 13 selected DEGs by qRT-PCR and in the TCGA database. The validated genes were further analysed using a PPI network. Knowledge of PPI networks helps to address many problems, such as identification of signalling pathways33,34, recognition of functional modules35, and prediction of protein functions36, and thus allows the interactions between DEGs and the intrinsic mechanisms linking genes and diseases to be assessed more accurately. Of the 13 selected genes, nine showed expression differences consistent with the RNA-Seq data, including all three previously reported PTC-related genes. COMP (p = 0.0002), COL3A1 (p = 0.0026), FN1 (p < 0.0001), ITGA3 (p = 0.0112), and LAMB3 (p = 0.0005) levels were increased in stage I PTC tissues, whereas RAC2 (p = 0.0405), ZAP70 (p = 0.0121), IL2RG (p = 0.0175), and CD247 (p = 0.0112) were downregulated in tumour tissues. In the TCGA validation cohort (RNA sequencing data from 513 PTC samples and 59 normal samples, in which 4,137 DEGs were identified), seven of the nine qRT-PCR-verified genes (COMP, COL3A1, FN1, ITGA3, LAMB3, ZAP70, and CD247) were validated, consistent with the RNA-Seq and qRT-PCR data. All three previously reported genes were validated, and this high validation rate supports the reliability of our study. Among the seven gene products, FN1 and COMP, FN1 and ITGA3, and CD247 and ZAP70 interacted with each other. Cancer is a complex disease caused by interactions among multiple environmental and genetic factors37. Gene expression is a complex and orderly process regulated by cis-acting elements and trans-acting factors38.
Transcriptional regulation of gene expression often involves trans-acting proteins and cis-acting promoter sequences that act together to affect the function of RNA polymerase (RNAP). In response to environmental cues, regulatory proteins can interact directly with RNAP to alter its activity, or bind specific sequences or structures in the promoter region to affect RNAP binding or processing. Certain genes are subject to complex control involving multiple trans-acting factors and promoter sequences that act co-ordinately or independently to affect transcription39. In our study, the two genes in each of the three interacting pairs were located on different chromosomes, suggesting that they may influence each other's expression through trans-acting factors and protein interactions, which could in turn affect the occurrence and development of PTC.

In our study, COL3A1 (collagen type III alpha 1) and COMP (cartilage oligomeric matrix protein) were significantly upregulated in stage I PTC tissues (COL3A1, p = 0.0026; COMP, p = 0.0002). Both genes were co-enriched in the GO terms cell adhesion, biological adhesion, and cell-substrate adhesion. Cell adhesion is involved in signals that regulate cell differentiation, the cell cycle, cell migration, and cell survival. Tumour cells are characterized by altered adhesion to the ECM, which may be related to their invasive and metastatic potential40. Moreover, COL3A1 was annotated to the GO terms blood vessel development and vasculature development. Tumour growth and metastasis are closely linked processes, and tumour growth and progression occur once tumour cells switch to an angiogenic phenotype41. In our study, COL3A1 expression was significantly correlated with lymph node metastasis. Su et al. suggested that COL3A1 may promote renal cell carcinoma growth, metastasis, and tumour macrophage infiltration42. Another study reported that high COL3A1 mRNA and/or protein expression was associated with advanced stage, smoking, and recurrence in colorectal cancer43. PPI network analysis showed that COMP and FN1 interact with each other and share similar GO functions. We found that COMP was positively correlated with tumour size, lymph node metastasis, and TNM stage. In addition, studies of prostate cancer44, breast cancer45, and other cancers have found COMP to be closely related to tumorigenesis. Together, these findings indicate that upregulation of COL3A1 and COMP is closely related to the occurrence and development of cancer. However, establishing COL3A1 and COMP as potential therapeutic targets or molecular markers of PTC still requires a more complete understanding of their mechanisms of action.

ZAP70 (zeta-chain associated protein kinase, 70 kDa) and CD247 (CD247 molecule) were both downregulated in PTC tissues and were co-enriched in many immune response-related GO terms. Inflammation and cancer are known to be closely connected. Autoimmune diseases often result in tissue destruction and inflammation, and may even increase the risk of PTC46. In addition, PPI analysis revealed an interaction between ZAP70 and CD247, and the two genes were co-enriched in the KEGG natural killer (NK) cell-mediated cytotoxicity pathway. Once activated, NK cells release large amounts of cytotoxic granules containing perforin and granzymes, which are cytotoxic to tumour cells47. Thus, downregulation of ZAP70 and CD247 in this pathway may attenuate NK cell-mediated cytotoxicity against the tumour, which in turn may contribute to the occurrence of PTC. We also found that ZAP70 was positively correlated with lymph node metastasis, suggesting that it may be related to the invasion and metastasis of PTC.

In conclusion, four DEGs, COMP, COL3A1, ZAP70, and CD247, were identified by RNA-Seq and bioinformatic methods. ZAP70 and CD247 were co-enriched in many immune response-related functions, and COMP and COL3A1 were associated with cell adhesion and biological adhesion during the development of PTC. The interacting pairs FN1 and COMP, FN1 and ITGA3, and CD247 and ZAP70 may influence each other's expression through trans-acting factors and protein-protein interactions, which in turn may affect the development of PTC. COMP expression was significantly and positively correlated with tumour size: the higher the expression, the larger the tumour. In addition, COL3A1, COMP, and ZAP70 expression levels were positively correlated with lymph node metastasis, and COL3A1 and COMP expression levels were correlated with TNM stage (negatively and positively, respectively). These four DEGs might be promising biomarkers for early-stage PTC and provide an experimental foundation for further exploration of the pathogenesis of early-stage PTC.

Materials and Methods

Patients and tissue procurement

To discover novel genes related to the pathogenesis of papillary thyroid carcinoma (PTC) through differentially expressed genes (DEGs), we selected female patients with stage I PTC. First, three patients who met these criteria were enrolled, a reasonable number for initial RNA-Seq experiments15,16, and their three pairs of stage I PTC tissues and matched normal adjacent tissues were sequenced by RNA-Seq. We then enrolled 17 additional patients who met the same criteria; tissues from four of them could not be used for qRT-PCR because of RNA degradation, leaving 13. In total, 16 patients were included. These 16 stage I PTC patients underwent thyroidectomy at The Second Affiliated Hospital of Harbin Medical University (China) between November 2013 and January 2014. All 16 patients were female, with a mean age of 42.8 ± 9.3 years. According to the Union for International Cancer Control/American Joint Committee on Cancer (UICC/AJCC) Tumour Node Metastasis (TNM) classification, all patients presented with TNM stage I disease. We obtained stage I PTC tissues and matched normal adjacent tissues located >2 cm from the tumour and free of tumour infiltration. All tissues were obtained at the time of surgical resection, immediately frozen in liquid nitrogen, and stored at −80 °C. Clinical and histopathological information was collected for all patients. All methods were performed in accordance with the relevant guidelines of the ethics committee of The Second Affiliated Hospital of Harbin Medical University, and all patients provided informed consent. The experimental protocols were approved by the ethics committee of The Second Affiliated Hospital of Harbin Medical University.
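As a quick arithmetic check, the reported mean age can be recomputed from the ages listed in Table 1. The sketch below uses only Python's standard `statistics` module; the assumption that the authors used the sample (n − 1) standard deviation is ours.

```python
from statistics import mean, pstdev, stdev

# Ages (years) of patients 1-16, copied from Table 1
ages = [41, 39, 37, 38, 57, 45, 41, 42, 29, 42, 47, 54, 61, 26, 49, 37]

age_mean = mean(ages)      # 685 / 16 = 42.8125
age_sd = stdev(ages)       # sample SD, n - 1 denominator
age_psd = pstdev(ages)     # population SD, n denominator, for comparison
print(f"{age_mean:.1f} ± {age_sd:.1f} years")  # prints "42.8 ± 9.3 years"
```

The population SD (`pstdev`) rounds to 9.0 instead, so the reported 9.3 is consistent with the sample standard deviation.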

Haematoxylin and eosin (HE) staining

HE staining was used to assess sections of the three sequenced stage I PTC tissues. After deparaffinization and rehydration, 5 μm longitudinal sections were stained with haematoxylin solution for 5 min, dipped five times in 1% acid ethanol (1% HCl in 70% ethanol), and rinsed in distilled water. The sections were then stained with eosin solution for 3 min, dehydrated through graded alcohols, and cleared in xylene. The mounted slides were examined and photographed using an Olympus BX53 fluorescence microscope (Tokyo, Japan).

RNA extraction, library preparation, and sequencing

Total RNA was extracted from stage I PTC tissues and matched normal adjacent tissues of 16 stage I PTC patients using TRIzol reagent (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions, and stored at −80 °C until use. RNA quantity and purity were assessed using a NanoDrop spectrophotometer (OD 260/280 ratio). RNA from the stage I PTC tissues and matched normal adjacent tissues of three patients (samples 1ca, 1adj, 2ca, 2adj, 3ca, and 3adj, where the number denotes the patient, "ca" a cancer sample, and "adj" a matched normal adjacent sample) was used for sequencing. Six libraries were constructed using an Illumina standard kit according to the manufacturer's protocol, and all sequencing was performed on an Illumina HiSeq 2500 instrument. Each sample yielded 8–11 million 101-nt reads (unique ends).

Data analysis

Read mapping and differentially expressed gene analyses

TopHat48 (version 2.0.6) was used with default parameters to map RNA-Seq reads to the reference genome (Ensembl human genome GRCh37). After mapping, transcripts were assembled with Cufflinks49 (version 2.2.1). Cuffnorm was then used to quantify the expression level of each gene, normalized as reads per kilobase of exon per million mapped reads (RPKM; Equation 1). Genes with RPKM ≥ 0.5 were defined as mapped genes. For the mapped genes, the differences in RPKM values and the fold changes between cancer samples and matched normal adjacent samples were calculated, and genes with a difference >10 and a fold change >1.5 were classified as DEGs.

$$\mathrm{RPKM} = \frac{\text{total exon reads}}{\text{mapped reads (millions)} \times \text{exon length (kb)}} \tag{1}$$
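Equation (1) translates directly into code. This is a minimal sketch with hypothetical read counts; note that Cufflinks-family tools label this normalized quantity FPKM (fragments rather than reads), and the two coincide when each fragment contributes a single read.

```python
def rpkm(exon_reads: int, total_mapped_reads: int, exon_length_bp: int) -> float:
    """Reads per kilobase of exon per million mapped reads, as in Equation (1)."""
    mapped_millions = total_mapped_reads / 1_000_000
    exon_length_kb = exon_length_bp / 1_000
    return exon_reads / (mapped_millions * exon_length_kb)


# Hypothetical gene: 500 reads on 2,500 bp of exon in a library of 10 million mapped reads
print(rpkm(500, 10_000_000, 2_500))  # 500 / (10 * 2.5) = 20.0
```

A gene would then be retained as "mapped" in this study if its value reached the 0.5 threshold described above.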

Functional enrichment analysis

Functional enrichment analyses of the differentially expressed genes were performed using the DAVID functional annotation tool (http://david.abcc.ncifcrf.gov/home.jsp), covering KEGG pathways and the GO categories biological process, molecular function, and cellular component. A p value < 0.05 was considered to indicate significant enrichment.
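Over-representation of DEGs in a pathway or GO term is typically tested with a hypergeometric (one-sided Fisher's exact) test. The sketch below shows that standard test only; DAVID itself reports a more conservative modification (the EASE score), which is not reproduced here, and the gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n DEGs drawn).

    This equals the one-sided Fisher's exact test p-value for over-representation.
    """
    if k <= 0:
        return 1.0
    upper = min(n, K)
    # count the ways to draw at least k annotated genes among the n DEGs
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical: 456 DEGs drawn from 13,703 detected genes; a pathway has 100 members,
# 10 of which are among the DEGs
p = hypergeom_upper_tail(k=10, n=456, K=100, N=13_703)
print(p)
```

The smaller this tail probability, the more strongly the pathway is over-represented, which is how the enrichment p values in Fig. 1B,C should be read.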

Construction of the KEGG pathway integrated network

An integrated network of the KEGG pathways was constructed. The relationships among genes in the KEGG pathways significantly enriched for DEGs were extracted in R (http://www.r-project.org) with the XML package (R version 2.15.2; Bioconductor version 2.3), and the network was visualized using Cytoscape50.

GO functional enrichment analysis

GO is a standard classification system for gene functions and gene products12. The DEGs in nodes whose degrees ranked in the top 10% of the integrated network were selected for GO analysis, and GO terms with Benjamini-adjusted p < 0.05 in DAVID were considered significant.
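The multiple-testing correction can be sketched as follows, assuming that DAVID's "Benjamini" column corresponds to the Benjamini–Hochberg step-up procedure. The p values in the example are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical GO-term p-values
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)  # only the first term stays below 0.05 after adjustment
```

The example shows why adjusted thresholds are stricter: two terms with raw p < 0.05 no longer pass once the number of tested terms is taken into account.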

Data validation

qRT-PCR validation

Among the selected DEGs, all newly discovered PTC-related genes were selected for validation, together with three genes randomly selected from the ten previously reported to be associated with PTC. The mRNA levels of these DEGs were validated by quantitative real-time PCR (qRT-PCR). Total RNA was extracted as described above from stage I PTC tissues and matched normal adjacent tissues of 16 stage I PTC patients. cDNA was synthesized using a first-strand cDNA synthesis kit (Takara RR036A; Takara, Japan) according to the manufacturer's instructions. Subsequently, 1 μL of cDNA product and gene-specific primers were used for PCR with the Real-time PCR Master Mix kit (Takara RR820A) on an Eco fluorescence quantitative PCR system (Illumina, USA). Relative gene expression values were calculated using the 2^−ΔΔCt method51.
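The 2^−ΔΔCt calculation can be illustrated with a short sketch. The reference (housekeeping) gene is not named in the text, so all Ct values below are hypothetical, and the method's usual assumption of roughly equal, near-100% amplification efficiencies for target and reference genes is taken as given.

```python
def relative_expression(
    ct_target_tumour: float,
    ct_reference_tumour: float,
    ct_target_normal: float,
    ct_reference_normal: float,
) -> float:
    """Fold change of a target gene in tumour vs. normal tissue by the 2^-ddCt method."""
    delta_ct_tumour = ct_target_tumour - ct_reference_tumour  # normalize to reference gene
    delta_ct_normal = ct_target_normal - ct_reference_normal
    delta_delta_ct = delta_ct_tumour - delta_ct_normal        # tumour relative to normal
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier in tumour tissue
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 2^2 = 4.0 (upregulated)
```

Values above 1 indicate higher expression in tumour than in matched normal tissue, and values below 1 indicate downregulation.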

TCGA database validation

To further validate the DEGs confirmed by qRT-PCR, we downloaded TCGA thyroid cancer RNA-Seq V2 isoform expression profiles of 513 PTC samples and 59 normal samples. Differential expression between tumour and adjacent normal samples was assessed with the R package edgeR, and genes with p < 0.05 and a fold change >1.5 (or <2/3) were considered validated.
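edgeR reports fold changes on a log2 scale (the `logFC` column of its result tables, alongside `PValue`). The thresholds above therefore correspond to |logFC| > log2(1.5) ≈ 0.585, because log2(2/3) = −log2(1.5). The Python sketch below applies that filter to rows exported from such a table. The row format and gene names are hypothetical, and the sketch does not reproduce edgeR's own model fitting.

```python
import math

FC_CUTOFF = math.log2(1.5)  # fold change > 1.5 or < 2/3 on the log2 scale

def call_direction(log_fc: float, p_value: float, alpha: float = 0.05):
    """Return 'up', 'down', or None for one gene's edgeR-style result."""
    if p_value >= alpha:
        return None
    if log_fc > FC_CUTOFF:
        return "up"
    if log_fc < -FC_CUTOFF:
        return "down"
    return None

# Hypothetical rows: (gene, logFC, PValue)
rows = [("GENE1", 1.2, 0.001), ("GENE2", -0.9, 0.01), ("GENE3", 0.3, 0.001), ("GENE4", 2.0, 0.2)]
for gene, log_fc, p in rows:
    print(gene, call_direction(log_fc, p))
```

A gene would then count as validated if its TCGA direction matches the direction observed by qRT-PCR.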

Analysis of the PPI networks and chromosomal locations

As the background network, we used an integrated PPI network compiled from the Biomolecular Interaction Network Database (BIND), the Biological General Repository for Interaction Datasets (BioGRID), the Database of Interacting Proteins (DIP), the Human Protein Reference Database (HPRD), IntAct, the Molecular INTeraction database (MINT), the mammalian PPI database of the Munich Information Center for Protein Sequences (MIPS), PDZBase (a PPI database for PDZ domains), and Reactome. The DEGs validated in the TCGA database were mapped onto this background network, and the corresponding protein interaction pairs were screened to construct a PPI subnetwork.
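The source does not spell out the screening rule for interaction pairs. The sketch below covers two common choices: keeping only pairs in which both proteins are validated DEGs (an induced subnetwork), or also keeping pairs in which one partner is a DEG (first neighbours). Gene names and edges are hypothetical.

```python
def ppi_subnetwork(background_edges, degs, require_both: bool = True):
    """Extract interaction pairs from a background PPI network for a DEG set."""
    degs = set(degs)
    sub = set()
    for a, b in background_edges:
        if a == b:
            continue  # ignore self-interactions
        hits = (a in degs) + (b in degs)
        if hits == 2 or (hits == 1 and not require_both):
            sub.add(tuple(sorted((a, b))))
    return sub

background = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G1", "G1")]
print(sorted(ppi_subnetwork(background, {"G1", "G2"})))                      # [('G1', 'G2')]
print(sorted(ppi_subnetwork(background, {"G1", "G2"}, require_both=False)))  # [('G1', 'G2'), ('G2', 'G3')]
```

Sorting each pair makes the edges undirected, so the same interaction reported in both orientations by different source databases is counted only once.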

The chromosomal positions of the genes in the PPI subnetwork were obtained from the Ensembl database, and Circos was used to plot their chromosomal locations and to draw links between interacting genes.
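Circos draws links from a plain-text links file in which each line gives the two ends of a link as chromosome, start, and end. The sketch below writes such lines for interacting gene pairs. The `hs` chromosome prefix follows the naming in the human karyotype file distributed with Circos, which we assume was used. The coordinates are placeholders, not real Ensembl positions.

```python
def circos_links(pairs, positions):
    """Format interacting gene pairs as Circos link lines.

    positions maps gene -> (chromosome, start, end), e.g. ("7", 1000, 2000).
    Pairs with a gene missing from `positions` are skipped.
    """
    lines = []
    for a, b in pairs:
        if a not in positions or b not in positions:
            continue
        (chr_a, start_a, end_a), (chr_b, start_b, end_b) = positions[a], positions[b]
        lines.append(f"hs{chr_a} {start_a} {end_a} hs{chr_b} {start_b} {end_b}")
    return lines

# Placeholder coordinates for hypothetical genes (not real Ensembl positions).
positions = {"GENE_A": ("1", 1_000_000, 1_050_000), "GENE_B": ("X", 5_000_000, 5_020_000)}
print("\n".join(circos_links([("GENE_A", "GENE_B"), ("GENE_A", "GENE_C")], positions)))
```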

Correlations between the DEGs and clinical characteristics of PTCs

To identify correlations between the DEGs and PTC clinical characteristics, 504 PTC samples with clinical phenotype data in the TCGA database were included. Correlations between DEG expression levels and clinical characteristics (tumour size, lymph node metastasis, distant metastasis, and TNM stage) were analysed using Spearman's correlation, following the method of Gu et al.27. In addition, the expression levels of CD247, ZAP70, COMP, and COL3A1 were each compared between samples with and without distant metastasis using a t-test. A p-value < 0.05 was considered statistically significant.
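Spearman's correlation is Pearson's correlation computed on ranks, with tied values given their average rank. The pure-Python sketch below computes the coefficient. Encoding clinical variables as ordered numbers (for example, TNM stage I to IV as 1 to 4, or N0/N1 as 0/1) is our assumption about how such variables would be entered, and the data in the example are invented. For p-values, `scipy.stats.spearmanr` and, for the metastasis comparison, `scipy.stats.ttest_ind` are standard options.

```python
import math

def average_ranks(values):
    """1-based ranks, with ties assigned the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)

# Invented data: expression of one gene vs. TNM stage coded 1-4.
expression = [5.1, 6.3, 6.0, 8.2, 9.4, 7.7]
stage = [1, 1, 2, 3, 4, 3]
print(round(spearman_rho(expression, stage), 3))
```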

Electronic supplementary material

Supplementary information (102.1KB, pdf)

Acknowledgements

This work was funded by the National Natural Science Foundation of China [grant number 81673108], Science and Technology Innovation Talent Research Foundation of Harbin [grant number 2016RAXYJ088], and the Science and Technology Innovation Foundation for Graduates of Harbin Medical University [grant number YJSCX2015-62HYD].

Author Contributions

H.Q. conceived and designed the experiments. M.C., T.Z. and L.L. acquired the experiment data. J.H., M.C. and T.Z. performed the study. J.H., M.C. and Y.W. carried out the data analysis. J.H., M.C. and B.G. wrote this manuscript. All authors have read and approved the final manuscript.

Competing Interests

The authors declare no competing interests.

Footnotes

Jun Han and Meijun Chen contributed equally to this work.

Electronic supplementary material

Supplementary information accompanies this paper at 10.1038/s41598-018-28299-9.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Zhang H, Teng X, Liu Z, Zhang L, Liu Z. Gene expression profile analyze the molecular mechanism of CXCR7 regulating papillary thyroid carcinoma growth and metastasis. Journal of experimental & clinical cancer research: CR. 2015;34:16. doi: 10.1186/s13046-015-0132-y.
2. Chen AY, Jemal A, Ward EM. Increasing incidence of differentiated thyroid cancer in the United States, 1988–2005. Cancer. 2009;115:3801–3807. doi: 10.1002/cncr.24416.
3. Ito Y, Nikiforov YE, Schlumberger M, Vigneri R. Increasing incidence of thyroid cancer: controversies explored. Nature reviews. Endocrinology. 2013;9:178–184. doi: 10.1038/nrendo.2012.257.
4. Stephen JK, et al. DNA methylation in thyroid tumorigenesis. Cancers. 2011;3:1732–1743. doi: 10.3390/cancers3021732.
5. Tavares C, Melo M, Cameselle-Teijeiro JM, Soares P, Sobrinho-Simoes M. Endocrine Tumours: Genetic predictors of thyroid cancer outcome. European journal of endocrinology. 2016;174:R117–126. doi: 10.1530/EJE-15-0605.
6. Rapisuwon S, Vietsch EE, Wellstein A. Circulating biomarkers to monitor cancer progression and treatment. Computational and structural biotechnology journal. 2016;14:211–222. doi: 10.1016/j.csbj.2016.05.004.
7. Garcia V, et al. Free circulating mRNA in plasma from breast cancer patients and clinical outcome. Cancer letters. 2008;263:312–320. doi: 10.1016/j.canlet.2008.01.008.
8. March-Villalba JA, et al. Cell-free circulating plasma hTERT mRNA is a useful marker for prostate cancer diagnosis and is associated with poor prognosis tumor characteristics. Plos One. 2012;7:e43470. doi: 10.1371/journal.pone.0043470.
9. Sun Z, et al. Integrated analysis of gene expression, CpG island methylation, and gene copy number in breast cancer cells by deep sequencing. Plos One. 2011;6:e17490. doi: 10.1371/journal.pone.0017490.
10. O’Brien MA, Costin BN, Miles MF. Using genome-wide expression profiling to define gene networks relevant to the study of complex traits: from RNA integrity to network topology. International review of neurobiology. 2012;104:91–133. doi: 10.1016/B978-0-12-398323-7.00005-7.
11. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature methods. 2008;5:621–628. doi: 10.1038/nmeth.1226.
12. Jelier R, et al. Literature-aided interpretation of gene expression data with the weighted global test. Briefings in bioinformatics. 2011;12:518–529. doi: 10.1093/bib/bbq082.
13. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic acids research. 2012;40:D109–114. doi: 10.1093/nar/gkr988.
14. Mewes HW, et al. MIPS: analysis and annotation of proteins from whole genomes in 2005. Nucleic acids research. 2006;34:D169–172. doi: 10.1093/nar/gkj148.
15. Trapnell C, et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature biotechnology. 2013;31:46–53. doi: 10.1038/nbt.2450.
16. Xu H, Gong Z, Shen Y, Fang Y, Zhong S. Circular RNA expression in extracellular vesicles isolated from serum of patients with endometrial cancer. Epigenomics. 2018;10:187–197. doi: 10.2217/epi-2017-0109.
17. Barros-Filho MC, Marchi FA, Pinto CA, Rogatto SR, Kowalski LP. High Diagnostic Accuracy Based on CLDN10, HMGA2, and LAMB3 Transcripts in Papillary Thyroid Carcinoma. The Journal of clinical endocrinology and metabolism. 2015;100:E890–899. doi: 10.1210/jc.2014-4053.
18. Cong D, et al. Expression profiles of pivotal microRNAs and targets in thyroid papillary carcinoma: an analysis of The Cancer Genome Atlas. OncoTargets and therapy. 2015;8:2271–2277. doi: 10.2147/OTT.S85753.
19. da Silveira Mitteldorf CA, de Sousa-Canavez JM, Leite KR, Massumoto C, Camara-Lopes LH. FN1, GALE, MET, and QPCT overexpression in papillary thyroid carcinoma: molecular analysis using frozen tissue and routine fine-needle aspiration biopsy samples. Diagnostic cytopathology. 2011;39:556–561. doi: 10.1002/dc.21423.
20. Jung CK, et al. The cytological, clinical, and pathological features of the cribriform-morular variant of papillary thyroid carcinoma and mutation analysis of CTNNB1 and BRAF genes. Thyroid: official journal of the American Thyroid Association. 2009;19:905–913. doi: 10.1089/thy.2008.0332.
21. Qiu J, et al. RNA sequencing identifies crucial genes in papillary thyroid carcinoma (PTC) progression. Experimental and molecular pathology. 2016;100:151–159. doi: 10.1016/j.yexmp.2015.12.011.
22. Rodriguez-Rodero S, et al. DNA methylation signatures identify biologically distinct thyroid cancer subtypes. The Journal of clinical endocrinology and metabolism. 2013;98:2811–2821. doi: 10.1210/jc.2012-3566.
23. Sadow PM, Heinrich MC, Corless CL, Fletcher JA, Nose V. Absence of BRAF, NRAS, KRAS, HRAS mutations, and RET/PTC gene rearrangements distinguishes dominant nodules in Hashimoto thyroiditis from papillary thyroid carcinomas. Endocrine pathology. 2010;21:73–79. doi: 10.1007/s12022-009-9101-3.
24. Stokowy T, Gawel D, Wojtas B. Differences in miRNA and mRNA Profile of Papillary Thyroid Cancer Variants. International journal of endocrinology. 2016;2016:1427042. doi: 10.1155/2016/1427042.
25. Yin Y, et al. MiR-195 Inhibits Tumor Growth and Metastasis in Papillary Thyroid Carcinoma Cell Lines by Targeting CCND1 and FGF2. International journal of endocrinology. 2017;2017:6180425. doi: 10.1155/2017/6180425.
26. Zhao Y, et al. The combined use of miRNAs and mRNAs as biomarkers for the diagnosis of papillary thyroid carcinoma. International journal of molecular medicine. 2015;36:1097–1103. doi: 10.3892/ijmm.2015.2305.
27. Gu X, et al. RNA sequencing reveals differentially expressed genes as potential diagnostic and prognostic indicators of gallbladder carcinoma. Oncotarget. 2015;6:20661–20671. doi: 10.18632/oncotarget.3861.
28. Enewold L, et al. Rising thyroid cancer incidence in the United States by demographic and tumor characteristics, 1980–2005. Cancer epidemiology, biomarkers & prevention: a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology. 2009;18:784–791. doi: 10.1158/1055-9965.EPI-08-0960.
29. Wendl MC, et al. PathScan: a tool for discerning mutational significance in groups of putative cancer genes. Bioinformatics. 2011;27:1595–1602. doi: 10.1093/bioinformatics/btr193.
30. Crijns AP, et al. Survival-related profile, pathways, and transcription factors in ovarian cancer. Plos Medicine. 2009;6:e24. doi: 10.1371/journal.pmed.1000024.
31. Emery LA, et al. Early dysregulation of cell adhesion and extracellular matrix pathways in breast cancer progression. The American journal of pathology. 2009;175:1292–1302. doi: 10.2353/ajpath.2009.090115.
32. Huang D, Chow TW. Identifying the biologically relevant gene categories based on gene expression and biological data: an example on prostate cancer. Bioinformatics. 2007;23:1503–1510. doi: 10.1093/bioinformatics/btm141.
33. Mosca R, Pons T, Ceol A, Valencia A, Aloy P. Towards a detailed atlas of protein-protein interactions. Current opinion in structural biology. 2013;23:929–940. doi: 10.1016/j.sbi.2013.07.005.
34. Navlakha S, Gitter A, Bar-Joseph Z. A network-based approach for predicting missing pathway interactions. Plos computational biology. 2012;8:e1002640. doi: 10.1371/journal.pcbi.1002640.
35. Chen B, Fan W, Liu J, Wu FX. Identifying protein complexes and functional modules–from static PPI networks to dynamic PPI networks. Briefings in bioinformatics. 2014;15:177–194. doi: 10.1093/bib/bbt039.
36. Zeng E, Ding C, Narasimhan G, Holbrook SR. Estimating support for protein-protein interaction data with applications to function prediction. Computational systems bioinformatics. Computational Systems. Bioinformatics Conference. 2008;7:73–84.
37. Dai X, Xiang L, Li T, Bai Z. Cancer Hallmarks, Biomarkers and Breast Cancer Molecular Subtypes. Journal of Cancer. 2016;7:1281–1294. doi: 10.7150/jca.13141.
38. Zhang B, et al. cis-Acting elements and trans-acting factors in the transcriptional regulation of raf kinase inhibitory protein expression. Plos one. 2013;8:e83097. doi: 10.1371/journal.pone.0083097.
39. Lloyd G, Landini P, Busby S. Activation and repression of transcription initiation in bacteria. Essays in biochemistry. 2001;37:17–31. doi: 10.1042/bse0370017.
40. Khalili AA, Ahmad MR. A Review of Cell Adhesion Studies for Biomedical and Biological Applications. International journal of molecular sciences. 2015;16:18149–18184. doi: 10.3390/ijms160818149.
41. Lee SH, Jeong D, Han YS, Baek MJ. Pivotal role of vascular endothelial growth factor pathway in tumor angiogenesis. Annals of surgical treatment and research. 2015;89:1–8. doi: 10.4174/astr.2015.89.1.1.
42. Su B, et al. Let-7d suppresses growth, metastasis, and tumor macrophage infiltration in renal cell carcinoma by targeting COL3A1 and CCL7. Molecular cancer. 2014;13:206. doi: 10.1186/1476-4598-13-206.
43. Wang XQ, et al. Epithelial but not stromal expression of collagen alpha-1(III) is a diagnostic and prognostic indicator of colorectal carcinoma. Oncotarget. 2016;7:8823–8838. doi: 10.18632/oncotarget.6815.
44. Dakhova O, Rowley D, Ittmann M. Genes upregulated in prostate cancer reactive stroma promote prostate cancer progression in vivo. Clinical cancer research: an official journal of the American Association for Cancer Research. 2014;20:100–109. doi: 10.1158/1078-0432.CCR-13-1184.
45. Kim C, et al. Global analysis of microarray data reveals intrinsic properties in gene expression and tissue selectivity. Bioinformatics. 2010;26:1723–1730. doi: 10.1093/bioinformatics/btq279.
46. Bozec A, et al. The thyroid gland: a crossroad in inflammation-induced carcinoma? An ongoing debate with new therapeutic potential. Current medicinal chemistry. 2010;17:3449–3461. doi: 10.2174/092986710792927804.
47. Wang W, Erbe AK, Hank JA, Morris ZS, Sondel PM. NK Cell-Mediated Antibody-Dependent Cellular Cytotoxicity in Cancer Immunotherapy. Frontiers in immunology. 2015;6:368. doi: 10.3389/fimmu.2015.00368.
48. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105–1111. doi: 10.1093/bioinformatics/btp120.
49. Trapnell C, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology. 2010;28:511–515. doi: 10.1038/nbt.1621.
50. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011;27:431–432. doi: 10.1093/bioinformatics/btq675.
51. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 2001;25:402–408. doi: 10.1006/meth.2001.1262.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
