Journal of Cancer Research and Clinical Oncology. 2014 Jun 12;140(10):1715–1721. doi: 10.1007/s00432-014-1719-y

Integrative metabolome and transcriptome profiling reveals discordant glycolysis process between osteosarcoma and normal osteoblastic cells

Kai Chen, Chunyan Zhu, Ming Cai, Dong Fu, Biao Cheng, Zhengdong Cai, Guodong Li, Jilong Liu
PMCID: PMC11823902  PMID: 24919440

Abstract

Background

Osteosarcoma (OS) is the most common primary malignant tumor of bone in children and adolescents. However, few biomarkers of diagnostic significance have been established. In recent years, high-throughput transcriptomic and metabolomic approaches have made it possible to study the levels of thousands of candidate biomarkers simultaneously.

Methods

In this study, we integrated two disparate transcriptomic and metabolomic datasets to find meaningful biomarkers and then used an independent dataset to test the sensitivity and specificity of these biomarkers.

Results

By integrating the two datasets, we found that biomarkers involved in the glycolysis pathway were highly enriched, including four genes (ENO1, TPI1, PGK1 and LDHC) and two metabolites (lactate and pyruvate). The four genes and the two metabolites were all significantly down-regulated in OS samples. The mixed metabolites + genes signature outperformed metabolites or genes alone, with a recall of 0.813 and an F-measure of 0.812. The AUC of the metabolites + genes classifier was 0.825, compared with 0.58 for metabolites alone and 0.821 for genes alone.

Conclusion

Our findings indicate that an integrated transcriptomic and metabolomic signature can distinguish OS with good diagnostic accuracy, superior to that of the gene or metabolite signature alone.

Electronic supplementary material

The online version of this article (doi:10.1007/s00432-014-1719-y) contains supplementary material, which is available to authorized users.

Keywords: Osteosarcoma, Metabolome, Transcriptome, Data integration

Introduction

Osteosarcoma (OS) is the most common highly malignant tumor of bone and often manifests during the second or third decade of life. It accounts for approximately 60 % of all malignant bone tumors occurring in the first two decades of life and has an annual incidence of about 5.6 cases per million population (Stein 1975). The introduction of neoadjuvant chemotherapy 20 years ago increased long-term survival rates from 10–20 % to nearly 60–70 % (Sandberg and Bridge 2002). Although adjuvant chemotherapy effectively improves patient survival and the treatment of primary tumors, patients who present with metastatic disease and those whose tumors recur after treatment continue to have a poor prognosis; the estimated 5-year survival rate after relapse is only 25 %. Small recurrent tumors can be hard to detect during follow-up because most are localized and difficult to access. Although computed tomography or magnetic resonance imaging is usually performed, it is sometimes impossible to distinguish inflammation from recurrent tumor. New techniques for the early diagnosis and treatment of OS are therefore urgently needed (Jin et al. 2012).

Several attempts have been made to predict the occurrence and prognosis of OS from single or multiple clinicopathologic features such as the severity of liver dysfunction, age, tumor size, grade, microvascular invasion, portal vein thrombosis and the presence of microsatellite regions. However, their clinical applicability still requires large-scale validation (Toguchida et al. 1989; Lopez-Guerrero et al. 2004). Gene expression profiles have been used successfully to predict the occurrence, progression or survival of cancers, but microarray-based predictors are often inconsistent because patient cohorts are heterogeneous and microarray platforms differ. This inconsistency remains one of the major obstacles to their clinical use and makes it necessary to identify a reliable predictor that is robust to the variability introduced by different platforms or patient cohorts.

Previously, we analyzed the transcriptome of OS cell lines and the metabolome of serum from OS patients. We hypothesized that integrating both datasets would identify biomarkers with better performance. To this end, we integrated the two datasets in a unified statistical framework at the pathway level. We found that a combination of four genes (TPI1, PGK1, ENO2 and LDHC) and two metabolites (pyruvate and lactate) is a potential signature for OS. Finally, we evaluated the performance of this integrated biomarker panel in an independent dataset. Our findings indicate that an integrated gene and metabolite signature can distinguish OS with good diagnostic accuracy, superior to that of either signature alone.

Materials and methods

Transcriptomic and metabolomic datasets

The transcriptomic dataset was obtained as described previously (Li et al. 2009). Briefly, three OS cell lines (MG-63, Saos-2 and U-2 OS) and one osteoblastic cell line (hFOB1.19) were hybridized to Affymetrix gene microarrays, and each OS cell line was compared with the osteoblastic cell line using Microarray Suite 5.0. Differentially expressed genes were selected using the following criteria: (1) the average fold change between the OS cell lines and the osteoblastic cell line was at least twofold; and (2) the p value of a one-sample t test was ≤0.05. The t tests were performed in MATLAB 7.5 (MathWorks). The metabolomic dataset was obtained by analyzing serum samples collected from 24 OS patients and a control group of 32 healthy individuals (Zhang et al. 2010). Serum metabolites were subjected to trimethylsilyl derivatization and analyzed by GC–TOF MS. Metabolites associated with OS were selected using a variable importance in the projection (VIP) threshold (VIP > 1) from a sevenfold cross-validated OPLS-DA model. These differential metabolites were also validated at the univariate level using Student’s t test (p < 0.05). The fold-change cutoff, which describes how metabolite levels in patients differed from those in healthy controls, was set to 1.2.
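The two gene-selection criteria can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: it assumes each gene is summarized by three log2 ratios (one per OS cell line versus hFOB1.19), interprets the "average fold change" as 2 raised to the mean log2 ratio, and applies the twofold threshold in either direction. With three cell lines the one-sample t test has 2 degrees of freedom, for which the Student t distribution has a closed-form CDF, so no statistics library is needed. The function names and gene values are hypothetical.

```python
import math
from statistics import mean, stdev


def one_sample_t_pvalue_df2(values):
    """Two-sided p value of a one-sample t test of mean(values) == 0 with n = 3 (df = 2).

    For df = 2 the Student t CDF is F(t) = 1/2 + t / (2 * sqrt(2 + t**2)),
    so the two-sided p value simplifies to 1 - |t| / sqrt(2 + t**2).
    """
    if len(values) != 3:
        raise ValueError("this closed form is only valid for exactly three values (df = 2)")
    sd = stdev(values)
    if sd == 0:
        # Identical values: t is infinite unless the mean is exactly zero.
        return 0.0 if mean(values) != 0 else 1.0
    t = mean(values) / (sd / math.sqrt(len(values)))
    return 1 - abs(t) / math.sqrt(2 + t * t)


def select_differential_genes(log2_ratios, min_fold=2.0, alpha=0.05):
    """Return {gene: "up" | "down"} for genes passing both criteria.

    log2_ratios maps each gene to its log2(OS cell line / hFOB1.19) values,
    one per OS cell line (MG-63, Saos-2, U-2 OS).
    """
    selected = {}
    for gene, ratios in log2_ratios.items():
        avg_fold = 2 ** mean(ratios)  # assumption: geometric-mean fold change
        passes_fold = avg_fold >= min_fold or avg_fold <= 1 / min_fold
        if passes_fold and one_sample_t_pvalue_df2(ratios) <= alpha:
            selected[gene] = "up" if avg_fold > 1 else "down"
    return selected


# Hypothetical log2 ratios, for illustration only.
example = {
    "GENE_A": [-1.5, -1.6, -1.4],  # consistently about 2.8-fold lower
    "GENE_B": [1.2, 0.1, 2.5],     # large average change but inconsistent across lines
    "GENE_C": [0.2, 0.3, 0.25],    # consistent but below twofold
}
print(select_differential_genes(example))  # {'GENE_A': 'down'}
```

Testing on log ratios rather than raw ratios is a design choice here: it makes up- and down-regulation symmetric around zero, which is what a one-sample test against "no change" needs.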

Patients and samples

All participants provided written informed consent, and the protocol was approved by our institutional review board. We obtained sera from 20 patients with OS and 20 age- and sex-matched healthy control subjects. Serum was separated by centrifugation at 3,000 rpm for 30 min, divided into 150-μl aliquots and stored at −80 °C until use.

Real-time RT-PCR

Total RNA from mouse uteri or cultured cells was isolated using TRIzol reagent (Invitrogen), digested with RQ1 deoxyribonuclease I (Promega) and reverse transcribed into cDNA with the PrimeScript reverse transcriptase reagent kit (Perfect Real Time; TaKaRa, Dalian, China). For real-time RT-PCR, cDNA was amplified using a SYBR Premix Ex Taq kit (TaKaRa; DRR041S) on a Rotor-Gene 3000A system (Corbett Research, Mortlake, Victoria, Australia). The cycling conditions were 95 °C for 10 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 34 s. All reactions were run in triplicate, and 18S rRNA was used for normalization. Real-time PCR data were analyzed using the 2^−ΔΔCt method. Primers used for real-time PCR are listed in Table 1.
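The 2^−ΔΔCt calculation can be illustrated with a short Python sketch. It assumes Ct values averaged over the technical triplicates, 18S rRNA as the reference gene, and roughly 100 % amplification efficiency for both target and reference (the standard assumption of this method). Which samples served as the calibrator is not stated in the text, so the calibrator here is an assumption; the function name and Ct values are hypothetical.

```python
from statistics import mean


def fold_change_ddct(target_sample, reference_sample, target_calibrator, reference_calibrator):
    """Relative expression 2^-ΔΔCt of a target gene in a sample versus a calibrator.

    Each argument is a list of replicate Ct values; "reference" is the
    normalizer gene (18S rRNA in the text). Assumes ~100 % PCR efficiency,
    i.e. the product doubles every cycle.
    """
    delta_ct_sample = mean(target_sample) - mean(reference_sample)
    delta_ct_calibrator = mean(target_calibrator) - mean(reference_calibrator)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical triplicate Ct values: relative to 18S, the target needs about two
# more cycles in the sample than in the calibrator, i.e. roughly fourfold lower expression.
print(fold_change_ddct([26.1, 25.9, 26.0], [12.0, 12.1, 11.9],
                       [24.0, 24.1, 23.9], [12.0, 11.9, 12.1]))  # approximately 0.25
```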

Table 1. Primers used in this study

Gene      Primers                              Product length (bp)
TPI1      Forward: 5′-TTTATGGAGGCTCTGTGA-3′    133
          Reverse: 5′-GTAGGGAAGATGGATGGG-3′
PGK1      Forward: 5′-ATGGAACACGGAGGATAA-3′    220
          Reverse: 5′-GAACTAAAAGGCAGGAAAG-3′
ENO2      Forward: 5′-TCATCAAGGACAAATACGG-3′   125
          Reverse: 5′-CTCTGAGGCAGCAACATC-3′
LDHC      Forward: 5′-CAAGGCAGCAGGAGGGAG-3′    94
          Reverse: 5′-GTAACGGAAACGGGCAGA-3′
18S rRNA  Forward: 5′-CCTGGATCCGCAGCTAGGA-3′   422
          Reverse: 5′-GCGGCGAATACGAATGCCC-3′

Serum L-lactate and pyruvate assay

Serum lactate was determined with an L-lactate assay kit (Biomedical Research Service Center) according to the manufacturer’s protocol. Pyruvate was measured with the Pyruvate Assay Kit from BioVision: 50 μl of the D-alanine conversion reaction was added to 50 μl of Pyruvate Detection Mix and incubated at room temperature for 30 min. A standard curve spanning 0.1–10 nmol/well was prepared, absorbance was measured at 570 nm, and results were calculated from the standard curve (Table 2).
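Reading sample amounts off the pyruvate standard curve amounts to fitting a line to the standards and inverting it. The Python sketch below assumes a linear response at 570 nm over the 0.1–10 nmol/well range and blank-corrected absorbances; the function names and readings are hypothetical, and the kit protocol takes precedence over this illustration.

```python
def fit_standard_curve(amounts, absorbances):
    """Ordinary least-squares fit of absorbance = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, absorbances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to get nmol/well for a sample well."""
    return (absorbance - intercept) / slope


# Hypothetical blank-corrected A570 readings for standards of 0.1-10 nmol/well
# (assumes a perfectly linear response, for illustration only).
standards = [0.1, 0.5, 1.0, 2.5, 5.0, 10.0]
readings = [0.05 * x + 0.02 for x in standards]
slope, intercept = fit_standard_curve(standards, readings)
print(round(amount_from_absorbance(0.27, slope, intercept), 3))  # 5.0
```

Dividing the resulting nmol/well by the serum volume loaded per well would give a concentration; that volume is not stated in the text.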

Table 2. Pathway enrichment analysis of differential metabolites

Pathway                                       Count   p value
Alanine, aspartate and glutamate metabolism   4       1.91E−04
Citrate cycle (TCA cycle)                     3       0.001761
Glycolysis/gluconeogenesis                    3       0.003169
Pentose phosphate pathway                     3       0.003169
Pyruvate metabolism                           3       0.003169
Metabolic pathways                            14      0.003194
Arginine and proline metabolism               4       0.003194
ABC transporters                              4       0.003782
Butanoate metabolism                          3       0.003924
Glyoxylate and dicarboxylate metabolism       3       0.004766
Glycerolipid metabolism                       2       0.037919
Propanoate metabolism                         2       0.044455

Pathway enrichment analysis

Biological pathway analysis was performed with the KEGG pathway database (http://www.genome.jp/kegg/pathway.html). To identify canonical pathways, genes and metabolites were mapped to known entries in the KEGG pathway database. The significance of each pathway was tested using Bioconductor packages (http://www.bioconductor.org), and pathways with p < 0.05 were chosen for further analysis (Table 3).
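The text does not name the statistical test used for pathway significance. A common choice for this kind of over-representation analysis is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched below in Python as an assumption rather than as the authors' method. The background sizes in the example are hypothetical.

```python
from math import comb


def hypergeometric_enrichment_p(k, n, K, N):
    """One-sided over-representation p value P(X >= k) under the hypergeometric model.

    Assumption: hypergeometric test; the source does not name the test used.
    N: background size (all genes or compounds mapped to any KEGG pathway)
    K: background members annotated to the pathway being tested
    n: differential features that mapped to KEGG
    k: differential features annotated to the pathway (the "Count" column)
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum the probabilities of observing k, k + 1, ..., upper pathway members by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical example: 3 of 19 differential metabolites fall in a pathway
# that contains 30 of 1,500 background compounds.
print(hypergeometric_enrichment_p(k=3, n=19, K=30, N=1500))
```

Because the result depends heavily on the chosen background N, the same counts in Tables 2 and 3 could give different p values under a different background definition.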

Table 3. Pathway enrichment analysis of differential genes

Pathway                                      Count   p value
Purine metabolism                            10      0.0015989
Valine, leucine and isoleucine degradation   5       0.0332322
Glycolysis/gluconeogenesis                   5       0.0351129
Arginine and proline metabolism              5       0.0443051
Cysteine and methionine metabolism           4       0.0450947

Discriminant analysis and ROC curves

The biomarker panel was analyzed with a support vector machine (SVM) classifier implemented in the R package e1071. Classifier performance was evaluated by ROC analysis, and the area under the ROC curve was calculated with the R package ROCR.
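An ROC curve is traced by sweeping a decision threshold over the classifier's scores (for an SVM, typically its decision values). The Python sketch below shows that computation without any R dependency. It assumes that higher scores indicate OS and that labels are coded 1 for OS and 0 for control; the function name and scores are hypothetical. Sensitivity is the true-positive rate, and specificity is one minus the false-positive rate.

```python
def roc_points(scores, labels):
    """Return ROC points (false-positive rate, true-positive rate), one per distinct threshold.

    Assumption: a sample is called OS when its score is >= the threshold.
    Thresholds are swept from the highest score downwards, so the curve runs
    from (0, 0) to (1, 1). labels: 1 = OS, 0 = control.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


print(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```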

Results

Overview of transcriptomic and metabolomic signatures

The transcriptomic dataset, obtained as described previously, was re-analyzed using the gene-selection criteria given in Materials and methods. This identified 211 genes, of which 93 were up-regulated and 118 down-regulated. The metabolomic dataset (serum from 24 OS patients and 32 healthy controls) was analyzed using the VIP, t test and fold-change criteria given in Materials and methods, which selected 7 up-regulated and 12 down-regulated metabolites. Both the gene and the metabolite signatures were used for further data integration.

Integrated analysis of OS signatures

We integrated the gene and metabolite signatures at the pathway level. Figure 1 presents the general framework of pathway enrichment analysis and data integration. First, genes differentially expressed in the OS cell lines relative to the osteoblastic cell line were filtered by a statistical criterion and taken as candidate genes. Metabolite signatures were likewise identified within a statistical framework. Biological pathway analysis was then performed with the KEGG pathway database (http://www.genome.jp/kegg/pathway.html), to which genes and metabolites were mapped to identify canonical pathways. Pathway enrichment analysis was performed using our in-house R scripts, and the enriched pathways were selected. For the metabolite signature, the enriched pathways were alanine, aspartate and glutamate metabolism; citrate cycle (TCA cycle); glycolysis/gluconeogenesis; pentose phosphate pathway; pyruvate metabolism; metabolic pathways; arginine and proline metabolism; ABC transporters; butanoate metabolism; glyoxylate and dicarboxylate metabolism; glycerolipid metabolism; and propanoate metabolism. For the gene signature, the enriched pathways were purine metabolism; valine, leucine and isoleucine degradation; glycolysis/gluconeogenesis; arginine and proline metabolism; and cysteine and methionine metabolism. Pathways enriched in both genes and metabolites were then selected, among them the glycolysis/gluconeogenesis pathway. Mapping the genes and metabolites onto the glycolysis/gluconeogenesis pathway yielded a mixed signature of four genes (ENO1, TPI1, PGK1 and LDHC) and two metabolites (lactate and pyruvate) (Fig. 2). This signature was used for validation in an independent dataset.

Fig. 1. Schematic of experimental design and workflow

Fig. 2. Glycolysis pathway is consistently enriched in differential genes and differential metabolites

Validation in an independent dataset

To investigate the predictive value of the mixed metabolite and gene signature, we obtained samples from 20 patients with OS and 20 age- and sex-matched healthy control subjects and constructed an independent test dataset (N = 40) using qRT-PCR and ELISA. The dataset contains four genes and two metabolites. The four genes, ENO1, TPI1, PGK1 and LDHC, were all significantly down-regulated in OS samples (Fig. 3, upper panel). The two metabolites, lactate and pyruvate, also showed significantly lower levels in OS samples (Fig. 3, lower panel).

Fig. 3. Validation in an independent dataset. The p values were 0.049, 0.046, 0.000127, 0.000233, 0.0148 and 0.0381 for ENO2, TPI1, PGK1, LDHC, lactate and pyruvate, respectively

A predictive model using integrated signatures

The independent dataset was used to train an SVM classifier with the R package e1071. Sensitivity and specificity were calculated with the Weka software. Sensitivity measures the ability of a test to identify true positives, and specificity its ability to identify true negatives: sensitivity = TP/(TP + FN); specificity = TN/(TN + FP), where TP, TN, FP and FN are the numbers of true-positive, true-negative, false-positive and false-negative results. The dataset was randomly split into training and test sets, and this procedure was repeated 100 times. The accuracies of the classifier based on metabolites, genes, and metabolites + genes were 0.586, 0.771 and 0.813, respectively, suggesting that the mixed signature identified OS samples more accurately than either metabolites or genes alone. The mixed metabolites + genes signature also outperformed metabolites or genes alone in recall (0.813) and F-measure (0.812) (Fig. 4).
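The sensitivity and specificity definitions above, together with recall and F-measure, can be written out in Python. This illustrative sketch uses hypothetical counts and computes the metrics for the OS class only; Weka also reports class-weighted averages, and the text does not say which form the reported recall and F-measure take.

```python
def classification_metrics(tp, fp, tn, fn):
    """Binary metrics for the OS class from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts for a 20-sample test split (10 OS, 10 controls).
print(classification_metrics(tp=8, fp=3, tn=7, fn=2))
```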

Fig. 4. A predictive model using integrated signatures

Performance of the predictor

Overall performance was evaluated by fivefold cross-validation, and the resulting ROC curves are shown in Fig. 5. Each point on an ROC curve gives the sensitivity and specificity obtained for a given set of classifier weights and score threshold.
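Fivefold cross-validation partitions the samples into five folds and, in turn, trains on four folds and tests on the fifth. The Python sketch below additionally stratifies the folds by class, which is an assumption: the text does not say whether folds were stratified. The function name and seed are hypothetical.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Assumption: folds are stratified. Indices of each class are shuffled and
    dealt round-robin into the k folds, so every fold keeps roughly the same
    OS/control ratio.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[j % k].append(idx)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


labels = [1] * 20 + [0] * 20  # 20 OS patients and 20 controls, as in the validation set
for train, test in stratified_kfold(labels):
    print(len(train), len(test), sum(labels[i] for i in test))  # 32 8 4 for every fold
```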

Fig. 5. ROC curve

Because the area under the curve (AUC) indicates the discriminatory power of a classifier, it was also used to evaluate the predictive efficacy of our classifier. Fig. 5 shows that the metabolites + genes classifier had an AUC of approximately 0.825, compared with 0.58 for metabolites alone and 0.821 for genes alone, suggesting that the mixed signature discriminates OS samples from controls at least as well as genes alone and considerably better than metabolites alone.
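The AUC equals the probability that the classifier scores a randomly chosen OS sample above a randomly chosen control, which is the normalized Mann–Whitney U statistic and also the area under the trapezoidal ROC curve. The short Python sketch below computes it directly from scores; it is an illustration with hypothetical scores, not the R code used in the study.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the probability that a randomly chosen OS sample scores higher than
    a randomly chosen control (ties count one half). labels: 1 = OS, 0 = control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc_mann_whitney([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random guessing, so on this scale the metabolite-only classifier (0.58) is only slightly better than chance.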

Discussion

Early detection of cancer, especially OS, is important because the tumor has often spread to other organs by the time of diagnosis. However, morphological discrimination between early cancer and dysplasia is difficult in biopsy specimens. To improve the prognosis of patients with OS, the development of early detection methods is therefore critical. In this article, we described the framework and evaluation of a data integration approach that combines statistical significance at the pathway level. This approach has the advantage of not requiring gene matching across studies and is often statistically more powerful. In our validation study, the integrated signature outperformed the gene-only and metabolite-only signatures.

To our knowledge, this is the first article to systematically investigate and develop data integration approaches for OS samples. When searching for a consensus cancer classifier, some studies have combined several heterogeneous datasets and used mathematical methods such as logistic discrimination, quadratic discriminant analysis or analysis of variance to “correct” the systematic biases within those datasets before training classifiers. Scherf et al. used average-linkage clustering for tumor tissues from various sites of origin, and Furey et al. (2000) applied a support vector machine to classify tumor and normal ovarian tissues. Although these methods are a step in the right direction, they can also introduce problems. Experimental biases in similar datasets generated in different laboratories on different platforms can be reduced or removed by such methods (Shen and Tseng 2010). However, if the datasets come from diverse patient populations, the technical and biological effects embedded in the microarray data cannot be separated, so applying these methods risks removing informative biological variability.

In our classifier, the systematic integration of differential expression analysis of transcriptomic and metabolomic data, combined with mixed gene and metabolite features, offers two main advantages. First, it makes full use of both gene and metabolite information, which is believed to be more informative than expression changes alone. Second, pathway analysis is a powerful tool for understanding the pathological mechanisms of disease; by integrating the topological features of biological networks, it adds to the classifier some information that is lost in differential expression analysis.

Our classifier does, however, have limitations. First, the analysis favored well-studied pathways, because published, literature-based pathways were the key criteria for enrichment analysis; the current model may therefore be less effective at identifying orphan genes. Second, although connectivity is the most important topological feature of the components of biological networks, this information is incomplete. Future studies that integrate more characteristics into the classifier should provide deeper insight into pathways and OS.

In conclusion, we integrated two disparate transcriptomic and metabolomic datasets and found that biomarkers involved in the glycolysis pathway are highly enriched. Using an independent dataset, we showed that the integrated signature reduced false predictions and improved overall performance. Our results suggest that integrated transcriptomic and metabolomic profiling may reveal new biomarkers for the early detection and diagnosis of OS.


Acknowledgments

This work was financially supported by National Natural Science Foundation of China (81202116).

Conflict of interest

We declare that no IRB approval was required in this study. This work was financially supported by National Natural Science Foundation of China (81202116). We have no financial and personal relationships with other people or organizations that can inappropriately influence our work. All authors have participated in our research, and the article has not been published elsewhere.

Abbreviation

OS: osteosarcoma

Footnotes

Kai Chen and Chunyan Zhu are co-first authors.

Contributor Information

Guodong Li, Email: litrue2004@yahoo.com.cn.

Jilong Liu, Email: liujilong1234@126.com.

References

1. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16:906–914
2. Jin S, Shen JN, Peng JQ, Wang J, Huang G, Li MT (2012) Increased expression of serum gelsolin in patients with osteosarcoma. Chin Med J (Engl) 125:262–269
3. Li G, Zhang W, Zeng H, Chen L, Wang W, Liu J, Zhang Z, Cai Z (2009) An integrative multi-platform analysis for discovering biomarkers of osteosarcoma. BMC Cancer 9:150
4. Lopez-Guerrero JA, Lopez-Gines C, Pellin A, Carda C, Llombart-Bosch A (2004) Deregulation of the G1 to S-phase cell cycle checkpoint is involved in the pathogenesis of human osteosarcoma. Diagn Mol Pathol 13:81–91
5. Sandberg A, Bridge J (2002) Updates on the cytogenetics and molecular genetics of bone and soft tissue tumors: alveolar soft part sarcoma. Cancer Genet Cytogenet 136:1–9
6. Shen K, Tseng GC (2010) Meta-analysis for pathway enrichment analysis when combining multiple genomic studies. Bioinformatics 26:1316–1323
7. Stein JJ (1975) Osteogenic sarcoma (osteosarcoma): results of therapy. Am J Roentgenol Radium Ther Nucl Med 123:607–613
8. Toguchida J, Ishizaki K, Sasaki MS, Nakamura Y, Ikenaga M, Kato M, Sugimot M, Kotoura Y, Yamamuro T (1989) Preferential mutation of paternally derived RB gene as the initial event in sporadic osteosarcoma. Nature 338:156–158
9. Zhang Z, Qiu Y, Hua Y, Wang Y, Chen T, Zhao A, Chi Y, Pan L, Hu S, Li J, Yang C, Li G, Sun W, Cai Z, Jia W (2010) Serum and urinary metabonomic study of human osteosarcoma. J Proteome Res 9:4861–4868
