Prediction of Breast Cancer Metastasis by Gene Expression Profiles: A Comparison of Metagenes and Single Genes

Mark Burton; Mads Thomassen; Qihua Tan; Torben A Kruse

2012 Dec 10;11:193–217. doi: 10.4137/CIN.S10375

PMCID: PMC3529607 PMID: 23304070

Abstract

Background

The widespread use of microarrays in cancer research has led to the development of many predictive or prognostic gene expression profiles. However, the diversity of microarray platforms has made full validation of such profiles and their related gene lists across studies difficult, and their classification accuracies have rarely been validated in multiple independent datasets. Frequently, although the individual genes in such lists do not match, genes with the same function are included across lists. Development of such lists does not take into account the fact that genes can be grouped together as metagenes (MGs) based on common characteristics such as pathways, regulation, or genomic location. Such MGs might be used as features in building a predictive model applicable for classifying independent data. A systematic comparison of the independent validation of gene lists and classifiers based on metagene or single gene (SG) features is therefore needed.

Methods

In this study we compared the performance of metagene- and single gene-based feature sets and classifiers, using random forest and two support vector machines for classifier building. Performance within the same dataset, feature set validation performance, and validation performance of entire classifiers in strictly independent datasets were assessed by 10 times repeated 10-fold cross validation, leave-one-out cross validation, and one-fold validation, respectively. To test the significance of the performance difference between MG- and SG-features/classifiers, we used a repeated down-sampled binomial test approach.

Results

MG- and SG-feature sets are transferable and perform well for training and testing prediction of metastasis outcome in strictly independent datasets, both between different and within similar microarray platforms, while classifiers performed more poorly when validated in strictly independent datasets. The study showed that MG- and SG-feature sets perform equally well in classifying independent data. Furthermore, SG-classifiers significantly outperformed MG-classifiers when validation was conducted between datasets using similar platforms, while no significant performance difference was found when validation was performed between different platforms.

Conclusion

Prediction of metastasis outcome in lymph node–negative patients by MG- and SG-classifiers showed that SG-classifiers performed significantly better than MG-classifiers when validated in independent data based on the same microarray platform as used for developing the classifier. However, the MG- and SG-classifiers had similar performance when classifier validation was conducted in independent data based on a different microarray platform. The same was true when only the sets of MG- and SG-features were validated in independent datasets, both within the same platform and between different platforms.

Keywords: microarray, classification, metagenes, breast cancer

Background

Microarray gene expression analysis has been applied in several studies to elucidate the relation between clinical outcome and gene expression patterns in breast cancer and has demonstrated improved recurrence prediction.¹⁻¹⁴ In some studies, genes in such profiles might be fully or partially missing in the test data used for validation, owing to the choice of microarray platform or to missing values associated with a given probe. Furthermore, a gene list can have few or no genes in common with other gene lists addressing the same clinical outcome,¹⁵,¹⁶ due to the use of different microarray platforms, different methods for measuring mRNA expression levels, variation in patient sampling,¹⁵ lab variation/measurement noise,¹⁷ and differences in data processing such as different normalization methods.¹⁸ In addition, a wide array of feature selection methods is available for gene selection, which also affects the composition of the final gene lists.¹⁵ These feature selection methods encompass filter approaches, such as selection of top features from a ranked list of genes; wrapper methods, where model selection algorithms are wrapped in the search process of feature subsets (eg, the Gini index in random forest);¹⁹ and embedded methods, where the feature selection is an integrated part of the classification method, such as iteratively eliminating redundant features with minimal information regarding classification performance.²⁰ More recent approaches to gene selection include recursive feature elimination based on support vector machines (SVM-RFE). This approach uses the coefficients of the weight vector to compute a feature ranking score, and the features with the smallest ranking scores are eliminated, for example, in a leave-out-N-genes approach.²¹ A more advanced version combines SVM-RFE with a minimum-redundancy and maximum-relevancy filter, where the relevance of each feature is determined by the mutual information between genes and class labels, and the redundancy is given by the mutual information among the genes.²²

In addition to the above-mentioned factors, the choice of classification method also impacts the final model and gene lists. Furthermore, validations of such gene lists in independent data are very heterogeneous: the majority test for significant differences in survival, which barely reflect the actual classification accuracy, while few studies conduct validations in terms of classification accuracy.

To overcome the above obstacles, individual genes could be considered part of a larger network; that is, their expression is controlled by the expression levels of other genes, or a group of genes belongs to a specific pathway performing a well-defined task. These genes may be controlled by the same transcription factor or located in the same chromosomal region. Such groupings have been collected in public databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG),²³ the Molecular Signature Database (MsigDB),²⁴ and the Gene Ontology database (GO).²⁵ In relation to breast cancer, for example, cell cycle upregulation and deregulation of other pathways are associated with metastasis.²,³,²⁶,²⁷ Furthermore, it has been shown that metastasis progression²⁸ and tumor grading²⁹ in breast cancer are associated with accumulated mutations in several genes, leading to amplification or inactivation of genes, and even large genomic losses or gains in specific chromosomal regions affecting gene expression levels.

Our previous studies showed that the expression levels of such specific entities, called metagenes (MGs), are significantly associated with metastatic outcome in breast cancer across eight different datasets.³⁰,³¹ Several studies have defined metagene/gene modules derived from microarray data using various methods such as penalized matrix decomposition, which clusters similar genes but without similar expression profiles,³² hierarchical clustering,³³ correlation,³⁴ or combining a priori protein-protein interactions with microarray gene expression data to define interaction networks as features.³⁵,³⁶ Few studies have attempted to use such predefined gene sets for prediction models. One such study used a compendium of microarray cancer genes for defining metagene/gene modules by performing hierarchical clustering of these genes' expression and seeding genes within the clusters into gene sets annotated in the public databases.³³ A second study defined metagene/gene modules as sets of significantly correlated or anticorrelated genes combined with prior information about the genes.³⁴ One of the strengths of using gene sets as features is that it circumvents the necessity of sharing all genes between studies. Furthermore, grouping the genes together also reduces the dimensionality of the datasets and thus functions as feature reduction. Therefore, profiles consisting of MGs might be used for developing predictive classifiers that can be validated in independent data.

This study systematically assesses and compares the performance of MG- and SG-(single gene) feature sets and MG- and SG-classifiers extracted from the same samples in predicting metastasis outcome among lymph node–negative breast cancer patients who have not been treated with adjuvant therapy.

These comparisons were first made by model building and classification within the same dataset using 10-fold cross validation. The comparisons were also made across datasets in two ways: (1) application of the entire classifier to the test data and (2) transfer of only the features from the classifier to the test data for model building and testing, evaluated by leave-one-out cross validation (LOOCV). In each case, we also examined two possible scenarios. In the first, the validations were conducted between studies using the same microarray platform (Affymetrix classifier/feature set validated on an independent Affymetrix dataset), while the second encompassed validations across studies with different platforms (Agilent-developed classifier/feature set validated on an independent Affymetrix dataset).
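The two across-dataset validation modes described above can be contrasted in a minimal sketch. It uses a nearest-centroid classifier purely as a stand-in for the random forest and support vector machines of the study, and the toy data (with a constant offset in the test set to mimic a platform shift), feature indices, and function names are all hypothetical.

```python
from statistics import mean

def fit_centroids(X, y, features):
    """Train a nearest-centroid model (stand-in classifier) on the selected features."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]
    return centroids

def predict(centroids, x, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((x[f] - ci) ** 2 for f, ci in zip(features, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def classifier_transfer(X_train, y_train, X_test, y_test, features):
    """Mode 1: the model built on the training dataset is applied unchanged to the test dataset."""
    model = fit_centroids(X_train, y_train, features)
    hits = sum(predict(model, x, features) == lab for x, lab in zip(X_test, y_test))
    return hits / len(y_test)

def feature_transfer_loocv(X_test, y_test, features):
    """Mode 2: only the features are transferred; the model is rebuilt in the test dataset with LOOCV."""
    hits = 0
    for i in range(len(X_test)):
        X_rest = X_test[:i] + X_test[i + 1:]
        y_rest = y_test[:i] + y_test[i + 1:]
        model = fit_centroids(X_rest, y_rest, features)
        hits += predict(model, X_test[i], features) == y_test[i]
    return hits / len(y_test)

# Hypothetical toy data: rows are samples, columns are features; labels 0/1 are outcomes.
X_train = [[0.0, 7.0, 0.5], [0.5, 1.0, 0.0], [5.0, 3.0, 5.5], [5.5, 9.0, 5.0]]
y_train = [0, 0, 1, 1]
# The test set carries a constant offset of +10 on the informative features.
X_test = [[10.0, 2.0, 10.5], [10.5, 8.0, 10.0], [15.0, 4.0, 15.5], [15.5, 6.0, 15.0]]
y_test = [0, 0, 1, 1]
features = [0, 2]  # hypothetical selected feature set

acc_transfer = classifier_transfer(X_train, y_train, X_test, y_test, features)
acc_loocv = feature_transfer_loocv(X_test, y_test, features)
```

In this contrived example the offset misleads the transferred classifier, whereas a model rebuilt on the same features inside the test data is unaffected. This only illustrates the mechanics of the two modes and says nothing about the magnitude of any effect in real data.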

Results

Features and models

The 71 metagenes used in this study were defined as gene sets covering similar biological pathways, sharing common transcription factor binding sites, or comprising genes located in the same chromosomal region (Supplementary Table 1). In previous studies these proved to be associated with breast cancer metastasis across eight different datasets using a rank-based method (Fig. 1).³⁰,³¹ An overview of the eight datasets is shown in Table 1. The final MGs consist, on average, of 21 genes, with the smallest MGs consisting of only 2 genes and the largest of 65 genes. This rank-based method was also applied to each gene across the same eight datasets, which led to identification of 283 rank-significant SGs (Supplementary Table 2) shared by the four datasets used later in the study. Amongst the 283 SGs, 119 genes were also present on the gene lists underlying the MGs (data not shown). These 71 MGs and 283 SGs were used for selecting the optimal MG- and SG-feature sets and for developing their corresponding MG- and SG-classifiers.

Figure 1. Metagene and single gene selection procedure.

**Notes:** MGs (blue) and SGs (red) were both derived from the same eight breast cancer gene expression datasets. These covered 32418 genes. A total of 1057 gene lists were defined from these 32418 genes/probes. These were subjected to gene set enrichment analysis (GSEA), ranked within each dataset according to their signal-to-noise ratio, and their mean rank across datasets was calculated. This mean rank was significance tested as described in the Materials and Methods section, resulting in 71 metagenes that were scored by the median gene expression of the GSEA leading edge genes. The single genes were selected by directly ranking each gene/probe across the datasets and subsequently following the same procedure as for the metagenes, resulting in 283 significant single genes. The measure for each single gene is the gene expression level associated with that gene.
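The two quantities named in these notes can be illustrated with a minimal sketch. It assumes the signal-to-noise ratio takes the form commonly used in GSEA, (mean_A − mean_B) / (sd_A + sd_B), and that a metagene score for a sample is the median expression of the leading edge genes in that sample. The gene names, expression values, and function names are hypothetical and are not the study's data or code.

```python
from statistics import mean, median, stdev

def signal_to_noise(group_a, group_b):
    """GSEA-style signal-to-noise ratio for one gene (assumed form)."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))

def metagene_score(sample_expression, leading_edge_genes):
    """Median expression of the leading edge genes within one sample."""
    return median(sample_expression[g] for g in leading_edge_genes)

# Hypothetical expression values of one gene in the two outcome groups.
metastasis = [2.0, 4.0, 6.0]
no_metastasis = [1.0, 2.0, 3.0]
s2n = signal_to_noise(metastasis, no_metastasis)

# Hypothetical sample (gene -> expression level) and leading edge subset.
sample = {"GENE_A": 1.5, "GENE_B": 3.0, "GENE_C": 0.5, "GENE_D": 9.0}
score = metagene_score(sample, ["GENE_A", "GENE_B", "GENE_C"])
```

Using the median rather than the mean keeps the metagene score robust to a single outlying gene, and the score can still be computed when some member genes are absent from a platform, as long as at least one leading edge gene is measured.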

Table 1.

Overview of datasets.

Dataset	Chip	Probes (K)	Patients	Outcome	Treatment	Define MG	Define SG	Train	Test	Ref.
Amsterdam	Agilent/Rosetta	25	295, N⁺, N⁻	DM	None, et, ct	√	√			[14]
Amsterdam (AG1) (subset of the above)	Agilent/Rosetta	25	151, N⁻	DM	None	√	√	√		[14]
Rotterdam (AF1)	Affymetrix HG-133A	22	286, N⁻	DM	None	√	√	√		[3]
HUMAC	Spotted oligonucleotides	29	60, N⁻	ME	None	√	√			[7]
Huang	Affymetrix 95av2	12	52, N⁺	RE	Ct	√	√			[13]
Sotiriou 2003	Spotted cDNA	7.6	99, N⁺/N⁻	RE	Et, ct	√	√			[1]
Sotiriou 2006	Affymetrix HG-133A	22	179, N⁺/N⁻	DM	Et	√	√			[12]
Uppsala	Affymetrix HG-133A+B	44	236, N⁺/N⁻	DF	None, ct, et	√	√			[52]
Stockholm	Affymetrix HG-133A+B	44	159, N⁺/N⁻	RE	None, ct, et	√	√			[11]
TRANSBIG (AF2)	Affymetrix HG-133A	22	147, N⁻	DM	None			√^a	√	[35]
Mainz (AF3)	Affymetrix HG-133A	22	200, N⁻	DM	None			√^b	√	[53]

Dataset	AG1		AF1		AF2		AF3	
Features method	#MG	#SG	#MG	#SG	#MG	#SG	#MG	#SG
RF	4	21	15	21	14	14	21	26
R-SVM	18	20	57	25	5	22	10	71
S-SVM	29	17	67	35	9	19	64	122

Metagene	Type	# genes
12q13	Chromosome region	28
14q24	Chromosome region	18
16q22	Chromosome region	23
16q24	Chromosome region	14
17q23	Chromosome region	13
17q25	Chromosome region	16
1p31	Chromosome region	14
1q42	Chromosome region	24
20q11	Chromosome region	10
20q13	Chromosome region	29
5q14	Chromosome region	6
5q33	Chromosome region	7
8p21	Chromosome region	14
8q22	Chromosome region	12
8q24	Chromosome region	21
ACTINYPATHWAY	Biological pathway	14
AMINOACYL_TRNA_BIOSYNTHESIS	Biological pathway	8
ARAPPATHWAY	Biological pathway	5
ATRBRCAPATHWAY	Biological pathway	10
BETA_ALANINE_METABOLISM	Biological pathway	11
CELL_CYCLE_KEGG	Biological pathway	39
DNA_REPLICATION_REACTOME	Biological pathway	19
EGFPATHWAY	Biological pathway	8
ELECTRON_TRANSPORT_CHAIN	Biological pathway	39
ERBB2_GRB7	Biological pathway	2
FATTY_ACID_METABOLISM	Biological pathway	20
FRUCTOSE_AND_MANNOSE_METABOLISM	Biological pathway	10
G2PATHWAY	Biological pathway	11
GCCATNTTG_V$YY1_Q6	Transcription factor binding motif	65
GLEEVECPATHWAY	Biological pathway	7
GLYCEROLIPID_METABOLISM	Biological pathway	14
GLYCOLYSIS_AND_GLUCONEOGENESIS	Biological pathway	12
GPCRPATHWAY	Biological pathway	8
HISTIDINE_METABOLISM	Biological pathway	11
Il-12	Biological pathway	8
MRNA_PROCESSING_REACTOME	Biological pathway	24
MRPPATHWAY	Biological pathway	3
NUCLEAR_RECEPTORS	Biological pathway	12
OXIDATIVE_PHOSPHORYLATION	Biological pathway	26
PDGFPATHWAY	Biological pathway	7
PENTOSE_PHOSPHATE_PATHWAY	Biological pathway	11
PPARAPATHWAY	Biological pathway	10
PROTEASOME_DEGRADATION	Biological pathway	18
PURINE_METABOLISM	Biological pathway	28
PYRIMIDINE_METABOLISM	Biological pathway	23
RNA_TRANSCRIPTION_REACTOME	Biological pathway	9
S1P_SIGNALING	Biological pathway	6
S1P54_01	Biological pathway	53
TGASTMAGC_V$NFE2_01	Transcription factor binding motif	35
TNFR2	Biological pathway	9
TOLLPATHWAY	Biological pathway	10
UBIQUITIN_MEDIATED_PROTEOLYSIS	Biological pathway	2
V$AP1_01	Transcription factor binding motif	39
V$AP2_Q3	Transcription factor binding motif	33
V$ARNT_02	Transcription factor binding motif	34
V$BACH1_01	Transcription factor binding motif	50
V$CETS1P54_01	Transcription factor binding motif	53
V$COUP_DR1_Q6	Transcription factor binding motif	29
V$E2F_Q6_01	Transcription factor binding motif	52
V$ELK1_02	Transcription factor binding motif	38
V$ER_Q6_02	Transcription factor binding motif	25
V$GABP_B	Transcription factor binding motif	20
V$HIF1_Q5	Transcription factor binding motif	27
V$MYCMAX_B	Transcription factor binding motif	54
V$NFY_Q6	Transcription factor binding motif	22
V$NRF1_Q6	Transcription factor binding motif	35
V$NRF2_01	Transcription factor binding motif	35
V$SP1_Q6_01	Transcription factor binding motif	26
V$USF2_Q6	Transcription factor binding motif	34
VALINE_LEUCINE_AND_ISOLEUCINE_DEGRADATION	Biological pathway	15
VEGFPATHWAY	Biological pathway	9

Gene symbol	Description
ABCA5	ATP-binding cassette, sub-family A (ABC1), member 5
ABCA8	ATP-binding cassette, sub-family A (ABC1), member 8
ABCC10	ATP-binding cassette, sub-family C (CFTR/MRP), member 10
ABCC5	ATP-binding cassette, sub-family C (CFTR/MRP), member 5
ABTB2	Ankyrin repeat and BTB (POZ) domain containing 2
ACD	Adrenocortical dysplasia homolog (mouse)
ADFP	Adipose differentiation-related protein
ADH1B	Alcohol dehydrogenase IB (class I), beta polypeptide
ADRA2A	Adrenergic, alpha-2A-, receptor
ADRM1	Adhesion regulating molecule 1
ALDH1A1	Aldehyde dehydrogenase 1 family, member A1
ALDH2	Aldehyde dehydrogenase 2 family (mitochondrial)
ALDH6A1	Aldehyde dehydrogenase 6 family, member A1
APOD	Apolipoprotein D
ARHGEF6	Rac/Cdc42 guanine nucleotide exchange factor (GEF) 6
ATP1B3	ATPase, Na+/K+ transporting, beta 3 polypeptide
ATP2A2	ATPase, Ca++ transporting, cardiac muscle, slow twitch 2
ATP9A	ATPase, Class II, type 9A
AURKB	Aurora kinase B
BARD1	BRCA1 associated RING domain 1
BCL2	B-cell CLL/lymphoma 2
BCL2L1	BCL2-like 1
BRCA1	Breast cancer 1, early onset
BUB1	BUB1 budding uninhibited by benzimidazoles 1 homolog (yeast)
BUB1B	BUB1 budding uninhibited by benzimidazoles 1 homolog beta (yeast)
C6	Complement component 6
C7ORF24	Chromosome 7 open reading frame 24
CACNA1D	Calcium channel, voltage-dependent, L type, alpha 1D subunit
CARS	Cysteinyl-tRNA synthetase
CAT	Catalase
CCNA2	Cyclin A2
CCNB1	Cyclin B1
CCNB2	Cyclin B2
CCNE2	Cyclin E2
CCNF	Cyclin F
CCT5	Chaperonin containing TCP1, subunit 5 (epsilon)
CCT6A	Chaperonin containing TCP1, subunit 6A (zeta 1)
CD44	CD44 molecule (Indian blood group)
CDC2	Cell division cycle 2, G1 to S and G2 to M
CDC20	CDC20 cell division cycle 20 homolog (S. cerevisiae)
CDC25B	Cell division cycle 25B
CDC25C	Cell division cycle 25C
CDC34	Cell division cycle 34
CDC45L	CDC45 cell division cycle 45-like (S. cerevisiae)
CDK8	Cyclin-dependent kinase 8
CDKN3	Cyclin-dependent kinase inhibitor 3 (CDK2-associated dual specificity phosphatase)
CDO1	Cysteine dioxygenase, type I
CENPE	Centromere protein E, 312 kDa
CENPF	Centromere protein F, 350/400 kDa (mitosin)
CH25H	Cholesterol 25-hydroxylase
CHAF1B	Chromatin assembly factor 1, subunit B (p60)
CIRBP	Cold inducible RNA binding protein
CKAP5	Cytoskeleton associated protein 5
CKS2	CDC28 protein kinase regulatory subunit 2
CNN3	Calponin 3, acidic
CNTN1	Contactin 1
CP	Ceruloplasmin (ferroxidase)
CREBL2	CAMP responsive element binding protein-like 2
CRIM1	Cysteine rich transmembrane BMP regulator 1 (chordin-like)
CSE1L	CSE1 chromosome segregation 1-like (yeast)
CSTF1	Cleavage stimulation factor, 3′ pre-RNA, subunit 1, 50 kDa
CTPS	CTP synthase
CTSD	Cathepsin D (lysosomal aspartyl peptidase)
CTSL	Cathepsin L
CX3CR1	Chemokine (C-X3-C motif) receptor 1
CYP4B1	Cytochrome P450, family 4, subfamily B, polypeptide 1
CYP4F12	Cytochrome P450, family 4, subfamily F, polypeptide 12
DDIT4	DNA-damage-inducible transcript 4
DDX39	DEAD (Asp-Glu-Ala-Asp) box polypeptide 39
DLG7	Discs, large homolog 7 (Drosophila)
DLX2	Distal-less homeobox 2
DOCK1	Dedicator of cytokinesis 1
DPT	Dermatopontin
DUSP1	Dual specificity phosphatase 1
DUSP4	Dual specificity phosphatase 4
DYRK2	Dual-specificity tyrosine-(Y)-phosphorylation regulated kinase 2
EBP	Emopamil binding protein (sterol isomerase)
EDG1	Endothelial differentiation, sphingolipid G-protein-coupled receptor, 1
EGR2	Early growth response 2 (Krox-20 homolog, Drosophila)
ELOVL5	ELOVL family member 5, elongation of long chain fatty acids (FEN1/Elo2, SUR4/Elo3- like, yeast)
ENPP2	Ectonucleotide pyrophosphatase/phosphodiesterase 2 (autotaxin)
EPHX2	Epoxide hydrolase 2, cytoplasmic
ESPL1	Extra spindle poles like 1 (S. cerevisiae)
EVPL	Envoplakin
EXO1	Exonuclease 1
EZH2	Enhancer of zeste homolog 2 (Drosophila)
F3	Coagulation factor III (thromboplastin, tissue factor)
FADD	Fas (TNFRSF6)-associated via death domain
FANCG	Fanconi anemia, complementation group G
FAS	Fas (TNF receptor superfamily, member 6)
FBLN1	Fibulin 1
FBLN5	Fibulin 5
FCER1A	Fc fragment of IgE, high affinity I, receptor for; alpha polypeptide
FEN1	Flap structure-specific endonuclease 1
FGL2	Fibrinogen-like 2
FLJ22531	–
FMO2	Flavin containing monooxygenase 2 (non-functional)
FOS	v-fos FBJ murine osteosarcoma viral oncogene homolog
FOXM1	Forkhead box M1
FRZB	Frizzled-related protein
FUCA1	Fucosidase, alpha-L-1, tissue
GABARAP	GABA(A) receptor-associated protein
GAD1	Glutamate decarboxylase 1 (brain, 67 kDa)
GALK1	Galactokinase 1
GEM	GTP binding protein overexpressed in skeletal muscle
GGCX	Gamma-glutamyl carboxylase
GLA	Galactosidase, alpha
GLI1	Glioma-associated oncogene homolog 1 (zinc finger protein)
GMPS	Guanine monophosphate synthetase
GNG11	Guanine nucleotide binding protein (G protein), gamma 11
GNG12	Guanine nucleotide binding protein (G protein), gamma 12
GPSM2	G-protein signalling modulator 2 (AGS3-like, C. elegans)
GRIK1	Glutamate receptor, ionotropic, kainate 1
GSTM3	Glutathione S-transferase M3 (brain)
GUK1	Guanylate kinase 1
GYS2	Glycogen synthase 2 (liver)
H2AFZ	H2A histone family, member Z
HIST1H3D	Histone cluster 1, H3d
HMGB2	High-mobility group box 2
HMMR	Hyaluronan-mediated motility receptor (RHAMM)
HNMT	Histamine N-methyltransferase
HNRPAB	Heterogeneous nuclear ribonucleoprotein A/B
HNRPH2	Heterogeneous nuclear ribonucleoprotein H2 (H′)
HPN	Hepsin (transmembrane protease, serine 1)
HPRT1	Hypoxanthine phosphoribosyltransferase 1 (Lesch-Nyhan syndrome)
IFNGR2	Interferon gamma receptor 2 (interferon gamma transducer 1)
IGFBP4	Insulin-like growth factor binding protein 4
IQGAP2	IQ motif containing GTPase activating protein 2
ITM2A	Integral membrane protein 2A
ITPR1	Inositol 1,4,5-triphosphate receptor, type 1
JAK2	Janus kinase 2 (a protein tyrosine kinase)
KCTD12	Potassium channel tetramerisation domain containing 12
KIF11	Kinesin family member 11
KIF13B	Kinesin family member 13B
KIF14	Kinesin family member 14
KIF2C	Kinesin family member 2C
KIFC1	Kinesin family member C1
KIAA0101	KIAA0101
KIAA0247	KIAA0247
KIAA0286	–
KIAA0319	KIAA0319
LAMA2	Laminin, alpha 2 (merosin, congenital muscular dystrophy)
LARP1	La ribonucleoprotein domain family, member 1
LEP	Leptin (obesity homolog, mouse)
LIG1	Ligase I, DNA, ATP-dependent
LMNB1	Lamin B1
LMO2	LIM domain only 2 (rhombotin-like 1)
LPHN2	Latrophilin 2
LPL	Lipoprotein lipase
LRIG1	Leucine-rich repeats and immunoglobulin-like domains 1
LRP2	Low density lipoprotein-related protein 2
MAD2L1	MAD2 mitotic arrest deficient-like 1 (yeast)
MAPRE1	Microtubule-associated protein, RP/EB family, member 1
MARS	Methionine-tRNA synthetase
MCM3	MCM3 minichromosome maintenance deficient 3 (S. cerevisiae)
MCM6	MCM6 minichromosome maintenance deficient 6 (MIS5 homolog, S. pombe) (S. cerevisiae)
MCM7	MCM7 minichromosome maintenance deficient 7 (S. cerevisiae)
MEIS1	Meis1, myeloid ecotropic viral integration site 1 homolog (mouse)
MELK	Maternal embryonic leucine zipper kinase
MGP	Matrix Gla protein
MKI67	Antigen identified by monoclonal antibody Ki-67
MN1	Meningioma (disrupted in balanced translocation) 1
MRPL12	Mitochondrial ribosomal protein L12
MT2A	Metallothionein 2A
MTHFD2	Methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2, methenyltetrahydrofolate cyclohydrolase
MVD	Mevalonate (diphospho) decarboxylase
MYBL2	v-myb myeloblastosis viral oncogene homolog (avian)-like 2
NCOA1	Nuclear receptor coactivator 1
NDUFA9	NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 9, 39 kDa
NEDD9	Neural precursor cell expressed, developmentally down-regulated 9
NEK2	NIMA (never in mitosis gene a)-related kinase 2
NME5	Non-metastatic cells 5, protein expressed in (nucleoside-diphosphate kinase)
NNAT	Neuronatin
NP	Nucleoside phosphorylase
NR3C2	Nuclear receptor subfamily 3, group C, member 2
NTRK2	Neurotrophic tyrosine kinase, receptor, type 2
NUDT1	Nudix (nucleoside diphosphate linked moiety X)-type motif 1
NUP155	Nucleoporin 155 kDa
NUP62	Nucleoporin 62 kDa
NVL	Nuclear VCP-like
OMD	Osteomodulin
P4HA2	Procollagen-proline, 2-oxoglutarate 4-dioxygenase (proline 4-hydroxylase), alpha polypeptide II
PDCD4	Programmed cell death 4 (neoplastic transformation inhibitor)
PDE4A	Phosphodiesterase 4A, cAMP-specific (phosphodiesterase E2 dunce homolog, Drosophila)
PDZRN3	PDZ domain containing RING finger 3
PFKP	Phosphofructokinase, platelet
PHLDA2	Pleckstrin homology-like domain, family A, member 2
PIN1	Protein (peptidylprolyl cis/trans isomerase) NIMA-interacting 1
PIP	Prolactin-induced protein
PIR	Pirin (iron-binding nuclear protein)
PKMYT1	Protein kinase, membrane associated tyrosine/threonine 1
PLK4	Polo-like kinase 4 (Drosophila)
PLP2	Proteolipid protein 2 (colonic epithelium-enriched)
PNMA2	Paraneoplastic antigen MA2
PNRC1	Proline-rich nuclear receptor coactivator 1
POLD1	Polymerase (DNA directed), delta 1, catalytic subunit 125 kDa
POLR2H	Polymerase (RNA) II (DNA directed) polypeptide H
POLS	Polymerase (DNA directed) sigma
PRAME	Preferentially expressed antigen in melanoma
PSD3	Pleckstrin and Sec7 domain containing 3
PSMB3	Proteasome (prosome, macropain) subunit, beta type, 3
PSMB7	Proteasome (prosome, macropain) subunit, beta type, 7
PSMD1	Proteasome (prosome, macropain) 26S subunit, non-ATPase, 1
PSMD11	Proteasome (prosome, macropain) 26S subunit, non-ATPase, 11
PTDSR	Phosphatidylserine receptor
PTGER3	Prostaglandin E receptor 3 (subtype EP3)
PTGER4	Prostaglandin E receptor 4 (subtype EP4)
PTPRT	Protein tyrosine phosphatase, receptor type, T
PTTG1	Pituitary tumor-transforming 1
QDPR	Quinoid dihydropteridine reductase
RABGGTA	Rab geranylgeranyltransferase, alpha subunit
RABIF	RAB interacting factor
RAD51	RAD51 homolog (RecA homolog, E. coli) (S. cerevisiae)
RAD51AP1	RAD51 associated protein 1
RAE1	RAE1 RNA export 1 homolog (S. pombe)
RALA	v-ral simian leukemia viral oncogene homolog A (ras related)
RBMS3	RNA binding motif, single stranded interacting protein
RDBP	RD RNA binding protein
RECQL4	RecQ protein-like 4
RFC3	Replication factor C (activator 1) 3, 38 kDa
RFC4	Replication factor C (activator 1) 4, 37 kDa
RGS5	Regulator of G-protein signalling 5
RICS	–
RNASEH2A	Ribonuclease H2, subunit A
RRM1	Ribonucleotide reductase M1 polypeptide
RRM2	Ribonucleotide reductase M2 polypeptide
RTN1	Reticulon 1
SAC3D1	SAC3 domain containing 1
SC5DL	Sterol-C5-desaturase (ERG3 delta-5-desaturase homolog, fungal)-like
SDS	Serine dehydratase
SEC14L2	SEC14-like 2 (S. cerevisiae)
SEC61G	Sec61 gamma subunit
SELE	Selectin E (endothelial adhesion molecule 1)
SEMA3E	Sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3E
SERPINA1	Serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1
SF3B4	Splicing factor 3b, subunit 4, 49 kDa
SFRP4	Secreted frizzled-related protein 4
SFRS5	Splicing factor, arginine/serine-rich 5
SH3BGRL	SH3 domain binding glutamic acid-rich protein like
SIAHBP1	–
SIX1	Sine oculis homeobox homolog 1 (Drosophila)
SLBP	Stem-loop (histone) binding protein
SLC14A1	Solute carrier family 14 (urea transporter), member 1 (Kidd blood group)
SLC16A3	Solute carrier family 16, member 3 (monocarboxylic acid transporter 4)
SLC25A1	Solute carrier family 25 (mitochondrial carrier; citrate transporter), member 1
SLC4A7	Solute carrier family 4, sodium bicarbonate cotransporter, member 7
SLIT2	Slit homolog 2 (Drosophila)
SMARCA2	SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 2
SORBS2	Sorbin and SH3 domain containing 2
SORL1	Sortilin-related receptor, L(DLR class) A repeats-containing
SPAG5	Sperm associated antigen 5
SPRY2	Sprouty homolog 2 (Drosophila)
SSPN	Sarcospan (Kras oncogene-associated gene)
SSRP1	Structure specific recognition protein 1
STC2	Stanniocalcin 2
STMN1	Stathmin 1/oncoprotein 18
SURF2	Surfeit 2
TACSTD1	Tumor-associated calcium signal transducer 1
TAT	Tyrosine aminotransferase
TBCD	Tubulin-specific chaperone d
TGFB3	Transforming growth factor, beta 3
TIMELESS	Timeless homolog (Drosophila)
TIMM17B	Translocase of inner mitochondrial membrane 17 homolog B (yeast)
TLR3	Toll-like receptor 3
TOP2A	Topoisomerase (DNA) II alpha 170 kDa
TPX2	TPX2, microtubule-associated, homolog (Xenopus laevis)
TRIP13	Thyroid hormone receptor interactor 13
TROAP	Trophinin associated protein (tastin)
TUBA1	Tubulin, alpha 1
TXN	Thioredoxin
TXNIP	Thioredoxin interacting protein
TXNRD1	Thioredoxin reductase 1
TYRP1	Tyrosinase-related protein 1
UBE2C	Ubiquitin-conjugating enzyme E2C
UBE2V2	Ubiquitin-conjugating enzyme E2 variant 2
WDHD1	WD repeat and HMG-box DNA binding protein 1
WFDC2	WAP four-disulfide core domain 2
WWP2	WW domain containing E3 ubiquitin protein ligase 2
XPOT	Exportin, tRNA (nuclear export receptor for tRNAs)
YWHAZ	Tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide
ZNF238	Zinc finger protein 238
ZWINT	ZW10 interactor
AASS	Aminoadipate-semialdehyde synthase

Internal results	RF			R-SVM			S-SVM			Across		
Method	Sen	Spe	bAcc	Sen	Spe	bAcc	Sen	Spe	bAcc	Sen	Spe	bAcc
MG	73	69	71	80	65	73	71	76	74	75	70	73
SG	85	53	69	85	59	72	69	79	74	80	64	72
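Assuming that bAcc denotes balanced accuracy, that is, the mean of sensitivity and specificity (the table does not define the abbreviation), the reported values can be cross-checked with a short sketch. The dictionary layout and function name are hypothetical; the numbers are copied from the table above.

```python
# (sensitivity, specificity, reported bAcc) in percent, copied from the table above.
reported = {
    ("MG", "RF"): (73, 69, 71),
    ("MG", "R-SVM"): (80, 65, 73),
    ("MG", "S-SVM"): (71, 76, 74),
    ("MG", "Across"): (75, 70, 73),
    ("SG", "RF"): (85, 53, 69),
    ("SG", "R-SVM"): (85, 59, 72),
    ("SG", "S-SVM"): (69, 79, 74),
    ("SG", "Across"): (80, 64, 72),
}

def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy as the mean of sensitivity and specificity (assumed definition)."""
    return (sensitivity + specificity) / 2

# Largest gap between the recomputed and the reported value, in percentage points.
max_gap = max(
    abs(balanced_accuracy(sen, spe) - bacc) for sen, spe, bacc in reported.values()
)
```

Under this assumption every recomputed value lies within 0.5 percentage points of the reported one, which is consistent with rounding to whole percentages.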
