TMA Navigator: network inference, patient stratification and survival analysis with tissue microarray data

Alexander L R Lubbock; Elad Katz; David J Harrison; Ian M Overton

doi:10.1093/nar/gkt529

Nucleic Acids Res. 2013 Jun 11;41(Web Server issue):W562–W568. doi: 10.1093/nar/gkt529

PMCID: PMC3692046 PMID: 23761446

Abstract

Tissue microarrays (TMAs) allow multiplexed analysis of tissue samples and are frequently used to estimate biomarker protein expression in tumour biopsies. TMA Navigator (www.tmanavigator.org) is an open access web application for analysis of TMA data and related information, accommodating categorical, semi-continuous and continuous expression scores. Non-biological variation, or batch effects, can hinder data analysis and may be mitigated using the ComBat algorithm, which is incorporated with enhancements for automated application to TMA data. Unsupervised grouping of samples (patients) is provided according to Gaussian mixture modelling of marker scores, with cardinality selected by Bayesian information criterion regularization. Kaplan–Meier survival analysis is available, including comparison of groups identified by mixture modelling using the Mantel-Cox log-rank test. TMA Navigator also supports network inference approaches useful for TMA datasets, which often comprise comparatively few markers. Tissue and cell-type specific networks derived from TMA expression data offer insights into the molecular logic underlying pathophenotypes, towards more effective and personalized medicine. Output is interactive, and results may be exported for use with external programs. Private anonymous access is available, and user accounts may be generated for easier data management.

INTRODUCTION

Oncogenic selection manifests through dysregulated pathways (1). Protein abundance and post-translational modifications (PTMs) are key determinants of network/pathway activity; therefore, functional proteomics is particularly important for understanding signalling networks underlying cancer progression, including evolution of drug resistance and metastasis (2). Tissue microarrays (TMAs) enable study of protein (and RNA) expression in ex vivo material, typically formalin-fixed paraffin-embedded tissue obtained at operation (3). Multiplexed immunohistochemical analysis across arrays of tissue cores efficiently derives protein expression measurements for many specimens (4). TMAs also provide greater consistency than whole section approaches due to simultaneous processing of multiple samples in identical conditions, among other features (5). Clinical subtyping frequently uses TMAs, for example to determine estrogen receptor-α (ER-α) and HER2/neu status in breast cancer (5–7). Although alternative techniques afford greater throughput for estimating protein expression, notably reverse phase protein arrays (8) and mass spectrometry (9), TMAs have particular advantages. These include identification of marker subcellular localization and discrimination of tumour compartments (e.g. stroma) using little material and without requirement for laser capture microdissection or cell fractionation (10,11). Furthermore, TMAs provide potential to identify single cell expression distributions (12).

TMA Navigator provides an integrated platform for TMA data, designed to handle categorical, semi-continuous and continuous scoring, e.g. (13–16). User-friendly interactive access is provided for data processing, investigation of marker networks and risk stratification. An option is available for reduction of batch effects, which are common, for example where data are split across multiple TMA blocks (17,18). Techniques for data exploration include kernel density estimation and Gaussian mixture modelling with Bayesian information criterion regularization for unbiased cluster identification. Analysis of survival is included (19), incorporating stratification based on mixture model results. Evidence is mounting that most phenotypes are governed by complex networks (20,21). TMA Navigator provides network inference approaches applicable to TMA datasets, which typically have relatively few markers. While several resources for TMA image data processing and management exist (22–25), few user-friendly tools provide tailored workflows for data analysis and integration with clinical variables. Stanford TMA software (26) and X-tile (27) are notable, but provide comparatively restricted functionality. Study of marker relationships in clinical samples contributes to the development and testing of hypotheses about control of medically relevant phenotypes, such as treatment response or metastasis (21).

USAGE

A flowchart summarizing the steps involved in using TMA Navigator (www.tmanavigator.org) is given in Supplementary Figure S1 and includes embedded hyperlinks to relevant parts of the user guide. Extensive help documentation is available by clicking on the Help button near the top-right of any page on the website, which opens at the section relevant to the current page. Many parts of the website have context-sensitive help, including tooltips and links from headings to appropriate subsections of the user guide. The first step in working with TMA Navigator is to create a dataset by importing marker scores, typically protein expression values; survival information can also be uploaded. A unique page for the dataset (the ‘dataset page’) has a Run analysis button providing access to data exploration, network inference and survival analysis. Analyses are processed in a queuing system and results are accessed from the dataset page.

Importing data

TMA Navigator has a button labelled Add dataset near the top-right of every page to start the process of importing marker data. A grid format is required, with markers as columns and samples as rows. Marker replicates are specified by multiple columns with identical names. File formats accepted are Microsoft Excel (.xls, .xlsx), tab-separated (.tsv, .txt) or comma-separated values (.csv). For anonymous guest users, an imported dataset receives a unique URL, which is easily bookmarked and protected by a random key. Alternatively, users may register an account, which provides a single point of reference for multiple uploaded datasets.

Tissue microarray datasets are often split across multiple TMA blocks, which can lead to unwanted non-biological variation (batch effects). TMA Navigator provides an option for batch effect reduction using ComBat (17). We have adapted ComBat for use with TMAs, including improved error handling and automatic removal of replicates/markers that prove problematic due to missing data. Batch correction is offered during data import when batch information is included with marker scores: batches are indicated by a column named *Batch, and covariates are specified with columns whose names include the prefix *cov. Additional information on batch correction is provided at www.tmanavigator.org/help/score-requirements#batches.
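As a rough illustration of the grid layout described above, the following sketch writes a small comma-separated file with samples as rows and markers as columns. Only the *Batch column name, the *cov prefix and the use of repeated column names for replicates come from the text; the `PatientID` header, the covariate name `*cov_grade`, the marker names and all values are hypothetical, and the user guide remains the authority on exact formatting.

```python
import csv
import io

# Hypothetical example data: every name and value here is illustrative only.
# "ER" appears twice to indicate two replicate columns for the same marker.
header = ["PatientID", "*Batch", "*cov_grade", "ER", "ER", "HER2"]
rows = [
    [1, "block_A", 2, 120.5, 131.0, 40.2],
    [2, "block_A", 3, 88.0, 92.4, 310.7],
    [3, "block_B", 1, 140.1, 137.8, 55.9],
]

def write_grid(handle, header, rows):
    """Write samples as rows and markers as columns in CSV form."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)

# Write to an in-memory buffer; a real file handle would work the same way.
buffer = io.StringIO()
write_grid(buffer, header, rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Sequential integers are used as patient identifiers here, in line with the requirement for anonymous identifiers.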

Survival data are uploaded using the Attach survival button located on the dataset page. Patient identifiers in the TMA marker and survival data must match; anonymous patient identifiers such as a sequential numeric value must be used. The user guide (www.tmanavigator.org/help) gives further details on data import and formatting requirements.

Data exploration

Marker distributions may be visualized using density plots (continuous data) or histograms (categorical data). Samples may be clustered by modelling marker expression as a mixture of Gaussian distributions. The number of clusters is determined automatically, and the procedure is fully unsupervised (methods). The mixture model is plotted with the centre of each cluster indicated, overlaid with a density plot and histogram; model parameters are displayed in a sidebar. Risk stratification according to marker values is commonly done manually or with quantiles (4,28,29). Mixture modelling with appropriate regularization (methods) has significant advantages, providing fully automated and statistically well-founded identification of groups according to expression values. Marker relationships may be explored with a heatmap (Supplementary Figure S2).
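The idea of choosing the number of Gaussian components by BIC can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the TMA Navigator implementation: the widest-gap initialization, the variance floor, the fixed iteration count and the synthetic scores are all choices made here for the example.

```python
import math
from statistics import NormalDist, pstdev

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_mixture(values, k, iterations=200):
    """Fit a k-component 1D Gaussian mixture by EM; return (weights, means, sds, loglik)."""
    data = sorted(values)
    n = len(data)
    # Variance floor (an assumption of this sketch) to avoid degenerate spikes.
    min_sd = (0.05 * pstdev(data)) or 1e-6
    # Deterministic start: cut the sorted scores at the k - 1 widest gaps.
    widest = sorted(range(1, n), key=lambda i: data[i] - data[i - 1], reverse=True)[:k - 1]
    cuts = [0] + sorted(widest) + [n]
    chunks = [data[a:b] for a, b in zip(cuts, cuts[1:])]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    sds = [max(min_sd, pstdev(c)) for c in chunks]
    for _ in range(iterations):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weight, mean and standard deviation per component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(min_sd, math.sqrt(var))
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)) or 1e-300)
        for x in data
    )
    return weights, means, sds, loglik

def bic(loglik, k, n):
    # Free parameters: k means + k standard deviations + (k - 1) mixing weights.
    return (3 * k - 1) * math.log(n) - 2.0 * loglik

def select_model(values, max_k=3):
    """Fit 1..max_k components and keep the fit with the lowest BIC."""
    n = len(values)
    fits = {k: fit_mixture(values, k) for k in range(1, max_k + 1)}
    bics = {k: bic(fits[k][3], k, n) for k in fits}
    best = min(bics, key=bics.get)
    return best, fits[best], bics

def synthetic_scores(mean, sd, n):
    """Evenly spaced normal quantiles: a deterministic stand-in for real marker scores."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

# Synthetic bimodal marker: a small low-expressing group and a larger high group.
scores = synthetic_scores(700, 150, 20) + synthetic_scores(3800, 500, 120)
best_k, (weights, means, sds, loglik), bics = select_model(scores)
print(best_k, [round(m) for m in means], [round(w, 2) for w in weights])
```

Each component keeps its own standard deviation, matching the description in Methods; the BIC penalty is what stops extra components from being added merely because they raise the likelihood slightly.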

Figure 1 shows a mixture model for the protein E-cadherin in the dataset ‘Breast Cancer 1’ (Demonstration data). The suffix ‘Cy-Mem’ indicates cytoplasmic and membrane expression values (i.e. non-nuclear). E-cadherin is a clinically important adhesion protein that is putatively down-regulated in epithelial to mesenchymal transition (EMT) and metastasis (30–32). Mixture modelling identified two groups, ‘E-cadherin low’ (n = 10, mean score = 705) and ‘E-cadherin high’ (n = 118, mean score = 3769). Survival of these groups was investigated in TMA Navigator (Figure 2); the ‘E-cadherin low’ group showed a trend for worse survival, consistent with expectations (28,31,33).

Figure 2. Survival analysis with E-cadherin expression informed by mixture modelling. Kaplan–Meier plot: x-axis denotes overall survival in months, y-axis the proportion of the group alive. Stratification of invasive ductal breast cancers by mixture modelling of E-cadherin expression (AQUA data); the low-expressing group shows a trend for worse prognosis consistent with expectations. Marker tabs shown in red indicate single group (unimodal) mixture models, for which Kaplan–Meier plots are not available.

Survival analysis

Survival analysis involves statistical testing to examine relationships of marker scores with survival, accounting for censoring; for a review, see (34). Groups are defined according to marker scores, with survival displayed as a Kaplan–Meier plot (19). The difference in survival between groups is tested for significance using the Mantel-Cox log-rank test (35) with false discovery rate (FDR) correction applied (36). Figure 2 and Supplementary Figure S3 show Kaplan–Meier plots for E-cadherin and PTEN expression, respectively, on the ‘Breast Cancer 1’ (invasive ductal) and ‘Breast Cancer 3’ (trastuzumab-treated) cohorts (Demonstration data). Grouping according to E-cadherin expression (Figure 2) was determined by mixture modelling, a fully unsupervised approach (Data exploration). Loss of E-cadherin confers poor prognosis (30,31,33), and the low-expressing group showed the expected trend for worse survival. TMA Navigator provides for survival analyses on mixture modelling results as the option ‘Kaplan–Meier (mixture model) plots’ in the Run analysis dialogue box. Supplementary Figure S3 shows survival for tertiles of expression of PTEN (FDR P = 0.0207), a tumour suppressor important for trastuzumab response (37), scored using the semi-continuous ‘quickscore’ method (Demonstration data). Splitting by tertiles provides roughly equal group sizes and so may improve prospects of obtaining statistical significance (38). However, these groups are unlikely to reflect modes of the underlying marker score distribution. Mixture modelling provides for biologically motivated grouping and so may enable better risk stratification, although the associated smaller group sizes can lead to lower statistical power (38). When mixture modelling returns a single Gaussian (unimodal) model, survival analysis is still possible using tertiles. For categorical data, groups are defined by score values.
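To make the two steps above concrete, the sketch below computes a Kaplan–Meier curve and a two-group log-rank test in plain Python. It is a textbook-style illustration rather than the code behind TMA Navigator; the function names and the toy follow-up times are invented for the example, and the FDR correction across markers is not included.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    events uses 1 for an observed death and 0 for a censored observation.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored samples both leave the risk set
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 degree of freedom."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti in times_a if ti >= t)  # at risk in group A
        n_b = sum(1 for ti in times_b if ti >= t)  # at risk in group B
        d_a = sum(1 for ti, e in zip(times_a, events_a) if ti == t and e == 1)
        d_b = sum(1 for ti, e in zip(times_b, events_b) if ti == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value

# Toy follow-up data in months (illustrative values only).
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
chi_square, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(curve)
print(round(chi_square, 3), round(p_value, 4))
```

Censored samples reduce the number at risk without lowering the survival estimate, which is how both functions account for censoring.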

Network inference

Correlation networks provide a useful abstraction of the relationships (edges) between multiple markers, for example to inform biomarker discovery (39). TMA Navigator is typically used for analysis of protein expression, although markers might also include clinical variables such as lymph node metastasis count. TMA studies usually involve relatively few proteins that may have close relationships in signalling and/or metabolic pathways; therefore, common assumptions about network structure such as sparsity (40,41) do not necessarily hold. Furthermore, TMA data are subject to multiple sources of confounding variation that may be extremely challenging to remove, including differences in surgical procedure, sample age, reagent batch/age, sample fixation and variation in the material analysed. This variation acts as ‘noise’ and may reduce correlation values even when markers have biological relationships (17). Accordingly, edge thresholding for TMA networks is usefully tailored to the individual dataset studied, and to enable this, TMA Navigator affords access to correlation values for all marker pairs. Statistical significance is normally applied to identify minimum threshold values (e.g. FDR P-value ≤ 0.05). Correlations can identify biologically meaningful edges (42,43); however, statistically significant correlations do not necessarily underlie genuine functional interactions (44). Ideally, the edge threshold may be calibrated against negative control markers unrelated to the pathway(s) studied, as well as positive controls where relationships are well characterized in the system of interest.

Correlation networks may be inferred in TMA Navigator using several measures: mutual information, Spearman correlation or Pearson correlation. Mutual information measures statistical dependency between markers and therefore detects many types of interaction, although it does not distinguish between positive and negative relationships. Also, significance is estimated by permutation and therefore statistical power is influenced by sample size and dependencies within the data (45). Spearman and Pearson correlation are limited to detecting monotonic and linear marker relationships respectively, but have the advantage of analytical significance estimation (methods) and can identify signed edges. Interactive thresholding is available on P-values adjusted for multiple hypothesis testing [Benjamini–Yekutieli (46) or Bonferroni correction], with the result displayed as an interactive network using the Cytoscape Web plugin (47).
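A small sketch may help to show what a signed Spearman edge and a Benjamini–Yekutieli adjustment look like in practice. The code below is illustrative only and is not taken from TMA Navigator: the marker names, scores and raw P-values are hypothetical, and the calculation of P-values from the correlations is deliberately left out.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(x), average_ranks(y))

def benjamini_yekutieli(p_values):
    """Adjust P-values for FDR control under arbitrary dependency."""
    m = len(p_values)
    c_m = sum(1.0 / j for j in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this P-value in ascending order
        running_min = min(running_min, p_values[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical scores for three markers across six samples.
scores = {
    "MarkerA": [10, 20, 30, 40, 50, 60],
    "MarkerB": [12, 25, 20, 47, 49, 70],
    "MarkerC": [60, 50, 45, 30, 20, 10],
}
names = sorted(scores)
correlations = {
    (a, b): spearman(scores[a], scores[b])
    for i, a in enumerate(names) for b in names[i + 1:]
}
# Hypothetical raw P-values, one per marker pair, adjusted together.
adjusted = benjamini_yekutieli([0.01, 0.04, 0.03])
print(correlations)
print(adjusted)
```

The sign of each correlation gives the sign of the edge, while an edge would be retained only if its adjusted P-value falls at or below the chosen threshold.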

Figure 3 shows a Spearman correlation network for the dataset ‘Breast Cancer 2’ (Demonstration data), thresholded at FDR P ≤ 0.05 (46). Three components are identified, one (top-left) with the expected positive relationship between C35 and HER2 (48) and negative relationship between HER2 and ER-α (49). Interestingly, a positive relationship between C35 and MAL2 is found, in contrast to PCR results in cell culture with C35 induction (48). The second component (bottom) includes expected edges between the EMT transcription factors Snail, Slug, ZEB1 (30). The third component (top-right) includes edges between E-cadherin, Claudin-7 and β-catenin, as expected (30,48), suggesting a primary role for β-catenin in adhesion in this cohort, although an edge between nuclear β-catenin and Snail occurs close to the significance threshold (FDR P = 0.0783).

Demonstration data

Several example datasets are available to demonstrate the capabilities of TMA Navigator (www.tmanavigator.org/demo). The dataset ‘Breast Cancer 1’ includes expression data for nine markers obtained using AQUA (16) and survival over 9 years for a cohort of 128 lymph node positive patients (10). The dataset ‘Breast Cancer 2’ has AQUA expression for 16 markers and survival over 5 years for a cohort of 92 trastuzumab-treated patients (37). The dataset ‘Breast Cancer 3’ includes expression for four markers measured using a semi-continuous approach and survival over 5 years on 122 trastuzumab-treated patients (37). The latter dataset has also been discretized into five quantiles for demonstration of categorical data handling. Antibodies for the above datasets are summarized in Supplementary Table S1; all data are from primary tumours. The example datasets described above are available pre-imported in TMA Navigator, and may also be downloaded.

METHODS

Density plots approximate the empirical score distribution non-parametrically with adaptive bandwidth kernel density estimation (50,51). Mixture modelling identifies clusters of samples using expectation-maximization (52) to fit a mixture of Gaussian distributions to marker values. Each cluster has independent mean and standard deviation parameters, better aligning with biological expectations than fixed standard deviation. The number of clusters (modality) is selected using the Bayesian information criterion (BIC) (53). Survival is examined by Kaplan–Meier analysis (19), using the Mantel-Cox log-rank test (35), and stratification determined per marker with Benjamini–Hochberg corrected P-values (36). Network edge significance is determined using algorithm AS89 (54) (Spearman if n < 1290), Student t approximation (Spearman, Pearson) or permutation (mutual information), and P-values corrected with Benjamini–Yekutieli (recommended), or the overly conservative Bonferroni method (46,55). The service architecture is illustrated in Supplementary Figure S4 and described in Supplementary Data.
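For orientation, two of the standard formulas behind these steps can be written out as follows. These are the textbook forms rather than a transcription of the TMA Navigator source, and the symbols are introduced here for illustration only.

```latex
% BIC for a model with k free parameters fitted to n samples,
% with maximized likelihood \hat{L}; lower values are preferred.
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}

% A univariate mixture of G Gaussians with independent means and standard
% deviations has G means, G standard deviations and G - 1 free mixing weights.
k = 3G - 1

% Student t approximation for a correlation coefficient r (Pearson, or
% Spearman computed on ranks) over n samples, referred to a t distribution
% with n - 2 degrees of freedom.
t = r \sqrt{\frac{n - 2}{1 - r^{2}}}
```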

CONCLUDING REMARKS

TMAs offer high-throughput immunohistochemical analysis of clinical samples and provide for study of tissue and cell-type specific networks underlying pathophenotypes (4,21). TMA Navigator is a unique interactive platform for TMA data processing and analysis that has been successfully tested on multiple web browsers (Internet Explorer, Firefox, Chrome, Opera, Safari). Key features include batch correction (17), unsupervised stratification by marker scores, survival analysis and network inference. An extensive user guide and demonstration datasets are available. We very much appreciate feedback on any issues relating to TMA Navigator, ideally sent via the form at www.tmanavigator.org/contact, and welcome requests for new functionality.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Table 1 and Supplementary Figures 1–4.

ACKNOWLEDGEMENTS

We thank Sylvie Dubois-Marshall, In Hwa Um and Helen Caldwell for contributing to collection of demonstration data. We are also grateful to everyone who helped with testing.

FUNDING

Scottish Funding Council (SFC) and the Chief Scientist's Office (CSO) (to D.H.); Royal Society of Edinburgh Scottish Government Fellowship co-funded by Marie Curie Actions and the UK Medical Research Council (MRC) (to I.O.). Funding for open access charge: Royal Society of Edinburgh.

Conflict of interest statement. None declared.

REFERENCES

1. Vogelstein B, Kinzler KW. Cancer genes and the pathways they control. Nat. Med. 2004;10:789–799. doi: 10.1038/nm1087.
2. Kolch W, Pitt A. Functional proteomics to dissect tyrosine kinase signalling pathways in cancer. Nat. Rev. Cancer. 2010;10:618–629. doi: 10.1038/nrc2900.
3. Kononen J, Bubendorf L, Kallioniemi A, Barlund M, Schraml P, Leighton S, Torhorst J, Mihatsch MJ, Sauter G, Kallioniemi O. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat. Med. 1998;4:844–847. doi: 10.1038/nm0798-844.
4. Camp RL, Neumeister V, Rimm DL. A decade of tissue microarrays: progress in the discovery and validation of cancer biomarkers. J. Clin. Oncol. 2008;26:5630–5637. doi: 10.1200/JCO.2008.17.3567.
5. Camp RL, Charette LA, Rimm DL. Validation of tissue microarray technology in breast carcinoma. Lab. Invest. 2000;80:1943–1949. doi: 10.1038/labinvest.3780204.
6. Camp RL, Dolled-Filhart M, King BL, Rimm DL. Quantitative analysis of breast cancer tissue microarrays shows that both high and normal levels of HER2 expression are associated with poor outcome. Cancer Res. 2003;63:1445–1448.
7. Zhang D, Salto-Tellez M, Putti TC, Do E, Koay ES. Reliability of tissue microarrays in detecting protein expression and gene amplification in breast cancer. Mod. Pathol. 2003;16:79–85. doi: 10.1097/01.MP.0000047307.96344.93.
8. Paweletz CP, Charboneau L, Bichsel VE, Simone NL, Chen T, Gillespie JW, Emmert-Buck MR, Roth MJ, Petricoin EF III, Liotta LA. Reverse phase protein microarrays which capture disease progression show activation of pro-survival pathways at the cancer invasion front. Oncogene. 2001;20:1981–1989. doi: 10.1038/sj.onc.1204265.
9. Washburn MP, Wolters D, Yates JR III. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotech. 2001;19:242–247. doi: 10.1038/85686.
10. Dubois-Marshall S, Thomas JS, Faratian D, Harrison DJ, Katz E. Two possible mechanisms of epithelial to mesenchymal transition in invasive ductal breast cancer. Clin. Exp. Metastasis. 2011;28:811–818. doi: 10.1007/s10585-011-9412-x.
11. Lahrmann B, Halama N, Sinn HP, Schirmacher P, Jaeger D, Grabe N. Automatic tumor-stroma separation in fluorescence TMAs enables the quantitative high-throughput analysis of multiple cancer biomarkers. PLoS One. 2011;6:e28048. doi: 10.1371/journal.pone.0028048.
12. Rao J, Seligson D, Hemstreet GP. Protein expression analysis using quantitative fluorescence image analysis on tissue microarray slides. Biotechniques. 2002;32:924–926, 928–930, 932. doi: 10.2144/02324pt04.
13. Allred DC, Harvey JM, Berardo M, Clark GM. Prognostic and predictive factors in breast cancer by immunohistochemical analysis. Mod. Pathol. 1998;11:155–168.
14. McCarty KS Jr, Szabo E, Flowers JL, Cox EB, Leight GS, Miller L, Konrath J, Soper JT, Budwit DA, Creasman WT. Use of a monoclonal anti-estrogen receptor antibody in the immunohistochemical evaluation of human tumors. Cancer Res. 1986;46:4244s–4248s.
15. Detre S, Saclani Jotti G, Dowsett M. A ‘quickscore’ method for immunohistochemical semiquantitation: validation for oestrogen receptor in breast carcinomas. J. Clin. Pathol. 1995;48:876–878. doi: 10.1136/jcp.48.9.876.
16. McCabe A, Dolled-Filhart M, Camp RL, Rimm DL. Automated quantitative analysis (AQUA) of in situ protein expression, antibody concentration, and prognosis. J. Natl Cancer Inst. 2005;97:1808–1815. doi: 10.1093/jnci/dji427.
17. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–127. doi: 10.1093/biostatistics/kxj037.
18. Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, Geman D, Baggerly K, Irizarry RA. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 2010;11:733–739. doi: 10.1038/nrg2825.
19. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J. Am. Statist. Assoc. 1958;53:457–481.
20. Vidal M, Cusick ME, Barabási AL. Interactome networks and human disease. Cell. 2011;144:986–998. doi: 10.1016/j.cell.2011.02.016.
21. Barabasi AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 2011;12:56–68. doi: 10.1038/nrg2918.
22. Morris L, Tsui A, Crichton C, Harris S, Maccallum P, Howat W, Davies J, Brenton J, Caldas C. A metadata-aware application for remote scoring and exchange of tissue microarray images. BMC Bioinformatics. 2013;14:147. doi: 10.1186/1471-2105-14-147.
23. Thallinger G, Baumgartner K, Pirklbauer M, Uray M, Pauritsch E, Mehes G, Buck C, Zatloukal K, Trajanoski Z. TAMEE: data management and analysis for tissue microarrays. BMC Bioinformatics. 2007;8:81. doi: 10.1186/1471-2105-8-81.
24. Kim R, Demichelis F, Tang J, Riva A, Shen R, Gibbs D, Mahavishno V, Chinnaiyan A, Rubin M. Internet-based profiler system as integrative framework to support translational research. BMC Bioinformatics. 2005;6:304. doi: 10.1186/1471-2105-6-304.
25. Sharma-Oates A, Quirke P, Westhead D. TmaDB: a repository for tissue microarray data. BMC Bioinformatics. 2005;6:218. doi: 10.1186/1471-2105-6-218.
26. Liu CL, Prapong W, Natkunam Y, Alizadeh A, Montgomery K, Gilks CB, van de Rijn M. Software tools for high-throughput analysis and archiving of immunohistochemistry staining data obtained with tissue microarrays. Am. J. Pathol. 2002;161:1557–1565. doi: 10.1016/S0002-9440(10)64434-3.
27. Camp RL, Dolled-Filhart M, Rimm DL. X-tile: a new bio-informatics tool for biomarker assessment and outcome-based cut-point optimization. Clin. Cancer Res. 2004;10:7252–7259. doi: 10.1158/1078-0432.CCR-04-0713.
28. Liu X, Minin V, Huang Y, Seligson DB, Horvath S. Statistical methods for analyzing tissue microarray data. J. Biopharm. Stat. 2004;14:671–685. doi: 10.1081/BIP-200025657.
29. Jamieson NB, Carter CR, McKay CJ, Oien KA. Tissue biomarkers for prognosis in pancreatic ductal adenocarcinoma: a systematic review and meta-analysis. Clin. Cancer Res. 2011;17:3316–3331. doi: 10.1158/1078-0432.CCR-10-3284.
30. Thiery JP, Acloque H, Huang RY, Nieto MA. Epithelial-mesenchymal transitions in development and disease. Cell. 2009;139:871–890. doi: 10.1016/j.cell.2009.11.007.
31. Oka H, Shiozaki H, Kobayashi K, Inoue M, Tahara H, Kobayashi T, Takatsuka Y, Matsuyoshi N, Hirano S, Takeichi M, et al. Expression of E-cadherin cell adhesion molecules in human breast cancer tissues and its relationship to metastasis. Cancer Res. 1993;53:1696–1701.
32. Onder TT, Gupta PB, Mani SA, Yang J, Lander ES, Weinberg RA. Loss of E-cadherin promotes metastasis via multiple downstream transcriptional pathways. Cancer Res. 2008;68:3645–3654. doi: 10.1158/0008-5472.CAN-07-2938.
33. Taube JH, Herschkowitz JI, Komurov K, Zhou AY, Gupta S, Yang J, Hartwell K, Onder TT, Gupta PB, Evans KW, et al. Core epithelial-to-mesenchymal transition interactome gene-expression signature is associated with claudin-low and metaplastic breast cancer subtypes. Proc. Natl Acad. Sci. USA. 2010;107:15449–15454. doi: 10.1073/pnas.1004900107.
34. Clark TG, Bradburn MJ, Love SB, Altman DG. Survival analysis part I: basic concepts and first analyses. Br. J. Cancer. 2003;89:232–238. doi: 10.1038/sj.bjc.6601118.
35. Mantel N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother. Rep. 1966;50:163–170.
36. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. Ser. B. 1995;57:289–300.
37. Faratian D, Goltsov A, Lebedeva G, Sorokin A, Moodie S, Mullen P, Kay C, Um IH, Langdon S, Goryanin I, et al. Systems biology reveals new strategies for personalizing cancer medicine and confirms the role of PTEN in resistance to trastuzumab. Cancer Res. 2009;69:6713–6720. doi: 10.1158/0008-5472.CAN-09-0777.
38. Schoenfeld D. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika. 1981;68:316–319.
39. Adourian A, Jennings E, Balasubramanian R, Hines WM, Damian D, Plasterer TN, Clish CB, Stroobant P, McBurney R, Verheij ER, et al. Correlation network analysis for data integration and biomarker selection. Mol. Biosyst. 2008;4:249–259. doi: 10.1039/b708489g.
40. Szederkényi G, Banga JR, Alonso AA. Inference of complex biological networks: distinguishability issues and optimization-based solutions. BMC Syst. Biol. 2011;5:177. doi: 10.1186/1752-0509-5-177.
41. Yeung MK, Tegnér J, Collins JJ. Reverse engineering gene networks using singular value decomposition and robust regression. Proc. Natl Acad. Sci. USA. 2002;99:6163–6168. doi: 10.1073/pnas.092576199.
42. Kim SK, Lund J, Kiraly M, Duke K, Jiang M, Stuart JM, Eizinger A, Wylie BN, Davidson GS. A gene expression map for Caenorhabditis elegans. Science. 2001;293:2087–2092. doi: 10.1126/science.1061603.
43. Gillis J, Pavlidis P. The role of indirect connections in gene networks in predicting function. Bioinformatics. 2011;27:1860–1866. doi: 10.1093/bioinformatics/btr288.
44. Venet D, Dumont JE, Detours V. Most random gene expression signatures are significantly associated with breast cancer outcome. PLoS Comput. Biol. 2011;7:e1002240. doi: 10.1371/journal.pcbi.1002240.
45. So H, Sham PC. Multiple testing and power calculations in genetic association studies. Cold Spring Harb. Protoc. 2011;2011:pdb.top95. doi: 10.1101/pdb.top95.
46. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 2001;29:1165–1188.
47. Lopes CT, Franz M, Kazi F, Donaldson SL, Morris Q, Bader GD. Cytoscape Web: an interactive web-based network browser. Bioinformatics. 2010;26:2347–2348. doi: 10.1093/bioinformatics/btq430.
48. Katz E, Dubois-Marshall S, Sims AH, Gautier P, Caldwell H, Meehan RR, Harrison DJ. An in vitro model that recapitulates the epithelial to mesenchymal transition (EMT) in human breast cancer. PLoS One. 2011;6:e17083. doi: 10.1371/journal.pone.0017083.
49. Cuzick J, Dowsett M, Wale C, Salter J, Quinn E, Zabaglo L, Howell A, Buzdar A, Forbes J. Prognostic value of a combined ER, PgR, Ki67, HER2 immunohistochemical (IHC4) score and comparison with the GHI recurrence score–results from TransATAC. Cancer Res. 2010;69:74–74.
50. Rosenblatt M. Remarks on some nonparametric estimates of a density function. Ann. Math Statist. 1956;27:832–837.
51. Abramson IS. On bandwidth variation in kernel estimates—a square root law. Ann. Statist. 1982;10:1217–1223.
52. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM Algorithm. J. Roy. Stat. Soc. Ser. B. 1977;39:1–38.
53. Schwarz G. Estimating the dimension of a model. Ann. Statist. 1978;6:461–464.
54. Best DJ, Roberts DE. Algorithm AS 89: the upper tail probabilities of Spearman’s rho. J. Roy. Stat. Soc. Ser. C. 1975;24:377–379.
55. Verhoeven KJ, Simonsen KL, McIntyre LM. Implementing false discovery rate control: increasing your power. Oikos. 2005;108:643–647.

