Abstract
Recent advances in biomarker discovery, biocomputing, and nanotechnology have created new opportunities for the emerging field of personalized medicine, in which disease detection, diagnosis, and therapy are tailored to each individual’s molecular profile, and for predictive medicine, which uses genetic and molecular information to predict disease development, progression, and clinical outcome. Here we discuss advanced biocomputing tools for cancer biomarker discovery and multiplexed nanoparticle probes for cancer biomarker profiling, together with the prospects and challenges of correlating biomolecular signatures with clinical outcome. This bio-nano-info convergence holds great promise for the molecular diagnosis and individualized therapy of cancer and other human diseases.
Introduction
The last two decades have witnessed explosive growth in the amount of genomic and proteomic data, major advances in unraveling the molecular mechanisms of human diseases, and rapid development of new technologies for molecular diagnostics and therapy. This has ushered in a new era of molecular medicine in which disease detection, diagnosis, and treatment are tailored to each individual’s molecular profile (1-4). This revolution rests on new biomarkers for predicting disease behavior, advanced technologies for rapid detection and diagnosis, new therapies for molecular and cellular targeting, and computing technologies for data analysis and management. For molecular profiling and diagnostics, however, a major challenge is that human diseases are often characterized by histologic lesions that are heterogeneous at the cellular and molecular levels. In cancerous tumors, for example, malignant cells are typically intermixed with benign stroma, blood vessels, and inflammatory cells (5-8). Current technologies such as real-time polymerase chain reaction (RT-PCR) and gene microarrays are not designed to handle this type of heterogeneity, in part because they require cells and tissue specimens to be destructively processed into a homogeneous solution, which discards valuable information about the 3-D cellular environment and tissue morphology. Nanotechnology has provided new opportunities for integrating morphological and molecular information, and for correlating observed molecular and cellular changes with disease behavior (9-11). In particular, bioconjugated quantum dots (QDs) (12-15) have been used to quantify multiple biomarkers in intact cancer cells and tissue specimens, allowing traditional histopathology and molecular signatures to be compared directly on the same tissue (16-20).
For molecular imaging and therapy, nanotechnology can be used to improve the efficacy and reduce the toxicity of chemotherapeutic agents, which can be encapsulated in, covalently attached to, or adsorbed onto nanoparticles (21-23).
At present, a major task in biomedical nanotechnology is to understand how nanoparticles interact with blood, cells, and organs under in vivo physiological conditions, and how to overcome one of their inherent limitations, i.e., inefficient delivery to diseased sites or organs (24-26). Another major challenge is to conduct rigorous studies that clearly link biomarkers with disease behavior, such as the rate of tumor progression and the response to surgery, radiation, or drug therapy (27). Here we discuss how biomarkers and biocomputing can be integrated with nanotechnology for high-throughput analysis of gene expression data and for multiplexed molecular profiling of intact cells and tissue specimens (see Figure 1). In particular, we discuss web-based bioinformatics tools for biomarker discovery, optimization, and clinical validation.
Figure 1.
Schematic diagram of the ‘bio-nano-info’ convergence, which uses bioconjugated nanoparticles such as multicolor quantum dots (QDs) to analyze multiple biomarkers for molecular imaging, diagnosis, and targeted therapy.
Biomarkers
Biomolecular markers include altered or mutant genes, RNA, proteins, lipids, carbohydrates, small metabolite molecules, and altered expression states of such molecules that can be correlated with a biological behavior or a clinical outcome (28-31). Most biomarkers have been discovered by molecular profiling studies, based on an association or correlation between a molecular signature and disease behavior. One of the first molecular profiling studies was reported by Golub et al. (32), who showed that gene expression patterns could classify tumors, thereby yielding new insights into tumor pathology such as stage, grade, clinical course, and response to treatment. Gene expression studies further revealed that the molecular signature of each tumor results from the combined tumoral, stromal, and inflammatory components of the original heterogeneous lesion (33). The first correlation of gene expression patterns with clinical outcome was reported for diffuse large B-cell lymphoma (34), a clinically heterogeneous disease: whereas most patients succumbed to the disease, the remainder responded well to therapy and had prolonged survival, and this variability in disease progression could be correlated with distinct patterns of gene expression. The concept of a specific molecular portrait of each individual patient’s tumor was later validated by Perou, Bittner, and their coworkers using a range of clinical samples (35, 36). Recent work in several groups has identified unique gene expression patterns that are strongly correlated with clinical outcome in other tumor types, including prostate, breast, lung, and liver cancers (37-41).
Biomarkers are usually divided into prognostic, predictive, and pharmacodynamic (therapeutic response) markers. Prognostic biomarkers predict the natural course of an individual cancer, making it possible to distinguish indolent tumors from aggressive ones. Predictive biomarkers are used to assess the probability that a patient will benefit from a particular treatment. For example, patients with breast cancers in which HER2 (ERBB2), the gene encoding a receptor tyrosine kinase, is amplified are expected to benefit from treatment with trastuzumab (Herceptin) (42), whereas patients whose tumors overexpress the estrogen receptor are more likely to respond to tamoxifen (43). Pharmacodynamic biomarkers measure the short-term effects of a drug on a tumor and are used to guide dose selection in the early stages of clinical drug development.
For most applications, single biomarkers are unlikely to provide the necessary sensitivity and specificity owing to the substantial heterogeneity among cancers; it is unrealistic to expect a single biomarker to report on tissue type and malignant transformation throughout the various stages of tumor development and progression. Panels of biomarkers are therefore needed, but their discovery and validation must go through several key steps before they can be employed in clinical practice. As shown in Figure 2, the first step involves experimental design and acquisition of molecular data, typically large amounts of genomic or proteomic expression data together with the patient history. These data need to be properly organized and annotated using available databases and web-based tools. The original data are then evaluated and improved by removing technical artifacts and by combining multiple datasets to increase statistical power. The second stage of data processing uses feature extraction and classification methods, such as those in omniBiomarker (see below), to identify differentially expressed genes or proteins as candidate biomarkers. Before these biomarkers can be used in a clinical application, their functional relevance is validated by measuring their expression levels with multiplexed nanotechnology (for proteins) or by RT-PCR (for nucleic acids). In the following, we discuss web-based bioinformatics tools for microarray data analysis, biomarker discovery, and clinical validation.
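One simple way to combine several datasets, as described in the first step above, is to restrict them to the genes they all share and to standardize each gene within each dataset before pooling the samples, so that platform- or batch-wide offsets do not dominate. The sketch below is a minimal illustration under stated assumptions: each dataset is assumed to be a dict mapping a gene identifier to one expression value per sample, the function names `zscore` and `pool_datasets` and the toy data are hypothetical, and per-dataset z-scoring is only one of several possible batch-adjustment strategies, not necessarily the one used by the tools discussed in this review.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize values to mean 0 and unit population standard deviation."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        # A gene that is constant within a dataset carries no information there.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

def pool_datasets(datasets):
    """Pool samples from several datasets over the genes they all share.

    Each dataset is a dict {gene_id: [expression value per sample]}.
    Every gene is standardized within its own dataset before the samples
    are concatenated, which removes dataset-wide offsets and scale
    differences (e.g. from different platforms or batches).
    """
    shared = set(datasets[0])
    for ds in datasets[1:]:
        shared &= set(ds)
    pooled = {}
    for gene in sorted(shared):
        pooled[gene] = []
        for ds in datasets:
            pooled[gene].extend(zscore(ds[gene]))
    return pooled

# Hypothetical toy data: two studies measured on very different scales.
study_a = {"GENE1": [5.0, 6.0, 7.0], "GENE2": [2.0, 2.0, 4.0], "GENE3": [1.0, 3.0, 5.0]}
study_b = {"GENE1": [50.0, 70.0], "GENE2": [10.0, 30.0], "GENE3": [8.0, 8.0], "GENE4": [1.0, 2.0]}

pooled = pool_datasets([study_a, study_b])
print(sorted(pooled))        # ['GENE1', 'GENE2', 'GENE3']
print(len(pooled["GENE1"]))  # 5
```

GENE4 is dropped because it was measured in only one study; a real integration step would also need to reconcile gene identifiers across platforms before intersecting them.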
Figure 2.
Flow diagram of biocomputing tools for the discovery and validation of molecular biomarkers. As discussed in the text, genomic and proteomic data first need to be properly organized and annotated. Before the data are analyzed further to identify differentially expressed biomarkers, the original data are evaluated and improved by removing technical artifacts and by combining multiple datasets to increase statistical power. Candidate biomarkers can be identified using omniBiomarker (73) and then clinically validated using, for example, multiplexed immunostaining or RT-PCR.
Bioinformatics Tools
Early in the microarray era, bioinformatics tools often focused on unsupervised clustering; the main interests were to explore new technologies and to discover new properties within the data structure, without dwelling on potential clinical applications. For example, Eisen et al. (44) developed a software application that combines several types of unsupervised clustering methods. A more recent development integrates clustering algorithms and visualization tools into a web-based application (45), again with a focus on unsupervised clustering. Similar methods have been applied to high-throughput gene expression data from different clinical scenarios and have led to significant findings concerning the identification of cancer subtypes (46, 47). Unsupervised clustering applications therefore remain widely used for data visualization and discovery.
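To make the idea of unsupervised clustering concrete, the sketch below groups gene expression profiles by agglomerative hierarchical clustering with average linkage and a distance of 1 minus the Pearson correlation, a combination commonly associated with Eisen-style cluster analysis. It is a minimal, deliberately naive sketch (recomputing all cluster distances at every step), not the software of reference 44; the function names, the dict-of-profiles input format, and the toy gene names `g1` to `g4` are hypothetical. Constant profiles are assumed not to occur, since their correlation is undefined.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_linkage(profiles):
    """Agglomerative clustering of gene profiles with average linkage.

    profiles: dict {gene: [expression per sample]}.
    The distance between two genes is 1 - Pearson correlation; the distance
    between two clusters is the mean of all gene-to-gene distances across them.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    genes = list(profiles)
    dist = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            d = 1.0 - pearson(profiles[g], profiles[h])
            dist[(g, h)] = dist[(h, g)] = d
    clusters = [frozenset([g]) for g in genes]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(dist[(g, h)] for g in clusters[i] for h in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = clusters[i], clusters[j]
        merges.append((a, b, d))
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a | b]
    return merges

# Hypothetical profiles: g1/g2 rise together, g3/g4 fall together.
profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 2]}
merges = average_linkage(profiles)
for a, b, d in merges:
    print(sorted(a), sorted(b), d)
```

The co-regulated pairs merge first at distance 0, and the two anti-correlated groups join last at distance 2, which is the tree a heatmap-plus-dendrogram display would show.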
More recently, the focus of microarray data analysis has moved from unsupervised clustering to more guided, supervised analysis. Web-based bioinformatics applications have shifted accordingly, and these newer tools focus on the analysis of genes that are differentially expressed under different known conditions. Some of these tools are specific to microarray platforms; for example, MAGMA and ILOOP (Interwoven Loop) are web-based applications designed to analyze two-channel microarrays (48, 49). ILOOP is an interface that assists in the experimental design of two-channel microarray studies, while MAGMA incorporates standard normalization and statistical methods into an application whose primary aims are usability and reproducibility. Not surprisingly, many of these web-based applications implement functionality for several common steps in the data analysis pipeline. GEPAS (Gene Expression Profile Analysis Suite), for example, includes functions that address several aspects of microarray analysis, including data normalization, feature selection, class prediction, and even unsupervised clustering (50). CARMAweb (comprehensive R- and Bioconductor-based web service) is another recent tool for microarray analysis (51); it uses modules from Bioconductor, an open-source bioinformatics software project built on the R programming language (https://carmaweb.genome.tugraz.at). The microarray analysis functions available in Bioconductor include background correction, quality control, normalization, differential gene detection, clustering, dimensionality reduction, and visualization (52). As with most bioinformatics applications, the main contribution of CARMAweb to the bioinformatics community is the integration of numerous tools into a user-friendly web interface.
GenePattern (53) is another compilation of gene expression analysis tools. It furthers usability and reproducibility through its integration into the cancer Biomedical Informatics Grid (caBIG), an initiative of the National Cancer Institute (NCI) that aims to create a standard for semantic interoperability of bioinformatics software (54).
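The supervised analysis described above ranks genes by how strongly they separate samples from known conditions. A minimal sketch of one common choice, Welch's t-statistic (which does not assume equal variances in the two groups), is shown below. It is an illustration only, not the method implemented by MAGMA, GEPAS, CARMAweb, or GenePattern, each of which offers its own set of statistics; the function names, the condition labels `tumor`/`normal`, and the toy data are hypothetical. Each group needs at least two samples and non-zero total variance.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))

def rank_genes(expr, labels, group_a, group_b):
    """Rank genes by |t| between two known conditions.

    expr: {gene: [expression per sample]}; labels: condition of each sample.
    Returns (gene, t) pairs sorted from most to least differentially expressed.
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == group_a]
        b = [v for v, lab in zip(values, labels) if lab == group_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical data: three tumor and three normal samples.
labels = ["tumor"] * 3 + ["normal"] * 3
expr = {
    "G1": [9, 10, 11, 1, 2, 3],   # strongly up in tumor
    "G2": [5, 6, 7, 5, 6, 7],     # unchanged
    "G3": [1, 2, 3, 4, 5, 6],     # moderately down in tumor
}
ranking = rank_genes(expr, labels, "tumor", "normal")
for gene, t in ranking:
    print(gene, round(t, 3))
```

In practice, the per-gene statistics would be converted to p-values and corrected for multiple testing across thousands of genes before a candidate list is drawn.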
It is well established that the lists of candidate biomarkers resulting from microarray data analysis depend on both the available samples and the selection algorithm (55). In fact, these lists may be highly unstable, often changing substantially when a different set of samples is analyzed. Furthermore, high-throughput assay platforms typically measure tens of thousands of genes, many of which are still not fully understood, which makes interpreting the results daunting. By associating each candidate gene with a biological function, one can begin to understand the underlying mechanisms of the associated disease and to assess the biological relevance of the feature selection algorithm. Databases such as the Gene Ontology (GO) are designed to facilitate the interpretation of gene functions on a large scale (56). A diverse range of GO tools, available as either web-based or downloadable packages, can be used to draw statistically significant conclusions from a GO database analysis; these include GoMiner (57, 58), GOStat (59), AmiGO (60), BiNGO (61), and GOEAST (62). Similar applications mine the literature rather than the GO database. CoPub, for example, links lists of candidate genes to keywords obtained by searching Medline abstracts and visualizes the statistically overrepresented keywords as a network (http://services.nbic.nl/cgi-bin/copub/CoPub.pl) (63).
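GO tools of this kind commonly test whether a GO term is overrepresented in a candidate list relative to the whole platform, often with a hypergeometric (one-sided Fisher's exact) test. The sketch below computes that p-value for a single term; it is a minimal illustration under that assumption, not the exact statistic of any particular tool named above, and the function name and example counts are hypothetical. A real analysis would repeat the test for every term and correct for multiple testing.

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k) for GO-term overrepresentation.

    N: genes on the platform (the background)
    K: background genes annotated with the GO term
    n: genes in the candidate biomarker list
    k: candidate genes annotated with the GO term
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical counts: 20 genes, 5 carry the term, 3 of the 4 candidates carry it.
print(enrichment_pvalue(20, 5, 4, 3))  # 155/4845, about 0.032
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the same function works for platform-scale counts (tens of thousands of genes) without overflow.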
With the steady accumulation of gene expression data, several applications have emerged that aim to organize and integrate these data sources and heterogeneous datasets more effectively. As noted above, increasing the sample size can improve the reproducibility of the resulting predictive models, which has created demand for solutions that allow data sharing. The Gene Expression Omnibus (GEO) (64) and ArrayExpress (65) are examples of large repositories that adhere to community data standards such as MIAME (66). ArrayWiki is an alternative that allows the user community to annotate gene expression metadata (67). caArray is part of the caBIG initiative and is intended to become a semantically interoperable standard for microarray data storage in caBIG applications (54). Just as the analytical methods of gene expression analysis and gene interpretation software overlap, so do the data deposited in these high-throughput repositories. Consequently, a web-based application called the ‘Microarray Retriever’ (68) has been developed to retrieve gene expression data from both GEO and ArrayExpress, in order to maximize the potential for large-sample microarray studies. Similarly, GEOmetadb improves on the querying capabilities of the GEO repository (69). Although this application is currently available only for GEO, meta-analysis applications of this kind are expected to become increasingly useful.
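Once data from several repositories have been retrieved, one classic way to run a simple meta-analysis is to combine the per-study p-values for a gene with Fisher's method: under the null hypothesis, X = -2 Σ ln pᵢ follows a chi-square distribution with 2m degrees of freedom for m independent studies. The sketch below is a swapped-in illustration of that textbook technique, not a feature of GEO, ArrayExpress, Microarray Retriever, or GEOmetadb; it assumes independent studies and p-values strictly greater than zero, and the function name and inputs are hypothetical. For even degrees of freedom the chi-square survival function has a closed form, which keeps the code dependency-free.

```python
from math import exp, factorial, log

def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square distributed with 2m degrees of
    freedom under the null. For even degrees of freedom the survival
    function is exp(-X/2) * sum_{i=0}^{m-1} (X/2)^i / i!.
    """
    m = len(pvalues)
    x = -2.0 * sum(log(p) for p in pvalues)
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(m))

# Hypothetical per-study p-values for one gene in three independent datasets.
print(fisher_combined_pvalue([0.04, 0.10, 0.08]))
```

With a single study the function returns that study's p-value unchanged, which is a quick sanity check on the closed form.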
Despite the availability of these software packages, it remains difficult to feed the output of a quality control and normalization application into a subsequent clustering or feature selection application (70). Furthermore, lists of gene symbols produced by a feature selection application often need to be translated before they can be interpreted by a particular GO application. Workflow applications such as GeneTrailExpress and Taverna address this issue in different ways. GeneTrailExpress is a comprehensive web-based application that implements its own normalization, statistical analysis, interpretation, and visualization modules based on common methods (71). Taverna is more general and builds workflows from caBIG-certified web services (72).
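The identifier translation step mentioned above is simple in principle but easy to get wrong: several probes can map to the same gene, and some identifiers have no annotation at all. The sketch below shows one way to translate a ranked list through an annotation table while keeping rank order, collapsing duplicates, and reporting unmapped entries. It is a hypothetical helper, not part of GeneTrailExpress or Taverna, and the probe identifiers and mapping are invented for illustration; real pipelines would load the annotation table supplied for the specific platform.

```python
def translate_ids(ids, mapping):
    """Translate a ranked identifier list using an annotation table.

    ids: identifiers from a feature-selection step, best-ranked first.
    mapping: {source_id: target_id}; several sources may share one target.
    Returns (translated, unmapped): `translated` keeps the first (best-ranked)
    occurrence of each target; `unmapped` lists ids with no table entry.
    """
    translated, unmapped, seen = [], [], set()
    for source in ids:
        target = mapping.get(source)
        if target is None:
            unmapped.append(source)
        elif target not in seen:
            seen.add(target)
            translated.append(target)
    return translated, unmapped

# Hypothetical annotation table (probe ID -> gene symbol) and ranked probe list.
annotation = {"probe_001": "CD44", "probe_002": "CD44", "probe_003": "IER3"}
ranked = ["probe_002", "probe_003", "probe_001", "probe_999"]
print(translate_ids(ranked, annotation))  # (['CD44', 'IER3'], ['probe_999'])
```

Reporting unmapped identifiers explicitly, rather than silently dropping them, makes it visible when a downstream GO analysis is running on a shorter list than the feature selection step produced.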
Another web-based bioinformatics resource for biomarker identification, omniBiomarker, has been developed by Phan et al. (73) (see Figure 3). In this tool, biomarkers are identified through several steps, including quality control and normalization, feature selection, biological interpretation, validation, and clinical prediction. Because no single path through the biomarker identification pipeline can be expected to perform well for all possible datasets (74), analysis parameters must be tailored to each specific clinical problem. The computational foundation of omniBiomarker addresses this problem by fine-tuning every step in the pipeline to a particular dataset or clinical problem based on prior biological knowledge (73). This knowledge is used to overcome the “curse of dimensionality” (see Box 1 and Figure 4), to stabilize the results, and to increase the reproducibility of clinical prediction.
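Whether a pipeline “stabilizes” its results can be checked empirically: re-run the gene ranking on bootstrap resamples of the patients and measure how much the top-k lists agree. The sketch below uses the average pairwise Jaccard index of the top-k gene sets across stratified bootstrap resamples (resampling within each class so both classes are always present). This is one common stability measure, chosen here for illustration and not necessarily how omniBiomarker evaluates stability; the scorer (absolute difference of class means) is a deliberately simple stand-in, and all function names and data are hypothetical. Labels are assumed to be 1 and 0.

```python
import random
from itertools import combinations
from statistics import mean

def top_k(expr, labels, k):
    """Top-k genes by absolute difference of class means (a simple stand-in scorer)."""
    def score(values):
        a = [v for v, lab in zip(values, labels) if lab == 1]
        b = [v for v, lab in zip(values, labels) if lab == 0]
        return abs(mean(a) - mean(b))
    ranked = sorted(expr, key=lambda g: score(expr[g]), reverse=True)
    return set(ranked[:k])

def jaccard(s, t):
    return len(s & t) / len(s | t)

def selection_stability(expr, labels, k, n_boot=20, seed=0):
    """Average pairwise Jaccard index of top-k gene sets over stratified bootstrap resamples."""
    rng = random.Random(seed)
    idx1 = [i for i, lab in enumerate(labels) if lab == 1]
    idx0 = [i for i, lab in enumerate(labels) if lab == 0]
    gene_sets = []
    for _ in range(n_boot):
        # Resample within each class so that both classes are always present.
        idx = [rng.choice(idx1) for _ in idx1] + [rng.choice(idx0) for _ in idx0]
        sub_expr = {g: [v[i] for i in idx] for g, v in expr.items()}
        sub_labels = [labels[i] for i in idx]
        gene_sets.append(top_k(sub_expr, sub_labels, k))
    return mean(jaccard(s, t) for s, t in combinations(gene_sets, 2))

# Hypothetical data: gene A separates the classes clearly; B and C are noise.
labels = [1, 1, 1, 0, 0, 0]
expr = {"A": [10, 11, 12, 0, 1, 2], "B": [1, 2, 1, 2, 1, 2], "C": [0, 1, 0, 1, 0, 1]}
print(selection_stability(expr, labels, k=1))  # 1.0
print(selection_stability(expr, labels, k=2))
```

A value near 1 means the same genes keep being selected; values well below 1 reflect the sample-to-sample instability of candidate lists noted earlier in this section.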
Figure 3.
Schematic illustration of the use of omniBiomarker for biomarker identification and clinical validation. (A) A knowledge-guided workflow for identifying differentially expressed genes, consisting of three major steps. First, high-throughput ‘omics’ data are collected from microarray gene expression studies together with prior biological knowledge about the disease of interest (1). This biological knowledge is then used to guide the feature selection process and to identify the algorithms whose rankings have maximal biological relevance (2). Finally, candidate biomarkers are validated using RT-PCR or multiplexed nanotechnology (3). (B) Results of a renal cell carcinoma study, with a list of candidate biomarkers as the system output. Using omniBiomarker, several feature ranking and selection algorithms can be tested simultaneously, yielding a short list of biomarkers that are most strongly correlated with biological function or clinical outcome. omniBiomarker is accessible via http://omnibiomarker.bme.gatech.edu/.
Figure 4.
Graphic drawing showing how prior knowledge can be used to guess a shape from a limited number of data points. As information about the geometric shape is introduced (upper), the number of possible solutions shrinks. Eventually the true shape emerges as the only solution (middle). Likewise, the biomarker search space is large due to the variety of available feature selection algorithms, each of which produces a different list of candidate biomarkers (lower). As information about biomarkers is introduced, the number of valid feature selection algorithms decreases, leading to an optimal algorithm that can consistently identify the known biomarkers.
The first step in the biomarker identification pipeline is quality control. Because of the stochastic nature of high-throughput data, data quality must be assessed before further analysis, and the sheer volume of these data requires specialized software. Several existing applications assess data quality within a population of samples, particularly for microarrays, while simultaneously estimating and normalizing gene expression. These applications vary in modeling complexity and usability, ranging from downloadable software packages such as RMA Express (75) and dChip (76) to web-based applications such as caCorrect (77). Although gene expression assays are generally reproducible (78), statistical artifacts in smaller datasets need to be identified and either corrected or removed before further analysis. For a case study of the use of omniBiomarker for biomarker discovery, see Box 2.
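As a concrete example of the normalization performed by such applications, the sketch below implements quantile normalization, the between-array step used in RMA: every array is forced to share the same intensity distribution by replacing each value with the across-array mean of the values at the same rank. This is a minimal pure-Python sketch, not the RMA Express, dChip, or caCorrect implementation; it assumes equal-length arrays with one intensity per probe and breaks ties by position, whereas some implementations average tied ranks. The function name and toy arrays are hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of intensities per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank, so every array ends up with the same distribution.
    Ties are broken by position, a simplification.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the r-th smallest value across arrays.
    reference = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices from lowest to highest
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical intensities for four probes on three arrays.
arrays = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
norm = quantile_normalize(arrays)
for row in norm:
    print([round(v, 3) for v in row])
```

After normalization each array contains exactly the same set of values, only in different probe order, so differences between arrays reflect relative ranks rather than overall brightness.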
Clinical Validation by Multiplexed Molecular Analysis
For large datasets of over 100,000 genes and proteins, computing tools can be used to select and optimize a small panel of biomarkers (perhaps a dozen genes or proteins) that are strong predictors of patient outcome or therapeutic response. Antibody-conjugated nanoparticles can then be designed to follow this small set of biomarkers for molecular diagnosis and targeted therapy. In particular, multiplexed QD probes can be used to profile a selected biomarker panel in typical clinical tissue specimens, such as needle biopsies and tissue microarrays. A panel of about five to ten protein biomarkers could have a significant impact on disease diagnosis and on guiding individualized treatment. Toward these goals, Xing et al. (18) have obtained promising results for the molecular profiling of clinical formalin-fixed, paraffin-embedded (FFPE) prostate specimens. In this study, four QD-antibody conjugates were used to detect four tumor antigens: the E3 ubiquitin ligase mdm-2, the tumor suppressor p53, the zinc-finger transcription factor EGR-1, and the cyclin-dependent kinase inhibitor p21/CDKN1A. These markers are known to be important in prostate cancer diagnosis and have been correlated with tumor behavior (79, 80). Recent work with human breast cancer cells also confirmed that the results of molecular profiling with QDs were consistent with those obtained by traditional immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH) (19). Notably, classifying tumors by antigens expressed at low levels can be subjective: it requires experienced observers and is prone to considerable inter-observer variation. In contrast, quantitative QD measurements allow accurate, observer-independent determination of tumor antigens even when they are expressed at low levels.
Thus, the quantitative nature of QD-based molecular profiling could simplify and standardize the categorization of low-abundance antigens in intact cells and tissue specimens. This is of fundamental importance in the management of breast cancer, because the likely benefit of hormonal therapies and trastuzumab depends not only on the presence but also on the quantity of hormone or HER2 receptors (81-83).
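One way multiplexed QD signals can be quantified is linear spectral unmixing: if the emission spectra of the QDs overlap, the spectrum measured at each pixel is modeled as a weighted sum of known reference spectra, and the weights (relative abundances of each QD-tagged antigen) are estimated by least squares. The sketch below assumes that linear model and solves the normal equations for a small number of QDs; it is an illustration of the general technique, not the processing used in the studies cited above, and real pipelines typically add background subtraction and non-negativity constraints. The reference spectra, channel count, and function names are made up for illustration.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def unmix(spectrum, references):
    """Least-squares abundance of each reference spectrum in a measured spectrum.

    `references` holds one emission spectrum per QD, sampled at the same
    wavelength channels as `spectrum`. Solves the normal equations
    (R^T R) a = R^T s, where the columns of R are the reference spectra.
    """
    k = len(references)
    rtr = [[sum(p * q for p, q in zip(references[i], references[j])) for j in range(k)]
           for i in range(k)]
    rts = [sum(p * s for p, s in zip(references[i], spectrum)) for i in range(k)]
    return solve(rtr, rts)

# Made-up reference spectra for two QDs over five wavelength channels.
qd_a = [1.0, 0.6, 0.1, 0.0, 0.0]
qd_b = [0.0, 0.1, 0.5, 1.0, 0.4]
# A simulated pixel containing 2 units of qd_a and 3 units of qd_b.
pixel = [2 * a + 3 * b for a, b in zip(qd_a, qd_b)]

abundances = unmix(pixel, [qd_a, qd_b])
print([round(x, 6) for x in abundances])  # [2.0, 3.0]
```

Because the recovered abundances are continuous numbers rather than visual intensity grades, they can be compared across specimens on a common scale, which is the property the paragraph above relies on.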
Prospects and Challenges
Looking into the future, several research directions are particularly promising for biomedical applications but require additional concerted efforts to succeed. The first is the design and development of nanoparticles with single or multiple functionalities. For applications in cancer and other medical conditions, relevant nanoparticle functions include imaging (single- or dual-modality), therapy through the delivery of one drug or a combination of several drugs, and targeting through one or more ligands. With each function added, nanoparticles could be designed to have novel properties and applications. For example, binary nanoparticles with two functionalities could be developed for targeted molecular imaging, for targeted therapy, or for simultaneous imaging and therapy (without targeting). Bioconjugated QDs, which have both targeting and imaging functions, could be used for targeted tumor imaging and for molecular profiling. Ternary nanoparticles that combine all three functions could in turn allow simultaneous imaging and targeted therapy.
The second research direction is the optimization of biomarker panels using bioinformatics and quantitative molecular profiling with nanotechnology, for example with bioconjugated nanoparticle probes, in order to predict cancer behavior, clinical outcome, and treatment response, and thus to individualize therapy. Such an approach should ideally start with retrospective studies of archived specimens, for which patient outcome is already known. The key hypotheses to be tested are that a panel of tumor markers allows more accurate correlations than single tumor markers, and that combining tumor gene expression with host stroma molecular data is necessary for defining aggressive cancer phenotypes and for determining the response of early-stage disease to treatment (chemotherapy, radiation, or surgery). The third important research direction is to further investigate nanoparticle distribution, excretion, metabolism, and pharmacodynamics in vivo in animal models. These studies will be essential for developing nanoparticles for clinical applications in cancer imaging or therapy.
Box 1: The “Curse of Dimensionality” and its Knowledge-Based Solution.
The term “curse of dimensionality” was originally coined by Bellman (84) to describe the exponential increase in the volume of a mathematical space as the number of dimensions grows. Consider a one-dimensional interval of unit length. We can sample 5 points from this space such that adjacent points are evenly spaced at a distance of 0.2. If we increase the dimension of this space to two (a square), the number of evenly spaced points increases to 5², or 25. In three dimensions, this number increases to 5³, or 125. In 10 dimensions, it explodes to 5¹⁰, or 9,765,625.
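The counts in this example follow a single formula: with spacing h along each axis of a unit hypercube, covering d dimensions requires (1/h)^d points. A few lines of code reproduce the numbers quoted in the text; the function name is hypothetical and the snippet is only an arithmetic check.

```python
def grid_points(spacing, dims):
    """Evenly spaced sample points needed to cover a unit hypercube at a given spacing."""
    per_axis = round(1 / spacing)  # 5 points per axis for a spacing of 0.2
    return per_axis ** dims

for d in (1, 2, 3, 10):
    print(d, grid_points(0.2, d))
# 1 5
# 2 25
# 3 125
# 10 9765625
```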
This problem also occurs in classic pattern recognition, because the required training set size increases exponentially as more features (dimensions) are needed to classify (or recognize) future samples. The same problem is encountered in microarray data analysis and gene classification, because the number of patient samples (data size) is much smaller than the number of genes (features or dimensions) analyzed for each patient. Thus, clinical classification of new patients is “cursed” by the mismatch between a small patient population and a large number of genomic features. Bioinformaticians have handled this problem by reducing the dimensionality of microarray samples with feature selection algorithms, which search for the set of genes with the highest potential for accurate prediction. Because genes are not expressed independently, feature selection algorithms must identify groups of genes that act in concert (85). However, feature selection is itself subject to the curse of dimensionality, because the number of possible gene combinations explodes as the size of the gene set grows. For example, to find the single best predictor among N genes, we need only evaluate N genes. To find the best pair, however, we must evaluate N(N-1)/2 gene pairs, and to choose k genes out of N, the number of unique groups is N!/[k!(N-k)!]. For a typical gene expression dataset, exhaustively evaluating every possible subset to find the truly optimal set of genes is computationally infeasible. Instead, many algorithms “approximate” the optimal solution by searching a reduced space (86, 87).
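The gap between exhaustive and approximate search can be made concrete by counting how many candidate gene subsets each strategy must score. The sketch compares the exhaustive count N!/[k!(N-k)!] with greedy forward selection, which adds the best remaining gene one step at a time and therefore scores only N + (N-1) + ... + (N-k+1) subsets. Greedy forward selection is used here only as one familiar example of a reduced-space search, not as the specific algorithm of references 86 or 87, and the array size of 20,000 genes is an assumption for illustration.

```python
from math import comb

def exhaustive_evaluations(n_genes, k):
    """Gene subsets of size k an exhaustive search must score: N!/[k!(N-k)!]."""
    return comb(n_genes, k)

def greedy_evaluations(n_genes, k):
    """Subsets scored by greedy forward selection, adding the best remaining gene at each of k steps."""
    return sum(n_genes - step for step in range(k))

n_genes = 20_000  # assumed array size, for illustration only
for k in (1, 2, 3, 10):
    print(k, exhaustive_evaluations(n_genes, k), greedy_evaluations(n_genes, k))
```

Even for k = 3 the exhaustive search must score over a trillion subsets, while the greedy search scores about 60,000; the price is that greedy search can miss combinations whose genes are only informative together.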
Further compounding the curse of dimensionality is the variety of available feature selection algorithms, each of which produces different results on a given gene expression dataset. Consequently, bioinformaticians must choose from a population of sub-optimal feature selection algorithms and hope that their choice yields the best results. One way to combat this “curse” is to use biological knowledge to guide algorithm selection. For example, one can take advantage of the many previous studies that have identified and validated biomarkers for particular clinical problems. Assuming that reliable knowledge exists for a specific clinical problem, one can assess how well different feature selection methods identify these known biomarkers while limiting false discoveries. In this knowledge-based approach, the “optimal” method, within the search space defined by the given knowledge and ranking methods, is the one that yields the fewest false discoveries.
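The knowledge-based selection described above can be sketched as scoring each candidate method by how highly its ranking places the known biomarkers, then choosing the best-scoring method. The sketch below uses average precision of the known markers as that score, which rewards methods that rank known markers near the top and penalizes unknown genes ranked above them. Average precision is our illustrative choice; the criterion actually used by omniBiomarker (73) may differ, and the method names, gene names, and rankings are hypothetical. Note that an unknown gene ranked highly is not necessarily a false discovery; this score simply treats prior knowledge as the reference.

```python
def average_precision(ranking, known):
    """Average precision of the known biomarkers within a ranked gene list."""
    hits, total = 0, 0.0
    for position, gene in enumerate(ranking, start=1):
        if gene in known:
            hits += 1
            total += hits / position  # precision at this known marker's position
    return total / len(known) if known else 0.0

def select_method(rankings, known):
    """Return the method whose ranking best recovers the known biomarkers, plus all scores."""
    scores = {name: average_precision(r, known) for name, r in rankings.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical rankings from two feature selection methods on the same dataset.
rankings = {
    "method_A": ["gA", "gX", "gB", "gY"],
    "method_B": ["gX", "gY", "gA", "gB"],
}
known = {"gA", "gB"}  # previously validated markers (hypothetical)
best, scores = select_method(rankings, known)
print(best, scores)
```

Here method_A ranks both known markers higher (average precision 5/6 versus 5/12), so it would be the one carried forward to rank the remaining, not-yet-validated genes.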
Box 2: A Case Study – Biomarker Discovery for Classification of Renal Cell Carcinoma (RCC) Subtypes.
Human renal cell carcinoma (RCC) has several distinct subtypes, including clear cell carcinoma, chromophobe, papillary, and oncocytoma (88). Individuals diagnosed with renal cancer may have one or more of these subtypes in varying degrees (89) and must be treated according to the diagnosed subtype to maximize treatment success. To decide the best course of treatment, pathologists rely not only on their expert knowledge but also on other indicators. Morphologic tissue indicators are the simplest diagnostic tool, but diagnostic accuracy can be further improved by measuring the expression level or spatial location of protein markers in tissue biopsies. For example, protein markers that differentiate the clear cell group from other subtypes are important because the clear cell subtype is more malignant than others such as oncocytoma. More importantly, subtypes such as oncocytoma and chromophobe are morphologically similar yet present different clinical problems (90). Oncocytoma is a benign tumor whereas chromophobe is malignant, so biomarkers that improve diagnostic accuracy are especially valuable for distinguishing them.
As a test case, we used omniBiomarker to discover new biomarkers for classifying RCC subtypes. This was accomplished by integrating knowledge about existing renal cancer biomarkers and by choosing a ‘biologically relevant’ feature selection algorithm (73). We define a ‘biologically relevant’ method as one that favors previously validated or established biomarkers over other genes. Because feature selection is sensitive to both the dataset and the algorithm parameters, different methods can be expected to yield highly variable gene lists from the same gene expression dataset. Likewise, a particular feature selection algorithm might perform well on some datasets but not on others. The additional knowledge introduced into the biomarker discovery pipeline decreases the uncertainty of feature selection and improves reproducibility. Our reference knowledge in the renal cancer study included genes validated by RT-PCR in existing, reliable studies as well as in in-house RT-PCR experiments.
In addition to gene ranking and feature selection functions, the omniBiomarker application provides an interface for storage, retrieval, and normalization of gene expression data. Before uploading the renal cancer microarray data to omniBiomarker, we processed the data with caCorrect to identify and remove spatially correlated statistical artifacts (77). We also imported the data into ArrayWiki (available at: http://arraywiki.bme.gatech.edu/index.php/Molecular_Classification_of_Renal_Tumors_by_Gene_Expression_Profiling), which displays graphical representations of data quality (67). After uploading the quality-checked gene expression data to omniBiomarker, we initiated the feature selection process from omniBiomarker’s analysis interface, which includes options for several methods that assess the predictive ability of genes. omniBiomarker enables users to run several different ranking processes simultaneously and to identify the ranking parameters that are most biologically relevant. This test case identified a list of candidate biomarkers (see Table 1) for differentiating the subtypes of renal cell carcinoma. The next step is to validate these markers experimentally by RT-PCR and/or immunohistochemical analysis of clinical tissue specimens.
Table 1. List of Candidate Biomarkers Discovered by Using omniBiomarker.
| Gene Symbol | Gene Title |
|---|---|
| ACLY | ATP citrate lyase |
| CXCR4 | chemokine (C-X-C motif) receptor 4 |
| C4A/C4B | complement component 4A /4B |
| FLNA | filamin A, alpha (actin binding protein 280) |
| PMP22 | peripheral myelin protein 22 |
| PFKFB3 | 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3 |
| KLF10 | Kruppel-like factor 10 |
| PRG1 | proteoglycan 1, secretory granule |
| LGALS1 | lectin, galactoside-binding, soluble, 1 (galectin 1) |
| PCCB | propionyl Coenzyme A carboxylase, beta polypeptide |
| TMSB10 | thymosin, beta 10 |
| HCLS1 | hematopoietic cell-specific Lyn substrate 1 |
| ACTA2 | actin, alpha 2, smooth muscle, aorta |
| IGFBP3 | insulin-like growth factor binding protein 3 |
| NFKBIA | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha |
| CD44 | CD44 molecule (Indian blood group) |
| IER3 | immediate early response 3 |
ACKNOWLEDGMENTS
We are grateful to Drs Yun Xing, Tao Liu, Brian Leyland-Jones, John Petros, Georgia Chen, Lily Yang, and Dong Shin of Emory University School of Medicine for insightful discussions. This work was supported by grants from the US National Cancer Institute Centers of Cancer Nanotechnology Excellence (CCNE) Program (U54CA119338 to S.N. and M.D.W.) and the Bioengineering Research Partnerships Program (BRP) (R01CA108468 to S.N.). S.N. and M.D.W. also acknowledge the Georgia Cancer Coalition (GCC) for distinguished cancer scholar awards.
Footnotes
COMPETING INTERESTS STATEMENT. The authors declare that they have no competing financial interests.
REFERENCES
- 1.Ginsburg GS, McCarthy JJ. Personalized medicine: revolutionizing drug discovery and patient care. Trends Biotechnol. 2001;19:491–496. doi: 10.1016/s0167-7799(01)01814-5.
- 2.Little PFR, Williams RBH, Wilkins MR. Inter-individual variation in expression: a missing link in biomarker biology. Trends Biotechnol. 2008;27:5–10. doi: 10.1016/j.tibtech.2008.10.002.
- 3.Jain KK. Personalized medicine. Curr. Opin. Mol. Ther. 2002;4:548–558.
- 4.Allison M. Is personalized medicine finally arriving? Nat. Biotechnol. 2008;26:509–517. doi: 10.1038/nbt0508-509.
- 5.Heppner GH. Tumor heterogeneity. Cancer Res. 1984;44:2259–2265.
- 6.Liu AY, Roudier MP, True LD. Heterogeneity in primary and metastatic prostate cancer as defined by cell surface CD profile. Am. J. Pathol. 2004;165:1543–1556. doi: 10.1016/S0002-9440(10)63412-8.
- 7.Steeg PS. Heterogeneity of drug target expression among metastatic lesions: lessons from a breast cancer autopsy program. Clin. Cancer Res. 2008;14:3643–3645. doi: 10.1158/1078-0432.CCR-08-1135.
- 8.Wu JM, et al. Heterogeneity of breast cancer metastasis: comparison of therapeutic target expression and promoter methylation between primary tumors and their multifocal metastases. Clin. Cancer Res. 2008;14:1938–1946. doi: 10.1158/1078-0432.CCR-07-4082.
- 9.Ferrari M. Cancer nanotechnology: opportunities and challenges. Nat. Rev. Cancer. 2005;5:161–171. doi: 10.1038/nrc1566.
- 10.Wang X, Yang L, Chen ZG, Shin DM. Application of nanotechnology in cancer therapy and imaging. CA Cancer J. Clin. 2008;58:97–110. doi: 10.3322/CA.2007.0003.
- 11.Nie SM, Xing Y, Kim GJ, Simons JW. Nanotechnology applications in cancer. Annu. Rev. Biomed. Eng. 2007;9:257–288. doi: 10.1146/annurev.bioeng.9.060906.152025.
- 12.Chan WCW, Nie SM. Quantum dot bioconjugates for ultrasensitive nonisotopic detection. Science. 1998;281:2016–2018. doi: 10.1126/science.281.5385.2016.
- 13.Alivisatos AP. The use of nanocrystals in biological detection. Nat. Biotechnol. 2004;22:47–52. doi: 10.1038/nbt927.
- 14.Michalet X, et al. Quantum dots for live cells, in vivo imaging, and diagnostics. Science. 2005;307:538–544. doi: 10.1126/science.1104274.
- 15.Gao XH, Yang L, Petros JA, Marshall FF, Simons JW, Nie SM. In vivo molecular and cellular imaging with quantum dots. Curr. Opin. Biotechnol. 2005;16:63–72. doi: 10.1016/j.copbio.2004.11.003.
- 16.Gao XH, Nie SM. Molecular profiling of single cells and tissue specimens with quantum dots. Trends Biotechnol. 2003;21:371–373. doi: 10.1016/S0167-7799(03)00209-9.
- 17.Xing Y, et al. Molecular profiling of single cancer cells and clinical tissue specimens with semiconductor quantum dots. Int. J. Nanomedicine. 2006;1:473–481. doi: 10.2147/nano.2006.1.4.473.
- 18.Xing Y, et al. Bioconjugated quantum dots for multiplexed and quantitative immunohistochemistry. Nat. Protoc. 2007;2:1152–1165. doi: 10.1038/nprot.2007.107.
- 19.Yezhelyev MV, Al-Hajj A, Morris C, et al. In situ molecular profiling of breast cancer biomarkers with multicolor quantum dots. Adv. Mater. 2007;19:3146.
- 20.Ghazani AA, et al. High throughput quantification of protein expression of cancer antigens in tissue microarray using quantum dot nanocrystals. Nano Lett. 2006;6:2881–2886. doi: 10.1021/nl062111n.
- 21.Yezhelyev M, Gao XH, Xing Y, Al-Hajj A, Nie SM, O’Regan RM. Emerging use of nanoparticles in diagnosis and treatment of breast cancer. Lancet Oncol. 2006;7:657–667. doi: 10.1016/S1470-2045(06)70793-8.
- 22.Sinha R, Kim GJ, Nie SM, Shin DM. Nanotechnology in cancer therapeutics: bioconjugated nanoparticles for drug delivery. Mol. Cancer Ther. 2006;5:1909–1917. doi: 10.1158/1535-7163.MCT-06-0141.
- 23.Davis ME, Chen Z, Shin DM. Nanoparticle therapeutics: an emerging treatment modality for cancer. Nat. Rev. Drug Discov. 2008;7:771–782. doi: 10.1038/nrd2614.
- 24.Jain RK. Transport of molecules, particles, and cells in solid tumors. Annu. Rev. Biomed. Eng. 1999;1:241–263. doi: 10.1146/annurev.bioeng.1.1.241.
- 25.Jain RK. Delivery of molecular and cellular medicine to solid tumors. Adv. Drug Deliv. Rev. 2001;46:149–168. doi: 10.1016/s0169-409x(00)00131-9.
- 26.Jain RK. The next frontier of molecular medicine: Delivery of therapeutics. Nat. Med. 1998;4:655–657. doi: 10.1038/nm0698-655.
- 27.Wang MD, Simons JW, Nie SM. Biomedical nanotechnology with bioinformatics - The promise and current progress. Proc. IEEE. 2007;95:1386–1389.
- 28.Liotta L, Petricoin E. Molecular profiling of human cancer. Nat. Rev. Genet. 2000;1:48–56. doi: 10.1038/35049567.
- 29.Petricoin EF, Zoon KC, Kohn EC, Barrett JC, Liotta LA. Clinical proteomics: Translating benchside promise into bedside reality. Nat. Rev. Drug Discov. 2002;1:683–695. doi: 10.1038/nrd891.
- 30.Negm RS, Verma M, Srivastava S. The promise of biomarkers in cancer screening and detection. Trends Mol. Med. 2002;8:288–293. doi: 10.1016/s1471-4914(02)02353-5.
- 31.Ludwig JA, Weinstein JN. Biomarkers in cancer staging, prognosis and treatment selection. Nat. Rev. Cancer. 2005;5:845–856. doi: 10.1038/nrc1739.
- 32.Golub TR, et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–537. doi: 10.1126/science.286.5439.531.
- 33.Ross DT, et al. Systematic variation in gene expression patterns in human cancer cell lines. Nat. Genet. 2000;24:227–235. doi: 10.1038/73432.
- 34.Alizadeh AA, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403:503–511. doi: 10.1038/35000501.
- 35.Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, et al. Molecular portraits of human breast tumours. Nature. 2000;406:747–752. doi: 10.1038/35021093.
- 36.Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, et al. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature. 2000;406:536–540. doi: 10.1038/35020115.
- 37.Dhanasekaran SM, Barrette TR, Ghosh D, Shah R, Varambally S, et al. Delineation of prognostic biomarkers in prostate cancer. Nature. 2001;412:822–826. doi: 10.1038/35090585.
- 38.Paik S, Shak S, Tang G, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 2004;351:2817–2826. doi: 10.1056/NEJMoa041588.
- 39.Chen HY, et al. A five-gene signature and clinical outcome in non-small-cell lung cancer. N. Engl. J. Med. 2007;356:11–20. doi: 10.1056/NEJMoa060096.
- 40.Beer DG, et al. Gene expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med. 2002;8:816–824. doi: 10.1038/nm733.
- 41.Hoshida Y, et al. Gene expression in fixed tissues and outcome in hepatocellular carcinoma. N. Engl. J. Med. 2008;359:1995–2004. doi: 10.1056/NEJMoa0804525.
- 42.Hynes NE, Lane HA. ERBB receptors and cancer: The complexity of targeted inhibitors. Nat. Rev. Cancer. 2005;5:341–354. doi: 10.1038/nrc1609.
- 43.Osborne CK. Drug therapy – Tamoxifen in the treatment of breast cancer. N. Engl. J. Med. 1998;339:1609–1618. doi: 10.1056/NEJM199811263392207.
- 44.Eisen M, et al. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA. 1998;95:14863–14868. doi: 10.1073/pnas.95.25.14863.
- 45.Geraci F, Pellegrini M, Renda M. AMIC@: All MIcroarray Clusterings @ once. Nucleic Acids Res. 2008;36:W315–W319. doi: 10.1093/nar/gkn265. Web Server Issue.
- 46.de Souto MC, et al. Clustering cancer gene expression data: a comparative study. BMC Bioinformatics. 2008;9:497. doi: 10.1186/1471-2105-9-497.
- 47.de Reyniès A, et al. Gene expression profiling reveals a new classification of adrenocortical tumors and identifies molecular predictors of malignancy and survival. J. Clin. Oncol. 2009. doi: 10.1200/JCO.2008.18.5678.
- 48.Rehrauer H, Zoller S, Schlapbach R. MAGMA: analysis of two-channel microarrays made easy. Nucleic Acids Res. 2007;35:W86–W90. doi: 10.1093/nar/gkm302. Web Server Issue.
- 49.Pirooznia M, et al. ILOOP - a web application for two-channel microarray interwoven loop design. BMC Genomics. 2008;9(Suppl 2):S11. doi: 10.1186/1471-2164-9-S2-S11.
- 50.Tarraga J, et al. GEPAS, a web-based tool for microarray data analysis and interpretation. Nucleic Acids Res. 2008;36:W308–W314. doi: 10.1093/nar/gkn303. Web Server Issue.
- 51.Rainer J, et al. CARMAweb: comprehensive R- and bioconductor-based web service for microarray data analysis. Nucleic Acids Res. 2006;34:W498–W503. doi: 10.1093/nar/gkl038. Web Server Issue.
- 52.Gentleman R, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80. doi: 10.1186/gb-2004-5-10-r80.
- 53.Reich M, et al. GenePattern 2.0. Nat. Genet. 2006;38:500–501. doi: 10.1038/ng0506-500.
- 54.Cancer Biomedical Informatics Grid (caBIG). See: https://cabig.nci.nih.gov/
- 55.Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: a multiple random validation study. Lancet. 2005;365:488–492. doi: 10.1016/S0140-6736(05)17866-0.
- 56.The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556.
- 57.Zeeberg B, et al. GoMiner: A Resource for Biological Interpretation of Genomic and Proteomic Data. Genome Biol. 2003;4:R28. doi: 10.1186/gb-2003-4-4-r28.
- 58.Zeeberg B, et al. High-Throughput GoMiner, an ‘industrial-strength’ integrative Gene Ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID). BMC Bioinformatics. 2005;6:168. doi: 10.1186/1471-2105-6-168.
- 59.Beissbarth T, Speed T. GOstat: Find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004;20:1464–1465. doi: 10.1093/bioinformatics/bth088.
- 60.Carbon S, et al. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009;25:288–289. doi: 10.1093/bioinformatics/btn615.
- 61.Maere S, Heymans K, Kuiper M. BiNGO: a Cytoscape plugin to assess overrepresentation of Gene Ontology categories in Biological Networks. Bioinformatics. 2005;21:3448–3449. doi: 10.1093/bioinformatics/bti551.
- 62.Zheng Q, Wang X-J. GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res. 2008;36:W358–W363. doi: 10.1093/nar/gkn276. Web Server Issue.
- 63.Frijters R, et al. CoPub: a literature-based keyword enrichment tool for microarray data analysis. Nucleic Acids Res. 2008;36:W406–W410. doi: 10.1093/nar/gkn215. Web Server Issue.
- 64.Edgar R, Domrachev M, Lash A. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207–210. doi: 10.1093/nar/30.1.207.
- 65.Parkinson H, et al. ArrayExpress-a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. 2006;35:D747–D750. doi: 10.1093/nar/gkl995. Database Issue.
- 66.Brazma A, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet. 2001;29:365–371. doi: 10.1038/ng1201-365.
- 67.Stokes T, et al. ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analysis. BMC Bioinformatics. 2008;9(Suppl 6):S18. doi: 10.1186/1471-2105-9-S6-S18.
- 68.Ivliev A, et al. Microarray retriever: a web-based tool for searching and large scale retrieval of public microarray data. Nucleic Acids Res. 2008;36:W327–W331. doi: 10.1093/nar/gkn213. Web Server Issue.
- 69.Zhu Y, et al. GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics. 2008;24:2798–2800. doi: 10.1093/bioinformatics/btn520.
- 70.Ochs M, Casagrande J. Information systems for cancer research. Cancer Invest. 2008;26:1060–1067. doi: 10.1080/07357900802272729.
- 71.Keller A, et al. GeneTrailExpress: a web-based pipeline for the statistical evaluation of microarray experiments. BMC Bioinformatics. 2008;9:552. doi: 10.1186/1471-2105-9-552.
- 72.Hull D, et al. Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 2006;34:W729–W732. doi: 10.1093/nar/gkl320. Web Server Issue.
- 73.Phan J, et al. Improving the efficiency of biomarker identification using biological knowledge. Pac. Symp. Biocomput. 2009;14:427–438.
- 74.Fogel G. Computational intelligence approaches for pattern discovery in biological systems. Brief. Bioinform. 2008;9:307–316. doi: 10.1093/bib/bbn021.
- 75.Irizarry R, et al. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003;31:e15. doi: 10.1093/nar/gng015.
- 76.Li C. Automating dChip: toward reproducible sharing of microarray data analysis. BMC Bioinformatics. 2008;9:231. doi: 10.1186/1471-2105-9-231.
- 77.Stokes T, et al. chip artifact CORRECTion (caCORRECT): A Bioinformatics System for Quality Assurance of Genomics and Proteomics Array Data. Ann. Biomed. Eng. 2007;35:1068–1080. doi: 10.1007/s10439-007-9313-y.
- 78.Shi L, et al. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 2006;24:1151–1161. doi: 10.1038/nbt1239.
- 79.Hernandez I, et al. Prostate-specific expression of p53 (R172L) differentially regulates p21, Bax, and mdm2 to inhibit prostate cancer progression and prolong survival. Mol. Cancer Res. 2003;1:1036–1047.
- 80.Mora GR, Olivier KR, Mitchell RF, Jenkins RB, Tindall DJ. Regulation of expression of the early growth response gene-1 (EGR-1) in malignant and benign cells of the prostate. Prostate. 2005;63:198–207. doi: 10.1002/pros.20153.
- 81.Marquez DC, Pietras RJ. Membrane-associated binding sites for estrogen contribute to growth regulation of human breast cancer cells. Oncogene. 2001;20:5420–5430. doi: 10.1038/sj.onc.1204729.
- 82.Hicks DG, Tubbs RR. Assessment of the HER2 status in breast cancer by fluorescence in situ hybridization: a technical review with interpretive guidelines. Hum. Pathol. 2005;36:250–261. doi: 10.1016/j.humpath.2004.11.010.
- 83.Konecny G, et al. Quantitative association between HER-2/neu and steroid hormone receptors in hormone receptor-positive primary breast cancer. J. Natl. Cancer Inst. 2003;95:142–153. doi: 10.1093/jnci/95.2.142.
- 84.Bellman RE. Adaptive Control Processes. Princeton University Press; Princeton, NJ: 1961.
- 85.Xiao Y, et al. Multivariate search for differentially expressed gene combinations. BMC Bioinformatics. 2004;5:164. doi: 10.1186/1471-2105-5-164.
- 86.Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. J. Bioinform. Comput. Biol. 2005;3:185–205. doi: 10.1142/s0219720005001004.
- 87.Hua J, et al. Optimal number of features as a function of sample size for various classification rules. Bioinformatics. 2005;21:1509–1515. doi: 10.1093/bioinformatics/bti171.
- 88.Schuetz AN, et al. Molecular Classification of Renal Tumors by Gene Expression Profiling. J. Mol. Diagn. 2005;7:206–218. doi: 10.1016/S1525-1578(10)60547-8.
- 89.Skubitz K, et al. Differential gene expression identifies subgroups of renal cell carcinoma. J. Lab. Clin. Med. 2006;147:250–267. doi: 10.1016/j.lab.2006.04.001.
- 90.Rohan S, et al. Gene expression profiling separates chromophobe renal cell carcinoma from oncocytoma and identifies vesicular transport and cell junction proteins as differentially expressed genes. Clin. Cancer Res. 2006;12:6937–6945. doi: 10.1158/1078-0432.CCR-06-1268.




