Abstract
Global expression profiling by DNA microarrays provides a snapshot of cell and tissue status and has become an essential tool in the biological and medical sciences. Typical questions that can be addressed by microarray analysis in developmental biology include: (1) finding a set of genes expressed in a specific cell type; (2) identifying genes expressed commonly in multiple cell types; (3) following time-course changes in gene expression patterns; (4) demonstrating a cell’s identity by showing similarities or differences between two or more cell types; (5) finding regulatory pathways and/or networks affected by gene manipulations, such as overexpression or repression of gene expression; (6) finding downstream target genes of transcription factors; (7) finding downstream target genes of cell signaling; (8) examining the effects of environmental manipulation of cells on gene expression patterns; and (9) finding the effects of genetic manipulation in embryos and adults. Here we describe strategies for executing these experiments and monitoring changes of cell state with gene expression microarrays as applied to mouse embryology. Both statistical assessment and interpretation of data are discussed. We also present a protocol for performing microarray analysis on small amounts of embryonic material.
1. INTRODUCTION
Microarray technology is a high-throughput experimental tool for obtaining the expression levels of essentially all genes by quantitating the amounts of their transcripts (RNAs). The resulting expression profiles of cells and tissues provide snapshots of cell status, thereby uncovering molecular signatures of various cell types and tissues and helping to explain embryo development in terms of gene expression regulation. As numerous excellent reviews and technical guides on gene expression profiling technologies have already been published, adding another methods article to this vast literature is hard to justify. Nevertheless, we have written this chapter in the hope that our hands-on experience in applying these technologies to mouse embryology may still be useful to the research community, as our lab has a long-standing interest in global expression profiling (Ko, 1990, 2001, 2006) and has analyzed mouse embryos and cell cultures with more than 2000 microarrays over the past 10 years. However, our experience is limited to cDNA clone-spotted nylon membrane arrays (Tanaka et al., 2000) and Agilent Technologies’ in situ synthesized 60-mer oligonucleotide glass slide microarrays. Our lab designed, in collaboration, the oligonucleotide sequences of Agilent’s first mouse microarray (“mouse development array” (Carter et al., 2003); updated version (Carter et al., 2005)). Here we describe the methods that our lab has been routinely using with the Agilent microarrays. Although there are platform-specific issues, we believe that most of the methods and considerations described here can be applied to other platforms. The Affymetrix platform has been covered in detail in a previous article in Methods in Enzymology (Hipp and Atala, 2006). Following convention, gene sequences placed on microarrays are called “probes”, whereas RNA sequences labeled and hybridized to probes are called “targets” throughout this paper.
2. CONSIDERATIONS FOR METHODS OF GENE EXPRESSION PROFILING
Expression profiling is most frequently performed for protein-coding RNAs, i.e., transcripts from DNA sequences commonly called genes. A variety of methods have been developed and used, including traditional Northern blotting and quantitative RT-PCR, and state-of-the-art RNA-seq (discussed below). As its running cost has been significantly reduced lately, microarray-based expression profiling can be a routine method of choice. However, there are some situations for which standard microarray technology does not seem best suited. We discuss these situations in the following sections.
2.1. Noncoding RNAs, microRNAs, and proteins
Noncoding RNAs (ncRNAs), including both long ncRNAs (Wilusz et al., 2009) and microRNAs (miRNAs) (Olena and Patton, 2010) that regulate the expression of target mRNAs, are increasingly important but require specialized techniques for quantitating miRNAs (Thomson et al., 2007) and specialized microarrays for long ncRNAs (Babak et al., 2005). However, as noncoding RNAs often affect the levels of coding RNAs, regular microarray-based expression profiling can still provide the overall status of cells and tissues. Similarly, protein profiling is beginning to be used as a way to examine the status of cells; however, considering that changes in protein profiles also affect the global profiles of coding RNAs, microarray-based profiling of RNAs often provides sufficient information about the status of cells and tissues.
2.2. Spatial resolution in complex tissues and organs
One important issue to consider is the poor spatial resolution of microarray technology: gene expression patterns in complex tissues and organs, which consist of many different cell types, are difficult to assess. One solution is to isolate specific cells from tissues/organs by microdissection (manual or laser-capture) or FACS sorting, and then carry out the microarray analysis. This usually requires microarray analysis of small amounts of material (see Section 4). For example, primordial germ cells (PGCs) are usually extracted by dissociating the embryonic gonads into individual cells with trypsin, staining with a germ cell-specific marker, and subsequent sorting (Abe et al., 1998; McCarrey et al., 1987). Another possible approach is to profile gene expression in individual cells of a dissociated embryo, and then infer the cell type from gene expression. For example, it was possible to classify cells from the blastocyst into trophectoderm and ICM groups on the basis of their gene expression (Kurimoto et al., 2006). However, gene expression profiling of individual cells is challenging (see Section 4).
Alternatively, a large number of genes selected after carrying out the microarray analysis can be examined by whole-mount in situ hybridization (WISH). For example, WISH analyses have shown the localization of transcripts for nearly 100 genes in mouse blastocysts (Yoshikawa et al., 2006) and for nearly 250 genes in mouse ES cell colonies (Carter et al., 2008).
2.3. Assessing absolute mRNA abundance with RNA-seq
Microarray technology is reliable for comparing the expression of the same gene in various cell types or under various conditions. However, it is not best suited to assessing absolute mRNA abundance and comparing the expression levels of different genes in the same cell type, because signal intensity varies depending on the oligonucleotide probe. For example, signal intensity in microarrays depends strongly on the location of the oligonucleotide probe within the transcript. Because the process of labeling starts from polyA tails, probes located far from the 3′ end of the transcript often show weaker signals than those located near the 3′ end. Other factors affecting signal intensity include differential efficiency of target amplification, nucleotide composition of probes, non-specific hybridization, and cooperative hybridization (see Section 5.5).
Recent advances in deep sequencing technologies (Mardis, 2008) have led to the emergence of the RNA-seq method, which yields reliable estimates of absolute mRNA abundance (Cloonan et al., 2008; Marioni et al., 2008; Mortazavi et al., 2008; Sultan et al., 2008). This method is based on sequencing of short cDNA fragments using the Illumina Genome Analyzer (Mortazavi et al., 2008), Applied Biosystems SOLiD technology, which is based on “sequencing by ligation” chemistry (Cloonan et al., 2008), or Roche 454 Life Science. Sequence tags are aligned back to the transcriptome to identify corresponding genes.
As RNA-seq becomes more affordable, it is often viewed as potentially a better method for gene expression profiling than microarrays (Oshlack and Wakefield, 2009; Wang et al., 2009). Furthermore, transcriptomes that can be measured by RNA-seq are potentially broader than those measured by microarrays, as detection is not limited to probes present on microarrays. However, several limitations of the RNA-seq method make it unlikely to replace microarrays in the near future. First, because the number of sequenced tags per gene is proportional to transcript length, short transcripts are represented by a small number of tags. As a result, the method has a “transcript-length bias”, which reduces statistical power to detect differential expression of short mRNAs compared to long ones (Oshlack and Wakefield, 2009). In contrast, microarrays have uniform statistical power for both short and long transcripts. A similar problem may exist for low-expressed genes, which are also represented by a small number of tags in RNA-seq. Second, reliable measurement of mRNA abundance in mammals requires large RNA-seq data sets ranging from 40 to 100 million tags (Cloonan et al., 2008; Mortazavi et al., 2008). As a result, the cost of an RNA-seq experiment is currently ~100-fold higher than that of a microarray experiment. Although both technologies will become less expensive in the future, the cost ratio may not change much. Third, RNA-seq is substantially more labor-intensive and time-consuming. Finally, microarray data can be easily adjusted to represent the absolute abundance of mRNA: by doing RNA-seq and microarray analyses on the same RNA sample just once, adjustment coefficients can be estimated and then applied to all other microarray results.
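The adjustment idea in the last point can be sketched in a few lines of Python. The text does not specify the form of the adjustment coefficients, so this sketch assumes a simple per-probe additive offset on the log10 scale; the function names, probe names, and numbers are hypothetical and serve only as illustration.

```python
def estimate_adjustment(array_log10, rnaseq_log10):
    """Per-probe offset (log10 scale) between RNA-seq and microarray values
    measured once on the same RNA sample (assumed form of the coefficients)."""
    return {probe: rnaseq_log10[probe] - array_log10[probe]
            for probe in array_log10 if probe in rnaseq_log10}


def apply_adjustment(array_log10, offsets):
    """Add the stored offsets to another array; probes without an offset are skipped."""
    return {probe: value + offsets[probe]
            for probe, value in array_log10.items() if probe in offsets}


# Hypothetical calibration sample measured by both technologies.
calibration_array = {"probeA": 2.0, "probeB": 3.1}
calibration_rnaseq = {"probeA": 2.6, "probeB": 2.9}
offsets = estimate_adjustment(calibration_array, calibration_rnaseq)

# Any later microarray result can then be shifted with the same offsets.
adjusted = apply_adjustment({"probeA": 2.5, "probeB": 3.0}, offsets)
```

An additive log-scale offset corresponds to a probe-specific multiplicative factor on the linear scale; a more elaborate model may be needed if probe response is not linear over the whole intensity range.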
3. EXPERIMENTAL STRATEGIES
3.1. First check existing datasets in public databases
Before embarking on a microarray analysis, one should check whether the same or a similar type of study has already been done. A vast amount of data has already been made available in public databases, e.g., GEO (Barrett et al., 2009) (http://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (Parkinson et al., 2009) (http://www.ebi.ac.uk/microarray-as/ae/), so one may find the necessary information without doing any actual expression profiling. This is particularly relevant if the goal is to find the expression levels of genes of interest in some specific tissues. Global expression profiles of many tissues and organs can also be found in the GNF database (Su et al., 2002) (http://biogps.gnf.org/) and the NIA database (GEO accession number GSE19806) (http://lgsun.grc.nia.nih.gov/exatlas/).
3.2. Analysis of embryonic materials without experimental manipulation
The most common applications of microarray analysis are to obtain global gene expression profiles of embryonic tissues and cell cultures. Here are some typical questions to be addressed.
Find a set of genes expressed in a specific cell type.
Identify genes expressed commonly in multiple cell types.
Follow the time-course changes of gene expression patterns. For example, gene expression during the preimplantation period has been studied by comparing RNA samples from unfertilized eggs, zygotes, 2-cell embryos, 4-cell embryos, 8-cell embryos, morulae, and blastocysts (Hamatani et al., 2004a; Wang et al., 2004; Zeng et al., 2004) (Fig. 1). Expected results from descriptive series include waves of gene activation at various time points of development (Cui et al., 2007; Hamatani et al., 2004a), or organ-specific profiles of gene expression (Frankenberg et al., 2007; Sherwood et al., 2007).
Demonstrate a cell’s identity by showing similarities or differences between two or more cell types. It is worth mentioning that this task is not as simple as it appears, because similarities and differences can only be defined in relative terms. The only way to show how similar (or even identical) two cell types are is to show that the differences between their global expression profiles are smaller than the variation among samples of the same cell type from different sources. For example, it was concluded that mouse embryonic stem (ES) cells and embryonic germ (EG) cells are indistinguishable in terms of gene expression profiles by showing that the differences between ES cells and EG cells are smaller than those between ES cells derived from different mouse strains or EG cells derived from different mouse strains (Sharova et al., 2007a).
Figure 1.
An example of a bird’s-eye view of gene expression changes during mouse preimplantation development. Adapted from (Ko, 2006).
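The cell-identity criterion described above (between-type differences compared with within-type variation) can be illustrated with a minimal Python sketch. The distance measure (root-mean-square difference of log-expression values) and the function names are assumptions made for illustration, not the procedure used in the cited study.

```python
from itertools import combinations, product
from statistics import mean


def distance(a, b):
    """Root-mean-square difference between two log-expression profiles."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


def within_between(group1, group2):
    """Mean pairwise distance within each cell type and between the two types."""
    within = [distance(a, b)
              for group in (group1, group2)
              for a, b in combinations(group, 2)]
    between = [distance(a, b) for a, b in product(group1, group2)]
    return mean(within), mean(between)


# Hypothetical profiles (two genes, two samples per cell type).
type1 = [[0.0, 0.0], [0.0, 2.0]]
type2 = [[4.0, 0.0], [4.0, 2.0]]
within, between = within_between(type1, type2)
# Two cell types would be called indistinguishable only if `between`
# does not exceed `within`; here the types clearly differ.
```

In a real analysis each group should contain samples from independent sources (e.g., different mouse strains), so that the within-type distance reflects genuine biological variation.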
3.3. Analysis of embryonic materials after experimental manipulation
Microarray analysis of transgenic and gene-targeted mice has become routine. In addition to the analyses of morphological and physiological defects, global gene expression profiling provides a comprehensive picture of the effects of individual genes (Chen et al., 2007; Zhu et al., 2007). For example, gene targeting often results in embryonic lethality, but the cause of embryo death remains unknown. Gene expression profiling may elucidate gene regulatory problems that eventually cause the death of embryos or adult mice (van Loo et al., 2007). Here are some typical questions to be addressed.
Find regulatory pathways and/or networks affected by gene manipulations, such as overexpression or repression of gene expression.
Find downstream target genes of transcription factors.
Find downstream target genes of cell signaling.
Examine the effects of environmental manipulation of cells on gene expression patterns.
Find the effects of genetic manipulation in embryos and adults.
3.3.1. Common issues
The main problem with the aforementioned strategy is that results are not always interpretable in terms of cause-effect relationships. Observed changes in phenotype and gene expression may be mediated by multiple intermediate steps and have little to do with the gene itself. Most of these changes may be compensatory responses of the organism to the functional deficiency caused by gene suppression. This problem is most serious for housekeeping genes that are expressed constitutively, but tissue-specific and/or transiently expressed genes also suffer from the lack of a clear causal relationship due to functional compensation.
3.3.2. Possible solution: use of normal-looking embryos at early time points
One obvious approach to this problem is to perform microarray analyses on embryos or embryonic tissues at an earlier stage, before the manifestation of phenotypic defects. Often, normal-looking transgenic or gene-targeted embryos show altered gene expression profiles, which provide great insight into the function of the gene of interest (e.g., (Landry et al., 2008)). Even in this case, finding the right control wild-type embryos may not be trivial, because the genetic alteration may delay or accelerate development. Thus, comparison with wild-type embryos collected at multiple time points may be required (Zhu et al., 2007).
A cell culture system may be used as an alternative way to study the effects of gene manipulation. Often, it is easier to make ES cells with null alleles than to generate nullizygous animals. Differentiation of mouse ES cells under various conditions (e.g., in the absence of Leukemia Inhibitory Factor (LIF), or in the presence of retinoic acid or DMSO) closely mimics certain events in early embryogenesis. For example, the 3rd day of culture in LIF(−) conditions corresponds well to the gastrulation stage in terms of gene expression patterns (Sharova et al., 2007a).
3.3.3. Possible solution: profiling the immediate effects of gene manipulation
Alternatively, it is possible to design controllable gene suppression or activation in the whole organism or in a specific organ. This is the most reliable experimental method, as it allows one to perform microarray analysis immediately after gene manipulation and thereby to differentiate between direct effects of the gene and subsequent compensatory responses of the organism. For example, the expression of a gene can be altered via a transgene under the control of a promoter with a specific hormone- or tetracycline-responsive element (Bockamp et al., 2008; Lewandoski, 2001). An endogenous gene can be repressed by conditional gene targeting (both alleles) or via RNAi technology. If the promoter of a transgene is organ-specific, then it is induced only in that organ, which makes it possible to avoid indirect effects mediated by gene activation in other organs. An alternative approach is to flank a transgene with LoxP sites, so that it can be easily excised when necessary. Gene manipulation can be used in cell culture systems as well (Matsumoto et al., 2006; Nishiyama et al., 2009; Niwa et al., 2000). Manipulation of environmental factors (e.g., growth factors, signaling pathway inhibitors) can also provide tools to investigate the immediate effects of the manipulation on global expression profiles (Jung et al., 2007; Kitaya et al., 2007). For example, mesoderm development can be induced by TGFB factors (Kimelman, 2006; LaGamba et al., 2005), whereas neural differentiation can be induced by retinoic acid (Aiba et al., 2006; Wang et al., 2005; Williams et al., 2004).
If the goal is to identify direct targets of a manipulated transcription factor, we recommend including early time points (e.g., 6, 12, and 24 hr) after gene manipulation. These early time points can help to differentiate between direct and indirect effects on gene expression (Masui et al., 2007). Later time points (e.g., 48 and 72 hr) are also important, because activation of target genes usually progresses over time and is easier to detect at later time points. Non-manipulated embryos (or cell cultures), sampled at the same time points as the manipulated ones, should always be used as additional controls. Non-manipulated controls are important because gene expression profiles often change over time even in control embryos or cell cultures. It is thus important to distinguish changes induced by the manipulated gene or environmental factors from the natural progression of gene expression.
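The role of time-matched controls can be shown with a small Python sketch: the change attributed to the manipulation at each time point is the difference (on the log10 scale) between the manipulated sample and the non-manipulated control collected at the same time. The function name and the expression values are hypothetical.

```python
def control_adjusted_change(manipulated, control):
    """log10(manipulated / control) at each time point (hr) sampled in both series."""
    return {t: manipulated[t] - control[t]
            for t in sorted(manipulated) if t in control}


# Hypothetical log10 expression of one gene; the control itself drifts upward.
manipulated = {6: 2.0, 12: 2.3, 24: 2.6}
control = {6: 2.0, 12: 2.1, 24: 2.2}
change = control_adjusted_change(manipulated, control)
```

Comparing every time point with a single time-zero control instead would attribute the drift seen in the control series to the manipulation.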
3.4. Number of replications required for the microarray analysis
Planning the number of replications for gene expression profiling is a crucial step that determines the success of the project. Here by replications we mean true biological replications that are handled independently from start to finish. Samples should thus be collected from different embryos or even from different mothers. In the case of cell culture, cells should be cultured independently. Although it is acceptable to pool samples from multiple embryos when the amount of material is small, information on the variability of gene expression between individuals is then lost. Thus, if possible, it is better not to pool samples. Three main factors affect the required number of replications: (1) noise levels in microarray measurements, (2) expected difference in gene expression, and (3) the number of factors (e.g., cell types, time points, or organs) used in the experiment.
Noise levels depend on the microarray platform and on the consistency of RNA processing from extraction to hybridization. Older nylon membrane arrays with spotted cDNA clones have shown high levels of noise (Tanaka et al., 2000). Currently, three types of microarray technology are available, all of which are based on short synthetic DNA oligonucleotides: (1) synthesis by photolithography for Affymetrix and NimbleGen; (2) synthesis by ink-jet printing for Agilent Technologies; and (3) beads with oligonucleotides for Illumina (Kawasaki, 2006). All of them generate highly reproducible results (Cheadle et al., 2007). Systematic studies by the MicroArray Quality Control (MAQC) consortium have shown that these platforms provide comparable results (Shi et al., 2006), and the reliability of the results has been validated by other methods, such as qPCR (Canales et al., 2006).
For Agilent microarrays, competitive hybridization of two RNA samples labeled with Cy3 and Cy5 dyes, respectively, increases the sensitivity of arrays and makes signal normalization more accurate (Hughes et al., 2001). If the samples used for competitive hybridization represent experiment and control (e.g., gene-targeted embryos vs. wild-type embryos), then replications with dye swap are needed to eliminate the possible effect of preferential hybridization with one of the dyes (Fig. 2A). However, it is better to use a standard RNA sample (e.g., universal mouse reference, UMR) in all arrays so that they can be normalized on the basis of that common sample (Fig. 2B). A potential problem with using UMR is that, if it does not contain transcripts of all genes, normalization may be biased (see Section 4.1.2).
Figure 2.
Design of two-color (dye) microarray experiments.
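Why a dye swap removes dye-specific bias can be seen from a short Python sketch. It assumes that the bias acts as a constant additive term on the log ratio of each probe, which is a simplification; the function name and the numbers are hypothetical.

```python
def dye_swap_log_ratio(forward, swapped):
    """Combine log(Cy3/Cy5) ratios from a dye-swap pair.

    forward: experiment labeled with Cy3, control with Cy5
    swapped: control labeled with Cy3, experiment with Cy5
    Under the additive-bias assumption, the dye term cancels in the
    half-difference, leaving log(experiment / control).
    """
    return {probe: (forward[probe] - swapped[probe]) / 2
            for probe in forward if probe in swapped}


# Hypothetical probe: true log ratio 0.5, dye bias 0.1 on both arrays.
forward = {"probeA": 0.5 + 0.1}
swapped = {"probeA": -0.5 + 0.1}
combined = dye_swap_log_ratio(forward, swapped)
```

With a common reference design (Fig. 2B), the same dye is used for all test samples, so the dye term is shared by every array and cancels when test samples are compared with each other.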
If the expected differences in gene expression are large (e.g., >100 genes with >3-fold change in expression), then 3 replications are usually sufficient for pair-wise comparison. But if the differences in gene expression are small (e.g., <50 genes with >1.5-fold change), then the number of replications should be increased. Statistical models are available for planning the number of replications on the basis of expected statistical power (Pan et al., 2002). The number of replications can be reduced in experiments that involve a large number of cell types and/or time points, because the error variance can be estimated from all cell types or time points simultaneously using ANOVA (Sharov et al., 2005b). However, two replications are still needed for reliable measurement of gene expression. A single replication may be acceptable for redundant cell/tissue types. For example, if three different types of controls are expected to have similar gene expression profiles, it may be possible to do just one replication of each. In general, it is recommended that at least one extra sample be prepared as a possible substitute in case a sample fails during the target preparation steps.
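As a rough illustration of how the expected fold change and the noise level drive the number of replications, the following Python sketch uses the textbook normal-approximation formula for a two-group comparison of a single gene. It is not the model of Pan et al. (2002), it ignores multiple-testing correction (which calls for a much smaller alpha in genome-wide analyses), and the standard deviation used in the example is a hypothetical value.

```python
from math import ceil, log2
from statistics import NormalDist


def replicates_per_group(fold_change, sd_log2, alpha=0.05, power=0.8):
    """Approximate replications per group needed to detect `fold_change`
    when the between-replicate SD of log2 expression is `sd_log2`."""
    z = NormalDist()
    delta = log2(fold_change)  # effect size on the log2 scale
    n = 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sd_log2 / delta) ** 2
    # At least two replications are kept, following the text above.
    return max(2, ceil(n))


n_large_effect = replicates_per_group(fold_change=3.0, sd_log2=0.5)
n_small_effect = replicates_per_group(fold_change=1.5, sd_log2=0.5)
```

The required number grows with the square of the ratio of noise to effect size, which is why small fold changes demand disproportionately more replications.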
4. EXPRESSION PROFILING OF SMALL AMOUNTS OF RNAS
Working with embryos requires expression profiling of small amounts of RNA. Various protocols for microarray analysis of very few cells, down to single cells, have been reported (e.g., (Kurimoto et al., 2006)). However, these methods are usually labor-intensive and technically demanding. Due to their relatively low reproducibility, many replications are also required.
The protocol described here has been routinely used in our laboratory to perform microarray analysis of oocytes and preimplantation embryos (Hamatani et al., 2004a; Hamatani et al., 2004b). The protocol is relatively easy to carry out, adding only a few extra days to target preparation, and can be applied reliably and reproducibly to 2 ng of total RNA (Carter et al., 2003). A single oocyte, which contains ~0.2–0.4 ng of total RNA (Nagy et al., 2002), could be used; however, it does not work routinely, and the preparation of labeled targets has to be repeated many times until it passes the QC described below. A pool of 10 oocytes (~2 ng of total RNA) works routinely and produces satisfactory results in terms of reproducibility and sensitivity. This protocol also works for microdissected tissues. Below is a step-by-step protocol.
4.1. Protocol
4.1.1. Cy3-labeled target preparation with two-round RNA amplification
This protocol uses the Agilent Low RNA Input Fluorescent Linear Amplification Kit (Cat# 5184-3523) with a modification that allows two rounds of amplification to obtain a sufficient quantity of labeled targets (Hopkins et al., 2003). We have made a slight modification to the protocol to obtain reproducible results from 1 to 10 oocytes or embryos. One of the key points is direct lysis of eggs/embryos/tissues collected in M2 medium in the primer annealing buffer, without isolating total RNA or polyA+ RNA. Although a single oocyte or preimplantation embryo can produce an expression profile, use of 10 oocytes or embryos is strongly recommended, as this consistently produces high-quality and reproducible data.
4.1.1.1. cDNA Synthesis directly from embryos
| Annealing Reaction | |
| Nuclease free water | 7.3 μl |
| 1 – 10 oocytes or embryos in M2 medium | 1.0 μl |
| 5% NP-40 | 2.0 μl |
| T7 Promoter Primer 8-fold dilution | 1.2 μl |
| Total Volume | 11.5 μl |
Incubate at 65 °C for 10 min, and on ice for 5 min.
| cDNA Reaction | |
| Annealing Reaction from above | 11.5 μl |
| 5X First Strand Buffer | 4.0 μl |
| 0.1 M DTT | 2.0 μl |
| dNTP Mix (10 mM) | 1.0 μl |
| MMLV RT | 1.0 μl |
| RNaseOUT | 0.5 μl |
| Total Volume | 20.0 μl |
Incubate at 40 °C for 2 hrs, then inactivate at 65 °C for 15 min, and place the tube on ice for 5 min.
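When several samples are processed in parallel, the reagents added to each annealing reaction can be combined in advance. The protocol itself does not mention a master mix, so the following Python sketch is only a convenience calculation; the 10% overage and the function name are assumptions.

```python
# Reagents added to each 11.5 µl annealing reaction (volumes from the table above).
CDNA_MIX_UL = {
    "5X First Strand Buffer": 4.0,
    "0.1 M DTT": 2.0,
    "dNTP Mix (10 mM)": 1.0,
    "MMLV RT": 1.0,
    "RNaseOUT": 0.5,
}


def master_mix(per_reaction_ul, n_reactions, overage=0.1):
    """Scale per-reaction volumes to n reactions plus a pipetting overage (assumed 10%)."""
    factor = n_reactions * (1 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in per_reaction_ul.items()}


mix = master_mix(CDNA_MIX_UL, n_reactions=4)
per_tube = sum(CDNA_MIX_UL.values())  # volume of mix to add to each annealing reaction
total_reaction = 11.5 + per_tube      # should equal the stated total volume
```

Checking that the component volumes add up to the stated total is a quick way to catch transcription errors when a protocol table is copied or adapted.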
4.1.1.2. Amplification - Round 1
| cDNA reaction from above | 20.0 μl |
| Nuclease free water | 12.1 μl |
| 4X Transcription Buffer | 20.0 μl |
| 0.1 M DTT | 6.0 μl |
| NTP Mix | 8.0 μl |
| CTP | 5.6 μl |
| 50% PEG | 6.4 μl |
| RNaseOUT | 0.5 μl |
| Inorganic Phosphate | 0.6 μl |
| T7 RNA Polymerase | 0.8 μl |
| Total Volume | 80.0 μl |
Incubate at 40 °C for 4 hrs.
Purify with a Qiagen RNeasy Mini Spin Column, elute in 30 μl of water, and dry in a SpeedVac to a final volume of 10.5 μl in water.
4.1.1.3. cDNA Synthesis from cRNA
| First Strand Synthesis | |
| Annealing Reaction | |
| cRNA Solution from above | 10.5 μl |
| Random Hexamers 10-fold dilution | 1.0 μl |
| Total Volume | 11.5 μl |
Incubate at 65 °C for 10 min, then on ice for 5 min.
| cDNA Reaction - First Strand | |
| Annealing Reaction from above | 11.5 μl |
| 5X First Strand Buffer | 4.0 μl |
| 0.1 M DTT | 2.0 μl |
| dNTP Mix (10 mM) | 1.0 μl |
| MMLV RT | 1.0 μl |
| RNaseOUT | 0.5 μl |
| Total Volume | 20.0 μl |
Incubate at 40 °C for 2 hrs.
| Second Strand Synthesis | |
| Annealing reaction | |
| cDNA Solution from above | 20.0 μl |
| T7 Promoter Primer (no dilution) | 1.2 μl |
| Total Volume | 21.2 μl |
Incubate at 65 °C for 10 min and on ice for 5 min.
| cDNA Reaction - Second Strand | |
| Annealing Reaction from above | 21.2 μl |
| Nuclease free water | 5.8 μl |
| 5X First Strand Buffer | 4.0 μl |
| 0.1 M DTT | 4.0 μl |
| dNTP Mix (10 mM) | 2.0 μl |
| MMLV RT | 2.0 μl |
| RNaseOUT | 1.0 μl |
| Total Volume | 40.0 μl |
Incubate at 40 °C for 2 hrs.
Inactivate at 65 °C for 15 min, then place on ice for 5 min.
Dry in a SpeedVac to a final volume of 35.3 μl in water.
4.1.1.4. Amplification - Round 2 and Cy3-labeling
| cDNA reaction from above | 35.3 μl |
| CTP-Cy3 | 2.4 μl |
| 4X Transcription Buffer | 20.0 μl |
| 0.1 M DTT | 6.0 μl |
| NTP Mix | 8.0 μl |
| 50% PEG | 6.4 μl |
| RNaseOUT | 0.5 μl |
| Inorganic Phosphate | 0.6 μl |
| T7 RNA Polymerase | 0.8 μl |
| Total Volume | 80.0 μl |
Incubate at 40 °C for 4 hrs.
Purify with a Qiagen RNeasy Mini Spin Column and elute in 60 μl (2 × 30 μl) of water.
4.1.2. Cy5-labeled target preparation from universal reference RNAs
Cy5-CTP-labeled reference target is prepared using the Agilent Low RNA Input Fluorescent Linear Amplification Kit (Cat# 5184-3523) from a mixture of Stratagene Universal Mouse Reference RNA (UMR) and total RNA from mouse ES cells cultured under standard conditions (LIF+ medium) at a 2:1 ratio. For each tube, 2.5 μg of mixed RNA is used for labeling. UMR is composed of total RNA from 11 mouse cell lines. We have added mouse ES cell RNA because we find that genes specifically expressed in ES cells and early embryos are underrepresented in the UMR, which could cause some distortion of the expression data for ES-specific genes (see Section 3.4).
4.1.3. Purification of Cy3 and Cy5 targets
Labeled targets are purified using the RNeasy Mini Kit (Qiagen, Cat# 74104), and then quantified using a NanoDrop scanning spectrophotometer (NanoDrop Technologies). Purified samples can be stored at −80 °C for a long time. We have used labeled samples stored at −80 °C for up to 8 years without any problems.
4.1.4. Quality control (QC) of the labeled targets
It is important to check the quality of labeled targets before doing hybridizations. In the standard procedure, with 2.5 μg of total RNA as the starting material and one round of amplification, ~50 μg of labeled targets is usually produced, of which only 825 ng is used for hybridization, as indicated in the manufacturer’s protocol. In the case of two-round amplification, a pool of 10 oocytes produces ~20 μg of labeled targets in ~30 μl of solution. However, this does not mean that they are of high quality. We run 1 μl of the solution on the NanoDrop to quantify the sample. Fig. 3 shows an example of standard targets (>0.1 peak height; usually 0.12–0.17). Targets prepared by two rounds of amplification from 1 to 5 oocytes show peak heights lower than 0.1 and usually do not produce high-quality microarray data, so we usually do not proceed to hybridization. Targets prepared by two rounds of amplification from 10 oocytes produce peak heights higher than 0.1. Four μg of targets is used for hybridization.
Figure 3.
Quality control of labeled targets by NanoDrop.
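The QC decision described above can be written down as a small Python helper: a target passes if its NanoDrop peak height exceeds 0.1, and the volume needed for hybridization follows from the measured concentration. The function name is hypothetical, and the example concentration is simply derived from the approximate yield quoted in the text (~20 μg in ~30 μl).

```python
def target_qc(peak_height, conc_ng_per_ul, mass_needed_ng=4000.0, min_peak=0.1):
    """Return (passed, volume_ul): whether the labeled target passes the
    peak-height threshold and, if so, the volume that contains the mass
    needed for hybridization (4 µg for two-round amplified targets)."""
    passed = peak_height > min_peak
    volume_ul = mass_needed_ng / conc_ng_per_ul if passed else None
    return passed, volume_ul


# Hypothetical readings for a 10-oocyte pool and a single-oocyte preparation.
pool_result = target_qc(peak_height=0.14, conc_ng_per_ul=20000 / 30)
single_result = target_qc(peak_height=0.08, conc_ng_per_ul=300.0)
```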
4.1.5. Microarray hybridizations, washing, and scanning
Target cRNAs are hybridized to the NIA Mouse 44K Microarray v3.0 (whole genome 60-mer oligonucleotide arrays; Mouse Development Microarray Kit by Agilent Technologies) according to the manufacturer’s protocol (Two-Color Microarray-Based Gene Expression Analysis Protocol, Product # G4140-90050, Version 5.0.1). Slides are scanned with the Agilent DNA Microarray Scanner (model G2505-64120) at 100% and 10% PMT in both channels, with a scan resolution of 5 μm.
5. QC OF MICROARRAY RESULTS
Microarray technology is highly complex in terms of the number of processing steps and factors that may alter the results. It is thus important to have robust quality control (QC) tools for testing final microarray results. Low-quality results often remain unnoticed, because the software used for the analysis of scanned images often masks the problem via various kinds of normalizations. Normalization algorithms are designed to improve the quality of results within normal limits of variation; however, when applied to strongly distorted data, they simply hide the problem. Agilent Feature Extraction software provides several indicators of microarray quality, which include spatial distribution of outliers, array uniformity, sensitivity, and spike-ins signal statistics (http://www.chem.agilent.com/Library/usermanuals/Public/ReferenceGuide_050416.pdf). These indicators are useful for detecting technical problems within a single array, such as scratches, printing errors, and signal intensity trends across the array (i.e., spatial non-uniformity). However, many problems cannot be detected within a single array and require additional QC tests. In this section we describe two such tests that we have been routinely using: Rank-Plot and correlation between replications.
5.1. Rank-Plot and correlation between replications
Rank-Plot can be generated in the following manner.
Step 1: Extract values from two columns, “gBGSubSignal” and “rBGSubSignal”, from the output files of the Agilent Feature Extraction Software.
Step 2: Transform these values to log10 scale. Substitute them with 0, if negative. Name these values “GVal” and “RVal”.
Step 3: Sort “GVal” and “RVal” separately and index each value with its corresponding rank.
Step 4: If microarray data from past experiments are available, analyze all of them in the same manner. Then, compute the ensemble average of “GVal” and of “RVal” over all the past experiments. Name these values “GVal_mean” and “RVal_mean”. Sort “GVal_mean” and “RVal_mean” separately and index each value with its corresponding rank.
Step 5: Plot the following in (X, Y) coordinates: (Rank, GVal), (Rank, GVal_mean), (Rank, RVal), (Rank, RVal_mean).
Plotting can be done by Excel or Gnuplot (free software).
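For readers who prefer scripting to spreadsheets, Steps 2–4 can be sketched in Python as follows. Reading the Feature Extraction output file is left out; the functions take plain lists of background-subtracted signals. Two points are assumptions of this sketch rather than statements of the procedure: non-positive raw signals (for which the logarithm is undefined) are set to 0 together with negative log values, and the base line is computed by averaging the sorted values rank by rank across reference arrays of the same design.

```python
import math


def log_signal(values):
    """Step 2: log10 transform; non-positive signals and negative logs become 0."""
    return [max(0.0, math.log10(v)) if v > 0 else 0.0 for v in values]


def rank_plot(values):
    """Step 3: sort in increasing order and pair each value with its rank (1 = lowest)."""
    return list(enumerate(sorted(log_signal(values)), start=1))


def baseline(reference_arrays):
    """Step 4 (one reading): average the sorted values rank by rank across
    known high-quality arrays of the same design (equal numbers of features)."""
    sorted_arrays = [sorted(log_signal(a)) for a in reference_arrays]
    means = [sum(column) / len(column) for column in zip(*sorted_arrays)]
    return list(enumerate(means, start=1))


# Hypothetical green-channel signals from one array and two reference arrays.
g_points = rank_plot([120.0, -3.0, 15.0, 0.5, 2500.0])
g_base = baseline([[10.0, 100.0], [1000.0, 10.0]])
```

The resulting (rank, value) pairs for the test array and the base line can be written to a text file and plotted with any tool, including Excel or Gnuplot as mentioned above.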
The Rank-Plot method appears sensitive in detecting various problems with microarrays, including too low or too high amounts of targets in the hybridization reaction, compromised washing buffers, degradation of Cy5 signal due to ozone or UV light, and scanner malfunctions. It is important to use non-normalized data for the Rank-Plot, because dye normalization may compensate for differences in signal intensity and mask the problem. We recommend using background-subtracted signal intensities (“gBGSubSignal” and “rBGSubSignal”) for both green and red signals. Signals are log-transformed, sorted in increasing order (sorting is done independently for the green and red signals), and then plotted against the rank (Fig. 4A, B). Rank-Plots appear to have very consistent shapes for all high-quality microarrays and do not depend on the sample tested. Although different cell types have different compositions of expressed genes, genes are sorted by their expression. Thus, the order of genes in Rank-Plots is specific to the cell type, but the shape of the graph remains the same. As a base line, we use averaged Rank-Plots for a set of known high-quality arrays (“RVal_mean” and “GVal_mean”). Deviations from the base line then indicate potential problems. If the Rank-Plot matches the base line well, we assume that the microarrays are of good quality. Of course, the Rank-Plot represents only the distribution of the signal in microarrays. Thus, some specific problems such as scratches and misprints should be detected using other quality-control indicators, as provided by the Agilent Feature Extraction Software.
Figure 4.
Quality control of microarray data by Rank-Plots.
Competitive hybridization of two targets labeled with Cy3 (green) and Cy5 (red) dyes yields the best results when both have comparable intensity. Thus, ideally, the Rank-Plots for red and green signals should match each other. In our case, red intensity is slightly higher than green intensity because of historically selected RNA doses, which we decided not to change in order to remain compatible with previous results (Fig. 4A, B). The difference between the red and green Rank-Plots is within the acceptable range (<2-fold).
A downward shift of the Rank-Plot relative to the base line indicates low signal intensity. Low signals are often caused by a low amount of labeled RNA used for hybridization (Fig. 4C, D). We have observed this problem when the initial RNA sample is small and requires two rounds of amplification. Although the quantities of all targets are measured based on NanoDrop reads, targets produced by two rounds of amplification apparently contain a larger portion of “junk RNA”. As a result, the amount of real RNA appears diluted. The Rank-Plot can be used for adjusting RNA amounts. For example, the green line in Fig. 4C, which represents the test sample labeled with Cy3 dye, is shifted down by 0.3 in log10 scale relative to the base line. Because 10^0.3 ≈ 2, this shift can be compensated by doubling the amount of targets used for hybridization. However, if the signal is extremely low as in Fig. 4D, increasing the amount of targets does not help in our hands.
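The arithmetic behind this adjustment is a one-line conversion from a log10 shift to a fold change. The helper below is a hypothetical illustration that assumes the signal scales in proportion to the amount of target, which, as noted above, no longer holds when the signal is extremely low.

```python
def target_scaling_factor(log10_shift):
    """Fold increase in target amount needed to offset a Rank-Plot shift.

    log10_shift is (test curve - base line) in log10 units, e.g. -0.3.
    Assumes signal is proportional to the amount of target (a simplification).
    """
    return 10 ** (-log10_shift)

factor = target_scaling_factor(-0.3)
print(round(factor, 2))  # close to 2, i.e. double the amount of targets
```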
Red dye (Cy5) can be bleached in the presence of ozone (Byerly et al., 2009; Fare et al., 2003), which may result in a downward shift of the Rank-Plot (red line in Fig. 4E, F). Although microarray facilities are kept ozone-free, this problem appears occasionally due to neglect or malfunction of the ozone filters. [We have set up a bioBubble enclosure surrounding the bench space for the microarray work and the microarray scanner. Air intakes are handled by the Ozone-scrubber from ScieGene.] Deviation of the Rank-Plot from the base line can be restricted to low-intensity signals (Fig. 4G–J). Upward deviation in both channels indicates non-specific hybridization (Fig. 4G, H). After observing it in a number of arrays, we carried out comprehensive troubleshooting and eventually identified a manufacturer’s problem with the washing buffer as the cause. After the buffer was replaced, the Rank-Plots returned to the base line. Occasionally, we have observed abnormal Rank-Plots for individual arrays (e.g., Fig. 4I, J). Because it is difficult to find the cause in each case, we usually fix the problem by repeating the hybridization or using a different RNA sample for analysis.
5.2. Correlation between replications
Correlation of signal intensity between biological replications is another useful QC method. It is better to use dye-normalized signals for this test, because these values usually show better matches. High correlation between log-transformed signals for independent biological replications indicates both: (1) high quality of microarray results and (2) reproducibility of the experiment. Gene expression in cultured cells is highly reproducible and the correlation of log-transformed signals between biological replications is mostly >0.995. The reproducibility of gene expression in tissues and organs of different animals is lower than in cultured cells, and the correlation between biological replications usually ranges from 0.95 to 1. Thresholds for acceptable correlation between replications should be selected based on the goals of the study.
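As a minimal sketch of this check, the Pearson correlation of log-transformed signals can be computed as follows. The signal values are invented, the function names are hypothetical, and dropping features with non-positive signal before the log transform is an assumption of the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def replicate_correlation(rep1, rep2):
    """Correlation of log10 signals between two biological replications."""
    # Features with non-positive signal in either replicate are skipped
    # (an assumption of this sketch).
    pairs = [(a, b) for a, b in zip(rep1, rep2) if a > 0 and b > 0]
    log1 = [math.log10(a) for a, _ in pairs]
    log2 = [math.log10(b) for _, b in pairs]
    return pearson(log1, log2)

# Invented dye-normalized signals for five features in two replications.
rep1 = [120.0, 950.0, 8100.0, 40.0, 15000.0]
rep2 = [110.0, 1000.0, 7900.0, 45.0, 16000.0]
r = replicate_correlation(rep1, rep2)
print(round(r, 4))
```

The resulting coefficient can then be compared with whatever threshold is appropriate for the goals of the study.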
5.3. Calibration of microarray signal intensities
Certain experiments require accurate scaling of gene expression values so that the ratio of hybridization signal intensities is equal to the ratio of mRNA abundance across the entire range of expression levels. For example, we have used microarrays for detecting the rate of mRNA degradation in ES cells (Sharova et al., 2009). If arrays were not well calibrated, then the estimates of the rate of mRNA decay would be biased. We have tested the scaling of Agilent microarrays using a series of dilutions (5×, 1×, 0.2×, and 0.04×) of the same mRNA pool obtained from newborn mice of the C57BL/6J strain and labeled with Cy3 dye. Samples were hybridized together with a constant amount of the Universal Mouse Reference (UMR) labeled with Cy5. The slope of the regression of log-transformed Cy3 signals versus log RNA input was close to 1 for most probes with log10 signal >1.5 (Fig. 5), which indicates proper scaling. The results confirmed accurate calibration of intensity signals across nearly the entire range of values. Only 50 oligonucleotide probes (0.13% of total) with high signals (>4.8) have shown decreased slopes, indicating some degree of saturation (Fig. 5).
Figure 5.
Calibration of microarray signal intensities.
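The calibration test can be illustrated with an ordinary least-squares slope of log10 signal against log10 RNA input over the four dilutions. The sketch below uses invented signal values for two hypothetical probes; it is not the code used to produce Fig. 5.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

# Dilution series from the text: 5x, 1x, 0.2x, and 0.04x.
dilutions = [5.0, 1.0, 0.2, 0.04]
log_input = [math.log10(d) for d in dilutions]

# Invented Cy3 signals: one probe that scales linearly with input and one
# that flattens at the high end (saturation).
linear_probe = [5000.0, 1000.0, 200.0, 40.0]
saturated_probe = [90000.0, 60000.0, 12000.0, 2400.0]

slope_linear = ols_slope(log_input, [math.log10(s) for s in linear_probe])
slope_saturated = ols_slope(log_input, [math.log10(s) for s in saturated_probe])
print(round(slope_linear, 3), round(slope_saturated, 3))
```

A slope near 1 indicates that a fold change in input gives the same fold change in signal; a slope clearly below 1 for a high-intensity probe is the signature of saturation.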
5.4. Cooperative hybridization issues
The test of microarray calibration (see above) has revealed an unexpected fact: red signals for >1000 oligonucleotide probes are not constant, even though the same amount of UMR labeled with Cy5 is used for all microarrays. Instead, red signals for these probes correlated positively with the amount of test RNA labeled with Cy3. This result shows that some oligonucleotide probes show cooperative rather than competitive hybridization. The dependency between Cy5 signal and test RNA input, measured by the slope of the linear regression, has been observed mostly for oligonucleotides that had green signals higher than red signals (Fig. 6). The mechanism of this effect is unknown, but we hypothesize that it stems from chain hybridization, in which free ends of Cy3-labeled mRNA bound to probes on the microarray hybridize with Cy5-labeled mRNA either non-specifically or via repetitive sequences. Cooperative hybridization may cause underestimation of the difference in gene expression between samples if values are normalized based on the UMR signal. To avoid biased estimates, we do not use UMR for normalization for the small portion of probes where we expect cooperative hybridization.
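One way to screen for such probes is to regress the log10 Cy5 (reference) signal on the log10 amount of Cy3-labeled test RNA and flag probes whose slope is clearly positive. The sketch below is only an illustration under stated assumptions: the slope threshold of 0.2, the probe names, and the signal values are hypothetical and are not taken from our analysis.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def flag_cooperative(log_input, red_by_probe, min_slope=0.2):
    """Return IDs of probes whose Cy5 signal rises with Cy3 RNA input."""
    flagged = []
    for probe_id, red in red_by_probe.items():
        slope = ols_slope(log_input, [math.log10(v) for v in red])
        if slope > min_slope:  # hypothetical threshold
            flagged.append(probe_id)
    return flagged

log_input = [math.log10(d) for d in (5.0, 1.0, 0.2, 0.04)]
red_by_probe = {
    "probe_A": [800.0, 810.0, 790.0, 805.0],    # constant, as expected for UMR
    "probe_B": [2000.0, 1200.0, 700.0, 400.0],  # rises with test RNA input
}
flagged = flag_cooperative(log_input, red_by_probe)
print(flagged)
```

Probes flagged in this way would be candidates for exclusion from UMR-based normalization.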
Figure 6.
Cooperative hybridization issues.
6. ANALYSIS AND INTERPRETATION OF MICROARRAY DATA
6.1. Statistical analysis of microarray data
The main purpose of statistical analysis of microarray data is to find genes that are differentially expressed between either stages of development or manipulated and intact embryos. The specific feature of microarray analysis is to simultaneously test a very large set of genes. Most whole genome microarrays include >40000 oligonucleotide probes that match to ca. 30000 non-redundant transcripts. If a statistical significance is evaluated using p-value (p = 0.05 is traditionally used for testing individual hypotheses), 2000 genes (5% of 40000) will be considered statistically significant even in the case of no real difference in expression (e.g., if we compare identical samples). Obviously, such a high number of false positives is not acceptable; thus, correction for multiple hypotheses testing is always required in microarray analysis. The most commonly used criterion for multiple hypotheses testing is False Discovery Rate (FDR), which is interpreted as the proportion of false positives among all genes that are considered significant. According to the theorem of Benjamini and Hochberg (Benjamini and Hochberg, 1995), FDR is estimated:
$$\mathrm{FDR}_r = \frac{p_r \, N}{r} \qquad (1)$$
where r is the rank of a gene ordered by increasing p-values, p_r is the p-value for the gene with rank r, and N is the total number of genes tested. The FDR value increases monotonically with increasing p-value. As follows from (1), FDR becomes more stringent as the number of significant genes decreases.
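The FDR estimate can be computed directly from a list of p-values. The sketch below implements the standard Benjamini-Hochberg adjustment: each gene receives p·N/r, and a running minimum taken from the largest p-value downward keeps the result monotonic. The running-minimum step is the usual convention and is assumed here; the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return FDR estimates (p * N / rank, made monotonic) in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    fdr = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the FDR never
    # decreases as the p-value increases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        fdr[i] = running_min
    return fdr

# Expected false positives when testing 40000 probes at p = 0.05 with no
# real differences, as in the text.
expected_false_positives = 0.05 * 40000

p = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = benjamini_hochberg(p)
print([round(v, 5) for v in fdr])
```

In this toy example the third gene has a raw value of 0.039 × 5 / 3 = 0.065, which the running minimum lowers to the value of the fourth gene (0.041 × 5 / 4 = 0.05125).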
Microarray data often include a small number of replications; for example, a typical pair-wise comparison with 3 replications leaves only 4 degrees of freedom for the error variance in ANOVA. In this case, the estimates of error variance are unstable and may appear very small for some genes by pure chance, and these genes are then erroneously treated as statistically significant. This problem can be fixed by adjusting the error variance on the basis of additional information coming from other genes. For example, the Bayesian method takes a weighted average of actual error variance and the average error variance for other genes with similar intensity (Baldi and Long, 2001). A simpler and more conservative approach is to take the maximum value from the actual error variance and the average error variance for other genes with similar intensity (Sharov et al., 2005b). Another method is to use a fold-change threshold as a criterion of statistical significance in addition to FDR.
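The simpler of the two adjustments, taking the maximum of a gene's own error variance and the average error variance of genes with similar intensity, can be sketched as follows. The definition of "similar intensity" as the nearest neighbors in a list sorted by mean log intensity, and the window size, are assumptions of this example and may differ from the implementation in NIA Array Analysis.

```python
def adjusted_error_variance(mean_log_intensity, error_variance, window=3):
    """Return max(own variance, mean variance of intensity neighbours)."""
    n = len(error_variance)
    if n < 2:
        raise ValueError("need at least two genes")
    order = sorted(range(n), key=lambda i: mean_log_intensity[i])
    half = window // 2  # hypothetical neighbourhood half-width
    adjusted = [0.0] * n
    for pos, i in enumerate(order):
        lo = max(0, pos - half)
        hi = min(n, pos + half + 1)
        # Variances of the other genes in the intensity window.
        others = [error_variance[order[j]] for j in range(lo, hi) if j != pos]
        local_mean = sum(others) / len(others)
        adjusted[i] = max(error_variance[i], local_mean)
    return adjusted

# Invented values: the second gene has a tiny variance by chance.
intensity = [1.0, 2.0, 3.0, 4.0, 5.0]
variance = [0.04, 0.0001, 0.05, 0.03, 0.02]
adjusted = adjusted_error_variance(intensity, variance)
print(adjusted)
```

The gene with an error variance of 0.0001 is raised to the average of its neighbors (0.045), so it is no longer called significant merely because its replicates happened to agree closely.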
Several software packages are available for the analysis of microarrays, including SAM and Bioconductor (Gentleman et al., 2004; Tusher et al., 2001). In this paper we focus on the software that we have developed earlier, NIA Array Analysis (http://lgsun.grc.nia.nih.gov/ANOVA; (Sharov et al., 2005b)), as it combines the algorithms that we consider most important for microarray analysis (i.e., ANOVA, FDR, adjustment of error variance, PCA, gene clustering, and pattern matching) and offers a user-friendly web-based interface.
6.2. Finding a set of statistically significant genes
The first step in using the NIA Array Analysis software (Sharov et al., 2005b) is to create a password-protected account. Then, the input file with data should be prepared using the instructions available on the help page (http://lgsun.grc.nia.nih.gov/ANOVA/help.html#format). You can assemble it in Excel and then save it as a tab-delimited text file. Another option is to use the “Arrayjoin” tool, which is designed for compiling an input file from multiple scanner files. If column headers are formatted properly, then the software can automatically identify the tissue/cell types and replications used in the experiment. However, this information can also be added or modified after the file is uploaded. Data can be normalized during the upload step if necessary. In addition, a file with array annotation (formatting rules are specified in http://lgsun.grc.nia.nih.gov/ANOVA/help.html#annotations) can be uploaded if it is not already available for all users. The software can be used for both one-dye arrays and two-dye arrays. In the latter case we assume that each array is represented by two columns: (1) sample data and (2) control or reference data (e.g., UMR). If your input data are log-transformed, then select the appropriate log-transformation type (log10, log2, or loge) before loading the file. Outliers are removed based on a user-selected z-threshold level (default threshold z = 8).
Statistical analysis is based on the single-factor ANOVA of log-transformed data with optional adjustment of error variance and FDR criterion for selecting differentially expressed genes. Significant genes are displayed using scatter-plot (Fig. 7A, B), log-ratio plot (Fig. 7C, D), and tables. It is possible to modify criteria for gene significance (e.g., use p-values instead of FDR or use a fold change threshold), and re-generate scatter-plot and log-ratio plot. Additional methods of analysis include hierarchical clustering, correlation matrix, estimation of error function (error vs. log intensity), and principal component analysis (PCA, see below). Hierarchical clustering of tissues/cell types is done using the average distance method. It is also possible to identify genes that are specific for each cluster. Data can be retrieved by (1) searching for a specific gene name; (2) pattern-matching, i.e., searching genes whose expression change matches a specific pattern; (3) browsing tables with differentially-expressed genes; (4) clicking on individual genes in scatter-plot or log-ratio plot; and (5) downloading a full tab-delimited table of ANOVA results which can be then opened in Microsoft Excel for any kind of custom-defined analysis.
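For readers who want to see what single-factor ANOVA computes for one gene, the following sketch derives the F statistic and the error variance from log-transformed replicate values. It is a textbook calculation with invented data, not code from NIA Array Analysis; converting F to a p-value requires the F distribution, which is omitted here because it is not available in the Python standard library.

```python
import math

def one_way_anova(groups):
    """Return (F, df_between, df_within, error_variance) for one gene."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    error_variance = ss_within / df_within  # mean square within groups
    f_stat = (ss_between / df_between) / error_variance
    return f_stat, df_between, df_within, error_variance

# Invented signals for one gene: 3 control and 3 treated replications.
control = [math.log10(v) for v in (100.0, 110.0, 95.0)]
treated = [math.log10(v) for v in (400.0, 380.0, 430.0)]
f_stat, df_b, df_w, mse = one_way_anova([control, treated])
print(round(f_stat, 1), df_b, df_w)
```

With two groups of 3 replications the error variance has 4 degrees of freedom, which is why the estimate is unstable and benefits from the adjustment discussed in Section 6.1.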
Figure 7.
Scatter-plot (A, B) and log-ratio plot (C, D) for ES cells in the presence (Dox−) or absence (Dox+) of overexpression of transcription factor (control and Klf4). Data from (Nishiyama et al., 2009).
6.3. Principal component analysis (PCA)
PCA is a powerful method for determining patterns of gene expression change in large data sets, which may include multiple time points, tissues, or genotypes. The strongest pattern of gene expression is represented by the 1st principal component (PC1). A large set of genes shows high correlation with this pattern, and the correlation can be either positive or negative. The 2nd most important pattern is represented by PC2, which is orthogonal to PC1 and hence has no correlation with it. Fewer genes correlate with PC2 than with PC1. Then the 3rd principal component (PC3) can be determined, which is orthogonal to both PC1 and PC2, and so on. PCA results are often shown as a scatter-plot with principal components as axes. This plot can be viewed as a projection of the original data onto a sub-space of principal components. Thus, PCA is a method for dimensionality reduction. Because we cannot navigate in 50 dimensions when the data matrix includes 50 samples, we extract 2 or 3 principal components and project the data into this 2- or 3-dimensional space, which can be visualized. For example, analysis of gene expression changes in differentiating mouse ES cells has revealed three cell lineages (primitive ectoderm, trophoblast, and extraembryonic endoderm) as cell lineage trajectories in 3D space (Fig. 8).
Figure 8.
3D PCA plots showing the locations of mouse ES cells during differentiation based on the microarray data. Adapted from (Aiba et al., 2009).
PCA is done using the Singular Value Decomposition (SVD) method that generates eigenvectors both for rows and columns of the log-transformed data matrix (Chapman et al., 2002; Sharov et al., 2005b). The advantage of SVD method is that it can project both rows and columns of the data matrix on the same coordinates represented by principal components (biplot); thus, the user can visually explore associations between genes and tissues or time points. The NIA Array Analysis software generates 2-dimensional and 3-dimensional (based on Virtual Reality Modeling Language (VRML)) biplots. All biplots (including 3D) are interactive; each gene is hyperlinked with its annotation and histogram that shows details of expression pattern. For each principal component (PC) we identify 2 clusters of genes that are positively and negatively correlated with this PC. The degree of gene expression change within a specific PC is measured by the slope of regression of log-transformed gene expression versus the corresponding eigenvector multiplied by the range of values within the eigenvector.
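To make the idea of a biplot concrete, the sketch below extracts the first principal component of a small gene-by-sample matrix and returns coordinates for both samples (loadings) and genes (scores). It is not SVD and not the NIA Array Analysis implementation: it uses power iteration on the sample-by-sample cross-product matrix, which yields the same leading component while staying within the Python standard library. Centering each gene across samples is an assumption of this example, and the data are invented.

```python
import math

def first_principal_component(matrix, iterations=100):
    """Return (sample loadings, gene scores) for PC1.

    matrix: rows = genes, columns = samples, log-transformed values.
    """
    # Centre each gene (row) across samples; an assumption of this sketch.
    centred = []
    for row in matrix:
        mean = sum(row) / len(row)
        centred.append([v - mean for v in row])
    n_cols = len(centred[0])
    # Cross-product matrix between samples.
    cross = [[sum(r[j] * r[k] for r in centred) for k in range(n_cols)]
             for j in range(n_cols)]
    # Power iteration from a deterministic, non-uniform start vector.
    v = [1.0 / (j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        w = [sum(cross[j][k] * v[k] for k in range(n_cols)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data have no variance")
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(n_cols)) for r in centred]
    return v, scores

# Three genes over four time points: rising, falling, and flat.
data = [
    [0.0, 1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
loadings, scores = first_principal_component(data)
print([round(x, 3) for x in loadings], [round(x, 3) for x in scores])
```

The rising and falling genes receive scores of opposite sign on PC1 and the flat gene scores zero, which corresponds to the two clusters of positively and negatively correlated genes identified for each PC. The overall sign of a principal component is arbitrary.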
It is possible to save PCA results from one experiment and then use them for the analysis of another experiment as reference coordinate system. This method works best if both experiments used the same microarray platform, i.e., all probe IDs are matching. Partial matching of probe IDs is also acceptable (e.g., if one microarray is an extension of the earlier array version). Use “same array platform” checkbox if both experiments were done with the same array platform; in this case the software will not attempt to re-normalize data sets. If array platforms are different, then PCA can be based on common gene symbols, GenBank accession number, or other identifiers. After PCA is generated, click the button “Save for import”. When you analyze the second data set, then saved files will appear in the select box (PCA menu). Saved files remain available for 1 day on the NIA Array Analysis.
6.4. Functional annotation: Gene Ontology, pathways, and transcription factor binding sites
Getting information on differentially-expressed genes is only the first step of microarray analysis. The next step is to do functional annotation of these genes, which can help to interpret gene expression changes. The most common method of functional annotation is the use of the Gene Ontology (GO) database, which specifies gene function based on literature and expert opinions (http://www.geneontology.org; (Ashburner et al., 2000)). However, statistical analysis is needed to determine which GO terms are over-represented within a gene list (e.g., among genes that are downregulated in a gene-targeted embryo). This analysis can be done using a variety of software tools, including GenMAPP (www.genmapp.org (Dahlquist et al., 2002)), GSEA (http://www.broad.mit.edu/gsea; (Kim and Volsky, 2005; Subramanian et al., 2005)), GOminer (http://discover.nci.nih.gov/gominer; (Zeeberg et al., 2003)) or NIA Mouse Gene Index (lgsun.grc.nia.nih.gov/geneindex/mm8; (Sharov et al., 2005a; Sharova et al., 2007b)). A list of gene symbols generated by the NIA Array Analysis software can be uploaded into the NIA Mouse Gene Index by selecting the option “View a list of selected genes/transcripts”. Then, this list can be tested for over-representation of GO terms by clicking the “GO-annotation” button. Statistical significance is determined using two criteria: FDR and over-representation ratio with adjustable thresholds. The result can be viewed interactively in the web browser or can be downloaded as a tab-delimited text (see “Tab-delimited table” below the title). In addition, the NIA Mouse Gene Index can generate a similar table with over-represented protein domains and identify clusters of neighboring genes in the genome. Over-representation of genes that belong to a specific signaling pathway, have certain transcription factor binding sites (TFBS) in their promoters, or are clustered in the genome can be analyzed by the GSEA software (Kim and Volsky, 2005; Subramanian et al., 2005).
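The over-representation ratio and a p-value for a single GO term can be computed as sketched below. The hypergeometric test used here is a common choice for this kind of analysis, but it is an assumption of the example: the text above does not specify which test the NIA Mouse Gene Index applies, and the gene counts are invented. In practice the p-values for all GO terms would then be converted to FDR.

```python
from math import comb

def overrepresentation(n_genome, n_term, n_list, n_overlap):
    """Return (ratio, p) for one GO term.

    n_genome:  genes in the background set
    n_term:    background genes annotated with the term
    n_list:    genes in the list being tested
    n_overlap: listed genes annotated with the term
    """
    expected = n_list * n_term / n_genome
    ratio = n_overlap / expected
    # Upper-tail hypergeometric probability of n_overlap or more hits.
    upper = min(n_term, n_list)
    hits = sum(comb(n_term, k) * comb(n_genome - n_term, n_list - k)
               for k in range(n_overlap, upper + 1))
    p = hits / comb(n_genome, n_list)
    return ratio, p

# Invented example: 20 of 500 listed genes carry a term that annotates
# 200 of 20000 background genes (5 expected by chance).
ratio, p = overrepresentation(n_genome=20000, n_term=200, n_list=500, n_overlap=20)
print(ratio, p)
```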
Recent chromatin-immunoprecipitation experiments have yielded genome-wide data on the location of TFBS as well as chromatin modification patterns in the mouse genome (Loh et al., 2006; Mikkelsen et al., 2007). Combined analysis of TFBS for a specific transcription factor (TF) and microarray data after overexpressing or repressing the same TF in cells and embryos can help to identify a set of genes that are directly downstream of the TF (see the methods in (Nishiyama et al., 2009; Sharov et al., 2008)).
7. SUBMITTING THE DATA TO THE PUBLIC DATABASE
Expression profiling does not end with the analysis of the data. Most journals require that data on microarray experiments be submitted to public databases (e.g., GEO (Barrett et al., 2009) (http://www.ncbi.nlm.nih.gov/geo/) or ArrayExpress (Parkinson et al., 2009) (http://www.ebi.ac.uk/microarray-as/ae/)) in a format compliant with the Minimum Information About a Microarray Experiment (MIAME) (Brazma et al., 2001). MIAME compliance dramatically increases the future usability of the microarray data and benefits the research community, but it requires detailed information about all the experimental conditions and sample preparations. The best strategy is to collect this detailed information while the experiments are still in progress, with the data submission in mind.
One important issue is gene nomenclature. Although most journals encourage authors to use the standard gene nomenclature, authors often use gene names or gene symbols that they have coined or that they prefer. However, we strongly recommend that authors follow the standard nomenclature that is set by the International Committee on Standardized Genetic Nomenclature for Mice (http://www.informatics.jax.org/nomen/). If it is a new gene with no name assigned yet, the authors are encouraged to contact the nomenclature committee to discuss the appropriate gene symbols.
Acknowledgments
The authors thank past and current members of the Ko lab for contributing to the work described in this paper. The authors’ laboratory had a CRADA arrangement with Agilent Technologies; however, the authors have no personal financial interest in Agilent Technologies. The work was entirely supported by the Intramural Research Program of the NIH, National Institute on Aging.
Contributor Information
Alexei A. Sharov, Email: sharoval@mail.nih.gov.
Yulan Piao, Email: piaoy@mail.nih.gov.
Minoru S. H. Ko, Email: kom@mail.nih.gov.
References
- Abe K, Ko MS, MacGregor GR. A systematic molecular genetic approach to study mammalian germline development. Int J Dev Biol. 1998;42:1051–1065.
- Aiba K, Nedorezov T, Piao Y, Nishiyama A, Matoba R, Sharova LV, Sharov AA, Yamanaka S, Niwa H, Ko MS. Defining developmental potency and cell lineage trajectories by expression profiling of differentiating mouse embryonic stem cells. DNA Res. 2009;16:73–80. doi: 10.1093/dnares/dsn035.
- Aiba K, Sharov AA, Carter MG, Foroni C, Vescovi AL, Ko MS. Defining a developmental path to neural fate by global expression profiling of mouse embryonic stem cells and adult neural stem/progenitor cells. Stem Cells. 2006;24:889–895. doi: 10.1634/stemcells.2005-0332.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–29. doi: 10.1038/75556.
- Babak T, Blencowe BJ, Hughes TR. A systematic search for new mammalian noncoding RNAs indicates little conserved intergenic transcription. BMC Genomics. 2005;6:104. doi: 10.1186/1471-2164-6-104.
- Baldi P, Long AD. A Bayesian framework for the analysis of microarray expression data: regularized t -test and statistical inferences of gene changes. Bioinformatics. 2001;17:509–519. doi: 10.1093/bioinformatics/17.6.509.
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Marshall KA, et al. NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 2009;37:D885–890. doi: 10.1093/nar/gkn764.
- Benjamini Y, Hochberg Y. Controlling the false discovery rate - a practical and powerful approach to multiple testing. Journal of Royal Statistical Society, B. 1995;57:289–300.
- Bockamp E, Sprengel R, Eshkind L, Lehmann T, Braun JM, Emmrich F, Hengstler JG. Conditional transgenic mouse models: from the basics to genome-wide sets of knockouts and current studies of tissue regeneration. Regen Med. 2008;3:217–235. doi: 10.2217/17460751.3.2.217.
- Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001;29:365–371. doi: 10.1038/ng1201-365.
- Byerly S, Sundin K, Raja R, Stanchfield J, Bejjani BA, Shaffer LG. Effects of ozone exposure during microarray posthybridization washes and scanning. J Mol Diagn. 2009;11:590–597. doi: 10.2353/jmoldx.2009.090009.
- Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen C, Hunkapiller K, Jensen RV, Knight CR, Lee KY, et al. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol. 2006;24:1115–1122. doi: 10.1038/nbt1236.
- Carter MG, Hamatani T, Sharov AA, Carmack CE, Qian Y, Aiba K, Ko NT, Dudekula DB, Brzoska PM, Hwang SS, et al. In situ-synthesized novel microarray optimized for mouse stem cell and early developmental expression profiling. Genome Res. 2003;13:1011–1021. doi: 10.1101/gr.878903.
- Carter MG, Sharov AA, VanBuren V, Dudekula DB, Carmack CE, Nelson C, Ko MS. Transcript copy number estimation using a mouse whole-genome oligonucleotide microarray. Genome Biol. 2005;6:R61. doi: 10.1186/gb-2005-6-7-r61.
- Carter MG, Stagg CA, Falco G, Yoshikawa T, Bassey UC, Aiba K, Sharova LV, Shaik N, Ko MS. An in situ hybridization-based screen for heterogeneously expressed genes in mouse ES cells. Gene Expr Patterns. 2008;8:181–198. doi: 10.1016/j.gep.2007.10.009.
- Chapman S, Schenk P, Kazan K, Manners J. Using biplots to interpret gene expression patterns in plants. Bioinformatics. 2002;18:202–204. doi: 10.1093/bioinformatics/18.1.202.
- Cheadle C, Becker KG, Cho-Chung YS, Nesterova M, Watkins T, Wood W, 3rd, Prabhu V, Barnes KC. A rapid method for microarray cross platform comparisons using gene expression signatures. Mol Cell Probes. 2007;21:35–46. doi: 10.1016/j.mcp.2006.07.004.
- Chen Y, Haviernik P, Bunting KD, Yang YC. Cited2 is required for normal hematopoiesis in the murine fetal liver. Blood. 2007;110:2889–2898. doi: 10.1182/blood-2007-01-066316.
- Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G, et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods. 2008;5:613–619. doi: 10.1038/nmeth.1223.
- Cui XS, Li XY, Shen XH, Bae YJ, Kang JJ, Kim NH. Transcription profile in mouse four-cell, morula, and blastocyst: Genes implicated in compaction and blastocoel formation. Mol Reprod Dev. 2007;74:133–143. doi: 10.1002/mrd.20483.
- Dahlquist KD, Salomonis N, Vranizan K, Lawlor SC, Conklin BR. GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat Genet. 2002;31:19–20. doi: 10.1038/ng0502-19.
- Fare TL, Coffey EM, Dai H, He YD, Kessler DA, Kilian KA, Koch JE, LeProust E, Marton MJ, Meyer MR, et al. Effects of atmospheric ozone on microarray data quality. Anal Chem. 2003;75:4672–4675. doi: 10.1021/ac034241b.
- Frankenberg S, Smith L, Greenfield A, Zernicka-Goetz M. Novel gene expression patterns along the proximo-distal axis of the mouse embryo before gastrulation. BMC Dev Biol. 2007;7:8. doi: 10.1186/1471-213X-7-8.
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80. doi: 10.1186/gb-2004-5-10-r80.
- Hamatani T, Carter MG, Sharov AA, Ko MS. Dynamics of global gene expression changes during mouse preimplantation development. Dev Cell. 2004a;6:117–131. doi: 10.1016/s1534-5807(03)00373-3.
- Hamatani T, Falco G, Carter MG, Akutsu H, Stagg CA, Sharov AA, Dudekula DB, VanBuren V, Ko MS. Age-associated alteration of gene expression patterns in mouse oocytes. Hum Mol Genet. 2004b;13:2263–2278. doi: 10.1093/hmg/ddh241.
- Hipp J, Atala A. GeneChips in stem cell research. Methods Enzymol. 2006;420:162–224. doi: 10.1016/S0076-6879(06)20009-0.
- Hopkins C, Sadelova S, Fournier T, Ilsley D, Park M. Application of Agilent’s 60-mer oligonucleotide microarrays for gene expression analysis of small RNA quantities derived from Laser Capture Microdissection. Palo Alto: Agilent Technologies, Inc; 2003.
- Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ, Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer MR, et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol. 2001;19:342–347. doi: 10.1038/86730.
- Jung HJ, Shim JS, Suh YG, Kim YM, Ono M, Kwon HJ. Potent inhibition of in vivo angiogenesis and tumor growth by a novel cyclooxygenase-2 inhibitor, enoic acanthoic acid. Cancer Sci. 2007;98:1943–1948. doi: 10.1111/j.1349-7006.2007.00617.x.
- Kawasaki ES. The end of the microarray Tower of Babel: will universal standards lead the way? J Biomol Tech. 2006;17:200–206.
- Kim SY, Volsky DJ. PAGE: parametric analysis of gene set enrichment. BMC Bioinformatics. 2005;6:144. doi: 10.1186/1471-2105-6-144.
- Kimelman D. Mesoderm induction: from caps to chips. Nat Rev Genet. 2006;7:360–372. doi: 10.1038/nrg1837.
- Kitaya K, Yasuo T, Yamaguchi T, Fushiki S, Honjo H. Genes regulated by interferon-gamma in human uterine microvascular endothelial cells. Int J Mol Med. 2007;20:689–697.
- Ko MS. An ‘equalized cDNA library’ by the reassociation of short double-stranded cDNAs. Nucleic Acids Res. 1990;18:5705–5711. doi: 10.1093/nar/18.19.5705.
- Ko MS. Embryogenomics: developmental biology meets genomics. Trends Biotechnol. 2001;19:511–518. doi: 10.1016/s0167-7799(01)01806-6.
- Ko MS. Expression profiling of the mouse early embryo: reflections and perspectives. Dev Dyn. 2006;235:2437–2448. doi: 10.1002/dvdy.20859.
- Kurimoto K, Yabuta Y, Ohinata Y, Ono Y, Uno KD, Yamada RG, Ueda HR, Saitou M. An improved single-cell cDNA amplification method for efficient high-density oligonucleotide microarray analysis. Nucleic Acids Res. 2006;34:e42. doi: 10.1093/nar/gkl050.
- LaGamba D, Nawshad A, Hay ED. Microarray analysis of gene expression during epithelial-mesenchymal transformation. Dev Dyn. 2005;234:132–142. doi: 10.1002/dvdy.20489.
- Landry J, Sharov AA, Piao Y, Sharova LV, Xiao H, Southon E, Matta J, Tessarollo L, Zhang YE, Ko MS, et al. Essential role of chromatin remodeling protein Bptf in early mouse embryos and embryonic stem cells. PLoS Genet. 2008;4:e1000241. doi: 10.1371/journal.pgen.1000241.
- Lewandoski M. Conditional control of gene expression in the mouse. Nat Rev Genet. 2001;2:743–755. doi: 10.1038/35093537.
- Loh YH, Wu Q, Chew JL, Vega VB, Zhang W, Chen X, Bourque G, George J, Leong B, Liu J, et al. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nat Genet. 2006;38:431–440. doi: 10.1038/ng1760.
- Mardis ER. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet. 2008;9:387–402. doi: 10.1146/annurev.genom.9.081307.164359.
- Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008;18:1509–1517. doi: 10.1101/gr.079558.108.
- Masui S, Nakatake Y, Toyooka Y, Shimosato D, Yagi R, Takahashi K, Okochi H, Okuda A, Matoba R, Sharov AA, et al. Pluripotency governed by Sox2 via regulation of Oct3/4 expression in mouse embryonic stem cells. Nat Cell Biol. 2007;9:625–635. doi: 10.1038/ncb1589.
- Matsumoto N, Kubo A, Liu H, Akita K, Laub F, Ramirez F, Keller G, Friedman SL. Developmental regulation of yolk sac hematopoiesis by Kruppel-like factor 6. Blood. 2006;107:1357–1365. doi: 10.1182/blood-2005-05-1916.
- McCarrey JR, Hsu KC, Eddy EM, Klevecz RR, Bolen JL. Isolation of viable mouse primordial germ cells by antibody-directed flow sorting. J Exp Zool. 1987;242:107–111. doi: 10.1002/jez.1402420116.
- Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP, et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007;448:553–560. doi: 10.1038/nature06008.
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628. doi: 10.1038/nmeth.1226.
- Nagy A, Gertsenstein M, Vintersten K, Behringer RR. Manipulating the Mouse Embryo: A Laboratory Manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press; 2002.
- Nishiyama A, Xin L, Sharov AA, Thomas M, Mowrer G, Meyers E, Piao Y, Mehta S, Yee S, Nakatake Y, et al. Uncovering early response of gene regulatory networks in ESCs by systematic induction of transcription factors. Cell Stem Cell. 2009;5:420–433. doi: 10.1016/j.stem.2009.07.012.
- Niwa H, Miyazaki J, Smith AG. Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells. Nat Genet. 2000;24:372–376. doi: 10.1038/74199.
- Olena AF, Patton JG. Genomic organization of microRNAs. J Cell Physiol. 2010;222:540–545. doi: 10.1002/jcp.21993.
- Oshlack A, Wakefield MJ. Transcript length bias in RNA-seq data confounds systems biology. Biol Direct. 2009;4:14. doi: 10.1186/1745-6150-4-14.
- Pan W, Lin J, Le CT. How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol. 2002;3:research0022. doi: 10.1186/gb-2002-3-5-research0022.
- Parkinson H, Kapushesky M, Kolesnikov N, Rustici G, Shojatalab M, Abeygunawardena N, Berube H, Dylag M, Emam I, Farne A, et al. ArrayExpress update--from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res. 2009;37:D868–872. doi: 10.1093/nar/gkn889.
- Sharov AA, Dudekula DB, Ko MS. Genome-wide assembly and analysis of alternative transcripts in mouse. Genome Res. 2005a;15:748–754. doi: 10.1101/gr.3269805.
- Sharov AA, Dudekula DB, Ko MS. A web-based tool for principal component and significance analysis of microarray data. Bioinformatics. 2005b;21:2548–2549. doi: 10.1093/bioinformatics/bti343.
- Sharov AA, Masui S, Sharova LV, Piao Y, Aiba K, Matoba R, Xin L, Niwa H, Ko MS. Identification of Pou5f1, Sox2, and Nanog downstream target genes with statistical confidence by applying a novel algorithm to time course microarray and genome-wide chromatin immunoprecipitation data. BMC Genomics. 2008;9:269. doi: 10.1186/1471-2164-9-269.
- Sharova LV, Sharov AA, Nedorezov T, Piao Y, Shaik N, Ko MS. Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic stem cells. DNA Res. 2009;16:45–58. doi: 10.1093/dnares/dsn030.
- Sharova LV, Sharov AA, Piao Y, Shaik N, Sullivan T, Stewart CL, Hogan BL, Ko MS. Global gene expression profiling reveals similarities and differences among mouse pluripotent stem cells of different origins and strains. Dev Biol. 2007a;307:446–459. doi: 10.1016/j.ydbio.2007.05.004.
- Sharova LV, Sharov AA, Piao Y, Shaik N, Sullivan T, Stewart CL, Hogan BL, Ko MS. Global gene expression profiling reveals similarities and differences among mouse pluripotent stem cells of different origins and strains. Dev Biol. 2007b;307:446–459. doi: 10.1016/j.ydbio.2007.05.004.
- Sherwood RI, Jitianu C, Cleaver O, Shaywitz DA, Lamenzo JO, Chen AE, Golub TR, Melton DA. Prospective isolation and global gene expression analysis of definitive and visceral endoderm. Dev Biol. 2007;304:541–555. doi: 10.1016/j.ydbio.2007.01.011.
- Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, et al. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006;24:1151–1161. doi: 10.1038/nbt1239.
- Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A, et al. Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci U S A. 2002;99:4465–4470. doi: 10.1073/pnas.012025199.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
- Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D, et al. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science. 2008;321:956–960. doi: 10.1126/science.1160342.
- Tanaka TS, Jaradat SA, Lim MK, Kargul GJ, Wang X, Grahovac MJ, Pantano S, Sano Y, Piao Y, Nagaraja R, et al. Genome-wide expression profiling of mid-gestation placenta and embryo using a 15,000 mouse developmental cDNA microarray. Proc Natl Acad Sci U S A. 2000;97:9127–9132. doi: 10.1073/pnas.97.16.9127.
- Thomson JM, Parker JS, Hammond SM. Microarray analysis of miRNA gene expression. Methods Enzymol. 2007;427:107–122. doi: 10.1016/S0076-6879(07)27006-5.
- Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001;98:5116–5121. doi: 10.1073/pnas.091062498.
- van Loo PF, Mahtab EA, Wisse LJ, Hou J, Grosveld F, Suske G, Philipsen S, Gittenberger-de Groot AC. Transcription factor Sp3 knockout mice display serious cardiac malformations. Mol Cell Biol. 2007;27:8571–8582. doi: 10.1128/MCB.01350-07.
- Wang L, Mear JP, Kuan CY, Colbert MC. Retinoic acid induces CDK inhibitors and growth arrest specific (Gas) genes in neural crest cells. Dev Growth Differ. 2005;47:119–130. doi: 10.1111/j.1440-169X.2005.00788.x.
- Wang QT, Piotrowska K, Ciemerych MA, Milenkovic L, Scott MP, Davis RW, Zernicka-Goetz M. A genome-wide study of gene activity reveals developmental signaling pathways in the preimplantation mouse embryo. Dev Cell. 2004;6:133–144. doi: 10.1016/s1534-5807(03)00404-0.
- Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63. doi: 10.1038/nrg2484.
- Williams SS, Mear JP, Liang HC, Potter SS, Aronow BJ, Colbert MC. Large-scale reprogramming of cranial neural crest gene expression by retinoic acid exposure. Physiol Genomics. 2004;19:184–197. doi: 10.1152/physiolgenomics.00136.2004.
- Wilusz JE, Sunwoo H, Spector DL. Long noncoding RNAs: functional surprises from the RNA world. Genes Dev. 2009;23:1494–1504. doi: 10.1101/gad.1800909.
- Yoshikawa T, Piao Y, Zhong J, Matoba R, Carter MG, Wang Y, Goldberg I, Ko MS. High-throughput screen for genes predominantly expressed in the ICM of mouse blastocysts by whole mount in situ hybridization. Gene Expr Patterns. 2006;6:213–224. doi: 10.1016/j.modgep.2005.06.003.
- Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, Narasimhan S, Kane DW, Reinhold WC, Lababidi S, et al. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol. 2003;4:R28. doi: 10.1186/gb-2003-4-4-r28.
- Zeng F, Baldwin DA, Schultz RM. Transcript profiling during preimplantation mouse development. Dev Biol. 2004;272:483–496. doi: 10.1016/j.ydbio.2004.05.018.
- Zhu H, Cabrera RM, Wlodarczyk BJ, Bozinov D, Wang D, Schwartz RJ, Finnell RH. Differentially expressed genes in embryonic cardiac tissues of mice lacking Folr1 gene activity. BMC Dev Biol. 2007;7:128. doi: 10.1186/1471-213X-7-128.








