Genome Res. 2002 Oct;12(10):1517–1522. doi: 10.1101/gr.418402

Identification and Confirmation of a Module of Coexpressed Genes

H Garrett R Thompson 3, Joseph W Harris 3, Barbara J Wold 1, Stephen R Quake 2, James P Brody 3,4
PMCID: PMC187523  PMID: 12368243

Abstract

We synthesize a large gene expression data set using dbEST and UniGene. We use guilt-by-association (GBA) to analyze this data set and identify coexpressed genes. One module, or group of genes, was found to be coexpressed mainly in tissue extracted from breast and ovarian cancers, but also found in tissue from lung cancers, brain cancers, and bone marrow. This module contains at least six members that are believed to be involved in either transcriptional regulation (PDEF, H2AFO, NUCKS) or the ubiquitin proteasome pathway (PSMD7, SQSTM1, FLJ10111). We confirm these observations of coexpression by real-time RT–PCR analysis of mRNA extracted from four model breast epithelial cell lines.


Molecular studies of cellular functions have led to broad knowledge of cellular processes. Most cellular processes are the result of molecules interacting, rather than due to the activity of individual molecules. The study of functional groups of genes has been termed modular cell biology (Hartwell et al. 1999). We are interested in identifying functional modules.

Technical constraints make the study of individual genes much easier than identifying interacting molecules. One approach to identify modules is to examine genome scale expression data.

Genes coexpressed in many different tissues, under both normal and diseased conditions, and at different times during development, are candidates for forming functional modules.

Different types of experimental gene expression data sets are available for identifying modules. DNA microarray experiments measure mRNA expression levels for a known set of genes by way of hybridization. A fluorescent tag is put on an unknown single-stranded DNA molecule. A set of single-stranded DNAs of known sequence is immobilized onto a surface at known locations. The unknown fluorescently tagged sample is allowed to hybridize to its complementary immobilized strand. The surface is scanned to create a fluorescent image. The intensity of the fluorescence is a measure of the concentration of DNA in the sample.

Substantial DNA microarray data sets exist, but are not easy to compare between different laboratories. These measurements are made relative to a control and the numbers are reported as a fold difference relative to the control. The specific choice for a control varies widely between different laboratories. Most troubling is the lack of any reported error in these measurements. Hence, there is little systematic information available on how certain the experimenter is of the reported value.

Similar expression data measurements can be made by DNA sequencing. Messenger RNA molecules isolated from a tissue are reverse transcribed into cDNA and cloned into Escherichia coli vectors to generate a library. Random clones are sampled from the library and a few hundred base pairs are sequenced from each. These are known as expressed sequence tags (ESTs). The sequence read from each clone is generally sufficient to identify the gene when cross-referenced to a consensus sequence data set.

The UniGene data set (Schuler et al. 1996) groups the large number of publicly available DNA sequence fragments on the basis of sequence overlaps into unique genes. In addition to complete sequences of some well-known genes, there are thousands of uncharacterized EST fragments that are found as a result of the clustering process. These uncharacterized EST fragments are thought to represent previously uncharacterized genes.

Recently, concerted efforts have been formed to construct a diverse set of cDNA libraries derived from various tissues in normal and diseased states. These libraries are heavily influenced by the National Cancer Institute's Cancer Genome Anatomy Project (CGAP), whose stated goals are to characterize normal, precancerous, and malignant cells. These libraries, when combined with the UniGene collection, provide a set of gene expression data that can be analyzed to identify groups of coexpressed genes.

In general, EST sequencing is more accurate but substantially more expensive than DNA microarrays. DNA microarrays suffer from cross-hybridization, uncertain linearity, and numerous other technical problems, but they are almost always more sensitive and cheaper to use. (The sensitivity of EST sequencing is limited by the depth of sampling into the library, and the costs increase linearly with the depth of sequencing.) EST sequencing, however, provides data that are easily comparable across different experiments, because it yields absolute numbers that represent a sampling of the mRNAs present in a tissue sample.
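The dependence of EST sensitivity on sampling depth can be illustrated with a small calculation. The sketch below assumes (this model is not stated in the text) that each sequenced clone is an independent draw from the library and that a transcript makes up a fixed fraction of the clones; the function name and the example numbers are illustrative only.

```python
def detection_probability(abundance: float, clones_sequenced: int) -> float:
    """Chance that at least one sequenced clone derives from a transcript.

    abundance        -- assumed fraction of library clones from the transcript
    clones_sequenced -- number of clones sampled from the library
    """
    if not 0.0 <= abundance <= 1.0:
        raise ValueError("abundance must be a fraction between 0 and 1")
    if clones_sequenced < 0:
        raise ValueError("clones_sequenced must be non-negative")
    # Probability that every sampled clone misses the transcript,
    # subtracted from one.
    return 1.0 - (1.0 - abundance) ** clones_sequenced


if __name__ == "__main__":
    # A transcript at 1 in 1000 clones, at three sampling depths.
    for depth in (100, 1000, 10000):
        print(depth, round(detection_probability(0.001, depth), 3))
```

Under this simple model, a rarely expressed transcript is likely to be missed entirely in a shallowly sequenced library, which is the sense in which depth limits sensitivity.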

The specific purpose of this analysis is to identify functional modules from gene expression data by identifying groups of genes that are coexpressed. Our hypothesis is that coexpression has functional significance. Our approach of analyzing gene expression across all tissues (rather than just comparing pairs of diseased and normal tissue) offers two advantages. First, because only genes similarly expressed across all tissues will be grouped together, the identification of both tissue-specific and tissue-generic modules is possible. Second, it may identify modules of genes that have a specific and normal function in, for instance, the development of the fetal brain, but that also play a pathologic role in, for example, some forms of cancer because of mutations that lead to misregulation.

Although dbEST (Benson et al. 2002) contains sequence data from many different laboratories, the vast majority of sequence analyzed in this study is derived either from the Washington University/Merck EST project (Hillier et al. 1996) or the National Cancer Institute's Cancer Genome Anatomy Project (CGAP).

The simple listing of ESTs, as is provided in dbEST, is insufficient for large-scale gene expression analysis. A clustering (or assembly) of the ESTs into unique genes is also needed. At least two databases have been established to do this. The first, UniGene (Wheeler et al. 2002), is regularly updated and available freely online. The second, TIGR Human Gene Index (Quakenbush et al. 2000), is available with a restrictive license. The data presented in this study is based on the analysis of gene expression data compiled in the UniGene database.

Analysis algorithms for gene expression data fall into two classes. The first identifies differentially expressed genes between two experiments (cells under different conditions, tissues, or groups of tissues). This is most simply done with a Fisher's Exact test, but can also use more rigorous statistical methods (Audic and Claverie 1997; Stekel et al. 2000). The second class of algorithms seeks to identify groups of genes with similar patterns of expression across many (hundreds) different experiments. We need an algorithm from this second class.
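As an illustration of the first class of algorithms, the sketch below applies a two-sided Fisher's exact test to the EST counts of one gene in two libraries. It is a minimal standard-library version written for this example, not code from the study; the function name, argument names, and the counts in the usage lines are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Two-sided Fisher's exact P-value for one gene in two libraries.

    x_a of the n_a ESTs read from library A, and x_b of the n_b ESTs read
    from library B, were assigned to the gene of interest.
    """
    hits = x_a + x_b
    denom = comb(n_a + n_b, hits)

    def prob(k: int) -> float:
        # Hypergeometric probability that k of the hits fall in library A.
        return comb(n_a, k) * comb(n_b, hits - k) / denom

    lowest = max(0, hits - n_b)
    highest = min(n_a, hits)
    p_observed = prob(x_a)
    # Sum every split of the hits that is no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    total = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: 10 of 1000 ESTs in one library, 0 of 1000 in the other.
    print(fisher_exact_two_sided(10, 1000, 0, 1000))
```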

Many different approaches have been used to classify and organize gene expression data. Most algorithms are variations on Eisen's approach (Eisen et al. 1998) of organizing the data on the basis of similarity of gene expression. A different approach, guilt-by-association (GBA), was described by Walker et al. (1999) and used to identify novel genes with expression patterns similar to those of known disease-related genes in a private (Incyte's LifeSeq) gene expression data set generated from ESTs. In this work, we used the GBA algorithm to identify genes with similar expression patterns.

In this study, we first constructed a large set of human expression data from dbEST and UniGene. The data set was found to be reliable on the basis of its identification of ubiquitously expressed genes and known coexpressed genes. Then, we analyzed this data set to identify modules of coexpressed genes. The analysis identified potential functional modules. One was identified in the literature as a module expressed during pregnancy (Thompson et al. 1990), further confirming the reliability of the data set. We also identified a set of coexpressed genes that may form a novel functional module. The members of this functional module are not well studied in the literature. We examined gene expression in several cell lines to confirm that these genes are coexpressed.

RESULTS

After constructing the gene expression data set, we first examined ubiquitously expressed genes as a check on the validity of the data set. Genes that showed the widest expression (found in the greatest number of libraries) are shown in Table 1. No genes appear in more than one-third of the libraries.

Table 1.

The Ten Most Ubiquitously Expressed Genes in the Dataset

Total Gene Description



422 EEF1A1 eukaryotic translation elongation factor 1 alpha 1
422 ACTB actin, beta
371 ACTG1 actin, gamma 1
356 GAPD glyceraldehyde-3-phosphate dehydrogenase
354 RPLP0 ribosomal protein, large, P0
353 EEF1G eukaryotic translation elongation factor 1 gamma
342 TPT1 tumor protein, translationally-controlled 1
334 RPL13A ribosomal protein L13a
330 HDAC3 histone deacetylase 3
326 RPS4X ribosomal protein S4, X-linked

There were 1573 libraries examined and the above ten appeared in the greatest number of libraries. Total indicates the number of libraries (of 1573) in which that particular gene was found to be expressed. 

Our analysis for coexpressed genes revealed both known and unknown relationships. We highlight one of the known relationships as anecdotal evidence for the effectiveness of this approach. Table 2 shows the genes with expression patterns most similar to that of pregnancy specific β-1-glycoprotein 1 (PSG1). The pregnancy-specific glycoproteins are a group of proteins that are found in large amounts in placenta (Thompson et al. 1990). They are all located on chromosome 19 and share common transcriptional regulatory elements (Thompson et al. 1990).

Table 2.

A Module of Genes That Is Expressed During Pregnancy

Coinc Total Gene Description




 9 16 PSG4 pregnancy specific β-1-glycoprotein 4
10 65 CSH1 chorionic somatomammotropin hormone 1 (placental lactogen)
11 23 PSG9 pregnancy specific β-1-glycoprotein 9
 7 8 PSG3 pregnancy specific β-1-glycoprotein 3
 9 12 PSG6 pregnancy specific β-1-glycoprotein 6
 7 13 PSG2 pregnancy specific β-1-glycoprotein 2
 7 22 PAPPA pregnancy-associated plasma protein A
 7 15 CSH2 chorionic somatomammotropin hormone 2
10 17 CYP19 cytochrome P450, subfamily XIX (aromatization of androgens)
10 30 ADAM12 a disintegrin and metalloproteinase domain 12 (meltrin alpha)
 8 15 PSG5 pregnancy specific β-1-glycoprotein 5
11 46 TFPI2 tissue factor pathway inhibitor 2
 5 5 PSG7 pregnancy specific β-1-glycoprotein 7
11 356 GAPD glyceraldehyde-3-phosphate dehydrogenase

The genes are listed relative to PSG1 (pregnancy specific β-1-glycoprotein 1), which is found in 13 different libraries. The table lists (Coinc) the number of libraries in which both PSG1 and the listed gene are found and (Total) the total number of libraries in which the listed gene is found. For example, PSG4 is found in a total of 16 libraries and PSG1 is present in a total of 13 libraries. In 9 of the 13 libraries in which PSG1 is found, PSG4 is also found. GAPD is shown for reference.

A postulated functional module is shown in Table 3. These five genes are all shown with their coexpression relative to PDEF, a recently described transcription factor (Oettgen et al. 2000). To better understand the role of this module, we examined the tissues in which it was expressed. We found the module to be expressed in some breast (Table 4) and ovary (Table 5) tissues. The module is also expressed in several other brain, lung cancer, and bone marrow tissues (Table 6).

Table 3.

A Module (the PDEF Module) of Genes That Is Expressed in Breast and Ovary Cancers

Coinc Total Gene Description




 9 46 H2AFO H2A histone family, member O
22 105 PSMD7 proteasome (prosome, macropain) 26S subunit, non-ATPase, 7 (Mov34 homolog)
16 104 NUCKS similar to rat nuclear ubiquitous casein kinase 2
14 51 FLJ10111 hypothetical protein FLJ10111
21 203 SQSTM1 sequestosome 1
10 356 GAPD glyceraldehyde-3-phosphate dehydrogenase

The genes are listed relative to PDEF (prostate derived Ets factor, a transcription factor), which is found in 27 different libraries. The table lists (Coinc) the number of libraries in which both PDEF and the listed gene are found and (Total) the total number of libraries in which the listed gene is found. GAPD is shown for reference.

Table 4.

The Expression of the PDEF Module in Libraries Derived from Breast Tissues

ID Description


517 invasive ductal breast tumors (pooled bulk)
557 ductal tumor (bulk)
590 infiltrating ductal carcinoma (microdissected)
634 normal (bulk)
730 • carcinoma: in situ (microdissected)
759 • carcinoma: invasive (microdissected)
766 • adenocarcinoma (microdissected)
768 • carcinoma: lobular (microdissected)
770 • adenocarcinoma (microdissected)
931 high grade neoplasia (bulk)
726 • normal epithelium (microdissected)
583 normal ductal tissue (microdissected)

The bullets indicate in which libraries the PDEF module is present. ID is the UniGene library identification number. 

Table 5.

The Expression of the PDEF Module in Libraries Derived from Ovary Tissues

ID Description


186 tumor/normal (bulk)
389 normal (bulk)
390 71 yrs old normal (bulk)
514 invasive tumor serous, papillary, adenocarcinoma (micro-bulk)
564 serous papillary adenocarcinoma (bulk)
652 serous adenocarcinoma (bulk)
675 • serous papillary carcinoma (microdissected)
706 • borderline neoplasia (microdissected)
708 • borderline preneoplasia (microdissected)
709 • invasive carcinoma (microdissected)
710 • invasive carcinoma (microdissected)
731 • borderline preneoplasia (microdissected)
765 serous papillary carcinoma (microdissected)
717 serous papillary, clear cell, spindle cell, carcinoma (bulk)

The labels are identical to those in Table 4.

Table 6.

The PDEF Module Is Also Expressed in Libraries Derived from Other Tissues

UniGene ID Description


764 • adult bone marrow normal stem cells 34+/38+ (flow sorted)
767 • adult brain oligodendroma (bulk)
771 • adult lung carcinoma: broncholoalveolar (microdissected)
774 • adult lung adenocarcinoma: invasive (microdissected)
812 • adult bone marrow (lymphoid tissue) normal stem cell (bulk)
819 • adult brain oligodendroglioma (bulk)
857 • fetus brain normal (bulk)

The labels are identical to those in Table 4.

Many other significant relationships were found between pairs of genes. Significance of a relationship was defined, following Walker et al. (1999), by a P-value, indicating the probability that the coexpression was due to chance. Over 29,000 gene pairs showed a P-value of <10^−6. The distribution of these P-values is shown in Figure 1 along with the approximate range of P-values for the functional modules described in this work. There is not necessarily a clear boundary around what constitutes a module. The structure of the data is more like a complex web than a group of distinct modules, and our definition of which genes belong to the PDEF module is probably not complete.

Figure 1.


The cumulative distribution of pairwise P-values. For example, ∼1000 gene pairs showed correlations with a P-value <10^−30 and ∼100 showed correlations with a P-value <10^−60. The P-values for each gene pair in the modules described in this work fall within the indicated regions. Many other significant coexpressions were observed, but not confirmed.
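A cumulative distribution of this kind can be tabulated by counting, for each threshold, the gene pairs whose P-value falls below it. The sketch below shows that bookkeeping only; the function name and the toy P-values are invented for illustration and are not data from this study.

```python
def cumulative_counts(p_values, exponents):
    """Count gene pairs with a P-value below 10**-e for each exponent e."""
    return {
        e: sum(1 for p in p_values if p < 10.0 ** -e)
        for e in exponents
    }


if __name__ == "__main__":
    # Toy P-values, made up for this example.
    toy = [0.5, 1e-7, 1e-31, 1e-61]
    print(cumulative_counts(toy, [6, 30, 60]))
```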

To confirm the observation of coexpression, we performed quantitative reverse transcriptase real-time PCR on four different cell lines derived from mammary epithelial tissue. We measured the quantity of mRNA from PDEF and the five other genes shown in Table 3, along with a control, GAPD. Results are shown in Figure 2.

Figure 2.


Real-time PCR data quantifying the levels of mRNA transcript for each indicated gene in four different cell lines.

DISCUSSION

Examination of the most ubiquitously expressed genes (Table 1) provides interesting insight into the typical population of cellular mRNAs. As expected, most of these mRNAs code for well-known ubiquitously expressed proteins (components of the ribosome, structural proteins, and enzymes). However, one protein labeled tumor protein, translationally controlled 1 or TPT1, is prominent due to its obscurity in the literature. Although its name implies association with cancer, it is better described as a histamine releasing factor (HRF) and has been studied as playing a role in the allergic response of mast cells that results in the exocytosis of histamine. However, its ubiquitous nature perhaps indicates it plays a much larger role in cellular signaling. In fact, the recent determination of the structure of this protein (Thaw et al. 2001) revealed significant similarity to Mss4, a small GTPase accessory protein involved in amplification of extracellular signals. Our finding that TPT1 is ubiquitously expressed is supported by other works. One study used SAGE to exhaustively enumerate the populations of mRNAs within colorectal cancer cell lines (Velculescu et al. 1999) and found TPT1 to be among the most abundant mRNAs in those cell lines. A second study found a yeast homolog to TPT1 to be one of the top 20 most abundant proteins by two-dimensional gel analysis (Norbeck and Blomberg 1997).

The reliability of the data set was further confirmed by examining genes coexpressed with PSG1. The human pregnancy-specific glycoproteins (PSGs) form a group of proteins whose genes are closely clustered on chromosome 19. This group of genes, along with a few others (CSH1, CSH2), was found in libraries derived from placenta and fetal tissues. Interestingly, one library derived from aorta (UniGene Library ID 182) also contained many of these genes. All of these genes are known to be specifically expressed in placental tissues.

We also identified members of a novel functional module. Six members of this postulated functional module are reported as follows: PDEF, SQSTM1, FLJ10111, H2AFO, PSMD7, and NUCKS.

PDEF was first identified as a prostate epithelium-specific Ets transcription factor that interacts with the androgen receptor to activate the promoter of the well-known prostate cancer marker gene, PSA (Oettgen et al. 2000). Simultaneously, it was identified and cloned in both human (Nozawa et al. 2000) and mouse (Yamada et al. 2000), in which it was identified as a positive regulator of maspin, a protease inhibitor that is down-regulated in advanced breast cancers (Zou et al. 1994). Later investigators identified PDEF as a candidate breast tumor marker (Ghadersohi and Sood 2001) and showed that PDEF mRNA was highly overexpressed in 14 of 20 primary human breast tumors examined. They also found that one patient with metastatic breast cancer had PDEF mRNA 192-fold higher in blood as compared with normal individuals (Ghadersohi and Sood 2001). Although PDEF mRNA has been identified as being highly up-regulated in breast cancer cell lines (Fig. 2) and breast tumors (Ghadersohi and Sood 2001), only two targets that PDEF regulates have been identified: the PSA promoter (Oettgen et al. 2000) and the maspin promoter (Yamada et al. 2000). Little is known about PDEF's function, either in breast cancer or in normal tissue.

The class of Ets transcription factors is characterized by the ETS domain, which binds monomerically to an 8-bp long DNA element (Dittmer and Nordheim 1998). Most Ets transcription factors bind to the GGAA core sequence of DNA, but PDEF prefers a GGAT core (Oettgen et al. 2000).

The product of SQSTM1, p62, is a widely expressed cytoplasmic protein that is known to play a role in cellular signaling by interacting with the SH2 domain of p56(lck) (Park et al. 1995). P62 was also identified as a ubiquitin-binding protein through a yeast two-hybrid screen (Vadlamudi et al. 1996). More recently, p62 has been found to form cytoplasmic structures called intrahyaline bodies that serve as a storage place for multiubiquitinated proteins (Shin 1998). The ability of p62 to bind noncovalently to ubiquitin and several signaling proteins suggests that p62 may play a regulatory role connected to the ubiquitin–proteasome system.

The promoter of the SQSTM1 gene has been characterized previously (Vadlamudi and Shin 1998). This information, along with the DNA-binding studies of Oettgen et al. (2000), allowed us to identify two potential PDEF-binding sites within the SQSTM1 promoter and to postulate that PDEF regulates the SQSTM1 promoter. We have performed transfection experiments showing that PDEF activates the SQSTM1 promoter (Thompson et al. 2002) threefold over basal levels.

The hypothetical protein FLJ10111 was first identified and named by the NEDO Full-length cDNA Sequencing Project in Japan. Although the message for the gene has been found (Yawata et al. 2001), no gene product has yet been characterized. Several lines of evidence point to the gene product being involved in the ubiquitin–proteasome pathway.

FLJ10111 is located in a tightly packed region of chromosome 14 that contains other proteasome-related genes (Yawata et al. 2001). This 35-kb region contains six different genes, two of which (PA28α and PA28β) code for components of the proteasome activator (PA28), and transcription of genes in this region is induced by γ-interferon, suggesting a mode of regulation. The PA28 proteasome activator is thought to bind to the 20 S proteasome and enhance the generation of major histocompatibility complex (MHC) class I-binding peptides (Ma et al. 1992; Yawata et al. 2001). The products of the other four genes in this 35-kb region are not well characterized. The predicted protein encoded by FLJ10111 has two RING finger domains, which are characteristic of E3 ubiquitin ligases (Yawata et al. 2001). The RING finger motif is found in many (>200) proteins in different eukaryotes, but not in any prokaryote proteins (Freemont 2000).

Histones are nuclear proteins responsible for organizing genomic DNA into the tightly packed chromosomes within eukaryotic cells. The histones play a role in eukaryotic transcriptional regulation and post-translational modifications can change their DNA-binding properties. The transcript for H2AFO, a member of the H2A family, was found among this group of genes.

The protein encoded by the PSMD7 gene, S12, is a regulatory subunit of the proteasome. This protein is homologous to the mouse Mov34 protein. Mutations in Mov34 are lethal in the embryonic stage of development. The protein is evolutionarily conserved, and homologs can be found in yeast and Drosophila as well as the mouse.

NUCKS is similar (by sequence analysis) to the rat nuclear ubiquitous casein kinase 2, which was first isolated from both HeLa cells and rat brain and subsequently found to be ubiquitously expressed (Ostvold et al. 2001). The protein localizes to the nucleus and binds single- and double-stranded DNA. It is phosphorylated at multiple sites by several kinases, including protein kinase C and casein kinase 2 (Ostvold et al. 1985, 1992; Walaas et al. 1989).

This approach to identifying functionally related modules of genes has wide applicability. The libraries that comprise the data set we examined were largely populated by those sequenced under the support of the NCI's CGAP program, which is focused on cancer. Hence, it is not surprising that we identified a cancer-related module. We expect that concerted expression profiling of other diseased tissues will uncover functional modules active in those diseases. This method, however, requires a discovery-based approach to studying molecular interactions. We queried the data set for some well-known genes and found no information on them. This may be because these genes have rarely (rather than abundantly) expressed mRNAs and because few sequences were read from each library.

The hypothetical functional module that we identified in this study has members in two distinct classes, transcriptional control (H2AFO, PDEF, and NUCKS) and ubiquitin–proteasome pathway (PSMD7, SQSTM1, and FLJ10111). The obvious hypothesis that this leads to is that members of the first class control the transcription of the second class and that this occurs in some diseased tissues. We have shown that PDEF activates the SQSTM1 promoter and that the product of SQSTM1, p62, is overexpressed in breast cancer samples relative to normal breast tissue (Thompson et al. 2002).

In summary, we use a method to identify a hypothetical functional module of coexpressed genes in publicly available large-scale gene expression data. We confirmed that these genes are coexpressed by quantitative measurements of mRNA in cell lines derived from breast tissue. Published information on the function of the products of these genes leads to a hypothesis about how these are functionally related. This work focuses on a small subset of the available data. The approach has wide applicability and could lead to the identification of many more functional modules.

METHODS

Computation

We scanned the Homo sapiens UniGene data set (build #146, available at ftp://ftp.ncbi.nlm.nih.gov/repository/UniGene/) and compiled a list of libraries in which each UniGene appears. Some appear multiple times in a single library. We screened out all libraries that had fewer than 100 UniGenes associated with them. The final data set that we analyzed consisted of expression information for 96,574 genes across 1573 libraries.
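The compilation and filtering step described above can be sketched as follows. The input format here, an iterable of (gene, library) pairs, is a hypothetical simplification and not the UniGene file format, and the sketch assumes that repeated appearances of a gene in one library are collapsed to simple presence; both are assumptions made for illustration.

```python
from collections import defaultdict


def build_presence_table(records, min_genes_per_library=100):
    """Map each gene to the set of libraries in which it appears.

    records -- iterable of (gene, library) pairs; repeats are allowed
    Libraries with fewer than min_genes_per_library distinct genes are dropped.
    """
    genes_by_library = defaultdict(set)
    for gene, library in records:
        # Sets collapse multiple ESTs of one gene in one library.
        genes_by_library[library].add(gene)

    libraries_by_gene = defaultdict(set)
    for library, genes in genes_by_library.items():
        if len(genes) < min_genes_per_library:
            continue  # library too sparsely sampled; screen it out
        for gene in genes:
            libraries_by_gene[gene].add(library)
    return dict(libraries_by_gene)


if __name__ == "__main__":
    # Toy records with a threshold of 2 so the filter is visible.
    toy = [("A", 1), ("B", 1), ("A", 1), ("A", 2)]
    print(build_presence_table(toy, min_genes_per_library=2))
```

The number of libraries per gene, as reported in the Total columns of the tables, is then the size of each gene's set.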

We analyzed gene expression data by using guilt-by-association (GBA), a combinatoric measure of similarity between the expression patterns of two genes (Walker et al. 1999). We calculated the pairwise expression similarity between all 96,574 genes (4.7 billion comparisons).
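To make the combinatoric idea concrete, the sketch below scores a gene pair by the chance of seeing at least the observed number of shared libraries if the two genes were distributed across libraries independently. It assumes a one-sided hypergeometric tail, which may differ in detail from the statistic defined by Walker et al. (1999); the function names are illustrative. The library counts in the usage lines are those given for PSG1 and PSG4 in Table 2.

```python
from math import comb


def pair_count(n_genes: int) -> int:
    """Number of distinct unordered gene pairs among n_genes genes."""
    return n_genes * (n_genes - 1) // 2


def coexpression_p_value(n_libraries: int, libs_a: int, libs_b: int, coinc: int) -> float:
    """Chance of at least `coinc` shared libraries under independence.

    n_libraries -- total number of libraries in the data set
    libs_a      -- number of libraries containing gene A
    libs_b      -- number of libraries containing gene B
    coinc       -- number of libraries containing both genes
    """
    denom = comb(n_libraries, libs_b)
    upper = min(libs_a, libs_b)
    # Sum the hypergeometric probabilities of k shared libraries, k >= coinc.
    tail = sum(
        comb(libs_a, k) * comb(n_libraries - libs_a, libs_b - k)
        for k in range(coinc, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    print(pair_count(96574))  # 4663220451, i.e. about 4.7 billion
    # PSG1 (13 libraries) and PSG4 (16 libraries) share 9 of 1573 libraries.
    print(coexpression_p_value(1573, 13, 16, 9))
```

Exact integer arithmetic is used for the binomial coefficients, so very small P-values are not lost to rounding before the final division.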

Cell Culture

Four mammary epithelial cell lines were obtained from ATCC and examined for mRNA levels. MCF-7 cells were derived originally from an adenocarcinoma. These cells are estrogen-receptor positive. HCC1428 cells were first derived from a metastatic adenocarcinoma. The patient had a family history of breast cancer. The cells are Her2/neu negative and p53 negative. MCF-10-2F and MCF-10-2A cells are both nontumorigenic cell lines (they do not form tumors in immunosuppressed mice) that were derived from a 36-year-old female with fibrocystic breast disease. The MCF-10-2F cells are derived from floating cells, whereas the MCF-10-2A cells are derived from adherent cells.

Mammary epithelial cells were cultured in monolayers in 10-cm^2 culture dishes in the recommended medium supplemented with chelated or unchelated horse or fetal bovine serum as applicable, until <80% confluent. Ten dishes of cultured cells, or ∼5–10 × 10^7 cells, were solubilized in TRI Reagent per the manufacturer's instructions. RNA extraction, precipitation, and solubilization were performed as described by the manufacturer. A total of 0.25 mg RNA was aliquoted for recovery of poly(A) RNA from each cell line cultured using a spin-column format (QIAGEN) per the manufacturer's instructions. Poly(A) RNA was quantitated by fluorescence using the RiboGreen RNA Quantitation Reagent (Molecular Probes) in 96-well plate format, assayed in triplicate, using the Fluoroskan Accent FL combination luminometer/fluorometer (LabSystems). A total of 50 ng mRNA was used for RT–PCR using the Thermoscript RT–PCR System (GIBCO-BRL/Invitrogen) to generate cDNA per the manufacturer's instructions.

Real-Time PCR

Prior to assaying concentrations of mRNA species from samples by real-time quantitative PCR, several preparatory steps were required. First, separate standard curves were generated for each gene to be analyzed. Genes to be analyzed were amplified from an appropriate cDNA source (e.g., prostate cDNA from Clontech) by PCR using primers designed from gene sequences available at www.ncbi.nlm.nih.gov. Amplified sequences were gel purified and quantitated by spectroscopy in triplicate. Tenfold serial dilutions of each standard over five orders of magnitude were generated and assayed in duplicate with each sample.

Primers used in the real-time PCR reaction were designed from sequences internal to those used to amplify the cDNA for the standard curve, with care taken to encompass all mRNA variants reported, in case multiple cDNAs were generated by RT–PCR. In addition, real-time primers were designed to come from two different exons to eliminate the possibility of genomic DNA and immature RNA contamination. Results from real-time PCR are collected as fluorescence over cycle number. By establishing a fluorescence threshold, a linear graph is generated of the standard curve using log (concentration) plotted as a function of cycle threshold number. The equation of the line is then used to determine the starting concentration of mRNA for each unknown. For each assay, unknowns are assayed in triplicate with a new standard curve. Samples are normalized to total mRNA, but a housekeeping gene, GAPD, was also quantitated.
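The standard-curve calculation described above can be sketched as an ordinary least-squares fit of log10(concentration) against cycle threshold, followed by interpolation of the unknowns. The function names and the dilution values in the usage lines are invented for illustration; the fitting details used in the study are not specified in the text.

```python
from math import log10


def fit_standard_curve(concentrations, cycle_thresholds):
    """Least-squares line: log10(concentration) = slope * Ct + intercept."""
    xs = list(cycle_thresholds)
    ys = [log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(ct, slope, intercept):
    """Starting concentration of an unknown from its cycle threshold."""
    return 10 ** (slope * ct + intercept)


if __name__ == "__main__":
    # Made-up tenfold dilution series spanning five orders of magnitude.
    standards = [1e5, 1e4, 1e3, 1e2, 1e1]
    cts = [10.0, 13.0, 16.0, 19.0, 22.0]
    slope, intercept = fit_standard_curve(standards, cts)
    # Average the triplicate Ct values of an unknown before interpolating.
    unknown_cts = [15.9, 16.0, 16.1]
    mean_ct = sum(unknown_cts) / len(unknown_cts)
    print(estimate_concentration(mean_ct, slope, intercept))
```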

WEB SITE REFERENCES

ftp://ftp.ncbi.nlm.nih.gov/repository/UniGene/; Homo sapiens UniGene data set.

Acknowledgments

This work was supported by the National Human Genome Research Institute through grant 5K22-HG000047.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Footnotes

E-MAIL jpbrody@uci.edu; FAX (949) 824-9968.

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.418402.

REFERENCES

  1. Audic S, Claverie JM. The significance of digital gene expression profiles. Genome Res. 1997;7:986–995. doi: 10.1101/gr.7.10.986. [DOI] [PubMed] [Google Scholar]
  2. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA, Wheeler DL. Genbank. Nucleic Acids Res. 2002;30:17–20. doi: 10.1093/nar/30.1.17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Dittmer J, Nordheim A. Ets transcription factors and human disease. Biochim Biophys Acta. 1998;1377:F1–F11. doi: 10.1016/s0304-419x(97)00039-5. [DOI] [PubMed] [Google Scholar]
  4. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci. 1998;95:14863–14868. doi: 10.1073/pnas.95.25.14863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Freemont PS. Ring for destruction. Curr Biol. 2000;10:R84. doi: 10.1016/s0960-9822(00)00287-6. [DOI] [PubMed] [Google Scholar]
  6. Ghadersohi A, Sood AK. Prostate epithelium-derived Ets transcription factor mRNA is overexpressed in human breast tumors and is a candidate breast tumor marker and a breast tumor antigen. Clin Cancer Res. 2001;7:2731. [PubMed] [Google Scholar]
  7. Hartwell LH, Hopfield JJ, Leibler S, Murray AW. From molecular to modular cell biology. Nature. 1999;402:C47–C52. doi: 10.1038/35011540. [DOI] [PubMed] [Google Scholar]
  8. Hillier L, Lennon G, Becker M, Bonaldo M, Chiapelli B, Chissoe S, Dietrich N, Dubuque T, Favello A, Gish W, et al. Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 1996;6:807–828. doi: 10.1101/gr.6.9.807. [DOI] [PubMed] [Google Scholar]
  9. Ma CP, Slaughter CA, DeMartino GN. Identification, purification, and characterization of a protein activator (PA28) of the 20 S proteasome (macropain) J Biol Chem. 1992;267:10515–10523. [PubMed] [Google Scholar]
  10. Norbeck J, Blomberg A. Two-dimensional electrophoretic separation of yeast proteins using a non-linear wide range (pH 3–10) immobilized pH gradient in the first dimension; reproducibility and evidence for isoelectric focusing of alkaline (pI >7) proteins. Yeast. 1997;13:1519–1534. doi: 10.1002/(SICI)1097-0061(199712)13:16<1519::AID-YEA211>3.0.CO;2-U. [DOI] [PubMed] [Google Scholar]
  11. Nozawa M, Yomogida K, Kanno N, Nonomura N, Miki T, Okuyama A, Nisimune Y, Nozaki M. Prostate-specific transcription factor hPSE is translated only in prostate epithelial cells. Cancer Res. 2000;60:1348–1352. [PubMed] [Google Scholar]
  12. Oettgen P, Finger E, Sun Z, Akbarali Y, Thamrongsak U, Boltax J, Grall F, Dube A, Weiss A, Brown L, et al. PDEF, a novel prostate epithelium-specific Ets transcription factor, interacts with the androgen receptor and activates prostate-specific antigen gene expression. J Biol Chem. 2000;275:1216–1225. doi: 10.1074/jbc.275.2.1216. [DOI] [PubMed] [Google Scholar]
  13. Ostvold AC, Holtlund J, Laland SG. A novel, highly phosphorylated protein, of the high mobility group type, present in a variety of proliferating and non-proliferating mammalian cells. Eur J Biochem. 1985;153:469–475. doi: 10.1111/j.1432-1033.1985.tb09325.x. [DOI] [PubMed] [Google Scholar]
  14. Ostvold AC, Hullstein I, Laland SG. The phosphate groups of the high mobility group like protein P1 strengthens its affinity for DNA. Biochem Biophys Res Commun. 1992;185:1091–1097. doi: 10.1016/0006-291x(92)91738-c. [DOI] [PubMed] [Google Scholar]
  15. Ostvold AC, Norum JH, Mathiesen S, Wanvik B, Sefland I, Grundt K. Molecular cloning of a mammalian nuclear phosphoprotein NUCKS, which serves as a substrate for Cdk1 in vivo. Eur J Biochem. 2001;268:2430–2440. doi: 10.1046/j.1432-1327.2001.02120.x. [DOI] [PubMed] [Google Scholar]
  16. Park I, Chung J, Walsh CT, Yun YD, Strominger JL, Shin J. Phosphotyrosine-independent binding of a 62-kDa protein to the src homology 2 (SH2) domain of p56(lck) and its regulation by phosphorylation of Ser-59 in the lck unique N-terminal region. Proc Natl Acad Sci. 1995;92:12338–12342. doi: 10.1073/pnas.92.26.12338. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Quackenbush J, Liang F, Holt I, Pertea G, Upton J. The TIGR Gene Indices: Reconstruction and representation of expressed gene sequences. Nucleic Acids Res. 2000;28:141–145. doi: 10.1093/nar/28.1.141. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Schuler GD, Boguski MS, Stewart EA, Stein LD, Gyapay G, Rice K, White RE, Rodriguez-Tome P, Aggarwal A, Bajorek E, et al. A gene map of the human genome. Science. 1996;274:540–546. [PubMed] [Google Scholar]
  19. Shin J. P62 and the sequestosome, a novel mechanism for protein metabolism. Arch Pharm Res. 1998;21:629–633. doi: 10.1007/BF02976748. [DOI] [PubMed] [Google Scholar]
  20. Stekel DJ, Git Y, Falciani F. The comparison of gene expression from multiple cDNA libraries. Genome Res. 2000;10:2055–2061. doi: 10.1101/gr.gr-1325rr. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Thaw P, Baxter NJ, Hounslow AM, Price C, Waltho JP, Craven CJ. Structure of TCTP reveals unexpected relationship with guanine nucleotide-free chaperones. Nat Struct Biol. 2001;8:701–704. doi: 10.1038/90415. [DOI] [PubMed] [Google Scholar]
  22. Thompson HGR, Harris JW, Lin F, Wold B, Brody JP. P62 over-expression in breast tumors and its regulation by Prostate-Derived Ets Factor PDEF in breast cancer cells in vitro. Preprint. 2002. Available at http://brodylab.eng.uci.edu/~jpbrody/tmp/p62.pdf. [DOI] [PubMed]
  23. Thompson J, Koumari R, Wagner K, Barnert S, Schleussner C, Schrewe H, Zimmermann W, Muller G, Schempp W, Zaninetta D. The human pregnancy-specific glycoprotein genes are tightly linked on the long arm of chromosome 19 and are coordinately expressed. Biochem Biophys Res Commun. 1990;167:848–859. doi: 10.1016/0006-291x(90)92103-7. [DOI] [PubMed] [Google Scholar]
  24. Vadlamudi RK, Shin J. Genomic structure and promoter analysis of the p62 gene encoding a non-proteasomal multiubiquitin chain binding protein. FEBS Lett. 1998;435:138–142. doi: 10.1016/s0014-5793(98)01021-7. [DOI] [PubMed] [Google Scholar]
  25. Vadlamudi RK, Joung I, Strominger JL, Shin J. P62, a phosphotyrosine-independent ligand of the SH2 domain of p56(lck), belongs to a new class of ubiquitin-binding proteins. J Biol Chem. 1996;271:20235–20237. doi: 10.1074/jbc.271.34.20235. [DOI] [PubMed] [Google Scholar]
  26. Velculescu VE, Madden SL, Zhang L, Lash AE, Yu J, Rago C, Lal A, Wang CJ, Beaudry GA, Ciriello KM, et al. Analysis of human transcriptomes. Nat Genet. 1999;23:387–388. doi: 10.1038/70487. [DOI] [PubMed] [Google Scholar]
  27. Walaas SI, Ostvold AC, Laland SG. Phosphorylation of P1, a high mobility group-like protein, catalyzed by casein kinase II, protein kinase C, cyclic AMP-dependent protein kinase and calcium/calmodulin-dependent protein kinase II. FEBS Lett. 1989;258:106–108. doi: 10.1016/0014-5793(89)81626-6. [DOI] [PubMed] [Google Scholar]
  28. Walker MG, Volkmuth W, Sprinzak E, Hodgsdon D, Klinger T. Prediction of gene function by genome-scale expression analysis: Prostate cancer-associated genes. Genome Res. 1999;9:1198–1203. doi: 10.1101/gr.9.12.1198. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Wheeler DL, Church DM, Lash AE, Leipe DD, Madden TL, Pontius JU, Schuler GD, Schriml LM, Tatusova TA, Wagner L, et al. Database resources of the National Center for Biotechnology Information: 2002 update. Nucleic Acids Res. 2002;30:13–16. doi: 10.1093/nar/30.1.13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Yamada N, Tamai Y, Miyamoto H, Nozaki M. Cloning and expression of the mouse Pse gene encoding a novel Ets family member. Gene. 2000;241:267–274. doi: 10.1016/s0378-1119(99)00484-9. [DOI] [PubMed] [Google Scholar]
  31. Yawata M, Murata S, Tanaka K, Ishigatsubo Y, Kasahara M. Nucleotide sequence analysis of the approximately 35-kb segment containing interferon-γ-inducible mouse proteasome activator genes. Immunogenetics. 2001;53:119–129. doi: 10.1007/s002510100308. [DOI] [PubMed] [Google Scholar]
  32. Zou Z, Anisowicz A, Hendrix MJC, Thor A, Neveu M, Sheng S, Rafidi K, Seftor E, Sager R. Maspin, a serpin with tumor-suppressing activity in human mammary epithelial cells. Science. 1994;263:526–529. doi: 10.1126/science.8290962. [DOI] [PubMed] [Google Scholar]

Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press
