Author manuscript; available in PMC: 2008 Feb 1.
Published in final edited form as: Genomics. 2006 Nov 27;89(2):300–305. doi: 10.1016/j.ygeno.2006.10.004

Evaluating microarrays using a semi-parametric approach: application to the central carbon metabolism of E. coli BL21 and JM109

Je-Nie Phue a, Benjamin Kedem b, Pratik Jaluria a, Joseph Shiloach a,*
PMCID: PMC1945183  NIHMSID: NIHMS16517  PMID: 17125967

Abstract

E. coli K (JM109) and E. coli B (BL21) are strains used routinely for recombinant protein production. These two strains grow and respond differently to environmental factors, such as glucose and oxygen concentration. The differences were attributed to different expression of individual genes that constituted certain metabolic pathways that are part of the central carbon metabolism. By implementing the semi-parametric algorithm which is based on the null hypothesis of equal distribution, it was possible to compare and quantify the expression patterns of groups of genes involved in several central carbon metabolic pathways. The groups comprising the glyoxylate shunt, TCA cycle, fatty acid, and gluconeogenesis & anaplerotic pathways were expressed differently between the two strains while no differences were apparent for the groups comprising either glycolysis or the pentose phosphate pathway. These results further characterized differences between the two E. coli strains and illustrated the potency of the semi-parametric algorithm.

Keywords: microarray, E. coli, glucose metabolism, semi-parametric algorithm

Introduction

Following gene transcription with microarrays is currently the preferred method for evaluating either gene expression variations between different cells or the effects of various compounds and environmental conditions on gene expression. With the use of commercially available software, such as Partek Genomics Suite or Acuity, microarray data can be organized to highlight specific genes or to identify entire biological pathways [1]. The level of gene organization depends on the algorithms being used to search the data for possible correlations, groupings, and patterns [2]. From a statistical standpoint, a critical aspect of evaluating microarray data is determining whether or not the expression levels of certain genes or groups of genes are truly different from one another [3, 4]. For this type of comparative analysis, a number of hypothesis testing methods are commonly used such as classical t-test, F-test, and ANOVA; each with advantages and limitations depending on the exact application [5, 6].

It was previously established that the growth behavior and metabolite production pattern of E. coli K (JM109) and E. coli B (BL21) are very different [7,8]. Most importantly, E. coli B is insensitive to high glucose concentration and does not produce acetate, which is believed to adversely affect both growth and protein production. In contrast, E. coli K is very sensitive to glucose concentration and produces high levels of acetate. As a result, intensive work was conducted to understand the differences between the two strains by comparing their central carbon metabolism. This was done by analyzing growth characteristics and acetate production [7], performing flux analysis [8], following carbon flow using labeled carbon [9], and following individual gene transcription using northern blot analysis and cDNA microarrays [10, 11]. In the presented work, a statistical semi-parametric algorithm [12] was implemented for evaluating previous results by comparing the transcription of groups of genes belonging to specific metabolic pathways of the central carbon metabolism. This algorithm analyzed microarray results by employing both computational and graphical components. The computational component calculated density functions, estimated data distributions, and then correlated differences between data sets that had distinct distributions with commonly used p-values. The graphical component used calculated log2 numbers that were obtained from the intensity values and plotted these unitless numbers along the x-axis while the estimated density functions were plotted along the y-axis.

A list of the genes involved in the glyoxylate shunt, TCA cycle, glycolysis, pentose phosphate, combined gluconeogenesis and anaplerotic, and fatty acid pathways was assembled. Bacterial samples were collected and analyzed using oligonucleotide microarrays. Normalized intensity values for each gene within a specific pathway were then used as input for the semi-parametric algorithm and for evaluating the differences between the pathways of the two strains.

Results

E. coli B (BL21) and E. coli K (JM109) were grown at an initial glucose concentration of 40 g/L (Figure 1) and their gene transcription was analyzed using oligonucleotide arrays. Samples for microarray analysis were taken at the end of the logarithmic growth phase, when the glucose concentration was below 1 g/L. Genes were then grouped according to the following pathways: glyoxylate shunt, TCA cycle, glycolysis, pentose phosphate, fatty acid, and gluconeogenesis and anaplerotic; each group consisted of 5 to 18 genes. Three separate fermentations were conducted, with samples from each run being assayed using microarrays. The resulting data, in the form of gene-specific signal intensity values, are summarized in Table 1. These data were then used as input for the semi-parametric algorithm, the results of which are shown in Table 2 and Figure 2. Table 2 provides the p-values for each pathway, while Figure 2 provides plots of the estimated density functions for both E. coli strains vs. log2 of the average intensity values. Gene-specific intensity values from the three different arrays were averaged and converted to log2 values for each E. coli strain. (This log2 conversion is the reason for the difference between the intensity values listed in Table 1 and the values found along the x-axis in the plots of Figure 2.)
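The averaging and log2 conversion described above can be illustrated with a minimal sketch. It uses the aceA row of Table 1; the function name `log2_mean_intensity` is introduced here only for illustration, and the order of operations (average first, then log2) is taken from the wording of the text rather than from the authors' code.

```python
import math

def log2_mean_intensity(batches):
    """Average the replicate intensities, then take log2 of the average."""
    return math.log2(sum(batches) / len(batches))

# aceA signal intensities from Table 1 (three batches per strain)
ace_a = {"BL21": [1961, 920, 9329], "JM109": [257, 332, 158]}

log2_values = {strain: log2_mean_intensity(v) for strain, v in ace_a.items()}
for strain, value in log2_values.items():
    print(f"{strain}: {value:.2f}")
# BL21: 11.99
# JM109: 7.96
```

Values of this kind, one per gene, are what appear along the x-axis of Figure 2.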

Figure 1.

Bacterial growth, acetate production, and glucose consumption of E. coli strains at high initial glucose concentration: A, E. coli BL21; B, E. coli JM109. The arrows indicate the sampling times for microarray analysis.

Table 1.

Microarray data in the form of normalized signal intensities; used as input for the algorithm

Signal intensity by batch (BL21 batches 1 to 3, then JM109 batches 1 to 3)
Gene Symbol BL21-1 BL21-2 BL21-3 JM109-1 JM109-2 JM109-3

Glyoxylate Shunt
aceA 1961 920 9329 257 332 158
aceB 1427 2663 5448 246 245 145
aceK 312 196 446 149 238 197
iclR 159 323 656 349 92 171
fadR 498 490 485 556 317 526
fruR 456 273 242 229 47 225
himA 12631 5377 12753 4146 3491 3227
himD 7171 5304 20350 5166 3517 5730

TCA Cycle
acnA 2332 661 1361 1831 2919 1847
acnB 4671 3524 6634 2900 3082 1661
icdA 4547 9686 10406 8756 7511 4768
sucA 4977 6101 2902 2870 6574 728
sucB 7728 9698 5137 3958 6922 915
sucC 14842 14685 8499 6757 7490 1866
sucD 2393 2615 1849 1845 1593 603
sdhA 4126 8633 5329 2941 7560 1955
sdhB 2050 3725 1454 1046 2427 629
sdhC 1082 4757 1500 331 971 430
sdhD 4090 9170 5372 1104 4336 1628
fumA 3702 4071 4013 1340 3395 1074
mdh 8393 7132 7670 5801 3695 1437
gltA 7182 10969 8510 3085 8217 2575

Glycolysis Pathway
glk 542 1368 620 383 422 350
pgi 1337 833 534 1699 1530 906
pfkA 1955 1636 1709 2548 2003 2386
pfkB 2084 503 1639 1017 1501 778
fba 2013 3185 2408 3729 3722 1199
tpiA 2490 1799 2241 1329 1427 1394
gapA 4684 15341 7454 15938 14089 7851
pgk 1749 2640 2478 4521 4491 2979
gpmA 2437 2384 3860 4215 4107 2116
eno 4677 3717 2965 6521 3889 2980
pykA 1802 1812 2527 1044 1065 978
pykF 2687 1100 1724 1695 1302 2060
pyrB 519 363 259 240 244 259
pgm 607 1176 644 533 515 430
manA 693 764 687 1054 1420 890
mgsA 1541 2058 1814 717 505 623

Pentose Phosphate
gcd 164 70 365 551 973 588
gntK 2548 5833 640 37 206 195
gntP 374 323 405 232 246 139
gntR 212 687 462 858 807 692
gntT 4158 6040 2235 337 553 308
gntU_1 234 622 268 73 93 109
gntU_2 426 669 145 55 145 152
zwf 859 1189 757 2074 1770 1781
gnd 633 1432 941 5216 4284 3423
rpe 115 787 353 235 188 213
rpiA 1802 1396 1462 1356 1341 1382
rpiB 551 566 399 355 387 225
talA 4304 372 3102 4366 5600 1778
talB 2554 2446 2022 1804 2014 2295
talC 86 38 44 16 15 23
tktA 924 1149 843 969 559 300
tktB 1759 129 1220 1989 3296 784
eda 928 1697 928 1537 1369 1214

Fatty Acid
fabA 207 845 693 1070 834 1210
fabB 3785 5922 5729 5061 5983 3950
fabD 1486 2355 1451 1570 1095 822
fabF 178 931 498 356 153 333
fabG 2683 4035 2195 2405 1855 1672
fabH 988 1581 1207 1476 1203 1041
fabI 839 1592 3343 4860 2927 2733
fabZ 778 2424 1699 1705 1378 2064
fadA 301 164 2175 252 449 215
fadB 904 1256 6194 78 282 85
fadD 1190 915 2224 366 407 220
fadL 2219 744 5804 31 28 85
fadR 498 490 485 556 317 526

Gluconeogenesis and Anaplerotic Pathway
fbp 2062 2400 2871 1805 1166 870
pckA 5721 4041 5127 1929 1955 563
ppsA 2162 1261 2053 455 389 270
ppc 728 3105 607 1099 685 763
sfcA 195 584 179 810 721 473

Table 2.

Results of the semi-parametric algorithm applied to normalized microarray data from 6 hybridized oligonucleotide arrays

Pathway n α1 β1 χ1 p-value
Glyoxylate Shunt 24 3.284 −0.344 7.325 0.0068
TCA Cycle 42 8.827 −0.755 12.303 0.0005
Glycolysis 48 0.835 −0.079 0.254 0.6142
Pentose Phosphate 54 0.994 −0.108 1.090 0.2964
Fatty Acid 39 3.327 −0.337 5.111 0.0238
Gluconeogenesis and Anaplerotic 15 5.712 −0.567 3.559 0.0592

Symbols in the table are described as follows:

n = number of data points for a given pathway

α1 = calculated parameter as defined in Eq. (1)

β1 = calculated parameter as defined in Eq. (1) and the main component of null hypothesis

χ1 = test statistic as defined in Eq. (2) used to quantify similarities between data distributions

p-value = the probability, given that the null hypothesis is true, of obtaining a test statistic at least as large as the observed χ1 for the two data sets being tested.

Figure 2.

Comparison of the reference density function (E. coli BL21) and the distortion density function (E. coli JM109) vs. log2 of average intensity values for each of the following pathways: (a) Glyoxylate shunt, (b) TCA cycle, (c) Glycolysis, (d) Pentose phosphate, (e) Fatty acid, (f) Gluconeogenesis & Anaplerotic

As shown in Table 2, the glyoxylate shunt, TCA cycle, and fatty acid pathways are distributed differently between the two E. coli strains because their p-values (0.0068, 0.0005, and 0.0238, respectively) are smaller than the significance limit of 0.05, which was set to correspond to a likelihood of occurrence of 5% [13]. The null hypothesis of equal distribution is accepted for p-values greater than 0.05 and rejected for p-values below 0.05. In other words, the genes that collectively constitute each of the three pathways listed above are being expressed differently between the two E. coli strains, and differences of this size would be expected less than 5% of the time if the two distributions were equal, taking also into consideration inherent variability between slides, sample preparation, etc. [14]. In fact, the glyoxylate shunt and the TCA cycle have such low p-values that the likelihood of such differences arising by chance is a fraction of 1% (0.68% and 0.05%, respectively). Figures 2(a), 2(b), and 2(e) graphically illustrate the differences between the two E. coli strains for the three pathways. No point-specific overlaps or structural similarities are apparent in any of these figures.
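The link between the test statistic and the p-values in Table 2 can be checked with a short sketch. It assumes only that χ1 is referred to a chi-square distribution with one degree of freedom, as stated in Materials and Methods; for that distribution the upper-tail probability equals erfc(sqrt(χ1/2)). The helper name `chi2_1df_pvalue` is illustrative, not taken from the authors' software.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Test statistics (chi1 column) copied from Table 2
table2_chi1 = {
    "Glyoxylate Shunt": 7.325,
    "TCA Cycle": 12.303,
    "Glycolysis": 0.254,
    "Pentose Phosphate": 1.090,
    "Fatty Acid": 5.111,
    "Gluconeogenesis and Anaplerotic": 3.559,
}

p_values = {name: chi2_1df_pvalue(stat) for name, stat in table2_chi1.items()}

# Pathways for which the null hypothesis is rejected at the 0.05 limit
different = [name for name, p in p_values.items() if p < 0.05]

for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")
```

Up to rounding, the recomputed values agree with the p-value column of Table 2, and the 0.05 limit selects the glyoxylate shunt, TCA cycle, and fatty acid pathways.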

The gluconeogenesis and anaplerotic pathway has a p-value only slightly larger than the limit of 0.05 (0.0592); the genes in this pathway therefore also appear to be expressed differently between the two strains, but not as significantly as the previously mentioned pathways. Figure 2(f) highlights the differences and similarities between the two strains for this pathway. Despite some common features between the two curves, such as the slope of the initial ascent, there are several important differences, such as the well-defined peak in the E. coli JM109 curve and the rapid descent of the curve for higher values of x when compared with the E. coli BL21 curve.

For the glycolysis and the pentose phosphate pathways, no differences were apparent, as evidenced by their relatively large p-values (0.6142 and 0.2964, respectively). In both Figures 2(c) and 2(d) the curve for one strain traces the curve for the other strain. Both figures have points of overlap and nearly identical shapes; therefore, the genes constituting each of these two pathways behave similarly in the two E. coli strains.

Discussion

Microarrays are an efficient tool for the identification of gene transcription differences between cells, tissues, and microorganisms [15]. They have been used extensively for studying genotypic causes for phenotypic differences, divergent responses to environmental pressures, and evolutionary trends. A typical array experiment generates a large amount of data that requires statistical methods to perform searches to detect and quantify differences between gene expression levels [16, 17]. In most cases, the results of such a search are a list of up-regulated and down-regulated genes, relative to a reference or control gene expression pattern.

E. coli JM109 and E. coli BL21 are two strains commonly used for recombinant protein production. These two strains differ in their response to glucose concentration, especially excess glucose. E. coli JM109 excretes high levels of acetate when the glucose concentration exceeds a few grams per liter, while E. coli BL21 is insensitive to glucose concentration and excretes low levels of acetate even when the glucose concentration is above 30 grams per liter. When the central carbon metabolism of these strains was carefully studied by enzymatic activity, metabolic flux analysis, and cDNA arrays [11], several differences in the metabolic pathways were identified. The cDNA array analysis enabled us to determine in which strain and under what growth conditions the transcription of a particular gene was higher or lower. However, it was not possible to directly compare and evaluate the expression levels of groups of genes constituting specific pathways such as glycolysis, the TCA cycle, or the glyoxylate shunt. This comparison was needed for a comprehensive picture of the strains’ metabolic behavior, and it became possible by using a semi-parametric algorithm. Results generated from the implementation of this algorithm indicated that the transcription of the glycolysis and the pentose phosphate pathways was comparable in the two strains; however, clear differences were identified in the transcription of the TCA cycle, glyoxylate shunt, and fatty acid pathways and, to a lesser extent, in the gluconeogenesis and anaplerotic pathway. Differences in the growth rate and the metabolic patterns, including acetate formation, between the two E. coli strains can therefore be attributed to the combined effects of the glyoxylate shunt, TCA cycle, gluconeogenesis, and fatty acid pathways.

In fact, this difference in gene transcription patterns is very likely the reason for the efficient utilization of glucose through both the TCA cycle and the glyoxylate shunt, as well as the assimilation of acetate via gluconeogenesis and fatty acid biosynthesis, in E. coli BL21. The work presented here therefore supports previous information demonstrating that the glyoxylate shunt enzymes were more active in E. coli B than in K and that certain TCA, gluconeogenesis and anaplerotic, and fatty acid metabolism genes are transcribed differently between the two strains. In that earlier work, however, it was not possible to demonstrate that the transcription patterns of an entire pathway were different.

Although other hypothesis testing methods are available, the semi-parametric algorithm was chosen because it has been shown to be robust, relatively insensitive to outliers, and readily capable of analyzing an assortment of number sets, irrespective of their origin [18]. The work presented focused on using results from hybridized oligonucleotide arrays as input for the semi-parametric algorithm, but the overall process presented here is applicable to cDNA arrays as well. This, however, will require a universal control sample to be used with each dual-channel cDNA slide to allow cross-comparisons, a limitation not encountered with single-channel oligonucleotide arrays since only one sample is hybridized per slide instead of two samples (test and control).

The results of this study demonstrate the benefit of using the semi-parametric algorithm to validate and expand upon information obtained from genomic microarrays. The purpose of this study was to present a comprehensive approach involving the application of a statistical method to microarray data in order to validate and expand upon previous results. As such, the method presented offers researchers another way to decipher microarray data and explore genomic differences in the context of entire biological pathways.

Materials and Methods

1. Statistical formulation

The semi-parametric method used in this work generates both numerical values (p-values) and graphical illustrations to highlight distinctions between genes or groups of genes [12, 19, 20, 21]. Suppose that a set of gene-specific intensity values labeled x11, x12, …, x1m are distributed such that a probability density function g1(x) can describe the distribution of these m numbers. Another set of gene-specific intensity values labeled x21, x22, …, x2n are distributed such that a probability density function g(x) can describe the distribution of these n numbers [22]. As part of this mathematical construct, m and n may differ; however, for some function h(x), the following relationship is assumed:

$$g_1(x) = \exp\{\alpha_1 + \beta_1 h(x)\}\, g(x)$$ Eq. (1)

where α1 and β1 are unknown parameters that can be estimated from the data sets (i.e. x11, x12, …x1m, x21, x22, …x2n), and h(x) is a function that must be specified. This equation expresses g(x) as a baseline or reference density while calculating the deviation or ‘tilt’ associated with g1(x) in terms of the reference density. In other words, Eq. 1 illustrates the mathematical relationship between g(x) and g1(x), the reference density function and the deviation density function, respectively.

This set-up allows for testing the null hypothesis of equal distribution, g1(x) = g(x), which is expressed as H0: β1 = 0. Incorporating the idea of a null hypothesis allows insight into subsequent analysis [19]. Accepting the null hypothesis, β1 = 0, signifies that g1(x) and g(x) are distributed equally. If the null hypothesis is rejected, then g1(x) ≠ g(x); hence there is a difference between the distributions of these two data sets [20]. To test the hypothesis, a test statistic, χ1, is used. Under the null hypothesis it is asymptotically distributed as chi-square (χ²) with one degree of freedom, and it is given by the following equation:

$$\chi_1 = (m+n)\,\frac{\rho_1}{(1+\rho_1)^2}\,\widehat{\mathrm{Var}}(h(t))\,\hat{\beta}_1^{2}$$ Eq. (2)

where ρ1 = m/n, V̂ar(h(t)) is the estimate of the variance of h(t) with respect to the reference distribution g(x), and β̂1 is the estimate of β1 [18]. For our microarray data sets, m = n = the number of genes in a particular pathway times the number of arrays spotted with each gene, which was 3 in this case because that was how many arrays were hybridized per E. coli strain [22].

The semi-parametric algorithm makes no assumptions regarding normal distributions for either g(x) or g1(x). The only assumption made is for h(x). The choice of h(x) = x is quite satisfactory for symmetric or nearly symmetric probability distributions whereas h(x) = log x is adequate for skewed distributions. In our analysis we utilized h(x) = x [21].
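The formulation above can be sketched in code for the two-sample case. This is an illustrative sketch under stated assumptions, not the authors' implementation: it relies on the known equivalence between the model in Eq. (1) and a logistic regression fitted to the pooled sample (the setting studied in [19]), fits that regression by Newton's method, and estimates the variance of h(t) from the fitted reference weights. The function name `fit_density_ratio`, the variance estimator, and the synthetic normal data are all assumptions introduced for illustration; the test statistic uses the form (m+n)·ρ1/(1+ρ1)²·Var(h(t))·β1², which is how Eq. (2) is read here.

```python
import math
import numpy as np

def fit_density_ratio(x1, x2, h=lambda x: x, n_iter=50):
    """Fit g1(x) = exp(alpha1 + beta1*h(x)) * g(x) from two samples.

    x1: sample assumed to follow the distorted density g1 (size m)
    x2: sample assumed to follow the reference density g (size n)
    Note: Newton's method fails if the two samples are perfectly separated.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    m, n = len(x1), len(x2)
    rho = m / n
    t = h(np.concatenate([x1, x2]))                 # pooled h-values
    y = np.concatenate([np.ones(m), np.zeros(n)])   # 1 = drawn from g1
    X = np.column_stack([np.ones(m + n), t])
    theta = np.zeros(2)                             # (intercept, slope)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        theta = theta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    # Logistic intercept = alpha1 + log(rho); slope = beta1
    alpha = theta[0] - math.log(rho)
    beta = theta[1]
    # Estimated reference probabilities on the pooled points (they sum to 1)
    w = 1.0 / (n * (1.0 + rho * np.exp(alpha + beta * t)))
    mean_h = np.sum(w * t)
    var_h = np.sum(w * t**2) - mean_h**2
    chi1 = (m + n) * rho / (1.0 + rho) ** 2 * var_h * beta**2
    p_value = math.erfc(math.sqrt(chi1 / 2.0))      # chi-square, 1 d.f.
    return alpha, beta, chi1, p_value

# Synthetic check: N(1,1) versus reference N(0,1) gives
# g1(x)/g(x) = exp(-0.5 + x), i.e. alpha1 = -0.5 and beta1 = 1.
rng = np.random.default_rng(0)
x_ref = rng.normal(0.0, 1.0, 5000)
x_dist = rng.normal(1.0, 1.0, 5000)
alpha_hat, beta_hat, chi1, p_value = fit_density_ratio(x_dist, x_ref)

# Identical samples: beta1 should be (numerically) zero and the p-value 1.
_, beta_same, chi_same, p_same = fit_density_ratio(x_ref, x_ref)
```

Passing `h=np.log` instead of the default identity corresponds to the choice suggested above for skewed distributions.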

The algorithm quantifies the level of similarity between g(x) and g1(x) by numerical and graphical means. The numerical approach calculates the p-values resulting from the hypothesis test discussed earlier. The graphical approach (seen in Figure 2) is an integral part of the analysis and not simply an illustration. It highlights the differences between the two E. coli strains for a specific biological pathway, indirectly correlates p-values with the overall structure and predictability of the density functions, and demonstrates how a series of numbers, in this case the intensity values, can be averaged and combined into a new data set independent of the original source, in this case the genes. In other words, the graphical approach both visualizes and interprets the results generated by the semi-parametric method, as shown in Figure 2. The greater the similarity between the distributions of a given pathway, the closer the plots of the estimated g(x) and g1(x) are to one another, and the higher the corresponding p-value.
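To give a feel for the kind of curves compared in the graphical approach, the sketch below estimates a density over the log2 average intensities of the glyoxylate shunt genes in Table 1. It is only an approximation of the idea: it uses a plain Gaussian kernel density estimate with a rule-of-thumb bandwidth, which is an assumption made here and is not the semi-parametric density estimator that produced Figure 2. The names `gaussian_kde`, `x_bl21`, and `x_jm109` are illustrative.

```python
import numpy as np

def gaussian_kde(samples, grid):
    """Gaussian kernel density estimate evaluated on a grid of x values."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    # Rule-of-thumb bandwidth (assumed here; not taken from the paper)
    bandwidth = 1.06 * samples.std(ddof=1) * n ** (-0.2)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

# Glyoxylate shunt rows of Table 1 (aceA, aceB, aceK, iclR, fadR, fruR, himA, himD)
bl21 = np.array([[1961, 920, 9329], [1427, 2663, 5448], [312, 196, 446],
                 [159, 323, 656], [498, 490, 485], [456, 273, 242],
                 [12631, 5377, 12753], [7171, 5304, 20350]])
jm109 = np.array([[257, 332, 158], [246, 245, 145], [149, 238, 197],
                  [349, 92, 171], [556, 317, 526], [229, 47, 225],
                  [4146, 3491, 3227], [5166, 3517, 5730]])

# Average the three batches per gene, then convert to log2
x_bl21 = np.log2(bl21.mean(axis=1))
x_jm109 = np.log2(jm109.mean(axis=1))

grid = np.linspace(0.0, 22.0, 881)
dens_bl21 = gaussian_kde(x_bl21, grid)    # stand-in for the reference curve
dens_jm109 = gaussian_kde(x_jm109, grid)  # stand-in for the distortion curve
```

Plotting `dens_bl21` and `dens_jm109` against `grid` gives two curves over the log2 axis in the spirit of Figure 2(a), although their exact shapes will differ from the published estimates.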

2. Bacterial strains

The two E. coli strains studied were BL21(λDE3) (F, ompT, hsdSB (rB−,mB+), dcm, gal, (DE3), Cmr) and JM109(DE3) (endA1, recA1, gyrA96, thi, hsdR17 (rk,mk+), relA1, supE44,Δλ, Δ(lac-proAB), [F', traD36, proAB, lacIqZΔM15], λDE3). Both strains were obtained from Promega Corp. (Madison, WI).

3. Fermentation and sample preparation

Both strains were grown at 37°C in modified LB medium containing 10 g/L tryptone, 5 g/L yeast extract (15 g/L for JM109), 5 g/L NaCl, and 5 g/L K2HPO4. After sterilization, 10 mM MgSO4, 1 mL/L trace metal solution, and 40 g/L glucose were added. Overnight cultures grown at 37°C were used to inoculate 4.0 L of medium in a B. Braun fermentor equipped with data acquisition and a control system. The cultures were grown to high cell density, the pH was controlled at 7.0 by the addition of 50% NH4OH, and dissolved oxygen was kept above 30% of saturation at all times.

Samples for total RNA purification were collected at the late logarithmic phase of growth, indicated by arrows in Figure 1. Next, the samples were centrifuged at 14,000 g for 10 min at 4°C; the supernatant was removed and the pellets were quickly frozen with dry ice and stored at −80°C.

4. Total RNA preparation

Total RNA was isolated using a MasterPure RNA Purification Kit (Epicentre Technologies, Madison WI) according to the manufacturer’s protocol (Kit MCR 85102). Isolated RNA was further purified with an RNeasy Kit 75144 (Qiagen). Overall RNA concentration was determined by measuring absorbance at 260 nm (A260) using a GeneQuant Pro (Amersham Biosciences). Purified RNA samples were determined to have absorbance ratios (A260/A280) of 1.85–1.95 and were also checked by running a 1% agarose/formaldehyde denaturing gel. To further ensure equivalency between individual samples, the 23S and 16S ribosomal RNA (rRNA) from each sample were analyzed with an Agilent 2100 Bioanalyzer (Agilent Technologies). The intensity of each band was calculated, and the rRNA ratio (23S/16S) for each sample was found to be greater than 1.5.

5. Oligonucleotide microarrays

Standard methods available from Affymetrix (Santa Clara, CA) for cDNA synthesis, fragmentation, and end-terminus biotin labeling, starting with a total RNA (10 μg) sample, were used. The biotin-labeled cDNA was hybridized to E. coli Affymetrix Antisense Genome Arrays at 45 °C for 16 hours as recommended in the GeneChip technical manual (Affymetrix). Hybridized arrays were stained with streptavidin-phycoerythrin using an Affymetrix Fluidic Station. The GeneChips were scanned using an Affymetrix/Hewlett–Packard GeneArray GC2500 Scanner. The signal intensity was normalized using Affymetrix Microarray Suite Software (version 4.0).

Acknowledgments

Funding was provided by the National Institute of Diabetes & Digestive & Kidney Diseases (NIDDK), National Institutes of Health (NIH).

References

1. Murphy D. Gene expression studies using microarrays: principles, problems, and prospects. Adv Physiol Educ. 2002;26:256–270. doi: 10.1152/advan.00043.2002.
2. Tamayo P, et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA. 1999;96:2907–2912. doi: 10.1073/pnas.96.6.2907.
3. Burke HB. Discovering patterns in microarray data. Mol Diagn. 2000;5:349–357. doi: 10.1007/BF03262096.
4. Dopazo J, Zanders E, Dragoni I, Amphlett G, Falciani F. Methods and approaches in the analysis of gene expression data. J Immunol Methods. 2001;250:93–112. doi: 10.1016/s0022-1759(01)00307-6.
5. Saeed AI, et al. TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003;34:374–378. doi: 10.2144/03342mt01.
6. Yang HH, Hu Y, Buetow KH, Lee MP. A computational approach to measuring coherence of gene expression in pathways. Genomics. 2004;84:211–217. doi: 10.1016/j.ygeno.2004.01.007.
7. Shiloach J, Kaufman J, Guillard AS, Fass R. Effect of glucose supply strategy on acetate accumulation, growth, and recombinant protein production by Escherichia coli BL21 (λDE3) and Escherichia coli JM109. Biotechnol Bioeng. 1996;49:421–428. doi: 10.1002/(SICI)1097-0290(19960220)49:4<421::AID-BIT9>3.0.CO;2-R.
8. Van de Walle M, Shiloach J. Proposed mechanism of acetate accumulation in two recombinant Escherichia coli strains during high density fermentation. Biotechnol Bioeng. 1998;57:71–78. doi: 10.1002/(sici)1097-0290(19980105)57:1<71::aid-bit9>3.0.co;2-s.
9. Noronha SB, Yeh HJ, Spande TF, Shiloach J. Investigation of the TCA cycle and the glyoxylate shunt in Escherichia coli BL21 and JM109 using 13C-NMR/MS. Biotechnol Bioeng. 2000;68:316–327.
10. Phue J, Shiloach J. Transcription levels of key metabolic genes are the cause for different glucose utilization pathways in E. coli B (BL21) and E. coli K (JM109). J Biotechnol. 2004;109:21–30. doi: 10.1016/j.jbiotec.2003.10.038.
11. Phue J, Noronha SB, Bhattacharyya R, Wolfe AJ, Shiloach J. Glucose metabolism at high density growth of E. coli B and E. coli K: differences in metabolic pathways are responsible for efficient glucose utilization in E. coli B as determined by microarrays and Northern blot analyses. Biotechnol Bioeng. 2005;90:805–820. doi: 10.1002/bit.20478.
12. Qi Y. Classification of Microarray Data, Department of Mathematics. College Park, MD: University of Maryland; 2002.
13. Conway T, Schoolnik GK. Microarray expression profiling: capturing a genome. Mol Microbiol. 2003;47:879–889. doi: 10.1046/j.1365-2958.2003.03338.x.
14. Ideker T, Thorsson V, Siegel AF, Hood LE. Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. J Comput Biol. 2000;7:805–817. doi: 10.1089/10665270050514945.
15. Mantripragada KK, Buckley PG, Diaz de Stahl T, Dumanski JP. Genomic microarrays in the spotlight. Trends Genet. 2004;20:87–93. doi: 10.1016/j.tig.2003.12.008.
16. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863–14868. doi: 10.1073/pnas.95.25.14863.
17. Quackenbush J. Computational analysis of microarray data. Nat Rev Genet. 2001;2:418–427. doi: 10.1038/35076576.
18. Fokianos K, Kedem B, Qin J, Short D. A semiparametric approach to the one way layout. Technometrics. 2001;43:56–65.
19. Qin J, Zhang B. A goodness of fit test for the logistic regression model based on case-control data. Biometrika. 1997;84:609–618.
20. Gilbert PB, Lele SR, Vardi Y. Maximum likelihood estimation in semiparametric selection bias models with application to AIDS vaccine trials. Biometrika. 1999;86:27–43.
21. Kedem B, Wolff DB, Fokianos K. Statistical Comparison of Algorithms. IEEE Trans Instrum Meas. 2004;53:770–776.
22. Gagnon R. Certain Computational Aspects of Power Efficiency and State Space Models, Department of Mathematics. College Park, MD: University of Maryland; 2005.
