Evaluating microarrays using a semi-parametric approach: application to the central carbon metabolism of E. coli BL21 and JM109

Je-Nie Phue; Benjamin Kedem; Pratik Jaluria; Joseph Shiloach

doi:10.1016/j.ygeno.2006.10.004

Author manuscript; available in PMC: 2008 Feb 1.

Published in final edited form as: Genomics. 2006 Nov 27;89(2):300–305. doi: 10.1016/j.ygeno.2006.10.004

PMCID: PMC1945183 NIHMSID: NIHMS16517 PMID: 17125967

Abstract

E. coli K (JM109) and E. coli B (BL21) are strains used routinely for recombinant protein production. These two strains grow and respond differently to environmental factors, such as glucose and oxygen concentration. The differences were attributed to different expression of individual genes that constituted certain metabolic pathways that are part of the central carbon metabolism. By implementing the semi-parametric algorithm which is based on the null hypothesis of equal distribution, it was possible to compare and quantify the expression patterns of groups of genes involved in several central carbon metabolic pathways. The groups comprising the glyoxylate shunt, TCA cycle, fatty acid, and gluconeogenesis & anaplerotic pathways were expressed differently between the two strains while no differences were apparent for the groups comprising either glycolysis or the pentose phosphate pathway. These results further characterized differences between the two E. coli strains and illustrated the potency of the semi-parametric algorithm.

Keywords: microarray, E. coli, glucose metabolism, semi-parametric algorithm

Introduction

Following gene transcription with microarrays is currently the preferred method for evaluating either gene expression variations between different cells or the effects of various compounds and environmental conditions on gene expression. With the use of commercially available software, such as Partek Genomics Suite or Acuity, microarray data can be organized to highlight specific genes or to identify entire biological pathways [1]. The level of gene organization depends on the algorithms being used to search the data for possible correlations, groupings, and patterns [2]. From a statistical standpoint, a critical aspect of evaluating microarray data is determining whether or not the expression levels of certain genes or groups of genes are truly different from one another [3, 4]. For this type of comparative analysis, a number of hypothesis testing methods are commonly used, such as the classical t-test, F-test, and ANOVA, each with advantages and limitations depending on the exact application [5, 6].

It was previously established that the growth behavior and metabolite production pattern of E. coli K (JM109) and E. coli B (BL21) are very different [7, 8]. Most importantly, E. coli B is insensitive to high glucose concentration and does not produce acetate, a by-product that is believed to adversely affect both growth and protein production. In contrast, E. coli K is very sensitive to glucose concentration and produces high levels of acetate. As a result, intensive work was conducted to understand the differences between the two strains by comparing their central carbon metabolism. This was done by analyzing growth characteristics and acetate production [7], performing flux analysis [8], following carbon flow using labeled carbon [9], and following individual gene transcription using northern blot analysis and cDNA microarrays [10, 11]. In the present work, a statistical semi-parametric algorithm [12] was implemented for evaluating previous results by comparing the transcription of groups of genes belonging to specific metabolic pathways of the central carbon metabolism. This algorithm analyzed microarray results by employing both computational and graphical components. The computational component calculated density functions, estimated data distributions, and then correlated differences between data sets that had distinct distributions with commonly used p-values. The graphical component plotted the log₂-transformed intensity values (unitless numbers) along the x-axis and the estimated density functions along the y-axis.

A list of the genes involved in the glyoxylate shunt, TCA cycle, glycolysis, pentose phosphate, the combined gluconeogenesis and anaplerotic pathways, and the fatty acid pathways was assembled. Bacterial samples were collected and analyzed using oligonucleotide microarrays. Normalized intensity values for each gene within a specific pathway were then used as input for the semi-parametric algorithm and for evaluating the differences between the pathways of the two strains.

Results

E. coli B (BL21) and E. coli K (JM109) were grown at an initial glucose concentration of 40 g/L (Figure 1) and their gene transcription was analyzed using oligonucleotide arrays. Samples for microarray analysis were taken at the end of the logarithmic growth phase, when the glucose concentration was below 1 g/L. Genes were then grouped according to the following pathways: glyoxylate shunt, TCA cycle, glycolysis, pentose phosphate, fatty acids, and gluconeogenesis and anaplerotic, each consisting of 5 to 18 genes. Three separate fermentations were conducted, with samples from each run being assayed using microarrays. The resulting data, in the form of gene-specific signal intensity values, are summarized in Table 1. These data were then used as input for the semi-parametric algorithm, the results of which are shown in Table 2 and Figure 2. Table 2 provides the p-values for each pathway while Figure 2 provides plots of the estimated density functions for both E. coli strains vs. log₂ of the average intensity values. Gene-specific intensity values from the three different arrays were averaged and converted to log₂ values for each E. coli strain. (This log₂ conversion is the reason for the difference between the intensity values listed in Table 1 and the values found along the x-axis in the plots of Figure 2.)
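As a minimal sketch of the averaging and log₂ conversion described above, the following Python snippet applies both steps to the aceA intensities listed in Table 1. The function name `log2_mean_intensity` is illustrative only and is not taken from the original analysis.

```python
import math

def log2_mean_intensity(intensities):
    """Average replicate signal intensities, then take log2 of the mean."""
    mean = sum(intensities) / len(intensities)
    return math.log2(mean)

# aceA signal intensities from Table 1 (three batches per strain)
ace_a_bl21 = [1961, 920, 9329]
ace_a_jm109 = [257, 332, 158]

bl21_x = log2_mean_intensity(ace_a_bl21)    # mean intensity is 4070
jm109_x = log2_mean_intensity(ace_a_jm109)  # mean intensity is 249
print(round(bl21_x, 2), round(jm109_x, 2))
```

The resulting values (close to 12 and 8) are on the scale of an x-axis of log₂ average intensities, whereas the raw intensities in Table 1 span several orders of magnitude.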

Figure 1. Bacterial growth, acetate production, and glucose consumption of E. coli strains at high initial glucose concentration: (A) E. coli BL21; (B) E. coli JM109. The arrows indicate the sampling time for microarray analysis.

Table 1.

Microarray data in the form of normalized signal intensities; used as input for the algorithm

Gene Symbol	BL21 Batch 1	BL21 Batch 2	BL21 Batch 3	JM109 Batch 1	JM109 Batch 2	JM109 Batch 3	Gene Symbol	BL21 Batch 1	BL21 Batch 2	BL21 Batch 3	JM109 Batch 1	JM109 Batch 2	JM109 Batch 3
Glyoxylate Shunt							Pentose Phosphate
aceA	1961	920	9329	257	332	158	gcd	164	70	365	551	973	588
aceB	1427	2663	5448	246	245	145	gntK	2548	5833	640	37	206	195
aceK	312	196	446	149	238	197	gntP	374	323	405	232	246	139
iclR	159	323	656	349	92	171	gntR	212	687	462	858	807	692
fadR	498	490	485	556	317	526	gntT	4158	6040	2235	337	553	308
fruR	456	273	242	229	47	225	gntU_1	234	622	268	73	93	109
himA	12631	5377	12753	4146	3491	3227	gntU_2	426	669	145	55	145	152
himD	7171	5304	20350	5166	3517	5730	zwf	859	1189	757	2074	1770	1781
TCA Cycle							gnd	633	1432	941	5216	4284	3423
acnA	2332	661	1361	1831	2919	1847	rpe	115	787	353	235	188	213
acnB	4671	3524	6634	2900	3082	1661	rpiA	1802	1396	1462	1356	1341	1382
icdA	4547	9686	10406	8756	7511	4768	rpiB	551	566	399	355	387	225
sucA	4977	6101	2902	2870	6574	728	talA	4304	372	3102	4366	5600	1778
sucB	7728	9698	5137	3958	6922	915	talB	2554	2446	2022	1804	2014	2295
sucC	14842	14685	8499	6757	7490	1866	talC	86	38	44	16	15	23
sucD	2393	2615	1849	1845	1593	603	tktA	924	1149	843	969	559	300
sdhA	4126	8633	5329	2941	7560	1955	tktB	1759	129	1220	1989	3296	784
sdhB	2050	3725	1454	1046	2427	629	eda	928	1697	928	1537	1369	1214
sdhC	1082	4757	1500	331	971	430	Fatty Acid
sdhD	4090	9170	5372	1104	4336	1628	fabA	207	845	693	1070	834	1210
fumA	3702	4071	4013	1340	3395	1074	fabB	3785	5922	5729	5061	5983	3950
mdh	8393	7132	7670	5801	3695	1437	fabD	1486	2355	1451	1570	1095	822
gltA	7182	10969	8510	3085	8217	2575	fabF	178	931	498	356	153	333
Glycolysis Pathway							fabG	2683	4035	2195	2405	1855	1672
glk	542	1368	620	383	422	350	fabH	988	1581	1207	1476	1203	1041
pgi	1337	833	534	1699	1530	906	fabI	839	1592	3343	4860	2927	2733
pfkA	1955	1636	1709	2548	2003	2386	fabZ	778	2424	1699	1705	1378	2064
pfkB	2084	503	1639	1017	1501	778	fadA	301	164	2175	252	449	215
fba	2013	3185	2408	3729	3722	1199	fadB	904	1256	6194	78	282	85
tpiA	2490	1799	2241	1329	1427	1394	fadD	1190	915	2224	366	407	220
gapA	4684	15341	7454	15938	14089	7851	fadL	2219	744	5804	31	28	85
pgk	1749	2640	2478	4521	4491	2979	fadR	498	490	485	556	317	526
gpmA	2437	2384	3860	4215	4107	2116	Gluconeogenesis and anaplerotic pathway
eno	4677	3717	2965	6521	3889	2980	fbp	2062	2400	2871	1805	1166	870
pykA	1802	1812	2527	1044	1065	978	pckA	5721	4041	5127	1929	1955	563
pykF	2687	1100	1724	1695	1302	2060	ppsA	2162	1261	2053	455	389	270
pyrB	519	363	259	240	244	259	ppc	728	3105	607	1099	685	763
pgm	607	1176	644	533	515	430	sfcA	195	584	179	810	721	473
manA	693	764	687	1054	1420	890
mgsA	1541	2058	1814	717	505	623

Table 2.

Results of the semi-parametric algorithm applied to normalized microarray data from 6 hybridized oligonucleotide arrays

Pathway	n	α₁	β₁	χ₁	p-value
Glyoxylate Shunt	24	3.284	−0.344	7.325	0.0068
TCA Cycle	42	8.827	−0.755	12.303	0.0005
Glycolysis	48	0.835	−0.079	0.254	0.6142
Pentose Phosphate	54	0.994	−0.108	1.090	0.2964
Fatty Acid	39	3.327	−0.337	5.111	0.0238
Gluconeogenesis and Anaplerotic	15	5.712	−0.567	3.559	0.0592

Symbols in the table are described as follows:

n = number of data points for a given pathway

α₁ = calculated parameter as defined in Eq. (1)

β₁ = calculated parameter as defined in Eq. (1) and the main component of null hypothesis

χ₁ = test statistic as defined in Eq. (2) used to quantify similarities between data distributions

p-value = the probability, computed under the null hypothesis of equal distribution, of obtaining a test statistic at least as large as the observed χ₁.

Figure 2. Comparison of the reference density function (E. coli BL21) and the distortion density function (E. coli JM109) vs. log₂ of average intensity values for each of the following pathways: (a) Glyoxylate shunt, (b) TCA cycle, (c) Glycolysis, (d) Pentose phosphate, (e) Fatty acid, (f) Gluconeogenesis & Anaplerotic

As shown in Table 2, the glyoxylate shunt, TCA cycle, and fatty acid pathways are distributed differently between the two E. coli strains because their p-values (0.0068, 0.0005, and 0.0238, respectively) are smaller than 0.05, the limit that was set to correspond to a likelihood of occurrence of 5% [13]. The null hypothesis of equal distribution is accepted for p-values greater than this limit and rejected for p-values below it. In other words, the genes that collectively constitute each of the three pathways listed above are being expressed differently between the two E. coli strains, and differences of this size would be less than 5% likely to arise by chance if the two distributions were equal, taking also into consideration the inherent variability between slides, sample preparation, etc. [14]. In fact, the glyoxylate shunt and the TCA cycle have such low p-values that this likelihood is a fraction of 1% (0.68% and 0.05%, respectively). Figures 2(a), 2(b), and 2(e) graphically illustrate the differences between the two E. coli strains for the three pathways. No point-specific overlaps or structural similarities are apparent in any of these figures.

The gluconeogenesis and anaplerotic pathway has a p-value only slightly larger than the limit of 0.05 (0.0592) and therefore the genes in this pathway are also being expressed differently between the two strains, but not as significantly as the previously mentioned pathways. Figure 2(f) highlights the differences and similarities between the two strains for this pathway. Despite some common features between the two curves, such as the slope of the initial ascent, there are several important differences, such as the well-defined peak in the E. coli JM109 curve and the rapid descent of the curve for higher values of x when compared with the E. coli BL21 curve.

For the glycolysis and the pentose phosphate pathways, no differences were apparent, as evidenced by their relatively large p-values (0.6142 and 0.2964, respectively). In both Figures 2(c) and 2(d) the curve for one strain traces the curve for the other strain. Both figures have points of overlap and nearly identical shapes; therefore, the genes constituting each of these two pathways behave similarly between the two E. coli strains.
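The p-values in Table 2 can be related to the χ₁ column with a short calculation. The sketch below assumes that each p-value is the upper-tail probability of a chi-square distribution with one degree of freedom, as stated in the statistical formulation in Materials and Methods. The function name `chi2_1df_pvalue` and the constant `ALPHA` are illustrative names, not part of the original software.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

# Test statistics (chi_1 column) copied from Table 2
table2_stats = {
    "Glyoxylate Shunt": 7.325,
    "TCA Cycle": 12.303,
    "Glycolysis": 0.254,
    "Pentose Phosphate": 1.090,
    "Fatty Acid": 5.111,
    "Gluconeogenesis and Anaplerotic": 3.559,
}

ALPHA = 0.05  # significance limit used in the text

for pathway, stat in table2_stats.items():
    p = chi2_1df_pvalue(stat)
    verdict = "below the 0.05 limit" if p < ALPHA else "above the 0.05 limit"
    print(f"{pathway}: p = {p:.4f} ({verdict})")
```

Under this assumption the computed values agree with the tabulated p-values to within rounding, including the gluconeogenesis and anaplerotic pathway, which falls just above the 0.05 limit.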

Discussion

Microarrays are an efficient tool for the identification of gene transcription differences between cells, tissues, and microorganisms [15]. They have been used extensively for studying genotypic causes for phenotypic differences, divergent responses to environmental pressures, and evolutionary trends. A typical array experiment generates a large amount of data that requires statistical methods to perform searches to detect and quantify differences between gene expression levels [16, 17]. In most cases, the results of such a search are a list of up-regulated and down-regulated genes, relative to a reference or control gene expression pattern.

E. coli JM109 and E. coli BL21 are two strains commonly used for recombinant protein production. These two strains differ in their response to glucose concentration, especially excess glucose. E. coli JM109 excretes high levels of acetate when the glucose concentration exceeds a few grams per liter, while E. coli BL21 is insensitive to glucose concentration and excretes low levels of acetate even when the glucose concentration is above 30 grams per liter. When the central carbon metabolism of these strains was studied carefully by enzymatic activity, metabolic flux analysis, and cDNA arrays [11], several differences in the metabolic pathways were identified. The cDNA array analysis enabled us to determine in which strain and under what growth conditions the transcription of a particular gene was higher or lower. However, it was not possible to directly compare and evaluate the expression levels of groups of genes constituting specific pathways such as glycolysis, the TCA cycle, or the glyoxylate shunt. This comparison was needed for a comprehensive picture of the strains’ metabolic behavior, and it became possible by using a semi-parametric algorithm. Results generated from the implementation of this algorithm indicated that the transcription of the glycolysis and the pentose phosphate pathways was comparable in the two strains; however, clear differences were identified in the transcription of the TCA cycle, glyoxylate shunt, and fatty acid pathways and, to a lesser extent, in the gluconeogenesis and anaplerotic pathway. Differences in the growth rate and the metabolic patterns, including acetate formation, between the two E. coli strains can therefore be attributed to the combined effects of the glyoxylate shunt, TCA cycle, gluconeogenesis, and fatty acid pathways. In fact, this difference in gene transcription patterns is very likely the reason for efficient utilization of glucose through both the TCA cycle and the glyoxylate shunt, as well as assimilation of acetate via gluconeogenesis and fatty acid biosynthesis, in E. coli BL21. The work presented here therefore supports previous information demonstrating that the glyoxylate shunt enzymes were more active in E. coli B than in K and that certain TCA, gluconeogenesis and anaplerotic, and fatty acid metabolism genes are transcribed differently between the two strains. In that earlier work, however, it was not possible to demonstrate that the transcription patterns of an entire pathway were different.

Although other hypothesis testing methods are available, the semi-parametric algorithm was chosen because it has been shown to be robust, relatively insensitive to outliers, and readily capable of analyzing an assortment of number sets, irrespective of their origin [18]. The work presented focused on using results from hybridized oligonucleotide arrays as input for the semi-parametric algorithm, but the overall process presented here is applicable to cDNA arrays as well. This, however, will require a universal control sample to be used with each dual-channel cDNA slide to allow cross-comparisons, a limitation not encountered with single-channel oligonucleotide arrays since only one sample is hybridized per slide instead of two samples (test and control).

The results of this study demonstrate the benefit of using the semi-parametric algorithm to validate and expand upon information obtained from genomic microarrays. The purpose of this study was to present a comprehensive approach involving the implementation of a statistical method to microarray data in order to validate and expand upon previous results. As such, the method presented offers researchers another way to decipher microarray data and explore genomic differences in the context of entire biological pathways.

Materials and Methods

1. Statistical formulation

The semi-parametric method used in this work generates both numerical values (p-values) and graphical illustrations to highlight distinctions between genes or groups of genes [12, 19, 20, 21]. Suppose that a set of gene-specific intensity values labeled x₁₁, x₁₂, …, x₁ₘ is distributed such that a probability density function g₁(x) describes the distribution of these m numbers. Another set of gene-specific intensity values labeled x₂₁, x₂₂, …, x₂ₙ is distributed such that a probability density function g(x) describes the distribution of these n numbers [22]. As part of this mathematical construct, m and n may be different; however, for some h(x) the following relationship is assumed:

g_1(x) = \exp\{\alpha_1 + \beta_1 h(x)\}\, g(x)

Eq. (1)

where α₁ and β₁ are unknown parameters that can be estimated from the data sets (i.e. x₁₁, x₁₂, …, x₁ₘ, x₂₁, x₂₂, …, x₂ₙ), and h(x) is a function that must be specified. This equation expresses g(x) as a baseline or reference density while calculating the deviation or ‘tilt’ associated with g₁(x) in terms of the reference density. In other words, Eq. (1) illustrates the mathematical relationship between g(x) and g₁(x), the reference density function and the deviation density function, respectively.
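To make the ‘tilt’ in Eq. (1) concrete, consider a special case that is assumed here purely for illustration (the method itself does not require normality): the reference density g(x) is normal with mean μ and variance σ², and h(x) = x. Completing the square in the exponent gives:

```latex
\begin{aligned}
g_1(x) &= \exp\{\alpha_1 + \beta_1 x\}\,
          \frac{1}{\sqrt{2\pi\sigma^2}}
          \exp\!\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} \\
       &= \exp\!\left\{\alpha_1 + \beta_1\mu + \tfrac{1}{2}\beta_1^2\sigma^2\right\}\,
          \frac{1}{\sqrt{2\pi\sigma^2}}
          \exp\!\left\{-\frac{\left(x-\mu-\beta_1\sigma^2\right)^2}{2\sigma^2}\right\}
\end{aligned}
```

Because g₁(x) must integrate to one, α₁ = −β₁μ − β₁²σ²/2, and g₁(x) is then a normal density with mean μ + β₁σ² and the same variance. In this illustrative case a negative β₁ shifts the distribution toward lower values, and β₁ = 0 gives α₁ = 0 and g₁(x) = g(x).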

This set-up allows for testing the null hypothesis of equal distribution, H₀: β₁ = 0, under which g₁(x) = g(x) (β₁ = 0 forces α₁ = 0 because both densities must integrate to one). Framing the comparison as a hypothesis test provides the basis for the subsequent analysis [19]. Accepting the null hypothesis, β₁ = 0, signifies that g₁(x) and g(x) are distributed equally. If the null hypothesis is rejected, then g₁(x) ≠ g(x); hence there is a difference between the distributions of the two data sets [20]. In order to test the hypothesis, a test statistic, χ₁, is designated. Under the null hypothesis it is asymptotically distributed as chi-square with one degree of freedom and adheres to the following equation:

\chi_1 = (m + n)\,\frac{\rho_1}{(1 + \rho_1)^2}\,\widehat{\mathrm{Var}}(h(t))\,\hat{\beta}_1^{2}

Eq. (2)

where ρ₁ = m/n, V̂ar(h(t)) is the estimate of the variance of h(t) with respect to the reference distribution g(x), and β̂₁ is the estimate of β₁ [18]. For our microarray data sets, m = n = the number of genes in a particular pathway times the number of arrays hybridized per E. coli strain, which was 3 in this case [22].
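The arithmetic of Eq. (2) can be sketched in a few lines of Python. The sample sizes follow the rule just given (8 glyoxylate shunt genes in Table 1 times 3 arrays) and β̂₁ is taken from Table 2, but the variance estimate `var_h` is a hypothetical placeholder because the paper does not report it. The function name `chi1_statistic` is likewise illustrative.

```python
def chi1_statistic(m, n, var_h, beta_hat):
    """Eq. (2): (m + n) * rho / (1 + rho)^2 * Var(h(t)) * beta_hat^2."""
    rho = m / n
    return (m + n) * rho / (1.0 + rho) ** 2 * var_h * beta_hat ** 2

# Glyoxylate shunt: 8 genes in Table 1, 3 arrays hybridized per strain
genes, arrays = 8, 3
m = n = genes * arrays   # 24 data points per strain, matching n in Table 2

beta_hat = -0.344        # beta_1 for the glyoxylate shunt, from Table 2
var_h = 5.0              # HYPOTHETICAL variance estimate, not reported in the paper

stat = chi1_statistic(m, n, var_h, beta_hat)
print(m, round(stat, 3))
```

When m = n, ρ₁ = 1 and the factor (m + n)ρ₁/(1 + ρ₁)² reduces to (m + n)/4, so the statistic grows linearly with the number of data points for a fixed variance and β̂₁.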

The semi-parametric algorithm makes no assumptions regarding normal distributions for either g(x) or g₁(x). The only assumption made is for h(x). The choice of h(x) = x is quite satisfactory for symmetric or nearly symmetric probability distributions whereas h(x) = log x is adequate for skewed distributions. In our analysis we utilized h(x) = x [21].
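One standard way to estimate α₁ and β₁ in a two-sample model of the form of Eq. (1), following the case-control logistic regression connection discussed in [19], is to pool both samples and fit a logistic regression of sample membership on h(x). The sketch below does this with a plain Newton-Raphson iteration. It is an illustration under that assumption, not the authors' implementation; the function name `fit_density_ratio` and the sample values are hypothetical.

```python
import math

def fit_density_ratio(distorted, reference, h=lambda x: x, iters=50):
    """Estimate (alpha_1, beta_1) of Eq. (1) by Newton-Raphson logistic regression.

    Points from `distorted` (density g_1) get label 1 and points from
    `reference` (density g) get label 0. The fitted logistic intercept
    equals alpha_1 + log(m/n), so log(m/n) is subtracted at the end.
    """
    m, n = len(distorted), len(reference)
    data = [(h(x), 1.0) for x in distorted] + [(h(x), 0.0) for x in reference]
    a, b = 0.0, 0.0  # logistic intercept and slope
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, y in data:
            p = 1.0 / (1.0 + math.exp(-(a + b * t)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * t      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a - math.log(m / n), b

# Hypothetical log2-scale values: the distorted sample is shifted downward
reference_sample = [9.0, 10.0, 11.0, 12.0, 13.0]
distorted_sample = [8.0, 9.0, 10.0, 11.0, 12.0]

alpha_hat, beta_hat = fit_density_ratio(distorted_sample, reference_sample)
print(alpha_hat, beta_hat)
```

With h(x) = x, a distorted sample that sits below the reference yields a negative β̂₁, and two identical samples yield β̂₁ = 0. The iteration does not converge if the two samples are perfectly separated, a case this sketch does not handle.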

The algorithm quantifies the level of similarity between g(x) and g₁(x) by numerical and graphical means. The numerical approach calculates the p-values resulting from the hypothesis test discussed earlier. The graphical approach (seen in Figure 2) is an integral part of the analysis and not simply an illustration. It highlights the differences between the two E. coli strains for a specific biological pathway, indirectly correlates p-values with the overall structure and predictability of the density functions, and demonstrates how a series of numbers, in this case the intensity values, can be averaged and combined into a new data set independent of the original source, in this case the genes. In other words, the graphical approach both visualizes and interprets the results generated by the semi-parametric method. The greater the similarity between the distributions of a given pathway, the closer the plots of the estimated g(x) and g₁(x) are to one another, and the higher the corresponding p-value.
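To give a feel for how a density curve over log₂ average intensities can be produced, the sketch below uses an ordinary Gaussian kernel density estimate. This is a stand-in technique chosen for illustration; the semi-parametric method estimates g(x) and g₁(x) from the pooled data, and its estimator may differ. The bandwidth value and the function name `gaussian_kde` are hypothetical choices.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density

# Glyoxylate shunt intensities for E. coli BL21 from Table 1 (three batches per gene)
bl21_glyoxylate = {
    "aceA": [1961, 920, 9329],
    "aceB": [1427, 2663, 5448],
    "aceK": [312, 196, 446],
    "iclR": [159, 323, 656],
    "fadR": [498, 490, 485],
    "fruR": [456, 273, 242],
    "himA": [12631, 5377, 12753],
    "himD": [7171, 5304, 20350],
}

# Average the three batches per gene, then convert to log2
log2_means = [math.log2(sum(v) / len(v)) for v in bl21_glyoxylate.values()]

BANDWIDTH = 0.8  # HYPOTHETICAL smoothing parameter
density = gaussian_kde(log2_means, BANDWIDTH)

# Evaluate the curve on a grid of x-values (6.0 to 15.0), as one would before plotting
grid = [6.0 + 0.5 * i for i in range(19)]
curve = [(x, density(x)) for x in grid]
```

Repeating the same steps for the JM109 columns would give a second curve to overlay on the first, which is the kind of visual comparison the graphical approach relies on.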

2. Bacterial strains

The two E. coli strains studied were BL21(λDE3) (F⁻, ompT, hsdS_B (r_B−,m_B+), dcm, gal, (DE3), Cm^r) and JM109(DE3) (endA1, recA1, gyrA96, thi, hsdR17 (r_k⁻,m_k⁺), relA1, supE44,Δλ⁻, Δ(lac-proAB), [F', traD36, proAB, lacI^qZΔM15], λDE3). Both strains were obtained from Promega Corp. (Madison, WI).

3. Fermentation and sample preparation

Both strains were grown at 37°C in modified LB medium containing 10 g/L tryptone, 5 g/L yeast extract (15 g/L for JM109), 5 g/L NaCl, and 5 g/L K₂HPO₄. After sterilization, 10 mM MgSO₄, 1 mL/L trace metal solution, and 40 g/L glucose were added. Overnight cultures grown at 37°C were used to inoculate 4.0 L of medium in a B. Braun fermentor equipped with data acquisition and a control system. The cultures were grown to high cell density, the pH was controlled at 7.0 by the addition of 50% NH₄OH, and dissolved oxygen was kept above 30% of saturation at all times.

Samples for total RNA purification were collected at the late logarithmic phase of growth, indicated by arrows in Figure 1. Next, the samples were centrifuged at 14,000 g for 10 min at 4°C; the supernatant was removed and the pellets were quickly frozen with dry ice and stored at −80°C.

4. Total RNA preparation

Total RNA was isolated using a MasterPure RNA Purification Kit (Epicentre Technologies, Madison WI) according to the manufacturer’s protocol (Kit MCR 85102). Isolated RNA was further purified with an RNeasy Kit 75144 (Qiagen). Overall RNA concentration was determined by measuring absorbance at 260 nm (A₂₆₀) using a GeneQuant Pro (Amersham Biosciences). The quality of the purified RNA samples was checked by their absorbance ratios (A₂₆₀/A₂₈₀), which were 1.85–1.95, and by running a 1% agarose/formaldehyde denaturing gel. To further ensure equivalency between individual samples, the 23S and 16S ribosomal RNA (rRNA) from each sample were analyzed with an Agilent 2100 Bioanalyzer (Agilent Technologies). The intensity of each band was calculated, and the rRNA ratio (23S/16S) for each sample was found to be greater than 1.5.

5. Oligonucleotide microarrays

Standard methods available from Affymetrix (Santa Clara, CA) for cDNA synthesis, fragmentation, and end-terminus biotin labeling starting with a total RNA (10 μg) sample were used. The biotin-labeled cDNA was hybridized to E. coli Affymetrix Antisense Genome Arrays at 45 °C for 16 hours as recommended in the GeneChip technical manual (Affymetrix). Hybridized arrays were stained with streptavidin-phycoerythrin using an Affymetrix Fluidic Station. The GeneChips were scanned using an Affymetrix/Hewlett–Packard GeneArray GC2500 Scanner. The signal intensity was normalized using Affymetrix Microarray Suite Software (version 4.0).

Acknowledgments

Funding was provided by the National Institute of Diabetes & Digestive & Kidney Diseases (NIDDK), National Institutes of Health (NIH).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

1. Murphy D. Gene expression studies using microarrays: principles, problems, and prospects. Adv Physiol Educ. 2002;26:256–270. doi: 10.1152/advan.00043.2002.
2. Tamayo P, et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA. 1999;96:2907–2912. doi: 10.1073/pnas.96.6.2907.
3. Burke HB. Discovering patterns in microarray data. Mol Diagn. 2000;5:349–357. doi: 10.1007/BF03262096.
4. Dopazo J, Zanders E, Dragoni I, Amphlett G, Falciani F. Methods and approaches in the analysis of gene expression data. J Immunol Methods. 2001;250:93–112. doi: 10.1016/s0022-1759(01)00307-6.
5. Saeed AI, et al. TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003;34:374–378. doi: 10.2144/03342mt01.
6. Yang HH, Hu Y, Buetow KH, Lee MP. A computational approach to measuring coherence of gene expression in pathways. Genomics. 2004;84:211–217. doi: 10.1016/j.ygeno.2004.01.007.
7. Shiloach J, Kaufman J, Guillard AS, Fass R. Effect of glucose supply strategy on acetate accumulation, growth, and recombinant protein production by Escherichia coli BL21 (λDE3) and Escherichia coli JM109. Biotechnol Bioeng. 1996;49:421–428. doi: 10.1002/(SICI)1097-0290(19960220)49:4<421::AID-BIT9>3.0.CO;2-R.
8. Van de Walle M, Shiloach J. Proposed mechanism of acetate accumulation in two recombinant Escherichia coli strains during high density fermentation. Biotechnol Bioeng. 1998;57:71–78. doi: 10.1002/(sici)1097-0290(19980105)57:1<71::aid-bit9>3.0.co;2-s.
9. Noronha SB, Yeh HJ, Spande TF, Shiloach J. Investigation of the TCA cycle and the glyoxylate shunt in Escherichia coli BL21 and JM109 using 13C-NMR/MS. Biotechnol Bioeng. 2000;68:316–327.
10. Phue J, Shiloach J. Transcription levels of key metabolic genes are the cause for different glucose utilization pathways in E. coli B (BL21) and E. coli K (JM109). J Biotechnol. 2004;109:21–30. doi: 10.1016/j.jbiotec.2003.10.038.
11. Phue J, Noronha SB, Bhattacharyya R, Wolfe AJ, Shiloach J. Glucose metabolism at high density growth of E. coli B and E. coli K: differences in metabolic pathways are responsible for efficient glucose utilization in E. coli B as determined by microarrays and Northern blot analyses. Biotechnol Bioeng. 2005;90:805–820. doi: 10.1002/bit.20478.
12. Qi Y. Classification of Microarray Data, Department of Mathematics. College Park, MD: University of Maryland; 2002.
13. Conway T, Schoolnik GK. Microarray expression profiling: capturing a genome. Mol Microbiol. 2003;47:879–889. doi: 10.1046/j.1365-2958.2003.03338.x.
14. Ideker T, Thorsson V, Siegel AF, Hood LE. Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. J Comput Biol. 2000;7:805–817. doi: 10.1089/10665270050514945.
15. Mantripragada KK, Buckley PG, Diaz de Stahl T, Dumanski JP. Genomic microarrays in the spotlight. Trends Genet. 2004;20:87–93. doi: 10.1016/j.tig.2003.12.008.
16. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863–14868. doi: 10.1073/pnas.95.25.14863.
17. Quackenbush J. Computational analysis of microarray data. Nat Rev Genet. 2001;2:418–427. doi: 10.1038/35076576.
18. Fokianos K, Kedem B, Qin J, Short D. A semiparametric approach to the one way layout. Technometrics. 2001;43:56–65.
19. Qin J, Zhang B. A goodness of fit test for the logistic regression model based on case-control data. Biometrika. 1997;84:609–618.
20. Gilbert PB, Lele SR, Vardi Y. Maximum likelihood estimation in semiparametric selection bias models with application to AIDS vaccine trials. Biometrika. 1999;86:27–43.
21. Kedem B, Wolff DB, Fokianos K. Statistical Comparison of Algorithms. IEEE Trans Instrum Meas. 2004;53:770–776.
22. Gagnon R. Certain Computational Aspects of Power Efficiency and State Space Models, Department of Mathematics. College Park, MD: University of Maryland; 2005.
