Experimental and computational analysis of transcriptional start sites in the cyanobacterium Prochlorococcus MED4

Jörg Vogel; Ilka M Axmann; Hanspeter Herzel; Wolfgang R Hess

doi:10.1093/nar/gkg398

2003 Jun 1;31(11):2890–2899. doi: 10.1093/nar/gkg398

PMCID: PMC156731 PMID: 12771216

Abstract

In contrast to certain model eubacteria, little is known as to where transcription is initiated in the genomes of cyanobacteria, which are largely distinct from other prokaryotes. In this work, 25 transcription start sites (TSS) of 21 different genes of Prochlorococcus sp. MED4 were determined experimentally. The data suggest more than one TSS for the genes ftsZ, petH, psbD and ntcA. In contrast, the rbcL-rbcS operon encoding ribulose 1,5-bisphosphate carboxylase/oxygenase lacks a detectable promoter and is co-transcribed with the upstream located gene ccmK. The entire set of experimental data was used in a genome-wide scan for putative TSS in Prochlorococcus. A –10 element could be defined, whereas at the –35 position there was no element common to all investigated sequences. However, splitting the data set into sub-classes revealed different types of putative –35 boxes. Only one of them resembled the consensus sequence TTGACA recognized by the vegetative σ factor (σ⁷⁰) of enterobacteria. Using a scoring matrix of the –10 element, more than 3000 TSS were predicted, about 40% of which were estimated to be functional. This is the first systematic study of transcription initiation sites in a cyanobacterium.

INTRODUCTION

Fossil and molecular evidence suggests that cyanobacteria are a group of eubacteria that can possibly be traced back up to 3.5 billion years (1). Photosynthesis evolved first in an ancient cyanobacterium. It was the photosynthetic activity of cyanobacteria over hundreds of millions of years that produced all the oxygen in the atmosphere of this planet, turning its originally reducing conditions into oxidising ones ∼1.5 billion years ago. As an endosymbiont, one or several cyanobacteria gave rise to the eukaryotic forms of photoautotrophs, among them all higher plants (2,3). Estimates suggest that about 4500 (18%) of all nuclear genes of extant land plants might be derived from the cyanobacterial endosymbiont (4,5). Due to this history, cyanobacteria constitute an unusually diverse and heterogeneous group of prokaryotes, yet they form a coherent systematic group which is distinct in many features from other bacteria.

Transcription has been investigated in several bacteria, with Escherichia coli being by far the best studied example. Escherichia coli uses seven different species of σ factors to modulate promoter activation, among them the abundant σ factor of housekeeping genes, σ⁷⁰ (reviewed in 6). A consensus sequence of σ⁷⁰-dependent promoters has been established, and is defined by two consensus hexamers, TTGACA and TATAAT, centred 35 and 10 bp upstream of the site of transcription initiation (7). To date (7th January 2003) the E.coli database REGULONDB (http://kinich.cifn.unam.mx:8850/db/regulondb_intro.frameset) contains information on 4641 predicted promoters and 2326 predicted transcription units (8).

While the basic principles of promoter recognition and geometry of RNA polymerase are similar for all eubacteria, the actual sequences of promoter elements might not be. In cyanobacteria, this situation is further complicated by the presence of a truncated RNA polymerase subunit β′ (or γ) encoded by rpoC1 (9) and an additional subunit β″ (or δ) encoded by rpoC2. These subunits correspond to the N- and C-terminal parts of the β′ subunit of other eubacteria plus an additional protein domain of ∼70 kDa, and are found in this split form in all cyanobacteria including Prochlorococcus, as well as in plant plastids. Schyns et al. (10) showed that purified cyanobacterial RNA polymerase could not precisely initiate transcription at an E.coli promoter and, reciprocally, that the E.coli holoenzyme could not initiate precisely at the promoter of a cyanobacterial phycobiliprotein gene. In other words, while RNA polymerases from enterobacteria and cyanobacteria share similar domains and subunits, they are not directly interchangeable. Thus, the known facts about E.coli promoter motifs cannot be extrapolated directly to any cyanobacterium.

Most of the knowledge about cyanobacterial transcription start sites (TSSs) to date comes from mapping of individual promoters in Synechocystis PCC 6803, the first and for many years the only totally sequenced cyanobacterial genome (11,12). Recently, several other cyanobacterial genome analyses have been completed. Of the latter, we chose to analyse transcription initiation sites from Prochlorococcus sp. MED4 for the following reasons: it represents the ecologically most important group of chlorophyll b-possessing cyanobacteria (13); its genome is the most minimised of any free-living photoautotroph, suggesting that it may possess only that part of the regulatory network of a cyanobacterial cell which is most fundamental and widespread. Therefore, our data provide information not only for comparison to related marine cyanobacteria but also to many other cyanobacteria.

MATERIALS AND METHODS

Culture conditions

Prochlorococcus sp. MED4 was grown in artificial seawater medium described previously (14) with a trace metal mix derived from medium Pro99 (Chisholm, personal communication). This modification resulted in the following final concentrations: 1.17 mM EDTA; 0.008 mM ZnCl₂; 0.005 mM CoCl₂; 0.09 mM MnCl₂; 0.003 mM Na₂MoO₄; 0.01 mM Na₂SeO₃; 0.01 mM NiCl₂; 1.17 mM FeCl₃.

Cultures were kept under 10 µmol quanta m⁻² s⁻¹ continuous blue light at 19 ± 1°C and harvested by centrifugation at 10 200 g for 10 min in a Dupont RC5C centrifuge.

Analysis of RNA

Total RNA was isolated as previously described (15). Transcriptional start sites were determined by 5′-RACE following the method of Bensing et al. (16) with modifications outlined in detail in Argaman et al. (17). Dephosphorylation of RNA (10 µg) prior to ligation (control) was performed using 10 U calf intestine phosphatase (AP Biotech, Sweden) in a buffer containing 50 mM Tris–HCl pH 7.9, 100 mM NaCl, 10 mM MgCl₂, 1 mM DTT, at 37°C for 30 min, followed by phenol/chloroform extraction and ethanol precipitation. All enzymatic treatments of RNA were performed in the presence of 2 U Super RNase Inhibitor (Ambion, USA). An amount of cDNA equalling 75 ng of starting RNA was used as template in 30 µl PCR reactions, of which 15 µl (equal loading throughout the whole set of experiments) was run on 3% Nusieve agarose gels. Amplified fragments were cloned into plasmid pCR 2.1-TOPO (Invitrogen) or pGEMT (Promega). After transformation into E.coli XL1-Blue, plasmid inserts were screened by colony PCR as described previously (17). The PCR fragments were then purified on QIAquick spin columns (Qiagen, Germany) and sequenced using an ABI 373 automatic DNA sequencer (Applied Biosystems).

List of oligonucleotides

For each gene to be tested, one deoxyribonucleotide was synthesized for reverse transcription (A-oligo) and a second, nested deoxyribonucleotide for the subsequent RACE PCR (B-oligo). Following the identification of a TSS, for several genes a third deoxyribonucleotide (C-oligo) was employed that overlapped the TSS, with its 5′ end located at least 10 nt upstream of the TSS. These served in a second series of RACE reactions as controls either to verify the detected TSS or to detect another promoter located further upstream.

A complete list of all oligonucleotides used is part of the Supplementary Material, and is also available at http://www.biologie.hu-berlin.de/∼genetics/hess/hessproj.html.

Bioinformatics: datasets

The total genome sequence from the high light-adapted strain Prochlorococcus sp. MED4, which is very closely related to Prochlorococcus marinus ssp. pastoris PCC 9511 (14), was used in the computer annotated version of November 2002 downloaded from an ftp site of the Joint Genome Institute (ftp://ftp.jgi-psf.org/pub/JGI_data/Microbial/prochlorococcus/). The Prochlorococcus sp. MED4 genome sequence is also available from GenBank (accession no. NZ_AAAW00000000). A dataset was derived by using the annotated genes and RNAs resulting in 1745 upstream regions (UPR) from –300 to –1 calculated relative to the annotated translation start site or the first nucleotide of functional RNAs. Following the removal of UPR < 50 bp, 1091 UPRs remained in the analysis. Because the sequence version of Prochlorococcus MED4 contained only one strand of the DNA, the UPRs of all genes and RNAs on the complementary strand were reverse-complemented. The sets of UPRs were created using only non-coding sequences by masking the annotated coding regions located on the same DNA strand as the annotated gene or RNA. The input data used for the in silico studies were 25 experimentally determined TSSs of 21 genes found by a systematic analysis of promoters in Prochlorococcus MED4 (Table 1).
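The extraction of upstream regions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PERL code used in the study: the function names (`upstream_region`, `collect_uprs`), the 1-based inclusive gene coordinates and the toy genome are hypothetical, and the masking of coding regions on the same strand is deliberately left out.

```python
# Complement table; case is preserved so lower-case annotation survives.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length=300):
    """Return up to `length` nt upstream of a gene, read 5'->3' on the gene's strand.

    Assumption: `start` and `end` are 1-based inclusive coordinates on the
    single (forward) strand stored in `genome`, with start <= end.
    """
    if strand == "+":
        lo = max(0, start - 1 - length)
        return genome[lo:start - 1]
    # Gene on the complementary strand: its upstream region lies to the
    # right on the forward strand and must be reverse-complemented.
    hi = min(len(genome), end + length)
    return reverse_complement(genome[end:hi])


def collect_uprs(genome, genes, length=300, min_len=50):
    """Build {gene name: UPR}, dropping regions shorter than `min_len`."""
    uprs = {}
    for name, start, end, strand in genes:
        upr = upstream_region(genome, start, end, strand, length)
        if len(upr) >= min_len:
            uprs[name] = upr
    return uprs


# Toy example (hypothetical genome and gene coordinates).
toy_genome = "ACGTTGCAAGGC"
toy_genes = [("geneA", 5, 8, "+"), ("geneB", 3, 6, "-")]
print(collect_uprs(toy_genome, toy_genes, length=3, min_len=3))
```

In a fuller version the masking step would shorten many regions, which is what makes the 50 bp minimum length a meaningful filter.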

Table 1. Experimentally determined mRNA 5′ ends.

Gene	Label	Sequence	Position relative to start codon	No. of clones
pcb	or1216	atccgtttgatgaaatataaagtattctcaaaacG	–22	3/3^a
psbA	or2010	cacattattgatgattttggactatatttatcaaA	–63	3/3^a
ftsZ	or1658 TSS2	ctacatatcagcttagtgagttcataatgcatccA	–27	2/2
ftsZ^b	or1658 TSS1	ttcactcaaatttttgacaagttaatatttaaggA	–79	4/4
psbD	or1562 TSS2	ccactttactctcaatctttgttacagtctcatcG	–27	3/3^a
psbD^b	or1562 TSS1	cctactaatttaagaattaatatagagtattattA	–124	2/2
ntcA	or2021 TSS1	aaatataagaattgaaaaatgttactgttgataca	–48	5/14
ntcA	or2021 TSS2	gttatttcttgctgtcttcttaagttttttaatAG	–12/–13	2/14
petE	or1000	cgatttagttccaaaaaccttgtaatatatataaA	–16	2/3
petB	or2061	caccaagtaagtgaattttaagtaagctttccatA	–21	2/2
rps12	or0380	ttagatatgactcgtaaaaggttatgatgttttgA	–26	3/3
ureE	or1426	taccttcccaagaagatttagccataattgttttC	–38	2/5
groES	or0427	tttaattgctgcttaacagattatgttatttaccA	–70	5/9
atpB	or1762	taacgaatataccctcgggatggtaatatttcgcA	–48	3/5
kaiB	or0505	atgtttgtttatcttcttattttatagtgtaattA	–49	6/10
rpl21	or1677	gacTaattacactataaaataagaagataaacaaa	–66	5/5
psbH	or0118	acaatttatgtattaactcttatacaataactaaa	–25	3/3
psaF	or1055	catatttgattcgatactgttttattttaataatA	–43	4/4
chlN	or1017	tgcttaaatccaaaagtttatgcaagcttgaagaA	–19	10/10
coaT	or0187	aaagtctataacatttattactatagtaattaatA	–35	2/3
ccmK	or1172	acttatcagtacgttatggaccattcttcggattG	–45^c	8/8
deaD	or0652	atttatctttacctccagatgttagattagtgatT	–82	2/11
ndhC	or0093	tacattcgaatctattttgttctaaattggtaatG	–177	7/12
petH	or0666 TSS2	tacaaattaatatacaaattgataatctcttagtA	–17	1/2
petH	or0666 TSS1	ttttataaaacacatacaaattaatatacaaattG	–31	1/2

Putative –10 regions are underlined. The first transcribed nucleotide is in upper case letters. The upper case T at position 4 of rpl21 and the italicised hexanucleotide indicate the positions of the first transcribed nucleotide and –10 region, respectively, of kaiB in reverse complementary orientation (overlapping promoters). The number of sequenced clones is given for each gene together with the frequency of the respective TSS.

^aConsistent with results of primer extension in Garczarek et al. (21).

^bFound with upstream primers spanning the more proximal TSS.

^c407 nt upstream of rbcL (label 1173) (25).

Scoring matrix

A set of sites can be used to create a scoring matrix with the nucleotides A, T, C and G as columns and the binding site positions as rows. Each entry of the matrix is determined from the logarithm of the ratio of actually observed frequencies and the number of expected nucleotides.

The conservation of a position is given by the sum of the scores of one column of the scoring matrix. Thus, if the sum is close to 2 bits, it is completely conserved; 0 bits stands for no conservation (compare Fig. 2). However, for small sets of known sites zeroes appear in frequency tables such as seen in Figure 2. Then it is appropriate to add pseudo-counts to each entry of the table (18). Thus we added 1 to each entry of the table in Figure 2 in order to develop the scoring matrix designated W. The entries in this scoring matrix (Fig. 3) were obtained by taking the logarithm (base 2) of the ratio of observed to expected frequencies (18), respecting the high AT content of 74% within the UPR of the MED4 genome (0.13 for C or G, 0.37 for A or T). As an example we take the first position: the frequent nucleotide T gets a score of

log₂{(20 + 1)/[0.37 × (25 + 4)]} = 0.969

whereas the rare nucleotide G, observed only once at this position (in the rpl21 box gaagat; Table 2), gets a score of

log₂{(1 + 1)/[0.13 × (25 + 4)]} = –0.915

Figure 2. Alignment of –10 boxes of 25 promoter sequences of experimentally determined promoters (left) and counted nucleotides at each position of the 25 aligned –10 boxes (right). The positions –12 to –7 were found for the majority of TSSs, except that it was –13 to –8 for *ntcA2*, *groES*, *rbcL*, *petH1* and *coaT* and –14 to –9 for *atpB* (see Tables 1 and 2).

Figure 3. Scheme of promoter prediction using the *kaiB* gene as an example. The strategy consisted of three steps: (i) raster, a regular expression search based on biological features; (ii) score, determining the weights of the potential boxes with a scoring matrix; (iii) filter, using a filter (cut-off value) to reduce the false positive rate of the predicted boxes. The numerical values have been rounded.
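The log-odds calculation described above can be reproduced with a short Python sketch. It is not the authors' PERL implementation; the names `build_matrix` and `score` are hypothetical, and the counts are taken from the 25 –10 boxes listed in Table 2 on the assumption that they match the alignment in Figure 2.

```python
import math

# The 25 experimentally supported -10 boxes, as listed in Table 2.
BOXES = (
    "tattct tatatt cataat taatat tacagt tagagt tactgt taagtt taatat "
    "taagct tatgat tatgtt taatat tatagt gaagat tacaat tatttt caagct "
    "tatagt cattct tagatt taaatt taatct taatat cataat"
).upper().split()

# Background frequencies for the AT-rich upstream regions (74% A+T).
BACKGROUND = {"A": 0.37, "T": 0.37, "C": 0.13, "G": 0.13}
PSEUDOCOUNT = 1


def build_matrix(boxes, background=BACKGROUND, pseudo=PSEUDOCOUNT):
    """Return one {nucleotide: log2-odds score} dict per box position."""
    n = len(boxes)
    matrix = []
    for pos in range(len(boxes[0])):
        column = [box[pos] for box in boxes]
        row = {}
        for nt, freq in background.items():
            observed = column.count(nt) + pseudo      # count plus pseudo-count
            expected = freq * (n + 4 * pseudo)        # e.g. 0.37 x (25 + 4)
            row[nt] = math.log2(observed / expected)
        matrix.append(row)
    return matrix


def score(box, matrix):
    """Sum the per-position scores of a 6-nt candidate box."""
    return sum(matrix[i][nt] for i, nt in enumerate(box.upper()))


W = build_matrix(BOXES)
print(round(W[0]["T"], 3))            # T at the first box position
print(round(score("tattct", W), 1))   # pcb box
print(round(score("gaagat", W), 1))   # rpl21 box, the lowest-scoring known box
```

With these inputs the three printed values should be 0.969, 4.1 and 2.5, matching the worked example for T and the scores given for pcb and rpl21 in Table 2.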

Software tools

All software tools used are freely available. The multiple alignments were performed with CLUSTALW 1.81 (19). The sequence logos were created with WebLogo (www.bio.cam.ac.uk/cgi-bin/seqlogo/logo.cgi), developed by Schneider (20). All other scripts were written in PERL and are available on request.

RESULTS

Experimental analysis

To experimentally determine sites of transcription initiation in Prochlorococcus sp. MED4, we employed a 5′-RACE technique first described by Bensing et al. (16). Primary transcripts in bacteria carry a 5′ triphosphate, which can be cleaved specifically by tobacco acid pyrophosphatase (TAP). The resulting 5′ monophosphate would subsequently be ligated to the 3′ hydroxyl group of an RNA oligonucleotide (5′ adaptor), which here was followed by reverse transcription with a gene-specific oligo (placed within the first 200 nt downstream of a start codon) and PCR amplification with a 5′ adaptor and a nested gene-specific primer. TAP treatment is expected to yield a specific or at least strongly enhanced signal for primary transcripts in the amplification step, as compared to untreated RNA samples (lane T+ in Fig. 1). Because of this selectivity for newly initiated transcripts among the pool of 5′ ends, and the small amounts of RNA material required, we consider this technique more suitable for our approach than traditional primer extension analysis.

Figure 1. RACE mapping of cyanobacterial primary mRNA 5′ ends. Amplification of mRNAs for *ftsZ*, *psbD*, *ntcA* and *rbcLS* is shown. In most cases a single TAP-specific PCR product was obtained (lane T+). RNA control samples included: untreated (U), 5′ dephosphorylated (C) and mock-treated in TAP-buffer but without TAP (T–). Total DNA served as yet another control for amplification of reverse transcription-specific cDNAs (lane D).

Upon cloning and sequencing of the amplification product, the first nucleotide downstream of the 5′ adaptor RNA was assigned the transcription initiation site. Three controls were included in all experiments: RNA without any treatment, RNA 5′ dephosphorylated using calf intestine phosphatase and RNA mock-treated in TAP buffer but omitting TAP (lanes U, C and T–, respectively, in Fig. 1).

We first investigated three genes, pcb, psbA and psbDC, for which promoters had earlier been determined by primer extension (21). The nucleotides found in the TAP-specific amplification products were identical to those previously identified as reverse transcription stops, except that primer extension in two cases had suggested two adjacent nucleotides as the mRNA 5′ ends. Thus, the data obtained by 5′-RACE are in agreement with the TSS determined by other methods and may be even more precise. We also found a second TSS for psbDC that had remained undetected in the previous study.

Typical examples extracted from the complete set of genes examined are shown in Figure 1. In most cases, a single prominent band was obtained with TAP-treated RNA. The presence of weaker bands of the same size in some of the control reactions indicates that a minor proportion of the primary transcripts were present in 5′ monophosphate form in the original transcript population. Sequence analysis of 2–14 individual clones revealed the first transcribed nucleotide represented in each amplicon (Table 1). For most genes all sequences showed an identical 5′ nucleotide. In some cases a couple of clones (deaD, ntcA, ureE and ndhC) with deviating sequences were found exhibiting a scattered distribution of the respective first nucleotide alongside the coding region. If these 5′ ends were not found again in additionally sequenced clones, they would be considered the products of partial degradation of mRNAs in the absence of a major primary transcript. However, for some genes two groups of sequences with an identical 5′ nucleotide were identified in this analysis or subsequently using a second primer (C-oligo, see Materials and Methods) that overlapped the initially identified TSS, namely ftsZ, psbD, ntcA and petH. These groups of 5′ nucleotides were considered as evidence for multiple TSS if they corresponded in size with different TAP-specific bands (depending on gel resolution) (for example Fig. 1, ntcA) or if they were consistently found with the respective C-oligo.

In some cases, the experimentally mapped promoter overlapped with the coding regions of adjoining genes on the complementary strand (the TSS of ureE, petB, ndhC and psaF overlapped with the genes ureD, prc, rub and gcp, respectively).

Further sequence analysis showed that the preferred first transcribed nucleotide is A (A >> G > C = T; Table 1). The distance between TSS and start codon was 12–177 nt for all investigated genes except rbcL. The length of the obtained 5′-RACE fragment for rbcL was >500 nt and thus considerably longer than expected. Upon sequence inspection, it was found to include an additional gene, ccmK, located directly upstream of rbcL, and indicated a TSS 45 bp upstream of the ccmK start codon.

Bioinformatic analysis

–10 region. When aligning the 25 experimentally determined promoter sequences at their mapped TSS, over-represented nucleotides only appeared in the –10 region and at the TSS. Two-thirds of the analysed sequences, petE, rpl21, psbH, psbA, ftsZ2, pcb, ntcA1, ndhC, psaF, psbD1, kaiB, psbD2, chlN, ftsZ1, petB, petH2 and rps12, revealed a preference for a purine (adenine or guanine) at the TSS, a thymine at –7 and an adenine at –11. The latter two conserved bases indicated the presence of a –10 element at positions –12 to –7 for these 17 TSSs as well as for ureE and deaD, the only transcripts not starting with a purine. An alignment of this subset with CLUSTALW was used as a profile to align the remaining six promoter sequences, groES, rbcL (ccmK), coaT, ntcA2, petH1 and atpB. This resulted in positions –13 to –8 or –14 to –9 being suggested as –10 elements because only here were conserved nucleotide positions found. The –10 region of the final alignment is shown graphically in Figure 2 (left) in the form of a sequence logo of all 25 sequences. Adenine (position 2) and thymine (position 6) are highly conserved. The first position of the box seems to be semi-conserved, with a preferred thymine. This was also apparent from calculated counts of each base at each position shown in Figure 2 (right), which was taken subsequently to create a scoring matrix in the whole genome prediction.

–35 region. It was not possible to define a –35 region in one step since the set was too heterogeneous to obtain a conserved core by aligning all sequences at the TSS. Thus, sequences of 13 bp including the –35 position plus 6 nt upstream and downstream were extracted from the 25 experimentally determined promoters and aligned with CLUSTALW (Table 3). No conserved position or pattern common to all sequences was found even if a shift of these 13 bp within an 18 bp window was allowed. Therefore, the set was split into sub-classes. The first four sequences inside the alignment shown in Table 3 are the most similar to each other, containing five conserved positions at exactly the same upstream position, except the –35 region of ftsZ2, which is shifted 1 bp to align with the other three regions, of ureE, petH2 and ndhC. Another group consists of kaiB, rpl21, rps12 and ccmK, aligned at the end after all others (Table 3), because they are mostly dissimilar to the first aligned sequences and exhibit nearly the same consensus sequence TTGACA of the –35 region as recognised by the vegetative σ factor σ⁷⁰ of E.coli.

Table 3. Alignment of the –35 region.

The bases from position –41 to –29 are shown and distances to the –10 box and TSS are given. Conserved positions in the upper or lower sequence subset are labelled in white.

Whole genome prediction. This new knowledge about promoter structure in MED4 allowed the prediction of additional promoters in a genome-wide fashion. A set of UPRs was created using only non-coding sequences by masking the annotated coding regions. The information of the 25 aligned –10 boxes was compressed in a scoring matrix W. A search was done for promoters which exhibited features similar to the experimentally verified set, and the quality of the newly found –10 boxes inside uncharacterised regions was measured by means of this scoring matrix W. The prediction was divided into three steps: (i) raster, a search for 6 nt as a potential –10 region based on the known promoter structure in bacteria, –35 region/space/–10 region/space/TSS/space/translation start site; (ii) score, determining the weights of the potential boxes with a scoring matrix; (iii) filter, using a filter (cut-off value) to reduce the false positive rate of the predicted boxes.

Figure 3 illustrates the promoter prediction for one example, the –10 box TATAGT of the gene kaiB. The distances between –35 box and –10 box (16 bp) and –10 box and TSS (6 bp) are the typical spaces of the known promoter structure in E.coli and other bacteria. The distance range of 15–85 bp between TSS and translation start site was derived from the majority of the experimentally determined promoters in MED4 (Table 1). Note that since only UPRs longer than 50 bp were searched, this raster would not detect promoters in small UPRs within tightly clustered operons or in coding regions. A search using this raster within the UPR data set yielded 6 nt windows as candidate positions for the –10 box. The identified matches for possible –10 regions were weighted using the scoring matrix W. In the last step a cut-off value was chosen to find putative –10 boxes in the scored matches. The value of the threshold was the lowest score that appeared for one of the known boxes. In Table 2 the scores for the known promoters are listed (weighted with scoring matrix W). The –10 box of gene rpl21 exhibited the lowest score, 2.5, which was used as the cut-off value.
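The three steps (raster, score, filter) can be sketched as follows. This is a simplified illustration, not the authors' PERL scripts: `scan_upstream` and its parameters are hypothetical names, only the 6 bp spacing between the –10 box and the TSS and the 15–85 bp window between TSS and translation start are modelled (the –35 region is ignored), and the 48 `n` characters stand in for the kaiB sequence between the TSS and the start codon, which is not given in Table 1.

```python
import math

# Known -10 boxes from Table 2, used to rebuild the scoring matrix W.
BOXES = (
    "tattct tatatt cataat taatat tacagt tagagt tactgt taagtt taatat "
    "taagct tatgat tatgtt taatat tatagt gaagat tacaat tatttt caagct "
    "tatagt cattct tagatt taaatt taatct taatat cataat"
).upper().split()
BACKGROUND = {"A": 0.37, "T": 0.37, "C": 0.13, "G": 0.13}


def build_matrix(boxes, pseudo=1):
    """Log2-odds matrix with pseudo-counts, one dict per box position."""
    n = len(boxes)
    return [
        {nt: math.log2((sum(b[pos] == nt for b in boxes) + pseudo)
                       / (freq * (n + 4 * pseudo)))
         for nt, freq in BACKGROUND.items()}
        for pos in range(len(boxes[0]))
    ]


def score(box, matrix):
    return sum(matrix[i][nt] for i, nt in enumerate(box))


def scan_upstream(upr, matrix, cutoff, gap=6, utr_min=15, utr_max=85):
    """Raster-score-filter over one upstream region.

    `upr` must end at position -1 relative to the translation start.
    Returns (tss_position, box, rounded_score) tuples, best score first.
    """
    upr = upr.upper()
    n = len(upr)
    hits = []
    for i in range(n - 5):
        box = upr[i:i + 6]
        tss_pos = (i + 6 + gap) - n          # e.g. -49 = 49 nt upstream
        # (i) raster: implied TSS must lie 15-85 nt before the start codon
        if not utr_min <= -tss_pos <= utr_max:
            continue
        if set(box) - set("ACGT"):           # skip windows with unknown bases
            continue
        s = score(box, matrix)               # (ii) score
        if s >= cutoff:                      # (iii) filter
            hits.append((tss_pos, box.lower(), round(s, 1)))
    return sorted(hits, key=lambda h: h[2], reverse=True)


W = build_matrix(BOXES)
# Cut-off: lowest unrounded score among the known boxes (rpl21, about 2.5).
CUTOFF = min(score(b, W) for b in BOXES)

# kaiB sequence from Table 1 (ends at the TSS, position -49), padded with
# placeholder bases down to position -1.
KAIB_UPR = "atgtttgtttatcttcttattttatagtgtaattA" + "n" * 48
for hit in scan_upstream(KAIB_UPR, W, CUTOFF):
    print(hit)
```

Taking the cut-off from the unrounded minimum rather than the rounded value 2.5 avoids excluding the rpl21 box itself, whose unrounded score lies slightly below 2.5.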

Table 2. –10 boxes of known promoter regions, their distance relative to the experimentally determined TSS and their ranking in prediction.

Gene	Promoter	–10 box	–10 box to TSS (bp)	Raster–score–filter: score (using W)	Raster–score–filter: ranking	Raster–score–filter: no. of predictions	NNPP: ranking	NNPP: no. of predictions
pcb	pcb	tattct	6	4.1	2	4	–	9
psbA	psbA	tatatt	6	3.0	1	5	– (2)	3
ftsZ	ftsZ2	cataat	6	2.9	3	3	–	3
	ftsZ1	taatat	6	3.4	1	3	– (2)	3
psbD	psbD2	tacagt	6	4.3	1	4	–	8
	psbD1	tagagt	6	3.9	–	4	–	8
ntcA	ntcA1	tactgt	6	4.2	1	3	–	12
	ntcA2	taagtt	7/8	3.8	–	3	–	12
petE	petE	taatat	6	3.4	1	1	–	1
petB	petB	taagct	6	5.1	1	1	– (6)	6
rps12	rps12	tatgat	6	4.4	1	1	– (2)	7
groES	groE	tatgtt	7	3.8	3	4	3	4
atpB	atpB	taatat	5	3.4	2	4	2	4
kaiB	kaiB	tatagt	6	4.3	2	7	– (2)	7
rpl21	rpl21	gaagat	6	2.5	5	5	– (5)	9
psbH	psbH	tacaat	6	3.5	2	4	–	7
psaF	psaF	tatttt	6	2.8	2	3	– (6)	6
chlN	chlN	caagct	6	4.6	1	2	–	5
coaT	coaT	tatagt	7	4.3	1	4	–	7
ccmK	ccmK	cattct	7	3.6	3	3	–	4
deaD	deaD	tagatt	6	2.6	3	3	–	1
ndhC	ndhC	taaatt	6	3.0	–	3	–	7
petH	petH2	taatct	6	4.1	1	2	–	8
	petH1	taatat	7	3.4	2	2	–	8
ureE	ureE	cataat	6	2.9	2	3	–	3

Summary
	Raster–score–filter method	NNPP
Predicted sites	69	121
Known sites	25	25
Missing known sites	3	23 (16)

Ranking of known sites by raster–score–filter method
Rank	1	2	3	5
Known sites	10	7	4	1

Experimentally, 25 TSS were determined leading to 25 known –10 regions found with CLUSTALW. The developed raster–score–filter method predicts 69 putative –10 boxes for these 21 upstream regions, whereby three of the known –10 regions are missing due to the defined space of 15–85 bp between the TSS and translation start site. Thus, the prediction finds about three times more promoters than are experimentally verified. On the other hand, if one ranks the predicted boxes according to their scores, 21 of the known sites are high scoring: 10 of them are top scoring, seven are in the second rank and four are ranked at the third position. The scores of the –10 boxes have been rounded. Minus stands for missing in the prediction. For comparison, a Neural Network Promoter Prediction (NNPP) was run with settings for prokaryotes and a minimum score of 0.8 (NNPP available at http://www.fruitfly.org/seq_tools/promoter.html). About five times more promoters were predicted than found but 23 of the known sites were missing using NNPP. The number of missing sites was reduced to 16 if a shift of 1 bp between predicted and known TSS was accepted, as denoted in parentheses.

The prediction was done within the data set UPR that initially contained 1726 regions upstream of annotated protein-coding and RNA genes. Excluding all UPRs shorter than 50 bp, about 1000 regions remained. Those 700 genes of MED4 excluded from the analysis are likely to be organised in operons with other genes. Consequently, one can expect to find about 1000–2000 genuine promoters in MED4, because a gene may have more than one promoter. For example, in E.coli the topA gene is transcribed from five different promoters (22), and transcription of the ftsZ gene is driven as part of the division and cell wall gene cluster by at least six different promoters (23).

If the prediction of –10 boxes drastically exceeds this estimation of 1000–2000 genuine promoters, the promoter prediction may contain many false positives. Therefore, the search parameters, length of scanned UPRs and scoring matrices, were varied to reduce the number of predictions (see Materials and Methods). Finally, 3624 predicted boxes were found inside the set UPR (space 15–85 bp between TSS and translation start site) by weighting with scoring matrix W (includes one pseudo-count and an AT content of 74%) (Fig. 3). The promoter prediction was tested for the UPRs of the 21 genes used for the experimental promoter analysis. Rankings and scores of the respective TSSs are presented in Table 2. Only three of the known –10 boxes (12%) were missing in this prediction. The full prediction results can be downloaded from http://www.biologie.hu-berlin.de/∼genetics/hess/hessproj.html. An independent promoter prediction run with settings based on the analysis of 272 E.coli promoters (http://www.fruitfly.org/seq_tools/promoter.html) predicted about five times more promoters than found for the experimental data set, whereas 23 of the known sites were missing (Table 2). This emphasises the need to generate initial promoter data from the organism to be analysed, rather than relying on data from other eubacteria for promoter predictions in cyanobacteria such as Prochlorococcus.

DISCUSSION

Small genome size (24,25) and a lack of several otherwise widespread regulatory components in Prochlorococcus suggest a comparatively simple organisation of the regulatory system of this organism. The recently completed analysis of several total genomes within this genus now opens the possibility for systematic studies of its gene expression and regulatory systems. Here, several intriguing features were unravelled by mapping of TSSs directly in 5′-RACE experiments and by using a computational strategy to gain insight into general features of promoter architecture of this ecologically important cyanobacterium.

Firstly, a cross-species comparison turned out to be informative for two genes, rbcL and ntcA. The first nucleotides of the ccmK-rbcLS transcript in MED4 are highly similar to the first transcribed nucleotides described for the Synechococcus WH 7803 ccmK-rbcLS transcript (26). While there is no evidence that this has regulatory or other relevance, it is intriguing that in both marine cyanobacteria the rbcLS genes are part of the same tricistronic operon including the ccmK gene. rbcLS encodes ribulose bisphosphate carboxylase, the key enzyme of photoautotrophic carbon assimilation. The gene ccmK (or csoS1A) and four additional genes (csoS2-csoS3-orfA,B for carboxysome peptides A and B) located immediately downstream of rbcLS encode components of the carbon concentrating mechanism. There is evidence that the whole gene cluster was transferred into marine cyanobacteria by horizontal gene transfer from γ proteobacteria (25). Thus, a conservation of TSS identified here and in WH 7803 may indicate conservation of regulatory elements over a wider evolutionary distance. Indeed, a comparison of the regions upstream of ccmK revealed that this is the case. If the –10 and –35 sequences defined here for Prochlorococcus MED4 (Tables 2 and 3) are compared to the information that is available for Synechococcus WH 7803, Synechococcus WH 8102 and Prochlorococcus MIT9313, a perfect and almost perfect conservation of the –35 and –10 elements, respectively, can be seen (Fig. 4). Thus it has become possible to predict a TSS for this operon in the other two species for which there is no experimental information (26). This illustrates how this data set might be utilised for tentative identification of promoters and regulatory sequences in other cyanobacterial species.

Figure 4. Cross-species comparison of the 5′-UTR of *ccmK-rbcLS* from marine cyanobacteria. The experimentally identified TSS (labelled by arrows above the sequence), 5′-Gaacat for *Prochlorococcus* MED4 (this work) and 5′-gAacat for *Synechococcus* WH 7803 (21), are part of a conserved nucleotide motif at the mRNA 5′ end. A strong conservation of this motif and of the suggested –10/–35 regions from MED4 is not only obvious in the two species but also apparent in *Synechococcus* WH 8102 and *Prochlorococcus* MIT9313, for which no experimental data exist.

Spacings and sequences of the two found TSSs for ntcA correlate well with data for NtcA-activated genes from other cyanobacteria. Typically, such a promoter contains an NtcA-binding site GTAN₈TAC and a TAN₃T –10 element and regulates a TSS 33–38 nt downstream of the final C of the binding site (27). Here, an NtcA-binding site (tGTtactgttgaTAC, nucleotides in upper case letters match the consensus) frames the –10 box of TSS1 and is suggested to regulate TSS2, which is 35 nt downstream and has TAAGTT at its –10 (Table 1). Thus, this situation resembles the canonical NtcA-activated promoter. A very similar organisation was described for the nblA and ntcA promoters of Synechococcus 7942 (28,29) and the ntcA promoter of Synechococcus WH 7803 (30). For ntcA promoters in both strains, the TSS closest to the start codon is under autoregulation by NtcA and activated only in the absence of ammonia, whereas the other is used constitutively (29,30). The data here indicate a similar regulatory mechanism in Prochlorococcus MED4. Indeed, activation of ntcA expression under nitrogen deprivation was found in Prochlorococcus (31). Yet this mode of ntcA regulation is intriguing, taking into account the absence of several genes for cyanobacterial nitrogen uptake and assimilation from the genome of Prochlorococcus MED4 and the lack of an ability to assimilate nitrate and nitrite (32,33).
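The match between the ntcA upstream sequence and the NtcA-binding consensus GTAN₈TAC can be checked with a small sliding-window comparison. The sketch below is an illustration only; the function name `best_ntca_site` is hypothetical, and it simply counts agreements at the six fixed consensus positions rather than modelling binding affinity.

```python
CONSENSUS = "GTANNNNNNNNTAC"  # GTA-N8-TAC, N = any nucleotide


def best_ntca_site(seq, consensus=CONSENSUS):
    """Return (offset, window, matches) for the best-fitting window.

    Only the non-N positions of the consensus are compared; on ties the
    first (most upstream) window is kept. Returns None if `seq` is shorter
    than the consensus.
    """
    seq = seq.upper()
    fixed = [(i, c) for i, c in enumerate(consensus) if c != "N"]
    best = None
    for off in range(len(seq) - len(consensus) + 1):
        window = seq[off:off + len(consensus)]
        matches = sum(window[i] == c for i, c in fixed)
        if best is None or matches > best[2]:
            best = (off, window, matches)
    return best


# Sequence upstream of ntcA TSS1 as given in Table 1 (ends at the TSS).
NTCA_TSS1 = "aaatataagaattgaaaaatgttactgttgataca"
print(best_ntca_site(NTCA_TSS1))
```

For this sequence the best window is GTTACTGTTGATAC at offset 20, which agrees with the consensus at five of the six fixed positions (GT and TAC) and spans the tactgt –10 box of TSS1.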

Secondly, the gene product of kaiB, along with those of kaiA and kaiC, is considered a core component of the circadian clock of cyanobacteria (34–36). The observation of a synchronised cell cycle (37,38) and of a circadian gene expression pattern in the natural environment as well as in the laboratory (39,40) provides some evidence for the existence of a circadian clock in Prochlorococcus. The promoters of kaiB and rpl21 detected here lie on opposite strands, yet the distance between the –10 elements and TSS is only 6 and 30 nt, respectively (Table 1). Thus the two promoters are physically linked. Interestingly, the 9 nt around the kaiB TSS, ttAgtcaAg (first transcribed nucleotide in upper case), are found in Prochlorococcus MIT9313 upstream of kaiA. Thus, the data might be considered as evidence that the kaiA promoter was conserved during evolution of the Prochlorococcus group owing to its linkage to rpl21, whereas the native kaiB promoter might have been lost together with the gene kaiA.

Thirdly, computational analysis revealed some general features of the promoter architecture in this hitherto scarcely investigated cyanobacterium. The 25 experimentally identified promoter regions revealed a consensus promoter structure similar to that of other eubacteria. The –10 region and TSS are separated by about 6 bp; the TSS and translation start site are separated by 15–85 bp in most cases. The –10 box itself exhibits three conserved nucleotides, T(–12), A(–11) and T(–7), which partly match the well-known consensus TATAAT of E.coli and Bacillus subtilis. The –35 regions of the promoter set are suggested to define subsets. Although the spacing between the –10 and –35 regions differs by up to 3 nt, one of these subsets, containing the genes kaiB, rpl21, rps12 and ccmK, shows a consensus similar to the known consensus-type sequence TTGACA for housekeeping genes of E.coli.
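
The positional description of the –10 box above can be made concrete with a short sketch. It is only an illustration under stated assumptions: the TSS is taken as position +1 (so position –1 is the nucleotide immediately upstream), indices are 0-based, and the helper names and the flanking nucleotides of the example are hypothetical. Only the hexamer TAAGTT (the –10 element of ntcA TSS2) is taken from the text.

```python
# Illustrative sketch: helper names, 0-based indexing and the example
# flanking sequence are assumptions made for this illustration.
CONSERVED = {-12: "T", -11: "A", -7: "T"}  # positions relative to the TSS (+1)


def minus10_hexamer(seq, tss_index):
    """Return the hexamer spanning positions -12..-7 for a TSS at tss_index."""
    if tss_index < 12:
        raise ValueError("need at least 12 nt upstream of the TSS")
    return seq[tss_index - 12:tss_index - 6].upper()


def conserved_matches(seq, tss_index):
    """Count how many of the three conserved -10 positions carry the expected base."""
    if tss_index < 12:
        raise ValueError("need at least 12 nt upstream of the TSS")
    return sum(1 for pos, base in CONSERVED.items()
               if seq[tss_index + pos].upper() == base)


# Hypothetical upstream region: 4 nt, the hexamer TAAGTT, a 6-nt spacer,
# then the first transcribed nucleotide (index 16) and two further nt.
example = "ccgg" + "TAAGTT" + "gcatgc" + "A" + "tg"
print(minus10_hexamer(example, 16))
print(conserved_matches(example, 16))
```

With the TSS at index 16, the hexamer occupies positions –12 to –7 and is followed by a 6-nt spacer, matching the separation of about 6 bp given above.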

Based on this promoter consensus sequence, and imposing a minimal distance between the TSS and translation start site while excluding regions more distant than seen in the majority of the experimentally verified promoters, new promoter regions were predicted in the Prochlorococcus MED4 genome. In a preliminary run, more matches were scored than the estimated 2000 genuine promoters within the whole genome, indicating false positives. Different strategies and parameters were tested to reduce the rate of false positives. Over-prediction turned out to be less severe when using a scoring matrix W that included one pseudo-count and took into account the high AT content of the MED4 genome. In this case, 3652 promoters were predicted for 1091 transcription units in MED4. The problem of in silico predictions exceeding the number of genuine promoters is well known. Minimising false positives remains a challenging task even for a well-studied bacterium such as E.coli, for which the predictions of 2326 transcription units and 4641 promoters are three to six times higher than the number of known objects (http://kinich.cifn.unam.mx:8850/db/regulondb_intro.frameset) (8). Furthermore, the fact that unique sequences were found for only 12 of the 25 TSSs implies that the derived scoring matrix could still be improved by more experimental data. Several limitations arise from the choice of range and threshold or from the existence of promoter subclasses, i.e. recognition by different σ factors. Other features of the DNA structure itself, e.g. DNA bending, add to this problem. As an example, the hidden Markov model of Yada et al. failed to recognise 40% of 390 known transcriptional units in E.coli (41). Thus, achieving accurate and significant predictions is far from trivial. 
Moreover, in MED4 nearly 20% of the experimentally determined promoters overlap with coding regions on the complementary strand, which could be due to the extremely small genome of Prochlorococcus (1.66 Mb), in which all genes for a marine cyanobacterium, including a complete photosynthetic apparatus, have to be encoded.
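
To illustrate the kind of scoring matrix mentioned above (one pseudo-count, correction for an AT-rich background), the following sketch builds a log-odds position weight matrix from aligned –10 hexamers and scans a sequence with it. The exact form of W is not given in this section, so the formula used here, a common log-odds definition, is an assumption. The background frequencies, the threshold and two of the four training hexamers are likewise hypothetical placeholders; only TATAAT and TAAGTT appear in the text.

```python
import math

# Illustrative sketch of a log-odds scoring matrix with one pseudo-count.
# Background frequencies, threshold and the hexamers TAGAAT and TATTCT are
# hypothetical placeholders, not data from this study.
BASES = "ACGT"
BACKGROUND = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}  # assumed AT-rich genome


def build_matrix(sites, background=BACKGROUND, pseudo=1.0):
    """W[i][b] = log2(((count_b at i + pseudo) / (N + 4 * pseudo)) / background[b])."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sites must have the same length")
    n = len(sites)
    matrix = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        matrix.append({
            b: math.log2(((column.count(b) + pseudo) / (n + 4 * pseudo)) / background[b])
            for b in BASES
        })
    return matrix


def score(matrix, word):
    """Sum the per-position log-odds weights for a word of matrix length."""
    return sum(col[b] for col, b in zip(matrix, word.upper()))


def scan(matrix, seq, threshold):
    """Return (start, word, score) for every window scoring at or above threshold."""
    k = len(matrix)
    hits = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k].upper()
        s = score(matrix, word)
        if s >= threshold:
            hits.append((i, word, s))
    return hits


training = ["TATAAT", "TAAGTT", "TAGAAT", "TATTCT"]
W = build_matrix(training)
for start, word, s in scan(W, "GCGCTATAATGCGC", threshold=2.0):
    print(start, word, round(s, 2))
```

Raising the threshold in such a scheme trades sensitivity for fewer false positives, which is the balance discussed above; the numerical values used here carry no biological meaning.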

In eubacteria, the expression of distinct sets of genes under varying environmental conditions or in post-exponential stationary phase is controlled by alternative σ factors that provide differential promoter preferences to the RNA polymerase core enzyme. In E.coli, under stress conditions or in stationary phase, the alternative RNA polymerase subunit σ^S and the vegetative factor σ⁷⁰ coexist. It is a paradox that the two holoenzymes transcribe different sets of genes, yet both recognise promoter sequences that are basically very similar. In contrast, another E.coli σ factor, σ⁵⁴ (or σ^N), employs a different mechanism. It requires transcriptional activators binding enhancer sequences relatively far upstream from the TSS and acts therefore over a longer distance (42). Promoter selectivity between the σ^S and the σ⁷⁰ RNA polymerase in E.coli is achieved by subtle differences in the elements surrounding the –10 and –35 elements, different degrees of local DNA superhelicity and by the fact that the σ^S polymerase tolerates partial degeneration of the –35 box better than the σ⁷⁰ enzyme (43).

Cyanobacteria generally lack genes encoding a σ⁵⁴ factor; however, in analogy to σ^S of E.coli, they frequently have multiple copies of genes encoding additional σ factors. According to functional information and phylogenetic analysis, σ factors have been categorised into group 1 (the vegetative factor, essential), group 2 (additional but non-essential factors, structurally related to group 1) and the structurally different group 3. The high number of group 2 σ factors in cyanobacteria, in contrast to the single alternative E.coli σ^S factor, is thought to be relevant for adaptation to highly variable environmental conditions and may be a key to better understanding the biology of cyanobacteria. Prochlorococcus MED4 has five σ factor genes (W.R. Hess, G. Rocap and S.W. Chisholm, unpublished results), as opposed to nine in Synechocystis PCC 6803 (12), 12 in the chromosome and on plasmids of Anabaena PCC 7120 (44) and eight in Thermosynechococcus elongatus BP1 (45). Phylogenetic analysis shows that one gene codes for a group 1 vegetative σ factor and four for group 2 additional factors (W.R. Hess, G. Rocap and S.W. Chisholm, unpublished results). In Synechocystis PCC 6803, group 3 σ factor polypeptides were not detectable under normal physiological conditions (46). SigF in Synechocystis PCC 6803 is critical for the control of motility (47), for which the potential target genes simply do not exist in Prochlorococcus MED4. Thus the absence of a group 3 σ factor gene in MED4 argues for a lack of certain regulatory loops in this cyanobacterium. According to the total genome sequence, the regulatory and adaptive capabilities of this cyanobacterium have been highly reduced in concert with its very small genome size (25). 
Therefore it is highly likely that the products of the four putative group 2 σ factor genes in MED4 are involved in the most essential regulatory processes for a cyanobacterium that otherwise sustains a high growth rate and a dominant role in its ecological niche (13,48). Group 2 σ factors SigB and SigD in Synechocystis PCC 6803 have been suggested as the heat shock- and high light-responsive σ factors in this cyanobacterium (46), while SigB and SigC in Anabaena PCC 7120 are expressed in response to nitrogen and/or sulphate starvation (49). The four group 2 factors of MED4 are probably implicated in similar processes, such as nutrient limitation or high light stress responses. Recent results of in vitro transcription experiments have indicated that group 1 and group 2 σ factors in the cyanobacteria Synechococcus PCC 7942 and Synechocystis PCC 6803 exhibit the same promoter selectivities with regard to the –10 and –35 elements (46,50). Therefore, the promoters found in this study might actually derive from the activity of differentially composed RNA polymerase complexes, including those using additional σ factors, or even the activity of particular transcription factors. The identification of two TSSs for ntcA, one possibly induced under nitrogen limitation, illustrates that transcription initiation at some stress-controlled promoters might have been detected here.

In summary, the experimental and computational analysis performed here defines the basic elements of a transcription initiation site in Prochlorococcus MED4. Based on the available total genome sequence, more than 3000 TSSs were predicted. The comparison between experimentally determined and predicted TSSs for a set of 21 different genes suggests that about 40% of the predictions are functional. Thus, the results of this genome-wide scan will be useful for the future refined analysis of the regulation of gene expression and of the architecture of transcriptional units in bacteria.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at NAR Online.

ACKNOWLEDGEMENTS

We thank Ruti Hershberg for critical reading of the manuscript and two anonymous referees for valuable and detailed comments. This work was supported by a grant ‘MARGENES’ (QLRT-2001-01226) from the European Union to W.R.H. and by an EMBO long-term fellowship to J.V.

REFERENCES

1. Schopf,J.W. (1993) Microfossils of the Early Archean Apex chert: new evidence of the antiquity of life. Science, 260, 640–646.
2. Douglas,S. (1998) Plastid evolution: origins, diversity, trends. Curr. Opin. Genet. Dev., 8, 655–661.
3. Moreira,D., Le Guyader,H. and Philippe,H. (2000) The origin of red algae and the evolution of chloroplasts. Nature, 405, 69–72.
4. Rujan,T. and Martin,W. (2001) How many genes in Arabidopsis come from cyanobacteria? An estimate from 386 protein phylogenies. Trends Genet., 17, 113–120.
5. Martin,W., Rujan,T., Richly,E., Hansen,A., Cornelsen,S., Lins,T., Leister,D., Stoebe,B., Hasegawa,M. and Penny,D. (2002) Evolutionary analysis of Arabidopsis, cyanobacterial and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc. Natl Acad. Sci. USA, 99, 12246–12251.
6. Ishihama,A. (2000) Functional modulation of Escherichia coli RNA polymerase. Annu. Rev. Microbiol., 54, 499–518.
7. Hawley,D.K. and McClure,W.R. (1983) Compilation and analysis of Escherichia coli promoter DNA sequences. Nucleic Acids Res., 11, 2237–2255.
8. Salgado,H., Santos-Zavaleta,A., Gama-Castro,S., Millan-Zarate,D., Diaz-Peredo,E., Sanchez-Solano,F., Perez-Rueda,E., Bonavides-Martinez,C. and Collado-Vides,J. (2001) RegulonDB (version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res., 29, 72–74.
9. Xie,W.Q., Jager,K. and Potts,M. (1989) Cyanobacterial RNA polymerase genes rpoC1 and rpoC2 correspond to rpoC of Escherichia coli. J. Bacteriol., 171, 1967–1973.
10. Schyns,G., Jia,L., Coursin,T., Tandeau de Marsac,N. and Houmard,J. (1998) Promoter recognition by a cyanobacterial RNA polymerase: in vitro studies with the Calothrix sp. PCC 7601 transcriptional factors RcaA and RcaD. Plant Mol. Biol., 36, 649–659.
11. Kaneko,T. and Tabata,S. (1997) Complete genome structure of the unicellular cyanobacterium Synechocystis sp. PCC6803. Plant Cell Physiol., 38, 1171–1176.
12. Kaneko,T., Sato,S., Kotani,H., Tanaka,A., Asamizu,E., Nakamura,Y., Miyajima,N., Hirosawa,M., Sugiura,M., Sasamoto,S., Kimura,T., Hosouchi,T., Matsuno,A., Muraki,A., Nakazaki,N., Naruo,K., Okumura,S., Shimpo,S., Takeuchi,C., Wada,T., Watanabe,A., Yamada,M., Yasuda,M. and Tabata,S. (1996) Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res., 3, 109–136.
13. Partensky,F., Hess,W.R. and Vaulot,D. (1999) Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol. Mol. Biol. Rev., 63, 106–127.
14. Rippka,R., Coursin,T., Hess,W.R., Lichtlé,C., Scanlan,D.J., Palinska,K.A., Iteman,I., Partensky,F., Houmard,J. and Herdman,M. (2000) Prochlorococcus marinus Chisholm et al. 1992 subsp. pastoris subsp. nov. strain PCC 9511, the first axenic chlorophyll a₂/b₂-containing cyanobacterium (Oxyphotobacteria). Int. J. Syst. Evol. Microbiol., 50, 1833–1847.
15. Garcia-Fernandez,J.M., Hess,W.R., Houmard,J. and Partensky,F. (1998) Expression of the psbA gene in the marine oxyphotobacteria Prochlorococcus spp. Arch. Biochem. Biophys., 359, 17–23.
16. Bensing,B.A., Meyer,B.J. and Dunny,G.M. (1996) Sensitive detection of bacterial transcription initiation sites and differentiation from RNA processing sites in the pheromone-induced plasmid transfer system of Enterococcus faecalis. Proc. Natl Acad. Sci. USA, 93, 7794–7799.
17. Argaman,L., Hershberg,R., Vogel,J., Bejerano,G., Wagner,E.G., Margalit,H. and Altuvia,S. (2001) Novel small RNA-encoding genes in the intergenic regions of Escherichia coli. Curr. Biol., 11, 941–950.
18. Mount,D.W. (2001) Bioinformatics: Sequences and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
19. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.
20. Schneider,T.D. and Stephens,R.M. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100.
21. Garczarek,L., Partensky,F., Irlbacher,H., Holtzendorff,J., Babin,M., Mary,I., Thomas,J.C. and Hess,W.R. (2001) Differential expression of antenna and core genes in Prochlorococcus PCC 9511 (Oxyphotobacteria) grown under a modulated light-dark cycle. Environ. Microbiol., 3, 168–175.
22. Qi,H., Menzel,R. and Tse-Dinh,Y.C. (1997) Regulation of Escherichia coli topA gene transcription: involvement of a sigmaS-dependent promoter. J. Mol. Biol., 267, 481–489.
23. Flardh,K., Garrido,T. and Vicente,M. (1997) Contribution of individual promoters in the ddlB-ftsZ region to the transcription of the essential cell-division gene ftsZ in Escherichia coli. Mol. Microbiol., 24, 927–936.
24. Strehl,B., Holtzendorff,J., Partensky,F. and Hess,W.R. (1999) A small and compact genome in the marine cyanobacterium Prochlorococcus marinus CCMP 1375: lack of an intron in the gene for tRNA(Leu)(UAA) and a single copy of the rRNA operon. FEMS Microbiol. Lett., 181, 261–266.
25. Hess,W.R., Rocap,G., Ting,C., Larimer,F., Lamerdin,J., Stilwagon,S. and Chisholm,S.W. (2001) The photosynthetic apparatus of Prochlorococcus: insights through comparative genomics. Photosynthesis Res., 70, 53–72.
26. Watson,G.M. and Tabita,F.R. (1996) Regulation, unique gene organization and unusual primary structure of carbon fixation genes from a marine phycoerythrin-containing cyanobacterium. Plant Mol. Biol., 32, 1103–1115.
27. Herrero,A., Muro-Pastor,A.M. and Flores,E. (2001) Nitrogen control in cyanobacteria. J. Bacteriol., 183, 411–425.
28. Luque,I., Zabulon,G., Contreras,A. and Houmard,J. (2001) Convergence of two global transcriptional regulators on nitrogen induction of the stress-acclimation gene nblA in the cyanobacterium Synechococcus sp. PCC 7942. Mol. Microbiol., 41, 937–947.
29. Luque,I., Flores,E. and Herrero,A. (1994) Molecular mechanism for the operation of nitrogen control in cyanobacteria. EMBO J., 13, 2862–2869.
30. Lindell,D., Padan,E. and Post,A.F. (1998) Regulation of ntcA expression and nitrite uptake in the marine Synechococcus sp. strain WH7803. J. Bacteriol., 180, 1878–1886.
31. Lindell,D., Erdner,D., Marie,D., Prasil,O., Koblizek,M., Le Gall,F., Rippka,R., Partensky,F., Scanlan,D.J. and Post,A.F. (2002) Nitrogen stress response of Prochlorococcus strain PCC 9511 (oxyphotobacteria) involves contrasting regulation of ntcA and amt1. J. Phycol., 38, 1113–1124.
32. Moore,L.R., Post,A.F., Rocap,G. and Chisholm,S.W. (2002) Utilization of different nitrogen sources by the marine cyanobacteria, Prochlorococcus and Synechococcus. Limnol. Oceanogr., 47, 989–996.
33. Palinska,K.A., Laloui,W., Bedu,S., Loiseaux-De Goer,S., Castets,A.M., Rippka,R. and Tandeau De Marsac,N. (2002) The signal transducer P(II) and bicarbonate acquisition in Prochlorococcus marinus PCC 9511, a marine cyanobacterium naturally deficient in nitrate and nitrite assimilation. Microbiology, 148, 2405–2412.
34. Xu,Y., Mori,T. and Johnson,C.H. (2000) Circadian clock-protein expression in cyanobacteria: rhythms and phase setting. EMBO J., 19, 3349–3357.
35. Iwasaki,H., Taniguchi,Y., Ishiura,M. and Kondo,T. (1999) Physical interactions among circadian clock proteins KaiA, KaiB and KaiC in cyanobacteria. EMBO J., 18, 1137–1145.
36. Ishiura,M., Kutsuna,S., Aoki,S., Iwasaki,H., Andersson,C.R., Tanabe,A., Golden,S.S., Johnson,C.H. and Kondo,T. (1998) Expression of a gene cluster kaiABC as a circadian feedback process in cyanobacteria. Science, 281, 1519–1523.
37. Vaulot,D., Marie,D., Olson,R.J. and Chisholm,S.W. (1995) Growth of Prochlorococcus, a photosynthetic prokaryote, in the equatorial Pacific Ocean. Science, 268, 1480–1482.
38. Jacquet,S., Partensky,F., Marie,D., Casotti,R. and Vaulot,D. (2001) Cell cycle regulation by light in Prochlorococcus strains. Appl. Environ. Microbiol., 67, 782–790.
39. Holtzendorff,J., Partensky,F., Jacquet,S., Bruyant,F., Marie,D., Garczarek,L., Mary,I., Vaulot,D. and Hess,W.R. (2001) Diel expression of cell cycle-related genes in synchronized cultures of Prochlorococcus sp strain PCC 9511. J. Bacteriol., 183, 915–920.
40. Holtzendorff,J., Marie,D., Post,A.F., Partensky,F., Rivlin,A. and Hess,W.R. (2002) Synchronized expression of ftsZ in natural Prochlorococcus populations of the Red Sea. Environ. Microbiol., 4, 644–653.
41. Yada,T., Nakao,M., Totoki,Y. and Nakai,K. (1999) Modeling and predicting transcriptional units of E. coli genes using hidden Markov models. Bioinformatics, 15, 987–993.
42. Buck,M., Gallegos,M.T., Studholme,D.J., Guo,Y. and Gralla,J.D. (2000) The bacterial enhancer-dependent σ⁵⁴ (σ^N) transcription factor. J. Bacteriol., 182, 4129–4136.
43. Hengge-Aronis,R. (2002) Stationary phase gene regulation: what makes an Escherichia coli promoter σ^S-selective? Curr. Opin. Microbiol., 5, 591–595.
44. Kaneko,T., Nakamura,Y., Wolk,C.P., Kuritz,T., Sasamoto,S., Watanabe,A., Iriguchi,M., Ishikawa,A., Kawashima,K., Kimura,T., Kishida,Y., Kohara,M., Matsumoto,M., Matsuno,A., Muraki,A., Nakazaki,N., Shimpo,S., Sugimoto,M., Takazawa,M., Yamada,M., Yasuda,M. and Tabata,S. (2001) Complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120. DNA Res., 8, 205–213.
45. Nakamura,Y., Kaneko,T., Sato,S., Ikeuchi,M., Katoh,H., Sasamoto,S., Watanabe,A., Iriguchi,M., Kawashima,K., Kimura,T., Kishida,Y., Kiyokawa,C., Kohara,M., Matsumoto,M., Matsuno,A., Nakazaki,N., Shimpo,S., Sugimoto,M., Takeuchi,C., Yamada,M. and Tabata,S. (2002) Complete genome structure of the thermophilic cyanobacterium Thermosynechococcus elongatus BP-1. DNA Res., 9, 123–130.
46. Imamura,S., Yoshihara,S., Nakano,S., Shiozaki,N., Yamada,A., Tanaka,K., Takahashi,H., Asayama,M. and Shirai,M. (2003) Purification, characterization and gene expression of all sigma factors of RNA polymerase in a cyanobacterium. J. Mol. Biol., 325, 857–872.
47. Bhaya,D., Watanabe,N., Ogawa,T. and Grossman,A.R. (1999) The role of an alternative sigma factor in motility and pilus formation in the cyanobacterium Synechocystis sp. strain PCC6803. Proc. Natl Acad. Sci. USA, 96, 3188–3193.
48. Partensky,F., Blanchot,J. and Vaulot,D. (1999) Differential distribution and ecology of Prochlorococcus and Synechococcus in oceanic waters: a review. In Charpy,L. and Larkum,A.W.D. (eds), Marine Cyanobacteria. Musée Océanographique, Monaco, pp. 457–475.
49. Brahamsha,B. and Haselkorn,R. (1992) Identification of multiple RNA polymerase sigma factor homologs in the cyanobacterium Anabaena sp. strain PCC 7120: cloning, expression and inactivation of the sigB and sigC genes. J. Bacteriol., 174, 7273–7282.
50. GotoSeki,A., Shirokane,M., Masuda,S., Tanaka,K. and Takahashi,H. (1999) Specificity crosstalk among group 1 and group 2 sigma factors in the cyanobacterium Synechococcus sp PCC7942: in vitro specificity and a phylogenetic analysis. Mol. Microbiol., 34, 473–484.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

[Supplementary Material]

nar_31_11_2890__index.html^{(1.2KB, html)}

nar_31_11_2890__1.pdf^{(113.1KB, pdf)}

nar_31_11_2890__2.pdf^{(206.7KB, pdf)}

nar_31_11_2890__3.pdf^{(47.9KB, pdf)}

[gkg398c1] 1.Schopf J.W. (1993) Microfossils of the Early Archean Apex chert: new evidence of the antiquity of life. Science, 260, 640–646. [DOI] [PubMed] [Google Scholar]

[gkg398c2] 2.Douglas S. (1998) Plastid evolution: origins, diversity, trends. Curr. Opin. Genet. Dev., 8, 655–661. [DOI] [PubMed] [Google Scholar]

[gkg398c3] 3.Moreira D., Le Guyader,H. and Philippe,H. (2000) The origin of red algae and the evolution of chloroplasts. Nature, 405, 69–72. [DOI] [PubMed] [Google Scholar]

[gkg398c4] 4.Rujan T. and Martin,W. (2001) How many genes in Arabidopsis come from cyanobacteria? An estimate from 386 protein phylogenies. Trends Genet., 17, 113–120. [DOI] [PubMed] [Google Scholar]

[gkg398c5] 5.Martin W., Rujan,T., Richly,E., Hansen,A., Cornelsen,S., Lins,T., Leister,D., Stoebe,B., Hasegawa,M. and Penny,D. (2002) Evolutionary analysis of Arabidopsis, cyanobacterial and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc. Natl Acad. Sci. USA, 99, 12246–12251. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c6] 6.Ishihama A. (2000) Functional modulation of Escherichia coli RNA polymerase. Annu. Rev. Microbiol., 54, 499–518. [DOI] [PubMed] [Google Scholar]

[gkg398c7] 7.Hawley D.K. and McClure,W.R. (1983) Compilation and analysis of Escherichia coli promoter DNA sequences. Nucleic Acids Res., 11, 2237–2255. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c8] 8.Salgado H., Santos-Zavaleta,A., Gama-Castro,S., Millan-Zarate,D., Diaz-Peredo,E., Sanchez-Solano,F., Perez-Rueda,E., Bonavides-Martinez,C. and Collado-Vides,J. (2001) RegulonDB (version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res., 29, 72–74. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c9] 9.Xie W.Q., Jager,K. and Potts,M. (1989) Cyanobacterial RNA polymerase genes rpoC1 and rpoC2 correspond to rpoC of Escherichia coli. J. Bacteriol., 171, 1967–1973. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c10] 10.Schyns G., Jia,L., Coursin,T., Tandeau de Marsac,N. and Houmard,J. (1998) Promoter recognition by a cyanobacterial RNA polymerase: in vitro studies with the Calothrix sp. PCC 7601 transcriptional factors RcaA and RcaD. Plant Mol. Biol., 36, 649–659. [DOI] [PubMed] [Google Scholar]

[gkg398c11] 11.Kaneko T. and Tabata,S. (1997) Complete genome structure of the unicellular cyanobacterium Synechocystis sp. PCC6803. Plant Cell Physiol., 38, 1171–1176. [DOI] [PubMed] [Google Scholar]

[gkg398c12] 12.Kaneko T., Sato,S., Kotani,H., Tanaka,A., Asamizu,E., Nakamura,Y., Miyajima,N., Hirosawa,M., Sugiura,M., Sasamoto,S., Kimura,T., Hosouchi,T., Matsuno,A., Muraki,A., Nakazaki,N., Naruo,K., Okumura,S., Shimpo,S., Takeuchi,C., Wada,T., Watanabe,A., Yamada,M., Yasuda,M. and Tabata,S. (1996) Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res., 3, 109–136. [DOI] [PubMed] [Google Scholar]

[gkg398c13] 13.Partensky F., Hess,W.R. and Vaulot,D. (1999) Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol. Mol. Biol. Rev., 63, 106–127. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c14] 14.Rippka R., Coursin,T., Hess,W.R., Lichtlé,C., Scanlan,D.J., Palinska,K.A., Iteman,I., Partensky,F., Houmard,J. and Herdman,M. (2000) Prochlorococcus marinus Chisholm et al. 1992 subsp. pastoris subsp. nov. strain PCC 9511, the first axenic chlorophyll a₂/b₂-containing cyanobacterium (Oxyphotobacteria). Int. J. Syst. Evol. Microbiol., 50, 1833–1847. [DOI] [PubMed] [Google Scholar]

[gkg398c15] 15.Garcia-Fernandez J.M., Hess,W.R., Houmard,J. and Partensky,F. (1998) Expression of the psbA gene in the marine oxyphotobacteria Prochlorococcus spp. Arch. Biochem. Biophys., 359, 17–23. [DOI] [PubMed] [Google Scholar]

[gkg398c16] 16.Bensing B.A., Meyer,B.J. and Dunny,G.M. (1996) Sensitive detection of bacterial transcription initiation sites and differentiation from RNA processing sites in the pheromone-induced plasmid transfer system of Enterococcus faecalis. Proc. Natl Acad. Sci. USA, 93, 7794–7799. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c17] 17.Argaman L., Hershberg,R., Vogel,J., Bejerano,G., Wagner,E.G., Margalit,H. and Altuvia,S. (2001) Novel small RNA-encoding genes in the intergenic regions of Escherichia coli. Curr. Biol., 11, 941–950. [DOI] [PubMed] [Google Scholar]

[gkg398c18] 18.Mount D.W. (2001) Bioinformatics: Sequences and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

[gkg398c19] 19.Thompson J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c20] 20.Schneider T.D. and Stephens,R.M. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c21] 21.Garczarek L., Partensky,F., Irlbacher,H., Holtzendorff,J., Babin,M., Mary,I., Thomas,J.C. and Hess,W.R. (2001) Differential expression of antenna and core genes in Prochlorococcus PCC 9511 (Oxyphotobacteria) grown under a modulated light-dark cycle. Environ. Microbiol., 3, 168–175. [DOI] [PubMed] [Google Scholar]

[gkg398c22] 22.Qi H., Menzel,R. and Tse-Dinh,Y.C. (1997) Regulation of Escherichia coli topA gene transcription: involvement of a sigmaS-dependent promoter. J. Mol. Biol., 267, 481–489. [DOI] [PubMed] [Google Scholar]

[gkg398c23] 23.Flardh K., Garrido,T. and Vicente,M. (1997) Contribution of individual promoters in the ddlB-ftsZ region to the transcription of the essential cell-division gene ftsZ in Escherichia coli. Mol. Microbiol., 24, 927–936. [DOI] [PubMed] [Google Scholar]

[gkg398c24] 24.Strehl B., Holtzendorff,J., Partensky,F. and Hess,W.R. (1999) A small and compact genome in the marine cyanobacterium Prochlorococcus marinus CCMP 1375: lack of an intron in the gene for tRNA(Leu)(UAA) and a single copy of the rRNA operon. FEMS Microbiol. Lett., 181, 261–266. [DOI] [PubMed] [Google Scholar]

[gkg398c25] 25.Hess W.R., Rocap,G., Ting,C., Larimer,F., Lamerdin,J., Stilwagon,S. and Chisholm,S.W. (2001) The photosynthetic apparatus of Prochlorococcus: insights through comparative genomics. Photosynthesis Res., 70, 53–72. [DOI] [PubMed] [Google Scholar]

[gkg398c26] 26.Watson G.M. and Tabita,F.R. (1996) Regulation, unique gene organization and unusual primary structure of carbon fixation genes from a marine phycoerythrin-containing cyanobacterium. Plant Mol. Biol., 32, 1103–1115. [DOI] [PubMed] [Google Scholar]

[gkg398c27] 27.Herrero A., Muro-Pastor,A.M. and Flores,E. (2001) Nitrogen control in cyanobacteria. J. Bacteriol., 183, 411–425. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c28] 28.Luque I., Zabulon,G., Contreras,A. and Houmard,J. (2001) Convergence of two global transcriptional regulators on nitrogen induction of the stress-acclimation gene nblA in the cyanobacterium Synechococcus sp. PCC 7942. Mol. Microbiol., 41, 937–947. [DOI] [PubMed] [Google Scholar]

[gkg398c29] 29.Luque I., Flores,E. and Herrero,A. (1994) Molecular mechanism for the operation of nitrogen control in cyanobacteria. EMBO J., 13, 2862–2869. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c30] 30.Lindell D., Padan,E. and Post,A.F. (1998) Regulation of ntcA expression and nitrite uptake in the marine Synechococcus sp. strain WH7803. J. Bacteriol., 180, 1878–1886. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c31] 31.Lindell D., Erdner,D., Marie,D., Prasil,O., Koblizek,M., Le Gall,F., Rippka,R., Partensky,F., Scanlan,D.J. and Post,A.F. (2002) Nitrogen stress response of Prochlorococcus strain PCC 9511 (oxyphotobacteria) involves contrasting regulation of ntcA and amt1. J. Phycol., 38, 1113–1124. [Google Scholar]

[gkg398c32] 32.Moore L.R., Post,A.F., Rocap,G. and Chisholm,S.W. (2002) Utilization of different nitrogen sources by the marine cyanobacteria, Prochlorococcus and Synechococcus. Limnol. Oceanogr., 47, 989–996. [Google Scholar]

[gkg398c33] 33.Palinska K.A., Laloui,W., Bedu,S., Loiseaux-De Goer,S., Castets,A.M., Rippka,R. and Tandeau De Marsac,N. (2002) The signal transducer P(II) and bicarbonate acquisition in Prochlorococcus marinus PCC 9511, a marine cyanobacterium naturally deficient in nitrate and nitrite assimilation. Microbiology, 148, 2405–2412. [DOI] [PubMed] [Google Scholar]

[gkg398c34] 34.Xu Y., Mori,T. and Johnson,C.H. (2000) Circadian clock-protein expression in cyanobacteria: rhythms and phase setting. EMBO J., 19, 3349–3357. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c35] 35.Iwasaki H., Taniguchi,Y., Ishiura,M. and Kondo,T. (1999) Physical interactions among circadian clock proteins KaiA, KaiB and KaiC in cyanobacteria. EMBO J., 18, 1137–1145. [DOI] [PMC free article] [PubMed] [Google Scholar]

[gkg398c36] 36.Ishiura M., Kutsuna,S., Aoki,S., Iwasaki,H., Andersson,C.R., Tanabe,A., Golden,S.S., Johnson,C.H. and Kondo,T. (1998) Expression of a gene cluster kaiABC as a circadian feedback process in cyanobacteria. Science, 281, 1519–1523. [DOI] [PubMed] [Google Scholar]

37. Vaulot,D., Marie,D., Olson,R.J. and Chisholm,S.W. (1995) Growth of Prochlorococcus, a photosynthetic prokaryote, in the equatorial Pacific Ocean. Science, 268, 1480–1482.

38. Jacquet,S., Partensky,F., Marie,D., Casotti,R. and Vaulot,D. (2001) Cell cycle regulation by light in Prochlorococcus strains. Appl. Environ. Microbiol., 67, 782–790.

39. Holtzendorff,J., Partensky,F., Jacquet,S., Bruyant,F., Marie,D., Garczarek,L., Mary,I., Vaulot,D. and Hess,W.R. (2001) Diel expression of cell cycle-related genes in synchronized cultures of Prochlorococcus sp. strain PCC 9511. J. Bacteriol., 183, 915–920.

40. Holtzendorff,J., Marie,D., Post,A.F., Partensky,F., Rivlin,A. and Hess,W.R. (2002) Synchronized expression of ftsZ in natural Prochlorococcus populations of the Red Sea. Environ. Microbiol., 4, 644–653.

41. Yada,T., Nakao,M., Totoki,Y. and Nakai,K. (1999) Modeling and predicting transcriptional units of E. coli genes using hidden Markov models. Bioinformatics, 15, 987–993.

42. Buck,M., Gallegos,M.T., Studholme,D.J., Guo,Y. and Gralla,J.D. (2000) The bacterial enhancer-dependent σ⁵⁴ (σ^N) transcription factor. J. Bacteriol., 182, 4129–4136.

43. Hengge-Aronis,R. (2002) Stationary phase gene regulation: what makes an Escherichia coli promoter σ^S-selective? Curr. Opin. Microbiol., 5, 591–595.

44. Kaneko,T., Nakamura,Y., Wolk,C.P., Kuritz,T., Sasamoto,S., Watanabe,A., Iriguchi,M., Ishikawa,A., Kawashima,K., Kimura,T., Kishida,Y., Kohara,M., Matsumoto,M., Matsuno,A., Muraki,A., Nakazaki,N., Shimpo,S., Sugimoto,M., Takazawa,M., Yamada,M., Yasuda,M. and Tabata,S. (2001) Complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120. DNA Res., 8, 205–213.

45. Nakamura,Y., Kaneko,T., Sato,S., Ikeuchi,M., Katoh,H., Sasamoto,S., Watanabe,A., Iriguchi,M., Kawashima,K., Kimura,T., Kishida,Y., Kiyokawa,C., Kohara,M., Matsumoto,M., Matsuno,A., Nakazaki,N., Shimpo,S., Sugimoto,M., Takeuchi,C., Yamada,M. and Tabata,S. (2002) Complete genome structure of the thermophilic cyanobacterium Thermosynechococcus elongatus BP-1. DNA Res., 9, 123–130.

46. Imamura,S., Yoshihara,S., Nakano,S., Shiozaki,N., Yamada,A., Tanaka,K., Takahashi,H., Asayama,M. and Shirai,M. (2003) Purification, characterization and gene expression of all sigma factors of RNA polymerase in a cyanobacterium. J. Mol. Biol., 325, 857–872.

47. Bhaya,D., Watanabe,N., Ogawa,T. and Grossman,A.R. (1999) The role of an alternative sigma factor in motility and pilus formation in the cyanobacterium Synechocystis sp. strain PCC6803. Proc. Natl Acad. Sci. USA, 96, 3188–3193.

48. Partensky,F., Blanchot,J. and Vaulot,D. (1999) Differential distribution and ecology of Prochlorococcus and Synechococcus in oceanic waters: a review. In Charpy,L. and Larkum,A.W.D. (eds), Marine Cyanobacteria. Musée Océanographique, Monaco, pp. 457–475.

49. Brahamsha,B. and Haselkorn,R. (1992) Identification of multiple RNA polymerase sigma factor homologs in the cyanobacterium Anabaena sp. strain PCC 7120: cloning, expression and inactivation of the sigB and sigC genes. J. Bacteriol., 174, 7273–7282.

50. GotoSeki,A., Shirokane,M., Masuda,S., Tanaka,K. and Takahashi,H. (1999) Specificity crosstalk among group 1 and group 2 sigma factors in the cyanobacterium Synechococcus sp. PCC7942: in vitro specificity and a phylogenetic analysis. Mol. Microbiol., 34, 473–484.
