Abstract
Bacteria respond to their environment by regulating mRNA synthesis, often by altering the genomic sites at which RNA polymerase initiates transcription. Here, we investigate genome-wide changes in transcription start site (TSS) usage by Clostridium phytofermentans, a model bacterium for fermentation of lignocellulosic biomass. We quantify expression of nearly 10,000 TSS at single base resolution by Capp-Switch sequencing, which combines capture of synthetically capped 5′ mRNA fragments with template-switching reverse transcription. We find that the locations and expression levels of TSS for hundreds of genes change during metabolism of different plant substrates. We show that TSS mapping reveals riboswitches, non-coding RNA and novel transcription units. We identify sequence motifs associated with carbon source-specific TSS and use them for regulon discovery, implicating a LacI/GalR protein in control of pectin metabolism. We discuss how the high resolution and specificity of Capp-Switch enables study of condition-specific changes in transcription initiation in bacteria.
Bacteria may respond to a change in environment by using alternative transcriptional start sites. Here, the authors use a novel genome-wide capture and reverse transcription method to find substrate-specific start sites for hundreds of genes at single base resolution in Clostridium phytofermentans.
Bacteria translate environmental signals into cellular responses using a network of regulatory RNA and proteins that control genome-wide transcription patterns. Many of these regulators affect where RNA polymerase initiates messenger RNA (mRNA) synthesis at transcription start sites (TSS). As such, locating and quantifying changes in TSS usage is an important step to understand bacterial gene regulation. Here, we investigate TSS architecture in Clostridium phytofermentans ISDg, a soil bacterium that ferments plant biomass into ethanol, H2 and acetate1, and belongs to the Lachnospiraceae family that includes gut commensals with important roles in host nutrition2,3. This anaerobic mesophile metabolizes diverse plant components including cellulose, hemicellulose and pectin by tailoring expression of many carbohydrate-active enzymes (CAZymes) and other metabolic enzymes to the available substrate4,5. C. phytofermentans has a 4.8 Mb genome with 3,926 predicted protein-encoding genes3, and its ability to alter gene expression in response to carbon sources and other environmental cues is mediated by over 300 transcription regulator proteins6 and numerous non-coding RNA including metabolite-sensing riboswitches7.
We investigate genome-wide patterns of C. phytofermentans transcription initiation on heterogeneous plant substrates by demonstrating an approach called Capp-Switch sequencing. The initiating nucleotide of nascent mRNA is distinguished by a 5′ triphosphate (5′-PPP), which has been exploited for genome-wide TSS identification with dRNA-seq8 by depleting rRNA and other monophosphorylated transcripts using terminator exonuclease (TEX), a 5′-monophosphate-dependent exonuclease. dRNA-seq has been applied to diverse bacteria9,10,11,12,13, but because degradation of processed RNA is incomplete and non-specific, TSS must be identified by statistical comparison of read coverage in +TEX and −TEX samples. Capp-Switch avoids these problems by capturing and purifying 5′ mRNA fragments, which are reverse transcribed with template-switching to tagged cDNA for high-throughput sequencing (Fig. 1). The 5′-PPP ends of mRNA are modified by vaccinia capping enzyme (VCE) to bear a biotinylated guanosine cap that enables their capture and purification using streptavidin magnetic beads. Recently, TSS were identified by Cappable-Seq14 using VCE to add a desthiobiotin cap for bead-based capture of 5′ mRNA, which were then eluted from the beads and de-capped to ligate adapters for reverse transcription to tagged cDNA. Capp-Switch streamlines this approach by reverse transcribing the 5′ mRNA fragments using template-switching by Moloney murine leukemia virus (MMLV) reverse transcriptase15. Template-switching avoids adapter ligation and enables synthesis of 5′-tagged cDNA without releasing RNA from the beads, permitting use of an irreversible, biotinylated cap to increase RNA capture affinity. In all, we show Capp-Switch is a robust method that yields a genome-wide, strand-specific, quantitative map of TSS at single nucleotide resolution.
We apply Capp-Switch sequencing to define a genome-wide map of 9,457 TSS during C. phytofermentans growth on raw biomass, heterogeneous polysaccharides (cellulose, hemicellulose and pectin) and their constituent sugars. We use this TSS map to investigate features controlling gene regulation, such as RNA polymerase binding sites, 5′ untranslated region (UTR) structure, alternative promoters, operons and non-standard (leaderless and antisense) transcription. We identify sequence motifs associated with groups of TSS that are differentially expressed on specific carbon sources and show these motifs can be used to reconstruct transcription factor regulons. By integrating Capp-Switch data with an updated genome annotation, RNA-seq and proteomics, we discover novel transcriptional units (TU) and protein-encoding genes. Finally, we discuss how Capp-Switch sequencing can be applied as a general approach to explore transcription regulation in prokaryotes.
Results
General transcriptome features
Capp-Switch sequencing quantified TSS with high reproducibility between duplicate model substrate (Fig. 2a) and raw biomass (Fig. 2b) cultures. We identified 9,457 TSS across treatments (Supplementary Data 1), one-third of which were expressed in both sugar and polysaccharide cultures (Fig. 2c). Most reads (74%) map to InterS TSS, which are intergenic and on the same strand as the downstream gene (Fig. 2d); we observed these upstream of 898 genes. Among these, 687 genes (77%) are predicted to start operons16 (Supplementary Data 2), supporting these operon predictions and the existence of many sub-operons. The 5′ UTR, spanning from the primary TSS to the start codon, is less than 100 bp for most genes, but there is no correlation between 5′ UTR length and TSS strength (Fig. 2e). Studies in other bacteria report many leaderless mRNA lacking a 5′ UTR and ribosome binding site (RBS)11. Four per cent of InterS TSS are potentially leaderless in C. phytofermentans, but these genes generally have another upstream TSS and retain a typical RBS similar to that of highly expressed C. phytofermentans genes (Supplementary Fig. 1).
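To make the 5′ UTR definition concrete, the following Python sketch computes the UTR length from a TSS to the first base of the start codon on either strand, and flags a transcript as leaderless when that length is at most a cutoff. The `Gene` record, the coordinate convention and the `max_leader` cutoff are illustrative assumptions, not the study's pipeline.

```python
"""Sketch: strand-aware 5' UTR length and a leaderless flag.

Assumptions (not from the paper): 1-based inclusive coordinates, the TSS is
the first transcribed base, the start codon begins at `left` for '+' genes
and at `right` for '-' genes, and `max_leader` is a hypothetical cutoff.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    left: int    # leftmost genome coordinate
    right: int   # rightmost genome coordinate (inclusive)
    strand: str  # '+' or '-'


def utr5_length(tss: int, gene: Gene) -> int:
    """Transcribed bases before the start codon (negative if the TSS is intragenic)."""
    if gene.strand == "+":
        return gene.left - tss
    return tss - gene.right


def is_leaderless(tss: int, gene: Gene, max_leader: int = 0) -> bool:
    """True if transcription starts at, or within `max_leader` nt of, the start codon."""
    return 0 <= utr5_length(tss, gene) <= max_leader


# Hypothetical genes for illustration.
fwd = Gene("geneA", 1000, 1899, "+")
rev = Gene("geneB", 2001, 2900, "-")
print(utr5_length(950, fwd), utr5_length(2960, rev))
print(is_leaderless(1000, fwd))
```

A negative length signals that the TSS lies inside the coding sequence, that is, an intragenic rather than a 5′ UTR-defining TSS.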
Most genes were expressed from a single, primary TSS on all substrates (Fig. 2f), but 191 (21%) genes altered their primary TSS in response to carbon source. Further, genes with substrate-specific InterS TSS are often differentially expressed on that carbon source (χ2 test, P<0.01 for all substrates relative to glucose) (Fig. 2g), supporting the view that changing TSS is a widespread means of transcription regulation. In total, more than a thousand TSS are specific to each polysaccharide (Supplementary Fig. 2A). Xylan-specific (Supplementary Fig. 2B) and pectin-specific (Supplementary Fig. 2C) TSS are primarily associated with carbohydrate metabolism genes, while the most abundant functional category of cellulose-specific TSS is prophage genes (Supplementary Fig. 2D). The C. phytofermentans genome includes a large prophage island that is not predicted to encode a viable phage3, but whose transcription is up-regulated on cellulose and biomass (Supplementary Fig. 3). This burst of transcription initiation at viral genes could indicate that prophage excision was triggered on cellulosic substrates, for example by low carbon stress, or that viral proteins contribute to bacterial fitness17.
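The association between substrate-specific TSS and differential expression can be illustrated with a Pearson χ2 test on a 2 × 2 table. This sketch assumes rows of genes with or without a substrate-specific InterS TSS and columns of genes that are or are not differentially expressed; the exact table layout used in the study is not stated here, and the counts below are hypothetical.

```python
"""Sketch: Pearson chi-square test of association on a 2x2 table.

Assumption: rows = genes with / without a substrate-specific InterS TSS,
columns = genes differentially expressed / not on that substrate.
All margins must be nonzero. Counts in the example are hypothetical.
"""
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (statistic, p-value) for [[a, b], [c, d]] with 1 df, no Yates correction."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for r in range(2):
        for s in range(2):
            expected = rows[r] * cols[s] / n
            stat += (observed[r][s] - expected) ** 2 / expected
    # Upper tail of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 20 of 30 genes with a specific TSS are differentially
# expressed, versus 10 of 30 genes without one.
stat, p = chi2_2x2(20, 10, 10, 20)
print(f"chi2 = {stat:.3f}, P = {p:.4f}")
```

The closed-form tail probability via `math.erfc` holds only for one degree of freedom, which is why the sketch is restricted to 2 × 2 tables.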
Sequences upstream of primary TSS generally contain the sigma-A-type consensus −35 and −10 hexamers recognized by RNA polymerase (RNAP) and associated elements that likely contribute to promoter function in this organism. An A-rich region upstream of the −35 hexamer (TTGACA) (Fig. 2h) resembles the ‘UP element’ that stimulates transcription initiation by interacting with the RNAP alpha subunit18. Also, the Pribnow hexamer (TATAAT) has an upstream TG di-nucleotide (Fig. 2i), which enhances transcription in certain other bacteria19,20,21 by interacting with the RNAP sigma-A subunit22. In contrast, sequences upstream of IntraS (intragenic, sense) TSS contain an AT-rich stretch ∼10 bp upstream of the TSS but lack RNAP binding sites (Supplementary Fig. 4A), suggesting IntraS TSS often result from promiscuous initiation at AT-rich sequences. IntraS TSS comprised more than 50% of TSS (Fig. 2d), albeit with fewer reads per site than InterS TSS. dRNA-seq studies have attributed similarly abundant intragenic TSS to incomplete TEX degradation12, but our data support that these TSS bear 5′-PPP indicative of transcription initiation. IntraS TSS are preferentially found at the 5′ end of genes (Supplementary Fig. 4B), suggesting they are under selective pressure; they may have roles including expression of alternative protein isoforms or as mimicry molecules that sequester other RNA and ribonucleases from their mRNA targets9.
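A simple way to check a promoter sequence for the features described above is to scan the −10 search window for TATAAT-like hexamers and test for a TG dinucleotide one base upstream of the hexamer (the classic extended −10 arrangement, TGn). This is a minimal mismatch-count heuristic under stated assumptions, not the MEME-based motif discovery used in this study; `find_minus10` is a hypothetical name.

```python
"""Sketch: look for a TATAAT-like -10 hexamer with an upstream TGn extension.

Assumptions (not from the paper): `upstream` is the sequence on the TSS strand
ending at position -1 (the base just before the TSS at +1); the hexamer must
lie entirely within positions -far..-near (default -20..-5, the -10 search
window in Methods); the extension is 'TG' followed by one spacer base.
"""

CONSENSUS_10 = "TATAAT"


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_minus10(upstream: str, max_mismatch: int = 1,
                 far: int = 20, near: int = 5) -> list[tuple[int, str, bool]]:
    """Return (position of first hexamer base relative to TSS, hexamer, has_TGn)."""
    n = len(upstream)
    hits = []
    # Index j in `upstream` corresponds to position -(n - j) relative to the TSS.
    for i in range(max(0, n - far), n - near - 4):
        hexamer = upstream[i:i + 6]
        if hamming(hexamer, CONSENSUS_10) <= max_mismatch:
            # Extended -10: T and G at the two positions just before the spacer base.
            has_tgn = i >= 3 and upstream[i - 3:i - 1] == "TG"
            hits.append((-(n - i), hexamer, has_tgn))
    return hits


# Hypothetical promoter: TGc + TATAAT, with the hexamer starting at -14.
example = "C" * 30 + "TGC" + "TATAAT" + "C" * 8
print(find_minus10(example))
```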
Capp-Switch reads (Fig. 3a–d) start at specific positions with respect to known genes, showing TSS at single base resolution, whereas RNA-seq reads begin throughout genes (Fig. 3e–h). We observed four common TSS situations: genes with a single upstream TSS, genes with both upstream and intragenic TSS, genes with multiple TSS on a single substrate and genes with substrate-specific TSS. For example, the glyceraldehyde 3-phosphate dehydrogenase (gapdh) gene is constitutively transcribed from a single TSS (Fig. 3a). The pyruvate ferredoxin oxidoreductase (pfor) gene is transcribed from a single upstream TSS and another, weaker TSS in the coding sequence (Fig. 3b). The cel5A cellulase gene23 is simultaneously transcribed from multiple TSS on cellulose (Fig. 3c), as are other cellulases (Supplementary Fig. 5). CAZyme expression in C. phytofermentans is controlled by carbon source24,25, and our data suggest that this regulation involves multiple promoters. The cphy1510 gene encoding the most active xylanase5 is transcribed from three TSS on xylan and a different, upstream TSS on pectin (Fig. 3d). Similarly, genes for other CAZymes, including three cellulases, one other xylanase, four pectinases and two glycosyl transferases, changed their primary TSS as a function of carbon source. We confirmed the positions of the primary TSS identified by Capp-Switch for gapdh, pfor (IntraS and primary TSS), cphy2243 and cphy1510 (xylan and pectin) using 5′ RACE (Supplementary Fig. 6).
Motifs associated with TSS clusters
We clustered TSS based on expression across carbon sources and searched sequences surrounding TSS for overrepresented motifs (Supplementary Fig. 7; Supplementary Data 3), revealing TSS clusters that share motifs with potential regulatory functions (Fig. 4). For example, the TSS cluster up-regulated on galacturonic acid and homogalacturonan (HG) (Fig. 4c) has a palindromic motif resembling the cre operator (TGAAAGCGCTTTCA) bound by B. subtilis CcpA26,27, a LacI/GalR regulator of numerous carbon metabolism genes. LacI/GalR genes often have upstream copies of their operators to auto-repress transcription28, and we found three copies of the galacturonic acid cluster motif in the 5′ UTR of cphy2742, a LacI/GalR gene specifically up-regulated on galacturonic acid (Fig. 5a). Further, three of the six LacI/GalR genes with detected primary TSS have upstream variants of the cre operator that are conserved in their orthologs from related species (Fig. 5b–d), leading us to propose that C. phytofermentans LacI/GalR regulators recognize related, but distinct, operators to control separate regulons. Consistent with this, the putative Cphy2742 operator (Fig. 5b) is upstream of 22 genes in the C. phytofermentans genome (Supplementary Table 1), including 3 CAZymes (PL9 pectin lyases) that degrade HG to galacturonic acid5 and transcription units containing all genes needed to assimilate galacturonic acid29 (Supplementary Fig. 8).
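The cre-like consensus TGAAAGCGCTTTCA is its own reverse complement, so one consensus string describes the site on both strands. The sketch below checks that property and lists near-matches in a sequence such as a 5′ UTR. The mismatch threshold is a hypothetical parameter, and plain consensus matching is a simplification: the study itself built motifs with MEME and scanned them with MAST.

```python
"""Sketch: palindrome check and near-match scan for an operator consensus.

Assumptions (not from the paper): a single consensus string and a
hypothetical `max_mismatch` threshold instead of a position weight matrix.
"""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindrome(motif: str) -> bool:
    """A DNA palindrome reads the same 5'->3' on both strands."""
    return motif == revcomp(motif)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(seq: str, motif: str, max_mismatch: int = 1) -> list[int]:
    """0-based starts where `seq` matches `motif` on either strand."""
    rc = revcomp(motif)
    k = len(motif)
    return [i for i in range(len(seq) - k + 1)
            if min(hamming(seq[i:i + k], motif), hamming(seq[i:i + k], rc)) <= max_mismatch]


CRE = "TGAAAGCGCTTTCA"
# Hypothetical sequence with one exact site and one site carrying a single mismatch.
utr = "GGGG" + CRE + "GGGG" + "TGAAAGCGATTTCA" + "GGGG"
print(is_palindrome(CRE), find_sites(utr, CRE))
```

For a palindromic motif the reverse-complement comparison is redundant, but keeping it makes the same function usable for non-palindromic operators.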
The putative Cphy2742 operator sites are co-located with or downstream of TSS for HG degradation and galacturonic acid metabolism genes (Fig. 5e), suggesting that Cphy2742 binds these sites to block transcription. Transcription of the pl9 genes cphy2919 and cphy3869 switches to upstream primary TSS on galacturonic acid relative to HG, but all TSS are close enough to be potentially regulated by Cphy2742 operators. The pta-ackA (cphy1326-7) acetate synthesis operon also has a Cphy2742 operator, and both pta-ackA expression and acetate formation are elevated on galacturonic acid (Supplementary Fig. 9). While B. subtilis CcpA represses most of its targets, it activates pta and ackA transcription30,31 by binding upstream of their promoters32. The Cphy2742 operator is also upstream of the pta gene TSS, suggesting Cphy2742 may similarly activate transcription of the pta-ackA operon as well as the glycolytic gene ppdK and the hydrolase gene cphy0367. Collectively, we propose that Cphy2742 represses a comprehensive set of pectin fermentation genes by binding a conserved palindrome at or downstream of their TSS to block transcription. In response to a galacturonic acid-based signal, Cphy2742 de-represses itself and its targets, and may activate transcription of acetate synthesis and other aspects of carbon metabolism by binding upstream of TSS.
Antisense and novel transcripts
Recent studies found that 30–40% of TSS are antisense in other bacteria8,9,13. However, antisense transcription appears rare in C. phytofermentans: <1% of TSS were antisense either between (InterA) or within genes (IntraA) (Fig. 2d). To investigate whether diffuse antisense transcription was underestimated by our TSS thresholds, we classified all mapped read starts, including those not meeting TSS thresholds. Even then, the InterA and IntraA classes together comprise <4% of reads. This dearth of antisense transcription may relate to the early evolutionary divergence of the Clostridiales33. Alternatively, we would not detect antisense transcripts that were processed to remove 5′-PPP or that are below the 200 bp size threshold of our cDNA libraries, although studies in other bacteria using larger size thresholds found antisense TSS in ∼35% of genes10. While comparatively rare, antisense transcription appears to have important cellular functions. For example, we observed an antisense TSS in the 5′ UTR of the sporulation regulator spo0A (cphy2497) that also opposes transcription of the spoIVB peptidase (cphy2498) (Fig. 6a). This TSS was expressed on all sugars, but not on polysaccharides, suggesting that antisense transcription has a role in repressing sporulation during log growth in sugar-replete conditions.
TSS reveal novel transcriptional features, such as a TU downstream of the glycoside hydrolase cphy2658 that is up-regulated on cellulose and corn stover to become the strongest initiation site in the genome (Fig. 6b). This region contains a hypothetical open reading frame (ORF) in the MaGe annotation (clops3132) that has no similar sequences in Genbank, but the ORF lacks an RBS and we did not detect any expressed peptides from this region by mass spectrometry, suggesting it is a non-coding RNA. The most highly expressed ABC transporter on glucose is a putative operon (cphy2241-3) with a single TSS (Supplementary Fig. 5C,F). On all other carbon sources, we observed repression of cphy2241-3 along with the appearance of an upstream, antisense TU (Fig. 6c) that has no mapped peptides or predicted ORF. Non-coding RNA are often associated with ABC transporters in clostridia34, and they may also regulate ABC transport in this organism.
The C. phytofermentans genome may encode significantly more genes than are in the NCBI Genbank annotation. Classifying TSS using the MaGe annotation showed that 735 (7%) TSS map to MaGe-specific clops genes of unknown function (Supplementary Data 4), including 64 clops genes with InterS TSS. We examined which of these novel TU encode proteins by mapping C. phytofermentans MS/MS peptide spectra to the genome translated in all frames, identifying peptides outside the predicted proteome in 21 InterS, 13 IntraS, 5 InterA and 25 IntraA regions (Supplementary Data 5). The combination of TSS and expressed peptides supports ORFs with N-terminal extensions, such as cphy0891 (Supplementary Fig. 10A), and the existence of novel ORFs. Examples include clops3461, which overlaps with cphy2929 on the opposite strand (Fig. 6d), and an antisense overlapping ORF in cphy1953, which encodes the ComEA competence protein (Supplementary Fig. 10B).
TSS also reveal mechanisms of RNA-mediated gene regulation. Comparative genomics with other clostridia detected a putative T-box upstream of the C. phytofermentans trp operon34. In low tryptophan conditions, the T-box promotes antitermination of the trp operon by base pairing with uncharged tRNAtrp (ref. 35). We observed that transcription halted abruptly in the 5′ UTR of the trp operon in glucose cultures (Fig. 6e), consistent with T-box-mediated repression. In cellulose cultures, antitermination at the T-box enabled trp operon mRNA expression, potentially enabling translation of the tryptophan-rich carbohydrate binding modules in cellulases and other CAZymes. TSS also support riboswitches associated with genes for metabolism of flavin mononucleotide (FMN), cobalamin, thiamine pyrophosphate (TPP) and lysine (Supplementary Data 6). For example, C. phytofermentans is auxotrophic for thiamine, which it takes up via a thiamine transporter, Cphy0729 (ref. 36). The cphy0729 gene has a single, constitutive TSS with an extended 5′ UTR containing a putative TPP-sensing riboswitch (Fig. 6f) that could regulate transporter expression in response to intracellular TPP levels37.
Discussion
The strategy presented here to quantify condition-specific changes in transcription initiation by Capp-Switch sequencing could be generally applied to dissect the regulation of complex bacterial phenotypes. In this study, we explored the transcriptional programme enabling C. phytofermentans to ferment the cellulosic, hemicellulosic and pectic components of plant biomass. We found that growth on these different carbon sources entailed widespread TSS changes, including use of substrate-specific TSS for genes encoding biomass-degrading enzymes such as cellulases, xylanases and pectinases. Substrate-specific TSS could enable tuning of expression by changing promoters or the regulatory properties (that is, binding sites or secondary structure) of the 5′ UTR. We observed that genes encoding cellulases and other enzymes are simultaneously expressed from more than one TSS. Multiple regulators may control transcription of these genes, reflecting the numerous transcription factors encoded by this organism (Supplementary Data 7). Genes for biomass-degrading enzymes in other Clostridiales are regulated by various transcription factors, including a two-component system for hemicellulases38, a LacI/GalR protein for β-1,3-glucanases39 and alternative sigma factors for cellulases40. We defined TSS clusters that were differentially expressed on specific carbon sources and used them to guide the discovery of sequence motifs with potential regulatory function, leading us to identify the LacI/GalR protein Cphy2742 as a putative regulator of pectin metabolism. Combining TSS mapping with motif searching could be broadly applied to LacI/GalR regulators and other types of transcription factors. For example, each of the four TetR regulators for which we detected TSS also has conserved, TSS-associated palindromes that resemble operator sites (Supplementary Fig. 11).
We also gained insight into regulatory mechanisms such as antisense transcription, leaderless transcription and non-coding RNA. We observed that antisense and leaderless transcription are much rarer than reported in other bacteria, and it will be interesting to see whether they are similarly uncommon in closely related bacteria. We also show that integration of Capp-Switch TSS mapping with RNA-seq and proteomics enables discovery of novel transcription units and protein-encoding genes. Transcription initiation is a complex and important component of gene regulation, and most of its underlying mechanisms in C. phytofermentans remain unknown. Further, these results illustrate how little we know about gene regulation in plant-fermenting clostridia, a group of bacteria with important roles in soil and gut microbiomes and significant potential to serve as biocatalysts for industrial transformation of plant biomass.
Methods
Bacterial cultivation
C. phytofermentans ISDg (ATCC 700394) was cultured anaerobically at 30 °C in GS2 medium41 containing 5 g l−1 of either D-(+)-glucose (Sigma G5767), D-(+)-xylose (Sigma X3877), D-galacturonic acid sodium salt (Sigma 73960), regenerated amorphous cellulose (RAC) from Avicel PH-101 (Sigma 11365), birchwood xylan (Sigma X0502), apple pectin (HG) (Sigma P8471) or raw corn stover (Qteros Inc) cut into 0.5 × 3.0 cm strips. RAC was prepared by phosphoric acid treatment42. Duplicate cultures were sampled in mid-log phase or after 2 days (RAC) or 3 days (stover). Fermentation products were quantified by HPLC43.
Capp-Switch library preparation
Total RNA was extracted from duplicate cultures for each treatment using TRI reagent (Sigma 93289) and treated with Turbo DNase (Ambion AM2238) at 0.2 U μg−1 RNA for 30 min at 37 °C. RNA was purified by Zymo Concentrator-5 (Zymo Research R1015) (>200 bp capture) into 15 μl water. RNA was 5′ capped using VCE (NEB M2080) at 3 U μg−1 RNA with 0.1 mM SAM and 0.5 mM 3′ biotin-GTP (NEB N0760) for 30 min at 37 °C and purified by Zymo Concentrator-5 (>200 bp capture) with two additional washes into 45 μl water. RNA was fragmented for 30 s at 94 °C using NEBNext Magnesium-based RNA fragmentation buffer (NEB E6101) and purified by Zymo Concentrator-5 (total RNA capture) into 100 μl water. Streptavidin magnetic beads (NEB S1421S) were pre-washed twice with low-salt buffer (10 mM Tris, 50 mM NaCl, 1 mM EDTA) and twice with binding buffer (10 mM Tris, 500 mM NaCl, 1 mM EDTA), then resuspended at 4 mg ml−1 beads in binding buffer. Capped RNA fragments were bound to streptavidin beads for 20 min at room temperature and magnetically separated from other RNA by washing twice with binding buffer and twice with low-salt buffer to remove unbound RNA. Beads were washed once with 1 mM Tris–HCl pH 7.5 and resuspended in 1 mM Tris–HCl pH 7.5.
RNA was converted to single-stranded cDNA by SMARTScribe MMLV reverse transcriptase (Clontech 634836) at 10 U μl−1 with 2.5 mM DTT, 1 mM dNTP, 1.2 μM SMARTer stranded oligo and 0.6 μM SMART stranded N6 primer (Clontech 634836) by incubating 90 min at 42 °C and 10 min at 70 °C. Beads were collected, and the supernatant was combined with the liquid fraction recovered after washing the beads with 30 μl 1 mM Tris pH 7.5. The cDNA was purified twice using 1 volume of solid phase reversible immobilization (SPRI) beads (Beckman Coulter A63880). cDNA was left on the beads after the second purification, and double-stranded cDNA was synthesized by 18 cycles of PCR using SeqAmp DNA polymerase (Clontech 638504) with 0.25 μM primers (Universal Forward PCR primer and indexed Reverse PCR primer) and then SPRI purified with 1 volume of beads. DNA was sequenced on an Illumina MiSeq using 150 bp paired-end chemistry.
TSS identification and classification
Sequencing reads were quality filtered44, and the 3 bp non-templated extension that MMLV reverse transcriptase adds to the cDNA 3′ end was removed from the 5′ end of forward (R1) reads. Reads were mapped to the C. phytofermentans ISDg genome (NCBI NC_010001.1) using Bowtie 2 (version 2.2.4)45. Alignments showed that 87–98% of reads mapped to unique positions in the C. phytofermentans genome, yielding between 0.4 million (corn stover) and 3.4 million (glucose) reads per culture (Supplementary Table 2). TSS were identified using R1 reads by calculating the number of reads starting at each genomic position, clustering read counts within a 5 bp sliding window, and retaining the position with the greatest number of reads. TSS were defined as genome positions with more than 10 read starts per million reads in both duplicate cultures. Capp-Switch TSS were confirmed by 5′ RACE (Sigma 03353621001) using primers in Supplementary Table 3 to amplify PCR products, which were resolved by electrophoresis, excised and sequenced.
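The TSS-calling steps can be sketched as follows. This is one plausible reading of the description, not the authors' code: read starts are (strand, position) pairs from trimmed R1 reads, peak positions are chosen greedily on pooled duplicate counts, and each replicate must then exceed the reads-per-million threshold within the same window. The function names are hypothetical.

```python
"""Sketch of the TSS-calling steps described above.

Assumptions (interpretation, not the authors' code): each read start is the
(strand, 1-based position) of a trimmed R1 read's 5' end; the strongest
position absorbs read starts within window - 1 bp on the same strand; each
duplicate is scored by its read starts in that window, in reads per million.
"""
from collections import Counter
from collections.abc import Iterable

ReadStart = tuple[str, int]  # (strand, genome position)


def trim_nontemplated(read_seq: str, n: int = 3) -> str:
    """Drop the first n bases of R1, contributed by the RT's non-templated extension."""
    return read_seq[n:]


def pick_peaks(counts: Counter, window: int = 5) -> list[ReadStart]:
    """Greedy local maxima: each peak claims neighbours within window - 1 bp."""
    claimed: set = set()
    peaks: list = []
    for (strand, pos), _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if (strand, pos) in claimed:
            continue
        peaks.append((strand, pos))
        for p in range(pos - window + 1, pos + window):
            claimed.add((strand, p))
    return peaks


def call_tss(rep1: Iterable[ReadStart], rep2: Iterable[ReadStart],
             min_rpm: float = 10.0, window: int = 5) -> list[ReadStart]:
    """Peaks with more than `min_rpm` read starts per million in both duplicates."""
    c1, c2 = Counter(rep1), Counter(rep2)
    totals = (sum(c1.values()), sum(c2.values()))
    tss = []
    for strand, pos in pick_peaks(c1 + c2, window):
        span = range(pos - window + 1, pos + window)
        rpm = [sum(c[(strand, p)] for p in span) * 1e6 / total
               for c, total in zip((c1, c2), totals)]
        if all(x > min_rpm for x in rpm):
            tss.append((strand, pos))
    return sorted(tss)


# Hypothetical duplicates of 1,000 reads each.
rep1 = [("+", 100)] * 60 + [("+", 102)] * 4 + [("-", 500)] + [("+", 5000)] * 935
rep2 = [("+", 100)] * 30 + [("-", 500)] + [("+", 5000)] * 969
print(call_tss(rep1, rep2, min_rpm=5000))
```

With such small toy libraries one read equals 1,000 reads per million, so the example raises `min_rpm` to show a site being rejected; with real libraries the study's threshold of 10 applies.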
Genes in the NCBI and MicroScope (MaGe) annotations46 were used to divide TSS into four categories: InterS (intergenic TSS with the downstream gene in the same orientation), InterA (intergenic TSS with the downstream gene in the opposite orientation), IntraS (intragenic TSS in a gene with the same orientation) or IntraA (intragenic TSS in a gene with the opposite orientation). The InterS TSS with the most reads for each gene was defined as the primary TSS. Capp-Switch results were compared with strand-specific (dUTP) RNA-seq of C. phytofermentans grown in the same culture conditions5. RNA-seq gene expression was calculated as RPKM using the Bioconductor47 package ‘easyRNASeq’, and differential expression was defined as a DESeq48 (version 1.22.1) P-value <0.05 after Bonferroni correction for multiple testing of the 3,902 genes in the C. phytofermentans genome. Peptides corresponding to novel ORFs were identified by mapping peptide MS/MS spectra from glucose, xylan and cellulose cultures4 to the genome translated in all six frames. Peptides were identified from spectra using SEQUEST and filtered to a 5% false discovery rate using a target-decoy approach49,50 with a decoy database of reversed target sequences.
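The four-way classification and the primary-TSS rule can be expressed compactly. This sketch assumes 1-based gene coordinates, ignores the wrap-around of the circular chromosome, resolves overlapping genes by taking the first containing gene, and treats "downstream" as the nearest gene in the direction of transcription on either strand; `Gene`, `classify_tss` and `primary_tss` are hypothetical names.

```python
"""Sketch: assign TSS to InterS / InterA / IntraS / IntraA and pick primary TSS.

Assumptions (not from the paper): 1-based inclusive coordinates on a linear
sequence, first containing gene wins for overlaps, and the downstream gene is
the nearest gene (either strand) in the TSS's direction of transcription.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    name: str
    left: int
    right: int
    strand: str  # '+' or '-'


def classify_tss(strand: str, pos: int, genes: list[Gene]) -> tuple[str, Optional[str]]:
    """Return (category, gene name) for a TSS at `pos` on `strand`."""
    for g in genes:
        if g.left <= pos <= g.right:
            return ("IntraS" if g.strand == strand else "IntraA"), g.name
    if strand == "+":
        ahead = [g for g in genes if g.left > pos]
        nearest = min(ahead, key=lambda g: g.left, default=None)
    else:
        ahead = [g for g in genes if g.right < pos]
        nearest = max(ahead, key=lambda g: g.right, default=None)
    if nearest is None:
        return "unassigned", None
    return ("InterS" if nearest.strand == strand else "InterA"), nearest.name


def primary_tss(tss_reads: dict[tuple[str, int], int],
                genes: list[Gene]) -> dict[str, tuple[str, int]]:
    """For each gene, the InterS TSS with the most reads."""
    best: dict = {}
    for (strand, pos), reads in tss_reads.items():
        category, name = classify_tss(strand, pos, genes)
        if category == "InterS" and (name not in best or reads > best[name][0]):
            best[name] = (reads, (strand, pos))
    return {name: site for name, (_, site) in best.items()}
```

An IntraS site is never chosen as primary even when it has more reads, matching the rule that the primary TSS is the strongest InterS TSS of a gene.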
Motif analysis
Sequence motifs were identified using MEME51 with a background model of di-nucleotide frequencies in the C. phytofermentans genome. Searches for RNA polymerase binding site motifs included positions 25–50 bp (−35 motif) and 5–20 bp (−10 motif) upstream of all primary TSS expressed on the three sugars and polysaccharides. The top palindromic motifs associated with LacI/GalR and TetR regulators were found by searching sequences from −250 (upstream) to +50 bp (downstream) relative to the start codon of C. phytofermentans genes and their putative orthologs from related genomes, identified by reciprocal best BLAST hits (Supplementary Table 4). These motifs were used for genome-wide scans from −250 to +50 bp of all C. phytofermentans genes using MAST52. To cluster TSS by expression, the read counts of the 1,188 TSS with at least a 30-fold change between two conditions were log2-transformed, and each TSS was normalized to have a median value of 0 across conditions and scaled so that the sum of its squared expression levels is 1. TSS were separated into 24 clusters by K-means using the city-block distance metric. Significant motifs (E<0.001) associated with individual K-means clusters were identified by searching −100 to +10 bp with respect to each TSS.
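The normalization and clustering steps can be sketched in pure Python. With the city-block (L1) distance, the natural centroid update is the component-wise median, which is what this sketch uses; the pseudocount, the max/min definition of fold change and the farthest-first initialization are assumptions, since they are not specified above.

```python
"""Sketch of the expression-clustering steps described above.

Assumptions (not from the paper): pseudocount of 1 before log2, fold change
as max/min over conditions, farthest-first initialization, and centroids
updated as component-wise medians (the L1-optimal centre).
"""
import math
from statistics import median


def passes_fold_filter(counts: list[float], min_fold: float = 30.0,
                       pseudocount: float = 1.0) -> bool:
    """True if the largest and smallest condition differ at least `min_fold`-fold."""
    vals = [c + pseudocount for c in counts]
    return max(vals) / min(vals) >= min_fold


def normalize(counts: list[float], pseudocount: float = 1.0) -> list[float]:
    """log2, centre on the median across conditions, scale to unit sum of squares."""
    logs = [math.log2(c + pseudocount) for c in counts]
    m = median(logs)
    centred = [x - m for x in logs]
    norm = math.sqrt(sum(x * x for x in centred))
    return [x / norm for x in centred] if norm > 0 else centred


def cityblock(a: list[float], b: list[float]) -> float:
    """L1 (city-block) distance between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans_cityblock(profiles: list[list[float]], k: int,
                     max_iter: int = 100) -> list[int]:
    """Return a cluster label per profile."""
    # Farthest-first initialization keeps the sketch deterministic.
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(cityblock(p, c) for c in centroids))
        centroids.append(list(far))
    labels: list = []
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: cityblock(p, centroids[j])) for p in profiles]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [median(col) for col in zip(*members)]
    return labels


# Hypothetical read counts for four TSS across four conditions.
raw = [[1000, 10, 10, 10], [900, 12, 8, 11], [10, 10, 10, 1000], [11, 9, 10, 950]]
profiles = [normalize(r) for r in raw if passes_fold_filter(r)]
print(kmeans_cityblock(profiles, 2))
```

With the study's settings, `kmeans_cityblock` would be run with k = 24 on the normalized profiles of the 1,188 filtered TSS; a production analysis would typically also repeat clustering from several initializations.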
Data availability
The authors confirm that all data underlying the findings are fully available without restriction. RNA sequencing files in FASTQ format are available in the European Nucleotide Archive under study accession PRJEB13063.
Additional information
How to cite this article: Boutard, M. et al. Global repositioning of transcription start sites in a plant-fermenting bacterium. Nat. Commun. 7, 13783 doi: 10.1038/ncomms13783 (2016).
Acknowledgments
This work was funded by a CNRS Chaire d'Excellence to A.C.T. and the Genoscope-CEA. We thank NEB for providing reagents (biotin-GTP, vaccinia capping enzyme and streptavidin beads), the Genoscope-CEA sequencing platform for RNA sequencing and the LABGeM group for supporting the MicroScope (MaGe) annotation resource.
Footnotes
Author contributions L.E., A.A., M.S., I.S. and A.C.T. conceived the project. M.B., T.C. and K.L. collected data. M.B., L.E., I.S. and A.C.T. analysed the results. A.C.T. wrote the paper.
References
- Warnick T. A., Methé B. A. & Leschine S. B. Clostridium phytofermentans sp. nov., a cellulolytic mesophile from forest soil. Int. J. Syst. Evol. Microbiol. 52, 1155–1160 (2002).
- Meehan C. J. & Beiko R. G. A phylogenomic view of ecological specialization in the Lachnospiraceae, a family of digestive tract-associated bacteria. Genome Biol. Evol. 6, 703–713 (2014).
- Petit E. et al. Genome and transcriptome of Clostridium phytofermentans, catalyst for the direct conversion of plant feedstocks to fuels. PLoS ONE 10, e0118285 (2015).
- Tolonen A. C. et al. Proteome-wide systems analysis of a cellulosic biofuel-producing microbe. Mol. Syst. Biol. 7, 461 (2011).
- Boutard M. et al. Functional diversity of carbohydrate-active enzymes enabling a bacterium to ferment plant biomass. PLoS Genet. 10, e1004773 (2014).
- Hunter S. et al. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 40, D306–D312 (2012).
- Nawrocki E. P. et al. Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 43, D130–D137 (2015).
- Sharma C. M. et al. The primary transcriptome of the major human pathogen Helicobacter pylori. Nature 464, 250–255 (2010).
- Mitschke J. et al. An experimentally anchored map of transcriptional start sites in the model cyanobacterium Synechocystis sp. PCC6803. Proc. Natl Acad. Sci. USA 108, 2124–2129 (2011).
- Schlüter J.-P. et al. Global mapping of transcription start sites and promoter motifs in the symbiotic α-proteobacterium Sinorhizobium meliloti 1021. BMC Genomics 14, 156 (2013).
- Cortes T. et al. Genome-wide mapping of transcriptional start sites defines an extensive leaderless transcriptome in Mycobacterium tuberculosis. Cell Rep. 5, 1121–1131 (2013).
- Shao W., Price M. N., Deutschbauer A. M., Romine M. F. & Arkin A. P. Conservation of transcription start sites within genes across a bacterial genus. mBio 5, e01398-14 (2014).
- Thomason M. K. et al. Global transcriptional start site mapping using differential RNA sequencing reveals novel antisense RNAs in Escherichia coli. J. Bacteriol. 197, 18–28 (2015).
- Ettwiller L., Buswell J., Yigit E. & Schildkraut I. A novel enrichment strategy reveals unprecedented number of novel transcription start sites at single base resolution in a model prokaryote and the gut microbiome. BMC Genomics 17, 199 (2016).
- Zhu Y. Y., Machleder E. M., Chenchik A., Li R. & Siebert P. D. Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. BioTechniques 30, 892–897 (2001).
- Dehal P. S. et al. MicrobesOnline: an integrated portal for comparative and functional genomics. Nucleic Acids Res. 38, D396–D400 (2010).
- Bondy-Denomy J. & Davidson A. R. When a virus is not a parasite: the beneficial effects of prophages on bacterial fitness. J. Microbiol. 52, 235–242 (2014).
- Ross W. et al. A third recognition element in bacterial promoters: DNA binding by the alpha subunit of RNA polymerase. Science 262, 1407–1413 (1993).
- Graves M. C. & Rabinowitz J. C. In vivo and in vitro transcription of the Clostridium pasteurianum ferredoxin gene. Evidence for ‘extended’ promoter elements in gram-positive organisms. J. Biol. Chem. 261, 11409–11415 (1986).
- Helmann J. D. Compilation and analysis of Bacillus subtilis sigma A-dependent promoter sequences: evidence for extended contact between RNA polymerase and upstream promoter DNA. Nucleic Acids Res. 23, 2351–2360 (1995).
- Burns H. D., Ishihama A. & Minchin S. D. Open complex formation during transcription initiation at the Escherichia coli galP1 promoter: the role of the RNA polymerase alpha subunit at promoters lacking an UP-element. Nucleic Acids Res. 27, 2051–2056 (1999).
- Barne K. A., Bown J. A., Busby S. J. & Minchin S. D. Region 2.5 of the Escherichia coli RNA polymerase sigma70 subunit is responsible for the recognition of the ‘extended-10’ motif at promoters. EMBO J. 16, 4034–4040 (1997).
- Liu W., Zhang X.-Z., Zhang Z. & Zhang Y.-H. P. Engineering of Clostridium phytofermentans Endoglucanase Cel5A for improved thermostability. Appl. Environ. Microbiol. 76, 4914–4917 (2010).
- Tolonen A. C., Chilaka A. C. & Church G. M. Targeted gene inactivation in Clostridium phytofermentans shows that cellulose degradation requires the family 9 hydrolase Cphy3367. Mol. Microbiol. 74, 1300–1313 (2009).
- Tolonen A. C. et al. Fungal lysis by a soil bacterium fermenting cellulose. Environ. Microbiol. 17, 2618–2627 (2015).
- Weickert M. J. & Chambliss G. H. Site-directed mutagenesis of a catabolite repression operator sequence in Bacillus subtilis. Proc. Natl Acad. Sci. USA 87, 6238–6242 (1990).
- Marciniak B. C. et al. High- and low-affinity cre boxes for CcpA binding in Bacillus subtilis revealed by genome-wide analysis. BMC Genomics 13, 401 (2012).
- Francke C., Kerkhoven R., Wels M. & Siezen R. J. A generic approach to identify transcription factor-specific operator motifs; Inferences for LacI-family mediated regulation in Lactobacillus plantarum WCFS1. BMC Genomics 9, 145 (2008).
- Richard P. & Hilditch S. D-galacturonic acid catabolism in microorganisms and its biotechnological relevance. Appl. Microbiol. Biotechnol. 82, 597–604 (2009). [DOI] [PubMed] [Google Scholar]
- Grundy F. J., Waters D. A., Allen S. H. & Henkin T. M. Regulation of the Bacillus subtilis acetate kinase gene by CcpA. J. Bacteriol. 175, 7348–7355 (1993). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Presecan-Siedel E. et al. Catabolite regulation of the pta gene as part of carbon flow pathways in Bacillus subtilis. J. Bacteriol. 181, 6889–6897 (1999). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fujita Y. Carbon catabolite control of the metabolic network in Bacillus subtilis. Biosci. Biotechnol. Biochem. 73, 245–259 (2009). [DOI] [PubMed] [Google Scholar]
- Paredes C. J., Alsaker K. V. & Papoutsakis E. T. A comparative genomic view of clostridial sporulation and physiology. Nat. Rev. Microbiol. 3, 969–978 (2005). [DOI] [PubMed] [Google Scholar]
- Chen Y., Indurthi D. C., Jones S. W. & Papoutsakis E. T. Small RNAs in the genus Clostridium. MBio. 2, e00340-10 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Merino E. & Yanofsky C. Transcription attenuation: a highly conserved regulatory strategy used by bacteria. Trends Genet. 21, 260–264 (2005). [DOI] [PubMed] [Google Scholar]
- Tolonen A. C., Petit E., Blanchard J. L., Warnick T. & Leschine S. B. in Biological Conversion of Biomass for Fuels and Chemicals (eds Sun, J. et al.) 114–139 (Royal Society of Chemistry, 2013).
- Winkler W., Nahvi A. & Breaker R. R. Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature 419, 952–956 (2002). [DOI] [PubMed] [Google Scholar]
- Celik H. et al. A two-component system (XydS/R) controls the expression of genes encoding CBM6-containing proteins in response to straw in Clostridium cellulolyticum. PLoS ONE 8, e56063 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Newcomb M., Chen C.-Y. & Wu J. H. D. Induction of the celC operon of Clostridium thermocellum by laminaribiose. Proc. Natl Acad. Sci. USA 104, 3747–3752 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nataf Y. et al. Clostridium thermocellum cellulosomal genes are regulated by extracytoplasmic polysaccharides via alternative sigma factors. Proc. Natl Acad. Sci. USA 107, 18646–18651 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cavedon K., Leschine S. B. & Canale-Parola E. Cellulase system of a free-living, mesophilic clostridium (strain C7). J. Bacteriol. 172, 4222–4230 (1990). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hong J., Ye X., Wang Y. & Zhang Y.-H. P. Bioseparation of recombinant cellulose-binding module-proteins by affinity adsorption on an ultra-high-capacity cellulosic adsorbent. Anal. Chim. Acta 621, 193–199 (2008). [DOI] [PubMed] [Google Scholar]
- Tolonen A. C. et al. Physiology, genomics, and pathway engineering of an ethanol-tolerant strain of Clostridium phytofermentans. Appl. Environ. Microbiol. 81, 5440–5448 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Alberti A. et al. Comparison of library preparation methods reveals their impact on interpretation of metatranscriptomic data. BMC Genomics 15, 912 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Langmead B. & Salzberg S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vallenet D. et al. MicroScope—an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data. Nucleic Acids Res. 41, D636–D647 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Delhomme N., Padioleau I., Furlong E. E. & Steinmetz L. M. easyRNASeq: a bioconductor package for processing RNA-Seq data. Bioinformatics 28, 2532–2533 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Anders S. & Huber W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elias J. E. & Gygi S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007). [DOI] [PubMed] [Google Scholar]
- Tolonen A. C. & Haas W. Quantitative proteomics using reductive dimethylation for stable isotope labeling. J Vis. Exp. 89, e51416 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bailey T. L. & Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994). [PubMed] [Google Scholar]
- Bailey T. L. & Gribskov M. Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14, 48–54 (1998). [DOI] [PubMed] [Google Scholar]
Data Availability Statement
The authors confirm that all data underlying the findings are fully available without restriction. RNA sequencing files in FASTQ format are available in the European Nucleotide Archive under study accession PRJEB13063.