Journal of Bacteriology. 2022 Jan 18;204(1):e00352-21. doi: 10.1128/JB.00352-21

Identification of Novel Translated Small Open Reading Frames in Escherichia coli Using Complementary Ribosome Profiling Approaches

Anne Stringer a,#, Carol Smith a,#, Kyle Mangano b, Joseph T Wade a,c
Editor: Tina M Henkin d
PMCID: PMC8765432  PMID: 34662240

ABSTRACT

Small proteins of <51 amino acids are abundant across all domains of life, but they are often overlooked because their small size makes them difficult to predict computationally and they are refractory to standard proteomic approaches. Ribosome profiling has been used to infer the existence of small proteins by detecting the translation of the corresponding open reading frames (ORFs). Detection of translated short ORFs by ribosome profiling can be improved by treating cells with drugs that stall ribosomes at specific codons. Here, we combine the analysis of ribosome profiling data for Escherichia coli cells treated with antibiotics that stall ribosomes at either the start or stop codons. Thus, we identify ribosome-occupied start and stop codons with high sensitivity for ∼400 novel putative ORFs. The newly discovered ORFs are mostly short, with 365 encoding proteins of <51 amino acids. We validate translation of several selected short ORFs and show that many likely encode unstable proteins. Moreover, we present evidence that most of the newly identified short ORFs are not under purifying selection, suggesting that they do not impact cell fitness, although a small subset have the hallmarks of functional ORFs.

IMPORTANCE Small proteins of <51 amino acids are abundant across all domains of life, but they are often overlooked because their small size makes them difficult to predict computationally and they are refractory to standard proteomic approaches. Recent studies have discovered small proteins by mapping the location of translating ribosomes on RNA using a technique known as ribosome profiling. Discovery of translated sORFs using ribosome profiling can be improved by treating cells with drugs that trap initiating ribosomes. Here, we show that combining these data with equivalent data for cells treated with a drug that stalls terminating ribosomes facilitates the discovery of small proteins. We use this approach to discover 365 putative genes that encode small proteins in Escherichia coli.

KEYWORDS: Ribo-RET, Ribosome profiling, small protein, apidaecin, retapamulin, sORF

INTRODUCTION

Tens of thousands of bacterial genomes have been fully sequenced. These genome sequences are annotated using computational pipelines that predict the location of open reading frames (ORFs). One of the main criteria used by ORF prediction pipelines is the length of the putative ORFs; ORFs encoding proteins with >50 amino acids (aa) are unlikely to occur by chance; thus, there is a strong bias toward identifying longer ORFs. However, there is strong evidence for large numbers of short ORFs (sORFs; encoding proteins with <51 aa): phylogenetic analysis has identified many conserved sORFs (1), and transcriptomic and proteomic experimental approaches provide strong evidence for the translation of many sORFs (2–15). As more sORFs and their encoded small proteins are identified, it is becoming increasingly clear that sORFs and small proteins play important functional roles in bacteria. In particular, sORFs can function as cis-acting regulators of downstream operonic genes, and small proteins can function to modulate the activity of protein complexes (16, 17).

There are 118 described sORFs in Escherichia coli K-12, most of which have not been functionally characterized. Recent studies have used genome-scale approaches to identify translated sORFs. Ribosome profiling is a method that experimentally maps ribosome association with the transcriptome. Two studies applied ribosome profiling to E. coli K-12 and BL21 (closely related to E. coli K-12) cells treated with antibiotics, retapamulin (“Ribo-RET”) and Onc112, that trap ribosomes at start codons (2, 3). Thus, sites of translation initiation were identified, leading to the identification of many putative translated sORFs. In one study, 38 of the encoded small proteins were successfully validated by Western blotting (3).

Ribo-RET has also been applied to Mycobacterium tuberculosis, revealing an even larger number of putative sORFs than described for E. coli (4). Moreover, the majority of the sORFs identified in M. tuberculosis do not appear to have been subject to purifying selection, suggesting that they represent the products of “pervasive translation,” whereby ribosomes initiate translation at large numbers of locations across the transcriptome, with many of the translated proteins being nonfunctional, i.e., not contributing to cell fitness.

Ribosome profiling has also been applied to E. coli cells treated with the antibiotic apidaecin, which traps ribosomes at stop codons by binding in the nascent peptide exit tunnel and trapping release factor (18). This method, “Ribo-Api,” also leads to ribosome accumulation upstream and downstream of stop codons, and at start codons (19); this additional signal can be partially reduced by the addition of puromycin to cell lysates (“Ribo-Api/Pmn”), but even with this reduction in signal away from stop codons, Ribo-Api/Pmn data cannot be used to identify stop codons with high specificity. Nonetheless, we reasoned that combining Ribo-RET and Ribo-Api/Pmn data sets for E. coli would facilitate the identification of translated sORFs with high sensitivity, since the likelihood of detecting both Ribo-RET signal at a start codon and Ribo-Api/Pmn signal at the corresponding stop codon by chance would be very low. Our analysis of Ribo-RET and Ribo-Api/Pmn data led to the identification of 397 novel ORFs with high confidence, most of which are sORFs. We validated expression of the associated small proteins for 10 of 17 novel sORFs tested, and we detected three additional small proteins following treatment of cells with a protease inhibitor. Thus, our data suggest that most or all of the 365 putative sORFs identified by combining Ribo-RET and Ribo-Api/Pmn data are robustly translated, but that many of these sORFs encode scarce or unstable small proteins. We speculate that pervasive translation leads to the production of many unstable, nonfunctional proteins in E. coli.

RESULTS

Reanalysis of Ribo-RET data to identify putative start codons.

Using a previously described Ribo-RET data set for E. coli BL21 (2) (Fig. 1A), we identified 12,756 sites of high ribosome occupancy that exceeded the local background (see Materials and Methods); we refer to these positions as “initiation-enriched ribosome footprints” (IERFs) (see Table S1 in the supplemental material). Our approach for selecting IERFs differed from that used in the original Ribo-RET study (2) in two ways. First, we used a considerably lower sequence read coverage threshold. Second, we required that sequence read coverage at IERFs substantially exceed the local background, which varies considerably based on genome location. We then determined the enrichment of every possible trinucleotide sequence in the regions surrounding IERFs. Consistent with most IERFs representing sites of translation initiation, and with the expected position for the downstream edge of initiating ribosomes (2, 20), ATG, GTG, and TTG trinucleotide sequences were enriched over the local background at positions 14 to 18 nt upstream of IERFs (Fig. 1B), with the highest enrichment observed for ATG sequences and the lowest enrichment observed for TTG sequences. We did not observe enrichment of CTG, ATT, or ATC sequences, consistent with these sequences being used infrequently as start codons (21). Based on the degree of enrichment for ATG, GTG, and TTG sequences at different positions 14 to 18 nt upstream of IERFs, we reanalyzed the Ribo-RET data to identify putative start codons (see Materials and Methods). Thus, we identified 7,936 putative start codons with an estimated false discovery rate (FDR) of 11.7%. Of these, 2,545 putative start codons matched those of annotated ORFs. A total of 2,474 putative start codons match those identified previously from Ribo-RET data alone (2) using a higher read coverage threshold, but with a considerably more relaxed range of potential start codons.
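The enrichment calculation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the forward-strand-only treatment, the convention that the offset is measured from the IERF to the first nucleotide of the trinucleotide, and the use of a genome-wide (rather than local) background frequency are all simplifying assumptions made for illustration.

```python
def trinucleotide_enrichment(genome, ierfs, codon, offset):
    """Observed/background frequency of `codon` starting `offset` nt upstream of IERFs.

    Hypothetical helper (forward strand only). `genome` is a DNA string and
    `ierfs` is a list of 0-based positions. The background here is the
    genome-wide frequency of the trinucleotide, a simplification of the
    local background used in the text.
    """
    # Start positions of the trinucleotide windows that fit inside the genome.
    starts = [p - offset for p in ierfs if 0 <= p - offset <= len(genome) - 3]
    if not starts:
        return 0.0
    observed = sum(genome[s:s + 3] == codon for s in starts) / len(starts)
    total = len(genome) - 2  # number of trinucleotide windows in the genome
    background = sum(genome[i:i + 3] == codon for i in range(total)) / total
    return observed / background if background else float("inf")
```

Calling this function for each of the 64 trinucleotides over a range of offsets would give a matrix comparable in layout to a heatmap of enrichment by sequence and position.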

FIG 1.

Identification of ORFs by combining Ribo-RET and Ribo-Api/Pmn data. (A) Sequence read coverage is shown across a selected genomic region for Ribo-RET (retapamulin treatment; stalls initiating ribosomes), Ribo-Api/Pmn (apidaecin and puromycin treatment; stalls terminating ribosomes), and Ribo-Pmn (puromycin treatment; control) data. Blue and orange triangles indicate the positions of start codons paired with initiation-enriched ribosome footprints (IERFs) and stop codons paired with termination-enriched ribosome footprints (TERFs), respectively. Triangles joined by lines indicate start and stop codons from the same putative ORF. The positions of the annotated yjtD and novel ytjE ORFs are shown by arrows above the data. (B) Heatmap showing enrichment of possible start codon sequences at positions upstream of IERFs. (C) Heatmap showing enrichment of possible stop codon sequences at positions upstream of TERFs. (D) Classification of ORFs identified from Ribo-RET and Ribo-Api/Pmn data, with categories based on the overlap of start/stop codons with annotated ORFs, as shown in the schematic.

Reanalysis of Ribo-Api data to identify putative stop codons.

Using a previously described Ribo-Api data set, including a puromycin treatment (Ribo-Api/Pmn), for E. coli BL21 (19) (Fig. 1A), we identified 12,756 sites of high ribosome occupancy that exceeded the local background (see Materials and Methods); we refer to these positions as “termination-enriched ribosome footprints” (TERFs) (Table S2). We then determined the enrichment of every possible trinucleotide sequence in the regions surrounding TERFs. Consistent with most TERFs representing sites of translation termination, and with the expected position for the downstream edge of terminating ribosomes (19, 20), TAA, TGA, and TAG trinucleotide sequences were enriched over the local background at positions 12 to 14 nt upstream of TERFs (Fig. 1C). Based on these enrichments, we reanalyzed the Ribo-Api/Pmn data to identify putative stop codons by selecting TERFs with a TAA/TGA/TAG trinucleotide sequence in a specified distance range upstream. Thus, we identified 6,877 putative stop codons with an estimated FDR of 33.7% (see Materials and Methods). Of these putative stop codons, 1,377 match those of annotated genes.
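The step of assigning a stop codon to a TERF can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, only the forward strand is handled, the default distance range of 12 to 14 nt is borrowed from the enrichment window described above (the range actually used is specified in Materials and Methods), and returning the nearest match first is an assumption.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def assign_stop_codon(genome, terf, min_dist=12, max_dist=14):
    """Return the 0-based start of a stop codon upstream of a TERF, or None.

    Hypothetical helper (forward strand only). The distance is measured from
    the TERF position to the first nucleotide of the stop codon; the default
    range is an assumption, not the authors' stated parameter.
    """
    for dist in range(min_dist, max_dist + 1):  # nearest candidate first
        s = terf - dist
        if s >= 0 and genome[s:s + 3] in STOP_CODONS:
            return s
    return None
```

TERFs for which the function returns None would simply be discarded as lacking a stop codon at the expected distance.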

Combining putative start and stop codons to identify putative novel ORFs.

We reasoned that by selecting putative start codons inferred from Ribo-RET data that are paired with putative stop codons inferred from Ribo-Api/Pmn data, we could identify ORFs with higher confidence than we would by using either Ribo-RET or Ribo-Api/Pmn data alone. Moreover, evidence of ribosome occupancy at both the start and stop codons of these putative ORFs implies that they are actively translated from beginning to end (Fig. 1A). Using this approach, we detected 1,839 putative ORFs (Table S3), with a conservative FDR estimate of 2.2% (see Materials and Methods). Of the ORFs we detected, 1,005 perfectly match annotated genes; we refer to these as “annotated” ORFs. In addition, 437 of the ORFs we detected share a stop codon with an annotated gene, but have a different start codon. These could represent mis-annotated genes, or one of a pair of ORFs where the corresponding proteins share the same C-terminal region. We refer to these as “isoform” ORFs. Finally, 397 of the ORFs we detected have a stop codon that does not match any annotated gene. We refer to these as “novel” ORFs (Fig. 1D). Only 116 of the novel ORFs were identified in a previous study from Ribo-RET data alone (2). The likelihood of detecting an annotated ORF by chance is exceedingly low (we conservatively estimate the FDR to be ∼0.07% [see Materials and Methods]). We conservatively estimate that the FDRs for identifying isoform and novel ORFs are 2.3% and 7.5%, respectively.
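The pairing and classification logic described above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the function names, the forward-strand-only treatment, and the representation of annotations as a mapping from stop codon position to start codon position are assumptions.

```python
STOPS = {"TAA", "TGA", "TAG"}

def first_in_frame_stop(genome, start):
    """Return the 0-based position of the first in-frame stop codon after `start`."""
    for i in range(start + 3, len(genome) - 2, 3):
        if genome[i:i + 3] in STOPS:
            return i
    return None

def pair_and_classify(genome, starts, stops, annotated):
    """Pair putative starts with putative stops and label each resulting ORF.

    starts, stops: sets of 0-based codon positions inferred from the two data sets.
    annotated: dict mapping annotated stop position -> annotated start position
    (a hypothetical representation chosen for this sketch).
    """
    calls = []
    for start in sorted(starts):
        stop = first_in_frame_stop(genome, start)
        if stop is None or stop not in stops:
            continue  # no supporting termination signal for this start codon
        if stop not in annotated:
            label = "novel"      # stop codon matches no annotated gene
        elif annotated[stop] == start:
            label = "annotated"  # start and stop both match the annotation
        else:
            label = "isoform"    # shared stop codon, different start codon
        calls.append((start, stop, label))
    return calls
```

Because each start codon defines exactly one in-frame stop codon, the pairing is unambiguous, which is what allows two individually noisy data sets to be combined into higher-confidence ORF calls.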

Putative isoform and novel ORF start codons are associated with RNA secondary structure features expected of actively translated ORFs.

Actively translated start codons in E. coli tend to be associated with regions of reduced RNA secondary structure relative to the surrounding sequence (22). We examined the sequences around the 1,005 annotated, 437 isoform, and 397 novel ORF start codons. For all three classes of ORF, we observed significantly reduced predicted RNA secondary structure in the regions from −25 to +15 nt relative to start codons, compared to that of 500 randomly selected genomic sequences of the same length (Fig. 2) (Mann-Whitney U test, P < 2.2e−16 for each of annotated and novel ORFs; P = 3.2e−6 for isoform ORFs). We conclude that the start codons of isoform and novel ORFs are associated with RNA secondary structure features expected of actively translated ORFs. Predicted RNA secondary structure around the start codons of annotated ORFs was slightly lower than that for novel ORFs (Mann-Whitney U test, P = 1.4e−8), and substantially lower than that for isoform ORFs (Mann-Whitney U test, P < 2.2e−16).

FIG 2.

Reduced RNA secondary structure for sequences around identified start codons. Strip plot showing the ΔG for the predicted minimum free energy structures for the regions from −25 to +15 nt relative to putative start codons for annotated, isoform, and novel ORFs identified by analyzing Ribo-RET and Ribo-Api/Pmn data, and for a set of 500 random sequences.

Classification of novel ORFs based on local gene context.

We compared the positions of the 397 putative novel ORFs with the positions of annotated genes (Fig. 3). Of the novel ORFs, 27% are located entirely in intergenic regions; of these, roughly two thirds are immediately upstream of an annotated gene on the same strand, suggesting that they are cotranscribed. In addition, 44% of the novel ORFs overlap completely with an annotated gene in the sense orientation, whereas only 12% overlap completely with an annotated gene in the antisense orientation. Finally, 17% of the novel ORFs partially overlap an annotated gene.
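As an illustration of how such a positional classification could be computed, the sketch below assigns an ORF to one of four categories from interval coordinates. The function name, the coordinate convention (inclusive start and end, with strand given as "+" or "-"), the category labels, and the rule of reporting the first fully containing gene are assumptions made for this example and are not taken from the study.

```python
def classify_context(orf, genes):
    """Classify an ORF relative to annotated genes.

    orf and each entry of genes are (start, end, strand) tuples with
    inclusive coordinates and start <= end (hypothetical convention).
    """
    o_start, o_end, o_strand = orf
    # Genes sharing at least one position with the ORF.
    overlapping = [g for g in genes if g[0] <= o_end and o_start <= g[1]]
    if not overlapping:
        return "intergenic"
    for g_start, g_end, g_strand in overlapping:
        if g_start <= o_start and o_end <= g_end:  # ORF lies entirely inside the gene
            return "internal, sense" if g_strand == o_strand else "internal, antisense"
    return "partial overlap"
```

Tallying the returned labels over all novel ORFs would give category fractions of the kind reported above.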

FIG 3.

Classification of novel ORFs by position relative to annotated genes. The pie chart shows the distribution of novel ORFs across different categories defined by the type of overlap with an annotated gene, as shown in the schematic.

Strikingly, the novel ORFs were typically much smaller than the annotated ORFs; the median length of the novel ORFs was 15 codons, and 365 of them encode proteins of ≤50 amino acids and were therefore classified as sORFs. We compared the 365 putative novel sORFs found in E. coli BL21 with sORFs described for E. coli K-12 (strain MG1655); 15 of the putative sORFs perfectly matched those described previously for MG1655, and 2 of the putative sORFs matched previously described ORFs with one or two nucleotide mismatches (3, 14, 15).

Experimental validation of putative novel sORFs.

We selected 17 sORFs for experimental validation, covering a range of expression values based on Ribo-RET and Ribo-Api/Pmn coverage at start and stop codons, respectively. We C-terminally epitope tagged the 17 sORFs at their native loci with SPA tags. We then used Western blotting to detect stably expressed tagged proteins for cells grown in rich medium. As a positive control, we used a strain expressing SPA-tagged YhgP, a small protein that was previously detected by Western blotting (3). We observed single bands for the positive control protein and for 10 of the 17 proteins encoded by putative sORFs. We observed no bands for an untagged strain grown under the same conditions (Fig. 4A; Fig. S1A). We concluded that approximately half of the putative sORFs were stably expressed. We assigned new gene names to the ORFs for which a protein was detectable by Western blotting (Fig. 4A; Table S3).

FIG 4.

Experimental validation of sORFs by Western blotting. (A) Western blotting of SPA-tagged novel small proteins. Each lane was loaded with whole-cell extract from cells expressing a SPA-tagged small protein, except for lanes labeled “Untagged,” which were loaded with wild-type MG1655. Underlined sample names indicate that the cells were ΔthyA and grown in medium supplemented with thymine. Asterisks indicate nonspecific bands. The arrow indicates the expected position of SPA-tagged small proteins. YhgP is a positive control, previously confirmed in a Western blot. Untagged strains are negative controls. Empty triangles indicate the positive control lanes. Filled triangles indicate novel small proteins with bands of the expected size. (B) Western blotting of SPA-tagged novel small proteins in cells grown with/without bortezomib, an inhibitor of the ClpP protease.

We reasoned that failure to detect a protein by Western blotting could be due either to the corresponding sORF being a false positive, or to the instability of the protein. To address the role of protein instability for the 7 proteins we failed to detect by Western blotting, we performed an additional Western blotting test using cells grown with or without bortezomib, an inhibitor of the ClpP protease (23), a major cellular protease in E. coli (24). For untreated cells, we only detected expression of the YhgP control protein. In contrast, for cells treated with bortezomib, we detected expression of YhgP and three additional proteins (Fig. 4B; Fig. S1B). We conclude that many of the novel sORFs encode proteins that are robustly translated but rapidly degraded by ClpP.

Evidence for function of novel ORFs.

Our data strongly suggest that the majority of the 397 novel ORFs are actively translated in E. coli BL21, but this does not indicate whether the ORFs or their encoded proteins are functional, with “function” defined as providing a positive contribution to cell fitness (25). Indeed, recent studies suggest that bacteria and eukaryotes can express hundreds of sORFs that are likely nonfunctional (4, 26, 27). We used three approaches to identify likely functional novel ORFs. First, we searched the encoded proteins for functional domains in the PFAM database (28). Most of the proteins were too short to include a functional domain; nonetheless, 6 of the novel ORFs had significant matches to functional domains listed in the PFAM database (Table S3), with the shortest of these ORFs encoding a protein of 61 amino acids. Second, we examined the codon bias of the novel ORFs; codon sequences that are nonrandom compared to the overall nucleotide content of the ORFs are indicative of purifying selection, a feature observed for annotated ORFs. We reasoned that novel ORFs with codon usage similar to that of annotated ORFs were likely under purifying selection. We analyzed the codon usage of 105 novel ORFs in regions that had at least 30 nt nonoverlapping with annotated genes (including annotated genes not identified by Ribo-RET or Ribo-Api/Pmn). Specifically, we calculated the relative codon deoptimization index (RCDI), a measure of how closely the codon usage of an ORF matches that of the annotated ORFs from the same genome (29, 30). Eleven of the 105 selected ORFs have codon usage similar to that of annotated genes (RCDI score <1.9 [P < 0.01]; RCDI scores closer to 1 indicate greater codon optimality) (see Materials and Methods) (Table S3), suggesting that these ORFs are under purifying selection, and hence are likely to be functional. Third, we searched for homologues of proteins encoded by novel ORFs, excluding those that have previously been described in E. coli MG1655, and limiting the analysis to proteins or portions of proteins encoded entirely in intergenic regions. Thus, we identified likely homologues for five of the encoded proteins (Fig. 5). Overall, our data identify a small subset of novel ORFs that are likely to be functional; however, we found no evidence of function for the majority of sORFs.
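To make the RCDI concept concrete, the sketch below implements one commonly published formulation, in which each codon's frequency among its synonymous codons in the test ORF is divided by the corresponding frequency in a reference gene set, weighted by the codon's count, and averaged over all codons. The function names are hypothetical, and the tool and settings used for the analysis reported here may differ in detail (for example, in how stop codons or rare codons are handled); this is a sketch under those assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def sense_codon_counts(seqs):
    """Count sense codons (stop and non-ACGT codons skipped) across in-frame sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if CODON_TO_AA.get(codon, "*") != "*":
                counts[codon] += 1
    return counts

def synonymous_fractions(counts):
    """Fraction of each codon among all codons encoding the same amino acid."""
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODON_TO_AA[codon]] += n
    return {c: n / aa_totals[CODON_TO_AA[c]] for c, n in counts.items()}

def rcdi(orf_seq, reference_seqs):
    """RCDI of one ORF against a reference set (assumed formulation, see lead-in).

    Assumes every codon in the ORF also occurs in the reference set and that
    the ORF contains at least one sense codon.
    """
    orf_counts = sense_codon_counts([orf_seq])
    orf_frac = synonymous_fractions(orf_counts)
    ref_frac = synonymous_fractions(sense_codon_counts(reference_seqs))
    total = sum(orf_counts.values())
    return sum(orf_frac[c] / ref_frac[c] * n for c, n in orf_counts.items()) / total
```

Under this formulation an ORF whose synonymous codon usage exactly mirrors the reference set scores 1, and scores rise as usage diverges, which is consistent with lower scores indicating greater similarity to annotated genes.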

FIG 5.

Conservation of selected sORFs. Sequence alignments of five selected novel proteins and their putative homologues in Salmonella enterica and (where found) Citrobacter koseri. Alignments were extracted from tBLASTn output files. Shaded amino acids match the protein sequence from E. coli. The boxed region for YpeD and putative homologues indicates the region of overlap in the corresponding genes with the downstream gene (mntH) and its homologues. Representative S. enterica and C. koseri genome sequences are shown (Salmonella enterica subsp. enterica serovar Worthington, strain CFSAN051295, CP029041.1; Citrobacter koseri, strain FDAARGOS_1029, CP066089.1).

DISCUSSION

Combining Ribo-RET and Ribo-Api/Pmn data sets is a highly specific approach to identify translated ORFs.

Previous studies in prokaryotes and eukaryotes have identified start codons by combining ribosome profiling with antibiotics such as retapamulin or homoharringtonine that trap initiating ribosomes (2, 3, 31–34). While this approach has been effective at mapping sites of translation initiation, the addition of the antibiotics does not completely remove the signal from elongating ribosomes. Hence, Ribo-RET and related approaches have uncertainty that can make it difficult to identify start codons with high confidence, especially for start codons internal to highly expressed ORFs (32). Moreover, Ribo-RET signal at a putative start codon does not indicate whether ribosomes actively translate the corresponding ORF to completion in the absence of antibiotic.

Similar to Ribo-RET and related approaches, Ribo-Api/Pmn can be used to identify sites of translation termination, i.e., active stop codons. However, apidaecin treatment does not cause ribosomes to be exclusively associated with stop codons; ribosomes are also enriched in the regions immediately upstream and downstream of stop codons, and at start codons (19), such that stop codon identification from Ribo-Api/Pmn data is associated with a high FDR (estimated at 33.7% for our analysis). Moreover, the identification of a stop codon position rarely indicates the position of the associated start codon(s).

Since IERFs in Ribo-RET data and TERFs in Ribo-Api/Pmn data are independent measurements, but start codons can be unambiguously associated with a single stop codon, combining data from both approaches allows for ORF identification with a greatly reduced FDR compared to that of either Ribo-RET or Ribo-Api/Pmn alone (estimated at 2.2% for our analysis). Moreover, detecting ribosomes at both the start and stop codons for the same ORF strongly suggests that translation runs from start to stop codon.

The high sensitivity of an approach combining Ribo-RET and Ribo-Api/Pmn is evidenced by the fact that 35% of the novel ORFs and 9% of the annotated ORFs we identified have only two Ribo-Api/Pmn sequence reads at their stop codons. Nonetheless, the low FDR and the propensity for novel ORFs to have low RNA secondary structure around their start codons strongly suggest that the majority of novel ORFs are actively translated. We anticipate that Ribo-RET and Ribo-Api/Pmn or related approaches will be combined to identify translated ORFs with high confidence in other bacterial and nonbacterial species. One potential barrier to these methods is the inability of the drugs to accumulate within cells of some species; this could be due to low permeability or active efflux. While Ribo-RET has proven effective in E. coli (2) and M. tuberculosis (4), deletion of the tolC gene encoding a component of the drug efflux pump was needed for successful Ribo-RET with E. coli, and most Gram-negative species have similarly low sensitivity to retapamulin. Apidaecin uptake requires the sbmA gene, which encodes an inner membrane transport protein (35); sbmA is missing in some Gram-negative and all Gram-positive bacterial species. If apidaecin uptake is low, an alternative approach would be to encode and express apidaecin, a small protein, inside the bacteria (36, 37). Onc112, which functions similarly to retapamulin, is a small protein that could also be encoded and expressed within cells (38).

E. coli expresses large numbers of small proteins, many of which are likely unstable.

Most of the novel ORFs we identified in this study are sORFs, i.e., they encode proteins of <51 amino acids. sORFs and small proteins have been challenging to identify by other methods; thus, we have greatly expanded the number of novel ORFs known for E. coli, with the caveat that not all have been experimentally validated using an independent approach. Western blotting indicates large differences in the expression of small proteins encoded by selected novel sORFs, with seven of the proteins encoded by selected sORFs being undetectable by Western blotting. However, three of these seven proteins were detected in cells treated with an inhibitor of the ClpP protease. These data strongly suggest that (i) the large majority of novel sORFs identified by combining Ribo-RET and Ribo-Api/Pmn are actively translated, and (ii) many of the proteins encoded by identified sORFs are unstable due to proteolysis by ClpP.

Evidence that most small ORFs in E. coli are not functional.

Two lines of evidence suggest that most of the E. coli sORFs we identified are not functional (with “function” defined as providing a positive contribution to cell fitness) (25): (i) the codon usage of most analyzed novel sORFs did not match that of annotated ORFs, and (ii) we detected sequence conservation at the amino acid level for only a few of the novel sORFs in genera beyond Escherichia and Shigella (Shigella species were excluded due to very high similarity between Shigella and Escherichia sequences). These analyses, however, have important limitations: (i) codon usage patterns are only informative for sORFs that do not overlap annotated ORFs, (ii) the statistical significance of both codon usage and sequence conservation is limited by the small size of sORFs, and (iii) regulatory sORFs can be functional even if the corresponding proteins are not, and these sORFs often have nonstandard codon usage. Based on our study of sORFs in M. tuberculosis, we proposed that the M. tuberculosis transcriptome is pervasively translated, with most translated proteins being nonfunctional. We propose that the E. coli transcriptome is also subject to pervasive translation. A lack of function for most sORFs is consistent with half of the small proteins tested appearing to be weakly expressed and/or unstable, features expected of spurious proteins.

Annotated genes that contain start codons for novel ORFs tend to have suboptimal codon usage.

The position of novel ORFs with respect to annotated genes is nonrandom (Fig. 3). Specifically, we detected many more novel ORFs on the same strands as overlapping or downstream annotated genes. We speculate that this bias is because novel ORFs require a transcript, and mRNAs for annotated genes tend to be more abundant than antisense RNAs. Nonetheless, we were surprised to find that 44% of all the novel ORFs we detected overlapped completely with an annotated gene on the same strand; this suggests that the overlapping translation of annotated ORFs does not always prevent translation initiation at ORF-internal start codons, consistent with the findings of a previous study (2).

We speculated that novel ORFs whose start codons are within annotated genes reflect inefficient translation of the overlapping annotated ORF, such that RNA around the novel start codons is rarely protected by elongating ribosomes. To test this idea, we assessed the codon usage of annotated genes that encompass novel ORF start codons. The codon usage of annotated genes that encompass novel ORFs (mean RCDI score = 1.41) was significantly less optimal than that for the set of all other annotated ORFs (mean RCDI score = 1.33; Mann-Whitney U-test comparing RCDI scores, P = 1.7e−5). This difference was more pronounced when we considered only highly expressed novel ORFs (mean RCDI score of 1.59 for annotated genes that encompass novel ORF starts with Ribo-RET signal >5 RPM; P = 2.1e−12). We conclude that suboptimal codon usage of an annotated ORF increases the likelihood of internal initiation sites, likely due to a reduced efficiency of translation elongation. Suboptimal codon usage is a feature of horizontally acquired genes in E. coli (39), suggesting that horizontally acquired genes often have an increased abundance of internal translation. Another feature of horizontally acquired genes is low GC content (39). Strikingly, annotated genes that encompassed highly expressed novel ORF starts had a much lower GC content (mean of 42.2%) than the set of all annotated ORFs (mean of 51.1%). Given that genes with internal start codons often have suboptimal codon usage and/or low GC content, we speculate that horizontally acquired genes are hot spots for internal novel ORFs due to inefficient translation elongation. An alternative explanation for this phenomenon is that high AT content reduces the likelihood of local RNA secondary structure, a feature known to be associated with translation initiation efficiency (Fig. 2) (22).

Identification of E. coli sORFs that are likely to be functional.

A few of the novel sORFs we identified are associated with features expected of functional proteins. Specifically, we detected stably expressed protein by Western blotting, we observed codon usage patterns similar to those of annotated ORFs, and/or we identified putative homologues in other genera. While the functions of these sORFs or their corresponding small proteins are unknown, they represent promising candidates for further study. In the case of sORFs that are immediately upstream or partially overlapping with annotated genes on the same strand, one potential function is regulation of the downstream gene by translation of the sORF, as has been described for other sORFs in bacteria (17). Our data also suggest exercising caution when selecting sORFs/small proteins for further study, since many sORFs/small proteins may be nonfunctional; relying on codon usage patterns and/or sequence conservation is a simple way to prioritize sORFs/small proteins that are most likely to be functional.

MATERIALS AND METHODS

Strains and plasmids.

All strains and plasmids used in this work are listed in Table 1. All oligonucleotides used in this work are listed in Table S4 in the supplemental material. All strains used in this work are derivatives of Escherichia coli K-12 MG1655 (40). Strains AMD783-AMD800 were constructed using λ Red recombineering as described previously (41). Specifically, PCR products used for recombineering were generated with the oligonucleotides JW10796-JW10803 and JW10883-JW10912, using DY330 allR-SPA::kanR (42) as a template. The PCR products generated to make AMD783 and AMD784 were electroporated into AMD052 (Escherichia coli K-12 MG1655 ΔthyA) harboring pKD46 (41). The PCR products generated to make AMD785-AMD800 were electroporated into MG1655 harboring pKD46 (43, 44).

TABLE 1.

List of strains and plasmids used in this study

Strain or plasmid Genotype or description Source or reference
MG1655 Wild type 40
AMD052 MG1655 ΔthyA 43
N/A DY330 allR-SPA::kan 42
AMD783 MG1655 ΔthyA yhgP-SPA::kan, constructed using JW10796 + JW10797 This study
AMD784 MG1655 ΔthyA ytjE-SPA::kan, constructed using JW10802 + JW10803 This study
AMD785 MG1655 [1995451_-]-SPA::kan, constructed using JW10887 + JW10888 This study
AMD786 MG1655 ytiE-SPA::kan, constructed using JW10798 + JW10799 This study
AMD787 MG1655 yrfJ-SPA::kan, constructed using JW10889 + JW10890 This study
AMD788 MG1655 [2179003_+]-SPA::kan, constructed using JW10800 + JW10801 This study
AMD789 MG1655 [1882207_-]-SPA::kan, constructed using JW10883 + JW10884 This study
AMD790 MG1655 ykfO-SPA::kan, constructed using JW10893 + JW10894 This study
AMD791 MG1655 yoeJ-SPA::kan, constructed using JW10901 + JW10902 This study
AMD792 MG1655 [2983454_+]-SPA::kan, constructed using JW10897 + JW10898 This study
AMD793 MG1655 ytgB-SPA::kan, constructed using JW10909 + JW10910 This study
AMD794 MG1655 [905143_-]-SPA::kan, constructed using JW10891 + JW10892 This study
AMD795 MG1655 ymgN-SPA::kan, constructed using JW10899 + JW10900 This study
AMD796 MG1655 [3953439_-]-SPA::kan, constructed using JW10905 + JW10906 This study
AMD797 MG1655 ypeD-SPA::kan, constructed using JW10907 + JW10908 This study
AMD798 MG1655 [3510920_-]-SPA::kan, constructed using JW10895 + JW10896 This study
AMD799 MG1655 ynhH-SPA::kan, constructed using JW10911 + JW10912 This study
AMD800 MG1655 ytcB-SPA::kan, constructed using JW10903 + JW10904 This study
pKD46 paraBAD-gam-bet-exo under the control of AraC, AmpR (Ts origin) 41
pSIM6 pL-gam-bet-exo genes under the control of CI857 repressor, AmpR (Ts origin) 44

Identification of IERFs from Ribo-RET data.

Sequence reads for the Ribo-RET data (NCBI SRA accession number SRR8156054) (2) were aligned to the E. coli BL21 reference genome and to a reverse-complemented reference genome using Rockhopper (version 2.03). The genome positions and strands of sequence read 3′ ends were inferred from the resultant SAM file. Sequence read coverage across known noncoding RNAs (Table S5) was set to zero, and coverage was normalized to reads per million (RPM). Every genome position on both strands was considered as a possible IERF. IERFs were selected if the corresponding position/strand met three criteria: (i) normalized sequence read coverage was ≥0.5 RPM, (ii) normalized sequence read coverage was ≥10-fold higher than the mean sequence read depth at all positions in a 101-nt window centered on the position being considered, and (iii) normalized sequence read depth was at least as high as that for any other position in a 21-nt window centered on the position being considered.
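
The three selection criteria can be sketched in Python as follows. This is a minimal illustration rather than the authors' script: the function name call_enriched_positions, the single-strand coverage array, and the handling of windows at the array ends (simple truncation rather than wrapping around the circular chromosome) are all assumptions, and the background mean here includes the position under consideration.

```python
import numpy as np

def call_enriched_positions(cov, min_rpm=0.5, fold=10.0, bg_win=101, peak_win=21):
    """Return 0-based indices that satisfy criteria (i) to (iii).

    cov is RPM-normalized 3'-end coverage for one strand (hypothetical input).
    """
    cov = np.asarray(cov, dtype=float)
    bg_half, peak_half = bg_win // 2, peak_win // 2
    hits = []
    # Criterion (i): only positions at or above the coverage threshold are tested.
    for i in np.flatnonzero(cov >= min_rpm):
        background = cov[max(0, i - bg_half): i + bg_half + 1]
        neighbors = cov[max(0, i - peak_half): i + peak_half + 1]
        # Criterion (ii): at least `fold` times the mean depth of the 101-nt window.
        enriched = cov[i] >= fold * background.mean()
        # Criterion (iii): no position in the 21-nt window has higher depth.
        is_local_max = cov[i] >= neighbors.max()
        if enriched and is_local_max:
            hits.append(int(i))
    return hits

# Toy coverage track (made-up values, for illustration only).
coverage = np.zeros(300)
coverage[150] = 5.0   # strong, isolated peak
coverage[155] = 1.0   # weaker neighbor inside the 21-nt window of the peak
coverage[20] = 0.3    # below the 0.5 RPM threshold
print(call_enriched_positions(coverage))
```

In this toy track only the peak at index 150 passes all three criteria; the weaker neighbor fails criterion (iii) and the low-coverage position fails criterion (i).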

Identification of putative start codons from IERFs.

The frequency of all trinucleotide sequences was determined for positions −50 to −1 relative to the position of each IERF. We reasoned that trinucleotide sequences used as start codons would be enriched relative to the local background in the nucleotide (nt) −14 to −18 range relative to IERF positions, with the strongest enrichment at position −15 (2); we considered the local background frequency for each trinucleotide sequence to be that for positions −50 to −41 relative to the IERFs. To determine possible start codons, we first selected candidate start codon sequences by requiring that a trinucleotide sequence be enriched at least 1.4-fold above background at position −15 relative to IERFs, and that the highest level of enrichment in the nt −14 to −18 range be at position −15. These criteria were met by only three trinucleotide sequences, ATG, GTG, and TTG; hence, only these sequences were considered as possible start codons. ATG is enriched at least 1.4-fold above background at positions −14 to −18 relative to IERFs; GTG is enriched at least 1.4-fold above background at positions −14 to −16 relative to IERFs; TTG is enriched at least 1.4-fold above background at positions −15 to −16 relative to IERFs. We therefore identified putative start codons as ATG sequences positioned 14 to 18 nt upstream of IERFs, GTG sequences positioned 14 to 16 nt upstream of IERFs, and TTG sequences positioned 15 to 16 nt upstream of IERFs.
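
The assignment rule in the final sentence above can be sketched as a lookup of allowed codon-to-IERF distances. The following is an illustrative sketch for forward-strand IERFs only; the names START_OFFSETS and find_start_codons are hypothetical, and it assumes that "positioned N nt upstream" refers to the first nucleotide of the codon, which is one possible reading of the text.

```python
# Allowed distances (nt upstream of the IERF) for each candidate start codon,
# taken from the rule stated in the text.
START_OFFSETS = {
    "ATG": range(14, 19),  # 14 to 18 nt upstream
    "GTG": range(14, 17),  # 14 to 16 nt upstream
    "TTG": range(15, 17),  # 15 to 16 nt upstream
}

def find_start_codons(seq, ierf_pos):
    """Return (codon, codon_start, offset) tuples for one forward-strand IERF.

    seq is the forward-strand genome sequence and ierf_pos is the 0-based
    index of the IERF. Assumption: offset is measured from the IERF to the
    first nucleotide of the codon.
    """
    found = []
    for codon, offsets in START_OFFSETS.items():
        for offset in offsets:
            start = ierf_pos - offset
            if start >= 0 and seq[start:start + 3] == codon:
                found.append((codon, start, offset))
    return found

# Toy sequence with a single ATG at index 30 (made-up example).
toy = "C" * 30 + "ATG" + "C" * 30
print(find_start_codons(toy, 45))  # IERF 15 nt downstream of the ATG
print(find_start_codons(toy, 55))  # IERF too far downstream: no start codon
```

Reverse-strand IERFs could be handled by applying the same function to the reverse-complemented sequence, mirroring the alignment strategy described for the sequence reads.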

Calculating the false discovery rate for putative start codons.

The likelihood of randomly selecting a genome coordinate with an associated start codon sequence (as defined above for IERFs) was estimated by selecting 100,000 random genome coordinates and determining the fraction, “R,” that would be associated with a start codon. The set of IERFs contains a number of true positives (i.e., corresponding to a genuine start codon) and a number of false positives. We assumed that true-positive IERFs were all associated with a start codon using the parameters described above for calling ORFs. We assumed that false-positive IERFs were associated with a start codon at the same frequency as random genome coordinates, i.e., R. Since we knew how many IERFs were not associated with a start codon, we could use this number to estimate how many false-positive IERFs were associated with a start codon by chance. With the total number of IERFs as “I” and the total number of identified ORFs as “O,” the FDR for ORF calls was estimated by the formula {100 × (I − O) × [R/(1 − R)]}/O.
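
The reasoning behind the formula can be made explicit in a few lines of Python. This is a sketch under the assumptions stated in the paragraph above; the function name start_codon_fdr and the example numbers are hypothetical and are not values from this study.

```python
def start_codon_fdr(n_ierfs, n_with_start, r):
    """Estimate the FDR (as a percentage) for start codon calls.

    n_ierfs      -- I, the total number of IERFs
    n_with_start -- O, the number of IERFs associated with a start codon
    r            -- R, the fraction of random coordinates associated with a
                    start codon
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("R must be in the range [0, 1)")
    if n_with_start <= 0:
        raise ValueError("O must be positive")
    # Every IERF without a start codon is assumed to be a false positive, and
    # false positives lack a start codon with probability (1 - R), so the total
    # number of false-positive IERFs is F = (I - O) / (1 - R).
    false_ierfs = (n_ierfs - n_with_start) / (1.0 - r)
    # A fraction R of the false-positive IERFs carries a start codon by chance.
    false_calls = false_ierfs * r
    return 100.0 * false_calls / n_with_start

# Made-up numbers for illustration: I = 1000, O = 800, R = 0.2.
print(round(start_codon_fdr(1000, 800, 0.2), 2))
```

With these illustrative inputs, 200 IERFs lack a start codon, implying 250 false-positive IERFs, 50 of which would be associated with a start codon by chance, for an estimated FDR of 6.25%.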

Identification of TERFs from Ribo-Api/Pmn data.

TERFs were identified as described for IERFs, but using Ribo-Api/Pmn data (NCBI SRA accession number SRR11728142) (19).

Control Ribo-seq data for puromycin-treated cells.

Sequence read coverage for a control data set (puromycin-treated cells) was generated as described above (NCBI SRA accession number SRR11728143) (19).

Identification of putative stop codons from TERFs.

Putative stop codons were identified from TERFs using the same approach as described above for identifying putative start codons from IERFs, except that only TAA, TGA, and TAG trinucleotide sequences were considered. Based on the enrichment of TAA, TGA, and TAG sequences above the local background in the regions upstream of TERFs, we identified putative stop codons as TAA, TGA, or TAG sequences positioned 12 to 14 nt upstream of TERFs. The FDR for putative stop codons was determined as described for start codons.

Combining IERFs and TERFs to identify putative ORFs.

Stop codons identified from TERFs were discarded if the TERF was located within 3 nt of an IERF on the same strand, since apidaecin stalls ribosomes at start codons as well as at stop codons. Putative start codons identified from IERFs were translated in silico to identify their corresponding stop codons. These stop codons were compared to the stop codons identified from TERFs; for shared stop codons, the ORF defined by the IERF was selected as a putative ORF.
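
The two steps described above (discarding TERFs that coincide with IERFs, then matching in silico translated stop codons to TERF-derived stop codons) might be sketched as follows. All function names are hypothetical, only the forward strand is handled, and "within 3 nt" is interpreted as an absolute distance of 3 nt or less, which is an assumption.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def drop_terfs_near_ierfs(terfs, ierfs, max_dist=3):
    """Discard TERFs lying within max_dist nt of any IERF on the same strand."""
    return [t for t in terfs if all(abs(t - i) > max_dist for i in ierfs)]

def first_in_frame_stop(seq, start):
    """Return the index of the first in-frame stop codon after start, or None."""
    for pos in range(start + 3, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos
    return None

def call_orfs(seq, start_positions, terf_stop_positions):
    """Keep (start, stop) pairs whose stop codon was also identified from a TERF."""
    supported = set(terf_stop_positions)
    orfs = []
    for start in start_positions:
        stop = first_in_frame_stop(seq, start)
        if stop is not None and stop in supported:
            orfs.append((start, stop))
    return orfs

# Toy example: ATG at index 2, in-frame TAA at index 11 (made-up sequence).
toy = "CCATGAAACCCTAAGG"
print(call_orfs(toy, [2], [11]))   # stop codon supported by a TERF
print(call_orfs(toy, [2], [3]))    # TERF-derived stop is out of frame
```

Note that the toy sequence contains an out-of-frame TGA overlapping the start codon; because translation is followed codon by codon from the start, only the in-frame TAA is considered.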

To conservatively estimate an FDR for ORF identification, we rotated the normalized sequence read coverage data for Ribo-RET and Ribo-Api/Pmn relative to the genome sequence by 1,000 nt, and repeated this rotation 10 times. After each rotation, we repeated IERF, TERF, and ORF identification as described above. The average number of ORFs identified across the 10 rotations was considered a conservative estimate of the number of falsely discovered ORFs in the nonrotated data. FDRs for annotated, isoform, and novel ORFs were estimated by determining how many of the ORFs identified from rotated data were found in each category and apportioning the overall FDR accordingly.
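
A rotation-based estimate of this kind could be sketched as below. This is an illustration under stated assumptions, not the authors' code: rotation_fdr and the call_orfs callable are hypothetical names, both coverage tracks are assumed to be rotated together, and the rotations are assumed to be cumulative (1,000 nt, 2,000 nt, and so on).

```python
import numpy as np

def rotation_fdr(ret_cov, api_cov, call_orfs, n_called, step=1000, n_rotations=10):
    """Estimate falsely discovered ORFs by re-calling ORFs on rotated coverage.

    call_orfs(ret_cov, api_cov) is a hypothetical callable that returns the
    list of ORFs identified from one pair of coverage arrays. n_called is the
    number of ORFs identified from the nonrotated data.
    """
    counts = []
    for k in range(1, n_rotations + 1):
        shift = k * step  # assumed cumulative rotation relative to the genome
        rotated_ret = np.roll(ret_cov, shift)
        rotated_api = np.roll(api_cov, shift)
        counts.append(len(call_orfs(rotated_ret, rotated_api)))
    mean_false = float(np.mean(counts))
    return mean_false, 100.0 * mean_false / n_called

# Toy demonstration with a stand-in caller that reports an "ORF" whenever both
# tracks have signal at index 0 (purely illustrative).
ret = np.zeros(20000)
api = np.zeros(20000)
ret[[0, 19000]] = 1.0
api[[0, 19000]] = 1.0
toy_caller = lambda a, b: [0] if a[0] > 0 and b[0] > 0 else []
print(rotation_fdr(ret, api, toy_caller, n_called=1))
```

Rotating the coverage relative to the sequence preserves the shape of the coverage profile while breaking its link to the underlying codons, so any ORFs called afterwards arise by chance.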

RNA folding prediction.

The sequence from nt −25 to +15 relative to each start codon, or each of 500 41-nt sequences randomly selected from the E. coli BL21 genome, was used to predict the minimum free energy structure and its free energy, using a local installation of the ViennaRNA Package tool RNAfold v. 2.4.14 with default settings (45).
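
Preparing the input sequences for such a folding analysis might look like the sketch below. The helper names are hypothetical, and the coordinate convention is an assumption: the first nucleotide of the start codon is treated as position 0, so that −25 to +15 spans 41 nt, matching the length of the random control sequences. The folding itself is not reproduced here.

```python
import random

def start_codon_window(seq, start, upstream=25, downstream=15):
    """Return the window from -upstream to +downstream around index start.

    Returns None if the window would run off either end of a linear sequence.
    """
    lo, hi = start - upstream, start + downstream + 1
    if lo < 0 or hi > len(seq):
        return None
    return seq[lo:hi]

def random_windows(seq, n=500, length=41, seed=0):
    """Draw n windows of the given length at random positions (seeded)."""
    rng = random.Random(seed)
    starts = [rng.randrange(0, len(seq) - length + 1) for _ in range(n)]
    return [seq[s:s + length] for s in starts]

def to_fasta(windows, prefix="win"):
    """Format windows as FASTA text with hypothetical record names."""
    return "".join(f">{prefix}_{i}\n{w}\n" for i, w in enumerate(windows))

# Toy sequence standing in for a genome (made-up example).
toy = "ACGT" * 50
window = start_codon_window(toy, 100)
print(len(window))
print(to_fasta(random_windows(toy, n=2)), end="")
```

The resulting FASTA text could then be written to a file and supplied to RNAfold, for example on standard input, to obtain a predicted structure and free energy for each window.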

Western blotting.

For the experiments corresponding to Fig. 4A and Fig. S1A, MG1655 (untagged) and strains AMD785 to AMD800 were grown overnight at 37°C in LB, subcultured 1:100 in LB, and grown with shaking at 37°C to an OD600 of 0.5 to 0.7. Where indicated, cells were grown for an additional hour with 25 μM bortezomib. AMD052, AMD783, and AMD784 were grown overnight at 37°C in LB supplemented with 100 μg/mL thymine, subcultured 1:100 in LB supplemented with 100 μg/mL thymine, and grown with shaking at 37°C to an OD600 of 0.5 to 0.7. Where indicated, cells were grown for an additional hour with 25 μM bortezomib. After growth, 500 μL of each culture was pelleted and resuspended in 25 μL Laemmli sample buffer containing a 19:1 ratio of buffer to β-mercaptoethanol. Samples were separated on a 10% SDS-PAGE gel. Proteins were transferred to polyvinylidene difluoride membranes and probed with M2 anti-FLAG antibody (Sigma-Aldrich; 1:5,000 dilution) and horseradish peroxidase-conjugated goat anti-mouse antibody (1:250,000 dilution for Fig. 4A and Fig. S1A; 1:100,000 dilution for Fig. 4B and Fig. S1B). Tagged proteins were visualized using the SuperSignal West Femto detection kit (Thermo Fisher).

Analysis of conserved protein domains.

A FASTA file containing all proteins encoded by novel ORFs with >20 codons was submitted to the PFAM webserver (v. 34.0) (28) for a batch sequence search using default parameters.

Codon usage analysis.

FASTA files containing the nucleotide sequences of regions of novel ORFs not overlapping with annotated genes were generated and submitted to the RCDI/eRCDI web server (30). The E. coli BL21 genome codon usage table (https://www.kazusa.or.jp/codon) was used with the following parameters: upper confidence limit, 99% confidence/99% population; random sequence length, 300 nt; and the standard genetic code.

Sequence conservation analysis.

The amino acid sequences of all proteins encoded by novel, fully intergenic ORFs were submitted to the NCBI tBLASTn webserver (v. 2.11.0) (46). The Escherichia (taxid:561) and Shigella (taxid:620) genera were excluded from the search. The following parameters were adjusted: the maximum number of target sequences was set to 250, the expect threshold was set to 100, and the low-complexity region filter was unselected. The search results were further filtered by selecting only ORFs with hits that had 100% query coverage and ≥70% sequence identity. Groups of conserved ORFs were realigned using the Clustal Omega multiple sequence alignment program with default settings (47).
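
The post-search filter can be expressed compactly. The sketch below assumes the search results have already been parsed into dictionaries; the field names (orf, query_coverage, identity) and the example records are hypothetical and do not correspond to any particular BLAST output format or to results from this study.

```python
def conserved_orfs(hits, min_coverage=100.0, min_identity=70.0):
    """Return the sorted ORF identifiers with at least one qualifying hit.

    Each hit is a dict with hypothetical keys: 'orf' (query identifier),
    'query_coverage' (percent), and 'identity' (percent).
    """
    keep = set()
    for hit in hits:
        if hit["query_coverage"] >= min_coverage and hit["identity"] >= min_identity:
            keep.add(hit["orf"])
    return sorted(keep)

# Made-up records for illustration only.
example_hits = [
    {"orf": "orfA", "query_coverage": 100.0, "identity": 85.0},
    {"orf": "orfB", "query_coverage": 92.0, "identity": 95.0},   # partial coverage
    {"orf": "orfC", "query_coverage": 100.0, "identity": 64.0},  # identity too low
]
print(conserved_orfs(example_hits))
```

Requiring full query coverage is a strict criterion for very short proteins, since a hit covering only part of the sequence is more likely to arise by chance.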

Code availability.

Custom Python scripts used in this study are available at https://github.com/wade-lab/E_coli_sORF_discovery.

ACKNOWLEDGMENTS

We thank the Wadsworth Center Applied Genomic Technologies Core Facility for DNA sequencing. We thank the Wadsworth Center Tissue Culture and Media Core Facility and Glassware Facility for technical support. We thank Alexander Mankin, Nora Vazquez-Laslop, Zachary Ardern, and Gisela Storz for helpful suggestions and comments on the manuscript. We thank Don Court for the recombineering plasmid pSIM6. We thank an anonymous reviewer for suggesting bortezomib treatment to prevent proteolysis.

K.M. was supported by a National Institutes of Health Training Grant (no. 5T32AT007533). J.T.W. was supported by National Institutes of Health grant no. R01GM139277.

Footnotes

Supplemental material is available online only.

Supplemental file 1
Fig. S1. JB.00352-21-s0001.pdf (PDF file, 1.5 MB)
Supplemental file 2
Tables S1 to S5. JB.00352-21-s0002.xlsx (XLSX file, 3.1 MB)

Contributor Information

Joseph T. Wade, Email: joseph.wade@health.ny.gov.

Tina M. Henkin, Ohio State University

REFERENCES

1. Sberro H, Fremin BJ, Zlitni S, Edfors F, Greenfield N, Snyder MP, Pavlopoulos GA, Kyrpides NC, Bhatt AS. 2019. Large-scale analyses of human microbiomes reveal thousands of small, novel genes. Cell 178:1245–1259.e14. doi:10.1016/j.cell.2019.07.016.
2. Meydan S, Marks J, Klepacki D, Sharma V, Baranov PV, Firth AE, Margus T, Kefi A, Vázquez-Laslop N, Mankin AS. 2019. Retapamulin-assisted ribosome profiling reveals the alternative bacterial proteome. Mol Cell 74:481–493. doi:10.1016/j.molcel.2019.02.017.
3. Weaver J, Mohammad F, Buskirk AR, Storz G. 2019. Identifying small proteins by ribosome profiling with stalled initiation complexes. mBio 10. doi:10.1128/mBio.02819-18.
4. Smith C, Canestrari JG, Wang J, Derbyshire KM, Gray TA, Wade JT. 2019. Pervasive translation in Mycobacterium tuberculosis. bioRxiv 665208. doi:10.1101/665208.
5. Fuchs S, Kucklick M, Lehmann E, Beckmann A, Wilkens M, Kolte B, Mustafayeva A, Ludwig T, Diwo M, Wissing J, Jänsch L, Ahrens CH, Ignatova Z, Engelmann S. 2021. Towards the characterization of the hidden world of small proteins in Staphylococcus aureus, a proteogenomics approach. PLoS Genet 17:e1009585. doi:10.1371/journal.pgen.1009585.
6. Miravet-Verde S, Ferrar T, Espadas-García G, Mazzolini R, Gharrab A, Sabido E, Serrano L, Lluch-Senar M. 2019. Unraveling the hidden universe of small proteins in bacterial genomes. Mol Syst Biol 15:e8290. doi:10.15252/msb.20188290.
7. Omasits U, Varadarajan AR, Schmid M, Goetze S, Melidis D, Bourqui M, Nikolayeva O, Québatte M, Patrignani A, Dehio C, Frey JE, Robinson MD, Wollscheid B, Ahrens CH. 2017. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Res 27:2083–2095. doi:10.1101/gr.218255.116.
8. Petruschke H, Anders J, Stadler PF, Jehmlich N, von Bergen M. 2020. Enrichment and identification of small proteins in a simplified human gut microbiome. J Proteomics 213:103604. doi:10.1016/j.jprot.2019.103604.
9. Venturini E, Svensson SL, Maaß S, Gelhausen R, Eggenhofer F, Li L, Cain AK, Parkhill J, Becher D, Backofen R, Barquist L, Sharma CM, Westermann AJ, Vogel J. 2020. A global data-driven census of Salmonella small proteins and their potential functions in bacterial virulence. microLife 1. doi:10.1093/femsml/uqaa002.
10. Baek J, Lee J, Yoon K, Lee H. 2017. Identification of unannotated small genes in Salmonella. G3 (Bethesda) 7:983–989. doi:10.1534/g3.116.036939.
11. Ndah E, Jonckheere V, Giess A, Valen E, Menschaert G, Van Damme P. 2017. REPARATION: ribosome profiling assisted (re-)annotation of bacterial genomes. Nucleic Acids Res 45:e168. doi:10.1093/nar/gkx758.
12. Shell SS, Wang J, Lapierre P, Mir M, Chase MR, Pyle MM, Gawande R, Ahmad R, Sarracino DA, Ioerger TR, Fortune SM, Derbyshire KM, Wade JT, Gray TA. 2015. Leaderless transcripts and small proteins are common features of the mycobacterial translational landscape. PLoS Genet 11:e1005641. doi:10.1371/journal.pgen.1005641.
13. Miranda-CasoLuengo AA, Staunton PM, Dinan AM, Lohan AJ, Loftus BJ. 2016. Functional characterization of the Mycobacterium abscessus genome coupled with condition specific transcriptomics reveals conserved molecular strategies for host adaptation and persistence. BMC Genomics 17:553. doi:10.1186/s12864-016-2868-y.
14. Hemm MR, Paul BJ, Schneider TD, Storz G, Rudd KE. 2008. Small membrane proteins found by comparative genomics and ribosome binding site models. Mol Microbiol 70:1487–1501. doi:10.1111/j.1365-2958.2008.06495.x.
15. VanOrsdel CE, Kelly JP, Burke BN, Lein CD, Oufiero CE, Sanchez JF, Wimmers LE, Hearn DJ, Abuikhdair FJ, Barnhart KR, Duley ML, Ernst SEG, Kenerson BA, Serafin AJ, Hemm MR. 2018. Identifying new small proteins in Escherichia coli. Proteomics 18:e1700064. doi:10.1002/pmic.201700064.
16. Storz G, Wolf YI, Ramamurthi KS. 2014. Small proteins can no longer be ignored. Annu Rev Biochem 83:753–777. doi:10.1146/annurev-biochem-070611-102400.
17. Dever TE, Ivanov IP, Sachs MS. 2020. Conserved upstream open reading frame nascent peptides that control translation. Annu Rev Genet 54:237–264. doi:10.1146/annurev-genet-112618-043822.
18. Florin T, Maracci C, Graf M, Karki P, Klepacki D, Berninghausen O, Beckmann R, Vázquez-Laslop N, Wilson DN, Rodnina MV, Mankin AS. 2017. An antimicrobial peptide that inhibits translation by trapping release factors on the ribosome. Nat Struct Mol Biol 24:752–757. doi:10.1038/nsmb.3439.
19. Mangano K, Florin T, Shao X, Klepacki D, Chelysheva I, Ignatova Z, Gao Y, Mankin AS, Vázquez-Laslop N. 2020. Genome-wide effects of the antimicrobial peptide apidaecin on translation termination in bacteria. Elife 9. doi:10.7554/eLife.62655.
20. Mohammad F, Woolstenhulme CJ, Green R, Buskirk AR. 2016. Clarifying the translational pausing landscape in bacteria by ribosome profiling. Cell Rep 14:686–694. doi:10.1016/j.celrep.2015.12.073.
21. Gvozdjak A, Samanta MP. 2020. Genes preferring non-AUG start codons in bacteria. arXiv:2008.10758 [q-bio.GN].
22. Del Campo C, Bartholomäus A, Fedyunin I, Ignatova Z. 2015. Secondary structure across the bacterial transcriptome reveals versatile roles in mRNA regulation and function. PLoS Genet 11:e1005613. doi:10.1371/journal.pgen.1005613.
23. Sassetti E, Durante Cruz C, Tammela P, Winterhalter M, Augustyns K, Gribbon P, Windshügel B. 2019. Identification and characterization of approved drugs and drug-like compounds as covalent Escherichia coli ClpP inhibitors. Int J Mol Sci 20:2686. doi:10.3390/ijms20112686.
24. Olivares AO, Baker TA, Sauer RT. 2016. Mechanistic insights into bacterial AAA+ proteases and protein-remodelling machines. Nat Rev Microbiol 14:33–44. doi:10.1038/nrmicro.2015.4.
25. Keeling DM, Garza P, Nartey CM, Carvunis A-R. 2019. The meanings of “function” in biology and the problematic case of de novo gene emergence. Elife 8. doi:10.7554/eLife.47014.
26. Carvunis A-R, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, Charloteaux B, Hidalgo CA, Barbette J, Santhanam B, Brar GA, Weissman JS, Regev A, Thierry-Mieg N, Cusick ME, Vidal M. 2012. Proto-genes and de novo gene birth. Nature 487:370–374. doi:10.1038/nature11184.
27. Ruiz-Orera J, Verdaguer-Grau P, Villanueva-Cañas JL, Messeguer X, Albà MM. 2018. Translation of neutrally evolving peptides provides a basis for de novo gene evolution. Nat Ecol Evol 2:890–896. doi:10.1038/s41559-018-0506-6.
28. Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD, Bateman A. 2021. Pfam: the protein families database in 2021. Nucleic Acids Res 49:D412–D419. doi:10.1093/nar/gkaa913.
29. Mueller S, Papamichail D, Coleman JR, Skiena S, Wimmer E. 2006. Reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity. J Virol 80:9687–9696. doi:10.1128/JVI.00738-06.
30. Puigbò P, Aragonès L, Garcia-Vallvé S. 2010. RCDI/eRCDI: a web-server to estimate codon usage deoptimization. BMC Res Notes 3:87. doi:10.1186/1756-0500-3-87.
31. Gelsinger DR, Dallon E, Reddy R, Mohammad F, Buskirk AR, DiRuggiero J. 2020. Ribosome profiling in archaea reveals leaderless translation, novel translational initiation sites, and ribosome pausing at single codon resolution. Nucleic Acids Res 48:5201–5216. doi:10.1093/nar/gkaa304.
32. Eisenberg AR, Higdon AL, Hollerer I, Fields AP, Jungreis I, Diamond PD, Kellis M, Jovanovic M, Brar GA. 2020. Translation initiation site profiling reveals widespread synthesis of non-AUG-initiated protein isoforms in yeast. Cell Syst 11:145–160.e5. doi:10.1016/j.cels.2020.06.011.
33. Ingolia NT, Lareau LF, Weissman JS. 2011. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147:789–802. doi:10.1016/j.cell.2011.10.002.
34. Lee S, Liu B, Lee S, Huang S-X, Shen B, Qian S-B. 2012. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc Natl Acad Sci USA 109:E2424–2432. doi:10.1073/pnas.1207846109.
35. Mattiuzzo M, Bandiera A, Gennaro R, Benincasa M, Pacor S, Antcheva N, Scocchi M. 2007. Role of the Escherichia coli SbmA in the antimicrobial activity of proline-rich peptides. Mol Microbiol 66:151–163. doi:10.1111/j.1365-2958.2007.05903.x.
36. Kuru E, Määttälä R-M, Noguera K, Stork DA, Narasimhan K, Rittichier J, Wiegand D, Church GM. 2020. Release factor inhibiting antimicrobial peptides improve nonstandard amino acid incorporation in wild-type bacterial cells. ACS Chem Biol 15:1852–1861. doi:10.1021/acschembio.0c00055.
37. Baliga C, Brown TJ, Florin T, Colon S, Shah V, Skowron KJ, Kefi A, Szal T, Klepacki D, Moore TW, Vázquez-Laslop N, Mankin AS. 2021. Charting the sequence-activity landscape of peptide inhibitors of translation termination. Proc Natl Acad Sci USA 118:e2026465118. doi:10.1073/pnas.2026465118.
38. Muthunayake NS, Islam R, Inutan ED, Colangelo W, Trimpin S, Cunningham PR, Chow CS. 2020. Expression and in vivo characterization of the antimicrobial peptide oncocin and variants binding to ribosomes. Biochemistry 59:3380–3391. doi:10.1021/acs.biochem.0c00600.
39. Riley M. 1993. Functions of the gene products of Escherichia coli. Microbiol Rev 57:862–952. doi:10.1128/mr.57.4.862-952.1993.
40. Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453–1474. doi:10.1126/science.277.5331.1453.
41. Datsenko KA, Wanner BL. 2000. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA 97:6640–6645. doi:10.1073/pnas.120163297.
42. Butland G, Peregrín-Alvarez JM, Li J, Yang W, Yang X, Canadien V, Starostine A, Richards D, Beattie B, Krogan N, Davey M, Parkinson J, Greenblatt J, Emili A. 2005. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature 433:531–537. doi:10.1038/nature03239.
43. Stringer AM, Singh N, Yermakova A, Petrone BL, Amarasinghe JJ, Reyes-Diaz L, Mantis NJ, Wade JT. 2012. FRUIT, a scar-free system for targeted chromosomal mutagenesis, epitope tagging, and promoter replacement in Escherichia coli and Salmonella enterica. PLoS One 7:e44841. doi:10.1371/journal.pone.0044841.
44. Datta S, Costantino N, Court DL. 2006. A set of recombineering plasmids for gram-negative bacteria. Gene 379:109–115. doi:10.1016/j.gene.2006.04.018.
45. Lorenz R, Bernhart SH, Höner Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL. 2011. ViennaRNA Package 2.0. Algorithms Mol Biol 6:26. doi:10.1186/1748-7188-6-26.
46. Gertz EM, Yu Y-K, Agarwala R, Schäffer AA, Altschul SF. 2006. Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST. BMC Biol 4:41. doi:10.1186/1741-7007-4-41.
47. Sievers F, Higgins DG. 2014. Clustal Omega, accurate alignment of very large numbers of sequences. Methods Mol Biol 1079:105–116. doi:10.1007/978-1-62703-646-7_6.

Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)
