Abstract
The complete genome sequence of Geobacillus thermodenitrificans NG80-2, a thermophilic bacillus isolated from a deep oil reservoir in Northern China, consists of a 3,550,319-bp chromosome and a 57,693-bp plasmid. The genome reveals that NG80-2 is well equipped for adaptation to a wide variety of environmental niches, including oil reservoirs, by possessing genes for utilization of a broad range of energy sources, genes encoding various transporters for efficient nutrient uptake and detoxification, and genes for a flexible respiration system including an aerobic branch comprising five terminal oxidases and an anaerobic branch comprising a complete denitrification pathway for quick response to dissolved oxygen fluctuation. A nitrous oxide reductase gene has not been previously described in Gram-positive bacteria. The proteome further reveals the presence of a long-chain alkane degradation pathway, and the function of the key enzyme in the pathway, the long-chain alkane monooxygenase LadA, is confirmed by in vivo and in vitro experiments. The thermophilic soluble monomeric LadA is an ideal candidate for treatment of environmental oil pollution and biosynthesis of complex molecules.
Keywords: adaptation, degradation, monooxygenase
Geobacillus is a phenotypically and phylogenetically coherent genus of thermophilic bacilli with a high 16S rRNA sequence similarity (98.5–99.2%), and was recently separated from the genus Bacillus (1). Members of Geobacillus have been isolated from various terrestrial and marine environments, not only in geothermal areas, but also in temperate regions and permanently cold habitats (2), demonstrating great capabilities for adaptation to a wide variety of environmental niches. Geobacillus spp. have attracted industrial interest for their potential applications in biotechnological processes as sources of various thermostable enzymes (2).
There are 11 validly described Geobacillus species (3), and only the whole genome sequence of Geobacillus kaustophilus HTA426, a deep-sea sediment isolate, has been reported (4). G. thermodenitrificans are Gram-positive, facultative soil bacteria with the distinctive property of denitrification (5). G. thermodenitrificans NG80-2 was isolated from a deep-subsurface oil reservoir in Dagang oilfield, Northern China. It grows between 45°C and 73°C (optimum 65°C) and can use long-chain (C15-C36) alkanes as a sole carbon source (6).
Alkanes are the major components of crude oils and are commonly found in oil-contaminated environments. Biotechnological applications for microbial degradation of those pollutants are of long-standing interest. Although long-chain alkanes are more persistent in the environment than shorter-chain alkanes, only genes involved in the degradation of alkanes up to C16 have been well studied (7–11), and those for longer alkanes have not yet been reported. G. thermodenitrificans NG80-2 is the only described thermophilic bacterial strain that degrades long-chain alkanes up to at least C36 (6).
Here, we report the full genome sequence of NG80-2 and its partial proteome. General and specific metabolic features are described, and mechanisms for adaptation to oil reservoirs are discussed. We identified in silico the genes for the degradation pathway of long-chain alkanes, and confirmed their encoded proteins in vitro by proteomic analysis. The key enzyme of this pathway, a long-chain alkane monooxygenase, was fully characterized and shows particular potential to be used for the treatment of environmental oil pollution and in other biocatalytic processes.
Results
General Genome Features.
The genome of G. thermodenitrificans NG80-2 is composed of a 3,550,319-bp chromosome and a 57,693-bp plasmid, designated pLW1071, with mean G + C contents of 49.0% and 39.8%, respectively (Table 1). Both the size and structure of the NG80-2 genome are similar to those of G. kaustophilus HTA426, the only other sequenced Geobacillus genome, which contains a 3,544,776-bp chromosome and a 47,890-bp plasmid (4). There are 3,499 predicted ORFs, 11 rRNA operons and 87 tRNA genes for all 20 amino acids, covering 86% of the genome (Table 1 and Fig. 1). Putative functions were assigned to 2,479 ORFs [70.9%; supporting information (SI) Table 2]. Of the remainder, 757 (21.6%) showed similarity to hypothetical proteins, and 263 (7.5%) had no detectable homologs in the public protein databases (e-value <1 × 10⁻¹⁰). There are 68 putative transposase genes in intact or mutated forms. The genome contains 30 phage-related genes, 22 of which constitute a defective prophage located 2,915 kb from oriC.
Table 1.
General features of G. thermodenitrificans NG80-2 genome
Feature | Chromosome | Plasmid |
---|---|---|
Total size, bp | 3,550,319 | 57,693 |
G + C content, % | 49.01 | 39.75 |
Coding density, % | 84.5 | 83.5 |
ORFs: | | |
with assigned function | 2,445 | 34 |
conserved hypothetical | 749 | 8 |
with no database match | 250 | 13 |
total | 3,444 | 55 |
Average ORF length, bp | 871 | 876 |
rRNA operons | 11 | 0 |
tRNAs | 87 | 0 |
Transposases | 55 | 13 |
Prophage-related genes | 30 | 0 |
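The genome-wide ORF totals quoted in the text can be cross-checked against the per-replicon counts in Table 1. The following is a minimal sketch of that arithmetic; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# ORF counts per replicon, copied from Table 1.
TABLE1 = {
    "chromosome": {"assigned": 2445, "conserved_hypothetical": 749, "no_match": 250},
    "plasmid": {"assigned": 34, "conserved_hypothetical": 8, "no_match": 13},
}


def genome_totals(table):
    """Sum each ORF category over all replicons."""
    totals = {}
    for counts in table.values():
        for category, n in counts.items():
            totals[category] = totals.get(category, 0) + n
    return totals


def percentages(totals):
    """Express each category as a percentage of all ORFs, to one decimal."""
    all_orfs = sum(totals.values())
    return {cat: round(100 * n / all_orfs, 1) for cat, n in totals.items()}


totals = genome_totals(TABLE1)
print(sum(totals.values()), totals, percentages(totals))
```

The summed counts reproduce the 3,499 ORFs and the 2,479, 757 and 263 category totals given in the text, and the computed percentages agree with the quoted values to within about 0.1 percentage point.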
Fig. 1.
Circular maps of the G. thermodenitrificans NG80-2 chromosome and plasmid pLW1071. (a) The chromosome map from the outside inward: the first and second circles show predicted protein-coding regions on the plus and minus strands, respectively (colors were assigned according to the color code of the COG functional classes; see key); the third circle shows transposase genes and insertion sequence elements in red; the fourth and fifth circles show tRNAs and rRNAs in dark slate blue and dark golden red, respectively; the sixth and seventh circles show percentage G + C in relation to the mean G + C and GC skew, respectively. (b) The plasmid map shows ORFs color-coded according to their assigned functions (see key).
The likely origin of replication (oriC) on the chromosome was assigned to the intergenic region upstream of dnaA (GT0001) based on GC skew analysis, and four DnaA box-like sequences were also found in this region.
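GC skew analysis of the kind mentioned above is commonly done by computing (G − C)/(G + C) in windows along the sequence and locating the minimum of the cumulative skew, which tends to coincide with the replication origin. The sketch below illustrates that general technique only; the source does not state which tool or window size was used, so the function names and the 1,000-bp window are assumptions.

```python
def gc_skew_windows(seq, window=1000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def predicted_origin(seq, window=1000):
    """Coordinate at which the cumulative GC skew reaches its minimum.

    Returns 0 if the sequence is shorter than one window.
    """
    total, best_total, best_index = 0.0, float("inf"), -1
    for i, skew in enumerate(gc_skew_windows(seq, window)):
        total += skew
        if total < best_total:
            best_total, best_index = total, i
    # The minimum is reached at the end of window best_index.
    return (best_index + 1) * window


# Toy sequence (not real genome data): C-rich for 3 kb, then G-rich for 5 kb.
toy = "ACCC" * 750 + "AGGG" * 1250
print(predicted_origin(toy))
```

On a real chromosome the predicted coordinate would be only a starting point; as in the text, supporting evidence such as the position of dnaA and DnaA box-like sequences is needed to assign oriC.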
pLW1071 shows no overall similarity to other sequenced plasmids. Genes involved in plasmid replication and conjugation were found, and 22 more genes could be assigned functions including a long-chain alkane monooxygenase gene (ladA, see below). Involvement of pLW1071 in alkane degradation was confirmed by the fact that a derivative of NG80-2 with the plasmid cured failed to degrade hexadecane (data not shown).
Comparative Genomics.
Most predicted proteins (75.6%) have closest homologs in Geobacillus. Of the remainder, 11.4% are in Bacillus, 3.0% in other Gram-positive bacteria, and only 2.7% in other organisms. There are 2,578 (74.9%) NG80-2 genes having orthologs with an average protein identity of 83.2% in G. kaustophilus HTA426, and genome-wide synteny for the orthologs between the two strains could be detected (SI Fig. 4). Also, between 53.1% and 58.7% of the NG80-2 genes have orthologs in each of 13 Bacillus strains sequenced (www.ncbi.nih.gov/genomes/lproks.cgi), and the average identity levels are in the range of 53.5–58.1%. No orthologs were found between the plasmids of NG80-2 and HTA426, indicating different origins.
There are 1,179 orthologs shared among NG80-2, HTA426 and all 13 sequenced Bacillus genomes, of which 1,099 (93.2%) are located in the 0–1.3 Mb and 2.2–3.55 Mb regions of the NG80-2 genome (SI Table 3). The NG80-2 genes involved in nucleotide metabolism and translation were found to be the most conserved, with 79.4% and 76.7%, respectively, of the total related genes having orthologs in the other genomes, in contrast to only 26.8% of the genes involved in carbohydrate metabolism (SI Fig. 5).
G. thermodenitrificans NG80-2 and G. kaustophilus HTA426 share 385 orthologs that are absent in the 13 Bacillus genomes, of which 200 could be assigned putative functions (SI Table 4). Fourteen genes involved in fatty acid synthesis and degradation are shared by NG80-2 and HTA426, and coincidentally, all Geobacillus species have a distinctive fatty acid profile with a high proportion of iso-branched saturated acids (1). Other shared genes include a gene cluster (GT1516-GT1518) with genes encoding a heme O oxygenase and subunits of b(o/a)3 cytochrome c oxidase involved in aerobic respiration, and a gene cluster (GT1698-GT1681) for the synthesis of cobalamin. The ability to synthesize cobalamin was also reported for G. stearothermophilus, but not for Bacillus spp. with the exception of B. megaterium in which cobalamin synthesis genes are located on a plasmid (12).
Four hundred ninety-eight of the NG80-2 genes are absent in HTA426 and the 13 Bacillus genomes, of which, 253 are orphan genes and 202 could be assigned putative functions (SI Table 5). They include those encoding proteins for the reduction of N2O to N2 (GT1734-GT1731), in line with the distinctive denitrification property of G. thermodenitrificans.
General Metabolism and Adaptation to Oil Reservoirs.
Genes required for the synthesis of purine and pyrimidine nucleotides, fatty acids, and all 20 amino acids, except for serB of the serine pathway, were identified. serB was not found in any of the sequenced Bacillus genomes, suggesting the presence of nonorthologous genes. Complete sets of genes for the synthesis of all vitamins and cofactors are present except for biotin. However, a putative biotin transporter gene (bioY, GT2896) was found, and the requirement of exogenous biotin for growth of NG80-2 was confirmed.
All central metabolic pathways for carbohydrates except for the Entner-Doudoroff pathway are present. NG80-2 utilizes a large variety of carbohydrates such as glycerol, cellobiose, trehalose and starch as indicated by the fermentation tests using API 50 CHB/E test strips (BioMérieux, Marcy l'Etoile, France), and corresponding genes including genes encoding different glycolytic activities were found (SI Table 6). Genome analysis further revealed the presence of a gene cluster (GT1801-GT1756) for utilization of plant hemicellulose xylans, and the ability of NG80-2 to use xylans as a sole carbon source was confirmed. A gene cluster for the synthesis of the carbon storage compound glycogen (GT2782-GT2778) was also found. In line with its capacity to use a broad range of carbohydrates, NG80-2 has a large number of genes involved in uptake of sugars (SI Table 7). At least four complete PTS systems with predicted specificity to glucose (GT0878), fructose (GT1726), mannitol (GT1857) and cellulose (GT1748-GT1750), as well as a glycerol facilitator (GT1215) are present. In addition, 16 ABC-type sugar transporter systems were found, 5 of which are clustered with the genes involved in utilization of starch or xylans.
In oil reservoirs, organic acids, with acetate being the most abundant, are commonly detected and thought to be important for bacterial survival (13). Acetate may be assimilated through the reversible Pta-AckA (GT3361, GT2688) pathway and the glyoxylate bypass. The presence of genes encoding butyrate kinase (GT2309) and formate dehydrogenases (GT0464, GT1579) indicates that butyrate and formate may also be used. A variety of aromatic compounds are present in reservoirs, and NG80-2 encodes four distinct aerobic ring cleavage pathways for degradation of benzoate (CoA activated, GT1899-GT1888), phenylacetate (CoA activated, GT1930-GT1919), 4-hydroxyphenylacetate (GT2993-GT2973), and 3-hydroxyanthranilate (GT3163-GT3150). Fatty acids are the major products of alkane degradation (see below) and NG80-2 contains multiple candidate genes for enzymes of β-oxidation. Genes encoding the two subunits of methylmalonyl-CoA mutase (B12-dependent) required for degradation of odd-carbon number fatty acids are also present (GT2300-GT2299, GT3336).
Ammonia, which is the primary nitrogen source in oil reservoirs (14), may be taken up by a specific Amt type transporter (GT1313), and assimilated via glutamate dehydrogenase (GT2171, GT3148), or the glutamine synthetase (GT1182, GT1483)-glutamate synthase (GT1292-GT1293) pathway. The latter route has a high affinity for ammonia and is more efficient in N-limited conditions (15). Typical nasAB genes encoding assimilatory nitrate and nitrite reductases were not found, indicating the presence of nonorthologous genes in NG80-2, which can use nitrate as a nitrogen source. G. thermodenitrificans does not produce urease (5), and no genes encoding urease activity were found in NG80-2. Instead, we found genes encoding allophanate hydrolase (GT1351-GT1352) of the urea amidolysis reaction (16), and utilization of urea by NG80-2 was confirmed. NG80-2 contains genes for degradation of most amino acids, and at least 60 genes encoding various proteases and peptidases were also found (SI Table 6). NG80-2 has a large number of transporter genes for uptake of exogenous amino acids and peptides including 13 sets of ABC-type systems with predicted specificity for oligopeptides (4 sets), branched-chain amino acids (3 sets), polar amino acids (2 sets), spermidine/putrescine (2 sets) and methionine (2 sets) (SI Table 7), indicating that exogenous proteins, peptides and amino acids can serve as nitrogen sources.
Respiration and Fermentation.
In oil reservoirs, oxygen is only transiently available during water flushing used for oil production (17). Therefore, a flexible respiration system for quick response to changes in O2 concentration is very important for the survival of bacteria in reservoirs. G. thermodenitrificans is a facultative aerobe, capable of oxygen and nitrate respiration (5). For aerobic respiration, NG80-2 contains a gene cluster encoding Complex I-like enzymes, comprising 11 (GT3302-GT3292) of the 14 subunits encoded by the E. coli nuo operon, adjacent to the ATP synthase complex genes (GT3311-GT3303). As in some thermophilic bacteria such as G. stearothermophilus (18), NG80-2 lacks three subunits of the nuo operon (nuoEFG) that code for NADH dehydrogenase activity, and no homologs of the FpoF subunit [containing the motif to bind F420H2 in M. mazei (18)] were found despite the presence of genes encoding F420H2 related proteins. Therefore, the nature of the electron donors for the Complex I-like enzymes is not clear. The NG80-2 genome also encodes two type II NADH dehydrogenases (GT2904, GT2909), a succinate dehydrogenase complex (GT2601-GT2599), a cytochrome b6c1 reductase complex (GT2126-GT2124), and five terminal oxidases of caa3-type (GT0947-GT0950), b(o/a)3-type (GT1395-GT1394, GT1517-GT1518), aa3-type (GT3391-GT3388), and bd-type (GT0518-GT0519). The bd-type quinol oxidase has a high affinity for O2, allowing it to operate under low O2 concentrations (19).
NG80-2 has the genes needed for the reduction of nitrate to N2. A typical narGHJI operon encoding membrane-bound nitrate reductase is present in two copies (GT0653-GT0656, GT1712-GT1715), and the identity level between orthologous genes ranges from 38% to 68%, indicating different origins. The GT0653-GT0656 set is clustered with nirK (GT0650) encoding a copper-containing nitrite reductase, norZ (GT0643) encoding a quinol oxidizing nitric oxide reductase, narK (GT0649) for nitrate or nitrite uptake, fnr (GT0657, GT0668) encoding the global anaerobic regulator, dnrN (GT0660) encoding the nitric oxide-dependent regulator, and genes involved in synthesis of molybdenum cofactor (GT0645, GT0658-GT0659, GT0662-GT0665), which is required for the activity of nitrate reductase. Fnr is the anaerobic activator of narGHJI in E. coli and B. subtilis, and both Fnr and other Fnr-like proteins such as Dnr are also involved in regulation of different denitrifying reductase genes (20).
A gene cluster (GT1735-GT1729) containing homologs of nosZDYF for the reduction of N2O to N2 as the last step of denitrification was found. GT1734 shares 41% identity with NosZ (nitrous oxide reductase) of Wolinella succinogenes (21). The purified product of GT1734 showed nitrous oxide reductase activity, and the gene was confirmed to be a nosZ (unpublished data). The presence of a nos gene cluster in Gram-positive bacteria has not been previously described.
Fermentation is an important mechanism for anaerobic growth of bacteria in the absence of alternative electron acceptors (22), and several genes of typical pyruvate fermentation pathways were found such as pta (GT3361), ackA (GT2688), ldh (GT0487), and buk (GT2309). The por genes (GT1718-GT1717), which encode pyruvate-ferredoxin oxidoreductase and are commonly present in anaerobic bacteria (23), were also found.
Nutrient Uptake and Detoxification.
Petroleum reservoirs represent an oligotrophic environment in which such elements as N, P, S, and Fe are limited (14). On the other hand, growth and survival of microorganisms may be affected by the toxicity of crude oil hydrocarbons, various antimicrobial metabolites and heavy metals. Efficient transporter systems for uptake of nutrient elements and detoxification play a crucial role in the survival of bacteria under reservoir conditions. We identified 368 (10.5%) genes encoding transporter-related proteins by searching the Transport Protein Database (www.tcdb.org) (SI Table 7). The largest number (193) of these proteins are primary active transporters (class 3) including 186 members of the ATP-binding cassette transporter superfamily, and the second largest number (128) are electrochemical potential-driven transporters (class 2).
Phosphorus is considered to be the major rate limiting element for biological activity in oil reservoirs (14). NG80-2 contains a typical ABC-type Pst system (GT2397-GT2394), which is under control of the phosphate uptake regulator PhoU (GT2393). Phosphate may also be taken up by a Na+/phosphate symporter (GT2434) and two members of the PiT (inorganic phosphate transporter) family (GT0388, GT2897). GT0388 is also likely to act as a sulfate transporter, as it is clustered with other sulfate assimilation genes (GT0389-GT0386). Typical sulfate ABC-type transporter and sulfate permease (SulP) were not found. Iron uptake seems to be very important for NG80-2, as 4 sets of iron chelate ABC-type systems (GT2199-GT2197, GT1317-GT1320, GT1272-GT1275, GT0175-GT0172) and a set of FeoAB proteins (GT1740-GT1739) were found. NG80-2 also has specific uptake transporters for Zn2+ (ZnuABC, GT2407-GT2409), Co2+ (CbiMNOQ, GT1695-GT1698), Mg2+/Co2+ (CorA, GT0899, GT1415, GT1433), Zn2+/Fe2+ (ZIP, GT0845), Mn2+/Fe2+ (Nramp, GT1354), and Mg2+ (MgtC, GT2865). Organic acids including benzoate, propionate and butyrate are commonly detected in reservoirs (13), and putative genes for their uptake were found (GT1615, GT1850, GT3161). We also found a putative transporter gene (GT2686) for fatty acids, the common degradation products of n-alkanes.
NG80-2 utilizes both primary and secondary transporters for export of various drugs and heavy metals. At least 7 ABC-type drug efflux systems are recognized, and another 27 electrochemical gradient driven systems were found, including 14 members of the Drug:H+ antiporter, 4 of the cation diffusion facilitator (CDF), 3 of the resistance-nodulation-cell division (RND), 2 of the small multidrug resistance (SMR), and 2 of the multi antimicrobial extrusion (MATE) families, as well as two arsenite efflux pumps (GT1547, GT1599). In addition to being expelled, arsenite may also be detoxified by arsenate reductase (GT1600, GT2957). The P-ATPase exporting systems for Cu2+ (GT1534-GT1535), and Zn2+/Cd2+/Pb2+ (GT0637) were also found.
Motility, Chemotaxis, and Signal Transduction.
Similar to B. subtilis, NG80-2 possesses 50 genes involved in assembly and function of flagella (GT1069-GT1100, GT2142, GT2466-GT2467, GT3052-GT3069, GT3273-GT3274) including a cluster of 31 genes (GT1069-GT1100) corresponding to the fla/che operon of B. subtilis. However, motPS encoding the second set of Mot proteins in B. subtilis was not found in NG80-2. Therefore, the flagella system in NG80-2 seems to be powered solely by the MotAB complex. In addition, five intact and one interrupted methyl-accepting chemotaxis protein (MCP) genes (GT345, GT661, GT884, GT1165-GT1166, GT2002, GT3317) were found. Inactivation of the MCP gene may have resulted from the changed environment in oil reservoirs.
Most Gram-positive bacterial genomes contain a relatively small number of genes involved in signal transduction (24). NG80-2 has genes encoding 27 histidine kinases and 29 response regulators including 18 pairs of histidine kinases-response regulators, 5 Ser/Thr protein kinases (GT0059, GT0497, GT1530, GT3032, GT3493), a Tyr protein kinase (GT3268) and 10 proteins with EAL (GT0376, GT0905, GT1555), HD-GYP (GT0007, GT1736), GGDEF (GT1560, GT2703, GT3401), EAL:GGDEF (GT0605) and GAF (GT2702) domains.
Thermotolerance.
As a thermophilic bacterium, NG80-2 can easily adapt to geothermal oil reservoirs when other survival requirements are met. Mesophilic bacteria respond to heat-induced stresses by induction of heat shock proteins, which remove or refold damaged proteins (25). NG80-2 contains a wide range of genes encoding molecular chaperones including the dnaK operon comprising genes encoding DnaJ-DnaK-GrpE and the HrcA repressor (GT2444-GT2441), genes encoding GroEL-GroES (GT0223-GT0224), a disulfide bond chaperone of the HSP33 family (GT0065), and small heat shock proteins of the IbpA family (GT0237, GT2085, GT2097). Genes encoding ATP-dependent heat shock-responsive proteases such as HslVU (GT1067-GT1068), Clp (GT0079, GT0678, GT0856), and Lon (GT2582, GT2583) are also present. Makarova et al. (26) listed 58 COG (clusters of orthologous groups) families, whose members are frequently detected in archaea and thermophilic bacteria and predicted to be associated with the (hyper)thermophilic phenotype. NG80-2 has three genes encoding proteins of those COG families: GT0279 (COG1583), GT0445 (COG3044), and GT3202 (COG2152). Homologs of GT0445 and GT3202 are also present in HTA426, and GT0279 is “unique” to NG80-2. None of the three genes were found in any of the sequenced Bacillus genomes. Polyamines such as norspermine and norspermidine are commonly found in hyperthermophilic bacteria and are necessary for the hyperthermophilic phenotype of those bacteria (27). Spermine is the major type of polyamine in Geobacillus species, and has been implicated in thermophily in G. kaustophilus based on the presence of the “unique” genes encoding spermine/spermidine synthase and polyamine ABC transporters (4). Those unique genes were also found in NG80-2 (GT1637, GT0623-GT0626). Takami et al. (4) reported possible association of asymmetric amino acid substitutions (Arg, Ala, Gly, Val, and Pro are more frequently used in HTA426 than in mesophilic Bacillus strains) with thermoadaptation of G. kaustophilus HTA426. A similar amino acid substitution pattern was also observed in NG80-2 when compared with 7 sequenced Bacillus strains including 4 used to compare with HTA426 (SI Table 8), supporting the previous suggestion.
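A simple way to get a first impression of such a residue bias is to compare overall amino acid usage between two proteomes. The sketch below does only that: it is a coarse proxy for, not a reproduction of, the ortholog-based substitution analysis referred to above, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

THERMO_RESIDUES = "RAGVP"  # Arg, Ala, Gly, Val, Pro, as listed in the text


def residue_fractions(proteins):
    """Fraction of each amino acid over all residues in a list of protein strings."""
    counts = Counter()
    for protein in proteins:
        counts.update(protein.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


def enrichment(thermophile, mesophile, residues=THERMO_RESIDUES):
    """Usage difference (thermophile minus mesophile) for the chosen residues."""
    t = residue_fractions(thermophile)
    m = residue_fractions(mesophile)
    return {aa: t.get(aa, 0.0) - m.get(aa, 0.0) for aa in residues}


# Toy sequences for illustration only (not real proteins).
toy_thermophile = ["RRAG", "VPKK"]
toy_mesophile = ["KKAG", "NNKK"]
print(enrichment(toy_thermophile, toy_mesophile))
```

Positive values indicate residues used more often in the first proteome. A substitution-level analysis would instead align orthologous protein pairs and count directional replacements.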
Long-Chain Alkane Degradation System.
Alkanes are the major components of crude oil, and potentially the most abundant and available carbon and energy sources in reservoirs. NG80-2 can use long-chain alkanes as its sole carbon and energy source under aerobic conditions (6). Here, we describe the pathway and functional genes (Fig. 2).
Fig. 2.
Proteomic characterization of pathways involved in hexadecane metabolism in G. thermodenitrificans NG80-2. Differentially expressed proteins in hexadecane-grown cells, in comparison with sucrose-grown cells, were investigated by 2-D electrophoresis (2-DE)/MALDI-TOF MS analysis. The predicted metabolic pathways for sucrose and hexadecane are shown. The enzymes of the pathways and corresponding genes in NG80-2 (as gene ID numbers) are noted. Proteins detected by 2-DE/MALDI-TOF MS are highlighted. Blue, induced; red, up-regulated; yellow, down-regulated; green, unchanged. Proteins with pI values outside the range of pH 4 to 7 were not investigated. Analogs are in brackets. P, phosphate; DH, dehydrogenase.
Alkane molecules are chemically inert and must be activated to allow further metabolic steps. An aerobic degradation pathway for short to medium-chain alkanes (C5-C16) has been well studied and a membrane-bound monooxygenase AlkB is responsible for the crucial initial activation of alkanes to the corresponding primary alcohols, which are further oxidized by alcohol and aldehyde dehydrogenases to fatty acids before entering β-oxidation (28). NG80-2 does not have any AlkB homologs. Real-time RT-PCR showed a 120-fold increase in transcription of a plasmid-borne putative monooxygenase gene (GT3499) when crude oil was used as a sole carbon source instead of sucrose (SI Fig. 6). GT3499 shares 38% identity with DBT-5,5′-dioxide monooxygenase of Paenibacillus sp. A11-2 (29) but has no detectable similarity with any known alkane monooxygenases.
The purified product of GT3499 converted alkanes ranging from C15 to C36 to the corresponding primary alcohols as determined by GC and GC-MS (Fig. 3 b–d). In vivo experiments showed that a plasmid carrying GT3499 (pCOM8-ladA) partially restored the ability of an alkB knockout mutant of P. fluorescens CHA0 (KOB2Δ1) to grow on alkanes, allowing growth on C15-C17 but not C6-C14. Complementation on longer-chain alkanes could not be assessed because the mutant retains the ability to grow on C18-C28 owing to the presence of a second, unidentified alkane hydroxylase system (30). Thus GT3499 was identified as an alkane monooxygenase gene, and designated ladA (long-chain alkane degradation). The results indicate that NG80-2 also utilizes a terminal oxidation pathway for the degradation of long-chain alkanes. LadA and AlkB, which catalyze the initial attack on alkanes, determine the size range of alkanes to be degraded.
Fig. 3.
Characterization of long-chain alkane monooxygenase LadA. (a) Induced expression of LadA as an extracellular protein in hexadecane-grown NG80-2 cells. Sucrose-grown cells were used as the reference state. Sections of 2-D PAGE gels stained with colloidal CBB G-250 are shown. The circled spot was identified as LadA by MALDI-TOF MS analysis. (b) Expression and purification of LadA in E. coli as shown by SDS/PAGE. Lane 1, crude extract of E. coli BL21 (pET-ladA) before IPTG induction; lane 2, crude extract induced with IPTG; lane 3, fractions eluted from Ni2+ column; lane 4, molecular size markers. (c) Hydroxylation of hexadecane by purified LadA. GC chromatographs show conversion of hexadecane to 1-hexadecanol. IS, internal standard (squalane). (d) Specific activity of purified LadA on alkanes with different chain lengths.
By using sucrose-grown cells as the reference state, proteins differentially expressed in hexadecane-grown cells were investigated by 2-D electrophoresis and MALDI-TOF MS analysis. Expression of LadA was induced, and the protein was mainly found in the extracellular fraction (Figs. 2 and 3a; see also SI Fig. 7), although it has no leader peptide. A putative aldehyde dehydrogenase (GT3117) and enzymes of the β-oxidation pathway were also up-regulated, whereas enzymes of the glycolysis pathway were down-regulated (Fig. 2; see also SI Fig. 8). Alcohol dehydrogenase is the second enzyme of the terminal oxidation pathway, and three putative alcohol dehydrogenases (GT1287, GT1754, GT2878) were detected at similar levels under both conditions, suggesting their involvement in multiple metabolic functions (Fig. 2; see also SI Fig. 8). Expression of the TCA cycle enzymes was also unchanged, as expected for this central metabolic pathway. However, the glyoxylate bypass, which allows growth on acetate as the sole carbon source, was found to be up-regulated. The proteomic analysis confirmed the presence of a terminal oxidation pathway for degradation of long-chain alkanes, which is initiated by LadA; the fatty acids produced are further degraded to acetyl-CoA via β-oxidation before entering the TCA cycle or glyoxylate bypass when alkanes are present as the sole carbon source for energy production (Fig. 2).
Thermophilic enzymes offer major biotechnological advantages over mesophilic enzymes. Oxygenases have been used for industrial synthesis of many complex molecules (31), and LadA has great potential to be used in this area, particularly using petroleum compounds as substrates, and also in treatment of environmental oil pollution. Industrial applications of currently known alkane oxygenases are restricted because of their complex biochemistry and process requirements, e.g., AlkB is a membrane protein and requires rubredoxin and a rubredoxin reductase for activity (31). LadA acts on long-chain alkanes, is single-component with no coenzyme requirement, soluble (extracellular), and easily expressed and purified in E. coli.
Discussion
G. thermodenitrificans NG80-2 has versatile metabolic pathways, flexible respiration systems and robust transport systems for its survival under various environmental conditions. The ability to degrade long-chain alkanes and carry out denitrification provides major survival advantages to NG80-2 in the current oil reservoir, in which water had been used as the driving force for oil exploitation for 11 years before NG80-2 was isolated (unpublished data). When oxygen is available through water flushing, alkanes can be used. Upon exhaustion of oxygen, NG80-2 can use nitrate, which may be carried in by the flushing waters, as an alternative electron acceptor and various organic acids as electron donors for survival.
The presence of genes involved in utilization of xylans and degradation of oligopeptides confirms that NG80-2 originated from a soil environment. We postulate that NG80-2 gained the capacity for alkane oxidation using plasmid pLW1071 in an oil-contaminated soil before invading the oil reservoir. The ladA gene is flanked at both ends (14,207 bp upstream and 267 bp downstream) by two insertion sequences of the IS21 family. IS21 elements are mainly found in Bacillus, Pseudomonas and Yersinia strains (32), indicating that ladA originated outside of G. thermodenitrificans and moved into its current position on the plasmid mediated by the two flanking IS21 elements. The upstream IS21 is intact, indicating that the movement of the ladA gene and its flanking DNA was a recent event. The downstream IS21 is also intact except for the insertion of an intact IS982 in the istB gene. The IS982 element is also present in the NG80-2 chromosome in eight places, and these nine elements share a high level of DNA identity ranging from 90% to 100%. This fact indicates that IS982 inserted into the plasmid in NG80-2.
NG80-2 can facilitate oil recovery when added to oil reservoirs, and this property may be partially attributed to its ability to selectively degrade long-chain alkanes, which leads to reduced viscosity of crude oils (data not shown). We have demonstrated a combined genomic and proteomic approach to predict in silico and confirm in vitro a long-chain alkane degrading pathway. The NG80-2 genome provides an excellent platform for further improvement of this organism for oil bioremediation and other biotechnological applications.
Methods
Genome Sequencing and Assembly.
Genomic DNA, isolated from G. thermodenitrificans NG80-2, was sequenced by using a conventional whole genome random shotgun strategy (33). Plasmid libraries of small inserts (2–3 kb) and large inserts (8–10 kb) generated by mechanical shearing of genomic DNA were constructed in pUC118 (Takara, Dalian, China). Plasmid DNA, for use as template for sequencing, was extracted by the alkaline lysis method from cells grown in 96-well plates, and purified by using Whatman Unifilter plates (Whatman Inc., Clifton, NJ). Double-ended plasmid sequencing reactions were carried out by using ABI BigDye Terminator V3.1 Cycle Sequencing Kit, and the sequencing ladders were resolved on an ABI 3730 Automated DNA Analyzer (Applied Biosystems, Foster City, CA). Approximately 52,000 reads with an average length of 629 bp were generated, providing a 9.1-fold genome coverage. These sequences were assembled into contigs by using the PHRED_PHRAP_CONSED software package (34). Sequence gaps were closed by primer walking on gap-spanning library clones identified based on linking information from forward and reverse reads. The remaining 229 physical gaps were closed by using an improved Multiplex PCR method (35). The sequence was edited manually, and additional PCR and sequencing reactions were done to improve coverage and resolve sequence ambiguities. All repeated DNA regions were verified by PCR amplification across the repeat and sequencing of the product. The final genome is based on 55,727 reads.
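The reported fold coverage follows directly from the read count, the mean read length and the genome size. A minimal sketch of the arithmetic, using the replicon sizes from Table 1 (the function name is illustrative):

```python
def fold_coverage(n_reads, mean_read_length_bp, genome_size_bp):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * mean_read_length_bp / genome_size_bp


# Chromosome plus plasmid, in bp.
genome_size = 3_550_319 + 57_693

# Approximately 52,000 reads of 629 bp on average, as stated in the text.
print(round(fold_coverage(52_000, 629, genome_size), 1))  # prints 9.1
```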
Genome Analysis.
ORFs were identified by using CRITICA (36) and GLIMMER (37), followed by BLASTX searches of the remaining intergenic regions. RBSFINDER (www.tigr.org/software) was used to locate the start codons. Transfer RNAs were predicted by using tRNAscan-SE (38). Ribosomal RNAs were identified by searching the genomic sequences against a set of known rRNAs with BLASTN and were verified by multiple alignments with other known rRNA sequences. The software Artemis (39) was used for whole-genome visualization. Genome annotation was performed by comparing protein sequences with those in the NCBI protein database (www.ncbi.nlm.nih.gov) and the clusters of orthologous groups (COG) database (www.ncbi.nlm.nih.gov/COG). All annotations were inspected manually through searches against the PFAM (40), SMART (41), and PROSITE (42) databases. TMHMM 2.0 (43) was used to identify transmembrane domains, and SignalP 3.0 (44) was used to predict signal peptide regions. Orthologous gene sets were identified by BLASTP reciprocal best hits (RBH) (45). Putative metabolic pathways were analyzed by using MetaCyc (46) and the KEGG database (47).
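The reciprocal best hit criterion can be illustrated with a minimal Python sketch. This is not the pipeline used in this study: it assumes that BLASTP results from each search direction have already been reduced to (query, subject, bit score) tuples, that the best hit is the one with the highest bit score, and that no E-value or alignment-length filter is applied. All function names and protein identifiers are hypothetical.

```python
# Minimal sketch of reciprocal best hits (RBH) between two proteomes, A and B.
# Assumption: each hit is a (query_id, subject_id, bit_score) tuple parsed
# from BLASTP output; "best" means highest bit score.

def best_hits(hits):
    """Map each query to its single highest-scoring subject."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_ab, hits_ba):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(hits_ab)
    best_ba = best_hits(hits_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data with made-up identifiers and scores.
hits_ab = [("A1", "B1", 500.0), ("A1", "B2", 120.0),
           ("A2", "B2", 300.0), ("A3", "B1", 250.0)]
hits_ba = [("B1", "A1", 495.0), ("B2", "A2", 310.0), ("B2", "A1", 110.0)]

orthologs = reciprocal_best_hits(hits_ab, hits_ba)
print(orthologs)
```

In the toy data, A3 finds B1 as its best hit, but B1's best hit is A1, so the A3/B1 pair is not reported as orthologous.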
Other Methods.
Proteomic analysis and the methods used for in vivo and in vitro characterization of ladA are presented in SI Materials and Methods.
Acknowledgments
We thank Dr. Jan B. van Beilen (Institute of Biotechnology, Zurich, Switzerland) for kindly providing Pseudomonas fluorescens KOBΔ1 and plasmid pCom8 and for advice on in vivo functional analysis of long-chain alkane monooxygenase genes. The following persons are acknowledged for their contributions to genome sequencing: Yuan Sui, Yue Li, Chun Zhang, Gang Yuan, Likun Lu, Tao Tang, Hao Yu, Jianxiong Peng, Zhifeng Yang, and Chengtai Liao. This work was supported by Tianjin Municipal Special Fund for Science and Technology Innovation Grant 05FZZDSH00800, National 863 Program of China Grants 2002AA226011 and 2006AA020703, and funds from the Tianjin Municipal Science and Technology Committee (Grants 033105811 and 06YFJZJC02200).
Abbreviation
COG, clusters of orthologous groups.
Footnotes
Conflict of interest statement: L.F., W.W., Y.T., and L.W. have a financial conflict of interest resulting from a patent application for the DNA sequence of the ladA gene.
This article is a PNAS direct submission.
Data deposition: The complete genome sequence of G. thermodenitrificans NG80-2 reported in this paper has been deposited in the GenBank database [accession nos. CP000557 (chromosome) and CP000558 (plasmid)].
This article contains supporting information online at www.pnas.org/cgi/content/full/0609650104/DC1.
References
- 1. Nazina TN, Tourova TP, Poltaraus AB, Novikova EV, Grigoryan AA, Ivanova AE, Lysenko AM, Petrunyaka VV, Osipov GA, Belyaev SS, et al. Int J Syst Evol Microbiol. 2001;51:433–446. doi: 10.1099/00207713-51-2-433.
- 2. McMullan G, Christie JM, Rahman TJ, Banat IM, Ternan NG, Marchant R. Biochem Soc Trans. 2004;32:214–217. doi: 10.1042/bst0320214.
- 3. Rahman TJ, Marchant R, Banat IM. Biochem Soc Trans. 2004;32:209–213. doi: 10.1042/bst0320209.
- 4. Takami H, Takaki Y, Chee GJ, Nishi S, Shimamura S, Suzuki H, Matsui S, Uchiyama I. Nucleic Acids Res. 2004;32:6292–6303. doi: 10.1093/nar/gkh970.
- 5. Manachini PL, Mora D, Nicastro G, Parini C, Stackebrandt E, Pukall R, Fortina MG. Int J Syst Evol Microbiol. 2000;50:1331–1337. doi: 10.1099/00207713-50-3-1331.
- 6. Wang L, Tang Y, Wang S, Liu RL, Liu MZ, Zhang Y, Liang FL, Feng L. Extremophiles. 2006;10:347–356. doi: 10.1007/s00792-006-0505-4.
- 7. van Beilen JB, Panke S, Lucchini S, Franchini AG, Rothlisberger M, Witholt B. Microbiology. 2001;147:1621–1630. doi: 10.1099/00221287-147-6-1621.
- 8. Tani A, Ishige T, Sakai Y, Kato N. J Bacteriol. 2001;183:1819–1823. doi: 10.1128/JB.183.5.1819-1823.2001.
- 9. van Beilen JB, Smits TH, Whyte LG, Schorcht S, Rothlisberger M, Plaggemeier T, Engesser KH, Witholt B. Environ Microbiol. 2002;4:676–682. doi: 10.1046/j.1462-2920.2002.00355.x.
- 10. Whyte LG, Smits TH, Labbe D, Witholt B, Greer CW, van Beilen JB. Appl Environ Microbiol. 2002;68:5933–5942. doi: 10.1128/AEM.68.12.5933-5942.2002.
- 11. Yadav JS, Loper JC. Gene. 1999;226:139–146. doi: 10.1016/s0378-1119(98)00579-4.
- 12. Rodionov DA, Vitreschak AG, Mironov AA, Gelfand MS. J Biol Chem. 2003;278:41148–41159. doi: 10.1074/jbc.M305837200.
- 13. Barth T. Appl Geochem. 1991;6:1–15.
- 14. Head IM, Jones DM, Larter SR. Nature. 2003;426:344–352. doi: 10.1038/nature02134.
- 15. Merrick MJ, Edwards RA. Microbiol Rev. 1995;59:604–622. doi: 10.1128/mr.59.4.604-622.1995.
- 16. Kanamori T, Kanou N, Atomi H, Imanaka T. J Bacteriol. 2004;186:2532–2539. doi: 10.1128/JB.186.9.2532-2539.2004.
- 17. Whelan JK, Kennicutt MC, Brooks JM, Schumacher D, Eglington LB. Org Geochem. 1994;22:587–615.
- 18. Pereira MM, Bandeiras TM, Fernandes AS, Lemos RS, Melo AM, Teixeira M. J Bioenerg Biomembr. 2004;36:93–105. doi: 10.1023/b:jobb.0000019601.74394.67.
- 19. D'Mello R, Hill S, Poole RK. Microbiology. 1996;142:755–763. doi: 10.1099/00221287-142-4-755.
- 20. Philippot L. Biochim Biophys Acta. 2002;1577:355–376. doi: 10.1016/s0167-4781(02)00420-7.
- 21. Simon J, Einsle O, Kroneck PM, Zumft WG. FEBS Lett. 2004;569:7–12. doi: 10.1016/j.febslet.2004.05.060.
- 22. Eschbach M, Schreiber K, Trunk K, Buer J, Jahn D, Schobert M. J Bacteriol. 2004;186:4596–4604. doi: 10.1128/JB.186.14.4596-4604.2004.
- 23. Kletzin A, Adams MW. J Bacteriol. 1996;178:248–257. doi: 10.1128/jb.178.1.248-257.1996.
- 24. Galperin MY. Environ Microbiol. 2004;6:552–567. doi: 10.1111/j.1462-2920.2004.00633.x.
- 25. Yura T, Nakahigashi K. Curr Opin Microbiol. 1999;2:153–158. doi: 10.1016/S1369-5274(99)80027-7.
- 26. Makarova KS, Koonin EV. Genome Biol. 2003;4:115. doi: 10.1186/gb-2003-4-8-115.
- 27. Daniel RM, Cowan DA. Cell Mol Life Sci. 2000;57:250–264. doi: 10.1007/PL00000688.
- 28. van Beilen JB, Li Z, Duetz WA, Smits THM, Witholt B. Oil Gas Sci Technol. 2003;58:427–440.
- 29. Ishii Y, Konishi J, Okada H, Hirasawa K, Onaka T, Suzuki M. Biochem Biophys Res Commun. 2000;270:81–88. doi: 10.1006/bbrc.2000.2370.
- 30. Smits TH, Balada SB, Witholt B, van Beilen JB. J Bacteriol. 2002;184:1733–1742. doi: 10.1128/JB.184.6.1733-1742.2002.
- 31. van Beilen JB, Funhoff EG. Curr Opin Biotechnol. 2005;16:308–314. doi: 10.1016/j.copbio.2005.04.005.
- 32. Mahillon J, Chandler M. Microbiol Mol Biol Rev. 1998;62:725–774. doi: 10.1128/mmbr.62.3.725-774.1998.
- 33. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al. Science. 1995;269:496–512. doi: 10.1126/science.7542800.
- 34. Gordon D, Abajian C, Green P. Genome Res. 1998;8:195–202. doi: 10.1101/gr.8.3.195.
- 35. Tettelin H, Radune D, Kasif S, Khouri H, Salzberg SL. Genomics. 1999;62:500–507. doi: 10.1006/geno.1999.6048.
- 36. Badger JH, Olsen GJ. Mol Biol Evol. 1999;16:512–524. doi: 10.1093/oxfordjournals.molbev.a026133.
- 37. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. Nucleic Acids Res. 1999;27:4636–4641. doi: 10.1093/nar/27.23.4636.
- 38. Lowe TM, Eddy SR. Nucleic Acids Res. 1997;25:955–964. doi: 10.1093/nar/25.5.955.
- 39. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B. Bioinformatics. 2000;16:944–945. doi: 10.1093/bioinformatics/16.10.944.
- 40. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, et al. Nucleic Acids Res. 2006;34:D247–251. doi: 10.1093/nar/gkj149.
- 41. Letunic I, Copley RR, Pils B, Pinkert S, Schultz J, Bork P. Nucleic Acids Res. 2006;34:D257–260. doi: 10.1093/nar/gkj079.
- 42. Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ. Nucleic Acids Res. 2006;34:D227–230. doi: 10.1093/nar/gkj063.
- 43. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. J Mol Biol. 2001;305:567–580. doi: 10.1006/jmbi.2000.4315.
- 44. Bendtsen JD, Nielsen H, von Heijne G, Brunak S. J Mol Biol. 2004;340:783–795. doi: 10.1016/j.jmb.2004.05.028.
- 45. Hirsh AE, Fraser HB. Nature. 2001;411:1046–1049. doi: 10.1038/35082561.
- 46. Caspi R, Foerster H, Fulcher CA, Hopkinson R, Ingraham J, Kaipa P, Krummenacker M, Paley S, Pick J, Rhee SY, et al. Nucleic Acids Res. 2006;34:D511–516. doi: 10.1093/nar/gkj128.
- 47. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. Nucleic Acids Res. 2004;32:D277–280. doi: 10.1093/nar/gkh063.