Journal of Bacteriology. 2010 Jul 30;192(21):5788–5798. doi: 10.1128/JB.00425-10

Unexpected Abundance of Coenzyme F420-Dependent Enzymes in Mycobacterium tuberculosis and Other Actinobacteria

Jeremy D Selengut, Daniel H Haft
PMCID: PMC2953692  PMID: 20675471

Abstract

Regimens targeting Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), require long courses of treatment and a combination of three or more drugs. An increase in drug-resistant strains of M. tuberculosis demonstrates the need for additional TB-specific drugs. A notable feature of M. tuberculosis is coenzyme F420, which is distributed sporadically and sparsely among prokaryotes. This distribution allows for comparative genomics-based investigations. Phylogenetic profiling (comparison of differential gene content) based on F420 biosynthesis nominated many actinobacterial proteins as candidate F420-dependent enzymes. Three such families dominated the results: the luciferase-like monooxygenase (LLM), pyridoxamine 5′-phosphate oxidase (PPOX), and deazaflavin-dependent nitroreductase (DDN) families. The DDN family was determined to be limited to F420-producing species. The LLM and PPOX families were observed in F420-producing species as well as species lacking F420 but were particularly numerous in many actinobacterial species, including M. tuberculosis. Partitioning the LLM and PPOX families based on an organism's ability to make F420 allowed the application of the SIMBAL (sites inferred by metabolic background assertion labeling) profiling method to identify F420-correlated subsequences. These regions were found to correspond to flavonoid cofactor binding sites. Significantly, these results showed that M. tuberculosis carries at least 28 separate F420-dependent enzymes, most of unknown function, and a paucity of flavin mononucleotide (FMN)-dependent proteins in these families. While prevalent in mycobacteria, markers of F420 biosynthesis appeared to be absent from the normal human gut flora. These findings suggest that M. tuberculosis relies heavily on coenzyme F420 for its redox reactions. This dependence and the cofactor's rarity may make F420-related proteins promising drug targets.


Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), is an actinobacterium that presents a number of clinical challenges. For example, due to the high frequency of drug-resistant mutants, TB antibiotic regimens require long courses of treatment and a combination of three or more separate drugs (37). Long courses of combination therapy contribute to noncompliance, which in turn has led to an increase in the occurrence of multiple-drug-resistant (MDR) and extensively drug-resistant (XDR) tuberculosis (39). There is a clear need for additional tuberculosis-specific drugs that, in combination with the current pharmacopeia, can shorten the course of treatment and increase its effectiveness.

Biological features that are present in mycobacteria but rare or absent in other organisms are useful targets for treating TB. For example, mycobacteria have “mycolic” fatty acids present in their cell walls that distinguish them from all other bacteria. Four major anti-TB drugs (isoniazid, cycloserine, ethambutol, and ethionamide) are known to target enzymes involved in the biosynthesis of the mycobacterial cell wall, and others, such as pyrazinamide, caprazamycin, and caprolactams, may do so as well (34, 43).

Similarly, the enzyme cofactor F420 (Fig. 1), a deazaflavin analog of flavin mononucleotide (FMN), is absent from humans but distributed sporadically and sparsely among prokaryotes and observed universally in the mycobacteria (including being encoded by the reduced genome of Mycobacterium leprae). It has been suggested that the reduced F420 (F420H2) produced by the action of the F420-dependent glucose-6-phosphate dehydrogenase (4) under aerobic conditions may protect mycobacterial cells from macrophage-generated NO2 (31). Moreover, Rv3547 from M. tuberculosis uses reduced F420 in the activation of the NO2-containing antitubercular drug candidate PA-824 (40). Overall, F420 may confer an advantage to mycobacteria in anaerobic environments because it has a lower redox potential than NADP (5).

FIG. 1.

Flavonoid cofactor structures. (A) FMN. (B) Coenzyme F420. Note that coenzyme F420 typically contains 5 to 7 side chain glutamate residues in mycobacterial species (3).

The sporadic phylogenetic distribution of F420 provides an opportunity for the application of comparative genomic methods. We introduced partial phylogenetic profiling (PPP) to efficiently discover protein families codistributed with such patterns of biological traits (21). Unlike earlier profiling methods, PPP does not require the prior accurate determination of protein families for success. This method is well suited to the identification of F420-dependent enzyme families, which may have distributions only partially spanning the entire profile. PPP analysis is further augmented by SIMBAL (sites inferred by metabolic background assertion labeling) (36). This technique can pinpoint sites discriminating F420 binding from FMN binding and subsequently identify additional correlated genes that are undetectable by PPP.

Here we demonstrate how comparative genomics, namely, profiling, can strongly associate sets of genes in a particular genome of interest with a biologically important trait, generating numerous experimentally testable hypotheses. This analysis has indicated a pervasive and presumably important feature of M. tuberculosis and its lifestyle. Because F420-based reactions are absent from humans and their associated gut flora but prevalent in M. tuberculosis, F420-related proteins may provide additional drug targets.

MATERIALS AND METHODS

Data sets.

A total of 1,451 bacterial and archaeal complete and draft genomes were downloaded from the NCBI on 1 June 2009. An all-versus-all BLAST calculation was performed on the protein coding sequences from all of these genomes, and the E value results were stored as a flat file.

Construction of a coenzyme F420 utilization phylogenetic profile.

We applied hidden Markov models (HMMs) for coenzyme F420 biosynthetic genes (Table 1), using the HMMER 2.0 package (biosequence analysis using profile hidden Markov models [http://hmmer.janelia.org]), to the genomic data set (described above) and identified those genomes where genes for at least four of five F420 biosynthesis components were detected. Genomes containing these F420 biosynthetic genes were placed in the positive branch of the profile. Those containing no detectable components were placed in the negative branch. Organisms with intermediate content were assigned as follows. A set of cyanobacterial genomes encoded only the Fo synthase subunits (CofGH) and were placed in the negative set, since they are known to make only Fo, which is used as a distinct cofactor (14). A separate set of genomes contained CofCDE genes but lacked the Fo synthase subunit gene (e.g., Xylanimonas cellulosilytica DSM 15894). PPP analyses (data not shown, but essentially the same as the results shown in Table S2 in the supplemental material) indicated that these genomes encode the same families of F420-dependent enzymes as those with the complete biosynthesis pathway. This suggests that either a separate, nonhomologous Fo synthase or an Fo importer system exists in these organisms. Although we could not identify candidates for these potential functions, these genomes were included in the positive branch of the profile. A final group of genomes contained only close homologs of the terminal enzyme CofE (e.g., Jonesia denitrificans DSM 20603). As described above, these genomes carried genes for putative F420 binding enzymes (sometimes adjacent to cofE, as in sequence EEN39210.1 from Cellulomonas flavigena DSM 20109) as well as an ABC-type transporter gene that is often found adjacent to the cofE gene (in J. denitrificans sequences EEJ12725.1, -6.1, and -7.1) and is therefore a candidate Fo transporter gene. Accordingly, these genomes were also counted among the F420-positive set.
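The assignment rules above can be summarized as a small decision function. The following Python sketch is illustrative only: the function name `assign_branch`, the set-based input, and the `None` return value for component combinations that the text does not describe are assumptions of this sketch, not part of the published pipeline. The HMM searches themselves are taken as given.

```python
from typing import Optional, Set

FO_SYNTHASE = {"CofG", "CofH"}  # fused into one protein (FbiC) in mycobacteria
ALL_COMPONENTS = {"CofC", "CofD", "CofE", "CofG", "CofH"}


def assign_branch(detected: Set[str]) -> Optional[int]:
    """Return 1 (positive branch), 0 (negative branch), or None.

    `detected` holds the biosynthesis components whose HMMs scored a hit
    in one genome. None marks combinations the text does not cover
    (an assumption of this sketch).
    """
    unknown = detected - ALL_COMPONENTS
    if unknown:
        raise ValueError(f"unexpected component names: {sorted(unknown)}")
    if len(detected) >= 4:  # at least four of five components detected
        return 1
    if not detected:  # no detectable components
        return 0
    if detected == FO_SYNTHASE:  # Fo synthase only (cyanobacterial case)
        return 0
    if detected == {"CofC", "CofD", "CofE"}:  # Fo synthase not detected
        return 1
    if detected == {"CofE"}:  # terminal enzyme only
        return 1
    return None  # not described in the text
```

For example, a genome with hits to CofC, CofD, and CofE only would be assigned to the positive branch, whereas one with hits to CofG and CofH only would be assigned to the negative branch.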

TABLE 1.

Components of the coenzyme F420 biosynthetic pathway and their TIGRFAMs HMMs

Component   Function                                   Model       Reference
CofC        2-Phospho-l-lactate guanylyltransferase    TIGR03552   18
CofD        LPPG:FO 2-phospho-l-lactate transferase    TIGR01819   15
CofE        Gamma-glutamyl ligase                      TIGR01916   25
CofG        Fo synthase subunit                        TIGR03550   19
CofH        Fo synthase subunit                        TIGR03551   17

PPP.

Phylogenetic profiling (PP) methods allow the discovery of biological features (usually protein families) codistributed with traits observed in defined sets of organisms. PP methods often are limited by a dependence on protein families of a fixed size, whether preconstructed or generated at run time by use of static parameters. Such preconstructed families are often too large or too small for the comparison at hand. PPP differs from other profiling methods in its independence from precalculated protein clusters. In PPP, clusters of increasing size are generated on the fly for each query protein by selecting increasingly permissive sequence similarity cutoffs. These clusters are then compared with the reference profile under study; PPP returns both an optimized score and the correspondingly optimized protein cluster. The statistical procedures utilized by PPP may be tuned to enable the identification of proteins that have distributions that are strict subsets of a reference profile, as the distributions of F420-dependent enzymes are expected to be when compared to an F420 utilization profile.

Here we identified F420-utilizing enzymes in mycobacteria based on the observed pattern (profile) of the F420 biosynthesis trait. A phylogenetic profile was determined from the above results, with genomes in the positive branch represented by 1's and those in the negative branch represented by 0's. Profiling was carried out as previously reported (21). For each gene in the genome of interest, a ranked list of all BLAST matches was prepared. Processing from the strongest hit downwards, the genomic source of each hit was examined. The genome of interest itself and all but the first hit to other genomes were ignored. Additionally, a taxonomic filter was applied to limit sample bias caused by a superabundance of genomes from certain species in the data set: once a hit from a certain species was found, all other hits to that species were ignored. As each new genome was identified (whether marked as a “1” or a “0” in the profile), the likelihood, based on the proportion (P) of 1's in the profile (or a manually set value), that the total observed number of 1's thus far might have occurred by chance was calculated using the binomial equation. As discussed above, P was set to 0.3 for this study, to accentuate families whose members form strict subsets of the positive branch. As the list was processed, the lowest likelihood (most significant) score achieved was recorded for each gene in the genome. After all genes in the genome were processed, those having achieved the most significant overall scores were reported in rank order.
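The scoring loop described above can be sketched in Python as follows. This is a minimal illustration rather than the published implementation: the names `ppp_score` and `binomial_tail`, the dictionary inputs, and in particular the use of the upper binomial tail P(X >= k) as "the binomial equation" are assumptions, since the text does not specify the exact form of the calculation.

```python
from math import comb
from typing import Dict, List, Tuple


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def ppp_score(
    ranked_hits: List[str],      # genome of each BLAST hit, strongest first
    profile: Dict[str, int],     # genome -> 1 (positive) or 0 (negative)
    species_of: Dict[str, str],  # genome -> species (taxonomic filter)
    query_genome: str,
    p: float = 0.3,              # value used in this study
) -> Tuple[float, int]:
    """Return (lowest likelihood, number of genomes in that cluster)."""
    seen_genomes = set()
    seen_species = set()
    ones = total = 0
    best_score, best_size = 1.0, 0
    for genome in ranked_hits:
        # Ignore the query genome and all but the first hit to each genome.
        if genome == query_genome or genome in seen_genomes:
            continue
        # Taxonomic filter: keep only the first hit from each species.
        if species_of[genome] in seen_species:
            continue
        seen_genomes.add(genome)
        seen_species.add(species_of[genome])
        total += 1
        ones += profile[genome]
        score = binomial_tail(ones, total, p)
        if score < best_score:  # lower likelihood = more significant
            best_score, best_size = score, total
    return best_score, best_size
```

Running `ppp_score` once per gene in the genome of interest and sorting the genes by the returned likelihood would give a ranked list of the kind described in the text.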

Protein family HMM building.

Sequences suggested by PPP to be linked to F420 metabolism were considered for construction of defining protein families based on full-length multiple sequence alignments. Those proteins not directly associated with known steps in F420 biosynthesis belonged mostly to the luciferase-like monooxygenase (LLM), pyridoxamine 5′-phosphate oxidase (PPOX), and deazaflavin-dependent nitroreductase (DDN) domain families, each of which is known to contain members binding at least one type of flavonoid cofactor. Sets of candidate orthologs occurring in multiple F420-biosynthesizing species, as identified by bidirectional best-hit relationships and essentially full-length homology, were aligned first by Muscle (13), manually inspected, and then trimmed and realigned as necessary to produce seed alignments for HMM construction. Criteria such as the proper alignment of known conserved motifs, stability of alignments to realignment after trimming or after the addition or removal of sequences, and the absence of especially long branch lengths in computed phylogenetic trees were assessed manually to judge alignment quality and to select the most accurately constructed alignments. If Muscle alignments were deemed suspect, alignment using Clustal W (24) was attempted. Deep separation of clades in neighbor-joining (NJ) trees, differences in domain architecture, and sharp drops in sequence similarity scores were taken as indicators of splits between distinct subfamilies to guide HMM construction. Cutoff scores were set for the resulting HMMs to select only full-length homologs from F420-synthesizing species. Families with no more than one hit per genome were designated putative equivalogs, that is, were hypothesized to share a specific function even though that function is unknown. HMMs identifying multiple proteins per genome were designated instead as subfamily models. 
All models described in this work were included in TIGRFAMs (35), release 9.0 (http://www.jcvi.org/cms/research/projects/tigrfams/).
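The distinction between putative equivalog and subfamily models rests on a simple per-genome count of above-cutoff hits. The short Python sketch below illustrates that rule; the function name `designate_model` and its input format are hypothetical and serve only as an example.

```python
from collections import Counter
from typing import Iterable


def designate_model(hit_genomes: Iterable[str]) -> str:
    """Classify an HMM from the genomes of its above-cutoff hits.

    `hit_genomes` contains one genome identifier per hit, so a genome
    appears more than once if the model finds several proteins in it.
    """
    counts = Counter(hit_genomes)
    if not counts:
        raise ValueError("model has no hits above its cutoff")
    # No more than one hit per genome -> putative equivalog.
    if max(counts.values()) <= 1:
        return "putative equivalog"
    return "subfamily"
```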

SIMBAL.

SIMBAL can be used to gain insight into the molecular mechanisms underlying the associations between proteins and traits that are discovered by PPP. In this study, families that include both FMN-binding and F420-binding members may illuminate molecular details of cofactor binding sites and provide convenient classifiers for the identification of F420-dependent enzymes. True and false training sets were constructed by partitioning members of the LLM (Pfam accession no. PF00296) and PPOX (Pfam accession no. PF01243) families based on their genomes of origin, using the same profile as that for PPP. This method generates a noisy true set containing a population of false-positive results, i.e., FMN-binding LLM or PPOX proteins present in F420-producing organisms (since FMN is universal). In the case of the LLM study, a cleaner true set was generated by collecting all members of the presumptive F420-specific families modeled by the HMMs in Table 2.

TABLE 2.

Phylogenetic distribution analysis of subfamilies within the PF00296 LLM family

Family Distribution among F420-producing taxa^a M. tuberculosis member(s) (no. found in M. smegmatis) Member(s) identified by PPP (no. found in M. smegmatis)
Likely F420-dependent families
    TIGR03554 (FGD1) Ac Rv0407 (1) Rv0407 (1)
    TIGR03559 Ac, Ch, Pr Rv3520c (4) Rv3520c (4)
    TIGR03560 Ac, Ch, Pr, Ar Rv1855c (6) Rv1855c (6)
    TIGR03564 Ac (3) (3)
    TIGR03617 Ac, Ch, Pr Rv1360 (3) Rv1360 (3)
    TIGR03619 Ac, Ch, Pr, Ba Rv0791c, Rv0940c, Rv0953c, Rv2161c, Rv3079c (13) Rv2161c, Rv3079c (8)
    TIGR03620 Ac, Pr Rv3463 (5) (2)
    TIGR03621 Ac Rv2893 (3) Rv2893 (3)
    TIGR03841 Ac Rv3093 (0)
    TIGR03842 Ac, Pr (2) (2)
    TIGR03854 Ac (1) (1)
    TIGR03856 Ac, Ch Rv0044c (3) Rv0044c (3)
    TIGR03857 Ac, Pr (1) (1)
Others Rv0132c, Rv2951c, Rv1936, Rv3618 (21) Rv0132c, Rv2951c (7)
Likely FMN-dependent families
    TIGR03558 (1) (0)
    TIGR03860 (18) (0)
^a Ac, Actinobacteria; Ch, Chloroflexi; Pr, Proteobacteria; Ba, Bacteroides; Ar, Archaea.

For each query protein, subsequences of the indicated lengths were generated by scanning appropriately sized windows over the entire sequence. SIMBAL was carried out as previously published (36). Each subsequence was used as a BLAST query versus a combined database of the true and false training sets. In a manner analogous to that for PPP, subsequences were scored based on the preponderance of hits to the true partition, using the binomial equation. SIMBAL scores are reported as log likelihood values. Longer subsequences may include multiple short regions of correlation and thus will tend to have more significant scores. Locally important subsequence regions will stand out above this background and may even outscore the full-length sequence. This increasing background can be removed, and the localization of the SIMBAL signal accentuated, by dividing the SIMBAL score by the window length.
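The window scan and the length normalization can be sketched as follows. This Python example is a hedged illustration, not the published SIMBAL code: the function names, the use of the upper binomial tail, the choice of -log10 as the sign convention for the log likelihood, and the `p_true` parameter (taken here as the fraction of the combined database that belongs to the true set) are all assumptions. The BLAST search itself is replaced by a ranked list of true/false labels.

```python
from math import comb, log10
from typing import Iterator, List, Tuple


def windows(sequence: str, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for every window of the given length."""
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def simbal_score(hit_in_true_set: List[bool], p_true: float) -> float:
    """Best -log10 likelihood over all prefixes of the ranked hit list.

    hit_in_true_set[i] is True if the i-th strongest BLAST hit of the
    subsequence belongs to the true training set.
    """
    best = 0.0
    ones = 0
    for n, is_true in enumerate(hit_in_true_set, start=1):
        ones += is_true
        best = max(best, -log10(binomial_tail(ones, n, p_true)))
    return best


def normalized_score(score: float, window_length: int) -> float:
    """Divide by window length to remove the length-dependent background."""
    return score / window_length
```

Scoring every window of every length and plotting `normalized_score` against window position and length would give a triangle plot in which short, strongly correlated regions stand out.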

RESULTS

Determination of the set of prokaryotic coenzyme F420 utilizers.

In order to apply profiling methods to the study of F420 in mycobacteria, we constructed a profile of F420 utilization over all bacterial genomes. The most straightforward marker of F420 utilization is the biosynthesis of F420 itself. The pathway for F420 biosynthesis has been elucidated and proceeds from two compounds, 5-amino-6-ribitylamino-2,4(1H,3H)-pyrimidinedione and 4-hydroxyphenylpyruvate. These compounds are intermediates in the biosyntheses of FMN and tyrosine, respectively, and are condensed by Fo synthase: two subunits, designated CofG and CofH, are often observed, but these are fused into one protein (FbiC) in mycobacteria (19). The mature cofactor is subsequently produced by two enzymes, CofD (FbiA) and CofE (FbiB). These attach a phospho-l-lactate group and a variable number of glutamate residues (generally five in the case of actinobacteria). The activated precursor of the phospho-l-lactate group, lactyl-2-diphospho-5′-guanosine (LPPG), is made by the CofC protein.

To accurately detect these genes in genomes, we utilized HMMs. Equivalogs are protein families with conserved molecular function since their last common ancestor (20). Equivalog HMMs have been built for each F420 biosynthetic enzyme and are included within the TIGRFAMs library (35). These in turn have been combined to form a Genome Property (35) for F420 biosynthesis which can be used to conveniently identify the presence of this set of genes in any prokaryotic genome (Table 1).

We applied these HMMs to a set of all 1,451 bacterial and archaeal genomes available (at the time of this work) from the NCBI in order to determine which contained the essential components of the F420 biosynthesis trait. Details of this procedure are presented in Materials and Methods. We identified 11% of all species in our sample as F420 producers (including about 50% of all actinobacterial species [see Table S1 in the supplemental material]).

Identification of candidate F420-associated proteins by phylogenetic profiling.

In its default mode, PPP sets the probability value equal to the proportion of genomes in the positive branch of the profile (here, 11%), which optimizes the results for proteins present in all F420-producing species. Here we were interested in families of F420-dependent enzymes which may be present only in a subset of the positive branch but only very rarely, if at all, in the negative branch. This makes sense for particular F420-dependent enzymes, whose functions might not be required in every F420-producing organism. Forcing the probability variable to an arbitrary value of >11% had the effect of penalizing hits to the negative branch more severely, so that higher scores were obtained for families strictly limited to subsets of the positive branch of the profile, even if those families are far from universal across those genomes.
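The effect of raising the probability value can be illustrated with a toy comparison. The Python sketch below assumes that the score is an upper binomial tail, and the two cluster compositions (10 positive-branch genomes out of 10, and 15 out of 20) are hypothetical numbers chosen for illustration, not data from this study.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); lower means more significant."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def favoured_cluster(p: float) -> str:
    """Compare two hypothetical clusters at probability value p."""
    strict = binomial_tail(10, 10, p)  # small cluster, no negative-branch hits
    broad = binomial_tail(15, 20, p)   # larger cluster, five negative-branch hits
    return "strict subset" if strict < broad else "broad cluster"


for p in (0.11, 0.30):
    print(f"P = {p:.2f}: {favoured_cluster(p)} scores as more significant")
```

Under these assumptions, the larger cluster with five negative-branch hits is the more significant of the two at P = 0.11, whereas the strict subset is the more significant at P = 0.30, which matches the qualitative behavior described above.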

We first applied the PPP algorithm, using the above profile and a P value of 30%, to the genome of Mycobacterium smegmatis MC2 155. Aside from the biosynthetic genes for CofC (MSMEG_2393), CofD (fbiA; MSMEG_1830), CofE (fbiB; MSMEG_1829), and the fusion protein CofGH (fbiC; MSMEG_5126), 62 of the top 63 hits are members of only three homology families (see Table S2 in the supplemental material), each of which is known to include flavin cofactor binding proteins. Primary among these, with 44 prominent hits and the widest species distribution, was the LLM (Pfam model PF00296) family. (This model was recently updated to version 13 in release 24.0 of Pfam, correcting serious deficiencies of sensitivity; it requires version 3.0 of HMMER to run, a version of which is available for download from http://hmmer.janelia.org/.) Several LLM family proteins, particularly those found in archaea, are known to be F420-dependent enzymes (1, 2, 4), and one has been characterized as requiring FAD (8), but most, including luciferase itself, utilize FMN (6, 22, 26, 41). Second most prominent among the PPP results, with nine hits, was the DDN family (represented by TIGR00026 and InterPro accession no. IPR004378). The only characterized member of this family is F420 dependent (40). Third, with five prominent members, was the PPOX family (represented by Pfam model PF01243). There are no known F420-dependent PPOX family members, although several FMN-dependent enzymes are known: PPOX (also called PdxH [11]), an FMN binding protein of unknown function (23), and PhzG, an enzyme involved in the biosynthesis of phenazine (28). Tellingly, a crystal structure of the M. tuberculosis Rv1155 protein, a member of the PPOX family, was noted to have a much altered flavin binding site, consistent with its apparent lack of FMN binding; it was hypothesized that a novel binding capability evolved in this and related enzymes (7).

The DDN family is observed exclusively in F420-producing species. The other two families showed an excess in the number of genes found per genome for F420 producers among actinobacteria (Fig. 2) and in general. These data suggest that in these families and in these genomes, a notable expansion has occurred. F420, with its lower reduction potential, gives access to an increased range of chemical transformations; presumably, the large number of family members in some genomes indicates a diverse group of available reactions. These three families account for 32 genes in M. tuberculosis and 123 genes in M. smegmatis. Although PPP was able to identify certain members of the LLM and PPOX families as F420 correlated, it remains to be determined how many and which of the many uncharacterized members of these families are F420 binding.

FIG. 2.

Average numbers of putative F420/FMN-binding protein family genes in actinobacterial species. The presence of F420 biosynthesis components is correlated with large expansions of these families.

LLM family.

Although PPP is clearly a very efficient method for determining candidates for association with a profile, it can both fail to identify all such related proteins and indicate others that are correlated for indirect or fortuitous reasons. PPP relies on the ordering (not the strength) of BLAST hits and generally ignores the relatedness of species in the BLAST list to the query genome. We attempted to explore the preponderance of LLM homologs among the top hits by PPP for the F420 biosynthesis profile by producing estimated phylogenetic trees. Sequence diversity within the LLM family is so great that it is not clear that a multiple sequence alignment of the entire family is sufficiently trustworthy to estimate a molecular phylogenetic tree. Nevertheless, such a tree shows numerous clades of sequences limited to F420-producing species (see Fig. S1 in the supplemental material). This tree also shows nearly all of the F420-limited clades to be descendant from a single ancestor (with possibly two instances of reversion), but despite the parsimony and appeal of this interpretation, we do not believe the tree in and of itself represents strong evidence in support of that model. Consequently, we constructed alignments from smaller sets of sequences, including each of the LLM proteins identified in M. smegmatis and M. tuberculosis, particularly those identified by PPP. Sets for alignment were obtained by virtue of pairwise BLAST homology to these Mycobacterium LLM proteins. In this way, we identified 13 subfamilies (clades) that are each distributed more widely than the genus Mycobacterium and are observed only in F420-producing organisms. Each of these has been modeled separately by an HMM included within the TIGRFAMs library (Table 2) (35). Together, these models identified 48 F420-dependent members of the total of 86 LLM family genes in M. smegmatis and 13 of 17 LLM family genes in M. tuberculosis. 
Included in this set of families is one, TIGR03554, encompassing the characterized F420-dependent glucose-6-phosphate dehydrogenase (29, 30). (Note that this clade is in turn part of a larger “subfamily” [of LLM proteins] that is modeled by TIGR03557. It includes a number of other clades, all but one of which are limited to F420 producers. This more broadly distributed clade, modeled by TIGR03885, appears to have reverted to FMN binding or otherwise changed its cofactor specificity and is not analyzed further here.)

Where the patterns made by these families over all F420-producing organisms are sporadic or punctate in nature, they are indicative of either lateral gene transfer or widespread gene loss. For instance, TIGR03559 represents an LLM enzyme of unknown function that is found in 90% of all F420-producing actinobacterial species, once in the marine gammaproteobacterium HTCC2143, and twice in the alphaproteobacterium Phenylobacterium zucineum HLK1. Either these genes (and the associated F420 biosynthesis genes) were selectively retained while being lost from the vast majority of lineages derived from the last common ancestor of the actinobacteria and proteobacteria, or they were laterally transferred to the two proteobacterial species from some other F420-producing (likely actinobacterial) strain. Indeed, in P. zucineum, these two LLM family genes encode the only candidate F420-dependent enzymes, and they are observed in an operon with the F420 biosynthesis genes. It is clear that in the rare instance where families contain members with such clear evidence of en bloc lateral gene transfer of both enzyme and F420 biosynthetic machinery genes, the connection to F420 can be regarded as particularly strong.

At the other end of the spectrum are LLM family members that are not identified by PPP and that belong to clades that include organisms (both within and outside the Actinobacteria) that do not produce F420. It is reasonable to suppose that these proteins are not F420 related. Several of these clades contain members that have been characterized as FMN-utilizing enzymes, such as bacterial luciferase itself (16) and the FMN-dependent nitrilotriacetate monooxygenase (44). An additional two models have been built to represent clades with members in the Actinobacteria that include many non-F420-utilizing species and identify 19 additional M. smegmatis LLM genes (Table 2). Interestingly, no genes of this type are observed in M. tuberculosis, perhaps indicating a greater specialization toward F420-dependent enzymes in M. tuberculosis than in M. smegmatis.

In the middle ground are the genes of most interest for the purposes of antimycobacterial drug design, i.e., those whose phylogenetic distribution is limited to Mycobacterium or only a few closely related species, including Mycobacterium spp. For such genes, PPP may give a strong score if the closest relatives are F420 dependent, even if that relationship is distant and even if they had undergone a switch to utilization of a different cofactor. Alternatively, the degree of sequence divergence may be so great as to obscure the PPP detection of correlation for an authentic F420-dependent gene. There are 19 such genes in M. smegmatis and 4 in M. tuberculosis (Table 2).

We previously observed that sequence similarity correlations may be localized to discrete short sequence regions corresponding to functional sites related to the physical nature of a profile and developed a method, SIMBAL, to identify such subsequences (36). In the current case, we can reasonably expect to identify sequence motifs correlated with the binding of the F420 cofactor in particular that are distinct from motifs for the binding of FMN or other flavonoids. In order to apply the SIMBAL method, the LLM family was partitioned such that all genes from non-F420-utilizing genomes were placed in the negative branch and all members of the F420 producer-restricted families (Table 2) were in the positive branch of the partition. To add additional rigor and to remove family-specific signals, when SIMBAL was applied to genes from each of these families, that particular family was removed from the positive branch set.
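The partition with the query's own family held out can be sketched as below. The record type `LlmProtein`, its field names, and the function `build_training_sets` are hypothetical names introduced for this illustration. Note that LLM proteins from F420-producing genomes that fall outside the F420 producer-restricted families end up in neither set, as the description above implies.

```python
from typing import List, NamedTuple, Optional, Tuple


class LlmProtein(NamedTuple):
    protein_id: str
    from_f420_producer: bool    # genome of origin is in the positive branch
    f420_family: Optional[str]  # F420 producer-restricted family, if any


def build_training_sets(
    proteins: List[LlmProtein], query_family: Optional[str]
) -> Tuple[List[str], List[str]]:
    """Return (true_set, false_set) of protein identifiers for SIMBAL.

    The query's own family is left out of the true set so that
    family-specific signals do not dominate the result.
    """
    true_set = [
        p.protein_id
        for p in proteins
        if p.f420_family is not None and p.f420_family != query_family
    ]
    false_set = [p.protein_id for p in proteins if not p.from_f420_producer]
    return true_set, false_set
```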

The F420-dependent glucose-6-phosphate dehydrogenase (FGD1) from M. tuberculosis has been crystallized with F420 bound, and its structure has been reported (4). SIMBAL applied to the sequence of this enzyme yielded a result that is typical of genes in these F420-restricted families (Fig. 3). Each of the subsequences corresponding to the hot spots in this plot either includes residues in the structure which make direct contact with the cofactor or is directly adjacent to the short resolved part of the polyglutamate side chain in an extended surface cleft that is a likely binding site for the full-length polyglutamate (Fig. 4A). Twenty of 24 residues that make up the cofactor binding pocket in this structure were identified by SIMBAL, and 19 of these were centrally located in those subsequence regions. The four residues that were missed were located at the cofactor-substrate interface. The putative polyglutamate binding cleft is lined by a number of lysine and arginine residues and lacks negatively charged residues, as would be expected of a region binding a polyanion (Fig. 4B). This structure also includes a molecule of citrate bound in the substrate binding pocket. It is notable that 13 of 15 residues making up the substrate binding pocket are outside the regions identified by SIMBAL. Very similar results were obtained when SIMBAL was applied to the archaeal F420-dependent LLM enzymes Mer and Adf (1, 2).

FIG. 3.

SIMBAL analysis of the M. tuberculosis FGD1 gene (Rv0407) versus a partition of the LLM family (PF00296) based on the ability (positive branch) of source genomes to produce cofactor F420. Closely related homologs of the TIGR03557 family were removed from the positive training set to accentuate features common to all F420-dependent LLM family sequences. (Top left) Raw SIMBAL data (log likelihood scores). (Top right) Normalized data are represented as SIMBAL scores divided by the sequence window length in order to identify prominent localized regions (colored circles). (Bottom) Sequence of Rv0407. High-scoring subsequence regions are indicated in colors corresponding to the circles in the top right panel. Underlined residues make contacts (≤3.5 Å) with the F420 cofactor in the crystal structure under Protein Data Bank (PDB) accession no. 3B4Y (4), starred residues make contacts with the citrate molecule bound in the putative substrate cavity, and dotted residues contribute to a positively charged patch in an extended surface cleft adjacent to the end of the resolved part of the polyglutamate cofactor side chain (see Fig. 4B).

FIG. 4.

(A) SIMBAL-identified residues making up the F420-binding surface of M. tuberculosis FGD1 (PDB accession no. 3B4Y [4]). Peak 1, SDH, is in contact with the carboxylate oxygens of the deazaflavin terminal ring (cyan). Peak 2, SVLT, includes the nonproline cis-peptide bond between serine and valine (2) and comprises the “bulge” behind the deazaflavin central ring (red). Peak 3, GTGE, is in contact with the phospholactate component of the side chain (yellow). Peak 4, FKER, is in contact with the single glutamate resolved by the crystal structure and forms a long adjacent surface cleft (blue). Peak 5, AAGGPAV, contacts the deazaflavin hydroxyl (obscured), the side chain phospholactate, and the carboxylate of the resolved side chain glutamate and also forms the putative polyglutamate binding cleft (green). (B) A patch of positively charged residues (blue) lines the poly-Glu binding cleft and is surrounded by a more distant ring of negatively charged residues (red). F420 is indicated as a stick model (green = carbon, red = oxygen, blue = nitrogen, and orange = phosphorus). Molecular models were visualized with MacPyMOL (http://pymol.org/).

Four genes in M. tuberculosis, Rv0132c, Rv1936, Rv2951c, and Rv3618 (and 18 more in M. smegmatis), were not covered by our models because their only close relatives are restricted to the mycobacteria (Table 2). Running SIMBAL on these may help to resolve whether they are F420 dependent. Rv0132c and Rv2951c showed patterns consistent with other putative F420-dependent LLM enzymes, Rv3618 clearly did not, and Rv1936 showed a mixed result, including a very strong third peak, but with all other peaks either weak or missing (Fig. 5). In all, we concluded that M. tuberculosis contains 14 separate F420-dependent members of the LLM family, and possibly only two FMN-dependent members. M. smegmatis, in contrast, has 19 sequences not covered by the F420 models, and only 5 of these show strong SIMBAL results (data not shown).

FIG. 5.

SIMBAL analysis of four M. tuberculosis LLM family proteins not found by the positive branch models versus a partition of the LLM (PF00296) family based on the TIGRFAMs-modeled clades of F420-producing organisms (Table 2) (positive branch) and all members from non-F420-producing organisms (negative branch). Window-length-normalized SIMBAL data are plotted as the maximum scores observed over a range of subsequence window lengths, in essence tracing the highest contour across a triangle plot like the one shown in Fig. 3 (top right).

PPOX family.

PPP identified five members of the PPOX family in M. smegmatis (see Table S2 in the supplemental material) and four in M. tuberculosis as likely F420-dependent enzymes. Sixteen and four additional family members (Table 3) were present in the M. smegmatis and M. tuberculosis genomes, respectively, including the known FMN-dependent protein PdxH (also known as PPOX itself). As with the LLM family, we identified five clades within the larger PPOX family that are restricted to F420-producing organisms. These are represented by the TIGR03618, TIGR03666, TIGR03667, TIGR03668, and TIGR04023 models. Aside from the PdxH gene itself, all seven of the remaining M. tuberculosis PPOX genes were found within these clades and therefore encode putative F420-dependent enzymes. M. smegmatis carries seven PPOX genes that fall outside these clades and might be F420 or FMN dependent.

TABLE 3.

Evidence for F420 association in PPOX family genes from Mycobacterium smegmatis and M. tuberculosis (e)

| Gene or sequence | M. tuberculosis gene | Found by PPP (a) | F420-specific clade HMM (b) | SIMBAL peak 1 score | SIMBAL peak 1 sequence (c) | SIMBAL peak 2 score | SIMBAL peak 2 sequence (c) |
|---|---|---|---|---|---|---|---|
| M. smegmatis genes | | | | | | | |
| MSMEG_3380 | Rv2074 | Yes | TIGR03618 | 45.0 | RPDGTPQVNAMW | 14.5 | RQKYRNIKANPA |
| MSMEG_0048 | Rv2991 | Yes | TIGR03618 | 44.6 | LPDGRPHLVAMW | 29.8 | SQKAVNLRRDPT |
| MSMEG_6526 | Rv0121c | No | TIGR03668 | 39.7 | NADGAPHLVPVV | 6.6 | LRRLANIDRDSR |
| MSMEG_3863 | Rv2061c | Yes | TIGR03666 | 38.2 | TKDGRPKPTAIW | 16.4 | SWKVKRIRNTPR |
| MSMEG_5819 | - | No | TIGR04023 | 37.2 | QPDGTPQNSPVG | 6.3 | SQKYRNIARNNR |
| MSMEG_6576 | - | Yes | TIGR03618 | 32.5 | KANGLPQLSPVT | 42.0 | RAKTANLRRDPR |
| MSMEG_3880 | - | No | TIGR03618 | 31.1 | RSDGSPHVVAVG | 2.6 | SQKAVNAQERGV |
| MSMEG_3179 | - | No | TIGR03666 | 30.7 | RRDGTAVDTPIW | 2.3 | GPKTKRLAARPE |
| MSMEG_2791 | - | No | TIGR03618 | 28.5 | NPDGSPQATLVW | 50.8 | HKKVRNVRRDPR |
| MSMEG_6848 | - | No | TIGR03618 | 24.2 | DPDGAPQQSVVW | 38.8 | SRKERNLRRDPR |
| MSMEG_6485 | - | No | TIGR03618 | 20.0 | RADGSLQSSPVT | 35.3 | RAKSANIRRTPR |
| MSMEG_5170 | Rv1155 | Yes | TIGR03618 | 16.8 | KQDGRPQLSNVS | 42.0 | RAKTRNLRRDPR |
| MSMEG_5717 | - | No | - | 12.8 | GDKRGPLTVPIW | 6.5 | SRKHRLIESAGR |
| - | Rv3369 | No | TIGR03667 | 9.6 | ARSGQPVPRLVW | 1.6 | AAKVAHITAHPQ |
| MSMEG_4975 | - | No | - | 8.1 | VRDGHPVAFPIG | 1.0 | SPWLRALAEGAP |
| - | Rv1875 | No | TIGR03618 | 6.6 | RADGTVQASLVN | 5.6 | KVKLGNLRARPQ |
| MSMEG_1668 | - | No | TIGR03666 | 4.1 | KRSGEAVPSPIN | 8.9 | TAKVKRIRNNPN |
| MSMEG_5136 | - | No | - | 3.5 | TEDALPAVQPVN | 1.3 | GGKLSAAAKNQV |
| MSMEG_1061 | - | No | - | 1.8 | DAEGRVDVSPKG | 4.3 | VDGYLNVLQQPH |
| MSMEG_6744 | - | No | - | 1.8 | DDAGRVWAGPLT | 2.3 | YMTLGNLEVDSR |
| MSMEG_6519 | - | No | - | 1.5 | TTEGDPWASFVT | 3.1 | AEHGRNLAHDPR |
| MSMEG_0964 | - | No | - | 0.9 | DADGRPRSRVLH | 2.3 | PVKRAHLAEHPY |
| Negative control sequences (d) | | | | | | | |
| Mycobacterial PdxH (MSMEG_5675; Rv2607) | | | | 2.6 | DADGRPVTRSVL | 0.7 | SAKGEHLAVNAY |
| Escherichia coli PdxH | | | | 0.0 | DEHGQPYQRIVL | 0.0 | SRKAHQIENNPR |
| PdxH model TIGR00558 consensus | | | | 0.7 | ePeGRPssRmVL | 0.0 | SRKGhqieeNPn |
| Non-F420-producing species PPOX family consensus | | | | 0.1 | dddGrPyaRpvl | 0.2 | srkarnlaanPr |
(a) Genes identified among the top 63 PPP hits versus an F420 producer profile (see Table S1 in the supplemental material).

(b) TIGRFAMs HMMs built to represent clades of PPOX genes consisting only of genes from F420-producing organisms.

(c) Twelve-mer sequences corresponding to the centers of the two prominent SIMBAL peaks. Residues in bold are observed primarily in the highest-scoring sequences, and underlined residues are those typical of low-scoring and known FMN-binding sequences.

(d) These sequences are presumed to correspond to non-F420-binding enzymes. PdxH is a characterized FMN-binding enzyme. The non-F420-binding PPOX family consensus was constructed from a multiple sequence alignment of all PPOX family proteins from non-F420-producing species in our data set. Conserved residues are in uppercase; consensus residues are in lowercase.

(e) Items in bold represent evidence of F420 binding for the respective gene.

The product of the M. tuberculosis gene Rv1155 (which was identified by PPP as an F420-correlated gene) has had its crystal structure solved (7). Although the authors identified Rv1155 (by homology) as a pyridoxamine 5′-phosphate oxidase and reported structures with FMN and PLP bound (separately), these two ligands appear to bind at the same site, which is inconsistent with the catalytic mechanism of PPOX enzymes (12). Furthermore, a concentration of 5 mM was required in order to achieve FMN binding, which is inconsistent with the observed micromolar affinity of PdxH for its FMN cofactor (9). The crystal structure of Escherichia coli PdxH, however, does show FMN bound in roughly the same position, confirming at least the location of the Rv1155 flavonoid cofactor binding site, if not the identity of the cofactor. Based on the analysis described below, we suggest that F420 may bind Rv1155 and that this could be examined by fairly routine bench work.

We applied SIMBAL to the sequence of Rv1155, partitioning the PPOX family based on the ability of the source genomes to produce F420. Two very prominent peaks were observed (Fig. 6), corresponding to the cleft where FMN (and PLP) binds in the crystal structures (Fig. 7A). The SIMBAL-identified region is clearly larger than the FMN molecule and appears to be consistent with the binding of a cofactor, such as F420, with a much longer side chain (Fig. 1). Indeed, the structure of FMN-dependent E. coli PdxH has a very different shape in this region, with a closed-off cofactor binding cleft well matched to the size of FMN's side chain and inconsistent with the extended polyglutamate found in F420 (Fig. 7B). When the sequence in E. coli PdxH homologous to SIMBAL peak 1 was examined in the structure, it was found to contain an arginine (R67) involved in the coordination of the terminal phosphate of the FMN side chain. This arginine is invariant in the family of PdxH enzymes described by the TIGR00558 model. This residue is instead a serine in the Rv1155 protein, a change consistent with the requirements for binding the F420 cofactor, with its increased size and decreased charge around the corresponding phosphate group. Additionally, the subsequence corresponding to peak 1 of Rv1155 (TIKHDGRPQLSN) contains two residues (K and Q) which make up the surface of the cleft proximal to the end of the bound FMN molecule. Relative to PdxH, these represent a shift to a more positively charged environment (D→K and Y→Q), consistent with binding of F420's negatively charged polyglutamate side chain.

FIG. 6.

SIMBAL analysis of the M. tuberculosis PPOX family gene Rv1155, using an F420 biosynthesis-based partition, indicates two strongly correlated regions. Window-length-normalized SIMBAL data are plotted as the maximum scores observed over a range of subsequence window lengths.

FIG. 7.

(A) Crystal structure of the (dimeric) M. tuberculosis PPOX family protein Rv1155, with (monomeric) FMN bound (blue), showing the locations of SIMBAL peaks 1 and 2 (yellow and orange) (see Fig. 6 and Table 3). FMN binds only weakly to Rv1155, which is a likely F420-binding enzyme. Extending downwards from the short FMN side chain is an extended cleft which appears complementary to the much longer F420 polyglutamate side chain. (B) The FMN-dependent E. coli PPOX family enzyme PdxH is shown with the homologous regions colored. PdxH binds FMN roughly 1,000 times tighter than Rv1155 and contains a pocket into which the FMN side chain fits snugly, while no extended cleft is apparent. Molecular models were visualized with MacPyMOL (http://pymol.org/).

Application of SIMBAL to M. smegmatis and M. tuberculosis PPOX proteins (Table 3) showed a consistent pattern with respect to the presence of a prominent peak 1 consisting of a shift away from the arginine in the phosphate binding pocket and a shift toward positively charged and amide residues in the putative polyglutamate binding cleft. The presence of a dominant peak 2 appears limited to those members of the TIGR03618 family containing sequences similar to NLRRDPR, which are distinctly different from the corresponding PdxH subsequence, QIENNPR. Despite the weakness of peak 2 for many of the tested sequences that had a strong peak 1, there was a clear trend toward an increase in positively charged residues (Table 3), again consistent with the binding of polyglutamate. One of the M. tuberculosis genes (Rv1875) yielded equivocal results (despite being a member of the F420 producer-limited TIGR03618 family), but six proteins could be judged very likely F420 binders by SIMBAL. For M. smegmatis, 14 genes fall into the likely F420-dependent category, while there appear to be at least five other FMN-dependent enzymes in addition to PdxH.
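
The charge trend described here can be checked with a quick tally over the 12-mers in Table 3. The following is a minimal sketch and not part of the original analysis: it counts Lys and Arg as +1 and Asp and Glu as -1, and it treats histidine as neutral, which is a simplifying assumption. The function name `net_charge` is hypothetical.

```python
# Illustrative only: crude net charge of a peptide at neutral pH.
# Assumption: K/R = +1, D/E = -1, histidine and termini ignored.
POSITIVE = set("KR")
NEGATIVE = set("DE")

def net_charge(peptide: str) -> int:
    """Return (number of K/R) minus (number of D/E) for a peptide string."""
    seq = peptide.upper()
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)

# 12-mers copied from Table 3.
peak1 = {"MSMEG_5170 (Rv1155 ortholog)": "KQDGRPQLSNVS",
         "E. coli PdxH": "DEHGQPYQRIVL"}
peak2 = {"MSMEG_5170 (Rv1155 ortholog)": "RAKTRNLRRDPR",
         "E. coli PdxH": "SRKAHQIENNPR"}

for label, table in (("peak 1", peak1), ("peak 2", peak2)):
    for name, seq in table.items():
        print(f"{label}  {name:30s} {seq}  net charge {net_charge(seq):+d}")
```

Under these assumptions the putative F420-binding sequence carries a higher net positive charge than the PdxH sequence at both peaks, in line with the trend noted for Table 3.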

Amount of F420-dependent metabolism in normal human gut flora.

Actinobacteria include many species that produce F420, many of which, such as Mycobacterium and Frankia species (see Table S2 in the supplemental material), have large numbers of apparent F420-dependent enzymes. According to recent metagenomic studies (32, 42), the human gut harbors a large number of actinobacterial lineages. In the profiling studies presented here, we utilized the CofE gene as a marker of F420 biosynthesis, even when it was found in the absence of other known F420 genes, a decision based on evidence from the genes surrounding the orphan CofE gene in those few genomes (see Materials and Methods). Even if incorrect, the inclusion of those genomes (see Table S2 in the supplemental material), which were numerically insignificant, would have had little effect on the profiling results. Postanalysis, we can now look back at those genomes and confirm that all also carry putative F420-dependent genes encoding members of the LLM, PPOX, and/or DDN families, as detected by the HMMs we built (Tables 2 and 3). For instance, Jonesia denitrificans DSM 20603 carries 8 LLM family genes, 4 of which were identified by HMMs for putative F420-dependent gene families, and SIMBAL indicated that an additional 2 genes are likely F420 dependent. This is in contrast to Arthrobacter chlorophenolicus A6, which lacks the CofE gene and any other F420 biosynthesis genes and carries 15 LLM family genes, none of which were hits with the F420-dependent HMMs. Thus, we feel justified in regarding the CofE gene as a reliable marker of the F420 biosynthesis trait.

Searches of the CofE HMM (TIGR01916) versus shotgun sequencing reads from human gut microbiome samples totaling 1,149 Mb of sequence (42) identified only one read, from Methanobrevibacter smithii, an archaeal organism. In contrast, searches using an HMM (TIGR00468) for the PheS gene, which is present in single copy in all bacterial and archaeal genomes, yielded hits at an 860-fold higher rate. A similar calculation for a metagenomic sample from the Global Ocean Sampling expedition (33) yielded a 1:50 ratio of F420 producers to nonproducers, suggesting that in the human gut flora, F420 producers are rare relative to those in the environment. Similarly, searches of draft genomes for 30 of the most abundant organisms in the human gut microbiome (32) yielded no hits to the CofE model (see Table S3 in the supplemental material).
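
As a rough worked example of this comparison, the sketch below treats the 860-fold excess of PheS hits as an approximate 1:860 ratio of F420 producers to all genomes in the gut sample and compares it with the 1:50 ratio reported for the ocean sample. This is a back-of-the-envelope illustration under stated assumptions (equal detection sensitivity for both HMMs, one marker copy per genome); the function name is hypothetical and the calculation is not taken from the original analysis.

```python
# Illustrative arithmetic only, using the two ratios quoted in the text.
def relative_rarity(genomes_per_producer_a: float,
                    genomes_per_producer_b: float) -> float:
    """How many times rarer producers are in sample A than in sample B.

    Each argument is the number of single-copy marker (PheS) hits per
    CofE hit, i.e. roughly the number of genomes per F420 producer.
    """
    if genomes_per_producer_a <= 0 or genomes_per_producer_b <= 0:
        raise ValueError("ratios must be positive")
    return genomes_per_producer_a / genomes_per_producer_b

gut = 860.0    # PheS hits per CofE hit, human gut reads
ocean = 50.0   # genomes per producer, ocean metagenome

print(f"F420 producers are about {relative_rarity(gut, ocean):.0f}-fold "
      "rarer in the gut sample than in the ocean sample")
```

Because the gut estimate rests on a single CofE read, the fold difference should be read as an order-of-magnitude indication rather than a precise figure.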

DISCUSSION

We have demonstrated, through the application of a series of comparative genomic methods, the presence of significant numbers of putative coenzyme F420-dependent enzymes in Mycobacterium tuberculosis and other related mycobacteria. In M. tuberculosis, this likely amounts to 28 different enzymes, including 14 from the LLM family, 7 from the PPOX family, and 7 from the DDN family. Prior to this work, it was not appreciated that the PPOX family includes F420-dependent members, and the extent to which the LLM family, especially in the Actinobacteria, is dominated by them was not known. As its name suggests, the DDN family was known to include the deazaflavin-dependent nitroreductase Rv3547 (active on an antimycobacterial prodrug [40]); we have shown that this family is observed solely in F420-producing organisms.

The prominent expansion of these families in many actinobacterial lineages suggests a degree of importance to their lifestyle. We suggest that there may be an advantage in targeting the distinctive biosynthesis of the F420 cofactor as a means of developing new antimycobacterial drugs for combination therapy. Coenzyme F420 is not utilized by human cells, and a preliminary search of available human gut metagenomic data suggested that F420-producing members of the “normal” flora are rare, implying that such an approach would specifically target mycobacterial cells.

It was recently shown that mutations in an F420 biosynthesis gene result in hypersensitivity of M. tuberculosis to acidified nitrite, a model of macrophage-induced reactive nitrogen intermediates (10), and it is speculated that the purpose of FGD may be to provide reduced F420 to react with these oxidants, protecting M. tuberculosis from macrophages (31). Little else is currently known about the biological role of these enzymes in the Actinobacteria. One exception is the LLM family protein Rv2951c, which has been shown to be a key ketoreductase in the biosynthesis of the mycobacterial diacyl phthiocerol virulence factors (27, 38). We speculate that some prior attempts to clone and express genes from these families may have failed to produce active products due to the lack of F420 production in E. coli. There have been reports, for instance, of “colorless” proteins isolated when colored FMN binding proteins were expected (7). Expression of these proteins in F420-producing backgrounds such as M. smegmatis or provision with F420 during purification may yield improved results.

The methods applied here began with the generation of a phylogenetic profile for the biosynthesis of cofactor F420 over a large set of prokaryotic genomes. PPP led in short order to the identification of the three dominant families of F420-dependent enzymes in actinobacteria. Since two of these families were distributed in both F420 producers and nonproducers and also included known FMN-dependent members, further dissection of the families into F420 producer clades was carried out using tree-building methods. Finally, structural insight into F420 binding and detailed classification were achieved by the application of SIMBAL (aided by available crystal structures).

The SIMBAL method exploits an algorithm related to phylogenetic profiling in order to mine statistical signals from large collections of genome sequence data. By collecting members of a protein family for which the bound cofactor is variable and partitioning the family according to the cofactor biosynthesis properties of each species of origin, SIMBAL is able to bypass the requirement for a training set based on actual experimental data and to substitute a much larger and more informative data set based on computed metabolic backgrounds.
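
One way to picture this kind of partition-based scoring is sketched below. It is a simplified illustration and not the published SIMBAL implementation: it assumes that the hits to one sequence window have already been ranked by similarity and labeled by whether their source genome makes F420 (positive branch) or not, and it scores the window by the most significant hypergeometric enrichment of positive-branch hits over all cutoff depths. Dividing by the window length follows the normalization described for Fig. 3. All function names are hypothetical.

```python
import math

def hypergeom_tail(k_pos: int, k: int, n_pos: int, n_total: int) -> float:
    """P(X >= k_pos) when k sequences are drawn from n_total, n_pos of them positive."""
    upper = min(k, n_pos)
    total = math.comb(n_total, k)
    tail = sum(math.comb(n_pos, i) * math.comb(n_total - n_pos, k - i)
               for i in range(k_pos, upper + 1))
    return tail / total

def window_score(ranked_labels, n_pos: int, n_total: int) -> float:
    """Best -log10 tail probability over every cutoff depth of a ranked hit list.

    ranked_labels: booleans, best hit first; True means the hit comes from
    a genome in the positive (F420-producing) partition.
    """
    best = 0.0
    positives = 0
    for depth, is_positive in enumerate(ranked_labels, start=1):
        positives += bool(is_positive)
        p = hypergeom_tail(positives, depth, n_pos, n_total)
        best = max(best, -math.log10(p))
    return best

def normalized_score(score: float, window_length: int) -> float:
    """Score divided by window length, to highlight short, sharp regions."""
    return score / window_length

# Toy data: 10 family members, 3 from F420 producers; the top three
# hits to this window are all from producers.
labels = [True, True, True, False]
raw = window_score(labels, n_pos=3, n_total=10)
print(round(raw, 3), round(normalized_score(raw, 12), 3))
```

In this toy case the best cutoff is depth 3, where all three positive-branch sequences have been recovered; a window whose top hits are drawn evenly from both partitions would score near zero.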

The proteins studied here are enzymes that have, presumably, at least two different types of specificity: one for the substrate and one for the cofactor. Because the training sets were constructed based on cofactor biosynthesis rather than substrate availability, it became possible to discover sequence regions that tend to “predict” binding to one particular cofactor. We used SIMBAL previously to probe transporter substrate specificity and the substrate specificity of a family of protein modification methylases (36), but this is our first use of it to probe enzymatic cofactor binding specificity and to develop evidence to make high-confidence functional assignments.

Mapping the short sequences that earned SIMBAL's best scores to solved protein crystal structures demonstrated that the method found both known F420 binding sites in the LLM family and flavonoid binding (presumably F420 binding) sites in the PPOX family. Close examination of SIMBAL hot spots for F420 binding proteins with solved structures showed that these hot spots represent not only sequences that bind the cofactor but also sites representing key differences that distinguish F420 from FMN.

PPP generated the strong hypothesis that high-scoring proteins bind F420, but any number of alternative explanations are possible, e.g., a single F420-dependent pathway makes a novel substrate available, and PPP is identifying collections of enzymes that use that substrate. The strong confirmation by SIMBAL that PPP identified true F420 binding proteins led to the novel, and experimentally testable, finding that the actual numbers of F420-dependent enzymes in the major human pathogen Mycobacterium tuberculosis and numerous other actinobacteria are quite large. Interestingly, the closely related but generally nonpathogenic species M. smegmatis not only displays a greater number of genes for each of the F420-utilizing protein families, but a much larger proportion of these genes appear to be FMN dependent. One possible interpretation is that M. tuberculosis has committed, to a much higher degree, to an F420-dominated lifestyle for its oxidation needs.

Acknowledgments

We thank Laura Sheahan and Chuck Merryman for their expert editorial advice in the preparation of the manuscript.

This work was supported by NIH/NHGRI grant R01-HG004881 and NSF grant DBI-0445826.

Footnotes

Published ahead of print on 30 July 2010.

Supplemental material for this article may be found at http://jb.asm.org/.

REFERENCES

  • 1. Aufhammer, S. W., E. Warkentin, H. Berk, S. Shima, R. K. Thauer, and U. Ermler. 2004. Coenzyme binding in F420-dependent secondary alcohol dehydrogenase, a member of the bacterial luciferase family. Structure 12:361-370.
  • 2. Aufhammer, S. W., E. Warkentin, U. Ermler, C. H. Hagemeier, R. K. Thauer, and S. Shima. 2005. Crystal structure of methylenetetrahydromethanopterin reductase (Mer) in complex with coenzyme F420: architecture of the F420/FMN binding site of enzymes within the nonprolyl cis-peptide containing bacterial luciferase family. Protein Sci. 14:1840-1849.
  • 3. Bair, T. B., D. W. Isabelle, and L. Daniels. 2001. Structures of coenzyme F(420) in Mycobacterium species. Arch. Microbiol. 176:37-43.
  • 4. Bashiri, G., C. J. Squire, N. J. Moreland, and E. N. Baker. 2008. Crystal structures of F420-dependent glucose-6-phosphate dehydrogenase FGD1 involved in the activation of the anti-tuberculosis drug candidate PA-824 reveal the basis of coenzyme and substrate binding. J. Biol. Chem. 283:17531-17541.
  • 5. Boshoff, H. I., and C. E. Barry III. 2005. Tuberculosis—metabolism and respiration in the absence of growth. Nat. Rev. Microbiol. 3:70-80.
  • 6. Campbell, Z. T., A. Weichsel, W. R. Montfort, and T. O. Baldwin. 2009. Crystal structure of the bacterial luciferase/flavin complex provides insight into the function of the beta subunit. Biochemistry 48:6085-6094.
  • 7. Canaan, S., G. Sulzenbacher, V. Roig-Zamboni, L. Scappuccini-Calvo, F. Frassinetti, D. Maurin, C. Cambillau, and Y. Bourne. 2005. Crystal structure of the conserved hypothetical protein Rv1155 from Mycobacterium tuberculosis. FEBS Lett. 579:215-221.
  • 8. Chaiyen, P., C. Suadee, and P. Wilairat. 2001. A novel two-protein component flavoprotein hydroxylase. Eur. J. Biochem. 268:5550-5561.
  • 9. Churchich, J. E. 1984. Brain pyridoxine-5-phosphate oxidase. A dimeric enzyme containing one FMN site. Eur. J. Biochem. 138:327-332.
  • 10. Darwin, K. H., S. Ehrt, J. C. Gutierrez-Ramos, N. Weich, and C. F. Nathan. 2003. The proteasome of Mycobacterium tuberculosis is required for resistance to nitric oxide. Science 302:1963-1966.
  • 11. Di Salvo, M., E. Yang, G. Zhao, M. E. Winkler, and V. Schirch. 1998. Expression, purification, and characterization of recombinant Escherichia coli pyridoxine 5′-phosphate oxidase. Protein Expr. Purif. 13:349-356.
  • 12. di Salvo, M. L., M. K. Safo, F. N. Musayev, F. Bossa, and V. Schirch. 2003. Structure and mechanism of Escherichia coli pyridoxine 5′-phosphate oxidase. Biochim. Biophys. Acta 1647:76-82.
  • 13. Edgar, R. C. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113.
  • 14. Eker, A. P., P. Kooiman, J. K. Hessels, and A. Yasui. 1990. DNA photoreactivating enzyme from the cyanobacterium Anacystis nidulans. J. Biol. Chem. 265:8009-8015.
  • 15. Forouhar, F., M. Abashidze, H. Xu, L. L. Grochowski, J. Seetharaman, M. Hussain, A. Kuzin, Y. Chen, W. Zhou, R. Xiao, T. B. Acton, G. T. Montelione, A. Galinier, R. H. White, and L. Tong. 2008. Molecular insights into the biosynthesis of the F420 coenzyme. J. Biol. Chem. 283:11832-11840.
  • 16. Gerlo, E., and E. Schram. 1971. Bioluminescence assay of reduced pyridine and flavine nucleotides with bacterial luciferase. Arch. Int. Physiol. Biochim. 79:200-201.
  • 17. Graham, D. E., H. Xu, and R. H. White. 2003. Identification of the 7,8-didemethyl-8-hydroxy-5-deazariboflavin synthase required for coenzyme F(420) biosynthesis. Arch. Microbiol. 180:455-464.
  • 18. Grochowski, L. L., H. Xu, and R. H. White. 2008. Identification and characterization of the 2-phospho-l-lactate guanylyltransferase involved in coenzyme F420 biosynthesis. Biochemistry 47:3033-3037.
  • 19. Guerra-Lopez, D., L. Daniels, and M. Rawat. 2007. Mycobacterium smegmatis mc2 155 fbiC and MSMEG_2392 are involved in triphenylmethane dye decolorization and coenzyme F420 biosynthesis. Microbiology 153:2724-2732.
  • 20. Haft, D. H., B. J. Loftus, D. L. Richardson, F. Yang, J. A. Eisen, I. T. Paulsen, and O. White. 2001. TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Res. 29:41-43.
  • 21. Haft, D. H., I. T. Paulsen, N. Ward, and J. D. Selengut. 2006. Exopolysaccharide-associated protein sorting in environmental organisms: the PEP-CTERM/EpsH system. Application of a novel phylogenetic profiling heuristic. BMC Biol. 4:29.
  • 22. Kertesz, M. A., K. Schmidt-Larbig, and T. Wuest. 1999. A novel reduced flavin mononucleotide-dependent methanesulfonate sulfonatase encoded by the sulfur-regulated msu operon of Pseudomonas aeruginosa. J. Bacteriol. 181:1464-1473.
  • 23. Kitamura, M., S. Kojima, K. Ogasawara, T. Nakaya, T. Sagara, K. Niki, K. Miura, H. Akutsu, and I. Kumagai. 1994. Novel FMN-binding protein from Desulfovibrio vulgaris (Miyazaki F). Cloning and expression of its gene in Escherichia coli. J. Biol. Chem. 269:5566-5573.
  • 24. Larkin, M. A., G. Blackshields, N. P. Brown, R. Chenna, P. A. McGettigan, H. McWilliam, F. Valentin, I. M. Wallace, A. Wilm, R. Lopez, J. D. Thompson, T. J. Gibson, and D. G. Higgins. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23:2947-2948.
  • 25. Li, H., M. Graupner, H. Xu, and R. H. White. 2003. CofE catalyzes the addition of two glutamates to F420-0 in F420 coenzyme biosynthesis in Methanococcus jannaschii. Biochemistry 42:9771-9778.
  • 26. Moore, S. A., and M. N. James. 1995. Structural refinement of the nonfluorescent flavoprotein from Photobacterium leiognathi at 1.60 A resolution. J. Mol. Biol. 249:195-214.
  • 27. Onwueme, K. C., C. J. Vos, J. Zurita, C. E. Soll, and L. E. Quadri. 2005. Identification of phthiodiolone ketoreductase, an enzyme required for production of mycobacterial diacyl phthiocerol virulence factors. J. Bacteriol. 187:4760-4766.
  • 28. Parsons, J. F., K. Calabrese, E. Eisenstein, and J. E. Ladner. 2004. Structure of the phenazine biosynthesis enzyme PhzG. Acta Crystallogr. D Biol. Crystallogr. 60:2110-2113.
  • 29. Purwantini, E., and L. Daniels. 1998. Molecular analysis of the gene encoding F420-dependent glucose-6-phosphate dehydrogenase from Mycobacterium smegmatis. J. Bacteriol. 180:2212-2219.
  • 30. Purwantini, E., T. P. Gillis, and L. Daniels. 1997. Presence of F420-dependent glucose-6-phosphate dehydrogenase in Mycobacterium and Nocardia species, but absence from Streptomyces and Corynebacterium species and methanogenic Archaea. FEMS Microbiol. Lett. 146:129-134.
  • 31. Purwantini, E., and B. Mukhopadhyay. 2009. Conversion of NO2 to NO by reduced coenzyme F420 protects mycobacteria from nitrosative damage. Proc. Natl. Acad. Sci. U. S. A. 106:6333-6338.
  • 32. Qin, J., R. Li, J. Raes, M. Arumugam, K. S. Burgdorf, C. Manichanh, T. Nielsen, N. Pons, F. Levenez, T. Yamada, D. R. Mende, J. Li, J. Xu, S. Li, D. Li, J. Cao, B. Wang, H. Liang, H. Zheng, Y. Xie, J. Tap, P. Lepage, M. Bertalan, J. M. Batto, T. Hansen, D. Le Paslier, A. Linneberg, H. B. Nielsen, E. Pelletier, P. Renault, T. Sicheritz-Ponten, K. Turner, H. Zhu, C. Yu, M. Jian, Y. Zhou, Y. Li, X. Zhang, N. Qin, H. Yang, J. Wang, S. Brunak, J. Dore, F. Guarner, K. Kristiansen, O. Pedersen, J. Parkhill, J. Weissenbach, P. Bork, and S. D. Ehrlich. 2010. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59-65.
  • 33. Rusch, D. B., A. L. Halpern, G. Sutton, K. B. Heidelberg, S. Williamson, S. Yooseph, D. Wu, J. A. Eisen, J. M. Hoffman, K. Remington, K. Beeson, B. Tran, H. Smith, H. Baden-Tillson, C. Stewart, J. Thorpe, J. Freeman, C. Andrews-Pfannkoch, J. E. Venter, K. Li, S. Kravitz, J. F. Heidelberg, T. Utterback, Y. H. Rogers, L. I. Falcon, V. Souza, G. Bonilla-Rosso, L. E. Eguiarte, D. M. Karl, S. Sathyendranath, T. Platt, E. Bermingham, V. Gallardo, G. Tamayo-Castillo, M. R. Ferrari, R. L. Strausberg, K. Nealson, R. Friedman, M. Frazier, and J. C. Venter. 2007. The Sorcerer II global ocean sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol. 5:e77.
  • 34. Schroeder, E. K., N. de Souza, D. S. Santos, J. S. Blanchard, and L. A. Basso. 2002. Drugs that inhibit mycolic acid biosynthesis in Mycobacterium tuberculosis. Curr. Pharm. Biotechnol. 3:197-225.
  • 35. Selengut, J. D., D. H. Haft, T. Davidsen, A. Ganapathy, M. Gwinn-Giglio, W. C. Nelson, A. R. Richter, and O. White. 2007. TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res. 35:D260-D264.
  • 36. Selengut, J. D., D. B. Rusch, and D. H. Haft. 2010. Sites inferred by metabolic background assertion labeling (SIMBAL): adapting the partial phylogenetic profiling algorithm to scan sequences for signatures that predict protein function. BMC Bioinformatics 11:52.
  • 37. Shi, R., N. Itagaki, and I. Sugawara. 2007. Overview of anti-tuberculosis (TB) drugs and their resistance mechanisms. Mini Rev. Med. Chem. 7:1177-1185.
  • 38. Simeone, R., P. Constant, W. Malaga, C. Guilhot, M. Daffe, and C. Chalut. 2007. Molecular dissection of the biosynthetic relationship between phthiocerol and phthiodiolone dimycocerosates and their critical role in the virulence and permeability of Mycobacterium tuberculosis. FEBS J. 274:1957-1969.
  • 39. Singh, J. A., R. Upshur, and N. Padayatchi. 2007. XDR-TB in South Africa: no time for denial or complacency. PLoS Med. 4:e50.
  • 40. Singh, R., U. Manjunatha, H. I. Boshoff, Y. H. Ha, P. Niyomrattanakit, R. Ledwidge, C. S. Dowd, I. Y. Lee, P. Kim, L. Zhang, S. Kang, T. H. Keller, J. Jiricek, and C. E. Barry III. 2008. PA-824 kills nonreplicating Mycobacterium tuberculosis by intracellular NO release. Science 322:1392-1395.
  • 41. Thibaut, D., N. Ratet, D. Bisch, D. Faucher, L. Debussche, and F. Blanche. 1995. Purification of the two-enzyme system catalyzing the oxidation of the d-proline residue of pristinamycin IIB during the last step of pristinamycin IIA biosynthesis. J. Bacteriol. 177:5199-5205.
  • 42. Turnbaugh, P. J., M. Hamady, T. Yatsunenko, B. L. Cantarel, A. Duncan, R. E. Ley, M. L. Sogin, W. J. Jones, B. A. Roe, J. P. Affourtit, M. Egholm, B. Henrissat, A. C. Heath, R. Knight, and J. I. Gordon. 2009. A core gut microbiome in obese and lean twins. Nature 457:480-484.
  • 43. Winn, M., R. J. Goss, K. Kimura, and T. D. Bugg. 2010. Antimicrobial nucleoside antibiotics targeting cell wall assembly: recent advances in structure-function studies and nucleoside biosynthesis. Nat. Prod. Rep. 27:279-304.
  • 44. Xu, Y., M. W. Mortimer, T. S. Fisher, M. L. Kahn, F. J. Brockman, and L. Xun. 1997. Cloning, sequencing, and analysis of a gene cluster from Chelatobacter heintzii ATCC 29600 encoding nitrilotriacetate monooxygenase and NADH:flavin mononucleotide oxidoreductase. J. Bacteriol. 179:1112-1116.

Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)
