2001 Sep 19;2(10):research0044.1–research0044.18. doi: 10.1186/gb-2001-2-10-research0044

Table 2.

Presence of genes in gene clusters of all available finished and unfinished genome sequences

Presence and names of genes in each species

Gene family; Description; Protein size (amino acids, in M. tuberculosis); ESAT-6 cluster region; M. tuberculosis H37Rv; M. tuberculosis CDC1551 (CSU#93); M. tuberculosis* 210; M. bovis* AF2122/97 (spoligotype 9); M. bovis* BCG Pasteur 1173P2
A ABC transporter family signature, 19-27% homology 283 1 Rv3866 MT3980 ND MB851A No sequence data
276 2 Rv3889c MT4004 MTB12A MB727.3A (partly deleted #) No sequence data
295 3 Rv0289 MT0302 MTB203A MB548A No sequence data
- 4 No duplication No duplication No duplication No duplication No duplication
300 5 Rv1794 MT1843 MTB196A MB557A No sequence data
B AAA+ class ATPases, CBXX/CFQX family, SpoVK, 1× ATP/GTP-binding site, 29-39% homology 573 1 Rv3868 MT3981 MTB44B MB851B No sequence data
619 2 Rv3884c MT3999 MTB12B MB727.1B No sequence data
631 3 Rv0282 MT0295 MTB23B MB672B No sequence data
- 4 No duplication No duplication No duplication No duplication No duplication
610 5 Rv1798 MT1847 MTB196B MB542B No sequence data
C Amino-terminal transmembrane protein, possible ATP/GTP-binding motif, 31-41% homology 480 1 Rv3869 MT3982 MTB44C MB851C No sequence data
495 2 Rv3895c MT4011 MTB136C MB780.1C No sequence data
538 3 Rv0283 MT0296 MTB23C MB672C No sequence data
470 4 Rv3450c MT3556 MTB45C MB493.1C No sequence data
506 5 Rv1782 MT1832 MTB46C MB771.1C No sequence data
D DNA segregation ATPase, ftsK chromosome partitioning protein, SpoIIIE, yukA, 3× ATP/GTP-binding sites, 2× amino-terminal transmembrane protein, 28-39% homology 747 + 591 1 Rv3870+71 MT3983+85 MTB44Da+Db MB851D MB851D (partly deleted)
1396 2 Rv3894c MT4010 MTB3D MB780.1D No sequence data
1330 3 Rv0284 MT0297 MTB23D MB672D No sequence data
1236 4 Rv3447c MT3553 MTB45D MB585.1D No sequence data
435 + 932 5 Rv1783+84 MT1833 MTB46Da+Db MB771.1D No sequence data
E PE, 18-90% homology 99 1 Rv3872 MT3986 MTB44E MB851E Deleted
77 2 Rv3893c MT4008 MTB3E MB780.1E No sequence data
102 3 Rv0285 MT0298 MTB23E MB389E No sequence data
- 4 No duplication No duplication No duplication No duplication No duplication
99 & 99 5 Rv1788 & 91 MT1837 & 40 MTB196Ea & Eb MB771.0E & MB557E No sequence data
F PPE, 19-88% homology 368 1 Rv3873 MT3987 MTB44F MB851F Deleted
399 2 Rv3892c MT4007 MTB3F MB780.1F No sequence data
513 3 Rv0286 MT0299 MTB472F MB528F No sequence data
- 4 No duplication No duplication No duplication No duplication No duplication
365, 393 & 350 5 Rv1787 & 89 & 90 MT1836 & 38 & 39 MTB196Fa & Fb & Fc MB771.0Fa & Fb & MB557F No sequence data
G lhp or CFP-10, also MTSA-10, grouped into ESAT-6 family, potent secreted T-cell antigens, 9-32% homology 100 1 Rv3874 MT3988 MTB44G MB851G Deleted
107 2 Rv3891c MT4006 MTB12G MB727.3G No sequence data
97 3 Rv0287 MT0300 MTB472G MB548G No sequence data
125 4 Rv3445c MT3550 MTB45G MB585.0G No sequence data
98 5 Rv1792 (Stop) MT1841 (Stop) MTB196G (Stop) MB557G No sequence data
H ESAT-6 family, cfp7, L45 or l-esat, also Mtb9.9 family, potent secreted T-cell antigens, 15-27% homology 95 1 Rv3875 MT3989 MTB44H MB851H Deleted
95 2 Rv3890c MT4005 MTB12H MB727.3H No sequence data
96 3 Rv0288 MT0301 MTB203H MB548H No sequence data
100 4 Rv3444c MT3549 MTB45H MB585.0H No sequence data
94 5 Rv1793 MT1842 MTB196H MB557H No sequence data
I ATPases involved in chromosome partitioning, 1× ATP/GTP-binding motif, 33% homology 666 1 Rv3876 MT3990 MTB60I MB477I Deleted
341 2 Rv3888c MT4003 MTB12I Deleted # No sequence data
- 3 No duplication No duplication No duplication No duplication No duplication
- 4 No duplication No duplication No duplication No duplication No duplication
- 5 No duplication No duplication No duplication No duplication No duplication
J Integral inner membrane protein, binding-protein-dependent transport systems inner membrane component signature, putative transporter protein, 19-27% homology 511 1 Rv3877 MT3991 MTB369J MB477J Deleted
509 2 Rv3887c MT4002 MTB12J MB727.3J (partly deleted #) No sequence data
472 3 Rv0290 MT0303 MTB203J MB548J No sequence data
467 4 Rv3448 MT3554 MTB45J MB585.1J No sequence data
503 5 Rv1795 MT1844 MTB196J MB506J No sequence data
K Mycosins, subtilisin-like cell-wall associated serine proteases, 43-49% homology 446 1 Rv3883c MT3998 MTB12Ka MB727.0K No sequence data
550 2 Rv3886c MT4001 (Frame) MTB12Kb MB727.2K No sequence data
461 3 Rv0291 MT0304 MTB203K MB548K No sequence data
455 4 Rv3449 MT3555 MTB45K MB585.1K No sequence data
585 5 Rv1796 MT1845 MTB196K MB506K No sequence data
L 2× amino-terminal transmembrane protein, 16-27% homology 462 1 Rv3882c MT3997 MTB12La MB727.0L No sequence data
537 2 Rv3885c MT4000 (Frame) MTB12Lb MB727.2L No sequence data
331 3 Rv0292 MT0305 MTB203L MB694.0L No sequence data
- 4 No duplication No duplication No duplication No duplication No duplication
406 5 Rv1797 MT1846 MTB196L MB542L No sequence data
Presence and names of genes in each species

Gene family; Description; Protein size (amino acids, in M. tuberculosis); ESAT-6 cluster region; M. leprae TN; M. avium* 104; M. paratuberculosis K-10; M. smegmatis* mc²155; C. diphtheriae* NCTC 13129; S. coelicolor A3(2)

A ABC transporter family signature, 19-27% homology 283 1 ML0057(pseudo) ND ND MS29A ND ND
276 2 MLabc (pseudo) MA138A MP3889c ND ND ND
295 3 ML2530 MA141A MP0289 MS32A ND ND
- 4 No duplication No duplication No duplication No duplication No duplication No duplication
300 5 ML1540 MA310A MP1794 ND ND ND
B AAA+ class ATPases, CBXX/CFQX family, SpoVK, 1× ATP/GTP-binding site, 29-39% homology 573 1 ML0055 ND ND MS29B ND ND
619 2 ML0039(pseudo) MA177B MP3884c ND ND ND
631 3 ML2537 MA78B MP0282 MS32B ND ND
- 4 No duplication No duplication No duplication No duplication No duplication No duplication
610 5 ML1536 MA310B MP1798 ND ND ND
C Amino-terminal transmembrane protein, possible ATP/GTP-binding motif, 31-41% homology 480 1 ML0054 ND ND MS29C ND ND
495 2 Deleted MA144C MP3895c ND ND ND
538 3 ML2536 MA78C MP0283 MS32C ND ND
470 4 Deleted MA94C MP3450c MS8C CORDmem SC3C3.07
506 5 ML1544 MA221C MP1782 ND ND ND
D DNA segregation ATPase, ftsK chromosome partitioning protein, SpoIIIE, yukA, 3× ATP/GTP-binding sites, 2× amino-terminal transmembrane protein, 28-39% homology 747 + 591 1 ML0053+52 ND ND MS29D (Stop$) ND ND
1396 2 Deleted MA144D MP3894c ND ND ND
1330 3 ML2535 MA78D MP0284 MS32D ND ND
1236 4 Deleted MA504D MP3447c MS8D CORDyuk SC3C3.20c
435+932 5 ML1543 MA221D MP1783 ND ND ND
E PE, 18-90% homology 99 1 Deleted ND ND MS29E ND ND
77 2 Deleted MA138E MP3893c ND ND ND
102 3 ML2534 MA78E MP0285 MS32E ND ND
- 4 No duplication No duplication No duplication No duplication No duplication No duplication
99 & 99 5 Deleted MA310Ea & Eb MP1788 & 91 ND ND ND
F PPE, 19-88% homology 368 1 ML0051 ND ND MS29F ND ND
399 2 Deleted MA138F MP3892c ND ND ND
513 3 ML2533 (pseudo) MA78F MP0286 MS32F ND ND
- 4 No duplication No duplication No duplication No duplication No duplication No duplication
365, 393 & 350 5 Deleted MA310Fa & Fb & Fc MP1787 & 89 & 90 ND ND ND
G lhp or CFP-10, also MTSA-10, grouped into ESAT-6 family, potent secreted T-cell antigens, 9-32% homology 100 1 ML0050 ND ND MS29G ND SC3C3.10 and SC3C3.11(c)
107 2 Deleted MA138G MP3891c § ND ND ND
97 3 ML2532 MA141G MP0287 MS32G ND ND
125 4 Deleted MA319G MP3445c MS8G CORDcfp10 ND
98 5 MLcfp (pseudo) MA310G MP1792 ND ND ND
H ESAT-6 family, cfp7, L45 or l-esat, also Mtb9.9 family, potent secreted T-cell antigens, 15-27% homology 95 1 ML0049 ND ND MS29H ND SC3C3.10 and SC3C3.11
95 2 ML0034 (pseudo) MA138H MP3890c § ND ND ND
96 3 ML2531 MA141H MP0288 MS32H ND ND
100 4 ML0363 MA319H MP3444c MS8H CORDesat6 ND
94 5 MLesat (pseudo) MA310H MP1793 ND ND ND
I ATPases involved in chromosome partitioning, 1× ATP/GTP-binding motif, 33% homology 666 1 ML0048 ND ND MS29I ND SC3C3.03c
341 2 ML0035 (pseudo) MA138I MP3888c ND ND ND
- 3 No duplication No duplication No duplication No duplication No duplication No duplication
- 4 No duplication No duplication No duplication No duplication No duplication No duplication
- 5 No duplication No duplication No duplication No duplication No duplication No duplication
J Integral inner membrane protein, binding-protein-dependent transport systems inner membrane component signature, putative transporter protein, 19-27% homology 511 1 ML0047 ND ND MS29J ND ND
509 2 ML0036 (pseudo) MA138J MP3887c ND ND ND
472 3 ML2529 MA141J MP0290 MS32J ND ND
467 4 Deleted MA504J MP3448 MS8J CORDtransporter SC3C3.21
503 5 ML1539 MA310J MP1795 ND ND ND
K Mycosins, subtilisin-like cell-wall associated serine proteases, 43-49% homology 446 1 ML0041 ND ND MS65K ND ND
550 2 ML0037 (pseudo) MA177K MP3886c ND ND ND
461 3 ML2528 MA141K MP0291 MS32K ND ND
455 4 Deleted MA439K MP3449 MS8K CORDsub SC3C3.17c and SC3C3.08
585 5 ML1538 MA310K MP1796 ND ND ND
L 2× amino-terminal transmembrane protein, 16-27% homology 462 1 ML0042 ND ND MS65L ND ND
537 2 ML0038 (pseudo) MA177L MP3885c ND ND ND
331 3 ML2527 MA81L MP0292 MS32L ND ND
- 4 No duplication No duplication No duplication No duplication No duplication No duplication
406 5 ML1537 MA310L MP1797 ND ND ND
Other region-specific genes of known functions (not assigned to a family)
Region 5 (not present in M. smegmatis, C. diphtheriae and S. coelicolor) Rv1785c Probable member of the cytochrome P450 family (pseudogene in M. leprae)
Rv1786 Probable ferredoxin (pseudogene in M. leprae)
Other region-specific genes of unknown functions (not assigned to a family)
Region 1 (deleted in M. avium and M. paratuberculosis, not present in C. diphtheriae and S. coelicolor) Rv3867 Unknown, annotated as part of MT3980 (Rv3866) in the M. tuberculosis CDC1551 sequence with a frameshift (functional in M. leprae)
Rv3878 Unknown, some similarity to PPE family, deleted with RD1 deletion region in M. bovis BCG (pseudogene in M. leprae)
Rv3879c Unknown, repetitive, highly proline-rich N-terminus, deleted with RD1 deletion region in M. bovis BCG (pseudogene in M. leprae)
Rv3880c Unknown (functional in M. leprae)
Rv3881c Unknown (pseudogene in M. leprae)
Region 4 (not present in S. coelicolor) Rv3446c Unknown, possibly contains an ABC transporter signature (deleted in M. leprae)

*Names of genes of these organisms were given arbitrarily by the authors of this paper.
Gene not identified by BLAST; data obtained from [1], GenBank accession nos. U34848 and AAC44033.
The gene is present in the sequence but not annotated (name given arbitrarily by the authors of this paper).
§Genes identified by BLAST as well as from GenBank data, accession no. AJ250015.
Orthologs in S. coelicolor are equally similar to families G and H.
ND, not detected: not necessarily absent from the genome, but possibly undetected because the sequencing is unfinished.
No duplication, no duplicate of this gene is present in this region.
No sequence data, no sequence data are available for this organism; published deletion information is included ([1] and others).
Deleted, deleted from the genome of this particular species or strain (#, deleted in only some strains of this species).
Frame, frameshift.
Stop, in-frame stop codon.
Stop$, stop codon corresponding to the stop codon in M. tuberculosis H37Rv that splits the gene into Rv3870 and Rv3871.
Pseudo, confirmed pseudogene due to multiple frameshifts and stop codons.
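The table abbreviates adjacent loci (for example "Rv3870+71" or "Rv1787 & 89 & 90") and marks cells with the footnote codes above. The Python sketch below is a hypothetical helper, not part of the original paper, showing one way to decode a single cell. It assumes that a bare number replaces the trailing digits of the preceding full locus tag (so "Rv1788 & 91" means Rv1788 and Rv1791), and that a two-letter suffix such as "Fb" replaces the trailing letter pair of the preceding tag. The names `parse_cell`, `expand_loci`, `PLACEHOLDERS` and `FLAGS`, and the lowercase labels, are our own.

```python
from __future__ import annotations

import re

# Cell values that stand for "no locus", as defined in the table footnote.
# Keys are copied from the table; the lowercase labels are our own wording.
PLACEHOLDERS = {
    "ND": "not detected",
    "No duplication": "no duplication",
    "No sequence data": "no sequence data",
    "Deleted": "deleted",
}

# Parenthetical annotations used in the table, keyed by lowercased text.
FLAGS = {
    "frame": "frameshift",
    "stop": "in-frame stop codon",
    "stop$": "stop codon matching the H37Rv Rv3870/Rv3871 split",
    "pseudo": "confirmed pseudogene",
    "partly deleted": "partly deleted",
}


def expand_loci(text: str) -> list[str]:
    """Expand shorthand such as 'Rv3870+71' or 'MTB196Fa & Fb & Fc'."""
    parts = [p for p in re.split(r"\s*(?:\+|&|\band\b)\s*", text.strip()) if p]
    loci: list[str] = []
    base = ""
    for part in parts:
        if part.isdigit() and base:
            # A bare number replaces the trailing digits of the last full tag.
            loci.append(base[: -len(part)] + part)
        elif re.fullmatch(r"[A-Z][a-z]", part) and re.search(r"[A-Z][a-z]$", base):
            # A two-letter suffix ('Fb') replaces the tag's trailing letter pair.
            loci.append(base[:-2] + part)
        else:
            # Anything else is taken as a complete locus tag.
            base = part
            loci.append(part)
    return loci


def parse_cell(cell: str) -> dict:
    """Split one already-isolated table cell into loci, status and flags."""
    text = cell.strip()
    some_strains_only = "#" in text  # '#': deleted in only some strains
    bare = text.rstrip(" #")
    if bare in PLACEHOLDERS:
        return {"loci": [], "status": PLACEHOLDERS[bare],
                "flags": [], "some_strains_only": some_strains_only}
    flags = []
    for note in re.findall(r"\(([^)]*)\)", text):
        key = note.replace("#", "").strip().lower()
        if key in FLAGS:  # unknown notes such as '(c)' stay in the tag
            flags.append(FLAGS[key])
            text = text.replace(f"({note})", "")
    text = text.replace("#", "").replace("§", "")
    return {"loci": expand_loci(text), "status": "present",
            "flags": flags, "some_strains_only": some_strains_only}


print(parse_cell("Rv1787 & 89 & 90")["loci"])  # ['Rv1787', 'Rv1789', 'Rv1790']
print(parse_cell("MB727.3A (partly deleted #)"))
```

Splitting a flattened row into cells is deliberately left out, because values such as "No sequence data" contain spaces; the sketch assumes the cells have already been separated, for example from a tab-delimited copy of the table.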