Table 2. Presence of genes from the ESAT-6 gene clusters in all available finished and unfinished genome sequences

Presence and names of genes in each species

| Gene family | Description | Protein size in M. tuberculosis (aa) | ESAT-6 cluster region | M. tuberculosis H37Rv | M. tuberculosis CDC1551 (CSU#93) | M. tuberculosis* 210 | M. bovis* AF2122/97 (spoligotype 9) | M. bovis* BCG Pasteur 1173P2 |
|---|---|---|---|---|---|---|---|---|
| A | ABC transporter family signature, 19-27% homology | 283 | 1 | Rv3866 | MT3980 | ND | MB851A | No sequence data |
| | | 276 | 2 | Rv3889c | MT4004 | MTB12A | MB727.3A (partly deleted #) | No sequence data |
| | | 295 | 3 | Rv0289 | MT0302 | MTB203A | MB548A | No sequence data |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 300 | 5 | Rv1794 | MT1843 | MTB196A | MB557A | No sequence data |
| B | AAA+ class ATPases, CBXX/CFQX family, SpoVK, 1× ATP/GTP-binding site, 29-39% homology | 573 | 1 | Rv3868 | MT3981 | MTB44B | MB851B | No sequence data |
| | | 619 | 2 | Rv3884c | MT3999 | MTB12B | MB727.1B | No sequence data |
| | | 631 | 3 | Rv0282 | MT0295 | MTB23B | MB672B | No sequence data |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 610 | 5 | Rv1798 | MT1847 | MTB196B | MB542B | No sequence data |
| C | Amino-terminal transmembrane protein, possible ATP/GTP-binding motif, 31-41% homology | 480 | 1 | Rv3869 | MT3982 | MTB44C | MB851C | No sequence data |
| | | 495 | 2 | Rv3895c | MT4011 | MTB136C | MB780.1C | No sequence data |
| | | 538 | 3 | Rv0283 | MT0296 | MTB23C | MB672C | No sequence data |
| | | 470 | 4 | Rv3450c | MT3556 | MTB45C | MB493.1C | No sequence data |
| | | 506 | 5 | Rv1782 | MT1832 | MTB46C | MB771.1C | No sequence data |
| D | DNA segregation ATPase, ftsK chromosome partitioning protein, SpoIIIE, yukA, 3× ATP/GTP-binding sites, 2× amino-terminal transmembrane protein, 28-39% homology | 747 + 591 | 1 | Rv3870+71 | MT3983+85 | MTB44Da+Db | MB851D | MB851D (partly deleted) |
| | | 1396 | 2 | Rv3894c | MT4010 | MTB3D | MB780.1D | No sequence data |
| | | 1330 | 3 | Rv0284 | MT0297 | MTB23D | MB672D | No sequence data |
| | | 1236 | 4 | Rv3447c | MT3553 | MTB45D | MB585.1D | No sequence data |
| | | 435 + 932 | 5 | Rv1783+84 | MT1833 | MTB46Da+Db | MB771.1D | No sequence data |
| E | PE, 18-90% homology | 99 | 1 | Rv3872 | MT3986 | MTB44E | MB851E | Deleted |
| | | 77 | 2 | Rv3893c | MT4008 | MTB3E | MB780.1E | No sequence data |
| | | 102 | 3 | Rv0285 | MT0298 | MTB23E | MB389E | No sequence data |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 99 & 99 | 5 | Rv1788 & 91 | MT1837 & 40 | MTB196Ea & Eb | MB771.0E & MB557E | No sequence data |
| F | PPE, 19-88% homology | 368 | 1 | Rv3873 | MT3987 | MTB44F | MB851F | Deleted |
| | | 399 | 2 | Rv3892c | MT4007 | MTB3F | MB780.1F | No sequence data |
| | | 513 | 3 | Rv0286 | MT0299 | MTB472F | MB528F | No sequence data |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 365, 393 & 350 | 5 | Rv1787 & 89 & 90 | MT1836 & 38 & 39 | MTB196Fa & Fb & Fc | MB771.0Fa & Fb & MB557F | No sequence data |
| G | lhp or CFP-10, also MTSA-10, grouped into ESAT-6 family, potent secreted T-cell antigens, 9-32% homology | 100 | 1 | Rv3874 | MT3988 | MTB44G | MB851G | Deleted |
| | | 107 | 2 | Rv3891c | MT4006 | MTB12G | MB727.3G | No sequence data |
| | | 97 | 3 | Rv0287 | MT0300 | MTB472G | MB548G | No sequence data |
| | | 125 | 4 | Rv3445c | MT3550 | MTB45G | MB585.0G | No sequence data |
| | | 98 | 5 | Rv1792 (Stop) | MT1841 (Stop) | MTB196G (Stop) | MB557G | No sequence data |
| H | ESAT-6 family, cfp7, L45 or l-esat, also Mtb9.9 family, potent secreted T-cell antigens, 15-27% homology | 95 | 1 | Rv3875 | MT3989 | MTB44H | MB851H † | Deleted |
| | | 95 | 2 | Rv3890c | MT4005 | MTB12H | MB727.3H | No sequence data |
| | | 96 | 3 | Rv0288 | MT0301 | MTB203H | MB548H | No sequence data |
| | | 100 | 4 | Rv3444c | MT3549 | MTB45H | MB585.0H | No sequence data |
| | | 94 | 5 | Rv1793 | MT1842 | MTB196H | MB557H | No sequence data |
| I | ATPases involved in chromosome partitioning, 1× ATP/GTP-binding motif, 33% homology | 666 | 1 | Rv3876 | MT3990 | MTB60I | MB477I | Deleted |
| | | 341 | 2 | Rv3888c | MT4003 | MTB12I | Deleted # | No sequence data |
| | | - | 3 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | - | 5 | No duplication | No duplication | No duplication | No duplication | No duplication |
| J | Integral inner membrane protein, binding-protein-dependent transport systems inner membrane component signature, putative transporter protein, 19-27% homology | 511 | 1 | Rv3877 | MT3991 | MTB369J | MB477J | Deleted |
| | | 509 | 2 | Rv3887c | MT4002 | MTB12J | MB727.3J (partly deleted #) | No sequence data |
| | | 472 | 3 | Rv0290 | MT0303 | MTB203J | MB548J | No sequence data |
| | | 467 | 4 | Rv3448 | MT3554 | MTB45J | MB585.1J | No sequence data |
| | | 503 | 5 | Rv1795 | MT1844 | MTB196J | MB506J | No sequence data |
| K | Mycosins, subtilisin-like cell-wall-associated serine proteases, 43-49% homology | 446 | 1 | Rv3883c | MT3998 | MTB12Ka | MB727.0K | No sequence data |
| | | 550 | 2 | Rv3886c | MT4001 (Frame) | MTB12Kb | MB727.2K | No sequence data |
| | | 461 | 3 | Rv0291 | MT0304 | MTB203K | MB548K | No sequence data |
| | | 455 | 4 | Rv3449 | MT3555 | MTB45K | MB585.1K | No sequence data |
| | | 585 | 5 | Rv1796 | MT1845 | MTB196K | MB506K | No sequence data |
| L | 2× amino-terminal transmembrane protein, 16-27% homology | 462 | 1 | Rv3882c | MT3997 | MTB12La | MB727.0L | No sequence data |
| | | 537 | 2 | Rv3885c | MT4000 (Frame) | MTB12Lb | MB727.2L | No sequence data |
| | | 331 | 3 | Rv0292 | MT0305 | MTB203L | MB694.0L | No sequence data |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 406 | 5 | Rv1797 | MT1846 | MTB196L | MB542L | No sequence data |

Presence and names of genes in each species

| Gene family | Description | Protein size in M. tuberculosis (aa) | ESAT-6 cluster region | M. leprae TN | M. avium* 104 | M. paratuberculosis K-10 | M. smegmatis* mc²155 | C. diphtheriae* NCTC13129 | S. coelicolor A3(2) |
|---|---|---|---|---|---|---|---|---|---|
| A | ABC transporter family signature, 19-27% homology | 283 | 1 | ML0057 (pseudo) | ND | ND | MS29A | ND | ND |
| | | 276 | 2 | MLabc (pseudo)‡ | MA138A | MP3889c | ND | ND | ND |
| | | 295 | 3 | ML2530 | MA141A | MP0289 | MS32A | ND | ND |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 300 | 5 | ML1540 | MA310A | MP1794 | ND | ND | ND |
| B | AAA+ class ATPases, CBXX/CFQX family, SpoVK, 1× ATP/GTP-binding site, 29-39% homology | 573 | 1 | ML0055 | ND | ND | MS29B | ND | ND |
| | | 619 | 2 | ML0039 (pseudo) | MA177B | MP3884c | ND | ND | ND |
| | | 631 | 3 | ML2537 | MA78B | MP0282 | MS32B | ND | ND |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 610 | 5 | ML1536 | MA310B | MP1798 | ND | ND | ND |
| C | Amino-terminal transmembrane protein, possible ATP/GTP-binding motif, 31-41% homology | 480 | 1 | ML0054 | ND | ND | MS29C | ND | ND |
| | | 495 | 2 | Deleted | MA144C | MP3895c | ND | ND | ND |
| | | 538 | 3 | ML2536 | MA78C | MP0283 | MS32C | ND | ND |
| | | 470 | 4 | Deleted | MA94C | MP3450c | MS8C | CORDmem | SC3C3.07 |
| | | 506 | 5 | ML1544 | MA221C | MP1782 | ND | ND | ND |
| D | DNA segregation ATPase, ftsK chromosome partitioning protein, SpoIIIE, yukA, 3× ATP/GTP-binding sites, 2× amino-terminal transmembrane protein, 28-39% homology | 747 + 591 | 1 | ML0053+52 | ND | ND | MS29D (Stop$) | ND | ND |
| | | 1396 | 2 | Deleted | MA144D | MP3894c | ND | ND | ND |
| | | 1330 | 3 | ML2535 | MA78D | MP0284 | MS32D | ND | ND |
| | | 1236 | 4 | Deleted | MA504D | MP3447c | MS8D | CORDyuk | SC3C3.20c |
| | | 435 + 932 | 5 | ML1543 | MA221D | MP1783 | ND | ND | ND |
| E | PE, 18-90% homology | 99 | 1 | Deleted | ND | ND | MS29E | ND | ND |
| | | 77 | 2 | Deleted | MA138E | MP3893c | ND | ND | ND |
| | | 102 | 3 | ML2534 | MA78E | MP0285 | MS32E | ND | ND |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 99 & 99 | 5 | Deleted | MA310Ea & Eb | MP1788 & 91 | ND | ND | ND |
| F | PPE, 19-88% homology | 368 | 1 | ML0051 | ND | ND | MS29F | ND | ND |
| | | 399 | 2 | Deleted | MA138F | MP3892c | ND | ND | ND |
| | | 513 | 3 | ML2533 (pseudo) | MA78F | MP0286 | MS32F | ND | ND |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 365, 393 & 350 | 5 | Deleted | MA310Fa & Fb & Fc | MP1787 & 89 & 90 | ND | ND | ND |
| G | lhp or CFP-10, also MTSA-10, grouped into ESAT-6 family, potent secreted T-cell antigens, 9-32% homology | 100 | 1 | ML0050 | ND | ND | MS29G | ND | SC3C3.10 and SC3C3.11(c) |
| | | 107 | 2 | Deleted | MA138G | MP3891c § | ND | ND | ND |
| | | 97 | 3 | ML2532 | MA141G | MP0287 | MS32G | ND | ND |
| | | 125 | 4 | Deleted | MA319G | MP3445c | MS8G | CORDcfp10 | ND |
| | | 98 | 5 | MLcfp (pseudo)‡ | MA310G | MP1792 | ND | ND | ND |
| H | ESAT-6 family, cfp7, L45 or l-esat, also Mtb9.9 family, potent secreted T-cell antigens, 15-27% homology | 95 | 1 | ML0049 | ND | ND | MS29H | ND | SC3C3.10 and SC3C3.11¶ |
| | | 95 | 2 | ML0034 (pseudo) | MA138H | MP3890c § | ND | ND | ND |
| | | 96 | 3 | ML2531 | MA141H | MP0288 | MS32H | ND | ND |
| | | 100 | 4 | ML0363 | MA319H | MP3444c | MS8H | CORDesat6 | ND |
| | | 94 | 5 | MLesat (pseudo)‡ | MA310H | MP1793 | ND | ND | ND |
| I | ATPases involved in chromosome partitioning, 1× ATP/GTP-binding motif, 33% homology | 666 | 1 | ML0048 | ND | ND | MS29I | ND | SC3C3.03c |
| | | 341 | 2 | ML0035 (pseudo) | MA138I | MP3888c | ND | ND | ND |
| | | - | 3 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | - | 5 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| J | Integral inner membrane protein, binding-protein-dependent transport systems inner membrane component signature, putative transporter protein, 19-27% homology | 511 | 1 | ML0047 | ND | ND | MS29J | ND | ND |
| | | 509 | 2 | ML0036 (pseudo) | MA138J | MP3887c | ND | ND | ND |
| | | 472 | 3 | ML2529 | MA141J | MP0290 | MS32J | ND | ND |
| | | 467 | 4 | Deleted | MA504J | MP3448 | MS8J | CORDtransporter | SC3C3.21 |
| | | 503 | 5 | ML1539 | MA310J | MP1795 | ND | ND | ND |
| K | Mycosins, subtilisin-like cell-wall-associated serine proteases, 43-49% homology | 446 | 1 | ML0041 | ND | ND | MS65K | ND | ND |
| | | 550 | 2 | ML0037 (pseudo) | MA177K | MP3886c | ND | ND | ND |
| | | 461 | 3 | ML2528 | MA141K | MP0291 | MS32K | ND | ND |
| | | 455 | 4 | Deleted | MA439K | MP3449 | MS8K | CORDsub | SC3C3.17c and SC3C3.08 |
| | | 585 | 5 | ML1538 | MA310K | MP1796 | ND | ND | ND |
| L | 2× amino-terminal transmembrane protein, 16-27% homology | 462 | 1 | ML0042 | ND | ND | MS65L | ND | ND |
| | | 537 | 2 | ML0038 (pseudo) | MA177L | MP3885c | ND | ND | ND |
| | | 331 | 3 | ML2527 | MA81L | MP0292 | MS32L | ND | ND |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 406 | 5 | ML1537 | MA310L | MP1797 | ND | ND | ND |

Other region-specific genes of known function (not assigned to a family)

| Region | Gene | Description |
|---|---|---|
| Region 5 (not present in M. smegmatis, C. diphtheriae and S. coelicolor) | Rv1785c | Probable member of the cytochrome P450 family (pseudogene in M. leprae) |
| | Rv1786 | Probable ferredoxin (pseudogene in M. leprae) |

Other region-specific genes of unknown function (not assigned to a family)

| Region | Gene | Description |
|---|---|---|
| Region 1 (deleted in M. avium and M. paratuberculosis; not present in C. diphtheriae and S. coelicolor) | Rv3867 | Unknown; annotated as part of MT3980 (Rv3866) in the M. tuberculosis CDC1551 sequence with a frameshift (functional in M. leprae) |
| | Rv3878 | Unknown; some similarity to the PPE family; deleted with the RD1 deletion region in M. bovis BCG (pseudogene in M. leprae) |
| | Rv3879c | Unknown; repetitive, highly proline-rich N-terminus; deleted with the RD1 deletion region in M. bovis BCG (pseudogene in M. leprae) |
| | Rv3880c | Unknown (functional in M. leprae) |
| | Rv3881c | Unknown (pseudogene in M. leprae) |
| Region 4 (not present in S. coelicolor) | Rv3446c | Unknown; may contain an ABC transporter signature (deleted in M. leprae) |

*Gene names for these organisms were assigned arbitrarily by the authors of this paper.
†Gene not identified by BLAST; data obtained from [1] and GenBank accession nos. U34848 and AAC44033.
‡The gene is present in the sequence but not annotated (name assigned arbitrarily by the authors of this paper).
§Genes identified by BLAST and from GenBank data, accession no. AJ250015.
¶Orthologs in S. coelicolor are equally similar to families G and H.
ND, not detected: not necessarily absent from the genome, but possibly undetected because sequencing is unfinished.
No duplication, no duplication of this gene is present in this region.
No sequence data, no sequence data are available for this organism; published deletion information is included ([1] and others).
Deleted, deleted from the genome of this particular species or strain (# = deleted in only some strains of this species).
Frame, frameshift.
Stop, in-frame stop codon.
Stop$, the stop codon corresponds to the stop codon in M. tuberculosis H37Rv that splits the gene into Rv3870 and Rv3871.
Pseudo, confirmed pseudogene due to multiple frameshifts and stop codons.