Table 2.
Classification and nomenclature of CRISPR-associated genes*
Proposed gene name‡ | System type or subtype | Name from Haft et al.§ | Name from Brouns et al.∥ | Structure of encoded protein (PDB accessions)¶ | Families (and superfamily) of encoded protein#** | Representatives |
---|---|---|---|---|---|---|
cas1 | Type I, Type II, Type III | cas1 | cas1 | 3GOD, 3LFX and 2YZS | COG1518 | SERP2463, SPy1047 and ygbT |
cas2 | Type I, Type II, Type III | cas2 | cas2 | 2IVY, 2I8E and 3EXC | COG1343 and COG3512 | SERP2462, SPy1048, SPy1723 (N-terminal domain) and ygbF |
cas3′ | Type I‡‡ | cas3 | cas3 | NA | COG1203 | APE1232 and ygcB |
cas3″ | Subtype I-A, Subtype I-B | NA | NA | NA | COG2254 | APE1231 and BH0336 |
cas4 | Subtype I-A, Subtype I-B, Subtype I-C, Subtype I-D, Subtype II-B | cas4 and csa1 | NA | NA | COG1468 | APE1239 and BH0340 |
cas5 | Subtype I-A, Subtype I-B, Subtype I-C, Subtype I-E | cas5a, cas5d, cas5e, cas5h, cas5p, cas5t and cmx5 | casD | 3KG4 | COG1688 (RAMP) | APE1234, BH0337, devS and ygcI |
cas6 | Subtype I-A, Subtype I-B, Subtype I-D, Subtype III-A, Subtype III-B | cas6 and cmx6 | NA | 3I4H | COG1583 and COG5551 (RAMP) | PF1131 and slr7014 |
cas6e | Subtype I-E | cse3 | casE | 1WJ9 | (RAMP) | ygcH |
cas6f | Subtype I-F | csy4 | NA | 2XLJ | (RAMP) | y1727 |
cas7 | Subtype I-A, Subtype I-B, Subtype I-C, Subtype I-E | csa2, csd2, cse4, csh2, csp1 and cst2 | casC | NA | COG1857 and COG3649 (RAMP) | devR and ygcJ |
cas8a1 | Subtype I-A‡‡ | cmx1, cst1, csx8, csx13 and CXXC-CXXC | NA | NA | BH0338-like | LA3191§§ and PG2018§§ |
cas8a2 | Subtype I-A‡‡ | csa4 and csx9 | NA | NA | PH0918 | AF0070, AF1873, MJ0385, PF0637, PH0918 and SSO1401 |
cas8b | Subtype I-B‡‡ | csh1 and TM1802 | NA | NA | BH0338-like | MTH1090 and TM1802 |
cas8c | Subtype I-C‡‡ | csd1 and csp2 | NA | NA | BH0338-like | BH0338 |
cas9 | Type II‡‡ | csn1 and csx12 | NA | NA | COG3513 | FTN_0757 and SPy1046 |
cas10 | Type III‡‡ | cmr2, csm1 and csx11 | NA | NA | COG1353 | MTH326, Rv2823c§§ and TM1794§§ |
cas10d | Subtype I-D‡‡ | csc3 | NA | NA | COG1353 | slr7011 |
csy1 | Subtype I-F‡‡ | csy1 | NA | NA | y1724-like | y1724 |
csy2 | Subtype I-F | csy2 | NA | NA | (RAMP) | y1725 |
csy3 | Subtype I-F | csy3 | NA | NA | (RAMP) | y1726 |
cse1 | Subtype I-E‡‡ | cse1 | casA | NA | YgcL-like | ygcL |
cse2 | Subtype I-E | cse2 | casB | 2ZCA | YgcK-like | ygcK |
csc1 | Subtype I-D | csc1 | NA | NA | alr1563-like (RAMP) | alr1563 |
csc2 | Subtype I-D | csc1 and csc2 | NA | NA | COG1337 (RAMP) | slr7012 |
csa5 | Subtype I-A | csa5 | NA | NA | AF1870 | AF1870, MJ0380, PF0643 and SSO1398 |
csn2 | Subtype II-A | csn2 | NA | NA | SPy1049-like | SPy1049 |
csm2 | Subtype III-A‡‡ | csm2 | NA | NA | COG1421 | MTH1081 and SERP2460 |
csm3 | Subtype III-A | csc2 and csm3 | NA | NA | COG1337 (RAMP) | MTH1080 and SERP2459 |
csm4 | Subtype III-A | csm4 | NA | NA | COG1567 (RAMP) | MTH1079 and SERP2458 |
csm5 | Subtype III-A | csm5 | NA | NA | COG1332 (RAMP) | MTH1078 and SERP2457 |
csm6 | Subtype III-A | APE2256 and csm6 | NA | 2WTE | COG1517 | APE2256 and SSO1445 |
cmr1 | Subtype III-B | cmr1 | NA | NA | COG1367 (RAMP) | PF1130 |
cmr3 | Subtype III-B | cmr3 | NA | NA | COG1769 (RAMP) | PF1128 |
cmr4 | Subtype III-B | cmr4 | NA | NA | COG1336 (RAMP) | PF1126 |
cmr5 | Subtype III-B‡‡ | cmr5 | NA | 2ZOP and 2OEB | COG3337 | MTH324 and PF1125 |
cmr6 | Subtype III-B | cmr6 | NA | NA | COG1604 (RAMP) | PF1124 |
csb1 | Subtype I-U | GSU0053 | NA | NA | (RAMP) | Balac_1306 and GSU0053 |
csb2 | Subtype I-U§§ | NA | NA | NA | (RAMP) | Balac_1305 and GSU0054 |
csb3 | Subtype I-U | NA | NA | NA | (RAMP) | Balac_1303# |
csx17 | Subtype I-U | NA | NA | NA | NA | Btus_2683 |
csx14 | Subtype I-U | NA | NA | NA | NA | GSU0052 |
csx10 | Subtype I-U | csx10 | NA | NA | (RAMP) | Caur_2274 |
csx16 | Subtype III-U | VVA1548 | NA | NA | NA | VVA1548 |
csaX | Subtype III-U | csaX | NA | NA | NA | SSO1438 |
csx3 | Subtype III-U | csx3 | NA | NA | NA | AF1864 |
csx1 | Subtype III-U | csa3, csx1, csx2, DXTHG, NE0113 and TIGR02710 | NA | 1XMX and 2I71 | COG1517 and COG4006 | MJ1666, NE0113, PF1127 and TM1812 |
csx15 | Unknown | NA | NA | NA | TTE2665 | TTE2665 |
csf1 | Type U | csf1 | NA | NA | NA | AFE_1038 |
csf2 | Type U | csf2 | NA | NA | (RAMP) | AFE_1039 |
csf3 | Type U | csf3 | NA | NA | (RAMP) | AFE_1040 |
csf4 | Type U | csf4 | NA | NA | NA | AFE_1037 |
N, amino; NA, not applicable; RAMP, repeat-associated mysterious protein.
*Includes the names of all genes that have been shown to function within the CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) systems and/or are associated with CRISPR–cas loci in diverse genomes. Genes that are associated with CRISPR–cas loci in only one or a few closely related genomes are not included. Subsequent to their original publication (REF. 13), Haft et al. introduced a number of new types of CRISPR–Cas systems as well as gene names that are included in the TIGRFAMs database (REF. 50) but mostly fit into previously described gene and protein families.
‡The updated TIGRFAMs identifiers are given in Supplementary information S4 (table). The csx names are temporarily given to cas genes that cannot be confidently included in any of the large cas families but are currently not characterized in sufficient detail to rule out the possibility of such assignments in the future. Beginning with release 10.1 (ftp://ftp.jcvi.org/pub/data/TIGRFAMs/), the hidden Markov model (HMM)-based classifiers in TIGRFAMs assign polythetic names reflecting the nomenclature changes described here while retaining the narrower protein family granularities of the original nomenclature (REF. 13).
§See REF. 13. Most of the families correspond to those proposed by Makarova et al. (REF. 14), with a few changes and additions.
∥See REF. 24.
¶All available structures are listed; see the Protein Data Bank (PDB).
#Tentative predictions based on weak sequence similarity, sequence length and gene order in an operon.
**See the clusters of orthologous groups of proteins (COGs) database.
‡‡These are signature genes for these CRISPR–Cas system types and subtypes.
§§Unclassified.
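
The correspondence between the earlier gene names (the "Name from Haft et al." column) and the proposed names can be treated as a lookup table. The following Python sketch is illustrative only: it transcribes a handful of rows from Table 2, and the names `ROWS`, `build_index` and `proposed_names` are hypothetical helpers, not part of TIGRFAMs or any published tool. It returns a list because some earlier names appear in more than one row; csc2, for example, is listed under both csc2 (subtype I-D) and csm3 (subtype III-A).

```python
from collections import defaultdict

# A few rows transcribed from Table 2: (proposed name, earlier names from Haft et al.).
# This is a partial, illustrative subset, not the full table.
ROWS = [
    ("cas5", ["cas5a", "cas5d", "cas5e", "cas5h", "cas5p", "cas5t", "cmx5"]),
    ("cas6e", ["cse3"]),
    ("cas6f", ["csy4"]),
    ("cas7", ["csa2", "csd2", "cse4", "csh2", "csp1", "cst2"]),
    ("cas9", ["csn1", "csx12"]),
    ("csc1", ["csc1"]),
    ("csc2", ["csc1", "csc2"]),
    ("csm3", ["csc2", "csm3"]),
]


def build_index(rows):
    """Invert the rows into a mapping: earlier name -> set of proposed names."""
    index = defaultdict(set)
    for proposed, earlier_names in rows:
        for earlier in earlier_names:
            index[earlier].add(proposed)
    return index


EARLIER_TO_PROPOSED = build_index(ROWS)


def proposed_names(earlier):
    """Return the proposed names for an earlier name, sorted; empty list if unknown."""
    return sorted(EARLIER_TO_PROPOSED.get(earlier, ()))


if __name__ == "__main__":
    for name in ("csn1", "cse3", "csc2"):
        print(name, "->", proposed_names(name))
```

Keeping the values as sets makes the one-to-many cases explicit, so a caller has to decide how to resolve an ambiguous earlier name (for instance, by also checking the system subtype) rather than silently receiving a single answer.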