Table 3.
Functional and sequence characteristics of subfamilies
| Grouping | Subfamily | Functional characteristics |
|---|---|---|
| Snf2-like | Snf2 | The archetype of the Snf2 subfamily, and the entire Snf2 family, is the S.cerevisiae Snf2 protein, originally identified genetically (mutations that were Sucrose Non Fermenting or defective in mating type SWItching). However, these genes were later found to play roles in regulating transcription of a broader spectrum of genes and to catalyse alterations to chromatin structure. Subsequently, the proteins were purified as a non-essential 11 subunit multi-protein complex capable of ATP-dependent chromatin disruption termed the SWI/SNF complex (8) |
| Close sequence homologues have also been identified in many model organisms, including the paralogue RSC (60,61) and the orthologues D.melanogaster Brahma (62), and human hBRM (63) and BRG1 (64). Many of these have been shown to alter the structure of chromatin at the nucleosomal level and to be involved in transcription regulation, although other nucleosome-related roles have also been identified (8). Recent hypotheses have centred on Snf2 subfamily members performing a generally disruptive function on nucleosomes, leading to sliding of the nucleosome (65,66), alterations of histone–DNA contacts (67), or partial or complete removal of the histone octamer components (68,69) | ||
| Homologues such as BRG1 and hBRM are components of megadalton-sized complexes containing other proteins that are also related to components of the yeast SWI/SNF complex (70). However, Snf2 subfamily members have also been reported to interact with additional proteins including histone deacetylases (71), methyl DNA-binding proteins (72), histone methyl transferases (73), the retinoblastoma tumor suppressor protein (74,75), histone chaperones (76), Pol II (77,78) and cohesin (79). These complexes may be recruited to specific regions of the genome through interactions with sequence-specific DNA-binding proteins [reviewed by (80)] or specific patterns of histone modifications (81,82) | ||
| ISWI | Iswi (Imitation SWI2) protein was identified in D.melanogaster by similarity to Snf2p (83) and is at the catalytic core of both the NURF and the ACF/CHRAC chromatin remodelling complexes (84–86). Biochemical studies favour the ability of Iswi proteins to reposition rather than disrupt nucleosomes. Significantly, all Iswi subfamily proteins require a particular region of the histone H4 tail near the DNA surface as an allosteric effector (87–89) | |
| Iswi subfamily members participate in a variety of complexes and functional interactions. For example, human SNF2H has been found as part of RSF, hACF/WCRF, hWICH, hCHRAC, NoRC and also associated with cohesin, while SNF2L is the catalytic subunit of human NURF [summarized in (90)]. Such complexes are involved in a variety of functions including activation/repression of the initiation and elongation of transcription, replication and chromatin assembly [reviewed in (90–93)]. Similar to the Snf2 subfamily, Iswi subfamily members appear to be adaptable subunits for complexes related to the alteration of nucleosome positioning (90) | ||
| Lsh | Despite its name, the archetypal mouse Lsh (lymphoid-specific helicase) protein (94) is widely expressed and without detectable helicase activity. Lsh and its human homologue are alternatively known as PASG, SMARCA6 or by the official gene name HELLS (Helicase Lymphoid Specific). Mutants lead to premature aging with cells exhibiting replicative senescence (95). Importantly, global loss of CpG methylation is observed in both mammalian mutant cell lines and the Arabidopsis thaliana homologue, DDM1 (96,97). Consistent with a direct role in DNA methylation, Lsh is localized to heterochromatic regions (98). Evidence has been presented that A.thaliana DDM1 can slide nucleosomes in vitro (99). The S.cerevisiae subfamily member at locus YFR038W has no assigned name and deletion strains are viable | |
| Lsh subfamily members are detected over a very broad range of eukaryotes including not only fungi, plants and animals, but also protists where their function is likely to be independent of DNA methylation. However, our genome scans did not identify Lsh subfamily members in a number of lower animals, or in S.pombe. This may represent functional redundancy or difficulties in assigning distant homologues relative to other subfamilies in the grouping | ||
| ALC1 | The ALC1 subfamily derives its name from the observation that the human gene is ‘Amplified in Liver Cancer’ (100). Two alternative but potentially confusing names, CHD1L [CHD1-like (101)] and SNF2P [SNF2-like in plants (102)], have also been used to refer to subfamily members. ALC1 subfamily members contain a helicase-like region which is relatively similar to the nucleosome-active Snf2, Iswi and Chd1 subfamilies, but which is coupled to a C-terminal macro domain implicated in ADP–ribose interactions (103). ALC1 subfamily members can be identified in both higher animals and plants, but not in lower animals | |
| Chd1 | The archetypal ‘Chd’ protein is mouse Chd1, named after the presence of ‘Chromodomain, Helicase and DNA binding’ motifs (104). The characteristic chromodomain motifs can in principle bind diverse targets including proteins, DNA and RNA (105). Although Snf2 family proteins containing chromodomains are often referred to as a single ‘Chd’ subfamily, it has previously been recognized they fall into the same three distinct subfamilies (106,107) which we have distinguished in this analysis | |
| Mouse Chd1 protein is the archetype of the first chromodomain-containing subfamily. Chd1 proteins have been purified as single subunits (108,109) although associations between Chd1 and other proteins have been identified subsequently (110,111). Yeast Chd1 has been implicated in transcription elongation and termination (58,112), and the human CHD1 and CHD2 proteins and D.melanogaster dChd1 have been linked to transcriptional events (113). S.pombe Chd1 subfamily member hrp1 (helicase related protein 1) has been linked with both transcription termination (58) and chromosome condensation (114), whereas the paralogous hrp3 has been linked with locus-specific silencing (115) | ||
| Mi-2 | The second of the chromodomain-containing subfamilies is Mi-2, whose name derives from the Mi-2α and Mi-2β proteins which are the commonly used names for the human CHD3 and CHD4 gene products, respectively. Mi-2 was isolated as an autoantigen in the human disease dermatomyositis (116). Subsequently, the two proteins and their homologues in D.melanogaster and Xenopus have been recognized as core subunits of NuRD complexes which link DNA methylation to chromatin remodelling and deacetylation (117). The chromodomains in D.melanogaster Mi-2 are required for activity on nucleosome substrates (118). Human Mi-2α differs from Mi-2β principally by its additional C-terminal domain which directs complexes containing it for a specific transcriptional repression role (119). Since Mi-2 proteins are widely expressed but have specific roles, it has been suggested they may be directed by incorporation of different targeting subunits (117,120) | |
| An additional human member of the Mi-2 subfamily, CHD5, may have a role in neural development and neuroblastomas (121,122) although its biochemical associations are unknown. The A.thaliana subfamily member, PKL (swollen roots of mutants resemble a pickle), has been shown to play a role in repressing embryonic genes during plant development (123) | ||
| CHD7 | The third chromodomain-containing CHD7 subfamily includes four human genes, CHD6–CHD9. CHD7 has recently been linked to CHARGE syndrome which is a common cause of congenital abnormalities (124) with most linked mutations resulting in major nonsense, frameshift or splicing changes (125). There is little functional information available about CHD6 [originally known as CHD5 (126)], CHD8 or CHD9 [also known as CReMM (127)]. The most studied member of the CHD7 subfamily is the product of the D.melanogaster gene kismet. The enormous 574 kDa KIS-L (but not the ‘smaller’ 225 kDa KIS-S form) contains a Snf2 family helicase-like region (128). Although identified as a trithorax family gene acting during development, a recent report suggests that KIS-L may play a global role at an early stage in RNA pol II elongation (129) | |
| Swr1-like | Swr1 | The archetype of the Swr1 subfamily is Swr1p (SWI/SNF-related protein) from S.cerevisiae which is part of the large SWR1 complex that exchanges histone H2A.Z-containing for wild-type H2A-containing dimers (41–43,130). Three other characterized proteins belong to the subfamily: PIE1 is involved in the A.thaliana vernalization regulation [Photoperiod-Independent Early flowering (131)] through a pathway intimately linked with histone lysine methylation events (132). D.melanogaster Domino is an essential, development-linked protein (133) and alleles can suppress position effect variegation, implying they may be linked to heterochromatin functions. Domino participates in a complex that combines components of the homologous yeast SWR1 and NuA4 complexes (134), and has been shown to function in acetylation dependent histone variant exchange within the TRRAP/TIP60 complex (135). The human member SRCAP [Snf2p-related CBP activating protein (136)] acts as a transcriptional co-activator of steroid hormone dependent genes and has recently been shown to be a component of a human TRRAP/TIP60 complex (137) and other complexes (see EP400 below). It also interacts with several coactivators including CBP (136,138). SRCAP can rescue D.melanogaster domino mutants (139), implying functional homology |
| EP400 | The EP400 subfamily archetype, E1A-binding protein p400, appears to have a role in regulation of E1A-activated genes (140–142). EP400 has been shown to interact strongly with ruvB-like helicases TAP54α/β in the TRRAP/TIP60 histone acetyl transferase complex (142) | |
| The complex patterns of similarities and distinctions between EP400 and Swr1 subfamilies suggest a close functional relationship (Supplementary Figure S3A). Our HMM profiles clearly distinguish EP400 from Swr1 members, and show that EP400 members are restricted to vertebrates whereas Swr1 subfamily members are found in almost all eukaryotes (Table 2 and Supplementary Table S1A). Although most vertebrate genomes contain a gene each for an Swr1 and EP400 member, some have only one of the pair. The consensus for the helicase-like regions of the subfamilies shows 50% identity and animal members of both contain large proline and serine/threonine rich insertions at the major insertion site (Supplementary Figure S3A; see below). Members of the two subfamilies also contain overlapping combinations of accessory domains outside the helicase-like region: D.melanogaster DominoA (Swr1 subfamily) and human EP400 (EP400 subfamily) both contain SANT domains, whereas human SRCAP (Swr1 subfamily) instead contains an AT hook (139). This has led to some confusion, with human EP400 being referred to as hDomino although D.melanogaster Domino has higher similarity to human SRCAP (142) and SRCAP can complement Domino mutants (139). In addition to the complexity in primary sequence relationships, complexes of potentially overlapping composition exist involving human EP400 and SRCAP including the NCoR-1 histone deacetylase (143), TRRAP/TIP60 histone acetylase (137,142,144) and DMAP1 complex (145). The confusing overlap of the mammalian Swr1 and EP400 subfamily members may stem from multiple roles for Swr1 subfamily members in lower animals and fungi. For example, it has been suggested that D.melanogaster alternative splice isoforms DominoA and DominoB are functional homologues of EP400 and SRCAP, respectively (139), and that S.cerevisiae Eaf1p is a functional homologue of human EP400 (134) although it lacks both Snf2-related helicase-like and extended proline-rich regions | ||
| Ino80 | The archetype of the Ino80 subfamily is the Ino80 protein from S.cerevisiae. Further members have been identified by sequence similarity in fungi, plants and animals (24). Ino80p was first isolated through its role in transcriptional regulation of inositol biosynthesis (146,147) and forms part of the large Ino80.com complex (148). This complex is notable not only because it can reposition nucleosomes, but also because it is the only known Snf2 family-related complex able to separate DNA strands in a traditional helicase assay (148). However, the Ino80 complex contains two RuvB-like helicase subunits which may assist in this. The human INO80 complex has recently been shown to contain many proteins homologous to Ino80.com subunits, including the RuvB-like helicases, and to be capable of mobilizing mononucleosomes (149). S.cerevisiae INO80-deleted strains are sensitive to DNA damaging agents, and recent studies have implicated Ino80p directly in the events of double-stranded break repair (150,151), perhaps for the eviction of nucleosomes in the vicinity of the break (152) although other remodelling complexes such as Swr1, RSC and SWI/SNF may also participate in this repair pathway [reviewed in (153)] | |
| Etl1 | Mouse Etl1 (Enhancer Trap Locus 1) derives its name from identification in an expression screen for loci having interesting properties in early development (154). Although members are present in all but the lowest eukaryotes, including the human homologue SMARCAD1 (SMARCA containing DEAD box 1) (155) and S.cerevisiae FUN30 (Function Unknown 30) (156), very little attention has been focussed on these proteins. Etl1 is very widely expressed but non-essential, although deletion causes a variety of significant developmental phenotypes (157). FUN30 deletions are viable although temperature sensitive (158), and mutants show decreased sensitivity to UV radiation (159) | |
| Rad54-like | Rad54 | The archetype of the Rad54 subfamily is the Rad54 protein from S.cerevisiae which was isolated because its inactivation leads to increased sensitivity to ionizing radiation. Rad54p and its homologues in other organisms play an important but as yet incompletely understood role in homologous recombination by stimulating Rad51-mediated single strand invasion into the target duplex, and subsequent steps in the process (160,161). Many organisms also contain a second subfamily member, such as S.cerevisiae Rdh54p or S.pombe tid1p. These are frequently implicated in mitotic repair and meiotic crossover (162), although the role of the human homologue RAD54B is unclear (163) |
| Rad54 proteins have been extensively studied in vitro. They have been shown to be able to generate local changes in DNA topology in supercoiled plasmids (13,164,165), to translocate along DNA by biochemical (11) and other methods, and to alter the accessibility of nucleosomal DNA (11,166,167). However, this latter activity appears inefficient compared to purified complexes from the Snf2 and Iswi subfamilies. The crystal structure of the zebrafish Rad54 helicase-like region has been determined recently (47) and is discussed in more detail in the text | ||
| ATRX | The ATRX subfamily derives its name from the Alpha Thalassemia/Mental Retardation syndrome, X-linked genetic disorder caused by defects in the activity of the human member, ATRX (168). This protein is localized to centromeric heterochromatin (169), and purified complexes have been shown to increase the accessibility of nucleosomal DNA although with only moderate efficiency (12). ATRX has been implicated both in the regulation of transcription and heterochromatin structure (170), although the mechanism by which it acts is unclear | |
| Arip4 | Mouse androgen receptor interacting protein 4 (171) can bind to DNA and generate ATP-dependent local torsion (16). Although it can also bind nucleosomes, Arip4 does not appear to be able to alter their nuclease sensitivity, leading to the conclusion that nucleosome mobilization may not be its primary role (172). Interestingly, mutation of the six lysine sumoylation sites in the protein destroyed DNA binding and ATPase activity (172) | |
| DRD1 | A.thaliana DRD1 is named from its phenotype ‘Defective in RNA-directed DNA methylation’ (173). DRD1 functions together with an atypical RNA polymerase IV to establish and also remove non-CpG DNA methylation as part of an RNA interference mediated pathway (174,175) | |
| JBP2 | The JBP2 subfamily takes its name from the T.brucei J Binding Protein 2 which regulates insertion of an unusual glycosylated thymine-derived base, J, which marks silenced telomeric DNA (176) | |
| Rad54-like (cont.) | Both DRD1 and JBP2 are involved in processes which target modifications at the C5 position of the pyrimidine ring which will be exposed in the major groove. JBP2 and DRD1 members show sequence similarity, but have been conservatively assigned to separate subfamilies due to their distinct evolutionary ranges and the limited numbers of members available for building HMM profiles. The identification of a DRD1 subfamily member in Dictyostelium suggests that the subfamilies may be more widespread than indicated by the current small sample set. The L.major, T.brucei and T.cruzi genomes with JBP2 subfamily members also contain proteins assigned to the Arip4 subfamily whose members are otherwise found in higher fungi and animals, but not in DRD1-containing plants. It is possible that a relationship encompasses not only DRD1 and JBP2, but also Arip4 | |
| Rad5/16-like | Rad5/16 | S.cerevisiae Rad5 and Rad16 proteins are distinct but dual archetypes for this subfamily and both are intimately involved in DNA repair pathways |
| Rad5p acts with the Ubc13p–Mms2p E2 ligase complex via its RING finger in one fate of the Rad6 pathway of replication-linked DNA damage bypass to poly-ubiquitylate PCNA (177). It has also been suggested that Rad5p participates in double-stranded break repair in a role dependent on its helicase-like region but not its RING finger (178). A clear function for the helicase-like motor in either role has not been suggested | ||
| Rad16p acts in complex with Rad7p and Elc1p as the NEF4 nucleotide excision repair factor (179,180), possibly scanning along chromatin for lesions as part of non-transcribed strand repair (179) or by distorting DNA to expose the lesion for processing (17). Although the basis is not known, the RING finger of Rad16p influences the stability of the Rad4 protein responsible for recognizing the lesion (180) | ||
| Paradoxically, no DNA repair link has been reported for the single member of the Rad5/16 subfamily present in each mammalian genome such as human SMARCA3 (see also Lodestar and ERCC6 sections of this table). Instead, under the name RUSH1alpha, some have been reported as steroid regulated transcriptional regulators (181) and, under the name HLTF, to be silenced in cancers (182) | ||
| Ris1 | The Ris1 protein from S.cerevisiae interacts with Sir4p and has a role in mating type silencing (183). Members are found in all fungi and plant genomes, but not in animals or lower eukaryotes | |
| Lodestar | This subfamily is the only one within the Rad5/16 grouping which does not contain RING fingers in the major insertion site (Supplementary Figure S3B). D.melanogaster Lodestar protein was first identified as an essential cell-cycle regulated protein localizing to chromosomes during mitosis (184). Subsequently, the human homologue TTF2 was shown to terminate elongating RNA pol I and pol II complexes independently of transcript length, possibly by directly clearing Pol II from the template at the entry to mitosis (185). TTF2 may also have a role in interphase termination, and in repair (185). This suggestion is interesting because no clear functional homologue of S.cerevisiae Rad5p has been identified in the higher eukaryote genomes to which Lodestar is restricted (see also ERCC6 section of this table). TTF2 has been observed to rescue RNA polymerases stalled at lesions (186) | |
| SHPRH | SHPRH proteins derive their name from the ordered sequence of domains Snf2_N, Linker_Histone (i.e. H1), PHD finger, Zf_C3HC4 (i.e. RING finger), Helicase_C in the human member (187) (Supplementary Figure S3B). The Linker_Histone and PHD finger motifs are located adjacent to each other at the minor insertion site between motifs I and Ia, whereas the RING finger domain is located at the major insertion site. The linker histone-related domain in human SHPRH corresponds to the globular winged helix structure of histone H1 (188) and transcription factor HNF3 (189,190). The PHD finger motifs are specialized zinc finger structures which occur in a range of proteins involved in chromatin-mediated transcriptional regulation but whose exact function is unclear (191,192) | |
| Fungal SHPRH subfamily members typically do not contain the linker histone-related motif, although they do contain the PHD and RING finger domains. Animal SHPRH members contain an additional ∼50 kDa polypeptide sequence immediately upstream of the RING finger domain within the major insertion region (Supplementary Figure S3B). Although lacking an identifiable motif, this region has a number of cysteines suggestive of a zinc finger type coordination and has some 30% charged residues. Fungal members also contain a significant region of charged residues ahead of the RING finger | ||
| SSO1653-like | Mot1 | S.cerevisiae Mot1 protein (Modifier of Transcription) (193) and homologues with highly conserved helicase-like regions are present across fungi and all higher eukaryotes, where they are known as BTAF1 or TAF172 (194). In vitro and in vivo studies suggest that Mot1p interacts intimately with TBP (195), probably acting to recycle it from DNA-bound states (196). Mot1p is therefore thought to be a Snf2 family enzyme whose role is not to manipulate nucleosome structure, although a possible direct involvement with chromatin has also been proposed (197) |
| ERCC6 | Human ERCC6 protein (198), also known as Cockayne Syndrome B (CSB), and S.cerevisiae homologue Rad26p (199) were initially regarded as repair proteins due to effects on transcription coupled nucleotide excision repair. However, it has more recently been suggested that the function of these proteins may be to assist transcribing RNA polymerases to either pass or dissociate from blocking DNA lesions (200). Such a role would not directly involve chromatin. The consequent barriers to transcription elongation and sensitivity to DNA damage for non-functional mutants would explain features of Cockayne syndrome, and the role is analogous to that of the non-Snf2 family Mfd DNA translocase from Escherichia coli (201) | |
| Most higher animal genomes contain three separate genes assigned to the ERCC6 subfamily, along with single Lodestar and Rad5/16 subfamily members. Conversely, fungal genomes typically encode a single ERCC6 member, no Lodestar subfamily member, but at least two Rad5/16 members. This may reflect divergent strategies for accomplishing transcription-coupled repair | ||
| A number of mutations in the helicase region which result in Cockayne syndrome have been identified and these map to interesting locations in the Snf2 family crystal structures (46,198). In vitro, purified ERCC6 protein can alter nuclease sensitivity and spacing of nucleosomes in an ATP-dependent manner (202). ERCC6 can also bind and negatively supercoil DNA in the presence of non-hydrolysable ATP analogues (203) | ||
| SSO1653 | SSO1653, the sole Snf2 family member in archaeal S.solfataricus, is the archetype for the uniquely archaeal and eubacterial subfamily most similar to the eukaryotic Snf2 proteins (see text). It is encoded in the P2 strain genome by juxtaposing SSO1653 and SSO1655 genes, which are punctuated by transposase SSO1654 inserted into the second recA domain immediately upstream of motif V. Although it is highly unlikely that a Snf2 family enzyme would be functional with a 40 kDa transposase insertion in this conserved part of the protein, the enzyme with transposase removed can generate DNA torsion in an ATP-dependent manner analogous to eukaryotic Snf2 family proteins and was used successfully for structure determination (46). Since a full-length gene can be cloned with appropriate screening (M. F. White, personal communication), the SSO1654 transposase must be active and we refer to the re-fusion of SSO1653 and SSO1655 for simplicity as SSO1653. No information is available for the biological role of any member of the subfamily, although a role in archaeal chromatin remodelling can be excluded because S.solfataricus lacks archaeal histone-like proteins | |
| SSO1653-like (cont.) | The SSO1653 subfamily helicase-like region also shows close linkage with a zinc finger SWIM motif that may bind to nucleic acids (204,205). For example, coordinately regulated SSO1656 immediately downstream of SSO1653–1655 encodes a 26 kDa basic protein containing the SWIM motif. An SSO1653 subfamily member is present in all Bacillus and Streptococcus genomes (21), and many of these polypeptides also carry a SWIM motif in the same polypeptide. Polypeptides with a Snf2 family helicase-like region but lacking a SWIM motif appear to be in the same operon as a second smaller protein which carries the SWIM motif instead (204). Although the SWIM motif also occurs in eukaryotes, it has not been linked to any of the eukaryotic Snf2 family proteins (204) | |
| Distant | SMARCAL1 | The human SMARCAL1 (SMARCA-Like 1) protein and homologues, also known as HARP (22), are unusual within the analysis because they include two subtypes with highly similar helicase-like regions that are flanked by completely different auxiliary domains. The first consists of proteins in higher eukaryotes related to human SMARCAL1 itself with centrally located helicase-like regions and one or more Harp motifs immediately N-terminal to this. Mutants in the helicase-like region of human SMARCAL1 have been linked to a genetic disorder Schimke immuno-osseous dysplasia (206) although the molecular function of the SMARCAL1 protein is unknown. The bovine homologue ADAAD is stimulated by DNA single-double strand boundaries (207) |
| The second subtype is found in animal, plant and some protist genomes and contains SMARCAL1 subfamily members related in overall domain organization to the human ZRANB3 protein (Zinc finger, RAN-Binding domain containing 3). The helicase region is located at the N-terminus of the polypeptide, followed by an unusual zinc finger structure related to those found in Ran protein binding proteins (208), and a putative HNH type endonuclease domain at the C-terminus (209). No functional information about any proteins in this subtype is available | ||
| rapA group | The rapA group includes some 220 eubacterial and archaeal members with significantly more sequence variation than other subfamilies. Subsets of sequences are qualitatively visible within multiple alignments of the rapA group, but initial attempts to distinguish them have been unreliable due to the variability of microbial sequences and non-homogeneous sampling in sequenced organisms (e.g. half of all complete bacterial genomes are for a limited range of firmicute and gamma proteobacterial genera) | |
| Although the rapA group contains the conserved sequence patterns of the Snf2 family for the classical helicase-like motifs, other conserved blocks cannot be easily identified (Supplementary Figure S4). The characteristic extended span of at least 160 residues between helicase motifs III and IV (44) is maintained in the rapA group, but the central part of this region diverges markedly from the other subfamilies and lacks the highly conserved features characteristic of the Snf2 family. The specific difficulty of aligning this region has been remarked previously (20) | ||
| The rapA group also includes a number of polypeptides for which the helicase-like region comprises effectively the entire polypeptide, in contrast with other Snf2 family members which almost universally contain sequences outside the helicase-like region that are likely to form accessory domains or interaction surfaces | ||
| The only member of this group for which biological function has been investigated is E.coli rapA, also known as HepA (210), which influences polymerase recycling under high salt conditions, possibly by aiding the release of stalled polymerases (211) |
Summaries of known biochemical, biological and distinctive sequence characteristics of each subfamily. Background colouring of subfamilies for groupings as shown in Figure 1.