Author manuscript; available in PMC: 2015 Apr 15.
Published in final edited form as: Cell. 2009 Jun 26;137(7):1173–1174. doi: 10.1016/j.cell.2009.06.010

Common Ancestry of the CENP-A Chaperones Scm3 and HJURP

Luis Sanchez-Pulido 1, Alison L Pidoux 2, Chris P Ponting 1,*, Robin C Allshire 2,*
PMCID: PMC4397584  EMSID: EMS32198  PMID: 19563746

The centromere is a unique chromosomal locus that ensures accurate segregation of chromosomes during cell division. The centromere supports assembly of a multiprotein complex called the kinetochore, which attaches to spindle microtubules. The kinetochore has specialized nucleosomes in which histone H3 is replaced by the centromere-specific H3 variant CENP-A/cenH3 (reviewed in Allshire and Karpen, 2008). Two recent papers in Cell (Dunleavy et al., 2009; Foltz et al., 2009) have identified a new protein partner for soluble human CENP-A called HJURP/hFLEG/FAKTS that promotes the incorporation of CENP-A at centromeres.

Comparing the mechanism of deposition of CENP-A at centromeres across different organisms is difficult due to species-specific differences in the cell-cycle timing of CENP-A incorporation. Moreover, across eukaryotes, apparently diverse proteins associate with CENP-A and mediate its assembly into nucleosomes. There is, however, a lack of evidence for common ancestry (i.e., homology) among these proteins. In the budding yeast Saccharomyces cerevisiae and fission yeast Schizosaccharomyces pombe, the centromere-associated Scm3 protein binds to CENP-A and is required for incorporation of CENP-A into centromeric chromatin (Camahort et al., 2007; Mizuguchi et al., 2007; Stoler et al., 2007; Pidoux et al., 2009; Williams et al., 2009).

Similarities in the behavior and roles of Scm3 and HJURP suggest that they occupy the same functional niche (Dunleavy et al., 2009). Might these two functionally analogous proteins also be homologs whose shared common ancestry has been difficult to discern owing to substantial sequence divergence? Their non-overlapping phylogenetic ranges suggest that this is a distinct possibility.

To address this question, we sought to determine the evolutionary provenances of fungal Scm3 and of mammalian HJURP. In an initial database search (Supplemental Experimental Procedures available online), the first Scm3 homolog to be detected in organisms other than fungi was found in a marine choanoflagellate, Monosiga brevicollis. Tantalizingly, this search also showed marginal (but nonsignificant) sequence similarities between the Scm3 family and bovine HJURP. Conversely, BLASTp and tBLASTn searches identified homologs of human HJURP in frogs and birds (E < 2 × 10⁻³), the first homologs reported in organisms other than mammals.

A sensitive profile sequence search identified additional, more divergent Scm3-like sequences beyond fungi and Monosiga. This search established marginal similarity between fungal Scm3 and bovine HJURP with an E value of 0.7 (Figure S1), which implies that fewer than one sequence is expected to be found in this database search with an equivalent or better alignment score simply by chance. This finding, together with the known functional similarities between Scm3 and HJURP proteins, would be consistent with their common ancestry.
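The interpretation of an E value given above can be made concrete with a short calculation. The sketch below assumes the Poisson model that is commonly used for database-search statistics, under which an E value E corresponds to a probability 1 − exp(−E) of obtaining at least one equally good match by chance. This model and the function name `chance_hit_probability` are illustrative assumptions, not part of the authors' procedure.

```python
import math


def chance_hit_probability(e_value: float) -> float:
    """Probability of at least one chance hit scoring this well or better.

    Assumption: chance hits follow a Poisson distribution whose mean is the
    E value, so P(at least one hit) = 1 - exp(-E).
    """
    if e_value < 0:
        raise ValueError("E value must be non-negative")
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


# E values quoted in the text: profile search, BLAST searches, HHpred.
for e in (0.7, 2e-3, 1e-5):
    p = chance_hit_probability(e)
    print(f"E = {e:g}: P(at least one chance hit) = {p:.3g}")
```

Under this assumption, an E value of 0.7 corresponds to roughly a 50% probability of at least one spurious match, consistent with describing that similarity as marginal, whereas for very small E values the probability is essentially equal to E itself.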

Finally, we provide confirmation of this prediction by comparing the profile of the fungal Scm3 protein alignment with that for the metazoan HJURP alignment, and vice versa, using HHpred (Soding et al., 2005). In each comparison, sequence similarity between these two families was statistically highly significant (E < 10⁻⁵) (Figure S1). This level of significance implies that these proteins are homologous members of a wider Scm3/HJURP protein family.

The inclusion within sequence alignments of both families of more divergent, yet homologous, sequences from, for example, the frog Xenopus tropicalis and the choanoflagellate Monosiga brevicollis, appears to have allowed more sensitive sequence searches. This explains why this remote homologous relationship had hitherto escaped detection (Aravind et al., 2007). Our searches failed to identify members of this family in the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster. Although more rapid evolution occurs in these lineages, it cannot be discounted that these organisms possess alternative, functionally analogous molecules for loading CENP-A at centromeres. Besides Scm3 or HJURP orthologs, no additional homologs were discernible for any species.

Our analyses were unable to confirm the prediction by others of tryptophan-aspartic acid (WD40) repeats in HJURP, similar to those found in chromatin assembly factors (Foltz et al., 2009). Indeed, the first such proposed repeat encompasses the Scm3-HJURP homologous domain, which instead of containing β sheets as in WD40 repeats is predicted to contain an α helix. Moreover, the proposed conserved tryptophan residue is substituted in frog HJURP (Figure S1). We provide statistical support for a mixed α + β domain, instead of a WD40 β sheet repeat, in the N-terminal region of Scm3 and HJURP proteins. The only domain that all Scm3 and HJURP proteins have in common is the “Scm3 domain” (Figure S1), which may harbor a CENP-A-binding site. Indeed, the region in Scm3Sc that binds to CENP-ACse4 (residues 90–193) encompasses the Scm3 domain (residues 90–142) (Mizuguchi et al., 2007; Aravind et al., 2007). Moreover, two Scm3Sp amino acid substitutions within the Scm3 domain that disrupt CENP-ACnp1 localization in S. pombe are of leucine residues (Pidoux et al., 2009), which are conserved in other species as small hydrophobic side chains (leucine, isoleucine, or methionine; Figure S1). Other mutations within the Scm3 domain of Scm3Sp (N50S) or just C-terminal to it (N100S) also disrupt the localization of CENP-ACnp1 (Williams et al., 2009).

Our finding that the Scm3 domain is shared between, and is unique to, human HJURP and yeast Scm3 unites previously disparate lines of research. Descriptions of functionally similar, yet seemingly distinct, proteins from S. cerevisiae, S. pombe, and human cells might imply that centromere-specific histone incorporation differs greatly between fungi and animals. Instead, control of CENP-A incorporation at centromeres via Scm3/HJURP appears to be common to these eukaryotes. There are, however, likely to be some derived, lineage-specific features of CENP-A incorporation because of the various domains added to and deleted from Scm3 in different fungi (Aravind et al., 2007) and in vertebrates (Figure S1).

Our analyses reconcile previous observations by demonstrating that fungal Scm3 proteins are indeed distant counterparts of human HJURP. Thus, investigation of Scm3 and associated proteins is likely to be directly relevant to understanding the mechanism of HJURP-mediated CENP-A chromatin assembly at human centromeres.

Supplementary Material

Figure S1. Sequence analysis of the Scm3/HJURP protein family.

Top Panel: Schematic representation of evolutionarily conserved regions among Scm3 domain-containing proteins.

Bottom Left Panel: Representative multiple sequence alignments of conserved regions (blue, green and red) in Scm3 proteins from tetrapods.

Bottom Right Panel: Numbers correspond to global profile-to-sequence (HMMer) and profile-to-profile (HHpred) comparison E-values between the animal HJURP and fungal Scm3 domain alignments (Eddy, 1996; Soding et al., 2005). Arrows indicate the profile search direction. These significant E-values and the consistency of secondary structure predictions provide confidence that the Scm3 domain is present in tetrapod HJURP proteins. Alignments were produced with T-Coffee and HMMer (Eddy, 1996; Notredame et al., 2000) using default parameters, slightly refined manually and viewed with the Belvu program (Sonnhammer and Hollich, 2005). The main groups of Scm3 domain-containing proteins are indicated by coloured bars to the left of the Scm3 domain alignment (blue box): red (tetrapods), yellow (choanoflagellate) and violet (fungi). The colouring scheme indicates average BLOSUM62 scores (correlated with amino acid conservation) for each alignment column: red (greater than 2.5), violet (between 2.5 and 1) and light yellow (between 1 and 0.2). Tetrapod sequences were obtained from UniProt, ENSEMBL, GenBank and GSC-WUSTL databases (Wu et al., 2006; Hubbard et al., 2009), but were supplemented by manually assembled ESTs and FGENESH+-predicted gene models (Solovyev et al., 2002). Tetrapod sequences are named according to their genus or common name. Accession numbers, database of origin and species names are: Human, Q8NCD3, Homo sapiens; Tarsier, ENSTSYP00000007653, Tarsius syrichta; Mouse, ENSMUSP00000054263, Mus musculus; Bovine, UPI0000F33924, Bos taurus; Dolphin, ENSTTRP00000004146, Tursiops truncatus; Platypus, ENSEMBL, Ornithorhynchus anatinus; Anolis, ENSEMBL, Anolis carolinensis; Chicken, Q5ZLF3, Gallus gallus; ZebraFinch, GSC-WUSTL, Taeniopygia guttata; Frog, ENSEMBL, Xenopus tropicalis. Monosiga brevicollis and fungal sequences, obtained from the UniProt database (Wu et al., 2006), are named with their species name abbreviations. Their corresponding accession numbers are: A9V3K2, Monosiga brevicollis (choanoflagellate); Q12334, Saccharomyces cerevisiae; Q55S59, Cryptococcus neoformans; B6K7K7, Schizosaccharomyces japonicus; Q9HDY7, Schizosaccharomyces pombe; Q1DZJ0, Coccidioides immitis; A1C460, Aspergillus clavatus; Q5BD66, Emericella nidulans; B6QLW7, Penicillium marneffei; A7F2V5, Sclerotinia sclerotiorum; B2AYH7, Podospora anserina; A3LNZ2, Pichia stipitis; B2WCW5, Pyrenophora tritici-repentis; Q5AJC3, Candida albicans; and A5DY01, Lodderomyces elongisporus. Secondary structure predictions were performed independently for the animal and fungal Scm3 domains, using PsiPred (Jones, 1999). Both results predicted the presence of a long alpha-helix located at the N-terminus of the domain (indicated by grey cylinders).
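The column colouring scheme described in the legend amounts to a simple threshold rule on the average BLOSUM62 score of each alignment column. The sketch below illustrates that rule only; it is not the Belvu implementation. The function name `conservation_colour` is hypothetical, and the handling of scores exactly at 2.5, 1 and 0.2 is an assumption, since the legend does not say which class a boundary value belongs to.

```python
from typing import Optional


def conservation_colour(avg_blosum62: float) -> Optional[str]:
    """Map a column's average BLOSUM62 score to its colour class.

    Thresholds follow the figure legend. Boundary handling is an assumption:
    a score of exactly 2.5 is treated as violet, and scores of exactly 1 or
    0.2 are assigned to the higher of the two adjacent classes.
    """
    if avg_blosum62 > 2.5:
        return "red"
    if avg_blosum62 >= 1.0:
        return "violet"
    if avg_blosum62 >= 0.2:
        return "light yellow"
    return None  # below 0.2: column left uncoloured


# Hypothetical average scores for four alignment columns.
for score in (3.1, 1.8, 0.5, -0.4):
    print(score, conservation_colour(score))
```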

Acknowledgements

L.S.-P. is supported by an EMBO Long Term Fellowship, and the C.P.P. and R.C.A. laboratories by the Medical Research Council and the Wellcome Trust, respectively. R.C.A. is a Wellcome Trust Principal Research Fellow.

Footnotes

Supplemental Data include Supplemental Experimental Procedures, one figure, and Supplemental References and can also be found at http://www.cell.com/supplemental/S0092-8674(09)00708-9.

References

  1. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
  2. Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Mazumder R, O’Donovan C, Redaschi N, Suzek B. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Research. 2006;34:D187–191. doi: 10.1093/nar/gkj161.
  3. Eddy SR. Hidden Markov models. Curr. Opin. Struc. Biol. 1996;6:361–365. doi: 10.1016/s0959-440x(96)80056-x.
  4. Soding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research. 2005;33:W244–248. doi: 10.1093/nar/gki408.
