Proceedings of the National Academy of Sciences of the United States of America. 1997 Apr 15;94(8):3842–3847. doi: 10.1073/pnas.94.8.3842

The nude gene encodes a sequence-specific DNA binding protein with homologs in organisms that lack an anticipatory immune system

Thomas Schlake 1,*, Michael Schorpp 1,*, Michael Nehls 1, Thomas Boehm 1
PMCID: PMC20529  PMID: 9108066

Abstract

In the mouse, the product of the nude locus, Whn, is required for the keratinization of the hair shaft and the differentiation of epithelial progenitor cells in the thymus. A bacterially expressed peptide representing the presumptive DNA binding domain of the mouse whn gene binds specifically in vitro to an 11-bp consensus sequence containing the invariant tetranucleotide 5′-ACGC. In transient transfection assays, such binding sites stimulated reporter gene expression about 30- to 40-fold when positioned upstream of a minimal promotor. Whn homologs from humans, bony fish (Danio rerio), cartilaginous fish (Scyliorhinus caniculus), agnathans (Lampetra planeri), and cephalochordates (Branchiostoma lanceolatum) share at least 80% of amino acids in the DNA binding domain. In agreement with this remarkable structural conservation, the DNA binding domains from zebrafish, which possesses a thymus but no hair, and amphioxus, which possesses neither thymus nor hair, recognize the same target sequence as the mouse DNA binding domain in vitro and in vivo. The genomes of vertebrates and cephalochordates contain only a single whn-like gene, suggesting that the primordial whn gene was not subject to gene-duplication events. Although the role of whn in cephalochordates and agnathans is unknown, its requirement in the development of the thymus gland and the differentiation of skin appendages in the mouse suggests that changes in the transcriptional control regions of whn genes accompanied their functional reassignments during evolution.

Keywords: forkhead, winged helix, whn gene, thymus, evolution


The primary function of the thymus is to generate and select a highly diverse repertoire of T cells that exhibit self-tolerance and restriction to self major histocompatibility complex. The importance of the thymic microenvironment in shaping the T cell repertoire has long been recognized, and recent work has demonstrated that positive and negative selection of developing T cells depends on cell–cell interactions with thymic epithelium (1, 2). In rodents with mutations at the nude locus, the thymus fails to form, causing severe immunodeficiency; the defect has been localized to the thymic microenvironment rather than to the developing T cells (3). The cloning of whn (winged helix nude), the gene affected by the nude mutation (4), has provided the first glimpse of the genetic control of thymic epithelial differentiation. Although the initial formation of the thymic epithelial primordium before the entry of lymphocyte progenitors does not require the activity of whn, the subsequent differentiation of primitive precursor cells into subcapsular, cortical, and medullary epithelial cells of the mature thymus depends on the activity of the whn gene (5). Because expression of whn continues in thymic epithelial cells throughout life (5), whn may be required not only for the initiation of differentiation of epithelial progenitor cells but also for the maintenance of their differentiated phenotype. This suggests that the whn gene occupies a unique position in the genetic hierarchy controlling the differentiation of thymic epithelium. Furthermore, because whn may be regarded as a genetic marker of this cell lineage, it could be used to assess the phylogenetic origin of thymus formation. Previous research has suggested that an adaptive immune system may have first occurred in jawed vertebrates, with cartilaginous fish (sharks, etc.) as their most primitive representatives (6, 7). Indeed, no morphological evidence of a thymus gland has been reported for agnathans, such as lamprey (7).

The nude locus encodes a nuclear protein, Whn, which by sequence homology has been suggested to be a member of the forkhead/winged-helix transcription factor family (4). Although Whn contains a strong acidic transcriptional activation domain (8), its ability to bind to DNA in a sequence-specific manner has not yet been investigated. As with other transcription factors, the DNA binding domain can be expected to be strongly conserved among orthologs of whn in other species. In this paper, we therefore have investigated the DNA binding properties of the mouse whn gene product and have isolated whn genes from a cartilaginous fish, a jawless vertebrate species, and the cephalochordate amphioxus. Our results define consensus DNA binding sites for the mouse Whn protein and demonstrate a remarkable conservation of structural and functional properties of the DNA binding domains of whn genes from two subphyla of chordata.

MATERIALS AND METHODS

Genomic Libraries.

The human genomic P1 library (Genome Systems, St. Louis) was screened using exon-specific primers derived from the human whn cDNA, which itself was isolated from a human thymus λgt10 cDNA library (CLONTECH) by low-stringency hybridization. The zebrafish genomic λPS library was prepared from size-selected partial Sau3AI digests of DNA isolated from pooled brains of fish obtained from a local vendor. Eggs from Scyliorhinus caniculus were obtained from a local zoological garden, specimens of Lampetra planeri were obtained from A. Schreiber (Zoologisches Institut, Universität Heidelberg, Germany), and adult Branchiostoma lanceolatum specimens were purchased from the Biologische Anstalt Helgoland (Meeresstation Helgoland, Germany). Screening of the zebrafish library with a mouse cDNA probe was done essentially as described (8).

PCR Using Degenerated whn-Specific Primers.

From a comparison of nucleotide sequences encoding the whn DNA binding domains of human, mouse, rat, fugu, and zebrafish, redundant primers were derived from three highly conserved regions. Primer whn11, 5′-CC(C/T)GA(C/T)GGNTGGAAGAA(C/T)TC, spans the PDGWKNS amino acid motif; primer whn12, 5′-NGGC(A/T)G(C/T)CTNCCNGT(C/G)AG(C/T)GAG, spans the GSLPVSE amino acid motif; and primer whn13, 5′-(T/G)GC(A/C)GG(A/G)TT(C/G)AG(A/C)G(C/T)CCA(A/C)AG, spans the LWALNPS amino acid motif. PCR was performed under the following conditions: 35 cycles each of 45 sec at 95°C, 3 min at 58°C, and 2 min at 72°C using 50–100 ng of genomic DNA per 20 μl of genomic amplification reaction. Primers were used at a final concentration of 5 ng/μl. The PCR products obtained with genomic DNA from Scyliorhinus caniculus, Lampetra planeri, and Branchiostoma lanceolatum were directly sequenced. From the initial sequence information, specific primers were derived for each species, and the sequence was extended by a primer-walking strategy using genomic DNA digested with various restriction enzymes and a partially noncomplementary adapter (9).
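The degeneracy of each redundant primer, that is, the number of distinct oligonucleotides present in the synthesized pool, follows directly from the notation used above. The following is a minimal sketch, assuming that N stands for any of the four bases and that each parenthesized group lists the alternatives at one position; the function name `degeneracy` is introduced here for illustration only.

```python
import re
from math import prod

# Bases represented by each single-letter symbol used in the primers above.
# Alternatives at a position are written explicitly, e.g. (C/T).
SINGLE = {"A": 1, "C": 1, "G": 1, "T": 1, "N": 4}

def degeneracy(primer: str) -> int:
    """Count the distinct oligonucleotides covered by a primer in (X/Y) notation."""
    tokens = re.findall(r"\(([ACGT](?:/[ACGT])+)\)|([ACGTN])", primer)
    return prod(len(alt.split("/")) if alt else SINGLE[base] for alt, base in tokens)

PRIMERS = {
    "whn11": "CC(C/T)GA(C/T)GGNTGGAAGAA(C/T)TC",
    "whn12": "NGGC(A/T)G(C/T)CTNCCNGT(C/G)AG(C/T)GAG",
    "whn13": "(T/G)GC(A/C)GG(A/G)TT(C/G)AG(A/C)G(C/T)CCA(A/C)AG",
}

for name, seq in PRIMERS.items():
    print(name, degeneracy(seq))
```

Under these assumptions the three pools contain 32, 1,024, and 128 sequences, respectively.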

Low-Stringency Hybridization Analysis.

To explore the possible presence of whn-like genes in various species, DNAs isolated from a Balb/c mouse and from single individuals of zebrafish and amphioxus were digested with restriction enzymes known to release a single fragment containing the exon(s) encoding the DNA binding domain. Southern filter hybridization and washes were performed at 65°C in 6 × SSC/0.1% SDS. cDNA probes that encompassed only the DNA binding domain were labeled by random priming and purified as described (9).

Cloning and Expression of DNA Binding Domains.

DNA binding domains were expressed as fusion proteins using the pQE32 vector (Qiagen, Chatsworth, CA), which appends a six-residue histidine tag at the N terminus. The subcloned fragments were generated by PCR and verified by sequencing. The three constructs encode the following N-terminal amino acid residues, MRGSHHHHHHGIRMARYPGSR, followed by a species-specific peptide and a final F residue. The mouse peptide consists of the following amino acids: YPYQRIAPQANAEGHQPLFPKPIYSYSILIFMALKNSKTGSLPVSEIYNFMTEHFPYFKTAPDGWKNSVRHNLSLNKCFEKVENKSGSSSRKGCLWALNPSKIDKMQEELQKWKRKDPIAVRKSMAKPEELDSLIG; the zebrafish peptide corresponds to LFPQPRITAHSQDLQPKCFPKPIYSYSCLIAMALKNSKTGSLPVSEIYSFMKEHFPYFKTAPDGWKNSVRHNLSLNKCFEKVENKMSGSSRKGCLWALNPAKIDKMEEEMQKWKRKDLPAIRRSMANP; and the amphioxus peptide corresponds to APKQPKIAQKDKTETEKVYPKPAYSYSCLIAMALKNSKTGCLPVSEIYNFMCDNFPYFKTAPDGWKNSVRHNLSLNKCFEKVEKSTGGTSKKGCLWTLNPAKVAKMEEEVQKITRKDPQAIRRCMANP (DNA binding domains underlined). Constructs were expressed in M15 bacteria (Qiagen) and the fusion peptides purified on Ni-chelate resin (Qiagen) according to the manufacturer’s instructions. The eluted proteins were >95% pure as determined by denaturing protein gel electrophoresis.

To determine the ability of whn to function as a transactivator, expression constructs with N- or C-terminal epitope tags were prepared. The basic construct consisted of the mouse Whn coding region extended either at the N terminus (meqkliseedln piryrrggrcqdwvMVS … ALA) or at the C terminus (MVS … ALAeqkliseedln) [9E10MYC epitope (10) underlined; amino acid residues encoded by the mouse whn cDNA in uppercase letters, unrelated amino acids in lowercase]. Immunohistochemical analysis using an anti-MYC epitope antibody (9E10, PharMingen) showed exclusive localization of Whn proteins in the nucleus of various cell lines tested (data not shown). The tagged mouse whn cDNAs were modified by replacing the DNA binding domain with that of zebrafish or amphioxus (see Fig. 2) using PCR fragments encompassing suitable restriction fragments. The amino acid sequence across the fusion junctions of the zebrafish version reads … ghqplfpKPIYSYS … RSMAkpee … , and that of the amphioxus version reads … ghqplfpKPAYSYS … RCMAkpee … (mouse residues in lowercase letters). Details for all constructions are available on request.

Figure 2.


Protein alignment of Whn DNA binding domains. (A) Genomic structure of whn genes. Introns are indicated by thin lines, exons by rectangles. Shading indicates regions encoding the DNA binding domain. Note the highly variable sizes of introns. (B) Alignment of all known Whn DNA binding domains (uppercase letters) and flanking sequences; amino acids are abbreviated in single-letter code. The phase of introns is indicated in brackets. Asterisks indicate that splice junctions were determined by comparison with other whn genes, rather than by comparison of genomic and cDNA sequences. The sequences for mouse (4), rat (4, 8, 15) and fugu (8) have been described earlier. Residues identical in all eight genes are given in the consensus line; some conservative changes are denoted by number: 1, negatively charged amino acid (E, D); 2, positively charged amino acid (K, R). The bottom line indicates the presumed secondary structure characteristics of the Whn winged-helix domain based on the structure of the DNA binding domain of HNF3γ (16). h, helix; s, β-sheet; w, loop (wing). Pairwise comparisons indicate that Whn DNA binding domains from human and amphioxus are 80% identical. Characteristic amino acid signatures are highlighted in different colors. Note that the exon containing the Whn DNA binding domain in amphioxus extends further into the 5′ direction.

In Vitro Selection of Binding Sites.

One μg of a mixture of randomized oligonucleotides (GCGAAGTGGAGGAGCCACAAGN20TGGCACAACTGGAGCTGGGTG) was incubated with 2 μg of primer 5′-CACCCAGCTCCAGTTGTGCCA in 100 μl of a buffer containing 20 mM Tris·Cl (pH 8.55), 16 mM (NH4)2SO4, 1.5 mM MgCl2. Ten units of Taq polymerase (AGS, Heidelberg) were added and the mixture incubated at 60°C for 1.5 min. The double-stranded material was used for in vitro binding selection experiments as follows. Purified mouse DNA binding domain peptide (0.5 μg) was bound to Ni-chelate resin and then preincubated for 30 min at room temperature in a buffer containing 20 mM Hepes (pH 7.6), 50 mM KCl, 1 mM DTT, 1 mM EDTA, 5% glycerol, and 150 μg/ml poly(dA-dT) in a final volume of 100 μl. In some experiments, MgCl2 was added to the incubation mixture to a final concentration of 2 mM. Furthermore, the in vitro binding site selection was also performed with poly(dI-dC) as nonspecific competitor. Purified double-stranded randomized oligonucleotides (0.4 μg) were added and the incubation extended for another 15 min. The resin was briefly centrifuged and washed six times with 30 bed volumes each of incubation buffer without competitor DNA. The bound protein/DNA complexes were then released by incubation with 0.5 M imidazole in PBS and the DNA fragments reamplified under conditions avoiding the accumulation of single-stranded material. After six rounds of selection and amplification, the resulting oligonucleotides were cloned into a plasmid vector and sequenced. A further variation in the in vitro selection scheme was introduced by reducing the concentration of peptide 5- and 10-fold, respectively.

Analysis of obtained sequences indicated no significant variations in the residues flanking the 5′-ACGC invariant core among the various experimental conditions; therefore, the results from 253 individual binding sites selected under various conditions are combined in Fig. 1.

Figure 1.


Binding site selection for mouse Whn DNA binding domain. Double-stranded oligonucleotides randomized at 20 positions were incubated with the bacterially expressed mouse Whn DNA binding domain. After six rounds of affinity purification, bound oligonucleotides were cloned and sequenced. All sequences contained an identical tetranucleotide, 5′-ACGC. The frequency of nucleotides occurring in the flanking regions is indicated by a percentage. No sequence specificity was detected upstream or downstream of the shown 11-bp region.
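The tabulation summarized in Fig. 1 can be sketched as follows: each cloned insert is oriented so that the nonpalindromic core reads 5′-ACGC, a fixed window around the core is extracted, and nucleotide frequencies are counted per column. This is an illustrative sketch, not the procedure used for the figure; the window of four upstream and three downstream positions is an assumption chosen only to give 11 columns, and the three input sequences are hypothetical, not data from the selection experiments.

```python
from collections import Counter

CORE = "ACGC"
COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]

def oriented_window(seq: str, left: int = 4, right: int = 3):
    """Return the window around the first ACGC core, or None if there is none.

    The insert is tried as given and as its reverse complement, because the
    core may lie on either strand of a cloned fragment.
    """
    for s in (seq, revcomp(seq)):
        i = s.find(CORE, left)
        if i != -1 and i + len(CORE) + right <= len(s):
            return s[i - left : i + len(CORE) + right]
    return None

def frequency_table(seqs, left: int = 4, right: int = 3):
    """Per-column nucleotide frequencies (in percent) over all oriented windows."""
    windows = [w for w in (oriented_window(s, left, right) for s in seqs) if w]
    table = []
    for column in zip(*windows):
        counts = Counter(column)
        table.append({b: 100.0 * counts[b] / len(windows) for b in "ACGT"})
    return table

# Hypothetical inserts for illustration only (assumption, not source data).
toy = ["TTGGGACGCTATCA", "CCAAGACGCTTTGG", revcomp("AAGGGACGCAATCC")]
table = frequency_table(toy)
for position, column in enumerate(table, start=1):
    print(position, column)
```

With real data, the columns flanking the core would show the position-specific preferences, while the four core columns would each be occupied by a single nucleotide.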

Electrophoretic Mobility Shift Assays.

The following double-stranded oligonucleotides were used: wild-type, 5′-atagggcgaattgggtaccAAAGGGACGCTATCgagctccagcttt (core sequence underlined); mutant G → A, core sequence 5′-ACAC; and mutant C → T, core sequence 5′-ATGC. In vitro methylation of double-stranded oligonucleotides was performed with SssI methylase under the conditions recommended by the supplier (New England Biolabs). The completeness of methylation was assessed in parallel reactions with added plasmid DNA. Oligonucleotides were radioactively labeled by phosphorylation using [γ-32P]dATP and T4 polynucleotide kinase and purified by phenol extraction and gel filtration with Sephadex G-50. Electrophoretic mobility shift assays were performed as follows. DNA (3,000 dpm; approximately 1 ng) was preincubated in a buffer containing 20 mM Hepes (pH 7.6), 50 mM KCl, 1 mM DTT, 1 mM EDTA, 5% glycerol, 2 mg/ml acetylated BSA, 1× API, 0.2 mM phenylmethylsulfonyl fluoride, and 100 μg/ml poly(dA-dT) or poly(dI-dC), respectively, for 3 min at room temperature. About 0.1 μg of appropriate peptides were added and the incubation extended for a further 30 min. The presence of MgCl2 to a final concentration of 2 mM had no influence on the results. The mixture was then loaded onto a 4% polyacrylamide gel (37.5:1, acrylamide:bisacrylamide) and run in 0.25× TBE buffer (90 mM Tris/90 mM boric acid/2.5 mM EDTA, pH 8.3) with 10 mA at 4°C.

Transactivation Experiments.

The basic reporter construct (pPRluc; ref. 11) used here consists of a luciferase gene driven from a rat prolactin minimal promotor. Three copies of an in vitro-selected oligonucleotide were cloned upstream of the transcriptional initiation site to give plasmid p9.1luc; the relevant sequence reads 5′-CTCGAGAACAAAGGGACGCTATCCGGTTGGATCCAACCGGATAGCGTCCCTTTGTTCTCGAGAACAAAGGGCACGCTATCCGGTTGGATCTTCGAGGCGAAGGTTTATAAAGCTCAATGTCTGCAGATGAGAAAG (the TATA box and the cap site are underlined; the ACGC core sequence or its complement is in bold). pPRluc or p9.1luc was cotransfected with a β-galactosidase expression construct, pBOS-βgal (12), together with one of the whn expression constructs into subconfluent BHK cells by calcium phosphate coprecipitation. Twenty-four hours after transfection, the calcium phosphate precipitate was washed away from the cells, and fresh medium was added for another 24 hr. The cells were then harvested into 250 mM Tris·Cl (pH 7.5) buffer and disrupted by three freeze/thaw cycles. The resulting extracts were cleared by centrifugation and assayed for luciferase and β-galactosidase activities (8). The results were corrected for transfection efficiency using β-galactosidase activity and expressed relative to luciferase activity obtained with the basic construct, pPRluc.
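Because the underlining and bold type of the printed sequence are not preserved in plain text, the core sites in the p9.1luc sequence can be located programmatically. The sketch below scans the sequence given above for the core 5′-ACGC and for its reverse complement, 5′-GCGT; the function name `core_sites` is an illustrative choice.

```python
import re

# Relevant p9.1luc sequence as given in the text above.
REPORTER = "CTCGAGAACAAAGGGACGCTATCCGGTTGGATCCAACCGGATAGCGTCCCTTTGTTCTCGAGAACAAAGGGCACGCTATCCGGTTGGATCTTCGAGGCGAAGGTTTATAAAGCTCAATGTCTGCAGATGAGAAAG"

def core_sites(seq: str, core: str = "ACGC"):
    """Return 0-based start positions of the core on the given and on the opposite strand."""
    comp = str.maketrans("ACGT", "TGCA")
    rc = core.translate(comp)[::-1]
    forward = [m.start() for m in re.finditer(f"(?={core})", seq)]
    reverse = [m.start() for m in re.finditer(f"(?={rc})", seq)]
    return forward, reverse

forward, reverse = core_sites(REPORTER)
print("ACGC at", forward)
print("GCGT (complement) at", reverse)
```

The scan finds two cores in the forward orientation and one in the complementary orientation, in line with the three copies of the selected oligonucleotide described in the text.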

RESULTS

The Whn Protein Contains a Sequence-Specific DNA Binding Domain.

Sequence comparisons suggest that the DNA binding domain of the forkhead/winged-helix class of proteins comprises about 100 amino acids (13, 14). Accordingly, a 158-aa peptide (amino acids 251–386 of the mouse Whn protein) was expressed in bacteria as a His-tagged fusion protein and could be purified to homogeneity by Ni-chelate affinity chromatography under nondenaturing conditions. The peptide was then incubated with double-stranded oligonucleotides in which the central 20 nucleotides were randomized. Following six cycles of affinity selection, the resulting oligonucleotides were cloned and sequenced. Affinity selection was performed in the presence or absence of divalent cations (Mg2+), using poly(dA-dT) or poly(dI-dC), respectively, as competitors, and different peptide to DNA ratios. In each instance, sequence analyses of bound oligonucleotides revealed similar repertoires of sequences, all of which contained an identical tetranucleotide core sequence, 5′-ACGC; small regions flanking the core sequence on either side were considered to be composed of nonrandom sequences (Fig. 1). This provides strong evidence that Whn is a sequence-specific binding protein interacting with an 11-bp recognition site. Certain combinations of nucleotides tend to occur on either side of the invariant core sequence; accordingly, the sequence 5′-AAAGGGACGCTATC was used as a representative binding site sequence to confirm specific DNA binding to Whn in electrophoretic mobility shift assays and in in vivo transactivation experiments (see below).

Isolation of Homologs of the whn Gene.

To assess the presence of the whn gene in other species, we have isolated, either by cross-hybridization or by PCR with degenerated primers, seven homologs of the mouse whn gene. In the present study, we have restricted our analysis to the DNA binding domains of whn homologs. The exon/intron structure of the human whn homolog was deduced by sequence comparison to the mouse whn gene and by comparison to a human whn cDNA isolated from a thymus cDNA library. The isolation of the rat whn gene has been described previously (4, 8, 15), as has the whn homolog from the puffer fish Fugu rubripes (8). The genomic structure of the zebrafish whn gene was established by comparison with the corresponding cDNA. To isolate the whn homolog from cartilaginous fish, genomic DNA from the shark Scyliorhinus caniculus was subjected to PCR using degenerate oligonucleotide primers whn12 and whn13 corresponding to the GSLPVSE and LWALNPS amino acid motifs in the Whn DNA binding domain. The initial sequence was then extended by a PCR-assisted genomic walking strategy (9). A similar strategy failed with DNA from the agnathan fish Lampetra planeri. However, initial sequence information for the whn homolog from this species was readily obtained with a second set of degenerated primers, whn11 and whn13. The likely exon/intron structure of whn genes from shark and lamprey was derived by sequence comparisons to other vertebrate whn genes. The whn homolog from amphioxus was initially detected with primers whn12 and whn13; its genomic structure was experimentally verified by comparison to cDNAs prepared from total Branchiostoma lanceolatum mRNA.

The results indicate that in the seven vertebrate species analyzed, the whn DNA binding domain is encoded in three exons (Fig. 2A). The N-terminal six amino acids are encoded in the 5′ exon; an intron of variable length occurs after the second nucleotide (phase 2) in the seventh codon; the next exon contains the last nucleotide of the seventh codon and triplets for a further 32 amino acids; and the second intron, again of very variable length, occurs after codon 39 (phase 0). In contrast, the DNA binding domain of the whn gene from the cephalochordate Branchiostoma lanceolatum is contained within one exon (Fig. 2). The intron separating this exon from the 3′ part of the gene occurs 16 amino acids after the last residue of the DNA binding domain (lysine) and in the same phase (phase 1) as introns demarcating the corresponding vertebrate exons (Fig. 2B). Thus, the presence of at least one intron in the ancestral whn gene antedated the separation of vertebrates and cephalochordates.
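The intron phases quoted for the vertebrate genes can be checked by simple arithmetic: the phase is the number of coding nucleotides preceding the intron, modulo 3. The following sketch counts nucleotides from the first codon of the DNA binding domain; the function and variable names are illustrative.

```python
def intron_phase(coding_nt_before_intron: int) -> int:
    """Phase = number of nucleotides of the interrupted codon that lie 5' of the intron."""
    return coding_nt_before_intron % 3

# Coordinates are counted from the first codon of the DNA binding domain.
first_exon_nt = 6 * 3 + 2      # six complete codons plus two nucleotides of codon 7
second_exon_nt = 1 + 32 * 3    # last nucleotide of codon 7 plus 32 further codons

print(intron_phase(first_exon_nt))                   # first intron
print(intron_phase(first_exon_nt + second_exon_nt))  # second intron
print((first_exon_nt + second_exon_nt) // 3)         # complete codons before second intron
```

The first intron falls after 20 nucleotides (phase 2), and the second after 117 nucleotides, that is, after 39 complete codons (phase 0), as stated above.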

The cloning experiments reported above failed to reveal evidence of more than one whn-like gene per species, suggesting that no subfamily of whn-like forkhead/winged-helix genes exists. This conclusion is supported by an experiment in which cDNA probes spanning the DNA binding domains of mouse, zebrafish, and amphioxus were used for Southern filter hybridizations under conditions of low stringency. The results shown in Fig. 3 suggest the absence of a close relative(s) of the whn gene, at least in the three species tested, as there are no hybridizing fragments apart from those emanating from the known whn genes.

Figure 3.


Absence of whn-related genes in mouse, zebrafish, and amphioxus. Hybridization analysis was performed under low stringency conditions (6× SSC at 65°C) with cDNA probes spanning the DNA binding domain from the indicated species. DNAs were digested with HindIII (mouse), SpeI (zebrafish), and DraI (amphioxus) to reveal single fragments for the known whn genes. Note the absence of additional hybridizing bands.

The comparison of derived amino acid sequences of DNA-binding domains of the eight known whn homologs shown in Fig. 2B indicated a high level of overall similarity. More than 70% of the 92 residues comprising the DNA binding domain are identical in all eight species. At several sites, conservative amino acid substitutions occur, raising the conservation index across all eight species, as defined by Zvelebil et al. (17) to about 0.85. It is of note that the lamprey whn gene shows a 14-aa residue insertion in the C-terminal half of the DNA binding domain. Assuming that the three-dimensional structure of the forkhead/winged-helix protein HNF-3 (16) also applies to whn, this alteration is located in the first wing of the DNA binding domain, a region that appears tolerant to size differences in other proteins of this class (see alignments in ref. 18). Furthermore, most other species-specific sequence signatures occur in the two putative wings at the C terminus of the DNA binding domain, whereas the amino acid residues in the N-terminally located presumptive helical and sheet structures are more strongly conserved. We also note that the amino acids TAPDG (just preceding the third helix and including its N terminus), thought to be responsible for sequence-specific DNA binding by forkhead/winged-helix domains (18, 19), are identical in all species (see below; Fig. 2). The extent of amino acid changes generally correlates with the evolutionary distance between species. Pairwise comparisons show that even human and amphioxus whn genes are nearly 80% identical at the amino acid level. Overall, our results suggest a remarkable level of conservation among whn genes over more than 500 million years of evolution.
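Pairwise identity values of the kind quoted here are obtained by counting identical residues over aligned positions. The sketch below does this for an ungapped comparison. As an example it uses only the 65-residue N-terminal segment (KPIYSYS … KVENK) of the mouse and zebrafish peptides listed in Materials and Methods, so the resulting value covers part of the domain only and should not be compared directly with the figures given for the full 92-residue domain; the names `MOUSE`, `ZEBRAFISH`, and `ungapped_identity` are illustrative.

```python
def ungapped_identity(a: str, b: str):
    """Return (identical positions, percent identity) for two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)

# N-terminal segments copied from the peptide sequences in Materials and Methods.
MOUSE = "KPIYSYSILIFMALKNSKTGSLPVSEIYNFMTEHFPYFKTAPDGWKNSVRHNLSLNKCFEKVENK"
ZEBRAFISH = "KPIYSYSCLIAMALKNSKTGSLPVSEIYSFMKEHFPYFKTAPDGWKNSVRHNLSLNKCFEKVENK"

matches, percent = ungapped_identity(MOUSE, ZEBRAFISH)
print(f"{matches}/{len(MOUSE)} identical ({percent:.1f}%)")
```

Over this segment, 61 of 65 positions are identical. Sequences that differ in length, such as the lamprey domain with its insertion in wing 1, would first require a gapped alignment.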

Functional Characteristics of whn Homologs.

To determine whether the amino acid homology of whn homologs translates into similarities with respect to specific DNA binding, the DNA binding domains of zebrafish and amphioxus whn genes were expressed in bacteria as His-tagged peptides. Together with the mouse Whn peptide described above, they were used in electrophoretic mobility shift assays using wild-type and mutated Whn binding sites (Fig. 4A). The results shown in Fig. 4A indicate that DNA–protein complexes are readily formed between zebrafish and amphioxus DNA binding domains and a mouse Whn binding site. All three peptides fail to bind to DNA fragments carrying a G > A mutation in the ACGC core binding sequence, whereas minimal binding is retained with an ATGC mutant. After enzymatic modification of a cytosine residue in the wild-type binding sequence with SssI methylase (AmCGC), a significant reduction of binding in electrophoretic shift assays was observed.

Figure 4.


Recognition of Whn binding sites in vitro and in vivo. (A) Electrophoretic mobility shift assays using mouse, zebrafish, and amphioxus DNA binding domain peptides. A double-stranded oligonucleotide obtained via in vitro selection was used in wild-type configuration (core sequence ACGC), in modified forms (changed nucleotides in core sequence are indicated by lowercase letters), or in in vitro methylated form (m denotes 5-methylcytosine). (B) Transactivation of a luciferase reporter gene after transient transfection into BHK cells. A luciferase gene with a minimal promotor (11) was cotransfected with a mouse whn (DBDMm) expression plasmid or with constructs in which the mouse DNA binding domain was changed to that of zebrafish (DBDDr) or amphioxus (DBDBl) to establish a luciferase baseline activity. These values were compared with reporter constructs in which a whn response element was positioned upstream of the minimal promotor and expressed as fold transactivation. Values shown represent the average of two experiments with a variation of less than 20%. I refers to expression constructs with an N-terminal MYC tag; II refers to constructs with a C-terminal MYC tag. DBD, DNA binding domain; AD, transcriptional activation domain (8). In control experiments, the transfection of whn expression constructs in anti-sense orientation had no effect on luciferase activity.

It was important, however, to assess the ability of the Whn binding site to function in vivo. To this end, a luciferase gene driven from a minimal promotor was modified by the insertion of Whn binding sites upstream of the transcriptional initiation site. This construct was cotransfected into BHK cells with a mouse whn expression plasmid or with one of two chimaeric whn genes in which the mouse DNA binding domain had been replaced by those from zebrafish and amphioxus, respectively. All three constructs activated transcription to a similar extent (Fig. 4B). This indicates that the in vitro-selected Whn binding sites can be recognized by full-length Whn protein in vivo and confirms the functional similarities of Whn DNA binding domains.
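The fold-transactivation values are derived as described in Materials and Methods: luciferase activity is first corrected for transfection efficiency by the β-galactosidase activity of the same extract and then expressed relative to the corrected activity of the basic construct, pPRluc. A minimal sketch of this calculation follows; the function name and the numerical readings are hypothetical and serve only to illustrate the arithmetic.

```python
def fold_transactivation(luc_reporter: float, bgal_reporter: float,
                         luc_basic: float, bgal_basic: float) -> float:
    """Luciferase activity normalized to beta-galactosidase, relative to the basic construct."""
    normalized_reporter = luc_reporter / bgal_reporter
    normalized_basic = luc_basic / bgal_basic
    return normalized_reporter / normalized_basic

# Hypothetical readings in arbitrary units (assumption, not measured values).
fold = fold_transactivation(luc_reporter=70000.0, bgal_reporter=2.0,
                            luc_basic=1000.0, bgal_basic=1.0)
print(fold)
```

Normalizing each extract by its own β-galactosidase activity before forming the ratio prevents differences in transfection efficiency between dishes from being mistaken for differences in transactivation.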

DISCUSSION

Whn Is a Sequence-Specific DNA Binding Protein.

The experiments reported here clearly indicate that Whn is a sequence-specific DNA binding protein. The invariant core sequence in the 11-bp Whn consensus binding sites is rather short and nonpalindromic. The binding sites recognized by different members of the forkhead/winged-helix family of proteins are quite distinct (18, 19), unlike the situation with homeodomains or bHLH domains that all appear to bind to an invariant core sequence (5′-TAAT or 5′-CANNTG, respectively) despite their very variable amino acid compositions (20, 21). The identification of a consensus Whn binding site sequence will not immediately be helpful to pinpoint likely target genes, because the frequency of the invariant tetranucleotide in genomic DNA is rather high. (Statistically, about 7 × 10⁶ sites can be expected.) However, because of the apparent nonrandomness of sequences flanking the core tetranucleotide, it may be worthwhile to repeat the in vitro DNA selection with either total genomic DNA or promotor (“CpG island”) fragments. Interestingly, HTLF (22), the closest known relative of Whn in the forkhead/winged-helix family of proteins, binds to purine-rich DNA sequences, unlike the Whn binding site with its characteristic PuPyPuPy core sequence. Although a more systematic study is clearly required, our preliminary results suggest that the guanine residue in the ACGC core binding site is critical for Whn binding, as binding is completely abolished in the ACAC mutant. For the second position of the ACGC core binding site, our results indicate a crucial role of the hydrogen atom in the C-5 position of the pyrimidine ring for efficient Whn binding. Enzymatic conversion of cytosine to 5-methylcytosine or replacement of cytosine by thymine drastically affects DNA–protein complex formation. Our results therefore hint at a possible role of 5-methylcytosine in the regulation of Whn binding to DNA. Unmethylated CpG dinucleotides occur in promotor regions of most genes, whereas most cytosine residues in CpG dinucleotides located elsewhere in the genome are enzymatically converted to 5-methylcytosine after DNA replication. As this modification significantly reduces the affinity of Whn for DNA, a large number of potential binding sites for Whn may in fact not be accessible in the cell.
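The order of magnitude of the expected number of core sites can be illustrated with a uniform random-sequence model, in which a given tetranucleotide occurs on average once per 4⁴ = 256 positions on each strand. The assumptions behind the estimate of about 7 million sites (genome size, number of strands counted, base composition) are not stated in the text; the sketch below therefore makes its own assumptions explicit, namely equal and independent base frequencies, and its function names are illustrative.

```python
def expected_sites(positions: float, k: int = 4, strands: int = 1) -> float:
    """Expected occurrences of one specific k-mer under equal, independent base frequencies."""
    return strands * positions / 4 ** k

def positions_implied(sites: float, k: int = 4, strands: int = 1) -> float:
    """Sequence length that would yield the given expected count under the same model."""
    return sites * 4 ** k / strands

print(expected_sites(256))             # one site per 256 positions on one strand
print(f"{positions_implied(7e6):.3g}")  # length corresponding to 7 million sites
```

Under this simplified model, 7 million sites correspond to roughly 1.8 × 10⁹ scanned positions on one strand. Real genomes deviate from the model, in particular because CpG dinucleotides are underrepresented in vertebrate DNA.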

Our results indicate that the binding sites selected in vitro with a peptide encompassing only the DNA binding domain are functional in vivo and mediate transactivation of reporter genes by full-length Whn proteins. Furthermore, our experiments suggest that the DNA binding domains from mouse, zebrafish, and amphioxus recognize similar DNA sequences and are functionally interchangeable when embedded within a full-length mouse Whn protein.

whn Genes Are Highly Conserved in Chordates.

The present data indicate that whn genes are highly conserved in species from at least two subphyla (vertebrata and acrania) of chordata, suggesting that whn genes may also be found in tunicata, the third subphylum of chordata. Are the whn genes isolated from vertebrates homologs or true orthologs of the mouse nude gene? Several arguments can be made in favor of orthology. First, there is a significant degree of amino acid sequence conservation, which (excluding the extension of wing 1 in lamprey) amounts to well over 80%. Second, the exon/intron structure of vertebrate whn genes is conserved: the DNA binding domain is encoded by three exons and two introns are inserted at exactly identical positions in all vertebrate whn genes. The location of the intron separating the DNA binding domain from the C-terminal end of Whn proteins is identical in all known whn genes. Third, although the full-length sequence of whn genes is known only for human, rat, and mouse, at least one other functionally important feature of the Whn protein was identified in the fugu whn gene. The Whn protein contains an ≈50-aa transactivation domain in its C terminus. Despite considerably lower overall sequence identity at the protein level in this region, the functionally important cluster of acidic residues is conserved between fish and mammalian species. Therefore, there is strong evidence to suggest that the whn genes isolated from all four classes of vertebrates are orthologous. The question of whether the amphioxus whn gene is orthologous to vertebrate whn genes is more difficult to answer. Although a high level of sequence identity and considerable codon bias can also be found for amphioxus, the DNA binding domain is encoded in a single exon. It is of note, however, that the 3′ end of this exon coincides with the end of the last exon of vertebrates encoding the DNA binding domain. This suggests that the ancestral whn gene had only a single exon for the DNA binding domain and that two introns were inserted after the establishment of the vertebrate lineage. Given the fact that only one whn-like gene could be detected in vertebrates and amphioxus, we favor the possibility that the whn gene from Branchiostoma lanceolatum is an ortholog of the mouse nude gene. Clearly, it will be important to establish the complete structure of chordate whn genes and to assess their functional interchangeability (e.g., by attempting to rescue the mutant phenotype in nude mice).

Comparison of whn gene sequences with entries in public databases reveals that the sequence most closely related to whn is that of the human HTLF gene (22): human or amphioxus Whn shares about 54% identical residues with human HTLF, and this value rises to 65% when Whn consensus residues are compared with HTLF. However, the observed amino acid similarity does not extend to the exon/intron structure. The DNA binding domain, including the 21 N-terminal amino acids of the HTLF gene, occurs in one large exon; the first intron interrupts the HTLF coding sequence in phase 2 of codon 123, 10 amino acids after the end of the DNA binding domain (data not shown). This genomic organization is clearly distinct from that of whn genes; therefore, the existence of a common ancestor of whn and HTLF genes remains uncertain.
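The two identity figures quoted above depend on what is counted. As a minimal sketch, the Python below shows one plausible way to compute a pairwise percent identity and an identity against consensus residues from pre-aligned sequences. The function names, the treatment of gaps, and the toy sequences are assumptions made for illustration only; they are not the procedure or the data used in this study.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def consensus(seqs: list[str]) -> str:
    """Keep a residue only where every sequence agrees; mark all other columns with '-'."""
    return "".join(
        col[0] if len(set(col)) == 1 and col[0] != "-" else "-"
        for col in zip(*seqs)
    )


# Hypothetical toy alignment (NOT real Whn or HTLF sequences).
whn_a = "MKRLSAWQ-E"
whn_b = "MKRLTAWQDE"
other = "MRRLGAWQDE"

pairwise = percent_identity(whn_a, other)                       # 7 of 9 compared columns
vs_consensus = percent_identity(consensus([whn_a, whn_b]), other)  # 7 of 8 invariant columns
print(f"pairwise: {pairwise:.1f}%  consensus: {vs_consensus:.1f}%")
```

In this toy case the identity against the consensus is higher than the pairwise value because columns that vary among the family members are excluded from the comparison, which is the same direction of change as reported above (54% versus 65%).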

Functional Role of whn Genes During Evolution.

The present results raise intriguing questions as to the role of whn genes in cephalochordates and agnathans. In the mouse, whn is essential for the proper keratinization of hair shafts and the differentiation of the epithelial thymic primordium. Table 1 correlates the presence of these tissues to the molecular phylogeny of whn genes. No morphological evidence has yet been forthcoming to suggest the presence of thymus glands in agnathans; in addition, an adaptive immune system has not yet been detected in this vertebrate class (6, 7). Even more conspicuous is the absence of an adaptive immune system and its associated organs in the cephalochordate Branchiostoma lanceolatum (23). Thus, whn genes appear to fulfill quite different functions in chordates. Although the site of expression of whn in cephalochordates and jawless vertebrates has not yet been determined, no thymus-like structure or hair-like appendage exists in these animals. Jawed vertebrates, however, all possess a thymus, where whn is likely to play a role in the differentiation of the thymic epithelium, as has been previously shown for the mouse (5). Finally, in mammals, whn also orchestrates the keratinization of the hair shaft (5, 24). Given that only one whn-like gene exists in the genomes of vertebrates and cephalochordates (Fig. 3), changes in transcriptional control regions rather than gene duplication events must have accompanied such functional reassignments. Indeed, the additional function of whn in the hair follicle correlates with the presence of two promoters directing the expression of whn to distinct cellular compartments of thymus and skin (M.S., M. Hofmann, and T.B., unpublished data); this result is supported by recent experiments reporting an incomplete rescue of the mutant phenotype in nude mice (25).

Table 1.

Correlation of the presence of thymus and hair with the presence of whn orthologs in various species of Chordata (+, present; −, absent)

| Subphylum | Class | Species | whn ortholog | Thymus | Hair |
|---|---|---|---|---|---|
| Vertebrata | Mammalia | Mus musculus | + | + | + |
| | Osteichthyes | Danio rerio | + | + | − |
| | Chondrichthyes | Scyliorhinus caniculus | + | + | − |
| | Agnatha | Lampetra planeri | + | − | − |
| Acrania | Cephalochordata | Branchiostoma lanceolatum | + | − | − |

Our results also provide a molecular framework to revisit the possible presence of a proto-thymus in agnathans, which is undetectable by morphological methods.

Acknowledgments

We thank Marion Huth and Melanie Sator-Schmitt for expert technical assistance and Dr. A. Schreiber for Lampetra specimens. We also thank Dr. Susumu Ohno for helpful discussions on the composition of the Cambrian pananimalia genome. Financial support for these studies was provided by the Deutsche Forschungsgemeinschaft.

Footnotes

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. Y11537–Y11544).

References

  • 1. Kiesielow P, von Boehmer H. Adv Immunol. 1995;58:87–209. doi: 10.1016/s0065-2776(08)60620-3.
  • 2. Levelt C N, Eichmann K. Immunity. 1995;3:667–672. doi: 10.1016/1074-7613(95)90056-x.
  • 3. Boehm T, Nehls M, Kyewski B. Immunol Today. 1995;16:555–556. doi: 10.1016/0167-5699(95)80074-3.
  • 4. Nehls M, Pfeifer D, Schorpp M, Hedrich H, Boehm T. Nature (London) 1994;372:103–107. doi: 10.1038/372103a0.
  • 5. Nehls M, Kyewski B, Messerle M, Waldschütz R, Schüddekopf K, Smith A J H, Boehm T. Science. 1996;272:886–889. doi: 10.1126/science.272.5263.886.
  • 6. Marchalonis J J, Schluter S F. Scand J Immunol. 1990;32:13–20. doi: 10.1111/j.1365-3083.1990.tb02886.x.
  • 7. Sima P, Vetvicka V. Crit Rev Immunol. 1993;13:83–114.
  • 8. Schüddekopf K, Schorpp M, Boehm T. Proc Natl Acad Sci USA. 1996;93:9661–9664. doi: 10.1073/pnas.93.18.9661.
  • 9. Nehls M, Lüno K, Schorpp M, Pfeifer D, Krause S, Matysiak-Scholze U, Dierbach H, Boehm T. Mamm Genome. 1995;6:321–331. doi: 10.1007/BF00364794.
  • 10. Evan G I, Lewis G K, Ramsay G, Bishop J M. Mol Cell Biol. 1985;5:3610–3616. doi: 10.1128/mcb.5.12.3610.
  • 11. Nelson C, Albert V R, Elsholtz H P, Lu L I-W, Rosenfeld M G. Science. 1988;239:1400–1405. doi: 10.1126/science.2831625.
  • 12. Mizushima S, Nagata S. Nucleic Acids Res. 1990;18:5322. doi: 10.1093/nar/18.17.5322.
  • 13. Lai E, Clark K L, Burley S K, Darnell J E., Jr Proc Natl Acad Sci USA. 1993;90:10421–10423. doi: 10.1073/pnas.90.22.10421.
  • 14. Hromas R, Costa R. Crit Rev Oncol Hematol. 1995;20:129–140. doi: 10.1016/1040-8428(94)00151-i.
  • 15. Segré J A, Nemhauser J L, Taylor B A, Nadeau J H, Lander E S. Genomics. 1995;28:549–559. doi: 10.1006/geno.1995.1187.
  • 16. Clark K L, Halay E D, Lai E, Burley S K. Nature (London) 1993;364:412–420. doi: 10.1038/364412a0.
  • 17. Zvelebil M J, Barton G J, Taylor W R, Sternberg M J. J Mol Biol. 1987;195:957–961. doi: 10.1016/0022-2836(87)90501-8.
  • 18. Pierrou S, Hellqvist M, Samuelsson L, Enerbäck S, Carlsson P. EMBO J. 1994;13:5002–5012. doi: 10.1002/j.1460-2075.1994.tb06827.x.
  • 19. Overdier D G, Porcelly A, Costa R H. Mol Cell Biol. 1994;14:2755–2766. doi: 10.1128/mcb.14.4.2755.
  • 20. Wilson D S, Sheng G, Jun S, Desplan C. Proc Natl Acad Sci USA. 1996;93:6886–6891. doi: 10.1073/pnas.93.14.6886.
  • 21. Kadesch T. Cell Growth Diff. 1993;4:49–55.
  • 22. Li C, Lusis A J, Sparkes R, Tran S-M, Gaynor R. Genomics. 1992;13:658–664. doi: 10.1016/0888-7543(92)90138-i.
  • 23. Rhodes C P, Ratcliffe N A. Develop Comp Immunol. 1983;7:695–698.
  • 24. Hardy M H. Trends Genet. 1993;8:55–61. doi: 10.1016/0168-9525(92)90350-d.
  • 25. Kurooka H, Segré J A, Hirano Y, Nemhauser J L, Nishimura H, Yoneda K, Lander E S, Honjo T. Int Immunol. 1996;8:961–966. doi: 10.1093/intimm/8.6.961.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
