Nucleic Acids Res. 2001 Mar 15;29(6):1352–1365. doi: 10.1093/nar/29.6.1352

Table 1. Mouse and human genes in the ACHE/TFR2 region.

ACHE Ache AChE (acetylcholinesterase) has multiple molecular forms (28) (Fig. 2B displays the dominant E4-E6 hydrophilic variant). A highly conserved region just 5′ of the transcription start site, highlighted by a red stripe in Figure 2B (positions 7445–7516), contains previously described transcription factor binding sites (55). A second highly conserved region, also highlighted by a red stripe in Figure 2B (positions 6374–7139), contains an alternative promoter that acts synergistically with the proximal promoter to enhance Ache expression in neuroblastoma cells (29). The conserved regions found upstream of the alternative promoter have yet to be explored for regulatory elements.
 
Hs.157779 Mm.21836 Using our comparative genomic data we predict a 217 amino acid product in mouse just downstream of Ache. In human we can predict a 227 amino acid product from AC002993 that is 79.2% identical to a putative 217 amino acid product in mouse (Table 2). These products show similarity to the C-terminal end of the putative proteins CG16979 (Drosophila), CAB41160 (Arabidopsis), F38A5.1 (C.elegans) and BAA92064 (human). However, on our human cosmid 159d9 and clone NH0126L15 sequenced by Washington University in St Louis (AC011895), the start codon has a C instead of an A, making the largest ORF 146 amino acids.
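The effect of a start-codon substitution on the largest ORF can be illustrated with a minimal Python sketch. The toy sequence and the function name `longest_orf` are hypothetical and are not the authors' data or method; the sketch only scans the three forward reading frames for the longest ATG-to-stop ORF.

```python
STOPS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Length in codons (stop excluded) of the longest ATG-to-stop ORF
    in the three forward frames. ORFs lacking a stop codon are ignored."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the ORF being read
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                best = max(best, (i - start) // 3)
                start = None
    return best

# Hypothetical toy sequence with an internal, in-frame ATG.
wild_type = "ATGGCCAAAATGGGTCCTGAATAA"
mutant = "C" + wild_type[1:]  # A -> C at the first base of the start codon

print(longest_orf(wild_type), longest_orf(mutant))
```

With the first ATG changed to CTG, the longest ORF in the toy sequence begins at the next in-frame ATG and is therefore shorter, which mirrors the way the 227 amino acid prediction drops to 146 amino acids in the text.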
   
ASR2 Asr2 (Also known as: Ars2) Asr2 (arsenite resistance protein 2) was named for its ability to confer arsenite resistance to an arsenite-sensitive Chinese hamster ovary cell line (56). We characterized the complete genomic structure of ASR2 (Table 2 and Fig. 2B) and found it transcribed in all tissues tested in both human and mouse (data not shown). The 876 amino acid human product differs in both size and amino acid content from that predicted by a full-length cDNA clone AL096723. This can be explained by two separate, single nucleotide differences in AL096723 that cause a frameshift error. These differences are resolved by our human genomic sequence and were probably due to sequence compressions in AL096723. Alternative splicing of the Asr2 mRNA was revealed by sequencing two different PCR-derived Asr2 cDNA clones that contain the full coding region of the mouse Asr2 cDNA (less the first methionine and the last eight codons). By comparing these cDNA clones with genomic and EST sequences we found three alternative splice events that decreased the coding size by 12, 21 or 33 (12 plus 21) bp (Table 2). An examination of human ESTs revealed only the 12 bp alternative splice event which, as in mouse, uses a non-canonical splice site (TG instead of AG in the AG–GT splice rule).
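The three splice events remove 12, 21 or 33 bp, all multiples of three, so each would delete whole codons without shifting the reading frame. This is an arithmetic inference rather than a statement from the source; a minimal Python check, with the hypothetical helper `codons_removed`, is:

```python
def codons_removed(deleted_bp):
    """Return the number of whole codons removed by a deletion,
    or None if the deletion would shift the reading frame."""
    if deleted_bp % 3 != 0:
        return None
    return deleted_bp // 3

# Sizes of the three alternative splice events reported for Asr2.
events = {bp: codons_removed(bp) for bp in (12, 21, 33)}
print(events)
```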
   
CIP1 Cip1 We identified and characterized a novel cation-chloride cotransporter-like gene in both human and mouse. Recently, Caron et al. (57) independently characterized a 3.2 kb transcript that they named CIP1 (cation-chloride cotransporter-interacting protein) because it coimmunoprecipitates with endogenous NKCC (Na-K-Cl cotransporter) and, in Xenopus laevis oocytes, inhibits NKCC1-mediated cation transport. CIP1 is 100% identical to our putative cation-chloride cotransporter-like protein and is predicted by PSORT (58) to have 12 transmembrane helices. To confirm our gene predictions we cloned and sequenced a partial cDNA, which spans all 13 exon/intron boundaries, from a human brain cDNA pool. This cDNA, combined with human and mouse genomic sequence comparisons, allowed us to predict the mouse exon structure, of which exons 7–13 have subsequently been confirmed by sequencing cloned mouse brain cDNA. We screened Clontech mouse and human multiple tissue cDNA panels using PCR, and found that this gene is transcribed in all tissues tested (data not shown). Alternative splice variants were apparent in all mouse tissues tested, while human expression studies using primers designed for the corresponding positions on human cDNA showed no evidence of alternative splice products. However, human EST AW675796 reveals an alternative splicing event that results in a 627 amino acid isoform of CIP1 that is predicted to have a truncated C-terminus, yet is still predicted to contain 12 transmembrane spanning regions. CIP1 shows higher similarity to Drosophila (60%; CG10413), C.elegans (54%; T04B8.5) and putative yeast (49%; NP009794.1) proteins than it does to other members of the two established branches (Na⁺-K⁺-Cl⁻ and K⁺-Cl⁻) of the cation-chloride cotransporter family. This evolutionary conservation, combined with preliminary functional studies, places CIP1 in a third branch of the cation-chloride cotransporter family and suggests that it may be part of a new family of proteins that modify the activity or kinetics of other cation-chloride transporters (57).
   
TRIP6 Trip6 TRIP6 (thyroid receptor interacting protein 6) is a human LIM domain-containing protein previously mapped to 7q22 (59) whose proposed function is to transmit signals from the cell surface to the nucleus (60). We have characterized the genomic structure of the mouse and human Trip6 genes (Table 2).
   
EPHB4 Ephb4 EPHB4 (ephrin type-B receptor 4 precursor), originally named HTK (hepatoma transmembrane kinase), is a receptor tyrosine kinase expressed in a variety of tissues, with its hematopoietic expression localized to the monocytic lineage (61,62). EPHB4 is preferentially expressed in veins (63) and recent studies have clearly linked Ephb4 to angiogenesis in X.laevis (64). The mouse ortholog, Ephb4, was originally designated Myk1 (mouse tyrosine kinase; 65) and later called Mdk2 (mouse developmental kinase 2; 66). The EST sequence AI049017 extends the 5′ UTR of the Ephb4 cDNA and suggests that an alternative splicing event occurs in this region of the gene. While our genomic data confirm both the coding region and the overall transcript size previously described (66), we do not find a match for the 3′ end (positions 3756–4473) of that sequence (Z49085); however, this sequence matches perfectly to positions 781–1496 of the Cctg gene (Z31556), which has been mapped to mouse chromosome 3 (67). We thus conclude that Z49085 contains sequence from two separate genes.
   
ZAN Zan ZAN (Zonadhesin) codes for a sperm membrane protein that binds to the zona pellucida of the egg in a species-specific manner (68). In mice, the Zan product is a large (5374 amino acid) testis-specific protein with a complex mosaic structure consisting of MAM (meprin, A5 receptor, protein tyrosine phosphatase mu), mucin, VWD (von Willebrand factor type D), EGF (epidermal growth factor), transmembrane and intracellular domains (24,69). By comparing our mouse genomic sequence with the mouse Zan mRNA MMU97068 (16.1 kb) we find that the mouse Zan gene contains 88 exons and spans 98.2 kb of genomic DNA. In contrast, the human ZAN gene contains 48 exons, spans 61 kb and codes for a protein of 2724 amino acids (T.Cheung, M.Wassler, G.Cornwall and D.Hardy, manuscript in preparation). The expansion of the mouse Zan gene stands out in the dot-plot (Fig. 2A) as a 39.5 kb elongation along the x-axis (mouse; positions 108788–147614 on Fig. 2B), and is primarily due to multiple duplications of the partial D3 domain sequence. Each of the 20 partial D3 domains (exons 38–77) consists of a segment containing two exons of ∼218 and 142 bp in length.
EPO Epo Erythropoietin (EPO) regulates the production of red blood cells by promoting the proliferation and differentiation of erythroid precursor cells (reviewed in 70). The mouse and human genomic structures for EPO have previously been described in detail, with considerable progress being made in characterizing the regulatory elements (reviewed in 49). The major proximal regulatory elements are a promoter at PIP positions 205140–205195 in the mouse sequence and a 3′ enhancer at 201210–201252 (71). These elements, marked by red stripes in Figure 2B, are substantially more conserved than the surrounding sequence. While initial clues for their location were provided by careful inspection of human–mouse alignments between much shorter sequences, Figure 2B verifies that they also stand out clearly in a completely automated comparison of long genomic regions. In particular, the promoter corresponds very closely to a 48 bp gap-free segment with 94% nucleotide identity (PIP positions 205144–205191), and the enhancer is contained within a 149 bp gap-free segment with 79% identity (PIP positions 201141–201289). Two additional elements regulating EPO transcription have been mapped to the region 14 kb upstream of the transcription start site in humans by inserting large DNA fragments into transgenic mice (summarized in 49). These results suggest that the conserved regions around positions 209–214 kb may be worthy of investigation.
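The notion of a gap-free segment with a given nucleotide identity can be sketched in Python as follows. The function `gap_free_segments` and the toy alignment are hypothetical illustrations, not the alignment software or sequences used for Figure 2B; the sketch splits a pairwise alignment at every gapped column and reports the percent identity of each remaining run.

```python
def gap_free_segments(aln_a, aln_b, gap="-"):
    """Return (start_column, length, percent_identity) for each maximal
    run of alignment columns in which neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    segments = []
    start, matches = None, 0
    for col, (a, b) in enumerate(zip(aln_a, aln_b)):
        if a == gap or b == gap:
            if start is not None:  # a gap closes the current segment
                length = col - start
                segments.append((start, length, 100.0 * matches / length))
                start, matches = None, 0
            continue
        if start is None:
            start = col
        if a.upper() == b.upper():
            matches += 1
    if start is not None:  # close a segment that runs to the last column
        length = len(aln_a) - start
        segments.append((start, length, 100.0 * matches / length))
    return segments

# Hypothetical toy alignment, not taken from the EPO locus.
human = "ACGTAC-GTTAGC"
mouse = "ACGAACGGT-AGC"
segments = gap_free_segments(human, mouse)
for start, length, identity in segments:
    print(start, length, round(identity, 1))
```

As a rough consistency check on the reported promoter figure, 45 matching positions out of 48 give 93.75% identity, which rounds to 94%.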
RPP20 Rpp20 RPP20 codes for a 20 kDa protein that co-purifies with human ribonuclease P (RNase P), a tRNA processing ribonucleoprotein (72). The Rpp20 gene consists of at least two exons, one of which accounts for the entire 140 amino acid product, which is 94.3% identical between human and mouse (Table 2).
PERQ1 Perq1 PERQ1 codes for a novel protein rich in the amino acids proline, glutamate, arginine and glutamine (P, E, R and Q), with a glycine-tyrosine-phenylalanine (GYF) domain. The GYF domain is part of a unique helix–bulge–helix domain that binds to proline-rich motifs (73). Our mouse–human comparison, EST data and partial cDNA sequencing predict a putative Perq1 mouse protein and an orthologous PERQ1 human protein that differ somewhat in structure from the putative ORF2 (11). In order to predict the 1004 amino acid product in human, we had to assume that a run of nine cytosine bases (position 88127 of AF053356) actually contained eight or ten, as the reading frame shift produced did not correspond to our mouse cDNA sequence or genomic data. While the cDNAs are still not fully characterized, we predict a minimum 3 kb transcript with 24 exons in both species (Table 2). Considering the mouse–human sequence similarity and the presence of a CpG island found within the region 8.8 kb 5′ of the annotated Perq1 gene (Fig. 2B; PIP positions 229–231 kb), it is possible that part of the upstream 5′ UTR and promoter region has not been characterized. At the 3′ end of the gene, several EST sequences from a variety of species match significantly with the region between Perq1 and the 3′ end of Gnb2. PERQ1 shows 58% identity to two putative human proteins (AK001739 and AB014542) and 55% identity to a putative chicken protein (U90567). These three proteins are also rich in P, E, R and Q, and contain conserved GYF domains.
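Why the length of a single homopolymer run matters for the predicted product can be shown with a short Python sketch. The flanking sequences and helper names below are hypothetical and do not come from AF053356; the sketch only demonstrates that a run of eight or ten bases, rather than nine, changes every codon downstream of the run.

```python
def codons(seq):
    """Split a sequence into complete codons, reading from the first base."""
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def with_run_length(prefix, base, run_length, suffix):
    """Build prefix + a homopolymer run of the given length + suffix."""
    return prefix + base * run_length + suffix

# Hypothetical flanking sequence around a cytosine run.
prefix, suffix = "ATGGCA", "GATTACAGGTAA"
nine = codons(with_run_length(prefix, "C", 9, suffix))
eight = codons(with_run_length(prefix, "C", 8, suffix))
ten = codons(with_run_length(prefix, "C", 10, suffix))

print(nine)
print(eight)
print(ten)
```

In this toy case the nine-base run keeps the downstream codons GAT TAC AGG TAA in frame, whereas runs of eight or ten bases each produce a different downstream codon series and lose the stop codon.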
GNB2 Gnb2 GNB2 (guanine nucleotide binding protein, β polypeptide 2) encodes a β subunit of guanine nucleotide-binding proteins (G-proteins). G-proteins are heterotrimeric, containing α, β and γ subunits, and are involved in a multitude of cellular signaling processes (for a recent review see 74). We characterized the mouse genomic structure for Gnb2. Part of the highly conserved region found in the 5′ UTR (PIP positions 250459–252757) appears to be transcribed in various mouse and human tissues (see http://web.uvic.ca/~bioweb/laj.html).
BAF53B Baf53b The human ACTL6-like gene is more closely related to actin genes of lower organisms than to those of vertebrates (11). Since its discovery, ACTL6-like has confusingly been called ACTL6 (BAF53A). However, it is clear from comparing our mouse genomic sequence to AF053356 that the ACTL6-like gene is actually BAF53B. The mouse ortholog Baf53b encodes a 426 amino acid putative product that is 84.1% identical to the putative 475 amino acid human BAF53B (Table 2). Baf53b is 84% identical to mouse Baf53a, whose role in chromosome remodeling has previously been described (75).
TFR2 Tfr2 Transferrin receptors mediate iron uptake by binding and internalizing the carrier protein transferrin. The transferrin receptor 2 gene (TFR2) contains 18 exons, is 2.9 kb in length and is primarily expressed in the liver (11,76). We have determined the genomic structure of the first six exons of mouse Tfr2 and identified an alternative splice variant occurring in the 5′ UTR region of the gene (EST AA537969).

An expanded version of this table and links to specific sequences can be found at http://web.uvic.ca/~bioweb/laj.html.