The coding and partial noncoding nucleotide
sequence of the msCRABP gene and deduced amino acid sequence. Exon and
protein sequences are upper case. Intron and 5′ and 3′ regulatory
sequences are lower case. Protein sequence is listed in one letter code
below the second nucleotide of each codon. Nucleotide and amino acid
(aa) numbers are shown to the right of the sequence. The letter in
parentheses indicates the amino acid (I23) encoded by a split codon.
Nucleotide one is the first nucleotide of exon 1. Negative numbers
indicate nucleotide sequence upstream of the transcription initiation
site. Positive numbers are for the cDNA sequence only. All identified
motifs in the regulatory regions and transcription unit are in
boldface. Regulatory regions: Shown with shaded ovals is a RARE-like
motif (see text). Underlined are several putative transcription factor
binding motifs (GC boxes), four repetitive sequences (labeled a-d), and
transcription termination processing signals (GT cluster; 3′ regulatory
region). Shown boxed are a CAAT box, a GAGA/purine-like box, a
Drosophila zeste site, and a GATA-1 site. msCRABP
transcription unit: The transcription initiation site is boxed and
labeled mRNA START. The box with a bent arrow marks the initiator codon for
translation, and the box labeled STOP marks the termination codon. Shown
with an open oval close to transcription initiation is a downstream
element. Underlined are putative branchpoint sequences for intron
splicing (boldface nucleotides represent consensus), a consensus site
for the transcription factor AP2, and two poly(A) addition signals
(ATTAAA). Strong polypyrimidine tracts preceding 3′ splice sites
(introns 1 and 3) are underscored with dots. The first and last two
nucleotides of each intron are in boldface. Circled and labeled poly(A)
is the poly(A) addition site. Shown in boldface with bidirectional
arrows are two large palindromic sequences in introns 1 and 3. The
deduced msCRABP amino acid sequence differs by two amino acids (shown
in square boxes) compared with the prothoracicotropic hormone peptide
fragment sequence (20), although the two residues in question were
determined with low confidence during prothoracicotropic hormone
sequencing. The successful degenerate primers are noted by brackets and
the cDNA sequence used for Southern hybridization is delimited by large
parentheses.