Supporting information for Correia et al. (2003) Proc. Natl. Acad. Sci. USA, 10.1073/pnas.1131007100

Supporting Figure 4A

Supporting Figure 4B

Supporting Figure 4C

Supporting Figure 4D

Fig. 4.

PSI-BLAST, BLASTP, and multiple sequence alignment (PILE UP) analyses. (A) The E values and a multiple sequence alignment generated by PSI-BLAST with FNG are shown. As described at the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/blast), PSI-BLAST generates a position-specific scoring matrix (PSSM) from an alignment of sequences generated by BLAST. The PSSM then becomes the query for further iterations, and the E values reflect similarity to the PSSM rather than to the original query. PSI-BLAST was carried through five iterations with default settings (until no additional sequences passed the threshold). For reference, we also include the E values of the next five best matches; none of these corresponds to a known glycosyltransferase. Because this PSI-BLAST search was done on release 1 of the Drosophila genome, some sequences and gene names differ from those in other figures. For example, CG8976 is shown here, but in release 3.1 it was split into CG33145, CG30037, and CG30036. Additionally, CG18558 and CG8673 appear only in later releases, not in release 1. Sequences that correspond to the conserved motifs identified in Fig. 2 in the main text are red. In some cases there are gaps in the alignments shown in this panel, but these gaps disappear when multiple sequence alignments are performed among members of related groups (Fig. 4 C and D). Because alignment of more closely related proteins is more reliable, we gave more weight to the alignments in Fig. 4 C and D (i.e., keeping conserved motifs together within a family) than to those in Fig. 4A in generating the alignments presented in Fig. 2 in the main text.

(B) The results of BLASTP searches against release 3.1 of the predicted Drosophila proteome, with each of the putative β3GTs listed in Fig. 2 in the main text used as a query. For simplicity, the raw BLASTP output has been edited to delete BLAST hits with E values >0.5 and to delete multiple listings of protein products from the same gene.
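
The editing of the raw BLASTP output described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to prepare the figure: the `Hit` record, the gene and protein identifiers, and the E values are hypothetical placeholders, and keeping the lowest-E listing for each gene is an assumption (the legend says only that multiple listings from the same gene were deleted).

```python
from typing import NamedTuple


class Hit(NamedTuple):
    protein: str    # protein product identifier (hypothetical)
    gene: str       # gene encoding that product (hypothetical)
    e_value: float  # BLASTP expectation value


def filter_hits(hits, max_e=0.5):
    """Drop hits with E > max_e and keep one listing per gene.

    Assumption: when a gene has several protein products among the hits,
    the listing with the lowest E value is the one that is kept.
    """
    best = {}
    for hit in hits:
        if hit.e_value > max_e:
            continue  # below the similarity cutoff used in the legend
        current = best.get(hit.gene)
        if current is None or hit.e_value < current.e_value:
            best[hit.gene] = hit
    # Report the surviving hits from most to least similar.
    return sorted(best.values(), key=lambda h: h.e_value)


# Made-up input: two products of geneA, one weak hit above the cutoff.
raw = [
    Hit("geneA-PA", "geneA", 1e-40),
    Hit("geneA-PB", "geneA", 3e-38),
    Hit("geneB-PA", "geneB", 2e-12),
    Hit("geneC-PA", "geneC", 0.7),
]
kept = filter_hits(raw)
for hit in kept:
    print(hit.gene, hit.protein, hit.e_value)
```

With this made-up input, the second product of `geneA` and the weak `geneC` hit are removed, leaving one listing each for `geneA` and `geneB`.
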
The BLAST listings are color coded as follows: black denotes a match identical to the query, blue denotes a match within the same β3GT family, green denotes a match to a member of a different β3GT family, and red denotes a match to a gene that does not encode a known β3GT. β3GTs were subdivided into families using three criteria. As a first criterion, all members of a family should be more closely related to each other than they are to genes outside the family. As a second criterion, the genes should be “closely” related: considering the range of similarity among all β3GTs, E values <1e-15 generally led to genes being considered part of the same family, whereas E values >1e-10 generally led to genes being considered part of a different family. As a third criterion, we considered the similarity within the four conserved motifs identified in Fig. 2 in the main text. These criteria were applied to the families as follows:

(Ba) FNG. The similarity to the best match, CG9109, is relatively low. Additionally, although CG9109 is the closest match to FNG, FNG is not the closest match to CG9109.

(Bb) Core 1 β3GalT group. Ju et al. (1) identified CG9520 and CG8708 as likely Drosophila homologues of mammalian core 1 β3GalT but did not include the other genes that we include in this group as potential homologues. In our analysis, we note that although CG9520 and CG8708 are closest to each other, they are also the closest homologues of almost all the other members of the group. The sequence identity is quite high (Fig. 5C), and, excluding CG2983, all members of this group are much more closely related to each other than they are to other β3GTs. Although consideration of overall sequence similarity would place CG2983 in this group as well, we exclude it because it matches poorly in three of the four key sequence motifs.

(Bc) Chondroitin synthase. Kitagawa et al. (2) identified CG9220 as related to mammalian chondroitin synthase, and, like other GAG polymerases, CG9220 included two distinct GT domains in release 2 of the genome. In release 3.1, the β3GT domain was deleted from CG9220, but based on the similarity to chondroitin synthase, we think this newer version is incorrect and therefore continued to use the release 2 version for this BLAST search. The similarity to CG12913 is in the β4GT domain. Although the similarity to CG4351 is reasonably high, CG4351 differs substantially from CG9220 in the conserved motifs.

(Bd) Galactosyltransferase II. Bai et al. (3) identified CG8734 as related to mammalian Galactosyltransferase II. The closest match, CG33145, is moderately related, but CG33145 is much more closely related to other genes than it is to CG8734.

(Be) CG9109. The closest match, CG2975, is moderately related, but CG2975 is much more closely related to other genes than it is to CG9109.

(Bf) β3Gal/GnT family. Prior studies (4) have recognized that mammalian genes that transfer Gal onto GlcNAc, and GlcNAc onto Gal or GalNAc, in a β1,3 linkage are closely related. This family comprises the closest Drosophila homologues of these mammalian enzymes. Although more diverse than the core 1 β3GalT family, they are grouped together because they are all more closely related to each other than they are to members of other groups. The similarity of CG8734 to CG33145 provides a single exception to this outcome, but CG8734 lacks significant similarity to most of the other members of this group.

(Bg) CG2983. Although it is similar by BLAST to the core 1 β3GalT group, we exclude it because of the low similarity in key motifs.

(Bh) CG4351. The best match, CG9220, does not match well in the conserved motifs.

(C) Multiple sequence alignment for members of β3GT family 2B (Core 1 β3GalTs), generated by PILE UP. Predicted amino acids that are identical in five or more genes are shaded black.
Bars above the sequence numbers mark the locations of the conserved motifs identified in Fig. 2 in the main text.

(D) Multiple sequence alignment for members of β3GT family 2F (β3Gal/GnTs), generated by PILE UP. Predicted amino acids that are identical in five or more genes are shaded black. Bars above the sequence numbers mark the locations of the conserved motifs identified in Fig. 2 in the main text. All the conserved motifs align without gaps.
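
The E-value guidelines used in (B) to group β3GTs into families can be illustrated with a short Python sketch. It is a simplified approximation, not the authors' procedure: the legend says the thresholds applied only "generally", the function names and the gene names and E values in the example are hypothetical, and the reciprocal-best-match test is only a rough stand-in for the first criterion. The third criterion (similarity within the conserved motifs) is not modeled.

```python
SAME_FAMILY_E = 1e-15       # E values below this generally meant "same family"
DIFFERENT_FAMILY_E = 1e-10  # E values above this generally meant "different family"


def classify_by_e_value(e_value):
    """Apply the two E-value guidelines; values in between stay undecided."""
    if e_value < SAME_FAMILY_E:
        return "same family"
    if e_value > DIFFERENT_FAMILY_E:
        return "different family"
    return "undecided"


def best_match(query, e_values):
    """Return the non-self hit with the lowest E value for `query`."""
    hits = {gene: e for gene, e in e_values[query].items() if gene != query}
    return min(hits, key=hits.get)


def is_reciprocal_best_match(a, b, e_values):
    """True only if each gene is the other's closest match."""
    return best_match(a, e_values) == b and best_match(b, e_values) == a


# Hypothetical E values: e_values[query][hit]
e_values = {
    "geneA": {"geneB": 1e-60, "geneC": 1e-8},
    "geneB": {"geneA": 1e-58, "geneC": 1e-12},
    "geneC": {"geneB": 1e-12, "geneA": 1e-8},
}

print(classify_by_e_value(e_values["geneA"]["geneB"]))
print(classify_by_e_value(e_values["geneC"]["geneB"]))
print(is_reciprocal_best_match("geneA", "geneB", e_values))
print(is_reciprocal_best_match("geneC", "geneB", e_values))
```

In this made-up example, `geneC` has `geneB` as its closest match but `geneB` is closer to `geneA`. That is the same one-sided pattern the legend describes for FNG and CG9109.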

1. Ju, T., Brewer, K., D’Souza, A., Cummings, R. D. & Canfield, W. M. (2002) J. Biol. Chem. 277, 178–186.

2. Kitagawa, H., Uyama, T. & Sugahara, K. (2001) J. Biol. Chem. 276, 38721–38726.

3. Bai, X., Zhou, D., Brown, J. R., Crawford, B. E., Hennet, T. & Esko, J. D. (2001) J. Biol. Chem. 276, 48189–48195.

4. Zhou, D., Dinter, A., Gutierrez Gallego, R., Kamerling, J. P., Vliegenthart, J. F., Berger, E. G. & Hennet, T. (1999) Proc. Natl. Acad. Sci. USA 96, 406–411.