Dottorini et al. 10.1073/pnas.0703904104.

Supporting Information

Files in this Data Supplement:

SI Table 2
SI Appendix 1
SI Appendix 2
SI Table 3
SI Figure 3




Fig. 3. Three-dimensional models of five An. gambiae MAG proteins. (A) The male-specific An. gambiae CAP (CRISP/Antigen5/PR-1) protein 06418. The An. gambiae protein (blue) is superimposed onto 1qnx (cyan) to illustrate the Ca2+-binding site and the conserved disulfide bridges. Disulfide bridges (red, shown on the model for 1qnx) form between C27-C41, C32-C122, C52-C115, C198-C215, and C237-C249 (not shown) of the An. gambiae protein. The spatial orientation of the histidine residues that form the Ca2+-binding site is conserved between 06418 (H119 and H184, dark green) and 1qnx (light green). (B) The male-specific An. gambiae CRAL-TRIO domain-containing protein 09365. Three parallel β-sheets form the floor of the hydrophobic pocket, with hydrophobic residues projecting into the center of the pocket (yellow). The C-terminal string (gray) extends around the rear of the protein, providing reinforcement for the pocket. (C) The An. gambiae cyclophilin-like isomerase 07088 (cyan) superimposed on 1cyn (blue), showing the conserved binding site of cyclosporine (yellow). (D) The An. gambiae carboxylesterase COEBE1D. The active site of the D. melanogaster orthologue EST-6 is partially conserved in An. gambiae; the catalytic triad of both COEBE1D (yellow, superimposed on the active site of 2bce in blue) and COEBE4D consists of a serine (S210/S207) and a histidine (H466/H463), but in contrast to EST-6 (and 2bce) the third member of the triad is a glutamic acid (E342/E342) instead of an aspartic acid. The oxyanion hole (shown in gray, superimposed on 2bce in cyan) is likely formed by two alanine residues (A133/A130 and A211/A208) and one glycine residue (G132/G129). (E) The An. gambiae acid lipase 03083. Three principal domains are shown: the α/β hydrolase domain (red), the "cap" domain (amino acids 219-340, green) covering the active site, and the putative "lid" structure (amino acids 253-282, yellow), which contains a disulfide bridge between C265-C274 (pink). The active site comprises the highly conserved catalytic triad S188 (which matches the consensus sequence GX1SX2G), D361, and H392 (cyan) and an oxyanion hole formed between the NH2 groups of L102 and Q189 (gray). (F) A close-up of the active site of the acid lipase 03083 superimposed upon the active site of 1hlg, revealing the conserved spatial orientation of the residues of the catalytic triad and oxyanion hole.





Table 2. D. melanogaster Acp genes whose putative An. gambiae orthologues could not be detected in the male accessory glands

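| Drosophila Acp | Anopheles homologues | F | T | RB | Method |
|---|---|---|---|---|---|
| Acp26Aa | 11117 | + |  | + | Matlab |
| Acp29Ab | 20910* | + | + | + | Matlab |
| Mst57Dc | 03130 |  |  | + | Matlab |
| CG1462 | 10596 | + |  |  | BLAST |
| CG2918 | 01827 | + |  |  | BLAST |
| Acp76A | 07691 | + | + |  | Matlab |
| CG4147 | 04192 | + | + |  | BLAST |
| Acp32CD | 06410 | + |  |  | Matlab |
| CG6168 | 06610 |  | + | + | Matlab |
| CG6461 | 08915 |  |  |  | BLAST |
| CG8093 | 03500, 03501 |  |  |  | Matlab |
| CG11864 | 10764 | + |  |  | BLAST |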

The expression profile of the Anopheles genes in different tissues is indicated. The + symbol indicates expression, and an empty space indicates that no expression could be detected. In three cases, no amplification could be recovered from any tissue. F, whole females; T, testes; RB, male carcasses depleted of reproductive organs. The last column indicates the bioinformatics method used to identify the putative orthologues. For an additional 22 Drosophila Acps, no putative Anopheles orthologues were identified in the bioinformatics analyses.

*For 20910 the old Ensembl identifier is used (omitting the initial ENSANGG000000 prefix) because the gene is no longer in the database and a new identifier has not been assigned to it.
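
As a small illustration of the identifier convention described in the footnote, the sketch below rebuilds a full-length old-style Ensembl gene identifier from its abbreviated form. It assumes that the omitted prefix is exactly ENSANGG000000, as quoted in the footnote, and that the abbreviated form is five digits long. The function name is hypothetical, and the sketch applies only to old-style identifiers such as 20910.

```python
ENSEMBL_PREFIX = "ENSANGG000000"  # prefix quoted in the Table 2 footnote


def expand_short_id(short_id: str) -> str:
    """Rebuild an old-style Ensembl gene ID from its five-digit short form.

    A trailing footnote asterisk (as in "20910*") is ignored.
    """
    digits = short_id.strip().rstrip("*")
    if not (digits.isdigit() and len(digits) == 5):
        raise ValueError(f"expected a five-digit identifier, got {short_id!r}")
    return ENSEMBL_PREFIX + digits


print(expand_short_id("20910*"))
```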





Table 3. Comparative structural modelling of 22 An. gambiae MAG proteins

| Gene | Classification (superfamily) | PFAM class | PDB ID code homology model | Homology model E-score | Identity, % | 123D+ Z-score | AGAPE remark-score | 3D-PSSM E-value |
|---|---|---|---|---|---|---|---|---|
| 05246 | Serine protease inhibitor | PF00079 | 1qlp | 7.0e-88 | 26 | 20.44 (1a7c) | 0.000e+00 | 0.0027 |
| Coebe4D | Alpha/beta-Hydrolase | PF00135 | 2bce | 1.0e-106 | 25 | 25.42 | 1.47e-11 | 0.0027 |
| Coebe1D | Alpha/beta-Hydrolase | PF00135 | 2bce | 1.0e-107 | 24 | 26.29 | 9.35e-12 | 0.0027 |
| 06418 | PR-1-like | PF00188 | 1qnx | 7.0e-41 | 26 | 33.38 | 0.000e+00 | 8.16e-07 |
| 06581 | Serine protease inhibitor | PF01826 | 1ccv | 5.0e-04 | 47 |  | 2.170e-25 | 0.0508 |
|  |  | PF01826 | 1hx2 | 2.6e-01 | 40 | 5.75 |  |  |
| 06583 | Serine protease inhibitor | PF01826 | 1ccv | 1.1e-02 | 30 |  | 6.490e-22 |  |
|  |  | PF01826 | 1hx2 | - |  | 3.72 |  | 0.0585 |
| 06585 | Serine protease inhibitor | PF01826 | 1ccv | 2.0e-03 | 37 |  | 1.521e-25 | 0.0343 |
|  |  | PF01826 | 1atb | - |  | 4.04 |  |  |
| 06586 | Serine protease inhibitor | PF01826 | 1ccv | 2.0e-04 | 44 |  | 1.78e-24 | 0.0041 |
|  |  | PF01826 | 1hx2 | 1.2e-01 | 32 | 5.57 | 9.32e-23 | 1.44e-07 |
| 06587 | Serine protease inhibitor | PF01826 | 1ccv | 8.0e-08 | 33 |  | 5.211e-25 | 0.0162 |
|  |  | PF01826 | 1hx2 | 1.0e-04 |  | 5.42 |  | 7.72e-08 |
| 07088 | Cyclophilin-like isomerase | PF00160 | 2cpl | 5.0e-59 | 57 | 31.94 |  | 6.89e-5 |
|  |  | PF00160 | 1cyn | 1.0e-65 | 73 |  | 0.000e+00 |  |
| 07491 | FAD-dependent thiol oxidase/Thioredoxin-like | PF00085 | 1mek | 4.0e-28 | 22 | 16.66 |  | 1.76e-05 |
|  |  | PF00085 | 2trx | 6.0e-31 | 20 |  | 1.960e-15 |  |
| 01424 | Ribosomal protein S5/ATPase domain of HSP90 chaperone | PF00183 | 1usv | 3.0e-94 | 50 | 26.69 |  | 7.87e-05 |
|  |  | PF00183 | 1usu | 2.0e-89 |  |  | 0.000e+00 |  |
| 03083 | Alpha/beta-Hydrolase | PF04083 | 1hlg | 1.0e-103 | 33 | 44.21 | - | 4.91e-06 |
| SRPN9 | Serine protease inhibitor | PF00079 | 1qlp | 4.0e-76 | 22 | 20.46 | 0.000e+00 | 1.62e-51 |
|  |  | PF00079 | 1ova | 4.0e-72 | 24 |  |  |  |
| crc | ConcanavalinA-like lectin/glucanase/P-domain calnexin/calreticulin | PF00262 | 1jhn | 1.0e-118 | 34 | 28.66 | - | 1.66e-08 |
| 04428 | RNA-binding domain/FAS1 | PF02469 | 1w7e | 2.0e-39 | 23 | - |  |  |
|  |  | PF02469 | 1070 | - |  |  | 0.000e+00 | 3.78e-09 |
| TEP15 | Terpenoid cyclase/Alpha-macroglobulin receptor | PF07678 | 1c3d | 5.0e-72 | 31 | 12.62 | - | - |
| 08822 | FKBP-like/EF-hand | PF00254 | 1bkf | 2.0e-26 | 35 | 17.11 |  | 0.049 |
|  |  | PF00254 | 1q1c | 4.0e-26 | 27 |  | 9.139e-23 |  |
| 08968 | Kazal type serine protease inhibitor | PF00050 | 1sgp | 2.0e-04 | 35 | 5.60 | - | 0.000277 |
| 09364 | CRAL_TRIO domain/CRAL_TRIO N terminal domain | PF00650 | 1oiz | 2.0e-55 | 27 |  | 0.0 |  |
|  |  | PF00650 | 1aua | 3.0e-51 | 22 | 22.51 |  | 0.00319 |
| 09365 | CRAL_TRIO domain/CRAL_TRIO N terminal domain | PF00650 | 1oiz | 1.0e-54 | 25 |  | 0.0 |  |
|  |  | PF00650 | 1aua | 2.0e-48 | 20 | 23.34 |  | 0.00286 |
| 09842 | Ribonuclease Rh-like | PF00445 | 1dix | 3.0e-55 | 26 | 27.74 |  | 3.0e-5 |
|  |  | PF00445 | 1iyb | 3.0e-55 | 26 |  | 0.000e+00 |  |

For each An. gambiae candidate, we used a combination of three different protein-threading programs (123D+, AGAPE, and 3D-PSSM; last three columns) to identify 3D templates and to choose the most congruent alignment. Significance values, which estimate the accuracy of the prediction, are reported for each program as follows: Z-scores (123D+), remark scores (AGAPE), and E-values (3D-PSSM). 3D models were generated by comparative homology modeling using Geno3D (www.geno3d-pbil.ibcp.fr). The structure-prediction protocols used to identify the most accurate models included the following: (i) accurate significance values (90% confidence of E-value and % certainty for 3D-PSSM, Z-score for 123D+, and remark score for AGAPE); (ii) consistency of the Protein Data Bank (PDB) templates among all prediction programs; (iii) consistency in the PFAM (1) functional annotation for each determined PDB template (www.pdb.org); and (iv) congruency between the superfamily prediction analysis (Superfamily 1.69) (2) and the functional annotation of the 3D models. The stereochemical quality of the models was checked with PROCHECK (3) within Geno3D. Graphical representations of the 3D models were produced with PyMOL (DeLano Scientific). The PDB entry code (PDB ID code) of the structure template used in the homology modeling analysis, the accuracy of the model expressed as an E-value, and the % identity between an Acp sequence and its template structure are indicated. For each candidate, the classification obtained with Superfamily 1.69 is provided, and multiple domains present within a sequence are indicated. The PFAM protein domain annotation codes refer to the PDB templates identified by threading and used for the homology modeling. PFAM codes are indicated to demonstrate the consistency of the PDB templates identified with the threading methods. The remaining 24 Acps identified could not be comparatively modeled.
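
Criterion (ii), consistency of the PDB templates among the prediction programs, can be illustrated as a simple agreement check. The sketch below is only an illustration under stated assumptions and is not the protocol actually used: the function name, the input layout (one top-scoring template per program), and the two-program agreement threshold are all hypothetical, and the example template codes are placeholders rather than values from Table 3.

```python
from collections import Counter
from typing import Optional


def consensus_template(hits: dict[str, str], min_agree: int = 2) -> Optional[str]:
    """Return the PDB code proposed by at least `min_agree` programs, else None.

    `hits` maps a program name to its top-scoring PDB template code.
    Codes are compared case-insensitively.
    """
    if not hits:
        return None
    counts = Counter(code.lower() for code in hits.values())
    code, n = counts.most_common(1)[0]
    return code if n >= min_agree else None


# Hypothetical top hits per program (placeholder codes, not data from Table 3).
example_hits = {"123D+": "1abc", "AGAPE": "1ABC", "3D-PSSM": "2xyz"}
print(consensus_template(example_hits))
```

A candidate for which no template reaches the agreement threshold returns None, which in this sketch corresponds to a protein that would not be carried forward to homology modeling.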

1. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, et al. (2006) Nucleic Acids Res 34:D247-D251.

2. Gough J, Karplus K, Hughey R, Chothia C (2001) J Mol Biol 313:903-919.

3. Morris AL, MacArthur MW, Hutchinson EG, Thornton JM (1992) Proteins 12:345-364.