Dottorini et al. 10.1073/pnas.0703904104.
Fig. 3. Three-dimensional models of five An. gambiae MAG proteins. (A) The male-specific An. gambiae CAP (CRISP/Antigen5/PR-1) protein 06418. The An. gambiae protein (blue) is superimposed onto 1qnx (cyan) to illustrate the Ca2+-binding site and the conserved disulfide bridges. Disulfide bridges (red, shown on the model for 1qnx) form between C27-C41, C32-C122, C52-C115, C198-C215, and C237-C249 (not shown) of the An. gambiae protein. The spatial orientation of the histidine residues that form the Ca2+-binding site is conserved between 06418 (H119 and H184, dark green) and 1qnx (light green). (B) The male-specific An. gambiae CRAL-TRIO domain-containing protein 09365. Three parallel β-sheets form the floor of the hydrophobic pocket, with hydrophobic residues projecting into the center of the pocket (yellow). The C-terminal string (gray) extends around the rear of the protein, providing reinforcement for the pocket. (C) The An. gambiae cyclophilin-like isomerase 07088 (cyan) superimposed on 1cyn (blue), showing the conserved binding site of cyclosporine (yellow). (D) The An. gambiae carboxylesterase COEBE1D. The active site of the D. melanogaster orthologue EST-6 is partially conserved in An. gambiae; the catalytic triad of both COEBE1D (yellow, superimposed on the active site of 2bce in blue) and COEBE4D consists of a serine (S210/S207) and a histidine (H466/H463), but in contrast to EST-6 (and 2bce) the third member of the triad is a glutamic acid (E342/E342) instead of an aspartic acid. The oxyanion hole (shown in gray, superimposed on 2bce in cyan) is likely formed by two alanine residues (A133/A130 and A211/A208) and one glycine residue (G132/G129). (E) The An. gambiae acid lipase 03083. Three principal domains are shown: the α/β hydrolase domain (red), the "cap" domain (amino acids 219-340, green) covering the active site, and the putative "lid" structure (amino acids 253-282, yellow), which contains a disulfide bridge between C265-C274 (pink). The active site comprises the highly conserved catalytic triad S188 (which matches the consensus sequence GX1SX2G), D361, and H392 (cyan) and an oxyanion hole formed between the NH2 groups of L102 and Q189 (gray). (F) A close-up of the active site of the acid lipase 03083 superimposed upon the active site of 1hlg, revealing the conserved spatial orientation of the residues of the catalytic triad and oxyanion hole.
Table 2. D. melanogaster Acp genes whose putative An. gambiae orthologues could not be detected in the male accessory glands
Drosophila Acp | Anopheles homologues | F | T | RB | Method |
Acp26Aa | 11117 | + |  | + | Matlab |
Acp29Ab | 20910* | + | + | + | Matlab |
Mst57Dc | 03130 |  |  | + | Matlab |
CG1462 | 10596 | + |  |  | BLAST |
CG2918 | 01827 | + |  |  | BLAST |
Acp76A | 07691 | + | + |  | Matlab |
CG4147 | 04192 | + | + |  | BLAST |
Acp32CD | 06410 | + |  |  | Matlab |
CG6168 | 06610 |  | + | + | Matlab |
CG6461 | 08915 |  |  |  | BLAST |
CG8093 | 03500, 03501 |  |  |  | Matlab |
CG11864 | 10764 | + |  |  | BLAST |
The expression profile of the Anopheles genes in different tissues is indicated. The + symbol indicates expression, and an empty cell indicates that no expression could be detected. In three cases, no amplification could be recovered from any tissue. F, whole females; T, testes; RB, male carcasses depleted of reproductive organs. The last column indicates the bioinformatics method used to identify the putative orthologues. For an additional 22 Drosophila Acps, putative Anopheles orthologues were not identified in the bioinformatics analyses.
*For 20910, the old Ensembl identifier is used (omitting the initial ENSANGG000000 digits) because the gene is no longer in the database and a new identifier has not been assigned to it.
Table 3. Comparative structural modeling of 22 An. gambiae MAG proteins
Gene | Classification (superfamily) | PFAM class | PDB ID code homology model | Homology model E-score | Identity, % | 123D+ Z-score | AGAPE remark-score | 3D-PSSM E-value |
05246 | Serine protease inhibitor | PF00079 | 1qlp | 7.0e-88 | 26 | 20.44 (1a7c) | 0.000e+00 | 0.0027 |
Coebe4D | Alpha/beta-Hydrolase | PF00135 | 2bce | 1.0e-106 | 25 | 25.42 | 1.47e-11 | 0.0027 |
Coebe1D | Alpha/beta-Hydrolase | PF00135 | 2bce | 1.0e-107 | 24 | 26.29 | 9.35e-12 | 0.0027 |
06418 | PR-1-like | PF00188 | 1qnx | 7.0e-41 | 26 | 33.38 | 0.000e+00 | 8.16e-07 |
06581 | Serine protease inhibitor | PF01826 | 1ccv | 5.0e-04 | 47 |  | 2.170e-25 | 0.0508 |
 |  | PF01826 | 1hx2 | 2.6e-01 | 40 | 5.75 |  |  |
06583 | Serine protease inhibitor | PF01826 | 1ccv | 1.1e-02 | 30 |  | 6.490e-22 |  |
 |  | PF01826 | 1hx2 | - |  | 3.72 |  | 0.0585 |
06585 | Serine protease inhibitor | PF01826 | 1ccv | 2.0e-03 | 37 |  | 1.521e-25 | 0.0343 |
 |  | PF01826 | 1atb | - |  | 4.04 |  |  |
06586 | Serine protease inhibitor | PF01826 | 1ccv | 2.0e-04 | 44 |  | 1.78e-24 | 0.0041 |
 |  | PF01826 | 1hx2 | 1.2e-01 | 32 | 5.57 | 9.32e-23 | 1.44e-07 |
06587 | Serine protease inhibitor | PF01826 | 1ccv | 8.0e-08 | 33 |  | 5.211e-25 | 0.0162 |
 |  | PF01826 | 1hx2 | 1.0e-04 |  | 5.42 |  | 7.72e-08 |
07088 | Cyclophilin-like isomerase | PF00160 | 2cpl | 5.0e-59 | 57 | 31.94 |  | 6.89e-5 |
 |  | PF00160 | 1cyn | 1.0e-65 | 73 |  | 0.000e+00 |  |
07491 | FAD-dependent thiol oxidase/Thioredoxin-like | PF00085 | 1mek | 4.0e-28 | 22 | 16.66 |  | 1.76e-05 |
 |  | PF00085 | 2trx | 6.0e-31 | 20 |  | 1.960e-15 |  |
01424 | Ribosomal protein S5/ATPase domain of HSP90 chaperone | PF00183 | 1usv | 3.0e-94 | 50 | 26.69 |  | 7.87e-05 |
 |  | PF00183 | 1usu | 2.0e-89 |  |  | 0.000e+00 |  |
03083 | Alpha/beta-Hydrolase | PF04083 | 1hlg | 1.0e-103 | 33 | 44.21 | - | 4.91e-06 |
SRPN9 | Serine protease inhibitor | PF00079 | 1qlp | 4.0e-76 | 22 | 20.46 | 0.000e+00 | 1.62e-51 |
 |  | PF00079 | 1ova | 4.0e-72 | 24 |  |  |  |
crc | ConcanavalinA-like lectin/glucanase/ P-domain calnexin/calreticulin | PF00262 | 1jhn | 1.0e-118 | 34 | 28.66 | - | 1.66e-08 |
04428 | RNA-binding domain/FAS1 | PF02469 | 1w7e | 2.0e-39 | 23 | - |  |  |
 |  | PF02469 | 1o70 | - |  |  | 0.000e+00 | 3.78e-09 |
TEP15 | Terpenoid cyclase/Alpha-macroglobulin receptor | PF07678 | 1c3d | 5.0e-72 | 31 | 12.62 | - | - |
08822 | FKBP-like/EF-hand | PF00254 | 1bkf | 2.0e-26 | 35 | 17.11 |  | 0.049 |
 |  | PF00254 | 1q1c | 4.0e-26 | 27 |  | 9.139e-23 |  |
08968 | Kazal type serine protease inhibitor | PF00050 | 1sgp | 2.0e-04 | 35 | 5.60 | - | 0.000277 |
09364 | CRAL_TRIO domain/CRAL_TRIO N terminal domain | PF00650 | 1oiz | 2.0e-55 | 27 |  | 0.0 |  |
 |  | PF00650 | 1aua | 3.0e-51 | 22 | 22.51 |  | 0.00319 |
09365 | CRAL_TRIO domain/CRAL_TRIO N terminal domain | PF00650 | 1oiz | 1.0e-54 | 25 |  | 0.0 |  |
 |  | PF00650 | 1aua | 2.0e-48 | 20 | 23.34 |  | 0.00286 |
09842 | Ribonuclease Rh-like | PF00445 | 1dix | 3.0e-55 | 26 | 27.74 |  | 3.0e-5 |
 |  | PF00445 | 1iyb | 3.0e-55 | 26 |  | 0.000e+00 |  |
For each An. gambiae candidate, we used a combination of three protein-threading programs (123D+, AGAPE, and 3D-PSSM; last three columns) to identify 3D templates and to choose the most congruent alignment. Significance values, which estimate the accuracy of the prediction, are reported for each program as follows: Z-scores (123D+), remark scores (AGAPE), and E-values (3D-PSSM). 3D models were generated by comparative homology modeling using Geno3D (www.geno3d-pbil.ibcp.fr). The structure-prediction protocols used to identify the most accurate models included the following: (i) accurate significance values (90% confidence of E-value), % certainty for 3D-PSSM, Z-score for 123D+, and remark score for AGAPE; (ii) consistency of the Protein Data Bank (PDB) templates among all prediction programs; (iii) consistency in the PFAM (1) functional annotation for each determined PDB template (www.pdb.org); and (iv) congruency between the superfamily prediction analysis (Superfamily 1.69) (2) and the functional annotation of the 3D models. The stereochemical quality of the models was checked with PROCHECK (3) within Geno3D. Graphical representations of the 3D models were produced with PyMOL (DeLano Scientific). The PDB entry code (PDB ID code) of the structure template used in the homology modeling analysis, the accuracy of the model expressed as an E-value, and the % identity between an Acp sequence and its template structure are indicated. For each candidate, the classification obtained with Superfamily 1.69 is provided, and multiple domains present within a sequence are indicated. The PFAM protein domain annotation codes refer to the PDB templates identified by threading and used for the homology modeling; they are given to demonstrate the consistency of the PDB templates identified with the threading methods. The remaining 24 Acps identified could not be comparatively modeled.
1. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, et al. (2006) Nucleic Acids Res 34:D247-D251.
2. Gough J, Karplus K, Hughey R, Chothia C (2001) J Mol Biol 313:903-919.
3. Morris AL, MacArthur MW, Hutchinson EG, Thornton JM (1992) Proteins 12:345-364.
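Criterion (ii) in the Table 3 legend, consistency of the PDB templates among the prediction programs, can be illustrated with a minimal sketch. The function name `consensus_template`, the input layout, and the example hits are hypothetical and are not taken from the table. The tie-breaking rule is likewise an assumption made only to keep the result deterministic, and the authors' actual selection procedure may have differed.

```python
from collections import Counter


def consensus_template(hits):
    """Return (template, support) for the PDB ID proposed by most programs.

    `hits` maps a program name to the PDB ID of its best template, or None
    when the program returned no hit. PDB IDs are compared case-insensitively.
    Ties are broken alphabetically (an assumption, not the authors' rule).
    """
    proposed = [pdb.lower() for pdb in hits.values() if pdb]
    if not proposed:
        return None, 0
    counts = Counter(proposed)
    best = max(counts.values())
    winner = min(pdb for pdb, n in counts.items() if n == best)
    return winner, best


# Hypothetical example: three threading programs, two of which agree.
example_hits = {"123D+": "1abc", "AGAPE": "1ABC", "3D-PSSM": "2xyz"}
template, support = consensus_template(example_hits)
print(f"{template} supported by {support} of {len(example_hits)} programs")
```

A candidate whose support equals the number of programs would satisfy the consistency criterion outright; lower support would call for a closer look at the individual alignments.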