Vinogradov et al. 10.1073/pnas.0502103102.
Fig. 3. Alignment of globins based on the myoglobin (Mb)-fold: 44 bacterial flavohemoglobins (FHbs), 21 bacterial 3/3 single-domain globins (SDgbs) (marked by an asterisk), 31 eukaryote FHbs, including those from Giardia lamblia and Dictyostelium discoideum, the Cyanidioschyzon merolae and Thalassiosira pseudonana SDgbs, 106 bacterial 2/2 Hbs (22 2/2 Hb1s, 59 2/2 Hb2s, and 25 2/2 Hb3s), 3 Chlamydomonas reinhardtii, 6 ciliate 2/2 Hbs, 7 plant 2/2 Hbs, and one Thalassiosira pseudonana 2/2 Hb, 42 bacterial globin-coupled sensors (GCSs), and 6 protoglobins (Pgbs).
Fig. 4. Bayesian phylogenetic tree of all 3/3 single-domain globins (SDgbs). Each sequence is identified by the first three letters of the binomial name, provided in the alignment of sequences shown in Fig. 3. The numbers at the various nodes indicate Bayesian probabilities × 100. The Cyanidioschyzon merolae and Thalassiosira pseudonana SDgbs are shaded in blue.
Fig. 5. Bayesian phylogenetic tree of all flavohemoglobins, including 3/3 single-domain globins. Each sequence is identified by the first three letters of the binomial name provided in the alignment of sequences shown in Fig. 3. The numbers at the various nodes indicate Bayesian probabilities × 100. Eukaryotes are shaded in blue, Bacteria are shaded in green, and 3/3 single-domain globins are labeled with asterisks. The red circles labeled 1-3 indicate putative horizontal gene transfer events.
Fig. 6. Bayesian phylogenetic tree of all flavohemoglobins based on their complete sequences. Each sequence is identified by the first three letters of the binomial name provided in the alignment of sequences shown in Fig. 3. The numbers at the various nodes indicate Bayesian probabilities × 100. Eukaryotes are shaded in blue, and Bacteria are shaded in green. The red circles indicate putative horizontal gene transfer events.
Fig. 7. Bayesian phylogenetic tree based on the small subunit rRNA sequences of all of the flavohemoglobin-containing organisms. Each sequence is identified by the first three letters of the binomial name. The eukaryotic sequences are shaded.
Fig. 8. Bayesian phylogenetic tree of all 2/2 Hbs. Each sequence is identified by the first three letters of the binomial name, provided in the alignment of sequences shown in Fig. 3. The numbers at the various nodes indicate Bayesian probabilities × 100. Bacterial 2/2 Hb1s, 2/2 Hb2s, and 2/2 Hb3s are colored red, green, and blue, respectively. Eukaryotic 2/2 Hbs are colored black.
Fig. 9. The Bayesian phylogenetic tree based on the alignment of 150 sequences (shown in Fig. 10) representing the following groups of globins: 10 plant 3/3 nonsymbiotic Hbs (light green), 5 plant 3/3 symbiotic Hbs (dark green), 15 bacterial globin-coupled sensors (red), 4 protoglobins (pink), 39 bacterial 2/2 Hbs (purple), 2 Chlamydomonas reinhardtii (orange), 2 ciliate 2/2 Hbs (yellow), 3 plant 2/2 Hbs (brown), Thalassiosira pseudonana 2/2 Hb (black), 20 bacterial flavohemoglobins (blue), 19 bacterial 3/3 single-domain Hbs (blue), 9 eukaryote flavohemoglobins (blue), 1 diplomonad Giardia lamblia, and 1 mycetozoan Dictyostelium discoideum (both blue and boxed), the Cyanidioschyzon merolae and Thalassiosira pseudonana 3/3 single-domain globins (black), and 3 vertebrate (human, bird, and fish) 3/3 neuroglobins, cytoglobins, α- and β-globins and myoglobins (all in gray), and 2 urochordate Hbs.
Fig. 10. Alignment of 150 globin sequences representing the various globin groups, used in the construction of Bayesian phylogenetic trees: The full version is shown in Fig. 9 and the condensed version is shown in Fig. 1. In order of appearance: 10 plant 3/3 nonsymbiotic Hbs, 5 plant 3/3 symbiotic Hbs, 15 bacterial globin-coupled sensors, 4 protoglobins, 9 2/2 Hb1s, 19 2/2 Hb2s, 10 2/2 Hb3s, 2 Chlamydomonas reinhardtii, 2 ciliate, 3 plant and Thalassiosira pseudonana 2/2 Hbs, 20 bacterial flavohemoglobins, 19 bacterial 3/3 single-domain globins (marked by asterisks), 11 eukaryote flavohemoglobins, including Giardia lamblia and Dictyostelium discoideum, the Cyanidioschyzon merolae and Thalassiosira pseudonana 3/3 single-domain globins, 3 each of vertebrate (human, bird, and fish) 3/3 neuroglobins, cytoglobins, α- and β-globins and myoglobins, and 2 urochordate Hbs.
Fig. 11. A comparison of 2/2 and 3/3 globin structures. (A) Stereo view of the tertiary structure of Paramecium caudatum 2/2 Hb (Protein Data Bank ID code 1DLW), showing the heme (red) and the proximal histidine residue. Helices are labeled according to the conventional globin fold nomenclature (A–H). To highlight the 2/2 structural motif, orange and cyan helix colors were used. (B) Stereo view of the tertiary structure of sperm whale myoglobin (Protein Data Bank ID code 1A6M) in an orientation comparable to that of A and with the same color conventions. All structural elements (helices and loops) additional to the 2/2 or 3/3 α-helical sandwich motifs are highlighted in gray. The structures were drawn with MOLSCRIPT (1).
1. Kraulis, P. J. (1991) J. Appl. Crystallogr. 24, 946–950.
Table 1. Eukaryote genomes
Eukaryote name | State of genome work |
Anopheles gambiae | Unfinished |
Arabidopsis thaliana | Completed |
Aspergillus nidulans | Unfinished |
Brachydanio (Danio) rerio | Unfinished |
Candida albicans | Unfinished |
Caenorhabditis briggsae | Unfinished |
Caenorhabditis elegans | Completed |
Chlamydomonas reinhardtii | Unfinished |
Ciona intestinalis | Completed |
Cryptococcus neoformans | Unfinished |
Cryptosporidium parvum | Unfinished |
Cyanidioschyzon merolae | Completed |
Drosophila melanogaster | Unfinished |
Encephalitozoon cuniculi | Completed |
Entamoeba histolytica | Unfinished |
Eremothecium gossypii | Completed |
Giardia lamblia | Unfinished |
Gibberella zeae | Unfinished |
Homo sapiens | Completed |
Kluyveromyces lactis | Completed |
Leishmania major | Completed |
Magnaporthe grisea | Unfinished |
Mus musculus | Completed |
Neurospora crassa | Unfinished |
Oryza sativa | Completed |
Plasmodium falciparum | Completed |
Plasmodium yoelii yoelii | Unfinished |
Rattus norvegicus | Completed |
Saccharomyces cerevisiae | Completed |
Schizosaccharomyces pombe | Completed |
Takifugu (Fugu) rubripes | Completed |
Tetraodon nigroviridis | Unfinished |
Thalassiosira pseudonana | Completed |
Trypanosoma brucei | Completed |
Trypanosoma cruzi | Unfinished |
Ustilago maydis | Unfinished |
Table 2. Microbial database
Organism name | State of genome work |
Archaea | |
Aeropyrum pernix | Completed |
Haloarcula marismortui | Completed |
Halobacterium sp. NRC-1 | Completed |
Halobacterium salinarum | Completed |
Methanocaldococcus jannaschii | Completed |
Methanococcus maripaludis | Completed |
Methanopyrus kandleri | Completed |
Methanococcoides burtonii | Unfinished |
Methanosarcina acetivorans | Completed |
Methanosarcina barkeri | Completed |
Methanosarcina mazei | Completed |
Methanothermobacter thermoautotrophicum | Completed |
Nanoarchaeum equitans | Completed |
Picrophilus torridus | Completed |
Pyrobaculum aerophilum | Completed |
Pyrococcus abyssi | Completed |
Pyrococcus furiosus | Completed |
Pyrococcus horikoshii | Completed |
Sulfolobus solfataricus | Completed |
Sulfolobus tokodaii | Completed |
Thermoplasma acidophilum | Completed |
Thermoplasma volcanium | Completed |
Eubacteria | |
Actinobacillus pleuropneumoniae | Completed |
Agrobacterium tumefaciens | Completed |
Anabaena variabilis | Unfinished |
Archaeoglobus fulgidus | Completed |
Aquifex aeolicus | Completed |
Azotobacter vinelandii | Completed |
Bacillus anthracis | Completed |
Bacillus cereus | Completed |
Bacillus halodurans | Completed |
Bacillus subtilis | Completed |
Bacillus thuringiensis | Completed |
Bacteroides thetaiotaomicron | Completed |
Bartonella henselae | Completed |
Bartonella quintana | Completed |
Bdellovibrio bacteriovorus | Completed |
Bifidobacterium longum | Completed |
Bordetella bronchiseptica | Completed |
Bordetella parapertussis | Completed |
Bordetella pertussis | Completed |
Borrelia burgdorferi | Completed |
Bradyrhizobium japonicum | Completed |
Brucella melitensis | Completed |
Brucella suis | Completed |
Buchnera aphidicola | Completed |
Burkholderia cepacia | Unfinished |
Burkholderia fungorum | Completed |
Campylobacter jejuni | Completed |
Candidatus Blochmannia floridanus | Completed |
Caulobacter crescentus | Completed |
Chlamydia muridarum | Completed |
Chlamydia trachomatis | Completed |
Chlamydophila caviae | Completed |
Chlamydophila pneumoniae | Completed |
Chlorobium tepidum | Completed |
Chloroflexus aurantiacus | Completed |
Chromobacterium violaceum | Completed |
Clostridium acetobutylicum | Completed |
Clostridium perfringens | Completed |
Clostridium tetani | Completed |
Clostridium thermocellum | Completed |
Corynebacterium diphtheriae | Completed |
Corynebacterium efficiens | Completed |
Corynebacterium glutamicum | Completed |
Coxiella burnetii | Completed |
Crocosphaera watsonii | Unfinished |
Cytophaga hutchinsonii | Completed |
Dechloromonas aromatica | Unfinished |
Deinococcus radiodurans | Completed |
Desulfitobacterium hafniense | Completed |
Desulfovibrio desulfuricans | Unfinished |
Desulfovibrio vulgaris | Completed |
Ehrlichia canis | Unfinished |
Ehrlichia ruminantium | Unfinished |
Enterococcus faecalis | Completed |
Enterococcus faecium | Completed |
Escherichia coli | Completed |
Exiguobacterium sp. 255-15 | Unfinished |
Ferroplasma acidarmanus | Completed |
Fusobacterium nucleatum | Completed |
Fusobacterium nucleatum | Unfinished |
Gloeobacter violaceus | Completed |
Geobacter metallireducens | Completed |
Geobacter sulfurreducens | Completed |
Haemophilus ducreyi | Completed |
Haemophilus influenzae | Unfinished |
Haemophilus somnus | Completed |
Helicobacter hepaticus | Completed |
Helicobacter pylori | Completed |
Kineococcus radiotolerans | Unfinished |
Lactobacillus gasseri | Completed |
Lactobacillus johnsonii | Completed |
Lactobacillus plantarum | Completed |
Lactococcus lactis | Completed |
Leuconostoc mesenteroides | Completed |
Listeria innocua | Completed |
Listeria monocytogenes | Completed |
Magnetococcus sp. MC-1 | Completed |
Magnetospirillum magnetotacticum | Completed |
Mesorhizobium loti | Completed |
Mesorhizobium sp. BNC1 | Unfinished |
Methylobacillus flagellatus | Unfinished |
Microbulbifer degradans | Completed |
Moorella thermoacetica | Unfinished |
Mycobacterium avium | Completed |
Mycobacterium bovis | Completed |
Mycobacterium leprae | Completed |
Mycobacterium tuberculosis | Completed |
Mycoplasma gallisepticum | Completed |
Mycoplasma genitalium | Completed |
Mycoplasma mobile | Completed |
Mycoplasma mycoides | Completed |
Mycoplasma penetrans | Completed |
Mycoplasma pneumoniae | Completed |
Mycoplasma pulmonis | Completed |
Neisseria meningitidis | Completed |
Nitrosomonas europaea | Completed |
Nostoc sp. PCC 7120 | Completed |
Nostoc punctiforme | Completed |
Novosphingobium aromaticivorans | Completed |
Oceanobacillus iheyensis | Completed |
Oenococcus oeni | Completed |
Onion yellows phytoplasma | Completed |
Parachlamydia sp. UWE25 | Completed |
Pasteurella multocida | Completed |
Pasteuria nishizawae | Unfinished |
Pediococcus pentosaceus | Unfinished |
Photorhabdus luminescens | Completed |
Pirellula sp. 1 | Completed |
Porphyromonas gingivalis | Completed |
Prochlorococcus marinus | Completed |
Pseudomonas aeruginosa | Completed |
Pseudomonas fluorescens | Completed |
Pseudomonas putida | Completed |
Pseudomonas syringae | Completed |
Psychrobacter sp. 273-4 | Unfinished |
Rhodobacter sphaeroides | Completed |
Rhodopseudomonas palustris | Completed |
Rhodospirillum rubrum | Completed |
Sinorhizobium meliloti | Completed |
Rickettsia conorii | Completed |
Rickettsia prowazekii | Completed |
Rickettsia rickettsii | Unfinished |
Rickettsia sibirica | Unfinished |
Ralstonia eutropha | Unfinished |
Ralstonia metallidurans | Completed |
Ralstonia solanacearum | Completed |
Rubrivivax gelatinosus | Unfinished |
Rubrobacter xylanophilus | Unfinished |
Salmonella enterica s | Completed |
Salmonella typhimurium | Completed |
Shewanella oneidensis | Completed |
Shigella flexneri | Completed |
Staphylococcus aureus | Completed |
Staphylococcus epidermidis | Completed |
Streptomyces avermitilis | Completed |
Streptomyces coelicolor | Completed |
Streptococcus agalactiae | Completed |
Streptococcus mutans | Completed |
Streptococcus pneumoniae | Completed |
Streptococcus pyogenes | Completed |
Synechococcus elongatus | Unfinished |
Synechococcus sp. WH 8102 | Completed |
Synechocystis sp. PCC 6803 | Completed |
Thermoanaerobacter tengcongensis | Completed |
Thermobifida fusca | Completed |
Thermosynechococcus elongatus | Completed |
Thermotoga maritima | Completed |
Thermus thermophilus | Completed |
Treponema denticola | Completed |
Treponema pallidum | Completed |
Trichodesmium erythraeum | Completed |
Tropheryma whipplei | Completed |
Ureaplasma parvum | Completed |
Vibrio cholerae | Completed |
Vibrio parahaemolyticus | Completed |
Vibrio vulnificus | Completed |
Wigglesworthia glossinidia | Completed |
Wolbachia endosymbiont of Drosophila melanogaster | Completed |
Wolinella succinogenes | Completed |
Xanthomonas axonopodis | Completed |
Xanthomonas campestris | Completed |
Xylella fastidiosa | Completed |
Yersinia pestis | Completed |
Table 3. Expect (E) values for the first hits obtained from BLASTP searches using Thalassiosira pseudonana Scaff_137 and Cyanidioschyzon merolae CMR319C as queries, and from PSI-BLAST searches based on all the members of bacterial 3/3 single-domain globins (SDgbs) and flavohemoglobins (FHbs) and of vertebrate 3/3 neuroglobins (Ngbs)
| C. merolae CMR319C | T. pseudonana Scaffold_137 | Bacterial SDgbs | Bacterial FHbs | Vertebrate Ngbs |
No. of hits, above threshold/total | 146/274 (1st iteration) | 131/430 (1st iteration) | 864/1,040 (4th iteration) | 455/1,022 (3rd iteration) | 690/1,040 (3rd iteration) |
Bacterial SDgbs a | | | | | |
Bacterial FHbs | e−10 | e−15 | e−42 | e−64 | e−8 |
Eukaryote FHbs b | e−7 | e−10 | e−37 | e−53 | e−9 |
Vertebrate Ngbs c | e−11 | e−11 | e−13 | e−12 | e−78 |
Vertebrate Cygbs d | | 0.04 | e−5 | 0.02 | e−4 |
Vertebrate α | | 0.06 | e−8 | e−7 | e−10 |
Vertebrate β | | | 0.003 | e−3 | e−8 |
Vertebrate Mbs | | | | | e−8 |
Urochordate Hbs e | 0.6 | 0.002 | e−13 | e−9 | e−11 |
Echinoderm Hbs f | 0.06 | 0.5 | e−6 | e−6 | e−8 |
Plant SHbs g | 0.006 | e−4 | e−9 | e−11 | e−8 |
Plant NsHbs h | e−4 | e−6 | e−12 | e−8 | e−6 |
Annelid Hbs i | 0.001 | e−4 | e−5 | e−5 | e−5 |
Crustacean Hbs j | 0.001 | e−5 | e−7 | e−5 | e−10 |
Insect Hbs k | | | e−5 | | e−4 |
Mollusc Hbs l | | 0.05 | e−7 | 0.04 | e−5 |
Nematode Hbs m | | e−4 | e−4 | e−4 | e−6 |
Searches were conducted against all organisms, using default settings, the BLOSUM62 scoring matrix, and threshold E = 0.005. The E values, rounded off to the nearest integer, are provided for the first hit, i.e., the first member of a globin group recognized by the query sequence(s). E > 0.005 are given only for cases where the alignment is acceptable (see Discussion). The eukaryote FHbs were not included because the results were similar to those obtained with the bacterial FHbs. Cygbs, cytoglobins; SHbs, 3/3 symbiotic plant Hb; NsHbs, 3/3 nonsymbiotic plant Hb; Mbs, myoglobins.
a Bacterial SDgbs, with accession nos. in parentheses: Acinetobacter sp. ADP (YP_047731), Aquifex aeolicus (NP_21346), Bradyrhizobium japonicum (NP_769447 and NP_771523), Campylobacter coli (AAP86201), Campylobacter jejuni (NP_282714), Clostridium perfringens (NP_562869), Chromobacterium violaceum (NP_562869), Desulfitobacterium hafniense (ZP_00102525), Gloeobacter violaceus (NP_924518), Nostoc punctiforme (ZP_00111020), Novosphingobium aromaticivorans (ZP_00304525), Photobacterium profundum (CAG22319), Pseudomonas aeruginosa (NP_252656), Rhodopirellula baltica (NP_864649), Rhodopseudomonas palustris (NP_949046 and NP_949047), Silicibacter sp. (ZP_00336820), Thermobifida fusca (ZP_00292898), Vibrio parahaemolyticus (NP_800617), and Vitreoscilla stercoraria (AAT01097).
b Eukaryote FHbs: fungal, diplomonad (Giardia lamblia), and mycetozoan (Dictyostelium discoideum) FHbs.
c Vertebrate Ngbs, with accession nos. in parentheses: Bos taurus (NP_001001853), Canis familiaris (NP_001003356), Cavia porcellus (CAH03122), Danio rerio (NP_571928), Gallus gallus (CAG25721), Homo sapiens (NP_067080), Macaca mulatta (AAP92374), Mus musculus (NP_071859), Ochotona curzoniae (AAT28324), Oncorhynchus mykiss (P59742 and P59743), Oryctolagus cuniculus (CAH-3123), Rattus norvegicus (AAK1576), Sus scrofa (CAG25617), and Tetraodon nigroviridis (CAC59975).
d Vertebrate Cygbs, with accession nos. in parentheses: Danio rerio (NP_694484), Gallus gallus (XP_425373), Homo sapiens (NP_599030), Mus musculus (AAH55040), Oncorhynchus mykiss (CAD68070), Rattus norvegicus (NP_570100), Tetraodon nigroviridis (CAG02681), and Xenopus tropicalis (AAH76983).
e Urochordate Hbs, with accession nos. in parentheses: Ciona intestinalis (CAD89600, CAD68145, CAD68146, CAD68147).
f Echinoderm Hbs, with accession nos. in parentheses: Caudina arenicola (P80017, P80018, AAB19247, and AAB23119) and Paracaudina chilensis (S06134).
g Plant SHbs, with accession nos. in parentheses: Canavalia lineata (P42511), Glycine max (P02235, P02236, CAA23729, CAA23731, and CAA23732); Lupinus luteus (O607193A and CAA54331), Medicago sativa (AAA32659, CAA32491, CAA32491, CAA31750, CAA38408, and CAA38409), Medicago truncatula (CAA40899 and CAA40900), Phaseolus vulgaris (P02234, AAA33767, and O04939), Pisum sativum (O80405, O48665, P02233, O9SAZ0, and O9SAZ1), Psophocarpus tetragonolobus (CAA46704), Sesbania rostrata (AAA03002, AAA03005, and CAA32044), and Vigna unguiculata (T11573).
h Plant NsHbs, with accession nos. in parentheses: Arabidopsis thaliana (AAP21213 and AAF76353), Brassica napus (AAK07741), Ceratodon purpureus (AAG22831), Glycine max (AAA97887), Gossypium hirsutum (AAL09463 and AAK21604), Hordeum vulgare (AAB70097), Lycopersicon esculentum (AAK07676 and AAK07677), Marchantia polymorpha (AAK07743), Medicago sativa (AAG29748), Oryza sativa (AAK72228 and AAK72229), (AAK72230 and AAK72231), Physcomitrella patens (AAF66104), Raphanus sativus (AAP37043), Solanum tuberosum (AAN85431), Triticum aestivum (AAN85432), and Zea mays (AAG01375).
i Annelid extracellular Hbs, with accession nos. in parentheses: Lamellibrachia (P15469), Lumbricus terrestris (CAA09958, C28151, AAC14535, and P08924), Macrobdella decora (BAC82445), Oligobrachia mashikoi (BAD86542 and O7M413), Pheretima hilgendorfi (P83122), Pheretima sieboldi (P11740), Riftia pachyptila (P80592 and CAD29155), Sabella spallanzanii (CAC37410 and CAC37412); Tubifex tubifex (P18202), Tylorrhynchus heterochaetus (P13578). Annelid intracellular Hbs: Glycera dibranchiata monomer (A33420 and S13157), polymer P1 (P23216), P2 (A36529), and P3 (B36529).
j Crustacean extracellular Hbs, with accession nos. in parentheses: Artemia salina (AAC96001 and AAC96002), Daphnia magna (AAC47544, AAU44972, BAA76874, and BAA76875), Daphnia spinulata (AAM22870, AAM22873, AAM22874, AAM22875, AAM22876, AAM22878, AAM22880, AAM22882, AAM22883, AAM22884, AAM22886, and AAM22887), Moina macrocopa (BAB62537, BAB62536, and BAB62535), and Parartemia zietsiana (AAF72633).
k Insect Hbs, with accession nos. in parentheses: Chironomus tentans (CAA39721, CAA39724, and CAA39725), Chironomus thummi piger (S04499, S04500, S21641, and S21633), Chironomus thummi thummi (AAB58931, AAF87695, A30477, GGICE9, JT0349, JT0423, P02224, P12548, P12550, P29245, O23761, O23762, and O23763), Drosophila melanogaster (NP_732083), and Kiefferulus cornishi (AAC47976, AAC46978, AAC48979, AAC46980, AAC46981, and AAC46982).
l Mollusc Hbs, with accession nos. in parentheses: Anadara trapezia (S06503), Aplysia juliana (BAA19794), Aplysia kurodai (P02211), Barbatia lima (BAA09588, S61518, S61519, and S61520), Barbatia virescens (BAA09587), Biomphalaria glabrata (AAC24318 and CAH23232), Buccinum undatum (Q7M424), Calyptogena nautilei (BAD30645), Lucina pectinata (AAO89499, P41261, and P41262), Nassa mutabilis (P31331), and Scapharca inaequivalvis (AAL96376, AAL96375, P14822, and S09068).
m Nematode Hbs, with accession nos. in parentheses: Caenorhabditis briggsae, CAE63297 (CBG07681), CAE60743 (CBG04428), CAE64614 (CBG09371), CAE65561 (CBG10551), CAE62918 (CBG07112), CAE66591 (CBG11915), CAE71628, and CAE75175; Caenorhabditis elegans CAB60330 (Y15E3A.2), CAE17840 (F56C4.3), NP_501142 (4H982), NP_492107 (C26C6.7), NP_504466, CAB04152.2 (F21A3.6); AAK52176 (C18C4), and AAP82661 (C23H5.2); Mermis nigrescens (AAF35435); and Syngamus trachea (AAL56426 and AAL56427).
Table 4. Expect (E) values for the first hit obtained from BLASTP searches using Thalassiosira pseudonana Scaff_18 and Chlamydomonas reinhardtii C160981 as queries, and from PSI-BLAST searches based on all the members of bacterial, plant, and ciliate 2/2 Hbs
| T. pseudonana Scaff_18 | C. reinhardtii C160981 | Bacterial 2/2 Hbs | Plant 2/2 Hbs | Ciliate 2/2 Hbs | All 2/2 Hbs |
No. of hits, above threshold/total | 28/61 | 35/72 | 138/211 (4th iteration) | 68/87 (3rd iteration) | 52/111 (2nd iteration) | 144/201 (6th iteration) |
Bacterial 2/2 Hbs | e−9 | e−23 | e−37 | e−14 | e−30 | e−33 |
Plant 2/2 Hbs* | e−25 to e−20 | | e−15 | e−87 to e−89 | | e−23 |
Ciliate 2/2 Hbs | | e−15 | e−26 | | e−51 | e−23 |
Bacterial and eukaryotic FHbs | | | e−4 | | | e−4 |
Searches were conducted against all organisms, using default settings, the BLOSUM62 scoring matrix and threshold E = 0.005. The E values, rounded off to the nearest integer, are provided for the first hit, i.e. the first member of a globin group recognized by the query sequence(s). E > 0.005 are given only for cases where the alignment is acceptable (see Discussion). The results obtained with the other C. reinhardtii globins were similar. FHbs, flavohemoglobins.
*Plant 2/2 Hbs, with accession nos. in parentheses: Arabidopsis thaliana (AAK55409), Datisca glomerata (CAD33536), Glycine max (AAS48191), Hordeum vulgare (AAK55410), Oryza sativa (BAD32857), and Triticum aestivum (AAN85433).
Ciliate 2/2 Hbs, with accession nos. in parentheses: Paramecium caudatum (S27185), P. multimicronucleatum (S60031), P. triaurelia (S60030), Tetrahymena pyriformis (A36270), and Tetrahymena thermophila (pir|S32556).
Table 5. Expect (E) values for the first hit obtained from PSI-BLAST searches using as queries Mycobacterium tuberculosis 2/2 Hb1 (HbN) and 2/2 Hb2 (HbO) and Desulfitobacterium hafniense 2/2 Hb3
| M. tuberculosis HbN (NP_216058), 2/2 Hb1 | M. tuberculosis HbO (NP_216986), 2/2 Hb2 | D. hafniense (ZP_00102524), 2/2 Hb3 |
No. of hits, above threshold/total | 36/70 (1st iteration) | 72/91 (1st iteration) | 25/36 (1st iteration) |
Bacterial 2/2 Hb1s/2s/3s | e−54 | e−56 | e−21 |
Plant 2/2 Hbs* | | e−9 | |
Ciliate 2/2 Hbs | e−16 | | |
No. of hits, above threshold/total | 62/113 (3rd iteration) | 109/138 (6th iteration) | 44/86 (3rd iteration) |
Bacterial 2/2 Hb1s/2s/3s | 20 e−51 to e−34 | 64 e−43 to e−19 | 23 e−43 to e−29 |
Plant 2/2 Hbs* | | e−21 to e−18 | |
Ciliate 2/2 Hbs | e−29 | e−11 | 0.01 |
*Plant 2/2Hbs, with accession nos. in parentheses: Arabidopsis thaliana (AAK55409), Datisca glomerata (CAD33536), Glycine max (AAS48191), Hordeum vulgare (AAK55410), Oryza sativa (BAD32857), and Triticum aestivum (AAN85433).
Ciliate 2/2 Hbs, with accession nos. in parentheses: Paramecium caudatum (S27185), P. multimicronucleatum (S60031), P. triaurelia (S60030), Tetrahymena pyriformis (A36270), and T. thermophila (S32556).
Table 6. Expect (E) values for the first hit obtained from PSI-BLAST searches using all the members of bacterial globin-coupled sensors (GCSs) and protoglobins (Pgbs) as queries
| Bacterial GCSs | Bacterial Pgbs |
No. of hits (above threshold/total) | 49/206 (4th iteration) | 23/54 (4th iteration) |
Bacterial GCSs* | e−45 to e−11 | e−9 |
Bacterial Pgbs | e−18 | e−91 to e−42 |
Bacterial FHbs | e−4 | |
Eukaryotic FHbs | e−4 | |
Searches were conducted against all organisms, using default settings, the BLOSUM62 scoring matrix, and threshold E = 0.005. The E values, rounded off to the nearest integer, are provided for the first hit, i.e., the first member of a globin group recognized by the query sequence(s).
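The two conventions stated in the table notes (a threshold of E = 0.005, and reporting only the first hit per globin group) can be sketched in a few lines of Python. This is an illustration only, not the procedure used to build the tables: the function names are hypothetical, "above threshold" is read here as E at or below 0.005, and hits are assumed to arrive as (group, E value) pairs in the order reported by the search.

```python
THRESHOLD = 0.005  # threshold E quoted in the table notes


def count_hits(e_values, threshold=THRESHOLD):
    """Return (hits passing the threshold, total hits).

    Assumption: a hit is "above threshold" when its E value is <= threshold.
    """
    passing = sum(1 for e in e_values if e <= threshold)
    return passing, len(e_values)


def first_hit_per_group(hits):
    """Keep the E value of the first reported member of each globin group.

    `hits` is an iterable of (group, e_value) pairs in search-report order
    (a hypothetical input layout, not a BLAST output format).
    """
    first = {}
    for group, e_value in hits:
        first.setdefault(group, e_value)  # later members of a group are ignored
    return first
```

With made-up hits such as `[("GCS", 1e-45), ("Pgb", 1e-18), ("GCS", 1e-11), ("other", 0.3)]`, `count_hits` over the E values gives three passing hits out of four, and `first_hit_per_group` keeps `1e-45` for the GCS group.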
*Bacterial GCSs, with accession nos. and lengths in parentheses: Agrobacterium tumefaciens [NP_354049 (499 aa) and NP_531216 (568 aa)], Azoarcus sp. EbN1 (YP_158656, 678 aa), Azotobacter vinelandii (ZP_00090857, 434 aa), Bacillus anthracis (YP_031514, 433 aa), Bacillus cereus (ZP_00239241, 433 aa), Bacillus clausii (YP_176973, 449 aa), Bacillus halodurans (NP_241371, 439 aa), Bacillus licheniformis (YP_90716, 430 aa), Bacillus subtilis (NP_388919, 432 aa), Bacillus thuringiensis (YP_039413, 433 aa), Bordetella bronchiseptica (NP_888505, 475 aa), Bordetella parapertussis (NP_884745, 475 aa), Bordetella pertussis (NP_882025, 475 aa), Burkholderia fungorum (ZP_00277651, 723 aa), Caulobacter crescentus [NP_419247 (537 aa) and NP_421120 (555 aa)], Chromobacterium violaceum [NP_900548 (295 aa) and NP_899909 (375 aa)], Desulfotalea psychrophila (YP_063937, 363 aa), Desulfitobacterium hafniense (ZP_00102172, 157 aa), Erwinia carotovora (YP_049782, 442 aa), Escherichia coli (NP_416007, 460 aa), Exiguobacterium [ZP_00183969 (430 aa) and ZP00182337 (425 aa)], Geobacter metallireducens (ZP_00298606, 300 aa), Geobacter sulfurreducens (NP_954351, 300 aa), Gluconobacter oxydans (YP_191196, 458 aa), Haloarcula marismortui (AAV45247, 497 aa), Halobacterium salinarum (T44978, 489 aa), Magnetococcus sp. [ZP_00289561 (453 aa) and ZP_00289126 (507 aa)], Magnetospirillum magnetotacticum [ZP_00054075 (721 aa) and ZP_00054774 (443 aa)], Moorella thermoacetica (ZP_00330036, 245 aa), Novosphingobium aromaticivorans (ZP_00304036, 481 aa), Rhizobium meliloti (S61831, 533 aa), Rhodobacter sphaeroides (ZP_00207372, 341 aa), Rhodospirillum rubrum [ZP_00268251 (442 aa) and ZP_00270832 (445 aa)], Shigella flexneri (NP_707605, 381 aa), Silicibacter sp. [ZP_0038974 (308 aa), ZP_00336325 (485 aa), and ZP00336080 (487 aa)], Sinorhizobium meliloti (NP_384742, 533 aa), Thermosynechococcus elongatus (NP_682779, 194 aa), Thermus thermophilus (YP_005074, 203 aa), Vibrio vulnificus (NP_936636, 306 aa), and Zymomonas mobilis (AAV_89506, 467 aa).
Bacterial Pgbs, with accession nos. and lengths in parentheses: Aeropyrum pernix (NP_147118, 195 aa), Chloroflexus aurantiacus (ZP_00359040, 227 aa), Methanosarcina acetivorans (NP_617780, 195 aa), Methanosarcina barkeri (ZP_000294773, 195 aa), Rubrobacter xylanophilus (ZP_00200180, 196 aa), and Thermobifida fusca (ZP_00293478, 197 aa).
Supporting Materials and Methods
All of the putative globin sequences were aligned manually by following the procedure used earlier in the alignment of >700 globins (1), designed to fit the sequences to the myoglobin fold (2, 3): the pattern of predominantly hydrophobic residues at 37 conserved, mostly solvent-inaccessible positions with mean solvent-accessible areas of <15 Å² (4), including 33 intrahelical residues at positions A8, A11, A12, A15, B6, B9, B10, B13, B14, C4, E4, E7, E8, E11, E12, E15, E18, E19, F1, F4, G5, G8, G11, G12, G13, G15, G16, H7, H8, H11, H12, H15, and H19; the three interhelical residues at CD1, CD4, and FG4; and the invariant His at F8. The alignment was based on the crystal-structure-based alignment of the 58 known globin crystal structures available from the Protein Data Bank: 1A6M, 1IRDA, 1IRDB, 1I3DA, 1A9W, 1OJ6, 1UMO, 1HDSA, 1HDSB, 1A4FA, 1A4FB, 1HBRA, 1HBRB, 1CG5A, 1CG5B, 1SPGA, 1GCVA, 1GCVB, 1OUTA, 1OUTB, 1LA6A, 1LA6B, 1V75A, 1V75B, 2MM1, 1LHT, 1EMY, 1IT2, 2LHB, 1EW6, 1MBA, 1ASH, 1HLB, 1HLM, 1ECO, 2HBG, 1BOB, 1H97, 3SDH, 1SCTA, 1SCTB, 1ITH, 1FSL, 2GDM, 1D8U, 1KR7, 1DLY, 1IDR, 1NGK, 1UX8, 1DLW, 1MWB, 1VHB, 1GVH, 1CQX, 1OR4, and 1TU9. Although an earlier alignment of 700 globins had suggested that globins were characterized by two invariant residues, F8His and CD1Phe (1), members of the recently discovered 2/2 Hb family (5-8) can accommodate other hydrophobic residues, such as Tyr/Met/Leu/Ile/Val at the CD1 position and Ala/Ser/Thr/Leu at the distal E7 position, in addition to His and Gln. Moss 3/3 nonsymbiotic plant Hb also has a Tyr at CD1 (9). Thus, we required only that the putative globin sequence alignment be such as to provide a hydrophobic residue at CD1 in the order of preference Phe>Tyr>Leu>Met>Ile>Val, the residues at the distal E7 in the order of preference His>Gln>Leu≈Thr>Ala≈Val≈Ser≈Tyr, a candidate His at the proximal F8 position, and at the remaining 32 positions an amino acid already encountered in the crystal structure alignment.
Deletions in the helical regions were avoided, and interhelical regions were unrestricted to obtain the optimum alignment in the flanking helical regions. The C-terminal FAD reductase sequences and small subunit rRNA sequences were aligned with CLUSTALW and manually improved using GENEDOC.
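The residue requirements at CD1, E7, and F8 stated above can be expressed as a small ranking function. The sketch below is illustrative only and was not part of the manual alignment procedure: the function names are hypothetical, residues are given as one-letter codes, residues joined by "≈" share a rank, and ranking candidates by CD1 first and E7 second is an assumption. The check on the remaining 32 positions is omitted because it depends on the crystal-structure alignment itself.

```python
# Preference orders quoted from the text (one-letter codes); lower rank = preferred.
CD1_RANKS = {"F": 0, "Y": 1, "L": 2, "M": 3, "I": 4, "V": 5}
# His > Gln > Leu ≈ Thr > Ala ≈ Val ≈ Ser ≈ Tyr: tied residues share a rank.
E7_RANKS = {"H": 0, "Q": 1, "L": 2, "T": 2, "A": 3, "V": 3, "S": 3, "Y": 3}


def score_key_positions(cd1, e7, f8):
    """Return (CD1 rank, E7 rank) for an acceptable candidate, else None.

    A candidate is rejected when F8 is not His or when the CD1 or E7 residue
    is outside the preference lists given in the text.
    """
    if f8 != "H":
        return None
    if cd1 not in CD1_RANKS or e7 not in E7_RANKS:
        return None
    return CD1_RANKS[cd1], E7_RANKS[e7]


def best_candidate(candidates):
    """Pick the acceptable (cd1, e7, f8) tuple with the lowest rank pair.

    Assumption: CD1 rank is compared before E7 rank.
    """
    scored = []
    for candidate in candidates:
        score = score_key_positions(*candidate)
        if score is not None:
            scored.append((score, candidate))
    return min(scored)[1] if scored else None
```

For example, among the candidate registers `("V", "A", "H")`, `("Y", "Q", "H")`, and `("F", "H", "N")`, the last is rejected for lacking His at F8, and the second is preferred because Tyr outranks Val at CD1.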
1. Kapp, O. H., Moens, L., Vanfleteren, J., Trotman, C. N. A., Suzuki, T. & Vinogradov, S. N. (1995) Protein Sci. 4, 2179–2190.
2. Lesk, A. M. & Chothia, C. (1980) J. Mol. Biol. 136, 225–270.
3. Bashford, D., Chothia, C. & Lesk, A. M. (1987) J. Mol. Biol. 196, 199–216.
4. Gerstein, M., Sonnhammer, E. L. & Chothia, C. (1994) J. Mol. Biol. 236, 1067–1078.
5. Pesce, A., Couture, M., Dewilde, S., Guertin, M., Yamauchi, K., Ascenzi, P., Moens, L. & Bolognesi, M. (2000) EMBO J. 19, 2424–2434.
6. Milani, M., Pesce, A., Ouellet, Y., Ascenzi, P., Guertin, M. & Bolognesi, M. (2001) EMBO J. 20, 3902–3909.
7. Milani, M., Savard, P.-Y., Ouellet, H., Ascenzi, P., Guertin, M. & Bolognesi, M. (2003) Proc. Natl. Acad. Sci. USA 100, 5766–5771.
8. Wittenberg, J. B., Bolognesi, M., Wittenberg, B. A. & Guertin, M. (2002) J. Biol. Chem. 277, 871–874.
9. Arredondo-Peter, R., Hargrove, M. S., Moran, J. F., Sarath, G. & Klucas, R. V. (1998) Plant Physiol. 118, 1121–1125.