Vinogradov et al. 10.1073/pnas.0502103102.

Supporting Information

Files in this Data Supplement:

Supporting Table 1
Supporting Table 2
Supporting Materials and Methods
Supporting Table 3
Supporting Table 4
Supporting Figure 3
Supporting Table 5
Supporting Table 6
Supporting Figure 4
Supporting Figure 5
Supporting Figure 6
Supporting Figure 7
Supporting Figure 8
Supporting Figure 9
Supporting Figure 10
Supporting Figure 11




Supporting Figure 3

Fig. 3. Alignment of globins based on the myoglobin (Mb)-fold: 44 bacterial flavohemoglobins (FHbs), 21 bacterial 3/3 single-domain globins (SDgbs) (marked by an asterisk), 31 eukaryote FHbs, including those from Giardia lamblia and Dictyostelium discoideum, the Cyanidioschyzon merolae and Thalassiosira pseudonana SDgbs, 106 bacterial 2/2 Hbs (22 2/2 Hb1s, 59 2/2 Hb2s, and 25 2/2 Hb3s), 3 Chlamydomonas reinhardtii, 6 ciliate 2/2 Hbs, 7 plant 2/2 Hbs, and one Thalassiosira pseudonana 2/2 Hb, 42 bacterial globin-coupled sensors (GCSs), and 6 protoglobins (Pgbs).





Supporting Figure 4

Fig. 4. Bayesian phylogenetic tree of all 3/3 single-domain globins (SDgbs). Each sequence is identified by the first three letters of the binomial name, provided in the alignment of sequences shown in Fig. 3. The numbers at the various nodes indicate Bayesian probabilities × 100. The Cyanidioschyzon merolae and Thalassiosira pseudonana SDgbs are shaded in blue.





Supporting Figure 5

Fig. 5. Bayesian phylogenetic tree of all flavohemoglobins, including 3/3 single-domain globins. Each sequence is identified by the first three letters of the binomial name provided in the alignment of sequences shown in Fig. 3. The numbers at the various nodes indicate Bayesian probabilities × 100. Eukaryotes are shaded in blue, Bacteria are shaded in green, and 3/3 single-domain globins are labeled with asterisks. The red circles labeled 1-3 indicate putative horizontal gene transfer events.





Supporting Figure 6

Fig. 6. Bayesian phylogenetic tree of all flavohemoglobins based on their complete sequences. Each sequence is identified by the first three letters of the binomial name provided in the alignment of sequences shown in Fig. 3. The numbers at the various nodes indicate Bayesian probabilities × 100. Eukaryotes are shaded in blue, and Bacteria are shaded in green. The red circles indicate putative horizontal gene transfer events.





Supporting Figure 7

Fig. 7. Bayesian phylogenetic tree based on the small subunit rRNA sequences of all of the flavohemoglobin-containing organisms. Each sequence is identified by the first three letters of the binomial name. The eukaryotic sequences are shaded.





Supporting Figure 8

Fig. 8. Bayesian phylogenetic tree of all 2/2 Hbs. Each sequence is identified by the first three letters of the binomial name, provided in the alignment of sequences shown in Fig. 3. The numbers at the various nodes indicate Bayesian probabilities × 100. Bacterial 2/2 Hb1s, 2/2 Hb2s, and 2/2 Hb3s are colored red, green, and blue, respectively. Eukaryotic 2/2 Hbs are colored black.





Supporting Figure 9

Fig. 9. The Bayesian phylogenetic tree based on the alignment of 150 sequences (shown in Fig. 10) representing the following groups of globins: 10 plant 3/3 nonsymbiotic Hbs (light green), 5 plant 3/3 symbiotic Hbs (dark green), 15 bacterial globin-coupled sensors (red), 4 protoglobins (pink), 39 bacterial 2/2 Hbs (purple), 2 Chlamydomonas reinhardtii 2/2 Hbs (orange), 2 ciliate 2/2 Hbs (yellow), 3 plant 2/2 Hbs (brown), Thalassiosira pseudonana 2/2 Hb (black), 20 bacterial flavohemoglobins (blue), 19 bacterial 3/3 single-domain Hbs (blue), 9 eukaryote flavohemoglobins (blue), 1 diplomonad Giardia lamblia, and 1 mycetozoan Dictyostelium discoideum (both blue and boxed), the Cyanidioschyzon merolae and Thalassiosira pseudonana 3/3 single-domain globins (black), 3 each of vertebrate (human, bird, and fish) 3/3 neuroglobins, cytoglobins, α- and β-globins, and myoglobins (all in gray), and 2 urochordate Hbs.





Supporting Figure 10

Fig. 10. Alignment of 150 globin sequences representing the various globin groups, used in the construction of Bayesian phylogenetic trees. The full version is shown in Fig. 9 and the condensed version is shown in Fig. 1. In order of appearance: 10 plant 3/3 nonsymbiotic Hbs, 5 plant 3/3 symbiotic Hbs, 15 bacterial globin-coupled sensors, 4 protoglobins, 9 2/2 Hb1s, 19 2/2 Hb2s, 10 2/2 Hb3s, 2 Chlamydomonas reinhardtii, 2 ciliate, 3 plant, and Thalassiosira pseudonana 2/2 Hbs, 20 bacterial flavohemoglobins, 19 bacterial 3/3 single-domain globins (marked by asterisks), 11 eukaryote flavohemoglobins, including Giardia lamblia and Dictyostelium discoideum, the Cyanidioschyzon merolae and Thalassiosira pseudonana 3/3 single-domain globins, 3 each of vertebrate (human, bird, and fish) 3/3 neuroglobins, cytoglobins, α- and β-globins, and myoglobins, and 2 urochordate Hbs.





Supporting Figure 11

Fig. 11. A comparison of 2/2 and 3/3 globin structures. (A) Stereo view of the tertiary structure of Paramecium caudatum 2/2 Hb (Protein Data Bank ID code 1DLW), showing the heme (red) and the proximal histidine residue. Helices are labeled according to the conventional globin fold nomenclature (A–H). To highlight the 2/2 structural motif, orange and cyan helix colors were used. (B) Stereo view of the tertiary structure of sperm whale myoglobin (Protein Data Bank ID code 1A6M) in an orientation comparable to that of A and with the same color conventions. All structural elements (helices and loops) additional to the 2/2 or 3/3 α-helical sandwich motifs are highlighted in gray. The structures were drawn with MOLSCRIPT (1).

1. Kraulis, P. J. (1991) J. Appl. Crystallogr. 24, 946–950.





Table 1. Eukaryote genomes

Eukaryote name: State of genome work

Anopheles gambiae: Unfinished
Arabidopsis thaliana: Completed
Aspergillus nidulans: Unfinished
Brachydanio (Danio) rerio: Unfinished
Candida albicans: Unfinished
Caenorhabditis briggsae: Unfinished
Caenorhabditis elegans: Completed
Chlamydomonas reinhardtii: Unfinished
Ciona intestinalis: Completed
Cryptococcus neoformans: Unfinished
Cryptosporidium parvum: Unfinished
Cyanidioschyzon merolae: Completed
Drosophila melanogaster: Unfinished
Encephalitozoon cuniculi: Completed
Entamoeba histolytica: Unfinished
Eremothecium gossypii: Completed
Giardia lamblia: Unfinished
Gibberella zeae: Unfinished
Homo sapiens: Completed
Kluyveromyces lactis: Completed
Leishmania major: Completed
Magnaporthe grisea: Unfinished
Mus musculus: Completed
Neurospora crassa: Unfinished
Oryza sativa: Completed
Plasmodium falciparum: Completed
Plasmodium yoelii yoelii: Unfinished
Rattus norvegicus: Completed
Saccharomyces cerevisiae: Completed
Schizosaccharomyces pombe: Completed
Takifugu (Fugu) rubripes: Completed
Tetraodon nigroviridis: Unfinished
Thalassiosira pseudonana: Completed
Trypanosoma brucei: Completed
Trypanosoma cruzi: Unfinished
Ustilago maydis: Unfinished





Table 2. Microbial database

Organism name: State of genome work

Archaea

Aeropyrum pernix: Completed
Haloarcula marismortui: Completed
Halobacterium sp. NRC-1: Completed
Halobacterium salinarum: Completed
Methanocaldococcus jannaschii: Completed
Methanococcus maripaludis: Completed
Methanopyrus kandleri: Completed
Methanococcoides burtonii: Unfinished
Methanosarcina acetivorans: Completed
Methanosarcina barkeri: Completed
Methanosarcina mazei: Completed
Methanothermobacter thermoautotrophicum: Completed
Nanoarchaeum equitans: Completed
Picrophilus torridus: Completed
Pyrobaculum aerophilum: Completed
Pyrococcus abyssi: Completed
Pyrococcus furiosus: Completed
Pyrococcus horikoshii: Completed
Sulfolobus solfataricus: Completed
Sulfolobus tokodaii: Completed
Thermoplasma acidophilum: Completed
Thermoplasma volcanium: Completed

Eubacteria

Actinobacillus pleuropneumoniae: Completed
Agrobacterium tumefaciens: Completed
Anabaena variabilis: Unfinished
Archaeoglobus fulgidus: Completed
Aquifex aeolicus: Completed
Azotobacter vinelandii: Completed
Bacillus anthracis: Completed
Bacillus cereus: Completed
Bacillus halodurans: Completed
Bacillus subtilis: Completed
Bacillus thuringiensis: Completed
Bacteroides thetaiotaomicron: Completed
Bartonella henselae: Completed
Bartonella quintana: Completed
Bdellovibrio bacteriovorus: Completed
Bifidobacterium longum: Completed
Bordetella bronchiseptica: Completed
Bordetella parapertussis: Completed
Bordetella pertussis: Completed
Borrelia burgdorferi: Completed
Bradyrhizobium japonicum: Completed
Brucella melitensis: Completed
Brucella suis: Completed
Buchnera aphidicola: Completed
Burkholderia cepacia: Unfinished
Burkholderia fungorum: Completed
Campylobacter jejuni: Completed
Candidatus Blochmannia floridanus: Completed
Caulobacter crescentus: Completed
Chlamydia muridarum: Completed
Chlamydia trachomatis: Completed
Chlamydophila caviae: Completed
Chlamydophila pneumoniae: Completed
Chlorobium tepidum: Completed
Chloroflexus aurantiacus: Completed
Chromobacterium violaceum: Completed
Clostridium acetobutylicum: Completed
Clostridium perfringens: Completed
Clostridium tetani: Completed
Clostridium thermocellum: Completed
Corynebacterium diphtheriae: Completed
Corynebacterium efficiens: Completed
Corynebacterium glutamicum: Completed
Coxiella burnetii: Completed
Crocosphaera watsonii: Unfinished
Cytophaga hutchinsonii: Completed
Dechloromonas aromatica: Unfinished
Deinococcus radiodurans: Completed
Desulfitobacterium hafniense: Completed
Desulfovibrio desulfuricans: Unfinished
Desulfovibrio vulgaris: Completed
Ehrlichia canis: Unfinished
Ehrlichia ruminantium: Unfinished
Enterococcus faecalis: Completed
Enterococcus faecium: Completed
Escherichia coli: Completed
Exiguobacterium sp. 255-15: Unfinished
Ferroplasma acidarmanus: Completed
Fusobacterium nucleatum: Completed
Fusobacterium nucleatum: Unfinished
Gloeobacter violaceus: Completed
Geobacter metallireducens: Completed
Geobacter sulfurreducens: Completed
Haemophilus ducreyi: Completed
Haemophilus influenzae: Unfinished
Haemophilus somnus: Completed
Helicobacter hepaticus: Completed
Helicobacter pylori: Completed
Kineococcus radiotolerans: Unfinished
Lactobacillus gasseri: Completed
Lactobacillus johnsonii: Completed
Lactobacillus plantarum: Completed
Lactococcus lactis: Completed
Leuconostoc mesenteroides: Completed
Listeria innocua: Completed
Listeria monocytogenes: Completed
Magnetococcus sp. MC-1: Completed
Magnetospirillum magnetotacticum: Completed
Mesorhizobium loti: Completed
Mesorhizobium sp. BNC1: Unfinished
Methylobacillus flagellatus: Unfinished
Microbulbifer degradans: Completed
Moorella thermoacetica: Unfinished
Mycobacterium avium: Completed
Mycobacterium bovis: Completed
Mycobacterium leprae: Completed
Mycobacterium tuberculosis: Completed
Mycoplasma gallisepticum: Completed
Mycoplasma genitalium: Completed
Mycoplasma mobile: Completed
Mycoplasma mycoides: Completed
Mycoplasma penetrans: Completed
Mycoplasma pneumoniae: Completed
Mycoplasma pulmonis: Completed
Neisseria meningitidis: Completed
Nitrosomonas europaea: Completed
Nostoc sp. PCC 7120: Completed
Nostoc punctiforme: Completed
Novosphingobium aromaticivorans: Completed
Oceanobacillus iheyensis: Completed
Oenococcus oeni: Completed
Onion yellows phytoplasma: Completed
Parachlamydia sp. UWE25: Completed
Pasteurella multocida: Completed
Pasteuria nishizawae: Unfinished
Pediococcus pentosaceus: Unfinished
Photorhabdus luminescens: Completed
Pirellula sp. 1: Completed
Porphyromonas gingivalis: Completed
Prochlorococcus marinus: Completed
Pseudomonas aeruginosa: Completed
Pseudomonas fluorescens: Completed
Pseudomonas putida: Completed
Pseudomonas syringae: Completed
Psychrobacter sp. 273-4: Unfinished
Rhodobacter sphaeroides: Completed
Rhodopseudomonas palustris: Completed
Rhodospirillum rubrum: Completed
Sinorhizobium meliloti: Completed
Rickettsia conorii: Completed
Rickettsia prowazekii: Completed
Rickettsia rickettsii: Unfinished
Rickettsia sibirica: Unfinished
Ralstonia eutropha: Unfinished
Ralstonia metallidurans: Completed
Ralstonia solanacearum: Completed
Rubrivivax gelatinosus: Unfinished
Rubrobacter xylanophilus: Unfinished
Salmonella enterica s: Completed
Salmonella typhimurium: Completed
Shewanella oneidensis: Completed
Shigella flexneri: Completed
Staphylococcus aureus: Completed
Staphylococcus epidermidis: Completed
Streptomyces avermitilis: Completed
Streptomyces coelicolor: Completed
Streptococcus agalactiae: Completed
Streptococcus mutans: Completed
Streptococcus pneumoniae: Completed
Streptococcus pyogenes: Completed
Synechococcus elongatus: Unfinished
Synechococcus sp. WH 8102: Completed
Synechocystis sp. PCC 6803: Completed
Thermoanaerobacter tengcongensis: Completed
Thermobifida fusca: Completed
Thermosynechococcus elongatus: Completed
Thermotoga maritima: Completed
Thermus thermophilus: Completed
Treponema denticola: Completed
Treponema pallidum: Completed
Trichodesmium erythraeum: Completed
Tropheryma whipplei: Completed
Ureaplasma parvum: Completed
Vibrio cholerae: Completed
Vibrio parahaemolyticus: Completed
Vibrio vulnificus: Completed
Wigglesworthia glossinidia: Completed
Wolbachia endosymbiont of Drosophila melanogaster: Completed
Wolinella succinogenes: Completed
Xanthomonas axonopodis: Completed
Xanthomonas campestris: Completed
Xylella fastidiosa: Completed
Yersinia pestis: Completed





Table 3. Expect (E) values for the first hits obtained from BLASTP searches using Thalassiosira pseudonana Scaff_137 and Cyanidioschyzon merolae CMR319C as queries, and from PSI-BLAST searches based on all the members of bacterial 3/3 single-domain globins (SDgbs) and flavohemoglobins (FHbs) and of vertebrate 3/3 neuroglobins (Ngbs)

 

Query: C. merolae CMR319C; T. pseudonana Scaffold_137; Bacterial SDgbs; Bacterial FHbs; Vertebrate Ngbs

No. of hits, above threshold/total: 146/274 (1st iteration); 131/430 (1st iteration); 864/1,040 (4th iteration); 455/1,022 (3rd iteration); 690/1,040 (3rd iteration)

Bacterial SDgbs (a)
Bacterial FHbs: e–10; e–15; e–42; e–64; e–8
Eukaryote FHbs (b): e–7; e–10; e–37; e–53; e–9
Vertebrate Ngbs (c): e–11; e–11; e–13; e–12; e–78
Vertebrate Cygbs (d): 0.04; e–5; 0.02; e–4
Vertebrate α: 0.06; e–8; e–7; e–10
Vertebrate β: 0.003; e–3; e–8
Vertebrate Mbs: e–8
Urochordate Hbs (e): 0.6; 0.002; e–13; e–9; e–11
Echinoderm Hbs (f): 0.06; 0.5; e–6; e–6; e–8
Plant SHbs (g): 0.006; e–4; e–9; e–11; e–8
Plant NsHbs (h): e–4; e–6; e–12; e–8; e–6
Annelid Hbs (i): 0.001; e–4; e–5; e–5; e–5
Crustacean Hbs (j): 0.001; e–5; e–7; e–5; e–10
Insect Hbs (k): e–5; e–4
Mollusc Hbs (l): 0.05; e–7; 0.04; e–5
Nematode Hbs (m): e–4; e–4; e–4; e–6

Searches were conducted against all organisms, using default settings, the BLOSUM62 scoring matrix, and threshold E = 0.005. The E values, rounded off to the nearest integer, are provided for the first hit, i.e., the first member of a globin group recognized by the query sequence(s). E > 0.005 are given only for cases where the alignment is acceptable (see Discussion). The eukaryote FHbs were not included because the results were similar to those obtained with the bacterial FHbs. Cygbs, cytoglobins; SHbs, 3/3 symbiotic plant Hb; NsHbs, 3/3 nonsymbiotic plant Hb; Mbs, myoglobins.

(a) Bacterial SDgbs, with accession nos. in parentheses: Acinetobacter sp. ADP (YP_047731), Aquifex aeolicus (NP_21346), Bradyrhizobium japonicum (NP_769447 and NP_771523), Campylobacter coli (AAP86201), Campylobacter jejuni (NP_282714), Clostridium perfringens (NP_562869), Chromobacterium violaceum (NP_562869), Desulfitobacterium hafniense (ZP_00102525), Gloeobacter violaceus (NP_924518), Nostoc punctiforme (ZP_00111020), Novosphingobium aromaticivorans (ZP_00304525), Photobacterium profundum (CAG22319), Pseudomonas aeruginosa (NP_252656), Rhodopirellula baltica (NP_864649), Rhodopseudomonas palustris (NP_949046 and NP_949047), Silicibacter sp. (ZP_00336820), Thermobifida fusca (ZP_00292898), Vibrio parahaemolyticus (NP_800617), and Vitreoscilla stercoraria (AAT01097).

(b) Eukaryote FHbs: fungal, diplomonad (Giardia lamblia), and mycetozoan (Dictyostelium discoideum) FHbs.

(c) Vertebrate Ngbs, with accession nos. in parentheses: Bos taurus (NP_001001853), Canis familiaris (NP_001003356), Cavia porcellus (CAH03122), Danio rerio (NP_571928), Gallus gallus (CAG25721), Homo sapiens (NP_067080), Macaca mulatta (AAP92374), Mus musculus (NP_071859), Ochotona curzoniae (AAT28324), Oncorhynchus mykiss (P59742 and P59743), Oryctolagus cuniculus (CAH-3123), Rattus norvegicus (AAK1576), Sus scrofa (CAG25617), and Tetraodon nigroviridis (CAC59975).

(d) Vertebrate Cygbs, with accession nos. in parentheses: Danio rerio (NP_694484), Gallus gallus (XP_425373), Homo sapiens (NP_599030), Mus musculus (AAH55040), Oncorhynchus mykiss (CAD68070), Rattus norvegicus (NP_570100), Tetraodon nigroviridis (CAG02681), and Xenopus tropicalis (AAH76983).

(e) Urochordate Hbs, with accession nos. in parentheses: Ciona intestinalis (CAD89600, CAD68145, CAD68146, and CAD68147).

(f) Echinoderm Hbs, with accession nos. in parentheses: Caudina arenicola (P80017, P80018, AAB19247, and AAB23119) and Paracaudina chilensis (S06134).

(g) Plant SHbs, with accession nos. in parentheses: Canavalia lineata (P42511), Glycine max (P02235, P02236, CAA23729, CAA23731, and CAA23732), Lupinus luteus (O607193A and CAA54331), Medicago sativa (AAA32659, CAA32491, CAA32491, CAA31750, CAA38408, and CAA38409), Medicago truncatula (CAA40899 and CAA40900), Phaseolus vulgaris (P02234, AAA33767, and O04939), Pisum sativum (O80405, O48665, P02233, O9SAZ0, and O9SAZ1), Psophocarpus tetragonolobus (CAA46704), Sesbania rostrata (AAA03002, AAA03005, and CAA32044), and Vigna unguiculata (T11573).

(h) Plant NsHbs, with accession nos. in parentheses: Arabidopsis thaliana (AAP21213 and AAF76353), Brassica napus (AAK07741), Ceratodon purpureus (AAG22831), Glycine max (AAA97887), Gossypium hirsutum (AAL09463 and AAK21604), Hordeum vulgare (AAB70097), Lycopersicon esculentum (AAK07676 and AAK07677), Marchantia polymorpha (AAK07743), Medicago sativa (AAG29748), Oryza sativa (AAK72228 and AAK72229), (AAK72230 and AAK72231), Physcomitrella patens (AAF66104), Raphanus sativus (AAP37043), Solanum tuberosum (AAN85431), Triticum aestivum (AAN85432), and Zea mays (AAG01375).

(i) Annelid extracellular Hbs, with accession nos. in parentheses: Lamellibrachia (P15469), Lumbricus terrestris (CAA09958, C28151, AAC14535, and P08924), Macrobdella decora (BAC82445), Oligobrachia mashikoi (BAD86542 and O7M413), Pheretima hilgendorfi (P83122), Pheretima sieboldi (P11740), Riftia pachyptila (P80592 and CAD29155), Sabella spallanzanii (CAC37410 and CAC37412), Tubifex tubifex (P18202), and Tylorrhynchus heterochaetus (P13578). Annelid intracellular Hbs: Glycera dibranchiata monomer (A33420 and S13157), polymer P1 (P23216), P2 (A36529), and P3 (B36529).

(j) Crustacean extracellular Hbs, with accession nos. in parentheses: Artemia salina (AAC96001 and AAC96002), Daphnia magna (AAC47544, AAU44972, BAA76874, and BAA76875), Daphnia spinulata (AAM22870, AAM22873, AAM22874, AAM22875, AAM22876, AAM22878, AAM22880, AAM22882, AAM22883, AAM22884, AAM22886, and AAM22887), Moina macrocopa (BAB62537, BAB62536, and BAB62535), and Parartemia zietsiana (AAF72633).

(k) Insect Hbs, with accession nos. in parentheses: Chironomus tentans (CAA39721, CAA39724, and CAA39725), Chironomus thummi piger (S04499, S04500, S21641, and S21633), Chironomus thummi thummi (AAB58931, AAF87695, A30477, GGICE9, JT0349, JT0423, P02224, P12548, P12550, P29245, O23761, O23762, and O23763), Drosophila melanogaster (NP_732083), and Kiefferulus cornishi (AAC47976, AAC46978, AAC48979, AAC46980, AAC46981, and AAC46982).

(l) Mollusc Hbs, with accession nos. in parentheses: Anadara trapezia (S06503), Aplysia juliana (BAA19794), Aplysia kurodai (P02211), Barbatia lima (BAA09588, S61518, S61519, and S61520), Barbatia virescens (BAA09587), Biomphalaria glabrata (AAC24318 and CAH23232), Buccinum undatum (Q7M424), Calyptogena nautilei (BAD30645), Lucina pectinata (AAO89499, P41261, and P41262), Nassa mutabilis (P31331), and Scapharca inaequivalvis (AAL96376, AAL96375, P14822, and S09068).

(m) Nematode Hbs, with accession nos. in parentheses: Caenorhabditis briggsae CAE63297 (CBG07681), CAE60743 (CBG04428), CAE64614 (CBG09371), CAE65561 (CBG10551), CAE62918 (CBG07112), CAE66591 (CBG11915), CAE71628, and CAE75175; Caenorhabditis elegans CAB60330 (Y15E3A.2), CAE17840 (F56C4.3), NP_501142 (4H982), NP_492107 (C26C6.7), NP_504466, CAB04152.2 (F21A3.6), AAK52176 (C18C4), and AAP82661 (C23H5.2); Mermis nigrescens (AAF35435); and Syngamus trachea (AAL56426 and AAL56427).
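The way the E values in Table 3 relate to the threshold E = 0.005 can be illustrated with a short Python sketch. It assumes that an entry such as "e–10" stands for an E value of the order of 1e-10 (the usual BLAST shorthand) and that a hit counts as passing when its E value is at or below the threshold. The function names `parse_e_value` and `passes_threshold` are hypothetical and are not part of BLAST or of the original analysis.

```python
# Minimal sketch, assuming "e–10" in the tables means 1e-10 and that a hit
# passes when E <= threshold. Function names are illustrative only.

THRESHOLD = 0.005  # threshold E value quoted in the table notes


def parse_e_value(cell: str) -> float:
    """Convert a table entry such as 'e–10' or '0.04' to a float."""
    text = cell.strip().replace("–", "-").replace("−", "-")
    if text.startswith("e-"):
        # 'e-10' -> '1e-10', which float() parses directly
        return float("1" + text)
    return float(text)


def passes_threshold(cell: str, threshold: float = THRESHOLD) -> bool:
    """True when the E value is at or below the threshold."""
    return parse_e_value(cell) <= threshold


if __name__ == "__main__":
    for cell in ["e–10", "0.003", "0.04", "0.6"]:
        print(cell, parse_e_value(cell), passes_threshold(cell))
```

Under these assumptions, entries such as 0.04 or 0.6 fall outside the threshold, which matches the note that E > 0.005 is reported only where the alignment was judged acceptable.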



Table 4. Expect (E) values for the first hit obtained from BLASTP searches using Thalassiosira pseudonana Scaff_18 and Chlamydomonas reinhardtii C160981 as queries, and from PSI-BLAST searches based on all the members of bacterial, plant, and ciliate 2/2 Hbs

 

Query: T. pseudonana Scaff_18; C. reinhardtii C160981; Bacterial 2/2 Hbs; Plant 2/2 Hbs; Ciliate 2/2 Hbs; All 2/2 Hbs

No. of hits, above threshold/total: 28/61; 35/72; 138/211 (4th iteration); 68/87 (3rd iteration); 52/111 (2nd iteration); 144/201 (6th iteration)

Bacterial 2/2 Hbs: e–9; e–23; e–37; e–14; e–30; e–33
Plant 2/2 Hbs*: e–25 to e–20; e–15; e–87 to e–89; e–23
Ciliate 2/2 Hbs: e–15; e–26; e–51; e–23
Bacterial and eukaryotic FHbs: e–4; e–4

Searches were conducted against all organisms, using default settings, the BLOSUM62 scoring matrix and threshold E = 0.005. The E values, rounded off to the nearest integer, are provided for the first hit, i.e. the first member of a globin group recognized by the query sequence(s). E > 0.005 are given only for cases where the alignment is acceptable (see Discussion). The results obtained with the other C. reinhardtii globins were similar. FHbs, flavohemoglobins.

*Plant 2/2 Hbs, with accession nos. in parentheses: Arabidopsis thaliana (AAK55409), Datisca glomerata (CAD33536), Glycine max (AAS48191), Hordeum vulgare (AAK55410), Oryza sativa (BAD32857), and Triticum aestivum (AAN85433).

Ciliate 2/2 Hbs, with accession nos. in parentheses: Paramecium caudatum (S27185), P. multimicronucleatum (S60031), P. triaurelia (S60030), Tetrahymena pyriformis (A36270), and Tetrahymena thermophila (S32556).



Table 5. Expect (E) values for the first hit obtained from PSI-BLAST searches using as queries Mycobacterium tuberculosis 2/2Hb1 (HbN) and 2/2Hb2 (HbO) and Desulfitobacterium hafniense 2/2Hb3

 

Query: M. tuberculosis HbN (NP_216058), 2/2Hb1; M. tuberculosis HbO (NP_216986), 2/2Hb2; D. hafniense (ZP_00102524), 2/2Hb3

No. of hits, above threshold/total: 36/70 (1st iteration); 72/91 (1st iteration); 25/36 (1st iteration)

Bacterial 2/2Hb1s/2s/3s: e–54; e–56; e–21
Plant 2/2Hbs*: e–9
Ciliate 2/2Hbs: e–16

No. of hits, above threshold/total: 62/113 (3rd iteration); 109/138 (6th iteration); 44/86 (3rd iteration)

Bacterial 2/2Hb1s/2s/3s: 20, e–51 to e–34; 64, e–43 to e–19; 23, e–43 to e–29
Plant 2/2Hbs*: e–21 to e–18
Ciliate 2/2Hbs: e–29; e–11; 0.01

Searches were conducted against all organisms, using default settings, the BLOSUM62 scoring matrix, and threshold E = 0.005. The E values, rounded off to the nearest integer, are provided for the first hit, i.e., the first member of a globin group recognized by the query sequence(s). E > 0.005 are given only for cases where the alignment is acceptable (see Discussion).

*Plant 2/2Hbs, with accession nos. in parentheses: Arabidopsis thaliana (AAK55409), Datisca glomerata (CAD33536), Glycine max (AAS48191), Hordeum vulgare (AAK55410), Oryza sativa (BAD32857), and Triticum aestivum (AAN85433).

Ciliate 2/2Hbs, with accession nos. in parentheses: Paramecium caudatum (S27185), P. multimicronucleatum (S60031), P. triaurelia (S60030), Tetrahymena pyriformis (A36270), and T. thermophila (S32556).



Table 6. Expect (E) values for the first hit obtained from PSI-BLAST searches using all the members of bacterial globin-coupled sensors (GCSs) and protoglobins (Pgbs) as queries

 

Query: Bacterial GCSs; Bacterial Pgbs

No. of hits (above threshold/total): 49/206 (4th iteration); 23/54 (4th iteration)

Bacterial GCSs*: e–45 to e–11; e–9
Bacterial Pgbs: e–18; e–91 to e–42
Bacterial FHbs: e–4
Eukaryotic FHbs: e–4

Searches were conducted against all organisms, using default settings, the BLOSUM62 scoring matrix, and threshold E = 0.005. The E values, rounded off to the nearest integer, are provided for the first hit, i.e., the first member of a globin group recognized by the query sequence(s).

*Bacterial GCSs, with accession nos. and lengths in parentheses: Agrobacterium tumefaciens [NP_354049 (499 aa) and NP_531216 (568 aa)], Azoarcus sp. EbN1 (YP_158656, 678 aa), Azotobacter vinelandii (ZP_00090857, 434 aa), Bacillus anthracis (YP_031514, 433 aa), Bacillus cereus (ZP_00239241, 433 aa), Bacillus clausii (YP_176973, 449 aa), Bacillus halodurans (NP_241371, 439 aa), Bacillus licheniformis (YP_90716, 430 aa), Bacillus subtilis (NP_388919, 432 aa), Bacillus thuringiensis (YP_039413, 433 aa), Bordetella bronchiseptica (NP_888505, 475 aa), Bordetella parapertussis (NP_884745, 475 aa), Bordetella pertussis (NP_882025, 475 aa), Burkholderia fungorum (ZP_00277651, 723 aa), Caulobacter crescentus [NP_419247 (537 aa) and NP_421120 (555 aa)], Chromobacterium violaceum [NP_900548 (295 aa) and NP_899909 (375 aa)], Desulfotalea psychrophila (YP_063937, 363 aa), Desulfitobacterium hafniense (ZP_00102172, 157 aa), Erwinia carotovora (YP_049782, 442 aa), Escherichia coli (NP_416007, 460 aa), Exiguobacterium [ZP_00183969 (430 aa) and ZP00182337 (425 aa)], Geobacter metallireducens (ZP_00298606, 300 aa), Geobacter sulfurreducens (NP_954351, 300 aa), Gluconobacter oxydans (YP_191196, 458 aa), Haloarcula marismortui (AAV45247, 497 aa), Halobacterium salinarum (T44978, 489 aa), Magnetococcus sp. [ZP_00289561 (453 aa) and ZP_00289126 (507 aa)], Magnetospirillum magnetotacticum [ZP_00054075 (721 aa) and ZP_00054774 (443 aa)], Moorella thermoacetica (ZP_00330036, 245 aa), Novosphingobium aromaticivorans (ZP_00304036, 481 aa), Rhizobium meliloti (S61831, 533 aa), Rhodobacter sphaeroides (ZP_00207372, 341 aa), Rhodospirillum rubrum [ZP_00268251 (442 aa) and ZP_00270832 (445 aa)], Shigella flexneri (NP_707605, 381 aa), Silicibacter sp. [ZP_0038974 (308 aa), ZP_00336325 (485 aa), and ZP00336080 (487 aa)], Sinorhizobium meliloti (NP_384742, 533 aa), Thermosynechococcus elongatus (NP_682779, 194 aa), Thermus thermophilus (YP_005074, 203 aa), Vibrio vulnificus (NP_936636, 306 aa), and Zymomonas mobilis (AAV_89506, 467 aa).

Bacterial Pgbs, with accession nos. and lengths in parentheses: Aeropyrum pernix (NP_147118, 195 aa), Chloroflexus aurantiacus (ZP_00359040, 227 aa), Methanosarcina acetivorans (NP_617780, 195 aa), Methanosarcina barkeri (ZP_000294773, 195 aa), Rubrobacter xylanophilus (ZP_00200180, 196 aa), and Thermobifida fusca (ZP_00293478, 197 aa).



Supporting Materials and Methods

All of the putative globin sequences were aligned manually, following the procedure used earlier in the alignment of >700 globins (1). That procedure was designed to fit the sequences to the myoglobin fold (2, 3), i.e., the pattern of predominantly hydrophobic residues at 37 conserved, mostly solvent-inaccessible positions with mean solvent-accessible areas of <15 Å² (4). These comprise 33 intrahelical residues at positions A8, A11, A12, A15, B6, B9, B10, B13, B14, C4, E4, E7, E8, E11, E12, E15, E18, E19, F1, F4, G5, G8, G11, G12, G13, G15, G16, H7, H8, H11, H12, H15, and H19; the three interhelical residues at CD1, CD4, and FG4; and the invariant His at F8. The alignment was based on the crystal-structure-based alignment of the 58 known globin crystal structures available from the Protein Data Bank: 1A6M, 1IRDA, 1IRDB, 1I3DA, 1A9W, 1OJ6, 1UMO, 1HDSA, 1HDSB, 1A4FA, 1A4FB, 1HBRA, 1HBRB, 1CG5A, 1CG5B, 1SPGA, 1GCVA, 1GCVB, 1OUTA, 1OUTB, 1LA6A, 1LA6B, 1V75A, 1V75B, 2MM1, 1LHT, 1EMY, 1IT2, 2LHB, 1EW6, 1MBA, 1ASH, 1HLB, 1HLM, 1ECO, 2HBG, 1BOB, 1H97, 3SDH, 1SCTA, 1SCTB, 1ITH, 1FSL, 2GDM, 1D8U, 1KR7, 1DLY, 1IDR, 1NGK, 1UX8, 1DLW, 1MWB, 1VHB, 1GVH, 1CQX, 1OR4, and 1TU9. Although an earlier alignment of 700 globins had suggested that globins were characterized by two invariant residues, F8His and CD1Phe (1), members of the recently discovered 2/2 Hb family (5–8) can accommodate other hydrophobic residues, such as Tyr/Met/Leu/Ile/Val at the CD1 position and Ala/Ser/Thr/Leu at the distal E7 position, in addition to His and Gln. Moss 3/3 nonsymbiotic plant Hb also has a Tyr at CD1 (9).
Thus, we required only that the putative globin sequence alignment provide a hydrophobic residue at CD1 in the order of preference Phe>Tyr>Leu>Met>Ile>Val, a residue at the distal E7 position in the order of preference His>Gln>Leu≈Thr>Ala≈Val≈Ser≈Tyr, a candidate His at the proximal F8 position, and at the remaining 32 positions an amino acid already encountered in the crystal structure alignment. Deletions in the helical regions were avoided, and interhelical regions were left unrestricted to obtain the optimum alignment in the flanking helical regions. The C-terminal FAD reductase sequences and small subunit rRNA sequences were aligned with CLUSTALW and manually improved using GENEDOC.
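The acceptance criteria at the three key positions can be expressed as a small Python sketch. The alignment itself was done manually, so this is only an illustration of the stated rules and not the procedure that was used. The preference orders are taken from the text, whereas the function names, the one-letter residue codes as input, and the idea of returning a rank per position are assumptions made for the example.

```python
# Minimal sketch of the CD1 / E7 / F8 criteria described above.
# Preference orders come from the text; names and the rank output are
# hypothetical and only illustrate the rules.

# Phe > Tyr > Leu > Met > Ile > Val
CD1_ORDER = ["F", "Y", "L", "M", "I", "V"]
# His > Gln > Leu ≈ Thr > Ala ≈ Val ≈ Ser ≈ Tyr (residues in a tier are equivalent)
E7_TIERS = [{"H"}, {"Q"}, {"L", "T"}, {"A", "V", "S", "Y"}]


def cd1_rank(residue):
    """Rank of a CD1 residue (0 = most preferred), or None if not allowed."""
    return CD1_ORDER.index(residue) if residue in CD1_ORDER else None


def e7_rank(residue):
    """Rank of an E7 residue by preference tier, or None if not allowed."""
    for rank, tier in enumerate(E7_TIERS):
        if residue in tier:
            return rank
    return None


def check_key_positions(cd1, e7, f8):
    """Return (acceptable, (cd1_rank, e7_rank)) for one candidate alignment."""
    ranks = (cd1_rank(cd1), e7_rank(e7))
    acceptable = None not in ranks and f8 == "H"
    return acceptable, ranks


if __name__ == "__main__":
    print(check_key_positions("F", "H", "H"))  # canonical Phe/His/His
    print(check_key_positions("Y", "T", "H"))  # allowed, but less preferred
    print(check_key_positions("K", "H", "H"))  # Lys at CD1 is rejected
```

Lower ranks indicate more preferred residues, so two candidate alignments of the same sequence could be compared by their rank tuples. The check on the remaining positions, which requires a table of residues observed in the crystal-structure alignment, is not reproduced here.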

1. Kapp, O. H., Moens, L., Vanfleteren, J., Trotman, C. N. A., Suzuki, T. & Vinogradov, S. N. (1995) Protein Sci. 4, 2179–2190.

2. Lesk, A. M. & Chothia, C. (1980) J. Mol. Biol. 136, 225–270.

3. Bashford, D., Chothia, C. & Lesk, A. M. (1987) J. Mol. Biol. 196, 199–216.

4. Gerstein, M., Sonnhammer, E. L. & Chothia, C. (1994) J. Mol. Biol. 236, 1067–1078.

5. Pesce, A., Couture, M., Dewilde, S., Guertin, M., Yamauchi, K., Ascenzi, P., Moens, L. & Bolognesi, M. (2000) EMBO J. 19, 2424–2434.

6. Milani, M., Pesce, A., Ouellet, Y., Ascenzi, P., Guertin, M. & Bolognesi, M. (2001) EMBO J. 20, 3902–3909.

7. Milani, M., Savard, P.-Y., Ouellet, H., Ascenzi, P., Guertin, M. & Bolognesi, M. (2003) Proc. Natl. Acad. Sci. USA 100, 5766–5771.

8. Wittenberg, J. B., Bolognesi, M., Wittenberg, B. A. & Guertin, M. (2002) J. Biol. Chem. 277, 871–874.

9. Arredondo-Peter, R., Hargrove, M. S., Moran, J. F., Sarath, G. & Klucas, R. V. (1998) Plant Physiol. 118, 1121–1125.