Sequence-structure analysis of FAD-containing proteins

Orly Dym; David Eisenberg

doi:10.1110/ps.12801

2001 Sep;10(9):1712–1728.

PMCID: PMC2253189 PMID: 11514662

Abstract

We have analyzed structure-sequence relationships in 32 families of flavin adenine dinucleotide (FAD)-binding proteins, to prepare for genomic-scale analyses of this family. Four different FAD-family folds were identified, each containing two or more protein families. Three of these families, exemplified by glutathione reductase (GR), ferredoxin reductase (FR), and p-cresol methylhydroxylase (PCMH), were previously defined, and a family represented by pyruvate oxidase (PO) is newly defined. For each of the families, several conserved sequence motifs have been characterized. Several newly recognized sequence motifs are reported here for the PO, GR, and PCMH families. Each FAD fold can be uniquely identified by the presence of distinctive conserved sequence motifs. We also analyzed cofactor properties, some of which are conserved within a family fold while others display variability. Among the conserved properties is cofactor directionality: in some FAD-structural families, the adenine ring of the FAD points toward the FAD-binding domain, whereas in others the isoalloxazine ring points toward this domain. In contrast, the FAD conformation and orientation are conserved in some families, whereas in others they display some variability. Nevertheless, there are clear correlations among the FAD-family fold, the shape of the pocket, and the FAD conformation. Our general findings are as follows: (a) no single protein 'pharmacophore' exists for binding FAD; (b) in every FAD-binding family, the pyrophosphate moiety binds to the most strongly conserved sequence motif, suggesting that pyrophosphate binding is a significant component of molecular recognition; and (c) sequence motifs can identify proteins that bind phosphate-containing ligands.

It is generally accepted that the three-dimensional structure of a polypeptide chain is determined by its amino acid sequence. Nevertheless, similar folds can have very different sequences. One of the ultimate goals in sequence analysis is to predict the structure and function of a protein based solely on its sequence. In cases where the protein of interest shares at least 30% amino acid identity with another protein, the two proteins generally exhibit similar three-dimensional structure (Doolittle 1986). Alternatively, when proteins are known to have similar structure but divergent sequences, consensus sequence motifs can be used to assess the function of unassigned sequences. These consensus motifs usually correspond to residues interacting with cofactors, substrate, or other proteins.
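The 30% identity rule of thumb mentioned above can be sketched in a few lines. This is only an illustration: the function names, the convention of counting identity over ungapped columns of a precomputed pairwise alignment, and the example strings are assumptions made here, not part of the original analysis.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes the two strings come from an existing pairwise alignment and
    use '-' as the gap character (a convention chosen for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def likely_same_fold(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """Apply the ~30% identity rule of thumb cited in the text (Doolittle 1986)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy aligned fragments (invented for illustration, not real proteins).
print(percent_identity("AAAA", "AAAT"))
print(likely_same_fold("GAGG-SG", "GSGGASG"))
```

Other denominators (alignment length, shorter sequence length) are also in common use, so the threshold should be read together with whichever convention is chosen.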

The increasing number of three-dimensional structures of proteins in the Protein Data Bank, complexed with appropriate ligands, provides an important tool for understanding the mechanisms of molecular recognition. In this study, we focussed on flavin adenine dinucleotide (FAD) because it and its related cofactors, nicotinamide adenine dinucleotide (NADH) and adenosine triphosphate (ATP), appear in many biological processes.

Previous comparative structural studies of mononucleotide- and dinucleotide-binding proteins reveal that some exhibit a similar three-dimensional structure with conserved sequence motifs at positions crucial for binding, although the remaining sequence can vary greatly. One of the first folds was identified by Rossmann (Rossmann et al. 1974), who discovered the correlation between the fold of dehydrogenases that bind the cofactor NADH and conserved sequence motifs. The basic structure consists of six parallel β-strands interspersed by α-helices that appear on both sides of the six-stranded β-sheet. This symmetrical α/β structure is built from two halves, β₁α₁β₂α₂β₃ and β₄α₄β₅α₅β₆, with a crossover α-helix (α₃) connecting β₃ and β₄ (Fig. 1 ▶). Each of these halves is known as the classical mononucleotide-binding fold or the Rossmann fold.

Fig. 1. — Topology diagram of dehydrogenases that bind NADH. This symmetrical α/β structure is composed of two halves, β₁α₁β₂α₂β₃ and β₄α₄β₅α₅β₆, with a crossover α-helix (α₃) connecting strands β₃ and β₄. Each of these motifs is known as the classical mononucleotide-binding motif or the Rossmann fold. Cylinders represent α-helices and arrows denote β-strands. Dashed lines indicate elements of secondary structure below the plane of the fold. The overall pseudo two-fold rotation symmetry between the two βαβαβ motifs is shown by an arrow.

A variation of the fold observed for dehydrogenases is found in FAD-containing proteins. This fold consists of one β₁α₁β₂α₂β₃ Rossmann fold, and a variation of both the second Rossmann fold and the crossover α-helix. This variation includes a three-stranded antiparallel β-sheet connecting β₃ and β₄, instead of the crossover α-helix observed in dehydrogenases. Moreover the sixth strand, from the second Rossmann fold, is missing whereas the fifth strand is retained but is close to the end of the sequence. Different variations in which structural elements are added or deleted were found in proteins containing other mono- and/or dinucleotides such as flavin mononucleotide (FMN) (Rao and Rossmann 1973), and nicotinamide adenine dinucleotide phosphate (NADPH) (Schulz and Schirmer 1974). Most of these proteins show a series of conserved amino acid residues at positions interacting with the mono/dinucleotide molecule.

We selected a nonredundant set of 32 protein-FAD complex structures from the Protein Data Bank (PDB) for structure-sequence analysis, with the goal of deepening our understanding of the principles of molecular recognition. On the basis of sequence-structure comparison and the interaction of cofactor atoms with the different protein residues, we identified several conserved motifs for each structural family. Some of these were previously characterized by others (Schulz and Schirmer 1974; Schulz et al. 1982; Wierenga et al. 1983; Schulz 1992; Correll et al. 1993; Lu et al. 1994; Ingelman et al. 1997; Fraaije et al. 1998) and some are newly derived. These conserved sequence motifs are called "most conserved" when they are present in all family members, and "partially conserved" when present in some but not all family members. In addition to the sequence-structure analysis, we investigated a more complete set of variables, including cofactor conformation, characteristics of the protein pocket wherein the cofactor is bound, cofactor directionality, and correlation of cofactor moieties (adenine, pyrophosphate, isoalloxazine, etc.) interacting with conserved sequence motifs in the different family folds. Such fundamental discriminators may improve our understanding of protein evolution, in particular for FAD-binding proteins where many tertiary structures, often distantly related, are known. Furthermore, the presence of a variety of conserved sequence motifs in FAD families, specifically those that are pyrophosphate-binding, can be used as a tool for molecular recognition of phosphate analog-binding proteins.
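The distinction between "most conserved" and "partially conserved" motifs defined above amounts to a simple counting rule. A minimal sketch follows, assuming each family is represented as a mapping from member name to the set of motifs found in it; the function name and the placeholder labels (`protein_1`, `motif_1`, and so on) are hypothetical and do not correspond to entries in Table 1.

```python
def classify_motifs(family: dict[str, set[str]]) -> dict[str, str]:
    """Label each motif by how many family members contain it.

    'most conserved'      -> present in every member
    'partially conserved' -> present in some but not all members
    """
    n_members = len(family)
    counts: dict[str, int] = {}
    for motifs in family.values():
        for motif in motifs:
            counts[motif] = counts.get(motif, 0) + 1
    return {
        motif: "most conserved" if n == n_members else "partially conserved"
        for motif, n in counts.items()
    }


# Hypothetical toy family; labels are placeholders, not data from the paper.
toy_family = {
    "protein_1": {"motif_1", "motif_2"},
    "protein_2": {"motif_1"},
    "protein_3": {"motif_1", "motif_2"},
}
for motif, label in sorted(classify_motifs(toy_family).items()):
    print(motif, "->", label)
```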

Results

FAD cofactor

The dinucleotide FAD consists of adenosine monophosphate (AMP) linked to flavin mononucleotide (FMN) by a pyrophosphate bond (Fig. 2A ▶). The AMP moiety is composed of the adenine ring bonded to a ribose that is linked to a phosphate group. The FMN moiety, also known as the riboflavin moiety, is composed of the isoalloxazine-flavin, linked to a sugar, ribitol, which is coupled to a phosphate group. The catalytic function of the FAD is concentrated in the isoalloxazine ring, whereas the ribityl phosphate and the AMP moiety mainly stabilize cofactor binding to protein residues. The flavin functions mainly in a redox capacity, being able to take up two electrons from one substrate and release them two at a time to a substrate or coenzyme, or one at a time to an electron acceptor. The surrounding protein controls many of the catalytic properties of the flavin ring, such as the rate of accepting electrons, the pathway of electron flow within the flavin ring, and the flavin's oxidation-reduction potential (Mathews 1991). This control is exerted in part by the cofactor conformation, which adapts to the protein as either an elongated or a bent butterfly conformation (Fig. 2A and B ▶, respectively). In the bent conformation, the AMP portion is folded back, placing the adenine and isoalloxazine rings in close proximity, whereas in the elongated conformation the adenine ring is distal from the isoalloxazine ring.

The glutathione reductase (GR) structural family

Structural conservation

One of the most thoroughly studied FAD-containing families is represented by the enzyme glutathione reductase (GR). The protein members of this populous family catalyze diverse reactions (Karplus and Schulz 1987; Chen et al. 1994; Mande et al. 1996; Mizutani et al. 1996; Yeh et al. 1996; Enroth et al. 1998; Eppink et al. 1998; Binda et al. 1999; Bond et al. 1999; Lennon et al. 1999; Leys et al. 1999; Trickey et al. 1999; Wohlfahrt et al. 1999; Yue et al. 1999; Ziegler et al. 1999) (Table 1). All family members adopt the Rossmann fold (β₁α₁β₂α₂β₃ in Fig. 3A ▶). The topology of the GR family consists of a central five-stranded parallel β-sheet (β₁, β₂, β₃, β₇, and β₈) surrounded by α-helices (α₁ and α₂) and an additional crossover connection composed of a three-stranded antiparallel β-sheet (β_4–6) (Fig. 3A ▶).

Table 1.

Cofactors and conserved sequence motifs in the nonredundant FAD-containing protein structures retrieved from the Protein Data Bank and used in our sequence-structure analysis

PDB code	Family	Protein (reference)	Cofactors	FAD bond (C/NC)^a	Conformation (B/E)^b	Sequence motifs
3grs	GR₁	Glutathione reductase (Karplus and Schulz 1987)	FAD, NADPH	NC	E	GxGxxG(x)₁₇E, hhhxGxGxxAxE, T(x)₅TxxGD, D(x)₆GxxP
1fcd		Sulfide dehydrogenase (Chen et al. 1994)	FAD	C	E	GxGxxG(x)₁₉E, hhhxPxPxxPxE, S(x)₆HxxGD, D(x)₆PxxA
1ebd		Dihydrolipoamide dehydrogenase (Mande et al. 1996)	FAD, NADH	NC	E	GxGxxG(x)₁₇E, hhhxGxGxxGxE, T(x)₅FxxGD, D(x)₆GxxP
1bzl		Trypanothione reductase (Bond et al. 1999)	FAD, NADPH	NC	E	GxGxxG(x)₁₈D, hhhxGvxxxSxE, T(x)₅YxxGD, D(x)₆GxxP
1joa		NADH peroxidase (Yeh et al. 1996)	FAD, NADH	NC	E	GxGxxG(x)₁₉E, hhhxGxGxxGxE, T(x)₅FxxGD, D(x)₆GxxP
1cjc		Adrenodoxin reductase (Ziegler et al. 1999)	FAD, NADPH	NC	E	GxGxxG(x)₁₉E, hhhxGxGxxAxD, V(x)₆YxxGW
1c10		Thioredoxin reductase (Lennon et al. 1999)	FAD, NADPH	NC	E	GxGxxG(x)₂₁, FxxGD, D(x)₈GxxP
1foh	GR₂	Phenol hydroxylase (Enroth et al. 1998)	FAD, NADPH	NC	E	GxGxxG(x)₁₆D, S(x)₅FxxGD
1bf3		p-Hydroxybenzoate hydroxylase (Eppink et al. 1998)	FAD, NADPH	NC	E	GxGxxG(x)₁₇E, FxxGD
1cf3		Glucose oxidase (Wohlfahrt et al. 1999)	FAD	NC	E	GxGxxG(x)₁₈E
1b37		Polyamine oxidase (Binda et al. 1999)	FAD	NC	E	GxGxxG(x)₁₈E, T(x)₅FxxGD
1d4c		Fumarate reductase (Leys et al. 1999)	FAD	C	E	GxGxxG(x)₁₇E
1b4v		Cholesterol oxidase (Yue et al. 1999)	FAD	NC	E	GxAxxG
1aa8		D-Amino acid oxidase (Mizutani et al. 1996)	FAD	NC	E	GxGxxG(x)₁₉D
1b3m		Sarcosine oxidase (Trickey et al. 1999)	FAD	C	E	GxGxxG(x)₁₇D
1fdr	FR	Flavodoxin reductase (Ingelman et al. 1997)	FAD, NADPH	NC	B	RxYS, GxxSxxL(x)₅G(x)₈AxG, MxxxGTAIxP
1a8p		Ferredoxin reductase (Sridhar Prasad et al. 1998)	FAD, NADPH	NC	B	RxYS, GxxTxxL(x)₅G(x)₈PxG, MxxxGTGIxP
1ndh		NADH-Cytochrome b₅ reductase (Nishida et al. 1995)	FAD, NADH	NC	E	RxYT, GxxSxxL(x)₅G(x)₇PxG, MxxxGTGIxP
2cnd		Nitrate reductase (Lu et al. 1994)	FAD, NADH	NC	E	RxYT, GxxTxxL(x)₅G(x)₇PxG, MxxxGSGIxP
1amo		NADPH-P450 reductase (Wang et al. 1997)	FAD, NADPH	NC	E	RxYS, GxxTxxL(x)₆G, MxxxGTGIxP
1cqx		Flavohemoglobin (Ermler et al. 1995)	FAD, NADH	NC	E	RxYS, GxxSxxL(x)₆G(x)₇PxG
1qlt	PCMH	p-Cresol methylhydroxylase (Cunane et al. 2000)	FAD	C	E	P(x)₆GxN, G(x)₇GY, K(x)₆E(x)₂YxxVxxG(x)₈Y, R, GxxL
1dii		Vanillyl-alcohol oxidase (Fraaije et al. 2000)	FAD	C	E	P(x)₆GxN, G(x)₇GY, R(x)₆E(x)₂YxxVxxG(x)₈Y, R, GxxL
2mbr		Uridine diphospho-N-acetylenolpyruvylglucosamine reductase (Benson et al. 1997)	FAD, NADPH	NC	E	P(x)₆GxN, G(x)₈AY, K(x)₆E(x)₄YxxVxxG(x)₈Y, R, GxxL
1qj2		Carbon monoxide dehydrogenase (Dobbek et al. 1999)	FAD	NC	E	P(x)₈GxN, G(x)₃AY, R(x)₅D(x)₅YxxAxxxG(x), R
1f0x		D-Lactate dehydrogenase (Dym et al. 2000)	FAD	NC	E	A(x)₇AxN, G(x)₇GS, R
1pow	PO	Pyruvate Oxidase (Muller et al. 1994)	FAD	NC	E	R(x)₅GxG, VGxN, K(x)₇IxxDP(x)₉D(x)₄ADxxK, KxLxxLxxxL(x)₆T(x)₄GxV
1efv		Electron transfer flavoprotein (Roberts et al. 1996)	FAD	NC	E	K(x)₅GxG, K(x)₇VGxS, IxxDP(x)₈D(x)₄ADxxK, KxLxxLxxxL(x)₆S(x)₆GxV
1ivh	SM^c	Acyl-CoA dehydrogenase (Tiffany et al. 1997)	FAD	NC	E
1dnp		DNA Photolyase (Park et al. 1995)	FAD	NC	B
1b5t		Methylenetetrahydrofolate reductase (Guenther et al. 1999)	FAD, NADPH	NC	E
1qr2		Quinone oxidoreductase (Foster et al. 1999)	FAD, NADPH	NC	E

^a FAD bond type: C, covalent; NC, noncovalent.

^b Conformation: B, bent; E, elongated.

^c SM signifies the four single-membered families.
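The motif notation used in Table 1 maps naturally onto regular expressions, which is one way such motifs could be used to scan unassigned sequences. The sketch below is an assumption-laden illustration rather than the procedure used in this study: the function name `motif_to_regex`, the residue set chosen for the hydrophobic class `h`, and the toy sequence are all invented here. Only the notation itself (uppercase letters for fixed residues, `x` for any residue, `h` for a hydrophobic residue, `(x)ₙ` for a run of n residues) comes from the table and text.

```python
import re

# Table 1 writes run lengths as Unicode subscripts; map them to ASCII digits.
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")

# Assumed membership of the hydrophobic class 'h'; the paper does not list it.
HYDROPHOBIC = "[AVLIMFWC]"


def motif_to_regex(motif: str) -> str:
    """Translate one Table 1 motif (e.g. 'GxGxxG(x)₁₇E') into a regex string."""
    motif = motif.translate(SUBSCRIPTS)
    parts = []
    i = 0
    while i < len(motif):
        run = re.match(r"\(x\)(\d+)", motif[i:])
        if run:
            parts.append(".{%s}" % run.group(1))  # (x)n -> exactly n residues
            i += run.end()
        elif motif[i] == "x":
            parts.append(".")  # any single residue
            i += 1
        elif motif[i] == "h":
            parts.append(HYDROPHOBIC)
            i += 1
        elif motif[i].isupper():
            parts.append(motif[i])  # fixed residue, one-letter code
            i += 1
        else:
            # Truncated or unsupported notation is rejected rather than guessed.
            raise ValueError(f"unsupported notation at position {i}: {motif!r}")
    return "".join(parts)


pattern = motif_to_regex("GxGxxG(x)₁₇E")
print(pattern)

# Toy sequence invented for illustration (not a real protein).
toy_sequence = "MKGAGPSG" + "A" * 17 + "EK"
hit = re.search(pattern, toy_sequence)
print(hit.start() if hit else None)
```

Entries whose run length is missing, or that use the parenthesized-alternative form such as E(D), raise `ValueError` in this sketch instead of being silently completed.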

Fig. 3. — Topology diagram of the FAD-binding domain of the four FAD-family folds. (A) Rossmann fold (β₁α₁β₂α₂β₃) adopted by the glutathione reductase (GR) family members. For a full description of this fold, it must be noted that there are two subfamilies, GR₁ and GR₂ (see text), and that there are exceptions to the generalizations described here. For example, D-amino acid oxidase of the GR₂ subfamily is an exception to the rule that the FAD-binding fold in the GR family contains a three-stranded β-meander connecting β₃ and β₄; instead, it has a crossover α-helix. (B) Ferredoxin reductase (FR) family fold adopting a cylindrical β-domain organized into two orthogonal sheets, β₁β₂β₅ and β₃β₄β₆. (C) The p-cresol methylhydroxylase (PCMH) family fold consists of two α + β subdomains; one is composed of three parallel β-strands (β_1–3) and the second contains five antiparallel β-strands (β_4–8) surrounded by α-helices. (D) The pyruvate oxidase (PO) family fold consists of five parallel β-strands (β_1–5) interspersed by α-helices similar to the double Rossmann fold found in dehydrogenases. Cylinders represent α-helices and arrows denote β-strands. Dashed lines indicate the locations of the conserved sequence motifs listed in Table 1 for each FAD-family fold.

Among the 15 GR family members listed in Table 1, two separate subfamilies were identified using the CE program for comparing a polypeptide chain to each chain in the PDB (Shindyalov and Bourne 1998). One subfamily, GR₁, contains the first seven proteins listed in Table 1; the last eight proteins belong to the second subfamily, GR₂. Each of the two subfamilies is obtained when any of the proteins belonging to it is used as a query in the sequence-structure search. Whereas proteins from the GR₁ subfamily align well through the entire FAD-binding domain, those belonging to the GR₂ subfamily align well only in their N-terminal region (∼30 residues). Nevertheless, all GR family members share a similar overall three-dimensional structure in their FAD-binding domain as well as at least one conserved sequence motif (Table 1). Terminal additions as well as various insertions within the fold are present in the members of GR₂, however. Specifically, insertions of several secondary structures are observed in the connections from β₂ to α₂ and from α₂ to β₃ of the Rossmann fold (Fig. 3A ▶). This is in agreement with the observation that these proteins align well only in their N-terminal residues comprising β₁, α₁, and β₂. Notably, proteins from GR₁, with the exception of sulfide dehydrogenase (Van Driessche et al. 1996), contain an NAD(P)H-binding domain (between β₇ and β₈ in Fig. 3A ▶) that adopts the Rossmann fold (Rossmann et al. 1974), in addition to the FAD-binding domain. Phenol hydroxylase (Enroth et al. 1998) and p-hydroxybenzoate hydroxylase (Eppink et al. 1998) are the only proteins from the second subfamily known to bind NADPH. However, both lack the Rossmann-type NADPH-binding fold.

FAD binding and conformation

The FAD cofactor in the GR family members adopts an elongated conformation, with the adenine and isoalloxazine moieties distal from each other (Table 1 and Fig. 4A ▶). The degree of extension of the cofactor can vary considerably. In addition, the cofactor may be either covalently or noncovalently bound to the protein, and there is no apparent correlation of cofactor conformation with any of the conserved sequence motifs (Table 1). In contrast, the direction of the FAD cofactor is conserved among all GR family members: the adenine ring of the cofactor points toward the FAD-binding domain, while the isoalloxazine ring points away from it (Fig. 4A ▶).

Fig. 4. — Ribbon representation of the FAD-binding domain of the four FAD-family folds complexed with FAD. (A) Rossmann fold adopted by the glutathione reductase (GR) family members. The blue shading indicates the crossover connection composed of a three-stranded antiparallel β-sheet; the red shading indicates the central five-stranded parallel β-sheet surrounded by two α-helices. The FAD cofactor adopts an elongated conformation with the adenosine moiety (gray circles) pointing toward the FAD-binding domain. (B) Ferredoxin reductase (FR) family. The two antiparallel three-stranded β-sheets are shown in red and blue. The FAD in its bent conformation is shown with the isoalloxazine ring (black circles) pointing toward the FAD-binding domain. (C) The α + β fold adopted by the p-cresol methylhydroxylase (PCMH) family members. In red are three parallel β-strands surrounded by α-helices, and in blue are five antiparallel β-strands surrounded by α-helices. The FAD molecule adopts an elongated conformation and is located in between the two subdomains with the adenine ring pointing toward them. (D) Rossmann fold adopted by pyruvate oxidase (PO) family members, shown in red. The cofactor adopts an elongated conformation and lies perpendicular to the β-strands. The stick drawing of the cofactor is depicted with gray circles for atoms in the adenine and sugar rings, with red circles for phosphate and oxygen atoms, and black circles for atoms for the isoalloxazine ring. Figure created by MOLSCRIPT (Kraulis 1991) and RASTER3D (Bacon and Anderson 1988; Merritt and Murphy 1994).

Conserved sequence motifs

The GR family members show several conserved amino acid residues at a few crucial positions, whereas the rest of the sequence varies. We conducted a sequence-structure alignment of the members of the two GR subfamilies using the program CE. Conserved sequence motifs found in each of the GR family members are listed in Table 1. Specifically, four conserved motifs for FAD binding are found in GR₁ members (depicted in colored boxes in Fig. 5 ▶), two of which are present in the GR₂ subfamily as well (Table 1).

The most conserved and well-studied sequence motif is part of the Rossmann fold, xhxhGxGxxGxxxhxxh(x)₈hxhE(D) (Fig. 5 ▶, blue box), where x is any residue and h is a hydrophobic residue (Schulz and Schirmer 1974; Schulz et al. 1982; Wierenga et al. 1983; Schulz 1992). This motif is found at the N-terminal part of the sequence, and in fact it is the only conserved sequence motif present in all GR family members (Table 1). This is in agreement with the observation that while members of GR₁ align well throughout the entire FAD-binding domain, GR₂ members align well mainly at the N-terminus. This consensus is known as the dinucleotide-binding motif (DBM) (Wierenga et al. 1983) or the phosphate-binding sequence signature (Moller and Amons 1985) and is a common motif among FAD- and NAD(P)H-dependent oxidoreductases. The central part of this consensus motif, GxGxxG, is part of the loop connecting the first β-strand and α-helix in the βαβαβ Rossmann fold, with the N-terminal end of helix α₁ pointing toward the pyrophosphate moiety for charge compensation (Fig. 3A ▶). The importance of the Gly residues in the conserved central GxGxxG is well understood (Wierenga et al. 1986): the first strictly conserved Gly allows for a tight turn of the main chain, which is important for positioning the second Gly. The second Gly, because of its missing side chain, permits close contact of the main chain to the pyrophosphate of FAD, specifically oxygen atoms O_P1 or O_P2. The third Gly allows close packing of the helix with the β-sheet. The hydrophobic residues provide hydrophobic interactions of the α-helix with the β-sheet. The conserved negatively charged terminal residue, Glu or Asp, hydrogen bonds to the ribose 2′ hydroxyl of the adenosine moiety. A variation of this glycine-rich sequence motif is hhhxGxGxxGxE (Fig. 5 ▶, red box), in which the central GxGxxG is common (Wierenga et al. 1986). It is part of the βαβ Rossmann NAD(P)H-binding motif and is present only in GR₁ subfamily members containing the NAD(P)H-binding domain in addition to the FAD-binding domain (first seven proteins in Table 1). Sulfide dehydrogenase belongs to the GR₁ subfamily; however, it lacks the NAD(P)H-binding domain, consistent with the observation that glycines are replaced by prolines in this conserved motif (Fig. 5 ▶, red box). Two enzymes of GR₂, p-hydroxybenzoate hydroxylase (Eppink et al. 1998) and phenol hydroxylase (Enroth et al. 1998), are exceptional: they bind NADPH (Table 1) but lack a Rossmann-type NADPH-binding domain and therefore lack the conserved hhhxGxGxxGxE sequence motif typical of the NAD(P)H-binding domain (Wierenga et al. 1983).
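The full dinucleotide-binding consensus described above, together with the roles assigned to its three glycines and terminal acidic residue, can be sketched as a fixed-spacing scan. The sketch is illustrative only: the function name `find_dbm`, the residues assumed to count as hydrophobic, and the toy sequence are assumptions introduced here. Because Table 1 shows the spacing between GxGxxG and the acidic residue ranging from 16 to 21 residues, a practical scan would probably need a variable-length gap rather than the literal consensus used below.

```python
import re

H = "[AVLIMFWC]"  # assumed hydrophobic class; not specified in the paper

# Literal transcription of xhxhGxGxxGxxxhxxh(x)8hxhE(D); E(D) is read as Glu or Asp.
DBM = re.compile(rf".{H}.{H}G.G..G...{H}..{H}.{{8}}{H}.{H}[ED]")


def find_dbm(sequence: str) -> list[dict]:
    """Return 0-based positions of the key residues for each consensus hit."""
    hits = []
    for m in DBM.finditer(sequence):
        start = m.start()
        hits.append({
            "start": start,
            # Gly 1: tight turn; Gly 2: main-chain contact with pyrophosphate;
            # Gly 3: close packing of the helix against the beta-sheet.
            "glycines": (start + 4, start + 6, start + 9),
            # Terminal Glu/Asp: hydrogen bond to the adenosine ribose 2' hydroxyl.
            "acidic": m.end() - 1,
        })
    return hits


# Toy sequence built to satisfy the consensus (invented, not a real protein).
toy = "MS" + "KIAVGAGPSGLSAAIEL" + "KKSGHDTR" + "VTLE"
for hit in find_dbm(toy):
    print(hit)
```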

Another highly conserved FAD-binding sequence motif was first identified from studies of rubredoxin reductase (Eggink et al. 1990), and is an 11-amino-acid segment, T(S)xxxxxF(Y)hhGD(E) (Fig. 5 ▶, pink box). This motif is present in all members of GR₁ and in some of GR₂ (Table 1). The hydrophobic residues belong to the seventh β-strand of the FAD-binding domain, found near the C-terminus of the protein (Fig. 3A ▶). The initial Thr residue participates in the formation of the 'Greek key' motif found just before the seventh strand, and the invariant Gly and the terminal Asp residues are part of the loop connecting the β-strand to the α-helix and are hydrogen bonded to the O_P1 or O_P2 atoms of the pyrophosphate group.

There are some partially conserved sequence motifs in the GR family. Vallon (2000) has reported short GG and ATG motifs, present in only a few of the protein members. We found another partially conserved sequence motif, D(x)₆GxxP (Fig. 5 ▶, green box). This motif is situated at the interface between the NAD(P)H- and FAD-binding domains, and one of the x residues located between the Gly and Pro, often an Arg, makes a polar contact with the isoalloxazine ring. This newly derived sequence motif is absent in GR family members lacking an NAD(P)H-binding domain (GR₂ proteins in Table 1).