The Journal of Chemical Physics. 2023 Feb 27;158(8):085104. doi: 10.1063/5.0136010

Emergence of allostery through reorganization of protein residue network architecture

Riya Samanta 1, Neel Sanghvi 1, Dorothy Beckett 2,a),b), Silvina Matysiak 1,a)
PMCID: PMC9974213  PMID: 36859102

Abstract

Despite more than a century of study, consensus on the molecular basis of allostery remains elusive. A comparison of allosteric and non-allosteric members of a protein family can shed light on this important regulatory mechanism, and the bacterial biotin protein ligases, which catalyze post-translational biotin addition, provide an ideal system for such comparison. While the Class I bacterial ligases only function as enzymes, the bifunctional Class II ligases use the same structural architecture for an additional transcription repression function. This additional function depends on allosterically activated homodimerization followed by DNA binding. In this work, we used experimental, computational network, and bioinformatics analyses to uncover distinguishing features that enable allostery in the Class II biotin protein ligases. Experimental studies of the Class II Escherichia coli protein indicate that catalytic site residues are critical for both catalysis and allostery. However, allostery also depends on amino acids that are more broadly distributed throughout the protein structure. Energy-based community network analysis of representative Class I and Class II proteins reveals distinct residue community architectures, interactions among the communities, and responses of the network to allosteric effector binding. Bioinformatics mutual information analyses of multiple sequence alignments indicate distinct networks of coevolving residues in the two protein families. The results support the role of divergent local residue community network structures both inside and outside of the conserved enzyme active site combined with distinct inter-community interactions as keys to the emergence of allostery in the Class II biotin protein ligases.

INTRODUCTION

The discovery of the Bohr effect more than 100 years ago marked the first description of functional allostery, and the term itself was coined ∼60 years ago.1,2 Nonetheless, the molecular basis of allostery is still under investigation. Analysis of protein families from an evolutionary perspective provides a fruitful approach to the molecular-level understanding of this ubiquitous regulatory mechanism.3–5 Bacterial biotin protein ligases, which comprise an allosteric and a non-allosteric class, are ideal for such studies.

Overview of biotin protein ligases

Biotin protein ligases (BPLs) catalyze post-translational biotin addition to biotin-dependent carboxylases. Since the carboxylases themselves are involved in critical metabolic processes, including fatty acid biosynthesis, amino acid catabolism, and glycolysis, BPLs are widespread and essential for viability.7 In both eubacteria and archaebacteria, these enzymes fall into two classes, with Class I enzymes only catalyzing the biotin transfer in which the intermediate, bio-5′-AMP, is first synthesized from biotin and ATP [Refs. 8 and 9, Fig. 1(a)]. In the second step, the intermediate serves as the donor in biotin transfer to the carboxylase biotin carboxyl carrier protein (BCCP) subunits.9 In addition to their enzymatic function, the Class II BPLs repress transcription initiation at biotin related operons, including those that code for biotin synthesis, transport, and/or utilization. They do so by binding to the biotin operator sequence that overlaps the operon promoter to block transcription initiation [Refs. 8, 10, and 11, Fig. 1(a)]. The Class II BPL DNA binding domain that directly interacts with the operator DNA is indispensable for this second function. Nonetheless, allosteric activation of Class II BPL dimerization by the enzyme intermediate, bio-5′-AMP, is an obligatory step in transcription repression complex assembly.12,13 Additionally, a single surface on the Class II ligases, the protein–protein interaction (PPI) region, is used for both heterodimerization with BCCP and the allosterically activated homodimerization [Refs. 14–17, Fig. 1(b)].

FIG. 1.

Functional and structural features of Class I and Class II bacterial biotin ligases. (a) Class I ligases catalyze the two-step biotin transfer to the Biotin Carboxyl Carrier Protein (BCCP) subunit of biotin-dependent carboxylases. Class II ligases have an additional function in transcription-repression, in which the enzyme-intermediate complex, holoBirA, first dimerizes and then binds sequence-specifically to DNA. (b) The 3D structural alignment of the Class I Mtb (4op0) and Class II Ec (2ewn) holoBirA. For clarity, the entire Class I protein structure is shown in gray, and selected Class II protein features are colored. The alignment and models were created in PyMol.6

Sequence and structure of biotin protein ligases

Despite low sequence conservation, even within a single class, the biotin protein ligases (BPL) show high structural similarity. Alignment of the Class I (Mtb) and Class II (Ec) ligase sequences shows 28% identity and 40% similarity. A comparison of the Class I Mtb ligase sequence with that of the Class II Staphylococcus aureus indicates a 25% identity. Even with this modest sequence conservation, the Class I and Class II BPLs share similar structures. Excluding the DNA binding domain, pairwise 3D alignment of the Mtb18 and EcBPL19 structures indicates a backbone RMSD value of 1.8 Å, with the largest deviations localized to the surface loop regions, including Loop 5 (L5), which functions in bio-5′-AMP binding and allostery in the Ec protein [Refs. 15, 20, and 21, Fig. 1(b)].

Thermodynamic, structural, and dynamic features of allostery in the Class II EcBPL are well characterized. First, loops on the effector binding (L1 and L5) and dimerization surfaces (L2 and L4) undergo disorder-to-order transitions concomitant with bio-5′-AMP binding [Refs. 14, 19, and 22, Fig. 1(b)]. Second, perturbation of these disorder-to-order transitions through amino acid substitution compromises the coupling between bio-5′-AMP binding and dimerization.20,21 Third, the disorder-to-order transitions facilitate the formation of an interaction network in holo-EcBPL that incorporates the ligand as well as amino acid side chains of residues in the surface loops, the central domain core, and the C-terminal domain.23 Moreover, experimental results indicate that residues in this network all contribute to allostery. Finally, computational force distribution analysis predicts that the combined surface loop folding transitions and accompanying network formation enable enhanced protein core packing. This packing may be critical for communication between the dimerization and ligand binding surfaces in allostery.23 The emerging picture in the Class II EcBPL is of a distributed allosteric mechanism in which local effector-linked folding processes on the ligand binding and dimerization surfaces communicate through a network of residues in the protein’s central domain core.

Structural and sequence analyses of proteins from the two BPL classes provide little insight into the distinguishing features that enable allostery in the Class II enzymes but not in the Class I enzymes. For example, residues in the ligand binding loop, L1, of all enzymes from the two classes thus far studied undergo disorder-to-order transitions upon ligand binding.14,18,22,24,25 By contrast, variations in folding that do not correlate with allostery are observed in dimerization surface loops as well as the responses of the ligand binding loop L5 to bio-5′-AMP binding. In the Class I non-allosteric Mtb enzyme, L5 folding requires bio-5′-AMP binding, while it does not in the Class I Pyrococcus horikoshii enzyme.18,24 Moreover, the same loop in the allosteric Class II S. aureus protein is ordered in both the apo and holo species.25 Additionally, sequences of the segments that comprise L5 on the ligand binding surface and L2 and L4 on the dimerization surface are not conserved within each BPL class or between them. Therefore, the features that distinguish allosteric from non-allosteric biotin protein ligases have yet to be determined.

This work reports on combined experimental and computational analysis to identify features that enable allostery in Class II BPLs. Although experimental studies of Class I BPLs are limited, comprehensive analysis of the effect of amino acid substitution on the Class II Escherichia coli BPL binding, catalytic, and allosteric functions reveals a range of sensitivities of each function to amino acid substitution. Energy-based network analysis performed on representative members of the Class I and Class II BPL proteins reveals distinct residue networks, connectivity among the networks, and responses to effector binding. Mutual information analysis performed on multiple sequence alignments of the two BPL protein classes indicates distinct patterns of pairwise residue co-evolution. Overall, the results indicate that allosteric function can emerge in a protein family via sequence changes that alter the distribution of residues in local energy-based communities, the responses of the community-network structure to effector binding, and the interaction of these communities with one another.

RESULTS

Experimental analysis of class II EcBPL function

The goal of this work is to determine features that enable the additional allosteric function in Class II BPLs. This requires knowledge of the sequence requirements for catalysis in both protein classes and allostery and its relationship to enzymatic function in the Class II proteins. Although functional measurements of Class I BPLs are limited, extensive experimental studies of the Class II EcBPL have been performed. However, the measurements have focused largely on the effects of single amino acid substitution on bio-5′-AMP binding and allosterically activated dimerization, and measurements of catalytic function have been reported for a limited number of EcBPL variants. In this section, we summarize the previously published impacts of single amino acid substitution on allosteric effector binding and allosterically activated dimerization. New experimental results on effects on catalytic function are also reported. The combined experimental results provide a benchmark for evaluating the computational studies reported in this work.

Amino acid substitutions have distinct effects on bio-5′-AMP binding (allosteric input) and homodimerization (allosteric output) in Class II EcBPL. Residues distributed throughout the central domain, including those in the catalytic/allosteric site (CS/AS), function in allostery [Fig. 2(a), Table S1]. Moreover, depending on the position and type, amino acid substitution can tune allosteric coupling up or down. By contrast, amino acid substitutions that alter effector bio-5′-AMP binding to EcBPL are largely localized to the CS/AS, and only decreases in binding affinity were detected [Fig. 2(b), Table S1].

FIG. 2.

(a)–(d) Maps of residue positions at which amino acid substitution significantly affects EcBirA function in (a) homodimerization, (b) bio-5′-AMP binding, (c) catalysis of bio-5′-AMP synthesis, and (d) catalysis of biotin transfer from bio-5′-AMP to BCCP. Each sphere indicates the alpha carbon of an amino acid at which substitution results in at least a five-fold change in the magnitude (red: decrease, blue: increase in activity) of the parameter relevant to the process, KDIM, KD, kapp, and kcat/Km (see text for details). (e) No correlation is observed between the effects of amino acid substitution on dimerization energetics (allosteric output) and on each of the other three measured functions.

Large perturbations to catalytic function are dominated by residue substitutions in the EcBPL catalytic/allosteric site

Sequence determinants of EcBPL catalytic activity were evaluated by measuring the effects of substitutions on rates of enzyme-catalyzed bio-5′-AMP synthesis and biotin transfer to BCCP [Fig. 1(a)]. The bio-5′-AMP synthesis rates were measured by monitoring the time-dependent decrease in the protein intrinsic fluorescence signal upon rapid mixing of the enzyme–biotin complex with ATP,26 and apparent rates were obtained by analyzing the fluorescence vs time data using a single exponential model (Table S1). The magnitudes of the effects range from an absence of measurable activity to an approximately five-fold decrease in the apparent rate (Table S1). The results indicate >5-fold decreases in the bio-5′-AMP synthesis rate at 11 out of the 31 residue positions tested. Of these 11 residues, 9 are in the segments that form loops L1, L3, and L5 in the CS/AS [Fig. 2(c)]. Alanine substitution of only one dimerization surface residue, G142, has a significant effect on bio-5′-AMP synthesis. However, previous studies indicate that this substitution disrupts bio-5′-AMP-induced folding of loops on both the dimerization (Loop 4) and ligand binding (Loop 5) surfaces.27

EcBPL-catalyzed biotin transfer from bio-5′-AMP to the EcAcetyl CoA Carboxylase biotin acceptor domain fragment (BCCP) is sensitive to the substitution of only a limited number of residues, the majority located in the catalytic site. Biotin transfer was measured by monitoring the time-dependent increase in the intrinsic EcBPL fluorescence upon mixing the enzyme intermediate complex with excess BCCP, and kcat/Km values for transfer were obtained from linear regression of the measured kapp vs [BCCP] data.28 Significant effects (>5-fold) on the kcat/Km value were limited to substitutions of 11 residues located in the CS/AS and on the dimerization surface [Fig. 2(d), Table S1]. In the CS/AS, substitution at one position in L1, four positions in a highly conserved KWPND residue cluster in L3 [Fig. 3(b)], and one at the conserved active site residue, K183, yielded significant decreases in the kcat/Km value for biotin transfer. On the dimerization surface, in addition to G142, alanine substitution at positions G196 and K194 results in >5-fold decreases in kcat/Km values.29 However, the substitutions at these two latter positions also significantly enhance homodimerization energetics. Since the same surface is used for both interactions, the apparent impact on biotin transfer may reflect the decreased ability of the heterodimerization with apoBCCP to compete with the tighter homodimerization interaction.
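The regression step described above can be illustrated with a minimal sketch. It assumes that the slope of a straight-line fit of kapp against [BCCP] is taken as the kcat/Km estimate; the function name, the concentrations, and the rates are hypothetical and are not values from this work.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: [BCCP] in M and apparent rates kapp in 1/s
bccp = [10e-6, 20e-6, 40e-6, 80e-6]
kapp = [0.10, 0.20, 0.40, 0.80]

# Under the stated assumption, the slope (1/(M s)) estimates kcat/Km
slope, intercept = linear_fit(bccp, kapp)
```

A variant would then be flagged when its fitted slope differs from the wild-type slope by more than the five-fold threshold used in the text.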

FIG. 3.

Energy-based network analysis indicates critical roles for catalytic residues of the Class I and Class II BPLs. (a) Model of the Class I MtbBPL with the following segments colored: Loop 1 (L1) is brown, Loop 2 (L2) is green, Loop 3 (L3) is purple, Loop 4 (L4) is blue, Loop 5 (L5) is red, Helix 1 (H1) is teal, Helix 2 (H2) is pink, and Helix 3 (H3) is light blue. (b) Betweenness centrality values are normalized to the highest value for all residues in the apo and holo-species of Class I Mtb and Class II EcBPL proteins. The colored regions in the plots correspond to the segments highlighted in the structural model. In each plot, the asterisk marks residue D261 in MtbBPL and E313 in EcBPL. (c) Alignment of the Class I Mtb and Class II EcBPL sequences with regions corresponding to those mapped on the structural model in (a) highlighted in color.

The relationship between the utilization of Class II EcBPL sequence information for allosterically activated dimerization and each of the remaining three functions was evaluated. Correlation plots were constructed for allosterically activated dimerization and each of the other three functions, with values placed on an energy scale by taking the natural logarithm of the measured rate or equilibrium constant. No correlation was observed between the magnitudes of the functional effects of amino acid substitution for any of the three pairs of functions [Fig. 2(e)]. Therefore, although overlap exists for a subset of residues between those important for allosteric output and each of the other three functions, the magnitudes of effects for any pair of functions are unrelated. This result is consistent with the lack of correlation between effects on dimerization and bio-5′-AMP binding previously reported for a subset of the variants.23
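The correlation analysis can be sketched as follows. This is an illustration only: it assumes that each effect is expressed as the natural logarithm of the variant-to-wild-type ratio and that a Pearson coefficient is an acceptable summary of correlation, which the text does not specify. All fold changes below are hypothetical.

```python
import math


def log_effect(value_variant, value_wild_type):
    """Natural log of the variant/wild-type ratio (proportional to an energy difference)."""
    return math.log(value_variant / value_wild_type)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical fold changes relative to wild type (wild type = 1.0)
dimerization = [log_effect(v, 1.0) for v in (0.1, 5.0, 1.0, 20.0)]
binding = [log_effect(v, 1.0) for v in (0.5, 0.5, 0.01, 1.0)]

r = pearson_r(dimerization, binding)
```

A coefficient near zero for each pair of functions would correspond to the absence of correlation reported in Fig. 2(e).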

Overall, the experimental results obtained for the Class II EcBPL indicate that amino acid residues that function in allosterically activated dimerization span the protein central domain from the CS/AS to the PPI and include the C-terminal domain residues. By contrast, a majority of the residues critical for effector/catalytic intermediate binding and the two catalytic steps in biotin transfer are primarily localized to the CS/AS. These results provide a reference for the computational analyses described below.

Energy-based computational network analysis reveals critical roles for the conserved active site residues and distinct responses to ligand binding in non-conserved regions

This work aims to identify distinguishing features that enable bio-5′-AMP activated dimerization in Class II BPLs. The experimental results indicate that catalytic site residues form a subset of residues that function in allosterically activated EcBPL dimerization. Moreover, previous computational analysis of the Ec protein holo-species revealed an energy-based residue network in the CS/AS that couples the dimerization surface and effector binding sites. The results of experimental measurements confirmed the importance of the network residues for allostery.23,30 In this work, the network analysis was extended to both the apo and holo species of representative Class I Mtb and Class II EcBPLs to define the network structures in the two proteins and their responses to the bio-5′-AMP binding. The relative significance of each residue for the energy-based network in each of the four protein species was first determined using computational network analysis of all-atom MD simulation trajectories (see the section titled Materials and Methods for details on the simulations). In the analysis, each residue is a node, and the edges connecting them are scaled by a function of energy, with the shortest paths corresponding to those with the lowest (and most favorable) energies. All residues in each species, including the ligand in the holo species, were ranked according to their normalized Betweenness Centrality (CB) scores, and residues with values in the top 5% were designated as critical network nodes (Fig. 3). For simplicity, the protein regions of interest are highlighted in color on only the Class I MtbBPL structural model.
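The ranking step can be illustrated with a minimal pure-Python sketch of betweenness centrality (Brandes' algorithm) on a weighted residue graph. It assumes the interaction energies have already been converted to positive edge lengths; that conversion is not reproduced here. The toy graph, node labels, and function names are hypothetical, and this is not the pipeline used in this work.

```python
import heapq
import math


def betweenness_centrality(graph):
    """Brandes' algorithm for an undirected graph with positive edge lengths.

    graph maps each node to a dict of {neighbor: length}.
    """
    cb = {v: 0.0 for v in graph}
    for s in graph:
        dist = {s: 0.0}
        sigma = {v: 0.0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1.0
        preds = {v: [] for v in graph}
        done, order = set(), []
        heap = [(0.0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue  # stale heap entry
            done.add(v)
            order.append(v)
            for w, length in graph[v].items():
                nd = d + length
                if w not in dist or nd < dist[w]:
                    dist[w], sigma[w], preds[w] = nd, sigma[v], [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == dist[w] and w not in done:
                    sigma[w] += sigma[v]  # another equally short path
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2.0 for v, c in cb.items()}  # each pair was counted twice


def normalize_to_max(scores):
    """Scale scores so that the highest value equals 1."""
    top = max(scores.values())
    return {v: (c / top if top > 0 else 0.0) for v, c in scores.items()}


def top_fraction(scores, fraction=0.05):
    """Nodes ranked in the top `fraction` by score (at least one node)."""
    k = max(1, math.ceil(fraction * len(scores)))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy five-residue chain with unit edge lengths (hypothetical)
chain = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0},
         3: {2: 1.0, 4: 1.0}, 4: {3: 1.0}}
cb_norm = normalize_to_max(betweenness_centrality(chain))
critical = top_fraction(cb_norm, 0.05)
```

In this toy chain the central node lies on the most shortest paths and is the only node returned by the top 5% cutoff. Exact equality is used to detect ties between path lengths, which is adequate for the toy data but would likely need a tolerance for real energy-derived lengths.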

Critical nodes in the catalytic site/allosteric site (CS/AS) of Mtb and EcBPLs show distinct responses (Fig. 3) to bio-5′-AMP binding. The CS/AS in both Class I and Class II ligases contains several critical residues in the conserved loop L1 that folds over the biotin moiety of the effector, in L3, which contains the KWPND sequence that is conserved in all BPL sequences, and in the non-conserved adenylate binding loop, L5. Equivalent L3 residues form critical nodes in the apo and holo species of both proteins. However, residues in L1 and L5 show distinct responses in the two proteins. For example, in L1, residue R118 with a high CB value in Class II apoEcBPL is exchanged for R121 in the holo-species, while the residues R72 and W74 (equivalent to R121 and W123 in EcBPL), with high CB values in Class I apoMtbBPL, have very low values in the holo-species. In L5, the CB values for two residues in EcBPL are significantly larger than those in the Mtb protein. In the C-terminal domain, equivalent residues D261 and E313 in Mtb and EcBPL, respectively, which form electrostatic interactions with L1 residues in the holo species, are critical network residues in both the apo and holo species of each protein.

Residues in the central domain three-helix cluster, H1, H2, and H3, show distinct responses to effector binding in the Class I and II BPLs. Previously, studies of holoEcBPL indicated that the packing of these three helices is altered in variants with defects in allostery.23 In H1, which spans the CS/AS to the PPI region, the Class I Mtb protein contains critical nodes in both apo and holo species. By contrast, the Ec protein residues in this helix are characterized by modest CB values that undergo small decreases upon effector binding. Although effector binding has modest effects on the CB distributions of H2 and H3 in both BPLs, in the Mtb protein, these helices contain more critical residues than they do in EcBPL [Fig. 3(c)]. Therefore, while conserved catalytic site residues form critical nodes in the network structures of both Class I and II BPLs, the distribution of critical nodes outside of the CS/AS and their responses to effector binding are distinct for the two proteins.

Network community structures and linkages between the communities are distinct in Class I and II BPLs

The energy-based network analysis indicates that the CS/AS region contributes critical nodes to the Mtb and EcBPL residue networks. In Class II BPL allostery, the signal for bio-5′-AMP binding to the CS/AS is transmitted to the dimerization surface, or PPI, on the opposite face of the protein structure. Consequently, the determination of the molecular origins of distinct functional responses to effector binding in the two proteins requires analysis of how the CS/AS of each interacts with the remainder of the protein. Energy-based Community Network Analysis (CNA) provides a tool to identify residue clusters or communities that arise from variations in local interactions in a protein residue network. Additionally, the analysis can reveal the locations and strengths of linkages between the communities. In residue communities identified using CNA, the intra-community connections among residues are, by definition, more numerous and stronger than the inter-community links. However, since the inter-community links act as bottlenecks for energy transmission between protein regions, identification of their strengths and locations provides information about the communication between protein regions.

The Class I and Class II BPLs have distinct community networks that undergo consolidation in response to bio-5′-AMP binding (Fig. 4). Due to the absence of a DNA binding domain, the Class I polypeptide chain has 56 fewer residues than the Class II. Nevertheless, the Mtb and Ec BPL apo species are organized into nine and eight communities, respectively, that decrease to seven and six in the holo-species. Consistent with their larger number, the community sizes are smaller for the Class I BPL. The average numbers of residues in the apo and holo Class I BPL communities are 29 and 37, respectively, while in the Class II protein, they are 39 and 52 (Table S2). These smaller average numbers are biased by the presence of communities comprised of fewer than ten residues in the Class I protein. The residue community shift associated with bio-5′-AMP binding was quantified by calculating the Normalized Hamming distance (NHD), which provides a measure of the redistribution of community partitions. Values of 0.43 and 0.41 for the Class I Mtb and Class II Ec BPL apo to holo transitions, respectively, are consistent with ∼40% of the residues undergoing community redistribution upon bio-5′-AMP binding.
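One simple reading of the Normalized Hamming distance is the fraction of residues whose community assignment differs between two partitions. The sketch below assumes that community labels have already been matched between the apo and holo partitions, a step that is not shown and that the text does not describe. The labels and function name are hypothetical.

```python
def normalized_hamming_distance(labels_a, labels_b):
    """Fraction of positions whose community label differs between two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same residues")
    mismatches = sum(a != b for a, b in zip(labels_a, labels_b))
    return mismatches / len(labels_a)


# Hypothetical per-residue community labels for five residues
apo = ["red", "red", "brown", "brown", "blue"]
holo = ["red", "brown", "brown", "brown", "green"]

nhd = normalized_hamming_distance(apo, holo)  # 2 of 5 residues change community
```

Under this reading, a value near 0.4 means that roughly 40% of residues are assigned to a different community after effector binding.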

FIG. 4.

Distinct residue communities and connections in the Class I and Class II BPLs. (a) and (b) Residue communities for the Class I: Mtb and Class II: EcBPL apo and holo species are shown on the protein structure 3-dimensional models prepared in PyMol with input files 4op0 (Class I) and 2ewn (Class II). (c) Coarse grained representations of the community structures in which circles represent residue communities with sizes proportional to the number of residues in each community and the connecting line thicknesses scale with the strengths of the inter-community links. (d) Distribution of the connectivity degree, or number of connections, for the Class I and Class II BPLs, apo-(pattern) and holo-(clear) species. Definitions of protein regions: DBD-DNA binding domain, CS/AS-Catalytic/Allosteric site, PPI-Protein–protein interaction surface, CTD-C-terminal domain.

To facilitate community structure description and responses to effector binding, the two proteins were divided into four “sectors” or regions. These include the N-terminal (DBD in Class II EcBPL, Class I: red, Class II: red/gray in apo, gray in holo), C-terminal domain (CTD, teal in all species) regions, the catalytic site/allosteric site (CS/AS, Class I: red and brown in both species, Class II: magenta and brown in both species) and the protein–protein interaction (PPI) region (Class I and Class II apo: blue, green, and orange; Class I and Class II holo: blue and green). Although the DNA binding domain distinguishes the Class II ligases from the Class I ligases, in the apo-species, the red community is made up of similar N-terminal segments in both proteins. However, while the red community is nearly preserved in the Class I holo species, it is absorbed into the DNA binding domain (gray) and the CS/AS (brown) in Class II holoEcBPL. In both proteins, the teal community is primarily made up of C-terminal domain residues. Yet, in the Class I MtbBPL, it decreases in size with effector binding, while in the Class II Ec protein, it expands to infiltrate the central domain (Fig. 4, Table S2). The network structures that define the CS/AS differ markedly in both species of the two proteins. First, in the Class I protein, the region is divided into two major communities (brown and red) in both apo and holo species, with most of the Loop 5, L5, residues located in the red community. By contrast, in EcBPL, most CS/AS residues belong to a single brown community that, upon effector binding, expands into the central domain core beta-sheet region. In apo-EcBPL, this sheet and one face of helix 1 (H1) form part of a large community (purple) that, in the holo-species, forms the much smaller DBD/central domain interface. Finally, residues of helices H1, H2, and H3 (Fig. S3) that converge on the PPI region in Class I MtbBPL are distributed in multiple communities in the apo species that coalesce into a single community (blue) in the effector-bound species. Likewise, in Class II EcBPL, residues in these same helices are distributed in multiple distinct communities in the apo species (pink, blue, and orange). However, although some of these residues (orange and pink) also redistribute to the blue community in the holo species, others shift to the green and teal communities (Fig. S2). Thus, despite their similar three-dimensional structures, the energy-based residue network structures are distinct for both the apo and holo species of the Class I and Class II BPLs.

Inter-community links are distinct in the apo and holo species of the class I and class II BPLs

The inter-community links and the significance of each community for the entire network were determined from coarse-grained (CG) representations of the residue networks in each protein. In these CG networks, each node represents a residue community, and each edge between any two communities is weighted by the linkage strength, provided by the Cumulative Edge Betweenness Centrality (CEBC) score [Fig. 4(c)]. The Class I apo-MtbBPL species is characterized by dispersed connectivity with many weak linkages between communities [Fig. 4(c)]. Although some of these weak links are preserved in the holo species, overall, effector binding shifts the protein toward residue communities with fewer (lower degree) but stronger linkages [Figs. 4(c) and 4(d)]. In Class II EcBPL, communities with fewer linkages are more probable in the apo relative to the holo species, consistent with increased connectivity upon ligand-binding (Fig. 4). Overall, the inter-community linkages for the Ec apo and holo proteins trend toward larger weights than those in the Mtb protein [Fig. 4(c)].

The contribution of each community to the overall network structure was calculated using Current Flow-based Betweenness Centrality (Cflow) analysis, in which the network is treated as a resistor and “nodes” are ranked according to their “current-flow centrality” or Cflow score. This score provides a measure of the amount of information that passes through each node via inter-community links (Table S2). In Class I apoMtb, the blue community, comprised of helices near the PPI region, with a Cflow score of 0.49, is the most critical. In the Class II apoEc protein, it is the purple community, which incorporates most of the central domain beta sheet, that has the highest Cflow score of 0.61 (Table S2). In the holo-species of both proteins, the brown community, defining the CS/AS, has the highest Cflow score, with values of 0.69 and 0.57 for Class I Mtb and Class II EcBPL, respectively. To visualize the information flow in Class I and Class II BPL apo and holo species CG networks, the top 20 inter-community links were mapped onto the protein 3D models (Fig. 5, Table S4). Overall, a comparison of the apo and holo-species of the two proteins reveals that inter-community links in the Class I protein undergo less reorganization than they do in the Class II protein. While four inter-residue links are preserved in the former, only two are in the latter (Table S4). Moreover, the number of top intercommunity links to the C-terminal domain is six and three in the apo- and holo MtbBPL, respectively, and only two in both EcBPL species. In the Class I apo species, the top links are distributed in the CS/AS by the three helices that converge on the PPI and between the catalytic and C-terminal domains, while in the holo species, 8 of the top 20 links are formed between the effector molecule and residues in the green (6) and red (2) adjacent communities (Fig. 6, Table S3). In the holo species of both proteins, the CS/AS region, defined by the brown community, participates in the largest number of the top 20 inter-community links. However, in the monofunctional protein, more than half of these involve interactions between protein residues and bio-5′-AMP, while in the bifunctional protein, only four are effector mediated (Table S4). In the Class II holo-EcBPL, the remaining top intercommunity links are formed between residues distributed across the central domain from Loop 5 in the CS/AS to the blue community that defines the PPI surface. Notably, in the Ec protein holo species, no DNA binding domain residues contribute to the top 20 links.

FIG. 5.

Intercommunity links mapped in the Class I and Class II BPL structures: (a) Class I MtbBPL and (b) Class II EcBPL apo and holo species with the top 20 intercommunity links shown as yellow bars. Models were created in PyMol.6 (c) Number of top intercommunity links formed in apo and holo Class I Mtb and Class II EcBPL by residue communities, with 40 residues participating in the 20 links. In each model, the Catalytic site/Allosteric site (CS/AS) is highlighted with an oval.

FIG. 6.

Cumulative mutual information for Class I and Class II BPLs. (a) Residue positions with the top 10% of cumulative mutual information values for Class I and Class II BPLs. Black circles: Cumulative mutual information values normalized to the highest value obtained. The bars indicate conservation, calculated using the Kullback–Leibler method (see the section titled Materials and Methods), with values >2.5 in red (conserved) and those <2.5 in blue (non-conserved). The colored regions on each plot represent the same regions that are highlighted in Fig. 3. (b) and (c) Positions with the highest CMI values mapped on the Class I Mtb and Class II EcBPL three dimensional structures. Red: conserved; blue: non-conserved. The models were constructed using PyMol6 with the bio-5′-AMP structure shown with black sticks.

Residue pair co-evolution in class I and class II BPL sequences

The energy-based CNA reveals distinct residue communities and interactions among these communities in apo and holo species of single Class I and Class II BPL proteins. However, these single proteins may not be representative of the entire protein classes. A mutual information analysis was performed to obtain protein family level information about inter-residue interactions that may be responsible for functional differences in the BPLs. First, multiple sequence alignments, using manually curated input files, were generated for the Class I and Class II proteins. For each BPL class, sequences from all taxonomic groups of both Eubacteria and Archaea were included in the input data. All candidate sequences were inspected for the presence of highly conserved signature ligase sequences, including the glycine and arginine-rich biotin binding loop (L1) and the KWPND sequence (L3) [Fig. 3(c)]. The absence of the N-terminal DNA binding domain distinguished Class I BPL sequences from Class II. Sequences containing long C-terminal domain extensions, such as those for the Neisseria genus proteins, in which this extension codes for a putative pantothenate kinase, were excluded from the analysis. The resulting sequence files, two independent collections for each ligase class, each contained 1000–1200 sequences. Inspection of alignments obtained using Clustal Omega revealed the anticipated ligase signature sequences for both BPL classes as well as the winged helix-turn-helix of the DNA binding domain in the Class II BPLs (data not shown).

A mutual information analysis using the MISTIC2 server was performed to determine the pairwise correlation in amino acid residue identity in the biotin protein ligases. When applied to protein MSAs, the MI score provides a measure of mutual information shared between two positions. The score indicates the extent of correlation between their identities, with larger MI values indicating greater correlation. In principle, this correlation may reflect constraints due to structure, stability, and/or function.31 Nevertheless, the comparison of results obtained for the Class I and Class II BPLs may provide clues about features responsible for their distinct functional responses to the bio-5′-AMP binding.

The MI analysis of Class I and Class II BPLs was first used to identify residues with high degrees of pairwise evolutionary correlation with other residues in each protein. In its implementation of MI analysis, the MISTIC2 server enables the extraction of Cumulative Mutual Information (CMI) scores, which provide measures of the number and/or strength of the pairwise correlations for each amino acid position in an alignment. Since the Class I proteins contain no DNA binding domain, only the MI results obtained for regions corresponding to the central and C-terminal domains of the Class II proteins were included in the comparison. The exclusion of the DNA binding domain residues is justified by the observation that independent MI analysis of the two Class II MSAs yielded only 2 and 3 N-terminal domain residues, respectively, with relatively high CMI scores. Moreover, in the two analyses, the positions of these residues in the sequence were not preserved.
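The relationship between pairwise MI and the per-position cumulative score can be illustrated with a minimal sketch. It computes plain Shannon mutual information between alignment columns and sums it for each position. The function names and the toy alignment are hypothetical, and the sketch omits any corrections, weighting, or score thresholds that the MISTIC2 server may apply, so it should not be read as the server's implementation.

```python
from collections import Counter
from math import log


def mutual_information(col_a, col_b):
    """Shannon mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        mi += p_ab * log(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def cumulative_mi(msa):
    """Sum, for each column, the MI it shares with every other column."""
    columns = list(zip(*msa))  # transpose: one tuple of residues per position
    n_cols = len(columns)
    cmi = [0.0] * n_cols
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            mi = mutual_information(columns[i], columns[j])
            cmi[i] += mi
            cmi[j] += mi
    return cmi


# Hypothetical toy alignment: columns 0 and 1 covary, column 2 is invariant.
toy_msa = ["AKL", "AKL", "GRL", "GRL"]
toy_cmi = cumulative_mi(toy_msa)
```

In this toy case the two covarying columns share all of their information, while the invariant column carries none, which is why conserved positions need not have high raw MI.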

Residues with the top 10% of CMI scores, or 26 residues (based on the average Class I BPL length of ∼260 residues), were identified for the two protein classes (Fig. 6). Since the magnitudes and ranges of the raw CMI scores varied for each MI analysis, the scores are normalized to the top CMI value obtained in each analysis. While residues in the conserved sequences in Loops 1 and 3 of both Class I and Class II BPLs are, in general, characterized by high CMI values, the magnitudes of the relative values are higher in the Class II proteins [Fig. 6(a)]. Moreover, similar results were obtained for residues in non-conserved helix 1 (H1) in the two protein classes. The most significant differences are observed for the C-terminal domain, where Class I BPL residues with the highest relative CMI values are found [Figs. 6(a) and 6(b)]. By contrast, in the Class II BPLs, no residues with the top 10% of CMI values are located in this domain. The two BPL classes also differ in the distribution of high CMI values in conserved vs non-conserved positions. While in the bifunctional proteins, most residues with the highest values are at conserved positions, in the Class I proteins, ∼50% of the residues with CMI values among the top 10% are at non-conserved positions.

Locations of residue links with high pairwise correlation are distinct in the class I and class II BPL families

Maps of pairwise residue correlation networks were obtained for the 26 residues in each protein family with the highest 10% of CMI values. As with the CMI analysis, the DNA binding domain of the Class II proteins was excluded from this analysis. For each of the 26 residues, the five residue partners that share the highest MI values were first identified. The resulting 130 total “connections” with MI values ranging from 7 to 33 were mapped onto 3-dimensional models of Class I (MtbBPL) and Class II (EcBPL) proteins (Fig. 7). While some of the top linkages are formed between residues that belong to the group with the top 10% of CMI values, other linkages are to residues characterized by relatively low CMI values (shown in gray in Fig. 7).
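The selection described above (top 10% of positions by CMI, scores normalized to the highest value, and five highest-MI partners per selected position) can be sketched as follows. The function names and the small example matrix are hypothetical; only the 10% cutoff, the normalization to the top score, and the choice of five partners come from the text.

```python
def top_cmi_positions(cmi, fraction=0.10):
    """Indices of the positions with the highest CMI scores."""
    n_top = max(1, round(len(cmi) * fraction))
    ranked = sorted(range(len(cmi)), key=lambda i: cmi[i], reverse=True)
    return ranked[:n_top]


def normalize_to_max(cmi):
    """Scale scores so that the highest value equals 1."""
    top = max(cmi)
    return [value / top for value in cmi]


def top_partners(mi_matrix, position, n_partners=5):
    """Positions sharing the highest MI with `position`, best first."""
    scores = [(mi_matrix[position][j], j)
              for j in range(len(mi_matrix)) if j != position]
    scores.sort(reverse=True)
    return [j for _, j in scores[:n_partners]]


# For a 260-position alignment, the 10% cutoff selects 26 positions,
# and five partners per position gives 26 * 5 = 130 connections.
n_selected = len(top_cmi_positions(list(range(260))))
n_connections = n_selected * 5

# Hypothetical symmetric MI matrix for four positions.
toy_mi = [[0, 3, 1, 2],
          [3, 0, 5, 4],
          [1, 5, 0, 6],
          [2, 4, 6, 0]]
```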

FIG. 7.

Pairwise mutual information mapped onto the Mtb and EcBPL structures: (a) Class I and (b) Class II BPL. The red and blue spheres are the alpha carbons for residue positions with the top 10% of CMI scores. The yellow lines connect each of the top CMI residues with the five partner residues with which it shares the highest MI, with gray spheres representing partners that are not among those with the top 10% of CMI values. The models were created in PyMol with the Catalytic Site/Allosteric Site (CS/AS) highlighted with an oval.6

Mutual information reveals distinct networks of residue correlations in the two BPL classes. Based on the number of residue positions that belong in the top 10% and the connections between them, the CS/AS plays a central role in the network of coevolving residues in both protein classes [Figs. 7(a) and 7(b)]. However, the two networks differ in the locations of the high MI connections formed with the remainder of the protein. In the Class I proteins, many of the top MI linkages connect residues from the primarily CS/AS region to those in the C-terminal domain. By contrast, all high MI linkages in the Class II proteins are located in the central domain. Furthermore, in the Class II proteins, more high MI connections are formed in the three-helix region, as discussed previously in the context of the energy-based network analysis. Therefore, the MI analysis indicates distinct interactions between the CS/AS and the remainder of the proteins in the Class I and Class II BPL protein families. Analysis performed on the two independently constructed MSAs of each BPL class yielded similar results.

DISCUSSION

Investigation of the evolution of allostery in a protein system provides one approach to achieving a molecular level understanding of this important regulatory mechanism. In this work, experimental measurements and computational tools from network theory (CNA) and bioinformatics (MI) were used to identify features that distinguish the non-allosteric Class I from the allosteric Class II BPLs. Experimental measurements indicate that catalytic site residues are critical for both the enzymatic activity and allosteric effector binding. They also form a subset of broadly distributed residues that function in Class II EcBPL allostery. The results of CNA and MI analyses indicate a central role for catalytic site residues in both protein classes. However, the analyses also indicate distinct overall residue network structures and interactions of the catalytic site with the remainder of the protein in the Class I and Class II BPLs. Therefore, despite their similar three-dimensional structures and catalytic functions, the emergence of allostery in Class II BPLs is correlated with the evolution of a divergent underlying residue network structure throughout the protein.

The results of the energy-based network and mutual information analysis indicate that the catalytic site/allosteric site is critical to the residue networks of Class I and II BPL proteins. The two analyses, which yield Betweenness Centrality, CB, and cumulative mutual information, CMI, scores, provide measures of the relative significance of each amino acid residue for the protein network. The results reveal overlapping regions in the two proteins that are rich in residues with large CB and CMI values (Fig. 8). Calculation of the overlap of the results yielded by the two methods for each protein class using the Sokal–Michener distance metric yields values of 0.23 and 0.28, respectively, for Class I and II proteins, consistent with high similarity.32 Mapping of the positions of these residues on the aligned Class I and II sequences indicates that they are concentrated in the conserved sequences of Loops 1 and 3 as well as in the stretch of sequence connecting Loops 3 and 5 in the CS/AS. Notably, the substitution of L1 and L3 residues has large effects on both catalytic steps in biotin attachment to BCCP by the Class II EcBPL (Table S1). Although no measurements of the impact of analogous substitutions on Class I MtbBPL enzymatic activity have been performed, the high sequence conservation in these two regions suggests that these residues are also important for catalysis in that protein and, likely, in all other members of both BPL classes. The combined results indicate that two distinct network analysis methods, one physics-based and the other evolutionary-based, identify functionally significant protein regions in the Class I and II BPLs.

FIG. 8.

Results of computational and experimental measurements are illustrated on an alignment of representative Class I (Mtb) and Class II (Ec) BPL sequences. The colored boxed regions represent Loop 1 (L1), Helix 1 (H1), Loop 3 (L3), Loop 5 (L5), Helix 2 (H2), and Helix 3 (H3). The residues with the top 10% of CB and MI values are indicated by ^ and # for CB apo and holo, and * for MI. The single letters for amino acid shown in color represent residues at which substitution results in a >five-fold decrease (red) or increase (blue) in allosterically activated dimerization. The dashed lines connecting W74 (black) in the Class I sequence and W123 (gold) in the Class II sequence with five other residues represent top mutual information connections.

The two computational methods reveal residue networks that provide insight into the molecular origins of the distinct responses of the two BPL protein classes to allosteric effector binding. The CS/AS regions of both protein classes share high sequence similarity and, as indicated above, contain a large number of highly ranked network nodes. Yet, allosteric signal transmission from the CS/AS to the PPI occurs only in the Class II proteins. Consistent with the spatial juxtaposition of the CS/AS and PPI and their functional roles, experimental measurements of the Class II EcBPL variants indicate that residues distributed in the central domain, including the CS/AS, function in allostery [Figs. 2(a) and 8]. A comparison of the top five MI network connections formed by analogous Class I and Class II residues W74 and W123, respectively, provides a simple example that illustrates the distinct network responses to effector binding (Fig. 8). Each of these tryptophan residues is located in L3 of the CS/AS. While three of the five top network connections formed by the residue are conserved in the two proteins and located in the active site, the remaining two connections are distinct. Moreover, in the Class I protein, these two additional connections, like the previous three, are formed with residues within the CS/AS region. In the Class II protein, the two partner residues are both distant in sequence and in space. Although for these single tryptophan residues, these distinct networks seem minor, extension of this simple example to the significantly larger number of network connections formed in the entire protein leads to globally distinct patterns of connectivity in the two protein families.

In Class I and Class II BPLs, distinct network connections are formed between the CS/AS and the remainder of each protein. First, in the holo species of both proteins, the CS/AS region dominates the community structure. However, in the Mtb protein, more than half of the 15 links formed by this community are between the bio-5′-AMP ligand and residues in the nearby red and green communities. By contrast, in the Class II EcBPL, only four of the top 14 intercommunity links formed by the CS/AS region engage the effector ligand. Consistent with experimental results, the remaining 10 links connect this region to communities that span the central domain, including Loop 5 (the adenylate binding loop), the C-terminal domain, and the three-helix region. The MI analysis reinforces the dominance, in Class I BPLs, of interactions within the active site and between this site and the C-terminal domain. By contrast, in the Class II proteins, and consistent with experimental results, MI linkages span the central domain from the CS/AS to the PPI surface. Notably, the energy-based network analysis of the EcBirA holo-species revealed the absence of any DNA binding residues among the top 20 intercommunity links, and in the MI analysis, no DNA binding domain residues were among those with the highest cumulative mutual information values. These computational results are consistent with the experimentally observed absence of any measurable effect of amino acid substitutions that alter allosterically activated dimerization on the biotin operator binding affinities of variant holoBirA dimers.33

While energy-based inter-community network linkages obtained from CNA occur over short distances, residue pairs characterized by high MI values are separated by a combination of short and long distances. The shorter distance interactions identified in the energy-based network analysis reflect the constraints used in defining edges, which are restricted to nodes (residues) that lie within 1.7 times the sum of the Van der Waals radii of the relevant atoms for 20% of the trajectory segment analyzed. Maps of high MI connections on the protein structures reveal that in both protein families, many are formed between residue pairs separated by short distances. For example, in the Class I proteins, pairs of residues in close proximity to one another within the active site and the C-terminal domain have high MI values. However, distances are large for some high MI residue pairs. For example, residues G99 in MtbBPL and G148 in EcBPL, located at the N-terminus of helix 1 (H1) in both proteins, form MI connections over a range of distances. In the Class I protein, the distances between G99 and its five highest MI partners range from 9.5 to 17 Å, while in the Class II protein, they range from 5.8 to 31 Å. Previous studies revealed that although MI appears to perform well in identifying residues close in space in the folded structure, its reliability in predicting long distance interactions has been questioned.34–36 Nevertheless, functional interactions between distant residue pairs in EcBPL have been experimentally demonstrated.27,37 Moreover, these long-range interactions appear to be mediated by networks of residue interactions, including those short range interactions in the active site.23

The results of the computational network analysis performed on the EcBPL both confirm experimental results and suggest additional regions of the protein for future investigations of allostery. First, several of the residues with high betweenness centrality (CB) values (Fig. 3) or those that form the top 20 intercommunity linkages are functionally important. For example, residues R121, F124, K172, Y178, K183, W223, and E313 are not only characterized by high CB values and participate in the top intercommunity links in the apo and/or holo-species (Fig. 3, Table S4) but also contribute significantly to at least one EcBirA function (Table S1). Moreover, the substitution of five of these eight residue positions impacts allosterically activated dimerization (Fig. 8, Table S1). The results also identify residues, all of which are involved in the top 20 intercommunity links (Table S4), for future study. These include W265, N270, and R274, at the interface between the central and C-terminal domains; E110, Y132, and R235 that join Loop 1 to Helix 2; and L95 and T192, which link the helix comprised of residues S89-D96 to the central domain beta sheet adjacent to the dimerization surface.

The results described in this work suggest the origins of functional divergence in the Class I and Class II BPLs. First, for both protein families, the catalytic site, which also serves as the allosteric site in the Class II proteins, is the energetic and evolutionary protein center, thereby ensuring the retention of the biologically essential function of biotin transfer. Both the CNA and MI analyses indicate distinct interactions between the catalytic site and the remainder of the proteins in the two protein classes. Most notably, in the Class I proteins, a network of interactions between the two halves of CS/AS and the C-terminal domain dominates, while in the Class II proteins, the CNA and MI reveal a network of interactions across the central domain. Experimental studies of the impact of amino acid residue substitution on bio-5′-AMP promoted dimerization are consistent with these computational results in indicating contributions of residues in the CS/AS, central domain core, PPI surface, and C-terminal domain.20,21,23,29,30 The dependence of BirA allostery on this extensive residue network also suggests an intimate coupling between folding stability and allostery in the protein, a prediction that has been confirmed in the results of preliminary equilibrium measurements (Zhuang and Beckett, unpublished data). Furthermore, the computational results predict that, provided that the residues required for BPL catalytic function are retained, a variety of sequences in the remainder of the protein can enable allostery. The similar thermodynamics of allosteric function measured for the E. coli, Bacillus subtilis, and S. aureus Class II proteins, despite limited sequence conservation beyond the active site,38 support this prediction, as does the ease with which BirA allosteric response can be tuned by amino acid substitution.23,30

CONCLUSION

The importance of residue networks for allostery39 and specifically for the evolution of new allosteric function4 has previously been demonstrated, and the results presented in this work serve to further underscore their significance. The network paradigm also emphasizes the distributed nature of allostery, in which local residue clusters interact to transmit an input signal at one site to allosteric output at a distant site. Finally, the results imply that allostery can emerge in any protein through accumulated amino acid changes that restructure the underlying residue network architecture to enable a new functional response to a signal.

MATERIALS AND METHODS

Kinetic measurements

All chemicals used were at least reagent grade. Measurements were carried out in standard buffer (10 mM Tris-HCl, pH 7.50 ± 0.02 at 20 °C, 200 mM KCl, 2.5 mM MgCl2). Biotin solutions were prepared in standard buffer at a concentration of ∼800 µM and stored as ∼1 ml aliquots at −80 °C. The bio-5′-AMP, synthesized in-house using a modification of the method described by Lane,9,40 was dissolved in water, and aliquots were stored at −80 °C. All BirA variants and BCCP were prepared and stored as previously described.20,28

The kinetics of EcBPL-catalyzed, wild type and variant, bio-5′-AMP synthesis from biotin and ATP and biotin transfer from bio-5′-AMP to the acceptor protein, BCCP, were monitored by fluorescence,26,28 which takes advantage of the sensitivity of the intrinsic EcBPL fluorescence signal to ligation state.26 In each measurement, the enzyme was mixed with substrate in a one-to-one vol/vol ratio, and the resulting time-dependent change in fluorescence was monitored. While faster processes were measured using a KinTek SF2000 stopped-flow instrument, slower rates were measured after hand-mixing the reaction components using an ISS PC-1 spectrofluorimeter. The resulting kinetic traces were analyzed using the appropriate model, single or double exponential, in Prism 5 (GraphPad). Apparent rates of bio-5′-AMP synthesis were measured at a final enzyme concentration of 1 µM and a single set of final substrate biotin and ATP concentrations of 10 and 500 μM, respectively. For measurements of biotin transfer, the enzyme was first combined with bio-5′-AMP at a molar ratio of enzyme to intermediate of 1:0.8 and then mixed with the biotin acceptor protein, apoBCCP. A series of measurements were performed in which the BirA bio-5′-AMP was maintained at a constant final concentration of 0.4 µM and the acceptor protein concentration varied from 10 to 60 µM. The apparent transfer rate was determined at each apoBCCP concentration, and linear regression of the apparent rate vs acceptor protein concentration data yielded the kCAT/KM for biotin transfer catalyzed by each EcBPL variant.
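The final step of the biotin transfer analysis, in which the slope of apparent rate vs apoBCCP concentration gives kCAT/KM, can be sketched with an ordinary least-squares fit. The function name and the rate values below are hypothetical and serve only to illustrate the arithmetic; the concentration range matches the 10 to 60 µM stated above.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y vs x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# apoBCCP concentrations (µM) and hypothetical apparent transfer rates (1/s).
bccp_uM = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
rates_per_s = [0.12, 0.22, 0.32, 0.42, 0.52, 0.62]

slope_per_uM_s, intercept = linear_fit(bccp_uM, rates_per_s)

# The slope is kCAT/KM in µM^-1 s^-1; multiply by 1e6 to express it in M^-1 s^-1.
kcat_over_km_per_M_s = slope_per_uM_s * 1e6
```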

MD simulations

Chain A of the Class I Mtb (PDB id: 4op0) and Class II Ec (PDB id: 2ewn) proteins served as the starting configurations for standard all-atom MD simulations of the Biotin Protein Ligases (BPLs) in complex with the co-repressor biotinoyl-5′-AMP (bio-5′-AMP) for the Mtb protein and the analog biotinol-5′-AMP (btnOH-5′-AMP) for the Ec protein.18,19 MODELLER 9.2241,42 was used to create residues 118–123, which are absent in the MtbBPL model. For each simulation performed using the OPLSAA force field with the GROMACS 4.6.5 simulator,43,44 the protein was placed in a rhombic dodecahedral box with a 1 nm boundary surrounding it. The box was solvated using SPC/E water molecules, and counterions were added to render the system neutral. Energy minimization was carried out, followed by a production run of 530 and 1000 ns for Mtb and EcBPL, respectively. RMSD analysis of the resulting MD trajectories indicated that the Mtb and EcBPL simulations reached equilibrium after 100 and 500 ns, respectively. The final 430 ns (Mtb) and 500 ns (Ec) were used for trajectory analyses, with additional details provided in the supplementary material.

HREMD/REST2 simulations of Mtb and Ec apo BPLs

The structures of the apo (unliganded) BPLs contain flexible segments that, due to the absence of electron density, are not resolved in the experimental crystal structure. Consequently, the loop heterogeneity and dynamics were captured using Hamiltonian Replica Exchange molecular dynamics (HREMD) simulations.45,46 The starting structure models for the Mtb and Ec apoBPL loops were obtained by removing the ligand from the holoBPL models obtained from the pdb files 4op0 and 2ewn. Next, standard MD simulations were performed using the same protocol as that used for the holo-structures. Finally, the last frame from the equilibrated portion of the simulation was selected as the starting configuration for the replica used in the HREMD simulations.

In HREMD simulations, the charge, Lennard-Jones parameter ε, and proper dihedral potentials are altered by a scaling factor λ in “heated” regions of a protein for which dynamics are sampled. This is done so that the effective temperature in the “heated” region is T/λ, while the interactions within the remainder of the protein occur at T. For Ec apoBPL, HREMD simulations were performed using 15 replicas at T = 300 K. The “heated” residues selected included 116–124, 140–146, 193–199, and 211–222, and the λ-values for the 15 replicas of 1, 0.95, 0.90, 0.86, 0.82, 0.78, 0.74, 0.70, 0.67, 0.64, 0.61, 0.58, 0.55, 0.52, and 0.5 were chosen so that the exchange ratio between neighboring replicas was between ∼0.2 and 0.4. The Hamiltonian is unaltered in the λ = 1 replica, while the Hamiltonian in the λ = 0.5 replica is maximally scaled to allow crossing of potential energy barriers. The starting conformations for the replicas were taken from the λ = 0.5 replica trajectory from a short 5 ns 18-replica HREMD simulation. For Mtb apoBPL, the “heated” regions selected included residues 64–80 and 159–181. For both protein systems, the simulations were carried out for 100 ns in the OPLSAA force field using GROMACS 2018.3 simulator patched with PLUMED47 in an NPT ensemble. The exchange between neighboring replicas according to the Metropolis criterion was set after every 500 steps (Ec apoBPL) and every 1000 steps (Mtb apoBPL), with an integration step of 2 fs. The number of replicas and exchange steps between replicas in each protein system were chosen based on the RMSD and replica diffusion maps (see supplementary material). Validation of the apo ensembles is provided in the supplementary material.
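The relation between the scaling factor and the effective temperature of the “heated” region can be made concrete with a short calculation. The sketch below uses the λ ladder and T = 300 K stated for Ec apoBPL; the function name is hypothetical.

```python
def effective_temperatures(base_temperature, lambdas):
    """Effective temperature T / lambda of the heated region in each replica."""
    return [base_temperature / lam for lam in lambdas]


# Lambda ladder reported for the 15 Ec apoBPL replicas.
ec_lambdas = [1, 0.95, 0.90, 0.86, 0.82, 0.78, 0.74, 0.70,
              0.67, 0.64, 0.61, 0.58, 0.55, 0.52, 0.5]

ec_t_eff = effective_temperatures(300.0, ec_lambdas)

for lam, t_eff in zip(ec_lambdas, ec_t_eff):
    print(f"lambda = {lam:<4}  T_eff = {t_eff:6.1f} K")
```

The ladder therefore spans effective temperatures from 300 K in the unscaled replica to 600 K in the most strongly scaled replica.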

Root mean square fluctuation (RMSF) and helicity analysis

The root mean square fluctuation (RMSF) of the flexible regions in the apoBPL species HREMD simulations was calculated after the superimposition of the protein with the starting configuration from the standard MD simulations, the latter being the reference structure, using MDAnalysis in Python 2.7. The deviation of the amino terminus of helix 1 from the ideal alpha helix/3-10 helix was calculated to assess the quality of the apo ensemble for the Ec apoBPL.

Network analysis and betweenness centrality measure

Residue networks were constructed based on a method outlined in Ortiz48 for both apo and holo forms of Class I Mtb and Class II Ec BPLs. Each residue network was generated using the protein residues (and the ligand for the holoBPL species) as nodes. These nodes are connected by edges that are weighted by pairwise residue interaction energies, with inter-residue communication relayed along the minimum energy paths. Network analysis was performed on the final 430 ns (Mtb) and 500 ns (Ec) of the MD trajectory for the holoBPL species and the final 100 ns of the HREMD simulation for each apoBPL species. Two nodes in the residue network were connected by an edge if they were in contact with each other for more than 20% of the analyzed time and were defined to be in contact if the atoms from the respective residues lie within 1.7 times the sum of Van der Waals radii. For each residue i, edges to the i ± 1, i ± 2, and i ± 3 residues were excluded. The pairwise interaction energy was defined as the sum of all non-bonded interactions between two residues and was averaged throughout the portion of the trajectory that was analyzed. The weight of the link (ωij) connecting residues i and j was a function of the pairwise interaction energies (εij), given by

$$\omega_{ij} = \epsilon_{ij}^{1/3}, \tag{1}$$

to ensure the assignment of similar weights to electrostatic and Van der Waals interactions. The minimum energy paths were calculated for all node pairs, and residues most likely to contribute to intra-network communication were evaluated using the normalized node Betweenness centrality (CB) measure, which quantifies the extent to which a residue participates in minimum energy paths between any two nodes in the network.49,50 The CB (v) of a residue v is calculated using Eq. (2)

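$$C_B(v) = \frac{\sum_{s \neq v \neq t \in V} \sigma_{st}(v)}{\max\left[\sum_{s \neq v \neq t \in V} \sigma_{st}(v)\right]}, \tag{2}$$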

in which σst (v) is the shortest path between residues s and t passing through the residue v, and V is the entire set of residues.
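The edge selection rules and the normalized betweenness measure can be illustrated with a small pure-Python sketch. It keeps residue pairs in contact for more than 20% of the frames, drops pairs within three positions of each other in sequence, and counts how often each node lies on a shortest path, normalized to the highest count. All function names, the toy contact fractions, and the unit edge lengths are hypothetical. How edge lengths are derived from the interaction energies is not addressed here, and ties between equally short paths are broken arbitrarily, which is a simplifying assumption.

```python
import heapq


def select_edges(contact_fraction, min_fraction=0.20, excluded_separation=3):
    """Keep pairs in contact > min_fraction of frames and > 3 apart in sequence."""
    return [(i, j) for (i, j), fraction in contact_fraction.items()
            if fraction > min_fraction and abs(i - j) > excluded_separation]


def shortest_path_tree(adjacency, source):
    """Dijkstra predecessor map for one source (positive edge lengths)."""
    dist = {source: 0.0}
    pred = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        for w, length in adjacency[v]:
            new_d = d + length
            if w not in dist or new_d < dist[w]:
                dist[w] = new_d
                pred[w] = v
                heapq.heappush(heap, (new_d, w))
    return pred


def normalized_betweenness(nodes, lengths):
    """Count interior nodes on shortest paths, scaled to the highest count."""
    adjacency = {v: [] for v in nodes}
    for (a, b), length in lengths.items():
        adjacency[a].append((b, length))
        adjacency[b].append((a, length))
    counts = {v: 0 for v in nodes}
    for s in nodes:
        pred = shortest_path_tree(adjacency, s)
        for t in pred:  # every node reachable from s
            v = pred[t]
            while v != s:  # walk back along the path, skipping the endpoints
                counts[v] += 1
                v = pred[v]
    top = max(counts.values())
    return {v: (c / top if top else 0.0) for v, c in counts.items()}


# Hypothetical contact fractions keyed by residue-number pairs.
toy_contacts = {(1, 3): 0.95, (1, 5): 0.90, (5, 9): 0.50,
                (9, 13): 0.30, (3, 9): 0.40, (1, 13): 0.10}
toy_edges = select_edges(toy_contacts)
toy_lengths = {edge: 1.0 for edge in toy_edges}  # placeholder unit lengths
toy_cb = normalized_betweenness([1, 3, 5, 9, 13], toy_lengths)
```

In this toy network, residue 9 lies on the most shortest paths and receives a normalized value of 1, while the pair (1, 3) is rejected for sequence proximity and (1, 13) for insufficient contact time.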

Community network analysis

The modular structure or communities of a residue network were determined using the Girvan–Newman algorithm,51 a hierarchical method in which communities are detected by the iterative removal of the edge with the highest betweenness centrality from the original network. The edge betweenness centrality measure quantifies the extent to which an edge contributes to inter-residue communication. Communities are the connected components in the remaining network obtained after progressively eliminating edges. Optimization of the residue network community structure was accomplished by maximizing the weighted modularity of the network graph. The Weighted Modularity, QW, measures the difference between the fraction of edge weights in the community in a given graph relative to a random graph. Generally, in a typical biological network, QW lies between 0.4 and 0.7, with larger QW values indicating a better community structure. For both apo and holo Mtb and EcBPL, the QW values ranged from 0.5 to 0.6.
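The quantity that is maximized can be sketched as follows, assuming the standard Newman form of weighted modularity, in which each community contributes its fraction of the total edge weight minus the fraction expected at random. The function name and the toy graph are hypothetical, and the Girvan–Newman edge-removal loop itself is not shown.

```python
def weighted_modularity(edges, community):
    """Weighted modularity of a partition.

    edges: dict mapping (node_a, node_b) to a positive weight.
    community: dict mapping each node to its community label.
    """
    total_weight = sum(edges.values())
    intra = {}     # weight of edges inside each community
    strength = {}  # summed weighted degree of each community
    for (a, b), weight in edges.items():
        ca, cb = community[a], community[b]
        strength[ca] = strength.get(ca, 0.0) + weight
        strength[cb] = strength.get(cb, 0.0) + weight
        if ca == cb:
            intra[ca] = intra.get(ca, 0.0) + weight
    return sum(intra.get(c, 0.0) / total_weight
               - (s / (2.0 * total_weight)) ** 2
               for c, s in strength.items())


# Hypothetical graph: two triangles joined by a single bridge, unit weights.
toy_graph = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): 1.0,
             (4, 5): 1.0, (5, 6): 1.0, (4, 6): 1.0,
             (3, 4): 1.0}
two_modules = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_module = {node: "a" for node in range(1, 7)}

q_two = weighted_modularity(toy_graph, two_modules)
q_one = weighted_modularity(toy_graph, one_module)
```

Splitting the toy graph into its two triangles gives a positive modularity, whereas placing all nodes in one community gives zero, which is the behavior used to choose among candidate partitions.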

The difference between community networks obtained for the apo and holo BPL species was quantified using the Hamming distance (HD). The HD provides a metric to distinguish two binary sequences and is used to quantify the difference between any two community partitions containing the same number of nodes. For a residue r, the Hamming Distance (HD) between two community partitions C1 and C2 is defined as

$$HD(r, C_1, C_2) = \begin{cases} 1, & C_1(r) \neq C_2(r) \\ 0, & C_1(r) = C_2(r) \end{cases} \tag{3}$$

where Ci(r) is the community of residue r in the Ci partition. The HD summed over all residues is normalized by the number of nodes considered in the network. The range of normalized HD is [0,1], with 0 indicating that C1 and C2 are identical and 1 signifying that the partitions are completely different. The normalized HD is symmetric.
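A minimal sketch of the normalized Hamming distance between two partitions follows. The function name and the toy partitions are hypothetical, and the sketch assumes that community labels have already been matched between the two partitions, since labels assigned by a clustering algorithm are otherwise arbitrary.

```python
def normalized_hamming(partition_1, partition_2):
    """Fraction of residues assigned to different communities.

    Each partition maps a residue identifier to a community label.
    """
    if partition_1.keys() != partition_2.keys():
        raise ValueError("partitions must cover the same residues")
    mismatches = sum(1 for residue in partition_1
                     if partition_1[residue] != partition_2[residue])
    return mismatches / len(partition_1)


# Hypothetical apo and holo partitions of four residues.
apo = {10: "brown", 11: "brown", 12: "blue", 13: "red"}
holo = {10: "brown", 11: "blue", 12: "blue", 13: "red"}

hd_apo_holo = normalized_hamming(apo, holo)
```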

Inter-community links

The inter-community links between the residue modules were obtained by calculating the edge betweenness centrality (CE) for each protein system. The strength of the inter-community links is determined by the Cumulative Edge Betweenness Centrality (CEBC) score,51 which was summed over all the links with end nodes in two different network communities weighted by their edge betweenness centrality score.
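The summation that defines the CEBC score can be sketched as below: edge betweenness values are accumulated for every pair of communities connected by at least one edge, and edges within a community are ignored. The function name and all input values are hypothetical.

```python
def cumulative_edge_betweenness(edge_betweenness, community):
    """Sum edge betweenness over edges whose ends lie in different communities."""
    cebc = {}
    for (a, b), score in edge_betweenness.items():
        ca, cb = community[a], community[b]
        if ca != cb:
            key = tuple(sorted((ca, cb)))  # unordered community pair
            cebc[key] = cebc.get(key, 0.0) + score
    return cebc


# Hypothetical edge betweenness scores and community assignments.
toy_edge_scores = {(1, 2): 0.50, (2, 3): 0.25, (3, 4): 0.75, (1, 4): 0.25}
toy_community = {1: "brown", 2: "brown", 3: "blue", 4: "blue"}

toy_cebc = cumulative_edge_betweenness(toy_edge_scores, toy_community)
```

The resulting dictionary, keyed by community pairs, is the kind of input that a coarse-grained network with one node per community would use as edge weights.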

Coarse-grained network analysis

A coarse-grained (CG) version of each residue network was constructed in which each community was represented as a node, and the inter-node interactions were weighted by the CEBC score. For a given network, the current flow-based betweenness centrality [Cflow(u)] of a node, u, measures the fraction of current flowing through it for all possible node pairs.52 The CEBC weights act as conductance in this “resistor” network. Cflow was used to identify the crucial communities in the CG network.

Mutual information analysis

A mutual information analysis was used to obtain the evolutionary relationships between residue pairs in the mono and bifunctional Biotin Protein Ligases. First, files containing Class I and Class II eubacterial and archaebacterial sequences were assembled from entries in the NCBI database,53 which required manual curation due to frequent incorrect annotation of entries of, for example, Class I BPLs as members of Class II and vice versa. Representatives from all taxonomic groups were included in each database, and care was taken to avoid including duplicate sequences. Two sequence files for Class I (1189 and 1083 entries) and Class II (1200 and 1034 entries) BPLs were independently assembled. As a control, a file containing Class I and Class II BPL sequences at roughly a 1:1 ratio was also constructed.

Multiple sequence alignments for the two ligase families were obtained with Clustal Omega using default parameters.54 The resulting alignments were visually inspected in Jalview to ensure that signature sequences corresponding to the conserved active sites of all biotin protein ligases thus far studied were represented in the alignment.55 Additionally, the Class II bifunctional sequences were inspected for the alignment of the DNA binding domain region.

Mutual information analysis was carried out using the MISTIC2 server (https://mistic2.leloir.org.ar/)56 with the FASTA files obtained from the Clustal Omega alignments used as input. The mutual information analysis method implemented in MISTIC2 is based on the method described in Ref. 31. For the monofunctional ligases, the Mtb sequence (BAW14408.1) was used as the reference, while the Ec sequence (BAE77342.1) was used for the bifunctional reference, and the default parameter values were used in all analyses. Although the MISTIC2 server provides multiple options for visualizing the analysis results, in this work, we compared the results obtained for the Class I and Class II using the numerical cumulative mutual information values for each residue and mutual information scores for pairs of residues (see the section titled Results). The Kullback–Leibler57 method was used to calculate conservation of each residue position as follows:

$$\mathrm{KLcons}_i = \sum_{i=0}^{n} \ln \frac{P(i)}{Q(i)}, \tag{4}$$

where P(i) is the frequency of appearance of amino acid i at a given position and Q(i) is the background frequency of that amino acid in nature, calculated from the amino acid frequency distribution in the UniProt database.
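As an illustration of how a per-position conservation score of this kind can be computed, the following minimal Python sketch evaluates the Kullback–Leibler divergence of one alignment column from a background distribution. It rests on several assumptions that are not taken from the text: it uses the standard Kullback–Leibler form, in which each log-ratio is weighted by P(i); it skips gap characters; and it uses a hypothetical uniform background in place of the UniProt-derived frequencies. The function and variable names are illustrative and are not part of MISTIC2.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical background: uniform frequencies, used here only for illustration.
# The analysis in the text uses frequencies derived from the UniProt database.
uniform_background = {aa: 1 / 20 for aa in AMINO_ACIDS}


def kl_conservation(column, background):
    """Return the KL divergence (in nats) of one alignment column.

    Assumes the standard form sum_i P(i) * ln(P(i) / Q(i)), where P(i) is the
    observed frequency of amino acid i in the column and Q(i) its background
    frequency. Gaps and symbols missing from the background are skipped.
    """
    residues = [aa for aa in column if aa in background]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    score = 0.0
    for aa, n in counts.items():
        p = n / total
        score += p * math.log(p / background[aa])
    return score
```

Under these assumptions, a fully conserved column gives the largest score attainable for that background, and a column that matches the background distribution gives a score of zero.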

The correlation between the results of the computational network and mutual information analyses obtained for Class I and Class II BPLs was assessed by computing the Sokal–Michener distance metric32 using Python 3.7.16. First, Boolean arrays representing the betweenness centrality (CB) and cumulative mutual information (CMI) results for each protein were constructed. In each array, a value of 1 is assigned to each residue whose CB or CMI value falls in the top 10%, and a value of 0 is assigned to each residue in the bottom 90%. The distance between the resulting Boolean arrays ranges from 0 to 1, with 0 indicating identical arrays and 1 indicating complementary arrays.
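The comparison described above can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, the top-10% cutoff is taken at the 90th percentile of each score vector, and the distance uses one common definition of the Sokal–Michener dissimilarity, 2m/(a + 2m), where m is the number of mismatched positions and a is the number of matched positions. Other definitions exist, and the text does not state which one was used.

```python
import numpy as np


def top_fraction_mask(values, fraction=0.10):
    """Boolean array: True for residues whose score is in the top `fraction`.

    Assumption: the cutoff is the (1 - fraction) quantile of the scores,
    and residues at or above the cutoff are flagged.
    """
    values = np.asarray(values, dtype=float)
    threshold = np.quantile(values, 1.0 - fraction)
    return values >= threshold


def sokal_michener_distance(u, v):
    """Sokal-Michener dissimilarity between two Boolean arrays.

    Uses 2m / (a + 2m), with m mismatches and a matches. The result is 0 for
    identical arrays and 1 for complementary arrays.
    """
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    if u.shape != v.shape or u.size == 0:
        raise ValueError("arrays must be non-empty and have the same shape")
    mismatches = np.count_nonzero(u != v)
    matches = u.size - mismatches
    return 2.0 * mismatches / (matches + 2.0 * mismatches)
```

In use, one would build one mask from the per-residue CB values and one from the per-residue CMI values of the same protein, indexed over the same residues, and pass both masks to `sokal_michener_distance`.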

SUPPLEMENTARY MATERIAL

See the supplementary material for additional information about the experimental results, the numerical values associated with the Community Network Analysis, additional figures illustrating the CN results, and detailed information on the HREMD simulations and results.

ACKNOWLEDGMENTS

This work was supported by NIH Grant No. R01-GM129327 to D.B. The authors thank Calvin Muth for performing the experimental measurements of enzyme kinetics.

The research for this article was performed while D.B. was employed at the University of Maryland. The opinions expressed in the article are the author's own and do not reflect the view of the National Institutes of Health, the Department of Health and Human Services, or the United States Government.

Note: This paper is part of the JCP Special Topic on New Views of Allostery.

Contributor Information

Dorothy Beckett, Email: dorothy.beckett@nih.gov.

Silvina Matysiak, Email: matysiak@umd.edu.

AUTHOR DECLARATIONS

Conflict of Interest

The authors have no conflicts to disclose.

Author Contributions

R.S. and N.S. performed MD simulations and Network Analysis. D.B. directed experimental studies and performed Mutual Information analysis. S.M. directed computational studies. D.B. and R.S. wrote the manuscript. D.B. and R.S. prepared the figures.

Riya Samanta: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Neel Sanghvi: Investigation (supporting); Writing – review & editing (supporting). Dorothy Beckett: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). Silvina Matysiak: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Methodology (equal); Project administration (equal); Resources (equal); Supervision (equal); Validation (equal); Writing – review & editing (equal).

DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

REFERENCES

  • 1. Bohr C., Hasselbalch K., and Krogh A., “Concerning a biologically important relationship—The influence of the carbon dioxide content of blood on its oxygen binding,” Skand. Arch. Physiol. 16, 402–412 (1904). 10.1111/j.1748-1716.1904.tb01382.x
  • 2. Monod J., Changeux J.-P., and Jacob F., “Allosteric proteins and cellular control systems,” J. Mol. Biol. 6, 306–329 (1963). 10.1016/s0022-2836(63)80091-1
  • 3. Meinhardt S., Manley M. W., Parente D. J., and Swint-Kruse L., “Rheostats and toggle switches for modulating protein function,” PLoS One 8(12), e83502 (2013). 10.1371/journal.pone.0083502
  • 4. Hadzipasic A., Wilson C., Nguyen V., Kern N., Kim C., Pitsawong W., Villali J., Zheng Y., and Kern D., “Ancient origins of allosteric activation in a Ser-Thr kinase,” Science 367(6480), 912–917 (2020). 10.1126/science.aay9959
  • 5. Beattie N. R., Keul N. D., Hicks Sirmans T. N., McDonald W. E., Talmadge T. M., Taujale R., Kannan N., and Wood Z. A., “Conservation of atypical allostery in C. elegans UDP-glucose dehydrogenase,” ACS Omega 4(15), 16318–16329 (2019). 10.1021/acsomega.9b01565
  • 6. DeLano W. L., The PyMOL Molecular Graphics System, 2002.
  • 7. Tong L., “Structure and function of biotin-dependent carboxylases,” Cell. Mol. Life Sci. 70(5), 863–891 (2013). 10.1007/s00018-012-1096-0
  • 8. Rodionov D. A., Mironov A. A., and Gelfand M. S., “Conservation of the biotin regulon and the BirA regulatory signal in Eubacteria and Archaea,” Genome Res. 12(10), 1507–1516 (2002). 10.1101/gr.314502
  • 9. Lane M. D., Rominger K. L., Young D. L., and Lynen F., “The enzymatic synthesis of holotranscarboxylase from apotranscarboxylase and (+)-biotin. II. Investigation of the reaction mechanism,” J. Biol. Chem. 239, 2865–2871 (1964). 10.1016/s0021-9258(18)93826-3
  • 10. Barker D. F. and Campbell A. M., “Genetic and biochemical characterization of the birA gene and its product: Evidence for a direct role of biotin holoenzyme synthetase in repression of the biotin operon in Escherichia coli,” J. Mol. Biol. 146(4), 469–492 (1981). 10.1016/0022-2836(81)90043-7
  • 11. Barker D. F. and Campbell A. M., “The birA gene of Escherichia coli encodes a biotin holoenzyme synthetase,” J. Mol. Biol. 146(4), 451–467 (1981). 10.1016/0022-2836(81)90042-5
  • 12. Prakash O. and Eisenberg M. A., “Biotinyl 5′-adenylate: Corepressor role in the regulation of the biotin genes of Escherichia coli K-12,” Proc. Natl. Acad. Sci. U. S. A. 76(11), 5592–5595 (1979). 10.1073/pnas.76.11.5592
  • 13. Eisenstein E. and Beckett D., “Dimerization of the Escherichia coli biotin repressor: Corepressor function in protein assembly,” Biochemistry 38(40), 13077–13084 (1999). 10.1021/bi991241q
  • 14. Weaver L. H., Kwon K., Beckett D., and Matthews B. W., “Corepressor-induced organization and assembly of the biotin repressor: A model for allosteric activation of a transcriptional regulator,” Proc. Natl. Acad. Sci. U. S. A. 98(11), 6045–6050 (2001). 10.1073/pnas.111128198
  • 15. Weaver L. H., Kwon K., Beckett D., and Matthews B. W., “Competing protein: Protein interactions are proposed to control the biological switch of the E. coli biotin repressor,” Protein Sci. 10(12), 2618–2622 (2001). 10.1110/ps.ps.32701
  • 16. Bagautdinov B., Matsuura Y., Bagautdinova S., and Kunishima N., “Protein biotinylation visualized by a complex structure of biotin protein ligase with a substrate,” J. Biol. Chem. 283(21), 14739–14750 (2008). 10.1074/jbc.m709116200
  • 17. Pendini N. R., Yap M. Y., Traore D. A., Polyak S. W., Cowieson N. P., Abell A., Booker G. W., Wallace J. C., Wilce J. A., and Wilce M. C., “Structural characterization of Staphylococcus aureus biotin protein ligase and interaction partners: An antibiotic target,” Protein Sci. 22(6), 762–773 (2013). 10.1002/pro.2262
  • 18. Ma Q., Akhter Y., Wilmanns M., and Ehebauer M. T., “Active site conformational changes upon reaction intermediate biotinyl-5′-AMP binding in biotin protein ligase from Mycobacterium tuberculosis,” Protein Sci. 23(7), 932–939 (2014). 10.1002/pro.2475
  • 19. Wood Z. A., Weaver L. H., Brown P. H., Beckett D., and Matthews B. W., “Co-repressor induced order and biotin repressor dimerization: A case for divergent followed by convergent evolution,” J. Mol. Biol. 357(2), 509–523 (2006). 10.1016/j.jmb.2005.12.066
  • 20. Naganathan S. and Beckett D., “Nucleation of an allosteric response via ligand-induced loop folding,” J. Mol. Biol. 373(1), 96–111 (2007). 10.1016/j.jmb.2007.07.020
  • 21. Eginton C., Naganathan S., and Beckett D., “Sequence-function relationships in folding upon binding,” Protein Sci. 24(2), 200–211 (2015). 10.1002/pro.2605
  • 22. Wilson K. P., Shewchuk L. M., Brennan R. G., Otsuka A. J., and Matthews B. W., “Escherichia coli biotin holoenzyme synthetase/bio repressor crystal structure delineates the biotin- and DNA-binding domains,” Proc. Natl. Acad. Sci. U. S. A. 89(19), 9257–9261 (1992). 10.1073/pnas.89.19.9257
  • 23. Wang J., Samanta R., Custer G., Look C., Matysiak S., and Beckett D., “Tuning allostery through integration of disorder to order with a residue network,” Biochemistry 59(6), 790–801 (2020). 10.1021/acs.biochem.9b01006
  • 24. Bagautdinov B., Kuroishi C., Sugahara M., and Kunishima N., “Crystal structures of biotin protein ligase from Pyrococcus horikoshii OT3 and its complexes: Structural basis of biotin activation,” J. Mol. Biol. 353(2), 322–333 (2005). 10.1016/j.jmb.2005.08.032
  • 25. Soares da Costa T. P., Tieu W., Yap M. Y., Pendini N. R., Polyak S. W., Sejer Pedersen D., Morona R., Turnidge J. D., Wallace J. C., Wilce M. C. J. et al., “Selective inhibition of biotin protein ligase from Staphylococcus aureus,” J. Biol. Chem. 287(21), 17823–17832 (2012). 10.1074/jbc.m112.356576
  • 26. Xu Y. and Beckett D., “Kinetics of biotinyl-5′-adenylate synthesis catalyzed by the Escherichia coli repressor of biotin biosynthesis and the stability of the enzyme-product complex,” Biochemistry 33(23), 7354–7360 (1994). 10.1021/bi00189a041
  • 27. Eginton C., Cressman W. J., Bachas S., Wade H., and Beckett D., “Allosteric coupling via distant disorder-to-order transitions,” J. Mol. Biol. 427(8), 1695–1704 (2015). 10.1016/j.jmb.2015.02.021
  • 28. Nenortas E. and Beckett D., “Purification and characterization of intact and truncated forms of the Escherichia coli biotin carboxyl carrier subunit of acetyl-CoA carboxylase,” J. Biol. Chem. 271(13), 7559–7567 (1996). 10.1074/jbc.271.13.7559
  • 29. Adikaram P. R. and Beckett D., “Functional versatility of a single protein surface in two protein:protein interactions,” J. Mol. Biol. 419(3-4), 223–233 (2012). 10.1016/j.jmb.2012.03.010
  • 30. He C., Custer G., Wang J., Matysiak S., and Beckett D., “Superrepression through altered corepressor-activated protein:protein interactions,” Biochemistry 57(7), 1119–1129 (2018). 10.1021/acs.biochem.7b01122
  • 31. Buslje C. M., Santos J., Delfino J. M., and Nielsen M., “Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information,” Bioinformatics 25(9), 1125–1131 (2009). 10.1093/bioinformatics/btp135
  • 32. Zhang B. and Srihari S., “Properties of binary vector dissimilarity measures,” paper presented at the Proceedings of JCIS International Conference on Computer Vision, Pattern Recognition, and Image Processing, 2003.
  • 33. Adikaram P. R. and Beckett D., “Protein:protein interactions in control of a transcriptional switch,” J. Mol. Biol. 425(22), 4584–4594 (2013). 10.1016/j.jmb.2013.07.029
  • 34. Dunn S. D., Wahl L. M., and Gloor G. B., “Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction,” Bioinformatics 24(3), 333–340 (2008). 10.1093/bioinformatics/btm604
  • 35. Fodor A. A. and Aldrich R. W., “Influence of conservation on calculations of amino acid covariance in multiple sequence alignments,” Proteins 56(2), 211–221 (2004). 10.1002/prot.20098
  • 36. Anishchenko I., Ovchinnikov S., Kamisetty H., and Baker D., “Origins of coevolution between residues distant in protein 3D structures,” Proc. Natl. Acad. Sci. U. S. A. 114(34), 9122–9127 (2017). 10.1073/pnas.1702664114
  • 37. Wang J., Custer G., Beckett D., and Matysiak S., “Long distance modulation of disorder-to-order transitions in protein allostery,” Biochemistry 56(34), 4478–4488 (2017). 10.1021/acs.biochem.7b00496
  • 38. Wang J. and Beckett D., “A conserved regulatory mechanism in bifunctional biotin protein ligases,” Protein Sci. 26(8), 1564–1573 (2017). 10.1002/pro.3182
  • 39. Naganathan A. N., “Modulation of allosteric coupling by mutations: From protein dynamics and packing to altered native ensembles and function,” Curr. Opin. Struct. Biol. 54, 1–9 (2019). 10.1016/j.sbi.2018.09.004
  • 40. Abbott J. and Beckett D., “Cooperative binding of the Escherichia coli repressor of biotin biosynthesis to the biotin operator sequence,” Biochemistry 32(37), 9649–9656 (1993). 10.1021/bi00088a017
  • 41. Fiser A., Do R. K. G., and Šali A., “Modeling of loops in protein structures,” Protein Sci. 9(9), 1753–1773 (2000). 10.1110/ps.9.9.1753
  • 42. Webb B. and Sali A., “Comparative protein structure modeling using MODELLER,” Curr. Protoc. Bioinf. 54, 5.6.1–5.6.37 (2016). 10.1002/cpbi.3
  • 43. Berendsen H. J. C., van der Spoel D., and van Drunen R., “GROMACS: A message-passing parallel molecular dynamics implementation,” Comput. Phys. Commun. 91(1–3), 43–56 (1995). 10.1016/0010-4655(95)00042-e
  • 44. Van Der Spoel D., Lindahl E., Hess B., Groenhof G., Mark A. E., and Berendsen H. J. C., “GROMACS: Fast, flexible, and free,” J. Comput. Chem. 26(16), 1701–1718 (2005). 10.1002/jcc.20291
  • 45. Wang L., Friesner R. A., and Berne B. J., “Replica exchange with solute scaling: A more efficient version of replica exchange with solute tempering (REST2),” J. Phys. Chem. B 115(30), 9431–9438 (2011). 10.1021/jp204407d
  • 46. Bussi G., “Hamiltonian replica exchange in GROMACS: A flexible implementation,” Mol. Phys. 112(3–4), 379–384 (2014). 10.1080/00268976.2013.824126
  • 47. Bonomi M., Bussi G., Camilloni C., Tribello G. A., Banáš P., Barducci A., Bernetti M., Bolhuis P. G., Bottaro S., Branduardi D. et al., “Promoting transparency and reproducibility in enhanced molecular simulations,” Nat. Methods 16(8), 670–673 (2019). 10.1038/s41592-019-0506-8
  • 48. Ribeiro A. A. S. T. and Ortiz V., “Energy propagation and network energetic coupling in proteins,” J. Phys. Chem. B 119(5), 1835–1846 (2015). 10.1021/jp509906m
  • 49. Newman M. E. J., “The structure and function of complex networks,” SIAM Rev. 45(2), 167–256 (2003). 10.1137/s003614450342480
  • 50. Newman M. E. J., Networks: An Introduction (Oxford University Press, Oxford, NY, 2010).
  • 51. Girvan M. and Newman M. E. J., “Community structure in social and biological networks,” Proc. Natl. Acad. Sci. U. S. A. 99(12), 7821 (2002). 10.1073/pnas.122653799
  • 52. Brandes U. and Fleischer D., “Centrality measures based on current flow,” in STACS 2005, edited by Diekert V. and Durand B. (Springer, Berlin, Heidelberg, 2005).
  • 53. Sayers E. W., Beck J., Bolton E. E., Bourexis D., Brister J. R., Canese K., Comeau D. C., Funk K., Kim S., Klimke W. et al., “Database resources of the national center for biotechnology information,” Nucleic Acids Res. 49(D1), D10–D17 (2021). 10.1093/nar/gkaa892
  • 54. Madeira F., Park Y. M., Lee J., Buso N., Gur T., Madhusoodanan N., Basutkar P., Tivey A. R. N., Potter S. C., Finn R. D. et al., “The EMBL-EBI search and sequence analysis tools APIs in 2019,” Nucleic Acids Res. 47(W1), W636–W641 (2019). 10.1093/nar/gkz268
  • 55. Waterhouse A. M., Procter J. B., Martin D. M. A., Clamp M., and Barton G. J., “Jalview version 2—A multiple sequence alignment editor and analysis workbench,” Bioinformatics 25(9), 1189–1191 (2009). 10.1093/bioinformatics/btp033
  • 56. Colell E. A., Iserte J. A., Simonetti F. L., and Marino-Buslje C., “MISTIC2: Comprehensive server to study coevolution in protein families,” Nucleic Acids Res. 46, W323 (2018). 10.1093/nar/gky419
  • 57. Cover T. M. and Thomas J. A., Elements of Information Theory, 2nd ed. (John Wiley & Sons, Inc., Hoboken, NJ, 2006).


Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
