Proceedings of the National Academy of Sciences of the United States of America. 2010 Nov 3;107(47):20352–20357. doi: 10.1073/pnas.1012999107

Mechanisms of protein oligomerization, the critical role of insertions and deletions in maintaining different oligomeric states

Kosuke Hashimoto 1, Anna R Panchenko 1,1
PMCID: PMC2996646  PMID: 21048085

Abstract

The main principles of protein-protein recognition are elucidated by the studies of homooligomers which in turn mediate and regulate gene expression, activity of enzymes, ion channels, receptors, and cell-cell adhesion processes. Here we explore oligomeric states of homologous proteins in various organisms to better understand the functional roles and evolutionary mechanisms of homooligomerization. We observe a great diversity in mechanisms controlling oligomerization and focus in our study on insertions and deletions in homologous proteins and how they enable or disable complex formation. We show that insertions and deletions which differentiate monomers and dimers have a significant tendency to be located on the interaction interfaces and about a quarter of all proteins studied and forty percent of enzymes have regions which mediate or disrupt the formation of oligomers. We suggest that relatively small insertions or deletions may have a profound effect on complex stability and/or specificity. Indeed removal of complex enabling regions from protein structures in many cases resulted in the complete or partial loss of stability. Moreover, we find that insertions and deletions modulating oligomerization have a lower aggregation propensity and contain a larger fraction of polar, charged residues, glycine and proline compared to conventional interfaces and protein surface. Most likely, these regions may mediate specific interactions, prevent nonspecific dysfunctional aggregation and preclude undesired interactions between close paralogs therefore separating their functional pathways. Last, we show how the presence or absence of insertions and deletions on interfaces might be of practical value in annotating protein oligomeric states.

Keywords: homodimer, homooligomer protein, protein structural evolution


How proteins interact is a basic question in cell biology. Fundamental studies of the physico-chemical and evolutionary mechanisms of specific protein binding involve homooligomers and the analysis of their complex formation and functional roles in providing specific cellular function. Recent studies have led to a greater awareness that homooligomers provide diversity and specificity of many pathways and may mediate and regulate gene expression, activity of enzymes, ion channels, receptors, and cell-cell adhesion processes (1–10). Transitions between different oligomeric states may also be important in regulation of apoptosis and tumor formation, and oligomeric complexes between homologous proteins can integrate different pathways and provide the cross talk between them (5, 11). Moreover, homooligomers may undergo reversible transitions between different discrete conformations which preserve the symmetry of the complex and account for their cooperative binding properties and allosteric mechanisms in signal transduction (12). In addition, oligomerization allows proteins to form large structures without increasing genome size and provides stability, while the reduced surface area of the monomer in a complex can offer protection against denaturation (1–3, 9, 13).

Although protein oligomeric states are often quite difficult to characterize experimentally, the majority of proteins form homooligomers in the Protein Structure Database (14). Analysis of high-throughput protein-protein interaction networks showed that there are significantly more self-interacting proteins than expected by chance (15), and that the efficiency of coaggregation between different protein domains positively correlates with their similarity (16) due to stability, foldability, or evolutionary constraints (17, 18). Different evolutionary scenarios of protein oligomerization have been discussed in the literature. Some of the scenarios propose evolutionary mechanisms that follow kinetic pathways, domain swapping, or formation of leucine zippers (8, 19–22). Gene duplication may lead to oligomeric paralogs and may create in evolution new protein complexes with unique specificities (7, 11, 23–25). It has also been shown that binding modes and symmetries of homooligomeric complexes are conserved only between close homologs, and homooligomers with dihedral symmetry evolved through their cyclic intermediates (7, 26).

As homooligomers play important functional roles, the formation of multiple oligomeric interfaces and symmetry requirements put additional constraints on the evolution of constituent monomers. Mechanisms of homooligomerization are not very well understood although recent studies have shed some light on their main principles. For example, it has been shown how low-affinity cadherin dimers formed through β-strand swapping can mediate and control highly specific intercellular adhesion (10, 27). In another study it was proposed that β-barrel membrane proteins can oligomerize through the weakly stable interfacial beta-strands (28), while the C-terminal helix in certain p53 proteins might be essential for stabilizing tetramers (29). Finally, the manual inspection of insertions and deletions of homologous proteins in different oligomeric states revealed that for about a quarter of them certain protein regions are responsible for enabling or disabling the oligomeric interfaces (30).

In this paper we explore different oligomeric states of homologous proteins to better understand the functional and evolutionary mechanisms of homooligomerization. We observe a great diversity in mechanisms of oligomerization and focus our study on how insertions or deletions in proteins may influence complex formation. We show that insertions and deletions which differentiate monomers and dimers have a significant tendency to be located on the interfaces and about a quarter of all studied proteins and 40% of enzymes have regions which may mediate or disrupt the formation of homodimers. Therefore we suggest that relatively small insertions or deletions may represent an important evolutionary mechanism of oligomerization, and may profoundly affect complex stability and the development of new specificities. Indeed our computational experiments have demonstrated that removal of enabling regions from protein structures results in the complete or partial loss of dimer stability. Moreover, enabling and disabling regions may allow proteins to develop new specific interactions and prevent undesired interactions between close paralogs, thereby facilitating the separation of their functional pathways. This assumption is supported by analyses of sequence and structure, amino acid composition, and by calculations of aggregation propensities and free energies of dissociation of complexes. Lastly, we show how the enabling/disabling features might be of practical value in annotating protein oligomeric states.

Results

Identifying Mechanisms of Dimerization.

We obtained 4,419 pairs of a dimer and a closest monomer from the same Conserved Domain Database (CDD) family for the full dataset and 532 pairs for the nonredundant dataset (see Materials and Methods). This list contained 367 different CDD families (348 families in the nonredundant dataset) encompassing a wide range of protein biological functions. About half of these families (168 families) represented enzymes and other major functional categories included regulatory, signaling, and transport proteins. We analyzed pairs of dimers and monomers, their sequence, structural similarity, and energy of dimer dissociation. The sequence similarity between proteins in the dimeric and monomeric states varied considerably with the highest fraction of proteins sharing 90–100% sequence identity between a dimer and a monomer (Fig. S1A), which corresponds to the same proteins in different oligomeric states or cases where point mutation leads to the formation or disruption of a dimer. Upon examination the majority of these dimers were formed by domain swapping or through small scale conformational changes. Point mutations can also have a considerable effect on dimer formation and are implicated in many diseases (31). For example, we found point mutations of glutamate receptors which apparently stabilize their dimer interface and reduce desensitization (1lb8-1mm6). Point mutation could also shift equilibrium between different oligomeric states in Pyrococcus furiosus mutants (1iz5-1iz4) and disrupt the salt bridge stabilizing the functional dimer in nuclear hormone receptors (2pin-1n46). All these findings were supported by experimental studies (32–34).
Other mechanisms observed in our set of dimers and monomers included (i) presence of common stabilizing ligands on interfaces including ions (“ligand induced dimerization”); (ii) regulation of dimerization through posttranslational modifications including phosphorylation and disulfide bond formation between two subunits; and (iii) presence of insertions/deletions which favored dimeric or monomeric states. We will discuss the last mechanism in more detail.

Mechanism of Dimerization Through Insertions/Deletions of Protein Regions.

We analyzed differences in structures between monomers and dimers to assess whether the gapped or unaligned residues were more frequently located on the interface than in other types of regions. We showed that the unaligned and gapped residues (inserted in the dimer compared to the monomer) occurred more frequently on the interface than on the surface (P-values ≪ 10e-7) (Fig. S1B). This observation implies that they may play an important role in forming the dimer interface, constituting enabling regions. In addition, we noticed that for very remotely related monomers and dimers (of less than 20% identity) enabling features might either be absent or obscured by the overall structural differences as a result of evolutionary divergence.

We found 1,020 enabling regions and 351 disabling regions from the full dataset and 108 enabling and 49 disabling regions from the nonredundant dataset. Fig. 1 shows the length distribution of these regions. Most regions are less than nine residues long whereas some of the disabling regions contain up to twenty-five residues. Analysis of functional categories of families with enabling/disabling regions showed that enzyme families more frequently contained such regions (72 out of 168 families, binomial test p-value < 0.003) than other families (46 out of 199 families). Although even one extra residue on an interface might affect oligomer formation, we concentrated on regions of four or more residues that clearly mediate or disrupt the homodimer interface (Table S1 lists manually analyzed representatives of enabling and disabling regions of four or more residues and all regions are listed in Table S2). All monomeric states from this table (except for three entries) were confirmed using the Inferred Biomolecular Interaction Server (IBIS) (35) and dimer interfaces were additionally verified by the NOXclass method (36). We also estimated the free energy of dissociation of dimers (ΔGdiss) with the PISA algorithm, which uses principles of chemical thermodynamics to detect macromolecular assemblies. We showed that many dimers were characterized by high binding affinity, ΔGdiss > 4 kcal/mol [this value corresponds to standard energy of dissociation under the assumption that equilibrium concentrations of dimers and monomers are equal (37)], implying that they were quite stable. Moreover, we performed the excision of enabling regions from the structures of protein dimers from the representative set and recalculated the free energy of dissociation. We showed that removal of enabling regions disrupted the dimer formation in about half of families and destabilized the dimer in other cases (Table S1).
The effect of destabilization depended considerably upon the protein family. When we examined structural similarity in the aligned regions between proteins in two oligomeric states, it always turned out to be quite high (drmsd < 3 Å), indicating that the dimerization was not caused by intrachain conformational changes and did not involve domain swapping.
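The enzyme-family comparison reported above (72 of 168 enzyme families versus 46 of 199 other families) can be re-derived as a short arithmetic sketch. The text states only that a binomial test was used; the null proportion chosen here (the pooled fraction over all 367 families) and the one-sided upper tail are assumptions made for illustration, and the helper name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts taken from the text.
enzyme_with, enzyme_total = 72, 168
other_with, other_total = 46, 199

# ASSUMPTION: the null hypothesis is that enzyme families carry
# enabling/disabling regions at the pooled rate over all families.
pooled_rate = (enzyme_with + other_with) / (enzyme_total + other_total)

p_value = binomial_upper_tail(enzyme_with, enzyme_total, pooled_rate)
print(f"pooled rate = {pooled_rate:.3f}, one-sided p-value = {p_value:.4f}")
```

A different null (for example, the rate observed in non-enzyme families only) or a two-sided test would give a different p-value, so this sketch should not be read as the exact calculation behind the reported threshold.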

Fig. 1.

The length distribution of the enabling and disabling regions in the nonredundant set.

After examining secondary structure propensities of enabling and disabling regions we found that the majority of these regions constituted loops (54% of enabling and 59% of disabling regions respectively), followed by the α-helix (37% and 30%), and β-strand (9% and 11%) (Fig. 2 and Fig. S2). Moreover, loops and strands occurred significantly more often in enabling regions than on the overall protein surface (Fisher exact test p-value ≪ 0.05) (Table S3). In our previous study on glycosyltransferases we found that loops were also significantly more prevalent among enabling regions (24).

Fig. 2.

Percentage of secondary structures in the enabling and disabling regions in the nonredundant set. Percentage of secondary structures in the full dataset is shown in Fig. S2. In all cases including the full dataset, more than half of enabling and disabling regions are located in loops, approximately 30–40% in α-helices, and 10% in β-strands.

Amino Acid Composition and Propensity to Aggregate for Enabling and Disabling Regions.

Previous studies showed that amino acid composition on homooligomeric interfaces differed from that on surfaces and in buried regions (38). We confirmed that most of the dimer interfaces contained hydrophobic and some aromatic amino acids (Fig. S3A). We also found that enabling and disabling regions contained more polar and charged residues than the aligned interface regions (“conventional interfaces”; Fig. 3), although these biases were not pronounced, and contained significantly larger amounts of glycine and proline (Fig. 3 and Fig. S3B). Indeed, the effects of proline and glycine on the inhibition of nonspecific aggregation and amyloid formation have been confirmed previously (39).

Fig. 3.

The amino acid propensities in the nonredundant set for (A) enabling regions (B) disabling regions. The propensities of the full dataset are shown in Fig. S3 C and D. The propensities are calculated as the log ratio between frequencies of a particular amino acid in enabling/disabling regions and frequency of the same amino acid in aligned conventional interfaces.

To understand the role of enabling and disabling regions in specific biological interactions and in preventing nonspecific aggregation, we calculated aggregation propensities using potentials developed previously (40). Unlike amino acid composition analysis the aggregation propensity calculations take into account the residue context, the presence of specific patterns of alternating hydrophobic and hydrophilic residues, and residue charges. A recent study by Pechmann et al. (41) reported that interfaces of protein complexes are more prone to aggregate than surface. Consistent with this study we found similar trends for homooligomers (Wilcoxon signed-rank test p-value < 10e-10, Fig. S4A) even though homooligomers might be under selection pressure to have lower aggregation propensity compared to heterooligomers (42). Importantly, we observed that while enabling regions have a somewhat higher propensity to aggregate than the protein surface their aggregation propensity is lower than that for conventional aligned interfaces (p-value < 10e-10) (Fig. 4 and Fig. S4B). Interestingly, disabling regions have quite low aggregation propensity, considerably lower than that for enabling regions or conventional interfaces (p-value < 10e-10).

Fig. 4.

Aggregation propensities calculated using the nonredundant set for four different regions: enabling, disabling, aligned interface, and overall molecular surface. The distributions of aggregation propensities are smoothed by the Gaussian kernel density estimation. Conventional interfaces have the highest aggregation propensity, whereas disabling regions have the lowest aggregation propensity. Aggregation propensities for the full dataset are also shown in Fig. S4B.

Evolutionary and Structural Mechanisms to Form Enabling and Disabling Regions.

We analyzed enabling regions and found two typical structural mechanisms to enable extra residues to form an interface. According to the first scenario an insertion in a dimer (or deletion in a monomer, we do not have enough data to infer ancestral states and distinguish these evolutionary events) creates a new secondary structure element or loop that functions as a new interaction surface. For example, dimeric extracellular glutamyl endopeptidase (cd00190, Table S1) has a loop which is elongated through the insertion of an enabling region which forms a new β-sheet. The β-sheet extends in a vertical direction, creating a protruding interaction surface whereas this β-sheet is missing in monomeric human proteins (Fig. 5).

Fig. 5.

Illustration of enabling features with insertion of new secondary structure elements, 1P3C—1FQ3 pair from the trypsin-like serine protease family (cd00190). Two subunits of the homodimers are shown in light-blue and light-red. Enabling regions and their surrounding residues are represented by the red tubes, whereas corresponding residues in the monomer are represented by the yellow tubes. A small β-sheet contributing to the enabling interaction (shown in red tube) is absent from the monomer (shown in the yellow tube).

According to the second structural mechanism, which constitutes the majority of the cases, existing secondary structure elements are extended to form the interface. One example includes glycoside hydrolases (pfam10566), BtGH97a and BtGH97b. The difference between these two structures might be the result of insertions of two loops in BtGH97a that strongly contribute to form the homodimer interface (Fig. 6A). In addition, one of these loops reaches close to the catalytic center of the counterpart of the homodimer, implying its possible involvement in the catalytic reaction or substrate binding (Fig. S5A). Moreover, we found that the two enabling loops were present only in one subfamily BtGH97a and absent in the other BtGH97b (Fig. S5B). Based on the facts that the evolutionary conservation of these loops is directly related to the catalytic mechanism and at the same time these loops mediate the formation of oligomeric states, we can conclude that catalytic mechanism and binding specificity can be modulated through different functional oligomeric states. This conclusion has indeed been confirmed by recent studies (43) for glycosyl hydrolases and will be discussed further in the following sections of our paper. Fig. 6 B and C shows examples of the β-strand and α-helix extensions. For example, in the pair of an aminoimidazole riboside kinase and a fructokinase (cd01167), the extension of two strands on the homodimer interface enables the formation of a β-sheet between two subunits.

Fig. 6.

Illustration of enabling features with the extension of existing secondary structure elements. (A) 2D73—3A24 from the glycoside hydrolase family (pfam10566). Two longer loops create an interface that is absent from the monomer (B) 1TYY—2QHP from the Fructokinases family (cd01167). The β-sheet is more extensive in the homodimer than in the monomer. (C) 3E3A—3E0X from the Esterases and lipases family (cd00312). Two α-helices are extended in the homodimer compared to the monomer. In all figures, the two subunits of the homodimers are shown in light-blue and light-red. Enabling regions are represented by red tubes, whereas corresponding regions in the monomer are shown by yellow tubes.

We analyzed the structural mechanisms preventing dimer formation as illustrated by the example of phosphonate monoester hydrolase (cd00016). As can be seen in Fig. 7, a disabling region, shown in yellow, fills a part of the interaction surface and disrupts the dimer formation. The C-terminal region in the dimer, shown in blue, forms the interface while in the monomer this interface is covered by its own C-terminal region which increases its stability. Interestingly, it has been shown experimentally that truncation of the C-terminal region that forms the homodimer interface severely affects the enzyme activity (44) and moreover, the effect of the interface truncation is much larger than the effect of the active site mutations.

Fig. 7.

Illustration of disabling features of 2VQR—3B5Q from the phosphonate monoester hydrolase family (cd00016). (A) Front view of the interface in 2VQR (homodimer) with several disabling residues shown in yellow for 3B5Q (monomer). (B) Side view of the interface. Interface and surface on one subunit of the homodimer are shown in dark-red and light-red. The binding region (C-terminal region) of the other subunit is represented by the blue tube. A disabling region that fills a part of the interface is shown in yellow.

Annotating Unknown Oligomeric States.

The experimental identification of oligomeric states is a tedious task while the computational annotation is complicated by the limited conservation of oligomeric states in evolution (7, 26). Detecting features which mediate or disrupt the oligomer formation can help in this respect to predict or verify the existing oligomeric states and biological binding modes. Using our representative set of dimers and monomers with enabling and disabling features from Table S1 we annotated the homooligomeric states for all protein entries from our set (excluding proteins from the representative set and protein pairs from the 90–100% identity bin, Fig. S1A). Although, as can be seen in Table S4, the accuracy of oligomeric state prediction depends considerably on the family, the simultaneous presence of enabling and disabling features in different members of the family considerably facilitates the prediction, achieving close to 100% accuracy. Overall, the presence/absence of enabling or disabling features achieves a very good classification accuracy of 0.70 sensitivity, 0.74 specificity, and 0.94 precision (Table 1) even though this approach does not explicitly use the information about the level of similarity between unknown and annotated proteins. High precision of our prediction comes from the low rate of false positives once the enabling/disabling features are detected in the family and does not drop considerably even for remote homology range between unknown and annotated proteins. At the same time, similarity search methods including BLAST can also predict oligomeric states by inference from homologs with the accuracy considerably higher than expected from random assignments (sensitivity 0.74, specificity 0.53, precision 0.89).

Table 1.

Prediction accuracy of oligomeric states

Method                                               Sensitivity    Specificity    Precision      Error rate
                                                     TP/(TP + FN)   TN/(FP + TN)   TP/(TP + FP)   1 - TN/(FP + TN)
Presence/absence of enabling and disabling features  0.70           0.74           0.94           0.36
Percent identity                                     0.71           0.62           0.91           0.38
rmsd                                                 0.72           0.60           0.90           0.40
GSAS                                                 0.81           0.57           0.91           0.43
BLAST                                                0.74           0.53           0.89           0.47

Assignment of oligomeric states based on presence/absence of enabling/disabling features or based on the closest similarity to proteins with enabling/disabling features using different similarity metrics rmsd, GSAS, sequence identity, and BLAST p-value

Discussion

Mechanisms of Oligomerization Are Very Diverse.

From our analysis of different oligomeric states in proteins from the same family we observed several mechanisms which play key roles in oligomerization: (i) domain swapping; (ii) “ligand induced dimerization”; (iii) point mutations on dimer interfaces; (iv) posttranslational modifications including phosphorylation on dimer interfaces and disulfide bond formation between two subunits; and (v) insertions/deletions which favor dimeric or monomeric states. We focused in this paper on the last mechanism and showed that unaligned and gapped residues occurred more frequently on the interface than on the surface and about a quarter of all studied proteins and about 40% of all enzyme families had regions modulating the formation of dimers. Indeed, it was shown previously that oligomerization is especially important for enzyme activation/deactivation in metabolic pathways, for regulating their cellular location, and interactions with other components (3, 5, 6). The enabling and disabling regions may occur through insertion and deletion events in evolution and although these genetic events are relatively rare compared to point mutations (45), it was shown previously that many insertions and deletions are under strong selective pressure in proteins (46, 47). Our detailed analysis of biological functions, structures, and evolutionary conservation of proteins in different oligomeric states from the studied families revealed several main functional roles of enabling and disabling features.

Enabling Regions Are Important for Complex Stability.

First of all, enabling features can be important for stability of dimers. Indeed, the vast majority of dimers with enabling features from our representative set correspond to obligatory complexes and deletion of enabling features in the majority of cases leads to considerable destabilization of complexes. This scenario is realized for proteins from the dihydrofolate reductase family (DHFR, cd00209) which preferably exist as dimers in hyperthermophilic organisms and monomers in mesophiles. Moreover, monomers from this family have disabling regions so that they do not compromise catalytic activity for stability [dimers have reduced catalytic activity (48)]. The inspection of these dimers showed that they are stabilized through the enabling regions which form “clapping hands” conformations with an extensive set of contacts. We should distinguish this mechanism of dimer formation from the previously described “domain swapping” mechanism (19) which includes the opening up of the monomeric conformation and exchanging identical regions between two monomers. While domain swapping involves substantial conformational changes and the stabilities of swapped dimers and monomers might be comparable, dimers formed through enabling features may differ considerably in their energy of stabilization from the monomers and, as we showed, do not involve large intrasubunit conformational changes.

Enabling and Disabling Regions Are Important for Developing New Specificities and Separation of Functional Pathways.

In another scenario, enabling/disabling regions may correspond to the specificity determining features or loops characteristic only for certain subfamilies of dimers or monomers, which can provide high binding affinity to certain substrates while minimizing interactions with other unwanted partners. Examples of how enabling or disabling regions can regulate the accessibility of substrates to the active site by imposing certain steric constraints include the virulence factor choline-binding protein subfamily from the Metallo-beta-lactamase superfamily (cl00446) (49) or subfamilies from Trypsin-like serine proteases (cd00190) (50). From an evolutionary perspective, introducing enabling or disabling regions in a complex may allow the development of new specificities and prevent undesired interactions between close paralogs, therefore facilitating the separation of their functional pathways. Because oligomeric state often determines the binding affinity and activity of proteins, developing new specificities in different organisms or in paralogs from the same organism by changing the functional oligomeric forms could indeed be an important evolutionary mechanism. At the same time it is known that the activity of many oligomeric enzymes (many of which are presented in Table S1) and nuclear receptors are allosterically regulated, which can reinforce their specific binding (12).

To support the specificity-defining role of enabling and disabling regions, we showed that while enabling regions have a somewhat higher propensity to aggregate than the protein surface, their aggregation propensity and binding affinity are lower than those for the conventional aligned interfaces. Disabling regions have considerably lower aggregation propensity. Based on these results and amino acid composition analysis (showing that enabling/disabling regions contain a large fraction of polar and charged residues as well as Gly and Pro), we can conclude that most likely, enabling regions may play an important role in mediating specific (in many cases of electrostatic nature) interactions including salt bridges, aromatic ring pairing, and cation-π interactions. At the same time enabling regions may prevent nonspecific dysfunctional aggregation, shielding “sticky” hydrophobic patches of the conventional interface from the solvent. The effect of these regions is similar to the role of “gatekeeping” residues preventing nonspecific aggregation (41, 51). Disabling regions prevent any aggregation or association with other partners and might be specific in the sense of preventing the participation in nonspecific functional pathways, assisting the functional separation of paralogs. In our previous paper we demonstrated that specific function and binding selectivity in homodimers might be also sustained by disordered loops (52). We should mention that, although here we did not study explicitly the amino acid substitutions which can govern the complex formation, previous studies clearly showed that most of these substitutions involved aromatic and hydrophobic residues which increased binding affinity and stabilized the homooligomer (53, 54), but at the same time compromised the specificity and reversibility of the complex formation.
Finally, our analysis concludes that presence of enabling and disabling features can be a strong predictor of the oligomeric state once these features are found in the target protein, resulting in the lower false positive rate and high precision of the prediction.

Materials and Methods

Annotating Oligomeric States and Interaction Interfaces.

First we extracted all available protein chains from the Molecular Modeling Database (MMDB) (55). Then we selected single domain chains, which had only one domain covering at least 80% of the entire chain. The domain definitions and boundaries were obtained from the CDD version 2.17 (56), which is a curated collection of multiple sequence alignments of protein domains defined as recurrent evolutionary units having specific functions and structures. For each chain, oligomeric states and interaction interfaces were annotated and free energy of dissociation was calculated using the PISA program, which achieves 80–90% accuracy for the correct identification of macromolecular assemblies (37). Because the majority of homooligomers in MMDB are dimers and higher order homooligomeric states are much more difficult to decipher using existing methods, in the current study we concentrated on homodimers defined as assemblies consisting of two chains with at least 90% sequence identity.

Comparison of Homologs in Different Oligomeric States.

We searched for the most similar monomer-dimer pair of proteins from the same CDD family. Similarity was assessed with the VAST program (57) using the gapped structural alignment score (GSAS) (58). We compiled a full dataset including all protein pairs and a nonredundant dataset (similar monomer/dimer chains with BLAST p-value < 10e-07 were excluded). We then compared dimer and monomer structures for each pair of proteins to determine those features which might mediate or disrupt the formation of the dimer or monomer. Based on the VAST structure-structure superpositions between dimers and monomers, all protein regions were classified into three categories: aligned, gapped, and unaligned regions. Here, a gapped region refers to a set of consecutive residues that are present in a dimer and absent from the monomer (or vice versa), whereas an unaligned region includes all the residues except for aligned or gapped residues. We also classified residues into three categories in terms of their location on the structure: interface, surface, and buried residues. Interface, surface, and buried residues were defined by PISA. We searched for the regions that enabled or disabled the formation of the interface in homodimers. An enabling region is defined as a gapped region on homodimers where 80% of the gapped residues are also annotated as interface residues. A disabling region is defined as a gapped region on monomers that is surrounded by two aligned residues which correspond to interface residues on the dimer.
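The two region definitions above can be sketched in a few lines of code. This is a minimal illustration, not the authors' pipeline: it assumes the per-residue annotations (gapped or not, interface or not) have already been derived from a structural alignment and an interface assignment, it reads "80%" as "at least 80%", and it skips gapped runs at chain termini for the disabling rule because they lack two flanking residues. All function and argument names are hypothetical.

```python
def gapped_runs(is_gapped):
    """Return (start, end) index pairs, inclusive, for runs of consecutive gapped residues."""
    runs, start = [], None
    for i, gapped in enumerate(is_gapped):
        if gapped and start is None:
            start = i
        elif not gapped and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(is_gapped) - 1))
    return runs


def enabling_regions(dimer_is_gapped, dimer_is_interface, min_fraction=0.8):
    """Gapped runs on the dimer chain whose interface fraction reaches min_fraction.

    ASSUMPTION: the threshold is inclusive (>= 80%).
    """
    regions = []
    for start, end in gapped_runs(dimer_is_gapped):
        length = end - start + 1
        n_interface = sum(dimer_is_interface[start:end + 1])
        if n_interface / length >= min_fraction:
            regions.append((start, end))
    return regions


def disabling_regions(monomer_is_gapped, maps_to_dimer_interface):
    """Gapped runs on the monomer flanked on both sides by residues aligned to dimer interface residues.

    maps_to_dimer_interface[i] is True when monomer residue i is aligned to a
    dimer residue annotated as interface.
    ASSUMPTION: runs touching a chain terminus are skipped (no two flanks).
    """
    regions = []
    last = len(monomer_is_gapped) - 1
    for start, end in gapped_runs(monomer_is_gapped):
        if start == 0 or end == last:
            continue
        if maps_to_dimer_interface[start - 1] and maps_to_dimer_interface[end + 1]:
            regions.append((start, end))
    return regions


# Toy example with made-up annotations (1 = True, 0 = False).
dimer_gapped = [0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0]
dimer_iface = [0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
print(enabling_regions(dimer_gapped, dimer_iface))

mono_gapped = [0, 0, 1, 1, 0, 0, 1, 0]
mono_maps = [0, 1, 0, 0, 1, 0, 0, 0]
print(disabling_regions(mono_gapped, mono_maps))
```

In the toy dimer, the first gapped run has four of five residues on the interface and qualifies, while the second has none and does not.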

Calculating Amino Acid Composition and Aggregation Propensity.

We calculated amino acid propensities to be located preferentially in different regions of proteins as $\mathrm{Propensity}_i = \ln\left(f_i^{\mathrm{interface}} / f_i^{\mathrm{surface}}\right)$, where $f_i^{\mathrm{interface}}$ and $f_i^{\mathrm{surface}}$ are the fractions of amino acid i on the interface and surface, respectively. The aggregation propensity of regions was calculated using the Zyggregator method, which considers physico-chemical properties, hydrophobicity, charges, the propensities to α-helix or β-sheet, as well as residue context and the presence of specific patterns of alternating hydrophobic and hydrophilic residues (40) (see SI Text for details).
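As an illustration of the composition comparison, the sketch below computes a per-residue propensity from two residue collections. It assumes a log-ratio form, ln(fraction on interface / fraction on surface); that form, the function names, and the decision to skip amino acids absent from either set are assumptions for this example, not details confirmed by the text. The Zyggregator calculation is not reproduced here.

```python
import math
from collections import Counter


def fractions(residues):
    """Return the fraction of each amino acid in a collection of residues."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def propensities(interface_residues, surface_residues):
    """Log ratio of interface to surface fractions for each amino acid.

    Assumption: amino acids missing from either set are skipped, because
    the log ratio is undefined when a fraction is zero.
    """
    f_int = fractions(interface_residues)
    f_surf = fractions(surface_residues)
    return {
        aa: math.log(f_int[aa] / f_surf[aa])
        for aa in f_int
        if aa in f_surf
    }
```

Under this form, a positive value means the amino acid is enriched on the interface relative to the surface, a negative value means it is depleted, and zero means the two fractions are equal.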

Estimating Accuracy for Predicting Oligomeric States.

Each protein was compared to the representative set of proteins with enabling/disabling features from the same family (Tables S1 and S4). The oligomeric state of the unknown protein was then assigned based either on the presence/absence of enabling/disabling features or on the closest similarity to annotated proteins with enabling/disabling features, using different similarity metrics: rmsd, GSAS, sequence identity, and BLAST p-value. CDD families were classified into three types based on the presence of the enabling and disabling features: families with only enabling regions, families with only disabling regions, and families with both enabling and disabling regions. The assignment of oligomeric state was based on the fraction of enabling/disabling regions aligned: if more than 80% of the enabling region was aligned to the test protein, the protein was predicted to be a dimer, and if more than 80% of the disabling region in a protein was aligned, it was predicted to be a monomer (see details in Table S4). True positives (TP) were defined as correctly predicted oligomeric states and false positives (FP) as incorrectly predicted oligomeric states. True negatives (TN) were defined as proteins that did not pass the prediction criteria and were not assigned the wrong oligomeric state, and false negatives (FN) as proteins that did not pass the prediction criteria and were not assigned the correct oligomeric state. Sensitivity was calculated as TP/(TP + FN), specificity as TN/(FP + TN), error rate as one minus the specificity, and precision as TP/(TP + FP).
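The feature-based assignment rule and the accuracy measures can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the aligned fractions are assumed to have been computed from the structural alignment beforehand, and checking the enabling region before the disabling region in families that have both is an arbitrary choice made here (the actual procedure is detailed in Table S4).

```python
def predict_state(enabling_fraction=None, disabling_fraction=None, threshold=0.80):
    """Assign an oligomeric state from the aligned fraction of a feature.

    Each argument is the fraction (0 to 1) of the family's enabling or
    disabling region that aligns to the test protein, or None if the
    family has no such region. Returns None when no criterion is passed.
    """
    if enabling_fraction is not None and enabling_fraction > threshold:
        return "dimer"
    if disabling_fraction is not None and disabling_fraction > threshold:
        return "monomer"
    return None


def accuracy_measures(tp, fp, tn, fn):
    """Compute the four measures from counts; denominators must be nonzero."""
    specificity = tn / (fp + tn)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": specificity,
        "error_rate": 1.0 - specificity,
        "precision": tp / (tp + fp),
    }
```

For instance, `predict_state(enabling_fraction=0.9)` yields a dimer assignment, while a protein that aligns to only half of an enabling region receives no assignment and is counted as a TN or FN depending on its true state.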

Acknowledgments.

We thank Steve Bryant for insightful discussions and Tom Madej for careful reading of the manuscript. This work was supported by the National Institutes of Health/Department of Health and Human Services (DHHS) (Intramural Research Program of the National Library of Medicine). K.H. was supported by a JSPS Research Fellowship from the Japan Society for the Promotion of Science.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1012999107/-/DCSupplemental.

References

  • 1. Cornish-Bowden AJ, Koshland DE Jr. The quaternary structure of proteins composed of identical subunits. J Biol Chem. 1971;246:3092–3102.
  • 2. Jones S, Thornton JM. Protein-protein interactions: a review of protein dimer structures. Prog Biophys Mol Biol. 1995;63:31–65. doi: 10.1016/0079-6107(94)00008-w.
  • 3. Torshin I. Activating oligomerization as intermediate level of signal transduction: analysis of protein-protein contacts and active sites in several glycolytic enzymes. Front Biosci. 1999;4:D557–570. doi: 10.2741/torshin1.
  • 4. Woodcock JM, Murphy J, Stomski FC, Berndt MC, Lopez AF. The dimeric versus monomeric status of 14-3-3zeta is controlled by phosphorylation of Ser58 at the dimer interface. J Biol Chem. 2003;278:36323–36327. doi: 10.1074/jbc.M304689200.
  • 5. Mazurek S, Boschek CB, Hugo F, Eigenbrodt E. Pyruvate kinase type M2 and its role in tumor growth and spreading. Semin Cancer Biol. 2005;15:300–308. doi: 10.1016/j.semcancer.2005.04.009.
  • 6. Koike R, Kidera A, Ota M. Alteration of oligomeric state and domain architecture is essential for functional transformation between transferase and hydrolase with the same scaffold. Protein Sci. 2009;18:2060–2066. doi: 10.1002/pro.218.
  • 7. Dayhoff JE, Shoemaker BA, Bryant SH, Panchenko AR. Evolution of protein binding modes in homooligomers. J Mol Biol. 2010;395:860–870. doi: 10.1016/j.jmb.2009.10.052.
  • 8. Baisamy L, Jurisch N, Diviani D. Leucine zipper-mediated homo-oligomerization regulates the Rho-GEF activity of AKAP-Lbc. J Biol Chem. 2005;280:15405–15412. doi: 10.1074/jbc.M414440200.
  • 9. Goodsell DS, Olson AJ. Structural symmetry and protein function. Annu Rev Biophys Biomol Struct. 2000;29:105–153. doi: 10.1146/annurev.biophys.29.1.105.
  • 10. Chen CP, Posy S, Ben-Shaul A, Shapiro L, Honig BH. Specificity of cell-cell adhesion by classical cadherins: Critical role for low-affinity dimerization through beta-strand swapping. Proc Natl Acad Sci USA. 2005;102:8531–8536. doi: 10.1073/pnas.0503319102.
  • 11. Reid AJ, Ranea JA, Orengo CA. Comparative evolutionary analysis of protein complexes in E. coli and yeast. BMC Genomics. 2010;11:79. doi: 10.1186/1471-2164-11-79.
  • 12. Changeux JP, Edelstein SJ. Allosteric mechanisms of signal transduction. Science. 2005;308:1424–1428. doi: 10.1126/science.1108595.
  • 13. Miller S, Lesk AM, Janin J, Chothia C. The accessible surface area and stability of oligomeric proteins. Nature. 1987;328:834–836. doi: 10.1038/328834a0.
  • 14. Henrick K, Thornton JM. PQS: a protein quaternary structure file server. Trends Biochem Sci. 1998;23:358–361. doi: 10.1016/s0968-0004(98)01253-5.
  • 15. Ispolatov I, Yuryev A, Mazo I, Maslov S. Binding properties and evolution of homodimers in protein-protein interaction networks. Nucleic Acids Res. 2005;33:3629–3635. doi: 10.1093/nar/gki678.
  • 16. Wright CF, Teichmann SA, Clarke J, Dobson CM. The importance of sequence diversity in the aggregation and evolution of proteins. Nature. 2005;438:878–881. doi: 10.1038/nature04195.
  • 17. Lukatsky DB, Shakhnovich BE, Mintseris J, Shakhnovich EI. Structural similarity enhances interaction propensity of proteins. J Mol Biol. 2007;365:1596–1606. doi: 10.1016/j.jmb.2006.11.020.
  • 18. Andre I, Strauss CE, Kaplan DB, Bradley P, Baker D. Emergence of symmetry in homooligomeric biological assemblies. Proc Natl Acad Sci USA. 2008;105:16148–16152. doi: 10.1073/pnas.0807576105.
  • 19. Bennett MJ, Schlunegger MP, Eisenberg D. 3D domain swapping: a mechanism for oligomer assembly. Protein Sci. 1995;4:2455–2468. doi: 10.1002/pro.5560041202.
  • 20. D’Alessio G. Oligomer evolution in action? Nat Struct Biol. 1995;2:11–13. doi: 10.1038/nsb0195-11.
  • 21. Xu D, Tsai CJ, Nussinov R. Mechanism and evolution of protein dimerization. Protein Sci. 1998;7:533–544. doi: 10.1002/pro.5560070301.
  • 22. Tiana G, Broglia RA. Design and folding of dimeric proteins. Proteins. 2002;49:82–94. doi: 10.1002/prot.10196.
  • 23. Pereira-Leal JB, Teichmann SA. Novel specificities emerge by stepwise duplication of functional modules. Genome Res. 2005;15:552–559. doi: 10.1101/gr.3102105.
  • 24. Hashimoto K, Madej T, Bryant SH, Panchenko AR. Functional states of homooligomers: insights from the evolution of glycosyltransferases. J Mol Biol. 2010;399:196–206. doi: 10.1016/j.jmb.2010.03.059.
  • 25. Pereira-Leal JB, Levy ED, Kamp C, Teichmann SA. Evolution of protein complexes by duplication of homomeric interactions. Genome Biol. 2007;8:R51. doi: 10.1186/gb-2007-8-4-r51.
  • 26. Levy ED, Boeri Erba E, Robinson CV, Teichmann SA. Assembly reflects evolution of protein complexes. Nature. 2008;453:1262–1265. doi: 10.1038/nature06942.
  • 27. Katsamba P, et al. Linking molecular affinity and cellular specificity in cadherin-mediated adhesion. Proc Natl Acad Sci USA. 2009;106:11594–11599. doi: 10.1073/pnas.0905349106.
  • 28. Naveed H, Jackups R Jr, Liang J. Predicting weakly stable regions, oligomerization state, and protein-protein interfaces in transmembrane domains of outer membrane proteins. Proc Natl Acad Sci USA. 2009;106:12735–12740. doi: 10.1073/pnas.0902169106.
  • 29. Joerger AC, et al. Structural evolution of p53, p63, and p73: implication for heterotetramer formation. Proc Natl Acad Sci USA. 2009;106:17705–17710. doi: 10.1073/pnas.0905867106.
  • 30. Akiva E, Itzhaki Z, Margalit H. Built-in loops allow versatility in domain-domain interactions: lessons from self-interacting domains. Proc Natl Acad Sci USA. 2008;105:13292–13297. doi: 10.1073/pnas.0801207105.
  • 31. Schuster-Bockler B, Bateman A. Protein interactions in human genetic diseases. Genome Biol. 2008;9:R9. doi: 10.1186/gb-2008-9-1-r9.
  • 32. Sun Y, et al. Mechanism of glutamate receptor desensitization. Nature. 2002;417:245–253. doi: 10.1038/417245a.
  • 33. Matsumiya S, Ishino S, Ishino Y, Morikawa K. Intermolecular ion pairs maintain the toroidal structure of Pyrococcus furiosus PCNA. Protein Sci. 2003;12:823–831. doi: 10.1110/ps.0234503.
  • 34. Estebanez-Perpina E, et al. Structural insight into the mode of action of a direct inhibitor of coregulator binding to the thyroid hormone receptor. Mol Endocrinol. 2007;21:2919–2928. doi: 10.1210/me.2007-0174.
  • 35. Shoemaker BA, et al. Inferred Biomolecular Interaction Server--a web server to analyze and predict protein interacting partners and binding sites. Nucleic Acids Res. 2009;38:D518–D524. doi: 10.1093/nar/gkp842.
  • 36. Zhu H, Domingues FS, Sommer I, Lengauer T. NOXclass: prediction of protein-protein interaction types. BMC Bioinformatics. 2006;7:27. doi: 10.1186/1471-2105-7-27.
  • 37. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007;372:774–797. doi: 10.1016/j.jmb.2007.05.022.
  • 38. Ofran Y, Rost B. Analyzing six types of protein-protein interfaces. J Mol Biol. 2003;325:377–387. doi: 10.1016/s0022-2836(02)01223-8.
  • 39. Rauscher S, Baud S, Miao M, Keeley FW, Pomes R. Proline and glycine control protein self-organization into elastomeric or amyloid fibrils. Structure. 2006;14:1667–1676. doi: 10.1016/j.str.2006.09.008.
  • 40. Tartaglia GG, Vendruscolo M. The Zyggregator method for predicting protein aggregation propensities. Chem Soc Rev. 2008;37:1395–1401. doi: 10.1039/b706784b.
  • 41. Pechmann S, Levy ED, Tartaglia GG, Vendruscolo M. Physicochemical principles that regulate the competition between functional and dysfunctional association of proteins. Proc Natl Acad Sci USA. 2009;106:10159–10164. doi: 10.1073/pnas.0812414106.
  • 42. Chen Y, Dokholyan NV. Natural selection against protein aggregation on self-interacting and essential proteins in yeast, fly, and worm. Mol Biol Evol. 2008;25:1530–1533. doi: 10.1093/molbev/msn122.
  • 43. Kitamura M, et al. Structural and functional analysis of a glycoside hydrolase family 97 enzyme from Bacteroides thetaiotaomicron. J Biol Chem. 2008;283:36328–36337. doi: 10.1074/jbc.M806115200.
  • 44. Jonas S, van Loo B, Hyvonen M, Hollfelder F. A new member of the alkaline phosphatase superfamily with a formylglycine nucleophile: structural and kinetic characterization of a phosphonate monoester hydrolase/phosphodiesterase from Rhizobium leguminosarum. J Mol Biol. 2008;384:120–136. doi: 10.1016/j.jmb.2008.08.072.
  • 45. Benner SA, Cohen MA, Gonnet GH. Empirical and structural models for insertions and deletions in the divergent evolution of proteins. J Mol Biol. 1993;229:1065–1082. doi: 10.1006/jmbi.1993.1105.
  • 46. Jiang H, Blouin C. Insertions and the emergence of novel protein structure: a structure-based phylogenetic study of insertions. BMC Bioinformatics. 2007;8:444. doi: 10.1186/1471-2105-8-444.
  • 47. Wolf Y, Madej T, Babenko V, Shoemaker B, Panchenko AR. Long-term trends in evolution of indels in protein sequences. BMC Evol Biol. 2007;7:19. doi: 10.1186/1471-2148-7-19.
  • 48. Dams T, et al. The crystal structure of dihydrofolate reductase from Thermotoga maritima: molecular features of thermostability. J Mol Biol. 2000;297:659–672. doi: 10.1006/jmbi.2000.3570.
  • 49. Garau G, Lemaire D, Vernet T, Dideberg O, Di Guilmi AM. Crystal structure of phosphorylcholine esterase domain of the virulence factor choline-binding protein E from Streptococcus pneumoniae: new structural features among the metallo-beta-lactamase superfamily. J Biol Chem. 2005;280:28591–28600. doi: 10.1074/jbc.M502744200.
  • 50. Estebanez-Perpina E, et al. Crystal structure of the caspase activator human granzyme B, a proteinase highly specific for an Asp-P1 residue. Biol Chem. 2000;381:1203–1214. doi: 10.1515/BC.2000.148.
  • 51. Rousseau F, Serrano L, Schymkowitz JW. How evolutionary pressure against protein aggregation shaped chaperone specificity. J Mol Biol. 2006;355:1037–1047. doi: 10.1016/j.jmb.2005.11.035.
  • 52. Fong JH, et al. Intrinsic disorder in protein interactions: insights from a comprehensive structural analysis. PLoS Comput Biol. 2009;5:e1000316. doi: 10.1371/journal.pcbi.1000316.
  • 53. Nishi H, Ota M. Amino acid substitutions at protein-protein interfaces that modulate the oligomeric state. Proteins. 2010;78:1563–1574. doi: 10.1002/prot.22673.
  • 54. Grueninger D, et al. Designed protein-protein association. Science. 2008;319:206–209. doi: 10.1126/science.1150421.
  • 55. Wang Y, et al. MMDB: annotating protein sequences with Entrez’s 3D-structure database. Nucleic Acids Res. 2007;35:D298–D300. doi: 10.1093/nar/gkl952.
  • 56. Marchler-Bauer A, et al. CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res. 2009;37:D205–210. doi: 10.1093/nar/gkn845.
  • 57. Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr Opin Struct Biol. 1996;6:377–385. doi: 10.1016/s0959-440x(96)80058-3.
  • 58. Kolodny R, Koehl P, Levitt M. Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J Mol Biol. 2005;346:1173–1188. doi: 10.1016/j.jmb.2004.12.032.
