Protein contacts atlas: visualization and analysis of non-covalent contacts in biomolecules

Melis Kayikci; A J Venkatakrishnan; James Scott-Brown; Charles N J Ravarani; Tilman Flock; M Madan Babu

doi:10.1038/s41594-017-0019-z

Author manuscript; available in PMC: 2018 Jul 15.

Published in final edited form as: Nat Struct Mol Biol. 2018 Jan 15;25(2):185–194. doi: 10.1038/s41594-017-0019-z

PMCID: PMC5837000 EMSID: EMS75372 PMID: 29335563

Abstract

Visualizations of biomolecular structures empower us to gain insights into biological functions, generate testable hypotheses, and communicate biological concepts. Typical visualizations (e.g. ball and stick) primarily depict covalent bonds. In contrast, non-covalent contacts between atoms, which govern normal physiology, pathogenesis, and drug action, are seldom visualized. We present Protein Contacts Atlas, an interactive resource of non-covalent contacts from over 100,000 PDB crystal structures. We developed multiple representations for visualization and analysis of non-covalent contacts at different scales of organization: atoms, residues, secondary structure, subunits, and entire complexes. Protein Contacts Atlas enables researchers from different disciplines to investigate diverse questions in the framework of non-covalent contacts, including the interpretation of allostery, disease mutations and polymorphisms, by exploring individual subunits, interfaces and protein-ligand contacts, and by mapping external information. Protein Contacts Atlas is available at http://www.mrc-lmb.cam.ac.uk/pca/ and also through PDBe.

Introduction

Elucidating the structure of biomolecules has provided insights into how they carry out their function¹⁻³. These insights have depended both on advances in methods for determining structures (initially X-ray crystallography⁴ and NMR spectroscopy⁵, and more recently electron microscopy⁶) and on approaches for visualizing these structures. Historically, graphical representations of biomolecules, such as the ball and stick representation⁷, have focused on covalent bonds that connect individual atoms. Such a representation emphasizes the 3D arrangement of atoms and the covalent bonds between them. This has been critical for understanding how function is mediated by the precise spatial localization of atoms in a biomolecule. Computational analyses of covalent bonds have been instrumental in uncovering the principles of protein architecture⁸⁻¹¹. Similarly, the calculation of dihedral angles around the covalent bond, and their representation in the Ramachandran plot¹², charted the conformational landscape of polypeptides. The ribbon diagram, which was popularized as the Richardson diagram¹³, focused on the covalent backbone architecture and was revolutionary in providing a simplified protein structure representation. This enabled the identification of structural domains, establishment of the structure-function relationship, and classification of protein families¹⁴⁻¹⁶. In this manner, each of these representations centered on the covalent bond emphasizes a key aspect of structure and has formed the basis of deriving new insights and discoveries.

In addition to covalent bonds, non-covalent contacts between atoms of residues (residue contacts) are important for co-operative folding of biomolecules, stability and conformational flexibility, and in molecular recognition. Representation of non-covalent contacts dates back to the 1970s in the form of contact matrices¹⁷ and networks of contacts between amino acids in proteins¹⁸. More recently, a number of web tools that compute contact networks have facilitated progress in this area of research¹⁹⁻²³. Network representations of non-covalent contacts, and their comparison among related structures, have provided insights into allosteric mechanisms, protein stability, conformational switching, ligand binding and the determinants of protein fold and protein complex assembly¹⁷,²⁴⁻³⁴. Recently, we employed this approach to provide detailed molecular insights into the family of G protein-coupled receptors and G proteins³⁵⁻³⁸. In this manner, a residue contact-based representation and analysis of protein structures enables us to identify critical contacts and holds the potential for understanding how biomolecules function in new ways and for engineering their activity³⁹⁻⁴¹.

Here, we present Protein Contacts Atlas (http://www.mrc-lmb.cam.ac.uk/pca/), a resource of non-covalent contacts from over 110,000 publicly available structures in the Protein Data Bank (PDB⁴²; http://www.rcsb.org/). The goal of Protein Contacts Atlas is to go beyond computing contact networks; for the exploration of contacts in structures, we developed interactive representations tailored for different scales of structural organization: atoms, residues, secondary structure, subunits, interfaces and entire biological complexes (Fig. 1). Protein Contacts Atlas also enables investigation of contacts within a single protein or a protein complex, or between a protein and nucleic acids, ligands or other small molecules (Fig. 1). It also permits quantitative analyses of the residue-centric properties derived from the contact network along with externally obtained properties such as evolutionary conservation and thermostability measurements. Outlier residues from the analyses that have the potential to inform follow-up experiments are compiled in a downloadable report. Here, we describe the visualization and analysis of non-covalent contacts in Protein Contacts Atlas, which can be readily applied to any system, by focusing on diverse proteins involved in the GPCR signaling pathway.

Visualizations in Protein Contacts Atlas. Summary of the representations for different scales of organization: atoms, residues, secondary structure, subunits, interfaces and entire biological complexes.

Results

Computing non-covalent contacts

We identified non-covalent contacts by calculating the distance between each pair of atoms using their atomic coordinates. We then compared the distances to the van der Waals radii of the corresponding atoms as determined by Chothia et al.⁴³. The sum of the two atomic radii was subtracted from their distance and a contact was assigned if the resulting difference was less than a threshold. The set of all non-covalent atomic contacts defines a residue contact network, in which each node represents a residue, and a pair of nodes is joined by an edge if there is at least one non-covalent atomic contact between the corresponding residues (the number of such contacts is recorded as the edge weight). For each residue, we computed local and global network centrality properties such as degree, closeness, and betweenness centrality (see Methods for details). We also quantified the solvent accessible surface area (ASA; Å²) for each residue based on the entire PDB file. Through this approach we identified ~2 billion non-covalent atomic contacts in over 110,000 crystal structures from the PDB (see Supplementary Data for non-covalent contact statistics for each PDB file). As a general trend, we find that the number of non-covalent contacts scales linearly with the size of the molecule, i.e. the number of atoms and residues (Supplementary Fig. 1), suggesting that this relationship can be used to infer the tightness of protein packing.
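The step from atomic contacts to a weighted residue contact network can be illustrated with a minimal Python sketch. It is not the authors' C++ implementation: the input format, the function names and the residue labels are hypothetical, and the sketch assumes that contacts between atoms of the same residue do not create an edge.

```python
from collections import defaultdict

def residue_contact_network(atomic_contacts):
    """Collapse atomic contacts into weighted residue-residue edges.

    atomic_contacts: iterable of ((residue_a, atom_a), (residue_b, atom_b))
    pairs (a hypothetical input format). Returns a dict that maps each
    unordered residue pair to its number of atomic contacts (edge weight).
    """
    edges = defaultdict(int)
    for (res_a, _atom_a), (res_b, _atom_b) in atomic_contacts:
        if res_a != res_b:  # assumption: intra-residue contacts form no edge
            edges[frozenset((res_a, res_b))] += 1
    return dict(edges)

def degree(edges):
    """Degree of each residue: the number of distinct residues it contacts."""
    neighbours = defaultdict(set)
    for pair in edges:
        res_a, res_b = tuple(pair)
        neighbours[res_a].add(res_b)
        neighbours[res_b].add(res_a)
    return {res: len(nbrs) for res, nbrs in neighbours.items()}

# Toy input with made-up residue labels, for illustration only.
contacts = [
    (("A:10", "CZ"), ("A:14", "CE")),
    (("A:10", "CE1"), ("A:14", "CD")),
    (("A:10", "CB"), ("A:11", "N")),
]
edges = residue_contact_network(contacts)
degrees = degree(edges)
```

In this toy input, residues A:10 and A:14 share two atomic contacts, so their edge has weight 2, while A:10 has degree 2 because it contacts two distinct residues.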

We provide different filtering and contact definition options where users can select a threshold in terms of the absolute number of atomic contacts between residues, and/or a threshold normalized with respect to the size of the amino acid, to define, view and analyze stronger or weaker interactions. They can also filter contacts based on whether the atoms are from the main-chain/side-chain of residues and identify contacts by their type (i.e. hydrogen bonds, water mediated hydrogen bonds, weak hydrogen bonds, ligand and metal complex interactions, salt bridges, hydrophobic interactions, cation-pi interactions, pi-pi interactions, and other non-canonical contacts), using standard geometric considerations and the definitions used in Arpeggio²¹. Thus, the user has a number of different options to parameterize, filter and/or choose specific contact types for visualization and analysis in Protein Contacts Atlas.

Visualization of non-covalent contacts

We describe representations that allow intuitive and interactive visualization and analysis of residue contacts (Supplementary Note 1; Fig. 2; Supplementary Fig. 2). We highlight how representations of residue contacts of the same molecule at different scales of organization can provide new insights into structure and function that are not obvious from standard representations.

Protein Contacts Atlas framework. Summary of Protein Contacts Atlas’s framework for visualizing and analyzing contacts.

Biomolecular complex network enables visualization of contacts at the subunit level

How protein (or nucleic acid) subunits interact with each other in a complex is important for understanding the evolution and assembly pathways of the complexes³². The biomolecular complex network representation captures the interactions between subunits of a complex. In this network, the nodes denote individual subunits, which could be proteins or nucleic acids. The links between the nodes denote interaction interfaces between subunits (chains). The size of the node is proportional to the number of residues in the chain and thickness of the link is proportional to the number of residue contacts between the subunits. Such a simplified interactive representation of the entire complex provides an intuitive way to navigate and identify subunits or interfaces of interest for further investigation; particularly when investigating large, multi-subunit complexes (e.g. proteasome). Choosing a subunit, or an interface, takes the user to the “Visualization and Analysis” page (Supplementary Fig. 2).

Chord plot enables visualization of contacts at the secondary structure level

The pattern of contacts between the different secondary structure elements is key to determining the tertiary structure, and hence protein function. ‘Chord plots’ depict all non-covalent contacts at the level of secondary structures. In a chord plot, every secondary structure (including the loops) is represented as an arc (nodes) in a circular layout, and the contacts between the secondary structures are represented as chords (edges). The size of the arc is proportional to the number of residues within the secondary structure and the thickness of the chord is proportional to the number of atomic contacts between them. The chord plot representation provides information about the packing of the different secondary structures and helps identify the secondary structures that are highly connected in the protein structure.
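The aggregation behind a chord plot can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the input format (one residue pair per atomic contact) and the element labels are hypothetical, and excluding contacts within the same secondary structure element is an assumption rather than documented behavior of the website.

```python
from collections import Counter

def chord_weights(atomic_contacts, sse_of_residue):
    """Count atomic contacts between pairs of secondary structure elements.

    atomic_contacts: list of (residue_a, residue_b), one entry per atomic
    contact. sse_of_residue: dict mapping each residue to its element label.
    """
    weights = Counter()
    for res_a, res_b in atomic_contacts:
        sse_a, sse_b = sse_of_residue[res_a], sse_of_residue[res_b]
        if sse_a != sse_b:  # assumption: no chord within a single element
            weights[tuple(sorted((sse_a, sse_b)))] += 1
    return weights

def arc_sizes(sse_of_residue):
    """Arc size of each element: the number of residues it contains."""
    return Counter(sse_of_residue.values())

# Toy data with made-up labels (H = helix, L = loop).
sse = {"r1": "H1", "r2": "H1", "r3": "H2", "r4": "L1"}
contacts = [("r1", "r3"), ("r2", "r3"), ("r3", "r4"), ("r1", "r2")]
chords = chord_weights(contacts, sse)
arcs = arc_sizes(sse)
```

Here the chord between H1 and H2 has thickness 2, the chord between H2 and L1 has thickness 1, and the arc for H1 spans two residues.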

Residue contact matrix enables visualization of contacts at the residue level

Identifying specific contacts between amino acids present on different secondary structure elements helps with inferring the key residues that contribute to protein fold and function. The residue contact matrix presents the non-covalent contacts between residues in the secondary structure elements (selected from Chord plots) and displays the number of atomic contacts between them (Supplementary Fig. 2). Every cell in the matrix has a background color based on the number of atomic contacts. This allows easy identification of residue pairs that make a large number of contacts. The residue contact matrix is particularly useful for investigating the atomic details of interaction interfaces.

The multi-level visualization of non-covalent contacts in the context of a protein-protein interaction (between the Adenosine A2a receptor and an engineered mini Gαs protein⁴⁴) and a protein-nucleic acid interaction (between p53 and DNA⁴⁵) is highlighted in Fig. 3. Such representations can also provide a non-covalent contact-based context to generate testable hypotheses for understanding the molecular mechanisms of disease-associated mutations, as in pseudo-hypoparathyroidism and Albright hereditary osteodystrophy (Supplementary Note 2).

Visualization of protein-protein and protein-DNA interaction interfaces. a, The biomolecular complex network of the Adenosine A2a-mini-Gs structure (5G53) with four chains as nodes and interactions between them as edges is seen with chains A (A2a adenosine receptor; dark blue) and C (mini Gαs protein; light blue) highlighted. The chord plot of contacts between chains A (outer arc in light grey) and C (outer arc in dark grey) of the complex is seen next to it. The inner arcs show the secondary structures in their respective colors with loops as light grey. The selected chord shows the contacts between Helix 39 of the G protein (green) and Helix 11 of the receptor (purple). The residue contact matrix of the interface is also shown along with a network view of the receptor-G protein interaction interface (right). Positions that are mutated in pseudo-hypoparathyroidism (L388 ^G.H5.20; superscript denotes common G protein numbering system³⁶) are shown in the network view and highlighted. Contacts are represented as blue edges and nodes are represented as spheres (using the Cα atom co-ordinates of the residues). b, The biomolecular complex network of p53 in complex with DNA (4MZR) with chain A (p53; circle, dark blue) and chain K (DNA; square, light blue) is highlighted. The selected chord highlights the contacts between Sheet B3 of p53 (chain A; light grey, outer arc) with DNA (Chain K; dark grey, outer arc). The residue contact matrix shows that there are five atomic contacts between R273 of p53 and T20 of the DNA strand. The 3D structure view of the protein-DNA complex with position R273 (red) that forms a part of the interaction interface and whose mutation is implicated in cancer is highlighted on the right.

Visualization and analysis of residue-centric contacts and properties

Asteroid plot enables visualization of local neighbourhood of residues and ligands

Understanding the local neighbourhood of ligands and residues in a structure can aid protein engineering, structure-based drug design and interpretation of the effect of mutations (for example via schematic diagrams of protein-ligand interactions generated by programs such as LigPlot⁴⁶). ‘Asteroid plots’ provide an interactive representation of the atomic neighbourhood of a selected ligand or residue. The ligand or residue of interest is shown in the center as a node. All residues that form a non-covalent contact with it (first-shell residues) are arranged in a circle around the central node. The neighbours of each of the contacting residues that do not directly contact the ligand (second-shell residues) are arranged in a larger concentric circle. The size of the nodes in the inner and outer concentric circles denotes the number of atomic contacts. Clicking on any residue makes it the central node, and the asteroid plot for that entity is dynamically generated. This representation allows the identification of key residues that contact a ligand of interest in a structure. While the atomic details or the nature of the contact (e.g. main-chain, side-chain, etc.) are not shown in the asteroid plot, the contacting residues are highlighted in the 3D structure panel (Supplementary Note 1; Supplementary Fig. 2), enabling the analysis of the nature of the contact. Furthermore, detailed information about the non-covalent atomic contacts between the individual atoms of the ligand and the contacting residues can be visualized as a ligand-residue interaction matrix from this sub-panel (Fig. 1). Using the asteroid plot, we illustrate how the beta-blocker carazolol acts at its target, the human β2 adrenergic receptor (Fig. 4a-d; PDB 2RH1)⁴⁷. Please see Supplementary Note 2 for a discussion of how such plots can provide a context for generating hypotheses about the molecular basis of a receptor polymorphism linked to asthma.
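The first-shell and second-shell definitions used in the asteroid plot can be expressed as a short Python sketch. The function name, the adjacency-dictionary input and the node labels are hypothetical; the sketch only illustrates the shell definitions given above and is not the website's code.

```python
def asteroid_shells(neighbours, centre):
    """Return (first_shell, second_shell) around a central ligand or residue.

    neighbours: dict mapping each node to the set of nodes it contacts.
    First shell: nodes in direct contact with the centre.
    Second shell: contacts of first-shell nodes that do not themselves
    contact the centre.
    """
    first = set(neighbours.get(centre, ()))
    second = set()
    for res in first:
        second.update(neighbours.get(res, ()))
    second -= first          # drop residues that already touch the centre
    second.discard(centre)   # the centre is not part of any shell
    return first, second

# Toy contact map with made-up labels.
contact_map = {
    "LIG": {"A", "B"},
    "A": {"LIG", "B", "C"},
    "B": {"LIG", "A", "D"},
    "C": {"A"},
    "D": {"B"},
}
first_shell, second_shell = asteroid_shells(contact_map, "LIG")
```

Calling the same function with a residue as `centre` mirrors what happens when a user clicks on a residue to make it the central node.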

Visualization and analysis of protein-ligand contacts. a, Ligand binding pocket of β2 adrenergic receptor bound to the ligand Carazolol (CAU408; blue; 2RH1). All the directly contacting residues are shown as grey sticks. b, Asteroid plot with the ligand highlighted in blue (central node). Directly contacting residues (first-shell residues) are shown in the inner circle and the residues that contact these but not the ligand (second-shell residues) are shown in the outer circle. The residues are colored according to their secondary structures and the size of the circle is scaled to denote the number of atomic contacts. c, The ligand residue matrix shows the atoms (atom numbers are obtained from the PDB file) of the ligand as rows and the residues contacting the ligand as columns. Number of atomic contacts is also shown in the matrix. d, Ligand contacts are shown in the network view. e, All the ligand contacting residues are highlighted in the scatter plot matrix along with f, the statistics table showing solvated area, degree, betweenness and closeness centrality measures (first ten out of seventeen residues are shown).

Scatter plot matrix allows quantitative analysis of per-residue properties

Residues with distinct structural properties are important for function or are attractive sites for engineering. Quantifying different structural (e.g. surface area) and contact properties (e.g. number of contacts) on a per-residue basis, and analyzing their correlations, provides a way to identify outlier residues that might be important for structure and/or function (e.g. a buried residue with a large number of contacts can be critical for protein stability)²⁶,²⁸,⁴⁸⁻⁵⁰. Per-residue external information such as sequence conservation, thermostability and disease mutations, as well as the computed properties, can also be mapped onto the contact information and 3D structure for further analysis. Scatter plots display values for two variables for every residue in the chain(s) of interest: each residue is represented by a point, with the values of the variables determining its x- and y-coordinates. A matrix of scatter plots represents more than two variables using multiple scatter plots arranged in a grid, with one row and column per variable. The calculated properties such as the solvent accessible area of the complex, network centrality measures (closeness and betweenness) and the degree for each residue are plotted against each other in the scatter plot matrix (Fig. 4e; see also Supplementary Note 2 and Supplementary Fig. 3 for highlighting multiple positions that are disease mutations in rhodopsin onto the scatter plot).

Analysis of per-residue properties through an interactive statistics table

The interactive statistics table allows sorting the individual residues by any property (e.g. residue name, number, ASA, degree, etc.), enabling the identification of residues with extreme values for a property. The table can be filtered by typing a residue (or multiple independent residues) in the text box, by selecting a set of data points directly in the scatter plot, or by clicking on a residue or secondary structure in the sequence panel (Supplementary Note 1; Fig. 4f). Clicking on any row updates the 3D structure panel with the selected residue. The table can be downloaded in different formats (see sample PDF file in Supplementary Data).

Mapping external information for detailed per-residue analysis of structures

Combining the computed properties of individual residues with external information can help to identify and characterize functionally and structurally important regions/segments in protein structures. Protein Contacts Atlas allows importing and mapping of any external information that is relevant to a particular research question (e.g. evolutionary conservation, disease mutations, thermostability, B-factors, post-translational modification sites, etc.). It automatically generates a template file that a user can download, complete, and upload to the website. The uploaded values are integrated with the statistics table, mapped onto all the relevant panels (including scatter plot matrix, asteroid plot and the 3D structure panel) and colour-coded (cyan, low to magenta, high). In this manner, Protein Contacts Atlas allows the user to integrate external and independently derived (i.e. orthogonal) information to make relevant inferences about a biomolecule of interest (see also Supplementary Note 2 and Supplementary Fig. 4 for interpreting stability measurements of point mutations in G protein using this feature).

Structure report and downloadable data

A fully customizable report of the contact-based analysis of the selected chain of a structure can be downloaded (see example in Supplementary Data). It provides a summary of the session for a structure of interest, containing a screenshot of the current views of the structure from the 3D structure panel, the chord plot, an asteroid plot of the selected ligand and the scatter plot matrix from the contacts panel (Supplementary Note 1). Outlier residues are listed in tables, which include the residues with the ten largest and ten smallest values of (a) ASA, (b) degree, (c) betweenness, (d) closeness, and (e) number of atomic contacts (of the ligand if there is one). The primary information about every non-covalent contact between atoms can be downloaded as a text file via the contacts panel (see example TXT file in Supplementary Data). The web resource can be queried in batch mode by retrieving structures based on their PFAM domain. The information can be downloaded in different formats, including for stand-alone visualization with PyMOL (see Methods for details).
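The selection of outlier residues for the report tables (the ten largest and ten smallest values of a property) can be sketched as follows. The function name and the example values are hypothetical, and tie-breaking between equal values is not specified in the text, so the sketch simply relies on Python's stable sort.

```python
def extreme_residues(values, k=10):
    """Return (k smallest, k largest) residues for one per-residue property.

    values: dict mapping residue label to a numeric property such as ASA.
    The largest residues are listed from the highest value downwards.
    """
    ranked = sorted(values, key=values.get)  # ascending by property value
    return ranked[:k], ranked[::-1][:k]

# Made-up ASA values (in square angstroms) for illustration only.
asa = {"R1": 5.0, "R2": 120.0, "R3": 0.0, "R4": 60.0}
smallest, largest = extreme_residues(asa, k=2)
```

With `k=2` this yields R3 and R1 as the most buried residues and R2 and R4 as the most exposed ones; the report described above corresponds to `k=10` applied to each property in turn.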

Rearrangement of residue contacts in rhodopsin cycle

To highlight how the analysis of residue contacts can be used to derive insights into protein function and mechanism, we present an analysis of the activation mechanism of rhodopsin. Overall, the analysis of the high-resolution structures of rhodopsin reveals a global rearrangement of non-covalent contacts underlying the first molecular events of vision.

Rhodopsin is a light-sensitive protein that is expressed in the eye and enables vision in dim light. In the absence of light, rhodopsin is bound to cis-retinal and is in an inactive state. Incident light catalyzes an isomerization of retinal that causes rhodopsin to change shape to an activated form. This triggers intracellular signaling cascades that ultimately culminate in an electrical impulse in the visual cortex of the brain. As non-covalent contacts are important for activation, investigating the organization of non-covalent contacts in rhodopsin is imperative for understanding how rhodopsin functions. During the activation process, rhodopsin forms a series of spectroscopically identifiable intermediate states, which when taken together constitute the rhodopsin cycle (Fig. 5a): dark rhodopsin, batho-rhodopsin, lumi-rhodopsin, meta rhodopsin (MI and MII) and free opsin. Extensive efforts in crystallography over the years have resulted in the determination of high-resolution structures of rhodopsin in these states. The availability of these structures provides an opportunity to systematically investigate the rearrangement of non-covalent contacts during rhodopsin activation.

Comparison of the residue contact networks in the rhodopsin cycle. a, Residue contact networks of the structural intermediates of the rhodopsin cycle. In the residue contact networks, amino acid residues are denoted as nodes and the presence of contacts between pairs of residues is denoted as edges. ‘Contact fingerprinting’ and key residue contact changes in the binding pocket are displayed. For every residue contact within the ensemble, the presence or absence of an equivalent residue contact between equivalent positions across the rest of the states was recorded. This information is represented as an array of filled (contact present) and empty (contact absent) cells, which are referred to here as ‘contact fingerprints’. b, Inactive state only contacts map to the retinal binding pocket and the G protein-coupling region. Active state only contacts map to the region linking the two regions. A key structural change in the binding pocket involving aromatic contacts between Phe293 and Phe294 is shown in stick representation. (a-b) Inactive states are shown in grey and active states are shown in green.

Non-covalent contacts for the different states of the rhodopsin cycle were computed (PDB IDs: 1U19, 2G87, 2HPY, 3PQR, 3CAP). The presence of contacts was compared across different states using contact fingerprinting (see Methods and Supplementary Data). A core network consisting of 318 residue contacts is present consistently in all the five states. This core network provides a state-independent platform for changes in non-covalent contacts in the rest of the protein. A separate network of 151 contacts connecting 163 residues is maintained exclusively in the inactive (dark, batho and lumi) states. Upon the lumi to metarhodopsin transition, there is a major change in the organization of the contacts. The network of 151 contacts that was previously present in the inactive states is broken and a new network of 90 contacts connecting 126 residues is formed exclusively in metarhodopsin and free opsin and is maintained until the end of the rhodopsin cycle.
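Contact fingerprinting across states can be illustrated with a small Python sketch. The function names, the position labels and the toy contact sets are hypothetical and do not reproduce the rhodopsin data. The classification rule (a contact is "inactive only" if it is present in every inactive state and absent from every active state, and vice versa) is one plausible reading of the text and is an assumption.

```python
def contact_fingerprints(contacts_by_state, states):
    """Map each contact to a tuple of booleans, one per state, in order."""
    all_contacts = set()
    for state in states:
        all_contacts |= contacts_by_state[state]
    return {
        contact: tuple(contact in contacts_by_state[state] for state in states)
        for contact in all_contacts
    }

def classify(fingerprints, states, inactive, active):
    """Group contacts into core, inactive-only, active-only and other."""
    groups = {"core": set(), "inactive_only": set(),
              "active_only": set(), "other": set()}
    for contact, present in fingerprints.items():
        in_state = dict(zip(states, present))
        all_inactive = all(in_state[s] for s in inactive)
        all_active = all(in_state[s] for s in active)
        any_inactive = any(in_state[s] for s in inactive)
        any_active = any(in_state[s] for s in active)
        if all_inactive and all_active:
            groups["core"].add(contact)
        elif all_inactive and not any_active:
            groups["inactive_only"].add(contact)
        elif all_active and not any_inactive:
            groups["active_only"].add(contact)
        else:
            groups["other"].add(contact)
    return groups

# Toy contacts between made-up equivalent positions p1..p5.
c_core = frozenset(("p1", "p2"))
c_inactive = frozenset(("p2", "p3"))
c_active = frozenset(("p3", "p4"))
c_other = frozenset(("p4", "p5"))
states = ["dark", "batho", "lumi", "meta", "opsin"]
contacts_by_state = {
    "dark": {c_core, c_inactive, c_other},
    "batho": {c_core, c_inactive},
    "lumi": {c_core, c_inactive},
    "meta": {c_core, c_active},
    "opsin": {c_core, c_active},
}
fingerprints = contact_fingerprints(contacts_by_state, states)
groups = classify(fingerprints, states,
                  inactive=["dark", "batho", "lumi"],
                  active=["meta", "opsin"])
```

Each fingerprint tuple corresponds to one row of filled and empty cells in Fig. 5a, with one cell per state.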

In the dark, batho and lumi states, the 151 contacts connecting 163 residues of the inactive states are largely localized near two regions in rhodopsin: (i) the retinal-binding pocket and the transmembrane-extracellular interface region and (ii) the region connecting the retinal-binding pocket and G protein-binding site. On the other hand, the 90 contacts of the active states are localized largely in the transmembrane region connecting the retinal-binding pocket and G protein-binding pocket. In the inactive states, in the retinal-binding pocket, one of the key contacts observed is between the aromatic ring of Phe293^7.39 (TM7) and Lys296^7.42 (TM7) linked to retinal (superscripts denote GPCRdb numbering⁵¹,⁵²). Upon activation, this contact is broken and Phe293 engages in a contact with its adjacent amino acid Phe294^7.40 (Fig. 5b). The change in the Phe293 side chain orientation creates an opening between TM1 and TM7, and this local region has been associated with the channel that could potentially be involved in the entry and exit of retinal⁵³. Thus, Phe293 in the inactive states appears to stabilize the ligand through a contact, whereas the same residue in the active states creates an opening that could enable retinal’s lateral entry and exit. In the G protein-coupling region, some of the important residues in rhodopsin are Arg135^3.50 (TM3), Tyr223^5.58 (TM5), and Tyr306^7.53 (TM7). In the inactive state, they are distal from each other. In the active state, Met257^6.40 (TM6) contacts all three of these residues (Fig. 5b). The Met257Tyr mutant form of rhodopsin is constitutively active⁵⁴.

Discussion

Representations of biomolecular structures highlighting specific aspects such as covalent bonds, volume, and surface area have had a profound impact on our understanding of function and on the development of new drugs⁵⁵. For instance, space filling models, Voronoi diagrams and surface representations emphasize volume and surface area, which formed the basis for the identification and investigation of cavities and channels, electrostatic potentials, and interaction interfaces. Such cavities have been exploited for structure-based drug design. Drawing inspiration from how visualization and analysis of structures based on different representations revolutionized structure- and shape-based understanding of biomolecules, we developed representations that enable analysis of non-covalent contacts in biomolecules. Presenting representations of atomic contacts interactively and at different levels of organization (atoms, residues, secondary structures, and chains) alongside the classical 3D structural representation that is more familiar to most biologists provides the opportunity to investigate biomolecules in new ways. Given the likely general interest of this form of representation of protein structures, we have integrated Protein Contacts Atlas via an API through the Protein Data Bank in Europe (PDBe)⁵⁶ website. Protein Contacts Atlas has a modular design, allowing new features to be added easily. Future releases will include the ability to analyze all structures (including NMR and MD models) and directly compare contacts between different structures (e.g. conformational changes upon ligand binding).

Protein Contacts Atlas allows scientists from diverse disciplines including structural biologists, biochemists, molecular biologists, protein engineers, cancer biologists, medicinal and computational chemists, bioinformaticians and geneticists to address diverse questions (Fig. 6). Some typical tasks include mapping mutations from cancer genome sequencing experiments and genome wide association studies, investigating protein structures for rational protein engineering, understanding how individual residues in homologous proteins evolve across homologs, identifying positions for mutational studies aimed at interrogating the function of biomolecules and analyzing structures to derive new biological insights. Finally, Protein Contacts Atlas can also serve as an excellent tool for teachers and students to explore and understand biological molecules at different levels of organization. We anticipate that Protein Contacts Atlas will be a useful scientific resource as well as a learning platform that can fuel future research in biomedical sciences.

How can researchers from different scientific disciplines make use of Protein Contacts Atlas?

Online Methods

Data preprocessing

New structures added to the PDB are preprocessed in a batch process that is run every six months. Structures uploaded by a user undergo the same preprocessing steps, which may take up to 5-7 minutes depending on the size of the file. First, secondary structures are determined using DSSP if this information is not available in the PDB file⁵⁷,⁵⁸. Non-covalent contacts between atoms are computed using a custom C++ program. Solvent accessible surface area (ASA) is also calculated for each residue using an external program (POPS⁵⁹). For contact identification, we calculated the distance between each pair of atoms based on the co-ordinates provided in the PDB file. The sum of the two atomic radii (defined by Chothia et al.⁴³) was subtracted from this distance and a contact was assigned if the resulting difference was less than a threshold. By default, the threshold used to define whether two atoms are in contact is 0.5 Å, but the user can choose any value in the range of 0 to 1 Å. In the case of ligands (including all non-amino acid residues except water, e.g. ions) or water molecules in the HETATM record in the coordinate file, a contact is assigned if the distance is < 4 Å. The user has the option to provide different distance cutoff values for the ligands.
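The contact criterion described above can be written as a minimal Python sketch. It is not the custom C++ program: the function names are hypothetical, and the radii in `EXAMPLE_RADII` are placeholder values chosen for illustration, not the radii of Chothia et al. Only the 0.5 Å default, the 0 to 1 Å range and the 4 Å ligand cutoff are taken from the text.

```python
import math

# Placeholder radii in angstroms for illustration only; these are NOT the
# Chothia et al. values used by Protein Contacts Atlas.
EXAMPLE_RADII = {"C": 1.9, "N": 1.7, "O": 1.4}

def in_contact(coord_a, elem_a, coord_b, elem_b,
               threshold=0.5, radii=EXAMPLE_RADII):
    """Protein atoms: contact if distance minus the radii sum is below threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1 angstrom")
    gap = math.dist(coord_a, coord_b) - (radii[elem_a] + radii[elem_b])
    return gap < threshold

def ligand_in_contact(coord_a, coord_b, cutoff=4.0):
    """Ligand or water atoms (HETATM): contact if distance is below the cutoff."""
    return math.dist(coord_a, coord_b) < cutoff

# Two carbon atoms 4.0 angstroms apart: gap = 4.0 - 3.8 = 0.2, below 0.5.
close_pair = in_contact((0.0, 0.0, 0.0), "C", (4.0, 0.0, 0.0), "C")
# The same atoms 4.5 angstroms apart: gap = 0.7, not a contact.
far_pair = in_contact((0.0, 0.0, 0.0), "C", (4.5, 0.0, 0.0), "C")
```

Because the criterion for protein atoms depends on the radii sum, the effective distance cutoff varies with the element types, whereas the ligand criterion uses a single distance cutoff.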

Filtering and contact identification options

The user has different options to view and analyze contacts:

1. A filtering step to select a threshold in terms of the absolute number of atomic contacts between residues. In this option, the user selects the minimum number of atomic contacts between pairs of residues to view and analyze. After selecting this parameter, the filtering is done within the C++ program. The user has all the options as before for viewing and analyzing the contacts.
2. A filtering step to view and analyze contacts involving main chain-main chain, side chain-side chain or main chain-side chain atoms.
3. A filtering step to select contacts that are normalized with respect to the size of the amino acid. For the normalization, we used a previously published approach¹⁹,⁶⁰. Briefly, we first identified all non-redundant crystal structures in the PDB (dated 19.06.17) from the NCBI database (ftp://ftp.ncbi.nih.gov/mmdb/nrtable/) using a resolution cut-off of 2 Å. This resulted in 48,856 structures (95,159 chains). We then calculated the average of the maximum number of atomic contacts made by each of the 20 amino acids in these structures (please see Supplementary Fig. 5). This was done using the precomputed results available as JSON files in Protein Contacts Atlas. Using this as our reference table in the C++ program, we computed normalized contacts for any structure provided by the user using the following formula:

Normalized weight = (number of side chain atomic contacts /sqrt(norm_res1 * norm_res2))*100

where "number of side chain atomic contacts" is the number of atomic contacts between the side chains of two residues for which the distance between the two atoms is smaller than 4 Å, and norm_res1 and norm_res2 are the values taken from the calculated reference table for the two residues. After the normalized weight between the residues is calculated, any interaction with a normalized weight smaller than the threshold chosen by the user is filtered out.

A distribution of normalized weights from the 48,856 non-redundant structures is provided as a guide for users to choose the threshold (Supplementary Data). This distribution is also available in the “more info” section of the website while choosing this option for filtering contacts.
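
As a worked illustration of the normalization formula, the following Python sketch computes the normalized weight for a residue pair and applies the threshold filter. The reference values in `EXAMPLE_NORM` are invented for the example (the real table is derived from the non-redundant structures described above), and the function names and the tuple layout are assumptions.

```python
import math

# Invented reference values (assumption): average maximum number of atomic
# contacts per amino acid. The real table comes from the non-redundant PDB set.
EXAMPLE_NORM = {"LEU": 50.0, "PHE": 80.0, "GLY": 20.0}


def normalized_weight(n_side_chain_contacts, res1, res2, norm=EXAMPLE_NORM):
    """(contacts / sqrt(norm_res1 * norm_res2)) * 100, as in the formula above."""
    return n_side_chain_contacts / math.sqrt(norm[res1] * norm[res2]) * 100.0


def filter_pairs(pairs, threshold, norm=EXAMPLE_NORM):
    """Drop pairs whose normalized weight is smaller than `threshold`.

    Each pair is a (res1_name, res2_name, n_side_chain_contacts) tuple.
    """
    return [
        (r1, r2, n)
        for r1, r2, n in pairs
        if normalized_weight(n, r1, r2, norm) >= threshold
    ]
```

With these invented values, eight side chain contacts between two leucines give 8 / sqrt(50 × 50) × 100 = 16, whereas two contacts between a leucine and a glycine give about 6.3, so a threshold of 10 keeps only the first pair.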

If the user changes the default threshold, it only affects filtering options 1, 2, and 3 above.
Protein Contacts Atlas can also calculate hydrogen bonds, water mediated hydrogen bonds, weak hydrogen bonds, ligand and metal complex interactions, salt bridges, hydrophobic interactions, cation-pi interactions, pi-pi interactions, and other non-canonical contacts using standard geometric considerations by employing Arpeggio²¹.

Statistical analysis

For each residue and heteroatom in the residue contact network, Protein Contacts Atlas calculates a range of network centrality measures⁶¹ (betweenness centrality, closeness centrality and degree), which quantify its “importance” in the overall contact network. Centrality measures were computed using the SNAP library written in C++ (see their documentation at https://snap.stanford.edu/snap/doc/snapuser-ref/index.html for detailed definitions). The network statistics can be computed in two ways: with and without water molecules. By default, the statistics that include water molecules are not shown, but the user has the option to view them. Betweenness centrality counts how many times the residue of interest falls on the shortest paths connecting other residues. The betweenness centrality values are normalized using the following formula: normalized betweenness for each residue = betweenness for each residue × (2/(([total number of residues]−1) × ([total number of residues]−2))). This measure expresses the amount of “control” exerted by that residue over the contacts between other residues in the network⁶²^,⁶³. Closeness centrality is defined as the inverse of the sum of distances of the residue of interest from all other residues⁶⁴^,⁶⁵ and is normalized within SNAP. In this case, the more buried a given residue is, the more contacts it has, and the closer it is to other residues. The closeness centrality is a measure of how quickly “information” spreads from a given residue to other residues in the network. Both measures show how central the residue is with regard to the whole residue contact network. The degree of a residue is the number of other residues it contacts.
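
The three measures can be illustrated on a small unweighted contact graph with the pure-Python sketch below. It does not use SNAP and is not the authors' implementation: betweenness is computed with Brandes' algorithm and scaled with the formula given above, while closeness is reported as the plain inverse of the summed distances (the additional normalization applied within SNAP is not reproduced). The function names and the adjacency-dictionary input are assumptions.

```python
from collections import deque


def _bfs(adj, source):
    """Breadth-first search returning distances, shortest-path counts,
    predecessor lists and the visiting order from `source`."""
    dist = {source: 0}
    sigma = {v: 0 for v in adj}
    sigma[source] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def centralities(adj):
    """Degree, closeness and normalized betweenness for an undirected graph.

    `adj` maps each residue id to the set of residue ids it contacts.
    """
    n = len(adj)
    betweenness = {v: 0.0 for v in adj}
    closeness = {}
    for s in adj:
        dist, sigma, preds, order = _bfs(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # Brandes' dependency accumulation
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
        total = sum(dist.values())
        closeness[s] = 1.0 / total if total > 0 else 0.0
    scale = 2.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {
        v: {
            "degree": len(adj[v]),
            "closeness": closeness[v],
            # Each unordered pair is visited from both ends, hence the / 2.
            "betweenness": betweenness[v] / 2.0 * scale,
        }
        for v in adj
    }
```

For a three-residue chain a, b, c in which only b contacts both neighbours, b lies on the single shortest path between a and c, so its normalized betweenness is 1 × 2/((3−1)(3−2)) = 1, while a and c score 0.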

Data visualization

Preprocessed results are stored in JSON files (http://json.org/) and are used to produce the interactive visualizations. JSON was chosen because it is a simple format that is easy to generate and parse. A typical JSON file includes the name, numbers and weights (number of atomic contacts) of the residues, secondary structure elements or loops to which the residues belong, whether it is a heteroatom (ligands, water) or not, and which chain they belong to. The file also contains the contacts separately for each residue. This includes the contacting residue pairs, the contacting atoms in each residue, the distances between the contacting atoms, the types of atoms (main or side chains) and the total number of atomic contacts within the residue pair. Finally, the file includes the secondary structure definition (start and end positions and names, e.g. A:HELIX14, A:SHEETB_2, B:LOOP1). The JSON files are used as an input to visualize the contacts in the browser using JavaScript, HTML and CSS. The Bootstrap framework (http://getbootstrap.com/) is used for the overall page layout, and the D3.js library (http://d3js.org/) is used to produce interactive graphs and plots, including the chord plot, asteroid plot and the scatter plot matrix.
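
To make the description of the preprocessed files more concrete, the sketch below builds a small record with the kinds of fields listed above and reads it back. All field names and values are hypothetical and chosen only for illustration; they do not reproduce the actual schema of the Protein Contacts Atlas JSON files, and the website itself consumes these files with JavaScript rather than Python.

```python
import json

# Hypothetical record: field names and values are made up for illustration.
record = {
    "residues": [
        {"id": "A:10", "name": "LEU", "chain": "A", "sse": "A:HELIX14",
         "hetero": False, "weight": 2},
        {"id": "A:57", "name": "PHE", "chain": "A", "sse": "A:SHEETB_2",
         "hetero": False, "weight": 2},
    ],
    "contacts": [
        {"pair": ["A:10", "A:57"],
         "atoms": [
             {"atom1": "CD1", "atom2": "CZ", "distance": 3.8,
              "types": ["side", "side"]},
             {"atom1": "O", "atom2": "CE1", "distance": 3.6,
              "types": ["main", "side"]},
         ],
         "n_atomic_contacts": 2},
    ],
    "secondary_structure": [
        {"name": "A:HELIX14", "start": 5, "end": 20},
        {"name": "A:SHEETB_2", "start": 55, "end": 60},
    ],
}


def partners(rec, residue_id):
    """Return the ids of all residues that contact `residue_id`."""
    found = []
    for contact in rec["contacts"]:
        if residue_id in contact["pair"]:
            found.extend(r for r in contact["pair"] if r != residue_id)
    return found


# Round-trip through JSON text, as a client would receive it from the server.
parsed = json.loads(json.dumps(record))
```

Keeping the per-residue attributes, the per-pair atomic contacts and the secondary structure definitions in separate lists mirrors the three groups of information described above.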

Calculations of residue contacts and contact fingerprinting for the Rhodopsin case study

The residue contacts were computed for the structures representing the different intermediates of the rhodopsin cycle. A residue contact between a pair of residues is defined as present when the distance between any two atoms from the residue pair is less than the sum of their van der Waals radii plus a cut-off distance of 0.5 Å³⁵^–³⁷. We analyzed the presence of residue contacts between structurally equivalent residues across the different conformational states of rhodopsin. The functional importance of a given residue contact across conformational states can be estimated based on the extent to which structurally equivalent contacts are maintained consistently. For every residue contact within the ensemble, the presence or absence of an equivalent residue contact between structurally equivalent positions across the rest of the states was recorded. This information is stored as a bit string of ones (present) and zeros (absent), which are referred to as “contact fingerprints”³⁷. Identifying contact fingerprints that represent consistently maintained residue contacts across and between conformational states enabled us to identify the key rearrangements of residue contacts during rhodopsin activation.
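
The construction of contact fingerprints can be sketched as follows. This is a minimal illustration under stated assumptions: each conformational state is given as a set of residue contacts already expressed as pairs of structurally equivalent position labels, and the function names and labels are hypothetical.

```python
def contact_fingerprints(states):
    """Map each residue contact to a bit string over the ordered states.

    `states` is an ordered list of sets; each set holds the residue contacts
    of one conformational state as 2-tuples of position labels.
    """
    # Sort each pair so that (a, b) and (b, a) denote the same contact.
    normalized = [{tuple(sorted(c)) for c in state} for state in states]
    all_contacts = set().union(*normalized)
    return {
        contact: "".join("1" if contact in state else "0" for state in normalized)
        for contact in sorted(all_contacts)
    }


def consistently_maintained(fingerprints):
    """Contacts that are present in every state (fingerprint of all ones)."""
    return [c for c, bits in fingerprints.items() if "0" not in bits]
```

For three states in which a contact between positions p1 and p2 is always present, its fingerprint is "111"; a contact seen only in the first state has the fingerprint "100".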

Description of the visualization features

The PDB file itself may not reflect the biological unit. Therefore, a PDBe PISA link is also provided for that PDB file (e.g. http://www.ebi.ac.uk/pdbe/pisa/cgi-bin/piserver?qa=3sn6). The link provides access to the PDB coordinates of the different plausible biological units of the proteins involved in the complexes. The user can then choose the relevant assembly of interest and upload the file to Protein Contacts Atlas for visualization and analysis.

Upon selecting a structure, a page displays the entire 3D structure of the molecule in cartoon representation and a corresponding protein complex network. The main page has three interlinked panels, displaying representations of the sequence, the 3D structure, and the non-covalent contacts (see Supplementary Note 1 for details). The biomolecular complex network is always shown on the top left, providing the opportunity for the user to easily switch between chains or interfaces.

For the chord plot, moving the cursor over a chord or arc on the contacts panel increases the transparency of the other secondary structures, making it easier to identify and investigate contacts between secondary structures that are far away in the protein sequence. If the user selects an interface, this view provides information about the secondary structures that interact between chains, and the thickness is indicative of the strength of the interface. The colors used for secondary structure representation are consistent across different panels (Supplementary Fig. 2 and Supplementary Note 1). Users have the option to manually define “super secondary structure elements” and/or adjust the exact definition of a secondary structure.

For the residue contact matrix, clicking on the individual elements within the matrix (which is accessed by clicking on a chord first) highlights the relevant contacting residues in the 3D structure panel, providing the opportunity to investigate the chemical nature of the contact (e.g. side-chain or main-chain contacts). The ligand contacts, which can be seen in the Ligands and Residues sub-panel, can be independently visualized and analyzed by downloading a PyMOL script that is provided in the 3D structure panel.

Description of the analysis features

In the scatter plot matrix, the color spectrum for the different properties can also be set by the user to obtain publication-quality images and/or to visualize the 3D structure for detailed analysis (by downloading the updated PyMOL session file). In the Per Residue Statistics sub-panel, clicking and dragging the cursor over a specific region of any scatter plot selects the data points in this region, and simultaneously highlights the same set of residues in the other scatter plots, the 3D structure panel and the sequence panel. Individual residues of interest are selected either by typing the residue number in a text box or by clicking on a residue in the sequence panel. Multiple independent residues (e.g. several disease mutations) can be selected by typing the residue numbers, separated by commas, in the text box. This highlights the residue(s) in red in all the scatter plots and in the 3D structure panel.

Download file formats and options for accessing contact information for several structures

The contacts file for the individual structures (with all their chains) is provided in text format and contains information about the chain, secondary structure, residue name and number, the number of atomic contacts for each residue pair, the atom names and types (main chain or side chain atom), and the distance (Å) between the contacting atoms. If a user is interested in a particular protein family but does not have a list of PDB codes of structures that contain the domain, they can use the “PDBs by PFAM” option (within “Advanced Options”) to download contacts of all structures that contain a Pfam domain of interest using the default options for contact definition. In addition, users can also download this information in simple interaction file (SIF) format, which serves as an input for Cytoscape⁶⁶, a popular open-source software for complex network analysis. The user can also download high-resolution screenshots of images from the contacts panels in scalable vector graphics (SVG) format, screenshots of the 3D structure in portable network graphics (PNG) format, and PyMOL session files for stand-alone visualization in PyMOL.
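
As an illustration of the SIF export, the sketch below writes residue contacts as tab-delimited lines of the form "source, interaction type, target", which is the basic layout of a SIF file. The node naming scheme (chain, residue name and number) and the interaction label `contact` are assumptions and may differ from the files produced by Protein Contacts Atlas.

```python
def to_sif(residue_pairs, interaction="contact"):
    """Serialize residue pairs as SIF text, one tab-delimited edge per line.

    `residue_pairs` is an iterable of (node_a, node_b) identifier strings.
    """
    lines = [f"{res_a}\t{interaction}\t{res_b}" for res_a, res_b in residue_pairs]
    return "\n".join(lines) + ("\n" if lines else "")


def write_sif(path, residue_pairs, interaction="contact"):
    """Write the SIF text to `path` so that it can be imported into Cytoscape."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_sif(residue_pairs, interaction))
```

A single edge between two hypothetical nodes `A:LEU10` and `A:PHE57` would then be written as one line containing the two identifiers separated by the interaction label.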

Webserver specifications

Protein Contacts Atlas has been tested on Chrome, Firefox and Safari (versions 6.0 and higher) and works best in Chrome.

Data availability and webserver access

Protein Contacts Atlas is available online at http://www.mrc-lmb.cam.ac.uk/pca/. Users can access the information for any PDB file directly from a link: www.mrc-lmb.cam.ac.uk/pca/redirect/3sn6/. A detailed tutorial can be accessed via the “Quick Tutorial” button. Data is available with the paper online (Supplementary Data). Source code for the project is available at https://github.com/pandora2017/protein_contacts_atlas. Other data are available upon request. A Life Sciences Reporting Summary for this article is available.

Supplementary Material

Reporting Summary

NIHMS75372-supplement-Reporting_Summary.pdf (68.6 KB, pdf)

Supplementary Dataset 1

NIHMS75372-supplement-Supplementary_Dataset_1.pdf (911.9 KB, pdf)

Supplementary Dataset 2

NIHMS75372-supplement-Supplementary_Dataset_2.txt (1.8 MB, txt)

Supplementary Dataset 3

NIHMS75372-supplement-Supplementary_Dataset_3.pdf (695.6 KB, pdf)

Supplementary Dataset 4

NIHMS75372-supplement-Supplementary_Dataset_4.txt (16.7 KB, txt)

Supplementary Dataset 5

NIHMS75372-supplement-Supplementary_Dataset_5.txt (3.9 MB, txt)

Supplementary Dataset 6

NIHMS75372-supplement-Supplementary_Dataset_6.txt (473.1 KB, txt)

Supplementary Figures

NIHMS75372-supplement-Supplementary_Figures.doc (1.6 MB, doc)

Supplementary Notes

NIHMS75372-supplement-Supplementary_Notes.pdf (211 KB, pdf)

Acknowledgments

We thank A. Lesk, C. Chothia, S. Balaji, H. Harbrecht, I. Huppertz, M. Ouédraogo, G. Chalancon, A. Morgunov, A. Murzin, A. Andreeva, G. de Baets, R. Peer, S. Chavali, A. Sente, N. S. Latysheva, A. Gunnarsson and A. Hauser for their comments on this work. We thank T. Nakane for his input on structure visualization using WebGLMol. This work was supported by the Medical Research Council (MC_U105185859; M.M.B., M.K., C.N.J.R., T.F., A.J.V. and J.S.B.), the AFR scholarship from the Luxembourg National Research Fund (C.N.J.R.) and the Boehringer Ingelheim Fonds (T.F.). T.F. is a Research Fellow of Fitzwilliam College, University of Cambridge, UK. M.M.B. is a Lister Institute Research Prize Fellow.

Footnotes

Author Contributions

M.K. collected the data, developed the computational pipeline, and built the webserver. A.J.V. designed the prototype of the representations with M.M.B. A.J.V., M.K., J.S.B. and M.M.B. optimized the representations, and M.K. and J.S.B. implemented the representations. M.K. and A.J.V. performed the GPCR analyses. J.S.B. made the prototype of the webserver. C.N.J.R. and T.F. helped with the webserver and analyzing examples. M.K. and C.N.J.R., and A.J.V., independently wrote separate drafts of the manuscript. M.K., A.J.V. and M.M.B. wrote the final manuscript with critical inputs from C.N.J.R., J.S.B. and T.F.; M.K., A.J.V. and C.N.J.R. prepared the figures. A.J.V. and M.M.B. conceived and planned the project. M.K., A.J.V. and M.M.B. executed the project. M.M.B. supervised the project.

Competing Financial Interest

The authors declare no competing financial interest.

References

1.Watson JD, Crick FH. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature. 1953;171:737–8. doi: 10.1038/171737a0.
2.Kendrew JC, et al. Structure of myoglobin: A three-dimensional Fourier synthesis at 2 A. resolution. Nature. 1960;185:422–7. doi: 10.1038/185422a0.
3.Perutz MF, et al. Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5-A. resolution, obtained by X-ray analysis. Nature. 1960;185:416–22. doi: 10.1038/185416a0.
4.Shi Y. A glimpse of structural biology through X-ray crystallography. Cell. 2014;159:995–1014. doi: 10.1016/j.cell.2014.10.051.
5.Wuthrich K. The way to NMR structures of proteins. Nat Struct Biol. 2001;8:923–5. doi: 10.1038/nsb1101-923.
6.Cheng Y. Single-Particle Cryo-EM at Crystallographic Resolution. Cell. 2015;161:450–7. doi: 10.1016/j.cell.2015.03.049.
7.Ollis WD. Models and Molecules. Proceedings of the Royal Institution of Great Britain. 1972;45:1–31.
8.Perutz MF. The Hemoglobin Molecule. Sci Am. 1964;211:64–76. doi: 10.1038/scientificamerican1164-64.
9.Baldwin J, Chothia C. Haemoglobin: the structural changes related to ligand binding and its allosteric mechanism. J Mol Biol. 1979;129:175–220. doi: 10.1016/0022-2836(79)90277-8.
10.Pauling L, Corey RB, Branson HR. The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain. Proc Natl Acad Sci U S A. 1951;37:205–11. doi: 10.1073/pnas.37.4.205.
11.Richardson JS. beta-Sheet topology and the relatedness of proteins. Nature. 1977;268:495–500. doi: 10.1038/268495a0.
12.Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. J Mol Biol. 1963;7:95–9. doi: 10.1016/s0022-2836(63)80023-6.
13.Richardson JS. Early ribbon drawings of proteins. Nat Struct Biol. 2000;7:624–5. doi: 10.1038/77912.
14.Levitt M, Chothia C. Structural patterns in globular proteins. Nature. 1976;261:552–8. doi: 10.1038/261552a0.
15.Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247:536–40. doi: 10.1006/jmbi.1995.0159.
16.Orengo CA, et al. CATH--a hierarchic classification of protein domain structures. Structure. 1997;5:1093–108. doi: 10.1016/s0969-2126(97)00260-8.
17.Nishikawa K, Ooi T, Isogai Y, Saitô N. Tertiary Structure of Proteins. I. Representation and Computation of the Conformations. J Phys Soc Jpn. 1972;32:1331–1337.
18.Lesk AM, Chothia C. How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J Mol Biol. 1980;136:225–70. doi: 10.1016/0022-2836(80)90373-3.
19.Chakrabarty B, Parekh N. NAPS: Network Analysis of Protein Structures. Nucleic Acids Res. 2016;44:W375–82. doi: 10.1093/nar/gkw383.
20.Seeber M, Felline A, Raimondi F, Mariani S, Fanelli F. WebPSN: a web server for high-throughput investigation of structural communication in biomacromolecules. Bioinformatics. 2015;31:779–81. doi: 10.1093/bioinformatics/btu718.
21.Jubb HC, et al. Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures. J Mol Biol. 2017;429:365–371. doi: 10.1016/j.jmb.2016.12.004.
22.Doncheva NT, Assenov Y, Domingues FS, Albrecht M. Topological analysis and interactive visualization of biological networks and protein structures. Nat Protoc. 2012;7:670–85. doi: 10.1038/nprot.2012.004.
23.Piovesan D, Minervini G, Tosatto SC. The RING 2.0 web server for high quality residue interaction networks. Nucleic Acids Res. 2016;44:W367–74. doi: 10.1093/nar/gkw315.
24.Vishveshwara S, Brinda KV, Kannan N. Protein Structure: Insights from Graph Theory. Jl Th Comp Chem. 2002;1:187–211.
25.Suel GM, Lockless SW, Wall MA, Ranganathan R. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat Struct Biol. 2003;10:59–69. doi: 10.1038/nsb881.
26.del Sol A, Fujihashi H, Amoros D, Nussinov R. Residues crucial for maintaining short paths in network communication mediate signaling in proteins. Mol Syst Biol. 2006;2:2006 0019. doi: 10.1038/msb4100063.
27.Kornev AP, Haste NM, Taylor SS, Ten Eyck LF. Surface comparison of active and inactive protein kinases identifies a conserved activation mechanism. Proc Natl Acad Sci U S A. 2006;103:17783–8. doi: 10.1073/pnas.0607656103.
28.Vishveshwara S, Ghosh A, Hansia P. Intra and inter-molecular communications through protein structure network. Curr Protein Pept Sci. 2009;10:146–60. doi: 10.2174/138920309787847590.
29.Fanelli F, Felline A, Raimondi F. Network analysis to uncover the structural communication in GPCRs. Methods Cell Biol. 2013;117:43–61. doi: 10.1016/B978-0-12-408143-7.00003-7.
30.Bhattacharyya M, Ghosh S, Vishveshwara S. Protein Structure and Function: Looking through the Network of Side-Chain Interactions. Curr Protein Pept Sci. 2016;17:4–25. doi: 10.2174/1389203716666150923105727.
31.Fanelli F, Felline A, Raimondi F, Seeber M. Structure network analysis to gain insights into GPCR function. Biochem Soc Trans. 2016;44:613–8. doi: 10.1042/BST20150283.
32.Ahnert SE, Marsh JA, Hernández H, Robinson CV, Teichmann SA. Principles of assembly reveal a periodic table of protein complexes. Science. 2015;350 doi: 10.1126/science.aaa2245.
33.Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA. 3D complex: a structural classification of protein complexes. PLoS Comput Biol. 2006;2 doi: 10.1371/journal.pcbi.0020155.
34.Greene LH, Higman VA. Uncovering network systems within protein structures. J Mol Biol. 2003;334:781–91. doi: 10.1016/j.jmb.2003.08.061.
35.Venkatakrishnan AJ, et al. Molecular signatures of G-protein-coupled receptors. Nature. 2013;494:185–94. doi: 10.1038/nature11896.
36.Flock T, et al. Universal allosteric mechanism for Galpha activation by GPCRs. Nature. 2015;524:173–9. doi: 10.1038/nature14663.
37.Venkatakrishnan AJ, et al. Diverse activation pathways in class A GPCRs converge near the G-protein-coupling region. Nature. 2016;536:484–7. doi: 10.1038/nature19107.
38.Flock T, et al. Selectivity determinants of GPCR-G-protein binding. Nature. 2017;545:317–322. doi: 10.1038/nature22070.
39.Doncheva NT, Klein K, Domingues FS, Albrecht M. Analyzing and visualizing residue networks of protein structures. Trends Biochem Sci. 2011;36:179–82. doi: 10.1016/j.tibs.2011.01.002.
40.Martin AJ, et al. RING: networking interacting residues, evolutionary information and energetics in protein structures. Bioinformatics. 2011;27:2003–5. doi: 10.1093/bioinformatics/btr191.
41.Zhang X, Perica T, Teichmann SA. Evolution of protein structures and interactions from the perspective of residue contact networks. Curr Opin Struct Biol. 2013;23:954–63. doi: 10.1016/j.sbi.2013.07.004.
42.Rose PW, et al. The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res. 2016 doi: 10.1093/nar/gkw1000.
43.Tsai J, Taylor R, Chothia C, Gerstein M. The packing density in proteins: standard radii and volumes. J Mol Biol. 1999;290:253–66. doi: 10.1006/jmbi.1999.2829.
44.Carpenter B, Nehme R, Warne T, Leslie AG, Tate CG. Structure of the adenosine A(2A) receptor bound to an engineered G protein. Nature. 2016;536:104–7. doi: 10.1038/nature18966.
45.Emamzadah S, Tropia L, Vincenti I, Falquet B, Halazonetis TD. Reversal of the DNA-binding-induced loop L1 conformational switch in an engineered human p53 protein. J Mol Biol. 2014;426:936–44. doi: 10.1016/j.jmb.2013.12.020.
46.Laskowski RA, Swindells MB. LigPlot+: multiple ligand-protein interaction diagrams for drug discovery. J Chem Inf Model. 2011;51:2778–86. doi: 10.1021/ci200227u.
47.Cherezov V, et al. High-resolution crystal structure of an engineered human beta2-adrenergic G protein-coupled receptor. Science. 2007;318:1258–65. doi: 10.1126/science.1150577.
48.Mendes HF, van der Spuy J, Chapple JP, Cheetham ME. Mechanisms of cell death in rhodopsin retinitis pigmentosa: implications for therapy. Trends Mol Med. 2005;11:177–85. doi: 10.1016/j.molmed.2005.02.007.
49.del Sol A, Fujihashi H, Amoros D, Nussinov R. Residue centrality, functionally important residues, and active site shape: analysis of enzyme and non-enzyme families. Protein Sci. 2006;15:2120–8. doi: 10.1110/ps.062249106.
50.Soundararajan V, Raman R, Raguram S, Sasisekharan V, Sasisekharan R. Atomic interaction networks in the core of protein domains and their native folds. PLoS One. 2010;5:e9391. doi: 10.1371/journal.pone.0009391.
51.Isberg V, et al. Generic GPCR residue numbers - aligning topology maps while minding the gaps. Trends Pharmacol Sci. 2015;36:22–31. doi: 10.1016/j.tips.2014.11.001.
52.Isberg V, et al. GPCRdb: an information system for G protein-coupled receptors. Nucleic Acids Res. 2016;44:D356–64. doi: 10.1093/nar/gkv1178.
53.Hildebrand PW, et al. A ligand channel through the G protein coupled receptor opsin. PLoS One. 2009;4:e4382. doi: 10.1371/journal.pone.0004382.
54.Deupi X, Edwards P, Singhal A, Nickle B, Oprian D, Schertler G, Standfuss J. Stabilized G protein binding site in the structure of constitutively active metarhodopsin-II. Proc Natl Acad Sci U S A. 2012;109:119–24. doi: 10.1073/pnas.1114089108.
55.O'Donoghue SI, et al. Visualizing biological data-now and in the future. Nat Methods. 2010;7:S2–4. doi: 10.1038/nmeth.f.301.
56.Velankar S, et al. PDBe: improved accessibility of macromolecular structure data from PDB and EMDB. Nucleic Acids Res. 2015 doi: 10.1093/nar/gkv1047.
57.Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–637. doi: 10.1002/bip.360221211.
58.Touw WG, et al. A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 2015;43:D364–8. doi: 10.1093/nar/gku1028.
59.Cavallo L, Kleinjung J, Fraternali F. POPS: A fast algorithm for solvent accessible surface areas at atomic and residue level. Nucleic Acids Res. 2003;31:3364–6. doi: 10.1093/nar/gkg601.
60.Kannan N, Vishveshwara S. Identification of side-chain clusters in protein structures by a graph spectral method. J Mol Biol. 1999;292:441–64. doi: 10.1006/jmbi.1999.3058.
61.Costa LdF, Rodrigues FA, Travieso G, Villas Boas PR. Characterization of complex networks: A survey of measurements. Advances in Physics. 2007;56:167–242.
62.Freeman LC. A Set of Measures of Centrality Based on Betweenness. Sociometry. 1977;40:35–41.
63.Yoon J, Blumer A, Lee K. An algorithm for modularity analysis of directed and weighted biological networks based on edge-betweenness centrality. Bioinformatics. 2006;22:3106–8. doi: 10.1093/bioinformatics/btl533.
64.Bavelas A. Communication patterns in task-oriented groups. J Acoust Soc Am. 1950;22:725–730.
65.Sabidussi G. The centrality index of a graph. Psychometrika. 1966;31:581–603. doi: 10.1007/BF02289527.
66.Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–504. doi: 10.1101/gr.1239303.

Data Availability Statement

[R1] 1.Watson JD, Crick FH. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature. 1953;171:737–8. doi: 10.1038/171737a0. [DOI] [PubMed] [Google Scholar]

[R2] 2.Kendrew JC, et al. Structure of myoglobin: A three-dimensional Fourier synthesis at 2 A. resolution. Nature. 1960;185:422–7. doi: 10.1038/185422a0. [DOI] [PubMed] [Google Scholar]

[R3] 3.Perutz MF, et al. Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5-A. resolution, obtained by X-ray analysis. Nature. 1960;185:416–22. doi: 10.1038/185416a0. [DOI] [PubMed] [Google Scholar]

[R4] 4.Shi Y. A glimpse of structural biology through X-ray crystallography. Cell. 2014;159:995–1014. doi: 10.1016/j.cell.2014.10.051. [DOI] [PubMed] [Google Scholar]

[R5] 5.Wuthrich K. The way to NMR structures of proteins. Nat Struct Biol. 2001;8:923–5. doi: 10.1038/nsb1101-923. [DOI] [PubMed] [Google Scholar]

[R6] 6.Cheng Y. Single-Particle Cryo-EM at Crystallographic Resolution. Cell. 2015;161:450–7. doi: 10.1016/j.cell.2015.03.049. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] 7.Ollis WD. Models and Molecules. Proceedings of the Royal Institution of Great Britain. 1972;45:1–31. [Google Scholar]

[R8] 8.Perutz MF. The Hemoglobin Molecule. Sci Am. 1964;211:64–76. doi: 10.1038/scientificamerican1164-64. [DOI] [PubMed] [Google Scholar]

[R9] 9.Baldwin J, Chothia C. Haemoglobin: the structural changes related to ligand binding and its allosteric mechanism. J Mol Biol. 1979;129:175–220. doi: 10.1016/0022-2836(79)90277-8. [DOI] [PubMed] [Google Scholar]

[R10] 10.Pauling L, Corey RB, Branson HR. The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain. Proc Natl Acad Sci U S A. 1951;37:205–11. doi: 10.1073/pnas.37.4.205. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Richardson JS. beta-Sheet topology and the relatedness of proteins. Nature. 1977;268:495–500. doi: 10.1038/268495a0. [DOI] [PubMed] [Google Scholar]

[R12] 12.Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. J Mol Biol. 1963;7:95–9. doi: 10.1016/s0022-2836(63)80023-6. [DOI] [PubMed] [Google Scholar]

[R13] 13.Richardson JS. Early ribbon drawings of proteins. Nat Struct Biol. 2000;7:624–5. doi: 10.1038/77912. [DOI] [PubMed] [Google Scholar]

[R14] 14.Levitt M, Chothia C. Structural patterns in globular proteins. Nature. 1976;261:552–8. doi: 10.1038/261552a0. [DOI] [PubMed] [Google Scholar]

[R15] 15.Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247:536–40. doi: 10.1006/jmbi.1995.0159. [DOI] [PubMed] [Google Scholar]

[R16] 16.Orengo CA, et al. CATH--a hierarchic classification of protein domain structures. Structure. 1997;5:1093–108. doi: 10.1016/s0969-2126(97)00260-8. [DOI] [PubMed] [Google Scholar]

[R17] 17.Ken Nishikawa TO, Isogai Yoshinori, Saitô Nobuhiko. Tertiary Structure of Proteins. I. Representation and Computation of the Conformations. J Phys Soc Jpn. 1972;32:1331–1337. [Google Scholar]

18. Lesk AM, Chothia C. How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J Mol Biol. 1980;136:225–70. doi: 10.1016/0022-2836(80)90373-3.

19. Chakrabarty B, Parekh N. NAPS: Network Analysis of Protein Structures. Nucleic Acids Res. 2016;44:W375–82. doi: 10.1093/nar/gkw383.

20. Seeber M, Felline A, Raimondi F, Mariani S, Fanelli F. WebPSN: a web server for high-throughput investigation of structural communication in biomacromolecules. Bioinformatics. 2015;31:779–81. doi: 10.1093/bioinformatics/btu718.

21. Jubb HC, et al. Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures. J Mol Biol. 2017;429:365–371. doi: 10.1016/j.jmb.2016.12.004.

22. Doncheva NT, Assenov Y, Domingues FS, Albrecht M. Topological analysis and interactive visualization of biological networks and protein structures. Nat Protoc. 2012;7:670–85. doi: 10.1038/nprot.2012.004.

23. Piovesan D, Minervini G, Tosatto SC. The RING 2.0 web server for high quality residue interaction networks. Nucleic Acids Res. 2016;44:W367–74. doi: 10.1093/nar/gkw315.

24. Vishveshwara S, Brinda KV, Kannan N. Protein Structure: Insights from Graph Theory. J Theor Comput Chem. 2002;1:187–211.

25. Suel GM, Lockless SW, Wall MA, Ranganathan R. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat Struct Biol. 2003;10:59–69. doi: 10.1038/nsb881.

26. del Sol A, Fujihashi H, Amoros D, Nussinov R. Residues crucial for maintaining short paths in network communication mediate signaling in proteins. Mol Syst Biol. 2006;2:2006.0019. doi: 10.1038/msb4100063.

27. Kornev AP, Haste NM, Taylor SS, Ten Eyck LF. Surface comparison of active and inactive protein kinases identifies a conserved activation mechanism. Proc Natl Acad Sci U S A. 2006;103:17783–8. doi: 10.1073/pnas.0607656103.

28. Vishveshwara S, Ghosh A, Hansia P. Intra and inter-molecular communications through protein structure network. Curr Protein Pept Sci. 2009;10:146–60. doi: 10.2174/138920309787847590.

29. Fanelli F, Felline A, Raimondi F. Network analysis to uncover the structural communication in GPCRs. Methods Cell Biol. 2013;117:43–61. doi: 10.1016/B978-0-12-408143-7.00003-7.

30. Bhattacharyya M, Ghosh S, Vishveshwara S. Protein Structure and Function: Looking through the Network of Side-Chain Interactions. Curr Protein Pept Sci. 2016;17:4–25. doi: 10.2174/1389203716666150923105727.

31. Fanelli F, Felline A, Raimondi F, Seeber M. Structure network analysis to gain insights into GPCR function. Biochem Soc Trans. 2016;44:613–8. doi: 10.1042/BST20150283.

32. Ahnert SE, Marsh JA, Hernández H, Robinson CV, Teichmann SA. Principles of assembly reveal a periodic table of protein complexes. Science. 2015;350. doi: 10.1126/science.aaa2245.

33. Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA. 3D complex: a structural classification of protein complexes. PLoS Comput Biol. 2006;2. doi: 10.1371/journal.pcbi.0020155.

34. Greene LH, Higman VA. Uncovering network systems within protein structures. J Mol Biol. 2003;334:781–91. doi: 10.1016/j.jmb.2003.08.061.

35. Venkatakrishnan AJ, et al. Molecular signatures of G-protein-coupled receptors. Nature. 2013;494:185–94. doi: 10.1038/nature11896.

36. Flock T, et al. Universal allosteric mechanism for Galpha activation by GPCRs. Nature. 2015;524:173–9. doi: 10.1038/nature14663.

37. Venkatakrishnan AJ, et al. Diverse activation pathways in class A GPCRs converge near the G-protein-coupling region. Nature. 2016;536:484–7. doi: 10.1038/nature19107.

38. Flock T, et al. Selectivity determinants of GPCR-G-protein binding. Nature. 2017;545:317–322. doi: 10.1038/nature22070.

39. Doncheva NT, Klein K, Domingues FS, Albrecht M. Analyzing and visualizing residue networks of protein structures. Trends Biochem Sci. 2011;36:179–82. doi: 10.1016/j.tibs.2011.01.002.

40. Martin AJ, et al. RING: networking interacting residues, evolutionary information and energetics in protein structures. Bioinformatics. 2011;27:2003–5. doi: 10.1093/bioinformatics/btr191.

41. Zhang X, Perica T, Teichmann SA. Evolution of protein structures and interactions from the perspective of residue contact networks. Curr Opin Struct Biol. 2013;23:954–63. doi: 10.1016/j.sbi.2013.07.004.

42. Rose PW, et al. The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res. 2016. doi: 10.1093/nar/gkw1000.

43. Tsai J, Taylor R, Chothia C, Gerstein M. The packing density in proteins: standard radii and volumes. J Mol Biol. 1999;290:253–66. doi: 10.1006/jmbi.1999.2829.

44. Carpenter B, Nehme R, Warne T, Leslie AG, Tate CG. Structure of the adenosine A(2A) receptor bound to an engineered G protein. Nature. 2016;536:104–7. doi: 10.1038/nature18966.

45. Emamzadah S, Tropia L, Vincenti I, Falquet B, Halazonetis TD. Reversal of the DNA-binding-induced loop L1 conformational switch in an engineered human p53 protein. J Mol Biol. 2014;426:936–44. doi: 10.1016/j.jmb.2013.12.020.

46. Laskowski RA, Swindells MB. LigPlot+: multiple ligand-protein interaction diagrams for drug discovery. J Chem Inf Model. 2011;51:2778–86. doi: 10.1021/ci200227u.

47. Cherezov V, et al. High-resolution crystal structure of an engineered human beta2-adrenergic G protein-coupled receptor. Science. 2007;318:1258–65. doi: 10.1126/science.1150577.

48. Mendes HF, van der Spuy J, Chapple JP, Cheetham ME. Mechanisms of cell death in rhodopsin retinitis pigmentosa: implications for therapy. Trends Mol Med. 2005;11:177–85. doi: 10.1016/j.molmed.2005.02.007.

49. del Sol A, Fujihashi H, Amoros D, Nussinov R. Residue centrality, functionally important residues, and active site shape: analysis of enzyme and non-enzyme families. Protein Sci. 2006;15:2120–8. doi: 10.1110/ps.062249106.

50. Soundararajan V, Raman R, Raguram S, Sasisekharan V, Sasisekharan R. Atomic interaction networks in the core of protein domains and their native folds. PLoS One. 2010;5:e9391. doi: 10.1371/journal.pone.0009391.

51. Isberg V, et al. Generic GPCR residue numbers - aligning topology maps while minding the gaps. Trends Pharmacol Sci. 2015;36:22–31. doi: 10.1016/j.tips.2014.11.001.

52. Isberg V, et al. GPCRdb: an information system for G protein-coupled receptors. Nucleic Acids Res. 2016;44:D356–64. doi: 10.1093/nar/gkv1178.

53. Hildebrand PW, et al. A ligand channel through the G protein coupled receptor opsin. PLoS One. 2009;4:e4382. doi: 10.1371/journal.pone.0004382.

54. Deupi X, Edwards P, Singhal A, Nickle B, Oprian D, Schertler G, Standfuss J. Stabilized G protein binding site in the structure of constitutively active metarhodopsin-II. Proc Natl Acad Sci U S A. 2012;109:119–24. doi: 10.1073/pnas.1114089108.

55. O'Donoghue SI, et al. Visualizing biological data-now and in the future. Nat Methods. 2010;7:S2–4. doi: 10.1038/nmeth.f.301.

56. Velankar S, et al. PDBe: improved accessibility of macromolecular structure data from PDB and EMDB. Nucleic Acids Res. 2015. doi: 10.1093/nar/gkv1047.

57. Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–637. doi: 10.1002/bip.360221211.

58. Touw WG, et al. A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 2015;43:D364–8. doi: 10.1093/nar/gku1028.

59. Cavallo L, Kleinjung J, Fraternali F. POPS: A fast algorithm for solvent accessible surface areas at atomic and residue level. Nucleic Acids Res. 2003;31:3364–6. doi: 10.1093/nar/gkg601.

60. Kannan N, Vishveshwara S. Identification of side-chain clusters in protein structures by a graph spectral method. J Mol Biol. 1999;292:441–64. doi: 10.1006/jmbi.1999.3058.

61. Costa LdF, Rodrigues FA, Travieso G, Villas Boas PR. Characterization of complex networks: A survey of measurements. Advances in Physics. 2007;56:167–242.

62. Freeman LC. A Set of Measures of Centrality Based on Betweenness. Sociometry. 1977;40:35–41.

63. Yoon J, Blumer A, Lee K. An algorithm for modularity analysis of directed and weighted biological networks based on edge-betweenness centrality. Bioinformatics. 2006;22:3106–8. doi: 10.1093/bioinformatics/btl533.

64. Bavelas A. Communication patterns in task-oriented groups. J Acoust Soc Am. 1950;22:725–730.

65. Sabidussi G. The centrality index of a graph. Psychometrika. 1966;31:581–603. doi: 10.1007/BF02289527.

66. Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–504. doi: 10.1101/gr.1239303.
