Author manuscript; available in PMC: 2022 Jan 1.
Published in final edited form as: J Proteome Res. 2020 Dec 2;20(1):1087–1095. doi: 10.1021/acs.jproteome.0c00495

Leveraging the Entirety of Protein Data Bank to Enable Improved Structure Prediction based on Cross-Link Data

Andrew Keller 1, Juan D Chavez 1, Xiaoting Tang 1, James E Bruce 1
PMCID: PMC7980787  NIHMSID: NIHMS1680687  PMID: 33263396

Abstract

XLinkDB is a fast-expanding public database that now stores more than 100,000 distinct identified cross-linked protein residue pairs acquired by chemical cross-linking with mass spectrometry from samples of 12 species [1]. Mapping identified cross-links to protein structures, when available, provides valuable guidance on the protein conformations present in the cross-linked samples. As more and more structures become available in the Protein Data Bank [2], we sought to leverage them for cross-link studies by automatically mapping identified cross-links to structures based on sequence homology between the cross-linked proteins and the proteins within the structures. This enables use of structures derived from organisms different from those of the samples, including large multi-protein complexes and complexes in alternative states. We demonstrate the utility of mapping to orthologous structures, highlighting a cross-link between two subunits of mouse mitochondrial Complex I that was mapped to 15 structures derived from five mammals; its distances in those structures, 16.2 ± 0.4Å, indicate strong conservation of the protein interaction and proximity of the cross-linked sites. We furthermore show how multimeric structures enable reassessment of cross-links presumed to be intra-protein as potentially homodimeric inter-protein in origin, and how multi-protein structures to which inter-protein cross-links are mapped can be used to evaluate cross-link-aided protein-protein docking.

Keywords: Chemical cross-linking, protein structures, Protein BLAST, orthologues, interactomics, mass spectrometry, protein complexes, cross-link database


Introduction

Structures derived by X-ray crystallography, NMR, or cryo-EM provide high-resolution information on protein conformations and, in some cases, on protein interactions within a multi-protein complex. These structures are often derived from highly purified protein components and, in the case of X-ray crystallography, after crystallization in a single static conformation. The number of proteins and protein complexes for which structures are available is growing rapidly but is far from complete, especially considering the wide variety of organisms in nature for which few structures exist.

In contrast, chemical cross-linking combined with mass spectrometry (XL-MS) yields lower-resolution structural information but, when applied in vivo, captures proteins and protein interactions in their natural cellular settings [3–5]. These measurements can therefore reflect a wide variety of protein and protein complex conformations that may occur in different cells and cell organelles in a sample. Furthermore, they can easily be obtained from samples of any organism and under a variety of environmental or genetic perturbations. Relevant available structures from any species can greatly help interpret acquired cross-link data. Cross-links can be mapped to their residues in proteins within a structure to measure their distances, either Euclidean or solvent accessible surface distance (SASD) [6], in order to determine whether they are within the expected maximum span of the cross-linker employed. In this manner, one can ascertain whether cross-links acquired from a particular in vivo sample are consistent with any of the available structures of the cross-linked protein(s) and thus may have arisen from similar protein configurations in the sample. Furthermore, when quantitative information on cross-link abundances in samples subjected to environmental or genetic perturbations is available [7, 8], one can determine whether subsets of co-regulated cross-links are consistent with any particular structures. This can help elucidate the detected changes within the existing ensemble of conformations in protein samples.
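As a minimal sketch of the Euclidean distance check described above, the Python below compares the straight-line distance between two residue coordinates with a cross-linker's maximum span. The function names and coordinates are hypothetical, and representing each residue by a single atom (for example, the lysine Cα or NZ atom) is an assumption, since the text does not specify it. SASD requires a dedicated tool and is not computed here.

```python
import math
from typing import Sequence

# Illustrative sketch; function names are hypothetical, not XLinkDB's code.


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance (Å) between two 3D coordinates."""
    return math.dist(a, b)


def within_span(a: Sequence[float], b: Sequence[float], max_span: float) -> bool:
    """True if the residue pair could be bridged by a cross-linker of this maximum span."""
    return euclidean_distance(a, b) <= max_span


# Hypothetical coordinates (Å) of one representative atom per cross-linked lysine.
k1 = (10.0, 4.0, -2.0)
k2 = (30.0, 4.0, 13.0)
print(round(euclidean_distance(k1, k2), 1))  # 25.0
print(within_span(k1, k2, max_span=35.0))    # True (35Å BDP-NHP span)
```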

Mapping cross-links identified in samples to structures is often challenging and is limited by the unavailability of structures for many cross-linked proteins. For inter-protein cross-links spanning two different proteins, one must query UniProt to search for structures in the Protein Data Bank (PDB) containing both proteins and their cross-linked residues. For homo-oligomeric cross-links, one must find structures with two or more copies of that protein. In all cases, the cross-linked residues must be correctly located in the chains of the structure file corresponding to their respective proteins. In the past, XLinkDB identified structures only of the sample species, and only for intra-protein and homodimeric inter-protein cross-links; cross-links with such structures comprised only a small fraction of all cross-links in the database. In all other cases, models of the cross-linked proteins were built using Modeller [9] or Phyre [10], and for inter-protein cross-links, models of the protein-protein interaction were built with PatchDock [11] and the Integrative Modeling Platform [12]. These models are useful when structures do not exist but are limited by the uncertain accuracy of predicted protein structures.

To maximize the use of PDB structure files for exploring identified cross-links, including inter-protein cross-links, we implemented on XLinkDB an automated process to identify and procure available structures that include the cross-linked proteins. Furthermore, because it is often difficult to obtain structures to which identified sample cross-links can be mapped, and because many protein structures are conserved across evolution [13–15], we sought to increase the number of usable structures by automatically mapping cross-links to structures from any organism whose components have sequences highly homologous to those of the cross-linked proteins, and thus likely conserved structures. This enables use of structures derived from organisms different from those of the samples. Conservation of orthologous protein structures has long been exploited by template-based protein modeling programs [16–18]. The mapping includes aligning cross-linked residues with their corresponding residues in the orthologous structures and translating them into the correct chain residue numbering. Doing this manually, especially for structures of large complexes, can be overwhelming. Such was the case in our previous murine mitochondrial interactome studies [19–21], since the PDB had no mouse electron transport chain (ETC) complex or supercomplex structures. Orthologous mapping on XLinkDB now performs this alignment to structures of non-sample organisms automatically, making visualization of cross-linked murine ETC complex and supercomplex assemblies possible.

Structures automatically mapped to cross-link data sets on XLinkDB serve as tools to help researchers infer the protein conformations and interactions in their samples. Cross-link distances in the context of structures are clearly displayed to indicate their consistency with the cross-linker maximum span, and cross-links can be viewed directly in the structures using NGL [22]. This can help identify known alternative states and conformations of protein complexes from which sample cross-links may have originated. As new PDB structures become available, we regularly update mappings to the new entries. Here we demonstrate the consistency of sample cross-links with protein structures from organisms different from those of the samples and show how available structures can be used. By offering this functionality, XLinkDB helps researchers use a greater number of structures to interpret identified sample cross-links.

Methods

Structure files in mmCIF format were downloaded from RCSB using rsyncPDB.sh. Mapping of cross-links on XLinkDB to structure files is implemented as a PHP program. To map a cross-linked protein or protein pair to available structures, the sequence of each protein is subjected to Protein BLAST [23] comparison against a FASTA file of sequences of all chains of PDB structure files available at the RCSB Protein Data Bank web site (February 5, 2020, ftp://ftp.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt). To ensure high homology, we require an expect score no greater than 10⁻⁶ and a coverage (fraction of the protein sequence length aligned to the structure chain) no less than 0.5, and only structures with a resolution of 7Å or better are included. In addition, structures whose chain sequences are short (0.6 or less of the corresponding protein length) but exhibit a coverage (fraction of the chain sequence length aligned to the protein) of 0.8 or more are included, as they potentially result from crystallized partial protein domains. If the cross-link involves two protein molecules (i.e., an inter-protein heterodimer or homodimer), then structure files with different chains homologous to each of the proteins are identified. Structures are sorted by increasing maximum expect score and decreasing minimum coverage over the cross-linked proteins, followed by increasing structure resolution.
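The filtering and sorting rules above can be sketched in Python as follows. This is not XLinkDB's PHP implementation: the `ChainHit` and `Candidate` records, their field names, and the placeholder PDB identifiers are hypothetical, BLAST output parsing is omitted, and using the protein-side coverage for the "minimum coverage" sort key is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Illustrative sketch; record and function names are hypothetical.


@dataclass
class ChainHit:
    """Hypothetical record of one BLAST hit of a cross-linked protein to a PDB chain."""
    chain: str
    evalue: float
    protein_cov: float   # fraction of the protein length aligned to the chain
    chain_cov: float     # fraction of the chain length aligned to the protein
    length_ratio: float  # chain length / protein length


@dataclass
class Candidate:
    """Hypothetical structure candidate: one best hit per cross-linked protein."""
    pdb_id: str
    resolution: float    # Å; smaller is better
    hits: List[ChainHit]


def hit_passes(hit: ChainHit, max_evalue: float = 1e-6) -> bool:
    if hit.evalue > max_evalue:
        return False
    if hit.protein_cov >= 0.5:
        return True
    # Possible crystallized partial domain: short chain that is mostly aligned.
    return hit.length_ratio <= 0.6 and hit.chain_cov >= 0.8


def rank_candidates(cands: List[Candidate], max_resolution: float = 7.0) -> List[Candidate]:
    kept = [c for c in cands
            if c.hits and c.resolution <= max_resolution
            and all(hit_passes(h) for h in c.hits)]
    # Increasing max E-value, then decreasing min coverage, then increasing resolution.
    return sorted(kept, key=lambda c: (max(h.evalue for h in c.hits),
                                       -min(h.protein_cov for h in c.hits),
                                       c.resolution))


# Placeholder PDB IDs; each candidate lists hits for protein A then protein B.
cands = [
    Candidate("AAAA", 3.2, [ChainHit("A", 1e-50, 0.95, 0.9, 1.0), ChainHit("B", 1e-20, 0.80, 0.9, 1.0)]),
    Candidate("BBBB", 2.5, [ChainHit("A", 1e-50, 0.95, 0.9, 1.0), ChainHit("C", 1e-20, 0.90, 0.9, 1.0)]),
    Candidate("CCCC", 8.0, [ChainHit("A", 1e-80, 1.00, 1.0, 1.0)]),  # resolution worse than 7Å
    Candidate("DDDD", 2.0, [ChainHit("A", 1e-3, 0.90, 0.9, 1.0)]),   # E-value too high
]
print([c.pdb_id for c in rank_candidates(cands)])  # ['BBBB', 'AAAA']
```

Sorting on a tuple key keeps the three criteria in strict priority order, so resolution only breaks ties left by the homology measures.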

For each structure, the chains matching each cross-linked protein are again subjected to Protein BLAST to output the sequence alignment, so that any cross-linked residue in a protein can be mapped to its corresponding residue in the chain of the structure file. Note that for this step, chain sequences are derived directly from the structure files rather than from the FASTA file in order to remove appended N-terminal residues or add missing ones (as amino acid X). This is necessary to maintain proper mapping between the aligned protein sequence amino acids and their positions in the chain of the structure file. Next, the cross-linked protein residues are mapped onto the structure chains at their corresponding residue positions, which often involves translating the cross-linked residue number into the appropriate chain residue number in the structure file. Cross-linked residues must be present in the structure chains and, except when the cross-link is at the protein N-terminus, only cross-linked residues in the structures that are evolutionarily conserved with lysine (lysine, arginine, glutamic acid, or glutamine) are accepted. This requirement exists because all cross-link data currently on XLinkDB were acquired with cross-linkers that react at lysine residues; cross-linked residues other than lysine can be incorporated in the future with analogous requirements. Figure S1 shows three examples of cross-links that could not be mapped to a structure because the structure either lacked the cross-linked residue or had an amino acid at that position not conserved with lysine. All possible chain pairs for the cross-linked residues are recorded, along with their Euclidean distances.
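A minimal sketch of translating a cross-linked protein residue number into a chain residue number through a gapped pairwise alignment, applying the lysine-conservation rule described above. The function name and the toy alignment are hypothetical, and the sketch assumes chain residues are numbered consecutively from `chain_start`; real structure files may number residues non-consecutively, so a production implementation would need the actual residue numbering from the file.

```python
from typing import Optional, Tuple

# Illustrative sketch; map_site is a hypothetical helper, not XLinkDB's code.
# Lysine plus the residues the text treats as conserved with it.
LYSINE_COMPATIBLE = set("KREQ")


def map_site(prot_aln: str, chain_aln: str, prot_start: int, chain_start: int,
             prot_pos: int, n_terminal: bool = False) -> Optional[Tuple[int, str]]:
    """Map a 1-based protein residue number onto a structure chain.

    prot_aln / chain_aln are the gapped aligned strings ('-' = gap);
    prot_start / chain_start are the residue numbers of their first aligned residues.
    Returns (chain residue number, chain amino acid), or None if the site is missing
    from the chain or is not lysine-compatible (unless it is the protein N-terminus).
    """
    p, c = prot_start, chain_start
    for a, b in zip(prot_aln, chain_aln):
        if a != "-" and p == prot_pos:
            if b == "-":
                return None  # residue absent from the chain
            if n_terminal or b in LYSINE_COMPATIBLE:
                return c, b
            return None  # not conserved with lysine
        if a != "-":
            p += 1
        if b != "-":
            c += 1
    return None  # outside the aligned region


# Toy alignment with one deletion and one insertion, so the offset is not constant.
prot_aln = "MKTAYK-IAK"
chain_aln = "MKT-YRLIAQ"
for pos in (2, 6, 9, 4, 3):
    print(pos, map_site(prot_aln, chain_aln, prot_start=1, chain_start=5, prot_pos=pos))
# 2 (6, 'K')
# 6 (9, 'R')
# 9 (13, 'Q')
# 4 None
# 3 None
```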

The mapped structure files, along with their chains and the residue positions corresponding to the cross-linked sites, are stored in the XLinkDB database for display and exploration in its data set tables. Due to memory constraints, only the top 15 scoring structures for each cross-link are currently stored, and for each structure, only the 10 distinct chain pairs with the smallest cross-linked residue Euclidean distances are retained. The number of structures mapped to each cross-link on XLinkDB is shown in the last column, entitled ‘Available Structures’, of a data set table. Clicking it opens a page displaying details of each structure, with links to view the cross-link in the context of the structure with NGL and to view the alignment between the cross-linked peptides and the chains within the structures to which they are mapped.
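The retention rule for chain pairs (keep the 10 with the smallest distances) could look like this in Python. The dictionary layout (chain ID mapped to the coordinate of the cross-linked residue in that chain), the function name, and the coordinates are hypothetical; the representative atom per residue is likewise unspecified in the text.

```python
import heapq
import itertools
import math
from typing import Dict, List, Tuple

# Illustrative sketch; closest_chain_pairs is a hypothetical helper.
Coord = Tuple[float, float, float]


def closest_chain_pairs(site_a: Dict[str, Coord], site_b: Dict[str, Coord],
                        keep: int = 10) -> List[Tuple[float, str, str]]:
    """Return up to `keep` (distance, chain_a, chain_b) tuples, nearest first.

    site_a / site_b map a chain ID to the coordinate of cross-linked residue A / B
    in that chain. For an intra-protein cross-link in a homodimer, both dictionaries
    contain the same chain IDs, so same-chain pairs are intra-chain and the rest are
    inter-chain.
    """
    pairs = ((math.dist(xa, xb), ca, cb)
             for (ca, xa), (cb, xb) in itertools.product(site_a.items(), site_b.items()))
    return heapq.nsmallest(keep, pairs)


# Hypothetical homodimer with chains A and B.
site_a = {"A": (0.0, 0.0, 0.0), "B": (50.0, 0.0, 0.0)}
site_b = {"A": (0.0, 40.0, 0.0), "B": (50.0, 10.0, 0.0)}
print(closest_chain_pairs(site_a, site_b, keep=2))  # [(10.0, 'B', 'B'), (40.0, 'A', 'A')]
```

`heapq.nsmallest` avoids sorting every chain pair, which matters for large complexes with many copies of each protein.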

Every 48 hours, newly uploaded, and thus previously unmapped, cross-links are automatically subjected to mapping. Periodically, when updates are made to the RCSB Protein Data Bank, we run a PHP program that automatically downloads the new structures, creates a FASTA file consisting only of sequences of structures added since the prior mapping, and then efficiently re-maps all cross-links on XLinkDB against this small FASTA database, storing high scoring mappings (low maximum expect score, high minimum coverage) to new structures in the database. Because XLinkDB currently stores only the top 15 structure files for each cross-linked residue pair, lower scoring entries can be removed to make room for higher scoring newly found mappings.
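A sketch of this incremental update: mappings found against the small FASTA of newly released structures are merged with the stored ones, and only the best 15 per cross-link are kept. The `Mapping` record, the function names, and the placeholder identifiers are hypothetical, and the ranking key simply mirrors the sort order given above (lower maximum expect score, then higher minimum coverage, then finer resolution). XLinkDB itself stores mappings in a database, not an in-memory dictionary.

```python
from typing import Dict, List, NamedTuple

# Illustrative sketch; Mapping and merge_new_mappings are hypothetical names.


class Mapping(NamedTuple):
    """Hypothetical stored mapping of one cross-link to one structure."""
    pdb_id: str
    max_evalue: float
    min_coverage: float
    resolution: float  # Å


def score_key(m: Mapping):
    # Lower key is better: low max E-value, then high min coverage, then fine resolution.
    return (m.max_evalue, -m.min_coverage, m.resolution)


def merge_new_mappings(stored: Dict[str, List[Mapping]],
                       new: Dict[str, List[Mapping]],
                       limit: int = 15) -> Dict[str, List[Mapping]]:
    """Merge mappings to newly released structures and keep the `limit` best per cross-link."""
    merged = {}
    for xl in stored.keys() | new.keys():
        combined = stored.get(xl, []) + new.get(xl, [])
        merged[xl] = sorted(combined, key=score_key)[:limit]  # drops lower-scoring entries
    return merged


# Placeholder IDs; limit=2 stands in for the limit of 15 to keep the example short.
stored = {"xl1": [Mapping("OLD1", 1e-40, 0.90, 3.0), Mapping("OLD2", 1e-10, 0.60, 4.0)]}
new = {"xl1": [Mapping("NEW1", 1e-30, 0.95, 2.8)], "xl2": [Mapping("NEW2", 1e-8, 0.70, 3.5)]}
merged = merge_new_mappings(stored, new, limit=2)
print([m.pdb_id for m in merged["xl1"]])  # ['OLD1', 'NEW1']
```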

Structures mapped to cross-links were compared pairwise by extracting their individual mapped chains, superposing them with TM-align [24], and reporting the RMSD. For each cross-link, a single sample-species structure was used as a reference and compared with the corresponding chains of each of the other structures mapped to the cross-link. When more than one comparison was made, the mean RMSD value was reported. Orthologous structure chains of different organisms, and chains of alternative protein structures of the same organism, were identified by having identical descriptions. For each sample protein, only mappings to a single arbitrary intra-protein cross-link were evaluated to avoid redundancy. In total, comparisons were made for orthologous structures of 492 proteins and alternative same-species structures of 4,197 proteins, as well as for 4,810 pairs of random structures of any species.

Docking of two proteins was performed as previously described [25] with the Integrative Modeling Platform [12] idock version 2.5.0 and PatchDock (2014 version), using the --precision=1 option and outputting the single top scoring model.

Results and Discussion

Mapping cross-links to structures using Protein BLAST

Automated mapping of cross-links to homologous structures is implemented as a new feature on the public XLinkDB database (http://xlinkdb.gs.washington.edu/xlinkdb/). As illustrated in Figure 1, alignment of cross-linked protein sequences to chains in structures is achieved using Protein BLAST with strict criteria for sequence homology (see Methods). Structures with chains homologous to both cross-linked proteins are identified; within each, the cross-linked protein residue positions are recorded to enable viewing with applications such as NGL, and their Euclidean distances are calculated to assess consistency with the cross-linker maximum span. When multiple instances of one or both cross-linked proteins are present in the structure, details of multiple chain pairs are recorded so that cross-links can be assessed with respect to all of them. Structure files mapped to cross-links are stored in the XLinkDB database for display and exploration in its data set tables. This enables researchers to easily leverage available structures to help interpret their identified sample cross-links. In addition, on the new Structure View page of the XLinkDB website, one can search for all cross-links mapped to one or two specified PDB structures.

Figure 1.

Automated mapping of sample cross-links to structures based on sequence homology. In the case of intra-protein cross-links, Proteins A and B are the same molecule.

This feature on XLinkDB has greatly expanded the number of structures in which identified cross-links can be viewed to help infer sample protein conformations. Previously on XLinkDB, only 19% of intra-protein cross-links had an associated structure (a single one), as did 1% of inter-protein cross-links, all of them homodimers. In contrast, we now identify structures containing both proteins of inter-protein cross-links and, when possible, include not just a single structure but up to 15 different ones from multiple species and/or conformational states. As a result, 47% of all cross-links on XLinkDB now have available structures, including 64% of the 56,340 intra-protein and 26% of the 44,267 inter-protein unique cross-linked residue pairs. Mapped cross-links on average have 10 corresponding structure files generated from 3 different species, the majority having at least one structure from the same species as the cross-linked sample.

Alignments of cross-linked proteins to chains in structure files of a different organism have an average overall sequence identity of 60% (range 17–100%), consistent with expected high structural conservation [13]. Orthologous protein structures mapped to cross-links of a different species were indeed found to be highly conserved, having an average root mean square deviation (RMSD) of 1.5Å with respect to a mapped structure of the sample species, only slightly higher than the 1.0Å average observed among alternative protein structures of the sample species (see Methods). In contrast, 93% of comparisons between two random protein structures either failed to align or had RMSD values greater than 4Å. We thus expect many mapped structures of non-sample organisms to provide good estimates of the sample protein conformations.

Similarity of orthologous protein structures and interaction

Currently, 20,271 of the 47,397 (43%) unique cross-linked residue pairs on XLinkDB that are mapped to one or more structures are mapped only to structures of non-sample species. Structures of homologous proteins from other organisms can still be helpful, since sequence conservation often coincides with structural conservation. This is demonstrated by the mouse heart sample inter-protein cross-link [20] between two subunits of mitochondrial Complex I that was mapped to 15 structures derived from five mammals; its distances in those structures, 16.2 ± 0.4Å, indicate strong structural conservation of the protein interaction. Figure 2A shows representative entries from 5 different species in the XLinkDB list of structures, indicating the title and resolution of each structure, the organism from which it is derived, and the chains to which each cross-linked protein sequence is aligned, including their descriptions. This information can be used to assess the relevance of structures. For example, the chain descriptions of the pig (S.scrofa), human, and cow (B.taurus) structures exactly match the descriptions of the cross-linked mouse proteins. In addition, the cross-link distance in the context of each structure is indicated, as well as the chains and positions of the cross-linked residues, which, when clicked, display the Protein BLAST alignment.

Figure 2.

Mapping of a mouse inter-protein cross-link between two components of mitochondrial Complex I, NADH dehydrogenase iron-sulfur protein 2 and NADH dehydrogenase 1 alpha subunit 7, to five orthologous structures. The cross-linked peptide pair VSPPKR-LPVGPSHKLSNNYYCTR spans the two proteins at lysine residue positions 367 and 48, respectively. A. Details of structures derived from mouse (6G2J [26]), pig (5GUP [27]), human (5XTD [28]), cow (5O31 [29]), and sheep (6QA9 [30]). B. The two components of the structures to which the cross-link was mapped are shown superposed on mouse structure 6G2J, along with the cross-linked lysine residues, colored by structure; their distances are shown in the right inset.

Figure 2B shows the superposition of the two protein components of the above five structures to which the inter-protein cross-link was mapped, in shades of red and blue, respectively. Only the mouse structure 6G2J is derived from the same organism as the cross-linked sample. Also shown are the Euclidean distances of the cross-linked residues in the context of the 5 structures, which agree closely, consistent with the evident high conservation of the protein structures and interaction. As a result, the orthologous protein residues may also be observed cross-linked in similar human, pig, cow, and sheep samples. Surprisingly, an initial analysis of large-scale cross-link data revealed 2,447 cross-linked peptide pairs on XLinkDB that were identified in samples of multiple organisms as a consequence of structural homology. For example, the intra-protein cross-link of Histone H3 between peptides TK5QTAR and K10STGGKAPR at lysine residue positions 5 and 10 is present in human, mouse, and yeast sample data sets. These proteins have similar sequences, and likely similar structures, but the conservation of cross-link sites in these cross-linked samples nonetheless indicates even greater similarity of the protein environment at the systems level. This is further demonstrated by the inter-protein cross-link between Histone H3 peptide EIAQDFK80TDLR and H2B peptide LLLPGELAK109HAVSEGTK at lysine residue positions 80 and 109, respectively, which is observed in human, mouse, and cow data sets. XLinkDB can thus serve as a public resource to investigate conservation of protein structures and interactions at the systems level revealed by common cross-links, even in the absence of known structures. It can also be used to help validate novel cross-links identified in samples of other species.

Viewing cross-links in the context of mapped structures

The automated mapping of cross-links to structure files makes it easy to view sample cross-links on XLinkDB in the context of a variety of structures for data exploration. One example is the set of cross-links identified in mouse samples within and between subunits of mitochondrial ATP synthase [20], for which no mouse structure is currently available. There are, however, structures from other species to which these cross-links were mapped, including 5ARA [31], a structure of bovine mitochondrial ATP synthase in state 1a. The Euclidean distances of 86 mapped cross-links (72 unique cross-linked residue pairs) in the context of the 5ARA structure are shown in Figure 3A; nearly all are consistent with the 35Å maximum span of the BDP-NHP cross-linker, defined as the distance that 90% of its observed cross-links do not exceed in the context of structures [32]. Table S1 contains details of the mappings of the cross-linked proteins to specific chains within the structure file, indicating the maximum expect scores and minimum coverage of their Protein BLAST alignments. Also included are the descriptions of the cross-linked proteins along with those of the chains of the structure file to which they are mapped. Even though the mouse cross-linked proteins are mapped to a bovine structure, one can see that the cross-links are indeed mapped to homologous sites in the structure. For example, 3 cross-links between mouse mitochondrial ATP synthase subunits epsilon and alpha are mapped to lysine residues on structure file chains I and A, respectively, corresponding to bovine mitochondrial ATP synthase subunits epsilon and alpha.

Figure 3.

Mouse cross-linked residues in the context of the 5ARA bovine structure. A. Distribution of mouse cross-linked residue Euclidean distances in the context of 5ARA. B. Cross-linked residue pairs shown mapped to 5ARA; the cross-link with an inconsistently large distance, discussed in the text, is colored pink. C. Scatter plot of cross-linked residue distances in the context of bovine structures of two rotational states, 5ARA (state 1a) and 5FIL (state 3b). The inter-protein cross-link shown in pink in part B is also shown here in pink, indicating its consistency with the structure of state 3b and not with 1a.

These 72 cross-linked residue pairs mapped to various chains of the structure can be viewed and explored in NGL, as shown in Figure 3B. One cross-link between mitochondrial ATP synthase subunit β and coupling factor 6 with an inconsistently large distance is shown in pink.

One can easily compare cross-links in the context of two different structures to identify those consistent with one and not the other. For example, Figure 3C shows the scatter plot generated on XLinkDB comparing sample cross-link distances in the context of structures 5ARA and 5FIL [31], bovine ATP synthase rotational states 1a and 3b, again indicating in pink the aforementioned cross-link that is inconsistent with the former (43Å) yet consistent with the latter (35.3Å). If these structures are indeed conserved between mouse and cow, such a cross-link could potentially serve as an indicator of the presence of rotational state 3b.
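Comparing cross-link consistency across two structures, as in the scatter plot described above, amounts to classifying each cross-link by whether its distance is within the maximum span in each structure. A minimal sketch; the function name, cross-link labels, and distances are hypothetical, and the 35Å default is the BDP-NHP span stated in the text.

```python
from typing import Dict, List

# Illustrative sketch; classify_by_state is a hypothetical helper.


def classify_by_state(dist_a: Dict[str, float], dist_b: Dict[str, float],
                      max_span: float = 35.0) -> Dict[str, List[str]]:
    """Group cross-links mapped to both structures by span consistency in each."""
    groups: Dict[str, List[str]] = {"both": [], "only_a": [], "only_b": [], "neither": []}
    for xl in sorted(dist_a.keys() & dist_b.keys()):  # ignore cross-links mapped to only one
        ok_a = dist_a[xl] <= max_span
        ok_b = dist_b[xl] <= max_span
        if ok_a and ok_b:
            groups["both"].append(xl)
        elif ok_a:
            groups["only_a"].append(xl)
        elif ok_b:
            groups["only_b"].append(xl)
        else:
            groups["neither"].append(xl)
    return groups


# Hypothetical distances (Å) of the same cross-links in two rotational-state structures.
state_a = {"xl1": 20.0, "xl2": 43.0, "xl3": 28.0, "xl4": 60.0, "xl5": 12.0}
state_b = {"xl1": 22.0, "xl2": 30.0, "xl3": 50.0, "xl4": 70.0}
print(classify_by_state(state_a, state_b))
# {'both': ['xl1'], 'only_a': ['xl3'], 'only_b': ['xl2'], 'neither': ['xl4']}
```

Cross-links in the `only_a` or `only_b` groups are the candidates for state-specific indicators.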

The mapping process includes alignment of cross-linked protein residues with those of the structure chains to which they are mapped, often at non-identical positions. Doing this manually can be very time consuming and can impede viewing cross-link data in structures of other species. Figure 4A shows the automated alignment of 16 identified mouse intra-protein cross-links at 12 distinct residues of ADP/ATP translocase 1 [19] to structures of alternative protein conformations, B.taurus 2C3E [33] (c-state) and M.thermophila 6GCI [34] (m-state). The residue positions clearly differ in the chains of both structures relative to those of the cross-linked mouse protein. In addition, because of insertions and deletions in the sequence alignment to 6GCI, the position offset between them is not constant. The alignment enables viewing the mouse cross-linked residues in the context of both structures using NGL. In Figure 4B, one can see that the cross-linked residues, each distinctly colored, are arranged differently in the two structures, indicating a large conformational change. Using XLinkDB in this manner, one can easily identify cross-links that may form only within one conformation and not the other. An example cross-link, shown as a yellow line, between protein residues 23 (green) and 272 (yellow) is obstructed in structure 2C3E, passing through residue 33 (aqua), but not in structure 6GCI. Using Jwalk [6], this cross-link was found to have SASD values of 63.1Å and 31.1Å in the two structures, respectively. Assuming conservation of the structures across species, this suggests that the cross-link likely originates from the m-state conformation of mouse ADP/ATP translocase 1.

Figure 4.

Mapping of 16 mouse ADP/ATP translocase 1 intra-protein cross-links to B.taurus 2C3E and M.thermophila 6GCI structures. A. Corresponding residue positions and amino acids of the mouse cross-linked sites A and B in the two structures. Residues 57 and 100 in 6GCI, corresponding to 49 and 92 in the mouse sequence, respectively, are arginine rather than lysine, and residue 267 in 6GCI corresponding to 260 in the mouse sequence is glutamine, as depicted in red font. B. Structures showing the location of the mouse cross-linked residues, each labeled and colored distinctly facilitating comparison between their sites in the two structures. The cross-link between residues 23 and 272 is displayed as a yellow line.

Reassessment of presumed intra-protein cross-links

Multimeric structures to which intra-protein cross-links are mapped provide opportunities to reassess whether the cross-links may be homodimeric inter-protein cross-links, originating from distinct protein molecules in proximity. By evaluating whether the Euclidean distances of the presumed intra-protein cross-links in intra-chain and/or inter-chain contexts within the multimeric structures are consistent with the maximum span of the cross-linker used (35Å for BDP-NHP, 27Å for DSSO), one can identify cross-links that are possibly or likely inter-protein. In total, 29,676 distinct intra-protein cross-links on XLinkDB were mapped to one or more of 16,547 different multimeric structures. Of those cross-links, 51% had distances within the expected range only in an intra-chain context, 10% only in an inter-chain context, and 39% in either. This suggests that a significant fraction of presumed intra-protein cross-links may instead, or in addition, be inter-protein. Using even more conservative criteria for assessing cross-link consistency, 639 presumed intra-protein cross-links had distances within 25Å in an inter-chain context and distances greater than 60Å in all intra-chain contexts. These cases strongly indicate a homodimer origin. The remaining intra-protein cross-links among the 56,188 on XLinkDB are not mapped to multimeric structures and thus cannot be assessed in this manner.
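The two classification rules in this paragraph can be sketched as follows: a cross-link is labeled by the contexts (intra-chain, inter-chain, or both) in which any mapped distance falls within the cross-linker span, and flagged as a strong homodimer candidate when some inter-chain distance is at most 25Å while every intra-chain distance exceeds 60Å. The function names are hypothetical; the usage lines reuse the two 2CXN distances reported for glucose-6-phosphate isomerase in the next paragraph, treating them as the only mapped contexts.

```python
from typing import List

# Illustrative sketch; function names are hypothetical.


def classify_context(intra: List[float], inter: List[float], max_span: float) -> str:
    """Contexts in which any mapped distance (Å) is within the cross-linker span."""
    ok_intra = any(d <= max_span for d in intra)
    ok_inter = any(d <= max_span for d in inter)
    if ok_intra and ok_inter:
        return "either"
    if ok_intra:
        return "intra-chain only"
    if ok_inter:
        return "inter-chain only"
    return "neither"


def strong_homodimer(intra: List[float], inter: List[float],
                     inter_max: float = 25.0, intra_min: float = 60.0) -> bool:
    """Conservative rule: some inter-chain distance <= 25Å and all intra-chain > 60Å."""
    return bool(inter) and bool(intra) and min(inter) <= inter_max and min(intra) > intra_min


# 2CXN distances for the glucose-6-phosphate isomerase example (BDP-NHP span 35Å).
intra, inter = [65.6], [23.8]
print(classify_context(intra, inter, max_span=35.0))  # inter-chain only
print(strong_homodimer(intra, inter))                 # True
```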

An example presumed intra-protein cross-link of human glucose-6-phosphate isomerase between residues 454 and 524, identified in human tissue culture cells [35], has a distance in the context of mapped PDB structures consistent with the cross-linker maximum span (distance ≤ 35Å) only in an inter-chain, and not an intra-chain, context. Figure 5 shows the cross-linked residues in inter-chain and intra-chain contexts in PDB structure 2CXN [36], the M.musculus crystal structure of the mouse AMF / phosphate dimer complex. The cross-linked residue pair on the left, between two distinct monomers (red and blue), has a distance of 23.8Å (link indicated with a black line), whereas the residues on the bottom, within a single chain (red), are separated by 65.6Å (link indicated with a yellow line), too great a distance to give rise to the cross-link.

Figure 5.

Presumed human glucose-6-phosphate isomerase intra-protein cross-link between residues 454 and 524 consistent with homodimer origin (23.8Å) according to mouse glucose-6-phosphate isomerase dimer structure, 2CXN, with monomers depicted in red and blue. Its distance of 65.6Å in the context of an intra-protein cross-link, shown in yellow, is greater than the 35Å maximum span of the BDP-NHP cross-linker.

Use of mapped structures to evaluate protein-protein docking

We previously described how cross-linked peptides can be used with molecular docking to predict protein-protein complex structures [25, 37]. Having a large number of structures to which inter-protein cross-links are mapped provides an opportunity to assess cross-link-aided docking of the interacting proteins, as illustrated in Figure 6. We used this approach to evaluate docking as previously performed on XLinkDB using the Integrative Modeling Platform [12] guided by available inter-protein cross-links (see Methods). In 94 cases in which a cross-linked protein pair was mapped to a structure from the same species in which at least one protein had 20 or more residues within 10Å of a residue of the other, both chains were extracted from the structure and subjected to docking, using as guiding constraints the available BDP-NHP cross-links with distances within 35Å in the structure. After docking, the orientation of the second “ligand” protein was compared with that in the structure, and the RMSD was reported. We find that 49% of the docking runs produced a protein pair model in strong agreement with the structure, having an RMSD no greater than 5Å (Table S2). In contrast, 17% and 3% of docked models disagreed strongly with the structure, having an RMSD greater than 60Å and 100Å, respectively. These results generally validate the method used on XLinkDB, although when structures containing the cross-linked protein pairs are available, docking becomes unnecessary, avoiding the roughly 50% chance of an unsuccessful model. Mapped structures could be used in a similar manner to evaluate and optimize any method of protein-protein docking.
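The interface criterion used to select test cases (at least one protein with 20 or more residues within 10Å of a residue of the other) and the RMSD bands reported above can be sketched in pure Python. Representing each residue by one coordinate (for example, Cα) is an assumption, the function names and chains are hypothetical, and the docking run and ligand RMSD calculation themselves are not reproduced. The bands here are mutually exclusive, which may differ from how the percentages in the text were tallied.

```python
import math
from typing import Sequence, Tuple

# Illustrative sketch; function names are hypothetical.
Coord = Tuple[float, float, float]


def interface_count(chain: Sequence[Coord], other: Sequence[Coord],
                    cutoff: float = 10.0) -> int:
    """Residues of `chain` within `cutoff` Å of at least one residue of `other`."""
    return sum(1 for r in chain if any(math.dist(r, o) <= cutoff for o in other))


def eligible_for_docking_test(chain_a: Sequence[Coord], chain_b: Sequence[Coord],
                              min_residues: int = 20, cutoff: float = 10.0) -> bool:
    """At least one protein has >= min_residues interface residues."""
    return (interface_count(chain_a, chain_b, cutoff) >= min_residues
            or interface_count(chain_b, chain_a, cutoff) >= min_residues)


def rmsd_band(rmsd: float) -> str:
    """Mutually exclusive RMSD bands (Å) for a docked model versus the structure."""
    if rmsd <= 5.0:
        return "strong agreement"
    if rmsd > 100.0:
        return "strong disagreement (> 100Å)"
    if rmsd > 60.0:
        return "strong disagreement (60-100Å)"
    return "intermediate"


# Hypothetical chains of 25 residues, 3.8Å apart, at 8Å and 30Å separation.
a = [(i * 3.8, 0.0, 0.0) for i in range(25)]
b = [(i * 3.8, 8.0, 0.0) for i in range(25)]
c = [(i * 3.8, 30.0, 0.0) for i in range(25)]
print(eligible_for_docking_test(a, b), eligible_for_docking_test(a, c))  # True False
```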

Figure 6.

Docking of protein pairs extracted from multi-protein structures. Protein pair chains are separated from the structure file and subjected to docking guided by available inter-protein cross-links, indicated as green lysine residues. The resulting two-protein model was then compared with the chains in the original structure by computing RMSD.

Conclusions

Automated mapping of cross-links to structure files on XLinkDB has greatly expanded the number of structures researchers can use to help infer the protein conformations in their samples, including structures generated from organisms different from those of the samples. Cross-links can be viewed and assessed in the context of a variety of conformational states, when available, and in structures generated from homologous proteins of organisms different from the sample. Available structures can also be used to reassess presumed intra-protein cross-links as possibly homodimeric inter-protein, and to evaluate protein-protein docking by any method, as we did for the method used in the past on XLinkDB. In all cases, it is important to have high quality cross-link data with a low level of false positives. Data from the Bruce Lab are now routinely analyzed with XLinkProphet [38] and filtered to an estimated 1% FDR prior to upload to XLinkDB.

Mapping sample inter-protein cross-links to structures can also serve as confirmatory evidence of the cross-link identification. It is very unlikely that a false-positive inter-protein cross-link between two peptides would, by chance, be mapped to a structure containing both of its corresponding proteins in close proximity. To test this notion, we calculated the fraction of inter-protein cross-links that were mapped to structures in a human nuclear dataset analyzed by a method with a high reported FDR39–41. Whereas 33% of all distinct Bruce Lab inter-protein cross-links can be confirmed by mapping to structures, only 6% of those in the nuclear dataset were, roughly one fifth the rate, consistent with a high fraction of false positives. Validation of cross-links in this manner could prove very useful, since false positives are generally enriched among inter-protein cross-links38, 42.
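The confirmation rate compared above is a fraction over distinct inter-protein cross-links. A minimal sketch follows; the (protein, site, protein, site) tuple layout, the canonical ordering used to count a cross-link once regardless of which protein is listed first, and supplying the mapped cross-links as a precomputed collection are all assumptions for illustration.

```python
def canonical(link):
    """Order the two sites so (A, 5, B, 9) and (B, 9, A, 5) count as one cross-link."""
    a, b = (link[0], link[1]), (link[2], link[3])
    return a + b if a <= b else b + a


def confirmed_fraction(inter_links, mapped_links):
    """Fraction of distinct inter-protein cross-links that map to a structure."""
    distinct = {canonical(x) for x in inter_links}
    mapped = {canonical(x) for x in mapped_links}
    return len(distinct & mapped) / len(distinct) if distinct else 0.0
```

With the reported rates, 0.06 / 0.33 ≈ 0.18, which is roughly one fifth.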

Structures from non-sample organisms, particularly distantly related ones, may differ from those of the sample and so must be used as a best approximation. Furthermore, consistency of cross-link distances with any given structure does not conclusively indicate that an identical sample protein conformation gave rise to the observed cross-links. There may also be protein conformations for which no structure currently exists and which therefore cannot be evaluated in this manner. In addition, few available PDB structures include the post-translational modifications that may exist in cross-linked samples and affect protein conformations and interactions. Nevertheless, by mapping to all available structures, XLinkDB enables researchers to explore their data in a large variety of contexts. In the future, we hope to increase this variety by clustering the structures mapped to a cross-link and keeping a representative structure from each cluster.
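The clustering mentioned above is planned rather than implemented, and its method is not specified. One hypothetical approach, sketched here only as an illustration, groups structures whose distances for the same cross-links agree within a tolerance and keeps the first structure of each group as its representative; the distance-profile representation and the 3Å tolerance are assumptions. A structural similarity score such as the TM-score from TM-align24 could serve as an alternative criterion.

```python
import numpy as np


def representative_structures(profiles, tol=3.0):
    """Greedy (leader) clustering of structures by cross-link distance profile.

    profiles: dict structure_id -> sequence of distances for the same ordered
    list of cross-links, with NaN where a cross-link does not map.
    Returns dict representative_id -> list of member structure ids.
    """
    clusters = {}
    for sid, prof in profiles.items():
        prof = np.asarray(prof, dtype=float)
        for rep in clusters:
            ref = np.asarray(profiles[rep], dtype=float)
            shared = ~np.isnan(prof) & ~np.isnan(ref)  # cross-links mapped in both
            if shared.any() and np.max(np.abs(prof[shared] - ref[shared])) <= tol:
                clusters[rep].append(sid)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters[sid] = [sid]
    return clusters
```

Leader clustering depends on input order; sorting structures by a quality measure such as resolution before clustering would make the chosen representatives more meaningful.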

At present, 53% (53,290) of all cross-links on XLinkDB still cannot be mapped to any available structure. Furthermore, 4% of mapped cross-links have residue distances that are not explained by any available structure. This may be due to flexible regions of structures, in which some inter-lysine distances in sample protein molecules can differ from those in the structure. In addition, these cross-links may arise from multimers formed in the sample that are not present in any structure. Even many intra-protein cross-links, as we have shown in this work, are possibly or likely derived from distinct, closely interacting protein molecules. This suggests that cross-linking detects many protein conformations for which no structure yet exists, and can help guide researchers toward obtaining the missing structures.
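The bookkeeping behind the unmapped and unexplained fractions above can be sketched as follows. The sketch assumes that a mapped cross-link counts as explained if its distance is within a maximum span in at least one mapped structure; the 35Å default is borrowed from the docking restraint cutoff used earlier in this section and may not match the criterion XLinkDB applies. Function names are hypothetical.

```python
def classify(distances_by_link, max_span=35.0):
    """Count cross-links as unmapped, explained, or unexplained.

    distances_by_link: dict cross-link id -> list of its residue distances
    (angstroms) in every structure it maps to; an empty list means unmapped.
    """
    counts = {"unmapped": 0, "explained": 0, "unexplained": 0}
    for dists in distances_by_link.values():
        if not dists:
            counts["unmapped"] += 1
        elif min(dists) <= max_span:   # satisfied by at least one structure
            counts["explained"] += 1
        else:
            counts["unexplained"] += 1
    return counts


def fractions(counts):
    """Return (unmapped / all cross-links, unexplained / mapped cross-links)."""
    total = sum(counts.values())
    mapped = counts["explained"] + counts["unexplained"]
    return (counts["unmapped"] / total if total else 0.0,
            counts["unexplained"] / mapped if mapped else 0.0)
```

As a consistency check on the reported figures, 53,290 unmapped cross-links at 53% corresponds to roughly 100,500 cross-links in total (53,290 / 0.53), in line with the more than 100,000 stated in the Abstract.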

Supplementary Material

supplemental table 2

Table S2 - Results of docking protein chain pairs extracted from structures, guided by inter-protein cross-links.

supplemental table 1

Table S1 - Details of the mappings of the 72 cross-linked protein residue pairs identified in mouse heart mitochondria20 to specific chains within the structure 5ARA of bovine mitochondrial ATP synthase state 1a.

supplementary info

Figure S1 - Examples of three cross-links that could not be mapped to a structure with homologous sequences.

References

  • 1. Keller A; Chavez JD; Eng JK; Thornton Z; Bruce JE, Tools for 3D Interactome Visualization. J Proteome Res 2019, 18, (2), 753–758.
  • 2. Berman HM; Westbrook J; Feng Z; Gilliland G; Bhat TN; Weissig H; Shindyalov IN; Bourne PE, The Protein Data Bank. Nucleic Acids Res 2000, 28, (1), 235–42.
  • 3. Holding AN, XL-MS: Protein cross-linking coupled with mass spectrometry. Methods 2015, 89, 54–63.
  • 4. Leitner A; Faini M; Stengel F; Aebersold R, Crosslinking and Mass Spectrometry: An Integrated Technology to Understand the Structure and Function of Molecular Machines. Trends Biochem Sci 2016, 41, (1), 20–32.
  • 5. Zhang H; Tang X; Munske GR; Tolic N; Anderson GA; Bruce JE, Identification of protein-protein interactions and topologies in living cells with chemical cross-linking and mass spectrometry. Mol Cell Proteomics 2009, 8, (3), 409–20.
  • 6. Matthew Allen Bullock J; Schwab J; Thalassinos K; Topf M, The Importance of Non-accessible Crosslinks and Solvent Accessible Surface Distance in Modeling Proteins with Restraints From Crosslinking Mass Spectrometry. Mol Cell Proteomics 2016, 15, (7), 2491–500.
  • 7. Chavez JD; Schweppe DK; Eng JK; Zheng C; Taipale A; Zhang Y; Takara K; Bruce JE, Quantitative interactome analysis reveals a chemoresistant edgotype. Nat Commun 2015, 6, 7928.
  • 8. Rampler E; Stranzl T; Orban-Nemeth Z; Hollenstein DM; Hudecz O; Schlogelhofer P; Mechtler K, Comprehensive Cross-Linking Mass Spectrometry Reveals Parallel Orientation and Flexible Conformations of Plant HOP2-MND1. J Proteome Res 2015, 14, (12), 5048–62.
  • 9. Webb B; Sali A, Protein Structure Modeling with MODELLER. Methods Mol Biol 2017, 1654, 39–54.
  • 10. Kelley LA; Mezulis S; Yates CM; Wass MN; Sternberg MJ, The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc 2015, 10, (6), 845–58.
  • 11. Schneidman-Duhovny D; Inbar Y; Nussinov R; Wolfson HJ, PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res 2005, 33, (Web Server issue), W363–7.
  • 12. Russel D; Lasker K; Webb B; Velazquez-Muriel J; Tjioe E; Schneidman-Duhovny D; Peterson B; Sali A, Putting the pieces together: integrative modeling platform software for structure determination of macromolecular assemblies. PLoS Biol 2012, 10, (1), e1001244.
  • 13. Chothia C; Lesk AM, The relation between the divergence of sequence and structure in proteins. EMBO J 1986, 5, (4), 823–6.
  • 14. Wood TC; Pearson WR, Evolution of protein sequences and structures. J Mol Biol 1999, 291, (4), 977–95.
  • 15. Maguid S; Fernandez-Alberti S; Parisi G; Echave J, Evolutionary conservation of protein backbone flexibility. J Mol Evol 2006, 63, (4), 448–57.
  • 16. Fiser A, Template-based protein structure modeling. Methods Mol Biol 2010, 673, 73–94.
  • 17. Szilagyi A; Zhang Y, Template-based structure modeling of protein-protein interactions. Curr Opin Struct Biol 2014, 24, 10–23.
  • 18. Jang WD; Lee SM; Kim HU; Lee SY, Systematic and Comparative Evaluation of Software Programs for Template-Based Modeling of Protein Structures. Biotechnol J 2020, e1900343.
  • 19. Schweppe DK; Chavez JD; Lee CF; Caudal A; Kruse SE; Stuppard R; Marcinek DJ; Shadel GS; Tian R; Bruce JE, Mitochondrial protein interactome elucidated by chemical cross-linking mass spectrometry. Proc Natl Acad Sci U S A 2017, 114, (7), 1732–1737.
  • 20. Chavez JD; Lee CF; Caudal A; Keller A; Tian R; Bruce JE, Chemical Crosslinking Mass Spectrometry Analysis of Protein Conformations and Supercomplexes in Heart Tissue. Cell Syst 2018, 6, (1), 136–141.e5.
  • 21. Chavez JD; Tang X; Campbell MD; Reyes G; Kramer PA; Stuppard R; Keller A; Zhang H; Rabinovitch PS; Marcinek DJ; Bruce JE, Mitochondrial protein interaction landscape of SS-31. Proc Natl Acad Sci U S A 2020, 117, (26), 15363–15373.
  • 22. Rose AS; Hildebrand PW, NGL Viewer: a web application for molecular visualization. Nucleic Acids Res 2015, 43, (W1), W576–9.
  • 23. Altschul SF; Gish W; Miller W; Myers EW; Lipman DJ, Basic local alignment search tool. J Mol Biol 1990, 215, (3), 403–10.
  • 24. Zhang Y; Skolnick J, TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 2005, 33, (7), 2302–9.
  • 25. Schweppe DK; Zheng C; Chavez JD; Navare AT; Wu X; Eng JK; Bruce JE, XLinkDB 2.0: integrated, large-scale structural analysis of protein crosslinking data. Bioinformatics 2016, 32, (17), 2716–8.
  • 26. Agip AA; Blaza JN; Bridges HR; Viscomi C; Rawson S; Muench SP; Hirst J, Cryo-EM structures of complex I from mouse heart mitochondria in two biochemically defined states. Nat Struct Mol Biol 2018, 25, (7), 548–556.
  • 27. Wu M; Gu J; Guo R; Huang Y; Yang M, Structure of Mammalian Respiratory Supercomplex I(1)III(2)IV(1). Cell 2016, 167, (6), 1598–1609.e10.
  • 28. Guo R; Zong S; Wu M; Gu J; Yang M, Architecture of Human Mitochondrial Respiratory Megacomplex I(2)III(2)IV(2). Cell 2017, 170, (6), 1247–1257.e12.
  • 29. Blaza JN; Vinothkumar KR; Hirst J, Structure of the Deactive State of Mammalian Respiratory Complex I. Structure 2018, 26, (2), 312–319.e3.
  • 30. Letts JA; Fiedorczuk K; Degliesposti G; Skehel M; Sazanov LA, Structures of Respiratory Supercomplex I+III(2) Reveal Functional and Conformational Crosstalk. Mol Cell 2019, 75, (6), 1131–1146.e6.
  • 31. Zhou A; Rohou A; Schep DG; Bason JV; Montgomery MG; Walker JE; Grigorieff N; Rubinstein JL, Structure and conformational states of the bovine mitochondrial ATP synthase by cryo-EM. Elife 2015, 4, e10180.
  • 32. Keller A; Chavez JD; Felt KC; Bruce JE, Prediction of an Upper Limit for the Fraction of Interprotein Cross-Links in Large-Scale In Vivo Cross-Linking Studies. J Proteome Res 2019, 18, (8), 3077–3085.
  • 33. Nury H; Dahout-Gonzalez C; Trézéguet V; Lauquin G; Brandolin G; Pebay-Peyroula E, Structural basis for lipid-mediated interactions between mitochondrial ADP/ATP carrier monomers. FEBS Lett 2005, 579, (27), 6031–6.
  • 34. Ruprecht JJ; King MS; Zögg T; Aleksandrova AA; Pardon E; Crichton PG; Steyaert J; Kunji ERS, The Molecular Mechanism of Transport by the Mitochondrial ADP/ATP Carrier. Cell 2019, 176, (3), 435–447.e15.
  • 35. Chavez JD; Schweppe DK; Eng JK; Bruce JE, In Vivo Conformational Dynamics of Hsp90 and Its Interactors. Cell Chem Biol 2016, 23, (6), 716–26.
  • 36. Tanaka N; Haga A; Naba N; Shiraiwa K; Kusakabe Y; Hashimoto K; Funasaka T; Nagase H; Raz A; Nakamura KT, Crystal structures of mouse autocrine motility factor in complex with carbohydrate phosphate inhibitors provide insight into structure-activity relationship of the inhibitors. J Mol Biol 2006, 356, (2), 312–24.
  • 37. Zheng C; Weisbrod CR; Chavez JD; Eng JK; Sharma V; Wu X; Bruce JE, XLink-DB: database and software tools for storing and visualizing protein interaction topology data. J Proteome Res 2013, 12, (4), 1989–95.
  • 38. Keller A; Chavez JD; Bruce JE, Increased sensitivity with automated validation of XL-MS cleavable peptide crosslinks. Bioinformatics 2019, 35, (5), 895–897.
  • 39. Liu F; Lössl P; Scheltema R; Viner R; Heck AJR, Optimized fragmentation schemes and data analysis strategies for proteome-wide cross-link identification. Nat Commun 2017, 8, 15473.
  • 40. Ser Z; Cifani P; Kentsis A, Optimized Cross-Linking Mass Spectrometry for in Situ Interaction Proteomics. J Proteome Res 2019, 18, (6), 2545–2558.
  • 41. Beveridge R; Stadlmann J; Penninger JM; Mechtler K, A synthetic peptide library for benchmarking crosslinking-mass spectrometry search engines for proteins and protein complexes. Nat Commun 2020, 11, (1), 742.
  • 42. Bartolec TK; Smith DL; Pang CNI; Xu YD; Hamey JJ; Wilkins MR, Cross-linking Mass Spectrometry Analysis of the Yeast Nucleus Reveals Extensive Protein-Protein Interactions Not Detected by Systematic Two-Hybrid or Affinity Purification-Mass Spectrometry. Anal Chem 2020, 92, (2), 1874–1882.
