. 2015 Mar;189(3):177–183. doi: 10.1016/j.jsb.2015.01.014

Xlink Analyzer: Software for analysis and visualization of cross-linking data in the context of three-dimensional structures

Jan Kosinski 1, Alexander von Appen 1, Alessandro Ori 1, Kai Karius 1, Christoph W Müller 1, Martin Beck 1
PMCID: PMC4359615  PMID: 25661704

Abstract

Structural characterization of large multi-subunit protein complexes often requires integrating various experimental techniques. Cross-linking mass spectrometry (XL-MS) identifies proximal protein residues and thus is increasingly used to map protein interactions and determine the relative orientation of subunits within the structure of protein complexes. To fully adapt XL-MS as a structure characterization technique, we developed Xlink Analyzer, a software tool for visualization and analysis of XL-MS data in the context of the three-dimensional structures. Xlink Analyzer enables automatic visualization of cross-links, identifies cross-links violating spatial restraints, calculates violation statistics, maps chemically modified surfaces, and allows interactive manipulations that facilitate analysis of XL-MS data and aid designing new experiments. We demonstrate these features by mapping interaction sites within RNA polymerase I and the Rvb1/2 complex. Xlink Analyzer is implemented as a plugin to UCSF Chimera, a standard structural biology software tool, and thus enables seamless integration of XL-MS data with, e.g. fitting of X-ray structures to EM maps. Xlink Analyzer is available for download at http://www.beck.embl.de/XlinkAnalyzer.html.

Abbreviations: XL-MS, cross-linking mass spectrometry; MS, mass spectrometry; 3D, three-dimensional; EM, electron microscopy; Pol I, RNA polymerase I; ld-score, linear discriminant score; tWH, tandem winged helix domain

Keywords: Cross-linking, Mass spectrometry, Visualization, Analysis, XL-MS, Integrative modeling

1. Introduction

Cross-linking mass spectrometry (XL-MS) has recently emerged as one of the major techniques facilitating integrated structure determination approaches, which combine multiple complementary techniques, e.g. X-ray crystallography and electron microscopy (EM). The experimental workflow of XL-MS has been well established and includes biochemical cross-linking of proteins and proteolytic digestion into cross-linked peptides that are subjected to tandem mass spectrometry (Leitner et al., 2010). Several specialized search engines for the identification of cross-linked peptides from MS2 spectra were developed, which often also provide confidence scores assessing reliability of each identified cross-link (see e.g. Gao et al., 2006; Gotze et al., 2012; Holding et al., 2013; Rasmussen et al., 2011; Walzthoeni et al., 2012; Wang et al., 2014; Xu et al., 2008; Yang et al., 2012). For structural analysis, cross-links are translated into maximal distances between the cross-linked amino acid residues that vary depending on the chemical cross-linker. Such restraints facilitate mapping protein–protein interactions (Liu et al., 2014), discriminating alternative oligomeric conformations (Tosi et al., 2013), and modeling structures of proteins (Kahraman et al., 2013) and protein complexes (Bui et al., 2013; Lasker et al., 2012; Politis et al., 2014).

Each reactive amino acid residue can in principle form multiple cross-links of three major types: (i) Inter-links are cross-links between different protein subunits of a given complex. They are useful for determining protein interfaces and defining the relative position of protein subunits within a given complex. (ii) Intra-links connect residues of the same protein subunit. They are used to guide homology modeling (Kahraman et al., 2013) or to position different domains of the same subunit (Bui et al., 2013; Lasker et al., 2012; Politis et al., 2014). In the case of homo-oligomeric complexes, cross-links cannot be unambiguously assigned as intra- or inter-links since they may arise from a residue pair of the same protein subunit or its copies. (iii) Mono-links account for peptides that were modified by the cross-linker but not linked to a second peptide because one of the two functional groups of the cross-linker remained inactive. Mono-links do not provide distance information but can be used for mapping interfaces or accessible surfaces through identification of regions in which mono-links are under- or overrepresented (Tosi et al., 2013).
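The three link types, together with the ambiguity that arises for subunits present in several copies, can be expressed as a small classification rule. The following is a minimal sketch only: the function name, the tuple layout and the residue numbers are assumptions made for illustration and do not come from Xlink Analyzer.

```python
def link_type(link, copy_number):
    """Classify one link.

    link: ((subunit, residue), (subunit, residue)) for a cross-link,
          or ((subunit, residue), None) for a mono-link.
    copy_number: subunit name -> number of copies in the complex (default 1).
    """
    first, second = link
    if second is None:
        return "mono-link"
    if first[0] != second[0]:
        return "inter-link"
    # Same subunit name on both ends: unambiguous only for single-copy subunits.
    if copy_number.get(first[0], 1) > 1:
        return "intra- or inter-link (ambiguous)"
    return "intra-link"


# Hypothetical residue numbers, used only to exercise the rule.
copies = {"Rvb1": 3, "Rvb2": 3}
examples = [
    (("A190", 100), ("A135", 200)),
    (("A190", 100), ("A190", 300)),
    (("Rvb1", 50), ("Rvb1", 120)),
    (("A190", 100), None),
]
labels = [link_type(link, copies) for link in examples]
```

Keeping the ambiguous case as its own label, rather than forcing it into intra- or inter-link, mirrors the point made above that such cross-links cannot be assigned without a structure.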

The usefulness of cross-links for the characterization of structures and structural modeling has led to the development of various software tools dedicated to the visualization of XL-MS data. A first and very informative step is the visualization of cross-links in the context of primary structures, as implemented, e.g., in xiNet (http://crosslinkviewer.org/). Such analysis often indicates interacting domains that appear highly interconnected by clusters of cross-links. Some tools for displaying cross-links and the respective distances within three-dimensional (3D) structures are also available (Holding et al., 2013; Kahraman et al., 2011; Zheng et al., 2013), but these tools only allow displaying pre-defined sets of cross-links. More importantly, these tools are not sufficiently integrated with molecular structure analysis tools that would allow for interactive operations such as filtering cross-links according to various criteria and simultaneously analyzing complementary information, e.g. EM maps. These limitations represent a severe bottleneck that slows down the routine analysis and interpretation of integrated 3D structures and models.

To enable the interactive and comprehensive analysis of XL-MS data within a sophisticated 3D molecular viewer and structural analysis tool that is widely used in the structural biology community, namely UCSF Chimera (Pettersen et al., 2004), we developed the Xlink Analyzer plugin. Xlink Analyzer is an interactive graphical software tool. It is seamlessly integrated within UCSF Chimera and allows importing mass spectrometry data, displaying cross-links in the context of structures, automatically detecting violated cross-links, calculating the number of satisfied and violated cross-links, and plotting distributions of distances between cross-linked residues (Fig. 1). It includes tools for filtering the cross-links by confidence score, locating and displaying subsets of cross-links (e.g. inter-links, intra-links or cross-links connecting particular subunits), mapping positions of mono-links, and comparing different cross-linking datasets. Xlink Analyzer also provides an interface for coloring, selecting, hiding or displaying individual subunits or subcomplexes, which, compared with the standard Chimera structure operation tools, greatly facilitates analysis of very large complexes (Fig. A.1E). Xlink Analyzer is freely available as open-source software.

Fig.1.


Overview of Xlink Analyzer functionality.

An input to Xlink Analyzer consists of text files with cross-linking data in tabular format and structures or models loaded to UCSF Chimera. The provided tools allow locating cross-links and mono-links, managing color and display of subunits if the structure is a multi-subunit complex, interactively adjusting which cross-links are used for analysis, counting satisfied and violated cross-links, and exporting the data on cross-link distances and violations in tabular format or as a distribution plot.

2. Results

2.1. Import, automated display and analysis of cross-linking data

Xlink Analyzer is implemented as a software extension to the UCSF Chimera molecular graphics program, which is an open source platform widely used by the structural biology community (Pettersen et al., 2004). While structures can be loaded, viewed and manipulated using the well-known UCSF Chimera interface, cross-links can be imported and accessed using a dedicated Xlink Analyzer window (Fig. A.1). Lists of cross-links are imported together with their respective confidence scores from text files in either simple, generic, comma separated (CSV) format listing the cross-linked residue pairs or xQuest/xProphet format (Appendix E). After configuring the protein or the protein complex of interest, Xlink Analyzer automatically displays the cross-links within the 3D structures. The user can interactively hide cross-links of a specific type, for instance intra- or inter-links or display only cross-links between specific subunits. Cross-links are colored according to an adjustable violation distance threshold and cross-links below a specific confidence score can be hidden if desired. Statistics of violations can be interactively recalculated while changing the score threshold. In addition, Xlink Analyzer generates tables of the numbers of violated cross-links and the respective subunit pairs that can be exported as a text file. The distribution of the distances between cross-linked residues can be visualized as a histogram and exported in several graphical formats. Alternative structural models can be analyzed simultaneously and the model satisfying the highest number of cross-links can be identified. Taken together, these functionalities enable the routine implementation of XL-MS data into structural analysis.
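The import, filter and classify steps described above can be sketched outside Chimera in plain Python. This is an illustration under stated assumptions, not Xlink Analyzer's own code or file format: the CSV column names (`protein1`, `residue1`, `protein2`, `residue2`, `score`), the toy coordinates and the helper names are all hypothetical. The default thresholds (score 30, distance 30 Å) are the values used in the Pol I example in Section 2.2.

```python
import csv
import io
import math

# Hypothetical generic CSV; the column names are assumptions for this sketch.
CSV_TEXT = """protein1,residue1,protein2,residue2,score
A190,10,A135,55,35.2
A190,10,A190,80,41.0
A135,55,A49,200,28.1
A190,80,A135,120,33.4
"""

# Toy coordinates in angstroms, keyed by (subunit, residue number).
COORDS = {
    ("A190", 10): (0.0, 0.0, 0.0),
    ("A190", 80): (12.0, 5.0, 0.0),
    ("A135", 55): (20.0, 0.0, 0.0),
    ("A135", 120): (40.0, 30.0, 0.0),
}


def load_crosslinks(text):
    """Parse cross-linked residue pairs and their confidence scores."""
    links = []
    for row in csv.DictReader(io.StringIO(text)):
        links.append({
            "a": (row["protein1"], int(row["residue1"])),
            "b": (row["protein2"], int(row["residue2"])),
            "score": float(row["score"]),
        })
    return links


def summarize(links, coords, min_score=30.0, max_dist=30.0):
    """Count satisfied, violated and unmapped cross-links above the score cut-off."""
    stats = {"satisfied": 0, "violated": 0, "unmapped": 0}
    for link in links:
        if link["score"] < min_score:
            continue  # hidden by the confidence filter
        if link["a"] not in coords or link["b"] not in coords:
            stats["unmapped"] += 1  # residue missing from the structure
            continue
        distance = math.dist(coords[link["a"]], coords[link["b"]])
        stats["violated" if distance > max_dist else "satisfied"] += 1
    return stats


stats = summarize(load_crosslinks(CSV_TEXT), COORDS)
```

Calling `summarize` again with a different `min_score` corresponds to the interactive recalculation of violation statistics while changing the score threshold.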

2.2. Analysis of XL-MS data in the context of the three-dimensional structure using RNA Polymerase I as an example

To demonstrate the potential of Xlink Analyzer, we performed XL-MS analysis of RNA Polymerase (Pol) I from Saccharomyces cerevisiae. We purified Pol I as previously described (Fernandez-Tornero et al., 2013) and subjected it to varying concentrations of H12/D12 isotope-coded, di-succinimidyl-suberate (DSS) cross-linker, which reacts with amine groups of lysine residues and protein N-termini (Section 5). The cross-linking reaction was quenched with ammonium bicarbonate and proteins were digested with trypsin. The cross-linked peptides were enriched by size exclusion chromatography as previously described (Leitner et al., 2012) and subjected to LC–MS/MS analysis. The cross-links were then identified from the MS spectra using xQuest/xProphet (Leitner et al., 2014a). After data import into Xlink Analyzer and stringent filtering according to the cross-link confidence score (xQuest ld-score > 30), we mapped the identified cross-links onto the crystal structure of Pol I (Engel et al., 2013; Fernandez-Tornero et al., 2013) (Fig. 2A). Using Xlink Analyzer, we found 106 unique cross-linked residue pairs that could be mapped to the structure, of which five are violated using a 30 Å distance threshold (Merkley et al., 2014). Interestingly, four out of the five violated cross-links originate from an extended loop inside the DNA-binding cleft of Pol I (Fig. 2B). This loop has been suggested to be a mobile regulatory element that becomes ordered only under certain conditions (Engel et al., 2013; Fernandez-Tornero et al., 2013). Several cross-links agree with the position of the extended loop in the Pol I cleft as suggested by the crystal structures, whereas three of the violated cross-links consistently suggest an alternative position on the clamp head domain. Thus, the extended loop likely binds to the clamp head in an alternative conformational state of Pol I or transiently interacts with that region when moving to the DNA-binding cleft.

Fig.2.


Analysis of cross-links mapped to Pol I crystal structure. (A) Overall view of Pol I structure (PDB code: 4C3H) (Fernandez-Tornero et al., 2013) showing cross-links with ld-score 30 or higher. Satisfied cross-links (using a distance threshold of 30 Å) are colored blue, violated cross-links (using 30 Å distance threshold) are colored red. (B) Cross-links suggest two alternative interaction sites for the extended loop of the A190 subunit. Dashed lines indicate regions missing in the structure. For clarity, only A190 and A135 subunits and intra-links of A190 subunit and inter-links between A190 and A135 subunits are displayed. Displaying individual subunits and specific cross-link types is facilitated through appropriate panels in Xlink Analyzer. The extended loop is colored cyan using standard coloring tools of UCSF Chimera. (C) Mapping interactions to the tWH domain that forms an extension of subunit A49 and is disordered in the crystal structure. The tWH domain was defined in the project setup as residues 172–403 and the residues cross-linking to this domain were highlighted using Xlink Analyzer (colored dark red as the color of this domain defined in the setup). The approximate position of A49-tWH based on cross-links is indicated.

Xlink Analyzer can highlight residues within structural models cross-linking to any other protein or domain in the XL-MS dataset, regardless of whether or not the structure of that other protein or domain is present. This feature is useful to locate their potential binding surfaces. To demonstrate this, we located the tandem winged helix domain (tWH) of the A49 subunit (Geiger et al., 2010) that was disordered in the crystal structure (Fig. 2C). In the case of Pol I, cross-links were identified that link the tWH to the clamp head and jaw domains of the A190 subunit as well as the protrusion domain of A135. These cross-links would place tWH into the cleft between A190 and A135 but might not necessarily be indicative of only a single conformation. However, they agree with previous cross-linking studies and the topological model of Pol I (Jennebach et al., 2012).

2.3. Mapping protein interfaces based on observed cross-links and mono-links

XL-MS usually targets charged amino acids such as lysine, aspartate or glutamate (Leitner et al., 2014b), which often occur within protein–protein and protein–nucleic acid interfaces (Ahmad et al., 2004; Davis et al., 1998; Zhao et al., 2011). Measurements of the accessibility of such residues by NMR or MS were previously used to map protein–protein interfaces (Hattori et al., 2013; Scholten et al., 2006). Xlink Analyzer allows identifying surface regions with a reduced number of cross-links and mono-links that might correspond to protein interfaces. This is possible because modified residues are more likely to occur on solvent-exposed surfaces than in buried regions, such as interaction sites (Scholten et al., 2006; Tosi et al., 2013). To account for the fact that some of the modified peptides might not be observable by MS due to their varying susceptibility to digestion, liquid chromatography, ionization, etc. (Sanders et al., 2007), Xlink Analyzer predicts theoretically observable mono-links based on a machine learning approach (see Section 5). The mono-link predictions are based on features derived only from the sequence of the peptides, such as total mass, net charge or hydrophobicity, and are independent of structural properties. Based on the predictions, Xlink Analyzer visually marks residues expected to be modified but not observable by MS. If residues that are not modified but expected to be observable cluster into a specific region of a structure, they indicate a potentially buried region.

To demonstrate this feature, we repeated a prediction of the interaction interfaces of the Rvb1/2 complex as previously described by Tosi et al. (2013) (Fig. 3). In that study, the interaction site between two Rvb1/2 hexamers was predicted based on EM and mono-links without taking the observability of the modified peptides into consideration. Xlink Analyzer automatically identifies modified residues of the Rvb1/2 model and predicts the interaction site in line with the previous work: on the previously predicted interface, non-modified residues that are expected to be observable are enriched compared with the alternative interface (Fig. 3). However, only the central part of the originally predicted interface contains non-modified residues that are indicators of a buried surface (Fig. 3B, red). Although the outer part contains non-modified residues (yellow), those give rise to peptides that are not likely to be observed by XL-MS and should not be used as indicators. Thus, Xlink Analyzer not only enables locating non-modified residues but also helps to discriminate regions devoid of modifications due to experimental limitations of MS from regions protected from modification by interaction interfaces.
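The reasoning used here, in which only non-modified residues with observable peptides count as evidence for a buried surface, amounts to a three-way classification followed by a per-patch tally. The sketch below illustrates that logic with hypothetical names and made-up patch data; it is not the Xlink Analyzer implementation.

```python
from collections import Counter


def residue_class(is_modified, predicted_observable):
    """Three classes, following the blue/red/yellow scheme of Fig. 3."""
    if is_modified:
        return "modified"                    # seen in a cross-link or mono-link
    if predicted_observable:
        return "unmodified, observable"      # evidence for a protected surface
    return "unmodified, not observable"      # uninformative, MS limitation


def patch_summary(residues):
    """residues: iterable of (is_modified, predicted_observable) for one surface patch."""
    return Counter(residue_class(m, o) for m, o in residues)


# Made-up residues for two candidate interfaces.
patch_a = [(False, True), (False, True), (False, False), (True, True)]
patch_b = [(True, True), (True, True), (False, False), (False, True)]
summary_a = patch_summary(patch_a)
summary_b = patch_summary(patch_b)
```

In this toy case patch A holds more "unmodified, observable" residues than patch B and would therefore be the better candidate for a buried interface.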

Fig.3.


Mapping of the interaction interface between Rvb1/2 hexamers based on inaccessible residues.

Modified residues (i.e. residues involved in cross-links or mono-links) with an ld-score of at least 30 are colored blue. Residues that are not observed as modified, although their modified peptides are expected to be observable by MS, are colored red. Residues that are not observed as modified by MS and also not expected to be observed because of the physicochemical properties of the respective peptides are colored yellow. The non-interacting, solvent-exposed surface (left) and the interacting, buried surface (right) are depicted. Arrows on the side view of the hexamer indicate possible interaction surfaces for the second copy of the hexamer. Non-modified residues cluster into an area that is likely buried by the interaction of the hexameric rings.

2.4. Analysis of homo-oligomeric complexes

A ‘Homo-oligomeric’ mode is available in Xlink Analyzer that is dedicated to cross-link analysis of homo-oligomeric complexes or complexes containing multiple copies of at least one subunit. In these cases, multiple residue pair combinations can be assigned to cross-links that derive from the subunits that are present in multiple copies (Fig. 4A), making the identification of cross-links in the structure and the statistical analysis of cross-link violations inherently difficult. In ‘Homo-oligomeric’ mode, given a structure or a theoretical model of the oligomer, Xlink Analyzer automatically identifies the non-violated fraction of every possible set of equivalent residue pairs across the oligomeric interfaces and within the subunits. If no pair within the set satisfies the cross-link, the pair with the shortest distance is selected for display. Violation statistics can be subsequently recalculated. As we demonstrate in Fig. 4, the automatic ‘Homo-oligomeric’ mode significantly improves the visualization and interpretation of cross-link data on homo-oligomers.
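The selection rule described above (keep every equivalent residue pair that satisfies the distance threshold; if none does, keep the shortest one) can be sketched as follows. This is a simplified illustration with hypothetical function and variable names and toy coordinates, not the code used inside Xlink Analyzer.

```python
import itertools
import math


def assign_homo_oligomeric(res_a, res_b, copies_a, copies_b, coords, max_dist=30.0):
    """Choose the residue pairs to display for one ambiguous cross-link.

    res_a, res_b: residue numbers of the two cross-linked positions.
    copies_a, copies_b: chain identifiers of the copies of each subunit.
    coords: (chain, residue number) -> (x, y, z) in angstroms.
    Returns a list of (distance, atom_a, atom_b) tuples.
    """
    candidates = []
    for chain_a, chain_b in itertools.product(copies_a, copies_b):
        atom_a, atom_b = (chain_a, res_a), (chain_b, res_b)
        if atom_a == atom_b:
            continue  # a residue cannot be linked to itself
        distance = math.dist(coords[atom_a], coords[atom_b])
        candidates.append((distance, atom_a, atom_b))
    if not candidates:
        return []
    satisfied = sorted(c for c in candidates if c[0] <= max_dist)
    # Fall back to the single shortest pair when every combination is violated.
    return satisfied if satisfied else [min(candidates)]


# Toy homo-dimer: chains A and B carry the same two residues.
coords = {
    ("A", 10): (0.0, 0.0, 0.0), ("A", 50): (50.0, 0.0, 0.0),
    ("B", 10): (60.0, 0.0, 0.0), ("B", 50): (10.0, 0.0, 0.0),
}
pairs = assign_homo_oligomeric(10, 50, ["A", "B"], ["A", "B"], coords)
fallback = assign_homo_oligomeric(10, 50, ["A", "B"], ["A", "B"], coords, max_dist=5.0)
```

In the toy dimer the four possible combinations reduce to the two inter-chain pairs at 10 Å, while the two intra-chain pairs at 50 Å are discarded.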

Fig.4.


Assignment of cross-links in homo-oligomeric mode. (A) Schematic illustration of possible residue pair combinations in standard and homo-oligomeric mode. For each cross-link there are four possible residue pair combinations (left). Based on residue distances, Xlink Analyzer automatically determines which pairs more likely correspond to the observed cross-links (right). (B) Cross-linked Rvb1/2 displayed in standard mode. The Rvb1/2 hetero-hexamer is composed of three copies each of the Rvb1 and Rvb2 subunits, which give rise to a large number of residue pair combinations as displayed in standard mode. (C) Same as (B) but displayed in homo-oligomeric mode. The model of the Rvb1/2 hexamer was reproduced based on Tosi et al. (2013). The remaining cross-links violating the 30 Å distance threshold (Merkley et al., 2014) might indicate an inwards domain movement (see also Fig. 3).

2.5. Comparison of different cross-linking datasets

In order to assess the similarity between alternative experimental conditions or biological states and conformations (e.g. a nucleic acid-binding protein complex in the presence or absence of DNA), it is often necessary to compare different XL-MS data sets. Xlink Analyzer allows importing several cross-linking datasets simultaneously to analyze them in combination or separately. To demonstrate this feature, we compared cross-links of Pol I that were obtained with the DSS cross-linker under different reaction conditions. In particular, the cross-linker concentration was varied from 0.05 to 10 mM, and the cross-linker was either added all at once or in smaller amounts over several consecutive intervals (see Section 5). The ‘interval setting’ involves adding the cross-linker stepwise up to a given concentration, e.g., in 10 steps, each step increasing the concentration by 0.2 mM up to a final concentration of 2 mM. This setting is useful for samples of limited availability for which the optimal cross-linker concentration is not known. Although this data set is not quantitative by the standards established for conventional shotgun proteomics, i.e. peptide abundance ratios cannot be calculated, a trend towards more identified mono-links vs cross-links with increasing cross-linker concentrations was apparent (Fig. 5A). Through analysis with Xlink Analyzer, we found that, in this particular data set, different cross-linker concentrations led to the identification of specific cross-linked residue pairs that otherwise remained undiscovered (see also Fig. A.2). Generally, we could not find a correlation between cross-link observability in a given condition and structural properties such as solvent accessibility of cross-linked lysine residues (Fig. A.3). Nevertheless, several regions of Pol I were cross-linked only in some of the conditions.
For example, no inter-protein cross-link with an ld-score larger than 30 was observed for subunit ABCα when 2 mM of cross-linker was used, but two inter-links from ABCα were obtained at 10 mM cross-linker concentration (Fig. 5B). Similarly, no cross-link was observed for the linker region of the A12 subunit at 2 mM cross-linker concentration, but three cross-links were identified when 2 mM cross-linker (total) was added in the ‘interval setting’ mode (Fig. 5C). We thus conclude that Xlink Analyzer can assist experimentalists with the empirical optimization of cross-linking experiments.
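Finding the cross-linked residue pairs that are shared between conditions, or specific to one of them, is essentially a set comparison. The sketch below shows one way to do this; the function names, condition labels and residue pairs are hypothetical and are not taken from the Pol I data.

```python
def normalize(pair):
    """Order the two residues so that A-B and B-A count as the same cross-link."""
    return tuple(sorted(pair))


def compare(datasets):
    """datasets: condition name -> iterable of ((subunit, res), (subunit, res)).

    Returns the pairs found in every condition and, per condition,
    the pairs found nowhere else.
    """
    sets = {name: {normalize(p) for p in pairs} for name, pairs in datasets.items()}
    shared = set.intersection(*sets.values())
    unique = {}
    for name, pairs in sets.items():
        others = set().union(*(s for other, s in sets.items() if other != name))
        unique[name] = pairs - others
    return shared, unique


# Made-up residue pairs for two hypothetical conditions.
datasets = {
    "2 mM": [(("S1", 10), ("S2", 40)), (("S1", 25), ("S1", 90))],
    "10 mM": [(("S2", 40), ("S1", 10)), (("S3", 5), ("S1", 25))],
}
shared, unique = compare(datasets)
```

Normalizing the residue order first matters because the same cross-link may be reported in either orientation in different result files.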

Fig.5.


Comparison of different experimental cross-linking conditions. (A) Number of cross-links and mono-links with ld-score higher than 30 in each experimental condition. (B) Inter-protein cross-links of ABCα subunit of Pol I (magenta) obtained using a 2 mM and 10 mM concentration of the cross-linker. (C) Cross-links involving the A12 subunit (yellow) as obtained in ‘interval’ mode as compared to the 2 mM cross-linker condition.

3. Discussion

Xlink Analyzer provides a comprehensive set of tools for the interactive visualization and automated analysis of cross-links in the context of 3D structural models. Presently available software tools such as Xwalk (Kahraman et al., 2011), Xlink-DB (Zheng et al., 2013), and Hekate (Holding et al., 2013) allow visualizing manually selected lists of cross-links and calculating distances. Xlink Analyzer includes this functionality but in an automated and interactive fashion and in the context of MS data properties. It allows filtering cross-links by a confidence score or by the cross-link type, and it provides statistical summaries of the compatibility of a given structural model with the set of cross-links in real time. Xlink Analyzer also contains entirely novel features such as mapping cross-links to homo-oligomeric assemblies and mapping potential interaction sites while taking mono-link observability into account. Most importantly, Xlink Analyzer is fully integrated into a commonly used structural display and modeling software, namely UCSF Chimera. Therefore, Xlink Analyzer functions can be seamlessly and interactively combined with established features, such as fitting of X-ray structures into EM maps, for integrative structural analysis. Thus, Xlink Analyzer simplifies the routine analysis of XL-MS data and makes it more convenient and more exhaustive.

Xlink Analyzer offers the possibility to map modified residues (both cross-links and mono-links) onto structures, which may help in locating interaction sites. This feature includes optional predictions of residues non-observable as mono-linked peptides in MS. Since peptide observability, and thus the accuracy of the predictions, may vary depending on e.g. the MS setup, we designed our predictor to mark residues as non-observable only when the prediction score is above a stringent threshold adjusted to minimize false predictions (see Section 5). Although it decreases prediction sensitivity (some true non-observable residues may be missed), it avoids predicting as non-observable residues with “intermediate observability”, whose values may change in different MS setups. It must be noted that although the mono-link predictor exhibits good performance in our tests (see Section 5) and performs very well on the Rvb1/2 test case, the predictor accuracy still may vary for other complexes or different MS setups. In order to reliably predict buried or solvent exposed surfaces, it will thus be important to rely on clusters of amino acid residues of which the majority yields a consistent readout instead of single residues.
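The stringent threshold mentioned above is chosen, as described in Section 5.3, so that at most 10% of observed mono-links are wrongly predicted as non-observable. A minimal sketch of such a threshold selection is given below. It assumes that the classifier outputs a score that increases with the likelihood of being non-observable; the function name and the toy scores are hypothetical, and the original work used ROC analysis in scikit-learn rather than this hand-written loop.

```python
def pick_threshold(scores, observed, max_false_rate=0.10):
    """Return the lowest score threshold that keeps false calls within the limit.

    scores: classifier outputs, higher means "more likely non-observable".
    observed: True where the mono-link was actually observed by MS.
    A residue is called non-observable when its score is >= the threshold.
    """
    n_observed = sum(observed)
    if n_observed == 0:
        return None
    best = None
    # Lowering the threshold can only add wrongly flagged observed mono-links,
    # so the scan can stop at the first threshold that exceeds the limit.
    for threshold in sorted(set(scores), reverse=True):
        wrongly_flagged = sum(
            1 for s, o in zip(scores, observed) if o and s >= threshold
        )
        if wrongly_flagged / n_observed <= max_false_rate:
            best = threshold
        else:
            break
    return best


# Toy training set: ten observed and four non-observed mono-links.
scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.9,
          0.6, 0.7, 0.8, 0.95]
observed = [True] * 10 + [False] * 4
threshold = pick_threshold(scores, observed)
```

Choosing the lowest admissible threshold keeps as many true non-observable residues as possible while respecting the limit on false calls, which is the trade-off between sensitivity and false predictions discussed above.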

A limitation of Xlink Analyzer is that the structures or the models of the subunits and the complex need to be available for performing the analysis. Nevertheless, structures of individual subunits can often be built using homology modeling. If homology models cannot be obtained, the positions of the missing subunits can be approximately predicted using the feature of Xlink Analyzer that maps residues cross-linking to other subunits or domains even if they are missing in the model of the complex (Fig. 2C). To better support low-resolution modeling, an important future direction will be to support the Rich Molecular Format (http://integrativemodeling.org/rmf/viewing.html), which is recognized by UCSF Chimera and allows displaying subunits for which no structure is available as low-resolution beads.

Because it is fully integrated into the UCSF Chimera molecular viewer, Xlink Analyzer is well suited to integrated structural biology projects that combine XL-MS approaches with data from other structural analysis techniques. UCSF Chimera contains an extensive set of tools that is widely used for visualizing molecular structures and EM maps or fitting protein subunits into EM maps (Goddard et al., 2007; Meng et al., 2006). This set of tools is fully compatible with the Xlink Analyzer functionalities. Users can, for example, discriminate alternative structural models using XL-MS data. Xlink Analyzer fills a present gap in the available set of software tools for XL-MS. In addition to supporting structural modeling projects, it provides feedback to experimentalists.

4. Conclusions

Xlink Analyzer enables integrative analysis of cross-linking and structural data. The software permits seamless use of cross-linking data to map protein interaction sites, to locate subunits in protein complexes, and to identify conformational changes. We thus anticipate that Xlink Analyzer will further drive the integration of XL-MS into the framework of established structural analysis techniques.

5. Material and methods

5.1. Implementation of Xlink Analyzer software

Xlink Analyzer is implemented as a Python extension to UCSF Chimera (Pettersen et al., 2004). The cross-links are displayed as pseudo-bonds created using the Chimera programming interface. Matplotlib (John, 2007) embedded in Chimera is used to display a cross-link distance histogram. Xlink Analyzer can be downloaded from http://www.beck.embl.de/XlinkAnalyzer.html. The complete source code can be accessed at http://github.com/crosslinks/XlinkAnalyzer and installed using the standard procedure for installing UCSF Chimera extensions (installation instructions are included in Appendix C).

5.2. Cross-linking/mass spectrometry of RNA Pol I

S. cerevisiae Pol I was purified as described previously (Fernandez-Tornero et al., 2013). The sample was dialyzed in XL-buffer (150 mM NaCl, 1 mM DTT, 20 mM Hepes) and diluted to a final concentration of 0.5 μg/μl. 50 μg of protein were used per cross-linking reaction while adding different amounts of an iso-stoichiometric mixture of H12/D12 isotope-coded, di-succinimidyl-suberate (DSS, Creative Molecules). All reactions were carried out at 37 °C with gentle shaking. Six standard conditions of 0.05, 0.2, 0.5, 2, 5 and 10 mM DSS were tested with a reaction time of 30 min. Three additional cross-linking conditions were performed by adding 10 times 0.05 and 0.2 mM pulses of DSS every 4 min and 0.5 mM pulses of DSS 4 times every 10 min. The reactions were quenched by adding ammonium bicarbonate to a final concentration of 50 mM. The digestion of cross-linked proteins was performed as described previously (Bui et al., 2013). Cross-linked peptides were enriched using SEC, as described before (Leitner et al., 2012), using a Superdex Peptide PC 3.2/30 column (GE) on an Ettan LC system (GE) at a flow rate of 50 μl/min. Fractions eluting between 0.9 and 1.4 ml were evaporated to dryness and reconstituted in 20–50 μl 5% (v/v) ACN in 0.1% (v/v) FA according to 215 nm absorbance. LC–MS/MS analysis was carried out as described previously (Bui et al., 2013). Raw files were converted to centroid mzXML using the Mass Matrix file converter tool. The data were searched using xQuest against a database containing the sequences of the cross-linked proteins, and posterior probabilities were calculated using xProphet (Leitner et al., 2014a). The results were filtered using the following parameters: FDR = 0.05, min delta score = 0.95, MS1 tolerance window −4 to 7 ppm.

5.3. Development of monolink predictor

To develop a predictor of observable mono-links using machine learning algorithms, we created an experimental set of observed mono-linked peptides and a set of theoretical peptides for all lysine residues for which mono-links were not observed. The experimental set was derived from the XL-MS of Pol I performed in this work. Since the XL-MS of Pol I was conducted in a variety of reaction conditions, it is likely that the coverage of observable mono-links was maximized, as different conditions led to different sets of mono-links. The theoretical peptides for non-observed mono-links were derived as peptides expected from a given protein sequence after digestion with trypsin. In total, we obtained 112 observed mono-links with an ld-score of at least 30 and 105 non-observed mono-links. We then split all observed and non-observed mono-links in the Pol I dataset into training and testing data sets by assigning 70% of the mono-links to the training set and the remaining 30% to the testing set. The split was performed keeping similar percentages of observed and non-observed mono-links in each set. Only mono-links with corresponding peptides composed of five to 50 residues were used (this is the typical range used by xQuest). Then for every peptide (corresponding to every mono-link), we calculated seven features: peptide length, mass, hydrophobicity index based on Kyte–Doolittle indexes (Kyte and Doolittle, 1982), net charge at pH 2 (typical pH of the liquid chromatography buffers), and retention coefficients calculated based on GUOD860101 and MEEJ800102 tables obtained from the AAindex (Kawashima et al., 1999). Then, using these features and the training data set, we applied penalized logistic regression to derive a classifier for predicting observable and non-observable mono-links. We selected logistic regression over other machine learning methods to allow for easy implementation in UCSF Chimera without creating dependencies on other libraries.
Moreover, other classifiers such as Support Vector Machines with linear and non-linear kernels led to similar classification accuracy as logistic regression. To derive the optimal penalized logistic regression classifier, we performed a grid search over its parameters (with 5-fold stratified cross-validation) using the training set. Then, the threshold of the classifier that leads to at most 10% of observed mono-links wrongly predicted as non-observable in the training set was automatically adjusted using Receiver Operating Characteristic (ROC) analysis. With this threshold, on the testing data set (data set not used for creating the classifier), the resulting classifier correctly predicted 72% of the non-observed mono-links as non-observable, wrongly predicting as non-observable only 6% of the observed mono-links. This corresponded to an accuracy, calculated as (TP + TN)/(P + N), of 83% and a precision [TP/(TP + FP)] of 72% (where TP – observed correctly predicted as observable, TN – not observed correctly predicted as non-observable). Machine learning was performed using scikit-learn (Pedregosa et al., 2011) and the data sets were prepared using BioPython (Cock et al., 2009).
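To make the sequence-derived features more concrete, the following sketch computes three of them (length, Kyte–Doolittle hydrophobicity and net charge at pH 2) for a single peptide. It is an illustration under explicit assumptions: hydrophobicity is taken here as the mean Kyte–Doolittle value, the pKa values are textbook approximations chosen for the example, the cross-linker modification on the lysine is ignored, and mass and the two AAindex retention coefficients are omitted. None of the function names come from Xlink Analyzer.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed approximate pKa values; published pKa sets differ slightly.
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"D": 3.9, "E": 4.1}
PKA_N_TERMINUS = 9.0
PKA_C_TERMINUS = 2.3


def net_charge(peptide, ph=2.0):
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for residue in peptide:
        if residue in PKA_BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[residue]))
        elif residue in PKA_ACIDIC:
            charge -= 1.0 / (1.0 + 10 ** (PKA_ACIDIC[residue] - ph))
    return charge


def peptide_features(peptide):
    """Return a feature dictionary, or None outside the 5-50 residue range."""
    if not 5 <= len(peptide) <= 50:
        return None
    hydrophobicity = sum(KYTE_DOOLITTLE[r] for r in peptide) / len(peptide)
    return {
        "length": len(peptide),
        "hydrophobicity": hydrophobicity,
        "charge_ph2": net_charge(peptide),
    }


features = peptide_features("AKLVR")  # arbitrary example peptide
```

Because all features depend only on the amino acid sequence, such a feature vector can be computed for any lysine-containing tryptic peptide without reference to the 3D structure.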

Author contributions

J.K., M.B., and C.W.M. designed the research. J.K. and K.K. wrote the software. A. von A. and A.O. performed XL-MS of Pol I. J.K., M.B., A. von A., A.O., and C.W.M. wrote the paper. M.B. and C.W.M. oversaw the project.

Acknowledgments

The work was supported by the EMBL Interdisciplinary Postdoc Programme under Marie Curie COFUND Actions (J.K.) and an Advanced Grant of the European Research Council (ERC-2013-AdG340964-POL1PIC) to C.W.M. We thank María Moreno-Morcillo and Umar J. Rashid for providing the purified Pol I sample for XL-MS analysis; Luis Pedro Coelho for advice on machine learning; Thomas Bock, Svetlana Dodonova, Niklas Hoffmann, and Noella Silva Martin from the EMBL Structural and Computational Biology Unit for suggestions, feedback and testing; and Sebastian Glatt and Carsten Sachse for critical reading of the manuscript. We acknowledge support from the EMBL Proteomic Core Facility.

Appendix A. Supplementary data

Supplementary data 1
mmc1.docx (1.3MB, docx)
Supplementary data 2
mmc2.xlsx (653.8KB, xlsx)
Supplementary data 3
mmc3.pdf (107.4KB, pdf)
Supplementary data 4
mmc4.pdf (13.9MB, pdf)
Supplementary data 5
mmc5.pdf (135KB, pdf)

References

  1. Ahmad S., Gromiha M.M., Sarai A. Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information. Bioinformatics. 2004;20:477–486. doi: 10.1093/bioinformatics/btg432.
  2. Bui K.H., von Appen A., DiGuilio A.L., Ori A., Sparks L., Mackmull M.T., Bock T., Hagen W., Andres-Pons A., Glavy J.S., Beck M. Integrated structural analysis of the human nuclear pore complex scaffold. Cell. 2013;155:1233–1243. doi: 10.1016/j.cell.2013.10.055.
  3. Cock P.J., Antao T., Chang J.T., Chapman B.A., Cox C.J., Dalke A., Friedberg I., Hamelryck T., Kauff F., Wilczynski B., de Hoon M.J. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. doi: 10.1093/bioinformatics/btp163.
  4. Davis S.J., Davies E.A., Tucknott M.G., Jones E.Y., van der Merwe P.A. The role of charged residues mediating low affinity protein-protein recognition at the cell surface by CD2. Proc. Natl. Acad. Sci. U. S. A. 1998;95:5490–5494. doi: 10.1073/pnas.95.10.5490.
  5. Engel C., Sainsbury S., Cheung A.C., Kostrewa D., Cramer P. RNA polymerase I structure and transcription regulation. Nature. 2013;502:650–655. doi: 10.1038/nature12712.
  6. Fernandez-Tornero C., Moreno-Morcillo M., Rashid U.J., Taylor N.M., Ruiz F.M., Gruene T., Legrand P., Steuerwald U., Muller C.W. Crystal structure of the 14-subunit RNA polymerase I. Nature. 2013;502:644–649. doi: 10.1038/nature12636.
  7. Gao Q., Xue S., Doneanu C.E., Shaffer S.A., Goodlett D.R., Nelson S.D. Pro-CrossLink. Software tool for protein cross-linking and mass spectrometry. Anal. Chem. 2006;78:2145–2149. doi: 10.1021/ac051339c.
  8. Geiger S.R., Lorenzen K., Schreieck A., Hanecker P., Kostrewa D., Heck A.J., Cramer P. RNA polymerase I contains a TFIIF-related DNA-binding subcomplex. Mol. Cell. 2010;39:583–594. doi: 10.1016/j.molcel.2010.07.028.
  9. Goddard T.D., Huang C.C., Ferrin T.E. Visualizing density maps with UCSF Chimera. J. Struct. Biol. 2007;157:281–287. doi: 10.1016/j.jsb.2006.06.010.
  10. Gotze M., Pettelkau J., Schaks S., Bosse K., Ihling C.H., Krauth F., Fritzsche R., Kuhn U., Sinz A. StavroX–a software for analyzing crosslinked products in protein interaction studies. J. Am. Soc. Mass Spectrom. 2012;23:76–87. doi: 10.1007/s13361-011-0261-2.
  11. Hattori Y., Furuita K., Ohki I., Ikegami T., Fukada H., Shirakawa M., Fujiwara T., Kojima C. Utilization of lysine 13C-methylation NMR for protein-protein interaction studies. J. Biomol. NMR. 2013;55:19–31. doi: 10.1007/s10858-012-9675-9.
  12. Holding A.N., Lamers M.H., Stephens E., Skehel J.M. Hekate: software suite for the mass spectrometric analysis and three-dimensional visualization of cross-linked protein samples. J. Proteome Res. 2013;12:5923–5933. doi: 10.1021/pr4003867.
  13. Jennebach S., Herzog F., Aebersold R., Cramer P. Crosslinking-MS analysis reveals RNA polymerase I domain architecture and basis of rRNA cleavage. Nucleic Acids Res. 2012;40:5591–5601. doi: 10.1093/nar/gks220.
  14. Hunter J.D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 2007;9:90–95.
  15. Kahraman A., Malmstrom L., Aebersold R. Xwalk: computing and visualizing distances in cross-linking experiments. Bioinformatics. 2011;27:2163–2164. doi: 10.1093/bioinformatics/btr348.
  16. Kahraman A., Herzog F., Leitner A., Rosenberger G., Aebersold R., Malmstrom L. Cross-link guided molecular modeling with ROSETTA. PLoS One. 2013;8:e73411. doi: 10.1371/journal.pone.0073411.
  17. Kawashima S., Ogata H., Kanehisa M. AAindex: amino acid index database. Nucleic Acids Res. 1999;27:368–369. doi: 10.1093/nar/27.1.368.
  18. Kyte J., Doolittle R.F. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 1982;157:105–132. doi: 10.1016/0022-2836(82)90515-0.
  19. Lasker K., Forster F., Bohn S., Walzthoeni T., Villa E., Unverdorben P., Beck F., Aebersold R., Sali A., Baumeister W. Molecular architecture of the 26S proteasome holocomplex determined by an integrative approach. Proc. Natl. Acad. Sci. U. S. A. 2012;109:1380–1387. doi: 10.1073/pnas.1120559109.
  20. Leitner A., Walzthoeni T., Aebersold R. Lysine-specific chemical cross-linking of protein complexes and identification of cross-linking sites using LC–MS/MS and the xQuest/xProphet software pipeline. Nat. Protoc. 2014;9:120–137. doi: 10.1038/nprot.2013.168.
  21. Leitner A., Walzthoeni T., Kahraman A., Herzog F., Rinner O., Beck M., Aebersold R. Probing native protein structures by chemical cross-linking, mass spectrometry, and bioinformatics. Mol. Cell. Proteomics. 2010;9:1634–1649. doi: 10.1074/mcp.R000001-MCP201.
  22. Leitner A., Reischl R., Walzthoeni T., Herzog F., Bohn S., Forster F., Aebersold R. Expanding the chemical cross-linking toolbox by the use of multiple proteases and enrichment by size exclusion chromatography. Mol. Cell. Proteomics. 2012;11:M111.014126. doi: 10.1074/mcp.M111.014126.
  23. Leitner A., Joachimiak L.A., Unverdorben P., Walzthoeni T., Frydman J., Forster F., Aebersold R. Chemical cross-linking/mass spectrometry targeting acidic residues in proteins and protein complexes. Proc. Natl. Acad. Sci. U. S. A. 2014;111:9455–9460. doi: 10.1073/pnas.1320298111.
  24. Liu H., Zhang H., Weisz D.A., Vidavsky I., Gross M.L., Pakrasi H.B. MS-based cross-linking analysis reveals the location of the PsbQ protein in cyanobacterial photosystem II. Proc. Natl. Acad. Sci. U. S. A. 2014;111:4638–4643. doi: 10.1073/pnas.1323063111.
  25. Meng E.C., Pettersen E.F., Couch G.S., Huang C.C., Ferrin T.E. Tools for integrated sequence–structure analysis with UCSF Chimera. BMC Bioinf. 2006;7:339. doi: 10.1186/1471-2105-7-339.
  26. Merkley E.D., Rysavy S., Kahraman A., Hafen R.P., Daggett V., Adkins J.N. Distance restraints from crosslinking mass spectrometry: mining a molecular dynamics simulation database to evaluate lysine–lysine distances. Protein Sci. 2014;23:747–759. doi: 10.1002/pro.2458.
  27. Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M., Duchesnay E. Scikit-learn: machine learning in python. J. Mach. Learn. Res. 2011;12:2825–2830.
  28. Pettersen E.F., Goddard T.D., Huang C.C., Couch G.S., Greenblatt D.M., Meng E.C., Ferrin T.E. UCSF Chimera–a visualization system for exploratory research and analysis. J. Comput. Chem. 2004;25:1605–1612. doi: 10.1002/jcc.20084.
  29. Politis A., Stengel F., Hall Z., Hernandez H., Leitner A., Walzthoeni T., Robinson C.V., Aebersold R. A mass spectrometry-based hybrid method for structural modeling of protein complexes. Nat. Methods. 2014;11:403–406. doi: 10.1038/nmeth.2841.
  30. Rasmussen M.I., Refsgaard J.C., Peng L., Houen G., Hojrup P. CrossWork: software-assisted identification of cross-linked peptides. J. Proteomics. 2011;74:1871–1883. doi: 10.1016/j.jprot.2011.04.019.
  31. Sanders W.S., Bridges S.M., McCarthy F.M., Nanduri B., Burgess S.C. Prediction of peptides observable by mass spectrometry applied at the experimental set level. BMC Bioinf. 2007;8(Suppl. 7):S23. doi: 10.1186/1471-2105-8-S7-S23.
  32. Scholten A., Visser N.F., van den Heuvel R.H., Heck A.J. Analysis of protein–protein interaction surfaces using a combination of efficient lysine acetylation and nanoLC-MALDI-MS/MS applied to the E9:Im9 bacteriotoxin–immunity protein complex. J. Am. Soc. Mass Spectrom. 2006;17:983–994. doi: 10.1016/j.jasms.2006.03.005.
  33. Tosi A., Haas C., Herzog F., Gilmozzi A., Berninghausen O., Ungewickell C., Gerhold C.B., Lakomek K., Aebersold R., Beckmann R., Hopfner K.P. Structure and subunit topology of the INO80 chromatin remodeler and its nucleosome complex. Cell. 2013;154:1207–1219. doi: 10.1016/j.cell.2013.08.016.
  34. Walzthoeni T., Claassen M., Leitner A., Herzog F., Bohn S., Forster F., Beck M., Aebersold R. False discovery rate estimation for cross-linked peptides identified by mass spectrometry. Nat. Methods. 2012;9:901–903. doi: 10.1038/nmeth.2103.
  35. Wang J., Anania V.G., Knott J., Rush J., Lill J.R., Bourne P.E., Bandeira N. Combinatorial approach for large-scale identification of linked peptides from tandem mass spectrometry spectra. Mol. Cell. Proteomics. 2014;13:1128–1136. doi: 10.1074/mcp.M113.035758.
  36. Xu H., Zhang L., Freitas M.A. Identification and characterization of disulfide bonds in proteins and peptides from tandem MS data by use of the MassMatrix MS/MS search engine. J. Proteome Res. 2008;7:138–144. doi: 10.1021/pr070363z.
  37. Yang B., Wu Y.J., Zhu M., Fan S.B., Lin J., Zhang K., Li S., Chi H., Li Y.X., Chen H.F., Luo S.K., Ding Y.H., Wang L.H., Hao Z., Xiu L.Y., Chen S., Ye K., He S.M., Dong M.Q. Identification of cross-linked peptides from complex samples. Nat. Methods. 2012;9:904–906. doi: 10.1038/nmeth.2099.
  38. Zhao N., Pang B., Shyu C.R., Korkin D. Charged residues at protein interaction interfaces: unexpected conservation and orchestrated divergence. Protein Sci. 2011;20:1275–1284. doi: 10.1002/pro.655.
  39. Zheng C., Weisbrod C.R., Chavez J.D., Eng J.K., Sharma V., Wu X., Bruce J.E. XLink-DB: database and software tools for storing and visualizing protein interaction topology data. J. Proteome Res. 2013;12:1989–1995. doi: 10.1021/pr301162j.
