Abstract
Characterization of protein structural changes in response to protein modifications, ligand or chemical binding, or protein-protein interactions is essential for understanding protein function and its regulation. Amide hydrogen/deuterium exchange (HDX) coupled with mass spectrometry (MS) is one of the most favorable tools for characterizing the protein dynamics and changes of protein conformation. However, currently the analysis of HDX-MS data is not up to its full power as it still requires manual validation by mass spectrometry experts. Especially, with the advent of high throughput technologies, the data size grows everyday and an automated tool is essential for the analysis. Here, we introduce a fully automated software, referred to as ‘deMix’, for the HDX-MS data analysis. deMix deals directly with the deuterated isotopic distributions, but not considering their centroid masses and is designed to be robust over random noises. In addition, unlike the existing approaches that can only determine a single state from an isotopic distribution, deMix can also detect a bimodal deuterated distribution, arising from EX1 behavior or heterogeneous peptides in conformational isomer proteins. Furthermore, deMix comes with visualization software to facilitate validation and representation of the analysis results.
Introduction
Analysis of a protein structure and its molecular composition is essential for understanding the protein’s underlying function. While we believe that a sequence of amino acids encodes a precise three-dimensional shape of a protein, characterizing protein structures from primary sequences is still a challenge. Recently, combining amide hydrogen/deuterium exchange (HDX) with mass spectrometry (MS) is favored as a tool for characterizing dynamic structure of proteins. HDX is a chemical reaction in which a covalently bonded hydrogen atom is replaced by a deuterium atom, or vice versa1. The HDX provides structural information based on solvent accessibility of peptide bonds in tertiary and quaternary structure of each protein, because exchange rates between hydrogen and deuterium in proteins depend on the degree of a proton’s exposure to the protein surface and flexibility of the surrounding tertiary structure in permitting access of protons to solvent.
X-ray crystallography provides detailed structural information of a protein if appropriate crystals can be generated as a solid state. However, often crystal structures of all proteins are unavailable. Moreover, X-ray crystallography provides only the static structure of a crystal but not a dynamic view of a protein in the solution. Alternatively, high field nuclear magnetic resonance (NMR) can be used to obtain the information about the structure and dynamics of a protein. Unfortunately, it requires high concentrations of sample2, which possibly cause the distortion and aggregation of a protein structure. Besides, there are several other analytical tools such as circular dichroism, fluorescence, differential scanning calorimetry, analytical ultracentrifugation, side-chain reactivity, binding assays, and various chromatographic methods available and commonly used for protein biophysical characterization3. However, none of these methods provides an easy or straightforward manner to determine the entire structure of a protein without the help of experts. For the reasons, there have been efforts to combine multiple tools together. HDX-MS has many apparent advantages compared to NMR or X-ray crystallography: it is possible to analyze proteins in native solution condition; it requires much less material for analysis due to growing sensitivity of a mass spectrometer; moreover, protein mixtures are also compatible to analysis. Especially, HDX-MS can provide information for active dynamic structural changes of a protein under various biological conditions.
The main computational problem in HDX-MS analysis is to determine deuterium contribution to isotopic distribution of a deuterated peptide. There are several methods for calculating the number of exchanged deuteriums. One approach to obtaining an average deuteration level is to simply calculate a centroid mass of a deuterated isotopic distribution and subtract the centroid mass of the corresponding natural (non-deuterated) isotopic distribution4–8. A more sophisticated approach is to solve linear equations, where the deuterated isotopic distribution is defined as convolution of its natural isotopic distribution and the distribution for the deuteration level of the peptide9–12. The best fit to all of the linear equations is achieved using least-squares method9 or maximum entropy method (MEM)10. The least-squares method is simple but not applicable when the number of exchangeable hydrogen is big or signal-to-noise ratio is poor. MEM uses entropy to measure the amount of uncertainty in a probability distribution and finds the highest entropy solution subject to the error range. MEM requires substantial computation and can be unreliable when MEM spectral distortion occurs, but it is generally considered to be more robust to the noise than other methods. Fourier deconvolution method11,12 uses Fast Fourier Transform for deconvolution of natural and deuterated isotopic distributions, consequently revealing the deuteration level.
One of the difficulties in the determination of deuterium contribution is that often a deuterated isotopic distribution has a bimodal form, not a single form, which arises from EX1 behavior or heterogeneous conformational populations13. The kinetics for HDX has two limitations, EX2 and EX114,15. In EX2 condition, a progressive mass shift in single deuterated distribution is observed with increasing D2O labeling time, while in EX1 condition, a progressive amount reweighting between two deuterated distributions is observed. More interestingly, the coexistence of two protein conformations may lead to a bimodal deuterated distribution, where simultaneous, progressive mass shifts in two distributions can be observed with increasing D2O labeling time. Nevertheless, most of the previous HDX software assume only the unimodal distribution, resulting in a single deuterium number. For example, centroid mass-based approaches ignore the shape of the observed deuterated distribution and determine the (single) center of masses of all observed peaks. Some software tools such as ExMS16, Hexicon 217 and HX-Express v218 have been proposed to identify a bimodal isotopic distribution for the deuterated distribution.
Here, we introduce a new algorithm, referred to as ‘deMix’ (decode deuterated mixture) that can analyze HDX-MS data in a fully automated fashion, which is essential for avoiding human errors, while allowing for high throughput data analysis. In particular, deMix makes it possible to interpret the bimodal deuterated distributions, reporting deuterium numbers up to two. Basically, deMix assumes a statistical distribution of deuterium contribution to a deuterated peptide as the binomial distribution, and deals directly with the shape of deuterated isotopic distribution, but not considering their centroid masses. Deuterated isotopic distribution naturally spans a wider m/z range than the corresponding natural isotopic distribution, due to partial deuteration. Thus, there can be more noises and frequent overlaps between isotopic distributions in HDX-MS than ordinary MS data, which complicates the recognition of a peptide’s isotope pattern and calculation of the deuteration level. To overcome such difficulties, deMix proposes a measure, referred to as Matched Peak Count, which is designed to be robust over random noises in comparing two distributions. In addition, the proposed measure has strength in analyzing bimodal deuterated distributions. Applied to HDX-MS experiments with native and oxidized Nm23-H1, a tumor metastasis suppressor, deMix could accurately predict the deuteration level not only when the deuterated isotopic distribution has a single deuterated form but also for when it has a bimodal form as two proteins coeluted in a MS run. Finally, deMix comes with a visualization software to facilitate validation and representation of the analysis results, offering greater practical utility.
Results
The deMix algorithm
The deMix workflow is summarized in Fig. 1. deMix determines the H/D exchange profile via binomial fitting and decides whether to perform bimodal analysis without any human intervention. The analysis starts with detecting natural and deuterated isotopic distributions of peptides of interest and the procedure is shown in Fig. 2: (1) given peptides identified from MS2, deMix generates a theoretical isotopic distribution for each peptide and compares observed distributions in MS1 spectra (non-deuterated sample), constructing its extracted ion chromatogram (XIC). More details are provided under Methods; (2) based on the XIC in the non-deuterated sample, XICs of peptides in deuterated samples are constructed, where our assumption is that the related XICs across samples partially overlap or are shifted within a certain range (e.g., ±40 scans) although they may not totally overlap; (3) deMix refines each XIC and selects isotopic distribution peaks corresponding to presumably the same peptide ion within a determined elution time span; (4) deMix aggregates all detected isotopic peaks into a single isotopic distribution. Individual isotopic distributions typically have poor shape when the intensities of peaks are not high enough or a few isotopic distributions overlap and the similarity of the individual distributions became weak. The aggregated isotopic distributions is regarded more robust than individual distributions; and (5) based on the aggregated isotopic distributions, deuterium numbers are determined. The algorithm is described in the next section.
Fundamental algorithm to analyze deuterated distributions
HDX-MS data have to do with three relevant distributions as shown in Fig. 3: two observed isotopic distributions- (1) natural isotopic distribution, ‘Dnat’; (2) deuterated isotopic distribution, ‘Ddeu’, and one statistical distribution of deteurium contribution to Ddeu, ‘Dlev’. Dlev is expected to conform to the binomial distribution B(n, d/n), where n is the number of exchangeable hydrogens in a peptide and d, of interest in HDX-MS analysis, is the average number of exchanged deuteriums9. The convolution of Dnat and Dlev results in Ddeu, which gets to span a wider m/z range than Dnat since Dlev is not a single value but a distribution.
Given Dnat and Ddeu of a peptide, the estimation of d is performed as follows: for every possible d, theoretical Dlev_d is generated and theoretical Ddeu_d can be also generated by convoluting Dlev_d with Dnat (Let Dlev_# and Ddeu_# be the distributions based on deuterium number #). Then d can be determined by finding the best match between any theoretical Ddeu_d and the observed Ddeu.
The matching of two distributions is measured by Matched Peak Count (MPC) (eq. 1)
1 |
where Ok and Tk are intensity of the k-th peak in the observed Ddeu, O, and the theoretical Ddeu, T, respectively. N is a normalization factor to scale the peak intensities in T and is first calculated from the position p of the most abundant peak in T so that NTp is set equal to Op as shown in Fig. 4b (i.e., N = Op/Tp). εk is an intensity tolerance in matching k-th peak and is set to 10% of NTk to allow wider tolerance for higher intensity peak. Then for each d, MPCs are calculated while varying N from 110% to 10% of its initial value in 5% decrement and the maximum MPC is determined as the matching quality for d under consideration. In Fig. 4c, for example, the maximum MPC is found when N = 0.75 in case d = 9. By iteratively performing this process for every d, the optimal d is determined with maximum MPC (when d = 8 in Fig. 4d).
Maximizing MPC is somewhat different from minimizing errors in matching distributions. MPC has strength in cases when an experimental distribution has big noises or is obscured by isotopic distribution of other species as shown in Fig. 5. Centroid mass calculation approaches can easily result in a wrong deuterium number in such cases, because the big noises are included in the calculation (thus the centroid mass is shifted to the right in an example shown in Fig. 5). It is also evident that error measurement approaches such as least-square error easily fail for such cases. In the example of Fig. 5, chi-square error of Ddeu_9 was less than that of Ddeu_8 and resulted in the wrong deuterium number, while MPC clearly distinguished the two and determined the exact deuterium number, 8. In addition to robustness to noises, we argue that MPC has great potential to discover a bimodal distribution.
Toward analyzing bimodal deuterated distributions
Figure 6 shows the bimodal analysis process by deMix. The first analysis is conducted assuming a unimodal distribution and results in a single deuterium value, dA (Fig. 6a). If the ratio of explained area by the single value to unexplained area does not exceed a certain preset threshold (5:1 in this work), deMix proceeds to bimodal distribution analysis. deMix takes the 2nd-ranked deuterium number, dB from the first analysis (Fig. 6b), generates a theoretical bimodal distribution using the two numbers, dA and dB, and compares it with the observed Ddeu. In the bimodal distribution analysis, the problem is to determine how each species is populated in the bimodal distribution (eq. 2).
2 |
deMix finds the optimal w using maximum MPC. Finally, deMix reports two deuterium numbers, dA and dB, only if the error of the bimodal distribution analysis is significantly improved than that of the unimodal analysis (because the error of the bimodal distribution analysis is almost always less than that of the unimodal analysis given a higher degree of freedom, its significance should be assessed). In addition, if the weight factor for more abundant species is over 90%, only the abundant one is reported (not accepted as a bimodal distribution).
Basically, for the analysis of the bimodal distribution, all possible combinations of two deuterium numbers need to be examined, but this would be computationally expensive. MPC can successfully select candidates of the two numbers by using non-overlapped area (e.g., right tail in Fig. 6a) in the bimodal distribution. However, if the two numbers are similar and thus, the length of non-overlapped tail is short, MPC may fail to determine the numbers exactly. We found in case that the distance between the two numbers is small, MPC often reported tdA plus 1 for a small true value tdA, while tdB minus 1 for a large true value tdB. To compensate for such off-by-one misinterpretations, deMix in the bimodal analysis takes into account 4 combinations, either dA−1 or dA and either dB or dB+1 and determines the final two values from the combination with the best MPC. We performed simulation tests to show how well MPC predicts two numbers in a bimodal distribution according to the distance between the two numbers. The simulation test was performed as follows: for each peptide of length n that was randomly selected from Swiss-Prot human database, two deuterium numbers and their respective weights (the sum of weights should be one) were randomly generated, and theoretical, bimodal deuterated distribution was generated using eq. 2. Then, we checked how correctly MPC predicts the two numbers from the bimodal distributions. In Table 1, each column represents the distance (2, 3, 5, and 8) between the two deuterium numbers and each row represents the weight of a more abundant deuterated form. For each cell, 10,000 trials (or different bimodal distributions) were conducted and the figures show the percentage of times when MPC correctly determines both simulated deuterium numbers. In ‘initial’ column, MPC performance was assessed only by initial prediction dA and dB, while ‘expanded’ column shows the performance when considering dA + 1 and dB − 1 as well as dA and dB (dA < dB). In ‘initial’ column, the performance drop was observed when the distance between two component numbers is close or a weight of one component is much higher than the other (e.g., 0.9 vs. 0.1). The performance was worse as the peptide length was longer, where a certain bimodal distribution was very similar to a unimodal distribution and it was difficult to distinguish the two. On the other hand, in ‘expanded’ column considering 4 combinations, MPC showed almost perfect performance in all cases without considering all possible combinations of two deuterium numbers. With high accuracy, the MPC-based calculation was fast and the running time of deMix was a few minutes for a set of HDX experiments (i.e., in processing 7 mzXML files and calculating deuterium numbers from different D2O labeling times in this work) on a desktop computer.
Table 1.
Distance between two deuterated numbers in a bimodal distribution | ||||||||
---|---|---|---|---|---|---|---|---|
Weight | 2 | 3 | 5 | 8 | ||||
Initial | Expanded | Initial | Expanded | Initial | Expanded | Initial | Expanded | |
(a) For peptides of length 10 | ||||||||
0.5 | 49% | 100% | 80% | 100% | 100% | 100% | 100% | 100% |
0.6 | 68% | 100% | 83% | 100% | 98% | 100% | 100% | 100% |
0.7 | 37% | 100% | 86% | 100% | 92% | 100% | 100% | 100% |
0.8 | 24% | 100% | 58% | 100% | 91% | 100% | 100% | 100% |
0.9 | 18% | 100% | 42% | 98% | 92% | 100% | 100% | 100% |
(b) For peptides of length 15 | ||||||||
0.5 | 54% | 100% | 79% | 100% | 100% | 100% | 100% | 100% |
0.6 | 45% | 100% | 85% | 100% | 100% | 100% | 100% | 100% |
0.7 | 23% | 100% | 71% | 100% | 100% | 100% | 100% | 100% |
0.8 | 18% | 100% | 53% | 100% | 100% | 100% | 100% | 100% |
0.9 | 19% | 100% | 43% | 100% | 89% | 94% | 100% | 100% |
(c) For peptides of length 20 | ||||||||
0.5 | 48% | 100% | 75% | 100% | 100% | 100% | 100% | 100% |
0.6 | 29% | 100% | 74% | 100% | 100% | 100% | 100% | 100% |
0.7 | 21% | 100% | 54% | 100% | 99% | 99% | 100% | 100% |
0.8 | 18% | 100% | 45% | 100% | 99% | 100% | 100% | 100% |
0.9 | 18% | 100% | 42% | 100% | 83% | 97% | 100% | 100% |
(d) For peptides of length 30 | ||||||||
0.5 | 14% | 100% | 67% | 100% | 99% | 100% | 100% | 100% |
0.6 | 21% | 100% | 56% | 100% | 99% | 100% | 100% | 100% |
0.7 | 19% | 100% | 47% | 100% | 97% | 98% | 100% | 100% |
0.8 | 17% | 100% | 43% | 100% | 87% | 94% | 100% | 100% |
0.9 | 24% | 100% | 42% | 100% | 75% | 92% | 100% | 100% |
With varying distances and weights between two distributions in bimodal analysis, deMix performance is shown. Each cell shows how deMix correctly determined two deuterated forms for bimodal distributions simulated under a specific condition. For example, in b), peptides of length 15 were randomly selected from Swiss-Prot human database, and then two deuterium numbers and their respective weights were randomly generated. Based on the simulated values, bimodal deuterated distributions were generated (10,000 different distributions for each cell). When the distance between two deuterium numbers was close, deMix often reported tdA plus 1 for a small true value tdA while tdB minus 1 for a large true value tdB and failed to determine the exact numbers (initial column). Based on the observation, deMix in bimodal analysis takes into account 4 combinations using dA − 1 and dB + 1 in addition to dA and dB from initial prediction. The adaptation led to outstanding performance for all cases (expanded column). In ‘weight’ row, the weight (w) of more abundant form is represented (the weight of the other form is 1-w).
Application to Nm23-H1
Nm23-H1, a tumor metastasis suppressor, is a multifunctional housekeeping enzyme. It is known that NDP kinase activities and metastasis suppressor activities of Nm23-H1 are regulated by redox balance at Cys10919,20. Recent study characterized molecular mechanism of this redox regulation as stepwise oxidations. Firstly, under oxidative conditions, Nm23-H1 forms intra-disulfide bond between Cys4 and Cys145, which induces large conformational changes. Secondly, these conformational changes induce oxidations at Cys109 by regulating the quaternary structure21. The crystal structures of native Nm23-H1 are available22 and recent study has also reported the crystal structure of oxidized Nm23-H121. In this study, HDX experiments for native and oxidized (treated with H2O2) Nm23-H1 were used to elucidate the conformational changes, and deMix was applied to analyze the data.
Table 2 summarizes the peptides from Nm23-H1 and their HDX analysis results. Higher deuteration rates were observed in the oxidized form than the native form. Especially, the peptide (residue 109–132) containing Cys109 showed dominant difference in its HDX rates between the two conditions. Under the oxidative condition, the intra-disulfide bond induces a conformational change of the C-terminal domain (residue 133–141), which triggers helix-to-loop transition of α8 in the C-terminal domain and thereby exposes Cys109 to be easily accessible to solvent molecules. deMix results clearly showed that under an oxidative condition the HDX rates were increased at both early and saturated points of D2O labeling time. In recent study, two interface regions of Nm23-H1 subunits were detected to show a large difference in HDX rate depending on H2O2 concentration21. In this study, kinetics of HDX rate changes in control and 1 mM H2O2-treated protein were examined. HDX rates in oxidized Nm23-H1 were increased in dimeric interface (residues 9~35) and most of the K-pn loop region (residues 96~108), HDX rates of dimeric interface (9~35) in response to H2O2 treatment showed gradual increase and then reached saturation depending on D2O labeling time. On the other hand, HDX rates of K-pn loop region (residues 96~108) were immediately saturated at the first time point. These kinetic results provide the detail structural information of oxidized Nm23-H1, where K-pn loop in oxidized Nm23-H1 is exposed to surface and dimeric interface is presumed to be more inside than K-pn loop, thus shows a slower change in HDX rate.
Table 2.
m/z | rt (m) | peptide | sites | D2O labeling times (Native) | D2O labeling times (Oxidative) | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30 s | 1 m | 3 m | 5 m | 10 m | 30 m | 60 m | 30 s | 1 m | 3 m | 5 m | 10 m | 30 m | 60 m | ||||
420.69 (2+) | 5.96 | ANCERTF | 2~8 | 1 | 1 | 1 | 1 | 1 | 1~2 | 2~1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 |
566.99 (3+) | 8.71 | FIAIKPDGVQRGLVGE | 8~23 | 1 | 1 | 1 | 1 | 1 | 1 | 2~0 | 1~3 | 2~0 | 1~3 | 1~4 | 3~1 | 3 | 4 |
776.45 (2+) | 8.01 | IAIKPDGVQRGLVGE | 9~23 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1~4 | 1~4 | 4~1 | 2~5 | 5~2 |
517.97 (3+) | 8.01 | IAIKPDGVQRGLVGE | 9~23 | 1 | 2~0 | 1 | 1~3 | 1 | 2~0 | 1 | 1~4 | 1~4 | 1~4 | 1~4 | 4~1 | 4 | 4 |
663.64 (4+) | 10.55 | IAIKPDGVQRGLVGEIIKRFEQKG | 9~32 | 0~2 | 0~3 | 2 | 0~3 | 0~3 | 2~0 | 2 | 2 | 3~0 | 1~4 | 1~4 | 1~4 | 2~5 | 4 |
767.70 (4+) | 11.96 | IAIKPDGVQRGLVGEIIKRFEQKGFRL | 9~35 | × | 1 | 1 | 1 | 1~3 | × | 1 | 3~0 | 4~0 | 1~5 | 1~5 | 1~5 | 2~6 | 3~8 |
512.31 (3+) | 8.01 | IIKRFEQKGFRL | 24~35 | 0~2 | 1 | 1 | 1 | 1 | 0~2 | 1 | 1~0 | 1 | 1 | 1 | 1 | 2~0 | 1~3 |
490.31 (2+) | 9.84 | FRLVGLKF | 33~40 | 0~1 | 0~1 | 0~1 | 1~0 | 1~0 | 0~1 | 1~0 | 0~1 | 1~0 | 1~0 | 1~0 | 1 | 1 | 1~3 |
941.48 (3+) | 10.2 | MQASEDLLKEHYVDLKDRPFFAGL | 41~64 | 7 | 8 | 9 | 9 | 10 | 11 | 11 | 10 | 10 | 10 | 11 | 11 | 11 | 11 |
569.81 (4+) | 10.2 | DLLKEHYVDLKDRPFFAGL | 46~64 | 6 | 6 | 7 | 7 | 8 | 8 | 9~8 | 8 | 8 | 8 | 8 | 9 | 9 | 9 |
602.99 (3+) | 9.14 | LKEHYVDLKDRPFF | 48~61 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 4 |
452.49 (4+) | 9.14 | LKEHYVDLKDRPFF | 48~61 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
1024.55 (2+) | 9.49 | LKEHYVDLKDRPFFAGL | 48~64 | 5 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 7 | 6 | 7 | 7 |
559.96 (3+) | 10.13 | HYVDLKDRPFFAGL | 51~64 | 5 | 6 | 6 | 6 | 6 | 7 | 6~7 | 6 | 6 | 6 | 7 | 7 | 7 | 7 |
650.86 (2+) | 7.3 | LVKYMHSGPVVA | 64~75 | 1 | 1 | 1 | 1~2 | 2~1 | 2 | 2 | 2~1 | 2~1 | 2~1 | 2 | 2 | 2 | 2 |
716.38 (2+) | 8.08 | LVKYMHSGPVVAM | 64~76 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2~1 | 2 | 2~1 | 2 | 2 | 2 | 2 |
594.32 (2+) | 6.59 | VKYMHSGPVVA | 65~75 | 2~0 | 2~0 | 2~0 | 2~0 | 2 | 2 | 2 | 2~0 | 2~0 | 2 | 2 | 2 | 2 | 2 |
680.85 (2+) | 8.08 | PADSKPGTIRGDF | 96~108 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
852.10 (3+) | 10.55 | CIQVGRNIIHGSDSVESAEKEIGL | 109~132 | 5 | 5 | 5 | 5 | 6 | 7 | 7 | 7 | 8 | 8 | 9 | 10 | 11 | 12 |
586.28 (2+) | 9.77 | WFHPEELVD | 133~141 | 1~3 | 1~2 | 1~2 | 1~2 | 1~2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 |
Each cell represents the deuteration numbers of input peptides under each condition, where ‘x’ means that deuterated distribution is not detected and #~# means that a peptide is observed as two deuterated forms (more abundant form precedes). “rt (m)” column shows a retention time of a peptide in minutes and “sites” column represents the start and end positions of a peptide in Nm23-H1.
Interestingly, we observed the mixed deuteration behaviors indicating two conformational states for peptides of 9–35 region, a dimeric interface of the protein, presumably resulting from incompletely dissociated dimers in response to H2O2 treatment. Since Nm23-H1 is regulated by stepwise oxidation, it is expected that intermediate conformations of Nm23-H1 oligomeric states may be observed. Figure 7 shows the mass spectral isotopic distributions from the peptide at m/z 767.7027 [M+4 H]4+ in its oxidized form. The deuterated distributions could not be properly fitted when a single distribution is assumed. For example, the distribution of Fig. 7a was first interpreted as 4 exchanged deuteriums with MPC value 6 and only 62% of observed distribution could be explained. However, bimodal distribution analysis led to 97% explanation of observed distribution with MPC value 12. We observed simultaneous and progressive mass shifts in two components of a bimodal distribution with increasing D2O labeling time, and this is different from a phenomenon by EX1 kinetics. It indicates the co-existence of two protein conformations, where one form is specified by the distribution from the exchanged deuterium number 1 to 3 with different D2O labeling time, while the other from 5 to 7.
To demonstrate the performance of deMix, we compared with Hexicon 217, an automated pipeline for HDX analysis that supports bimodal analysis like deMix. The overall analysis results of Nm23-H1 by Hexicon 2 agreed with those by deMix. Between the two algorithms, the difference lies in a peptide feature (or isotopic distribution) detection. deMix dynamically determined the elution time spans for candidate peptide masses and aggregated isotopic distributions of the same peptide feature over elution time as shown in Fig. 2 while Hexicon 2 used a single isotopic distribution. We emphasize here the strength of deMix arising from the difference. Hexicon 2 generated two incompatible H/D exchange profiles at different elution times for a single peptide ‘PADSKPGTIRGDF’ (residues 96~108), where one exchange rate increased (consistent with deMix analysis in Table 2) with increasing D2O labeling time but the other decreased (wrong by manual validation of the scan, see Supplementary Fig. S1). Hexicon 2 also did not generate an H/D exchange profile for a peptide ‘CIQVGRNIIHGSDSVESAEKEIGL’ (residues 109~132) under oxidative condition. We found that in the two cases, their isotopic signals were weak or noisy. Notably, deMix could generate results for those cases since the aggregation of isotopic distributions over elution time increased the signal-to-noise ratio.
Visualization software
deMix provides visualization software to help users validate and represent HDX analysis results. Figure 8 shows a user-friendly graphical interface of the tool. For spectral data view, two windows are shown: one (upper) is for a natural isotope distribution and the other (lower) for the corresponding deuterated isotope distribution in one of HDX experiments across D2O labeling times, which can be selected by a user from a list at the bottom. Peptide list view at the bottom shows input peptides and their HDX analysis results. By selecting a specific cell representing a deuterium number from this list, experimental (red peaks) natural and deuterated isotopic distributions for the peptide can be displayed in the spectral view window together with its theoretical (blue line) distributions, and a deuteration rate plot across D2O labeling times is drawn automatically in the left panel. Users can zoom in and out from isotopic distributions in each individual scan, and examine aggregated distributions by adjusting retention time range.
Discussion
There have been many efforts to characterize protein structures that have the key to elucidating the complex protein function and dynamics. Recently, protein structural changes by oxidation of peroxiredoxin 223, by calcium binding in EF-hands of secretagogin24, and by chemical binding to Nm23-H125 and by drug binding to PPARγ26 have all been well characterized by employing HDX-MS. HDX-MS is a one of the hottest techniques, which costs less effort and money, and at the same time, it has less limitation compared to other methods such as NMR or X-ray crystallography as described previously27. With the growing data size and the technology, a fully automated method is essential for high throughput analysis. deMix takes raw MS data of before and after HDX experiment and their peptide identifications as its input, and analyze the number of exchanged deuteriums. As most of experimental data contain many unexpected noise or co-eluted peptides, deMix effectively addresses lots of nosy peaks during its analysis.
In particular, deMix is very suitable for analyzing bimodal distributions by EX1 kinetics or from two protein conformations, which may suggest an important clue about protein function and dynamics. In the bimodal distribution analysis, the important issue is filtering out false positives. In some cases, a combination of two species may give better fit to the observed distribution than one species, but this may well result from over fitting or from the fact that the original MS data includes noises. For example, the bimodal distribution of Ddeu_6 and Ddeu_8 can be very similar to Ddeu_7, and both unimodal and bimodal analyses generate good results with very small errors. The selection of bimodal distributions over unimodal one must be assessed more conservatively.
Methods
Sample Preparation
Recombinant Nm23-H1 protein was purified from E. coli strain BL21 (DE3) overexpressing plasmids pET3c containing Nm23-H1 as described previously28. Briefly, cytosolic fraction of E. coli strains BL21 (DE3) transformed with pET-3c expression plasmids containing Nm23-H1 coding region were obtained after inducing the expression of protein with 0.2 mM IPTG. Each cytosolic fraction was applied to 2~4 mL of ATP-sepharose column equilibrated with Buffer A (20 mM Tris-acetate, 20 mM NaCl, 0.1 mM EDTA, 3 mM MgCl2, pH 7.4) at a flow rate of 3 mL/min. The column was then washed with buffer A and then with Buffer A containing 0.25 M NaCl to remove nonspecifically binding proteins. Then Nm23-H1 was eluted with Buffer A containing 1 mM ATP.
HDX using nanoUPLC-ESI-q-TOF
Recombinant Nm23-H1 (2 μg/μL) was diluted 10-fold with 99% D2O for 30 sec, 1, 3, 5, 10, 30, and 60 min and maintained at 25 °C with 1 mM H2O2. The labeling reaction was quenched by 5 mM tris(2-carboxyethyl)phosphine hydrochloride, pH 2.3 (This is titrated with formic acids). For peptic digestion, porcine pepsin (1 mg/mL) was added to each quenched protein sample and incubated at 0 °C for 3 min before injection21.
Deuterated peptic peptides were desalted on line prior to separation using trap column (ID 180 μm × 20 mm, Symmetry® C18) cartridge. Peptides were separated using a C18 reversed-phase 100 μm ID × 100 mm analytical column (1.7 μm particle size, BEH130 C18, Waters Co. USA) with integrated electrospray ionization PicoTipTM (±10 μm ID, New Objective, USA). The auto-sampler chamber was set at 5 °C. The trap, analytical column and all tubing were immersed in an ice bath to minimize deuterium back-exchange. Both mobile phase bottles were placed on ice and both mobile phases contained 0.1% FA. Gradient chromatography was performed at 600 nL/min flow rate and was sprayed on line to mass spectrometer (SYNAPTTM HDMSTM, Waters Co. USA). All mass spectral measurements were taken at: capillary voltage 2.5 kV, cone voltage 35 V, extraction cone voltage 4.0 V, source temperature 80 °C. TOF mode scan was performed in range of m/z 300–1500 with scan time of 1 sec.
MS/MS-based peptide identification
The raw files were converted to mzXML format using masswolf (v1.4) and were searched by Mascot (version 2.2.07, Matrix Science) with the following parameters: no enzyme option, ±0.3 Da mass tolerance for peptide and fragment ions, and variable modifications of Gln → pyro-Glu (N-terminal Q) and Oxidation (M). Swiss-Prot Human database was used for search. The peptide identifications were obtained at default significant threshold (p < 0.05), only matched to the target protein (Swiss-Prot Number P15531).
Natural and deuterated peptide profile
deMix’s input consists of a series of mzXML files, one for non-deuterated and the rest from HDX experiments across different D2O labeling times, together with the information (sequence, m/z, charge state, and precursor selection time) of peptides identified from the same sample. For each peptide given, its retention time range is automatically defined first from non-deuterated sample as shown in Fig. 2, and then deuterated isotopic distributions are detected from HDX experiments using the defined retention time range.
To define the retention time range of a peptide, the algorithm examines whether the peptide has been eluted in any of the preceding and/or following MS scans starting from its input precursor selection time, by comparing the peptide’s experimental isotopic distribution with its theoretical isotopic distribution calculated from its sequence information. As a default, the algorithm retrieves around at least ± 40 scans from the precursor selection time, but continues the extension as long as the peptide is detected. The scan extension is terminated if the peptide is not detected in two consecutive MS scans, and then the range along with the center of this elution (CE) is reported (i.e. center of extracted ion chromatogram, XIC), where the center of elution area was calculated as intensity-weighted average time of extended scans. Next, the corresponding deuterated isotopic distribution is determined for HDX experiments, where the shape of its deuterated distribution is not predictable because a peptide’s deuteration number is not known. Thus the algorithm selects the distribution (i.e. a set of peaks) with the maximum consecutive peaks (at least 5) at possible isotopic positions in the HDX-MS scan corresponding to CE, from which scan extension within defined XIC range begins as described above and is finished if in an extended scan, a distribution is not detected or a detected distribution is not overlapped by at least 5 peaks with the distribution at CE. For the deuterated distribution, all distributions detected in individual scans are aggregated together since the aggregated distribution is regarded more robust over errors than individual ones.
In this work, peaks were found within 0.1 Da mass tolerance. A user can set various parameters (default range of scan extension, number of required peaks for distributions, threshold for each peak’s intensity, etc) according to the MS-instruments of choice.
deMix Software
deMix software suite was implemented in Java programming language (v1.7), and can be obtained upon request to the corresponding author.
Supplementary information
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2017R1E1A1A01077412); by the National Research Foundation of Korea (NRF) grants (NRF-2018R1A4A1025985); by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2017R1C1B5017497); by R&D program for Advanced Integrated-intelligence for IDentification (AIID) through the National Research Foundation of Korea (NRF) funded by the Korea government (MSIT) (No. 2018M3E3A1057288); and by the Global Research Lab Program (2012K1A1A2045441). S.N. was supported by BK21 Plus project.
Author Contributions
S.N. designed the algorithms, implemented the software, and wrote the paper. J.J.L. designed and conducted the HDX-MS experiments. J.W.J. implemented the software. K.J.L. designed the HDX-MS experiments. E.P. designed the algorithms and wrote the paper. All authors discussed, commented and contributed to writing the paper.
Data Availability
The authors declare to make the data used in this manuscript available anytime on requirement.
Competing Interests
The authors declare no competing interests.
Footnotes
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Supplementary information accompanies this paper at 10.1038/s41598-019-39512-8.
References
- 1.Englander SW, Mayne L, Bai Y, Sosnick TR. Hydrogen exchange: the modern legacy of Linderstrøm-Lang. Protein Sci. 1997;6:1101–1109. doi: 10.1002/pro.5560060517. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Houde D, Arndt J, Domeier W, Berkowitz S, Engen JR. Characterization of IgG1 Conformation and Conformational Dynamics by Hydrogen/Deuterium Exchange Mass Spectrometry. Anal. Chem. 2009;81:2644–2651. doi: 10.1021/ac802575y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Engen JR, Smith DL. Investigating protein structure and dynamics by hydrogen exchange MS. Anal. Chem. 2001;73:256A–265A. doi: 10.1021/ac012452f. [DOI] [PubMed] [Google Scholar]
- 4.Slysz GW, Percy AJ, Schriemer DC. Restraining expansion of the peak envelope in H/D exchange-MS and its application in detecting perturbations of protein structure/dynamics. Anal. Chem. 2008;80:7004–7011. doi: 10.1021/ac800897q. [DOI] [PubMed] [Google Scholar]
- 5.Weis DD, Engen JR, Kass IJ. Semi-automated data processing of hydrogen exchange mass spectra using HX-Express. J. Am. Soc. Mass Spectrom. 2006;17:1700–1703. doi: 10.1016/j.jasms.2006.07.025. [DOI] [PubMed] [Google Scholar]
- 6.Pascal BD, et al. The Deuterator: software for the determination of backbone amide deuterium levels from H/D exchange MS data. BMC Bioinformatics. 2007;8:156. doi: 10.1186/1471-2105-8-156. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Slysz GW, et al. Hydra: software for tailored processing of H/D exchange data from MS or tandem MS analyses. BMC Bioinformatics. 2009;10:162. doi: 10.1186/1471-2105-10-162. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Pascal BD, et al. HDX workbench: software for the analysis of H/D exchange MS data. J. Am. Soc. Mass Spectrom. 2012;23:1512–1521. doi: 10.1007/s13361-012-0419-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Chik JK, Vande Graaf JL, Schriemer DC. Quantitating the statistical distribution of deuterium incorporation to extend the utility of H/D exchange MS data. Anal. Chem. 2006;78:207–214. doi: 10.1021/ac050988l. [DOI] [PubMed] [Google Scholar]
- 10.Zhang Z, Guan S, Marshall AG. Enhancement of the effective resolution of mass spectra of high-mass biomolecules by maximum entropy-based deconvolution to eliminate the isotopic natural abundance distribution. J. Am. Soc. Mass Spectrom. 1997;8:659–670. doi: 10.1016/S1044-0305(97)82982-0. [DOI] [Google Scholar]
- 11.Palmblad M, Buijs J, Håkansson P. Automatic analysis of hydrogen/deuterium exchange mass spectra of peptides and proteins using calculations of isotopic distributions. J. Am. Soc. Mass Spectrom. 2001;12:1153–1162. doi: 10.1016/S1044-0305(01)00301-4. [DOI] [PubMed] [Google Scholar]
- 12.Hotchko M, Anand GS, Komives EA, Ten Eyck LF. Automated extraction of backbone deuteration levels from amide H/2H mass spectrometry experiments. Protein Sci. 2006;15:583–601. doi: 10.1110/ps.051774906. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Zhang J, Ramachandran P, Kumar R, Gross ML. H/D exchange centroid monitoring is insufficient to show differences in the behavior of protein states. J. Am. Soc. Mass Spectrom. 2013;24:450–453. doi: 10.1007/s13361-012-0555-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Percy AJ, Rey M, Burns KM, Schriemer DC. Probing protein interactions with hydrogen/deuterium exchange and mass spectrometry-a review. Anal. Chim. Acta. 2012;721:7–21. doi: 10.1016/j.aca.2012.01.037. [DOI] [PubMed] [Google Scholar]
- 15.Weis DD, Wales TE, Engen JR, Hotchko M, Ten Eyck LF. Identification and characterization of EX1 kinetics in H/D exchange mass spectrometry by peak width analysis. J. Am. Soc. Mass Spectrom. 2006;17:1498–1509. doi: 10.1016/j.jasms.2006.05.014. [DOI] [PubMed] [Google Scholar]
- 16.Kan ZY, Mayne L, Chetty PS, Englander SW. ExMS: data analysis for HX-MS experiments. J. Am. Soc. Mass Spectrom. 2011;22:1906–1915. doi: 10.1007/s13361-011-0236-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Lindner R, et al. Hexicon 2: automated processing of hydrogen-deuterium exchange mass spectrometry data with improved deuteration distribution estimation. J. Am. Soc. Mass Spectrom. 2014;25:1018–1028. doi: 10.1007/s13361-014-0850-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Guttman M, Weis DD, Engen JR, Lee KK. Analysis of overlapped and noisy hydrogen/deuterium exchange mass spectra. J. Am. Soc. Mass Spectrom. 2013;24:1906–1912. doi: 10.1007/s13361-013-0727-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Song EJ, et al. Oxidative modification of nucleoside diphosphate kinase and its identification by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Biochemistry. 2000;39:10090–10097. doi: 10.1021/bi000267a. [DOI] [PubMed] [Google Scholar]
- 20.Lee E, et al. Multiple functions of Nm23-H1 are regulated by oxido-reduction system. PLoS One. 2009;4:e7949. doi: 10.1371/journal.pone.0007949. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Kim MS, Jeong J, Jeong J, Shin DH, Lee KJ. Structure of Nm23-H1 under oxidative conditions. Acta Crystallogr. D Biol. Crystallogr. 2013;69:669–680. doi: 10.1107/S0907444913001194. [DOI] [PubMed] [Google Scholar]
- 22.Min K, et al. Crystallization and preliminary X-ray crystallographic analysis of human nucleoside diphosphate kinase A. Acta Crystallogr. D Biol. Crystallogr. 2000;56:504–505. doi: 10.1107/S0907444900002626. [DOI] [PubMed] [Google Scholar]
- 23.Song IK, et al. Degradation of redox-sensitive proteins including peroxiredoxins and DJ-1 is promoted by oxidation-induced conformational changes and ubiquitination. Sci. Rep. 2016;6:34432. doi: 10.1038/srep34432. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Lee JJ, et al. Calcium ion induced structural changes promote dimerization of secretagogin, which is required for its insulin secretory function. Sci. Rep. 2017;7:6976. doi: 10.1038/s41598-017-07072-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Lee JJ, et al. Small molecule activator of Nm23/NDPK as an inhibitor of metastasis. Sci. Rep. 2018;8:10909. doi: 10.1038/s41598-018-29101-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Choi JH, et al. Antidiabetic actions of a non-agonist PPARγ ligand blocking Cdk5-mediated phosphorylation. Nature. 2010;477:477–481. doi: 10.1038/nature10383. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Lee JJ, Park YS, Lee KJ. Hydrogen-deuterium exchange mass spectrometry for determining protein structural changes in drug discovery. Arch. Pharm. Res. 2015;38:1737–1745. doi: 10.1007/s12272-015-0584-9. [DOI] [PubMed] [Google Scholar]
- 28.Kim SY, et al. Rapid purification and characterization of nucleoside diphosphate kinase isoforms using ATP-sepharose affinity column chromatography. Mol. Cells. 1997;7:630–634. [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The authors declare to make the data used in this manuscript available anytime on requirement.