Abstract
NMR studies of intrinsically disordered proteins and other complex biomolecular systems require spectra with the highest resolution and dimensionality. An efficient approach, extra‐large NMR spectroscopy (XLSY), is presented for experimental data collection, reconstruction, and handling of very large NMR spectra by a combination of radial and non‐uniform sampling, a new processing algorithm, and rigorous statistical validation. We demonstrate the first high‐quality reconstruction of a full seven‐dimensional HNCOCACONH and two five‐dimensional HACACONH and HN(CA)CONH experiments for a representative intrinsically disordered protein, α‐synuclein. XLSY will significantly enhance the NMR toolbox in challenging biomolecular studies.
Keywords: intrinsically disordered protein, NMR spectroscopy, non-uniform sampling, XLSY
As the size and complexity of biomolecular systems studied by NMR spectroscopy keep increasing, so do the number of peaks and the signal overlap, which seriously complicates and compromises data analysis. The problem is addressed by enhancing spectral resolution and dimensionality with radial (RS) and non‐uniform sampling (NUS) [1]. However, the reconstruction and handling of very large spectra still await a good solution.
The RS approach, which is based on direct analysis of planar spectral projections, is an efficient way of detecting signals in high‐dimensional experiments [2]. However, a set of planar projections is not a fair substitute for a true multidimensional experiment, especially in the case of a highly crowded spectrum of a challenging protein system such as an intrinsically disordered protein (IDP).
The best algorithms for reconstructing spectra from NUS data [3] are impractical for large data sets owing to prohibitive computational and storage requirements. Existing methods for spectra with five dimensions require a three‐dimensional reference spectrum or peak list [4], whereas six‐ and seven‐dimensional spectra are produced only as reduced‐dimensionality projections [4b]. A possible solution for large data sets may be found in the family of parametric algorithms [5], although no spectral reconstructions with more than four dimensions have been presented so far.
Herein we introduce XLSY, NMR spectroscopy for extra‐large datasets. When dealing with a large spectrum, the main problem stems from its size, which demands huge computational power for processing and by far exceeds computer memory. Notably, a multidimensional NMR spectrum is sparse and thus can be represented in a compact form in both the time and frequency domains. XLSY is a non‐iterative procedure that converts a small number of RS/NUS measurements into a compact, high‐quality sparse spectrum without ever dealing with the huge full data representation in either the time or the frequency domain.
The XLSY algorithm for spectrum reconstruction consists of three steps: 1) frequency identification, 2) intensity evaluation, and 3) validation.
The frequency identification step borrows part of the SFFT algorithm [6] to find a short list of frequencies in the spectrum that may have significant (that is, higher than noise) intensities. This part is based on radial sampling and the Fourier projection theorem [7]. To illustrate the algorithm, consider the simplest case of only two spectral dimensions, each spanning N points. The two‐dimensional spectrum contains N×N points with frequency coordinates (f1, f2). In an experiment, we measure a one‐dimensional projection that contains N frequency points enumerated by index f. To distinguish frequency points in the multidimensional spectrum (f1, f2) from those in a 1D projection, we call the latter buckets. The value of each bucket at position f is the sum of the N points of the 2D spectrum whose positions (f1, f2) fulfil the relation [Eq. (1a)]:
$$f = (\alpha_1 f_1 + \alpha_2 f_2) \bmod N \tag{1a}$$
For a spectrum with many dimensions, the corresponding general relation is [Eq. (1b)]:
$$f = \Big(\sum_{i} \alpha_i f_i\Big) \bmod N \tag{1b}$$
where mod is the modulo operator, α1 and α2 are integers, and α1/α2 represents the slope of the projection. Since the spectrum is sparse, most buckets contain only noise and are considered empty. A few buckets, whose intensities exceed a chosen noise threshold, correspond to one or a few non‐zero frequency points (f1, f2). To find the exact positions of these essential frequencies in the 2D plane, we measure and analyze several projections with different tilt angles.
Figure 1 illustrates the cumulative analysis of several projections of a 2D spectrum with only three non‐zero points. After the occupied buckets of the first projection are identified, all N frequencies that contribute to each of those buckets get a vote. This procedure is repeated for different projections until a consistent short list of essential frequencies is singled out by the maximum number of votes. Points whose frequencies accumulate no votes or too few are considered to have exactly zero intensity in the spectrum and are omitted from further consideration and storage. To ensure that low‐intensity peaks are picked up, the threshold in the projections can be as low as 2σ of the noise, and the voting cut‐off is set at 70–80 % of the maximum defined by the number of projections used. For example, in Figure 1 d, the three correct frequencies each collect the maximum possible four votes (blue). There are also four points that collect three votes (the darkest brown), which are artefacts of the radial sampling. These points may be kept in the short list of frequencies and will be eliminated at the final validation step.
Figure 1.
An illustration of the voting procedure using Eq. (1): a) positions of three signals, b) voting with two orthogonal projections, c) addition of the first diagonal projection, and d) voting with two orthogonal and two diagonal projections. Pixel colour in (b)–(d) from light to dark indicates the number of votes from one to four. See the text for more explanations.
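The voting procedure can be sketched in a few lines of Python for the 2D case. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the grid size, the peak positions, and the four slopes are hypothetical, the spectrum is noiseless, and the projections are computed from a known dense spectrum rather than measured. The helper `radial_projection` also checks the Fourier projection theorem numerically: the 1D Fourier transform of time‐domain points sampled along a radial line reproduces the bucket sums of Eq. (1a).

```python
import numpy as np

def projection(spectrum, a1, a2):
    """Bucket f sums all 2D points with (a1*f1 + a2*f2) mod N == f, as in Eq. (1a)."""
    N = spectrum.shape[0]
    f1, f2 = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    buckets = (a1 * f1 + a2 * f2) % N          # bucket index of every 2D point
    proj = np.zeros(N, dtype=spectrum.dtype)
    np.add.at(proj, buckets, spectrum)         # unbuffered accumulation into buckets
    return proj, buckets

def radial_projection(spectrum, a1, a2):
    """Same projection obtained from radially sampled time-domain data."""
    N = spectrum.shape[0]
    time_domain = np.fft.ifft2(spectrum)       # full time domain (illustration only)
    n = np.arange(N)
    radial = time_domain[(a1 * n) % N, (a2 * n) % N]
    return N * np.fft.fft(radial)              # factor N undoes the ifft2 scaling

def vote(spectrum, slopes, threshold):
    """Each occupied bucket gives one vote to every 2D point that maps onto it."""
    N = spectrum.shape[0]
    votes = np.zeros((N, N), dtype=int)
    for a1, a2 in slopes:
        proj, buckets = projection(spectrum, a1, a2)
        occupied = np.abs(proj) > threshold
        votes += occupied[buckets]
    return votes

# Hypothetical example: three peaks on a 16x16 grid, two orthogonal and two
# diagonal projections, voting cut-off at 75 % of the maximum (3 of 4 votes).
N = 16
S = np.zeros((N, N))
peaks = [(2, 5), (7, 7), (11, 3)]
for p in peaks:
    S[p] = 1.0
slopes = [(1, 0), (0, 1), (1, 1), (1, -1)]
votes = vote(S, slopes, threshold=0.5)
shortlist = np.argwhere(votes >= 3)
```

In this sketch the shortlist contains the true peaks plus any grid points that happen to collect three votes, which corresponds to the radial sampling artefacts that the text leaves for the validation step.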
The intensities of the frequencies shortlisted at the identification step are evaluated by solving a system of linear equations [Eq. (2a)]:
$$A\,\mathbf{s} = \mathbf{t} \tag{2a}$$
where vector s consists of Nf unknown spectral intensities. Vector t is composed of Nt experimental complex time‐domain data points. A is an Nt×Nf complex matrix obtained from the matrix of the discrete inverse d‐dimensional Fourier transform by retaining only the columns and rows corresponding to the shortlisted frequencies and the available experimental points, respectively. Matrix elements A_{n,k} are calculated as [Eq. (2b)]
$$A_{n,k} = \frac{1}{N^{d}} \exp\!\left(\frac{2\pi i}{N}\, \hat{n} \cdot \hat{k}\right) \tag{2b}$$
where d is the number of indirect dimensions spanning N points each (in our case the same for all dimensions); k̂ is a d‐dimensional vector of coordinates of the k‐th point in the frequency domain corresponding to intensity s_k; and n̂ is a d‐dimensional vector of coordinates of the n‐th point in the time domain corresponding to the measured value t_n.
To obtain a unique and reliable solution, matrix A in the system in Equation (2a) must be skinny, that is, the number of unknown spectral intensities Nf must be lower than the number of linear equations Nt. In addition, to obtain a well‐conditioned matrix A, it is essential to augment the RS data with NUS measurements. The possibility of using NUS along with RS data is a key feature of the new method, which for the first time allows ambiguities to be resolved and the spurious aliasing peaks inherent in RS data to be avoided [1b, 3c]. To further stabilize the solution of the linear system in Equation (2a) for the most crowded spectral regions, we use a mild Tikhonov regularization [8].
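The intensity evaluation step can be illustrated with a short Python sketch. It is written under assumptions and is not the authors' code: the function names are hypothetical, the 1/N^d factor follows the common inverse‐DFT convention, the regularized problem is solved through the normal equations, and the data are synthetic and noiseless. The shortlist deliberately contains frequencies with zero true intensity, standing in for radial sampling artefacts.

```python
import numpy as np

def idft_matrix(time_pts, freq_pts, N):
    """Rows: measured time points; columns: shortlisted frequencies.

    A[n, k] = exp(2*pi*i * <n_hat, k_hat> / N) / N**d  (assumed normalization).
    """
    time_pts = np.asarray(time_pts)            # shape (N_t, d)
    freq_pts = np.asarray(freq_pts)            # shape (N_f, d)
    d = time_pts.shape[1]
    phase = 2j * np.pi * (time_pts @ freq_pts.T) / N
    return np.exp(phase) / N**d

def solve_intensities(A, t, lam=0.0):
    """Minimize ||A s - t||^2 + lam * ||s||^2 (Tikhonov-regularized least squares)."""
    n_f = A.shape[1]
    lhs = A.conj().T @ A + lam * np.eye(n_f)
    rhs = A.conj().T @ t
    return np.linalg.solve(lhs, rhs)

# Hypothetical example: d = 3 indirect dimensions of N = 8 points each,
# 5 shortlisted frequencies (2 true peaks, 3 artefacts), 40 measured points.
rng = np.random.default_rng(0)
N, d = 8, 3
freq_pts = np.array([[1, 2, 3], [4, 4, 4], [7, 0, 5], [2, 6, 1], [3, 3, 0]])
s_true = np.array([5.0, 0.0, 2.0 + 1.0j, 0.0, 0.0])
time_pts = rng.integers(0, N, size=(40, d))
A = idft_matrix(time_pts, freq_pts, N)         # skinny matrix: N_t = 40 > N_f = 5
t = A @ s_true                                 # synthetic noiseless measurements
s_est = solve_intensities(A, t)
```

With `lam = 0` the sketch reduces to ordinary least squares. A non‐zero `lam` has to be chosen relative to the scale of the entries of `A.conj().T @ A`, which depends on the normalization assumed above.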
At the final validation step, which is also unique to the XLSY algorithm, the bootstrap approach [9] is used to estimate individual uncertainties for every calculated point in the spectrum. This defines a local noise level that may vary significantly from one spectral region to another depending on the signal density. The local noise estimate also sets an upper intensity boundary for the weak peaks that might be lost in the spectrum reconstruction. Furthermore, the uncertainties play an important role in the XLSY algorithm for detecting true weak peaks and discarding a sizable fraction (20–80 %) of points in the frequency shortlist that originate from the radial sampling artefacts. Since the NUS data are used for solving Equation (2), these frequencies are easily spotted as having statistically insignificant intensities.
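A bootstrap estimate of per‐point uncertainties can be sketched as follows. The text names only "the bootstrap approach", so the details here are assumptions made for illustration: measurements (rows of A and t) are resampled with replacement, the linear system is re‐solved for each resample, and the spread of the solutions is taken as the uncertainty. The function names, the 3σ significance cut‐off, and the synthetic 1D data are likewise hypothetical.

```python
import numpy as np

def solve(A, t, lam=1e-9):
    """Lightly regularized least squares via the normal equations."""
    lhs = A.conj().T @ A + lam * np.eye(A.shape[1])
    return np.linalg.solve(lhs, A.conj().T @ t)

def bootstrap_sigma(A, t, n_boot=200, seed=0):
    """Standard deviation of each intensity over resampled measurement sets."""
    rng = np.random.default_rng(seed)
    n_t = A.shape[0]
    solutions = np.empty((n_boot, A.shape[1]), dtype=complex)
    for b in range(n_boot):
        idx = rng.integers(0, n_t, size=n_t)   # draw measurements with replacement
        solutions[b] = solve(A[idx], t[idx])
    return solutions.std(axis=0)               # real-valued spread per frequency

# Hypothetical 1D example: one true peak and two artefact frequencies.
rng = np.random.default_rng(1)
N = 32
n = rng.choice(N, size=24, replace=False)      # measured time points
k = np.array([3, 10, 20])                      # shortlisted frequencies
A = np.exp(2j * np.pi * np.outer(n, k) / N)
s_true = np.array([5.0, 0.0, 0.0])
noise = 0.1 * (rng.standard_normal(24) + 1j * rng.standard_normal(24))
t = A @ s_true + noise

s_est = solve(A, t)
sigma = bootstrap_sigma(A, t)
significant = np.abs(s_est) > 3 * sigma        # assumed cut-off for validation
```

Shortlisted points whose intensity does not stand out from its own uncertainty would be dropped from the sparse spectrum in such a scheme, while `sigma` serves as the local noise level.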
An additional comment should be made about the sensitivity of the XLSY method. Although all the RS and NUS experimental data can be used together at the evaluation and validation steps of the algorithm, the identification step implies signal thresholding in the individual projections, each of which is recorded in a fraction of the total experiment time. Similar to the APSY approach [2d, 2e], the corresponding loss of sensitivity is largely offset by the combined analysis of the projections in the voting procedure. For our spectra, the signal threshold in the projections was set to 2–3σ of the noise level, which is well below the threshold commonly used for peak detection in individual NMR spectra. Note that 5 % (1 %) of noise intensities exceed the 2σ (3σ) noise threshold just by chance.
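A simple probabilistic model suggests why a low threshold in individual projections is tolerable once the votes are combined. The sketch below rests on assumptions that are not taken from the text: each projection is treated as independently giving a spurious vote to an empty frequency with the same probability `p`, so that the number of spurious votes follows a binomial distribution. The value `p = 0.10` is purely hypothetical; the 19 projections and the 75 % cut‐off are taken from Table 1 and from the 70–80 % range quoted above.

```python
from math import comb, ceil

def prob_at_least(k, n, p):
    """Binomial tail: probability of k or more successes in n independent trials."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n_proj = 19                        # time-domain projections of the 5D experiments
cutoff = ceil(0.75 * n_proj)       # votes needed at a 75 % cut-off
p = 0.10                           # assumed chance of a spurious vote per projection
p_false = prob_at_least(cutoff, n_proj, p)
print(f"votes needed: {cutoff}, chance of a spurious shortlist entry: {p_false:.2e}")
```

Under this idealized model the chance that an empty frequency reaches the cut‐off is many orders of magnitude below `p`. Real projections are not independent in this sense, because occupied buckets are set by the actual peak positions, which is why radial sampling artefacts still reach the shortlist and have to be removed at the validation step.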
We demonstrate XLSY by the first high‐quality reconstruction of three very large spectra consisting of 10¹⁰–10¹⁵ points. Figure 2 illustrates XLSY reconstructions of the 7D HNCOCACONH [10], 5D HACACONH [11], and 5D HN(CA)CONH [12] spectra of a representative 14 kDa IDP, α‐synuclein (Table 1). Figure 2 b–d highlights the genuine resolution of the 7D spectrum. Peaks of E105 and E131 from the EEG sequence repeats, which are resolved in the 7D HNCOCACONH spectrum, fully overlap in the 5D spectra (insets of Figure 2 e) and cannot be resolved in any radially sampled planar projection of the spectrum.
Figure 2.
XLSY reconstructions of α‐synuclein spectra. a) CO i−1/Ni+1 projection of 7D HNCOCACONH spectrum. b)–d) Slices of the 7D spectrum through peaks for E105 and E131. e) Ni/Cα i−1 projection of 5D HACACONH spectrum. Insets show 1D cross‐sections taken through the cross peak E131/E130 (overlapped with E105/E104). f) An example of a sequential assignment walk in the Ni+1/Ni projection from 5D HN(CA)CONH spectrum.
Table 1.
Parameters of NMR experiments and XLSY reconstructions.
| Parameter | 5D | 7D |
|---|---|---|
| Experimental time RS/NUS/total, hours | 23.5/3.5/27 | 49/68/117 |
| Time‐domain projections | 19 | 14 |
| Number of NUS points | 75 | 310 |
| Number of shortlisted frequencies | 8×10⁴/6×10⁴ [a] | 3×10⁶ |
| Final size of the reconstruction after validation, pts | 3.9×10⁴/3.5×10⁴ [a] | 4.6×10⁴ |
[a] 5D HACACONH/ 5D HN(CA)CONH spectra, respectively.
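A back‐of‐the‐envelope comparison shows what the compact representation buys in storage. The point counts come from the text and Table 1 (the upper end of the quoted range of spectrum sizes, and the final size of the 7D reconstruction); the byte sizes are assumptions made for this sketch: 8 bytes per complex value and 2 bytes per integer coordinate.

```python
def dense_bytes(n_points, bytes_per_value=8):
    """Storage for a full grid with one complex value per point."""
    return n_points * bytes_per_value

def sparse_bytes(n_kept, n_dims, bytes_per_index=2, bytes_per_value=8):
    """Storage for a coordinate list: n_dims indices plus one value per kept point."""
    return n_kept * (n_dims * bytes_per_index + bytes_per_value)

dense = dense_bytes(10**15)          # 8e15 bytes, i.e. 8 petabytes
sparse = sparse_bytes(46_000, 7)     # 46 000 validated points of the 7D spectrum
print(f"dense: {dense:.1e} B, sparse: {sparse:.1e} B, ratio: {dense / sparse:.1e}")
```

Under these assumptions the validated 7D spectrum occupies about one megabyte, whereas a dense grid of 10¹⁵ points could not be held in memory at all.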
The XLSY spectra can be easily handled and analyzed. They have a compact sparse‐matrix representation and contain only statistically validated intensities. For visualization and detailed analysis, any spectral slice or projection can be obtained, including those that are very difficult or impossible to obtain in experiments with lower dimensionality. Figure 2 a shows an example of a unique orthogonal projection CO i−1/Ni+1 of the 7D HNCOCACONH spectrum. Figures 2 e,f illustrate the quality of the 5D HACACONH and 5D HN(CA)CONH spectra with two planar projections Ni/Cα i−1 and Ni/Ni−1. Figure 2 f shows a partial sequential assignment walk performed in the 5D HN(CA)CONH spectrum for the stretch of amino acids from A69 to T64.
In the multidimensional spectra, peaks are well resolved and semi‐automatic signal detection is straightforward. In the 5D HACACONH and 5D HN(CA)CONH spectra, we found all peaks expected for α‐synuclein with the exception of five prolines and two residues at the N‐terminus. In the 7D HNCOCACONH spectrum, we found all peaks that were present in the orthogonal projections of the experiment. The peak lists together with the corresponding backbone assignment of α‐synuclein are deposited in BMRB (27586). The assignment is in line with the published assignment (BMRB No. 6968), which was obtained at a different sample temperature.
In conclusion, by demonstrating the first high‐quality reconstruction of complete 7D and 5D spectra of a representative IDP, we introduce the XLSY method, which removes the limits on spectrum dimensionality and resolution imposed by existing signal acquisition and processing approaches. We envisage that the method will be most useful in studies of IDPs and in automated high‐throughput characterization of small and medium‐size globular protein systems, where experiments with high dimensionality and resolution are in the highest demand [13].
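Extracting projections and slices from a sparse spectrum can be sketched as below. The storage layout is an assumption made for illustration (one row of integer coordinates and one real intensity per retained point), as are the function names and the toy data; whether a projection should sum intensities or take the maximum (a skyline projection) is a display choice and is exposed here as a parameter.

```python
import numpy as np

def planar_projection(coords, values, axes, N, mode="sum"):
    """Project sparse points onto the plane spanned by two chosen axes."""
    coords = np.asarray(coords)
    values = np.asarray(values, dtype=float)
    plane = np.zeros((N, N))
    i, j = coords[:, axes[0]], coords[:, axes[1]]
    if mode == "sum":
        np.add.at(plane, (i, j), values)       # accumulate points sharing a pixel
    else:
        np.maximum.at(plane, (i, j), values)   # skyline: keep the strongest point
    return plane

def slice_through(coords, values, fixed):
    """Keep only points whose coordinates match `fixed`, a dict {axis: index}."""
    coords = np.asarray(coords)
    values = np.asarray(values)
    mask = np.ones(len(values), dtype=bool)
    for axis, index in fixed.items():
        mask &= coords[:, axis] == index
    return coords[mask], values[mask]

# Hypothetical 3D sparse spectrum with three retained points on an 8-point grid.
coords = np.array([[1, 2, 3], [1, 2, 4], [0, 5, 3]])
values = np.array([1.0, 2.0, 3.0])
plane_sum = planar_projection(coords, values, axes=(0, 1), N=8)
plane_max = planar_projection(coords, values, axes=(0, 1), N=8, mode="max")
sub_coords, sub_values = slice_through(coords, values, {2: 3})
```

Because only the retained points are visited, the cost of any projection or slice scales with the size of the validated spectrum rather than with the size of the full grid.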
Conflict of interest
The authors declare no conflict of interest.
Supporting information
As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer reviewed and may be re‐organized for online delivery, but are not copy‐edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.
Acknowledgements
The work was supported by the Swedish Research Council (Research Grant 2015–04614); the Swedish NMR Centre is acknowledged for spectrometer time.
Y. Pustovalova, M. Mayzel, V. Y. Orekhov, Angew. Chem. Int. Ed. 2018, 57, 14043.
References
- 1.
- 1a. Hoch J. C., J. Magn. Reson. 2017, 283, 117–123;
- 1b. Kazimierczuk K., Orekhov V., Magn. Reson. Chem. 2015, 53, 921–926.
- 2.
- 2a. Coggins B. E., Venters R. A., Zhou P., Prog. Nucl. Magn. Reson. Spectrosc. 2010, 57, 381–419;
- 2b. Eghbalnia H. R., Bahrami A., Tonelli M., Hallenga K., Markley J. L., J. Am. Chem. Soc. 2005, 127, 12528–12536;
- 2c. Freeman R., Kupce E., Conc. Magn. Reson. 2004, 23A, 63–75;
- 2d. Hiller S., Fiorito F., Wüthrich K., Wider G., Proc. Natl. Acad. Sci. USA 2005, 102, 10876–10881;
- 2e. Narayanan R. L., Dürr U. H., Bibow S., Biernat J., Mandelkow E., Zweckstetter M., J. Am. Chem. Soc. 2010, 132, 11906–11907.
- 3.
- 3a. Hyberts S. G., Milbradt A. G., Wagner A. B., Arthanari H., Wagner G., J. Biomol. NMR 2012, 52, 315–327;
- 3b. Kazimierczuk K., Orekhov V. Y., Angew. Chem. Int. Ed. 2011, 50, 5556–5559; Angew. Chem. 2011, 123, 5670–5673;
- 3c. Mobli M., Stern A. S., Hoch J. C., J. Magn. Reson. 2006, 182, 96–105;
- 3d. Qu X. B., Mayzel M., Cai J. F., Chen Z., Orekhov V., Angew. Chem. Int. Ed. 2015, 54, 852–854; Angew. Chem. 2015, 127, 866–868;
- 3e. Sun S. J., Gill M., Li Y. F., Huang M., Byrd R. A., J. Biomol. NMR 2015, 62, 105–117;
- 3f. Ying J., Delaglio F., Torchia D. A., Bax A., J. Biomol. NMR 2016, 1–18.
- 4.
- 4a. Kosiński K., Stanek J., Górka M. J., Żerko S., Koźmiński W., J. Biomol. NMR 2017, 68, 129–138;
- 4b. Żerko S., Koźmiński W., J. Biomol. NMR 2015, 63, 283–290.
- 5.
- 5a. Hansen A. L., Li D., Wang C., Bruschweiler R., Angew. Chem. Int. Ed. 2017, 56, 8149–8152; Angew. Chem. 2017, 129, 8261–8264;
- 5b. Jaravine V., Ibraghimov I., Orekhov V. Y., Nat. Methods 2006, 3, 605–607.
- 6. Hassanieh H., Mayzel M., Shi L., Katabi D., Orekhov V. Y., J. Biomol. NMR 2015, 63, 9–19.
- 7. Bracewell R. N., Aust. J. Phys. 1956, 9, 198–217.
- 8. Tikhonov A. N., Samarskii A. A., Equations of Mathematical Physics, Dover Publications, New York, 2011.
- 9. Efron B. T., An Introduction to the Bootstrap, Chapman & Hall, London, 1993.
- 10. Fiorito F., Hiller S., Wider G., Wüthrich K., J. Biomol. NMR 2006, 35, 27–37.
- 11. Hiller S., Wider G., Wüthrich K., J. Biomol. NMR 2008, 42, 179–195.
- 12. Motáčková V., Nováček J., Zawadzka-Kazimierczuk A., Kazimierczuk K., Žídek L., Šanderová H., Krásný L., Koźmiński W., Sklenář V., J. Biomol. NMR 2010, 48, 169–177.
- 13. The XLSY MATLAB scripts for sampling generation and spectra processing, as well as guidelines and test data, are available upon request from the authors.