Dear Editor,
The field of complex mixture analysis has advanced significantly in the past two decades, although its history goes much further back. When Dirk Willem van Krevelen developed his now eponymous diagram in 1950 to represent the chemical makeup of coals, he proposed that the chemical nature of samples, including the presence of structural motifs and chemical properties, could be inferred from the elemental ratios of the sample.1 While his work, limited by the technology of the era, looked at whole samples characterised by the ratio of elements present, i.e. number of carbons‐to‐hydrogens within the sample, modern mass spectrometry allows us to examine in a similar manner the individual components of a complex mixture.
Since 2003, when the modern van Krevelen diagram was first used to visualise complex MS datasets,2 every significant high‐resolution mass spectrometric analysis of a complex mixture has included one.3, 4, 5 Today's van Krevelen diagram places every assigned unique chemical formula on a 2D scatter plot of H/C ratio versus O/C ratio, although other elemental ratios can also be used. Although this represents a break from the original intentions of van Krevelen, the modified technique has become a useful tool for the interpretation and visualisation of complex data. For example, regions of the van Krevelen plot can be tentatively associated with certain compound classes,2, 6, 7 such as lipids (O/C < 0.2, H/C ≈ 2; values quoted are approximate), carbohydrates (H/C ≈ 2, O/C ≈ 1), or condensed hydrocarbons (O/C < 0.2, H/C < 1).
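As a minimal sketch of how a formula maps onto the modern van Krevelen diagram, the Python below computes the O/C and H/C coordinates of a formula and tentatively labels the region it falls in. The function names, the simple formula parser, and the tolerance windows around H/C ≈ 2 and O/C ≈ 1 are illustrative assumptions, not part of the presented tools; the class boundaries are only the approximate values quoted above.

```python
import re


def parse_formula(formula):
    """Count elements in a simple formula string, e.g. 'C6H12O6'.

    Hypothetical helper: handles element symbols followed by optional
    integer counts only (no brackets, charges, or isotope labels).
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def van_krevelen_coords(formula):
    """Return (O/C, H/C), the x and y coordinates on a van Krevelen plot."""
    counts = parse_formula(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        raise ValueError("formula contains no carbon")
    return counts.get("O", 0) / carbon, counts.get("H", 0) / carbon


def tentative_region(o_c, h_c):
    """Tentatively label a point using the approximate regions in the text.

    The +/- windows are assumptions chosen for illustration; published
    boundaries vary between studies.
    """
    if o_c < 0.2 and h_c < 1.0:
        return "condensed hydrocarbon-like"
    if o_c < 0.2 and abs(h_c - 2.0) <= 0.25:
        return "lipid-like"
    if abs(o_c - 1.0) <= 0.2 and abs(h_c - 2.0) <= 0.25:
        return "carbohydrate-like"
    return "unclassified"


for name, formula in [("glucose", "C6H12O6"),
                      ("palmitic acid", "C16H32O2"),
                      ("coronene", "C24H12")]:
    o_c, h_c = van_krevelen_coords(formula)
    print(name, formula, round(o_c, 3), round(h_c, 3), tentative_region(o_c, h_c))
```

Such labels are only a first guess: a position on the plot is consistent with a compound class but does not prove it.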
In the field of complex mixture analysis, a number of methods are available to the enterprising chemist; however, Fourier transform ion cyclotron resonance mass spectrometry (FTICR MS) reigns supreme as the ‘gold standard’ technique.8, 9 Likewise, there exist a number of well‐studied complex mixtures, including natural organic matter (NOM), i.e. dissolved organic matter,5, 10, 11 soil organic matter,12 and organic aerosols,13, 14, 15 petroleum,16, 17 or beverages such as wine18 or Scotch whisky.19, 20 Amongst the most complex of these, a component of NOM and the closest sample to a universal standard, is Suwannee River Fulvic Acid (SRFA) produced by the International Humic Substances Society.21 A typical electrospray ionisation (ESI)‐FTICR mass spectrum of SRFA will contain thousands of peaks across a range of masses, predominantly between m/z 200 and 700. Due to its ubiquity and complexity, SRFA was chosen to demonstrate the capability of the visualisation tools described herein.
With the mass accuracy of FTICR MS spectra in parts‐per‐billion,22, 23 routine and confident assignment of thousands of unique chemical formulae to individual peaks is now increasingly possible. The generation of this volume of data represents a significant challenge in terms of data visualisation, interrogation, and interpretation that has not been addressed so far. Here, we present a handful of tools aimed at filling this gap.
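To make the notion of mass accuracy concrete, the sketch below computes the theoretical monoisotopic mass of a neutral CHO formula and the relative error of a measured value in parts per million (multiply by 1000 for parts per billion). The function names and the "measured" value are made up for illustration; a real assignment would also account for the ion type (for example, loss or gain of a proton), which is deliberately left out here.

```python
# Monoisotopic masses in Da of the most abundant isotopes (12C, 1H, 16O).
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def neutral_monoisotopic_mass(counts):
    """Theoretical monoisotopic mass of a neutral molecule from element counts."""
    return sum(MONOISOTOPIC_MASS[element] * n for element, n in counts.items())


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


glucose = {"C": 6, "H": 12, "O": 6}
theoretical = neutral_monoisotopic_mass(glucose)

# Hypothetical measurement, constructed to lie 0.2 ppm above the theoretical mass.
hypothetical_measured = theoretical * (1 + 0.2e-6)

error_ppm = ppm_error(hypothetical_measured, theoretical)
print(f"theoretical mass: {theoretical:.5f} Da")
print(f"error: {error_ppm:.3f} ppm ({error_ppm * 1000:.0f} ppb)")
```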
We have developed a version of the van Krevelen diagram that introduces interactivity and allows the analyst, or reviewer, to interrogate the data in an intuitive way. This interactive van Krevelen, or i‐van Krevelen for short, is generated using the Bokeh Python plotting library.24 The developed tools are fully compatible with data assigned using any software package, as the inputs to the i‐van Krevelen scripts are three text files containing (1) monoisotopic peak assignments, (2) isotopologue peak assignments, and (3) remaining unassigned, but detected, peaks. Example input files are included with the suite of presented tools. The Bokeh API allows for the straightforward coding, in Python, of complex JavaScript (JSON) plots as HTML5 Canvas objects. The output from this tool is a standard HTML document compatible with any modern web browser such as Google Chrome, Firefox, or Internet Explorer.
The main feature of the i‐van Krevelen software is the generation of interactive diagrams including a centroid mass spectrum, van Krevelen, DBE vs carbon number plot and the modified Aromaticity Index vs carbon number plot.25 The plots are linked together, such that selecting any data points in one plot highlights those same points – i.e. unique chemical formula – in the other plots. In addition, these plots are explorable, featuring zoom and pan tools, as well as a display of the key information of each point in a hover‐tool. Finally, the data points can be used as hyperlinks – in our implementation, they link to a ChemSpider (The Royal Society of Chemistry, Cambridge, UK) search for their molecular formula.
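The two derived quantities plotted against carbon number can be sketched as follows. This is a minimal illustration, not the code of the presented tools: the function names are hypothetical, the double bond equivalent (DBE) uses the common expression C − H/2 + N/2 + 1, and the modified Aromaticity Index is restricted to formulae containing only C, H, and O, where it reduces to (1 + C − 0.5 O − 0.5 H) / (C − 0.5 O). Formulae with other heteroatoms need the additional terms given by Koch and Dittmar.25

```python
def dbe(c, h, n=0):
    """Double bond equivalents for a CHNO formula (oxygen does not contribute)."""
    return c - h / 2 + n / 2 + 1


def ai_mod_cho(c, h, o):
    """Modified Aromaticity Index for a formula containing only C, H and O.

    Assumption: heteroatoms other than O are absent. Following the usual
    convention, the index is set to 0 when the numerator or the
    denominator is not positive.
    """
    numerator = 1 + c - 0.5 * o - 0.5 * h
    denominator = c - 0.5 * o
    if numerator <= 0 or denominator <= 0:
        return 0.0
    return numerator / denominator


for name, (c, h, o) in {"benzene": (6, 6, 0),
                        "glucose": (6, 12, 6),
                        "palmitic acid": (16, 32, 2)}.items():
    print(name, "DBE =", dbe(c, h), "AImod =", round(ai_mod_cho(c, h, o), 3))
```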
The benefits of these features will be immediately obvious to any analytical chemist who has tried to make sense of complex static van Krevelen diagrams of complex mixtures.
For example, in a standard van Krevelen plot, numerous points may be superimposed if they share elemental ratios but differ in molecular formulae. As a van Krevelen plot is a specific type of scatter plot, it is susceptible to the same problems as any other scatter plot, and can be misinterpreted when hundreds or thousands of points are plotted. Whilst the addition of colour and transparency can reduce these problems, they are not eliminated entirely.26, 27 One alternative is to plot data density, not individual data points – i.e. a histogram or kernel density plot in 1D, or a hexagonally binned data plot in 2D.28 This allows easier visualisation of where the most (or largest, or most intense, depending on the density variable) data points are; however, this approach leads to a loss of information about specific components and their molecular formulae. With interactivity, however, a user can zoom to a region of interest in the plot, and use the hover‐tools to identify every component contributing to a particular point, thus removing the ambiguity caused by the overlap. Furthermore, we encode the relative abundance of a species by the size of the glyph on the plot. The colour can then be used to indicate mass, as in our van Krevelen plots, or oxygen number, as in our DBE and AI plots. This approach is illustrated in our recent paper on Scotch whisky.20
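The overlap problem can be illustrated with a few lines of Python. The sketch below groups a small, made-up list of CHO formulae by their exact (H/C, O/C) ratios, showing that distinct formulae can land on exactly the same point of a static plot. The function name and the example list are hypothetical; exact fractions are used so that equal ratios compare equal without floating-point rounding.

```python
from collections import defaultdict
from fractions import Fraction


def group_by_ratios(formulae):
    """Group formulae that share the same (H/C, O/C) plot coordinates.

    `formulae` is an iterable of (label, n_carbon, n_hydrogen, n_oxygen).
    """
    groups = defaultdict(list)
    for label, c, h, o in formulae:
        groups[(Fraction(h, c), Fraction(o, c))].append(label)
    return dict(groups)


# Made-up example list, not data from the SRFA spectrum.
example = [
    ("C5H10O5", 5, 10, 5),
    ("C6H12O6", 6, 12, 6),
    ("C12H24O12", 12, 24, 12),
    ("C16H32O2", 16, 32, 2),
]

for (h_c, o_c), labels in group_by_ratios(example).items():
    print(f"H/C = {float(h_c):.3f}, O/C = {float(o_c):.3f}: {labels}")
```

In a static plot the first three formulae would be drawn as a single marker; a hover tool listing every formula at that coordinate resolves the ambiguity.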
Reducing complex data down to a two‐variable van Krevelen plot inevitably represents a loss of information. In our tool, we have therefore created several 2D plots that are linked together. An example of this layout is shown in Fig. 1. This allows multiple variables to be related to a single molecular formula in order to better understand the sample. For example, as shown in Fig. 2, we can select only the most intense signals in the spectrum. Here we can see that these species, whilst the dominant compounds in the mass spectrum, represent only a fraction of the diversity present in the sample as revealed by their position on the van Krevelen plot. This means that if we were to consider only the n most abundant ions – an approach utilised in some previous statistical analyses of complex spectra19 – we would be losing the vast majority of the chemical diversity of the sample. Conversely, by selecting only the low‐abundance peaks, i.e. the “grass”, we can see that these signals do describe the chemical diversity of the sample more fully. Such information, which is lost in static van Krevelen plots, will be important for comparative studies aiming to characterise multiple samples by different ionisation techniques; for example, comparing ESI with MALDI (matrix‐assisted laser desorption/ionisation) mass spectra, where the abundance of a species is a function of both concentration and ionisation energy. Likewise, this interactive selection of points can be used to easily link outliers on any plot to their positions in the mass spectrum, or to understand where specific regions of these plots originate in the mass spectrum.
On a second tab of the HTML page, the centroid mass spectrum is plotted with the identified isotopologues, as well as the remaining unassigned peaks. An example of this is shown in Fig. 3. This gives the analyst, and more importantly the reader or reviewer, a straightforward means to see how well the spectrum was assigned, and thus to judge the validity of the assignment methodology.
Finally, on a third tab, the data table required to generate the plots is presented. It is also interactively linked to the plots, meaning that selections made on any plot are highlighted in the data table, and vice versa. This data table is downloadable as a text file.
The developed code also includes a number of related Python scripts for: (i) automated batch plotting of publication quality van Krevelen and DBE vs Carbon Number plots; (ii) heteroatomic class distribution calculation and plotting; (iii) an “all‐possible‐formula‐generator”, which calculates a list of possible, logical, chemical formulae based on work done by Kind et al.;29 (iv) a tool to batch perform automated exact mass‐to‐formula assignment based on Kendrick mass defect analysis and z* by looking for homologous series of compounds;30 and (v) a tool for reformatting of PetroOrg (Florida State University, Tallahassee, FL, USA) output CSV files. The latter two tools produce assignment files that serve as inputs for the i‐van Krevelen software and the other included scripts. The included formula generator is especially useful for determining assignment error thresholds, for example by allowing the user to determine the minimum distance between possible compounds at a given m/z, and thus adding confidence to the assignment.
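The idea behind Kendrick mass defect (KMD) analysis can be sketched briefly. Rescaling masses so that CH2 weighs exactly 14 gives all members of a CH2 homologous series the same mass defect, so they can be found by grouping on KMD. The code below is an illustration under stated assumptions, not the assignment tool itself: the function names are hypothetical, the nominal Kendrick mass is taken by rounding to the nearest integer (some authors round up instead), series are grouped by simply rounding KMD to four decimal places, and the z* criterion mentioned above is omitted.

```python
from collections import defaultdict

CH2_EXACT = 14.01565  # IUPAC mass of CH2 in Da
CH2_NOMINAL = 14.0


def kendrick_mass(mass):
    """Rescale an IUPAC mass so that CH2 has a mass of exactly 14."""
    return mass * CH2_NOMINAL / CH2_EXACT


def kendrick_mass_defect(mass):
    """KMD = nominal Kendrick mass - Kendrick mass (nearest-integer convention)."""
    km = kendrick_mass(mass)
    return round(km) - km


def group_homologues(masses, decimals=4):
    """Group masses whose KMD agrees after rounding (a crude tolerance).

    Values near a rounding boundary could be split, and series that differ
    in nominal mass modulo 14 are not separated; a fuller treatment would
    use an explicit tolerance and the z* value as well.
    """
    groups = defaultdict(list)
    for mass in masses:
        groups[round(kendrick_mass_defect(mass), decimals)].append(mass)
    return dict(groups)


# Neutral monoisotopic masses: C16H32O2, C17H34O2 (one CH2 more), C6H12O6.
masses = [256.24023, 270.25588, 180.06339]
for kmd, members in group_homologues(masses).items():
    print(f"KMD {kmd:.4f}: {members}")
```

Here the two fatty acids fall into one group because they differ only by CH2, while glucose has a different KMD and stands alone.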
Overall, these interactive plots, and their combination, represent a step forward in the analysis of complex mixtures by high‐resolution mass spectrometry. The tools are open‐source and available freely through GitHub with a GNU General Public License v3.0, encouraging others to experiment with and build upon them. The GitHub repository31 can be found online.32 An online tool allowing the use of some of these tools without the need to install any specialist software has also been developed, and can be found through the GitHub repository. An example of the interactive plots enabled by this initial i‐van Krevelen package based on the SRFA FTICR MS data can also be found online.33
Future work could incorporate the Datashader34 package, which would allow the visualisation of the raw profile spectra in a web browser without the need for the end user to download large data files or install proprietary mass spectrometry software, as well as the Bokeh Server tool, allowing the user to dynamically select which variables to plot on each axis, or to choose a specific colour or size scale. Examples of code for the Datashader functionality are included as a Jupyter Notebook in the GitHub repository.
Acknowledgements
The authors gratefully acknowledge the continued support and advice of Dr Logan Mackay, and his maintenance of the FT‐ICR MS instrument. This project was supported by BBSRC grant BB/L016311/1 and The Scotch Whisky Research Institute.
Kew, W., Blackburn, J. W. T., Clarke, D. J., and Uhrín, D. (2017) Interactive van Krevelen diagrams – Advanced visualisation of mass spectrometry data of complex mixtures. Rapid Commun. Mass Spectrom., 31: 658–662. doi: 10.1002/rcm.7823.
Contributor Information
William Kew, Email: w.kew@sms.ed.ac.uk.
Dušan Uhrín, Email: dusan.uhrin@ed.ac.uk.
References
- 1. Van Krevelen D. Graphical statistical method for the study of structure and reaction processes of coal. Fuel 1950, 29, 269.
- 2. Kim S., Kramer R. W., Hatcher P. G. Graphical method for analysis of ultrahigh‐resolution broadband mass spectra of natural organic matter, the Van Krevelen diagram. Anal. Chem. 2003, 75, 5336.
- 3. Wu Z., Rodgers R. P., Marshall A. G. Two‐ and three‐dimensional van Krevelen diagrams: A graphical analysis complementary to the Kendrick mass plot for sorting elemental compositions of complex organic mixtures based on ultrahigh‐resolution broadband Fourier transform ion cyclotron resonance. Anal. Chem. 2004, 76, 2511.
- 4. Herzsprung P., von Tümpling W., Hertkorn N., Harir M., Büttner O., Bravidor J., Friese K., Schmitt‐Kopplin P. Variations of DOM quality in inflows of a drinking water reservoir: Linking of van Krevelen diagrams with EEMF spectra by rank correlation. Environ. Sci. Technol. 2012, 46, 5511.
- 5. D'Andrilli J., Cooper W. T., Foreman C. M., Marshall A. G. An ultrahigh‐resolution mass spectrometry index to estimate natural organic matter lability. Rapid Commun. Mass Spectrom. 2015, 29, 2385.
- 6. Hertkorn N., Benner R., Frommberger M., Schmitt‐Kopplin P., Witt M., Kaiser K., Kettrup A., Hedges J. I. Characterization of a major refractory component of marine dissolved organic matter. Geochim. Cosmochim. Acta 2006, 70, 2990.
- 7. Gougeon R. D., Lucio M., Frommberger M., Peyron D., Chassagne D., Alexandre H., Feuillat F., Voilley A., Cayot P., Gebefugi I., Hertkorn N., Schmitt‐Kopplin P. The chemodiversity of wines can reveal a metabologeography expression of cooperage oak wood. Proc. Natl. Acad. Sci. 2009, 106, 9174.
- 8. Lim L., Yan F., Bach S., Pihakari K., Klein D. Fourier transform mass spectrometry: The transformation of modern environmental analyses. Int. J. Mol. Sci. 2016, 17, 104.
- 9. Marshall A. G., Hendrickson C. L., Jackson G. S. Fourier transform ion cyclotron resonance mass spectrometry: A primer. Mass Spectrom. Rev. 1998, 17, 1.
- 10. Cao D., Huang H., Hu M., Cui L., Geng F., Rao Z., Niu H., Cai Y., Kang Y. Comprehensive characterization of natural organic matter by MALDI‐ and ESI‐Fourier transform ion cyclotron resonance mass spectrometry. Anal. Chim. Acta 2015, 866, 48.
- 11. Herzsprung P., Hertkorn N., von Tümpling W., Harir M., Friese K., Schmitt‐Kopplin P. Molecular formula assignment for dissolved organic matter (DOM) using high‐field FT‐ICR‐MS: chemical perspective and validation of sulphur‐rich organic components (CHOS) in pit lake samples. Anal. Bioanal. Chem. 2016, 408, 2461.
- 12. Mann B. F., Chen H., Herndon E. M., Chu R. K., Tolic N., Portier E. F., Roy Chowdhury T., Robinson E. W., Callister S. J., Wullschleger S. D., Graham D. E., Liang L., Gu B. Indexing permafrost soil organic matter degradation using high‐resolution mass spectrometry. PLoS One 2015, 10, e0130557.
- 13. Nozière B., Kalberer M., Claeys M., Allan J., D'Anna B., Decesari S., Finessi E., Glasius M., Grgić I., Hamilton J. F., Hoffmann T., Iinuma Y., Jaoui M., Kahnt A., Kampf C. J., Kourtchev I., Maenhaut W., Marsden N., Saarikoski S., Schnelle‐Kreis J., Surratt J. D., Szidat S., Szmigielski R., Wisthaler A. The molecular identification of organic compounds in the atmosphere: State of the art and challenges. Chem. Rev. 2015, 115, 3919.
- 14. Tao S., Lu X., Levac N., Bateman A. P., Nguyen T. B., Bones D. L., Nizkorodov S. A., Laskin J., Laskin A., Yang X. Molecular characterization of organosulfates in organic aerosols from Shanghai and Los Angeles urban areas by nanospray‐desorption electrospray ionization high‐resolution mass spectrometry. Environ. Sci. Technol. 2014, 48, 10993.
- 15. Wozniak A. S., Bauer J. E., Sleighter R. L., Dickhut R. M., Hatcher P. G. Technical Note: Molecular characterization of aerosol‐derived water soluble organic carbon using ultrahigh resolution electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry. Atmos. Chem. Phys. 2008, 8, 5099.
- 16. Hughey C. A., Hendrickson C. L., Rodgers R. P., Marshall A. G., Qian K. Kendrick mass defect spectrum: A compact visual analysis for ultrahigh‐resolution broadband mass spectra. Anal. Chem. 2001, 73, 4676.
- 17. McKenna A. M., Williams J. T., Putman J. C., Aeppli C., Reddy C. M., Valentine D. L., Lemkau K. L., Kellermann M. Y., Savory J. J., Kaiser N. K., Marshall A. G., Rodgers R. P. Unprecedented ultrahigh resolution FT‐ICR mass spectrometry and parts‐per‐billion mass accuracy enable direct characterization of nickel and vanadyl porphyrins in petroleum from natural seeps. Energy Fuels 2014, 28, 2454.
- 18. Cooper H. J., Marshall A. G. Electrospray ionization Fourier transform mass spectrometric analysis of wine. J. Agric. Food Chem. 2001, 49, 5710.
- 19. Garcia J. S., Vaz B. G., Corilo Y. E., Ramires C. F., Saraiva S. A. S. A., Sanvido G. B., Schmidt E. M., Maia D. R. J. J., Cosso R. G., Zacca J. J., Eberlin M. N. Whisky analysis by electrospray ionization‐Fourier transform mass spectrometry. Food Res. Int. 2013, 51, 98.
- 20. Kew W., Goodall I., Clarke D., Uhrín D. Chemical diversity and complexity of Scotch whisky as revealed by high‐resolution mass spectrometry. J. Am. Soc. Mass Spectrom. 2017, 28, 200.
- 21. Malcolm R. L., MacCarthy P. A proposal for implementing a reference collection of humic and fulvic acids, in Trace Organic Analysis: A New Frontier in Analytical Chemistry, (Eds: Chesler S. N., Hertz H. S.). U.S. National Bureau of Standards, Maryland, 1979, pp. 789–792.
- 22. Stenson A. C., Marshall A. G., Cooper W. T. Exact masses and chemical formulas of individual Suwannee River fulvic acids from ultrahigh resolution electrospray ionization Fourier transform ion cyclotron resonance mass spectra. Anal. Chem. 2003, 75, 1275.
- 23. Shaw J. B., Lin T.‐Y., Leach F. E., Tolmachev A. V., Tolić N., Robinson E. W., Koppenaal D. W., Paša‐Tolić L. 21 Tesla Fourier transform ion cyclotron resonance mass spectrometer greatly expands mass spectrometry toolbox. J. Am. Soc. Mass Spectrom. 2016, 27, 1929.
- 24. Bokeh Development Team. Bokeh: Python library for interactive visualization. 2014, URL http://bokeh.pydata.org (accessed 04/01/2017).
- 25. Koch B. P., Dittmar T. From mass to structure: An aromaticity index for high‐resolution mass data of natural organic matter. Rapid Commun. Mass Spectrom. 2006, 20, 926.
- 26. Keim D. A., Hao M. C., Dayal U., Janetzko H., Bak P. Generalized scatter plots. Inf. Vis. 2010, 9, 301.
- 27. Mayorga A., Gleicher M. Splatterplots: Overcoming overdraw in scatter plots. IEEE Trans. Vis. Comput. Graph. 2013, 19, 1526.
- 28. Carr D. B., Littlefield R. J., Nicholson W. L., Littlefield J. S. Scatterplot matrix techniques for large N. J. Am. Stat. Assoc. 1987, 82, 424.
- 29. Kind T., Fiehn O. Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinformatics 2007, 8, 105.
- 30. Hsu C. S., Qian K., Chen Y. C. An innovative approach to data analysis in hydrocarbon characterization by on‐line liquid chromatography‐mass spectrometry. Anal. Chim. Acta 1992, 264, 79.
- 31. Kew W., Blackburn J. W. T., Clarke D., Uhrín D. FTMSVisualization: version 1 – Public. 2016, GitHub, DOI: https://doi.org/10.5281/zenodo.165785.
- 32. Available: https://github.com/wkew/FTMSVisualization.
- 33. Available: https://wkew.github.io/FTMSViz/SRFA-plot.html.
- 34. Cottam J. A., Lumsdaine A., Wang P. Abstract rendering: out‐of‐core rendering for information visualization, in Proceedings of SPIE – The International Society for Optical Engineering, (Eds: Wong P. C., Kao D. L., Hao M. C., Chen C.). 2013, p. 90170K.