To the editor,
Mass spectrometry, specifically liquid chromatography-tandem mass spectrometry (LC-MS/MS), stands at the forefront of metabolomic, pharmaceutical, and clinical analyses, offering high sensitivity and specificity in detecting metabolites and other small molecules within biological samples. Yet this powerful technique has unveiled a puzzling abundance of unidentified spectral features, termed the “dark metabolome”1,2, which starkly contrasts with the known metabolic diversity. While humans are known to host approximately 20,000 protein-coding genes3,4, only a subset of which encode enzymes, the chemical diversity implied by LC-MS/MS suggests that hundreds of thousands to millions of metabolites remain to be characterised2,5. This discrepancy raises a crucial question: might a technological explanation, rather than a biological one, be inflating the perceived complexity?
The central dogma of biology describes the relation and information flow between the different omic layers from genome to transcriptome to proteome to metabolome, with the genome being the master code imprinted in any living system at its conception6. The code linking the first three layers is rather straightforward, with three nucleotides encoding one amino acid. However, a comparable code linking these layers to the metabolome has yet to be revealed. This gap and the vast arrays of unannotated LC-MS/MS data are two of the main reasons that the metabolome (the entirety of small molecules within a biological system) cannot yet be defined. Current estimates suggest that less than 2% of observed LC-MS/MS spectra can be annotated, pointing to a potentially broad spectrum of unknown compounds2.
The vast majority of modern metabolomics studies (>90%) are conducted by LC-MS/MS analysis using electrospray ionisation (ESI), a so-called soft ionisation process. Following ionisation in the ESI source, charged molecules are sent into a collision cell and fragmented for structural characterisation (Figure 1). This process is ideally repeated for all (charged) analytes of a sample. However, our laboratories (and others7) have observed and applied a phenomenon called in-source fragmentation (ISF)8,9. ISF refers to the fragmentation of analytes during the initial ionisation process within the ESI source, and hence before the collision cell. ISF effectively generates a forest from a tree: a single analyte can be represented as a molecular ion and one or many fragments. The mass analyser will blindly isolate and fragment (again) whatever is sent into the collision cell. Given this, it appeared plausible to us that ISF might be partially responsible for the dark metabolome.
Figure 1. Electrospray ionisation (ESI) in-source fragmentation (ISF) and its broad impact on mass spectrometry data analysis.

(a) ISF is a common phenomenon that occurs in the ESI source and has been investigated here across the METLIN MS/MS database of 931,000 molecular standards. (b) The analysis reveals that widespread ISF (exceeding 70%), rather than molecular ions, is responsible for most of the ESI peaks (also known as features). (c) This result indicates that the metabolome is in fact simpler than previously thought and that the “dark metabolome” is largely made up of fragment ions generated during ESI.
To test our hypothesis, we examined the METLIN MS/MS database. METLIN consists of 931,000 molecular standards representing over 350 chemical classes. For the purposes of this study, METLIN data was mined at 0 eV, an energy designed to simulate the absence of collision-induced dissociation. To retrieve ISF from the METLIN database, we queried all MS/MS data giving peak intensities ≥5%, with the charge setting being positive, negative, or both, and the collision energy set to 0 eV. The obtained hits were collected and summed. A Python script was used to perform this analysis across all 931,000 molecules in METLIN. The code is accessible at https://metlin.scripps.edu. The analysis revealed that ISF could account for over 70% of the peaks observed in typical LC-MS/MS metabolomic datasets. In addition, we reviewed fragmentation data from different instrument platforms10 and found that ISF generally accounts for more than 70% of the peaks observed, and in some instances an even greater share, depending on the instrument.
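The query described above can be illustrated with a minimal Python sketch. It is not the authors' script and does not use any METLIN interface: the `Spectrum0eV` record, the `has_isf` and `isf_fraction` helpers, the `mz_tol` tolerance, and the example values are all hypothetical. It also assumes one possible reading of the counting step, namely that a molecule is flagged as showing ISF when its 0 eV spectrum contains at least one peak other than the precursor with a relative intensity of ≥5%.

```python
from dataclasses import dataclass


@dataclass
class Spectrum0eV:
    """Hypothetical record: one 0 eV MS/MS spectrum of a molecular standard."""
    molecule_id: str
    precursor_mz: float
    peaks: list[tuple[float, float]]  # (m/z, relative intensity in %)


def has_isf(spectrum, min_intensity=5.0, mz_tol=0.01):
    """Return True if any non-precursor peak reaches the intensity threshold.

    mz_tol is an assumed tolerance for deciding that a peak is the precursor.
    """
    return any(
        intensity >= min_intensity and abs(mz - spectrum.precursor_mz) > mz_tol
        for mz, intensity in spectrum.peaks
    )


def isf_fraction(spectra, min_intensity=5.0, mz_tol=0.01):
    """Fraction of spectra with at least one qualifying fragment peak at 0 eV."""
    if not spectra:
        return 0.0
    flagged = sum(has_isf(s, min_intensity, mz_tol) for s in spectra)
    return flagged / len(spectra)


# Made-up example spectra, for illustration only.
example = [
    Spectrum0eV("mol_A", 180.063, [(180.063, 100.0), (162.053, 40.0)]),  # fragment at 40%
    Spectrum0eV("mol_B", 250.100, [(250.100, 100.0), (232.090, 2.0)]),   # fragment below 5%
    Spectrum0eV("mol_C", 300.200, [(300.200, 100.0)]),                   # precursor only
    Spectrum0eV("mol_D", 120.050, [(120.050, 30.0), (102.040, 100.0)]),  # fragment is base peak
]

print(isf_fraction(example))  # 0.5
```

In this toy set, two of the four spectra carry a fragment peak at or above the threshold, so the flagged fraction is 0.5. Counting per peak rather than per molecule would be an equally plausible variant and could give a different figure.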
This finding disrupts the prevailing assumption that the majority of peaks in mass spectra correspond to unique metabolites. Instead, it suggests that the spectra may be significantly populated by fragment ions generated during the ionisation process, prior to any collision-induced dissociation. Such fragments, if not recognized as such, could be misclassified as distinct molecular entities, thus artificially expanding the metabolome’s perceived complexity.
This insight challenges traditional MS/MS spectral interpretations that define any unidentified MS/MS spectrum as unknown (e.g., the “dark metabolome”). Because several “unknown” features may derive from the same analyte, they will be intrinsically correlated, which potentially affects statistical analysis. Our finding sheds an entirely different light on the potential size of the dark metabolome and raises the question of whether a technological rather than a biochemical phenomenon explains the overabundance of unknown metabolic space. As LC-MS/MS data is also instrumental to metabolic networking and related computational approaches, our finding further calls into question the outcomes of such methods on a broad biological scale. While a significant number of metabolites remain undiscovered, it is imperative to re-evaluate the prevalent use and interpretation of LC-MS/MS in light of these insights. Nevertheless, our findings suggest a notable positive outcome: the metabolome appears to be distinctly better defined than earlier assumptions indicated. This clarity brings the possibility of fully understanding the metabolome closer within reach than previously envisioned.
ACKNOWLEDGMENTS
This research was partially funded by the Novo Nordisk Foundation Center for Stem Cell Medicine (reNEW) supported by Novo Nordisk Foundation grant (NNF21CC0073729) (M.G.) and by the National Institutes of Health R35 GM130385 (G.S.).
Footnotes
COMPETING INTEREST
The authors declare no competing interest.
COMPUTER CODE
The Python code to analyze the METLIN library is freely available at https://metlin.scripps.edu/.
REFERENCES
1. Peisl BYL, Schymanski EL & Wilmes P. Dark matter in host-microbiome metabolomics: Tackling the unknowns–A review. Anal Chim Acta 1037, 13–27, doi: 10.1016/j.aca.2017.12.034 (2018).
2. da Silva RR, Dorrestein PC & Quinn RA. Illuminating the dark matter in metabolomics. Proc Natl Acad Sci U S A 112, 12549–12550, doi: 10.1073/pnas.1516878112 (2015).
3. Amaral P et al. The status of the human gene catalogue. Nature 622, 41–47, doi: 10.1038/s41586-023-06490-x (2023).
4. Varabyou A et al. CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure. Genome Biol 24, 249, doi: 10.1186/s13059-023-03088-4 (2023).
5. Crow JM. Canada’s scientists are elucidating the dark metabolome. Nature 599, S14–15, doi: 10.1038/d41586-021-03062-9 (2021).
6. Crick F. Central dogma of molecular biology. Nature 227, 561–563, doi: 10.1038/227561a0 (1970).
7. Xu YF, Lu W & Rabinowitz JD. Avoiding misannotation of in-source fragmentation products as cellular metabolites in liquid chromatography-mass spectrometry-based metabolomics. Anal Chem 87, 2273–2281, doi: 10.1021/ac504118y (2015).
8. Chen L et al. Widespread occurrence of in-source fragmentation in the analysis of natural compounds by liquid chromatography–electrospray ionization mass spectrometry. Rapid Commun Mass Spectrom 37, e9519, doi: 10.1002/rcm.9519 (2023).
9. Bernardo-Bermejo S et al. Quantitative multiple fragment monitoring with enhanced in-source fragmentation/annotation mass spectrometry. Nat Protoc 18, 1296–1315, doi: 10.1038/s41596-023-00803-0 (2023).
10. Hoang C et al. Tandem Mass Spectrometry across Platforms. Anal Chem 96, 5478–5488, doi: 10.1021/acs.analchem.3c05576 (2024).
