MIMI: Molecular Isotope Mass Identifier for stable isotope-labeled Fourier transform ultra-high mass resolution data analysis

Nabil Rahiman; Michael A Ochsenkühn; Shady A Amin; Kristin C Gunsalus

doi:10.1186/s12859-025-06348-1

2026 Jan 28;27:41. doi: 10.1186/s12859-025-06348-1


Nabil Rahiman ^1,^#, Michael A Ochsenkühn ^2,^#, Shady A Amin ^1,^2,^✉, Kristin C Gunsalus ^1,^3,^✉

PMCID: PMC12879430 PMID: 41606456

Abstract

Background

Ultra High Resolution (UHR) mass spectrometry systems with Fourier Transform Ion Cyclotron Resonance (FT-ICR) are often used to analyze the composition of complex mixtures of naturally-occurring or isotopically-enriched small molecules, such as metabolite samples for environmental, biological, or paleontological studies. The extremely high resolution of these systems enables simultaneous measurement of the exact masses of tens of thousands of molecular features and accurate determination of chemical formulas based on isotopic fine-structure ratios. To accelerate and streamline analysis of these datasets, automated solutions to rapidly characterize the molecular composition of unknown samples by comparison with known reference databases are needed.

Results

Here we present Molecular Isotope Mass Identifier (MIMI), a command-line tool to identify molecules present in complex samples using data from UHR-FT-ICR mass spectrometry. Given a database of known molecules and expected ratios of atomic isotopes in the sample, MIMI first computes theoretical exact masses and expected abundances for all possible isotopic variants in the database. Chemical formulas from publicly available databases and/or custom lists of molecules of interest can be used as reference data for comparison. By default MIMI is configured to use natural isotopic abundances, but it can easily accommodate different user-defined isotopic labeling ratios for any element(s). Candidate molecules are first identified in peak lists from UHR-FT-ICR mass spectrometry runs by comparing masses detected in the experimental data with precomputed theoretical masses for all entries in the reference database. Chemical formulas are then verified by counting isotopic fine-structure matches with the theoretical values for all possible molecular isotopic variants and can be further validated by comparing observed vs. expected relative peak heights. We illustrate MIMI’s utility using metabolite data from a cultured diatom sample isolated from seawater in the Arabian Gulf and spiked with ¹³C-labeled internal standards, as well as a sample containing naturally-abundant metabolite standards.

Conclusions

We introduce a simple command-line tool, MIMI, that rapidly identifies chemicals present in samples of complex composition using UHR-FT-ICR mass spectrometry data. MIMI can compare measured data against both standard and customized molecular databases and can accommodate natural or user-specified isotope ratios. This software provides a convenient tool for simultaneous determination of natural and isotope-labeled compounds within the same sample, particularly for rapid characterization of complex mixtures of metabolites.

Keywords: FT-ICR, Mass spectrometry, Metabolomics, Isotope labeling

Background

Fourier Transform Ion Cyclotron Resonance (FT-ICR) is a high-resolution mass spectrometry (MS) platform that uses ion cyclotron resonance frequencies and Fourier transformations to measure the mass-to-charge ratio (m/z) of ions in a sample. The resolving power of this technique lies in the sub-ppm range, making it possible to distinguish mass differences in the range of the mass of an electron [1, 2]. The platform’s ability to provide accurate mass measurements, high resolving power, and high sensitivity make it particularly useful for analyzing complex mixtures of molecules [3]. It has been successfully applied to environmental, petroleum, and food/beverage samples, in addition to standard metabolomics applications [3–5].

Beyond providing exact mass measurements, the ultra-high resolution of FT-ICR-MS allows the separation of molecular isotopes and identification of their chemical formulas. The expected isotopic fine-structure peaks of a molecule, derived from the natural abundance of its constituent isotopes, can serve as a secondary criterion for accurate chemical formula determination [6, 7]. Moreover, enrichment experiments with stable isotope labeling enable the investigation of elemental (carbon, oxygen, nitrogen, etc.) or molecule-specific changes in chemical and biological processes. This approach has been used in metabolomics, lipidomics, and proteomics to examine alterations in metabolic pathways, protein expression and turnover, flux analysis, and even intercellular interactions [8–12]. Isotope labeling is commonly used to trace contaminants in environmental studies, study carbon and nitrogen metabolism in plants [13], and characterize nitrogen fixation in microbes [14]; it has also been applied in medical and pharmacological studies [15–17].

Despite the broad utility of isotope labeling and isotopic fine-structure mapping for molecular identification, automated tools to extract and analyze isotopic mass features from UHR mass spectrometry data are not widely available. Currently, several freely available software platforms support FT-ICR MS data analysis in metabolomics and lipidomics. Formularity [18], CoreMS [19], and MFAssignR [20] focus on elemental formula assignment, with varying capabilities for recalibration, isotopic filtering, and customizable workflows. MetaboDirect [21], ftmsRanalysis [22], FREDA [23], and PyKrev [24] emphasize downstream analysis, offering statistical comparisons, diversity metrics, and rich visualization options such as Van Krevelen and Kendrick plots. For browser-based exploration, UltraMassExplorer [25] and DropMS [26] provide interactive platforms that combine rapid formula assignment with spectrum processing and visualization. Collectively, these tools cover major steps from formula assignment to visualization and statistics, but none are specifically designed to handle isotopically labeled datasets.

To address the need for a versatile, isotope-aware analysis tool for UHR-FT-ICR mass spectrometry, we created the Molecular Isotope Mass Identifier (MIMI) tool. Rather than attempt the computationally intractable problem of considering the entire universe of chemical space, MIMI compares peak signals from empirical mass spectra with theoretical molecular masses for a given set of reference compounds and isotope composition to identify the correct chemical formulas for molecular species detected in the data. MIMI first matches measured m/z values with expected values based on the reference data. To support the confidence in molecular identifications for exact mass spectra, MIMI then seeks to confirm the initial molecular assignments by comparing observed with expected isotopic fine-structure patterns based on the reference data and given isotope ratios. Reference lists may comprise data from publicly available metabolite databases and/or a set of specific molecules of interest; this flexibility is important for the study of model and non-model systems alike, since standard databases are incomplete and natural products discovery remains an active field. In addition, by allowing the user to specify any type and ratio of stable isotopes (e.g., ¹³C, ¹⁵N, ³⁴S), MIMI supports a wide variety of studies and experimental designs, such as molecular profiling of natural samples with isotope-labeled standards, uptake kinetics and metabolic flux of isotope-labeled compounds.

Implementation

MIMI is an open-source Python package with a command-line interface that is distributed under an NYU Non-Commercial Research License. The code base is available in a GitHub repository, and a companion website provides comprehensive documentation (see "Availability and requirements").

The MIMI framework (Fig. 1) separates the analysis into two stages: Preprocessing and Mass Analysis. This combination of modular design and reference database lookups helps to ensure fast processing speed and repeatability. The MIMI package also includes helper scripts that can generate reference lists from publicly available metabolite databases.

Software architecture

In the first stage (Preprocessing), MIMI builds a catalog of all possible isotopic variants for each chemical formula given in one or more list(s) of reference molecules, along with their masses and expected relative abundance, based on either natural isotope ratios or user-specified isotope composition. In the second stage (Mass Analysis), MIMI compares UHR mass spectra with the reference data to identify matches with the most abundant isotopic variant of each reference compound and then uses the theoretical isotopic fine structures to verify molecular assignments.

The Preprocessing task is computationally intensive, but it needs to be performed only once for each combination of reference list and isotope composition. Separating the two stages significantly reduces runtime when the same set of molecular isotope variants is consulted repeatedly – for example, to run the same analysis for multiple samples or replicates, or to test different error threshold parameters for peak matching.

Preprocessing

The mimi_cache_create command takes as input two types of files (Fig. 2): one or more molecular reference file(s) in tabular (.tsv) format and a file specifying expected atomic isotope ratios in JSON (.json) format. For each chemical formula in the reference list(s), MIMI combinatorially enumerates all possible isotopic variants and computes their molecular masses and expected relative abundances on the basis of the specified isotope composition. These are written to a cache file that can then be used to match any MS data set against the same list of molecular variants. Since each cache corresponds to a unique pairing of chemical formulas and isotope ratios, a separate cache file is needed for each combination of these used for mass analysis. Details on each file type are provided below.
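The combinatorial enumeration described above can be illustrated with a minimal Python sketch. It is not MIMI's implementation: the function names, the small hard-coded isotope table, and the `min_rel_abundance` cutoff value are assumptions chosen for illustration. For each element it lists every way of distributing the atoms over that element's isotopes, weights each distribution with a multinomial probability, and then combines the per-element results into whole-molecule variants.

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import factorial, prod
import re

# Illustrative isotope table: (mass in Da, fractional natural abundance).
# Only C, H and O are included to keep the sketch short.
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.0033548, 0.0107)],
    "H": [(1.00782503, 0.999885), (2.01410178, 0.000115)],
    "O": [(15.9949146, 0.99757), (16.9991317, 0.00038), (17.9991596, 0.00205)],
}


def parse_formula(cf):
    """Turn a formula such as 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", cf):
        counts[element] += int(n) if n else 1
    return dict(counts)


def element_variants(element, n):
    """Yield (mass, probability) for each way to spread n atoms over the isotopes."""
    isotopes = ISOTOPES[element]
    for combo in combinations_with_replacement(range(len(isotopes)), n):
        k = Counter(combo)  # isotope index -> number of atoms
        coef = factorial(n) // prod(factorial(c) for c in k.values())
        mass = sum(isotopes[i][0] * c for i, c in k.items())
        prob = coef * prod(isotopes[i][1] ** c for i, c in k.items())
        yield mass, prob


def isotopic_variants(cf, min_rel_abundance=1e-6):
    """Return (mass, abundance relative to the most abundant variant), sorted by abundance.

    min_rel_abundance is a hypothetical cutoff, not MIMI's documented default.
    """
    per_element = [list(element_variants(e, n)) for e, n in parse_formula(cf).items()]
    variants = [
        (sum(m for m, _ in choice), prod(p for _, p in choice))
        for choice in product(*per_element)
    ]
    top = max(p for _, p in variants)
    kept = [(m, p / top) for m, p in variants if p / top >= min_rel_abundance]
    return sorted(kept, key=lambda v: -v[1])


variants = isotopic_variants("C6H12O6")
for mass, rel in variants[:3]:
    print(f"{mass:.5f}  {rel:.5f}")
```

For glucose, the most abundant variant is the all-light species near 180.0634 Da, followed by the variant carrying a single ¹³C. The number of combinations grows quickly with molecule size, which is consistent with the motivation given above for computing the catalog once and caching it.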

Fig. 2 — *MIMI Workflow*: Detailed workflow illustrating input and output file formats for Preprocessing (mimi_cache_create) and Mass Analysis (mimi_mass_analysis) steps. Snippets of input abundance files and cache files for two independent Preprocessing runs are shown for the default natural atomic isotope ratios (blue headers; to identify metabolites in the test sample) and for an override file specifying ¹³C-labeled compounds (green headers; for the IROA-IS spike-in). In the cache files, natural minor isotopes for H and N are indicated in blue font; minor isotopes for C are in red font (¹³C for natural ratios; ¹²C for the spike-in). One or more cache files and MS peak lists (after calibration and peak picking; pink/orange headers) may be used as input for a single Mass Analysis run. A partial results table shows the layout for a run using both cache files and two sample replicates (data for the second replicate would appear as a second set of columns to the right). The same color scheme highlights output columns for corresponding inputs. The top row of the final output file contains the file path and name of the log file, which records metadata on how each workflow was run.

Molecular database: The choice of reference molecules to be used for comparison with MS spectra is specified with the --dbfile (-d) parameter followed by one or more filenames. If multiple database files are specified, they will be merged during the Preprocessing stage. This can be useful, for example, when certain compounds of interest are not present in a publicly available database. For convenience, MIMI is packaged with helper scripts to generate preformatted reference lists using publicly available data from the Human Metabolome DataBase (HMDB) [27] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND Database [28, 29] (see Documentation for details).

A properly formatted molecular reference file should include a header line containing at least three columns with the following names (in any order): CF (the chemical formula), ID (a unique identifier, such as from a publicly available database), and Name (a human-readable molecular name). MIMI will ignore any additional columns when building the cache file (e.g. synonyms, description, associated enzymes, references, etc.), since only the chemical formula and ID are needed for the analysis. Also, since the software relies solely on mass comparisons – whereas different structures or names can correspond to the same chemical formula – MIMI does not de-duplicate entries with the same CF but different IDs or Names in the final results table.
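As a rough illustration of the file layout described above, the following sketch reads a tab-delimited reference list, checks for the three required columns, and discards everything else. The function name `read_reference` and the extra `Synonyms` column are hypothetical; this is not the parser used inside MIMI.

```python
import csv
import io

REQUIRED = {"CF", "ID", "Name"}


def read_reference(handle):
    """Read a tab-delimited reference list, keeping only CF, ID and Name."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    # Extra columns (synonyms, descriptions, ...) are dropped; rows that share
    # a CF but differ in ID or Name are all kept.
    return [{key: row[key] for key in ("CF", "ID", "Name")} for row in reader]


# Columns may appear in any order; 'Synonyms' stands in for any extra column.
example = io.StringIO(
    "ID\tName\tCF\tSynonyms\n"
    "C00031\tD-Glucose\tC6H12O6\tGrape sugar\n"
    "C00095\tD-Fructose\tC6H12O6\tLevulose\n"
)
entries = read_reference(example)
print(entries)
```

Both example rows share the formula C6H12O6 and both are retained, mirroring the behavior described above for entries with the same CF but different identifiers.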

Isotope ratios: By default, MIMI uses atomic masses and natural isotope abundances from the National Institute of Standards and Technology (NIST, Gaithersburg, MD; downloaded Feb. 26, 2025) [30, 31], which are distributed with the MIMI package in JSON format. MIMI will always parse this file first as the basis for isotopic analysis. For any samples with stable isotope enrichment, it is necessary to explicitly specify new values for all elements with non-natural ratios. This allows MIMI to flexibly accommodate the presence of labeled organic compounds within metabolite samples. This is important since experimental studies increasingly employ stable isotope labeling with carbon (¹³C), hydrogen (²H), and/or nitrogen (¹⁵N), or occasionally oxygen (¹⁷O, ¹⁸O) or sulfur (³³S, ³⁴S).

Users may override the default values for any element(s) using the --label (-l) option together with a user-defined .json file. An override file needs only the new proportions of all isotopic variants for each labeled element(s), as MIMI will simply update these entries in its preloaded list of natural isotope ratios. It is critical that the updated relative isotope abundances for any individual element sum to 1.0; MIMI will perform a check for this and will throw an error if this check fails. For example, an experiment with 95% ¹³C labeling corresponds to a ratio of 95:5 ¹³C:¹²C, so the override file should express the proportion of ¹³C as 0.95 and that of ¹²C as 0.05 (as illustrated in Fig. 2). An example override file specifying 95% ¹³C labeling is included with the MIMI package as C13_95.json.
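The override logic can be sketched as follows. The JSON layout used here (element symbol mapped to mass number and fractional abundance) is an assumption made for illustration and may differ from the schema of the files shipped with MIMI, such as C13_95.json; the function name `apply_override` is likewise hypothetical.

```python
import json
import math

# Natural abundances for two elements; the dictionary layout is an assumption.
NATURAL = {
    "C": {"12": 0.9893, "13": 0.0107},
    "N": {"14": 0.99636, "15": 0.00364},
}


def apply_override(natural, override_json, tol=1e-9):
    """Replace entries for labeled elements after checking that they sum to 1.0."""
    override = json.loads(override_json)
    merged = {element: dict(ratios) for element, ratios in natural.items()}
    for element, ratios in override.items():
        total = sum(ratios.values())
        if not math.isclose(total, 1.0, abs_tol=tol):
            raise ValueError(
                f"isotope abundances for {element} sum to {total}, expected 1.0"
            )
        merged[element] = dict(ratios)
    return merged


# 95% 13C labeling: only carbon is listed, nitrogen keeps its natural ratios.
merged = apply_override(NATURAL, '{"C": {"12": 0.05, "13": 0.95}}')
print(merged)
```

Passing an override whose values do not sum to 1.0 (for example 0.95 and 0.10) raises an error in this sketch, analogous to the check described above.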

When multiply labeled molecules are expected (e.g. for dual labeling with ¹³C and ¹⁵N), the new values for all non-natural isotope ratios should be included in a single override file, so that expected mass ratios for all combinations of molecular variants are correctly computed for downstream Mass Analysis. However, if molecules with different non-natural isotope labels are expected, then a separate cache file should be created for each (e.g. for a ¹⁵N-labeled sample with a ¹³C-labeled standard).

Any non-natural labeling ratios supplied by the user should be either based on estimated ratios for the sample or, ideally, measured by isotope ratio mass spectrometry (IRMS) to provide more accurate expected peak patterns for isotopic fine-structure comparison. If incorrect isotope ratios are provided, or if an override file is omitted, initial peak matches with chemical formulas may still be detected, but the proportion of candidate assignments verified by fine-structure analysis will decrease.

By default, isotopic variants whose theoretical relative abundance falls below a preset threshold relative to the monoisotopic peak (i.e. the most abundant molecular variant) are excluded during cache file generation to improve runtime and minimize background artifacts during the subsequent step. The default value can be overridden with the -n parameter.

Ionization mode: To compute expected masses for the reference data, the ionization mode of the spectrometer must also be specified during the preprocessing step using the --ion (-i) parameter. Arguments are positive (-i pos) or negative (-i neg), for which MIMI will correspondingly add one proton mass (1.007276467 Da) to the computed exact mass ([M+H]⁺) or subtract it ([M−H]⁻). This version of MIMI only considers the most common adduct of a single ionization charge, since considering additional potential adducts would cause a combinatorial explosion that would greatly increase computational complexity.
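The mass adjustment for ionization mode amounts to a single addition or subtraction, as in this small sketch. The function name `ion_mass` is hypothetical; the proton mass is the value quoted above.

```python
PROTON_MASS = 1.007276467  # proton mass quoted in the text


def ion_mass(neutral_mass, mode):
    """Return the expected singly charged ion mass for 'pos' or 'neg' mode."""
    if mode == "pos":
        return neutral_mass + PROTON_MASS
    if mode == "neg":
        return neutral_mass - PROTON_MASS
    raise ValueError("mode must be 'pos' or 'neg'")


# Glucose (neutral monoisotopic mass about 180.063388) in both modes.
print(f"{ion_mass(180.063388, 'pos'):.6f}")
print(f"{ion_mass(180.063388, 'neg'):.6f}")
```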

Cache file: The output of the Preprocessing stage is a cache file that contains theoretical masses for all isotopic variants of the molecules in the reference list(s), along with expected relative abundances based on the specified atomic isotope ratios. The cache is stored as a serialized data object in a binary Python pickle file, which enables faster read/write times and uses less disk space than a text file. The name of the cache file is specified by the --cache (-c) parameter and carries the suffix .pkl (e.g. db_C13_95.pkl).

Mass analysis

The mimi_mass_analysis command takes as input one or more cache files from the Preprocessing step, along with one or more peak list(s) (after calibration and peak picking, see "Materials and methods" section). The --cache (-c) option specifies the .pkl cache file(s) to be used for peak matching, while the --sample (-s) option specifies the tab-delimited .asc sample file(s) to be analyzed in a single run.

MIMI was designed to support comparisons of individual peak lists with multiple cache files and vice versa to accommodate scenarios that call for the combined analysis of one or more datasets using different expected isotope ratios. This is useful, for example, to identify molecules in a set of replicates with isotope-labeled spike-in standards, to analyze time-series data, or to compare different treatment groups from the same experiment.

Assignment of chemical formulas to mass spectral data is a two-part process: (1) peak lists are compared with the theoretical masses of reference molecules to identify matches with the monoisotopic mass, i.e. the most prevalent molecular variant expected based on a given isotope composition; and (2) preliminary assignments are verified by comparing observed vs. expected fine-structure patterns for isotopic variants of each candidate molecule.

For the initial peak matching, MIMI compares the sample and cache files using an efficient hash-based search algorithm. This enables the rapid identification of peak signals whose measured mass matches the theoretical monoisotopic mass of each reference molecule. The mass error threshold for the search window must be specified by the user in parts per million (ppm) using the --ppm (-p) option.
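To make the ppm search window concrete, the sketch below finds all measured peaks within a given ppm tolerance of one theoretical mass. It uses a binary search over a sorted peak list purely for illustration; this is a substitute technique, not the hash-based search that MIMI uses, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def ppm_error(measured, theoretical):
    """Signed mass error of a measured peak in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match_peaks(peaks, theoretical, ppm):
    """Return (m/z, ppm error) for every peak within +/- ppm of the theoretical mass."""
    peaks = sorted(peaks)
    tol = theoretical * ppm / 1e6  # absolute half-width of the window
    lo = bisect_left(peaks, theoretical - tol)
    hi = bisect_right(peaks, theoretical + tol)
    return [(mz, ppm_error(mz, theoretical)) for mz in peaks[lo:hi]]


peaks = [179.05590, 179.05611, 179.05640]  # made-up measured m/z values
theoretical = 179.056112                   # approximate [M-H]- mass of C6H12O6

narrow = match_peaks(peaks, theoretical, ppm=0.5)
wide = match_peaks(peaks, theoretical, ppm=2.0)
print(len(narrow), len(wide))
```

At m/z 179, a 0.5 ppm window is less than 0.0001 m/z wide on each side, so only the middle peak matches; widening the window to 2 ppm admits all three, which illustrates how the choice of -p changes the number of candidate assignments.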

Once a match for the monoisotopic mass has been identified, MIMI seeks to verify the chemical formula assignment by comparing the observed fine-structure isotope pattern with that expected for the candidate compound. The error window for matching peaks against the precomputed masses of isotopic variants is specified by the user (in ppm) using the -vp option (where ‘v’ stands for ‘validation’).

Both the initial monoisotopic matching step and the subsequent fine-structure analysis can be tuned by varying the user-defined mass error tolerance thresholds (-p and -vp), which can affect the number of chemical formulas identified in the sample data. The current mass accuracy of UHR-FT-MS instruments is in the sub-ppm range, so a sub-ppm error window is a reasonable starting value for -p. Exploring different -vp values can be particularly useful, as smaller peaks often exhibit broader peak shapes that may cause shifts during peak picking. Users can test each dataset for the optimal thresholds by re-running the analysis across a range of values, as illustrated in Fig. 3. The online Documentation provides a template for a simple bash script that can be used to automate this process.
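A threshold sweep of this kind could also be prepared from Python, as in the hedged sketch below. It only builds and prints command lines and does not run them. The options -p, -vp, -c, -s and -o are those named in the text, but the exact argument layout (for example, several files following a single -c or -s) is an assumption, and the cache and output file names are hypothetical; the official Documentation should be consulted for the supported syntax.

```python
from itertools import product
import shlex

CACHES = ["db_nat.pkl", "db_C13_95.pkl"]     # hypothetical cache file names
SAMPLES = ["testdata1.asc", "testdata2.asc"]
P_VALUES = [0.1, 0.5, 1.0]                   # monoisotopic window (ppm)
VP_VALUES = [0.1, 0.5, 1.0]                  # fine-structure window (ppm)


def build_commands():
    """Build one mimi_mass_analysis command per (-p, -vp) combination."""
    commands = []
    for p, vp in product(P_VALUES, VP_VALUES):
        output = f"results_p{p}_vp{vp}.tsv"  # hypothetical naming scheme
        cmd = ["mimi_mass_analysis", "-p", str(p), "-vp", str(vp),
               "-c", *CACHES, "-s", *SAMPLES, "-o", output]
        commands.append(cmd)
    return commands


commands = build_commands()
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line could then be executed in a shell or passed to `subprocess.run` once the argument layout has been confirmed against the installed version.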

Fig. 3 — Effect of varying ppm error thresholds on peak assignments: Number of (a) monoisotopic and (b) fine-structure peak matches for peak lists from two technical replicates of a diatom sample (testdata1.asc and testdata2.asc) containing naturally abundant metabolites (blue/light blue bars) with a ¹³C-labeled IROA-IS spike-in (orange/pink bars), when compared with a list of 8,529 unique chemical formulas for 16,089 distinct KEGG compounds ranging between 40–1000 Daltons. Comparisons were performed across a range of error thresholds against the theoretical masses of metabolites with either natural isotopic abundance (nat_nist) or 95% ¹³C-labeling (C13_95). a) Number of distinct molecular features (monoisotopic masses) identified at varying -p settings of 0.1, 0.5, 1 ppm, with -vp held constant at 0.5 ppm. b) Average number of minor isotopic variants detected per matched chemical formula at a -p setting of 0.5 and varying -vp values of 0.1, 0.5, 1 ppm.

MIMI does not automatically apply thresholds for matching theoretical vs. observed isotope fine structure peak heights because these can diverge considerably in complex samples, particularly those with mixed isotope compositions (e.g. natural samples with labeled spike-in standards). However, MIMI can optionally perform further validation of isotopic fine-structure matches to increase confidence in annotated features. When run with the --iso-validation flag, MIMI will compare measured vs. theoretical heights of isotope fine-structure peaks with respect to the monoisotopic mass and output the number of fine-structure matches that also fall within a tolerance of ±30% deviation from expected values. This "validation" step is implemented as an optional flag because some complex samples may contain overlapping peaks that obscure the signals from minor isotopic variants, rendering this step less useful.
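The peak-height check can be pictured as a comparison of intensity ratios. The sketch below assumes that a fine-structure peak passes when its intensity relative to the monoisotopic peak lies within 30% of the theoretical relative abundance, the tolerance reported in the Results; the function name and the exact form of the comparison are assumptions rather than MIMI's code.

```python
def height_validated(mono_intensity, iso_intensity, expected_ratio, tolerance=0.30):
    """Check whether an isotope peak height is within tolerance of its expected ratio.

    expected_ratio is the theoretical abundance of the isotopic variant
    relative to the monoisotopic (most abundant) variant.
    """
    observed_ratio = iso_intensity / mono_intensity
    return abs(observed_ratio - expected_ratio) <= tolerance * expected_ratio


# Single-13C variant of a six-carbon molecule at natural abundance.
expected = 6 * 0.0107 / 0.9893  # about 0.0649

print(height_validated(1.0e6, 7.0e4, expected))  # observed ratio 0.07
print(height_validated(1.0e6, 1.0e5, expected))  # observed ratio 0.10
```

The first peak deviates by about 8% from the expected ratio and passes; the second deviates by more than 50% and fails, as could happen when an unrelated signal overlaps the isotope peak.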

Output

The final product of a MIMI analysis is a tabular .tsv file containing a list of all the matched reference compounds and observed experimental data for the FT-ICR-MS spectra analyzed in the same run, as illustrated in Fig. 2. The output filename is specified together with its relative file path using the --output (-o) parameter. To enable reproducible analysis workflows, MIMI also writes a log file for each run that contains the full commands issued by mimi_cache_create and mimi_mass_analysis, along with timestamps and a summary of the metadata. This provides a clear record of all parameter options, cache files, and data files used in each run. The first line of the mimi_mass_analysis output file contains the name and location of the log file for that run.

The first several columns of the results table contain molecular information from the reference database: chemical formula, database identifier, compound name, theoretical mass, and atom counts for individual elements (e.g. C, H, N, O, P, and S). Inclusion of atom counts facilitates the computation of various chemical properties that depend on atomic ratios, such as double bond equivalency (DBE), creation of van Krevelen diagrams, etc. This information is followed by numerical data for each combination of spectrum and isotope patterns analyzed: measured monoisotopic mass (m/z), detected mass error (in ppm), monoisotopic peak intensity, and count of theoretical isotope features matched. Note that the number of data rows in the results table may exceed the number of unique CFs matched. This is because if the same CF is associated with more than one compound ID or Name in a reference list, MIMI will output these in separate rows to preserve the original identifiers provided.
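As an example of what the atom-count columns make possible, the sketch below computes the double bond equivalency and the two van Krevelen ratios from a dictionary of atom counts. It uses the common DBE expression for CHNOPS compounds, which treats N and P as trivalent; the function names and the dictionary input are illustrative assumptions and do not describe MIMI's output parser.

```python
def dbe(counts):
    """Double bond equivalency: C - H/2 + (N + P)/2 + 1 (N and P taken as trivalent)."""
    c = counts.get("C", 0)
    h = counts.get("H", 0)
    n = counts.get("N", 0)
    p = counts.get("P", 0)
    return c - h / 2 + (n + p) / 2 + 1


def van_krevelen(counts):
    """Return the (O/C, H/C) coordinates used in a van Krevelen diagram."""
    c = counts["C"]
    return counts.get("O", 0) / c, counts.get("H", 0) / c


glucose = {"C": 6, "H": 12, "O": 6}
benzene = {"C": 6, "H": 6}

print(dbe(glucose), dbe(benzene))
print(van_krevelen(glucose))
```

Glucose has a DBE of 1 (one ring or carbonyl) and benzene a DBE of 4 (one ring plus three double bonds); oxygen and sulfur do not enter the expression under these valence assumptions.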

To facilitate side-by-side comparisons, all results will be concatenated in a single output file when mimi_mass_analysis is run using multiple input files. The data are grouped first by the peak files and second by the cache files used for the analysis. Hence, if more than one set of isotope ratios is used to analyze a single dataset, numerical data based on each cache (.pkl) file will be presented in sequential columns under the same .asc header. Likewise, if multiple spectra are analyzed in the same run, results for each dataset will be output sequentially as a grouped set of columns.

For example, consider a batch of samples or technical replicates to which a 95% ¹³C-labeled spike-in of known compounds has been added as a control, as in Fig. 2. Since both the monoisotopic masses and the fine-structure patterns will differ between 95% ¹³C-labeled molecules and those with natural isotope abundances, it is desirable to compare each .asc file with the .pkl cache files generated using both natural_isotope_abundance_NIST.json and C13_95.json. The output file for a combined run will report matches to entries in each cache file in separate groups for each dataset analyzed, enabling reference compounds in both the original sample and the spike-in to be identified independently for each dataset.

Full documentation for the MIMI package is available on the MIMI website (see Documentation).

Materials and methods

Diatom isolation and growth: Asterionellopsis glacialis strain A3 (CCMP3542) was isolated from the Arabian Gulf and kept axenic as described previously [32]. The cultures were maintained in f/2+Si [33] in semi-continuous batch cultures and incubated in Percival growth chambers (Percival Scientific, Perry, IA, USA) at Inline graphic C, 130 E photosynthetic photon flux density (PPFD) and a 12h:12h light/dark cycle. Light flux was measured using a QSL-2100 PAR Sensor (Biospherical Instruments Inc., San Diego, CA, USA). Growth was monitored by measuring in vivo fluorescence using a 10-AU fluorometer (Turner Designs, San Jose, CA, USA).

Diatom exudate metabolites sample: To extract exudate molecules, the spent medium of a diatom culture grown to a relative fluorescence units (RFU) value of Inline graphic 8 (representing mid-exponential growth) was filtered through Whatman 0.2-µm polycarbonate membrane filters (Cytiva Life Sciences, Marlborough, MA, USA). Solid-phase extraction (SPE) was performed on the spent medium using Agilent PPL Bond Elut cartridges (1 g for 300 ml samples) (Agilent Technologies, Santa Clara, CA, USA). The columns were activated according to the manufacturer’s instructions and subsequently used to remove and desalt organic molecules from the filtrates as previously described [34]. Extracts were eluted with 1 ml and 5 ml 0.1% (v/v) formic acid in methanol, dried using a Savant SpeedVac SC210A evaporator (Thermo Fisher Scientific, Waltham, MA, USA), and stored at −20 °C.

For FT-ICR-MS analysis, the sample was redissolved in 400 µl of 90% MeOH and spiked with 15 µl of 95% ¹³C-labeled IROA-IS internal standard (IROA Technologies, Ann Arbor, MI, USA). The standard, a yeast extract prepared using uniformly 95% ¹³C-labeled glucose, is distributed as a powder and was first dissolved according to the manufacturer’s recommendations in 1.2 ml Milli-Q H₂O. The spiked sample was measured twice as technical replicates.

Unlabeled nitrogen-containing metabolite standards sample: 343 metabolite standards from the IROA Library (Sigma-Aldrich, USA) that contain nitrogen were pooled into a single sample at a concentration of 5 µg/ml and were analyzed on the FT-ICR-MS as described below. The 343 metabolites represented 274 unique chemical formulas, which were the basis for the isotope fine-structure peak height analysis.

FT-ICR-MS acquisition: High-resolution mass spectra were acquired on a Bruker solariX Fourier Transform Ion Cyclotron Resonance Mass Spectrometer (FT-ICR-MS) (Bruker, Billerica, MA, USA) equipped with a 7T superconducting magnet operated in the negative ionization mode. Spectra were acquired with a time domain of 4 mega words over a mass range of 50–1000 m/z, with an optimal mass range from 200–600 m/z. For each sample replicate, 300 scans were collected at a sample injection rate of 1 µl/min. The following parameters were set for detection of molecules in the m/z 100 to 1000 range: Source: ESI, negative mode, Capillary: 4500 V, End Plate Offset: −700 V, Nebulizer pressure: 2 bar, Dry gas flow: 10 L/min, Dry temperature: C. Iontrap: Source Optics, Capillary Exit: −200 V, Deflector Plate: −220 V, Funnel1: −150 V, Skimmer: −15 V, Funnel RF App: 150 Vpp; Octopole Frequency: 5 MHz, RF Amplitude: 350 Vpp, Collision Cell: Collision Voltage: −10 V, DCextract Bias: −0.6 V, RF Frequency: 2 MHz, Collision RF Amplitude: 1500 Vpp, Transfer Optics: TOF: 0.5 ms, Frequency: 4 MHz, RF Amplitude: 350 Vpp. Analyzer: ParaCell: Transfer Exit Lens: 23.0 V, Analyzer Entrance: 10.0 V, Side kick: 0.0 V, Side kick Offset: 1.5 V, Front trap plate: −2.0 V, Back Trap plate: −2.0 V, Back trap Quench: 30 V, Sweep excitation: 21%; Shimming DC Bias: 1.468, 1.532, 1.420, 1.584, ICR Fills: 1.

DataAnalysis 5.0 calibration and feature picking: The FT-ICR mass spectra were internally calibrated on primary metabolites (amino acids, fatty acids, and organic acids) using DataAnalysis Version 5.0 (Bruker, Billerica, MA, USA). Peak alignment was performed with a maximum error threshold of 0.01 ppm and with a cut-off signal-to-noise (S/N) ratio of 4. The resulting peak tables were exported to peak lists in tab-delimited .asc format. Data files for both sample replicates (testdata1.asc, testdata2.asc) are distributed with the MIMI package.

Mass and Isotope Analysis: For this study, MIMI v1.0.0 was installed using the Conda package manager (Anaconda Inc., Austin, TX, USA) and run using Python, v3.11.11 (Python Software Foundation, https://www.python.org/). For testing purposes, the analysis was performed on both Mac and PC systems using a reference database generated with the KEGG helper script included in the MIMI package (see Documentation). Cache files were generated using either the default natural atomic isotope ratios from NIST or by specifying a 95% ¹³C override file; .json files for both are distributed with the MIMI package.

Results and discussion

To validate the tool’s performance, we used MIMI to analyze FT-ICR-MS data for two technical replicates of a sample containing diatom dissolved organic matter ("Materials and methods" section). In brief, a diatom strain isolated from the Arabian Gulf was cultured in axenic conditions; the de-salted diatom extract was spiked with a ¹³C-labeled IROA-IS internal molecular standard and FT-ICR-MS data were acquired using a time domain of 4 mega words across a mass range of 50–1000 m/z in negative ionization mode. After calibration and peak picking, the resulting .asc files contained 89,288 and 89,282 molecular features (m/z) per replicate with a resolving power of 750,000 at m/z 110 (corresponding to a mass resolution of approximately 0.00015 m/z units).
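The relationship between resolving power and mass resolution quoted above follows from the definition R = m/Δm, as this short calculation shows. The function name is hypothetical.

```python
def peak_width(mz, resolving_power):
    """Resolvable mass difference (delta m) implied by R = m / delta m."""
    return mz / resolving_power


delta = peak_width(110.0, 750_000)
print(f"{delta:.6f} m/z")                      # prints 0.000147 m/z
print(f"{delta / 110.0 * 1e6:.2f} ppm of m/z")  # prints 1.33 ppm of m/z
```

This reproduces the approximate value of 0.00015 m/z units given in the text. Note that this figure describes the separation at which two peaks can be distinguished; the mass accuracy of a picked peak centroid is typically better than the peak width, which is why sub-ppm matching thresholds remain meaningful.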

Preprocessing was performed using a reference list of 16,089 distinct metabolites with 8,529 unique masses ranging from 40–1000 Daltons, extracted from the online KEGG database [28, 29]. Two cache files were created using this reference list: one with natural (default) isotope abundance (to identify diatom metabolites) and another with a ¹³C:¹²C isotope ratio of 95:5 (for the IROA-IS spike-in).

Mass analysis for both sample replicates and both cache files was performed across a range of error thresholds (Fig. 3). With the -p parameter set to 0.1, 0.5, or 1 ppm and the -vp parameter set at a constant value of 0.5 ppm, the number of unique reference chemical formulas detected in the test data ranged from 686 to 3100 for unlabeled metabolites from the diatom extract, and from 139 to 1140 for the 95% ¹³C-labeled standard (Fig. 3a). For both isotope compositions, we observed a 4- to 5-fold increase in the number of distinct molecular features (monoisotopic matches) detected for an error threshold of -p 0.5 vs. -p 0.1 ppm, in line with the expected accuracy of the instrumentation. Consistent with the expectation that thousands of distinct molecules should be present in the unlabeled diatom sample, we detected a maximum of between 2398 and 3100 distinct molecular species in the test data. The expected ¹³C-labeled IROA-IS spike-in composition of around 500-1000 KEGG compounds (https://www.iroatech.com/wp-content/uploads/2022/02/TruQuant-Yeast-Extract-QC-Workflow-Kit-USER-MANUAL_022022.pdf) [35] also compares well with the 618 and 1140 matched features at 0.5 and 1 ppm, respectively.

In general, for FT-ICR-MS data with very high resolution (>250,000 resolving power), deviations between measured and theoretical masses are expected to fall in the sub-ppm range. With adequate acquisition, calibration, and peak picking, standard FT-ICR-MS data should allow compound identification within a mass error of 0.5 ppm, although the achievable tolerance depends strongly on instrument setup, sample type, and acquisition parameters. For this study, we chose a threshold of 0.5 ppm for the monoisotopic peak match in all subsequent analyses.
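To make the effect of the ppm threshold concrete, the following is a minimal sketch of tolerance-based matching of one theoretical mass against a sorted peak list. It is not MIMI's implementation: the function names and the example m/z values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error of an observed peak, in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peaks(theoretical: float, observed_sorted: list[float], ppm: float) -> list[float]:
    """Return all observed m/z values within +/- ppm of the theoretical mass.

    observed_sorted must be sorted in ascending order so that the
    tolerance window can be located by binary search.
    """
    tol = theoretical * ppm * 1e-6
    lo = bisect_left(observed_sorted, theoretical - tol)
    hi = bisect_right(observed_sorted, theoretical + tol)
    return observed_sorted[lo:hi]


# Hypothetical peak list around a hypothetical theoretical mass.
peaks = [258.02890, 258.02915, 258.02960]
target = 258.02900

for threshold in (0.1, 0.5, 1.0):
    hits = match_peaks(target, peaks, threshold)
    print(f"-p {threshold}: {len(hits)} match(es)")
```

With these illustrative values, the nearest peak lies about 0.39 ppm from the target, so it is missed at 0.1 ppm and recovered at 0.5 ppm, while a second peak at about 0.58 ppm is only recovered at 1 ppm.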

As peak shape might deteriorate for very small molecular peak features, we also explored the effect of varying the error tolerance for isotopic fine-structure feature comparisons. To illustrate the performance of the identification and verification of isotopic fine-structure features, we ran MIMI on the test data with the monoisotopic mass error fixed at -p 0.5 and -vp settings of 0.1, 0.5 and 1 ppm (Fig. 3b). Of the two isotope compositions, we expect a greater number of detectable isotopic variants for 95:5 ¹³C:¹²C-labeled molecules, because minor isotope peaks arising from 5% ¹²C are relatively more abundant than those arising from the natural 1.07% abundance of ¹³C. For -vp 0.1 ppm, we observed a very low number of isotopes due to insufficient instrument accuracy. At -vp 0.5 and -vp 1.0 ppm, an average of 1.9 and 3.1 isotopes, respectively, were matched with the natural isoforms across the two replicates. As expected, a higher number of isotopes (2.8 and 5.2 on average) were matched to ¹³C-labeled features at these settings.
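The expectation that labeled molecules show more detectable minor isotope peaks can be illustrated with a simple binomial model. The sketch below considers carbon only, treats it as a two-isotope element, and expresses each variant relative to the variant containing none of the minor isotope; these simplifications and the function name are assumptions made for illustration, not a description of MIMI's internals.

```python
from math import comb


def relative_abundance(n_atoms: int, k_minor: int, minor_fraction: float) -> float:
    """Abundance of the variant carrying k_minor minor-isotope atoms,
    relative to the variant carrying none (binomial model, one element).

    P(k) / P(0) = C(n, k) * (p / (1 - p)) ** k
    """
    ratio = minor_fraction / (1.0 - minor_fraction)
    return comb(n_atoms, k_minor) * ratio ** k_minor


n_carbons = 6  # illustrative six-carbon molecule

# Natural abundance: minor isotope is 13C at 1.07%.
natural = [relative_abundance(n_carbons, k, 0.0107) for k in (1, 2)]

# 95:5 13C:12C labeling: minor isotope is 12C at 5%.
labeled = [relative_abundance(n_carbons, k, 0.05) for k in (1, 2)]

print("natural:", [round(x, 4) for x in natural])
print("labeled:", [round(x, 4) for x in labeled])
```

For six carbons, the single-substitution peak is roughly 6% of the base peak at natural abundance but over 30% under 95:5 labeling, and the double-substitution peak rises from well below 1% to about 4%, which is why more minor variants clear the detection limit in the labeled case.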

Because isotope fine-structure peak heights exhibit considerable overlap in samples with mixed isotope ratios such as the diatom sample used here, a second sample of 274 known nitrogen-containing IROA metabolites with natural isotope abundance was processed on the FT-ICR-MS and analyzed with MIMI. As before, mass analysis was performed across a range of error thresholds, this time including the --isotope-validation flag. We then evaluated, for both datasets, the number of monoisotopic peaks with matches to chemical formulas in the KEGG database, how many of these CF assignments could be verified with fine-structure isotopic peaks, and how many CFs were subsequently validated by the presence of minor isotopic peaks that fell within 30% of their expected peak height based on theoretical isotope ratios (Fig. 4). To illustrate, we show output data for one molecule from the nitrogen-containing metabolites dataset with its monoisotopic mass, all associated fine-structure peaks, and validated peaks based on m/z values and peak height intensities (Fig. 5 and Table 1). In this example, of five detected fine-structure isotopic peaks associated with N-sulfo-D-glucosamine, three were validated.

Fig. 4 — *Validation rates of chemical formula assignments for monoisotopic masses using relative peak heights of minor isotopic variants*: Number of unique CFs with monoisotopic matches (light bars), minor isotope variant matches (medium bars), and validated formulas (dark bars) for two sample types when compared with KEGG compounds between 40–1000 Daltons. The nitrogen-containing IROA metabolite standards dataset (orange) contains 274 unique chemical formulas. The diatom sample (blue; testdata1) contains a mixture of natural and 95% ¹³C-labeled isotopes. Comparisons were performed across a range of ppm error thresholds using MIMI’s --iso-validation option with a 30% tolerance for isotopic fine-structure peak height matching. a) Number of unique CFs detected at varying -p settings of 0.1, 0.5, 1 ppm, with -vp held constant at 0.5 ppm. b) Number of unique CFs detected at -p setting of 0.5 and varying -vp values of 0.1, 0.5, 1 ppm.

Fig. 5 — *Mass spectrum analysis of N-sulfo-D-glucosamine (C₆H₁₃NO₈S) showing isotopic pattern verification using split X-Y axis visualization*: The plot displays the mono-isotopic peak (blue) and five isotopic variants with their corresponding chemical formulas. Green peaks indicate “validated” isotopic variants whose relative abundance ratios match theoretical molecular abundances within tolerance, while red peaks indicate variants that fail validation criteria. The split-axis design allows clear visualization of both the high-intensity mono-isotopic peak and the lower-intensity isotopic variants on separate y-axis scales.

Table 1.

Isotopic Mass Verification using Intensity Ratio Matching for N-Sulfo-D-glucosamine (C₆H₁₃NO₈S)

Formula¹	m/z	Intensity²	MS Ratio³	Abundance	Error Rate⁴	Verified⁵
	259.028	6984797	0.0081	0.0079	0.02	Yes
S	259.032	40719460	0.0473	0.0649	0.27	Yes
	260.025	25647532	0.0298	0.0447	0.33	No
S	260.033	12089784	0.0140	0.0164	0.14	Yes
	262.025	33301546	0.0387	0.00001	3643.68	No

¹Isotopic formulas show the specific isotope composition for each molecular variant

²Mass analysis was performed with -p and -vp settings of 0.5 ppm

³MS Isotopic Ratio = Isotopic Mass Intensity / Monoisotopic Mass Intensity (861392640)

⁴Error Rate = |Molecular Abundance – MS Isotopic Ratio| / Molecular Abundance

⁵Ratio verification based on error rate threshold of 0.3; entries with error rates exceeding 0.3 are marked as No

Monoisotopic mass: 258.029
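The intensity-ratio check defined in the footnotes of Table 1 can be reproduced directly from the tabulated values. The sketch below is an illustration rather than MIMI's code: the function name is hypothetical, and treating an error rate exactly equal to the threshold as a pass is an assumption. Because the tabulated abundances are rounded, recomputed error rates can differ slightly from the printed ones (most visibly for the last row, whose theoretical abundance is near zero).

```python
MONO_INTENSITY = 861_392_640  # monoisotopic peak intensity (Table 1, footnote 3)

# (m/z, intensity, theoretical abundance) as listed in Table 1
rows = [
    (259.028, 6_984_797, 0.0079),
    (259.032, 40_719_460, 0.0649),
    (260.025, 25_647_532, 0.0447),
    (260.033, 12_089_784, 0.0164),
    (262.025, 33_301_546, 0.00001),
]


def validate(intensity: float, abundance: float,
             mono_intensity: float, tolerance: float = 0.3):
    """Return (MS ratio, error rate, passed) for one isotopic peak.

    MS ratio   = isotopic intensity / monoisotopic intensity
    error rate = |abundance - MS ratio| / abundance
    passed     = error rate <= tolerance  (boundary handling is an assumption)
    """
    ms_ratio = intensity / mono_intensity
    error_rate = abs(abundance - ms_ratio) / abundance
    return ms_ratio, error_rate, error_rate <= tolerance


results = []
for mz, intensity, abundance in rows:
    ms_ratio, error_rate, passed = validate(intensity, abundance, MONO_INTENSITY)
    results.append(passed)
    print(f"{mz:.3f}  ratio={ms_ratio:.4f}  error={error_rate:.2f}  "
          f"{'Yes' if passed else 'No'}")
```

Three of the five peaks pass at the 30% tolerance, matching the Verified column.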

Analysis of the N-containing metabolite standards with -p set to 0.1, 0.5 or 1 ppm and -vp held constant at 0.5 ppm (Fig. 4a) reveals that a -p of 0.5 ppm yields the most reliable result, with virtually 100% of known standards validated. Thirty additional chemical formulas were also validated, likely arising from contamination in the standards and/or from sample processing. With a constant -p of 0.5 ppm, varying the -vp setting across 0.1, 0.5 and 1.0 ppm yielded a similar range of validated CFs (Fig. 4b) and revealed that 0.5 ppm is also optimal for the -vp parameter. With either parameter set to the most stringent cutoff of 0.1 ppm, most known compounds failed to validate, whereas relaxing either error threshold to 1.0 ppm increased the number of validations beyond expectation. Validation rates ranged from 55–63%, with a rate of 60% at the optimal settings. In contrast, validation rates for the diatom sample, containing a mixture of natural and ¹³C-labeled isotopes, ranged from 29–39% across all settings. At the intermediate settings, only 34% of CFs were validated, just slightly over half the rate for the N-containing sample. The lower percentage of validated CFs in the diatom sample can be attributed to the mixed-isotope nature of the sample and to the fact that diatoms are not model organisms, so a significant fraction of their metabolites remains unknown.

We also assessed the computational performance of MIMI by benchmarking the current modular workflow against a hypothetical combined approach, in which a new cache file was created for each analysis (Table 2). For this example, cache creation required 7 s, and each mass analysis completed in an average of 3 s. Gains in total runtime scale with the number of independent runs. Beyond increasing speed, the modular design improves reproducibility and workflow flexibility, as there is no need to reprocess identical reference data across replicates, time series, or parameter sweeps.

Table 2.

Runtime comparisons of modular and combined workflows

	Cache¹	Analysis¹	1 Run	3 Runs	30 Runs
Modular workflow	7	3	10	16	97
Combined workflow	7	3	10	30	300
Improvement	–	–	–	1.9	3.1

¹Runtime per dataset

Cache creation is a one-time step in the modular workflow and is repeated for each analysis in the combined workflow

Runtimes are reported in seconds for one or more independent runs, using a reference database of 16,089 compounds and an input sample file containing 89,288 features
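The totals in Table 2 follow from a simple cost model: the modular workflow pays for cache creation once, whereas the combined workflow pays for it on every run. The sketch below uses the 7 s and 3 s timings reported above; the function names are hypothetical.

```python
def modular_runtime(n_runs: int, cache_s: int = 7, analysis_s: int = 3) -> int:
    """Cache is built once, then reused for every analysis run."""
    return cache_s + n_runs * analysis_s


def combined_runtime(n_runs: int, cache_s: int = 7, analysis_s: int = 3) -> int:
    """Cache is rebuilt before every analysis run."""
    return n_runs * (cache_s + analysis_s)


for n in (1, 3, 30):
    modular = modular_runtime(n)
    combined = combined_runtime(n)
    print(f"{n:>2} run(s): modular={modular:>3} s  combined={combined:>3} s  "
          f"speed-up={combined / modular:.2f}x")
```

Under this model the speed-up approaches (cache + analysis) / analysis, about 3.3-fold for these timings, as the number of runs grows.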

Conclusion

The broad utility of isotope labeling and isotopic fine-structure mapping for molecular identification calls for automated tools to extract and analyze isotopic mass features from UHR mass spectrometry data. Here we introduce MIMI (Molecular Isotope Mass Identifier), an open-source tool designed to simplify and accelerate the analysis of FT-ICR mass spectrometry data from both unlabeled samples and enriched isotope labeling experiments. MIMI is well-suited for a wide range of experimental studies and serves as a foundation for advancing the analysis of ultra-high-resolution mass spectrometry data, particularly for biological and environmental research. MIMI enables efficient comparison of unlabeled and isotope-labeled chemical mixtures, offering flexibility through customizable reference databases and support for any combination of stable isotope ratios. Mass analysis is facilitated by a pre-processing step that computes theoretical masses of all possible isotopic variants for any user-specified isotope ratios while filtering out variants expected to be present at extremely low abundance. Separating the pre-processing and analysis steps produces greater savings in computational time as the number of samples increases. MIMI can analyze multiple peak files simultaneously and produces a comprehensive results table that includes atom counts and isotope information for all datasets in a single output file. These features enable easy comparison of results across multiple replicates, treatment conditions, or time series simultaneously. By enabling analysis of one or more expected isotope ratios at once, MIMI also facilitates analysis of experiments employing mixed isotope composition, such as samples with isotope-enriched spike-ins or matched samples with different isotope labels.

The current version of MIMI was designed for simplicity and speed, at the cost of some limitations in functionality that will require further development. First, while MIMI verifies molecular assignments for matched mono-isotopic peaks through isotopic fine-structure matching, we recognize that quantification can be affected by overlapping isotope patterns and chimeric spectra. More sophisticated approaches, such as mixture model–based deconvolution, could in principle provide more accurate recovery of true signal intensities. For now, we caution that quantitative comparisons should be interpreted conservatively, and note that incorporating similarity-based scoring of isotopic intensity patterns may offer a practical way to describe potential interferences. In addition, MIMI currently handles only un-fragmented mass spectrometry (MS1) data and therefore enables only chemical formula identification, not disambiguation of different chemical structures with the same constituent atoms. Finally, MIMI currently supports only singly charged ions, which are the predominant species in electrospray-based metabolomics. While this greatly simplifies the analysis and captures the majority of observed features, it does not account for alternative adducts or multiple charge states. Extending support to these would require combinatorial enumeration of isotopic fine structures during the pre-processing step, greatly increasing computational cost. Future releases could overcome these limitations by integrating tandem mass spectrometry (MS/MS) to improve compound identification and structural interpretation, by adding the capability to assign adducts and multiple charge states for chemical formulas, and by implementing similarity-based scoring of isotopic intensity patterns to resolve potential interferences. All of these could help improve analysis of complex samples, particularly those with mixed isotopes.

Availability and requirements

Project name: MIMI (Molecular Isotope Mass Identifier)
Documentation: https://corebioinf.abudhabi.nyu.edu/MIMI
Code repository: https://github.com/NYUAD-Core-Bioinformatics/MIMI
Operating system(s): Platform independent
Programming language: Python
Other requirements: Python 3.11.11
License: NYU Non-Commercial Research License (free for academic use)
Any restrictions to use by non-academics: License required for commercial use; see GitHub repository for contact details

Acknowledgements

The authors would like to acknowledge the NYU Abu Dhabi Core Technology Platforms (CTP) facilities for access to the FT-ICR-MS instrumentation.

Author contributions

MAO conceived the project; NR developed the algorithms and wrote the code; SAA and KCG provided supervision and guidance; and all authors contributed to writing the manuscript.

Funding

This work was supported by a Gordon and Betty Moore Foundation award to SAA (GBMF9335, https://doi.org/10.37807/GBMF9335), an NYU Abu Dhabi award to SAA (AD179), and by Tamkeen under the NYU Abu Dhabi Research Institute Award for the NYUAD Center for Genomics and Systems Biology to KCG and SAA (ADHPG-CGSB).

Data availability

All test datasets, configuration files, and associated materials discussed in the manuscript are available in the MIMI GitHub repository (https://github.com/NYUAD-Core-Bioinformatics/MIMI). MIMI can be installed via bioconda (https://anaconda.org/bioconda/mimi).

Declarations

Competing interests

The authors declare no conflict of interest.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Nabil Rahiman and Michael A. Ochsenkühn contributed equally to this work.

Contributor Information

Shady A. Amin, Email: sa132@nyu.edu

Kristin C. Gunsalus, Email: kcg1@nyu.edu

References

1. Nikolaev EN. Some notes about FT ICR mass spectrometry. Int J Mass Spectrom. 2015;377:421–31. 10.1016/j.ijms.2014.07.051.
2. Marshall AG, Hendrickson CL, Jackson GS. Fourier transform ion cyclotron resonance mass spectrometry: a primer. Mass Spectrom Rev. 1998;17(1):1–35. 10.1002/(SICI)1098-2787(1998)17:1<1::AID-MAS1>3.0.CO;2-K.
3. Ghaste M, Mistrik R, Shulaev V. Applications of fourier transform ion cyclotron resonance (FT-ICR) and orbitrap based high resolution mass spectrometry in metabolomics and lipidomics. Int J Mol Sci. 2016;17(6):816. 10.3390/ijms17060816.
4. Kanawati B, Wanczek KP, Schmitt-Kopplin P. Data processing and automation in fourier transform mass spectrometry. In: Fundamentals and Applications of Fourier Transform Mass Spectrometry; 2019. pp. 133–185. Elsevier. 10.1016/B978-0-12-814013-0.00006-5.
5. Rodgers RP, Marshall AG. Petroleomics: Advanced characterization of petroleum-derived materials by fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS). In: Mullins, O.C., Sheu, E.Y., Hammami, A., Marshall, A.G. (eds.) Asphaltenes, Heavy Oils, and Petroleomics; 2007. p. 63–93. Springer. 10.1007/0-387-68903-6_3.
6. Miladinović SM, Kozhinov AN, Gorshkov MV, Tsybin YO. On the utility of isotopic fine structure mass spectrometry in protein identification. Anal Chem. 2012;84(9):4042–51. 10.1021/ac2034584.
7. Nagao T, Yukihira D, Fujimura Y, Saito K, Takahashi K, Miura D, et al. Power of isotopic fine structure for unambiguous determination of metabolite elemental compositions: in silico evaluation and metabolomic application. Anal Chim Acta. 2014;813:70–6. 10.1016/j.aca.2014.01.032.
8. Zhang R, Chen B, Zhang H, Tu L, Luan T. Stable isotope-based metabolic flux analysis: a robust tool for revealing toxicity pathways of emerging contaminants. TrAC Trends Anal Chem. 2023;159:116909. 10.1016/j.trac.2022.116909.
9. Qian Z-Y, Ma J, Sun C-L, Li Z-G, Xian Q-M, Gong T-T, et al. Using stable isotope labeling to study the nitrogen metabolism in anabaena flos-aquae growth and anatoxin biosynthesis. Water Res. 2017;127:223–9. 10.1016/j.watres.2017.09.060.
10. Chokkathukalam A, Kim D-H, Barrett MP, Breitling R, Creek DJ. Stable isotope-labeling studies in metabolomics: new insights into structure and dynamics of metabolic networks. Bioanalysis. 2014;6(4):511–24. 10.4155/bio.13.348.
11. Shih L-M, Tang H-Y, Lynn K-S, Huang C-Y, Ho H-Y, Cheng M-L. Stable isotope-labeled lipidomics to unravel the heterogeneous development lipotoxicity. Molecules. 2018;23(11):2862. 10.3390/molecules23112862.
12. Gevaert K, Impens F, Ghesquière B, Van Damme P, Lambrechts A, Vandekerckhove J. Stable isotopic labeling in proteomics. Proteomics. 2008;8(23):4873–85. 10.1002/pmic.200800421.
13. Maia M, Figueiredo A, Cordeiro C, Sousa Silva M. FT-ICR-MS-based metabolomics: a deep dive into plant metabolism. Mass Spectrom Rev. 2023;42(5):1535–56. 10.1002/mas.21731.
14. Angel R, Panhölzl C, Gabriel R, Herbold C, Wanek W, Richter A, et al. Application of stable-isotope labelling techniques for the detection of active diazotrophs. Environ Microbiol. 2018;20(1):44–61. 10.1111/1462-2920.13954.
15. Mutlib AE. Application of stable isotope-labeled compounds in metabolism and in metabolism-mediated toxicity studies. Chem Res Toxicol. 2008;21(9):1672–89. 10.1021/tx800139z.
16. Knapp DR, Gaffney TE. Use of stable isotopes in pharmacology-clinical pharmacology. Clin Pharm Ther. 1972;13(3):307–16. 10.1002/cpt1972133307.
17. Schellekens RCA, Stellaard F, Woerdenbag HJ, Frijlink HW, Kosterink JGW. Applications of stable isotopes in clinical pharmacology. Br J Clin Pharm. 2011;72(6):879–97. 10.1111/j.1365-2125.2011.04071.x.
18. Tolić N, Liu Y, Liyu A, Shen Y, Tfaily MM, Kujawinski EB, et al. Formularity: software for automated formula assignment of natural and other organic matter from ultrahigh-resolution mass spectra. Anal Chem. 2017;89(23):12659–65. 10.1021/acs.analchem.7b03318.
19. Corilo YE, Kew WR, McCue LA, Heal K, Carr JC. EMSL-Computing/CoreMS: CoreMS 3.0.0. 2024. 10.5281/ZENODO.14009575. https://zenodo.org/doi/10.5281/zenodo.14009575.
20. Schum SK, Brown LE, Mazzoleni LR. MFAssignR: Molecular formula assignment software for ultrahigh resolution mass spectrometry analysis of environmental complex mixtures. Environ Res. 2020;191:110114. 10.1016/j.envres.2020.110114.
21. Ayala-Ortiz C, Graf-Grachet N, Freire-Zapata V, Fudyma J, Hildebrand G, AminiTabrizi R, Howard-Varona C, Corilo YE, Hess N, Duhaime MB, Sullivan MB, Tfaily MM. MetaboDirect: an analytical pipeline for the processing of FT-ICR MS-based metabolomic data. Microbiome. 2023;11(1):28. 10.1186/s40168-023-01476-3.
22. Bramer LM, White AM, Stratton KG, Thompson AM, Claborne D, Hofmockel K, McCue LA. ftmsRanalysis: An R package for exploratory data analysis and interactive visualization of FT-MS data. Comp Biol. 2020;16(3):1007654. 10.1371/journal.pcbi.1007654.
23. Degnan D, Claborne D, White A, Akers S, Winans N, Corilo Y, Strauch C, Bailey V, McCue L, Stratton K, Bramer L. FREDA: A web application for the processing, analysis, and visualization of fourier-transform mass spectrometry data. Rapid Commun Mass Spectrom. 2025;39(7):9980. 10.1002/rcm.9980.
24. Kitson E, Kew W, Ding W, Bell NGA. PyKrev: A Python library for the analysis of complex mixture FT-MS data. J Am Soc Mass Spectrom. 2021;32(5):1263–7. 10.1021/jasms.1c00064.
25. Leefmann T, Frickenhaus S, Koch BP. UltraMassExplorer: a browser-based application for the evaluation of high-resolution mass spectrometric data. Rapid Commun Mass Spectrom. 2019;33(2):193–202. 10.1002/rcm.8315.
26. Rosa TR, Folli GS, Pacheco WLS, Castro MP, Romão W, Filgueiras PR. DropMS: Petroleomics data treatment based in web server for high-resolution mass spectrometry. J Am Soc Mass Spectrom. 2020;31(7):1483–90. 10.1021/jasms.0c00109.
27. Wishart DS, Guo A, Oler E, Wang F, Anjum A, Peters H, et al. HMDB 5.0: the human metabolome database for 2022. Nucleic Acids Res. 2022;50(D1):622–31. 10.1093/nar/gkab1062.
28. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28(1):27–30. 10.1093/nar/28.1.27.
29. Kanehisa M, Furumichi M, Sato Y, Matsuura Y, Ishiguro-Watanabe M. KEGG: biological systems database as a model of the real world. Nucleic Acids Res. 2025;53(D1):672–7. 10.1093/nar/gkae909.
30. Coursey JS, Schwab DJ, Tsai JJ, Dragoset RA. Atomic Weights and Isotopic Compositions (version 4.1). National Institute of Standards and Technology, Gaithersburg, MD; 2015. [Online] Available: http://physics.nist.gov/Comp [Downloaded Feb 26, 2025].
31. Meija J, Coplen TB, Berglund M, Brand WA, Bièvre PD, Gröning M, Holden NE, Irrgeher J, Loss RD, Walczyk T, Prohaska T. Atomic weights of the elements 2013 (IUPAC technical report). Pure Appl Chem. 2016;88(3):265–291. 10.1515/pac-2015-0305.
32. Behringer G, Ochsenkühn MA, Fei C, Fanning J, Koester JA, Amin SA. Bacterial communities of diatoms display strong conservation across strains and time. Front Microbiol. 2018;9:659. 10.3389/fmicb.2018.00659.
33. Guillard RR, Ryther JH. Studies of marine planktonic diatoms. I. Cyclotella nana Hustedt, and Detonula confervacea (cleve) Gran. Can J Microbiol. 1962;8:229–39. 10.1139/m62-029.
34. Ochsenkühn MA, Schmitt-Kopplin P, Harir M, Amin SA. Coral metabolite gradients affect microbial community structures and act as a disease cue. Commun Biol. 2018;1(1):1–10. 10.1038/s42003-018-0189-1.
35. Mahmud I, Wei B, Veillon L, Tan L, Martinez S, Tran B, et al. Ion suppression correction and normalization for non-targeted metabolomics. Nat Commun. 2025;16(1):1–30. 10.1038/s41467-025-56646-8.
