Author manuscript; available in PMC: 2025 May 19.
Published in final edited form as: Metabolomics. 2023 Feb 6;19(2):11. doi: 10.1007/s11306-023-01974-3

Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software

Xinsong Du 1, Farhad Dastmalchi 1, Hao Ye 2, Timothy J Garrett 3, Matthew A Diller 1, Mei Liu 1, William R Hogan 1, Mathias Brochhausen 4, Dominick J Lemas 1,5,6
PMCID: PMC12087353  NIHMSID: NIHMS2069783  PMID: 36745241

Abstract

Background

Liquid chromatography-high resolution mass spectrometry (LC-HRMS) is a popular approach for metabolomics data acquisition and requires many data processing software tools. The FAIR Principles – Findability, Accessibility, Interoperability, and Reusability – were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. More recently, the FAIR principles were extended to include Research Software (FAIR4RS).

Aim of review

This study facilitates open science in metabolomics by providing an implementation solution for adopting FAIR4RS in the LC-HRMS metabolomics data processing software. We believe our evaluation guidelines and results can help improve the FAIRness of research software.

Key scientific concepts of review

We evaluated 124 LC-HRMS metabolomics data processing software tools identified through a systematic review and selected 61 for detailed evaluation against FAIR4RS-related criteria, which were extracted from the literature and refined through internal discussions. We assigned each criterion one or more FAIR4RS categories through discussion. The minimum, median, and maximum percentages of criteria fulfillment across software were 21.6%, 47.7%, and 71.8%. Statistical analysis revealed no significant improvement in FAIRness over time. We identified four criteria that cover multiple FAIR4RS categories but had low fulfillment percentages: (1) no software had semantic annotation of key information; (2) only 6.3% of evaluated software were registered to Zenodo and received DOIs; (3) only 14.5% of selected software provided official software containerization or a virtual machine; (4) only 16.7% of evaluated software had fully documented functions in code. Based on these results, we discuss improvement strategies and future directions.

Keywords: FAIR principles, Metabolomics, Research software, Liquid chromatography-mass spectrometry, Open science, Reproducibility

1. Introduction

Metabolomics is the systematic study of small molecules (metabolites) within cells, biofluids, tissues, or organisms, and it has been widely used in clinical studies (Zhang et al., 2020). Liquid Chromatography – High Resolution Mass Spectrometry (LC-HRMS) is a popular data acquisition technique due to its high sensitivity and specificity (Zhang et al., 2020). Data processing is the first computational step of a metabolomics study and is critical for downstream analysis, such as statistical analysis and data interpretation (Du et al., 2022). Therefore, our study focused on LC-HRMS metabolomics data processing steps. The FAIR Principles – Findability, Accessibility, Interoperability, and Reusability – were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. To date, numerous software tools have been developed for LC-HRMS metabolomics data processing (Spicer et al., 2017), such as XCMS (Smith et al., 2006; Tautenhahn et al., 2012), MZmine (Pluskal et al., 2010), and MS-DIAL (Tsugawa et al., 2015); however, there is limited information on the FAIRness of LC-HRMS metabolomics data processing software.

To enhance the propensity of data and other digital objects for sharing and reuse by humans and at scale by machines, the FAIR Principles (Findable, Accessible, Interoperable, and Reusable) originated in the Netherlands during the 2014 Lorentz Workshop “Jointly Designing a Data FAIRport”. Following consultation via the Future of Research Communications and e-Scholarship (FORCE11), the FAIR Principles, with 15 detailed guiding principles, were published by Wilkinson et al. in 2016 (Wilkinson et al., 2016). With such an arresting and rhetorically useful acronym, the FAIR Principles have gained greater uptake than earlier encapsulations of these ideas (Directorate-General for Research and Innovation (European Commission), 2018). To date, the FAIR Principles have been implemented in several areas such as precision oncology data sharing and bioinformatics data management (Mayer et al., 2021; Vesteghem et al., 2020). The adoption of the FAIR Principles requires not only FAIR data but also other FAIR digital objects such as software (Chue Hong et al., 2022). The FAIR for Research Software (FAIR4RS) Working Group (Katz, Barker, et al., 2021) was established in 2020 to develop community-endorsed FAIR Principles for Research Software (Chue Hong et al., 2022). In the released FAIR4RS Principles, “Findable” (F) means the software and its metadata can be easily found by humans and machines; “Accessible” (A) means the software and its metadata can be retrieved via standardized protocols; “Interoperable” (I) means the software can interoperate with other software; and “Reusable” (R) means the software can be understood, modified, built upon, or incorporated into other software (Chue Hong et al., 2022). Clinical studies conducted with FAIR research software are associated with better transparency, reproducibility, and reusability (Barker et al., 2022), leading to more useful research and better translational potential (Goodman et al., 2016).
However, the FAIR4RS Principles are a general description that only depicts a continuum of features moving research software closer to that goal (Wilkinson et al., 2019). Since FAIR speaks to machine-actionable operations, FAIR digital objects should be amenable to unambiguous forms of validation and evaluation, and a practical interpretation of the FAIR4RS Principles, together with detailed evaluation guidelines, is needed for implementation (Wilkinson et al., 2019). In the field of LC-HRMS metabolomics data processing, a detailed implementation solution for the FAIR4RS Principles has not yet been investigated.

The purpose of this study is to implement the FAIR4RS Principles in the evaluation of LC-HRMS metabolomics data processing software. Through a systematic review, we identified relevant LC-HRMS metabolomics data processing software. Next, we evaluated the software using criteria related to the FAIR4RS Principles, which were based on published papers on research software best practices and on internal discussions (XD, MAD, HY, DJL). Each criterion was also mapped to the corresponding FAIR4RS Principles. To remove potential ambiguity, detailed evaluation guidelines were developed and refined through discussion. Our analysis focuses on identifying strategies to improve the FAIRness of software for LC-HRMS metabolomics data processing.

2. Methods

2.1. Study selection process

As illustrated in Fig. 1, we followed the PRISMA guidelines for literature review (Liberati et al., 2009). The steps included keyword search, duplicate removal, title-abstract scanning, full-text review, and tool extraction. The databases searched were Web of Science (WOS), PubMed, and Embase. The literature search strategy was established in consultation with a librarian. As shown in Table S1, the search terms covered LC-HRMS metabolomics, software, reproducibility, and LC-HRMS data processing. During the Boolean search, we restricted search fields to title and abstract (plus subject headings for some keywords), and only studies published in the 5 years preceding August 2021 were included. Covidence software was used to support the paper screening process (Covidence - Better Systematic Review Management, n.d.). We extracted potentially related software from eligible papers based on the sentences surrounding their names, or on descriptions in the tool’s own paper if one existed. The software was then selected and evaluated against the eligibility and evaluation criteria by reading all available documentation, publications, and code repositories. Notably, each paper was screened by two reviewers and each software tool was evaluated by two reviewers. All criteria can be phrased as a question such as “does the paper/software have a specific feature?”. When the two reviewers disagreed, the reviewer who judged that the feature was present was required to show evidence and persuade the other reviewer; the final label for the criterion was then decided based on the outcome of that discussion. Detailed guidelines for resolving discrepancies are provided in Table S3.

Fig. 1.

Consort diagram for the literature review and computational tool extraction. The screening had two phases: literature screening and tool screening. During literature screening, 1396 papers published in the past 5 years were retrieved through keyword search, and 70 were ultimately included as relevant. In the tool-screening phase, 122 potentially relevant software tools were extracted from the 70 papers. We added two software tools that were recommended by an expert but not mentioned in the 70 papers. All 124 software tools were then reviewed in more detail by reading other resources available online, such as documentation and code repositories; 61 were finally considered eligible for the FAIR4RS review.

2.2. Inclusion and exclusion criteria

2.2.1. Related steps

We define LC-HRMS metabolomics data processing as all steps after data acquisition and before statistical analysis. We documented 13 relevant steps for each software tool. A detailed list and descriptions of the steps, along with literature references, are included in the supplementary material Table S2 (Clasquin et al., 2012; DeFelice et al., 2017; FillPeaks-Methods, n.d.; Liu et al., 2020; Mayer et al., 2013; Smith et al., 2006; Zhou et al., 2012). We created a controlled vocabulary for these steps based on ontologies and the literature. The steps were categorized into four main categories:

  1. Data preparation: Steps included in this category happen immediately after data acquisition. These steps make the data usable with computational software in the lab. File format conversion and parameter optimization are the only two steps involved.

  2. Feature generation: This category represents processes of generating features and their intensity values in the peak table. Nine steps are included in this category: mass detection, chromatogram building, deconvolution, peak grouping, retention time alignment, peak filling, ion annotation, and batch effect correction.

  3. Quality control: Evaluating the analytical variability of the data to make sure the acquired data have good quality for downstream analysis.

  4. Metabolite annotation: The process of determining the identity or chemical structure of the metabolite underlying a group of peaks; methods used to improve annotation results are also included in this category.
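The four-category taxonomy above can be sketched as a simple controlled-vocabulary mapping. This is an illustrative sketch only: the step names are paraphrased from the text, and the feature-generation list contains the eight steps explicitly named there.

```python
# Sketch of the data-processing step taxonomy described above
# (illustrative; step names paraphrased from the text).
PROCESSING_STEPS = {
    "data_preparation": [
        "file_format_conversion",
        "parameter_optimization",
    ],
    "feature_generation": [
        # eight of the nine steps named in the text
        "mass_detection",
        "chromatogram_building",
        "deconvolution",
        "peak_grouping",
        "retention_time_alignment",
        "peak_filling",
        "ion_annotation",
        "batch_effect_correction",
    ],
    "quality_control": ["analytical_variability_evaluation"],
    "metabolite_annotation": ["metabolite_annotation"],
}

def categories_for(step):
    """Return the taxonomy categories that contain a given step name."""
    return [cat for cat, steps in PROCESSING_STEPS.items() if step in steps]
```

For example, `categories_for("peak_filling")` returns `["feature_generation"]`; such a mapping is one simple way a controlled vocabulary for processing steps could be made machine-readable.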

2.2.2. Paper screening

The goal of paper screening was to select papers that mentioned or used software related to LC-HRMS metabolomics data processing. Therefore, during title-abstract scanning, we excluded studies based on the following criteria:

  1. We excluded papers that were not written in English.

  2. We excluded papers whose instrumental analysis did not include LC-HRMS. For example, studies that used gas chromatography-mass spectrometry (GC-MS) or nuclear magnetic resonance (NMR).

  3. We excluded papers that were not about metabolomics, such as papers focusing on proteomics.

  4. We excluded papers that did not evaluate computational steps. For instance, some studies only focused on sample processing steps.

  5. We excluded reviews that did not describe specific LC-HRMS metabolomics software.

During the full-text reading stage, we used information provided within the paper and used the following exclusion criteria for further filtering:

  1. The paper did not include a computational tool.

  2. No tool in the paper was about metabolomics.

  3. Some software in the paper was about metabolomics but not related to LC-HRMS data processing steps.

2.2.3. Tool screening

For the initial screening, the relevance of a tool was judged from its context in the paper. Software that at least one reviewer thought could perform related steps was included. During tool screening, we further investigated whether a tool had a certain function based on its documentation. For example, if a tool did not contain a tutorial on quality control, we considered it not to have that function, even if a researcher might figure out a way to perform quality control with the tool. The software was then further filtered by reading publicly available documentation, publications, and code repositories. The exclusion criteria were:

  1. The tool was no longer available.

  2. The tool was produced by a commercial company.

  3. Functions included in the tool were not related to the steps in Table S2.

  4. The tool was an extension of an existing tool and did not have its own webpage or documentation.

2.3. Information gathering process

To extract information manually, the full texts of selected articles were read by two reviewers. Relevant software included in these articles was deduplicated and evaluated by two reviewers for general information about the tool (name, first release date, supported operating systems, major programming languages, literature citation, supported mode (i.e., web-based/standalone/plug-in), and whether it was open source) and for the specific FAIR4RS-related criteria. The data processing steps a software tool can perform were extracted by going through its documentation as well as literature containing relevant information (Li et al., 2018; Misra, 2018, 2021; Misra et al., 2017; Misra & Mohapatra, 2019; O’Shea & Misra, 2020; Stanstrup et al., 2019). To ensure the correctness of the annotation of steps a software tool can perform, we first did the annotation ourselves, then contacted the senior authors of all included software via email for confirmation, correction, and comments. We sent a follow-up email to software authors who did not respond within five days. The software authors we contacted are recorded in Table S4. FAIR4RS-related evaluation criteria were extracted by going through publications on research software best practices, and additional criteria were added based on internal discussions (XD, MAD, HY, DJL). Each selected evaluation criterion was then assigned to one or more FAIR4RS categories through discussions between two bioinformaticians (XD and MAD) and one reproducibility expert (HY). The detailed evaluation guideline for each criterion was created and refined through discussion between the two reviewers (XD and FD) to ensure the description was not ambiguous. We screened the latest version of each tool at the time of evaluation; evaluations spanned October 2021 to August 2022. During the evaluation process, each criterion was assigned a value of “TRUE”, “FALSE”, or “NA” based on publicly available documentation and code repositories.
“TRUE” meant the tool met the criterion; “FALSE” meant the tool did not meet the criterion; “NA” meant the criterion did not apply to the tool. For example, if a tool did not have a command-line option, the criterion “include help command” would be not applicable (NA).

2.4. Synthesis of results

The percentage of criteria fulfillment for each tool was calculated by dividing the number of “TRUE”s by the number of criteria applicable to the tool (i.e., ignoring any NA values). We also calculated the percentage of software meeting each criterion, obtained by dividing the number of “TRUE”s by the number of software tools to which the criterion applied. For example, if a tool had 45 applicable criteria out of the 47 total and 35 were labeled “TRUE”, its percentage of criteria fulfillment would be 35/45 × 100% = 77.8%. These metrics identify FAIR4RS-related criteria that merit further attention from the authors of each tool as well as from future developers.
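The two percentages described above can be computed with a short sketch (the helper names are hypothetical; the labels follow the TRUE/FALSE/NA scheme from Sect. 2.3):

```python
def fulfillment_pct(labels):
    """Percentage of applicable criteria one tool met: TRUE / (TRUE + FALSE) * 100.
    'NA' labels are excluded from the denominator."""
    applicable = [v for v in labels if v != "NA"]
    return 100.0 * applicable.count("TRUE") / len(applicable) if applicable else float("nan")

def criterion_pct(label_per_tool):
    """Percentage of tools meeting one criterion, given that criterion's
    label for each tool; 'NA' tools are excluded from the denominator."""
    applicable = [v for v in label_per_tool if v != "NA"]
    return 100.0 * applicable.count("TRUE") / len(applicable) if applicable else float("nan")

# The worked example from the text: 47 criteria, 2 NA, 35 TRUE.
labels = ["TRUE"] * 35 + ["FALSE"] * 10 + ["NA"] * 2
assert round(fulfillment_pct(labels), 1) == 77.8
```

The same denominator rule (drop NA, then divide) is applied in both directions: per tool across criteria, and per criterion across tools.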

3. Results

3.1. Paper screening

As illustrated in Fig. 1, the initial search generated 1396 records, reduced to 954 articles after removing duplicates. Title-abstract screening reduced the number to 87 articles. During full-text screening, 17 articles were excluded: 4 did not include a computational tool; 8 mentioned computational software that was not about metabolomics; and 5 included metabolomics software that was not for LC-HRMS data processing. The paper screening record exported from Covidence is included in the supplementary material Table S4.

3.2. Tool screening

We extracted 122 potentially related software tools from the 70 eligible papers, then added two relevant tools (EI-MAVEN, MS-FLO) that we knew of but that were not mentioned in the 70 papers. That is, the two tools were not mentioned in the notes produced by reviewers (Table S5 -> sheet “full_text_included” -> column “Notes”) during the full-text review. The number of software tools was reduced to 61 after reviewing their documentation in detail. In terms of function annotation, senior authors of 49 of the 61 included software tools responded to our inquiry emails. General information and detailed evaluation results of the 61 selected software tools (Adusumilli & Mallick, 2017; Agrawal et al., 2019; Alonso et al., 2011; Broeckling et al., 2014; Brunius et al., 2016; Bueschl et al., 2017; Cai et al., 2015; Capellades et al., 2016; Chokkathukalam et al., 2013; Chong & Xia, 2018; Clasquin et al., 2012; Creek et al., 2012; Davidson et al., 2016; De Livera et al., 2018; DeFelice et al., 2017; Del Carratore et al., 2019; Dührkop et al., 2019; Fischer et al., 2022; Franceschi et al., 2014; Gatto et al., 2021; Giacomoni et al., 2015; Guo et al., 2021; Helmus et al., 2021; Huan & Li, 2015a, b; Huang et al., 2014; F. Huber, Ridder, et al., 2021; Hughes et al., 2014; Jaitly et al., 2009; Ji et al., 2017, 2019; Kantz et al., 2019; Kuhl et al., 2012; Kutuzova et al., 2020; Li et al., 2017; Libiseller et al., 2015; Liggi et al., 2018; Lommen, 2009; Loos, 2016; Mahieu et al., 2016; Müller et al., 2020; Olivon et al., 2018; Palarea-Albaladejo et al., 2018; Pang et al., 2021; Pluskal et al., 2010; Protsyuk et al., 2018; Ridder et al., 2012; Ross et al., 2020; Röst et al., 2016; Ruttkies et al., 2016; Shen et al., 2019; Smith et al., 2006; Tautenhahn et al., 2012; Teo et al., 2020; Tsugawa et al., 2015, 2016; Uppal et al., 2013, 2017; Weber & Viant, 2010; Yu et al., 2009; Zhou et al., 2014) are included in the supplementary material Table S5.
Venn diagrams of technical properties are shown in the supplementary material Figure S1. Of the 63 excluded software tools, 8 were excluded because they were no longer available, 24 because of their association with commercial companies, 28 because their functions were not relevant, and 3 because they were not independent (i.e., they were extensions of other selected tools). Here, “independent” means the tool has its own documentation or webpage. For example, we considered MetaboAnalystR a different tool from MetaboAnalyst since MetaboAnalystR has its own GitHub page and webpage. However, we considered CSI:FingerID part of SIRIUS since we did not find CSI:FingerID to have its own documentation at the time of our evaluation. Similarly, MSnBase and CAMERA are included in the xcms software tool, but they were considered “independent” in our study since they had their own webpages and documentation. All excluded software and the detailed reasons for exclusion are shown in the supplementary material Table S6.

3.3. FAIR4RS evaluation

FAIR4RS evaluation results are illustrated in Fig. 2 and Fig. 3. As shown in Table 1, we summarized 47 FAIR4RS-related criteria in total, of which 41 came from the literature on research software best practices and 6 were based on internal discussions (XD, MAD, HY, DJL). At the time of our evaluation, the minimum, first-quartile, median, third-quartile, and maximum percentages of criteria fulfillment were approximately 21.6%, 39.5%, 47.7%, 57.9%, and 71.8%. Two software tools (OpenMS and patRoon) fulfilled no less than 70% of applicable criteria. Ten tools met 60–70% of applicable criteria: MSnbase, EI-MAVEN, smartPeak, Spec2Vec, SIRIUS, ProteoWizard-msConvert, W4M, xcms, MetaboAnalystR, and MAGMa. Additionally, we found that seven tools (OpenMS, patRoon, EI-MAVEN, W4M, MetaboAnalystR, MS-DIAL, MetaboAnalyst) could perform the greatest number of steps (9 out of 13) involved in LC-HRMS metabolomics data processing. Three tools were powered by artificial intelligence (AI) techniques (Spec2Vec, DeepMASS, and No_NAME); they fulfilled 62.2%, 47.4%, and 26.3% of our evaluation criteria, respectively. We also found that most of the included software was open source, the exceptions being MetaboAnalyst, MetAlign, and XCMS-Online, and that none of the closed-source tools ranked within the first quartile of all included software.

Fig. 2.

Summary of basic results. Functions associated with each tool are displayed on the left side. Percentages of evaluation criteria that each software tool fulfilled are shown in the middle of the figure, and the right panel indicates each tool’s overall percentage of criteria fulfillment. Names of software powered by AI techniques are highlighted in red, and closed-source software are highlighted in black.

Fig. 3.

Line chart reflecting criteria fulfillment by category. The x-axis lists tool names, ranked by overall percentage of fulfillment from left to right. The left y-axis shows the percentage of fulfilled criteria. Overall fulfillment is represented by the black solid line; fulfillment of the findability, accessibility, interoperability, and reusability categories is represented by blue, yellow, green, and red dashed lines, respectively. Each tool’s release time is represented by the grey line, plotted against the secondary y-axis.

Table 1.

Information about evaluation criteria

Criteria FAIR4RS category (Chue Hong et al., 2022; Hasselbring et al., 2020; Lamprecht et al., 2020) Evaluation guideline

Register to Zenodo and get DOI (Georgeson et al., 2019) F1.2, A1.1, A2 Whether the software has been deposited to Zenodo (https://zenodo.org/) and received a DOI. For example, some software may have a Zenodo badge on the GitHub repo. A Bioconductor DOI does not count, since Bioconductor assigns a DOI only to the current version. Using only GitHub does not count, since a GitHub URL, which relies on alterable domain name resolution, does not guarantee the persistence of identifiers (Berrios et al., n.d.).
Have a license (Georgeson et al., 2019; Jiménez et al., 2017) R1.1 If the license appears in any documentation of the software, then this would be TRUE.
Provide a command-line option (Georgeson et al., 2019) I1, R3 Does any documentation teach users how to use the tool with a command-line (shell programming language, which can be used in a console)?
Use conventional input and output (Georgeson et al., 2019) I1, I2, R3 Is it true that the software did not create a new data format while forcing users to adopt that new format as the input/output?
Output log independently (Georgeson et al., 2019) R Does the software output progress or error logging to an independent place (e.g., a file for non-web-based software, or a website for web-based software)?
Document exit status (Georgeson et al., 2019) I Does the documentation describe the meanings of different exit codes produced after running the software? (e.g., the README file on this page https://github.com/bionitio-team/bionitio-python/search?q=exit+status)
Have continuous integration (Georgeson et al., 2019; Hunter-Zinck et al., 2021) I, R Does any documentation or publication of the software mention continuous integration? Has the software applied a tool like GitHub Actions, Travis CI, Circle CI, etc.? If the answer to either question is yes, this would be TRUE; otherwise FALSE. If the software is not open source, then this would be NA.
Have software testing (Georgeson et al., 2019; Hunter-Zinck et al., 2021) R3 Does any documentation or any part of the code (or code comment) mention software testing (e.g., unit test, integration test, etc.)? If the software is not open source, then this would be NA.
Provide official software containerization or virtual machine (Georgeson et al., 2019) I, A1, R Has the software been packaged with a virtual machine or software container such as Docker or Singularity? Tips: software with containerization might have a file named “Dockerfile” or provide a link to Docker Hub. If the software is not open source, then this would be NA.
Have standard packaging (Georgeson et al., 2019; Heil et al., 2021) F4, A1, I, R If the software is an API, has it been packaged? If it has, users should be able to install it with one command within the programming language, such as “pip install” or “install.packages”. If the software is not an API, put NA.
Have version control for code (Heil et al., 2021; Jiménez et al., 2017; Ram, 2013) A1 Does the tool’s code use software like GitHub, GitLab, or BitBucket to do version control? The repo should be created by the author, instead of being created by others or a system bot (e.g., CRAN bot). If the software is not open source, then this would be NA.
Have automated code quality checks (Georgeson et al., 2019) R3 Does the software use a tool like “Codacy” or “quality gate” to report code quality? If the software is not open source, then this would be NA.
Have code coverage assessment (Aghamohammadi et al., 2021; Hunter-Zinck et al., 2021) R3 Does the software use a tool like SonarCloud (https://sonarcloud.io/) or Codecov (https://about.codecov.io/) to report the percentage of code coverage? If the software is not open source, then this would be NA.
Provide example input data (Leprevost et al., 2014) R1, R3 Are any example data provided? The example data may be in the documentation or the code repo.
Provide example results (Zhao et al., 2012) R1, R3 Has any output result (produced by running the software with the provided example data) been displayed in the documentation or the code repo? If the software did not provide any example data, this would be “NA”
Provide issue tracking (Leprevost et al., 2014) R1.2 Is there any issue tracking tool used? Such as GitHub Issue (software that has a GitHub/GitLab/BitBucket should fulfill this criterion as long as the repo was created by the author). There is also some other issue-tracking software (see if the documentation/publication mentions any other issue-tracking software if GitHub/GitLab/Bitbucket was not used).
Have user documentation (Georgeson et al., 2019) R1 Is there documentation for users?
Have developer documentation (Georgeson et al., 2019) R1 Is there documentation for developers, which has information about how to contribute to the tool?
Have an installation guide in documentation (Karimzadeh & Hoffman, 2018) R1 Does the documentation teach you how to install the software?
Include dependencies and version numbers in documentation (Karimzadeh & Hoffman, 2018) I2, R2, F4 Does the documentation mention dependencies (e.g., operating system, programming language, packages, etc.) and some of their versions?
Provide a quick start in documentation (Lee, 2018) R1 Does the documentation teach you how to run the software with provided example data? If example data were not provided, put NA.
Provide a configuration guide in documentation (Karimzadeh & Hoffman, 2018) I2, R Each software tool may have many parameters that users need to set up. Does the documentation explain the meaning of the parameters (including the software’s input and output)? Software that only provides a GUI version is not applicable (i.e., put NA).
Have FAQ in documentation (Karimzadeh & Hoffman, 2018) R1 Does the documentation have an FAQ?
Have a searchable forum (Karimzadeh & Hoffman, 2018) R Does the documentation mention any forum for the software?
Have a mailing list in documentation (Karimzadeh & Hoffman, 2018) R Does the documentation mention emails that you can contact?
Have a change log in documentation (Lee, 2018) R1.2 Does the documentation include a changelog (i.e., a document describing changes and when they were made by contributors, such as “NEWS” or “releases”)? GitHub commits do not count, since they might not be accurate: a person can upload code written by others and record multiple important changes in a single commit.
Have a citation guide in documentation (Lee, 2018) R1, F3 Does the documentation mention how to cite the software or publications associated with the software?
Have historical contribution record in documentation (Lee, 2018) R1.2 Does the documentation have a part that lists all contributors? Contributors shown on the right side of the GitHub page do not count, since this contributor section is created by GitHub automatically, but people may upload code or documents written by others to GitHub.
Have fully documented functions in code (Lee, 2018) I, R Randomly check the code written by the authors and see whether functions include comments describing the function and its input and return values. If the software is not open source, then this would be NA.
Include help command (Lee, 2018; Seemann, 2013) R1 Does the documentation mention a command line that can provide information about how to use the software? If the tool does not provide a command-line argument, then put NA.
Have a comment including the creator’s name in the code (Karimzadeh & Hoffman, 2018) R1.2 Randomly check the code written by the authors and see if it mentioned the creator’s name. This would be “NA” if the software is not open source.
Have a comment including the creation date in the code (Karimzadeh & Hoffman, 2018) R1.2 Randomly check the code written by the authors and see if it mentioned the date of creation. This would be “NA” if the software is not open source.
Have a comment including the creator’s email in the code (Karimzadeh & Hoffman, 2018) R1.2 Randomly check the code written by the authors and see if it mentioned the creator’s email address. This would be “NA” if the software is not open source.
Have version control for documentation (Lee, 2018) R1 Does any of the documentation associated with the software use a version control system such as GitHub?
Have web-based documentation (Karimzadeh & Hoffman, 2018) F4 Is any of the documentation web-based (i.e., not in a compressed file that must be downloaded and decompressed, and not visible only after installing the software via an executable file), which makes it easier to find with a search engine?
Have information on potential errors and warnings as well as ways to resolve them in documentation (Karimzadeh & Hoffman, 2018) R1 Does the documentation mention any commonly appeared errors and ways to resolve them? This is usually in the FAQ section but may also be in other parts of the document.
Explain file format in documentation if a new format is created (Karimzadeh & Hoffman, 2018) I2, R1 If any new file format was created by the authors, does the documentation have explanations of the new format? If no new file was created, put “NA”.
Provide graphical instruction if the software has a GUI (Karimzadeh & Hoffman, 2018) R1 If the software has a GUI (web-based software counts as GUI software), does the documentation include a screenshot (visualization) to explain the software? Put “NA” if the software does not have a GUI. Notably, we considered all web-based tools to have a GUI.
Have information about supported OS in documentation (Karimzadeh & Hoffman, 2018) R2 Can you get the information about which operating system is supported from the documentation? This is only applicable to software with a standalone mode.
Include any semantic annotation with controlled vocabulary underlying ontologies in the documentation (Palmblad et al., 2019) F2, I, R1, R3 Does the documentation mention any part of the software is described with controlled vocabulary associated with an ontology?
Have a name F2 Does the software have a name?
Have a publication (Romano & Moore, 2020) F2, F4, R1 Does the software have a scientific publication?
Have the capability of consuming a config file containing the parameter setting information I2 Can the software consume a config file including parameter setting information? This does not apply to software that only has plug-in mode.
Provide interpretation of the independent log file in documentation R1 Does the documentation explain the contents of the output log file?
Have an output log file containing parameter setting information R1 Does the output log file include parameter setting information? If the software does not output an independent log file (or log information that is shown on a webpage from a web-based tool), put “NA”.
Have an output log file containing the version of the software R1 Does the output log file include the version information of the software being used? If the software does not output an independent log file (or log information that can be downloaded from a web-based tool), put “NA”.
Have an output log file containing loaded modules/dependencies during the execution R2 Does the output log file include the loaded modules/dependencies of the execution? If the software does not output an independent log file (or log information that can be downloaded from a web-based tool), put “NA”.

The table includes the name (left), FAIR categorization (middle), and description (right) of each criterion.
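Several criteria above concern config files and output logs that record parameter settings, the software version, and the dependencies loaded during execution. The sketch below illustrates such provenance logging; the tool name, version, and parameter names are hypothetical, not taken from any evaluated software.

```python
import json
import sys
from datetime import datetime, timezone

# Hypothetical tool metadata; names and values are illustrative only.
TOOL_NAME = "example_lcms_tool"
TOOL_VERSION = "1.2.0"

def load_config(path):
    """Read parameter settings from a JSON config file."""
    with open(path) as fh:
        return json.load(fh)

def write_run_log(path, params):
    """Write a log recording parameters, tool version, and loaded modules."""
    record = {
        "tool": TOOL_NAME,
        "version": TOOL_VERSION,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "parameters": params,
        # Top-level modules loaded during execution (dependency provenance).
        "loaded_modules": sorted({m.split(".")[0] for m in sys.modules}),
    }
    with open(path, "w") as fh:
        json.dump(record, fh, indent=2)
    return record
```

A log written this way satisfies, for this hypothetical tool, the criteria on parameter settings, software version, and loaded dependencies at once.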

Figure 4 shows the percentage of criteria fulfilled in each category. Notably, some criteria relate to multiple FAIR4RS categories. An overview of the evaluation results for each category follows:

Fig. 4.

Percentage of software that fulfilled each evaluation criterion. X-axes are values for percentage of fulfillment, and y-axes list all evaluation criteria. There are 4 categories and 47 criteria in total. (A) Blue bars represent findability related criteria; (B) yellow bars represent criteria related to accessibility; (C) green bars represent criteria related to interoperability; and (D) red bars stand for criteria regarding reusability.

  1. Findability: We identified eight criteria related to findability. Among the eight criteria, “Have web-based documentation” had the highest fulfillment of 100%. “Register to Zenodo and get DOI” had a low fulfillment of ~ 6%, and “Include any semantic annotation with controlled vocabulary underlying ontologies in the documentation” had the lowest fulfillment of 0%.

  2. Accessibility: We identified four criteria corresponding to accessibility. Among the four criteria, “Code has version control” had the greatest fulfillment of over 80%. However, “Provide official software containerization or virtual machine” had ~ 14% fulfillment, and “Register to Zenodo and get DOI” had the lowest fulfillment of ~ 6%.

  3. Interoperability: We identified 12 criteria related to interoperability. Among the 12 criteria, “Use conventional input and output” and “Explain file format in documentation if a new format is created” had the highest fulfillment of 100%. However, “Have fully documented functions in code” had ~ 17% fulfillment, “Provide official software containerization or virtual machine” had ~ 14% fulfillment, “Document exit status” had ~ 2% fulfillment, and “Include any semantic annotation with controlled vocabulary underlying ontologies in the documentation” had 0% fulfillment.

  4. Reusability: We identified 41 criteria related to reusability. Among the 41 criteria, three criteria (“Use conventional input and output”, “Have user documentation”, and “Explain file format in documentation if a new format is created”) had 100% fulfillment. However, we found 14 criteria with a fulfillment of < 20%: “Have FAQ in documentation”, “Have developer documentation”, “Have information on potential errors and warnings as well as ways to resolve them in documentation”, “Have fully documented functions in code”, “Have an output log file containing loaded modules/dependencies during the execution”, “Have code coverage assessment”, “Provide official software containerization or virtual machine”, “Have a searchable forum”, “Have comment of creator’s email in code”, “Have historical contribution record in documentation”, “Have comment of creation date in code”, “Have automated code quality checks”, “Document exit status”, and “Include any semantic annotation with controlled vocabulary underlying ontologies in the documentation”.
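The per-criterion %fulfillment figures above can be computed by counting fulfilled software among those for which the criterion applies (skipping “NA” entries). A minimal sketch with made-up evaluation records, not the study’s data:

```python
# Hypothetical evaluation records; the tools and values are illustrative only.
# Each software maps a criterion to True (fulfilled), False (not fulfilled),
# or None ("NA": the criterion does not apply to that software).
evaluations = {
    "tool_a": {"Have a name": True,
               "Register to Zenodo and get DOI": False,
               "Provide graphical instruction if the software has a GUI": None},
    "tool_b": {"Have a name": True,
               "Register to Zenodo and get DOI": True,
               "Provide graphical instruction if the software has a GUI": True},
}

def pct_fulfillment(evals):
    """Percentage of software fulfilling each criterion, skipping NA entries."""
    criteria = {c for record in evals.values() for c in record}
    result = {}
    for c in criteria:
        applicable = [r[c] for r in evals.values() if r.get(c) is not None]
        result[c] = 100.0 * sum(applicable) / len(applicable) if applicable else None
    return result
```

Skipping the NA entries, rather than counting them as failures, matches the “fulfilled 100% applicable criteria” reading used in the conclusion.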

3.4. FAIRness over time

Table S5 lists the release time of every included tool. Only one included tool was released in each of 2006 (xcms), 2008 (MZmine), and 2021 (DaDIA), and no tool was released in 2007. Multiple included software tools were released in each year between 2009 and 2020. Figure 5 is a scatter plot representing the change of FAIRness over time. Pearson’s correlation (Schober et al., 2018) indicated: (1) there was no significant (p < 0.05) FAIRness improvement over time; (2) findability showed a decreasing trend; (3) all four categories were positively and significantly correlated with each other, with interoperability and reusability having the highest coefficient. To investigate what factors might contribute to the decrease in findability, we performed a multiple linear regression using criteria related to findability and found four factors significantly and positively correlated with findability: (1) have a name; (2) have a citation guide in documentation; (3) register to Zenodo and get DOI; (4) have dependencies and version numbers in documentation.
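The Pearson correlation between release year and average %fulfillment can be sketched in pure Python. The year and fulfillment values below are illustrative placeholders, not the study’s data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative placeholder values: mean %fulfillment per release year.
years = [2009, 2012, 2015, 2018, 2020]
mean_fulfillment = [48.0, 50.5, 46.0, 49.0, 47.5]
r = pearson_r(years, mean_fulfillment)  # an r near 0 implies no clear trend
```

In practice a library routine such as `scipy.stats.pearsonr` would also report the p-value used for the significance statements above.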

Fig. 5.

FAIRness trend over time. The x-axis shows the selected software’s first release years; the y-axis shows the average %fulfillment of software released in each year. (A) represents the relationship between findability and tool release times; (B) represents the relationship between accessibility and tool release times; (C) represents the relationship between interoperability and tool release times; (D) represents the relationship between reusability and tool release times; (E) represents overall FAIR4RS criteria fulfillment versus tool release times. Results of Pearson’s correlation are included in each sub-figure; (F) shows correlations among categories.

4. Discussion

The application of FAIR Principles in the metabolomics field has largely focused on data sharing and management (Mendez et al., 2019; Rocca-Serra & Sansone, 2019; Savoi et al., 2021). More recently, the FAIR Data Principles have been translated to research software (Hasselbring et al., 2020; Katz, Gruenpeter, et al., 2021; Lamprecht et al., 2020), and the first version of the FAIR4RS Principles was released in May 2022 (Chue Hong et al., 2022). In this study, we used FAIR4RS-related criteria to evaluate 61 selected data processing software, including 58 open-source and 3 closed-source tools. Most (41 out of 47) included criteria were related to reusability, which is consistent with previous findings (Wolf et al., 2021), and we also found the four categories were positively related to each other. Notably, open-source code is not required by the FAIR Principles (Katz, Gruenpeter, et al., 2021); thus, it was not one of our evaluation criteria. To date, no study has investigated how to implement the FAIR4RS Principles in the metabolomics field. Therefore, our study extends previous work by implementing the FAIR4RS Principles to assess LC-HRMS metabolomics data processing software.

4.1. FAIRness improvement strategies

The primary finding from our evaluation was that semantic annotation of key information had 0% fulfillment among all evaluated software; notably, this criterion is related to findability, interoperability, and reusability. Key information includes functions, input and output data types, and formats. Describing key information using a controlled vocabulary underlying an ontology makes it readable and discoverable by both humans and machines (Lamprecht et al., 2020). Although we noticed that many software tools, such as XCMS, have been registered to bio.tools (Ison et al., 2019), which includes semantic annotation of software, none of the official documentation mentioned this feature at the time of our evaluation. Notably, semantic annotation not only improves the findability of a tool but also enables machines to find other similar software based on the annotation (Lamprecht et al., 2021). Semantic annotation of data processing software can be used to find relevant software for developing workflows automatically (Lamprecht et al., 2021); such systems have been used for analyzing proteomics data (Kasalica et al., 2021; Palmblad et al., 2019) and DNA sequence data (Zheng et al., 2015). Therefore, adding semantic annotation information to the official documentation may dramatically improve the FAIRness of LC-HRMS metabolomics data processing software.
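As an illustration of what machine-readable semantic annotation might look like, the sketch below builds a bio.tools-style record. The tool name is hypothetical, and the EDAM URIs are placeholders (marked “XXXX”); real annotations should use resolved identifiers from the EDAM ontology as curated on bio.tools.

```python
import json

# Hypothetical annotation record; the tool name and all "XXXX" URIs are
# placeholders, not real EDAM identifiers.
annotation = {
    "name": "example_lcms_tool",
    "topic": [{"term": "Metabolomics",
               "uri": "http://edamontology.org/topic_XXXX"}],
    "function": [{
        "operation": [{"term": "Peak detection",
                       "uri": "http://edamontology.org/operation_XXXX"}],
        "input": [{"format": {"term": "mzML",
                              "uri": "http://edamontology.org/format_XXXX"}}],
        "output": [{"format": {"term": "Tabular",
                               "uri": "http://edamontology.org/format_XXXX"}}],
    }],
}

def collect_terms(node, found=None):
    """Gather all controlled-vocabulary term labels from an annotation tree."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if "term" in node:
            found.append(node["term"])
        for value in node.values():
            collect_terms(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_terms(item, found)
    return found
```

Because the record is plain JSON with ontology-backed terms, both a human reader and a workflow-composition system can query it, e.g., to find all tools whose operation is peak detection.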

We also found three other criteria that were related to multiple FAIR categories but with which only a small percentage of software complied. First, only about 6% of software was registered to Zenodo and received a digital object identifier (DOI). As explained in Table 1, unlike other identifiers such as a GitHub page URL or the DOI assigned by Bioconductor, Zenodo assigns each version of the software a persistent and distinct DOI (Berrios et al., n.d.; van de Sandt et al., 2019), facilitating findability and accessibility. Notably, we also found that the criterion of registering software to Zenodo contributed to the decrease in findability over time. The FAIR4RS Working Group created a community on Zenodo to ensure the FAIRness of research outputs (Chue Hong et al., 2022). Notably, GitHub now integrates Zenodo as a third-party tool to help users with referencing and citing content (Referencing and Citing Content, n.d.). The low percentage of fulfillment (6%) is also expected, since most of the included software are R packages (39 out of 61, as illustrated in Figure S1). Unlike with other popular repositories for R packages such as GitHub, R-Forge, CRAN, and Bioconductor (Decan et al., 2015), users cannot install an R package deposited to Zenodo in the console using “install.packages”. Therefore, to improve software FAIRness, we recommend future developers deposit packages to both Zenodo and an R repository. We also recommend that Zenodo developers provide a wrapper or guide for installing R packages using a Zenodo URL. Additionally, we found only about 15% of software provided an official containerized version, which is relevant to accessibility, interoperability, and reusability. Providing a containerized version enables users to execute the software on different machines smoothly, without worrying about the installation of dependencies (Georgeson et al., 2019; Senington et al., 2018).
For R-based software, containerization that includes only a single package may be of limited use, since users often want multiple packages for an analysis. The RforMassSpectrometry Initiative (Rainer et al., 2022; RforMassSpectrometry, n.d.) and the metaRbolomics Toolbox in Bioconductor (Stanstrup et al., 2019) are currently two popular ecosystems for R-based metabolomics software. Therefore, a community-wide containerized ecosystem including all commonly used and mature R-based software for the metabolomics community would be very helpful. Another poorly fulfilled criterion is “fully documented functions in code”, which is related to interoperability and reusability. Documenting the input, output, and errors that a function may raise makes it easier for users to inspect the software and learn how it can interact with other software (Lee, 2018). Therefore, in addition to semantic annotation, our results demonstrate that improving the FAIRness of LC-HRMS metabolomics data processing software also requires: (1) registering to Zenodo and getting version-controlled DOIs, (2) providing official software containerization or a virtual machine, and (3) providing fully documented functions in code.
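As a sketch of the “fully documented functions in code” criterion, the hypothetical function below documents its inputs, its output, and the errors it may raise in its docstring. The function name and behavior are illustrative only, not taken from any evaluated tool.

```python
def align_retention_times(times, reference, tolerance=0.5):
    """Match feature retention times against a reference list.

    A hypothetical example of a fully documented function: the docstring
    states the inputs, the output, and the errors the function may raise.

    Args:
        times: retention times (minutes) of detected features.
        reference: reference retention times (minutes); must be non-empty.
        tolerance: maximum allowed difference (minutes) for a match.

    Returns:
        A list the same length as `times`: for each time, the index of the
        closest reference entry within `tolerance`, or None if no match.

    Raises:
        ValueError: if `reference` is empty or `tolerance` is negative.
    """
    if not reference:
        raise ValueError("reference list must not be empty")
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    matches = []
    for t in times:
        idx = min(range(len(reference)), key=lambda i: abs(reference[i] - t))
        matches.append(idx if abs(reference[idx] - t) <= tolerance else None)
    return matches
```

Documentation at this level lets a user, or another program, determine how to call the function and what failure modes to handle without reading its body.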

Advances in academic publishing of software include checklists to ensure the quality of submissions, such as at the Journal of Open Source Software (Review Checklist — JOSS Documentation, n.d.). Our results revealed that 59 out of 61 software were associated with a peer-reviewed publication. Checklists have been used to promote transparent reporting of metabolomics studies (Considine & Salek, 2019; Fiehn et al., 2007; Goodacre et al., 2007; Snyder et al., 2014; Sumner et al., 2007). Our recent work proposed a checklist to promote reproducible computational analysis in clinical metabolomics research (Du et al., 2022). The evaluation guideline in this study can also be used as a checklist by journals to promote the FAIR4RS Principles. A recent study showed that a direct request from the editor for research resource identifiers (RRIDs) is more effective at getting authors to provide RRIDs than merely stating the requirement in the journal’s author instructions (Menke et al., 2020). Similarly, we expect a direct request from the journal editor regarding some criteria in our evaluation guideline to be a more effective way to improve FAIRness than just providing a checklist in the author guidelines. Additionally, some journals, such as Analytica Chimica Acta, now require an assessment of the tool by an independent advanced user who is not a software developer or bioinformatician (Analytica Chimica Acta | Journal | ScienceDirect.Com by Elsevier, n.d.). In summary, our evaluation guideline can be used during peer review as a checklist to improve the FAIRness of software associated with a scientific publication.

4.2. Strengths, limitations, and future works

Our study has several strengths. We followed PRISMA guidelines to extract literature from the past 5 years (2016 to 2021) that might include a metabolomics data processing tool. Our evaluation checklist synthesized FAIR4RS-related criteria from existing literature on best software practices, including reproducible software development practices (Heil et al., 2021; Jiménez et al., 2017; Ram, 2013), best practices for scientific software (Hunter-Zinck et al., 2021; Leprevost et al., 2014; Zhao et al., 2012), best software documentation practices (Karimzadeh & Hoffman, 2018; Lee, 2018), best practices for command-line software (Georgeson et al., 2019; Seemann, 2013), and best software testing practices (Aghamohammadi et al., 2021). We also filtered out criteria that are not required by the FAIR4RS Principles, such as being open source (Katz, Gruenpeter, et al., 2021). Therefore, our checklist is more comprehensive than existing checklists and more focused on the FAIR4RS Principles. Additionally, our checklist is more specific than the general FAIR4RS Principles and comes with a detailed evaluation guideline. We used multiple reviewers to mitigate subjective bias. We extracted related software from selected papers and read their publicly available documentation, publications, and code repositories for a comprehensive FAIRness evaluation. The FAIR4RS criteria were synthesized from the literature, additional criteria were added via discussions among authors (XD, MAD, HY, DJL), and the criteria can also be used to guide future FAIR LC-HRMS metabolomics software development. Each criterion was assigned to one or more FAIR4RS categories based on a discussion between two bioinformaticians (XD and MAD) and one reproducibility expert (HY). During the evaluation, two reviewers (XD and FD) went through the official documentation, publications, and code repositories of the 61 included data processing software based on an agreed evaluation guideline.
More importantly, our evaluation method yields a meaningful result: findability decreases over time. This is expected, since developers usually publish first and build up better documentation afterward, when resources are available, meaning it takes time after release for findability to increase. Although we only evaluated non-commercial LC-HRMS metabolomics data processing software, the criteria may also be suitable for evaluating all types of software. To date, this is the first study to provide an implementation solution for using the FAIR4RS Principles in the metabolomics field.

However, the results of this study should be considered along with some limitations of our experimental design. First, our evaluation was based only on publicly available materials, including documentation and code. Second, we only checked the software’s official documentation but did not check the software’s machine-readable metadata (e.g., XCMS has machine-readable metadata in the Software Ontology (Malone et al., 2014)), which might be added by experts other than the software’s authors. Additionally, we considered all evaluation criteria equally important, without weighting them. Furthermore, we included only software mentioned in metabolomics papers or metabolomics software review papers published in 2016–2021; thus, the list of tools may not be complete, and some relevant software published very recently might not be included, such as MS2DeepScore (F. Huber, van der Burg, et al., 2021) and MS2Query (Jonge et al., 2022). The controlled vocabulary and categorization of data processing functions shown in Table S2 were produced by reviewing the literature (Clasquin et al., 2012; De Vos et al., 2007; FillPeaks-Methods, n.d.; Libiseller et al., 2015; Liu et al., 2020; Mayer et al., 2013; Smith et al., 2006; Vitale et al., 2022; Zhou et al., 2012) and discussing internally, and we acknowledge that such a categorization is never easy. Our function categorization may therefore not be perfect or recognized by the entire metabolomics community, and software authors may have a slightly different understanding of our categorization and descriptions when responding to our inquiry. For instance, one software author emailed us saying that functions like mass precision enhancement (Huber et al., 2022) should be part of the data processing steps. Notably, our study focuses only on evaluating the FAIRness of software using relevant qualitative properties, but FAIRness is just one of many considerations when selecting software for a research workflow.
For example, some software authors responded that the speed of data processing, the capability to process large-scale data, and the efficiency of hardware usage are very important aspects (Gatto et al., 2021). We did not use real-world data to generate results with the included software and make a comparison. Our main focus is the software itself, and we did not consider factors (e.g., the type of mass spectrometer) that may affect the development of the software. Additionally, it is essential to assess the quantitative performance of software using annotated LC-HRMS datasets and libraries (Hao et al., 2018). The FAIRness of the LC-HRMS dataset, spectral library, and software for metabolite annotation are important for the FAIRness of the entire workflow. Therefore, a future review of studies and available datasets for such benchmarking would be a very useful resource for FAIR LC-HRMS metabolomics workflows.

5. Conclusion

We presented a comprehensive FAIRness evaluation of existing non-commercial LC-HRMS metabolomics data processing software. We evaluated 61 qualified software tools with 47 criteria related to FAIR4RS. Our results indicated that no software had perfect FAIR4RS compliance (i.e., fulfilled 100% of applicable criteria). The maximum criteria fulfillment was 71.8%, meaning all evaluated software had considerable room for improvement. We also identified poorly fulfilled criteria in each category, along with detailed strategies for improvement. We believe our study can serve as a guideline for creating FAIR LC-HRMS metabolomics data processing software.

Supplementary Material

Table S6
Table S5
Main
Table S4

Table 2.

Multiple linear regression for criteria related to findability

Criterion (missing%) Coefficient P-value

Have a name (0%) 23.72 0.00
Have a citation guide in documentation (0%) 14.87 0.00
Register to Zenodo and get DOI (0%) 11.63 0.00
Have dependencies and version numbers in documentation (8.2%) 7.25 0.00
Have a publication (0%) 5.87 0.17
Have standard packaging (39.34%) 1.08 0.36
Have web-based documentation (0%) 0.00 0.03
Include any semantic annotation with controlled vocabulary underlying ontologies in the documentation (0%) −0.00 0.00

The table highlights multiple linear regression results for criteria related to findability. The first column includes criterion names, and the second and third columns indicate the coefficient and p-value of each criterion. Notably, the R2 of the regression model is 0.75 and the adjusted R2 is 0.72.

Acknowledgements

The authors thank Biswapriya Misra, Ph.D., for his constructive remarks and useful suggestions for the study. The authors also sincerely thank Bailey Ballard and Jianming (Jennifer) Wang for their help in the process of title-abstract screening. We would like to express special thanks to all software authors who took the time out of their busy schedules to respond to our emails and provide thoughtful feedback regarding the annotation of software functions.

Funding

Research reported in this publication was supported by the University of Florida Informatics Institute Fellowship Program. Research reported in this publication was also supported by the Southeast Center for Integrated Metabolomics at the University of Florida, the National Institute of Diabetes and Digestive and Kidney Diseases (K01DK115632), and the University of Florida Clinical and Translational Science Institute (UL1TR001427). The content is solely the responsibility of the authors and does not necessarily represent the official views of the University of Florida Informatics Institute, the Southeast Center for Integrated Metabolomics at the University of Florida, the University of Florida Clinical and Translational Science Institute, or the National Institutes of Health.

Footnotes

Declarations

Conflict of interest The authors declare that they have no competing interests.

Supplementary information The online version contains supplementary material available at https://doi.org/10.1007/s11306-023-01974-3.

References

  1. Adusumilli R, & Mallick P (2017). Data Conversion with ProteoWizard msConvert. Methods in Molecular Biology. (Clifton N J), 1550, 339–368. 10.1007/978-1-4939-6747-6_23. [DOI] [PubMed] [Google Scholar]
  2. Aghamohammadi A, Mirian-Hosseinabadi SH, & Jalali S (2021). Statement frequency coverage: a code coverage criterion for assessing test suite effectiveness. Information and Software Technology, 129, 106426. 10.1016/j.infsof.2020.106426. [DOI] [Google Scholar]
  3. Agrawal S, Kumar S, Sehgal R, George S, Gupta R, Poddar S, Jha A, & Pathak S (2019). El-MAVEN: A Fast, Robust, and User-Friendly Mass Spectrometry Data Processing Engine for Metabolomics. Methods in Molecular Biology (Clifton, N.J.), 1978, 301–321. 10.1007/978-1-4939-9236-2_19 [DOI] [PubMed] [Google Scholar]
  4. Alonso A, Julià A, Beltran A, Vinaixa M, Díaz M, Ibañez L, Correig X, & Marsal S (2011). AStream: an R package for annotating LC/MS metabolomic data. Bioinformatics, 27(9), 1339–1340. 10.1093/bioinformatics/btr138. [DOI] [PubMed] [Google Scholar]
  5. Analytica Chimica Acta | Journal | ScienceDirect.com by Elsevier. (n.d.). Retrieved September 16, from https://www.sciencedirect.com/journal/analytica-chimica-acta [Google Scholar]
  6. Barker M, Chue Hong NP, Katz DS, Lamprecht AL, Martinez-Ortiz C, Psomopoulos F, Harrow J, Castro LJ, Gruenpeter M, Martinez PA, & Honeyman T (2022). Introducing the FAIR principles for research software. Scientific Data, 9(1), 10.1038/s41597-022-01710-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Berrios DC, Beheshti A, & Costes SV (n.d.). FAIRness and Usability for Open-access Omics Data Systems. 10. [PMC free article] [PubMed] [Google Scholar]
  8. Broeckling CD, Afsar FA, Neumann S, Ben-Hur A, & Prenni JE (2014). RAMClust: a Novel feature clustering method enables spectral-matching-based annotation for Metabolomics Data. Analytical Chemistry, 86(14), 6812–6817. 10.1021/ac501530d. [DOI] [PubMed] [Google Scholar]
  9. Brunius C, Shi L, & Landberg R (2016). Large-scale untargeted LC-MS metabolomics data correction using between-batch feature alignment and cluster-based within-batch signal intensity drift correction. Metabolomics: Official Journal of the Metabolomic Society, 12(11), 173. 10.1007/s11306-016-1124-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Bueschl C, Kluger B, Neumann NKN, Doppler M, Maschietto V, Thallinger GG, Meng-Reiterer J, Krska R, & Schuhmacher R (2017). MetExtract II: a Software suite for stable isotope-assisted untargeted metabolomics. Analytical Chemistry, 89(17), 9518–9526. 10.1021/acs.analchem.7b02518. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Cai Y, Weng K, Guo Y, Peng J, & Zhu ZJ (2015). An integrated targeted metabolomic platform for high-throughput metabolite profiling and automated data processing. Metabolomics, 11(6), 1575–1586. 10.1007/s11306-015-0809-4. [DOI] [Google Scholar]
  12. Capellades J, Navarro M, Samino S, Garcia-Ramirez M, Hernandez C, Simo R, Vinaixa M, & Yanes O (2016). geoRge: a computational Tool to detect the Presence of stable isotope labeling in LC/MS-Based untargeted metabolomics. Analytical Chemistry, 88(1), 621–628. 10.1021/acs.analchem.5b03628. [DOI] [PubMed] [Google Scholar]
  13. Chokkathukalam A, Jankevics A, Creek DJ, Achcar F, Barrett MP, & Breitling R (2013). mzMatch–ISO: an R tool for the annotation and relative quantification of isotope-labelled mass spectrometry data. Bioinformatics, 29(2), 281–283. 10.1093/bioinformatics/bts674. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Chong J, & Xia J (2018). MetaboAnalystR: an R package for flexible and reproducible analysis of metabolomics data. Bioinformatics, 34(24), 4313–4314. 10.1093/bioinformatics/bty528. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Chue Hong NP, Katz DS, Barker M, Lamprecht AL, Martinez C, Psomopoulos FE, Harrow J, Castro LJ, Gruenpeter M, Martinez PA, Honeyman T, Struck A, Lee A, Loewe A, van Werkhoven B, Jones C, Garijo D, Plomp E, & Genova F (2022). … WG, R. F. FAIR Principles for Research Software (FAIR4RS Principles). 10.15497/RDA00068 [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Clasquin MF, Melamud E, & Rabinowitz JD (2012). LC-MS Data Processing with MAVEN: A Metabolomic Analysis and Visualization Engine. Current Protocols in Bioinformatics / Editoral Board, Andreas D. Baxevanis … et Al.], 0 14, Unit14.11. 10.1002/0471250953.bi1411s37 [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Considine EC, & Salek RM (2019). A Tool to encourage Minimum Reporting Guideline Uptake for Data Analysis in Metabolomics. Metabolites, 9(3), E43. 10.3390/metabo9030043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Covidence—Better systematic review management. (n.d.). Covidence. Retrieved April 6, from https://www.covidence.org/ [Google Scholar]
  19. Creek DJ, Jankevics A, Burgess KEV, Breitling R, & Barrett MP (2012). IDEOM: an Excel interface for analysis of LC-MS-based metabolomics data. Bioinformatics (Oxford England), 28(7), 1048–1049. 10.1093/bioinformatics/bts069. [DOI] [PubMed] [Google Scholar]
  20. Davidson RL, Weber RJM, Liu H, Sharma-Oates A, & Viant MR (2016). Galaxy-M: a Galaxy workflow for processing and analyzing direct infusion and liquid chromatography mass spectrometry-based metabolomics data. GigaScience, 5(1), 10. 10.1186/s13742-016-0115-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. De Livera AM, Olshansky G, Simpson JA, & Creek DJ (2018). NormalizeMets: assessing, selecting and implementing statistical methods for normalizing metabolomics data. Metabolomics, 14(5), 54. 10.1007/s11306-018-1347-7. [DOI] [PubMed] [Google Scholar]
  22. De Vos RC, Moco S, Lommen A, Keurentjes JJ, Bino RJ, & Hall RD (2007). Untargeted large-scale plant metabolomics using liquid chromatography coupled to mass spectrometry. Nature Protocols, 2(4), 10.1038/nprot.2007.95. [DOI] [PubMed] [Google Scholar]
  23. Decan A, Mens T, Claes M, & Grosjean P (2015). On the Development and Distribution of R Packages: An Empirical Analysis of the R Ecosystem. Proceedings of the 2015 European Conference on Software Architecture Workshops, 1–6. 10.1145/2797433.2797476 [DOI] [Google Scholar]
  24. DeFelice BC, Mehta SS, Samra S, Čajka T, Wancewicz B, Fahrmann JF, & Fiehn O (2017). Mass Spectral feature list optimizer (MS-FLO): a Tool to minimize false positive peak reports in untargeted liquid Chromatography–Mass Spectroscopy (LC-MS) data Processing. Analytical Chemistry, 89(6), 3250–3255. 10.1021/acs.analchem.6b04372. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Del Carratore F, Schmidt K, Vinaixa M, Hollywood KA, Greenland-Bews C, Takano E, Rogers S, & Breitling R (2019). Integrated Probabilistic Annotation: a bayesian-based annotation method for metabolomic profiles integrating biochemical connections, isotope patterns, and Adduct Relationships. Analytical Chemistry, 91(20), 12799–12807. 10.1021/acs.analchem.9b02354. [DOI] [PubMed] [Google Scholar]
  26. Directorate-General for Research and Innovation (European Commission). (2018). Turning FAIR into reality: final report and action plan from the European Commission expert group on FAIR data. Publications Office of the European Union. 10.2777/1524. [DOI] [Google Scholar]
  27. Du X, Aristizabal-Henao JJ, Garrett TJ, Brochhausen M, Hogan WR, & Lemas DJ (2022). A checklist for reproducible computational analysis in clinical Metabolomics Research. Metabolites, 12(1), 10.3390/metabo12010087. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Dührkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, & Böcker S (2019). SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nature Methods, 16(4), 10.1038/s41592-019-0344-8. Article 4. [DOI] [PubMed] [Google Scholar]
  29. Fiehn O, Sumner LW, Rhee SY, Ward J, Dickerson J, Lange BM, Lane G, Roessner U, Last R, & Nikolau B (2007). Minimum reporting standards for plant biology context information in metabolomic studies. Metabolomics, 3(3), 195–201. 10.1007/s11306-007-0068-0. [DOI] [Google Scholar]
  30. fillPeaks-methods: Integrate areas of missing peaks in xcms: LC-MS and GC-MS Data Analysis. (n.d.). Retrieved April 6, from https://rdrr.io/bioc/xcms/man/fillPeaks-methods.html [Google Scholar]
  31. Fischer D, Panse C, & Laczko E (2022). cosmiq: Cosmiq - COmbining Single Masses Into Quantities (1.28.0). Bioconductor version: Release (3.14). 10.18129/B9.bioc.cosmiq [DOI] [Google Scholar]
  32. Franceschi P, Mylonas R, Shahaf N, Scholz M, Arapitsas P, Masuero D, Weingart G, Carlin S, Vrhovsek U, Mattivi F, & Wehrens R (2014). MetaDB a Data Processing Workflow in untargeted MS-Based Metabolomics experiments. Frontiers in Bioengineering and Biotechnology, 2, 72. 10.3389/fbioe.2014.00072. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Gatto L, Gibb S, & Rainer J (2021). MSnbase, efficient and elegant R-Based Processing and visualization of raw Mass Spectrometry Data. Journal of Proteome Research, 20(1), 1063–1069. 10.1021/acs.jproteome.0c00313. [DOI] [PubMed] [Google Scholar]
  34. Georgeson P, Syme A, Sloggett C, Chung J, Dashnow H, Milton M, Lonsdale A, Powell D, Seemann T, & Pope B (2019). Bionitio: demonstrating and facilitating best practices for bioinformatics command-line software. GigaScience, 8, giz109. 10.1093/gigascience/giz109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Giacomoni F, Le Corguillé G, Monsoor M, Landi M, Pericard P, Pétéra M, Duperier C, Tremblay-Franco M, Martin JF, Jacob D, Goulitquer S, Thévenot EA, & Caron C (2015). Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics. Bioinformatics, 31(9), 1493–1495. 10.1093/bioinformatics/btu813. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Goodacre R, Broadhurst D, Smilde AK, Kristal BS, Baker JD, Beger R, Bessant C, Connor S, Capuani G, Craig A, Ebbels T, Kell DB, Manetti C, Newton J, Paternostro G, Somorjai R, Sjöström M, Trygg J, & Wulfert F (2007). Proposed minimum reporting standards for data analysis in metabolomics. Metabolomics, 3(3), 231–241. 10.1007/s11306-007-0081-3. [DOI] [Google Scholar]
  37. Goodman SN, Fanelli D, & Ioannidis JPA (2016). What does research reproducibility mean? Science Translational Medicine, 8(341), 341ps12. [DOI] [PubMed] [Google Scholar]
  38. Guo J, Shen S, Xing S, & Huan T (2021). DaDIA: Hybridizing Data-Dependent and Data-Independent Acquisition Modes for Generating High-Quality Metabolomic Data. Analytical Chemistry, 93(4), 2669–2677. 10.1021/acs.analchem.0c05022. [DOI] [PubMed] [Google Scholar]
  39. Hao L, Wang J, Page D, Asthana S, Zetterberg H, Carlsson C, Okonkwo OC, & Li L (2018). Comparative evaluation of MS-based Metabolomics Software and its application to preclinical Alzheimer’s Disease. Scientific Reports, 8(1), 10.1038/s41598-018-27031-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Hasselbring W, Carr L, Hettrick S, Packer H, & Tiropanis T (2020). From FAIR research data toward FAIR and open research software. It - Information Technology, 62(1), 39–47. 10.1515/itit-2019-0040. [DOI] [Google Scholar]
  41. Heil BJ, Hoffman MM, Markowetz F, Lee SI, Greene CS, & Hicks SC (2021). Reproducibility standards for machine learning in the life sciences. Nature Methods, 18(10), 1132–1135. 10.1038/s41592-021-01256-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Helmus R, ter Laak TL, van Wezel AP, de Voogt P, & Schymanski EL (2021). patRoon: open source software platform for environmental mass spectrometry based non-target screening. Journal of Cheminformatics, 13(1), 1. 10.1186/s13321-020-00477-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Huan T, & Li L (2015a). Counting missing values in a metabolite-intensity data set for measuring the analytical performance of a metabolomics platform. Analytical Chemistry, 87(2), 1306–1313. 10.1021/ac5039994. [DOI] [PubMed] [Google Scholar]
  44. Huan T, & Li L (2015b). Quantitative metabolome analysis based on Chromatographic Peak Reconstruction in Chemical isotope labeling liquid chromatography Mass Spectrometry. Analytical Chemistry, 87(14), 7011–7016. 10.1021/acs.analchem.5b01434. [DOI] [PubMed] [Google Scholar]
  45. Huang X, Chen YJ, Cho K, Nikolskiy I, Crawford PA, & Patti GJ (2014). X13CMS: Global Tracking of Isotopic Labels in untargeted metabolomics. Analytical Chemistry, 86(3), 1632–1639. 10.1021/ac403384n. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Huber C, Nijssen R, Mol H, Philippe Antignac J, Krauss M, Brack W, Wagner K, Debrauwer L, Vitale CM, Price EJ, Klanova J, Garlito Molina B, Leon N, Pardo O, Fernández SF, Szigeti T, Középesy S, Šulc L, Čupr P, & Lommen A (2022). A large scale multi-laboratory suspect screening of pesticide metabolites in human biomonitoring: from tentative annotations to verified occurrences. Environment International, 168, 107452. 10.1016/j.envint.2022.107452. [DOI] [PubMed] [Google Scholar]
  47. Huber F, Ridder L, Verhoeven S, Spaaks JH, Diblen F, Rogers S, & van der Hooft JJJ (2021). Spec2Vec: improved mass spectral similarity scoring through learning of structural relationships. PLOS Computational Biology, 17(2), e1008724. 10.1371/journal.pcbi.1008724. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Huber F, van der Burg S, van der Hooft JJJ, & Ridder L (2021). MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra. Journal of Cheminformatics, 13(1), 84. 10.1186/s13321-021-00558-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Hughes G, Cruickshank-Quinn C, Reisdorph R, Lutz S, Petrache I, Reisdorph N, Bowler R, & Kechris K (2014). MSPrep—Summarization, normalization and diagnostics for processing of mass spectrometry–based metabolomic data. Bioinformatics, 30(1), 133–134. 10.1093/bioinformatics/btt589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Hunter-Zinck H, de Siqueira AF, Vásquez VN, Barnes R, & Martinez CC (2021). Ten simple rules on writing clean and reliable open-source scientific software. PLOS Computational Biology, 17(11), e1009481. 10.1371/journal.pcbi.1009481. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Ison J, Ienasescu H, Chmura P, Rydza E, Ménager H, Kalaš M, Schwämmle V, Grüning B, Beard N, Lopez R, Duvaud S, Stockinger H, Persson B, Vařeková RS, Raček T, Vondrášek J, Peterson H, Salumets A, Jonassen I, & Brunak S (2019). The bio.tools registry of software tools and data resources for the life sciences. Genome Biology, 20(1), 164. 10.1186/s13059-019-1772-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Jaitly N, Mayampurath A, Littlefield K, Adkins JN, Anderson GA, & Smith RD (2009). Decon2LS: an open-source software package for automated processing and visualization of high resolution mass spectrometry data. BMC Bioinformatics, 10(1), 87. 10.1186/1471-2105-10-87. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Ji H, Xu Y, Lu H, & Zhang Z (2019). Deep MS/MS-Aided structural-similarity scoring for unknown metabolite identification. Analytical Chemistry, 91(9), 5629–5637. 10.1021/acs.analchem.8b05405. [DOI] [PubMed] [Google Scholar]
  54. Ji H, Zeng F, Xu Y, Lu H, & Zhang Z (2017). KPIC2: an effective Framework for Mass Spectrometry-Based Metabolomics using pure Ion Chromatograms. Analytical Chemistry, 89(14), 7631–7640. 10.1021/acs.analchem.7b01547. [DOI] [PubMed] [Google Scholar]
  55. Jiménez RC, Kuzak M, Alhamdoosh M, Barker M, Batut B, Borg M, Capella-Gutierrez S, Chue Hong N, Cook M, Corpas M, Flannery M, Garcia L, Gelpí JL, Gladman S, Goble C, González Ferreiro M, Gonzalez-Beltran A, Griffin PC, Grüning B, & Crouch S (2017). Four simple recommendations to encourage best practices in research software. F1000Research, 6, ELIXIR-876. 10.12688/f1000research.11407.1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. de Jonge NF, Louwen JR, Chekmeneva E, Camuzeaux S, Vermeir FJ, Jansen RS, Huber F, & van der Hooft JJJ (2022). MS2Query: Reliable and Scalable MS2 Mass Spectral-based Analogue Search. bioRxiv. 10.1101/2022.07.22.501125 [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Kantz ED, Tiwari S, Watrous JD, Cheng S, & Jain M (2019). Deep neural networks for classification of LC-MS spectral peaks. Analytical Chemistry, 91(19), 12407–12413. 10.1021/acs.analchem.9b02983. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Karimzadeh M, & Hoffman MM (2018). Top considerations for creating bioinformatics software documentation. Briefings in Bioinformatics, 19(4), 693–699. 10.1093/bib/bbw134. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Kasalica V, Schwämmle V, Palmblad M, Ison J, & Lamprecht AL (2021). APE in the Wild: Automated Exploration of Proteomics Workflows in the bio.tools Registry. Journal of Proteome Research, 20(4), 2157–2165. 10.1021/acs.jproteome.0c00983. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Katz DS, Barker M, Chue Hong NP, Castro LJ, & Martinez PA (2021, June 28). The FAIR4RS team: Working together to make research software FAIR. 2021 Collegeville Workshop on Scientific Software - Software Teams (Collegeville2021). Zenodo. 10.5281/zenodo.5037157 [DOI] [Google Scholar]
  61. Katz DS, Gruenpeter M, & Honeyman T (2021). Taking a fresh look at FAIR for research software. Patterns, 2(3), 100222. 10.1016/j.patter.2021.100222. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Kuhl C, Tautenhahn R, Böttcher C, Larson TR, & Neumann S (2012). CAMERA: an Integrated strategy for compound Spectra extraction and annotation of Liquid Chromatography/Mass Spectrometry Data Sets. Analytical Chemistry, 84(1), 283–289. 10.1021/ac202450g. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Kutuzova S, Colaianni P, Röst H, Sachsenberg T, Alka O, Kohlbacher O, Burla B, Torta F, Schrübbers L, Kristensen M, Nielsen L, Herrgård MJ, & McCloskey D (2020). SmartPeak automates targeted and quantitative Metabolomics Data Processing. Analytical Chemistry, 92(24), 15968–15974. 10.1021/acs.analchem.0c03421 [DOI] [PubMed] [Google Scholar]
  64. Lamprecht AL, Garcia L, Kuzak M, Martinez C, Arcila R, Martin Del Pico E, Dominguez Del Angel V, van de Sandt S, Ison J, Martinez PA, McQuilton P, Valencia A, Harrow J, Psomopoulos F, Gelpi JL, Chue Hong N, Goble C, & Capella-Gutierrez S (2020). Towards FAIR principles for research software. Data Science, 3(1), 37–59. 10.3233/DS-190026 [DOI] [Google Scholar]
  65. Lamprecht AL, Palmblad M, Ison J, Schwämmle V, Manir MSA, Altintas I, Baker CJO, Amor ABH, Capella-Gutierrez S, Charonyktakis P, Crusoe MR, Gil Y, Goble C, Griffin TJ, Groth P, Ienasescu H, Jagtap P, Kalaš M, Kasalica V, & Wolstencroft K (2021). Perspectives on automated composition of workflows in the life sciences (10:897). F1000Research. 10.12688/f1000research.54159.1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Lee BD (2018). Ten simple rules for documenting scientific software. PLOS Computational Biology, 14(12), e1006561. 10.1371/journal.pcbi.1006561. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Leprevost FV, Barbosa VC, Francisco EL, Perez-Riverol Y, & Carvalho PC (2014). On best practices in the development of bioinformatics software. Frontiers in Genetics, 5, 199. 10.3389/fgene.2014.00199. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Li B, Tang J, Yang Q, Li S, Cui X, Li Y, Chen Y, Xue W, Li X, & Zhu F (2017). NOREVA: normalization and evaluation of MS-based metabolomics data. Nucleic Acids Research, 45(W1), W162–W170. 10.1093/nar/gkx449. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Li Z, Lu Y, Guo Y, Cao H, Wang Q, & Shui W (2018). Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection. Analytica Chimica Acta, 1029, 50–57. 10.1016/j.aca.2018.05.001. [DOI] [PubMed] [Google Scholar]
  70. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, & Moher D (2009). The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ, 339, b2700. 10.1136/bmj.b2700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Libiseller G, Dvorzak M, Kleb U, Gander E, Eisenberg T, Madeo F, Neumann S, Trausinger G, Sinner F, Pieber T, & Magnes C (2015). IPO: a tool for automated optimization of XCMS parameters. BMC Bioinformatics, 16(1), 118. 10.1186/s12859-015-0562-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Liggi S, Hinz C, Hall Z, Santoru ML, Poddighe S, Fjeldsted J, Atzori L, & Griffin JL (2018). KniMet: a pipeline for the processing of chromatography–mass spectrometry metabolomics data. Metabolomics, 14(4), 52. 10.1007/s11306-018-1349-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Liu Q, Walker D, Uppal K, Liu Z, Ma C, Tran V, Li S, Jones DP, & Yu T (2020). Addressing the batch effect issue for LC/MS metabolomics data in data preprocessing. Scientific Reports, 10(1), 13856. 10.1038/s41598-020-70850-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Lommen A (2009). MetAlign: Interface-Driven, Versatile Metabolomics Tool for Hyphenated full-scan Mass Spectrometry Data Preprocessing. Analytical Chemistry, 81(8), 3079–3086. 10.1021/ac900036d. [DOI] [PubMed] [Google Scholar]
  75. Loos M (2016). enviPick: Peak Picking for High Resolution Mass Spectrometry Data (1.5). https://CRAN.R-project.org/package=enviPick [Google Scholar]
  76. Mahieu NG, Spalding JL, & Patti GJ (2016). Warpgroup: increased precision of metabolomic data processing by consensus integration bound analysis. Bioinformatics, 32(2), 268–275. 10.1093/bioinformatics/btv564. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Malone J, Brown A, Lister AL, Ison J, Hull D, Parkinson H, & Stevens R (2014). The Software Ontology (SWO): a resource for reproducibility in biomedical data analysis, curation and digital preservation. Journal of Biomedical Semantics, 5(1), 25. 10.1186/2041-1480-5-25. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Mayer G, Montecchi-Palazzi L, Ovelleiro D, Jones AR, Binz PA, Deutsch EW, Chambers M, Kallhardt M, Levander F, Shofstahl J, Orchard S, Vizcaíno JA, Hermjakob H, Stephan C, Meyer HE, Eisenacher M, & HUPO-PSI Group (2013). The HUPO proteomics standards initiative - mass spectrometry controlled vocabulary. Database: The Journal of Biological Databases and Curation, 2013, bat009. 10.1093/database/bat009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Mayer G, Müller W, Schork K, Uszkoreit J, Weidemann A, Wittig U, Rey M, Quast C, Felden J, Glöckner FO, Lange M, Arend D, Beier S, Junker A, Scholz U, Schüler D, Kestler HA, Wibberg D, Pühler A, & Turewicz M (2021). Implementing FAIR data management within the German Network for Bioinformatics infrastructure (de.NBI) exemplified by selected use cases. Briefings in Bioinformatics, 22(5), bbab010. 10.1093/bib/bbab010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Mendez KM, Pritchard L, Reinke SN, & Broadhurst DI (2019). Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing. Metabolomics, 15(10), 125. 10.1007/s11306-019-1588-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Menke J, Roelandse M, Ozyurt B, Martone M, & Bandrowski A (2020). The rigor and transparency Index Quality Metric for assessing Biological and Medical Science Methods. IScience, 23(11), 101698. 10.1016/j.isci.2020.101698. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Misra BB (2018). New tools and resources in metabolomics: 2016–2017. ELECTROPHORESIS, 39(7), 909–923. 10.1002/elps.201700441. [DOI] [PubMed] [Google Scholar]
  83. Misra BB (2021). New software tools, databases, and resources in metabolomics: updates from 2020. Metabolomics, 17(5), 49. 10.1007/s11306-021-01796-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Misra BB, Fahrmann JF, & Grapov D (2017). Review of emerging metabolomic tools and resources: 2015–2016. ELECTROPHORESIS, 38(18), 2257–2274. 10.1002/elps.201700110. [DOI] [PubMed] [Google Scholar]
  85. Misra BB, & Mohapatra S (2019). Tools and resources for metabolomics research community: a 2017–2018 update. ELECTROPHORESIS, 40(2), 227–246. 10.1002/elps.201800428. [DOI] [PubMed] [Google Scholar]
  86. Müller E, Huber CE, Brack W, Krauss M, & Schulze T (2020). Symbolic aggregate approximation improves gap filling in high-resolution Mass Spectrometry Data Processing. Analytical Chemistry, 92(15), 10425–10432. 10.1021/acs.analchem.0c00899. [DOI] [PubMed] [Google Scholar]
  87. Olivon F, Elie N, Grelier G, Roussi F, Litaudon M, & Touboul D (2018). MetGem Software for the generation of Molecular Networks based on the t-SNE algorithm. Analytical Chemistry, 90(23), 13900–13908. 10.1021/acs.analchem.8b03099. [DOI] [PubMed] [Google Scholar]
  88. O’Shea K, & Misra BB (2020). Software tools, databases and resources in metabolomics: updates from 2018 to 2019. Metabolomics, 16(3), 36. 10.1007/s11306-020-01657-3. [DOI] [PubMed] [Google Scholar]
  89. Palarea-Albaladejo J, Mclean K, Wright F, & Smith DGE (2018). MALDIrppa: quality control and robust analysis for mass spectrometry data. Bioinformatics, 34(3), 522–523. 10.1093/bioinformatics/btx628. [DOI] [PubMed] [Google Scholar]
  90. Palmblad M, Lamprecht AL, Ison J, & Schwämmle V (2019). Automated workflow composition in mass spectrometry-based proteomics. Bioinformatics, 35(4), 656–664. 10.1093/bioinformatics/bty646. [DOI] [PMC free article] [PubMed] [Google Scholar]
  91. Pang Z, Chong J, Zhou G, de Lima Morais DA, Chang L, Barrette M, Gauthier C, Jacques P, Li S, & Xia J (2021). MetaboAnalyst 5.0: narrowing the gap between raw spectra and functional insights. Nucleic Acids Research, 49(W1), W388–W396. 10.1093/nar/gkab382. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Pluskal T, Castillo S, Villar-Briones A, & Orešič M (2010). MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics, 11(1), 395. 10.1186/1471-2105-11-395. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Protsyuk I, Melnik AV, Nothias LF, Rappez L, Phapale P, Aksenov AA, Bouslimani A, Ryazanov S, Dorrestein PC, & Alexandrov T (2018). 3D molecular cartography using LC-MS facilitated by Optimus and ‘ili software. Nature Protocols, 13(1), 134–154. 10.1038/nprot.2017.122. [DOI] [PubMed] [Google Scholar]
  94. Rainer J, Vicini A, Salzer L, Stanstrup J, Badia JM, Neumann S, Stravs MA, Hernandes VV, Gatto L, Gibb S, & Witting M (2022). A modular and expandable ecosystem for Metabolomics Data Annotation in R. Metabolites, 12(2), Article 2. 10.3390/metabo12020173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  95. Ram K (2013). Git can facilitate greater reproducibility and increased transparency in science. Source Code for Biology and Medicine, 8(1), 7. 10.1186/1751-0473-8-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Referencing and citing content. (n.d.). GitHub Docs. Retrieved December 30, from https://ghdocs-prod.azurewebsites.net/en/repositories/archiving-a-github-repository/referencing-and-citing-content. [Google Scholar]
  97. Review checklist—JOSS documentation. (n.d.). Retrieved April 28, from https://joss.readthedocs.io/en/latest/review_checklist.html [Google Scholar]
  98. RforMassSpectrometry. (n.d.). Retrieved January 14, from https://www.rformassspectrometry.org/ [Google Scholar]
  99. Ridder L, van der Hooft JJJ, Verhoeven S, de Vos RCH, van Schaik R, & Vervoort J (2012). Substructure-based annotation of high-resolution multistage MSn spectral trees. Rapid Communications in Mass Spectrometry, 26(20), 2461–2471. 10.1002/rcm.6364. [DOI] [PubMed] [Google Scholar]
  100. Rocca-Serra P, & Sansone SA (2019). Experiment design driven FAIRification of omics data matrices, an exemplar. Scientific Data, 6(1), 10.1038/s41597-019-0286-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Romano JD, & Moore JH (2020). Ten simple rules for writing a paper about scientific software. PLOS Computational Biology, 16(11), e1008390. 10.1371/journal.pcbi.1008390. [DOI] [PMC free article] [PubMed] [Google Scholar]
  102. Ross DH, Cho JH, Zhang R, Hines KM, & Xu L (2020). LiPydomics: a Python Package for Comprehensive Prediction of lipid Collision Cross sections and Retention Times and Analysis of Ion Mobility-Mass Spectrometry-Based Lipidomics Data. Analytical Chemistry, 92(22), 14967–14975. 10.1021/acs.analchem.0c02560. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Röst HL, Sachsenberg T, Aiche S, Bielow C, Weisser H, Aicheler F, Andreotti S, Ehrlich HC, Gutenbrunner P, Kenar E, Liang X, Nahnsen S, Nilse L, Pfeuffer J, Rosenberger G, Rurik M, Schmitt U, Veit J, Walzer M, & Kohlbacher O (2016). OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nature Methods, 13(9), 741–748. 10.1038/nmeth.3959. [DOI] [PubMed] [Google Scholar]
  104. Ruttkies C, Schymanski EL, Wolf S, Hollender J, & Neumann S (2016). MetFrag relaunched: incorporating strategies beyond in silico fragmentation. Journal of Cheminformatics, 8(1), 3. 10.1186/s13321-016-0115-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Savoi S, Arapitsas P, Duchêne É, Nikolantonaki M, Ontañón I, Carlin S, Schwander F, Gougeon RD, Ferreira ACS, Theodoridis G, Töpfer R, Vrhovsek U, Adam-Blondon AF, Pezzotti M, & Mattivi F (2021). Grapevine and wine metabolomics-based guidelines for FAIR data and Metadata Management. Metabolites, 11(11), 757. 10.3390/metabo11110757. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Schober P, Boer C, & Schwarte LA (2018). Correlation coefficients: appropriate use and interpretation. Anesthesia & Analgesia, 126(5), 1763–1768. 10.1213/ANE.0000000000002864. [DOI] [PubMed] [Google Scholar]
  107. Seemann T (2013). Ten recommendations for creating usable bioinformatics command line software. GigaScience, 2(1), 15. 10.1186/2047-217X-2-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  108. Senington R, Pataki B, & Wang XV (2018). Using docker for factory system software management: experience report. Procedia CIRP, 72, 659–664. 10.1016/j.procir.2018.03.173. [DOI] [Google Scholar]
  109. Shen X, Wang R, Xiong X, Yin Y, Cai Y, Ma Z, Liu N, & Zhu ZJ (2019). Metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics. Nature Communications, 10(1), 10.1038/s41467-019-09550-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  110. Smith CA, Want EJ, O’Maille G, Abagyan R, & Siuzdak G (2006). XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification. Analytical Chemistry, 78(3), 779–787. 10.1021/ac051437y. [DOI] [PubMed] [Google Scholar]
  111. Snyder M, Mias G, Stanberry L, & Kolker E (2014). Metadata Checklist for the Integrated Personal OMICS Study: Proteomics and Metabolomics experiments. OMICS: A Journal of Integrative Biology, 18(1), 81–85. 10.1089/omi.2013.0148. [DOI] [PMC free article] [PubMed] [Google Scholar]
  112. Spicer R, Salek RM, Moreno P, Cañueto D, & Steinbeck C (2017). Navigating freely-available software tools for metabolomics analysis. Metabolomics: Official Journal of the Metabolomic Society, 13(9), 106. 10.1007/s11306-017-1242-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Stanstrup J, Broeckling CD, Helmus R, Hoffmann N, Mathé E, Naake T, Nicolotti L, Peters K, Rainer J, Salek RM, Schulze T, Schymanski EL, Stravs MA, Thévenot EA, Treutler H, Weber RJM, Willighagen E, Witting M, & Neumann S (2019). The metaRbolomics Toolbox in Bioconductor and beyond. Metabolites, 9(10), Article 10. 10.3390/metabo9100200. [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Sumner LW, Amberg A, Barrett D, Beale MH, Beger R, Daykin CA, Fan TWM, Fiehn O, Goodacre R, Griffin JL, Hankemeier T, Hardy N, Harnly J, Higashi R, Kopka J, Lane AN, Lindon JC, Marriott P, Nicholls AW, & Viant MR (2007). Proposed minimum reporting standards for chemical analysis Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics: Official Journal of the Metabolomic Society, 3(3), 211–221. 10.1007/s11306-007-0082-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Tautenhahn R, Patti GJ, Rinehart D, & Siuzdak G (2012). XCMS Online: a web-based platform to process untargeted metabolomic data. Analytical Chemistry, 84(11), 5035–5039. 10.1021/ac300698c. [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. Teo G, Chew WS, Burla BJ, Herr D, Tai ES, Wenk MR, Torta F, & Choi H (2020). MRMkit: Automated Data Processing for large-scale targeted Metabolomics Analysis. Analytical Chemistry, 92(20), 13677–13682. 10.1021/acs.analchem.0c03060. [DOI] [PubMed] [Google Scholar]
  117. Tsugawa H, Cajka T, Kind T, Ma Y, Higgins B, Ikeda K, Kanazawa M, VanderGheynst J, Fiehn O, & Arita M (2015). MS-DIAL: Data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nature Methods, 12(6), 523–526. 10.1038/nmeth.3393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  118. Tsugawa H, Kind T, Nakabayashi R, Yukihira D, Tanaka W, Cajka T, Saito K, Fiehn O, & Arita M (2016). Hydrogen rearrangement rules: computational MS/MS fragmentation and structure elucidation using MS-FINDER Software. Analytical Chemistry, 88(16), 7946–7958. 10.1021/acs.analchem.6b00770. [DOI] [PMC free article] [PubMed] [Google Scholar]
  119. Uppal K, Soltow QA, Strobel FH, Pittard WS, Gernert KM, Yu T, & Jones DP (2013). xMSanalyzer: automated pipeline for improved feature detection and downstream analysis of large-scale, non-targeted metabolomics data. BMC Bioinformatics, 14(1), 15. 10.1186/1471-2105-14-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  120. Uppal K, Walker DI, & Jones DP (2017). xMSannotator: an R Package for Network-Based annotation of high-resolution Metabolomics Data. Analytical Chemistry, 89(2), 1063–1067. 10.1021/acs.analchem.6b01214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  121. van de Sandt S, Nielsen LH, Ioannidis A, Muench A, Henneken E, Accomazzi A, Bigarella C, Lopez JBG, & Dallmeier-Tiessen S (2019). Practice meets Principle: Tracking Software and Data Citations to Zenodo DOIs (arXiv:1911.00295). arXiv. 10.48550/arXiv.1911.00295 [DOI] [Google Scholar]
  122. Vesteghem C, Brøndum RF, Sønderkær M, Sommer M, Schmitz A, Bødker JS, Dybkær K, El-Galaly TC, & Bøgsted M (2020). Implementing the FAIR Data Principles in precision oncology: review of supporting initiatives. Briefings in Bioinformatics, 21(3), 936–945. 10.1093/bib/bbz044. [DOI] [PMC free article] [PubMed] [Google Scholar]
  123. Vitale CM, Lommen A, Huber C, Wagner K, Garlito Molina B, Nijssen R, Price EJ, Blokland M, van Tricht F, Mol HGJ, Krauss M, Debrauwer L, Pardo O, Leon N, Klanova J, & Antignac JP (2022). Harmonized Quality Assurance/Quality control provisions for nontargeted measurement of urinary pesticide biomarkers in the HBM4EU Multisite SPECIMEn Study. Analytical Chemistry, 94(22), 7833–7843. 10.1021/acs.analchem.2c00061. [DOI] [PubMed] [Google Scholar]
  124. Weber RJM, & Viant MR (2010). MI-Pack: increased confidence of metabolite identification in mass spectra by integrating accurate masses and metabolic pathways. Chemometrics and Intelligent Laboratory Systems, 104(1), 75–82. 10.1016/j.chemolab.2010.04.010. [DOI] [Google Scholar]
  125. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, & Mons B (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), Article 1. 10.1038/sdata.2016.18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  126. Wilkinson MD, Dumontier M, Sansone SA, Bonino da Silva Santos LO, Prieto M, Batista D, McQuilton P, Kuhn T, Rocca-Serra P, Crosas M, & Schultes E (2019). Evaluating FAIR maturity through a scalable, automated, community-governed framework. Scientific Data, 6(1), 10.1038/s41597-019-0184-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  127. Wolf M, Logan J, Mehta K, Jacobson D, Cashman M, Walker AM, Eisenhauer G, Widener P, & Cliff A (2021). Reusability First: Toward FAIR Workflows. 2021 IEEE International Conference on Cluster Computing (CLUSTER), 444–455. 10.1109/Cluster48925.2021.00053 [DOI] [Google Scholar]
  128. Yu T, Park Y, Johnson JM, & Jones DP (2009). ApLCMS—adaptive processing of high-resolution LC/MS data. Bioinformatics, 25(15), 1930–1936. 10.1093/bioinformatics/btp291. [DOI] [PMC free article] [PubMed] [Google Scholar]
  129. Zhang X, Li Q, Xu Z, & Dou J (2020). Mass spectrometry-based metabolomics in health and medical science: a systematic review. RSC Advances, 10(6), 3092–3104. 10.1039/C9RA08985C. [DOI] [PMC free article] [PubMed] [Google Scholar]
  130. Zhao J, Gómez-Pérez J, Belhajjame K, Klyne G, García-Cuesta E, Garrido A, Hettne K, Roos M, Roure DD, & Goble C (2012). Why workflows break—Understanding and combating decay in Taverna workflows. 2012 IEEE 8th International Conference on E-Science. 10.1109/eScience.2012.6404482 [DOI] [Google Scholar]
  131. Zheng CL, Ratnakar V, Gil Y, & McWeeney SK (2015). Use of semantic workflows to enhance transparency and reproducibility in clinical omics. Genome Medicine, 7(1), 73. 10.1186/s13073-015-0202-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  132. Zhou B, Xiao JF, Tuli L, & Ressom HW (2012). LC-MS-based metabolomics. Molecular BioSystems, 8(2), 470–481. 10.1039/c1mb05350g. [DOI] [PMC free article] [PubMed] [Google Scholar]
  133. Zhou R, Tseng CL, Huan T, & Li L (2014). IsoMS: automated processing of LC-MS data generated by a chemical isotope labeling metabolomics platform. Analytical Chemistry, 86(10), 4675–4679. 10.1021/ac5009089. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Table S6
Table S5
Main
Table S4
