PLoS One. 2023 Nov 30;18(11):e0294750. doi: 10.1371/journal.pone.0294750

BioDiscViz: A visualization support and consensus signature selector for BioDiscML results

Sophiane Bouirdene 1, Mickael Leclercq 1, Léopold Quitté 1, Steve Bilodeau 2,3, Arnaud Droit 1,*
Editor: Achraf El Allali 4
PMCID: PMC10688618  PMID: 38033002

Abstract

Machine learning (ML) algorithms are powerful tools for finding complex patterns and biomarker signatures when conventional statistical methods fail to identify them. Although the ML field has made significant progress, state-of-the-art methodologies for building efficient, non-overfitting models are not always applied in the literature. To this end, automatic programs such as BioDiscML were designed to identify biomarker signatures and correlated features while avoiding overfitting through multiple evaluation strategies, such as cross-validation, bootstrapping and repeated holdout. To further improve BioDiscML and reach a broader audience, better visualization support and more flexibility in choosing the best models and signatures are needed. Thus, to provide researchers with an easily accessible and usable tool for the in-depth investigation of BioDiscML outputs, we developed a visual interaction tool called BioDiscViz. This tool provides summaries, tables and graphics, in the form of Principal Component Analysis (PCA) plots, UMAP, t-SNE, heatmaps and boxplots, for the best model and the correlated features. Furthermore, it provides visual support for extracting a consensus signature from BioDiscML models using a combination of filters. BioDiscViz offers visual support for research using ML and may open new opportunities in this field by making it accessible to a broader community.

Introduction

In recent years, new Artificial Intelligence (AI) methods have been deployed in bioinformatics research for pattern classification, biomarker identification and forecast modeling using omics data. Studying biomarker signatures is an important part of the research process, as they are correlated with biological functions. Machine learning and feature selection can identify multivariate associations of biomarkers (i.e., features) and detect complex hidden patterns in the data. Because many algorithms exist for feature selection and classification, multiple models are often generated with different signatures, and these signatures frequently overlap inconsistently even when the models perform equivalently [1]. Furthermore, correlated features may be discarded during model optimization to avoid redundant information. Selecting a “best model” and its signature is therefore a trade-off between simplifying the model and retaining all valuable biomarkers. Approaches such as ensemble learning or the union of overlapping features often find optimized solutions, but at the expense of one side of this balance.

A solution to facilitate the generation of multiple models and signatures has been proposed with an automatic ML tool, BioDiscML [2]. BioDiscML is a new-generation ML tool that has proven highly efficient in multiple research topics involving the identification of biomarker signatures from various types of data, such as proteomics [3], transcriptomics [4] and multi-omics (metagenomics/metabolomics, metagenomics/lipidomics) [5, 6]. BioDiscML offers various criteria for choosing a “best model”, but this choice is difficult because some data are too heterogeneous for ideal decision thresholds to be defined. However, BioDiscML does not visualize the signatures, which prevents a rapid overview of the results. To support these decisions, we propose a visual tool, BioDiscViz, that helps select consensus features within a set of trained classifiers and their corresponding signatures.

Design and implementation

BioDiscViz is a Shiny application that runs on Windows and Unix operating systems. It supports BioDiscML by providing researchers with an interactive interface and graphs that improve their understanding of the results. The application is written in R [7], uses the Shiny framework [8] together with the shinydashboard extension [9], and requires RStudio [10], an integrated development environment for R.

Input

BioDiscViz takes as input a directory containing BioDiscML outputs in CSV format, together with their summary results. The best model and the classification or regression results are accessible independently. The tool also supports multiple BioDiscML outputs in the same directory and allows rapid switching between them.

Layout

BioDiscViz’s interface is divided into two parts: a sidebar on the left and the main section on the right.

Starting with the sidebar, the first item is the “input directory” button, which opens a dialog for choosing the directory containing the BioDiscML outputs. Below it are a submit button, which runs BioDiscViz on the selected BioDiscML results, and an “example” button, which runs the analysis on the provided example data. The main part of the interface is divided into four sections: Short Signature, Long Signature, Attribute Distribution, and Consensus Signatures.

Once the BioDiscML results are submitted to BioDiscViz, additional options appear in the sidebar. First, a scrollable list allows the selection of a specific BioDiscML output when the directory contains several. Then, two sliders adjust the font and label sizes of the figures; the user can change them at any time, and the plots are updated accordingly. Finally, a button allows downloading an HTML report of the results, including all the figures.

In the main part of the application, four sections are accessible through the sidebar. Each section contains two or three parts: a results summary for the models, plots, and a table.

  • Short Signature: This section presents the results obtained by the best model of BioDiscML.

  • Long Signature: This section displays the correlated features.

  • Attribute Distribution: This additional section lets the user interactively visualize the features most frequently used by the different classifiers tested. Classifiers can be filtered with thresholds on metrics such as the Matthews correlation coefficient (MCC) and its standard deviation, and the user sets the number of attributes to display to match the experimental design.

  • Consensus Signatures: This section provides a representation of the different signatures called by the majority of classifiers based on user-defined parameters.
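The filtering behind the Attribute Distribution and Consensus Signatures sections can be sketched as follows. This is a minimal Python illustration under our own assumptions, not BioDiscViz's R code: each classifier is summarized by a hypothetical `ClassifierResult` record (not BioDiscML's output format) holding its MCC, the standard deviation of that MCC, and its signature. Classifiers with MCC ≥ a threshold and standard deviation ≤ a threshold are kept, and features are ranked by how many kept classifiers use them.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ClassifierResult:
    """Hypothetical per-classifier summary (not BioDiscML's actual output format)."""
    name: str
    mcc: float        # Matthews correlation coefficient
    mcc_std: float    # standard deviation of the MCC across evaluation procedures
    features: list = field(default_factory=list)  # features in the classifier's signature


def consensus_features(results, min_mcc, max_std, top_n):
    """Return the top_n (feature, count) pairs used most often by classifiers passing both filters."""
    kept = [r for r in results if r.mcc >= min_mcc and r.mcc_std <= max_std]
    counts = Counter()
    for r in kept:
        counts.update(set(r.features))  # count each feature at most once per classifier
    # Highest frequency first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    demo = [
        ClassifierResult("clf_a", 0.80, 0.05, ["g1", "g2", "g3"]),
        ClassifierResult("clf_b", 0.78, 0.10, ["g2", "g3", "g4"]),
        ClassifierResult("clf_c", 0.90, 0.30, ["g5"]),  # dropped: MCC too unstable
        ClassifierResult("clf_d", 0.60, 0.02, ["g6"]),  # dropped: MCC too low
    ]
    print(consensus_features(demo, min_mcc=0.75, max_std=0.15, top_n=3))
```

With the hypothetical demo data this prints `[('g2', 2), ('g3', 2), ('g1', 1)]`. The thresholds 0.75 and 0.15 mirror the values quoted in the Fig 1B caption; the classifier and gene names are placeholders.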

Representations

The short, long and consensus signatures are each represented with a heatmap, PCA, t-SNE and UMAP plots, and a boxplot (Fig 1A). The heatmap was made with ComplexHeatmap [11], a user-friendly package for drawing heatmaps. The PCA was built using factoextra [12], and the UMAP, t-SNE (Rtsne) and boxplot graphics with ggplot2 [13].
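To make explicit what a two-component PCA plot displays, here is a dependency-free Python sketch: each feature is centered, the sample covariance matrix is formed, the two leading eigenvectors are found by power iteration with deflation, and the samples are projected onto them. This is an illustration under our own assumptions, not the factoextra code that BioDiscViz uses; libraries usually compute PCA by singular value decomposition, and power iteration is swapped in here only to avoid dependencies. Features are centered but not scaled in this sketch.

```python
import math
import random


def _mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def pca_2d(samples, n_iter=500, seed=0):
    """Project samples (rows) onto their first two principal components.

    Assumes at least two samples and two features.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(p)]
    centered = [[s[j] - means[j] for j in range(p)] for s in samples]
    # Sample covariance matrix (p x p).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(n_iter):
            w = _mat_vec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:            # no variance left in this direction
                v = [0.0] * p
                break
            v = [x / norm for x in w]   # power iteration step
        eigval = sum(a * b for a, b in zip(v, _mat_vec(cov, v)))
        components.append(v)
        # Deflation: remove the found component so the next pass finds the second one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]
    # Scores: coordinates of each centered sample along PC1 and PC2.
    return [[sum(r[j] * c[j] for j in range(p)) for c in components] for r in centered]


if __name__ == "__main__":
    # Hypothetical expression values: three samples per class, three genes.
    class_a = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.0]]
    class_b = [[5.0, 5.0, 5.0], [5.1, 5.0, 5.0], [5.0, 5.1, 5.1]]
    for scores in pca_2d(class_a + class_b):
        print([round(x, 3) for x in scores])
```

Because the sign of each principal component is arbitrary, the same data can appear mirrored between tools; the relative positions of the classes are what matter.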

Fig 1. Representation of the best signature and attribute distribution sections.


A. Heatmap and PCA generated by BioDiscViz for the best model found by BioDiscML (a Kstar model) on the colon cancer dataset. Tumor tissues are shown in cyan and normal tissues in red. B. Selection of the consensus signatures with BioDiscViz on the colon cancer dataset. The 10 attributes most frequently selected by the classifiers with a Matthews correlation coefficient (MCC) ≥ 0.75 and a standard deviation ≤ 0.15 were retained.

The attribute distribution is represented as an UpSetR plot [14] (Fig 1B). UpSetR is an R package that generates static UpSet plots to visualize the intersections between the feature sets of the different classifiers.
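In the usual UpSet style, each element is counted in exactly one combination of sets: the exact set of classifiers whose signatures contain it. The Python sketch below (not UpSetR's implementation) computes those exclusive intersection sizes from a hypothetical mapping of classifier names to the features in their signatures.

```python
from collections import Counter


def exclusive_intersections(sets_by_name):
    """Count elements by the exact combination of sets that contain them (UpSet-style bars)."""
    membership = {}
    for name, elements in sets_by_name.items():
        for element in elements:
            membership.setdefault(element, set()).add(name)
    # Each element falls into exactly one combination; frozenset makes it hashable.
    return Counter(frozenset(names) for names in membership.values())


if __name__ == "__main__":
    signatures = {  # hypothetical classifiers and genes
        "clf_a": {"g1", "g2", "g3", "g6"},
        "clf_b": {"g2", "g3", "g4", "g6"},
        "clf_c": {"g3", "g5"},
    }
    for combo, count in exclusive_intersections(signatures).most_common():
        print(sorted(combo), count)
```

Here `g2` and `g6` both fall in the `{clf_a, clf_b}` bar, while `g3`, shared by all three classifiers, is counted only in the three-way bar.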

BioDiscViz also gives access to the summary details for the short signature and to a table of the data used for the short and long signatures. The table embeds the input dataset in the Shiny application, so users can consult it without leaving the application, and provides a search field for finding specific information. A particular instance of interest can thus be easily found and highlighted within the table (S1A–S1C Fig).
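The two search modes illustrated in S1 Fig (the whole table versus a single column) can be sketched in Python as follows. The sketch assumes rows are dictionaries and that a match means the query appears as a substring of the value's text; the exact matching rules of the Shiny table are not described in the article, so treat this as an approximation, and `search_table` as a hypothetical helper.

```python
def search_table(rows, query, column=None):
    """Return rows whose values contain `query` as a substring.

    If `column` is given, only that column is searched; otherwise every column is.
    """
    hits = []
    for row in rows:
        values = [row[column]] if column is not None else row.values()
        if any(query in str(v) for v in values):
            hits.append(row)
    return hits


if __name__ == "__main__":
    table = [  # hypothetical samples and expression values
        {"sample": "s1", "g1": 12.5, "g2": 3},
        {"sample": "s12", "g1": 1.0, "g2": 120},
        {"sample": "s3", "g1": 7, "g2": 8},
    ]
    print(search_table(table, "12"))               # whole-table search: first two rows
    print(search_table(table, "12", column="g1"))  # column search: first row only
```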

Because non-numerical features cannot easily be combined with numerical values in a PCA or a heatmap, BioDiscViz converts categorical features into numerical ones. Users can then display these features as annotations alongside the heatmaps and integrate the information they carry into the PCA.
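The article does not specify which encoding BioDiscViz applies to categorical features. The Python sketch below shows two common options, both assumptions here: integer (label) codes and one-hot indicator columns. The function names and the "smoking status" covariate are hypothetical.

```python
def label_encode(values):
    """Replace each category by an integer code, assigned in sorted category order."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Replace each category by a 0/1 indicator vector (one column per category)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


if __name__ == "__main__":
    smoking = ["never", "current", "former", "never"]  # hypothetical covariate
    print(label_encode(smoking))    # codes follow the sorted order current, former, never
    print(one_hot_encode(smoking))
```

Integer codes impose an arbitrary order and spacing on the categories, which PCA then treats as meaningful distances; one-hot columns avoid that at the cost of one extra column per category.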

Outputs

BioDiscViz also provides functions that facilitate the export of results for archiving, sharing and publication. The first generates a report of the graphs displayed in the application, taking into account the modifications made by the user. The second downloads a sub-dataset containing the data for the features selected in the “Attribute Distribution” section.
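The sub-dataset export can be illustrated with Python's standard `csv` module. The sketch assumes the dataset is a CSV file with one column per feature and a class column named `class`; that column name and the `subset_csv` function are hypothetical, not part of BioDiscViz.

```python
import csv
import io


def subset_csv(text, selected, class_column="class"):
    """Return CSV text keeping only the selected feature columns plus the class column."""
    reader = csv.DictReader(io.StringIO(text))
    keep = list(selected) + [class_column]
    missing = [name for name in keep if name not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in dataset: {missing}")
    out = io.StringIO()
    # extrasaction="ignore" drops every column that is not listed in `keep`.
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    data = "sample,g1,g2,g3,class\ns1,1.0,2.0,3.0,tumor\ns2,4.0,5.0,6.0,normal\n"
    print(subset_csv(data, ["g3", "g1"]), end="")
```

The selected columns are written in the order given, followed by the class column, so the exported file keeps the labels needed to rerun an analysis on the subset.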

Consensus signatures are of great interest because they can help researchers identify new molecular targets. Although the best model shows which useful features were selected, it does not necessarily use the most informative ones. We consider that the signatures most frequently selected by the classifiers carry important information for the problem at hand; we call these consensus signatures. Providing BioDiscML's models with these consensus signatures, including features left out by the best model, could therefore improve on the results obtained by the initial best model.

Results

To demonstrate the functionalities of BioDiscViz, we used a colon cancer dataset [15], which was also used in the BioDiscML publication and is available in the BioDiscViz GitLab repository. This dataset contains gene expression profiles of 40 tumor and 22 normal colon tissue samples.

Visualize the best signature

We examined the best signatures from two perspectives: first the signatures from the best model, then the consensus signatures.

For the signatures retrieved from the best model, several plots were generated. The two classes were clearly separated on the PCA and the heatmap (Fig 1A), showing that these features provide enough information for the model to correctly distinguish tumor tissues from healthy tissues. The differential expression of the genes in the model was then visualized using boxplots (Fig 2). All the signature genes showed promising results, with a clear difference in expression between the two classes.
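Each box in such a plot summarizes one class by its quartiles. The Python sketch below computes those per-class statistics with `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between the minimum and maximum; we believe this matches R's default `quantile()` convention, but treat that as an assumption. The non-overlap check is a rough visual cue, not a test of differential expression, and the values are hypothetical.

```python
from statistics import quantiles


def box_stats(values_by_class):
    """Return the quartiles and interquartile range drawn for each class in a boxplot."""
    stats = {}
    for label, values in values_by_class.items():
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        stats[label] = {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}
    return stats


def boxes_separated(stats, a, b):
    """True if the interquartile boxes of classes a and b do not overlap (a visual cue only)."""
    return stats[a]["q3"] < stats[b]["q1"] or stats[b]["q3"] < stats[a]["q1"]


if __name__ == "__main__":
    gene = {  # hypothetical expression values for one gene
        "tumor": [1.0, 2.0, 3.0, 4.0, 5.0],
        "normal": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
    s = box_stats(gene)
    print(s["tumor"], boxes_separated(s, "tumor", "normal"))
```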

Fig 2. Boxplot of the best model obtained by BioDiscML.


Biomarkers used by the best model selected by BioDiscML to classify the healthy and cancerous tissues.

Consensus signature

BioDiscML uses ensemble methods to combine signatures, but it does not take advantage of all the models generated during its learning stage. Ensemble methods also keep all features of the models’ signatures without any optimization, which increases model complexity. We hypothesized that the features frequently selected by the different models contain valuable information relevant to the problem at hand. These features, which we call consensus signatures, offer a promising avenue for constructing a new signature and for enhancing or streamlining the model. To explore them further, a dataset restricted to the consensus signatures can be generated directly in BioDiscViz. After numerous tests, we selected the top 10 signatures from the classifiers that exceeded an MCC (Matthews correlation coefficient) threshold of 0.76 and had a standard deviation of the MCC (STD MCC) no greater than 0.15 (Fig 1B). The quality of the selection was assessed using the heatmap (Fig 3A) and PCA (Fig 3B), which showed a better separation between the classes than the signature of the best model identified by BioDiscML. Compared with the best model signature, these consensus signatures comprise 3 genes shared with the best signature and 7 newly added genes. To examine these new signatures further, boxplots were used to select the genes that were differentially expressed between healthy and cancerous tissues.
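The breakdown of a consensus signature into genes shared with the best model and newly added genes is a simple set comparison. The Python sketch below uses placeholder gene names, not the genes from this study, chosen only to reproduce the 3 shared / 7 new split with a 6-attribute best signature.

```python
def compare_signatures(best, consensus):
    """Split a consensus signature into features shared with the best model and newly added ones."""
    best_set = set(best)
    shared = [f for f in consensus if f in best_set]   # keeps the consensus ranking order
    added = [f for f in consensus if f not in best_set]
    return shared, added


if __name__ == "__main__":
    best = ["gA", "gB", "gC", "gD", "gE", "gF"]                     # hypothetical 6-gene signature
    consensus = ["gA", "gB", "gC"] + [f"g{i}" for i in range(1, 8)]  # hypothetical 10-gene list
    shared, added = compare_signatures(best, consensus)
    print(len(shared), len(added))
```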

Fig 3. Graphical representation of the 10 best consensus signatures.


A. Heatmap. B. PCA.

Following the identification of the consensus signature, we ran BioDiscML a second time on the full consensus signature, without any feature selection, to find an optimal machine learning classifier. For comparison, the previous best classifier was a Kstar model with a 6-attribute signature, which had an MCC of 0.776 and a standard deviation (STD) across all evaluation procedures of 0.037, a reasonable performance considering past work on MCC evaluations [16]. This model also had an accuracy of 0.857, which is comparable to, and in some cases superior to, the results reported in the literature for this type of data [17]. With the consensus signature, we obtained a Fuzzy Lattice Reasoning model with an MCC of 0.791 (STD 0.032), which is slightly better than the previous best model (MCC increased from 0.776 to 0.791, about 1.9%, and STD decreased from 0.037 to 0.032, about 13.5%).
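The relative changes between the two models can be checked from the reported values, taking the previous best model (Kstar) as the baseline:

```latex
\Delta_{\mathrm{MCC}} = \frac{0.791 - 0.776}{0.776} = \frac{0.015}{0.776} \approx +1.9\%,
\qquad
\Delta_{\mathrm{STD}} = \frac{0.032 - 0.037}{0.037} = \frac{-0.005}{0.037} \approx -13.5\%
```

Dividing the STD difference by the new value instead (0.005/0.032) gives 15.6%; the MCC change rounds to 1.9% with either denominator.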

In conclusion, BioDiscViz provides visual support to BioDiscML and offers new insights beyond the best model through the consensus signatures. These consensus signatures can also be used to rerun BioDiscML and may improve the quality of the model.

Supporting information

S1 Fig. Illustration of a search in the table.

Search for a specific value, “12”, in the table (A). The first approach scans the entire table for any values containing “12” (B). Alternatively, the user can restrict the search to a particular column and look for values matching “12” (C).

(EPS)

Data Availability

BioDiscViz is implemented in R and is available under the GNU GPL 3 license on GitLab (https://gitlab.com/SBouirdene/biodiscviz.git) and online at https://sophiane-bouirdene.shinyapps.io/BiodiscViz_shinyapp/. The version used for this article is release 1.0.

Funding Statement

Dr Steve Bilodeau received a grant from the Canadian Institutes of Health Research (Grant Number: 387762) for the broader project encompassing BioDiscViz. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Li YH, Xu JY, Tao L, Li XF, Li S, Zeng X, et al. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity. PLOS ONE. 2016: e0155290. doi: 10.1371/journal.pone.0155290
  • 2. Leclercq M, Vittrant B, Martin-Magniette ML, Scott Boyer MP, Perin O, Bergeron A, et al. Large-Scale Automatic Feature Selection for Biomarker Discovery in High-Dimensional OMICs Data. Front Genet. 2019;10: 452. doi: 10.3389/fgene.2019.00452
  • 3. Roux-Dalvai F, Gotti C, Leclercq M, Hélie M-C, Boissinot M, Arrey TN, et al. Fast and Accurate Bacterial Species Identification in Urine Specimens Using LC-MS/MS Mass Spectrometry and Machine Learning. Mol Cell Proteomics. 2019;18: 2492–2505. doi: 10.1074/mcp.TIR119.001559
  • 4. Rabaglino MB, Kadarmideen HN. Machine Learning Approach to Integrated Endometrial Transcriptomic Datasets Reveals Biomarkers Predicting Uterine Receptivity in Cattle at Seven Days after Estrous. Sci Rep. 2020;10: 1–10. doi: 10.1038/s41598-020-72988-3
  • 5. Khorraminezhad L, Leclercq M, O’Connor S, Julien P, Weisnagel SJ, Gagnon C, et al. Dairy Product Intake Modifies Gut Microbiota Composition among Hyperinsulinemic Individuals. Eur J Nutr. 2020;60: 159–167. doi: 10.1007/s00394-020-02226-z
  • 6. Doré E, Joly-Beauparlant C, Morozumi S, Mathieu A, Lévesque T, Allaeys I, et al. The Interaction of Secreted Phospholipase A2-IIA with the Microbiota Alters Its Lipidome and Promotes Inflammation. JCI Insight. 2022;7. doi: 10.1172/jci.insight.152638
  • 7. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2021. Available from: https://www.R-project.org/.
  • 8. Chang W, Cheng J, Allaire J, Sievert C, Schloerke B, Xie Y, et al. shiny: Web Application Framework for R. R package version 1.7.1. Available from: https://rstudio.github.io/shiny/index.html.
  • 9. Chang W, Borges Ribeiro B. shinydashboard: Create Dashboards with 'Shiny'. R package version 0.7.2. Available from: https://rstudio.github.io/shinydashboard/.
  • 10. RStudio Team. RStudio: Integrated Development for R. Boston, MA, USA: RStudio; 2020. Available from: https://github.com/rstudio/rstudio.
  • 11. Gu Z, Eils R, Schlesner M. Complex Heatmaps Reveal Patterns and Correlations in Multidimensional Genomic Data. Bioinformatics. 2016;32: 2847–2849. doi: 10.1093/bioinformatics/btw313
  • 12. Kassambara A, Mundt F. factoextra: Extract and Visualize the Results of Multivariate Data Analyses. R package version 1.0.7. Available from: https://cran.r-project.org/web/packages/factoextra/readme/README.html.
  • 13. Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag. ISBN 978-3-319-24277-4. Available from: https://ggplot2.tidyverse.org.
  • 14. Conway JR, Lex A, Gehlenborg N. UpSetR: An R Package for the Visualization of Intersecting Sets and Their Properties. Bioinformatics. 2017;33: 2938–2940. doi: 10.1093/bioinformatics/btx364
  • 15. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, et al. Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays. Proc Natl Acad Sci U S A. 1999;96: 6745–6750. doi: 10.1073/pnas.96.12.6745
  • 16. Schober P, Boer C, Schwarte LA. Correlation Coefficients: Appropriate Use and Interpretation. Anesth Analg. 2018;126: 1763–1768. doi: 10.1213/ANE.0000000000002864
  • 17. Fahami M, Roshanzamir M, Izadi N, Keyvani V, Alizadehsani R. Detection of Effective Genes in Colon Cancer: A Machine Learning Approach. Inform Med Unlocked. 2021;24: 100605.

Decision Letter 0

Eduardo Andrés-León

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

30 May 2023

PONE-D-22-29004

BioDiscViz: a visualization support and consensus signature selector for BioDiscML results

PLOS ONE

Dear Dr. Droit,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. The reviewers think that the manuscript is interesting, but in-depth modifications are needed. Please address all of their comments.

Please submit your revised manuscript by Jul 14 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Eduardo Andrés-León

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is not an original research paper, but rather a technical note of a visualisation extension of an already existing software. I do not think it fits the scope of this journal and therefore, rejection is recommended

Reviewer #2: This manuscript presents BioDiscViz, an R-based visualization package for understanding and investigating the results outputted by BioDiscML. I think the tool itself is interesting and could be very useful to the users of the BioDiscML package, the presentation of the tool in this manuscript is also good but could use some improvement. For that reason, I think this manuscript is suited for publication with some editions.

Here are my suggestions for the authors to consider:

1. I was somewhat confused about the layout of BioDiscViz when I read the paper. More specifically, the authors mentioned 4 main parts, I was not sure what it was referring to, and only two of them are presented in figures 1A and 1B, so I was wondering what's on the other two parts. It's probably better to show the whole UI first, introduce each part on the UI, and focus on each individual section in later parts, or at least rearrange the existing text so it's more clear.

I actually find the "usage" part of the documentation included in the gitlab repo to be much more clear and more informative, maybe the authors could try to take some inspiration from there.

2. The authors mentioned the tool could provide an "interactive table" of the data used for the short and long signatures. I'm not quite sure what that means, but it sounds like an interesting feature. It would be great to see more details about it and probably an illustration if the authors find it fit.

3. For a visualization tool, some of the plots are a little bit crude and hard to read. One simplest changes I would suggest is increasing the font sizes of the plots, as well as the size of the scatters on the PCA plots. It would be best to make them customizable by the users.

4. Other dimension reduction plots might also be useful, like Umap, TSNE, in addition to PCA, just something to consider.

Reviewer #3: The paper presents an application, BioDiscViz, implementing different visualization techniques and exploratory tools to investigate BioDiscML outputs, that is outputs from a software implementing different Machine Learning algorithms for the identification of biomarker signatures. While I believe that the developed application has merit and that the paper is clear and well written, I have some comments for the authors.

- Currently, the application is written using the R library Shiny, but it is only hosted locally on one of the authors' gitlab pages (https://gitlab.com/SBouirdene/biodiscviz). The Shiny environment allows for free and open access web publication of Shiny applications, via https://www.shinyapps.io/. I believe the authors should make their application easily accessible on the web. In its current state, it requires for users to manually install it and run it locally in R, after having cloned the gitlab repository containing the application. This is a somehow lengthy process that could inhibit use of the application and I believe should be avoided as primary option to access it, given that a freely available platform for publishing is available. You could still leave the current option (run the app locally) as a secondary one, for more experienced users to choose.

- In relation to my previous comment, multiple packages (ComplexHeatmap - which has to be manually installed from Bioconductor, Rtools - for which the corresponding drives need to be downloaded and installed manually) are required to run the app. However, the requirement of having to install such dependencies is not stated in the instructions available in the gitlab page. This results in quite a long and multiple steps installation and deployment process for a user who does not have these library already pre-installed locally. I suggest the authors improve such aspect.

- While the illustration of usage of the app is clear on the gitlab page and on the paper, no runnable toy example is available on the app. I believe including a toy example, which would not require the upload of a BioDiscML output, would be a very useful feature of the app, that could serve better illustrating its functioning to first time users. You could consider including the same example discussed in this paper and in the gitlab page.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Xiangyun Lei

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Nov 30;18(11):e0294750. doi: 10.1371/journal.pone.0294750.r002

Author response to Decision Letter 0


18 Jul 2023

Reviewer #1: This is not an original research paper, but rather a technical note of a visualisation extension of an already existing software. I do not think it fits the scope of this journal and therefore, rejection is recommended

While we respect the opinion of the reviewer, our manuscript fits the scope of PLOS ONE. According to the stated criteria, a tool must be useful to the community and demonstrate a clear advantage over existing alternatives, if applicable. For BioDiscViz, there is currently no comparable visualization tool for machine learning results from software like BioDiscML, where many models are computed at once. Additionally, BioDiscViz represents an original contribution that significantly enhances the interpretation of machine learning results, particularly in biology, where researchers often have limited knowledge of computer science. Therefore, there is a genuine need for a tool like BioDiscViz.

Reviewer #2: This manuscript presents BioDiscViz, an R-based visualization package for understanding and investigating the results outputted by BioDiscML. I think the tool itself is interesting and could be very useful to the users of the BioDiscML package, the presentation of the tool in this manuscript is also good but could use some improvement. For that reason, I think this manuscript is suited for publication with some editions.

We are pleased that the reviewer agrees with our assessment that the manuscript describes a potentially very useful tool. We have carefully considered all suggestions to enhance the research paper and the accompanying tool. We are resubmitting improved versions addressing all the comments.

1. I was somewhat confused about the layout of BioDiscViz when I read the paper. More specifically, the authors mentioned four main parts, but I was not sure what they referred to, and only two of them are presented in Figures 1A and 1B, so I was wondering what the other two parts contain. It would probably be better to show the whole UI first, introduce each part of the UI, and then focus on each individual section later, or at least rearrange the existing text so it is clearer.

We thank the reviewer for pointing out this lack of clarity. We revised the Layout section and incorporated the explanation provided on GitLab to improve clarity and understanding of our tool.

2. The authors mentioned the tool could provide an "interactive table" of the data used for the short and long signatures. I'm not quite sure what that means, but it sounds like an interesting feature. It would be great to see more details about it and probably an illustration if the authors find it fit.

The reviewer is raising an interesting question. The interactive table was not a primary feature of our tool, so we did not describe it in detail in the paper. The idea was to let users access the input table without leaving the application, while giving them an easier way to select and search for instances or values. We now include a brief explanation of the table and have supplemented it with illustrations in the supplementary data to demonstrate its functionality. Furthermore, as we realized the term was not the most appropriate, we now simply use "table" to describe it.

3. For a visualization tool, some of the plots are a little crude and hard to read. One of the simplest changes I would suggest is to increase the font size of the plots, as well as the point size on the PCA plots. It would be best to make these customizable by the users.

The reviewer's comment relates to the trade-off between simplicity and versatility of the tool. Our primary goal was to make the application highly user-friendly by minimizing the number of interactive parameters that users need to manage. However, we acknowledge that this approach could be problematic if the plots are difficult to read. Therefore, we have implemented a reactive value for the font size of the plots. This allows users to adjust the font size at any time, addressing readability issues and giving them greater control over their viewing experience.

4. Other dimension reduction plots, such as UMAP and t-SNE, might also be useful in addition to PCA; just something to consider.

We agree with the reviewer that additional dimension reduction plots would be useful. UMAP and t-SNE plots are now included.

Reviewer #3: The paper presents an application, BioDiscViz, implementing different visualization techniques and exploratory tools to investigate BioDiscML outputs, that is outputs from a software implementing different Machine Learning algorithms for the identification of biomarker signatures. While I believe that the developed application has merit and that the paper is clear and well written, I have some comments for the authors.

We are grateful that the reviewer appreciated the merit of our application. This new version of our manuscript addresses all the concerns.

- Currently, the application is written using the R library Shiny, but it is only available locally through one of the authors' GitLab pages (https://gitlab.com/SBouirdene/biodiscviz). The Shiny environment allows free and open-access web publication of Shiny applications via https://www.shinyapps.io/. I believe the authors should make their application easily accessible on the web. In its current state, users must manually install it and run it locally in R after cloning the GitLab repository containing the application. This is a somewhat lengthy process that could discourage use of the application, and I believe it should not be the primary way to access it, given that a freely available publishing platform exists. You could still keep the current option (running the app locally) as a secondary one for more experienced users.

We thank the reviewer for the suggestion. To provide a web alternative that caters to users' needs, the application is now hosted on shinyapps.io, using the free tier.

- In relation to my previous comment, multiple dependencies are required to run the app (ComplexHeatmap, which has to be installed manually from Bioconductor, and Rtools, which has to be downloaded and installed manually). However, the need to install these dependencies is not stated in the instructions on the GitLab page. This results in a long, multi-step installation and deployment process for users who do not already have these libraries installed locally. I suggest the authors improve this aspect.

Once again, the reviewer raised an important limitation. To prioritize user-friendliness, we improved the tool so that it handles Bioconductor directly within the app: required packages are downloaded automatically if they are not already present on the user's system. Additionally, we added a reference in the README file to guide users in downloading Rtools. Furthermore, the installation of all libraries used in the app is now handled automatically when the user runs it. These enhancements simplify the setup process and provide a smoother overall experience.

- While the usage of the app is clearly illustrated on the GitLab page and in the paper, no runnable toy example is available in the app. I believe a toy example that does not require uploading a BioDiscML output would be a very useful feature, as it could better illustrate how the app works to first-time users. You could consider including the same example discussed in this paper and on the GitLab page.

Initially, we included the dataset used for the research paper as an example in the "example" directory, aiming to help users become acquainted with the tool. However, it seems like our decision created confusion. We have now implemented an "example" button, which automatically loads the data specifically for this example, ensuring a better user experience. Additionally, to provide further clarity on the functioning of the app, we included a step-by-step video guide, allowing users to follow along and better understand the app's functionality. These additions will enhance user understanding and usability of the tool.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Xiaoyong Sun

20 Sep 2023

PONE-D-22-29004R1

BioDiscViz : a visualization support and consensus signature selector for BioDiscML results

PLOS ONE

Dear Dr. Droit,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we have decided that your manuscript does not meet our criteria for publication and must therefore be rejected.

Specifically:

- Experiments, statistics, and other analyses are NOT performed to a high technical standard and are NOT described in sufficient detail. The visualization function can be easily achieved with R itself, and the software is NOT performed to a high technical standard.

I am sorry that we cannot be more positive on this occasion, but hope that you appreciate the reasons for this decision.

Kind regards,

Xiaoyong Sun

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for addressing the comments. The manuscript has improved and is now suitable for publication.

Reviewer #2: I think the revised version properly addresses my concerns about the initial draft. The manuscript is clearer now and is ready to be accepted. There are a few minor things for the authors to consider.

- minor typo: line 61 (might be worth to proofread again to catch other potential typos)

- it might be a good idea to include a proper license for the software, even if it's meant to be open-sourced and used by others with no restriction

- It's up to the authors, but for a software package release that might be continually maintained and updated, it might also be a good idea to create an official release (give it a version number), and archive it using Zenodo/figshare, which the authors can refer to in the paper - just so that the developer could continue working on the repo in the future, but readers would have archived version to refer to.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]


PLoS One. 2023 Nov 30;18(11):e0294750. doi: 10.1371/journal.pone.0294750.r004

Author response to Decision Letter 1


17 Oct 2023

Response to editor

The visualization function included in our software, coded in R, answers the need for fast evaluation of machine learning models. While it is true that the visualizations could be produced directly in R, our Shiny application was designed to streamline and automate the visualization process, making it accessible to researchers who may not have coding expertise. This feature enhances the usability and accessibility of our software, in line with the broader goal of facilitating scientific research.

Response to reviewers

We express our gratitude to the reviewers for dedicating their time to assessing the second version of our manuscript, and we thank them for their positive feedback. In response, we have corrected the typo noted in section 6. Additionally, in accordance with their recommendations, we have included a GNU GPL license for our software. Furthermore, we have created a versioned release in the BioDiscViz GitLab repository, which can be accessed via the following link: https://gitlab.com/SBouirdene/biodiscviz.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Achraf El Allali

9 Nov 2023

BioDiscViz : a visualization support and consensus signature selector for BioDiscML results

PONE-D-22-29004R2

Dear Dr. Droit,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Achraf El Allali, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

I would have liked to see BioDiscML fully integrated, but I believe that the proposed package will add real value for users who do not have the expertise to extract relevant plots and information from BioDiscML output. The authors are encouraged to maintain the web version and to host it either on a paid Shiny plan or on their organization's web servers.

Reviewers' comments:

Acceptance letter

Achraf El Allali

20 Nov 2023

PONE-D-22-29004R2

BioDiscViz : a visualization support and consensus signature selector for BioDiscML results

Dear Dr. Droit:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Achraf El Allali

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Illustration of a search in the table.

    Search for a specific instance: “12” in the table (A). The first approach involves scanning the entire table for any values that contain “12” (B). Alternatively, the user can focus on a particular column and instance and search for the one with a value of “12” (C).

    (EPS)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    BioDiscViz is implemented in R and is available under the GNU GPL 3 license on GitLab (https://gitlab.com/SBouirdene/biodiscviz.git) and online at https://sophiane-bouirdene.shinyapps.io/BiodiscViz_shinyapp/. The version used for this article can be found under release 1.0.

