NeuroVar: an open-source tool for the visualization of gene expression and variation data for biomarkers of neurological diseases

Hiba Ben Aribi; Najla Abassi; Olaitan I Awe

doi:10.46471/gigabyte.143

. 2024 Nov 25;2024:gigabyte143. doi: 10.46471/gigabyte.143

NeuroVar: an open-source tool for the visualization of gene expression and variation data for biomarkers of neurological diseases

Hiba Ben Aribi ^1,^*, Najla Abassi ², Olaitan I Awe ^3,⁴

PMCID: PMC11612633 PMID: 39629064

Abstract

The expanding availability of large-scale genomic data and the growing interest in uncovering gene-disease associations call for efficient tools to visualize and evaluate gene expression and genetic variation data. Here, we developed a comprehensive pipeline that was implemented as an interactive Shiny application and a standalone desktop application. NeuroVar is a tool for visualizing genetic variation (single nucleotide polymorphisms and insertions/deletions) and gene expression profiles of biomarkers of neurological diseases. Data collection involved filtering biomarkers related to multiple neurological diseases from the ClinGen database. NeuroVar provides a user-friendly graphical user interface to visualize genomic data and is freely accessible on the project’s GitHub repository (https://github.com/omicscodeathon/neurovar).

Statement of need

Disease biomarkers are genes or molecules that indicate the presence or severity of a disease. Their identification provides important insights into disease etiology and can facilitate the development of new treatments and therapies [1]. Integrating multi-omics data, such as gene expression and genetic variations, has emerged as a powerful approach for biomarker discovery.

Several genomics studies have discovered multiple genetic variations linked to numerous neurological conditions that are complex diseases with a significant level of heterogeneity, such as Alzheimer’s disease [2] and Parkinson’s disease [3]. Some studies have also used genetic variants to detect the presence of human disorders [4].

The discovered biomarkers are extensively documented in various scientific publications and are accessible through databases like the Clinical Genome (ClinGen) database. ClinGen stores a vast amount of genomic data, including a comprehensive dataset of biomarkers associated with multiple diseases, such as various neurological disorders [5].

Multiple computational tools have been developed in recent years to analyze genomic data, including gene expression data analysis [6, 7], identification of potential inhibitors for therapeutic targets [8], and comparative analysis of molecular and genetic evolution [9]. However, there is still a need for a specialized tool that focuses on filtering critical disease biomarkers, as this will help in studies that work on finding genes that are involved in diseases using transcriptomic data generated from sequencing experiments [10–13]. Such a tool would help users identify phenotypic subtypes of diseases in their patients, thereby facilitating more accurate diagnoses and personalized treatment plans.

In this study, we developed a novel tool named “NeuroVar” to analyze biomarker data for neurological diseases specifically, including gene expression profiles and genetic variations such as single nucleotide polymorphisms (SNPs) and nucleotide insertion and/or deletion (Indels).

Implementation

Data collection

The ClinGen database [5] provides a dataset of biomarkers of multiple diseases from which we filtered data of all the available neurological syndromes (eleven) and non-neurological diseases with neurological manifestations (seven).

Software development

Two versions of the tool were developed: an R shiny and a desktop application.

The shiny application was developed using multiple R packages, including Shiny (RRID:SCR_001626) [14] and shinydashboard [15]. Other R packages are used for data manipulation, including dplyr (RRID:SCR_016708) [16], readr [17], tidyverse (RRID:SCR_019186) [18], purrr (RRID:SCR_021267) [19], vcfR (RRID:SCR_023453) [20], bslib [21], stringr (RRID:SCR_022813) [22], data.table [23], fs [24], DT [25], sqldf [26], and ggplot2 (RRID:SCR_014601) [27].

For the stand-alone desktop application, the wxPython framework [28] was used to build a similar GUI. A variety of Python libraries were employed, including Pandas (RRID:SCR_018214) [29], MatPlotLib (RRID:SCR_008624) [30], and NumPy (RRID:SCR_008633) [31]. After testing, the application was packaged as an installer using cx_Freeze [32]. Finally, it was distributed as a zip file to be downloaded.

Pipeline validation and case study

To validate the pipeline, a case study was performed on the public dataset SRP149638 [33] available on the SRA database [34]. The dataset corresponds to RNA sequencing data from the peripheral blood mononuclear cells from healthy donors and Amyotrophic Lateral Sclerosis (ALS) patients. The ALS patients involved in the study have mutations in the FUS, TARDBP, SOD1, and VCP genes.

The file’s preprocessing, genetic expression analysis, and variant calling were performed using the Exvar R package [35]. The Exvar package uses the rfastp package [36] and the gmapR package [37] for preprocessing fastq files, the GenomicAlignments package (RRID:SCR_024236) [38], and the DESeq2 packages (RRID:SCR_015687) [39] for gene expression data analysis, as well as the VariantTools [40] and VariantAnnotation (RRID:SCR_000074) [41] packages for variant calling.

Results

Supported disease

NeuroVar integrates biomarkers of multiple neurological diseases, including epilepsy, ALS, intellectual disability, autism spectrum disorder, brain malformation syndrome, syndromic disorders, cerebral palsy, RASopathy, aminoacidopathy, craniofacial malformations, Parkinson’s disease, and PHARC syndrome. It also integrates seven non-neurological diseases with neurological manifestations: peroxisomal disorders, hereditary cancer, mitochondrial disease, retina-related disorders, general gene curation, hearing loss, and fatty acid oxidation disorders. Each disease syndrome includes multiple disease types; for example, sixteen types of ALS disorder are integrated.

Operation and implementation

The desktop and Shiny applications have the same user interface; however, the implementation is different.

The Shiny application is platform-independent, while the desktop application is optimized for the Windows operating system. The necessary library requirements for the tool are automatically installed in both versions. The amount of RAM used depends on the servers or the machine being used, and the only prerequisites for using the tool are having R installed for the shiny application and having Python installed for the desktop application.

The tool is compatible with RNA sequencing data. The input data files should be in CSV format for gene expression data and VCF (Variant Call Format) format for genetic variants data. Guidance of the files’ organization is available in the tool’s Github repository in detail (path: omicscodeathon/neurovar/demonstration_data).

Detailed guidelines for installing and using both versions of the application are provided in the project’s GitHub repository.

The application’s usage

The application dashboard includes three pages. The first page, named “Biomarker”, provides data on the disease’s biomarkers. Initially, the user should select the target disease syndrome and the specific disease subtypes from the provided list (Figure 1).

Figure 1. — The layout of the “Biomarker” page. The user is requested to define the target disease, disease type, and gene of interest.

Next, a list of biomarkers is provided with additional data, including the gene’s mode of inheritance, description, type, and transcripts. Also, a link for the official online report validating the gene’s association with the disease is provided (Figure 2).

Figure 2. — The output of the “Biomarker” page. The output includes two tables detailing key information about the selected gene.

The second page, named “Expression”, is used to visualize the biomarkers expression profile. After importing a CSV file and identifying the key columns, the log2FC value and adjusted p-value are requested to define the differential expression profile. By default, the adjusted p-value is set to less than 0.01, and the logFC value is set to less or more than 2 (Figure 3).

Figure 3. — The layout of the “Expression” page. The user is requested to upload the data file and select the p-value and the log-FC value required to construct the differential expression profile.

As a result, the expression profiles of the target disease biomarkers (previously selected) are summarized in a table and represented in a volcano plot (Figure 4).

Figure 4. — The output of the “Expression” page. As output, a summary of the genes’ expression profiles is displayed in a table and a volcano plot.

The third page, named “Variants”, allows the visualization of SNPs and Indels data. The user is requested to define the path to the directory containing the VCF files. The files are expected to be divided into two folders, named “controls” and “patients”, containing the VCF files of the controls and patients, respectively. The user needs to define the variant type as SNPs or Indels (Figure 5).

Figure 5. — The layout of the “Variant” page. The user is prompted to specify the path to the data-containing folder and the data type.

The VCF files are processed and annotated, and then the variants in the target disease biomarkers are filtered and resumed in a table comparing the reference genome, the control group, and the patients’ group (Figures 6 and 7).

Figure 6. — The output of the “Variant” page. Table summarizing the SNPs in the target disease’s biomarkers.

Figure 7. — The output of the “Variant” page. Table summarizing the INDELs in the target disease’s biomarkers.

Case study results

To validate the pipeline, we conducted a case study using the public dataset that provides RNA sequencing data of ALS patients who were declared to carry mutations in the FUS, TARDBP, SOD1, and VCP genes [33].

Initially, we used NeuroVar to explore the roles of the genes FUS, TARDBP, SOD1, and VCP in ALS. Our findings confirmed that FUS, TARDBP, and SOD1 are recognized ALS biomarkers, while VCP is not. ALS has 26 subtypes, with FUS being a biomarker for type 6, SOD1 for type 1, and TARDBP for type 10, suggesting that the patients in the study may represent a mixture of these ALS subtypes.

Next, we investigated whether mutations in these genes impacted their expression profiles. Using an adjusted p-value threshold of 0.05 and a log fold change (logFC) cutoff of 2, we found that out of 21 known ALS biomarkers, only one gene—TUBA4A—was differentially expressed. Notably, none of the four genes (FUS, TARDBP, SOD1, and VCP) showed differential expression.

Finally, we examined the types of mutations present in these genes. We detected 23 SNPs across seven biomarkers: DAO (all ALS types), FIG4 (ALS type 11), ERBB4 (ALS type 19), TUBA4A (ALS type 22), KIF5A (ALS type 25), C9orf72 (ALS type 1), and TBK1 (ALS type 4). No indels were detected in any of the biomarkers. Interestingly, the biomarkers FUS, TARDBP, and SOD1 exhibited neither SNPs nor Indels, suggesting that the mutations in these genes may be due to other types of genomic changes.

A demonstration video describing how to visualize the demonstration data using neurovar is available on GitHub (Figure 8).

Discussion and conclusion

NeuroVar is a novel tool for visualizing genetic variation and gene expression data related to neurological diseases. The tool is designed to visualize genetic variation and gene expression data, with a particular emphasis on neurological disorders. This specialization makes it an invaluable resource for researchers and clinicians focused on these conditions. It offers features to filter biomarkers by specific diseases, which aids in confirming gene-disease associations and prioritizing genes for further investigation.

The tool supports eleven neurological syndromes and seven non-neurological diseases with neurological manifestations. While the supported diseases list is currently limited to data from the ClinGen database, it will be frequently updated, and data sources will be expanded to include other databases in the future.

NeuroVar is available as a desktop application and as a Shiny application. Both versions are user-friendly and do not require computational skills to operate them. Additionally, all necessary dependencies are automatically installed with the tools. This dual accessibility of NeuroVar caters to users with varying preferences and technical backgrounds, which makes it more accessible and easier to use than other visualization tools of genetic variant data, such as the command line tool VIVA [42] to analyze VCF files and the “Transcriptomics oSPARC” web tool for gene expression data visualization hosted on the o²S²PARC platform (RRID:SCR_018997) [6].

In addition to its user-friendly design, NeuroVar streamlines the research workflow by eliminating the need for multiple filtering steps across different platforms. By integrating essential functions within a single interface, it allows users to conduct comprehensive analyses without leaving the application, thereby enhancing efficiency and reducing errors. The inclusion of a quick-access library on the first page further aids in referencing important data, making it easier to revisit and validate findings. This centralization of tasks, coupled with a focus on neurological diseases and extensive biomarker information, makes NeuroVar a highly useful tool for advancing research in the field.

Availability of source code and requirements

Project name: NeuroVar
Project home page: https://github.com/omicscodeathon/neurovar
Operating system: Platform independent
Programming language: Python and R
Other requirements: None
License: Artistic license 2.0
RRID: SCR_025640
DOI for the Project’s GitHub Repository: https://doi.org/10.5281/zenodo.13375646
DOI for the Shiny application: https://doi.org/10.5281/zenodo.13375493
DOI for the desktop application: https://doi.org/10.5281/zenodo.13375579
DOI for the data: https://doi.org/10.5281/zenodo.13375591.

Acknowledgements

The authors thank the National Institutes of Health (NIH) Office of Data Science Strategy (ODSS), and the National Center for Biotechnology Information (NCBI) for their immense support before and during the April 2023 Omics codeathon organized by the African Society for Bioinformatics and Computational Biology (ASBCB).

Data availability

The following resources can be accessed in the project’s GitHub repository, https://github.com/omicscodeathon/neurovar:

The open-source code for both the Shiny application and the desktop application.
An installation guide.
A video demonstration.
The processed case study data is available as demonstration data in Zenodo [43].

Data came from ClinVar, and the presented case study was performed on the public dataset SRP149638 from the SRA database.

The open source code of the Shiny application and the desktop application are available in the project’s GitHub Repository: https://github.com/omicscodeathon/neurovar.

Installation Guide, demonstration data, and video demonstration (Figure 8) are also available in the project’s GitHub Repository: https://github.com/omicscodeathon/neurovar.

Snapshots of the project code [45], shiny application code [46], and desktop application code [47] are all in Zenodo.

Abbreviations

ALS: Amyotrophic Lateral Sclerosis; Indel: insertion and/or deletion; logFC: log fold change; SNP: single nucleotide polymorphism; VCF: Variant Call Format.

Declarations

Ethics approval and consent to participate

The authors declare that ethical approval was not required for this type of research.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

HBA: Conceptualization, Methodology, Validation, Writing; NA: Data Analysis, Methodology, Validation, Writing; OIA: Resources, Manuscript review, and Project Supervision.

Funding

The authors declare that no financial support was received for the research, authorship, and/or publication of this article.

References

1.Hirschhorn JN, Lohmueller KE, Byrne E et al. A comprehensive review of genetic association studies. Genet. Med., 2002; 4(2): 45–61. doi: 10.1097/00125817-200203000-00002. [DOI] [PubMed] [Google Scholar]
2.Kunkle BW, Grenier-Boley B, Sims R et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat. Genet., 2019; 51: 414–430. doi: 10.1038/s41588-019-0358-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Li W, Fu Y, Halliday GM et al. PARK genes link mitochondrial dysfunction and Alpha-Synuclein pathology in sporadic Parkinson’s disease. Front. Cell Dev. Biol., 2021; 9: 612476. doi: 10.3389/fcell.2021.612476. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Wesonga RM, Awe OI. . An assessment of traditional and genomic screening in newborns and their applicability for Africa. Inform. Med. Unlocked, 2022; 32: 101050. doi: 10.1016/j.imu.2022.101050. [DOI] [Google Scholar]
5.Rehm HL, Berg JS, Brooks LD et al. ClinGen—the clinical genome resource. N. Engl. J. Med., 2015; 372(23): 2235–2242. doi: 10.1056/NEJMsr1406261. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Aribi HB, Ding M, Kiran A. . Gene expression data visualization tool on the o²S²PARC platform. F1000Research, 2022; 11: 1267. doi: 10.12688/f1000research.126840.2. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Die JV, Elmassry MM, LeBlanc KH et al. geneHummus: an R package to define gene families and their expression in legumes and beyond. BMC Genom., 2019; 20: 591. doi: 10.1186/s12864-019-5952-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Ogbodo UC, Enejoh OA, Okonkwo CH et al. Computational identification of potential inhibitors targeting cdk1 in colorectal cancer. Front. Chem., 2023; 11: 1264808. doi: 10.3389/fchem.2023.1264808. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Awe OI, En najih N, Nyamari MN et al. Comparative study between molecular and genetic evolutionary analysis tools using African SARS-CoV2 variants. Inform. Med. Unlocked, 2023; 36: 101143. doi: 10.1016/j.imu.2022.101143. [DOI] [Google Scholar]
10.El Abed F, Baraket G, Nyamari MN et al. Differential expression analysis of miRNAs and mRNAs in epilepsy uncovers potential biomarkers. bioRxiv. 2023; 10.1101/2023.09.11.557132. [DOI]
11.Chikwambi Z, Hidjo M, Chikondowa P et al. Multi-omics data integration approach identifies potential biomarkers for Prostate cancer. bioRxiv. 2023; 10.1101/2023.09.11.557129. [DOI]
12.Nyamari MN, Omar KM, Fayehun AF et al. Expression level analysis of ACE2 receptor gene in African-American and non-African-American COVID-19 patients. bioRxiv. 2023; 10.1101/2023.09.11.557129. [DOI]
13.Nzungize L, Kengne-Ouafo JA, Wesonga MR et al. Transcriptional profiles analysis of COVID-19 and malaria patients reveals potential biomarkers in children. bioRxiv. 2022; 10.1101/2022.06.30.498338. [DOI]
14.Chang W, Cheng J, Allaire J et al. Shiny: web application framework for R. R package version 1.7.4.9002. 2023; https://shiny.rstudio.com/.
15.Chang W, Borges Ribeiro B. . Create dashboards with ‘Shiny’. 2023; http://rstudio.github.io/shinydashboard/.
16.Wickham H, François R, Henry L et al. dplyr: a grammar of data manipulation. 2022; https://github.com/tidyverse/dplyr.
17.Wickham H, Hester J, Bryan J. . readr: read rectangular text data. 2022; https://readr.tidyverse.org, https://github.com/tidyverse/readr.
18.Wickham H, Averick M, Bryan J et al. Welcome to the tidyverse. J. Open Source Softw., 2019; 4(43): 1686. doi: 10.21105/joss.01686. [DOI] [Google Scholar]
19.Wickham H, Henry L. . purrr: functional programming tools. 2023; https://purrr.tidyverse.org.
20.Knaus BJ, Grünwald NJ. . VCFR: a package to manipulate and visualize variant call format data in R. Mol. Ecol. Resour., 2017; 17(1): 44–53. ISSN 757. doi: 10.1111/1755-0998.12549. [DOI] [PubMed] [Google Scholar]
21.Sievert C, Cheng J. . bslib: custom ‘bootstrap’ ‘sass’ themes for ‘shiny’ and ‘rmarkdown’. 2023; https://github.com/rstudio/bslib.
22.Wickham H. . stringr: simple, consistent wrappers for common string operations. 2022; https://stringr.tidyverse.org, https://github.com/tidyverse/stringr.
23.Dowle M, Srinivasan A, Gorecki J et al. Data.table: extension of ‘data.frame’. 2023; https://github.com/Rdatatable/data.table.
24.Hester J, Wickham H, Csárdi G. . Fs: cross-platform file system operations based on ‘libuv’. 2023; https://fs.r-lib.org.
25.Xie Y, Cheng J, Tan X et al. DT: a wrapper of the JavaScript library ‘DataTables’. 2023; https://github.com/rstudio/DT.
26.Grothendieck G. . sqldf: manipulate R data frames using SQL. 2017; Retrieved from https://cran.r-project.org/web/packages/sqldf/index.html.
27.Wickham H. . ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag, 2016. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org/. [Google Scholar]
28.Talbot H. . WxPython, a GUI toolkit. Linux J., 2000; 2000(74): 5. https://dl.acm.org/citation.cfm?id=349312. [Google Scholar]
29.McKinney W. . Data structures for statistical computing in python. In: Proceedings of the 9th Python in Science Conference. vol. 445, 2010; pp. 51–56, doi: 10.25080/Majora-92bf1922-00a. [DOI] [Google Scholar]
30.Hunter JD. . Matplotlib: a 2D graphics environment. Comput. Sci. Eng., 2007; 9(3): 90–95. doi: 10.1109/MCSE.2007.55. [DOI] [Google Scholar]
31.Harris CR, Millman KJ, Van Der Walt SJ et al. Array programming with NumPy. Nature, 2020; 585(7825): 357–362. doi: 10.1038/s41586-020-2649-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Duarte M. . GitHub - marcelotduarte/cx_Freeze: create standalone executables from Python scripts, with the same performance and is cross-platform. 2023; https://github.com/marcelotduarte/cx_Freeze.
33.Zucca S, Gagliardi S, Pandini C et al. RNA-Seq profiling in peripheral blood mononuclear cells of amyotrophic lateral sclerosis patients and controls. Sci. Data, 2019; 6(1): 1–8. doi: 10.1038/sdata.2019.6. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Leinonen R, Sugawara H, Shumway M. . International nucleotide sequence database collaboration. The sequence read archive. Nucleic Acids Res., 2011; 39(Database issue): D19–D21. doi: 10.1093/nar/gkq1019. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Ben Aribi H, Dixon I, Abassi N et al. Exvar: an R package for gene expression and genetic variation data analysis and visualization. 2023; https://github.com/omicscodeathon/Exvar.
36.Wang W, Carroll T. . Rfastp: an ultra-fast and all-in-one fastq preprocessor (quality control, adapter, low quality and polyX trimming) and UMI sequence parsing). R package (Version 1.6.0). Bioconductor. 2022; 10.18129/B9.bioc.Rfastp. [DOI]
37.Barr C, Wu T, Lawrence M. . gmapR: an R interface to the GMAP/GSNAP/GSTRUCT suite (Version 1.40.0). Bioconductor. 2022; 10.18129/B9.bioc.gmapR. [DOI]
38.Lawrence M, Huber W, Pagès H et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol., 2013; 9(8): e1003118. doi: 10.1371/journal.pcbi.1003118. [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Love MI, Huber W, Anders S. . Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol., 2014; 15: 550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Lawrence M, Degenhardt J, Gentleman R. . VariantTools: tools for exploratory analysis of variant calls (Version 1.40.0). Bioconductor. 2022; 10.18129/B9.bioc.VariantTools. [DOI]
41.Obenchain V, Lawrence M, Carey V et al. VariantAnnotation: a Bioconductor package for exploration and annotation of genetic variants. Bioinformatics, 2014; 30(14): 2076–2078. doi: 10.1093/bioinformatics/btu168. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Tollefson GA, Schuster J, Gelin F et al. VIVA (visualization of variants): a VCF file visualization tool. Sci. Rep., 2019; 9(1): 12648. doi: 10.1038/s41598-019-49114-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Ben Aribi H, Abassi N, Awe OI. . NeuroVar demonstration data [data set]. Zenodo. 2024; 10.5281/zenodo.13375591. [DOI]
44.Ben Aribi H. . Video demonstration: neurovar shiny application. Youtube. 2024; https://www.youtube.com/watch?v=cYZ8WOvabJs.
45.Ben Aribi H, Abassi N, Awe OI. . NeuroVar: a genetic expression and variation data visualization tool for neurological diseases’ biomarkers. Zenodo. 2024; 10.5281/zenodo.13375646. [DOI]
46.Ben Aribi H, Abassi N, Awe OI. . NeuroVar shiny application. Zenodo. 2024; 10.5281/zenodo.13375493. [DOI]
47.Ben Aribi H, Abassi N, Awe OI. . NeuroVar desktop application. Zenodo. 2024; 10.5281/zenodo.13375579. [DOI]

GigaByte. 2024 Nov 25;2024:gigabyte143.

Article Submission

Olaitan Awe

GigaByte.

Assign Handling Editor

Editor: Scott Edmunds

GigaByte.

Editor Assess MS

Editor: Hongfang Zhang

GigaByte.

Curator Assess MS

Editor: Yannan Fan

GigaByte.

Review MS

Editor: Joost Wagenaar

Reviewer name and names of any other individual's who aided in reviewer	Joost Wagenaar
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published manuscript. (If no, please inform the editor that you cannot review this manuscript.)	Yes
Is the language of sufficient quality?	Yes
Please add additional comments on language quality to clarify if needed
Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?	Yes
Additional Comments	There is a clear statement of need, but the audience is not very targeted. The investigators outline the need for tools to help users identify phenotypic subtypes of disease and describe how the tool would help with this. Although the investigators mention that the tool will allow users to analyze biomarker data, the scope of the types of analysis that can be performed is relatively small. I think that it would benefit the tool to better define the targeted users (clinicians, data scientists, enthusiasts?) and develop specifically towards a single audience. The tool leverages several existing R packages to run the analysis over the data and the provided tool can be described as a user-friendly wrapper around these libraries. The interface allows users to submit a file, and plot the results of the analysis within the app.
Is the source code available, and has an appropriate Open Source Initiative license <a href="https://opensource.org/licenses" target="_blank">(https://opensource.org/licenses)</a> been assigned to the code?	Yes
Additional Comments
As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?	No
Additional Comments	I did not see any guidelines for contributing to the project in the paper, or in the associated GitHub repository.
Is the code executable?	Yes
Additional Comments
Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?	Yes
Additional Comments
Is the documentation provided clear and user friendly?	Yes
Additional Comments	Yes, the investigators did a great job providing documentation and installation instructions.
Is there enough clear information in the documentation to install, run and test this tool, including information on where to seek help if required?	Yes
Additional Comments
Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?	Yes
Additional Comments	Yes, the investigators provide a clearly-stated list of dependencies and instructions on how to install them prior to running the application.
Have any claims of performance been sufficiently tested and compared to other commonly-used packages?	Yes
Additional Comments
Is test data available, either included with the submission or openly available via cited third party sources (e.g. accession numbers, data DOIs)?	Yes
Additional Comments	The paper, and GitHub repository point to a public dataset that can be used to test the application.
Are there (ideally real world) examples demonstrating use of the software?	Yes
Additional Comments	The investigators provide a video highlighting the use of the application and provide a use-case where they use the app to validate some existing knowledge.
Is automated testing used or are there manual steps described so that the functionality of the software can be verified?	No
Additional Comments	The application is sufficiently small that no automated testing or manual testing would necessary be required beyond validating that the application works.
Any Additional Overall Comments to the Author	The proposed application provides a nice tool that makes visualization of vcf data and analysis easier for users who are not comfortable working within R directly. It provides a nice demonstration how the scientific community can wrap scientific tools into deployable applications and tools that can be easily understood. A question remains on the target audience for an application like this as most people who are interested in these type of analysis and visualizations are, in fact, familiar enough with R, or other programming languages to directly leverage the libraries and plot the results. That said, as data integration and multi-omics visualization becomes more complex and the app provides more ways to visualize the data in meaningful ways, I do strongly believe that applications like this can provide a meaningful addition to the scientific tools that are available.
Recommendation	Accept

Open in a new tab

GigaByte.

Review MS

Editor: Ruslan Rust

Reviewer name and names of any other individual's who aided in reviewer	Ruslan Rust
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published manuscript. (If no, please inform the editor that you cannot review this manuscript.)	Yes
Is the language of sufficient quality?	Yes
Please add additional comments on language quality to clarify if needed	The language quality of the document is of sufficient quality. I did not notice any major issues.
Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?	Yes
Additional Comments	Yes, authors provide a statement of need. Authors mention that there is the need for a specialized software tool to identify genes from transcriptomic data and genetic variations such as SNPs, specifically for neurological diseases. Perhaps authors could expand on how they chose the diseases. E.g. stroke is not listed among the neurological diseases. Perhaps authors could expand a bit on the diseases they chose in the introduction.
Is the source code available, and has an appropriate Open Source Initiative license <a href="https://opensource.org/licenses" target="_blank">(https://opensource.org/licenses)</a> been assigned to the code?	Yes
Additional Comments	Yes the source code is available in github under the following link: https://github.com/omicscodeathon/neurovar. Additionally authors deposited the source code and additional supplementary data in a permanent depository with zenodo under the following DOI: https://zenodo.org/records/13375493. They also provided test data https://zenodo.org/records/13375591. I was able to download and access the complete set of data
As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?	No
Additional Comments	I did not find any way to contribute, report issues or seek support. I would recommend that the authors add this information to the Github README file.
Is the code executable?	Yes
Additional Comments	Yes, I could execute the code using Rstudio 4.3.3
Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?	Yes
Additional Comments	I could follow the installation process, but perhaps authors could add few more details how to download from Github in more detail. As some scientist may have trouble with it. Also perhaps an installation video (additionally to the video demonstration of the Neurovar Shiny App might be helpful.·
Is the documentation provided clear and user friendly?	Yes
Additional Comments	The documentation is provided and is user friendly. I was able to install, test and run the tool using RStudio. Authors may consider to offer also a simple website link for the RshinyTools if possible. This may enable the access also for scientists that are not familiar with R.Especially, it is great that authors provided a demonstration video. I was able to reproduce the steps. However, I would recommend to add more information into the Youtube video. E.g. reference to the preprint/ paper and Github link would be helpful to connect the data.Perhaps authors could also expand a bit on the possibilities to export data from their software. And provide different formats e.g., PDF / PNG /JPEG. I think this is important for many researchs to export their outputs e.g., from the heatmaps.
Is there enough clear information in the documentation to install, run and test this tool, including information on where to seek help if required?	Yes
Additional Comments
Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?	Yes
Additional Comments	Yes, dependencies are listed and are installed automatically. It worked for me with Rstudio version 4.3.3. In the manuscript and in the repository.
Have any claims of performance been sufficiently tested and compared to other commonly-used packages?	Not applicable
Additional Comments
Is test data available, either included with the submission or openly available via cited third party sources (e.g. accession numbers, data DOIs)?	Yes
Additional Comments	Yes the authors provide test data with this doi: https://doi.org/10.5281/zenodo.13375590
Are there (ideally real world) examples demonstrating use of the software?	Yes
Additional Comments	Yes, authors use the example of Epilepsy, focal epilepsy and the gene of interest DEPDC5. I replicated their search and got the same results. However, I find that the label in Figure 1 in the gene’s transcript could be a bit more clear. E.g. it is not clear to me what transcript start and end refers to. It might also be more helpful if authors provide an example dataset for the Expression data that is loaded in the software by default.Furthermore authors use a case study results using RNAseq in ALS patients with mutations in FUS, TARDBP, SOD1, VCP genes.
Is automated testing used or are there manual steps described so that the functionality of the software can be verified?	No
Additional Comments	Automated testing is not used as far as I can access it.
Any Additional Overall Comments to the Author	The preprint version of this paper was also reviewed in ResearchHub: https://www.researchhub.com/paper/7381836/neurovar-an-open-source-tool-for-gene-expression-and-variation-data-visualization-for-biomarkers-of-neurological-diseases/reviews My expertise: I am assistant professor in neuroscience and physiology at University of Southern California and work on stem cell therapies on stroke. We are particularly interested in working with genomic data and the development of new biomarkers for stroke, AD and other neurological diseases. Summary: The authors provide a software tool NeuroVar that helps visualizing genetic variations and gene expression profiles of biomarkers in different neurological diseases.
Recommendation	Minor Revisions

Open in a new tab

GigaByte.

Editor Decision

Editor: Hongfang Zhang

GigaByte. 2024 Nov 25;2024:gigabyte143.

Minor Revision

Olaitan Awe

GigaByte.

Assess Revision

Editor: Hongfang Zhang

GigaByte.

Final Data Preparation

Editor: Bastien Molcrette

GigaByte.

Editor Decision

Editor: Hongfang Zhang

GigaByte.

Accept

Editor: Scott Edmunds

Editor’s Assessment

Coded and written up as part of the African Society for Bioinformatics and Computational Biology (ASBCB) Omicscodeathons, NeuroVar is a new tool for visualizing genetic variation (Single nucleotide polymorphisms and insertions/deletions) and gene expression data related to neurological diseases. The open source R-tool is available as an online Shiny Application and a desktop application that does not require any computational skills to use. Initial validation and case studies for the tool present analyses of biomarkers in ALS, exemplifying its relevance in personalized medicine and genomic discovery. Being an Open Source project, after peer review more detail has been added in paper and GitHub repo on how to contribute, report issues or seek support. Alongside some improved installation guidelines. The paper states future developments will expand its dataset beyond the ClinGen database to encompass new databases and broader genetic inquiries.

Editor’s Assessment

Open in a new tab

GigaByte.

Export to Production

Editor: Scott Edmunds

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The following resources can be accessed in the project’s GitHub repository, https://github.com/omicscodeathon/neurovar:

The open-source code for both the Shiny application and the desktop application.
An installation guide.
A video demonstration.
The processed case study data is available as demonstration data in Zenodo [43].

Data came from ClinVar, and the presented case study was performed on the public dataset SRP149638 from the SRA database.

The open source code of the Shiny application and the desktop application are available in the project’s GitHub Repository: https://github.com/omicscodeathon/neurovar.

Installation Guide, demonstration data, and video demonstration (Figure 8) are also available in the project’s GitHub Repository: https://github.com/omicscodeathon/neurovar.

Snapshots of the project code [45], shiny application code [46], and desktop application code [47] are all in Zenodo.

[ref1] 1.Hirschhorn JN, Lohmueller KE, Byrne E et al. A comprehensive review of genetic association studies. Genet. Med., 2002; 4(2): 45–61. doi: 10.1097/00125817-200203000-00002. [DOI] [PubMed] [Google Scholar]

[ref2] 2.Kunkle BW, Grenier-Boley B, Sims R et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat. Genet., 2019; 51: 414–430. doi: 10.1038/s41588-019-0358-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref3] 3.Li W, Fu Y, Halliday GM et al. PARK genes link mitochondrial dysfunction and Alpha-Synuclein pathology in sporadic Parkinson’s disease. Front. Cell Dev. Biol., 2021; 9: 612476. doi: 10.3389/fcell.2021.612476. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] 4.Wesonga RM, Awe OI. . An assessment of traditional and genomic screening in newborns and their applicability for Africa. Inform. Med. Unlocked, 2022; 32: 101050. doi: 10.1016/j.imu.2022.101050. [DOI] [Google Scholar]

[ref5] 5.Rehm HL, Berg JS, Brooks LD et al. ClinGen—the clinical genome resource. N. Engl. J. Med., 2015; 372(23): 2235–2242. doi: 10.1056/NEJMsr1406261. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref6] 6.Aribi HB, Ding M, Kiran A. . Gene expression data visualization tool on the o²S²PARC platform. F1000Research, 2022; 11: 1267. doi: 10.12688/f1000research.126840.2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref7] 7.Die JV, Elmassry MM, LeBlanc KH et al. geneHummus: an R package to define gene families and their expression in legumes and beyond. BMC Genom., 2019; 20: 591. doi: 10.1186/s12864-019-5952-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref8] 8.Ogbodo UC, Enejoh OA, Okonkwo CH et al. Computational identification of potential inhibitors targeting cdk1 in colorectal cancer. Front. Chem., 2023; 11: 1264808. doi: 10.3389/fchem.2023.1264808. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref9] 9.Awe OI, En najih N, Nyamari MN et al. Comparative study between molecular and genetic evolutionary analysis tools using African SARS-CoV2 variants. Inform. Med. Unlocked, 2023; 36: 101143. doi: 10.1016/j.imu.2022.101143. [DOI] [Google Scholar]

[ref10] 10.El Abed F, Baraket G, Nyamari MN et al. Differential expression analysis of miRNAs and mRNAs in epilepsy uncovers potential biomarkers. bioRxiv. 2023; 10.1101/2023.09.11.557132. [DOI]

[ref11] 11.Chikwambi Z, Hidjo M, Chikondowa P et al. Multi-omics data integration approach identifies potential biomarkers for Prostate cancer. bioRxiv. 2023; 10.1101/2023.09.11.557129. [DOI]

[ref12] 12.Nyamari MN, Omar KM, Fayehun AF et al. Expression level analysis of ACE2 receptor gene in African-American and non-African-American COVID-19 patients. bioRxiv. 2023; 10.1101/2023.09.11.557129. [DOI]

[ref13] 13.Nzungize L, Kengne-Ouafo JA, Wesonga MR et al. Transcriptional profiles analysis of COVID-19 and malaria patients reveals potential biomarkers in children. bioRxiv. 2022; 10.1101/2022.06.30.498338. [DOI]

[ref14] 14.Chang W, Cheng J, Allaire J et al. Shiny: web application framework for R. R package version 1.7.4.9002. 2023; https://shiny.rstudio.com/.

[ref15] 15.Chang W, Borges Ribeiro B. . Create dashboards with ‘Shiny’. 2023; http://rstudio.github.io/shinydashboard/.

[ref16] 16.Wickham H, François R, Henry L et al. dplyr: a grammar of data manipulation. 2022; https://github.com/tidyverse/dplyr.

[ref17] 17.Wickham H, Hester J, Bryan J. . readr: read rectangular text data. 2022; https://readr.tidyverse.org, https://github.com/tidyverse/readr.

[ref18] 18.Wickham H, Averick M, Bryan J et al. Welcome to the tidyverse. J. Open Source Softw., 2019; 4(43): 1686. doi: 10.21105/joss.01686. [DOI] [Google Scholar]

[ref19] 19.Wickham H, Henry L. . purrr: functional programming tools. 2023; https://purrr.tidyverse.org.

[ref20] 20.Knaus BJ, Grünwald NJ. . VCFR: a package to manipulate and visualize variant call format data in R. Mol. Ecol. Resour., 2017; 17(1): 44–53. ISSN 757. doi: 10.1111/1755-0998.12549. [DOI] [PubMed] [Google Scholar]

[ref21] 21.Sievert C, Cheng J. . bslib: custom ‘bootstrap’ ‘sass’ themes for ‘shiny’ and ‘rmarkdown’. 2023; https://github.com/rstudio/bslib.

[ref22] 22.Wickham H. . stringr: simple, consistent wrappers for common string operations. 2022; https://stringr.tidyverse.org, https://github.com/tidyverse/stringr.

[ref23] 23.Dowle M, Srinivasan A, Gorecki J et al. Data.table: extension of ‘data.frame’. 2023; https://github.com/Rdatatable/data.table.

[ref24] 24.Hester J, Wickham H, Csárdi G. . Fs: cross-platform file system operations based on ‘libuv’. 2023; https://fs.r-lib.org.

[ref25] 25.Xie Y, Cheng J, Tan X et al. DT: a wrapper of the JavaScript library ‘DataTables’. 2023; https://github.com/rstudio/DT.

[ref26] 26.Grothendieck G. . sqldf: manipulate R data frames using SQL. 2017; Retrieved from https://cran.r-project.org/web/packages/sqldf/index.html.

[ref27] 27.Wickham H. . ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag, 2016. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org/. [Google Scholar]

[ref28] 28.Talbot H. . WxPython, a GUI toolkit. Linux J., 2000; 2000(74): 5. https://dl.acm.org/citation.cfm?id=349312. [Google Scholar]

[ref29] 29.McKinney W. . Data structures for statistical computing in python. In: Proceedings of the 9th Python in Science Conference. vol. 445, 2010; pp. 51–56, doi: 10.25080/Majora-92bf1922-00a. [DOI] [Google Scholar]

[ref30] 30.Hunter JD. . Matplotlib: a 2D graphics environment. Comput. Sci. Eng., 2007; 9(3): 90–95. doi: 10.1109/MCSE.2007.55. [DOI] [Google Scholar]

[ref31] 31.Harris CR, Millman KJ, Van Der Walt SJ et al. Array programming with NumPy. Nature, 2020; 585(7825): 357–362. doi: 10.1038/s41586-020-2649-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref32] 32.Duarte M. . GitHub - marcelotduarte/cx_Freeze: create standalone executables from Python scripts, with the same performance and is cross-platform. 2023; https://github.com/marcelotduarte/cx_Freeze.

[ref33] 33.Zucca S, Gagliardi S, Pandini C et al. RNA-Seq profiling in peripheral blood mononuclear cells of amyotrophic lateral sclerosis patients and controls. Sci. Data, 2019; 6(1): 1–8. doi: 10.1038/sdata.2019.6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref34] 34.Leinonen R, Sugawara H, Shumway M. . International nucleotide sequence database collaboration. The sequence read archive. Nucleic Acids Res., 2011; 39(Database issue): D19–D21. doi: 10.1093/nar/gkq1019. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref35] 35.Ben Aribi H, Dixon I, Abassi N et al. Exvar: an R package for gene expression and genetic variation data analysis and visualization. 2023; https://github.com/omicscodeathon/Exvar.

[ref36] 36.Wang W, Carroll T. . Rfastp: an ultra-fast and all-in-one fastq preprocessor (quality control, adapter, low quality and polyX trimming) and UMI sequence parsing). R package (Version 1.6.0). Bioconductor. 2022; 10.18129/B9.bioc.Rfastp. [DOI]

[ref37] 37.Barr C, Wu T, Lawrence M. . gmapR: an R interface to the GMAP/GSNAP/GSTRUCT suite (Version 1.40.0). Bioconductor. 2022; 10.18129/B9.bioc.gmapR. [DOI]

[ref38] 38.Lawrence M, Huber W, Pagès H et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol., 2013; 9(8): e1003118. doi: 10.1371/journal.pcbi.1003118. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref39] 39.Love MI, Huber W, Anders S. . Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol., 2014; 15: 550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref40] 40.Lawrence M, Degenhardt J, Gentleman R. . VariantTools: tools for exploratory analysis of variant calls (Version 1.40.0). Bioconductor. 2022; 10.18129/B9.bioc.VariantTools. [DOI]

[ref41] 41.Obenchain V, Lawrence M, Carey V et al. VariantAnnotation: a Bioconductor package for exploration and annotation of genetic variants. Bioinformatics, 2014; 30(14): 2076–2078. doi: 10.1093/bioinformatics/btu168. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref42] 42.Tollefson GA, Schuster J, Gelin F et al. VIVA (visualization of variants): a VCF file visualization tool. Sci. Rep., 2019; 9(1): 12648. doi: 10.1038/s41598-019-49114-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref43] 43.Ben Aribi H, Abassi N, Awe OI. . NeuroVar demonstration data [data set]. Zenodo. 2024; 10.5281/zenodo.13375591. [DOI]

[ref44] 44.Ben Aribi H. . Video demonstration: neurovar shiny application. Youtube. 2024; https://www.youtube.com/watch?v=cYZ8WOvabJs.

[ref45] 45.Ben Aribi H, Abassi N, Awe OI. . NeuroVar: a genetic expression and variation data visualization tool for neurological diseases’ biomarkers. Zenodo. 2024; 10.5281/zenodo.13375646. [DOI]

[ref46] 46.Ben Aribi H, Abassi N, Awe OI. . NeuroVar shiny application. Zenodo. 2024; 10.5281/zenodo.13375493. [DOI]

[ref47] 47.Ben Aribi H, Abassi N, Awe OI. . NeuroVar desktop application. Zenodo. 2024; 10.5281/zenodo.13375579. [DOI]

PERMALINK

NeuroVar: an open-source tool for the visualization of gene expression and variation data for biomarkers of neurological diseases

Hiba Ben Aribi

Najla Abassi

Olaitan I Awe

Roles

Abstract

Statement of need

Implementation

Data collection

Software development

Pipeline validation and case study

Results

Supported disease

Operation and implementation

The application’s usage

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Case study results

Figure 8.

Discussion and conclusion

Availability of source code and requirements

Acknowledgements

Data availability

Abbreviations

Declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Authors’ contributions

Funding

References

Article Submission

Dr Olaitan Awe

Roles

Assign Handling Editor

Roles

Editor Assess MS

Roles

Curator Assess MS

Roles

Review MS

Roles

Review MS

Roles

Editor Decision

Roles

Minor Revision

Dr Olaitan Awe

Roles

Assess Revision

Roles

Final Data Preparation

Roles

Editor Decision

Roles

Accept

Roles

Export to Production

Roles

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases