Bioinformatics. 2023 Dec 9;39(12):btad746. doi: 10.1093/bioinformatics/btad746

Bibliometric analysis of neuroscience publications quantifies the impact of data sharing

Herve Emissah, Bengt Ljungquist, Giorgio A Ascoli
Editor: Hanchuan Peng
PMCID: PMC10733721  PMID: 38070153

Abstract

Summary

Neural morphology, the branching geometry of brain cells, is an essential cellular substrate of nervous system function and pathology. Despite the accelerating production of digital reconstructions of neural morphology, the public accessibility of data remains a core issue in neuroscience. Deficiencies in the availability of existing data create redundancy of research efforts and limit synergy. We carried out a comprehensive bibliometric analysis of neural morphology publications to quantify the impact of data sharing in the neuroscience community. Our findings demonstrate that sharing digital reconstructions of neural morphology via NeuroMorpho.Org leads to a significant increase of citations to the original article, thus directly benefiting authors. The rate of data reusage remains constant for at least 16 years after sharing (the whole period analyzed), altogether nearly doubling the peer-reviewed discoveries in the field. Furthermore, the recent availability of larger and more numerous datasets fostered integrative applications, which accrue on average twice the citations of re-analyses of individual datasets. We also released an open-source citation tracking web-service allowing researchers to monitor reusage of their datasets in independent peer-reviewed reports. These results and tools can facilitate the recognition of shared data reuse for merit evaluations and funding decisions.

Availability and implementation

The application is available at http://cng-nmo-dev3.orc.gmu.edu:8181/. The source code is available at https://github.com/HerveEmissah/nmo-authors-app and https://github.com/HerveEmissah/nmo-bibliometric-analysis.

1 Introduction

Omics and structural biology have benefited enormously from the consistent practice of data sharing, with thriving research subfields fueled by seminal discoveries entirely based on publicly available datasets, and vibrant ecosystems of related scientific tools (Field et al. 2009, Chervitz et al. 2011, Wilson et al. 2021). Neuroscience has followed suit only more recently and more gradually, in part due to greater data heterogeneity and the lack of a clear functional code akin to that of genomic sequences (Gardner et al. 2003, Gleeson et al. 2017, Poline et al. 2022). One particular domain of neuroscience, digital reconstructions of neural morphology, is especially amenable to data sharing (Ascoli 2006, 2015, Ascoli et al. 2017).

The accelerating development of advanced technologies in microscopic imaging and computational processing has greatly enhanced 3D neural reconstruction methods, enabling the creation of ever larger amounts of digital tracing data (Liu et al. 2022, Manubens-Gil et al. 2023). Capitalizing on this growth requires effective data accessibility to propel scientific discovery in neuroscience. Indeed, this is the goal of NeuroMorpho.Org, an open-access archive of 3D neural reconstructions and associated metadata (Ascoli et al. 2007). Today, this resource comprises hundreds of thousands of downloadable reconstructions, each of them linked to peer-reviewed publications from laboratories worldwide (Akram et al. 2018). Global collaborative efforts and data sharing from multiple sources are extremely valuable to researchers to gain a better understanding of the brain and its cellular constituents given the strong association between neuronal form and function (Parekh and Ascoli 2015). It is essential to determine, however, the effective extent and impact of free data exchange.

Previous research quantified the benefits of data sharing to the original authors who shared data, in addition to the data users and the community at large, in specific disciplines such as cancer microarray clinical trials (Piwowar et al. 2007) and noninvasive human brain imaging (Milham et al. 2018). However, it is not yet known whether these findings generalize to other fields, and in particular if neural morphology data sharing provides a positive return on investment for the original data owners and/or significantly impacts scientific throughput.

Here, we present a comprehensive bibliometric analysis of published literature pertaining to neural morphology to assess the impact of data sharing on the overall field as well as on individual investigators. We further introduce a dynamic web-based research tool to determine the scientific impact of uniquely identified, shared neural morphology datasets. The application serves as a valuable resource for neuroscientists to demonstrate the direct and indirect benefits of sharing their data.

2 Materials and methods

This study relies on datasets retrieved from NeuroMorpho.Org, Semantic Scholar, and Europe PubMed Central (EuropePMC). Semantic Scholar is an Artificial Intelligence-powered engine for research literature including a large neuroscience collection (Jones 2015). EuropePMC is an open-access archive of life science publications (Ferguson et al. 2021). We selected these databases for their extensive full-text record coverage and their accessibility via Application Programming Interface (API). In particular, NeuroMorpho.Org tallies availability and reusage of neural morphology data, while Semantic Scholar and EuropePMC track peer-reviewed citations, broadly considered an expedient proxy for scientific impact.

We refer here to publications that generated new digital reconstructions of neural morphology as Describing. NeuroMorpho.Org divides Describing publications into three categories depending on whether the underlying datasets are publicly available (Sharing), unavailable (Unsharing), or still being determined (Determining availability). The database curators determine this information through direct interaction with data owners (Maraver et al. 2019) and update it publicly every month (neuromorpho.org/LS_queryStatus.jsp?status=Available&page=0).

NeuroMorpho.Org also tracks the publications that cite the Describing articles and/or utilize the corresponding downloaded digital reconstructions, referred to as Citing and Using, respectively (neuromorpho.org/LS_usage.jsp). We fetch the Describing, Citing and Using metadata via the NeuroMorpho.Org API (neuromorpho.org/apiReference.html#literature) using a Python application, implemented with the Flask framework and released open source (https://github.com/HerveEmissah/nmo-bibliometric-analysis), to populate a MongoDB database. We also retrieve from the NeuroMorpho API (neuromorpho.org/api/neuron) and store in the database the upload date for each dataset.

We then fetch citations to and references of Describing and Using/Citing publications programmatically for storage in the MongoDB database. Specifically, the Semantic Scholar API (api.semanticscholar.org/v1/paper/{doi}) returns a JSON formatted response containing both citations and references metadata for a given publication. EuropePMC, in contrast, exposes distinct API endpoints for citations (ebi.ac.uk/europepmc/webservices/rest/MED/{pmid}/citations) and references (ebi.ac.uk/europepmc/webservices/rest/MED/{pmid}/references). We use a union of citations and a union of references from both Semantic Scholar and EuropePMC to provide a more complete record.
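The union step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each citation or reference record has already been parsed into a dictionary, and the field names `doi` and `pmid` are hypothetical placeholders rather than the exact keys returned by either API. Records are keyed by lowercased DOI when available and by PMID otherwise, so a paper that appears with a DOI in one source and only a PMID in the other would not be merged by this simple version.

```python
def record_key(record):
    """Return a deduplication key: lowercased DOI if present, else PMID, else None.

    The 'doi' and 'pmid' field names are assumptions made for this sketch.
    """
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    pmid = str(record.get("pmid") or "").strip()
    if pmid:
        return ("pmid", pmid)
    return None


def union_records(*sources):
    """Merge record lists from several sources, keeping the first copy of each key."""
    merged = {}
    for source in sources:
        for record in source:
            key = record_key(record)
            if key is None:
                continue  # skip records without any usable identifier
            merged.setdefault(key, record)
    return list(merged.values())


# Toy records standing in for parsed Semantic Scholar and EuropePMC responses.
semantic_scholar = [{"doi": "10.1000/A"}, {"doi": "10.1000/b"}]
europe_pmc = [{"doi": "10.1000/a", "pmid": "1"}, {"pmid": "2"}]
combined = union_records(semantic_scholar, europe_pmc)
```

Keeping the first copy of each record is an arbitrary choice; a fuller version might instead merge fields from both sources.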

3 Results

We first investigated whether openly sharing via NeuroMorpho.Org the digital reconstructions of neural morphology described in an article increases the number of citations to that article (Fig. 1). We started by comparing the number of citations to Sharing (N = 1656) and Unsharing (N = 3089) articles. Specifically, we normalized the yearly number of citations for a given Describing article by dividing its accrued citations by the number of years elapsed since publication. The analysis (Fig. 1A) demonstrates a significant difference in yearly citations between groups (Sharing: 8.91 ± 14, Unsharing: 6.19 ± 12; effect size +43.9%, P = 0.006).
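As a rough illustration of the normalization and of the reported effect size, the sketch below divides accrued citations by the years elapsed since publication and expresses the difference between group means as a percentage of the Unsharing mean. The function names are hypothetical, and the interpretation of the effect size as a relative difference of means is an assumption that happens to reproduce the reported +43.9% from the two stated means; the statistical test behind the P value is not reproduced here.

```python
def yearly_citations(total_citations, publication_year, current_year):
    """Citations per year since publication; None if no full year has elapsed."""
    years_elapsed = current_year - publication_year
    if years_elapsed <= 0:
        return None
    return total_citations / years_elapsed


def relative_effect_percent(mean_sharing, mean_unsharing):
    """Difference between group means as a percentage of the Unsharing mean."""
    return (mean_sharing - mean_unsharing) / mean_unsharing * 100


# Group means reported in the text (Sharing: 8.91, Unsharing: 6.19).
effect = relative_effect_percent(8.91, 6.19)

# A toy article: 90 citations accrued over 10 years.
example_rate = yearly_citations(90, 2013, 2023)
```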

Figure 1.

Publicly sharing digital reconstructions of neural morphology increases the number of citations to the Describing article. (A) Distributions of citations for Sharing and Unsharing articles bin-grouped using logarithmic scale. (B) Relative increase of citations to Sharing articles specifically due to Using/Citing publications. (C) Yearly citations to Sharing article by Using/Citing publications as a function of the time elapsed since the publication of the Sharing article. (D) Cumulative sum of Describing and Using/Citing article counts by year. (E) Citations accrued by Describing and Using/Citing articles by year. (F) Proportion of Using/Citing publications relying on different numbers of Describing articles. Bottom: ratio between the number of Using/Citing publications relying on ≥4 Describing articles and those relying on ≤3 Describing articles. Right: Mean number of citations accrued by Using/Citing publications as a function of the number of Describing articles cited. The error bars indicate a 95% confidence interval.

We then asked whether this increase was specifically due to the citations by the Citing and Using publications. Thus, we calculated the Citation Increase for each Sharing article based on the following formula: CitationIncrease = NMO_Citations/(Citations_since_upload-NMO_Citations), where NMO_Citations represents the citations to the Sharing article by the Using/Citing publications, and Citations_since_upload represents the overall citations to the Sharing article since the upload date of the corresponding dataset. The resultant histogram distribution (Fig. 1B) reveals that the Citation Increase of Sharing articles due to the secondary publications (13.8%) explains less than a third of the difference in citations between Sharing and Unsharing articles. Taken together, these analyses indicate that sharing neural reconstruction data through NeuroMorpho.Org increases the impact of the original publication.
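The Citation Increase formula above translates directly into code. The sketch below is illustrative only: the function name and the handling of a non-positive denominator (returning `None` when all citations since upload come from Using/Citing publications) are assumptions, since the text does not say how such cases were treated.

```python
def citation_increase(nmo_citations, citations_since_upload):
    """CitationIncrease = NMO_Citations / (Citations_since_upload - NMO_Citations).

    Returns None when the denominator is not positive (an assumption of this
    sketch; the source does not specify how such articles are handled).
    """
    other_citations = citations_since_upload - nmo_citations
    if other_citations <= 0:
        return None
    return nmo_citations / other_citations


# Toy example: 12 of 100 citations since upload come from Using/Citing papers.
example_increase = citation_increase(12, 100)  # 12 / 88
```

Note that the denominator excludes the Using/Citing citations, so the value expresses the secondary citations relative to all other citations received since the dataset upload.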

Next, we explored whether the number of citations to Sharing articles due to secondary publications decreases over time after publication. The results suggest a broadly uniform citation likelihood without a tendency to decrease over the whole 16 years of the project activity (Fig. 1C).

To help assess the impact of shared data, it is also interesting to compare the number of Describing and Using/Citing publications and their respective citations. Both the cumulative number of Describing articles and that of Using/Citing articles increased consistently from the project launch to the present (Fig. 1D). Notably, the Using/Citing publications, which rely on shared data, effectively double the Describing data literature. Moreover, comparing the overall citations to Describing and Using/Citing articles (Fig. 1E) again demonstrates that data reusage increases the number of citations in the field by nearly 50%. These results further underscore the added impact of data sharing in neuroscience.

Moreover, we found that fewer than a third of Using/Citing studies refer to only a single Describing article, and approximately the same proportion relies on 2–3 data sources (Fig. 1F). In contrast, more than 40% of Using/Citing articles rely on four or more sources, and nearly 10% require ten or more Describing publications. Interestingly, the ratio between the number of Using/Citing publications relying on four or more Describing articles and those relying on three or fewer Describing articles (Fig. 1F, bottom) is substantially greater after 2016 (ratio: 0.82) than before (ratio: 0.54), reflecting the increasing emphasis on big science, data aggregation, and meta-analyses. We then asked whether the Using/Citing articles relying on a greater number of data sources are cited more (Fig. 1F, right). Indeed, while Using/Citing publications relying on 1–3 data sources accrue on average only 60–65 citations, the mean number of accrued citations reaches ∼80 for articles using 4–9 sources and exceeds 110 for articles using 10 or more sources.
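The grouping by number of data sources and the ratio shown in Fig. 1F (bottom) can be sketched as below. This is a minimal sketch under stated assumptions: the input is a plain list giving, for each Using/Citing publication, the number of Describing articles it relies on, and the function names and bin labels are hypothetical choices that follow the ranges mentioned in the text.

```python
def source_bin(n_sources):
    """Assign a Using/Citing publication to a bin by its number of Describing sources."""
    if n_sources < 1:
        raise ValueError("a Using/Citing publication relies on at least one source")
    if n_sources == 1:
        return "1"
    if n_sources <= 3:
        return "2-3"
    if n_sources <= 9:
        return "4-9"
    return "10+"


def many_to_few_ratio(source_counts):
    """Ratio of publications with >= 4 sources to those with <= 3 sources."""
    many = sum(1 for n in source_counts if n >= 4)
    few = sum(1 for n in source_counts if 1 <= n <= 3)
    return many / few if few else None


# Toy data: five publications relying on 1, 2, 3, 4, and 10 Describing articles.
toy_counts = [1, 2, 3, 4, 10]
toy_ratio = many_to_few_ratio(toy_counts)
toy_bins = [source_bin(n) for n in toy_counts]
```

Applying `many_to_few_ratio` separately to publications before and after a cutoff year would give the kind of comparison reported above.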

To provide researchers the capability to investigate the impact of Sharing articles on secondary publications, we made the bibliometric functionality utilized in the above analysis available as a public service through a web-based user interface (see Supplementary Materials).

4 Discussion

The analysis of neuroscience data sharing among researchers provides insight into current trends and raises awareness to encourage collaborations and open data release (Poldrack and Gorgolewski 2014). It is intuitively obvious that the public availability of data is beneficial to researchers who can reuse it for follow-up analysis and diverse scientific applications (Halavi et al. 2012). However, whether it in fact provides advantages for the data owners to share data has remained a topic for discussion.

Our findings demonstrate indeed that sharing digital reconstructions of neural morphology via NeuroMorpho.Org leads to a significant increase of citations to the original article, thus directly benefiting the authors. Moreover, the rate of data reusage remains constant for at least 16 years after sharing (the whole period analyzed), altogether nearly doubling the peer-reviewed discoveries in the field. Furthermore, the recent availability of larger and more numerous datasets fostered integrative applications, which accrue on average twice the citations of re-analyses of individual datasets. These results demonstrate the broader impact of open sharing of neural reconstructions on scientific discovery.

We also designed and deployed an open-source bibliometric tracking web-service that allows researchers to monitor reusage of their datasets in independent peer-reviewed reports. This tool can facilitate the recognition of shared data reuse for promotion and tenure considerations, merit evaluations, and funding decisions.

Acknowledgements

The authors thank Dr Patricia Maraver for helping with the NeuroMorpho.Org literature management system.

Contributor Information

Herve Emissah, Bioinformatics Program, College of Science, George Mason University, Fairfax, VA 22030, United States; Center for Neural Informatics, Structures, & Plasticity (CN3) and Bioengineering Department, College of Engineering & Computing, George Mason University, Fairfax, VA 22030, United States.

Bengt Ljungquist, Center for Neural Informatics, Structures, & Plasticity (CN3) and Bioengineering Department, College of Engineering & Computing, George Mason University, Fairfax, VA 22030, United States.

Giorgio A Ascoli, Bioinformatics Program, College of Science, George Mason University, Fairfax, VA 22030, United States; Center for Neural Informatics, Structures, & Plasticity (CN3) and Bioengineering Department, College of Engineering & Computing, George Mason University, Fairfax, VA 22030, United States.

Supplementary data

Supplementary data are available at Bioinformatics online.

Conflict of interest

None declared.

Funding

This work was supported by NIH [R01NS39600 and R01NS86082].

References

  1. Akram MA, Nanda S, Maraver P. et al. An open repository for single-cell reconstructions of the brain forest. Sci Data 2018;5:180006.
  2. Ascoli GA. Mobilizing the base of neuroscience data: the case of neuronal morphologies. Nat Rev Neurosci 2006;7:318–24.
  3. Ascoli GA. Sharing neuron data: carrots, sticks, and digital records. PLoS Biol 2015;13:e1002275.
  4. Ascoli GA, Donohue DE, Halavi M. NeuroMorpho.Org: a central resource for neuronal morphologies. J Neurosci 2007;27:9247–51.
  5. Ascoli GA, Maraver P, Nanda S. et al. Win–win data sharing in neuroscience. Nat Methods 2017;14:112–6.
  6. Chervitz SA, Deutsch EW, Field D. et al. Data standards for omics data: the basis of data sharing and reuse. Methods Mol Biol 2011;719:31–69.
  7. Ferguson C, Araújo D, Faulk L. et al. Europe PMC in 2020. Nucleic Acids Res 2021;49:D1507–14.
  8. Field D, Sansone S-A, Collis A. et al. Omics data sharing. Science 2009;326:234–6.
  9. Gardner D, Toga AW, Ascoli GA. et al. Towards effective and rewarding data sharing. Neuroinformatics 2003;1:289–95.
  10. Gleeson P, Davison AP, Silver RA. et al. A commitment to open source in neuroscience. Neuron 2017;96:964–5.
  11. Halavi M, Hamilton KA, Parekh R. et al. Digital reconstructions of neuronal morphology: three decades of research trends. Front Neurosci 2012;6:49.
  12. Jones N. Artificial-intelligence institute launches free science search engine. Nature 2015. 10.1038/nature.2015.18703.
  13. Liu Y, Wang G, Ascoli GA. et al. Neuron tracing from light microscopy images: automation, deep learning and bench testing. Bioinformatics 2022;38:5329–39.
  14. Manubens-Gil L, Zhou Z, Chen H. et al. BigNeuron: a resource to benchmark and predict performance of algorithms for automated tracing of neurons in light microscopy datasets. Nat Methods 2023;20:824–35.
  15. Maraver P, Armañanzas R, Gillette TA. et al. PaperBot: open-source web-based search and metadata organization of scientific literature. BMC Bioinformatics 2019;20:50.
  16. Milham MP, Craddock RC, Son JJ. et al. Assessment of the impact of shared brain imaging data on the scientific literature. Nat Commun 2018;9:2818.
  17. Parekh R, Ascoli GA. Quantitative investigations of axonal and dendritic arbors: development, structure, function, and pathology. Neuroscientist 2015;21:241–54.
  18. Piwowar HA, Day RS, Fridsma DB. Sharing detailed research data is associated with increased citation rate. PLoS One 2007;2:e308.
  19. Poldrack RA, Gorgolewski KJ. Making big data open: data sharing in neuroimaging. Nat Neurosci 2014;17:1510–7.
  20. Poline J-B, Kennedy DN, Sommer FT. et al. Is neuroscience FAIR? A call for collaborative standardisation of neuroscience data. Neuroinformatics 2022;20:507–12.
  21. Wilson SL, Way GP, Bittremieux W. et al. Sharing biological data: why, when, and how. FEBS Lett 2021;595:847–63.

Articles from Bioinformatics are provided here courtesy of Oxford University Press
