Skip to main content

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

bioRxiv logoLink to bioRxiv
[Preprint]. 2023 Feb 27:2023.02.24.529867. [Version 1] doi: 10.1101/2023.02.24.529867

Auto-STEED: A data mining tool for automated extraction of experimental parameters and risk of bias items from in vivo publications

Wolfgang Emanuel Zurrer, Amelia Elaine Cannon, Ewoud Ewing, Marianna Rosso, Daniel S Reich, Benjamin V Ineichen
PMCID: PMC10187249  PMID: 37205453

Abstract

Background

Systematic reviews, i.e., research summaries that address focused questions in a structured and reproducible manner, are a cornerstone of evidence-based medicine and research. However, certain systematic review steps such as data extraction are labour-intensive which hampers their applicability, not least with the rapidly expanding body of biomedical literature.

Objective

To bridge this gap, we aimed at developing a data mining tool in the R programming environment to automate data extraction from neuroscience in vivo publications. The function was trained on a literature corpus (n=45 publications) of animal motor neuron disease studies and tested in two validation corpora (motor neuron diseases, n=31 publications; multiple sclerosis, n=244 publications).

Results

Our data mining tool Auto-STEED (Automated and STructured Extraction of Experimental Data) was able to extract key experimental parameters such as animal models and species as well as risk of bias items such as randomization or blinding from in vivo studies. Sensitivity and specificity were over 85 and 80%, respectively, for most items in both validation corpora. Accuracy and F-scores were above 90% and 0.9 for most items in the validation corpora. Time savings were above 99%.

Conclusions

Our developed text mining tool Auto-STEED is able to extract key experimental parameters and risk of bias items from the neuroscience in vivo literature. With this, the tool can be deployed to probe a field in a research improvement context or to replace one human reader during data extraction resulting in substantial time-savings and contribute towards automation of systematic reviews. The function is available on Github.

Full Text

The Full Text of this preprint is available as a PDF (537.9 KB). The Web version will be available soon.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints

RESOURCES