
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot program to include preprints of NIH-funded research in PMC and PubMed.

[Preprint]. 2026 Mar 4:2024.12.23.629818. Originally published 2024 Dec 23. [Version 4] doi: 10.1101/2024.12.23.629818

The FAIRSCAPE AI-readiness Framework for Biomedical Research

Sadnan Al Manir, Maxwell Adam Levinson, Justin Niestroy, Christopher Churas, Nathan C Sheffield, Brynne Sullivan, Karen Fairchild, Monica Munoz-Torres, Sarah J Ratcliffe, Jillian A Parker, Trey Ideker, Timothy Clark
PMCID: PMC11703166  PMID: 39763764

Abstract

Objective

Biomedical datasets intended for use in AI applications require packaging with rich pre-model metadata to support model development that is explainable, ethical, epistemically grounded and FAIR (Findable, Accessible, Interoperable, Reusable).

Methods

We developed FAIRSCAPE, a digital commons environment, using agile methods in close alignment with the team developing the AI-readiness criteria and with the Bridge2AI data production teams. Work was initially based on an existing provenance-aware framework for clinical machine learning. We incrementally added RO-Crate data+metadata packaging and exchange methods, client-side packaging support, provenance visualization, and support for metadata mapped to the AI-readiness criteria, with automated AI-readiness evaluation. LinkML semantic enrichment and Croissant ML-ecosystem translations were also incorporated.
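For readers unfamiliar with RO-Crate packaging, the sketch below builds a minimal RO-Crate 1.1 `ro-crate-metadata.json` document in Python: a metadata descriptor that declares conformance to the spec, a root `Dataset` entity, and one `File` entity listed under `hasPart`. The function name `minimal_ro_crate`, the dataset name, file path, and license are hypothetical placeholders; this shows only the general RO-Crate structure and is not FAIRSCAPE's own packaging code.

```python
import json


def minimal_ro_crate(name: str, description: str, date_published: str,
                     license_url: str, files: list[str]) -> dict:
    """Build a minimal RO-Crate 1.1 metadata document (hypothetical helper)."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: states which RO-Crate version this file follows
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root data entity describing the packaged dataset as a whole
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
                "hasPart": [{"@id": f} for f in files],
            },
            # One entity per data file contained in the crate
            *({"@id": f, "@type": "File"} for f in files),
        ],
    }


crate = minimal_ro_crate(
    name="Example release",                     # hypothetical
    description="Illustrative dataset release",  # hypothetical
    date_published="2024-12-23",                # hypothetical
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=["data/measurements.csv"],            # hypothetical path
)
print(json.dumps(crate, indent=2))
```

In a real release, richer descriptive, provenance, and schema entities would be added to the same `@graph`; the flat graph keeps every entity addressable by its `@id`.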

Results

The FAIRSCAPE framework generates, packages, evaluates, and manages critical pre-model AI-readiness and explainability information with descriptive metadata and deep provenance graphs for biomedical datasets. It provides ethical, schema, statistical, and semantic characterization of dataset releases, licensing and availability information, and an automated AI-readiness evaluation across all 28 AI-readiness criteria. We applied this framework to successive, large-scale releases of multimodal datasets, progressively increasing dataset AI-readiness to full compliance.
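To illustrate what an automated readiness evaluation over metadata might look like in general, the sketch below scores a metadata record against a list of predicate checks and reports the fraction satisfied. The four criteria and the property names they inspect (`license`, `generatedBy`, `schema`, `ethicalReview`) are hypothetical stand-ins chosen for illustration. They are not the 28 AI-readiness criteria, and this is not FAIRSCAPE's evaluator.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str
    check: Callable[[dict], bool]


# Hypothetical criteria for illustration only; NOT the actual AI-readiness criteria.
CRITERIA = [
    Criterion("has_license", lambda m: bool(m.get("license"))),
    Criterion("has_provenance", lambda m: bool(m.get("generatedBy"))),
    Criterion("has_schema", lambda m: bool(m.get("schema"))),
    Criterion("has_ethics_statement", lambda m: bool(m.get("ethicalReview"))),
]


def evaluate(metadata: dict) -> dict[str, bool]:
    """Run every criterion check against one metadata record."""
    return {c.name: c.check(metadata) for c in CRITERIA}


def readiness_score(results: dict[str, bool]) -> float:
    """Fraction of criteria satisfied, between 0.0 and 1.0."""
    return sum(results.values()) / len(results)


record = {"license": "CC-BY-4.0", "schema": "schema.json"}  # hypothetical record
results = evaluate(record)
print(results, readiness_score(results))
```

Keeping each criterion as a named predicate makes it straightforward to report which checks failed, which is what lets successive releases be driven toward full compliance.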

Conclusion

FAIRSCAPE enables AI-readiness in biomedical datasets using standard metadata components and has been used to establish this pattern across a major, multimodal NIH data generation program. It eliminates early-stage opacity apparent in many biomedical AI applications and provides a basis for establishing end-to-end AI explainability.

Full Text

The Full Text of this preprint is available as a PDF (747.9 KB). The Web version will be available soon.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
