Harnessing big data and artificial intelligence in transfusion medicine: Opportunities for precision, safety and efficiency

Sheharyar Raza; Ruchika Goel; Christian Erikstrup; Angelo D'Alessandro; Brian Custer; Na Li; the ISBT BIG DATA Working Party

doi:10.1111/vox.70236

2026 Mar 22;121(4):441–452. doi: 10.1111/vox.70236

PMCID: PMC13066917 PMID: 41866232

Abstract

Transfusion medicine generates enormous volumes of data across the vein‐to‐vein continuum, spanning donor characteristics, laboratory testing, component manufacturing, logistics and recipient outcomes. The emergence of big data infrastructures, coupled with artificial intelligence (AI), offers a transformative opportunity to harness this information for safer, more efficient and better personalized transfusion practices. This narrative review outlines current and potential applications of AI and machine learning (ML) at each phase of the big data pipeline in transfusion medicine, including data collection, wrangling and harmonization, validation, feature engineering, analysis, publication and knowledge mobilization. We discuss how AI‐enabled methods—such as natural language processing to extract variables, anomaly detection for product quality, supervised models to predict risks, federated analysis for collaboration, and forecasting algorithms to optimize inventory and logistics—may address longstanding challenges related to data fragmentation, unstructured documentation and labour‐intensive manual validation. We emphasize critical risks and limitations of applying AI to big data analytics and discuss mitigation through robust governance, performance monitoring, fairness audits, cybersecurity measures and transparent human oversight. We end by offering key recommendations and future directions, highlighting that strategic, equitable and ethically sound implementation will be essential to realizing benefits and ensuring trust in an increasingly data‐driven transfusion ecosystem.

Keywords: artificial intelligence, big data, machine learning, transfusion medicine

Highlights.

Artificial intelligence and machine learning are powerful tools to harness big data across the entire vein‐to‐vein continuum in transfusion medicine, from donor selection to recipient outcomes.
AI‐enabled methods such as natural language processing, risk prediction, anomaly detection and forecasting can address big data challenges such as fragmentation, unstructured records and labour‐intensive validation.
Robust governance, transparency, fairness monitoring and human oversight are essential to ensure ethical, equitable and trustworthy implementation of AI in transfusion practice.

INTRODUCTION

Transfusion medicine is among the most data‐intensive disciplines in healthcare. Transfusion services generate vast amounts of data across the ‘vein‐to‐vein’ continuum from donor recruitment and screening through laboratory testing, component manufacturing, distribution, bedside administration and post‐transfusion outcomes. Each aspect produces structured and unstructured information: donor demographics and epidemiology, product testing and modifications, blood distribution and transportation, supply–demand metrics, financial records, and patient‐level clinical outcomes, among many other examples [1]. When systematically linked, these datasets enable large‐scale assessments of safety, efficiency and equity to inform data‐driven decision‐making within and beyond transfusion medicine [2, 3].

The emergence of big data and heterogeneous datasets offers substantial advances in donor recruitment and screening, product–recipient matching with genomic precision, blood inventory forecasting, clinical risk prediction, transfusion‐transmitted infection surveillance and novel epidemiological discovery [4, 5]. As data applications continue to grow in scope and modality, methods from artificial intelligence (AI) offer novel solutions to improve the feasibility of harnessing big data in transfusion medicine, which may in turn enable broader AI/machine learning (ML) application. Although interest in AI applications in transfusion medicine is increasing, their use in transfusion research remains nascent, with limited familiarity posing a key barrier to adoption and implementation [6, 7, 8].

The term big data denotes large, complex datasets, commonly characterized by the ‘5 Vs’: volume (massive in scale, with millions or even billions of data points), velocity (continuous, often real‐time generation), variety (diverse formats with both structured and unstructured data), veracity (reliable and validated) and value (built to provide domain‐specific insights) [9, 10]. In transfusion research, big data can include data from electronic health records (EHRs), omics databases, medical imaging devices (e.g., chest radiographs for confirmation of respiratory transfusion reactions), biological sensors (e.g., oximetry readings), blood bank inventories, logistics software and clinical trials, among others. What all big data pipelines share is the need for data governance and sequential steps for data collection, preprocessing (wrangling and standardization), validation, feature extraction, storage with privacy safeguards, analysis and knowledge dissemination [11].

Prominent vein‐to‐vein databases include the Scandinavian Donations and Transfusion database, a Danish‐Swedish binational registry linking donors and recipients over multiple decades to study transfusion safety, donor health and long‐term outcomes [3]; and the US‐based Recipient Epidemiology and Donor Evaluation Study (REDS‐IV‐P) [12, 13], which integrates data from hospitals and blood centres to examine donor characteristics, recipient outcomes and blood utilization across diverse populations. Additional national databases, such as the Danish Blood Donor Study, the Dutch Transfusion Data Warehouse and Australia's National Transfusion Dataset, further expand global capacity for large‐scale transfusion research and collaboration [14]. Complementing these are large regional administrative datasets, for example, the National Inpatient Sample (NIS), Nationwide Readmissions Database (NRD) and National Surgical Quality Improvement Program (NSQIP), which have been leveraged to study transfusion trends and associations despite limited clinical granularity [15, 16, 17]. Collectively, these resources enable large‐scale epidemiological, health services and quality improvement research. A more comprehensive discussion of features and limitations of existing vein‐to‐vein datasets is available in a recent publication [18].

AI and ML provide powerful computational approaches for extracting insights from big data resources. Broadly, AI encompasses algorithms and systems that perform tasks traditionally requiring human intelligence, such as pattern recognition, reasoning, and prediction [17, 19]. ML, the most common approach to AI, enables computers to learn relationships from data and improve performance on a task without explicit programming. Table 1 provides a glossary of terms related to AI/ML with examples in relation to transfusion medicine [20].

TABLE 1.

Glossary of terms related to artificial intelligence and machine learning.

Term	Definition	Context/example in transfusion medicine
Big data	Very large, complex and organized datasets containing diverse high‐volume data managed within an infrastructure that allows access and usage	Data from EHRs, omics, medical imaging, sensors, blood inventories, logistics systems or clinical trials
Multimodal datasets	Datasets containing information collected from multiple distinct data types and sources, such as numerical measurements, free text, audiovisual content and other structured data objects	Vein‐to‐vein datasets containing numerical lab results, free text (e.g., clinical notes, donor surveys), diagnostic images (e.g., radiographs for respiratory transfusion reaction detection, blood smears), biosensor data (e.g., electrocardiograms for post‐transfusion cardiac events) and molecular data (e.g., genotyping)
Natural language processing (NLP)	A set of methods that allow computers to ingest, clean, structure, analyse and generate human language data so that it can be used for analysis, classification and retrieval	Extracting structured data from clinical notes, diagnostic test reports, donor questionnaires and interviews
Machine learning (ML)	Statistical methods that allow computers to learn relationships using data	Predicting likelihood of transfusion needs based on patient variables
Supervised ML	Models learn from labelled datasets to predict specific outcomes	Predicting donor reaction risk using existing data on donor characteristics and reactions
Unsupervised ML	Models find patterns or groupings within datasets	Clustering donors or patients by risk profiles or utilization patterns
Deep learning	Form of ML using multi‐layered neural networks capable of modelling highly complex relationships in large and diverse data	Predicting transfusion‐associated circulatory overload using medical imaging and clinical data
Artificial intelligence (AI)	Algorithms and systems that mimic human intelligence to perform tasks such as pattern recognition, reasoning and prediction, often using ML	Identifying donors who may be at high risk of deferral, using cellular metabolic biomarkers to predict post‐transfusion laboratory increments
Analytical AI	AI systems designed to analyse existing data to identify patterns, generate insights, recognize risk or make predictions. Often includes ML, statistical modelling, and decision‐support algorithms	Predicting transfusion needs for individual patients based on characteristics, identifying high‐risk patients for reactions, optimizing blood inventory based on historical and real‐time data
Generative AI	AI models capable of synthesizing new content or data, such as text, images or predictions	Chatbots for transfusion workflows, simulation of synthetic datasets, large language models to synthesize transfusion evidence

Abbreviation: EHRs, electronic health records.

Although AI/ML have considerable potential to advance big data science, these approaches are not synonymous with big data analytics and do not encompass all analytical tools used in transfusion medicine. Large‐scale vein‐to‐vein research and operational analytics also rely on complementary approaches such as traditional biostatistical modelling, causal inference frameworks, simulation for optimization, non‐ML forecasting, implementation science and qualitative methods, among others. These methods remain foundational for many problems, and AI/ML should be viewed as augmenting rather than replacing established approaches in big data and the broader landscape of transfusion research.

This review examines how AI/ML can be incorporated into each stage of the big data pipeline to support transfusion research and practice (Figure 1). We begin by outlining key concepts in big data as they relate to AI/ML, then examine major applications and limitations of integrating AI across different phases of the big data research pipeline, and conclude with recommendations for AI/ML adoption in transfusion science.

FIGURE 1.

Schematic of a big data pipeline in transfusion medicine.

INCORPORATING AI INTO BIG DATA WORKFLOWS

AI/ML and big data relate synergistically

AI/ML can enhance the breadth, quality and availability of data, while big data can in turn enable more capable and accurate models and algorithms. The following subsections discuss how AI can be embedded at each phase of the big data process. AI/ML applications at various phases are summarized in Table 2. Examples of methods from AI/ML, typical uses, advantages, limitations and examples from transfusion medicine are summarized in Table 3.

TABLE 2.

Stepwise integration of big data and artificial intelligence in transfusion medicine.

Step	Objective	Example in transfusion medicine
1. Strategic assessment and use‐case prioritization	Identify high‐impact, feasible AI opportunities	National blood service selects: (a) donor iron‐deficiency prediction and (b) inventory forecasting for platelets as first AI projects.
2. Data infrastructure and governance setup	Build secure, scalable big data foundation	Link donor registry, testing database and hospital EHR transfusion records under a governed ‘vein‐to‐vein’ research environment.
3. Data mapping, wrangling and standardization	Convert heterogeneous data into analysable formats	Standardize indications for transfusion and reactions across multiple hospitals and blood establishments using OMOP and NLP extraction.
4. Validation and quality assurance	Ensure data and labels are accurate and reliable	Validate algorithmic identification of TACO/TRALI against manual haemovigilance review in a sample of cases.
5. Feature engineering and model development	Build AI models tailored to clinical/operational questions	Develop a supervised model predicting donor vasovagal reaction risk using demographics, prior donation history and biometrics.
6. Pilot implementation in ‘Sandbox’	Test AI tools in a controlled, low‐risk environment	Silent run of inventory forecasting model for 6 months to compare predicted vs. actual platelet shortages in one region.
7. Clinical/operational integration	Embed AI into real‐world workflows with oversight	Display a ‘transfusion appropriateness’ suggestion in CPOE, with an explanation panel and easy override by clinicians.
8. Monitoring, audit and governance review	Continuously monitor safety, performance and equity	Quarterly review of model performance by age, sex, ethnicity and hospital, with corrective actions for any disparities.
9. Scale‐up and generalization	Extend successful models across sites and use cases	Roll out validated inventory model to all blood centres nationally, with local recalibration and support tools.
10. Knowledge mobilization and learning health system	Close the loop and update practice based on data	National ‘vein‐to‐vein’ dashboard for policymakers and clinicians that updates monthly and informs guideline revisions.

Abbreviations: AI, artificial intelligence; CPOE, computerized provider order entry; EHR, electronic health record; NLP, natural language processing; OMOP, Observational Medical Outcomes Partnership (common data model); TACO, Transfusion‐Associated Circulatory Overload; TRALI, transfusion‐related acute lung injury.

TABLE 3.

Applications of artificial intelligence for common challenges in big data pipelines.

AI/ML methods	Typical inputs	Typical tasks	Advantages	Limitations	Application in transfusion medicine
Health records digitization	Scanned forms; PDFs; Images of paper records	Data collection; Wrangling; Validation	Scales conversion of paper to analysable data; Reduces manual transcription burden	Sensitive to scan quality/handwriting; Error propagation if not validated; Requires human‐in‐the‐loop QC	Multimodal extraction from scanned transfusion reaction reports; Donor screening forms
NLP	Free‐text donor questionnaires; Clinical notes; Transfusion reaction narratives; Reports; Literature	Wrangling/standardization; Validation (named entity recognition from text data); Feature engineering for ML models; Knowledge mobilization	Converts narrative text to structured information; Enables scalable abstraction and coding; Supports health ontology mapping	Domain knowledge and language drift; Privacy risk; LLM hallucination	Named entity recognition for clinical reports; LLM‐based OMOP mapping tools; LLMs for evidence synthesis and summarization
Supervised ML	Structured tabular data with labels	Risk prediction; Classification	Optimizes prediction for defined outcomes; Achieves higher accuracy than traditional prediction models	Requires high‐quality labels and representative training data; Risk of bias if data are unbalanced; Performance can degrade with drift	Predicting post‐donation iron recovery using ML with external validation; Predicting risk of recipient transfusion reaction
Unsupervised ML (clustering; dimensionality reduction)	Unlabelled structured data; High‐dimensional data	Phenotype discovery; Data reduction and compression	Supports hypothesis generation and subgroup discovery; Reduces complexity and noise; Reduces computational costs	Clusters can be unstable and sensitive to pre‐processing; Interpretability may be limited; Not inherently causal	Classifying blood donors and recipients into risk subgroups for adverse events
Deep learning and multimodal learning	Multimedia; Omics data; Natural language	High‐dimensional pattern recognition; Omics prediction; Multimodal risk modelling	Learns features automatically; Strong performance in complex data; Supports multimodal integration	Data‐ and compute‐intensive; Challenging to interpret	Deep learning prediction of blood group antigens; Foundation models for multi‐omics representations; Multimedia data coding as tabular features for analysis
Generative AI (LLMs; generative tabular/omics models)	Multimodal data (natural language, multimedia, omics)	Knowledge mobilization/publication; Data augmentation (rare events); Synthetic data generation	Accelerates knowledge translation; Allows synthetic data generation; Reduces barriers for under‐resourced settings	Hallucination and citation errors; Privacy risks due to data leakage; Requires strict governance and human oversight	LLMs for evidence synthesis and writing support; Coding complex clinical data requiring interpretation
Federated and privacy‐preserving learning	Distributed, siloed datasets across institutions	High‐performance multi‐institution model development without pooling raw donor/recipient data	Enables collaboration while minimizing risks of data breaches; May improve generalizability across sites	Relies on assumptions that may be unrealistic	Collaborative big data projects across different blood establishments, hospitals and vein‐to‐vein datasets

Abbreviations: AI, artificial intelligence; LLM, large language model; ML, machine learning; NLP, natural language processing; OMOP, Observational Medical Outcomes Partnership (common data model); QC, quality control.

Data collection

Data collection in transfusion medicine spans the entire vein‐to‐vein pathway, and each phase yields distinct types of data: donor demographics and health history; epidemiological screenings and infectious disease markers; product manufacturing details (component type, labelling, storage conditions and biochemical or genomic profiles); supply chain and inventory metrics; transfusion event documentation and dosing; and patient data such as diagnostic testing, transfusion and outcome variables. Together, these sources produce large volumes of continuous and/or periodic structured, semi‐structured and unstructured data.

Major challenges persist in assembling and harmonizing transfusion data. Many hospitals and blood centres still rely on paper‐based records or legacy software, which complicates data harmonization. Databases are often siloed across institutions: for example, donor data reside with blood collection agencies, whereas recipient transfusion and outcome data are typically stored in hospital EHR systems, with separate non‐interfacing infrastructures. Furthermore, much of the existing data originate from operational or clinical workflows (e.g., blood bank information systems) and are not primarily curated for research. Thus, they often lack standardized definitions, consistent formats or sufficient metadata for analyses beyond their original intended use. Variability in data recording practices, non‐interoperable standards and context‐specific coding (which may change over time or between institutions) are widespread within transfusion services, mirroring broader challenges seen in the biomedical big data landscape. The result is a fragmented data collection system that lacks interoperability and limits the conversion of routinely collected data into high‐quality evidence for policy and clinical decision‐making [21].

There are several potential applications of AI/ML to hasten and harmonize data collection in transfusion medicine. Optical character recognition can be used to ‘machine read’ forms and digitize paper records, reducing human effort and transcription errors [19]. Biometric analysis, such as automated estimation of vital signs or body mass index from photographs, can supplement clinical data capture where traditional measurements might be inaccurate or cumbersome [20]. Named entity recognition, a form of natural language processing (NLP), can convert free text data, such as from donor questionnaires, diagnostic test reports and clinical notes, into tabular data for analysis [22]. AI can also enable structured data extraction from scientific literature sources, for example, results of clinical trials in transfusion medicine, allowing for automated systematic reviews and meta‐analyses under expert supervision [23]. The scope of data collection itself can be guided by predictive modelling to identify high‐yield targets, for instance, by flagging donors who have a high probability of carrying a rare blood type, iron deficiency or a transmissible infection for more efficient sampling and deeper testing [24, 25]. Together, AI/ML capabilities have the potential to greatly improve the methods and efficiency of data collection in transfusion.
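As a minimal illustration of converting a free‐text note into tabular fields, the sketch below uses hand‐written patterns as a simple rule‐based stand‐in for named entity recognition; it is not a trained NLP model. The patterns, symptom terms, field names and example note are all hypothetical, and a rule set this small would miss negation (e.g., ‘no fever’) and spelling variants.

```python
import re

# Hypothetical patterns and vocabulary, chosen only for illustration.
PRODUCT_PATTERN = re.compile(r"\b(red cells?|platelets?|plasma)\b", re.IGNORECASE)
TEMP_PATTERN = re.compile(r"\b(\d{2}(?:\.\d)?)\s*°?\s*C\b")
SYMPTOM_TERMS = ("fever", "rash", "dyspnoea", "hypotension", "chills")

def extract_fields(note):
    """Turn one narrative note into a flat record of candidate entities."""
    product = PRODUCT_PATTERN.search(note)
    temp = TEMP_PATTERN.search(note)
    lowered = note.lower()
    return {
        "product": product.group(1).lower() if product else None,
        "temperature_c": float(temp.group(1)) if temp else None,
        "symptoms": [term for term in SYMPTOM_TERMS if term in lowered],
    }

note = "Pt developed fever and chills 30 min after platelets started; temp 38.6 C."
row = extract_fields(note)
```

Each note becomes one row with fixed columns, which is the tabular form that downstream analysis needs; a statistical or LLM‐based extractor would replace the patterns while keeping the same output shape.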

Data wrangling, cleaning and standardization

Data wrangling refers to transforming raw, heterogeneous inputs into structured, analysable formats. It comprises several steps: data cleaning identifies and corrects errors or inconsistencies and respecifies missing values; harmonization aligns similar variables across different datasets to allow meaningful comparison and merging; rescaling adjusts numerical variables for specific analytical methods; and standardization applies controlled vocabularies and ontologies to ensure consistent semantic meaning across systems. Examples include mapping locally adapted or truncated International Classification of Diseases (ICD) codes to standard ICD codes and harmonizing variable names through standard terminologies and common data models, such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) for clinical concepts, Logical Observation Identifiers Names and Codes (LOINC) for laboratory observations, RxNorm for drug and medication data and the Observational Medical Outcomes Partnership (OMOP) common data model for harmonized data representation [26, 27, 28].
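Two of these steps, harmonization and rescaling, can be sketched in a few lines. The lookup table, component labels and field names below are hypothetical; a real project would draw its mappings from a curated vocabulary rather than a hand‐written dictionary.

```python
from statistics import mean, pstdev

# Hypothetical local-to-shared label mapping (illustration only).
LOCAL_TO_STANDARD = {
    "RBC": "red_blood_cells",
    "PRBC": "red_blood_cells",
    "PLT": "platelets",
    "FFP": "plasma",
}

def harmonize_component(raw_label):
    """Map a locally used component label to a shared name, or None if unknown."""
    key = raw_label.strip().upper()
    return LOCAL_TO_STANDARD.get(key)

def zscore(values):
    """Rescale numeric values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

records = [
    {"component": " prbc", "hb_g_dl": 7.0},
    {"component": "PLT", "hb_g_dl": 9.0},
    {"component": "cryo", "hb_g_dl": 11.0},
]
harmonized = [harmonize_component(r["component"]) for r in records]
scaled_hb = zscore([r["hb_g_dl"] for r in records])
```

Returning None for an unmapped label, rather than guessing, leaves a visible gap that can be routed to manual review.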

A significant challenge in data preparation stems from the multimodal and non‐standard nature of many transfusion variables. Data definitions and formats can be diverse, spanning text (clinical notes, reports), images (blood bags for visual assessment, blood smears), molecular data (genomic or proteomic profiles of donors) and device outputs (vital signs, apheresis machine data), all of which must be reconciled into a unified analytical structure. Data inputs come from various instruments and systems (laboratory analysers, bedside transfusion documentation tools, hospital EHRs, blood establishment software), each with unique formats and conventions. Key information is sometimes hidden in narrative text (e.g., the indication for transfusion or description of a reaction) and may require arduous manual abstraction.

AI/ML have the potential to streamline and strengthen data preparation in transfusion. Automated mapping tools can assist with harmonizing data elements (e.g., to OMOP) across differing jurisdictions and datasets and, in some cases, entirely replace the need for prespecified harmonization [29]. NLP might also be applied to more structured data extraction such as tabulation of blood component characteristics captured in International Society of Blood Transfusion (ISBT) 128 product codes. Medical Information Mart for Intensive Care (MIMIC)‐Extract, a pipeline for extracting and preprocessing healthcare datasets, standardizes electronic medical records (including transfusions) into tabular, analysis‐ready datasets in an automated fashion [30, 31]. ML‐based imputation approaches can infer missing events (e.g., incomplete iron markers, interim transfusion laboratory results, unreported transfusion reactions) with greater accuracy than traditional statistical methods [32]. String‐similarity algorithms can reconcile inconsistent naming conventions and data entry errors across blood product names, drug names for supportive therapies, laboratory test labels and other clinical variables, standardizing variables across different databases [33]. Generative AI can also be used to create realistic synthetic transfusion datasets that mimic rare‐event distributions (e.g., transfusion‐related acute lung injury [TRALI] reactions, rare RH genotypes). BioReason, an open‐source DNA foundation model, can code raw genomic sequences into machine‐interpretable features for data analysis [34]. Such methods are becoming increasingly widespread and feasible for preparing big data for transfusion research.
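String‐similarity reconciliation of product names can be sketched with Python's standard `difflib` module. The canonical list and the 0.8 threshold are assumptions chosen for illustration; in practice the threshold would be tuned against manually reviewed matches.

```python
from difflib import SequenceMatcher

# Hypothetical canonical vocabulary (illustration only).
CANONICAL_PRODUCTS = [
    "red blood cells",
    "platelets",
    "fresh frozen plasma",
    "cryoprecipitate",
]

def best_match(raw, candidates, threshold=0.8):
    """Return the most similar canonical name, or None if nothing is close enough."""
    cleaned = " ".join(raw.lower().split())  # normalize case and whitespace
    scored = [(SequenceMatcher(None, cleaned, c).ratio(), c) for c in candidates]
    score, candidate = max(scored)
    return candidate if score >= threshold else None

matched = [best_match(name, CANONICAL_PRODUCTS)
           for name in ["Platlets", "Red  Blood Cels", "albumin"]]
```

Entries that fall below the threshold are returned as None so that they can be reviewed by a person instead of being silently forced onto the nearest label.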

Data validation

In data science, validation refers to ensuring that information captured in a database faithfully reflects the data recorded at source and the phenomenon it is presumed to measure. Much of the information generated in transfusion medicine is collected for operational, clinical or administrative purposes, and may not be designed to meet the standards required in research. This means validation is an essential step for regulatory compliance, reproducible research and accurate findings.

In practice, full validation of an entire dataset is rarely feasible due to the volume and complexity of modern transfusion data. Hence, most validation efforts follow guideline‐recommended approaches that emphasize risk‐based or selected sampling strategies. For example, a researcher may validate a representative subset of records, prioritizing variables that are clinically or analytically critical, or use predefined data quality rules to identify fields requiring manual review. Comprehensive validation protocols often require large teams to review large amounts of data, which can significantly increase project costs and extend timelines.
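One way to picture a risk‐based sampling strategy is to draw a larger fraction of records from a high‐priority stratum than from the remainder. The sketch below is an assumption‐laden illustration: the sampling fractions, the `has_reaction_code` field and the risk rule are hypothetical and would be set by the validation protocol.

```python
import random

def risk_based_sample(records, is_high_risk, high_fraction=0.5,
                      low_fraction=0.05, seed=0):
    """Sample records for manual review, oversampling the high-risk stratum."""
    rng = random.Random(seed)  # fixed seed keeps the audit sample reproducible
    high = [r for r in records if is_high_risk(r)]
    low = [r for r in records if not is_high_risk(r)]

    def draw(group, fraction):
        if not group:
            return []
        k = max(1, round(len(group) * fraction))
        return rng.sample(group, k)

    return draw(high, high_fraction) + draw(low, low_fraction)

# Hypothetical dataset: 20 of 100 records carry a transfusion reaction code.
records = [{"id": i, "has_reaction_code": i < 20} for i in range(100)]
review_set = risk_based_sample(records, lambda r: r["has_reaction_code"])
```

Because the two strata are sampled at different rates, any error rate estimated from the review set needs to be reweighted by stratum before it is reported for the whole dataset.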

AI/ML methods can greatly optimize data validation by automating or semi‐automating labour‐intensive validation processes in transfusion datasets [35]. AI‐based human‐in‐the‐loop review systems can assist experts by pre‐screening donor or patient records, flagging inconsistencies and prioritizing entries most likely to require verification. Cross‐institutional consistency checks can be done through automated comparison of coding systems, such as identifying mismatched or invalid ISBT 128 product codes across blood centres or hospitals [36]. ML‐based anomaly detection methods can identify implausible or internally inconsistent values such as out‐of‐range haematological indices, implausible transfusion volumes or donor characteristics incompatible with recorded eligibility criteria [37]. NLP can also be leveraged to validate and supplement diagnostic or procedural coding by comparing structured elements (e.g., transfusion reaction codes) against evidence in clinical notes, nursing documentation or transfusion reaction narratives, thereby confirming the accuracy and completeness of recorded events and uncovering discordant data [38]. Unlike human coders, AI/ML approaches are not affected by motivation, fatigue or variability in skill, and can therefore provide more consistent and continuous monitoring of data quality and data drift (changes in the underlying distribution of data over time), improving the reliability of datasets for operational and research applications.
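The idea of flagging implausible values can be shown with two simple checks: fixed plausibility ranges and a robust outlier score based on the median and median absolute deviation. This is a statistical stand‐in, not a learned anomaly detector, and the ranges, field names and the 3.5 cut‐off are illustrative assumptions rather than clinical reference limits.

```python
from statistics import median

# Hypothetical plausibility limits (illustration only, not reference ranges).
PLAUSIBLE_RANGES = {"hb_g_dl": (3.0, 22.0), "volume_ml": (10.0, 700.0)}

def range_violations(record):
    """List the fields whose values fall outside the plausibility limits."""
    flagged = []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            flagged.append(field)
    return flagged

def modified_z_scores(values):
    """Outlier scores based on the median and median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

# A haemoglobin of 70 suggests a g/L value entered in a g/dL field.
violations = range_violations({"hb_g_dl": 70.0, "volume_ml": 300.0})

volumes = [280, 300, 310, 295, 305, 3000]
scores = modified_z_scores(volumes)
flagged_volumes = [v for v, s in zip(volumes, scores) if abs(s) > 3.5]
```

The median‐based score is used instead of a mean‐based z‐score because a single extreme entry would otherwise inflate the standard deviation and mask itself.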

Coding, extraction and feature engineering

After data have been validated, variables are often transformed to suit specific analytical goals for any given project. For instance, data on age may be transformed into buckets of age quintiles. Feature engineering involves creating measurable variables from raw data that are relevant to the research question or other modelling task. In a transfusion context, this might include developing electronic definitions or algorithms to identify transfusion events and reactions (e.g., an algorithm that flags possible transfusion‐related acute lung injury based on oxygenation and timing), converting narrative text into tabular data for visualization or analysis, or aggregating data over time (such as converting granular blood utilization into blood use per capita). Genomic and molecular data also require preprocessing steps to extract biologically meaningful features, for example, summarizing gene variant data or proteomic markers of blood units.
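The three kinds of derived variable mentioned above (binned age, a rule‐based event flag and a per‐capita rate) can be sketched as follows. The six‐hour window and the oxygenation cut‐off are illustrative parameters of a hypothetical screening rule, not a validated case definition, and all function and field names are assumptions.

```python
from bisect import bisect_right
from statistics import quantiles

def quintile_labels(values):
    """Assign each value to a quintile (1-5) using the sample's own cut points."""
    cuts = quantiles(values, n=5)  # four cut points separating five groups
    return [bisect_right(cuts, v) + 1 for v in values]

def flag_possible_trali(hours_since_transfusion, pao2_fio2,
                        window_hours=6.0, ratio_cutoff=300.0):
    """Hypothetical screening flag combining timing and oxygenation."""
    in_window = 0 <= hours_since_transfusion <= window_hours
    return in_window and pao2_fio2 <= ratio_cutoff

def units_per_1000_population(units_issued, population):
    """Aggregate blood use into a per-capita rate."""
    return units_issued * 1000 / population

ages = [18, 25, 33, 41, 47, 52, 58, 63, 70, 79]
age_quintiles = quintile_labels(ages)
early_hypoxia = flag_possible_trali(2.5, 180.0)
late_hypoxia = flag_possible_trali(12.0, 180.0)
rate = units_per_1000_population(45_000, 1_500_000)
```

A flag of this kind only nominates records for expert review; it does not establish a diagnosis.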

Manual annotation of data for training or validation, such as labelling a set of clinical notes for whether they describe a transfusion reaction, can be remarkably time‐intensive. Rule‐based extraction pipelines can be brittle and hard to maintain across data updates. Unstructured text poses perennial challenges due to drift in language, spelling and formatting between institutions or over time. Crafting reliable, reusable variable definitions entails collaboration among transfusion medicine experts, clinical informaticians and data engineers, requiring specialized skills that may be in short supply.

AI/ML can improve feature engineering by streamlining steps in variable transformation and interpretation that traditionally rely on extensive expert input. Unsupervised clustering methods can identify latent patterns in donor profiles, transfusion episodes and reaction characteristics, revealing naturally occurring groupings that may inform new phenotypes or risk categories [39]. Supervised learning models can help detect underreported or misclassified events (e.g., donor iron deficiency, occult transfusion reactions), using learned patterns from known cases to search for unidentified cases [40]. ML‐driven summarization tools can also allow recoding of complex variables into analysable formats, such as converting detailed genomic variant data into biologically meaningful variant allele frequencies or transforming narrative descriptions of transfusion events into structured tabular fields suitable for modelling [41]. Dimensionality reduction techniques, such as principal component analysis and auto‐encoders, can further distil large amounts of information, synthesizing hundreds to thousands of features from multimodal data (e.g., clinical, laboratory and product metrics) into compact variables that retain core information while reducing noise [42]. Collectively, these AI/ML‐based approaches can greatly improve the usability of complex transfusion datasets, such as those commonly used in vein‐to‐vein data research projects.
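To make the idea of dimensionality reduction concrete, the dependency‐free sketch below computes the first principal component by power iteration on the sample covariance matrix and projects each record onto it. It is a teaching example under simplifying assumptions (a tiny, perfectly correlated toy dataset and a fixed all‐ones starting vector, which would fail if it happened to be orthogonal to the leading component); a real analysis would normally rely on an established numerical library.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the leading principal direction and each row's score along it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    vec = [1.0] * d  # starting vector for power iteration
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores

# Toy data: the second feature is exactly twice the first.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(rows)
```

Here two correlated columns collapse into a single score per record with no loss of information; with real multimodal data the retained components would capture most, not all, of the variance.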

Data storage, sharing, security and privacy

A key consideration for safely and effectively using big data in transfusion medicine is related to computing infrastructure and strict governance. Blood products and components touch nearly every part of the healthcare system. This means that the breadth and scale of modern vein‐to‐vein transfusion databases, often encompassing millions of donor, product and patient records, rivals or subsumes the complexity of other large healthcare databases. Handling massive amounts of data can require distributed storage and computational resources. However, data sharing across systems can be constrained by differing policies across institutions and jurisdictions, non‐interoperability between data systems and disagreements over data ownership, which can hinder data exchanges necessary to tap into big data insights. Privacy risks also increase as data linkages become more comprehensive. Recently, blood establishments and healthcare institutions have become frequent targets for cyberattacks, making salient the importance of rigorous cybersecurity measures as more data become digital [43].

Building on these considerations, AI integration within big data infrastructure enables a range of capabilities to strengthen both usability and security. Natural language‐based interfaces, such as copilot integration with spreadsheet and database software, can allow clinicians, researchers and operational transfusion staff without programming expertise to retrieve and explore complex datasets at source using intuitive, conversational inputs, without requiring local storage [44]. ML‐based systems can be deployed to support cybersecurity by identifying malicious instructions or phishing attempts and by analysing user behaviour and communication patterns in real time to anticipate and prevent cyberattacks [44, 45]. Federated learning offers a particularly promising approach to address both data sharing and privacy challenges. In this paradigm, institutions train a shared ML model without transferring raw data (Figure 2). This allows data to remain within each organization's secure environment while still contributing to collaborative analyses, thereby enhancing privacy protections and supporting multi‐institutional model development when combined with appropriate security measures.

FIGURE 2.

Federated multimodal learning for vein‐to‐vein clinical decision support.
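The federated paradigm described above can be sketched in a few lines of Python. This is a minimal illustration using federated averaging on a one‐variable linear model, not a description of any deployed system: the site data are synthetic, the function names (`local_update`, `federated_average`, `train_federated`, `make_site`) are hypothetical, and the sketch omits the secure transport, authentication and aggregation safeguards that a real deployment would need.

```python
import random


def make_site(n, seed):
    """Hypothetical site data: y = 2x + 1 plus noise (synthetic, not real records)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 2 * x + 1 + rng.gauss(0, 0.1)))
    return data


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent epochs on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in data) * 2 / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(site_weights, site_sizes):
    """Combine site models, weighting each by its number of records."""
    total = sum(site_sizes)
    w = sum(sw[0] * n for sw, n in zip(site_weights, site_sizes)) / total
    b = sum(sw[1] * n for sw, n in zip(site_weights, site_sizes)) / total
    return w, b


def train_federated(sites, rounds=50):
    """Only model parameters leave each site; raw records are never pooled."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights
```

Calling `train_federated([make_site(100, 1), make_site(300, 2), make_site(50, 3)])` returns a slope and intercept close to the values used to generate the synthetic data, even though the coordinating step only ever sees each site's two parameters and its record count.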

Study design, literature review and analysis

High‐fidelity translation of analytic findings from big data requires high‐quality background evidence, conceptual frameworks, analytic plans and grant funding. An early step is a thorough review of the literature: scoping or systematic reviews comprehensively survey the evidence to identify knowledge gaps and refine hypotheses. Investigators may conduct exploratory data analyses on big datasets to uncover general trends or associations. Epidemiologists and statisticians may need to perform simulation studies with synthetic data to inform study design, power calculations and counterfactual reasoning for causal inference. Drafting grant proposals and protocols requires advanced scientific writing and familiarity with evolving research reporting standards (e.g., Consolidated Standards of Reporting Trials), which can pose challenges for early‐career investigators, investigators in low‐ and middle‐income countries (LMICs) and non‐native English speakers.
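As an illustration of how a simulation study with synthetic data can inform a power calculation, the minimal Python sketch below estimates the power of a two‐group comparison by repeated sampling. The scenario is an assumption made for this example: outcomes are normally distributed with a common standard deviation, the test is a two‐sided large‐sample z‐test at a 5% significance level, and the function name `simulated_power` is hypothetical.

```python
import math
import random
import statistics


def simulated_power(n_per_group, effect, sd=1.0, n_sim=2000, seed=1):
    """Estimate power of a two-sided two-sample z-test (alpha = 0.05) by simulation."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(0.975)
    rejections = 0
    for _ in range(n_sim):
        # Draw one synthetic study: a control group and a group shifted by `effect`.
        a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        b = [rng.gauss(effect, sd) for _ in range(n_per_group)]
        se = math.sqrt(statistics.variance(a) / n_per_group
                       + statistics.variance(b) / n_per_group)
        z = (statistics.mean(b) - statistics.mean(a)) / se
        if abs(z) > z_crit:
            rejections += 1
    # Power is the share of simulated studies that reject the null hypothesis.
    return rejections / n_sim
```

Running the function over a grid of sample sizes (for example `simulated_power(n, effect=0.5)` for several values of `n`) shows how the estimated power rises with enrolment, and setting `effect=0.0` gives a check on the type I error rate of the chosen test.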

Building on these needs, AI/ML‐based tools can facilitate nearly all aspects of research design. Several AI tools, both paid and open source, now support efficient literature review and summarization, critical appraisal of study quality, and the drafting of grant applications, protocols and manuscripts aligned with reporting standards [46]. AI tools also have the potential to democratize coding and scientific writing skills, making research more feasible for researchers from LMICs who might normally lack the copyediting support, administrative resources and English language proficiency available to their counterparts in high‐resource settings [47, 48]. Although not the subject of this article, the suite of AI and ML techniques is itself a powerful set of primary research methods for advanced synthesis, inference and prediction using large transfusion datasets [49, 50]. When thoughtfully integrated, these applications may accelerate discovery, improve methodological rigor and lower technical barriers for investigators.

Publication

The publication phase translates analysis into peer‐reviewed manuscripts, reports and other formats of disseminated evidence. Despite its primacy to big data research and science at large, the medical publication process remains discouragingly laborious. Converting raw references into a correctly formatted bibliography can be tedious, especially when managing dozens of citations. Medical journals have inconsistent rules for structure, figure placement, word count and supplementary materials, forcing researchers to reformat manuscripts repeatedly for resubmission to new journals. Consequently, the path from analysis to published evidence can be slow, delaying or altogether preventing the sharing of scientific insights with the broader community.

AI/ML can be harnessed to greatly reduce frustrations related to the publication process. Applications include automated formatting to meet journal‐specific requirements, extraction of key data from manuscripts and supplementary documents to reduce manual entry during submission and automated screening of abstracts to help expedite editorial and peer review processes [51]. Although they should not replace human experts, AI‐assisted reviewers can serve as an additional layer of quality control in expert peer review [52]. AI can also be used for automated assessments by applying well‐established research checklists (e.g., Strengthening the Reporting of Observational Studies in Epidemiology) to support adherence to reporting and integrity standards [53]. ML models can also increase accountability by detecting plagiarism and fabricated articles [54]. Collectively, these applications may reduce administrative burden and shorten the time from analysis to publication, particularly for early‐career and under‐resourced scholars who may lack administrative resources.

Knowledge mobilization

Knowledge mobilization (KM) allows the conversion of insights from research into accessible, actionable outputs for diverse stakeholders. In the age of big data, KM increasingly leverages interactive tools such as information dashboards that allow blood supply managers to visualize trends in near real time and personalized summaries for clinical decision‐making. Educational materials and practice guidelines for clinicians translate evidence into bedside decisions (e.g., an infographic with updated indications for transfusion or patient blood management strategies). Without knowledge mobilization, the impact of research findings can be delayed until long after publication, or altogether abandoned, and practice‐changing insights languish out‐of‐sight of key knowledge users.

In this context, digital and AI‐enabled tools can strengthen KM by streamlining the creation, adaptation and dissemination of operational and scholarly outputs in the field of transfusion research [55]. Automated language translation, AI‐driven optimization and targeted marketing campaigns can expand the reach of practice‐changing evidence to global and non‐specialist audiences. Tools that support drafting lay summaries, press releases and educational resources, as well as generating digital infographics, websites, apps and other interactive platforms, can complement KM activities and resource creation [56]. Administrative activities such as project planning and meeting summaries can be similarly automated. By lowering barriers to timely and multilingual dissemination of information, AI can catalyse the translation of insights into real‐world impact.

RISKS AND CAUTIONS OF USING AI FOR BIG DATA

Integrating AI into transfusion big data research offers tremendous promise yet also introduces significant risks that require careful oversight (Table 4) [57]. A foremost concern is privacy. Specifically, how sensitive health information is stored, accessed and processed by AI tools can introduce substantial vulnerabilities. Thus, safeguards are paramount: strong access controls to limit who can query the data, robust encryption of data in storage and in transit, and detailed audit logs of any AI system's interaction with patient data. As noted in section ‘Data storage, sharing, security and privacy’, privacy‐preserving analytical methods such as secure multiparty computation, differential privacy or federated learning can reduce certain risks in well‐defined contexts.
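To illustrate one of the privacy‐preserving methods named above, the minimal Python sketch below releases a count under differential privacy using the Laplace mechanism. It is a teaching example under stated assumptions: the record structure and the names `laplace_noise` and `private_count` are hypothetical, the privacy parameter `epsilon` is chosen arbitrarily, and a production system would rely on a vetted privacy library rather than hand‐written sampling code.

```python
import random


def laplace_noise(scale, rng=None):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    rng = rng or random.Random()
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the Laplace scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Smaller values of `epsilon` add more noise and give stronger privacy protection at the cost of accuracy, which is the trade‐off an institution would need to set through its governance process.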

TABLE 4.

Risks of artificial intelligence adoption in transfusion medicine and corresponding safeguards.

Risk	Mechanism	Clinical/operational consequence	Safeguard
Algorithmic bias	Non‐representative training data	Inaccurate risk scores for specific donor or ethnic groups	Diverse datasets, fairness audits, bias‐aware modelling
Data leakage and cybersecurity threats	PHI aggregation, cross‐system linkages	Regulatory breaches, reputational harm	Encryption, access logging, intrusion detection AI
Automation bias	Over‐reliance on AI recommendations	Deskilling, unsafe transfusion decisions	Mandatory human oversight, audit sampling
Model drift	Changes in clinical practice/data patterns	Declining accuracy over time	Scheduled recalibration, post‐deployment monitoring

Abbreviations: AI, artificial intelligence; PHI, protected health information.

Another concern is algorithmic bias. If an AI model is trained on data that are unbalanced or unrepresentative of the broader population, its predictions may be systematically unfair or less accurate for certain groups. Such biases have been documented; for example, a widely used health algorithm was found to underestimate healthcare needs for Black patients due to bias in the training data [58].
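A simple form of fairness audit compares a model's performance across groups. The minimal Python sketch below computes sensitivity (recall) per group and the largest gap between groups. The choice of sensitivity as the audited metric, the `(group, y_true, y_pred)` record layout and the function names are assumptions made for illustration; a real audit would usually examine several metrics and account for sample size.

```python
from collections import defaultdict


def recall_by_group(records):
    """Compute sensitivity (recall) per group from (group, y_true, y_pred) tuples."""
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            if y_pred == 1:
                true_positives[group] += 1
    # Groups with no positive cases are omitted to avoid dividing by zero.
    return {g: true_positives[g] / positives[g] for g in positives}


def max_recall_gap(records):
    """Largest difference in sensitivity between any two groups."""
    rates = recall_by_group(records)
    return max(rates.values()) - min(rates.values())
```

A large gap indicates that the model misses true cases more often in one group than in another, which is the kind of systematic difference a fairness audit is meant to surface for further review.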

No matter how sophisticated, medical AI recommendations must be reviewed alongside clinical judgement to ensure additional safety and accountability for decisions. ‘Human‐in‐the‐loop’ monitoring, where a subset of AI outputs is manually checked, can catch AI errors or outliers that automated systems might miss, ensuring that incorrect interpretations do not compromise patient care.

A different concern relates to automation bias and deskilling, which arise when users come to rely uncritically on AI recommendations. This may lead to complacency, loss of vigilance and attrition of expertise in performing tasks manually, which is particularly hazardous when AI systems experience downtime or disruption.

Finally, implementing AI comes with an added layer of financial and logistical burdens, although this may be justified by the efficiency gains from AI use. Legal and ethical questions around consent for data use, algorithm transparency and data ownership require deliberate data governance frameworks [59, 60].

Several principles are particularly relevant for transfusion medicine. First, the intended use case of an AI system must be explicitly defined, including its clinical role, decision boundaries and required human oversight. Second, datasets should be assessed for completeness, representativeness and potential sources of bias, with model validation performed across diverse institutions and patient populations to ensure generalizability. Third, implementation should involve clear governance frameworks, including processes for post‐deployment monitoring, safe model updating, change management and mechanisms for clinician feedback. Institutions must address questions of accountability and liability, clarify expectations for data ownership and secondary data use and ensure transparency about how AI systems support clinical or operational decision‐making. Training end users and communicating model limitations are essential to maintain trust and avoid automation bias.

KEY RECOMMENDATIONS AND FUTURE DIRECTIONS

Based on the workflow outlined in this review, we propose the following key recommendations and future directions.

Establish dedicated big data and bioinformatics infrastructure

The implementation of dedicated big data and bioinformatics infrastructure within blood centres and transfusion services should be prioritized. This infrastructure should include standardized data architectures, interoperable informatics systems, and secure high‐performance computing environments, supported by personnel trained in bioinformatics and data science. Establishing this foundation is critical to harnessing the potential of large‐scale datasets for AI application, precision transfusion medicine, predictive analytics and translational research.

Prioritize high‐impact, clinically relevant use cases

Early AI efforts should focus on well‐defined, high‐yield applications with strong data availability and clear clinical or operational value, such as donor risk prediction, inventory forecasting and automated detection of transfusion reactions. Successful pilots can later expand to other applications.

Embed validation, monitoring and human oversight throughout the AI lifecycle

AI systems must be treated as imperfect tools requiring rigorous pre‐deployment validation and continuous post‐implementation monitoring to detect inaccuracy, drift and bias. Human annotation and expert oversight should initially remain mandatory for all automated steps, with regular audits and recalibration embedded into routine operations to ensure results remain reliable and accurate [61].
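One way to operationalize post‐implementation monitoring for drift is to compare the distribution of a model input or output in recent data against a reference period. The minimal Python sketch below computes the population stability index (PSI) for a single variable. The variable, the synthetic samples, the equal‐width binning and the function names are assumptions for illustration; the commonly quoted rule of thumb that values above roughly 0.25 suggest a material shift is a convention, not a validated clinical threshold.

```python
import math
import random


def synthetic_sample(n, mean, seed):
    """Hypothetical monitored variable drawn from a normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mean, 1.0) for _ in range(n)]


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a recent sample of one variable."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are assigned to the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # A small floor avoids taking the logarithm of zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

In routine use such a statistic would be computed on a schedule and reviewed alongside accuracy and calibration measures, with human experts deciding whether recalibration is warranted.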

Invest in interoperable data infrastructure and governance

Effective AI adoption requires interoperable data architectures enabling secure donor–product–recipient linkage using standardized data models and terminologies (e.g., OMOP, LOINC). Robust governance structures should define data access, consent, ownership and accountability, overseen by multidisciplinary committees to ensure transparency, trust and regulatory compliance.

Strengthen cybersecurity and privacy‐preserving analytics

As data ecosystems expand, institutions must adopt strong cybersecurity measures, including encryption, access logging and regular audits. Privacy‐preserving approaches such as federated learning and secure multiparty computation should be operationalized, particularly for cross‐institutional collaboration.

Address equity, bias and inclusion as core design principles

AI systems must be developed using diverse and representative datasets to avoid perpetuating inequities [62]. Multi‐institutional collaboration and federated learning can help broaden participation while ensuring equitable distribution of AI‐enabled benefits [63].

Advance education and multidisciplinary collaboration

Training programmes should build AI literacy among transfusion professionals, while data scientists must remain closely embedded with clinical teams. Sustained collaboration across clinical, technical, ethical and policy domains is essential.

CONCLUSION

We believe that big data science in transfusion medicine will increasingly incorporate AI/ML technology with the potential to create safer, more capable and resilient data systems for operations and research in blood systems. Systematic integration of AI/ML throughout big data pipelines can more efficiently transform heterogeneous, high‐volume data streams into coherent, high‐quality evidence and insights to support advancements in transfusion practice. However, the expanding scale and complexity of big data systems must be matched by rigorous governance frameworks that address bias, privacy, cybersecurity and automation reliance, ensuring that innovation proceeds responsibly and remains aligned with safety. Ultimately, sustained investment in interoperable big data infrastructure, privacy‐preserving collaboration and multidisciplinary scientific partnerships will be essential to realizing the full potential of a continuously learning, impactful and data‐driven transfusion ecosystem.

CONFLICT OF INTEREST STATEMENT

The views expressed are those of the authors and do not necessarily reflect the views of our institutions. S.R. served as an Independent Contractor for Micro1, contributing to OpenAI initiatives focused on the safety and alignment of AI responses to health‐related questions.

ACKNOWLEDGEMENTS

The authors have nothing to report.

All authors contributed to manuscript preparation, critical revisions and final decision to submit.

Raza S, Goel R, Erikstrup C, D'Alessandro A, Custer B, Li N, et al. Harnessing big data and artificial intelligence in transfusion medicine: Opportunities for precision, safety and efficiency. Vox Sang. 2026;121:441–452.

Sheharyar Raza and Ruchika Goel are joint first authors.

DATA AVAILABILITY STATEMENT

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

REFERENCES

1. Raza S, Callum J, Modi D, Sztainert T, Shih AW, Schull MJ, et al. Canadian donations and transfusion database (candat): from blood donors to transfusion recipients. Transfusion. 2025;65:1187–1195.
2. Kleinman S, Busch MP, Murphy EL, Shan H, Ness P, Glynn SA, et al. The National Heart, Lung, and Blood Institute Recipient Epidemiology and Donor Evaluation Study (REDS‐III): a research program striving to improve blood donor and transfusion recipient outcomes. Transfusion. 2014;54:942–955.
3. Edgren G, Rostgaard K, Vasan SK, Wikman A, Norda R, Pedersen OB, et al. The new Scandinavian Donations and Transfusions database (SCANDAT2): a blood safety resource with added versatility. Transfusion. 2015;55:1600–1606.
4. Williamson LM, Devine DV. Challenges in the management of the blood supply. Lancet. 2013;381:1866–1875.
5. Pendry K. The use of big data in transfusion medicine. Transfus Med. 2015;25:129–137.
6. Dash S, Shakyawar SK, Sharma M, Kaushik S. Big data in healthcare: management, analysis and future prospects. J Big Data. 2019;6:54.
7. Al‐Riyami AZ, Herjes S. Use of artificial intelligence and big data in transfusion medicine: an exploratory assessment of status in the Eastern Mediterranean and North Africa region. Vox Sang. 2026;121:511–519. doi:10.1111/vox.70145
8. Al‐Riyami AZ, Rexer K, Masters K, Gammon R. Use of artificial intelligence in transfusion medicine practice, education and research: a mixed methodology study. Vox Sang. 2026;121:520–529. doi:10.1111/vox.70182
9. Ristevski B, Chen M. Big data analytics in medicine and healthcare. J Integr Bioinform. 2018;15:20170030.
10. Shilo S, Rossman H, Segal E. Axes of a revolution: challenges and promises of big data in healthcare. Nat Med. 2020;26:29–38.
11. Birch RJ, Umbel K, Karafin MS, Goel R, Mathew S, Pace W, et al. How do we build a comprehensive Vein‐to‐Vein (v2v) database for conduct of observational studies in transfusion medicine? Demonstrated with the Recipient Epidemiology and Donor Evaluation Study‐IV‐Pediatric V2V database protocol. Transfusion. 2023;63:1623–1632.
12. Goel R, Ness PM, Takemoto CM, Krishnamurti L, King KE, Tobian AAR. Platelet transfusions in platelet consumptive disorders are associated with arterial thrombosis and in‐hospital mortality. Blood. 2015;125:1470–1476.
13. Ning S, Li N, Barty R, Arnold D, Heddle NM. Database‐driven research and big data analytic approaches in transfusion medicine. Transfusion. 2022;62:1427–1434.
14. Goel R, Yang P, Zhu X, Patel EU, Crowe EP, Rai H, et al. Hospital readmissions among people with sickle cell disease. JAMA Netw Open. 2025;8:e2517974.
15. Goel R, Chappidi MR, Patel EU, Ness PM, Cushing MM, Frank SM, et al. Trends in red blood cell, plasma, and platelet transfusions in the United States, 1993‐2014. JAMA. 2018;319:825–827.
16. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med. 2019;380:1347–1358.
17. Topol EJ. High‐performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44–56.
18. Lewin A, Rochette S, Shih AW, Tinmouth A, Chassé M, O'Brien SF, et al. Vein‐to‐vein databases: uses and considerations in transfusion research. Transfus Med Rev. 2025;40:150936.
19. Haahti E, Leikola J, Ruikka S. Application of the OCR‐B symbol in the data‐processing system of the Finnish Red Cross Blood Transfusion Service. Vox Sang. 1981;40:181–186.
20. Pantanowitz A, Cohen E, Gradidge P, Crowther NJ, Aharonson V, Rosman B, et al. Estimation of body mass index from photographs using deep convolutional neural networks. Inform Med Unlocked. 2021;26:100727.
21. Jacobs JW, Raza S, Maynard S, Shaz BH, Tobian AAR, Bloch EM. Improving transfusion access through improved policy: a call for a less fragmented blood supply. Expert Rev Hematol. 2026;19:225–235.
22. Xu Y, Tsujii J, Chang EI‐C. Named entity recognition of follow‐up and time information in 20,000 radiology reports. J Am Med Inform Assoc. 2012;19:792–799.
23. Lieberum J‐L, Toews M, Metzendorf M‐I, Heilmeyer F, Siemens W, Haverkamp C, et al. Large language models for conducting systematic reviews: on the rise, but not yet ready for use‐a scoping review. J Clin Epidemiol. 2025;181:111746.
24. Moslemi C, Saekmose S, Larsen R, Brodersen T, Bay JT, Didriksen M, et al. A deep learning approach to prediction of blood group antigens from genomic data. Transfusion. 2024;64:2179–2195.
25. Li W, Su C‐Y, Meulenbeld A, Jagirdar H, Janssen MP, Swanevelder R, et al. Machine‐learning models to predict iron recovery after blood donation: a model development and external validation study. Lancet Haematol. 2025;12:e431–e441.
26. Nelson SJ, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc. 2011;18:441–448.
27. Reich C, Ostropolets A, Ryan P, Rijnbeek P, Schuemie M, Davydov A, et al. OHDSI Standardized Vocabularies‐a large‐scale centralized reference ontology for international data harmonization. J Am Med Inform Assoc. 2024;31:583–590.
28. Kirchler M, Ferro M, Lorenzini V, Van De Water RP; FinnGen; Lippert C, Ganna A. Large language models improve transferability of electronic health record‐based predictions across countries and coding systems. NPJ Digit Med. 2026;9:177.
29. Adams MCB, Perkins ML, Hudson C, Madhira V, Akbilgic O, Ma D, et al. Breaking digital health barriers through a large language model‐based tool for automated observational medical outcomes partnership mapping: development and validation study. J Med Internet Res. 2025;27:e69004.
30. Wang S, McDermott MBA, Chauhan G, Hughes MC, Naumann T, Ghassemi M. MIMIC‐extract: a data extraction, preprocessing, and representation pipeline for MIMIC‐III. 2019. doi:10.48550/ARXIV.1907.08322
31. Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, et al. MIMIC‐IV, a freely accessible electronic health record dataset. Sci Data. 2023;10:1.
32. Jerez JM, Molina I, García‐Laencina PJ, Alba E, Ribelles N, Martín M, et al. Missing data imputation using statistical and machine learning methods in a real breast cancer problem. Artif Intell Med. 2010;50:105–115.
33. Li X, Döhmen T. Towards efficient data wrangling with LLMs using code generation. Proceedings of the eighth workshop on data Management for end‐to‐end Machine Learning. Santiago AA Chile: ACM; 2024. p. 62–66.
34. Fallahpour A, Magnuson A, Gupta P, Ma S, Naimer J, Shah A, et al. BioReason: incentivizing multimodal biological reasoning within a DNA‐LLM model [Internet]. arXiv. 2025 May 29. Available from: https://arxiv.org/abs/2505.23579. Last accessed 6 Mar 2026.
35. Verma AA, Pasricha SV, Jung HY, Kushnir V, Mak DYF, Koppula R, et al. Assessing the quality of clinical and administrative data extracted from hospitals: the General Medicine Inpatient Initiative (GEMINI) experience. J Am Med Inform Assoc. 2021;28:578–587.
36. Adamson B, Waskom M, Blarre A, Kelly J, Krismer K, Nemeth S, et al. Approach to machine learning for extraction of real‐world data variables from electronic health records. Front Pharmacol. 2023;14:1180962.
37. Niu H, Omitaomu OA, Langston MA, Grady SK, Olama M, Ozmen O, et al. Anomaly detection in electronic health records across hospital networks: integrating machine learning with graph algorithms. IEEE J Biomed Health Inform. 2025;29:3723–3735.
38. Schafer H, Schmidt CS, Wutzkowsky J, Lorek K, Reinartz L, Ruckert J, et al. A multimodal pipeline for clinical data extraction: applying vision‐language models to scans of transfusion reaction reports. Annu Int Conf IEEE Eng Med Biol Soc. 2025;2025:1–7.
39. Li N, Riazi K, Pan J, Thavorn K, Ziegler J, Rochwerg B, et al. Unsupervised clustering for sepsis identification in large‐scale patient data: a model development and validation study. Intensive Care Med Exp. 2025;13:37.
40. Wittlinger S, Wiest IC, Ladani MJ, Kather JN, Ebert MP, Siegel F, et al. How machine learning on real world clinical data improves adverse event recording for endoscopy. NPJ Digit Med. 2025;8:424.
41. Cui H, Wang C, Maan H, Pang K, Luo F, Duan N, et al. scGPT: toward building a foundation model for single‐cell multi‐omics using generative AI. Nat Methods. 2024;21:1470–1480.
42. Lever J, Krzywinski M, Altman N. Principal component analysis. Nat Methods. 2017;14:641–642.
43. Jacobs JW, De Simone N, Duque MA, Wu Y, Ward DC, Woo JS, et al. Cybersecurity and the blood supply: the vulnerabilities of the technological revolution. Am J Hematol. 2024;99:2258–2260.
44. Lund‐Tonnesen M, Vahr Lauridsen S, Rosenberg J. Evaluating microsoft copilot in qualitative health research: accurate for manifest content coding but limited in latent interpretation. Cureus. 2025;17:e95719.
45. Thapa J, Chahal G, Gabreanu SV, Otoum Y. Phishing detection in the gen‐AI era: quantized LLMs vs classical models [Internet]. arXiv. 2025 Jul 10. Available from: https://arxiv.org/abs/2507.07406. Last accessed 24 Dec 2025.
46. Wang Z, Cao L, Danek B, Jin Q, Lu Z, Sun J. Accelerating clinical evidence synthesis with large language models. NPJ Digit Med. 2025;8:509.
47. Sun M, Han R, Jiang B, Qi H, Sun D, Yuan Y, et al. A survey on large language model‐based agents for statistics and data science. Am Statistic. 2025;1–14:1–14.
48. Reis F, Lenz C, Gossen M, Volk H‐D, Drzeniek NM. Practical applications of large language models for health care professionals and scientists. JMIR Med Inform. 2024;12:e58478.
49. Maynard S, Farrington J, Alimam S, Evans H, Li K, Wong WK, et al. Machine learning in transfusion medicine: a scoping review. Transfusion. 2024;64:162–184.
50. Li N, Goel R, Raza S, Riazi K, Pan J, Nguyen HQ, et al. Artificial intelligence and machine learning in transfusion practice: an analytical assessment. Transfus Med Rev. 2025;39:150926.
51. Ahn S. The transformative impact of large language models on medical writing and publishing: current applications, challenges and future directions. Korean J Physiol Pharmacol. 2024;28:393–401.
52. Perlis RH, Christakis DA, Bressler NM, Öngür D, Kendall‐Taylor J, Flanagin A, et al. Artificial intelligence in peer review. JAMA. 2025;334. Erratum in: JAMA. 2025;334:1563.
53. Golinelli D, Sanmarchi F, Nuzzolese A, Toscano F, Bucci A, Nante N. A guide to AI in epidemiology: ChatGPT and the STROBE checklist for observational studies. Eur J Public Health. 2023;33:ckad160.1213.
54. Matsui K. Delving into PubMed records: how AI‐influenced vocabulary has transformed medical writing since ChatGPT. Perspect Med Educ. 2025;14:882–890.
55. Trinkley KE, An R, Maw AM, Glasgow RE, Brownson RC. Leveraging artificial intelligence to advance implementation science: potential opportunities and cautions. Implement Sci. 2024;19:17.
56. Uddin J, Feng C, Xu J. Health communication on the internet: promoting public health and exploring disparities in the generative AI era. J Med Internet Res. 2025;27:e66032.
57. Seheult JN, Malone JR, Jackson BR, Malik MM, Yazer M. Ethical dilemmas in the use of artificial intelligence in transfusion medicine. Vox Sang. 2026;121:417–429. doi:10.1111/vox.70183
58. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366:447–453.
59. International Medical Device Regulators Forum (IMDRF). Good machine learning practice for medical device development: guiding principles [Internet]. IMDRF; 2025 Jan 29. Available from: https://www.imdrf.org/documents/good‐machine‐learning‐practice‐medical‐device‐development‐guiding‐principles. Last accessed 19 Feb 2026.
60. Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK, Ashrafian H, et al. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT‐AI extension. Lancet Digit Health. 2020;2:e537–e548.
61. Maynard S, Farrington J, Raza S, Stanworth SJ. Artificial intelligence implementation in transfusion medicine: addressing the challenges of clinical adoption. Transfus Med Rev. 2026;40:150961.
62. Li MM, Reis BY, Rodman A, Cai T, Dagan N, Balicer RD, et al. Scaling medical AI across clinical contexts. Nat Med. 2026;32:439–448.
63. Theodorou B, Danek B, Tummala V, Kumar SP, Malin B, Sun J. Improving medical machine learning models with generative balancing for equity and excellence. NPJ Digit Med. 2025;8:100.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

