JAMIA Open. 2026 Jan 24;9(1):ooag009. doi: 10.1093/jamiaopen/ooag009

Cognitive readiness of nurses regarding artificial intelligence predictions: understanding through the dual lens of verbatim and gist knowledge

Insook Cho 1,2,, Soyun Shim 3, Hyunchul Park 4
PMCID: PMC12918763  PMID: 41727411

Abstract

Objectives

The expansion of artificial intelligence (AI)-enabled clinical decision support (CDS) requires nurses to interpret complex model outputs. However, their cognitive readiness remains underexplored, particularly in terms of their understanding of statistics. This study aimed to assess nurses’ understanding of key statistical concepts underlying AI predictions and the relationship of that understanding to health numeracy.

Materials and Methods

This organizational case study involved 180 nurses from 6 medical–surgical units at a tertiary hospital that was preparing to implement an AI fall-prediction model. Statistical knowledge was evaluated using a heuristic vignette based on fuzzy-trace theory, assessing both verbatim (literal) and gist (meaning-based) understanding of sensitivity, specificity, and CIs. Health numeracy was measured using the Lipkus Objective Numeracy Scale, the Numeracy Understanding in Medicine Instrument: short form, and the Subjective Numeracy Scale. Analyses included ANOVA, Kruskal–Wallis, and Wilcoxon rank-sum tests, with thematic analyses applied to nurses’ qualitative concerns.

Results

Overall statistical knowledge was moderate (mean = 85.56; 95% CI, 82.64-88.46). Gist knowledge lagged behind verbatim knowledge, especially for CIs. Nurses with advanced degrees had higher verbatim scores (P = .0108), while bachelor-level nurses performed better on discrete-choice tasks related to gist (P = .0124). Numeracy was not significantly associated with the understanding of statistics. Nurses overrode predictions due to cognitive mismatch, and requested greater model transparency, input rationale, and risk-threshold explanations.

Conclusion

Despite nurses’ adequate numeracy, their limited conceptual grasp of statistical concepts may hinder the safe application of AI CDS system outputs. These findings underscore the need for targeted education and a cognitive-fit-driven interface design to support the trustworthy use of AI in nursing practice.

Keywords: clinical decision support, artificial intelligence, statistical knowledge, cognitive fit, health numeracy, clinical nurses

Lay Summary

Hospitals are increasingly using artificial intelligence (AI) tools to support nurses’ decision-making. For these tools to be safe and helpful, nurses need to understand what the AI results mean. In this study, we surveyed nurses at a large hospital to see how well they understood key ideas that explain AI “risk scores,” such as sensitivity, specificity, and CIs. We tested 2 kinds of understanding. Verbatim understanding means remembering the exact numbers. Gist understanding means knowing the main message the numbers convey. Nurses scored about 86 out of 100 overall. They performed better at recalling exact numbers than at understanding their meaning, particularly for CIs. We also measured numeracy (comfort with numbers). Strong numeracy did not always mean nurses understood these statistics well. Nurses with advanced degrees were better at recalling exact facts, while nurses with bachelor’s degrees did better on tasks that relied on gist understanding. After the AI tool was used in practice, some nurses asked how the model arrived at its predictions and what the “high risk” and “low risk” cutoffs meant. These findings suggest that nurses may require clearer training and simpler AI displays to use AI predictions with confidence in everyday patient care.

Introduction

The integration of artificial intelligence (AI) into clinical decision support (CDS) has accelerated across health-care settings, including nursing.1–6 While AI tools can enhance decision-making, their successful adoption relies on nurses appropriately interpreting probabilistic outputs, which requires statistical knowledge.7–10 Frontline nurses often need to assess and respond to uncertain AI predictions.11,12

It is unclear whether nurses have an adequate understanding of the development, validation, and deployment of AI models. Artificial intelligence algorithms based on machine learning or artificial neural networks often act as “black boxes,” which makes it difficult to select, interpret, and utilize their outputs.10,13 Consider a hypothetical event-prediction model with 95% sensitivity and 90% specificity in detecting a rare condition with a prevalence of <5%. If the model predicts positivity for a given patient, the actual likelihood that the patient will experience the event—the positive predictive value (PPV)—may still be low, and so a nurse without sufficient statistical knowledge may erroneously assume that “95% sensitivity” makes the prediction trustworthy. In contrast, a statistically knowledgeable nurse would recognize the impact of a low base rate on PPV and proceed cautiously (see the Box of Supplementary Material).

Understanding the tradeoffs between false-positive and false-negative rates also requires familiarity with the clinical context and an understanding of model performance metrics such as sensitivity, specificity, PPV, negative predictive value, and CIs. These statistical constructs are fundamental to assessing discrimination and calibration, which in turn support informed judgments about the appropriateness and clinical utility of a model when uncertainty is present.7–10 Without such understanding, nurses may either over- or underestimate the reliability of AI predictions, threatening the safe and effective implementation of AI.11,12 Nurses’ familiarity with AI and statistical concepts significantly influences their trust, expectations, and willingness to adopt AI systems.14,15 Misconceptions may lead to unrealistic expectations or unwarranted skepticism, both of which hinder the integration of AI into clinical practice.

This study applied the complementary fuzzy-trace theory (FTT)16,17 and cognitive-fit theory (CFT)18 to examine nurses’ health numeracy and their understanding of statistical knowledge from a mixed-form knowledge perspective, and to consider practical approaches for improving nurses’ understanding in real practice. We also qualitatively explored how nurses encode and interpret AI outputs, using AI fall-risk prediction as a clinical exemplar. This was performed as an organizational case study at a single institution to minimize institutional bias and control for system-level confounding by factors such as prior exposure, education, informatics maturity, and patient safety culture.

Background

This study formed part of the INSIGHT Project (Intelligent Nursing for Safety Improvement with Guided Healthcare Technology), a multi-institutional initiative focused on developing and implementing an AI CDS system for preventing inpatient falls, integrated with electronic health record (EHR) data.19,20 The study aimed to reveal nurses’ understanding during their first use of a risk-prediction tool and to determine strategies to enhance that understanding. Fuzzy-trace theory was used to investigate their understanding of statistical concepts commonly produced by risk-prediction models. Additionally, CFT was used to distinguish between aspects of user needs that can and cannot be systematically addressed through system design, thereby informing both the study’s execution and interpretation. Fuzzy-trace theory is a dual-process framework of memory, reasoning, judgment, and decision-making that has been widely applied to understanding risk-related decisions across contexts and life stages. It provides a cognitive explanation for how individuals encode, retrieve, and translate information into action, clarifying why their recognition, recall, and risk judgments can vary across situations.21,22 In contrast, CFT posits that task performance improves when information is presented in a format consistent with users’ cognitive structures and decision-making strategies,18 which is especially relevant when clinicians need to interpret AI predictions within EHR-based systems.23

FTT: verbatim and gist knowledge

Fuzzy-trace theory posits that individuals encode information in 2 parallel forms: verbatim knowledge (precise, literal details such as probabilities and numerical values) and gist knowledge (the essential, intuitive meaning). In the presence of uncertainty, real-world decisions are predominantly guided by gist knowledge.4,16 For example, a nurse may interpret a 22.2% lifetime breast cancer risk as “low” based on its numerical value, but correctly recognize it as “elevated” when compared with the population average of 11.3%. Gist knowledge is influenced by statistical input, numeracy, experience, affect, and cultural framing.22,24 In the context of AI CDS systems, clinicians are more likely to rely on perceived implications than raw metrics. Therefore, aligning statistical outputs with gist knowledge can reduce interpretive errors such as base-rate neglect and promote informed clinical judgments.
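The verbatim/gist distinction in the breast-cancer example above can be sketched as a toy illustration. The helper functions below are hypothetical and are not part of the study instrument; they only contrast storing an exact figure with storing its meaning relative to a reference value.

```python
# Illustrative sketch (not from the study): contrasting verbatim and
# gist representations of the article's breast-cancer risk example.

def verbatim(risk_pct: float) -> str:
    """Verbatim representation: the exact, literal figure."""
    return f"{risk_pct:.1f}% lifetime risk"

def gist(risk_pct: float, population_avg_pct: float) -> str:
    """Gist representation: the risk's meaning relative to the average."""
    return "elevated" if risk_pct > population_avg_pct else "not elevated"

patient_risk = 22.2     # % lifetime risk, from the article's example
population_avg = 11.3   # % population average, from the article's example

print(verbatim(patient_risk))              # 22.2% lifetime risk
print(gist(patient_risk, population_avg))  # elevated
```

A nurse relying only on the verbatim form might read 22.2% as “low” in absolute terms; the gist form makes the comparison to the 11.3% base rate explicit.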

CFT: aligning information and cognition

Cognitive-fit theory originated in decision science and health informatics, and posits that problem-solving performance improves when information is presented in alignment with users’ mental models and problem-solving strategies.18 Cognitive-fit theory indicates that AI interfaces should reduce cognitive load by presenting outputs in a structure congruent with nurses’ internal representations.23 We adopted an extended cognitive-fit model integrated with distributed cognition theory25 to distinguish between internal representations (knowledge, experience, and training) and external representations (the visual format and structure of AI outputs) (Figure 1). This allowed us to examine how nurses’ statistical knowledge interacts with model output design to shape their understanding and decision-making ability.

Figure 1.

Research framework showing that internal representation of the problem domain and external problem representation, together with an AI-based risk prediction model for preventing falls integrated in an EHR system, shape the mental representation for task solution, which in turn influences users’ response and ultimately problem-solving performance.

Research framework of the study. Abbreviations: AI, artificial intelligence; EHR, electronic health record.

Fuzzy-trace theory was operationalized in this study by assessing nurses’ verbatim and gist knowledge of key AI statistical concepts. Explorations of the relationship between these knowledge forms and health numeracy were aimed at informing interface design strategies that enhance interpretability and the reliable use of predictive analytics based on CFT.

Methods

Study design and setting

This cross-sectional study employed a structured, self-administered survey supplemented by qualitative data collected through participants’ verbal and electronic reports. The study was conducted at a tertiary academic hospital in Seoul that for >20 years had utilized a self-developed EHR system featuring a rule-based tool for recommending nursing diagnoses. By 2020, the hospital had not introduced any machine-learning or deep-learning tools for nurses. The AI CDS system for fall prevention was the first prediction tool integrated into nurses’ routine practice as part of the INSIGHT Project.

Participants

The study population comprised 180 nurses working in 6 medical–surgical units. Given the large variation in clinical experience across units and the small proportion of nurses with <1 year of experience, stratified sampling based on unit, role, and years of experience was used to recruit approximately 30% of eligible nurses. Participants were categorized into 3 experience groups: novice (<1 year), junior (1-5 years), and senior (>5 years). At the study institution, a charge nurse is defined as a senior nurse who performs both direct patient care and unit-level administration. Sixty nurses who had worked in the selected unit for ≥6 months during the study period were ultimately enrolled. Nurses were excluded if they were employed on a temporary, per diem, or probationary basis, or were planning to resign or transfer during the study period. Based on job function, participants were further classified into direct care providers (staff nurses) and managers.

To complement the quantitative findings, all 180 nurses were invited to submit anonymous feedback regarding the AI fall predictions. Nurses were asked to submit their concerns, questions, and comments about the model’s risk classification and output interpretation via the hospital’s internal communication system or verbally to the research team.

Ethical considerations

The study was approved by the Institutional Review Board of Samsung Medical Center (IRB No. SMC 2019-11-105-001). Written informed consent was obtained from all participants after they were briefed on the study’s purpose, potential benefits, and risks. Data collection and coding were conducted by trained research staff members while maintaining participant anonymity and confidentiality.

Study tools and measurements

Statistical understanding was measured using a vignette tool designed to evaluate understanding of key concepts—specifically sensitivity, specificity, and CIs—through the dual lens of verbatim and gist knowledge.11 The vignette was adapted from the original tool developed by Weissman et al., which uses a weather-prediction scenario to reduce decision bias and personal-context effects. It includes a didactic module that presents statistical concepts through textual summaries, icon arrays, and numerical examples. Each concept was assessed via 2 questions (on verbatim and gist knowledge) and 1 discrete-choice experiment (DCE) evaluating tradeoffs between predictive models. Scores were calculated as mean percentages of correct responses. The Korean translation of the vignette was validated by 2 bilingual experts: a PhD-level nursing informatics professor at the University of Massachusetts College of Nursing (Dr Joohyun Chung) and a diagnostic imaging specialist at Boston Children’s Hospital (Dr Donsoo Kim), both of whom rated the translation’s accuracy at ≥4 out of 5.

Numeracy was defined as the ability to interpret and apply probabilistic and proportional information relevant to clinical decision-making. Subjective numeracy was assessed using the 8-item Subjective Numeracy Scale (SNS; Cronbach’s α = 0.82),26 while objective numeracy was evaluated using the Numeracy Understanding in Medicine Instrument: short form (S-NUMi) and the Lipkus Objective Numeracy Scale (ONS).27,28 The S-NUMi focuses on applied health numeracy (eg, interpreting graphs and tables), with scores categorized as low (<4), moderate (4-6), or high (>6). The ONS comprises 7 items measuring basic probability and disease-risk understanding. Permissions were obtained for using and translating the tools, and expert validation procedures were conducted in parallel with those used for the statistical knowledge instrument.

Figure 2 shows an EHR output screenshot from the AI fall-risk prediction model, which indicates a patient’s risk group, probability, and risk factors.

Figure 2.


Example screen of the fall-prediction model implemented in the hospital’s EHR system, displaying the estimated fall risk and contributing risk factors (from ref.29).

Data collection and analysis

Data were collected over a 3-month period starting in August 2020. Although stratified sampling was initially planned to ensure a balanced distribution of nursing experience, recruitment challenges (particularly for novice nurses) resulted in only 8 novice nurses and 6 charge nurses being included. All participants completed a structured questionnaire that included items assessing statistical knowledge, validated scales for subjective and objective health numeracy, and demographic variables such as clinical unit, experience, and role. Personally identifiable information beyond stratification variables was not included to ensure respondent anonymity.

Quantitative analyses included descriptive statistics for the mean correct responses and mean scores with 95% CIs. Differences according to clinical experience and education were assessed using Kruskal–Wallis and Wilcoxon rank-sum tests, respectively. Pearson coefficients were calculated to assess correlations among statistical knowledge, subjective numeracy, and objective numeracy. The internal consistency of SNS was evaluated using Cronbach’s α. Data were analyzed using SAS statistical software (version 9.4, SAS Institute).
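As a rough illustration of the analysis pipeline described above, the Python sketch below applies the same test choices to synthetic data (the study itself used SAS 9.4; all numbers here are simulated and reproduce no study results). The group sizes loosely mirror the sample, and Cronbach’s α is computed directly since scipy offers no one-liner for it.

```python
# Hedged sketch of the analyses named in the Methods, on synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Knowledge scores for 3 experience groups -> Kruskal-Wallis test
novice = rng.normal(80, 10, 8)
junior = rng.normal(85, 10, 28)
senior = rng.normal(86, 10, 24)
h_stat, p_kw = stats.kruskal(novice, junior, senior)

# 2 education groups -> Wilcoxon rank-sum test
bsn = rng.normal(79, 12, 52)
msn = rng.normal(96, 5, 8)
z_stat, p_rs = stats.ranksums(bsn, msn)

# Correlation between 2 numeracy measures -> Pearson r
sns_scores = rng.normal(37, 4, 60)
snumi_scores = 0.3 * sns_scores + rng.normal(0, 3, 60)
r, p_r = stats.pearsonr(sns_scores, snumi_scores)

# Cronbach's alpha for an 8-item scale (shared person effect simulated)
items = rng.normal(4, 1, (60, 8)) + rng.normal(0, 1, (60, 1))
k = items.shape[1]
alpha = k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                       / items.sum(axis=1).var(ddof=1))

print(f"KW p={p_kw:.3f}, rank-sum p={p_rs:.3f}, r={r:.2f}, alpha={alpha:.2f}")
```

The same tests map directly onto SAS procedures (eg, PROC NPAR1WAY, PROC CORR), so the sketch only makes the analytic choices concrete.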

Qualitative data were collected using both an intranet communication platform and a researcher’s on-site visits following the system’s deployment. Two researchers (S.S. and Mira Song), both nurse managers in the hospital, gathered the raw text data and transcribed nurses’ comments during regular weekly in-person visits to the units. Two independent analysts (I.C. and S.S.) conducted rapid thematic analyses to identify recurring interpretive patterns. Responses were clustered into dominant thematic categories, with an interrater agreement of 96.7%.

Results

Participant characteristics

The survey participants had a mean age of approximately 31 years, with females predominating (Table 1). Most participants (n = 52, 86.7%) had a Bachelor of Science in Nursing (BSN), while 8 (13.3%) had a Master of Science in Nursing (MSN) or higher, and 24 (40.0%) had managerial roles. The mean knowledge score for statistical concepts was 85.56 out of 100 (95% CI, 82.64-88.46). The mean SNS score was 37.22, with only 23 participants (38.3%) achieving high subjective numeracy. The mean scores for objective numeracy were 7.00 out of 8 on the S-NUMi and 6.17 out of 7 on the ONS, with high numeracy demonstrated by 43 (71.7%) and 26 (43.3%) participants, respectively.

Table 1.

Characteristics of the study participants.

Variable                        Number (%)   Mean (95% CI)
Clinical experience, months                  80.88 (59.42-102.35)
 <1 year                        8 (13.3)
 1-5 years                      28 (46.7)
 >5 years                       24 (40.0)
Age, years                                   31.28 (29.34-33.23)
 <26 years                      11 (18.3)
 26-28 years                    15 (25.0)
 >28 years                      34 (56.7)
Gender                                       NA
 Female                         56 (93.3)
 Male                           4 (6.7)
Education                                    NA
 BSN                            52 (86.7)
 MSN or higher                  8 (13.3)
Current role                                 NA
 Manager                        24 (40.0)
 Staff nurse                    36 (60.0)
Statistical knowledge score                  85.56 (82.64-88.46)
 Verbatim                                    81.11 (76.53-85.70)
 Gist                                        76.11 (69.37-82.85)
 DCE                                         99.44 (98.33-100.0)
SNS score (range 8-48)                       37.22 (35.50-38.93)
 High (≥40)                     23 (38.3)
 Low (≤39)                      37 (61.7)
S-NUMi score (range 0-8)                     7.00 (6.77-7.23)
 High (>6)                      43 (71.7)
 Moderate (4-6)                 17 (28.3)
 Low (<4)                       0
ONS score (range 0-7)                        6.17 (5.93-6.41)
 High (≥7)                      26 (43.3)
 Low (≤6)                       34 (56.7)

Abbreviations: BSN, bachelor of science in nursing; DCE, discrete-choice experiment; MSN, master of science in nursing; ONS, Lipkus Objective Numeracy Scale; SNS, Subjective Numeracy Scale; S-NUMi, Numeracy Understanding in Medicine Instrument: short form.

Statistical knowledge and numeracy scores

The overall knowledge score was significantly lower for CIs (mean = 67.78; 95% CI, 62.07-73.49) than for sensitivity and specificity (Table 2). The mean scores for both verbatim and gist knowledge of CIs were notably low (46.67 and 56.67, respectively), with considerable variability. In contrast, the DCE, which assessed the ability to quantify tradeoffs between opposing models, yielded consistently high scores ranging from 98.33 to 100.0, with no significant difference across statistical concepts.

Table 2.

Nurses’ knowledge scores for interpreting statistical concepts.

Concept              Sensitivity            Specificity            CIs                    F (P)
Verbatim knowledge   100.0a                 96.67a (92.10-100.0)   46.67b (33.67-59.66)   56.20 (<.0001)
Gist knowledge       86.67a (77.81-95.52)   85.0a (75.90-94.30)    56.67b (43.76-69.58)   10.30 (<.0001)
DCE                  98.33a (95.10-100.0)   100.0a                 100.0a                 1.00 (.3699)
Overall knowledge    95.00a (91.52-98.48)   93.89a (90.53-97.25)   67.78b (62.07-73.49)   50.90 (<.0001)

Values are mean (95% CI).

Values with different letters (a and b) within each row differ significantly (P < .05) and are the results of post hoc comparisons.

Abbreviation: DCE, discrete-choice experiment.

The scores for statistical knowledge and both objective and subjective numeracy did not differ significantly with clinical experience (Figure 3), and no significant differences were observed between verbatim and gist knowledge.

Figure 3.

Bar chart comparing standardized scores (0–100) for overall statistical knowledge (overall, verbatim, gist, DCE) and numeracy (SNS, S-NUMi, ONS) across three clinical experience levels (novice, junior, senior). Seniors generally score highest, novices lowest, with most group means clustered around or above 80, and a red dashed line indicating the 80-point reference level.

Comparison of statistical knowledge and numeracy according to clinical experience, standardized to a 100-point scale for the Subjective Numeracy Scale (SNS), Numeracy Understanding in Medicine Instrument: short form (S-NUMi), and Lipkus Objective Numeracy Scale (ONS). Abbreviation: DCE, discrete-choice experiment.

The scores for overall statistical knowledge did not differ significantly between nurses with a BSN and those with an MSN or higher (Figure 4). However, the verbatim knowledge score was significantly higher among MSN-or-higher nurses (95.83 vs 78.85, z = 2.55, P = .0108), whereas the DCE score was significantly higher among BSN nurses (100.0 vs 95.83, z = −2.50, P = .0124). The scores for gist knowledge, objective numeracy, and subjective numeracy did not differ significantly with education.

Figure 4.

Bar chart comparing standardized scores (0–100) for overall statistical knowledge (overall, verbatim, gist, DCE) and numeracy (SNS, S-NUMi, ONS) between nurses with a BSN and those with an MSN or higher. Scores are generally similar, but the MSN-or-higher group shows a significantly higher verbatim knowledge score and the BSN group a significantly higher DCE score, marked with asterisks (P < .05).

Comparison of statistical knowledge and numeracy according to education, standardized to a 100-point scale for SNS, S-NUMi, and ONS. * = significant difference, P < .05.

For the key study variables, there was a moderate positive correlation between gist knowledge and DCE score (r = 0.38, P = .0025) and between SNS and S-NUMi scores (r = 0.34, P = .0077; Figure 5). There was a strong correlation between age and clinical experience (r = 0.89, P < .0001), as expected, but neither variable was significantly associated with statistical knowledge or numeracy scores.

Figure 5.

Heatmap showing Pearson correlation coefficients among statistical knowledge domains (verbatim, gist, DCE), numeracy measures (SNS, S-NUMi, ONS), age, and clinical experience in months. Most correlations are weak, with moderate positive correlations between gist and DCE, between SNS and S-NUMi, and a strong positive correlation between age and clinical experience. Warmer colors represent positive and cooler colors negative correlations.

Heatmap of Pearson coefficients for correlations between statistical knowledge domains, numeracy measures, and demographic variables. Warmer and cooler colors indicate stronger positive and negative correlations, respectively. Abbreviation: Clinical Exp, clinical experience, months.

Qualitative findings

Thematic analyses of qualitative responses revealed 4 dominant patterns of interpretive concerns regarding AI fall predictions, as described below.

Overrides due to cognitive mismatch

The most common concern involved nurses overriding the model’s risk classification when it conflicted with their clinical judgment or previous criteria such as the Morse Fall Scale (MFS). This reflects a conservative response pattern grounded in perceived cognitive dissonance between the system output and personal experience. For example, several nurses noted, “A patient who would have been classified as high risk under the previous criteria (MFS) was categorized as not high risk by the prediction model, so I overrode it to high risk based on my judgment.” Similar responses indicated a reliance on heuristics such as advanced age, sedative use, and known risk factors (eg, bleeding tendency, fracture risk) as grounds for overriding low-risk predictions. Conversely, nurses sought clarification when patients were perceived as low risk but flagged as high risk: “The patient is on postoperative day 2 but has been classified as high risk of falling; requesting clarification.” Another nurse expressed confusion about fluctuating predictions: “The fall probability keeps fluctuating between high and not high.” These examples suggest inconsistencies between the internal cognitive models and the system’s risk logic.

Concerns about variable explainability

The second concern reflected demands for model transparency and clinical consistency. Nurses questioned whether specific clinical indicators were included in the model, such as: “Is the Mini Mental State Examination result included in the fall-risk data?”, “Is the structured input for limb motor function included in the model?”, and “Does the model consider not only structured documentation, such as situational records, but also narrative nursing interventions written in the nursing notes?” These queries suggest a desire for alignment between perceived clinically meaningful data and the model’s internal logic, indicating that explainability of AI would facilitate its frontline adoption.

Risk-threshold clarity and risk calibration

The third concern centered on the interpretability of risk thresholds. Some nurses sought clarification about how the model distinguishes between high- and low-risk classifications and what numerical thresholds represent. As one nurse asked, “How is the cutoff between high and low risk determined? And why was it set that way?”, while another added, “What is the actual threshold value? Patients are classified as high risk, but the probabilities seem to differ—what does that mean?” These statements reflect uncertainty in translating probabilistic outputs into actionable clinical decisions.

Reduced burden and reliance concerns

The fourth concern included mixed comments such as, “The burden of documenting the MFS has decreased,” and “It’s convenient to have the system automatically indicate the risk level, but I’m concerned that my own judgment may become unnecessary over time.” One nurse manager also noted, “It seems the inconsistency among nurses that was problematic with the MFS might be resolved.”

Discussion

The effectiveness of AI CDS in nursing workflows hinges not only on algorithmic performance but also on nurses’ ability to interpret AI outputs. This study revealed that while nurses generally demonstrate high numeracy, some struggle with core statistical concepts. For these individuals, situations involving low-frequency outcomes are likely to cause even greater conceptual interference. For example, it is not easy for them to grasp why the PPV of a model with 95% sensitivity and 90% specificity might still be low for a rare event such as inpatient falls. Assuming a fall incidence of around 1%, Bayes’ theorem gives the probability of a positive prediction as P(+) = P(+|Fall)·P(Fall) + P(+|No fall)·P(No fall) = (0.95)(0.01) + (0.10)(0.99) = 0.1085. However, the value of clinical interest, P(Fall|+), is given by P(+|Fall)·P(Fall)/P(+) = (0.95 × 0.01)/0.1085 ≈ 8.8%. Therefore, interpreting prediction-model outputs for low-frequency outcomes like inpatient falls requires a clear understanding of statistical concepts. This cognitive bias, base-rate neglect, is the tendency to overlook the underlying prevalence of an event when interpreting conditional probabilities,4,30,31 and it highlights a potential mismatch between perceived and actual risks. These gaps in interpretive readiness may lead to overconfidence, misinterpretation of AI guidance, and missed opportunities for appropriate care.30
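The worked Bayes example above can be verified with a few lines of arithmetic:

```python
# Reproducing the worked example: 95% sensitivity, 90% specificity,
# and a 1% fall incidence yield a PPV below 9%.
sens, spec, prev = 0.95, 0.90, 0.01

# P(+) = P(+|Fall)P(Fall) + P(+|No fall)P(No fall)
p_pos = sens * prev + (1 - spec) * (1 - prev)

# P(Fall|+) = P(+|Fall)P(Fall) / P(+)  (Bayes' theorem)
ppv = sens * prev / p_pos

print(f"P(+) = {p_pos:.4f}, PPV = {ppv:.3f}")  # P(+) = 0.1085, PPV = 0.088
```

Despite the model’s strong sensitivity and specificity, fewer than 1 in 11 positive predictions corresponds to an actual fall at this base rate, which is the crux of base-rate neglect.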

The participants in this study were registered nurses with baccalaureate or master’s degrees, and constituted a relatively homogeneous group. However, despite being educated in a standardized nursing education system and exhibiting high numeracy, they demonstrated only moderate overall statistical knowledge, with a particularly poor understanding of CIs. Their statistical knowledge score (85.56) was lower than that of laypersons in a previous study using the same vignette,11 with the gap being especially large for CI-related items, exceeding 18 points. According to FTT, individuals often struggle more with probabilistic reasoning than with fact recall, particularly in clinical scenarios involving low-prevalence events. The nurses in this study similarly appeared to conflate statistical terms, hindering their ability to interpret CIs.

The nurses scored significantly higher on the DCE, indicating a greater ability to understand tradeoffs between model outputs. This divergence suggests that despite statistical concepts not being well internalized, nurses may still engage in intuitive reasoning when faced with applied decision-making tasks, highlighting a gap between formal statistical knowledge and functional interpretive behavior.

Contrary to expectations from FTT,16,17,22 the nurses exhibited better verbatim than gist knowledge, regardless of their experience. While gist knowledge is typically considered more cognitively efficient and expert-driven,4,21 the observed pattern suggests that clinical exposure alone is insufficient to intuitively interpret statistical outputs in a context-sensitive manner. Moreover, while holders of advanced degrees scored higher on verbatim knowledge, they underperformed in the DCE, reinforcing the need to bridge theoretical understanding and clinical decision-making. Correlational analyses further supported this dissociation, with gist knowledge being positively associated with DCE performance, and verbatim scores showing weak associations with age, experience, and numeracy. These findings align with other research indicating inadequate statistical knowledge among nurses, including a poor understanding of core concepts such as probability, sensitivity, specificity, and model accuracy metrics (eg, c-statistics, AUROC).7–9,32,33 Without conceptual clarity, nurses may misinterpret AI outputs, either overrelying on predictions or rejecting them based on intuition or skepticism.14,15 In our study, this was evident in some nurses’ tendency to compare and override the AI predictions when these conflicted with their familiar heuristics (ie, confirmation bias), selectively attending to discrepancies that aligned with their prior expectations.34 Some nurses exhibited signs of automation bias, characterized by overreliance on AI outputs. These users tended to defer their clinical judgment, accepting AI predictions without critical evaluation.35

Nonetheless, most nurses did not raise questions or concerns about the prediction model. Some recognized the relevance of the input data and actively suggested improvements to data entry and variable selection. Others appreciated the standardization it brought across shifts and considered that the model helped to reduce the documentation burden associated with repeated applications of the MFS.

The nurses in our sample scored above average on validated health numeracy instruments (SNS, S-NUMi, and ONS),26–28 but these scores were not correlated with an improved understanding of statistical concepts. This diverges from previous suggestions that statistical knowledge—the ability to apply probabilistic reasoning in practice—is a unique predictor of decision-making competence and that health numeracy mediates the effect of statistical knowledge on decision-making performance.11,12,36 One plausible explanation is that the nurses had insufficient exposure to decision-making situations that required the direct application of statistical knowledge in clinical contexts. Consequently, opportunities to develop gist knowledge from verbatim knowledge may have been too limited to foster the cognitive ability necessary to interpret AI outputs effectively or to make informed judgments about their appropriate use in clinical decision-making.

These results reveal a critical informatics challenge: bridging the gap between nurses’ internal cognitive models and AI systems. The extended CFT23 emphasizes the importance of presenting predictive information in formats consistent with user expectations and interpretive heuristics. Accordingly, displaying visual SHAP (Shapley Additive exPlanations) or LIME (Local Interpretable Model-agnostic Explanations) explanations in the user interface may improve the understandability of the prediction model. SHAP provides consistent, mathematically grounded explanations by quantifying each feature’s contribution (positive or negative) to a specific prediction, while LIME can use visually intuitive explanations such as bar plots to clearly show what drove a decision.
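The additive logic behind SHAP can be sketched without the library itself: for a linear model, each feature’s exact Shapley value has the closed form coef_j × (x_j − mean_j), its contribution relative to a baseline patient. The feature names, weights, and baseline means below are hypothetical, chosen only to mimic a fall-risk score; a deployed model would use learned values and the `shap` package.

```python
# Hypothetical linear fall-risk score illustrating SHAP's additive idea:
# each feature's contribution (positive or negative) plus the baseline
# prediction sums to the model's output for this patient.
FEATURES = ["age", "sedative_use", "prior_falls"]
COEFS = {"age": 0.02, "sedative_use": 0.9, "prior_falls": 1.3}       # assumed weights
BASELINE_MEANS = {"age": 70.0, "sedative_use": 0.2, "prior_falls": 0.1}

def linear_shap(patient):
    """For a linear model, the exact Shapley value of feature j is
    coef_j * (x_j - mean_j): its push above or below the average prediction."""
    return {f: COEFS[f] * (patient[f] - BASELINE_MEANS[f]) for f in FEATURES}

patient = {"age": 82, "sedative_use": 1, "prior_falls": 2}
contribs = linear_shap(patient)
# Rank features by magnitude, as a SHAP bar plot would display them
for f, c in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{f:>14}: {c:+.2f}")
```

Surfacing such a ranked, signed breakdown in the user interface gives nurses the gist ("prior falls are driving this alert") rather than an opaque risk number.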

Embedding user-centered design principles in AI interfaces alongside providing structured training on model logic and output interpretation will be essential for the safe and meaningful integration of AI into nursing workflows. The findings from this study have been applied within the INSIGHT Project, with the AI prediction model implemented as a CDS system across 3 additional health-care institutions (1 tertiary university hospital and 2 public secondary hospitals), and it is currently undergoing multisite clinical evaluations. Phenomena similar to those observed in this study—including challenges in interpretive ability and reliance on heuristics—were consistently identified during these implementations, further emphasizing the need for appropriate user interfaces and greater efforts to help nurses understand AI outputs. Moreover, the hospital participating in this study has expanded its AI applications beyond fall prevention to include pressure injuries and phlebitis.

Both academic institutions and health-care organizations are actively implementing education programs related to AI literacy for students and health-care professionals.37–39 However, many nurses still lack opportunities for practical engagement with AI in real-world settings. Bridging this critical educational gap will require targeted training and user-friendly instructional design to support learning processes in practical applications.

Limitations of this study include the underrepresentation of novice nurses, which restricts its generalizability across experience levels. All participants were baccalaureate-holding nurses from a single tertiary hospital, restricting educational diversity. Additionally, the survey comprised a stratified sample of 180 nurses who participated voluntarily, so their characteristics (such as age, experience, gender, education, and role) might not have been representative. It is also possible that the vignette used in this study did not fully capture nurses’ statistical knowledge, since the tool was designed for laypersons with a focus on accessibility and risk communication. Its content validity was supported through expert review, and its construct validity was evidenced by significant correlations with objective and subjective numeracy measures (S-NUMi and SNS). However, the lack of test–retest reliability data makes its stability unclear. Lastly, this study focused on individual cognitive ability, so broader dimensions of AI implementation such as ethical considerations, organizational policies, and team-level abilities were not considered.

Conclusion

This study identified that the gap between model outputs and users’ ability to interpret them represents a critical barrier to the safe and effective use of AI predictions. Nurses’ inadequate understanding of core statistical concepts, particularly in the context of low-prevalence, high-uncertainty predictions, can lead to confusion, mistrust, or inappropriate overrides of AI recommendations. These interpretive gaps are not merely educational deficits; they represent a structural misalignment between system design and user cognition. Realizing the full potential of AI in nursing care requires system developers, educators, and health-care organizations to collaboratively design solutions that provide cognitive alignment in both user interfaces and training curricula, ensuring that predictive analytics are not only accurate but also understandable and actionable at the point of care.

Supplementary Material

ooag009_Supplementary_Data

Acknowledgments

We thank 2 researchers, Joohyun Chung, PhD, and Donsoo Kim, PhD, for their assistance in translating and validating the research tools. We also appreciate the nurses who participated in this study, as well as Mira Song, PhD, who assisted with the administrative and logistical work, including preparation at the hospital.

Contributor Information

Insook Cho, School of Nursing, Inha University, Incheon, 22212, Republic of Korea; Division of General Internal Medicine, The Center for Patient Safety Research and Practice, Brigham and Women’s Hospital, Boston, MA 02120, United States.

Soyun Shim, Department of Nursing, Samsung Medical Center, Seoul, 06351, Republic of Korea.

Hyunchul Park, The Graduate School of Business IT, Kookmin University, Seoul, 02707, Republic of Korea.

Author contributions

Insook Cho (Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Supervision, Validation, Writing—original draft, Writing—review & editing), Soyun Shim (Data curation, Formal analysis, Project administration, Resources, Writing—review & editing), and Hyunchul Park (Conceptualization, Formal analysis, Software, Validation, Visualization, Writing—review & editing)

Supplementary material

Supplementary material is available at JAMIA Open online.

Funding

This study was supported by National Research Foundation of Korea grants funded by the Korea government (MSIT) (RS 2024-00341841 and RS 2024-00466631).

Conflicts of interest

No conflict of interest has been declared by the authors.

Data availability

The data supporting this study’s findings are available from the authors upon reasonable request.

Ethical approval statement

The Institutional Review Board of Samsung Medical Center approved this study (No. SMC 2019-11-105-001).

References

  • 1.Durlach P, Fournier R, Gottlich J, et al. The AI maturity roadmap: a framework for effective and sustainable AI in health care. NEJM AI Sponsored. 2024. 10.1056/AI-S2400177 [DOI] [Google Scholar]
  • 2.Sahni NR, Carrus B.. Artificial intelligence in US health care delivery. N Engl J Med. 2023;389:348-358. [DOI] [PubMed] [Google Scholar]
  • 3.Bates DW, Levine D, Syrowatka A, et al. The potential of artificial intelligence to improve patient safety: a scoping review. NPJ Digit Med. 2021;4:54. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Reyna VF. A theory of medical decision making and health: fuzzy trace theory. Med Decis Making. 2008;28:850-865. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Webster P. Six ways large language models are changing healthcare. Nat Med. 2023;29:2969-2971. [DOI] [PubMed] [Google Scholar]
  • 6.Hartman V, Zhang X, Poddar R, et al. Developing and evaluating large language model-generated emergency medicine handoff notes. JAMA Netw Open. 2024;7:e2448723-e. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Hayat MJ, Kim M, Schwartz TA, Jiroutek MR.. A study of statistics knowledge among nurse faculty in schools with research doctorate programs. Nurs Outlook. 2021;69:228-233. [DOI] [PubMed] [Google Scholar]
  • 8.Redman TC, Hoerl RW.. AI and Statistics: Perfect Together. Massachusetts Institute of Technology; 2024. https://sloanreview.mit.edu/article/ai-and-statistics-perfect-together/. [Google Scholar]
  • 9.Friedrich S, Antes G, Behr S, et al. Is there a role for statistics in artificial intelligence? Adv Data Anal Classif. 2022;16:823-846. [Google Scholar]
  • 10.Collins GS, Dhiman P, Ma J, et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ. 2024;384:e074819. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Weissman GE, Yadav KN, Madden V, et al. Numeracy and understanding of quantitative aspects of predictive models: a pilot study. Appl Clin Inform. 2018;9:683-692. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Schwartz LM, Woloshin S, Black WC, Welch HG.. The role of numeracy in understanding the benefit of screening mammography. Ann Intern Med. 1997;127:966-972. [DOI] [PubMed] [Google Scholar]
  • 13.Liu Y, Chen P-HC, Krause J, Peng L.. How to read articles that use machine learning: users’ guides to the medical literature. JAMA. 2019;322:1806-1816. [DOI] [PubMed] [Google Scholar]
  • 14.Choudhury A. Toward an ecologically valid conceptual framework for the use of artificial intelligence in clinical settings: need for systems thinking, accountability, decision-making, trust, and patient safety considerations in safeguarding the technology and clinicians. JMIR Hum Factors. 2022;9:e35421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Walther CC. Hybrid intelligence: the future of human-AI collaboration: a path to harnessing the range of our assets. Psychol Today. March 12, 2025. Accessed June 10, 2025. https://www.psychologytoday.com/us. [Google Scholar]
  • 16.Reyna V, Brainerd C.. Fuzzy-trace theory: some foundational issues. Learn Individ Differ. 1995;7:145-162. [Google Scholar]
  • 17.Reyna VF, Brainerd CJ.. Dual processes in decision making and developmental neuroscience: a fuzzy-trace model. Dev Rev. 2011;31:180-206. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Vessey I. Cognitive fit: a theory‐based analysis of the graphs versus tables literature. Decis Sci. 1991;22:219-240. [Google Scholar]
  • 19.Cho I, Cho J, Hong JH, Choe WS, Shin H.. Utilizing standardized nursing terminologies in implementing an AI-powered fall-prevention tool to improve patient outcomes: a multihospital study. J Am Med Inform Assoc. 2023;30:1826-1836. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Cho I, Sun Jin I, Park H, Dykes PC.. Clinical impact of an analytic tool for predicting the fall risk in inpatients: controlled interrupted time series. JMIR Med Inform. 2021;9:e26456. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Blalock SJ, Reyna VF.. Using fuzzy-trace theory to understand and improve health judgments, decisions, and behaviors: a literature review. Health Psychol. 2016;35:781-792. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Reyna VF, Brainerd CJ.. Numeracy, gist, literal thinking and the value of nothing in decision making. Nat Rev Psychol. 2023;2:1-39. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Shaft TM, Vessey I.. The role of cognitive fit in the relationship between software comprehension and modification. MIS Q. 2006;30:29-55. [Google Scholar]
  • 24.Gleaves LP, Broniatowski DA.. Impact of gist intervention on automated system interpretability and user decision making. Cogn Res Princ Implic. 2024;9:70. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Zhang J. The nature of external representations in problem solving. Cogn Sci. 1997;21:179-217. [Google Scholar]
  • 26.Fagerlin A, Zikmund-Fisher BJ, Ubel PA, Jankovic A, Derry HA, Smith DM.. Measuring numeracy without a math test: development of the subjective numeracy scale. Med Decis Making. 2007;27:672-680. [DOI] [PubMed] [Google Scholar]
  • 27.Schapira MM, Walker CM, Miller T, et al. Development and validation of the numeracy understanding in medicine instrument short form. J Health Commun. 2014;19:240-253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Lipkus IM, Samsa G, Rimer BK.. General performance on a numeracy scale among highly educated samples. Med Decis Making. 2001;21:37-44. [DOI] [PubMed] [Google Scholar]
  • 29.Shim S, Yu JY, Jekal S, et al. Development and validation of interpretable machine learning models for inpatient fall events and electronic medical record integration. Clin Exp Emerg Med. 2022;9:345-353. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Gigerenzer G, Gaissmaier W, Kurz-Milcke E, Schwartz LM, Woloshin S.. Helping doctors and patients make sense of health statistics. Psychol Sci Public Interest. 2007;8:53-96. [DOI] [PubMed] [Google Scholar]
  • 31.Markovits H, Béghin G.. The paradoxical effects of time pressure on base rate neglect. Cognition. 2023;237:105451. [DOI] [PubMed] [Google Scholar]
  • 32.Jeffery AD, Novak LL, Kennedy B, Dietrich MS, Mion LC.. Participatory design of probability-based decision support tools for in-hospital nurses. J Am Med Inform Assoc. 2017;24:1102-1110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Whiting PF, Davenport C, Jameson C, et al. How well do health professionals interpret diagnostic information? A systematic review. BMJ Open. 2015;5:e008155. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Bashkirova A, Krpan D.. Confirmation bias in AI-assisted decision-making: AI triage recommendations congruent with expert judgments increase psychologist trust and recommendation acceptance. Comput Human Behav. 2024;2:100066. [Google Scholar]
  • 35.Khera R, Simon MA, Ross JS.. Automation bias and assistive AI: risk of harm from AI-driven clinical decision support. JAMA. 2023;330:2255-2257. [DOI] [PubMed] [Google Scholar]
  • 36.Cokely ET, Feltz A, Ghazal S, Allan JN, Petrova D, Garcia-Retamero R.. Decision making skill: from intelligence to numeracy and expertise. In: Ericsson KA, Hoffman RR, Kozbelt A, Williams AM, eds. Cambridge Handbook of Expertise and Expert Performance. Vol. 2018. Cambridge University Press; 2018:476-505. [Google Scholar]
  • 37.Buchanan C, Howitt ML, Wilson R, Booth RG, Risling T, Bamford M.. Predicted influences of artificial intelligence on nursing education: Scoping review. JMIR Nurs. 2021;4:e23933. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Perchik JD, Smith AD, Elkassem AA, et al. Artificial intelligence literacy: developing a multi-institutional infrastructure for AI education. Acad Radiol. 2023;30:1472-1480. [DOI] [PubMed] [Google Scholar]
  • 39.Simpson RL. Integrating big data into nursing education: a call to action for faculty. Nurs Educ Perspect. 2023;44:333-334. [DOI] [PubMed] [Google Scholar]



Articles from JAMIA Open are provided here courtesy of Oxford University Press
