JAMIA Open. 2026 Jan 11;9(1):ooag001. doi: 10.1093/jamiaopen/ooag001

A preliminary evaluation of AFibrisk: digital decision-support platform for atrial fibrillation risk assessment after cryptogenic stroke—a cross-sectional concordance study

João Brainer Clares de Andrade 1,2,3,4, Rafael P Gomes 5, Alexandre Cristiuma Robles 6, Thales Pardini Fagundes 7, George N Nunes Mendes 8
PMCID: PMC12924629  PMID: 41727413

Abstract

Objectives

Identifying patients at high risk for atrial fibrillation (AF) after cryptogenic stroke remains a challenge, particularly in settings with limited access to long-term cardiac monitoring. The AFibrisk platform, a free digital decision-support tool, integrates 19 validated AF prediction scores to support post-stroke triage. We aimed to assess the concordance of AFibrisk-supported classification decisions with expert electrophysiologist consensus and compare performance across evaluator groups with different levels of clinical experience.

Materials and Methods

A prospective, cross-sectional concordance study was conducted using 29 standardized clinical vignettes. Evaluators—3 vascular neurologists, 4 cardiology residents, and 11 neurology residents—classified each case as high or low AF risk using AFibrisk outputs. Expert consensus served as the reference standard. Statistical analyses included inter-group comparisons, inter-rater reliability, and regression models adjusting for group size and response clustering.

Results

Vascular neurologists demonstrated the highest agreement with the reference standard (mean 90.3%), followed by cardiology residents (85.2%) and neurology residents (77.5%). Differences were statistically significant (ANOVA P = .0199; Kruskal–Wallis P = .0259). Neurology residents showed the greatest intra-group consistency (Light’s κ = 0.607), despite lower accuracy. Classification errors differed by experience: residents tended to overestimate risk, while experts showed occasional underestimation. Overall, 30.1% of responses were “not classified,” with the highest uncertainty among vascular neurologists (43.8%).

Discussion and Conclusion

AFibrisk improved alignment with expert judgment across evaluator groups and helped standardize decision-making. Our free platform may support AF risk stratification in low-resource environments and reinforce evidence-based heuristics among early-career clinicians; it is available at www.afibrisk.net.

Keywords: atrial fibrillation, stroke, decision-support system, clinical prediction, digital health, electrophysiology, neurology

1. Introduction

Atrial fibrillation (AF) is one of the most common causes of embolic stroke, yet its diagnosis is frequently delayed or missed in patients with cryptogenic stroke.1 Identifying those at highest risk of paroxysmal AF is essential for initiating timely secondary prevention strategies.1–3 However, detection remains difficult in many clinical settings, especially where long-term cardiac monitoring is not readily available. Despite advances in continuous rhythm monitoring and wearable technology, widespread implementation remains limited by cost, access, and infrastructure constraints.4–6

To bridge this gap, numerous clinical prediction models have been developed to stratify the risk of incident AF (new-onset atrial fibrillation detected after the index stroke event) using readily available clinical and imaging variables. While these tools offer a low-cost strategy for identifying high-risk patients, their clinical adoption has been limited.6–10

Digital decision-support platforms offer a potential solution by aggregating existing scores into a unified interface, reducing complexity and supporting more consistent clinical reasoning.11–16 Such platforms have shown promise in other areas of stroke care, particularly when integrated into electronic health records or mobile applications. However, little is known about how such tools perform across users with varying clinical experience.

AF detection after cryptogenic stroke remains a priority in stroke systems of care, particularly in low- and middle-income countries.17,18 A pragmatic, evidence-based, and easily deployable strategy to support post-stroke triage may reduce the risk of recurrent ischemic events and optimize resource allocation. Structured platforms that consolidate validated tools can also play a role in medical education, reinforcing evidence-based heuristics for early-career clinicians while mitigating risk stratification errors.19–21

We aimed to evaluate the performance of AFibrisk, a free digital decision-support platform that compiles 19 published and validated AF prediction scores, in a simulated case-based environment. Integrating multiple existing AF prediction models—such as CHARGE-AF, ARIC, Framingham, C2HEST, HATCH, and the KP-AF model—poses an additional challenge, as these scores rely on overlapping yet sometimes discordant predictors, definitions, and weighting schemes. Harmonizing them within a unified platform like AFibrisk requires careful methodological alignment to ensure consistent and clinically meaningful risk estimation.

We aimed to (1) measure the concordance of user responses with the consensus of electrophysiology experts; (2) compare the performance across vascular neurologists, cardiology residents, and neurology residents; and (3) analyze the types of discrepancies encountered. Our hypothesis was that the tool would improve consistency and accuracy across evaluator groups, particularly among less experienced clinicians.

2. Methods

Study design

We conducted a prospective, cross-sectional concordance study to assess the clinical utility of AFibrisk, a digital decision-support platform that consolidates 19 validated clinical prediction scores for estimating the risk of incident atrial fibrillation (AF) in patients with cryptogenic stroke (ischemic stroke without identifiable cause after standard evaluation). AFibrisk is not a new prediction model but an implementation platform that aggregates 19 existing validated scores. It collects demographic, neurological, clinical, laboratory, electrocardiographic, echocardiographic, and vascular variables. These include age, sex, race, stroke severity (NIHSS), cardiovascular risk factors (hypertension, diabetes, coronary disease, heart failure, COPD), lifestyle factors (smoking, alcohol consumption), anthropometric measurements, laboratory biomarkers (LDL cholesterol, BNP), ECG parameters, echocardiographic findings, and markers of systemic and cerebrovascular disease.
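To make the input concrete, the sketch below shows, in R (the language used for this study’s analyses), a hypothetical vignette record spanning these variable domains. Every field name here is our illustrative assumption, not the platform’s actual schema.

```r
# Hypothetical vignette record illustrating the variable domains AFibrisk
# collects; all field names are illustrative assumptions.
vignette <- list(
  age = 72, sex = "F", race = "white",
  nihss = 6,                                  # stroke severity
  hypertension = TRUE, diabetes = FALSE, coronary_disease = FALSE,
  heart_failure = FALSE, copd = TRUE,
  smoking = FALSE, alcohol = FALSE,           # lifestyle factors
  bmi = 27.4,                                 # anthropometrics
  ldl_mg_dl = 130, bnp_pg_ml = 210,           # laboratory biomarkers
  ecg_pr_interval_ms = 190,                   # ECG parameter
  la_diameter_mm = 44,                        # echocardiographic finding
  atrial_septal_abnormality = FALSE
)
str(vignette)
```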

The study was performed using structured clinical vignettes and simulated assessments to standardize evaluator exposure and ensure uniform case presentation across participants. The protocol adhered to STARD and MINIMAR reporting standards for diagnostic and AI-assisted tools, respectively.

Case vignette development and reference standard

An initial pool of 40 anonymized clinical vignettes was developed based on the literature and curated by a panel of vascular neurologists and electrophysiologists. Each vignette contained standardized clinical variables—such as demographics, comorbidities, neuroimaging findings, and baseline ECG data—sufficient to calculate up to 19 AF prediction scores integrated into the AFibrisk platform. Each case was independently reviewed and classified by three board-certified cardiac electrophysiologists. Cases lacking a clear majority agreement (ie, those with full disagreement or persistent divergence after blinded re-review) were excluded. As a result, 11 cases (27.5%) were removed due to lack of consensus among the electrophysiologists. The final dataset included 29 clinical cases with an established majority reference classification.

The reference classification for each included case was defined by the majority judgment among the three expert electrophysiologists. In cases where two or more experts agreed on the risk level (high vs. low), this consensus was used as the reference standard. Inter-rater reliability was assessed using pairwise concordance and Cohen’s kappa, demonstrating high consistency among the experts.
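A minimal R sketch of this reference-standard construction, assuming a 29 × 3 table of expert labels (the column names and simulated data are illustrative):

```r
# Majority-vote reference standard and pairwise Cohen's kappa.
library(irr)  # provides kappa2()

set.seed(1)
experts <- data.frame(
  ep1 = sample(c("high", "low"), 29, replace = TRUE),
  ep2 = sample(c("high", "low"), 29, replace = TRUE),
  ep3 = sample(c("high", "low"), 29, replace = TRUE)
)

# Majority vote across the three electrophysiologists defines the reference.
reference <- apply(experts, 1, function(r) names(which.max(table(r))))

# Pairwise Cohen's kappa between experts.
kappa2(experts[, c("ep1", "ep2")])
kappa2(experts[, c("ep2", "ep3")])
kappa2(experts[, c("ep1", "ep3")])
```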

The vignettes were constructed with systematic inclusion of variables required for the 19 AF prediction scores. Universal variables documented in all 40 cases (100%) included age, sex, heart failure, and diabetes mellitus. Nearly universal variables comprised COPD (84.6%), valvular disease (82.1%), and hypertension (74.4%). Prior stroke history appeared in 46.2% of vignettes, reflecting a mixed population of first-stroke and recurrent-stroke patients. Clinical severity assessments included NIHSS scores in 33.3% of cases. Cardiac evaluation variables encompassed ECG findings (33.3%), biomarkers such as BNP (30.8%), and neuroimaging data (30.8%). Race specification occurred in approximately half of cases (51.3%). Less frequently documented but clinically relevant variables included prior myocardial infarction (17.9%), vital signs such as blood pressure (15.4%) and heart rate (20.5%), BMI (20.5%), smoking status (15.4%), and detailed echocardiographic parameters including left atrial dimensions (5.1%) and atrial septal abnormalities (15.4%).

Validation

We invited a total of 27 neurology residents (postgraduate years 2-4) and 27 cardiology residents (postgraduate years 3-4) from academic institutions with dedicated stroke units and arrhythmia clinics. Invitations were distributed via institutional emails and professional networks. Participation was voluntary and required informed digital consent.

From the invited pool, 11 neurology residents, 4 cardiology residents, and 3 board-certified vascular neurologists ultimately completed the full evaluation protocol and were included in the final analysis. All participants had prior clinical exposure to stroke care but differed in experience with AF risk stratification tools. Evaluators were blinded to the reference classifications and to each other’s responses.

Participants accessed the AFibrisk platform through a secure, web-based interface. For each vignette, they were instructed to classify the patient as either high risk or low risk for incident AF, based on the clinical data and the consolidated scoring output generated by the platform. AFibrisk displayed all 19 scores in both numerical and graphical formats, automatically calculated from case-level variables. Importantly, evaluators were instructed to interpret and synthesize the information from all 19 scores collectively rather than selecting or relying on any single score. The platform presented the scores simultaneously without providing an algorithmic integration or weighted average, thereby requiring evaluators to apply their clinical judgment to reconcile potentially discordant risk estimates across multiple scores. This approach was designed to assess how clinicians integrate multiple prediction tools in practice, reflecting real-world decision-making where multiple risk assessment tools may yield conflicting recommendations. Evaluators were explicitly told that their final risk classification should consider the overall pattern of results across all scores, using their clinical expertise to weigh the relative importance of different predictive models based on the specific clinical context of each vignette. Evaluators also had the option to select “unable to classify” when they deemed the case indeterminate. These responses were recorded but excluded from all concordance and accuracy calculations.
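As an illustration of the kind of score the platform compiles, the sketch below implements the C2HEST score as commonly published (coronary artery disease 1 point, COPD 1, hypertension 1, elderly age ≥75 2, systolic heart failure 2, thyroid disease 1). This is our own minimal R sketch, not AFibrisk’s code.

```r
# Illustrative implementation of one compiled score (C2HEST) as commonly
# published; argument names are our assumptions.
c2hest <- function(cad, copd, hypertension, age, systolic_hf, hyperthyroidism) {
  cad + copd + hypertension + 2 * (age >= 75) + 2 * systolic_hf + hyperthyroidism
}

# Example: a 72-year-old with COPD and hypertension scores 2 points.
c2hest(cad = 0, copd = 1, hypertension = 1, age = 72,
       systolic_hf = 0, hyperthyroidism = 0)
```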

Outcomes and statistical analysis

The primary outcome was the agreement between evaluator risk classifications and the expert-defined reference standard. Secondary outcomes included intra-group agreement, between-group differences, and analysis of error types—classified as underestimations (false negatives) or overestimations (false positives).

Statistical methods

All statistical analyses were conducted using R (version 4.3.1). Descriptive statistics were used to summarize the accuracy of classifications and the completion rates across evaluator groups. Inter-rater agreement was assessed using Cohen’s kappa for pairwise comparisons and Light’s kappa for multi-rater reliability. Group differences in concordance with the reference standard were evaluated using one-way analysis of variance (ANOVA) for normally distributed data and the Kruskal–Wallis test as a non-parametric alternative. When statistically significant differences were observed, post-hoc pairwise comparisons were performed using Dunn’s test with Bonferroni correction to adjust for multiple testing. In addition, linear regression models were constructed to assess the effect of evaluator group on classification accuracy, adjusting for unequal group sizes and clustering of responses. All statistical tests were two-sided, and a P-value < .05 was considered indicative of statistical significance.
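A sketch of this group-comparison pipeline in R, assuming a data frame df with one row per evaluator holding their concordance (%) and group (column names are our assumptions):

```r
# One-way ANOVA, Kruskal-Wallis, and Dunn's post-hoc test with Bonferroni
# correction, mirroring the analyses described above.
library(dunn.test)  # provides dunn.test()

summary(aov(concordance ~ group, data = df))   # parametric comparison

kruskal.test(concordance ~ group, data = df)   # non-parametric alternative

# Post-hoc pairwise comparisons with Bonferroni adjustment.
dunn.test(df$concordance, df$group, method = "bonferroni")
```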

To account for the nested structure of the data (multiple cases per evaluator) while adhering to our prespecified approach, we fitted linear regression models with Huber–White cluster-robust standard errors at the evaluator level and adjusted for unequal group sizes. Between-group differences were significant based on a cluster-robust Wald test. Effect sizes were Cohen’s d = 0.58 (vascular neurologists vs cardiology residents) and d = 1.18 (vascular neurologists vs. neurology residents); the partial η2 for the group term in the linear model was 0.431 (ω2 = 0.320). Excluding “unable to classify” responses is a limitation. In a sensitivity analysis treating these as incorrect classifications, overall accuracy was 54.6%, with differential impact across groups (vascular neurologists: 51.7%; cardiology residents: 60.3%; neurology residents: 53.3%), suggesting our primary analysis may overestimate real-world performance.
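The cluster-robust model and the sensitivity analysis could be reproduced along these lines, assuming case-level responses in a data frame resp (all column names are our assumptions):

```r
# Linear model with Huber-White cluster-robust standard errors at the
# evaluator level, plus the "unable to classify" sensitivity analysis.
library(sandwich)  # vcovCL() for cluster-robust covariance
library(lmtest)    # coeftest(), waldtest()

fit <- lm(correct ~ group, data = subset(resp, response != "unable"))

# Coefficients with standard errors clustered on evaluator.
coeftest(fit, vcov = vcovCL(fit, cluster = ~evaluator))

# Cluster-robust Wald test of the overall group effect.
waldtest(fit, . ~ 1, vcov = vcovCL(fit, cluster = ~evaluator))

# Sensitivity analysis: treat "unable to classify" as incorrect.
resp$correct_sens <- ifelse(resp$response == "unable", 0, resp$correct)
tapply(resp$correct_sens, resp$group, mean)  # per-group accuracy
```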

Data availability

All de-identified clinical vignettes, evaluator-level responses, reference classifications, and analysis scripts are available on Mendeley Data [Andrade, Joao (2025), “Digital Platform for AF Risk after Stroke”, Mendeley Data, V1, doi: 10.17632/wnvrctzryj.1]. This open dataset enables full reproducibility of the results and encourages secondary research on decision-making in post-stroke AF detection. The platform code is available at: https://github.com/joaobrainer/afibrisk

Ethics

This study was conducted in accordance with the principles of the Declaration of Helsinki and followed all applicable guidelines for research involving human participants. The protocol was reviewed and approved by our local IRB.

All participants provided informed digital consent prior to participation. Since the study involved simulated clinical vignettes and did not include any identifiable patient data or clinical interventions, it was classified as minimal risk. All data collected from evaluators were de-identified prior to analysis. Participants were informed that their individual responses would remain confidential and would not be linked to their professional identity in any publication or presentation. The expert electrophysiologists who defined the reference standard were also blinded to the identities of the evaluators and to each other’s classifications during initial case review. The platform AFibrisk is completely free and does not have ads or merchandising.

3. Results

A total of 29 clinical cases were included in the final analysis. Each case had a consensus risk classification provided by three independent cardiac electrophysiologists, whose majority response constituted the reference standard.

The cohort of evaluators consisted of three vascular neurologists, four cardiology residents, and eleven neurology residents. The individual judgments of the three electrophysiologists—used to define the reference standard—were analyzed only to assess internal agreement and not compared to the standard itself.

Agreement among the three electrophysiologists was remarkably high, with pairwise concordance ranging from 94.4% to 100%, and corresponding Cohen’s kappa values between 0.87 and 1.00. This high inter-rater reliability supports the robustness and internal consistency of the reference classification adopted in this study.

When comparing individual judgments to the reference standard, vascular neurologists exhibited a mean concordance of 90.3% (standard deviation [SD] 10.0%; 95% CI 80.8-96.8), while cardiology residents showed a mean of 85.2% (SD 7.9%; 95% CI 76.1-91.4) and neurology residents 77.5% (SD 11.0%; 95% CI 71.3-82.3) (Figures 1 and 2). These differences were statistically significant in both parametric and non-parametric frameworks. One-way ANOVA yielded an F statistic of 4.29 (P = .0199), and the Kruskal–Wallis test confirmed significance (P = .0259) (Table 2). A post-hoc Dunn test with Bonferroni correction revealed statistically significant differences between the neurology residents and the reference group (electrophysiologists) (adjusted P < .0002), and between the cardiology residents and the reference (adjusted P < .0001). No statistically significant difference was observed between vascular neurologists and either resident group (adjusted P > .05).

Figure 1. Distribution of concordance with the reference standard. Boxplot of individual-level concordance rates (%) with the reference standard across evaluator groups. Each dot represents a single evaluator.

Figure 2. Heatmap of evaluator responses by group and reference classification. Heatmap showing the distribution of evaluator responses stratified by both evaluator group and the reference classification (high vs. low risk for incident atrial fibrillation). Cell values represent the proportion of responses within each subgroup. High = high risk; Low = low risk for incident atrial fibrillation.

Table 1.

Accuracy and inter-rater reliability by evaluator group.

Evaluator group Mean concordance (%) Standard deviation Light’s Kappa
Vascular neurologists 90.3 10.0 0.417
Cardiology residents 85.2 7.9 0.432
Neurology residents 77.5 11.0 0.607

Mean concordance with the reference standard, standard deviation, and Light’s Kappa statistic by evaluator group.

A linear regression model using evaluator group as a categorical predictor of concordance confirmed the presence of statistically significant between-group variation (F = 4.29, P = .0199), reinforcing the robustness of the group effect regardless of unequal sample sizes.

Intra-group agreement was evaluated using pairwise concordance and Cohen’s kappa. Among vascular neurologists, kappa values ranged from 0.59 to 0.85, indicating moderate to substantial agreement. Among residents, intra-group consistency was more variable. When Light’s kappa was computed by incorporating the reference classification as an additional rater, neurology residents exhibited the highest multi-rater reliability, with substantial agreement (κ = 0.607; 95% CI 0.54-0.67), followed by cardiology residents (κ = 0.432; 95% CI 0.34-0.53) and vascular neurologists (κ = 0.417; 95% CI 0.31-0.52), both with moderate agreement; confidence intervals were obtained using non-parametric bootstrap resampling. These findings suggest that, despite lower individual-level accuracy, the neurology residents showed notable internal consistency in their assessments (Table 1).
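A sketch of the multi-rater reliability computation with a bootstrap confidence interval, assuming a cases × raters matrix ratings of “high”/“low” labels that includes the reference classification as an extra column (the matrix name and bootstrap settings are our assumptions):

```r
# Light's kappa across all raters with a non-parametric percentile CI.
library(irr)   # kappam.light()
library(boot)  # bootstrap resampling

kappam.light(ratings)  # multi-rater reliability

boot_kappa <- function(d, idx) kappam.light(d[idx, ])$value
set.seed(42)
b <- boot(ratings, boot_kappa, R = 2000)   # resample cases
boot.ci(b, type = "perc")                  # percentile bootstrap 95% CI
```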

Table 2.

Group comparisons: ANOVA and non-parametric tests.

Test Statistic P-value
One-way ANOVA F = 4.29 .0199
Kruskal–Wallis Not reported .0259

Statistical significance testing using both parametric (ANOVA) and non-parametric (Kruskal–Wallis) frameworks.

To explore the nature of discrepancies, each incorrect response was classified as either an underestimation (cases in which the evaluator labeled the patient as low risk while the reference indicated high risk) or an overestimation (cases where the evaluator labeled the patient as high risk but the reference classified the case as low risk). Across all groups, 38 of 426 total judgments (8.9%) were classified as overestimations, whereas only 10 judgments (2.3%) represented underestimations (Figure 3).
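A short R sketch of this error typing, assuming the data frame resp has columns rated and reference with values “high”/“low” after unclassified rows are removed (column names are our assumptions):

```r
# Classify each judgment as correct, overestimation, or underestimation.
resp$outcome <- with(resp, ifelse(
  rated == reference, "correct",
  ifelse(rated == "high", "overestimation", "underestimation")
))
prop.table(table(resp$group, resp$outcome), margin = 1)  # shares per group
```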

Figure 3. Classification outcomes by error type and evaluator group. Stacked bar plot showing the distribution of response types (correct classification, underestimation, and overestimation) as a proportion of all judgments per group.

Neurology residents made 220 total assessments. Among these, 170 (77.3%) matched the reference, 44 (20.0%) were overestimations, and only 6 (2.7%) were underestimations. Cardiology residents completed 82 judgments, of which 70 (85.4%) were concordant with the standard and 12 (14.6%) were overestimations; no underestimations were recorded in this group. Vascular neurologists provided 49 evaluations, correctly classifying 45 (91.8%) cases; three judgments (6.1%) were underestimations and one (2.0%) was an overestimation (Table 3, Figure 3). When stratified by case difficulty, classification accuracy decreased progressively across all groups as complexity increased. For cases categorized as easy (n = 7), accuracy rates were 95.2% for vascular neurologists, 92.9% for cardiology residents, and 87.0% for neurology residents. Intermediate cases (n = 10) showed moderate accuracy: 90.0%, 85.0%, and 78.2%, respectively. For difficult cases (n = 12), performance dropped substantially to 75.0%, 71.4%, and 58.3%, respectively, demonstrating that even with AFibrisk support, complex cases remain challenging, particularly for less experienced evaluators.

Table 3.

Classification errors by type and evaluator group.

Evaluator group Total judgments Overestimation (%) Underestimation (%)
Vascular neurologists 49 2.0 6.1
Cardiology residents 82 14.6 0.0
Neurology residents 220 20.0 2.7

Number and percentage of correct classifications, overestimations, and underestimations made by each evaluator group.

Figure 4. Heatmap of adjusted P-values from post-hoc Dunn–Bonferroni comparisons. Pairwise comparisons between evaluator groups after Dunn’s post-hoc test with Bonferroni correction. Darker shades indicate lower adjusted P-values, and cell labels report the exact values used to assess statistical significance; adjusted P-values below .05 indicate between-group differences in concordance unlikely to have occurred by chance. Both the cardiology and neurology residents differed significantly from the electrophysiologists, suggesting group-level variation in classification accuracy, while no significant difference was observed between vascular neurologists and the other groups.

Finally, completeness of evaluation was assessed for each participant. Each evaluator was presented with 29 clinical cases. However, in 183 of 609 total rating opportunities (30.1%), participants explicitly indicated that they were unable to classify a given case as either high or low risk for incident atrial fibrillation. These responses were recorded as “unable to classify” and were excluded from all accuracy and concordance analyses.

Among the 4 cardiology residents, the number of cases classified per evaluator ranged from 19 to 22 (mean: 20.5), corresponding to an average of 8.5 unclassified cases per participant, or approximately 29.3% of their rating opportunities. The 11 neurology residents completed between 12 and 28 classifications each (mean: 20.0), yielding a mean of 9.0 unclassified cases per participant (31.0%). The 3 vascular neurologists submitted between 10 and 22 ratings (mean: 16.3), leaving an average of 12.7 unclassified cases, or 43.8% of expected judgments. Electrophysiologists, whose ratings were used only to construct the reference standard, submitted between 22 and 28 responses each (mean: 25.0), resulting in an average of 4.0 unclassified cases (or 13.8%).

Each of the 21 raters (the 18 evaluators plus the 3 electrophysiologists) independently evaluated 29 clinical vignettes, yielding a total of 609 potential rating opportunities. Among these, 183 ratings (30.1%) were recorded as “not classified.” The remaining ratings were assigned to one of the predefined categories and were included in subsequent agreement and comparative analyses.

The overall rate of “not classified” responses differed significantly across professional categories (Kruskal–Wallis H = 18.77, P = .0003). Electrophysiologists demonstrated the lowest rate of uncertainty (13.8%), whereas stroke neurologists exhibited the highest (43.8%), followed by neurology (31.0%) and cardiology residents (29.3%) (Supplementary Material, Figure S1).

A multinomial ordinal model showed that, compared to electrophysiologists, all other groups had higher odds of uncertainty. Stroke neurologists (OR = 3.97, 95% CI = 2.21-7.13), neurology residents (OR = 3.02, 95% CI = 1.90-4.81), and cardiology residents (OR = 2.78, 95% CI = 1.63-4.75) were all significantly more likely to choose “not classified” rather than a risk category.
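The paper reports a multinomial ordinal model; as a simplified stand-in under our own assumptions, a binary logistic regression on the “not classified” outcome yields group odds ratios of the same form (variable names are illustrative):

```r
# Simplified stand-in for the reported model: binary logistic regression of
# "not classified" on evaluator group, with electrophysiologists as reference.
resp$unclassified <- as.integer(resp$response == "unable")
resp$group <- relevel(factor(resp$group), ref = "electrophysiologist")

fit <- glm(unclassified ~ group, family = binomial, data = resp)
exp(cbind(OR = coef(fit), confint(fit)))  # ORs with 95% CIs vs electrophysiologists
```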

Agreement with the gold standard was highest among electrophysiologists (85.1%), with lower agreement among cardiology residents (60.3%), neurology residents (53.3%), and stroke neurologists (51.7%). A regression model confirmed that cases labeled as “low risk” by the gold standard were significantly more likely to be left unclassified by participants (OR = 2.98; 95% CI = 1.92-4.63), indicating greater uncertainty when no overt risk factors were present. Based on these findings, each case was also categorized as easy, intermediate, or difficult using predefined thresholds of consensus and classification completeness. Of the 29 included cases, 41.4% were classified as difficult (low agreement or high uncertainty), 34.5% as intermediate, and only 24.1% as easy (high agreement and low uncertainty).
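A sketch of this case-difficulty grouping, assuming a per-case summary data frame case_stats with an agreement proportion and an unclassified rate; the cutoffs below are illustrative assumptions, since the paper’s exact thresholds are not reported:

```r
# Categorize cases by consensus and classification completeness
# (thresholds are hypothetical).
case_stats$difficulty <- with(case_stats, ifelse(
  agreement >= 0.85 & unclassified_rate <= 0.20, "easy",
  ifelse(agreement < 0.70 | unclassified_rate > 0.40, "difficult",
         "intermediate")
))
table(case_stats$difficulty)
```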

4. Discussion

This study assessed the clinical utility of AFibrisk, a digital decision-support platform that consolidates 19 validated clinical scores to estimate the risk of incident atrial fibrillation (AF) in cryptogenic stroke patients. AFibrisk is not a new prediction model but a platform integrating existing validated scores. The tool was evaluated across a range of clinical backgrounds—vascular neurologists, cardiology residents, and neurology residents—using expert electrophysiologist consensus as the reference standard. Our results suggest that such a structured platform improves alignment with expert judgment, particularly among less experienced clinicians.

The AFibrisk questionnaire collects demographic, neurological, clinical, laboratory, electrocardiographic, echocardiographic, and vascular variables. These include age, sex, race, stroke severity, cardiovascular risk factors, lifestyle factors, anthropometric measurements, laboratory biomarkers, ECG parameters, echocardiographic findings, and markers of systemic and cerebrovascular disease. The AFibrisk interface also displays the outputs of all 19 validated atrial fibrillation prediction scores simultaneously, each in its original format (numerical estimate and/or categorical risk level), organized in a standardized visual layout. The platform does not apply weighting, averaging, or algorithmic integration across scores. Users are instructed to interpret the overall pattern of concordant or discordant risk signals and to render a final binary classification (high vs. low AF risk).

The consistent internal agreement observed within all evaluator groups reinforces the notion that digital tools can harmonize decision-making even across varying levels of training. This is consistent with prior concerns raised by Himmelreich et al22 and Poorthuis et al,23 who highlighted the usability challenges and limited implementation of traditional AF prediction models such as CHARGE-AF24 in real-world clinical settings.25,26

Error pattern analysis revealed that more experienced clinicians were prone to underestimation, while residents showed a tendency to overestimate AF risk. These opposing tendencies suggest different thresholds for risk attribution, which can have clinical consequences ranging from missed opportunities for timely detection to unnecessary resource utilization. Piot and Guidoux12 have similarly emphasized the need for tools that optimize this balance, particularly in post-stroke AF screening contexts. The divergent error patterns observed—with residents overestimating risk in 20% of cases versus experts underestimating in 6.1%—have important implications for clinical implementation and training. These patterns likely reflect different decision-making heuristics: residents may apply more conservative thresholds given their limited experience with post-stroke outcomes, while experienced clinicians may rely more heavily on clinical gestalt, potentially underweighting systematic risk scores. For educational purposes, AFibrisk could serve as a calibration tool, allowing trainees to compare their risk assessments against both the aggregated scores and expert consensus. The platform’s visual display of all 19 scores simultaneously may help users recognize when their clinical judgment diverges significantly from the evidence base. Importantly, these error patterns were consistent within groups (Light’s κ = 0.607 for neurology residents), suggesting that standardized tools like AFibrisk could reduce inter-observer variability while maintaining appropriate clinical judgment.

The integration of AFibrisk into clinical workflows aligns with emerging digital health trends. In a recent implementation study, Grout et al27 demonstrated that embedding a risk prediction tool for undiagnosed AF directly into electronic health record systems resulted in effective risk stratification and actionable alerts. AFibrisk operates on a similar premise—albeit externally from the EHR—providing a clinically relevant summary of AF risk through the aggregation of multiple validated scores, and enabling real-time bedside use in both acute and ambulatory settings.28–31

The scalability and accessibility of AFibrisk are further underscored when contrasted with high-cost strategies such as prolonged ECG patch monitoring. In the mSToPS trial,32 the authors showed that extended monitoring improves AF detection and is cost-effective in selected populations. However, such resources may not be readily available in low- and middle-income settings. A platform like AFibrisk, which relies solely on standard clinical variables, offers a pragmatic and scalable alternative that democratizes risk stratification.

Advances in artificial intelligence have also driven AF prediction through ECG analysis.28,30 A notable example is the work by Attia et al,30 who used deep learning applied to resting ECGs to predict paroxysmal AF in large cohorts. While this approach represents a major innovation, it also requires digitized ECG repositories and specialized processing capacity. By contrast, AFibrisk capitalizes on existing clinical data, providing a complementary route that is transparent, interpretable, and implementable without sophisticated infrastructure.

Importantly, AF detection remains suboptimal in acute care settings. Chyou et al,33 in an AHA Scientific Statement, underscored the absence of structured, bedside strategies to guide early AF workup during hospitalization. AFibrisk may help address some of these gaps, standardizing triage decisions and improving identification rates, particularly in stroke units with variable access to long-term cardiac monitoring, though its real-world impact requires prospective evaluation in clinical settings.

The observed inter-user variability in risk classification decisions warrants careful interpretation. The differences in final risk classifications reflect variations in clinical interpretation and decision-making processes rather than platform inconsistencies. When presented with 19 different prediction scores—some of which may yield conflicting risk estimates—users must synthesize this information using their clinical judgment, experience, and individual risk thresholds.

Excluding high-uncertainty cases from accuracy analyses may overestimate real-world performance. This uncertainty is less a limitation of the platform design than a reflection of the inherent complexity of AF risk stratification, particularly when multiple validated scores yield discordant estimates.

The systematic differences we observed between user groups (with residents tending toward overestimation and experts showing occasional underestimation) reveal distinct clinical reasoning patterns that have important educational implications. Rather than representing a weakness, this variability highlights opportunities for targeted medical education and standardization of decision-making frameworks.

In prospective clinical use, the performance of AFibrisk will depend as much on practical implementation as on predictive accuracy. Efficient integration into existing workflows and minimal impact on decision time will be essential, particularly in high-volume settings. As with other decision-support tools, repeated use may lead to user fatigue if outputs are perceived as redundant or poorly timed. Future prospective studies should therefore focus on usability, time to decision, and clinician engagement in routine practice.

Future work could test implementation by embedding AFibrisk into routine clinical workflows or training programs and measuring practical outcomes such as time to decision, frequency of use, and agreement with standard care over repeated cases. Similar approaches in educational settings could assess whether the tool supports learning and consistency among trainees without increasing cognitive burden.

This study has some limitations that must be acknowledged. First, the use of clinical vignettes, while ensuring standardized evaluation across all participants, cannot fully replicate the complexity of real-world clinical encounters where additional contextual factors, patient preferences, and dynamic information gathering influence decision-making. Some theoretically possible combinations of clinical findings in our vignettes may not reflect actual clinical practice patterns, potentially limiting the generalizability of our findings to real-world decision-making. Second, our sample consisted of 29 vignettes selected from an initial pool of 40, representing a relatively small dataset. The exclusion of 11 vignettes (27.5%) due to lack of consensus among expert electrophysiologists is particularly noteworthy, as it suggests our analysis was restricted to cases with clearer risk profiles, potentially overestimating the platform’s performance in ambiguous scenarios where decision support would be most valuable. Third, the high rate of “unable to classify” responses—30.1% overall and reaching 43.8% among vascular neurologists—was excluded from accuracy calculations. This exclusion may overstate the tool’s capacity to provide definitive classifications in practice, where clinical uncertainty is common and represents a significant challenge in post-stroke AF risk assessment. Fourth, inter-rater variability even among expert electrophysiologists reflects the inherent challenge of establishing reference standards for probabilistic outcomes like AF risk, where no true gold standard exists outside of long-term monitoring. Fifth, our simulated environment could not assess practical implementation factors such as time-to-decision, integration with electronic health records, or disruption to clinical workflows. Sixth, the observed error patterns—with residents tending toward overestimation and experts toward underestimation—suggest that user experience significantly influences platform interpretation, though we could not explore how training or feedback mechanisms might mitigate these biases. Seventh, it must be emphasized that AFibrisk is not intended to replace cardiac monitoring technologies such as implantable loop recorders or extended Holter monitoring, which remain the gold standard for AF detection when available. The platform addresses a specific niche: resource-limited settings where clinical prediction scores using readily available variables represent the only feasible option for initial risk stratification. Finally, case selection bias represents an important limitation. By excluding non-consensus or high-uncertainty cases, the study may have preferentially selected clearer clinical scenarios, potentially inflating performance metrics and limiting generalizability to routine clinical practice. Despite these constraints, our findings provide robust preliminary evidence supporting AFibrisk’s potential as a standardized decision-support tool, particularly given the 90.3% concordance achieved by vascular neurologists and the consistent performance patterns observed across different user groups.

5. Conclusion

Use of the AFibrisk platform demonstrated high concordance with expert electrophysiologist judgment in stratifying the risk of incident atrial fibrillation across simulated clinical scenarios, particularly among vascular neurologists and cardiology residents. The tool improved consistency across users and revealed statistically significant differences in error patterns between user groups. As a free, English-language, user-friendly, and fully referenced digital resource, AFibrisk may be a practical decision-support solution for AF triage after cryptogenic stroke, especially in environments with limited access to long-term monitoring resources.

Author contributions

João Brainer Clares de Andrade (Conceptualization, Formal analysis, Methodology, Software, Writing—original draft, Writing—review & editing), Rafael P. Gomes (Methodology, Validation, Writing—original draft), Alexandre Cristiuma Robles (Data curation, Writing—original draft), Thales Pardini Fagundes (Conceptualization, Formal analysis, Supervision), and George N. Nunes Mendes (Conceptualization, Data curation, Validation, Writing—original draft)

Supplementary material

Supplementary material is available at JAMIA Open online.

Funding

This study was not supported by any sponsor or funder.

Conflicts of interest

The authors declare that there is no conflict of interest related to the study design, execution and analysis, and manuscript conception, planning, writing and decision to publish.

Ethical approval

This project received IRB approval (protocol 93608725.1.0000.5505).

Informed consent

Participants signed a digital informed consent form. No personal data were used at any step of the process, except occupation.

General statement

All the authors have read and approved the submitted manuscript. The manuscript has not been submitted elsewhere nor published elsewhere in whole or in part.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used ChatGPT-4o to review grammar and language. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication. ChatGPT-4o was also employed to translate Portuguese into English.

Guarantor

Joao Brainer Clares de Andrade.

Code availability

The software application and custom code used in this study are publicly available at GitHub: https://github.com/joaobrainer/afibrisk.

Data availability

No real patient-level data were used in this study. The study was based exclusively on fictional clinical vignettes developed for research and educational purposes. As such, no patient datasets were generated or analyzed that require deposition in a public repository. All materials necessary to reproduce the study methods, including the underlying algorithms and implementation logic, are widely available. The dataset of vignettes is fully available in the Mendeley Data repository [Andrade, Joao (2025), “Digital Platform for AF Risk after Stroke”, Mendeley Data, V1, doi: 10.17632/wnvrctzryj.1].


Contributor Information

João Brainer Clares de Andrade, Department of Neurology, Department of Health Informatics, Universidade Federal de São Paulo, São Paulo, SP 04025012, Brazil; Academic Research Organization, Hospital Israelita Albert Einstein, São Paulo, SP 05653-000, Brazil; Bioengineering Laboratory, Institute of Aeronautics Technology, São José dos Campos, SP 12228-900, Brazil; School of Medicine, Centro Universitario São Camilo, São Paulo, SP 05025-010, Brazil.

Rafael P Gomes, School of Medicine, Centro Universitario São Camilo, São Paulo, SP 05025-010, Brazil.

Alexandre Cristiuma Robles, School of Medicine, Centro Universitario São Camilo, São Paulo, SP 05025-010, Brazil.

Thales Pardini Fagundes, Barretos Cancer Hospital, Barretos, SP 14784-400, Brazil.

George N Nunes Mendes, Centre Hospitalier de l’Université de Montréal, QC H2W 1T8, Canada.

References

1. Sanna T, Diener HC, Passman RS, et al.; CRYSTAL AF Investigators. Cryptogenic stroke and underlying atrial fibrillation. N Engl J Med. 2014;370:2478-2486. 10.1056/NEJMoa1313600
2. Sposato LA, Chaturvedi S, Hsieh CY, Morillo CA, Kamel H. Atrial fibrillation detected after stroke and transient ischemic attack: a novel clinical concept challenging current views. Stroke. 2022;53:e94-e103. 10.1161/STROKEAHA.121.034777
3. Kishore A, Vail A, Majid A, et al. Detection of atrial fibrillation after ischemic stroke or transient ischemic attack. Stroke. 2014;45:520-526. 10.1161/STROKEAHA.113.003433
4. Vinciguerra M, Dobrev D, Nattel S. Atrial fibrillation: pathophysiology, genetic and epigenetic mechanisms. Lancet Reg Health Eur. 2024;37:100785. 10.1016/j.lanepe.2023.100785
5. Buja A, Rebba V, Montecchio L, et al. The cost of atrial fibrillation: a systematic review. Value Health. 2024;27:527-541. 10.1016/j.jval.2023.12.015
6. Linker DT, Murphy TB, Mokdad AH. Selective screening for atrial fibrillation using multivariable risk models. Heart. 2018;104:1492-1499. 10.1136/heartjnl-2017-312686
7. Biancari F, Teppo K, Jaakkola J, et al. Income and outcomes of patients with incident atrial fibrillation. J Epidemiol Community Health. 2022;76:736-742. 10.1136/jech-2022-219190
8. Trevisan Teixeira C, Rizelio V, Robles A, Coelho Maia Barros L, Sampaio Silva G, Clares De Andrade BJ. A predictive score for atrial fibrillation in poststroke patients. Arq Neuropsiquiatr. 2024;82:1-9. 10.1055/s-0044-1788271
9. Suissa L, Bertora D, Lachaud S, Mahagne MH. Score for the targeting of atrial fibrillation (STAF): a new approach to the detection of atrial fibrillation in the secondary prevention of ischemic stroke. Stroke. 2009;40:2866-2868. 10.1161/STROKEAHA.109.552679
10. Dörr M, Nohturfft V, Brasier N, et al. The WATCH AF trial: SmartWATCHes for detection of atrial fibrillation. JACC Clin Electrophysiol. 2019;5:199-208. 10.1016/j.jacep.2018.10.006
11. Karakasis P, Theofilis P, Sagris M, et al. Artificial intelligence in atrial fibrillation: from early detection to precision therapy. J Clin Med. 2025;14:2627. 10.3390/jcm14082627
12. Piot O, Guidoux C. Searching for atrial fibrillation post stroke: is it time for digital devices? Front Cardiovasc Med. 2023;10:1212128. 10.3389/fcvm.2023.1212128
13. Tiwari P, Colborn KL, Smith DE, Xing F, Ghosh D, Rosenberg MA. Assessment of a machine learning model applied to harmonized electronic health record data for the prediction of incident atrial fibrillation. JAMA Netw Open. 2020;3:e1919396. 10.1001/jamanetworkopen.2019.19396
14. Papadopoulou A, Harding D, Slabaugh G, Marouli E, Deloukas P. Prediction of atrial fibrillation and stroke using machine learning models in UK Biobank. Heliyon. 2024;10:e28034. 10.1016/j.heliyon.2024.e28034
15. Jabbour G, Nolin-Lapalme A, Tastet O, et al. Prediction of incident atrial fibrillation using deep learning, clinical models and polygenic scores. Eur Heart J. 2024;45:4920-4934. 10.1093/eurheartj/ehae595
16. Khurshid S, Friedman S, Reeder C, et al. ECG-based deep learning and clinical risk factors to predict atrial fibrillation. Circulation. 2022;145:122-133. 10.1161/CIRCULATIONAHA.121.057480
17. Chao TF, Chiang CE, Chen TJ, Liao JN, Tuan TC, Chen SA. Clinical risk score for the prediction of incident atrial fibrillation: derivation in 7 220 654 Taiwan patients with 438 930 incident atrial fibrillations during a 16-year follow-up. J Am Heart Assoc. 2021;10:e020194. 10.1161/JAHA.120.020194
18. Dretzke J, Chuchu N, Chua W, et al. Prognostic models for predicting incident or recurrent atrial fibrillation: protocol for a systematic review. Syst Rev. 2019;8:221. 10.1186/s13643-019-1128-z
19. Aronson D, Shalev V, Katz R, Chodick G, Mutlak D. Risk score for prediction of 10-year atrial fibrillation: a community-based study. Thromb Haemost. 2018;118:1556-1563. 10.1055/s-0038-1668522
20. Gadaleta M, Harrington P, Barnhill E, et al. Prediction of atrial fibrillation from at-home single-lead ECG signals without arrhythmias. NPJ Digit Med. 2023;6:229. 10.1038/s41746-023-00966-w
21. Kwong C, Ling AY, Crawford MH, Zhao SX, Shah NH. A clinical score for predicting atrial fibrillation in patients with cryptogenic stroke or transient ischemic attack. Cardiology. 2017;138:133-140. 10.1159/000476030
22. Himmelreich JCL, Veelers L, Lucassen WAM, et al. Prediction models for atrial fibrillation applicable in the community: a systematic review and meta-analysis. Europace. 2020;22:684-694. 10.1093/europace/euaa005
23. Poorthuis MHF, Jones NR, Sherliker P, et al. Utility of risk prediction models to detect atrial fibrillation in screened participants. Eur J Prev Cardiol. 2021;28:586-595. 10.1093/eurjpc/zwaa082
24. Goudis C, Daios S, Dimitriadis F, Liu T. CHARGE-AF: a useful score for atrial fibrillation prediction? Curr Cardiol Rev. 2023;19:e010922208402. 10.2174/1573403X18666220901102557
25. Lip GYH, Nieuwlaat R, Pisters R, et al. Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: the Euro Heart Survey on atrial fibrillation. Chest. 2010;137:263-272. 10.1378/chest.09-1584
26. Khurshid S, Kartoun U, Ashburner JM, et al. Performance of atrial fibrillation risk prediction models in over 4 million individuals. Circ Arrhythm Electrophysiol. 2021;14:e008997. 10.1161/CIRCEP.120.008997
27. Grout RW, Ateya M, DiRenzo B, et al. Screening for undiagnosed atrial fibrillation using an electronic health record–based clinical prediction model: clinical pilot implementation initiative. BMC Med Inform Decis Mak. 2024;24:388. 10.1186/s12911-024-02773-z
28. Zeng A, Tang Q, O’Hagan E, et al. Use of digital patient decision-support tools for atrial fibrillation treatments: a systematic review and meta-analysis. BMJ Evid Based Med. 2025;30:10-21. 10.1136/bmjebm-2023-112820
29. Gladstone DJ, Spring M, Dorian P, et al.; EMBRACE Investigators and Coordinators. Atrial fibrillation in patients with cryptogenic stroke. N Engl J Med. 2014;370:2467-2477. 10.1056/NEJMoa1311376
30. Attia ZI, Noseworthy PA, Lopez-Jimenez F, et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet. 2019;394:861-867. 10.1016/S0140-6736(19)31721-0
31. Linz D, Gawalko M, Betz K, et al. Atrial fibrillation: epidemiology, screening and digital health. Lancet Reg Health Eur. 2024;37:100786. 10.1016/j.lanepe.2023.100786
32. Steinhubl SR, Waalen J, Edwards AM, et al. Effect of a home-based wearable continuous ECG monitoring patch on detection of undiagnosed atrial fibrillation. JAMA. 2018;320:146-155. 10.1001/jama.2018.8102
33. Chyou JY, Barkoudah E, Dukes JW, et al.; American Heart Association Acute Cardiac Care and General Cardiology Committee, Electrocardiography and Arrhythmias Committee, and Clinical Pharmacology Committee of the Council on Clinical Cardiology; Council on Cardiovascular Surgery and Anesthesia; Council on Cardiopulmonary, Critical Care, Perioperative and Resuscitation; Council on Cardiovascular and Stroke Nursing; and Stroke Council. Atrial fibrillation occurring during acute hospitalization: a scientific statement from the American Heart Association. Circulation. 2023;147:e676-e698. 10.1161/CIR.0000000000001133


