2025 Sep 23;11:e71102. doi: 10.2196/71102

Table 4. Performance of baseline models in named entity recognition.

Model, learning strategy, and entity Precision Recall F1-score Standard error (F1) Lower CI (F1) Upper CI (F1)
Bert-base-cased
 SFTa
  Cancer_type 0.5366 0.6286 0.5789 0.0493 0.4821 0.6756
  Indicated_symptom 0.1667 0.1429 0.1538 0.0360 0.08 0.2245
  Product 0.6773 0.7161 0.6962 0.0459 0.6060 0.7863
  Micro average 0.6514 0.6905 0.6704 0.047 0.5782 0.7625
  Macro average 0.4602 0.4959 0.4763 0.0499 0.3784 0.5741
  Weighted average 0.6495 0.6905 0.6692 0.0470 0.5769 0.7614
Bio_ClinicalBERT
 SFT
  Cancer_type 0.5349 0.697 0.6053 0.0489 0.5094 0.7011
  Indicated_symptom 0.3000 0.2143 0.2500 0.0433 0.1651 0.3348
  Product 0.695 0.6583 0.6762 0.0468 0.5844 0.7679
  Micro average 0.6675 0.6462 0.6567 0.0474 0.5636 0.75
  Macro average 0.5100 0.5232 0.5105 0.0499 0.4125 0.6084
  Weighted average 0.6684 0.6462 0.6558 0.0475 0.5626 0.7489
 Zero-shot
  Cancer_type 0.2885 0.6818 0.4054 0.0490 0.3091 0.5016
  Indicated_symptom 0.0759 0.4615 0.1304 0.0336 0.06 0.1964
  Product 0.3529 0.3243 0.338 0.0473 0.2452 0.4307
  Micro average 0.2776 0.3619 0.3142 0.0464 0.2232 0.4051
  Macro average 0.2391 0.4892 0.2913 0.0454 0.2022 0.3803
  Weighted average 0.3334 0.3619 0.3333 0.0471 0.2409 0.4256
gpt4-1106-preview-chat
 Few-shot
  Cancer_type 0.3148 0.7727 0.4474 0.0497 0.3499 0.5448
  Indicated_symptom 0.0536 0.2308 0.087 0.0281 0.03 0.1422
  Product 0.4743 0.5405 0.5053 0.0499 0.4073 0.6032
  Micro average 0.3857 0.5447 0.4516 0.0498 0.3540 0.5491
  Macro average 0.2809 0.5147 0.3465 0.0476 0.2532 0.4397
  Weighted average 0.4394 0.5447 0.4791 0.0499 0.3811 0.5770
 Many-shot
  Cancer_type 0.4000 0.6364 0.4912 0.05 0.3932 0.589
  Indicated_symptom 0 0 0 0 0 0
  Product 0.5672 0.5135 0.539 0.0498 0.44 0.6367
  Micro average 0.5079 0.4981 0.5029 0.05 0.4049 0.601
  Macro average 0.3224 0.3833 0.3434 0.0474 0.2503 0.4364
  Weighted average 0.5242 0.4981 0.5077 0.0499 0.4097 0.6056
aSFT: supervised fine-tuning.
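The table reports per-entity scores plus micro, macro, and weighted averages, together with a standard error and confidence bounds for each F1-score. As a reading aid only, the sketch below (Python, not from the article) shows how those three averaging schemes differ and how the lower and upper CI columns can be reproduced from an F1-score and its standard error under a normal-approximation assumption with z = 1.96; the per-entity counts, function names, and the 95% level are illustrative assumptions rather than values or code taken from the study.

```python
from dataclasses import dataclass


@dataclass
class EntityCounts:
    tp: int  # correctly predicted entity spans
    fp: int  # predicted spans with no matching gold span
    fn: int  # gold spans the model missed


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Span-level precision, recall, and F1 from raw counts."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def averages(per_entity: dict) -> dict:
    """Micro, macro, and weighted averages of (precision, recall, F1)."""
    # Micro average: pool the counts over all entity types, then score once.
    micro = prf(
        sum(c.tp for c in per_entity.values()),
        sum(c.fp for c in per_entity.values()),
        sum(c.fn for c in per_entity.values()),
    )
    # Macro average: unweighted mean of the per-entity scores.
    scores = [prf(c.tp, c.fp, c.fn) for c in per_entity.values()]
    macro = tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))
    # Weighted average: per-entity scores weighted by support (tp + fn).
    supports = [c.tp + c.fn for c in per_entity.values()]
    total = sum(supports)
    weighted = tuple(
        sum(s[i] * w for s, w in zip(scores, supports)) / total for i in range(3)
    )
    return {"micro": micro, "macro": macro, "weighted": weighted}


def normal_ci(f1: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation CI; the 95% level (z = 1.96) is an assumption."""
    return f1 - z * se, f1 + z * se


# Hypothetical per-entity counts (not taken from the study).
counts = {
    "Cancer_type": EntityCounts(tp=50, fp=25, fn=30),
    "Indicated_symptom": EntityCounts(tp=5, fp=15, fn=20),
    "Product": EntityCounts(tp=160, fp=70, fn=60),
}
print(averages(counts))
print(normal_ci(0.5789, 0.0493))  # ~ (0.4823, 0.6755)
```

For example, for Cancer_type under Bert-base-cased with SFT, 0.5789 ± 1.96 × 0.0493 gives roughly 0.4823 to 0.6755, which is consistent with the reported bounds of 0.4821 and 0.6756 up to rounding.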