2023 Apr 22;30(6):1091–1102. doi: 10.1093/jamia/ocad050

Table 2.

Coverage and prediction results of different components of the QA pipeline on FHIR_DATA

Component	Using gold concept	Top MetaMap score	Longest concept	Filter using EHR concepts	Longest after filter using EHR concepts
	Result: % (#) [total = 966]
MetaMap generated concepts include gold Boundary (recall)	–	83.85% (810)
MetaMap generated concepts include gold CUI (recall)	–	41.51% (401)
Predicted concepts match gold Boundary (accuracy)	–	54.04% (522)	49.59% (479)	49.38% (477)	63.66% (615)
Predicted concepts match gold CUI (accuracy)	–	23.50% (227)	20.50% (198)	39.13% (378)	40.06% (387)
Generated logical trees include gold (recall)	100.00% (966)	83.64% (808)	78.88% (762)	99.17% (958)	87.06% (841)
Predicted logical tree matches gold (accuracy)	97.41% (941)	81.57% (788)	39.03% (377)	96.27% (930)	84.89% (820)
Predicted time frame matches gold (accuracy)			88.30% (853)
Generated logical forms include gold (recall)	100.00% (966)	45.45% (439)	47.10% (455)	42.34% (409)	54.45% (526)
Predicted logical form matches gold (accuracy)	86.23% (833)	19.25% (186)	9.21% (89)	33.54% (324)	34.16% (330)
Predicted FHIR response matches gold (accuracy)	97.41% (941)	22.77% (220)	10.14% (98)	38.51% (372)	39.13% (378)
Predicted answer matches gold (accuracy)	97.41% (941)	22.98% (222)	10.14% (98)	38.61% (373)	39.34% (380)
Predicted answer matches gold (precision) [# correct responses/all responses]	98.33% [941/957]	94.42% [220/233]	50.52% [98/194]	94.67% [373/394]	94.03% [378/402]
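
The accuracy and precision rows use different denominators, and the percentages can be rechecked from the counts in the table. The sketch below assumes accuracy is the number correct divided by all 966 questions, and precision is the number correct divided by the number of responses, as the bracketed note in the last row suggests. The function and variable names are illustrative and do not come from the paper.

```python
TOTAL_QUESTIONS = 966  # total stated in the table header


def accuracy(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Percentage of all questions for which the prediction matches gold."""
    return round(100 * correct / total, 2)


def precision(correct: int, responses: int) -> float:
    """Percentage of returned responses that are correct."""
    return round(100 * correct / responses, 2)


# (correct responses, all responses), copied from the precision row of the table
precision_counts = {
    "Using gold concept": (941, 957),
    "Top MetaMap score": (220, 233),
    "Longest concept": (98, 194),
    "Filter using EHR concepts": (373, 394),
    "Longest after filter using EHR concepts": (378, 402),
}

precision_by_setting = {
    name: precision(correct, responses)
    for name, (correct, responses) in precision_counts.items()
}

for name, value in precision_by_setting.items():
    print(f"{name}: {value:.2f}%")
```

Under this reading, the gap between answer accuracy and precision comes from the denominator. With the top MetaMap score, for example, the precision row counts 233 responses against 966 questions.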