2025 Aug 24;8:543. doi: 10.1038/s41746-025-01955-x

Table 2.

Quantitative evaluation of the RAG LLMs and the Vanilla LLM on the Domain Knowledge set

Question type	Method	BERTScore	Coverage
Facial phenotype-disease associations	Vanilla LLM	0.8123	11.52%
	Cypher RAG LLM	0.8621	34.76%
	Vector RAG LLM	0.8705	38.92%
Facial phenotype-gene associations	Vanilla LLM	0.8127	10.66%
	Cypher RAG LLM	0.8498	30.15%
	Vector RAG LLM	0.8532	40.01%
Disease-gene associations	Vanilla LLM	0.8377	40.01%
	Cypher RAG LLM	0.8914	75.33%
	Vector RAG LLM	0.9028	80.47%
Facial phenotype synonyms	Vanilla LLM	0.8743	52.64%
	Cypher RAG LLM	0.9481	88.20%
	Vector RAG LLM	0.9335	86.50%

For every question type, both RAG LLMs (Cypher RAG and Vector RAG) achieve a higher BERTScore and coverage than the Vanilla LLM. Coverage measures the proportion of key information in the reference answers that the LLM's answer correctly covers. The best performance is obtained by the Vector RAG LLM for the three association question types and by the Cypher RAG LLM for facial phenotype synonyms.
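The caption defines coverage only in words. Below is a minimal sketch of how such a score could be computed. It assumes that each reference answer has already been reduced to a list of key-information items, and that an item counts as answered when it appears, case-insensitively, in the model's output. The function names `coverage` and `mean_coverage`, the substring-matching rule, and the per-question averaging are hypothetical illustrations, not the authors' evaluation code.

```python
def coverage(key_items: list[str], answer: str) -> float:
    """Fraction of reference key-information items found in an LLM answer.

    Assumption: an item is "correctly answered" if it occurs as a
    case-insensitive substring of the answer text.
    """
    if not key_items:
        raise ValueError("reference answer has no key items")
    text = answer.lower()
    hits = sum(1 for item in key_items if item.lower() in text)
    return hits / len(key_items)


def mean_coverage(pairs: list[tuple[list[str], str]]) -> float:
    """Average per-question coverage over a question set, as a percentage.

    Each pair is (key items from the reference answer, model answer).
    Assumption: scores are averaged per question rather than pooled.
    """
    if not pairs:
        raise ValueError("no questions to score")
    scores = [coverage(items, answer) for items, answer in pairs]
    return 100 * sum(scores) / len(scores)


# Hypothetical example: two of four reference items appear in the answer.
print(coverage(["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
               "Linked genes include gene_a and GENE_B."))  # 0.5
```

Substring matching is deliberately crude. It can credit partial token matches and miss synonyms, so a real evaluation may instead rely on expert or LLM-based judgment of each key item. Pooling all items across questions, rather than averaging per-question scores, would weight long reference answers more heavily.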