2025 Aug 24;8:543. doi: 10.1038/s41746-025-01955-x

Table 2.

Quantitative evaluation results of the RAG LLM and the Vanilla LLM on the Domain Knowledge set

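| Questions | Methods | BertScore | Coverage |
|---|---|---|---|
| Facial phenotype-disease associations | Vanilla LLM | 0.8123 | 11.52% |
| Facial phenotype-disease associations | Cypher RAG LLM | 0.8621 | 34.76% |
| Facial phenotype-disease associations | Vector RAG LLM | **0.8705** | **38.92%** |
| Facial phenotype-gene associations | Vanilla LLM | 0.8127 | 10.66% |
| Facial phenotype-gene associations | Cypher RAG LLM | 0.8498 | 30.15% |
| Facial phenotype-gene associations | Vector RAG LLM | **0.8532** | **40.01%** |
| Disease-gene associations | Vanilla LLM | 0.8377 | 40.01% |
| Disease-gene associations | Cypher RAG LLM | 0.8914 | 75.33% |
| Disease-gene associations | Vector RAG LLM | **0.9028** | **80.47%** |
| Facial phenotype synonyms | Vanilla LLM | 0.8743 | 52.64% |
| Facial phenotype synonyms | Cypher RAG LLM | **0.9481** | **88.20%** |
| Facial phenotype synonyms | Vector RAG LLM | 0.9335 | 86.50% |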

For every question type, both RAG LLMs achieve a higher BertScore and coverage than the Vanilla LLM. Coverage measures the proportion of key information in the reference answers that the LLM answers correctly. Bold values indicate the best performance for each question type.
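The coverage metric described in the caption can be illustrated with a minimal sketch. The caption does not state how key information is matched against a model answer, so this sketch assumes a simple case-insensitive substring match. The function name `coverage`, the key items, and the example answer are hypothetical and are not taken from the study.

```python
def coverage(key_items, answer):
    """Fraction of reference key items found in the answer.

    Assumption: an item counts as covered if it occurs in the answer
    as a case-insensitive substring. The study may use a different
    matching rule.
    """
    if not key_items:
        raise ValueError("key_items must not be empty")
    text = answer.lower()
    hits = sum(1 for item in key_items if item.lower() in text)
    return hits / len(key_items)


# Hypothetical reference key items and model answer, for illustration only.
reference_keys = ["hypertelorism", "low-set ears", "short stature", "ptosis"]
model_answer = "Typical findings include hypertelorism, ptosis and short stature."

score = coverage(reference_keys, model_answer)
print(f"{score:.2%}")  # prints 75.00%
```

Under this assumption, three of the four key items appear in the answer, giving a coverage of 75%. A substring match is a deliberately crude stand-in: it would miss synonyms and paraphrases, which a manual or model-based judgment of "correctly answered" could capture.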