Table 2. Quantitative evaluation results of the RAG LLMs and the Vanilla LLM on the Domain Knowledge set
| Question type | Method | BertScore | Coverage |
|---|---|---|---|
| Facial phenotype-disease associations | Vanilla LLM | 0.8123 | 11.52% |
| | Cypher RAG LLM | 0.8621 | 34.76% |
| | Vector RAG LLM | **0.8705** | **38.92%** |
| Facial phenotype-gene associations | Vanilla LLM | 0.8127 | 10.66% |
| | Cypher RAG LLM | 0.8498 | 30.15% |
| | Vector RAG LLM | **0.8532** | **40.01%** |
| Disease-gene associations | Vanilla LLM | 0.8377 | 40.01% |
| | Cypher RAG LLM | 0.8914 | 75.33% |
| | Vector RAG LLM | **0.9028** | **80.47%** |
| Facial phenotype synonyms | Vanilla LLM | 0.8743 | 52.64% |
| | Cypher RAG LLM | **0.9481** | **88.20%** |
| | Vector RAG LLM | 0.9335 | 86.50% |
For every question type, both RAG LLMs (Cypher and Vector) achieve a higher BertScore and coverage than the Vanilla LLM. Coverage is the proportion of key information items in the reference answers that an LLM answers correctly. Bold values indicate the best result for each question type.
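The coverage metric can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes each reference answer has already been reduced to a list of key information items, that an item counts as correctly answered when it appears as a case-insensitive substring of the model's answer, and that per-question scores are averaged with equal weight. The function names `coverage` and `mean_coverage` are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def coverage(key_points: Sequence[str], answer: str) -> float:
    """Return the fraction of reference key points found in the answer.

    Assumption (hypothetical): a key point counts as correctly answered
    if it occurs as a case-insensitive substring of the answer text.
    """
    if not key_points:
        raise ValueError("reference answer must contain at least one key point")
    text = answer.lower()
    hits = sum(1 for point in key_points if point.lower() in text)
    return hits / len(key_points)


def mean_coverage(items: Iterable[Tuple[Sequence[str], str]]) -> float:
    """Average coverage over (key_points, answer) pairs.

    Assumption (hypothetical): every question is weighted equally
    (macro-average) rather than pooling all key points together.
    """
    scores = [coverage(points, answer) for points, answer in items]
    if not scores:
        raise ValueError("no questions to score")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical reference key points and model answer, for illustration only.
    reference = ["PTPN11", "SOS1", "RAF1"]
    answer = "Noonan syndrome is most often caused by variants in PTPN11; SOS1 is another cause."
    print(f"{coverage(reference, answer):.2%}")  # 66.67%
```

Substring matching is deliberately crude (for example, it would credit `RAF1` inside `BRAF1`). A real evaluation might instead match normalized entity identifiers or rely on manual or LLM-based judging; the ratio being computed stays the same.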