Abstract
Causal discovery algorithms are often used to infer causal relationships and recover a causal model from data. However, causal discovery from data alone is limited by the structural constraints of the dataset used, the lack of causal logic, and the lack of external knowledge. Thus, data-driven causal discovery can at best suggest possible causal relationships. To overcome these limitations, Large Language Models (LLMs) and knowledge systems, such as Retrieval-Augmented Generation (RAG), have been proposed both as alternatives to data-driven causal discovery and as a means of augmenting causal discovery algorithms. Using an expert-defined causal graph of chronic lower back pain, we further propose knowledge-graph-based RAG systems, such as GraphRAG, as an improvement over RAG systems for augmenting causal discovery (F1 0.745), and we benchmark this approach against augmenting causal discovery with an LLM (F1 0.636), augmenting causal discovery with RAG (F1 0.714), and causal discovery alone (F1 0.396). We also explore the impact of different prompting methods for causality, such as querying for the plausibility of causal relationships, the presence of statistical associations, and the existence of temporal causal relationships, inspired by the methodology of the domain experts who constructed our ground truth. Lastly, we discuss how applications of LLMs, RAG, and graph-based RAG systems can impact and accelerate the causal modeling of chronic lower back pain by bridging the gap between domain knowledge and data-driven approaches to causal modeling.
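The F1 scores above compare a recovered causal graph against the expert-defined one. The abstract does not say exactly how edges are matched, so the following is only a minimal sketch under the assumption that F1 is computed over directed edges, where a reversed edge counts as a miss. The function name `edge_f1` and the toy variable names are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

# A directed edge is written as (cause, effect).
Edge = Tuple[str, str]


def edge_f1(predicted: Iterable[Edge], truth: Iterable[Edge]) -> float:
    """Edge-level F1 between a predicted graph and a reference graph.

    Assumption: edges are matched by exact direction, so ("a", "b")
    does not match ("b", "a").
    """
    pred: Set[Edge] = set(predicted)
    gold: Set[Edge] = set(truth)
    true_positives = len(pred & gold)
    if true_positives == 0:
        # Covers empty graphs and graphs with no overlap.
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy graphs, for illustration only.
truth = {
    ("disc_degeneration", "pain"),
    ("pain", "disability"),
    ("depression", "pain"),
}
predicted = {
    ("disc_degeneration", "pain"),
    ("pain", "disability"),
    ("disability", "pain"),  # reversed direction, counted as an error
}
score = edge_f1(predicted, truth)
```

In this toy case two of the three predicted edges are correct and two of the three reference edges are recovered, so precision and recall are both 2/3. An evaluation that ignores edge direction would need to normalize each edge first, for example by sorting its endpoints.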
