Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities

Aalborg University, Southeast University, Bowling Green State University, Tongji University
EMNLP 2025

LLM-KG4QA


This survey outlines recent progress in integrating LLMs with KGs for complex QA, summarizes technical advances, and identifies open challenges and future research opportunities. The synthesis of LLMs, KGs, and complex QA has been explored from different perspectives. We propose a structured taxonomy that highlights the alignment between various LLM+KG approaches and different types of complex QA, discussing how approaches that assign different roles to the KG address the challenges of each QA category.

Abstract

Large language models (LLMs) have demonstrated remarkable performance on question-answering (QA) tasks because of their superior capabilities in natural language understanding and generation. However, LLM-based QA struggles with complex QA tasks due to poor reasoning capacity, outdated knowledge, and hallucinations. Several recent works synthesize LLMs and knowledge graphs (KGs) for QA to address the above challenges. In this survey, we propose a new structured taxonomy that categorizes the methodology of synthesizing LLMs and KGs for QA according to the categories of QA and the KG's role when integrating with LLMs. We systematically survey state-of-the-art methods in synthesizing LLMs and KGs for QA and compare and analyze these approaches in terms of strength, limitations, and KG requirements. We then align the approaches with QA and discuss how these approaches address the main challenges of different complex QA. Finally, we summarize the advancements, evaluation metrics, and benchmark datasets and highlight open challenges and opportunities.

Knowledge Integration and Fusion

KGs usually play the role of background knowledge when synthesized with LLMs for complex QA, where knowledge fusion and RAG are the main technical paradigms. Knowledge integration and fusion aim to enhance language models (LMs) by integrating external knowledge into them for QA: the KG and text are aligned via local subgraph extraction and entity linking, then fed into a cross-modal encoder that bidirectionally fuses text and KG to jointly train the LM for complex QA tasks. The main challenge of this approach lies in how to effectively integrate up-to-date knowledge from KGs and text to avoid knowledge conflicts, and how to update knowledge without retraining or re-finetuning.
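The alignment step described above (entity linking, local subgraph extraction, and fusing the subgraph with the text input) can be sketched as follows. This is a minimal illustration, not any specific method from the survey: the toy KG, the string-match entity linker, and the serialized input format for a cross-modal encoder are all assumptions for demonstration.

```python
# Sketch of aligning text with a KG via entity linking and local
# subgraph extraction; the toy KG and naive linker are illustrative.

# Toy KG as (head, relation, tail) triples -- a stand-in for a real KG.
KG = [
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Paris", "hosted", "1900 Summer Olympics"),
]

def link_entities(text, kg):
    """Naive entity linking: match KG entity names that appear in the text."""
    entities = {h for h, _, _ in kg} | {t for _, _, t in kg}
    return [e for e in entities if e.lower() in text.lower()]

def extract_subgraph(seeds, kg, hops=2):
    """Collect triples reachable from the linked entities within `hops`."""
    frontier, triples = set(seeds), []
    for _ in range(hops):
        new = set()
        for h, r, t in kg:
            if (h in frontier or t in frontier) and (h, r, t) not in triples:
                triples.append((h, r, t))
                new |= {h, t}
        frontier |= new
    return triples

def fuse_input(question, triples):
    """Serialize the subgraph alongside the text for a cross-modal encoder."""
    kg_part = " ".join(f"({h}, {r}, {t})" for h, r, t in triples)
    return f"question: {question} [KG] {kg_part}"

question = "Which country is Paris the capital of?"
sub = extract_subgraph(link_entities(question, KG), KG)
print(fuse_input(question, sub))
```

In a real system the serialized triples would be embedded and fused with the text representation inside the encoder rather than concatenated as a string; the sketch only shows where the alignment happens.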


Retrieval Augmented Generation

Retrieval-augmented generation (RAG) serves as a retrieval and augmentation mechanism that first retrieves relevant knowledge from text chunks via vector-similarity retrieval and then augments LLMs by integrating the retrieved context into their input. RAG and KG-RAG can also improve the ability of LLMs to understand user interactions and generate accurate answers for conversational QA. The key technical challenge behind this methodology is how to retrieve relevant knowledge from large-scale KGs and then effectively fuse it with LLMs without inducing knowledge conflicts.
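The retrieve-then-augment loop can be sketched in a few lines. This is a hedged illustration only: the hand-written embeddings, the chunk store, and the prompt template are assumptions; a real system would use a learned embedding model and a vector index instead of brute-force cosine similarity.

```python
import math

# Sketch of the vector-similarity retrieval step in RAG; the toy
# embeddings and chunk store are illustrative.

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# (chunk text, precomputed embedding) pairs -- stand-ins for real chunks.
CHUNKS = [
    ("KGs store facts as subject-relation-object triples.", [0.9, 0.1, 0.0]),
    ("RAG retrieves context before generation.",            [0.1, 0.9, 0.1]),
    ("LLMs may hallucinate without grounding.",             [0.2, 0.3, 0.9]),
]

def retrieve(query_emb, chunks, k=2):
    """Rank chunks by cosine similarity to the query embedding; keep top-k."""
    ranked = sorted(chunks, key=lambda c: cosine(query_emb, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context):
    """Augment the LLM input with the retrieved context."""
    return "Context:\n" + "\n".join(context) + f"\nQuestion: {question}"

query_emb = [0.15, 0.85, 0.2]  # assumed embedding of the question
context = retrieve(query_emb, CHUNKS)
print(build_prompt("How does RAG reduce hallucination?", context))
```

KG-RAG variants replace or supplement the text chunks with retrieved subgraphs or triples, but the retrieve-then-augment structure is the same.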


KGs as Reasoning Guidelines

KGs can provide reasoning guidelines that let LLMs access precise knowledge from factual evidence during reasoning. Recent methods show that KGs can also be integrated into the reasoning process of LLMs as a component within an agent system; this integration allows the agent to leverage structured knowledge to augment the decision-making and problem-solving capabilities of LLMs. However, the reasoning capabilities depend mainly on the completeness and knowledge coverage of the KGs, where incomplete, inconsistent, or outdated knowledge might induce noise or conflicts. The main challenge lies in how to improve reasoning efficiency over large-scale graphs and reasoning capabilities under incomplete KGs.
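One common way a KG guides reasoning is by supplying an explicit relation path between the question entity and a candidate answer, which the LLM can then verbalize step by step. The sketch below shows such path finding via breadth-first search over triples; the toy KG is an assumption, and a real agent would first ground the entities and prune the search over a much larger graph.

```python
from collections import deque

# Sketch of extracting a KG path as a reasoning guideline; the toy KG
# is illustrative.

KG = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Poland", "member_of", "EU"),
]

def find_path(start, goal, kg, max_hops=3):
    """BFS over triples: return a list of triples linking start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue
        for h, r, t in kg:
            if h == node and t not in visited:
                visited.add(t)
                queue.append((t, path + [(h, r, t)]))
    return None  # no supporting path within max_hops

# The path serves as factual evidence the LLM can verbalize step by step.
path = find_path("Marie Curie", "Poland", KG)
guideline = " -> ".join(f"{h} --{r}--> {t}" for h, r, t in path)
print(guideline)
```

When no path exists (the incomplete-KG case the paragraph raises), the function returns `None` and the agent must fall back on the LLM's parametric knowledge, which is exactly where noise and conflicts can enter.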


KGs as Refiners and Validators

KGs act as refiners and validators for LLMs, where factual evidence from KGs enables the refinement and verification of intermediate answers generated by LLMs, thereby enhancing the accuracy and reliability of the final answers. However, knowledge conflicts between an intermediate answer and KG facts might induce irrelevant results when intermediate answers are poorly verified. Meanwhile, the refinement and validation of results largely depend on the correctness, timeliness, and completeness of factual knowledge in KGs. The main challenges of this approach lie in how to handle knowledge conflicts between intermediate answers and KG facts, and how to incrementally update KGs so that their factual knowledge remains up-to-date and correct.
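The validation step can be sketched as a three-way check of a candidate triple against the KG: supported, in conflict, or unverifiable. This is a minimal illustration under assumed data; the toy KG, the triple-shaped intermediate answers, and the three labels are not from any specific method in the survey.

```python
# Sketch of validating an LLM's intermediate answer against KG facts;
# the toy KG and candidate triples are illustrative.

KG = {
    ("Berlin", "capital_of", "Germany"),
    ("Germany", "currency", "Euro"),
}

def validate(candidate, kg):
    """Label a candidate (head, relation, tail) triple against the KG:
    'supported' if the KG contains it, 'conflict' if the KG states a
    different tail for the same (head, relation), else 'unverifiable'."""
    if candidate in kg:
        return "supported"
    head, rel, _ = candidate
    if any(h == head and r == rel for h, r, _ in kg):
        return "conflict"       # KG disagrees -> refine the answer
    return "unverifiable"       # KG is silent -> cannot validate

print(validate(("Berlin", "capital_of", "Germany"), KG))   # supported
print(validate(("Berlin", "capital_of", "France"), KG))    # conflict
print(validate(("Berlin", "population", "3.7M"), KG))      # unverifiable
```

The `unverifiable` branch makes the paragraph's dependence on KG completeness concrete: a stale or sparse KG cannot refute a wrong intermediate answer, only fail to confirm it.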


BibTeX

@InProceedings{ma2025llmkg4qa,
  title={Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities},
  author={Ma, Chuangtao and Chen, Yongrui and Wu, Tianxing and Khan, Arijit and Wang, Haofen},
  booktitle={Proceedings of the 30th Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2025},
  pages={1--20},
}