Abstract
Large language models (LLMs) have demonstrated remarkable performance on question-answering (QA) tasks owing to their superior capabilities in natural language understanding and generation. However, LLM-based QA struggles with complex QA tasks because of limited reasoning capacity, outdated knowledge, and hallucinations. Several recent works synthesize LLMs and knowledge graphs (KGs) for QA to address these challenges. In this survey, we propose a new structured taxonomy that categorizes the methodology of synthesizing LLMs and KGs for QA according to the categories of QA and the KG's role when integrating with LLMs. We systematically survey state-of-the-art methods for synthesizing LLMs and KGs for QA, and compare and analyze these approaches in terms of their strengths, limitations, and KG requirements. We then align the approaches with the QA categories and discuss how they address the main challenges of different types of complex QA. Finally, we summarize the advancements, evaluation metrics, and benchmark datasets, and highlight open challenges and opportunities.
Knowledge Integration and Fusion
KGs usually play the role of background knowledge when synthesized with LLMs for complex QA, where knowledge fusion and RAG are the main technical paradigms. Knowledge integration and fusion aim to enhance language models (LMs) by injecting knowledge previously unknown to them. The KG and text are first aligned via local subgraph extraction and entity linking, then fed into a cross-modal encoder that bidirectionally fuses text and KG representations to jointly train the LM for complex QA tasks. The main challenge of this approach lies in how to effectively integrate up-to-date knowledge from KGs and text while avoiding knowledge conflicts, and how to update knowledge without retraining or re-finetuning.
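The alignment step above (entity linking plus local subgraph extraction) can be sketched as follows. This is a minimal, hypothetical illustration: the toy KG, the string-match entity linker, and the serialized context are stand-ins for the trained cross-modal encoders used in real fusion systems.

```python
# Toy knowledge graph: entity -> list of (relation, object) pairs.
KG = {
    "aspirin": [("treats", "headache"), ("type_of", "drug")],
    "headache": [("symptom_of", "migraine")],
}

def link_entities(question, kg):
    """Naive entity linking: match KG entity names appearing in the question."""
    return [e for e in kg if e in question.lower()]

def extract_local_subgraph(entities, kg, hops=1):
    """Collect all triples reachable within `hops` of the linked entities."""
    triples, frontier = set(), set(entities)
    for _ in range(hops):
        nxt = set()
        for s in frontier:
            for r, o in kg.get(s, []):
                triples.add((s, r, o))
                nxt.add(o)
        frontier = nxt
    return sorted(triples)

question = "What does aspirin treat?"
subgraph = extract_local_subgraph(link_entities(question, KG), KG)
# A real system encodes these triples jointly with the question tokens in a
# cross-modal encoder; here we simply serialize them as textual context.
context = "; ".join(f"{s} {r} {o}" for s, r, o in subgraph)
```

In practice the extracted subgraph is encoded with a graph encoder and fused with the text encoder's token representations, rather than linearized as plain text.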

Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) first retrieves relevant knowledge from text chunks via vector-similarity search, and then augments the LLM by injecting the retrieved context into its input. RAG and KG-RAG can also improve the LLM's ability to understand user interactions, enabling it to generate accurate answers in conversational QA. However, the key technical challenge of this methodology is how to retrieve relevant knowledge from large-scale KGs and effectively fuse it with the LLM without inducing knowledge conflicts.
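The retrieve-then-augment loop can be sketched with a few lines of Python. This is a simplified illustration: bag-of-words counts stand in for dense embeddings, and the prompt template is hypothetical; production systems use learned embedding models and a vector database.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    """Rank text chunks by vector similarity to the query; return top-k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "The Eiffel Tower is located in Paris, France.",
    "Photosynthesis converts light energy into chemical energy.",
]
query = "Where is the Eiffel Tower located?"
context = retrieve(query, chunks)[0]
# The retrieved context is prepended to the question and sent to the LLM.
prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"
```

KG-RAG follows the same pattern but retrieves subgraphs or triples instead of text chunks, which is where retrieval over large-scale graphs becomes the bottleneck.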

KGs as Reasoning Guidelines
KGs can provide reasoning guidelines that allow LLMs to access precise knowledge grounded in factual evidence. Recent methods show that KGs can also be integrated into the reasoning process of LLMs as a component of an agent system; this integration allows the agent to leverage structured knowledge to augment the LLM's decision-making and problem-solving capabilities. However, the reasoning support a KG can offer depends mainly on its completeness and knowledge coverage: incomplete, inconsistent, or outdated knowledge may introduce noise or conflicts. The main challenge lies in improving reasoning efficiency over large-scale graphs and reasoning capability under incomplete KGs.
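One common way a KG guides multi-hop reasoning is by supplying a relation path that the LLM follows step by step. The sketch below uses a breadth-first search over a toy KG to recover such a path; the graph, question, and entities are hypothetical, and real agent-style methods additionally use the LLM to score and prune candidate paths.

```python
from collections import deque

# Toy KG: entity -> list of (relation, object) pairs.
KG = {
    "Marie Curie": [("born_in", "Warsaw")],
    "Warsaw": [("capital_of", "Poland")],
}

def find_path(start, goal, kg):
    """BFS over triples; returns the triple path from start to goal, or None."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in kg.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None

# Multi-hop question: "In which country was Marie Curie born?"
path = find_path("Marie Curie", "Poland", KG)
# Each hop on the path becomes one grounded reasoning step for the LLM.
```

When the KG is incomplete, `find_path` returns `None` and the agent must fall back to the LLM's parametric knowledge, which is exactly where the noise and coverage problems discussed above arise.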

KGs as Refiners and Validators
KGs act as refiners and validators for LLMs: factual evidence from the KG enables the LLM to refine and verify its intermediate answers, thereby enhancing the accuracy and reliability of the final answers. However, knowledge conflicts between intermediate answers and KG facts may yield irrelevant results when the intermediate results are poorly verified. Moreover, refinement and validation largely depend on the correctness, timeliness, and completeness of the factual knowledge in the KG. The main challenges of this approach are handling knowledge conflicts between intermediate answers and KG facts, and incrementally updating the KG so that its factual knowledge remains up-to-date and correct.
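The validation step can be sketched as a check of answer triples against KG facts, with conflicting triples flagged for refinement. The fact set and candidate answers below are toy examples; real systems extract triples from LLM output with a separate parser and may weight KG facts by freshness.

```python
# Toy KG fact store: a set of (subject, relation, object) triples.
KG_FACTS = {
    ("Mount Everest", "located_in", "Nepal"),
    ("Mount Everest", "height_m", "8849"),
}

def validate(answer_triples, kg_facts):
    """Split candidate triples into supported, conflicting, and unverified.

    A triple conflicts when the KG asserts a different object for the same
    (subject, relation) pair; triples the KG knows nothing about are left
    unverified, reflecting KG incompleteness.
    """
    indexed = {(s, r): o for s, r, o in kg_facts}
    supported, conflicts, unverified = [], [], []
    for s, r, o in answer_triples:
        if (s, r, o) in kg_facts:
            supported.append((s, r, o))
        elif (s, r) in indexed:
            conflicts.append((s, r, o))   # KG disagrees -> send back for refinement
        else:
            unverified.append((s, r, o))  # KG incomplete -> cannot validate
    return supported, conflicts, unverified

llm_answer = [
    ("Mount Everest", "located_in", "Nepal"),
    ("Mount Everest", "height_m", "8848"),  # stale figure -> conflict with KG
]
supported, conflicts, unverified = validate(llm_answer, KG_FACTS)
```

The `unverified` bucket makes the dependence on KG completeness explicit: a validator can only refine what the KG actually covers, and a stale KG fact would wrongly flag a correct answer as a conflict.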

BibTeX
@InProceedings{ma2025llmkg4qa,
  title     = {Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities},
  author    = {Ma, Chuangtao and Chen, Yongrui and Wu, Tianxing and Khan, Arijit and Wang, Haofen},
  booktitle = {Proceedings of the 30th Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2025},
  pages     = {1--20},
}