Benchmarking Large Language Models for Knowledge Graph Validation
Aaltodoc (Aalto University)
Problems Identified (2)
KG fact validation at scale: Verifying factual accuracy in knowledge graphs is essential but challenging because expert manual verification is impractical at large scale and automated methods are not ready for real-world KGs.
Unexplored LLM suitability for KG validation: The suitability and effectiveness of LLMs for knowledge graph fact validation remain largely unexplored.
Proposed Solutions (3)
FactCheck benchmark: The paper introduces FactCheck, a benchmark for evaluating LLMs on KG fact validation across three settings: internal (parametric) knowledge, RAG-based external evidence, and multi-model consensus (a minimal sketch of the first two settings follows this list).
RAG dataset for KG validation: FactCheck includes a retrieval-augmented generation dataset with more than two million documents tailored to KG fact validation.
Interactive verification analysis platform: The paper provides an interactive platform for analyzing KG fact verification decisions.
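The paper's actual prompts and model interfaces are not reproduced here; the following is a minimal Python sketch, assuming a hypothetical `llm(prompt) -> str` callable and a hypothetical `retrieve(triple) -> list[str]` retriever, of how a KG triple might be checked under the internal-knowledge and RAG settings:

```python
# Hypothetical sketch of two FactCheck settings; the prompts, the llm callable,
# and the retrieve function are illustrative assumptions, not the paper's code.
from typing import Callable, Sequence

Triple = tuple[str, str, str]  # (subject, predicate, object)

def internal_check(llm: Callable[[str], str], fact: Triple) -> bool:
    """Internal knowledge: the model judges the triple from parametric memory alone."""
    s, p, o = fact
    prompt = (f"Is the following statement true or false? "
              f"{s} {p} {o}. Answer TRUE or FALSE.")
    return llm(prompt).strip().upper().startswith("TRUE")

def rag_check(llm: Callable[[str], str],
              retrieve: Callable[[Triple], Sequence[str]],
              fact: Triple) -> bool:
    """RAG: retrieved evidence documents are prepended to the same question."""
    evidence = "\n".join(retrieve(fact))
    s, p, o = fact
    prompt = (f"Evidence:\n{evidence}\n\n"
              f"Based only on the evidence above, is the following statement "
              f"true or false? {s} {p} {o}. Answer TRUE or FALSE.")
    return llm(prompt).strip().upper().startswith("TRUE")
```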
Results (3)
LLMs not reliable for real-world KG validation: Across the evaluated settings, current LLMs do not validate facts in real-world knowledge graphs reliably enough for practical use.
RAG gives inconsistent KG validation gains: Augmenting models with retrieved external evidence improves validation performance only inconsistently.
Consensus does not consistently outperform single models: Aggregating verdicts across multiple models does not reliably beat individual models (a minimal majority-vote sketch follows this list).
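For concreteness, here is a minimal sketch of one common consensus scheme, majority voting over per-model verdicts; the paper's actual aggregation method may differ:

```python
# Hypothetical majority-vote consensus over per-model true/false verdicts.
from collections import Counter

def consensus(verdicts: list[bool]) -> bool:
    """Return True if most models judged the triple to be true."""
    votes = Counter(verdicts)
    return votes[True] > votes[False]

# Example: three models vote on one triple.
print(consensus([True, False, True]))  # True (2 of 3 models accept the fact)
```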
Research Domain
Knowledge graph validation and large language model evaluation