Benchmarking Large Language Models for Knowledge Graph Validation
Farzad Shami, Gianmaria Silvello, Stefano Marchesin
Padua Research Archive (University of Padova)
Problems Identified (4)
KG Fact Validation Challenge: Verifying the factual accuracy of knowledge graph facts is essential but challenging for applications that depend on knowledge graphs.
Manual Verification Scalability: Expert manual verification of knowledge graph facts is impractical at large scale.
Automated KG Validation Readiness: Existing automated methods for knowledge graph validation are not ready for real-world knowledge graphs.
LLM Suitability Gap: The suitability and effectiveness of large language models for knowledge graph fact validation remain underexplored.
Proposed Solutions (4)
FactCheck Benchmark: FactCheck is a benchmark for evaluating LLMs on knowledge graph fact validation across internal knowledge, RAG-based external evidence, and multi-model consensus.
Real-World KG LLM Evaluation: The study evaluates open-source and commercial LLMs on three diverse real-world knowledge graphs.
KG Validation RAG Dataset: FactCheck provides a retrieval-augmented generation dataset with over two million documents for knowledge graph fact validation.
Verification Exploration Platform: FactCheck includes an interactive platform for analyzing knowledge graph fact verification decisions.
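Illustrative sketch (not from the paper): the multi-model consensus setting mentioned above can be pictured as combining the true/false verdicts of several LLMs on one knowledge graph fact. The summary does not say how FactCheck aggregates verdicts, so the simple majority vote below is an assumption, and the function name, the model names, and the tie-handling rule are all hypothetical.

```python
from collections import Counter


def consensus_verdict(verdicts, tie_value=None):
    """Combine boolean verdicts from several models by simple majority.

    Assumption: each model answers True (fact judged correct) or
    False (fact judged incorrect). Ties and empty inputs return
    `tie_value` instead of forcing a decision.
    """
    counts = Counter(verdicts)
    true_votes = counts[True]
    false_votes = counts[False]
    if true_votes == false_votes:
        return tie_value
    return true_votes > false_votes


# Hypothetical verdicts from three models on a single KG fact.
model_verdicts = {
    "model_a": True,
    "model_b": True,
    "model_c": False,
}
decision = consensus_verdict(model_verdicts.values())
print(decision)
```

Returning a separate value on ties keeps undecided facts visible, so they can be routed to additional evidence or manual review rather than being counted as validated.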
Results (3)
LLM Validation Reliability Limitation:
RAG Inconsistent Improvement:
Consensus Inconsistent Outperformance:
Research Domain
Knowledge graph fact validation and LLM benchmarking