
Benchmarking Large Language Models for Knowledge Graph Validation

2026 · benchmark creation · evaluative dataset

Aaltodoc (Aalto University)

https://doi.org/10.48786/edbt.2026.45
OpenAlex: W7138873612
URLs Found: 7
Internal Citations: 0
Authors: 0
Abstract Quality: usable
GPT-5.5 Abstract Analysis

Problems Identified (2)

KG fact validation at scale: Verifying factual accuracy in knowledge graphs is essential but challenging, because expert manual verification is impractical at large scale and automated methods are not ready for real-world KGs.

Unexplored LLM suitability for KG validation: The suitability and effectiveness of LLMs for knowledge graph fact validation remain largely unexplored.

Proposed Solutions (3)

FactCheck benchmark: The paper introduces FactCheck, a benchmark for evaluating LLMs on KG fact validation across three settings: internal knowledge, RAG-based external evidence, and multi-model consensus.

RAG dataset for KG validation: FactCheck includes a retrieval-augmented generation dataset with more than two million documents tailored to KG fact validation.

Interactive verification analysis platform: The paper provides an interactive platform for analyzing KG fact verification decisions.
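The multi-model consensus setting can be illustrated with a minimal majority-vote sketch. This is not the paper's implementation; the triple, the model verdicts, and the tie-handling policy below are all hypothetical, chosen only to show the idea of aggregating independent per-model judgments on one KG fact.

```python
from collections import Counter

def consensus_verdict(verdicts):
    """Majority vote over per-model True/False verdicts for one KG fact.

    A tie yields None ("no consensus") rather than a forced guess.
    """
    counts = Counter(verdicts)
    if counts[True] > counts[False]:
        return True
    if counts[False] > counts[True]:
        return False
    return None  # tie: no consensus

# Hypothetical triple and verdicts from three independently queried models.
triple = ("Aalto_University", "locatedIn", "Finland")
verdicts = [True, True, False]
print(consensus_verdict(verdicts))  # → True
```

In a real pipeline each verdict would come from prompting a separate LLM about the triple; the aggregation step itself is this simple, which is why (per the results below) consensus can only help when the individual verdicts are better than chance and their errors are uncorrelated.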

Results (3)

LLMs are not reliable for real-world KG validation.

RAG yields inconsistent gains for KG validation.

Consensus does not consistently outperform single models.

Research Domain

Knowledge graph validation and large language model evaluation
