Confidential — Stefan Michaelcheck Only

Benchmarking Large Language Models for Knowledge Graph Validation

2026 | benchmark creation | novel | evaluation

Farzad Shami, Gianmaria Silvello, Stefano Marchesin

Padua Research Archive (University of Padova)

https://doi.org/10.48550/arxiv.2602.10748
OpenAlex: W7128688500
arXiv: 2602.10748
URLs Found: 3
Internal Citations: 0
Authors: 3
Abstract Quality: usable
GPT-5.5 Abstract Analysis

Problems Identified (4)

KG Fact Validation Challenge: Verifying the factual accuracy of knowledge graph facts is essential but challenging for applications that depend on knowledge graphs.

Manual Verification Scalability: Expert manual verification of knowledge graph facts is impractical at large scale.

Automated KG Validation Readiness: Existing automated methods for knowledge graph validation are not ready for real-world knowledge graphs.

LLM Suitability Gap: The suitability and effectiveness of large language models for knowledge graph fact validation remain underexplored.

Proposed Solutions (4)

FactCheck Benchmark: FactCheck is a benchmark for evaluating LLMs on knowledge graph fact validation across internal knowledge, RAG-based external evidence, and multi-model consensus.

Real-World KG LLM Evaluation: The study evaluates open-source and commercial LLMs on three diverse real-world knowledge graphs.

KG Validation RAG Dataset: FactCheck provides a retrieval-augmented generation dataset with over two million documents for knowledge graph fact validation.

Verification Exploration Platform: FactCheck includes an interactive platform for analyzing knowledge graph fact verification decisions.
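
The multi-model consensus setting named above can be illustrated with a small sketch. This summary does not say how FactCheck aggregates model verdicts, so the simple majority vote below, the function name `consensus_verdict`, and the model names are all assumptions made for illustration, not the benchmark's actual method.

```python
from collections import Counter
from typing import Optional


def consensus_verdict(verdicts: dict[str, bool]) -> Optional[bool]:
    """Majority vote over per-model verdicts for one KG fact.

    `verdicts` maps a (hypothetical) model name to True if that model
    judged the fact correct and False otherwise. Returns None when the
    vote is tied or no verdicts were given.
    """
    counts = Counter(verdicts.values())
    true_votes, false_votes = counts[True], counts[False]
    if true_votes == false_votes:
        return None  # tie or empty input: no consensus
    return true_votes > false_votes


# Hypothetical verdicts from three models on a single triple.
votes = {"model_a": True, "model_b": True, "model_c": False}
print(consensus_verdict(votes))  # True
```

Returning `None` on a tie keeps undecided facts separate from facts judged incorrect, which matters if a tie is later resolved by another model or by a human reviewer.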

Results (3)

LLM Validation Reliability Limitation:

RAG Inconsistent Improvement:

Consensus Inconsistent Outperformance:

Research Domain

Knowledge graph fact validation and LLM benchmarking
