Assessing Large Language Models Suitability for Knowledge Graph Construction

2026 | benchmark creation | evaluative | evaluation

Vasile Ionut Remus Iga, Gheorghe Cosmin Silaghi

Neurosymbolic Artificial Intelligence

https://doi.org/10.1177/29498732261419317 OpenAlex: W7128598170

Abstract Quality: usable

GPT-5.5 Abstract Analysis

Problems Identified (3)

LLM hallucination and nondeterminism: Large language models are prone to hallucinated information and non-deterministic outputs, both of which can cause flawed reasoning.

Pipeline integration limits: The unpredictability of LLM outputs limits their integration into automated NLP pipelines such as chatbots and task-oriented dialogue systems.

LLM suitability for KG tasks: The paper investigates the potential and limitations of LLMs for knowledge graph tasks, especially static knowledge graph construction.

Proposed Solutions (4)

LLM KG construction evaluation: The study evaluates Mixtral-8x7b-Instruct-v0.1, GPT-3.5-Turbo-0125, and GPT-4o on static knowledge graph construction.

TELeR prompt scenarios: The approach uses TELeR-taxonomy-based prompts in zero-shot and one-shot scenarios for task-oriented dialogue contexts.

Flexible KG evaluation framework: The paper proposes a flexible evaluation framework that captures usable model-generated information in addition to strict metrics.

TODSet benchmark dataset: The paper introduces TODSet, a dataset for measuring LLM performance on knowledge graph-related tasks.
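
The difference between strict metrics and a more flexible evaluation can be illustrated with a small sketch. This is not the paper's framework, whose details are not given in this summary; it is one hypothetical reading in which "flexible" means that triples differing only in case, underscores, or spacing still count as matches. The function names (normalize, strict_scores, flexible_scores) and the example triples are invented for illustration.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Lowercase each part and treat underscores and repeated spaces as one space.

    Assumption: this normalization is a stand-in for whatever leniency
    a flexible evaluation might apply; it is not taken from the paper.
    """
    return tuple(" ".join(part.replace("_", " ").lower().split()) for part in triple)


def precision_recall_f1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Standard set-based precision, recall, and F1 over triples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def strict_scores(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Exact string match: any surface difference counts as an error."""
    return precision_recall_f1(set(predicted), set(gold))


def flexible_scores(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Match after normalization, so usable but differently formatted triples count."""
    return precision_recall_f1(
        {normalize(t) for t in predicted},
        {normalize(t) for t in gold},
    )


# Hypothetical example data (not from TODSet).
gold = [("Alice", "works_at", "Acme"), ("Acme", "located_in", "Cluj")]
predicted = [("alice", "works at", "ACME"), ("Acme", "located_in", "Cluj")]

print("strict:  ", strict_scores(predicted, gold))
print("flexible:", flexible_scores(predicted, gold))
```

In this toy case the first predicted triple is usable but formatted differently from the reference, so the strict score penalizes it while the flexible score does not.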

Results (3)

Prompt detail improves LLM KG construction

TODSet introduced

Flexible evaluation framework introduced

Research Domain

Large language models for knowledge graph construction
