Automated Knowledge Extraction from Large Language Model Research Papers for the ORKG Model Landscape
Alaa Kefi
Leibniz Universität Hannover
Problems Identified (3)
Document-centric scholarly knowledge: Scientific knowledge remains embedded in natural-language documents, which limits machine-assisted discovery and reuse.
Scattered LLM metadata: Core facts about LLMs are scattered across heterogeneous sources, making a stable, queryable model catalog difficult to maintain.
Need for machine-actionable descriptions: LLM research knowledge lacks structured, machine-actionable descriptions aligned with the FAIR principles.
Proposed Solutions (3)
LLM-based ORKG extraction workflow: The thesis proposes an NLP workflow that parses research papers, applies LLM-based extraction under the ORKG LLM template, and maps the outputs into the Generative AI Model Landscape comparison.
Multivariant paper support: The workflow can extract and map information from papers that describe multiple model variants.
Model extraction evaluation: The work evaluates 18 LLMs for extraction quality using property-level precision, recall, and F1 under strict and fuzzy matching, plus BERTScore for longer fields.
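The property-level evaluation described above can be illustrated with a minimal sketch. It is not the thesis's actual evaluation code. The property names, the 0.8 similarity threshold, and the use of Python's standard-library `difflib.SequenceMatcher` as the fuzzy matcher are all assumptions made for illustration. BERTScore for longer fields is left out because it requires an external model.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace before comparing values."""
    return " ".join(value.lower().split())


def strict_match(pred: str, gold: str) -> bool:
    """Strict matching: values must be equal after normalization."""
    return normalize(pred) == normalize(gold)


def fuzzy_match(pred: str, gold: str, threshold: float = 0.8) -> bool:
    """Fuzzy matching: character-level similarity at or above a threshold.

    The 0.8 threshold is an assumed value, not taken from the thesis.
    """
    ratio = SequenceMatcher(None, normalize(pred), normalize(gold)).ratio()
    return ratio >= threshold


def property_scores(pred: dict, gold: dict, match=strict_match) -> dict:
    """Compute precision, recall, and F1 separately for each property.

    `pred` and `gold` map a property name to a list of extracted values.
    Each gold value can be matched by at most one predicted value.
    """
    scores = {}
    for prop in sorted(set(pred) | set(gold)):
        pred_values = list(pred.get(prop, []))
        gold_values = list(gold.get(prop, []))
        unmatched = list(gold_values)
        true_positives = 0
        for p in pred_values:
            hit = next((g for g in unmatched if match(p, g)), None)
            if hit is not None:
                unmatched.remove(hit)
                true_positives += 1
        precision = true_positives / len(pred_values) if pred_values else 0.0
        recall = true_positives / len(gold_values) if gold_values else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[prop] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical gold annotation and model output for one paper.
gold = {
    "model_name": ["LLaMA 2"],
    "organization": ["Meta AI"],
    "parameters": ["7B", "13B", "70B"],
}
pred = {
    "model_name": ["Llama-2"],
    "organization": ["Meta AI"],
    "parameters": ["7B", "13B"],
}

strict = property_scores(pred, gold, match=strict_match)
fuzzy = property_scores(pred, gold, match=fuzzy_match)
print(strict["model_name"], fuzzy["model_name"])
```

In this example the hyphenated model name fails strict matching but passes fuzzy matching, which is the kind of surface variation that reporting both scores is meant to expose.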
Results (3)
Extraction quality characterization:
Recurring failure modes identified:
Reproducible end-to-end pipeline:
Research Domain
NLP for scholarly knowledge extraction and research knowledge graphs