A multimodal alignment and attention fusion framework for knowledge graph construction: joint extraction and evaluation in the context of ICH (a case study of Anhui)

2026, graph construction, novel combination

Proceedings of the Indian National Science Academy

https://doi.org/10.1007/s43538-026-00733-x OpenAlex: W7140179380

Abstract Quality: usable

GPT-5.5 Abstract Analysis

Problems Identified (2)

multimodal KG limitations: Digital preservation of Intangible Cultural Heritage (ICH) is hindered by traditional knowledge graphs' limited ability to represent and integrate multimodal data.

cross-modal semantic differences: Different data modalities introduce semantic differences that must be addressed for multimodal ICH knowledge graph construction.

Proposed Solutions (5)

multimodal alignment and attention fusion framework: The paper proposes a multimodal alignment and attention fusion framework for constructing a comprehensive ICH knowledge graph.

multimodal joint extraction pipeline: The approach processes textual, image, audio, and video data through multimodal fusion, alignment, and joint entity-relation extraction.

visual-guided Transformer attention: The method injects visual features from ICH imagery and keyframes into a Transformer-based text encoder using visual-guided attention.

JSD alignment loss: The framework introduces an alignment loss based on Jensen-Shannon Divergence to handle semantic differences between modalities.

ICH relational-ontology-temporal strategy: The framework includes a relational pattern, ontology mapping, and temporal alignment strategy tailored to ICH knowledge.
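
To make two of the solutions above more concrete, here is a minimal sketch of visual-guided attention and a Jensen-Shannon alignment loss. It is an illustration under stated assumptions, not the paper's implementation: the function names (visual_guided_attention, alignment_loss), the use of plain scaled dot-product attention without learned projections, the residual add, and the softmax used to turn pooled feature vectors into distributions are all hypothetical choices, since the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def visual_guided_attention(text_tokens, visual_feats):
    """Let each text token (query) attend over visual features (keys, values).

    Assumption: scaled dot-product attention with no learned projections,
    followed by a residual add. The paper's actual layer may differ.
    """
    dim = len(text_tokens[0])
    fused = []
    for query in text_tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visual_feats
        ]
        weights = softmax(scores)
        # Weighted sum of visual features, one value per feature dimension.
        context = [
            sum(w * feat[j] for w, feat in zip(weights, visual_feats))
            for j in range(dim)
        ]
        fused.append([q + c for q, c in zip(query, context)])
    return fused


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


def alignment_loss(text_vec, visual_vec):
    """Hypothetical alignment loss: JSD between softmax-normalized vectors."""
    return js_divergence(softmax(text_vec), softmax(visual_vec))


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]      # two toy text-token embeddings
    visual = [[0.5, 0.5], [1.0, -1.0]]   # two toy image/keyframe features
    print(visual_guided_attention(text, visual))
    print(alignment_loss([2.0, 0.1], [0.3, 1.5]))
```

With natural logarithms the Jensen-Shannon divergence is symmetric and bounded between 0 and ln 2, which makes it a more stable alignment signal than a raw KL term when the two modality distributions barely overlap.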

Results (3)

state-of-the-art triplet extraction:

outperforms multimodal baselines:

large ICH knowledge graph:

Research Domain

Intangible Cultural Heritage knowledge graph construction / multimodal information extraction
