A multimodal alignment and attention fusion framework for knowledge graph construction: joint extraction and evaluation in the context of ICH (a case study of Anhui)
Proceedings of the Indian National Science Academy
Problems Identified (2)
multimodal KG limitations: Digital preservation of Intangible Cultural Heritage (ICH) is hindered by the limited ability of traditional knowledge graphs to represent and integrate multimodal data.
cross-modal semantic differences: Different data modalities introduce semantic differences that must be resolved when constructing a multimodal ICH knowledge graph.
Proposed Solutions (5)
multimodal alignment attention fusion framework: The paper proposes a multimodal alignment and attention fusion framework for constructing a comprehensive ICH knowledge graph.
multimodal joint extraction pipeline: The approach processes textual, image, audio, and video data through multimodal fusion, alignment, and joint entity-relation extraction.
visual-guided Transformer attention: The method injects visual features from ICH imagery and keyframes into a Transformer-based text encoder using visual-guided attention.
JSD alignment loss: The framework introduces an alignment loss based on the Jensen-Shannon divergence (JSD) to reduce semantic differences between modalities.
ICH relational-ontology-temporal strategy: The framework includes a relational pattern, ontology mapping, and temporal alignment strategy tailored to ICH knowledge.
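The summary says visual features are injected into a Transformer text encoder through visual-guided attention but does not give the exact formulation. The sketch below assumes a common reading: single-head scaled dot-product cross-attention in which text tokens act as queries over image-region or keyframe features (keys and values), followed by a residual add. The function visual_guided_attention, the weight matrices w_q, w_k, w_v, and the single-head, residual design are hypothetical illustrations, not the paper's architecture.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def visual_guided_attention(text: Matrix, visual: Matrix,
                            w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical single-head cross-attention from text tokens to visual features.

    text:   n_tokens x d   hidden states from the text encoder
    visual: n_regions x d  image-region or keyframe features, already projected to d
    w_q, w_k: d x d_k projections; w_v: d x d so the residual add is well defined.
    """
    q = matmul(text, w_q)      # text tokens act as queries
    k = matmul(visual, w_k)    # visual features act as keys ...
    v = matmul(visual, w_v)    # ... and as values
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))                       # n_tokens x n_regions
    weights = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(weights, v)                           # visual context per token
    # Residual connection: each token keeps its text state and adds visual context.
    return [[t + c for t, c in zip(t_row, c_row)]
            for t_row, c_row in zip(text, context)]


identity = [[1.0, 0.0], [0.0, 1.0]]
fused = visual_guided_attention([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]],
                                identity, identity, identity)
print(fused)  # one visual region receives all the weight: [[1.5, 0.5], [0.5, 1.5]]
```

Keeping text tokens as the queries means the output stays aligned with the token sequence, so a downstream entity-relation tagger can consume it unchanged; a real encoder would typically use multiple heads and learned projections per layer.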
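The JSD alignment loss rests on the standard definition JSD(P‖Q) = ½ KL(P‖M) + ½ KL(Q‖M) with M = ½(P + Q), which, unlike plain KL divergence, is symmetric and bounded above by ln 2. The summary does not say how each modality's representation is turned into a probability distribution or how pairs are weighted, so the sketch assumes a softmax over paired logits and a simple mean over text/visual pairs; the names jsd_alignment_loss, text_logits, and visual_logits are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric, always finite, at most ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution M
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def jsd_alignment_loss(text_logits: List[Sequence[float]],
                       visual_logits: List[Sequence[float]]) -> float:
    """Hypothetical loss: mean JSD over paired text/visual samples."""
    pairs = list(zip(text_logits, visual_logits))
    return sum(jsd(softmax(t), softmax(v)) for t, v in pairs) / len(pairs)


print(round(jsd([1.0, 0.0], [0.0, 1.0]), 4))  # disjoint supports give ln 2: 0.6931
```

Because the mixture M is strictly positive wherever P or Q is, the loss never diverges to infinity, which is one practical reason to prefer JSD over raw KL when two modalities initially disagree strongly.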
Results (3)
state-of-the-art triplet extraction:
outperforms multimodal baselines:
large ICH knowledge graph:
Research Domain
Intangible Cultural Heritage knowledge graph construction / multimodal information extraction