1233 - Balancing Heterogeneous Knowledge Sources: Pareto-Based Multi-Teacher Distillation for MLPs on Graphs
AAAI 2026 (Association for the Advancement of Artificial Intelligence), Wenrui Zhao, Yijun Tian, Zhichao Xu, Chuxu Zhang, Yawei Wang
Open MIND
Problems Identified (2)
HGNN inference latency: Heterogeneous Graph Neural Networks depend heavily on neighbor information, causing high latency that limits real-world practicality.
Structure-agnostic distillation limitation: Existing GNN distillation approaches focus on reproducing teacher predictions while neglecting structural knowledge, making them less effective on complex heterogeneous graphs.
Proposed Solutions (2)
HGKD hierarchical distillation: HGKD is a hierarchical knowledge distillation framework that transfers both structural knowledge and predictive outcomes from HGNN teachers to an MLP student (see the first sketch after this list).
Pareto multi-teacher HGKD variants: Two HGKD variants help the MLP student learn from multiple teacher models using Pareto learning and low-cost neighbor information (see the second sketch after this list).
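The hierarchical transfer in the first item can be pictured as a two-level training loss: a prediction-level term on the teacher's soft labels plus a structure-level term computed from teacher embeddings of a node's neighbors, while the student itself stays neighbor-free at inference. The PyTorch sketch below is only a minimal illustration of that idea under these assumptions; MLPStudent, distillation_loss, and the loss weights are hypothetical names, not the paper's actual objective.

```python
# Hedged sketch of prediction-level + structure-level distillation into an MLP student.
# Names and loss weights are illustrative; this is not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPStudent(nn.Module):
    """Neighbor-free student: sees only the node's own features at inference time."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.classifier = nn.Linear(hid_dim, out_dim)

    def forward(self, x):
        h = self.encoder(x)            # hidden embedding used by the structural term
        return h, self.classifier(h)   # logits used by the prediction term

def distillation_loss(student_logits, student_hidden, teacher_logits,
                      teacher_neighbor_hidden, labels,
                      tau=2.0, w_pred=1.0, w_struct=0.5):
    """Combine (1) supervised cross-entropy, (2) soft-label KD on teacher predictions,
    and (3) a structural term pulling the student embedding toward the mean of
    precomputed teacher embeddings of the node's neighbors (cheap at training time)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau * tau
    # teacher_neighbor_hidden: [batch, n_neighbors, hid_dim]; assumes teacher embeddings
    # were projected offline to the student's hidden size.
    struct = F.mse_loss(student_hidden, teacher_neighbor_hidden.mean(dim=1))
    return ce + w_pred * kd + w_struct * struct
```

At inference only the MLP forward pass is needed, which is where the latency advantage over neighbor-dependent HGNN message passing comes from.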
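For the multi-teacher variants, Pareto learning balances the per-teacher distillation losses instead of relying on hand-tuned weights. The sketch below shows one standard way to do this for two teachers, an MGDA-style min-norm combination of the two gradients; the paper's exact Pareto procedure may differ, and pareto_step / min_norm_coeff are illustrative names.

```python
# Hedged sketch: Pareto-style balancing of two teachers' distillation losses via the
# classic two-objective min-norm (MGDA-style) rule. A generic stand-in, not HGKD's exact rule.
import torch

def min_norm_coeff(g1, g2):
    """Closed-form alpha in [0, 1] minimising ||alpha*g1 + (1-alpha)*g2||^2."""
    diff = g1 - g2
    denom = diff.dot(diff).clamp_min(1e-12)
    alpha = torch.dot(g2 - g1, g2) / denom
    return alpha.clamp(0.0, 1.0)

def pareto_step(student, loss_teacher_a, loss_teacher_b, optimizer):
    """One update: per-teacher gradients, Pareto weights, then a combined gradient step."""
    params = [p for p in student.parameters() if p.requires_grad]
    grads_a = torch.autograd.grad(loss_teacher_a, params, retain_graph=True)
    grads_b = torch.autograd.grad(loss_teacher_b, params, retain_graph=True)
    flat_a = torch.cat([g.reshape(-1) for g in grads_a])
    flat_b = torch.cat([g.reshape(-1) for g in grads_b])
    alpha = min_norm_coeff(flat_a, flat_b)
    optimizer.zero_grad()
    for p, ga, gb in zip(params, grads_a, grads_b):
        p.grad = alpha * ga + (1.0 - alpha) * gb
    optimizer.step()
    return alpha.item()
```

The returned alpha is the weight placed on the first teacher's gradient at that step; when one teacher's gradient dominates, the closed form clamps to 0 or 1 and the update follows the other teacher alone.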
Results (1)
Competitive teacher-level performance: The distilled MLP student achieves performance competitive with its HGNN teachers while avoiding neighbor-dependent inference.
Research Domain
Heterogeneous graph learning and knowledge distillation