1233 - Balancing Heterogeneous Knowledge Sources: Pareto-Based Multi-Teacher Distillation for MLPs on Graphs
AAAI 2026 (Association for the Advancement of Artificial Intelligence), Wenrui Zhao, Yijun Tian, Zhichao Xu, Chuxu Zhang, Yawei Wang
Open MIND
Problems Identified (2)
HGNN inference latency: Heterogeneous Graph Neural Networks depend heavily on neighbor information, causing high latency that limits real-world practicality.
Structure-agnostic distillation limitation: Existing GNN distillation approaches focus on reproducing teacher predictions while neglecting structural knowledge, making them less effective on complex heterogeneous graphs.
Proposed Solutions (2)
HGKD hierarchical distillation: HGKD is a hierarchical knowledge distillation framework that transfers both structural knowledge and predictive outcomes from HGNN teachers to an MLP student (see the first sketch after this list).
Pareto multi-teacher HGKD variants: Two HGKD variants help the MLP student learn from multiple teacher models using Pareto learning and low-cost neighbor information (see the second sketch after this list).
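The hierarchical transfer in the first item can be pictured as a two-level training loss: a prediction-level term on the teacher's soft labels plus a structure-level term computed from teacher embeddings of a node's neighbors, while the student itself stays neighbor-free at inference. The PyTorch sketch below is only a minimal illustration of that idea under these assumptions; MLPStudent, distillation_loss, and the loss weights are hypothetical names, not the paper's actual objective.

```python
# Hedged sketch of prediction-level + structure-level distillation into an MLP student.
# Names and loss weights are illustrative; this is not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPStudent(nn.Module):
    """Neighbor-free student: sees only the node's own features at inference time."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.classifier = nn.Linear(hid_dim, out_dim)

    def forward(self, x):
        h = self.encoder(x)            # hidden embedding used by the structural term
        return h, self.classifier(h)   # logits used by the prediction term

def distillation_loss(student_logits, student_hidden, teacher_logits,
                      teacher_neighbor_hidden, labels,
                      tau=2.0, w_pred=1.0, w_struct=0.5):
    """Combine (1) supervised cross-entropy, (2) soft-label KD on teacher predictions,
    and (3) a structural term pulling the student embedding toward the mean of
    precomputed teacher embeddings of the node's neighbors (cheap at training time)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau * tau
    # teacher_neighbor_hidden: [batch, n_neighbors, hid_dim]; assumes teacher embeddings
    # were projected offline to the student's hidden size.
    struct = F.mse_loss(student_hidden, teacher_neighbor_hidden.mean(dim=1))
    return ce + w_pred * kd + w_struct * struct
```

At inference only the MLP forward pass is needed, which is where the latency advantage over neighbor-dependent HGNN message passing comes from.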
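For the multi-teacher variants, Pareto learning balances the per-teacher distillation losses instead of relying on hand-tuned weights. The sketch below shows one standard way to do this for two teachers, an MGDA-style min-norm combination of the two gradients; the paper's exact Pareto procedure may differ, and pareto_step / min_norm_coeff are illustrative names.

```python
# Hedged sketch: Pareto-style balancing of two teachers' distillation losses via the
# classic two-objective min-norm (MGDA-style) rule. A generic stand-in, not HGKD's exact rule.
import torch

def min_norm_coeff(g1, g2):
    """Closed-form alpha in [0, 1] minimising ||alpha*g1 + (1-alpha)*g2||^2."""
    diff = g1 - g2
    denom = diff.dot(diff).clamp_min(1e-12)
    alpha = torch.dot(g2 - g1, g2) / denom
    return alpha.clamp(0.0, 1.0)

def pareto_step(student, loss_teacher_a, loss_teacher_b, optimizer):
    """One update: per-teacher gradients, Pareto weights, then a combined gradient step."""
    params = [p for p in student.parameters() if p.requires_grad]
    grads_a = torch.autograd.grad(loss_teacher_a, params, retain_graph=True)
    grads_b = torch.autograd.grad(loss_teacher_b, params, retain_graph=True)
    flat_a = torch.cat([g.reshape(-1) for g in grads_a])
    flat_b = torch.cat([g.reshape(-1) for g in grads_b])
    alpha = min_norm_coeff(flat_a, flat_b)
    optimizer.zero_grad()
    for p, ga, gb in zip(params, grads_a, grads_b):
        p.grad = alpha * ga + (1.0 - alpha) * gb
    optimizer.step()
    return alpha.item()
```

The returned alpha is the weight placed on the first teacher's gradient at that step; when one teacher's gradient dominates, the closed form clamps to 0 or 1 and the update follows the other teacher alone.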
Results (1)
Competitive teacher-level performance: The distilled MLP student achieves performance competitive with its HGNN teachers while avoiding neighbor-dependent inference.
Research Domain
Heterogeneous graph learning and knowledge distillation