
Dissertation Overview
To address the core challenges that multimodal affective computing faces when processing non-stationary, multi-source heterogeneous interactions, namely rigid temporal topology, frequency-domain feature erasure, deep semantic drift, and low-level gradient imbalance, this dissertation builds on graph neural networks and establishes the research thread of "topology reconstruction → frequency-domain decoupling → spatio-temporal evolution → gradient optimization". The main research content is summarized as follows:
(1) Dynamically evolving topology modeling and adaptive aggregation. To overcome the topological rigidity of static graph construction, an overlapping sliding-window mechanism is introduced to build locally evolving subgraphs, avoiding the over-smoothing problem of long-range sequences. In parallel, the degree-penalty bias of the Laplacian operator is calibrated and a degree-aware similarity matrix is derived, which raises the aggregation weights of key emotional hub nodes and corrects the dilution caused by feature homogenization.
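The two mechanisms above can be sketched as follows. This is a minimal NumPy illustration, not the dissertation's exact formulation: the window size, stride, cosine-similarity adjacency, and the degree-boost exponent `alpha` are all illustrative assumptions.

```python
import numpy as np

def sliding_window_subgraphs(seq_len, win=4, stride=2):
    """Overlapping sliding windows over an utterance sequence; each window
    defines a locally evolving subgraph (a list of node indices)."""
    starts = range(0, max(seq_len - win, 0) + 1, stride)
    return [list(range(s, min(s + win, seq_len))) for s in starts]

def degree_aware_adjacency(X, alpha=0.5):
    """Cosine-similarity adjacency whose rows are boosted by node degree,
    so key 'hub' nodes keep larger aggregation weights instead of being
    penalized by the usual D^{-1/2} normalization (alpha is assumed)."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    S = np.clip(Xn @ Xn.T, 0.0, None)      # non-negative similarities
    np.fill_diagonal(S, 0.0)
    deg = S.sum(axis=1)                    # soft degree per node
    boost = (deg / (deg.mean() + 1e-8)) ** alpha
    A = S * boost[:, None]
    return A / (A.sum(axis=1, keepdims=True) + 1e-8)  # row-stochastic
```

Restricting graph construction to overlapping windows keeps each subgraph small enough that repeated aggregation cannot average distant utterances together, which is where the over-smoothing relief comes from.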
(2) Dual-dimensional contextual filtering and multi-frequency propagation decoupling. To filter out redundant interaction nodes and construct high signal-to-noise representations, a dynamic filtering mechanism based on semantic relevance and information entropy is designed. In addition, a graph propagation architecture with parallel high- and low-frequency branches is proposed, which simultaneously preserves cross-modal emotional commonality and precisely characterizes heterogeneous emotional conflicts between modalities, achieving deep decoupling at the frequency-domain level.
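A minimal sketch of both ideas, under assumptions that are mine rather than the dissertation's: the entropy score is computed on row-normalized similarities, and the high-frequency branch is taken as the residual of a symmetric low-pass filter.

```python
import numpy as np

def entropy_filter(A, keep_ratio=0.5):
    """Keep the nodes whose similarity profile is sharpest: rows with
    near-uniform (high-entropy) attention carry little discriminative
    context and are filtered out."""
    P = A / (A.sum(axis=1, keepdims=True) + 1e-8)
    H = -(P * np.log(P + 1e-12)).sum(axis=1)     # row-wise entropy
    k = max(1, int(len(A) * keep_ratio))
    return np.sort(np.argsort(H)[:k])            # lowest-entropy nodes

def dual_frequency_propagate(A, X):
    """Parallel branches: a low-pass pass (cross-modal commonality) and
    its high-pass residual (inter-modal emotional conflict)."""
    d = A.sum(axis=1)
    A_hat = A / (np.sqrt(np.outer(d, d)) + 1e-8)  # degree-normalized
    low = A_hat @ X                               # smooths neighbors
    high = X - A_hat @ X                          # keeps differences
    return np.concatenate([low, high], axis=1)
```

Because the two branches are complementary (their sum recovers the input features), concatenating them preserves both the shared and the conflicting components instead of erasing one in favor of the other.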
(3) Spatio-temporal weight evolution and multi-scale topology collaboration. To suppress the semantic drift induced by transient noise, a temporal graph-evolution gated recurrent mechanism is constructed that updates the graph aggregation rules incrementally, balancing long-range dependency capture against GPU memory overhead. Combined with a degree-normalized operator and a parallel multi-scale graph convolution architecture, it avoids structural topological blind spots while efficiently aggregating multi-scale spatio-temporal features of heterogeneous signals.
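The gated evolution and the multi-scale branches can be sketched as below. The scalar gate standing in for a learned GRU gate, the gate sharpness `beta`, and the hop-count scales are illustrative assumptions, not the dissertation's design.

```python
import numpy as np

def gated_graph_evolution(adjs, beta=4.0):
    """Incremental, gated update of the aggregation matrix over time.
    A GRU-style update gate is mimicked by a scalar driven by how far the
    incoming graph deviates from the running state: large deviations
    (likely transient noise) retain more of the old state, damping drift.
    Only one running state is kept, bounding memory for long sequences."""
    state = adjs[0]
    for A_new in adjs[1:]:
        diff = np.abs(A_new - state).mean()
        z = 1.0 / (1.0 + np.exp(1.0 - beta * diff))  # grows with deviation
        state = z * state + (1.0 - z) * A_new
    return state

def multiscale_graph_conv(A, X, scales=(1, 2, 3)):
    """Parallel multi-scale branches: one k-hop propagation per scale over
    a degree-normalized operator, then concatenation."""
    d = A.sum(axis=1)
    A_hat = A / (np.sqrt(np.outer(d, d)) + 1e-8)
    outs, P = [], np.eye(len(A))
    for k in range(1, max(scales) + 1):
        P = P @ A_hat                                # k-hop operator
        if k in scales:
            outs.append(P @ X)
    return np.concatenate(outs, axis=1)
```

Running several hop depths in parallel lets short-range branches cover neighborhoods that a single deep propagation would blur, which is one way to avoid structural blind spots.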
(4) Game-theoretic magnitude calibration and multi-objective convex-hull projection. To resolve the low-level optimization dilemma of cross-modal joint training, a Shapley-value estimation mechanism rooted in cooperative game theory is introduced to adaptively calibrate gradient norms and alleviate magnitude imbalance. In parallel, an improved multi-objective convex-hull projection algorithm is designed that strictly constrains parameter updates within the optimal convex hull, establishing a theoretical lower bound on convergence for the balanced evolution of heterogeneous signals.
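The Shapley-based calibration step can be sketched as follows; the convex-hull projection is omitted here. The coalition payoff function and the inverse-proportional rescaling rule are illustrative assumptions, not the dissertation's exact scheme.

```python
import itertools
import math
import numpy as np

def shapley_values(players, value_fn):
    """Exact Shapley value by coalition enumeration (tractable for the
    two or three modalities typical in multimodal emotion recognition)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            for S in itertools.combinations(others, r):
                w = (math.factorial(len(S)) * math.factorial(n - len(S) - 1)
                     / math.factorial(n))
                phi[p] += w * (value_fn(set(S) | {p}) - value_fn(set(S)))
    return phi

def calibrate_gradient_norms(grads, phi):
    """Rescale each modality's gradient inversely to its (positive)
    Shapley contribution, damping dominant modalities to ease the
    magnitude imbalance across branches."""
    inv = {m: 1.0 / max(phi[m], 1e-6) for m in grads}
    s = sum(inv.values())
    return {m: grads[m] * (len(grads) * inv[m] / s) for m in grads}
```

The Shapley value is the unique attribution satisfying efficiency (the per-modality contributions sum to the value of the full coalition), which makes it a principled signal for deciding which modality's gradient to damp.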
Main Academic Achievements
[1] Yingxue Gao, Huan Zhao*, Zixing Zhang*. Adaptive Speech Emotion Representation Learning Based on Dynamic Graph [C]. IEEE International Conference on Acoustics, Speech and Signal Processing, 2024: 11116-11120. (CCF B, student first author)
[2] Huan Zhao, Yingxue Gao*, Haijiao Chen, Bo Li, Guanghui Ye, Zixing Zhang*. Enhanced Multimodal Emotion Recognition in Conversations via Contextual Filtering and Multi-Frequency Graph Propagation [C]. IEEE International Conference on Acoustics, Speech and Signal Processing, 2025: 1-5. (CCF B, advisor first author, student second author)
[3] Huan Zhao, Yi Ju, Yingxue Gao*. Bilevel Relational Graph Representation Learning-based Multimodal Emotion Recognition in Conversation [C]. IEEE International Conference on Multimedia and Expo, 2024: 1-6. (CCF B, advisor first author, student sole corresponding author)
[4] Huan Zhao, Gong Chen, Zhijie Yu, Yingxue Gao*. Graph-Based Emotion Consensus Perception Learning for Multimodal Emotion Recognition in Conversation [C]. IEEE International Conference on Acoustics, Speech and Signal Processing, 2026: 11642-11646. (CCF B, advisor first author, student sole corresponding author)
[5] Huan Zhao, Zhijie Yu, Yong Wei, Bo Li, Yingxue Gao*. DSSR: Decoupling Salient and Subtle Representations Under Missing Modalities for Multimodal Emotion Recognition [C]. IEEE International Conference on Acoustics, Speech and Signal Processing, 2026: 12077-12081. (CCF B, advisor first author, student sole corresponding author)
[6] Yingxue Gao, Huan Zhao, Yufeng Xiao, Zixing Zhang. GCFormer: A Graph Convolutional Transformer for Speech Emotion Recognition [C]. IEEE International Conference on Multimodal Interaction, 2023: 307-313. (CCF C, student first author)
[7] Yingxue Gao, Huan Zhao*, Zixing Zhang*, Keqin Li. Mitigating Modality Imbalance in Multimodal Emotion Recognition via Dynamic Gradient Shaping [J]. IEEE Transactions on Affective Computing. (SCI Q1, student first author, under review)
[8] Yingxue Gao, Huan Zhao*, Zixing Zhang*. DyGIN: Modality-Unified Dynamic Graph Inception Network for Context-Aware Emotion Recognition [J]. IEEE Internet of Things Journal. (SCI Q2, student first author, resubmitted after major revision)