
Dissertation Overview
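As a core technology for intelligent human-computer interaction, speech emotion recognition drives applications such as affective computing and mental health monitoring, but it also faces a distinctive challenge: cross-modal privacy leakage. The paralinguistic characteristics of speech are tightly coupled with the speaker's biometric attributes, so emotion recognition is prone to cascading feature-level leakage. Conventional centralized training requires uploading raw audio, which attackers can exploit through techniques such as voiceprint cloning and adversarial-sample reconstruction to steal sensitive information. Existing federated learning frameworks for speech emotion recognition reduce the risk of centralized leakage by keeping data local, and they rely on the abstract nature of temporal gradients in speech models to suppress voiceprint inversion. Their parameter-sharing mechanism nevertheless still suffers from the following problems, each of which this dissertation addresses: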
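(1) To address the exposure of sensitive attribute information through shared model updates, a feature-level federated privacy-enhancement method is proposed to counter attribute inference attacks. First, a bidirectional recurrent neural network captures latent representations of the sequence, removing some redundant features. Then, a feature attention mechanism focuses on the salient regions of the latent representations, further concealing attribute information unrelated to emotion.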
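(2) To address the failure of differential privacy strategies under frequent model updates, a gradient-level hierarchical federated differential privacy strategy is proposed to counter attribute inference attacks. On the one hand, a normalization function is employed to distinguish gradient importance at a finer granularity and then clip the important gradients, filtering out sensitive information that could cause privacy leakage. On the other hand, based on theoretical analysis, a layer-wise gradient perturbation mechanism is proposed that applies differentiated random noise to the early network layers during backpropagation, so that privacy protection is targeted precisely where it is needed.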
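(3) To address the parameter redundancy that makes pretrained speech models hard to adapt to federated fine-tuning, a feasible architecture-level paradigm for federated parameter-efficient fine-tuning with cloud-edge-device collaboration is proposed. Parameter-efficient fine-tuning embeds trainable layers into the backward-update layers of the pretrained speech model. By freezing the backbone parameters and allowing only a small number of trainable parameters to be shared, it achieves lightweight interaction and reduces communication overhead. In addition, fine-tuning on top of the general knowledge of the pretrained speech model effectively improves speech emotion recognition performance.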
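(4) To address the coupled threat of multiple attacks and catastrophic forgetting, a system-level federated distillation mechanism with multiple defenses is proposed. First, a lightweight server-side generator learns knowledge of the global view and guides client updates through distillation, further mitigating catastrophic forgetting and improving system performance. Second, a multi-path integrated defense paradigm is designed to counter potential system attacks, comprising gradient-modification-based data perturbation, a dynamic weighted selection method, and a privacy-enhancement strategy that captures discriminative features. In addition, to minimize parameter leakage, a layered sharing mechanism with parameter decoupling is adopted, which also substantially reduces communication overhead.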
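The clip-then-perturb idea behind contribution (2) can be illustrated with a minimal sketch. It is not the dissertation's method: plain L2-norm clipping stands in for the normalization-based importance function, and the function names, `max_norm`, and the per-layer `noise_scales` values are hypothetical choices made for illustration. The noise is not calibrated to any formal (epsilon, delta) privacy guarantee.

```python
import math
import random


def clip_by_l2_norm(grad, max_norm):
    """Rescale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def layerwise_private_update(layer_grads, max_norm, noise_scales, rng):
    """Clip each layer's gradient, then add Gaussian noise with a per-layer std.

    layer_grads  -- per-layer gradient vectors, ordered from input to output
    max_norm     -- clipping threshold applied to every layer (assumed value)
    noise_scales -- per-layer noise multipliers (assumed values); the noise
                    standard deviation for a layer is scale * max_norm
    rng          -- a random.Random instance, so runs are reproducible
    """
    if len(layer_grads) != len(noise_scales):
        raise ValueError("need exactly one noise scale per layer")
    noisy = []
    for grad, scale in zip(layer_grads, noise_scales):
        clipped = clip_by_l2_norm(grad, max_norm)
        sigma = scale * max_norm
        noisy.append([g + rng.gauss(0.0, sigma) for g in clipped])
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    # Three toy layers; the early layers receive more noise than the last one.
    grads = [[3.0, 4.0], [0.3, 0.4], [0.1, -0.2]]
    update = layerwise_private_update(grads, 1.0, [1.0, 0.5, 0.0], rng)
    for index, layer in enumerate(update):
        print(index, layer)
```

In this sketch a client would run `layerwise_private_update` on its local gradients before sending them to the server. Giving the early layers larger multipliers follows the description above of perturbing early network layers; the specific values are placeholders.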
Main Academic Achievements
[1] Haijiao Chen, Huan Zhao*, Zixing Zhang*, and Keqin Li, “Discriminative feature learning-based federated lightweight distillation against multiple attacks,” IEEE Internet of Things Journal, pp. 17663-17677, 2024. (CAS SCI Q1, first author)
[2] Haijiao Chen, Huan Zhao*, Zixing Zhang*, and Keqin Li, “Federal Parameter-Efficient Fine-Tuning for Speech Emotion Recognition,” Expert Systems With Applications, 2025. (CAS SCI Q1, first author, major revision resubmitted)
[3] Haijiao Chen, Huan Zhao*, and Zixing Zhang*, “Gradient-Level Differential Privacy Against Attribute Inference Attack for Speech Emotion Recognition,” IEEE Signal Processing Letters, pp. 3124-3128, 2024. (CAS SCI Q2, first author)
[4] Huan Zhao, Haijiao Chen*, Yufeng Xiao, and Zixing Zhang, “Privacy-enhanced federated learning against attribute inference attack for speech emotion recognition,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2023, pp. 1-5. (EI-indexed, CCF Rank B, advisor as first author)
[5] Haijiao Chen, Huan Zhao*, Yingxue Gao, Yiming Liu, and Zixing Zhang*, “Parameter-efficient federal-tuning enhances privacy preserving for speech emotion recognition,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2025, pp. 1-5. (EI-indexed, CCF Rank B, first author)
[6] Huan Zhao, Nianxin Huang, and Haijiao Chen*, “Knowledge enhancement for speech emotion recognition via multi-level acoustic feature,” Connection Science, vol. 36, no. 1, Art. no. 2312103, 2024. (CAS SCI Q4, sole corresponding author)
[7] Huan Zhao, Yingxue Gao*, Haijiao Chen, Bo Li, Guanghui Ye, and Zixing Zhang, “Enhanced Multimodal Emotion Recognition in Conversations via Contextual Filtering and Multi-Frequency Graph Propagation,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2025, pp. 1-5. (EI-indexed, CCF Rank B, third author)
[8] Yiming Liu, Huan Zhao*, Yaqian Liu, Haijiao Chen, Bo Li, Guanghui Ye, and Zixing Zhang, “DSSM: Dual State Space Model For Human Motions Generation,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2025, pp. 1-5. (EI-indexed, CCF Rank B, fourth author)
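[9] Huan Zhao, Xiaojun Xiang, and Haijiao Chen. Cross-corpus speech emotion recognition method based on a dynamic-window compact convolutional transformer [P]. Hunan Province: CN118522311A, 2024-08-20. Invention patent, under substantive examination.
[10] Huan Zhao, Nianxin Huang, and Haijiao Chen. Speech emotion recognition method, apparatus, and storage medium fusing multi-level acoustic information [P]. Hunan Province: CN116504275A, 2023-07-28. Invention patent, under substantive examination.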