Large Audio-Language Models and Applications - College of Computer Science, Hunan University



Date: 2025-07-09; Editor: Research Office

Speaker: Prof. Wenwu Wang, School of Computer Science and Electronic Engineering, University of Surrey, UK

Time: 3:30 PM, Friday, July 11, 2025

Venue: Conference Room 223, College of Information Science and Engineering

Abstract: Large Language Models (LLMs) are being explored in audio processing to interpret and generate meaningful patterns from complex sound data, such as speech, music, environmental noise, sound effects, and other non-verbal audio. Combined with acoustic models, LLMs offer great potential for addressing a variety of problems in audio processing, such as audio captioning, audio generation, source separation, and audio coding.

This talk will cover recent advances in using LLMs to address audio-related challenges. Topics include language-audio models for mapping and aligning audio with text, their applications across various audio tasks, the creation of language-audio datasets, and potential future directions in language-audio learning.

The talk will present recent work in this area, for example: AudioLDM, AudioLDM2, and WavJourney for audio generation and storytelling; AudioSep for audio source separation; ACTUAL for audio captioning; SemantiCodec for audio coding; WavCraft for content creation and editing; APT-LLMs for audio reasoning; and the datasets WavCaps, Sound-VECaps, and AudioSetCaps for training and evaluating large language-audio models.

Speaker bio: Dr. Wenwu Wang is a Professor in Signal Processing and Machine Learning and an Associate Head in External Engagement at the School of Computer Science and Electronic Engineering, University of Surrey, UK. He is also an AI Fellow at the Surrey Institute for People Centred Artificial Intelligence. His current research interests include signal processing, machine learning and perception, artificial intelligence, machine audition (listening), and statistical anomaly detection. He has (co-)authored over 400 papers in these areas. His work has received numerous recognitions, including the 2022 IEEE Signal Processing Society Young Author Best Paper Award, the ICAUS 2021 Best Paper Award, the DCASE 2020 and 2023 Judge’s Award, the DCASE 2019 and 2020 Reproducible System Award, and the LVA/ICA 2018 Best Student Paper Award.

He is a Senior Area Editor (2025-2027) for IEEE Open Journal of Signal Processing and an Associate Editor (2024-2026) for IEEE Transactions on Multimedia. He was previously a Senior Area Editor (2019-2023) and an Associate Editor (2014-2018) for IEEE Transactions on Signal Processing, and an Associate Editor (2020-2025) for IEEE/ACM Transactions on Audio, Speech, and Language Processing.

He was the elected Chair (2023-2024) of the IEEE Signal Processing Society (SPS) Machine Learning for Signal Processing Technical Committee and a Board Member (2023-2024) of the IEEE SPS Technical Directions Board. He is currently the elected Chair (2025-2027) of the EURASIP Technical Area Committee on Acoustic, Speech and Music Signal Processing, and an elected Member (2021-2026) of the IEEE SPS Signal Processing Theory and Methods Technical Committee.

He has served on the organising committees of major conferences, including INTERSPEECH 2022, IEEE ICASSP 2019 and 2024, IEEE MLSP 2013 and 2024, and IEEE SSP 2009. He is Technical Program Co-Chair of IEEE MLSP 2025. He has been an invited keynote or plenary speaker at more than 20 international conferences and workshops.

Host: Xionghu Zhong (钟雄虎)

Contact: Juan Luo (罗娟（学）)
