
Dissertation Summary
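Service robots, with their capabilities for autonomous perception, decision-making, and execution, are widely used in healthcare, home services, and other fields. In recent years, prompt-based scene understanding has attracted attention: it fuses multimodal sensor data with user interaction prompts (such as clicks, text, and speech) to improve the accuracy and robustness of intent parsing and target localization. However, existing work mostly focuses on a single prompt modality and a single task. It lacks support for the collaborative perception of "touch" and "voice" prompts and for multi-task optimization, which limits the practical use of service robots in complex home scenes. To address prompt semantic ambiguity, uneven perception of small targets, cross-modal information conflict, and insufficient dynamic modeling of multiple targets, this dissertation proposes four key methods for multimodal prompt perception: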
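(1) An image segmentation framework based on click-pixel aggregation and gradient adaptation is designed. It introduces a mask-adaptive Transformer and a focal loss, which accelerate model convergence and alleviate intra-class click ambiguity, improving segmentation accuracy and interaction robustness. Experimental results show that the proposed method is competitive in resolving interaction ambiguity.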
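(2) A referring image segmentation model based on text-pixel selection and sample-imbalance optimization is built. A multi-scale feature fusion module and a balanced loss function improve the network's ability to localize the referred target under pixel imbalance. Experiments on several public datasets verify the effectiveness and efficiency of the proposed method.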
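(3) An audio-text collaborative Transformer is proposed for video object segmentation. An expression alignment mechanism and a visual attention module improve the joint understanding of speech and text prompts. Experimental results show that the method adapts well across several datasets and performs especially well with joint prompts.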
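(4) A frequency-domain cross-attention mechanism is introduced to build an audio referring multi-object tracking framework, and the first dataset for this task is constructed. This enables audio-driven dynamic perception of multiple targets and improves referring multi-object tracking accuracy with audio and text prompts. Extensive experiments on referring multi-object datasets verify the effectiveness of the proposed fusion module and loss function.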
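Finally, these methods are integrated into a prototype prompt-based scene understanding system for service robots. The prototype verifies image segmentation, video segmentation, and multi-object tracking under "touch" and "voice" prompts, and lays a foundation for intelligent perception by service robots in real home environments.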
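Method (1) above relies on a focal loss to handle hard, ambiguous pixels. As a rough illustration only, the sketch below implements the standard binary focal loss, not the variant developed in this dissertation, whose exact form is not given here. The function name `focal_loss` and the default values of `gamma` and `alpha` are assumptions chosen for the example.

```python
import math

def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean standard binary focal loss (illustrative, not the thesis variant).

    probs   -- predicted foreground probabilities, one per pixel
    targets -- ground-truth labels, 1 for foreground and 0 for background
    gamma   -- focusing parameter; larger values down-weight easy pixels more
    alpha   -- class weight applied to foreground pixels
    """
    total = 0.0
    for p, y in zip(probs, targets):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability assigned to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # The (1 - p_t)^gamma factor shrinks the loss of well-classified pixels.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With `gamma=0` and `alpha=0.5` this reduces to half the ordinary binary cross-entropy, which makes the focusing term easy to compare against a plain baseline.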
Main Academic Achievements
[1] Lin Jiacheng, Xiao Zhiqiang, Wei Xiaohui, Duan Puhong, He Xuan, Dian Renwei, Li Zhiyong, and Li Shutao. Click-pixel Cognition Fusion Network with Balanced Cut for Interactive Image Segmentation. IEEE Transactions on Image Processing, 2024, 33: 177-190. (SCI, first author)
[2] Lin Jiacheng, Chen Jiajun, Yang Kailun, A Roitberg, Li Siyu, Li Zhiyong, and Li Shutao. AdaptiveClick: Click-aware Transformer with Adaptive Focal Loss for Interactive Image Segmentation. IEEE Transactions on Neural Networks and Learning Systems, 2025. (SCI, first author)
[3] Lin Jiacheng, Chen Jiajun, Peng Kunyu, He Xuan, R Stiefelhagen, Li Zhiyong, and Yang Kailun. EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving. IEEE Transactions on Intelligent Transportation Systems, 2024. (SCI, first author)
[4] Lin Jiacheng, Dai Xianwen, Nai Ke, Yuan Jin, Li Zhiyong, Zhang Xu, and Li Shutao. BRPPNet: Balanced Privacy Protection Network for Referring Personal Image Privacy Protection. Expert Systems with Applications, 2023. (SCI, first author)
[5] Lin Jiacheng, Li Yang, and Yang Guanci. FPGAN: Face De-identification Method with Generative Adversarial Networks for Social Robots. Neural Networks, 2021. (SCI, first author)
[6] Chen Jiajun, Lin Jiacheng, Zhong Guojin, Yao You, and Li Zhiyong. Multi-granularity Localization Transformer with Collaborative Understanding for Referring Multi-Object Tracking. IEEE Transactions on Instrumentation and Measurement, 2025. (SCI, co-corresponding author)
[7] Zeng Kang, Lin Jiacheng, and Li Zhiyong. Commonality-aware State Space Model with Adaptive Cut for Compositional Zero-Shot Learning. (Under review at ICCV 2025; co-first author, co-corresponding author)
[8] Lin Jiacheng, Chen Jiajun, Li Zhiyong, and Wang Yaonan. Referring Multi-Object Tracking Method Based on Semantic Concept Association (in Chinese). (Under review at Acta Automatica Sinica; first author)
[9] Li Siyu, Lin Jiacheng, Shi Hao, Zhang Jiaming, Wang Song, Yao You, Li Zhiyong, and Yang Kailun. DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction. IEEE Transactions on Intelligent Transportation Systems, 2024. (SCI, second author)
[10] Chen Jiajun, Lin Jiacheng, Xiao Zhiqiang, Fu Haolong, Nai Ke, Yang Kailun, and Li Zhiyong. Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation. Knowledge-Based Systems, 2025. (SCI, second author)
[11] Xiao Zhiqiang, Lin Jiacheng, Chen Jiajun, Fu Haolong, Li Yifan, Yuan Jin, and Li Zhiyong. Privacy Preservation Network with Global-aware Focal Loss for Interactive Personal Visual Privacy Preservation. Neurocomputing, 2024. (SCI, second author)
[12] Dai Xianwen, Lin Jiacheng, Nai Ke, Li Qingpeng, and Li Zhiyong. Multiscale Deep Feature Selection Fusion Network for Referring Image Segmentation. Multimedia Tools and Applications, 2024. (SCI, second author)