Research on multimedia technology 2014: deep learning and multimedia computing

Wu Fei; Zhu Wenwu; Yu Junqing

Published: 2015-11-02
DOI: 10.11834/jig.20151101
2015 | Volume 20 | Number 11

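Wu Fei¹, Zhu Wenwu², Yu Junqing³ (1. School of Computer Science, Zhejiang University, Hangzhou 310058; 2. School of Computer Science, Tsinghua University, Beijing 100084; 3. School of Computer Science, Huazhong University of Science and Technology, Wuhan 430074)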

Abstract

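Objective The rapid growth of massive data poses profound challenges for multimedia computing. Unlike the traditional media computing paradigm, which centers on hand-crafted construction, data-driven deep learning (feature learning) has become the mainstream approach in media computing. Method This survey focuses on recent advances in, and challenges facing, deep learning for media computing tasks, including retrieval ranking and annotation, multimodal retrieval and semantic understanding, and video analysis and understanding, and gives an outlook on future trends. Result In retrieval ranking and annotation, deep learning-based methods such as neural codes have achieved good results. In multimodal retrieval and semantic understanding, deep learning is used to bridge the "heterogeneity gap" between modalities and the "semantic gap" between low-level features and high-level semantics, and deep learning-based compositional semantic learning has become a research hotspot. In video analysis and understanding, deep neural networks are used to learn effective video representations and to recognize actions, also with good results. However, deep learning is a data-driven method that is susceptible to data noise and is not yet mature for online incremental learning; how to combine deep learning with crowdsourcing computing is a promising open question. Conclusion Building on an in-depth analysis of existing methods, this survey offers new ideas for addressing the heterogeneity gap and the semantic gap within the deep learning framework.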

Keywords

multimedia; massive data; retrieval and annotation; semantic understanding; deep learning

Researches on multimedia technology 2014—deep learning and multimedia computing

Wu Fei¹, Zhu Wenwu², Yu Junqing³(1.School of Computer Science & Technology, Zhejiang University, Hangzhou 310058, China;2.School of Computer Science & Technology, Tsinghua University, Beijing 100084, China;3.School of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan 430074, China)

Abstract

Objective The rapid growth of large-scale data poses a great challenge to multimedia computing. Unlike traditional multimedia computing, which relies heavily on hand-crafted features, deep learning (feature learning) has recently achieved notable advances in multimedia computing. Method This paper reviews deep learning for multimedia retrieval and annotation, multi-modal semantic understanding, and video analysis and understanding, all of which aim to overcome the heterogeneity gap and the semantic gap of multimedia computing within a deep learning framework. Result For multimedia retrieval and annotation, deep learning-based "neural codes" have been proposed and have proved effective. For multi-modal semantic understanding, deep learning is used to bridge the heterogeneity gap between different modalities and the semantic gap between low-level features and high-level semantics, and deep learning-based compositional semantic learning is attracting increasing attention. Moreover, deep learning has proved effective for video action recognition and for learning good video representations. However, data-driven deep learning is easily affected by noise in the data and is not yet mature for online incremental learning. How to combine deep learning with crowdsourcing computing remains a challenge and may be a future research direction. Conclusion We analyze existing deep learning methods and provide a new way to overcome the heterogeneity gap and the semantic gap within a deep learning framework.

Keywords

multimedia; large-scale data; retrieval and annotation; semantic understanding; deep learning
