Research on multimedia technology 2014: deep learning and media computing

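Wu Fei1, Zhu Wenwu2, Yu Junqing3 (1. College of Computer Science, Zhejiang University, Hangzhou 310058; 2. College of Computer Science, Tsinghua University, Beijing 100084; 3. College of Computer Science, Huazhong University of Science and Technology, Wuhan 430074)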

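Abstract
Objective The rapid growth of massive data poses profound challenges for multimedia computing. Unlike the traditional media computing paradigm centered on hand-crafted construction, data-driven deep learning (feature learning) has become the mainstream approach to media computing. Method This survey analyzes recent progress and open challenges of deep learning in media computing, covering retrieval, ranking and annotation; multi-modal retrieval and semantic understanding; and video analysis and understanding. It also looks ahead to future trends. Result In retrieval, ranking and annotation, deep learning-based methods such as neural codes have achieved good results. In multi-modal retrieval and semantic understanding, deep learning is used to bridge the "heterogeneity gap" between modalities and the "semantic gap" between low-level features and high-level semantics, and deep learning-based compositional semantic learning has become a research hotspot. In video analysis and understanding, deep neural networks are used to learn effective video representations and to recognize actions, again with good results. However, deep learning is a data-driven method that is sensitive to data noise and is not yet mature for online incremental learning; how to combine deep learning with crowdsourcing computing is a promising open question. Conclusion Based on an in-depth analysis of existing methods, this survey offers new ideas for addressing the heterogeneity gap and the semantic gap within the deep learning framework.
Keywords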
Research on multimedia technology 2014: deep learning and multimedia computing

Wu Fei1, Zhu Wenwu2, Yu Junqing3(1.School of Computer Science & Technology, Zhejiang University, Hangzhou 310058, China;2.School of Computer Science & Technology, Tsinghua University, Beijing 100084, China;3.School of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan 430074, China)

Abstract
Objective The rapid growth of large-scale data poses a great challenge to multimedia computing. Unlike traditional multimedia computing, which relies heavily on hand-crafted features, deep learning (feature learning) has recently achieved notable advances in this field. Method This paper reviews deep learning for multimedia retrieval and annotation, multi-modal semantic understanding, and video analysis and understanding, focusing on how these methods aim to overcome the heterogeneity gap and the semantic gap of multimedia computing within a deep learning framework. Result For multimedia retrieval and annotation, deep learning-based "neural codes" have been proposed and have proved effective. For multi-modal semantic understanding, deep learning is used to bridge the heterogeneity gap between modalities and the semantic gap between low-level features and high-level semantics, and deep learning-based compositional semantic learning is attracting increasing attention. Deep learning has also proved effective for video action recognition and for learning good video representations. However, data-driven deep learning is easily affected by noise in the data and is not yet mature for online incremental learning. How to combine deep learning with crowdsourcing computing remains a challenge and may be a future research direction. Conclusion We analyze existing deep learning methods and suggest new ways to overcome the heterogeneity gap and the semantic gap within a deep learning framework.
Keywords
