Multimedia technology research: 2017 - memory-driven media learning and creativity

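Wu Fei1, Han Yahong2, Liao Binbing1, Yu Junqing3 (1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027; 2. School of Computer Science and Technology, Tianjin University, Tianjin 300072; 3. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074)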

Abstract
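Objective Drawing on the working mechanisms of the brain is one of the important current directions in the development of artificial intelligence. Attention and memory play important roles in human cognition and understanding. Because "end-to-end" deep learning performs excellently in tasks such as recognition and classification, how to introduce attention mechanisms and external memory structures into deep learning models, so as to mine the information of interest in data and make effective use of external information, is a current hot topic in artificial intelligence research. Method Centered on mechanisms such as memory and attention, this paper introduces three representative works: the neural Turing machine, memory networks, and the differentiable neural computer. On this basis, it further introduces research that uses memory networks, namely memory-driven question answering, memory-driven movie question answering, and memory-driven creativity (text-to-image generation), and compares research progress on memory networks in China and abroad. Result The survey shows that: 1) introducing attention mechanisms and external memory structures into deep learning models is a current hot topic in artificial intelligence research; 2) research on memory networks is increasing. Such research is flourishing in China and abroad, and the number of papers published each year at the top conferences in machine learning and artificial intelligence is rising year by year; 3) research on memory networks is gaining momentum. Not only is the number of papers published each year growing, but the growth has not slowed: the count rose by 9 papers in 2015, 4 in 2016, 9 in 2017, and 14 in 2018; 4) memory-driven methods are highly general. Memory networks have been applied successfully to question answering, visual question answering, object detection, reinforcement learning, text-to-image generation, and other areas. Conclusion Data-driven machine learning methods have been applied successfully in natural language, multimedia, computer vision, speech, and other fields; combining data-driven learning with knowledge guidance will be one of the trends in the future development of artificial intelligence.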
Researches on multimedia technology: 2017 - memory-augmented media learning and creativity

Wu Fei1, Han Yahong2, Liao Binbing1, Yu Junqing3 (1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 2. School of Computer Science and Technology, Tianjin University, Tianjin 300072, China; 3. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China)

Abstract
Objective The human brain, which has evolved over a million years, is perhaps the most complex and sophisticated machine in the world. It carries all the intelligent activities of human beings, such as attention, learning, memory, intuition, insight, and decision making. The core of the human brain consists of billions of neurons and synapses. Each neuron receives information from other neurons through synapses and then passes the processed information on to further neurons, again through synapses. In this way, external sensory information (visual, auditory, olfactory, gustatory, and tactile) is analyzed and processed in the brain to form perception and cognition. Attention and memory play an important role in human cognition and understanding. Developing artificial intelligence based on the memory mechanism of the brain is an important research direction. Given that "end-to-end" deep learning performs excellently in tasks such as recognition and classification, introducing attention mechanisms and external memory into deep learning models, in order to mine the information of interest in data and make effective use of auxiliary information, is a popular research area in artificial intelligence. Method This report focuses on external memory and attention mechanisms inspired by the brain. Firstly, three representative works are introduced: the neural Turing machine, memory networks, and the differentiable neural computer. The neural Turing machine is analogous to a Turing machine or von Neumann architecture but is differentiable end-to-end, which allows it to be trained efficiently with gradient descent. Memory networks reason with inference components combined with a long-term memory component, and they learn how to use these components jointly. The differentiable neural computer consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer. 
Secondly, several specific applications are presented: a knowledge memory network for question answering, memory-driven movie question answering, and memory-driven creativity (text-to-image generation). For answering factoid questions, this report presents the temporality-enhanced knowledge memory network (TE-KMN), which encodes not only the content of questions and answers but also the temporal cues in a sequence of ordered sentences that gradually reveal the answer. Moreover, TE-KMN collaboratively uses external knowledge to better understand a given question. For answering questions about movies, the layered memory network (LMN) is introduced, which represents frame-level and clip-level movie content with a static word memory module and a dynamic subtitle memory module, respectively. To generate images from their corresponding narrative sentences, this report presents the visual-memory Creative Adversarial Network (vmCAN), which leverages an external visual knowledge memory in both multi-modal fusion and image synthesis. Finally, research progress on memory networks in China and abroad is compared. Result The survey shows that 1) introducing attention mechanisms and external memory structures into deep learning models is a current hotspot in artificial intelligence research. 2) Research on memory networks is growing in China and abroad, and the number of papers published each year at top machine learning and artificial intelligence conferences is rising. 3) Research on memory networks is gaining popularity. The number of papers published each year keeps rising and the growth has not slowed: the annual count increased by 9, 4, 9, and 14 papers in 2015, 2016, 2017, and 2018, respectively. 
4) Memory-driven methods are general: memory networks have been applied successfully to question answering, visual question answering, object detection, reinforcement learning, and text-to-image generation. Conclusion This report outlines future work on media learning and creativity. The next generation of artificial intelligence should learn continually from data, experience, and automatic reasoning. In the future, artificial intelligence should be integrated organically with human knowledge through methods such as attention mechanisms, memory networks, transfer learning, and reinforcement learning, so as to move from shallow computing to deep reasoning, from purely data-driven methods to data-driven methods combined with logic rules, and from vertical-domain intelligence to more general artificial intelligence.
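The neural Turing machine, memory networks, and the differentiable neural computer described in the abstract all read from an external memory through soft attention, which keeps the read differentiable. The following is a minimal sketch of such a content-based read, assuming cosine similarity sharpened by a key strength and normalized with a softmax. The names `read_memory`, `key`, and `beta` are illustrative and do not come from the report, and the sketch is not the exact formulation of any of the three models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small constant avoids division by zero


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(memory, key, beta=1.0):
    """Soft read: attention weights over memory rows, then a weighted sum.

    memory: list of rows (the external memory matrix, hypothetical layout)
    key:    query vector emitted by a controller network (assumed given here)
    beta:   key strength; larger values make the attention more peaked
    """
    weights = softmax([beta * cosine(row, key) for row in memory])
    width = len(memory[0])
    read = [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]
    return weights, read


# Toy memory with three slots; the key is closest to the first slot.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, read = read_memory(memory, key=[0.9, 0.1, 0.0], beta=10.0)
```

Because every step is a smooth function of the key and the memory contents, gradients can flow through the read, which is the property that allows such models to be trained end-to-end with gradient descent.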
