Frontier advances in the mathematical theory of multimedia signal processing
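Xiong Hongkai1, Dai Wenrui1, Lin Zhouchen2, Wu Fei3, Yu Junqing4, Shen Yangmei1, Xu Mingxing1 (1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240; 2. School of Electronic Engineering and Computer Science, Peking University, Beijing 100080; 3. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027; 4. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074)
Abstract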
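Deep learning models are widely used in multimedia signal processing and greatly improve performance by introducing nonlinearities, but their black-box structure cannot analytically yield the optimum or the optimality conditions. How to use classical signal processing theory to approximate deep learning models with transform/basis-projection models, and thereby analyze the optimization problem, has therefore become a frontier research problem. Starting from the fundamental theory of signal processing, this paper analyzes the mathematical models and theoretical bounds of current methods for high-dimensional, nonlinear, irregularly structured data, mainly covering structured sparse representation models, frame-theory-based deep network models, multilayer convolutional sparse coding models, and graph signal processing theory. It describes in detail the representation models and optimization methods based on group sparsity and hierarchical sparsity, analyzes how deep/multilayer network models are built on semi-discrete frames and convolutional sparse coding, extends these models to non-Euclidean spaces to form graph signal processing models, and compares domestic and international research progress on memory networks. Finally, it looks ahead to the development of theoretical models for multimedia signal processing and argues that graph signal processing, by analyzing the mathematical properties of spectral graph models and explaining the correlations they capture, provides a theoretical foundation for generalized models of large-scale irregular multimedia signal processing and is one of the important areas for future research.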
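The hierarchical (tree-structured) sparsity mentioned above can be illustrated through its proximal operator. The following is a minimal sketch, not the authors' method, assuming unweighted l2 group norms and groups that form a tree (any two groups are either disjoint or nested). Under that assumption the proximal operator of tau * sum_g ||v_g||_2 is known to be the composition of group soft-thresholding steps applied from the leaves to the root (Jenatton et al., JMLR 2011). The function name `hierarchical_prox` and the toy groups are hypothetical.

```python
import math


def hierarchical_prox(v, groups, tau):
    """Proximal operator of tau * sum_g ||v_g||_2 for tree-structured groups.

    Assumes `groups` form a tree: any two groups are disjoint or nested.
    Group soft thresholding is applied from the leaves up to the root,
    i.e., from descendant groups to their ancestors.
    """
    out = list(v)
    # A descendant is a strict subset of its ancestor, so sorting by size
    # guarantees every descendant is processed before its ancestors.
    for g in sorted(groups, key=len):
        norm = math.sqrt(sum(out[i] ** 2 for i in g))
        # Shrink the group's l2 norm by tau; zero it if the norm is <= tau.
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        for i in g:
            out[i] *= scale
    return out
```

With a root group {0, 1, 2, 3} and a child group {2, 3}, applying the operator to [3, 4, 0.3, 0.4] with tau = 1 first zeroes the child (its norm 0.5 is below the threshold) and then shrinks the root by a factor of 0.8, giving approximately [2.4, 3.2, 0, 0]. A descendant group can be switched off while its ancestor stays active, but not the reverse, which is the nesting pattern hierarchical sparsity is meant to enforce.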
Keywords
Advances in mathematical theory for multimedia signal processing
Xiong Hongkai1, Dai Wenrui1, Lin Zhouchen2, Wu Fei3, Yu Junqing4, Shen Yangmei1, Xu Mingxing1 (1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 2. School of Electronic Engineering and Computer Science, Peking University, Beijing 100080, China; 3. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 4. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China)
Abstract
Deep learning models have been widely used in multimedia signal processing. By introducing nonlinearities, they considerably improve the performance of signal processing tasks, but their black-box architectures provide no analytical formulation of the optimum or of the optimality conditions. In recent years, approximating deep learning models with models grounded in classical signal processing theory, that is, transform/basis-projection-based models, and analyzing the resulting optimization problems have become a frontier topic in multimedia research. Starting from the fundamental theories of signal processing, this paper presents and analyzes the mathematical models, and their theoretical bounds, of methods for high-dimensional, nonlinear, and irregularly structured data. The main topics are structured sparse representation, frame-based deep networks, multilayer convolutional sparse coding, and graph signal processing. We begin with sparse representation models based on group and hierarchical sparsity, together with their optimization methods, and then analyze the deep/multilayer networks built on semi-discrete frames and convolutional sparse coding. We also present graph signal processing models that extend classical signal processing to non-Euclidean geometry. Recent advances in these topics by domestic and international researchers are compared and discussed. Structured sparse representation introduces mixed norms to formulate a group Lasso problem that captures structural information; this problem can be solved with proximal methods or network flow optimization. Because structured sparse representation still relies on linear projection onto dictionary atoms, frame-based deep networks have been developed to extend the semi-discrete frames of multiscale geometric analysis. They inherit the multiscale and directional decompositions provided by frame theory and introduce nonlinearities to guarantee stability to deformations.
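To make the proximal route concrete: for non-overlapping groups, the group Lasso problem min_x 0.5 * ||y - Dx||_2^2 + lambda * sum_g ||x_g||_2 can be solved by proximal gradient descent (ISTA), whose proximal step is group-wise soft thresholding. The sketch below is illustrative only and is not the authors' implementation. The dictionary D, the grouping, lambda, the fixed iteration count, the step-size bound, and the function names are all assumptions. It uses plain Python so that it runs without extra packages.

```python
import math


def group_soft_threshold(v, groups, tau):
    """Proximal operator of tau * sum_g ||v_g||_2 for non-overlapping groups.

    Each group is shrunk toward zero by tau in l2 norm; a group whose norm
    is at most tau is set exactly to zero (group-level sparsity).
    """
    out = list(v)
    for g in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * v[i]
    return out


def group_lasso_ista(D, y, groups, lam, n_iter=500):
    """ISTA for min_x 0.5 * ||y - D x||^2 + lam * sum_g ||x_g||_2.

    D is a list of m rows of length n (assumed not all zeros). The step size
    1/L uses L = squared Frobenius norm of D, a simple upper bound on the
    Lipschitz constant ||D||_2^2 of the gradient.
    """
    m, n = len(D), len(D[0])
    L = sum(d ** 2 for row in D for d in row)
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual D x - y, then gradient D^T (D x - y).
        r = [sum(D[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(D[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step on the smooth term, then the group proximal step.
        z = [x[j] - grad[j] / L for j in range(n)]
        x = group_soft_threshold(z, groups, lam / L)
    return x
```

With D equal to the 4 x 4 identity, the minimizer is simply the group soft threshold of y. For y = [3, 4, 0.1, 0.1], groups {0, 1} and {2, 3}, and lambda = 1, the iterates approach [2.4, 3.2, 0, 0]: the weak second group is removed as a whole, which is the structured behavior the mixed l2,1 norm is meant to induce. The network flow optimization also named above targets more general group structures and is not covered by this sketch.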
Inspired by scattering networks, multilayer convolutional sparse coding introduces a combined regularization into sparse representation to model the max-pooling operation. Sparse representations of irregular multiscale structures can then be obtained recursively with trained overcomplete dictionaries. Graph signal processing extends conventional signal processing to non-Euclidean spaces. Graph neural networks, which integrate graph signal processing with convolutional neural networks, learn complex relational networks and are well suited to data-driven processing of large-scale, high-dimensional, irregular signals. Finally, this paper outlines future directions for the mathematical theories and models of multimedia signal processing. We argue that analyzing the mathematical properties of graph spectral models and their links to conventional signal processing provides a theoretical basis for developing a generalized graph signal processing model for large-scale irregular multimedia signals.
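As a minimal illustration of filtering on a non-Euclidean domain, the sketch below builds the combinatorial Laplacian L = D - A of a small undirected graph and applies a polynomial spectral filter h(L)x = sum_k theta_k L^k x. A filter of this form scales each graph Fourier coefficient by h(lambda) = sum_k theta_k lambda^k, where lambda runs over the Laplacian eigenvalues, without computing an eigendecomposition; this is the idea behind polynomial (e.g., Chebyshev) spectral graph convolution layers. It is not the specific model surveyed in the paper, and the graph, signal, coefficients theta, and function names are hypothetical.

```python
def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def matvec(M, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def polynomial_graph_filter(L, x, theta):
    """Apply h(L) x = sum_k theta[k] * L^k x in the vertex domain.

    Spectrally this multiplies each graph Fourier coefficient by
    h(lambda) = sum_k theta[k] * lambda^k, with no eigendecomposition.
    """
    out = [0.0] * len(x)
    power = list(x)  # holds L^k x, starting from k = 0
    for t in theta:
        out = [o + t * p for o, p in zip(out, power)]
        power = matvec(L, power)
    return out


def smoothness(L, x):
    """Graph total variation x^T L x = sum over edges of (x_i - x_j)^2."""
    return sum(a * b for a, b in zip(x, matvec(L, x)))
```

On the 4-node path graph 0-1-2-3, whose Laplacian eigenvalues lie in [0, 4), the coefficients theta = [1, -0.25] give h(lambda) = 1 - lambda/4, a low-pass response. Applied to the alternating signal [1, -1, 1, -1], the filter returns [0.5, 0, 0, -0.5] and lowers the graph total variation from 12 to 0.5. A constant signal, which is the zero-frequency component because L times the all-ones vector is zero, passes through unchanged.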
Keywords
structured sparse representation; frame-based deep convolutional network; multilayer convolutional sparse coding; graph signal processing; multimedia signal processing