Contour perception with a multi-path convolutional neural network
Abstract
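Objective Drawing on the global and local processing mechanisms of the visual information flow, a new contour perception method based on a multi-path convolutional neural network is proposed. Method Gaussian pyramid scale decomposition is used to obtain a low-resolution sub-image that represents the overall contour in the visual information. A two-dimensional Gaussian derivative function simulates the direction selectivity of the classical receptive field and yields a boundary response sub-image that describes detail features. A multi-path convolutional neural network is then constructed: a sub-network with sparse coding characteristics (Sparse-Net) detects the overall contour quickly, and a sub-network with redundancy-enhanced coding characteristics (Redundancy-Net) extracts local detail features. The responses of the paths are fused and encoded so that global perception and local detection of the contour response are combined, giving a refined contour perception result. Result Experiments used the BSDS500 data set provided by the computer vision group at the University of California, Berkeley. On a GTX1080Ti, Sparse-Net detected overall contours at 42 images/s, 35 times the 1.2 images/s of the HFL method. After fusion of Sparse-Net and Redundancy-Net, the optimal dataset scale (ODS), optimal image scale (OIS), and average precision (AP) were 0.806, 0.824, and 0.846, respectively, better than the HED (holistically-nested edge detection) and RCF (richer convolution features for edge detection) methods. The results show that the proposed method effectively highlights the main contour and suppresses textured background. Conclusion Applying a multi-path convolutional neural network to contour perception will help further the understanding of visual perception mechanisms and is important for reducing the black-box nature of convolutional neural networks.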
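The two preprocessing steps named in the Method (one Gaussian pyramid level and an oriented Gaussian derivative response) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names, the sigma and radius values, and the edge-replication padding are all introduced here for illustration.

```python
import math

def _clamp(v, size):
    """Clamp an index into [0, size - 1] (edge replication; an assumption)."""
    return max(0, min(size - 1, v))

def gaussian_kernel_1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter along rows, then along columns."""
    k = gaussian_kernel_1d(sigma, radius)
    h, w = len(image), len(image[0])
    offsets = range(-radius, radius + 1)
    rows = [[sum(k[j + radius] * image[y][_clamp(x + j, w)] for j in offsets)
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + radius] * rows[_clamp(y + j, h)][x] for j in offsets)
             for x in range(w)] for y in range(h)]

def pyramid_down(image, sigma=1.0):
    """One pyramid level: blur, then keep every second row and column."""
    smooth = blur(image, sigma)
    return [row[::2] for row in smooth[::2]]

def oriented_response(image, theta, sigma=1.0, radius=3):
    """Correlate the image with a first Gaussian derivative along theta.

    The sign of the result depends on edge polarity; use abs() for strength.
    """
    h, w = len(image), len(image[0])
    kernel = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            u = dx * math.cos(theta) + dy * math.sin(theta)  # offset along theta
            g = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            kernel[(dy, dx)] = -u / (sigma * sigma) * g
    return [[sum(kv * image[_clamp(y + dy, h)][_clamp(x + dx, w)]
                 for (dy, dx), kv in kernel.items())
             for x in range(w)] for y in range(h)]

# Toy 8x8 image with a vertical step edge between columns 3 and 4.
image = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
low_res = pyramid_down(image)                        # 4x4 global sub-image
across = oriented_response(image, theta=0.0)         # derivative across the edge
along = oriented_response(image, theta=math.pi / 2)  # derivative along the edge
```

On this toy image the derivative taken across the edge responds strongly at the step, while the derivative taken along the edge is essentially zero, which is the direction selectivity the abstract refers to.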
Keywords
Contour perception based on multi-path convolution neural network
Tan Mingming, Fan Yingle, Wu Wei, She Qingshan, Gan Haitao (Laboratory of Pattern Recognition and Image Processing, Hangzhou Dianzi University, Hangzhou 310018, China)
Abstract
Objective This study introduces the global and local processing mechanisms of the visual information flow, builds a visual information encoding and decoding model from the correlation between visual neural coding and contour perception, and proposes a contour perception method based on a multi-path convolutional neural network. Method Gaussian pyramid scale decomposition was used to obtain low-resolution sub-images that characterize the overall contour of the visual information. A two-dimensional Gaussian derivative was used to simulate the direction selectivity of classical receptive fields and to obtain boundary response sub-images describing details. A multi-path convolutional neural network was constructed, in which a sparse coding sub-network (Sparse-Net) detected the overall contour quickly and a redundancy-enhanced coding sub-network (Redundancy-Net) extracted local details. The responses of the multi-path convolutional neural network were fused and coded to integrate global perception and local detection of the contour response and to obtain a refined contour perception result. Result With the BSDS500 image database provided by the Berkeley Computer Vision Group as the experimental object, the detection speed of Sparse-Net on a GTX1080Ti reached 42 frames/s, 35 times that of the HFL method (1.2 frames/s). After fusion of Sparse-Net and Redundancy-Net, the optimal dataset scale (ODS), optimal image scale (OIS), and average precision (AP) were 0.806, 0.824, and 0.846, respectively, better than those of the holistically-nested edge detection (HED) and richer convolution features for edge detection (RCF) methods. These two methods are based on analysis of the side-output feature maps, progressive encoding and decoding, and feature fusion from the shallow to the deep layers of the network, learning fine contour features and achieving end-to-end contour detection.
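For readers unfamiliar with the two F-measure scores reported above, the difference between ODS and OIS can be sketched as follows. This is a simplified illustration, not the BSDS500 benchmark code: it assumes that boundary pixels have already been matched to ground truth, it uses a single matched count for both precision and recall, and the function names and the toy counts are hypothetical.

```python
def f_measure(tp, n_pred, n_gt):
    """F-measure from matched, predicted, and ground-truth boundary pixel counts."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def ods_ois(counts):
    """counts[i][t] = (tp, n_pred, n_gt) for image i at threshold index t."""
    n_thresholds = len(counts[0])
    # ODS: a single threshold is shared by every image in the data set.
    ods = max(
        f_measure(*[sum(img[t][k] for img in counts) for k in range(3)])
        for t in range(n_thresholds)
    )
    # OIS: each image uses its own best threshold, then counts are pooled.
    best = [max(img, key=lambda c: f_measure(*c)) for img in counts]
    ois = f_measure(*[sum(c[k] for c in best) for k in range(3)])
    return ods, ois

# Hypothetical counts for two images at two thresholds.
toy_counts = [
    [(8, 10, 10), (5, 5, 10)],   # image 0 is best at threshold 0
    [(4, 10, 10), (6, 8, 10)],   # image 1 is best at threshold 1
]
ods, ois = ods_ois(toy_counts)
```

Because OIS lets every image choose its own threshold, it is usually at least as high as ODS, which matches the ordering of the reported values (0.824 versus 0.806).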
The proposed method not only highlights the main contour and suppresses the textured background effectively but also improves the efficiency of contour detection. Conclusion Some aspects of the convolutional neural network can be explained by visual mechanisms: the convolution operation corresponds to the topological mapping of retinal visual information, and the pooling operation is related to complex cells and simple cells in the visual pathway. On the whole, however, the convolutional neural network is still a black-box model that depends heavily on massive samples. The actual visual pathway is not simply a serial transmission of information but a fusion, in the visual cortex, of the local and global characteristics of a multi-channel visual information flow. Accordingly, a Gaussian pyramid decomposition model was constructed to sparsely encode the spatial scale of visual information and to obtain low-resolution sub-images representing the overall characteristics. Lateral inhibition by non-classical receptive fields in the lateral geniculate region was used for isotropic suppression of background information. Considering the information-processing ability of the primary visual cortex in the optic radiation region, a classical receptive field with direction selectivity was set up, and a two-dimensional Gaussian derivative model was constructed to process the visual information by direction selection, giving the boundary response sub-images that represent local features. A multi-path convolutional neural network was then constructed, considering the layer-by-layer perception of the local details of external excitation and of overall information in the primary and higher visual cortex. In the network, the fast detection path consisted of the sub-network Sparse-Net, which contains pooling units for sparse coding of the overall image contour, and the detail detection path consisted of the sub-network Redundancy-Net, which contains dilated convolution units for redundancy-enhanced coding of local image details. Finally, the feedback and fusion applied by the higher visual cortex to the visual information flow was simulated: the responses of the multi-path convolutional neural network were fused and coded to combine overall perception and local detection of the contour response, and the refined contour perception result was obtained. Contour perception based on a multi-path convolutional neural network helps further the understanding of visual perception mechanisms and is of great significance for weakening the black-box characteristics of the convolutional neural network. Taking the perception of subject contours in natural scene images with complex textured backgrounds as an example, simulating the neural coding mechanism by which multiple paths cooperate in the primary visual pathway will help clarify the intrinsic mechanism of the visual system and its specific application in visual perception. This work provides a new idea for subsequent image understanding and analysis based on visual mechanisms.
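The contrast between the two paths can be illustrated in one dimension. The sketch below is a toy under stated assumptions, not the Sparse-Net or Redundancy-Net architecture: pooling shortens the signal (a sparser code), while a dilated filter keeps the full resolution and widens the receptive field. The nearest-neighbour up-sampling and the fixed weighted sum used for fusion are hypothetical stand-ins for the fusion coding described in the abstract, and all function names and parameter values are introduced here.

```python
def max_pool_1d(signal, size=2):
    """Non-overlapping max pooling; the output is shorter than the input."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]

def dilated_conv_1d(signal, kernel, dilation=1):
    """Same-length dilated correlation with zero padding (odd kernel length)."""
    half = (len(kernel) // 2) * dilation
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [sum(w * padded[i + j * dilation] for j, w in enumerate(kernel))
            for i in range(len(signal))]

def upsample_nearest(signal, factor=2):
    """Repeat each sample so the coarse path matches the detail path length."""
    return [v for v in signal for _ in range(factor)]

def fuse(global_path, local_path, alpha=0.5):
    """Hypothetical fusion: a fixed weighted sum of the two path responses."""
    return [alpha * g + (1.0 - alpha) * l
            for g, l in zip(global_path, local_path)]

signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
coarse = max_pool_1d(signal)                                   # length 4
detail = dilated_conv_1d(signal, [1.0, 1.0, 1.0], dilation=2)  # length 8
fused = fuse(upsample_nearest(coarse), detail)
```

With a three-tap kernel and a dilation of 2, each output sample covers five input positions, so the single impulse in `signal` appears at three output positions spaced two samples apart, with no loss of resolution.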
Keywords
contour detection; dilated convolution; convolution neural network (CNN); visual perception; feature fusion