Image compression with multi-scale salient region detection
Abstract
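Objective Existing image compression methods that rely on salient regions cannot effectively perceive image content containing multiple objects, which degrades the quality of the reconstructed image. To address this problem, an image compression method based on multi-scale deep feature salient region detection is proposed. Method Improved convolutional neural networks (CNNs) are used to detect deep features of the image at multiple scales, yielding salient regions at different scales. The size of the salient region map is then adaptively adjusted according to the size of the input image, and a Gaussian function is introduced to filter the salient regions, yielding a multi-scale fused salient region map. Finally, in combination with coding-based compression techniques, the salient regions are compressed near-losslessly and the non-salient regions are compressed with lossy coding, completing image compression and reconstruction. Result Compared with JPEG compression at a bit rate of approximately 0.39 bit/pixel, the proposed method improves the peak signal-to-noise ratio (PSNR) by 2.23 dB and the structural similarity (SSIM) by 0.024 on the Kodak PhotoCD dataset; on the Pascal VOC dataset, PSNR and SSIM improve by 1.63 dB and 0.039, respectively. In addition, when the proposed multi-scale feature salient region method is combined with set partitioning in hierarchical trees (SPIHT) and run-length encoding (RLE) on the Kodak dataset, PSNR improves by 1.85 dB and 1.98 dB and SSIM by 0.006 and 0.023, respectively. Conclusion The proposed image compression method using multi-scale deep features achieves better results than traditional coding techniques. By perceiving image content effectively, the method reduces the loss of image content during compression and thereby improves the quality of the reconstructed image.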
Image compression method based on multi-scale saliency region detection
Qu Haicheng1, Tian Xiaorong1, Liu Lamei1, Shi Cuiping2
(1. School of Software, Liaoning Technical University, Huludao 125105, China; 2. College of Communication and Electronic Engineering, Qiqihar University, Qiqihar 161000, China)
Abstract
Objective Image compression, which aims to remove redundant information from an image, is an active topic in image processing and computer vision. In recent years, image compression based on deep learning has attracted considerable attention. Image compression using convolutional neural networks (CNNs) can be roughly divided into two categories. The first is end-to-end compression with a convolutional network. The second combines CNNs with traditional image compression: CNNs perceive the image content and produce salient regions, high-quality coding is applied to the salient regions, and lower-quality coding is applied to the non-salient regions to improve the visual quality of the reconstructed image. In the second category, however, the quality of the reconstructed image often suffers because the image content is not perceived effectively. Several existing salient region detection methods disregard the influence of scale on content detection. They also do not account for the difference in size between the input image and the output saliency map, which limits the model's perception domain over the image. Consequently, some of the salient objects in the original image are not perceived, which degrades the reconstructed image in the subsequent compression. To address this problem, this study proposes an image compression method based on multi-scale deep feature salient region (MS-DFSR) detection.
Method Improved CNNs are used to detect deep features of the image at multiple scales. Following the scale space concept, the image is fed into the MS-DFSR model through a pyramid structure, which generates several saliency maps and completes the detection of multi-scale salient regions. If the chosen scale is too large, the resulting salient area becomes too diffuse and loses its meaning as a salient region. Therefore, two scales are used in this work: the standard output scale of the network and a larger scale. The larger scale is used to detect multiple salient objects in an image and thus to perceive the image content effectively. For deep feature salient region detection, the fully connected layer and the fourth max pooling layer are replaced with a global average pooling layer and an average pooling layer, respectively, to retain as much spatial location information about multiple salient objects as possible. MS-DFSR then yields salient regions at the different scales. To enlarge the perception domain over the image and perceive the image content effectively, the size of the salient region map is adaptively adjusted according to the size of the input image, which accounts for the difference between the input image size and the output saliency map size. Meanwhile, a Gaussian function is introduced to filter the salient regions, which retains the original image content information and yields a multi-scale fused saliency region map. Finally, image compression and reconstruction are completed by combining the multi-scale saliency region map with image coding methods. To protect the salient content and improve the quality of the reconstructed image, the salient regions are compressed near-losslessly, whereas lossy compression methods such as Joint Photographic Experts Group (JPEG) and set partitioning in hierarchical trees (SPIHT) are applied to the non-salient regions.
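The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the interpolation method, the fusion rule, the Gaussian width, or the coding parameters, so nearest-neighbour resizing, averaging across scales, `sigma`, the 0.5 threshold, and the two quality levels below are all assumptions chosen for illustration. The function names are hypothetical, and the saliency maps are taken as given (plain lists of rows) rather than produced by a CNN.

```python
import math

def resize_nearest(sal, out_h, out_w):
    """Resize a 2-D map (list of rows) to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(sal), len(sal[0])
    return [[sal[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(m, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(m[0])
    return [[sum(k * row[min(width - 1, max(0, x + i - radius))]
                 for i, k in enumerate(kernel))
             for x in range(width)]
            for row in m]

def transpose(m):
    """Swap rows and columns of a 2-D list."""
    return [list(col) for col in zip(*m)]

def gaussian_blur(m, sigma):
    """Separable Gaussian filter: blur along rows, then along columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(m, kernel)), kernel))

def fuse_saliency(maps, out_h, out_w, sigma=1.0):
    """Resize each scale's map to the image size, smooth it, then average.

    Averaging is an assumed fusion rule; the result is rescaled to [0, 1].
    """
    resized = [gaussian_blur(resize_nearest(m, out_h, out_w), sigma) for m in maps]
    fused = [[sum(m[y][x] for m in resized) / len(resized) for x in range(out_w)]
             for y in range(out_h)]
    peak = max(max(row) for row in fused) or 1.0  # avoid division by zero
    return [[v / peak for v in row] for row in fused]

def quality_map(fused, threshold=0.5, q_salient=95, q_background=30):
    """Assign a high coding quality to salient pixels and a low one elsewhere.

    The threshold and quality levels are illustrative assumptions.
    """
    return [[q_salient if v >= threshold else q_background for v in row]
            for row in fused]

# Two toy saliency maps at different scales, each highlighting a different object.
coarse = [[0.0, 1.0], [0.0, 0.0]]
fine = [[0.0] * 4 for _ in range(4)]
fine[3][0] = 1.0
qualities = quality_map(fuse_saliency([coarse, fine], 8, 8, sigma=1.0))
```

In this toy case the coarse map marks an object in the top-right corner and the fine map marks one in the bottom-left corner; after fusion both regions receive the high quality level, which is the behaviour the multi-scale design is meant to provide. A codec would then consume `qualities` to decide where to spend bits.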
Result We compare our model with three traditional compression methods: JPEG, SPIHT, and run-length encoding (RLE). The experiments use two public datasets, Kodak PhotoCD and Pascal VOC. The quantitative evaluation metrics (higher is better) are the peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM), and a modified PSNR metric based on the human visual system (PSNR-HVS). Experimental results show that our model outperforms the traditional methods on both datasets. The saliency maps show that our model covers multiple salient objects and perceives the image content more effectively. We also compare the image compression method based on MS-DFSR detection with one based on single-scale deep feature salient region (SS-DFSR) detection, which verifies the validity of the MS-DFSR detection model. The comparative experiments demonstrate that the proposed method improves compression quality: the images it reconstructs are of higher quality than those reconstructed with JPEG. At a bit rate of approximately 0.39 bpp on the Kodak PhotoCD dataset, PSNR improves by 2.23 dB, SSIM by 0.024, and PSNR-HVS by 2.07. On the Pascal VOC dataset, PSNR, SSIM, and PSNR-HVS increase by 1.63 dB, 0.039, and 1.57, respectively. When MS-DFSR is combined with SPIHT and RLE on the Kodak PhotoCD dataset, PSNR increases by 1.85 dB and 1.98 dB, SSIM by 0.006 and 0.023, and PSNR-HVS by 1.90 and 1.88, respectively.
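For readers less familiar with the PSNR figures reported above, the following sketch computes PSNR from its standard definition, 10·log10(peak² / MSE). It assumes 8-bit grayscale images stored as lists of rows with a peak value of 255; the function names are hypothetical, and this is not the evaluation code used in the paper.

```python
import math

def mse(reference, reconstructed):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for a, b in zip(ref_row, rec_row):
            total += (a - b) ** 2
            count += 1
    return total / count

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(reference, reconstructed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)

def mse_ratio_for_gain(gain_db):
    """MSE ratio (new / old) implied by a PSNR gain of gain_db on a single image."""
    return 10.0 ** (-gain_db / 10.0)

# Toy example: every pixel of the reconstruction is off by 5 grey levels.
original = [[0, 255], [128, 64]]
decoded = [[5, 250], [133, 59]]
value = psnr(original, decoded)  # about 34.15 dB, since MSE = 25
```

As a rough reading aid, `mse_ratio_for_gain(2.23)` is about 0.60, so on a single image a 2.23 dB gain corresponds to roughly 40% lower mean squared error. Because the reported gains are presumably averaged over a dataset, this conversion is only indicative.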
Conclusion The proposed image compression method using multi-scale deep features performs better than traditional image compression methods because it reduces image content loss by perceiving the image content more effectively during compression. Consequently, the quality of the reconstructed image can be improved significantly.
Keywords
image compression; multi-scale deep features; saliency region detection; convolutional neural networks (CNNs); peak signal-to-noise ratio (PSNR); structural similarity (SSIM)