Dynamic side window filtering kernel prediction network for scale adaptive texture filtering

刘春晓; 高铭志; 楼菊青; 王勋

Published: 2025-03-14

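刘春晓¹, 高铭志², 楼菊青³, 王勋² (1. Zhejiang Gongshang University; 2. School of Computer Science and Technology, Zhejiang Gongshang University; 3. School of Environmental Science and Engineering, Zhejiang Gongshang University)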

Abstract

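Objective Existing texture filtering algorithms struggle to smooth large-scale or strong-gradient textures while preserving small-scale or weak-gradient structures, and there is a domain shift between the synthetic images currently used for training and real scene images. To achieve scale-adaptive texture filtering, a dynamic side window filtering kernel prediction network is proposed. To improve the generalization of the algorithm on real scene images, a hybrid synthetic texture filtering dataset is built. Method The algorithm has two stages. First, an encoder-decoder module based on Transformer and convolution generates a structure area segmentation map and an over-smoothed image. Then a filter kernel prediction module, designed on the basis of guided filtering and side window filtering, predicts the sampling points and weight values of 8 groups of dynamic side window filter kernels under the guidance of structure and texture information. Each group samples and filters the over-smoothed image separately, and the outputs are linearly fused to obtain the final filtering result. The dataset mixes two parts: a filling sub-dataset, in which multiple textures are filled into segmented regions, and a fusion sub-dataset, in which a structure background is fused with a single texture. Result The proposed algorithm is compared with 18 algorithms on 6 datasets. Compared with the second-best algorithm, it improves both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM); visually, it enhances thin and narrow structures and reduces residual texture and uneven color. The proposed dataset lowers the probability that an algorithm confuses real structures with textures. Conclusion The proposed algorithm combines the global dependency capture of Transformer with the low-level feature extraction of convolution. Relying on the strong structure-preserving ability of side window filter kernels and on the design of dynamic sampling points and weight values, it uses the guidance information to balance the smoothing of large-scale textures against the reconstruction of small-scale structures. The proposed dataset simulates the structure and texture patterns of real scene images, which improves how well an algorithm recognizes and filters real structures and textures.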
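The second stage described above builds on side window filtering, in which each pixel is filtered by windows that touch it only along one side or corner, so that the averaging support does not have to straddle an edge. The following is a minimal sketch of that idea with a linear fusion step, not the paper's network: it uses eight fixed box-shaped side windows in place of predicted sampling points and weight values, and the softmax fusion with temperature `tau` is a hand-crafted stand-in for the learned fusion. All names and parameters (`side_windows`, `window_mean`, `fused_side_window_filter`, `r`, `tau`) are illustrative assumptions.

```python
import math


def side_windows(r):
    """Return (dy range, dx range) pairs for the 8 side windows of radius r."""
    full, neg, pos = range(-r, r + 1), range(-r, 1), range(0, r + 1)
    return [
        (full, neg), (full, pos),  # left, right
        (neg, full), (pos, full),  # up, down
        (neg, neg), (neg, pos),    # north-west, north-east
        (pos, neg), (pos, pos),    # south-west, south-east
    ]


def window_mean(img, y, x, dys, dxs):
    """Mean of img over one side window anchored at (y, x), clamping at borders."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for dy in dys:
        for dx in dxs:
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            total += img[yy][xx]
            count += 1
    return total / count


def fused_side_window_filter(img, r=1, tau=0.01):
    """Filter a grayscale image (nested lists of floats) with 8 side windows.

    Assumption: the fusion weights are a softmax over how closely each
    window mean matches the centre pixel. A learned model would predict
    these weights instead; this rule is only a placeholder.
    """
    windows = side_windows(r)
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            means = [window_mean(img, y, x, dys, dxs) for dys, dxs in windows]
            scores = [-((m - img[y][x]) ** 2) / tau for m in means]
            top = max(scores)  # subtract the maximum for numerical stability
            exps = [math.exp(s - top) for s in scores]
            norm = sum(exps)
            # Linear fusion of the 8 candidate outputs.
            row.append(sum(e / norm * m for e, m in zip(exps, means)))
        out.append(row)
    return out


# A vertical step edge: three dark columns followed by three bright columns.
edge = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
smoothed = fused_side_window_filter(edge, r=1, tau=0.01)
```

On this step edge, pixels next to the boundary draw almost all of their weight from the windows lying on their own side, so the edge stays sharp where a full box filter would blur it. Lowering `tau` moves the fusion toward picking the single best-matching window.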

Keywords

texture filtering, scale adaptive, dataset generation, dynamic side window, guided filtering, Transformer

Dynamic side window filtering kernel prediction network for scale adaptive texture filtering

Abstract

Objective Texture filtering is a fundamental task in computer vision that aims to preserve significant structures while smoothing out irrelevant textures in an image. It is useful for many image analysis and processing tasks, such as contour detection, image segmentation, image abstraction, detail enhancement, tone mapping, and image denoising. Existing texture filtering algorithms fall mainly into two groups: traditional filtering algorithms, based on either local or global optimization, and neural network filtering algorithms. Traditional algorithms struggle to exploit high-level image semantics and rely heavily on manual parameter tuning, whereas neural network algorithms achieve better results thanks to the powerful feature representation ability of convolutional neural networks. However, the inference parameters of a neural network are fixed after training, while an image contains textures and structures of different scales that require different filtering strengths. A network with fixed parameters therefore finds it difficult to adaptively smooth large-scale and strong-gradient textures while preserving small-scale and weak-gradient structures. In addition, high-quality texture filtering datasets for training neural networks are lacking. Common practice is either to use the filtering results of existing algorithms as imperfect labels or to synthesize structure-texture images that exhibit a domain shift from real scene images; both approaches are flawed and lead to poor generalization on real scene images. To address the lack of adaptive filtering capability, a two-stage dynamic side window filtering kernel prediction network (SWNet) is proposed. It employs reverse thinking and a divide-and-conquer strategy: it smooths first to suppress large-scale textures and then reconstructs, distinguishing structural pixels from texture pixels during reconstruction to restore clear structures. To address the domain shift in the dataset, an improved structure-texture image synthesis method is proposed and a hybrid synthetic texture filtering dataset (HSTF) is built.

Method We first design a hierarchical U-Net encoder-decoder module based on Transformer and convolutional neural networks (CNNs) to generate a structure area segmentation map and an over-smoothed image. Then, building on guided filtering, side window filtering, and dynamic convolution, we design a filter kernel prediction module that predicts the sampling points and weight values of eight groups of side window filter kernels under the guidance of the structure area segmentation map, the over-smoothed image, and the original input image. Each group of kernels samples and filters the over-smoothed image separately. Finally, linear fusion yields the final filtering result. The HSTF dataset consists of two parts that are mixed for network training: a filling sub-dataset, which fills different texture maps into segmented areas, and a fusion sub-dataset, which fuses a structure background map with a single texture map.

Result SWNet is compared with 18 algorithms, both classical and recent, on 6 datasets. On the test set of the HSTF dataset, SWNet improves PSNR by 2.581 dB and SSIM by 0.033 over the second-best algorithm. On the BEPS dataset, it improves PSNR by 4.616 dB and SSIM by 0.090 over the second-best algorithm. On the NKS dataset, the improvements are 1.942 dB in PSNR and 0.012 in SSIM, and on the SPS dataset they are 2.701 dB in PSNR and 0.034 in SSIM. Visual comparison experiments on the RTV, BEPS, and ImageNet2012 datasets verify the effectiveness of SWNet from the perspective of human perception: compared with other algorithms, its filtering results preserve finer structures and yield smoother texture areas, with no residual texture and close to flat pure color. We also retrain some classical algorithms on the original dataset, the filling sub-dataset, the fusion sub-dataset, and the HSTF dataset. The comparison shows that the HSTF dataset effectively improves the structure preservation and texture smoothing of these algorithms on real scene images.

Conclusion SWNet combines the global dependency capture of Transformer with the low-level feature extraction of CNN, and uses the over-smoothed image and the structure area segmentation map to guide both the smoothing process and the preservation process. With the strong structure-preserving ability of side window filter kernels and the design of dynamic sampling points and weight values, it effectively smooths textures of various scales while preserving significant and narrow structures, thereby achieving scale-adaptive filtering. In addition, the HSTF dataset takes into account the structure and texture patterns of real scene images, which effectively improves the generalization performance of the algorithm on real scene images.
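The PSNR gains reported above are in decibels, so it may help to recall how the metric is computed. The sketch below uses the standard definition, PSNR = 10 log10(peak² / MSE), on grayscale images stored as nested lists. The function names `psnr` and `mse_ratio_for_gain` and the `peak` parameter are illustrative assumptions and are not taken from the paper's evaluation code.

```python
import math


def psnr(reference, filtered, peak=1.0):
    """PSNR in dB between two equally sized grayscale images (nested lists)."""
    squared_errors = [
        (a - b) ** 2
        for ref_row, out_row in zip(reference, filtered)
        for a, b in zip(ref_row, out_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of gain_db on one image pair."""
    return 10.0 ** (-gain_db / 10.0)


# A uniform error of 0.1 on a [0, 1] image gives an MSE of about 0.01.
reference = [[0.5, 0.5], [0.5, 0.5]]
filtered = [[0.6, 0.6], [0.6, 0.6]]
score = psnr(reference, filtered)
```

For a single image pair, `mse_ratio_for_gain` converts a dB gain into the corresponding reduction in mean squared error. Scores averaged over a whole dataset do not translate exactly this way, so the conversion is only a rough guide when reading the reported gains.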

Keywords

texture filtering, scale adaptive, dataset generation, dynamic side window, guided filtering, transformer
