轻量化航拍图像电力线语义分割
摘 要
目的 电力线在航拍图像中的提取是智能巡检的重要研究内容,基于深度学习的图像语义分割模型在此领域的应用已有较好的效果。然而,图像训练集容量较小和预训练模型计算量过大是两个待解决的问题。方法 首先使用生成对抗网络模型结合圆锥曲线和色相扰动进行数据集增强,对3种不同的损失函数以及两个色彩空间所训练的U-Net网络模型进行对比,给出最优组合。然后提出了一种联合一阶泰勒展开和输出通道2范数的显著度指标,对上述完整模型使用改进的通道级参数正则化方法来稀疏化模型权重,并对稀疏模型进行网络剪枝和重训练以降低模型的计算量。最后,在判决阈值的选择上,使用自适应阈值替代固定值法以增强对亮度变化的鲁棒性。结果 实验结果表明,提出的灰度输入轻量化模型IoU(intersection-over-union)指标为0.459,但其参数量和计算量相当于IoU指标为0.573的可见光完整模型的0.03%和3.05%,且自适应阈值法在合适的光照变化范围内能达到该条件下最优阈值的相似结果。结论 验证了不同数据集增强方法、损失函数、输入色彩空间组合对模型收敛性能、训练速度和过拟合程度的影响,给出了各色彩空间内的最佳组合。同时,采用网络剪枝的方式极大降低了电力线语义分割网络的参数量和运算量,对网络模型的落地部署有积极的作用。
关键词
Research on lightweight neural network of aerial powerline image segmentation
Xu Gang, Li Guo(College of Electrical and Electronic Engineering, North China Electric Power University, Beijing 102206, China) Abstract
Objective Powerline semantic segmentation of aerial images, as an important content of powerline intelligent inspection research, has received widespread attention. Recently, several deep learning-based methods have been proposed in this field and achieved high accuracy. However, two major problems still need to be solved before deep learning models can be applied in practice. First, the sample size of publicly available datasets is small. Unlike target objects in other semantic segmentation tasks (e.g., cars and buildings), powerlines have few textures and structural features, which make powerlines easy to be misidentified, especially in scenes that are not covered by the training set. Therefore, constructing a training set that contains many different background samples is crucial to improve the generalization capability of the model. The second problem is the conflict between the amount of model computation and the limited terminal computing resources. Previous work has demonstrated that an improved U-Net model can segment powerlines from aerial images with satisfactory accuracy. However, the model is computationally expensive for many resource-constrained inference terminals (e.g., unmanned aerial vehicles(UAVs)). Method In this study, the background images in the training set were learned using a generative adversarial network (GAN) to generate a series of pseudo-backgrounds, and curved powerlines were drawn on the generated images by utilizing conic curves. In detail, a multi-scale-based automatic growth model called progressive growing of GANs (PGGAN) was adopted to learn the mapping of a random noise vector to the background images in the training set. Then, its generator was used to generate serials of the background images. These background images and the curved powerlines generated by the conic curves were fused in the alpha channel. We created three training sets. The first one consisted of only 2 000 real background pictures, and the second was a mixture of 10 000 real and generated background images. The third training dataset was composed of 200 generated backgrounds and used to evaluate the similarity between the generated and original images. At the input of the segmentation network, random hue perturbation was applied to the images to enhance the generalization of the model across seasons. Then, the convergence accuracy of U-Net networks with three different loss functions was compared in RGB and grayscale color spaces to determine the best combination. Specifically, we trained U-Net with focal, soft-IoU, and Dice loss functions in RGB and gray spaces and compared the convergence accuracy, convergence speed, and overfitting of the six obtained models. Afterward, sparse regularization was applied to the pre-trained full model, and structured network pruning was performed to reduce the computation load in network inference. A saliency metric that combines first-order Taylor expansion and 2-norm metric was proposed to guide the regularization and pruning process. It provided a higher compression rate compared with the 2-norm that was used in the previous pruning algorithm. Conventional saliency metrics based on first-order expansion can change by orders of magnitude during the regularization process, thus making threshold selection during the iterative process difficult. Compared with these conventional metrics, the proposed metric has a more stable range of values, which enables the use of iteration-based regularization methods. We adopted a 0-norm-based regularization method to widen the saliency gap between important and unimportant neurons. To select the decision threshold, we used an adaptive approach, which was more robust to changes in luminance compared with the fixed-threshold method used in previous work. Result Experimental results showed that the convergence accuracy of the curved powerline dataset was higher than that of the straight powerline dataset. In RGB space, the hybrid dataset using GAN had higher convergence accuracy than the dataset using only real images, but no significant improvement in gray space was observed due to the possibility of model collapse. We confirmed that hue disturbance can effectively improve the performance of the model across seasons. The experimental results of the different loss functions revealed that the convergence intersection-over-union(IoU) of RGB and gray spaces under their respective optimal loss functions was 0.578 and 0.586, respectively. Dice and soft-IoU had a negligible difference in convergence speed and achieved the best accuracy in gray and RGB spaces, respectively. The convergence of focal loss was the slowest in both spaces, and neither achieved the optimal accuracy. At the pruning stage, by using the conventional 2-norm saliency metric, the proposed gray space lightweight model (IoU of 0.459) reduced the number of floating-point operations per second (FLOPs) and parameters to 3.05% and 0.03% of the full model in RGB space, respectively (IoU of 0.573). When the proposed joint saliency metric was used, the numbers of FLOPs and parameters further decreased to 0.947% and 0.015% of the complete model, respectively, while maintaining an IoU of 0.42. The experiment also showed that the Otsu threshold method worked stably within the appropriate range of illumination changes, and a negligible difference from the optimal threshold was observed. Conclusion Improvements in the dataset and loss function independently enhanced the performance of the baseline model. Sparse regularization and network pruning reduced the network parameters and calculation load, which facilitated the deployment of the model on resource-constrained inferring terminals, such as UAVs. The proposed saliency measure exhibited better compression capabilities than the conventional 2-norm metric, and the adaptive threshold method helped improve the robustness of the model when the luminance changed.
Keywords
smart inspection image semantic segmentation sparse regularization network pruning generated adversarial network(GAN)
|