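Greedy pruning of deep neural networks incorporating probability distribution
Hu Jun1,2, Huang Qipeng1, Liu Jiaxin1,2, Liu Wei1,3, Yuan Huai1, Zhao Hong1 (1. School of Computer Science and Engineering, Northeastern University, Shenyang 110169; 2. Neusoft Reachauto Corporation (Shenyang), Shenyang 110179; 3. Neusoft Reachauto Corporation (Shanghai), Shanghai 201804) Abstract
Objective Applying deep learning to environment perception for autonomous driving can greatly improve the accuracy and reliability of the perception system, but existing deep neural network models are difficult to deploy on autonomous-driving embedded platforms with limited computing resources because of their computation and storage requirements. To resolve the conflict between the heavy computation that deep neural networks require and the limited computing power of embedded platforms, a greedy network pruning method based on the probability distribution of weights is proposed; it aims to reduce redundant connections in the network model and improve the model's computational efficiency. Method The probability distribution of weights is introduced, and the probability with which small values occur among the weight parameters is recorded during training. In the pruning stage, incremental pruning and network repair are carried out according to the weight probability distribution collected during training, improving on current pruning strategies that rely on weight magnitude alone. Result Experiments show that on the CIFAR-10 dataset the proposed method achieves higher accuracy than the dynamic network pruning strategy at every pruning rate. On the ImageNet dataset, with a small loss of accuracy, the method compresses the number of parameters of AlexNet and VGG (visual geometry group) 16 by 5.9 times and 11.4 times, respectively, and requires fewer training iterations than the dynamic network pruning strategy. Residual networks such as ResNet34 and ResNet50 can also be compressed effectively; for ResNet50, with a small increase in accuracy loss, a larger compression rate (2.1 times) is achieved than with HRank, the current best method. Conclusion The greedy pruning strategy based on probability distribution addresses the uncertainty problem in deep neural network pruning, further improves the stability of the network after model compression, and compresses the number of model parameters while preserving model accuracy.
Keywords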
Greedy pruning of deep neural networks fused with probability distribution
Hu Jun1,2, Huang Qipeng1, Liu Jiaxin1,2, Liu Wei1,3, Yuan Huai1, Zhao Hong1(1.School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China;2.Neusoft Reachauto Corporation(Shenyang), Shenyang 110179, China;3.Neusoft Reachauto Corporation(Shanghai), Shanghai 201804, China) Abstract
Objective Deep neural networks have continued to develop in recent years and have achieved excellent results in computer vision, natural language processing, and speech recognition. Environment perception is an important application in autonomous driving technology; it mainly processes collected image information about the surrounding environment, so deep learning plays an important part in this step. However, as the problems to be handled grow more complex, the number of layers in neural network models keeps increasing, and with it the total number of parameters and the required computing power. These models run well on platforms with sufficient computing power, such as servers, but many of them are difficult to deploy on embedded platforms with limited computing and storage resources, such as autonomous driving platforms. Compressing existing deep neural network models is therefore necessary to resolve the conflict between the huge amount of computation that deep neural networks require and the limited computing power of embedded platforms; compression reduces both the number of model parameters and the computing power required. Building on existing model compression methods, this paper proposes a greedy network pruning method that incorporates the probability distribution of weights to reduce redundant connections in the network model, improving the model's computational efficiency and reducing its number of parameters. Method Current pruning methods mainly use a property of the weight parameters as the criterion for evaluating parameter importance; for example, the L1 norm of the convolution kernel weights is used as the basis for determining importance. However, this approach ignores how the weights vary during training.
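As a point of reference for the magnitude-based criterion mentioned above, the following is a minimal sketch of ranking convolution kernels by their L1 norm. It is an illustration only, not the implementation of any cited method; the helper names `l1_norm` and `rank_filters_by_l1` and the toy kernels are assumptions introduced here.

```python
def l1_norm(kernel):
    """Sum of absolute values over a nested list of weights (any depth)."""
    if isinstance(kernel, (int, float)):
        return abs(kernel)
    return sum(l1_norm(k) for k in kernel)


def rank_filters_by_l1(filters):
    """Return (indices ordered from smallest to largest L1 norm, list of norms)."""
    norms = [l1_norm(f) for f in filters]
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    return order, norms


# Toy layer with three 2x2 kernels (hypothetical values).
filters = [
    [[0.5, -0.5], [1.0, 0.0]],
    [[0.125, -0.125], [0.0, 0.25]],
    [[-1.0, 2.0], [0.5, 0.5]],
]
order, norms = rank_filters_by_l1(filters)
# Under this criterion the kernel at order[0] is the first pruning candidate.
```

Because the ranking depends only on the final weight values, it carries no information about how each weight behaved during training, which is the limitation the abstract points out.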
In addition, many methods prune a trained model in a single pass, which makes it difficult to maintain the accuracy of the pruned model. Inspired by the study of uncertain graphs, the proposed method addresses these problems. The probability distribution of weights is introduced, and the importance of a connection is judged jointly from the probability distribution of its weight values during training and the magnitude of its current weight. The importance of a connection and the effect of removing it on the loss function together represent the contribution of that connection to the result, which serves as the basis for pruning. In the greedy pruning stage, the proposed method uses incremental pruning to control the scale and speed of pruning. Pruning and restoration are performed iteratively on a small proportion of connections until the state of the current sparse connections no longer changes, and the pruning scale is then gradually expanded until the expected model compression is achieved. Compared with one-time pruning based on weight parameters alone, this incremental pruning and recovery strategy avoids the weight gradient explosion problem caused by excessive pruning, improves pruning efficiency and model stability, and realizes dynamic pruning. The proposed pruning method compresses the model as far as possible while maintaining its accuracy. Result The experiments use networks of different depths, including CifarSmall, AlexNet, and visual geometry group (VGG) 16, as well as networks with residual connections, including ResNet34 and ResNet50, to verify the applicability of the proposed method to networks of different depths.
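The Method description above can be illustrated with a small sketch that records how often each weight is small during training, combines that probability with the current magnitude, and prunes in increasing stages while allowing restoration. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the threshold, the scoring rule `abs(w) * (1 - p_small)`, the stage schedule, and all names (`SmallWeightStats`, `importance`, `mask_for_rate`, `greedy_prune`).

```python
class SmallWeightStats:
    """Tracks, per connection, how often |w| fell below a threshold during training."""

    def __init__(self, n_weights, threshold):
        self.threshold = threshold
        self.small_counts = [0] * n_weights
        self.steps = 0

    def update(self, weights):
        # Call once per recorded training step with the current weights.
        self.steps += 1
        for i, w in enumerate(weights):
            if abs(w) < self.threshold:
                self.small_counts[i] += 1

    def prob_small(self):
        return [c / self.steps for c in self.small_counts]


def importance(weights, p_small):
    # Assumed rule: discount magnitude by the probability of having been small.
    return [abs(w) * (1.0 - p) for w, p in zip(weights, p_small)]


def mask_for_rate(scores, rate):
    """Mask (1 = keep, 0 = pruned) removing the lowest-scoring fraction `rate`."""
    n_prune = int(rate * len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    mask = [1] * len(scores)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def greedy_prune(get_scores, target_rate, n_stages=4, max_iters=10):
    """Raise the pruning rate stage by stage; within a stage, recompute the mask
    until it stops changing. Recomputing from fresh scores lets a previously
    pruned connection be restored if its score has recovered."""
    mask = None
    for stage in range(1, n_stages + 1):
        rate = target_rate * stage / n_stages
        for _ in range(max_iters):
            new_mask = mask_for_rate(get_scores(mask), rate)
            if new_mask == mask:
                break
            mask = new_mask
    return mask


# Toy run: four connections observed over four training steps (hypothetical values).
history = [
    [0.50, 0.02, 0.30, 0.04],
    [0.40, 0.20, 0.01, 0.03],
    [0.45, 0.25, 0.02, 0.02],
    [0.50, 0.30, 0.03, 0.06],
]
stats = SmallWeightStats(n_weights=4, threshold=0.05)
for snapshot in history:
    stats.update(snapshot)

scores = importance(history[-1], stats.prob_small())
# Stand-in callback: a real one would fine-tune under `mask` and rescore.
final_mask = greedy_prune(lambda mask: scores, target_rate=0.5, n_stages=2)
```

In this toy run the callback returns fixed scores, so each stage converges immediately; with a callback that retrains and rescores, the inner loop is where pruned connections could be restored before the pruning rate is raised.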
The experiments use common classification datasets, CIFAR-10 and ImageNet ILSVRC (ImageNet Large Scale Visual Recognition Challenge)-2012, which makes comparison with other methods convenient. The proposed method is compared mainly with the dynamic network pruning strategy on CIFAR-10, and with the current state-of-the-art (SOTA) pruning algorithm HRank on ImageNet using ResNet50. Experimental results show that the accuracy of the proposed method is higher than that of the dynamic network pruning strategy at every pruning rate on the CIFAR-10 dataset. On the ImageNet dataset, the proposed method compresses the number of parameters of AlexNet and VGG16 by 5.9 and 11.4 times, respectively, with a small loss of accuracy, and requires fewer training iterations than the dynamic network pruning strategy. The residual networks ResNet34 and ResNet50 can also be compressed effectively. For ResNet50, a larger compression rate (2.1 times) is achieved than with the current SOTA method HRank, at the cost of a small increase in accuracy loss. Conclusion The greedy pruning strategy fused with probability distribution addresses the uncertainty problem of deep neural network pruning, improves the stability of the network after model compression, and compresses the number of network model parameters while preserving the accuracy of the model. Experimental results show that the proposed method compresses many types of networks well. The probability distribution of the weight parameters introduced in this research can serve as an important basis for parameter importance criteria in subsequent pruning research.
The incremental pruning and connection recovery used during pruning in this article are important for maintaining accuracy. However, optimizing and accelerating inference for the sparse model obtained after pruning requires further research.
Keywords