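Overview of deep convolutional neural network models for image classification
Zhang Ke1,2, Feng Xiaohan1, Guo Yurong2, Su Yukun1, Zhao Kai1, Zhao Zhenbing1, Ma Zhanyu2, Ding Qiaolin1 (1. Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071000; 2. Institute of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100086)
Abstract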
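Image classification is an important task in computer vision, and traditional image classification methods have certain limitations. With the development of artificial intelligence, deep learning has matured, and classifying images with deep convolutional neural networks has become a research hotspot. The structures of deep convolutional neural networks for image classification have grown increasingly diverse, and their performance far exceeds that of traditional image classification methods. Focusing on the model structures of deep convolutional neural networks for image classification, and following the course of model development and model optimization, this paper divides deep convolutional neural networks into four categories: classic deep convolutional neural network models, attention-based deep convolutional neural network models, lightweight deep convolutional neural network models, and neural architecture search models. It comprehensively reviews the construction methods and characteristics of each category, and compares and analyzes the performance of the various classification models. The structural design of deep convolutional neural network models has become increasingly refined and the optimization methods increasingly powerful; as image classification accuracy keeps being pushed higher, the number of model parameters is gradually decreasing and training and inference are becoming faster. Nevertheless, deep convolutional neural network models still have limitations. This paper identifies the open problems and possible future research directions: deep convolutional neural network models mainly classify images through supervised learning and are limited by the quality and scale of the datasets, so unsupervised and semi-supervised deep convolutional neural network models will be one of the key future research directions; the speed and resource consumption of these models are still unsatisfactory, which makes deployment on mobile devices challenging; optimization methods and the metrics for judging model quality need further study; and manually designing deep convolutional neural network structures is time-consuming and labor-intensive, so neural architecture search will be the development direction for future model design.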
Keywords
Overview of deep convolutional neural networks for image classification
Zhang Ke1,2, Feng Xiaohan1, Guo Yurong2, Su Yukun1, Zhao Kai1, Zhao Zhenbing1, Ma Zhanyu2, Ding Qiaolin1 (1. Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071000, China; 2. Institute of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100086, China)
Abstract
Image classification (IC) is one of the important tasks in computer vision, and traditional image classification methods have limitations. With the recent development of artificial intelligence (AI), deep learning has matured, and image classification performance has improved as deep convolutional neural network (DCNN) models have matured. This paper gives a comprehensive overview of DCNNs for image classification from the perspective of model structure. Firstly, the modeling methods are analyzed and summarized, and DCNNs are divided into four categories: 1) classic deep convolutional neural networks; 2) deep convolutional neural networks based on the attention mechanism; 3) lightweight networks; 4) neural architecture search methods. DCNNs perform well because they use convolution to extract effective image features and automatically learn feature representations from a large number of samples. Deeper DCNNs extract more effective features and therefore achieve better image classification performance. However, deeper DCNNs encounter difficulties such as overfitting, vanishing gradients, and huge numbers of model parameters, so they become harder and harder to optimize. Researchers have proposed different DCNN models to address different problems. Starting with AlexNet, researchers made networks deeper than before. Subsequently, models such as network in network (NIN), Overfeat, ZFNet, Visual Geometry Group network (VGGNet), and GoogLeNet were proposed. As networks deepened, the vanishing gradient problem intensified and optimization became more complicated.
Researchers proposed the residual network (ResNet) to ease vanishing gradients, which greatly improved image classification performance. To further improve ResNet, researchers proposed a series of ResNet variants, which fall into three categories according to the solution adopted: variants that optimize very deep ResNets, variants that increase width, and variants that introduce new dimensions. The success of ResNet is attributed to its use of shortcut connections. The densely connected convolutional network (DenseNet) makes maximal use of such connections so that the information flow between layers is maximized. To further promote the information flow between layers, DenseNet variants such as the dual path network (DPN) and CliqueNet were proposed. DCNNs based on the attention mechanism build on the classic DCNN models to focus on regions of interest, and can be categorized into channel attention, spatial attention, and layer attention mechanisms. DCNNs need high accuracy together with a small number of parameters and fast computation, so researchers proposed lightweight networks such as the ShuffleNet series and the MobileNet series. Neural architecture search (NAS) methods, which use neural networks to design neural networks automatically, have also attracted attention. NAS methods can be divided into three categories: search space design, model optimization, and others. Secondly, the commonly used image classification datasets are introduced, including the MNIST (modified NIST) dataset, the ImageNet dataset, the CIFAR dataset, and the SVHN (street view house numbers) dataset. The performance of the various models is compared and the experimental results are analyzed, using accuracy, number of parameters, and FLOPs (floating point operations) as the metrics for the classification results.
Model optimization has steadily improved: image classification accuracy has risen while the number of model parameters has fallen and training and inference have become faster. Finally, DCNN models are still constrained by several factors. DCNN models mainly classify images through supervised learning and are therefore limited by the quality and scale of the datasets. The speed and resource consumption of DCNN models are still unsatisfactory, which makes deployment on mobile devices challenging. Optimization methods and the metrics for judging the advantages and disadvantages of DCNN models need further study. Neural architecture search will be the development direction of future DCNN model design. In summary, the DCNN models for image classification are reviewed and their experimental results are presented.
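As a rough illustration of the parameter and FLOPs metrics mentioned in the abstract, the following minimal sketch counts weights and multiply-accumulate operations for a standard convolution and for a depthwise separable convolution, the building block used by the MobileNet series. It is not taken from the paper: the function names and the layer sizes are hypothetical, bias terms are ignored, stride 1 with "same" padding is assumed, and one multiply-accumulate is counted as one operation (some conventions count it as two FLOPs).

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def conv_macs(c_in, c_out, k, h, w):
    """Multiply-accumulates of a standard convolution on an h x w output map
    (assumes stride 1 and 'same' padding, so every output position is computed)."""
    return conv_params(c_in, c_out, k) * h * w


def separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution (bias ignored)."""
    return k * k * c_in + c_in * c_out


def separable_macs(c_in, c_out, k, h, w):
    """Multiply-accumulates of the depthwise + pointwise pair on an h x w map."""
    return separable_params(c_in, c_out, k) * h * w


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 128 channels, 3 x 3 kernel, 56 x 56 feature map.
    c_in, c_out, k, h, w = 64, 128, 3, 56, 56
    print("standard  params:", conv_params(c_in, c_out, k))
    print("separable params:", separable_params(c_in, c_out, k))
    print("standard  MACs  :", conv_macs(c_in, c_out, k, h, w))
    print("separable MACs  :", separable_macs(c_in, c_out, k, h, w))
```

For this hypothetical layer the separable version needs roughly one eighth of the weights and operations, which is the kind of saving that lightweight networks aim for.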
Keywords
deep learning; image classification (IC); deep convolutional neural networks (DCNN); model structure; model optimization