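Multi-font Chinese character recognition with convolutional neural networks
Abstract
Objective Multi-font Chinese character recognition has broad application prospects in automatic Chinese text processing and intelligent input, and it is an important topic in pattern recognition. In recent years, with the emergence of deep learning, Chinese character recognition based on deep convolutional neural networks has achieved breakthroughs in both method and performance. However, existing methods need large numbers of samples, long training times, and difficult parameter tuning, so optimal results are hard to reach when the number of character classes is large. Method For unoccluded images of printed and handwritten Chinese characters, an end-to-end deep convolutional neural network model is proposed. Excluding auxiliary layers, the network mainly consists of three convolutional layers, two pooling layers, one fully connected layer, and one softmax regression layer. To address the shortage of samples, a data augmentation method that combines wave distortion, translation, rotation, and scaling is proposed. To address the difficulty of parameter tuning and the long training time of deep neural networks, strategies such as batch normalization of the samples and fine-tuning the network with a combination of optimization methods are proposed. Result The model was used to recognize the 3,755 classes of Level-1 Chinese characters in the national standard, and the final recognition accuracy reached 98.336%. Several sets of comparative experiments verified the contribution of each proposed method to the final performance of the model: data augmentation, the hybrid optimization method, and batch normalization raised the recognition rate on the test samples by 8.0%, 0.3%, and 1.4%, respectively. Conclusion Compared with methods in the literature that combine handcrafted features with convolutional neural networks, the proposed method reduces the workload of manual feature extraction; compared with the classic convolutional neural network, the proposed network has stronger feature extraction ability, a higher recognition rate, and a shorter training time.
Keywords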
Recognition of Chinese characters using deep convolutional neural network
Chai Weijia, Wang Lianming (School of Physics, Northeast Normal University, Changchun 130024, China)
Abstract
Objective The recognition of Chinese characters has broad application prospects in automatic Chinese text processing and intelligent input, and it is an important subject in the field of pattern recognition. With the emergence of deep learning in recent years, the recognition of Chinese characters based on deep convolutional neural networks has made breakthroughs in both method and performance. However, many problems remain, such as the need for a large sample size, long training time, and great difficulty in parameter tuning. Thus, achieving the best recognition result for Chinese characters, which span a very large number of categories, is difficult. Method An end-to-end deep convolutional neural network model was proposed for processing unoccluded images of printed and handwritten Chinese characters. Apart from additional layers, such as batch normalization and dropout layers, the network mainly consisted of three convolutional layers, two pooling layers, one fully connected layer, and a softmax regression layer. To solve the problem of a small sample size, this paper proposed a data augmentation method that combines wave distortion, translation, rotation, and zooming. The translation and zooming scales and the rotation angles were generated randomly, and the amplitude and period of the sine function that produces the wave distortion were controlled, so that a large number of pseudo-samples could be generated. The overall structure of the characters is preserved, while the number of training samples can in principle be increased without limit. Advanced strategies, such as batch normalization and fine-tuning the model by combining two optimizers, namely, stochastic gradient descent (SGD) and adaptive moment estimation (Adam), were used to reduce the difficulty of parameter adjustment and the long model training duration.
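The augmentation described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`wave_distort`, `affine_warp`, `augment`), the nearest-neighbour resampling, the order of the transforms, and every parameter range are assumptions chosen for the example, since the abstract does not state them.

```python
import numpy as np

def _sample(img, src_y, src_x, fill):
    """Nearest-neighbour lookup; pixels that map outside the image get `fill`."""
    h, w = img.shape
    yi = np.rint(src_y).astype(int)
    xi = np.rint(src_x).astype(int)
    inside = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
    out = np.full(img.shape, fill, dtype=float)
    out[inside] = img[yi[inside], xi[inside]]
    return out

def wave_distort(img, amplitude=2.0, period=16.0, fill=0.0):
    """Shift each row horizontally by a sine of its row index."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    src_x = xs + amplitude * np.sin(2.0 * np.pi * ys / period)
    return _sample(img, ys, src_x, fill)

def affine_warp(img, angle_deg=0.0, scale=1.0, shift=(0.0, 0.0), fill=0.0):
    """Rotate, zoom, and translate about the image centre (inverse mapping)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Undo the translation, then the rotation and zoom, for every output pixel.
    y0 = ys - cy - shift[0]
    x0 = xs - cx - shift[1]
    src_x = (cos_t * x0 + sin_t * y0) / scale + cx
    src_y = (-sin_t * x0 + cos_t * y0) / scale + cy
    return _sample(img, src_y, src_x, fill)

def augment(img, rng):
    """Draw one pseudo-sample; all ranges below are illustrative assumptions."""
    out = wave_distort(img,
                       amplitude=rng.uniform(0.0, 2.0),
                       period=rng.uniform(12.0, 32.0))
    return affine_warp(out,
                       angle_deg=rng.uniform(-10.0, 10.0),
                       scale=rng.uniform(0.9, 1.1),
                       shift=(rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)))

# A rectangular blob stands in for a character image.
rng = np.random.default_rng(0)
glyph = np.zeros((32, 32))
glyph[8:24, 12:20] = 1.0
samples = [augment(glyph, rng) for _ in range(4)]
```

Because each call to `augment` draws fresh random parameters, the number of distinct pseudo-samples per source image is limited only by how many are drawn. Keeping the ranges small is what leaves the overall structure of the character recognizable.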
Batch normalization refers to normalizing the input data of each training mini-batch during stochastic gradient descent, so that the distribution in each dimension becomes stable, with mean 0 and standard deviation 1. Internal covariate shift is the change in the distribution of network activations caused by the change in network parameters during training. Because of it, the network must adapt to a different distribution at each iteration, which greatly reduces the training speed. Batch normalization is an effective way to solve this problem. In the proposed network, the batch normalization layer was placed in front of the activation function layer. In the classic convolutional neural network, the mini-batch stochastic gradient descent method is usually adopted during training. However, selecting suitable hyper-parameters is difficult, and choices such as the learning rate and the initial weights greatly affect training speed and classification results. Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients. Its greatest advantages are that the magnitudes of the parameter updates are invariant to rescaling of the gradient and that training can be accelerated considerably. However, using this method alone cannot ensure state-of-the-art results. Therefore, this paper presents a new training method that combines the newer optimization method, Adam, with the traditional method, SGD. We divided the training process into two steps. First, we adopted Adam, which adapts quantities such as the learning rate automatically, to avoid manual adjustment and make the network converge quickly.
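The Adam update and its invariance to gradient rescaling can be illustrated with a short NumPy sketch. The update rule is the standard published form of Adam, and the defaults for `lr`, `beta1`, `beta2`, and `eps` are the commonly used ones; the abstract does not give the values used in the paper. The helper `run_adam` and the toy quadratic objective are assumptions made for this example.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count. Returns (w, m, v)."""
    m = beta1 * m + (1.0 - beta1) * g        # running first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g    # running second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)           # bias correction for the zero init
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def run_adam(grad_fn, w0, steps, **kwargs):
    """Hypothetical helper: apply `steps` Adam updates starting from w0."""
    w = np.array(w0, dtype=float)
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        w, m, v = adam_step(w, grad_fn(w), m, v, t, **kwargs)
    return w

# Toy objective (an assumption): f(w) = 0.5 * ||w - target||^2.
target = np.array([1.0, -2.0])

def grad(w):
    return w - target

w_plain = run_adam(grad, [0.0, 0.0], steps=50)
# Multiplying every gradient by 1000 leaves the trajectory almost unchanged.
w_scaled = run_adam(lambda w: 1000.0 * grad(w), [0.0, 0.0], steps=50)
w_first = run_adam(grad, [0.0, 0.0], steps=1)
```

In this sketch the first step moves each coordinate by about `lr` regardless of the gradient's magnitude, which is why the step size needs less manual tuning than in plain SGD.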
This first step lasted for 200 iterations, and the best model was saved at its end. Second, SGD was used to further fine-tune the trained model with a very small learning rate to achieve the best classification result. The initial learning rate was set to 0.0001 in this step and decayed exponentially. With these methods, the network performed well in terms of training speed and generalization ability. Result A seven-layer deep model was trained to categorize 3,755 Chinese characters, and the recognition accuracy reached 98.336%. The contribution of each proposed method to the final performance of the model was verified by several sets of comparative experiments. The recognition rate of the model increased by 8.0%, 0.3%, and 1.4% by using data augmentation, combining the two kinds of optimizers, and using batch normalization, respectively. The training time of the model was 483 minutes shorter than when SGD was used alone and 43 minutes shorter than when batch normalization was not used. Conclusion Compared with methods in the literature that combine handcrafted features with convolutional neural networks, the proposed method reduces the workload of manual feature extraction. Compared with the classic convolutional neural network, it achieves superior performance, with a higher recognition rate, stronger feature extraction ability, and shorter training time.
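The second training step described above, SGD fine-tuning with a small, exponentially decaying learning rate, might look like the following sketch. Only the initial learning rate of 0.0001 comes from the abstract. The decay rate, decay interval, batch size, and step count are assumptions, a linear least-squares problem stands in for the network, and `w_stage1` is a hypothetical placeholder for the best weights saved after the Adam step.

```python
import numpy as np

def exp_decay_lr(step, lr0=1e-4, decay_rate=0.96, decay_steps=100):
    """Exponentially decayed learning rate (decay settings are assumed)."""
    return lr0 * decay_rate ** (step / decay_steps)

def sgd_finetune(w, X, y, steps=1000, batch_size=32, seed=0):
    """Mini-batch SGD on a least-squares loss, starting from given weights."""
    rng = np.random.default_rng(seed)
    w = np.array(w, dtype=float)
    for step in range(steps):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        residual = X[idx] @ w - y[idx]
        grad = X[idx].T @ residual / batch_size   # gradient of 0.5 * mean(residual**2)
        w = w - exp_decay_lr(step) * grad
    return w

# Toy data: noise-free linear targets, so w_true is the exact optimum.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true

w_stage1 = w_true + 0.05          # placeholder for the saved stage-one model
w_final = sgd_finetune(w_stage1, X, y)
```

With a learning rate this small the weights move only slightly, which matches the role of this step as a final refinement of an already well-trained model rather than a second full training run.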
Keywords
recognition of Chinese characters; convolutional neural network; deep learning; data augmentation; batch normalization