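Fatigue driving detection combining a pseudo-3D convolutional neural network with attention mechanisms
Abstract
Objective Fatigue driving detection in complex environments is a challenging technical problem. To make full use of the driver's facial feature information and temporal features, a driving fatigue detection method based on a pseudo-3D (P3D) convolutional neural network (CNN) and attention mechanisms is proposed. Method A pseudo-3D convolution module is used for spatiotemporal feature learning. A P3D-Attention module is proposed that uses the P3D structure to fuse a dual-channel attention module and an adaptive spatial attention module, improving the correlation of important channel features, increasing the global correlation of the feature maps, and fusing multi-layer deep convolutional features. The dual-channel attention module applies attention across video frames and across the channels of each frame, removing the interference of background and noise on recognition; the adaptive spatial attention module makes the model train faster and converge better. A 2D global average pooling layer replaces the 3D global average pooling layer to obtain more expressive features and thereby speed up network convergence, and a softmax classification layer performs the classification. Result In comparative experiments on the public dataset YawDD (a yawning detection dataset), the F1-score detection accuracy of the proposed method on the test set reaches 99.89%, and the recall on the yawning category reaches 100%. On the dataset UTA-RLDD (University of Texas at Arlington real-life drowsiness dataset), the F1-score detection accuracy on the test set reaches 99.64%, and the recall on the drowsy category reaches 100%. Compared with the method that fuses Inception-V3 with LSTM (long short-term memory), the proposed model is 42.5 MB, about 1/9 of that model's size, and its prediction time is about 660 ms, about 11% of that method's time. Conclusion A driving fatigue detection method based on a pseudo-3D convolutional neural network and attention mechanisms is proposed. The attention mechanisms are used to further analyze yawning, blinking, and head movement features, so that yawning is well distinguished from talking.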
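The difference between 3D and 2D global average pooling mentioned in the abstract can be illustrated with a minimal NumPy sketch. The tensor layout `(N, C, T, H, W)`, the feature-map sizes, and the function names are assumptions made for illustration; the abstract does not state at which layer or in which layout the pooling is applied.

```python
import numpy as np

def global_avg_pool_3d(x):
    """Average over time, height, and width: (N, C, T, H, W) -> (N, C)."""
    return x.mean(axis=(2, 3, 4))

def global_avg_pool_2d(x):
    """Average over height and width only: (N, C, T, H, W) -> (N, C, T)."""
    return x.mean(axis=(3, 4))

# Hypothetical feature map: 2 clips, 8 channels, 4 time steps, 5x5 spatial grid.
rng = np.random.default_rng(0)
features = rng.standard_normal((2, 8, 4, 5, 5))

pooled_3d = global_avg_pool_3d(features)  # one value per channel
pooled_2d = global_avg_pool_2d(features)  # one value per channel and time step

print(pooled_3d.shape, pooled_2d.shape)
```

In this sketch the 2D variant keeps one descriptor per time step, so temporal variation is still visible to the classifier, whereas the 3D variant collapses it into a single average.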
Keywords
Driving fatigue detection based on pseudo 3D convolutional neural network and attention mechanisms
Zhuang Yuan, Qi Yong (School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China)
Abstract
Objective Fatigue driving is one of the main causes of traffic accidents. Drivers in a fatigued state have reduced alertness, a weakened ability to deal with abnormal events, and an inability to react to traffic control and dangerous events, all of which lead to accidents. Current techniques for detecting fatigue driving fall into three categories: methods based on physiological parameters, on vehicle behavior, and on facial feature analysis. Detection methods based on physiological parameters require various sensors. These sensors use physiological signals to detect the driver's drowsiness, but they need contact with the driver's body, rely on expensive equipment, and are invasive. Detection methods based on vehicle behavior use vehicle behavior parameters, such as lane departure, steering wheel angle, and yaw angle, to detect driving fatigue, but they also depend on external factors such as road conditions. Detection methods based on facial feature analysis extract feature points from the driver's face and compare the driver's performance under fatigued and normal conditions by detecting fatigue characteristics such as eye state, blinking, and yawning. Compared with the first two categories, this approach is noninvasive and easy to implement. In several current methods, however, spatiotemporal features are not well integrated, and the interference of background and noise on recognition is not removed. This paper proposes a driving fatigue detection method based on a pseudo-3D (P3D) convolutional neural network (CNN) and attention mechanisms to solve these problems. Method The dataset is cropped into small videos of around 5 s each. Each training sample covers an interval of 90 video frames, and the frame resolution is set to 80×80×3. First, the feature map of each frame is fully extracted through the P3D module to generate a fixed-size feature set.
Second, the P3D structure uses a 1×3×3 convolution kernel and a 3×1×1 convolution kernel to simulate a 3×3×3 convolution in the spatial and time domains, decoupling the 3×3×3 convolution in time and space. Building on this decoupling, a module named P3D-Attention is proposed. The 3D convolutional neural network and the attention mechanism are integrated by translating 3D spatiotemporal features into 2D features and embedding them in the dual-channel and spatial attention modules, which improves the correlation of important channel features, increases the global correlation of the feature maps, and removes the interference of background and noise on recognition. The dual-channel attention module applies attention across video frames and across the channels of each frame. For driving scenarios, this paper selects convolution kernels of different sizes to adapt to convolution features of different depths and uses the adaptive spatial attention module to make model training converge faster and better. Afterward, a 2D global average pooling layer is used instead of a 3D global average pooling layer to obtain more expressive features, improving network convergence speed. Finally, a softmax classification layer is used for classification. Result A comparative test is performed on the public yawning detection dataset (YawDD). The detection accuracy of the proposed method reaches 98.75%, and the recall rate of the yawning category reaches 100%. On the University of Texas at Arlington real-life drowsiness dataset (UTA-RLDD), the F1-score detection accuracy of the proposed method reaches 99.64% on the test set, and the recall rate reaches 100% in the drowsy category.
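The saving from replacing one 3×3×3 kernel with a 1×3×3 kernel followed by a 3×1×1 kernel can be made concrete by counting weights. The sketch below is illustrative only: it assumes the two factorized convolutions are applied in series, that the intermediate channel count equals the output channel count, and that biases are ignored. The channel width of 64 and the function names are hypothetical and are not taken from the paper.

```python
def conv3d_weight_count(c_in, c_out, k_t, k_h, k_w):
    """Number of weights in a 3D convolution with kernel k_t x k_h x k_w (no bias)."""
    return c_in * c_out * k_t * k_h * k_w

def full_3d_weights(c_in, c_out):
    """Weights of a single 3x3x3 convolution."""
    return conv3d_weight_count(c_in, c_out, 3, 3, 3)

def p3d_weights(c_in, c_out):
    """Weights of a 1x3x3 spatial convolution followed by a 3x1x1 temporal one."""
    spatial = conv3d_weight_count(c_in, c_out, 1, 3, 3)
    temporal = conv3d_weight_count(c_out, c_out, 3, 1, 1)
    return spatial + temporal

channels = 64  # hypothetical layer width
full = full_3d_weights(channels, channels)
factorized = p3d_weights(channels, channels)

print(full, factorized, round(factorized / full, 3))  # 110592 49152 0.444
```

Under these assumptions the factorized pair needs 12/27 of the weights of the full kernel regardless of channel width, while still covering a 3×3×3 receptive field.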
In terms of running time and model size, experimental results show that, compared with the long short-term memory (LSTM) fusion method using an ImageNet-trained Inception-v3 model, the proposed algorithm has evident advantages: predicting on a 5 s video takes 660 ms on average, which is about 11% of the time that method needs. In terms of unpruned model size, the Inception-v3 plus LSTM method occupies 396.15 MB, whereas the proposed model occupies 42.5 MB, about 1/9 of that size. Conclusion A driving fatigue detection method based on a P3D convolutional neural network and attention mechanisms is proposed. The attention mechanisms are used to remove the interference of background and noise on recognition, improve the accuracy of driving fatigue detection, distinguish yawning from other mouth opening and closing behaviors such as talking, and analyze yawning, blinking, and characteristic head movements. Future work will 1) verify whether features can be extracted with a smaller network structure, design a more efficient network structure, and further reduce the size of the model; and 2) use 3D convolution to distinguish more complicated driving behaviors, because detecting distracted driving requires more than predicting the driver's fatigue status.
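The reported ratios can be checked with a few lines of arithmetic. The sizes and the 660 ms figure come from the abstract; the baseline prediction time is not reported there, so the value computed below is only what the stated 11% would imply.

```python
p3d_size_mb = 42.5
baseline_size_mb = 396.15  # Inception-v3 plus LSTM, unpruned

# Size ratio: roughly 0.107, i.e. slightly under 1/9.
size_ratio = p3d_size_mb / baseline_size_mb

p3d_time_ms = 660.0
reported_time_fraction = 0.11

# Baseline time implied by the reported fraction (an inference, not a reported value).
implied_baseline_ms = p3d_time_ms / reported_time_fraction

print(f"size ratio: {size_ratio:.3f}")
print(f"implied baseline time: {implied_baseline_ms:.0f} ms")
```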
Keywords
3D convolutional neural network (CNN); pseudo-3D (P3D) convolution; global average pooling; attention mechanisms; fatigue driving