  • Published: 2024-09-11
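HK-DETR: A Knife-Holding Dangerous Behavior Detection Algorithm Based on Improved RT-DETR

Jin Tao, Hu Peiyu (Gansu University of Political Science and Law)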

Abstract
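Objective When analyzing video captured by the network cameras of public security systems, the automatic detection of pedestrians' dangerous knife-holding behavior suffers from low detection accuracy and a high false-detection rate, caused by the diversity of knife shapes and sizes as well as by occlusion and overlap among multiple targets. To address these problems, this paper proposes the human-knife detection Transformer (HK-DETR), a knife-holding dangerous behavior detection algorithm that improves on the real-time detection Transformer (RT-DETR). Method First, an inverted residual cascade block (IRCB) is designed as the basic block (BasicBlock) of the backbone network; this makes the network more lightweight, reduces computational redundancy, and improves its ability to model global features and long-range dependencies. Second, a cross stage partial-parallel multi-atrous convolution (CSP-PMAC) network structure is proposed that focuses on extracting multi-scale features, enabling the model to recognize knives of different sizes and angles. Finally, a Haar wavelet-based downsampling (HWD) module replaces the downsampling operations of the original model, providing richer information for multi-scale feature fusion. In addition, the minimum point distance based intersection over union (MPDIoU) loss function is adopted to further improve detection performance. Result Comparative experiments show that, compared with the original RT-DETR algorithm, the improved model reduces the number of network parameters by 25% and improves precision, recall, and mean average precision (mAP) by 2.3%, 5.5%, and 5.2%, respectively. Compared with YOLOv5m, YOLOv8m, and Gold-YOLO-s, it improves mAP by 6.3%, 5.2%, and 1.8%, respectively, while using fewer network parameters. Conclusion The proposed HK-DETR algorithm adapts well to knife-holding dangerous behavior detection across a variety of complex environments captured by network cameras, and it outperforms the other models in the comparison.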
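The HWD module mentioned above downsamples with a one-level 2D Haar wavelet transform instead of pooling or a strided convolution: each 2×2 patch is mapped to one low-frequency (average) coefficient and three high-frequency (detail) coefficients, so the four resulting sub-bands have half the height and width but together retain every input value. As we understand the published HWD design, these sub-bands are then concatenated along the channel dimension and passed through a learnable 1×1 convolution. The sketch below is a minimal plain-Python illustration of the wavelet step only, for a single channel; the function names are hypothetical and not taken from the paper's code.

```python
from typing import List

Matrix = List[List[float]]  # one channel of a feature map, indexed [row][col]


def haar_downsample(x: Matrix) -> List[Matrix]:
    """One-level orthonormal 2D Haar transform of a single-channel map.

    Returns four sub-bands [LL, LH, HL, HH], each of size (H/2) x (W/2).
    Stacking them as channels halves the spatial size without discarding
    any values, unlike max pooling or a stride-2 convolution.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    bands: List[Matrix] = [[], [], [], []]
    for i in range(0, h, 2):
        rows: List[List[float]] = [[], [], [], []]
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + c + d) / 2)  # LL: low-frequency approximation
            rows[1].append((a + b - c - d) / 2)  # LH: top-minus-bottom detail
            rows[2].append((a - b + c - d) / 2)  # HL: left-minus-right detail
            rows[3].append((a - b - c + d) / 2)  # HH: diagonal detail
        for band, row in zip(bands, rows):
            band.append(row)
    return bands


def haar_reconstruct(bands: List[Matrix]) -> Matrix:
    """Inverse transform: rebuilds the original map exactly from the sub-bands."""
    ll, lh, hl, hh = bands
    out: Matrix = []
    for i in range(len(ll)):
        top: List[float] = []
        bottom: List[float] = []
        for j in range(len(ll[0])):
            s, v, u, g = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(s + v + u + g) / 2, (s + v - u - g) / 2]
            bottom += [(s - v + u - g) / 2, (s - v - u + g) / 2]
        out += [top, bottom]
    return out
```

Because the transform is invertible, nothing is lost at the downsampling step itself; the learnable layer that follows decides which sub-band information to keep for feature fusion.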
Keywords
HK-DETR: An Improved Knife-Holding Dangerous Behavior Detection Algorithm Based on RT-DETR

Jin Tao, Hu Peiyu (College of Artificial Intelligence, Gansu University of Political Science and Law)

Abstract
Objective Public safety has drawn growing attention in contemporary society, particularly in crowded venues such as subway stations, railway terminals, and commercial centers, where timely and accurate detection of, and response to, potentially threatening behavior is essential to maintaining social stability. The network cameras deployed extensively by public security systems are a vital surveillance tool: they capture and record large volumes of video in real time, providing a rich source of information for security analysis. A key challenge in analyzing this video, however, is the automatic detection of pedestrians engaged in dangerous knife-wielding behavior. The difficulty stems first from the diversity of knife shapes and sizes, ranging from conventional long knives to folding knives and daggers, each of which looks different in images. Second, occlusion is common in real-world surveillance, whether pedestrians block one another or trees and buildings obstruct the view, and it leaves target features incomplete, degrading detection performance. Third, in densely populated areas multiple pedestrians or objects overlap in the image, making it even harder to distinguish and localize knife-wielding individuals. To address these issues and improve both the precision and the efficiency of knife-wielding behavior detection, this paper proposes the human-knife detection Transformer (HK-DETR), an improved version of the real-time detection Transformer (RT-DETR). Building on the strengths of RT-DETR, HK-DETR introduces several modifications tailored to the characteristics of knife detection.
Method First, we design the inverted residual cascade block (IRCB) as the basic building block (BasicBlock) of the backbone network. This design makes the network lighter, easing the demand on computational resources and reducing redundant computation. By optimizing how feature maps are processed, the IRCB module also strengthens the backbone's ability to capture and distinguish diverse features, laying the groundwork for the later knife detection stages. Next, we propose the cross stage partial-parallel multi-atrous convolution (CSP-PMAC) module, a feature fusion strategy that directs the network to capture and integrate multi-scale feature information during the fusion stage, which is essential for identifying knives of varying shapes and angles. This allows the model to identify both small and large knives accurately, improving its performance in complex scenes. We then replace the network's conventional downsampling with the Haar wavelet-based downsampling (HWD) module. Using Haar wavelet decomposition, HWD reduces the spatial resolution of the feature maps while retaining more detail about variations in object scale; this enriches the feature representations available to the subsequent multi-scale feature fusion and makes the model more robust to scale variation. Finally, to further improve detection accuracy, we adopt the minimum point distance based intersection over union (MPDIoU) loss function.
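As published, MPDIoU subtracts from the ordinary IoU the squared distance between the top-left corners and the squared distance between the bottom-right corners of the predicted and ground-truth boxes, each normalized by the squared diagonal of the input image (w² + h²); the training loss is 1 - MPDIoU. The sketch below is a minimal plain-Python version for a single box pair, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the function names are hypothetical and not taken from the paper's code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left and bottom-right corners


def box_iou(pred: Box, gt: Box) -> float:
    """Plain intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred: Box, gt: Box, img_w: float, img_h: float) -> float:
    """1 - MPDIoU for one predicted/ground-truth box pair."""
    norm = img_w ** 2 + img_h ** 2                           # squared image diagonal
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    mpdiou = box_iou(pred, gt) - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou
```

For example, on a 10 × 10 image with ground truth (5, 5, 6, 6), the non-overlapping predictions (0, 0, 1, 1) and (3, 3, 4, 4) both have IoU 0, so 1 - IoU would score them identically at 1.0; the corner-distance terms give losses of 1.5 and 1.08, so the nearer box is still rewarded.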
The MPDIoU loss improves localization by measuring the agreement between predicted and ground-truth bounding boxes more precisely: in addition to their overlap, it penalizes the distances between the corresponding top-left and bottom-right corner points of the two boxes. This emphasis on localization precision helps the model maintain strong detection performance even when targets are dense or overlapping.

Result Ablation experiments on the pedestrian knife-carrying dataset showed that each improvement strategy, applied individually, improved the performance of the original RT-DETR model to some degree, although missed detections and low confidence scores remained in some cases. When the strategies were combined, detection performance improved significantly. To validate the effectiveness of the proposed model, comparative experiments were performed on the same dataset. Compared with the original RT-DETR algorithm, the improved model reduced the number of network parameters by 25% while improving precision, recall, and mean average precision (mAP) by 2.3%, 5.5%, and 5.2%, respectively. Compared with YOLOv5m, YOLOv8m, and Gold-YOLO-s, the improved model achieved mAP gains of 6.3%, 5.2%, and 1.8%, respectively, with fewer network parameters.

Conclusion The proposed HK-DETR algorithm performs well at automatically detecting dangerous knife-carrying behavior by pedestrians in video captured by public security network cameras. Through the design changes described above, it addresses the challenges posed by diverse knife shapes and sizes, occlusion, and overlapping targets in complex scenes, and it improves detection precision, recall, and mAP.
Compared to the original RT-DETR algorithm and other mainstream detection models such as YOLOv5m, YOLOv8m, and Gold-YOLO-s, HK-DETR achieves notable performance improvements. This result underscores the algorithm's ability to maintain high efficiency and accuracy in diverse and complex environments, offering robust technical support for the field of public security surveillance. Within the realm of public safety, HK-DETR holds immense potential for widespread adoption in surveillance systems of public places like railway stations, airports, subway stations, and shopping malls, enabling real-time detection and early warning of potential knife-related dangers, thereby providing timely and effective information support for law enforcement agencies. Moreover, as technology continues to evolve and mature, the HK-DETR algorithm is poised to expand its reach into other domains, such as intelligent transportation and industrial automation, offering potent solutions to an array of practical problems.
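The Result section reports precision, recall, and mAP. Precision is TP / (TP + FP) and recall is TP / (TP + FN); average precision (AP) summarizes one class's precision-recall curve, and mAP averages AP over classes. The abstract does not state the IoU matching threshold or the interpolation scheme used, so the sketch below assumes detections have already been matched to ground truth (for example at IoU ≥ 0.5) and uses all-point interpolation as in PASCAL VOC 2010 and later. It is an illustration of the standard metric, not the paper's evaluation code, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_precision(is_tp: Sequence[bool], num_gt: int) -> float:
    """AP for one class with all-point interpolation.

    is_tp: one flag per detection, sorted by descending confidence; True if the
    detection matched a previously unmatched ground-truth box.
    num_gt: number of ground-truth boxes of this class (TP + FN).
    """
    if num_gt <= 0:
        return 0.0
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)        # TP / (TP + FN)
        precisions.append(tp / (tp + fp))  # TP / (TP + FP)
    # Pad the curve so it starts at recall 0 and ends at recall 1.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    # Precision envelope: each point takes the best precision at any higher recall.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the step curve; steps where recall does not change add zero.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(per_class: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """mAP: unweighted mean of per-class AP values."""
    aps = [average_precision(hits, n) for hits, n in per_class]
    return sum(aps) / len(aps) if aps else 0.0
```

Ranking a false positive above true positives lowers AP even when the detector eventually finds the same objects, which is why mAP reflects both precision and recall rather than either alone.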
Keywords
