  • Published: 2024-09-12
  • Abstract views: 21
  • Full-text downloads: 11
A survey of deep learning-based methods for anomaly detection in surveillance video

Wang Yang1, Zhou Jiaogen2, Yan Jun1, Guan Jihong1 (1. Tongji University; 2. Huaiyin Normal University)

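Abstract
Monitoring anomalies through surveillance video plays a vital role in social governance, so video anomaly detection has long been a closely watched and challenging topic in computer vision. From a deep learning perspective, this paper classifies and reviews the key current video anomaly detection methods. First, it gives a full account of how video anomalies are defined, including how anomalies are delineated and how they are categorized by type. It then analyzes the progress of fully supervised, weakly supervised, and unsupervised deep learning methods in video anomaly detection, discusses their respective strengths and weaknesses, and pays particular attention to the latest research that incorporates large models. Next, it describes commonly used and recent datasets in detail, compares their characteristics, and illustrates them with screenshots. Finally, it introduces several anomaly determination and performance evaluation criteria and compares the performance of the algorithms. On this basis, the paper looks ahead to possible directions for future datasets, evaluation criteria, and methods, with particular emphasis on the new opportunities that large models bring to video anomaly detection. In sum, the paper helps deepen readers' understanding of video anomaly detection and offers guidance for future research.
Keywords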
Survey of anomaly detection methods in surveillance videos based on deep learning

Wang Yang2, Zhou Jiaogen1, Yan Jun2, Guan Jihong2 (1. Huaiyin Normal University; 2. Tongji University)

Abstract
Detecting anomalies in surveillance footage plays a crucial role in social governance, which makes video anomaly detection a significant and challenging topic in computer vision. This paper classifies and reviews the key current video anomaly detection methods from a deep learning perspective and analyzes existing technical challenges and future development trends. First, the paper introduces the definition of video anomalies: how anomalies and video anomalies are delineated, the five types of video anomalies (intuitive anomalies, action change anomalies, trajectory change anomalies, group change anomalies, and spatiotemporal anomalies), and the three characteristics of anomaly detection (abstraction, uncertainty, and sparsity). The paper then traces the development of video anomaly detection research from 2008 to the present using the DataBase systems and Logic Programming (DBLP) database, and analyzes in detail the progress of fully supervised, weakly supervised, and unsupervised deep learning methods in this field. The core innovations, structures, advantages, and disadvantages of each method are discussed, with particular attention to the latest research involving large models. For instance, some studies address the challenge of applying virtual anomaly video datasets to real-world scenarios by designing anomaly prompts that guide mapping networks to generate unseen anomalies in real-world settings. Other works design dual-branch model structures based on multimodal large model frameworks.
One branch uses the Contrastive Language–Image Pre-training (CLIP) visual encoding module for coarse-grained binary classification, while the other branch aligns the textual features of anomaly category labels with the visual encoding features for fine-grained anomaly classification; this design is reported to surpass the previous state of the art in video anomaly detection. Other research explores the potential of GPT-4V, a powerful large vision-language model, for general anomaly detection, examining multimodal and multi-domain tasks that cover image, video, point cloud, and time-series data in areas such as industry, healthcare, logic, video, 3D anomaly detection, and localization. The introduction of large models brings both new opportunities and new challenges for video anomaly detection. The paper also introduces 10 commonly used and recent datasets, compares their characteristics, illustrates their content with figures, and provides the corresponding download links. It then introduces four anomaly determination standards (including frame-based, pixel-based, and trajectory-based criteria) and three performance evaluation standards (Area Under the Receiver Operating Characteristic Curve (AUC), Equal Error Rate (EER), and Average Precision (AP)), and compares the performance of various algorithms. We summarize the strengths and weaknesses of current video anomaly detection algorithms and propose suggestions for improvement. Based on this analysis, we argue that datasets may have become a bottleneck in the development of current methods: methods developed only on simple scenes may not handle anomalies effectively in complex real-world scenarios.
To support further research, future datasets should better reflect real-world anomalies, for example by collecting data from the remote sensing field, by using models to improve the quality of existing image and video data, and by collecting multi-camera, multi-dimensional annotated data, so that more diverse and challenging anomaly events can be detected. In terms of evaluation standards, common practice is to compute the true positive rate and false positive rate and then the area under the Receiver Operating Characteristic curve. In practical applications, however, a method may achieve a high AUC and still produce a high false alarm rate, because the true positive rate and false positive rate depend directly on which anomaly determination method is used. This paper therefore proposes designing an evaluation system that considers AUC and false alarm rate together. Finally, the outlook emphasizes the new opportunities that large models bring to video anomaly detection. The emergence of large models in recent years has significantly improved the performance of deep learning-based methods on commonly used video anomaly detection datasets, and the field has accumulated a solid research foundation. Future research should therefore not only improve detection performance but also apply these methods to practical problems. It should aim for more fine-grained and general models, using the rich prior knowledge of large models to gradually develop video anomaly detection models that can distinguish specific types of anomalies.
With the strong multimodal understanding capabilities of large models, video anomaly detection models are expected to become more general, ultimately blurring the boundaries between supervised, weakly supervised, and unsupervised learning methods. In summary, through a systematic review and analysis of existing research, this paper aims to deepen readers' understanding of video anomaly detection and to provide references and guidance for future research directions.
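The three performance measures named in the abstract (AUC, EER, and AP), and the concern that a high AUC can coexist with many false alarms, can be illustrated with a small frame-level sketch. This is not code from the paper: the function names, the toy labels and scores, and the 0.25 threshold are all assumptions chosen for illustration, and frame-level labelling is only one of the anomaly determination standards the paper lists.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Sweep the decision threshold from high to low and collect ROC points."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one normal and one anomalous frame")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once a run of tied scores is complete.
        if rank == len(order) - 1 or scores[order[rank + 1]] != scores[i]:
            points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def eer(points: Sequence[Point]) -> float:
    """Equal error rate: the point where FPR equals the miss rate (1 - TPR)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        g0 = x0 - (1 - y0)  # FPR minus miss rate at the segment start
        g1 = x1 - (1 - y1)  # and at the segment end
        if g0 <= 0 <= g1:
            if g1 == g0:
                return x0
            t = -g0 / (g1 - g0)  # linear interpolation along the segment
            return x0 + t * (x1 - x0)
    raise ValueError("ROC curve never crosses the equal-error line")


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Sum of precision weighted by the recall gained at each distinct threshold."""
    pos = sum(labels)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, prev_recall, ap = 0, 0.0, 0.0
    for rank, i in enumerate(order):
        tp += labels[i]
        if rank == len(order) - 1 or scores[order[rank + 1]] != scores[i]:
            recall = tp / pos
            ap += (recall - prev_recall) * (tp / (rank + 1))
            prev_recall = recall
    return ap


def false_alarm_rate(labels: Sequence[int], scores: Sequence[float],
                     threshold: float) -> float:
    """Fraction of normal frames flagged as anomalous at a fixed threshold."""
    normal = [s for y, s in zip(labels, scores) if y == 0]
    return sum(s >= threshold for s in normal) / len(normal)


# Toy data (hypothetical): 1 marks an anomalous frame, scores are anomaly scores.
labels = [0, 0, 1, 1, 0, 1, 0, 0]
scores = [0.1, 0.2, 0.8, 0.7, 0.4, 0.3, 0.35, 0.05]
pts = roc_points(labels, scores)
print("AUC:", round(auc(pts), 3))
print("EER:", round(eer(pts), 3))
print("AP:", round(average_precision(labels, scores), 3))
print("False alarm rate at 0.25:", false_alarm_rate(labels, scores, 0.25))
```

On this toy data the ranking looks good (AUC of about 0.87), yet the 0.25 threshold needed to catch all three anomalous frames also flags 2 of the 5 normal frames. Reporting a threshold-dependent false alarm rate next to the AUC is one simple way to surface that gap; the paper's own proposed evaluation system is not specified in the abstract.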
Keywords
