  • Published: 2024-08-28
A survey of viewpoint planning methods for object detection

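Wang Jianyu1, Zhu Feng2, Hao Yingming2, Wang Qun2, Zhao Pengfei2, Sun Haibo3 (1. Northeastern University; 2. Shenyang Institute of Automation, Chinese Academy of Sciences; 3. Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences)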

Abstract
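Object detection is one of the fundamental research directions in computer vision. When objects are densely placed or lighting is poor at acquisition time, images lose detail, and conventional object detection algorithms that take such images as input cannot produce results that meet task requirements. Viewpoint planning for object detection is an intelligent perception method developed to address this problem: it autonomously analyzes the factors that affect the detection task under the current conditions and adjusts the camera's pose parameters to avoid them, so that the target can be detected accurately. Viewpoint planning for object detection can support other areas of computer vision and will also bring convenience to future intelligent living. To reflect the state of research and the latest progress, this paper reviews the literature since 2007 and summarizes research methods from China and abroad. First, taking the scene dimensionality in which an algorithm is applied and the parameters it adjusts as the classification criteria, we divide viewpoint planning methods for object detection into three categories: planning methods based on two-dimensional pixel adjustment, planning methods based on movement in three-dimensional space, and planning methods that combine the two; this paper focuses on analyzing and summarizing the first two. Second, we explain the basic idea of each category and identify the key problems it must solve, then summarize and analyze the main research methods for solving these problems, along with their advantages and limitations. In addition, this paper briefly introduces the datasets and evaluation metrics available for each type of scene. Finally, building on the analysis of current methods, we discuss the challenges facing viewpoint planning for object detection and offer an outlook on future research.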
Keywords
Methods of view planning of object detection: a survey

(1. Shenyang Institute of Automation, Chinese Academy of Sciences; 2. Shanghai Institute of Microsystem and Information Technology)

Abstract
Object detection is one of the fundamental research directions in computer vision and a cornerstone of higher-level vision research. When objects are densely placed or lighting is poor, much detail is lost during image acquisition, and conventional object detection algorithms that take such images as input cannot meet task requirements. To solve this problem, intelligent perception methods based on viewpoint planning for object detection have emerged. They autonomously analyze the factors that affect the detection task under the current conditions and adjust the camera's pose parameters to avoid those effects and detect targets accurately. To reflect the research status and latest developments of viewpoint planning methods for object detection, this paper reviews and analyzes relevant studies since 2007 and summarizes research methods from China and abroad. For brevity, these methods are referred to as active object detection (AOD) in this article. According to the usage scenario, this paper divides AOD methods into three categories: AOD in two-dimensional scenes, AOD in three-dimensional scenes, and AOD combining the two. Since the third category is uncommon, this paper mainly introduces the first two. In two-dimensional scenes, AOD methods are divided into pixel-based methods and methods that simulate camera parameters, depending on whether a single pixel or the whole image is planned. The most important parts of the pixel-based approach are selecting the target pixel and planning the next pixel. Researchers typically use integral features, scale features, or key points (the parts of the target that differ most from the background) to locate where the target pixels are likely to be.
After the target pixel is located, the position of the next pixel is set according to the category of the region, which keeps consecutive frames continuous and avoids task failures caused by planning errors. For AOD methods that simulate camera parameters, different influencing factors cause different difficulties in object detection. Researchers have therefore designed different planning schemes by analyzing the types of influencing factors, and some excellent results have emerged in recent years. The growing popularity of mobile robots has brought AOD into a new setting: three-dimensional scenes. In a three-dimensional scene, the AOD method controls an intelligent agent that actively selects the next viewpoint pose in space to remove the influence of interference factors on the detection process. According to how much spatial location information is known, we classify three-dimensional scenes into those with known spatial relationships and those with unknown spatial relationships. In the first type, the placement of the target object and surrounding objects, the display of spatial category markers, and the range of viewpoint planning are all known, and the AOD method can plan viewpoints from this information. Here researchers focus on how to represent the relationships and how to select the next viewpoint in a fixed search space. The second type offers no auxiliary information, and the agent can rely only on its observations to select the next viewpoint. Unknown relationships are the more common case in real life, so designing AOD methods for this case is currently an active direction. Because the planning strategy in such scenes depends closely on the observed results, researchers have put considerable effort into describing observations in detail.
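To make the idea of selecting the next viewpoint from observations alone more concrete, the following is a minimal sketch rather than a method from the surveyed literature. Everything in it is an illustrative assumption: the one-dimensional world, the simulated sensor `observe`, the hand-written scoring rule `score`, and the stopping threshold. The planner never reads the target position directly; it sees only the detection confidence and the side of the image on which the target appears.

```python
from typing import List, Tuple

TARGET = 7  # hypothetical target position; used only by the simulated sensor


def observe(pose: int) -> Tuple[float, int]:
    """Simulated observation at a viewpoint: (detection confidence, image offset).

    Confidence drops linearly with distance to the target (an assumption).
    Offset is -1, 0, or +1 depending on which side the target appears.
    """
    distance = abs(TARGET - pose)
    confidence = max(0.0, 1.0 - 0.1 * distance)
    offset = (TARGET > pose) - (TARGET < pose)
    return confidence, offset


def score(observation: Tuple[float, int], action: int) -> int:
    """Hand-written evaluation of a candidate move (stands in for a learned policy).

    It prefers moving toward the side on which the target was observed.
    """
    _, offset = observation
    return action * offset


def plan(start: int, actions=(-1, 1), threshold: float = 0.85,
         max_steps: int = 20) -> Tuple[List[int], float]:
    """Greedily pick the best-scored move until the detection is confident enough."""
    pose = start
    path = [pose]
    observation = observe(pose)
    for _ in range(max_steps):
        if observation[0] >= threshold:
            break  # detection is good enough; stop moving
        action = max(actions, key=lambda a: score(observation, a))
        pose += action
        path.append(pose)
        observation = observe(pose)
    return path, observation[0]


if __name__ == "__main__":
    path, confidence = plan(start=0)
    print(path, confidence)
```

In a real system the scoring rule would typically be replaced by a learned or multi-element evaluation function, and the pose would be a full camera pose rather than a single integer.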
In AOD, observation information is usually referred to as the state expression; the more detailed the expression, the better the generated strategy. In addition, to evaluate the next viewpoint and correct the planning strategy, researchers have devoted much work to the evaluation function for the next view. AOD has two main objectives in unknown environments: path optimization and detection performance optimization. Depending on the types of evaluation factors, the evaluation function is generally either single-element or multi-element. Although multi-element evaluation is more accurate, the elements selected are not yet consistent across problems. Finding elements common to various scenarios in order to design a universal evaluation function remains a direction in which researchers can make breakthroughs. Beyond the analysis of these methods, this article also briefly introduces the datasets that AOD methods can use in different types of scenes. Viewpoint planning in two-dimensional scenes uses the same scenes as conventional object detection methods, so the datasets overlap considerably, including large-scale public datasets such as COCO and Pascal VOC. The evaluation metrics of the two kinds of methods are also largely the same, so performance can be compared directly. In three-dimensional scenes, motion must be taken into account, so comparing detection results alone on datasets such as AVD and T-LESS cannot determine whether the movement path is correct. Researchers have therefore adopted task success rate (SR) and average travel distance as the main metrics for measuring the effectiveness of an AOD algorithm. It should be emphasized that although viewpoint planning methods for object detection have achieved many excellent results, scene design and research methodology can still be improved.
First, real physical elements can be added to the scene design to turn the planning problem into an optimization problem under constraints. Second, methods suited to two-dimensional and three-dimensional scenes can be combined closely, so that more accurate detection is achieved by changing sensor parameters in areas the agent cannot reach. Finally, detection-oriented viewpoint planning methods usually output discrete actions and are tightly bound to the task, so viewpoint planning in continuous environments and a generic, task-independent viewpoint planning framework can also be considered future directions.
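The two indicators named in the abstract for three-dimensional scenes, task success rate (SR) and average travel distance, can be illustrated with a short sketch. The `Episode` record and the function names are hypothetical, and averaging the travel distance over all episodes (rather than only successful ones) is an assumption; individual papers may define it differently.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Episode:
    """One planning run: the viewpoints visited in order and whether detection succeeded."""
    path: List[Point]
    success: bool


def path_length(path: Sequence[Point]) -> float:
    """Sum of Euclidean distances between consecutive viewpoints."""
    return sum(dist(a, b) for a, b in zip(path, path[1:]))


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes in which the task succeeded."""
    return sum(e.success for e in episodes) / len(episodes)


def average_travel_distance(episodes: Sequence[Episode]) -> float:
    """Mean path length over all episodes (assumption: failures are included)."""
    return sum(path_length(e.path) for e in episodes) / len(episodes)


if __name__ == "__main__":
    runs = [
        Episode(path=[(0.0, 0.0), (3.0, 4.0)], success=True),
        Episode(path=[(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)], success=False),
    ]
    print(success_rate(runs), average_travel_distance(runs))
```

Reporting both numbers together reflects the two objectives mentioned above: a method that succeeds often but travels far is not necessarily better than one with a slightly lower success rate and much shorter paths.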
Keywords
