Research progress in 3D reconstruction from large-scale outdoor images

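Yan Shen1, Zhang Maojun1, Fan Yachun2, Tan Xiaohui3, Liu Yu1, Peng Yang1, Liu Yuxiang1 (1. School of System Engineering, National University of Defense Technology, Changsha 410073; 2. School of Artificial Intelligence, Beijing Normal University, Beijing 100875; 3. Information Engineering College, Capital Normal University, Beijing 100048)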

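Abstract
Image-based 3D reconstruction aims to accurately recover the geometry of a real scene from a set of 2D multiview images. It is a fundamental and active research topic in computer vision and photogrammetry, with significant theoretical and practical value, and it is widely applied in smart cities, virtual tourism, digital heritage preservation, digital mapping, and navigation. With the spread of image acquisition systems (smartphones, consumer-grade digital cameras, civil drones, and the like) and the rapid growth of the Internet, large numbers of Internet images of an outdoor scene can be obtained through search engines. Using these images for efficient, robust, and accurate 3D reconstruction that gives users a realistic and immersive experience has become a research hotspot, has drawn broad attention from academia and industry, and has given rise to many methods. The emergence of deep learning offers new opportunities for 3D reconstruction from large-scale outdoor images. This paper first describes the basic serial pipeline of large-scale outdoor image 3D reconstruction, comprising image retrieval, image feature matching, structure from motion, and multiview stereo. It then systematically reviews, from the perspectives of traditional methods and deep learning-based methods, the development and application of the technology in each reconstruction subprocess, and summarizes the datasets and evaluation metrics suited to large-scale outdoor scenes for each subprocess. Finally, it introduces the current mainstream open-source and commercial 3D reconstruction systems and the state of the related industry in China.
Keywords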
Progress in the large-scale outdoor image 3D reconstruction

Yan Shen1, Zhang Maojun1, Fan Yachun2, Tan Xiaohui3, Liu Yu1, Peng Yang1, Liu Yuxiang1(1.School of System Engineering, National University of Defense Technology, Changsha 410073, China;2.School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China;3.Information Engineering College, Capital Normal University, Beijing 100048, China)

Abstract
3D reconstruction aims to accurately recover the geometry of a real scene. It is a fundamental and active research field in computer vision and photogrammetry, with important theoretical significance and application value. Acquiring 3D models is relevant to many applications, including smart cities, virtual tourism, digital heritage preservation, mapping, and navigation. Various technologies that enable 3D modeling have been developed, each with its own benefits and drawbacks for particular applications. They fall into two categories: active acquisition methods (e.g., LiDAR and radar) and passive ones (i.e., cameras). As passive sensors, cameras are especially power efficient and need no direct physical contact with the real world, and a 3D model can be effectively rebuilt from a set of 2D multiview images. In addition, as cameras have become commodity sensors in consumer devices, the cost of camera hardware has decreased significantly. Over the last decades, with the popularization of image acquisition systems (including smartphones, consumer-grade digital cameras, and civil drones) and the rapid development of the Internet, ordinary users can easily obtain a large number of Internet images of an outdoor scene through search engines (such as Google, Bing, or Baidu). Organizing and using these extremely rich and diverse data sources to perform efficient, robust, and accurate 3D reconstruction that gives users a realistic and immersive experience has become a research hotspot and has attracted widespread attention from academia and industry. For a human, building an accurate and complete 3D representation of the real world on the fly is natural, but abstracting the underlying problem in a computer program is extremely hard.
Many of the underlying problems in large-scale outdoor 3D reconstruction are now gradually being understood, but many others remain that the research community has not yet understood in depth. 3D modeling becomes feasible in a computer program by decomposing the entire reconstruction into several simpler subproblems. Thus far, a growing number and diversity of methods have been proposed to solve this challenging problem. Some researchers focus on the overall modeling problem, while more approaches address individual reconstruction subtasks. In particular, in recent years, modern convolutional neural network (CNN) models have achieved the best quality for object recognition, image segmentation, image translation, and other challenging computer vision problems. The emergence of deep learning provides new opportunities for, and increasing interest in, research on large-scale outdoor image 3D reconstruction. 3D reconstruction has developed rapidly from the traditional period to the deep learning era; however, to the best of our knowledge, no previous work has presented a detailed overview of recent progress in large-scale outdoor image 3D reconstruction. To summarize the rapid evolution of this field, traditional image-based 3D reconstruction approaches are presented and a comprehensive survey of recent learning-based developments is provided. Specifically, the basic serial pipeline of large-scale outdoor image 3D reconstruction, comprising image retrieval, image feature matching, structure from motion, and multiview stereo, is described. Then, traditional methods and deep learning-based methods are distinguished, and the development and application of large-scale outdoor image 3D reconstruction technology in each reconstruction subprocess are systematically and comprehensively reviewed.
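As a concrete illustration of one building block inside the structure from motion stage named above, the following is a minimal sketch of textbook linear (DLT) two-view triangulation. It is not taken from the surveyed systems; the function names `triangulate_dlt` and `project`, the intrinsic matrix, and the camera placement are all assumptions chosen for illustration, and the correspondence is noise-free.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (length 3) with a 3x4 camera matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    Each observation (u, v) under camera P contributes two rows,
    u * P[2] - P[0] and v * P[2] - P[1], to a homogeneous system A X = 0.
    The solution is the right singular vector of A with the smallest
    singular value.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]                 # homogeneous 3D point, defined up to scale
    return X_h[:3] / X_h[3]      # dehomogenize


# Hypothetical setup: shared intrinsics K, second camera shifted along the x axis.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 5.0])
x1, x2 = project(P1, X_true), project(P2, X_true)
X_est = triangulate_dlt(P1, P2, x1, x2)
```

In a full pipeline the two pixel observations would come from the feature matching stage and the camera matrices from pose estimation; real systems typically refine such linear estimates with nonlinear optimization, which this sketch leaves out.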
We show that, although deep learning-based methods have achieved overwhelming advantages in other computer vision and natural language processing tasks, geometry-based methods, which are adopted by some common 3D reconstruction systems, still deliver more robust and accurate performance in 3D reconstruction. This finding indicates that deep learning methods still have considerable room for improvement. Subsequently, the datasets and evaluation metrics applicable to large-scale outdoor scenes in each subprocess are summarized in detail. Furthermore, we introduce the datasets used in each subtask and present a comprehensive dataset specifically for 3D reconstruction. Finally, the current mainstream open-source and commercial 3D reconstruction systems and the development status of the related domestic industry are introduced. Although image-based 3D reconstruction technology has made great progress in the past 10 years, current methods still have the following problems: 1) For scenes with repeated textures (such as the Temple of Heaven), the structure from motion process fails, resulting in inaccurately registered camera poses and incomplete reconstructed models; for scenes with weak textures (such as lake surfaces and glass curtain walls), the multiview stereo process fails, resulting in holes in the reconstructed model. 2) Current 3D reconstruction systems take considerable time to reconstruct scenes (especially large-scale scenes) and remain far from real-time reconstruction. 3) The price of 3D sensors (such as LiDAR and ToF) has dropped significantly, bringing them closer to consumer applications; how to use these sensors to effectively compensate for the shortcomings of image-based 3D reconstruction is still an unsolved problem.
Keywords
