Light field image depth acquisition guided by a 3D occlusion model
Abstract
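Objective A light field camera can sample a single scene from multiple viewpoints simultaneously with one exposure, which gives it unique advantages in depth estimation. Eliminating the influence of occlusion is one of the difficulties of light field depth estimation. Existing methods detect the occlusion state of each viewpoint on the basis of a 2D scene model, but occlusion depends on the 3D model of the sampled scene and cannot be detected precisely from a 2D model alone; imprecise occlusion detection reduces the accuracy of the subsequent depth estimation. To address this problem, a light field image depth acquisition method guided by a 3D occlusion model is proposed. Method Foreground-background relations and depth differences between different objects are added to the 2D model to obtain a 3D model of the scene. The occlusion state of every viewpoint is then inferred in the 3D model from the propagation paths of light rays and recorded in an occlusion map. Guided by the occlusion map, different cost volumes are used for depth estimation in occluded and non-occluded regions. In occluded regions, occluded viewpoints are masked by the occlusion map and depth is computed from the photo-consistency of the remaining viewpoints. In non-occluded regions, a new defocus grid matching cost volume is designed on the basis of the depth continuity of these regions; compared with traditional cost volumes, it perceives color textures over a wider range and thus yields a smoother depth map. To further improve depth accuracy, a joint optimization framework based on the expectation-maximization (EM) algorithm is designed according to the dependence between occlusion detection and depth estimation; within this framework, the occlusion map and the depth map guide each other and improve each other's accuracy in turn. Result Experimental results show that, in most experimental scenes, the proposed method achieves the best results in both occlusion detection and depth estimation for single occlusion, multi-occlusion, and low-contrast occlusion. The mean square error (MSE) is reduced by about 19.75% on average relative to the second-best result. Conclusion For depth estimation of scenes with occlusion, theoretical analysis and experimental verification show that the 3D occlusion model has certain advantages over the traditional 2D occlusion model in occlusion detection, and the proposed method is better suited to depth estimation of scenes with complex occlusion.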
Keywords
Light field depth estimation guided by 3D occlusion model
Wu Di, Zhang Xudong, Zhang Jun, Fan Zhiguo, Sun Rui (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China)
Abstract
Objective Depth estimation from multiple images is a central task in computer vision. Reliable depth information is a valuable input for visual tasks such as target detection, image segmentation, and special effects for movies. As one of the new multi-view image acquisition devices, the light field camera makes acquiring multi-view image data more convenient. A light field camera can simultaneously sample a scene from multiple viewpoints with a single exposure, which gives it unique advantages in portability and depth accuracy over other depth sensors. Occlusion is a challenging issue for light field depth estimation. For a non-occluded pixel on a Lambertian surface, the angular patch corresponding to this pixel exhibits photo-consistency when refocused to its correct depth. However, an occluder prevents some viewpoints from sampling the same point, so photo-consistency fails to hold at occluded pixels. If the occluded viewpoints are accurately excluded, the photo-consistency of the remaining viewpoints can still be guaranteed. Identifying the occluded viewpoints in the angular patch is therefore crucial for accurate depth estimation. Previous works detected occlusion on the basis of the 2D model (RGB image) of the scene. However, occlusion is determined by the scene's 3D model, and it cannot be accurately detected using only the 2D model. Inaccurate occlusion detection degrades the quality of depth estimation. In this study, we present a light field depth estimation algorithm that is robust to occlusion. Method First, we reconstruct the 3D scene model by adding the foreground-background relation and the depth difference between different objects to the 2D model. On the basis of the 3D model, we directly calculate the occlusion state of each view and record it in the occlusion map. Further analysis demonstrates that the generated occlusion map can exclude all occluded viewpoints.
With the occlusion map, the scene can be divided into occluded and non-occluded regions, so that a more appropriate cost function can be adopted in each region. In this study, a spatial point that is visible in only a subset of viewpoints is assigned to the occluded region, and all remaining spatial points are assigned to the non-occluded region. In the occluded regions, we exclude the occluded viewpoints using the occlusion map and build the cost volume on the basis of the photo-consistency of the remaining viewpoints. In the non-occluded regions, on the basis of the depth continuity of these regions, we design a defocus grid matching cost function that captures textures over a wider area than traditional methods. A wider capture range lets the cost function collect more information, which increases its robustness. To propagate the information of high-confidence points to low-confidence points, every slice of the final data cost volume is filtered with an edge-preserving filter. Compared with graph-based optimization, the filter-based method is more efficient and easier to parallelize. Moreover, because our occlusion map has already excluded the possible occlusions, the filter-based method is sufficient for most scenes. The initial disparity label is generated from the filtered cost volume using the winner-takes-all method. Finally, we exploit the dependence between the occlusion map and the depth map to further improve the accuracy of depth estimation. That is, the depth map helps reconstruct the 3D model required for occlusion detection, and the occlusion map helps the cost function exclude the occluded viewpoints. On the basis of this dependence, we integrate occlusion detection and depth estimation into an expectation-maximization-based optimization framework that alternately improves the accuracy of the occlusion map and the depth map.
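As a rough illustration of the occluded-region cost and the winner-takes-all step described above, the following minimal sketch scores each candidate depth label by the intensity variance over the viewpoints that the occlusion map marks as visible, then picks the label with the lowest cost. It is not the implementation used in this study: the function names (`masked_cost`, `winner_takes_all`), the use of variance as the photo-consistency measure, the scalar intensities, and the toy data are all assumptions made for the example, and the edge-preserving filtering step is omitted.

```python
from statistics import pvariance


def masked_cost(angular_patch, visible):
    """Photo-consistency cost of one pixel at one candidate depth label.

    angular_patch: intensities of the pixel in each viewpoint after
                   refocusing to the candidate depth (hypothetical input).
    visible:       booleans taken from an occlusion map; True means the
                   viewpoint is NOT occluded for this pixel.
    """
    kept = [value for value, ok in zip(angular_patch, visible) if ok]
    if len(kept) < 2:
        return float("inf")  # too few views to judge consistency
    return pvariance(kept)  # low variance = consistent views


def winner_takes_all(patches_per_label, visible):
    """Return the index of the depth label with the lowest masked cost."""
    costs = [masked_cost(patch, visible) for patch in patches_per_label]
    return min(range(len(costs)), key=costs.__getitem__)


# Toy data: 3 candidate labels, 4 viewpoints; the last view sees an occluder.
patches = [
    [0.2, 0.5, 0.9, 0.1],  # label 0: wrong depth, inconsistent everywhere
    [0.5, 0.5, 0.5, 0.9],  # label 1: correct depth, last view is occluded
    [0.4, 0.6, 0.5, 0.5],  # label 2: wrong depth, mildly inconsistent
]
occlusion_row = [True, True, True, False]

label_with_mask = winner_takes_all(patches, occlusion_row)
label_without_mask = winner_takes_all(patches, [True] * 4)
```

In this toy case the masked cost selects label 1, whereas using all four views selects label 2, because the occluded view inflates the variance at the correct depth. This is the failure mode that excluding occluded viewpoints is meant to avoid.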
Result Experiments are conducted on the HCI (Heidelberg Collaboratory for Image Processing) synthetic dataset and on the Stanford Lytro Illum dataset of real scenes. To ensure fairness, the number of depth labels of all cost-volume-based algorithms is uniformly set to 75. For quantitative evaluation, we use the percentage of bad pixels and the mean square error to compare the algorithms. We also compare our occlusion detection method with state-of-the-art methods. Instead of evaluating the occlusion map of a single angular patch, we evaluate the occlusion maps of all angular patches around the occlusion boundary. This evaluation requires the algorithm to respond correctly to all degrees of occlusion. The experimental results show that, in most test scenes, the proposed method outperforms other state-of-the-art methods in both occlusion detection and depth estimation for single occlusion, multi-occlusion, and low-contrast occlusion. Compared with the second-best method, our mean square error is reduced by about 19.75% on average. Conclusion For the depth estimation of scenes with occlusion, the superiority of the proposed 3D occlusion model is demonstrated through theoretical analysis and experimental verification. The proposed depth estimation algorithm is more suitable for scenes with complex occlusion.
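The two quantitative measures named above can be sketched as follows. This is only an illustration under stated assumptions: depth maps are flattened to plain lists, the function names are hypothetical, and the bad-pixel threshold of 0.07 is an assumed value, since the threshold used in this study is not stated here.

```python
def mean_square_error(estimate, truth):
    """Mean of squared differences between estimated and true depth."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


def bad_pixel_percentage(estimate, truth, threshold=0.07):
    """Percentage of pixels whose absolute error exceeds the threshold.

    The default threshold is an assumption made for this sketch.
    """
    bad = sum(abs(e - t) > threshold for e, t in zip(estimate, truth))
    return 100.0 * bad / len(truth)


# Toy example: four pixels, one of which is off by 0.5.
estimated = [1.0, 2.0, 3.0, 4.0]
ground_truth = [1.0, 2.0, 3.5, 4.0]

mse = mean_square_error(estimated, ground_truth)      # 0.0625
bad = bad_pixel_percentage(estimated, ground_truth)   # 25.0
```

The two measures are complementary: the mean square error is sensitive to the size of large errors, while the bad-pixel percentage only counts how many pixels exceed the tolerance.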
Keywords