Anti-specular light-field depth estimation method

Wang Cheng, Zhang Jun, Gao Jun (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China)

Abstract
Objective A light field camera captures the spatial and angular information of light rays in a scene in a single exposure, which provides the basis for depth estimation. However, specular highlights in light field scenes make depth estimation difficult. To improve the reliability of algorithms in handling highlights, this paper proposes an anti-specular depth estimation method based on the multi-view context information of light field images. Method Exploiting the multi-view nature of light field sub-aperture images, we create multi-view input branches to obtain image features under different viewpoints. Dilated convolutions enlarge the network receptive field so that a wider range of image context can be captured, and the depth of highlight regions is then recovered from the depth information of non-specular regions on the same depth plane. In addition, we design a novel multi-scale feature fusion method that concatenates dilated convolution features with multiple dilation rates and ordinary convolution features with multiple kernel sizes, further improving the accuracy and smoothness of the estimation results. Result Experiments compare our method with four state-of-the-art methods on three datasets. The results show that our method achieves good overall depth estimation performance: on the 4D light field benchmark synthetic dataset, compared with the second-best model, the mean square error (MSE) is reduced by 20.24%, the bad pixel rate (BP) is reduced by 2.62%, and the peak signal-to-noise ratio (PSNR) is improved by 4.96%. Qualitative analyses on the CVIA (computer vision and image analysis) Konstanz specular synthetic dataset and a real scene dataset captured with a Lytro Illum further verify the effectiveness and reliability of the proposed algorithm. Ablation results show that the multi-scale feature fusion method improves depth estimation in highlight regions. Conclusion The proposed depth estimation model can effectively estimate image depth information. In particular, it recovers depth in highlight regions with high accuracy, produces smooth object edge regions, and preserves image detail well.
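To make the receptive-field argument concrete, here is a minimal Keras sketch (our own illustration, not the paper's released code) of how a dilated convolution enlarges the receptive field without adding parameters; the feature-map shape, channel count, and dilation rate are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# A 3x3 convolution with dilation_rate=2 covers a 5x5 neighborhood,
# so the receptive field grows with no extra parameters. This lets
# the network reach non-specular pixels on the same depth plane.
x = layers.Input(shape=(64, 64, 32))   # feature map from one view branch
y = layers.Conv2D(32, kernel_size=3,
                  dilation_rate=2,     # enlarged receptive field
                  padding="same",
                  activation="relu")(x)
model = tf.keras.Model(x, y)
model.summary()
```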
Keywords
Anti-specular light-field depth estimation algorithm

Wang Cheng, Zhang Jun, Gao Jun(School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China)

Abstract
Objective Image depth, which refers to the distance from a point in a scene to the center plane of a camera, reflects the 3D geometric information of a scene. Reliable depth information is important in many visual tasks, including image segmentation, target detection, and 3D surface reconstruction. Depth estimation has thus become one of the most important research topics in computer vision. With the development of sensor technology, light field cameras, as new multi-angle image acquisition devices, have made light field data much easier to acquire. These cameras can simultaneously capture the spatial and angular information of a scene and show unique advantages in depth estimation. At present, most available light field depth estimation methods can obtain highly accurate depth information in many scenes. However, these methods implicitly assume that objects have Lambertian surfaces, that is, surfaces with a uniform reflection coefficient. When specular reflection or non-Lambertian surfaces appear in a scene, depth information cannot be accurately obtained. Specular reflection is commonly observed in real-world scenes when light strikes the surface of an object such as metal, plastic, ceramic, or glass. Specular reflection tends to change the color of an object and obscure its texture, thereby causing local information loss. Previous studies have shown that the specular region changes along with the angle of view. Furthermore, the location of the specular area can be inferred from the context information of its surroundings. Inspired by these observations, we propose an anti-specular depth estimation method based on the context information of the light field image, which improves the reliability of the algorithm in handling specular reflection. Method Based on how an image changes with the angle of view, we design our network around the light field geometry: we select the horizontal, vertical, left-diagonal, and right-diagonal directions and create four independent yet identical sub-aperture image processing branches. In this configuration, the network generates four direction-independent depth feature representations that are combined at a later stage. Under a fixed illumination direction, owing to occlusion by foreground objects or the incident angle of the light, not all areas of a smooth surface at the same depth level appear as highlights. In addition, the degree of specular reflection varies across a smooth surface, indirectly revealing its geometric characteristics. Therefore, we process each sub-aperture image branch with dilated convolutions, which expand the network receptive field. The constructed network obtains a wide range of image context information and then restores the depth information of the specular region. To improve the depth estimation accuracy in the specular area, we apply a novel multi-scale feature fusion method in which multi-rate dilated convolution features are concatenated with multi-kernel ordinary convolution features to obtain the fused features. To enhance the robustness of our depth estimation, we use a series of residual modules to reintroduce part of the feature information lost by earlier convolution layers, learn the relationships among the fused features, and encode these relationships into higher-dimensional features.
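The abstract does not include code, so the following Keras sketch only illustrates the described structure under our own assumptions about channel counts, kernel sizes, dilation rates, and view-stack shapes: four directional sub-aperture branches, a fusion step that concatenates multi-rate dilated features with multi-kernel ordinary convolution features, and a small residual module.

```python
import tensorflow as tf
from tensorflow.keras import layers

def view_branch(x):
    """One directional sub-aperture branch (horizontal, vertical,
    or diagonal view stack); depth and width are illustrative."""
    for _ in range(2):
        x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    return x

def multi_scale_fusion(x):
    """Concatenate multi-rate dilated features with multi-kernel
    ordinary convolution features (rates/kernels are assumptions)."""
    dilated = [layers.Conv2D(32, 3, dilation_rate=r, padding="same",
                             activation="relu")(x) for r in (2, 4, 8)]
    plain = [layers.Conv2D(32, k, padding="same",
                           activation="relu")(x) for k in (1, 3, 5)]
    return layers.Concatenate()(dilated + plain)

def residual_block(x, filters=64):
    """Residual module that reintroduces features from earlier layers."""
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    x = layers.Conv2D(filters, 1, padding="same")(x)  # match channel count
    return layers.ReLU()(layers.Add()([x, y]))

# Four directional view stacks, e.g. 9 grayscale views of 64x64 pixels
# stacked along the channel axis (an assumed input layout).
inputs = [layers.Input(shape=(64, 64, 9)) for _ in range(4)]
feats = layers.Concatenate()([view_branch(i) for i in inputs])
feats = multi_scale_fusion(feats)
feats = residual_block(feats)
depth = layers.Conv2D(1, 3, padding="same")(feats)  # per-pixel disparity
model = tf.keras.Model(inputs, depth)
```

The four branches share a structure but not weights ("independent yet identical"), so each learns a direction-specific depth cue before fusion.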
We use TensorFlow as our training backend, build our network with Keras, use RMSprop as our optimizer, and set the batch size to 16. We initialize the model parameters with Glorot uniform initialization and set the initial learning rate to 1E-4, which decreases to 1E-6 as the number of iterations grows. We use the mean absolute error (MAE) as our loss function given its robustness to outliers. Our experimental machine has an Intel i7-5820K@3.30 GHz processor and a GeForce GTX 1080Ti GPU. Training the network for 200 epochs takes approximately 2 to 3 days. Result The 4D light field benchmark synthetic scene dataset was used for the quantitative experiments, and the computer vision and image analysis (CVIA) Konstanz specular synthetic scene dataset and a real scene dataset captured by a Lytro Illum were used for the qualitative experiments. We used three evaluation criteria in the quantitative experiments, namely, mean square error (MSE), bad pixel rate (BP), and peak signal-to-noise ratio (PSNR). The results show that the proposed method improves depth estimation. On the 4D light field benchmark synthetic dataset, compared with the second-best model, our method reduces the MSE by 20.24%, lowers the BP (threshold 0.07) by 2.62%, and raises the PSNR by 4.96%. Meanwhile, in our qualitative analysis of the CVIA Konstanz specular synthetic dataset and the real scene dataset captured by the Lytro Illum, the proposed algorithm achieves favorable depth estimation results, verifying its effectiveness in recovering depth information in specular highlight regions. We also perform ablation experiments on the receptive field expansion and residual feature coding modules and find that the multi-scale feature fusion method improves depth estimation in highlight areas and that the residual structure further improves the results. Conclusion Our model can effectively estimate image depth information. It recovers highlight-region depth information with high accuracy, produces smooth object edge regions, and efficiently preserves image detail information.
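A hedged sketch of the stated training configuration (RMSprop, batch size 16, Glorot uniform initialization, MAE loss, learning rate decaying from 1E-4 toward 1E-6). The abstract does not specify the decay schedule, so the exponential schedule and its parameters below are assumptions, and the tiny model is only a placeholder for the full network.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder model standing in for the full network. Glorot uniform
# is the default Keras kernel initializer, matching the stated setup.
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, padding="same", activation="relu",
                  input_shape=(64, 64, 9)),
    layers.Conv2D(1, 3, padding="same"),
])

# Learning rate decays from 1e-4 toward 1e-6 with iterations; the
# exact schedule is not given, so exponential decay is an assumption.
lr = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4, decay_steps=10_000, decay_rate=0.96)
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=lr),
              loss="mae")  # mean absolute error, robust to outliers
# model.fit(views, disparity, batch_size=16, epochs=200)
```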
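For reference, a minimal NumPy sketch (our own, not from the paper) of the three reported metrics. The 0.07 bad-pixel threshold follows the 4D light field benchmark convention mentioned in the text; using the ground-truth dynamic range as the PSNR peak is an assumption.

```python
import numpy as np

def mse(pred, gt):
    """Mean square error between predicted and ground-truth disparity."""
    return np.mean((pred - gt) ** 2)

def bad_pixel(pred, gt, thresh=0.07):
    """Fraction of pixels whose disparity error exceeds the threshold."""
    return np.mean(np.abs(pred - gt) > thresh)

def psnr(pred, gt, peak=None):
    """Peak signal-to-noise ratio in dB; peak defaults to the
    ground-truth dynamic range (an assumption)."""
    peak = np.ptp(gt) if peak is None else peak
    return 10.0 * np.log10(peak ** 2 / mse(pred, gt))
```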
Keywords
