Research progress of 6DoF video technology

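Wang Xu1, Liu Qiong2, Peng Zongju3, Hou Junhui4, Yuan Hui5, Zhao Tiesong6, Qin Yi7, Wu Kejun8, Liu Wenyu2, Yang You2 (1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060; 2. School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074; 3. School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054; 4. Department of Computer Science, City University of Hong Kong, Hong Kong; 5. School of Control Science and Engineering, Shandong University, Jinan 250061; 6. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350300; 7. Huawei Technologies Co., Ltd., Shanghai 201206; 8. Centre for Information Sciences and Systems, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore)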

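Abstract
With the rise of the metaverse concept, a new generation of interactive media technology represented by six degree of freedom (6DoF) video has attracted wide attention from industry and academia. 6DoF video belongs to the field of multimedia communication. Through computational reconstruction, it offers users media interaction and content changes along multiple dimensions, including viewing angle, lighting, focal length, and field of view, so that users thousands of miles away can feel present in the scene and each receive a personalized experience. It overlaps strongly with the technical features of the metaverse, such as perception, computing, reconstruction, collaboration, and interaction. The technology stack covered by 6DoF video can therefore serve as an alternative technical framework for realizing the metaverse. This paper poses 40 problems in 10 areas of 6DoF video and summarizes the end-to-end 6DoF video technology chain as three macro stages: generation, distribution, and presentation. Around these three stages, it then reviews research progress in China and abroad on content acquisition and preprocessing, coding compression and transmission optimization, and interaction and presentation. For the content acquisition and preprocessing stage, it covers joint multiview acquisition, joint multiview and depth acquisition, and preprocessing of depth maps and point clouds. For the video compression and transmission stage, it covers multiview video coding, multiview plus depth video coding, light field image compression, focal stack image compression, point cloud coding and compression, and 6DoF video transmission optimization. For the interaction and display stage, it covers post-decoding filtering enhancement and virtual view synthesis. Finally, the paper discusses future trends in light of the current challenges in the field.
Keywords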
Research progress of six degree of freedom (6DoF) video technology

Wang Xu1, Liu Qiong2, Peng Zongju3, Hou Junhui4, Yuan Hui5, Zhao Tiesong6, Qin Yi7, Wu Kejun8, Liu Wenyu2, Yang You2 (1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; 2. School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China; 3. School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China; 4. Department of Computer Science, City University of Hong Kong, Hong Kong, China; 5. School of Control Science and Engineering, Shandong University, Jinan 250061, China; 6. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350300, China; 7. Huawei Technologies Co., Ltd., Shanghai 201206, China; 8. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore)

Abstract
Six degree of freedom (6DoF) video is characterized by interaction between the video content and its users. It responds to six kinds of user motion: translation along three axes (forward/backward, horizontal, and vertical) and rotation about them (pitch, yaw, and roll). Users can thereby change multiple audio-visual dimensions, including viewing perspective, lighting condition or direction, focal length or focus point, and field of view, through computational reconstruction or synthesis of the content. 6DoF video changes conventional video watching, in which user-video interaction is limited to switching between channels and the relations between video contents are restricted as well. The technique offers an immersive experience because the content a user receives stays consistent with the user's motion. For this reason, 6DoF video is regarded in academia and industry as an epoch-making type of video. Driven by the metaverse, it has also been recognized as a new generation of interactive media technology and as one of the key technologies for the metaverse and related domains. These features make the user experience deeply immersive and diversified, and they align closely with the technical features of the metaverse, such as perception, computing, reconstruction, collaboration, and interaction. 6DoF video originates from the framework of a typical multimedia communication system and follows the basic procedure of video-based multimedia communication: capture, content processing, compression, transmission, decoding, and display. Because it aims at intelligent human-terminal interaction, it goes beyond the traditional 3D video communication system, and its requirements for interaction range and intelligence are more complicated.
New techniques are therefore needed to support this new type of video. We present a technical framework for a 6DoF multimedia communication system in three stages: generation, distribution, and visualization. We identify forty scientific and technical challenges in this domain and categorize them into 10 directions. We review the literature on each of these 10 directions, covering content acquisition and pre-processing, coding compression and transmission optimization, and interaction and presentation. For content generation, the analysis focuses on three kinds of content: multiview video, multiview video plus depth, and point clouds. Acquisition systems fall into two types, multiview systems and multiview plus depth systems, and different types of content are obtained from them. Initially, multiview color video was captured without any auxiliary information to describe the 3D structure of the scene, which makes subsequent data processing challenging. Multiview plus depth systems were then proposed to handle this problem; their data can be classified into two types, color plus depth and point clouds. The volume of heterogeneous data remains a major challenge for these representations. Video compression techniques are then discussed according to the type of content: popular techniques for multiview video, multiview video plus depth, light fields, and point clouds are covered, including their origins, mechanisms, performance, and relevant standards. Transmission techniques for 6DoF video, applied once the video bitstream has been obtained, are reviewed next, including bit allocation, interaction-oriented transmission, and standards and protocols.
Quality evaluation and view synthesis for user-terminal interaction are analyzed as well. Together, these stages aim at a user-friendly "capture to display" 6DoF video system. Pixel-based methods are still being studied and optimized, but their computational cost remains a challenge. Recent learning-based methods focus more on terminal-oriented applications, especially view synthesis. To meet the requirements of practical applications, the 40 scientific and technical challenges mentioned above still need to be resolved.
Keywords
