Frontiers of compressive sensing video stream technology for uplink streaming media
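Abstract
Uplink streaming media is showing increasingly important emerging strategic value in the civil-military integration field. For uplink streaming media applications, the compressive sensing video stream technology system offers distinctive advantages such as low front-end power consumption, good error resilience, and applicability to a wide range of signals, and it has become one of the frontiers and hot topics of current visual communication research. Starting from the application characteristics of uplink streaming media, this paper analyzes the basic theories and key technologies of compressive sensing video streams in terms of performance metrics, parallel block computational imaging, low-complexity video encoding, video reconstruction, and semantic quality evaluation, and investigates and compares the related research progress in China and abroad. Compressive sensing video streams for uplink streaming media face technical challenges such as observation efficiency that is hard to control, difficult bitstream adaptation, and low reconstruction quality. Looking ahead at the technology development trend of compressive sensing video streams, future work will, through the division of labor and cooperation between the front end and an intelligent cloud end, break through key technologies such as high-efficiency video observation and semantic-quality-guided video reconstruction, further developing the quantitative advantages and evolution paths of compressive sensing video streams in uplink streaming media applications.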
Keywords
Survey on compressive sensing video stream for uplink streaming media
Liu Hao, Huang Rong, Yuan Haodong (College of Information Science and Technology, Donghua University, Shanghai 201620, China)
Abstract
Uplink streaming media has emerging strategic value in the civil-military integration field. For uplink streaming media applications, compressive sensing video streams offer technological advantages in terms of a low-complexity terminal, good error resilience, and applicability to a wide range of signals. This technology is becoming one of the main topics in visual communication research. The compressive sensing video stream is a new type of visual communication whose functional modules mainly consist of front-end video observation and cloud-end video reconstruction. Its core technology has not yet matured to the point where it can be standardized. When uplink streaming media provides a large number of video sensing signals not for human viewing but for universal machine vision, a compressive sensing video stream uses a new signal-processing mechanism that avoids a shortcoming of existing uplink streaming technologies, which first acquire additional information and then discard it. Based on the application characteristics of uplink streaming media, this study analyzes the basic theories and key technologies of compressive sensing video streams, namely performance metrics, parallel block computational imaging, low-complexity video encoding, video reconstruction, and semantic quality evaluation. The latest research progress is also investigated and compared in this survey. The video sensing signal is usually divided into groups of frames (GOFs), and each GOF is further divided into a key frame and several non-key frames. Because block compressive sensing (BCS) requires fewer sensing and storage resources at the front end, it not only enables a lightweight observation matrix but also allows the observations to be transmitted block by block or in parallel. In a compressive sensing video stream, the GOF-BCS block array denotes the set of all BCS blocks in a GOF.
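As a rough illustration of block-based observation over a GOF, the following minimal Python sketch splits every frame into non-overlapping blocks and measures each block with one shared Gaussian observation matrix. The function name `bcs_observe_gof`, the Gaussian matrix, the single sampling rate for all frames, and the requirement that frame dimensions be multiples of the block size are assumptions made for this example only; they are not taken from the surveyed schemes.

```python
import numpy as np

def bcs_observe_gof(gof, block_size=8, sampling_rate=0.25, seed=0):
    """Observe all BCS blocks of a GOF with one shared random matrix.

    gof: array of shape (frames, height, width). Height and width are
    assumed to be multiples of block_size (simplifying assumption).
    Returns (phi, measurements), where phi has shape (m, n) with
    n = block_size**2 and measurements has shape
    (frames, blocks_per_frame, m).
    """
    frames, height, width = gof.shape
    n = block_size * block_size
    m = max(1, int(round(sampling_rate * n)))  # measurements per block

    # Hypothetical choice: i.i.d. Gaussian observation matrix.
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal((m, n)) / np.sqrt(m)

    # Cut each frame into blocks and flatten each block row by row.
    blocks = gof.reshape(frames, height // block_size, block_size,
                         width // block_size, block_size)
    blocks = blocks.transpose(0, 1, 3, 2, 4).reshape(frames, -1, n)

    # Every block is observed independently: y = phi @ x.
    measurements = blocks @ phi.T
    return phi, measurements
```

Because the same small matrix is applied to every block, the front end stores only `m * n` coefficients, and the per-block observation vectors can be produced and sent independently of one another.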
Existing compressive sensing video streams adopt a technical framework of single-frame observation, open-loop encoding, and fidelity-guided reconstruction. The study results show that, for uplink streaming media, existing compressive sensing video streams face bottleneck problems such as uncontrollable observation efficiency, lack of bitstream adaptation, and low reconstruction quality. Therefore, the technology development trend of compressive sensing video streams has to be examined. Future research on compressive sensing video streams is expected to focus on the following aspects. 1) Efficiency-optimized GOF-BCS block-array layout. Existing compressive sensing video streams use only a simple combination of GOF frame number, BCS block size, and sampling rate, which is one special layout of the GOF-BCS block array. This special layout lacks a proof of rationality. Therefore, various block-array layouts and spatial-temporal partitions need to be compared and analyzed, and a universally optimized GOF-BCS block array then designed to quickly generate observation vectors with more spatiotemporal semantics. At the same time, this approach is conducive to the hierarchical sparse modeling of video reconstruction. 2) Observation control and bitstream adaptation of the video sensing signal. During video encoding, there is a trade-off between the sampling rate and the quantization depth. An important task for subsequent studies is to construct the distribution model of the observation vectors and adaptively adjust the sampling rate and quantization depth. Based on an efficiency-optimized GOF-BCS block array, a novel compressive sensing video stream may improve the observation efficiency at the front end and adapt to both low-complexity encoding and wireless transmission.
Through the dynamic interaction between source and channel at the front end, feedback coordination is formed between video observation and wireless transmission, and the front-end complexity may be quantitatively controlled. 3) During video reconstruction, an important methodology is to obtain the sparse solution of the underdetermined system through prior modeling. When the hierarchical sparse model cannot stably represent the observation vectors, a data-driven reconstruction mechanism can compensate for the deficiency of prior modeling. Future research will construct the generation and recovery mechanism of partially reversible signals and explore a hybrid reconstruction mechanism that combines the hierarchical sparse model with a deep neural network (DNN). 4) Semantic quality assessment model for any reconstructed block array. At present, the quality evaluation of reconstructed videos is limited to pixel-level fidelity. For universal machine vision, video reconstruction relies more on semantic quality evaluation. On the basis of sparse residual prediction reconstruction, the cloud end gradually adds data-driven reconstruction by DNN. By integrating the semantic quality assessment model, a video reconstruction mechanism with memory learning may be provided at the cloud end. 5) A new technical framework will combine high-efficiency observation and semantic-guided hybrid reconstruction. One important research direction is to construct an effective division of labor and cooperation between the front and cloud ends. Besides a complexity-controllable front end, the new technical framework should demonstrate higher semantic quality in video reconstruction and enhance the interpretability of compressive sensing deep learning. For video sensing signals with dynamic scene changes, the new technical framework can balance the observation distortion, bitrate, and power consumption at any resource-constrained front end.
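The idea in item 3) of obtaining a sparse solution of an underdetermined system can be illustrated with a generic greedy solver. The sketch below uses orthogonal matching pursuit (OMP), chosen here only as a well-known textbook algorithm and not as the reconstruction method of any surveyed scheme. It assumes the unknown vector is sparse in the canonical basis and that the sparsity level is known; practical video reconstruction would typically work in a transform domain and with richer priors. The function name `omp` is hypothetical.

```python
import numpy as np

def omp(A, y, sparsity):
    """Greedy sparse recovery of x from y = A @ x with A of shape (m, n), m < n.

    Assumptions for this sketch: x has at most `sparsity` nonzeros
    (sparsity >= 1) and no column of A is zero.
    """
    m, n = A.shape
    col_norms = np.linalg.norm(A, axis=0)
    residual = np.asarray(y, dtype=float).copy()
    support = []
    coeffs = np.zeros(0)

    for _ in range(sparsity):
        # Pick the column most correlated with the current residual.
        corr = np.abs(A.T @ residual) / col_norms
        if support:
            corr[support] = 0.0  # never select the same column twice
        support.append(int(np.argmax(corr)))

        # Least-squares fit of y on the selected columns.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
        if np.linalg.norm(residual) < 1e-10:
            break

    x = np.zeros(n)
    if support:
        x[support] = coeffs
    return x
```

The returned vector has at most `sparsity` nonzero entries, and its residual is never larger than that of the all-zero solution. Whether it matches the true signal depends on the observation matrix, the number of measurements, and how well the sparsity assumption holds.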
These research directions are expected to break through the limitations of existing compressive sensing video streams. Key technologies such as high-efficiency observation and semantic-guided hybrid reconstruction have to be developed, which can further highlight the unique advantages and quantitative evolution of compressive sensing video stream technology for uplink streaming media applications.
Keywords
video stream; observation efficiency; bitstream adaptation; semantic quality; video reconstruction; review