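Face detection and tracking for remote photoplethysmography
Abstract
Objective Remote photoplethysmography (rPPG) is a video-based, noncontact heart rate measurement method that tracks facial skin regions and extracts from them a color signal with weak periodic variations to estimate heart rate. The Dlib library, trained with a facial landmark method based on cascaded regression trees, locates face contours quickly and accurately and is increasingly used by researchers to track skin regions of interest (ROIs). In practice, however, the landmarks jitter irregularly, and existing studies do not consider the effect of target shaking, so the color signal is extracted inaccurately and heart rate estimation accuracy is poor. To overcome these shortcomings, a Dlib-based tracking method that resists landmark jitter and motion shaking is proposed. Method The proposed method consists of two main steps. First, a threshold is used to judge the difference between the landmarks of two frames: if they are similar, the previous landmarks are reused; otherwise, the landmarks of the current frame are used, which addresses the jitter problem. Second, for motion shaking, the rotation angle is computed from the midpoints of the left and right eye landmarks and the shaking face is corrected, so that the ROI stays consistent during motion. Result Signal-to-noise ratio (SNR), mean absolute error (MAE), and root mean squared error (RMSE) are used to evaluate the measurement performance of the tracking method in rPPG. In tests on the UBFC-RPPG (Univ. Bourgogne Franche-Comté Remote PhotoPlethysmoGraphy) and PURE (Pulse Rate Detection Dataset) datasets, compared with Dlib, the rPPG results of the proposed method on UBFC-RPPG improve SNR by about 0.425 dB, increase MAE by 0.291 5 bpm, and reduce RMSE by 0.645 3 bpm; on PURE, SNR decreases by 0.041 1 dB, MAE decreases by 0.065 2 bpm, and RMSE decreases by 0.271 8 bpm. Conclusion Compared with Dlib, the proposed method effectively improves the stability of the tracking box and tracks the same ROI whether the subject is still or moving, making it suitable for rPPG applications.
Keywords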
Face detection and tracking algorithm for remote photoplethysmography
Zhao Changchen, Mei Peiyi, Feng Yuanjing (College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China)
Abstract
Objective Remote photoplethysmography (rPPG) is a video-based noncontact heart rate measurement method. It tracks the skin area of the face, extracts periodic subtle color variations from the video data, and estimates heart rate from the resulting color signals. It has broad applications in medical healthcare and daily living. Currently, facial landmark-based tracking methods are widely used by researchers to track regions of interest (ROIs) because they can locate face contours quickly and accurately. The Dlib library, trained with the cascaded regression tree method, is a widely used example. In practice, however, its landmarks jitter irregularly during tracking, and existing research does not consider the effect of target shaking. As a result, color signal extraction is inaccurate and the accuracy of heart rate estimation is poor. To overcome these problems, we first use a threshold method to stabilize the landmarks, then rotate the image to correct the shaking face, and finally extract the ROI and its color signal to estimate the heart rate. Method When Dlib is applied to a frame, it detects the face bounding box, fits the mean landmark shape of the model to the bounding box as the initial landmark estimate, and updates the landmarks through a cascade of regression trees. In each regression tree, a node chooses the split direction by comparing the intensity difference between two pixels in the image against a threshold, and the landmark offset is obtained at the last layer. When the detected face position or the offset produced by a given tree differs between frames, the landmarks of the two frames deviate, that is, the landmarks jitter irregularly. In some head-down scenes, the jitter is particularly large, even though the landmark contours detected in the two frames are similar.
Nevertheless, Dlib detects facial landmarks more accurately than most object detection and tracking algorithms. Accordingly, the proposed method for stabilizing landmarks builds on Dlib and uses a threshold method. First, we use the Euclidean distance and the standard deviation of the landmarks as thresholds to determine the current movement state of the subject. A large standard deviation indicates that the landmarks differ greatly between the two frames. Conversely, a small standard deviation indicates that the landmarks differ little between the two frames, that is, the target may be stationary. In this paper, the landmarks of the previous frame are reused when the target is stationary or the landmark jitter is strong. Otherwise, the landmarks are updated as usual. Second, to handle motion shaking and keep the face upright in the image, a rotation correction mechanism is proposed. It calculates the rotation angle from the midpoints of the left and right eye landmarks, rotates the image by that angle, maps the landmarks to the rotated image, and finally extracts the ROI so that the ROI stays consistent. Result This paper evaluates the performance of the tracking method for rPPG pulse extraction by using signal-to-noise ratio (SNR), mean absolute error (MAE), and root mean squared error (RMSE). SNR represents the quality of the pulse signal estimated from the color signals extracted from the tracking area. The UBFC-RPPG (Univ. Bourgogne Franche-Comté Remote PhotoPlethysmoGraphy) and PURE (Pulse Rate Detection Dataset) datasets are used to test the method. Compared with Dlib, on the UBFC-RPPG dataset the proposed method improves SNR by 0.425 dB and decreases RMSE by 0.645 3 bpm, but MAE increases by 0.291 5 bpm. On the PURE dataset, MAE decreases by 0.065 2 bpm and RMSE decreases by 0.271 8 bpm, but SNR decreases by 0.041 1 dB.
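The two Method steps described above, threshold-based landmark stabilization and eye-based rotation correction, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the way distance and standard deviation are combined, and the threshold values `dist_thresh` and `std_thresh` are hypothetical, and only the landmark coordinates are rotated here (the image itself would be rotated by the same angle with an image library).

```python
import numpy as np


def stabilize_landmarks(prev, curr, dist_thresh=2.0, std_thresh=1.0):
    """Pick the landmarks to use for the current frame.

    prev, curr: (N, 2) arrays of landmark coordinates in two frames.
    dist_thresh, std_thresh: hypothetical pixel thresholds (assumptions).
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    # Per-landmark Euclidean displacement between the two frames.
    dist = np.linalg.norm(curr - prev, axis=1)
    # Small, uniform displacement: treat it as jitter and reuse the old landmarks.
    if dist.mean() < dist_thresh and dist.std() < std_thresh:
        return prev
    # Otherwise accept the landmarks detected in the current frame.
    return curr


def eye_rotation_angle(left_eye, right_eye):
    """Angle in degrees of the line joining the two eye midpoints."""
    left_mid = np.asarray(left_eye, dtype=float).mean(axis=0)
    right_mid = np.asarray(right_eye, dtype=float).mean(axis=0)
    dx, dy = right_mid - left_mid
    return float(np.degrees(np.arctan2(dy, dx)))


def rotate_points(points, angle_deg, center):
    """Rotate points by -angle_deg about center so the eye line is horizontal."""
    theta = np.radians(-angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    center = np.asarray(center, dtype=float)
    return (np.asarray(points, dtype=float) - center) @ rot.T + center


# Toy example: two eye contours whose midpoints are (1, 0) and (5, 3).
left_eye = [[0, 0], [2, 0]]
right_eye = [[4, 3], [6, 3]]
angle = eye_rotation_angle(left_eye, right_eye)
mids = np.array([[1.0, 0.0], [5.0, 3.0]])
print(rotate_points(mids, angle, center=mids[0]))
```

In the toy example the two eye midpoints end up at the same height after rotation, which is the property that keeps the extracted ROI consistent when the head tilts.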
Conclusion Compared with Dlib, the proposed method effectively improves the stability of the tracking frame and tracks the same ROI whether the subject is still or moving. It is a tracking method suitable for rPPG applications.
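For reference, the MAE and RMSE values quoted in the Result paragraph are standard error measures between estimated and reference heart rates. The sketch below shows the usual definitions; the function names and the sample values are made up for illustration and are not data from the paper. SNR is omitted because the abstract does not give its exact definition.

```python
import numpy as np


def mae(estimated, reference):
    """Mean absolute error between estimated and reference heart rates (bpm)."""
    err = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(err)))


def rmse(estimated, reference):
    """Root mean squared error between estimated and reference heart rates (bpm)."""
    err = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


# Illustrative values only (not from the paper): errors are -1, 2 and -3 bpm.
est = [70.0, 72.0, 80.0]
ref = [71.0, 70.0, 83.0]
print(mae(est, ref))   # 2.0
print(rmse(est, ref))  # sqrt(14 / 3), about 2.16
```

Because RMSE squares each error before averaging, it penalizes large individual errors more than MAE does, which is why the two measures can move in opposite directions, as they do on UBFC-RPPG.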
Keywords
remote photoplethysmography (rPPG); heart rate measurement; object tracking; facial landmark; rotation correction