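Differential privacy protection for facial image publication
Abstract
Objective Facial images carry rich sensitive personal information, and publishing them directly may leak individuals' privacy. To protect the private information in facial images, this paper proposes FIP (facial image publication), a publication method that combines the Fourier transform with differential privacy. Method The facial image is treated as a real-valued two-dimensional matrix and compressed with the discrete Fourier transform. To balance the noise error introduced by the Laplace mechanism against the reconstruction error caused by the Fourier transform, we introduce EMK (exponential mechanism-based k coefficients sampling), a method that selects Fourier coefficients with the exponential mechanism. EMK picks suitable Fourier coefficients from different coefficient spaces to compress the facial image and then adds Laplace noise to the selected coefficients, so that the whole process satisfies ε-differential privacy. In addition, to keep a large Fourier coefficient space from making the exponential mechanism's selection inaccurate, we propose BEMK (boosted exponential mechanism-based k coefficients sampling), which exploits the conjugate symmetry of the discrete real-valued Fourier transform. BEMK further compresses the discrete Fourier coefficient space and also improves the accuracy of the published facial images. Result The methods are validated on four real facial image datasets using support vector machine classification and principal component analysis. In terms of precision, recall, and F1 score, the proposed publication methods based on the discrete Fourier transform all outperform LAP (Laplace mechanism-based publication), which applies the Laplace mechanism directly. Conclusion The experimental results show that the proposed methods can publish sensitive facial images under ε-differential privacy, and image classification confirms their high utility. BEMK in particular is robust and is an effective method for publishing private facial images.
Keywords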
Facial image publication with differential privacy
Zhang Xiaojian1, Fu Congcong1, Meng Xiaofeng2 (1. School of Computer & Information Engineering, Henan University of Economics and Law, Zhengzhou 450046, China; 2. School of Informatica, Renmin University of China, Beijing 100872, China)
Abstract
Objective Publishing facial images directly may leak individuals' private information because facial images are inherently sensitive. To protect the private information in facial images, this paper proposes an efficient publishing algorithm called FIP (facial image publication) that combines the Fourier transform with differential privacy, the state-of-the-art model for addressing privacy concerns. Method First, the algorithm models the facial image as a real-valued matrix in which each cell corresponds to one pixel of the image. Then, the algorithm applies the Fourier transform to the matrix, extracts Fourier coefficients (e.g., up to a pre-defined limit k on the number of coefficients sampled), and uses the Laplace mechanism to inject noise into each coefficient to ensure differential privacy. Finally, the algorithm applies the inverse Fourier transform to reconstruct the noisy facial image. However, FIP incurs two sources of error: 1) the Laplace error (LE) due to the injected Laplace noise and 2) the reconstruction error (RE) caused by the lossy compression of the Fourier transform. Selecting k poses a dilemma: a small k keeps LE low but makes RE extremely large, whereas a large k keeps RE small but makes LE extremely large. Furthermore, k cannot be tuned directly on the facial images; otherwise, the selection of k itself reveals private information in the images and violates differential privacy. Therefore, a differentially private choice of k is vital for balancing LE and RE in sanitized facial images. To remedy this deficiency of FIP, we present exponential mechanism-based k coefficient sampling (EMK), a sampling algorithm that uses the exponential mechanism to select suitable coefficients and eliminates the dependency on a pre-defined k.
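The FIP pipeline and the LE/RE trade-off described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes NumPy, assumes the k retained coefficients are the ones closest to zero frequency (a data-independent choice), and assumes a Laplace scale of k * sensitivity / epsilon, where `sensitivity` is a placeholder parameter rather than a value derived in the paper. The function names are hypothetical.

```python
import numpy as np

def low_frequency_mask(shape, k):
    """Boolean mask marking the k coefficients closest to zero frequency.

    Assumption: the choice depends only on the image shape, never on pixel values.
    """
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    radius = np.hypot(fy, fx)
    order = np.argsort(radius, axis=None, kind="stable")[:k]
    mask = np.zeros(shape, dtype=bool)
    mask.flat[order] = True
    return mask

def fip_sketch(image, k, epsilon, sensitivity, rng):
    """Transform, keep k coefficients, add Laplace noise, invert."""
    coeffs = np.fft.fft2(image)
    mask = low_frequency_mask(image.shape, k)
    # Assumed noise scale: grows with k, which is what makes LE grow with k.
    scale = k * sensitivity / epsilon
    noise = (rng.laplace(0.0, scale, size=image.shape)
             + 1j * rng.laplace(0.0, scale, size=image.shape))
    noisy = np.where(mask, coeffs + noise, 0.0)
    # Dropped coefficients are zero; the real part is the published image.
    return np.fft.ifft2(noisy).real

def reconstruction_error(image, k):
    """Mean squared error caused only by dropping coefficients (no noise)."""
    coeffs = np.fft.fft2(image)
    kept = np.where(low_frequency_mask(image.shape, k), coeffs, 0.0)
    return float(np.mean((np.fft.ifft2(kept).real - image) ** 2))
```

Calling `reconstruction_error` with increasing k on the same image shows RE shrinking, while the assumed noise scale in `fip_sketch` grows linearly with k; this is the tension that motivates choosing k privately.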
The core of EMK is to first sample k coefficients from different candidate coefficient sets via the exponential mechanism, using a portion of the privacy budget, and then to perturb the k sampled coefficients with Laplace noise using the remaining budget. By the sequential composition property of differential privacy, the two phases together satisfy ε-differential privacy. EMK, however, does not exploit the correlation among the coefficients, which may produce a large candidate sampling set and an inaccurate selection of k. We observe that the discrete Fourier coefficients of a real-valued image are correlated because they are half-redundant. On the basis of this observation, a boosted EMK (BEMK) is proposed to address this problem. The main idea of BEMK is to use the conjugate symmetry of the discrete real Fourier transform to compress the candidate set and then to apply the exponential mechanism to select the k coefficients from the compressed candidate set. Result Using SVM classification and principal component analysis, comprehensive experiments were conducted on four real facial image datasets (CMU, ORL, Yale, and YaleB) to evaluate the quality of the facial images generated by the BEMK, EMK, FIP, and LAP (Laplace mechanism-based publication) algorithms in terms of precision, recall, and F1 score. The experiments show that the proposed BEMK, EMK, and FIP algorithms outperform LAP on these three metrics. BEMK is applicable to all four datasets and achieves better accuracy than EMK and FIP. For example, on the CMU dataset, we fix the image matrix size at 128×128 and vary the privacy budget ε (0.1, 0.5, 0.9, and 1.4) to study the accuracy of each algorithm. Tables 1-12 show the results. As expected, the accuracy measures of all algorithms increase as ε increases. As ε varies from 0.1 to 1.4, BEMK still achieves better precision, recall, and F1 score than the other algorithms; its values are 88%, 90%, and 89%, respectively.
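The two ideas behind EMK and BEMK can be sketched as follows, under stated assumptions. The exponential mechanism below is the standard one (sampling probability proportional to exp(ε·u / (2·Δu))); the utility scores are supplied by the caller because the abstract does not specify the paper's utility function. The half-spectrum helper uses NumPy's real FFT to show the conjugate-symmetry redundancy; both function names are hypothetical.

```python
import numpy as np

def exponential_mechanism(candidates, utilities, epsilon, sensitivity, rng):
    """Sample one candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    u = np.asarray(utilities, dtype=float)
    logits = epsilon * u / (2.0 * sensitivity)
    logits -= logits.max()          # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

def half_spectrum(image):
    """Non-redundant half of the 2-D spectrum of a real-valued image.

    For real input, F[u, v] equals conj(F[-u, -v]), so roughly half of the
    coefficients determine the rest; rfft2 returns only that half.
    """
    return np.fft.rfft2(image)
```

A caller following the two-phase budget split would spend part of ε in `exponential_mechanism` to pick k and the remainder on Laplace noise; sampling from the output of `half_spectrum` instead of the full spectrum roughly halves the candidate space, and `np.fft.irfft2(half, s=image.shape)` recovers the image from the retained half.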
Conclusion We provide both in-depth theoretical analysis and extensive experiments to compare BEMK with EMK, FIP, and LAP. The results show that the proposed algorithms can prevent privacy leakage in facial image publication. BEMK significantly outperforms the other three algorithms. Moreover, BEMK maintains good robustness and generates high-quality synthetic facial images while still satisfying differential privacy.
Keywords