Generalized adversarial defense against unseen attacks: a survey

Zhou Dawei1, Xu Yibo1, Wang Nannan1, Liu Decheng1, Peng Chunlei1, Gao Xinbo2 (1. State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China; 2. Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)

Abstract
In computer vision, an adversarial example is a sample containing a perturbation carefully crafted by an attacker; the difference between such a sample and its corresponding natural sample is usually imperceptible to the human eye, yet it can easily cause a deep learning model to produce incorrect outputs. This vulnerability of deep learning models has attracted wide attention, and the corresponding adversarial defense techniques have developed substantially. However, as attack techniques and application environments continue to evolve, robustness against only specific types of adversarial perturbations clearly can no longer meet the performance requirements of deep learning models. Therefore, an urgent open problem is how to defend, all at once, against arbitrary kinds of unseen attacks through more efficient training procedures and fewer training runs, while relying on adversarial examples as little as possible. The unseen attacks to be defended against should be as unknown as possible, differing as thoroughly as possible in both principle and performance from the attacks introduced during training. To better characterize the current state of adversarial defense against unseen attacks, this survey takes the above defense goal as its core and comprehensively and systematically summarizes the research in this area. We first briefly introduce the research background and outline the difficulties and challenges faced by defense research. Defenses against unseen adversarial attacks are divided into training mechanism-oriented methods and model architecture-oriented methods. For training mechanism-oriented methods, we review related work from three perspectives according to the basic training framework involved: adversarial training, standard (natural) training, and contrastive learning. For model architecture-oriented methods, we analyze related studies from two perspectives according to how the model structure is modified: structure optimization of the target model and input data pre-processing. Finally, we analyze the patterns in existing research on defending against unseen attacks, introduce other related defense research directions, and reveal the overall development trend of this field. Unlike general surveys on adversarial defense, this survey focuses on investigating and analyzing defenses against attacks with extremely strong unknown-ness, which places higher demands on the generalizability and universality of defense mechanisms; we hope it provides useful insights for future research on defense mechanisms.
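As a concrete reference for the contrastive learning framework mentioned above, the following is a minimal sketch of an NT-Xent (SimCLR-style) contrastive loss, where a natural view and an adversarially perturbed view of the same image can be treated as the positive pair. This pairing and the function name are illustrative assumptions for exposition, not the formulation of any specific surveyed method.

```python
# Minimal sketch of the NT-Xent contrastive loss (SimCLR-style).
# In robust contrastive learning, z1 and z2 could be projections of a
# natural view and an adversarial view of the same batch of images
# (an illustrative pairing, not a specific surveyed method).
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                 # (2N, D) stacked views
    sim = z @ z.t() / temperature                  # pairwise cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))     # exclude self-similarity
    # The positive for sample i is its other view: i + n (or i - n).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

Pulling the two views of each image together while pushing other images apart encourages features that stay stable under whatever perturbation generated the second view, which is why this objective aligns with learning representative robust features.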
Keywords
Generalized adversarial defense against unseen attacks: a survey

Zhou Dawei1, Xu Yibo1, Wang Nannan1, Liu Decheng1, Peng Chunlei1, Gao Xinbo2(1.State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China;2.Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)

Abstract
Deep learning models have achieved impressive breakthroughs in many areas in recent years. However, they are vulnerable to imperceptible adversarial noise added to their inputs, which can easily lead to incorrect outputs. To tackle this problem, many defense methods have been proposed to mitigate the effects of such threats on deep neural networks. As adversaries continue to improve their techniques for degrading model performance, an increasing number of attacks that are unseen during training are emerging, so defense mechanisms that counter only specific types of adversarial perturbations are becoming insufficient. The ability of a model to defend generally against various unseen attacks is therefore pivotal. Here, unseen attacks should differ from the attacks used during training as much as possible in both principle and performance, rather than being mere re-parameterizations of the same attack method. The core goal is to defend against arbitrary attacks via efficient training procedures while depending as little as possible on adversarial examples during training. This survey summarizes and analyzes the existing defense methods against unseen adversarial attacks. We first briefly review the background of defending against unseen attacks. A model can become robust against unseen attacks either because a specially designed training mechanism enables it to extract robust features without any dedicated defensive structures, or because robustness is built in by modifying the model structure or adding auxiliary modules. We therefore divide existing methods into two categories: training mechanism-based defenses and model structure-based defenses. The former mainly seek to improve the quality of the robust features extracted by the model through its training process. 1) Adversarial training is one of the most effective adversarial defense strategies, but it easily overfits to specific types of adversarial noise. Well-designed training attacks can explicitly improve the model's exploration of the perturbation space during training, which directly helps the model learn more representative features than traditional adversarial attacks do. Adding regularization terms to the basic training objective is another way to obtain robust models. We also introduce adversarial training methods that draw on knowledge from other domains, such as domain adaptation, pre-training, and fine-tuning. Because different training examples contribute differently to robustness, example reweighting is a further route to robustness against attacks. 2) Standard training is the most basic training paradigm in deep learning. Data augmentation increases example diversity in standard training, while regularization terms added to standard training stabilize model outputs. Pre-training strategies aim to obtain a robust model within a predefined perturbation bound. 3) Contrastive learning is also a useful strategy, as its core idea of feature similarity matches the goal of acquiring representative robust features. Model structure-based defenses, in contrast, mainly address intrinsic drawbacks of the model's structure.
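To make the adversarial training baseline in point 1) concrete, here is a minimal sketch of L∞ PGD-based adversarial training in PyTorch, in the spirit of the Madry et al. min-max formulation. The function names, the hyperparameters (eps, alpha, steps), the [0, 1] input range, and the model/loader interface are illustrative assumptions; the sketch shows the min-max structure, not any particular surveyed method.

```python
# Minimal sketch of PGD-based adversarial training (inner maximization:
# craft a worst-case perturbation; outer minimization: train on it).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: iterated signed-gradient ascent projected into the eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()    # ascent step
        x_adv = x + (x_adv - x).clamp(-eps, eps)        # project into the eps-ball
        x_adv = x_adv.clamp(0, 1)                       # keep a valid image
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer, device="cuda"):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)             # inner maximization
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)     # outer minimization
        loss.backward()
        optimizer.step()
```

Because the inner maximization commits to a single perturbation model (here, an L∞ ball), a network trained this way can overfit to that perturbation type, which is precisely the limitation that motivates the generalized defenses surveyed here.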
They are divided into target-network structure optimization methods and input data pre-processing methods according to how the structure is modified. 1) Target-network structure optimization aims to enhance the model's ability to extract useful information from inputs and features, because the network itself is susceptible to variations in them. 2) Input data pre-processing focuses on eliminating threats from examples before they are fed into the target network. Removing adversarial noise from inputs and detecting adversarial examples in order to reject them are two popular strategies, because they are easy to model and rely less on adversarial training examples than methods such as adversarial training. Finally, we analyze the research trends in this area and summarize work in related directions. 1) Defending well against multiple types of adversarial perturbations cannot guarantee robustness against arbitrary unseen attacks, but it does improve on robustness against a single perturbation type. 2) As defenses against unseen adversarial attacks develop, auxiliary tools such as acceleration modules have been proposed. 3) Defending against unseen common corruptions benefits practical deployment, because adversarial perturbations cannot represent the whole perturbation space of the real world. In summary, a defense that withstands attacks entirely different from those encountered during training offers stronger generalizability. Organizing the analysis around this goal distinguishes this survey from traditional surveys on adversarial defense. We hope it can further motivate research on defending against unseen adversarial attacks.
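As an illustration of the input data pre-processing strategy in point 2), the sketch below places a denoiser in front of a frozen target classifier, so perturbations are removed before classification. `Denoiser` stands for any hypothetical image-to-image network; training it with a reconstruction loss toward natural examples is one common instantiation, and all names here are assumptions rather than a specific surveyed design.

```python
# Minimal sketch of a pre-processing defense: denoise, then classify.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreprocessingDefense(nn.Module):
    def __init__(self, denoiser: nn.Module, classifier: nn.Module):
        super().__init__()
        self.denoiser = denoiser            # hypothetical image-to-image network
        self.classifier = classifier        # target model, kept frozen
        for p in self.classifier.parameters():
            p.requires_grad = False

    def forward(self, x):
        x_clean = self.denoiser(x).clamp(0, 1)   # strip (possibly unseen) perturbations
        return self.classifier(x_clean)

def denoiser_step(defense, x_adv, x_nat, optimizer):
    """One training step: map perturbed inputs back toward their natural versions."""
    optimizer.zero_grad()
    loss = F.mse_loss(defense.denoiser(x_adv), x_nat)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A pixel-level reconstruction loss is the simplest choice; feature-level or prediction-level losses computed through the frozen classifier are common refinements.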
Keywords
