Unsupervised adversarial example detection methods based on image transformation
Author:
Affiliation:

1. Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University; 2. Wuhan Vocational College of Software and Engineering (Wuhan Open University)

CLC number:

TP391.4


    Abstract:

    Deep Neural Networks (DNNs) are vulnerable to specially crafted adversarial examples and can be easily deceived. Although current detection techniques can identify some malicious inputs, their protection remains insufficient against sophisticated attacks. This paper proposes a novel unsupervised adversarial example detection method based on unlabeled data. The core idea is to recast adversarial example detection as an anomaly detection problem through feature construction and fusion. To this end, five core components are designed: image transformation, a neural network classifier, heatmap generation, distance calculation, and an anomaly detector. First, the original image is transformed, and the images before and after transformation are fed into the neural network classifier, from which the prediction probability arrays and the convolutional-layer features used to generate heatmaps are extracted. By extending the detector's scope from the model's output layer alone to input-layer features as well, the method strengthens its ability to model and measure the differences between adversarial and normal samples. The KL divergence between the probability arrays and the shift in heatmap focus points before and after transformation are then computed, and these distance features are fed into the anomaly detector to decide whether the input is an adversarial example. Experiments on the large-scale, high-quality ImageNet dataset show that the detector achieves an average AUC of 0.77 against five different types of attacks, demonstrating good detection performance. Compared with other state-of-the-art unsupervised adversarial example detectors, it attains a TPR at least 15.88% higher at a similar false positive rate, a clear advantage in detection capability.
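    The abstract's core signal — a mild transformation shifts an adversarial input's prediction distribution far more than a clean input's — can be sketched as follows. This is a minimal illustration with made-up probability arrays, not the paper's implementation; the actual detector combines such KL-distance features with heatmap-shift distances and feeds them to an unsupervised anomaly detector (e.g., Isolation Forest).

    ```python
    import math

    def kl_divergence(p, q, eps=1e-12):
        """KL(p || q) between two discrete probability arrays of equal length."""
        return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

    # Toy (made-up) class-probability arrays over 3 classes, before and
    # after an image transformation. A clean image's distribution barely
    # moves; an adversarial image's distribution shifts sharply because
    # the perturbation is fragile under transformation.
    p_clean_before = [0.90, 0.05, 0.05]
    p_clean_after  = [0.88, 0.07, 0.05]
    p_adv_before   = [0.90, 0.05, 0.05]
    p_adv_after    = [0.10, 0.80, 0.10]

    d_clean = kl_divergence(p_clean_before, p_clean_after)
    d_adv   = kl_divergence(p_adv_before, p_adv_after)

    # The distance feature separates the two cases: a large KL shift
    # marks the input as anomalous (likely adversarial).
    assert d_adv > d_clean
    ```

    In the full method this scalar is only one coordinate of the feature vector; the heatmap-focus distance supplies an input-layer view of the same instability.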

Cite this article

Zhang Ling, Zhao Bo, Huang Linquan. Unsupervised adversarial example detection methods based on image transformation[J]. Journal of Nanjing University of Information Science & Technology,,():
History
  • Received: 2024-03-21
  • Revised: 2024-05-15
  • Accepted: 2024-05-16
