Unsupervised adversarial example detection methods based on image transformation
DOI:
Author:
Affiliation:

1. Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University; 2. Wuhan Vocational College of Software and Engineering (Wuhan Open University)

Author biography:

Corresponding author:

CLC number: TP391.4

Fund project:


    Abstract:

    Deep neural networks (DNNs) are vulnerable to specially crafted adversarial examples and can be easily deceived. Although current detection techniques can identify some malicious inputs, their protection remains insufficient against sophisticated attacks. This paper proposes a novel unsupervised adversarial example detection method that requires only unlabeled data. The core idea is to recast adversarial example detection as an anomaly detection problem through feature construction and fusion, realized with five components: image transformation, a neural network classifier, heatmap generation, distance calculation, and an anomaly detector. First, the original image is transformed, and both the original and transformed images are fed into the neural network classifier; the prediction probability arrays and convolutional-layer features are extracted to generate heatmaps. This extends the detector from the model's output layer alone to input-layer features, strengthening its ability to model and measure the differences between adversarial and normal samples. The KL divergence between the probability arrays and the shift of the heatmap focus between the original and transformed images are then computed, and these distance features are fed into the anomaly detector to decide whether the input is adversarial. In experiments on the large-scale, high-quality ImageNet dataset, the detector achieved an average AUC of 0.77 against five different types of attacks, demonstrating good detection performance. Compared with other state-of-the-art unsupervised adversarial example detectors, it achieves a TPR at least 15.88% higher at a similar false alarm rate, a clear advantage in detection capability.
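The two distance features described above (the KL divergence between prediction probability arrays before and after transformation, and the shift of the heatmap's focus point) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the centroid-based notion of "focus", and the plain-Python data layout are assumptions made for clarity.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete prediction probability arrays.

    eps guards against zero probabilities in either array.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def heatmap_centroid(h):
    """Intensity-weighted centroid (row, col) of a 2-D heatmap."""
    total = sum(v for row in h for v in row)
    r = sum(i * v for i, row in enumerate(h) for v in row) / total
    c = sum(j * v for row in h for j, v in enumerate(row)) / total
    return r, c

def focus_shift(h_before, h_after):
    """Euclidean distance between heatmap focus points before/after transformation."""
    r1, c1 = heatmap_centroid(h_before)
    r2, c2 = heatmap_centroid(h_after)
    return math.hypot(r1 - r2, c1 - c2)

# For a clean image, a mild transformation barely moves the prediction or the
# focus; for an adversarial example, both distances tend to be much larger.
p_before, p_after = [0.8, 0.1, 0.1], [0.75, 0.15, 0.1]
print(kl_divergence(p_before, p_after))                 # small: stable prediction
print(focus_shift([[0, 1], [0, 0]], [[0, 1], [0, 0]]))  # 0.0: focus unchanged
```

In the paper's pipeline the probability arrays come from the classifier's softmax output and the heatmaps from its convolutional-layer features; here both are passed in as plain nested lists.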
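Because the method is unsupervised, the final anomaly detector is fit on unlabeled distance features rather than labeled adversarial/clean pairs. The abstract does not specify which detector is used; as one hedged possibility, a simple z-score threshold fit on scores from (mostly clean) unlabeled data illustrates the idea:

```python
import statistics

class ZScoreAnomalyDetector:
    """Toy unsupervised detector: fit on unlabeled distance scores and flag
    inputs whose score lies far above the fitted distribution."""

    def __init__(self, k=3.0):
        self.k = k          # number of standard deviations for the threshold
        self.mu = 0.0
        self.sigma = 1.0

    def fit(self, scores):
        self.mu = statistics.mean(scores)
        self.sigma = statistics.pstdev(scores) or 1e-12  # avoid division by zero
        return self

    def is_adversarial(self, score):
        return (score - self.mu) / self.sigma > self.k

# Fit on distance features from unlabeled images, then score new inputs.
det = ZScoreAnomalyDetector(k=3.0).fit([0.10, 0.12, 0.09, 0.11, 0.10])
print(det.is_adversarial(0.11))  # False: typical clean-image distance
print(det.is_adversarial(0.90))  # True: far outside the fitted distribution
```

In practice an off-the-shelf unsupervised detector (e.g., an isolation forest or one-class SVM over the fused KL and focus-shift features) would play this role; the one-dimensional threshold above is only the smallest self-contained stand-in.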

Cite this article:

Zhang Ling, Zhao Bo, Huang Linquan. Unsupervised adversarial example detection methods based on image transformation [J]. Journal of Nanjing University of Information Science & Technology,,():
History
  • Received: 2024-03-21
  • Revised: 2024-05-15
  • Accepted: 2024-05-16
  • Online publication date:
  • Publication date:

Address: No. 219 Ningliu Road, Nanjing, Jiangsu Province    Postal code: 210044

Tel: 025-58731025    E-mail: nxdxb@nuist.edu.cn

Journal of Nanjing University of Information Science & Technology ® 2024 All rights reserved    Technical support: Beijing Qinyun Technology Development Co., Ltd.