Citation: YANG F, WEI X, GUO J L, et al. Adversarial example classification algorithm based on generative self-supervised learning[J]. Microelectronics & Computer, 2024, 41(2): 11-18. doi: 10.19304/J.ISSN1000-7180.2023.0114

Adversarial example classification algorithm based on generative self-supervised learning

  • Abstract: Adversarial examples are often regarded as a threat to the robustness of deep learning models, and defense techniques such as adversarial training have been developed to mitigate their impact on label prediction. However, existing adversarial training methods tend to reduce the generalization accuracy of the classification network, degrading its performance on the original examples. To address this, an adversarial example classification algorithm based on generative self-supervised learning is proposed. A generative model is first trained by self-supervised learning to capture the latent features of image data; this model then screens the features of adversarial examples, and the information that benefits classification is fed back to the classification model. Finally, joint learning completes the end-to-end global training and further improves the generalization accuracy of the classification model. Experimental results on the MNIST, CIFAR10, and CIFAR100 datasets show that, compared with standard training, the proposed algorithm increases classification accuracy by 0.06%, 1.34%, and 0.89%, respectively, reaching 99.70%, 84.34%, and 63.65%. These results indicate that the algorithm overcomes the inherent drawback of traditional adversarial training, namely reduced generalization performance, and further improves the accuracy of the classification network.

     
