Title: 基於半色調轉換的影像對抗例防禦機制 (Defense Mechanism Against Adversarial Attacks Using Density-Based Representation of Images)
Author: Huang, Chen-Wei (黃辰瑋)
Advisor: Liao, Wen-Hung (廖文宏)
Keywords: deep learning; adversarial defense; halftoning; input recharacterization
Date: 2019
Uploaded: 5-Sep-2019 16:15:04 (UTC+8)

Abstract
Adversarial examples are slightly modified inputs devised to cause erroneous inference in deep learning models. Many methods have recently been proposed to counter adversarial attacks, but new ways of generating attacks have surfaced accordingly. Protection against adversarial examples is a fundamental issue that must be addressed before deep-learning-based intelligent systems can be widely adopted. In this research, we use a method known as input recharacterization to remove the perturbations in adversarial examples and thereby maintain the performance of the original model. Input recharacterization consists of two stages: a forward transform and a backward reconstruction. Our hope is that, by passing through this lossy two-way transformation, the purposely added noise or perturbation becomes ineffective. In this work we employ digital halftoning and inverse halftoning for input recharacterization, although many other choices exist. We also apply convolution-layer visualization to better understand the network's behavior. The dataset used in this study is Tiny ImageNet: about 260 thousand 128x128 grayscale and halftone images belonging to 200 classes.

Most existing defense mechanisms rely on gradient masking, input transformation, or adversarial training. Among these strategies, adversarial training is widely regarded as the most effective, but it requires adversarial examples to be generated and added to the training set, which is impractical in most applications. Our approach is closer to input transformation: we convert each image from an intensity-based to a density-based representation using a halftone operation, aiming to invalidate the attack by changing the image representation. We also investigate whether inverse halftoning can eliminate the adversarial perturbation.
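The two-stage pipeline described above (forward halftone transform, backward reconstruction) can be illustrated with a minimal sketch. This is not the author's code: it halftones a grayscale image with Floyd–Steinberg error diffusion (the classic error-diffusion scheme covered in the thesis's background chapter) and uses a simple local-mean filter as a stand-in for the learned inverse-halftoning model used in the thesis.

```python
import numpy as np

def halftone_floyd_steinberg(img):
    """Forward transform: binarize a grayscale image in [0, 1] by error diffusion."""
    h, w = img.shape
    work = img.astype(np.float64).copy()
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            old = work[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            out[y, x] = new
            err = old - new
            # Distribute the quantization error to unvisited neighbors
            # with the Floyd–Steinberg weights 7/16, 3/16, 5/16, 1/16.
            if x + 1 < w:
                work[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1, x - 1] += err * 3 / 16
                work[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1, x + 1] += err * 1 / 16
    return out

def inverse_halftone(binary, k=3):
    """Backward reconstruction (crude stand-in): local mean over a k x k window."""
    h, w = binary.shape
    pad = k // 2
    padded = np.pad(binary, pad, mode="edge")
    recon = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            recon[y, x] = padded[y:y + k, x:x + k].mean()
    return recon
```

Error diffusion preserves local average density, which is why the halftone keeps enough structure for classification; in the thesis the reconstruction stage is a learned model, and the mean filter here only illustrates the density-to-intensity direction of the round trip.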
The proposed method requires no extra training on adversarial samples; only low-cost input pre-processing is needed. On the VGG-16 architecture, top-5 accuracy is 76.5% for the grayscale model, 80.4% for the halftone model, and 85.14% for the hybrid model (trained with both grayscale and halftone images). Under adversarial attacks generated with FGSM, I-FGSM, and PGD, the hybrid model maintains top-5 accuracies of 80.97%, 78.77%, and 81.56%, respectively. Although accuracy is still affected, the influence of the adversarial examples is significantly reduced; the average improvement over existing input-transformation defenses is approximately 10%.
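The three attacks evaluated above share one recipe: step along the sign of the loss gradient with respect to the input. A minimal NumPy sketch follows; it is illustrative only (the thesis generates attacks against the trained VGG-16, whereas `grad_fn` here is any caller-supplied function returning d(loss)/d(input)). PGD is I-FGSM started from a random point inside the eps-ball.

```python
import numpy as np

def fgsm(x, grad, eps):
    """Fast Gradient Sign Method: one step of size eps along sign(grad)."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def i_fgsm(x, grad_fn, eps, steps=10):
    """Iterative FGSM: repeated small steps, projected back into the eps-ball.
    PGD runs the same loop from a random start inside the ball."""
    alpha = eps / steps
    x_adv = x.astype(np.float64).copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv
```

For a linear model with weight vector w, the input gradient is proportional to w, so `grad_fn = lambda z: w` gives a quick sanity check that the resulting perturbation never exceeds eps in any pixel.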
Description: Master's thesis
Institution: National Chengchi University (國立政治大學)
Department: Computer Science (資訊科學系)
Student ID: 106753021
Source: http://thesis.lib.nccu.edu.tw/record/#G0106753021
Type: thesis
DOI: 10.6814/NCCU201900958

Table of Contents
- Abstract; Acknowledgements; List of Tables; List of Figures
- Chapter 1 Introduction: 1.1 Adversarial Examples; 1.2 Input Recharacterization; 1.3 Research Objectives; 1.4 Thesis Organization
- Chapter 2 Background and Related Work: 2.1 Deep Learning Architectures; 2.2 Adversarial Attacks (Framework; Perturbation Metrics; Attack Strategies; Defense Strategies); 2.3 Halftoning (Halftoning Methods; Error Diffusion; Inverse Halftoning); 2.4 Summary
- Chapter 3 Datasets and Proof of Concept: 3.1 Environment and Packages; 3.2 Datasets (CIFAR-10; ImageNet; Tiny-ImageNet); 3.3 Adversarial Example Generation; 3.4 Random-Weight Evaluation; 3.5 Halftoning and Inverse Halftoning on CIFAR-10; 3.6 Halftoning and Inverse Halftoning on ImageNet; 3.7 Summary
- Chapter 4 Methodology: 4.1 Dataset Selection and Processing; 4.2 Model Training; 4.3 Feasibility of Generating Halftone Adversarial Examples
- Chapter 5 Experiments and Analysis: 5.1 Model Training (Grayscale; Halftone; Hybrid; Feature-Fusion Models); 5.2 Convolution-Layer Visualization; 5.3 Defense Capability (Grayscale; Halftone; Hybrid Models; Comparison of Results); 5.4 Feasibility of Halftone Adversarial Examples (Whole-Image Modification; Single-Pixel Modification)
- Chapter 6 Conclusion and Future Work
- References