Title: Secure machine learning through virtualization obfuscation of Python code (以虛擬化混淆轉換來落實 Python 程式的安全式機器學習)
Author: CHIU, YI-HSIANG (邱怡翔)
Contributors: Hu, Yuh-Jong (胡毓忠); CHIU, YI-HSIANG (邱怡翔)
Keywords: Code obfuscation; Virtualization obfuscation; Secure machine learning
Date: 2019
Uploaded: 1-Jul-2019 10:59:22 (UTC+8)
Abstract: Machine learning lets people extract a great deal of useful information from data. When a large volume of data must be analyzed, the usual approach is to rent computing resources from a public cloud provider and run the job on a cluster. Computing in the public cloud, however, means running on an untrusted platform, where information about the program may leak. This study designs a code obfuscation tool to protect programs written in Python. The tool uses virtualization obfuscation as its main transformation. The transformed program achieves procedural abstraction, so that how the model computes during the training and prediction stages cannot easily be discovered. In addition, the study applies obfuscation through simplicity to rewrite how the interpreter produced by the virtualization transformation operates, which hinders attackers' static and dynamic program analysis. To evaluate the transformation, the study prepared a machine learning program using the Kaggle competition dataset for predicting passenger survival in the Titanic disaster. After virtualization obfuscation, the program's control flow is completely rewritten and its software complexity rises substantially, at the cost of increasing execution time by a factor of 43 to 70.
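The abstract describes virtualization obfuscation only at a high level. The sketch below is a minimal illustration of the general technique, not the thesis's tool: it assumes a toy stack-based instruction set (the op names `PUSH_CONST`, `LOAD_VAR`, etc. and the helpers `make_opcode_map`, `compile_expr`, and `make_interpreter` are all hypothetical), handles only arithmetic expressions, and randomizes the numeric opcodes per build. After translation, the original expression survives only as opaque bytecode plus an interpreter that understands it.

```python
import ast
import random

# Hypothetical virtual instruction set; names are illustrative only.
BASE_OPS = ["PUSH_CONST", "LOAD_VAR", "ADD", "SUB", "MUL", "RETURN"]
BINOPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def make_opcode_map(seed):
    """Give every virtual op a random byte value, so each build differs."""
    rng = random.Random(seed)
    return dict(zip(BASE_OPS, rng.sample(range(256), len(BASE_OPS))))


def compile_expr(source, opmap):
    """Translate one arithmetic expression into flat virtual bytecode."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant):
            code.extend([opmap["PUSH_CONST"], node.value])
        elif isinstance(node, ast.Name):
            code.extend([opmap["LOAD_VAR"], node.id])
        elif isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
            emit(node.left)   # operands are pushed left to right,
            emit(node.right)  # then the operator pops both
            code.append(opmap[BINOPS[type(node.op)]])
        else:
            raise NotImplementedError(ast.dump(node))

    emit(ast.parse(source, mode="eval").body)
    code.append(opmap["RETURN"])
    return code


def make_interpreter(opmap):
    """Build a fetch-decode-execute loop that only understands this opmap."""
    push, load, add, sub, mul, ret = (opmap[name] for name in BASE_OPS)

    def run(code, env):
        stack, pc = [], 0
        while True:
            op = code[pc]
            pc += 1
            if op == push:
                stack.append(code[pc])  # inline constant operand
                pc += 1
            elif op == load:
                stack.append(env[code[pc]])  # inline variable-name operand
                pc += 1
            elif op in (add, sub, mul):
                right, left = stack.pop(), stack.pop()
                if op == add:
                    stack.append(left + right)
                elif op == sub:
                    stack.append(left - right)
                else:
                    stack.append(left * right)
            elif op == ret:
                return stack.pop()
            else:
                raise ValueError(f"unknown opcode {op!r}")

    return run


# Usage: a linear-model prediction survives only as bytecode + interpreter.
opmap = make_opcode_map(seed=2019)
run = make_interpreter(opmap)
bytecode = compile_expr("w * x + b", opmap)
print(run(bytecode, {"w": 2.0, "x": 3.0, "b": 0.5}))  # prints 6.5
```

Changing `seed` renumbers the opcodes, so the same expression yields different bytecode in each build, and an analyst must first reverse-engineer the interpreter before the bytecode means anything. The dispatch loop that runs for every single operation is also the kind of interpretive overhead that helps explain the slowdown the abstract reports.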
Description: Master's thesis, Department of Computer Science, National Chengchi University, 105753027
Source: http://thesis.lib.nccu.edu.tw/record/#G0105753027
Type: thesis
dc.contributor.advisor: Hu, Yuh-Jong (胡毓忠)
dc.identifier (Other Identifiers): G0105753027
dc.identifier.uri (URI): http://nccur.lib.nccu.edu.tw/handle/140.119/124196
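dc.description.tableofcontents:
Abstract (Chinese)
Abstract (English)
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Research Results
Chapter 2 Background
2.1 Code Obfuscation
2.1.1 Virtualization Obfuscation Algorithm
2.1.2 Attacks on Virtualization Obfuscation
2.2 Python Programs
Chapter 3 Related Work
Chapter 4 Obfuscation Method and Process
4.1 Source Program Analysis
4.2 Virtualization Obfuscation Transformation
4.2.1 Building the Obfuscated Bytecode
4.2.2 Building the Custom Interpreter
4.3 Obfuscation-through-Simplicity Transformation
Chapter 5 Implementation
5.1 Preparing the Program Before Obfuscation
5.2 Testing the Program After Virtualization Obfuscation
5.3 Effectiveness of the Virtualization Obfuscation Transformation
Chapter 6 Conclusions and Future Work
6.1 Conclusions and Contributions
6.2 Limitations
References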
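Section 5.3 evaluates the transformation's effectiveness, and the abstract reports a substantial rise in software complexity, but this record does not say which metric was used. As a hedged illustration only, the sketch below approximates McCabe's cyclomatic complexity (1 + the number of decision points) over Python source with the standard `ast` module, and compares a plain linear-model function with a hypothetical dispatch loop of the kind virtualization produces. It is a simplified count (the whole snippet is treated as one unit, and `async` loops and `match` statements are ignored), not the thesis's measurement; `cyclomatic_complexity`, `plain`, and `virtualized` are illustrative names.

```python
import ast

# Nodes that each open one extra independent path (simplified McCabe count).
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate McCabe cyclomatic complexity: 1 + number of decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` can short-circuit at two points.
            decisions += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # The implicit loop plus each `if` filter.
            decisions += 1 + len(node.ifs)
    return decisions + 1


# Straight-line original: one path through the code.
plain = """
def predict(w, x, b):
    return w * x + b
"""

# Hypothetical virtualized form: a loop dispatching on opcode values.
virtualized = """
def run(code, env):
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == 1:
            stack.append(code[pc])
            pc += 1
        elif op == 2:
            stack.append(env[code[pc]])
            pc += 1
        elif op == 3:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == 4:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            return stack.pop()
"""

print(cyclomatic_complexity(plain), cyclomatic_complexity(virtualized))  # prints: 1 6
```

Even this four-opcode loop scores 6 against 1 for the straight-line original, and every additional opcode handler adds another `elif`, so a full interpreter scores correspondingly higher.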
dc.format.extent: 2742003 bytes
dc.format.mimetype: application/pdf
dc.identifier.doi (DOI): 10.6814/NCCU201900153