Publications-Theses
Article View/Open
Publication Export
-
題名 Python指令碼的符號化分析與條件生成
Symbolic Constraint Generation of Python Opcode作者 詹語昕
Jan, Yu-Shin貢獻者 郁方
Yu, Fang
詹語昕
Jan, Yu-Shin關鍵詞 靜態分析
符號分析
Symbolic Execution
Static analysis日期 2021 上傳時間 1-Jun-2021 14:54:40 (UTC+8) 摘要 由於Python語言已被廣泛用於開發現代應用程序,例如Web應用程序,數據分析工具,機器學習包和自主機器人模組。作為一種通用語言,它不僅在學術環境中而且在業界都被廣泛地使用。 Python易於學習,並具有自然語言架構,以及非常豐富的函式庫。雖然它是算法開發和探索性數據的誘人選擇所以對於Python程式的安全性分析就顯得格外的重要。在符號分析中,程式可以在特定輸入的執行下觀察到對python應用程序的運作和對於輸入值的解析。為了可以準確計算出輸入值的產生和提高程式覆蓋率,我們提出了一個有效的分析框架,該框架可用於在其字節碼級別生成python程序的路徑約束。除此之外,我們在Python字節碼指令上基於符號堆棧執行,並且我們可以依照每個指令去記錄和推斷輸入值的資料型態。SMT-Solver方面,我們使用了SMT求解器-CVC4,我們會依照測試集來產生出每一條路徑的SMT檔案,並利用用CVC4解實現了較高的代碼覆蓋率,性能優於現有工具PyExZ3。同時,我們的工具同時也能分析Python的Client-Server架構,並進行案例研究以發現Python Web服務和應用程序的漏洞,像是XMLRPC軟和Django框架。
Python language has been widely adopted to develop modern applications suchas web applications, data analytic tools, machine learning packages, and autonomousrobotics modules due to its high-level interactive nature and its maturing ecosystemof scientific libraries. As a general-purpose language, it is increasingly used not onlyin academic settings but also in industry. Python is easy to learn and has a spokenlanguage architecture, coupled with a very rich library of functions. While it is anappealing choice for algorithmic development and exploratory data analytic, a sys-tematic approach to analyze code soundness and correctness of Python programsis of the essence for software security. An exploit of a python application can beobserved under executions on specific inputs. To discover inputs that cover mostprogram executions, we propose an effective analysis framework that can be usedto generate the path constraints of python programs at its bytecode level.Particularly, we propose symbolic stack-based execution on Python bytecodeinstructions equipped with effective type inference on integer and string variables.We synthesize string and integer constraints for bounded paths in terms of loopdepth and string variable length. Integrating with the modern SMT solver, CVC4,we are able to derive test sets that achieve high code coverage against Python utilitybenchmarks, outperforming the existing tool PyExZ3. We also support client-serverarchitecture for Python remote procedure calls and conduct a case study on dis-covering CVE vulnerabilities of XMLRPC packages and Django frameworks. Bothhave been widely used to develop Python web services and applications.參考文獻 [1]J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of things (iot): Avision, architectural elements, and future directions,”Future generation computersystems, vol. 29, no. 7, pp. 1645–1660, 2013.[2]I. Lee and K. Lee, “The internet of things (iot): Applications, investments, andchallenges for enterprises,”Business Horizons, vol. 58, no. 4, pp. 431–440, 2015.[3]A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internetof things: A survey on enabling technologies, protocols, and applications,”IEEECommunications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.[4]A. Kamilaris, F. Gao, F. X. Prenafeta-Boldú, and M. I. Ali, “Agri-iot: A semanticframework for internet of things-enabled smart farming applications,” inInternet ofThings (WF-IoT), 2016 IEEE 3rd World Forum on, pp. 442–447, IEEE, 2016.[5]P. A. Laplante and N. Laplante, “The internet of things in healthcare: Potentialapplications and challenges,”IT Professional, vol. 18, no. 3, pp. 2–4, 2016.[6]Y. Jie, J. Y. Pei, L. Jun, G. Yun, and X. Wei, “Smart home system based oniot technologies,” inComputational and Information Sciences (ICCIS), 2013 FifthInternational Conference on, pp. 1789–1791, IEEE, 2013.[7]S. Kalra and S. K. Sood, “Secure authentication scheme for iot and cloud servers,”Pervasive and Mobile Computing, vol. 24, pp. 210–223, 2015.[8]G. A. Koushik Sen, Darko Marinov, “Cute: A concolic unit testing engine for c,”2005.[9]R.-G. Xu, “Symbolic execution algorithms for test generation,” 2009.[10]G. L. G. P. Rajan, “Klover: A symbolic execution and automatic test generation toolfor c++ programs,” 2011.39[11]A. R. D. B. Sang Kil Cha, Thanassis Avgerinos, “Unleashing mayhem on binarycode,” 2012.[12]D. M. Patrice Godefroid, Michael Y. Levin, “Sage: Whitebox fuzzing for securitytesting,” 2012.[13]E. P. Thesis, “S2e: A platform for in-vivo multi-path analysis of software systems,”2014.[14]D. G. F. H. M. I. T. K. Kasper Luckow, Marko Dimjasevic, “Jdart: A dynamicsymbolic analysis framework,” 2016.[15]N. R. Corina S. Pas ̆ areanu, “Symbolic pathfinder:symbolic execution of java byte-cod,” 2010.[16]D. E. Cristian Cadar, Daniel Dunbar, “Klee: Unassisted and automatic generationof high-coverage tests for complex systems programs,” 2012.[17]C. Z. G. C. Stefan Bucur, Vlad Ureche, “Parallel symbolic execution for automatedreal-world software testing,” 2011.[18]S. H. F. M. S. M. D. S. C. S. D. Prateek Saxena, Devdatta Akhawe, “A symbolicexecution framework for javascript,”[19]T. B. J. D. T. Ball, “Deconstructing dynamic symbolic execution,” 2015.[20]J. C. King, “Symbolic execution and program testing,”Communications of the ACM,vol. 19, no. 7, pp. 385–394, 1976.[21]W. Visser, C. S. Pǎsǎreanu, and S. Khurshid, “Test input generation with javapathfinder,”ACM SIGSOFT Software Engineering Notes, vol. 29, no. 4, pp. 97–107, 2004.[22]T. Xie, D. Marinov, W. Schulte, and D. Notkin, “Symstra: A framework for generat-ing object-oriented unit tests using symbolic execution,” inInternational Conference40 on Tools and Algorithms for the Construction and Analysis of Systems, pp. 365–381,Springer, 2005.[23]C. S. Păsăreanu, M. B. Dwyer, and W. Visser, “Finding feasible counter-exampleswhen model checking abstracted java programs,” inInternational Conference onTools and Algorithms for the Construction and Analysis of Systems, pp. 284–298,Springer, 2001.[24]C. Csallner and Y. Smaragdakis, “Check’n’crash: combining static checking andtesting,” inProceedings of the 27th international conference on Software engineering,pp. 422–431, ACM, 2005.[25]C. S. Păsăreanu, W. Visser, D. Bushnell, J. Geldenhuys, P. Mehlitz, and N. Rungta,“Symbolic pathfinder: integrating symbolic execution with model checking for javabytecode analysis,”Automated Software Engineering, vol. 20, no. 3, pp. 391–425,2013.[26]K. Sen and G. Agha, “A race-detection and flipping algorithm for automated testingof multi-threaded programs,” 2006.[27]V. G. A. K. Karthick Jayaraman, David Harvison, “jfuzz: A concolic tester for nasajava,” 2009.[28]D. S. A. . P. M. J. H. . P. M. N. . P. H. Yang, “nstruments android apps using sootfor dynamic symbolic execution,” 2013.[29]a S. Pasareanu Neha Rungta, “Symbolic pathfinder: symbolic execution of java byte-code,” 2010.[30]C. Cadar, D. Dunbar, D. R. Engler,et al., “Klee: Unassisted and automatic genera-tion of high-coverage tests for complex systems programs.,” inOSDI, vol. 8, pp. 209–224, 2008.41[31]K. Sen, D. Marinov, and G. Agha, “Cute: a concolic unit testing engine for c,” inACM SIGSOFT Software Engineering Notes, vol. 30, pp. 263–272, ACM, 2005.[32]S. Mechtaev, J. Yi, and A. Roychoudhury, “Angelix: Scalable multiline pro-gram patch synthesis via symbolic analysis,” inSoftware Engineering (ICSE), 2016IEEE/ACM 38th International Conference on, pp. 691–701, IEEE, 2016.[33]L. Luu, D.-H. Chu, H. Olickel, P. Saxena, and A. Hobor, “Making smart contractssmarter,” inProceedings of the 2016 ACM SIGSAC Conference on Computer andCommunications Security, CCS ’16, (New York, NY, USA), pp. 254–269, ACM, 2016.[34]E. Brier and M. Joye, “Weierstraß elliptic curves and side-channel attacks,” inInter-national Workshop on Public Key Cryptography, pp. 335–345, Springer, 2002.[35]W. Schindler, K. Lemke, and C. Paar, “A stochastic model for differential side channelcryptanalysis,” inInternational Workshop on Cryptographic Hardware and EmbeddedSystems, pp. 30–46, Springer, 2005.[36]Y. Zhang, “Cache side channels: State of the art and research opportunities,” inProceedings of the 2017 ACM SIGSAC Conference on Computer and CommunicationsSecurity - CCS 17, 2017.[37]J. Chen, Y. Feng, and I. Dillig, “Precise detection of side-channel vulnerabilitiesusing quantitative cartesian hoare logic,” inProceedings of the 2017 ACM SIGSACConference on Computer and Communications Security - CCS 17, 2017.[38]A. Aggarwal and P. Jalote, “Integrating static and dynamic analysis for detectingvulnerabilities,” inComputer Software and Applications Conference, 2006. COMP-SAC’06. 30th Annual International, vol. 1, pp. 343–350, IEEE, 2006.[39]P. Godefroid, N. Klarlund, and K. Sen, “Dart: directed automated random testing,”inACM Sigplan Notices, vol. 40, pp. 213–223, ACM, 2005.42[40]N. Jovanovic, C. Kruegel, and E. Kirda, “Pixy: A static analysis tool for detectingweb application vulnerabilities,” inSecurity and Privacy, 2006 IEEE Symposium on,pp. 6–pp, IEEE, 2006.[41]T. B. J. D. T. Ball, “Deconstructing dynamic symbolic execution,” 2015.[42]M. E. Mostafa Hassan, Caterina Urban and P. Müller, “Maxsmt-based type inferencefor python,” 2018.[43]A. L. F. S. Casper Boone, Niels de Bruin, “Deep learning type inference of pythonfunction signatures using natural language context,” 2019.[44]C. Barrett, A. Stump, C. Tinelli,et al., “The smt-lib standard: Version 2.0,” inPro-ceedings of the 8th International Workshop on Satisfiability Modulo Theories (Edin-burgh, England), vol. 13, p. 14, 2010. 描述 碩士
國立政治大學
資訊管理學系
107356036資料來源 http://thesis.lib.nccu.edu.tw/record/#G0107356036 資料類型 thesis dc.contributor.advisor 郁方 zh_TW dc.contributor.advisor Yu, Fang en_US dc.contributor.author (Authors) 詹語昕 zh_TW dc.contributor.author (Authors) Jan, Yu-Shin en_US dc.creator (作者) 詹語昕 zh_TW dc.creator (作者) Jan, Yu-Shin en_US dc.date (日期) 2021 en_US dc.date.accessioned 1-Jun-2021 14:54:40 (UTC+8) - dc.date.available 1-Jun-2021 14:54:40 (UTC+8) - dc.date.issued (上傳時間) 1-Jun-2021 14:54:40 (UTC+8) - dc.identifier (Other Identifiers) G0107356036 en_US dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/135328 - dc.description (描述) 碩士 zh_TW dc.description (描述) 國立政治大學 zh_TW dc.description (描述) 資訊管理學系 zh_TW dc.description (描述) 107356036 zh_TW dc.description.abstract (摘要) 由於Python語言已被廣泛用於開發現代應用程序,例如Web應用程序,數據分析工具,機器學習包和自主機器人模組。作為一種通用語言,它不僅在學術環境中而且在業界都被廣泛地使用。 Python易於學習,並具有自然語言架構,以及非常豐富的函式庫。雖然它是算法開發和探索性數據的誘人選擇所以對於Python程式的安全性分析就顯得格外的重要。在符號分析中,程式可以在特定輸入的執行下觀察到對python應用程序的運作和對於輸入值的解析。為了可以準確計算出輸入值的產生和提高程式覆蓋率,我們提出了一個有效的分析框架,該框架可用於在其字節碼級別生成python程序的路徑約束。除此之外,我們在Python字節碼指令上基於符號堆棧執行,並且我們可以依照每個指令去記錄和推斷輸入值的資料型態。SMT-Solver方面,我們使用了SMT求解器-CVC4,我們會依照測試集來產生出每一條路徑的SMT檔案,並利用用CVC4解實現了較高的代碼覆蓋率,性能優於現有工具PyExZ3。同時,我們的工具同時也能分析Python的Client-Server架構,並進行案例研究以發現Python Web服務和應用程序的漏洞,像是XMLRPC軟和Django框架。 zh_TW dc.description.abstract (摘要) Python language has been widely adopted to develop modern applications suchas web applications, data analytic tools, machine learning packages, and autonomousrobotics modules due to its high-level interactive nature and its maturing ecosystemof scientific libraries. As a general-purpose language, it is increasingly used not onlyin academic settings but also in industry. Python is easy to learn and has a spokenlanguage architecture, coupled with a very rich library of functions. While it is anappealing choice for algorithmic development and exploratory data analytic, a sys-tematic approach to analyze code soundness and correctness of Python programsis of the essence for software security. An exploit of a python application can beobserved under executions on specific inputs. To discover inputs that cover mostprogram executions, we propose an effective analysis framework that can be usedto generate the path constraints of python programs at its bytecode level.Particularly, we propose symbolic stack-based execution on Python bytecodeinstructions equipped with effective type inference on integer and string variables.We synthesize string and integer constraints for bounded paths in terms of loopdepth and string variable length. Integrating with the modern SMT solver, CVC4,we are able to derive test sets that achieve high code coverage against Python utilitybenchmarks, outperforming the existing tool PyExZ3. We also support client-serverarchitecture for Python remote procedure calls and conduct a case study on dis-covering CVE vulnerabilities of XMLRPC packages and Django frameworks. Bothhave been widely used to develop Python web services and applications. en_US dc.description.tableofcontents 1 Introduction - 12 Related Work - 32.1 Symbolic Execution - 32.2 Static analysis on python program - 42.3 Type Inference in Python - 53 A Motivating Example - 63.1 Client-Server Architecture - 74 Methodology - 124.1 Stack-based Symbolic Execution - 124.1.1 Instruction Implementations - 14 -4.1.2 Type Inference - 194.1.3 String operation -224.1.4 Function Call of Integer operation - 224.1.5 Output - 234.2 Multiple Symbolic Execution Engine - 234.3 SMT Constraint Generation - 255 Experiments - 275.1 Experimental Setup - 275.1.1 Theorem prover CVC4 - 275.1.2 Coverage - 285.1.3 Dead path - 285.2 Function of python programs - 305.3 Client-Server Architecture - 345.3.1 Django Framework - 345.3.2 XML-RPC Server - 366 Conclusion - 38Reference - 39 zh_TW dc.format.extent 626033 bytes - dc.format.mimetype application/pdf - dc.source.uri (資料來源) http://thesis.lib.nccu.edu.tw/record/#G0107356036 en_US dc.subject (關鍵詞) 靜態分析 zh_TW dc.subject (關鍵詞) 符號分析 zh_TW dc.subject (關鍵詞) Symbolic Execution en_US dc.subject (關鍵詞) Static analysis en_US dc.title (題名) Python指令碼的符號化分析與條件生成 zh_TW dc.title (題名) Symbolic Constraint Generation of Python Opcode en_US dc.type (資料類型) thesis en_US dc.relation.reference (參考文獻) [1]J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of things (iot): Avision, architectural elements, and future directions,”Future generation computersystems, vol. 29, no. 7, pp. 1645–1660, 2013.[2]I. Lee and K. Lee, “The internet of things (iot): Applications, investments, andchallenges for enterprises,”Business Horizons, vol. 58, no. 4, pp. 431–440, 2015.[3]A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internetof things: A survey on enabling technologies, protocols, and applications,”IEEECommunications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.[4]A. Kamilaris, F. Gao, F. X. Prenafeta-Boldú, and M. I. Ali, “Agri-iot: A semanticframework for internet of things-enabled smart farming applications,” inInternet ofThings (WF-IoT), 2016 IEEE 3rd World Forum on, pp. 442–447, IEEE, 2016.[5]P. A. Laplante and N. Laplante, “The internet of things in healthcare: Potentialapplications and challenges,”IT Professional, vol. 18, no. 3, pp. 2–4, 2016.[6]Y. Jie, J. Y. Pei, L. Jun, G. Yun, and X. Wei, “Smart home system based oniot technologies,” inComputational and Information Sciences (ICCIS), 2013 FifthInternational Conference on, pp. 1789–1791, IEEE, 2013.[7]S. Kalra and S. K. Sood, “Secure authentication scheme for iot and cloud servers,”Pervasive and Mobile Computing, vol. 24, pp. 210–223, 2015.[8]G. A. Koushik Sen, Darko Marinov, “Cute: A concolic unit testing engine for c,”2005.[9]R.-G. Xu, “Symbolic execution algorithms for test generation,” 2009.[10]G. L. G. P. Rajan, “Klover: A symbolic execution and automatic test generation toolfor c++ programs,” 2011.39[11]A. R. D. B. Sang Kil Cha, Thanassis Avgerinos, “Unleashing mayhem on binarycode,” 2012.[12]D. M. Patrice Godefroid, Michael Y. Levin, “Sage: Whitebox fuzzing for securitytesting,” 2012.[13]E. P. Thesis, “S2e: A platform for in-vivo multi-path analysis of software systems,”2014.[14]D. G. F. H. M. I. T. K. Kasper Luckow, Marko Dimjasevic, “Jdart: A dynamicsymbolic analysis framework,” 2016.[15]N. R. Corina S. Pas ̆ areanu, “Symbolic pathfinder:symbolic execution of java byte-cod,” 2010.[16]D. E. Cristian Cadar, Daniel Dunbar, “Klee: Unassisted and automatic generationof high-coverage tests for complex systems programs,” 2012.[17]C. Z. G. C. Stefan Bucur, Vlad Ureche, “Parallel symbolic execution for automatedreal-world software testing,” 2011.[18]S. H. F. M. S. M. D. S. C. S. D. Prateek Saxena, Devdatta Akhawe, “A symbolicexecution framework for javascript,”[19]T. B. J. D. T. Ball, “Deconstructing dynamic symbolic execution,” 2015.[20]J. C. King, “Symbolic execution and program testing,”Communications of the ACM,vol. 19, no. 7, pp. 385–394, 1976.[21]W. Visser, C. S. Pǎsǎreanu, and S. Khurshid, “Test input generation with javapathfinder,”ACM SIGSOFT Software Engineering Notes, vol. 29, no. 4, pp. 97–107, 2004.[22]T. Xie, D. Marinov, W. Schulte, and D. Notkin, “Symstra: A framework for generat-ing object-oriented unit tests using symbolic execution,” inInternational Conference40 on Tools and Algorithms for the Construction and Analysis of Systems, pp. 365–381,Springer, 2005.[23]C. S. Păsăreanu, M. B. Dwyer, and W. Visser, “Finding feasible counter-exampleswhen model checking abstracted java programs,” inInternational Conference onTools and Algorithms for the Construction and Analysis of Systems, pp. 284–298,Springer, 2001.[24]C. Csallner and Y. Smaragdakis, “Check’n’crash: combining static checking andtesting,” inProceedings of the 27th international conference on Software engineering,pp. 422–431, ACM, 2005.[25]C. S. Păsăreanu, W. Visser, D. Bushnell, J. Geldenhuys, P. Mehlitz, and N. Rungta,“Symbolic pathfinder: integrating symbolic execution with model checking for javabytecode analysis,”Automated Software Engineering, vol. 20, no. 3, pp. 391–425,2013.[26]K. Sen and G. Agha, “A race-detection and flipping algorithm for automated testingof multi-threaded programs,” 2006.[27]V. G. A. K. Karthick Jayaraman, David Harvison, “jfuzz: A concolic tester for nasajava,” 2009.[28]D. S. A. . P. M. J. H. . P. M. N. . P. H. Yang, “nstruments android apps using sootfor dynamic symbolic execution,” 2013.[29]a S. Pasareanu Neha Rungta, “Symbolic pathfinder: symbolic execution of java byte-code,” 2010.[30]C. Cadar, D. Dunbar, D. R. Engler,et al., “Klee: Unassisted and automatic genera-tion of high-coverage tests for complex systems programs.,” inOSDI, vol. 8, pp. 209–224, 2008.41[31]K. Sen, D. Marinov, and G. Agha, “Cute: a concolic unit testing engine for c,” inACM SIGSOFT Software Engineering Notes, vol. 30, pp. 263–272, ACM, 2005.[32]S. Mechtaev, J. Yi, and A. Roychoudhury, “Angelix: Scalable multiline pro-gram patch synthesis via symbolic analysis,” inSoftware Engineering (ICSE), 2016IEEE/ACM 38th International Conference on, pp. 691–701, IEEE, 2016.[33]L. Luu, D.-H. Chu, H. Olickel, P. Saxena, and A. Hobor, “Making smart contractssmarter,” inProceedings of the 2016 ACM SIGSAC Conference on Computer andCommunications Security, CCS ’16, (New York, NY, USA), pp. 254–269, ACM, 2016.[34]E. Brier and M. Joye, “Weierstraß elliptic curves and side-channel attacks,” inInter-national Workshop on Public Key Cryptography, pp. 335–345, Springer, 2002.[35]W. Schindler, K. Lemke, and C. Paar, “A stochastic model for differential side channelcryptanalysis,” inInternational Workshop on Cryptographic Hardware and EmbeddedSystems, pp. 30–46, Springer, 2005.[36]Y. Zhang, “Cache side channels: State of the art and research opportunities,” inProceedings of the 2017 ACM SIGSAC Conference on Computer and CommunicationsSecurity - CCS 17, 2017.[37]J. Chen, Y. Feng, and I. Dillig, “Precise detection of side-channel vulnerabilitiesusing quantitative cartesian hoare logic,” inProceedings of the 2017 ACM SIGSACConference on Computer and Communications Security - CCS 17, 2017.[38]A. Aggarwal and P. Jalote, “Integrating static and dynamic analysis for detectingvulnerabilities,” inComputer Software and Applications Conference, 2006. COMP-SAC’06. 30th Annual International, vol. 1, pp. 343–350, IEEE, 2006.[39]P. Godefroid, N. Klarlund, and K. Sen, “Dart: directed automated random testing,”inACM Sigplan Notices, vol. 40, pp. 213–223, ACM, 2005.42[40]N. Jovanovic, C. Kruegel, and E. Kirda, “Pixy: A static analysis tool for detectingweb application vulnerabilities,” inSecurity and Privacy, 2006 IEEE Symposium on,pp. 6–pp, IEEE, 2006.[41]T. B. J. D. T. Ball, “Deconstructing dynamic symbolic execution,” 2015.[42]M. E. Mostafa Hassan, Caterina Urban and P. Müller, “Maxsmt-based type inferencefor python,” 2018.[43]A. L. F. S. Casper Boone, Niels de Bruin, “Deep learning type inference of pythonfunction signatures using natural language context,” 2019.[44]C. Barrett, A. Stump, C. Tinelli,et al., “The smt-lib standard: Version 2.0,” inPro-ceedings of the 8th International Workshop on Satisfiability Modulo Theories (Edin-burgh, England), vol. 13, p. 14, 2010. zh_TW dc.identifier.doi (DOI) 10.6814/NCCU202100461 en_US