Publications-Theses

Title 動態符號執行測試用於自動深度網絡測試
Dynamic Concolication for Automatic Deep Network Testing
Author 蔣其叡 (Chiang, Chi-Rui)
Contributors 郁方 (Yu, Fang), advisor
蔣其叡 (Chiang, Chi-Rui)
Keywords Concolic Testing
Automatic Unit Testing
Python
Dynamic Concolication
NCCU
Date 2024
Uploaded 4-Sep-2024 14:04:08 (UTC+8)
Abstract Concolic testing, which combines concrete testing and symbolic execution, has proven highly effective in identifying software vulnerabilities. This paper focuses on applying PyCT, a concolic testing tool, to the automated generation of unit tests and their required inputs. Our objective is not only to perform concolic testing on the target program but also to employ Dynamic Subroutine Tracking (DST) to wrap the subroutines and external libraries called by the target program for symbolic execution, thereby checking for potential vulnerabilities in their interactions. The motivation behind this approach is to address the premature downgrading of concolic variables during testing, e.g., due to unsupported operations, which can prevent subsequent testing from using symbolic expressions. By upgrading the inputs of the current execution and its subroutines to concolic variables, we mitigate the impact of premature downgrading, thus ensuring more comprehensive concolic testing coverage. We also incorporate fuzzing techniques into DST when encountering input types that cannot be upgraded to concolic variables. Experimental results demonstrate the effectiveness of our approach in enhancing concolic testing for various Python libraries, showcasing improved testing coverage and the detection of potential vulnerabilities. Our method can generate extensive test cases for target libraries with minimal initial effort.
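To make the approach concrete, here is a minimal Python sketch of the dynamic subroutine tracking idea described in the abstract: a profiler first observes which subroutines the target invokes during a concrete run, and those subroutines are then wrapped so that supported inputs are upgraded to concolic variables on entry. The ConcolicInt wrapper, upgrade helper, and dst_wrap decorator are illustrative assumptions for this sketch, not PyCT's actual API.

    import functools
    import sys

    class ConcolicInt(int):
        """Illustrative concolic integer: a concrete value paired with a
        symbolic expression (a stand-in for PyCT's real wrapper types)."""
        def __new__(cls, value, expr=None):
            obj = super().__new__(cls, value)
            obj.expr = expr if expr is not None else str(value)
            return obj
        def __add__(self, other):
            return ConcolicInt(int(self) + int(other),
                               f"({self.expr} + {getattr(other, 'expr', other)})")

    SUPPORTED = (int,)  # argument types this sketch knows how to upgrade

    def upgrade(value):
        # Unsupported types stay concrete; they become candidates for
        # the fuzzing fallback mentioned in the abstract.
        return ConcolicInt(value) if type(value) in SUPPORTED else value

    observed = []  # subroutines seen during the concrete run

    def tracker(frame, event, arg):
        if event == "call":  # a Python-level function call
            observed.append(frame.f_code.co_name)

    def dst_wrap(func):
        """Wrap a tracked subroutine so its inputs are upgraded on entry."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*(upgrade(a) for a in args),
                        **{k: upgrade(v) for k, v in kwargs.items()})
        return wrapper

    def double(y):           # a subroutine invoked by the target
        return y + y

    def target(x):
        return double(x)

    sys.setprofile(tracker)  # dynamic subroutine tracking of the concrete run
    target(3)
    sys.setprofile(None)

    double = dst_wrap(double)    # upgrade inputs on subsequent runs
    result = target(3)
    print(observed)              # includes 'target' and 'double'
    print(result, result.expr)   # 6 (3 + 3)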
References
Ahmadilivani, M. H., Taheri, M., Raik, J., Daneshtalab, M., and Jenihhin, M. (2023). A systematic literature review on hardware reliability assessment methods for deep neural networks.
Araki, L. Y. and Peres, L. M. (2018). A systematic review of concolic testing with application of test criteria. In Proceedings of the 20th International Conference on Enterprise Information Systems - Volume 2: ICEIS, pages 121–132. INSTICC, SciTePress.
Bai, T., Huang, S., Huang, Y., Wang, X., Xia, C., Qu, Y., and Yang, Z. (2024). CriticalFuzz: A critical neuron coverage-guided fuzz testing framework for deep neural networks. Information and Software Technology, 172:107476.
Ball, T. and Daniel, J. (2015). Deconstructing dynamic symbolic execution. In Irlbeck, M., Peled, D. A., and Pretschner, A., editors, Dependable Software Systems Engineering, volume 40 of NATO Science for Peace and Security Series, D: Information and Communication Security, pages 26–41. IOS Press.
Cadar, C. and Sen, K. (2013). Symbolic execution for software testing: three decades later. Commun. ACM, 56(2):82–90.
Caniço, A. B. and Santos, A. L. (2023). Witter: A library for white-box testing of introductory programming algorithms. In Proceedings of the 2023 ACM SIGPLAN International Symposium on SPLASH-E, SPLASH-E 2023, pages 69–74, New York, NY, USA. Association for Computing Machinery.
Chen, Y.-F., Tsai, W.-L., Wu, W.-C., Yen, D.-D., and Yu, F. (2021). PyCT: A Python concolic tester. In Oh, H., editor, Programming Languages and Systems, pages 38–46, Cham. Springer International Publishing.
Gopinath, D., Wang, K., Zhang, M., Pasareanu, C. S., and Khurshid, S. (2018). Symbolic execution for deep neural networks.
Gu, J., Luo, X., Zhou, Y., and Wang, X. (2022). Muffin: Testing deep learning libraries via neural architecture fuzzing.
Huang, J.-t., Zhang, J., Wang, W., He, P., Su, Y., and Lyu, M. R. (2022). AEON: A method for automatic evaluation of NLP test cases. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2022, pages 202–214, New York, NY, USA. Association for Computing Machinery.
Ji, P., Feng, Y., Liu, J., Zhao, Z., and Chen, Z. (2022). ASRTest: Automated testing for deep-neural-network-driven speech recognition systems. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2022, pages 189–201, New York, NY, USA. Association for Computing Machinery.
Khan, M. (2011). Different approaches to black box testing technique for finding errors. International Journal of Software Engineering Applications, 2.
Klees, G., Ruef, A., Cooper, B., Wei, S., and Hicks, M. (2018). Evaluating fuzz testing.
Li, R., Yang, P., Huang, C.-C., Sun, Y., Xue, B., and Zhang, L. (2022). Towards practical robustness analysis for DNNs based on PAC-model learning. In Proceedings of the 44th International Conference on Software Engineering, ICSE ’22, pages 2189–2201, New York, NY, USA. Association for Computing Machinery.
Liu, Z., Feng, Y., and Chen, Z. (2021). DialTest: Automated testing for recurrent-neural-network-driven dialogue systems. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021, pages 115–126, New York, NY, USA. Association for Computing Machinery.
Manès, V. J., Han, H., Han, C., Cha, S. K., Egele, M., Schwartz, E. J., and Woo, M. (2021). The art, science, and engineering of fuzzing: A survey. IEEE Transactions on Software Engineering, 47(11):2312–2331.
Sen, K., Marinov, D., and Agha, G. (2005). CUTE: A concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-13, pages 263–272, New York, NY, USA. Association for Computing Machinery.
Wang, S., Shrestha, N., Subburaman, A. K., Wang, J., Wei, M., and Nagappan, N. (2021a). Automatic unit test generation for machine learning libraries: How far are we? In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), pages 1548–1560.
Wang, Z., You, H., Chen, J., Zhang, Y., Dong, X., and Zhang, W. (2021b). Prioritizing test inputs for deep neural networks via mutation analysis. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), pages 397–409.
Xia, C. S., Dutta, S., Misailovic, S., Marinov, D., and Zhang, L. (2023). Balancing effectiveness and flakiness of non-deterministic machine learning tests. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 1801–1813.
Xie, D., Li, Y., Kim, M., Pham, H. V., Tan, L., Zhang, X., and Godfrey, M. W. (2022). DocTer: Documentation-guided fuzzing for testing deep learning API functions. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2022, pages 176–188, New York, NY, USA. Association for Computing Machinery.
Yang, C., Deng, Y., Yao, J., Tu, Y., Li, H., and Zhang, L. (2023). Fuzzing automatic differentiation in deep-learning libraries.
Yu, F., Chi, Y.-Y., and Chen, Y.-F. (2024a). Constraint-based adversarial example synthesis.
Yu, F., Chi, Y.-Y., and Chen, Y.-F. (2024b). Constraint-based adversarial example synthesis.
Zhang, J. and Li, J. (2020). Testing and verification of neural-network-based safety-critical control software: A systematic literature review. Information and Software Technology, 123:106296.
Zhang, X., Sun, N., Fang, C., Liu, J., Liu, J., Chai, D., Wang, J., and Chen, Z. (2021). Predoo: Precision testing of deep learning operators. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021, pages 400–412, New York, NY, USA. Association for Computing Machinery.
Zhao, X., Qu, H., Xu, J., Li, X., Lv, W., and Wang, G.-G. (2023). A systematic review of fuzzing. Soft Comput., 28(6):5493–5522.
Description Master's thesis
National Chengchi University
Department of Management Information Systems
111356024
Source http://thesis.lib.nccu.edu.tw/record/#G0111356024
Type thesis
Identifier G0111356024
URI https://nccur.lib.nccu.edu.tw/handle/140.119/153152
Table of Contents
1 Introduction 1
2 Related Work 5
2.1 Concolic Testing 5
2.2 Testing Methodologies for Deep Learning Models 7
3 Methodology 10
3.1 PyCT Testing of Invoked Subroutine 10
3.2 PyCT Testing of Dynamic Subroutine Tracking 11
3.2.1 Profiler for Function Call Tracing 12
3.2.2 Checking Suitable Argument Type for PyCT 14
3.2.3 Example Application: Analyzing the updatecache Function in linecache.py 15
3.2.4 Advantages of Dynamic Subroutine Tracking 15
3.3 Fuzzing Testing of Dynamic Subroutine Tracking 17
3.3.1 Detailed Steps of the Fuzzing-Integrated DST 18
3.3.2 Advantages and Challenges 19
4 Experiments 21
4.1 PyCT Testing of Dynamic Subroutine Tracking 21
5 Experiment Results 25
5.1 Python Libraries 25
5.2 ML Library 27
5.3 Inference of NN Models 29
6 Discussion 31
7 Conclusions 33
References 34
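Section 3.3 above covers a fuzzing-integrated DST for inputs that cannot become concolic variables. The sketch below illustrates such a fallback under stated assumptions: mutate, fuzz_subroutine, parse_header, and the MAX_TRIALS budget are hypothetical names for this example, not the thesis's actual implementation.

    import random

    MAX_TRIALS = 100  # illustrative fuzzing budget per subroutine (an assumption)

    def mutate(value):
        """Naive mutation for inputs that cannot be upgraded to
        concolic variables (here: bytes and str)."""
        if isinstance(value, bytes):
            data = bytearray(value or b"\x00")
            data[random.randrange(len(data))] ^= random.getrandbits(8)
            return bytes(data)
        if isinstance(value, str):
            pos = random.randrange(len(value) + 1)
            return value[:pos] + chr(random.randrange(32, 127)) + value[pos:]
        return value

    def fuzz_subroutine(func, seed_args, trials=MAX_TRIALS):
        """Call `func` repeatedly with mutated copies of the seed inputs
        and record inputs that raise exceptions (crash candidates)."""
        failures = []
        for _ in range(trials):
            args = [mutate(a) for a in seed_args]
            try:
                func(*args)
            except Exception as exc:
                failures.append((args, exc))
        return failures

    # Example: fuzz a parser-like subroutine that DST observed being called.
    def parse_header(raw):
        name, _, value = raw.partition(b":")
        if not name:
            raise ValueError("empty header name")
        return name.strip(), value.strip()

    for args, exc in fuzz_subroutine(parse_header, [b"Host: example.com"]):
        print(args, type(exc).__name__, exc)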
Format application/pdf (752779 bytes)