Title: 結合大型語言模型之代理用於 Android App 錯誤重現任務 (Combining Large Language Models for Agent Tasks in Android App Bug Reproduction)
Author: 黃毓學
Advisor: 蔡子傑
Keywords: Automated Bug Reproduction; Software Testing and Debugging; Large Language Models; Prompt Engineering; Android App
Date: 2025
Uploaded: 3-Mar-2025 14:28:52 (UTC+8)

Abstract:
The research fields of agent tasks and large language models (LLMs) continue to influence each other: agent tasks expand the kinds of data available to LLMs, while LLMs solve agent problems that reinforcement learning and supervised learning previously could not, and combining the two has become a trend. This thesis explores using an LLM as an agent to reproduce bug reports for Android apps when so many reproduction steps are missing that reinforcement learning cannot reproduce the bug reliably. Reformulating the task shifts the difficulty of reward design in reinforcement learning to crafting appropriate prompts for the LLM, including the use of log parsing tools to limit the damage long contexts do to the accuracy of the LLM's generated text.

Drawing on ideas from reinforcement learning while relying heavily on the LLM, the thesis curbs inefficient exploration of a large state space with the concept of subgoal regions: the LLM identifies only the regions highly related to the target sentence, reducing the number of candidates to search and compare. The problem is decomposed into subtasks an LLM agent can execute, in a workflow of subgoal-region mapping, static planning, dynamic adjustment, and dynamic exploration that applies the LLM's planning, reasoning, and text extraction-and-substitution capabilities. The contribution of this thesis is the prompt engineering needed to bring LLMs into bug reproduction when a large portion of the description is missing.

The LLM's planning and reasoning capabilities were evaluated on each subtask in the workflow. In the subgoal-region subtask, GPT-4 mapped to the correct target region with Top-1 accuracy of 57% and Top-2 accuracy of 100%. In the static-planning subtask, the LLM achieved Top-1 accuracy of 42%, Top-2 accuracy of 71%, and Top-3 accuracy of 100%. To reduce the impact of long contexts, which can cause inaccurate generation, the Spell algorithm, an event-log parameter extraction tool, was applied; with it the LLM reached 90% accuracy on the string extraction subtask.

However, in two subtasks, substituting the extracted text and dynamically generating suggested actions, the LLM showed high false-positive rates. This cannot be tolerated in bug reproduction, since it can leave the basis of subsequent reproduction inconsistent with the user's description; the result shows there is still room for improvement before LLM agents fully automate bug reproduction. Future work includes reasoning-oriented language models and OpenAI's recently proposed reinforcement fine-tuning, adjusting through training how the LLM orders its outputs, so that LLM agents perform more accurately on this specific task and bug reproduction becomes fully automated.
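To make the subgoal-region step concrete, here is a minimal sketch (not the thesis's actual prompts or code; the prompt wording, region names, and the `rank_subgoal_regions` helper are assumptions) of asking an LLM to rank candidate UI regions against a target sentence so that exploration can be restricted to the top-ranked regions:

```python
# Hypothetical sketch of subgoal-region ranking with an LLM.
# Assumes the openai Python client (>= 1.0) and OPENAI_API_KEY in the
# environment; prompt wording and parsing are illustrative only.
from openai import OpenAI

client = OpenAI()

def rank_subgoal_regions(target_sentence: str, regions: list[str], k: int = 2) -> list[str]:
    """Ask the LLM which candidate UI regions best match the target sentence."""
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(regions))
    prompt = (
        "A bug report step must be reproduced in an Android app.\n"
        f"Step: {target_sentence}\n"
        f"Candidate UI regions:\n{numbered}\n"
        f"Return the {k} most relevant region numbers, comma-separated."
    )
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    ).choices[0].message.content
    # Parse the returned numbers back into region names, preserving rank order.
    picked = [int(tok) - 1 for tok in reply.replace(",", " ").split() if tok.isdigit()]
    return [regions[i] for i in picked if 0 <= i < len(regions)][:k]

# Example (hypothetical region names):
# regions = ["Deck list", "Card browser", "Settings > Backup", "Statistics"]
# rank_subgoal_regions("Tap the backup option in settings", regions)
```

Restricting the search to the top-k regions is what turns the Top-1/Top-2 accuracy figures above into a practical bound on wasted exploration.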
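The Spell algorithm [12] matches each incoming log line against learned templates by longest common subsequence and treats the non-matching tokens as extracted parameters. A simplified sketch of that idea follows; positional token matching on equal-length lines stands in for Spell's true LCS matching and prefix-tree lookup, so this is an assumption-laden toy rather than the published algorithm:

```python
# Simplified Spell-style log parsing (after reference [12]): match each
# log line against stored templates; tokens that disagree become "*"
# parameter slots and are returned as extracted parameters.
templates: list[list[str]] = []  # each template is a token list; "*" marks a parameter

def parse_line(line: str) -> tuple[list[str], list[str]]:
    """Return (template, parameters) for one log line, updating templates."""
    tokens = line.split()
    for tpl in templates:
        if len(tpl) != len(tokens):
            continue  # toy restriction; real Spell handles unequal lengths via LCS
        matched = sum(a == b for a, b in zip(tpl, tokens))
        if matched >= len(tokens) / 2:  # Spell's merge threshold
            params = [t for a, t in zip(tpl, tokens) if a != t]
            tpl[:] = [a if a == t else "*" for a, t in zip(tpl, tokens)]
            return tpl[:], params
    templates.append(tokens[:])  # first occurrence of this message shape
    return tokens[:], []

# Two similar event-log lines collapse into one template plus parameters:
# parse_line("Opened deck MyDeck with 42 cards")
# parse_line("Opened deck Vocab with 7 cards")
#   -> (['Opened', 'deck', '*', 'with', '*', 'cards'], ['Vocab', '7'])
```

Handing the LLM the short template-plus-parameters form instead of the raw log is what keeps the context small in the string extraction subtask.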
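The Top-k accuracy figures quoted above follow the usual definition: a case counts as correct when the ground truth appears among the model's k highest-ranked outputs. A self-contained helper for illustration (the example data are hypothetical, not the thesis's evaluation set):

```python
# Top-k accuracy: fraction of cases whose ground truth appears in the
# model's k top-ranked candidates.
def top_k_accuracy(ranked_predictions: list[list[str]], ground_truth: list[str], k: int) -> float:
    hits = sum(truth in preds[:k] for preds, truth in zip(ranked_predictions, ground_truth))
    return hits / len(ground_truth)

# Hypothetical subgoal-region outputs for three bug reports:
# preds = [["Settings", "Deck list"], ["Deck list", "Browser"], ["Browser", "Settings"]]
# truth = ["Settings", "Browser", "Browser"]
# top_k_accuracy(preds, truth, k=1)  # -> 0.67
# top_k_accuracy(preds, truth, k=2)  # -> 1.0
```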
References
[1] Zhang, Z., Winn, R., Zhao, Y., Yu, T., & Halfond, W. G. J. (2023). Automatically Reproducing Android Bug Reports Using Natural Language Processing and Reinforcement Learning. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, Seattle, WA, USA. https://doi.org/10.1145/3597926.3598066
[2] Zhang, Z., Tawsif, F. M., Ryu, K., Yu, T., & Halfond, W. G. J. (2024). Mobile Bug Report Reproduction via Global Search on the App UI Model. Proc. ACM Softw. Eng., 1(FSE), Article 117. https://doi.org/10.1145/3660824
[3] Ran, D., Wang, H., Song, Z., Wu, M., Cao, Y., Zhang, Y., Yang, W., & Xie, T. (2024). Guardian: A Runtime Framework for LLM-Based UI Exploration. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria. https://doi.org/10.1145/3650212.3680334
[4] Dziri, N., Lu, X., Sclar, M., Li, X. L., Jiang, L., Yuchen Lin, B., West, P., Bhagavatula, C., Le Bras, R., Hwang, J. D., Sanyal, S., Welleck, S., Ren, X., Ettinger, A., Harchaoui, Z., & Choi, Y. (2023). Faith and Fate: Limits of Transformers on Compositionality. arXiv:2305.18654.
[5] Liu, E. Z., et al. (2018). Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration. arXiv:1802.08802.
[6] Kim, G., et al. (2023). Language Models can Solve Computer Tasks. arXiv:2303.17491.
[7] Xi, Z., et al. (2023). The Rise and Potential of Large Language Model Based Agents: A Survey. arXiv:2309.07864.
[8] Jothimurugan, K., Bastani, O., & Alur, R. (2020). Abstract Value Iteration for Hierarchical Reinforcement Learning. arXiv:2010.15638.
[9] Du, Y., Watkins, O., Wang, Z., Colas, C., Darrell, T., Abbeel, P., Gupta, A., & Andreas, J. (2023). Guiding Pretraining in Reinforcement Learning with Large Language Models. arXiv:2302.06692.
[10] Song, C. H., Wu, J., Washington, C., Sadler, B. M., Chao, W.-L., & Su, Y. (2022). LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. arXiv:2212.04088.
[11] Lan, Y., Lu, Y., Li, Z., Pan, M., Yang, W., Zhang, T., & Li, X. (2024). Deeply Reinforcing Android GUI Testing with Deep Reinforcement Learning. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal. https://doi.org/10.1145/3597503.3623344
[12] Du, M., & Li, F. (2016). Spell: Streaming Parsing of System Event Logs. In 2016 IEEE 16th International Conference on Data Mining (ICDM).
[13] Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., & Dormann, N. (2021). Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res., 22(1), Article 268.
[14] Huang, S., & Ontañón, S. (2020). A Closer Look at Invalid Action Masking in Policy Gradient Algorithms. arXiv:2006.14171.
[15] Vaswani, A., et al. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems.
[16] Gaon, M., & Brafman, R. (2020). Reinforcement Learning with Non-Markovian Rewards. In Proceedings of the AAAI Conference on Artificial Intelligence.
[17] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language Models Are Few-Shot Learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada.
[18] Ye, J., Chen, X., Xu, N., Zu, C., Shao, Z., Liu, S., Cui, Y., Zhou, Z., Gong, C., Shen, Y., Zhou, J., Chen, S., Gui, T., Zhang, Q., & Huang, X. (2023). A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models. arXiv:2303.10420.
[19] Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., Avila, R., Babuschkin, I., Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., Bello, I., … Zoph, B. (2023). GPT-4 Technical Report. arXiv:2303.08774.
[20] Du, M., Li, F., Zheng, G., & Srikumar, V. (2017). DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, Texas, USA. https://doi.org/10.1145/3133956.3134015
[21] Wang, W., Bao, H., Huang, S., Dong, L., & Wei, F. (2021). MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
[22] UI Automator. https://developer.android.com/training/testing/ui-automator
[23] Bug Report: AnkiDroid #6432 (2020). https://github.com/ankidroid/Anki-Android/issues/6432
[24] ReproBot Website (2023). https://sites.google.com/usc.edu/reprobot/home
[25] Wang, D., Zhao, Y., Feng, S., Zhang, Z., Halfond, W. G. J., Chen, C., Sun, X., Shi, J., & Yu, T. (2024). Feedback-Driven Automated Whole Bug Report Reproduction for Android Apps. arXiv:2407.05165.
[26] Peng, A., Sucholutsky, I., Li, B. Z., Sumers, T. R., Griffiths, T. L., Andreas, J., & Shah, J. A. (2024). Learning with Language-Guided State Abstractions. arXiv:2402.18759.

Description: Master's thesis
National Chengchi University
In-service Master's Program, Department of Computer Science
Student ID: 110971022
Source: http://thesis.lib.nccu.edu.tw/record/#G0110971022
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/155990
Type: thesis
Table of Contents
Chapter 1: Introduction
 1.1 Background
 1.2 The Bug Reproduction Task
 1.3 Motivation and Goals
 1.4 Related Work
 1.5 Chapter Overview
Chapter 2: Theoretical Foundations of the Method
 2.1 Reference Works for Method Construction
 2.2 A Framework for the Bug Reproduction Agent Task: Text Normalization
 2.3 Reducing Search-Space Complexity: Partitioning Subgoal Regions
 2.4 Subgoal-Region Matching with a Pretrained Language Model
 2.5 Generating Low-Level Action Plans
 2.6 Replacing the Exploration Reward Function with Language Goals
 2.7 Large Language Models for Computer Tasks in Human Language
 2.8 Information Extraction to Shorten Context
Chapter 3: Experimental Method
 3.1 Section Overview
 3.2 Space Partitioning and Subgoal-Region Paths
 3.3 System Architecture
 3.4 Converting Sentences into Bug Reproduction Steps and Matching
 3.5 Partitioning Subgoal Regions
 3.6 Dynamic Exploration and Static-Plan Revision
Chapter 4: Experimental Results and Analysis
 4.1 Experiment Description
 4.2 Results
 4.3 Analysis
Chapter 5: Conclusion and Future Work
References
