Title 環境政策與永續轉型的動態均衡:異質參與者之強化學習分析
The Dynamics of Environmental Policy and Sustainability Transition: A Heterogeneous Multi-Agent Reinforcement Learning Approach
Author 曾婷婉
Tseng, Ting-Wan
Contributors 何靜嫺 (advisor)
曾婷婉
Tseng, Ting-Wan
Keywords Multi-agent reinforcement learning
Environmental policy
Green transition
Carbon tax
Sustainability
Incomplete information
Imperfect competition
Agent-based modeling
Targeted subsidy
Policy simulation
Date 2025
Upload time 4-Aug-2025 12:49:24 (UTC+8)
Abstract This study develops a multi-agent reinforcement learning (MARL) model to examine the dynamic effects of environmental policies in a market with imperfect competition and incomplete information. The agents (consumers, firms, and the government) learn behavioral strategies through repeated interaction and make endogenous decisions such as green investment, pricing, and labor supply. The model incorporates noisy preferences, carbon taxation, and targeted subsidies. Simulation results show that informational frictions foster experimentation and accelerate green adoption, whereas common shocks facilitate tacit coordination among firms but reduce green innovation. Flexible emission taxes and targeted subsidies for green leaders and low-wage workers achieve policy goals more effectively than fixed-rate taxes or proportional schemes. These findings highlight the advantages of MARL for modeling complex policy environments and offer practical insights for designing adaptive and inclusive sustainability transitions.
Description Master's degree
National Chengchi University
Department of Economics
112258004
Source http://thesis.lib.nccu.edu.tw/record/#G0112258004
Type thesis
dc.contributor.advisor 何靜嫺 [zh_TW]
dc.date.accessioned 4-Aug-2025 12:49:24 (UTC+8)
dc.date.available 4-Aug-2025 12:49:24 (UTC+8)
dc.identifier (Other Identifiers) G0112258004 [en_US]
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/158268
dc.description.tableofcontents [zh_TW]
List of Tables
List of Figures
1. Introduction
2. Literature Review
  2.1 Machine Learning in Economics Analyses
  2.2 Economic Analyses on Environmental Policies
3. Imperfect Competition Market with Strategic Agents and Limited Information
  3.1 Worker-Consumers
  3.2 Price-Setting Firms and Green Decisions
  3.3 Government
4. Reinforcement Learning and Dynamic Market Frictions
  4.1 Introduction to Policy Optimization in RL
  4.2 Environment Structure and Timing of Decisions
  4.3 Policy Learning and Agent Adaption
  4.4 RL Training Procedure
  4.5 Key Implementation Details
5. Scenarios Evaluations and Experiment Results
  5.1 Scenarios to Evaluate
  5.2 Experiment Results and Discussions
6. Conclusions
Appendix A. Supplementary Figures for Experimental Scenarios
Appendix B. Notation and Simulation Parameters
References
dc.format.extent 5411914 bytes
dc.format.mimetype application/pdf