環境政策與永續轉型的動態均衡：異質參與者之強化學習分析

Title	環境政策與永續轉型的動態均衡：異質參與者之強化學習分析 / The Dynamics of Environmental Policy and Sustainability Transition: A Heterogeneous Multi-Agent Reinforcement Learning Approach
Author	曾婷婉 (Tseng, Ting-Wan)
Contributors	何靜嫺 (advisor); 曾婷婉 (Tseng, Ting-Wan)
Keywords	Multi-agent reinforcement learning; Environmental policy; Green transition; Carbon tax; Sustainability; Incomplete information; Imperfect competition; Agent-based modeling; Targeted subsidy; Policy simulation
Date	2025
Upload time	4-Aug-2025 12:49:24 (UTC+8)
Abstract	This paper develops a multi-agent reinforcement learning (MARL) model to examine the dynamic effects of environmental policies in a market with imperfect competition and incomplete information. Agents (consumers, firms, and the government) learn behavioral strategies through repeated interactions, allowing for endogenous decisions such as green investment, pricing, and labor supply. The model incorporates noisy preferences, carbon taxation, and targeted subsidies. Simulation results show that informational frictions foster experimentation and accelerate green adoption, while common shocks improve tacit coordination among firms but reduce green innovation. Flexible emission taxes and targeted subsidies for green leaders and low-wage workers are more effective than fixed-rate taxes or proportional schemes. These findings highlight the advantages of MARL in modeling complex policy environments and provide practical insights for designing adaptive and inclusive sustainability transitions.
Description	Master's thesis; National Chengchi University; Department of Economics; 112258004
Source	http://thesis.lib.nccu.edu.tw/record/#G0112258004
Type	thesis

dc.contributor.advisor	何靜嫺	zh_TW
dc.contributor.author (Authors)	曾婷婉	zh_TW
dc.contributor.author (Authors)	Tseng, Ting-Wan	en_US
dc.creator (Author)	曾婷婉	zh_TW
dc.creator (Author)	Tseng, Ting-Wan	en_US
dc.date (Date)	2025	en_US
dc.date.accessioned	4-Aug-2025 12:49:24 (UTC+8)	-
dc.date.available	4-Aug-2025 12:49:24 (UTC+8)	-
dc.date.issued (Upload time)	4-Aug-2025 12:49:24 (UTC+8)	-
dc.identifier (Other Identifiers)	G0112258004	en_US
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/158268	-
dc.description (Description)	Master's thesis	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Department of Economics	zh_TW
dc.description (Description)	112258004	zh_TW
dc.description.tableofcontents	List of Tables; List of Figures; 1. Introduction; 2. Literature Review; 2.1 Machine Learning in Economics Analyses; 2.2 Economic Analyses on Environmental Policies; 3. Imperfect Competition Market with Strategic Agents and Limited Information; 3.1 Worker-Consumers; 3.2 Price-Setting Firms and Green Decisions; 3.3 Government; 4. Reinforcement Learning and Dynamic Market Frictions; 4.1 Introduction to Policy Optimization in RL; 4.2 Environment Structure and Timing of Decisions; 4.3 Policy Learning and Agent Adaption; 4.4 RL Training Procedure; 4.5 Key Implementation Details; 5. Scenarios Evaluations and Experiment Results; 5.1 Scenarios to Evaluate; 5.2 Experiment Results and Discussions; 6. Conclusions; Appendix A. Supplementary Figures for Experimental Scenarios; Appendix B. Notation and Simulation Parameters; References	zh_TW
dc.format.extent	5411914 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0112258004	en_US
dc.subject (Keywords)	Multi-agent reinforcement learning	en_US
dc.subject (Keywords)	Environmental policy	en_US
dc.subject (Keywords)	Green transition	en_US
dc.subject (Keywords)	Carbon tax	en_US
dc.subject (Keywords)	Sustainability	en_US
dc.subject (Keywords)	Incomplete information	en_US
dc.subject (Keywords)	Imperfect competition	en_US
dc.subject (Keywords)	Agent-based modeling	en_US
dc.subject (Keywords)	Targeted subsidy	en_US
dc.subject (Keywords)	Policy simulation	en_US
dc.title (Title)	環境政策與永續轉型的動態均衡：異質參與者之強化學習分析	zh_TW
dc.title (Title)	The Dynamics of Environmental Policy and Sustainability Transition: A Heterogeneous Multi-Agent Reinforcement Learning Approach	en_US
dc.type (Type)	thesis	en_US
