Publications-Theses

NCCU Library

Title: Leveraging LLM, RAG, and Prompt Engineering for Risk Identification in Sustainability Reports
(運用 LLM、RAG 與提示工程於永續報告書中的風險識別)
Author: Wang, Zih-Yun (王子云)
Advisor: Lin, Yi-Ling (林怡伶)
Keywords: Corporate Social Responsibility (CSR); Environmental, Social and Corporate Governance (ESG); sustainability report; Large Language Models (LLMs); Retrieval-Augmented Generation (RAG); prompt engineering; Chain-of-Thought (CoT); contextual risk detection
Date: 2025
Uploaded: 4-Aug-2025 14:26:12 (UTC+8)
Abstract: As the concepts of corporate social responsibility (CSR) and environmental, social, and governance (ESG) receive growing attention, stakeholders increasingly rely on corporate sustainability reports for transparent insight into a company's sustainability practices and its approach to risk identification and management. However, the risks companies face are complex and diverse, making them difficult for stakeholders to analyze comprehensively. Automatically extracting both explicit and implicit risks from lengthy, unstandardized texts is particularly challenging, because traditional keyword-based methods struggle with varied wording and nuanced context. In collaboration with the Sinyi School at National Chengchi University's College of Commerce, we propose an end-to-end Retrieval-Augmented Generation (RAG) pipeline for automated risk detection in Chinese sustainability reports and evaluate it on 30 reports published in Taiwan in 2024, spanning five industries. We compare four prompting strategies: zero-shot, zero-shot chain-of-thought (CoT), few-shot, and few-shot CoT. An ensemble of these strategies achieves a median per-risk F1 score of 0.90 while remaining time- and cost-efficient. An error analysis of the CoT outputs identifies four common failure types. We also release domain-adapted prompt templates to support future research on risk detection in Chinese sustainability reports. Our results demonstrate that combining large language models (LLMs), RAG, and prompt engineering can reliably automate risk-disclosure analysis, enhancing transparency and stakeholder trust.
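The abstract's headline numbers rest on two mechanics: a majority-vote ensemble over the four prompting strategies, and a median of per-risk F1 scores. As a purely illustrative sketch (the thesis code is not reproduced here; every name, tie-breaking rule, and data value below is an assumption), those two pieces might look like:

```python
from statistics import median

STRATEGIES = ["zero_shot", "zero_shot_cot", "few_shot", "few_shot_cot"]

def ensemble_vote(predictions: dict[str, bool]) -> bool:
    """Majority vote across prompting strategies for one (report, risk) pair.

    Ties are counted as "disclosed" here; the thesis may break ties differently.
    """
    votes = sum(predictions.values())
    return votes * 2 >= len(predictions)

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from true-positive, false-positive, and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def median_per_risk_f1(per_risk_counts: dict[str, tuple[int, int, int]]) -> float:
    """Median of the per-risk F1 scores (the abstract's summary metric)."""
    return median(f1(*counts) for counts in per_risk_counts.values())

# Hypothetical counts for three risk categories: (tp, fp, fn) each.
counts = {"climate": (9, 1, 1), "supply_chain": (8, 2, 0), "governance": (7, 0, 3)}
print(median_per_risk_f1(counts))
```

Reporting the median rather than the mean keeps the summary robust to a few risk categories with unusually low scores.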
Description: Master's thesis, National Chengchi University, Department of Management Information Systems (112356015)
Source: http://thesis.lib.nccu.edu.tw/record/#G0112356015
Type: thesis
Identifier: G0112356015
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/158571
Table of Contents:
Acknowledgements
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
1 Introduction
  1.1 Research Background
  1.2 Research Objective
2 Related Work
  2.1 Text Classification with Generative LLMs in Specific Domains
  2.2 Approaches to Sustainability Report Analysis
  2.3 Language Models Overview
  2.4 Large Language Models (LLMs)
    2.4.1 Retrieval-Augmented Generation (RAG)
    2.4.2 Prompt Engineering
3 Methodology
  3.1 Risk Taxonomy and Disclosure Types
    3.1.1 Categories and Development of Risk Definitions
    3.1.2 Disclosure Types
  3.2 Data Collection
    3.2.1 Sample Selection
    3.2.2 Manual Annotation Process
  3.3 Research Framework
  3.4 Data Preprocessing
  3.5 RAG
    3.5.1 Framework and Model Selection
    3.5.2 Parameter Settings
    3.5.3 Prompt Engineering Techniques
    3.5.4 Prompt Design and Output Schema
  3.6 Evaluation Metrics
4 Experiments
  4.1 Pilot Study
    4.1.1 Retrieval Threshold Sensitivity Analysis
    4.1.2 Experiment Prompt Selection
  4.2 Results
    4.2.1 Performance by Overall Prompt Strategy
    4.2.2 Performance by Industry and Prompt Strategy
    4.2.3 Performance by Risk Category and Prompt Strategy
    4.2.4 Ensemble Performance by Risk Category
  4.3 Analysis of Reasoning and Disclosure Decisions
    4.3.1 Evaluation of FP's Chain-of-Thought Reasoning
    4.3.2 Validation of Disclosure Type Decisions
5 Discussion and Conclusion
  5.1 Discussion
    5.1.1 RQ1: Comparative Performance of Prompting Strategies
    5.1.2 RQ2: Benefits of Few-Shot Exemplars and CoT
    5.1.3 RQ3: RAG Pipeline Design Considerations
    5.1.4 Ensemble Performance Analysis
    5.1.5 Insights from Supplementary Analysis
  5.2 Conclusion
  5.3 Limitation and Future Work
References
Appendix A
Appendix B
Format: 9891106 bytes, application/pdf