Title 氣候相關財務揭露報告書的文字分析
A Study of Text Analysis on the Reports of Climate-related Financial Disclosures
Author 徐韶汶
Hsu, Shao-Wen
Contributors 余清祥
Yue, Ching-Syang
徐韶汶
Hsu, Shao-Wen
Keywords Climate-related financial disclosures
Text mining
Writing style
Exploratory data analysis
Machine learning models
Date 2024
Uploaded 2-Jun-2025 14:29:22 (UTC+8)
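Abstract Taiwan's Financial Supervisory Commission (FSC), under the Executive Yuan, announced that from 2023 onward, banks and insurers must establish mechanisms, appropriate to their size and business nature, for assessing and disclosing climate-related risks and opportunities. This requires these firms to report climate-related financial information in their TCFD (Task Force on Climate-related Financial Disclosures) reports, also called sustainability reports, as a response to the damage caused by extreme weather. This thesis studies the writing style of these reports, using the reports of 38 banks, 21 life insurers, and 23 property and casualty insurers. It analyzes the writing style of each of the four core elements, examines whether the reports disclose climate-related financial information according to the framework (governance, strategy, risk management, and metrics and targets), and designs a regression scoring model to assist experts in evaluating the reports.
We first use exploratory data analysis to select variables related to the expert scores, in order to improve the estimation accuracy of the regression models; candidate variables include information about both the companies and the reports. The regression models for all three sectors reach an R² of at least 0.73, and text variables make up more than half of the explanatory variables, which shows how important textual information is in evaluating the reports. We also evaluate the regression models with 500 rounds of cross-validation. The results indicate that the significance of the explanatory variables matters more than their number: regression models with fewer variables produce better estimates than machine learning models such as random forests, XGBoost, and support vector machines. The writing-style analysis finds that sustainability reports are written in a rather monotonous style with limited lexical diversity, possibly because of the FSC's format and length requirements. Even so, word choice differs considerably across the four elements, and common words alone are enough to clearly distinguish the four elements within a report.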
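The abstract reports that the sustainability reports show limited lexical diversity. The record does not say which diversity measures the thesis uses, so the sketch below assumes two common ones: the type-token ratio (distinct words divided by total words) and the Shannon entropy of the word-frequency distribution. It also assumes the Chinese text has already been segmented into words. The function names are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct word types divided by total tokens (0.0 for an empty text)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def shannon_entropy(tokens: list[str]) -> float:
    """Shannon entropy, in bits, of the empirical word-frequency distribution."""
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)  # word -> number of occurrences
    # H = -sum(p * log2(p)) over each distinct word's relative frequency p
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Assumption: tokens come from an already word-segmented report section.
section = ["climate", "risk", "climate", "disclosure"]
print(type_token_ratio(section))  # 3 distinct words / 4 tokens = 0.75
print(shannon_entropy(section))   # p = 0.5, 0.25, 0.25 gives 1.5 bits
```

Lower values of either measure indicate a more repetitive vocabulary. Because the type-token ratio tends to fall as a text gets longer, comparisons between report sections of very different lengths should be read with caution, for example by computing the ratio on equal-sized samples of tokens.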
Description Master's thesis
National Chengchi University
Department of Risk Management and Insurance
111358012
Source http://thesis.lib.nccu.edu.tw/record/#G0111358012
Type thesis
dc.contributor.advisor 余清祥
dc.contributor.advisor Yue, Ching-Syang
dc.date.accessioned 2-Jun-2025 14:29:22 (UTC+8)
dc.date.available 2-Jun-2025 14:29:22 (UTC+8)
dc.date.issued (Upload time) 2-Jun-2025 14:29:22 (UTC+8)
dc.identifier (Other Identifiers) G0111358012
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/157198
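dc.description.tableofcontents
Chapter 1 Introduction
Section 1 Research Motivation
Section 2 Research Process
Chapter 2 Literature Review and Research Methods
Section 1 Literature Review
Section 2 Research Materials
Section 3 Text Analysis Methods
Section 4 Multivariate Regression Models
Section 5 Classification Models
Chapter 3 Scoring Models
Section 1 Description of Variables
Section 2 Exploring the Characteristics of the Data
Section 3 Building the Multivariate Regression Models
Section 4 Validation with Machine Learning Models
Chapter 4 Writing Style Analysis
Section 1 Style Analysis of the Four Core Elements
Section 2 Exploring Common Words
Section 3 Keyword Clusters
Section 4 Model Classification Results
Chapter 5 Conclusions and Limitations
Section 1 Conclusions and Recommendations
Section 2 Limitations and Future Work
References
Appendix 1 Correlations between Variables and Scores
Appendix 2 Characteristics of High- and Low-Scoring Reports
Appendix 3 Residual Plots of the Text Models for Companies in Each Industry
Appendix 4 All Features of the Machine Learning Models
Appendix 5 Cross-Validation Results by Industry
Appendix 6 Model and Feature Comparisons by Industry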
dc.format.extent 4986675 bytes
dc.format.mimetype application/pdf