Does LLM Prompt Engineering and Audit Report Embedding Improve Financial Fraud Detection?

Title	Does LLM Prompt Engineering and Audit Report Embedding Improve Financial Fraud Detection? (LLM提示工程與查核報告能否提升財報舞弊偵測？)
Author	Chang, Yung-Ai (張永愛)
Contributors	Advisors: Chuang, Hao-Chun (莊皓鈞); Chou, Yen-Chun (周彥君). Author: Chang, Yung-Ai (張永愛)
Keywords	Financial Statement Fraud Detection; Auditor Report Embedding; Prompt Engineering; BERT; SBERT; Isolation Forest; SHAP Values
Date	2025
Upload time	1-Sep-2025 15:05:34 (UTC+8)
Abstract	This study examines whether combining large language model (LLM) prompt engineering with embeddings of auditors' reports improves the detection of financial statement fraud. Whereas earlier approaches relied solely on numerical financial and non-financial indicators, this study adds textual content: ChatGPT-4o is used to extract five semantic dimensions, with associated keywords, that are closely related to fraud risk, and BERT and Sentence-BERT (SBERT) are used to vectorize them into text-based indicators. The sample covers companies listed on the Taiwan Stock Exchange, the over-the-counter (OTC) market, the Emerging Stock Board, and the Innovation Board. Fraud samples are selected from the financial misstatement litigation cases published by the Securities and Futures Investors Protection Center, and normal samples are matched on industry and reporting period. Anomalies are detected with Isolation Forest (IF), an unsupervised method, and SHAP values are used to improve model interpretability. The results indicate that adding text-based indicators improves both the sensitivity and the precision of fraud detection. In particular, under balanced sampling, the "Key Audit Matters + Year" model identified twice as many true positives as the full-indicator model, with fewer false positives. SBERT improved recall but also produced more false positives than the BERT-based model, so the choice between them depends on the application. The study concludes that semantic signals in auditors' reports are highly indicative of fraud risk, and it offers regulators and companies a practical early-warning approach.
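To make the anomaly-detection step in the abstract concrete, the following is a minimal pure-Python sketch of Isolation Forest scoring followed by a true-positive and false-positive count. It is not the thesis's implementation: the two-dimensional synthetic rows stand in for the real indicator vectors, and every name here (`c_factor`, `build_tree`, `anomaly_scores`, `confusion_counts`, the review budget `k`) is a hypothetical choice made for illustration.

```python
import math
import random


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(row, node, depth=0):
    """Depth at which a row lands in a leaf, adjusted for unsplit leaf size."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["dim"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Score each row in (0, 1); rows that are isolated quickly score higher."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))  # assumes at least 2 rows
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
            for r in rows]


def confusion_counts(flagged, is_fraud):
    """Count true positives, false positives, and false negatives."""
    tp = sum(f and y for f, y in zip(flagged, is_fraud))
    fp = sum(f and not y for f, y in zip(flagged, is_fraud))
    fn = sum(y and not f for f, y in zip(flagged, is_fraud))
    return tp, fp, fn


# Synthetic stand-in data (an assumption, not the thesis data):
# 60 "normal" firm-years near the origin and 3 "fraud" firm-years far away.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
fraud = [(8.0, 8.0), (-9.0, 7.0), (10.0, -8.0)]
rows = normal + fraud
is_fraud = [False] * len(normal) + [True] * len(fraud)

scores = anomaly_scores(rows)
# Flag the k highest-scoring rows; k is a hypothetical review budget.
k = 5
threshold = sorted(scores, reverse=True)[k - 1]
flagged = [s >= threshold for s in scores]
tp, fp, fn = confusion_counts(flagged, is_fraud)
print(f"TP={tp} FP={fp} FN={fn} "
      f"precision={tp / (tp + fp):.2f} recall={tp / (tp + fn):.2f}")
```

Because the method is unsupervised, the fraud labels are used only for evaluation after scoring, never for fitting. Raising `k` trades precision for recall, which is the same kind of trade-off the abstract reports between the SBERT and BERT variants.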
Description	Master's thesis, National Chengchi University (國立政治大學), Department of Management Information Systems, 112356037
Source	http://thesis.lib.nccu.edu.tw/record/#G0112356037
Type	thesis

dc.contributor.advisor	莊皓鈞; 周彥君	zh_TW
dc.contributor.advisor	Chuang, Hao-Chun; Chou, Yen-Chun	en_US
dc.contributor.author (Authors)	張永愛	zh_TW
dc.contributor.author (Authors)	Chang, Yung-Ai	en_US
dc.creator (Author)	張永愛	zh_TW
dc.creator (Author)	Chang, Yung-Ai	en_US
dc.date (Date)	2025	en_US
dc.date.accessioned	1-Sep-2025 15:05:34 (UTC+8)	-
dc.date.available	1-Sep-2025 15:05:34 (UTC+8)	-
dc.date.issued (Upload time)	1-Sep-2025 15:05:34 (UTC+8)	-
dc.identifier (Other Identifiers)	G0112356037	en_US
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/159097	-
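dc.description (Description)	Master's	zh_TW
dc.description (Description)	National Chengchi University (國立政治大學)	zh_TW
dc.description (Description)	Department of Management Information Systems (資訊管理學系)	zh_TW
dc.description (Description)	112356037	zh_TW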
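dc.description.tableofcontents	Abstract (Chinese); Abstract (English); Table of Contents; List of Tables; List of Figures; Chapter 1 Introduction; Chapter 2 Literature Review: Section 1 Numerical Features Used to Detect Financial Statement Fraud, Section 2 LLM, BERT, Sentence BERT, Section 3 Isolation Forest and SHAP Values; Chapter 3 Data Pipeline and Variable Design: Section 1 Data Sources, Section 2 Construction Workflow for Text Variables (1. Concept Generation via LLM Prompting, 2. Extraction of Auditor Report Embeddings); Chapter 4 Empirical Analysis: Section 1 Experimental Design (1. All Indicators + All Industries + Full Sample, 2. All Indicators + All Industries + Balanced Sampling), Section 2 Anomaly Detection Using Text Indicators Only (1. Text Indicators + Year, 2. Text Indicators + Year + Industry Code), Section 3 Anomaly Detection Performance in Specific Industries; Chapter 5 Conclusion: Section 1 Research Conclusions, Section 2 Research Recommendations; References	zh_TW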
dc.format.extent	4120633 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0112356037	en_US
dc.subject (Keywords)	Financial Statement Fraud Detection	en_US
dc.subject (Keywords)	Auditor Report Embedding	en_US
dc.subject (Keywords)	Prompt Engineering	en_US
dc.subject (Keywords)	BERT	en_US
dc.subject (Keywords)	SBERT	en_US
dc.subject (Keywords)	Isolation Forest	en_US
dc.subject (Keywords)	SHAP Values	en_US
dc.title (Title)	LLM提示工程與查核報告能否提升財報舞弊偵測？	zh_TW
dc.title (Title)	Does LLM Prompt Engineering and Audit Report Embedding Improve Financial Fraud Detection?	en_US
dc.type (Type)	thesis	en_US
