Title 基於注意力機制語言模型之財務風險文章偵測與實體辨識
Financial Risk-related News Detection and Named Entity Recognition via Transformer-based Language Models
Author 盧佳妤
Lu, Jia-Yu
Contributors 蔡銘峰
Tsai, Ming-Feng
盧佳妤
Lu, Jia-Yu
Keywords 注意力機制模型
聯合訓練
實體辨識
自然語言處理
Transformer
Attention mechanism
Joint training
Named-entity recognition
Natural language processing
Date 2021
Date uploaded 1-Oct-2021 10:06:33 (UTC+8)
Abstract 本研究利用注意力機制模型偵測財務文章之風險事件及抽取潛在金融犯罪名單,建構自動化模型以降低人力標記成本及提升預測速度。我們分析不同模型架構及訓練方法之優缺點,並比較傳統神經網路方法與 Transformer Based 模型的差異。模型架構分為兩階段,第一階段判斷目標文章是否包含金融風險事件,而第二階段則在這些文章中抽取高危險的名單。我們提出聯合訓練方法同時訓練兩階段的模型,透過實驗證明可在不損失正確性的情況提升訓練及預測速度,並得以提升模型穩定性。我們亦針對注意力機制模型內部的 Attention Weight 做視覺化分析,顯示模型能在不提供標注的情況自動關注金融風險詞彙。另外我們針對缺乏風險人名標記的訓練資料之情況,利用以上 Attention Weight 分析設計特殊的規則,達到一定程度的效果提升。最後我們額外在一個 Wikipedia 上的英文資料集做測試,說明此研究結果亦可應用於不同領域及不同語言的任務。
This thesis uses Transformer-based models to detect risk events in financial articles and to extract potential financial criminals. Such automated models reduce the cost of manual labeling and speed up prediction. We analyze the advantages and disadvantages of different approaches and compare traditional neural networks with Transformer-based models. The proposed method consists of two stages: the first stage determines whether a target news article contains financial risk events, and the second stage extracts high-risk entities from those articles. We propose a joint-training method that trains both stages at the same time. Experimental results show that the proposed joint-training method improves prediction accuracy and enhances the stability of the training process. We also visualize the attention weights of the model, showing that it automatically attends to financial-risk vocabulary without being given such annotations. In addition, for the case where risk-name annotations are unavailable, we use this attention-weight analysis to design special rules that yield a measurable improvement. Finally, further experiments on a dataset drawn from English Wikipedia confirm that the proposed method also applies to other domains and languages.
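To make the two-stage design described above concrete, the following sketch shows one way such a pipeline could be wired up with PyTorch and the HuggingFace Transformers library (both of which appear in the reference list): a shared encoder feeds an article-level classification head (stage 1) and a token-level NER head (stage 2), the two losses are summed for joint training, and the attention weights are retained for later visualization. This is only an illustrative sketch; the checkpoint name, label sizes, loss weights, and example sentence are assumptions rather than the thesis's actual implementation.

# Minimal sketch (not the thesis's actual code): a shared BERT encoder with two heads,
# trained jointly on (1) article-level risk classification and (2) token-level NER.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class JointRiskModel(nn.Module):
    def __init__(self, pretrained="bert-base-chinese", num_ner_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(pretrained)
        hidden = self.encoder.config.hidden_size
        self.cls_head = nn.Linear(hidden, 2)               # stage 1: risk vs. non-risk article
        self.ner_head = nn.Linear(hidden, num_ner_labels)  # stage 2: a tag for every token

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask,
                           output_attentions=True)         # keep attention weights for visualization
        cls_logits = self.cls_head(out.last_hidden_state[:, 0])  # [CLS] vector represents the article
        ner_logits = self.ner_head(out.last_hidden_state)        # one prediction per token
        return cls_logits, ner_logits, out.attentions

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = JointRiskModel()
batch = tokenizer(["某公司負責人涉嫌掏空公司資產"], return_tensors="pt", padding=True)
cls_logits, ner_logits, attentions = model(batch["input_ids"], batch["attention_mask"])

# Joint training: optimize a weighted sum of the two task losses (weights here are hypothetical).
cls_labels = torch.tensor([1])                                    # dummy article-level label
ner_labels = torch.zeros(ner_logits.shape[:2], dtype=torch.long)  # dummy token-level labels
loss = 1.0 * nn.CrossEntropyLoss()(cls_logits, cls_labels) \
     + 1.0 * nn.CrossEntropyLoss()(ner_logits.view(-1, ner_logits.size(-1)), ner_labels.view(-1))
loss.backward()

Because both heads share a single encoder forward pass, training and inference for the two stages cost roughly one model pass instead of two, which matches the speed and stability benefits reported for joint training in the abstract.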
References [1] D. W. Otter, J. R. Medina, and J. K. Kalita, “A survey of the usages of deep learning for natural language processing,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 2, pp. 604–624, 2020.
[2] R. Jozefowicz, W. Zaremba, and I. Sutskever, “An empirical exploration of recurrent network architectures,” in International Conference on Machine Learning. PMLR, 2015, pp. 2342–2350.
[3] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[4] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017.
[5] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[6] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al., “Language models are unsupervised multitask learners,” OpenAI blog, vol. 1, no. 8, p. 9, 2019.
[7] K. Potdar, T. S. Pardawala, and C. D. Pai, “A comparative study of categorical variable encoding techniques for neural network classifiers,” International Journal of Computer Applications, vol. 175, no. 4, pp. 7–9, 2017.
[8] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
[9] Q. Liu, M. J. Kusner, and P. Blunsom, “A survey on contextual embeddings,” arXiv preprint arXiv:2003.07278, 2020.
[10] J. Li, A. Sun, J. Han, and C. Li, “A survey on deep learning for named entity recognition,” IEEE Transactions on Knowledge and Data Engineering, pp. 1–1, 2020.
[11] V. Krishnan and V. Ganapathy, “Named entity recognition,” Stanford Lecture CS229, 2005.
[12] S. R. Eddy, “Hidden Markov models,” Current Opinion in Structural Biology, vol. 6, no. 3, pp. 361–365, 1996.
[13] J. Lafferty, A. McCallum, and F. C. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” 2001.
[14] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds. Curran Associates, Inc., 2019, pp. 8024–8035. [Online]. Available: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
[15] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, “Transformers: State-of-the-art natural language processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Online: Association for Computational Linguistics, Oct. 2020, pp. 38–45. [Online]. Available: https://www.aclweb.org/anthology/2020.emnlp-demos.6
[16] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[17] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017.
Description Master's thesis
國立政治大學 (National Chengchi University)
資訊科學系 (Department of Computer Science)
108753120
Source http://thesis.lib.nccu.edu.tw/record/#G0108753120
URI http://nccur.lib.nccu.edu.tw/handle/140.119/137297
DOI 10.6814/NCCU202101564
Type thesis
Table of Contents
Acknowledgements
摘要 (Chinese Abstract)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Preface
1.2 Research Objectives and Contributions
Chapter 2: Related Work
2.1 Text Representations in Natural Language
2.2 Chinese Word Segmentation
2.3 Recurrent Neural Networks
2.4 Transformer-Based Models
2.4.1 Attention Mechanism Models
2.4.2 Transformer
2.4.3 BERT
2.5 Named Entity Recognition
Chapter 3: Methodology
3.1 Problem Definition
3.2 Model Overview
3.2.1 News Classification Task (CLASS Task)
3.2.2 Entity Recognition Task (NER Task)
3.2.3 Joint Training
3.3 Model Implementations for Each Architecture
3.3.1 LSTM Implementation
3.3.2 Attention Implementation
3.3.3 Combined LSTM and Attention Architecture
3.3.4 BERT Implementation
3.4 Using Attention Weights with a General-Purpose NER Tool
Chapter 4: Experimental Results
4.1 Data Description
4.1.1 Wikipedia Data (Wiki Dataset)
4.1.2 News Data (News Dataset)
4.2 Experimental Settings
4.3 Analysis of Experimental Results
4.3.1 Article Classification Results
4.3.2 Entity Recognition Results
4.3.3 Target Entity Extraction Results
4.4 Analysis of Model Training Parameters
4.4.1 Adjusting Loss Weights by the Ratio of Positive to Negative Data
4.4.2 Comparing the Loss Weights of the Two Tasks in the Joint Model
4.5 Analysis of Information Learned by the Attention NER Model
4.6 Attention Weight Analysis
Chapter 5: Conclusion
References