Title 由史料中探勘職官年表:以康熙時期為例
Discovering Official Chronology from Historical Documents Using Kangxi's Reign as an Example
Author 闕伯丞
Que, Bo Cheng
Contributors 沈錳坤
Shan, Man Kwan
闕伯丞
Que, Bo Cheng
Keywords Data Mining
Official Chronology
Veritable Records of Qing (清聖祖實錄)
Date 2009
Upload time 1-Apr-2014 11:16:53 (UTC+8)
Abstract As the digitization of archival documents has advanced, many databases of historical texts now give historians a rich body of sources to query and search. Combined with information technology, these sources can support comparison and analysis in historical research, ease the burden of processing large volumes of data, and serve historians as an aid for collating, checking, annotating, or correcting records.
This thesis aims to mine and extract information about officials from historical documents. Using data mining techniques and starting from the names of official posts, it identifies in the source texts the names of the people who held each post and their terms of office, so that a chronology of officials can be generated automatically. Building on the writing conventions of historical texts, we propose a method that draws on data mining and information extraction: frequent itemset mining with periods is applied to a purpose-built database of official posts to identify the office holders' names and extract their terms of office, and this information is then used to generate the official chronology of Kangxi's reign.
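The idea in the abstract, finding who held a post and during which period from dated records, can be illustrated with a minimal Python sketch. It is not the thesis's method: the table of contents names Twain as the miner, while this sketch only counts co-occurrences. It assumes, purely for illustration, that a two-character name directly follows the official title, and the records, regnal years, and placeholder names (張三, 李四) are invented toy data, not quotations from the Veritable Records. The function names and the min_count parameter are hypothetical.

```python
from collections import defaultdict


def candidate_bigrams(text, title):
    """Return the two characters that follow each occurrence of title in text."""
    found = []
    start = text.find(title)
    while start != -1:
        end = start + len(title)
        bigram = text[end:end + 2]
        if len(bigram) == 2:  # skip occurrences too close to the end of the text
            found.append(bigram)
        start = text.find(title, start + 1)
    return found


def mine_tenures(records, title, min_count=2):
    """Collect, per candidate bigram, the years in which it follows the title.

    records: iterable of (year, text) pairs.
    Candidates seen fewer than min_count times are dropped.
    Returns (candidate, first_year, last_year, count) tuples sorted by first_year.
    """
    years = defaultdict(list)
    for year, text in records:
        for bigram in candidate_bigrams(text, title):
            years[bigram].append(year)
    result = []
    for bigram, seen in years.items():
        if len(seen) >= min_count:
            result.append((bigram, min(seen), max(seen), len(seen)))
    return sorted(result, key=lambda row: (row[1], row[0]))


# Toy records (invented): (regnal year, sentence mentioning the post).
records = [
    (1, "戶部尚書張三奏事"),
    (2, "戶部尚書張三等議"),
    (3, "命戶部尚書李四往勘"),
    (4, "戶部尚書李四奏事"),
    (4, "戶部尚書等議"),  # no name after the title; its bigram occurs only once
]

for row in mine_tenures(records, "戶部尚書"):
    print(row)
```

The frequency threshold is what removes the non-name bigram in the last record. A real system would also need to handle names of other lengths and titles that follow rather than precede the name, which this sketch ignores.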
Description Master's degree
National Chengchi University
Department of Computer Science
96753016
98
Source http://thesis.lib.nccu.edu.tw/record/#G0096753016
Type thesis
dc.identifier (Other Identifiers) G0096753016
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/65080
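dc.description.tableofcontents Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Preface
1.2 History and Information Science
1.3 Research Motivation and Objectives
1.4 Historical Source Texts Used
1.5 Thesis Organization
Chapter 2 Related Work
2.1 Named Entity Recognition (NER)
2.1.1 Background Domains and Language Differences
2.1.2 Chinese Named Entity Recognition
2.2 Word-Clip Algorithm
Chapter 3 Generating Office-Holding Information from Historical Sources
3.1 Writing Conventions of Historical Sources
3.1.1 Positional Relationship between Official Titles and Personal Names
3.1.2 Exceptions to the Positional Relationship
3.2 Identifying Candidate Names of Office Holders by Data Mining
3.2.1 Building the Official-Title Bi-gram Database
3.2.2 Mining Candidate Names and Frequent Periods with Twain
3.2.3 Combining Candidate Names
3.2.4 Filtering Candidate Names
3.3 Ranking Candidate Names
3.3.1 Average Term of Office
3.3.2 Magnitude of Change in Official Rank
3.3.3 Rate of Change of Office
3.3.4 Average Distance to Each Official Title
3.4 Name Identification and Term Determination
3.4.1 Time Extraction
3.4.2 Determining Terms of Office and Generating the Official Chronology
Chapter 4 Experimental Evaluation and Results
4.1 Sources of Experimental Data
4.2 Evaluation Method
4.3 Experimental Results
4.3.1 Experimental Parameters
4.3.2 Sources of Error
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendices
Appendix 1: Partial List of Second-Rank Official Posts Used in This Study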
dc.format.extent 1983148 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US