Title 由史料中探勘社會網絡:以乾隆時期為例
Social Network Mining from Historical Documents- by Example during Qianlong's Reign
Author 朱政吉
Chu, Cheng Ji
Contributors 沈錳坤
Shan, Man Kwan
朱政吉
Chu, Cheng Ji
Keywords Social Network Mining (社會網絡探勘)
Qianlong (乾隆)
Date 2007
Upload time 8-Dec-2010 12:11:08 (UTC+8)
Abstract Throughout history, in China and elsewhere, the officials who serve beneath a ruler have tended to form networks of personal relationships based on their posts or private ties. A person's importance depends on his position in that network: those who occupy central positions have more complex relationships and, by implication, greater political influence. These figures are often the powerful ministers (權臣) who could sway the politics of their day; the thesis calls them "chief counselors." Some emperors reigned for a very long time. Over such a reign, changes in the emperor himself or in the political environment could bring different chief counselors to prominence in different periods, or let a dominant one emerge only late in the reign. Motivated by this phenomenon, the thesis mines the social network of a period from its historical documents. Person names are first extracted automatically from the text. A network of historical figures is then built from the co-occurrence of people in the text and analyzed with social network analysis, both to identify the chief counselors and to detect changes in the political power structure, which in turn divide the reign into periods. The corpus is the Veritable Records of Gaozong (《高宗純皇帝實錄》), part of the Veritable Records of the Qing (《清實錄》), so the reign of the Qianlong Emperor (Gaozong) serves as the example. According to the effectiveness analysis, the proposed methods can assist historians, for instance in research on the history of Chinese political institutions.
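The pipeline the abstract describes (names per passage, then a co-occurrence network, then a centrality ranking) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the placeholder names "A" to "D", and the choice of one list of names per passage are hypothetical, and the record does not give the thesis's exact definitions. Degree centrality is used because it appears in the thesis outline; the normalization by n - 1 is the common textbook form.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(passages):
    """Count how often each unordered pair of names shares a passage."""
    weights = Counter()
    for names in passages:
        # set() drops repeated mentions; sorted() gives each pair one canonical key.
        for a, b in combinations(sorted(set(names)), 2):
            weights[(a, b)] += 1
    return weights


def degree_centrality(weights):
    """Normalized degree: number of distinct neighbours divided by (n - 1)."""
    neighbours = {}
    for a, b in weights:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {name: 0.0 for name in neighbours}
    return {name: len(adj) / (n - 1) for name, adj in neighbours.items()}


# Hypothetical toy input: each inner list holds the names found in one passage.
passages = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "D"],
]

weights = build_cooccurrence(passages)
centrality = degree_centrality(weights)
ranking = sorted(centrality, key=centrality.get, reverse=True)
```

In this toy input "A" co-occurs with every other name and therefore ranks first. The pair counts in `weights` could serve as edge weights for a weighted variant of the analysis.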
Description Master's
National Chengchi University
Department of Computer Science
95753014
96
Source http://thesis.lib.nccu.edu.tw/record/#G0957530141
Type thesis
dc.contributor.advisor 沈錳坤 (Shan, Man Kwan)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/49477
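dc.description.tableofcontents Chinese Abstract
English Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background: The Digitization of Classical Chinese Texts
1.2 The Relationship between Traditional Historical Methods and Computer Science
1.3 Purpose, Motivation, and Method of This Study
1.4 The Experimental Corpus
1.5 Organization of the Thesis
Chapter 2 Related Work
2.1 Chinese Named Entity Recognition
2.2 Establishing Relations between Words
2.3 Social Network Analysis
Chapter 3 Mining Social Networks from Historical Documents
3.1 Historical Person Name Recognition
3.1.1 Using the Word-Clip Algorithm (詞夾子演算法)
3.1.1.1 Scoring Rules for Word Clips
3.1.1.2 Improving Name Recognition
3.1.2 Adapting the Word-Clip Algorithm to the Corpus
3.1.2.1 Requiring the Left Clip to Appear in the List of Official Titles
3.1.2.2 Recovering Missed Names Where Both Clips Are Enumeration Commas (頓號)
3.1.3 Useful Resources for Name Recognition
3.2 Mining Chief Counselors
3.2.1 Building Links between Historical Figures
3.2.2 Analyzing the Network with Centrality Measures
3.2.2.1 Degree Centrality
3.2.2.2 Closeness Centrality
3.2.2.3 Betweenness Centrality
3.3 Detecting Changes in the Power Structure
3.3.1 Based on Changes in the Importance of Individual Chief Counselors
3.3.2 Based on the Rise and Fall of Power Groups
3.3.2.1 Finding Power Groups
3.3.2.2 Measuring the Dissimilarity between Power Groups
Chapter 4 Experimental Evaluation and Results
4.1 Historical Person Name Recognition
4.1.1 Restricting the Left Clip to Official Titles
4.1.2 Filtering Candidates with a Lexicon
4.1.3 Recovering Missed Names Using Enumeration Commas
4.1.4 Removing Terms That Persist throughout the Corpus
4.2 Mining Chief Counselors
4.2.1 Comparison of Centrality Measures
4.2.2 Comparison of Weighted and Unweighted Networks
4.2.3 Overall Discussion of the Chief Counselor Experiments
4.3 Detecting Changes in the Power Structure
4.3.1 Dissimilarity of the Power Structure between Consecutive Years
4.3.2 Setting the Dissimilarity Threshold
4.3.3 Overall Discussion of the Power Structure Experiments
Chapter 5 Conclusion and Future Work
References
Appendix 1: List of the Hundred Family Surnames Used in Name Recognition
Appendix 2: List of Place Names Used in Name Recognition
Appendix 3: List of Official Titles Used in Name Recognition
Appendix 4: Sample Person Names Supplied for Name Recognition
Appendix 5: Ground-Truth Set for Chief Counselor Mining
Appendix 6: Table of Leading Grand Councillors (軍機領班大臣)
Appendix 7: Dates of Removal from Office of Important Grand Secretaries (內閣大學士)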
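Sections 3.3 and 4.3 of the outline compare the power structure of consecutive years against a dissimilarity threshold. This record does not say which dissimilarity measure the thesis uses, so the sketch below substitutes Jaccard distance between yearly sets of prominent figures purely as an assumption. The function names, the year labels, the placeholder names, and the threshold value are all hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def detect_transitions(groups_by_year, threshold):
    """Return the years whose group differs from the previous year's by more than threshold."""
    years = sorted(groups_by_year)
    return [
        curr
        for prev, curr in zip(years, years[1:])
        if jaccard_distance(groups_by_year[prev], groups_by_year[curr]) > threshold
    ]


# Hypothetical toy data: the set of most central figures in each year.
groups_by_year = {
    1: {"A", "B", "C"},
    2: {"A", "B", "C"},
    3: {"A", "B", "D"},
    4: {"D", "E", "F"},
}

# The threshold 0.6 is an arbitrary illustrative value, not one taken from the thesis.
transitions = detect_transitions(groups_by_year, threshold=0.6)
```

With these toy sets only the step from year 3 to year 4 exceeds the threshold, so year 4 would mark the start of a new period. A lower threshold would also flag the milder change at year 3, which is why the outline treats threshold setting as its own experiment.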
dc.format.extent 55431, 287691, 153663, 230448, 393618, 496921, 713698, 590671, 166322, 165371, 609942 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US