dc.contributor.advisor | 劉昭麟 | zh_TW |
dc.contributor.advisor | Liu, Chao-Lin | en_US |
dc.contributor.author (Authors) | 鄭人豪 | zh_TW |
dc.contributor.author (Authors) | Cheng, Jen-Hao | en_US |
dc.creator (作者) | 鄭人豪 | zh_TW |
dc.creator (作者) | Cheng, Jen-Hao | en_US |
dc.date (日期) | 2006 | en_US |
dc.date.accessioned | 17-Sep-2009 13:59:31 (UTC+8) | - |
dc.date.available | 17-Sep-2009 13:59:31 (UTC+8) | - |
dc.date.issued (上傳時間) | 17-Sep-2009 13:59:31 (UTC+8) | - |
dc.identifier (Other Identifiers) | G0093753030 | en_US |
dc.identifier.uri (URI) | https://nccur.lib.nccu.edu.tw/handle/140.119/32662 | - |
dc.description (描述) | 碩士 | zh_TW |
dc.description (描述) | 國立政治大學 | zh_TW |
dc.description (描述) | 資訊科學學系 | zh_TW |
dc.description (描述) | 93753030 | zh_TW |
dc.description (描述) | 95 | zh_TW |
dc.description.abstract (摘要) | 國外法學資訊系統已研究多年,嘗試利用科技幫助提昇司法審判的效率。重要的議題包括輔助判決,法律文件分類,或是相似案件搜尋等。本研究將針對中文裁判書的分類做進一步探討。在文件特徵表示方面,我們以有序詞組來表達中文裁判書,我們嘗試比較採用不同的詞彙來源對於分類效果的影響。實驗中我們分別採用一般通用的電子詞典建立一般詞組;以及以演算法取出法學專業詞彙集建立專業詞組。並依tf-idf(term frequency – inverse document frequency)的概念,設計兩種詞組權重tpf-idf(term pair frequency – inverse document frequency)以及tpf-icf(term pair frequency – inverse category frequency),來計算特徵詞組權重。在文件分類演算法方面,我們實作以相似度為基礎的k最近鄰居法作為系統分類機制,藉由裁判書的案由欄位,將案例分為七種類別,分別為竊盜、搶奪、強盜、贓物、傷害、恐嚇以及賭博。並藉由觀察案例資料庫的相似度分佈,以找出恰當的參數,進一步得到較佳的分類正確率與較低的拒絕率。我們並依照自省式學習法的精神,建立權重調整的機制。企圖藉由自省式學習法提昇分類效果,以及找出對分類有影響的詞組。而我們以案例資料庫的相似度差異值以及距離差異值,分析調整前後案例資料庫的變化,藉以觀察自省式學習法的效果。 | zh_TW |
dc.description.abstract (摘要) | Legal information systems for languages other than Chinese have been studied intensively for many years. Topics under discussion include judgment assistance, legal document classification, and similar-case search. This thesis studies the classification of Chinese judgment documents. I use phrases as the indices for documents and compare the influence of different lexical sources for segmenting Chinese text. One lexical source is a general machine-readable dictionary, HowNet; the other is a set of terms algorithmically extracted from legal documents. Based on the concept of tf-idf (term frequency – inverse document frequency), I design two kinds of phrase weights: tpf-idf (term pair frequency – inverse document frequency) and tpf-icf (term pair frequency – inverse category frequency). In the experiments, I use the k-nearest neighbor method to classify Chinese judgment documents into seven categories based on their prosecution reasons: larceny (竊盜), robbery (搶奪), robbery by threatening or disabling the victims (強盜), receiving stolen property (贓物), causing bodily harm (傷害), intimidation (恐嚇), and gambling (賭博). To achieve high accuracy with low rejection rates, I examine the distribution of similarity among the training documents to select appropriate parameters. I also conduct an analogous set of experiments that classify gambling cases based on their cited legal articles. To improve classification, I apply the introspective learning technique to adjust the weights of phrases. I use intra-cluster and inter-cluster similarity to evaluate the effects of weight adjustment in both the prosecution-reason and cited-article experiments. | en_US |
dc.description.tableofcontents | 第一章 序論 1 第二章 文獻回顧 5 第三章 背景知識與資料來源 15 第四章 基本定義與前處理 21 第五章 k最近鄰居法分類技術 30 第六章 自省式學習法 58 第七章 結論與未來展望 92 參考文獻 98 | zh_TW |
dc.format.extent | 50367 bytes | - |
dc.format.extent | 76663 bytes | - |
dc.format.extent | 69544 bytes | - |
dc.format.extent | 106681 bytes | - |
dc.format.extent | 119521 bytes | - |
dc.format.extent | 142933 bytes | - |
dc.format.extent | 125524 bytes | - |
dc.format.extent | 133899 bytes | - |
dc.format.extent | 282200 bytes | - |
dc.format.extent | 293071 bytes | - |
dc.format.extent | 112509 bytes | - |
dc.format.extent | 75115 bytes | - |
dc.format.extent | 5609969 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.language.iso | en_US | - |
dc.source.uri (資料來源) | http://thesis.lib.nccu.edu.tw/record/#G0093753030 | en_US |
dc.subject (關鍵詞) | 法學資訊系統 | zh_TW |
dc.subject (關鍵詞) | 自然語言處理 | zh_TW |
dc.subject (關鍵詞) | k最近鄰居法 | zh_TW |
dc.subject (關鍵詞) | 自省式學習法 | zh_TW |
dc.subject (關鍵詞) | Legal information system | en_US |
dc.subject (關鍵詞) | Natural language processing | en_US |
dc.subject (關鍵詞) | k nearest neighbor | en_US |
dc.subject (關鍵詞) | introspective learning | en_US |
dc.title (題名) | 中文詞彙集的來源與權重對中文裁判書分類成效的影響 | zh_TW |
dc.title (題名) | Exploring the Influences of Lexical Sources and Term Weights on the Classification of Chinese Judgment Documents | en_US |
dc.type (資料類型) | thesis | en |
dc.relation.reference (參考文獻) | [1] HowNet電子詞典1999年版本 http://www.keenage.com/ | zh_TW |
dc.relation.reference (參考文獻) | [2] WestLaw Thesaurus http://lawschool.westlaw.com/ | zh_TW |
dc.relation.reference (參考文獻) | [3] 中央研究院 http://www.sinica.edu.tw/ 中央研究院平衡語料庫 http://www.sinica.edu.tw/~tibe/2-words/modern-words/index.html | zh_TW |
dc.relation.reference (參考文獻) | [4] 中華民國計算語言學 http://www.aclclp.org.tw/ 中文詞知識庫及中文語法 http://www.aclclp.org.tw/use_ckip_c.php | zh_TW |
dc.relation.reference (參考文獻) | [5] 司法院法學檢索系統 http://jirs.judicial.gov.tw/ | zh_TW |
dc.relation.reference (參考文獻) | [6] 司法院司法統計 http://www.judicial.gov.tw/juds/index1.htm | zh_TW |
dc.relation.reference (參考文獻) | [7] 法務部全國法規資料庫 http://law.moj.gov.tw/ | zh_TW |
dc.relation.reference (參考文獻) | [8] 林吉鶴,專家系統應用於命案犯罪現場之研究,行政院國科會科資中心 NSC84-2414-H015-001,1996。 | zh_TW |
dc.relation.reference (參考文獻) | [9] 張正宗,電腦輔助簡易刑事判決技術之探討,碩士論文,國立政治大學,台北,台灣,2003。 | zh_TW |
dc.relation.reference (參考文獻) | [10] 陳永德,中文斷詞中長詞優先、詞頻比對與前詞優先規則之使用,博士論文,國立台灣大學,台北,台灣,1997。 | zh_TW |
dc.relation.reference (參考文獻) | [11] 楊才蔚及呂士奇,女法官積勞成疾臨終遺言勸大家莫熬夜,東森新聞報 http://www.ettoday.com/2002/08/24/322-1343820.htm,2002。 | zh_TW |
dc.relation.reference (參考文獻) | [12] 與板橋地方法院何君豪法官私人通信。 | zh_TW |
dc.relation.reference (參考文獻) | [13] 廖鼎銘,觸犯多款法條之賭博與竊盜案件的法院文書的分類與分析,碩士論文,國立政治大學,台北,台灣,2004。 | zh_TW |
dc.relation.reference (參考文獻) | [14] 謝淳達,利用詞組檢索中文訴訟文書之研究,碩士論文,國立政治大學,台北,台灣,2005。 | zh_TW |
dc.relation.reference (參考文獻) | [15] ACM International Conference on Artificial Intelligence and Law (ICAIL): http://portal.acm.org/browse_dl.cfm?coll=portal&dl=ACM&idx=SERIES732&linked=1&part=series | zh_TW |
dc.relation.reference (參考文獻) | [16] K. Al-Kofahi, A. Tyrrell, A. Vachher and P. Jackson, A machine learning approach to prior case retrieval, Proceedings of the Eighth International Conference on Artificial Intelligence and Law, pp. 88-93, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [17] K. D. Ashley and E. L. Rissland, But, see, accord: Generating Blue Book citation in HYPO, Proceedings of the First International Conference on Artificial Intelligence and Law, pp. 67-74, 1987. | zh_TW |
dc.relation.reference (參考文獻) | [18] S. Bruninghaus, K. D. Ashley, Toward adding knowledge to learning algorithms for indexing legal cases, Proceedings of the Seventh International Conference on Artificial Intelligence and Law, pp. 9-17, 1999. | zh_TW |
dc.relation.reference (參考文獻) | [19] L. F. Chien, Fast and quasi-natural language search for gigabytes of Chinese texts, Proceedings of the Eighteenth ACM Special Interest Group of Information Retrieval conference on Research and development in information retrieval, pp.112–120, 1995. | zh_TW |
dc.relation.reference (參考文獻) | [20] L. F. Chien, PAT-tree-based keyword extraction for Chinese information retrieval, Proceedings of the Twentieth Annual International ACM Special Interest Group of Information Retrieval Conference on Research and Development in Information Retrieval, pp. 50-58, 1997. | zh_TW |
dc.relation.reference (參考文獻) | [21] R. Feldman, I. Dagan, Mining text using keyword distributions, Journal of Intelligent Information Systems, Volume 10, pp. 281-300, 1998. | zh_TW |
dc.relation.reference (參考文獻) | [22] K. M. Hammouda and M. S. Kamel, Phrase-based document similarity based on an index graph model, Proceedings of the Second IEEE International Conference on Data Mining, pp. 203-210, 2002. | zh_TW |
dc.relation.reference (參考文獻) | [23] E. H. Han, G. Karypis and V. Kumar, Text categorization using weight adjusted k-Nearest Neighbor classification, Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 53-65, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [24] Y. J. Ko and Y. J. Seo, Automatic text categorization by unsupervised learning, Proceedings of the Eighteenth conference on Computational linguistics, pp. 453-459, 2000. | zh_TW |
dc.relation.reference (參考文獻) | [25] Y. J. Ko and Y. J. Seo, Text categorization using feature projections, Proceedings of the Nineteenth international conference on Computational linguistics, Volume 1, pp.1-7, 2002. | zh_TW |
dc.relation.reference (參考文獻) | [26] G. Lame, A categorization method for French legal documents on the Web, Proceedings of the Eighth International Conference on Artificial Intelligence and Law, pp. 219-220, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [27] B. L. Li, Q. Lu and S. W. Yu, An adaptive k-nearest neighbor text categorization strategy, ACM Transactions on Asian Language Information Processing, Volume 3, Issue 4, pp. 215-226, 2004. | zh_TW |
dc.relation.reference (參考文獻) | [28] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, The MIT Press, 1999. | zh_TW |
dc.relation.reference (參考文獻) | [29] T. Mitchell, Machine Learning, McGraw Hill, 1997. | zh_TW |
dc.relation.reference (參考文獻) | [30] D. D. Palmer and J. D. Burger, Chinese Word Segmentation and Information Retrieval, AAAI Spring Symposium on Cross-Language Text and Speech Retrieval, Electronic Working Notes, 1997. | zh_TW |
dc.relation.reference (參考文獻) | [31] E. L. Rissland and K. D. Ashley, A case-based system for Trade Secrets Law, Proceedings of the First International Conference on Artificial Intelligence and Law, pp. 60-66, 1987. | zh_TW |
dc.relation.reference (參考文獻) | [32] U. J. Schild, Intelligent computer systems for criminal sentencing, Proceedings of the Fifth International Conference on Artificial Intelligence and Law, pp. 229-238, 1995. | zh_TW |
dc.relation.reference (參考文獻) | [33] P. Thompson, Automatic categorization of case law, Proceedings of the Eighth International Conference on Artificial Intelligence and Law, pp.70-77, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [34] J. J. Tsay and J. D. Wang, Design and evaluation of approaches to automatic Chinese text categorization, Computational Linguistics and Chinese Language Processing, Volume 5, No.2, pp. 43-58, 2000. | zh_TW |
dc.relation.reference (參考文獻) | [35] B. Verheij, Automated argument assistance for lawyers, Proceedings of the Seventh international conference on Artificial intelligence and law, pp. 43-52, 1999. | zh_TW |
dc.relation.reference (參考文獻) | [36] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Elsevier, 2005. | zh_TW |
dc.relation.reference (參考文獻) | [37] Y. Yang, A study of thresholding strategies for text categorization, Proceedings of the Twenty-fourth Annual International ACM Special Interest Group of Information Retrieval Conference on Research and Development in Information Retrieval, pp. 137-145, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [38] Z. Zhang and Q. Yang, Feature weight maintenance in case bases using introspective learning, Journal of Intelligent Information Systems, Volume 16, pp. 95-116, 2001. | zh_TW |
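
Illustrative note on the abstract: the record names the tpf-idf and tpf-icf weights and a similarity-based k-nearest neighbor classifier with rejection, but it does not give their formulas. The Python sketch below is one plausible reading, not the thesis's implementation. Everything in it is an assumption made for illustration: the co-occurrence window used to form ordered term pairs, the logarithmic idf/icf forms borrowed from standard tf-idf, cosine similarity, the similarity-weighted vote, the `min_sim` rejection threshold, and all function names and sample tokens.

```python
import math
from collections import Counter, defaultdict


def term_pairs(tokens, window=2):
    """Ordered pairs (a, b) where b follows a by at most `window` tokens (assumed)."""
    return [(a, b) for i, a in enumerate(tokens)
            for b in tokens[i + 1:i + 1 + window]]


def idf_table(docs, window=2):
    """Assumed idf form: log(N / df), df = number of documents containing the pair."""
    df = Counter()
    for tokens in docs:
        df.update(set(term_pairs(tokens, window)))
    return {pair: math.log(len(docs) / n) for pair, n in df.items()}


def icf_table(docs, labels, window=2):
    """Assumed icf form: log(C / cf), cf = number of categories containing the pair."""
    cats = defaultdict(set)
    for tokens, label in zip(docs, labels):
        for pair in term_pairs(tokens, window):
            cats[pair].add(label)
    n_cat = len(set(labels))
    return {pair: math.log(n_cat / len(c)) for pair, c in cats.items()}


def weigh(tokens, table, window=2):
    """Term pair frequency times the table weight; pairs unseen in training are dropped."""
    tpf = Counter(term_pairs(tokens, window))
    return {p: count * table[p] for p, count in tpf.items() if p in table}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts; 0.0 if either is empty."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def knn_classify(query, train, labels, k=3, min_sim=0.1):
    """Similarity-weighted k-NN vote; returns None (reject) if the best match is too weak."""
    ranked = sorted(((cosine(query, vec), label)
                     for vec, label in zip(train, labels)), reverse=True)[:k]
    if not ranked or ranked[0][0] < min_sim:
        return None
    votes = Counter()
    for sim, label in ranked:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Hypothetical, already-segmented toy documents (not data from the thesis).
docs = [["steal", "motorcycle", "key"], ["steal", "motorcycle", "away"],
        ["bet", "dice", "money"], ["bet", "dice", "cash"]]
labels = ["larceny", "larceny", "gambling", "gambling"]

table = idf_table(docs)  # swap in icf_table(docs, labels) for tpf-icf
train = [weigh(d, table) for d in docs]
print(knn_classify(weigh(["steal", "motorcycle", "helmet"], table), train, labels))
print(knn_classify(weigh(["walk", "park"], table), train, labels))
```

On this toy data the first query is assigned to `larceny` and the second is rejected (`None`), because none of its term pairs occur in the training documents. Raising `min_sim` trades a higher rejection rate for fewer low-confidence decisions, which is the kind of parameter the abstract says was tuned from the similarity distribution.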