A Prototype for Assisting the Labeling of Judicial Documents in Chinese

Title	A Prototype for Assisting the Labeling of Judicial Documents in Chinese (中文司法裁判文書標記輔助環境初探)
Author	Huang, Yi-Tang (黃翊唐)
Contributors	Liu, Chao-Lin (劉昭麟), advisor; Huang, Yi-Tang (黃翊唐)
Keywords	Legal informatics; System development; Natural language processing; Judicial ruling
Date	2023
Uploaded	6-Apr-2023 18:00:07 (UTC+8)
Abstract	As technology advances, many industries have adopted hardware and software for automation and digitization, but progress has been slower in legal informatics. One possible reason is that the circumstances of a case are complex and hard to record fully in writing. Another is that judges do not necessarily write every reason for a decision into the judgment. For these reasons, legal informatics has been harder to develop than other fields. We aim to annotate judgments by marking certain categories of content, such as points of contention (issues) and the judge's opinion. Doing so makes the clues contained in a judgment explicit for later retrieval and even machine-learning applications. This thesis studies methods and technologies for developing such a labeling system.
Training high-quality machine-learning and deep-learning models requires very large amounts of data for training and testing, and that data must be labeled by hand. Browsing and labeling so much data manually is error-prone. To lower the labeling error rate and improve dataset quality, we develop an assistive labeling system for judgments. On this system, users can search, browse, upload, and label judgments and download the labeling results. By making labeling easier and the workflow smoother, the system aims to reduce labeling errors.
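The labeling workflow in the abstract can be illustrated with a minimal Python sketch. The workflow covers marking categories such as issues and the judge's opinion, adding or removing labels, and downloading the results. The sketch assumes that labels are stored as character offsets into the judgment text and are exported as JSON. The class names, the category names `issue` and `judge_opinion`, and the export layout are hypothetical and are not taken from the thesis. Checking offsets and categories when a label is added is one simple way a tool can catch the manual labeling errors the abstract is concerned with.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Span:
    """One labeled character range [start, end) in a judgment's text."""
    start: int
    end: int
    category: str


@dataclass
class AnnotatedJudgment:
    """Hypothetical in-memory model of one judgment and its labels."""
    judgment_id: str
    text: str
    # Starting categories follow the abstract's examples; users can add more.
    categories: set[str] = field(default_factory=lambda: {"issue", "judge_opinion"})
    spans: list[Span] = field(default_factory=list)

    def add_category(self, name: str) -> None:
        """Register a user-defined label category."""
        self.categories.add(name)

    def add_label(self, start: int, end: int, category: str) -> Span:
        """Label text[start:end], rejecting out-of-range offsets and unknown categories."""
        if not 0 <= start < end <= len(self.text):
            raise ValueError(f"invalid span [{start}, {end})")
        if category not in self.categories:
            raise ValueError(f"unknown category {category!r}")
        span = Span(start, end, category)
        if span not in self.spans:  # silently ignore exact duplicates
            self.spans.append(span)
        return span

    def remove_label(self, start: int, end: int, category: str) -> bool:
        """Remove a label; return False if no such label exists."""
        span = Span(start, end, category)
        if span in self.spans:
            self.spans.remove(span)
            return True
        return False

    def export_json(self) -> str:
        """Serialize labels in document order, each with the text it covers."""
        labels = [
            {**asdict(s), "text": self.text[s.start:s.end]}
            for s in sorted(self.spans, key=lambda sp: (sp.start, sp.end))
        ]
        # ensure_ascii=False keeps Chinese characters readable in the output file.
        return json.dumps(
            {"judgment_id": self.judgment_id, "labels": labels}, ensure_ascii=False
        )


if __name__ == "__main__":
    # Made-up two-sentence text, not a real judgment.
    doc = AnnotatedJudgment("example-001", "原告主張被告應給付扶養費。本院認為原告之請求有理由。")
    doc.add_label(0, 13, "issue")
    doc.add_label(13, 26, "judge_opinion")
    print(doc.export_json())
```

The labels store offsets rather than copies of the text, so they stay tied to the source document. Each exported record also includes the covered text, which makes the downloaded file easy to inspect by hand.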
Description	Master's thesis, Department of Computer Science, National Chengchi University, 108753132
Source	http://thesis.lib.nccu.edu.tw/record/#G0108753132
Type	thesis

dc.contributor.advisor	劉昭麟	zh_TW
dc.contributor.author (Authors)	黃翊唐	zh_TW
dc.contributor.author (Authors)	Huang, Yi-Tang	en_US
dc.creator (Author)	黃翊唐	zh_TW
dc.creator (Author)	Huang, Yi-Tang	en_US
dc.date (Date)	2023	en_US
dc.date.accessioned	6-Apr-2023 18:00:07 (UTC+8)	-
dc.date.available	6-Apr-2023 18:00:07 (UTC+8)	-
dc.date.issued (Upload Time)	6-Apr-2023 18:00:07 (UTC+8)	-
dc.identifier (Other Identifiers)	G0108753132	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/144042	-
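dc.description (Description)	碩士 (Master's)	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	資訊科學系 (Department of Computer Science)	zh_TW
dc.description (Description)	108753132	zh_TW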
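dc.description.tableofcontents	Abstract
Contents
List of Figures
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives and Methods
  1.3 Expected Outcomes and Contributions
  1.4 Thesis Organization
Chapter 2 Literature Review
  2.1 Existing Annotation Tools
    2.1.1 Label Studio
    2.1.2 MARKUS
    2.1.3 MedKnowts
  2.2 Survey of Annotation-System Development Approaches
    2.2.1 Input
    2.2.2 Labeling
    2.2.3 Display Control
    2.2.4 Advanced Search
    2.2.5 Output
    2.2.6 Cloud Storage
  2.3 Features an Annotation System Should Provide
    2.3.1 Input
    2.3.2 Labeling
    2.3.3 Display Control
    2.3.4 Search
    2.3.5 Output
    2.3.6 Cloud Storage
Chapter 3 System Architecture
  3.1 Software Engineering Background
    3.1.1 Software Requirements
    3.1.2 Software Design
    3.1.3 Software Construction
    3.1.4 Software Testing
  3.2 Front-End and Back-End Design
    3.2.1 Front End
  3.3 Server and Database Design
  3.4 Obtaining and Retrieving Judgments
    3.4.1 Via the Search Page
    3.4.2 By Entering a Judgment Number
  3.5 Basic System Operations
    3.5.1 Searching for Judgments
    3.5.2 Retrieving a Judgment
    3.5.3 Adding Labels to a Judgment
    3.5.4 Removing Labels from a Judgment
    3.5.5 Adding Custom Categories
    3.5.6 Searching Within a Judgment
    3.5.7 Showing and Hiding Labels
    3.5.8 Saving Labeling Results
  3.6 Example of System Use
Chapter 4 Conclusions and Discussion
  4.1 Conclusions
  4.2 Discussion and Future Directions
References
Appendix A Record of the Thesis Oral Defense	zh_TW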
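The outline lists two basic operations: searching within a judgment (3.5.6) and showing or hiding labels (3.5.7). The following Python sketch shows one plausible way to implement both over labels stored as character offsets. The `Label` type, the function names, and the `[category:...]` plain-text rendering are assumptions made for illustration and are not the thesis's implementation. A browser-based interface would more likely emit highlighted markup than bracketed text.

```python
from typing import Iterable, NamedTuple


class Label(NamedTuple):
    """Hypothetical label: character range [start, end) plus a category name."""
    start: int
    end: int
    category: str


def find_all(text: str, keyword: str) -> list[tuple[int, int]]:
    """Return [start, end) offsets of every non-overlapping match of keyword."""
    if not keyword:
        return []
    hits, pos = [], text.find(keyword)
    while pos != -1:
        hits.append((pos, pos + len(keyword)))
        pos = text.find(keyword, pos + len(keyword))
    return hits


def visible_labels(labels: Iterable[Label], hidden: set[str]) -> list[Label]:
    """Keep only labels whose category has not been toggled off."""
    return [lab for lab in labels if lab.category not in hidden]


def render(text: str, labels: Iterable[Label]) -> str:
    """Wrap each label as [category:...]; assumes labels do not overlap."""
    out, cursor = [], 0
    for lab in sorted(labels):  # NamedTuples sort by start offset first
        out.append(text[cursor:lab.start])
        out.append(f"[{lab.category}:{text[lab.start:lab.end]}]")
        cursor = lab.end
    out.append(text[cursor:])
    return "".join(out)


if __name__ == "__main__":
    # Made-up text and labels for illustration only.
    text = "原告主張被告應給付扶養費。本院認為原告之請求有理由。"
    labels = [Label(13, 26, "judge_opinion"), Label(0, 13, "issue")]
    print(find_all(text, "原告"))
    print(render(text, visible_labels(labels, hidden={"issue"})))
```

Hiding a category is a filter applied before rendering, so the stored labels are never modified by a display toggle.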
dc.format.extent	3670535 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0108753132	en_US
dc.subject (Keywords)	法學資訊	zh_TW
dc.subject (Keywords)	系統開發	zh_TW
dc.subject (Keywords)	自然語言處理	zh_TW
dc.subject (Keywords)	判決書	zh_TW
dc.subject (Keywords)	Legal informatics	en_US
dc.subject (Keywords)	System development	en_US
dc.subject (Keywords)	Natural language processing	en_US
dc.subject (Keywords)	Judicial ruling	en_US
dc.title (Title)	中文司法裁判文書標記輔助環境初探	zh_TW
dc.title (Title)	A Prototype for Assisting the Labeling of Judicial Documents in Chinese	en_US
dc.type (Type)	thesis	en_US
