Title 中文司法裁判文書標記輔助環境初探
A Prototype for Assisting the Labeling of Judicial Documents in Chinese
Author 黃翊唐
Huang, Yi-Tang
Contributors 劉昭麟
黃翊唐
Huang, Yi-Tang
Keywords Legal informatics
System development
Natural language processing
Judicial ruling
Date 2023
Upload time 6-Apr-2023 18:00:07 (UTC+8)
Abstract   As technology advances rapidly, many industries have adopted hardware and software for automation and digitization, but progress has been slower in legal informatics. Possible reasons are that the circumstances of a case are complex and hard to record fully in text, and that a judge's reasoning is not always written out completely in the judgment. We hope to label selected categories of content in judgments, such as points of contention and judges' opinions, so that the clues in a judgment are broken out and made available for subsequent retrieval and even machine learning applications. This thesis studies the development methods and related technologies for such a labeling system.
  Training high-quality machine learning and deep learning models relies on very large amounts of data for training and testing, and all of that data must be labeled by hand. Browsing and labeling such volumes manually is error-prone. To lower the labeling error rate and improve the quality of the dataset, we aim to develop a labeling assistance system for judgments in which users can search for, browse, upload, and label judgments and download the labeling results. By making labeling easier and the workflow smoother, the system is intended to reduce the error rate.
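The operations named in the abstract (labeling spans of a judgment by category, searching its text, and downloading the results) can be illustrated with a minimal sketch. This is not the thesis's implementation: the class names `Span` and `LabeledJudgment`, the default category names, and the JSON export shape are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Span:
    """One labeled stretch of text, as character offsets [start, end)."""
    start: int
    end: int
    category: str


@dataclass
class LabeledJudgment:
    """A judgment text plus its labels (hypothetical schema, not the thesis's)."""
    doc_id: str
    text: str
    # Assumed starter categories, loosely based on the examples in the abstract.
    categories: set = field(
        default_factory=lambda: {"point_of_contention", "judge_opinion"})
    spans: list = field(default_factory=list)

    def add_category(self, name: str) -> None:
        """Register a user-defined category."""
        self.categories.add(name)

    def add_label(self, start: int, end: int, category: str) -> Span:
        """Attach a label, rejecting bad offsets and unknown categories."""
        if not (0 <= start < end <= len(self.text)):
            raise ValueError("span is outside the document")
        if category not in self.categories:
            raise ValueError(f"unknown category: {category}")
        span = Span(start, end, category)
        if span not in self.spans:  # ignore exact duplicates
            self.spans.append(span)
        return span

    def remove_label(self, span: Span) -> None:
        """Remove a previously added label."""
        self.spans.remove(span)

    def search(self, keyword: str) -> list:
        """Return the start offset of every occurrence of keyword."""
        if not keyword:
            return []
        hits, pos = [], self.text.find(keyword)
        while pos != -1:
            hits.append(pos)
            pos = self.text.find(keyword, pos + 1)
        return hits

    def export(self) -> str:
        """Serialize the labels, ordered by position, to JSON for download."""
        ordered = sorted(self.spans, key=lambda s: (s.start, s.end))
        labels = [dict(asdict(s), text=self.text[s.start:s.end]) for s in ordered]
        return json.dumps({"doc_id": self.doc_id, "labels": labels},
                          ensure_ascii=False)
```

Storing labels as character offsets rather than as copies of the text keeps the original judgment unmodified and lets overlapping labels coexist. Validating offsets and categories when a label is added is one simple way to catch labeling mistakes early, which is the kind of error reduction the abstract has in mind.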
References 司法院法學資料檢索系統 (Judicial Yuan legal information retrieval system) (2023). Retrieved from https://lawsearch.judicial.gov.tw/ (January 01, 2023)
劉一凡, 劉昭麟, and 楊婕. 以民事訴訟之爭點分群為基礎的類似案件搜尋系統 (Clustering Issues in Civil Judgments for Recommending Similar Cases). Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING XXXIV), 184-192. 2022.
劉威志, 林泓任, 吳柏憲, and 劉昭麟. 老年扶養費請求案件之准駁及扶養金額預測 (Predicting judgments and grants for civil cases of alimony for the elderly). Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING XXXIV), 121-128. 2022.
林泓任, 劉威志, 劉昭麟, and 楊婕. 以機器學習與規則方法辨識中文民事裁判書結構 (Using machine learning and pattern-based methods for identifying elements in Chinese judgment documents of civil cases). Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING XXXIV), 107-115. 2022.
Elasticsearch. Retrieved from https://www.elastic.co/ (January 01, 2023)
黃詩淳 and 邵軒磊. 人工智慧與法律資料分析之方法與應用:以單獨親權酌定裁判的預測模型為例 (Methods and applications of artificial intelligence and legal data analysis: a prediction model for sole-custody rulings as an example). 臺大法學論叢 (National Taiwan University Law Journal), Vol. 48, No. 4, 2023-2073. 2019.
司法院:國民法官制度 (Judicial Yuan: citizen judge system) (2023). Retrieved from https://social.judicial.gov.tw/CJlandingpage/ (February 03, 2023)
Label Studio. Retrieved from https://labelstud.io/ (January 01, 2023)
Pip. Retrieved from https://pypi.org/project/pip/ (January 01, 2023)
Amazon Web Services. Retrieved from https://aws.amazon.com/ (February 03, 2023)
Google Cloud Platform. Retrieved from https://console.cloud.google.com/ (February 03, 2023)
Microsoft Azure. Retrieved from https://azure.microsoft.com/ (February 03, 2023)
MARKUS. Retrieved from https://dh.chinese-empires.eu/markus/beta/index.html (January 01, 2023)
MedKnowts. Retrieved from http://clinicalml.org/projects/medknowts/ (February 03, 2023)
Murray, Luke, et al. MedKnowts: Unified Documentation and Information Retrieval for Electronic Health Records. The 34th Annual ACM Symposium on User Interface Software and Technology. 2021.
Chen, Irene Y., Rahul G. Krishnan, and David Sontag. Clustering Interval-Censored Time-Series for Disease Phenotyping. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, No. 6, 6211-6221. 2022.
Karlsson, Rickard K. A., et al. Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models. arXiv preprint arXiv:2110.14993. 2021.
Huang, Yi-Tang, Hong-Ren Lin, and Chao-Lin Liu. Toward an Integrated Annotation and Inference Platform for Enhancing Justifications for Algorithmically Generated Legal Recommendations and Decisions. Legal Knowledge and Information Systems. IOS Press, 281-285. 2022.
Description Master's
National Chengchi University
Department of Computer Science
108753132
Source http://thesis.lib.nccu.edu.tw/record/#G0108753132
Type thesis
dc.contributor.advisor 劉昭麟 (zh_TW)
dc.identifier (Other Identifiers) G0108753132 (en_US)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/144042
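dc.description.tableofcontents Abstract i
Table of Contents iii
List of Figures v
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives and Methods 3
1.3 Expected Outcomes and Contributions 4
1.4 Thesis Organization 4
Chapter 2 Literature Review 5
2.1 Overview of Existing Labeling Tools 5
2.1.1 Label Studio 5
2.1.2 MARKUS 11
2.1.3 MedKnowts 15
2.2 Summary of Development Approaches for Labeling Systems 17
2.2.1 Input 17
2.2.2 Labeling 18
2.2.3 Display Control 18
2.2.4 Advanced Search 18
2.2.5 Output 18
2.2.6 Cloud Storage 19
2.3 Functions a Labeling System Should Provide 19
2.3.1 Input 19
2.3.2 Labeling 20
2.3.3 Display Control 20
2.3.4 Search 20
2.3.5 Output 21
2.3.6 Cloud Storage 21
Chapter 3 System Architecture 22
3.1 Software Engineering Background 22
3.1.1 Software Requirements 22
3.1.2 Software Design 23
3.1.3 Software Construction 25
3.1.4 Software Testing 30
3.2 Front-End and Back-End Design 31
3.2.1 Front End 31
3.3 Server and Database Design 32
3.4 Obtaining and Retrieving Judgments 33
3.4.1 Via the Search Page 33
3.4.2 Entering a Judgment Number 34
3.5 Basic System Operations 35
3.5.1 Searching for Judgments 35
3.5.2 Obtaining a Judgment 36
3.5.3 Adding Labels to a Judgment 36
3.5.4 Removing Labels from a Judgment 37
3.5.5 Adding a Custom Category 37
3.5.6 Searching Within a Judgment 38
3.5.7 Showing and Hiding Labels 40
3.5.8 Saving Labeling Results 40
3.6 Worked Example of System Operation 41
Chapter 4 Conclusions and Discussion 45
4.1 Conclusions 45
4.2 Discussion and Future Directions 47
References 48
Appendix A Record of the Thesis Oral Defense 50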
dc.format.extent 3670535 bytes
dc.format.mimetype application/pdf