Academic Output: Journal Article

Title 光學字元辨識古籍之全文轉置經驗:以明人文集為例
Full Text Conversion Experience in Optical Character Recognition of Ancient Books: An Example of Ming Dynasty Literati Collections
Authors 林巧敏 (Lin, Chiao-Min)
蔡瀚緯
Contributor Graduate Institute of Library, Information and Archival Studies (圖檔所)
Keywords Optical character recognition; Full-text database; Rare old books; Ancient book digitization; Digital archive
Date 2020-12
Uploaded 10-June-2021 14:36:05 (UTC+8)
Abstract With advances in information technology and the demand for full-text content analysis in digital humanities research, optical character recognition (OCR) can convert scanned texts into full text, enabling full-text search and content mining. To assess whether OCR software can feasibly convert ancient books into full text, this study ran OCR on ancient texts and measured the results, examining how effective recognition is and what factors affect the recognition rate. Forty Ming dynasty literary collections, varying in layout and glyph style, were selected for analysis. The results show that both book layout and image quality affect the OCR recognition rate; in particular, overly crowded text layouts and poor image quality hinder OCR processing. The study identifies six common types of misrecognized glyphs, which collecting institutions can consult when planning full-text conversion of similar ancient editions.
Relation 圖資與檔案學刊, Vol. 12, No. 2, pp. 76-117
Type article
DOI https://doi.org/10.6575/JILA.202012_(97).0003
dc.date.accessioned 10-June-2021 14:36:05 (UTC+8)
dc.date.available 10-June-2021 14:36:05 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/135733
dc.format.extent 1287835 bytes
dc.format.mimetype application/pdf