Title: 應用 Auto-encoder 技術於無監督漢字圖像轉譯
Unsupervised Chinese character image translation based on Auto-encoder
Author: 邱柏森 (Chiu, Po-Sen)
Advisor: 劉昭麟 (Liu, Chao-Lin)
Keywords: image processing (影像處理); image translation (圖像轉譯)
Date: 2021
Uploaded: 2-Mar-2021 14:32:19 (UTC+8)

Abstract
Optical character recognition (OCR), the automated analysis and identification of Chinese character images, has become an important and widely used technology. However, a given OCR model cannot always recognize the Chinese characters in the source material, for two main reasons. First, the font used in the source material may be unknown, so the thickness, length, and shape of the strokes differ from the fonts the model expects; if the font falls outside the model's recognition range, recognition is likely to fail. Second, the source material may be stained or blurred for various reasons, so the scanned images are of poor quality and cannot be recognized. Given these problems, unless another recognition model that handles such characteristics can be found, the only remedy is to spend a great deal of time labeling additional categories for training, which makes the OCR problem difficult to solve quickly. This study therefore applies Auto-encoder techniques to build a Chinese character image translation model that can be trained in an unsupervised manner to preprocess the scanned images in the data set. The recognition results of a fixed OCR model on the preprocessed character images are then compared with its results on the unpreprocessed images, in order to evaluate how the preprocessing affects recognition.
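As a rough illustration of the pipeline the abstract describes (an Auto-encoder trained without labels, whose reconstructions serve as OCR preprocessing), a minimal PyTorch sketch follows. The layer sizes, the 64x64 grayscale input, and the plain reconstruction objective are illustrative assumptions, not the architecture used in the thesis, which also experiments with VAE, CycleGAN, and UNIT variants (see the table of contents below).

```python
# Minimal sketch (an assumption, not the thesis's actual architecture):
# a convolutional Auto-encoder that learns to reconstruct character
# glyphs and can serve as an unsupervised preprocessing step before OCR.
import torch
import torch.nn as nn

class CharAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: 1x64x64 grayscale glyph -> compact feature map
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1),    # -> 32x32x32
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),   # -> 64x16x16
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # -> 128x8x8
            nn.ReLU(inplace=True),
        )
        # Decoder: mirror of the encoder, back to 1x64x64
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
            nn.Sigmoid(),  # pixel intensities in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, batch):
    """One unsupervised step: reconstruct the input glyphs, no labels."""
    optimizer.zero_grad()
    recon = model(batch)
    loss = nn.functional.mse_loss(recon, batch)  # reconstruction loss
    loss.backward()
    optimizer.step()
    return loss.item()

model = CharAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
fake_batch = torch.rand(8, 1, 64, 64)  # stand-in for scanned glyph crops
print(train_step(model, optimizer, fake_batch))
```

In the thesis's setting the translation models map scanned glyphs toward a standard target font rather than merely reconstructing the input; the sketch only shows the shared encoder-decoder skeleton and the unsupervised training loop.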
Degree: Master's (碩士)
Institution: National Chengchi University (國立政治大學)
Department: Computer Science (資訊科學系)
Student ID: 107753029
Source: http://thesis.lib.nccu.edu.tw/record/#G0107753029
Type: thesis
Format: application/pdf (10,514,249 bytes)
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/134085
DOI: 10.6814/NCCU202100219
Table of Contents
1 Introduction
1.1 Research motivation
1.2 Research purpose
2 Related work
2.1 Image translation
2.2 Generative adversarial network
2.3 Image-to-image translation
2.3.1 Supervised learning
2.3.2 Unsupervised learning
3 Methodology
3.1 Experimental architecture
3.2 Experimental data
3.2.1 Standard Chinese character images
3.2.2 Chinese character images from local gazetteers
3.3 Image data preprocessing
3.4 Chinese character detection model
3.4.1 Model introduction
3.4.2 Traditional methods
3.5 Image translation models
3.5.1 Variational Auto-encoder
3.5.2 CycleGAN
3.5.3 Unsupervised Image-to-Image Translation Networks
3.6 Evaluation metrics
3.6.1 Optical character recognition
3.6.2 Image classification model
3.7 Evaluation of experimental results
4 Experimental design and result analysis
4.1 Training data and evaluation metrics
4.1.1 Experimental data
4.1.2 Evaluation models
4.1.3 Recognition rate on the original data
4.2 VAE model experiments
4.2.1 Experimental design
4.2.2 Model design
4.2.3 Result analysis: standard Song style translated to standard Kai style
4.2.4 Result analysis: local-gazetteer characters translated to standard Kai style
4.3 CycleGAN model experiments
4.3.1 Experimental design
4.3.2 Model design
4.3.3 Algorithm
4.3.4 Result analysis: standard Song style translated to standard Kai style
4.3.5 Result analysis: local-gazetteer characters translated to standard Kai style
4.4 UNIT model experiments
4.4.1 Experimental design
4.4.2 Model design
4.4.3 Algorithm
4.4.4 Result analysis: standard Song style translated to standard Kai style
4.4.5 Result analysis: local-gazetteer characters translated to standard Kai style
4.5 Adversarial UNIT model experiments
4.5.1 Experimental design
4.5.2 Model design
4.5.3 Algorithm
4.5.4 Result analysis: standard Song style translated to standard Kai style
4.5.5 Result analysis: local-gazetteer characters translated to standard Kai style without cycle consistency
4.6 Supervised multi-class classification training experiments on translated images
4.6.1 Experimental design
4.6.2 Result analysis on local-gazetteer characters
5 Conclusion and future work
5.1 Conclusion
5.2 Future work
References
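Several of the models listed above (Sections 3.5.2, 3.5.3, and the experiments of 4.3 through 4.5) train on unpaired data from two domains, scanned characters and a standard font, held together by a cycle-consistency term: translating an image to the other domain and back should recover it. A schematic sketch of that loss, assuming PyTorch and generator names G and F that are not from the thesis, is:

```python
# Schematic cycle-consistency loss for unpaired translation between
# domain X (scanned characters) and domain Y (standard-font glyphs).
# G: X -> Y and F: Y -> X are any image-to-image networks (assumed).
import torch
import torch.nn.functional as nnf

def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
    """L1 penalty for failing to recover an image after a domain round trip."""
    loss_x = nnf.l1_loss(F(G(real_x)), real_x)  # X -> Y -> X should match X
    loss_y = nnf.l1_loss(G(F(real_y)), real_y)  # Y -> X -> Y should match Y
    return lam * (loss_x + loss_y)

# Tiny smoke test with identity "generators" (the loss is then zero):
x = torch.rand(4, 1, 64, 64)
y = torch.rand(4, 1, 64, 64)
print(cycle_consistency_loss(lambda t: t, lambda t: t, x, y))  # tensor(0.)
```

In CycleGAN-style training this term is added to the adversarial losses of the two domain discriminators; the weight lam is a hyperparameter, set to 10 in the original CycleGAN paper.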
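The evaluation protocol from the abstract and Section 3.6.1, running one fixed OCR model over raw and preprocessed character images and comparing accuracy, could look like the sketch below. The record does not state which OCR engine the thesis fixes; Tesseract via pytesseract (with a Traditional Chinese model installed) serves here purely as an assumed stand-in, and the per-character image paths and ground-truth labels are hypothetical inputs.

```python
# Sketch of the evaluation protocol: run one fixed OCR engine on raw and
# on preprocessed character images, then compare recognition accuracy.
# Assumes pytesseract with the Traditional Chinese traineddata (chi_tra)
# installed; treat the engine choice and inputs as illustrative only.
from PIL import Image
import pytesseract

def ocr_accuracy(image_paths, labels):
    """Fraction of single-character crops the OCR engine reads correctly."""
    correct = 0
    for path, label in zip(image_paths, labels):
        text = pytesseract.image_to_string(
            Image.open(path),
            lang="chi_tra",
            config="--psm 10",  # treat the image as a single character
        ).strip()
        correct += (text == label)
    return correct / len(labels)

# Hypothetical usage: the same fixed engine, before and after translation.
# raw_acc  = ocr_accuracy(raw_paths, labels)
# prep_acc = ocr_accuracy(preprocessed_paths, labels)
# print(f"raw: {raw_acc:.3f}  preprocessed: {prep_acc:.3f}")
```

The gap between the two accuracies then quantifies how much the unsupervised preprocessing helps the fixed recognizer.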