Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/134085
Title: Unsupervised Chinese character image translation based on Auto-encoder
Author: Chiu, Po-Sen (邱柏森)
Contributors: Liu, Chao-Lin (劉昭麟); Chiu, Po-Sen (邱柏森)
Keywords: Image processing; Image translation
Date: 2021
Date uploaded: 2-Mar-2021
Abstract: Optical character recognition (OCR), the automated analysis and recognition of Chinese character images, has become an important and widely used technology. However, the characters in the source material may not be recognizable by a given OCR model, for two main reasons. First, the font used in the source material may be unknown, so the stroke thickness, length, and shape of each character differ from what the model expects; if the font falls outside the model's recognition range, recognition is likely to fail. Second, the source material may be stained or blurred for various reasons, so the scanned images are of poor quality and cannot be recognized. Given these problems, unless another model that can handle such characteristics is available, the only remedy is to spend considerable time labeling additional training data, which makes the OCR problem difficult to solve quickly.

This thesis therefore applies Auto-encoder techniques to build a Chinese character image translation model that can be trained in an unsupervised manner to preprocess the scanned images in a data set. The preprocessed character images and the original, unpreprocessed images are then fed to the same fixed OCR model, and the recognition results are compared to evaluate the effect of the preprocessing.
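To make the approach concrete, below is a minimal sketch of the kind of convolutional auto-encoder the abstract describes. The framework (PyTorch), the 64x64 grayscale input size, and all layer widths are assumptions for illustration; the record does not specify them. The model is trained purely to reconstruct its input, so no class labels are required, which is what makes the preprocessing unsupervised.

```python
# Sketch of a convolutional auto-encoder for character-image preprocessing.
# All architectural choices here are illustrative assumptions, not the
# thesis's actual configuration.
import torch
import torch.nn as nn

class CharAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: compress a 1x64x64 grayscale glyph into a small feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),    # 64 -> 32
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),   # 32 -> 16
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(inplace=True),
        )
        # Decoder: reconstruct the character image from the compressed code.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),  # 8 -> 16
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),   # 16 -> 32
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1),    # 32 -> 64
            nn.Sigmoid(),  # pixel intensities in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

if __name__ == "__main__":
    model = CharAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    # Stand-in batch: in the thesis setting these would be degraded character
    # scans; random tensors are used here so the sketch runs on its own.
    noisy = torch.rand(8, 1, 64, 64)
    opt.zero_grad()
    recon = model(noisy)
    # Unsupervised reconstruction objective: no class labels are needed.
    loss = loss_fn(recon, noisy)
    loss.backward()
    opt.step()
    print(f"reconstruction loss: {loss.item():.4f}")
```

The bottleneck forces the network to keep only the dominant stroke structure of a glyph, so degradation such as smudges and blur tends to be discarded during reconstruction.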
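The evaluation step the abstract describes can be sketched as follows: run one fixed OCR engine on both the raw scans and the auto-encoder output, and compare per-character accuracy. Using Tesseract via pytesseract is an assumption here, chosen only because it is a common fixed OCR engine; the file paths, labels, and `--psm` setting are hypothetical.

```python
# Sketch of a before/after OCR comparison with a fixed engine.
# Requires Pillow, pytesseract, a Tesseract install, and the chi_tra
# language data; all of these are assumptions of this example.
import pytesseract
from PIL import Image

def recognize(path: str) -> str:
    """OCR a single character image with one fixed engine configuration."""
    img = Image.open(path).convert("L")
    # --psm 10: treat the image as a single character.
    return pytesseract.image_to_string(
        img, lang="chi_tra", config="--psm 10"
    ).strip()

def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (image_path, gold_character) pairs recognized correctly."""
    correct = sum(1 for path, gold in pairs if recognize(path) == gold)
    return correct / len(pairs)

if __name__ == "__main__":
    # Hypothetical file lists: the same characters before and after
    # auto-encoder preprocessing, paired with ground-truth labels.
    raw = [("raw/0001.png", "永"), ("raw/0002.png", "風")]
    cleaned = [("cleaned/0001.png", "永"), ("cleaned/0002.png", "風")]
    print(f"raw OCR accuracy:          {accuracy(raw):.2%}")
    print(f"preprocessed OCR accuracy: {accuracy(cleaned):.2%}")
```

Keeping the OCR engine and its configuration fixed isolates the contribution of the preprocessing: any change in accuracy is attributable to the image translation step alone.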
Description: Master's thesis
National Chengchi University
Department of Computer Science
107753029
Source: http://thesis.lib.nccu.edu.tw/record/#G0107753029
Type: thesis
Appears in Collections: Theses
