Title 應用 Auto-encoder 技術於無監督漢字圖像轉譯
Unsupervised Chinese character image translation based on Auto-encoder
Author 邱柏森
Chiu, Po-Sen
Contributors 劉昭麟 (advisor)
Liu, Chao-Lin
邱柏森
Chiu, Po-Sen
Keywords Image processing
Image translation
Date 2021
Upload time 2-Mar-2021 14:32:19 (UTC+8)
Abstract Optical character recognition (OCR) analyzes and identifies the Chinese characters in image files, and has become an important and widely used technology. However, the characters in the source material cannot always be recognized by a given OCR model, mainly for two reasons. First, the typeface used in the source material may be unknown, so the thickness, length, and shape of each character's strokes differ from what the model expects; if the typeface falls outside the model's recognition range, recognition is likely to fail. Second, the source material may be stained or blurred for various reasons, so the scanned images are of poor quality and cannot be recognized. Unless another model that can handle these characteristics happens to be available, the only remedy is to spend considerable time labeling additional classes and retraining, which makes such OCR problems hard to solve quickly.
This study therefore applies Auto-encoder techniques to build a Chinese character image translation model that can be trained in an unsupervised manner to preprocess the scanned images in a data set. The preprocessed character images and the original, unpreprocessed images are then fed to the same fixed OCR model, and the recognition results are compared to evaluate the effect of the preprocessing.
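The record does not include the thesis code, but the preprocessing idea in the abstract can be illustrated with a minimal sketch. The PyTorch model below is an assumption for illustration only: a small convolutional autoencoder trained without labels to reconstruct character crops, of the general kind the abstract describes. It is not the author's actual design; the thesis evaluates VAE, CycleGAN, and UNIT variants (see the table of contents below), whose architectures and losses differ.

```python
# Illustrative sketch only -- not the thesis's actual architecture.
# A small convolutional autoencoder that maps 64x64 grayscale character
# crops back to character images, trained unsupervised on reconstruction.
import torch
import torch.nn as nn

class CharAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # 1x64x64 -> 64x8x8
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(  # 64x8x8 -> 1x64x64
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, batch):
    """One unsupervised step: reconstruct the input character crops."""
    optimizer.zero_grad()
    recon = model(batch)
    loss = nn.functional.mse_loss(recon, batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the unpaired setting the thesis targets (gazetteer scans translated into a standard typeface), plain reconstruction would be replaced by the adversarial and cycle-consistency losses of CycleGAN [10] or the shared-latent-space constraints of UNIT [14].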
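The evaluation protocol in the abstract, running one fixed OCR model on raw and on preprocessed crops and comparing the results, can likewise be sketched. Here pytesseract (a Python wrapper for the Tesseract engine described in [18]) merely stands in for the fixed OCR model; the helper names, image lists, and per-character accuracy metric are hypothetical placeholders.

```python
# Illustrative sketch of the comparison protocol in the abstract: run one
# fixed OCR engine on raw vs. preprocessed character crops and compare
# accuracy. pytesseract is a stand-in; data variables are hypothetical.
import pytesseract
from PIL import Image

def recognize(image: Image.Image) -> str:
    # --psm 10 treats the image as a single character.
    text = pytesseract.image_to_string(
        image, lang="chi_tra", config="--psm 10")
    return text.strip()

def accuracy(images, labels) -> float:
    """Fraction of single-character crops recognized correctly."""
    correct = sum(recognize(img) == gold for img, gold in zip(images, labels))
    return correct / len(labels)

# raw_crops / cleaned_crops: PIL images of the same characters before and
# after autoencoder preprocessing; labels: the gold characters.
# print(f"raw:          {accuracy(raw_crops, labels):.3f}")
# print(f"preprocessed: {accuracy(cleaned_crops, labels):.3f}")
```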
References [1] Taiwan open government data platform: CNS11643 Chinese Standard Interchange Code full font library download. https://data.gov.tw/dataset/5961.
[2] CJK Unified Ideographs. http://jicheng.tw/hanzi/unicode?s=4E00&e=9FFF.
[3] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, O. Winther. Autoencoding beyond pixels using a learned similarity metric. In ICML, 1558-1566, 2016.
[4] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, B. Frey. Adversarial Autoencoders. In NIPS, 2016.
[5] D. P. Kingma, M. Welling. Auto-Encoding Variational Bayes. In ICLR, 2014.
[6] D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell, A. A. Efros. Context Encoders: Feature Learning by Inpainting. In CVPR, 2536-2544, 2016.
[7] H. Cho, J. Wang, S. Lee. Text Image Deblurring Using Text-Specific Properties. In ECCV, 524-537, 2012.
[8] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. C. Courville. Improved Training of Wasserstein GANs. In NIPS, 5769-5779, 2017.
[9] J. Pan, Z. Hu, Z. Su, M.-H. Yang. Deblurring Text Images via L0-Regularized Intensity and Gradient Prior. In CVPR, 2901-2908, 2014.
[10] J.-Y. Zhu, T. Park, P. Isola, A. A. Efros. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In ICCV, 2242-2251, 2017.
[11] K. Nazeri, E. Ng, T. Joseph, F. Z. Qureshi, M. Ebrahimi. EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning. In arXiv:1901.00212, 2019.
[12] M. Arjovsky, S. Chintala, L. Bottou. Wasserstein Generative Adversarial Networks. In ICML, 214-223, 2017.
[13] M. Tan, Q. V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In ICML, 6105-6114, 2019.
[14] M.-Y. Liu, T. Breuel, J. Kautz. Unsupervised Image-to-Image Translation Networks. In NIPS, 700-708, 2017.
[15] O. Elharrouss, N. Almaadeed, S. Al-Maadeed, Y. Akbari. Image inpainting: A review. In Neural Processing Letters, 2019.
[16] O. Ronneberger, P. Fischer, T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In MICCAI, 234-241, 2015.
[17] P. Isola, J.-Y. Zhu, T. Zhou, A. A. Efros. Image-to-Image Translation with Conditional Adversarial Networks. In CVPR, 5967-5976, 2017.
[18] R. Smith. An overview of the Tesseract OCR engine. In ICDAR, 629-633, 2007.
[19] S. Ren, K. He, R. B. Girshick, J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS, 91-99, 2015.
[20] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, S. P. Smolley. Least Squares Generative Adversarial Networks. In ICCV, 2813-2821, 2017.
Description Master's thesis
National Chengchi University
Department of Computer Science
107753029
Source http://thesis.lib.nccu.edu.tw/record/#G0107753029
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/134085
Table of contents
1 Introduction
1.1 Research motivation
1.2 Research objectives
2 Related work
2.1 Image translation
2.2 Generative adversarial networks
2.3 Image-to-image translation
2.3.1 Supervised learning
2.3.2 Unsupervised learning
3 Methodology
3.1 Experimental architecture
3.2 Experimental data
3.2.1 Standard Chinese character images
3.2.2 Chinese character images from local gazetteers
3.3 Image data preprocessing
3.4 Chinese character detection model
3.4.1 Model overview
3.4.2 Traditional methods
3.5 Image translation models
3.5.1 Variational Auto-encoder
3.5.2 CycleGAN
3.5.3 Unsupervised Image-to-Image Translation Networks
3.6 Evaluation metrics
3.6.1 Optical character recognition
3.6.2 Image classification model
3.7 Evaluation of experimental results
4 Experimental design and analysis of results
4.1 Training data and evaluation metrics
4.1.1 Experimental data
4.1.2 Evaluation models
4.1.3 Recognition rate on the original data
4.2 VAE experiments
4.2.1 Experimental design
4.2.2 Model design
4.2.3 Results of translating standard Sung-typeface characters to the standard Kai typeface
4.2.4 Results of translating local-gazetteer characters to the standard Kai typeface
4.3 CycleGAN experiments
4.3.1 Experimental design
4.3.2 Model design
4.3.3 Algorithm
4.3.4 Results of translating standard Sung-typeface characters to the standard Kai typeface
4.3.5 Results of translating local-gazetteer characters to the standard Kai typeface
4.4 UNIT experiments
4.4.1 Experimental design
4.4.2 Model design
4.4.3 Algorithm
4.4.4 Results of translating standard Sung-typeface characters to the standard Kai typeface
4.4.5 Results of translating local-gazetteer characters to the standard Kai typeface
4.5 Adversarial UNIT experiments
4.5.1 Experimental design
4.5.2 Model design
4.5.3 Algorithm
4.5.4 Results of translating standard Sung-typeface characters to the standard Kai typeface
4.5.5 Results of translating local-gazetteer characters to the standard Kai typeface without cycle consistency
4.6 Supervised multi-class classification training on translated images
4.6.1 Experimental design
4.6.2 Analysis of results on local-gazetteer characters
5 Conclusion and future work
5.1 Conclusion
5.2 Future work
References
DOI 10.6814/NCCU202100219