Title 基於深度學習之行草中文古文辨識
Cursive Chinese Calligraphy Recognition For Historical Documents—A Deep Learning Approach
Author 戎諒
Jung, Liang
Contributors 廖文宏
Liao, Wen-Hung
戎諒
Jung, Liang
Keywords Cursive Chinese calligraphy
Text recognition
Deep learning
Date 2019
Uploaded 5-Sep-2019 16:15:17 (UTC+8)
Abstract Calligraphy was one of the most important writing tools, as well as an art form, in ancient China. Compared with other calligraphy styles, cursive script is the least restricted in rules and structure and often reveals the calligrapher's personality. However, this style-oriented expression makes cursive script hard to recognize: even for trained humanities experts, digitizing historical texts remains time-consuming work. Furthermore, optical character recognition (OCR) systems are designed for printed text and perform poorly on cursive characters, whose structure is simplified and whose style varies widely. A need has thus arisen for auxiliary tools that support cursive Chinese calligraphy recognition.

In this study, we take a deep learning-based approach to the recognition of cursive Chinese calligraphy. As there is currently no open dataset for cursive Chinese calligraphy, we collected images from the web and curated them manually, compiling a dataset of 42862 images covering 5301 different Chinese characters written in cursive script to train our neural networks.

Since little previous research exists on this topic, we treat cursive Chinese calligraphy recognition as a variant of offline handwritten Chinese character recognition. Building on the M6 network architecture, which has performed well on handwritten Chinese character recognition, we propose and investigate three neural network architectures: Enhanced M6 (EM6), DenseNet-18, and a 3-way neural network. EM6 adds batch normalization and an additional fully connected layer to M6 to reduce the impact of overfitting. DenseNet-18 is simplified from DenseNet-121 with a shallower network depth. The 3-way neural network is devised based on our observations of Chinese handwriting. These networks achieved similar accuracy during the training phase, but EM6 outperforms the others in test accuracy and is therefore our model of choice. We evaluate the EM6 model on 18668 cursive Chinese calligraphy images extracted from the BiSouth model calligraphy (二南堂法帖) and achieve 64.3% Top-1 accuracy and 80.5% Top-5 accuracy.
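The Top-1 and Top-5 accuracies quoted in the abstract count a test image as correct when its true character is among the one or five highest-scoring classes predicted by the network. A minimal sketch of that metric in plain Python is given here for illustration; the function name `top_k_accuracy` and the toy scores are assumptions made for this example, not code or data from the thesis.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the largest scores for this image.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example (hypothetical): 3 images, 4 character classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1 is ranked first
    [0.5, 0.1, 0.3, 0.1],  # true class 2 is ranked second
    [0.4, 0.3, 0.2, 0.1],  # true class 3 is ranked last
]
labels = [1, 2, 3]

print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

With 5301 classes, as in the dataset described here, Top-5 accuracy is a natural measure for an assistive tool, since a human expert can pick the right character from a short list of candidates.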
Description Master's thesis
National Chengchi University
Department of Computer Science
106753027
Source http://thesis.lib.nccu.edu.tw/record/#G0106753027
Type thesis
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/125644
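dc.description.tableofcontents Abstract (Chinese)
ABSTRACT
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Technical Background and Related Work
2.1 About Deep Learning
2.2 Convolutional Neural Networks
2.3 Deep Learning-Based Handwritten Chinese Character Recognition
2.3.1 The CASIA-HWDB Dataset
2.3.2 Multi-Column Deep Neural Networks
2.3.3 Deep Convolutional Network for Handwritten Chinese Character Recognition
2.3.4 Summary
Chapter 3 Dataset Introduction and Proof of Concept
3.1 Dataset Introduction
3.2 Siamese Convolutional Neural Network
3.3 An M11-like CNN Architecture
3.4 Analysis of Misclassified Cases
3.5 Deep Learning Packages and Development Environment
Chapter 4 Methodology
4.1 Data Processing
4.1.1 Class Subdivision
4.1.2 Post-processing after Class Aggregation
4.1.3 Summary
4.2 Neural Network Model Selection
4.2.1 EM6: Enhanced M6 Network
4.2.2 A 3-Way Network Combining CNN and LSTM
4.2.2.1 Overview of the 3-Way Network Architecture
4.2.2.2 Choice of Convolutional Network Module
4.3 Comparison of Different Networks
Chapter 5 Experiments and Result Analysis
5.1 Effects of Different Image Preprocessing
5.1.1 Bounding Box Processing
5.1.2 Input Image Size
5.1.3 Applying Erosion and Dilation for Image Augmentation
5.2 External Test Dataset: BiSouth Model Calligraphy (二南堂法帖)
5.2.1 Introduction to the BiSouth Model Calligraphy
5.2.2 Data Preprocessing for the BiSouth Model Calligraphy
5.2.3 Experimental Results and Misclassification Analysis
5.3 Visualization Analysis of Convolutional Neural Networks
Chapter 6 Conclusions and Future Work
References
Appendix 1 List of Classes
Appendix 2 List of Calligraphy Works Used in the Test Dataset
Appendix 3 Discrepancies between the BiSouth Test Data and the Training Data (the 5 Most Severe Characters)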
dc.format.extent 7534077 bytes
dc.format.mimetype application/pdf
dc.identifier.doi (DOI) 10.6814/NCCU201900798