Recognition of low resolution text using deep learning approach

Title	基於深度學習之低解析度文字辨識 (Recognition of low resolution text using deep learning approach)
Author	黃依凡
Contributors	廖文宏 (Liao, Wen-Hung), advisor; 黃依凡, author
Keywords	Text recognition; Low resolution; Convolutional neural networks
Date	2017
Upload time	10-Aug-2017 09:59:08 (UTC+8)
Abstract	Recent advances in deep neural networks have significantly changed the landscape of computer vision and pattern recognition research. Convolutional neural networks (CNNs), for example, have demonstrated outstanding capabilities in image classification, in many cases exceeding human performance. Many tasks that did not yield satisfactory results with conventional machine learning approaches are now being actively re-examined using deep learning techniques. This thesis is concerned with a well-investigated topic in computer vision, namely optical character recognition (OCR). Our main focus, however, is a very specific class of input: printed Chinese text with very low resolution and a significant amount of distortion and interference. Whereas the recognition of high-resolution text, either printed or handwritten, has been successfully tackled using convolutional neural networks, the analysis of very low-quality printed Chinese text poses several challenges that require further study. Specifically, our dataset consists of 31,570 text images generated with dot-matrix printers, including blurred text, text with missing strokes, and text overlapping with other text or graphics. To effectively address these difficulties, we have experimented with deep neural networks under various combinations of network architectures and hyperparameters. The results are reported and discussed in order to identify the best-performing setting for the recognition task. For input images with an average resolution of 16x18 pixels belonging to 1,530 classes, the top-1 and top-5 accuracies are 71% and 87%, respectively.
Description	Master's degree; National Chengchi University; Department of Computer Science; 104753010
Source	http://thesis.lib.nccu.edu.tw/record/#G0104753010
Type	thesis
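
The abstract reports top-1 and top-5 accuracy. As a reading aid, the following is a minimal sketch of how such top-k figures are commonly computed from per-class scores. It is not code from the thesis: the function name `top_k_accuracy`, the toy scores, and the four-class setup are all illustrative assumptions (the actual task has 1,530 classes).

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Hypothetical toy data: 3 samples, 4 classes.
toy_scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1 has the highest score
    [0.5, 0.1, 0.3, 0.1],  # true class 2 has the second-highest score
    [0.4, 0.3, 0.2, 0.1],  # true class 3 has the lowest score
]
toy_labels = [1, 2, 3]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With k=5 and 1,530 classes, a prediction counts as correct whenever the true character is among the five highest-scoring candidates, which is why the top-5 figure (87%) cannot be lower than the top-1 figure (71%).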

dc.contributor.advisor	廖文宏	zh_TW
dc.contributor.advisor	Liao, Wen-Hung	en_US
dc.contributor.author (Authors)	黃依凡	zh_TW
dc.creator (Author)	黃依凡	zh_TW
dc.date (Date)	2017	en_US
dc.date.accessioned	10-Aug-2017 09:59:08 (UTC+8)	-
dc.date.available	10-Aug-2017 09:59:08 (UTC+8)	-
dc.date.issued (Upload time)	10-Aug-2017 09:59:08 (UTC+8)	-
dc.identifier (Other Identifiers)	G0104753010	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/111787	-
dc.description (Description)	Master's degree	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Department of Computer Science	zh_TW
dc.description (Description)	104753010	zh_TW
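dc.description.tableofcontents	Chapter 1 Introduction: 1.1 Research Background and Motivation; 1.2 Research Objectives; 1.3 Thesis Organization. Chapter 2 Technical Background and Related Work: 2.1 Background and Breakthroughs of Deep Learning; 2.2 Overview of CNNs; 2.3 Related Work. Chapter 3 Datasets: 3.1 Invoice Test Set; 3.2 CASIA-HWDB1.1; 3.3 Tesseract Dataset. Chapter 4 Methodology and Architecture: 4.1 Deep Learning Tools and Environment; 4.2 CNN Architecture; 4.3 Caffe Solver; 4.4 Experimental Procedure. Chapter 5 Experiments and Analysis: 5.1 Experiment 1: CASIA Dataset; 5.2 Experiment 2: Adding Variations; 5.3 Experiment 3: Padding by 4 Pixels; 5.4 Experiment 4: Random Padding; 5.5 Experiment 5: Modifying Text Brightness; 5.6 Experimental Results. Chapter 6 Conclusions and Future Research Directions. References. Appendix.	zh_TW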
dc.format.extent	10630802 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0104753010	en_US
dc.subject (Keywords)	文字辨識	zh_TW
dc.subject (Keywords)	低解析度	zh_TW
dc.subject (Keywords)	卷積神經網路	zh_TW
dc.subject (Keywords)	Text recognition	en_US
dc.subject (Keywords)	Convolutional neural networks	en_US
dc.subject (Keywords)	Low resolution	en_US
dc.title (Title)	基於深度學習之低解析度文字辨識	zh_TW
dc.title (Title)	Recognition of low resolution text using deep learning approach	en_US
dc.type (Type)	thesis	en_US
