Title: 多語言的場景文字偵測 (Multilingual Scene Text Detection)
Author: 梁苡萱 (Liang, Yi Hsuan)
Advisor: 廖文宏 (Liao, Wen Hung)
Keywords: scene text detection; bilateral filter; Maximally Stable Extremal Region (MSER)
Date: 2014
Uploaded: 1 December 2014, 14:19:48 (UTC+8)
Abstract: Text in an image usually carries useful information about the scene, such as locations, names, directions, and warnings, so robust and efficient scene text detection has attracted increasing attention in computer vision. However, most existing scene text detection methods are designed for Latin-based languages; the few studies that have examined Chinese text report detection rates well below those achieved for English. This thesis proposes a scene text detection algorithm that targets both Chinese and English. The method comprises four stages: (1) preprocessing with a bilateral filter to make text regions more stable; (2) candidate extraction, in which a Canny edge detector and Maximally Stable Extremal Regions (MSER) provide text edge and region features respectively, and the two are combined to obtain more robust candidates; (3) character linking, in which character candidates are grouped into text-string candidates by geometric constraints that account for the structure of Chinese characters and for both horizontal and vertical writing directions; (4) classification, in which a support vector machine (SVM) trained on text features separates text strings from non-text strings. Experimental results show that the proposed method detects both Chinese and English text and achieves satisfactory performance compared with approaches designed only for English.
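To make the four-stage pipeline concrete, the sketch below outlines one possible realization in Python with OpenCV and scikit-learn. It is not the thesis implementation: the filter and Canny parameters, the rule for combining the edge and MSER cues (edge density inside each region's bounding box), the linking thresholds, and the per-string SVM features are illustrative assumptions, and `train_features`/`train_labels` stand in for a labelled training set that is not shown here.

```python
# Sketch of a four-stage scene text detection pipeline:
# preprocessing -> candidate extraction -> linking -> SVM classification.
# Thresholds and features are illustrative assumptions, not the thesis's settings.
import cv2
import numpy as np
from sklearn.svm import SVC


def extract_candidates(image_bgr):
    """Stages 1-2: bilateral filtering, then Canny + MSER candidates."""
    # Stage 1: bilateral filter smooths texture while preserving character edges.
    smoothed = cv2.bilateralFilter(image_bgr, 9, 75, 75)
    gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)

    # Stage 2a: Canny edge map of the filtered intensity image.
    edges = cv2.Canny(gray, 100, 200)

    # Stage 2b: MSER regions as character candidates.
    mser = cv2.MSER_create()
    _regions, bboxes = mser.detectRegions(gray)

    # Combine the two cues: keep MSER boxes that enclose enough edge pixels
    # and have a plausible aspect ratio (one way to fuse edge and region features).
    candidates = []
    for (x, y, w, h) in bboxes:
        if w < 5 or h < 5:
            continue
        edge_density = float(edges[y:y + h, x:x + w].mean()) / 255.0
        aspect_ratio = w / float(h)
        if edge_density > 0.05 and 0.1 < aspect_ratio < 10.0:
            candidates.append((x, y, w, h, edge_density, aspect_ratio))
    return candidates


def link_candidates(candidates, gap_factor=1.5, height_ratio=2.0):
    """Stage 3: group candidates of similar size that lie close to a seed
    candidate, checking both the horizontal and the vertical direction."""
    chains, used = [], set()
    for i, a in enumerate(candidates):
        if i in used:
            continue
        chain = [a]
        used.add(i)
        ax, ay, aw, ah = a[:4]
        for j, b in enumerate(candidates):
            if j in used:
                continue
            bx, by, bw, bh = b[:4]
            similar = max(ah, bh) / float(max(1, min(ah, bh))) < height_ratio
            horiz = abs(ay - by) < ah and abs(bx - (ax + aw)) < gap_factor * aw
            vert = abs(ax - bx) < aw and abs(by - (ay + ah)) < gap_factor * ah
            if similar and (horiz or vert):
                chain.append(b)
                used.add(j)
        chains.append(chain)
    return chains


def string_features(chain):
    """Per-string features: mean aspect ratio, mean edge density, length."""
    return [np.mean([c[5] for c in chain]),
            np.mean([c[4] for c in chain]),
            len(chain)]


def classify_strings(chains, train_features, train_labels):
    """Stage 4: an SVM separates text strings (label 1) from non-text (label 0)."""
    clf = SVC(kernel="rbf")
    clf.fit(train_features, train_labels)  # labelled training data assumed
    predictions = clf.predict([string_features(c) for c in chains])
    return [c for c, label in zip(chains, predictions) if label == 1]
```

A caller would read an image with `cv2.imread`, then run `extract_candidates`, `link_candidates`, and `classify_strings` in sequence. Fusing an edge cue with a region cue in stage 2 is a common way to suppress MSER responses on smooth background blobs while keeping regions with character-like contours.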
Description: Master's thesis, National Chengchi University, Department of Computer Science. Student ID: 101753021. Academic year: 103 (2014).
Source: http://thesis.lib.nccu.edu.tw/record/#G0101753021
Type: thesis
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/71721
Full text: PDF, 4,101,245 bytes
Table of contents:
Chapter 1  Introduction
  1.1  Research background and objectives
  1.2  Pipeline architecture and methods
  1.3  Thesis organization
Chapter 2  Related work
  2.1  Sliding-window-based methods
  2.2  Connected-component-based methods
  2.3  Hybrid methods
  2.4  Summary
Chapter 3  Proposed method
  3.1  Image preprocessing
    3.1.1  Bilateral filter
    3.1.2  High-pass filter
    3.1.3  Median filter
  3.2  Extraction and combination of text-feature image information
    3.2.1  Canny edge extraction from intensity images of different brightness
      3.2.1.1  Intensity images
      3.2.1.2  Canny edge detection
    3.2.2  Maximally Stable Extremal Regions
    3.2.3  Combining Canny and MSER
  3.3  Candidate character filtering and string linking
    3.3.1  Candidate character filtering
      3.3.1.1  Stroke width
      3.3.1.2  Aspect ratio
    3.3.2  Text string linking
      3.3.2.1  Linking adjacent characters
      3.3.2.2  Linking overlapping characters
  3.4  Classification of text and non-text strings
    3.4.1  Support vector machine
    3.4.2  Feature extraction for the SVM
Chapter 4  Experimental results and discussion
  4.1  ICDAR 2011 database
  4.2  Multilingual scene images
Chapter 5  Conclusion and future work
References