Title Reading Order Detection in Optical Character Recognition for Historical Chinese Documents
漢字古文書光學字元辨識之文本閱讀順序偵測研究
Author Ma, Hsing-Yuan (馬行遠)
Contributors Liu, Chao-Lin (劉昭麟); Huang, Hen-Hsen (黃瀚萱); Ma, Hsing-Yuan (馬行遠)
Keywords Reading Order Detection
Pairwise Learning-to-Rank
Multimodal Representation
Archival Document Processing
Date 2023
Uploaded 1-Sep-2023 15:24:26 (UTC+8)
Abstract Optical character recognition (OCR) and document layout analysis (DLA) have been studied and developed for many years, yet reading order detection (ROD) remains an open problem. ROD plays an important role in preserving the original structure of a document and in post-OCR correction. Most current ROD tools rely on rule-based algorithms that sort the coordinates of detected text. These approaches work well for modern documents that are simply structured, well aligned, and evenly spaced. For handwritten or historical documents, however, complex layouts and uneven or curved layout edges make them inadequate, so a strategy that detects reading order accurately in historical Chinese documents with complex layouts is needed.
This study builds on current mainstream OCR frameworks and proposes a model dedicated to ROD. The model is designed to mimic the human reading process and treats visual cues as the key to determining reading order. We propose a multimodal approach that formulates the task as pairwise learning-to-rank, which simplifies the ROD pipeline, and we evaluate it on the MTHv2 dataset of historical Chinese documents. Experimental results indicate that, compared with previous methods, our model reduces the page error rate by 25%. It also performs well with limited training data and with insufficient text detection information, which demonstrates the robustness and practical value of the approach.
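The pairwise learning-to-rank formulation mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's model or its decoder: `toy_precedes` is a hypothetical rule-based stand-in for a learned pairwise scorer, the box format `(x_min, y_min, x_max, y_max)` is an assumption, and the "count the wins" decoding is one simple way to turn a pairwise matrix into an order. The toy scorer encodes the traditional convention that vertical Chinese text is read top to bottom, with columns running right to left.

```python
from typing import Callable, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), origin at the top-left.
Box = Tuple[float, float, float, float]


def toy_precedes(a: Box, b: Box) -> float:
    """Hypothetical stand-in for a learned scorer: 1.0 if a is read before b."""
    a_cx = (a[0] + a[2]) / 2
    b_cx = (b[0] + b[2]) / 2
    col_width = min(a[2] - a[0], b[2] - b[0])
    if abs(a_cx - b_cx) > col_width / 2:
        # Different columns: the column further to the right is read first.
        return 1.0 if a_cx > b_cx else 0.0
    # Same column: the box nearer the top is read first.
    return 1.0 if a[1] < b[1] else 0.0


def pairwise_matrix(boxes: List[Box],
                    scorer: Callable[[Box, Box], float]) -> List[List[float]]:
    """matrix[i][j] scores the claim that box i precedes box j."""
    n = len(boxes)
    return [[scorer(boxes[i], boxes[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def decode_order(matrix: List[List[float]]) -> List[int]:
    """Rank boxes by how many other boxes they are predicted to precede."""
    wins = [sum(row) for row in matrix]
    return sorted(range(len(matrix)), key=lambda i: -wins[i])


boxes = [
    (10, 10, 30, 100),    # 0: left column, top
    (50, 110, 70, 200),   # 1: right column, bottom
    (50, 10, 70, 100),    # 2: right column, top
    (10, 110, 30, 200),   # 3: left column, bottom
]
order = decode_order(pairwise_matrix(boxes, toy_precedes))
print(order)  # [2, 1, 0, 3]
```

In a learned setting the scorer would be replaced by a model that outputs a probability for each pair, and the same decoding step would still apply; the sketch only shows how pairwise decisions combine into a full reading order.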
Description Master's thesis
National Chengchi University
Department of Computer Science
110753132
Source http://thesis.lib.nccu.edu.tw/record/#G0110753132
Type thesis
Advisors Liu, Chao-Lin (劉昭麟); Huang, Hen-Hsen (黃瀚萱)
URI http://nccur.lib.nccu.edu.tw/handle/140.119/147032
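Table of Contents
Chapter 1 Introduction
Section 1 Research Motivation
Section 2 Research Background
Section 3 Research Structure
Chapter 2 Literature Review
Section 1 Mainstream OCR Frameworks and Their Development
Section 2 The Human Reading Process
Section 3 Layout and Reading Order of Historical Chinese Books
Section 4 Research on Reading Order
Section 5 Challenges of Reading Order Detection in Historical Chinese Books
Section 6 Summary
Chapter 3 Methodology
Section 1 Problem Definition
Section 2 Multimodal Reading Order Detection Model
Section 3 Pairwise Relation Matrix Decoding Model
Chapter 4 Experimental Procedure
Section 1 Evaluation Methods
Section 2 Experimental Datasets
Section 3 Experimental Models
Section 4 Experimental Parameter Settings
Chapter 5 Experimental Results
Section 1 Multi-Layout Data Experiment
Section 2 Simple and Complex Layout Experiment
Section 3 Specific Layout Experiment
Section 4 Small-Sample Training Experiment
Section 5 Experiment with Different Image Features
Section 6 Conclusion
Section 7 Limitations
Section 8 Future Work
Chapter 6 Implementation of an OCR System for Historical Chinese Books
Section 1 System Framework
Section 2 Text Detection Model
Section 3 Text Recognition Model
Section 4 User Interface and Usage
Section 5 Example Results
References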
File size 18343293 bytes
File type application/pdf