Title 基於自動情境標註之圖像檢索工具發展與數位人文應用研究
Developing an Image Retrieval Tool based on Automatic Context Annotation for Digital Humanities Research
作者 趙映翔
Zhao, Ying-Xiang
貢獻者 陳志銘
Chen, Chih-Ming
趙映翔
Zhao, Ying-Xiang
Keywords 數位人文 (Digital humanities)
圖像檢索 (Image retrieval)
語意鴻溝 (Semantic gap)
物件標註 (Object annotation)
情境標註 (Contextual annotation)
深度學習 (Deep learning)
自動圖像標註 (Automatic image annotation)
Mask R-CNN
TF-IDF
SVM
行為分析 (Behavioral analysis)
Date 2021
Uploaded 2-Sep-2021 16:36:05 (UTC+8)
Abstract
Image retrieval has become one of the significant approaches in digital humanities research in the digital age. The main problem affecting the performance of Text-Based Image Retrieval (TBIR) is the semantic gap between the manually assigned image metadata and the users' search terms. With the rapid development of computer vision in recent years, automatic image annotation based on machine learning can narrow this gap by automatically adding metadata derived from the tags of objects identified in an image. However, such object tags capture only the visible characteristics of image objects and are of limited help for users' image retrieval and comprehension, because they remain too low-level from a human perspective. This prompted the present research to develop automatic context annotation as a digital humanities tool that can effectively assist humanities scholars in interpreting image contexts by reducing the semantic gap between the context of an image and its human readers.
Therefore, this research developed an Image Retrieval Tool Based on Automatic Context Annotation (IRT-ACA). The tool's core technologies are Mask R-CNN, TF-IDF, and SVM; it identifies both the physical objects and the abstract contexts hidden in images, and provides users with richer metadata in the form of object and contextual tags for image retrieval and browsing, so that users can quickly extract the information they need from images, thus helping humanists interpret image contexts more efficiently.
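The paragraph above describes a two-stage pipeline: object detection first, then context classification over the detected tags. A minimal sketch of how such a pipeline could be wired together is given below; the helper names, the toy label set, and the use of scikit-learn's TfidfVectorizer and SVC are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch of an object-tags -> TF-IDF -> SVM context pipeline.
# detect_objects() stands in for a trained Mask R-CNN model; nothing here
# reflects the thesis's actual code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

def detect_objects(image_path: str) -> list[str]:
    """Placeholder for Mask R-CNN inference returning object tags."""
    raise NotImplementedError("plug in a trained Mask R-CNN here")

def train_context_classifier(tag_lists: list[list[str]], contexts: list[str]):
    """Fit TF-IDF over each image's bag of object tags, then an SVM that
    maps the weighted tag vector to an abstract context label."""
    vectorizer = TfidfVectorizer(analyzer=lambda tags: tags)  # tags are pre-tokenized
    X = vectorizer.fit_transform(tag_lists)
    clf = SVC(kernel="linear").fit(X, contexts)
    return vectorizer, clf

# Toy usage with hand-made tags standing in for Mask R-CNN output.
tags_per_image = [["person", "horse", "cart"], ["person", "lantern", "table"],
                  ["horse", "cart", "road"], ["lantern", "altar", "person"]]
contexts = ["travel", "banquet", "travel", "ritual"]
vec, clf = train_context_classifier(tags_per_image, contexts)
print(clf.predict(vec.transform([["horse", "road"]])))  # context inferred from object tags
```

Treating each image's detected tags as a tiny document lets TF-IDF down-weight objects that appear everywhere, so the SVM can key on the tags that actually discriminate between contexts; this is one plausible reading of how the three techniques named above could fit together.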
To verify whether the IRT-ACA developed in this research benefits humanities scholars in image interpretation, this research adopted a counterbalanced experimental design. Users were divided into two groups and operated the IRT-ACA and the TBIR tool in alternating order to complete two designed image retrieval tasks. In addition, a behavioral history recorder fully logged the users' system operations in both tools, a technology acceptance questionnaire captured the users' actual feelings about and perceptions of the two systems, and semi-structured interviews elicited the thoughts, ideas, and suggestions of the users who had used both image retrieval systems. These multiple methods were cross-validated to reveal the differences between the IRT-ACA and the TBIR in the accuracy of automatic context annotation, the effectiveness of interpreting image contexts, and technology acceptance.
The research results are summarized as follows. First, the accuracy of the IRT-ACA's automatic context annotation was sufficient to support effective interpretation of image contexts. Second, there was a significant difference in the effectiveness of interpreting image contexts between the TBIR and the IRT-ACA, with the IRT-ACA significantly outperforming the TBIR. Third, there was a significant difference in overall technology acceptance and perceived usefulness between the two tools, again in favor of the IRT-ACA, but no significant difference in perceived ease of use; the interview analysis shows that users were satisfied with the ease of operation and smoothness of both systems, so their ease-of-use scores differed little. Fourth, the tag-based retrieval provided by the IRT-ACA promoted the research subjects' willingness to search more than free keyword retrieval did. Fifth, the IRT-ACA users with high image interpretation performance made full use of all retrieval functions, including contextual tag search, object tag search, full-text search, and title search. Sixth, the transfer rate from viewing images to taking notes was higher for the IRT-ACA users than for the TBIR users.
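The sixth finding rests on a transfer rate between coded behaviors, in the spirit of the lag sequential analysis cited in the references (陳勇汀, 2017). As a hedged illustration of such a metric, the snippet below computes the share of image-viewing events immediately followed by note-taking in a per-user event log; the event codes and log format are invented for the example.

```python
# Hypothetical computation of the "view image -> take note" transfer rate
# from one user's sequence of coded behavior events. The event codes
# ("VIEW_IMAGE", "TAKE_NOTE") and log format are assumptions for illustration.
from collections import Counter

def transfer_rate(events: list[str], src: str = "VIEW_IMAGE",
                  dst: str = "TAKE_NOTE") -> float:
    """Fraction of src events whose immediately following event is dst."""
    pairs = Counter(zip(events, events[1:]))  # adjacent event bigrams
    from_src = sum(n for (a, _), n in pairs.items() if a == src)
    return pairs[(src, dst)] / from_src if from_src else 0.0

# Example: two of the three image views lead straight into note-taking.
log = ["SEARCH", "VIEW_IMAGE", "TAKE_NOTE", "VIEW_IMAGE",
       "VIEW_IMAGE", "TAKE_NOTE", "SEARCH"]
print(transfer_rate(log))  # 0.666...
```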
References 陳勇汀(2017)。行為順序檢定:滯後序列分析 / Behavior Analysis: Lag Sequential Analysis。Retrieved from https://pulipulichen.github.io/HTML-Lag-Sequential-Analysis/
項潔、陳麗華(2014)。數位人文-學科對話與融合的新領域。數位人文研究與技藝(頁9-23)
Agosti, M., Ferro, N., Orio, N., & Ponchia, C. (2014). CULTURA outcomes for improving the user’s engagement with cultural heritage collections. Procedia Computer Science, 38, 34-39. doi:10.1016/j.procs.2014.10.007
Beaudoin, J. E. (2014). A framework of image use among archaeologists, architects, art historians and artists. Journal of Documentation, 70(1), 119-147. doi:10.1108/JD-12-2012-0157
Beaudoin, J. E., & Brady, J. E. (2011). Finding visual information: a study of image resources used by archaeologists, architects, art historians, and artists. Art Documentation: Journal of the Art Libraries Society of North America, 30(2), 24-36. doi:10.1086/adx.30.2.41244062
Bradshaw, B. (2000). Semantic based image retrieval: a probabilistic approach. In Proceedings of the eighth ACM international conference on Multimedia, 167-176. doi:10.1145/354384.354456
Brooks, J. (2019). COCO Annotator. Jsbroks/coco-annotator. https://github.com/jsbroks/coco-annotator/
Burdescu, D. D., Mihai, C. G., Stanescu, L., & Brezovan, M. (2013). Automatic image annotation and semantic based image retrieval for medical domain. Neurocomputing, 109, 33-48. doi:10.1016/j.neucom.2012.07.030
Carneiro, G., Chan, A. B., Moreno, P. J., & Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 394-410. doi:10.1109/TPAMI.2007.61
Chen, C.-M., & Tsay, M.-Y. (2017). Applications of collaborative annotation system in digital curation, crowdsourcing, and digital humanities. The Electronic Library, 35(6), 1122-1140. doi:10.1108/EL-08-2016-0172
Chen, S. H., & Chen, Y. H. (2017). A content-based image retrieval method based on the Google Cloud Vision API and WordNet. Intelligent Information and Database Systems, 651-662. doi:10.1007/978-3-319-54472-4_61
Chen, Y., Zhou, X. S., & Huang, T. S. (2001). One-class SVM for learning in image retrieval. In Proceedings 2001 International Conference on Image Processing, IEEE 2001, 34-37. doi:10.1109/ICIP.2001.958946
Cheng, Q., Zhang, Q., Fu, P., Tu, C., & Li, S. (2018). A survey and analysis on automatic image annotation. Pattern Recognition, 79, 242-259. doi:10.1016/j.patcog.2018.02.017
Eakins, J., & Graham, M. (1999). Content-based image retrieval.
Gordo, A., Almazán, J., Revaud, J., & Larlus, D. (2016). Deep image retrieval: learning global representations for image search. Computer Vision – ECCV 2016, 9910, 241-257. doi:10.1007/978-3-319-46466-4_15
Hare, J. S., Lewis, P. H., Enser, P. G., & Sandom, C. J. (2006). Mind the gap: another look at the problem of the semantic gap in image retrieval. In Multimedia Content Analysis, Management, and Retrieval 2006, 6073, 607309.1-607309.12. doi:10.1117/12.647755
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, 2961-2969. doi:10.1109/ICCV.2017.322
Henningsmeier, J. (1998). The foreign sources of Dianshizhai huabao 點石齋畫報, A nineteenth century Shanghai illustrated magazine. Ming qing yanjiu, 7(1), 59-91. doi:10.1163/24684791-90000374
Huang, C., Xu, H., Xie, L., Zhu, J., Xu, C., & Tang, Y. (2018). Large-scale semantic web image retrieval using bimodal deep learning techniques. Information Sciences, 430-431, 331-348. doi:10.1016/j.ins.2017.11.043
Huang, Z., Zhong, Z., Sun, L., & Huo, Q. (2019). Mask R-CNN with pyramid attention network for scene text detection. In 2019 IEEE Winter Conference on Applications of Computer Vision, IEEE 2019, 764-772. doi:10.1109/WACV.2019.00086
Hwang, G. J., Yang, L. H., & Wang, S. Y. (2013). A concept map-embedded educational computer game for improving students’ learning performance in natural science courses. Computers & Education, 69, 121-130. doi:10.1016/j.compedu.2013.07.008
Hyvönen, E., Saarela, S., Styrman, A., & Viljanen, K. (2003). Ontology-based image retrieval. In Proceedings of XML Finland Conference, 27-51. Retrieved from https://seco.cs.aalto.fi/publications/2002/hyvonen-styrman-saarela-ontology-based-image-retrieval-2002.pdf
Ivasic-Kos, M., Ipsic, I., & Ribaric, S. (2015). A knowledge-based multi-layered image annotation system. Expert Systems with Applications, 42(24), 9539-9553. doi:10.1016/j.eswa.2015.07.068
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 1097-1105. https://doi.org/10.1145/3065386
Li, Z., & Tang, J. (2015). Weakly supervised deep metric learning for community-contributed image retrieval. IEEE Transactions on Multimedia, 17(11), 1989-1999. doi:10.1109/TMM.2015.2477035
Liu, Y., Zhang, D., & Lu, G. (2008). Region-based image retrieval with high-level semantics using decision tree learning. Pattern Recognition, 41(8), 2554-2570. doi:10.1016/j.patcog.2007.12.003
Liu, Y., Zhang, D., Lu, G., & Ma, W. Y. (2007). A survey of content-based image retrieval with high-level semantics. Pattern Recognition, 40(1), 262-282. doi:10.1016/j.patcog.2006.04.045
Llamas, J., Lerones, P. M., Zalama, E., & Gómez-García-Bermejo, J. (2016). Applying deep learning techniques to cultural heritage images within the inception project. Progress in Cultural Heritage: Documentation, Preservation, and Protection, 25-32. doi:10.1007/978-3-319-48974-2_4
Lorang, E., Soh, L.-K., Datla, M. V., & Kulwicki, S. (2015). Developing an image-based classifier for detecting poetic content in historic newspaper collections. D-Lib Magazine, 21(7/8). doi:10.1045/july2015-lorang
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Proceedings of the seventh IEEE international conference on computer vision, IEEE 1999, 1150-1157. doi:10.1109/ICCV.1999.790410
Murthy, V. N., Maji, S., & Manmatha, R. (2015). Automatic image annotation using deep learning representations. In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 603-606. doi:10.1145/2671188.2749391
Murthy, V. S. V. S., Vamsidhar, E., Kumar, J. S., & Rao, P. S. (2010). Content based image retrieval using Hierarchical and K-means clustering techniques. International Journal of Engineering Science and Technology, 2(3), 209-212.
Nguyen, H. V., & Bai, L. (2011). Cosine similarity metric learning for face verification. In Computer Vision – ACCV 2010, 709-720. doi:10.1007/978-3-642-19309-5_55
Özyurt, F., Tuncer, T., Avci, E., Koç, M., & Serhatlioğlu, İ. (2019). A novel liver image classification method using perceptual hash-based convolutional neural network. Arabian Journal for Science and Engineering, 44(4), 3173-3182. doi:10.1007/s13369-018-3454-1
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, CVPR 2016, 779-788. doi:10.1109/CVPR.2016.91
Rui, Y., & Huang, T. (2000). Optimizing learning in image retrieval. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2000, 236-243. doi:10.1109/CVPR.2000.855825
Schreibman, S. (2012). Digital humanities: centres and peripheries. Historical Social Research-Historische Sozialforschung, 37(3), 46-58.
Shyu, C. R. (2000). Relevance feedback decision trees in content-based image retrieval. In 2000 Proceedings Workshop on Content-based Access of Image and Video Libraries, 68-72.
Su, J. H., Wang, B. W., Yeh, H. H., & Tseng, V. S. (2009). Ontology-based semantic web image retrieval by utilizing textual and visual annotations. In 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, 425-428. Milan, Italy: IEEE. doi:10.1109/WI-IAT.2009.317
Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349-1380. doi:10.1109/34.895972
Terras, M. (2012). Image processing and digital humanities. Digital Humanities in Practice, 71-90. Facet. Retrieved from http://discovery.ucl.ac.uk/1327983/
Vijayarajan, V., Dinakaran, M., Tejaswin, P., & Lohani, M. (2016). A generic framework for ontology-based information retrieval and image retrieval in web data. Human-Centric Computing and Information Sciences, 6(1), 18. doi:10.1186/s13673-016-0074-1
Wan, H. L., & Chowdhury, M. (2003). Image semantic classification by using SVM. Journal of Software, 14(11), 1891-1899.
Wang, H., Wang, Y., Zhou, Z., Ji, X., Gong, D., Zhou, J., ... & Liu, W. (2018). Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, CVPR 2018, 5265-5274. doi:10.1109/CVPR.2018.00552
Wang, L. Z., & Gu, X. E. (2007). Dian-Shi-Zhai pictorial suiting both refined and popular tastes. Journal of Shanxi Normal University (Social Science Edition), 4.
Weinberger, K. Q., & Saul, L. K. (2009). Distance metric learning for large margin nearest neighbor classification. Journal of machine learning research, 10(2).
Xiaoqing, Y. (2003). The Dianshizhai pictorial: Shanghai urban life, 1884-1898 (No. 98). Ann Arbor, MI: University of Michigan Press.
Yin, S., Chen, W., & Qin, X. (2009). Research on semantic network image retrieval method. In 2009 International Conference on Future BioMedical Information Engineering, 449-452. doi:10.1109/FBIE.2009.5405823
Zhang, D., Islam, Md. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346-362. doi:10.1016/j.patcog.2011.05.013
Zhang, Y. J. (2006). Semantic-based visual information retrieval. Pennsylvania, PA: IGI Global.
Description Master's thesis
國立政治大學 (National Chengchi University)
圖書資訊與檔案學研究所 (Graduate Institute of Library, Information and Archival Studies)
108155017
Source http://thesis.lib.nccu.edu.tw/record/#G0108155017
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/136927
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Purposes
1.3 Research Questions
1.4 Research Scope and Limitations
1.5 Definitions of Terms
Chapter 2 Literature Review
2.1 Development of Image Retrieval Systems
2.2 Automatic Image Annotation
Chapter 3 System Design
3.1 System Design Rationale
3.2 System Architecture
3.3 System User Interface
3.4 Development Environment and Tools
3.5 System Operation Instructions
Chapter 4 Research Methods and Experimental Design
4.1 Research Framework
4.2 Research Methods
4.3 Research Subjects
4.4 Research Instruments
4.5 Experimental Procedure
4.6 Data Processing and Analysis
4.7 Research Steps
Chapter 5 Analysis of Experimental Results
5.1 Basic Profile of the Research Subjects
5.2 Accuracy Analysis of the Developed IRT-ACA's Automatic Context Annotation
5.3 Differences in the Effectiveness of Interpreting Image Contexts with the TBIR and the IRT-ACA
5.4 Differences in Technology Acceptance of the IRT-ACA and the TBIR
5.5 Analysis of System Operation Behavior Logs for the IRT-ACA and the TBIR
5.6 Qualitative Analysis of the Semi-Structured Interviews
5.7 General Discussion
Chapter 6 Conclusions and Suggestions
6.1 Conclusions
6.2 Suggestions for Improving the IRT-ACA
6.3 Future Research Directions
References
Appendices
Appendix 1 Interview Outline
Appendix 2 IRT-ACA Technology Acceptance Questionnaire
Appendix 3 TBIR Technology Acceptance Questionnaire
Format application/pdf (3600487 bytes)
DOI 10.6814/NCCU202101403