Please use this identifier to cite or link to this item: https://ah.nccu.edu.tw/handle/140.119/100572


Title: 跨平台社群媒體圖文檢索系統之設計與實作
Design and Implementation of a Text and Image Retrieval Tool for Cross-Platform Social Media Content
Authors: 許展嘉
Hsu, Chan Chia
Contributors: 陳恭
Chen, Kung
許展嘉
Hsu, Chan Chia
Keywords: social media
search engine
information retrieval
Elasticsearch
Date: 2016
Issue Date: 2016-08-22 13:40:52 (UTC+8)
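Abstract: Over the years, digital humanities researchers at our university have collected social media text data on major events such as elections, disasters, and social movements, from sources including Twitter, Facebook, the PTT bulletin board system (批踢踢), and online real-time news. This large body of discourse reflects the opinions, emotions, and interactions of online communities and news media when major events occur, and it has considerable research value. However, this content has never been analyzed thoroughly, mainly because there has been no effective retrieval system to help researchers explore and study text from different media sources.
This thesis therefore designs and builds a retrieval system that spans media sources. Based on the characteristics of the data and metadata of the collected Twitter, Facebook, PTT, and real-time news texts, it converts the data into a dataset suited to Chinese-language retrieval through field redefinition, relational data conversion, and Chinese word segmentation. It then uses the open-source search engine Elasticsearch to search the large volume of data, and provides a flexible query interface and a user management mechanism. Digital humanities researchers can quickly search the text of each social media source by dataset, keyword, image, time range, and so on, and visualized search results show how much attention a given event received on different social media platforms and how people reacted to it. The system lays a foundation for an integrated data source channel for cross-platform social media text and image retrieval.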
In the past few years, digital humanities researchers at our university have collected a large amount of social media text data about major public events such as elections, disasters, and social movements from various sources, namely Twitter, Facebook, PTT, and real-time news. These text data largely reflect the opinions, emotions, and interactions of the online community at the time of major events, and are therefore considered valuable research assets. However, for lack of a proper information retrieval tool, these researchers have not been able to launch in-depth studies of these social media text data.
Therefore, this thesis presents the design and implementation of an information retrieval system for these cross-media data sets, built on the popular search engine Elasticsearch. Our system first preprocesses the data and metadata of these social media texts into a unified yet flexible data schema, then builds indices so that users can run full-text searches over both the text itself and metadata attributes such as publication date and author. We also provide visualization aids to display the search results in a user-friendly manner. Overall, our system serves as a good tool for researchers to explore social media text data from various sources in an easy yet effective way.
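As an illustration of the two steps the abstract describes (mapping source-specific records into one schema, then searching full text together with metadata), the following is a minimal Python sketch, not the thesis's implementation. The field names on both sides of `FIELD_MAPS`, the helper names `to_unified` and `build_query`, and the sample values are all hypothetical. The query body uses the standard Elasticsearch Query DSL clauses `bool`, `multi_match`, `terms`, and `range`; it assumes that `source` is indexed as an exact-match (keyword) field and `published` as a date field.

```python
import json

# Hypothetical mapping from each source's raw field names to one shared schema.
# None of these field names are taken from the thesis.
FIELD_MAPS = {
    "twitter": {"text": "text", "author": "screen_name", "published": "created_at"},
    "facebook": {"text": "message", "author": "from_name", "published": "created_time"},
    "ptt": {"text": "content", "author": "author", "published": "date"},
    "news": {"text": "body", "author": "reporter", "published": "publish_time"},
}


def to_unified(source, record):
    """Convert one source-specific record into the shared schema."""
    mapping = FIELD_MAPS[source]
    # Missing raw fields become None rather than raising an error.
    doc = {field: record.get(raw_name) for field, raw_name in mapping.items()}
    doc["source"] = source
    return doc


def build_query(keyword, sources=None, start=None, end=None):
    """Build a query body: full-text match plus optional metadata filters."""
    filters = []
    if sources:
        filters.append({"terms": {"source": list(sources)}})
    date_range = {}
    if start:
        date_range["gte"] = start
    if end:
        date_range["lte"] = end
    if date_range:
        filters.append({"range": {"published": date_range}})
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": keyword, "fields": ["text", "author"]}}
                ],
                "filter": filters,
            }
        }
    }


if __name__ == "__main__":
    sample = {"content": "sample post", "author": "user1", "date": "2016-01-16"}
    print(json.dumps(to_unified("ptt", sample), ensure_ascii=False))
    body = build_query("election", ["ptt", "twitter"], "2016-01-01", "2016-01-31")
    print(json.dumps(body, indent=2))
```

Putting the source and date conditions in `filter` rather than `must` restricts the result set without affecting relevance scoring, which suits metadata constraints such as a time range.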
Reference: 1. NoSQL, from: http://zh.wikipedia.org/wiki/NoSQL
2. Elasticsearch: The Definitive Guide, from: https://github.com/elastic/elasticsearch-definitive-guide
3. Elasticsearch Reference, from: https://www.elastic.co/guide/en/elasticsearch
4. jQuery, from: http://jquery.com
5. Spring Framework, from: https://projects.spring.io/spring-framework
6. Hibernate, from: http://hibernate.org/
7. ECharts, from: http://echarts.baidu.com/
8. Jest, from: https://github.com/searchbox-io/Jest/tree/master/jest
9. PostgreSQL, from: https://www.postgresql.org/
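Description: Master's degree
National Chengchi University (國立政治大學)
In-service Master's Program, Department of Computer Science (資訊科學系碩士在職專班)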
103971006
Source URI: http://thesis.lib.nccu.edu.tw/record/#G0103971006
Data Type: thesis
Appears in Collections: [In-service Master's Program, Department of Computer Science (資訊科學系碩士在職專班)] Theses

Files in This Item:

File          Size      Format
100601.pdf    3763Kb    Adobe PDF


All items in 學術集成 are protected by copyright, with all rights reserved.

