Title 跨平台社群媒體圖文檢索系統之設計與實作
Design and Implementation of a Text and Image Retrieval Tool for Cross-Platform Social Media Content
Author 許展嘉
Hsu, Chan Chia
Contributors 陳恭
Chen, Kung
許展嘉
Hsu, Chan Chia
Keywords social media
search engine
information retrieval
Elasticsearch
Date 2016
Uploaded 22-Aug-2016 13:40:52 (UTC+8)
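Abstract (translated from the Chinese) Over the years, digital humanities researchers at our university have collected social media text data on major events such as elections, disasters, and social movements, drawn from Twitter, Facebook, the PTT bulletin board system (批踢踢 BBS), and online real-time news. This large body of discourse reflects the opinions, emotions, and interactions of online communities and news media during major events, and is of considerable research value. However, the content has never been fully analyzed, mainly because there is no effective retrieval system to help researchers explore and study text from different media sources.
This thesis therefore designs and builds a cross-source retrieval system. Based on the characteristics of the data and metadata of the collected Twitter, Facebook, PTT, and real-time news texts, it redefines data fields, converts relational data, and applies Chinese word segmentation to turn the data into a dataset suited to Chinese-language retrieval. The open-source search engine Elasticsearch then searches this large volume of data, and the system provides a flexible query interface and a user management mechanism. Digital humanities researchers can quickly search social media content by dataset, keyword, image, time range, and so on, and visualized search results show how much attention a given event received on each platform and how users reacted. The system lays a foundation for an integrated data source channel for cross-platform social media text and image retrieval.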
In the past few years, digital humanities researchers at our school have collected a large amount of social media text data about major public events such as elections, disasters, and social movements from various sources, namely Twitter, Facebook, PTT, and real-time news. These text data largely reflect the opinions, emotions, and interactions of the online community at the time of major events, and are therefore considered valuable research assets. However, for lack of a proper information retrieval tool, these researchers have not been able to launch in-depth studies of these data.
Therefore, this thesis presents the design and implementation of an information retrieval system for these cross-media data sets, based on the popular search engine Elasticsearch. Our system first preprocesses the data and metadata of these social media texts into a unified yet flexible data schema, then builds indices so that users can search the full text of both the data proper and various metadata attributes such as publication date and author. We also provide visualization aids to display the search results in a user-friendly manner. Overall, our system serves as a useful tool for researchers to explore social media text data from various sources easily and effectively.
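The abstract describes normalizing records from several sources into one schema and indexing them for Chinese full-text search. The sketch below illustrates that general idea in plain Python, assuming character bigrams as the segmentation step and a simple in-memory inverted index. It is not the thesis's implementation, which relies on Elasticsearch; the field names (`source`, `author`, `date`, `text`), the sample records, and all function names are hypothetical.

```python
from collections import defaultdict


def bigrams(text):
    """Split text into overlapping character bigrams, skipping whitespace."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def build_index(docs):
    """Map each bigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for gram in bigrams(doc["text"]):
            index[gram].add(doc_id)
    return index


def search(index, docs, query, source=None):
    """Return sorted ids of documents that contain every bigram of the query.

    This is an AND over bigrams, not an exact phrase match, so long
    queries can in principle match documents where the bigrams are
    not adjacent.
    """
    grams = bigrams(query)
    if not grams:
        return []
    hits = set.intersection(*(index.get(g, set()) for g in grams))
    if source is not None:
        # Optional metadata filter on the (hypothetical) source field.
        hits = {d for d in hits if docs[d]["source"] == source}
    return sorted(hits)


# Hypothetical records, already normalized to one schema.
DOCS = {
    1: {"source": "twitter", "author": "a", "date": "2016-01-16", "text": "選舉結果出爐"},
    2: {"source": "ptt", "author": "b", "date": "2016-01-16", "text": "今天選舉投票"},
    3: {"source": "news", "author": "c", "date": "2016-02-06", "text": "地震災難救援"},
}
INDEX = build_index(DOCS)
```

Calling `search(INDEX, DOCS, "選舉")` looks up the single bigram `選舉`, and passing `source="ptt"` narrows the hits to one platform. A production system would delegate tokenization, scoring, and filtering to the search engine itself.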
References 1. NoSQL, from: http://zh.wikipedia.org/wiki/NoSQL
2. Elasticsearch-Definitive-Guide, from: https://github.com/elastic/elasticsearch-definitive-guide
3. Elasticsearch Reference, from: https://www.elastic.co/guide/en/elasticsearch
4. jQuery, from: http://jquery.com
5. Spring Framework, from: https://projects.spring.io/spring-framework
6. Hibernate, from: http://hibernate.org/
7. ECharts, from: http://echarts.baidu.com/
8. Jest, from: https://github.com/searchbox-io/Jest/tree/master/jest
9. PostgreSQL, from: https://www.postgresql.org/
Description Master's degree
National Chengchi University
Department of Computer Science, In-service Master's Program
103971006
Source http://thesis.lib.nccu.edu.tw/record/#G0103971006
Type thesis
dc.contributor.advisor 陳恭 (zh_TW)
dc.contributor.advisor Chen, Kung (en_US)
dc.identifier (Other Identifiers) G0103971006
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/100572
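dc.description.tableofcontents Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Results
1.5 Thesis Outline
Chapter 2 Related Work and Technical Background
2.1 Characteristics of Social Media Data
2.1.1 Twitter Data Structure
2.1.2 Facebook Data Structure
2.1.3 PTT BBS
2.1.4 Real-Time News
2.2 Mechanisms for Cross-Platform Data Retrieval
2.3 Database and Data Access Technologies
2.3.1 Elasticsearch
2.3.2 Hibernate
2.3.3 PostgreSQL
2.4 Chinese Word Segmentation and Indexing
2.4.1 N-gram Fundamentals
2.4.2 Inverted Index Fundamentals
2.5 Front-End Technologies
2.5.1 Spring
2.5.2 jQuery
2.5.3 ECharts
2.5.4 Bootstrap
Chapter 3 System Architecture and Design
3.1 System Design Philosophy
3.2 Analysis of Data Storage Formats
3.3 Query DSL for Communicating with the Elasticsearch Search Engine
3.4 How Elasticsearch Handles Relational Data
3.5 System Design and Implementation
3.5.1 User Operation Module
3.5.2 Data Conversion Module
3.5.3 Data Search Module
3.5.4 Data Display Module
3.5.5 Data Export Module
Chapter 4 System Implementation and Verification
4.1 Elasticsearch Setup
4.1.1 Elasticsearch Mapping
4.2 Querying Elasticsearch from Java
4.3 Visualization with ECharts
4.4 System Implementation Results
4.4.1 Data Search Interface Functions
4.4.2 Data Display Interface Functions
4.5 System Verification
4.5.1 Verification Method
4.5.2 Verification Results
Chapter 5 Conclusions and Recommendations
5.1 Conclusions
5.2 Research Limitations
5.3 Future Work and Recommendations
References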
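Section 3.3 of the outline covers the Elasticsearch Query DSL. As a rough illustration only, the sketch below builds a request body that combines a full-text `match` clause with a date `range` filter and an optional `term` filter inside a `bool` query. The record does not show the thesis's actual queries or index mapping, so the field names (`text`, `date`, `source`) and the helper `build_query` are assumptions.

```python
import json


def build_query(keyword, start_date, end_date, source=None):
    """Build a Query DSL body: keyword match plus metadata filters.

    Field names are hypothetical; adjust them to the real index mapping.
    """
    must = [{"match": {"text": keyword}}]
    filters = [{"range": {"date": {"gte": start_date, "lte": end_date}}}]
    if source is not None:
        # Restrict results to one platform, e.g. "ptt" or "twitter".
        filters.append({"term": {"source": source}})
    return {"query": {"bool": {"must": must, "filter": filters}}}


if __name__ == "__main__":
    body = build_query("選舉", "2016-01-01", "2016-01-31", source="ptt")
    # The JSON text is what a client would send to the _search endpoint.
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Keeping the metadata conditions in the `filter` clause means they restrict the result set without affecting relevance scoring, which is left to the `match` clause.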
dc.format.extent 3854219 bytes
dc.format.mimetype application/pdf