dc.contributor.advisor | 陳恭 | zh_TW |
dc.contributor.advisor | Chen, Kung | en_US |
dc.contributor.author (Authors) | 許矢勇 | zh_TW |
dc.contributor.author (Authors) | Shiu, Shih Yung | en_US |
dc.creator (作者) | 許矢勇 | zh_TW |
dc.creator (作者) | Shiu, Shih Yung | en_US |
dc.date (日期) | 2013 | en_US |
dc.date.accessioned | 25-Aug-2014 15:21:49 (UTC+8) | - |
dc.date.available | 25-Aug-2014 15:21:49 (UTC+8) | - |
dc.date.issued (上傳時間) | 25-Aug-2014 15:21:49 (UTC+8) | - |
dc.identifier (Other Identifiers) | G0100971001 | en_US |
dc.identifier.uri (URI) | http://nccur.lib.nccu.edu.tw/handle/140.119/69229 | - |
dc.description (描述) | Master's degree | zh_TW |
dc.description (描述) | National Chengchi University | zh_TW |
dc.description (描述) | Department of Computer Science | zh_TW |
dc.description (描述) | 100971001 | zh_TW |
dc.description (描述) | 102 | zh_TW |
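dc.description.abstract (摘要) | In recent years, social media such as Twitter, Facebook, and Sina Weibo have flourished: their user bases keep growing, and they have become an important channel through which people interact with friends and obtain information in daily life. For communication scholars and sociologists, the massive data held by the social media giants is an important resource for research on related topics. Although the major social media platforms all provide application programming interfaces (APIs) for data retrieval to some degree, they also impose restrictions of varying severity on data collectors, which makes data collection difficult. In short, researchers must try to optimize the quality and quantity of the data sets they can obtain under the limited resources these platforms provide. In view of this, this study targets Twitter and implements a resource-aware social media data collection platform to help scholars collect tweets. First, the platform adopts an event-job concept: for each event of interest, users choose different keywords for data collection, and each keyword corresponds to a job in the system. Second, every job must hold access tokens in order to collect tweets, and each token can retrieve only a certain number of tweets within a given period, so tokens are the platform's primary resource. To cope with the common situation in which tweet volume surges when a special event occurs, the platform provides a token pool mechanism that lets many jobs share token resources, and it makes good use of the access options of the Twitter API so that users can optimize the number of tweets obtainable according to when the data is collected. For the core design of the system, this study proposes the concept of a "Mansion Household Service": through the division of labor and cooperation among the minions in the service group, the system can still run multiple different jobs concurrently with limited resources, effectively reducing the impact that Twitter's restrictions have on tweet collection. We also validate the platform's tweet collection capability empirically. | zh_TW |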
dc.description.abstract (摘要) | Social media such as Twitter, Facebook, and Weibo have developed rapidly in recent years, and people now use them as a major channel for interpersonal communication and a daily source of information. For scholars in the social sciences and humanities, the digital footprints that people leave on these platforms are a rich resource for studying human behavior. However, these platforms usually impose resource restrictions, such as rate limiting, on how scholars may use their APIs to retrieve data. We therefore design and implement a resource-aware data collection platform for Twitter that helps scholars retrieve historical tweets effectively and efficiently. Our platform employs an event-job approach to help users organize their tasks and the tweets to be collected. Because each job requires an access token to fetch tweets, the platform provides a pool of tokens for jobs to share so that access tokens are utilized as fully as possible. In addition, we leverage the tweet-id options in the Twitter API so that users can optimize the number of tweets collected depending on when collection takes place. For the core of the tweet collection system, we propose a so-called “Mansion Household System,” in which four minions cooperate to launch different jobs simultaneously and thus alleviate the impact of the restrictions that Twitter imposes via access tokens. To validate our design, we conducted a series of experiments, and the results are satisfactory. | en_US |
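dc.description.tableofcontents | Chapter 1 Introduction; 1.1 Preface; 1.2 Research Motivation; 1.3 Research Objectives; 1.4 Research Results; 1.5 Thesis Outline; Chapter 2 Related Concepts and Technical Background; 2.1 Model-View-Controller (MVC); 2.2 Spring and MVC; 2.3 Object-relational mapping (ORM); 2.3.1 OpenJPA; 2.3.2 c3p0 DataSources Pools; 2.4 Twitter Data Collection; 2.4.1 OAuth; 2.4.2 Twitter API; 2.4.3 Twitter4j; 2.5 Scheduling and Quartz; 2.6 Queues and RabbitMQ; 2.7 Front-End Technologies; 2.7.1 jQuery; 2.7.2 jQWidgets; 2.7.3 jVectorMap; 2.7.4 Java Server Pages Standard Tag Library (JSTL); 2.8 Related Tool: YourTwapperKeeper; Chapter 3 System Design and Architecture; 3.1 System Design Philosophy; 3.1.1 MVC Architecture; 3.1.2 Core System Logic; 3.2 Database Access; 3.2.1 Service Layer and Data Access Object (DAO) Design Patterns; 3.2.2 Table Design; 3.3 Exploring System Functions; 3.3.1 Data Collection; 3.3.2 Analysis and Statistics of Collected Data; 3.3.3 System Administration; 3.4 Twitter Data Collection in Depth; 3.4.1 Restrictions Imposed by Twitter; 3.4.2 Efficient Tweet Collection with a Resource-Aware Access Token Pool; 3.4.3 Scheduling Collection Jobs; 3.5 Implementing the Service Group for Twitter Data Collection; 3.5.1 Doorman; 3.5.2 Butler; 3.5.3 HouseKeeper; 3.5.4 Guardian; Chapter 4 System Function Validation; 4.1 Case Design; 4.2 Case Analysis and Discussion; 4.3 Comparing Tweet Collection by This Platform and YourTwapperKeeper; 4.3.1 Case Design; 4.3.2 Comparison and Analysis; Chapter 5 Conclusions and Recommendations; 5.1 Conclusions; 5.2 Future Work and Recommendations; References | zh_TW |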
dc.format.extent | 2734160 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.language.iso | en_US | - |
dc.source.uri (資料來源) | http://thesis.lib.nccu.edu.tw/record/#G0100971001 | en_US |
dc.subject (關鍵詞) | 推特 | zh_TW |
dc.subject (關鍵詞) | 資源感知 | zh_TW |
dc.subject (關鍵詞) | 社群媒體 | zh_TW |
dc.subject (關鍵詞) | Twitter | en_US |
dc.subject (關鍵詞) | Resource-aware | en_US |
dc.subject (關鍵詞) | Social media | en_US |
dc.title (題名) | 資源感知之社群媒體資料搜集平台:以推特為例 | zh_TW |
dc.title (題名) | A resource-aware data collection platform for Twitter | en_US |
dc.type (資料類型) | thesis | en |
dc.relation.reference (參考文獻) | 【1】 Shamanth Kumar, Fred Morstatter, Huan Liu. August 19, 2013. Twitter Data Analytics. 【2】 周玉駿. 2013. 實作推特社群媒體的資料蒐集與管理服務 [Implementing a data collection and management service for the Twitter social media platform]. 【3】 Adam Marcus, Michael S. Bernstein, Osama Badar, David R. Karger, Samuel Madden, Robert C. Miller. 2012. Processing and Visualizing the Data in Tweets. 【4】 Lance Reagan Vick, Titus Soporan, Daniel Robert Lewis, Jane Brooks Zurn. 2012. Hybrid Browser/Server Collection of Streaming Social Media Data for Scalable Real-Time Analysis. 【5】 Matko Bosnjak, Eduardo Oliveira, Jose Martins, Eduarda Mendes Rodrigues, Luis Sarmento. 2012. TwitterEcho - A Distributed Focused Crawler to Support Open Research with Twitter Data. 【6】 Axel Bruns, Yuxian Eugene Liang. April 2012. Tools and methods for capturing Twitter data during natural disasters. 【7】 Twitter Application-only authentication: https://dev.twitter.com/docs/auth/application-only-auth 【8】 Twitter Search API: https://dev.twitter.com/docs/using-search 【9】 Aditi Das. January 17, 2008. Understanding JPA, Part 1: The object-oriented paradigm of data persistence. http://www.javaworld.com/article/2077817/java-se/understanding-jpa-part-1-the-object-oriented-paradigm-of-data-persistence.html 【10】 Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. August 1994. Design Patterns: Elements of Reusable Object-Oriented Software. 【11】 Adam Green. February 15, 2013. Twitter API Engagement Programming with PHP and MySQL. | zh_TW |
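
The abstract describes a pool of access tokens that collection jobs share, where each token can serve only a limited number of requests within a time window. The record does not say how the thesis implements this, so the following is only a minimal sketch of one way such a pool might work. The names `Token` and `TokenPool`, the "pick the token with the most remaining budget" policy, and the `limit` and `window` values are all illustrative assumptions, not the author's design or Twitter's actual limits.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """Hypothetical access token with a fixed request budget per window."""
    name: str
    limit: int                  # requests allowed per window (assumed value)
    window: float               # window length in seconds (assumed value)
    used: int = 0
    window_start: Optional[float] = None

    def remaining(self, now: float) -> int:
        # Start a fresh window on first use or once the current one has elapsed.
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start = now
            self.used = 0
        return self.limit - self.used


class TokenPool:
    """Shares tokens among jobs, handing out the one with the most budget left."""

    def __init__(self, tokens):
        self.tokens = list(tokens)

    def acquire(self, now: Optional[float] = None) -> Optional[Token]:
        """Charge one request to a token and return it, or None if all are exhausted."""
        now = time.monotonic() if now is None else now
        best = max(self.tokens, key=lambda t: t.remaining(now))
        if best.remaining(now) <= 0:
            return None  # every token is rate-limited; the caller should wait
        best.used += 1
        return best
```

A job would call `pool.acquire()` before each API request and back off when it returns `None`. Choosing the token with the most remaining budget spreads load evenly across tokens; a real platform might instead weigh job priority or the time left in each window.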