Title Using R and Hadoop/MapReduce for FOAF-based Social Network Analytics
(original title: 整合R與Hadoop/MapReduce來分析FOAF社群網路)
Author Sun, Jhao Siang (孫肇祥)
Contributors Hu, Yuh Jong (胡毓忠), advisor
Sun, Jhao Siang (孫肇祥), author
Keywords RDF(S)
R and Hadoop/MapReduce
FOAF
Hadoop
MapReduce
Social network analytics
Date 2013
Upload time 6 August 2014 11:47:06 (UTC+8)
Abstract Decentralized online social networks encode users' personal data and social relationships in FOAF, an RDF(S)-based format, and store them on a trusted third-party Hadoop cluster. At this data volume, traditional analysis techniques run into processing and storage problems. This study combines R with Hadoop/MapReduce and proposes three analysis approaches for large FOAF datasets: R + Hadoop Streaming (RHS), R + MySQL (RMS), and R + Hive (RH). We first ingest the FOAF datasets into the Hadoop cluster and pre-process them with distributed MapReduce jobs, which reduces the data enough to avoid the single-machine memory limit that prevents R from loading large files. R then performs the social network analysis that MapReduce alone cannot carry out in depth. Because the data are split and reduced in advance, the approach scales to larger FOAF datasets, and it applies to both structured and unstructured data. Future work includes tighter integration of R with Hadoop/MapReduce, and the use of HBase or existing parallel R packages, to cope with social network data that grows daily.
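The two-stage idea in the abstract (reduce the raw FOAF triples with MapReduce first, then analyse the much smaller result in memory) can be illustrated with a minimal sketch. This is not the thesis's code: the thesis uses R, while this sketch uses Python and simulates the map, sort, and reduce phases in a single process. It assumes the FOAF data is available as simple N-Triples lines; the function names `mapper`, `reducer`, `run_local`, and `top_k`, and the `example.org` IRIs, are hypothetical.

```python
import re
from itertools import groupby

# Predicate IRI for foaf:knows in the FOAF namespace.
FOAF_KNOWS = "<http://xmlns.com/foaf/0.1/knows>"

# Simplified N-Triples pattern (an assumption): subject and object are
# IRIs or blank nodes; statements with literal objects do not match.
TRIPLE = re.compile(r"^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(<[^>]*>|_:\S+)\s*\.$")


def mapper(lines):
    """Map step: emit (person, 1) for every foaf:knows statement."""
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match and match.group(2) == FOAF_KNOWS:
            yield match.group(1), 1


def reducer(pairs):
    """Reduce step: sum the counts per key; input must be sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process."""
    return dict(reducer(sorted(mapper(lines))))


def top_k(degrees, k):
    """In-memory analysis step: the k people with the most foaf:knows links."""
    return sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


SAMPLE = [
    "<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .",
    "<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/carol> .",
    "<http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice> .",
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .',
]

if __name__ == "__main__":
    degrees = run_local(SAMPLE)
    print(top_k(degrees, 2))
```

On a real cluster the map and reduce steps would run as separate distributed tasks and only the aggregated counts would reach the analysis step, which is what keeps the in-memory stage small.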
Description Master's thesis
National Chengchi University
Department of Computer Science
95971012
102
Source http://thesis.lib.nccu.edu.tw/record/#G0095971012
Type thesis
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/68266
dc.description.tableofcontents Abstract (Chinese)
Abstract (English)
Acknowledgements
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Chapter Overview
Chapter 2 Background
2.1 Hadoop
2.2 Hive
2.3 R
Chapter 3 Related Work
3.1 FOAF (Friend of a Friend)
3.2 Social Network Analysis (SNA)
3.3 Integrating R and Hadoop
3.3.1 RHadoop
3.3.2 RHIPE
3.3.3 Hadoop Streaming
Chapter 4 Method and Architecture Design
4.1 Research Framework
4.2 FOAF Analysis
4.2.1 R + Hadoop Streaming Analytics (RHS Analytics)
4.2.2 R + MySQL Analytics (RMS Analytics)
4.2.3 R + Hive Analytics (RH Analytics)
Chapter 5 System Implementation
5.1 System Architecture
5.2 Data Sources
5.3 FOAF Data Analysis
5.3.1 R + Hadoop Streaming Analytics (RHS Analytics)
5.3.2 R + MySQL Analytics (RMS Analytics)
5.3.3 R + Hive Analytics (RH Analytics)
5.3.4 Performance Comparison
Chapter 6 Conclusion and Future Work
References
dc.format.extent 9299565 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US