Designing of Virtual Matrix of Big Data Analysis

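Title	Designing of Virtual Matrix of Big Data Analysis (巨量資料分析之虛擬矩陣設計)
Author	黃日佳
Contributors	劉文卿, 張景堯 (advisors); 黃日佳
Keywords	Big data; insufficient memory; virtual matrix; matrix operations; R language
Date	2016
Uploaded	1-Sep-2016 23:45:58 (UTC+8)
Abstract	To solve the problem of insufficient main memory that arises in big data analysis, this study designs a virtual matrix architecture. The architecture provides fast, high-performance matrix manipulation and computation while reducing the amount of main memory that big data occupies during computation. It is also integrated with the R language, giving R the capability for big data analysis and high-speed matrix computation.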
Description	Master's thesis, Department of Management Information Systems, National Chengchi University, 102356042
Source	http://thesis.lib.nccu.edu.tw/record/#G0102356042
Type	thesis
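The abstract describes the virtual matrix only at the architectural level: matrix operations run while most of the data stays out of main memory. To make that general idea concrete, here is a minimal sketch that assumes the matrix is kept in a binary file on disk (the record does not say where the thesis stores its data) and computes a matrix-vector product one block of rows at a time. This is not the thesis's implementation, which is built for R and includes a `vmdf` component whose design is not described here; the class `DiskMatrix`, its methods, the row-major float64 file layout, and the `block_rows` parameter are all hypothetical.

```python
import os
import tempfile
from array import array

ITEM_BYTES = array("d").itemsize  # size of one C double, 8 bytes on common platforms


class DiskMatrix:
    """Hypothetical disk-backed matrix: float64 values are stored row-major in a
    binary file, and only one block of rows is loaded into RAM at a time."""

    def __init__(self, path, n_rows, n_cols):
        self.path = path
        self.n_rows = n_rows
        self.n_cols = n_cols
        # Pre-size the file; the extended area is zero-filled on most systems,
        # so unwritten entries read back as 0.0.
        with open(path, "wb") as f:
            f.truncate(n_rows * n_cols * ITEM_BYTES)

    def write_rows(self, start, rows):
        """Overwrite consecutive rows beginning at row index `start`."""
        if start < 0 or start + len(rows) > self.n_rows:
            raise IndexError("rows fall outside the matrix")
        if any(len(row) != self.n_cols for row in rows):
            raise ValueError("every row must have n_cols entries")
        buf = array("d", (float(v) for row in rows for v in row))
        with open(self.path, "r+b") as f:
            f.seek(start * self.n_cols * ITEM_BYTES)
            buf.tofile(f)

    def matvec(self, x, block_rows=1024):
        """Return A @ x as a list, streaming A from disk `block_rows` rows at a time.

        Only one block of matrix rows is held in memory; the input vector x
        and the result vector are kept in memory in full."""
        if len(x) != self.n_cols:
            raise ValueError("x must have n_cols entries")
        result = []
        with open(self.path, "rb") as f:
            for start in range(0, self.n_rows, block_rows):
                count = min(block_rows, self.n_rows - start)
                block = array("d")
                block.fromfile(f, count * self.n_cols)  # read only this block
                for r in range(count):
                    row = block[r * self.n_cols:(r + 1) * self.n_cols]
                    result.append(sum(a * b for a, b in zip(row, x)))
        return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        A = DiskMatrix(os.path.join(tmp, "A.bin"), n_rows=4, n_cols=2)
        A.write_rows(0, [[1, 2], [3, 4], [5, 6], [7, 8]])
        print(A.matvec([1.0, 10.0], block_rows=3))  # [21.0, 43.0, 65.0, 87.0]
```

A flat row-major file makes each row block a single contiguous read, which is why this sketch streams by rows. A column-oriented layout, or a key-value store such as Redis (which the thesis reviews in its literature chapter), would change how blocks are fetched, but the memory bound would still come from processing one block at a time.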

dc.contributor.advisor	劉文卿; 張景堯	zh_TW
dc.contributor.author (Authors)	黃日佳	zh_TW
dc.creator (Author)	黃日佳	zh_TW
dc.date (Date)	2016	en_US
dc.date.accessioned	1-Sep-2016 23:45:58 (UTC+8)	-
dc.date.available	1-Sep-2016 23:45:58 (UTC+8)	-
dc.date.issued (Upload time)	1-Sep-2016 23:45:58 (UTC+8)	-
dc.identifier (Other Identifiers)	G0102356042	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/101077	-
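dc.description (Description)	Master's	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Department of Management Information Systems	zh_TW
dc.description (Description)	102356042	zh_TW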
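dc.description.tableofcontents	Chapter 1 Introduction: 1.1 Research Background and Motivation; 1.2 Research Objectives | Chapter 2 Literature Review: 2.1 Big Data; 2.2 The R Language; 2.3 Apache Spark and Resilient Distributed Datasets (RDDs); 2.4 Redis | Chapter 3 System Design: 3.1 Matrix Initialization and Its Data Structure; 3.2 Virtual Matrix Initialization and Its Data Structure; 3.3 Design of the Virtual Matrix; 3.4 Design of vmdf | Chapter 4 Experimental Data Analysis: 4.1 Test and Analysis of the Amount of Data Allocatable in Memory; 4.2 Speed Test and Analysis of the Virtual Matrix; 4.3 Test and Analysis of Integrating R with the Virtual Matrix | Chapter 5 Conclusion: 5.1 Research Conclusions; 5.2 Future Work | References	zh_TW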
dc.source.uri (資料來源)	http://thesis.lib.nccu.edu.tw/record/#G0102356042	en_US
dc.subject (Keywords)	Big data	zh_TW
dc.subject (Keywords)	Insufficient memory	zh_TW
dc.subject (Keywords)	Virtual matrix	zh_TW
dc.subject (Keywords)	Matrix operations	zh_TW
dc.subject (Keywords)	R language	zh_TW
dc.title (Title)	巨量資料分析之虛擬矩陣設計	zh_TW
dc.title (Title)	Designing of Virtual Matrix of Big Data Analysis	en_US
dc.type (Type)	thesis	en_US