Title 巨量資料分析之虛擬矩陣設計
Designing of Virtual Matrix of Big Data Analysis
Author 黃日佳
Contributors 劉文卿; 張景堯
Keywords Big data
Date 2016
Upload time 1-Sep-2016 23:45:58 (UTC+8)
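Abstract To address the shortage of main memory that arises in big data analysis, this study designs a virtual matrix architecture. The architecture provides fast, high-performance matrix manipulation and computation while reducing the main memory that big data occupies during computation. It is also integrated with the R language, giving R the capability for big data analysis and high-speed matrix computation.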
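dc.description.tableofcontents Chapter 1 Introduction
     Section 1 Research Background and Motivation
     Section 2 Research Objectives
     Chapter 2 Literature Review
     Section 1 Big Data
     Section 2 The R Language
     Section 3 Apache Spark and Resilient Distributed Datasets (RDDs)
     Section 4 Redis
     Chapter 3 System Design
     Section 1 Matrix Initialization and Data Structure
     Section 2 Virtual Matrix Initialization and Data Structure
     Section 3 Virtual Matrix Design
     Section 4 vmdf Design
     Chapter 4 Experimental Data Analysis
     Section 1 Testing and Analysis of the Amount of Data Allocatable in Memory
     Section 2 Virtual Matrix Speed Testing and Analysis
     Section 3 Testing and Analysis of R Language and Virtual Matrix Integration
     Chapter 5 Conclusion
     Section 1 Research Conclusions
     Section 2 Future Work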
dc.subject (Keywords) Big data
dc.subject (Keywords) Insufficient memory
dc.subject (Keywords) Virtual matrix
dc.subject (Keywords) Matrix computation
dc.subject (Keywords) R language