Title: Designing of Virtual Matrix of Big Data Analysis (巨量資料分析之虛擬矩陣設計)
Author: 黃日佳
Advisors: 劉文卿, 張景堯
Keywords: big data; insufficient memory; virtual matrix; matrix operations; R language
Date: 2016
Uploaded: 1-Sep-2016 23:45:58 (UTC+8)
Abstract: To address the shortage of main memory that arises in big data analysis, this study designs a virtual matrix architecture. The architecture provides fast, high-performance matrix manipulation and computation while reducing the amount of main memory that big data occupies during computation. It is also integrated with the R language, giving R the ability to perform big data analysis and high-speed matrix computation.
Description: Master's
National Chengchi University
Department of Management Information Systems
102356042
Source: http://thesis.lib.nccu.edu.tw/record/#G0102356042
Type: thesis
Identifier: G0102356042
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/101077
Table of contents:
Chapter 1 Introduction
  Section 1 Research background and motivation
  Section 2 Research objectives
Chapter 2 Literature review
  Section 1 Big data
  Section 2 The R language
  Section 3 Apache Spark and Resilient Distributed Datasets (RDDs)
  Section 4 Redis
Chapter 3 System design
  Section 1 Matrix initialization and its data structure
  Section 2 Virtual matrix initialization and its data structure
  Section 3 Design of the virtual matrix
  Section 4 Design of vmdf
Chapter 4 Experimental data analysis
  Section 1 Testing and analysis of the amount of data that can be allocated in memory
  Section 2 Speed testing and analysis of the virtual matrix
  Section 3 Integration testing and analysis of R and the virtual matrix
Chapter 5 Conclusion
  Section 1 Conclusions
  Section 2 Future work
References
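This record does not describe how the virtual matrix is implemented; that design is the subject of Chapter 3 and is not reproduced here. As a rough illustration of the general idea stated in the abstract (keep matrix data out of main memory and load only part of it during computation), the following is a minimal hypothetical sketch. The class name VirtualMatrix, the file-backed row-major layout, and the block_rows parameter are assumptions made for illustration, not the thesis's actual architecture.

```python
import array
import os
import tempfile


class VirtualMatrix:
    """Hypothetical file-backed matrix (illustration only, not the thesis design).

    Values are stored on disk as native doubles in row-major order; at most
    `block_rows` rows are held in memory at any time during computation.
    """

    ITEM_SIZE = array.array("d").itemsize  # bytes per stored double

    def __init__(self, path, nrows, ncols, block_rows=2):
        self.path = path
        self.nrows = nrows
        self.ncols = ncols
        self.block_rows = block_rows

    @classmethod
    def from_rows(cls, path, rows, block_rows=2):
        """Write equal-length rows to disk and return a matrix backed by that file."""
        ncols = len(rows[0])
        with open(path, "wb") as f:
            for row in rows:
                if len(row) != ncols:
                    raise ValueError("all rows must have the same length")
                array.array("d", row).tofile(f)
        return cls(path, len(rows), ncols, block_rows)

    def _read_block(self, f, start, count):
        """Load rows [start, start + count) from disk into a flat array."""
        f.seek(start * self.ncols * self.ITEM_SIZE)
        block = array.array("d")
        block.fromfile(f, count * self.ncols)
        return block

    def matvec(self, x):
        """Compute A @ x block by block, never loading the whole matrix."""
        if len(x) != self.ncols:
            raise ValueError("dimension mismatch")
        result = []
        with open(self.path, "rb") as f:
            for start in range(0, self.nrows, self.block_rows):
                count = min(self.block_rows, self.nrows - start)
                block = self._read_block(f, start, count)
                for r in range(count):
                    offset = r * self.ncols
                    result.append(sum(block[offset + c] * x[c] for c in range(self.ncols)))
        return result

    def get(self, i, j):
        """Read a single element by seeking directly to its position on disk."""
        with open(self.path, "rb") as f:
            f.seek((i * self.ncols + j) * self.ITEM_SIZE)
            cell = array.array("d")
            cell.fromfile(f, 1)
            return cell[0]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "a.bin")
    A = VirtualMatrix.from_rows(path, [[1, 2], [3, 4], [5, 6]], block_rows=2)
    print(A.matvec([1, 1]))  # [3.0, 7.0, 11.0]
    print(A.get(2, 1))       # 6.0
```

The trade-off in this sketch is the usual out-of-core one: peak memory for the data is bounded by roughly block_rows × ncols values, at the cost of reading from disk on every pass over the matrix.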