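Title: Efficient Skyline Query Processing with MapReduce (Chinese title: 基於MapReduce框架進行有效的天際線查詢處理)
Author: Chan, Chih Yu (詹智渝)
Contributors: Chen, Arbee L.P. (陳良弼, advisor); Chan, Chih Yu (詹智渝)
Keywords: skyline query; big data; distributed computing
Date: 2013
Uploaded: 1 November 2013 11:44:16 (UTC+8)
Abstract (translated from the Chinese abstract): As demand for database use grows, users query data in increasingly diverse ways, which has made preference queries a popular research topic in recent years. Among these, the skyline query is an especially active research topic in today's databases and information retrieval. As technology advances, the amount of data that can be collected and used is growing rapidly, making big data processing a pressing problem. Since Google described the MapReduce programming framework in a publicly released paper in 2004, many obstacles that queries previously faced in big data environments have found effective solutions. Skyline queries have high time complexity and are even harder to process on big data, so research on skyline queries over big data has grown steadily in recent years. This study aims to design a more efficient MapReduce algorithm for skyline query processing. The thesis describes the algorithm in detail, implements it on the Hadoop platform, and verifies that it offers better effectiveness and usability.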
Abstract (English): With the growing variety of query methods, preference queries have become a popular research topic. Among them, the skyline query is important in today's databases and information retrieval. Moreover, technological development has made it possible to collect and use rapidly growing volumes of data. In 2004, Google published a paper describing MapReduce, a computing framework that makes big data processing efficient. Skyline queries are expensive to process, and they become even harder on huge amounts of data. In this study, we design an efficient MapReduce algorithm for skyline queries and implement it on the Hadoop platform to verify its efficiency and effectiveness.
References:
[1] J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” in Proceedings of the Symposium on Operating Systems Design and Implementation, 2004.
[2] S. Borzsonyi, D. Kossmann, and K. Stocker, “The Skyline Operator,” in Proceedings of the International Conference on Data Engineering, 2001.
[3] B. L. Zhang, S. G. Zhou, and J. H. Guan, “Adapting Skyline Computation to the MapReduce Framework: Algorithms and Experiments,” in Proceedings of the Database Systems for Advanced Applications Workshops, 2011.
[4] L. L. Ding, J. C. Xin, G. R. Wang, and S. Huang, “Efficient Skyline Query Processing of Massive Data Based on Map-Reduce,” Chinese Journal of Computers, 2012.
[5] J. Chomicki, P. Godfrey, J. Gryz, and D. Liang, “Skyline with Presorting,” in Proceedings of the International Conference on Data Engineering, 2003.
[6] J. Chomicki, P. Godfrey, J. Gryz, and D. Liang, “Skyline with Presorting: Theory and Optimizations,” Journal of Intelligent Information Systems, 2005.
[7] P. Godfrey, R. Shipley, and J. Gryz, “Maximal Vector Computation in Large Data Sets,” in Proceedings of the International Conference on Very Large Data Bases, 2005.
[8] I. Bartolini, P. Ciaccia, and M. Patella, “SaLSa: Computing the Skyline without Scanning the Whole Sky,” in Proceedings of the Conference on Information and Knowledge Management, 2006.
[9] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “An Optimal and Progressive Algorithm for Skyline Queries,” in Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2003.
[10] D. Kossmann, F. Ramsak, and S. Rost, “Shooting Stars in the Sky: An Online Algorithm for Skyline Queries,” in Proceedings of the International Conference on Very Large Data Bases, 2002.
[11] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “Progressive Skyline Computation in Database Systems,” ACM Transactions on Database Systems, 2005.
[12] S. M. Zhang, N. Mamoulis, and D. W. Cheung, “Scalable Skyline Computation Using Object-based Space Partitioning,” in Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2009.
[13] B. Cui, H. Lu, Q. Xu, L. Chen, Y. Dai, and Y. Zhou, “Parallel Distributed Processing of Constrained Skyline Queries by Filtering,” in Proceedings of the International Conference on Data Engineering, 2008.
[14] J. B. Rocha-Junior, A. Vlachou, C. Doulkeridis, and K. Nørvåg, “Efficient Execution Plans for Distributed Skyline Query Processing,” in Proceedings of the International Conference on Extending Database Technology, 2011.
[15] A. Vlachou, C. Doulkeridis, and Y. Kotidis, “Angle-based Space Partitioning for Efficient Parallel Skyline Computation,” in Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2008.
[16] H. Köhler, J. Yang, and X. Zhou, “Efficient Parallel Skyline Processing using Hyperplane Projections,” in Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2011.
Description: Master's thesis
National Chengchi University
Department of Computer Science
100753037
102
Source: http://thesis.lib.nccu.edu.tw/record/#G0100753037
Type: thesis
Identifier: G0100753037
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/61492
Table of contents:
Chapter 1 Introduction
Chapter 2 Related Work
  2.1 Related work on skyline algorithms
  2.2 Related work on skyline query processing in highly distributed environments
  2.3 The MapReduce framework
  2.4 Related work on skyline queries suited to the MapReduce framework
Chapter 3 Problem and Definitions
  3.1 Problem
  3.2 Naive algorithm
  3.3 Problem definition
Chapter 4 Data Partitioning and Algorithms
  4.1 Algorithm overview
  4.2 Grid partitioning and filtering
  4.3 Angle partitioning and filtering
  4.4 Performance analysis of the two partitioning methods
Chapter 5 Experiments and Results
  5.1 Response time
  5.2 Computation balance across partitions
  5.3 Comparison of each partition's contribution to the global skyline
Chapter 6 Conclusion
References
Format: application/pdf, 1369104 bytes
Language: en_US
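The abstract describes the approach only at a high level, and Chapter 4 names grid partitioning (4.2) and angle partitioning (4.3), each followed by filtering. As a hedged illustration of that general pattern, and not the thesis's own algorithm (whose details are not in this record), the Python sketch below simulates a two-stage MapReduce skyline in a single process: the map step assigns points to partitions, a first reduce computes a local skyline per partition, and a single final reducer merges the local skylines. All names (`dominates`, `bnl_skyline`, `grid_key`, `angle_key`, `prune_dominated_cells`, `mapreduce_skyline`) are hypothetical. It assumes smaller values are preferred, restricts angle partitioning to two non-negative dimensions, and uses a simple, assumed cell-pruning rule for grid filtering.

```python
# Illustrative sketch only: all names are hypothetical, not taken from the thesis.
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q in every dimension and strictly better in one.
    Assumption: smaller values are preferred."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_skyline(points: Iterable[Point]) -> List[Point]:
    """Block-nested-loops style skyline with an unbounded in-memory window."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a point already kept
        # p survives: evict window points it dominates, then keep it
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def grid_key(cell: float) -> Callable[[Point], Tuple[int, ...]]:
    """Hypothetical grid partitioner: map a point to its cell index per dimension."""
    return lambda p: tuple(int(x // cell) for x in p)


def angle_key(num_parts: int) -> Callable[[Point], int]:
    """Hypothetical 2-D angle partitioner for non-negative coordinates:
    split the angle range [0, pi/2] into num_parts equal sectors."""
    def key(p: Point) -> int:
        theta = math.atan2(p[1], p[0])
        return min(int(theta / (math.pi / 2) * num_parts), num_parts - 1)
    return key


def prune_dominated_cells(groups: Dict[Tuple[int, ...], List[Point]]):
    """Assumed grid filtering: drop cell c if a non-empty cell d has a strictly
    smaller index in every dimension, because every point of d then dominates
    every point of c."""
    return {c: pts for c, pts in groups.items()
            if not any(all(a < b for a, b in zip(d, c)) for d in groups)}


def mapreduce_skyline(points: Iterable[Point],
                      partition_key: Callable[[Point], Hashable],
                      prune_cells: bool = False) -> List[Point]:
    """Simulate a two-stage MapReduce skyline in a single process."""
    # Map + shuffle: group points by partition id.
    groups: Dict[Hashable, List[Point]] = defaultdict(list)
    for p in points:
        groups[partition_key(p)].append(p)
    if prune_cells:  # only meaningful with grid_key partitions
        groups = prune_dominated_cells(groups)
    # Stage 1 reduce: local skyline per partition (independent, parallelizable).
    candidates = [q for part in groups.values() for q in bnl_skyline(part)]
    # Stage 2 reduce: one reducer merges local skylines into the global skyline.
    return bnl_skyline(candidates)


if __name__ == "__main__":
    data = [(1, 9), (2, 4), (3, 3), (4, 2), (9, 1),
            (5, 5), (6, 3), (2, 8), (8, 8)]
    print(sorted(mapreduce_skyline(data, angle_key(4))))
    print(sorted(mapreduce_skyline(data, grid_key(5), prune_cells=True)))
```

On this sample data both calls print `[(1, 9), (2, 4), (3, 3), (4, 2), (9, 1)]`. The merge step is correct for any partitioning because dominance is transitive: every global skyline point survives its own partition's local skyline, and any candidate that is not in the global skyline is dominated by some global skyline point that also reaches the final reducer. The partitioning choice therefore affects only how evenly work is spread and how many candidates the final reducer must examine, which is the trade-off the grid and angle methods address.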