Title: 以MapReduce做有效率的天際線查詢 (Efficient Skyline Computation with MapReduce)
Author: 陳家慶 (Chen, Chia Ching)
Contributors: 陳良弼 (Chen, Arbee L.P.); 陳家慶 (Chen, Chia Ching)
Keywords: Big Data (巨量資料); Skyline (天際線); MapReduce
Date: 2013
Uploaded: 1 November 2013, 11:43:53 (UTC+8)
Abstract:
As big data attracts growing attention, more and more big data analysis is carried out with MapReduce. The skyline query is a common method for decision making: it helps users find the data whose value in each dimension is close to the user query. With previous methods, query processing becomes inefficient when the data set is large or the data space has many dimensions. In this study, we therefore present a new method for processing skyline queries over large data sets with MapReduce. Experimental results show that our method is more efficient than previous methods.
Description: Master's thesis
National Chengchi University
Department of Computer Science
100753002
102
Source: http://thesis.lib.nccu.edu.tw/record/#G0100753002
Type: thesis
Identifier: G0100753002
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/61490
Format: application/pdf, 4939065 bytes
Language: en_US
Table of contents:
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Skyline Queries
2.2 Parallel Skyline Queries
2.3 MapReduce
2.4 Skyline Algorithms on MapReduce
Chapter 3 Problem Definition
Chapter 4 The MR-Sketch Algorithm
4.1 Data Filtering Phase
4.2 External Partitioning Phase
4.21 All-Pair Partitioning
4.22 Middle Split Partitioning
4.23 Number of Keys and Data Transfer Coefficient
4.24 Choice of Partitioning Dimensions
4.3 Internal Partitioning Phase
Chapter 5 Experimental Results
5.1 Data Types
5.2 Experimental Procedure
5.3 Effect of Dimensionality and Data Size
5.4 Effect of the Number of Sample Points
5.5 Effect of the Maximum Number of Split Dimensions in External Partitioning
Chapter 6 Conclusion
References
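For readers unfamiliar with the skyline query named in the abstract, the following is a minimal sketch of the general idea, not the thesis's MR-Sketch algorithm. It assumes the standard dominance-based definition of the skyline (a point is kept if no other point is at least as good in every dimension and strictly better in one) and assumes smaller values are preferred. The function names, the round-robin partitioning, and the sample data are all hypothetical; the map and reduce steps are simulated in plain Python rather than run on a MapReduce framework.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b everywhere and strictly better
    in at least one dimension (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Simple nested-loop skyline: keep the points that no other point dominates."""
    window: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in window):
            continue  # p is dominated, so it cannot be in the skyline
        # p survives; drop any earlier candidates that p dominates
        window = [q for q in window if not dominates(p, q)]
        window.append(p)
    return window


def map_phase(points: Sequence[Point], num_partitions: int) -> List[List[Point]]:
    """Simulated map step: split the data round-robin (a hypothetical choice)
    and compute a local skyline for each partition."""
    parts = [list(points[i::num_partitions]) for i in range(num_partitions)]
    return [skyline(part) for part in parts]


def reduce_phase(local_skylines: Iterable[List[Point]]) -> List[Point]:
    """Simulated reduce step: merge the local skylines and filter once more."""
    merged = [p for local in local_skylines for p in local]
    return skyline(merged)


def mapreduce_skyline(points: Sequence[Point], num_partitions: int = 2) -> List[Point]:
    """Global skyline via local skylines followed by a merge."""
    return reduce_phase(map_phase(points, num_partitions))


if __name__ == "__main__":
    sample = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2, 6)]
    print(sorted(mapreduce_skyline(sample, num_partitions=2)))
```

The two-step structure works because any point in the global skyline must also be in the local skyline of its own partition, so merging the local results loses nothing. How the data is partitioned affects how much is filtered locally and how much is transferred to the merge step, which is the kind of cost the partitioning phases listed in the table of contents appear to address.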