Title 以MapReduce做有效率的天際線查詢
Efficient Skyline Computation with MapReduce
Author 陳家慶
Chen, Chia Ching
Contributors 陳良弼
Chen, Arbee L.P.
陳家慶
Chen, Chia Ching
Keywords Big Data
Skyline
MapReduce
Date 2013
Upload time 1-Nov-2013 11:43:53 (UTC+8)
Abstract As big data receives growing attention, more and more large-scale data analysis is carried out with MapReduce. In database querying, the skyline query is a common method for decision making. It helps users find the records whose values across all dimensions best match their preferences, namely the records that are not dominated by any other record: no other record is at least as good in every dimension and strictly better in at least one. Previous query methods, however, become inefficient when the dataset is large and the query also involves many dimensions. This study therefore proposes an efficient method for processing skyline queries over large datasets with MapReduce. Experimental results show that our method is more efficient than previous methods.
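The following sketch makes the skyline notion concrete. It assumes smaller values are preferred in every dimension. Under that assumption, a point p dominates q when p is no worse than q everywhere and strictly better somewhere, and the skyline is every point that no other point dominates. The scan is a simplified, fully in-memory version of the block-nested-loops idea from [2]; the original algorithm also manages a bounded window and overflow files, which are omitted here. The function names and the hotel data are illustrative assumptions, not taken from the thesis.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p dominates q, assuming smaller is better in every dimension."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """In-memory BNL-style skyline: keep a window of non-dominated candidates."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a current candidate, so it cannot be in the skyline
        # p survives: drop every candidate that p dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


if __name__ == "__main__":
    # Hypothetical hotels as (price, distance); lower is better in both dimensions.
    hotels = [(50, 8), (80, 2), (60, 5), (90, 6), (55, 9), (70, 4)]
    print(skyline(hotels))  # [(50, 8), (80, 2), (60, 5), (70, 4)]
```

Some hotels are discarded. (90, 6) loses to (60, 5) on both price and distance, and (55, 9) loses to (50, 8). The four remaining hotels are mutual trade-offs, so none can be removed without a user-specific weighting.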
References [1] J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” in Proceedings of the Symposium on Operating Systems Design and Implementation (OSDI), 2004.
[2] S. Börzsönyi, D. Kossmann, and K. Stocker, “The Skyline Operator,” in Proceedings of the International Conference on Data Engineering (ICDE), 2001.
[3] D. Kossmann, F. Ramsak, and S. Rost, “Shooting Stars in the Sky: An Online Algorithm for Skyline Queries,” in Proceedings of the International Conference on Very Large Data Bases (VLDB), 2002.
[4] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “An Optimal and Progressive Algorithm for Skyline Queries,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, 2003.
[5] J. Chomicki, P. Godfrey, J. Gryz, and D. Liang, “Skyline with Presorting: Theory and Optimizations,” in Intelligent Information Systems (IIS), 2005.
[6] J. Chomicki, P. Godfrey, J. Gryz, and D. Liang, “Skyline with Presorting,” in Proceedings of the International Conference on Data Engineering (ICDE), 2003.
[7] P. Godfrey, R. Shipley, and J. Gryz, “Maximal Vector Computation in Large Data Sets,” in Proceedings of the International Conference on Very Large Data Bases (VLDB), 2005.
[8] S. Zhang, N. Mamoulis, and D. W. Cheung, “Scalable Skyline Computation Using Object-Based Space Partitioning,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, 2009.
[9] J. Lee and S. Hwang, “BSkyTree: Scalable Skyline Computation Using a Balanced Pivot Selection,” in Proceedings of the International Conference on Extending Database Technology (EDBT), 2010.
[10] A. Cosgaya-Lozano, A. Rau-Chaplin, and N. Zeh, “Parallel Computation of Skyline Queries,” in Proceedings of the International Symposium on High Performance Computing Systems and Applications (HPCS), 2007.
[11] P. Wu, C. Zhang, Y. Feng, B. Y. Zhao, D. Agrawal, and A. El Abbadi, “Parallelizing Skyline Queries for Scalable Distribution,” in Proceedings of the International Conference on Extending Database Technology (EDBT), 2006.
[12] A. Vlachou, C. Doulkeridis, and Y. Kotidis, “Angle-Based Space Partitioning for Efficient Parallel Skyline Computation,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, 2008.
[13] H. Kohler, J. Yang, and X. Zhou, “Efficient Parallel Skyline Processing Using Hyperplane Projections,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, 2011.
[14] B. Zhang, S. Zhou, and J. Guan, “Adapting Skyline Computation to the MapReduce Framework: Algorithms and Experiments,” in Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA), 2011.
[15] L. Chen, K. Hwang, and J. Wu, “MapReduce Skyline Query Processing with a New Angular Partitioning Approach,” in Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012.
Description Master's thesis
National Chengchi University
Department of Computer Science
100753002
102
Source http://thesis.lib.nccu.edu.tw/record/#G0100753002
Type thesis
dc.contributor.advisor 陳良弼
dc.contributor.advisor Chen, Arbee L.P.
dc.identifier (other identifier) G0100753002
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/61490
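dc.description.tableofcontents Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 Skyline Queries 3
2.2 Parallel Skyline Queries 4
2.3 MapReduce 5
2.4 Skyline Algorithms on MapReduce 6
Chapter 3 Problem Definition 7
Chapter 4 The MR-Sketch Algorithm 8
4.1 Data Filtering Phase 9
4.2 External Partitioning Phase 11
4.2.1 All-Pair Partitioning 11
4.2.2 Middle Split Partition 14
4.2.3 Number of Keys and the Data Transmission Coefficient 16
4.2.4 Choosing the Dimensions to Split 19
4.3 Internal Partitioning Phase 20
Chapter 5 Experimental Results 23
5.1 Data Types 23
5.2 Experimental Procedure 25
5.3 Effect of Dimensionality and Number of Records 26
5.4 Effect of the Number of Sample Points 29
5.5 Effect of the Maximum Split Dimension in External Partitioning 30
Chapter 6 Conclusion 32
References 33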
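The outline names a data filtering phase, external and internal partitioning phases, sample points, and split dimensions. The record does not say how MR-Sketch implements any of them. As a point of reference only, the sketch below simulates a generic MapReduce skyline in plain Python, without any Hadoop API, following the local-skyline/global-merge pattern discussed in MapReduce skyline work such as [14]. It runs in four steps:

1. Mappers drop points dominated by a few filter points.
2. Mappers key each remaining point by a grid cell, formed by splitting selected dimensions at their medians.
3. Reducers compute a local skyline per cell.
4. One final reducer merges the local skylines.

The functions `mapreduce_skyline` and `median_split_key`, and the filter-point and median-split choices, are hypothetical. They are not the thesis's MR-Sketch, All-Pair Partitioning, or Middle Split Partition.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q when it is <= everywhere and < somewhere (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def local_skyline(points: List[Point]) -> List[Point]:
    """Simple in-memory BNL-style skyline of one group of points."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def median_split_key(p: Point, medians: Sequence[float]) -> Tuple[int, ...]:
    """Hypothetical partition key: one bit per split dimension (0 = at or below median)."""
    return tuple(int(p[d] > m) for d, m in enumerate(medians))


def mapreduce_skyline(points: List[Point], split_dims: int,
                      filter_points: List[Point]) -> List[Point]:
    """Simulated two-stage MapReduce skyline (assumption: not the thesis's algorithm).

    filter_points must be drawn from `points`; otherwise the filter could
    discard true skyline points.
    """
    # Medians of the first `split_dims` dimensions; computed from all points here,
    # though a real job might estimate them from a sample (assumption).
    medians = [statistics.median(p[d] for p in points) for d in range(split_dims)]

    # Map: drop points dominated by a filter point, emit (partition key, point).
    groups: Dict[Tuple[int, ...], List[Point]] = defaultdict(list)
    for p in points:
        if any(dominates(f, p) for f in filter_points):
            continue
        groups[median_split_key(p, medians)].append(p)

    # Reduce, stage 1: local skyline per partition (independent, so parallelizable).
    candidates: List[Point] = []
    for key in sorted(groups):
        candidates.extend(local_skyline(groups[key]))

    # Reduce, stage 2: a single reducer merges local skylines into the global skyline.
    return local_skyline(candidates)


if __name__ == "__main__":
    data = [(50, 8), (80, 2), (60, 5), (90, 6), (55, 9), (70, 4), (95, 9), (85, 7)]
    # The filter point is taken from the data itself (a hypothetical choice).
    print(mapreduce_skyline(data, split_dims=2, filter_points=[(60, 5)]))
    # [(60, 5), (70, 4), (50, 8), (80, 2)]
```

The merge step is correct because every global skyline point is also a skyline point of its own partition. The union of local skylines therefore always contains the global skyline. Filtering and partitioning only reduce how much data reaches the final reducer, which is the cost the outline's "data transmission coefficient" heading suggests the thesis studies.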
dc.format.extent 4939065 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US