在高度分散式環境下對高維度資料建立索引 | Publication

Publications-Theses

Article View/Open

html(622)

Publication Export

Google Scholar^TM

題名	在高度分散式環境下對高維度資料建立索引 Indexing high-dimensional data in highly distributed environments
作者	黃齡葦 Huang, Ling Wei
貢獻者	陳良弼 Chen , Arbee L.P. 黃齡葦 Huang, Ling Wei
關鍵詞	索引 KNN查詢高度分散式環境參考點 Index P2P Distributed Enviroments Reference Point
日期	2012
上傳時間	1-Feb-2013 16:53:38 (UTC+8)
摘要	目前，隨著資料急速地增加，大規模可擴充性的高度分散式資料庫服務已逐漸成為一種趨勢。在資料如此分散的環境下，如何讓資料的查詢更有效率，建立一個好的索引扮演著相當重要的角色，加上越來越多的資料庫程式應用像是生物、圖像、音樂和視訊等等，皆是處理高維度的資料，而在這些應用程式中，經常需要做相似資料的查詢，但是在高維度的資料且分散式的資料做相似資料的查詢，需耗費大量的時間與運算成本。基於在高度分散式的環境下，針對高維度的資料有效地做KNN的查詢。我們提出一個利用reference point[2,13]的作法RP-CAN( Reference Point-Content Addressable Network )來改善查詢的效率。RP-CAN 主要是結合CAN [14] 的路由協定和使用reference point建立索引的方式來幫助在高度分散式環境下有效率的對高維的資料做查詢處理。最後會實作出我們所提出的RP-CAN索引並與RT-CAN[1]做比較。我們發現我們所提出的RP-CAN索引在高維度資料作KNN的查詢時比RT-CAN索引來的有效率。 There has been an increasing interest in deploying a storage system in a highly distributed environment because of the rapid increasing data. And many database applications such as time series, biological and multimedia database, handle high-dimensional data. In these systems, k nearest-neighbors query is one of the most frequent queries but costly operation that is to find objects in the high-dimensional database that are similar to a given query object. As in conventional DBMS, indexes can indeed improve query performance but cannot deploy directly in highly distributed systems because the environment has become more complex. To efficiently support k nearest-neighbors query, a high-dimensional indexing strategy, is developed for the highly distributed environment. In this paper, we propose an efficient indexing strategy, RP-CAN( Reference Point-Content Addressable Network ), to improve the performance of the k nearest-neighbors query in a highly distributed environment. In the end of this paper, we designed an experiment to demonstrate that the performance of RP-CAN is better than RT-CAN in high dimensional space. Thus, our RP-CAN index could efficiently handle the high dimensional data.
參考文獻	[1] Jinbao Wang, Sai Wu, Hong Gao, Jianzhong Li, Beng Chin Ooi: Indexing multi-dimensional data in a cloud system. SIGMOD Conference 2010: 591-602 [2] H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang: iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. 30(2): 364-397 (2005) [3] Cui Yu, Beng Chin Ooi, Kian-Lee Tan, H. V. Jagadish: Indexing the Distance: An Efficient Method to KNN Processing. VLDB 2001: 421-430 [4] Nikolaos Kouiroukidis, Georgios Evangelidis: The Effects of Dimensionality Curse in High Dimensional kNN Search. Panhellenic Conference on Informatics 2011: 41-45 [5] Hoang Tam Vo, Chun Chen, Beng Chin Ooi: Towards Elastic Transactional Cloud Storage with Range Query Support. PVLDB 3(1): 506-517 (2010) [6] Sai Wu, Dawei Jiang, Beng Chin Ooi, Kun-Lung Wu: Efficient B-tree Based Indexing for Cloud Data Processing. PVLDB 3(1): 1207-1218 (2010) [7] H. V. Jagadish, B. C. Ooi, and Q. H. Vu. Baton: A balanced tree structure for peer-to-peer networks. In VLDB 2005. [8] A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In International Conference on Distributed Systems Platforms 2001. [9] I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM 2001. [10] Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, John Kubiatowicz: Tapestry: a resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications 2004: 41-53 [11] http://www.napster.com/. [12] http://www.darkridge.com/~jpr5/doc/gnutella.html. [13] https://freenetproject.org/. [14] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard M. Karp, Scott Shenker: A scalable content-addressable network. SIGCOMM 2001: 161-172 [15] Wenyuan Cai, Shuigeng Zhou, Linhao Xu, Weining Qian, Aoying Zhou: C2: A New Overlay Network Based on CAN and Chord. GCC (1) 2003: 42-50 [16] H. V. Jagadish, Beng Chin Ooi, Quang Hieu Vu, Rong Zhang, Aoying Zhou: VBI-Tree: A Peer-to-Peer Framework for Supporting Multi-Dimensional Indexing Schemes. ICDE 2006: 34 [17] Kaushik Chakrabarti, Sharad Mehrotra: Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces. VLDB 2000: 89-100 [18] Paolo Ciaccia, A. Nanni, Marco Patella: A Query-sensitive Cost Model for Similarity Queries with M-tree. Australasian Database Conference 1999: 65-76 [19] Beyer K, Goldstein J, Ramakrishnan R, Shaft U. When is nearest neighbors meaningful? In: Beeri C, Buneman P, eds. Proc. of the 7th Int`l Conf. on Database Theory. LNCS 1540, Jerusalem: Springer-Verlag, 1999. 217-235. [20] Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data VLDB 1996: 28-39 [21] Kaushik Chakrabarti, Sharad Mehrotra: The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. ICDE 1999: 440-447. [22] Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 [23] J. B. MacQueen: Some Methods for classification and Analysis of Multivariate Observations. Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability.1967: 281-297. [24] Li M, Lee W-C, Sivasubramaniam A. Semantic small world: An overlay network for peer-to-peer search. In: Proceedings of International Conference on Network Protocols (ICNP). Washington D.C.: IEEE Computer Society, 2004, 228–238 [25] Li M, Lee W-C, Sivasubramaniam A, et al. Ssw: a small world based overlay for peer-to-peer search. IEEE Transaction on Parallel and Distributed Systems, 2008, 19(2): 735–749 [26] David Novak , Pavel Zezula, M-Chord: a scalable distributed similarity search structure, Proceedings of the 1st international conference on Scalable information systems, 2006, p.19-es [27] Li M, Lee W-C, Sivasubramaniam A, J. Zhou. Supporting K nearest neighbors query on high-dimensional data in P2P systems. Frontiers of Computer Science in China, 2008, 2(3):p. 234-247 . [28] Tang, Y. Xu, J. Zhou, S. Lee, W. Deng, D. Wang, Y. A Lightweight Multi-Dimensional Index for Complex Queries Over DHTs. IEEE Transactions on Parallel and Distributed Systems,2011, 22: p. 2046-2054
描述	碩士國立政治大學資訊科學學系 99753036 101
資料來源	http://thesis.lib.nccu.edu.tw/record/#G0099753036
資料類型	thesis

dc.contributor.advisor	陳良弼	zh_TW
dc.contributor.advisor	Chen , Arbee L.P.	en_US
dc.contributor.author (Authors)	黃齡葦	zh_TW
dc.contributor.author (Authors)	Huang, Ling Wei	en_US
dc.creator (作者)	黃齡葦	zh_TW
dc.creator (作者)	Huang, Ling Wei	en_US
dc.date (日期)	2012	en_US
dc.date.accessioned	1-Feb-2013 16:53:38 (UTC+8)	-
dc.date.available	1-Feb-2013 16:53:38 (UTC+8)	-
dc.date.issued (上傳時間)	1-Feb-2013 16:53:38 (UTC+8)	-
dc.identifier (Other Identifiers)	G0099753036	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/56889	-
dc.description (描述)	碩士	zh_TW
dc.description (描述)	國立政治大學	zh_TW
dc.description (描述)	資訊科學學系	zh_TW
dc.description (描述)	99753036	zh_TW
dc.description (描述)	101	zh_TW
dc.description.abstract (摘要)	目前，隨著資料急速地增加，大規模可擴充性的高度分散式資料庫服務已逐漸成為一種趨勢。在資料如此分散的環境下，如何讓資料的查詢更有效率，建立一個好的索引扮演著相當重要的角色，加上越來越多的資料庫程式應用像是生物、圖像、音樂和視訊等等，皆是處理高維度的資料，而在這些應用程式中，經常需要做相似資料的查詢，但是在高維度的資料且分散式的資料做相似資料的查詢，需耗費大量的時間與運算成本。基於在高度分散式的環境下，針對高維度的資料有效地做KNN的查詢。我們提出一個利用reference point[2,13]的作法RP-CAN( Reference Point-Content Addressable Network )來改善查詢的效率。RP-CAN 主要是結合CAN [14] 的路由協定和使用reference point建立索引的方式來幫助在高度分散式環境下有效率的對高維的資料做查詢處理。最後會實作出我們所提出的RP-CAN索引並與RT-CAN[1]做比較。我們發現我們所提出的RP-CAN索引在高維度資料作KNN的查詢時比RT-CAN索引來的有效率。	zh_TW
dc.description.abstract (摘要)	There has been an increasing interest in deploying a storage system in a highly distributed environment because of the rapid increasing data. And many database applications such as time series, biological and multimedia database, handle high-dimensional data. In these systems, k nearest-neighbors query is one of the most frequent queries but costly operation that is to find objects in the high-dimensional database that are similar to a given query object. As in conventional DBMS, indexes can indeed improve query performance but cannot deploy directly in highly distributed systems because the environment has become more complex. To efficiently support k nearest-neighbors query, a high-dimensional indexing strategy, is developed for the highly distributed environment. In this paper, we propose an efficient indexing strategy, RP-CAN( Reference Point-Content Addressable Network ), to improve the performance of the k nearest-neighbors query in a highly distributed environment. In the end of this paper, we designed an experiment to demonstrate that the performance of RP-CAN is better than RT-CAN in high dimensional space. Thus, our RP-CAN index could efficiently handle the high dimensional data.	en_US
dc.description.tableofcontents	第一章導論及研究動機 1 第2章相關研究 4 2.1 KNN查詢(k-Nearest Neighbor Query)和索引(Index) 4 2.2 覆蓋網路(Overlay Networks) 與分散式雜湊表(Distributed Hash Table, DHT) 5 2.3 RT-CAN 7 2.3.1 內容可定址網路(Content Addressable Network, CAN) 7 2.3.2 Chord 9 2.3.3 C2上的路由 10 第3章研究方法 13 3.1 RP-CAN索引系統架構流程 13 3.2 建立CAN和路由表 14 3.3 對每台電腦建立參考點且把他們和他們的半徑對應到CAN上來建立索引表 17 3.4 KNN查詢演算法 20 3.4.1 參考點(Reference Point) 21 3.4.2 B+-tree 22 第4章實驗 28 4.1. 實驗設計 28 4.2. 實驗結果 30 4.2.1. 單機 30 4.2.2. 多機 34 第5章結論 39 參考文獻 40	zh_TW
dc.language.iso	en_US	-
dc.source.uri (資料來源)	http://thesis.lib.nccu.edu.tw/record/#G0099753036	en_US
dc.subject (關鍵詞)	索引	zh_TW
dc.subject (關鍵詞)	KNN查詢	zh_TW
dc.subject (關鍵詞)	高度分散式環境	zh_TW
dc.subject (關鍵詞)	參考點	zh_TW
dc.subject (關鍵詞)	Index	en_US
dc.subject (關鍵詞)	P2P	en_US
dc.subject (關鍵詞)	Distributed Enviroments	en_US
dc.subject (關鍵詞)	Reference Point	en_US
dc.title (題名)	在高度分散式環境下對高維度資料建立索引	zh_TW
dc.title (題名)	Indexing high-dimensional data in highly distributed environments	en_US
dc.type (資料類型)	thesis	en
dc.relation.reference (參考文獻)	[1] Jinbao Wang, Sai Wu, Hong Gao, Jianzhong Li, Beng Chin Ooi: Indexing multi-dimensional data in a cloud system. SIGMOD Conference 2010: 591-602 [2] H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang: iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. 30(2): 364-397 (2005) [3] Cui Yu, Beng Chin Ooi, Kian-Lee Tan, H. V. Jagadish: Indexing the Distance: An Efficient Method to KNN Processing. VLDB 2001: 421-430 [4] Nikolaos Kouiroukidis, Georgios Evangelidis: The Effects of Dimensionality Curse in High Dimensional kNN Search. Panhellenic Conference on Informatics 2011: 41-45 [5] Hoang Tam Vo, Chun Chen, Beng Chin Ooi: Towards Elastic Transactional Cloud Storage with Range Query Support. PVLDB 3(1): 506-517 (2010) [6] Sai Wu, Dawei Jiang, Beng Chin Ooi, Kun-Lung Wu: Efficient B-tree Based Indexing for Cloud Data Processing. PVLDB 3(1): 1207-1218 (2010) [7] H. V. Jagadish, B. C. Ooi, and Q. H. Vu. Baton: A balanced tree structure for peer-to-peer networks. In VLDB 2005. [8] A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In International Conference on Distributed Systems Platforms 2001. [9] I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM 2001. [10] Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, John Kubiatowicz: Tapestry: a resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications 2004: 41-53 [11] http://www.napster.com/. [12] http://www.darkridge.com/~jpr5/doc/gnutella.html. [13] https://freenetproject.org/. [14] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard M. Karp, Scott Shenker: A scalable content-addressable network. SIGCOMM 2001: 161-172 [15] Wenyuan Cai, Shuigeng Zhou, Linhao Xu, Weining Qian, Aoying Zhou: C2: A New Overlay Network Based on CAN and Chord. GCC (1) 2003: 42-50 [16] H. V. Jagadish, Beng Chin Ooi, Quang Hieu Vu, Rong Zhang, Aoying Zhou: VBI-Tree: A Peer-to-Peer Framework for Supporting Multi-Dimensional Indexing Schemes. ICDE 2006: 34 [17] Kaushik Chakrabarti, Sharad Mehrotra: Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces. VLDB 2000: 89-100 [18] Paolo Ciaccia, A. Nanni, Marco Patella: A Query-sensitive Cost Model for Similarity Queries with M-tree. Australasian Database Conference 1999: 65-76 [19] Beyer K, Goldstein J, Ramakrishnan R, Shaft U. When is nearest neighbors meaningful? In: Beeri C, Buneman P, eds. Proc. of the 7th Int`l Conf. on Database Theory. LNCS 1540, Jerusalem: Springer-Verlag, 1999. 217-235. [20] Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data VLDB 1996: 28-39 [21] Kaushik Chakrabarti, Sharad Mehrotra: The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. ICDE 1999: 440-447. [22] Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 [23] J. B. MacQueen: Some Methods for classification and Analysis of Multivariate Observations. Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability.1967: 281-297. [24] Li M, Lee W-C, Sivasubramaniam A. Semantic small world: An overlay network for peer-to-peer search. In: Proceedings of International Conference on Network Protocols (ICNP). Washington D.C.: IEEE Computer Society, 2004, 228–238 [25] Li M, Lee W-C, Sivasubramaniam A, et al. Ssw: a small world based overlay for peer-to-peer search. IEEE Transaction on Parallel and Distributed Systems, 2008, 19(2): 735–749 [26] David Novak , Pavel Zezula, M-Chord: a scalable distributed similarity search structure, Proceedings of the 1st international conference on Scalable information systems, 2006, p.19-es [27] Li M, Lee W-C, Sivasubramaniam A, J. Zhou. Supporting K nearest neighbors query on high-dimensional data in P2P systems. Frontiers of Computer Science in China, 2008, 2(3):p. 234-247 . [28] Tang, Y. Xu, J. Zhou, S. Lee, W. Deng, D. Wang, Y. A Lightweight Multi-Dimensional Index for Complex Queries Over DHTs. IEEE Transactions on Parallel and Distributed Systems,2011, 22: p. 2046-2054	zh_TW

Publications-Theses

Article View/Open

Publication Export

Google ScholarTM

NCCU Library

Citation Infomation

Related Publications in TAIR

Google Scholar^TM