Publications-Theses
Article View/Open
Publication Export
-
Google ScholarTM
NCCU Library
Citation Infomation
Related Publications in TAIR
題名 在高度分散式環境下對高維度資料建立索引
Indexing high-dimensional data in highly distributed environments作者 黃齡葦
Huang, Ling Wei貢獻者 陳良弼
Chen , Arbee L.P.
黃齡葦
Huang, Ling Wei關鍵詞 索引
KNN查詢
高度分散式環境
參考點
Index
P2P
Distributed Enviroments
Reference Point日期 2012 上傳時間 1-Feb-2013 16:53:38 (UTC+8) 摘要 目前,隨著資料急速地增加,大規模可擴充性的高度分散式資料庫服務已逐漸成為一種趨勢。在資料如此分散的環境下,如何讓資料的查詢更有效率,建立一個好的索引扮演著相當重要的角色,加上越來越多的資料庫程式應用像是生物、圖像、音樂和視訊等等,皆是處理高維度的資料,而在這些應用程式中,經常需要做相似資料的查詢,但是在高維度的資料且分散式的資料做相似資料的查詢,需耗費大量的時間與運算成本。 基於在高度分散式的環境下,針對高維度的資料有效地做KNN的查詢。我們提出一個利用reference point[2,13]的作法RP-CAN( Reference Point-Content Addressable Network )來改善查詢的效率。RP-CAN 主要是結合CAN [14] 的路由協定和使用reference point建立索引的方式來幫助在高度分散式環境下有效率的對高維的資料做查詢處理。 最後會實作出我們所提出的RP-CAN索引並與RT-CAN[1]做比較。我們發現我們所提出的RP-CAN索引在高維度資料作KNN的查詢時比RT-CAN索引來的有效率。
There has been an increasing interest in deploying a storage system in a highly distributed environment because of the rapid increasing data. And many database applications such as time series, biological and multimedia database, handle high-dimensional data. In these systems, k nearest-neighbors query is one of the most frequent queries but costly operation that is to find objects in the high-dimensional database that are similar to a given query object. As in conventional DBMS, indexes can indeed improve query performance but cannot deploy directly in highly distributed systems because the environment has become more complex. To efficiently support k nearest-neighbors query, a high-dimensional indexing strategy, is developed for the highly distributed environment. In this paper, we propose an efficient indexing strategy, RP-CAN( Reference Point-Content Addressable Network ), to improve the performance of the k nearest-neighbors query in a highly distributed environment. In the end of this paper, we designed an experiment to demonstrate that the performance of RP-CAN is better than RT-CAN in high dimensional space. Thus, our RP-CAN index could efficiently handle the high dimensional data.參考文獻 [1] Jinbao Wang, Sai Wu, Hong Gao, Jianzhong Li, Beng Chin Ooi: Indexing multi-dimensional data in a cloud system. SIGMOD Conference 2010: 591-602 [2] H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang: iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. 30(2): 364-397 (2005) [3] Cui Yu, Beng Chin Ooi, Kian-Lee Tan, H. V. Jagadish: Indexing the Distance: An Efficient Method to KNN Processing. VLDB 2001: 421-430 [4] Nikolaos Kouiroukidis, Georgios Evangelidis: The Effects of Dimensionality Curse in High Dimensional kNN Search. Panhellenic Conference on Informatics 2011: 41-45 [5] Hoang Tam Vo, Chun Chen, Beng Chin Ooi: Towards Elastic Transactional Cloud Storage with Range Query Support. PVLDB 3(1): 506-517 (2010) [6] Sai Wu, Dawei Jiang, Beng Chin Ooi, Kun-Lung Wu: Efficient B-tree Based Indexing for Cloud Data Processing. PVLDB 3(1): 1207-1218 (2010) [7] H. V. Jagadish, B. C. Ooi, and Q. H. Vu. Baton: A balanced tree structure for peer-to-peer networks. In VLDB 2005. [8] A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In International Conference on Distributed Systems Platforms 2001. [9] I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM 2001. [10] Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, John Kubiatowicz: Tapestry: a resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications 2004: 41-53 [11] http://www.napster.com/. [12] http://www.darkridge.com/~jpr5/doc/gnutella.html. [13] https://freenetproject.org/. [14] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard M. Karp, Scott Shenker: A scalable content-addressable network. SIGCOMM 2001: 161-172 [15] Wenyuan Cai, Shuigeng Zhou, Linhao Xu, Weining Qian, Aoying Zhou: C2: A New Overlay Network Based on CAN and Chord. GCC (1) 2003: 42-50 [16] H. V. Jagadish, Beng Chin Ooi, Quang Hieu Vu, Rong Zhang, Aoying Zhou: VBI-Tree: A Peer-to-Peer Framework for Supporting Multi-Dimensional Indexing Schemes. ICDE 2006: 34 [17] Kaushik Chakrabarti, Sharad Mehrotra: Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces. VLDB 2000: 89-100 [18] Paolo Ciaccia, A. Nanni, Marco Patella: A Query-sensitive Cost Model for Similarity Queries with M-tree. Australasian Database Conference 1999: 65-76 [19] Beyer K, Goldstein J, Ramakrishnan R, Shaft U. When is nearest neighbors meaningful? In: Beeri C, Buneman P, eds. Proc. of the 7th Int`l Conf. on Database Theory. LNCS 1540, Jerusalem: Springer-Verlag, 1999. 217-235. [20] Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data VLDB 1996: 28-39 [21] Kaushik Chakrabarti, Sharad Mehrotra: The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. ICDE 1999: 440-447. [22] Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 [23] J. B. MacQueen: Some Methods for classification and Analysis of Multivariate Observations. Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability.1967: 281-297. [24] Li M, Lee W-C, Sivasubramaniam A. Semantic small world: An overlay network for peer-to-peer search. In: Proceedings of International Conference on Network Protocols (ICNP). Washington D.C.: IEEE Computer Society, 2004, 228–238 [25] Li M, Lee W-C, Sivasubramaniam A, et al. Ssw: a small world based overlay for peer-to-peer search. IEEE Transaction on Parallel and Distributed Systems, 2008, 19(2): 735–749 [26] David Novak , Pavel Zezula, M-Chord: a scalable distributed similarity search structure, Proceedings of the 1st international conference on Scalable information systems, 2006, p.19-es [27] Li M, Lee W-C, Sivasubramaniam A, J. Zhou. Supporting K nearest neighbors query on high-dimensional data in P2P systems. Frontiers of Computer Science in China, 2008, 2(3):p. 234-247 . [28] Tang, Y. Xu, J. Zhou, S. Lee, W. Deng, D. Wang, Y. A Lightweight Multi-Dimensional Index for Complex Queries Over DHTs. IEEE Transactions on Parallel and Distributed Systems,2011, 22: p. 2046-2054 描述 碩士
國立政治大學
資訊科學學系
99753036
101資料來源 http://thesis.lib.nccu.edu.tw/record/#G0099753036 資料類型 thesis dc.contributor.advisor 陳良弼 zh_TW dc.contributor.advisor Chen , Arbee L.P. en_US dc.contributor.author (Authors) 黃齡葦 zh_TW dc.contributor.author (Authors) Huang, Ling Wei en_US dc.creator (作者) 黃齡葦 zh_TW dc.creator (作者) Huang, Ling Wei en_US dc.date (日期) 2012 en_US dc.date.accessioned 1-Feb-2013 16:53:38 (UTC+8) - dc.date.available 1-Feb-2013 16:53:38 (UTC+8) - dc.date.issued (上傳時間) 1-Feb-2013 16:53:38 (UTC+8) - dc.identifier (Other Identifiers) G0099753036 en_US dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/56889 - dc.description (描述) 碩士 zh_TW dc.description (描述) 國立政治大學 zh_TW dc.description (描述) 資訊科學學系 zh_TW dc.description (描述) 99753036 zh_TW dc.description (描述) 101 zh_TW dc.description.abstract (摘要) 目前,隨著資料急速地增加,大規模可擴充性的高度分散式資料庫服務已逐漸成為一種趨勢。在資料如此分散的環境下,如何讓資料的查詢更有效率,建立一個好的索引扮演著相當重要的角色,加上越來越多的資料庫程式應用像是生物、圖像、音樂和視訊等等,皆是處理高維度的資料,而在這些應用程式中,經常需要做相似資料的查詢,但是在高維度的資料且分散式的資料做相似資料的查詢,需耗費大量的時間與運算成本。 基於在高度分散式的環境下,針對高維度的資料有效地做KNN的查詢。我們提出一個利用reference point[2,13]的作法RP-CAN( Reference Point-Content Addressable Network )來改善查詢的效率。RP-CAN 主要是結合CAN [14] 的路由協定和使用reference point建立索引的方式來幫助在高度分散式環境下有效率的對高維的資料做查詢處理。 最後會實作出我們所提出的RP-CAN索引並與RT-CAN[1]做比較。我們發現我們所提出的RP-CAN索引在高維度資料作KNN的查詢時比RT-CAN索引來的有效率。 zh_TW dc.description.abstract (摘要) There has been an increasing interest in deploying a storage system in a highly distributed environment because of the rapid increasing data. And many database applications such as time series, biological and multimedia database, handle high-dimensional data. In these systems, k nearest-neighbors query is one of the most frequent queries but costly operation that is to find objects in the high-dimensional database that are similar to a given query object. As in conventional DBMS, indexes can indeed improve query performance but cannot deploy directly in highly distributed systems because the environment has become more complex. To efficiently support k nearest-neighbors query, a high-dimensional indexing strategy, is developed for the highly distributed environment. In this paper, we propose an efficient indexing strategy, RP-CAN( Reference Point-Content Addressable Network ), to improve the performance of the k nearest-neighbors query in a highly distributed environment. In the end of this paper, we designed an experiment to demonstrate that the performance of RP-CAN is better than RT-CAN in high dimensional space. Thus, our RP-CAN index could efficiently handle the high dimensional data. en_US dc.description.tableofcontents 第一章 導論及研究動機 1 第2章 相關研究 4 2.1 KNN查詢(k-Nearest Neighbor Query)和索引(Index) 4 2.2 覆蓋網路(Overlay Networks) 與 分散式雜湊表(Distributed Hash Table, DHT) 5 2.3 RT-CAN 7 2.3.1 內容可定址網路(Content Addressable Network, CAN) 7 2.3.2 Chord 9 2.3.3 C2上的路由 10 第3章 研究方法 13 3.1 RP-CAN索引系統架構流程 13 3.2 建立CAN和路由表 14 3.3 對每台電腦建立參考點且把他們和他們的半徑對應到CAN上來建立索引表 17 3.4 KNN查詢演算法 20 3.4.1 參考點(Reference Point) 21 3.4.2 B+-tree 22 第4章 實驗 28 4.1. 實驗設計 28 4.2. 實驗結果 30 4.2.1. 單機 30 4.2.2. 多機 34 第5章 結論 39 參考文獻 40 zh_TW dc.language.iso en_US - dc.source.uri (資料來源) http://thesis.lib.nccu.edu.tw/record/#G0099753036 en_US dc.subject (關鍵詞) 索引 zh_TW dc.subject (關鍵詞) KNN查詢 zh_TW dc.subject (關鍵詞) 高度分散式環境 zh_TW dc.subject (關鍵詞) 參考點 zh_TW dc.subject (關鍵詞) Index en_US dc.subject (關鍵詞) P2P en_US dc.subject (關鍵詞) Distributed Enviroments en_US dc.subject (關鍵詞) Reference Point en_US dc.title (題名) 在高度分散式環境下對高維度資料建立索引 zh_TW dc.title (題名) Indexing high-dimensional data in highly distributed environments en_US dc.type (資料類型) thesis en dc.relation.reference (參考文獻) [1] Jinbao Wang, Sai Wu, Hong Gao, Jianzhong Li, Beng Chin Ooi: Indexing multi-dimensional data in a cloud system. SIGMOD Conference 2010: 591-602 [2] H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang: iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. 30(2): 364-397 (2005) [3] Cui Yu, Beng Chin Ooi, Kian-Lee Tan, H. V. Jagadish: Indexing the Distance: An Efficient Method to KNN Processing. VLDB 2001: 421-430 [4] Nikolaos Kouiroukidis, Georgios Evangelidis: The Effects of Dimensionality Curse in High Dimensional kNN Search. Panhellenic Conference on Informatics 2011: 41-45 [5] Hoang Tam Vo, Chun Chen, Beng Chin Ooi: Towards Elastic Transactional Cloud Storage with Range Query Support. PVLDB 3(1): 506-517 (2010) [6] Sai Wu, Dawei Jiang, Beng Chin Ooi, Kun-Lung Wu: Efficient B-tree Based Indexing for Cloud Data Processing. PVLDB 3(1): 1207-1218 (2010) [7] H. V. Jagadish, B. C. Ooi, and Q. H. Vu. Baton: A balanced tree structure for peer-to-peer networks. In VLDB 2005. [8] A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In International Conference on Distributed Systems Platforms 2001. [9] I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM 2001. [10] Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, John Kubiatowicz: Tapestry: a resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications 2004: 41-53 [11] http://www.napster.com/. [12] http://www.darkridge.com/~jpr5/doc/gnutella.html. [13] https://freenetproject.org/. [14] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard M. Karp, Scott Shenker: A scalable content-addressable network. SIGCOMM 2001: 161-172 [15] Wenyuan Cai, Shuigeng Zhou, Linhao Xu, Weining Qian, Aoying Zhou: C2: A New Overlay Network Based on CAN and Chord. GCC (1) 2003: 42-50 [16] H. V. Jagadish, Beng Chin Ooi, Quang Hieu Vu, Rong Zhang, Aoying Zhou: VBI-Tree: A Peer-to-Peer Framework for Supporting Multi-Dimensional Indexing Schemes. ICDE 2006: 34 [17] Kaushik Chakrabarti, Sharad Mehrotra: Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces. VLDB 2000: 89-100 [18] Paolo Ciaccia, A. Nanni, Marco Patella: A Query-sensitive Cost Model for Similarity Queries with M-tree. Australasian Database Conference 1999: 65-76 [19] Beyer K, Goldstein J, Ramakrishnan R, Shaft U. When is nearest neighbors meaningful? In: Beeri C, Buneman P, eds. Proc. of the 7th Int`l Conf. on Database Theory. LNCS 1540, Jerusalem: Springer-Verlag, 1999. 217-235. [20] Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data VLDB 1996: 28-39 [21] Kaushik Chakrabarti, Sharad Mehrotra: The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. ICDE 1999: 440-447. [22] Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 [23] J. B. MacQueen: Some Methods for classification and Analysis of Multivariate Observations. Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability.1967: 281-297. [24] Li M, Lee W-C, Sivasubramaniam A. Semantic small world: An overlay network for peer-to-peer search. In: Proceedings of International Conference on Network Protocols (ICNP). Washington D.C.: IEEE Computer Society, 2004, 228–238 [25] Li M, Lee W-C, Sivasubramaniam A, et al. Ssw: a small world based overlay for peer-to-peer search. IEEE Transaction on Parallel and Distributed Systems, 2008, 19(2): 735–749 [26] David Novak , Pavel Zezula, M-Chord: a scalable distributed similarity search structure, Proceedings of the 1st international conference on Scalable information systems, 2006, p.19-es [27] Li M, Lee W-C, Sivasubramaniam A, J. Zhou. Supporting K nearest neighbors query on high-dimensional data in P2P systems. Frontiers of Computer Science in China, 2008, 2(3):p. 234-247 . [28] Tang, Y. Xu, J. Zhou, S. Lee, W. Deng, D. Wang, Y. A Lightweight Multi-Dimensional Index for Complex Queries Over DHTs. IEEE Transactions on Parallel and Distributed Systems,2011, 22: p. 2046-2054 zh_TW
