Learned Index for Subset Query of NoSQL Databases (NoSQL 資料庫子集查詢的學習索引) | Academic Output

Title	Learned Index for Subset Query of NoSQL Databases (NoSQL 資料庫子集查詢的學習索引)
Author	許軒祥 Hsu, Hsuan-Hsiang
Contributors	沈錳坤 Shan, Man-Kwan; 許軒祥 Hsu, Hsuan-Hsiang
Keywords	Learned Index; NoSQL Database; Subset Query
Date	2024
Upload time	4-Sep-2024 14:59:08 (UTC+8)
Abstract	NoSQL databases handle semi-structured or unstructured data, and subset queries are common in them. In recent years, learned index techniques based on machine learning have opened new avenues for database indexing. Compared to traditional B-Trees, learned indexes offer significant advantages in query time. The query time of a traditional index is dominated by memory access, whereas the query time of a learned index is dominated by CPU computation. Existing research on learned indexes mainly focuses on queries over traditional relational databases. For subset queries, the only existing approach is the recent DGM, which is based on Deep Sets. DGM is designed for space efficiency but leaves room for improvement in query speed. This thesis proposes two novel learned index techniques, LI4Subset-D and LI4Subset-P, to improve the performance of subset queries in NoSQL databases. LI4Subset-D builds on Deep Sets, and LI4Subset-P builds on the PGM-index, a learned index. Experimental results show that LI4Subset-D answers queries nearly 149 times faster than DGM, at the cost of an approximately 7-fold increase in memory space. LI4Subset-P is approximately 3235 times faster than DGM, at the cost of an approximately 4-fold increase in memory space.
References	[1] T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis, The Case for Learned Index Structures, in Proceedings of the ACM 2018 International Conference on Management of Data (SIGMOD), pp. 489-504, 2018.
	[2] A. Davitkova, D. Gjurovski, and S. Michel, Learning over Sets for Databases, in Proceedings of the 27th International Conference on Extending Database Technology (EDBT), pp. 68-80, 2024.
	[3] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, Deep Sets, in Proceedings of Advances in Neural Information Processing Systems (NIPS), vol. 30, 2017.
	[4] P. Ferragina and G. Vinciguerra, The PGM-index: A Fully-Dynamic Compressed Learned Index with Provable Worst-Case Bounds, in Proceedings of the VLDB Endowment, vol. 13, no. 8, pp. 1162-1175, 2020.
	[5] U. Deppisch, S-tree: A Dynamic Balanced Signature Index for Office Retrieval, in Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 77-87, 1986.
	[6] M. Morzy, T. Morzy, A. Nanopoulos, and Y. Manolopoulos, Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attributes, in Proceedings of 7th East European Conference on Advances in Databases and Information Systems, Springer, pp. 236-252, 2003.
	[7] S. Helmer, R. Aly, T. Neumann, and G. Moerkotte, Indexing Set-Valued Attributes with a Multi-Level Extendible Hashing Scheme, in Proceedings of 18th International Conference on Database and Expert Systems Applications, Springer, pp. 98-108, 2007.
	[8] S. Bevc and I. Savnik, Using Tries for Subset and Superset Queries, in Proceedings of the ITI 2009 31st International Conference on Information Technology Interfaces, IEEE, pp. 147-152, 2009.
	[9] I. Savnik, Efficient Subset and Superset Queries, in DB&Local Proceedings, Citeseer, pp. 45-57, 2012.
	[10] I. Savnik, Index Data Structure for Fast Subset and Superset Queries, in Proceedings of International Conference on Availability, Reliability, and Security, Springer, pp. 134-148, 2013.
	[11] A. Galakatos, M. Markovitch, C. Binnig, R. Fonseca, and T. Kraska, FITing-Tree: A Data-Aware Index Structure, in Proceedings of the 2019 ACM International Conference on Management of Data (SIGMOD), pp. 1189-1206, 2019.
	[12] J. Rao and K. A. Ross, Cache Conscious Indexing for Decision-Support in Main Memory, in Proceedings of the 25th VLDB Conference, 1999.
	[13] A. Kipf et al., RadixSpline: A Single-Pass Learned Index, in Proceedings of the 3rd International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, pp. 1-5, 2020.
	[14] R. Marcus et al., Benchmarking Learned Indexes, in Proceedings of the VLDB Endowment, vol. 14, no. 1, 2020.
Description	Master's; National Chengchi University; Department of Computer Science; 111753122
Source	http://thesis.lib.nccu.edu.tw/record/#G0111753122
Type	thesis
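
The abstract's contrast between memory-bound traditional indexes and compute-bound learned indexes can be illustrated with a minimal sketch. This is not LI4Subset-D, LI4Subset-P, DGM, or the PGM-index; it is only a generic single-segment learned index under stated assumptions: keys are a non-empty collection of numbers, one linear model predicts a key's position in the sorted array, and the largest prediction error seen at build time bounds the final search window. The class name `LinearLearnedIndex` and its methods are hypothetical.

```python
from bisect import bisect_left


class LinearLearnedIndex:
    """Hypothetical single linear model over sorted numeric keys."""

    def __init__(self, keys):
        self.keys = sorted(keys)  # assumes at least one key
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # Fit position ~ slope * key + intercept through the two endpoints.
        self.slope = (n - 1) / (hi - lo) if hi != lo else 0.0
        self.intercept = -self.slope * lo
        # Record the worst prediction error so lookups can bound the search.
        self.epsilon = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Arithmetic replaces the pointer chasing of a tree traversal.
        pos = int(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None."""
        pos = self._predict(key)
        lo = max(pos - self.epsilon, 0)
        hi = min(pos + self.epsilon + 1, len(self.keys))
        # Binary search only inside the error window around the prediction.
        i = bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 100])
print(index.lookup(23))  # 4
print(index.lookup(5))   # None
```

The design choice to show here is the trade the abstract describes: the model costs a few arithmetic operations per query, and the stored error bound keeps the remaining search confined to a small window instead of a full traversal.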

dc.contributor.advisor	沈錳坤	zh_TW
dc.contributor.advisor	Shan, Man-Kwan	en_US
dc.contributor.author (Author)	許軒祥	zh_TW
dc.contributor.author (Author)	Hsu, Hsuan-Hsiang	en_US
dc.creator (Author)	許軒祥	zh_TW
dc.creator (Author)	Hsu, Hsuan-Hsiang	en_US
dc.date (Date)	2024	en_US
dc.date.accessioned	4-Sep-2024 14:59:08 (UTC+8)	-
dc.date.available	4-Sep-2024 14:59:08 (UTC+8)	-
dc.date.issued (Upload time)	4-Sep-2024 14:59:08 (UTC+8)	-
dc.identifier (Other identifier)	G0111753122	en_US
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/153375	-
dc.description (Description)	Master's (碩士)	zh_TW
dc.description (Description)	National Chengchi University (國立政治大學)	zh_TW
dc.description (Description)	Department of Computer Science (資訊科學系)	zh_TW
dc.description (Description)	111753122	zh_TW
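dc.description.tableofcontents	Abstract
	Table of Contents
	List of Tables
	List of Figures
	Chapter 1 Introduction
		1.1 Research Background
		1.2 Research Motivation
		1.3 Research Objectives
	Chapter 2 Related Work
		2.1 Subset Queries
		2.2 Learned Indexes
	Chapter 3 Methodology
		3.1 Problem Definition
		3.2 Research Framework
		3.3 Data Preprocessing
		3.4 Inversion Construction
		3.5 Set2Seq
		3.6 Partitioning
		3.7 Ranking
		3.8 Deep Sets
		3.9 PGM-index
		3.10 Key Lookup
	Chapter 4 Experimental Design and Result Analysis
		4.1 Experimental Design and Evaluation Method
			4.1.1 Datasets
			4.1.2 Query Evaluation Method
		4.2 Experimental Results and Analysis
			4.2.1 Performance Comparison of LI4Subset-D and DGM
			4.2.2 Effect of LI4Subset-D Model Complexity on Performance
			4.2.3 Effect of Set2Seq on Performance in LI4Subset-D
			4.2.4 Effect of Seq2Int Hash on Partitioning in LI4Subset-D
			4.2.5 Effect of Partitioning on Performance in LI4Subset-D
			4.2.6 Effect of Batch Processing on Performance of LI4Subset-D and DGM
			4.2.7 Performance Comparison of LI4Subset-P and DGM
			4.2.8 Effect of Set2Seq on Performance in LI4Subset-P
			4.2.9 Effect of LI4Subset-P Model Complexity on Performance
			4.2.10 Effect of Partitioning on Performance in LI4Subset-P
			4.2.11 Memory Usage Comparison of Learned and Traditional Index Methods
			4.2.12 Discussion of Implementation Issues
	Chapter 5 Conclusion
	References	zh_TW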
dc.format.extent	1262048 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0111753122	en_US
dc.subject (Keyword)	學習索引	zh_TW
dc.subject (Keyword)	NoSQL資料庫	zh_TW
dc.subject (Keyword)	子集查詢	zh_TW
dc.subject (Keyword)	Learned Index	en_US
dc.subject (Keyword)	NoSQL Database	en_US
dc.subject (Keyword)	Subset Query	en_US
dc.title (Title)	NoSQL 資料庫子集查詢的學習索引	zh_TW
dc.title (Title)	Learned Index for Subset Query of NoSQL Databases	en_US
dc.type (Type)	thesis	en_US
