dc.contributor.advisor | 沈錳坤 | zh_TW |
dc.contributor.advisor | Shan,Man-Kwan | en_US |
dc.contributor.author (Authors) | 張詩宜 | zh_TW |
dc.contributor.author (Authors) | Shih-i Chang | en_US |
dc.creator (作者) | 張詩宜 | zh_TW |
dc.creator (作者) | Shih-i Chang | en_US |
dc.date (日期) | 2003 | en_US |
dc.date.accessioned | 17-Sep-2009 13:52:32 (UTC+8) | - |
dc.date.available | 17-Sep-2009 13:52:32 (UTC+8) | - |
dc.date.issued (上傳時間) | 17-Sep-2009 13:52:32 (UTC+8) | - |
dc.identifier (Other Identifiers) | G0090753006 | en_US |
dc.identifier.uri (URI) | https://nccur.lib.nccu.edu.tw/handle/140.119/32621 | - |
dc.description (描述) | 碩士 | zh_TW |
dc.description (描述) | 國立政治大學 | zh_TW |
dc.description (描述) | 資訊科學學系 | zh_TW |
dc.description (描述) | 90753006 | zh_TW |
dc.description (描述) | 92 | zh_TW |
dc.description.abstract (摘要) | 隨著電腦以及網際網路的普及,越來越多各領域的資料被數位化,利用電腦幫助儲存及管理資料。有許多資料在數位化的過程中,採用tree的資料結構來表達以及儲存。也因此,如何查詢這些龐大的資料,就成為重要的課題。在本論文中,我們針對具有rooted、labeled以及ordered或unordered等特性的tree結構資料的索引問題,提出稱之為PCS-trie之全新的主記憶體資料庫(Main Memory Database)索引結構,並提出相關的增刪資料及搜尋演算法,以達成加速處理sub-tree query之目標。此索引方法的基礎,在於將tree database中的tree編碼為可完整代表其結構的PREOD code字串,之後再以我們所提出的PCS-trie加以索引。PCS-trie索引結構支援資料庫的動態增刪,且我們也提供了有效率的新增及刪除資料的演算法。本論文中也提出了多種搜尋演算法,使能夠在PCS-trie中進行exact matching、處理query tree含有don’t care部分之查詢、以及fault tolerant等不同類型的查詢。最後,我們以實驗的方法,配合人工產生的實驗資料,來對PCS-trie索引方法的時間及空間等各方面的效率加以檢驗。 | zh_TW |
dc.description.abstract (摘要) | With the popularization of computer and the Internet, more and more data in various domains are digitized in order to take advantage of the power of computing and storing, and use the Internet to spread these informationz. Many data use tree structure to store them in the process of digitization. For this reason, it is a challenge to deal with these enormous data. In this paper, we present an approach to the search problem for these rooted labeled trees. We show a novel index structure, PREOD code Search trie (PCS-trie), and related algorithms for construction and search of PCS-trie, to speed up the sub-tree query to a tree database. The fundamental of this reaseach is to encode trees in tree database by PREOD code, in which the structure information of a tree can be reserved completely. Then we index these PREOD codes by PCS-trie. PCS-trie supports dynamic insertion and deleteion of PREOD codes. PCS-trie can handle three different types of query requirements: exact sub-tree query, query with don’t cares, and fault-tolerant query. Finally, we have conducted a series of experiments to evaluate the performance of PCS-trie and related algorithms. Experimental results obtained by running our techniques on synthetic data demonstrate the good performance of the proposed approach. | en_US |
dc.description.tableofcontents | 第一章 導論 11.1 研究動機與目的 11.2 論文架構 4第二章 相關研究 52.1 子樹索引相關研究 52.1.1 樹狀結構索引之相關研究 52.1.2 XML 文件索引相關研究 82.1.2.1 Index Fabric 92.1.2.2 XSD Structure 102.1.2.3 ViST 122.1.3 相關研究之整理摘要 132.2 後序樹 15第三章 問題定義 183.1 Tree的相關定義 183.2 Tree資料庫的索引問題定義 20第四章 Tree的字串表示法 22第五章 PCS-trie索引結構及相關演算法 285.1 PCS-trie 285.2 PCS-trie增刪演算法 315.3 PCS-trie查詢演算法 355.3.1 精確比對演算法 355.3.2 Wildcard之查詢處理 385.3.2.1 Wildcard “?”之查詢處理演算法 395.3.2.2 Wildcard “*”之查詢處理演算法 415.3.2.3 Wildcard查詢處理範例 43第六章 實驗與實驗結果 456.1 實驗設計 456.1.1 資料樹的產生 466.1.2 查詢樹的產生 486.1.3 實驗環境 506.2 實驗結果 516.2.1 Data tree之形狀對PCS-trie效率之影響 516.2.2 Dictionary size之大小對PCS-trie效率之影響 546.2.3 Data tree之多寡對PCS-trie效率之影響 586.2.4 Query tree之大小對PCS-trie效率之影響 616.2.5 PCS-trie對含有wildcard之query tree之查詢效率影響 636.2.6 Jump label對PCS-trie結構之影響評估 69第七章 相關查詢之處理 737.1 對Ordered tree database之unordered tree查詢處理 737.2 Fault tolerent之查詢處理 75第八章 結論與未來研究方向 79參考文獻 81圖目錄圖1.1:XML Schema範例與可能的多路徑查詢範例。 2圖2.1:ATreeGrep對單一tree之索引結構。 6圖2.2:Query with don’t cares in ATreeGrep。 7圖2.3:Index Fabric索引結構範例。 9圖2.4:DCS之組合範例。 11圖2.5:XSD tree索引結構。 11圖2.6:查詢XSD Structure時可能的false drop情況範例。 14圖2.7:對abab$字串之後序樹。 16圖3.1:Sub-tree範例。 20圖4.1:由tree建構PREOD code之演算法。 24圖4.2:Tree T與其PREOD code。 24圖4.3:利用PREOD code作tree的比對。 26圖5.1:Tree database範例與其PCS-trie。 30圖5.2:PCS-trie建構演算法。 32圖5.3:PCS-trie建構範例。 33圖5.4:PCS-trie刪除單一PREOD code之演算法。 34圖5.5:PCS-trie查詢演算法。 37圖5.6:PCS-trie查詢範例。 37圖5.7:帶有wildcard之query tree範例。 38圖5.8:含有wildcard之query tree中,所有可能順序關係之範例。 39圖5.9:處理wildcard “?”之查詢演算法。 40圖5.10:處理wildcard “*”之查詢演算法。 42圖5.11:Wildcard之查詢處理範例。 43圖6.1:以data tree之形狀為變因,ATreeGrep與PCS-trie之查詢時間比較。 54圖6.2:以dictionary size為變因,ATreeGrep與PCS-trie之查詢時間比較。 57圖6.3:以data tree之數量為變因,ATreeGrep與PCS-trie之查詢時間比較。 60圖6.4:以query tree之大小為變因,ATreeGrep與PCS-trie之查詢時間比較。 62圖6.5:Wildcard “?”對ATreeGrep及PCS-trie之搜尋效率之影響。 64圖6.6:Wildcard “*”對ATreeGrep及PCS-trie之搜尋效率之影響。 67圖6.7:以query tree之大小為變因,有無jump label對PCS-trie查詢時間之影響。 70圖7.1:對已建立ordered tree database之PCS-trie的unordered tree搜尋演算法。 74圖7.2:一path set對應兩棵以上unordered tree之情況。 75圖7.3:2-FT-contain之範例。 76圖7.4:Fault tolerant query之搜尋演算法。 77圖7.5:Fault tolerant之查詢範例。 78表目錄表6.1:實驗資料參數列表。 50表6.2:以data tree之形狀為變因,ATreeGrep與PCS-trie之建構時間比較。 52表6.3:以data tree之形狀為變因,ATreeGrep與PCS-trie之建構所需空間比較。 52表6.4:以data tree之形狀為變因,ATreeGrep與PCS-trie之查詢時間比較。 53表6.5:以dictionary size為變因,ATreeGrep與PCS-trie之建構時間比較。 55表6.6:以dictionary size為變因,ATreeGrep與PCS-trie之建構所需空間比較。 55表6.7:以dictionary size為變因,ATreeGrep與PCS-trie之查詢時間比較。 56表6.8:以data tree之數量為變因,ATreeGrep與PCS-trie之建構時間比較。 58表6.9:以data tree之數量為變因,ATreeGrep與PCS-trie之建構所需空間比較。 59表6.10:以data tree之數量為變因,ATreeGrep與PCS-trie之查詢時間比較。 59表6.11:以query tree之大小為變因,ATreeGrep與PCS-trie之查詢時間比較。 61表6.12:Wildcard “?”對ATreeGrep及PCS-trie之搜尋效率之影響。 64表6.13:Wildcard “*”對ATreeGrep及PCS-trie之搜尋效率之影響。 67表6.14:以data tree之數量為變因,有無jump label之PCS-trie之建構時間及所需記憶體空間比較。 69表6.15:以query tree之大小為變因,有無jump label對PCS-trie查詢時間之影響。 70表6.16:Jump label對查詢所需traverse的PCS-trie node數之影響。 71表6.17:對PCS-trie查詢時,使用在比對jump label之時間。 72 | zh_TW |
dc.format.extent | 45397 bytes | - |
dc.format.extent | 88872 bytes | - |
dc.format.extent | 65084 bytes | - |
dc.format.extent | 105834 bytes | - |
dc.format.extent | 129036 bytes | - |
dc.format.extent | 238566 bytes | - |
dc.format.extent | 106317 bytes | - |
dc.format.extent | 127474 bytes | - |
dc.format.extent | 829733 bytes | - |
dc.format.extent | 367836 bytes | - |
dc.format.extent | 209481 bytes | - |
dc.format.extent | 77305 bytes | - |
dc.format.extent | 43614 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.language.iso | en_US | - |
dc.source.uri (資料來源) | http://thesis.lib.nccu.edu.tw/record/#G0090753006 | en_US |
dc.subject (關鍵詞) | 索引結構 | zh_TW |
dc.subject (關鍵詞) | 樹 | zh_TW |
dc.subject (關鍵詞) | 子樹查詢 | zh_TW |
dc.subject (關鍵詞) | 資訊檢索 | zh_TW |
dc.subject (關鍵詞) | 可延伸式標記語言 | zh_TW |
dc.subject (關鍵詞) | Index tructure | en_US |
dc.subject (關鍵詞) | Tree | en_US |
dc.subject (關鍵詞) | Sub-tree query | en_US |
dc.subject (關鍵詞) | Information Retrieval | en_US |
dc.subject (關鍵詞) | XML | en_US |
dc.title (題名) | 子樹查詢的索引結構設計 | zh_TW |
dc.title (題名) | PCS-trie: An Index Structure for Sub-Tree Query | en_US |
dc.type (資料類型) | thesis | en |
dc.relation.reference (參考文獻) | [1] E. N. Adams. Consensus techniques and the comparison of taxonomic trees. Systematic Zoology, 21:390-397, 1972. | zh_TW |
dc.relation.reference (參考文獻) | [2] S. Amer-Yahia, S. Cho, L. V. S. Lakshmanan, and D. Srivastava. Minimization of tree pattern queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 497-508, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [3] A. Apostolico. The myriad virtues of sub-word trees. In A. Apostolico and Z. Galil, editors, Combinatorics on Words, pages 85-96, Springer-Verlag, Nato ASI series vol. 112, 1985. | zh_TW |
dc.relation.reference (參考文獻) | [4] A. Berglund, S. Boag, D. Chamberlin, M. F. Fernandez, M. Kay, J. Robie, and J. Simon. XML path language (XPath) 2.0 W3C working draft 16. Technical Report WD-wpath20-20020816, World Wide Web Consortium, 2002. | zh_TW |
dc.relation.reference (參考文獻) | [5] V. Berry and D. Bryant. Faster reliable phylogenetic analysis. In Proceedings of the 3rd Annual International Conference on Computational Molecular Biology, pages 59-68, 1999. | zh_TW |
dc.relation.reference (參考文獻) | [6] S. Boag, D. Chamberlin, M. F. Fernandez, D. Florescu, J. Robie, and J. Simon. XQuery 1.0: An XML query language W3C working draft 16. Technical Report WD-xquery-20020816, World Wide Web Consortium, 2002. | zh_TW |
dc.relation.reference (參考文獻) | [7] P. Buneman, S. B. Davidson, M. F. Fernandez, and D. Suciu. Adding structure to unstructured data. In Proceedings of the 6th International Conference on Database Theory, pages 336-350, 1997. | zh_TW |
dc.relation.reference (參考文獻) | [8] J. H. Camin and R. R. Sokal. A method for deducing branching sequences in phylogeny. Evolution 19:311-326, 1965. | zh_TW |
dc.relation.reference (參考文獻) | [9] G. Chang, M. J. Healey, J. A. M. McHugh, and J. T. L. Wang. Mining the World Wide Web: An Information Search Approach. Kluwer Academic Publishers, Norwell, Massachusetts, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [10] S. S. Chawathe, S. Abiteboul, and J. Widom. Representing and querying changes in semistructured data. In Proceedings of the IEEE International Conference on Data Engineering, pages 4-13, 1998. | zh_TW |
dc.relation.reference (參考文獻) | [11] S. S. Chawathe and H. Garcia-Molina. Meaningful change detection in structured data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 26-37, 1997. | zh_TW |
dc.relation.reference (參考文獻) | [12] S. S. Chawathe, A. Rajaraman, H. Garcia-Molina, and J. Widom. Change detection in hierarchically structured information. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 493-504, 1996. | zh_TW |
dc.relation.reference (參考文獻) | [13] Z. Chen, H. V. Jagadish, G. Korn, N. Koudas, S. Muthukrishnan, R. T. Ng, and D. Srivastava. Counting twig matches in a tree. In Proceedings of the IEEE International Conference on Data Engineering, pages 595-604, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [14] B. F. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon. A fast index for semistructured data. In Proceedings of the 27th International Conference on Very Large Data Bases, pages 341-350, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [15] M. H. Eich. Main memory database research directions. In Proceedings of the 6th International Workshop on Database Machines, pages 251-268, 1989. | zh_TW |
dc.relation.reference (參考文獻) | [16] A. Ferro, G. Gallo, R. Giugno, and A. Pulvirenti. Best-match retrieval for structured images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(7):707-718, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [17] R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proceedings of the 23rd International Conference on Very Large Data Bases, pages 436-455, 1997. | zh_TW |
dc.relation.reference (參考文獻) | [18] L. Gruenwald and S. Liu. Main memory database system. ACM SIGMOD Record, 22(4):38-44, 1993. | zh_TW |
dc.relation.reference (參考文獻) | [19] L. C. K. Hui. Color set size problem with application to string matching. In Proceedings of Combinatorial Pattern Matching, Third Annual Symposium, Lecture Notes in Computer Science, vol 644, Springer, Berlin, 1992, pages 230-243. | zh_TW |
dc.relation.reference (參考文獻) | [20] H. V. Jagadish, L. V. S. Lakshmanan, D. Srivastava, and K. Thompson. TAX: A tree algebra for XML. In Proceedings of the 8th Workshop on Data Bases and Programming Languages, pages 149-164, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [21] S. Kannan, E. Lawler, and T. Warnow. Determining the evolutionary tree. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 475-484, 1990. | zh_TW |
dc.relation.reference (參考文獻) | [22] S. R. Kosaraju and A. L. Delcher. Large-scale assembly of DNA strings and space-efficient construction of suffix trees. In Proceedings of the 28th Annual ACM Symposium on the Theory of Computing, pages 169-177, 1995. | zh_TW |
dc.relation.reference (參考文獻) | [23] E. Kotsakis. XSD: A hierarchical access method for indexing XML schemata. Knowledge and Information Systems, 4(2): 168-201, 2002. | zh_TW |
dc.relation.reference (參考文獻) | [24] T. W. Leung, G. Mitchell, B. Subramanian, B. Vance, S. L. Vandenberg, and S. B. Zdonik. The AQUA data model and algebra. In Proceedings of the 4th Workshop on Data Bases and Programming Languages, pages 157-175, 1993. | zh_TW |
dc.relation.reference (參考文獻) | [25] H. Lu, Y. Y. Ng, and Z. Tian. T-Tree or B-Tree: Main memory database index structure revisited. In Proceedings of the 11th Australasian Database Conference, pages 65-73, 2000. | zh_TW |
dc.relation.reference (參考文獻) | [26] U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 319-327, 1990. | zh_TW |
dc.relation.reference (參考文獻) | [27] E. M. McCreight. A space-economical suffix tree construction algorithm. Journal of the ACM, 23(2):262-272, 1976. | zh_TW |
dc.relation.reference (參考文獻) | [28] T. Milo and D. Suciu. Index structures for path expressions. In Proceedings of the 7th International Conference on Database Theory, pages 277-295, 1999. | zh_TW |
dc.relation.reference (參考文獻) | [29] F. Neven and T. Schwentick. Expressive and efficient pattern languages for tree-structured data. In Proceedings of ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 145-156, 2000. | zh_TW |
dc.relation.reference (參考文獻) | [30] A. S. Noetzel and S. M. Selkow. An analysis of the general tree-editing problem. In D. Sankoff and J. B. Kruskal, editors, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, pages 237-252. Addison-Wesley, Reading, MA, 1983. | zh_TW |
dc.relation.reference (參考文獻) | [31] D. Shasha, J. T. L. Wang, H. Shan, and K. Zhang. ATreeGrep: Approzimate searching in unordered trees. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 89-98, 2002. | zh_TW |
dc.relation.reference (參考文獻) | [32] B. Subramanian, T. W. Leung, S. L. Vandenberg, and S. B. Zdonik. Ordered types in the AQUA data model. In Proceedings of the 4th Workshop on Data Bases and Programming Languages, pages 115-135, 1993. | zh_TW |
dc.relation.reference (參考文獻) | [33] B. Subramanian, T. W. Leung, S. L. Vandenberg, and S. B. Zdonik. The AQUA approach to querying lists and trees in object-oriented databases. In Proceedings of the IEEE International Conference on Data Engineering, pages 80-89, 1995. | zh_TW |
dc.relation.reference (參考文獻) | [34] W. Szpankowski. Asymptotic properties of data compression and suffix trees. IEEE Transactions on Information Theory, 39(5): 1647-1659, 1993. | zh_TW |
dc.relation.reference (參考文獻) | [35] V. Tseng and W. Lin. A new method for indexing XML documents. In Proceedings of the 12th Workshop on Object-Oriented Technology and Applications, pages 39-46, 2001. | zh_TW |
dc.relation.reference (參考文獻) | [36] E. Ukkonen. On-line construction of suffix-trees. Algorithmica, 14(3):249-260, 1995. | zh_TW |
dc.relation.reference (參考文獻) | [37] H. Wang, S. Park, W. Fan, and P. S. Yu. ViST: A dynamic index method for querying XML data by tree structures. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 110-121, 2003. | zh_TW |
dc.relation.reference (參考文獻) | [38] P. Weiner. Linear pattern matching algorithms. In Proceedings of the 14th IEEE Symposium and Automata Theory, pages 1-11, 1973. | zh_TW |
dc.relation.reference (參考文獻) | [39] K. Zhang, R. Statman, and D. Shasha. On the editing distance between unordered labeled trees. Information Processing Letters, 42(3):133-139, 1992. | zh_TW |