Effective Database Transformation and Efficient Support Computation for Mining Sequential Patterns | Publication | NCCU Academic Hub


Title	Effective Database Transformation and Efficient Support Computation for Mining Sequential Patterns
Authors	C.-W. Cho;Y.-H. Wu;Chen, Arbee L. P. 陳良弼
Keywords	Data mining;Sequential patterns;Database transformation;Support computation;Database projection
Date	2009-02
Upload time	16-Dec-2008 16:43:39 (UTC+8)
Abstract	In this paper, we propose a novel algorithm for mining frequent sequences from transaction databases. The transactions of each customer form a customer sequence. A sequence (an ordered list of itemsets) is frequent if the number of customer sequences containing it satisfies the user-specified threshold. The 1-sequence is a special type of sequence because it consists of only a single itemset instead of an ordered list, while a k-sequence is a sequence composed of k itemsets. Compared with the cost of mining frequent k-sequences (k ≥ 2), the cost of mining frequent 1-sequences is negligible. We adopt a two-phase architecture that finds the two types of frequent sequences separately, so that the discovery of frequent k-sequences can be designed and optimized on its own. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol and the database is transformed into one composed of these symbols. We find that it is unnecessary to encode all the frequent 1-sequences, and we make full use of the discovered frequent 1-sequences to transform the database into a smaller one. For every k ≥ 2, the customer sequences in the transformed database are scanned to find all the frequent k-sequences. We devise a compact representation for a customer sequence and describe a method to enumerate all distinct subsequences of a customer sequence without redundant scans. The soundness of the proposed approach is verified and a number of experiments are performed. The results show that our approach outperforms previous work in both scalability and execution time.
Relation	Journal of Intelligent Information Systems, 32(1), 23-51
Type	article
DOI	http://dx.doi.org/10.1007/s10844-007-0047-y
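
The abstract's definition of a frequent sequence can be illustrated with a minimal sketch. It covers only the definitions (containment and support), not the paper's database transformation or subsequence enumeration, which the abstract does not specify in detail. The function names `contains`, `support`, and `is_frequent`, the toy database, and the reading of the threshold as an absolute count of customer sequences are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Sequence

# Assumed representation: an itemset is a frozenset of item labels,
# and a customer sequence is a time-ordered list of itemsets.
Itemset = FrozenSet[str]
CustomerSequence = List[Itemset]


def contains(customer_seq: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """Return True if `pattern` is contained in `customer_seq`.

    Each pattern itemset must be a subset of a distinct transaction,
    and the matched transactions must keep the pattern's order.
    Matching each pattern itemset at the earliest possible transaction
    is sufficient to decide containment.
    """
    pos = 0  # index of the next pattern itemset still to be matched
    for transaction in customer_seq:
        if pos < len(pattern) and pattern[pos] <= transaction:
            pos += 1
    return pos == len(pattern)


def support(db: Sequence[CustomerSequence], pattern: Sequence[Itemset]) -> int:
    """Count the customer sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if contains(seq, pattern))


def is_frequent(db: Sequence[CustomerSequence], pattern: Sequence[Itemset],
                min_support: int) -> bool:
    """Assumption: the threshold is an absolute number of customer sequences."""
    return support(db, pattern) >= min_support


# Hypothetical toy database of three customer sequences.
db = [
    [frozenset("a"), frozenset("bc"), frozenset("d")],
    [frozenset("ab"), frozenset("c")],
    [frozenset("b"), frozenset("ad")],
]

# The 2-sequence <(a)(c)>: item a in one transaction, item c in a later one.
pattern = [frozenset("a"), frozenset("c")]

print(support(db, pattern))         # number of customer sequences containing the pattern
print(is_frequent(db, pattern, 2))  # frequent at a threshold of 2 sequences?
```

Here the first two customer sequences contain the pattern and the third does not, because its only transaction with item a is also its last. A naive miner would call `support` once per candidate; avoiding such repeated scans is the cost the abstract says the proposed approach is designed to reduce.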

dc.creator (Authors)	C.-W. Cho;Y.-H. Wu;Chen, Arbee L. P.	en_US
dc.creator (Authors)	陳良弼	-
dc.date (Date)	2009-02	en_US
dc.date.accessioned	16-Dec-2008 16:43:39 (UTC+8)	-
dc.date.available	16-Dec-2008 16:43:39 (UTC+8)	-
dc.date.issued (Upload time)	16-Dec-2008 16:43:39 (UTC+8)	-
dc.identifier.uri (URI)	https://ah.lib.nccu.edu.tw/item?item_id=15916	-
dc.format	application/	en_US
dc.language	en	en_US
dc.language	en-US	en_US
dc.language.iso	en_US	-
dc.relation (Relation)	Journal of Intelligent Information Systems, 32(1), 23-51	en_US
dc.subject (Keywords)	Data mining;Sequential patterns;Database transformation;Support computation;Database projection	en_US
dc.title (Title)	Effective Database Transformation and Efficient Support Computation for Mining Sequential Patterns	en_US
dc.type (Type)	article	en
dc.identifier.doi (DOI)	10.1007/s10844-007-0047-y	en_US
dc.doi.uri (DOI)	http://dx.doi.org/10.1007/s10844-007-0047-y	en_US