Title DSM-PLW: Single-pass mining of path traversal patterns over streaming web click-sequences
Authors 沈錳坤
Li, Hua-Fu ;
     Lee, Suh-Yin ;
     Shan, Man-Kwan
Keywords Web click-sequence streams;
     Path traversal patterns;
     Single-pass algorithm
Date 2006-06
Upload time 24-Aug-2009 12:29:32 (UTC+8)
Abstract Mining Web click streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some interesting characteristics, such as unknown or unbounded length, possibly a very fast arrival rate, inability to backtrack over previously arrived click-sequences, and a lack of system control over the order in which the data arrive. In this paper, we propose a projection-based, single-pass algorithm, called DSM-PLW (Data Stream Mining for Path traversal patterns in a Landmark Window), for online incremental mining of path traversal patterns over a continuous stream of maximal forward references generated at a rapid rate. According to the algorithm, each maximal forward reference of the stream is projected into a set of reference-suffix maximal forward references, and these reference-suffix maximal forward references are inserted into a new in-memory summary data structure, called SP-forest (Summary Path traversal pattern forest), which is an extended prefix tree-based data structure for storing essential information about frequent reference sequences of the stream so far. The set of all maximal reference sequences is determined from the SP-forest by a depth-first-search mechanism, called MRS-mining (Maximal Reference Sequence mining). Theoretical analysis and experimental studies show that the proposed algorithm has gently growing memory requirements and makes only one pass over the streaming data.
Relation Computer Networks, 50(10), 1474-1487
Type article
DOI http://dx.doi.org/10.1016/j.comnet.2005.10.018
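
The abstract's pipeline (project each maximal forward reference into its suffixes, insert them into a prefix-tree forest, then walk the forest depth-first) can be illustrated with a minimal sketch. This is not the authors' DSM-PLW implementation: the names `SuffixForest`, `Node`, `add`, `frequent`, and `maximal` are hypothetical, the sketch keeps exact counts and does no pruning (so it does not reproduce the memory behaviour the abstract reports), and it assumes that a pattern is a contiguous run of pages and that no page repeats within one maximal forward reference.

```python
class Node:
    """One page in a prefix tree, with the number of references whose
    suffix projection passed through this node."""
    def __init__(self):
        self.count = 0
        self.children = {}


def is_contiguous_sub(p, q):
    """True if tuple p occurs as a contiguous run inside tuple q."""
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


class SuffixForest:
    """Illustrative stand-in for a summary forest (hypothetical, unpruned)."""
    def __init__(self):
        self.roots = {}   # first page -> Node
        self.n = 0        # references seen since the landmark

    def add(self, reference):
        """Single pass: project the reference into all of its suffixes and
        insert each suffix as a path. Assumes pages within one reference
        are distinct, so each contiguous run is counted once per reference."""
        self.n += 1
        for start in range(len(reference)):
            level = self.roots
            for page in reference[start:]:
                node = level.setdefault(page, Node())
                node.count += 1
                level = node.children

    def frequent(self, min_support):
        """Depth-first walk collecting every path whose count meets the
        support threshold (a fraction of the references seen so far)."""
        threshold = min_support * self.n
        found = []

        def dfs(level, prefix):
            for page, node in level.items():
                if node.count >= threshold:
                    path = prefix + (page,)
                    found.append(path)
                    dfs(node.children, path)

        dfs(self.roots, ())
        return found

    def maximal(self, min_support):
        """Frequent paths not contained in any other frequent path."""
        freq = self.frequent(min_support)
        return [p for p in freq
                if not any(p != q and is_contiguous_sub(p, q) for q in freq)]


# Made-up click data, one maximal forward reference per list.
forest = SuffixForest()
for ref in (["A", "B", "C", "D"], ["A", "B", "C"], ["B", "C", "E"], ["A", "B"]):
    forest.add(ref)

result_half = forest.maximal(0.5)
result_three_quarters = forest.maximal(0.75)
```

Because each reference is consumed once by `add` and never revisited, the sketch respects the single-pass constraint; a faithful version would also need the pruning and bookkeeping that the paper's SP-forest provides.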
dc.creator (Author) 沈錳坤 [zh_TW]
dc.creator (Author) Li, Hua-Fu ;
     Lee, Suh-Yin ;
     Shan, Man-Kwan
dc.date (Date) 2006-06
dc.date.accessioned 24-Aug-2009 12:29:32 (UTC+8)
dc.date.available 24-Aug-2009 12:29:32 (UTC+8)
dc.date.issued (Upload time) 24-Aug-2009 12:29:32 (UTC+8)
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/29575
dc.format.extent 1109305 bytes
dc.format.mimetype application/pdf
dc.language zh_TW [en]
dc.language.iso en_US
dc.relation (Relation) Computer Networks, 50(10), 1474-1487 [en]
dc.subject (Keywords) Web click-sequence streams;
     Path traversal patterns;
     Single-pass algorithm
dc.title (Title) DSM-PLW: Single-pass mining of path traversal patterns over streaming web click-sequences [en]
dc.type (Type) article [en]
dc.identifier.doi (DOI) 10.1016/j.comnet.2005.10.018 [en_US]
dc.doi.uri (DOI) http://dx.doi.org/10.1016/j.comnet.2005.10.018 [en_US]