Title: Two Novel Feature Selection Approaches for Web Page Classification
Authors: Chen, Chih-Ming; Lee, Hahn-Ming; Chang, Yu-Jung
陳志銘
Contributor: 政大圖檔所
Keywords: Discriminating power measure; Feature selection; Fuzzy decision making; Web page classification
Date: 2009-01
Upload date: 2013-04-18
Abstract: To meet the growing qualitative and quantitative demands for information from the WWW, efficient automatic Web page classifiers are urgently needed. However, a classifier applied to the WWW faces a large-scale dimensionality problem, since it must handle millions of Web pages, tens of thousands of features, and hundreds of categories. In practical implementations, reducing the dimensionality is a critically important challenge. In this paper, we propose a fuzzy ranking analysis paradigm together with a novel relevance measure, the discriminating power measure (DPM), to effectively reduce the input dimensionality from tens of thousands to a few hundred with a zero rejection rate and a small decrease in accuracy. A two-level promotion method based on fuzzy ranking analysis is proposed to improve the behavior of each relevance measure and to combine those measures into a better evaluation of features. Additionally, the DPM has a low computation cost and emphasizes both positive and negative discriminating features. It also emphasizes classification in parallel order rather than in serial order. In our experimental results, the fuzzy ranking analysis is useful for validating the uncertain behavior of each relevance measure. Moreover, the DPM reduces the input dimensionality from 10,427 to 200 with a zero rejection rate and a decline of less than 5% (from 84.5% to 80.4%) in test accuracy. Furthermore, regarding the impact of the proposed DPM on classification accuracy, experimental results on the China Time and Reuters-21578 datasets demonstrate that the DPM provides a major benefit in improving document classification accuracy. The results also show that the DPM can reduce both redundant and noisy features to build a better classifier.
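The abstract describes ranking features by a relevance measure and keeping only the top few hundred. The sketch below illustrates that general filter-style workflow in Python. The scoring function is a stand-in and an assumption, not the paper's DPM, whose formula the abstract does not give: for each term it sums, over all categories at once, the absolute gap between the term's document-frequency rate inside and outside the category, so terms that signal either membership or non-membership score high. The function names and the toy data are hypothetical.

```python
from collections import Counter


def discriminating_scores(docs, labels):
    """Score each term by a stand-in relevance measure (NOT the paper's DPM).

    docs   : list of token collections, one per document
    labels : list of category names, aligned with docs
    Returns {term: score}; higher means more discriminating.
    """
    categories = sorted(set(labels))
    n_total = len(docs)
    n_in = Counter(labels)                      # documents per category
    df_in = {c: Counter() for c in categories}  # per-category document frequency
    df_all = Counter()                          # overall document frequency
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df_in[label][term] += 1
            df_all[term] += 1

    scores = {}
    for term, total in df_all.items():
        score = 0.0
        for c in categories:
            inside = df_in[c][term] / n_in[c]
            n_out = n_total - n_in[c]
            outside = (total - df_in[c][term]) / n_out if n_out else 0.0
            # Absolute gap rewards both positive and negative indicators.
            score += abs(inside - outside)
        scores[term] = score
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus: two categories, four documents.
docs = [
    {"goal", "match", "the"},
    {"match", "team", "the"},
    {"stock", "market", "the"},
    {"market", "bank", "the"},
]
labels = ["sports", "sports", "finance", "finance"]

scores = discriminating_scores(docs, labels)
selected = select_top_k(scores, 2)
print(selected)
```

On this toy corpus the term "the" appears in every document and therefore scores zero, while terms confined to one category rank highest. A real pipeline would apply the same ranking step to a vocabulary of thousands of terms before training the classifier.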
Relation: Expert Systems with Applications, 36(1), 260-272
Type: article
DOI: http://dx.doi.org/10.1016/j.eswa.2007.09.008
dc.contributor 政大圖檔所 [en_US]
dc.creator (Author) Chen, Chih-Ming; Lee, Hahn-Ming; Chang, Yu-Jung [en_US]
dc.creator (Author) 陳志銘 [zh_TW]
dc.date (Date) 2009-01 [en_US]
dc.date.accessioned 2013-04-18
dc.date.available 2013-04-18
dc.date.issued (Upload date) 2013-04-18
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/57637
dc.language.iso en_US
dc.relation (Relation) Expert Systems with Applications, 36(1), 260-272 [en_US]
dc.subject (Keywords) Discriminating power measure; Feature selection; Fuzzy decision making; Web page classification [en_US]
dc.title (Title) Two Novel Feature Selection Approaches for Web Page Classification [en_US]
dc.type (Type) article [en]
dc.identifier.doi (DOI) 10.1016/j.eswa.2007.09.008
dc.doi.uri (DOI) http://dx.doi.org/10.1016/j.eswa.2007.09.008