Title Distributed keyword vector representation for document categorization
Authors Hsieh, Yu Lun
Liu, Shih Hung
Chang, Yung Chun
Hsu, Wen-Lian
Contributor Department of Computer Science
Keywords Artificial intelligence; Neural networks; Vectors; Comprehensive performance evaluation; Context information; Document categorization; Document Representation; Information explosion; Similarity measure; Vector representations; word embedding; Vector spaces
Date 2016-02
Upload time 31-Aug-2017 14:51:47 (UTC+8)
Abstract In the age of information explosion, efficiently categorizing the topic of a document can assist our organization and comprehension of the vast amount of text. In this paper, we propose a novel approach, named DKV, for document categorization using distributed real-valued vector representation of keywords learned from neural networks. Such a representation can project rich context information (or embedding) into the vector space, and subsequently be used to infer similarity measures among words, sentences, and even documents. Using a Chinese news corpus containing over 100,000 articles and five topics, we provide a comprehensive performance evaluation to demonstrate that by exploiting the keyword embeddings, DKV paired with support vector machines can effectively categorize a document into the predefined topics. Results demonstrate that our method can achieve the best performances compared to several other approaches.
Relation TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence, 245-251
Type conference
DOI http://dx.doi.org/10.1109/TAAI.2015.7407126
dc.contributor Department of Computer Science
dc.creator (Author) Hsieh, Yu Lun [en_US]
dc.creator (Author) Liu, Shih Hung [en_US]
dc.creator (Author) Chang, Yung Chun [en_US]
dc.creator (Author) Hsu, Wen-Lian [en_US]
dc.date (Date) 2016-02
dc.date.accessioned 31-Aug-2017 14:51:47 (UTC+8)
dc.date.available 31-Aug-2017 14:51:47 (UTC+8)
dc.date.issued (Upload time) 31-Aug-2017 14:51:47 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/112468
dc.format.extent 209 bytes
dc.format.mimetype text/html
dc.relation (Relation) TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence, 245-251 [en_US]
dc.subject (Keywords) Artificial intelligence; Neural networks; Vectors; Comprehensive performance evaluation; Context information; Document categorization; Document Representation; Information explosion; Similarity measure; Vector representations; word embedding; Vector spaces
dc.title (Title) Distributed keyword vector representation for document categorization [en_US]
dc.type (Type) conference
dc.identifier.doi (DOI) 10.1109/TAAI.2015.7407126
dc.doi.uri (DOI) http://dx.doi.org/10.1109/TAAI.2015.7407126