Academic Output - Conference Papers

Title Distributed keyword vector representation for document categorization
Authors Hsieh, Yu Lun
Liu, Shih Hung
Chang, Yung Chun
Hsu, Wen-Lian
Contributor Department of Computer Science
Keywords Artificial intelligence; Neural networks; Vectors; Comprehensive performance evaluation; Context information; Document categorization; Document Representation; Information explosion; Similarity measure; Vector representations; word embedding; Vector spaces
Date 2016-02
Uploaded 31-Aug-2017 14:51:47 (UTC+8)
Abstract In the age of information explosion, efficiently categorizing the topic of a document can assist our organization and comprehension of the vast amount of text. In this paper, we propose a novel approach, named DKV, for document categorization using distributed real-valued vector representation of keywords learned from neural networks. Such a representation can project rich context information (or embedding) into the vector space, and subsequently be used to infer similarity measures among words, sentences, and even documents. Using a Chinese news corpus containing over 100,000 articles and five topics, we provide a comprehensive performance evaluation to demonstrate that by exploiting the keyword embeddings, DKV paired with support vector machines can effectively categorize a document into the predefined topics. Results demonstrate that our method can achieve the best performances compared to several other approaches.
Relation TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence, 245-251
Type conference
DOI http://dx.doi.org/10.1109/TAAI.2015.7407126
dc.contributor Department of Computer Science
dc.creator (Author) Hsieh, Yu Lun
dc.creator (Author) Liu, Shih Hung
dc.creator (Author) Chang, Yung Chun
dc.creator (Author) Hsu, Wen-Lian
dc.date (Date) 2016-02
dc.date.accessioned 31-Aug-2017 14:51:47 (UTC+8)
dc.date.available 31-Aug-2017 14:51:47 (UTC+8)
dc.date.issued (Uploaded) 31-Aug-2017 14:51:47 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/112468
dc.description.abstract (Abstract) In the age of information explosion, efficiently categorizing the topic of a document can assist our organization and comprehension of the vast amount of text. In this paper, we propose a novel approach, named DKV, for document categorization using distributed real-valued vector representation of keywords learned from neural networks. Such a representation can project rich context information (or embedding) into the vector space, and subsequently be used to infer similarity measures among words, sentences, and even documents. Using a Chinese news corpus containing over 100,000 articles and five topics, we provide a comprehensive performance evaluation to demonstrate that by exploiting the keyword embeddings, DKV paired with support vector machines can effectively categorize a document into the predefined topics. Results demonstrate that our method can achieve the best performances compared to several other approaches.
dc.format.extent 209 bytes
dc.format.mimetype text/html
dc.relation (Relation) TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence, 245-251
dc.subject (Keywords) Artificial intelligence; Neural networks; Vectors; Comprehensive performance evaluation; Context information; Document categorization; Document Representation; Information explosion; Similarity measure; Vector representations; word embedding; Vector spaces
dc.title (Title) Distributed keyword vector representation for document categorization
dc.type (Type) conference
dc.identifier.doi (DOI) 10.1109/TAAI.2015.7407126
dc.doi.uri (DOI) http://dx.doi.org/10.1109/TAAI.2015.7407126