Distributed keyword vector representation for document categorization | NCCU Academic Hub

Title	Distributed keyword vector representation for document categorization
Authors	Hsieh, Yu Lun; Liu, Shih Hung; Chang, Yung Chun; Hsu, Wen-Lian
Contributor	Department of Computer Science
Keywords	Artificial intelligence; Neural networks; Vectors; Comprehensive performance evaluation; Context information; Document categorization; Document Representation; Information explosion; Similarity measure; Vector representations; word embedding; Vector spaces
Date	2016-02
Upload time	31-Aug-2017 14:51:47 (UTC+8)
Abstract	In the age of information explosion, efficiently categorizing the topic of a document can assist our organization and comprehension of the vast amount of text. In this paper, we propose a novel approach, named DKV, for document categorization using distributed real-valued vector representation of keywords learned from neural networks. Such a representation can project rich context information (or embedding) into the vector space, and subsequently be used to infer similarity measures among words, sentences, and even documents. Using a Chinese news corpus containing over 100,000 articles and five topics, we provide a comprehensive performance evaluation to demonstrate that by exploiting the keyword embeddings, DKV paired with support vector machines can effectively categorize a document into the predefined topics. Results demonstrate that our method can achieve the best performances compared to several other approaches.
Relation	TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence, 245-251
Type	conference
DOI	http://dx.doi.org/10.1109/TAAI.2015.7407126

dc.contributor	Department of Computer Science
dc.creator (Author)	Hsieh, Yu Lun	en_US
dc.creator (Author)	Liu, Shih Hung	en_US
dc.creator (Author)	Chang, Yung Chun	en_US
dc.creator (Author)	Hsu, Wen-Lian	en_US
dc.date (Date)	2016-02
dc.date.accessioned	31-Aug-2017 14:51:47 (UTC+8)	-
dc.date.available	31-Aug-2017 14:51:47 (UTC+8)	-
dc.date.issued (Upload time)	31-Aug-2017 14:51:47 (UTC+8)	-
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/112468	-
dc.description.abstract (Abstract)	In the age of information explosion, efficiently categorizing the topic of a document can assist our organization and comprehension of the vast amount of text. In this paper, we propose a novel approach, named DKV, for document categorization using distributed real-valued vector representation of keywords learned from neural networks. Such a representation can project rich context information (or embedding) into the vector space, and subsequently be used to infer similarity measures among words, sentences, and even documents. Using a Chinese news corpus containing over 100,000 articles and five topics, we provide a comprehensive performance evaluation to demonstrate that by exploiting the keyword embeddings, DKV paired with support vector machines can effectively categorize a document into the predefined topics. Results demonstrate that our method can achieve the best performances compared to several other approaches.
dc.format.extent	209 bytes	-
dc.format.mimetype	text/html	-
dc.relation (Relation)	TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence, 245-251	en_US
dc.subject (Keywords)	Artificial intelligence; Neural networks; Vectors; Comprehensive performance evaluation; Context information; Document categorization; Document Representation; Information explosion; Similarity measure; Vector representations; word embedding; Vector spaces
dc.title (Title)	Distributed keyword vector representation for document categorization	en_US
dc.type (Type)	conference
dc.identifier.doi (DOI)	10.1109/TAAI.2015.7407126
dc.doi.uri (DOI)	http://dx.doi.org/10.1109/TAAI.2015.7407126
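
The abstract notes that keyword embeddings can be used to infer similarity among words and documents. The Python sketch below illustrates that general idea only; it is not the DKV method from the paper. The tiny `EMBEDDINGS` table, the averaging step in `document_vector`, and the function names are all assumptions made for illustration. In the paper the vectors are learned from neural networks and the classification is done with support vector machines.

```python
import math

# Hypothetical 3-dimensional keyword embeddings, hand-written for illustration.
# Real embeddings would be learned from a corpus by a neural network.
EMBEDDINGS = {
    "election": [0.9, 0.1, 0.0],
    "parliament": [0.8, 0.2, 0.1],
    "baseball": [0.1, 0.9, 0.0],
    "pitcher": [0.0, 0.8, 0.2],
}


def document_vector(keywords, embeddings):
    """Average the embeddings of the keywords that have a known vector.

    Averaging is an assumption of this sketch, not a step confirmed by the abstract.
    """
    vectors = [embeddings[k] for k in keywords if k in embeddings]
    if not vectors:
        raise ValueError("no known keywords in document")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two reference documents and one query document, each reduced to its keywords.
politics_doc = document_vector(["election", "parliament"], EMBEDDINGS)
sports_doc = document_vector(["baseball", "pitcher"], EMBEDDINGS)
query_doc = document_vector(["election", "unknown-word"], EMBEDDINGS)

sim_politics = cosine_similarity(query_doc, politics_doc)
sim_sports = cosine_similarity(query_doc, sports_doc)
```

In this toy setup the query document scores higher against the politics document than against the sports one. Document vectors of this kind could also serve as input features for a classifier such as a support vector machine, which is the pairing the abstract describes.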