Extracting Structured Subject Information from Digital Document Archives | Publication | NCCU Academic Hub

Title	Extracting Structured Subject Information from Digital Document Archives
Authors	劉吉軒 Liu, Jyi-Shane; Lee, Ching-Ying
Keywords	information extraction, digital document archives, value-added services
Date	2006-11
Uploaded	16-Dec-2008 16:46:09 (UTC+8)
Abstract	Information extraction (IE) techniques can decode targeted subject information in documents and reduce text data to a set of structured core information. For digital libraries, this means IE can potentially serve as an enabling tool that extends the value of digital document archives. We present an approach, called the sandwich extraction pattern, to address closely coupled template relation tasks. The approach provides interactive capabilities for task specification, domain knowledge acquisition, and output evaluation. This gives users (e.g., librarians) direct control over the design of value-added content products and over the performance of IE tools. We validated the approach empirically by implementing an IE system, called SEP, and field-testing it on a real document archive. Encouraged by the successful test runs, the NCCU Library has formally launched a project to develop a value-added content product based on government personnel gazettes, comprising document images, electronic texts, and a personnel-changes database.
Relation	Digital Libraries: Achievements, Challenges and Opportunities, Lecture Notes in Computer Science, vol. 4312, pp. 141-150
Type	article
DOI	http://dx.doi.org/10.1007/11931584_17

dc.creator (Author)	劉吉軒	zh_TW
dc.creator (Author)	Liu, Jyi-Shane; Lee, Ching-Ying	-
dc.date (Date)	2006-11	en_US
dc.date.accessioned	16-Dec-2008 16:46:09 (UTC+8)	-
dc.date.available	16-Dec-2008 16:46:09 (UTC+8)	-
dc.date.issued (Upload time)	16-Dec-2008 16:46:09 (UTC+8)	-
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/15006	-
dc.format	application/	en_US
dc.language	en	en_US
dc.language	en-US	en_US
dc.language.iso	en_US	-
dc.relation (Relation)	Digital Libraries: Achievements, Challenges and Opportunities, Lecture Notes in Computer Science series 4312, pp.141-150	en_US
dc.subject (Keywords)	information extraction, digital document archives, value-added services.	en-US
dc.title (Title)	Extracting Structured Subject Information from Digital Document Archives	en_US
dc.type (Type)	article	en
dc.identifier.doi (DOI)	10.1007/11931584_17	en_US
dc.doi.uri (DOI)	http://dx.doi.org/10.1007/11931584_17	en_US