dc.contributor.advisor | 劉吉軒<br>陳光華 | zh_TW |
dc.contributor.advisor | Liu, Jyi-Shane<br>Chen, Kuang-hua | en_US |
dc.contributor.author (Authors) | 江珮翎 | zh_TW |
dc.contributor.author (Authors) | Chiang, Pei-ling | en_US |
dc.creator (作者) | 江珮翎 | zh_TW |
dc.creator (作者) | Chiang, Pei-ling | en_US |
dc.date (日期) | 2002 | en_US |
dc.date.accessioned | 17-Sep-2009 13:51:52 (UTC+8) | - |
dc.date.available | 17-Sep-2009 13:51:52 (UTC+8) | - |
dc.date.issued (上傳時間) | 17-Sep-2009 13:51:52 (UTC+8) | - |
dc.identifier (Other Identifiers) | G0089753004 | en_US |
dc.identifier.uri (URI) | https://nccur.lib.nccu.edu.tw/handle/140.119/32615 | - |
dc.description (描述) | 碩士 | zh_TW |
dc.description (描述) | 國立政治大學 | zh_TW |
dc.description (描述) | 資訊科學學系 | zh_TW |
dc.description (描述) | 89753004 | zh_TW |
dc.description (描述) | 91 | zh_TW |
dc.description.abstract (摘要) | 在網路資訊爆炸的年代,資料的分析整理日趨重要,本論文之研究目標正是針對資料做標題生成的處理,為資料自動生成標題,進而將資料加值化,轉化為資訊。研究者首先閱讀英文相關文獻,分析整理後,認為中文的處理方式與英文有所差異,因此,在本論文中,提出與英文不同之中文前置作業與自動標題生成之方法。 研究者針對標題的自動生成提出了幾種特徵值考量,包括候選詞權重值,訓練標題-文本詞彙,標題長度的關係及詞組間距。本論文之研究分為兩階段,第一階段為訓練階段,將文件做前置處理與斷詞,接著訓練標題-文本詞彙與統計文件標題長度的機率。第二階段為執行階段,分析新文件之候選詞權重值,並參照訓練階段之標題-文本詞彙與標題長度之機率值參考表,考量詞組間距後自動為文件產生標題。本論文所採用的訓練文件集來源為1998年至1999年五種報紙,涵蓋不同主題,共84,211篇文件,而測試文件的實驗分為Outside Test與Inside Test兩部分。 研究者為實驗結果進行兩種評估,一為電腦評估,將自動生成之標題與記者所擬訂的標題比對後,計算出求準率、求全率與F1。Outside Test求準率為14.21%、求全率為11.43%、F1為12.67%。Inside Test求準率為15.84%、求全率為12.94%、F1為14.21%。實驗結果顯示,正確率方面與其他文獻之英文文件標題的生成結果(F1=3.2%~24%)相近,但與實際標題仍有差距,因此,在未來工作上,仍有很大的發展空間。二為人為評估,讓使用者在閱讀自動生成之標題後,加以評分。自動生成之標題的流暢度還算不錯。然總結來說,本論文之研究尚屬初始階段,盼未來能更加成熟,並可有更進一步的創新與改進。 | zh_TW |
dc.description.abstract (摘要) | As the number of digital documents on the Internet grows, analyzing and organizing them becomes increasingly important. This thesis proposes an approach to automatic headline generation, which adds value to documents by turning raw data into information. After reviewing the related literature, we present an approach for Chinese documents that differs from those used for English documents. The work is divided into two stages: a training stage and an execution stage. In the training stage, the documents are preprocessed, and we then estimate the probabilities of headline-text words and of headline lengths. In the execution stage, we compute the scores of headline candidates and the gaps between them, consult the trained headline-text word and headline-length probabilities, and automatically generate a headline for each document. The training documents are drawn from CIRB, a test collection for information retrieval; in total, 84,211 Chinese news articles published between 1998 and 1999 were selected. Testing is divided into two parts, an outside test and an inside test. We conducted two evaluations: an automatic evaluation using precision, recall, and F1, and a human assessment. In the outside test, precision is 14.21%, recall is 11.43%, and F1 is 12.67%. In the inside test, precision is 15.84%, recall is 12.94%, and F1 is 14.21%. The automatic evaluation shows that accuracy is still not good enough, while the human assessment shows that our approach can produce human-readable headlines. | en_US |
dc.description.tableofcontents | 目錄<br>第一章 緒論<br>1.1 背景<br>1.2 問題陳述<br>1.3 研究動機<br>1.4 研究目的<br>1.5 研究假定<br>1.6 研究方法與步驟<br>1.7 名詞描述<br>1.8 論文架構與貢獻<br>第二章 文獻探討<br>2.1 文件標題自動生成相關研究<br>2.2 文件標題自動生成之技術說明<br>2.2.1 以字為單位當作標題<br>2.2.2 以句子為單位當作標題<br>2.2.3 各篇文章研究方式總表<br>2.3 各文獻評估方式與實驗結果比較<br>2.4 總結<br>第三章 中文標題之生成<br>3.1 文件標題自動產生之概念形成<br>3.2 前置作業<br>3.3 特徵值分析<br>3.3.1 候選詞權重值<br>3.3.2 標題詞長度<br>3.3.3 候選詞彙的間距<br>3.3.4 演算步驟<br>3.4 總結<br>第四章 實作方法與結果分析<br>4.1 訓練文件與測試文件說明<br>4.2 實驗一<br>4.2.1 前置作業<br>4.2.2 標題長度<br>4.3 實驗二<br>4.3.1 前置作業<br>4.3.2 標題長度<br>4.3.3 文本詞(Text word)也出現在標題(Headline)的機率值<br>4.3.4 測試文件生成標題詞之評估結果<br>4.4 實驗三<br>4.4.1 測試文件生成標題詞之評估結果<br>4.5 分析與探討<br>4.6 總結<br>第五章 人為評估結果與探討<br>5.1 問卷法之自動生成標題長度<br>5.2 問卷發放對象、評分方式與問卷內容<br>5.2.1 問卷對象與評分方式<br>5.2.2 問卷內容<br>5.3 人為評估與實際例子分析<br>5.3.1 問卷評估結果與探討<br>5.3.2 實例分析與探討<br>5.4 總結<br>第六章 結論與未來計劃<br>6.1 結論<br>6.2 未來工作<br>參考文獻<br>附錄一 自動生成之標題與原新聞稿 | zh_TW |
dc.format.extent | 85569 bytes | - |
dc.format.extent | 289029 bytes | - |
dc.format.extent | 187889 bytes | - |
dc.format.extent | 340548 bytes | - |
dc.format.extent | 82908 bytes | - |
dc.format.extent | 196927 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | application/pdf | - |
dc.language.iso | en_US | - |
dc.source.uri (資料來源) | http://thesis.lib.nccu.edu.tw/record/#G0089753004 | en_US |
dc.subject (關鍵詞) | 標題 | zh_TW |
dc.subject (關鍵詞) | 自動生成 | zh_TW |
dc.subject (關鍵詞) | 自然語言 | zh_TW |
dc.subject (關鍵詞) | 生成 | zh_TW |
dc.subject (關鍵詞) | 新聞標題 | zh_TW |
dc.title (題名) | 中文新聞標題自動生成之研究 | zh_TW |
dc.title (題名) | A Study on the Automatic Generation for Headlines of Chinese News Articles | en_US |
dc.type (資料類型) | thesis | en |
dc.relation.reference (參考文獻) | [1] Michele Banko, Vibhu O. Mittal, and Michael J. Witbrock. 2000. “Headline Generation Based on Statistical Translation”. 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, China, 1-8 October. | zh_TW |
dc.relation.reference (參考文獻) | [2] Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. “The mathematics of statistical machine translation: Parameter estimation”. Computational Linguistics, (2): 263-312. | zh_TW |
dc.relation.reference (參考文獻) | [3] Brown, Cocke, Della-Pietra, Della-Pietra, Jelinek, Lafferty, Mercer, Roossin. 1990. “A Statistical Approach to Machine Translation”. Computational Linguistics, 16(2) June. | zh_TW |
dc.relation.reference (參考文獻) | [4] Kuang-hua Chen and Hsin-Hsi Chen. 2001. “The Chinese Text Retrieval Tasks of NTCIR Workshop 2”. Proceedings of the Second NTCIR Workshop Meeting on Evaluation of Chinese & Japanese Text Retrieval and Text Summarization (NTCIR 2), pp. 51-72. | zh_TW |
dc.relation.reference (參考文獻) | [5] G. D. Forney. 1973. “The Viterbi Algorithm”. Proc. of the IEEE, pp. 268-278. | zh_TW |
dc.relation.reference (參考文獻) | [6] Rong Jin and Alexander G. Hauptmann. 2001. “Headline Generation using a Training Corpus”. Second International Conference on Intelligent Text Processing and Computational Linguistics. | zh_TW |
dc.relation.reference (參考文獻) | [7] R. Jin and A. G. Hauptmann. 2000. “Title Generation for Spoken Broadcast News using a Training Corpus”. Proceedings of ICSLP 2000, Beijing, China. | zh_TW |
dc.relation.reference (參考文獻) | [8] S. Katz. 1987. “Estimation of probabilities from sparse data for the language model component of a speech recognizer”. IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 24. | zh_TW |
dc.relation.reference (參考文獻) | [9] Paul E. Kennedy and Alexander G. Hauptmann. 2000. “Automatic Title Generation for EM”. Proceedings of the Fifth ACM Conference on Digital Libraries. | zh_TW |
dc.relation.reference (參考文獻) | [10] G. J. McLachlan and K. E. Basford. 1988. Mixture Models. Marcel Dekker, NY. | zh_TW |
dc.relation.reference (參考文獻) | [11] M. Mitra, Amit Singhal, and Chris Buckley. 1997. “Automatic text summarization by paragraph extraction”. In Proceedings of the ACL’97/EACL’97 Workshop on Intelligent Scalable Text Summarization, Madrid, Spain. | zh_TW |
dc.relation.reference (參考文獻) | [12] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2001. “IBM Research Division Technical Report”. RC22176 (W0109-022), Yorktown Heights, New York. | zh_TW |
dc.relation.reference (參考文獻) | [13] Gerard Salton, A. Singhal, M. Mitra, and C. Buckley. 1997. “Automatic text structuring and summary”. Info. Proc. and Management, 33(2):193-207. | zh_TW |
dc.relation.reference (參考文獻) | [14] T. Strzalkowski, J. Wang, and B. Wise. 1998. “A robust practical text summarization system”. In AAAI Intelligent Text Summarization Workshop, pp. 26-30, Stanford, CA. | zh_TW |
dc.relation.reference (參考文獻) | [15] M. Witbrock and V. Mittal. 1999. “Ultra-Summarization: A Statistical Approach to Generating Highly Condensed Non-Extractive Summaries”. Proceedings of SIGIR 99, Berkeley, CA, August. | zh_TW |
dc.relation.reference (參考文獻) | [16] David Zajic, Bonnie Dorr, and Richard Schwartz. 2002. “Automatic headline generation for newspaper stories”. In Proceedings of the Workshop on Text Summarization, post-conference workshop of ACL-02, Philadelphia, PA. | zh_TW |
dc.relation.reference (參考文獻) | [17]陳光華。電子文獻資料主題分類之自動辨識。行政院國家科學委員會專題研究計畫成果報告,NSC 86-2621-E-002-025T,民國86年9月。 | zh_TW |