Comparing Writing Styles of United Daily News and People’s Daily


Title	Comparing Writing Styles of United Daily News and People’s Daily (《聯合報》及《人民日報》報導風格比較)
Author	廖靖芸 (Liao, Ching-Yun)
Contributors	陳怡如 (Chen, Yi-Ju); 余清祥 (Yue, Ching-Syang); 廖靖芸 (Liao, Ching-Yun)
Keywords	Text Mining; Writing Style; Species Diversity; Keywords; Association
Date	2023
Uploaded	1-Sep-2023 14:57:14 (UTC+8)
Abstract	As the saying goes, “each place’s land and water nurture its own people”: because of differences in institutions, ways of life, and beliefs, the residents of two places that share a written language and an ethnic heritage can still differ markedly in cultural character. China and Taiwan both belong to the Huaxia (Chinese) tradition and have similar languages, cultures, and family systems, but since the 1950s the two sides of the Taiwan Strait have adopted different political systems. Together with foreign cultural influence and ethnic integration, this has made the differences in customs between Taiwan and China increasingly evident over time. This study compares newspaper writing from China and Taiwan, using text mining and related methods to analyze writing style and to identify clear differences in word usage and ideas across the Strait. For China, it uses front-page reports from People’s Daily (1946-2021), the official organ of the Chinese Communist Party, which has recorded major news since the founding of the People’s Republic of China. For Taiwan, it uses editorials from United Daily News (1960-2021), one of Taiwan’s three major newspapers and the one with the longest history.
The study takes an exploratory data analysis (EDA) approach and borrows the ecological concepts of species diversity and habitat, treating single characters and two-character words as biological species in order to explore writing style and the associations and clusters among keywords. First, it examines the content and structure of the two newspapers, including punctuation marks, function words, and sentence structure, and uses richness and unevenness indices such as entropy and the type-token ratio (TTR) to extract the key features of each newspaper. The two newspapers differ clearly in both wording and structure: diversity, unevenness, and sentence length all reflect how each paper wrote under different historical events. Next, keyword-detection methods (TF-IDF and TextRank) and word frequency are used to select antecedent words; the features obtained from the content and structure analysis are then added to search for keyword clusters, and the chi-square test of independence and association indices identify the subsidiary words most strongly associated with each antecedent.
For example, in People’s Daily the antecedent “Taiwan” (臺灣) is often mentioned together with “Tibet” (西藏) and “question” (問題), and in the first period, beginning in 1946, it also co-occurs with two-character words related to the Chinese Civil War, such as “remnant bandits” (殘匪) and “eliminate” (消滅). In United Daily News, “Taiwan” frequently co-occurs with “independence” (獨立) from the second period (1979-1987) onward, and in the fourth period (2002-2021) it is joined by “territory” (領土), a word that places greater emphasis on sovereignty. Deconstructing the content and structure of the newspapers thus shows that changes in word groups closely track historical events and major contemporary issues on both sides of the Strait, and the changing parts of speech of the subsidiary words further reveal how the two newspapers differ in word choice.
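The abstract treats single characters and two-character words as “species” and summarizes each text with richness and unevenness indices such as entropy and TTR. The Python sketch below shows one way to compute such indices. It assumes natural-log Shannon entropy, two-character words taken as overlapping character bigrams within punctuation-delimited runs, and Pielou’s evenness (entropy divided by the log of the number of types) as an illustrative evenness measure. The punctuation set, the function names, and the choice of evenness index are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

# Hypothetical punctuation/whitespace set used to split text; adjust to the corpus.
PUNCTUATION = set("，。、；：？！「」『』（）《》〈〉…,.;:?!()\"' \n\t")


def char_tokens(text):
    """Single characters treated as 'species'; punctuation and whitespace dropped."""
    return [ch for ch in text if ch not in PUNCTUATION]


def bigram_tokens(text):
    """Overlapping two-character strings inside each punctuation-free run."""
    tokens, run = [], []
    for ch in text + "\n":  # trailing newline flushes the last run
        if ch in PUNCTUATION:
            tokens.extend("".join(run[i:i + 2]) for i in range(len(run) - 1))
            run = []
        else:
            run.append(ch)
    return tokens


def diversity(tokens):
    """Richness and evenness summaries of a non-empty token list."""
    if not tokens:
        raise ValueError("empty token list")
    counts = Counter(tokens)
    n = sum(counts.values())  # number of tokens
    s = len(counts)           # number of distinct types
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return {
        "tokens": n,
        "types": s,
        "entropy": entropy,                                   # Shannon entropy (nats)
        "ttr": s / n,                                         # type-token ratio
        "evenness": entropy / math.log(s) if s > 1 else 0.0,  # Pielou's J (illustrative)
    }


sample = "臺灣問題，臺灣獨立。"
print(diversity(char_tokens(sample)))    # tokens=8, types=6
print(diversity(bigram_tokens(sample)))  # tokens=6, types=5
```

TTR falls as a text grows longer, so comparisons between periods or newspapers are only meaningful on samples of similar size, for example fixed-length random samples drawn from each period.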
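To illustrate how TF-IDF can pick out candidate antecedent words, the following sketch scores each term in a set of tokenized documents by term frequency times log(N / document frequency) and returns the top-ranked terms per document. Grouping the tokens hypothetically as one document per period, the unsmoothed IDF, the function name, and the toy data are all assumptions for illustration; TextRank and plain frequency ranking, which the thesis also uses, are not shown.

```python
import math
from collections import Counter


def tfidf_top_terms(docs, k=5):
    """Rank terms in each document by TF-IDF.

    docs: dict mapping a document label (hypothetically, a period) to its token list.
    Returns a dict mapping each label to its top-k (term, score) pairs.
    """
    n_docs = len(docs)
    df = Counter()  # document frequency of each term
    for tokens in docs.values():
        df.update(set(tokens))

    top = {}
    for label, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken by term for reproducible output.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        top[label] = ranked[:k]
    return top


# Toy token lists, not drawn from the thesis corpus.
periods = {
    "period_1": ["臺灣", "問題", "臺灣", "殘匪"],
    "period_2": ["臺灣", "獨立", "臺灣", "經濟"],
}
for label, terms in tfidf_top_terms(periods, k=2).items():
    print(label, terms)
```

With this unsmoothed IDF, a term that appears in every document scores exactly zero, which suppresses words shared by all periods (here 臺灣); smoothed IDF variants avoid exact zeros if such words should still be ranked.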
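The association step can be sketched with a 2×2 contingency table per candidate word (how many text units contain the antecedent, the candidate, both, or neither) followed by Pearson’s chi-square test of independence with one degree of freedom and no continuity correction. This sketch assumes sentences as the unit of co-occurrence and keeps only positively associated candidates; the unit, the filtering rule, the toy sentences, and all function names are hypothetical, and the thesis’s own association indices are not reproduced here.

```python
import math


def cooccurrence_table(sentences, antecedent, candidate):
    """2x2 counts over text units: (both, antecedent only, candidate only, neither)."""
    a = b = c = d = 0
    for tokens in sentences:
        has_x = antecedent in tokens
        has_y = candidate in tokens
        if has_x and has_y:
            a += 1
        elif has_x:
            b += 1
        elif has_y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and its 1-df p-value."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column carries no evidence
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def rank_subsidiary_words(sentences, antecedent, top=5):
    """Candidate words positively associated with the antecedent, ranked by chi-square."""
    vocab = set().union(*map(set, sentences)) - {antecedent}
    scored = []
    for word in vocab:
        a, b, c, d = cooccurrence_table(sentences, antecedent, word)
        if a * d <= b * c:  # skip independent or negatively associated words
            continue
        chi2, p = chi_square_2x2(a, b, c, d)
        scored.append((word, chi2, p))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


# Toy sentences (not from the thesis corpus), already tokenized.
sentences = [
    ["臺灣", "西藏"], ["臺灣", "西藏"], ["臺灣", "經濟"],
    ["北京", "經濟"], ["北京", "會議"], ["上海", "會議"],
]
print(rank_subsidiary_words(sentences, "臺灣"))  # only 西藏 remains, with chi2 = 3.0
```

The chi-square approximation is unreliable when expected cell counts are small (a common rule of thumb is below 5), so in practice rare candidates would be filtered out by a minimum co-occurrence count before ranking.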
Description	Master’s thesis, Department of Statistics, National Chengchi University, 110354019
Source	http://thesis.lib.nccu.edu.tw/record/#G0110354019
Type	thesis

dc.contributor.advisor	陳怡如; 余清祥	zh_TW
dc.contributor.advisor	Chen, Yi-Ju; Yue, Ching-Syang	en_US
dc.contributor.author (Authors)	廖靖芸	zh_TW
dc.contributor.author (Authors)	Liao, Ching-Yun	en_US
dc.creator (Author)	廖靖芸	zh_TW
dc.creator (Author)	Liao, Ching-Yun	en_US
dc.date (Date)	2023	en_US
dc.date.accessioned	1-Sep-2023 14:57:14 (UTC+8)	-
dc.date.available	1-Sep-2023 14:57:14 (UTC+8)	-
dc.date.issued (Upload time)	1-Sep-2023 14:57:14 (UTC+8)	-
dc.identifier (Other Identifiers)	G0110354019	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/146904	-
dc.description (Description)	Master’s	en_US
dc.description (Description)	National Chengchi University	en_US
dc.description (Description)	Department of Statistics	en_US
dc.description (Description)	110354019	zh_TW
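dc.description.tableofcontents	Chapter 1 Introduction (1.1 Research Motivation; 1.2 Research Objectives). Chapter 2 Literature Review (2.1 Previous Studies; 2.2 Data; 2.3 Methodology). Chapter 3 Vocabulary and Two-Character Words (3.1 Differences in Length; 3.2 Diversity; 3.3 Unevenness; 3.4 Similarity and Clustering). Chapter 4 Article Structure (4.1 Punctuation and Sentence Length; 4.2 Function Words; 4.3 Parts of Speech; 4.4 Country, City, and Personal Names). Chapter 5 Keyword Clusters (5.1 Antecedent and Subsidiary Words; 5.2 Nouns and Verbs; 5.3 Countries and Cities; 5.4 Personal Names; 5.5 Word Groups Common to Both Newspapers). Chapter 6 Conclusions and Suggestions (6.1 Conclusions; 6.2 Suggestions and Limitations). References. Appendix.	en_US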
dc.format.extent	16232763 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0110354019	en_US
dc.subject (Keywords)	Text Mining	en_US
dc.subject (Keywords)	Writing Style	en_US
dc.subject (Keywords)	Species Diversity	en_US
dc.subject (Keywords)	Keywords	en_US
dc.subject (Keywords)	Association	en_US
dc.title (Title)	《聯合報》及《人民日報》報導風格比較	zh_TW
dc.title (Title)	Comparing Writing Styles of United Daily News and People’s Daily	en_US
dc.type (Type)	thesis	en_US
