Using big data to analyze the reasons for the popularity of Tang poetry (以大數據分析影響唐詩流通度之因素)

Title	以大數據分析影響唐詩流通度之因素 (Using big data to analyze the reasons for the popularity of Tang poetry)
Author	Huang, Tai-Lin (黃泰霖)
Contributors	Advisors: Song, Chwan-Chin (宋傳欽); Jiang, Jyh-Ming (姜志銘). Author: Huang, Tai-Lin (黃泰霖)
Keywords	Big data; Tang poems; Popularity; Principal component analysis; Factor analysis; Word embedding method
Date	2018
Upload time	27-Jul-2018 12:13:54 (UTC+8)
Abstract	This study examines the characteristics of, and reasons for, the popularity of Tang poetry, with the aim of opening a new research direction in Tang poetics. Starting from the data compiled in the book Ranking on Tang Poems (《唐詩排行榜》), we apply two multivariate statistical methods, principal component analysis and factor analysis, to extract the characteristics and factors behind the circulation of Tang poems and to compare the reading preferences of ancient and modern readers. We then use word embedding techniques to test whether rankings by textual similarity agree with the rankings produced by principal component analysis and factor analysis.

Principal component analysis suggests two characteristics: time difference and poem integrity. “Time difference” means that each era has its own pre-understanding and therefore its own aesthetic standard, so ancient and modern readers differ to some degree in which poems they appreciate and prefer. “Poem integrity” means that, depending on the needs of the compiler, a poem circulates either as a complete text or as an excerpted famous line.

Factor analysis yields two factors that may influence the popularity of Tang poetry: history-related strength and poetic classicism. “History-related strength” means that the preferences of both ancient and modern readers are strongly shaped by the historical background of a poem's content. “Poetic classicism” means that, from the standpoint of academic poetics, a poem can be judged by whether it is a classic of a particular school.

In the word embedding study of textual similarity, the first principal component (time difference), the first factor (poetic classicism), and the second factor (history-related strength) each show a significant rank correlation with the textual-similarity ranking of the top-ranked poems under the corresponding component or factor.
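The record does not reproduce the thesis's computations. As a rough illustration of the workflow the abstract describes, the Python sketch below standardizes a small table, finds the first principal component by power iteration on the correlation matrix, and then computes a Spearman rank correlation between the component scores and a list of similarity values. Everything here is an assumption made for illustration: the five-row table, the similarity values, the function names, and the tie-free Spearman formula are invented and are not the author's data or code.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def first_component(z, iters=500):
    """Leading eigenvector of the correlation matrix, via power iteration."""
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def ranks(values):
    """Ranks starting at 1 for the smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def spearman(a, b):
    """Spearman rank correlation for tie-free data."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical data: 5 poems x 3 popularity indicators (invented numbers).
data = [
    [10, 8, 3],
    [7, 9, 5],
    [4, 3, 8],
    [2, 4, 9],
    [6, 5, 6],
]
# Hypothetical textual-similarity value for each poem (invented numbers).
similarity = [0.91, 0.84, 0.35, 0.42, 0.60]

z = standardize(data)
v = first_component(z)
# Score of each poem on the first principal component.
scores = [sum(zi * vi for zi, vi in zip(row, v)) for row in z]
rho = spearman(scores, similarity)
print("loadings:", [round(x, 3) for x in v])
print("rank correlation:", round(rho, 3))
```

Power iteration is used only to keep the sketch dependency-free; a numerical library's eigendecomposition would be the usual choice for a real dataset, and tied values would need average ranks.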
References	Gao, J. (2018). Chinese-poetry. https://github.com/chinese-poetry/chinese-poetry
	Johnson, R. and Wichern, D. (2007). Applied multivariate statistical analysis (6th ed.). Prentice Hall, Upper Saddle River, NJ.
	Le, Q. V. and Mikolov, T. (2014). Distributed representations of sentences and documents. Computing Research Repository, arXiv:1405.4053.
	Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. Computing Research Repository, arXiv:1301.3781.
	Řehůřek, R. and Sojka, P. (2010). Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta. ELRA. http://is.muni.cz/publication/884893/en
	王兆鵬、張靜、邵大為、唐元 (2011). 唐詩排行榜 (1st ed.). Beijing: 中華書局.
	王宏林 (2012). 論唐詩經典的基本屬性，建構要素及途徑. 許昌學院學報, 31(4):54,58.
	蔣寅 (2003). 中國古代文學通論隋唐五代卷 (1st ed.). Liaoning: 人民出版社.
	趙義山、李修生 (2010). 中國分體文學史詩歌卷修本 (2nd ed.). Shanghai: 上海古籍出版社.
	陳耀茂 (1999). 多變量解析方法與應用 (1st ed.). Taipei: 五南圖書出版公司.
	魯迅 (2005). 魯迅全集第 13卷 (1st ed.). Beijing: 人民文學出版社.
Description	Master's thesis; National Chengchi University (國立政治大學); Department of Applied Mathematics (應用數學系); 104751013
Source	http://thesis.lib.nccu.edu.tw/record/#G0104751013
Type	thesis

dc.contributor.advisor	宋傳欽; 姜志銘	zh_TW
dc.contributor.advisor	Song, Chwan-Chin; Jiang, Jyh-Ming	en_US
dc.contributor.author (Authors)	黃泰霖	zh_TW
dc.contributor.author (Authors)	Huang, Tai-Lin	en_US
dc.creator (Author)	黃泰霖	zh_TW
dc.creator (Author)	Huang, Tai-Lin	en_US
dc.date (Date)	2018	en_US
dc.date.accessioned	27-Jul-2018 12:13:54 (UTC+8)	-
dc.date.available	27-Jul-2018 12:13:54 (UTC+8)	-
dc.date.issued (Upload time)	27-Jul-2018 12:13:54 (UTC+8)	-
dc.identifier (Other Identifiers)	G0104751013	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/118960	-
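dc.description (Description)	Master's degree (碩士)	zh_TW
dc.description (Description)	National Chengchi University (國立政治大學)	zh_TW
dc.description (Description)	Department of Applied Mathematics (應用數學系)	zh_TW
dc.description (Description)	104751013	zh_TW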
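dc.description.tableofcontents	zh_TW
	Acknowledgements
	Chinese Abstract
	Abstract
	Table of Contents
	List of Tables
	List of Figures
	Chapter 1 Introduction
		Section 1 Research Background
		Section 2 Research Objectives
		Section 3 Thesis Structure
			1. Structure and content of each chapter
			2. Research flowchart
	Chapter 2 Literature Review
		Section 1 Introduction to Ranking on Tang Poems (《唐詩排行榜》)
		Section 2 Data Collection Method
		Section 3 Influence Formula
	Chapter 3 Research Methods
		Section 1 Principal Component Analysis
		Section 2 Factor Analysis
		Section 3 Word Embedding Method
	Chapter 4 Application of Principal Component Analysis to the Tang Poetry Ranking Data
		Section 1 Computation Procedure and Statistical Reports
		Section 2 Analysis of Results
			1. First principal component
			2. Second principal component
	Chapter 5 Application of Factor Analysis to the Tang Poetry Ranking Data
		Section 1 Computation Procedure and Statistical Reports
		Section 2 Analysis of Results
			1. First factor
			2. Second factor
	Chapter 6 Application of the Word Embedding Method to the Tang Poetry Ranking Data
		Section 1 Building Vectors for the 100 Tang Poems
		Section 2 Computing Similarity Between Poems
			1. Similarity computed against a single reference poem
			2. Weighted similarity computed against multiple reference poems
		Section 3 Correlation Between the Word Embedding Results and the Principal Component Analysis and Factor Analysis Results
	Chapter 7 Conclusion
	Appendix A Tang Poetry Ranking Data
	Appendix B Word Embedding Code
	References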
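The table of contents names two similarity computations in Chapter 6, one against a single reference poem and one weighted across several reference poems, but this record does not give their formulas. The sketch below shows one plausible reading under stated assumptions: cosine similarity between poem vectors, and a weighted average of cosine similarities as the multi-poem score. The function names, the tiny two-dimensional vectors, and the weights are hypothetical and are not taken from the thesis.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def weighted_similarity(target, bases, weights):
    """Weighted average of the target's cosine similarity to each base vector.

    Assumption: the weights are non-negative and not all zero. How the thesis
    actually weights its reference poems is not stated in this record.
    """
    total = sum(weights)
    return sum(w * cosine(target, b) for b, w in zip(bases, weights)) / total

# Hypothetical poem vectors (real embeddings would have many more dimensions).
target = [1.0, 0.0]
base_poems = [[1.0, 0.0], [0.0, 1.0]]
base_weights = [3.0, 1.0]

print(cosine(target, base_poems[0]))
print(weighted_similarity(target, base_poems, base_weights))
```

A weighted average keeps the score on the same scale as a single cosine similarity, which makes the single-poem and multi-poem rankings directly comparable.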
dc.format.extent	1561494 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0104751013	en_US
dc.subject (Keywords)	大數據	zh_TW
dc.subject (Keywords)	唐詩	zh_TW
dc.subject (Keywords)	流通性	zh_TW
dc.subject (Keywords)	主成分分析	zh_TW
dc.subject (Keywords)	因子分析	zh_TW
dc.subject (Keywords)	詞嵌入法	zh_TW
dc.subject (Keywords)	Big data	en_US
dc.subject (Keywords)	Tang poems	en_US
dc.subject (Keywords)	Popularity	en_US
dc.subject (Keywords)	Principal component analysis	en_US
dc.subject (Keywords)	Factor analysis	en_US
dc.subject (Keywords)	Word embedding method	en_US
dc.title (Title)	以大數據分析影響唐詩流通度之因素	zh_TW
dc.title (Title)	Using big data to analyze the reasons for the popularity of Tang poetry	en_US
dc.type (Type)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/THE.NCCU.MATH.003.2018.B01	-