Title: From Classical to Vernacular Chinese: A Statistical Study of Language Change in the Magazine Xin Qing Nian (從文言到白話:《新青年》雜誌語言變化統計研究)
Alternative title: From Classical Chinese to Modern Chinese: A Study of Function Words from Xin Qing Nian
Authors: Ho, Li-Hsing (何立行); Yue, Ching-Syang (余清祥); Cheng, Wen-Huei (鄭文惠)
Contributor: Department of Chinese Literature (中文系)
Keywords: Stylistic Analysis; May Fourth Movement; La Jeunesse (《新青年》); Function Words Analysis; Species Diversity
Date: 2014-12
Upload time: 10-Jul-2017 11:38:43 (UTC+8)
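Abstract: One of the most important differences between modern and premodern Chinese is that the modern written language is dominated by the colloquial style, also called vernacular Chinese (baihua), as opposed to the classical Chinese (wenyan) of earlier periods. Scholars who study the development of the modern vernacular usually trace it back to the periodicals run by missionaries in the late Qing, but it was the May Fourth Movement that enabled the vernacular to replace classical Chinese as the mainstream written language, and no publication of the May Fourth era championed the vernacular more vigorously than the magazine Xin Qing Nian. Previous scholars have relied mainly on textual analysis, examining the magazine's contribution to the spread of the vernacular and the development of vernacular literature in terms of theory building, creative practice, and polemical advocacy. Yet how long did it take for written Chinese to shift from predominantly classical to predominantly vernacular, from the moment literati and scholars advanced their proposals in periodicals such as Xin Qing Nian to the point of genuine adoption across society? What did the transition look like? When did the vernacular replace classical Chinese, and how can this be demonstrated? These questions are difficult to answer with traditional research methods, because even the most diligent researcher cannot read through, in chronological order, the enormous body of surviving texts from around the May Fourth period, classifying each as classical or vernacular and tallying their rise and decline. Can digital research methods open another path to the answers? A worthwhile starting point may be to build an effective tool that identifies classical and vernacular texts objectively rather than intuitively. This paper takes the full text of all eleven volumes of Xin Qing Nian as its material, uses statistical methods to compare the volumes, observes the course of the language shift, and looks for indicators that could support objective identification of classical and vernacular texts. The methods fall broadly into two categories: supervised learning and unsupervised learning. The first fixes the indicators (or variables, or keywords) for comparison in advance and then analyzes how those indicators behave in each volume; the second presets no target of comparison and examines writing style from different angles in order to find the key factors that separate classical from vernacular texts. For supervised learning, this study selects specific function words of classical and vernacular Chinese. Function words rather than content words are counted in order to minimize the influence of an article's subject matter on its linguistic form, and to test whether classical and vernacular texts can be told apart by their habitual function words. The unsupervised analysis focuses on character usage and sentence structure, because statistics such as vocabulary size and usage frequency have long been used in comparative literature to judge writing style. Both the supervised function-word analysis and the unsupervised analysis of character-usage habits reflect the change in style between the early and late volumes of Xin Qing Nian. For developing an objective identification tool, function words may be the more promising indicator. Notably, in comparisons of total character count, number of distinct characters, and characters per sentence, we find a clear difference between the two: classical texts have fewer total characters but draw on more distinct characters, whereas vernacular texts have more total characters but draw on fewer distinct characters. This suggests that the vernacular chiefly served popular enlightenment, which would explain its larger total character count and smaller character inventory. In addition, we may borrow the concept of biodiversity to ask how the internal "ecosystems" of classical and vernacular Chinese differ, and to consider which linguistic features, beyond function words, total vocabulary, and usage proportions, have the potential to become objective discriminating indicators and merit further development.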
Is it possible for computers to tell whether a text was written in classical Chinese or vernacular modern Chinese? Can new developments in the digital humanities help trace the transformation of written Chinese during the late Qing and early Republic? As previous scholars have pointed out, in the early stage of the history of modern Chinese, missionaries and reformists used vernacular language only as a tool to enlighten the public. Classical Chinese remained the standard written language until the May Fourth Movement in 1919, when Xin Qing Nian became the most influential publication. Throughout the last century, scholars have scrutinized the theoretical arguments and creative writing practices in Xin Qing Nian and several other progressive magazines to delineate the changing history of the language. But questions such as how long it took for literati as well as the general public to adopt the vernacular language as the written standard, or how the new standard spread from radical revolutionary magazines to other publications like entertainment magazines or newspapers, remain unanswered. If we can teach computers to distinguish between classical and modern Chinese, it would be possible to bring in many more digitized texts from that period to study and to answer those questions. To achieve this goal, we adopt the concept of "genome mapping" to differentiate between classical and modern Chinese in this study. We propose two approaches, supervised learning and unsupervised learning, to compare the differences in writing style between classical Chinese and modern Chinese. In addition to concepts and methods used in lexical analysis, we also adapt ideas from ecology. Supervised learning has long been used in linguistics to differentiate authorship via keywords. We choose ten function words each for classical and modern Chinese as the keywords, and we use Gini's index of volumes 1 and 11 from Xin Qing Nian to demonstrate the comparison.
There are no standard operating procedures for applying unsupervised learning, which is the main reason why this type of approach is difficult to implement. In this study, we choose diversity indices for unsupervised learning, for example, Gini's index, entropy, and Simpson's index, to measure the statistical dispersion and evenness (or equality) of the words used. Based on our analyses, it seems that the later volumes (such as Volume 11) have lower species diversity, suggesting that readers could follow the articles without needing to recognize as many distinct words, which matches the purpose of the May Fourth Movement.
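The two kinds of measurement named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions: the marker strings are hypothetical examples of common classical and vernacular function characters, not the paper's ten-word lists; the function names are invented for illustration; and "Gini's index" is taken in its Gini-Simpson form, 1 - Σp², although the paper may define it differently.

```python
from collections import Counter
from math import log

# HYPOTHETICAL marker sets for illustration only; not the paper's keyword lists.
HYPOTHETICAL_CLASSICAL = "之乎者也矣"
HYPOTHETICAL_VERNACULAR = "的了嗎呢"


def char_frequencies(text):
    """Count CJK characters only, skipping punctuation, digits, and whitespace."""
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


def diversity_indices(counts):
    """Return entropy, Simpson's index, and the Gini-Simpson index of a frequency table."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no characters to analyse")
    probs = [n / total for n in counts.values()]
    simpson = sum(p * p for p in probs)  # probability two random draws match
    return {
        "tokens": total,                 # total characters
        "types": len(counts),            # distinct characters
        "entropy": -sum(p * log(p) for p in probs),  # Shannon entropy in nats
        "simpson": simpson,
        "gini_simpson": 1.0 - simpson,   # assumed reading of "Gini's index"
    }


def vernacular_share(counts, classical_markers, vernacular_markers):
    """Share of marker occurrences that are vernacular; None if no marker occurs."""
    classical = sum(counts[ch] for ch in classical_markers)
    vernacular = sum(counts[ch] for ch in vernacular_markers)
    if classical + vernacular == 0:
        return None
    return vernacular / (classical + vernacular)


# Toy inputs, not drawn from Xin Qing Nian.
samples = {
    "classical-like": "學而時習之,不亦說乎",
    "vernacular-like": "我們的朋友來了,我們的心裡很高興的",
}
for label, text in samples.items():
    counts = char_frequencies(text)
    print(label, diversity_indices(counts),
          vernacular_share(counts, HYPOTHETICAL_CLASSICAL, HYPOTHETICAL_VERNACULAR))
```

Under these definitions, lower entropy and a higher Simpson's index both mean that usage is concentrated on fewer characters, which is the direction the abstract reports for the later volumes. Texts of different lengths are not directly comparable on the raw indices, so a real analysis would need some normalization by volume size.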
Relation: 東亞觀念史集刊 (Journal of the History of Ideas in East Asia), 7, 427-454
Type: article
dc.contributor: Department of Chinese Literature (中文系)
dc.creator (authors): 何立行; 余清祥; 鄭文惠 [zh-tw]
dc.creator (authors): Ho, Li-Hsing; Yue, Ching-Syang; Cheng, Wen-Huei [en-US]
dc.date (date): 2014-12
dc.date.accessioned: 10-Jul-2017 11:38:43 (UTC+8)
dc.date.available: 10-Jul-2017 11:38:43 (UTC+8)
dc.date.issued (upload time): 10-Jul-2017 11:38:43 (UTC+8)
dc.identifier.uri (URI): http://nccur.lib.nccu.edu.tw/handle/140.119/110745
dc.format.extent: 1861251 bytes
dc.format.mimetype: application/pdf
dc.relation (relation): 東亞觀念史集刊 (Journal of the History of Ideas in East Asia), 7, 427-454
dc.subject (keywords): 文體分析; 五四運動; 《新青年》; 虛字分析; 生物多樣性
dc.subject (keywords): Stylistic Analysis; May Fourth Movement; La Jeunesse; Function Words Analysis; Species Diversity
dc.title (title): 從文言到白話:《新青年》雜誌語言變化統計研究 [zh_TW]
dc.title.alternative (alternative title): From Classical Chinese to Modern Chinese: A Study of Function Words from Xin Qing Nian
dc.type (type): article