Academic Output: Theses

Title 物種個數的估計與寫作風格的探討 (Estimation of the Number of Species and an Exploration of Writing Style)
Author 李蕙帆
Contributors 余清祥
李蕙帆
Keywords Number of species
Writing style
Coverage probability
The Dream of Red Chamber
Jackknife
Date 2008
Uploaded 18-Sep-2009 20:11:08 (UTC+8)
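Abstract In ecology and biology, the number of species is usually an important measure of species diversity, and the number and distribution of species are closely related to diversity. The concept of a "species" is not limited to living organisms: keywords used by web search engines, the categories of a library classification, and international disease codes can all be regarded as species.
This thesis compares writing styles by studying the famous Chinese novel “The Dream of Red Chamber”. Its main question is whether the first 80 chapters and the last 40 chapters were written by the same author. Estimates of the number of species serve as the criterion for comparing writing styles, and the martial-arts novels of Jin Yong serve as a control group for validating the results. Besides the stochastic model of Efron and Thisted, the thesis considers the Jackknife, the Bootstrap, and the Chao and Lee (1992) estimators, which estimate the number of species in the population from block sampling. The Efron and Thisted estimator tends to oscillate unstably and may fail to converge, whereas the Bootstrap, Jackknife, and Chao and Lee (1992) estimators tend to overestimate the number of species in the population. Using the concept of coverage probability, the study finds that when a particular proportion of the population is sampled, the probability that the Jackknife and Chao estimates cover the population number of species is very close to 1.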
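The block-sampling estimators named above work on incidence data: for each word, how many blocks of text it appears in. The sketch below is a minimal illustration that assumes the standard first- and second-order jackknife forms of Burnham and Overton (1978, 1979), applied with blocks in place of capture occasions, together with the common bootstrap form S_obs + Σ(1 − p_j)^m (Smith and van Belle, 1984). The function names are hypothetical, and the thesis's own variants and its later modifications are not reproduced here.

```python
from collections import Counter
from typing import Iterable, Sequence


def block_incidence(blocks: Sequence[Iterable[str]]) -> Counter:
    """Count, for each word, the number of blocks in which it appears at least once."""
    incidence: Counter = Counter()
    for block in blocks:
        incidence.update(set(block))  # set(): record presence per block, not frequency
    return incidence


def jackknife_first_order(blocks: Sequence[Iterable[str]]) -> float:
    """S_obs + Q1 * (m - 1) / m, where Q1 = words found in exactly one block."""
    m = len(blocks)
    incidence = block_incidence(blocks)
    q1 = sum(1 for c in incidence.values() if c == 1)
    return len(incidence) + q1 * (m - 1) / m


def jackknife_second_order(blocks: Sequence[Iterable[str]]) -> float:
    """S_obs + Q1(2m - 3)/m - Q2(m - 2)^2 / (m(m - 1)); needs at least two blocks."""
    m = len(blocks)
    if m < 2:
        raise ValueError("second-order jackknife needs at least two blocks")
    incidence = block_incidence(blocks)
    q1 = sum(1 for c in incidence.values() if c == 1)
    q2 = sum(1 for c in incidence.values() if c == 2)
    return len(incidence) + q1 * (2 * m - 3) / m - q2 * (m - 2) ** 2 / (m * (m - 1))


def bootstrap_estimate(blocks: Sequence[Iterable[str]]) -> float:
    """S_obs + sum over observed words of (1 - p_j)^m, p_j = share of blocks containing word j."""
    m = len(blocks)
    incidence = block_incidence(blocks)
    return len(incidence) + sum((1 - c / m) ** m for c in incidence.values())


if __name__ == "__main__":
    # Toy example: three "blocks" of text, already split into words.
    blocks = [["a", "b", "c"], ["a", "d"], ["a", "b", "e"]]
    print(jackknife_first_order(blocks))  # 7.0  (5 observed words + 3 singletons * 2/3)
    print(jackknife_second_order(blocks))
    print(bootstrap_estimate(blocks))
```

All three estimators add a correction to the observed count that is driven by words seen in only one or two blocks, which is consistent with the tendency to overestimate reported above when many rare words are present.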
The number of species is frequently used to measure the species diversity of a population in ecology and biology, and the number and distribution of species are closely related to diversity. The idea of species is not restricted to biology and has found more applications in recent years, for example keywords in search engines, classification categories in a library, and disease types in health measurement.
This thesis studies the well-known Chinese novel “The Dream of Red Chamber”, with the goal of determining whether the first 80 and the last 40 chapters were written by the same author. In particular, methods for estimating the number of species are used for this purpose, and several Chinese martial-arts novels by the famous writer Jin Yong serve as a control group for checking the methods. The methods considered include Efron and Thisted's model, the Jackknife, the Bootstrap, and the estimator of Chao and Lee (1992). We found that Efron and Thisted's estimates tend to be unstable and slow to converge, while the Jackknife, Bootstrap, and Chao estimates tend to be biased upward, overestimating the number of species. However, after some modifications, the Jackknife and Chao estimates can provide reliable predictions of the number of species in a finite population when part of the population is observed.
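The word-frequency side of the comparison can be sketched from a tokenized text. Efron and Thisted (1976) start from the Good-Toulmin series Δ(t) = Σ (−1)^(x+1) n_x t^x, the expected number of new words in a further sample t times the size of the observed one, where n_x is the number of distinct words seen exactly x times. For t > 1 the alternating terms grow, which is one way to see the instability noted above; Efron and Thisted discuss modified versions for that range, which this sketch does not reproduce. The sketch also includes the abundance-based sample-coverage estimator of Chao and Lee (1992). The thesis applies Chao's estimator under block sampling, so its variant may differ; the function names here are hypothetical and tokenization is assumed to be done already.

```python
from collections import Counter
from typing import Iterable


def frequency_of_frequencies(tokens: Iterable[str]) -> Counter:
    """Return n_x: how many distinct words occur exactly x times in the text."""
    return Counter(Counter(tokens).values())


def new_words_good_toulmin(tokens: Iterable[str], t: float) -> float:
    """Unmodified series sum_x (-1)^(x+1) * n_x * t^x for the expected number of
    new words in a further sample t times the size of the observed one."""
    n_x = frequency_of_frequencies(tokens)
    return sum((-1) ** (x + 1) * nx * t ** x for x, nx in n_x.items())


def chao_lee_1992(tokens: Iterable[str]) -> float:
    """Sample-coverage estimate of the total number of distinct words (abundance form)."""
    word_counts = Counter(tokens)
    n = sum(word_counts.values())      # total number of word tokens
    d = len(word_counts)               # distinct words observed
    f = Counter(word_counts.values())  # f[i] = number of words seen exactly i times
    f1 = f.get(1, 0)
    if f1 == n:
        raise ValueError("estimated sample coverage is zero (every word is a singleton)")
    coverage = 1 - f1 / n              # Good's estimate of sample coverage
    n0 = d / coverage
    # Squared coefficient of variation of the word probabilities, truncated at zero.
    cv2 = max(n0 * sum(i * (i - 1) * fi for i, fi in f.items()) / (n * (n - 1)) - 1, 0.0)
    return n0 + f1 / coverage * cv2


if __name__ == "__main__":
    tokens = "a a a b b c d".split()
    print(new_words_good_toulmin(tokens, 0.5))  # 0.875
    print(new_words_good_toulmin(tokens, 2))    # 8: terms 4, -4, 8 grow once t > 1
    print(chao_lee_1992(tokens))
```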
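References Burnham, K. P. & Overton, W. S. (1978). Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika, 65(3), 625-633.
Burnham, K. P. & Overton, W. S. (1979). Robust estimation of population size when capture probabilities vary among animals. Ecology, 60(5), 927-936.
Chao, A. & Lee, S.-M. (1992). Estimating the number of classes via sample coverage. Journal of the American Statistical Association, 87, 210-217.
Chao, A., Ma, M. C., & Yang, M. C. K. (1993). Stopping rule and estimation for recapture debugging with unequal detection rates. Biometrika, 80, 193-201.
Efron, B. & Thisted, R. (1976). Estimating the number of unseen species: How many words did Shakespeare know? Biometrika, 63, 435-447.
Efron, B. & Thisted, R. (1987). Did Shakespeare write a newly-discovered poem? Biometrika, 74, 445-455.
Frangos, C. C. (1980). Variance estimation for the second-order jackknife. Biometrika, 67, 715-718.
Sharot, T. (1976). Sharpening the jackknife. Biometrika, 63, 315-321.
Viale, D. (1994). Cetaceans as indicators of a progressive degradation of Mediterranean water quality. International Journal of Environmental Studies, 45, 183-198.
Yue, J. C., Clayton, M. K., & Lin, F. (2001). A nonparametric estimator of species overlap. Biometrics, 57, 743-749.
Yue, J. C. & Clayton, M. K. (2005). Similarity measures based on species proportions. Communications in Statistics: Theory and Methods, 34, 2123-2131.
王三慶 (1994). 紅樓夢電腦 《紅樓夢》研究與電腦科技 [The Dream of Red Chamber and computers: research on The Dream of Red Chamber and computer technology]. Paper, 1994 (甲戌年) Taiwan Literature Conference.
王吉松 (1999). 以用字分析紅樓夢之作者問題 [Analyzing the authorship of The Dream of Red Chamber through character usage]. Master's thesis, National Chengchi University.
余清祥 (1998). 統計在紅樓夢的應用 [Applications of statistics to The Dream of Red Chamber]. 國立政治大學學報 [National Chengchi University Journal], 76, 303-327.
李丕強 (2005). 區塊抽樣的種類數估計與相似性指標 [Estimating the number of species and similarity indices under block sampling]. Doctoral dissertation, National Tsing Hua University.
高陽 (2005). 高陽說曹雪芹 [Gao Yang on Cao Xueqin]. 聯經出版社 [Linking Publishing].
陳大康 (1987). 從數理語言學看後四十回的作者：與陳炳藻先生商榷 [The author of the last forty chapters from the viewpoint of mathematical linguistics: a discussion with Mr. Chen Bingzao]. 紅樓夢學刊 [Journal of Red Chamber Studies], 31, 293-318.
陳炳藻 (1982). 從字彙上的統計論紅樓夢作者問題 [On the authorship of The Dream of Red Chamber from vocabulary statistics]. 中報月刊 [Zhongbao Monthly], April, 46-51.
陳炳藻 (1983). 從「電腦紅學」說起 [Starting from "computational Redology"]. 中報月刊 [Zhongbao Monthly], February, 59-61.
趙蓮菊 (1995). 種類知多少：敬獻給衛台灣環保努力的朋友們 [How many species are there? Dedicated to friends working to protect Taiwan's environment]. 數學傳播 [Mathmedia], 19(2), 1-7.
劉玳縈 (2008). 種類數預測之模擬研究 [A simulation study of predicting the number of species]. Thesis, National Tsing Hua University.
薛明生、賴世剛 (2002). 人口時空分布冪次定律的普遍性與恆常性：台灣本島實證研究 [The universality and constancy of power laws in the spatio-temporal distribution of population: an empirical study of Taiwan's main island]. 台灣土地研究 [Journal of Taiwan Land Research], 5, 67-86.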
Description Master's thesis
National Chengchi University
Graduate Institute of Statistics
96354024
97
Source http://thesis.lib.nccu.edu.tw/record/#G0096354024
Type thesis
Advisor 余清祥
Other identifiers G0096354024
URI https://nccur.lib.nccu.edu.tw/handle/140.119/36930
Table of contents
Chapter 1 Introduction
  Section 1 Preface
  Section 2 Motivation and Objectives
Chapter 2 Literature Review and Overview of Models
  Section 1 Previous Studies of The Dream of Red Chamber
  Section 2 Estimating the Number of New Species in Future Samples
    2.2.1 The Efron and Thisted (1976) Estimator
    2.2.2 The Efron and Thisted (1987) Estimator
  Section 3 Estimating the Number of Species by Block Sampling
    2.3.1 Bootstrap Method
    2.3.2 Jackknife Estimate
    2.3.3 The Chao and Lee Sample-Coverage Estimator
Chapter 3 Methods and Data
  Section 1 Data Sources
  Section 2 Methods
Chapter 4 The Efron and Thisted Estimation Methods
  Section 1 Efron and Thisted (1976) Estimation of New Words
  Section 2 Estimation for Previously Observed Words
  Section 3 Variance under Efron and Thisted (1987)
  Section 4 Rare Words and New Words under Efron and Thisted (1987)
Chapter 5 Block Sampling and the Population Number of Species
  Section 1 Properties of Block Sampling
  Section 2 Block Sampling and Jin Yong's Novels
  Section 3 Coverage Probability
  Section 4 The Dream of Red Chamber and Coverage Probability
Chapter 6 Conclusions and Recommendations
  Section 1 Conclusions
  Section 2 Recommendations