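Title: A mixture model for heterogeneous ordinal data with misclassification
Original title: 分類錯誤資料在母體異質下的馬可夫模型
Author: 李依璇 (Lee, Yi-Shiuan)
Contributors: 黃佳慧 (Huang, Chia-Hui), advisor; 李依璇 (Lee, Yi-Shiuan), author
Keywords: Hidden Markov Model; Latent class; Logistic regression; Longitudinal data; Misclassification
Date: 2022
Uploaded: 1 August 2022 17:15:10 (UTC+8)
Abstract (translated from the Chinese): This study considers longitudinal data made up of a series of ordinal measurements and assumes that the population consists of two groups with distinct characteristics. Dividing the population into groups handles the between-group heterogeneity common in longitudinal data, while the correlation among repeated measurements on the same subject is explained by a Markov model. The ordinal variable has three categories, each treated as a Markov state, and different groups are assumed to have different state spaces. During data collection, measurement error causes some observations to be misclassified, so the observed Markov chain is not necessarily the true one. To handle both individual heterogeneity and measurement error, the study draws on the mixture Markov model and the hidden Markov model, using logistic regression to model group membership and, conditional on group, the initial-state and state-transition probabilities. The likelihood expresses the probability of the observed data as the sum, over all possible Markov chains and groups, of their joint probabilities, which removes the erroneous information introduced by measurement error. The maximum likelihood estimates are then obtained with the "constrOptim" function in R, supplied with the log-likelihood and the score function. Finally, simulations are run under four sets of parameter values, and the proposed model is evaluated by four criteria: bias, standard deviation, standard error, and coverage probability. The results show that the distribution of the sample does not affect performance, and that the relationship between estimation bias and measurement error is as expected.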
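The likelihood construction described in the abstract (a sum over every latent group and every possible true Markov chain) can be illustrated with a minimal Python sketch. It is not the thesis implementation, which is written in R and uses logistic regression with covariates. Here all probabilities are fixed, made-up numbers, both groups share the same three states, and the names `group_probs`, `init`, `trans`, and `emit` (the misclassification matrix) are hypothetical. The second function computes the same sum with the standard forward recursion, which avoids enumerating every chain.

```python
import itertools
import numpy as np

def likelihood_bruteforce(obs, group_probs, init, trans, emit):
    """Sum the joint probability over every latent group and every latent chain."""
    n_states = emit.shape[0]
    total = 0.0
    for g, pi_g in enumerate(group_probs):
        for chain in itertools.product(range(n_states), repeat=len(obs)):
            # P(group) * P(first true state) * P(first observation | true state)
            p = pi_g * init[g][chain[0]] * emit[chain[0], obs[0]]
            for t in range(1, len(obs)):
                p *= trans[g][chain[t - 1], chain[t]] * emit[chain[t], obs[t]]
            total += p
    return total

def likelihood_forward(obs, group_probs, init, trans, emit):
    """Same quantity via the forward recursion, linear in the chain length."""
    total = 0.0
    for g, pi_g in enumerate(group_probs):
        alpha = init[g] * emit[:, obs[0]]
        for o in obs[1:]:
            alpha = (alpha @ trans[g]) * emit[:, o]
        total += pi_g * alpha.sum()
    return total

# Illustrative values only (assumptions, not estimates from the thesis).
group_probs = np.array([0.6, 0.4])                    # P(group)
init = np.array([[0.5, 0.3, 0.2],                     # P(first state | group)
                 [0.2, 0.3, 0.5]])
trans = np.array([[[0.80, 0.15, 0.05],                # P(next state | state, group 0)
                   [0.20, 0.60, 0.20],
                   [0.05, 0.15, 0.80]],
                  [[0.5, 0.3, 0.2],                   # P(next state | state, group 1)
                   [0.3, 0.4, 0.3],
                   [0.2, 0.3, 0.5]]])
emit = np.array([[0.90, 0.10, 0.00],                  # P(observed | true state)
                 [0.05, 0.90, 0.05],
                 [0.00, 0.10, 0.90]])
obs = [0, 1, 1, 2]                                    # one subject's observed sequence

lik_brute = likelihood_bruteforce(obs, group_probs, init, trans, emit)
lik_fwd = likelihood_forward(obs, group_probs, init, trans, emit)
```

The brute-force version mirrors the wording of the abstract most directly, but its cost grows exponentially with the number of visits. In practice, the log of the forward-recursion value summed over subjects is what one would pass to a numerical optimizer.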
Abstract (English): The aim of this work is to provide a model for longitudinal data characterized by heterogeneity in the population and correlation within subjects. In this study, the former is explained by supposing that the population consists of several unobservable subgroups with distinct features, while the latter is captured by Markov models in which the Markov states are assumed to be ordinal variables. Furthermore, some observed states are subject to misclassification owing to measurement error; hence both the groups and the Markov states without misclassification are latent variables. To address this, a mixture Markov chain model and a hidden Markov model are used in the analysis of misclassified heterogeneous ordinal data. Subpopulation membership, subpopulation-specific initial states, and transition patterns are each modeled with logistic regression. Simulations are conducted under four different parameter settings, and the maximum likelihood estimates are obtained with the function "constrOptim" in R. The simulation results suggest that the estimates, in terms of bias, standard deviation, standard error, and coverage probability, are robust to the frequencies of the observed states. In addition, the dependence between estimation bias and measurement error rate is in line with expectations.
Description: Master's thesis
National Chengchi University
Department of Statistics
109354006
Source: http://thesis.lib.nccu.edu.tw/record/#G0109354006
Type: thesis
Identifier: G0109354006
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/141005
DOI: 10.6814/NCCU202200708
Format: application/pdf, 1110837 bytes
Table of contents:
Chapter 1 Introduction
  Section 1 Preface
  Section 2 Research Motivation
Chapter 2 Literature Review
  Section 1 Single Markov Chain Model
  Section 2 Mixture Markov Model
  Section 3 Hidden Markov Model
Chapter 3 Methodology
  Section 1 Model Assumptions
  Section 2 Statistical Model
  Section 3 Statistical Inference
Chapter 4 Simulation Analysis
  Section 1 Data Generation
  Section 2 Simulation Evaluation and Results
Chapter 5 Conclusion
References
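The four simulation criteria named in the abstract (bias, standard deviation, standard error, and coverage probability) can be computed from replicated estimates roughly as follows. This is a hedged sketch rather than the thesis code: the record does not state the exact definitions used, so the function name `summarize_simulation`, the Wald-type interval, and the 1.96 critical value are assumptions, and the numbers are invented for illustration.

```python
import numpy as np

def summarize_simulation(estimates, std_errors, true_value, z=1.96):
    """Summarize one parameter across simulation replications."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    # Assumed Wald-type interval: estimate +/- z * estimated standard error.
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    covered = (lower <= true_value) & (true_value <= upper)
    return {
        "bias": float(estimates.mean() - true_value),  # mean estimate minus truth
        "sd": float(estimates.std(ddof=1)),            # empirical SD of the estimates
        "se": float(std_errors.mean()),                # average estimated standard error
        "coverage": float(covered.mean()),             # share of intervals containing truth
    }

# Made-up replications for illustration only.
summary = summarize_simulation(
    estimates=[0.9, 1.0, 1.1, 1.6],
    std_errors=[0.1, 0.1, 0.1, 0.1],
    true_value=1.0,
)
```

When the estimator and its variance estimate behave well, the empirical standard deviation and the average standard error should be close, and the coverage should sit near the nominal level.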