Title Exploring the correlation between two datasets (探討兩資料集之相關性)
Author Li, Qi-Xuan (李其軒)
Contributors Cheng, Tsung-Chi (鄭宗記)
Li, Qi-Xuan (李其軒)
Keywords Mantel test
Canonical correlation analysis
RV coefficient
PROTEST
Distance covariance test
Euclidean distance
Mahalanobis distance
Pearson correlation distance
Date 2023
Upload time 2-Aug-2023 13:04:38 (UTC+8)
Abstract In biostatistical and ecological research, measuring the association between two multidimensional datasets is an important problem. Besides canonical correlation analysis, this study examines four other methods: the Mantel test, the RV coefficient, PROTEST (Procrustean randomization test), and the distance covariance test, and compares their strengths and weaknesses under different data structures. The Mantel test and the distance covariance test measure the association between datasets through distances between observations. In addition to the Euclidean distance commonly used with both tests, this study also uses the Mahalanobis distance and the Pearson correlation distance to examine whether the choice of distance affects test performance. Data are simulated from multivariate normal and non-normal distributions; for each model, the sample size, the dimension of the data, and the variances of the variables are varied. The tests are compared by their power and power curves. Finally, the tests are applied to two empirical datasets: note structure and song in American wood warblers, and gene expression and hepatic fatty acids in mice.
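To illustrate how a distance-based permutation test of the kind named in the abstract works, here is a minimal Python sketch of a Mantel test using Euclidean distance. It is not the thesis's code. The function names (`euclidean_distances`, `upper_triangle`, `pearson`, `mantel_test`), the one-sided alternative, the default of 999 permutations, and the (count + 1) / (permutations + 1) p-value are all assumptions made for this example.

```python
import math
import random


def euclidean_distances(rows):
    """Return the full n x n matrix of pairwise Euclidean distances."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def upper_triangle(matrix):
    """Flatten the strictly upper-triangular entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(x_rows, y_rows, n_perm=999, seed=0):
    """One-sided permutation Mantel test (hypothetical helper).

    Returns (statistic, p_value), where the statistic is the Pearson
    correlation between the two sets of pairwise distances.
    """
    rng = random.Random(seed)
    dx = euclidean_distances(x_rows)
    dy = euclidean_distances(y_rows)
    n = len(dx)
    flat_x = upper_triangle(dx)
    observed = pearson(flat_x, upper_triangle(dy))
    exceed = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Relabel the observations of the second dataset: rows and columns
        # of its distance matrix are permuted together.
        permuted = [dy[perm[i]][perm[j]]
                    for i in range(n) for j in range(i + 1, n)]
        if pearson(flat_x, permuted) >= observed:
            exceed += 1
    # Assumed convention: the observed statistic counts as one permutation.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = random.Random(1)
    x = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(20)]
    # y is x plus small noise, so the two distance matrices should agree.
    y = [[v + rng.gauss(0, 0.1) for v in row] for row in x]
    statistic, p_value = mantel_test(x, y, n_perm=199)
    print(f"Mantel r = {statistic:.3f}, p = {p_value:.3f}")
```

Replacing `euclidean_distances` with another distance function would be one way to compare distance measures, and repeating the test over many simulated datasets and recording the rejection rate would give an empirical power estimate.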
Description Master's degree
National Chengchi University
Department of Statistics
110354014
Source http://thesis.lib.nccu.edu.tw/record/#G0110354014
Type thesis
dc.contributor.advisor 鄭宗記 zh_TW
dc.contributor.advisor Cheng, Tsung-Chi en_US
dc.contributor.author (Authors) 李其軒 zh_TW
dc.contributor.author (Authors) Li, Qi-Xuan en_US
dc.creator (Author) 李其軒 zh_TW
dc.creator (Author) Li, Qi-Xuan en_US
dc.date (Date) 2023 en_US
dc.date.accessioned 2-Aug-2023 13:04:38 (UTC+8)
dc.date.available 2-Aug-2023 13:04:38 (UTC+8)
dc.date.issued (Upload time) 2-Aug-2023 13:04:38 (UTC+8)
dc.identifier (Other Identifiers) G0110354014 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/146308
dc.description (Description) Master's degree
dc.description (Description) National Chengchi University
dc.description (Description) Department of Statistics
dc.description (Description) 110354014
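dc.description.tableofcontents Chapter 1. Introduction 1
Section 1. Research motivation and purpose 1
Section 2. Research framework 2
Chapter 2. Methods 3
Section 1. Distance matrices 3
1.1 Euclidean distance 3
1.2 Mahalanobis distance 4
1.3 Pearson correlation distance 4
Section 2. Mantel test 5
Section 3. Canonical correlation analysis 6
Section 4. RV coefficient 7
Section 5. PROTEST 9
Section 6. Distance covariance test 11
Chapter 3. Simulation analysis 12
Section 1. Simulation design 12
1.1 Multivariate normal distribution 12
1.2 Multivariate log-normal distribution model 21
Section 2. Simulation results 24
2.1 Multivariate normal distribution 24
2.2 Multivariate log-normal distribution 98
Chapter 4. Empirical data analysis 135
Section 1. Song and note structure of American wood warblers 135
Section 2. Nutrimouse dataset 138
Chapter 5. Conclusions and suggestions 142
Section 1. Conclusions 142
Section 2. Suggestions for future work 143
Chapter 6. References 144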
dc.format.extent 37094825 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0110354014 en_US
dc.subject (Keywords) Mantel test en_US
dc.subject (Keywords) Canonical correlation analysis en_US
dc.subject (Keywords) RV coefficient en_US
dc.subject (Keywords) PROTEST en_US
dc.subject (Keywords) Distance covariance test en_US
dc.subject (Keywords) Euclidean distance en_US
dc.subject (Keywords) Mahalanobis distance en_US
dc.subject (Keywords) Pearson correlation distance en_US
dc.title (Title) 探討兩資料集之相關性 zh_TW
dc.title (Title) Exploring the correlation between two datasets en_US
dc.type (Type) thesis en_US