Title 機器學習方法於分類或預測問題之比較與應用
Machine Learning Methods in Classification or Prediction: Some Comparison and Applications
Author 古政弘
Gu, Cheng-Hung
Contributors 張育瑋
Chang, Yu-Wei
古政弘
Gu, Cheng-Hung
Keywords Classification and Regression Tree
Bayesian Additive Regression Trees
Random Forest
Date 2024
Upload time 5-Aug-2024 13:59:17 (UTC+8)
Abstract
In recent years, many new machine learning methods have been proposed. Depending on whether the response variable is continuous or categorical, these methods can be applied to prediction or classification problems. This study examines the prediction or classification accuracy of several machine learning methods, with a particular focus on interpretable machine learning, because in practical data analysis users often also want to interpret the relationship between the independent variables and the response variable. We consider seven machine learning or statistical methods: classification and regression tree (CART), Bayesian additive regression trees (BART), random forest, multivariate adaptive regression splines (MARS), generalized additive model (GAM), linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA). We apply each method to two real datasets, fit the models on the training set, and compare the prediction or classification performance of the methods on the test set.
References
Breiman, L. (2001). Random Forests. Machine Learning, 45, 5–32.
Chipman, H., George, E., & McCulloch, R. (2010). BART: Bayesian Additive Regression Trees. Annals of Applied Statistics, 4, 266–298.
Hastie, T., & Tibshirani, R. (1987). Generalized Additive Models: Some Applications. Journal of the American Statistical Association, 82, 371–386.
Friedman, J. H. (1991). Multivariate Adaptive Regression Splines. The Annals of Statistics, 19, 1–67.
Kim, C., & Park, S. (2022). Comparison of Tree-Based Ensemble Models for Regression. Communications for Statistical Applications and Methods, 29, 561–589.
Knežević, M., Has, A., & Zekić-Sušac, M. (2021). Predicting Energy Cost of Public Buildings by Artificial Neural Networks, CART, and Random Forest. Neurocomputing, 439, 223–233.
Barros, F., Carvalho, G. C., Costa, Y., & Martins, I. (2022). Sea-Level Rise Effects on Macrozoobenthos Distribution within an Estuarine Gradient Using Species Distribution Modeling. Ecological Informatics, 71, 101816.
Hong, H., Naghibi, S. A., Moradi Dashtpagerdi, M., et al. (2017). A comparative assessment between linear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidence models for forest fire susceptibility mapping in China. Arabian Journal of Geosciences, 10, 167.
VE, S., & Cho, Y. (2020). Season wise bike sharing demand analysis using random forest algorithm. Computational Intelligence, 40.
Du, J., Liu, J. S., & Krakovna, V. (2015). Selective Bayesian Forest Classifier Simultaneous Variable Selection and Classification. arXiv.
Martín, B., González-Arias, J., & Vicente-Vírseda, J. A. (2021). Machine learning as a successful approach for predicting complex spatio-temporal patterns in animal species abundance. Animal Biodiversity and Conservation, 44.2, 289–301.
Gunnarsson, B. R., vanden Broucke, S., Baesens, B., Óskarsdóttir, M., & Lemahieu, W. (2021). Deep learning for credit scoring: Do or don't? European Journal of Operational Research, 295, 292–305.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Belmont, Calif.: Wadsworth.
Chipman, H. A., George, E. I., & McCulloch, R. E. (1998). Bayesian CART Model Search. Journal of the American Statistical Association, 93, 935–948.
Bleich, J., & Kapelner, A. (2014, November 24). bartMachine: Machine Learning with Bayesian Additive Regression Trees. arXiv.
McCulloch, R., Spanbauer, C., & Sparapani, R. (2021). Nonparametric Machine Learning and Efficient Computation with Bayesian Additive Regression Trees: The BART R Package. Journal of Statistical Software, 97, 1–66.
Urbanek, S. (2024, January 26). rJava: Low-Level R to Java Interface.
Straw, I., & Wu, H. (2022). Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction. BMJ Health Care Inform, 29, e100457.
Prasad Babu, M. S., Ramana, B. V., & Venkateswarlu, N. B. (2012). A Critical Comparative Study of Liver Patients from USA and INDIA: An Exploratory Analysis. International Journal of Computer Science Issues, 9, 101–114.
Ramana, B., & Venkateswarlu, N. (2012). ILPD (Indian Liver Patient Dataset). UCI Machine Learning Repository.
Quinlan, R. (1993). Auto MPG. UCI Machine Learning Repository.
Description Master's
National Chengchi University
Department of Statistics
111354005
Source http://thesis.lib.nccu.edu.tw/record/#G0111354005
Type thesis
dc.contributor.advisor 張育瑋 (Chang, Yu-Wei)
dc.date.accessioned 5-Aug-2024 13:59:17 (UTC+8)
dc.date.available 5-Aug-2024 13:59:17 (UTC+8)
dc.identifier (Other Identifiers) G0111354005
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/152775
dc.format.extent 3211033 bytes
dc.format.mimetype application/pdf
dc.description.tableofcontents
Chapter 1 Introduction
Chapter 2 Model Overview
2.1 CART
2.1.1 Regression Trees
2.1.2 Classification Trees
2.2 BART Model
2.2.1 Prior Specification
2.2.2 Posterior Inference
2.3 Random Forest
2.4 MARS
2.5 GAM
2.6 LDA & QDA
2.7 Software Packages Used for Modeling
Chapter 3 Data Analysis with Categorical Response Variables
3.1 Evaluation Metrics for Classification Models
3.2 ILPD Data (Indian Liver Patient Dataset)
3.2.1 Data Description and Descriptive Statistics
3.2.2 Modeling and Prediction for the ILPD Data
3.3 Fertility Data (Fertility Dataset)
3.3.1 Data Description and Descriptive Statistics
3.3.2 Modeling and Prediction for the Fertility Data
3.4 Summary for Datasets with Categorical Response Variables
Chapter 4 Data Analysis with Continuous Response Variables
4.1 Evaluation Metrics for Prediction Models
4.2 Auto MPG Data
4.2.1 Data Description and Descriptive Statistics
4.2.2 Modeling and Prediction for the Auto MPG Data
4.3 Real Estate Valuation Data
4.3.1 Data Description and Descriptive Statistics
4.3.2 Modeling and Prediction for the Real Estate Valuation Data
4.4 Summary for Datasets with Continuous Response Variables
Chapter 5 Conclusion
References