應用實價登錄建立以聚類方法之堆疊泛化房價預測模型 -以桃園市區分建物房價資料為例 | Publication


Title	應用實價登錄建立以聚類方法之堆疊泛化房價預測模型 -以桃園市區分建物房價資料為例 Predicting Housing Prices using Clustering-based Stacked Generator- A study on Taoyuan City Actual Price Registration Data
Creator	黃允亭 Huang, Yun-Ting
Contributor	陳樹衡; 鄧筱蓉 (advisors); 黃允亭 Huang, Yun-Ting (author)
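Key Words	Feature selection (特徵選取); cluster analysis (聚類分析); machine learning (機器學習); ensemble learning (集成學習); stacked generalization (堆疊泛化); actual price registration (實價登錄); housing price prediction (房價預測)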
Date	2022
Date Issued	1-Mar-2022 17:52:29 (UTC+8)
Summary	This research explores the applicability of combining clustering with stacked generalization for housing price prediction in Taiwan. Using the most recently available Taoyuan City Actual Price Registration data, we first extend the clustering-based ensemble learning method of Trivedi et al. (2015) to develop two-layer clustering-based stacked generalizers. In the first layer, three machine learning methods (Lasso, KNN, and Decision Tree) are used to construct the cluster models. In the second layer, Linear Regression, Random Forest, and XGBoost are used to build the meta-models. The resulting stacked generalizers are then used to predict housing prices in Taoyuan City, and their prediction accuracy is compared with that of other machine learning methods, including Linear Regression, Random Forest, and XGBoost.
References
[1] Altman, N. S. (1992). An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. The American Statistician, 46(3), 175–185.
[2] Breiman, L. (1996a). Bagging Predictors. Machine Learning, 24(2), 123–140.
[3] Breiman, L. (1996b). Stacked Regressions. Machine Learning, 24(1), 49–64.
[4] Breiman, L. (2001). Random Forests. Machine Learning, 45, 5–32.
[5] Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Chapman & Hall/CRC, 368.
[6] Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
[7] Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), 1–26.
[8] Frank, A., & Asuncion, A. (2010). UCI Machine Learning Repository. http://archive.ics.uci.edu/ml.
[9] Freund, Y. (1995). Boosting a Weak Learning Algorithm by Majority. Information and Computation, 121(2), 256–285.
[10] Freund, Y., & Schapire, R. E. (1997). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1), 119–139.
[11] Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.
[12] Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189–1232.
[13] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 46(1-3), 389–422.
[14] Ho, T. K. (1995). Random Decision Forests. Proceedings of 3rd International Conference on Document Analysis and Recognition, 278–282.
[15] Huang, S. Y., Yu, F., Tsaih, R. H., & Huang, Y. (2014). Resistant Learning on the Envelope Bulk for Identifying Anomalous Patterns. 2014 International Joint Conference on Neural Networks (IJCNN), 3303–3310.
[16] Schapire, R. E. (1990). The Strength of Weak Learnability. Machine Learning, 5, 197–227.
[17] Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
[18] Ting, K. M., & Witten, I. H. (1997). Stacked Generalization: When Does It Work? Hamilton, New Zealand: University of Waikato, Department of Computer Science.
[19] Trivedi, S., Pardos, Z. A., & Heffernan, N. T. (2015). The Utility of Clustering in Prediction Tasks. arXiv:1509.06163.
[20] Van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super Learner. Statistical Applications in Genetics and Molecular Biology, 6(25).
[21] Wolpert, D. H. (1992). Stacked Generalization. Neural Networks, 5(2), 241–259.
[22] Zou, H., & Hastie, T. (2005). Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(2), 301–320.
[23] 何睿婷 (2018)。基於異質集成學習方法的房價預測。通訊世界，10，pp.296-297。
[24] 吳晏榕 (2010)。房價指數應用在銀行資產重估之研究。未出版之碩士論文，政治大學，經濟學研究所，台北市。
[25] 洪淑娟、雷立芬 (2010)。根據中古屋、預售屋／新成屋房價與總體經濟變數互動關係之研究。臺灣銀行季刊，61(1)，pp.155-167。
[26] 洪鴻智、張能政 (2006)。不動產估價人員之價值探索過程：估價程序與參考點的選擇。建築與規劃學報，7(1)，pp.71-90。
[27] 郁嘉綾 (2018)。應用大數據於杭州市房地產價格模型之建立。未出版之碩士論文，政治大學，統計學研究所，台北市。
[28] 張曦方 (1994)。住宅樓層價差之探討–以台北市為例。未出版之碩士論文，政治大學，地政學研究所，台北市。
[29] 陳敬筌 (2019)。應用深度學習預測區域住房平均價格—以台北市實價登錄為例。未出版之碩士論文，銘傳大學，資訊管理學系碩士在職專班，台北市。
[30] 陳樹衡、郭子文、棗厥庸 (2007)。以決策樹之迴歸樹建構住宅價格模型－臺灣地區之實證分析。住宅學報，16(1)，pp.1-20。
[31] 馮世傑 (2014)。房價影響變數之探討-以台北市為例。未出版之碩士論文，東吳大學，國際貿易學研究所，台北市。
[32] 黃佳鈴、張金鶚 (2005)。從房地價格分離探討地價指數與公告土地現值評估。台灣土地研究，8(2)，pp.73-106。
[33] 楊博文、曹布陽 (2017)。基於集成學習的房價預測模型。電腦知識與技術，13(29)，pp.191-194。
[34] 蔡育政 (2009)。影響房地產價格因素之研究:以台中市北屯區、西屯區、南屯區、中區、東區為例。未出版之碩士論文，朝陽科技大學，財務金融研究所，台中市。
[35] 蔡育展 (2017)。機器學習與房地產估價。未出版之碩士論文，政治大學，資訊管理學研究所，台北市。
[36] 蔡瑞煌、高明志、張金鶚 (1999)。類神經網路應用於房地產估價之研究。住宅學報，8，pp.1-20。
[37] 賴碧瑩 (2007)。應用類神經網路於電腦輔助大量估價。住宅學報，16(2)，pp.43-65。
[38] 謝明穎 (2017)。運用機器學習方法建構房價預測視覺化平台。未出版之碩士論文，輔仁大學，統計資訊學系應用統計研究所，新北市。
Description	Master's thesis (碩士); National Chengchi University (國立政治大學); Department of Economics (經濟學系); 107258025
Source	http://thesis.lib.nccu.edu.tw/record/#G0107258025
Type	thesis
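
The two-layer structure described in the Summary can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes k-means as the clustering step, one Lasso regressor per cluster as the first layer, a Linear Regression meta-model as the second layer, out-of-fold first-layer predictions as meta-features, and synthetic data in place of the Taoyuan registration records. The names `ClusterModel`, `fit_stack`, `predict_stack`, and the cluster counts `ks` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import KFold


class ClusterModel:
    """First-layer learner (hypothetical name): k-means plus one regressor per cluster."""

    def __init__(self, k, make_regressor, seed=0):
        self.k = k
        self.make_regressor = make_regressor
        self.seed = seed

    def fit(self, X, y):
        self.kmeans_ = KMeans(n_clusters=self.k, n_init=10,
                              random_state=self.seed).fit(X)
        labels = self.kmeans_.labels_
        # Train a separate regressor on the points of each cluster.
        self.models_ = {}
        for c in range(self.k):
            mask = labels == c
            self.models_[c] = self.make_regressor().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        # Route each point to the regressor of its nearest cluster centre.
        labels = self.kmeans_.predict(X)
        out = np.empty(len(X))
        for c in np.unique(labels):
            mask = labels == c
            out[mask] = self.models_[c].predict(X[mask])
        return out


def fit_stack(X, y, ks=(1, 2, 4), make_regressor=lambda: Lasso(alpha=0.01),
              n_splits=5, seed=0):
    """Fit cluster models for each k, then a meta-model on their predictions."""
    # Assumption: meta-features are out-of-fold predictions, to limit leakage.
    Z = np.zeros((len(X), len(ks)))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in folds.split(X):
        for j, k in enumerate(ks):
            m = ClusterModel(k, make_regressor, seed).fit(X[train_idx], y[train_idx])
            Z[val_idx, j] = m.predict(X[val_idx])
    meta = LinearRegression().fit(Z, y)
    # Refit every first-layer model on all data for use at prediction time.
    base = [ClusterModel(k, make_regressor, seed).fit(X, y) for k in ks]
    return base, meta


def predict_stack(base, meta, X):
    Z = np.column_stack([m.predict(X) for m in base])
    return meta.predict(Z)


# Synthetic stand-in data: the price rule differs between two regions.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 2))
y = np.where(X[:, 0] > 0, 3.0 * X[:, 1] + 5.0, -2.0 * X[:, 1])
y = y + rng.normal(0, 0.1, 400)

base, meta = fit_stack(X, y)
pred = predict_stack(base, meta, X)
print("in-sample MSE:", np.mean((pred - y) ** 2))
```

Swapping `make_regressor` for a KNN or decision tree regressor, or `LinearRegression` for a random forest or XGBoost model, would give the other combinations named in the Summary; which of these choices match the thesis's actual configuration is not stated in this record.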

dc.contributor.advisor	陳樹衡; 鄧筱蓉	zh_TW
dc.contributor.author (Authors)	黃允亭	zh_TW
dc.contributor.author (Authors)	Huang, Yun-Ting	en_US
dc.creator (作者)	黃允亭	zh_TW
dc.creator (作者)	Huang, Yun-Ting	en_US
dc.date (日期)	2022	en_US
dc.date.accessioned	1-Mar-2022 17:52:29 (UTC+8)	-
dc.date.available	1-Mar-2022 17:52:29 (UTC+8)	-
dc.date.issued (上傳時間)	1-Mar-2022 17:52:29 (UTC+8)	-
dc.identifier (Other Identifiers)	G0107258025	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/139264	-
dc.description (描述)	碩士	zh_TW
dc.description (描述)	國立政治大學	zh_TW
dc.description (描述)	經濟學系	zh_TW
dc.description (描述)	107258025	zh_TW
dc.description.abstract (摘要)	本研究探討結合聚類分析的堆疊泛化模型對台灣房價預測的適用性。利用最新可用的桃園市實價登錄資料，本研究首先拓展了Trivedi et al. (2015) 的聚類分析集成學習方法，建立了一個聚類分析的兩層堆疊泛化模型。第一層聚類分析群模型分別由Lasso、KNN以及決策樹建立，第二層元模型分別由線性迴歸、隨機森林以及XGBoost所建立。接下來用此拓展的兩層聚類分析堆疊泛化模型預測了桃園市房價資料，並與其他機器學習模型，包括線性迴歸、隨機森林和XGBoost，比較他們的預測結果。	zh_TW
dc.description.abstract (摘要)	This research explores the applicability of combining clustering with stacked generalization for housing price prediction in Taiwan. Using the most recently available Taoyuan City Actual Price Registration data, we first extend the clustering-based ensemble learning method of Trivedi et al. (2015) to develop two-layer clustering-based stacked generalizers. In the first layer, three machine learning methods (Lasso, KNN, and Decision Tree) are used to construct the cluster models. In the second layer, Linear Regression, Random Forest, and XGBoost are used to build the meta-models. The resulting stacked generalizers are then used to predict housing prices in Taoyuan City, and their prediction accuracy is compared with that of other machine learning methods, including Linear Regression, Random Forest, and XGBoost.	en_US
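dc.description.tableofcontents	Chapter 1 Introduction (Section 1 Research Motivation; Section 2 Research Objectives; Section 3 Contributions; Section 4 Organization of the Thesis). Chapter 2 Literature Review (Section 1 Traditional Determinants and Estimation Methods for Taiwan Housing Prices; Section 2 Applications of Artificial Intelligence Algorithms to Taiwan Housing Price Estimation; Section 3 Applications of Ensemble Learning and Clustering Methods to Housing Price Prediction). Chapter 3 Introduction to Machine Learning Methods (Section 1 Types of Machine Learning; Section 2 Supervised Learning; Section 3 Unsupervised Learning; Section 4 Supervised Ensemble Learning; Section 5 Ensemble Learning Models Combined with Clustering). Chapter 4 Data, Statistical Analysis, and Data Preprocessing (Section 1 Data Sources and Original Features; Section 2 Statistical Analysis; Section 3 Data Preprocessing). Chapter 5 Research Method (Section 1 Research Flowchart; Section 2 Evaluation Criteria for Housing Price Prediction). Chapter 6 Experimental Results and Analysis (Section 1 Feature Selection; Section 2 Results of Building the Two-Layer Clustering-Based Stacked Generalization Model; Section 3 Comparison of Stacked Generalization Results across Second-Layer Meta-Models; Section 4 Comparison of Prediction Results between the Stacked Generalization Model and Other Machine Learning Models; Section 5 Discussion and Summary). Chapter 7 Conclusions and Suggestions (Section 1 Conclusions; Section 2 Future Directions). References.	en_US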
dc.format.extent	4693837 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (資料來源)	http://thesis.lib.nccu.edu.tw/record/#G0107258025	en_US
dc.subject (關鍵詞)	特徵選取	zh_TW
dc.subject (關鍵詞)	聚類分析	zh_TW
dc.subject (關鍵詞)	機器學習	zh_TW
dc.subject (關鍵詞)	集成學習	zh_TW
dc.subject (關鍵詞)	堆疊泛化	zh_TW
dc.subject (關鍵詞)	實價登錄	zh_TW
dc.subject (關鍵詞)	房價預測	zh_TW
dc.title (題名)	應用實價登錄建立以聚類方法之堆疊泛化房價預測模型 -以桃園市區分建物房價資料為例	zh_TW
dc.title (題名)	Predicting Housing Prices using Clustering-based Stacked Generator- A study on Taoyuan City Actual Price Registration Data	en_US
dc.type (資料類型)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/NCCU202200343	en_US
