Generalized linear model prediction combined with spline and binning method (結合spline及分箱方式之廣義線性模型預測)

Title	Generalized linear model prediction combined with spline and binning method (結合spline及分箱方式之廣義線性模型預測)
Author	Yang, Shiang-Yu (楊翔宇)
Contributors	Huang, Tzee-Ming (黃子銘); Yang, Shiang-Yu (楊翔宇)
Keywords	B-spline; Nonparametric method; Piecewise polynomial; Variable selection; Knot selection; WOE of binning; Binning method
Date	2021
Upload time	5-Aug-2021 10:12:41 (UTC+8)
Abstract	Real-world datasets often mix categorical and continuous variables. For data of this type, we propose a method that preprocesses the explanatory variables and then uses the processed variables for model construction and prediction, with reasonably good results. The method is implemented in R and applied to data on bank credit card default payments. The response variable indicates whether the user defaults in the following month: "1" denotes default and "0" denotes no default. The fitted model describes how the basic information of credit card users is associated with the probability of default. It can be used to estimate the probability that a user will default in the future, helping banks restrict or monitor such customers and reduce the risk of losses.
Description	Master's thesis; National Chengchi University; Department of Statistics; 108354008
Source	http://thesis.lib.nccu.edu.tw/record/#G0108354008
Type	thesis
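
The keywords mention "WOE of binning" for the categorical variables. The record does not give the thesis's exact definition, so the following is only a minimal sketch of one common weight-of-evidence convention, ln(share of non-defaults in a bin / share of defaults in a bin). It is written in Python for illustration, although the thesis itself used R. The function name `weight_of_evidence`, the `smoothing` constant, and the toy data are all assumptions made here and are not taken from the thesis.

```python
import math
from collections import defaultdict


def weight_of_evidence(categories, defaults, smoothing=0.5):
    """Return {category: WOE} for a binary response (1 = default, 0 = no default).

    WOE is computed as ln(non-default share of the bin / default share of the bin).
    `smoothing` is a hypothetical additive constant that avoids log(0) when a
    bin contains no defaults or no non-defaults.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [non-defaults, defaults]
    for cat, y in zip(categories, defaults):
        counts[cat][y] += 1

    total_good = sum(c[0] for c in counts.values())
    total_bad = sum(c[1] for c in counts.values())
    k = len(counts)  # number of bins, used to keep the smoothed shares summing to 1

    woe = {}
    for cat, (good, bad) in counts.items():
        good_share = (good + smoothing) / (total_good + smoothing * k)
        bad_share = (bad + smoothing) / (total_bad + smoothing * k)
        woe[cat] = math.log(good_share / bad_share)
    return woe


# Toy data, made up for illustration only (not from the thesis).
education = ["grad", "grad", "univ", "univ", "univ", "high", "high", "high"]
default = [0, 0, 0, 1, 0, 1, 1, 0]

woe = weight_of_evidence(education, default)
# Replace each category by its WOE value so it can enter a GLM as a numeric column.
encoded = [woe[c] for c in education]
```

Under this convention a positive value marks a bin with relatively fewer defaults than the sample as a whole, and a negative value marks a bin with relatively more. The encoded column could then be used as a numeric predictor in a logistic regression alongside spline terms for the continuous variables.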

dc.contributor.advisor	黃子銘	zh_TW
dc.contributor.advisor	Huang, Tzee-Ming	en_US
dc.contributor.author (Authors)	楊翔宇	zh_TW
dc.contributor.author (Authors)	Yang, Shiang-Yu	en_US
dc.creator (Author)	楊翔宇	zh_TW
dc.creator (Author)	Yang, Shiang-Yu	en_US
dc.date (Date)	2021	en_US
dc.date.accessioned	5-Aug-2021 10:12:41 (UTC+8)	-
dc.date.available	5-Aug-2021 10:12:41 (UTC+8)	-
dc.date.issued (Upload time)	5-Aug-2021 10:12:41 (UTC+8)	-
dc.identifier (Other Identifiers)	G0108354008	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/136767	-
dc.description (Description)	Master's	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Department of Statistics	zh_TW
dc.description (Description)	108354008	zh_TW
dc.description.tableofcontents	1. Introduction; 2. Literature review; 3. Research methods; 3.1 Handling continuous variables; 3.2 Handling discrete variables; 3.2.1 Introduction to binning methods; 3.2.2 Binning with R; 3.3 Combining all variables to obtain the final model; 3.4 Random forest; 4. Simulation experiments and results; 5. Application to real data; 6. References	zh_TW
dc.format.extent	1780111 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0108354008	en_US
dc.subject (Keywords)	Nonparametric method (無母數方法)	zh_TW
dc.subject (Keywords)	Variable selection (變數選取)	zh_TW
dc.subject (Keywords)	Piecewise polynomial (分段多項式)	zh_TW
dc.subject (Keywords)	Knot selection (節點選取)	zh_TW
dc.subject (Keywords)	Binning method (分箱方法)	zh_TW
dc.subject (Keywords)	B-spline	en_US
dc.subject (Keywords)	Nonparametric method	en_US
dc.subject (Keywords)	Piecewise polynomial	en_US
dc.subject (Keywords)	Variable selection	en_US
dc.subject (Keywords)	Knot selection	en_US
dc.subject (Keywords)	WOE of binning	en_US
dc.subject (Keywords)	Binning method	en_US
dc.title (Title)	結合spline及分箱方式之廣義線性模型預測	zh_TW
dc.title (Title)	Generalized linear model prediction combined with spline and binning method	en_US
dc.type (Type)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/NCCU202100841	en_US
