A Study of Tax Revenue Forecasting Model Based on Machine Learning Techniques (運用機器學習技術建構稅收預測模型之研究) | Publication

Title	運用機器學習技術建構稅收預測模型之研究 (A Study of Tax Revenue Forecasting Model Based on Machine Learning Techniques)
Author	鄧宜芳 (Teng, Yi-Fang)
Contributors	劉昭麟 (advisor); 鄧宜芳 (Teng, Yi-Fang)
Keywords	Tax revenue forecasting (稅收預測); Genetic algorithm (基因演算法); Support vector regression (支援向量迴歸)
Date	2024
Uploaded	1-Mar-2024 14:12:15 (UTC+8)
Abstract	This study forecasts revenue from three taxes: business tax, profit-seeking enterprise income tax, and individual income tax. Using historical economic indicators, actual tax revenue, and forecasts of future economic conditions, it applies machine learning techniques to build a model that assists in preparing the tax revenue budget. The data span 1971 to 2022 and are divided by starting year into 40-year, 25-year, and 10-year datasets. To reflect how tax revenue budgets are prepared, the raw data are transformed into lagged-feature, reconstructed-feature, and mixed-feature data. Four forecasting models are built: linear regression, support vector regression (SVR), linear regression with a genetic algorithm, and SVR with a genetic algorithm. Models are evaluated by mean absolute percentage error (MAPE), and the best model is selected through five-fold time series cross-validation. Among domestic studies of tax revenue forecasting, this study is the first to collect 280 feature variables and select the most suitable ones with a genetic algorithm, and the first to build tax revenue forecasting models with support vector regression. Across the datasets and models tested, models trained on 10 years of data with features selected by the genetic algorithm perform best. In empirical testing, the forecasts for all three taxes are no worse than the government's forecasts, and the business tax model performs best. The results can serve as an auxiliary reference for the government when budgeting tax revenue.
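The evaluation procedure named in the abstract (lagged features, MAPE, and five-fold time series cross-validation) can be illustrated with a minimal sketch. This is not the thesis's code: the revenue figures are synthetic, every function name is hypothetical, and a single-feature least-squares line stands in for the SVR and genetic-algorithm models, which are not reproduced here.

```python
from statistics import mean

def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))

def make_lagged(series, lag=1):
    """Pair each value with the value observed `lag` steps earlier."""
    return [(series[i - lag], series[i]) for i in range(lag, len(series))]

def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train, test) index lists; training data always precedes test data."""
    test_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - k + 1) * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

# Synthetic annual revenue figures, made up purely for illustration.
revenue = [100, 104, 109, 115, 118, 125, 131, 136, 144, 150, 157, 166]
pairs = make_lagged(revenue, lag=1)  # (previous year, current year)

fold_scores = []
for train_idx, test_idx in expanding_window_splits(len(pairs), n_splits=5):
    a, b = fit_line([pairs[i][0] for i in train_idx],
                    [pairs[i][1] for i in train_idx])
    predicted = [a + b * pairs[i][0] for i in test_idx]
    actual = [pairs[i][1] for i in test_idx]
    fold_scores.append(mape(actual, predicted))

print(f"mean MAPE over {len(fold_scores)} folds: {mean(fold_scores):.2f}%")
```

Each fold trains only on years that precede its test year, which is the property that separates time series cross-validation from ordinary shuffled k-fold. Averaging MAPE across the folds gives one score per candidate model for comparison.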
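Description	Master's thesis; National Chengchi University (國立政治大學); In-service Master's Program, Department of Computer Science (資訊科學系碩士在職專班); 110971028
Source	http://thesis.lib.nccu.edu.tw/record/#G0110971028
Type	thesis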

dc.contributor.advisor	劉昭麟	zh_TW
dc.contributor.author (Authors)	鄧宜芳	zh_TW
dc.contributor.author (Authors)	Teng, Yi-Fang	en_US
dc.creator (Author)	鄧宜芳	zh_TW
dc.creator (Author)	Teng, Yi-Fang	en_US
dc.date (Date)	2024	en_US
dc.date.accessioned	1-Mar-2024 14:12:15 (UTC+8)	-
dc.date.available	1-Mar-2024 14:12:15 (UTC+8)	-
dc.date.issued (Upload time)	1-Mar-2024 14:12:15 (UTC+8)	-
dc.identifier (Other Identifiers)	G0110971028	en_US
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/150262	-
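dc.description (Description)	Master's (碩士)	zh_TW
dc.description (Description)	National Chengchi University (國立政治大學)	zh_TW
dc.description (Description)	In-service Master's Program, Department of Computer Science (資訊科學系碩士在職專班)	zh_TW
dc.description (Description)	110971028	zh_TW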
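dc.description.tableofcontents	Chapter 1 Introduction: 1.1 Research Motivation; 1.2 Research Background; 1.3 Research Objectives; 1.4 Contributions and Results; 1.5 Thesis Structure. Chapter 2 Literature Review: 2.1 Tax Revenue Forecasting; 2.2 Building Regression Forecasting Models with Machine Learning Algorithms. Chapter 3 Research Methods and Theoretical Foundations: 3.1 Model Construction Process; 3.2 Linear Regression Model; 3.3 Support Vector Regression Model; 3.4 Regression Model Evaluation Methods; 3.5 Cross-Validation; 3.6 Grid Search; 3.7 Genetic Algorithm. Chapter 4 Experimental Design and Data Processing: 4.1 Experimental Design (4.1.1 Data Collection; 4.1.2 Model Training Method); 4.2 Data Preprocessing (4.2.1 Lagged Features; 4.2.2 Reconstructed Features; 4.2.3 Mixed Features). Chapter 5 Analysis of Experimental Results: 5.1 Description of Training Data; 5.2 Business Tax (5.2.1 Training Model Evaluation; 5.2.2 Empirical Results); 5.3 Profit-Seeking Enterprise Income Tax (5.3.1 Training Model Evaluation; 5.3.2 Empirical Results); 5.4 Individual Income Tax (5.4.1 Training Model Evaluation; 5.4.2 Empirical Results). Chapter 6 Conclusions and Recommendations: 6.1 Conclusions; 6.2 Recommendations. References. Appendix A Description of the Data Collected in This Study. Appendix B Discussion from the Thesis Oral Defense.	zh_TW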
dc.format.extent	1512912 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0110971028	en_US
dc.subject (Keywords)	稅收預測	zh_TW
dc.subject (Keywords)	基因演算法	zh_TW
dc.subject (Keywords)	支援向量迴歸	zh_TW
dc.subject (Keywords)	Tax revenue forecasting	en_US
dc.subject (Keywords)	Genetic algorithm	en_US
dc.subject (Keywords)	Support vector regression	en_US
dc.title (Title)	運用機器學習技術建構稅收預測模型之研究	zh_TW
dc.title (Title)	A Study of Tax Revenue Forecasting Model Based on Machine Learning Techniques	en_US
dc.type (Type)	thesis	en_US
