Supervised learning under concept drift: time series features and training strategies (概念飄移下的監督式學習：時序特徵與訓練策略) | Publication


Title	Supervised learning under concept drift: time series features and training strategies (概念飄移下的監督式學習：時序特徵與訓練策略)
Author	Huang, Yu-Chieh (黃羽婕)
Contributors	Chuang, Hao-Chun (莊皓鈞); Huang, Yu-Chieh (黃羽婕)
Keywords	Machine learning; Training strategies; Concept drift; Time series features
Date	2022
Upload time	2-Dec-2022 15:21:40 (UTC+8)
Abstract	In recent years, companies have increasingly relied on machine learning models. As data volumes grow sharply and the environment in which models are built changes over time, concept drift readily arises, and when it does, predictive performance degrades. This study focuses on how to help analysts quickly analyze time series data under concept drift and use data characteristics to identify a better model training strategy, thereby improving predictive performance. We adopt the Purged K-Fold and Augmentation methods used in prior literature and include them in our experiments to observe how well they work with concept drift data. In the first stage of the experiment, we simulate nine types of concept drift data to represent the various forms concept drift can take, pair them with four model training strategies, and observe model performance. In the second stage, we combine the extracted time series features with the model performance of the four training strategies to identify relationships between specific time series features and training strategies. The results show that the training strategies adopted in this thesis can effectively improve predictive performance when specific time series features are present.
References
Barton, D., and Court, D. 2012. "Making Advanced Analytics Work for You," Harvard Business Review (90:10), pp. 78-83.
Cai, J., Luo, J., Wang, S., and Yang, S. 2018. "Feature Selection in Machine Learning: A New Perspective," Neurocomputing (300), pp. 70-79.
Fokkema, M., and Strobl, C. 2020. "Fitting Prediction Rule Ensembles to Psychological Research Data: An Introduction and Tutorial," Psychological Methods (25:5), pp. 636-652.
Gama, J., Medas, P., Castillo, G., and Rodrigues, P. 2004. "Learning with Drift Detection," Brazilian Symposium on Artificial Intelligence: Springer, pp. 286-295.
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., and Bouchachia, A. 2014. "A Survey on Concept Drift Adaptation," ACM Computing Surveys (CSUR) (46:4), pp. 1-37.
Hazelwood, K., Bird, S., Brooks, D., Chintala, S., Diril, U., Dzhulgakov, D., Fawzy, M., Jia, B., Jia, Y., and Kalro, A. 2018. "Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective," 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA): IEEE, pp. 620-629.
Lainder, A. D., and Wolfinger, R. D. 2022. "Forecasting with Gradient Boosted Trees: Augmentation, Tuning, and Cross-Validation Strategies: Winning Solution to the M5 Uncertainty Competition," International Journal of Forecasting, forthcoming.
Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., and Zhang, G. 2018. "Learning under Concept Drift: A Review," IEEE Transactions on Knowledge and Data Engineering (31:12), pp. 2346-2363.
Ma, S., and Fildes, R. 2021. "Retail Sales Forecasting with Meta-Learning," European Journal of Operational Research (288:1), pp. 111-128.
Montero-Manso, P., Athanasopoulos, G., Hyndman, R. J., and Talagala, T. S. 2020. "Fforma: Feature-Based Forecast Model Averaging," International Journal of Forecasting (36:1), pp. 86-92.
Probst, P., Wright, M. N., and Boulesteix, A. L. 2019. "Hyperparameters and Tuning Strategies for Random Forest," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery (9:3), p. e1301.
Schüritz, R., and Satzger, G. 2016. "Patterns of Data-Infused Business Model Innovation," 2016 IEEE 18th Conference on Business Informatics (CBI): IEEE, pp. 133-142.
Schwartz, E. M., Bradlow, E. T., and Fader, P. S. 2014. "Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data," Marketing Science (33:2), pp. 188-205.
Simester, D., Timoshenko, A., and Zoumpoulis, S. I. 2020. "Targeting Prospective Customers: Robustness of Machine-Learning Methods to Typical Data Challenges," Management Science (66:6), pp. 2495-2522.
Talagala, T. S., Hyndman, R. J., and Athanasopoulos, G. 2018. "Meta-Learning How to Forecast Time Series," Monash Econometrics and Business Statistics Working Papers (6:18), p. 16.
Tukey, J. W. 1962. "The Future of Data Analysis," The Annals of Mathematical Statistics (33:1), pp. 1-67.
Rauschmayr, N., Bhattacharjee, S., and Kumar, V. 2020. "Detecting and Analyzing Incorrect Model Predictions with Amazon SageMaker Model Monitor and Debugger," https://aws.amazon.com/blogs/machine-learning/detecting-and-analyzing-incorrect-model-predictions-with-amazon-sagemaker-model-monitor-and-debugger/
Description	Master's thesis, Department of Management Information Systems, National Chengchi University, 110356010
Source	http://thesis.lib.nccu.edu.tw/record/#G0110356010
Type	thesis

dc.contributor.advisor	莊皓鈞	zh_TW
dc.contributor.advisor	Chuang, Hao-Chun	en_US
dc.contributor.author (Authors)	黃羽婕	zh_TW
dc.contributor.author (Authors)	Huang, Yu-Chieh	en_US
dc.creator (Author)	黃羽婕	zh_TW
dc.creator (Author)	Huang, Yu-Chieh	en_US
dc.date (Date)	2022	en_US
dc.date.accessioned	2-Dec-2022 15:21:40 (UTC+8)	-
dc.date.available	2-Dec-2022 15:21:40 (UTC+8)	-
dc.date.issued (Upload time)	2-Dec-2022 15:21:40 (UTC+8)	-
dc.identifier (Other Identifiers)	G0110356010	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/142646	-
dc.description (Description)	Master's (碩士)	zh_TW
dc.description (Description)	National Chengchi University (國立政治大學)	zh_TW
dc.description (Description)	Department of Management Information Systems (資訊管理學系)	zh_TW
dc.description (Description)	110356010	zh_TW
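dc.description.tableofcontents	Chapter 1 Introduction 1; Chapter 2 Literature Review 4; Section 1 Concept Drift 4; 1. Definition and Types of Concept Drift 4; 2. Learning under Concept Drift 5; Section 2 Time Series Features 7; Chapter 3 Research Framework and Methods 9; Section 1 Data Generation and Models 9; 1. Data Generation 9; 2. Model Cross-Validation Methods 13; 3. Model Building 14; Section 2 Experimental Simulation and Parameter Settings 16; Chapter 4 Analysis 18; Section 1 Numerical Analysis 18; Section 2 Validation Analysis 23; Chapter 5 Conclusions and Suggestions 28; References 31	zh_TW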
dc.format.extent	2214953 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0110356010	en_US
dc.subject (Keywords)	機器學習	zh_TW
dc.subject (Keywords)	訓練策略	zh_TW
dc.subject (Keywords)	概念飄移	zh_TW
dc.subject (Keywords)	時間序列特徵	zh_TW
dc.subject (Keywords)	Machine learning	en_US
dc.subject (Keywords)	Training strategies	en_US
dc.subject (Keywords)	Concept drift	en_US
dc.subject (Keywords)	Time series features	en_US
dc.title (Title)	概念飄移下的監督式學習：時序特徵與訓練策略	zh_TW
dc.title (Title)	Supervised learning under concept drift: time series features and training strategies	en_US
dc.type (Type)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/NCCU202201663	en_US
