Title: Supervised learning under concept drift: time series features and training strategies (概念飄移下的監督式學習:時序特徵與訓練策略)
Author: Huang, Yu-Chieh (黃羽婕)
Contributors: Chuang, Hao-Chun (莊皓鈞); Huang, Yu-Chieh (黃羽婕)
Keywords: Machine learning; Training strategies; Concept drift; Time series features
Date: 2022
Upload time: 2-Dec-2022 15:21:40 (UTC+8)
Abstract:
In recent years, companies have increasingly relied on machine learning models. As data volumes grow substantially and the environment in which models are built and operated changes over time, concept drift can easily arise, and when it does, predictive performance degrades. This research focuses on how to help analysts quickly analyze time series data under concept drift and use the characteristics of the data to identify a better model training strategy, thereby improving predictive performance. We draw on the Purged K-fold and Augmentation methods used in previous literature and include them in the experiments to observe how well they suit data with concept drift. In the first stage of the experiment, nine types of concept drift data are simulated to represent the various forms concept drift can take, and each is paired with four model training strategies to observe model performance. In the second stage, time series features extracted from the data are combined with the model performance under the four training strategies to find relationships between specific time series features and training strategies. The results show that the training strategies adopted in this thesis can effectively improve predictive performance when specific time series features are present.

References:
Barton, D., and Court, D. 2012. "Making Advanced Analytics Work for You," Harvard Business Review (90:10), pp. 78-83.
Cai, J., Luo, J., Wang, S., and Yang, S. 2018. "Feature Selection in Machine Learning: A New Perspective," Neurocomputing (300), pp. 70-79.
Fokkema, M., and Strobl, C. 2020. "Fitting Prediction Rule Ensembles to Psychological Research Data: An Introduction and Tutorial," Psychological Methods (25:5), pp. 636-652.
Gama, J., Medas, P., Castillo, G., and Rodrigues, P. 2004. "Learning with Drift Detection," Brazilian Symposium on Artificial Intelligence: Springer, pp. 286-295.
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., and Bouchachia, A. 2014. "A Survey on Concept Drift Adaptation," ACM Computing Surveys (CSUR) (46:4), pp. 1-37.
Hazelwood, K., Bird, S., Brooks, D., Chintala, S., Diril, U., Dzhulgakov, D., Fawzy, M., Jia, B., Jia, Y., and Kalro, A. 2018. "Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective," 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA): IEEE, pp. 620-629.
Lainder, A. D., and Wolfinger, R. D. 2022. "Forecasting with Gradient Boosted Trees: Augmentation, Tuning, and Cross-Validation Strategies: Winning Solution to the M5 Uncertainty Competition," International Journal of Forecasting, forthcoming.
Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., and Zhang, G. 2018. "Learning under Concept Drift: A Review," IEEE Transactions on Knowledge and Data Engineering (31:12), pp. 2346-2363.
Ma, S., and Fildes, R. 2021. "Retail Sales Forecasting with Meta-Learning," European Journal of Operational Research (288:1), pp. 111-128.
Montero-Manso, P., Athanasopoulos, G., Hyndman, R. J., and Talagala, T. S. 2020. "Fforma: Feature-Based Forecast Model Averaging," International Journal of Forecasting (36:1), pp. 86-92.
Probst, P., Wright, M. N., and Boulesteix, A. L. 2019. "Hyperparameters and Tuning Strategies for Random Forest," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery (9:3), p. e1301.
Schüritz, R., and Satzger, G. 2016. "Patterns of Data-Infused Business Model Innovation," 2016 IEEE 18th Conference on Business Informatics (CBI): IEEE, pp. 133-142.
Schwartz, E. M., Bradlow, E. T., and Fader, P. S. 2014. "Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data," Marketing Science (33:2), pp. 188-205.
Simester, D., Timoshenko, A., and Zoumpoulis, S. I. 2020. "Targeting Prospective Customers: Robustness of Machine-Learning Methods to Typical Data Challenges," Management Science (66:6), pp. 2495-2522.
Talagala, T. S., Hyndman, R. J., and Athanasopoulos, G. 2018. "Meta-Learning How to Forecast Time Series," Monash Econometrics and Business Statistics Working Papers (6:18), p. 16.
Tukey, J. W. 1962. "The Future of Data Analysis," The Annals of Mathematical Statistics (33:1), pp. 1-67.
Rauschmayr, N., Bhattacharjee, S., and Kumar, V. 2020. "Detecting and Analyzing Incorrect Model Predictions with Amazon SageMaker Model Monitor and Debugger," https://aws.amazon.com/blogs/machine-learning/detecting-and-analyzing-incorrect-model-predictions-with-amazon-sagemaker-model-monitor-and-debugger/

Description:
Master's thesis
National Chengchi University
Department of Management Information Systems
110356010
Source: http://thesis.lib.nccu.edu.tw/record/#G0110356010
Type: thesis
Other identifiers: G0110356010
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/142646
DOI: 10.6814/NCCU202201663
Format: application/pdf, 2214953 bytes
Table of contents:
Chapter 1 Introduction
Chapter 2 Literature Review
  Section 1 Concept Drift
    1. Definition and Types of Concept Drift
    2. Learning under Concept Drift
  Section 2 Time Series Features
Chapter 3 Research Framework and Methods
  Section 1 Data Generation and Models
    1. Data Generation
    2. Model Cross-Validation Methods
    3. Model Building
  Section 2 Experimental Simulation and Parameter Settings
Chapter 4 Analysis
  Section 1 Numerical Analysis
  Section 2 Validation Analysis
Chapter 5 Conclusions and Recommendations
References