Title 概念飄移下的監督式學習:時序特徵與訓練策略
Supervised learning under concept drift: time series features and training strategies
Author 黃羽婕
Huang, Yu-Chieh
Contributors 莊皓鈞
Chuang, Hao-Chun
黃羽婕
Huang, Yu-Chieh
Keywords Machine learning
Training strategies
Concept drift
Time series features
Date 2022
Upload time 2-Dec-2022 15:21:40 (UTC+8)
Abstract In recent years, companies have come to rely increasingly on machine learning models. As data volumes grow sharply and the environment in which a model is built and operated changes over time, concept drift readily arises, and when it does, model predictive performance degrades. This research focuses on how to help analysts quickly analyze time series data under concept drift and use the features of the data to identify a better model training strategy, thereby improving predictive performance. We adopt the Purged K-Fold and Augmentation methods used in previous literature and include them in the experiments to observe how well they suit data with concept drift. In the first stage of the experiment, nine kinds of concept drift data are simulated to represent the various forms concept drift can take, and each is paired with four model training strategies to observe model performance. In the second stage, the extracted time series features are combined with the model performance under the four training strategies to identify relationships between specific time series features and training strategies. According to the results, the training strategies adopted in this thesis can effectively improve model predictive performance when specific time series features are present.
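The abstract names Purged K-Fold as one of the cross-validation approaches, but this record does not spell out the procedure. As a rough illustration only, the sketch below shows one common formulation: contiguous, time-ordered test folds, with training observations inside a gap around each test fold dropped to limit leakage between neighboring time steps. The function name `purged_kfold_indices`, the `purge` parameter, and the toy sizes are assumptions made for this example, not details taken from the thesis.

```python
from typing import Iterator, Tuple

import numpy as np


def purged_kfold_indices(
    n_samples: int, n_splits: int = 5, purge: int = 3
) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (train_idx, test_idx) pairs for contiguous folds with a purge gap.

    `purge` is a hypothetical parameter: the number of observations removed
    from the training set on each side of the test block.
    """
    indices = np.arange(n_samples)
    # np.array_split preserves time order and tolerates uneven fold sizes.
    for test_idx in np.array_split(indices, n_splits):
        start, stop = test_idx[0], test_idx[-1] + 1
        # Keep only training points at least `purge` steps outside the test block.
        keep = (indices < start - purge) | (indices >= stop + purge)
        yield indices[keep], test_idx


# Toy usage: 20 time-ordered observations, 4 folds, gap of 2 on each side.
for train_idx, test_idx in purged_kfold_indices(n_samples=20, n_splits=4, purge=2):
    print("test:", test_idx.tolist(), "train:", train_idx.tolist())
```

Each yielded pair can be used to index a time-ordered feature matrix and target vector. A larger `purge` trades training data for a wider buffer between training and test periods.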
References Barton, D., and Court, D. 2012. "Making Advanced Analytics Work for You," Harvard Business Review (90:10), pp. 78-83.

Cai, J., Luo, J., Wang, S., and Yang, S. 2018. "Feature Selection in Machine Learning: A New Perspective," Neurocomputing (300), pp. 70-79.

Fokkema, M., and Strobl, C. 2020. "Fitting Prediction Rule Ensembles to Psychological Research Data: An Introduction and Tutorial," Psychological Methods (25:5), pp. 636-652.

Gama, J., Medas, P., Castillo, G., and Rodrigues, P. 2004. "Learning with Drift Detection," Brazilian Symposium on Artificial Intelligence: Springer, pp. 286-295.

Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., and Bouchachia, A. 2014. "A Survey on Concept Drift Adaptation," ACM Computing Surveys (CSUR) (46:4), pp. 1-37.

Hazelwood, K., Bird, S., Brooks, D., Chintala, S., Diril, U., Dzhulgakov, D., Fawzy, M., Jia, B., Jia, Y., and Kalro, A. 2018. "Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective," 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA): IEEE, pp. 620-629.

Lainder, A. D., and Wolfinger, R. D. 2022. "Forecasting with Gradient Boosted Trees: Augmentation, Tuning, and Cross-Validation Strategies: Winning Solution to the M5 Uncertainty Competition," International Journal of Forecasting, forthcoming.

Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., and Zhang, G. 2018. "Learning under Concept Drift: A Review," IEEE Transactions on Knowledge and Data Engineering (31:12), pp. 2346-2363.

Ma, S., and Fildes, R. 2021. "Retail Sales Forecasting with Meta-Learning," European Journal of Operational Research (288:1), pp. 111-128.

Montero-Manso, P., Athanasopoulos, G., Hyndman, R. J., and Talagala, T. S. 2020. "Fforma: Feature-Based Forecast Model Averaging," International Journal of Forecasting (36:1), pp. 86-92.

Probst, P., Wright, M. N., and Boulesteix, A. L. 2019. "Hyperparameters and Tuning Strategies for Random Forest," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery (9:3), p. e1301.

Schüritz, R., and Satzger, G. 2016. "Patterns of Data-Infused Business Model Innovation," 2016 IEEE 18th Conference on Business Informatics (CBI): IEEE, pp. 133-142.

Schwartz, E. M., Bradlow, E. T., and Fader, P. S. 2014. "Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data," Marketing Science (33:2), pp. 188-205.

Simester, D., Timoshenko, A., and Zoumpoulis, S. I. 2020. "Targeting Prospective Customers: Robustness of Machine-Learning Methods to Typical Data Challenges," Management Science (66:6), pp. 2495-2522.

Talagala, T. S., Hyndman, R. J., and Athanasopoulos, G. 2018. "Meta-Learning How to Forecast Time Series," Monash Econometrics and Business Statistics Working Papers (6:18), p. 16.

Tukey, J. W. 1962. "The Future of Data Analysis," The Annals of Mathematical Statistics (33:1), pp. 1-67.

Rauschmayr, N., Bhattacharjee, S., and Kumar, V. 2020. "Detecting and Analyzing Incorrect Model Predictions with Amazon SageMaker Model Monitor and Debugger," https://aws.amazon.com/blogs/machine-learning/detecting-and-analyzing-incorrect-model-predictions-with-amazon-sagemaker-model-monitor-and-debugger/
Description Master's thesis
National Chengchi University
Department of Management Information Systems
110356010
Source http://thesis.lib.nccu.edu.tw/record/#G0110356010
Type thesis
dc.contributor.advisor 莊皓鈞 [zh_TW]
dc.contributor.advisor Chuang, Hao-Chun [en_US]
dc.date.accessioned 2-Dec-2022 15:21:40 (UTC+8)
dc.date.available 2-Dec-2022 15:21:40 (UTC+8)
dc.date.issued (Upload time) 2-Dec-2022 15:21:40 (UTC+8)
dc.identifier (Other Identifiers) G0110356010 [en_US]
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/142646
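dc.description.tableofcontents Chapter 1 Introduction 1
Chapter 2 Literature Review 4
Section 1 Concept Drift 4
1. Definition and Types of Concept Drift 4
2. Learning under Concept Drift 5
Section 2 Time Series Features 7
Chapter 3 Research Framework and Methods 9
Section 1 Data Generation and Models 9
1. Data Generation 9
2. Model Cross-Validation Methods 13
3. Model Building 14
Section 2 Experimental Simulation and Parameter Settings 16
Chapter 4 Analysis 18
Section 1 Numerical Analysis 18
Section 2 Validation Analysis 23
Chapter 5 Conclusions and Recommendations 28
References 31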
dc.format.extent 2214953 bytes
dc.format.mimetype application/pdf
dc.identifier.doi (DOI) 10.6814/NCCU202201663 [en_US]