Title: 運用監督式學習於氣象觀測時序資料之自動異常偵測
Supervised Learning-Based Anomaly Detection of Meteorological Time Series Data
Author: 李韋霆 (Lee, Wei-Ting)
Contributors: 沈錳坤 (Shan, Man-Kwan); 李韋霆 (Lee, Wei-Ting)
Keywords: anomaly detection; supervised learning
Date: 2018
Upload time: 3-Sep-2018 16:02:28 (UTC+8)
Abstract:
Climate change has become an issue of global concern in recent years, and extreme weather events are increasingly frequent. Demand for climate monitoring and forecasting services is growing accordingly, which makes the accuracy of meteorological observation data all the more important. Checking data promptly and removing unreliable values improves the quality of the observations and substantially benefits weather forecasting operations.

To strengthen the data inspection procedure, this study uses the past five years of data from Central Weather Bureau observation stations, including both manned and automatic weather stations, and applies data mining theory and techniques to detect anomalies in real-time observation data automatically. In the future, the results can be combined with the Central Weather Bureau's existing inspection system to develop suitable data processing procedures, improving the efficiency of real-time inspection so that the data can be used in subsequent weather forecasting and analysis.

Using supervised learning, the study builds an anomaly detection model from the data of the manned weather station under inspection together with data from adjacent stations, processed along both spatial and temporal dimensions. The results indicate that the model has a certain level of inspection capability for temperature and relative humidity, which can reduce the time required for subsequent human intervention and allow unreliable data to be removed more efficiently in real time.

Beyond supporting weather professionals in analysis and forecasting, the inspected observation data can also be combined with various industries to provide value-added application services, benefiting a range of fields and operational applications.

Description: Master's
National Chengchi University
In-service Master's Program, Department of Computer Science
101971009
Source: http://thesis.lib.nccu.edu.tw/record/#G0101971009
Type: thesis
Advisor: 沈錳坤 (Shan, Man-Kwan)
Other identifiers: G0101971009
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/119969
DOI: 10.6814/THE.NCCU.EMCS.008.2018.B02
Table of contents:
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives and Methods
1.3 Contributions
1.4 Thesis Organization
Chapter 2 Related Work
2.1 Anomaly Detection
2.2 Time Series Applications
Chapter 3 Methodology and Procedure
3.1 Data Sources
3.2 Data Collection and Preprocessing
3.3 Neighboring Station Data Processing
3.3.1 Spatially Adjacent Stations
3.3.2 Time Series Similarity
3.4 Model Building
3.4.1 Random Forest
3.4.2 Gradient Boosting Decision Tree
3.4.3 Logistic Regression
Chapter 4 Experiments
4.1 Experimental Method
4.1.1 Input Variables
4.1.2 Classification
4.2 Experimental Results
Chapter 5 Conclusions and Future Research Directions
5.1 Conclusions
5.2 Future Work