NCCU Library: Theses
Title 透過漂移感知包裹損失和神經微分方程式實現穩健的PM2.5預測
Robust PM2.5 Forecasting via Drift-Aware Wrap Loss and Neural ODEs
Author 侯康力 (Hossen, Md Khalid)
Advisors 陳孟彰 (Chen, Meng Chang); 彭彥璁 (Peng, Yan-Tsung)
Keywords Data drift; Wrapped loss; PM2.5; Back-Loaded Connection (BLC); ODE
Date 2025
Uploaded 1-Sep-2025 15:48:13 (UTC+8)
Abstract
In many deep learning applications, it is typically assumed that training and testing data are drawn from the same underlying distribution. In practice, however, this assumption often fails, particularly when data are collected across different time periods or under varying environmental conditions. For example, a city's temperature patterns may differ significantly from year to year due to complex, often unpredictable factors.

In this study, we applied three different statistical methods to examine inter-annual data drift across multiple monitoring stations. These methods produced statistical significance indicators for each monitoring point, using simulated datasets that approximated the observations over five consecutive years (2014–2018), and thereby identified locations with substantial drift. This statistical evaluation allowed us to determine which stations experienced the most significant temporal variability.

To investigate the impact of data drift further, we analyzed meteorological, weather, and air-quality datasets. Based on these insights, we developed two novel prediction models designed to account for such distributional changes in hourly PM2.5 forecasts: the Front-Loaded Connection (FLC) model and the Back-Loaded Connection (BLC) model. We benchmarked these models against several established deep learning architectures, including Long Short-Term Memory (LSTM) and its variants, across prediction horizons ranging from 1 to 64 hours ahead. Moreover, we developed a wrapped loss function that enhances model training by explicitly addressing data drift. When integrated into both baseline and custom models, this loss consistently improved predictive accuracy across the RMSE, MAE, and MAPE evaluation metrics. The FLC and BLC models effectively mitigate the impact of data drift, and when paired with the wrapped loss, they further improve model reliability and precision.
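As a rough illustration of the kind of per-station drift check described above, the sketch below compares two hypothetical years of PM2.5 readings with the Jensen-Shannon divergence, one common divergence measure for distribution shift. The sample values, bin edges, and choice of statistic are illustrative assumptions, not the dissertation's exact procedure.

```python
# Illustrative drift check (assumed setup, not the thesis's exact method):
# compare two years of PM2.5 readings at one station via the
# Jensen-Shannon divergence between their binned distributions.
import math

def histogram(values, edges):
    """Normalized histogram of `values` over the bins defined by `edges`."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts) or 1
    return [c / total for c in counts]

def kl(p, q):
    """Kullback-Leibler divergence in bits, skipping empty bins."""
    return sum(pi * math.log2(pi / qi)
               for pi, qi in zip(p, q) if pi > 0 and qi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] (base-2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical yearly samples for one monitoring station (ug/m3).
year_2014 = [12, 18, 25, 31, 22, 19, 27, 35, 41, 16]
year_2018 = [45, 52, 38, 61, 49, 55, 43, 58, 36, 47]
edges = [0, 20, 40, 60, 80]

p = histogram(year_2014, edges)
q = histogram(year_2018, edges)
score = js_divergence(p, q)
print(f"JS divergence between years: {score:.3f}")  # larger means stronger drift
```

In a full pipeline, a score like this would be computed per station and paired with a significance test before declaring drift.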
Experimental results demonstrate that our proposed models outperform conventional baselines. Specifically, compared with the BILSTM model, performance improved by 24.1% to 16% for 1–24 h forecasts and by 12% to 8.3% for 32–64 h forecasts. Compared with the CNN model, improvements ranged from 24.6% to 11.8% and from 10% to 10.2% over the same intervals.

Beyond drift handling, predicting time-series data is inherently complex, which has spurred the development of advanced neural network approaches. Monitoring and predicting PM2.5 levels is especially challenging because diverse natural and anthropogenic factors interact to influence its dispersion, making accurate prediction both costly and intricate. A key challenge lies in the variability of PM2.5 concentrations, whose data distribution fluctuates significantly over time. Neural networks, meanwhile, provide a cost-effective and highly accurate means of managing such complexity. Deep learning models such as LSTM and BILSTM have been widely applied to PM2.5 prediction; however, prediction errors grow as the forecasting window expands from 1 to 72 hours, underscoring the rising uncertainty of longer-term predictions.

In this study, we also adopted Neural Ordinary Differential Equations (Neural ODEs) to improve time-series prediction. As continuous-time neural networks, Neural ODEs excel at modeling the intricate dynamics of time-series data, offering a robust alternative to traditional LSTM models. We propose two ODE-based models: a transformer-based ODE model and a closed-form ODE model. Empirical evaluations show that these models significantly enhance prediction accuracy, with improvements ranging from 2.91% to 14.15% for 1-hour to 72-hour predictions compared with LSTM-based models. Moreover, a paired t-test found the RMSE values of the proposed model (CCCFC) to be significantly different from those of BILSTM, LSTM, GRU, ODE-LSTM, and PCNN.
This implies that CCCFC demonstrates a distinct performance advantage, reinforcing its effectiveness in hourly PM2.5 forecasting.
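To make the continuous-time idea concrete, here is a minimal sketch of a Neural ODE hidden state evolving as dh/dt = f(h), integrated with a fixed-step Euler solver. The vector field, weights, and step counts are toy assumptions; this does not reproduce the TRCFC or CCCFC architectures or any trained model from the dissertation.

```python
# Toy continuous-time hidden-state dynamics (illustrative only):
# dh/dt = tanh(W*h + b), integrated with forward Euler.
import math

def f(h, weights, bias):
    """A tiny 'learned' vector field: tanh(W*h + b), one row per output."""
    return [math.tanh(sum(w * x for w, x in zip(row, h)) + b0)
            for row, b0 in zip(weights, bias)]

def odeint_euler(h0, weights, bias, t0=0.0, t1=1.0, steps=50):
    """Integrate dh/dt = f(h) from t0 to t1 with a fixed-step Euler solver."""
    h = list(h0)
    dt = (t1 - t0) / steps
    for _ in range(steps):
        dh = f(h, weights, bias)
        h = [hi + dt * di for hi, di in zip(h, dh)]
    return h

# Hand-picked toy weights for a 2-D hidden state (purely illustrative).
W = [[0.5, -0.3], [0.2, 0.4]]
b = [0.1, -0.1]
h_final = odeint_euler([1.0, 0.0], W, b)
print("h(t1) =", [round(x, 4) for x in h_final])
```

In practice a Neural ODE uses an adaptive solver and learns the vector field by backpropagating through (or adjoint-solving) the integration; the fixed-step Euler loop above only shows the continuous-time state update that distinguishes these models from discrete recurrences like LSTM.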
Description PhD dissertation
National Chengchi University
社群網路與人智計算國際研究生博士學位學程(TIGP) (TIGP International Graduate Program in Social Networks and Human-Centered Computing)
Student ID 107761503
Source http://thesis.lib.nccu.edu.tw/record/#G0107761503
Type thesis
Identifier G0107761503
URI https://nccur.lib.nccu.edu.tw/handle/140.119/159210
dc.description (描述) 博士zh_TW
dc.description (描述) 國立政治大學zh_TW
dc.description (描述) 社群網路與人智計算國際研究生博士學位學程(TIGP)zh_TW
dc.description (描述) 107761503zh_TW
dc.description.abstract (摘要) 在許多深度學習應用中,通常假設訓練資料與測試資料來自相同的 基礎分布。然而,這項前提在實際中往往並不成立,特別是當資料蒐 集橫跨不同時間區段或處於不同環境條件時。例如,一個城市的氣溫 模式可能因為複雜且難以預測的因素,每年之間存在顯著差異。 在本研究中,我們採用了三種不同的統計方法,來檢視多個監測站 點在不同年份之間的資料漂移(data drift)情形。這些方法透過模擬資 料集以近似2014年至2018年間的觀測資料,並為每一個監測點產出統 計顯著性指標,藉此辨識出具有顯著資料漂移的地點。此統計評估讓 我們能夠確認哪些監測站點在時間上出現了明顯的變異。 為了進一步探討資料漂移的影響,我們分析了氣象資料、天氣資 料與空氣品質資料。根據這些分析結果,我們設計了兩種新穎的預測模型,用以因應每小時PM2.5預測中可能發生的分布變化,分別為前 置連接模型(Front-Loaded Connection, FLC)與後置連接模型(BackLoaded Connection, BLC)。我們將這些模型與多種現有的深度學習架 構(如長短期記憶模型LSTM及其變體)進行比較,預測範圍涵蓋從未 來1小時至第64小時。 此外,我們開發了一種包裝損失函數(wrapped loss function),可 在訓練階段中明確處理資料漂移問題。當該損失函數被整合至基準模 型及自訂模型中時,模型的預測準確度在各項評估指標(RMSE、MAE 與MAPE)上皆有穩定提升。 FLC與BLC模型能有效緩解資料漂移的影響,若搭配包裝損失函數 使用,更能進一步提升模型的穩定性與準確性。實驗結果顯示,我們 所提出的模型優於傳統基準模型。具體而言,與BILSTM模型相比,在1至24小時預測中表現提升24.1%至16%;在32至64小時預測中則提 升12%至8.3%。與CNN模型相比,改善幅度為24.6%至11.8%(1–24小 時),以及10%至10.2%(32–64小時)。 另一方面,時間序列資料的預測本質上具有高度複雜性,這也促使 了先進神經網路技術的發展。在PM2.5濃度的監測與預測任務中更具 挑戰性,因為其擴散過程受到多種自然與人為因素交互影響,使得準 確預測變得既困難又成本高昂。PM2.5預測的一項關鍵挑戰在於其數 據分布變異性高,隨時間變動劇烈。與此同時,神經網路提供了一個 具成本效益且準確度高的解決方案,可有效應對這些複雜情形。 像LSTM與BILSTM這類深度學習模型已廣泛應用於PM2.5預測任務 中。然而,隨著預測時間視窗從1小時擴展至72小時,預測誤差也隨之 增加,突顯出長期預測的不確定性。 在本研究中,我們採用了神經常微分方程(Neural Ordinary Differential Equations, Neural ODEs)來提升時間序列預測的表現。作為 連續時間神經網路,Neural ODEs 擅長建模時間序列資料中的複雜動 態,為傳統LSTM模型提供一個更穩健的替代方案。我們提出了兩種 基於ODE的模型:一種是基於Transformer架構的ODE模型,另一種為 封閉形式ODE模型(Closed-form ODE model)。實證結果顯示,這些 模型在1至72小時預測任務中的預測準確度顯著提升,相較於LSTM模 型,改善幅度介於2.91%至14.15%之間。 此外,經由配對t檢定分析,我們所提出的模型CCCFC在RMSE評 估指標上與BILSTM、LSTM、GRU、ODE-LSTM及PCNN模型的結果 存在顯著差異,進一步驗證了CCCFC在每小時PM2.5預測任務中的表 現優勢。zh_TW
dc.description.abstract (摘要) In many deep learning applications, it is typically presumed that training and testing data are drawn from the same underlying distribution. Nevertheless, this presupposition often does not remain valid for empirical data collected across different time periods or under varying environmental conditions. For example, a city’s temperature patterns may differ significantly from year to year due to complex, often unpredictable factors. In this study, we applied three different statistical methods to examine inter-annual data drift across multiple monitoring stations. The aforementioned approaches were applied to ascertain statistical significance indicators for each monitoring point by conceiving simulated datasets that approximated the observations across five continuous years (2014–2018), thereby identifying locations with substantial data drift. This statistical evaluation allowed us to determine which stations experienced the most significant temporal variability. To investigate the impact of data drift, meteorological data, weather, and air quality datasets are applied. Based on these insights, we developed two novel prediction models designed to account for such distributional changes in hourly PM2.5 forecasts: the Front-Loaded Connection (FLC) model and the Back-Loaded Connection (BLC) model. We benchmarked these models against several established deep learning architectures, including Long ShortTerm Memory (LSTM) and its variants, across prediction windows ranging from the +1 hour up to the next +64th hour. Moreover, we developed a wrapped loss function, which enhances model training by explicitly addressing the issue of data drift. When integrated into both baseline and custom models, this modified loss function consistently improved predictive accuracy, of evaluation metrics of RMSE, MAE, and MAPE. 
The FLC and BLC models effectively mitigate the impact of data drift, and when paired with the wrapped loss, they further boost model reliability and precision. Experimental results demonstrate that our proposed models outperform conventional baselines. Specifically, in comparisons with the BILSTM model, the performance improved by 24.1% to 16% for 1h–24h forecasts and 12% to 8.3% for 32h–64h forecasts. Compared with the CNN model, improvements ranged from 24.6% to 11.8% and 10% to 10.2% across the same time intervals. Secondly, predicting time-series data is inherently complex, spurring the development of advanced neural network approaches. Monitoring and predicting PM2.5 levels is especially challenging due to the interplay of diverse natural and anthropogenic factors influencing its dispersion, making accurate predictions both costly and intricate. A key challenge in predicting PM2.5 concentrations lies in its variability, as the data distribution fluctuates significantly over time. Meanwhile, neural networks provide a cost-effective and highly accurate solution in managing such complexities. Deep learning models like LSTM and BILSTM have been widely applied to PM2.5 prediction tasks. However, prediction errors increase as the forecasting window expands from 1 to 72 hours, underscoring the rising uncertainty in longer-term predictions. In this study, Neural Ordinary Differential Equations (Neural ODEs) were adopted to improve performance in time-series prediction tasks. As continuous-time neural networks, Neural ODEs excel in modeling the intricate dynamics of time-series data, presenting a robust alternative to traditional LSTM models. We propose two ODE-based models: a transformer-based ODE model and a closed-form ODE model. Empirical evaluations show these models significantly enhance prediction accuracy, with improvements ranging from 2.91%-14.15% for 1-hour to 72-hour predictions when compared to LSTM-based models. 
Moreover, a paired t-test showed that the RMSE values of the proposed model (CCCFC) differ significantly from those of BILSTM, LSTM, GRU, ODE-LSTM, and PCNN. This indicates that CCCFC offers a distinct performance advantage, reinforcing its effectiveness in hourly PM2.5 forecasting.en_US
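The paired t-test mentioned above can be sketched as follows; the RMSE figures here are hypothetical placeholders (not results from the dissertation), and the 2.262 critical value is the standard two-sided 5% threshold for 9 degrees of freedom:

```python
import numpy as np

def paired_t_statistic(rmse_a, rmse_b):
    """Paired t statistic over matched RMSE scores of two models
    (e.g. one score per forecasting run or station)."""
    d = np.asarray(rmse_a, float) - np.asarray(rmse_b, float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# Hypothetical paired RMSE values for CCCFC vs. BILSTM (illustrative only).
rmse_cccfc  = [11.2, 10.8, 12.1, 11.5, 10.9, 11.7, 12.0, 11.3, 11.1, 11.6]
rmse_bilstm = [12.5, 12.1, 13.0, 12.8, 12.2, 12.9, 13.3, 12.6, 12.4, 12.7]

t = paired_t_statistic(rmse_cccfc, rmse_bilstm)
# Two-sided 5% critical value for df = 9 is about 2.262; |t| above it
# indicates a statistically significant RMSE difference between models.
significant = abs(t) > 2.262
```

A negative t with |t| above the critical value would mean the first model's RMSE is significantly lower, which is the direction of the advantage claimed for CCCFC above.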
dc.description.tableofcontents Dedication I Acknowledgments II 中文摘要 IV Abstract VI 1 Introduction 1 1.1 Background of the Study . . . 1 1.2 Motivation . . . 5 1.3 Contribution . . . 6 1.4 Dissertation Organization . . . 7 2 PM2.5 Data Drifting and its Verifications 8 2.1 Overview . . . 8 2.2 Time Series Analysis Methods Applied to PM2.5 Prediction . . . 8 2.3 Data Shifts Detection . . . 10 2.4 Applying Transfer Learning Techniques to Address Data Shift . . . 12 2.5 Multi-output Weighting Strategies for Loss Functions . . . 12 2.5.1 Neural Networks for PM2.5 Time Series Prediction . . . 14 2.5.2 Liquid Time Constant in Time Series Prediction . . . 15 2.5.3 Continuous-time Neural Networks . . . 16 2.6 PM2.5 Data Drifting . . . 17 2.7 Confirming the Presence of Data Drift with Statistical Techniques . . . 18 2.7.1 Divergence Measures . . . 18 2.7.2 Pearson’s Correlation (PC) . . . 21 2.7.3 Analyzing P Values of Measures . . . 23 2.8 Summary . . . 25 3 Proposed Model and Wrap Loss Function 26 3.1 Overview . . . 26 3.2 Wrapped Loss Function . . . 26 3.3 Empirical Study of PM2.5 Prediction . . . 28 3.3.1 Data Collection and Preprocessing . . . 28 3.3.2 Development of the Prediction Model . . . 30 3.3.3 FLC and BLC Models . . . 32 3.4 Prediction Results . . . 32 3.5 Evaluation . . . 37 3.5.1 Discussion of RMSE Values . . . 37 3.5.2 MAE (μg/m3) Value Analysis . . . 38 3.5.3 Discussion of MAPE Percentage . . . 39 3.6 Summary . . . 39 3.7 Appendix Results . . . 41 4 ODE-based Neural Network for PM2.5 Prediction 45 4.1 Overview . . . 45 4.2 Introduction . . . 45 4.3 Differences between Baseline and Proposed Models . . . 49 4.3.1 Novelty of Approach for TRCFC and CCCFC Models versus LSTM . . . 49 4.4 Applied Baseline Models and Proposed Models . . . 50 4.4.1 LSTM, BILSTM, GRU, CNN-LSTM and PCNN . . . 50 4.4.2 ODE-LSTM . . . 50 4.4.3 Common Convolutional Closed-form Continuous-Time Neural Networks (CCCFC) . . . 52 4.4.4 Transformer Model . . . 52 4.4.5 Self-Attention in Transformer Architecture . . . 53 4.5 Proposed CCCFC and TRCFC Models . . . 55 4.5.1 Common Convolutional Closed-form Continuous-Time Neural Networks (CCCFC) Model Architecture . . . 55 4.5.2 Transformer-Closed-form Continuous (TRCFC) Model Architecture . . . 56 4.6 Experimental Design . . . 58 4.6.1 Hyperparameter Settings for the CCCFC and TRCFC Models . . . 60 4.7 Empirical Studies of PM2.5 Prediction . . . 61 4.7.1 Datasets and Processing Description . . . 61 4.7.2 Evaluation Metrics . . . 62 4.8 Prediction Result Analysis . . . 63 4.8.1 RMSE Prediction Result Analysis . . . 63 4.8.2 RMSE Value t-test . . . 65 4.8.3 MAE Value Result Evaluation . . . 66 4.8.4 MAPE and R2 Value Result Evaluation . . . 68 4.8.5 Loss Curve . . . 70 4.9 Deployment Feasibility . . . 70 4.9.1 Limitations . . . 71 4.9.2 Discussion of the Impact of the Enhanced CCFC Model Performance . . . 71 4.10 Discussion of Wrap Loss with the Proposed CCCFC and TRCFC Models . . . 72 5 Conclusion 74 5.1 Conclusion . . . 74 5.2 Limitations and Future Directions . . . 75 Bibliography 77 Publications 87zh_TW
dc.format.extent 3089353 bytes-
dc.format.mimetype application/pdf-
dc.source.uri (資料來源) http://thesis.lib.nccu.edu.tw/record/#G0107761503en_US
dc.subject (關鍵詞) 資料漂移zh_TW
dc.subject (關鍵詞) 被包裹的損失函數zh_TW
dc.subject (關鍵詞) PM2.5zh_TW
dc.subject (關鍵詞) 後置連接zh_TW
dc.subject (關鍵詞) 常微分方程zh_TW
dc.subject (關鍵詞) Data Driften_US
dc.subject (關鍵詞) Wrapped lossen_US
dc.subject (關鍵詞) PM2.5en_US
dc.subject (關鍵詞) BLCen_US
dc.subject (關鍵詞) ODEen_US
dc.title (題名) 透過漂移感知包裹損失和神經微分方程式實現穩健的PM2.5預測zh_TW
dc.title (題名) Robust PM2.5 Forecasting via Drift-Aware Wrap Loss and Neural ODEsen_US
dc.type (資料類型) thesisen_US