Title: 基於市場相對價格型態之股價預測模型 (Stock price prediction based on relative price patterns)
Author: Tseng, Yu-Chan (曾祐展)
Advisor: Peng, Yan-Tsung (彭彥璁)
Keywords: Deep Learning; Multi-scale Analysis; Convolutional Neural Networks; Vision Transformer; Time Series
Date: 2025
Upload time: 1-Sep-2025 16:18:22 (UTC+8)

Abstract:
With the explosive growth of financial-market information and increasingly rapid price volatility, investors must analyse massive, complex data within extremely short time frames to formulate effective and timely trading strategies. Although traditional technical analysis can infer market trends via price patterns and technical indicators, it often falls short when confronted with the need for multi-scale data integration and real-time decision-making. This study proposes an innovative intelligent trading system that leverages deep learning to fuse multi-scale candlestick (K-line) charts, thereby enhancing both the accuracy and timeliness of market-trend prediction.

The research first maps price sequences at multiple time resolutions into image representations and employs three classification frameworks to extract features and predict market direction: a traditional linear Logistic Regression (LR) model, a Convolutional Neural Network (CNN), and a Transformer-based Vision Transformer (ViT). To verify the advantage of image-based approaches, raw numerical OHLCV sequences are also fed directly into a Long Short-Term Memory (LSTM) network for comparison, while a dual moving-average crossover strategy serves as a traditional baseline. Two non-overlapping one-year periods are used for out-of-sample evaluation, enabling assessment of model robustness and temporal transferability. Empirical results show that the proposed multi-scale intelligent trading system delivers stable performance across all model architectures, significantly improving predictive accuracy and trading profitability. By successfully integrating multidimensional market information, the system provides investors with a practical and reliable decision-support tool.

Description: Master's thesis
National Chengchi University (國立政治大學)
In-service Master's Program, Department of Computer Science (資訊科學系碩士在職專班)
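The abstract describes mapping price sequences at multiple time resolutions into images. The thesis's own preprocessing code is not reproduced on this record page; as a minimal sketch under assumed, standard conventions, coarser-scale OHLCV bars can be derived from finer ones before each scale is rendered as a candlestick chart (the function name and bar layout below are illustrative, not taken from the thesis):

```python
def aggregate_ohlcv(bars, group_size):
    """Merge consecutive fine-grained OHLCV bars into coarser bars.

    Each bar is a dict with keys: open, high, low, close, volume.
    Standard aggregation: first open, max high, min low, last close,
    summed volume. Illustrative helper, not code from the thesis.
    """
    coarse = []
    # Drop a trailing partial group so every coarse bar covers group_size bars.
    for i in range(0, len(bars) - len(bars) % group_size, group_size):
        chunk = bars[i:i + group_size]
        coarse.append({
            "open": chunk[0]["open"],
            "high": max(b["high"] for b in chunk),
            "low": min(b["low"] for b in chunk),
            "close": chunk[-1]["close"],
            "volume": sum(b["volume"] for b in chunk),
        })
    return coarse

# Example: five daily bars aggregated into one weekly bar.
daily = [
    {"open": 10, "high": 12, "low": 9,  "close": 11, "volume": 100},
    {"open": 11, "high": 13, "low": 10, "close": 12, "volume": 120},
    {"open": 12, "high": 14, "low": 11, "close": 13, "volume": 90},
    {"open": 13, "high": 13, "low": 10, "close": 10, "volume": 110},
    {"open": 10, "high": 11, "low": 8,  "close": 9,  "volume": 80},
]
weekly = aggregate_ohlcv(daily, 5)
```

Running the two scales through the same chart renderer then yields the multi-scale image inputs the abstract refers to.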
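The dual moving-average crossover baseline named in the abstract is the standard rule: signal long when the short-window average crosses above the long-window average, and short (or exit) on the opposite cross. A minimal sketch, with illustrative window lengths (the thesis's exact parameters are not stated on this page):

```python
def sma(prices, window):
    """Simple moving average; None until enough history exists."""
    return [
        sum(prices[i - window + 1:i + 1]) / window if i >= window - 1 else None
        for i in range(len(prices))
    ]

def crossover_signals(prices, short_win, long_win):
    """+1 on a golden cross (short SMA crosses above long SMA),
    -1 on a death cross, 0 otherwise."""
    s, l = sma(prices, short_win), sma(prices, long_win)
    signals = [0] * len(prices)
    for i in range(1, len(prices)):
        if s[i - 1] is None or l[i - 1] is None:
            continue  # not enough history yet
        if s[i - 1] <= l[i - 1] and s[i] > l[i]:
            signals[i] = 1
        elif s[i - 1] >= l[i - 1] and s[i] < l[i]:
            signals[i] = -1
    return signals

# A downtrend followed by an uptrend triggers one golden cross.
sigs = crossover_signals([5, 4, 3, 2, 3, 4, 5, 6], short_win=2, long_win=4)
```

Because the rule uses only past prices, it makes a fair traditional baseline against the learned models.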
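For the direction-classification setup, each sample is typically labeled by whether the next close rises, and the two non-overlapping one-year test periods imply strictly chronological splits with no shuffling. A sketch under those assumptions (helper names are illustrative; the thesis's exact labeling horizon is not given here):

```python
def direction_labels(closes):
    """Label each bar 1 if the next close is higher, else 0.
    The final bar has no successor and is dropped."""
    return [1 if closes[i + 1] > closes[i] else 0
            for i in range(len(closes) - 1)]

def chronological_splits(samples, test_len):
    """Hold out the last two non-overlapping blocks of test_len samples
    as separate out-of-sample periods; train on everything before them."""
    train = samples[:-2 * test_len]
    test_a = samples[-2 * test_len:-test_len]  # earlier test year
    test_b = samples[-test_len:]               # later test year
    return train, test_a, test_b

labels = direction_labels([100, 102, 102, 105, 101])
train, test_a, test_b = chronological_splits(list(range(10)), test_len=2)
```

Evaluating on two disjoint periods, as the abstract does, separates genuine robustness from luck in a single market regime.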
Student ID: 111971004
Identifier: G0111971004
Source: http://thesis.lib.nccu.edu.tw/record/#G0111971004
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/159292
Type: thesis
Format: application/pdf (4046628 bytes)

Table of Contents:
Acknowledgements i
Abstract (Chinese) ii
Abstract iii
Contents v
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
  1.1 Research Background 1
  1.2 Research Motivation and Objectives 3
Chapter 2 Literature Review 5
  2.1 Applications of Financial Time-Series Classification 5
  2.2 Research Methods 13
Chapter 3 Methodology 22
  3.1 Dataset 22
  3.2 Data Processing 24
  3.3 Model Architecture 28
  3.4 Experimental Environment and Training Parameters 32
Chapter 4 Experimental Results 34
  4.1 Evaluation Metrics 34
  4.2 Experimental Results 36
  4.3 Trading Performance 45
  4.4 Research Limitations and Future Directions 55
Chapter 5 Conclusion 57
References 59
