Title: 基於大型語言模型的可解釋性混合專家系統:多模態資料驅動的創新自動交易框架
Title (English): Interpretable Mixture of Experts Based on Large Language Models: A Multimodal Data-Driven Framework for Innovative Automated Trading
Author: 劉冠銘 (Liu, Kuan-Ming)
Advisor: 江彌修 (Chiang, Mi-Hsiu)
Keywords: 大型語言模型 (Large Language Models); 混合專家模型 (Mixture-of-Experts); 動態交易策略 (Dynamic Trading Strategies)
Date: 2025
Uploaded: 1 July 2025, 15:16:48 (UTC+8)

Abstract
With the rapid development of deep learning and large language models (LLMs), the application of Mixture-of-Experts (MoE) models in stock investment has gained momentum. While these models demonstrate excellent trading performance, most remain limited to single-modal data processing, overlooking the rich information provided by other modalities such as textual data. Additionally, traditional neural network-based router selection mechanisms fail to adequately consider contextual and real-world nuances, leading to suboptimal expert selection. To address these issues, this study proposes a novel framework that incorporates large language models as routers within the MoE architecture, leveraging the pre-trained world knowledge and reasoning capabilities of LLMs to dynamically select experts for processing historical price data and stock news. This approach not only improves the efficiency and accuracy of expert selection but also enhances model interpretability. Experimental results show that the proposed model framework, based on multimodal real stock data, significantly outperforms traditional MoE models and other deep neural network methods across multiple core metrics including Total Return (TR), Sharpe Ratio (SR), and Calmar Ratio (CR). LLMoE achieves more effective expert selection and trading decisions by integrating numerical data with textual data, demonstrating superior risk-adjusted performance. Furthermore, the framework's flexible architectural design can be easily adapted to various downstream tasks, while its high transparency enhances the credibility of trading decisions through natural language reasoning. In conclusion, this study provides an innovative intelligent trading solution that addresses the shortcomings of traditional models and opens new directions for future applications and research in financial markets.

References

Ding, H., Li, Y., Wang, J., & Chen, H. (2024a). Large language model agent in financial trading: A survey. arXiv preprint arXiv:2408.06361.
Ding, Q., Shi, H., & Liu, B. (2024b). TradExpert: Revolutionizing trading with mixture of expert LLMs. arXiv preprint arXiv:2411.00782.
Hu, Z., Liu, W., Bian, J., Liu, X., & Liu, T.-Y. (2018). Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining, 261–269.
Iacovides, G., Konstantinidis, T., Xu, M., & Mandic, D. (2024). FinLlama: LLM-based financial sentiment analysis for algorithmic trading. In Proceedings of the 5th ACM International Conference on AI in Finance, 134–141.
Innovations, B. (2018). Stock price and news related to it. Kaggle dataset. Available at: https://www.kaggle.com/datasets/BidecInnovations/stock-price-and-news-realted-to-it/ (Accessed: 2025-05-06).
Jin, M., Wang, S., Ma, L., Chu, Z., Zhang, J. Y., Shi, X., Chen, P.-Y., Liang, Y., Li, Y.-F., Pan, S., et al. (2023). Time-LLM: Time series forecasting by reprogramming large language models. arXiv preprint arXiv:2310.01728.
Gu, W., Zhong, Y., Li, S., Wei, C., Dong, L., Wang, Z., & Yan, C. (2024). Predicting stock prices with FinBERT-LSTM: Integrating news sentiment analysis. In Proceedings of the 2024 8th International Conference on Cloud and Big Data Computing, 67–72.
Kou, Z., Yu, H., Peng, J., & Chen, L. (2024). Automate strategy finding with LLM in quant investment. arXiv preprint arXiv:2409.06289.
Li, K., & Xu, J. (2023). An attention-based multi-gate mixture-of-experts model for quantitative stock selection. International Journal of Trade, Economics and Finance, 14(3), 165–173.
Li, Y., Yu, Y., Li, H., Chen, Z., & Khashanah, K. (2023). TradingGPT: Multi-agent system with layered memory and distinct characters for enhanced financial trading performance. arXiv preprint arXiv:2309.03736.
Lopez-Lira, A., & Tang, Y. (2023). Can ChatGPT forecast stock price movements? Return predictability and large language models. arXiv preprint arXiv:2304.07619.
Sawhney, R., Agarwal, S., Wadhwa, A., & Shah, R. (2020). Deep attentive learning for stock movement prediction from social media text and company correlations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 8415–8426.
Shen, S., Hou, L., Zhou, Y., Du, N., Longpre, S., Wei, J., Chung, H. W., Zoph, B., Fedus, W., Chen, X., et al. (2023). Mixture-of-experts meets instruction tuning: A winning combination for large language models. arXiv preprint arXiv:2305.14705.
Shi, X., Wang, S., Nie, Y., Li, D., Ye, Z., Wen, Q., & Jin, M. (2024). Time-MoE: Billion-scale time series foundation models with mixture of experts. arXiv preprint arXiv:2409.16040.
Soun, Y., Yoo, J., Cho, M., Jeon, J., & Kang, U. (2022). Accurate stock movement prediction with self-supervised learning from sparse noisy tweets. In 2022 IEEE International Conference on Big Data (Big Data), 1691–1700. IEEE.
Sun, S., Wang, X., Xue, W., Lou, X., & An, B. (2023). Mastering stock markets with efficient mixture of diversified trading experts. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2109–2119.
Toner, W., & Darlow, L. (2024). An analysis of linear time series forecasting models. arXiv preprint arXiv:2403.14587.
Vallarino, D. (2024). A dynamic approach to stock price prediction: Comparing RNN and mixture of experts models across different volatility profiles. arXiv preprint arXiv:2410.07234.
Xu, W., Liu, W., Xu, C., Bian, J., Yin, J., & Liu, T.-Y. (2021). REST: Relational event-driven stock trend forecasting. In Proceedings of the Web Conference 2021, 1–10.
Yoo, J., Soun, Y., Park, Y.-C., & Kang, U. (2021). Accurate multivariate stock movement prediction via data-axis transformer with multi-level contexts. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2037–2045.
Yu, Y., Li, H., Chen, Z., Jiang, Y., Li, Y., Zhang, D., Liu, R., Suchow, J. W., & Khashanah, K. (2024a). FinMEM: A performance-enhanced LLM trading agent with layered memory and character design. In Proceedings of the AAAI Symposium Series, 3, 595–597.
Yu, Z., Wu, Y., Wang, G., & Weng, H. (2024b). MIGA: Mixture-of-experts with group aggregation for stock market prediction. arXiv preprint arXiv:2410.02241.
Zeng, A., Chen, M., Zhang, L., & Xu, Q. (2023a). Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, 37, 11121–11128.
Zeng, Z., Kaur, R., Siddagangappa, S., Rahimi, S., Balch, T., & Veloso, M. (2023b). Financial time series forecasting using CNN and transformer. arXiv preprint arXiv:2304.04912.
Zhao, H., Liu, Z., Wu, Z., Li, Y., Yang, T., Shu, P., Xu, S., Dai, H., Zhao, L., Mai, G., et al. (2024). Revolutionizing finance with LLMs: An overview of applications and insights. arXiv preprint arXiv:2401.11641.
Zhou, Y., Lei, T., Liu, H., Du, N., Huang, Y., Zhao, V., Dai, A. M., Le, Q. V., Laudon, J., et al. (2022). Mixture-of-experts with expert choice routing. Advances in Neural Information Processing Systems, 35, 7103–7114.

Description: Master's thesis
National Chengchi University
Department of Money and Banking (金融學系)
Student ID: 112352014
Source: http://thesis.lib.nccu.edu.tw/record/#G0112352014
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/157832
Type: thesis
Format: application/pdf, 2,102,382 bytes

Table of Contents

Chapter 1 Introduction
  1.1 Research Background
  1.2 Research Objectives
Chapter 2 Literature Review
  2.1 Deep Learning in Finance
  2.2 Mixture-of-Experts Models
  2.3 Language Models and Trading
Chapter 3 Methodology
  3.1 Architecture Overview
    3.1.1 Data and Preprocessing
    3.1.2 Router Design
    3.1.3 Expert Models
    3.1.4 Strategy Generation and Evaluation
  3.2 Problem Formulation and Hypotheses
    3.2.1 Input Definition
    3.2.2 Prediction Target
    3.2.3 Multimodal Integration
    3.2.4 Research Hypotheses
    3.2.5 Framework Limitations
    3.2.6 Traditional MoE Models
    3.2.7 Drawbacks of Traditional MoE
  3.3 LLM Router
    3.3.1 Limitations of Traditional Routers
    3.3.2 Advantages of LLMs
    3.3.3 Optimistic/Pessimistic Classification
    3.3.4 Routing Algorithm
  3.4 Expert Models
    3.4.1 Architecture
    3.4.2 Training
    3.4.3 Optimistic and Pessimistic Experts
Chapter 4 Empirical Results
  4.1 Design and Evaluation
    4.1.1 Strategy Design
    4.1.2 Evaluation Metrics
    4.1.3 Baseline Models
  4.2 Experimental Setup
    4.2.1 Data Analysis
    4.2.2 Feature Engineering
    4.2.3 Router Implementation
    4.2.4 Expert Architecture
    4.2.5 Training and Settings
  4.3 Routing Analysis
    4.3.1 Confidence Definition
    4.3.2 Strategy Distribution
    4.3.3 Reasoning Case Studies
    4.3.4 Classes and Price Trends
  4.4 Prediction Comparison
    4.4.1 Accuracy
    4.4.2 Expert Performance
  4.5 Backtest Results
    4.5.1 Performance Comparison
    4.5.2 Risk Analysis
  4.6 Expert Usage and Routing
    4.6.1 Usage Statistics
    4.6.2 Mechanism Evaluation
    4.6.3 Architectural Implications
Chapter 5 Conclusions and Outlook
  5.1 Research Summary
    5.1.1 Main Findings
    5.1.2 Contributions
  5.2 Limitations
    5.2.1 Data Issues
    5.2.2 Model Limitations
    5.2.3 Evaluation Limitations
  5.3 Future Directions
    5.3.1 Architecture Extensions
    5.3.2 Feature Optimization
    5.3.3 Strategy Applications
  5.4 Concluding Remarks
References
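The abstract reports performance in terms of Total Return (TR), Sharpe Ratio (SR), and Calmar Ratio (CR). The following is a minimal sketch of these metrics under their standard definitions, assuming daily data and a 252-day trading year; the thesis's exact conventions (risk-free rate, annualization factor) are not stated in this record, and the toy equity curve is illustrative only, not data from the thesis.

```python
import numpy as np

def total_return(equity: np.ndarray) -> float:
    """Total Return (TR): overall growth of the equity curve."""
    return equity[-1] / equity[0] - 1.0

def sharpe_ratio(daily_returns: np.ndarray, risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe Ratio (SR): mean excess return over return volatility."""
    excess = daily_returns - risk_free / periods_per_year
    return excess.mean() / excess.std(ddof=1) * np.sqrt(periods_per_year)

def calmar_ratio(equity: np.ndarray, periods_per_year: int = 252) -> float:
    """Calmar Ratio (CR): annualized return divided by maximum drawdown."""
    years = (len(equity) - 1) / periods_per_year
    annualized = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    running_max = np.maximum.accumulate(equity)      # peak value so far
    max_drawdown = ((running_max - equity) / running_max).max()
    return annualized / max_drawdown

# Toy equity curve (hypothetical values for illustration).
equity = np.array([100.0, 102.0, 101.0, 105.0, 103.0, 108.0])
returns = np.diff(equity) / equity[:-1]
print(round(total_return(equity), 4))   # 0.08
```

Higher SR and CR indicate better risk-adjusted performance: SR penalizes overall volatility, while CR penalizes only the worst peak-to-trough loss, which is why the two can rank strategies differently.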
