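An Empirical Study of Multi-Factor Combination Methods in the Taiwan Market: A Comprehensive Evaluation of Traditional Momentum, Dimensionality Reduction, and Machine Learning Approaches | Publication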

Title	An Empirical Study of Multi-Factor Combination Methods in the Taiwan Market: A Comprehensive Evaluation of Traditional Momentum, Dimensionality Reduction, and Machine Learning Approaches (多因子組合方法於台灣市場之實證研究：傳統動能、降維與機器學習的綜合評估)
Author	Chen, Sheng-Hua (陳昇華)
Contributors	Lin, Shih-Kuei (林士貴); Chen, Sheng-Hua (陳昇華)
Keywords	Multi-Factor; Machine Learning; Momentum; Dimensionality Reduction; Trading Strategy; Factor Combination
Date	2025
Upload time	4-Aug-2025 14:33:19 (UTC+8)
Abstract	This study examines cross-sectional return predictability in the Taiwan equity market from 2010 to 2025. We analyze a curated library of 52 firm-level predictors spanning valuation, growth, profitability, quality, technical, and liquidity dimensions. Daily data from the Taiwan Economic Journal (TEJ) are processed through a three-step pipeline to ensure signal comparability and robustness: Median Absolute Deviation (MAD) clipping, size and industry neutralization, and Z-score standardization. Single-factor tests show that value and quality variables deliver the highest Information Coefficients (IC) and Information Ratios (IR), whereas momentum, size, and risk-oriented factors exhibit more volatile performance. To synthesize information across predictors, we compare four classes of aggregation techniques: (i) equal-weight combinations, (ii) Principal Component Analysis (PCA), (iii) cross-sectional and time-series factor momentum (CSFM / TSFM), and (iv) three gradient-boosting rankers: CatBoost, XGBoost, and LightGBM. Among these, LightGBM attains the strongest out-of-sample results, recording an annualized return of 15.90%, a Sharpe ratio of 2.50, and a maximum drawdown of only -5.90% on the whole-market sample. Robustness tests on the more volatile and less liquid OTC segment confirm the superiority of machine-learning models: LightGBM still achieves a 21.29% annualized return, a Sharpe ratio of 5.13, and a Calmar ratio of 6.45, comfortably outperforming traditional momentum and PCA benchmarks. These findings underscore the adaptability of ensemble-tree models in emerging markets and highlight their capacity to capture nonlinear factor interactions that conventional linear or momentum frameworks may overlook. Our contributions are three-fold: (i) we provide the first comprehensive transferability test of U.S.-validated predictors to Taiwan, (ii) we propose a rigorous preprocessing and model-comparison protocol that mitigates extreme values and unintended style exposures, and (iii) we furnish market-wide and segment-specific evidence that LightGBM currently offers the most effective route to multi-factor stock selection in Taiwan. Future research can extend this framework by incorporating dynamic transaction-cost models, leverage constraints, and deep-learning-based factor integrators to further bridge the gap between academic insight and investable practice.
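The preprocessing steps and the Information Coefficient named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the clipping threshold k = 3, the 1.4826 scaling constant, plain industry de-meaning as a simplified stand-in for size and industry neutralization (the size adjustment is omitted), and a Spearman rank correlation as the IC are all assumptions, and every function name and data value is hypothetical.

```python
from statistics import mean, median, pstdev


def mad_clip(values, k=3.0):
    """Clip values to median +/- k * 1.4826 * MAD (k is an assumed threshold)."""
    m = median(values)
    mad = median([abs(v - m) for v in values])
    half_width = k * 1.4826 * mad
    lo, hi = m - half_width, m + half_width
    return [min(max(v, lo), hi) for v in values]


def demean_by_group(values, groups):
    """Subtract each group's mean: a simplified stand-in for industry
    neutralization (size neutralization is omitted in this sketch)."""
    members = {}
    for v, g in zip(values, groups):
        members.setdefault(g, []).append(v)
    group_mean = {g: mean(vs) for g, vs in members.items()}
    return [v - group_mean[g] for v, g in zip(values, groups)]


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation.
    Assumes the values are not all identical."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = avg
        i = j + 1
    return result


def rank_ic(factor, fwd_returns):
    """Spearman rank correlation between a factor and next-period returns."""
    rf, rr = ranks(factor), ranks(fwd_returns)
    mf, mr = mean(rf), mean(rr)
    cov = sum((a - mf) * (b - mr) for a, b in zip(rf, rr))
    var_f = sum((a - mf) ** 2 for a in rf)
    var_r = sum((b - mr) ** 2 for b in rr)
    return cov / (var_f * var_r) ** 0.5


# Hypothetical one-day cross-section: raw factor, industry label, next return.
raw = [0.5, 0.8, 1.1, 0.9, 25.0, 0.7]
industry = ["A", "A", "A", "B", "B", "B"]
fwd = [0.01, 0.02, 0.03, 0.00, 0.04, -0.01]

signal = zscore(demean_by_group(mad_clip(raw), industry))
print(rank_ic(signal, fwd))
```

In this toy cross-section the outlier 25.0 is pulled back toward the median before standardization, so it no longer dominates the Z-scores. An Information Ratio would then typically be the mean of such per-period IC values divided by their standard deviation, though the thesis's exact definition is not given in this record.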
Description	Master's thesis, National Chengchi University, Department of Money and Banking, 112352034
Source	http://thesis.lib.nccu.edu.tw/record/#G0112352034
Type	thesis

dc.contributor.advisor	林士貴	zh_TW
dc.contributor.advisor	Lin, Shih-Kuei	en_US
dc.contributor.author (Authors)	陳昇華	zh_TW
dc.contributor.author (Authors)	Sheng-Hua Chen	en_US
dc.creator (Author)	陳昇華	zh_TW
dc.creator (Author)	Chen, Sheng-Hua	en_US
dc.date (Date)	2025	en_US
dc.date.accessioned	4-Aug-2025 14:33:19 (UTC+8)	-
dc.date.available	4-Aug-2025 14:33:19 (UTC+8)	-
dc.date.issued (Upload time)	4-Aug-2025 14:33:19 (UTC+8)	-
dc.identifier (Other Identifiers)	G0112352034	en_US
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/158594	-
dc.description (Description)	Master's degree	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Department of Money and Banking	zh_TW
dc.description (Description)	112352034	zh_TW
dc.description.tableofcontents	Contents; List of Figures; List of Tables; 1 Introduction; 2 Literature Review; 2.1 Factors; 2.2 Factor Preprocessing; 2.3 Methods of Factor Combination; 3 Methodology; 3.1 Time-series Factor Momentum (TSFM); 3.2 Cross-Sectional Factor Momentum (CSFM); 3.3 Principal Components Analysis; 3.4 Ensemble Machine Learning Models; 3.4.1 CatBoost Ranking Model; 3.4.2 XGBoost Ranking Model; 3.4.3 LightGBM Ranking Model; 3.5 Machine Learning Evaluation Metrics; 3.6 Experimental Design; 3.6.1 Factor Preprocessing; 3.6.2 Back-testing Assumptions; 4 Empirical Results; 4.1 Data and Factor Tables; 4.2 Single Factor Analysis; 4.3 Composite Factor Analysis; 4.3.1 Equal-Weight Composite (Benchmark); 4.3.2 TSFM (1, 12); 4.3.3 CSFM (1, 12); 4.3.4 TSFM (1, 1); 4.3.5 CSFM (1, 1); 4.3.6 PCA Composite; 4.3.7 CatBoost Composite; 4.3.8 XGBoost Composite; 4.3.9 LightGBM Composite; 4.4 Back-testing; 4.4.1 Comparison of Momentum Strategies; 4.4.2 Comparison of PCA and Machine Learning Methods; 4.4.3 Overall Method Comparison; 4.5 Robustness Test; 4.5.1 Listed Stocks Only; 4.5.2 OTC Only; 5 Conclusions; Appendix A Factor List and Performance Tables; A.1 Factor List; A.1.1 Single-Factor Performance Metrics; Reference	zh_TW
dc.format.extent	2343090 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0112352034	en_US
dc.subject (Keywords)	Multi-Factor	en_US
dc.subject (Keywords)	Momentum	en_US
dc.subject (Keywords)	Dimensionality Reduction	en_US
dc.subject (Keywords)	Machine Learning	en_US
dc.subject (Keywords)	Factor Combination	en_US
dc.subject (Keywords)	Trading Strategy	en_US
dc.title (Title)	多因子組合方法於台灣市場之實證研究：傳統動能、降維與機器學習的綜合評估	zh_TW
dc.title (Title)	An Empirical Study of Multi-Factor Combination Methods in the Taiwan Market: A Comprehensive Evaluation of Traditional Momentum, Dimensionality Reduction, and Machine Learning Approaches	en_US
dc.type (Type)	thesis	en_US
