Title 基於模型平均與樣條近似之羅吉斯迴歸機率估計
Probability estimation in logistic regression based on model average and spline approximation
Author 吳榮軒
Wu, Rong-Syuan
Contributors 黃子銘
Huang, Tzee-Ming
吳榮軒
Wu, Rong-Syuan
Keywords Logistic regression
Spline
Model averaging
Bayesian model averaging
Frequentist model averaging
Gradient boosting
Bagging
Nonlinear data analysis
Date 2025
Upload time 1-Sep-2025 14:50:01 (UTC+8)
Abstract Logistic regression is a widely used statistical method for estimating the probability of binary events, but the estimation ability of a single model is often limited when the data exhibit complex nonlinear characteristics. This study proposes a spline-based framework that combines model averaging and ensemble techniques to improve the accuracy and robustness of probability estimation. Splines capture nonlinear relationships between the explanatory variables and the response variable and are combined with four strategies: Frequentist Model Averaging (FMA), Bayesian Model Averaging (BMA), Gradient Boosting, and Bagging. Each strategy integrates multiple spline-based logistic regression models to reduce the risk of overfitting and improve generalization. Simulation experiments generate nonlinear data featuring periodicity, local peaks, and interactions, and compare the estimation performance of single models and ensemble approaches. The results show that spline-based FMA outperforms the other methods across the board in Integrated Weighted Squared Error (IWSE). In Mean Absolute Error (MAE), it performs comparably to BMA and outperforms Bagging and Gradient Boosting in most scenarios. It also has the highest computational efficiency. All of these methods significantly outperform traditional logistic regression on nonlinear data. This study validates the theoretical advantages of combining splines with model ensemble techniques, providing an efficient and robust probability estimation method for applications in medical diagnosis and financial risk assessment.
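The model-averaging idea in the abstract can be illustrated with a minimal sketch. It assumes that several candidate spline-based logistic regression models have already been fitted and that each one supplies a maximized log-likelihood, a parameter count, and estimated probabilities at a set of evaluation points. The weights below use the common BIC approximation to posterior model probabilities, which is one standard way to carry out Bayesian model averaging; it is not necessarily the weighting scheme used in the thesis, whose FMA weights are computed by quadratic programming according to its table of contents. All numeric inputs and function names are hypothetical.

```python
import math


def bic(log_lik, n_params, n_obs):
    """Schwarz's criterion: -2 * log-likelihood + k * log(n); smaller is better."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def bic_weights(bics):
    """Model weights proportional to exp(-BIC / 2), normalized to sum to one."""
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def average_probabilities(weights, model_probs):
    """Pointwise weighted average of the per-model probability estimates."""
    n_points = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs))
        for i in range(n_points)
    ]


def mean_absolute_error(estimated, true):
    """Mean absolute difference between estimated and true probabilities."""
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)


# Hypothetical fitted-model summaries (assumed values, not thesis results).
log_liks = [-70.0, -64.0, -63.5]   # maximized log-likelihoods
n_params = [4, 6, 10]              # e.g. models with more spline knots
n_obs = 200                        # sample size
model_probs = [                    # estimated P(Y = 1) at three points
    [0.20, 0.55, 0.80],
    [0.25, 0.60, 0.85],
    [0.30, 0.58, 0.90],
]
true_probs = [0.25, 0.60, 0.88]    # known only in a simulation setting

weights = bic_weights([bic(l, k, n_obs) for l, k in zip(log_liks, n_params)])
averaged = average_probabilities(weights, model_probs)
mae = mean_absolute_error(averaged, true_probs)
print(weights, averaged, mae)
```

Because the weights are non-negative and sum to one, each averaged estimate stays between the smallest and largest single-model estimate at that point, so it remains a valid probability.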
Description Master's thesis
National Chengchi University
Department of Statistics
112354022
Source http://thesis.lib.nccu.edu.tw/record/#G0112354022
Type thesis
dc.contributor.advisor 黃子銘 (zh_TW)
dc.contributor.advisor Huang, Tzee-Ming (en_US)
dc.identifier (Other Identifiers) G0112354022 (en_US)
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/159041
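dc.description.tableofcontents
Chapter 1 Introduction
Chapter 2 Literature Review and Background
2.1 Spline functions and additive logistic spline regression
2.2 Model ensembling
2.2.1 BMA and FMA
2.2.2 Gradient boosting
2.2.3 Bagging
Chapter 3 Methodology
3.1 Model construction
3.2 The FMA method proposed in this thesis
3.2.1 Computing the weights by quadratic programming
3.2.2 Handling multicollinearity
3.3 Other methods
Chapter 4 Simulation and Analysis
4.1 Determining Kℓ
4.2 Design of the simulation functions
4.3 Design of the simulation experiments
4.4 Evaluation metrics
4.5 Comparison of results
Chapter 5 Conclusion
5.1 Research findings
5.2 Limitations
5.3 Future research directions
5.4 Summary
References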
dc.format.extent 2714586 bytes
dc.format.mimetype application/pdf