Title: Interpretable Machine Learning for Prediction: Numbers of Successful Claims in French Third-Party Liability Insurance
Chinese title: 可解釋機器學習之預測: 以法國第三人責任險成功索賠為例
Author: 汪于崴 (Wang, Yu-Wei)
Contributors: 洪芷漪, 林士貴 (advisors); 汪于崴 (Wang, Yu-Wei)
Keywords: French third-party liability insurance; number of successful claims; generalized linear models; interpretable machine learning
Date: 2025
Uploaded: 1-Sep-2025 16:30:33 (UTC+8)
Abstract: Car accidents often impose losses on innocent third parties, which makes third-party liability insurance an important risk-sharing mechanism. This study uses interpretable machine learning to predict the number of successful claims in French third-party liability insurance, aiming to improve both predictive accuracy and model interpretability. Using a publicly available French third-party liability insurance dataset, we build predictive models based on the Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) distributions, combined with Boosted Trees and DART (Dropouts meet Multiple Additive Regression Trees). Feature importance analysis and Accumulated Local Effects (ALE) identify the key factors that drive claim frequency. The Boosted Trees and DART models outperform traditional generalized linear models (GLMs) on the loss function, Pseudo R², and the Gini coefficient, while also offering greater interpretability. The study demonstrates the potential of interpretable machine learning in actuarial science and provides empirical evidence for pricing and risk management in third-party liability insurance. The approach could be extended to other insurance markets.
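The abstract evaluates ZIP- and ZINB-based models by their loss and Pseudo R². The Python below is a minimal sketch, not the thesis's own code. It writes out the two zero-inflated log-probabilities from which such a loss can be built. It rests on these assumptions: the ZINB uses the NB2 parameterization (mean `mu`, size `r`); the loss is taken to be the mean negative log-likelihood; Pseudo R² is taken to be McFadden's 1 - ll_model / ll_null against a constant-parameter baseline; and exposure offsets are omitted. All function names and the data in the usage block are hypothetical.

```python
import math
from typing import Sequence


def zip_logpmf(y: int, lam: float, pi: float) -> float:
    """Log-probability of claim count y under a zero-inflated Poisson (ZIP).

    pi: probability of a structural zero (0 <= pi < 1).
    lam: mean of the Poisson component (lam > 0).
    """
    if y == 0:
        # A zero can come from the structural-zero part or from the Poisson part.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    log_poisson = -lam + y * math.log(lam) - math.lgamma(y + 1)
    return math.log(1.0 - pi) + log_poisson


def zinb_logpmf(y: int, mu: float, r: float, pi: float) -> float:
    """Log-probability of y under a zero-inflated negative binomial (ZINB).

    Assumed NB2 parameterization: mean mu > 0, size r > 0,
    variance mu + mu**2 / r; pi is the structural-zero probability.
    """
    log_nb0 = r * math.log(r / (r + mu))  # log P(NB = 0)
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(log_nb0))
    log_nb = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + log_nb0 + y * math.log(mu / (r + mu))
    )
    return math.log(1.0 - pi) + log_nb


def mean_nll(log_probs: Sequence[float]) -> float:
    """Average negative log-likelihood (one candidate form of the loss)."""
    return -sum(log_probs) / len(log_probs)


def mcfadden_pseudo_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R^2 = 1 - ll_model / ll_null (assumed definition)."""
    return 1.0 - ll_model / ll_null


if __name__ == "__main__":
    # Hypothetical claim counts and per-policy model outputs, for illustration only.
    claims = [0, 0, 0, 1, 0, 2, 0, 0, 1, 0]
    lams = [0.2, 0.3, 0.1, 0.9, 0.2, 1.4, 0.1, 0.3, 0.8, 0.2]
    pi = 0.3
    model_lp = [zip_logpmf(y, lam, pi) for y, lam in zip(claims, lams)]
    null_lp = [zip_logpmf(y, 0.4, pi) for y in claims]  # constant-parameter baseline
    print("ZIP mean NLL:", mean_nll(model_lp))
    print("McFadden pseudo R^2:", mcfadden_pseudo_r2(sum(model_lp), sum(null_lp)))
```

In a boosted-tree version of such a model, one common setup lets the trees output raw scores for log λ (or log μ) and logit π. This sketch does not reproduce the thesis's exact link functions, dispersion handling, or Gini computation.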
References:
Aeron-Thomas, A. (2002). The role of the motor insurance industry in preventing and compensating road casualties: Scoping study final report.
Antipov, E. A. and Pokryshevskaya, E. B. (2020). Interpretable machine learning for demand modeling with high-dimensional data using gradient boosting machines and Shapley values. 19(5).
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. CoRR, abs/1603.02754.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38.
Dutang, C., Charpentier, A., and Gallic, E. (2024). Insurance dataset.
Gavriletea, M. D. and Moga, A. C. (2011). Romanian compulsory motor third party liability insurance in 2010 and the predictable future.
Hsieh, S.-H., Liu, C.-T., and Tzeng, L. Y. (2014). Insurance marketing channel as a screening mechanism: Empirical evidences from Taiwan automobile insurance market. The Geneva Papers on Risk and Insurance. Issues and Practice, 39(1):90–103.
Liu, C.-T., Chang, C.-H., and Chen, H. H. (2024). Underwriting information and insurers' profitability: Evidence from automobile physical damage insurance in Taiwan. Pacific-Basin Finance Journal, 83:102267.
Meng, S., Gao, Y., and Huang, Y. (2022). Actuarial intelligence in auto insurance: Claim frequency modeling with driving behavior features and improved boosted trees. Insurance: Mathematics and Economics, 106:115–127.
Qazvini, M. (2019). On the validation of claims with excess zeros in liability insurance: A comparative study. Risks, 7(3).
Rashmi, K. V. and Gilad-Bachrach, R. (2015). DART: Dropouts meet multiple additive regression trees. CoRR, abs/1505.01866.
Tiruneh, A. T. (2013). Higher order Aitken extrapolation with application to converging and diverging Gauss-Seidel iterations.
Venezia, I., Galai, D., and Shapira, Z. (1999). Exclusive vs. independent agents: A separating equilibrium approach. Journal of Economic Behavior & Organization, 40(4):443–456.
Wibisono, A., Wilson, A. C., and Jordan, M. I. (2016). A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 113(47).
Wüthrich, M. V. and Merz, M. (2023). Appendix B: Data and Examples, pages 553–575. Springer International Publishing, Cham.
Ye, C., Zhang, L., Han, M., Yu, Y., Zhao, B., and Yang, Y. (2022). Combining predictions of auto insurance claims.
Zhao, X., Zhang, L., Zhu, G., Cheng, C., He, J., Traore, S., and Singh, V. P. (2023). Exploring interpretable and non-interpretable machine learning models for estimating winter wheat evapotranspiration using particle swarm optimization with limited climatic data. Computers and Electronics in Agriculture, 212:108140.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(2):301–320.
Description: Master's thesis; National Chengchi University (國立政治大學); Department of Applied Mathematics; 111751010
Source: http://thesis.lib.nccu.edu.tw/record/#G0111751010
Type: thesis
Identifier: G0111751010
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/159319
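Table of contents:
Acknowledgements
Chinese Abstract
Abstract
Contents
List of Tables
List of Figures
Chapter 1 Introduction
  Section 1 Research Background
  Section 2 Research Motivation
  Section 3 Research Objectives
  Section 4 Research Structure
Chapter 2 Literature Review
  Section 1 Third-Party Liability Insurance
  Section 2 Number of Successful Claims and Probability Distributions
  Section 3 Machine Learning
Chapter 3 Methodology
  Section 1 Notation and Distributions
  Section 2 Heterogeneity
  Section 3 Linear Models
    1. Generalized Linear Model with the ZIP Distribution
    2. Generalized Linear Model with the ZINB Distribution
  Section 4 Nonlinear Models
    1. Boosted Tree Model with the ZIP Distribution
    2. Boosted Tree Model with the ZINB Distribution
    3. Boosted Tree Model with Random Dropout (DART) and the ZIP Distribution
    4. Boosted Tree Model with Random Dropout (DART) and the ZINB Distribution
  Section 5 Model Evaluation Metrics
    1. Loss
    2. Pseudo R²
    3. Gini Index
  Section 6 Interpretable Machine Learning
    1. Feature Importance
    2. Accumulated Local Effects
Chapter 4 Empirical Analysis
  Section 1 Data Description
  Section 2 Data Preprocessing
  Section 3 Model Training and Settings
    1. Hyperparameters and Number of Training Iterations
    2. Training Procedure
  Section 4 Model Performance
  Section 5 Interpretability
    1. Feature Importance
    2. Accumulated Local Effects
Chapter 5 Conclusions and Future Work
  Section 1 Conclusions
  Section 2 Future Work
Bibliography
Proofs Related to DART
Proof of the Gini Index Formula
Proofs for Feature Importance
Statistical Analysis of Feature Variables
  .1 Description of Feature Variables
  .2 Frequency Counts of Feature Variables
  .3 Statistics of Numerical Variables
  .4 Statistics of Categorical Variables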
Format: application/pdf, 4373229 bytes