Interpretable Machine Learning for Prediction: Numbers of Successful Claims in French Third-Party Liability Insurance

Title	Interpretable Machine Learning for Prediction: Numbers of Successful Claims in French Third-Party Liability Insurance (original title: 可解釋機器學習之預測: 以法國第三人責任險成功索賠為例)
Author	Wang, Yu-Wei (汪于崴)
Contributors	洪芷漪 (advisor); 林士貴 (advisor); Wang, Yu-Wei (汪于崴)
Keywords	French third-party liability insurance; Number of successful claims; Generalized linear models; Interpretable machine learning
Date	2025
Uploaded	1-Sep-2025 16:30:33 (UTC+8)
Abstract	The externalities of car accidents often impose losses on innocent third parties, making third-party liability insurance a crucial risk-sharing mechanism. This study aims to predict the number of successful claims in French third-party liability insurance using interpretable machine learning methods, while improving both model interpretability and prediction accuracy. Using a publicly available French third-party liability insurance dataset, we construct predictive models based on the Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) distributions, combined with Boosted Trees and DART. Through feature importance analysis and Accumulated Local Effects (ALE), the study identifies key factors influencing claim frequency. The results show that the Boosted Trees and DART models outperform traditional Generalized Linear Models (GLMs) on evaluation metrics such as the loss function, Pseudo R², and the Gini index, while offering greater interpretability. The study both validates the potential of interpretable machine learning in actuarial science and provides empirical evidence for pricing and risk management in third-party liability insurance, with possible extension to other insurance markets.
Description	Master's thesis; National Chengchi University; Department of Applied Mathematics; 111751010
Source	http://thesis.lib.nccu.edu.tw/record/#G0111751010
Type	thesis

dc.contributor.advisor	洪芷漪; 林士貴	zh_TW
dc.contributor.author (Authors)	汪于崴	zh_TW
dc.contributor.author (Authors)	Wang, Yu-Wei	en_US
dc.creator (Author)	汪于崴	zh_TW
dc.creator (Author)	Wang, Yu-Wei	en_US
dc.date (Date)	2025	en_US
dc.date.accessioned	1-Sep-2025 16:30:33 (UTC+8)	-
dc.date.available	1-Sep-2025 16:30:33 (UTC+8)	-
dc.date.issued (Uploaded)	1-Sep-2025 16:30:33 (UTC+8)	-
dc.identifier (Other Identifiers)	G0111751010	en_US
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/159319	-
dc.description (Description)	Master's thesis	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Department of Applied Mathematics	zh_TW
dc.description (Description)	111751010	zh_TW
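dc.description.tableofcontents	zh_TW
Acknowledgements
Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
  Section 1 Research Background
  Section 2 Research Motivation
  Section 3 Research Objectives
  Section 4 Research Structure
Chapter 2 Literature Review
  Section 1 Third-Party Liability Insurance
  Section 2 Number of Successful Claims and Probability Distributions
  Section 3 Machine Learning
Chapter 3 Methodology
  Section 1 Notation and Distributions
  Section 2 Heterogeneity
  Section 3 Linear Models
    1. Generalized Linear Model with the ZIP Distribution
    2. Generalized Linear Model with the ZINB Distribution
  Section 4 Nonlinear Models
    1. Boosted Trees Model with the ZIP Distribution
    2. Boosted Trees Model with the ZINB Distribution
    3. Boosted Trees Model with Random Dropout (DART) with the ZIP Distribution
    4. Boosted Trees Model with Random Dropout (DART) with the ZINB Distribution
  Section 5 Model Evaluation Metrics
    1. Loss
    2. Pseudo R-squared
    3. Gini Index
  Section 6 Interpretable Machine Learning
    1. Feature Importance
    2. Accumulated Local Effects
Chapter 4 Empirical Analysis
  Section 1 Data Description
  Section 2 Data Preprocessing
  Section 3 Model Training and Settings
    1. Hyperparameters and Number of Training Iterations
    2. Training Procedure
  Section 4 Model Performance
  Section 5 Interpretability
    1. Feature Importance
    2. Accumulated Local Effects
Chapter 5 Conclusion and Future Work
  Section 1 Conclusion
  Section 2 Future Work
Bibliography
Appendices
  Proofs Related to DART
  Proof of the Gini Index Formula
  Proof of Feature Importance
  Statistical Analysis of Feature Variables
    Description of Feature Variables
    Frequency Counts of Feature Variables
    Statistics of Numerical Variables
    Statistics of Categorical Variables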
dc.format.extent	4373229 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0111751010	en_US
dc.subject (Keywords)	French third-party liability insurance	en_US
dc.subject (Keywords)	Number of successful claims	en_US
dc.subject (Keywords)	Generalized linear models	en_US
dc.subject (Keywords)	Interpretable machine learning	en_US
dc.title (Title)	可解釋機器學習之預測: 以法國第三人責任險成功索賠為例	zh_TW
dc.title (Title)	Interpretable Machine Learning for Prediction: Numbers of Successful Claims in French Third-Party Liability Insurance	en_US
dc.type (Type)	thesis	en_US
