An Empirical Study of Rating-System Model on Professional Tennis (職業網球單打評分模型的實證研究)


Title	職業網球單打評分模型的實證研究 An Empirical Study of Rating-System Model on Professional Tennis
Author	蕭立承 Hsiao, Li-Chen
Contributors	Advisors: 余清祥 Yue, Ching-Syang; 洪英超 Hung, Ying-Chao. Author: 蕭立承 Hsiao, Li-Chen
Keywords	Sport Big Data; Exploratory Data Analysis; Rating Model; Bayesian Analysis; Professional Tennis Matches
Date	2020
Upload time	2-Sep-2020 11:43:15 (UTC+8)
Abstract	Prediction is central to decision analysis: the better we can anticipate unknown conditions, the less effort and fewer resources are spent reacting to unexpected events, and the more efficiently problems can be solved. Prediction is especially important in professional sports, where it is used to design training programs, arrange line-ups and match strategies, and thereby improve performance and the chance of winning. Many bookmakers, at home and abroad, also use statistical or machine learning models to set odds based on team and player records and related data. This study aims to predict match outcomes in the men's and women's professional tennis Grand Slam tournaments (Australian Open, French Open, Wimbledon Championships, and US Open). We first apply Exploratory Data Analysis (EDA) to identify the player-related variables that best reflect match outcomes, and use them as the basis for statistical and machine learning models. We also consider the Glicko rating model, commonly used in professional chess, and explore ways to improve it; Glicko was proposed by Harvard professor Mark Glickman and updates player ratings based on Bayesian theory. The empirical study uses men's and women's Grand Slam data from 2000 to 2019. The classification models are logistic regression (a statistical learning model) and Support Vector Machine (SVM), Neural Network, and Light Gradient Boosting Machine (LightGBM) (machine learning models), evaluated through cross-validation; the best of these is then compared with the Glicko model. The results show that professional tennis ranking is the variable most closely related to match outcomes: a model trained on ranking alone reaches about 70% accuracy. The Glicko model outperforms the statistical and machine learning models in both accuracy and AUC (Area Under Curve), for men's and women's matches alike. We further try to optimize the Glicko model by combining court-surface-specific Glicko ratings with other explanatory variables, but the gain in predictive accuracy is slight.
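The abstract describes Glicko only as a rating model that updates player ratings based on Bayesian theory. As a rough illustration, the sketch below implements the original Glicko update for one rating period as published by Glickman. It is an assumption that the thesis uses this exact variant; the function names `g`, `expected_score`, and `glicko_update` are hypothetical and not taken from the thesis.

```python
import math

# Scaling constant of the original Glicko system.
Q = math.log(10) / 400.0


def g(rd):
    """Attenuation factor: discounts an opponent with an uncertain rating."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected_score(r, r_opp, rd_opp):
    """Expected score of a player rated r against one opponent."""
    return 1.0 / (1.0 + 10.0 ** (-g(rd_opp) * (r - r_opp) / 400.0))


def glicko_update(r, rd, results):
    """Update rating r and rating deviation rd after one rating period.

    results is a list of (r_opp, rd_opp, score) tuples, where score is
    1.0 for a win and 0.0 for a loss (tennis has no draws).
    """
    if not results:
        return r, rd
    d2_inv = 0.0   # accumulates 1 / d^2
    delta = 0.0    # accumulates g(RD_j) * (s_j - E_j)
    for r_opp, rd_opp, score in results:
        e = expected_score(r, r_opp, rd_opp)
        d2_inv += Q**2 * g(rd_opp) ** 2 * e * (1.0 - e)
        delta += g(rd_opp) * (score - e)
    precision = 1.0 / rd**2 + d2_inv   # posterior precision of the rating
    return r + (Q / precision) * delta, math.sqrt(1.0 / precision)


# A player rated 1500 (RD 200) beats a 1400 player, then loses to two others.
r_new, rd_new = glicko_update(
    1500.0, 200.0,
    [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)],
)
```

With these inputs the updated rating comes out near 1464 and the rating deviation near 151.4. The deviation shrinks after every observed match, which is how the model expresses growing certainty about a player's strength.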
References	English-language references
1. Barnett, T. and Clarke, S. R. (2005). Combining Player Statistics to Predict Outcomes of Tennis Matches. IMA Journal of Management Mathematics, 16(2):113-120.
2. Boulier, B. L. and Stekler, H. O. (1999). Are Sports Seedings Good Predictors?: An Evaluation. International Journal of Forecasting, 15(1):83-91.
3. Bradley, R. A. and Terry, M. E. (1952). Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika, 39:324-345.
4. Cornman, A., Spellman, G. and Wright, D. (2017). Machine Learning for Professional Tennis Match Prediction and Betting.
5. Elo, A. E. (1978). The Rating of Chessplayers, Past and Present. New York: Arco.
6. Herbrich, R., Minka, T. and Graepel, T. (2006). TrueSkill(TM): A Bayesian Skill Rating System. In Advances in Neural Information Processing Systems, pp. 569-576.
7. Huang, T. K., Weng, R. C. and Lin, C. J. (2006). Generalized Bradley-Terry Models and Multi-Class Probability Estimates. Journal of Machine Learning Research, 7:85-115.
8. Glickman, M. E. (1999). Parameter Estimation in Large Dynamic Paired Comparison Experiments. Applied Statistics, 48(3):377-394.
9. Gilsdorf, K. F. and Sukhatme, V. A. (2008). Testing Rosen's Sequential Elimination Tournament Model: Incentives and Player Performance in Professional Tennis. Journal of Sports Economics, 9:287-303.
10. Kovalchik, S. A. (2016). Searching for the GOAT of Tennis Win Prediction. Journal of Quantitative Analysis in Sports, 12:127-138.
11. Klaassen, F. and Magnus, J. (2003). Forecasting the Winner of a Tennis Match. European Journal of Operational Research, 148:257-267.
12. Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. and Liu, T. Y. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems, pp. 3149-3157.
13. Lisi, F. and Zanella, G. (2017). Tennis Betting: Can Statistics Beat Bookmakers? Electronic Journal of Applied Statistical Analysis, 10:790-808.
14. Ingram, M. (2019). A Point-based Bayesian Hierarchical Model to Predict the Outcome of Tennis Matches. Journal of Quantitative Analysis in Sports, 313-325.
15. Newton, P. K. and Keller, J. B. (2005). Probability of Winning at Tennis I. Theory and Data. Studies in Applied Mathematics, 114(3):241-269.
16. Pollard, G. N., Cross, R. and Meyer, D. (2006). An Analysis of Ten Years of the Four Grand Slam Men's Singles Data for Lack of Independence of Set Outcomes. Journal of Sports Science and Medicine, 5:561-566.
17. Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65(6):386-408.
18. Srivastava, S. (2019). Predicting Success Probability in Professional Tennis Tournaments Using a Logistic Regression Model. Advances in Analytics and Applications, 59-65.
19. Sipko, M. and Knottenbelt, W. (2015). Machine Learning for the Prediction of Professional Tennis Matches.
Description	Master's; National Chengchi University; Department of Statistics; 107354024
Source	http://thesis.lib.nccu.edu.tw/record/#G0107354024
Type	thesis

dc.contributor.advisor	余清祥; 洪英超	zh_TW
dc.contributor.advisor	Yue, Ching-Syang; Hung, Ying-Chao	en_US
dc.contributor.author (Authors)	蕭立承	zh_TW
dc.contributor.author (Authors)	Hsiao, Li-Chen	en_US
dc.creator (Author)	蕭立承	zh_TW
dc.creator (Author)	Hsiao, Li-Chen	en_US
dc.date (Date)	2020	en_US
dc.date.accessioned	2-Sep-2020 11:43:15 (UTC+8)	-
dc.date.available	2-Sep-2020 11:43:15 (UTC+8)	-
dc.date.issued (Upload time)	2-Sep-2020 11:43:15 (UTC+8)	-
dc.identifier (Other Identifiers)	G0107354024	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/131478	-
dc.description (Description)	Master's	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Department of Statistics	zh_TW
dc.description (Description)	107354024	zh_TW
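dc.description.tableofcontents	Chapter 1 Introduction (1.1 Research Motivation; 1.2 Research Objectives). Chapter 2 Literature Review and Research Methods (2.1 Literature Review; 2.2 Data Description; 2.3 Research Methods). Chapter 3 Statistical Learning and Machine Learning Models (3.1 Exploratory Data Analysis; 3.2 Model Construction; 3.3 Model Evaluation). Chapter 4 Glicko Rating Model (4.1 Glicko Rating Analysis; 4.2 Ensemble Methods; 4.3 Mixture-Rating Model). Chapter 5 Conclusions and Suggestions (5.1 Conclusions; 5.2 Suggestions). References. Appendix: Derivation of the Glicko Algorithm.	zh_TW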
dc.format.extent	2754194 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0107354024	en_US
dc.subject (Keywords)	Sport Big Data	en_US
dc.subject (Keywords)	Exploratory Data Analysis	en_US
dc.subject (Keywords)	Rating Model	en_US
dc.subject (Keywords)	Bayesian Analysis	en_US
dc.subject (Keywords)	Professional Tennis Matches	en_US
dc.title (Title)	職業網球單打評分模型的實證研究	zh_TW
dc.title (Title)	An Empirical Study of Rating-System Model on Professional Tennis	en_US
dc.type (Type)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/NCCU202001670	en_US
