Title BERT 應用於數據型資料預測之研究:以美國職棒大聯盟全壘打數預測為例
Using BERT on Prediction Problems with Numeric Input Data: the Case of Major League Baseball Home Run Prediction
Author 孫瑄正
Sun, Hsuan-Cheng
Contributors 蔡炎龍
Tsai, Yen-Lung
孫瑄正
Sun, Hsuan-Cheng
Keywords BERT
棒球
深度學習
長短期記憶模型
神經網路
球員表現預測
預測系統
Transformer
BERT
Baseball
Deep learning
Long short-term memory
Neural network
Player performance prediction
Projection system
Transformer
Date 2020
Date uploaded 3-Aug-2020 17:58:24 (UTC+8)
Abstract BERT 在自然語言處理的領域中是一個強而有力的深度學習的模型,它的模型架構使得它可以透徹的了解我們使用的語言,在不同的任務中像是機器翻譯或是問答任務上都有很不錯的成果。在本篇論文中,我們證實了BERT 可以使用數據形態的資料去預測結果,並且實際上做了一個例子,探討它在數據型資料輸入時的表現,我們將美國職棒大聯盟球員的數據作為輸入,使用BERT 進行關於球員未來全壘打表現的預測,並且將其預測結果與LSTM 以及現行球員表現預測系統ZiPS 做比較。我們發現在2018年的測試資料中,使用BERT 預測的準確率高達50.6%,LSTM有48.8% 而ZiPS只有25.4%;在2019年的測試資料中,雖然表現略有下滑,但BERT 的44.4%準確率仍舊高於LSTM 的42.8%以及ZiPS 的30.1%。總體來說,BERT 能夠對於數據形態的資料有深度的了解,使得它的表現比起傳統的方式來說更加穩定和精確,同時我們也找到了球員表現預測的一個新方法。
BERT is a powerful deep learning model in natural language processing. Because of its ability to analyze word sequences, it performs well in a variety of language tasks such as machine translation and question answering. In this thesis, we show that BERT can also make predictions from numerical input data instead of text, and we verify its performance on a concrete example: predicting the future home run performance of Major League Baseball players from their season statistics. We compare the BERT-based approach with an LSTM-based model and with ZiPS, a popular projection system. On the 2018 test data, the BERT-based approach reaches 50.6% accuracy, while the LSTM-based model reaches 48.8% and ZiPS only 25.4%. On the 2019 test data, BERT achieves 44.4% accuracy, compared with 42.8% for the LSTM-based model and 30.1% for ZiPS. BERT thus not only handles numerical time-series input, but also performs more stably and accurately than these traditional methods, giving a new and effective approach to player performance prediction.
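The abstract gives no implementation details, but the core idea of treating each season's numerical stat line as a "token" and running a BERT-style Transformer encoder over the sequence to predict a home-run outcome class can be sketched roughly as below. This is a minimal, hypothetical PyTorch sketch, not the thesis's actual model: the class name StatBERTClassifier, the feature count, sequence length, number of output classes, and all layer sizes are assumptions for illustration only.

# Illustrative sketch: a BERT-style (Transformer-encoder) classifier over
# per-season numerical stat vectors. Not the author's original code; all
# dimensions and names are assumed values.
import torch
import torch.nn as nn

class StatBERTClassifier(nn.Module):
    def __init__(self, n_features=20, d_model=64, n_heads=4, n_layers=2, n_classes=10):
        super().__init__()
        # Project each season's stat vector into the model dimension, playing the
        # role that token embeddings play in BERT.
        self.input_proj = nn.Linear(n_features, d_model)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))   # [CLS]-like summary token
        self.pos_embed = nn.Parameter(torch.zeros(1, 16, d_model))  # positions for [CLS] + up to 15 seasons
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)                   # e.g. binned home-run totals

    def forward(self, seasons):                      # seasons: (batch, n_seasons, n_features)
        x = self.input_proj(seasons)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)               # prepend the [CLS]-like token
        x = x + self.pos_embed[:, : x.size(1)]       # add learned position information
        x = self.encoder(x)
        return self.head(x[:, 0])                    # classify from the [CLS] position

# Toy usage: 8 players, 5 past seasons each, 20 stats per season.
model = StatBERTClassifier()
logits = model(torch.randn(8, 5, 20))
print(logits.shape)  # torch.Size([8, 10])

Under the same assumptions, the LSTM baseline mentioned in the abstract would replace the encoder stack with an nn.LSTM over the same per-season inputs, which is what makes the head-to-head accuracy comparison meaningful.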
References [1] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization, 2016.
[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate, 2014.
[3] Derek Carty. THE BAT. www.RotoGrinders.com.
[4] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder–decoder for statistical machine translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
[5] Ariel Cohen. ATC. www.fangraphs.com.
[6] Jared Cross, Dash Davidson, and Peter Rosenbloom. Steamer projections. steamerprojections.com/.
[7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, 2018.
[8] FanGraphs. Depth charts. www.fangraphs.com.
[9] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. The MIT Press, 2016.
[10] Alex Graves. Supervised Sequence Labelling with Recurrent Neural Networks. Studies in Computational Intelligence. Springer, Berlin, 2012.
[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016.
[12] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors, 2012.
[13] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, November 1997.
[14] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015.
[15] Anil K. Jain, Jianchang Mao, and K. Mohiuddin. Artificial neural networks: A tutorial. IEEE Computer, 29:31–44, 1996.
[16] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2014.
[17] Kaan Koseler and Matthew Stephan. Machine learning applications in baseball: A systematic literature review. Applied Artificial Intelligence, 31(9-10):745–763, 2017.
[18] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
[19] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, volume 86, pages 2278–2324, 1998.
[20] J. Y. Lettvin, H. R. Maturana, W. S. McCulloch, and W. H. Pitts. What the frog’s eye tells the frog’s brain. Proceedings of the IRE, 47(11):1940–1951, 1959.
[21] Arlo Lyle. Baseball prediction using ensemble learning. PhD thesis, University of Georgia, 2007.
[22] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013.
[23] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
[24] M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, MA, 1969.
[25] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Johannes Fürnkranz and Thorsten Joachims, editors, ICML, pages 807–814. Omnipress, 2010.
[26] Andrew Y. Ng. Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the Twenty-First International Conference on Machine Learning, ICML '04, page 78, New York, NY, USA, 2004. Association for Computing Machinery.
[27] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
[28] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018.
[29] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016.
[30] Sebastian Ruder. An overview of gradient descent optimization algorithms, 2016.
[31] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
[32] David Silver, Aja Huang, Christopher J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529:484–503, 2016.
[33] Nate Silver. Introducing PECOTA. Baseball Prospectus, 2003:507–514, 2003.
[34] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014.
[35] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112, 2014.
[36] Tom Tango. Marcel. www.tangotiger.net.
[37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.
[38] R. J. Williams and D. Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280, 1989.
[39] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. Google’s neural machine translation system: Bridging the gap between human and machine translation, 2016.
[40] Xue Ying. An overview of overfitting and its solutions. Journal of Physics: Conference Series, 1168:022022, February 2019.
Description Master's thesis
National Chengchi University
Department of Applied Mathematics
107751002
Source http://thesis.lib.nccu.edu.tw/record/#G0107751002
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/131111
Table of contents Acknowledgements ii
Chinese Abstract iii
Abstract iv
Contents v
List of Tables vii
List of Figures viii
1 Introduction 1
2 Related Work 3
3 Deep Learning 4
3.1 Neuron and Neural Networks 5
3.2 Activation Function 6
3.3 Loss Function 9
3.4 Gradient Descent and Backpropagation 10
3.5 Overfitting, Dropout and Batch Normalization 11
4 Recurrent Neural Networks 15
4.1 RNN Cell 15
4.2 Long Short-Term Memory 18
4.3 Attention 19
5 Bidirectional Encoder Representations from Transformers 22
5.1 Word Embedding 22
5.2 Transformer 23
5.3 Bidirectional Encoder Representations from Transformers 28
6 Experiments 31
6.1 Baseball Projection System 31
6.2 Baseball Dataset Preparation 32
6.3 Prediction Models 34
6.4 Model Performance 36
6.5 Class Result 39
7 Conclusion 41
Bibliography 42
Format application/pdf (2049757 bytes)
DOI 10.6814/NCCU202000709