Please use this identifier to cite or link to this item: https://ah.nccu.edu.tw/handle/140.119/131111


Title: BERT 應用於數據型資料預測之研究:以美國職棒大聯盟全壘打數預測為例
Using BERT on Prediction Problems with Numeric Input Data: the Case of Major League Baseball Home Run Prediction
Authors: 孫瑄正
Sun, Hsuan-Cheng
Contributors: 蔡炎龍
Tsai, Yen-Lung
孫瑄正
Sun, Hsuan-Cheng
Keywords: BERT
Baseball
Deep learning
Long short-term memory
Neural network
Player performance prediction
Projection system
Transformer
Date: 2020
Issue Date: 2020-08-03 17:58:24 (UTC+8)
Abstract: BERT is a powerful deep learning model in natural language processing. Its architecture lets it analyze word sequences in depth, and it performs well on language tasks such as machine translation and question answering. In this thesis, we show that BERT can also make predictions from numerical input data rather than text, and we examine its performance on such input through a concrete case: predicting the future home run performance of Major League Baseball players from their statistics. We compare the BERT-based approach with an LSTM-based model and the widely used projection system ZiPS. On the 2018 test data, the BERT-based approach reaches 50.6% accuracy, while the LSTM-based model reaches 48.8% and ZiPS only 25.4%. On the 2019 test data, performance declines slightly, but BERT's 44.4% accuracy still exceeds the 42.8% of the LSTM-based model and the 30.1% of ZiPS. Overall, BERT not only handles numerical time-series input but also performs more stably and accurately than the traditional methods, and it offers a new, effective approach to player performance prediction.
Description: Master's thesis
National Chengchi University
Department of Applied Mathematics
107751002
Source URI: http://thesis.lib.nccu.edu.tw/record/#G0107751002
Data Type: thesis
Appears in Collections: [Department of Applied Mathematics] Theses

Files in This Item:

File: 100201.pdf
Size: 2001 KB
Format: Adobe PDF


All items in 學術集成 are protected by copyright, with all rights reserved.

