Title BERT 應用於數據型資料預測之研究:以美國職棒大聯盟全壘打數預測為例
Using BERT on Prediction Problems with Numeric Input Data: the Case of Major League Baseball Home Run Prediction
Author 孫瑄正
Sun, Hsuan-Cheng
Contributors 蔡炎龍
Tsai, Yen-Lung
孫瑄正
Sun, Hsuan-Cheng
Keywords BERT
棒球
深度學習
長短期記憶模型
神經網路
球員表現預測
預測系統
Transformer
BERT
Baseball
Deep learning
Long short-term memory
Neural network
Player performance prediction
Projection system
Transformer
Date 2020
Date uploaded 3-Aug-2020 17:58:24 (UTC+8)
Abstract BERT 在自然語言處理的領域中是一個強而有力的深度學習的模型,它的模型架構使得它可以透徹的了解我們使用的語言,在不同的任務中像是機器翻譯或是問答任務上都有很不錯的成果。在本篇論文中,我們證實了BERT 可以使用數據形態的資料去預測結果,並且實際上做了一個例子,探討它在數據型資料輸入時的表現,我們將美國職棒大聯盟球員的數據作為輸入,使用BERT 進行關於球員未來全壘打表現的預測,並且將其預測結果與LSTM 以及現行球員表現預測系統ZiPS 做比較。我們發現在2018年的測試資料中,使用BERT 預測的準確率高達50.6%,LSTM有48.8% 而ZiPS只有25.4%;在2019年的測試資料中,雖然表現略有下滑,但BERT 的44.4%準確率仍舊高於LSTM 的42.8%以及ZiPS 的30.1%。總體來說,BERT 能夠對於數據形態的資料有深度的了解,使得它的表現比起傳統的方式來說更加穩定和精確,同時我們也找到了球員表現預測的一個新方法。
BERT is a powerful deep learning model in natural language processing. Because of its ability to analyze word sequences, it performs well in a variety of language tasks such as machine translation and question answering. In this thesis, we show that BERT can also make predictions from numerical input data instead of text, and we verify its performance on a concrete example: predicting the future home run performance of Major League Baseball players from their season statistics. We compare the BERT-based approach with an LSTM-based model and with ZiPS, a popular projection system. On the 2018 test data, the BERT-based approach reaches 50.6% accuracy, while the LSTM-based model reaches 48.8% and ZiPS only 25.4%. On the 2019 test data, BERT achieves 44.4% accuracy, compared with 42.8% for the LSTM-based model and 30.1% for ZiPS. BERT thus not only handles numerical time-series input, but also performs more stably and accurately than these traditional methods, giving a new and effective approach to player performance prediction.
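The abstract gives no implementation details, but the core idea of treating each season's numerical stat line as a "token" and running a BERT-style Transformer encoder over the sequence to predict a home-run outcome class can be sketched roughly as below. This is a minimal, hypothetical PyTorch sketch, not the thesis's actual model: the class name StatBERTClassifier, the feature count, sequence length, number of output classes, and all layer sizes are assumptions for illustration only.

# Illustrative sketch: a BERT-style (Transformer-encoder) classifier over
# per-season numerical stat vectors. Not the author's original code; all
# dimensions and names are assumed values.
import torch
import torch.nn as nn

class StatBERTClassifier(nn.Module):
    def __init__(self, n_features=20, d_model=64, n_heads=4, n_layers=2, n_classes=10):
        super().__init__()
        # Project each season's stat vector into the model dimension, playing the
        # role that token embeddings play in BERT.
        self.input_proj = nn.Linear(n_features, d_model)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))   # [CLS]-like summary token
        self.pos_embed = nn.Parameter(torch.zeros(1, 16, d_model))  # positions for [CLS] + up to 15 seasons
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)                   # e.g. binned home-run totals

    def forward(self, seasons):                      # seasons: (batch, n_seasons, n_features)
        x = self.input_proj(seasons)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)               # prepend the [CLS]-like token
        x = x + self.pos_embed[:, : x.size(1)]       # add learned position information
        x = self.encoder(x)
        return self.head(x[:, 0])                    # classify from the [CLS] position

# Toy usage: 8 players, 5 past seasons each, 20 stats per season.
model = StatBERTClassifier()
logits = model(torch.randn(8, 5, 20))
print(logits.shape)  # torch.Size([8, 10])

Under the same assumptions, the LSTM baseline mentioned in the abstract would replace the encoder stack with an nn.LSTM over the same per-season inputs, which is what makes the head-to-head accuracy comparison meaningful.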
References [1] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization, 2016.
[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate, 2014.
[3] Derek Carty. THE BAT. www.RotoGrinders.com.
[4] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder–decoder for statistical machine translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
[5] Ariel Cohen. ATC. www.fangraphs.com.
[6] Jared Cross, Dash Davidson, and Peter Rosenbloom. Steamer projections. steamerprojections.com/.
[7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, 2018.
[8] FanGraphs. Depth charts. www.fangraphs.com.
[9] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. The MIT Press, 2016.
[10] Alex Graves. Supervised Sequence Labelling with Recurrent Neural Networks. Studies in Computational Intelligence. Springer, Berlin, 2012.
[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016.
[12] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors, 2012.
[13] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, November 1997.
[14] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015.
[15] Anil K. Jain, Jianchang Mao, and K. Mohiuddin. Artificial neural networks: A tutorial. IEEE Computer, 29:31–44, 1996.
[16] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2014.
[17] Kaan Koseler and Matthew Stephan. Machine learning applications in baseball: A systematic literature review. Applied Artificial Intelligence, 31(9-10):745–763, 2017.
[18] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
[19] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, volume 86, pages 2278–2324, 1998.
[20] J. Y. Lettvin, H. R. Maturana, W. S. McCulloch, and W. H. Pitts. What the frog’s eye tells the frog’s brain. Proceedings of the IRE, 47(11):1940–1951, 1959.
[21] Arlo Lyle. Baseball prediction using ensemble learning. PhD thesis, University of Georgia, 2007.
[22] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013.
[23] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
[24] M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, MA, 1969.
[25] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Johannes Fürnkranz and Thorsten Joachims, editors, ICML, pages 807–814. Omnipress, 2010.
[26] Andrew Y. Ng. Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the Twenty-First International Conference on Machine Learning, ICML '04, page 78, New York, NY, USA, 2004. Association for Computing Machinery.
[27] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
[28] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018.
[29] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016.
[30] Sebastian Ruder. An overview of gradient descent optimization algorithms, 2016.
[31] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
[32] David Silver, Aja Huang, Christopher J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529:484–503, 2016.
[33] Nate Silver. Introducing PECOTA. Baseball Prospectus, 2003:507–514, 2003.
[34] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014.
[35] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112, 2014.
[36] Tom Tango. Marcel. www.tangotiger.net.
[37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.
[38] R. J. Williams and D. Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280, 1989.
[39] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. Google’s neural machine translation system: Bridging the gap between human and machine translation, 2016.
[40] Xue Ying. An overview of overfitting and its solutions. Journal of Physics: Conference Series, 1168:022022, February 2019.
Description Master's thesis
National Chengchi University
Department of Applied Mathematics
107751002
Source http://thesis.lib.nccu.edu.tw/record/#G0107751002
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/131111
Table of contents Acknowledgements ii
Chinese Abstract iii
Abstract iv
Contents v
List of Tables vii
List of Figures viii
1 Introduction 1
2 Related Work 3
3 Deep Learning 4
3.1 Neuron and Neural Networks 5
3.2 Activation Function 6
3.3 Loss Function 9
3.4 Gradient Descent and Backpropagation 10
3.5 Overfitting, Dropout and Batch Normalization 11
4 Recurrent Neural Networks 15
4.1 RNN Cell 15
4.2 Long Short-Term Memory 18
4.3 Attention 19
5 Bidirectional Encoder Representations from Transformers 22
5.1 Word Embedding 22
5.2 Transformer 23
5.3 Bidirectional Encoder Representations from Transformers 28
6 Experiments 31
6.1 Baseball Projection System 31
6.2 Baseball Dataset Preparation 32
6.3 Prediction Models 34
6.4 Model Performance 36
6.5 Class Result 39
7 Conclusion 41
Bibliography 42
Format application/pdf (2049757 bytes)
DOI 10.6814/NCCU202000709