Title Design and Implementation of the Big Data in Finance and Deep Learning Platform
Author Chen, Yu-Ming (陳昱銘)
Contributors Liou, Wen-Qing (劉文卿), advisor
Chen, Yu-Ming (陳昱銘)
Keywords Big data in finance
Deep learning
Massively parallel processing
FinTech
HAWQ
JupyterHub
Tensorflow
Celery
Date 2017
Uploaded 31-Aug-2017 12:03:16 (UTC+8)
Abstract This research aims to build an intelligent algorithmic trading platform for financial data. The platform uses Django CMS as its web framework and is divided into a development environment and a trading environment. Its functions comprise “User Research and Development”, “User Testing”, and “Algorithmic Services”.

“User Research and Development” and “User Testing” take place in the interactive IPython development interface, managed and configured through JupyterHub, so that multiple users can access the platform simultaneously and the platform can scale to a large user base. “Algorithmic Services” use Celery to package algorithms as tasks, which are then delivered to back-end servers for distributed computing. Following the recent growth of deep learning, the platform is additionally equipped with Tensorflow and GPU deployment to support multi-core, high-speed algorithm computation.
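The task-packaging idea above can be sketched without a broker. In Celery proper, a function decorated with `@app.task` is sent to worker processes through a message broker (the outline lists RabbitMQ) by calling `.delay(...)`, which returns an `AsyncResult`. The minimal sketch below only mimics that flow with the Python standard library on a single machine: the `task` decorator, `TASK_REGISTRY`, `submit`, and the `moving_average` algorithm are all hypothetical, and a `ThreadPoolExecutor` stands in for the back-end workers. It is neither Celery's API nor the thesis's implementation.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical registry mapping task names to algorithm functions,
# loosely mirroring how Celery's @app.task registers functions by name.
TASK_REGISTRY: Dict[str, Callable] = {}


def task(func: Callable) -> Callable:
    """Register an algorithm under its function name (illustrative only)."""
    TASK_REGISTRY[func.__name__] = func
    return func


@task
def moving_average(prices: List[float], window: int) -> List[float]:
    """Simple moving average: a hypothetical example of an 'algorithm service'."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


# The thread pool stands in for back-end Celery workers; no broker is involved.
_workers = ThreadPoolExecutor(max_workers=4)


def submit(name: str, *args) -> Future:
    """Look up a registered task by name and hand it to a worker.

    Returns a Future, which plays the role Celery's AsyncResult plays.
    """
    return _workers.submit(TASK_REGISTRY[name], *args)


if __name__ == "__main__":
    result = submit("moving_average", [10.0, 11.0, 12.0, 13.0], 2)
    print(result.result())  # blocks until the worker finishes → [10.5, 11.5, 12.5]
```

With real Celery the call site would look like `moving_average.delay(prices, 2).get()`, with workers started as separate processes; the sketch keeps everything in one process so it runs without a broker.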

To access large volumes of complex, structured financial data, this research adopts HAWQ as its database. HAWQ's massively parallel processing (MPP) architecture alleviates the system complexity and performance bottlenecks that accessing massive data has caused in the past. Combined with Ambari, which creates, monitors, and manages Hadoop distributed clusters, it makes deployment and maintenance considerably easier for developers.
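The MPP idea behind HAWQ, where rows are spread across segments that each process their own share before a master combines the partial results, can be illustrated with a toy model. The sketch below assumes a hypothetical `(symbol, price)` row schema, hashes the symbol with `zlib.crc32` to choose a segment (Python's built-in `hash` is randomized per process for strings), computes per-segment partial sums and counts concurrently, and merges them into per-symbol averages. It shows the general scatter/partial-aggregate/merge technique only, not HAWQ's actual executor; threads stand in for segment processes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical row schema: (symbol, price).
Row = Tuple[str, float]
# Per-symbol running (sum, count) pairs produced by one segment.
Partial = Dict[str, Tuple[float, int]]


def distribute(rows: List[Row], n_segments: int) -> List[List[Row]]:
    """Hash-distribute rows so every row of a symbol lands on the same segment."""
    segments: List[List[Row]] = [[] for _ in range(n_segments)]
    for symbol, price in rows:
        # crc32 is deterministic across runs, unlike hash() on str.
        segments[zlib.crc32(symbol.encode("utf-8")) % n_segments].append((symbol, price))
    return segments


def partial_aggregate(segment: List[Row]) -> Partial:
    """Work done locally on one segment: sum and count prices per symbol."""
    acc: Partial = {}
    for symbol, price in segment:
        total, count = acc.get(symbol, (0.0, 0))
        acc[symbol] = (total + price, count + 1)
    return acc


def mpp_average(rows: List[Row], n_segments: int = 3) -> Dict[str, float]:
    """Scatter rows, aggregate each segment concurrently, then merge on the 'master'."""
    segments = distribute(rows, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partials = list(pool.map(partial_aggregate, segments))
    merged: Partial = {}
    for part in partials:
        for symbol, (total, count) in part.items():
            m_total, m_count = merged.get(symbol, (0.0, 0))
            merged[symbol] = (m_total + total, m_count + count)
    return {symbol: total / count for symbol, (total, count) in merged.items()}


if __name__ == "__main__":
    rows = [("AAA", 100.0), ("AAA", 102.0), ("BBB", 50.0)]
    print(mpp_average(rows))
```

Because the master only merges small per-segment summaries instead of raw rows, the expensive scan work grows with the number of segments rather than funnelling through one node, which is the bottleneck the paragraph above refers to.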

Traditional table designs may not suit the new HAWQ database and can even hurt performance, so this research designs tables tailored to how programs access the financial data, and benchmarks their performance to ensure that programs can retrieve the data efficiently and quickly.
參考文獻 [1] KPMG. (2016). Fintech funding hits all-time high in 2015, despite pullback in Q4: KPMG and CB Insights. Available: https://home.kpmg.com/xx/en/home/media/press-releases/2016/03/kpmg-and-cb-insights.html
[2] 金融監督委員會。2016。金融科技發展策略白皮書。Available:
http://www.fsc.gov.tw/ch/home.jsp?id=517&parentpath=0,7,478
[3] David Silver. (2016). Mastering the game of Go with deep neural networks and tree search
[4] Bartlett, M. S. (2005). Recognizing facial expression: machine learning and application to spontaneous behavior. . Computer Vision and Pattern Recognition.
[5] Geoffrey Hinton, Li Deng, and Dong Yu. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
[6] Richard J. Hillman. (2005). Securities Markets: Decimal Pricing Has Contributed to Lower Trading Costs and a More Challenging Trading Environment
[7] Bin Li, Michael Wu, and Nan Lu. (2002). System for trading financial assets using volume weighted average price. U.S. Patent No. US20020194107 A1
[8] Morton Glantz and Robert Kissell. (2013). Multi-Asset Risk Modeling: Techniques for a Global Economy in an Electronic and Algorithmic Trading Era.
[9] Robert C. Merton. (1999). Applications of Option-Pricing Theory: Twenty-Five Years Later.
[10] Ian Domowitz and Henry Yegerman. (2005). The Cost of Algorithmic Trading
A First Look at Comparative Performance.
[11] Michael J. Barclay, Terrence Hendershott, and Charles M. Jones. (2008). Order Consolidation, Price Efficiency, and Extreme Liquidity Shocks.
[12] 張育軍。2009。上海證券交易所研究中心研究報告。上海人民出版社。
[13] Alexey Grishchenko. (2016). Apache HAWQ: Next Step In Massively Parallel Processing. Available:
https://content.pivotal.io/blog/apache-hawq-next-step-in-massively-parallel-processing
[14] Hive, https://hive.apache.org/
[15] Hbase, https://hbase.apache.org/
[16] Lei Chang, Zhanwei Wang, Tao Ma, Lirong Jian, Lili Ma, Alon Goldshuv
Luke Lonergan, Jeffrey Cohen, Caleb Welton, Gavin Sherry, and Milind Bhandarkar. (2014). HAWQ: A Massively Parallel Processing SQL Engine in Hadoop.
[17] Alexey Grishchenko. (2015). Hadoop vs MPP. Available:
https://0x0fff.com/hadoop-vs-mpp/
[18] Pivotal Inc. (2017). HAWQ Architecture. Available:
http://hdb.docs.pivotal.io/211/hawq/overview/HAWQArchitecture.html
[19] 常雷。(2016)。HAWQ ——功能強大的SQL-on-Hadoop引擎。 Available:
https://read01.com/BEzjR7.html
[20] Dong Cutting, A Bialecki, M Cafarella, and O O’MALLEY. (2005). Hadoop: a framework for running applications on large clusters built of commodity hardware.
[21] Dhruba Borthakur. (2013). HDFS Architecture Guide. Available:
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
[22] Limited Lin。(2014)。 HDFS-Hadoop Distributed File System 介紹。 Available:
http://limitedcode.blogspot.tw/2014/10/hdfs-hadoop-distributed-file-system-hdfs.html
[23] Apache Software Foundation. (2016). Apache Hadoop YARN. Available:
https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/YARN.html
[24] Fernando Perez and Brian E. Granger. (2007). IPython: A System for Interactive Scientific Computing. IEEE.
[25] Jupyter, http://jupyter.org/
[26] JupyterHub, https://jupyterhub.readthedocs.io/en/latest/index.html
[27] K. Fukushima and Sei Miyake. (1982). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern., 36, 193–202.
[28] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. (1989). Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1, pp. 541–551.
[29]  S. Hochreiter. (1991). Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber.
[30] S. Hochreiter et al. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, editors, A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press.
[31] Tensorflow, https://www.tensorflow.org/
[32] Django, https://www.djangoproject.com/
[33] Django CMS, https://www.django-cms.org/en/
[34] Mezzanine, http://mezzanine.jupo.org/
[35] Dmitriy Samovskiy. (2008). Introduction to AMQP Messaging with RabbitMQ. p.9 Available:
https://www.slideshare.net/somic/introduction-to-amqp-messaging-with-rabbitmq
[36] Leo G. (2016). Running Asynchronous background Tasks on Linux with Python3 Flask and Celery. Available:
https://techarena51.com/index.php/running-asynchronous-background-tasks-linux-python-3-flask-celery/
[37] Yahoo奇摩理財, https://tw.money.yahoo.com/fund
[38] Docker Spawner, https://github.com/jupyterhub/dockerspawner
[39] Nvidia Docker, https://github.com/NVIDIA/nvidia-docker
[40] Docker Swarm, https://docs.docker.com/engine/swarm/
[41] Incubator-HAWQ, https://github.com/apache/incubator-hawq/blob/master/contrib/hawq-docker/Makefile
Description Master's thesis
National Chengchi University
Department of Management Information Systems
104356039
Source http://thesis.lib.nccu.edu.tw/record/#G0104356039
Type thesis
dc.identifier (Other Identifiers) G0104356039
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/112358
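dc.description.tableofcontents Chapter 1 Introduction
Section 1 Research Background and Motivation
Section 2 Research Objectives
Section 3 Research Process
Chapter 2 Literature Review
Section 1 Algorithmic Trading
1. Origin and Development
2. Definition
3. Trading Strategies
Section 2 HAWQ
1. Massively Parallel Processing
2. Origin and Introduction of HAWQ
3. HAWQ Architecture and Components
Section 3 Hadoop
1. Hadoop Distributed File System (HDFS)
2. YARN
3. Ambari
Section 4 JupyterHub
Section 5 Deep Learning
Tensorflow
Section 6 Django CMS
Mezzanine
Section 7 Celery
RabbitMQ
Celery Workflow
Celery Features
Chapter 3 System Architecture and Functions
Section 1 Architecture Overview
Three-Layer Architecture
1. Presentation Layer
2. Business Logic Layer
3. Data Access Layer
Section 2 Table Design
1. Financial Data Sources
2. Basic Conceptual Table Design
3. Single-Value Tables
4. Array-Type Tables
Chapter 4 System Implementation
Section 1 Development Design: JupyterHub
Section 2 Application Services: Celery
Section 3 Database: HAWQ
Chapter 5 System Testing
Section 1 Platform Load Capacity
Section 2 Database Performance
Chapter 6 Conclusions and Future Work
Section 1 Conclusions
Section 2 Future Work
References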
dc.format.extent 7651405 bytes
dc.format.mimetype application/pdf