Title: 應用多顆 GPU 以及 PyTorch 指令進行平行處理於加速學習演算法之執行速度 (Applying multiple GPUs and PyTorch commands for parallel processing to accelerate the execution speed of learning algorithms)
Author: Fang, Kai-Rou (方凱柔)
Advisors: Tsaih, Rua-Huan (蔡瑞煌); Lin, Yi-Ling (林怡伶)
Keywords: Learning algorithm; Adaptive neural networks; PyTorch; Data parallelism
Date: 2024
Uploaded: 5-Aug-2024 12:07:57 (UTC+8)
Abstract: Research on the parallel processing and multi-GPU application of two-layer adaptive neural networks (2LANN) is relatively scarce. This study explores the data-parallel processing of 2LANN by leveraging the PyTorch framework and its related commands in combination with multiple GPUs. The Pupil Learning Mechanism algorithm is employed to achieve more efficient computation in a multi-GPU environment. Using a copper-price-prediction dataset, a series of experiments validates this approach and analyzes the impact of multi-GPU parallel processing on model training speed and accuracy, in order to comprehensively evaluate the practical effectiveness and application value of the proposed method. The study is expected to provide a simple parallel-processing module that lets future research on 2LANN apply parallel processing quickly and easily.
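The full thesis is not reproduced in this record, but the multi-GPU setup the abstract describes follows PyTorch's standard data-parallel pattern. A minimal sketch is given below, assuming a generic two-layer feed-forward regressor with dummy tensors; `TwoLayerNet`, `x`, and `y` are hypothetical names for illustration, and this shows `torch.nn.DataParallel` generically, not the thesis's actual Pupil Learning Mechanism implementation:

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    """Generic two-layer feed-forward net (a stand-in for the thesis's 2LANN)."""
    def __init__(self, n_inputs: int, n_hidden: int):
        super().__init__()
        self.hidden = nn.Linear(n_inputs, n_hidden)
        self.output = nn.Linear(n_hidden, 1)

    def forward(self, x):
        return self.output(torch.relu(self.hidden(x)))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TwoLayerNet(n_inputs=8, n_hidden=16).to(device)

# DataParallel replicates the model on every visible GPU, scatters each
# input batch along dimension 0, and gathers the outputs on the first device.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

criterion = nn.L1Loss()  # MAE, the error measure the thesis reports
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, 8, device=device)  # dummy batch: 64 samples, 8 features
y = torch.randn(64, 1, device=device)  # dummy regression targets

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

`nn.DataParallel` is single-process and multi-threaded; the `DistributedDataParallel` variant covered in the thesis outline further below avoids that bottleneck by running one process per GPU.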
References:
Bahrampour, S., Ramakrishnan, N., Schott, L., & Shah, M. (2015). Comparative study of deep learning software frameworks. arXiv preprint arXiv:1511.06435.
DataParallel — PyTorch 2.2 documentation. (2024). Retrieved March 6, 2024, from https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html
Distributed communication package - torch.distributed — PyTorch 2.3 documentation. (2024). Retrieved June 3, 2024, from https://pytorch.org/docs/stable/distributed.html
Distributed Data Parallel — PyTorch 2.3 documentation. (2024). Retrieved June 2, 2024, from https://pytorch.org/docs/stable/notes/ddp.html
Distributed data parallel training using Pytorch on AWS | Telesens. (2019). Retrieved May 31, 2024, from https://www.telesens.co/2019/04/04/distributed-data-parallel-training-using-pytorch-on-aws/
DistributedDataParallel — PyTorch 2.3 documentation. (2024). Retrieved June 1, 2024, from https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html
Fan, S., Rong, Y., Meng, C., Cao, Z., Wang, S., Zheng, Z., Wu, C., Long, G., Yang, J., Xia, L., Diao, L., Liu, X., & Lin, W. (2021). DAPPLE: A pipelined data parallel approach for training large models. Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 431–445.
Geng, J., Li, D., & Wang, S. (2019). ElasticPipe: An efficient and dynamic model-parallel solution to DNN training. ScienceCloud 2019: Proceedings of the 10th Workshop on Scientific Cloud Computing, co-located with HPDC 2019, 5–9.
Hara, K., Saito, D., & Shouno, H. (2015). Analysis of function of rectified linear unit used in deep learning. 2015 International Joint Conference on Neural Networks (IJCNN), 1–8.
Harlap, A., Narayanan, D., Phanishayee, A., Seshadri, V., Devanur, N., Ganger, G., & Gibbons, P. (2018). PipeDream: Fast and efficient pipeline parallel DNN training. arXiv preprint arXiv:1806.03377.
Ketkar, N., & Moolayil, J. (2021). Deep learning with Python: Learn best practices of deep learning models with PyTorch. Apress Media LLC. https://doi.org/10.1007/978-1-4842-5364-9
Khomenko, V., Shyshkov, O., Radyvonenko, O., & Bokhan, K. (2016). Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization. 2016 IEEE First International Conference on Data Stream Mining & Processing (DSMP), 100–103.
Krizhevsky, A. (2014). One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997.
Lee, S., Kang, Q., Madireddy, S., Balaprakash, P., Agrawal, A., Choudhary, A., Archibald, R., & Liao, W. (2019). Improving scalability of parallel CNN training by adjusting mini-batch size at run-time. 2019 IEEE International Conference on Big Data (Big Data), 830–839.
Nguyen, T. D. T., Park, J. H., Hossain, M. I., Hossain, M. D., Lee, S.-J., Jang, J. W., Jo, S. H., Huynh, L. N. T., Tran, T. K., & Huh, E.-N. (2019). Performance analysis of data parallelism technique in machine learning for human activity recognition using LSTM. 2019 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 387–391.
Optional: Data Parallelism — PyTorch Tutorials 2.2.0+cu121 documentation. (2024). Retrieved February 21, 2024, from https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., & Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96(5), 879–899.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., & Lerer, A. (2017). Automatic differentiation in PyTorch. NIPS 2017 Workshop on Autodiff.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., … Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703.
Pérez-Sánchez, B., Fontenla-Romero, O., & Guijarro-Berdiñas, B. (2018). A review of adaptive online learning for artificial neural networks. Artificial Intelligence Review, 49(2), 281–299.
PyTorch Distributed Overview — PyTorch Tutorials 2.3.0+cu121 documentation. (2024). Retrieved June 1, 2024, from https://pytorch.org/tutorials/beginner/dist_overview.html
Ren-Han, Y. (2022). An adaptive learning-based model for copper price forecasting. Master's thesis, Department of Information Management, National Chengchi University, 1–78.
Sanders, J., Kandrot, E., & Jacoboni, E. (2011). CUDA par l'exemple [une introduction à la programmation parallèle de GPU]. Pearson.
torch.nn — PyTorch 2.2 documentation. (2024). Retrieved March 5, 2024, from https://pytorch.org/docs/stable/nn.html
torch.nn.parallel.data_parallel — PyTorch 2.3 documentation. (2024). Retrieved June 2, 2024, from https://pytorch.org/docs/stable/_modules/torch/nn/parallel/data_parallel.html
torch.utils.data — PyTorch 2.3 documentation. (2024). Retrieved June 3, 2024, from https://pytorch.org/docs/stable/data.html#single-and-multi-process-data-loading
Tsai, Y.-H., Jheng, Y.-J., & Tsaih, R.-H. (2019). The cramming, softening and integrating learning algorithm with parametric ReLU activation function for binary input/output problems. 2019 International Joint Conference on Neural Networks (IJCNN), 1–7.
Tsaih, R. R. (1998). An explanation of reasoning neural networks. Mathematical and Computer Modelling, 28(2).
Tsaih, R.-H., Chien, Y.-H., & Chien, S.-Y. (2023). Pupil learning mechanism. arXiv preprint arXiv:2307.16141.
Description: Master's thesis, Department of Information Management, National Chengchi University (student ID 111356049)
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/152416
Source: http://thesis.lib.nccu.edu.tw/record/#G0111356049
Format: application/pdf, 3189792 bytes
Type: thesis
Table of contents:
Chapter 1. Introduction
Chapter 2. Literature Review
  2.1 Pupil Learning Mechanism (Tsaih et al., 2023)
  2.2 PyTorch
  2.3 Data Parallel Processing in PyTorch
    2.3.1 DataParallel
    2.3.2 DistributedDataParallel
Chapter 3. Research Methodology
  3.1 Algorithm of RPLM
  3.2 Parallel Processing
    3.2.1 Organizing Module with DataParallel
    3.2.2 Organizing Module with DistributedDataParallel
Chapter 4. Experiment Design
  4.1 Dataset
  4.2 Experiment Evaluation
Chapter 5. Experiment Results
  5.1 Training Time
    5.1.1 Training Time of the Understanding Module of the Organizing Module
    5.1.2 Training Time of the Organizing Module
    5.1.3 Overall Training Time
  5.2 MAE Results
Chapter 6. Conclusion and Future Work
  6.1 Conclusion
  6.2 Limitation and Future Work
References
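Sections 2.3.2 and 3.2.2 of the outline cover DistributedDataParallel (DDP), which runs one process per GPU and synchronizes gradients by all-reduce, rather than DataParallel's single-process threading. Below is a rough single-node skeleton of that pattern under standard `torchrun` launching; the small `nn.Sequential` network and random tensors are illustrative assumptions standing in for the thesis's organizing module and copper-price data, not its actual code:

```python
# Launch with, e.g.: torchrun --nproc_per_node=2 ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # A small two-layer network stands in for the thesis's 2LANN.
    model = torch.nn.Sequential(
        torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1)
    ).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # DistributedSampler hands each process a disjoint shard of the data.
    dataset = TensorDataset(torch.randn(1024, 8), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    criterion = torch.nn.L1Loss()  # MAE, matching the thesis's evaluation
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(5):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()  # DDP all-reduces gradients across processes here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Because each DDP process owns its GPU and its data shard, scaling to more GPUs changes only the `--nproc_per_node` argument, which is what makes the pattern attractive as a reusable parallel-processing module.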