Title 優化非獨立同分佈的聯邦學習:基於相似性的聚合方法
Optimizing Federated Learning on Non-IID Data: Aggregation Approaches Based on Similarity
Author 吳仁凱 (Wu, Ren-Kai)
Advisor 蔡子傑 (Tsai, Tzu-Chieh)
Keywords 聯邦學習 (Federated Learning)
個性化聯邦學習 (Personalized Federated Learning)
分群 (Clustering)
非獨立同分佈 (Non-Independent Identically Distributed, Non-IID)
資料隱私 (Data Privacy)
Date 2023
Upload time 1-Dec-2023 10:33:44 (UTC+8)
Abstract 隨著資訊技術和人工智慧的持續進步,資料分析和隱私保護的重要性逐漸增加。聯邦學習,作為一種新型的機器學習架構,不僅能夠滿足資料隱私的需求,允許分散的資料保持在原始位置,同時還能進行模型的協同訓練。但隨著資料的增加和分散,聯邦學習尤其在資料非獨立同分佈(Non-IID)情境下,仍面臨諸多挑戰。而多中心聯邦學習是一種有前景的解決方案,本研究深入探討了多中心聯邦學習在不同資料分佈下的效能,特別針對FedSEM算法學習個性化模型的能力進行了研究。

為了與FedAvg算法進行比較,將所有聯邦學習算法設定相同的通訊輪數及目標預測準確度,以全局模型預測本地任務的準確度作為評估指標,並採用了四種不同的資料切分策略。這些設置有助於深入了解資料分佈對聯邦學習的具體影響。

本研究對K-means分群算法進行了詳細的探討,分析其在實際應用中的優點和缺點。儘管K-means算法具有簡單性和快速性等優點,但其也存在如須預先設定集群數量及無法偵測離群值等挑戰。為了解決這些問題,本研究引入了基於密度的分群算法,如DBSCAN。DBSCAN具有自動發現集群數量和識別噪聲的特點,但確定其最佳參數仍是一大挑戰。

在非獨立同分佈(Non-IID)情境下,全局模型對於客戶端的預測能力下降,收斂速度也變慢。為此,本研究提出了基於相似性的聚合方法,旨在優化Non-IID情境下的聯邦學習效能。模擬實驗結果證明了此方法在極端Non-IID情境下的有效性,且與其他現有方法相比具有明顯的優勢。

綜上所述,本研究不僅深入探討了多中心聯邦學習的各種挑戰,還提出了多種優化策略和分群算法,以增進聯邦學習的通訊和訓練效能。這些研究成果對於理解和優化聯邦學習具有重要的參考價值,同時也為未來的研究和實際應用提供了概念性驗證。
The importance of data analysis and privacy protection has grown with continued advances in information technology and artificial intelligence. Federated learning, an emerging machine learning architecture, meets data-privacy requirements by allowing decentralized data to remain where it is generated while still supporting collaborative model training. As data grow in volume and dispersion, however, federated learning still faces many challenges, especially under non-independent and identically distributed (Non-IID) data. Multi-center federated learning is a promising remedy, and this study examines its performance across different data distributions, focusing on the FedSEM algorithm's ability to learn personalized models.

For a fair comparison with FedAvg, all federated learning algorithms were run with the same number of communication rounds and the same target prediction accuracy, performance was measured as the global model's accuracy on each client's local task, and four data-splitting strategies were applied. These settings help isolate the concrete effect of data distribution on federated learning.

The study then examines the K-means clustering algorithm in detail, analyzing its strengths and weaknesses in practice. Although K-means is simple and fast, it requires the number of clusters to be set in advance and cannot detect outliers. To address these limitations, the study introduces density-based clustering algorithms such as DBSCAN, which discovers the number of clusters automatically and identifies noise, although choosing its parameters well remains a significant challenge.

Under Non-IID data, the global model's prediction performance on clients' local tasks degrades and convergence slows. To mitigate this, the study proposes a similarity-based aggregation method aimed at improving federated learning performance in Non-IID settings. Simulation results demonstrate its effectiveness under extreme Non-IID conditions, with clear advantages over existing methods.

In summary, this study explores the challenges of multi-center federated learning in depth and introduces optimization strategies and clustering algorithms that improve communication and training efficiency, offering useful insights for understanding and optimizing federated learning and a proof of concept for future research and applications.
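For context on the FedAvg baseline [1] against which the thesis's methods are compared, the server-side aggregation step can be sketched as follows. This is a minimal illustration assuming clients exchange model weights as NumPy arrays; the function and variable names are ours, not the thesis's:

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: average client models, weighted by local dataset size.

    client_weights: one list of layer tensors (np.ndarray) per client
    client_sizes:   number of local training samples per client
    """
    total = float(sum(client_sizes))
    global_weights = []
    for layer in range(len(client_weights[0])):
        # Weighted sum of this layer's tensor across all clients.
        acc = np.zeros_like(client_weights[0][layer], dtype=float)
        for weights, n_samples in zip(client_weights, client_sizes):
            acc += (n_samples / total) * weights[layer]
        global_weights.append(acc)
    return global_weights
```

Each communication round, the server would broadcast `global_weights` back to the participating clients for further local training.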
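The record does not describe the four data-splitting strategies, so as a plausible stand-in, a common way to simulate Non-IID label skew in the federated learning literature is a Dirichlet partition, sketched below with illustrative parameters:

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew.

    Smaller alpha -> more extreme Non-IID (each client sees few classes);
    larger alpha -> nearly IID. This split is illustrative only; the thesis's
    four strategies are not detailed in this record.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Proportion of this class assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```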
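The trade-off the abstract draws between K-means (cluster count fixed in advance, no outlier handling) and DBSCAN (cluster count and noise discovered from density, but eps and min_samples must be tuned) can be seen concretely by clustering flattened client-update vectors. All data and parameter values below are synthetic and illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

# Hypothetical flattened client updates: 20 clients, 100-dim parameter vectors,
# drawn from two well-separated groups.
rng = np.random.default_rng(0)
updates = np.vstack([rng.normal(loc=c, scale=0.1, size=(10, 100)) for c in (0.0, 1.0)])

# K-means: simple and fast, but k must be chosen in advance and every
# point is forced into a cluster (no notion of outliers).
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(updates)

# DBSCAN: infers the number of clusters from density and labels
# low-density points as noise (-1), but eps/min_samples need tuning.
dbscan_labels = DBSCAN(eps=3.0, min_samples=3).fit_predict(updates)

print(kmeans_labels)
print(dbscan_labels)
```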
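The record does not spell out the proposed similarity-based aggregation, but one common reading of such methods, weighting each client's aggregate by its cosine similarity to the other clients' updates so that clients with similar data distributions are pooled more heavily, can be sketched as follows. This is entirely illustrative and should not be taken as the thesis's exact algorithm:

```python
import numpy as np

def cosine_similarity_matrix(updates):
    """Pairwise cosine similarity between flattened client update vectors."""
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    unit = updates / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def similarity_weighted_aggregate(updates):
    """Illustrative similarity-based aggregation: each client receives a convex
    combination of all client updates, weighted by non-negative cosine
    similarity; one possible reading of the approach, not the thesis's code."""
    sim = np.maximum(cosine_similarity_matrix(updates), 0.0)  # ignore dissimilar pairs
    weights = sim / sim.sum(axis=1, keepdims=True)            # row-normalize
    return weights @ updates  # one personalized aggregate per client
```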
References
[1] McMahan, B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017, April). Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics (pp. 1273-1282). PMLR.
[2] Yang, Q., Liu, Y., Chen, T., & Tong, Y. (2019). Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2), 1-19.
[3] Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., ... & Zhao, S. (2021). Advances and open problems in federated learning. Foundations and Trends® in Machine Learning, 14(1-2), 1-210.
[4] Li, Q., Diao, Y., Chen, Q., & He, B. (2022, May). Federated learning on non-IID data silos: An experimental study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE) (pp. 965-978). IEEE.
[5] Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., & Chandra, V. (2018). Federated learning with non-IID data. arXiv preprint arXiv:1806.00582.
[6] Karimireddy, S. P., Kale, S., Mohri, M., Reddi, S., Stich, S., & Suresh, A. T. (2020, November). SCAFFOLD: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning (pp. 5132-5143). PMLR.
[7] Tan, A. Z., Yu, H., Cui, L., & Yang, Q. (2022). Towards personalized federated learning. IEEE Transactions on Neural Networks and Learning Systems.
[8] Long, G., Xie, M., Shen, T., Zhou, T., Wang, X., & Jiang, J. (2022). Multi-center federated learning: Clients clustering for better personalization. World Wide Web, 1-20.
[9] Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., & Smith, V. (2020). Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems, 2, 429-450.
[10] Briggs, C., Fan, Z., & Andras, P. (2020, July). Federated learning with hierarchical clustering of local updates to improve training on non-IID data. In 2020 International Joint Conference on Neural Networks (IJCNN) (pp. 1-9). IEEE.
[11] Ghosh, A., Chung, J., Yin, D., & Ramchandran, K. (2020). An efficient framework for clustered federated learning. Advances in Neural Information Processing Systems, 33, 19586-19597.
[12] Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A K-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1), 100-108. https://doi.org/10.2307/2346830
[13] Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996, August). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD (Vol. 96, No. 34, pp. 226-231).
Description Master's thesis (碩士)
National Chengchi University (國立政治大學)
Department of Computer Science (資訊科學系)
110753157
Source http://thesis.lib.nccu.edu.tw/record/#G0110753157
Type thesis
URI https://nccur.lib.nccu.edu.tw/handle/140.119/148474
Table of Contents
Chapter 1: Introduction (p. 1)
  1.1 Research Background and Motivation (p. 1)
  1.2 Research Objectives (p. 4)
Chapter 2: Literature Review (p. 5)
  2.1 Federated Learning (p. 5)
  2.2 Horizontal Federated Learning (p. 8)
  2.3 Data Heterogeneity (p. 12)
  2.4 Personalized Federated Learning (p. 15)
  2.5 Similarity-Based Aggregation Methods (p. 16)
Chapter 3: Methodology (p. 20)
  3.1 System Overview (p. 20)
  3.2 Clustering Aggregation Algorithms (p. 22)
Chapter 4: Experimental Design and Results Analysis (p. 27)
  4.1 Experimental Environment and Evaluation Metrics (p. 27)
  4.2 Experiment 1: One Algorithm Across Various Data Distributions (p. 30)
  4.3 Experiment 2: Various Algorithms on a Single Data Distribution (p. 38)
  4.4 Experiment 3: Adding External Data (p. 46)
Chapter 5: Conclusion and Future Work (p. 55)
References (p. 56)
Format application/pdf (5,142,492 bytes)