Title: FedADKD: A Federated Learning Approach Based on Adaptive Decoupled Knowledge Distillation (FedADKD:一種基於自適應解耦知識蒸餾的聯邦學習方法)
Author: Chou, Ping-Hsien (周秉賢)
Advisor: Jang, Hung-Chin (張宏慶)
Keywords: Federated Learning; Knowledge Distillation; Non-IID; Knowledge Forgetting; Decoupled Knowledge Distillation
Date: 2025; uploaded 4-Aug-2025 15:10:16 (UTC+8)

Abstract
With the rapid advancement of the Internet of Things (IoT), mobile devices, and intelligent applications, data generation and storage have become increasingly decentralized, thereby intensifying the demand for distributed model training. Against this backdrop, Federated Learning (FL) has emerged as a promising paradigm, enabling collaborative model training across clients without requiring the centralization of raw data. However, real-world data typically exhibit Non-Independent and Identically Distributed (Non-IID) heterogeneity, where local data across clients vary considerably. This non-uniform data distribution often leads to "global knowledge forgetting" during the aggregation process in conventional FL methods (e.g., FedAvg), resulting in degraded global model performance.

Recent studies have attempted to alleviate this problem by incorporating knowledge distillation into FL. For instance, FedNTD preserves global knowledge by aligning the non-target class predictions among clients. Nevertheless, relying solely on non-target class information may not thoroughly integrate all knowledge sources. Moreover, most existing approaches employ fixed distillation weights without accounting for varying degrees of heterogeneity among clients, leaving an important research gap.

To address these challenges, we propose an innovative federated learning framework, FedADKD (Federated Learning via Adaptive Decoupled Knowledge Distillation). The framework decouples knowledge distillation into True-Class Knowledge Distillation (TCKD) and Non-True-Class Knowledge Distillation (NCKD), and quantifies each client's data heterogeneity to adaptively adjust the TCKD weight. Specifically, the TCKD weight is reduced for clients with highly heterogeneous data, allowing greater retention of local characteristics, whereas it is increased for clients with more balanced data to reinforce cross-client knowledge sharing.
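The decoupling described above splits the classic KD objective into a target/non-target pair of KL terms, following the DKD formulation the thesis builds on. Below is a minimal plain-Python sketch; the entropy-based heterogeneity score in `adaptive_tckd_weight` is a hypothetical illustration of "more balanced local data → larger TCKD weight," not the thesis's exact rule.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def decoupled_kd_loss(student_logits, teacher_logits, target,
                      alpha_tckd, beta_nckd=1.0, temperature=1.0):
    """DKD-style decoupling: TCKD matches the binary (target vs. rest)
    distribution; NCKD matches the renormalised non-target distribution."""
    ps = softmax(student_logits, temperature)
    pt = softmax(teacher_logits, temperature)

    # TCKD: binary distributions [p(target), p(not target)]
    tckd = kl_div([pt[target], 1.0 - pt[target]],
                  [ps[target], 1.0 - ps[target]])

    # NCKD: probabilities renormalised over the non-target classes only
    nts = [p for i, p in enumerate(ps) if i != target]
    ntt = [p for i, p in enumerate(pt) if i != target]
    s_sum, t_sum = sum(nts), sum(ntt)
    nckd = kl_div([p / t_sum for p in ntt], [p / s_sum for p in nts])

    return alpha_tckd * tckd + beta_nckd * nckd

def adaptive_tckd_weight(label_counts, w_min=0.1, w_max=1.0):
    """Hypothetical adaptive rule: score a client's heterogeneity by the
    normalised entropy of its local label histogram, then interpolate the
    TCKD weight -- balanced data gets w_max, heavily skewed data gets w_min."""
    total = sum(label_counts)
    probs = [c / total for c in label_counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(label_counts))
    score = entropy / max_entropy if max_entropy > 0 else 0.0
    return w_min + (w_max - w_min) * score
```

Under this reading, each client would feed its own `adaptive_tckd_weight(...)` into `alpha_tckd` every round, and only that single scalar needs to be communicated, consistent with the abstract's claim of negligible extra overhead.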
This dynamic balance of client contributions substantially mitigates Non-IID-induced global knowledge forgetting while transmitting only a few additional scalar values, an overhead that is virtually negligible.

We conduct extensive experiments on CIFAR-10 and CIFAR-100 under multiple Non-IID scenarios to evaluate the effectiveness of FedADKD. Experimental results show that FedADKD consistently outperforms the conventional FedAvg and the existing FedNTD methods across diverse heterogeneous distributions, achieving notable improvements in global accuracy and significantly reducing global knowledge forgetting rates. Further analyses confirm that FedADKD more robustly retains knowledge from various clients, effectively reconciling "local adaptation" with "global integration." Ablation studies additionally underscore the contribution of the adaptive TCKD weighting mechanism to model performance.

In sum, the proposed FedADKD offers an efficient, privacy-preserving solution for federated learning under heterogeneous data environments, with both theoretical significance and practical value.

Description: Master's thesis
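The experiments simulate client heterogeneity with two standard strategies, sharding and LDA (Dirichlet) partitioning, as listed in the table of contents. A minimal sketch of both, assuming the usual formulations from the federated-learning literature (the shard count and `alpha` defaults here are illustrative, not the thesis's settings):

```python
import random
from collections import defaultdict

def dirichlet_partition(labels, n_clients, alpha=0.5, seed=0):
    """LDA-style split: for each class, draw per-client proportions from
    Dirichlet(alpha) and deal that class's samples out accordingly.
    Smaller alpha produces more skewed (more heterogeneous) clients."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(n_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Dirichlet(alpha) sampled via normalised Gamma(alpha, 1) draws
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(draws)
        props = [d / total for d in draws]
        start = 0
        for c in range(n_clients):
            # The last client takes the remainder so every sample is assigned
            take = (round(props[c] * len(idxs)) if c < n_clients - 1
                    else len(idxs) - start)
            clients[c].extend(idxs[start:start + take])
            start += take
    return clients

def shard_partition(labels, n_clients, shards_per_client=2, seed=0):
    """Sharding split: sort indices by label, cut them into equal shards,
    and deal a few random shards to each client (extreme label skew)."""
    rng = random.Random(seed)
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    n_shards = n_clients * shards_per_client
    size = len(order) // n_shards
    shards = [order[s * size:(s + 1) * size] for s in range(n_shards)]
    rng.shuffle(shards)
    return [sum(shards[c * shards_per_client:(c + 1) * shards_per_client], [])
            for c in range(n_clients)]
```

With sharding each client sees only a handful of classes, while the Dirichlet split gives a tunable spectrum from near-IID (large `alpha`) to severe skew (small `alpha`), which is how the experiments sweep heterogeneity levels.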
National Chengchi University
In-service Master's Program, Department of Computer Science
Student ID: 112971005
Identifier: G0112971005
Source: http://thesis.lib.nccu.edu.tw/record/#G0112971005
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/158709
Type: thesis
Format: application/pdf (3,279,072 bytes)

Table of Contents
Chapter 1 Introduction
Chapter 2 Related Work
  2.1 Federated Averaging (FedAvg)
  2.2 Federated Not-True Distillation (FedNTD)
  2.3 Decoupled Knowledge Distillation (DKD)
Chapter 3 Background and Problem Analysis
  3.1 Impact of Non-IID Data on Global Knowledge Forgetting
  3.2 Application and Impact of TCKD and NCKD in Federated Learning
    3.2.1 Performance Comparison of FedDKD and FedNTD
    3.2.2 Impact of TCKD in Non-IID Scenarios
  3.3 Effect of the TCKD Weight under Different Degrees of Client Heterogeneity
    3.3.1 Experiment 1: Varying the TCKD Weight on IID Clients
    3.3.2 Experiment 2: Varying the TCKD Weight on Non-IID Clients
    3.3.3 Discussion
Chapter 4 Methodology
  4.1 FedADKD
  4.2 Decoupled Distillation Loss
    4.2.1 True-Class Knowledge Distillation (TCKD)
    4.2.2 Non-True-Class Knowledge Distillation (NCKD)
  4.3 Adaptive TCKD Weight Design
  4.4 The FedADKD Algorithm
Chapter 5 Experimental Design and Results Analysis
  5.1 Experimental Design
    5.1.1 Datasets and Training Environment
    5.1.2 Non-IID Data Simulation Strategies
  5.2 Performance Analysis of FedADKD under Different Data-Heterogeneity Scenarios
    5.2.1 Performance under the Sharding Partition Strategy
    5.2.2 Performance under the LDA Partition Strategy
    5.2.3 Performance under Mixed IID and Non-IID Partitioning
  5.3 In-Depth Analysis of FedADKD's Global Knowledge Retention
    5.3.1 Forgetting Measure Analysis
    5.3.2 Knowledge Retention Outside the Local Distribution
  5.4 Ablation Study
Chapter 6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Research Directions
References
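For reference, the FedAvg baseline against which FedADKD is compared (TOC 2.1) aggregates local models as a dataset-size-weighted average of their parameters; a minimal sketch over flat parameter vectors:

```python
def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: average client parameter vectors,
    weighting each client by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for j in range(dim):
            global_w[j] += (n / total) * w[j]
    return global_w
```

For example, `fedavg_aggregate([[1.0, 0.0], [3.0, 2.0]], [1, 3])` yields `[2.5, 1.5]`: the second client holds three quarters of the data and so dominates the average. It is exactly this data-weighted averaging over heterogeneous clients that, per the abstract, washes out minority-client knowledge and motivates FedADKD's distillation terms.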
