Title 落實本地差分隱私的安全式機器學習
Applying Local Differential Privacy for Secure Machine Learning
Author 呂柏漢
Contributors 胡毓忠
呂柏漢
Keywords Binary classification
Privacy protection
Local differential privacy
Secure machine learning
Data Protection
Date 2019
Upload time 7-Mar-2019 12:07:44 (UTC+8)
Abstract With the arrival of the big data era, enterprises and government organizations collect and analyze large amounts of user data, which exposes personal privacy to the risk of leakage. Balancing data utility against privacy protection has therefore become an important problem. This research applies local differential privacy to build secure machine learning that performs accurate classification and prediction without disclosing individuals' sensitive information. Using the “Bank Marketing Data Set” from UCI, it perturbs sensitive data with local differential privacy techniques based on AnonML and RAPPOR, lets users customize the privacy budget of each feature according to how private that feature is, and recovers the data on a third-party platform to carry out secure machine learning. It reports both quantitative and qualitative observations of the results.
Description Master's
National Chengchi University
In-service Master's Program, Department of Computer Science
105971008
Source http://thesis.lib.nccu.edu.tw/record/#G0105971008
Type thesis
dc.contributor.advisor 胡毓忠
dc.date.accessioned 7-Mar-2019 12:07:44 (UTC+8)
dc.date.available 7-Mar-2019 12:07:44 (UTC+8)
dc.identifier (Other Identifiers) G0105971008
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/122462
dc.identifier.doi (DOI) 10.6814/THE.NCCU.EMCS.005.2019.B02
dc.format.extent 3145683 bytes
dc.format.mimetype application/pdf
dc.description.tableofcontents
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
Chapter 2 Research Background
2.1 Introduction to Differential Privacy
2.2.1 Mechanisms for Achieving Differential Privacy
2.2.2 Composition Properties of Differentially Private Algorithms
2.2.3 Release Characteristics of Differentially Private Data
2.3 Introduction to and Comparison of Local Differential Privacy
2.3.1 Mechanisms for Achieving Local Differential Privacy
2.4 RAPPOR, Google's User Privacy Protection Technology
2.4.1 Variants of RAPPOR
2.5 AnonML, a Machine Learning System Satisfying Local Differential Privacy
Chapter 3 Related Work
3.1 Privacy Protection Techniques such as k-Anonymity and l-Diversity
3.2 Apple iOS 10 Privacy Protection Technology
Chapter 4 Research Method and Architecture
4.1 Research Architecture
4.2 Median-Based Binning of Feature Values on the Client
4.3 Perturbation Satisfying Local Differential Privacy on the Client
4.3.1 Perturbation with RAPPOR on the Client
4.3.2 Privacy Budget Estimation
4.4 Recovering from Perturbation Noise
4.5 Binary Classification with Machine Learning
Chapter 5 Implementation and Results
5.1 Data Preprocessing
5.2 Data Perturbation and Recovery
5.3 Experimental Results
5.3.1 Comparison of the Two Algorithms
5.3.2 Comparison across Data Sizes
Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References
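The abstract describes perturbing sensitive data on the client under a per-feature privacy budget and then recovering aggregate information on a third-party platform. The record does not include the thesis's code, so the following is only a minimal sketch of that idea using plain binary randomized response, the basic building block that RAPPOR extends. It is not the AnonML or RAPPOR algorithm itself, and every name in it (`keep_probability`, `perturb_bit`, `estimate_true_rate`, the feature names and budget values) is a hypothetical chosen for illustration.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)


def perturb_bit(bit: int, epsilon: float, rng: random.Random) -> int:
    """Client side: report the true bit with probability p, otherwise flip it."""
    p = keep_probability(epsilon)
    return bit if rng.random() < p else 1 - bit


def estimate_true_rate(reports: list[int], epsilon: float) -> float:
    """Server side: unbiased estimate of the fraction of users whose true bit is 1."""
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    # E[observed] = p*f + (1-p)*(1-f), so f = (observed - (1-p)) / (2p - 1).
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: 30% of users hold a 1 for this binary feature (assumed).
    true_bits = [1] * 3000 + [0] * 7000
    # Hypothetical per-feature budgets: a smaller epsilon means stronger privacy.
    budgets = {"sensitive_feature": 0.5, "less_sensitive_feature": 2.0}
    for name, eps in budgets.items():
        reports = [perturb_bit(b, eps, rng) for b in true_bits]
        print(name, round(estimate_true_rate(reports, eps), 3))
```

A smaller budget flips more bits, so the recovered rate is noisier for the same number of users. This is the utility and privacy trade-off the abstract refers to, and it suggests why the amount of data matters for the recovered statistics.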