Title 落實本地差分隱私的安全式機器學習
Applying Local Differential Privacy for Secure Machine Learning
Author 呂柏漢
Contributors 胡毓忠 (advisor); 呂柏漢
Keywords Local differential privacy; Secure machine learning; Data protection; Binary classification; Privacy protection
Date 2019
Date uploaded 7 March 2019, 12:07:44 (UTC+8)
Abstract With the arrival of the big data era, large enterprises and government organizations collect and analyze massive amounts of user data, and personal privacy faces a growing risk of leakage; balancing data utility and privacy protection has therefore become an important task. This research applies local differential privacy to build secure machine learning, performing correct classification and prediction during data analysis without exposing individuals' sensitive information. Using the “Bank Marketing Data Set” from the UCI repository, sensitive data are perturbed with local differential privacy techniques based on AnonML and RAPPOR, and users may customize the privacy budget according to how sensitive each feature is. The perturbed data are then recovered on a third-party platform to carry out secure machine learning, and quantitative and qualitative observations of the computational results are reported.
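The abstract describes a pipeline in which each client perturbs its own feature values under a per-feature privacy budget, and the third-party platform debiases the aggregated noisy reports before training a binary classifier. Below is a minimal sketch of that idea in Python, using the basic randomized-response mechanism [13] that RAPPOR builds on rather than the thesis's actual AnonML/RAPPOR implementation; the function names, the 30% true rate, and the epsilon value are illustrative assumptions only.

import math
import random

def perturb_bit(true_bit, epsilon):
    # Client side: randomized response on one binary feature. The true bit
    # is kept with probability e^eps / (e^eps + 1) and flipped otherwise,
    # which satisfies epsilon-local differential privacy. Each feature may
    # be assigned its own epsilon (privacy budget).
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if random.random() < p_truth else 1 - true_bit

def estimate_frequency(reports, epsilon):
    # Third-party side: unbiased estimate of the true fraction of 1s,
    # obtained by inverting E[observed] = p*f + (1 - p)*(1 - f).
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed + p - 1.0) / (2.0 * p - 1.0)

if __name__ == "__main__":
    # 10,000 hypothetical clients; 30% truly hold the sensitive attribute.
    true_bits = [1 if random.random() < 0.3 else 0 for _ in range(10_000)]
    eps = 1.0  # assumed per-feature privacy budget
    noisy = [perturb_bit(b, eps) for b in true_bits]
    print("true fraction:     ", sum(true_bits) / len(true_bits))
    print("estimated fraction:", estimate_frequency(noisy, eps))

The recovered frequency, not any individual's report, is what the third-party platform would feed into the machine-learning step; a larger epsilon keeps reports closer to the truth (better utility), while a smaller epsilon flips them more often (stronger privacy).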
參考文獻 [1] B. Hitaj et al., “Deep models under the GAN: information leakage from collaborative deep learning,” Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 603-618, 2017.
[2] A. Narayanan and V. Shmatikov, “Robust de-anonymization of large sparse datasets,” 2008 IEEE Symposium on Security and Privacy (SP 2008), pp. 111-125, 2008.
[3] T. Dalenius, “Towards a methodology for statistical disclosure control,” Statistisk Tidskrift, vol. 15, pp. 429-444, 1977.
[4] C. Dwork, “Differential Privacy,” Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, pp. 1-12, 2006.
[5] L. Sweeney, “k-anonymity: A model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 05, pp. 557-570, 2002.
[6] A. Machanavajjhala et al., “l-Diversity: Privacy Beyond k-Anonymity,” Proceedings of the 22nd International Conference on Data Engineering, pp. 24, 2006.
[7] C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211-407, 2014.
[8] C. Dwork et al., “Calibrating noise to sensitivity in private data analysis,” Theory of Cryptography Conference, pp. 265-284, 2006.
[9] F. McSherry and K. Talwar, “Mechanism design via differential privacy,” 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS '07), pp. 94-103, 2007.
[10] F. D. McSherry, “Privacy integrated queries: an extensible platform for privacy-preserving data analysis,” Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 19-30, 2009.
[11] C. Dwork et al., “On the complexity of differentially private data release: efficient algorithms and hardness results,” Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pp. 381-390, 2009.
[12] J. C. Duchi et al., “Local privacy and statistical minimax rates,” 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS), pp. 429-438, 2013.
[13] S. L. Warner, “Randomized response: A survey technique for eliminating evasive answer bias,” Journal of the American Statistical Association, vol. 60, no. 309, pp. 63-69, 1965.
[14] Ú. Erlingsson et al., “RAPPOR: Randomized aggregatable privacy-preserving ordinal response,” Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054-1067, 2014.
[15] G. Fanti et al., “Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries,” Proceedings on Privacy Enhancing Technologies, vol. 2016, no. 3, pp. 41, 2016.
[16] R. Tibshirani, “Regression shrinkage and selection via the lasso: a retrospective,” Journal of the Royal Statistical Society: Series B, vol. 73, no. 3, pp. 273-282, 2011.
[17] T. T. Nguyên et al., “Collecting and analyzing data from smart device users with local differential privacy,” arXiv preprint arXiv:1606.05053, 2016.
[18] B. Cyphers and K. Veeramachaneni, “AnonML: Locally private machine learning over a network of peers,” 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 549-560, 2017.
[19] P. Samarati and L. Sweeney, “Generalizing data to provide anonymity when disclosing information,” Proceedings of the ACM Symposium on Principles of Database Systems (PODS '98), p. 188, 1998.
[20] L. Sweeney, “Achieving k-anonymity privacy protection using generalization and suppression,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 05, pp. 571-588, 2002.
[21] F. Prasser et al., “Lightning: Utility-Driven Anonymization of High-Dimensional Data,” Transactions on Data Privacy, vol. 9, no. 2, pp. 161-185, 2016.
[22] R. Bassily and A. Smith, “Local, private, efficient protocols for succinct histograms,” Proceedings of the 47th Annual ACM Symposium on Theory of Computing, pp. 127-135, 2015.
Description Master's degree
National Chengchi University (國立政治大學)
Executive Master's Program in Computer Science (資訊科學系碩士在職專班)
105971008
Source http://thesis.lib.nccu.edu.tw/record/#G0105971008
Type thesis
Identifier G0105971008
URI http://nccur.lib.nccu.edu.tw/handle/140.119/122462
Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Research Objectives 2
Chapter 2 Background 3
2.1 Introduction to Differential Privacy 3
2.2.1 Mechanisms for Realizing Differential Privacy 4
2.2.2 Composition Properties of Differentially Private Algorithms 7
2.2.3 Release Properties of Differentially Private Data 8
2.3 Introduction to and Comparison of Local Differential Privacy 9
2.3.1 Mechanisms for Realizing Local Differential Privacy 11
2.4 RAPPOR, Google's User Privacy Protection Technology 13
2.4.1 Variants of RAPPOR 14
2.5 AnonML, a Machine Learning System Satisfying Local Differential Privacy 15
Chapter 3 Related Work 17
3.1 Privacy Protection Techniques such as k-Anonymity and l-Diversity 17
3.2 Apple iOS 10 Privacy Protection Technology 19
Chapter 4 Research Method and Architecture 21
4.1 Research Architecture 21
4.2 Client-Side Median-Based Binning of Feature Values 23
4.3 Client-Side Perturbation Satisfying Local Differential Privacy 24
4.3.1 Client-Side Perturbation with RAPPOR 26
4.3.2 Privacy Budget Estimation 28
4.4 Recovering Data from the Perturbation Noise 29
4.5 Binary Classification with Machine Learning 30
Chapter 5 Implementation and Results 31
5.1 Data Preprocessing 31
5.2 Data Perturbation and Recovery 33
5.3 Experimental Results 36
5.3.1 Comparison of the Two Algorithms 36
5.3.2 Comparison across Data Volumes 38
Chapter 6 Conclusion and Future Work 42
6.1 Conclusion 42
6.2 Future Work 42
References 43
Format application/pdf (3145683 bytes)
DOI 10.6814/THE.NCCU.EMCS.005.2019.B02