Title 落實本地差分隱私的安全式機器學習
Applying Local Differential Privacy for Secure Machine Learning
Author 呂柏漢
Contributors 胡毓忠
呂柏漢
Keywords Local differential privacy
Secure machine learning
Data protection
Binary classification
Privacy protection
Date 2019
Upload time 7-Mar-2019 12:07:44 (UTC+8)
Abstract With the arrival of the big data era, enterprises and government organizations collect and analyze large amounts of user data, and personal privacy faces a growing risk of leakage. Balancing data utility against privacy protection has therefore become an important problem. This research uses local differential privacy to build secure machine learning that performs accurate classification and prediction without disclosing individuals' sensitive information. Using the "Bank Marketing Data Set" from UCI, it perturbs sensitive data with local differential privacy techniques based on AnonML and RAPPOR, allows users to customize the privacy budget of each feature according to how sensitive that feature is, recovers the data on a third-party platform to carry out secure machine learning, and reports quantitative and qualitative observations of the results.
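The perturb-then-recover idea in the abstract can be illustrated with a minimal sketch of binary randomized response, the basic building block of local differential privacy. This is not the thesis's RAPPOR or AnonML pipeline: it assumes each feature has already been reduced to a single bit, and the function names, feature names, and budget values below are hypothetical choices made only for illustration.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit; p / (1 - p) equals e^epsilon."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)


def perturb_bit(bit: int, epsilon: float, rng: random.Random) -> int:
    """Client side: keep the bit with probability p, otherwise flip it."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def perturb_record(record: dict, budgets: dict, rng: random.Random) -> dict:
    """Client side: perturb each binary feature with its own privacy budget."""
    return {name: perturb_bit(value, budgets[name], rng)
            for name, value in record.items()}


def estimate_frequency(reports: list, epsilon: float) -> float:
    """Collector side: unbiased estimate of the true fraction of 1s.

    E[observed] = f * p + (1 - f) * (1 - p), solved here for f.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical per-feature budgets: a smaller epsilon means more noise.
    budgets = {"housing": 2.0, "default": 0.5}
    print(perturb_record({"housing": 1, "default": 0}, budgets, rng))

    # 30% of 20,000 simulated users hold the sensitive bit.
    true_bits = [1] * 6000 + [0] * 14000
    reports = [perturb_bit(b, 1.0, rng) for b in true_bits]
    print(round(estimate_frequency(reports, 1.0), 3))
```

A smaller budget gives a feature stronger protection at the cost of a noisier estimate, which is the trade-off between data utility and privacy that the abstract describes.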
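Description Master's thesis
National Chengchi University (國立政治大學)
In-service Master's Program, Department of Computer Science
105971008
Source http://thesis.lib.nccu.edu.tw/record/#G0105971008
Type thesis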
dc.contributor.advisor 胡毓忠
dc.contributor.author (Authors) 呂柏漢
dc.creator (Author) 呂柏漢
dc.date (Date) 2019
dc.date.accessioned 7-Mar-2019 12:07:44 (UTC+8)
dc.date.available 7-Mar-2019 12:07:44 (UTC+8)
dc.date.issued (Upload time) 7-Mar-2019 12:07:44 (UTC+8)
dc.identifier (Other Identifiers) G0105971008
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/122462
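dc.description.tableofcontents Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Research Objectives 2
Chapter 2 Background 3
2.1 Introduction to Differential Privacy 3
2.2.1 Mechanisms for Achieving Differential Privacy 4
2.2.2 Composition Properties of Differentially Private Algorithms 7
2.2.3 Properties of Differentially Private Data Release 8
2.3 Introduction to and Comparison of Local Differential Privacy 9
2.3.1 Mechanisms for Achieving Local Differential Privacy 11
2.4 RAPPOR: Google's User Privacy Protection Technology 13
2.4.1 Variants of RAPPOR 14
2.5 AnonML: A Machine Learning System Satisfying Local Differential Privacy 15
Chapter 3 Related Work 17
3.1 k-Anonymity, l-Diversity, and Other Privacy Protection Techniques 17
3.2 Apple iOS 10 Privacy Protection Technology 19
Chapter 4 Research Method and Architecture 21
4.1 Research Architecture 21
4.2 Median-Based Binning of Feature Values on the Client Side 23
4.3 Locally Differentially Private Perturbation on the Client Side 24
4.3.1 Perturbation with RAPPOR on the Client Side 26
4.3.2 Privacy Budget Estimation 28
4.4 Recovering from the Perturbation Noise 29
4.5 Binary Classification with Machine Learning 30
Chapter 5 Implementation and Results 31
5.1 Data Preprocessing 31
5.2 Data Perturbation and Recovery 33
5.3 Experimental Results 36
5.3.1 Comparison of the Two Algorithms 36
5.3.2 Comparison across Data Sizes 38
Chapter 6 Conclusion and Future Work 42
6.1 Conclusion 42
6.2 Future Work 42
References 43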
dc.format.extent 3145683 bytes
dc.format.mimetype application/pdf
dc.identifier.doi (DOI) 10.6814/THE.NCCU.EMCS.005.2019.B02