Title Two-Level Machine Learning for Network Enabled Devices Identification (以兩層式機器學習進行連網設備識別)
Author Wu, Ming-Lun (吳明倫)
Contributors Hu, Yuh-Jong (胡毓忠)
Wu, Ming-Lun (吳明倫)
Keywords IoT
Network Enabled Devices
Cyber Security
Two-level Machine Learning
Semi-supervised Learning
Censys
Network Scan Data
Support Vector Machine
Random Forest
Binary Classifier
Date 2019
Upload time 7-Aug-2019 16:36:36 (UTC+8)
Abstract With the rapid development of Internet of Things (IoT) technology, the number of network enabled devices on the Internet has grown explosively and the services they provide have become more diverse, making people's lives more convenient. However, poor product design and a lack of security protection mean that device vulnerabilities are repeatedly exploited by attackers, exposing home and corporate networks full of such devices to major security threats. Finding out how many potentially risky network enabled devices are connected to a target network, by identifying those devices, is therefore the first step of security protection. This study explores two-level machine learning for processing network enabled device data that is large in volume and hierarchical in structure, and compares it with the commonly used single-level machine learning. It also draws on the concept of semi-supervised learning to explore whether devices classified as unknown can be handled automatically.
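
As a rough illustration of the two-level idea described above, the following sketch first predicts a coarse category and then a finer label within that category. It is a minimal sketch under stated assumptions, not the thesis implementation: the feature vectors, the label names (`camera`, `router`, `cam-A`, and so on) and the classes `CentroidClassifier` and `TwoLevelClassifier` are hypothetical, and a simple nearest-centroid rule stands in for a trained classifier so that the example needs no third-party library.

```python
from collections import defaultdict
from math import dist


class CentroidClassifier:
    """Tiny nearest-centroid classifier (a stand-in for a real trained model)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        # Centroid = per-dimension mean of the feature vectors of each class.
        self.centroids = {
            label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in groups.items()
        }
        return self

    def predict(self, features):
        # Return the label whose centroid is closest to the sample.
        return min(self.centroids, key=lambda lbl: dist(features, self.centroids[lbl]))


class TwoLevelClassifier:
    """Level 1 picks a coarse category; level 2 picks a fine label inside it."""

    def fit(self, X, coarse, fine):
        self.level1 = CentroidClassifier().fit(X, coarse)
        # Train one level-2 model per coarse category, on that category's rows only.
        by_coarse = defaultdict(lambda: ([], []))
        for features, c, f in zip(X, coarse, fine):
            by_coarse[c][0].append(features)
            by_coarse[c][1].append(f)
        self.level2 = {
            c: CentroidClassifier().fit(xs, fs) for c, (xs, fs) in by_coarse.items()
        }
        return self

    def predict(self, features):
        c = self.level1.predict(features)
        return c, self.level2[c].predict(features)


# Hypothetical toy data: two numeric features per scanned host.
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 0.0), (1.1, 0.2),
     (5.0, 5.0), (5.1, 5.2), (6.0, 5.0), (6.2, 5.1)]
coarse = ["camera"] * 4 + ["router"] * 4
fine = ["cam-A", "cam-A", "cam-B", "cam-B", "rtr-A", "rtr-A", "rtr-B", "rtr-B"]

model = TwoLevelClassifier().fit(X, coarse, fine)
print(model.predict((0.1, 0.0)))
print(model.predict((6.1, 5.0)))
```

A single-level baseline would instead train one model directly on the fine labels; the two-level layout lets each second-level model separate only the classes within one category.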

This study uses the Censys network scan dataset to train binary classifiers with two classification algorithms, Support Vector Machine and Random Forest, and then uses them to classify network enabled device data. Following the semi-supervised learning concept, it also tries to find the best parameters for a density-based clustering algorithm that handles devices classified as unknown. Finally, a number of simulation experiments verify and compare, for this application, the two classification algorithms and single-level versus two-level machine learning, and the study reports quantitative and qualitative observations on the experimental results.
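
The parameter search for density-based clustering mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not say which density-based algorithm or which score was used, so this example assumes plain DBSCAN and the (unadjusted) Rand index, and the points, the known labels, the parameter grids, and the helper names `dbscan`, `rand_index`, and `best_params` are all hypothetical.

```python
from itertools import combinations
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Plain DBSCAN: one cluster id per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_samples:
                queue.extend(more)  # j is a core point, keep expanding
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def best_params(points, known, eps_grid, min_samples_grid):
    """Try every (eps, min_samples) pair and keep the best-scoring one."""
    scored = [
        (rand_index(dbscan(points, eps, m), known), eps, m)
        for eps in eps_grid
        for m in min_samples_grid
    ]
    return max(scored)  # ties go to the larger eps, then larger min_samples


# Hypothetical "unknown" samples with a few known group labels for scoring.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
known = [0, 0, 0, 1, 1, 1, 2]

best = best_params(points, known, eps_grid=[0.5, 1.5, 20.0], min_samples_grid=[2, 3])
print(best)
```

Note that this simple score treats all noise points as one group; a chance-corrected measure such as the adjusted Rand index may be preferable on real data.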
References [1] Y. Yuchen et al. A survey on security and privacy issues in internet-of-things. IEEE Internet of Things Journal, 4(5):1250-1258, 2017.
[2] A. Gupta et al. Vulnerability assessment and penetration testing. International Journal of Engineering Trends and Technology, 4(3), 2013.
[3] Susan Dumais and Hao Chen. Hierarchical classification of web content. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 256-263. ACM, 2000.
[4] Huei Chen Wu. A study on multi-layered automatic book classification system using data mining. Master's thesis, National Chung Hsing University, 2015.
[5] O. Papadopoulou et al. A Two-Level Classification Approach for Detecting Clickbait Posts using Text-Based Features. arXiv preprint arXiv:1710.08528, 2017.
[6] O. Chapelle et al. Semi-Supervised Learning. The MIT Press, 1st edition, 2010.
[7] Levi Lelis and Jörg Sander. Semi-supervised density-based clustering. In 2009 Ninth IEEE International Conference on Data Mining, pages 842-847. IEEE, 2009.
[8] M. Ester et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226-231, 1996.
[9] Kishore Angrishi. Turning Internet of Things (IoT) into Internet of Vulnerabilities (IoV): IoT botnets. arXiv preprint arXiv:1702.03681, 2017.
[10] Keaton Mower and Hovav Shacham. Pixel perfect: Fingerprinting canvas in HTML5. Proceedings of W2SP, pages 1-12, 2012.
[11] Z. Durumeric et al. ZMap: Fast Internet-wide scanning and its security applications. In 22nd USENIX Security Symposium (USENIX Security 13), pages 605-620, 2013.
[12] Z. Durumeric et al. A search engine backed by Internet-wide scanning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 542-553. ACM, 2015.
[13] D. Arora et al. Big Data Analytics for Classification of Network Enabled Devices. In 2016 30th International Conference on Advanced Information Networking and Applications Workshops (WAINA), pages 708-713, March 2016.
[14] M. Miettinen et al. IoT SENTINEL: Automated Device-Type Identification for Security Enforcement in IoT. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pages 2177-2184, June 2017.
[15] B. Genge et al. ShoVAT: Shodan-based vulnerability assessment tool for Internet-facing services. Security and communication networks, 9(15):2696-2714, 2016.
[16] S. Shaikh et al. Implementation of DBSCAN algorithm for internet traffic classification. International Journal of Computer Science and Information Technology Research (IJCSITR), pages 25-32, 2013.
[17] Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861-874, 2006.
[18] Arie Ben-David. About the relationship between ROC curves and Cohen's kappa. Engineering Applications of Artificial Intelligence, 21(6):874-882, 2008.
[19] Ka Yee Yeung and Walter L Ruzzo. Details of the adjusted rand index and clustering algorithms, supplement to the paper an empirical study on principal component analysis for clustering gene expression data. Bioinformatics, 17(9):763-774, 2001.
[20] Leland McInnes and John Healy. Accelerated hierarchical density based clustering. In Data Mining Workshops (ICDMW), 2017 IEEE International Conference on, pages 33-42. IEEE, 2017.
Description Master's thesis
National Chengchi University
Department of Computer Science
106753015
Source http://thesis.lib.nccu.edu.tw/record/#G0106753015
Type thesis
dc.contributor.advisor Hu, Yuh-Jong (胡毓忠)
dc.identifier (Other Identifiers) G0106753015
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/124874
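dc.description.tableofcontents Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Research Contributions
Chapter 2 Research Background
2.1 Two-Level Machine Learning
2.2 Handling Unknown Data
2.3 Network Enabled Device Identification
2.4 Network Scan Data
Chapter 3 Related Work
3.1 Two-Level Machine Learning Case Studies
3.2 Network Enabled Device Identification Case Studies
3.3 Semi-Supervised Learning Case Studies
Chapter 4 Two-Level Machine Learning Process Design
4.1 Data Preprocessing Stage
4.2 Modeling Approach
4.3 Simulation Experiment Design
Chapter 5 Implementation and Comparison
5.1 Network Scan Data Processing Workflow
5.2 Two-Level Machine Learning Workflow
5.3 Simulation Experiments
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References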
dc.format.extent 2135705 bytes
dc.format.mimetype application/pdf
dc.identifier.doi (DOI) 10.6814/NCCU201900635