Two-Level Machine Learning for Network Enabled Devices Identification


Title	Two-Level Machine Learning for Network Enabled Devices Identification (以兩層式機器學習進行連網設備識別)
Author	吳明倫 Wu, Ming-Lun
Contributors	胡毓忠 Hu, Yuh-Jong; 吳明倫 Wu, Ming-Lun
Keywords	IoT; Network Enabled Devices; Cyber Security; Two-level Machine Learning; Semi-supervised Learning; Censys; Network Scan Data; Support Vector Machine; Random Forest; Binary Classifier
Date	2019
Upload time	7-Aug-2019 16:36:36 (UTC+8)
Abstract	With the rapid development of Internet of Things (IoT) technology, the number of network enabled devices on the Internet has grown explosively and the services they provide have become more diverse, making people's lives more convenient. However, poor product design and a lack of security protection have led to a steady stream of incidents in which attackers exploit device vulnerabilities, exposing home and corporate networks full of such devices to major security threats. Knowing how many potentially risky network enabled devices are connected to a target network is the first step of security protection, and device identification is the means of obtaining that picture. This study explores two-level machine learning for network enabled device data, which is large in volume and hierarchical in structure, and compares it with the commonly used single-level machine learning. It also draws on semi-supervised learning to explore whether devices classified as unknown can be handled automatically. The study uses the Censys network scan dataset to train binary classifiers with two classification algorithms, Support Vector Machine and Random Forest, and then uses them to classify network enabled device data. Following the semi-supervised learning concept, it tries to find the best parameters for a density-based clustering algorithm that processes devices classified as unknown. Finally, a number of simulation experiments verify and compare, for this application, the two classification algorithms and single-level versus two-level machine learning, and the study reports quantitative and qualitative observations on the experimental results.
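The two-level idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the category names (`camera`, `router`), the fine labels (`brand_a` to `brand_d`), the 2-D feature vectors, the radii, and the centroid-distance scorer are all hypothetical stand-ins. In particular, the toy `BinaryClassifier` takes the place of the Support Vector Machine and Random Forest binary classifiers the thesis trains on Censys data. The sketch only shows the routing: level 1 picks a coarse category, level 2 picks a fine label inside that category, and a sample that every binary classifier rejects is labeled `unknown`.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Optional, Tuple

Vector = Tuple[float, ...]
Label = Tuple[str, str]  # (coarse category, fine label)


@dataclass
class BinaryClassifier:
    """Toy one-vs-rest scorer (hypothetical stand-in for an SVM or Random Forest).

    A sample is accepted when it lies within `radius` of the centroid
    of the positive training samples.
    """
    centroid: Vector
    radius: float

    @classmethod
    def fit(cls, positives: List[Vector], radius: float) -> "BinaryClassifier":
        centroid = tuple(sum(col) / len(positives) for col in zip(*positives))
        return cls(centroid, radius)

    def score(self, x: Vector) -> float:
        # Positive score means "x belongs to this class"; larger is more confident.
        return self.radius - dist(self.centroid, x)


def predict_one(classifiers: Dict[str, BinaryClassifier], x: Vector) -> Optional[str]:
    """Return the best-scoring label, or None if every classifier rejects x."""
    label, best = max(classifiers.items(), key=lambda item: item[1].score(x))
    return label if best.score(x) > 0 else None


def build_models(train: Dict[Label, List[Vector]]):
    """Fit one binary classifier per coarse category and one per fine label."""
    by_coarse: Dict[str, List[Vector]] = {}
    level2: Dict[str, Dict[str, BinaryClassifier]] = {}
    for (coarse, fine), samples in train.items():
        by_coarse.setdefault(coarse, []).extend(samples)
        level2.setdefault(coarse, {})[fine] = BinaryClassifier.fit(samples, radius=0.4)
    level1 = {c: BinaryClassifier.fit(s, radius=1.5) for c, s in by_coarse.items()}
    return level1, level2


def predict_two_level(level1, level2, x: Vector) -> Label:
    """Level 1 chooses the coarse category; level 2 chooses the fine label in it."""
    coarse = predict_one(level1, x)
    if coarse is None:
        return ("unknown", "unknown")
    fine = predict_one(level2[coarse], x)
    return (coarse, fine if fine is not None else "unknown")


# Hypothetical toy training data: two coarse categories, two fine labels each.
TOY_TRAIN: Dict[Label, List[Vector]] = {
    ("camera", "brand_a"): [(0.0, 0.0), (0.2, 0.0)],
    ("camera", "brand_b"): [(1.0, 0.0), (1.2, 0.0)],
    ("router", "brand_c"): [(5.0, 5.0), (5.2, 5.0)],
    ("router", "brand_d"): [(6.0, 5.0), (6.2, 5.0)],
}

if __name__ == "__main__":
    level1, level2 = build_models(TOY_TRAIN)
    for sample in [(0.1, 0.1), (6.1, 5.1), (0.6, 0.0), (3.0, 3.0)]:
        print(sample, predict_two_level(level1, level2, sample))
```

In a workflow like the one the abstract describes, the samples that come back as `unknown` would be collected and passed to a density-based clustering step. That step is omitted here.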
Description	Master's thesis; National Chengchi University; Department of Computer Science; 106753015
Source	http://thesis.lib.nccu.edu.tw/record/#G0106753015
Type	thesis

dc.contributor.advisor	胡毓忠	zh_TW
dc.contributor.advisor	Hu, Yuh-Jong	en_US
dc.contributor.author (Authors)	吳明倫	zh_TW
dc.contributor.author (Authors)	Wu, Ming-Lun	en_US
dc.creator (Author)	吳明倫	zh_TW
dc.creator (Author)	Wu, Ming-Lun	en_US
dc.date (Date)	2019	en_US
dc.date.accessioned	7-Aug-2019 16:36:36 (UTC+8)	-
dc.date.available	7-Aug-2019 16:36:36 (UTC+8)	-
dc.date.issued (Upload time)	7-Aug-2019 16:36:36 (UTC+8)	-
dc.identifier (Other Identifiers)	G0106753015	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/124874	-
dc.description (Description)	Master's (碩士)	zh_TW
dc.description (Description)	National Chengchi University (國立政治大學)	zh_TW
dc.description (Description)	Department of Computer Science (資訊科學系)	zh_TW
dc.description (Description)	106753015	zh_TW
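dc.description.tableofcontents	Chapter 1 Introduction: 1.1 Research Motivation; 1.2 Research Objectives; 1.3 Research Contributions. Chapter 2 Research Background: 2.1 Two-Level Machine Learning; 2.2 Handling Unknown Data; 2.3 Network Enabled Device Identification; 2.4 Network Scan Data. Chapter 3 Related Work: 3.1 Two-Level Machine Learning Case Studies; 3.2 Network Enabled Device Identification Case Studies; 3.3 Semi-Supervised Learning Case Studies. Chapter 4 Two-Level Machine Learning Workflow Design: 4.1 Data Preprocessing Stage; 4.2 Modeling Approach; 4.3 Simulation Experiment Design. Chapter 5 Implementation and Comparison: 5.1 Network Scan Data Processing Workflow; 5.2 Two-Level Machine Learning Workflow; 5.3 Simulation Experiments. Chapter 6 Conclusions and Future Work: 6.1 Conclusions; 6.2 Future Work. References	zh_TW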
dc.format.extent	2135705 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0106753015	en_US
dc.subject (Keywords)	IoT	en_US
dc.subject (Keywords)	Network Enabled Devices	en_US
dc.subject (Keywords)	Cyber Security	en_US
dc.subject (Keywords)	Two-level Machine Learning	en_US
dc.subject (Keywords)	Semi-supervised Learning	en_US
dc.subject (Keywords)	Censys	en_US
dc.subject (Keywords)	Network Scan Data	en_US
dc.subject (Keywords)	Support Vector Machine	en_US
dc.subject (Keywords)	Random Forest	en_US
dc.subject (Keywords)	Binary Classifier	en_US
dc.title (Title)	以兩層式機器學習進行連網設備識別	zh_TW
dc.title (Title)	Two-Level Machine Learning for Network Enabled Devices Identification	en_US
dc.type (Type)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/NCCU201900635	en_US
