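Title: Iterative Clustering on Behaviors of App Executables (行動應用軟體在迭代分群行為之研究)
Author: Chiu, Li Ching (邱莉晴)
Contributors: Yu, Fang (郁方); Chiu, Li Ching (邱莉晴)
Keywords: mobile applications; App; clustering; GHSOM; iterative
Date: 2013
Upload time: 25-Aug-2014 15:16:54 (UTC+8)
Abstract:
Mobile devices are ubiquitous in this generation, and we need a way to explore what apps do behind the scenes. This study proposes an unsupervised clustering approach to investigate whether the code inside an app can serve as the basis for grouping apps by behavior. We apply iterative clustering to analyze apps and observe whether the resulting clusters are appropriate. In the experiments, we downloaded and analyzed hundreds of apps from the App Store, and found that the proposed approach performs well and yields correct clustering results.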
Smart devices are everywhere nowadays. Mobile application (app) development has become a major part of the software industry, with millions of apps developed and published to billions of users. A systematic way to analyze apps is essential, preferably one that works on their executables, which in most cases are the only publicly available form of an app.

In this work, we propose applying unsupervised clustering to mobile applications based on their system call distributions. We first use static binary analysis to reverse engineer app executables and extract the counts of method calls and call sequences embedded in them. Apps are then clustered iteratively on this information to reveal implicit relationships among apps based on function call similarity. The GHSOM (Growing Hierarchical Self-Organizing Map), an unsupervised learning tool, is used to cluster apps on information recovered directly from their executables.

We use the types of methods and sequences as features. Running the clustering algorithm on apps, however, immediately raises a problem: the large number of attributes and the volume of data lead to long or infeasible analysis times with GHSOMs. We propose a new iterative approach to overcome this problem, combined with dimension reduction by principal component analysis (PCA), which cuts attributes with limited information loss.

In preliminary results from analyzing hundreds of apps downloaded directly from the Apple App Store, the proposed clustering works well and reveals some interesting information. Apps developed by the same company are clustered in the same group, and apps with similar behaviors, e.g., the same functions for games, painting, or socializing, are clustered together.
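The feature step described in the abstract (counts of method calls and call sequences per app) might look roughly like the sketch below. It is an illustration only, not the thesis's implementation: the function names `call_features` and `to_matrix`, the use of length-2 sequences, and the sample call lists are all assumptions, and real input would come from static analysis of the executables.

```python
from collections import Counter


def call_features(calls, n=2):
    """Count single method calls and length-n call sequences for one app.

    `calls` is a list of method names in the order they were recovered
    (hypothetical input; n=2 is an assumed sequence length).
    """
    counts = Counter(calls)
    for i in range(len(calls) - n + 1):
        counts[tuple(calls[i:i + n])] += 1
    return counts


def to_matrix(apps):
    """Build one count vector per app over a shared, sorted feature list."""
    counters = {name: call_features(calls) for name, calls in apps.items()}
    # Features are strings (single calls) or tuples (sequences); sort by str
    # so the column order is deterministic.
    features = sorted({f for c in counters.values() for f in c}, key=str)
    rows = {name: [c[f] for f in features] for name, c in counters.items()}
    return features, rows


# Made-up call lists for two hypothetical apps.
apps = {
    "app_a": ["open", "read", "close"],
    "app_b": ["open", "read", "read", "close"],
}
features, rows = to_matrix(apps)
for name, row in rows.items():
    print(name, row)
```

Each row is a vector that a clustering tool such as a GHSOM could take as input. With hundreds of apps the feature list grows quickly, which is the dimensionality problem the abstract addresses with PCA and the iterative approach.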
Description: Master's (碩士)
National Chengchi University (國立政治大學)
Graduate Institute of Information Management (資訊管理研究所)
101356040
102
Advisor: Yu, Fang (郁方)
Source: http://thesis.lib.nccu.edu.tw/record/#G1013560401
Type: thesis
Identifier: G1013560401
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/69198
Format: application/pdf, 2019318 bytes
Language: en_US
Table of contents:
Abstract
Content
1 Introduction
2 Related Works
  2.1 Clustering methods
    2.1.1 K-Means Algorithm
    2.1.2 SOM
    2.1.3 GHSOM
    2.1.4 Comparison of clustering method
  2.2 Dimension reduction
    2.2.1 PCA
    2.2.2 Comparison with LDA method
  2.3 OPcode sequence analysis
  2.4 App Analysis and clustering
4 Evaluations
  4.1 115 apps clustering
  4.2 564 apps clustering
    4.2.1 PCA reduction
    4.2.2 Iterative GHSOM on 564 apps
  4.3 800 apps clustering
    4.3.1 PCA reduction on 800 apps
    4.3.1 Iterative GHSOM on 800 apps
5 Conclusions
Reference
Appendix
  1. GHSOM clustering result of 115 apps
    (1) Segment of MATLAB code on transfer the original data
  2. Progress of iterative GHSOM on 115 apps
  3. 564 apps iterative progress
  4. 800 apps progress