Title: Iterative Clustering on Behaviors of App Executables (行動應用軟體在迭代分群行為之研究)
Author: Chiu, Li Ching (邱莉晴)
Contributors: Yu, Fang (郁方); Chiu, Li Ching (邱莉晴)
Keywords: mobile applications
GHSOM
clustering
App
iterative
Date: 2013
Uploaded: 25-Aug-2014 15:16:54 (UTC+8)
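Abstract: Mobile devices are now ubiquitous, and we need a method to explore the behavior behind apps. This study proposes an unsupervised clustering approach to investigate whether the code contained in apps can serve as a basis for clustering them by behavior. We apply iterative clustering to analyze apps and examine whether the resulting clusters are appropriate. In our experiments, we downloaded and analyzed hundreds of apps from the App Store and found that the proposed approach performs well and produces correct clustering results.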
Smart devices are everywhere nowadays. Mobile application (app) development has become one of the mainstreams of the software industry, with millions of apps developed and published to billions of users.

It is essential to have a systematic way to analyze apps, preferably based on their executables, which in most cases are the only publicly available form of an app.

In this work, we propose to apply unsupervised clustering to mobile applications based on their system call distributions. We first apply static binary analysis, reverse engineering the app executables to extract the method call and call sequence counts embedded in each app. Apps are then clustered iteratively on this information to reveal implicit relationships among apps based on function call similarity. GHSOM (Growing Hierarchical Self-Organizing Map), an unsupervised learning tool, is integrated to cluster apps based on the information resolved directly from their executables.
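
GHSOM is built from ordinary self-organizing maps (SOMs). In one common formulation, a map keeps growing while its mean quantization error (MQE) is at least τ1 times the quantization error of its parent unit, and units whose error stays above τ2 times the layer-0 error are expanded into child maps. The sketch below covers one SOM layer and the horizontal growth test only. It assumes NumPy, a rectangular grid, a Gaussian neighborhood, linearly decaying learning rate and radius, and an illustrative τ1 = 0.6; the function names and parameter values are hypothetical, and the thesis does not state which GHSOM implementation or settings it used.

```python
import numpy as np

def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM with online updates and a Gaussian neighborhood."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Initialise unit weights from randomly chosen data rows.
    weights = data[rng.integers(0, n, rows * cols)].astype(float)
    # Grid position of every unit, used for neighborhood distances.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total = epochs * n
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3       # shrinking neighborhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))       # Gaussian neighborhood weights
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign(data, weights):
    """Index of the best-matching unit for every row of data."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def mean_quantization_error(data, weights):
    """Average, over non-empty units, of each unit's mean distance to its mapped rows."""
    labels = assign(data, weights)
    errs = [np.linalg.norm(data[labels == u] - weights[u], axis=1).mean()
            for u in np.unique(labels)]
    return float(np.mean(errs))

def should_grow(data, weights, parent_qe, tau1=0.6):
    """GHSOM-style horizontal growth test: keep growing while MQE >= tau1 * parent error."""
    return mean_quantization_error(data, weights) >= tau1 * parent_qe

# Usage on two synthetic, well-separated groups of "apps" (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(5.0, 0.1, (20, 3))])
qe0 = np.linalg.norm(X - X.mean(axis=0), axis=1).mean()  # layer-0 (single-unit) error
W = train_som(X, 2, 2, epochs=20)
labels = assign(X, W)
print(mean_quantization_error(X, W), should_grow(X, W, qe0))
```

When the test returns True, a full GHSOM would insert a row or column next to the unit with the largest error and retrain; that insertion step and the hierarchical expansion are omitted from this sketch.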

We use method types and call sequences as features. When running the clustering algorithm on apps, however, we immediately confront a problem: the large number of attributes and data points leads to long or even infeasible analysis times with GHSOM. We propose a new iterative approach to overcome this problem, combined with dimension reduction by principal component analysis (PCA), which cuts attributes with limited information loss.
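
The two feature types and the PCA step can be illustrated as counts of individual method names plus counts of consecutive call pairs, followed by a projection onto the fewest principal components that reach a chosen share of the variance. This is a minimal sketch, assuming static analysis has already produced an ordered list of called method names per app; the input format, the sequence length n = 2, the 90% variance threshold, and the example method names are all hypothetical rather than taken from the thesis.

```python
from collections import Counter

import numpy as np

def app_features(calls, n=2):
    """Count each method name and each run of n consecutive calls (hypothetical input format)."""
    feats = Counter(calls)  # method types
    # Call sequences: consecutive n-grams joined into a single feature name.
    feats.update("->".join(calls[i:i + n]) for i in range(len(calls) - n + 1))
    return feats

def build_matrix(apps):
    """Stack per-app counters into a dense (apps x features) matrix."""
    vocab = sorted(set().union(*apps))
    index = {f: j for j, f in enumerate(vocab)}
    X = np.zeros((len(apps), len(vocab)))
    for i, counts in enumerate(apps):
        for f, c in counts.items():
            X[i, index[f]] = c
    return X, vocab

def pca_reduce(X, variance_kept=0.9):
    """Project X onto the fewest principal components whose explained variance reaches variance_kept."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = S ** 2 / np.sum(S ** 2)               # explained-variance ratio per component
    k = int(np.searchsorted(np.cumsum(ratio), variance_kept) + 1)
    return Xc @ Vt[:k].T, ratio[:k]

# Usage with made-up call lists for three apps.
apps = [
    app_features(["viewDidLoad", "initWithFrame:", "addSubview:"]),
    app_features(["viewDidLoad", "initWithFrame:", "setText:"]),
    app_features(["playSound:", "updateScore:", "playSound:"]),
]
X, vocab = build_matrix(apps)
Z, kept = pca_reduce(X, 0.9)
print(len(vocab), Z.shape)
```

The reduced matrix `Z` would then replace the raw count matrix as input to the clustering step.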

In preliminary results from analyzing hundreds of apps downloaded directly from the Apple App Store, we find that the proposed clustering works well and reveals some interesting information. Apps developed by the same company are clustered in the same group, and apps with similar behaviors, e.g., the same functions in games, painting, or socializing, are clustered together.
Description: Master's thesis
National Chengchi University
Graduate Institute of Management Information Systems
101356040
102
Source: http://thesis.lib.nccu.edu.tw/record/#G1013560401
Type: thesis
dc.identifier (Other Identifiers) G1013560401
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/69198
dc.description.tableofcontents
Abstract
Contents
1 Introduction
2 Related Works
2.1 Clustering methods
2.1.1 K-Means Algorithm
2.1.2 SOM
2.1.3 GHSOM
2.1.4 Comparison of clustering methods
2.2 Dimension reduction
2.2.1 PCA
2.2.2 Comparison with LDA method
2.3 Opcode sequence analysis
2.4 App Analysis and clustering
4 Evaluations
4.1 115 apps clustering
4.2 564 apps clustering
4.2.1 PCA reduction
4.2.2 Iterative GHSOM on 564 apps
4.3 800 apps clustering
4.3.1 PCA reduction on 800 apps
4.3.2 Iterative GHSOM on 800 apps
5 Conclusions
Reference
Appendix
1. GHSOM clustering result of 115 apps
(1) Segment of MATLAB code for transforming the original data
2. Progress of iterative GHSOM on 115 apps
3. 564 apps iterative progress
4. 800 apps progress
dc.format.extent 2019318 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US