Title 基於概念飄移探勘的社群多媒體之熱門程度預測
Popularity prediction of social multimedia based on concept drift mining
Author 鄭世宏
Jheng, Shih Hong
Contributors 沈錳坤
Shan, Man Kwan
鄭世宏
Jheng, Shih Hong
Keywords Social Multimedia
Social Media
Popularity Prediction
Concept Drift
Local Concept Drift
Classification
Date 2012
Upload time 3-Dec-2012 11:27:18 (UTC+8)
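Abstract The rise of social media in recent years has given people a simple, fast way to exchange all kinds of content. Social multimedia refers to the multimedia content that users exchange on social media platforms. Compared with plain multimedia content, social multimedia comes with a large and valuable record of sharing interactions among platform users, together with information about those users in the social network. This gives multimedia content additional dimensions of data and opens up more possible applications than plain multimedia content allows.
     A microblog is a platform on which users freely share text messages in real time: their current mood, what they see and hear, or conversations with friends. Compared with social platforms used purely for sharing multimedia content (such as YouTube or Flickr), multimedia content on microblog platforms shows a pronounced pattern of being shared and passed along. The goal of this study is to use the characteristics and data of this propagation on microblog platforms to predict the popularity of social multimedia content.
     As time passes, applying a single fixed rule to popularity prediction may cause prediction accuracy to fall. Moreover, even at the same point in time, different multimedia content follows its own popularity trend over time, so different rules may be needed to predict popularity; this is the so-called local concept drift phenomenon. We recast popularity prediction as a classification problem in data mining, take local concept drift into account, and propose a popularity prediction method for multimedia content on microblog platforms. Experimental results show that the method that accounts for local concept drift is clearly more accurate than the GCD method (by 4% on average) and the Baseline method (by 10% on average). This indicates that the proposed method is better suited to multimedia content on microblog platforms, and that concept drift and local concept drift do occur.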
In recent years, the rise of social media has offered an easy and fast way to exchange information. Social multimedia refers to the multimedia content that users share on social media. Unlike traditional multimedia, social multimedia contains both the multimedia itself and information about user behavior on social media.
     Microblogging is one type of social media. Compared to other social media such as YouTube and Flickr, microblogs provide a friendlier environment for users to propagate social multimedia. The goal of this thesis is to use the characteristics and information of propagation on microblogs to predict the popularity of social multimedia.
     A popularity prediction method based on concept drift mining is proposed. In particular, a local concept drift mechanism is employed to capture the local characteristics of social multimedia. By taking local concept drift into consideration, the task of popularity prediction is transformed into an ensemble classification problem. Experiments on social multimedia collected from Plurk show that the proposed approach performs well.
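     The abstract casts popularity prediction as ensemble classification under local concept drift. The following is a minimal sketch of that general idea, not the thesis's actual method: one base classifier is trained per time chunk, and each classifier's vote for a new item is weighted by its accuracy on the recent labelled items closest to that item. The nearest-centroid base learner, the two-dimensional features, the labels "hot" and "cold", and all function names are assumptions made for illustration only.

```python
from collections import Counter
from math import dist


class CentroidClassifier:
    """Nearest-centroid classifier trained on one time chunk (a stand-in base learner)."""

    def __init__(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            # Per-feature mean of all rows carrying this label.
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(self, x):
        return min(self.centroids, key=lambda lab: dist(x, self.centroids[lab]))


def local_weight(clf, x, X_recent, y_recent, k=3):
    """Accuracy of clf on the k recent labelled instances closest to x."""
    nearest = sorted(range(len(X_recent)), key=lambda i: dist(x, X_recent[i]))[:k]
    hits = sum(clf.predict(X_recent[i]) == y_recent[i] for i in nearest)
    return hits / len(nearest)


def predict_local(ensemble, x, X_recent, y_recent, k=3):
    """Weighted vote: each chunk classifier counts by its local accuracy near x."""
    votes = Counter()
    for clf in ensemble:
        votes[clf.predict(x)] += local_weight(clf, x, X_recent, y_recent, k)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 is a sharing score, feature 1 marks a content region.
# The older chunk learned "a high sharing score means hot"; the newer chunk learned the opposite.
old_clf = CentroidClassifier([(0.1, 0.0), (0.9, 0.0)], ["cold", "hot"])
new_clf = CentroidClassifier([(0.1, 1.0), (0.9, 1.0)], ["hot", "cold"])
ensemble = [old_clf, new_clf]

# Recent labelled items: the old rule still holds where feature 1 is low,
# while the new rule holds where feature 1 is high (the drift is local).
X_recent = [(0.2, 0.0), (0.8, 0.0), (0.2, 1.0), (0.8, 1.0)]
y_recent = ["cold", "hot", "hot", "cold"]

print(predict_local(ensemble, (0.85, 0.05), X_recent, y_recent, k=2))
print(predict_local(ensemble, (0.85, 0.95), X_recent, y_recent, k=2))
```

     In this toy setup the two queries have almost the same sharing score but fall in different regions, so a different chunk classifier dominates the vote for each. A single global weight per classifier could not separate the two cases.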
Description Master's degree
National Chengchi University
Department of Computer Science
98753010
101
Source http://thesis.lib.nccu.edu.tw/record/#G0098753010
Type thesis
dc.contributor.advisor 沈錳坤 (zh_TW)
dc.contributor.advisor Shan, Man Kwan (en_US)
dc.identifier (Other Identifiers) G0098753010 (en_US)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/56328
dc.description.tableofcontents Abstract (Chinese)
     Abstract (English)
     Acknowledgements
     Chapter 1 Introduction
     1.1 Background and Motivation
     1.2 Thesis Organization
     Chapter 2 Related Work
     2.1 Popularity Prediction on Microblog Platforms
     2.2 Popularity Prediction on Social Media Platforms
     Chapter 3 Problem Definition
     3.1 Social Multimedia on Microblog Platforms
     3.2 Popularity Indicators
     3.3 The Social Multimedia Popularity Prediction Problem
     Chapter 4 Methodology
     4.1 Features for Social Multimedia Popularity Prediction
     4.1.1 Message Diffusion Features
     4.1.1.1 Publisher Set
     4.1.1.2 Message Set
     4.1.2 Features from the Multimedia Source Platform
     4.2 Composition of an Instance
     4.3 Popularity Prediction Method
     4.3.1 Concept Drift
     4.3.2 Types of Concept Drift
     4.3.3 Concept Drift and the Multimedia Popularity Prediction Problem
     4.3.4 Local Concept Drift
     4.4 Handling Streaming Data
     Chapter 5 Experiments
     5.1 Data Sources
     5.2 Computing Multimedia Features
     5.3 Implementation of the Prediction Methods
     5.4 Experimental Design
     5.5 Evaluation Method
     5.6 Results: Predicting Whether Multimedia Content Will Be Shared Again
     5.7 Results: Predicting the Popularity of Multimedia Content
     Chapter 6 System Application
     6.1 Background and Motivation
     6.2 System Features
     6.2.1 User Aspects
     6.2.2 Technical Aspects
     6.3 System Architecture
     6.3.1 Data Collection
     6.3.2 Determining the Degree of Kuso
     6.3.3 Computing Popularity
     6.3.4 Keyword Matching
     6.4 System Screenshots and Examples
     Chapter 7 Conclusions and Future Work
     References
dc.language.iso en_US