Title: Popularity prediction of social multimedia based on concept drift mining
Original title: 基於概念飄移探勘的社群多媒體之熱門程度預測
Author: 鄭世宏 (Jheng, Shih Hong)
Advisor: 沈錳坤 (Shan, Man Kwan)
Keywords:
Social Multimedia
Social Media
Popularity Prediction
Concept Drift
Local Concept Drift
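Classification
Date: 2012
Upload time: 3-Dec-2012 11:27:18 (UTC+8)
Abstract (translated from the Chinese):

In recent years, the rise of social media platforms has given people a simple and fast way to exchange all kinds of content. Social multimedia is the multimedia content that users exchange on social media platforms. Compared with plain multimedia content, social multimedia also carries a large and valuable record of sharing and interaction among platform users, together with information about those users in the social network. This gives multimedia content more dimensions of data and opens up more possible applications than plain multimedia content allows.

A microblog is a platform on which users freely share short text messages in real time: their current mood, what they are seeing and hearing, conversations with friends, and so on. Compared with social platforms used purely for sharing multimedia content (such as YouTube or Flickr), multimedia content on microblogs shows pronounced sharing and propagation. The goal of this study is to use the propagation characteristics and data of multimedia content on microblogs to predict the popularity of social multimedia content.

As time passes, predicting popularity with a single fixed set of rules may lower prediction accuracy. Moreover, even at the same point in time, different multimedia items follow their own popularity trends over time, so different items may still call for different prediction rules; this is the phenomenon known as local concept drift. We therefore cast popularity prediction as a classification problem in data mining, take local concept drift into account, and propose a popularity prediction method for multimedia content on microblog platforms. Experimental results show that the method that accounts for local concept drift is clearly more accurate than the GCD method (an average improvement of 4%) and the baseline method (an average improvement of 10%). This indicates that the proposed method is better suited to multimedia content on microblogs, and that concept drift and local concept drift do occur.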
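The abstract does not spell out how local concept drift is handled inside the ensemble, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes time-ordered chunks of labeled instances with one ensemble member per chunk, and contrasts two hypothetical prediction rules: a global rule that weights every member by its accuracy on a recent window (in the spirit of accuracy-weighted ensembles such as [17]), and a local rule that lets the member most accurate on the query's k nearest recent instances decide (in the spirit of dynamic classifier integration such as [16]). The base learner `NearestCentroid`, the functions `predict_global` and `predict_local`, the parameter `k`, and the toy data are all hypothetical illustrations.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# (feature vector, class label); label 1 could mean "will be shared again" (hypothetical encoding)
Instance = Tuple[Tuple[float, ...], int]


class NearestCentroid:
    """Hypothetical stand-in base learner: predicts the class with the closest mean vector."""

    def __init__(self, data: Sequence[Instance]) -> None:
        sums: Dict[int, List[float]] = {}
        counts: Counter = Counter()
        for x, y in data:
            running = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                running[i] += v
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in total] for y, total in sums.items()}

    def predict(self, x: Sequence[float]) -> int:
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))


def accuracy(model: NearestCentroid, data: Sequence[Instance]) -> float:
    """Fraction of instances in data that the model labels correctly."""
    return sum(model.predict(x) == y for x, y in data) / len(data)


def train_chunk_ensemble(chunks: Sequence[Sequence[Instance]]) -> List[NearestCentroid]:
    """One member per time-ordered chunk, so each member captures the concept of its period."""
    return [NearestCentroid(chunk) for chunk in chunks]


def predict_global(members: List[NearestCentroid], recent: Sequence[Instance],
                   x: Sequence[float]) -> int:
    """Global rule: weighted vote, each member weighted by its accuracy on the whole recent window."""
    votes: Dict[int, float] = defaultdict(float)
    for m in members:
        votes[m.predict(x)] += accuracy(m, recent)
    return max(votes, key=votes.get)


def predict_local(members: List[NearestCentroid], recent: Sequence[Instance],
                  x: Sequence[float], k: int = 3) -> int:
    """Local rule: the member most accurate on x's k nearest recent instances decides."""
    neighbours = sorted(recent, key=lambda inst: math.dist(x, inst[0]))[:k]
    best = max(members, key=lambda m: accuracy(m, neighbours))
    return best.predict(x)


# Toy usage: two older chunks whose concepts differ, then a newest labeled window.
older = [((0.0, 0.0), 0), ((10.0, 0.0), 1)]  # period 1: feature 0 separates the classes
newer = [((0.0, 0.0), 0), ((0.0, 10.0), 1)]  # period 2: feature 1 separates the classes
members = train_chunk_ensemble([older, newer])
# Newest window: member 0 is right near the feature-0 axis, member 1 near the feature-1 axis.
recent = [((10.0, 0.0), 1), ((9.0, 1.0), 1), ((8.0, 1.0), 1), ((0.0, 10.0), 1),
          ((1.0, 9.0), 1), ((0.0, 0.0), 0), ((1.0, 1.0), 0)]
query = (1.0, 8.0)
print(predict_global(members, recent, query))  # 0: member 0 wins the vote (5/7 vs 4/7)
print(predict_local(members, recent, query))   # 1: member 1 is perfect around the query
```

The local rule costs a neighbour search over the recent window for every query, but it can prefer different members in different regions of the feature space, which is the situation the abstract describes as local concept drift; the global rule can only shift trust between members uniformly.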
English abstract:

In recent years, the rise of social media has offered an easy and fast way to exchange information. Social multimedia refers to the multimedia content that users share on social media. Unlike traditional multimedia, social multimedia carries both the multimedia content itself and information about user behavior on social media. A microblog is one type of social media. Compared with other social media such as YouTube and Flickr, microblogs provide a friendlier environment for users to propagate social multimedia. The goal of this thesis is to use the characteristics and information of propagation on microblogs to predict the popularity of social multimedia. A popularity prediction method based on concept drift mining is proposed. In particular, a local concept drift mechanism is employed to capture the local characteristics of social multimedia. By taking local concept drift into account, the popularity prediction task is transformed into an ensemble classification problem. Experiments on social multimedia collected from Plurk show that the proposed approach is more accurate than the GCD method (by 4% on average) and the baseline method (by 10% on average).

References:
[1] A. Bifet, J. Gama, M. Pechenizkiy and I. Žliobaitė, “Handling Concept Drift: Importance, Challenges & Solutions,” Tutorial, Proc. of the 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2011.
[2] L. Breiman, “Random Forests,” Machine Learning, Vol. 45, Issue 1, Pages 5-32, 2001.
[3] M. Cha, H. Kwak, P. Rodriguez, Y. Y. Ahn and S. Moon, “Analyzing the Video Popularity Characteristics of Large-Scale User Generated Content Systems,” IEEE/ACM Transactions on Networking, Vol. 17, Issue 5, Pages 1357-1370, 2009.
[4] F. Figueiredo, F. Benevenuto and J. M. Almeida, “The Tube over Time: Characterizing Popularity Growth of YouTube Videos,” Proc. of the 4th ACM International Conference on Web Search and Data Mining, 2011.
[5] L. Hong, O. Dan and B. D. Davison, “Predicting Popular Messages in Twitter,” Proc. of the 20th International Conference Companion on World Wide Web, 2011.
[6] M. Harries and K. Horn, “Detecting Concept Drift in Financial Time Series Prediction using Symbolic Machine Learning,” Proc. of the 8th Australian Joint Conference on Artificial Intelligence, World Scientific, 1995.
[7] X. Jin, A. Gallagher, L. Cao, J. Luo and J. Han, “The Wisdom of Social Multimedia: Using Flickr for Prediction and Forecast,” Proc. of the 18th International Conference on Multimedia, 2010.
[8] L. I. Kuncheva, “Classifier Ensembles for Changing Environments,” Proc. of the 5th International Workshop on Multiple Classifier Systems, 2004.
[9] K. Lerman and T. Hogg, “Using a Model of Social Dynamics to Predict Popularity of News,” Proc. of the 19th International Conference on World Wide Web, 2010.
[10] M. Naaman, H. Becker and L. Gravano, “Hip and Trendy: Characterizing Emerging Trends on Twitter,” Journal of the American Society for Information Science and Technology, Vol. 62, Issue 5, Pages 902-918, 2011.
[11] D. R. Wilson and T. R. Martinez, “Improved Heterogeneous Distance Functions,” Journal of Artificial Intelligence Research, Vol. 6, Issue 1, Pages 1-34, 1997.
[12] G. Szabo and B. A. Huberman, “Predicting the Popularity of Online Content,” Communications of the ACM, Vol. 53, Issue 8, Pages 80-88, 2010.
[13] W. N. Street and Y. Kim, “A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification,” Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.
[14] J. C. Schlimmer and R. H. Granger, “Incremental Learning from Noisy Data,” Machine Learning, Vol. 1, Issue 3, Pages 317-354, 1986.
[15] C. T. Ho, Modeling and Visualizing Information Propagation in Micro-Blogging Platform, Master's Thesis, Graduate Institute of Networking and Multimedia, National Taiwan University, 2010.
[16] A. Tsymbal, M. Pechenizkiy, P. Cunningham and S. Puuronen, “Dynamic Integration of Classifiers for Handling Concept Drift,” Information Fusion, Vol. 9, Issue 1, Pages 56-68, 2008.
[17] H. Wang, W. Fan, P. S. Yu and J. Han, “Mining Concept-Drifting Data Streams Using Ensemble Classifiers,” Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.
[18] I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2000.
[19] J. Yang and S. Counts, “Predicting the Speed, Scale, and Range of Information Diffusion in Twitter,” Proc. of the 4th International AAAI Conference on Weblogs and Social Media, 2010.
[20] J. Z. Kolter and M. A. Maloof, “Dynamic Weighted Majority: A New Ensemble Method for Tracking Concept Drift,” Proc. of the 3rd IEEE International Conference on Data Mining, 2003.
[21] I. Žliobaitė, Learning under Concept Drift: an Overview, Technical Report, 2009.
[22] Social Media (社群媒體), Wikipedia, http://en.wikipedia.org/wiki/Social_media

Description: Master's thesis
National Chengchi University
Department of Computer Science
98753010
101 (ROC academic year)
Source: http://thesis.lib.nccu.edu.tw/record/#G0098753010
Type: thesis
Identifier: G0098753010
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/56328
Language (dc.language.iso): en_US

Table of contents:
Abstract (Chinese)
English Abstract
Acknowledgements
Chapter 1 Introduction
  1.1 Background and Motivation
  1.2 Thesis Organization
Chapter 2 Related Work
  2.1 Popularity Prediction on Microblog Platforms
  2.2 Popularity Prediction on Social Platforms
Chapter 3 Problem Definition
  3.1 Social Multimedia on Microblog Platforms
  3.2 Popularity Measure
  3.3 The Social Multimedia Popularity Prediction Problem
Chapter 4 Research Method and Procedure
  4.1 Features for Social Multimedia Popularity Prediction
    4.1.1 Information Diffusion Features
      4.1.1.1 Features of the Set of Posters
      4.1.1.2 Features of the Set of Messages
    4.1.2 Features of the Multimedia's Source Social Platform
  4.2 Composition of an Instance
  4.3 Popularity Prediction Method
    4.3.1 Concept Drift
    4.3.2 Types of Concept Drift
    4.3.3 Concept Drift and the Multimedia Popularity Prediction Problem
    4.3.4 Local Concept Drift
  4.4 Handling Streaming Data
Chapter 5 Experiments
  5.1 Sources of Experimental Data
  5.2 Computing Multimedia Features
  5.3 Implementing the Popularity Prediction Methods
  5.4 Experimental Design
  5.5 Evaluation Methods
  5.6 Results: Predicting Whether Multimedia Content Will Be Shared Again
  5.7 Results: Predicting the Popularity Level of Multimedia Content
Chapter 6 System Implementation of an Application
  6.1 Background and Motivation
  6.2 System Features
    6.2.1 User Perspective
    6.2.2 Technical Perspective
  6.3 System Architecture
    6.3.1 Data Collection
    6.3.2 Determining the Degree of Kuso
    6.3.3 Computing Popularity
    6.3.4 Keyword Matching
  6.4 Screenshots and Examples of the System in Operation
Chapter 7 Conclusions and Future Work
References