Title: 結合中文斷詞系統與雙分群演算法於音樂相關臉書粉絲團之分析:以KKBOX為例
Combing Chinese text segmentation system and co-clustering algorithm for analysis of music related Facebook fan page: A case of KKBOX
Author: 陳柏羽
Chen, Po Yu
Contributors: 徐國偉
Hsu, Kuo Wei
Chen, Po Yu
Keywords: 雙分群 (co-clustering)
Chinese text segmentation system
Facebook fan page
Date: 2017
Upload time: 10-Aug-2017 09:58:23 (UTC+8)
Abstract: In recent years, the spread of smartphones and Internet access has driven rapid growth in social networking sites and online music streaming. As of last year, Facebook averaged 1.86 billion monthly users, and fan pages have become a marketing channel that companies watch closely. A post on a fan page can reach users' pages within a short time as it is viewed and shared, which can be more effective than television advertising and costs far less. This study presents a process for clustering the posts on a Facebook fan page. Because the wording of posts is complex, we crawled not only the posts on the fan page but also related information from the KKBOX website, and used the KKBOX data to expand the corpus of the Chinese word segmentation system Jieba and improve segmentation accuracy. We then clustered the posts with the Minimum Squared Residue Co-Clustering Algorithm and evaluated the clusterings using the Discrimination Rate and the Agglomerate Rate together with distribution plots produced by Principal Component Analysis. The better clusterings were selected for further analysis to identify the basis on which the posts were grouped. The results show that the method effectively separates different types of posts and can even group posts by differences in wording, syntax, or layout.
Description: Master's thesis
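National Chengchi University
Department of Computer Science
102753012
Chapter 1 Introduction
  1.1 Research Background
    1.1.1 History of KKBOX
    1.1.2 Facebook Fan Pages
  1.2 Research Motivation
  1.3 Research Objectives
  1.4 Research Method
  1.5 Thesis Organization
Chapter 2 Literature Review
  2.1 Social Media
  2.2 Document Clustering
  2.3 Summary
Chapter 3 Data Processing
  3.1 Data Crawling
    3.1.1 Facebook Fan Page
    3.1.2 KKBOX Charts
  3.2 Data Clean
  3.3 Data Merge
Chapter 4 Statistical Analysis
  4.1 Bokeh
    4.1.1 Pandas
    4.1.2 Bokeh Chart and Models
  4.2 Statistical Analysis
Chapter 5 Word Segmentation and Co-Clustering Algorithms
  5.1 Word Segmentation
    5.1.1 CKIP
    5.1.2 Jieba
    5.1.3 Comparison of CKIP and Jieba
  5.2 Co-Clustering
    5.2.1 Information Theoretic Co-Clustering Algorithm
    5.2.2 Minimum Squared Residue Co-Clustering Algorithm
Chapter 6 Experimental Results and Discussion
  6.1 Experimental Environment and Procedure
    6.1.1 Experimental Environment
    6.1.2 Experimental Procedure
  6.2 Experimental Design
    6.2.1 Compressed Column Storage
    6.2.2 Principal Component Analysis
    6.2.3 Agglomerate Rate and Discrimination Rate
  6.3 Experiments
    6.3.1 Clustering Algorithm Experiments
    6.3.2 Row Clustering Experiments
    6.3.3 Column Clustering Experiments
    6.3.4 Comparison with Other Methods
  6.4 Experimental Results
Chapter 7 Conclusions and Possible Future Research Directions
  7.1 Conclusions
  7.2 Possible Future Research Directions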
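
The abstract describes clustering posts with a Minimum Squared Residue Co-Clustering Algorithm. As a rough illustration only, the sketch below shows one common formulation of that idea: rows and columns of a matrix are assigned to clusters in alternation so that the squared difference between each entry and the mean of its co-cluster shrinks. It is not the thesis's implementation. The function names (`cocluster_means`, `msr_cocluster`), the random initialization, the choice of residue (entry minus co-cluster mean), and the toy matrix standing in for a post-by-term matrix are all assumptions made for this example.

```python
import numpy as np

def cocluster_means(A, rows, cols, k, l):
    """Mean of each (row cluster, column cluster) block; empty blocks get 0."""
    R = np.eye(k)[rows]                      # (m, k) one-hot row membership
    C = np.eye(l)[cols]                      # (n, l) one-hot column membership
    sums = R.T @ A @ C                       # (k, l) block sums
    counts = np.outer(R.sum(axis=0), C.sum(axis=0))
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

def squared_residue(A, rows, cols, M):
    """Sum of squared differences between entries and their block means."""
    return float(((A - M[rows][:, cols]) ** 2).sum())

def msr_cocluster(A, k, l, n_iter=50, tol=1e-9, seed=0):
    """Alternate column and row reassignment (hypothetical helper).

    Returns row labels, column labels, and the residue after each step.
    """
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = A.shape
    rows = rng.integers(k, size=m)           # assumed: random initial labels
    cols = rng.integers(l, size=n)
    M = cocluster_means(A, rows, cols, k, l)
    history = [squared_residue(A, rows, cols, M)]
    for _ in range(n_iter):
        # Column step: cost[j, c] = sum_i (A[i, j] - M[rows[i], c])^2
        col_cost = ((A[:, :, None] - M[rows][:, None, :]) ** 2).sum(axis=0)
        cols = col_cost.argmin(axis=1)
        M = cocluster_means(A, rows, cols, k, l)
        # Row step: cost[i, r] = sum_j (A[i, j] - M[r, cols[j]])^2
        row_cost = ((A[:, None, :] - M[:, cols][None, :, :]) ** 2).sum(axis=2)
        rows = row_cost.argmin(axis=1)
        M = cocluster_means(A, rows, cols, k, l)
        history.append(squared_residue(A, rows, cols, M))
        if history[-2] - history[-1] < tol:  # stop once the residue stalls
            break
    return rows, cols, history

# Toy stand-in for a post-by-term count matrix (made-up numbers).
toy = np.array([[5, 5, 0, 0],
                [5, 5, 0, 0],
                [0, 0, 3, 3],
                [0, 0, 3, 3],
                [0, 0, 3, 3]])
row_labels, col_labels, residues = msr_cocluster(toy, k=2, l=2)
print(row_labels, col_labels, residues)
```

Because each step keeps the block means fixed while reassigning labels and then recomputes the means, the recorded residue never increases. The result still depends on the random start, so in practice several seeds would be tried and the lowest residue kept.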