A user-based content clustering system using data mining techniques on a recipe sharing website | Publication

Publications-Theses


Title	應用資料探勘技術於食譜分享社群網站進行內容分群之研究 (A user-based content clustering system using data mining techniques on a recipe sharing website)
Author	林宜儒
Contributors	楊建民; 林宜儒
Keywords	text mining; data clustering
Date	2012
Upload time	2-Jan-2013 13:21:51 (UTC+8)
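Abstract	This study takes a recipe-sharing social website as its subject. It builds an automatic clustering mechanism for the site's recipes using a kNN (k-nearest-neighbor) clustering algorithm, and uses the behavior of the site's users as a reference for describing the characteristics of the resulting clusters. The information system for automatically clustering recipes is built in three stages. The first stage is data processing: although the recipe data obtained from the site already has a relatively structured format that could be clustered directly, the user-entered content still contains typos, redundant words, and material only loosely related to the recipe itself, so it must be cleaned first. The second stage is data clustering: text mining is used to extract content features, and data mining techniques are then used to cluster the recipes; clustering quality is judged mainly by within-cluster features and between-cluster similarity. The third stage is cluster feature analysis: using the way users collect recipes and sort them into categories, statistical methods identify likely category names for each cluster. In an experiment on 500 recipes, the best clustering run produced 10 recipe clusters with an average intra-cluster similarity of 0.4482. Each cluster shows clearly similar characteristics, and users' collection behavior can be used to label the clusters with categories such as soups, desserts, breads, and Chinese dishes. Because the site marks up the content fields of every recipe according to the recipe format standard provided by schema.org, the clustering mechanism implemented in this study could also be applied in the future to other sites of the same type that adopt the schema.org standard.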
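The evaluation and labeling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the TF-IDF weighting variant, the cosine similarity measure, the majority-vote labeling rule, and all function names and sample data below are assumptions made for illustration. Recipes are assumed to be already segmented into tokens, and the cluster assignment is taken as given rather than computed, since the record does not specify the details of the kNN procedure.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (term -> weight)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_intra_cluster_similarity(vectors, clusters):
    """Average, over clusters, of the mean pairwise cosine similarity."""
    per_cluster = []
    for members in clusters.values():
        pairs = list(combinations(members, 2))
        if pairs:  # singleton clusters have no pairs and are skipped
            total = sum(cosine(vectors[i], vectors[j]) for i, j in pairs)
            per_cluster.append(total / len(pairs))
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0


def label_cluster(members, user_folders):
    """Most frequent user-assigned category name among a cluster's recipes."""
    votes = Counter(name for i in members for name in user_folders.get(i, []))
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data: four tokenized recipes, a given cluster assignment,
# and the category names users filed each recipe under.
recipes = [
    ["chicken", "soup", "ginger", "water"],
    ["pork", "soup", "radish", "water"],
    ["flour", "sugar", "butter", "egg"],
    ["flour", "sugar", "cocoa", "egg"],
]
clusters = {0: [0, 1], 1: [2, 3]}
user_folders = {
    0: ["soup"],
    1: ["soup", "dinner"],
    2: ["dessert"],
    3: ["dessert", "baking"],
}

vectors = tfidf_vectors(recipes)
score = mean_intra_cluster_similarity(vectors, clusters)
labels = {cid: label_cluster(m, user_folders) for cid, m in clusters.items()}
print(score, labels)
```

In this sketch a higher score means the recipes within each cluster share more distinctive terms, which is one plausible reading of the "average intra-cluster similarity" figure reported in the abstract.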
Description	Master's; National Chengchi University; Graduate Institute of Management Information Systems; 97356002; 101
Source	http://thesis.lib.nccu.edu.tw/record/#G0097356002
Type	thesis

dc.contributor.advisor	楊建民	zh_TW
dc.contributor.author (Authors)	林宜儒	zh_TW
dc.creator (Author)	林宜儒	zh_TW
dc.date (Date)	2012	en_US
dc.date.accessioned	2-Jan-2013 13:21:51 (UTC+8)	-
dc.date.available	2-Jan-2013 13:21:51 (UTC+8)	-
dc.date.issued (Upload time)	2-Jan-2013 13:21:51 (UTC+8)	-
dc.identifier (Other Identifiers)	G0097356002	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/56502	-
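dc.description (Description)	Master's	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Graduate Institute of Management Information Systems	zh_TW
dc.description (Description)	97356002	zh_TW
dc.description (Description)	101	zh_TW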
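dc.description.tableofcontents	Abstract; Chapter 1 Introduction; Section 1 Research Background; Section 2 Research Motivation; Section 3 Research Objectives; Chapter 2 Literature Review; Section 1 Social Networks; Section 2 Virtual Communities; Section 3 Data Clustering Techniques; 2.3.1. Text Mining Techniques and Chinese Word Segmentation; 2.3.2. Vector Space Model; 2.3.3. Term Weighting within a Corpus; Section 4 Summary; Chapter 3 Research Design; Section 1 Research Framework; Section 2 Data Sources and Data Collection; Section 3 Data Clustering; 3.3.1. Chinese Word Segmentation; 3.3.2. Term Weighting and Feature Term Extraction; 3.3.3. Data Clustering Using the kNN Nearest-Neighbor Method; Section 4 Summary; Chapter 4 Research Results; Section 1 Clustering Results; Section 2 Content Analysis of Each Cluster; Chapter 5 Conclusion; Section 1 Conclusions; Section 2 Future Research Directions; Chapter 6 References; Appendix: Raw Data of the Recipes in Each Cluster	zh_TW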
dc.language.iso	en_US	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0097356002	en_US
dc.subject (Keywords)	文字探勘 (text mining)	zh_TW
dc.subject (Keywords)	資料分群 (data clustering)	zh_TW
dc.subject (Keywords)	text mining	en_US
dc.subject (Keywords)	data clustering	en_US
dc.title (Title)	應用資料探勘技術於食譜分享社群網站進行內容分群之研究	zh_TW
dc.title (Title)	A user-based content clustering system using data mining techniques on a recipe sharing website	en_US
dc.type (Type)	thesis	en
