Title: 由食譜資料探勘料理特徵樣式
Mining Cuisine Patterns from Recipe Dataset
Author: 呂耀茹
Contributors: 沈錳坤, 呂耀茹
Keywords: big data (巨量資料); data mining (資料探勘); recipes (食譜); cuisine (料理)
Date: 2015
Uploaded: 4 January 2016 16:58:11 (UTC+8)
Abstract: In recent years, more and more people have begun cooking for themselves for health reasons, which has driven the growth of recipe community websites. Although data mining has become popular as big data has drawn attention, few studies have mined and analyzed large-scale recipe data. This study collects recipe data from the well-known recipe websites Allrecipes.com, Food.com, and Yummly.com and mines the ingredient patterns and characteristics of the world's major cuisines: cuisine flavors, commonly used ingredients, distinctive ingredients, core ingredients, ingredient pairing relationships, similarity and clustering among cuisines, and automatic cuisine classification. For data preprocessing, the thesis proposes a method that resolves ingredient synonyms by combining an ingredient thesaurus with a connected-component labeling algorithm. To mine ingredient patterns and characteristics, the study applies network analysis, association rules, the Phi coefficient, and PMI to identify each cuisine's distinctive ingredients, core ingredients, and ingredient pairing patterns. In addition, the thesis groups cuisines by ingredient similarity using hierarchical clustering, in contrast to the usual grouping by geographic location. It also proposes a hierarchical classification technique that automatically determines a recipe's cuisine from its ingredients. Mining the large volume of user-generated data on recipe websites reveals the style and characteristics of each cuisine, and the results can be applied to data management and search on recipe websites.
Description: Master's
National Chengchi University (國立政治大學)
In-service Master's Program, Department of Computer Science (資訊科學系碩士在職專班)
102971008
Advisor: 沈錳坤
Source: http://thesis.lib.nccu.edu.tw/record/#G0102971008
Type: thesis
Identifier: G0102971008
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/80326
Format: application/pdf, 5382245 bytes
Table of contents:
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives and Methods
1.3 Contributions
1.4 Thesis Organization
Chapter 2 Related Work
2.1 Academic Research on Recipes
Chapter 3 Methodology
3.1 Data Sources
3.1.1 AllRecipes.com
3.1.2 Food.com
3.1.3 Yummly.com
3.1.4 Cook's Thesaurus
3.2 Data Preprocessing and Synonym Handling
3.3 Commonly Used and Distinctive Ingredients
3.4 Ingredient Pairing
3.5 Core Ingredients
3.6 Automatic Cuisine Classification
3.7 Cuisine Similarity and Clustering
3.8 Hierarchical Classification of Cuisines
Chapter 4 Experiments
4.1 Flavor Comparison across Cuisines
4.2 Commonly Used Ingredients by Cuisine
4.3 Distinctive Ingredients by Cuisine
4.4 Ingredient Pairing Relationships by Cuisine
4.4.1 Most Frequent Ingredient Pairs
4.4.2 Most Frequent Ingredient Pairs by Cuisine
4.4.3 Ingredient Pairs with a Negative Phi Coefficient
4.5 Core Ingredients Paired with the Most Ingredients
4.6 Automatic Cuisine Classification
4.7 Cuisine Similarity and Clustering
4.8 Hierarchical Automatic Cuisine Classification
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
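The abstract names the Phi coefficient and PMI among the measures used to study ingredient pairing. This record does not give the thesis's exact formulation, so the following is only a minimal sketch using the textbook definitions, assuming PMI stands for pointwise mutual information and that each recipe is treated as a set of ingredients. The function name `pair_scores` and the sample recipes are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def pair_scores(recipes):
    """Return {(a, b): (phi, pmi)} for every ingredient pair that co-occurs.

    `recipes` is a list of ingredient lists. Pair keys are alphabetically
    ordered tuples. This is an illustrative sketch, not the thesis's code.
    """
    n = len(recipes)
    single = Counter()  # number of recipes containing each ingredient
    pair = Counter()    # number of recipes containing both ingredients
    for ingredients in recipes:
        items = sorted(set(ingredients))  # ignore repeats within a recipe
        single.update(items)
        pair.update(combinations(items, 2))

    scores = {}
    for (a, b), n_ab in pair.items():
        n_a, n_b = single[a], single[b]
        # PMI = log2( P(a, b) / (P(a) * P(b)) )
        pmi = math.log2(n_ab * n / (n_a * n_b))
        # Phi coefficient of the 2x2 presence/absence table for a and b
        denom = math.sqrt(n_a * (n - n_a) * n_b * (n - n_b))
        phi = (n * n_ab - n_a * n_b) / denom if denom else 0.0
        scores[(a, b)] = (phi, pmi)
    return scores


if __name__ == "__main__":
    sample = [
        ["soy sauce", "ginger", "rice"],
        ["soy sauce", "ginger"],
        ["tomato", "basil"],
        ["tomato", "basil", "rice"],
    ]
    for pair_key, (phi, pmi) in sorted(pair_scores(sample).items()):
        print(pair_key, round(phi, 3), round(pmi, 3))
```

Pairs that never appear together are skipped in this sketch, because their PMI is undefined (the logarithm of zero). A pair that co-occurs less often than chance would predict still receives a negative Phi value, which is the kind of pair listed under section 4.4.3 of the table of contents.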