Mining Cuisine Patterns from Recipe Dataset

Title	Mining Cuisine Patterns from Recipe Dataset (由食譜資料探勘料理特徵樣式)
Author	呂耀茹
Contributors	沈錳坤; 呂耀茹
Keywords	Big data; data mining; recipes; cuisine
Date	2015
Upload time	4-Jan-2016 16:58:11 (UTC+8)
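Abstract	In recent years, more and more people have begun cooking for themselves for health reasons, which has driven the growth of recipe community websites. Although data mining has become very popular as Big Data has drawn attention, few studies have mined and analyzed large-scale recipe data. This study crawls recipe data from the well-known foreign recipe websites Allrecipes.com, Food.com, and Yummly.com and mines the ingredient patterns and characteristics of the world's major cuisines, including cuisine flavors, commonly used ingredients, characteristic ingredients, core ingredients, ingredient pairing relationships, similarity and clustering among cuisines, and automatic cuisine classification. For data preprocessing, this thesis proposes a method for resolving ingredient synonyms that combines an ingredient thesaurus with the connected-component labeling algorithm. To mine the ingredient patterns and characteristics of cuisines, this study applies network analysis, association rules, the Phi coefficient, and PMI to identify each cuisine's characteristic ingredients, core ingredients, and ingredient pairing patterns. In addition, this thesis groups cuisines by ingredient similarity using hierarchical clustering, in contrast to the usual grouping by geographic location. It also proposes a hierarchical classification technique that automatically determines a recipe's cuisine type from its ingredients. By mining the large volume of user-generated data on recipe websites, the styles and characteristics of the world's cuisines can be understood and then applied to data management and search on recipe websites.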
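The abstract names the Phi coefficient and PMI as measures of ingredient pairing but gives no formulas. The sketch below is one plausible way to compute both from co-occurrence counts; it is not taken from the thesis. It assumes PMI means pointwise mutual information, that each recipe is a plain list of ingredient names, and that the function name `pair_scores` and the toy recipes are purely illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def pair_scores(recipes):
    """Return {(a, b): (phi, pmi)} for every ingredient pair that co-occurs.

    Hypothetical helper: `recipes` is a list of ingredient-name lists.
    Pairs that never co-occur are skipped because their PMI is undefined.
    """
    n = len(recipes)
    single = Counter()  # number of recipes containing each ingredient
    pair = Counter()    # number of recipes containing both ingredients
    for ingredients in recipes:
        items = sorted(set(ingredients))  # deduplicate, fix pair order
        single.update(items)
        pair.update(combinations(items, 2))

    scores = {}
    for (a, b), n_ab in pair.items():
        n_a, n_b = single[a], single[b]
        # Phi coefficient of the 2x2 presence/absence table:
        # (n * n_ab - n_a * n_b) / sqrt(n_a * n_b * (n - n_a) * (n - n_b))
        denom = math.sqrt(n_a * n_b * (n - n_a) * (n - n_b))
        phi = (n * n_ab - n_a * n_b) / denom if denom else 0.0
        # Pointwise mutual information: log2( P(a, b) / (P(a) * P(b)) )
        pmi = math.log2(n * n_ab / (n_a * n_b))
        scores[(a, b)] = (phi, pmi)
    return scores


if __name__ == "__main__":
    toy = [
        ["soy sauce", "ginger", "rice"],
        ["soy sauce", "ginger"],
        ["tomato", "basil"],
        ["tomato", "basil", "rice"],
    ]
    for pair_key, (phi, pmi) in sorted(pair_scores(toy).items()):
        print(pair_key, round(phi, 3), round(pmi, 3))
```

A positive Phi or PMI suggests two ingredients appear together more often than independence would predict, and a negative value suggests they tend to avoid each other. Both scores are unreliable for rare ingredients, so a minimum-support threshold would likely be needed on real data.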
References	[1] Rakesh Agrawal and Ramakrishnan Srikant, Fast Algorithms for Mining Association Rules, International Conference on Very Large Data Bases, VLDB, 1994.
[2] Yong-Yeol Ahn, Sebastian E. Ahnert, James P. Bagrow, and Albert-László Barabási, Flavor Network and the Principles of Food Pairing, Scientific Reports, Vol. 1, 2011.
[3] Florian Beil, Martin Ester, and Xiaowei Xu, Frequent Term-based Text Clustering, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
[4] Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing with Python, O'Reilly Media, Inc., 2009.
[5] Stephen P. Borgatti, Centrality and Network Flow, Social Networks, Vol. 27, No. 1, 2005.
[6] Corrado Boscarino, N. J. Koenderink, V. Nedović, and J. L. Top, Automatic Extraction of Ingredient's Substitutes, ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, ACM, 2014.
[7] L. Breiman, Random Forests, Machine Learning, Vol. 45, 2001.
[8] Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson, Introduction to Algorithms (2nd Edition), McGraw-Hill, 2001.
[9] Karam Gouda and Mohammed Zaki, Efficiently Mining Maximal Frequent Itemsets, IEEE International Conference on Data Mining, 2001.
[10] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
[11] Anna Huang, Similarity Measures for Text Document Clustering, Sixth New Zealand Computer Science Research Student Conference, Christchurch, New Zealand, 2008.
[12] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela H. Byers, Big Data: The Next Frontier for Innovation, Competition, and Productivity, McKinsey & Company, 2011.
[13] Rada Mihalcea, Courtney Corley, and Carlo Strapparava, Corpus-based and Knowledge-based Measures of Text Semantic Similarity, AAAI, 2006.
[14] Trung Duc Nguyen, Diep Thi-Ngoc Nguyen, and Yasushi Kiyoki, A Regional Food's Features Extraction Algorithm and Its Application, International Workshop on Multimedia for Cooking & Eating Activities, 2013.
[15] Tore Opsahl, Filip Agneessens, and John Skvoretz, Node Centrality in Weighted Networks: Generalizing Degree and Shortest Paths, Social Networks, Vol. 32, 2010.
[16] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
[17] Carlos N. Silla Jr. and Alex A. Freitas, A Survey of Hierarchical Classification across Different Application Domains, Data Mining and Knowledge Discovery, Vol. 22, 2011.
[18] Han Su, Ting-Wei Lin, Cheng-Te Li, Man-Kwan Shan, and Janet Chang, Automatic Recipe Cuisine Classification by Ingredients, ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, 2014.
[19] Aixin Sun, Ee-Peng Lim, and Wee-Keong Ng, Performance Measurement Framework for Hierarchical Text Classification, Journal of the American Society for Information Science and Technology, Vol. 54, 2003.
[20] Chun-Yuen Teng, Yu-Ru Lin, and Lada A. Adamic, Recipe Recommendation Using Ingredient Networks, ACM Web Science Conference, 2012.
[21] Kristin M. Tolle, D. Stewart W. Tansley, and Anthony J. Hey, The Fourth Paradigm: Data-Intensive Scientific Discovery [Point of View], Proceedings of the IEEE, Vol. 99, 2011.
[22] Lav R. Varshney, Florian Pinel, Kush R. Varshney, Debarun Bhattacharjya, Angela Schörgendorfer, and Yi-Min Chee, A Big Data Approach to Computational Creativity, arXiv preprint arXiv:1311.1213, 2013.
[23] Kush R. Varshney, Lav R. Varshney, Jun Wang, and Daniel Myers, Flavor Pairing in Medieval European Cuisine: A Study in Cooking with Dirty Data, International Joint Conference on Artificial Intelligence Workshops, 2013.
[24] Liping Wang, Qing Li, Na Li, Guozhu Dong, and Yu Yang, Substructure Similarity Measurement in Chinese Recipes, International Conference on World Wide Web, 2008.
[25] Yan Xu, Gareth Jones, JinTao Li, Bin Wang, and ChunMing Sun, A Study on Mutual Information-Based Feature Selection for Text Categorization, Journal of Computational Information Systems, Vol. 3, 2007.
[26] Gephi: https://gephi.org
[27] Libsvm: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
[28] Phi wiki introduction, retrieved June 20, 2015 from the World Wide Web: https://en.wikipedia.org/wiki/Phi.
[29] Stanford Parser: http://nlp.stanford.edu/software/lex-parser
[30] SVM wiki introduction, retrieved June 18, 2015 from the World Wide Web: https://en.wikipedia.org/wiki/Support_vector_machine
[31] Weka: http://www.cs.waikato.ac.nz/ml/weka/
Description	Master's thesis; National Chengchi University; In-service Master's Program, Department of Computer Science; 102971008
Source	http://thesis.lib.nccu.edu.tw/record/#G0102971008
Type	thesis

dc.contributor.advisor	沈錳坤	zh_TW
dc.contributor.author (Authors)	呂耀茹	zh_TW
dc.creator (Author)	呂耀茹	zh_TW
dc.date (Date)	2015	en_US
dc.date.accessioned	4-Jan-2016 16:58:11 (UTC+8)	-
dc.date.available	4-Jan-2016 16:58:11 (UTC+8)	-
dc.date.issued (Upload time)	4-Jan-2016 16:58:11 (UTC+8)	-
dc.identifier (Other Identifiers)	G0102971008	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/80326	-
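dc.description (Description)	Master's thesis	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	In-service Master's Program, Department of Computer Science	zh_TW
dc.description (Description)	102971008	zh_TW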
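dc.description.tableofcontents	Chapter 1 Introduction: 1.1 Research Background and Motivation; 1.2 Research Objectives and Methods; 1.3 Contributions; 1.4 Thesis Organization. Chapter 2 Related Work: 2.1 Academic Research on Recipes. Chapter 3 Methodology: 3.1 Data Sources; 3.1.1 AllRecipes.com; 3.1.2 Food.com; 3.1.3 Yummly.com; 3.1.4 Cook's Thesaurus; 3.2 Data Preprocessing and Synonym Handling; 3.3 Common Ingredients and Characteristic Ingredients; 3.4 Ingredient Pairing; 3.5 Core Ingredients; 3.6 Automatic Cuisine Classification; 3.7 Cuisine Similarity and Clustering; 3.8 Hierarchical Classification of Cuisines. Chapter 4 Experiments: 4.1 Flavor Comparison across Cuisines; 4.2 Common Ingredients by Cuisine; 4.3 Characteristic Ingredients by Cuisine; 4.4 Ingredient Pairing Relationships by Cuisine; 4.4.1 Most Frequent Ingredient Pairs; 4.4.2 Most Frequent Ingredient Pairs by Cuisine; 4.4.3 Ingredient Pairs with a Negative Phi Correlation Coefficient; 4.5 Core Ingredients Paired with the Most Ingredients; 4.6 Automatic Cuisine Classification; 4.7 Cuisine Similarity and Clustering; 4.8 Hierarchical Automatic Cuisine Classification. Chapter 5 Conclusions and Future Work: 5.1 Conclusions; 5.2 Future Work. References.	zh_TW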
dc.format.extent	5382245 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0102971008	en_US
dc.subject (Keywords)	Big data	zh_TW
dc.subject (Keywords)	Data mining	zh_TW
dc.subject (Keywords)	Recipes	zh_TW
dc.subject (Keywords)	Cuisine	zh_TW
dc.title (Title)	由食譜資料探勘料理特徵樣式	zh_TW
dc.title (Title)	Mining Cuisine Patterns from Recipe Dataset	en_US
dc.type (Type)	thesis	en_US
