Title: Mining Cuisine Patterns from Recipe Dataset (Chinese title: 由食譜資料探勘料理特徵樣式)
Author: 呂耀茹
Contributors: 沈錳坤; 呂耀茹
Keywords: Big Data; Data Mining; Recipes; Cuisine
Date: 2015
Upload time: 4-Jan-2016 16:58:11 (UTC+8)
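Abstract: In recent years, more and more people have started cooking at home for health reasons, which has in turn driven the growth of recipe-sharing community websites. Although data mining has become very popular as Big Data has drawn attention, few studies have applied large-scale data mining and analysis to recipes.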
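This study collects recipe data from the well-known international recipe websites Allrecipes.com, Food.com, and Yummly.com, and mines the ingredient patterns and characteristics of the world's major cuisines: flavor profiles, commonly used ingredients, characteristic ingredients, core ingredients, ingredient pairing relationships, similarity and clustering between cuisines, and automatic cuisine classification.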
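The abstract does not say how "commonly used ingredients" or inter-cuisine similarity are scored. A minimal sketch, assuming an ingredient's prevalence in a cuisine is the fraction of that cuisine's recipes that contain it, and that two cuisines are compared by the cosine similarity of their prevalence vectors; the toy recipes and all function names are hypothetical, not taken from the thesis.

```python
import math
from collections import Counter

# Hypothetical toy data: (cuisine, set of already-normalized ingredient names)
RECIPES = [
    ("italian", {"olive oil", "garlic", "tomato", "basil"}),
    ("italian", {"olive oil", "garlic", "parmesan"}),
    ("mexican", {"tomato", "onion", "cilantro", "jalapeno"}),
    ("mexican", {"onion", "cilantro", "lime"}),
]


def prevalence(recipes, cuisine):
    """Fraction of the cuisine's recipes that contain each ingredient (assumed scoring)."""
    subset = [ings for c, ings in recipes if c == cuisine]
    counts = Counter(ing for ings in subset for ing in ings)
    return {ing: n / len(subset) for ing, n in counts.items()}


def top_ingredients(recipes, cuisine, k=3):
    """The k most prevalent ingredients; ties are broken alphabetically for determinism."""
    prev = prevalence(recipes, cuisine)
    return sorted(prev, key=lambda ing: (-prev[ing], ing))[:k]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    it, mx = prevalence(RECIPES, "italian"), prevalence(RECIPES, "mexican")
    print(top_ingredients(RECIPES, "italian"))  # ['garlic', 'olive oil', 'basil']
    print(cosine(it, mx))
```

A pairwise similarity matrix built this way is the kind of input a hierarchical clustering step could consume; the choice of cosine over other measures is an assumption here.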
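For data preprocessing, this thesis proposes a method for resolving ingredient synonyms that combines an ingredient thesaurus with the connected-component labeling algorithm. To mine each cuisine's ingredient patterns and characteristics, the study applies network analysis, association rules, the Phi coefficient, and PMI (pointwise mutual information) to identify characteristic ingredients, core ingredients, and ingredient pairing patterns. In addition, rather than grouping cuisines by geographic location as is usually done, this thesis clusters them with hierarchical clustering based on the similarity of their ingredients. Finally, it proposes a hierarchical classification approach that automatically determines a recipe's cuisine from its ingredients.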
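Synonym resolution with connected-component labeling can be sketched as follows, assuming the ingredient thesaurus is available as a list of synonym pairs: each pair becomes an undirected edge, each connected component is treated as one ingredient, and every name in a component maps to a single representative. The pairs, the function names, and the rule of choosing the alphabetically first name as the representative are illustrative assumptions, not the thesis's implementation.

```python
from collections import defaultdict, deque

# Hypothetical synonym pairs standing in for entries from an ingredient thesaurus
SYNONYM_PAIRS = [
    ("scallion", "green onion"),
    ("green onion", "spring onion"),
    ("cilantro", "coriander leaves"),
    ("garbanzo bean", "chickpea"),
]


def label_components(pairs):
    """Give every term a component label using BFS connected-component labeling."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    label, next_label = {}, 0
    for start in sorted(graph):  # sorted so labels are deterministic
        if start in label:
            continue
        label[start] = next_label
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in label:
                    label[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1
    return label


def canonical_map(pairs):
    """Map each term to its component's representative (alphabetically first: an assumption)."""
    members = defaultdict(list)
    for term, lab in label_components(pairs).items():
        members[lab].append(term)
    return {term: min(terms) for terms in members.values() for term in terms}


def normalize(ingredients, mapping):
    """Replace known synonyms in a recipe's ingredient set; unknown names pass through."""
    return {mapping.get(ing, ing) for ing in ingredients}


if __name__ == "__main__":
    mapping = canonical_map(SYNONYM_PAIRS)
    print(normalize({"scallion", "spring onion", "salt"}, mapping))
```

Because labeling follows chains of pairs, "scallion" and "spring onion" merge even though the thesaurus never lists them together directly.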
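By mining the large volume of user-generated data on recipe websites to analyze the patterns and characteristics of the world's cuisines, we can understand the style and distinctive features of each cuisine and apply these findings to data management and search on recipe websites.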
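The Phi coefficient and PMI named in the abstract can both be computed from a 2×2 table that counts recipes containing both ingredients, only one of them, or neither. A minimal sketch under that assumption; the toy recipes and function names are hypothetical, and PMI is taken in base 2.

```python
import math

# Hypothetical toy data: each recipe is a set of normalized ingredient names
PAIR_RECIPES = [
    {"tomato", "basil", "olive oil"},
    {"tomato", "basil"},
    {"tomato", "onion"},
    {"onion", "cumin"},
]


def contingency(recipes, a, b):
    """Counts (both, only a, only b, neither) over all recipes."""
    n11 = n10 = n01 = n00 = 0
    for ings in recipes:
        has_a, has_b = a in ings, b in ings
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def phi(n11, n10, n01, n00):
    """Phi coefficient of the 2x2 table; 0.0 when a marginal total is zero."""
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def pmi(n11, n10, n01, n00):
    """log2 of P(a, b) / (P(a) P(b)); -inf when the pair never co-occurs."""
    if n11 == 0:
        return float("-inf")
    n = n11 + n10 + n01 + n00
    return math.log2(n11 * n / ((n11 + n10) * (n11 + n01)))


if __name__ == "__main__":
    table = contingency(PAIR_RECIPES, "tomato", "basil")
    print(table, phi(*table), pmi(*table))
```

A positive Phi means two ingredients appear together more often than independence would predict, and a negative Phi means they tend not to share a recipe. PMI rewards rare pairs more strongly than Phi does, which is one reason to compare both.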
Description: Master's thesis
National Chengchi University
Department of Computer Science, In-service Master's Program
102971008
Source: http://thesis.lib.nccu.edu.tw/record/#G0102971008
Type: thesis
Advisor: 沈錳坤
Identifier: G0102971008
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/80326
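Table of Contents
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives and Methods
1.3 Contributions
1.4 Thesis Organization
Chapter 2 Related Work
2.1 Academic Research on Recipes
Chapter 3 Methodology
3.1 Data Sources
3.1.1 AllRecipes.com
3.1.2 Food.com
3.1.3 Yummly.com
3.1.4 Cook's Thesaurus
3.2 Data Preprocessing and Synonym Handling
3.3 Common Ingredients and Characteristic Ingredients
3.4 Ingredient Pairing
3.5 Core Ingredients
3.6 Automatic Cuisine Classification
3.7 Cuisine Similarity and Clustering
3.8 Hierarchical Cuisine Classification
Chapter 4 Experiments
4.1 Flavor Comparison across Cuisines
4.2 Common Ingredients by Cuisine
4.3 Characteristic Ingredients by Cuisine
4.4 Ingredient Pairing Relationships by Cuisine
4.4.1 Most Frequent Ingredient Pairs
4.4.2 Most Frequent Ingredient Pairs by Cuisine
4.4.3 Ingredient Pairs with Negative Phi Coefficients
4.5 Core Ingredients That Pair with the Most Other Ingredients
4.6 Automatic Cuisine Classification
4.7 Cuisine Similarity and Clustering
4.8 Hierarchical Automatic Cuisine Classification
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References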
File size: 5382245 bytes
Format: application/pdf